Word Cloud in R

Feb 26 2016 R plot

The wordcloud package is available on CRAN.

Fake words

I create fake words to see a bit how the command is working.

library(wordcloud)
createWords <- function(w.l = 3) paste(sample(letters, 
    w.l, TRUE), collapse = "")
words = sapply(1:200, function(e) createWords(runif(1, 
    3, 10)))
freq = c(sample(1:30, 190, T), sample(30:150, 10, T))
freq = freq/sum(freq)
wordcloud(words, freq)

## Big words in the center
wordcloud(words, freq, random.order = FALSE)

## Max word number
wordcloud(words, freq, max.words = 50)

## Rotation: proportion of 90 degree
wordcloud(words, freq, rot.per = 0.01)

## Color the most frequent words
wordcloud(words, freq, colors = c("black", "blue", 
    "red"), random.order = FALSE)

## Color for each word
wordcloud(words, freq, colors = sample(c("black", "blue", 
    "red"), length(words), TRUE), random.order = FALSE, 
    ordered.colors = TRUE)

Command history

I retrieved the commands from my .bash_history files (laptop and HPCs) and I want to make a word cloud showing the commands I use the most.

library(dplyr)
library(RColorBrewer)
cmds = read.table("../../data/bash-commands.tsv.gz", 
    as.is = TRUE)
colnames(cmds) = c("cmd", "machine")

Laptop

By default the maximum history size was set to 500 commands so I don’t have the full set of commands, just the last 500. (I increased the limit, see you in 10,000 commands.)

cmds.s = cmds %>% filter(machine == "laptop", !grepl("=", 
    cmd), !grepl("\\.", cmd), !grepl("/", cmd)) %>% 
    group_by(cmd) %>% summarize(n = n()) %>% mutate(freq = n/sum(n))
wordcloud(cmds.s$cmd, cmds.s$freq, colors = c("black", 
    brewer.pal(8, "Set1")), random.order = FALSE, scale = c(10, 
    1), title = "All")

HPC

cmds.s = cmds %>% filter(machine != "laptop", !grepl("=", 
    cmd), !grepl("\\.", cmd), !grepl("/", cmd), !grepl("\\$", 
    cmd), !grepl("\\:", cmd)) %>% group_by(cmd) %>% 
    summarize(n = n()) %>% mutate(freq = n/sum(n))
wordcloud(cmds.s$cmd, cmds.s$freq, colors = c("black", 
    brewer.pal(8, "Set1")), random.order = FALSE, scale = c(10, 
    1), title = "All", min.freq = 1)

Good see the usual suspects ls and cd and their “typo” versions ;s/ks/ld and xs.

Hippocamplus My Second Memory

Word Cloud in R

Fake words

Command history

Laptop

HPC