R quarantine house

So I found this funny tweet

And Tyler Morgan made the “joke” of checking the dependencies. So, let’s check them:

List libraries

First we set up the original choices:

env1 <- c("ggplot2", "dplyr", "data.table", "purrr")
env2 <- c("forecats", "glue", "jsonlite", "rmarkdown") # sic: the CRAN package is spelled "forcats"
env3 <- c("shiny", "rayshader", "stringr", "tidytext")
env4 <- c("devtools", "xml2", "tidyr", "tibble")
env5 <- c("reticulate", "keras", "plumber", "usethis")
env6 <- c("blogdown", "brickr", "lubridate", "igraph")
quarantines <- list(env1 = env1, env2 = env2, 
                    env3 = env3, env4 = env4, 
                    env5 = env5, env6 = env6)

Dependencies

All of them are on CRAN (and I don’t have them all installed on my computer), so let’s retrieve the available packages from CRAN. Then we can check how many unique packages each environment needs:

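library("tools")
# Matrix describing every package available on the configured repositories
ap <- available.packages()
# All direct and indirect dependencies of a set of packages, without duplicates
unique_dep <- function(sets, db) {
  pd <- package_dependencies(packages = sets, recursive = TRUE, db = db)
  unique(unlist(pd))
}

uniq_p <- lapply(quarantines, unique_dep, db = ap)
sort(lengths(uniq_p))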
## env2 env1 env5 env4 env6 env3 
##   22   57   63   83   91  104

So the environment with the most dependencies is the third (104), and the second is the one with the fewest (22).
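To make concrete what `recursive = TRUE` gathers, here is a minimal base R sketch over a made-up dependency graph. The names `toy_graph` and `all_deps` are hypothetical and the graph is invented; the real numbers above come from `package_dependencies()` on the CRAN database.

```r
# Invented direct dependencies: each name maps to the packages it needs.
toy_graph <- list(
  a = c("b", "c"),
  b = "d",
  c = "d",
  d = character(0),
  e = "a"
)

# Collect the direct and indirect dependencies of a set of packages.
all_deps <- function(pkgs, deps) {
  found <- character(0)
  queue <- unique(unlist(deps[pkgs], use.names = FALSE))
  while (length(queue) > 0) {
    found <- union(found, queue)
    # Next level: dependencies of what was just added that are still unseen
    nxt <- unique(unlist(deps[queue], use.names = FALSE))
    queue <- setdiff(nxt, found)
  }
  found
}

all_deps("e", toy_graph)          # "a" "b" "c" "d"
all_deps(c("b", "c"), toy_graph)  # "d"
```

Note that "d" is reached through both "b" and "c" but is counted once, which is why the counts above are of unique packages.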

Similarity of the environments

We’ve seen that the number of packages is quite different. But how many of them are shared? Some time ago I wrote a package aimed at this, {BioCor}, which you can install from Bioconductor. I’ll use it now:

library("BioCor")
similarity <- mpathSim(names(uniq_p), inverseList(uniq_p), method = NULL)
similarity
##           env1      env2      env3      env4      env5      env6
## env1 1.0000000 0.2531646 0.5838509 0.5857143 0.6000000 0.7702703
## env2 0.2531646 1.0000000 0.3333333 0.3809524 0.3294118 0.3893805
## env3 0.5838509 0.3333333 1.0000000 0.7058824 0.6826347 0.7692308
## env4 0.5857143 0.3809524 0.7058824 1.0000000 0.6986301 0.6551724
## env5 0.6000000 0.3294118 0.6826347 0.6986301 1.0000000 0.5844156
## env6 0.7702703 0.3893805 0.7692308 0.6551724 0.5844156 1.0000000

The closer a value is to 1, the more dependencies the two environments share, so the most different are environment 1 and environment 2. The most similar are environment 1 and environment 6, and environment 6 is the one with the highest similarity to the other sets.
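The values in the matrix are consistent with the Dice coefficient, 2|A∩B| / (|A| + |B|). For example, env1 and env2 have 57 and 22 dependencies, and 0.2531646 equals 20/79, which implies 10 shared packages. A minimal base R sketch of that index follows; the helper `dice` and the two sets are invented for illustration, and this is not BioCor's own code.

```r
# Dice coefficient between two sets (assumes no duplicated elements)
dice <- function(a, b) {
  2 * length(intersect(a, b)) / (length(a) + length(b))
}

set_a <- c("rlang", "glue", "magrittr", "R6")
set_b <- c("rlang", "glue", "Rcpp")
dice(set_a, set_b)  # 2 * 2 / (4 + 3) = 0.5714286

# Check against the matrix above: sizes 57 and 22 sharing 10 packages
2 * 10 / (57 + 22)  # 0.2531646
```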

Which quarantine environment includes packages from the others?

Some of these environments pull in packages from the other environments as dependencies. We can now count how many:

inside_calls <- lapply(uniq_p, function(x, y) {
  # Count how many packages of each set are among the dependencies of this set
  vapply(y, function(z, x) {
    sum(z %in% x)
  }, FUN.VALUE = numeric(1L), x = x)
}, y = quarantines)
# Simplify and name for easier understanding
inside <- simplify2array(inside_calls)
names(dimnames(inside)) <- c("Package of", "Inside of")
inside
##           Inside of
## Package of env1 env2 env3 env4 env5 env6
##       env1    1    0    2    2    1    3
##       env2    1    2    2    2    2    3
##       env3    0    1    2    1    0    2
##       env4    1    0    1    2    1    2
##       env5    0    0    1    1    1    0
##       env6    0    0    0    0    0    0
colSums(inside) - diag(inside) # To avoid counting self-dependencies
## env1 env2 env3 env4 env5 env6 
##    2    1    6    6    4   10

We can see that environment 6 has the most packages from the other environments among its dependencies (10).
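The table gives counts but not names. Here is a minimal sketch of how the names could be listed with `intersect()`, run on invented data. `toy_quarantines`, `toy_uniq` and `shared_names` are hypothetical; with the real objects one would pass `quarantines` and `uniq_p` instead.

```r
# Invented stand-ins for the chosen packages and their dependencies
toy_quarantines <- list(env1 = c("ggplot2", "purrr"),
                        env2 = c("glue", "jsonlite"))
toy_uniq <- list(env1 = c("glue", "rlang"),
                 env2 = c("methods"))

# For each environment's dependencies, which packages chosen for the
# *other* environments show up?
shared_names <- function(sets, deps) {
  out <- lapply(names(deps), function(env) {
    others <- unlist(sets[names(sets) != env], use.names = FALSE)
    intersect(others, deps[[env]])
  })
  names(out) <- names(deps)
  out
}

shared_names(toy_quarantines, toy_uniq)
```

In this toy case env1 depends on "glue", which was chosen for env2, while env2 pulls in nothing from env1.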

Chances of survival

Someone mentioned that the {survival} package wasn’t in any environment. But it might be among the dependencies:

vapply(uniq_p, function(x) "survival" %in% x, logical(1L))
##  env1  env2  env3  env4  env5  env6 
## FALSE FALSE FALSE FALSE FALSE FALSE

No, it seems like we won’t survive well with these environments :)
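The same `%in%` test extends to several candidate packages at once. A small sketch on invented data, where `toy_deps` and `wanted` are hypothetical stand-ins for `uniq_p` and whatever packages one might miss:

```r
toy_deps <- list(env1 = c("MASS", "rlang"),
                 env2 = c("methods", "utils"))
wanted <- c("survival", "MASS")

# Rows are the packages asked about, columns the environments.
# With a single package vapply returns a plain vector instead of a matrix.
present <- vapply(toy_deps, function(deps) wanted %in% deps,
                  FUN.VALUE = logical(length(wanted)))
rownames(present) <- wanted
present
```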

Conclusions

Environment 6 is the one with the most packages from the other environments, but if you want to have more packages overall use the third one. What you can do with these packages in a quarantine is harder to say :D

Reproducibility

## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.6.1 (2019-07-05)
##  os       Ubuntu 19.10                
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       Europe/Madrid               
##  date     2020-04-27                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package       * version  date       lib source        
##  annotate        1.64.0   2019-10-29 [1] Bioconductor  
##  AnnotationDbi   1.48.0   2019-10-29 [1] Bioconductor  
##  assertthat      0.2.1    2019-03-21 [1] CRAN (R 3.6.0)
##  Biobase         2.46.0   2019-10-29 [1] Bioconductor  
##  BiocGenerics    0.32.0   2019-10-29 [1] Bioconductor  
##  BioCor        * 1.11.2   2020-02-03 [1] Bioconductor  
##  BiocParallel    1.20.1   2019-12-21 [1] Bioconductor  
##  bit             1.1-15.2 2020-02-10 [1] CRAN (R 3.6.1)
##  bit64           0.9-7    2017-05-08 [1] CRAN (R 3.6.0)
##  bitops          1.0-6    2013-08-17 [1] CRAN (R 3.6.0)
##  blob            1.2.1    2020-01-20 [1] CRAN (R 3.6.1)
##  blogdown        0.18     2020-03-04 [1] CRAN (R 3.6.1)
##  bookdown        0.18     2020-03-05 [1] CRAN (R 3.6.1)
##  cli             2.0.2    2020-02-28 [1] CRAN (R 3.6.1)
##  crayon          1.3.4    2017-09-16 [1] CRAN (R 3.6.0)
##  DBI             1.1.0    2019-12-15 [1] CRAN (R 3.6.1)
##  digest          0.6.25   2020-02-23 [1] CRAN (R 3.6.1)
##  evaluate        0.14     2019-05-28 [1] CRAN (R 3.6.0)
##  fansi           0.4.1    2020-01-08 [1] CRAN (R 3.6.1)
##  glue            1.4.0    2020-04-03 [1] CRAN (R 3.6.1)
##  graph           1.64.0   2019-10-29 [1] Bioconductor  
##  GSEABase        1.48.0   2019-10-29 [1] Bioconductor  
##  htmltools       0.4.0    2019-10-04 [1] CRAN (R 3.6.1)
##  IRanges         2.20.2   2020-01-13 [1] Bioconductor  
##  knitr           1.28     2020-02-06 [1] CRAN (R 3.6.1)
##  lattice         0.20-41  2020-04-02 [1] CRAN (R 3.6.1)
##  magrittr        1.5      2014-11-22 [1] CRAN (R 3.6.0)
##  Matrix          1.2-18   2019-11-27 [1] CRAN (R 3.6.1)
##  memoise         1.1.0    2017-04-21 [1] CRAN (R 3.6.0)
##  Rcpp            1.0.4.6  2020-04-09 [1] CRAN (R 3.6.1)
##  RCurl           1.98-1.2 2020-04-18 [1] CRAN (R 3.6.1)
##  rlang           0.4.5    2020-03-01 [1] CRAN (R 3.6.1)
##  rmarkdown       2.1      2020-01-20 [1] CRAN (R 3.6.1)
##  RSQLite         2.2.0    2020-01-07 [1] CRAN (R 3.6.1)
##  S4Vectors       0.24.4   2020-04-09 [1] Bioconductor  
##  sessioninfo     1.1.1    2018-11-05 [1] CRAN (R 3.6.0)
##  stringi         1.4.6    2020-02-17 [1] CRAN (R 3.6.1)
##  stringr         1.4.0    2019-02-10 [1] CRAN (R 3.6.0)
##  vctrs           0.2.4    2020-03-10 [1] CRAN (R 3.6.1)
##  withr           2.2.0    2020-04-20 [1] CRAN (R 3.6.1)
##  xfun            0.13     2020-04-13 [1] CRAN (R 3.6.1)
##  XML             3.99-0.3 2020-01-20 [1] CRAN (R 3.6.1)
##  xtable          1.8-4    2019-04-21 [1] CRAN (R 3.6.0)
##  yaml            2.2.1    2020-02-01 [1] CRAN (R 3.6.1)
## 
## [1] /home/lluis/R/x86_64-pc-linux-gnu-library/3.6
## [2] /usr/local/lib/R/site-library
## [3] /usr/lib/R/site-library
## [4] /usr/lib/R/library
Lluís Revilla Sancho
Bioinformatician

Bioinformatician with interests in functional enrichment, data integration and transcriptomics.
