Fast correlations
Correlations
One of the few methods that are commonly done are correlations. I found several implementations of Pearson correlations, and I was curious to know if which is the fastest one.
Implementations
So far the implementations I found are:
stats::cor
WGCNA::cor
miRcomb::cor
coop::pcor
HiClimR::fastCor
Most of them are on CRAN and one is at github.
Dependencies
If we are only interested on the correlation most of the dependencies that a package would bring aren’t relevant. So how many dependencies have each of those packages?
ip <- installed.packages()
dp <- tools::package_dependencies(c("stats", "WGCNA", "miRcomb", "coop", "HiClimR"),
db = ip, which = "Imports")
sort(lengths(dp), decreasing = TRUE)
## WGCNA HiClimR stats coop miRcomb
## 15 5 3 0 0
Speed
library("bench")
x <- runif(50)
y <- runif(50)
stats_cor <- function() {
stats::cor(x, y, method = "pearson")
}
WGCNA_cor <- function() {
WGCNA::cor(x, y, method = "pearson")[1, 1]
}
coop_cor <- function() {
coop::pcor(x, y)
}
HiClimR_cor <- function() {
HiClimR::fastCor(matrix(c(x, y), nrow = 50, ncol = 2), upperTri = FALSE, verbose = FALSE)[2, 1]
}
bm <- mark(stats_cor(), WGCNA_cor(), coop_cor(), HiClimR_cor(), iterations = 10000)
##
bm
## # A tibble: 4 x 6
## expression min median `itr/sec` mem_alloc `gc/sec`
## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
## 1 stats_cor() 30.1µs 32.4µs 24351. 88.6KB 4.87
## 2 WGCNA_cor() 66.3µs 83.7µs 7315. 207.3MB 1.46
## 3 coop_cor() 31.5µs 33.2µs 19694. 82.7KB 1.97
## 4 HiClimR_cor() 115.6µs 126.9µs 5425. 171.1KB 2.17
plot(bm) + ggplot2::theme_bw()
## Loading required namespace: tidyr
So we can see that for this basic comparison stats is king followed by coop, WGCNA, and HiClimR.
It might be that these other functions are optimized for matrices so lets see it again
x <- matrix(runif(100), ncol = 10, nrow = 10)
y <- matrix(runif(100), ncol = 10, nrow = 10)
stats_cor2 <- function() {
stats::cor(x, x, method = "pearson")
}
WGCNA_cor2 <- function() {
WGCNA::cor(x, x, method = "pearson")
}
coop_cor2 <- function() {
coop::pcor(x)
}
HiClimR_cor2 <- function() {
HiClimR::fastCor(x, upperTri = FALSE, verbose = FALSE)
}
bm2 <- mark(stats_cor2(), WGCNA_cor2(), coop_cor2(), HiClimR_cor2(), iterations = 10000)
bm2
## # A tibble: 4 x 6
## expression min median `itr/sec` mem_alloc `gc/sec`
## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
## 1 stats_cor2() 33.4µs 35.4µs 25493. 848B 2.55
## 2 WGCNA_cor2() 64.2µs 69.4µs 9988. 3.36KB 2.00
## 3 coop_cor2() 30.9µs 33.8µs 18887. 49.8KB 1.89
## 4 HiClimR_cor2() 193.3µs 214.1µs 3466. 5.02KB 2.43
plot(bm2) + ggplot2::theme_bw()
Here we can see that coop takes the lead and stats is the second fastest followed by WGCNA and HiClimR.
References
Reproducibility
## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
## setting value
## version R version 4.0.1 (2020-06-06)
## os Ubuntu 20.04.1 LTS
## system x86_64, linux-gnu
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz Europe/Madrid
## date 2021-01-10
##
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
## package * version date lib source
## AnnotationDbi 1.52.0 2020-10-27 [1] Bioconductor
## assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.1)
## backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.1)
## base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.0.1)
## beeswarm 0.2.3 2016-04-25 [1] CRAN (R 4.0.1)
## bench * 1.1.1 2020-01-13 [1] CRAN (R 4.0.1)
## Biobase 2.50.0 2020-10-27 [1] Bioconductor
## BiocGenerics 0.36.0 2020-10-27 [1] Bioconductor
## bit 4.0.4 2020-08-04 [1] CRAN (R 4.0.1)
## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.0.1)
## blob 1.2.1 2020-01-20 [1] CRAN (R 4.0.1)
## blogdown 1.0.1 2021-01-10 [1] Github (rstudio/blogdown@0f7f73f)
## bookdown 0.21 2020-10-13 [1] CRAN (R 4.0.1)
## checkmate 2.0.0 2020-02-06 [1] CRAN (R 4.0.1)
## cli 2.2.0 2020-11-20 [1] CRAN (R 4.0.1)
## cluster 2.1.0 2019-06-19 [1] CRAN (R 4.0.1)
## codetools 0.2-18 2020-11-04 [1] CRAN (R 4.0.1)
## colorspace 2.0-0 2020-11-11 [1] CRAN (R 4.0.1)
## coop 0.6-2 2019-04-22 [1] CRAN (R 4.0.1)
## crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.1)
## data.table 1.13.6 2020-12-30 [1] CRAN (R 4.0.1)
## DBI 1.1.0 2019-12-15 [1] CRAN (R 4.0.1)
## digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.1)
## doParallel 1.0.16 2020-10-16 [1] CRAN (R 4.0.1)
## dplyr 1.0.2 2020-08-18 [1] CRAN (R 4.0.1)
## dynamicTreeCut 1.63-1 2016-03-11 [1] CRAN (R 4.0.1)
## ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.1)
## evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.1)
## fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.1)
## farver 2.0.3 2020-01-16 [1] CRAN (R 4.0.1)
## fastcluster 1.1.25 2018-06-07 [1] CRAN (R 4.0.1)
## foreach 1.5.1 2020-10-15 [1] CRAN (R 4.0.1)
## foreign 0.8-80 2020-05-24 [1] CRAN (R 4.0.1)
## Formula 1.2-4 2020-10-16 [1] CRAN (R 4.0.1)
## generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.1)
## ggbeeswarm 0.6.0 2017-08-07 [1] CRAN (R 4.0.1)
## ggplot2 3.3.3 2020-12-30 [1] CRAN (R 4.0.1)
## glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.1)
## GO.db 3.12.1 2020-12-17 [1] Bioconductor
## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.0.1)
## gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.1)
## HiClimR 2.1.8 2021-01-05 [1] CRAN (R 4.0.1)
## Hmisc 4.4-2 2020-11-29 [1] CRAN (R 4.0.1)
## htmlTable 2.1.0 2020-09-16 [1] CRAN (R 4.0.1)
## htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.1)
## htmlwidgets 1.5.3 2020-12-10 [1] CRAN (R 4.0.1)
## httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.1)
## impute 1.64.0 2020-10-27 [1] Bioconductor
## IRanges 2.24.1 2020-12-12 [1] Bioconductor
## iterators 1.0.13 2020-10-15 [1] CRAN (R 4.0.1)
## jpeg 0.1-8.1 2019-10-24 [1] CRAN (R 4.0.1)
## jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.0.1)
## knitcitations * 1.0.10 2019-09-15 [1] CRAN (R 4.0.1)
## knitr 1.30 2020-09-22 [1] CRAN (R 4.0.1)
## lattice 0.20-41 2020-04-02 [1] CRAN (R 4.0.1)
## latticeExtra 0.6-29 2019-12-19 [1] CRAN (R 4.0.1)
## lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.1)
## lubridate 1.7.9.2 2020-11-13 [1] CRAN (R 4.0.1)
## magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.1)
## Matrix 1.3-2 2021-01-06 [1] CRAN (R 4.0.1)
## matrixStats 0.57.0 2020-09-25 [1] CRAN (R 4.0.1)
## memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.1)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.1)
## ncdf4 1.17 2019-10-23 [1] CRAN (R 4.0.1)
## nnet 7.3-14 2020-04-26 [1] CRAN (R 4.0.1)
## pillar 1.4.7 2020-11-20 [1] CRAN (R 4.0.1)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.1)
## plyr 1.8.6 2020-03-03 [1] CRAN (R 4.0.1)
## png 0.1-7 2013-12-03 [1] CRAN (R 4.0.1)
## preprocessCore 1.52.0 2020-10-27 [1] Bioconductor
## profmem 0.6.0 2020-12-13 [1] CRAN (R 4.0.1)
## purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.1)
## R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.1)
## RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 4.0.1)
## Rcpp 1.0.5 2020-07-06 [1] CRAN (R 4.0.1)
## RefManageR 1.3.0 2020-11-13 [1] CRAN (R 4.0.1)
## rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.1)
## rmarkdown 2.6 2020-12-14 [1] CRAN (R 4.0.1)
## rpart 4.1-15 2019-04-12 [1] CRAN (R 4.0.1)
## RSQLite 2.2.1 2020-09-30 [1] CRAN (R 4.0.1)
## rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.1)
## S4Vectors 0.28.1 2020-12-09 [1] Bioconductor
## scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.1)
## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.1)
## stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.1)
## stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.1)
## survival 3.2-7 2020-09-28 [1] CRAN (R 4.0.1)
## tibble 3.0.4 2020-10-12 [1] CRAN (R 4.0.1)
## tidyr 1.1.2 2020-08-27 [1] CRAN (R 4.0.1)
## tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.1)
## utf8 1.1.4 2018-05-24 [1] CRAN (R 4.0.1)
## vctrs 0.3.6 2020-12-17 [1] CRAN (R 4.0.1)
## vipor 0.4.5 2017-03-22 [1] CRAN (R 4.0.1)
## WGCNA 1.69 2020-02-28 [1] CRAN (R 4.0.1)
## withr 2.3.0 2020-09-22 [1] CRAN (R 4.0.1)
## xfun 0.20 2021-01-06 [1] CRAN (R 4.0.1)
## xml2 1.3.2 2020-04-23 [1] CRAN (R 4.0.1)
## yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.1)
##
## [1] /home/lluis/bin/R/4.0.1/lib/R/library