Models

Introduction

I have learned a lot about linear models through carefully reading limma’s vignette. However from time to time I find that I cannnot do the model of the effects as I wish to compare the samples as I need. Here I describe some pitfalls I stumbled on and how to overcome them:

Complex comparisons problems

One of the main problems I have is having multiple factors that can be two options which result in several comparisons. The researcher usually wants to know the differences between all the combinations which leads to a complex design as such:

type <- c("A", "A", "A", "B", "B", "B", "A", "B", "A", "B")
origin <- c("O", "P", "O", "P", "O", "P", "O", "P", "O", "O")
groups <- as.factor(paste(type, origin, sep = "_"))
m <- model.matrix(~0 + groups, data = groups)
colnames(m) <- levels(groups)
m
##    A_O A_P B_O B_P
## 1    1   0   0   0
## 2    0   1   0   0
## 3    1   0   0   0
## 4    0   0   0   1
## 5    0   0   1   0
## 6    0   0   0   1
## 7    1   0   0   0
## 8    0   0   0   1
## 9    1   0   0   0
## 10   0   0   1   0
## attr(,"assign")
## [1] 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$groups
## [1] "contr.treatment"

Here we build a design factor where each colum is the combination of the two factors we have type and origin. This model includes the interaction effect between the two variables type and group. The idea is now to compare between the samples A_O and A_P and A_O and B_O and so on.

We can build our contrast and perform our comparisons:

library("limma")
contr <- makeContrasts(O_vs_P = A_O+ B_O - A_P-B_P,
              A_vs_P = A_O+A_P-B_O-B_P,
              A__O_vs_P = A_O - A_P,
              B__O_vs_P = B_O - B_P,
              levels = colnames(m))
contr
##       Contrasts
## Levels O_vs_P A_vs_P A__O_vs_P B__O_vs_P
##    A_O      1      1         1         0
##    A_P     -1      1        -1         0
##    B_O      1     -1         0         1
##    B_P     -1     -1         0        -1

We can see which samples do we pick in each comparison with:

pick <- m %*% contr
pick
##     Contrasts
##      O_vs_P A_vs_P A__O_vs_P B__O_vs_P
##   1       1      1         1         0
##   2      -1      1        -1         0
##   3       1      1         1         0
##   4      -1     -1         0        -1
##   5       1     -1         0         1
##   6      -1     -1         0        -1
##   7       1      1         1         0
##   8      -1     -1         0        -1
##   9       1      1         1         0
##   10      1     -1         0         1

And we could continue to analys our expression data with this contrasts. But we would get an incorrect result whatever our data is. The problem is that we didn’t realize that we have some coefficients estimated with a single sample.

We can easily see it with:

colSums(m)
## A_O A_P B_O B_P 
##   4   1   2   3
unic_coef <- colSums(m)

This mean that we cannot use this combination in a comparison The solution is to remove those coefficients from the model matrix and add a interception:

m1 <- m[, unic_coef != 1]
m1 <- cbind(Intercept = 1, m1)

Now we can compare less but with accurate results:

contr <- makeContrasts(O_vs_P = A_O+ B_O - B_P,
              A_vs_P = A_O-B_O-B_P,
              B__O_vs_P = B_O - B_P,
              levels = colnames(m1))
contr
##            Contrasts
## Levels      O_vs_P A_vs_P B__O_vs_P
##   Intercept      0      0         0
##   A_O            1      1         0
##   B_O            1     -1         1
##   B_P           -1     -1        -1

This models takes into account all samples to estimate the “natural” variation of the genes, while comparing only those samples that are interesting to the researcher.

Conclusions

Check how many samples do you have for each coefficient, or combination of factors you want to compare. You should have at least two samples per combination. If you have some samples that you sequenced that do not belong to any group, create the intercept group, this way you will still use them but not directly in a comparisons.

References

Reproducibility

## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.0.1 (2020-06-06)
##  os       Ubuntu 20.04.1 LTS          
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       Europe/Madrid               
##  date     2021-01-08                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package       * version date       lib source                           
##  assertthat      0.2.1   2019-03-21 [1] CRAN (R 4.0.1)                   
##  blogdown        0.21.84 2021-01-07 [1] Github (rstudio/blogdown@c4fbb58)
##  bookdown        0.21    2020-10-13 [1] CRAN (R 4.0.1)                   
##  cli             2.2.0   2020-11-20 [1] CRAN (R 4.0.1)                   
##  crayon          1.3.4   2017-09-16 [1] CRAN (R 4.0.1)                   
##  digest          0.6.27  2020-10-24 [1] CRAN (R 4.0.1)                   
##  evaluate        0.14    2019-05-28 [1] CRAN (R 4.0.1)                   
##  fansi           0.4.1   2020-01-08 [1] CRAN (R 4.0.1)                   
##  generics        0.1.0   2020-10-31 [1] CRAN (R 4.0.1)                   
##  glue            1.4.2   2020-08-27 [1] CRAN (R 4.0.1)                   
##  htmltools       0.5.0   2020-06-16 [1] CRAN (R 4.0.1)                   
##  httr            1.4.2   2020-07-20 [1] CRAN (R 4.0.1)                   
##  jsonlite        1.7.2   2020-12-09 [1] CRAN (R 4.0.1)                   
##  knitcitations * 1.0.10  2019-09-15 [1] CRAN (R 4.0.1)                   
##  knitr           1.30    2020-09-22 [1] CRAN (R 4.0.1)                   
##  limma         * 3.46.0  2020-10-27 [1] Bioconductor                     
##  lubridate       1.7.9.2 2020-11-13 [1] CRAN (R 4.0.1)                   
##  magrittr        2.0.1   2020-11-17 [1] CRAN (R 4.0.1)                   
##  plyr            1.8.6   2020-03-03 [1] CRAN (R 4.0.1)                   
##  R6              2.5.0   2020-10-28 [1] CRAN (R 4.0.1)                   
##  Rcpp            1.0.5   2020-07-06 [1] CRAN (R 4.0.1)                   
##  RefManageR      1.3.0   2020-11-13 [1] CRAN (R 4.0.1)                   
##  rlang           0.4.10  2020-12-30 [1] CRAN (R 4.0.1)                   
##  rmarkdown       2.6     2020-12-14 [1] CRAN (R 4.0.1)                   
##  sessioninfo     1.1.1   2018-11-05 [1] CRAN (R 4.0.1)                   
##  stringi         1.5.3   2020-09-09 [1] CRAN (R 4.0.1)                   
##  stringr         1.4.0   2019-02-10 [1] CRAN (R 4.0.1)                   
##  withr           2.3.0   2020-09-22 [1] CRAN (R 4.0.1)                   
##  xfun            0.20    2021-01-06 [1] CRAN (R 4.0.1)                   
##  xml2            1.3.2   2020-04-23 [1] CRAN (R 4.0.1)                   
##  yaml            2.2.1   2020-02-01 [1] CRAN (R 4.0.1)                   
## 
## [1] /home/lluis/bin/R/4.0.1/lib/R/library
Avatar
Lluís Revilla Sancho
Bioinformatician

Bioinformatician with interests in functional enrichment, data integration and transcriptomics.