Social activities on GitHub

In my last post I explored the Bioconductor submissions, using {gh} to retrieve some data. After some feedback from the Bioconductor community I realized I should download other kinds of data to improve my analysis of the reviews.

To do this I developed a new package to retrieve information from GitHub.

socialGH

This package, based on {gh}, retrieves data from GitHub.

You can install it with:

remotes::install_github("llrs/socialGH")

Basically, it pulls the data in list format and transforms it into a data.frame so that you can filter and analyze it.
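
As a rough illustration of that transformation (this is not the package's actual code), a nested list like the one the GitHub API returns could be flattened as sketched below. The `issues_list` data and the choice of columns are made up for the example.

```r
# Toy stand-in for the nested list returned by the API (hypothetical data)
issues_list <- list(
  list(number = 1L, title = "First post", state = "closed",
       user = list(login = "llrs"), labels = list(list(name = "Post"))),
  list(number = 2L, title = "Fix config", state = "open",
       user = list(login = "llrs"), labels = list())
)

# Scalar fields become regular columns, one value per issue
issues_df <- data.frame(
  id = vapply(issues_list, function(x) x$number, integer(1)),
  title = vapply(issues_list, function(x) x$title, character(1)),
  state = vapply(issues_list, function(x) x$state, character(1)),
  poster = vapply(issues_list, function(x) x$user$login, character(1)),
  stringsAsFactors = FALSE
)

# Fields with a variable number of values (labels) are kept as a list column
issues_df$label <- lapply(issues_list, function(x) {
  vapply(x$labels, function(l) l$name, character(1))
})

issues_df
```

Keeping the labels as a list column avoids duplicating a row for each label of an issue.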

library("socialGH")
library("tidyverse")
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.4     ✓ dplyr   1.0.2
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

It allows you to selectively download comments, pull requests, issues, events, labels and the timeline of an issue.

With the issues we can see the labels, the number of comments and much more information:

issues_blog <- get_issues("llrs/blogR")
dim(issues_blog)
## [1] 46 15
colnames(issues_blog)
##  [1] "assignees"   "assignee"    "label"       "state"       "locked"     
##  [6] "milestone"   "n_comments"  "title"       "created"     "updated"    
## [11] "association" "text"        "id"          "closer"      "poster"
# Labels used
issues_blog %>% 
  pull(label) %>% 
  unlist(FALSE, FALSE) %>% 
  table()
## .
##  b101nfo.blogspot.com          Bioconductor                config 
##                     1                     4                     1 
##                  CRAN             goverment           help wanted 
##                     3                     1                     1 
##               invalid               package                  Post 
##                     1                     2                    27 
##              question              rOpenSci todo 🗒 
##                     2                     1                     4 
##               website 
##                     1
count(issues_blog, state)
##    state  n
## 1 closed 27
## 2   open 19
count(issues_blog, n_comments)
##   n_comments  n
## 1          0 24
## 2          1 15
## 3          2  2
## 4          3  4
## 5          5  1
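
The `unlist(FALSE, FALSE)` step above is needed because `label` is a list column: each issue can have zero, one or several labels. A small self-contained sketch with invented data (the `toy_issues` object is hypothetical) shows the idea:

```r
# Toy data.frame with a list column, mimicking the shape of `label`
toy_issues <- data.frame(id = 1:3, stringsAsFactors = FALSE)
toy_issues$label <- list(c("Post", "CRAN"), character(0), "Post")

# Flatten one level, dropping names, and tally how often each label is used
label_counts <- table(
  unlist(toy_issues$label, recursive = FALSE, use.names = FALSE)
)
label_counts

# Number of labels per issue
n_labels <- lengths(toy_issues$label)
n_labels
```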

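However, get_issues() doesn’t retrieve each comment of an issue. We can check which issues have comments and then download the comments of the repository: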

# Issues with comments
issues_blog %>% 
  filter(n_comments > 0) %>% 
  pull(id)
##  [1] 46 42 40 39 36 34 33 29 28 26 25 23 16 10  9  8  7  6  5  4  3  2

comments <- get_comments("llrs/blogR")
dim(comments)
## [1] 36  6
colnames(comments)
## [1] "text"        "created"     "updated"     "association" "id"         
## [6] "commenter"
count(comments, association)
##   association  n
## 1       OWNER 36

We can see that I was the only one commenting on the issues, and that the text of the comments has already been retrieved.

We can also look for events on issues:

events <- get_events("llrs/blogR", 23)
count(events, event)
##     event n
## 1 labeled 1
## 2  closed 1

In all the functions you can provide an issue number to retrieve the information just for that issue. If you don’t provide one, they will search the whole repository:

events <- get_events("llrs/blogR")
count(events, event)
##                 event  n
## 1             labeled 50
## 2              closed 27
## 3             renamed  4
## 4 marked_as_duplicate  1
## 5            assigned  6
## 6          subscribed  3
## 7           mentioned  3
## 8           unlabeled  1

However, it is better to look at the timeline of an issue, which also downloads each comment of the issue:

gt <- get_timelines("llrs/blogR", 23)
## Warning: This is under preview and may fail.
gt[, c("label", "event", "created", "association", "actor")]
##   label     event             created association             actor
## 1    NA commented 2020-02-14 00:39:47       OWNER llrs, User, FALSE
## 2    NA commented 2020-02-14 09:44:26       OWNER llrs, User, FALSE
## 3  Post   labeled 2020-02-18 10:10:35        <NA> llrs, User, FALSE
## 4    NA commented 2020-02-29 17:58:51       OWNER llrs, User, FALSE
## 5    NA    closed 2020-02-29 17:58:51        <NA> llrs, User, FALSE

With the timeline we don’t get the initial information about when the issue was created, so we’ll need to call get_issues("llrs/blogR", 23) to know that. Here I omitted the text of the comments to keep the output readable, but we can see what happened, who did it and who was affected.
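
One possible way to combine both pieces of information is to add the creation of the issue as the first row of the timeline. The sketch below uses toy data.frames instead of real calls: the timeline dates are taken from the output above, while the creation date in `issue` is a made-up placeholder and the column names are simplified.

```r
# Simplified timeline (dates copied from the output shown above)
timeline <- data.frame(
  event = c("commented", "labeled", "closed"),
  created = as.POSIXct(
    c("2020-02-14 00:39:47", "2020-02-18 10:10:35", "2020-02-29 17:58:51"),
    tz = "UTC"
  ),
  actor = "llrs",
  stringsAsFactors = FALSE
)

# Stand-in for the issue itself; the creation date is a placeholder, NOT real
issue <- data.frame(
  created = as.POSIXct("2020-02-13 22:00:00", tz = "UTC"),
  poster = "llrs",
  stringsAsFactors = FALSE
)

# Express the creation as an event with the same columns as the timeline
opened <- data.frame(
  event = "created",
  created = issue$created,
  actor = issue$poster,
  stringsAsFactors = FALSE
)

# Bind both and sort chronologically
full <- rbind(opened, timeline)
full <- full[order(full$created), ]

# Days between the first and the last event of the issue
days_open <- as.numeric(
  difftime(max(full$created), min(full$created), units = "days")
)
```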

Learning

Developing this package I learned more about the {gh} package (in the previous post I manually wrote the calls to the different pages, which I later discovered is handled automatically by {gh}). I also learned that the accept header influences the information returned (and that you cannot pass several accept headers at the same time).
I hope to learn more about the R community that is using GitHub as a way to help each other and improve packages and processes.
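
To make the two points about {gh} concrete, here is a minimal sketch of a wrapper that lets {gh} follow all the pages and sends a single accept header. The helper name `fetch_all` is hypothetical and is not part of socialGH; the `.limit` and `.accept` arguments are those of `gh::gh()` as I understand them, and the preview media type in the commented call is my assumption of the one the timeline endpoint needed at the time.

```r
# Hypothetical helper: retrieve every page of an endpoint with one accept header
fetch_all <- function(endpoint, ...,
                      accept = "application/vnd.github.v3+json") {
  # .limit = Inf asks {gh} to keep requesting pages until there are no more
  gh::gh(endpoint, ..., .limit = Inf, .accept = accept)
}

# Not run here: it needs network access (and ideally a GitHub token).
# timeline <- fetch_all(
#   "GET /repos/{owner}/{repo}/issues/{issue_number}/timeline",
#   owner = "llrs", repo = "blogR", issue_number = 23,
#   accept = "application/vnd.github.mockingbird-preview+json"
# )
```

Because only one accept header can be sent per request, data that needs different headers has to be requested in separate calls and merged afterwards.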

Reproducibility

## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.0.1 (2020-06-06)
##  os       Ubuntu 20.04.1 LTS          
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       Europe/Madrid               
##  date     2021-01-08                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package     * version date       lib source                           
##  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.1)                   
##  backports     1.2.1   2020-12-09 [1] CRAN (R 4.0.1)                   
##  blogdown      0.21.84 2021-01-07 [1] Github (rstudio/blogdown@c4fbb58)
##  bookdown      0.21    2020-10-13 [1] CRAN (R 4.0.1)                   
##  broom         0.7.3   2020-12-16 [1] CRAN (R 4.0.1)                   
##  cellranger    1.1.0   2016-07-27 [1] CRAN (R 4.0.1)                   
##  cli           2.2.0   2020-11-20 [1] CRAN (R 4.0.1)                   
##  colorspace    2.0-0   2020-11-11 [1] CRAN (R 4.0.1)                   
##  crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.1)                   
##  curl          4.3     2019-12-02 [1] CRAN (R 4.0.1)                   
##  DBI           1.1.0   2019-12-15 [1] CRAN (R 4.0.1)                   
##  dbplyr        2.0.0   2020-11-03 [1] CRAN (R 4.0.1)                   
##  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.1)                   
##  dplyr       * 1.0.2   2020-08-18 [1] CRAN (R 4.0.1)                   
##  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.1)                   
##  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.1)                   
##  fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.1)                   
##  forcats     * 0.5.0   2020-03-01 [1] CRAN (R 4.0.1)                   
##  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.1)                   
##  generics      0.1.0   2020-10-31 [1] CRAN (R 4.0.1)                   
##  ggplot2     * 3.3.3   2020-12-30 [1] CRAN (R 4.0.1)                   
##  gh            1.2.0   2020-11-27 [1] CRAN (R 4.0.1)                   
##  gitcreds      0.1.1   2020-12-04 [1] CRAN (R 4.0.1)                   
##  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.1)                   
##  gtable        0.3.0   2019-03-25 [1] CRAN (R 4.0.1)                   
##  haven         2.3.1   2020-06-01 [1] CRAN (R 4.0.1)                   
##  hms           0.5.3   2020-01-08 [1] CRAN (R 4.0.1)                   
##  htmltools     0.5.0   2020-06-16 [1] CRAN (R 4.0.1)                   
##  httr          1.4.2   2020-07-20 [1] CRAN (R 4.0.1)                   
##  jsonlite      1.7.2   2020-12-09 [1] CRAN (R 4.0.1)                   
##  knitr         1.30    2020-09-22 [1] CRAN (R 4.0.1)                   
##  lifecycle     0.2.0   2020-03-06 [1] CRAN (R 4.0.1)                   
##  lubridate     1.7.9.2 2020-11-13 [1] CRAN (R 4.0.1)                   
##  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.1)                   
##  modelr        0.1.8   2020-05-19 [1] CRAN (R 4.0.1)                   
##  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.0.1)                   
##  pillar        1.4.7   2020-11-20 [1] CRAN (R 4.0.1)                   
##  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.1)                   
##  purrr       * 0.3.4   2020-04-17 [1] CRAN (R 4.0.1)                   
##  R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.1)                   
##  Rcpp          1.0.5   2020-07-06 [1] CRAN (R 4.0.1)                   
##  readr       * 1.4.0   2020-10-05 [1] CRAN (R 4.0.1)                   
##  readxl        1.3.1   2019-03-13 [1] CRAN (R 4.0.1)                   
##  reprex        0.3.0   2019-05-16 [1] CRAN (R 4.0.1)                   
##  rlang         0.4.10  2020-12-30 [1] CRAN (R 4.0.1)                   
##  rmarkdown     2.6     2020-12-14 [1] CRAN (R 4.0.1)                   
##  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.0.1)                   
##  rvest         0.3.6   2020-07-25 [1] CRAN (R 4.0.1)                   
##  scales        1.1.1   2020-05-11 [1] CRAN (R 4.0.1)                   
##  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.1)                   
##  socialGH    * 0.0.3   2020-08-17 [1] local                            
##  stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.1)                   
##  stringr     * 1.4.0   2019-02-10 [1] CRAN (R 4.0.1)                   
##  tibble      * 3.0.4   2020-10-12 [1] CRAN (R 4.0.1)                   
##  tidyr       * 1.1.2   2020-08-27 [1] CRAN (R 4.0.1)                   
##  tidyselect    1.1.0   2020-05-11 [1] CRAN (R 4.0.1)                   
##  tidyverse   * 1.3.0   2019-11-21 [1] CRAN (R 4.0.1)                   
##  vctrs         0.3.6   2020-12-17 [1] CRAN (R 4.0.1)                   
##  withr         2.3.0   2020-09-22 [1] CRAN (R 4.0.1)                   
##  xfun          0.20    2021-01-06 [1] CRAN (R 4.0.1)                   
##  xml2          1.3.2   2020-04-23 [1] CRAN (R 4.0.1)                   
##  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.1)                   
## 
## [1] /home/lluis/bin/R/4.0.1/lib/R/library

Lluís Revilla Sancho
Bioinformatician

Bioinformatician with interests in software quality, mostly R.