Bugs in R

Exploring the history of bugs in R

A bug

This post has a relatively long introduction; feel free to skip ahead to the analysis.

I knew about the r-devel (abbreviated Rd) mailing list, where some discussions about the language happen. I had also read the R Core post requesting help reviewing bug reports, and the same day it came out I requested an account to be able to post on Bugzilla: https://bugs.r-project.org/. But I hadn’t reported any bug or anything; what did I have to bring?

After RStudio 2021 I saw https://r-devel.slack.com announced (you can join via this website). When I joined, I checked some bugs and found something odd. This led to my first R bug report: 18055.

Then, while thinking about my analysis of package reviews, I realized it was essentially an analysis of issues. It occurred to me that I could also analyse the issues of R, aka bug reports.

Collecting the data

The first step was collecting the data. As with the rOpenSci and Bioconductor analyses, I knew I might need to create a package or a script just to retrieve the data.

I somehow found that Bugzilla reports some data as XML, and I thought I could use that. But exploring the documentation, I found it had an API that could be used to retrieve data. Interacting with the API required authentication. Instead of putting me off, this made it a reasonable challenge and a natural progression: previously I had used the gh package to authenticate and retrieve the raw data, while this time I had to learn how to authenticate to an API myself. I had already developed a package that uses a poorly explained API to retrieve documents, so adding a step to authenticate requests was small enough.
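For readers who want to try this, here is a minimal sketch of what an authenticated request looks like. It assumes the tracker exposes Bugzilla 5’s standard REST endpoint (`<base>/rest/bug/<id>`) and accepts the API key in the `X-BUGZILLA-API-KEY` header; the helper `bugzilla_bug_request()` and the environment variable `BUGZILLA_API_KEY` are hypothetical and not part of bugRzilla. The function only describes the request, so nothing is sent until you hand it to an HTTP client such as httr.

```r
# Hypothetical helper (not bugRzilla's API): describe an authenticated
# Bugzilla REST request for a single bug without sending it.
# Assumes Bugzilla 5's REST endpoint layout: <base>/rest/bug/<id>.
bugzilla_bug_request <- function(id,
                                 base = "https://bugs.r-project.org",
                                 api_key = Sys.getenv("BUGZILLA_API_KEY")) {
  url <- paste0(base, "/rest/bug/", id)
  headers <- c(Accept = "application/json")
  if (nzchar(api_key)) {
    # Assumed: Bugzilla 5 reads the key from this header (or an api_key parameter)
    headers <- c(headers, "X-BUGZILLA-API-KEY" = api_key)
  }
  list(url = url, headers = headers)
}

req <- bugzilla_bug_request(18055, api_key = "my-secret-key")
req$url

# Sending it needs network access and a valid key, e.g. with httr:
# res <- httr::GET(req$url, httr::add_headers(.headers = req$headers))
# bug <- httr::content(res, as = "parsed")$bugs[[1]]
```

Keeping request construction separate from sending makes it easy to test without touching the server, which also helps with the load concerns mentioned below.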

I also found a package, bugtractr, that already did this, but I had some problems using it, and it didn’t use authenticated requests to get the data. This meant it couldn’t retrieve all the data I wanted. So I went on to develop my own package to interact with Bugzilla’s API.

As I was learning to interact with APIs and wanted to make the package useful for the R community, I looked into how to do that. Luckily for me, at the time I found the book HTTP testing in R, which was still being written but almost complete, and I started reading it and following its helpful advice. One of the first recommendations was to contact the API providers, so I emailed R Core about my intentions.

They raised some concerns:

  • The impact on the load of the machine.
  • Whether the API is robust enough.
  • Semi-automating report submissions needs more thought.

By then I had realized that the API allows submitting bug reports to the database (comments and attachments too), so I thought it could be an easy way to help people submit more bugs: submitting them from R itself.
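To give an idea of what submitting from R could involve, the sketch below assembles and checks the body of a new-bug request. It assumes Bugzilla 5’s `POST /rest/bug` endpoint, where `product`, `component`, `summary` and `version` are required fields; the helper `new_bug_body()` and the example field values are hypothetical, and R’s tracker may use different product and component names.

```r
# Hypothetical helper: assemble and check the body of a Bugzilla
# "create bug" request (assumed endpoint: POST <base>/rest/bug).
new_bug_body <- function(product, component, version, summary, description,
                         ...) {
  body <- list(product = product, component = component, version = version,
               summary = summary, description = description, ...)
  required <- c("product", "component", "version", "summary")
  # A required field counts as empty if it is NULL or only whitespace
  empty <- vapply(body[required],
                  function(x) is.null(x) || !nzchar(trimws(x)),
                  logical(1))
  if (any(empty)) {
    stop("Missing required field(s): ",
         paste(required[empty], collapse = ", "))
  }
  body
}

# Illustrative values only
body <- new_bug_body(
  product = "R", component = "Language", version = "R 4.1.2",
  summary = "Short, specific title",
  description = "Minimal reproducible example and sessionInfo() output"
)

# Sending it would look roughly like this with httr (network + API key needed):
# httr::POST("https://bugs.r-project.org/rest/bug", body = body,
#            encode = "json",
#            httr::add_headers("X-BUGZILLA-API-KEY" = Sys.getenv("BUGZILLA_API_KEY")))
```

Validating locally before sending is one way to address the concern that semi-automated submissions need more thought: malformed reports never reach the server.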

After some exchanges about why and how I was trying to retrieve data from Bugzilla, I was referred to Simon Urbanek.

By that time I had already posted about this on R-devel and got some interest from the R Contribution Working Group, to whom I presented the idea on March 12 (one month after the first commit of the package).

At that meeting it was suggested that I propose a Google Summer of Code¹ project; the project submission period closed shortly after. Soon two students contacted Heather Turner, who agreed to co-mentor the project, and me to write proposals to work on it.

Given the concerns about privacy and server load (I had also found that the API can return different results for the same query), Simon kindly provided a database dump without the user list, along with the ids of the R Core members.

The analysis

This analysis serves several purposes:

  • To understand what is going on with bug reports.
  • To understand how to make better bug reports, to help bug submissions via bugRzilla.
  • To help the R Contribution Working Group and R Forwards identify contributors.
  • To help the R Core team identify possible areas of improvement.

It is based on the analysis by Piyush Kumar and myself. If you want the code or want to explore the database, follow these links to download them.

A first exploration is to look at the bug ids against their creation time:

The first surprising thing is the three points that lie outside the line formed by the other bugs.

One of these outliers is a test made when bug reporting moved from Jitterbug to Bugzilla, as mentioned on the mailing list.
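One way to spot such outliers programmatically is to compare the order of the ids with the order of their creation dates: if ids are handed out sequentially, both ranks should agree. The sketch below uses a made-up data frame; the column names `bug_id` and `creation_ts` follow Bugzilla’s `bugs` table, which is an assumption about the dump, and the tolerance of one rank is an arbitrary choice.

```r
# Toy data (made up): bug 5 has a creation date far from its neighbours.
bugs <- data.frame(
  bug_id = 1:10,
  creation_ts = as.Date("2010-01-01") + c(0, 1, 2, 3, 400, 5, 6, 7, 8, 9)
)

# Flag ids whose rank differs from the rank of their creation date
# by more than `tolerance` positions.
flag_out_of_order <- function(id, created, tolerance = 1) {
  abs(rank(id) - rank(as.numeric(created))) > tolerance
}

bugs$out_of_order <- flag_out_of_order(bugs$bug_id, bugs$creation_ts)
bugs$bug_id[bugs$out_of_order]
# → 5
```

A single displaced bug shifts each bug it jumps over by exactly one rank, which is why a tolerance of one keeps those neighbours from being flagged.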

I don’t know the reason for the move; probably spam or interface improvements. If we look at the missing ids we can estimate the amount of spam. Note that some might be undisclosed vulnerabilities in R (but I doubt there are this many):

The first thing to notice is the high number of missing ids on Jitterbug. I heard that this was due to abuse of the site, which seems particularly bad around two dates.

Later on, after the move to Bugzilla, there are far fewer missing ids, until one day when around 120 ids are missing; from the next day on, getting an account required sending a message to R Core. Probably spammers abused Bugzilla’s API. Given these past experiences, it is understandable that R Core is concerned about receiving spam.
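Finding the missing ids only needs the range of observed ids. The sketch below (toy data; `missing_ids()` is a hypothetical helper) lists the gaps and dates each one by the last existing bug before it, which is one way to see when gaps cluster, like the spikes described above.

```r
# Ids between the smallest and largest observed id that never appear
missing_ids <- function(ids) {
  setdiff(seq.int(min(ids), max(ids)), ids)
}

# Toy data, made up for illustration
bugs <- data.frame(
  bug_id = c(1, 2, 3, 7, 8, 10),
  creation_ts = as.Date(c("2010-01-01", "2010-01-02", "2010-01-03",
                          "2010-02-01", "2010-02-02", "2010-03-01"))
)

gaps <- missing_ids(bugs$bug_id)
gaps
# → 4 5 6 9

# Approximate each gap's date with the last existing bug before it
# (findInterval() needs bug_id sorted in increasing order).
bugs <- bugs[order(bugs$bug_id), ]
gap_date <- bugs$creation_ts[findInterval(gaps, bugs$bug_id)]
table(format(gap_date, "%Y-%m"))
```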

As the information from Jitterbug is both old and not as rich as that of bugs on Bugzilla, from now on I will limit the analysis to bugs reported on Bugzilla. Besides, the number of bugs reported on Bugzilla is similar to the number reported on Jitterbug:

Reported on | Bug reports
---|---
Jitterbug | 3594
Bugzilla | 3448

Some old bugs opened on Jitterbug were later modified on Bugzilla (428) and are still not closed.

Opening bug reports

If we focus only on bugs reported on Bugzilla, we find the following number of bugs:

Most bugs are closed, followed by unconfirmed ones:

Status | Bug reports
---|---
CLOSED | 2792
UNCONFIRMED | 324
NEW | 125
ASSIGNED | 39
CONFIRMED | 13
REOPENED | 11
RESOLVED | 9
VERIFIED | 3

Do bug reports come with an attached patch?

Attachment on opening | Patch | Bugs | % within group
---|---|---|---
No | No | 380 | 43.18
No | Yes | 500 | 56.82
Yes | No | 438 | 61.60
Yes | Yes | 273 | 38.40

Many bug reports have attachments on opening, mostly containing code to reproduce the problem. If a bug report has no attachment on opening and later receives one, it is usually a patch (but not always).
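The percentages in the attachment table are row proportions: within each “attachment on opening” group the two rows add up to 100% (for example, 380 / (380 + 500) ≈ 43.18%). They can be reproduced from the reported counts with base R:

```r
# Counts from the table: rows = attachment on opening, columns = patch
counts <- matrix(c(380, 500,
                   438, 273),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(attachment_on_opening = c("No", "Yes"),
                                 patch = c("No", "Yes")))

# margin = 1 divides each count by its row total
round(100 * prop.table(counts, margin = 1), 2)
```

Using `margin = 2` instead would give the share of each attachment group among bugs with or without a patch, which answers a different question.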

What happens after submitting a bug report?

One of the most common things to happen is that someone comments on the bug, either to ask for clarification or to discuss the bug report and possible solutions:

The most common action is receiving a comment, whose author is added to the CC field.

R Core is very active in answering bug reports; only those marked as trivial seem to receive a reply less often than the other categories.
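Whether a report got a reply from R Core can be derived by matching comment authors against the R Core ids Simon provided. The sketch below uses made-up data; the column names (`bug_id`, `who`, `bug_severity`) mirror Bugzilla’s `longdescs` and `bugs` tables, which is an assumption about the dump, and `core_ids` are invented.

```r
# Made-up data for illustration
bugs <- data.frame(
  bug_id = 1:6,
  bug_severity = c("normal", "normal", "trivial", "trivial", "major", "major")
)
comments <- data.frame(
  bug_id = c(1, 1, 2, 3, 5, 6),
  who    = c(10, 2, 3, 10, 2, 4)
)
core_ids <- c(2, 3)  # hypothetical R Core user ids

# Bugs with at least one comment written by an R Core member
core_bugs <- unique(comments$bug_id[comments$who %in% core_ids])
bugs$core_reply <- bugs$bug_id %in% core_bugs

# Share of bugs with an R Core reply, per severity
tapply(bugs$core_reply, bugs$bug_severity, mean)
```

Bugs without any comment (bug 4 here) count as not replied, so they lower the share for their category rather than disappearing from it.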

Looking by component and OS, some combinations received few comments, mostly those that are wishes for R.

If we split them into requests to improve R and actual bug reports, we see a different pattern:

Enhancements usually receive fewer comments from R Core. Among enhancements, wishlist items receive more comments from R Core.

What about the comments made by the original poster? Do they comment when they receive some feedback from other users?

Original poster responded | Bug reports
---|---
no | 409
yes | 122

Most posters do not reply when they receive a comment. There might be several causes; one is that their bug report was closed or assigned to an R Core member. Among the 409 reports whose poster did not reply:

Handled (closed or assigned) | Bug reports
---|---
no | 177
yes | 232

This shows that for most of those who do not respond (232 of 409), the bug report was either closed (fixed or not) or assigned to an R Core member (usually by that same member).
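Deciding whether the original poster was “responsive” needs a rule. One possible rule, sketched below with invented data, is: a report counts as responsive if its reporter commented after the first comment by someone else. This rule is an assumption for illustration, not necessarily the definition used in the analysis; the columns `reporter`, `who` and `bug_when` are named after Bugzilla’s tables, and `op_responded()` is a hypothetical helper.

```r
# Invented data: who reported each bug, and every comment with its time
bugs <- data.frame(bug_id = 1:3, reporter = c(100, 200, 300))
comments <- data.frame(
  bug_id   = c(1, 1, 1, 2, 2, 3),
  who      = c(100, 5, 100, 200, 6, 300),
  bug_when = as.POSIXct(c("2020-01-01 10:00", "2020-01-02 10:00",
                          "2020-01-03 10:00", "2020-02-01 10:00",
                          "2020-02-02 10:00", "2020-03-01 10:00"), tz = "UTC")
)

# TRUE if the reporter commented after the first comment by someone else;
# NA if nobody else ever commented (the question does not apply).
op_responded <- function(bug_id, reporter, comments) {
  cm <- comments[comments$bug_id == bug_id, ]
  others <- cm$bug_when[cm$who != reporter]
  if (length(others) == 0) return(NA)
  any(cm$who == reporter & cm$bug_when > min(others))
}

bugs$responsive <- mapply(op_responded, bugs$bug_id, bugs$reporter,
                          MoreArgs = list(comments = comments))
bugs
```

Returning `NA` for bugs nobody else commented on keeps them out of the responsive/non-responsive counts instead of silently counting them as unresponsive.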

Who is active?

So far we have explored the activity of users who report a bug and of the R Core members who receive it. But some users go beyond this and also participate and collaborate with R Core. To focus on recent activity, we will look at users active on bugs opened in the last three years.

ID | Name | All comments | All attachments | Comments | Attachments | Bugs opened | Bugs interacted | Actions
---|---|---|---|---|---|---|---|---
3299 | Elin Waring | 100 | 2 | 99 | 1 | 1 | 66 | 101
963 | Suharto Anggono | 115 | 39 | 55 | 7 | 21 | 42 | 83
274 | Sebastian Meyer | 80 | 24 | 49 | 11 | 18 | 58 | 78
430 | Benjamin Tyner | 59 | 4 | 52 | 4 | 5 | 32 | 61
3256 | Michael Chirico | 110 | 51 | 2 | 1 | 52 | 54 | 55
1044 | Kevin Ushey | 65 | 10 | 4 | 0 | 35 | 39 | 39
1036 | Henrik Bengtsson | 56 | 14 | 9 | 1 | 26 | 31 | 36
2307 | Lionel Henry | 41 | 25 | 6 | 2 | 20 | 22 | 28
1056 | Bill Dunlap | 28 | 0 | 1 | 0 | 21 | 22 | 22
2801 | Bob Rudis | 19 | 4 | 16 | 2 | 2 | 12 | 20
1299 | Gabriel Becker | 52 | 28 | 0 | 0 | 18 | 18 | 18
11 | Ben Bolker | 14 | 0 | 11 | 0 | 3 | 9 | 14
114 | Gabor Csardi | 26 | 8 | 4 | 0 | 10 | 11 | 14
610 | Jeroen Ooms | 15 | 4 | 6 | 1 | 7 | 11 | 14
1602 | brodie.gaslam@ | 53 | 16 | 0 | 0 | 13 | 13 | 13
6 | Duncan Murdoch | 22 | 7 | 0 | 0 | 12 | 12 | 12
921 | Dirk Eddelbuettel | 17 | 4 | 4 | 0 | 7 | 11 | 11
1715 | Hervé Pagès | 24 | 1 | 5 | 1 | 5 | 7 | 11
2885 | Jan Gorecki | 14 | 1 | 1 | 0 | 9 | 10 | 10
3051 | Xianying Tan | 20 | 0 | 4 | 0 | 6 | 7 | 10
3228 | Emil Bode | 16 | 4 | 0 | 0 | 10 | 10 | 10
3344 | Joe Cheng | 7 | 3 | 7 | 3 | 0 | 1 | 10
2264 | Neal Fultz | 20 | 5 | 0 | 0 | 9 | 9 | 9
317 | Mikko Korpela | 9 | 0 | 5 | 0 | 3 | 6 | 8
847 | Pavel N. Krivitsky | 15 | 2 | 0 | 0 | 8 | 8 | 8
1251 | Arni Magnusson | 13 | 8 | 0 | 0 | 8 | 8 | 8
1849 | Andre Mikulec | 21 | 4 | 1 | 0 | 7 | 8 | 8
3330 | André Gillibert | 20 | 10 | 0 | 0 | 8 | 8 | 8
3376 | Hangfan Zhang | 14 | 3 | 0 | 0 | 8 | 8 | 8
2040 | Bill Denney | 16 | 1 | 0 | 0 | 7 | 7 | 7

(You can find a sortable and filterable version of this table here.) One of the top contributors was recently added as an R Core member. Other

Some contributors focus on providing patches, others open many bugs, and others comment on bugs to confirm them or provide context.
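In the contributors table, the Actions column equals Comments + Attachments + Bugs opened on every row (for example, 99 + 1 + 1 = 101 for Elin Waring). A sketch of that ranking step, using three rows copied from the table:

```r
# A few rows from the contributors table
contributors <- data.frame(
  name        = c("Elin Waring", "Suharto Anggono", "Michael Chirico"),
  comments    = c(99, 55, 2),
  attachments = c(1, 7, 1),
  bugs_opened = c(1, 21, 52)
)

# Actions = comments + attachments + bugs opened, then rank by it
contributors$actions <- with(contributors, comments + attachments + bugs_opened)
contributors[order(contributors$actions, decreasing = TRUE), ]
```

Summing the three counts weighs opening a bug the same as a single comment, which is why someone who mostly opens bugs and someone who mostly comments can end up close in the ranking.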

Future of the report system?

Lastly, how have bug reports progressed over time?

Looking at bug reports alone (excluding enhancements), there is one new bug report every 1.52 days, and one enhancement request every 4.44 days.

On bug reports (excluding enhancements), R users post 1.48 comments per day; on enhancements, one comment every 1.9 days.

Both rates have remained fairly constant over the years, at times a bit faster and at times slower.
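A rate such as “one bug report every 1.52 days” is the inverse of reports per day (1 / 1.52 ≈ 0.66 reports per day; likewise 1 / 4.44 ≈ 0.23 enhancement requests per day). One way to compute it is the time span covered divided by the number of intervals between reports, and counting per year is a quick check that the pace stays constant. A sketch with invented dates, using a hypothetical helper `days_per_event()`:

```r
# Average number of days between consecutive events
days_per_event <- function(dates) {
  dates <- sort(as.Date(dates))
  span <- as.numeric(difftime(max(dates), min(dates), units = "days"))
  span / (length(dates) - 1)
}

# Invented report dates for illustration
reported <- as.Date(c("2020-01-01", "2020-01-03", "2020-01-04", "2020-01-07",
                      "2021-03-01", "2021-03-02"))

days_per_event(reported[1:4])
# → 2
table(format(reported, "%Y"))  # reports per year
```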

Final comments

This is probably the post that has taken me the longest so far. I started working on it in February, but until now I hadn’t written the post I wanted.

Many thanks to Simon Urbanek for providing the database dump, without which this analysis would have been slower and harder, if not impossible. Thanks to Heather Turner for encouraging me to do more on this project, for valuable feedback about what kind of analysis could be useful, and for co-mentoring Piyush Kumar, whom I also want to thank for the first analysis of the data and his contributions during GSoC. Many thanks to Gabe Becker and Michael Chirico for their feedback in the R Contribution Working Group.

Now that the analysis is done, I want to finish the bugRzilla package (which I already used for small tasks in this analysis). I’m still working on it, testing the best way to submit properly formatted bug reports against a development instance set up by Simon. Then I’ll ask R Core whether the way it submits bug reports works well for them.

TL;DR: Many bugs are reported and handled by R Core, and many users contribute to solving them. The pace of new bug reports and comments is constant, as is the pace of enhancement requests for the language itself.

Reproducibility

Session Info

## ─ Session info  ──────────────────────────────────────────────────────────────────────────────────────────────────────
##  hash: nail polish: medium skin tone, woman artist: dark skin tone, pager
## 
##  setting  value
##  version  R version 4.1.2 (2021-11-01)
##  os       Ubuntu 20.04.3 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Europe/Madrid
##  date     2021-11-16
##  pandoc   2.14.0.3 @ /usr/lib/rstudio/bin/pandoc/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package     * version   date (UTC) lib source
##  assertthat    0.2.1     2019-03-21 [1] CRAN (R 4.1.2)
##  backports     1.3.0     2021-10-27 [1] CRAN (R 4.1.2)
##  bit           4.0.4     2020-08-04 [1] CRAN (R 4.1.2)
##  bit64         4.0.5     2020-08-30 [1] CRAN (R 4.1.2)
##  blob          1.2.2     2021-07-23 [1] CRAN (R 4.1.2)
##  blogdown      1.6       2021-11-09 [1] CRAN (R 4.1.2)
##  bookdown      0.24      2021-09-02 [1] CRAN (R 4.1.2)
##  broom         0.7.10    2021-10-31 [1] CRAN (R 4.1.2)
##  bslib         0.3.1     2021-10-06 [1] CRAN (R 4.1.2)
##  bugRzilla   * 0.0.90001 2021-11-13 [1] Github (llrs/bugRzilla@24bc5de)
##  cachem        1.0.6     2021-08-19 [1] CRAN (R 4.1.2)
##  cli           3.1.0     2021-10-27 [1] CRAN (R 4.1.2)
##  colorspace    2.0-2     2021-06-24 [1] CRAN (R 4.1.2)
##  crayon        1.4.2     2021-10-29 [1] CRAN (R 4.1.2)
##  curl          4.3.2     2021-06-23 [1] CRAN (R 4.1.2)
##  DBI         * 1.1.1     2021-01-15 [1] CRAN (R 4.1.2)
##  dbplyr      * 2.1.1     2021-04-06 [1] CRAN (R 4.1.2)
##  digest        0.6.28    2021-09-23 [1] CRAN (R 4.1.2)
##  dplyr       * 1.0.7     2021-06-18 [1] CRAN (R 4.1.2)
##  ellipsis      0.3.2     2021-04-29 [1] CRAN (R 4.1.2)
##  evaluate      0.14      2019-05-28 [1] CRAN (R 4.1.2)
##  fansi         0.5.0     2021-05-25 [1] CRAN (R 4.1.2)
##  farver        2.1.0     2021-02-28 [1] CRAN (R 4.1.2)
##  fastmap       1.1.0     2021-01-25 [1] CRAN (R 4.1.2)
##  forcats     * 0.5.1     2021-01-27 [1] CRAN (R 4.1.2)
##  generics      0.1.1     2021-10-25 [1] CRAN (R 4.1.2)
##  ggpattern   * 0.2.2     2021-11-11 [1] Github (coolbutuseless/ggpattern@7214181)
##  ggplot2     * 3.3.5     2021-06-25 [1] CRAN (R 4.1.2)
##  ggrepel     * 0.9.1     2021-01-15 [1] CRAN (R 4.1.2)
##  glue          1.5.0     2021-11-07 [1] CRAN (R 4.1.2)
##  gtable        0.3.0     2019-03-25 [1] CRAN (R 4.1.2)
##  highr         0.9       2021-04-16 [1] CRAN (R 4.1.2)
##  htmltools     0.5.2     2021-08-25 [1] CRAN (R 4.1.2)
##  httr          1.4.2     2020-07-20 [1] CRAN (R 4.1.2)
##  jquerylib     0.1.4     2021-04-26 [1] CRAN (R 4.1.2)
##  jsonlite      1.7.2     2020-12-09 [1] CRAN (R 4.1.2)
##  kableExtra    1.3.4     2021-02-20 [1] CRAN (R 4.1.2)
##  knitr         1.36      2021-09-29 [1] CRAN (R 4.1.2)
##  labeling      0.4.2     2020-10-20 [1] CRAN (R 4.1.2)
##  lattice       0.20-45   2021-09-22 [1] CRAN (R 4.1.2)
##  lifecycle     1.0.1     2021-09-24 [1] CRAN (R 4.1.2)
##  lubridate   * 1.8.0     2021-10-07 [1] CRAN (R 4.1.2)
##  magrittr      2.0.1     2020-11-17 [1] CRAN (R 4.1.2)
##  Matrix        1.3-4     2021-06-01 [1] CRAN (R 4.1.2)
##  memoise       2.0.0     2021-01-26 [1] CRAN (R 4.1.2)
##  mgcv          1.8-38    2021-10-06 [1] CRAN (R 4.1.2)
##  munsell       0.5.0     2018-06-12 [1] CRAN (R 4.1.2)
##  nlme          3.1-153   2021-09-07 [1] CRAN (R 4.1.2)
##  patchwork   * 1.1.1     2020-12-17 [1] CRAN (R 4.1.2)
##  pillar        1.6.4     2021-10-18 [1] CRAN (R 4.1.2)
##  pkgconfig     2.0.3     2019-09-22 [1] CRAN (R 4.1.2)
##  purrr         0.3.4     2020-04-17 [1] CRAN (R 4.1.2)
##  R6            2.5.1     2021-08-19 [1] CRAN (R 4.1.2)
##  Rcpp          1.0.7     2021-07-07 [1] CRAN (R 4.1.2)
##  rlang         0.4.12    2021-10-18 [1] CRAN (R 4.1.2)
##  rmarkdown     2.11      2021-09-14 [1] CRAN (R 4.1.2)
##  RMySQL      * 0.10.22   2021-06-22 [1] CRAN (R 4.1.2)
##  RSQLite     * 2.2.8     2021-08-21 [1] CRAN (R 4.1.2)
##  rstudioapi    0.13      2020-11-12 [1] CRAN (R 4.1.2)
##  rvest         1.0.2     2021-10-16 [1] CRAN (R 4.1.2)
##  sass          0.4.0     2021-05-12 [1] CRAN (R 4.1.2)
##  scales        1.1.1     2020-05-11 [1] CRAN (R 4.1.2)
##  sessioninfo   1.2.1     2021-11-02 [1] CRAN (R 4.1.2)
##  stringi       1.7.5     2021-10-04 [1] CRAN (R 4.1.2)
##  stringr       1.4.0     2019-02-10 [1] CRAN (R 4.1.2)
##  svglite       2.0.0     2021-02-20 [1] CRAN (R 4.1.2)
##  systemfonts   1.0.3     2021-10-13 [1] CRAN (R 4.1.2)
##  tibble        3.1.6     2021-11-07 [1] CRAN (R 4.1.2)
##  tidyr         1.1.4     2021-09-27 [1] CRAN (R 4.1.2)
##  tidyselect    1.1.1     2021-04-30 [1] CRAN (R 4.1.2)
##  utf8          1.2.2     2021-07-24 [1] CRAN (R 4.1.2)
##  vctrs         0.3.8     2021-04-29 [1] CRAN (R 4.1.2)
##  viridisLite   0.4.0     2021-04-13 [1] CRAN (R 4.1.2)
##  webshot       0.5.2     2019-11-22 [1] CRAN (R 4.1.2)
##  withr         2.4.2     2021-04-18 [1] CRAN (R 4.1.2)
##  xfun          0.28      2021-11-04 [1] CRAN (R 4.1.2)
##  xml2          1.3.2     2020-04-23 [1] CRAN (R 4.1.2)
##  yaml          2.2.1     2020-02-01 [1] CRAN (R 4.1.2)
## 
##  [1] /home/lluis/bin/R/4.1.2/lib/R/library
## 
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

  1. A program from Google that sponsors students and organizations to work on open source projects. I didn’t know R participated, but here is the organization on GitHub.

Lluís Revilla Sancho
Bioinformatician

Bioinformatician with interests in functional enrichment, data integration and transcriptomics.
