Bugs in R
Exploring the history of bugs in R
This post has a relatively long introduction; feel free to skip ahead to the analysis.
I knew about the r-devel (abbreviated Rd) mailing list, where some discussions about the language happen. I had also read the post by the R core requesting help reviewing bug reports, and the same day it came out I had requested an account to be able to post on Bugzilla: https://bugs.r-project.org/. But I hadn't reported any bug or anything else; what could I bring?
After RStudio 2021 I saw the announcement of https://r-devel.slack.com (you can join via this website). When I joined, I checked some bugs and found something odd. This led to my first R bug report: 18055.
Then, while thinking about my analysis of package reviews, I realized it was analyzing issues. It occurred to me that I could also analyse the issues of R, aka bug reports.
Collecting the data
The first step was collecting the data needed. As with the rOpenSci and Bioconductor analyses, I knew I might need to create a package or a script just to retrieve the data.
I somehow found that Bugzilla reports some data as XML, and I thought I could use that. But exploring the documentation I found it had an API that could be used to retrieve data, and interacting with the API required authentication. Instead of putting me off, this made it a reasonable challenge and a natural progression: previously I had used the gh package to authenticate and retrieve the raw data, whereas this time I had to learn how to authenticate to an API myself. I had already developed a package that uses a poorly explained API to retrieve documents, so adding a step to authenticate requests was a small enough step.
I also found a package, bugtractr, that already did this, but I had some problems using it and it didn't use authenticated requests. This meant it couldn't retrieve all the data I wanted, so I went on to develop my own package to interact with Bugzilla's API.
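For readers curious about what an authenticated request involves, here is a minimal sketch; it is not bugRzilla's actual code. It assumes that bugs.r-project.org exposes Bugzilla's standard REST API under `/rest/` (Bugzilla 5.0 and later accept an API key in the `X-BUGZILLA-API-KEY` header) and that the key is stored in a hypothetical `BUGZILLA_API_KEY` environment variable. The HTTP call uses httr.

```r
# Minimal sketch (not bugRzilla's implementation).
# Assumptions: the standard Bugzilla REST API lives under /rest/ on this
# instance, and the API key is in the hypothetical BUGZILLA_API_KEY variable.

# Build the REST URL for a single bug, e.g. .../rest/bug/18055
bug_url <- function(id, base = "https://bugs.r-project.org") {
  paste0(base, "/rest/bug/", id)
}

# Retrieve one bug as a parsed list; fails early if no key is available.
get_bug <- function(id, key = Sys.getenv("BUGZILLA_API_KEY"),
                    base = "https://bugs.r-project.org") {
  if (!nzchar(key)) {
    stop("No API key found: set BUGZILLA_API_KEY to authenticate.")
  }
  # Sending the key in a header keeps it out of the URL.
  resp <- httr::GET(bug_url(id, base),
                    httr::add_headers(`X-BUGZILLA-API-KEY` = key))
  httr::stop_for_status(resp)
  httr::content(resp, as = "parsed", type = "application/json")
}
```

With a key set, `get_bug(18055)$bugs[[1]]$summary` should return the title of that bug, assuming the usual Bugzilla response shape `{"bugs": [...]}`.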
As I was learning to interact with the API and wanted to make the package useful for the R community, I looked into how to do that. Luckily, at the time I found the book HTTP testing in R, which was still being written but almost complete, and I started reading it and following its helpful advice. One of the first recommendations was to contact the API providers, so I emailed the R core about my intentions.
They raised some concerns:
- The impact on the load of the machine.
- Whether the API is robust enough.
- That semi-automating report submissions needs more thought.
Yes: by then I had realized that the API allows submitting bug reports to the database (comments and attachments too), so I thought it could be an easy way to help people submit more bugs: submitting them from R itself.
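Since the API also accepts new bugs, here is a hedged sketch of what submitting one from R could look like. It is an illustration, not how bugRzilla does it: the field names follow Bugzilla's REST call for creating a bug (`product`, `component`, `version`, `summary`, plus a `description`), while the product name `"R"` and any component or version values are assumptions about R's instance, and the helper names are hypothetical. Validating fields locally before anything is sent is one cheap way to address the concern about semi-automated submissions.

```r
# Hypothetical helpers; field names follow Bugzilla's REST "create bug" call,
# but the product name "R" and valid component/version values are assumptions.
new_bug_payload <- function(summary, description, component, version,
                            product = "R") {
  fields <- list(product = product, component = component,
                 version = version, summary = summary,
                 description = description)
  # Refuse to build a report with empty fields instead of sending it.
  empty <- !nzchar(vapply(fields, as.character, character(1)))
  if (any(empty)) {
    stop("Missing fields: ", paste(names(fields)[empty], collapse = ", "))
  }
  fields
}

# POST the payload as JSON; returns the new bug id reported by the server.
submit_bug <- function(payload, key = Sys.getenv("BUGZILLA_API_KEY"),
                       base = "https://bugs.r-project.org") {
  resp <- httr::POST(paste0(base, "/rest/bug"), body = payload,
                     encode = "json",
                     httr::add_headers(`X-BUGZILLA-API-KEY` = key))
  httr::stop_for_status(resp)
  httr::content(resp, as = "parsed", type = "application/json")$id
}
```

Keeping payload construction separate from the HTTP call means the payload can be checked, or shown to the user for confirmation, without touching the server.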
After some exchanges about why and how I was trying to retrieve data from Bugzilla, I was referred to Simon Urbanek.
By that time I had already posted about this on R-devel and got some interest from the R Contribution Working Group, to which I presented the idea on March 12 (one month after the first commit on the package).
At that meeting it was suggested that I propose a Google Summer of Code¹ project, whose project submission period closed shortly after. Soon two students contacted me and Heather Turner, who agreed to co-mentor the project, to write a proposal to work on it.
By that time Simon had kindly provided a database dump (without the user list), given the concerns about privacy and the load on the server (which, I found, can return different results for the same query), as well as the ids of the R core members.
This analysis serves several purposes:
- To understand what is going on with bug reports.
- To understand how to make better bug reports, to help bug submissions via bugRzilla.
- To help the R Contribution Working Group and R Forwards identify contributors.
- To help the R core team identify possible areas of improvement.
A first exploration is to look at the bug ids and their creation times:
The first surprising thing is the three points that fall outside the line formed by the other bugs.
I don't know the reason for the move; it was probably due to SPAM or interface improvements. If we look at the missing ids we can estimate the amount of SPAM; note that some might be vulnerabilities reported on R (but I doubt there are that many):
The first thing to notice is the high number of ids missing on Jitterbug. I heard that this was due to abuse of the site, which seems particularly bad around two dates.
Later on, when the system moved to Bugzilla, there are far fewer missing ids, until one day around 120 ids are missing; from the next day on, getting an account required sending a message to the R core. Probably spammers abused Bugzilla's API. Given these past experiences, it is understandable that the R core team is concerned about receiving spam.
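Counting missing ids can be sketched in a few lines of base R. The sketch assumes ids are assigned sequentially, so any gap between the smallest and largest visible id is an id that was deleted or hidden, and it attributes each missing id to the creation date of the last visible bug before it, which is only an approximation. The data frame is toy data, not the dump.

```r
# Toy data standing in for the dump: visible bug ids and creation dates.
bugs <- data.frame(
  id = c(1, 2, 5, 6, 10),
  created = as.Date(c("2010-01-01", "2010-01-02", "2010-01-02",
                      "2010-01-03", "2010-01-05"))
)

# Ids assumed sequential: anything absent from the full range is missing.
missing_ids <- function(ids) setdiff(seq(min(ids), max(ids)), ids)

# Approximate the date of each missing id by the last visible bug before it
# and count missing ids per date.
missing_by_date <- function(bugs) {
  bugs <- bugs[order(bugs$id), ]
  miss <- missing_ids(bugs$id)
  previous <- findInterval(miss, bugs$id)
  table(format(bugs$created[previous]))
}

missing_ids(bugs$id)   # 3 4 7 8 9
missing_by_date(bugs)  # 2 on 2010-01-02, 3 on 2010-01-03
```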
As the information from Jitterbug is both old and not as rich as that of bugs on Bugzilla, from now on I will limit the analysis to bugs reported on Bugzilla. In addition, the number of bugs reported on Bugzilla is similar to the number reported on Jitterbug:
Some old bugs opened on Jitterbug were modified on Bugzilla (428) and are still not closed.
Opening bug reports
If we focus only on bugs reported on Bugzilla we find the following number of bugs:
Most bugs are closed, followed by unconfirmed ones:
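Counting bugs by status is essentially a call to `table()`; sorting it shows which status dominates. The statuses below are toy values, so any counts are illustrative only.

```r
# Toy statuses; in the analysis they would come from the dump's bugs table.
status <- c("CLOSED", "CLOSED", "UNCONFIRMED", "CLOSED",
            "NEW", "UNCONFIRMED", "ASSIGNED")

# Count bugs per status, most frequent first, with percentages.
status_summary <- function(status) {
  counts <- sort(table(status), decreasing = TRUE)
  data.frame(status = names(counts),
             bugs = as.vector(counts),
             percent = round(100 * as.vector(counts) / length(status), 1))
}

status_summary(status)
```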
Do bugs reported have an attachment with a patch?
| Attachment on opening | Patch | Bugs | % |
|---|---|---|---|
Many bug reports have attachments on opening, mostly containing code to reproduce the problem. If a bug has no attachment on opening and later receives one, it will usually be a patch (but not always).
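A table with the columns above can be produced by cross-tabulating two logical flags per bug. This sketch assumes those flags were already derived from the dump (whether the bug had an attachment when it was opened, and whether any of its attachments was flagged as a patch, which Bugzilla records per attachment); the data frame is toy data.

```r
# Toy flags per bug: attachment present at opening, and patch attached.
flags <- data.frame(
  opening = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE),
  patch   = c(FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)
)

# Cross-tabulate the two flags and add the share of all bugs per row.
attachment_table <- function(flags) {
  out <- as.data.frame(table(opening = flags$opening, patch = flags$patch),
                       responseName = "bugs")
  out$percent <- round(100 * out$bugs / sum(out$bugs), 1)
  out
}

attachment_table(flags)
```

Percentages here are over all bugs; dividing within each `opening` group instead would show directly how often a bug without an initial attachment later gets a patch.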
What happens after submitting a bug report?
One of the most common things to happen is that someone comments on the bug, either to ask for clarification or to discuss the bug report and possible solutions:
The most common action is receiving a comment, whose author is added to the CC field.
The R core is very active in answering bug reports; only the trivial ones seem not to receive a reply as often as the other categories.
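One way to measure how often the R core answers is to mark a bug as replied if any of its comments comes from an R core id (the ids Simon provided) and then average that flag per category. A minimal sketch on toy data, assuming the comments exclude the opening description: Bugzilla stores the description as the bug's first comment, so in real data it would have to be dropped first.

```r
# Toy data: comments (without the opening description) and bug severities.
comments <- data.frame(
  bug_id    = c(1, 1, 2, 3, 3, 3),
  author_id = c(10, 99, 11, 12, 99, 98)
)
bugs <- data.frame(bug_id = 1:4,
                   severity = c("normal", "trivial", "normal", "trivial"))
core_ids <- c(98, 99)  # hypothetical R core user ids

# Share of bugs per severity with at least one comment from the R core.
core_reply_rate <- function(bugs, comments, core_ids) {
  replied <- unique(comments$bug_id[comments$author_id %in% core_ids])
  tapply(bugs$bug_id %in% replied, bugs$severity, mean)
}

core_reply_rate(bugs, comments, core_ids)
```

Swapping `bugs$severity` for a component or OS column gives the same breakdown by those fields.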
If we look by component and OS, some combinations received few comments, mostly those that are wishes for R.
If we split them between petitions to improve R and actual bug reports, we see a different pattern:
Enhancements usually receive fewer comments from the R core. Among the enhancements, wishlists receive more comments from the R core.
What about the comments made by the original poster? Do they comment when they receive some feedback from other users?
Most of them do not reply when they receive a comment. There might be several causes; one of them is that their bug report was closed or assigned to an R core member.
This shows that most of those who do not respond are in that situation: either the bug report is closed (fixed or not) or an R core member is assigned to it (usually self-assigned).
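The reasoning above can be sketched in two steps: first decide, per bug, whether the original poster commented again after the first comment by someone else, then classify the silent ones by status and assignee. The data and column names are toy assumptions; `NA` marks bugs where nobody else commented, so there was nothing to reply to.

```r
# Toy data: bugs with reporter, status and assignee; comments with a time.
bugs <- data.frame(
  bug_id      = 1:4,
  reporter    = c(1, 2, 3, 4),
  status      = c("CLOSED", "NEW", "NEW", "NEW"),
  assigned_to = c(0, 0, 99, 0)
)
comments <- data.frame(
  bug_id    = c(1, 1, 2, 2, 2, 3, 3, 4),
  author_id = c(1, 99, 2, 50, 2, 3, 99, 4),
  when      = c(1, 2, 1, 2, 3, 1, 2, 1)
)
core_ids <- 99  # hypothetical R core id

# TRUE if the reporter commented after the first comment by someone else,
# FALSE if not, NA if nobody else ever commented.
op_replied <- function(bugs, comments) {
  vapply(seq_len(nrow(bugs)), function(i) {
    cm <- comments[comments$bug_id == bugs$bug_id[i], ]
    cm <- cm[order(cm$when), ]
    others <- which(cm$author_id != bugs$reporter[i])
    if (length(others) == 0) return(NA)
    # Look only at comments after the first one by another user.
    any(cm$author_id[-seq_len(others[1])] == bugs$reporter[i])
  }, logical(1))
}

# Reason why a reporter stayed silent: closed, assigned to R core, or other.
silence_reasons <- function(bugs, replied, core_ids) {
  silent <- bugs[!is.na(replied) & !replied, ]
  table(ifelse(silent$status == "CLOSED", "closed",
        ifelse(silent$assigned_to %in% core_ids, "assigned to R core", "other")))
}

replied <- op_replied(bugs, comments)
silence_reasons(bugs, replied, core_ids)
```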
Who is active?
So far we have explored the activity of users who report a bug and of the R core members who receive it. But some users go beyond this and also participate and collaborate with the R core. To focus on recent activity, we will look at the users active on bugs opened in the last three years.
| ID | Name | All comments | All attachments | Comments | Attachments | Bugs opened | Bugs interacted | Actions |
|---|---|---|---|---|---|---|---|---|
| 847 | Pavel N. Krivitsky | 15 | 2 | 0 | 0 | 8 | 8 | 8 |
(You can find a sortable and filterable version of this table here.) One of the top contributors was recently added as an R core member.
Some contributors focus on providing patches, others open many bugs, and others comment on bugs to confirm them or provide context.
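A per-user summary like the table above can be sketched by keeping only bugs opened after a cutoff and counting, per user, comments, bugs opened and distinct bugs touched. The column names and cutoff date are assumptions for illustration, the data is toy data, and attachments and other actions are left out for brevity.

```r
# Toy data: bugs (id, reporter, creation date) and comments (bug, author).
bugs <- data.frame(
  bug_id   = 1:3,
  reporter = c(10, 20, 10),
  created  = as.Date(c("2017-01-01", "2019-05-01", "2021-02-01"))
)
comments <- data.frame(
  bug_id    = c(1, 2, 2, 3, 3),
  author_id = c(20, 20, 30, 10, 30)
)

# Per-user activity on bugs opened on or after `since`.
contributor_summary <- function(bugs, comments, since) {
  recent <- bugs[bugs$created >= since, ]
  cm <- comments[comments$bug_id %in% recent$bug_id, ]
  users <- sort(unique(c(recent$reporter, cm$author_id)))
  data.frame(
    id = users,
    comments = vapply(users, function(u) sum(cm$author_id == u), integer(1)),
    bugs_opened = vapply(users, function(u) sum(recent$reporter == u),
                         integer(1)),
    bugs_interacted = vapply(users, function(u) length(unique(c(
      cm$bug_id[cm$author_id == u], recent$bug_id[recent$reporter == u]))),
      integer(1))
  )
}

# Hypothetical cutoff roughly three years back.
contributor_summary(bugs, comments, since = as.Date("2018-11-16"))
```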
Future of the report system?
Lastly, what is the progression of bugs?
Looking at bug reports and enhancements separately, there is one bug report every 1.52 days and one enhancement petition every 4.44 days.
Likewise, on bug reports there are 1.48 comments per day from R users, and one comment on enhancements every 1.9 days.
Both rates have remained fairly constant over the years, sometimes a bit faster and sometimes a bit slower.
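Rates such as one bug report every 1.52 days can be computed as the time span between the first and last report divided by the number of intervals between them, which equals the mean gap between consecutive reports; the inverse gives events per day. The exact computation behind the numbers above isn't shown, so treat this as one reasonable sketch; the dates are toy data.

```r
# Toy creation dates of bug reports (not real data).
created <- as.Date(c("2021-01-01", "2021-01-03", "2021-01-04", "2021-01-07"))

# Average number of days between consecutive events:
# (last - first) / (n - 1), which equals mean(diff(sort(times))).
days_per_event <- function(times) {
  times <- sort(times)
  span <- as.numeric(difftime(max(times), min(times), units = "days"))
  span / (length(times) - 1)
}

# The inverse: events per day.
events_per_day <- function(times) 1 / days_per_event(times)

days_per_event(created)  # 2
events_per_day(created)  # 0.5
```

Applying it per category, for example `sapply(split(created, type), days_per_event)` with a hypothetical `type` vector marking bugs and enhancements, gives one rate for each.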
This is probably the post that has taken me the longest so far. I started working on it in February, but until now I hadn't actually written the blog post I wanted.
Many thanks to Simon Urbanek for providing the database dump, without which this analysis would have been slower and harder, if not impossible. Thanks to Heather Turner for encouraging me to do more on this project, for providing valuable feedback about what kind of analysis could be useful, and for co-mentoring Pyush Kumar, whom I would also like to thank for the first analysis of the data and his contributions during GSoC. Many thanks to Gabe Becker and Michael Quirico for their feedback at the R Contribution Working Group.
Now that the analysis is done, I want to finish the bugRzilla package (which I have already used for small tasks in this analysis). I'm still working on it, testing the best way to submit properly formatted bug reports to a development instance set up by Simon. Then I'll ask the R core whether the way it submits bug reports works well for them.
TL;DR: Many bugs are reported and handled by the R core, and many users contribute to solving the bug reports. The pace of new bug reports and comments is constant, as is that of enhancement requests for the language itself.
## ─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────
## hash: nail polish: medium skin tone, woman artist: dark skin tone, pager
##
## setting value
## version R version 4.1.2 (2021-11-01)
## os Ubuntu 20.04.3 LTS
## system x86_64, linux-gnu
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz Europe/Madrid
## date 2021-11-16
## pandoc 188.8.131.52 @ /usr/lib/rstudio/bin/pandoc/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## assertthat 0.2.1 2019-03-21  CRAN (R 4.1.2)
## backports 1.3.0 2021-10-27  CRAN (R 4.1.2)
## bit 4.0.4 2020-08-04  CRAN (R 4.1.2)
## bit64 4.0.5 2020-08-30  CRAN (R 4.1.2)
## blob 1.2.2 2021-07-23  CRAN (R 4.1.2)
## blogdown 1.6 2021-11-09  CRAN (R 4.1.2)
## bookdown 0.24 2021-09-02  CRAN (R 4.1.2)
## broom 0.7.10 2021-10-31  CRAN (R 4.1.2)
## bslib 0.3.1 2021-10-06  CRAN (R 4.1.2)
## bugRzilla * 0.0.90001 2021-11-13  Github (llrs/bugRzilla@24bc5de)
## cachem 1.0.6 2021-08-19  CRAN (R 4.1.2)
## cli 3.1.0 2021-10-27  CRAN (R 4.1.2)
## colorspace 2.0-2 2021-06-24  CRAN (R 4.1.2)
## crayon 1.4.2 2021-10-29  CRAN (R 4.1.2)
## curl 4.3.2 2021-06-23  CRAN (R 4.1.2)
## DBI * 1.1.1 2021-01-15  CRAN (R 4.1.2)
## dbplyr * 2.1.1 2021-04-06  CRAN (R 4.1.2)
## digest 0.6.28 2021-09-23  CRAN (R 4.1.2)
## dplyr * 1.0.7 2021-06-18  CRAN (R 4.1.2)
## ellipsis 0.3.2 2021-04-29  CRAN (R 4.1.2)
## evaluate 0.14 2019-05-28  CRAN (R 4.1.2)
## fansi 0.5.0 2021-05-25  CRAN (R 4.1.2)
## farver 2.1.0 2021-02-28  CRAN (R 4.1.2)
## fastmap 1.1.0 2021-01-25  CRAN (R 4.1.2)
## forcats * 0.5.1 2021-01-27  CRAN (R 4.1.2)
## generics 0.1.1 2021-10-25  CRAN (R 4.1.2)
## ggpattern * 0.2.2 2021-11-11  Github (coolbutuseless/ggpattern@7214181)
## ggplot2 * 3.3.5 2021-06-25  CRAN (R 4.1.2)
## ggrepel * 0.9.1 2021-01-15  CRAN (R 4.1.2)
## glue 1.5.0 2021-11-07  CRAN (R 4.1.2)
## gtable 0.3.0 2019-03-25  CRAN (R 4.1.2)
## highr 0.9 2021-04-16  CRAN (R 4.1.2)
## htmltools 0.5.2 2021-08-25  CRAN (R 4.1.2)
## httr 1.4.2 2020-07-20  CRAN (R 4.1.2)
## jquerylib 0.1.4 2021-04-26  CRAN (R 4.1.2)
## jsonlite 1.7.2 2020-12-09  CRAN (R 4.1.2)
## kableExtra 1.3.4 2021-02-20  CRAN (R 4.1.2)
## knitr 1.36 2021-09-29  CRAN (R 4.1.2)
## labeling 0.4.2 2020-10-20  CRAN (R 4.1.2)
## lattice 0.20-45 2021-09-22  CRAN (R 4.1.2)
## lifecycle 1.0.1 2021-09-24  CRAN (R 4.1.2)
## lubridate * 1.8.0 2021-10-07  CRAN (R 4.1.2)
## magrittr 2.0.1 2020-11-17  CRAN (R 4.1.2)
## Matrix 1.3-4 2021-06-01  CRAN (R 4.1.2)
## memoise 2.0.0 2021-01-26  CRAN (R 4.1.2)
## mgcv 1.8-38 2021-10-06  CRAN (R 4.1.2)
## munsell 0.5.0 2018-06-12  CRAN (R 4.1.2)
## nlme 3.1-153 2021-09-07  CRAN (R 4.1.2)
## patchwork * 1.1.1 2020-12-17  CRAN (R 4.1.2)
## pillar 1.6.4 2021-10-18  CRAN (R 4.1.2)
## pkgconfig 2.0.3 2019-09-22  CRAN (R 4.1.2)
## purrr 0.3.4 2020-04-17  CRAN (R 4.1.2)
## R6 2.5.1 2021-08-19  CRAN (R 4.1.2)
## Rcpp 1.0.7 2021-07-07  CRAN (R 4.1.2)
## rlang 0.4.12 2021-10-18  CRAN (R 4.1.2)
## rmarkdown 2.11 2021-09-14  CRAN (R 4.1.2)
## RMySQL * 0.10.22 2021-06-22  CRAN (R 4.1.2)
## RSQLite * 2.2.8 2021-08-21  CRAN (R 4.1.2)
## rstudioapi 0.13 2020-11-12  CRAN (R 4.1.2)
## rvest 1.0.2 2021-10-16  CRAN (R 4.1.2)
## sass 0.4.0 2021-05-12  CRAN (R 4.1.2)
## scales 1.1.1 2020-05-11  CRAN (R 4.1.2)
## sessioninfo 1.2.1 2021-11-02  CRAN (R 4.1.2)
## stringi 1.7.5 2021-10-04  CRAN (R 4.1.2)
## stringr 1.4.0 2019-02-10  CRAN (R 4.1.2)
## svglite 2.0.0 2021-02-20  CRAN (R 4.1.2)
## systemfonts 1.0.3 2021-10-13  CRAN (R 4.1.2)
## tibble 3.1.6 2021-11-07  CRAN (R 4.1.2)
## tidyr 1.1.4 2021-09-27  CRAN (R 4.1.2)
## tidyselect 1.1.1 2021-04-30  CRAN (R 4.1.2)
## utf8 1.2.2 2021-07-24  CRAN (R 4.1.2)
## vctrs 0.3.8 2021-04-29  CRAN (R 4.1.2)
## viridisLite 0.4.0 2021-04-13  CRAN (R 4.1.2)
## webshot 0.5.2 2019-11-22  CRAN (R 4.1.2)
## withr 2.4.2 2021-04-18  CRAN (R 4.1.2)
## xfun 0.28 2021-11-04  CRAN (R 4.1.2)
## xml2 1.3.2 2020-04-23  CRAN (R 4.1.2)
## yaml 2.2.1 2020-02-01  CRAN (R 4.1.2)
##
##  /home/lluis/bin/R/4.1.2/lib/R/library
##
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────