Exploring CRAN's files: part 3

References: links between pages, packages and base R

In the first post about CRAN files, I explored the package database (mostly around CRAN_package_db()). In the second post I moved on to analyze the archive around CRAN_archive_db().

Recently, at useR!2024, Kurt Hornik offered to export data from CRAN. After an email, several new functions were exported and added to base R.

In that presentation and in a previous exchange with Kurt, he explained his project to provide the HTML manual pages of all CRAN packages. One of the problems of this project is providing links to the right pages of the packages (another is making the pages accessible for all users).

In R-devel there are now tests that ensure links to other pages are in the right format [1]. In this post we’ll explore links between documentation pages, what the manuals say about them, and what the state of base R and the CRAN packages is.

Introduction

Help pages are defined in R documentation files, commonly with the .Rd extension. Each documentation file can have multiple topics. A topic is defined in the R documentation format with \alias{topic}, so one can use alias and topic interchangeably. Links, or cross-references (xref for short), should point to an alias, not to a file (this allows moving topics between help pages) [2].

One can define links with \link{alias}, or with \link[=alias]{name} if the link goes to a different place than the displayed name [3]. In the second form the content within the square brackets is known as the anchor.

There are two other accepted forms: \link[pkg]{alias} and \link[pkg:alias]{name}, which do not use the = inside the anchor [4]. These last two forms are only used in the HTML format. Packages referred to in these square brackets should be declared in the DESCRIPTION file, in the ‘Depends’, ‘Imports’, ‘Suggests’ or ‘Enhances’ fields.
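The four forms can be told apart by looking at the anchor alone. The following is a minimal sketch, assuming the anchors have already been extracted into a character vector where "" or NA means the link had no square brackets; classify_anchor() is a hypothetical helper, not a function from the tools package.

```r
# Hypothetical helper: label each link by the form of its anchor.
# Assumes `anchor` is a character vector; "" or NA means no square brackets.
classify_anchor <- function(anchor) {
  anchor[is.na(anchor)] <- ""
  form <- rep("pkg", length(anchor))                    # \link[pkg]{alias}
  form[grepl(":", anchor, fixed = TRUE)] <- "pkg:alias" # \link[pkg:alias]{name}
  form[startsWith(anchor, "=")] <- "=alias"             # \link[=alias]{name}
  form[anchor == ""] <- "none"                          # \link{alias}
  form
}

forms <- classify_anchor(c("", "=Complex", "stats", "stats:median", NA))
table(forms)
```

Only the "pkg" and "pkg:alias" forms name another package, so those are the ones that need to be checked against the DESCRIPTION file.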

One important thing is that links are case sensitive: ?Complex leads to a different help page than ?complex.

With this in mind we can check which packages have more manual pages, have more cross-references, or have an overview help page (those with pkgname-package.Rd). We will also check which documentation pages have more links, are more linked to, or need their links fixed.

Base R

Before looking at CRAN packages we can start by looking at the base R packages.

Previously there wasn’t a way to know the links and aliases in base R. Since recently (~2024/08/20) we can download them with tools::base_aliases_db() and tools::base_rdxrefs_db() (only on R-devel until they are released with R 4.5).
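To count help pages and aliases it helps to flatten the database into one row per alias. The sketch below uses a toy list as a stand-in; its shape (one list per package, mapping Rd file names to their aliases) and the column names Package, Source and Target are assumptions for illustration, and flatten_aliases() is a hypothetical helper.

```r
# Toy stand-in with the assumed shape of the aliases database:
# one list per package, mapping Rd file names to their aliases.
aliases_db <- list(
  pkgA = list("foo.Rd" = c("foo", "foo<-"), "bar.Rd" = "bar"),
  pkgB = list("baz.Rd" = c("baz", "Baz", "as.baz"))
)
# On R >= 4.5 with network access one could try instead:
# aliases_db <- tools::base_aliases_db()

# Hypothetical helper: one row per alias, with its package and Rd file.
flatten_aliases <- function(db) {
  rows <- lapply(names(db), function(pkg) {
    files <- db[[pkg]]
    n <- lengths(files)
    data.frame(
      Package = rep(pkg, sum(n)),
      Source = rep(names(files), n),
      Target = as.character(unlist(files, use.names = FALSE)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

aliases <- flatten_aliases(aliases_db)
n_pages <- nrow(unique(aliases[c("Package", "Source")]))  # help pages
n_aliases <- nrow(aliases)                                # topics/aliases
```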

Aliases

In base R there are 1458 help pages with 4676 topics/aliases.

Some help files in base R have the same name; they can be accessed via their topics (via ?topic). Those files are dev.Rd, format.Rd and units.Rd.
To distinguish between those files one needs to check the title(s) and content of the pages.

Similarly there are multiple help pages of different packages with the same topics:

Table 1: Duplicate alias on R base packages. Alias and packages with the same alias.
Alias Packages
Complex base, methods
GSC base, tools
Logic base, methods
Math base, methods
Ops base, methods
R_GSCMD base, tools
S4 base, methods
Summary base, methods
clipboard base, utils
hat grDevices, stats
matrixOps base, methods
plot base, graphics
show-methods methods, stats4
symbol base, grDevices

Most of them are shared between base and another package: mostly methods, but also tools, grDevices, graphics and utils.
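A table like the one above can be built by grouping the packages by alias. This is a minimal sketch on invented toy data (the packages and topics are placeholders, and the column names Package, Source and Target are assumptions), not the code used for this post.

```r
# Toy alias table: one row per alias, with its package and Rd file.
aliases <- data.frame(
  Package = c("pkgA", "pkgA", "pkgB", "pkgB", "pkgC"),
  Source = c("plot.Rd", "foo.Rd", "plot.Rd", "bar.Rd", "foo.Rd"),
  Target = c("plot", "foo", "plot", "bar", "foo"),
  stringsAsFactors = FALSE
)

# Keep one row per alias/package pair, then collect the packages per alias.
pairs <- unique(aliases[c("Target", "Package")])
by_alias <- split(pairs$Package, pairs$Target)
dup <- by_alias[lengths(by_alias) > 1]

dup_table <- data.frame(
  Alias = names(dup),
  Packages = unname(vapply(dup, paste, character(1), collapse = ", ")),
  stringsAsFactors = FALSE
)
dup_table
```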

To access one of them directly one needs to use help(): help("clipboard", package = "utils"). Otherwise one has to choose from the menu that pops up when using ?clipboard. This works to access any other help page too.

There are 665 help pages that only have one alias. For example, .BasicFunsList, .Platform or .Last.value. Some of these are datasets or very specific help pages.

Last, we can check the topics of the packages’ own help pages. Usually packages have at least two aliases for it, pkgname and pkgname-package, but two packages do not follow this: grid and methods.
These packages do not have the pkgname alias pointing to their own package help page. ?grid could be confused with graphics::grid and ?methods could be confused with utils::methods.

Cross references

Having explored the topics on the help pages, it is time to move to the cross references on R.

We can check where these help pages link to, starting with how many links go directly to the alias and how many use an anchor:

Figure 1: Packages with links using alias per help page. Each point is a help page with a different number of links. Some packages link directly a lot while others are more mixed.

We can further explore to where they link if we find out the package and topic, instead of just the topic used.

We can check for pages that link to themselves. The cross-reference system doesn’t provide a way to link to a specific section (yet?). When users follow those links they are redirected to the top of the same help page, while they were probably expecting to go to another help page. At least this is something I have felt in the past. In my opinion these links could be deleted and the text rewritten to explain which section the user should read:

Table 2: Help pages on base with links to themselves. These links could be removed, or perhaps linking to a specific section could be implemented.
Package Help page
base double.Rd
base numeric.Rd
base parse.Rd
base call.Rd
base is.unsorted.Rd
base showConnections.Rd
base taskCallback.Rd
grDevices axisTicks.Rd
grid grid.add.Rd
grid grid.get.Rd
methods setIs.Rd
parallel makeCluster.Rd
parallel unix/children.Rd
stats uniroot.Rd
stats NLSstRtAsymptote.Rd
stats dist.Rd
stats dendrogram.Rd
stats stepfun.Rd
utils update.packages.Rd

There are 19 links to their own help pages that could be removed.
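Finding such self-links amounts to resolving each link target to the file that defines that alias and comparing it with the file the link comes from. A minimal sketch on invented toy data, assuming one table of aliases and one of cross-references with the columns Package, Source and Target (these names are assumptions), and only resolving links within the same package:

```r
# Toy data: which file defines which alias ...
aliases <- data.frame(
  Package = "pkgA",
  Source = c("a.Rd", "a.Rd", "b.Rd"),
  Target = c("a", "as.a", "b"),
  stringsAsFactors = FALSE
)
# ... and which file links to which alias.
xrefs <- data.frame(
  Package = "pkgA",
  Source = c("a.Rd", "a.Rd", "b.Rd"),
  Target = c("as.a", "b", "a"),
  stringsAsFactors = FALSE
)

# Resolve each link to the file holding the alias (same package only).
resolved <- merge(xrefs, aliases, by = c("Package", "Target"),
                  suffixes = c("", "_dest"))
# A self-link starts and ends in the same file.
self_links <- resolved[resolved$Source == resolved$Source_dest, ]
self_links
```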

R documentation files with references to other help pages:

Figure 2: Links between R help pages are predominantly to their own package. Base is more connected than any other package. Some packages receive links from very few packages.

We can clearly see that packages link mostly within their own package, but there are some heavily connected pairs of packages like utils and base or stats and base. Surprisingly, the compiler package doesn’t have any reference to other packages.

There were help pages that used file names instead of topics to link to pages [5]. I raised this issue on the mailing list and later submitted a patch (18815) that got them fixed before publishing this post.

We also observed that some packages do not provide links. How frequent is this and how does it affect base R?

Table 3: References on help pages. Most base help pages are referenced and link to other help pages.
| Links to other pages | Linked from other pages | Help pages | Percentage |
|---|---|---|---|
| TRUE | TRUE | 1093 | 74.8% |
| TRUE | FALSE | 211 | 14.4% |
| FALSE | FALSE | 113 | 7.7% |
| FALSE | TRUE | 44 | 3.0% |

Almost all help pages have links and most of them receive references from other help pages (compare this with Table 9). Only 113 help pages neither have links nor receive references, so unless they are documented somewhere else, like R Internals, Writing R Extensions, or R Installation and Administration, it will be hard to find them.

This seems a good opportunity to patch the documentation and make it easier to find and use those help pages. For example, there is currently no CRAN package or base package that links to Rcmd(). This makes it harder to discover and use it.

Figure 3: Help pages receiving and sending references to other help pages. The most linked help page in base is the options help page.

So far we have focused on links; if we compare them with the existing help pages we might find some interesting patterns:

Table 4: Help pages with no links to them.
Package Help pages
datasets 67
grid 50
base 46
stats 40
utils 32
tools 28
methods 20
graphics 10
stats4 10
grDevices 7
tcltk 6
splines 5
parallel 3

As one could expect, there aren’t many links to the help pages of datasets. Surprisingly, grid is the next package with the most help pages without links from other base packages, followed by base itself. It would be interesting to see which help pages these are, as connecting them to other topics might help users find out what R is capable of.

Table 5: Help pages with no references from them.
Package Help pages
datasets 71
base 20
grid 13
stats 12
tools 12
utils 11
grDevices 6
methods 2
parallel 2
splines 2
stats4 2
tcltk 2
compiler 1
graphics 1

Similar to the table above, there are many help pages that do not link to any other page, most of them in datasets.

But having links to and from help pages is not enough for discovering them. They could form a closed graph, where some pages are unreachable from the others.

Figure 4: Graph of all the links between help pages of base R. There are some help pages that are not connected to any other help page

Apparently there are some help pages in base R that don’t have links to navigate to the rest of the documentation. There are 4 isolated clusters of help pages, formed by tools:toHTML.Rd, tools:HTMLheader.Rd, grid:absolute.size.Rd, grid:widthDetails.Rd, grid:gridCoords.Rd and grid:grobCoords.Rd. These help pages would be easier to find if they were at least linked from other help pages of base R.
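Isolated clusters are the connected components of the link graph other than the largest one. The plots in this post rely on graph packages, but the idea can be sketched in base R. This is a minimal illustration on invented toy links that treats links as undirected; find_components() is a hypothetical helper and the page names are placeholders.

```r
# Toy links between help pages (placeholders, not real base R links).
from <- c("pkgA:a.Rd", "pkgA:b.Rd", "pkgB:d.Rd")
to   <- c("pkgA:b.Rd", "pkgA:c.Rd", "pkgB:e.Rd")

# Hypothetical helper: give every page a component label by repeatedly
# propagating the smallest label along each link until nothing changes.
find_components <- function(from, to) {
  nodes <- unique(c(from, to))
  comp <- setNames(seq_along(nodes), nodes)
  repeat {
    changed <- FALSE
    for (i in seq_along(from)) {
      m <- min(comp[[from[i]]], comp[[to[i]]])
      if (comp[[from[i]]] != m || comp[[to[i]]] != m) {
        comp[[from[i]]] <- m
        comp[[to[i]]] <- m
        changed <- TRUE
      }
    }
    if (!changed) break
  }
  comp
}

comp <- find_components(from, to)
sizes <- table(comp)
main <- as.integer(names(sizes)[which.max(sizes)])
# Pages outside the largest component are the isolated clusters.
isolated <- names(comp)[comp != main]
isolated
```

Pages without any link at all never appear in the edge list, so they would need to be added separately.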

To CRAN

We can also ask which packages are linked from R that are not part of R.

Table 6: Most packages linked from base R are from current R core members. Information about non-base packages linked from base R.
| Package | Links | Priority | Maintainer | Core member |
|---|---|---|---|---|
| MASS | 23 | recommended | Brian Ripley | TRUE |
| Matrix | 10 | recommended | Martin Maechler | TRUE |
| KernSmooth | 9 | recommended | Brian Ripley | TRUE |
| lattice | 8 | recommended | Deepayan Sarkar | TRUE |
| nlme | 7 | recommended | R Core Team | TRUE |
| cluster | 6 | recommended | Martin Maechler | TRUE |
| survival | 3 | recommended | Terry M Therneau | FALSE |
| SuppDists | 3 | | Thorsten Pohlert | FALSE |
| coin | 3 | | Torsten Hothorn | FALSE |
| knitr | 2 | | Yihui Xie | FALSE |
| mathjaxr | 2 | | Wolfgang Viechtbauer | FALSE |
| vcd | 2 | | David Meyer | FALSE |
| nnet | 1 | recommended | Brian Ripley | TRUE |
| rpart | 1 | recommended | Beth Atkinson | FALSE |
| chron | 1 | | Kurt Hornik | TRUE |
| date | 1 | | Kurt Hornik | TRUE |
| robustbase | 1 | | Martin Maechler | TRUE |
| round | 1 | | Martin Maechler | TRUE |
| Kendall | 1 | | A.I. McLeod | FALSE |
| multcomp | 1 | | Torsten Hothorn | FALSE |
| pcaPP | 1 | | Valentin Todorov | FALSE |
| pspearman | 1 | | Petr Savicky | FALSE |

We can see that many of the packages linked from base R are recommended packages, and many are maintained by R core members or by the whole R core group. The other links include some popular packages and some that are not so popular.

CRAN packages

Now that we explored base R packages we can also explore CRAN’s help pages.

Aliases

As before, we start by exploring CRAN aliases. Which packages have the most aliases?

Table 7: Aliases per CRAN package. Top 10 packages with more aliases.
Package Aliases
aroma.affymetrix 2158
Matrix 1997
VGAM 1948
CVXR 1723
paws.management 1580
paws.security.identity 1580
aroma.core 1566
paws.compute 1429
paws.analytics 1349
photobiology 1319

The package with the most aliases has more than 2000! On the other side, there are more than 800 packages (835) with only one alias.

Figure 5: Most CRAN packages have 14 aliases. Aliases distribution on CRAN.

I’m surprised to see so many packages with just one alias. This might indicate packages that try to do just one thing, while more complex packages will have more aliases and maybe also more help pages.

We can also find the topics that are duplicated on CRAN:

Figure 6: Top 25 duplicated topics on CRAN. Common methods and the magrittr pipe dominate the most frequent topics.

The most repeated topic is the magrittr pipe %>%, followed by reexports (curious, as usually the pipe is also re-exported). We can see some common methods like plot, print and show, and some topics related to dplyr and shiny, mixed with some methods used for statistics.

Help pages

Similar to the aliases on CRAN, we can check what happens with help pages:

Table 8: Number of help pages of top 10 packages. Some packages have many help pages.
Help page
1573
1573
1422
1342
1283
1091
970
911
901
874

The 10 packages with the most help pages each have more than 800! On the other side, there are more than 1000 packages (1086) with only one help page.

Overall, there doesn’t seem to be a distribution pattern. Packages end up with as many aliases as maintainers see fit.

Figure 7: Most CRAN packages have 11 help pages. Help page distribution.

We can check the ratio of alias per help page and number of help pages by package:

Figure 8: Some packages have many aliases per help page. Packages with a high number of aliases per help page are colored in green, while packages with a high number of both help pages and aliases are in black. The dashed line corresponds to a perfect match between help pages and aliases.

It is clear that more help pages tend to come with more topics, but there are some packages with a low number of help pages and a high number of aliases (those in green on the plot above). These packages tend to have 5 aliases per help page or more.

However, this is not the norm: the median number of help pages is 11 and the median number of topics is 14. So CRAN packages are usually much smaller.
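The per-package numbers behind this comparison can be sketched as follows. The data is an invented toy alias table (column names Package, Source and Target are assumptions), and the threshold of 5 aliases per help page is the one mentioned above.

```r
# Toy alias table: one row per alias, with its package and Rd file.
aliases <- data.frame(
  Package = c("pkgA", "pkgA", "pkgA", rep("pkgB", 5), "pkgC"),
  Source = c("a.Rd", "a.Rd", "b.Rd", rep("c.Rd", 5), "d.Rd"),
  Target = paste0("topic", 1:9),
  stringsAsFactors = FALSE
)

# Aliases and distinct help pages per package (same group order in both).
n_aliases <- tapply(aliases$Target, aliases$Package, length)
n_pages <- tapply(aliases$Source, aliases$Package,
                  function(s) length(unique(s)))

per_pkg <- data.frame(
  Package = names(n_pages),
  Pages = as.vector(n_pages),
  Aliases = as.vector(n_aliases),
  stringsAsFactors = FALSE
)
per_pkg$Ratio <- per_pkg$Aliases / per_pkg$Pages

median(per_pkg$Pages)
median(per_pkg$Aliases)
many_aliases <- per_pkg$Package[per_pkg$Ratio >= 5]
```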

Cross references

So far we have analyzed CRAN help pages by the number of aliases and help pages they have. While this is interesting, I think it can be more revealing to check what they are linking to.

Similar to what we did with the R cross-references, we first look for links to base packages and then we’ll search for links across CRAN packages.

Now that I have corrected many issues I’ll check if it matches CRAN’s output.

Further analysis

CRAN packages

Now it is time to check for CRAN packages.

There are 1387 packages with links that lack a destination R documentation file. There are also 11684 that miss anchors; the majority of these links are to their own R documentation help pages. There are 695 packages that link to a package that is not among their dependencies: these could be packages using “LinkingTo”, but they could also be packages linking to archived packages. There are 1335 packages that link to a file instead of a topic, which is not forbidden (yet).
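The dependency check can be sketched by comparing the packages named in the anchors with those declared in the DESCRIPTION fields. This is a minimal illustration on invented values; parse_deps() is a hypothetical helper, and it ignores details such as links to the package itself or to base packages.

```r
# Hypothetical helper: extract package names from DESCRIPTION fields,
# dropping version requirements such as "(>= 4.0)" and the R entry itself.
parse_deps <- function(fields) {
  fields <- unname(fields[!is.na(fields)])
  parts <- unlist(strsplit(fields, ",", fixed = TRUE), use.names = FALSE)
  parts <- trimws(sub("\\(.*\\)", "", parts))
  setdiff(parts[nzchar(parts)], "R")
}

# Toy DESCRIPTION fields and link anchors of one package.
desc <- c(Depends = "R (>= 4.0), methods",
          Imports = "stats, utils (>= 4.0)",
          Suggests = NA)
anchors <- c("stats:median", "ggplot2", "=local_topic", "utils")

declared <- parse_deps(desc)
# Only anchors of the forms pkg and pkg:alias name another package.
external <- anchors[nzchar(anchors) & !startsWith(anchors, "=")]
linked <- unique(sub(":.*$", "", external))
undeclared <- setdiff(linked, declared)
undeclared
```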

We can look at which packages CRAN packages link to (excluding themselves):

Figure 15: Links to other packages. Most packages receive links from few packages, but there are some that receive links from few packages and many help pages.

There are a few packages tightly linked together (those that start with paws.*) and then there are the base packages, linked by many different packages.

If we focus more on the most common packages we might find some other common practices:

Figure 16: Zooming to links to other packages. Focusing on packages with less than 400 help pages linked and less than 1000 packages being referenced.

But who is linking to these packages?

Figure 17: Links to other packages Most packages receive links from few packages, but there are some that receive links from

There are many packages that link to few packages but from many help pages. There are some that link to many packages from relatively few help pages. There is broom that links to many distinct packages from many help pages.

Putting this together we can explore which packages are more connected: those that provide links, those that receive them, and those that do both or neither.

Table 10: Cross-reference status of packages. Most packages neither receive nor provide links, many provide links without receiving any, and few receive links but do not link to other packages.
| Receives_links | Provides_links | Packages |
|---|---|---|
| FALSE | FALSE | 11056 |
| FALSE | TRUE | 7466 |
| TRUE | TRUE | 2578 |
| TRUE | FALSE | 743 |

Most packages neither provide links to external packages nor receive them. Next come those that link to other packages but are not linked back, followed by those that both link to other packages and receive links. Last, there are those packages that receive links but do not link to other packages.
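The classification above can be sketched at the package level, where links within the same package have to be dropped first. The data below is invented for illustration, and the names from_pkg and to_pkg are assumptions.

```r
# Toy package-level links (including links within the same package).
links <- data.frame(
  from_pkg = c("pkgA", "pkgA", "pkgB", "pkgC"),
  to_pkg = c("pkgA", "pkgB", "pkgB", "pkgB"),
  stringsAsFactors = FALSE
)
all_pkgs <- c("pkgA", "pkgB", "pkgC", "pkgD")

# Only links between different packages count here.
external <- links[links$from_pkg != links$to_pkg, ]

status <- data.frame(
  Package = all_pkgs,
  Receives_links = all_pkgs %in% external$to_pkg,
  Provides_links = all_pkgs %in% external$from_pkg,
  stringsAsFactors = FALSE
)
# Number of packages in each combination of the two flags.
counts <- as.data.frame(table(status[c("Receives_links", "Provides_links")]))
counts
```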

Figure 18: Packages receiving and giving links. Very few packages link to many others, while some other packages are linked by many.

broom is a package that receives links from few packages but links to multiple packages.

Figure 19: Distribution of links on help pages with links. CRAN packages have more links than base packages.

CRAN’s packages tend to have more links to other packages than base R.

Conclusions

As a summary of the analysis of the links and aliases, about base:

  • Help page most linked to from base: options.Rd
  • Help page most linked to from CRAN: character.Rd
  • Help page with the most links: par.Rd
  • Package most linked to from CRAN: base
  • Page most linked from different packages: par.Rd
  • Package with the most links: base
  • Package with the fewest links: compiler

About CRAN packages:

  • Package with the most help pages: paws.management
  • Package most linked by other packages: ggplot2
  • Package help page with the most links: spatstat-package.Rd
  • Help page most repeated: reexports
  • Help page most linked to: R6Class.Rd
  • Package with the most links to other packages: paws
  • Package with the most links: keras3
  • Topic most repeated: %>%
  • Links I couldn’t resolve: 5358
  • Packages with link related problems:
  • Packages linking to Bioconductor:

Wrong assumptions I had about cross-references:

  • All packages will have one link to other help pages.
  • One cannot link to a file.
  • Base packages wouldn’t link to CRAN packages.

I don’t think I was ever aware of the compiler package.

Reproducibility

Session info
## - Session info -------------------------------------------------------------------------------------------------------
##  setting  value
##  version  R Under development (unstable) (2024-12-20 r87452)
##  os       Ubuntu 24.04.1 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language en
##  collate  C
##  ctype    C
##  tz       Europe/Madrid
##  date     2025-01-01
##  pandoc   3.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/x86_64/ (via rmarkdown)
##  quarto   1.4.553 @ /usr/local/bin/quarto
## 
## - Packages -----------------------------------------------------------------------------------------------------------
##  package      * version    date (UTC) lib source
##  BiocManager    1.30.25    2024-08-28 [1] CRAN (R 4.5.0)
##  blogdown       1.19       2024-02-01 [1] CRAN (R 4.5.0)
##  bookdown       0.41       2024-10-16 [1] CRAN (R 4.5.0)
##  bslib          0.8.0      2024-07-29 [1] CRAN (R 4.5.0)
##  cachem         1.1.0      2024-05-16 [1] CRAN (R 4.5.0)
##  cli            3.6.3      2024-06-21 [1] CRAN (R 4.5.0)
##  codetools      0.2-20     2024-03-31 [2] CRAN (R 4.5.0)
##  colorspace     2.1-1      2024-07-26 [1] CRAN (R 4.5.0)
##  digest         0.6.37     2024-08-19 [1] CRAN (R 4.5.0)
##  dplyr        * 1.1.4      2023-11-17 [1] CRAN (R 4.5.0)
##  evaluate       1.0.1      2024-10-10 [1] CRAN (R 4.5.0)
##  fansi          1.0.6      2023-12-08 [1] CRAN (R 4.5.0)
##  farver         2.1.2      2024-05-13 [1] CRAN (R 4.5.0)
##  fastmap        1.2.0      2024-05-15 [1] CRAN (R 4.5.0)
##  forcats      * 1.0.0      2023-01-29 [1] CRAN (R 4.5.0)
##  generics       0.1.3      2022-07-05 [1] CRAN (R 4.5.0)
##  ggforce        0.4.2      2024-02-19 [1] CRAN (R 4.5.0)
##  ggplot2      * 3.5.1      2024-04-23 [1] CRAN (R 4.5.0)
##  ggraph       * 2.2.1      2024-03-07 [1] CRAN (R 4.5.0)
##  ggrepel      * 0.9.6      2024-09-07 [1] CRAN (R 4.5.0)
##  glue           1.8.0      2024-09-30 [1] CRAN (R 4.5.0)
##  graphlayouts   1.2.1      2024-11-18 [1] CRAN (R 4.5.0)
##  gridExtra      2.3        2017-09-09 [1] CRAN (R 4.5.0)
##  gtable         0.3.6      2024-10-25 [1] CRAN (R 4.5.0)
##  htmltools      0.5.8.1    2024-04-04 [1] CRAN (R 4.5.0)
##  igraph         2.1.1      2024-10-19 [1] CRAN (R 4.5.0)
##  jquerylib      0.1.4      2021-04-26 [1] CRAN (R 4.5.0)
##  jsonlite       1.8.9      2024-09-20 [1] CRAN (R 4.5.0)
##  knitr        * 1.49       2024-11-08 [1] CRAN (R 4.5.0)
##  labeling       0.4.3      2023-08-29 [1] CRAN (R 4.5.0)
##  lifecycle      1.0.4      2023-11-07 [1] CRAN (R 4.5.0)
##  magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.5.0)
##  MASS           7.3-61     2024-06-13 [2] CRAN (R 4.5.0)
##  memoise        2.0.1      2021-11-26 [1] CRAN (R 4.5.0)
##  munsell        0.5.1      2024-04-01 [1] CRAN (R 4.5.0)
##  pillar         1.9.0      2023-03-22 [1] CRAN (R 4.5.0)
##  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.5.0)
##  polyclip       1.10-7     2024-07-23 [1] CRAN (R 4.5.0)
##  purrr          1.0.2      2023-08-10 [1] CRAN (R 4.5.0)
##  R6             2.5.1      2021-08-19 [1] CRAN (R 4.5.0)
##  Rcpp           1.0.13-1   2024-11-02 [1] CRAN (R 4.5.0)
##  rlang          1.1.4      2024-06-04 [1] CRAN (R 4.5.0)
##  rmarkdown      2.29       2024-11-04 [1] CRAN (R 4.5.0)
##  rstudioapi     0.17.1     2024-10-22 [1] CRAN (R 4.5.0)
##  sass           0.4.9      2024-03-15 [1] CRAN (R 4.5.0)
##  scales         1.3.0      2023-11-28 [1] CRAN (R 4.5.0)
##  sessioninfo    1.2.2.9000 2024-09-15 [1] Github (r-lib/sessioninfo@442a686)
##  tibble         3.2.1      2023-03-20 [1] CRAN (R 4.5.0)
##  tidygraph    * 1.3.1      2024-01-30 [1] CRAN (R 4.5.0)
##  tidyr          1.3.1      2024-01-24 [1] CRAN (R 4.5.0)
##  tidyselect     1.2.1      2024-03-11 [1] CRAN (R 4.5.0)
##  tweenr         2.0.3      2024-02-26 [1] CRAN (R 4.5.0)
##  utf8           1.2.4      2023-10-22 [1] CRAN (R 4.5.0)
##  vctrs          0.6.5      2023-12-01 [1] CRAN (R 4.5.0)
##  viridis        0.6.5      2024-01-29 [1] CRAN (R 4.5.0)
##  viridisLite    0.4.2      2023-05-02 [1] CRAN (R 4.5.0)
##  withr          3.0.2      2024-10-28 [1] CRAN (R 4.5.0)
##  xfun           0.49       2024-10-31 [1] CRAN (R 4.5.0)
##  yaml           2.3.10     2024-07-26 [1] CRAN (R 4.5.0)
## 
##  [1] /home/lluis/bin/R/4.5
##  [2] /opt/R/devel/lib/R/library
##  * -- Packages attached to the search path.
## 
## ----------------------------------------------------------------------------------------------------------------------

  1. By setting _R_CHECK_XREFS_NOTE_MISSING_PACKAGE_ANCHORS_=true as an environment variable.

  2. But using links to files is still fine.

  3. There is also \linkS4class{abc}, which expands to \link[=abc-class]{abc} for S4 classes.

  4. So there are three names for (almost) the same thing: topic, alias and anchor. Anchors do not need to start with = but should be resolvable.

  5. Yes, even though the documentation makes it clear that this shouldn’t be done, the help system works around this issue.

Lluís Revilla Sancho
Data scientist

Data scientist with interests in software quality, mostly R.