Hippocamplus My Second Memory

Speeding up blogdown/Pandoc for large bibliographies

I have another website where I write down my reviews of the papers I read. To handle citations in pages and posts, I was originally using jekyll-scholar. It scaled well enough that I could have one main BibTeX file for all the pages of the website. I’m now switching to blogdown/Hugo because it’s apparently faster and has fewer dependencies, but most importantly because it makes it very easy to integrate R code through RMarkdown.

To use citations in blogdown, we can specify the BibTeX file in the YAML header and then cite with either @blabla or [@blabla] in the text (for multiple citations: [@blabla;@foo]). In the YAML header:

bibliography: [../../static/library.bib]
link-citations: true
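
To make the citation syntax concrete, here is a minimal sketch of how the keys could be pulled out of a page's text with a regular expression. The pattern and the function name find_citation_keys are my own assumptions for illustration; they approximate Pandoc's citation-key rules and are not taken from Pandoc or blogdown.

```python
import re

# Approximation of a Pandoc citation key: "@" followed by a letter, digit or
# underscore, then alphanumerics and some internal punctuation.
# The lookbehind skips things like e-mail addresses (name@host).
CITE_RE = re.compile(r"(?<![\w.])@([A-Za-z0-9_][\w:.#$%&+?<>~/-]*)")


def find_citation_keys(text):
    """Return the set of citation keys found in a Markdown/RMarkdown string."""
    keys = set()
    for match in CITE_RE.finditer(text):
        # A key at the end of a sentence may drag trailing punctuation along.
        keys.add(match.group(1).rstrip(".,;:"))
    return keys
```

For example, calling find_citation_keys on "As shown in @blabla and later [@blabla;@foo]." should give the two keys blabla and foo.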

It seemed to work well at first, but after adding more pages the rendering got stuck. Googling around, it seemed to be a known issue with pandoc-citeproc and large bibliographies, the solution being to keep the bib file as small as possible by retaining only the records that are cited. Surprisingly, this didn’t fix my problem: even with a tiny BibTeX file I couldn’t render some pages. It turned out that the problem was the very long author list in some citations, which is common in the genomics field. So the solution for me was to keep only the records cited AND set a maximum number of authors.
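
As an illustration of the author cap, here is a minimal sketch of how a BibTeX author field could be shortened. The function name shorten_authors is hypothetical, and ending the list with "others" (BibTeX's convention for "et al.") is my assumption about a reasonable way to truncate, not necessarily what the actual script does.

```python
import re


def shorten_authors(author_field, max_authors=5):
    """Keep at most max_authors names from a BibTeX author field.

    BibTeX separates names with the word "and"; a final "others" is
    rendered as "et al." by most citation styles.
    """
    authors = [a.strip() for a in re.split(r"\s+and\s+", author_field.strip())]
    if len(authors) <= max_authors:
        return " and ".join(authors)
    return " and ".join(authors[:max_authors] + ["others"])
```

Note that this naive split would also break a braced institutional name that contains " and ", so a real implementation may need to be more careful.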

I wrote a small Python script that scans Markdown files for citations and extracts the matching records from a BibTeX file, shortening the author list if necessary. The reduceBib.py script is available on GitHub.
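
The record-extraction step can be sketched roughly as follows. This is not the code of reduceBib.py: the function name filter_bib and the line-based parsing are assumptions for illustration, and the sketch expects every record to start at the beginning of a line with @type{key, (which is how most reference managers export BibTeX).

```python
import re

# Start of a record: "@article{somekey," at the beginning of a line.
ENTRY_START = re.compile(r"^@(\w+)\s*\{\s*([^,\s]+)\s*,", re.MULTILINE)


def filter_bib(bib_text, keys):
    """Return only the records of bib_text whose key is in keys."""
    starts = list(ENTRY_START.finditer(bib_text))
    kept = []
    for i, match in enumerate(starts):
        # A record runs until the next record starts (or the end of the file).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(bib_text)
        if match.group(2) in keys:
            kept.append(bib_text[match.start():end].strip())
    return "\n\n".join(kept) + ("\n" if kept else "")
```

Here keys is the set of citation keys collected from the Markdown files. Special blocks such as @string or @comment are not handled by this sketch.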

reduceBib.py usage

At the root of the website, I use the following command:

python reduceBib.py -b static/library.bib -o static/library-small.bib content/fixed/*.Rmd content/post/*.Rmd

The help page:

> python reduceBib.py -h
usage: reduceBib.py [-h] [-b BIB] [-o OUT] [-a NAUTHS] [-f FIELDS]
                    mds [mds ...]

Reduce a .bib file.

positional arguments:
  mds         the markdown files to scan

optional arguments:
  -h, --help  show this help message and exit
  -b BIB      the original bib file
  -o OUT      the new bib file
  -a NAUTHS   the maximum number of authors. Default: 5.
  -f FIELDS   the BibTeX fields to keep (comma separated). Default: