Preserving Comments From PubMed Commons

By judell | 9 February, 2018

@PubMedCommons tweet announcing the discontinuation of comments.

On 1 February 2018, the National Center for Biotechnology Information (NCBI) announced the discontinuation of PubMed Commons, citing usage that had been “minimal, with comments submitted on only 6,000 of the 28 million articles indexed in PubMed.” Although sparse, these comments are a valuable part of the scholarly record: contributors asked questions, gave answers and provided additional insight into published articles. Many in the biomedical community questioned the decision and mourned their impending loss.

On Twitter, several observers suggested Hypothesis might help carry those comments forward.

Archiving PubMed Commons

So we rolled up our sleeves and began our investigation, led by Jon Udell, Director of Integrations at Hypothesis, who so often creates connections between open, standards-based annotation and real-world needs like these. There’s no formal API for retrieving PubMed Commons comments, but Alf Eaton kindly provided a screen scraper that jump-started our effort. Alf’s code queries PubMed for the roughly 6K articles with comments, downloads them, and extracts the comments. Because PubMed Commons comments can have replies, our first task was to rework the export to capture the reply structure. With that data in hand, we were in a position to reformulate the archive as a set of Hypothesis annotations attached to those 6K PubMed articles.
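To make the reply-structure step concrete, here is a minimal Python sketch. It assumes each scraped comment carries a hypothetical `parent_id` field pointing at the comment it answers; the real scraper’s output format is not described in this post, so `ScrapedComment`, `build_threads` and `walk` are illustrative names only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class ScrapedComment:
    """One comment as scraped from PubMed Commons (hypothetical field names)."""
    comment_id: str
    pmid: str
    author: str
    text: str
    parent_id: Optional[str] = None  # None for a top-level comment
    replies: list[ScrapedComment] = field(default_factory=list)


def build_threads(comments: list[ScrapedComment]) -> list[ScrapedComment]:
    """Attach each reply to its parent; return top-level comments in input order."""
    by_id = {c.comment_id: c for c in comments}
    roots = []
    for c in comments:
        parent = by_id.get(c.parent_id) if c.parent_id else None
        if parent is None:
            # A top-level comment, or a reply whose parent was not captured.
            roots.append(c)
        else:
            parent.replies.append(c)
    return roots


def walk(comment: ScrapedComment, depth: int = 0) -> Iterator[tuple[int, ScrapedComment]]:
    """Yield (depth, comment) pairs, depth-first, in reply order."""
    yield depth, comment
    for reply in comment.replies:
        yield from walk(reply, depth + 1)


# Hypothetical sample data.
comments = [
    ScrapedComment("c1", "12345678", "A. Commenter", "Was this replicated?"),
    ScrapedComment("c2", "12345678", "B. Author", "Yes, see our follow-up.", parent_id="c1"),
    ScrapedComment("c3", "12345678", "A. Commenter", "Thanks!", parent_id="c2"),
]
threads = build_threads(comments)
for root in threads:
    for depth, c in walk(root):
        print("  " * depth + f"{c.author}: {c.text}")
```

A reply whose parent is missing is kept as a top-level comment rather than dropped, so nothing that was scraped is lost in the conversion.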

But what about licensing? Public contributions to Hypothesis are governed by the Creative Commons Public Domain Dedication (CC0). PubMed Commons comments, however, are governed by the more restrictive Creative Commons Attribution 3.0 License (CC BY), which permits reuse but requires attribution to a comment’s author.

Our solution: Embed the CC BY license in the body of each imported annotation, like so:


Screenshot of an example comment from the Hypothesis PubMed Commons Archive.


The CC BY license link there, by the way, isn’t simply a link to the Creative Commons license page. Instead, it’s a Hypothesis direct link that points to the PubMed Commons FAQ that explains its use of Creative Commons. The Hypothesis direct link delivers that context, saving readers the time and effort of tracking it down.
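A minimal sketch of how such an annotation might be assembled, in Python. The body is Markdown, which Hypothesis renders, and the dictionary follows our understanding of the payload accepted by the Hypothesis API’s create-annotation endpoint, with a page-note target that names only the source URI. The helper names, the wording of the attribution line, the example PMID, and `LICENSE_LINK` (a stand-in for the direct link to the PubMed Commons FAQ) are all hypothetical.

```python
import json

# Hypothetical placeholder for the Hypothesis direct link to the PubMed Commons
# FAQ; substitute the real direct link.
LICENSE_LINK = "https://example.org/direct-link-to-pubmed-commons-faq"


def archived_body(comment_text: str, author: str, license_link: str) -> str:
    """Markdown body: attribution and CC BY notice, then the original comment."""
    header = f"**{author}**, PubMed Commons ([CC BY]({license_link}))"
    return f"{header}\n\n{comment_text}"


def page_note(uri: str, body: str, tags: list[str]) -> dict:
    """Create-annotation payload for a page note (assumed field layout)."""
    return {
        "uri": uri,
        "text": body,
        "tags": tags,
        "group": "__world__",         # Hypothesis's public group
        "target": [{"source": uri}],  # no selectors: annotates the whole page
    }


payload = page_note(
    "https://www.ncbi.nlm.nih.gov/pubmed/12345678",  # hypothetical PMID
    archived_body("Has this result been replicated?", "A. Commenter", LICENSE_LINK),
    ["PubMedCommonsArchive"],
)
print(json.dumps(payload, indent=2))
```

Sending it would then be an authenticated POST, with an API token in an `Authorization: Bearer` header, to `https://api.hypothes.is/api/annotations`.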

Next we considered how to add DOI equivalences to the imported annotations, so that comments made on PubMed could also appear on copies of the same articles published elsewhere. Thanks to a mapping file provided by Jo McEntyre at Europe PMC, we were able to acquire DOIs for many of the commented articles. When we imported the annotations using the Hypothesis API, we associated the annotations with those DOIs. As a result, a comment originally made on an article at PubMed can now also appear, as an annotation, on the copy of the same article published at Wiley.
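The DOI step can be sketched as a lookup from PMID to DOI, followed by extra document metadata on each annotation. This Python sketch assumes a CSV with `PMID` and `DOI` columns, and assumes Hypothesis treats `document.highwire.doi` values and `doi:` links as identifiers shared by every copy of a work; neither the mapping file’s real layout nor the exact fields used in the import are stated in this post.

```python
import csv
import io

DOI_RESOLVER_PREFIXES = ("https://doi.org/", "http://dx.doi.org/")


def load_doi_map(csv_text: str) -> dict[str, str]:
    """Map PMID -> DOI from CSV text with (assumed) 'PMID' and 'DOI' columns."""
    doi_by_pmid = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        doi = (row.get("DOI") or "").strip()
        for prefix in DOI_RESOLVER_PREFIXES:
            if doi.lower().startswith(prefix):
                doi = doi[len(prefix):]  # keep only the bare DOI
        if doi:
            # DOIs are case-insensitive, so store a lowercase form for matching.
            doi_by_pmid[row["PMID"].strip()] = doi.lower()
    return doi_by_pmid


def document_metadata(doi: str) -> dict:
    """Extra 'document' metadata so other copies of the work match by DOI (assumed fields)."""
    return {
        "highwire": {"doi": [doi]},
        "link": [{"href": f"doi:{doi}"}],
    }


# Hypothetical sample rows; the second article has no DOI and is skipped.
sample = "PMID,PMCID,DOI\n12345678,PMC1111111,10.1000/EXAMPLE.1\n23456789,,\n"
dois = load_doi_map(sample)
print(dois)  # {'12345678': '10.1000/example.1'}
print(document_metadata(dois["12345678"]))
```

Articles without a DOI in the mapping simply keep their PubMed URL as their only identifier, which matches the post’s note that DOIs were found for many, not all, of the commented articles.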

The end results of our import from PubMed Commons are simple but powerful. Every PubMed Commons comment, along with any replies, is now an annotation that can appear on the original PubMed abstract on which it was made, as well as on other published versions of the abstracted document. Because the original comments were not linked to specific content in the abstracts or underlying documents, they appear as what Hypothesis calls “page notes”: annotations on a page or document as a whole. You can browse and search the entire collection of over 6K comments on the Hypothesis search page, filtered by the “PubMedCommonsArchive” tag we added to every annotation on import. Each comment is also tagged with the unique PMID of its related document, so you can also browse and search all comments on a specific document by its PMID tag, as in this example. And as with all Hypothesis annotations, you can now interact further with these PubMed comments, adding replies or using their unique URLs in other contexts.
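The tag-based browsing described above can also be done programmatically. This Python sketch builds URLs for the public Hypothesis search API (`https://api.hypothes.is/api/search`), filtered by a single `tag` and paged with `limit` and `offset`. The helper names are hypothetical, and the `PMID:12345678` tag format in the example is an assumption; check an archived annotation for the exact form.

```python
from urllib.parse import urlencode

SEARCH_API = "https://api.hypothes.is/api/search"


def search_url(tag: str, limit: int = 200, offset: int = 0) -> str:
    """URL for one page of public annotations carrying `tag`."""
    query = urlencode({"tag": tag, "limit": limit, "offset": offset})
    return f"{SEARCH_API}?{query}"


def all_page_urls(tag: str, total: int, limit: int = 200) -> list[str]:
    """URLs covering `total` results (e.g. the 'total' field of a first response)."""
    return [search_url(tag, limit, offset) for offset in range(0, total, limit)]


print(search_url("PubMedCommonsArchive"))
# https://api.hypothes.is/api/search?tag=PubMedCommonsArchive&limit=200&offset=0
print(search_url("PMID:12345678", limit=50))  # hypothetical PMID tag format
# https://api.hypothes.is/api/search?tag=PMID%3A12345678&limit=50&offset=0
```

Fetching the first page, reading its `total`, and then requesting the remaining pages is one straightforward way to walk the whole archive.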

Making comments FAIR

This exercise in preservation surfaced important underlying issues about the status of such scholarly commentary: is it a valuable part of scholarship that deserves more formal status, and if so, how can it be supported and preserved? Towards this end, we and others have been considering how annotations can benefit from adopting FORCE11’s FAIR principles, which define four characteristics that data — including comments — should support in order to fully participate in scholarship: be Findable, Accessible, Interoperable, and Reusable.

Like many comments on the web, those at PubMed Commons were not especially FAIR. They did display a clear license, which helped ensure that they were reusable. But as our screen-scraping and DOI exercises showed, they were not particularly accessible or interoperable. In the process of archiving these comments to Hypothesis, we were able to increase their FAIRness substantially. Each comment now clearly states its provenance, relates in metadata to the unique identifier of the document it addresses, and is available for access and reuse both at its own unique URL and over an open API that matches W3C standards. Before, the comments existed only on PubMed Commons abstracts. Now, they have their own status and a direct relationship to their related documents everywhere those might be published, in any common web format. Learn more about recent conversations to make annotations FAIR.
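For readers curious what a standards-based representation looks like, here is one archived page note expressed in the W3C Web Annotation Data Model (JSON-LD), sketched in Python. A page note’s target is simply the document’s IRI, with no selector, and the comment’s license fits in the model’s `rights` property. The function name and example identifiers are hypothetical; this illustrates the data model rather than the exact serialization Hypothesis emits.

```python
import json

CC_BY_3 = "https://creativecommons.org/licenses/by/3.0/"


def to_web_annotation(anno_id: str, source: str, text: str, author: str) -> dict:
    """Express an archived page note in the W3C Web Annotation Data Model."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "creator": {"type": "Person", "name": author},
        "rights": CC_BY_3,  # license of the original PubMed Commons comment
        "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
        "target": source,  # a bare IRI: the annotation is about the whole document
    }


doc = to_web_annotation(
    "https://example.org/annotations/1",             # hypothetical annotation IRI
    "https://www.ncbi.nlm.nih.gov/pubmed/12345678",  # hypothetical PMID
    "Has this result been replicated?",
    "A. Commenter",
)
print(json.dumps(doc, indent=2))
```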

From comments to annotations

At Hypothesis, we believe strongly that there is a role for community feedback on scholarship. Although PubMed Commons struggled, we believe the kinds of conversation it hosted should be ubiquitous capabilities for scientific and scholarly content. Annotation systems like Hypothesis go well beyond typical commenting systems by:

  • Syncing annotations across different copies of the same article wherever they are published: in PubMed, PubMed Central and journal websites.
  • Anchoring annotations to specific passages of text, rather than collecting them in a disconnected thread at the end of an article.
  • Providing flexible modes of operation such as groups or private note taking.
  • Enabling authors to gain credit for their annotations by linking them to persistent identifiers such as ORCID iDs.
  • Or…allowing authors to participate anonymously or pseudonymously.
  • Adhering to open standards, enabling cross-platform interoperability.
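Anchoring, the second point above, is what distinguishes an annotation from a page note. The W3C model describes a passage with a `TextQuoteSelector`: the exact quote plus a little text before and after it, so the passage can be found again in another copy of the document. This Python sketch builds such a selector and re-anchors it by plain string search. Real clients, Hypothesis included, also use other selectors and fuzzy matching, so treat `quote_selector` and `anchor` as hypothetical simplifications.

```python
from __future__ import annotations

from typing import Optional


def quote_selector(text: str, start: int, end: int, context: int = 32) -> dict:
    """W3C TextQuoteSelector for text[start:end], with up to `context` chars either side."""
    return {
        "type": "TextQuoteSelector",
        "exact": text[start:end],
        "prefix": text[max(0, start - context):start],
        "suffix": text[end:end + context],
    }


def anchor(text: str, selector: dict) -> Optional[tuple[int, int]]:
    """Locate the quoted passage in a (possibly different) copy of the text."""
    exact = selector["exact"]
    i = text.find(selector["prefix"] + exact + selector["suffix"])
    if i != -1:
        start = i + len(selector["prefix"])
        return start, start + len(exact)
    i = text.find(exact)  # fall back to the quote alone, ignoring context
    if i != -1:
        return i, i + len(exact)
    return None  # orphaned: the passage is not in this copy


original = "Background. Mice were fed a high-fat diet for 12 weeks. Results follow."
start = original.index("high-fat diet")
selector = quote_selector(original, start, start + len("high-fat diet"), context=10)

# A second copy of the same article, with extra text in front.
other_copy = "Abstract\n" + original
s, e = anchor(other_copy, selector)
print(other_copy[s:e])  # high-fat diet
```

Because the selector stores text rather than character offsets alone, the same annotation can land on the right words even when the surrounding page layout differs between publishers.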

We believe that conversation about documents will increase dramatically and have greater value when solutions begin to address these and other key sources of friction.

In our strong opinion, all article feedback, commenting, review, and other collaboration systems should adopt W3C standards for web annotation as quickly as possible, so that we can begin to move toward an integrated and open framework for engagement across the scientific and scholarly literature.

For now, we’re delighted to import this key element of the scholarly record in order to preserve it for everyone going forward, and we welcome your suggestions and feedback about how we might improve similar efforts to preserve scholarly activity.
