Re: NIH RFI on Plan to Enhance Public Access to the Results of NIH-Supported Research

tl;dr: The NIH should directly oppose a for-profit APC-driven publication system and cloud research infrastructure, and instead focus efforts on building truly public information infrastructures.

This is a response to: Request for Information on the NIH Plan to Enhance Public Access to the Results of NIH-Supported Research, an RFI for NOT-OD-23-091 and the 2022 Nelson open access memo.

The RFI posed four questions rather than asking for a general response; each prompt is in the blockquote, and my response follows as body text.


How to best ensure equity in publication opportunities for NIH-supported investigators.

The NIH Public Access Plan aims to maintain the existing broad discretion for researchers and authors to choose how and where to publish their results. Consistent with current practice, the NIH Public Access Plan allows the submission of final published articles to PubMed Central (PMC) (in cases where a formal agreement is in place) to minimize the compliance burden on NIH-supported researchers and also maintains the flexibility of NIH-supported researchers to submit the final peer-reviewed manuscript. NIH seeks information on additional steps it might consider taking to ensure that proposed changes to implementation of the NIH Public Access Policy do not create new inequities in publishing opportunities or reinforce existing ones.

The steps towards openness in the 2022 OSTP Memorandum and subsequent notices like NOT-OD-23-091 are admirable uses of the NIH’s power as a funding body to set standards for equity in public research. As written, however, the proposals seem to be “fighting the last war.” They focus on closed-access publication without considering the significant shift in market structure as traditional scientific publishers have transformed into data brokers.

It is impossible to ignore the role of for-profit academic publishers as a primary source of inequity when considering these policies – without their prior model of subscription-based access, there would be no need for these policies at all. We cannot play coy and pretend to be market neutral when considering how scientific publishing should work: for-profit scientific publishing, now largely an oligopoly owned by a handful of information conglomerates, is an ethical catastrophe, and if we intend to grasp at the root of the problem we need to contend with the ways their business models distort the practice of science at every stage.

The publishing oligopoly has had ample time to prepare for the shoe of universal open access to drop, and if their shareholder-facing communications are any indication, they have already fully accounted for it and adapted their business models accordingly. They have focused heavily on shifting their default strategy from subscription-based publication to author-pays, APC-driven open access, which this proposal tacitly endorses. This model is intrinsically inequitable: it is explicitly designed to shift the burden of payment from libraries to individual researchers, and to align the cost of publication more closely with the prestige associated with a journal brand. Once (1) there is any gradient of APCs such that high-prestige journals like Nature and Cell cost more, and (2) publications in high-prestige journals are a necessity for grant funding and promotion, the system is fundamentally inequitable. Worse, by atomizing negotiating power, shifting it from libraries and library consortia to individual researchers, we neutralize some of the few organizations capable of pushing back against the for-profit publishers. In their place we get a positive feedback loop in which researchers have every incentive to accept ever-rising APCs in order to keep their jobs.

If this proposal leaves the for-profit publishing apparatus largely intact, it will join the history of half-measures made in deference to the publishing oligopoly that leave the problem perpetually unsolved. One can only imagine the state of every field of research, from pharmaceuticals to astrophysics, if we had had the courage in 1999 to implement the full version of Harold Varmus’ vision for PubMed Central, displacing for-profit publishing entirely and making free-to-publish, free-to-read research the norm. What could the world be like if we had had 20 years of experimenting with open research dissemination, rather than spending the dawn of the information era hobbled by broken systems accessible to a vanishingly small and privileged few? Will we be looking back in another 20 years wishing we had had the courage to end for-profit publishing now?

The very framing of this RFI, focused on open access publication rather than on the infrastructure of our communication, shows that we are missing the implications of the major for-profit publishers’ shift towards “surveillance publishing.” The next era of scholarly communication battles will be about infrastructure. Profit models are consolidating around collecting user data and repackaging it into bibliometrics and informatics platforms, such as so-called “research intelligence” tools like RELX’s SciVal. With the requirement for open data, we will face another period of enclosure in which the distinction between publishing, data sharing, and computation blurs. As written, the NIH would directly create a new triple-pay system in the very policy that is intended to address the prior one: if NIH’s STRIDES project is the intended model, NIH pays cloud providers for discounts so that researchers can then pay to archive their data and pay again to export it.

The infrastructure of scientific communication is a fraction of the complexity of what universal open data will require: it is trivial to start a new journal-like website, but far from trivial to create a new server farm for storing bulk data. The inequity from APCs will be orders of magnitude greater as the process of science congeals into a series of pay-to-use platforms that skim public funding at every stage from grant proposal through data collection, analysis, and publication. The NIH discusses monitoring funding inequity for publication, but is it prepared to handle the broader inequities from the capture of research information infrastructure by a handful of cloud platform giants? Who, exactly, will have the funding necessary to pay for tools that produce clean data, to hire the data scientists to manage it, and to pay the costs of cloud storage and computation? Plainly, the NIH stands to hand an increasing fraction of its budget to orbiting information rentiers rather than directly funding research, and the dream of universal information access will always be out of reach beyond some exorbitant hosting bill.

The landscape of options that would make a truly more equitable and robust scientific process is wide open, and all of them mean taking a meaningful stand in favor of a public information commons and against for-profit private ownership of information infrastructure. Rather than offering a single recommendation, I urge the NIH to reorient this and future proposals towards a nonprofit, publicly owned informational commons. Requiring that all publishers operate as nonprofits is one first step. A fixed and decreasing cap on APCs to sunset pay-to-publish models in favor of so-called “diamond” open access is another. Making grant decisions agnostic to publication venue is another. Addressing the next generation of infrastructure needs equitably requires that we look beyond the “Platform as a Service” model articulated in NIH’s 2018 strategic plan for data science, in which public research bodies outsource and rent basic infrastructure from cloud providers. A full technical evaluation is of course out of scope for this RFI, but a system of peer-to-peer infrastructure that can leverage resources from individual computers through institutional and federal systems, without dependence on cloud providers, would be capable of addressing inequity as well as realizing the ambitions of information access articulated in these proposals.

Others and I have written about these systems elsewhere and are working on building them.

Steps for improving equity in access and accessibility of publications.

Removal of the currently allowable 12-month embargo period for NIH-supported publications will improve access to these research products for all. As noted in the NIH Public Access Plan, NIH also plans to continue making articles available in human and machine-readable forms to support automated text processing. NIH will also seek ways to improve the accessibility of publications via assistive devices. NIH welcomes input on other steps that could be taken to improve equity in access to publications by diverse communities of users, including researchers, clinicians and public health officials, students and educators, and other members of the public.

The greatest hindrance to the accessibility of scientific publications is not technical (though the ailing infrastructure of traditional publishers is some decades behind the rest of the web), but the socio-economic construct of traditional journals themselves. The form of the scientific journal article is entirely unlike how the vast majority of non-scientists interact with information, and it is structured by an industry that maintains its profit by strategically suppressing semantic organization, using journal brands as the primary organizing principle in order to preserve their prestige. It is prestigious to publish in Nature because people will read it. People read Nature papers because there are no effective means of finding research based on its content, leaving scientists to organize dissemination through ad-hoc media like Twitter or to depend on downstream patches like Google Scholar.

If the NIH is serious about making scientific research more accessible to non-scientists, it must address how research incentives uniformly encourage the publication of impenetrable prose in domain- or prestige-limited venues, and instead promote alternative means of organizing scientific communication, including peer review and publication. We must not only make it easier for everyone to make sense of the scientific record, but also reckon with how our incentive structures make the scientific record so difficult to make sense of in the first place.

Accessibility for people who need assistive technologies can only be improved by taking more direct control over our infrastructures of communication. Rather than being beholden to the structure imposed by journals, we should directly address the technologies and social systems that structure scientific communication as part of a holistic project of information accessibility.

Methods for monitoring evolving costs and impacts on affected communities.

NIH proposes to actively monitor trends in publication fees and policies to ensure that they remain reasonable and equitable. NIH seeks information on effective approaches for monitoring trends in publication fees and equity in publication opportunities.

If the NIH agrees to step in and offset exorbitant APCs in prestige journals in the name of equity, particularly without clear language about what counts as a “reasonable” cost, it sends the message that it is willing to pay any price the publishers demand. Framing the question as one of monitoring evolving costs indicates that the NIH is aware that this policy will increase publication costs, and that those increases will inequitably affect researchers outside the highest echelons of funding and prestige. We do not need to accept this as inevitable: there are multiple routes towards explicitly avoiding an APC-driven publishing market, and towards creating a peer-to-peer data infrastructure that avoids outsized cost burdens for marginalized researchers.

Early input on considerations to increase findability and transparency of research.

Section IV of the NIH Public Access Plan is a first step in developing the NIH’s updated plan for persistent identifiers (PIDs) and metadata, which will be submitted to OSTP by December 31, 2024. NIH seeks suggestions on any specific issues that should be considered in efforts to improve use of PIDs and metadata, including information about experiences institutions and researchers have had with adoption of different identifiers.

It is critical to understand the history of PIDs and how they structure and reinforce the for-profit publishing system, advantaging larger players and disadvantaging independent alternatives. The DOI system itself was created in response to NIH’s 1999 push for PubMed Central in order to preserve the publishing industry’s dominance in assigning identifiers, and thus in determining what can be counted as research. Decades of research on persistent identifiers show that decentralized alternatives like the ARK or IPFS’s CID work, and we should prioritize identifiers that can be created and structured by any researcher rather than controlled by a centralized authority. Critical research on ontologies and metadata likewise shows their intrinsically political nature, which points towards tooling that lets researchers express metadata, rather than the current approach taken by NIH’s Biomedical Translator project of creating quasi-universal ontologies to be mapped onto.
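To make the point about identifiers that any researcher can create more concrete, here is a minimal Python sketch of how a content-addressed identifier in the style of an IPFS CID can be derived locally from a file’s bytes, with no registrar involved. It assumes the CIDv1 layout described in the multiformats specifications (version, content codec, and a SHA-256 multihash, encoded as lowercase base32 with a “b” prefix) and uses the “raw” codec. The function name cid_v1_raw is hypothetical. The result should match what IPFS assigns only for content stored as a single raw block; larger files are chunked into a linked structure whose root CID differs.

```python
import base64
import hashlib

# Codes from the multiformats tables (assumed here, single-byte varints):
CID_V1 = 0x01      # CID version 1
RAW_CODEC = 0x55   # "raw" binary content codec
SHA2_256 = 0x12    # multihash function code for sha2-256
DIGEST_LEN = 0x20  # 32-byte digest length


def cid_v1_raw(data: bytes) -> str:
    """Hypothetical helper: compute a CIDv1 (raw codec, sha2-256) for bytes."""
    digest = hashlib.sha256(data).digest()
    # Every code above is < 0x80, so each unsigned varint is exactly one byte.
    binary_cid = bytes([CID_V1, RAW_CODEC, SHA2_256, DIGEST_LEN]) + digest
    # Multibase prefix "b" means lowercase RFC 4648 base32 with no padding.
    encoded = base64.b32encode(binary_cid).decode("ascii").lower().rstrip("=")
    return "b" + encoded


if __name__ == "__main__":
    manuscript = b"An example manuscript, identified by its own contents."
    print(cid_v1_raw(manuscript))
```

Because the identifier is a function of the content alone, anyone holding the same bytes computes the same identifier, and any alteration produces a different one. That property is what allows identifiers to be minted without a central authority deciding what counts.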

I am available for further comment on this and the rest of the responses to this RFI, and I appreciate any time taken to read this.

bad at programming and neuroscience in beautiful Oregon.