Citation Integrity
NLP for citation accuracy analysis
Project Title: Natural Language Processing to Assess and Improve Citation Integrity in Biomedical Publications
Funding: HHS Office of Research Integrity (ORI) (ORIIR220073)
Project Period: 2022-2024
Role: PI
Trustworthy science is crucial to scientific progress, evidence-based policies, and human health. Citations play a fundamental role in the diffusion of scientific knowledge and in research assessment, yet their role in research integrity is often overlooked. Citation inaccuracies (e.g., citing findings that do not exist in the referenced work) undermine the integrity of the biomedical literature and distort the perception of available evidence, with potentially serious consequences for human health. A recent meta-analysis showed that 25.4% of medical articles contained a citation error. The objective of this project is to develop scalable natural language processing (NLP) and artificial intelligence (AI) algorithms that automatically assess the accuracy of citation content in biomedical publications. The resulting models can be embedded in practical software tools. With these tools, authors will be able to improve their citation quality; journals and peer reviewers will be able to scrutinize questionable citation practices before publication; and research administrators, research integrity officers, funders, and policymakers will be able to investigate citation practices, integrity issues, and knowledge diffusion via citations.