PubMed Knowledge Graph 2.0

Connecting papers, patents, and clinical trials in biomedical science.
A comprehensive resource for next-generation knowledge discovery.

36M+ Scientific Papers
1.3M+ Biomedical Patents
0.48M+ Clinical Trials

About The Dataset

PubMed Knowledge Graph (PKG) is a large-scale, high-quality knowledge graph dataset for the biomedical field. It integrates heterogeneous data from multiple sources to break down information silos.

Papers, patents, and clinical trials are indispensable types of scientific literature in biomedicine. PKG 2.0 connects these dispersed resources into a systematic, fine-grained network, enabling researchers to perform deep mining and analysis.

  • Entity Integration: Bio-entity extraction.
  • Author Disambiguation: High-precision author networks.
  • Cross-Domain Links: Citation relationships.
  • Funding Data: NIH funding & research projects.

Data Organization

Table A Series

Contains the original data tables parsed directly from PubMed XML source files.

Table B Series

Includes raw data from other sources (NIH ExPORTER, ORCID, etc.) used during processing.

Table C Series

Represents the core data of PKG, centered around Papers, Patents, and Clinical Trials entities.
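
To make the three series easier to tell apart in scripts, here is a minimal sketch that splits a table name into its series letter, number, and label. The naming pattern (series letter, digits, underscore, label) is an assumption inferred from the single table name mentioned on this page, C04_ReferenceList_Papers; the helper names parse_table_name, TableName, and SERIES_DESCRIPTIONS are hypothetical and not part of the dataset.

```python
import re
from typing import NamedTuple, Optional

# Short descriptions of each series, paraphrased from the section above.
SERIES_DESCRIPTIONS = {
    "A": "original tables parsed from PubMed XML source files",
    "B": "raw data from other sources (NIH ExPORTER, ORCID, etc.)",
    "C": "core PKG data on papers, patents, and clinical trials",
}


class TableName(NamedTuple):
    series: str  # "A", "B", or "C"
    number: int  # e.g. 4 for "C04"
    label: str   # e.g. "ReferenceList_Papers"


# Assumed pattern: one series letter, digits, an underscore, then a label.
_TABLE_NAME = re.compile(r"^([ABC])(\d+)_(.+)$")


def parse_table_name(name: str) -> Optional[TableName]:
    """Return the parts of a table name, or None if it does not fit the assumed pattern."""
    match = _TABLE_NAME.match(name)
    if match is None:
        return None
    series, number, label = match.groups()
    return TableName(series, int(number), label)


if __name__ == "__main__":
    parsed = parse_table_name("C04_ReferenceList_Papers")
    if parsed is not None:
        print(parsed.series, SERIES_DESCRIPTIONS[parsed.series])
```

Returning None for names that do not match, rather than raising, lets a script skip unrelated files in the download directory.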

Update Log

Jul 20, 2025

Updated to PKG24S4 based on PubMed 2025 Baseline. External sources (Patents/Clinical Trials) synchronized for consistency.

Jun 17, 2025

Privacy Update: Personal information (name, gender, race) removed to comply with new data protection regulations.

Apr 17, 2025

Correction: C04_ReferenceList_Papers - Removed OpenCitations data to fix ID mapping errors.

Oct 10, 2024

Initial release of PKG 2.0 alongside the arXiv article.

Download & Resources

All data is provided as SQL and TSV files, available via Science Data Bank.
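
As a rough illustration of working with the TSV files, the sketch below streams rows from a gzip-compressed TSV using only the Python standard library, so a large table never has to be loaded into memory at once. It assumes the first row is a header and that fields are unquoted; neither is confirmed here, so check the actual files first. The function name iter_tsv_rows and the file name in the usage comment are hypothetical.

```python
import csv
import gzip
from typing import Dict, Iterator


def iter_tsv_rows(path: str) -> Iterator[Dict[str, str]]:
    """Yield each row of a gzip-compressed TSV file as a dict keyed by column name.

    Assumes the first line holds the column names (an assumption, not
    something this page confirms).
    """
    # "rt" opens the compressed file in text mode; newline="" lets csv handle line endings.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE treats quote characters as ordinary text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield row


# Usage (hypothetical file name):
# for row in iter_tsv_rows("tablename.tsv.gz"):
#     print(row)
#     break
```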

FAQ: How to decompress .gz files?

Windows

We recommend using 7-Zip or WinRAR.

Linux / macOS

Use the terminal command:

gunzip tablename.sql.gz
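
For a cross-platform alternative, the following sketch decompresses a .gz file with Python's standard library. Unlike plain gunzip, which removes the compressed file by default, it leaves the .gz file in place. The function name decompress_gz is hypothetical, and tablename.sql.gz is the same placeholder used above.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(src: str) -> Path:
    """Decompress src (e.g. tablename.sql.gz) next to the original and return the new path."""
    src_path = Path(src)
    if src_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src_path.name}")
    dest_path = src_path.with_suffix("")  # tablename.sql.gz -> tablename.sql
    # Copy in chunks so large tables never have to fit in memory.
    with gzip.open(src_path, "rb") as compressed, open(dest_path, "wb") as out:
        shutil.copyfileobj(compressed, out)
    return dest_path


# Usage:
# decompress_gz("tablename.sql.gz")
```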

Citation

Xu, J., Yu, C., Xu, J. et al. PubMed knowledge graph 2.0: Connecting papers, patents, and clinical trials in biomedical science. Sci Data 12, 1018 (2025). https://doi.org/10.1038/s41597-025-05343-8
Xu, J., Kim, S., Song, M. et al. Building a PubMed knowledge graph. Sci Data 7, 205 (2020). https://doi.org/10.1038/s41597-020-0543-2