Connecting papers, patents, and clinical trials in biomedical science.
A comprehensive resource for next-generation knowledge discovery.
PubMed Knowledge Graph (PKG) is a large-scale, high-quality knowledge graph dataset tailored to the biomedical field. It integrates heterogeneous data from multiple sources to break down information silos.
Papers, patents, and clinical trials are indispensable types of scientific literature in biomedicine. PKG 2.0 links these dispersed resources into a systematic, fine-grained network, enabling researchers to mine and analyze them in depth.
Contains the original data tables, parsed directly from the PubMed XML source files.
Includes raw data from other sources (e.g., NIH ExPORTER, ORCID) used during processing.
Contains the core data of PKG, centered on the Paper, Patent, and Clinical Trial entities.
Updated to PKG24S4, based on the PubMed 2025 Baseline. External sources (patents and clinical trials) were synchronized for consistency.
Privacy update: personal information (name, gender, race) was removed to comply with new data protection regulations.
Correction: removed OpenCitations data from C04_ReferenceList_Papers to fix ID mapping errors.
Initial release of PKG 2.0 alongside the arXiv article.
All data are provided as SQL and TSV files, available for download from Science Data Bank.
To decompress a downloaded table, run the following terminal command:
gunzip tablename.sql.gz
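Because each table is distributed as a separate archive, decompressing them one at a time can be tedious. The following is a minimal sketch that decompresses all archives in one pass. It assumes all downloaded `.sql.gz` files sit in a single directory and that a POSIX-compatible shell such as bash is available. The function name `decompress_pkg_dumps` is hypothetical and not part of PKG. The function writes each decompressed `.sql` file next to its archive and keeps the original `.gz`.

```shell
# Hypothetical helper (not part of PKG): decompress every .sql.gz archive
# in a directory, keeping the original archive alongside the output.
decompress_pkg_dumps() {
  local dir="${1:-.}"   # directory holding the downloaded archives; defaults to "."
  local f
  for f in "$dir"/*.sql.gz; do
    # If the glob matches nothing, it stays literal; skip it.
    [ -e "$f" ] || continue
    # -c writes to stdout, so the .gz archive is left untouched;
    # ${f%.gz} strips the .gz suffix to name the output file.
    gunzip -c "$f" > "${f%.gz}"
  done
}

# Run from the directory containing the downloads.
decompress_pkg_dumps .
```

Using `gunzip -c` with a redirect, rather than plain `gunzip`, keeps each archive. This means a table can be re-extracted later without downloading it again.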