NDC-classes dataset
This is a temporal higher-order network dataset, which here means a
sequence of timestamped simplices where each simplex is a set of
nodes. Under the Drug Listing Act of 1972, the U.S. Food and Drug
Administration releases information on all commercial drugs that go
through the agency's regulation, forming the National Drug Code
(NDC) Directory. In this dataset, each simplex corresponds to a drug
and the nodes are the class labels applied to that drug. Timestamps are
in days and represent when the drug was first marketed. We restricted
the dataset to simplices that consist of at most 25 nodes. Some basic statistics
of this dataset are:
- number of nodes: 1,161
- number of timestamped simplices: 49,724
- number of unique simplices: 1,222
- number of edges in projected graph: 6,222
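
As a rough illustration of how these statistics are defined, the sketch below computes them on a tiny made-up list of timestamped simplices. The toy data, the function name `basic_statistics`, and the `max_size` parameter are hypothetical and not part of the dataset's files. Two assumptions are made: unique simplices are counted by node set, ignoring timestamps, and the projected graph connects two nodes whenever they appear together in at least one simplex.

```python
from itertools import combinations

# Hypothetical toy data: (timestamp in days, set of class-label node ids).
# These values are invented for illustration and are not from NDC-classes.
toy_simplices = [
    (10, {1, 2, 3}),
    (12, {1, 2}),
    (15, {1, 2, 3}),  # same node set as the first simplex, later timestamp
    (20, {4}),
    (21, {2, 5}),
]


def basic_statistics(timestamped_simplices, max_size=25):
    """Count nodes, simplices, unique simplices, and projected-graph edges."""
    # Keep only simplices with at most `max_size` nodes, as in the text.
    kept = [(t, frozenset(s)) for t, s in timestamped_simplices
            if len(s) <= max_size]

    nodes = set()
    unique = set()
    edges = set()
    for _, simplex in kept:
        nodes |= simplex
        # Assumption: uniqueness is by node set, ignoring the timestamp.
        unique.add(simplex)
        # Assumption: two nodes share a projected edge if they co-occur
        # in at least one simplex.
        for u, v in combinations(sorted(simplex), 2):
            edges.add((u, v))

    return {
        "nodes": len(nodes),
        "timestamped_simplices": len(kept),
        "unique_simplices": len(unique),
        "projected_edges": len(edges),
    }


stats = basic_statistics(toy_simplices)
print(stats)
```

On the toy data, the repeated node set `{1, 2, 3}` counts twice as a timestamped simplex but only once as a unique simplex, which is the same kind of gap seen between the two simplex counts listed above.
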
- NDC-classes.tar.gz (timestamped simplices, node labels, and simplex labels)
- NDC-classes-proj-graph.tar.gz (weighted projected graph)
- NDC-classes-full.tar.gz (timestamped simplices, node labels, and simplex labels)
- NDC-classes-full-proj-graph.tar.gz (weighted projected graph)
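
The file list mentions a weighted projected graph without stating how the weights are defined. The sketch below is one plausible reading, assuming the weight of an edge is the number of simplices in which its two endpoints co-appear. That definition, the toy data, and the function name `weighted_projection` are assumptions for illustration and do not describe the documented contents of the archives.

```python
from collections import Counter
from itertools import combinations


def weighted_projection(simplices):
    """Map each node pair (u, v), u < v, to its co-occurrence count."""
    weights = Counter()
    for simplex in simplices:
        # Assumption: every simplex containing both u and v adds 1
        # to the weight of edge (u, v).
        for u, v in combinations(sorted(set(simplex)), 2):
            weights[(u, v)] += 1
    return weights


# Hypothetical toy simplices (node ids invented for illustration).
toy = [{1, 2, 3}, {1, 2}, {1, 2, 3}, {4}, {2, 5}]
graph = weighted_projection(toy)

for (u, v), w in sorted(graph.items()):
    print(u, v, w)
```

Single-node simplices such as `{4}` contribute no edges under this reading, so isolated nodes would not appear in the projected graph.
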
- Simplicial closure and higher-order link prediction.
Austin R. Benson, Rediet Abebe, Michael T. Schaub, Ali Jadbabaie, and Jon Kleinberg.
Proceedings of the National Academy of Sciences (PNAS), 2018.