phs-email-Enron dataset
This is a hypergraph dataset of Enron emails with a core-fringe
structure. Nodes are labeled as either "core" or "fringe", with core
nodes corresponding to email addresses of the individuals whose email
inboxes were released as part of the investigation by the Federal
Energy Regulatory Commission. Each hyperedge is the set of email
addresses that appear together on a single email.
Each hyperedge has at least one core node, so the core forms a hitting
set for the hypergraph. We studied ways of recovering core labels
from network structure, i.e., the case of finding a planted hitting
set. Some summary statistics of the network are:
- number of nodes: 4,423
- number of hyperedges: 15,653
- number of core nodes: 146
- rank of hypergraph (maximum hyperedge size): 25
- Planted Hitting Set Recovery in Hypergraphs.
Ilya Amburg, Jon Kleinberg, and Austin R. Benson.
Journal of Physics: Complexity, 2021.
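
The hitting-set property and the summary statistics described above can be checked with a few lines of Python. This is a minimal sketch, not the authors' code. It assumes the hyperedges have already been loaded into memory as collections of node identifiers and that the core labels are available as a set. The on-disk file format is not assumed here, and the names `summarize`, `toy_edges`, and `toy_core` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Union


def summarize(
    hyperedges: Iterable[Iterable[Hashable]],
    core: Iterable[Hashable],
) -> Dict[str, Union[int, bool]]:
    """Compute basic statistics of a core-fringe hypergraph.

    Assumption: `hyperedges` is an iterable of node collections and
    `core` holds the identifiers of the core nodes. Duplicate
    hyperedges, if any, are counted as given.
    """
    edges = [frozenset(e) for e in hyperedges]
    core_set = set(core)
    # Every node that appears in at least one hyperedge.
    nodes = set().union(*edges) if edges else set()
    return {
        "num_nodes": len(nodes),
        "num_hyperedges": len(edges),
        "num_core_nodes": len(core_set & nodes),
        # Rank is the size of the largest hyperedge.
        "rank": max((len(e) for e in edges), default=0),
        # The core is a hitting set if it intersects every hyperedge.
        "core_is_hitting_set": all(e & core_set for e in edges),
    }


# Toy data for illustration only (not taken from the dataset).
toy_edges = [{"a", "x"}, {"a", "b", "y"}, {"b"}, {"b", "x", "z"}]
toy_core = {"a", "b"}
stats = summarize(toy_edges, toy_core)
print(stats)
```

On the real data, the same function would be expected to reproduce the counts listed above and to report that the core is a hitting set, provided the loader preserves the hyperedges exactly as distributed.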