email-Enron dataset
This is a temporal higher-order network dataset, which here means a
sequence of timestamped simplices where each simplex is a set of
nodes. In email communication, messages can be sent to multiple
recipients. In this dataset, nodes are email addresses at Enron and a
simplex consists of the sender and all recipients of the
email. Timestamps are in millisecond resolution. Only email addresses
from a core set of employees are included. The dataset was derived
from the corpus hosted by William
Cohen. We restricted the dataset to simplices that consist of at
most 25 nodes. Some basic statistics
of this dataset are:
- number of nodes: 143
- number of timestamped simplices: 10,883
- number of unique simplices: 1,542
- number of edges in projected graph: 1,800
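The statistics above can be recomputed from the timestamped simplices. The following is a minimal Python sketch, not the authors' code. It assumes the simplices are already in memory as `(timestamp, nodes)` pairs. It also assumes the weighted projected graph links two nodes whenever they appear together in some simplex, and that the edge weight is the number of timestamped simplices containing both. The function name `summarize` is hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Sequence, Tuple


def summarize(simplices: Iterable[Tuple[int, Sequence[int]]],
              max_size: int = 25) -> Tuple[Dict[str, int], Counter]:
    """Compute basic statistics of a temporal higher-order network.

    simplices: (timestamp, nodes) pairs; each simplex is treated as a set.
    max_size: simplices with more distinct nodes than this are dropped,
    mirroring the "at most 25 nodes" restriction described above.
    """
    kept = [(t, frozenset(nodes)) for t, nodes in simplices
            if len(set(nodes)) <= max_size]

    # Weighted projected graph (assumed definition): edge (u, v) with u < v,
    # weighted by the number of timestamped simplices containing both nodes.
    weights: Counter = Counter()
    for _, nodes in kept:
        for u, v in combinations(sorted(nodes), 2):
            weights[(u, v)] += 1

    stats = {
        # Counts only nodes that occur in at least one kept simplex.
        "nodes": len(set().union(*(nodes for _, nodes in kept))),
        "timestamped_simplices": len(kept),
        # The same node set at different times counts once here.
        "unique_simplices": len({nodes for _, nodes in kept}),
        "projected_edges": len(weights),
    }
    return stats, weights
```

For example, `stats, weights = summarize(simplices)` returns the four counts as a dictionary and the projected-graph edge weights as a `Counter` keyed by node pairs.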
Files:
- email-Enron.tar.gz (timestamped simplices and node labels)
- email-Enron-proj-graph.tar.gz (weighted projected graph)
- email-Enron-full.tar.gz (timestamped simplices and node labels)
- email-Enron-full-proj-graph.tar.gz (weighted projected graph)
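The internal layout of the archives is not described here. A minimal sketch for reading them, assuming (not confirmed by this page) a three-file layout: `email-Enron-nverts.txt` holds the size of each simplex, `email-Enron-simplices.txt` holds all node IDs concatenated in the same order, and `email-Enron-times.txt` holds one timestamp per simplex, all one integer per line. These file names and the functions `parse_simplices` and `load_from_tar` are hypothetical.

```python
import tarfile
from typing import Iterable, List, Tuple

# A timestamped simplex: (timestamp in milliseconds, sorted tuple of node IDs).
Simplex = Tuple[int, Tuple[int, ...]]


def parse_simplices(nverts: Iterable[str],
                    simplices: Iterable[str],
                    times: Iterable[str]) -> List[Simplex]:
    """Rebuild timestamped simplices from three parallel, assumed files."""
    sizes = [int(line) for line in nverts if line.strip()]
    nodes = [int(line) for line in simplices if line.strip()]
    stamps = [int(line) for line in times if line.strip()]
    if len(sizes) != len(stamps) or sum(sizes) != len(nodes):
        raise ValueError("nverts, simplices and times are inconsistent")

    result: List[Simplex] = []
    pos = 0
    for size, t in zip(sizes, stamps):
        # Consume the next `size` node IDs as one simplex.
        result.append((t, tuple(sorted(nodes[pos:pos + size]))))
        pos += size
    return result


def load_from_tar(path: str, name: str = "email-Enron") -> List[Simplex]:
    """Read the assumed files straight out of a .tar.gz without extracting."""
    with tarfile.open(path, "r:gz") as tar:
        def read_lines(suffix: str) -> List[str]:
            wanted = f"{name}-{suffix}.txt"
            for member in tar.getmembers():
                if member.isfile() and member.name.endswith(wanted):
                    data = tar.extractfile(member).read()
                    return data.decode("utf-8").splitlines()
            raise FileNotFoundError(f"{wanted} not found in {path}")

        return parse_simplices(read_lines("nverts"),
                               read_lines("simplices"),
                               read_lines("times"))
```

Usage would look like `simplices = load_from_tar("email-Enron.tar.gz")`, or `load_from_tar("email-Enron-full.tar.gz", name="email-Enron-full")` for the full archive; if the actual file names differ, only the suffixes in `read_lines` need adjusting.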
Reference:
- Simplicial closure and higher-order link prediction.
Austin R. Benson, Rediet Abebe, Michael T. Schaub, Ali Jadbabaie, and Jon Kleinberg.
Proceedings of the National Academy of Sciences (PNAS), 2018.