sos-tags-mathoverflow dataset
This dataset is a collection of sequences of sets. Stack Exchange is
a collection of question-and-answer websites. Users post questions
and annotate them with up to 5 tags. In this dataset, each sequence is
the time-ordered list of tag sets applied to the questions asked by a
single user on MathOverflow, a Stack Exchange site. All sequences
contain at least 10 sets, and only sets of size at most 5 are
considered. Some basic statistics of this dataset are:
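The construction described above (group tag sets by user, order them by time, keep only sets of size at most 5 and only users with at least 10 sets) can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code. The input layout of (user_id, timestamp, tags) records, the function name build_sequences, and the choice to drop oversized sets before applying the 10-set threshold are all assumptions.

```python
from collections import defaultdict


def build_sequences(records, min_sets=10, max_set_size=5):
    """Group (user, timestamp, tags) records into per-user sequences of tag sets.

    Assumptions (not from the dataset documentation): each record is a
    hypothetical (user_id, timestamp, tags) triple, sets larger than
    max_set_size are dropped first, and a user is kept only if at least
    min_sets sets remain after that.
    """
    by_user = defaultdict(list)
    for user, timestamp, tags in records:
        tags = frozenset(tags)  # hashable, order-free tag set
        if len(tags) <= max_set_size:
            by_user[user].append((timestamp, tags))

    sequences = {}
    for user, items in by_user.items():
        items.sort(key=lambda item: item[0])  # oldest question first
        if len(items) >= min_sets:
            sequences[user] = [tags for _, tags in items]
    return sequences


# Toy records, invented for illustration only (not rows from the dataset).
records = [("u1", t, {"ag.algebraic-geometry", "nt.number-theory"}) for t in range(12)]
records += [("u2", t, {"lo.logic"}) for t in range(3)]
seqs = build_sequences(records)
print(sorted(seqs), len(seqs["u1"]))  # ['u1'] 12
```

Here "u2" is dropped because it has only 3 sets. Since Stack Exchange allows at most 5 tags per question, the size filter should rarely remove anything in practice.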
- number of sequences: 1,594
- number of unique elements appearing in sets: 1,399
- number of sets: 44,950
- number of unique sets: 24,157
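The four statistics listed above can be computed from an in-memory collection of sequences roughly as below. This sketch assumes a hypothetical layout where sequences is a list of sequences and each sequence is a list of frozensets of tags. It does not read the dataset's file format, and the function name basic_stats is illustrative only.

```python
def basic_stats(sequences):
    """Summary statistics for sequences of sets.

    Assumes `sequences` is an iterable of sequences, each a list of
    frozensets (a hypothetical in-memory layout, not the file format).
    For a dict mapping user -> sequence, pass its .values().
    """
    sequences = list(sequences)
    all_sets = [s for seq in sequences for s in seq]
    return {
        "num_sequences": len(sequences),
        # distinct tags appearing in any set
        "num_unique_elements": len(set().union(*all_sets)),
        "num_sets": len(all_sets),
        # frozensets compare by content, so identical tag sets collapse
        "num_unique_sets": len(set(all_sets)),
    }


# Toy data, invented for illustration only.
toy = [
    [frozenset({"gr.group-theory", "rt.representation-theory"}), frozenset({"gr.group-theory"})],
    [frozenset({"gr.group-theory", "rt.representation-theory"}), frozenset({"lo.logic"})],
]
print(basic_stats(toy))
# {'num_sequences': 2, 'num_unique_elements': 3, 'num_sets': 4, 'num_unique_sets': 3}
```

Using frozenset rather than a sorted tuple makes two questions with the same tags in a different order count as the same set.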
Reference:
- Austin R. Benson, Ravi Kumar, and Andrew Tomkins. Sequences of sets.
  Proceedings of KDD, 2018.