Evaluation methods for unsupervised word embeddings
Query inventory
The following archive contains the query inventory used in our online experiments in TSV (tab-separated values) format.
Judgements
The following archive contains all user judgements in TSV (tab-separated values) format.
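Since both the query inventory and the judgements are distributed as tab-separated files, a generic reader is enough to inspect them. The snippet below is a minimal sketch in Python, not part of the original release: the helper name read_tsv and the in-memory sample are hypothetical, and no assumption is made about the number or meaning of the columns in the actual files.

```python
import csv
import io


def read_tsv(handle):
    """Yield each row of a tab-separated file as a list of strings."""
    # QUOTE_NONE keeps quote characters as literal text, which is the
    # safer choice when the quoting convention of a file is unknown.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


# Hypothetical in-memory sample standing in for a file from the archive.
sample = io.StringIO("a\tb\tc\n1\t2\t3\n")
rows = list(read_tsv(sample))
```

For a file on disk, the same function can be given a handle from open(path, newline="", encoding="utf-8"); the UTF-8 encoding is an assumption and should be checked against the extracted files.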
Embeddings
This archive (780 MB) contains all embedding files used in our experiments. Please also read the README file inside the archive.
Wikipedia dump
The archive below (1.3 GB) contains the tokenized and lowercased Wikipedia dump of Jan 3, 2008 used to train all embedding models.