Evaluation methods for unsupervised word embeddings

All data and code are released under the Creative Commons BY-NC license.

Query inventory

The following archive contains the query inventory used in our online experiments, in TSV (tab-separated values) format.

Judgements

The following archive contains all user judgements, in TSV (tab-separated values) format.
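Since both archives above use tab-separated files, a minimal Python sketch for loading such a file may be useful. The column layout is not described on this page, so the sample rows and the `read_tsv` helper name below are purely illustrative assumptions; the code only splits each line into its fields and does not interpret them.

```python
import csv
import io


def read_tsv(handle):
    """Yield each non-empty row of a tab-separated file as a list of strings."""
    # QUOTE_NONE treats quote characters as ordinary text, which is the
    # usual convention for plain TSV files.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:  # skip blank lines
            yield row


# Hypothetical in-memory sample; the real files may have different columns.
sample = io.StringIO("query\tcandidate\nking\tqueen\n")
rows = list(read_tsv(sample))
```

For a file extracted from an archive, the same helper can be given a handle opened with `open(path, newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption and should be checked against the files themselves.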

Embeddings

This archive (780 MB) contains all embedding files used in our experiments. Please also read the README file inside the archive.

Wikipedia dump

The archive below (1.3 GB) contains the tokenized and lowercased Wikipedia dump of January 3, 2008, which was used to train all embedding models.