Cornell Movie-Dialogs Corpus
Distributed together with: Chameleons in Imagined Conversations.
Data and Code available in ConvoKit: a toolkit for analyzing conversations
Related corpus: Cornell Movie-Quotes Corpus
DESCRIPTION:
This corpus contains a large metadata-rich collection of fictional conversations extracted from raw movie scripts:
- 220,579 conversational exchanges between 10,292 pairs of movie characters
- involves 9,035 characters from 617 movies
- in total 304,713 utterances
- movie metadata included:
    - genres
    - release year
    - IMDB rating
    - number of IMDB votes
- character metadata included:
    - gender (for 3,774 characters)
    - position on movie credits (for 3,321 characters)
- see the documentation for details
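The movie metadata fields listed above can be read into a small record type. The following is a minimal sketch, not the corpus's official loader. It assumes the raw metadata file stores one movie per line with fields joined by the delimiter ` +++$+++ `, in the order ID, title, release year, IMDB rating, IMDB votes, genres; check the distributed documentation before relying on this layout. The names `MovieMetadata` and `parse_movie_metadata`, and the sample line, are illustrative.

```python
import ast
from dataclasses import dataclass
from typing import List

# Assumed field delimiter of the raw corpus files (verify against the documentation).
SEPARATOR = " +++$+++ "


@dataclass
class MovieMetadata:
    """Hypothetical record holding the movie metadata fields listed above."""
    movie_id: str
    title: str
    release_year: str  # kept as text in case a year carries a non-numeric suffix
    imdb_rating: float
    imdb_votes: int
    genres: List[str]


def parse_movie_metadata(line: str) -> MovieMetadata:
    """Parse one metadata line under the assumed six-field layout."""
    fields = line.rstrip("\n").split(SEPARATOR)
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    movie_id, title, year, rating, votes, genres = (f.strip() for f in fields)
    return MovieMetadata(
        movie_id=movie_id,
        title=title,
        release_year=year,
        imdb_rating=float(rating),
        imdb_votes=int(votes),
        # Genres are assumed to be written as a Python-style list literal.
        genres=list(ast.literal_eval(genres)),
    )


# Illustrative line in the assumed format, not guaranteed to match the file verbatim.
sample = (
    "m0 +++$+++ 10 things i hate about you +++$+++ 1999"
    " +++$+++ 6.90 +++$+++ 62847 +++$+++ ['comedy', 'romance']"
)
movie = parse_movie_metadata(sample)
print(movie.title, movie.release_year, movie.genres)
```

Using `ast.literal_eval` rather than `eval` keeps the genre parsing limited to literals, which is the safer choice when reading untrusted text files.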
BibTeX ENTRY:
@InProceedings{Danescu-Niculescu-Mizil+Lee:11a,
author={Cristian Danescu-Niculescu-Mizil and Lillian Lee},
title={Chameleons in imagined conversations:
A new approach to understanding coordination of linguistic style in dialogs.},
booktitle={Proceedings of the
Workshop on Cognitive Modeling and Computational Linguistics, ACL 2011},
year={2011}
}
This material is based upon work supported in part by the National Science Foundation under grant IIS-0910664.
Any opinions, findings, and conclusions or recommendations expressed above are those of the author(s) and do
not necessarily reflect the views of the National Science Foundation.