More and more of life is now manifested online, and many of the digital traces that are
left by human activity are increasingly recorded in natural-language format.
This research-oriented course examines the opportunities for natural language
processing to contribute to the analysis and facilitation of socially embedded processes. Possible topics include conversation modeling, analysis of group and sub-group language, language and social relations, and persuasion and other causal effects of language.
Enrollment, prerequisites, related classes
Enrollment: Limited to [[PhD and [CS MS] students] who meet the prerequisites]; PhD students not in CS/INFO will receive manual instructor permission to enroll (details to be arranged at lecture). Auditing (either officially or unofficially) is not permitted. These policies are to enable class meetings to be heavily discussion-focused.
Prerequisites: All of the following: (1) CS 2110 or equivalent programming experience; (2) a course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning, Cornell CS courses numbered 47xx or 67xx); (3) proficiency with using machine learning tools (e.g., fluency at training an SVM or other classifier, comfort with assessing a classifier’s performance using cross-validation).
Please take a look at the contents of some of the papers on this quick list of sample papers (URLs should be clickable) before deciding on enrollment; if most of them seem completely impenetrable (or uninteresting), this class may not be the right fit for you.
Zoom. Only accessible to enrolled students, and only meant for cases of illness, travel, and emergency. Notify the instructor ahead of time for each lecture you need to zoom-attend.
CMS. Site for submitting assignments, unless otherwise noted. Log in with NetID credentials and select course CS 6742.
You may find this graphically-oriented guide to common operations useful: see how to replace a prior submission; how to tell if CMS successfully received your files; how to form a group.
Office hours and contact info
See Prof. Lee's homepage and scroll to the section on Contact and availability info.
In-class presentations (exact number depends on number of students enrolled and the difficulty of the papers we tackle). These may involve meeting with the instructor beforehand.
On days when another student is presenting, all non-presenting students are expected to prepare for class by at least skimming the abstract and intro of the paper(s) to be presented.
Participation in discussion, either during class meetings or offline
Occasional small exercises on lecture material
Midterm paper that reviews and critically analyzes the class material. [full instructions]
Final paper that reviews and critically analyzes the class material.
Policies
Use of AI generation/editing systems: For each component of the workload, the vast majority of the intellectual work must be originated by you, not by text generation systems. It is OK to use aids for writing fluency --- but note that writing fluency is not part of the assessment rubrics above anyway.
Example of something that is allowed: you write the initial draft(s), review its contents and double-check with the original paper. You then use some form of text generation system to proofread and improve the flow. You do not use the system’s output to add extra content.
Example of something that is definitely not allowed: You essentially use a text generation system to generate an early draft, even if you later post-edit and correct the output.
Example of something that is OK but requires special treatment: You start with the procedure in point 1. But, the system output includes good points that you hadn’t thought of before, or makes you realize that a point you had made isn’t quite right.
You may include the new material and/or make appropriate edits, but you should mention what specific system(s) you used and what changes you made based on it.
Attendance: Please attend in person every class meeting that you reasonably can.
If in-person attendance isn’t a reasonable option for a given class meeting, please contact the instructor ahead of time.
Illness is always a valid reason to not attend and is not held against participation accounting, but please let me know that illness is the issue.
Zoom attendance is available, but is only accessible to enrolled students, and only meant for cases of illness, travel, and emergency. Notify the instructor ahead of time for each lecture you need to zoom-attend.
Deadlines: We do not have slip days, and there is no "you can submit late for a small penalty": you need to hit the deadlines. But if there are extenuating circumstances, please email the instructor and we can talk. (Still submit what you have before the deadline, so we have an indication of your progress at that point.)
SDS accommodations: The instructor(s) have online access to SDS letters regarding accommodations for exams and other course matters, and will honor these accommodations. As recommended by the SDS office, we do ask that for each deadline, you let the instructor know beforehand in a timely fashion whether you wish to apply your accommodations.
Academic integrity
Claiming the work of others as your own is intellectual fraud and a violation of academic integrity. To avoid this, always track and credit your sources appropriately.
Liberman, Mark. Debate words (Fox News Republican presidential debate) 2023. Liberman's Language Log blog post also links to his previous analyses of other data using Monroe et al.'s technique.
Hessel, Jack (who took this class!). FightingWords. In Python.
Lim, Kenneth (who took this class!). fightin-words. Compliant with scikit-learn and distributed via PyPI; borrows (with acknowledgment) from Jack's version.
Marzagão, Thiago. mcq.py. "Because this script processes one file at a time, it can handle corpora that are too large to fit in memory".
Fitch, W. Tecumseh. 2007. An Invisible Hand. Nature 449 (7163): 665--667. https://doi.org/10.1038/449665a.
Hamilton, William L., Jure Leskovec, and Dan Jurafsky. 2016. Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). https://doi.org/10.18653/v1/P16-1141.
Noble, Bill, Asad Sayeed, Raquel Fernández, and Staffan Larsson. 2021. Semantic Shift in Social Networks. *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics, 26–37.
#13 Mar 5: Conversation II: the Grosz and Sidner '86 theory of discourse
Garry Kasparov, Maurice Ashley, Yasser Seirawan and a bunch of soft drinks during the period of the 1996 match against Deep Blue.
Photo by Kenneth Thompson, provided at computerhistory.org
Yuan, Jiaqing, and Munindar P. Singh. 2023. Conversation Modeling to Predict Derailment. International AAAI Conference on Web and Social Media (ICWSM) 17: 926–35. https://doi.org/10.1609/icwsm.v17i1.22200. Observation: there are some notable differences between the proceedings version and the arXiv version; the latter makes it clear that the authors did not create the datasets used.
#18 Mar 21: Reflections on intention-recognition and conversation-trajectories papers presented
Bryan, Christopher J., Gregory M. Walton, Todd Rogers, and Carol S. Dweck. 2011. Motivating Voter Turnout by Invoking the Self. Proceedings of the National Academy of Sciences 108 (31): 12653–56. https://doi.org/10.1073/pnas.1103343108.
Response to followup: "What is an authentic replication attempt and what is not? Gerber et al.’s paper ... gives us the opportunity to reflect on this issue of longstanding concern to us." Bryan, Christopher J., Gregory M. Walton, and Carol S. Dweck, Oct 18, 2016. Psychologically authentic versus inauthentic replication attempts. Proceedings of the National Academy of Sciences 113(43): E6548.
Response: "Although we find Bryan et al.’s ... explanation unconvincing, this exchange is well-timed. The original findings have (to our knowledge) never been successfully replicated, and this November provides ample opportunity to test noun vs. verb in the political environment Bryan et al. ... suggest is ideal for producing 11–14 percentage-point effects." Gerber, Alan S., Gregory A. Huber, Daniel R. Biggers, and David J. Hendry, Oct 25, 2016. Reply to Bryan et al.: Variation in context unlikely explanation of nonrobustness of noun versus verb results. Proceedings of the National Academy of Sciences 113(43): E6549--E6550.
May 16 (Th), 4:30pm, as determined by the registrar: Final paper due. [instructions]
Code for generating the calendar formatting adapted from Andrew Myers. Portions of the content of this website and course were created by collaboration between Cristian Danescu-Niculescu-Mizil and Lillian Lee over multiple runnings of this course.