CS/IS 6742, Spring 2024: Natural Language Processing and Social Interaction. Prof. Lillian Lee. Tu/Th 1:25-2:40pm, Phillips 213.
Image source: http://en.wikipedia.org/wiki/The_School_of_Athens

Main

More and more of life is now manifested online, and many of the digital traces that are left by human activity are increasingly recorded in natural-language format. This research-oriented course examines the opportunities for natural language processing to contribute to the analysis and facilitation of socially embedded processes. Possible topics include conversation modeling, analysis of group and sub-group language, language and social relations, persuasion and other causal effects of language.

The sections below give information about enrollment/prerequisite policies, administrative info, overall course structure, resources, and so on.

Enrollment, prerequisites, related classes

Enrollment: Limited to [[PhD and [CS MS] students] who meet the prerequisites]; PhD students not in CS/INFO will receive manual instructor permission to enroll (details to be arranged at lecture). Auditing (either officially or unofficially) is not permitted. These policies are to enable class meetings to be heavily discussion-focused.

Prerequisites: All of the following: (1) CS 2110 or equivalent programming experience; (2) a course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning, Cornell CS courses numbered 47xx or 67xx); (3) proficiency with using machine learning tools (e.g., fluency in training an SVM or other classifier, comfort with assessing a classifier’s performance using cross-validation).

Please take a look at the contents of some of the papers on this quick list of sample papers (URLs should be clickable) before deciding on enrollment; if most of them seem completely impenetrable (or uninteresting), this class may not be the right fit for you.

Related classes: see Cornell's NLP course list.

In particular, Spring 2024 courses CS 6741 Topics in natural language processing and machine learning, CS 5740 Natural language processing (Cornell Tech students only), INFO 4940-LEC 006 Advanced NLP for Humanities Research, CS 4744 (and other crosslists) Computational linguistics I, or CS/IS 4300 Language and information may be a better choice for you; they are excellent courses for sure!

Other classes I am less knowledgeable about: SOC 6520 Culture wars in the age of tribal politics, GOVT 3282 Data science applications in political and social research.

The webpage from the last time I (Prof. Lee) taught this class may be useful, as might the webpage from the last time I taught a graduate NLP course.

Links, office hours

Websites

Office hours and contact info

See Prof. Lee's homepage and scroll to the section on Contact and availability info.

Coursework, policies (that aren't enrollment-related)

Coursework

  1. In-class presentations (exact number depends on number of students enrolled and the difficulty of the papers we tackle). These may involve meeting with the instructor beforehand.
    • For days where another student is presenting: all non-presenting students are expected to prepare for class by at least skimming the abstract and intro of the paper(s) to be presented
  2. Participation in discussion, either during class meetings or offline
  3. Occasional small exercises on lecture material
  4. Midterm paper that reviews and critically analyzes the class material. (Full instructions.)
  5. Final paper that reviews and critically analyzes the class material.

Policies

  1. Use of AI generation/editing systems: For each component of the workload, the vast majority of the intellectual work must be originated by you, not by text generation systems. It is OK to use aids for writing fluency --- but note that writing fluency is not part of the assessment rubrics above anyway.
    1. Example of something that is allowed: You write the initial draft(s), review the contents, and double-check them against the original paper. You then use some form of text generation system to proofread and improve the flow. You do not use the system’s output to add extra content.
    2. Example of something that is definitely not allowed: You essentially use a text generation system to generate an early draft, even if you later post-edit and correct the output.
    3. Example of something that is OK but requires special treatment: You start with the procedure in point 1, but the system output includes good points that you hadn’t thought of before, or makes you realize that a point you had made isn’t quite right.
      • You may include the new material and/or make appropriate edits, but you should mention what specific system(s) you used and what changes you made based on it.
  2. Attendance: Please attend all class meetings in person that you are reasonably able to. If in-person attendance isn’t a reasonable option for a given class meeting, please contact the instructor ahead of time.
    • Illness is always a valid reason to not attend and is not held against participation accounting, but please let me know that illness is the issue.
    • Zoom attendance is available, but is only accessible to enrolled students, and is only meant for cases of illness, travel, and emergency. Notify the instructor ahead of time for each lecture you need to attend via Zoom.
  3. Deadlines: We do not have slip days, and there is no "you can submit late for a small penalty": you need to hit the deadlines. But if there are extenuating circumstances, please email the instructor and we can talk. (Still submit what you have before the deadline, so we have an indication of your progress at that point.)
  4. SDS accommodations: The instructor(s) have online access to SDS letters regarding accommodations for exams and other course matters, and will honor these accommodations. As recommended by the SDS office, we do ask that for each deadline, you let the instructor know beforehand in a timely fashion whether you wish to apply your accommodations.
  5. Academic integrity

    Claiming the work of others as your own is intellectual fraud and a violation of academic integrity. To avoid this, always track and credit your sources appropriately.

    Each student in this course is expected to abide by the Cornell University Code of Academic Integrity. The Dean of the Faculty’s page has more information on the Code and related procedures: https://theuniversityfaculty.cornell.edu/dean/academic-integrity/

 

Lectures

#1 Jan 23: Introduction

Lecture

Lecture references and further reading

#2 Jan 25: Getting to know each other; easing into paper readings

Assignments/announcements

  • Annotation of the "No country" paper due on Perusall by midnight Wed Jan 31. See slides for details.

Lecture

Lecture references and further reading

#3 Jan 30: Exploring differences between two language samples: "Fightin' Words"

Lecture

Lecture references and further reading

Implementations

  • ConvoKit implementation, based on prior code from Jack Hessel's implementation and Xanda Schofield's visualizer
  • Denny, Matt. SpeedReader. In R.
  • Hessel, Jack (who took this class!). FightingWords. In Python.
  • Lim, Kenneth (who took this class!). fightin-words. Compliant with scikit-learn and distributed via PyPI; borrows (with acknowledgment) from Jack's version.
  • Marzagão, Thiago. mcq.py. "Because this script processes one file at a time, it can handle corpora that are too large to fit in memory".
  • Silge, Julia, Alex Hayes, Tyler Schnoebelen. tidylo: Weighted Tidy Log Odds Ratio. In R.
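
For orientation, here is a minimal sketch of the kind of statistic these implementations compute: a log-odds ratio between two samples, smoothed by a Dirichlet prior and scaled to a z-score. It is not taken from any of the packages listed above. The function name log_odds_z_scores, the toy sentences, and the choice of the pooled counts as the prior are all assumptions made for illustration.

```python
import math
from collections import Counter


def log_odds_z_scores(counts_a, counts_b, prior_counts):
    """Return {word: z-score}; positive values lean toward sample A.

    Hypothetical helper for illustration. Every word in prior_counts must
    have a positive prior count, and the vocabulary needs at least two words.
    """
    vocab = list(prior_counts)
    n_a = sum(counts_a.get(w, 0) for w in vocab)  # tokens in sample A
    n_b = sum(counts_b.get(w, 0) for w in vocab)  # tokens in sample B
    a0 = sum(prior_counts[w] for w in vocab)      # total prior mass

    scores = {}
    for w in vocab:
        a_w = prior_counts[w]
        y_a = counts_a.get(w, 0) + a_w  # smoothed count in A
        y_b = counts_b.get(w, 0) + a_w  # smoothed count in B
        # Difference of smoothed log-odds of the word in A versus in B.
        delta = (math.log(y_a / (n_a + a0 - y_a))
                 - math.log(y_b / (n_b + a0 - y_b)))
        # Approximate variance of that difference.
        variance = 1.0 / y_a + 1.0 / y_b
        scores[w] = delta / math.sqrt(variance)
    return scores


# Toy data (made up for this example).
counts_a = Counter("we must cut taxes and cut spending".split())
counts_b = Counter("we must protect rights and protect choice".split())
prior = counts_a + counts_b  # assumption: pooled counts serve as the prior

scores = log_odds_z_scores(counts_a, counts_b, prior)
for word, z in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:10s} {z:+.3f}")
```

Words used equally often in both samples (such as "we" here) score zero, while "cut" and "protect" land at opposite ends of the ranking. On real corpora the prior would typically come from a larger background collection, which is a design choice rather than something this sketch prescribes.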
#4 Feb 1: Distances between language sources
plot of the behavior of different distributional difference functions
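
As a rough illustration of what a "distributional difference function" can look like, the sketch below compares two unigram distributions with KL divergence and Jensen-Shannon divergence. Which functions the lecture actually compares is not stated on this page, so the choice of these two is an assumption, as are the helper names and the toy sentences.

```python
import math
from collections import Counter


def to_distribution(tokens, vocab):
    """Maximum-likelihood unigram distribution over a shared vocabulary."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab)
    return {w: counts[w] / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) in bits; infinite if q has no mass where p has some."""
    total = 0.0
    for w, p_w in p.items():
        if p_w == 0.0:
            continue  # terms with p(w) = 0 contribute nothing
        q_w = q.get(w, 0.0)
        if q_w == 0.0:
            return math.inf
        total += p_w * math.log2(p_w / q_w)
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and at most 1."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy data (made up for this example).
tokens_a = "the cat sat on the mat".split()
tokens_b = "the dog sat on the log".split()
vocab = sorted(set(tokens_a) | set(tokens_b))

p = to_distribution(tokens_a, vocab)
q = to_distribution(tokens_b, vocab)

kl_pq = kl_divergence(p, q)
js_pq = js_divergence(p, q)
print("KL(p || q):", kl_pq)
print("JS(p, q):  ", js_pq)
```

The two functions behave quite differently on the same inputs: unsmoothed KL divergence is infinite as soon as one source uses a word the other never does, whereas Jensen-Shannon divergence stays finite and symmetric.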

Lecture

Lecture references and further reading

#5 Feb 6: "No country for old members"

Lecture

Lecture references and further reading

#6 Feb 8: Breezy intro to semantic shift

Assignments/announcements

  • Assignment 2: presentation/annotation of semantic shift papers: schedule and instructions posted.

Lecture

Lecture references and further reading

#7 Feb 13: Semantic shift II

Assignments/announcements

  • Assignment 3 "Fightin' words" announced: Ed post due and presentations on Th Feb 22. Details in slides.

Lecture

Lecture references and further reading

#8 Feb 15: Semantic shift: presentations by PH and BW.

Lecture

Lecture references and further reading

#9 Feb 20: Semantic shift: Presentations by DK, HK, and TW

Lecture

Lecture references and further reading

#10 Feb 22: Fightin' words presentations

Lecture

  • Slides are on Ed Discussion. Recording (only accessible to enrolled students)
#11 Feb 27: No class: Feb break. Keeping the lecture number so that even lecture numbers remain Thursdays.
#12 Feb 29: Conversation I

Assignments/announcements

Lecture

Lecture references and further reading

#13 Mar 5: Conversation II: the Grosz and Sidner '86 theory of discourse


Garry Kasparov, Maurice Ashley, Yasser Seirawan and a bunch of soft drinks during the period of the 1996 match against Deep Blue. Photo by Kenneth Thompson, provided at computerhistory.org

Lecture

Lecture references and further reading

#14 Mar 7: Conversation III: Conversational trajectories

Lecture

Lecture references and further reading

#15 Mar 12: Presentations by KL and AM

Lecture

Lecture references and further reading

#16 Mar 14: Presentations by FH, MM, YW

Assignments/announcements

  • Full midterm-paper instructions released
  • Policies on academic integrity, use of AI generation/editing systems posted on course webpage

Lecture

Lecture references and further reading

#17 Mar 19: Presentations by AB and EF

Lecture

Lecture references and further reading

#18 Mar 21: Reflections on intention-recognition and conversation-trajectories papers presented

Lecture

Lecture references and further reading

#19 Mar 26: Midterm consultations

Assignments/announcements

  • Midterm paper due, 11:59pm (date moved; see the Fri Mar 29 entry)
#20 Mar 28: Midterm consultations
Fri Mar 29: midterm paper due 11:59pm on CMSX. [instructions]
Apr 2: No class — Spring break
Apr 4: No class — Spring break
#21 Apr 9: (Cancelled: out sick)
#22 Apr 11: Community-specific controversy prediction with early comment trees

Lecture

Lecture references and further reading

#23 Apr 16: NLP and causal inference: an example paper

Assignments/announcements

Lecture

Lecture references and further reading

#24 Apr 18: Polarization presentations (A6 part 1)

Lecture

Lecture references and further reading

#25 Apr 23: Paper presentations (A6 part 2)

Lecture

Lecture references and further reading

#26 Apr 25: Paper presentations (A6 part 3)

Lecture

Lecture references and further reading

#27 Apr 30: Paper presentations (A6 part 4)

Lecture

Lecture references and further reading

#28 May 2: Paper presentations (A6 part 5)

Lecture

Lecture references and further reading

#29 May 7: Paper presentations (A6 part 6)

Assignments/announcements

Lecture

Lecture references and further reading

May 16 (Th), 4:30pm, as determined by the registrar: Final paper due. [instructions]

Code for generating the calendar formatting adapted from Andrew Myers. Portions of the content of this website and course were created by collaboration between Cristian Danescu-Niculescu-Mizil and Lillian Lee over multiple runnings of this course.