Assignment 1
(updates will be posted on Piazza).
Task: Propose a research idea related to one
of the readings below and execute a pilot empirical study using one of the
listed datasets. Most crucial is that (a) your idea
is interesting, and (b) your pilot empirical study demonstrates that you can
quickly evaluate feasibility and estimate the chances of an interesting result.
It is neither required nor expected that your proposal for this assignment
will relate to your final course project.
Please strive to post your initial ideas well in advance of the actual due date
(a suggested goal: Tuesday Aug. 28, 11:59pm). Posting early (a) gives your classmates time to read your proposal and post feedback, and (b) since you are encouraged to work in groups, makes it easier to link up with classmates who have similar interests.
After posting your proposal, continue to monitor and participate on the course discussion site.
After all, your classmates have read the same papers and are using the same data,
so we have a lot of common ground.
Example things to post: feedback on other people's proposals;
some oddity of the datasets you've found that is worth alerting others to;
unexpected early results that are interesting or that you need help interpreting.
Basically, I would like us all to act as a team; we're all in this together!
The two required readings
- Excerpts from anaesthetica's “Attacked from within”,
2009.
- Justine Zhang, Ravi Kumar, Sujith Ravi, and Cristian Danescu-Niculescu-Mizil, 2016.
Conversational flow in Oxford-style debates.
NAACL, pp.136–141.
These readings were chosen because they are thought-provoking, accessible, short,
and together represent a wide range of possibilities.
The two datasets — you are required to use one.
- Cornell ChangeMyView data, November 2016 version
- README for the January 2016 version — still mostly applicable, since the file format did not change.
- Discussion and example code
- Optional reading: the original paper
in which this dataset was introduced, Chenhao Tan, Vlad Niculae, Cristian Danescu-Niculescu-Mizil, and Lillian Lee, 2016,
Winning arguments: Interaction dynamics and persuasion strategies in good-faith online discussions,
WWW, pp. 613–624.
- Miscellaneous notes: the reason I chose this dataset, rather than
the dataset associated with the Zhang et al. reading, is that it has more types of information in it, and
so might be conducive to a wider variety of exploratory projects.
- Reddit coarse discourse dataset
- README and annotation instructions containing a description of the provided labels
- Note that the content of the threads needs to be retrieved; fortunately, a script for doing that is provided (you will need to have a Reddit account). Even more fortunately, your colleague Jonathan P. Chang has already run the script and courteously shared the data (now on Piazza). Note that some comments might not be retrievable, as they might have been deleted in the meantime.
- Optional reading: the original paper
in which this dataset was introduced, Amy Zhang, Bryan Culbertson, and Praveen Paritosh, 2017,
Characterizing Online Discussion Using Coarse Discourse Sequences,
ICWSM.
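Since some comments may no longer be retrievable, a quick first check on the coarse discourse data is to count how many posts actually have text and how the labels are distributed. The following is a minimal sketch, not the dataset's official loader: it assumes one JSON thread per line, and the field names `posts`, `majority_type`, and `body` are assumptions that should be checked against the README and the shared data.

```python
import json
from collections import Counter


def summarize_threads(lines):
    """Tally majority labels and count posts whose text is missing.

    Assumed (hypothetical) layout: each line is a JSON object with a
    "posts" list; each post may carry "majority_type" and "body".
    """
    label_counts = Counter()
    missing_text = 0
    total_posts = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        thread = json.loads(line)
        for post in thread.get("posts", []):
            total_posts += 1
            # Posts without an agreed-upon label get a placeholder key.
            label_counts[post.get("majority_type", "no_majority")] += 1
            # A missing or empty body suggests the comment was not retrieved.
            if not post.get("body"):
                missing_text += 1
    return label_counts, missing_text, total_posts


# Tiny made-up sample standing in for lines read from the data file.
sample = [
    json.dumps({"url": "t1", "posts": [
        {"id": "a", "majority_type": "question", "body": "How does this work?"},
        {"id": "b", "in_reply_to": "a", "majority_type": "answer"},
    ]}),
    json.dumps({"url": "t2", "posts": [
        {"id": "c", "body": "An announcement."},
    ]}),
]
labels, missing, total = summarize_threads(sample)
```

With a real file, the same function can be called on an open file handle, since it only iterates over lines.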
Data format
If you program in Python 3, you are strongly encouraged to transform the dataset you are working with into ConvoKit
format. This will allow you to (a) directly use the
ConvoKit functionality; (b) share code with (future) teammates and other groups; (c) contribute to ConvoKit.
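As a rough illustration of what such a transformation involves, the sketch below reshapes flat comment records into utterance-like dictionaries with an id, a speaker, a reply-to pointer, a conversation id, and text. It deliberately uses plain Python rather than ConvoKit calls; the input field names (`id`, `author`, `parent_id`, `text`) and the function name are hypothetical, and the ConvoKit documentation should be consulted for the actual classes and required fields.

```python
def to_utterance_records(comments):
    """Convert flat comment dicts into utterance-like dicts.

    Input fields ("id", "author", "parent_id", "text") are assumptions
    made for this sketch, not the format of either dataset.
    """
    by_id = {c["id"]: c for c in comments}

    def root_of(comment_id):
        # Walk up the reply chain until there is no known parent.
        seen = set()
        current = comment_id
        while True:
            parent = by_id[current].get("parent_id")
            if parent is None or parent not in by_id or current in seen:
                return current
            seen.add(current)
            current = parent

    records = []
    for c in comments:
        parent = c.get("parent_id")
        records.append({
            "id": c["id"],
            "speaker": c.get("author", "[unknown]"),
            "conversation_id": root_of(c["id"]),
            # Drop pointers to parents that are absent (e.g., deleted).
            "reply_to": parent if parent in by_id else None,
            "text": c.get("text", ""),
        })
    return records


# Made-up example: c4 replies to a comment that is not in the data.
comments = [
    {"id": "c1", "author": "alice", "parent_id": None, "text": "My view is X."},
    {"id": "c2", "author": "bob", "parent_id": "c1", "text": "Have you considered Y?"},
    {"id": "c3", "author": "alice", "parent_id": "c2", "text": "Good point."},
    {"id": "c4", "author": "carol", "parent_id": "gone", "text": "Replying to a deleted comment."},
]
records = to_utterance_records(comments)
```

One design choice to note: a comment whose parent is missing is treated here as the root of its own conversation. Depending on your research question, you may prefer to keep the original thread identifier instead.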
Collaboration
Teamwork is encouraged.
Groups of any size can be formed, where each group jointly submits a single project report
at the end on the official course management system, CMS. However, each individual remains
individually responsible for posting feedback on other people's/groups' proposals.
There are further notes on how to find/work as a group below.
Due dates
All deadlines refer to 5:00pm unless otherwise specified.
- Monday Aug. 27:
- Enroll on the course Piazza page http://piazza.com/cornell/Fall2018/cs6742. The Piazza password will be provided on the first day of class (and later listed in the CMS class description).
- Wednesday Aug. 29, 2:30pm (Note the earlier-than-5pm deadline, and, as mentioned in the "Task" description above, aim for an earlier date of Tuesday Aug. 28, 11:59pm):
- Before completing this you need to have read the readings and have peeked at the data. The main purpose of this initial round of ideas is to fuel a round of guided brainstorming that will take place in class on Thursday Aug 30 and to help with group formation.
- Post study idea(s) to Piazza using the folder (topic) "Assignment 1". Post it as an individual "Question" (not a "Note"), which makes it easier to track whether a reply has been posted yet. Choose a title for your "Question" that describes your project idea (e.g., "identifying common challenger strategies" as opposed to "four random ideas"). The length expectation is 3+ paragraphs; these paragraphs don't have to be long. Be as detailed as possible while remaining sensible; make connections to the readings and to specifics of the dataset.
- If persons A, B, and C have already decided to work together, then A should post the "Question", and B and C should each individually post a response to A's "Question" stating that they've agreed to work together. This way, it is clear who has finished this part of the assignment.
- If, subsequently, D and E want to join forces with A, B and C because your proposals are similar, please arrange to do so among yourselves. The deadline for CMS group formation is a bit later than the proposal submission deadline precisely to allow for this possibility.
- Friday August 31: form groups on CMS. CMS group formation requires invitations and acceptance of invitations via the system, i.e., action by two people per person added; please check the official CMS documentation or this more graphically-oriented guide for instructions. The group information from CMS is needed to schedule the group presentations.
- Thursday Sept. 6, before class:
- Check back on Piazza
for any comments on your proposal, and add, as replies, any suggestions
you have on other people's proposals. Ideally, you will continually
monitor the site for updates to your or other people's proposals.
- Be prepared to informally discuss in class how things are going. For example, any preliminary observations about the data? No formal presentation materials are required.
- Monday Sept. 17: Submit a project report on CMS. One group = one CMS submission: any
group member can upload a version, which will overwrite any previous versions
by any other members of the group.
Required information: (a) the overall research problem you proposed; (b) relation
of your research problem to the reading(s) (this description should provide
evidence that you read the relevant parts of the readings carefully);
(c) proposed techniques; steps employed to process/clean/select data;
(d) results (probably preliminary, possibly negative); (e) what you learned;
(f) a list of the roles that each member of the group played, if there is more than one person in your group.
(g) If you collaborated a bit with people outside your group, acknowledge those
other people by name and explain their contribution in the writeup.
- Thursday Sept. 20, in class: Group presentations. You can bring handouts (often most effective for discussions, since people can refer to things out of order) or project slides off a laptop. If the latter, bring a spare copy of your presentation on a flash drive and email a copy.