Goal
The goal of this task is to predict changes in the number of citations to
individual papers over time.
Timeline
The task and data will be available on April 6, 2003. Submissions are due by May
16, 2003, at 23:59 EST. Submission instructions are online.
Input
Contestants will be given:
1. The LaTeX source of all papers in the hep-th portion of the arXiv through
   March 1, 2003. For each paper, this includes the main .tex file but not
   separate include files or figures. It also includes the hep-th arXiv number
   as a unique ID.
2. The abstracts for all of the hep-th papers in the arXiv. For each paper, the
   abstract file contains:
   - arXiv submission date
   - revised date(s)
   - title
   - authors
   - abstract
3. The SLAC/SPIRES dates for all hep-th papers. Some older papers were uploaded
   years after their initial publication, so the arXiv submission date from the
   abstracts may not correspond to the publication date. An alternative date
   from SLAC/SPIRES is provided that may be a better estimate of the initial
   publication date of these older papers.
4. The complete citation graph for the hep-th papers, obtained from SLAC/SPIRES.
   Each node is labeled by its unique ID from (1). Note that revised papers
   may have updated citations. As such, citations may refer to future papers,
   i.e., a paper may cite another paper that was published after it.
Update May 12, 2003: An updated version
of the data for March and April of 2003 has been provided.
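Because the citation graph may contain citations to "future" papers, it can help to flag such edges before using the graph as training data. The following is only a minimal sketch: the in-memory layout (a list of (citing, cited) ID pairs and a dictionary of dates), the function name find_forward_citations, and the toy IDs and dates are all assumptions made for illustration, not the format of the distributed files.

```python
from datetime import date


def find_forward_citations(edges, pub_date):
    """Return the (citing, cited) edges whose cited paper is dated later
    than the citing paper. Edges with a missing date are skipped."""
    forward = []
    for citing, cited in edges:
        d_citing = pub_date.get(citing)
        d_cited = pub_date.get(cited)
        if d_citing is None or d_cited is None:
            continue  # no date available for one endpoint
        if d_cited > d_citing:
            forward.append((citing, cited))
    return forward


# Toy data (hypothetical IDs and dates, for illustration only).
pub_date = {
    "0001001": date(2000, 1, 3),
    "0105002": date(2001, 5, 2),
    "0211003": date(2002, 11, 5),
}
edges = [
    ("0105002", "0001001"),  # ordinary citation to an older paper
    ("0001001", "0211003"),  # cites a later paper, e.g. after a revision
    ("0211003", "0105002"),
]

forward = find_forward_citations(edges, pub_date)
print(forward)
```

Which date to use for each paper (the arXiv submission date or the SLAC/SPIRES date) is left to the caller here.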
Output
For each paper P in the collection, contestants should report the predicted
difference between
- the number of citations P will receive from hep-th papers submitted during the
  period May 1, 2003 - July 31, 2003, and
- the number of citations P will receive from hep-th papers submitted during the
  period February 1, 2003 - April 30, 2003.
(So if there were more citations during the period May 1, 2003 - July 31, 2003,
then the prediction should be a positive number.)
The format for the submission is a simple two-column vector of [arXiv id]
[difference] pairs, sorted by arXiv id.
Update May 6, 2003: This difference does not need to be an integer;
floating point numbers are valid predictions.
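As an illustration of the quantity being predicted and of the two-column format, here is a hedged sketch. The function names, the use of a single space as the column separator, and the choice to sort IDs as strings are assumptions; the official submission instructions take precedence.

```python
from datetime import date

# The two periods named in the task description (inclusive bounds).
BASELINE = (date(2003, 2, 1), date(2003, 4, 30))
TARGET = (date(2003, 5, 1), date(2003, 7, 31))


def count_in_period(citing_dates, period):
    """Count citing papers whose submission date falls within the period."""
    start, end = period
    return sum(1 for d in citing_dates if start <= d <= end)


def citation_difference(citing_dates):
    """Citations in May-July 2003 minus citations in February-April 2003."""
    return count_in_period(citing_dates, TARGET) - count_in_period(citing_dates, BASELINE)


def format_submission(predictions):
    """Render {arxiv_id: difference} as '[arxiv id] [difference]' lines,
    sorted by ID (as strings, which is an assumption)."""
    return "\n".join(f"{pid} {diff}" for pid, diff in sorted(predictions.items()))


# Toy example: submission dates of the papers citing one paper P.
citing_dates = [
    date(2003, 2, 10), date(2003, 3, 1),                    # baseline period
    date(2003, 5, 5), date(2003, 6, 1), date(2003, 7, 31),  # target period
    date(2003, 8, 1),                                       # outside both
]
diff = citation_difference(citing_dates)

# Hypothetical IDs; floating point predictions are allowed.
submission = format_submission({"0301002": -2.5, "0211003": diff})
print(submission)
```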
Evaluation
The target result is a vector V with one coordinate for each paper in the
initial collection (1) that receives at least 6 citations during the period
February 1, 2003 - April 30, 2003. The P-th coordinate of V will consist of the
true difference in number of citations for paper P.
Based on a contestant's predictions, a vector W will be constructed over the
same set of papers; the P-th coordinate of W will consist of the predicted
difference in number of citations for paper P.
The score of a prediction vector W will be equal to the L_1 difference between
the vectors V and W.
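The L_1 difference is the sum, over the evaluated papers, of |V_P - W_P|. A minimal sketch of that computation follows; the function name l1_score, the dictionary layout, and the treatment of a missing prediction as 0 are assumptions for illustration and not part of the official scoring procedure.

```python
def l1_score(true_diff, predicted_diff, baseline_count, min_citations=6):
    """Sum of |V_P - W_P| over papers with at least `min_citations`
    citations in the February-April 2003 period.

    A paper with no prediction is scored as if 0 had been predicted
    (an assumption made for this sketch)."""
    evaluated = [p for p, c in baseline_count.items() if c >= min_citations]
    return sum(abs(true_diff[p] - predicted_diff.get(p, 0.0)) for p in evaluated)


# Toy example with hypothetical paper labels.
baseline_count = {"A": 10, "B": 6, "C": 5}   # C has fewer than 6 citations
true_diff = {"A": 3, "B": -2, "C": 4}
predicted = {"A": 1.5, "B": -2, "C": 0}

score = l1_score(true_diff, predicted, baseline_count)
print(score)
```

In this toy case only A and B are evaluated, so the error on C does not affect the score.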