http://www.cs.cornell.edu/home/kleinber/sp05course.html
Overview
Information networks such as the World Wide Web are
characterized by the interplay between heterogeneous content
and a complex underlying link structure.
This course covers recent research on algorithms for analyzing
such networks, and models that abstract their basic properties.
Topics include combinatorial and probabilistic techniques for
link analysis, centralized and decentralized search algorithms,
network models based on random graphs, and connections with work
in the social sciences.
The course pre-requisites include background in algorithms
and graphs, as well as some
familiarity with probability and linear algebra.
The work for the course will consist primarily
of a short reaction paper and a more subtantial project.
The coursework is discussed
in more detail here.
Course Outline
(1) Small-World Properties in Networks
A major goal of the course is to illustrate
how networks across a variety of domains exhibit
common structure at a qualitative level.
One area in which this arises is in the study
of `small-world properties' in networks:
many large networks have short paths between most pairs of nodes,
even though they are highly clustered at a local level,
and they are searchable in the sense that one can
navigate to specified target nodes without global knowledge.
These properties turn out to provide insight into the
structure of large-scale social networks, and,
in a different direction, to have applications
to the design of decentralized peer-to-peer systems.
- Small-World Phenomena and Decentralized Seearch
-
S. Milgram.
The small world problem.
Psychology Today 1(1967).
-
J. Travers and S. Milgram.
An experimental study of the small world problem.
Sociometry 32(1969).
-
P. Killworth and H. Bernard,
Reverse small world experiment.
Social Networks 1(1978).
-
Watts, D. J. and S. H. Strogatz.
Collective dynamics of 'small-world' networks.
Nature 393:440-42(1998).
-
J. Kleinberg.
The small-world phenomenon: An algorithmic perspective.
Proc. 32nd ACM Symposium on Theory of Computing, 2000.
-
J. Kleinberg.
Small-World Phenomena and the Dynamics of Information.
Advances in Neural Information Processing Systems (NIPS) 14, 2001.
-
D. J. Watts, P. S. Dodds, M. E. J. Newman
Identity and Search in Social Networks.
Science, 296, 1302-1305, 2002.
-
J. Kleinfeld.
Could it be a Big World After All?
The `Six Degrees of Separation' Myth.
Society, April 2002.
-
Peter Sheridan Dodds, Roby Muhamad, Duncan J. Watts.
An Experimental Study of Search in Global Social Networks.
Science 301(2003), 827.
-
Duncan Watts and I discuss the small-world problem on NPR's
Science Friday (8 August 2003).
-
F. Menczer.
Growing and Navigating the Small World Web by Local Content.
Proc. Natl. Acad. Sci. USA 99(22): 14014-14019, 2002
-
Lada A. Adamic, Rajan M. Lukose, Amit R. Puniyani, Bernardo A. Huberman.
Search in Power-Law Networks.
Phys. Rev. E, 64 46135 (2001).
-
L. Adamic, E. Adar.
How To Search a Social Network.
submitted to Social Networks, 2004.
-
Lada A. Adamic and Bernardo A. Huberman
Information dynamics in a networked world.
in: Eli Ben-Naim, Hans Frauenfelder, Zoltan Toroczkai, (Eds.),
'Complex Networks', Lecture Notes in Physics, Springer, 2003.
-
A. Clauset and C. Moore
How Do Networks Become Navigable?
preprint at arxiv.org, 2003.
- Decentralized Search in Peer-to-Peer Networks
-
T. Hong.
Performance.
Peer-to-Peer: Harnessing the Power of Disruptive Technologies.
(A. Oram, editor),
O'Reilly and Associates, 2001.
-
A. Goel, H. Zhang, and R. Govindan.
Using the Small-World Model to Improve Freenet Performance.
IEEE Infocom, 2002.
-
C. Greg Plaxton, Rajmohan Rajaraman, Andrea W. Richa.
Accessing Nearby Copies of Replicated Objects in a Distributed Environment.
ACM Symposium on Parallel Algorithms and Architectures, SPAA 1997.
-
S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker.
A Scalable Content-Addressable Network.
ACM SIGCOMM, 2001
-
A. Rowstron, P. Druschel.
Pastry: Scalable, distributed object location and routing
for large-scale peer-to-peer systems.
18th IFIP/ACM International Conference on Distributed Systems
Platforms (Middleware 2001).
-
I. Stoica, R. Morris, D. Karger, F. Kaashoek, H. Balakrishnan.
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
ACM SIGCOMM, 2001.
-
B. Y. Zhao, J. D. Kubiatowicz, A. D. Joseph,
Tapestry: An Infrastructure for Fault-Tolerant Wide-Area
Location and Routing.
UC Berkeley Computer Science Division, Report No. UCB/CSD 01/1141, April 2001.
-
Sylvia Ratnasamy, Scott Shenker and Ion Stoica.
Routing Algorithms for DHTs: Some Open Questions.
1st International Workshop on Peer-to-Peer Systems (IPTPS), 2002.
-
Dalia Malkhi, Moni Naor, David Ratajczak.
Viceroy: A Scalable and Dynamic Emulation of the Butterfly.
ACM Symposium on Principles of Distributed Computing, 2002.
-
G. Manku, M. Naor, and U. Wieder.
Know Thy Neighbor's Neighbor:
The Power of Lookahead in Randomized P2P Networks.
In Proc. of ACM Symp. on Theory of Computing (STOC), 2004.
(2) Power-Law Distributions
If we were to generate a random graph on n nodes by including each
possible edge independently with some probability p, then the fraction
of nodes with d neighbors would decrease exponentially in d.
But for many large networks -- including the Web, the Internet,
collaboration networks, and semantic networks --
it quickly became clear that the fraction of nodes with d
neighbors decreases only polynomially in d;
to put it differently, the distribution of degrees obeys a power law.
What processes are capable of generating such power laws,
and why should they be ubiquitous in large networks?
The investigation of these questions suggests that power laws
are just one reflection of the local and global processes driving
the evolution of these networks.
-
A.-L. Barabasi, Reka Albert, and Hawoong Jeong.
Mean-field theory for scale-free random networks.
Physica A 272 173-187 (1999).
-
Bernardo A. Huberman, Lada A. Adamic.
Growth dynamics of the World-Wide Web.
Nature, 399 (1999) 130.
-
Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos.
On Power-Law Relationships of the Internet Topology.
ACM SIGCOMM 1999.
-
R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal.
Stochastic models for the Web graph.
41th IEEE Symp. on Foundations of Computer Science, 2000, pp. 57-65.
-
W. Aiello, F. Chung, L. Lu.
Random evolution of massive graphs.
Handbook of Massive Data Sets, (Eds. James Abello et al.), Kluwer, 2002,
pages 97-122.
-
B. Bollobas, C. Borgs, J. Chayes, and O. Riordan
Directed scale-free graphs
Proceedings of the 14th ACM-SIAM Symposium
on Discrete Algorithms (2003), 132-139.
-
R. Kleinberg, J. Kleinberg,
Isomorphism and Embedding Problems for Infinite Limits of Scale-Free Graphs.
Proc. 15th ACM-SIAM Symposium on Discrete Algorithms, 2005.
-
M. Mitzenmacher
A Brief History of Generative Models for Power Law and Lognormal Distributions.
Internet Mathematics, vol 1, No. 2, pp. 226-251, 2004.
-
R. Albert and A.-L.Barabasi.
Statistical mechanics of complex networks.
Reviews of Modern Physics 74, 47 (2002).
-
A. Fabrikant, E. Koutsoupias, C. Papadimitriou.
Heuristically Optimized Trade-offs: A New Paradigm for
Power Laws in the Internet.
29th International Colloquium on Automata, Languages,
and Programming (ICALP), 2002.
-
Noam Berger, Christian Borgs, Jennifer Chayes,
Raissa D'Souza, and Robert Kleinberg.
Competition-Induced Preferential Attachment.
To appear in Combinatorics, Probability, and Computing.
Extended abstract appeared in Proceedings of the
31st International Colloquium on Automata, Languages, and Programming
(ICALP 2004), pages 208-221.
-
M. Molloy and B. Reed.
A Critical Point for Random Graphs with a Given Degree Sequence.
Random Structures and Algorithms 6(1995) 161-180.
-
M. E. J. Newman, S. H. Strogatz and D. J. Watts,
Random graphs with arbitrary degree distributions and their applications.
Phys. Rev. E 64, 026118 (2001).
-
Qian Chen, Hyunseok Chang. Ramesh Govindan,
Sugih Jamin, Scott J. Shenker, Walter Willinger.
The Origin of Power Laws in Internet Topologies Revisited.
Proc. of IEEE Infocom 2002.
-
J. Carlson and J. Doyle.
Highly Optimized Tolerance: A Mechanism for Power Laws in Designed Systems.
Physical Review E 60:2(1999).
(3) Cascading Behavior in Networks
We can think of a network as a large circulatory system,
through which information continuously flows.
This diffusion of information can happen rapidly or slowly;
it can be disastrous -- as in a panic or cascading failure --
or beneficial -- as in the spread of an innovation.
Work in several areas has proposed models for such processes,
and investigated when a network is more or less susceptible
to their spread.
This type of diffusion or cascade process
can also be used as a design principle for network protocols.
This leads to the idea of
epidemic algorithms, also called gossip-based algorithms,
in which information is propagated through a collection
of distributed computing hosts, typically using some
form of randomization.
- Diffusion of Innovation in Social Networks
-
M. Granovetter.
Threshold models of collective behavior.
American Journal of Sociology 83(6):1420-1443, 1978.
-
T. Schelling.
Micromotives and Macrobehavior.
Norton, 1978.
-
S. Morris.
Contagion.
Review of Economic Studies 67 (2000), 57-78.
-
H. Peyton Young.
The Diffusion of Innovations in Social Networks.
Santa Fe Institute Working Paper 02-04-018.
-
E. Berger.
Dynamic Monopolies of Constant Size.
Journal of Combinatorial Theory Series B 83(2001), 191-200.
-
Pedro Domingos, Matt Richardson.
Mining Knowledge-Sharing Sites for Viral Marketing.
Eighth International Conference on Knowledge Discovery and Data Mining,
KDD-2002.
-
D. Kempe, J. Kleinberg, E. Tardos.
Maximizing the Spread of Influence through a Social Network.
Proc. 9th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2003.
-
P. Dodds and D. J. Watts.
Universal Behavior in a Generalized Model of Contagion.
Phyical Review Letters, 2004.
- Epidemic Algorithms in Networks
-
Alan J. Demers, Daniel H. Greene, Carl Hauser, Wes Irish, John Larson,
Scott Shenker, Howard E. Sturgis, Daniel C. Swinehart, Douglas B. Terry.
Epidemic Algorithms for Replicated Database Maintenance.
Operating Systems Review 22(1): 8-32 (1988)
-
R. van Renesse.
Scalable and secure resource location.
Hawaii International Conference on System Sciences, 2000.
-
R. van Renesse, K. Birman, W. Vogels.
Astrolabe: A Robust and Scalable Technology For Distributed System
Monitoring, Management, and Data Mining.
to appear in ACM Transactions on Computer Systems, 2003.
-
Chalee Asavathiratham.
The Influence Model: A Tractable Representation
for the Dynamics of Networked Markov Chains.
Ph.D. Thesis, MIT 2000.
-
Mor Harchol-Balter, Tom Leighton, Daniel Lewin.
Resource Discovery in Distributed Networks.
ACM Symposium on Principles
of Distributed Computing (PODC), 1999, pp. 229-238.
-
Shay Kutten, David Peleg
Deterministic Distributed Resource Discovery.
ACM Symposium on Principles
of Distributed Computing (PODC), 2000.
-
R. Karp, C. Schindelhauer, S. Shenker, B. Vocking.
Randomized Rumor Spreading.
41st IEEE Symposium on Foundations of Computer Science, 2000.
-
D. Kempe, J. Kleinberg, A. Demers.
Spatial gossip and resource location protocols.
Proc. 33rd ACM Symposium on Theory of Computing, 2001.
(4) Nash Equilibria in Networks
In order to model the interaction of agents in a large network,
it often makes sense to assume that their behavior is strategic --
that each agent operates so as to optimize his/her/its own self-interest.
The study of such systems involves issues at the interface of
algorithms and game theory.
A central definition here is that of a Nash equilibrium --
a state of the network from which no agent has an incentive to deviate --
and recent work has studied how well a system operates when
it is in a Nash equilibrium.
-
Christos H. Papadimitriou,
Algorithms, Games, and the Internet.
Proc. 33rd ACM Symposium on Theory of Computing, 2001.
-
E. Tardos.
Network Games.
Proc. 36th ACM Symposium on Theory of Computing, 2004.
-
T. Roughgarden.
Selfish Routing.
Ph.D. thesis, Cornell University, 2002.
-
Yannis A. Korilis, Aurel A. Lazar, Ariel Orda
Architecting Noncooperative Networks
IEEE Journal of Selected Areas in Communications, 1995.
-
T. Roughgarden and E. Tardos:
How Bad is Selfish Routing?
JACM, 2002. Preliminary version has appeared in the
Proceedings of the 41st Annual IEEE Symposium on
the Foundations of Computer Science, 2000.
J. Feibenbaum, C. Papadimitriou, and S. Shenker.
Sharing the Cost of Multicast Transmissions.
Journal of Computer and System Sciences, 63:21--41, 2001.
-
Alex Fabrikant, Ankur Luthra, Elitza Maneva, Christos H. Papadimitriou,
and Scott Shenker,
In Proc. of 2003 PODC, pages 347-351.`
-
E. Anshelevich, A. Dasgupta, E. Tardos, T. Wexler.
Near-Optimal Network Design with Selfish Agents.
Proc. 35th ACM Symposium on Theory of Computing, 2003
-
E. Anshelevich, A. Dasgupta, J. Kleinberg,
E. Tardos, T. Wexler, T. Roughgarden.
The Price of Stability for Network Design with Fair Cost Allocation.
In FOCS 2004.
-
R. Johari and J. N. Tsitsiklis.
Efficiency Loss in a Network Resource Allocation Game.
Mathematics of Operations Research, 2004.
(5) Spectral Analysis of Networks
A powerful approach to analyzing networks is to look
at the eigenvalues and eigenvectors of their adjacency matrices.
The connection between these parameters and
the structure of the network is a subtle issue, and
while many results have been established about this connection,
it is still not fully understood.
-
N. Alon.
Eigenvalues and Expanders.
Combinatorica 6(1986), pp. 83-96.
-
A. Sinclair and M. Jerrum.
Approximate Counting, Uniform Generation, and Rapidly Mixing Markov Chains.
Information and Computation 82(1989), pp. 93-133.
-
F.R.K. Chung.
Spectral Graph Theory.
AMS Press. 1997.
-
Daniel A. Spielman and Shang-Hua Teng.
Spectral Partitioning Works: Planar graphs and finite element meshes.
Proceedings of the 37th Annual IEEE
Conference on Foundations of Computer Science, 1996.
and UC Berkeley Technical Report number UCB CSD-96-898.
(6) Link Analysis for Web search
Link structure can be a powerful source of information about the underlying
content in the network. In the context of the Web, we can try to identify
high-quality information resources from the way in which other pages
link to them; this idea has reflections in the analysis of citation data
to find influential journals, and in the analysis of social networks to
find important members of a community.
From a methodological point of view, current approaches
to link analysis on the Web make extensive of
methods based on eigenvalues and eigenvectors.
- Hubs and Authorities, PageRank, and variants
-
J. Kleinberg.
Authoritative sources in a hyperlinked environment.
Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998.
Extended version in Journal of the ACM 46(1999).
Also appears as IBM Research Report RJ 10076, May 1997.
-
S. Brin and L. Page.
The Anatomy of a Large-Scale Hypertextual Web Search Engine.
Proc. 7th International World Wide Web Conference, 1998.
-
S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg,
S.R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins,
Mining the link structure of the World Wide Web.
IEEE Computer, August 1999.
-
A. Arasu, J. Cho, H. Garcia-Molina, A. Paepcke, S. Raghavan.
Searching the Web.
ACM Transactions on Internet Technology 1(1): 2-43 (2001)
-
A. Borodin, J. S. Rosenthal, G. O. Roberts, P. Tsaparas,
Finding Authorities and Hubs From Link Structures on the World Wide Web.
10th International World Wide Web Conference, May 2001.
-
Dimitris Achlioptas, Amos Fiat, Anna Karlin, Frank McSherry,
Web Search via Hub Synthesis.
42nd IEEE Symposium on Foundations of Computer Science, 2001, p.611-618.
-
Davood Rafiei, Alberto Mendelzon.
What is this Page Known for? Computing Web Page Reputations.
Proc. WWW9 Conference, Amsterdam, May 2000
-
Pedro Domingos, Matt Richardson.
The Intelligent Surfer: Probabilistic Combination of
Link and Content Information in PageRank.
Advances in Neural Information Processing Systems 14, 2002.
-
Taher H. Haveliwala.
Topic-Sensitive PageRank.
11th International World Wide Web Conference, 2002.
- Connections with Social Networks and Citation Analysis.
-
G. Pinski, F. Narin.
Citation influence for journal aggregates of scientific
publications: Theory, with application
to the literature of physics.
Information Processing and Management, 12(1976), pp. 297--312.
-
L. Katz.
A new status index derived from sociometric analysis.
Psychometrika 18(1953).
-
C.H. Hubbell.
An input-output approach to clique identification.
Sociometry 28(1965).
-
P. Bonacich.
Power and Centrality: A family of measures.
American Journal of Sociology 92(1987).
(7) Clustering and Community Structures in Networks
Clustering is one of the oldest and most well-established
problems in data analysis; in the context of networks,
it can be used to search for densely connected communities.
A number of techniques have been applied to this problem,
including combinatorial and spectral methods.
-
R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins.
Trawling the web for emerging cyber-communities.
8th International World Wide Web Conference, May 1999.
-
R. Agrawal, R. Srikant.
Fast Algorithms for Mining Association Rules.
20th Int'l Conference on Very Large Databases (VLDB), 1994.
-
S. Dill, R. Kumar, K. McCurley, S. Rajagopalan, D. Sivakumar, A. Tomkins.
Self-similarity in the Web.
27th International Conference on Very Large Data Bases, 2001.
-
Gary Flake, Steve Lawrence, C. Lee Giles, Frans Coetzee.
Self-Organization and Identification of Web Communities.
IEEE Computer, 35:3, March 2002.
-
Gary Flake, K. Tsioutsiouliklis, R.E. Tarjan.
Graph Clustering Techniques based on Minimum Cut Trees.
Technical Report 2002-06, NEC, Princeton, NJ, 2002.
-
M. Girvan and M. E. J. Newman.
Community structure in social and biological networks.
Proc. Natl. Acad. Sci. USA 99, 8271-8276 (2002).
-
J. Kleinberg.
An Impossibility Theorem for Clustering.
Advances in Neural Information Processing Systems (NIPS) 15, 2002.
-
J. Hopcroft, O. Khan, B. Kulis, and B. Selman.
Natural communities in large linked networks.
In Proceedings of the 9th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, pages 541--546,
-
M. Granovetter.
The strength of weak ties.
American Journal of Sociology, 78(6):1360-1380, 1973.
(8) Labeling and Classification using Networks of Pairwise Relationships
A task closely related to clustering is the problem
of classifying the nodes of a network using a known set of labels.
For example, suppose we wanted to
classify Web pages into topic categories.
Automated text analysis can give us an estimate of the topic of each page;
but we also suspect that pages have some tendency to be similar
to neighboring pages in the link structure.
How should we combine these two sources of evidence?
A number of probabilistic frameworks are useful for this task,
including the formalism of Markov random fields,
which -- for quite different applications --
has been extensively studied in computer vision.
-
J. Besag.
Spatial interaction and the statistical analysis of lattice systems.
J. Royal Statistical Society B, 36(1974).
-
Soumen Chakrabarti, Byron E. Dom, and Piotr Indyk.
Enhanced hypertext categorization using hyperlinks.
Proceedings of the ACM International Conference on Management of Data,
SIGMOD 1998, pages 307-318.
-
O. Veksler.
Efficient Graph-Based Energy Minimization Methods in Computer Vision.
Ph.D. Thesis, Cornell University, 1999.
-
J. Kleinberg, E. Tardos.
Approximation Algorithms for Classification Problems
with Pairwise Relationships: Metric Labeling and Markov Random Fields.
Proc. 40th IEEE Symposium on Foundations of Computer Science, 1999.
-
A. Broder, R. Krauthgamer, and M. Mitzenmacher.
Improved Classification via Connectivity Information.
ACM-SIAM Symposium on Discrete Algorithms, 2000.
-
Avrim Blum, Shuchi Chawla.
Learning from Labeled and Unlabeled Data using Graph Mincuts.
International Conference on Machine Learning (ICML), 2001.
-
T. Joachims, N. Cristianini, and J. Shawe-Taylor.
Composite Kernels for Hypertext Categorisation.
International Conference on Machine Learning (ICML), 2001.
-
B. Taskar, P. Abbeel and D. Koller.
Discriminative Probabilistic Models for Relational Data.
Eighteenth Conference on Uncertainty in Artificial Intelligence
(UAI 2002).
(9) The VC-dimension and some of its applications.
Random sampling is a useful technique in many of
the types of problems we have been considering,
and understanding the full power of sampling can be
more subtle than it first appears.
Here we try to understand the
following phenomenon:
when the underlying space being sampled has
a certain kind of ``orderly'' structure,
very small samples can provide extremely strong information.
There is a general and powerful theory,
the notion of VC-dimension, that explains this;
after developing the concept of VC-dimension,
we'll look at some of its applications
in approximation algorithms and computational geometry.
and efficient indexing.
(10) Rank Aggregation and Meta-Search
We have now seen a number of different techniques for
ranking Web pages according to various measures of quality.
Are there principled methods for combining them,
to get an even better `meta-ranking'?
We begin this discussion with a celebrated result of Arrow
from mathematical economics, suggesting some of
the trade-offs inherent in such an approach.
-
K. Arrow.
Social Choice and Individual Values. Wiley, 1951.
-
H.P. Young.
Condorcet's Theory of Voting.
American Political Science Review 82(1988), pp. 1231-1244.
-
Cynthia Dwork, Ravi Kumar, Moni Naor, D. Sivakumar.
Rank Aggregation Methods for the Web.
10th International World Wide Web Conference, May 2001.
-
Weiyi Meng, Clement Yu, King-Lup Liu.
Building Efficient and Effective Metasearch Engines.
ACM Computing Surveys 34(2002).
-
William W. Cohen, Robert E. Schapire, and Yoram Singer.
Learning to order things.
Journal of Artificial Intelligence Research, 10: 243--270, 1999.
-
T. Joachims.
Optimizing Search Engines Using Clickthrough Data.
Eighth International Conference on Knowledge Discovery and Data Mining,
KDD-2002.
Network Datasets
There are a number of interesting network datasets available
on the Web; they form a valuable resource for trying out
algorithms and models across a range of settings.
-
Collaboration and citation networks: For the 2003 KDD Cup
competition, Johannes Gehrke, Paul Ginsparg, and I provided
a dataset based on the arXiv pre-print database,
which allows one to study the networks of co-authorships and citations
among a large community of physicists.
Here is the KDD Cup dataset and a paper describing the
competition in more detail.
-
Internet topology: The network structure of the Internet
can be studied at several levels of resolution.
Here is a dataset at the autonomous system (AS) level.
-
Web subgraphs:
There are many such datasets available for download.
One set is maintained by Panayiotis Tsaparas;
the experiments that used this data are described in his Ph.D. thesis,
and in other papers linked from his home page.
-
Biological interaction networks:
These are constructed from collections of biological molecules that
participate in a cell's metabolism,
with edges joining pairs that have been found to interact.
Naturally, such networks are incomplete, since some
interactions have not yet been discovered.
-
Semantic networks:
Free association datasets for words have
been collected by cognitive scientists;
these are constructed by compiling the free responses
of test subjects when presented with cue words.
(For example, a test subject presented with
the cue word `ice' might react with the word `cold,' `cream,' or `water.')