The Structure of Information Networks

Computer Science / Information Science 6850
Cornell University
Fall 2024

  • Time: Mondays and Wednesdays at 10:10-11:25

  • Place: G01 Gates Hall

  • http://www.cs.cornell.edu/courses/cs6850/

    Course Staff

    Overview

    The past two decades have seen a convergence of social and technological networks, with systems such as the World Wide Web characterized by the interplay between rich information content, the millions of individuals and organizations who create it, and the technology that supports it. This course covers recent research on the structure and analysis of such networks, and on models that abstract their basic properties. Topics include combinatorial and probabilistic techniques for link analysis, centralized and decentralized search algorithms, network models based on random graphs, and connections with work in the social sciences.

    The course prerequisites include introductory-level background in algorithms, graphs, probability, and linear algebra, as well as some basic programming experience (to be able to manipulate network datasets).

    The work for the course will consist primarily of two problem sets, a short reaction paper, and a more substantial project.

    Course Outline

    (1) Random Graphs and Small-World Properties

    A major goal of the course is to illustrate how networks across a variety of domains exhibit common structure at a qualitative level. One area in which this arises is in the study of `small-world properties' in networks: many large networks have short paths between most pairs of nodes, even though they are highly clustered at a local level, and they are searchable in the sense that one can navigate to specified target nodes without global knowledge. These properties turn out to provide insight into the structure of large-scale social networks, and, in a different direction, to have applications to the design of decentralized peer-to-peer systems.

    (2) Cascading Behavior in Networks

    We can think of a network as a large circulatory system, through which information continuously flows. This diffusion of information can happen rapidly or slowly; it can be disastrous -- as in a panic or cascading failure -- or beneficial -- as in the spread of an innovation. Work in several areas has proposed models for such processes, and investigated when a network is more or less susceptible to their spread. This type of diffusion or cascade process can also be used as a design principle for network protocols. This leads to the idea of epidemic algorithms, also called gossip-based algorithms, in which information is propagated through a collection of distributed computing hosts, typically using some form of randomization.

    (3) Spectral Analysis and Random Walks in Networks

    One can gain a lot of insight into the structure of a network by analzing the eigenvalues and eigenvectors of its adjacency matrix. The connection between spectral parameters and the more combinatorial properties of networks and datasets is a subtle issue, and while many results have been established about this connection, it is still not fully understood. This connection has also led to a number of applications, including the development of link analysis algorithms for Web search.