Summary
Consider a system consisting of a relatively small number of hosts in a large network (like the Internet). We would like to know the properties of every pairwise path between the hosts in the system, but we would prefer not to have to measure all such paths. Somewhat surprisingly, this is possible. Many path metrics, such as latency, jitter, and (log) packet transmission success rates, can be written as sums of corresponding measures on the edges that make up the paths. Therefore, we can write a linear system relating row properties to path properties via a path matrix. The rank of the path matrix is typically much lower than the number of end-to-end paths. Thus, we can use linear dependencies between rows of the path matrix to infer properties of all the paths from a relatively small number of measurements. This path matrix also serves as the basis of inference about properties of individual links, when such inference is possible. Our current work involves building fast algorithms for factoring path matrices based on understanding the relation between the linear algebraic structure and the structure of the underlying network routing.
Papers
@article{2009-tons, author = {Zhao, Yao and Chen, Yan and Bindel, David}, title = {Towards Unbiased End-to-End Network Diagnosis}, journal = {IEEE/ACM Transactions on Networking}, volume = {17}, number = {6}, pages = {1724--1737}, month = dec, year = {2009}, doi = {10.1109/TNET.2009.2022158} }
Abstract:
Internet fault diagnosis is extremely important for end-users, overlay network service providers (like Akamai ), and even Internet service providers (ISPs). However, because link-level properties cannot be uniquely determined from end-to-end measurements, the accuracy of existing statistical diagnosis approaches is subject to uncertainty from statistical assumptions about the network. In this paper, we propose a novel least-biased end-to-end network diagnosis (in short, LEND) system for inferring link-level properties like loss rate. We define a minimal identifiable link sequence (MILS) as a link sequence of minimal length whose properties can be uniquely identified from end-to-end measurements. We also design efficient algorithms to find all the MILSs and infer their loss rates for diagnosis. Our LEND system works for any network topology and for both directed and undirected properties and incrementally adapts to network topology and property changes. It gives highly accurate estimates of the loss rates of MILSs, as indicated by both extensive simulations and Internet experiments. Furthermore, we demonstrate that such diagnosis can be achieved with fine granularity and in near real-time even for reasonably large overlay networks. Finally, LEND can supplement existing statistical inference approaches and provide smooth tradeoff between diagnosis accuracy and granularity.
@article{2007-tons, author = {Chen, Yan and Bindel, David and Song, Hanhee and Chavez, Brian and Katz, Randy}, title = {Algebra-Based Scalable Overlay Network Monitoring: Algorithms, Evaluation, and Applications}, journal = {ACM Transactions on Networking}, volume = {15}, number = {5}, pages = {1084--1097}, month = oct, year = {2007}, doi = {10.1109/TNET.2007.896251} }
Abstract:
Overlay network monitoring enables distributed Internet applications to detect and recover from path outages and periods of degraded performance within seconds. For an overlay network with end hosts, existing systems either require measurements, and thus lack scalability, or can only estimate the latency but not congestion or failures. Our earlier extended abstract [Y. Chen, D. Bindel, and R. H. Katz, “Tomography-based overlay network monitoring,” Proceedings of the ACM SIGCOMM Internet Measurement Conference (IMC), 2003] briefly proposes an algebraic approach that selectively monitors linearly independent paths that can fully describe all the paths. The loss rates and latency of these paths can be used to estimate the loss rates and latency of all other paths. Our scheme only assumes knowledge of the underlying IP topology, with links dynamically varying between lossy and normal. In this paper, we improve, implement, and extensively evaluate such a monitoring system. We further make the following contributions: i) scalability analysis indicating that for reasonably large n (e.g., 100), the growth of $k$ is bounded as $O(n \log n)$, ii) efficient adaptation algorithms for topology changes, such as the addition or removal of end hosts and routing changes, iii) measurement load balancing schemes, iv) topology measurement error handling, and v) design and implementation of an adaptive streaming media system as a representative application. Both simulation and Internet experiments demonstrate we obtain highly accurate path loss rate estimation while adapting to topology changes within seconds and handling topology errors.
@inproceedings{2006-sigcomm, author = {Zhao, Yao and Chen, Yan and Bindel, David}, title = {Toward Unbiased End-to-End Network Diagnosis}, booktitle = {Proceedings of SIGCOMM 2006}, pages = {219--230}, year = {2006}, doi = {10.1145/1151659.1159939} }
Abstract:
Internet fault diagnosis is extremely important for end users, overlay network service providers (like Akamai [1]) and even Internet service providers (ISPs). However, because link-level properties cannot be uniquely determined from end-to-end measurements, the accuracy of existing statistical diagnosis approaches is subject to uncertainty from statistical assumptions about the network. In this paper, we propose a novel Least-biased End-to-end Network Diagnosis (in short, LEND) system for inferring link-level properties like loss rate. We define a minimal identifiable link sequence (MILS) as a link sequence of minimal length whose properties can be uniquely identified from end-to-end measurements. We also design efficient algorithms to find all the MILSes and infer their loss rates for diagnosis. Our LEND system works for any network topology and for both directed and undirected properties, and incrementally adapts to network topology and property changes. It gives highly accurate estimates of the loss rates of MILSes, as indicated by both extensive simulations and Internet experiments. Furthermore, we demonstrate that such diagnosis can be achieved with fine granularity and in near real-time even for reasonably large overlay networks. Finally, LEND can supplement existing statistical inference approaches and provide smooth tradeoff between diagnosis accuracy and granularity.
@inproceedings{2004-sigcomm, author = {Chen, Yan and Bindel, David and Song, Hanhee and Katz, Randy}, title = {An Algebraic Approach to Practial and Scalable Overlay Network Monitoring}, booktitle = {Proceedings of ACM SIGCOMM 2004}, year = {2004}, doi = {10.1145/1015467.1015475} }
Abstract:
Overlay network monitoring enables distributed Internet applications to detect and recover from path outages and periods of degraded performance within seconds. For an overlay network with $n$ end hosts, existing systems either require $O(n^2)$ measurements, and thus lack scalability, or can only estimate the latency but not congestion or failures. Our earlier extended abstract briefly proposes an algebraic approach that selectively monitors $k$ linearly independent paths that can fully describe all the $O(n^2)$ paths. The loss rates and latency of these $k$ paths can be used to estimate the loss rates and latency of all other paths. Our scheme only assumes knowledge of the underlying IP topology, with links dynamically varying between lossy and normal.In this paper, we improve, implement and extensively evaluate such a monitoring system. We further make the following contributions: i) scalability analysis indicating that for reasonably large n (e.g., 100), the growth of $k$ is bounded as $O(n \log n)$, ii) efficient adaptation algorithms for topology changes, such as the addition or removal of end hosts and routing changes, iii) measurement load balancing schemes, and iv) topology measurement error handling. Both simulation and Internet experiments demonstrate we obtain highly accurate path loss rate estimation while adapting to topology changes within seconds and handling topology errors.
@inproceedings{2003-imc, author = {Chen, Yan and Bindel, David and Katz, Randy}, title = {Tomography-Based Overlay Network Monitoring}, booktitle = {Proceedings of ACM SIGCOMM Internet Measurement Conference (IMC)}, year = {2003}, doi = {10.1145/948205.948233} }
Abstract:
Overlay network monitoring enables distributed Internet applications to detect and recover from path outages and periods of degraded performance within seconds. For an overlay network with $n$ end hosts, existing systems either require $O(n^2)$ measurements, and thus lack scalability, or can only estimate the latency but not congestion or failures. Unlike other network tomography systems, we characterize end-to-end losses (this extends to any additive metrics, including latency) rather than individual link losses. We find a minimal basis set of $k$ linearly independent paths that can fully describe all the $O(n^2)$ paths. We selectively monitor and measure the loss rates of these paths, then apply them to estimate the loss rates of all other paths. By extensively studying synthetic and real topologies, we find that for reasonably large n (e.g., 100), $k$ is only in the range of $O(n \log n)$. This is explained by the moderately hierarchical nature of Internet routine.Our scheme only assumes the knowledge of underlying IP topology, and any link can become lossy or return to normal. In addition, our technique is tolerant to topology measurement inaccuracies, and is adaptive to topology changes.
@techreport{2003-tomography-tr, author = {Chen, Yan and Bindel, David and Song, Hanhee and Katz, Randy}, title = {Tomography-Based Overlay Network Monitoring}, number = {CSD-03-1252}, institution = {UC Berkeley Computer Science Division}, month = jun, year = {2003}, status = {unrefereed}, submit = {Appeared in SIGCOMM IMC 2003} }
Abstract:
Overlay network monitoring enables distributed Internet applications to detect and recover from path outages and periods of degraded performance within seconds. For an overlay network with $n$ end hosts, existing systems either require $O(n^2)$ measurements, and thus lack scalability, or can only estimate the latency but not congestion or failures. Unlike other network tomography systems, we characterize end-to-end losses (this extends to any additive metrics, including latency) rather than individual link losses. We find a minimal basis set of $k$ linearly independent paths that can fully describe all the $O(n^2)$ paths. We selectively monitor and measure the loss rates of these paths, then apply them to estimate the loss rates of all other paths. By extensively studying synthetic and real topologies, we find that for reasonably large n (e.g., 100), $k$ is only in the range of $O(n \log n)$. This is explained by the moderately hierarchical nature of Internet routine.Our scheme only assumes the knowledge of underlying IP topology, and any link can become lossy or return to normal. In addition, our technique is tolerant to topology measurement inaccuracies, and is adaptive to topology changes.
Talks
Matrix Factorizations for Computer Network Tomography
Householder Symposium
tomography
•
meeting external invited plenary
Matrix Factorizations for Computer Network Tomography
NYU NA Seminar
tomography
•
seminar external invited
Linear Algebra for Network Loss Characterization
UC Berkeley LAPACK Seminar
tomography
•
seminar local
Linear Algebra for Network Loss Characterization
UC Berkeley LAPACK seminar
tomography
•
seminar local