by Laurent Massoulié

At the Thomson Research Lab in Paris, epidemic algorithms are being investigated. These algorithms are expected to have a strong impact on the dissemination of media content over the Internet, notably in peer-to-peer systems for live streaming applications.

While epidemics are part of everyday life, they nonetheless constitute an intriguing phenomenon. In particular, one may wonder at how quickly a global outbreak (say of influenza) occurs, given the local and limited nature (individual contacts) of the propagation mechanisms of the infectious agent.

The earliest attempt at a mathematical understanding of these dynamics can be traced back to the Swiss scientist Daniel Bernoulli (1700-1782) in his study of smallpox in 1766. A subsequent milestone was reached in 1838 by the Belgian mathematician Pierre François Verhulst, who described the temporal evolution of a population size by the so-called logistic equation (see below). The most salient feature of this equation is its exponential growth in the early stages.
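
For concreteness (the notation below is ours, not Verhulst's original), the logistic equation for the infected fraction x(t), with contact rate β, reads

    dx/dt = β x (1 − x),   with solution   x(t) = x(0) e^{βt} / (1 − x(0) + x(0) e^{βt}).

When x is small, the factor (1 − x) is close to 1, so x initially grows roughly like x(0) e^{βt}: this is the exponential growth in the early stages, and the reason why an outbreak can become global so quickly despite purely local contacts.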


The eighties, with the advent of personal computers, also witnessed the invention of computer viruses. Their propagation is as fast as that of biological epidemics, and is accurately modelled by Verhulst's logistic equation. (This has been strikingly illustrated by the latest incarnations of computer viruses, namely Internet worms, and in particular the so-called Code Red worm.) Clearly, this is not an invention that computer scientists should be proud of! In the same period, however, a group of computer scientists at Xerox PARC compensated for this by inventing a way of putting epidemic propagation to good use. Specifically, they solved the delicate problem of synchronizing distributed databases by spreading data updates among individual databases through random, virus-like propagation attempts.

This seminal work introduced the idea of mimicking biological epidemics to propagate useful information: the area of epidemic algorithms was born. In a nutshell, this approach lets each constituent of the system choose at random which of its neighbours to ‘infect’ with information. This is certainly simple, but is it efficient?

Surprisingly, in a number of situations of interest, epidemic approaches are as efficient as any other technique, no matter how elaborate. Let us illustrate this in the context of peer-to-peer systems. These consist of collections of machines (the peers) which act both as servers and as clients of services, and interact over the Internet. Such systems became popular for file-sharing applications with Napster at the turn of the century.

More recent versions of these systems provide so-called live streaming applications: a stream of audio-video data (think of a TV channel) is to be disseminated in real time to all participating peers. This data stream is fragmented into data chunks by the data source. Peers subsequently exchange data chunks among themselves to eventually recover the original data stream.

An epidemic scheme for this live streaming application is then specified by detailing how peers choose which of their neighbours to contact, and which data chunk to forward to the selected peer. As a result, after appearing at the data source, each chunk should spread within the system to reach all peers within a limited delay. However, the epidemics corresponding to individual chunks compete for communication resources. The challenge is then to define mechanisms for which this competition does not slow down individual epidemic disseminations. We now consider several scenarios for which we have identified efficient epidemic schemes.
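
As a toy illustration (the data structures and function names below are ours, not those of any deployed system), such a scheme can be written as a round-based simulation in which each peer makes exactly these two decisions per round:

    def run_round(have, neighbours, select_target, select_chunk):
        """One synchronous round of a push-based epidemic scheme.

        have[p]       -- set of chunk ids currently held by peer p
        neighbours[p] -- non-empty list of peers that p may contact
        select_target(p, candidates, have)         -> chosen neighbour
        select_chunk(sender_chunks, target_chunks) -> chunk to push, or None

        Each peer pushes at most one chunk per round (unit uplink capacity);
        transfers take effect at the end of the round.
        """
        pushes = []
        for p in have:
            q = select_target(p, neighbours[p], have)
            c = select_chunk(have[p], have[q])
            if c is not None:
                pushes.append((q, c))
        for q, c in pushes:
            have[q].add(c)

The schemes discussed below differ only in how select_target and select_chunk are instantiated.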

Symmetric networks: Assume all peers have the same upload capacity of one chunk per second. Then the time for a chunk to reach all peers in a system of N such peers is at least log2(N) seconds, since in one second the number of holders of a chunk can at most double. Moreover, the rate at which all peers can receive data cannot exceed one chunk per second. In this context, consider the Random Peer, Latest Useful Chunk scheme, whereby each peer selects a target at random, and sends to the selected target the latest chunk it has received that the target does not yet hold.
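
In terms of the sketch above, and assuming chunk ids increase with creation time (so that 'latest' means 'largest id'), this policy could be written as:

    import random

    def random_peer(p, candidates, have):
        # Random Peer: pick the target uniformly at random
        return random.choice(candidates)

    def latest_useful_chunk(sender_chunks, target_chunks):
        # Latest Useful Chunk: the most recent chunk the sender holds
        # that the target is still missing
        useful = sender_chunks - target_chunks
        return max(useful) if useful else None

Plugged into run_round on a fully connected system of N peers, repeated rounds give a toy simulation of the symmetric setting analysed here.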

Recently, we (T. Bonald, L. Massoulié, F. Mathieu, D. Perino and A. Twigg) established that for any data rate strictly below one chunk per second, this scheme ensures that each peer receives chunks within the optimal time log2(N), plus some constant (not depending on N). The scheme thus achieves delivery in optimal time at any injection rate strictly below capacity.

Our proof builds upon previous work (by S. Sanghavi, B. Hajek and L. Massoulié) in which the performance of the Random Peer, Latest Blind Chunk policy is analysed. The argument relies on the identification of secondary epidemic disseminations, which are shown to follow a variant of the logistic equation.

Heterogeneous networks: Assume now that peer uplink capacities are no longer identical. In this context, we considered the Most Deprived Peer, Random Useful Chunk policy, whereby each peer picks the neighbour to which it can provide the largest number of chunks, and then selects at random which of these chunks to forward. We (L. Massoulié and A. Twigg) showed that this scheme ensures delivery of all chunks in bounded time, provided the data rate is strictly below the maximum injection rate the network can support. In this sense, it is a rate-optimal scheme.
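
In the same notation (again a sketch, with names of our own choosing), this policy could be written as:

    import random

    def most_deprived_peer(p, candidates, have):
        # Most Deprived Peer: the neighbour missing the largest number
        # of chunks that p could provide to it
        return max(candidates, key=lambda q: len(have[p] - have[q]))

    def random_useful_chunk(sender_chunks, target_chunks):
        # Random Useful Chunk: a uniform pick among the chunks
        # the target is still missing
        useful = sorted(sender_chunks - target_chunks)
        return random.choice(useful) if useful else None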

The above results show that epidemic schemes can deliver live streaming at optimal performance in several scenarios of interest. Thus randomized local decisions can lead to globally optimal behaviour. Peer-to-peer architectures are expected to become the principal channel for delivering media content (legal or otherwise) in the near future, and epidemic algorithms will then have a strong impact in driving the dissemination of such content.

Many exciting problems remain to be solved in this area: suitable schemes for delivery of video-on-demand are being studied; delay optimality in heterogeneous environments is still poorly understood; and the search for schemes to ensure that data flows reduce the cost on the underlying networks (the so-called ISP-friendliness issue of peer-to-peer design) is also wide open.

Link:
The above-mentioned articles can be found at http://www.thlab.net/~lmassoul/

Please contact:
Laurent Massoulié
Paris Research Lab, Thomson, Corporate Research, France
E-mail: laurent.massoulie@thomson.net
