by Fatemeh Rahimian
Researchers at SICS have invented a new way to partition graphs into a given number of clusters, such that the related information belongs to the same cluster and the number of connections between clusters are minimized. Graph partitioning, a well known NP-Complete problem, has been already thoroughly investigated for small and medium sized graphs, but this distributed algorithm works on very large graphs.
Our world is increasingly connected, and this is changing our lives in ways that we still do not fully understand. New tools and technologies have given us the unprecedented potential to make sense of the huge inter-connected datasets that exist in various fields of science, from social networks to biological networks. Relations can be mathematically described as “graphs”; for example, a network of friends on Facebook can be described by a graph, which shows a user’s closest friends and who they are related to. The increasing size of datasets means that graph sizes are also increasing: this means that they must be partitioned into smaller clusters so they can be more easily managed on multiple machines in a distributed fashion.
The new solution for this is a distributed heuristic based algorithm that can efficiently partition big graphs into a given number of clusters of equal size or any given size. The key point in this approach is that no global knowledge is required, and simultaneous access to the entire graph is not assumed.
The algorithm used to solve the problem is very intuitive. It starts by randomly assigning vertices to partitions. Over the course of the algorithm, vertices from different partitions iteratively negotiate with each other and exchange their partition assignments, if that exchange results in a better partitioning [1]. When this iterative optimization converges, neighboring vertices of the graph will mostly end up in the same partition, while there will be very few edges that cross different partitions (See Figure 1). The algorithm is inherently a local search optimization enhanced using a simulated annealing technique. A variant of this work has been specifically designed for graphs with power-law degree distribution, where vertex-cut partitioning has proved more efficient than edge-cut partitioning [2]. Both solutions are kept as simple as possible, so that the algorithm can be easily adopted by various real world applications.
With the recent advances in graph database technologies, it is anticipated that leading companies in this field will soon move to make use of graph partitioning for scaling up. Successful partitioning is an important ingredient in efficient big data analysis and that is why this solution has attracted a lot of attention in the community.
Links: http://e2e-clouds.org/
References:
[1] Rahimian, Fatemeh, Amir H. Payberah, Sarunas Girdzijauskas, Mark Jelasity, and Seif Haridi. "Ja-be-ja: A distributed algorithm for balanced graph partitioning." In the 7th IEEE Conference on Self-Adaptive and Self-Organizing Systems (SASO), 2013.
[2] Rahimian, Fatemeh, Amir H. Payberah, Sarunas Girdzijauskas, and Seif Haridi. "Distributed vertex-cut partitioning." In the 14th IFIP International conference on Distributed Applications and Interoperable Systems (DAIS), 2014.
Please contact:
Fatemeh Rahimian
SICS Swedish ICT
E-mail: