by Fatemeh Rahimian

Researchers at SICS have invented a new way to partition graphs into a given number of clusters, such that the related information belongs to the same cluster and the number of connections between clusters are minimized. Graph partitioning, a well known NP-Complete problem, has been already thoroughly investigated for small and medium sized graphs, but this distributed algorithm works on very large graphs.

Our world is increasingly connected, and this is changing our lives in ways that we still do not fully understand. New tools and technologies have given us the unprecedented potential to make sense of the huge inter-connected datasets that exist in various fields of science, from social networks to biological networks. Relations can be mathematically described as “graphs”; for example, a network of friends on Facebook can be described by a graph, which shows a user’s closest friends and who they are related to. The increasing size of datasets means that graph sizes are also increasing: this means that they must be partitioned into smaller clusters so they can be more easily managed on multiple machines in a distributed fashion.

Figure 1: 4elt sparse graph, partitioned into 4 balanced components that are indicated by colors. Very few links are on the border of regions with different colors. For her achievement [1] Fatemeh Rahimian received the Best Paper Award at the 7th IEEE International Conference on Self-Adaptive and Self-Organizing Systems in Philadelphia in September 2013. Moreover, in May 2014 she obtained a doctoral degree that was partly based on this work.

The new solution for this is a distributed heuristic based algorithm that can efficiently partition big graphs into a given number of clusters of equal size or any given size. The key point in this approach is that no global knowledge is required, and simultaneous access to the entire graph is not assumed.

The algorithm used to solve the problem is very intuitive. It starts by randomly assigning vertices to partitions. Over the course of the algorithm, vertices from different partitions iteratively negotiate with each other and exchange their partition assignments, if that exchange results in a better partitioning [1]. When this iterative optimization converges, neighboring vertices of the graph will mostly end up in the same partition, while there will be very few edges that cross different partitions (See Figure 1). The algorithm is inherently a local search optimization enhanced using a simulated annealing technique. A variant of this work has been specifically designed for graphs with power-law degree distribution, where vertex-cut partitioning has proved more efficient than edge-cut partitioning [2]. Both solutions are kept as simple as possible, so that the algorithm can be easily adopted by various real world applications.

With the recent advances in graph database technologies, it is anticipated that leading companies in this field will soon move to make use of graph partitioning for scaling up. Successful partitioning is an important ingredient in efficient big data analysis and that is why this solution has attracted a lot of attention in the community.

Links: http://e2e-clouds.org/

References:
[1] Rahimian, Fatemeh, Amir H. Payberah, Sarunas Girdzijauskas, Mark Jelasity, and Seif Haridi. "Ja-be-ja: A distributed algorithm for balanced graph partitioning." In the 7th IEEE Conference on Self-Adaptive and Self-Organizing Systems (SASO), 2013.
[2] Rahimian, Fatemeh, Amir H. Payberah, Sarunas Girdzijauskas, and Seif Haridi. "Distributed vertex-cut partitioning." In the 14th IFIP International conference on Distributed Applications and Interoperable Systems (DAIS), 2014.

Please contact:
Fatemeh Rahimian
SICS Swedish ICT
E-mail: This email address is being protected from spambots. You need JavaScript enabled to view it.

Next issue: January 2025
Special theme:
Large-Scale Data Analytics
Call for the next issue
Image ERCIM News 98 epub
This issue in ePub format
Get the latest issue to your desktop
RSS Feed