Scheduling Data Mining Applications in Mobile Computing Environments

by Carmela Comito, Deborah Falcone, Domenico Talia and Paolo Trunfio

Data mining is emerging as a promising topic in mobile computing environments. We have defined a distributed architecture in which mobile devices cooperate in a peer-to-peer style to perform a data mining process, tackling the problem of energy capacity shortage by distributing the energy consumption among the available devices. An energy-aware (EA) scheduling strategy assigns data mining tasks over a network of mobile devices optimizing energy usage. The main design principle is to find a task allocation that prolongs the network residual life by balancing the energy load among the devices.

The wide availability and growing computing power of mobile devices has opened the way for data analysis and mining in mobile scenarios [1]. Mobile applications exploiting data mining techniques have appeared on the market in recent years. Examples include smartphone-based systems for body-health monitoring, vehicle control, and wireless security systems. An important aspect that must be addressed is that of ensuring energy efficiency, as most mobile devices have battery power which would last only a few hours. Data mining tasks in mobile environments should be allocated and scheduled so as to minimize the energy consumption of low-capacity mobile devices.

We have worked in this direction by defining a distributed architecture in which mobile devices cooperate in a peer-to-peer style to perform data mining tasks, tackling the problem of energy capacity shortage by distributing the energy consumption among the available devices. Efficient resource allocation and energy management is achieved through clustering of mobile devices, as shown in Figure 1. With this approach, mobile nodes can be assigned different roles, such as cluster-head or cluster member. A cluster-head serves as the local coordinator for its cluster, performing intra-cluster transmission arrangement and data forwarding, while a cluster member is a non-cluster-head node without any inter-cluster links.

To evenly exploit all available resources, a proper distribution of data mining tasks among clusters and individual devices is crucial. To this end, we defined an energy-aware (EA) distributed task scheduling strategy whose goal is to find a task allocation that prolongs the network residual life. The EA scheduler implements a two-phase heuristic-based algorithm. When an assignment decision has to be made for a task, the first phase, denoted local assignment phase, is responsible for local task arbitration: it considers the energy consumption of task execution on the different devices within the local cluster. The algorithm tries to minimize the total energy consumed in the cluster by assigning the task to that device that permits the cluster residual life to be extended. If the first phase is not feasible, the second phase, denoted global assignment phase, is responsible for task arbitration among clusters: the task will be assigned to the most suitable device, in the network of clusters, that maximizes the network residual life.

We performed an evaluation of the EA allocation strategy using a network simulator, which allowed us to assess its effectiveness on a set of data mining tasks [2]. The simulation mainly aims to study the behaviour of the scheduler with respect to network residual life, number of alive devices, and number of completed tasks at the end of the simulation. To assess the effectiveness of the EA strategy, we compared its performance with that achieved by the well-known round-robin (RR) scheduling algorithm. As a first step, the simulator builds a network composed of 100 mobile devices, and lets them group into clusters. Then, an initial energy capacity ranging from 3,000 to 11,000 Joules is assigned to each device, following a normal distribution. After the initial setup, mobile devices start generating a set of data mining tasks to be executed; these are allocated to the available nodes according to the EA strategy. For the purpose of our simulation, we considered three reference data mining algorithms from the Weka project [3], namely J48, K-means and Apriori.

For each reference algorithm, we ran a set of tasks with a fixed dataset size (200 kB) and task arrival rate λ varying from 80 to 1280 tasks per hour. Figure 2 shows the network residual life measured at the end of the experiments for the three algorithms, using EA and RR. As expected, increasing the task arrival rate, the network residual live tends to zero both for EA and RR. However, for the lightest of the three data mining algorithms (Apriori), the residual life does not reach zero and the difference between EA and RR increases with λ in favor of EA. Figure 3 shows the number of alive devices for the three algorithms, using EA and RR. Also in this case, the results demonstrate that the number of alive devices with EA is greater than that achieved by RR. Finally, Figure 4 compares the performance of EA and RR in terms of completed tasks for the three algorithms. With Apriori, both EA and RR are able to complete more tasks as λ increases, but EA ensures better performance. With J48 and K-means, with a given task arrival rate, the number of completed tasks cannot increase because the network residual life is zero, as shown in Figure 2. However, even in these cases, there is an advantage for EA compared to RR with λ lower than 320 tasks/hour.

In summary, the experimental results showed that a significant improvement can be achieved using our EA scheduler compared to the time-based RR scheduler. Our algorithm: i) is effective in extending the network residual life by reducing the energy consumption, without limiting the number of completed tasks; ii) keeps alive most of the mobile devices, in all the experiments performed, thanks to its energy load balancing strategy.

Link: http://grid.dimes.unical.it

References:
[1] D. Talia, P. Trunfio: “Service-oriented distributed knowledge discovery”, Chapman and Hall/CRC, 2012
[2] C. Comito et al: “A distributed allocation strategy for data mining tasks in mobile environments”, in proc. of the 6th International Symposium on Intelligent Distributed Computing, Studies in Computational Intelligence 446: 231-240, 2012
[3] M. Hall et al: “The WEKA data mining software: an update”. SIGKDD Explorations 11(1): 10-18, 2009.

Please contact:
Carmela Comito and Domenico Talia
ICAR-CNR and DIMES, University of Calabria, Italy
E-mail: {ccomito,talia}@dimes.unical.it

Deborah Falcone and Paolo Trunfio
DIMES, University of Calabria, Italy
E-mail: {dfalcone|trunfio}@dimes.unical.it

{jcomments on}