by Vincent Keller and Wolfgang Ziegler
Intelligent ApplicatioN-Oriented Scheduling (ÏANOS) is a framework built atop a Grid middleware that uses resources in an energy-efficient manner. ÏANOS selects appropriate resources for a given application, chooses the most energy-efficient one, and switches off the parts of the selected resource that are not needed.
You are a computational scientist using high-performance computing (HPC) resources. Or perhaps you are an administrator or manager of an HPC centre providing resources to several user communities working in different scientific areas, or you are simply interested in how to use a Grid of HPC resources and applications efficiently. Do you think it is possible to do more science while using resources in a more energy-efficient way? Is it possible to predict a better set of resources based on the needs of your users' applications? Can a system give you hints and tips to improve the implementation of your applications? If you answered “no” to these questions, take five minutes: after reading this article you may think quite differently, because these are precisely the goals of Intelligent ApplicatioN-Oriented Scheduling (ÏANOS).
Images of a flow simulation computed with SpecuLOOS, one of the first applications of the testbed used to validate the ÏANOS prototype. Pictures courtesy of Roland Bouffanais, PhD, Massachusetts Institute of Technology (MIT), MA, USA.
In 2009, HPC resources worldwide consume roughly the energy produced by fifteen 2 GW power plants. Is all that energy used efficiently? If one could define a 'science per watt' metric, its value today would surely be far lower than twenty-five years ago. There are several reasons for this. First, high-end systems today draw an order of magnitude more power than they did two decades ago: 1-5 MW is typical for a Top10 system, compared with the 200-300 kW needed by a Cray-2 twenty years ago. Second, applications are now less efficient: today an efficiency of 50% is considered good, whereas in 1985 codes running at over 90% efficiency were common.
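To make these orders of magnitude concrete, the figures quoted above can be checked with a few lines of arithmetic. Reading "efficiency" as the fraction of peak performance an application actually sustains is our assumption; the variable names are illustrative only.

```python
# Back-of-the-envelope check of the figures quoted in the text.
plants = 15
plant_output_gw = 2.0
total_gw = plants * plant_output_gw  # total power drawn by HPC worldwide

# Power draw: Top10 system today (1-5 MW) vs. a Cray-2 (200-300 kW).
today_kw = (1_000, 5_000)
cray2_kw = (200, 300)
ratio_low = today_kw[0] / cray2_kw[1]   # most conservative pairing
ratio_high = today_kw[1] / cray2_kw[0]  # most pessimistic pairing

# Assumed meaning of "efficiency": fraction of peak actually sustained.
eff_1985, eff_today = 0.90, 0.50
eff_ratio = eff_today / eff_1985

print(f"total: {total_gw:.0f} GW")
print(f"power ratio: {ratio_low:.1f}x to {ratio_high:.0f}x")
print(f"relative efficiency today vs. 1985: {eff_ratio:.2f}")
```

The power ratio spans roughly 3x to 25x, which brackets the "order of magnitude" in the text, while each watt delivers only about half as much useful work as it did in 1985.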
Energy-hungry resources are thus being used inefficiently, which effectively leaves scientists with less and less usable computing power. The ultimate consequence is that resources are overbooked.
ÏANOS is built atop a Grid middleware (UNICORE or Globus) but remains agnostic to the choice of middleware. It consists of (i) an information system that stores information about the applications and resources; (ii) a Grid-level resource broker that decides which resources are most appropriate for a given application, given the user's quality-of-service (QoS) request at a given time; (iii) a meta-scheduler that interacts with the underlying Grid middleware and collects information on the status of the resources; and (iv) a monitoring system that records the behaviour of the applications during their execution.
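As an illustration of how the four components could interact, here is a minimal sketch. All class and method names (`InformationSystem`, `MetaScheduler`, `ResourceBroker`, `MonitoringSystem`, `choose`, `submit`) and the resource names are hypothetical, the middleware is stubbed by a status dictionary rather than calling UNICORE or Globus, and the broker uses a placeholder policy instead of the ÏANOS cost model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InformationSystem:
    """(i) Which resources each application is installed on, plus resource data."""
    applications: Dict[str, List[str]] = field(default_factory=dict)
    resources: Dict[str, dict] = field(default_factory=dict)


class MetaScheduler:
    """(iii) Front end to the Grid middleware, stubbed here by a status dict."""

    def __init__(self, status: Dict[str, dict]):
        self._status = status

    def resource_status(self) -> Dict[str, dict]:
        return dict(self._status)

    def submit(self, app: str, resource: str) -> str:
        # A real implementation would hand the job to the middleware.
        return f"{app}@{resource}"


class ResourceBroker:
    """(ii) Chooses a resource for an application."""

    def __init__(self, info: InformationSystem, scheduler: MetaScheduler):
        self.info = info
        self.scheduler = scheduler

    def choose(self, app: str) -> str:
        status = self.scheduler.resource_status()
        online = [r for r in self.info.applications.get(app, [])
                  if status.get(r, {}).get("up")]
        # Placeholder policy (shortest expected wait); a QoS cost function would go here.
        return min(online, key=lambda r: status[r]["wait_s"])


@dataclass
class MonitoringSystem:
    """(iv) Records how applications behave during execution."""
    records: List[dict] = field(default_factory=list)

    def record(self, app: str, resource: str, runtime_s: float) -> None:
        self.records.append({"app": app, "resource": resource, "runtime_s": runtime_s})


# Hypothetical two-cluster Grid.
info = InformationSystem(
    applications={"SpecuLOOS": ["clusterA", "clusterB"]},
    resources={"clusterA": {}, "clusterB": {}},
)
sched = MetaScheduler({"clusterA": {"up": True, "wait_s": 600},
                       "clusterB": {"up": True, "wait_s": 60}})
broker = ResourceBroker(info, sched)
job = sched.submit("SpecuLOOS", broker.choose("SpecuLOOS"))
monitor = MonitoringSystem()
monitor.record("SpecuLOOS", "clusterB", runtime_s=3600.0)
print(job)  # SpecuLOOS@clusterB
```

Keeping the middleware behind a single class is one way to obtain the middleware independence described above: only `MetaScheduler` would change when switching from one Grid middleware to another.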
The first novel concept of ÏANOS is that applications are parameterized and their needs characterized, and that resources are parameterized too, so that the resource broker knows what each resource provides (for instance, computing power, network performance or memory bandwidth per node). These parameterizations are used to predict the execution time of a given application instance on a given resource, depending on its input data (such as the size of the problem, a previous stage of a simulation, or the accuracy required by a numerical simulation).
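The sketch below shows how such parameterizations might be combined into a runtime prediction. It is not the ÏANOS model: it assumes a simple roofline-style bound in which work and memory traffic scale linearly with problem size, compute and memory transfers overlap, and inter-node communication does not. All field names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    nodes: int
    gflops_per_node: float       # sustained compute per node, GFLOP/s
    mem_bw_gbs_per_node: float   # memory bandwidth per node, GB/s
    net_bw_gbs: float            # aggregate network bandwidth, GB/s


@dataclass
class Application:
    gflop_per_unit: float        # work per unit of problem size
    gbyte_per_unit: float        # memory traffic per unit of problem size
    comm_gbyte_per_unit: float   # data exchanged between nodes per unit


def predict_runtime_s(app: Application, res: Resource, problem_size: float) -> float:
    compute = app.gflop_per_unit * problem_size / (res.nodes * res.gflops_per_node)
    memory = app.gbyte_per_unit * problem_size / (res.nodes * res.mem_bw_gbs_per_node)
    network = app.comm_gbyte_per_unit * problem_size / res.net_bw_gbs
    # Compute and memory traffic overlap (the slower one dominates);
    # communication is assumed not to overlap with either.
    return max(compute, memory) + network


app = Application(gflop_per_unit=10.0, gbyte_per_unit=40.0, comm_gbyte_per_unit=1.0)
res = Resource(nodes=4, gflops_per_node=50.0, mem_bw_gbs_per_node=20.0, net_bw_gbs=5.0)
print(predict_runtime_s(app, res, problem_size=100.0))  # 70.0 (memory-bound)
```

In this example the memory term (50 s) dominates the compute term (5 s), so adding compute power would not shorten the run: exactly the kind of information a broker can exploit when matching applications to resources.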
The brokering algorithm is the second novel concept of ÏANOS: it is the matching function between what the application requires and what the resource provides, under the user's QoS request. It is based on a cost function that accounts for all the costs of submitting an application: execution cost, waiting-time cost, licence cost, ecological cost and data-transfer cost. Note that the execution cost includes all the fixed costs of using a resource within a data centre. We propose a model that includes investment, personnel, maintenance fees, interest on loans, infrastructure, management, overheads, insurance fees and margins. This information is stored in the information system and entered once by the system administrator through a Web interface. The QoS specified by the user can be “I want my results as soon as possible, regardless of the cost”, “I want my results for the smallest amount of money, regardless of the time”, or a mix of both. At submission time, the meta-scheduler gathers the status of all the resources. The overall cost is then computed for every resource and minimized, and the job is assigned to the best and most energy-efficient resource. The Grid of resources is thus used better, and 'science per watt' increases. The process is completely automatic: the user need only choose the application, select the QoS and press the 'submit' button.
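A minimal sketch of a QoS-weighted cost minimization of this kind follows. It assumes a single weight `alpha` (1 means "as soon as possible", 0 means "as cheap as possible"), models the ecological cost as energy used times a hypothetical price per kWh, folds the fixed data-centre costs into one hourly rate, and breaks ties in favour of lower energy use. None of these names or formulas are taken from the ÏANOS implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    runtime_h: float      # predicted execution time on this resource
    wait_h: float         # expected queue wait reported by the meta-scheduler
    rate_per_h: float     # fixed costs of the resource amortized per hour
    licence_per_h: float  # licence fee per hour of use
    power_kw: float       # average power draw while running the job
    transfer_cost: float  # cost of moving input and output data


def money_cost(c: Candidate, price_per_kwh: float, wait_cost_per_h: float) -> float:
    execution = c.rate_per_h * c.runtime_h
    waiting = wait_cost_per_h * c.wait_h
    licence = c.licence_per_h * c.runtime_h
    ecological = c.power_kw * c.runtime_h * price_per_kwh  # assumed energy-based
    return execution + waiting + licence + ecological + c.transfer_cost


def choose(cands: List[Candidate], alpha: float,
           price_per_kwh: float = 0.2, wait_cost_per_h: float = 0.0) -> Candidate:
    """alpha = 1: results as soon as possible; alpha = 0: cheapest results."""
    money = {c.name: money_cost(c, price_per_kwh, wait_cost_per_h) for c in cands}
    elapsed = {c.name: c.wait_h + c.runtime_h for c in cands}
    m_min, t_min = min(money.values()), min(elapsed.values())

    def score(c: Candidate):
        # Normalize both criteria to the best candidate, then blend them by alpha.
        blended = alpha * elapsed[c.name] / t_min + (1 - alpha) * money[c.name] / m_min
        # On equal scores, prefer the candidate that uses the least energy.
        return (blended, c.power_kw * c.runtime_h)

    return min(cands, key=score)


fast = Candidate("fast", runtime_h=2.0, wait_h=1.0, rate_per_h=50.0,
                 licence_per_h=0.0, power_kw=100.0, transfer_cost=10.0)
cheap = Candidate("cheap", runtime_h=6.0, wait_h=0.0, rate_per_h=5.0,
                  licence_per_h=0.0, power_kw=20.0, transfer_cost=2.0)
for alpha in (1.0, 0.5, 0.0):
    print(alpha, choose([fast, cheap], alpha).name)
```

With these made-up numbers, `alpha = 1.0` selects `fast`, while `alpha = 0.5` and `alpha = 0.0` select `cheap`. Normalizing each criterion by its best value is one simple way to blend hours and money, which have different units; other weightings are equally plausible.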
ÏANOS therefore achieves energy efficiency through three mechanisms, each acting at a different stage of running an application. First, ÏANOS selects the most appropriate resources based on the needs of the application, i.e., from the available resources it selects the set that will execute the application with the highest efficiency and the lowest execution time. Second, ÏANOS schedules the application on the most energy-efficient resource in that set, according to the QoS specified by the user. Third, ÏANOS goes even further: since it knows in advance what the application needs, it can modify the target architecture, for instance by switching off a processor to increase energy efficiency without reducing performance. As an additional benefit, by comparing the actual runtime of the application with the runtime predicted from its theoretical behaviour, the user can detect problems in the implementation and pinpoint flaws in the code.
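The third mechanism and the runtime comparison can be illustrated with a simple model. The sketch assumes that compute time shrinks with the number of active cores while memory time is capped by the node's shared memory bandwidth, so a memory-bound code loses nothing when cores are switched off. The function names, the numbers and the 25% tolerance are hypothetical choices, not part of ÏANOS.

```python
def predicted_time_s(gflop, gbyte, active_cores, gflops_per_core, mem_bw_gbs):
    # Compute time shrinks with more cores; memory time is capped by the
    # bandwidth of the node, which all cores share.
    return max(gflop / (active_cores * gflops_per_core), gbyte / mem_bw_gbs)


def min_active_cores(gflop, gbyte, cores, gflops_per_core, mem_bw_gbs):
    """Fewest active cores that still match the predicted time with all cores on."""
    full = predicted_time_s(gflop, gbyte, cores, gflops_per_core, mem_bw_gbs)
    for active in range(1, cores + 1):
        t = predicted_time_s(gflop, gbyte, active, gflops_per_core, mem_bw_gbs)
        if t <= full * (1 + 1e-9):  # small tolerance for floating-point noise
            return active
    return cores


def runtime_deviates(predicted_s, measured_s, tolerance=0.25):
    """True if the run took more than `tolerance` (as a fraction) longer than predicted."""
    return (measured_s - predicted_s) / predicted_s > tolerance


print(min_active_cores(100, 400, 4, 10, 20))   # memory-bound: 1 core is enough
print(min_active_cores(1000, 400, 4, 10, 20))  # compute-bound: all 4 cores needed
print(runtime_deviates(100.0, 160.0))          # True: worth inspecting the code
```

A measured runtime well above the prediction does not prove a bug, but it marks the run as worth investigating, which is how the comparison can help pinpoint flaws in an implementation.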
In the context of the CoreGRID Network of Excellence and the Swiss ISS project, a prototype of ÏANOS was developed and implemented, and in mid-2008 it was tested on an international testbed spanning Switzerland and Germany. We plan to develop and deploy the ÏANOS services for the European e-Infrastructure through a new European project.
Fraunhofer SCAI, Germany
Tel: +49 2241 14 2280
Fraunhofer SCAI, Germany
Tel: +49 2241 14 2248