ElasticSSI: Self-optimizing Metacomputing through Process Migration and Elastic Scaling

by Philip Healy, John Morrison, Ray Walshe

Single System Image (SSI) systems allow a collection of discrete machines to be presented to the user in the guise of a single virtual machine. Similarly, Infrastructure-as-a-Service (IaaS) interfaces allow one or more physical machines to be presented to the user in the guise of a collection of discrete virtual machines. Creating an SSI instance from a pool of virtual resources provisioned from an IaaS provider affords the ability to leverage the “on-demand” nature of IaaS to quickly and easily adjust the size of the resource pool. With ElasticSSI we propose to automate this adjustment process through the application of elastic scaling. The automation of the scaling process will result in systems that are self-optimizing with respect to resource utilization: virtual resources are allocated and released based on the value of system load metrics, which in turn depend on the resources allocated. The scaling process is transparent to the end user as the SSI system maintains the illusion of interacting with a single Linux OS instance.

Single System Image (SSI) is a metacomputing paradigm where a number of discrete computing resources are aggregated and presented to the user via an interface that maintains the illusion of interaction with a single system. Although the interface chosen could represent any one of a number of resource types at varying levels of abstraction, the term SSI has become synonymous with the abstraction of clusters of machines at the kernel level. In such systems, parallelism is typically achieved by transparently migrating processes between the machines in the cluster. The greatest advantage of the process migration approach to parallelism is that in most cases the user does not need to modify application code in order to parallelize a particular workload. However, performance gains are only possible if the workload in question is amenable to process migration. Suitable applications include long-running shell scripts, place and route, and 3D rendering.
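To make the idea of load balancing through process migration concrete, the following is a minimal sketch of a single greedy balancing decision. It is not the algorithm used by any particular SSI implementation; the function name pick_migration, the min_gap parameter and the use of runnable-process counts as the load metric are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def pick_migration(node_loads: Dict[str, int],
                   min_gap: int = 2) -> Optional[Tuple[str, str]]:
    """Suggest one process move as (source_node, target_node).

    node_loads maps a node name to its number of runnable processes
    (a hypothetical load metric). Returns None when there are fewer
    than two nodes or the cluster is already balanced.
    """
    if len(node_loads) < 2:
        return None

    # Greedy choice: move work from the busiest node to the idlest one.
    source = max(node_loads, key=node_loads.get)
    target = min(node_loads, key=node_loads.get)

    # A gap of one process cannot be improved by moving a single
    # process, so require a gap of at least min_gap (default 2).
    if node_loads[source] - node_loads[target] < min_gap:
        return None
    return source, target
```

Calling pick_migration repeatedly, and updating the counts after each move, drives the per-node loads towards each other without any change to the application code, which is the property that makes the approach attractive.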

The rise of virtualization in recent years has led to the popularization of Infrastructure-as-a-Service (IaaS). Under the IaaS model, resources (such as virtual machine instances and block devices) can be provisioned as needed by the end user. Although the IaaS model can be applied to physical hardware, it is typically used to provide access to virtualized resources. The IaaS provisioning process becomes more interesting when it is automated, i.e., the system modifies itself by allocating and deallocating resources in response to rises and falls in demand, a technique referred to as elastic scaling. Virtualization and SSI solve similar problems in that both divorce the resources available to an OS from the physical hardware.
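An elastic scaling decision of the kind described above can be sketched as a simple threshold rule. The sketch below is illustrative only and does not represent the ElasticSSI design: the ScalingPolicy class, its threshold values, the node limits and the desired_node_count function are hypothetical, and the load metric is assumed to be an average utilization between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # All values are hypothetical defaults chosen for illustration.
    scale_up_load: float = 0.80    # add a node above this average load
    scale_down_load: float = 0.30  # release a node below this average load
    min_nodes: int = 1
    max_nodes: int = 16


def desired_node_count(current_nodes: int, avg_load: float,
                       policy: ScalingPolicy) -> int:
    """Return the pool size to aim for after one evaluation cycle."""
    if avg_load > policy.scale_up_load:
        # Demand is rising: allocate one more virtual machine,
        # but never exceed the configured maximum.
        return min(current_nodes + 1, policy.max_nodes)
    if avg_load < policy.scale_down_load:
        # Demand is falling: release one virtual machine,
        # but never drop below the configured minimum.
        return max(current_nodes - 1, policy.min_nodes)
    # Load is between the two thresholds: leave the pool unchanged.
    return current_nodes
```

Keeping a gap between the two thresholds is a deliberate choice in this sketch: because adding a node lowers the average load, a single shared threshold could cause the pool to oscillate between two sizes.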

To date the SSI approach has been applied mostly to physical compute clusters and grids, where it has gained some popularity but has never emerged as a mainstream technique. The reasons underlying the lack of uptake for SSI were examined in a series of blog posts by Greg Pfister, a former IBM Distinguished Engineer and cluster computing specialist (see link below). Pfister identified four obstacles: availability during OS updates, application compatibility issues, cognitive inertia, and the difficulty of matching the parallelism of a workload to the underlying computing resources available. We are cognizant of these issues and postulate another: barrier to entry. The creation and administration of compute clusters is an activity outside the core competency of many organizations, and so tends to be performed only in situations where the need is acute. Taking a chance on exotic, experimental systems such as SSI, with the associated kernel patching, appears to be a bridge too far for most organizations.

Figure 1: As the load on an ElasticSSI instance increases, new virtual machines are added and processes are migrated.

The ElasticSSI project, led by Philip Healy and John Morrison of University College Cork in conjunction with Ray Walshe of Dublin City University, intends to address these issues by creating an SSI implementation that builds on the IaaS model to provide elastic scaling. We believe that the barrier to entry for SSI can be removed by making our implementation available as a Platform-as-a-Service (PaaS) offering; users will be able to create an instance of our system at the push of a button and experiment from there for evaluation purposes. The PaaS approach also solves the issue of OS updates as users simply create a new ElasticSSI instance whenever an OS upgrade is required. We do not propose to take any significant steps to address application compatibility or workload matching as these go against the simplicity that characterizes the SSI approach; in our view, if the user's anticipated workload is not a good fit from the outset then attempting to force a fit will rapidly lead to diminishing returns. The issue of cognitive inertia is, we believe, closely related to the barrier to entry issue; solving the latter would go a long way towards addressing the former.

Related work includes MIT’s Factored Operating System (fos), which is an attempt to create a scalable cloud-based SSI operating system from scratch. Although this approach has considerable merit, the scope of the fos project is much more ambitious than the simple “process migration with elastic scaling” approach advocated here. In contrast, ElasticSSI will be a minimal implementation based on an existing Linux distribution. This is in line with our goal of reaching as wide an audience as possible and encouraging experimentation. In light of this, we have decided that the best starting point is one of the existing Linux-based SSI implementations, of which there are several. These are typically implemented as a set of kernel patches that implement the required OS modifications along with user-mode tools and libraries that implement complementary non-kernel functionality. Existing implementations include MOSIX, OpenMOSIX, LinuxPMI, Kerrighed and OpenSSI. A final decision on which implementation to adopt will be taken once a thorough evaluation of each has been completed.

SSI implementations do not require homogeneity across the machines that make up the underlying resource pool. Similarly, IaaS providers typically offer a range of virtual machine types, with some types being tailored for compute-intensive workloads (by increasing the number and type of available cores) and others similarly geared towards memory-intensive workloads (by increasing the amount of memory available). An active area of research will be to introduce heterogeneity into the resource pool by allocating virtual machine instances of varying types based on the value of metrics such as system-wide CPU load and memory utilization.
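One way such a metric-driven choice of instance type might look is sketched below. This is an assumption-laden illustration rather than a description of ElasticSSI: the function choose_instance_type, the type labels "compute-optimized" and "memory-optimized", and the 0.75 threshold are hypothetical, and both metrics are assumed to be normalized to the range 0 to 1.

```python
from typing import Optional


def choose_instance_type(cpu_load: float, mem_util: float,
                         threshold: float = 0.75) -> Optional[str]:
    """Pick the kind of virtual machine to add, or None if none is needed.

    cpu_load and mem_util are system-wide metrics in the range 0..1.
    The returned labels are placeholders, not real provider type names.
    """
    # Neither resource is under pressure: do not grow the pool.
    if cpu_load < threshold and mem_util < threshold:
        return None

    # Add capacity for whichever resource is scarcer; ties favor compute.
    if cpu_load >= mem_util:
        return "compute-optimized"
    return "memory-optimized"
```

For example, a pool reporting 90% CPU load and 40% memory utilization would receive a compute-oriented instance, whereas one reporting 50% CPU load and 80% memory utilization would receive a memory-oriented instance.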

References:
Rajkumar Buyya, Toni Cortes and Hai Jin, Single System Image, International Journal of High Performance Computing Applications, 15 (2): 124.

David Wentzlaff, Charles Gruenwald III, Nathan Beckmann, Kevin Modzelewski, Adam Belay, Lamia Youseff, Jason Miller, and Anant Agarwal, A Unified Operating System for Clouds and Manycore: fos, 1st Workshop on Computer Architecture and Operating System co-design (CAOS), Jan 2010.

Link:
http://perilsofparallel.blogspot.com/2009/01/multi-multicore-single-system-image.html

Please contact:
Philip Healy
Irish Centre for Cloud Computing and Commerce, Ireland
Tel: +353 21 4205935