XtreemOS is an open-source distributed operating system for large-scale dynamic Grids. It was developed within the XtreemOS European project, funded by the European Commission under the Sixth Framework Programme (FP6).
XtreemOS can be seen as an alternative to traditional Grid middleware, facilitating the use of federated resources for scientific and business communities. The XtreemOS operating system provides for Grids what a traditional operating system offers for a single computer: abstraction from the hardware and secure resource sharing between different users. When a user runs an application on XtreemOS, the system automatically finds all resources necessary for the execution. It simplifies the user’s work by giving them the illusion of using a traditional computer. XtreemOS supports legacy Linux applications as well as Grid-aware MPI and SAGA applications. Applications can be run in the background or interactively. Interactive execution allows numerical simulation platforms such as MATLAB to be used on the Grid, and it considerably eases the debugging of Grid applications.
The XtreemOS system provides three major services to users: application execution management (AEM), data management (XtreemFS) and virtual organization management (X-VOMS). The application execution manager provides scalable resource discovery through a peer-to-peer overlay that connects all self-described resources. XtreemOS provides location-independent access to user data through XtreemFS (http://www.xtreemfs.org), a POSIX-compliant file system spanning the Grid. User management in XtreemOS is delegated to virtual organization managers. Access rights to resources are based on policies. Policy rules are defined by virtual organizations as well as by administration domains. They are checked at reservation time and are enforced on resources during execution.
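The reservation-time policy check described above can be pictured with a small sketch. It is not XtreemOS code: the `Request` type, the rule functions and `may_reserve` are hypothetical names, and the sketch assumes that a reservation is granted only when every virtual organization rule and every administration domain rule accepts it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Request:
    """Hypothetical reservation request; not an XtreemOS data structure."""
    user: str
    vo: str
    cpus: int


# A rule is any predicate over a request (an illustrative simplification).
Rule = Callable[[Request], bool]


def may_reserve(request: Request,
                vo_rules: Iterable[Rule],
                domain_rules: Iterable[Rule]) -> bool:
    """Grant the reservation only if all rules from both sources accept it.

    Assumption: VO rules and administration-domain rules are combined by
    conjunction. The article only says that both kinds of rules exist and
    are checked at reservation time.
    """
    return (all(rule(request) for rule in vo_rules)
            and all(rule(request) for rule in domain_rules))


if __name__ == "__main__":
    vo_rules = [lambda r: r.vo == "physics"]    # defined by the VO
    domain_rules = [lambda r: r.cpus <= 8]      # defined by the resource owner
    print(may_reserve(Request("alice", "physics", 4), vo_rules, domain_rules))
    print(may_reserve(Request("alice", "physics", 16), vo_rules, domain_rules))
```

Keeping the two rule sets separate mirrors the split in the text between policies set by virtual organizations and policies set by administration domains.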
Cloud Computing: A New Playground for XtreemOS
While XtreemOS was originally designed for Grids, it now appears to be an attractive technology for Cloud computing. During the last two years, we conducted a number of feasibility studies demonstrating that XtreemOS is highly relevant in the context of virtualized distributed computing infrastructures.
Infrastructure as a Service (IaaS) refers to systems that provide their users with computing resources delivered as a service, including servers, network equipment, memory, CPU and disk space. Although the term was coined after the start of the XtreemOS project, it precisely describes the goal of XtreemOS: to provide users with computing resources that can be assigned to them dynamically when required. In particular, AEM and XtreemFS can legitimately be classified as IaaS components.
AEM allows users to reserve machines through an XtreemOS Virtual Organization (VO) when they need to execute a job. In this sense, it is directly comparable to Amazon’s EC2 and other similar services. The two types of services differ in four main aspects:
1. Different computation-as-a-service offers rely on different APIs. So far no clear consensus has emerged regarding a standard access API for such services. On the other hand, AEM is meant to be invoked via the XOSAGA API, which relies on the standard SAGA API for Grid Applications. One could relatively easily build an EC2-compatible API to the AEM.
2. IaaS services typically rely on virtualization techniques, where resources are offered to users in the form of a full virtualized operating system instance. The AEM, by contrast, executes jobs directly, with no virtualization. This difference arises from the different requirements of the two kinds of service. Cloud platforms must be usable by a very wide range of users, each of whom may want to use a different operating system and work in total isolation from others. XtreemOS, on the other hand, intends to be the standard operating system used to develop Grid applications, so there is no need to virtualize XtreemOS on top of XtreemOS. XtreemOS also provides strong isolation between multiple jobs running on the same hardware through the use of Linux containers.
3. IaaS platforms use a pay-as-you-go pricing model, while XtreemOS relies on the trust relationships between system administrators of a VO to implement a shared resource available to all users of the VO.
4. The security of IaaS platforms relies on a one-to-one trust relationship between the cloud provider and the cloud customer. On the other hand, the VO support in XtreemOS allows one to support several potentially mutually distrustful Cloud providers, allowing Cloud customers to select the provider of their choice.
XtreemFS allows Grid users to store data efficiently and share it across the whole system. In this sense, it is directly comparable to Amazon’s S3 and other similar services. Again, no standard API for Cloud storage seems to have emerged yet. Most Cloud storage services provide only basic functionality, allowing a user to write blocks of files that can be read later but remain immutable. Conversely, XtreemFS implements the full POSIX API, so files can be updated and overwritten. One could relatively easily build an S3-compatible API to XtreemFS.
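The practical difference between immutable blocks and a POSIX file system is that a file can be modified in place with ordinary file calls. The sketch below uses only standard Python file operations; the file name `results.dat` is hypothetical, and a temporary directory stands in for a mounted XtreemFS volume, since nothing in the code is specific to XtreemFS.

```python
import os
import tempfile


def overwrite_in_place(path: str, offset: int, data: bytes) -> None:
    """Overwrite part of an existing file without rewriting the whole file."""
    with open(path, "r+b") as f:   # open for update, do not truncate
        f.seek(offset)
        f.write(data)


def demo(mount_point: str) -> bytes:
    """Create a file, patch a few bytes of it in place, and return its content."""
    path = os.path.join(mount_point, "results.dat")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"status=RUNNING\n")
    # Replace the 7 bytes after "status=" with a padded value of equal length.
    overwrite_in_place(path, len(b"status="), b"DONE   ")
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    # Assumption: a temporary directory stands in for an XtreemFS mount point.
    with tempfile.TemporaryDirectory() as stand_in:
        print(demo(stand_in))
```

With a store that only supports immutable objects, the same change would require writing a complete new object rather than patching seven bytes.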
Although XtreemOS was not originally designed for Cloud computing applications, it does provide a good base platform for developing advanced Cloud computing functionality. We selected the specific topic of scalable database support to demonstrate how one can deploy Cloud functionality on XtreemOS.
Relational databases such as Oracle have been popular for decades. However, the great expressive power of the SQL query language makes it very difficult to scale them up by using large numbers of computers instead of a single powerful database server.
A new family of scalable database systems is being developed for Cloud computing environments, exemplified by Amazon.com’s SimpleDB, Google’s Bigtable, Yahoo’s PNUTS and Facebook’s Cassandra. These systems scale nearly linearly with the number of servers they are using, thanks to the systematic use of automatic data partitioning. On the other hand, they do not support the SQL language but rather provide a simpler query language. Data are organized in tables, which can be queried by primary key only. Similarly, these systems do not support join operations. As restrictive as such limitations may look, they do allow construction of useful applications.
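The data model described above, tables partitioned automatically across servers and accessed by primary key only, can be sketched in a few lines. This is an illustration, not the design of any of the systems named: the class `PartitionedTable` is hypothetical, in-memory dictionaries stand in for servers, and simple hash partitioning is used purely for brevity, whereas real systems use their own partitioning schemes.

```python
import hashlib
from typing import Dict, List, Optional


class PartitionedTable:
    """Toy table whose rows are spread over several 'servers' by primary key."""

    def __init__(self, num_servers: int) -> None:
        if num_servers < 1:
            raise ValueError("at least one server is required")
        # Each dictionary stands in for the storage of one server.
        self._partitions: List[Dict[str, dict]] = [{} for _ in range(num_servers)]

    def _server_for(self, key: str) -> int:
        """Map a primary key to a server index with a stable hash."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._partitions)

    def put(self, key: str, row: dict) -> None:
        """Store a row on the single server responsible for its key."""
        self._partitions[self._server_for(key)][key] = dict(row)

    def get(self, key: str) -> Optional[dict]:
        """Look up a row by primary key; only one server is contacted."""
        return self._partitions[self._server_for(key)].get(key)


if __name__ == "__main__":
    table = PartitionedTable(num_servers=4)
    table.put("user:42", {"name": "Ada", "country": "UK"})
    print(table.get("user:42"))
    print(table.get("user:43"))
```

Every operation touches exactly one partition, which is why adding servers adds capacity. A join or a query on a non-key column would have to visit every partition, which is why such systems leave those operations out.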
To demonstrate that XtreemOS is a suitable platform for Platform as a Service (PaaS) Cloud computing, we ported the HBase system (an open-source clone of Bigtable) to XtreemOS. This provides XtreemOS with a scalable database service that can be used by Grid applications to store and query their structured data. Our performance evaluations show that HBase performs well on XtreemOS and allows Grid developers to write scalable data-intensive applications easily.
Contrail: an Integrated Approach to Virtualization
The experience acquired in the design of distributed operating systems for the Grid can be exploited to deliver, in a timely fashion, a system for dependable federated Clouds. The goal of the Contrail project is to develop, evaluate and promote an open-source system for Cloud federations. Contrail will leverage and extend the results from the XtreemOS project. As illustrated in Figure 1, the individual resources contributed to the federated Cloud will be highly heterogeneous in their hardware configuration and system-level organization. They may take the form of physical machines running the XtreemOS system (see panel 1), virtual instances from external Clouds (panel 2), virtual machines running XtreemOS (panel 3), or XtreemOS machines running virtualization software (panel 4).
Figure 1: individual resources being contributed to the Federated Cloud.
Contrail will vertically integrate an open-source distributed operating system for autonomous resource management in Infrastructure-as-a-Service environments with high-level services and runtime environments that serve as foundations for Platform-as-a-Service. The main achievement will be a tightly integrated open-source software stack, including a comprehensive set of system, runtime and high-level services that provide standardized interfaces for cooperation and resource sharing over Cloud federations.
Contrail will address key technological challenges in existing commercial and academic Clouds: the lack of standardized, rich and stable interfaces; limited trust from customers; and relatively poor Quality of Service (QoS) guarantees and SLA support regarding the performance and availability of Cloud resources. Addressing these issues is fundamental to supporting large user communities of individual citizens and organizations that rely on Cloud resources for their mission-critical applications.
Christine Morin, Yvon Jégou
INRIA Rennes - Bretagne Atlantique, France
VU University Amsterdam, The Netherlands