by Thorsten Schuett and Guillaume Pierre
ConPaaS makes it easy to write scalable Cloud applications without worrying about the complexity of the Cloud.
ConPaaS is the platform as a service (PaaS) component of the Contrail FP7 project. It provides a runtime environment that facilitates deployment of end-user applications in the Cloud. The team encompasses developers and researchers from the Vrije Universiteit in Amsterdam, the Zuse Institute in Berlin, and XLAB in Ljubljana.
In ConPaaS, applications are organized as a collection of services. ConPaaS currently provides services for web hosting (PHP and Java), SQL and NoSQL databases (MySQL and Scalaris), data storage (XtreemFS) and large-scale data processing (TaskFarming and MapReduce). Using these services, a bioinformatics application could, for example, be composed of a MapReduce service backend to process genomic data, together with a Web hosting service and an SQL database service to provide a Web-based graphical interface to the users. Each service can be scaled on demand to match its computing resources to the capacity needs of the application.
ConPaaS contains two services specifically dedicated to Big Data: MapReduce and TaskFarming. MapReduce provides users with the well-known parallel programming paradigm. TaskFarming allows the automatic execution of a large collection of independent tasks such as those issued by Monte-Carlo simulations. The ability of these services to dynamically vary the number of Cloud resources they use makes them well suited to very large computations: one only needs to scale services up before a big computation, and scale them down afterwards. This organization provides all the benefits of Cloud computing to application developers, who do not have to worry about Cloud-specific details.
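The task-farming idea can be illustrated with a small local sketch. It is not the ConPaaS TaskFarming API: the function names `monte_carlo_task` and `run_task_farm` are hypothetical, and a thread pool from the Python standard library merely stands in for a set of Cloud resources (in standard Python it will not actually speed up this CPU-bound loop). The point is that each task is independent and seeded, so the combined result does not depend on how many workers process the collection.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def monte_carlo_task(seed, samples=10_000):
    """One independent task: count random points inside the unit quarter circle."""
    rng = random.Random(seed)  # a private, seeded generator keeps the task reproducible
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def run_task_farm(seeds, workers, samples=10_000):
    """Run one task per seed on a pool of `workers` and combine the counts into a pi estimate.

    `workers` plays the role of the number of resources given to the service (an assumption
    made for this illustration only).
    """
    seeds = list(seeds)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = list(pool.map(lambda s: monte_carlo_task(s, samples), seeds))
    return 4.0 * sum(hits) / (samples * len(seeds))


small_pool = run_task_farm(range(20), workers=2)
large_pool = run_task_farm(range(20), workers=8)
```

Because the tasks share no state, changing the pool size between runs changes only how the work is distributed, not the estimate itself. This independence is what makes such workloads easy to scale up and down.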
An important requirement of all Big Data applications is a scalable file system where input and output data can be efficiently stored and retrieved. ConPaaS comes with the XtreemFS distributed file system for clouds. Like ConPaaS services, XtreemFS is designed to be highly available and fully scalable. Unlike most other file systems for the Cloud, XtreemFS provides a POSIX API. This means that an XtreemFS volume can be mounted locally, giving transparent access to files in the Cloud.
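In practice, a POSIX interface means that ordinary file operations work unchanged on a mounted volume. The following minimal sketch assumes a volume is already mounted at some local directory. The helper names and the `results` subdirectory are hypothetical, and a temporary directory stands in for the mount point so that the example runs anywhere.

```python
import tempfile
from pathlib import Path


def store_result(volume_root, name, text):
    """Write a text file below the (assumed) mount point using plain file I/O."""
    path = Path(volume_root) / "results" / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


def load_result(volume_root, name):
    """Read the file back; no storage-specific client library is involved."""
    return (Path(volume_root) / "results" / name).read_text(encoding="utf-8")


# A temporary directory stands in for the local mount point of a volume.
with tempfile.TemporaryDirectory() as mount_point:
    store_result(mount_point, "run1.txt", "example output\n")
    content = load_result(mount_point, "run1.txt")
```

Existing programs that read and write local files therefore need no changes to work on data stored in the Cloud; only the directory they point at differs.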
Figure 1: The main ConPaaS dashboard with three services running
One of our demonstrator applications is a Wikipedia clone. It can load database dumps of the official Wikipedia and store their content in the Scalaris NoSQL database service. The business logic is written in Java and runs in the Web hosting service. Deploying Wikipedia in the Cloud takes about 10 minutes. Increasing the processing capacity of the application requires two mouse clicks.
One of ConPaaS’s Big Data use cases is a bioinformatics application that analyses large datasets across distributed computers. It processes large amounts of data from ChIP-Seq experiments, a type of genomic analysis, and its computation can be parallelized across multiple instances or processors to analyse the data faster. The application stores its data in XtreemFS and makes extensive use of ConPaaS’s MapReduce, TaskFarming, and Web hosting services. Users will access the application either directly through an API or through a web interface.
Although ConPaaS is already sufficiently mature to support challenging applications, we have many plans for further developments. In the near future, instead of manually choosing the number of resources each service should use, a user will be able to specify the performance she expects. ConPaaS will dimension each service such that the system meets its performance guarantees, while using the smallest possible number of computing resources. In the wiki example, for instance, one may want to request that user requests are processed on average in no more than 500 milliseconds.
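To make the dimensioning goal concrete, here is a minimal sketch of the underlying search: find the smallest number of instances whose predicted mean response time meets the target. It does not describe how ConPaaS will make this decision. The performance model is an assumption chosen for illustration (load split evenly over identical instances, each behaving as an M/M/1 queue), and all function and parameter names are hypothetical.

```python
def mean_response_time(instances, arrival_rate, service_rate):
    """Predicted mean response time in seconds under the assumed M/M/1-per-instance model.

    arrival_rate: total requests per second for the whole service
    service_rate: requests per second that a single instance can process
    """
    per_instance_load = arrival_rate / instances
    if per_instance_load >= service_rate:
        return float("inf")  # the instances are saturated and queues grow without bound
    return 1.0 / (service_rate - per_instance_load)


def smallest_deployment(arrival_rate, service_rate, target_seconds, max_instances=1000):
    """Return the smallest instance count meeting the target, or None if none does."""
    for n in range(1, max_instances + 1):
        if mean_response_time(n, arrival_rate, service_rate) <= target_seconds:
            return n
    return None


# Illustrative numbers only: 100 requests/s in total, 10 requests/s per instance,
# and a target mean response time of 500 milliseconds.
needed = smallest_deployment(arrival_rate=100, service_rate=10, target_seconds=0.5)
```

Under these illustrative numbers the search settles on 13 instances: with 12, each instance receives about 8.3 requests per second and the model predicts 0.6 seconds, just above the target. A real system would replace the queueing formula with measurements or a model calibrated to the actual service.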
We plan to allow users to upload complex applications in a single operation. Instead of starting and configuring multiple ConPaaS services one by one, a user will be able to upload a single manifest file describing the entire application organization. Thanks to this manifest, ConPaaS will be able to orchestrate the deployment and configuration of entire applications automatically.
Finally, we plan to provide an SDK for external users to implement their own services. For example, one could write a new service for demanding statistical analysis, for video streaming, or for scientific workflows. The platform will allow third-party developers to upload their own service as a plugin to the existing ConPaaS system.
In conclusion, ConPaaS is a runtime environment for Cloud applications. It takes care of the complexity of Cloud environments, letting application developers focus on what they do best: program great applications to satisfy their customers’ needs.
Vrije Universiteit in Amsterdam, The Netherlands
Zuse Institute in Berlin, Germany