by Luca Console and Mariagrazia Fugini
The WS-DIAMOND project aims at developing a framework for Web services that are endowed with self-diagnosis and self-repair capabilities. There are two main goals: the definition of an operational framework for self-healing execution of conversationally complex Web services, where monitoring, detection and diagnosis of anomalous situations are carried out and repair/reconfiguration is performed; and the definition of a methodology and tools for service design that guarantee run-time diagnosability and repairability.
The WS-Diamond Project, funded by the EU Commission under the FET Open Framework (EU IST FET-STREP no. 516933), is developing a framework for self-healing Web services; that is, services able to self-monitor, to self-diagnose the causes of a failure, and to recover from both functional failures (eg the inability to provide a given service) and non-functional failures (eg loss of Quality of Service, QoS). The focus of WS-Diamond is on composite and conversationally complex Web services. The second goal of WS-Diamond is to devise guidelines and tools for designing services in such a way that they can be easily diagnosed and recovered at execution time, as well as tools to support the design of complex self-healing processes.
WS-Diamond commenced in September 2005 and will end in February 2008. The project involves universities and research institutions in Italy (Università di Torino, the coordinator, and Politecnico di Milano), Austria (University of Klagenfurt and University of Vienna), France (LAAS-CNRS-Toulouse, IRISA-Université Rennes, Université Paris-Sud), and the Netherlands (University of Amsterdam).
The project assumes that the availability and reliability of complex services will be of paramount importance in the near future. Indeed, the reliability and availability of software, together with the possibility of creating self-healing software, are recognized as among the major challenges for IST research in coming years. Hence, WS-Diamond research addresses a number of "grand challenges" described in the Service-Oriented Computing research roadmap at all levels: dynamic connectivity capabilities based on service discovery, QoS-aware service composition, and design principles for self-healing.
This work considers complex Web services, described using Web service workflow languages and frameworks such as BPEL4WS and extended Petri Nets. These services include mechanisms for augmenting processes with monitoring functionality, following a methodological approach that focuses on exception handling and compensation mechanisms. The methodologies and tools aim to achieve adaptive Web-based process execution based on flexible services. Attention is paid to conforming to the interaction patterns between organizations and to providing inherent flexibility and fault tolerance in process execution.
In its first phase (September 2005-January 2007), WS-Diamond designed and developed a platform for self-healing execution of complex Web services, concentrating on run-time faults; design issues are addressed in the second phase of the project. This has led us to define the types of faults that can occur and that we want to diagnose, namely:
functional faults and specifically semantic data errors (eg wrong data exchanges, wrong data in databases, wrong inputs from user)
In this first phase, WS-Diamond concentrated on orchestrated services, even though some of the proposed solutions already take choreographed services into account. We extended Web service execution environments to include features that support the diagnostic and fault recovery process. An architecture that supports self-healing service execution was then defined. The architecture supports associating a diagnostic service with each application service and gathering observations about service execution (eg data exchanged between services), and it provides a repair service in the form of sets of recovery and repair actions. The architecture also includes a monitoring service aimed at identifying QoS problems, and a repair plan generator and executor that support the execution of recovery plans on the basis of the diagnostic information.
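The pairing of a diagnostic service with each application service can be pictured with a small sketch. It is only an illustration under stated assumptions: the class names (Observation, LocalDiagnoser), the service names and the single validity check are hypothetical and are not taken from the WS-Diamond platform. The sketch shows a diagnoser that records the messages its own service exchanges and reports which checks fail on them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Observation:
    """One message observed between two services (hypothetical structure)."""
    sender: str
    receiver: str
    payload: dict


@dataclass
class LocalDiagnoser:
    """Illustrative diagnostic service attached to one application service."""
    service: str
    # Each check returns True when a payload looks valid (assumed checks).
    checks: Dict[str, Callable[[dict], bool]] = field(default_factory=dict)
    log: List[Observation] = field(default_factory=list)

    def observe(self, obs: Observation) -> None:
        # Keep only the messages this service sent or received.
        if self.service in (obs.sender, obs.receiver):
            self.log.append(obs)

    def diagnose(self) -> List[str]:
        # Report the name of every check that fails on some logged message.
        failed = []
        for name, check in self.checks.items():
            if any(not check(o.payload) for o in self.log):
                failed.append(name)
        return failed


diagnoser = LocalDiagnoser(
    service="warehouse",
    checks={"quantity_positive": lambda p: p.get("quantity", 0) > 0},
)
diagnoser.observe(Observation("shop", "warehouse", {"item": "book", "quantity": 2}))
diagnoser.observe(Observation("shop", "warehouse", {"item": "pen", "quantity": -1}))
diagnoser.observe(Observation("shop", "bank", {"amount": 10}))  # not logged
suspects = diagnoser.diagnose()
```

Here the negative quantity stands in for a semantic data error in an exchanged message. A real diagnoser would reason over a model of the service rather than over fixed checks.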
We defined a catalogue of faults and possible observations, and proposed an architecture for the surveillance platform. The correctness of the distributed diagnosis algorithms has been proved formally. Repair is based on repair actions (retry, compensate, substitute etc) and plans (generated online or pre-prepared offline for a given process) which are executed if a failure occurs in the process and a fault is diagnosed. Repair is thus characterized as a planning problem, where the goal is to build the plan of recovery actions to be performed at run time in order to recover from service and application errors.
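As a rough illustration of repair seen as plan selection, the sketch below picks the cheapest plan from a catalogue prepared offline for a diagnosed fault. The action kinds (retry, compensate, substitute) come from the text; the fault name, the target activities, the cost values and all identifiers are assumptions made for the example and do not describe the project's actual planner.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RepairAction:
    kind: str      # "retry", "compensate" or "substitute"
    target: str    # activity or service the action applies to (hypothetical)
    cost: float    # illustrative cost, not a figure from the project


Plan = Tuple[RepairAction, ...]

# Hypothetical offline catalogue: diagnosed fault -> candidate repair plans.
CATALOGUE: Dict[str, List[Plan]] = {
    "wrong_input_data": [
        (RepairAction("compensate", "place_order", 3.0),
         RepairAction("retry", "place_order", 1.0)),
        (RepairAction("substitute", "order_service", 5.0),),
    ],
}


def plan_cost(plan: Plan) -> float:
    """Cost function: here simply the sum of the action costs."""
    return sum(action.cost for action in plan)


def select_plan(fault: str) -> Plan:
    """Return the cheapest candidate plan for a diagnosed fault."""
    candidates = CATALOGUE.get(fault, [])
    if not candidates:
        raise LookupError(f"no repair plan for fault {fault!r}")
    return min(candidates, key=plan_cost)


chosen = select_plan("wrong_input_data")
steps = [f"{a.kind}:{a.target}" for a in chosen]
```

With these made-up costs, compensating and retrying the activity is cheaper than substituting the service, so that plan is chosen. Plans generated online would replace the catalogue lookup with a search over the available actions.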
Future activities will complete the definition of a method for designing self-healing Web services, focusing on the detailed design and development of a platform for observing the symptoms that occur in complex applications, for executing the distributed diagnosis, and for generating, selecting, and executing minimal repair plans (eg with respect to cost functions).
Politecnico di Milano, Italy
Tel: +39 02 23993405