by Tesfaye A. Zerihun, Bjarne E. Helvik, Poul E. Heegaard and John Krogstie
In a world where ICT systems are everywhere and are critical for the well-being, productivity and indeed the survivability of our society, it is crucial that they are resilient to all kinds of undesired events: random failures, mistakes, incompetence, attacks, etc. To deal with this challenge, a thorough understanding of the nature of their complexity and inter-dependencies is needed. A quantitative model of a digital ecosystem can offer insights into how management and operations can be conducted within, and coordinated across, the different autonomous domains that constitute global, complex digital ecosystems.
Interworking ICT systems have become critical infrastructure for society, and are a prerequisite for the operation of other critical infrastructures, e.g. payment systems, electricity grids and transportation. The challenges posed by these highly interwoven infrastructures were addressed in the FuturICT initiative [1], [2]. Modern society depends on the robustness and survivability of ICT infrastructure, but to achieve these qualities we must address several challenges posed by the evolution of this technology:
- The public ICT service provisioning infrastructure can be viewed as an ecosystem: the result of cooperation between many market actors. The overall ecosystem is not engineered, and there is no aggregate insight into its design and operation.
- There is no coordinated management that may deal with issues involving several autonomous systems, in spite of such issues being a likely cause of extensive problems and outages.
- It is necessary to prepare for restoration of service after a major event such as common software breakdown, security attacks or natural disasters. This preparation must include technical, operational as well as organizational and societal aspects.
- There are currently no theoretical foundations for controlling the societal and per-service dependability of this infrastructure, either from a public regulatory position or from groups of autonomous (commercially) co-operating and partly competing providers.
The objective of the Quantitative Modelling of Digital Ecosystems project is to establish a quantitative model for a digital ecosystem. The model should form the basis for a resilience engineering oriented approach [3] to deal with robustness and survivability challenges in the ICT infrastructure.
The model of an ICT infrastructure must describe the structure and behaviour of the physical and logical information and network infrastructure, including the services provided. Through the modelling phases it should also describe how resilience engineering [3] can be applied to manage the robustness and survivability of the ICT infrastructure. The simplest resilience approach is to monitor the system’s state and react to anomalies. This might work well when failure events are infrequent and the response to one event can be completed before the next occurs. The modelling should help us determine how to monitor and react to anomalies.
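As an illustration of when a purely reactive approach can be expected to work, the sketch below estimates the probability that the response to one failure is completed before the next failure occurs. It assumes, purely for illustration, that times between failures and response times are independent and exponentially distributed; the function names and the rates used are hypothetical and are not taken from the project's model.

```python
import random


def prob_response_completes(failure_rate: float, response_rate: float) -> float:
    """Closed-form P(response time < time to next failure).

    Assumes independent exponential times with the given rates
    (an illustrative assumption, not the project's model).
    """
    return response_rate / (failure_rate + response_rate)


def simulate(failure_rate: float, response_rate: float,
             n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    completed = sum(
        rng.expovariate(response_rate) < rng.expovariate(failure_rate)
        for _ in range(n)
    )
    return completed / n


if __name__ == "__main__":
    # Hypothetical rates: one failure per time unit on average,
    # responses nine times faster than failures arrive.
    print(prob_response_completes(1.0, 9.0))
    print(simulate(1.0, 9.0))
```

Under these assumptions the probability approaches 1 only when responses are much faster than failures arrive; as the two rates approach each other, overlapping events become common and monitoring plus reaction alone is no longer sufficient.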
A more realistic approach is to have both reactive and proactive responses, and to learn from experience. Again, the modelling should help achieve the insight and understanding necessary to define and take actions that will improve the resilience of the ICT system. The learning includes regulations, management guidelines and policies, which will influence the properties of the system and therefore also refine the model. The last and most crucial step in resilience engineering is to anticipate known and unknown events, so that it is possible to be proactive as well as reactive. The predictions obtained from the modelling provide important input to assessing the risk of acting too early, i.e. taking proactive measures that turn out to be a waste of time and money, against the risk of acting too late, which lets events escalate with larger consequences and a much higher cost of recovery than necessary. The holistic model of the ICT infrastructure, and the resilience engineering applied to it, are illustrated in Figure 1.
Figure 1: Conceptual sketch for a resilience engineering approach to improve ICT infrastructure robustness.
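The trade-off between acting too early and acting too late, described above, can be sketched as a simple expected-cost comparison. This is a minimal illustration under strong assumptions: a single predicted event, a known escalation probability, and fixed costs. All names and numbers are hypothetical and do not come from the project.

```python
def should_act_proactively(p_escalation: float,
                           cost_proactive: float,
                           cost_late_recovery: float) -> bool:
    """Return True if the proactive measure has the lower expected cost.

    p_escalation: predicted probability that the event escalates
        if no proactive measure is taken (hypothetical model output).
    cost_proactive: cost of the proactive measure, paid whether or
        not the event would have escalated.
    cost_late_recovery: cost of recovery if the event escalates.
    """
    if not 0.0 <= p_escalation <= 1.0:
        raise ValueError("p_escalation must be a probability in [0, 1]")
    expected_cost_of_waiting = p_escalation * cost_late_recovery
    return cost_proactive < expected_cost_of_waiting


if __name__ == "__main__":
    # Hypothetical figures in arbitrary cost units.
    print(should_act_proactively(0.10, 10.0, 200.0))  # waiting costs 20 on average
    print(should_act_proactively(0.01, 10.0, 200.0))  # waiting costs 2 on average
```

The quality of such a decision depends entirely on the quality of the predicted escalation probability, which is where a quantitative model of the ecosystem would contribute.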
This work is still at an early stage. Among the outcomes we aim to achieve are:
- A basis for a continuous cycle of monitoring, anomaly detection and handling, and system improvement, in line with the resilience engineering approach.
- Better prediction of the risks and vulnerabilities incurred by ICT services provided by a heterogeneous, ecosystem-like infrastructure.
- A basis for setting guidelines for regulation by public authorities.
Links:
NTNU/IME: Open and Autonomous Digital Ecosystems (OADE): http://www.ntnu.edu/ime/oade
NTNU QUAM Lab: Quantitative modeling of dependability and performance: http://www.item.ntnu.no/research/quam
References:
[1] D. Helbing: “Globally networked risks and how to respond”, Nature, 497(7447):51–59, 05 2013.
[2] S. Bishop: “FuturICT: A visionary project to explore and manage our future”, ERCIM News, (87) p.14, October 2011.
[3] E. Hollnagel, D. D. Woods, N. Leveson: “Resilience engineering: Concepts and precepts”, Ashgate, 2006.
Please contact:
Bjarne E. Helvik
NTNU, Norway
E-mail: