DECEPT: Detecting Cyber-Physical Attacks using Machine Learning on Log Data

by Florian Skopik, Markus Wurzenberger, and Max Landauer (AIT Austrian Institute of Technology)

Most current security solutions are tailored to protect against a narrow set of threats and can only be applied to a specific application domain. However, even very different domains share commonalities, which suggests that a generally applicable solution for advanced protection should be possible. Enterprise IT, facility management, smart manufacturing, energy grids, industrial IoT, fintech and other domains all operate interconnected systems that follow predefined processes and are used according to specific usage policies. The events generated by the systems governed by these processes are usually recorded for maintenance, accountability or auditing purposes. Such records contain valuable information that can be leveraged to detect inconsistencies or deviations in the processes and to reveal anomalies potentially caused by attacks, misconfigurations or component failures. However, the syntax, semantics, frequency, information entropy and level of detail of these data records vary dramatically, and there is not yet a uniform solution that understands all the different dialects and can perform reliable anomaly detection on top of them.

Today’s advanced process security and protection mechanisms for IT systems apply white-listing approaches based on anomaly detection: they observe events within a system and automatically establish a baseline of normal user and system behaviour. Every deviation from this normal behaviour triggers an alert. While numerous behaviour-based anomaly detection approaches for IT security exist in research [1], they are not easily applicable to non-IT-centric domains. The reason is that these approaches are usually highly optimised for very specific application areas; different approaches exist for cyber-physical systems (CPS), cloud security, etc., but they are not adaptive enough to be generally applicable to other domains. Most of them require detailed expertise in the application area and are costly to set up and maintain. Furthermore, most of them analyse network traffic only, which relies on the investigation of domain-specific protocols and becomes ineffective with the wide adoption of end-to-end encryption. Inspecting network traffic alone therefore cannot reveal the real system behaviour. Generally applicable anomaly detection solutions that utilise unstructured textual event logs, created directly by the entities in an environment (e.g., host, camera, control panel), are thus a promising means of improving security.

DECEPT technical objective
The overall goal of DECEPT is to develop a generally applicable concept of an anomaly detection (AD) approach that can be applied to various domains, and to implement a proof of concept that demonstrates the ability of the DECEPT approach to analyse and evaluate work processes, as well as to monitor environmental events, in different application areas. Figure 1 illustrates the methodology to achieve this objective. The DECEPT approach will analyse unstructured textual event data, such as syslog messages from computer systems or protocol data from manually recorded events (e.g., access logs). In the training phase (1a) a parser generator [2] analyses the text data and automatically builds event parsers, i.e., identifies implicit structures in apparently unstructured records. Then, it uses general data representation models as building blocks to iteratively increase the comprehension of text structures and embedded data types (such as dates, times, identifiers etc.). This way, parsers can be created to decompose and understand textual representations of events with no manual intervention.
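The idea of deriving a parser from the log lines themselves can be illustrated with a deliberately simplified sketch. It is not the tree-based AECID-PG algorithm of [2]; it only assumes whitespace-separated tokens and lines of equal length, and the names used here (TYPE_PATTERNS, infer_template, matches) as well as the sample log lines are hypothetical. Tokens that are identical in all training lines are kept as static text, while varying tokens become typed variables.

```python
import re

# Hypothetical data type detectors, tried in order (an assumption for this
# sketch, not taken from AECID-PG).
TYPE_PATTERNS = [
    ("INT", re.compile(r"^\d+$")),
    ("IP", re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")),
    ("TIME", re.compile(r"^\d{2}:\d{2}:\d{2}$")),
]


def token_type(token):
    """Return the name of the first matching data type, or 'STR'."""
    for name, pattern in TYPE_PATTERNS:
        if pattern.match(token):
            return name
    return "STR"


def infer_template(lines):
    """Derive one event template from lines with the same token count."""
    rows = [line.split() for line in lines]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all lines must have the same number of tokens")
    template = []
    for column in zip(*rows):
        values = set(column)
        if len(values) == 1:
            # Token never changes: treat it as static text.
            template.append(("static", column[0]))
        else:
            # Token varies: keep a common data type if there is one.
            types = {token_type(value) for value in values}
            template.append(("var", types.pop() if len(types) == 1 else "STR"))
    return template


def matches(template, line):
    """Check whether a new line can be parsed with the template."""
    tokens = line.split()
    if len(tokens) != len(template):
        return False
    for (kind, value), token in zip(template, tokens):
        if kind == "static":
            if token != value:
                return False
        elif value != "STR" and token_type(token) != value:
            return False
    return True


# Invented sample lines, used for training only.
TRAINING_LINES = [
    "12:00:01 sshd login user alice from 10.0.0.5",
    "12:03:17 sshd login user bob from 10.0.0.9",
]
template = infer_template(TRAINING_LINES)
```

With this template, a line such as "12:09:44 sshd login user carol from 10.0.0.7" is parsable, whereas a line with a different static token (e.g., "logout" instead of "login") is not and would have to be handled as an unknown event.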

Applying parsers generated in this way to different sources of unstructured textual event data allows data of different types, sources and domains to be correlated. After the training phase (2), the parser obtains the event parsers from the parser generator. The parser then analyses the unstructured textual event data entities separately (1b), i.e., it performs a single event evaluation. Depending on the configuration, the parser either forwards unparsable events to the parser generator, which collects them and adapts the event parser (3a), or it triggers a point anomaly (3b). The rule generator/evaluator and time series analysis module [3] obtains parsable events from the parser and defines and evaluates statistical rules. For example, it evaluates the distribution with which events occur, defines correlation rules (e.g., temporal correlation of variable parts of the event data), and carries out a time series analysis. The module thus performs event correlation evaluation, which allows it to detect deviations of complex processes from the normal system behaviour; these deviations manifest in anomalous event frequencies and sequences (5). Ultimately, the parser and the rule evaluator module together define a model of the normal behaviour of the observed environment (e.g., a computer network or a facility), evaluate the model and continuously verify it.
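A statistical rule on event frequencies can be sketched as follows. This is a minimal illustration under stated assumptions, not the cluster evolution method of [3] or the DECEPT implementation: events are assumed to be already parsed into event type names and grouped into fixed time windows, and the function names, the tolerance of three standard deviations, the minimum spread and the sample event names are all hypothetical choices.

```python
from collections import Counter
from statistics import mean, pstdev


def learn_baseline(windows):
    """Learn mean and standard deviation of per-window counts per event type.

    windows: list of lists, each holding the event type names seen in one
    time window of normal operation.
    """
    counters = [Counter(window) for window in windows]
    event_types = {event for counter in counters for event in counter}
    baseline = {}
    for event in event_types:
        counts = [counter[event] for counter in counters]
        baseline[event] = (mean(counts), pstdev(counts))
    return baseline


def evaluate_window(baseline, window, tolerance=3.0, min_spread=1.0):
    """Return (event, observed count, expected count) for each deviation."""
    counts = Counter(window)
    anomalies = []
    for event, (mu, sigma) in baseline.items():
        # min_spread avoids alerts on tiny deviations when the training
        # data showed almost no variation.
        spread = max(sigma, min_spread)
        if abs(counts[event] - mu) > tolerance * spread:
            anomalies.append((event, counts[event], mu))
    # Event types never seen during training are always reported.
    for event in counts.keys() - baseline.keys():
        anomalies.append((event, counts[event], 0.0))
    return sorted(anomalies)


# Invented training data: three windows of normal behaviour.
normal_windows = [
    ["login"] * 3 + ["door_open"] * 2,
    ["login"] * 4 + ["door_open"] * 2,
    ["login"] * 3 + ["door_open"] * 3,
]
baseline = learn_baseline(normal_windows)

quiet = evaluate_window(baseline, ["login"] * 3 + ["door_open"] * 2)
burst = evaluate_window(baseline, ["login"] * 3 + ["door_open"] * 12)
```

In this example the first window stays within the learned range and yields no anomalies, while the second reports door_open because it occurs far more often than during training. A fixed threshold on the standard deviation is only one possible rule; sequence and time series rules would need additional state.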

Figure 1: The DECEPT methodology.

In a proof of concept, DECEPT will demonstrate the general applicability of the anomaly detection approach in two independent application areas: (i) enterprise IT security and (ii) (IT-supported) facility security. Since modern attacks often exploit vulnerabilities across application areas, the processes of, for example, IT security and facility security must be aligned to allow a timely reaction to potential attacks. Examples of such attacks are the remote manipulation of IP-based access control systems to aid physical intrusion, or physical access to and manipulation of switches to make them vulnerable to cyber-attacks.

The project DECEPT and its consortium
To attain these ambitious goals and ensure the wide applicability of the developed tools and procedures, the project consortium combines a strong academic partner with deep knowledge of cyber security and machine learning (Austrian Institute of Technology), an enterprise security solution vendor (Huemer iT-Solutions) and a vendor of physical security equipment (PKE Holding AG). DECEPT is a 30-month national research project running from 2020 to 2022 and is funded by the Austrian FFG Research Program “ICT of the Future”.

References:
[1] R. Mitchell and I. R. Chen: “A survey of intrusion detection techniques for cyber-physical systems”, ACM Computing Surveys, 46(4), Article 55, 2014.
[2] M. Wurzenberger et al.: “AECID-PG: A Tree-Based Log Parser Generator to Enable Log Analysis”, in 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), pp. 7-12, IEEE, 2019.
[3] M. Landauer et al.: “Dynamic log file analysis: An unsupervised cluster evolution approach for anomaly detection”, Computers & Security, 79, 94-116, 2018.

Please contact:
Florian Skopik, AIT Austrian Institute of Technology, Austria
+43 664 8251495
