INDICÆTING – Automatically Detecting, Extracting, and Correlating Cyber Threat Intelligence from Raw Computer Log Data

by Max Landauer and Florian Skopik (AIT Austrian Institute of Technology)

“Cyber threat intelligence” is security-relevant information, often directly derived from cyber incidents, that enables comprehensive protection against upcoming cyber-attacks. However, collecting and transforming the available low-level data into high-level threat intelligence is usually time-consuming and requires extensive manual work as well as in-depth domain knowledge. INDICÆTING supports this procedure by developing and applying machine learning algorithms that automatically detect anomalies in the monitored system behaviour, correlate affected events to generate multi-step attack models, and aggregate them to generate usable threat intelligence.

Today’s omnipresence of computer systems and networks provides great benefit to the economy and society, but also opens the door to digital threats, such as data theft. Hackers and cyber-criminal groups carry out attacks by exploiting vulnerabilities of the deployed systems, with little or no time for the victims to react.

The growing interconnectedness of digital services helps to enable cyber-attacks. The Internet of Things and Industry 4.0 entail the emergence of highly complex system landscapes that offer numerous entry points for infiltration. As a simple measure of protection, most modern systems are equipped with blacklists containing indicators of compromise, such as IP addresses, known to correspond to adversarial entities. Detection systems monitor infrastructure states and outbound connections, compare the observed events with the predefined signatures specified in the blacklists, and raise alerts if certain thresholds are exceeded.
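The blacklist approach described above can be illustrated with a minimal Python sketch. It is not taken from any particular detection product: the function name, the event format (source host, destination IP), the example addresses and the threshold are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical blacklist of indicators of compromise (documentation-range IPs).
BLACKLIST = {"203.0.113.7", "198.51.100.23"}


def blacklist_alerts(events, blacklist=BLACKLIST, threshold=2):
    """Return {host: hit count} for hosts whose number of outbound
    connections to blacklisted IPs reaches the threshold."""
    hits = Counter(src for src, dst in events if dst in blacklist)
    return {host: count for host, count in hits.items() if count >= threshold}


# Assumed event format: (source host, destination IP) per outbound connection.
events = [
    ("ws-01", "203.0.113.7"),
    ("ws-01", "192.0.2.10"),
    ("ws-01", "198.51.100.23"),
    ("ws-02", "203.0.113.7"),
]
alerts = blacklist_alerts(events)
print(alerts)
```

The sketch also makes the weakness of the approach visible: as soon as an attacker switches to an address that is not in `BLACKLIST`, no alert is raised.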

However, this basic approach suffers from a serious shortcoming: simple indicators of compromise (IoC) such as malicious IP addresses are highly volatile and only valid for a short period of time, since it is easy for attackers to change them and thereby circumvent detection. Tactics, techniques and procedures (TTP), on the other hand, are valid for a longer time, because it is difficult to change the modus operandi of attacks [1]. However, compiling threat intelligence on TTPs is difficult and requires manual analysis of complex attack patterns. Such manual analyses are tedious and time-consuming, and since attacks are often carried out on a large scale and affect multiple organisations, too much time passes until blacklists are updated with information on imminent attacks.

INDICÆTING aims to solve this problem by automatically generating complex “threat intelligence”, i.e., intelligence that goes beyond simple IoCs and describes complex TTPs. INDICÆTING thereby pursues anomaly detection rather than blacklisting: instead of relying on an existing knowledge base, it makes use of self-learning algorithms that capture the normal system behaviour over time and detect deviations from the expected patterns [2]. This way, the low-level log data that documents almost all events occurring in the observed system is monitored as soon as it is generated.
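The idea of learning normal behaviour and flagging deviations can be sketched in a few lines of Python. The example below is a deliberately simple stand-in and not the algorithm used in INDICÆTING: it assumes that the monitored quantity is the number of log events per time window, and the class name `CountBaseline` and the deviation factor `k` are hypothetical.

```python
from statistics import mean, stdev


class CountBaseline:
    """Learns typical event counts per time window and flags outliers."""

    def __init__(self, k=3.0):
        self.k = k          # allowed deviation in standard deviations (assumed)
        self.history = []   # counts observed during normal operation

    def learn(self, count):
        self.history.append(count)

    def is_anomalous(self, count):
        # Without at least two observations no deviation can be estimated.
        if len(self.history) < 2:
            return False
        mu = mean(self.history)
        sigma = max(stdev(self.history), 1e-9)  # guard against zero variance
        return abs(count - mu) > self.k * sigma


baseline = CountBaseline()
for observed in [10, 12, 11, 9, 10, 11, 12, 10]:
    baseline.learn(observed)

print(baseline.is_anomalous(11))  # close to the learned counts
print(baseline.is_anomalous(40))  # far outside the learned counts
```

No signature of a known attack is needed here; the only knowledge the detector has is what it observed during normal operation.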

Log data contains semantically expressive parameters describing the current system state and is thus suitable for analysing the root causes of system failures in hindsight, after incidents have occurred, a task that has been carried out by software engineers for decades. Only recently have system logs been analysed in real time in order to indicate system problems almost as they occur. However, this is a highly non-trivial activity: the main issue with processing system logs is that they are unstructured and different on every system, and it is challenging to automatically extract parameters and map individual log lines to more abstract event classes without human intervention. INDICÆTING achieves this by parsing the data, i.e., determining which parts of the log lines correspond to constant (textual) parts, and which correspond to parameters such as usernames, IDs, IP addresses, etc. As shown in Figure 1, a parser generator learns the structure of log data collected from a honeynet, i.e., a system specifically set up to attract attackers. Once the parsing model is established, INDICÆTING is able to retrieve parameter values and reason on their static distributions, discover dynamic dependencies between events and construct process models. Based on these values, anomalies are then detected by comparing each newly incoming log line with a corresponding model that was trained over a long time, i.e., a baseline model. Efficiency of the proposed algorithms is a key requirement here, because log data is typically produced in enormous amounts and at high rates.
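The distinction between constant parts and parameters can be illustrated with a toy example in Python. This is a strongly simplified sketch and not the parser generator developed in the project: it assumes that all training lines belong to one event class, are tokenised by whitespace and have the same number of tokens. The function names and the sample log lines are hypothetical.

```python
def learn_template(lines):
    """Derive a template from log lines of one event class.

    Positions where all lines agree are kept as constant tokens;
    positions with varying tokens are marked as parameters (None).
    """
    token_rows = [line.split() for line in lines]
    if len({len(row) for row in token_rows}) != 1:
        raise ValueError("this sketch assumes equal token counts")
    return [col[0] if len(set(col)) == 1 else None for col in zip(*token_rows)]


def extract_parameters(template, line):
    """Return the parameter values of a line, or None if it does not match."""
    tokens = line.split()
    if len(tokens) != len(template):
        return None
    params = []
    for expected, token in zip(template, tokens):
        if expected is None:
            params.append(token)   # variable position: collect the value
        elif expected != token:
            return None            # constant position differs: other event class
    return params


training = [
    "Accepted password for alice from 192.0.2.10",
    "Accepted password for bob from 192.0.2.11",
    "Accepted password for carol from 192.0.2.10",
]
template = learn_template(training)
print(template)
print(extract_parameters(template, "Accepted password for dave from 198.51.100.5"))
```

The extracted values (here a username and an IP address) are the kind of input on which value distributions and baseline models can then be built.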

Figure 1: Process for the automatic generation of Cyber Threat Intelligence.

Since anomalies reported on individual parameters or events are not sufficient for describing complex attack patterns, a subsequent step is required that analyses the overall system behaviour. For this purpose, the identified anomalies are correlated over multiple data channels and architectural layers in order to model multi-step attacks and derive abstract TTPs that affect several components in a narrow time window. For example, consider an employee entering login credentials after receiving a phishing email that contains a URL to a malicious website. Using the credentials, the attacker then infiltrates the network from a remote connection. This is a multi-step attack that can be detected by correlating URLs in mail, DNS, and web proxy logs. After appropriately modelling all involved steps and additional information on parameters, timing and context, the resulting high-level threat intelligence is suitable to be shared with other parties in cyber-security communities [L1].
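The phishing example can be turned into a small Python sketch of cross-channel correlation. It is an illustration under stated assumptions rather than the project's correlation method: anomalies are assumed to arrive as (timestamp in seconds, channel, indicator) tuples, the shared indicator is the domain of the URL, and the function name, channel names and five-minute window are hypothetical.

```python
from collections import defaultdict


def correlate(anomalies, required_channels, window_seconds=300):
    """Report indicators that were seen on all required channels
    within a single time window, together with the ordered channels."""
    by_indicator = defaultdict(list)
    for timestamp, channel, indicator in anomalies:
        by_indicator[indicator].append((timestamp, channel))

    findings = {}
    for indicator, hits in by_indicator.items():
        hits.sort()
        # Try every hit as the start of a window.
        for start, _ in hits:
            in_window = [(ts, ch) for ts, ch in hits
                         if start <= ts <= start + window_seconds]
            if required_channels <= {ch for _, ch in in_window}:
                findings[indicator] = [ch for _, ch in in_window]
                break
    return findings


# Assumed anomaly format: (timestamp in seconds, channel, indicator).
anomalies = [
    (1000, "mail", "login.example.test"),
    (1030, "dns", "login.example.test"),
    (1032, "proxy", "login.example.test"),
    (1100, "dns", "cdn.example.test"),
    (5000, "proxy", "cdn.example.test"),
]
findings = correlate(anomalies, {"mail", "dns", "proxy"})
print(findings)
```

The ordered list of channels per indicator is a very rough stand-in for an attack step sequence; a real model would additionally carry the parameters, timing and context mentioned above.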

INDICÆTING is financially supported by the Austrian Research Promotion Agency (FFG) under grant number 868306. The project is carried out in the course of an industry-related PhD thesis at the Austrian Institute of Technology (AIT) in cooperation with the Vienna University of Technology (TU Wien). Over the course of the project, AIT’s Automatic Event Correlation for Incident Detection (AECID) [L2] tool will be further developed and used for evaluating the proposed concepts.

References:
[1] D. Chismon, M. Ruks: “Threat Intelligence: Collecting, Analysing, Evaluating”, MWR InfoSecurity, 2015.
[2] V. Chandola, et al.: “Anomaly Detection: A Survey”, ACM Comput. Surv., 2009.

Links:
[L1] http://misp-project.org/
[L2] https://aecid.ait.ac.at/

Please contact:
Max Landauer
AIT Austrian Institute of Technology, Austria
+43 664 88256012