by Lasse Nitz, Mehdi Akbari Gurabi, Avikarsha Mandal and Benjamin Heitmann (Fraunhofer FIT)
Many European organisations lack the resources to provide satisfactory and timely response and recovery (R&R) actions when targeted by cyber-attacks. R&R capabilities can be significantly improved through sharing of information related to incident detection and handling. In this context, privacy-preserving technologies can enable data sharing, while protecting privacy- and security-critical information. The technologies to achieve this are being developed and evaluated in the SAPPAN project.
The computer security incident response team (CSIRT) plays a crucial role in an organisation's digital infrastructure. One of the responsibilities of a CSIRT is to detect, investigate, and mitigate potentially security-critical incidents. To cope with the vast number of potential threats, many CSIRTs rely on partly automated systems, especially for incident detection. Since the quality of these detection systems directly affects the manual workload of the incident handlers who have to investigate the detected incidents, the false-positive rate of incident detection should be as low as possible. The same applies to the false-negative rate, as every undetected incident might pose a serious security risk to an organisation. There is hence a need for high-quality detection system components that detect incidents reliably without unnecessarily increasing the investigative workload of human operators. But since considerable effort is required to create such high-quality components, it is infeasible for many small and medium-sized enterprises (SMEs) to create them on their own.
This problem could be overcome by sharing cyber-threat intelligence that helps to detect, assess and handle incidents, for example in the form of trained classifiers or cybersecurity playbooks. An abstract overview of a sharing system is shown in Figure 1. For security providers, this could constitute a meaningful way of extending their services, and for academic organisations it would allow research results to be made usable in practice. The main problem in sharing such resources, however, is that they are usually based on privacy- and security-critical data, so it is vital that no sensitive information can be extracted from them.
Figure 1: Privacy-preserving data sharing approach for response and recovery. The process is split into four phases: local detection, local handling, collaborative detection, and collaborative handling. The local detection and handling address the detection, assessment and handling phases of the incident response lifecycle at the local level. On the collaborative level, information from the local level is shared to achieve a mutual perspective on attack detection, incident assessment and incident handling. Privacy issues are handled before or during the sharing of information.
While anonymisation and sanitisation solutions exist for various kinds of data within the cybersecurity domain (e.g., for IP addresses), other kinds of data – for example, uniform resource locators (URLs) – have not received the same level of attention. Although URLs have been used in research, e.g., for the identification of phishing websites [1], methods for sanitising URLs are not yet mature. In particular, URLs collected as benign samples not only reveal information about the browsing behaviour of individuals, but can also provide access to restricted web resources via access tokens and leak organisation-internal information via directory and file names, e.g., for URLs pointing to resources in a company's intranet. Hence, measures must be taken to prevent shared URLs, and detection system components trained on URLs, from leaking sensitive information. To avoid such leakage, URLs could be transformed into pseudo-URLs, which do not include any feasibly retrievable sensitive information but still retain enough properties of the original URLs to be suitable for various tasks, such as machine learning. Compared to techniques like differentially private machine learning, such an approach based on pre-processing has the advantage of allowing not only trained models but also training data to be shared. We are evaluating different pre-processing approaches with regard to their privacy guarantees and their suitability for various use cases.
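To make the pseudo-URL idea more concrete, the following Python sketch shows one possible pre-processing step. It is an illustration under our own assumptions, not the method evaluated in SAPPAN: the function names `pseudonymise_url`, `_pseudo_token` and `_keystream` are hypothetical, tokens are replaced with keyed pseudo-random strings (HMAC-SHA256 with a secret key held by the data owner), and the length and character classes of each token are deliberately kept, since such lexical properties are typical features for URL classifiers. The last host label is kept as a naive stand-in for the top-level domain, query parameter names are kept while their values (which may carry access tokens) are replaced, and user credentials and fragments are dropped.

```python
import hashlib
import hmac
import string
from urllib.parse import urlsplit, urlunsplit


def _keystream(key: bytes, token: str, n: int) -> bytes:
    """Derive n pseudo-random bytes from (key, token) using HMAC-SHA256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < n:
        msg = counter.to_bytes(4, "big") + token.encode("utf-8")
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return out[:n]


def _pseudo_token(token: str, key: bytes) -> str:
    """Replace ASCII letters and digits with keyed pseudo-random ones of the same class.
    All other characters (punctuation such as '-', '.', '_', '%', and non-ASCII) are
    kept unchanged, so the token's length and shape survive."""
    stream = _keystream(key, token, len(token))
    out = []
    for ch, b in zip(token, stream):
        if ch in string.ascii_lowercase:
            out.append(string.ascii_lowercase[b % 26])
        elif ch in string.ascii_uppercase:
            out.append(string.ascii_uppercase[b % 26])
        elif ch in string.digits:
            out.append(string.digits[b % 10])
        else:
            out.append(ch)
    return "".join(out)


def pseudonymise_url(url: str, key: bytes) -> str:
    """Hypothetical pseudo-URL transformation (a sketch, not SAPPAN's method)."""
    parts = urlsplit(url)
    # Host: keep the last label (naive TLD heuristic), pseudonymise the others.
    # Using parts.hostname drops any user:password@ credentials from the netloc.
    labels = (parts.hostname or "").split(".")
    host = ".".join([_pseudo_token(label, key) for label in labels[:-1]] + labels[-1:])
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Path: pseudonymise each segment separately, keeping the '/' structure.
    path = "/".join(_pseudo_token(seg, key) for seg in parts.path.split("/"))
    # Query: keep parameter names, pseudonymise values (they may hold access tokens).
    raw_pairs = parts.query.split("&") if parts.query else []
    pairs = []
    for pair in raw_pairs:
        name, sep, value = pair.partition("=")
        pairs.append(name + sep + _pseudo_token(value, key))
    # The fragment is dropped entirely (empty fifth component).
    return urlunsplit((parts.scheme, netloc, path, "&".join(pairs), ""))


if __name__ == "__main__":
    secret = b"owner-held-secret"  # hypothetical key, never shared
    print(pseudonymise_url(
        "https://intranet.example.com/hr/salaries-2021.xlsx?user=alice&token=AbC123",
        secret))
```

Because the replacement depends only on the token and the key, the same directory name always maps to the same pseudo-token, so repetition patterns across a data set survive. Keeping lengths and character classes is itself a leak, which is exactly the kind of trade-off between privacy guarantees and suitability for machine learning that needs to be evaluated; IP-address hosts and internationalised domain names are not handled in this sketch.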
Sharing response and recovery (R&R) recommendations between organisations via machine-readable cybersecurity playbooks can facilitate security orchestration, automation and response (SOAR) [2]. A cybersecurity playbook is a guideline for building an action plan to follow before, during and after a cyber-attack. It includes the important and common steps to prepare for, assess, and handle incidents, and provides best practices for combating similar threats. Many of the steps in playbooks are organisation- and resource-specific, and by sharing this confidential information with external parties, an organisation risks opening itself up to threats from attackers and aggressive business competitors. One of the main privacy requirements of playbook sharing is therefore to identify crucial sensitive data, such as personally identifiable information (PII), tools, and infrastructure elements, which should not be revealed in shared playbooks. Another requirement is a means of indicating the confidentiality level of any specific element of a shared playbook: if an element is marked ‘confidential’ by the producer, it must be masked even if it contains no pre-defined sensitive or private information.
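A minimal sketch of such masking, assuming a hypothetical playbook representation as nested Python dictionaries and lists (not a specific playbook standard): any element whose `marking` field is set to `confidential` is replaced wholesale, fields whose names appear in a hypothetical set of sensitive categories (tools, hosts, owners) are masked, and e-mail addresses in free text are masked as one simple example of PII. The function name `sanitise_playbook` and the field names are illustrative assumptions, and the regular expression is deliberately simplistic.

```python
import re

# Hypothetical schema: a playbook is nested dicts/lists; any dict may carry a
# "marking" field set by the producer.
MASK = "[REDACTED]"
SENSITIVE_KEYS = {"tool", "host", "owner"}  # hypothetical sensitive field names
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")  # simplistic PII pattern


def sanitise_playbook(node):
    """Return a masked copy of a playbook element; the input is not modified."""
    if isinstance(node, dict):
        # Elements marked confidential are masked wholesale, whatever they contain.
        if node.get("marking") == "confidential":
            return {"marking": "confidential", "content": MASK}
        # Mask sensitive fields by name, recurse into everything else.
        return {
            k: MASK if k in SENSITIVE_KEYS else sanitise_playbook(v)
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [sanitise_playbook(v) for v in node]
    if isinstance(node, str):
        # Free text: mask e-mail addresses as one example of PII.
        return EMAIL_RE.sub(MASK, node)
    return node


if __name__ == "__main__":
    playbook = {
        "name": "Phishing response",
        "steps": [
            {"action": "Notify soc-lead@corp.example", "tool": "InternalSIEM-v2"},
            {"action": "Isolate host", "marking": "confidential", "host": "db01"},
            {"action": "Reset credentials"},
        ],
    }
    print(sanitise_playbook(playbook))
```

Keeping the `marking` field on masked elements lets consumers see that a step existed but was withheld, rather than silently receiving an incomplete playbook.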
The aim of our project is to work out how to offer simple solutions, such as access control on the shared data, as well as advanced anonymisation techniques to mask or remove confidential data. Generalising a playbook, i.e., removing organisation-specific details, can be seen as one step in a sanitisation process, with the added benefit that less organisation-specific playbooks are more widely applicable. We are also considering how to enable consumers of a playbook to map abstract identifiers onto their organisation-specific identifiers. However, increasing the abstraction level of playbooks may hamper automation and make it harder to identify a proper response, so we will also evaluate the abstraction level with respect to the trade-off between data protection and usability of shared playbooks.
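The mapping of abstract identifiers can be sketched as a simple placeholder scheme. In this hypothetical Python example, the producer replaces its own identifiers with role placeholders such as `{{PERIMETER_FIREWALL}}` before sharing, and the consumer resolves them with its own mapping; roles the consumer has not mapped stay visible as placeholders, marking the steps that cannot be automated without further input. The function names, the placeholder syntax and the role names are illustrative assumptions, and plain substring replacement is only adequate for a sketch.

```python
import re

# Hypothetical placeholder syntax: {{ROLE_NAME}} with upper-case letters and '_'.
PLACEHOLDER_RE = re.compile(r"\{\{([A-Z_]+)\}\}")


def abstract(text, producer_map):
    """Producer side: replace organisation-specific identifiers with role placeholders.
    Longer identifiers are replaced first so that overlapping names are handled."""
    for concrete in sorted(producer_map, key=len, reverse=True):
        text = text.replace(concrete, "{{" + producer_map[concrete] + "}}")
    return text


def concretise(text, consumer_map):
    """Consumer side: resolve placeholders; unmapped roles are left as placeholders."""
    return PLACEHOLDER_RE.sub(lambda m: consumer_map.get(m.group(1), m.group(0)), text)


def unresolved_roles(text):
    """List roles that still need a local mapping before the step can be automated."""
    return PLACEHOLDER_RE.findall(text)


if __name__ == "__main__":
    shared = abstract(
        "Block the IP on fw-muc-01 and open a ticket in JiraSec",
        {"fw-muc-01": "PERIMETER_FIREWALL", "JiraSec": "TICKETING_SYSTEM"},
    )
    local = concretise(shared, {"PERIMETER_FIREWALL": "pfsense-edge"})
    print(shared)
    print(local, unresolved_roles(local))
```

The more identifiers a producer abstracts, the more the consumer has to map before the playbook becomes executable again, which is the data-protection/usability trade-off in miniature.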
This work is being done within the EU H2020 project SAPPAN: Sharing and Automation for Privacy Preserving Attack Neutralization [L1]. SAPPAN has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 833418. The SAPPAN consortium consists of eight partners from five countries: Germany (Fraunhofer FIT as project coordinator [L2], RWTH Aachen University, University of Stuttgart), Czech Republic (CESNET, Masaryk University), Ireland (HPE), Finland (F-Secure), and Switzerland (Dreamlab). The project scope is not limited to privacy aspects of data sharing. Other topics include federated machine learning, automation of incident response, and suitable visualisation methods for work within CSIRTs.
Links:
[L1] https://sappan-project.eu/
[L2] https://www.fit.fraunhofer.de/en/business-areas/data-science-and-artificial-intelligence/data-protection-and-sovereignty.html
References:
[1] O. K. Sahingoz, E. Buber, O. Demir, B. Diri: “Machine learning based phishing detection from URLs”, Expert Systems with Applications, vol. 117, pp. 345-357, 2019.
[2] C. Islam, M. A. Baer, S. Nepal: “A Multi-Vocal Review of Security Orchestration”, ACM Computing Surveys, 2019.
Please contact:
Avikarsha Mandal
Fraunhofer Institute for Applied Information Technology (FIT), Germany
+49 241 80 21510