GLITCH: Polyglot Code Smell Detection in Infrastructure as Code

by Nuno Saavedra, João F. Ferreira (INESC-ID and University of Lisbon) and Alexandra Mendes (INESC TEC and University of Porto)

GLITCH is a versatile tool designed for detecting code smells in Infrastructure as Code (IaC) scripts across multiple technologies. Developed by researchers from INESC-ID (Lisbon), INESC TEC (Porto), Instituto Superior Técnico / University of Lisbon, and the Faculty of Engineering / University of Porto, GLITCH automates the detection of both security and design flaws in scripts written in Ansible, Chef, Docker, Puppet, and Terraform. By using a technology-agnostic framework, GLITCH aims to improve the consistency and efficiency of code smell detection, making it a valuable resource for DevOps engineers and researchers focused on software quality.

As IaC scripts become increasingly critical for automating IT infrastructure, ensuring their consistency and security is essential [1, 2]. Code smells – patterns that suggest potential issues – can lead to vulnerabilities, inefficiencies, and maintenance difficulties. GLITCH addresses these concerns by detecting a wide range of security, design, and implementation smells across multiple IaC languages.

The GLITCH project is supported by a collaboration of researchers focused on enhancing software engineering practices. Their combined expertise aims to improve the quality and security of software systems through the development and application of this technology-agnostic tool.

Motivation
As organisations increasingly rely on IaC to automate and manage their infrastructure, the risk of introducing issues or vulnerabilities through poorly written or insecure code has become a significant concern. For example, Facebook’s outage in 2021 was triggered by a misconfiguration in its internal backbone network, which disrupted communication between data centres. The failure disconnected Facebook’s services and internal tools, making it impossible to diagnose and resolve the problem quickly. On the day of the outage, shares in the company dropped by nearly 5% and Facebook CEO Mark Zuckerberg’s wealth fell by more than $6 billion. According to a report produced by Fortune and Snopes, Facebook lost at least $60 million in advertising revenue. In the developing world, the outage disrupted daily life, as Facebook’s platforms are key for communication. In conflict zones like Syria, aid workers relied on WhatsApp to share bombing locations for safe travel but were unable to do so during the outage.

Traditional tools for detecting code smells in IaC scripts often focus on specific technologies and are developed independently, leading to inconsistencies and a lack of comprehensive coverage. GLITCH was conceived to address these gaps by offering a unified, technology-agnostic approach to code smell detection. By detecting and mitigating code smells early in the development process, GLITCH aims to prevent potential security vulnerabilities, reduce maintenance costs, and enhance the overall quality of IaC scripts.

GLITCH is open source [L1] and its development began in 2022, initially targeting security smells [1]. As the project progressed, it expanded to include design and implementation smells and extended its support to Docker and Terraform [2]. Preliminary results have already demonstrated GLITCH’s effectiveness in detecting a broad range of code smells. The project is ongoing, with continuous improvements being made to enhance its capabilities and extend its applicability to new IaC technologies.

Aim and Techniques Employed
By providing a unified approach to code smell detection, GLITCH seeks to reduce the effort required to develop and maintain secure and high-quality IaC scripts. Additionally, the project aims to create large datasets of IaC scripts and code smells that can be used by researchers and practitioners to further advance the field of software quality and security.

GLITCH approaches code smell detection by transforming IaC scripts into an intermediate representation. This representation captures concepts common to different IaC technologies, enabling the development of code smell detectors that are not tied to any specific language or tool. The framework uses both rule-based and algorithmic approaches: security smells are detected with rules, while design and implementation smells are detected with more complex algorithms. The tool traverses the intermediate representation of the scripts, applying multiple analyses to identify all potential code smells. Figure 1 gives an overview of GLITCH’s architecture.

Figure 1: GLITCH’s architecture overview.
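
The idea of writing a detector once against a shared representation can be illustrated with a minimal Python sketch. It is not GLITCH's actual code or API: the class names (Attribute, AtomicUnit, Smell), the two rules, and the sample inputs are all hypothetical and chosen only for illustration. The sketch assumes that technology-specific parsers have already turned an Ansible task and a Puppet resource into the same generic structure, and shows a rule-based pass traversing that structure.

```python
import re
from dataclasses import dataclass, field
from typing import List


# Hypothetical, simplified intermediate representation (not GLITCH's own classes).
@dataclass
class Attribute:
    """A name/value pair, such as a Puppet attribute or an Ansible module argument."""
    name: str
    value: str
    line: int


@dataclass
class AtomicUnit:
    """A single resource or task, independent of the source technology."""
    name: str
    kind: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Smell:
    code: str
    unit: str
    line: int


# Illustrative pattern for attribute names that usually hold credentials.
SECRET_NAME = re.compile(r"pass(word)?|secret|token|key", re.IGNORECASE)


def hard_coded_secret(attr: Attribute) -> bool:
    # A secret-looking name bound to a literal value; values that look like
    # variable references ("$var", "{{ var }}") are not reported.
    is_literal = attr.value != "" and "$" not in attr.value and "{{" not in attr.value
    return bool(SECRET_NAME.search(attr.name)) and is_literal


def http_without_tls(attr: Attribute) -> bool:
    # Any plain-HTTP URL appearing in an attribute value.
    return "http://" in attr.value.lower()


# The rule table is technology-agnostic: it only sees the shared representation.
RULES = {
    "hard-coded-secret": hard_coded_secret,
    "http-without-tls": http_without_tls,
}


def detect(units: List[AtomicUnit]) -> List[Smell]:
    """Traverse the representation and apply every rule to every attribute."""
    smells = []
    for unit in units:
        for attr in unit.attributes:
            for code, rule in RULES.items():
                if rule(attr):
                    smells.append(Smell(code, unit.name, attr.line))
    return smells


# Two units that could have come from different technologies.
ansible_task = AtomicUnit(
    "Create db user", "mysql_user",
    [Attribute("name", "app", 3), Attribute("password", "hunter2", 4)],
)
puppet_resource = AtomicUnit(
    "fetch-installer", "exec",
    [Attribute("command", "wget http://example.com/install.sh", 12)],
)

smells = detect([ansible_task, puppet_resource])
for smell in smells:
    print(f"{smell.code}: {smell.unit} (line {smell.line})")
```

In this sketch, supporting a further technology would only require a new parser that produces AtomicUnit objects; the rules themselves would stay unchanged, which is the property the intermediate representation is meant to provide.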

Academically, GLITCH contributes to the field of software engineering by providing a novel, unified framework for code smell detection in IaC scripts. Practically, it offers a valuable tool for DevOps engineers, system administrators, and organisations that rely on IaC to manage their infrastructure. By improving the security and quality of IaC scripts, GLITCH helps organisations prevent potential vulnerabilities and reduce the costs associated with maintaining complex infrastructure systems.

Future Activities
The team aims to refine the existing detection mechanisms to improve their precision and recall. Another significant future activity is the creation of oracle datasets for design and implementation smells, similar to those already developed for security smells. These datasets will be used to evaluate the effectiveness of GLITCH and to facilitate further research in the field. Current efforts also include automated program repair of IaC scripts at the level of GLITCH’s intermediate representation.

In conclusion, GLITCH represents a significant step forward in the field of software security, particularly in the context of IaC. By providing a comprehensive, technology-agnostic tool for code smell detection, GLITCH helps ensure that IaC scripts are secure, consistent, and maintainable, ultimately contributing to the integrity and reliability of the software systems they support.

Link:
[L1] https://github.com/sr-lab/GLITCH

References:
[1] N. Saavedra and J. F. Ferreira, “GLITCH: automated polyglot security smell detection in infrastructure as code,” in Proc. of the 37th IEEE/ACM Int. Conf. on Automated Software Engineering (ASE), pp. 1–12, 2022.
[2] N. Saavedra, et al., “Polyglot code smell detection for infrastructure as code with GLITCH,” in Proc. of the 38th IEEE/ACM Int. Conf. on Automated Software Engineering (ASE), pp. 2042–2045, IEEE, 2023.

Please contact:
João F. Ferreira, INESC-ID and IST, University of Lisbon, Lisbon, Portugal
