by Daniel Setó-Rey, Carlos López-Nozal, and José Ignacio Santos-Martín (Universidad de Burgos)

The reuse of software by importing packages from repositories is an efficient way to develop software. However, reusing software in this manner introduces vulnerability risks due to transitive dependencies. These vulnerabilities must be measured to identify risks and propose corrective actions.

Software ecosystems are communities formed around programming languages and shared package management tools, enabling developers to create new packages that import and reuse the functionality of others [1]. Development within these ecosystems is efficient because common functionality only needs to be developed, maintained and tested by a single team, rather than being reimplemented by multiple authors. The use of centralised library repositories to reduce development time and cost is widespread across nearly all languages and software projects. However, this efficient approach introduces vulnerability risks, primarily due to transitive dependencies and package cycles. Because of dependency transitivity, a single defect in the repository can have far-reaching and unpredictable effects on the ecosystem. These defects may result in functional errors or in performance and security issues. The risk can be hard for developers to assess, as they typically import only a small portion of the dependencies directly.

In [2], we propose, develop, and test a new theoretical model to characterise the vulnerability of package repositories, which are represented as complex networks of dependencies. We define vulnerability as a metric that measures the sensitivity of a package repository to random defects, quantified by a cost function called φ Reach. We applied this model to three well-known package repositories (PyPI, Maven and npm) to calculate their vulnerability [L1]. The package network and dependency information were constructed using the libraries.io data dump [L2]. Our analysis revealed that the emergence of a large strongly connected component (SCC), a set of mutually dependent packages, is associated with a disproportionate increase in the vulnerability of package dependency networks (see details in Table 1).

Table 1: Characteristics and vulnerability to failure of reference package dependency networks. n: number of packages, m: number of arcs (dependency relations), 2nd and 1st-SCC: second largest and largest strongly connected component present, φ Reach: vulnerability to failure measured by the Reach cost function and next to it percentage vulnerability in relation to network size (n).
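As a rough illustration of a reach-style vulnerability measure, the sketch below averages, over every package, the number of packages affected if that package alone became defective. It is a minimal sketch under stated assumptions, not the exact definition from [2]: the defective package is counted in its own reach, defects are assumed equally likely in every package, and the names `reach`, `vulnerability` and the `toy` repository are hypothetical.

```python
from collections import deque


def reach(dependents, source):
    """Packages affected by a defect in `source`: itself plus all transitive dependents."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in dependents.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def vulnerability(dependents, nodes):
    """Mean reach size, assuming the defect hits one package chosen uniformly at random."""
    return sum(len(reach(dependents, v)) for v in nodes) / len(nodes)


# Hypothetical toy repository: each key maps to the packages that import it directly.
toy = {
    "core": ["http", "json"],
    "http": ["app"],
    "json": ["app"],
    "app": [],
}

print(vulnerability(toy, list(toy)))  # 2.25
```

In this toy network a defect in `core` reaches all four packages, whereas a defect in `app` affects only `app` itself, which is why the mean sits well below the network size.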

Based on the concept of node vulnerability, we define immunisation at a node as any corrective or preventive action taken to eliminate the possibility of it failing or incorporating a defect. The effect of immunising a set of nodes is calculated as the difference between the vulnerability of the initial network and that of the immunised network. In our experiments [2], we observed that protecting the SCC to prevent the introduction or propagation of defects can nearly eliminate the network’s vulnerability. However, depending on the number of packages in the SCC, this solution may not be practical. Identifying the optimal set of nodes to immunise for the greatest reduction in vulnerability is an NP-hard problem, so heuristics are necessary to find sufficiently good solutions. Figure 1 shows the SCC of the Maven package dependency network, consisting of 981 nodes (0.8% of the network). Immunising the component’s cut vertices (351 nodes) can reduce vulnerability by 93.9%. Additionally, Figure 1 highlights the ten cut vertices with the highest out-degree centrality and the ten with the highest betweenness centrality; immunising these sets reduces vulnerability by 13% and 26%, respectively.

Figure 1: The SCC of the Maven package dependency network consists of 981 nodes, representing 0.8% of the network. The nodes highlighted in white (351 nodes) are the component’s cut vertices. The nodes highlighted in blue are the ten cut vertices with the highest out-degree centrality, while the nodes highlighted in red are the ten with the highest betweenness centrality.
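The immunisation effect described above can be sketched as a before/after difference in vulnerability. The code below is illustrative only and rests on assumptions that are not taken from [2]: an immunised package neither originates nor propagates a defect, vulnerability is the mean number of affected packages over all packages in the original network, and the greedy selection is just one simple heuristic among many. All function names and the `toy` network are hypothetical.

```python
def affected(dependents, source, immune):
    """Packages hit by a defect in `source`, with propagation blocked at immunised nodes."""
    if source in immune:
        return set()
    seen = {source}
    stack = [source]
    while stack:
        node = stack.pop()
        for nxt in dependents.get(node, ()):
            if nxt not in seen and nxt not in immune:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def vulnerability(dependents, immune=frozenset()):
    """Mean number of affected packages; every package must appear as a key."""
    nodes = list(dependents)
    return sum(len(affected(dependents, v, immune)) for v in nodes) / len(nodes)


def immunisation_effect(dependents, immune):
    """Vulnerability of the initial network minus that of the immunised network."""
    return vulnerability(dependents) - vulnerability(dependents, immune)


def greedy_targets(dependents, k):
    """Pick k nodes one at a time, each time adding the node with the largest effect."""
    chosen = set()
    for _ in range(k):
        candidates = [v for v in dependents if v not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda v: immunisation_effect(dependents, chosen | {v}))
        chosen.add(best)
    return chosen


# Hypothetical toy network: a, b and c form a dependency cycle; d hangs off c.
toy = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}

print(vulnerability(toy))               # 3.25
print(immunisation_effect(toy, {"c"}))  # 2.25
print(greedy_targets(toy, 1))           # {'c'}
```

Immunising the single node `c` breaks the cycle and removes most of the toy network's vulnerability, which mirrors on a small scale why targeting a few well-placed nodes inside an SCC can be so effective.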

We used a variety of techniques to narrow down the sets of important packages related to the network’s vulnerability, achieving similar reductions by acting on a smaller number of packages. We demonstrate that selecting the set of strong articulation points (SAP) of the SCC achieves reductions similar to acting on the entire SCC.
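A strong articulation point is a node whose removal breaks the strong connectivity of its component. The brute-force sketch below tests each node of a given SCC in turn; it is meant only to make the definition concrete and is not the procedure used in [2]. Faster algorithms exist for large networks. The function names and the `toy` component are hypothetical.

```python
def reachable(adj, start, allowed):
    """Nodes reachable from `start` while staying inside the `allowed` node set."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt in allowed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_strongly_connected(adj, nodes):
    """True if every node in `nodes` can reach every other using only arcs inside `nodes`."""
    nodes = set(nodes)
    if len(nodes) <= 1:
        return True
    start = next(iter(nodes))
    # Build the reversed graph restricted to `nodes` for the backward reachability check.
    reverse = {}
    for u in nodes:
        for v in adj.get(u, ()):
            if v in nodes:
                reverse.setdefault(v, []).append(u)
    return (reachable(adj, start, nodes) == nodes
            and reachable(reverse, start, nodes) == nodes)


def strong_articulation_points(adj, component):
    """Nodes of a strongly connected `component` whose removal disconnects it."""
    component = set(component)
    return {v for v in component if not is_strongly_connected(adj, component - {v})}


# Hypothetical toy SCC: "hub" has a mutual dependency with both "a" and "b".
toy = {"hub": ["a", "b"], "a": ["hub"], "b": ["hub"]}

print(strong_articulation_points(toy, toy))  # {'hub'}
```

Each candidate costs one forward and one backward traversal, so this approach is quadratic in the worst case; it is adequate for a component of a few hundred nodes but would need a more efficient algorithm for much larger ones.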

This work helps decision-makers in software ecosystems (such as software developers, package developers and package repository managers) assess the vulnerability risks caused by dependencies on third-party packages. Specifically, it can help:
• package repository managers to establish and follow vulnerability reduction plans
• software developers to evaluate the overall risk associated with the use of a given package
• package developers to assess the overall risk associated with package development in the context of a given repository
• package repository managers to establish measures to reduce or eliminate the occurrence of strongly connected components, such as dependency cycle control
• package repository managers to implement immunisation policies to reduce the vulnerability of the network, with different techniques available to find good immunisation target sets.
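One possible form of dependency cycle control is a check, run before a new dependency is accepted, that rejects any arc which would close a cycle. The sketch below is a hypothetical illustration of that idea and does not describe an existing repository feature; `would_create_cycle` and the `toy` data are assumed names.

```python
def would_create_cycle(depends_on, package, new_dependency):
    """True if adding the arc package -> new_dependency would close a dependency cycle.

    A cycle appears exactly when `package` is already reachable from
    `new_dependency` through existing dependency arcs.
    """
    if package == new_dependency:
        return True
    seen = {new_dependency}
    stack = [new_dependency]
    while stack:
        node = stack.pop()
        for dep in depends_on.get(node, ()):
            if dep == package:
                return True
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return False


# Hypothetical toy repository: each key maps to the packages it imports directly.
toy = {"app": ["http"], "http": ["core"], "core": []}

print(would_create_cycle(toy, "core", "app"))  # True: app already depends on core
print(would_create_cycle(toy, "app", "core"))  # False: core depends on nothing
```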

Links: 
[L1] https://doi.org/10.5281/zenodo.7358391 
[L2] https://libraries.io/ 

References: 
[1] C. Bogart, et al., “When and how to make breaking changes: policies and practices in 18 open source software ecosystems,” ACM Trans. Softw. Eng. Methodol., vol. 30, no. 4, 2021. doi:10.1145/3447245
[2] D. Seto-Rey, J. I. Santos-Martin, and C. Lopez-Nozal, “Vulnerability of package dependency networks,” IEEE Trans. Netw. Sci. Eng., vol. 10, no. 6, pp. 3396–3408, 2023. doi:10.1109/TNSE.2023.3260880

Please contact: 
Carlos López-Nozal, Universidad de Burgos, Spain

 
