by Peter Biegelbauer, Alexander Schindler, Rodrigo Conde-Jimenez, and Pia Weinlinger (AIT Austrian Institute of Technology)
Large Language Models (LLMs) have the potential to support the civil service. They can be used to automate tasks such as document classification, summarisation, and translation, among others. However, there are also risks and challenges associated with their use.
LLMs have gained visibility with ChatGPT becoming publicly accessible in November 2022 and are increasingly integrated in widely used applications such as the search engines of Google and Microsoft (Bing). LLM-driven solutions such as Retrieval Augmented Generation (RAG) are utilised by firms such as Morgan Stanley for internal knowledge search and retrieval. This task is also a key problem for public administrations, which have to digest large amounts of data curated partially over centuries in various ways, utilising different logics and ontologies.
With civil service tasks increasing constantly, baby boomers retiring and qualified personnel becoming scarce, support is needed for, e.g. automating reviews of case files, summarising complex information, translating documents, and analysing large datasets. Also, the interaction with an increasingly active and critical public in democracies is becoming ever more time-consuming and might be supported by LLMs, e.g. in the form of responding to public inquiries as part of sunshine laws and powering chatbots.
However, in democracies, public administrations must operate strictly under the rule of law, and their activities are constantly under public scrutiny. This is for good reason, since they handle, for example, sensitive financial, health, and security data. Ethical principles such as privacy, fairness, transparency, accountability and security, which over recent years have become part of the discussion of AI ethics [1, 2, 3], are nothing new for the civil service and indeed have been part of laws regulating public administrations for a long time.
The use of LLMs carries several risks that could endanger these ethical principles. A taxonomy of risks by DeepMind lists six areas: discrimination, hate speech and exclusion; information hazards stemming from models leaking or inferring sensitive information; misinformation harms from models producing false or misleading information; malicious uses; human-computer interaction harms stemming from overly trusting models; and automation, access, and environmental harms, that is, environmental or economic impacts arising from models [L1].
It is difficult to assess many of these risks, given the opacity surrounding LLMs. The training data in particular is practically unknown, as a large part of it is scraped indiscriminately from the Internet, a body of data often referred to as Common Crawl. The same opacity applies to the process by which LLMs are trained and, finally, to the models themselves. Furthermore, LLMs, often referred to as foundation models, are general-purpose models, and the tasks imposed on them are often of a “zero-shot” nature, meaning that they have not been specifically trained to solve these tasks and the results depend entirely on the opaque and likely biased training data.
Figure 1: Picture created by DALL·E – an office desk with books, files, and pencils.
These factors have contributed to a cautious approach to applying and introducing LLMs in new areas of public life, which has in turn led to the creation of regulations, guidelines, and internal rules across the world, from international organisations (OECD, EU) and nation states (e.g. China, US) to cities (e.g. City of Boston, City of Vienna) and organisations (e.g. Austrian Institute of Technology, Springer Publishing). These approaches to AI governance, however, are not the only ones to have emerged.
The Ada Lovelace Institute, for example, has proposed a series of strategies towards AI governance in the public sector. Such strategies include, among others, carefully considering counterfactuals to LLM implementation, guiding the evaluation of appropriate use cases with public life principles, and even third-party audits [L2].
Another option is to use (and even develop) LLMs locally. This offers the advantage of sovereignty in many respects, as no data is shared with services hosted abroad. On the other hand, publicly accessible foundation models are typically smaller and less complex, and therefore tend to underperform compared to commercial products, whose providers also have access to almost unlimited computing resources. To assess the trade-off between sovereignty, security and quality, relevant practical experience must first be gathered.
A number of measures are discussed in the Practical Guidelines for AI in the Civil Service [L3], developed by the AIT AI Ethics Lab [L4] for the Austrian government. The guidelines are designed to support public administration in procuring, using and evaluating AI applications. They outline the prerequisites and challenges of integrating AI into the civil service, focusing on ethical principles and regulatory discourse to establish a robust framework for AI implementation. In addition, the guide provides a “Criteria and Measures Catalogue for Ethical AI (EKIV)” to ensure compliance with ethical and legal standards and thus enable the responsible deployment of AI in the civil service. It also advocates training opportunities for administrative staff involved in planning, applying or managing AI applications. We strongly believe this to be the most important step towards safely using AI and LLMs in the public sector, as trained civil servants better understand the opportunities and challenges involved.
[1] OECD Council (2019), Recommendation on Artificial Intelligence, Paris.
[2] EU High-Level Expert Group on Artificial Intelligence (2020), Assessment List for Trustworthy Artificial Intelligence (ALTAI) for self-assessment, Brussels.
[3] UNESCO (2021), Recommendation on the Ethics of Artificial Intelligence, Paris.
Peter Biegelbauer, AIT Austrian Institute of Technology
Alexander Schindler, AIT Austrian Institute of Technology