by Björn Levin (RISE) and Peter Kunz (ERCIM Office)
ERCIM invited experts in the field of artificial intelligence to a workshop held on 26 May 2020, with the aim of collecting ideas and suggestions about approaches to ensure high-quality artificial intelligence (AI) in practical applications.
We briefly summarise the main items and ideas discussed in the workshop. The full report will be published on the ERCIM website. The discussed items have been grouped into six clusters:
- Governance – what factors and regulations should society consider to ensure the best use of AI, who should be responsible for what, and how should rules and regulations be developed?
- Trust – how is trust best built both among the public, and among professionals in other fields that use AI?
- Skills – how do we ensure that we have sufficiently skilled and responsible AI engineers?
- Process – how do we avoid repeating errors, how do we build and communicate best practices, and what public resources are needed?
- Testing – what can be tested, and what about the variables that are outside the realm of testing?
- Quality of data and methods – transparency vs integrity and trade secrets, deterioration over time, and the pros and cons of explainable AI.
Governance
Governance for AI touches fundamental values such as equality and fairness. Several participants called for the development of a vision of how society should work, including for AI and its use. However, a problem that was raised is that the interpretation of these values changes over time. What was morally the norm one or two generations ago is in many cases no longer valid. Humanity improves, but this makes it hard to create fixed rules that are built into systems that may exist over long periods. The interpretations are also complex, require delicate balancing of risks, and vary with different applications, domains, and communities where the same AI components are used.
Trust
Unlike many other engineered systems, AI systems are often so complex that they are hard to grasp. AI systems are also usually trained in a way that is unusual in engineering, in that they minimise average error but provide no hard limits. (A bar of steel sold as 1 m plus or minus 1 mm will never be longer or shorter than that, whereas AI systems on average do not deviate more than a certain error, but may be wildly off in individual situations.) This is a huge obstacle in building trust. One way of handling this is to educate people. Another way of improving the situation is to educate AI ambassadors – people who can explain what a certain piece of equipment does and what to expect from it. They could also assist in procurement and in user studies. An issue that came up several times is expectation management. AI has often been grossly oversold; even the name AI is poorly chosen. In its defence, it is hard to describe all the things that can be achieved using AI without it sounding like a universal solution for everything. A solution that was discussed heavily during the workshop is explainable AI, i.e. systems that can justify their decisions when required. Such a property would be highly effective in building trust and setting the right expectations.
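The contrast between an average error target and a hard tolerance can be made concrete with a small sketch. The numbers below are invented for illustration and do not come from the workshop: 100 hypothetical length predictions for a 1 m bar, 99 of which are off by 0.5 mm and one of which is off by 50 mm. The function name `error_summary` and the 1 mm tolerance are likewise assumptions.

```python
from statistics import mean


def error_summary(predictions, targets):
    """Return (mean absolute error, worst-case absolute error)."""
    abs_errors = [abs(p - t) for p, t in zip(predictions, targets)]
    return mean(abs_errors), max(abs_errors)


# Hypothetical data in millimetres: 99 predictions miss by 0.5 mm,
# a single prediction misses by 50 mm.
targets = [1000.0] * 100
predictions = [1000.5] * 99 + [1050.0]

tolerance_mm = 1.0  # assumed tolerance, mirroring "plus or minus 1 mm"
mae, worst = error_summary(predictions, targets)

print(f"mean absolute error: {mae:.3f} mm")
print(f"worst-case error:    {worst:.1f} mm")
print("average error within tolerance:", mae <= tolerance_mm)
print("every error within tolerance:  ", worst <= tolerance_mm)
```

In this made-up case the average error stays below the tolerance while a single prediction violates it by a wide margin, which is the kind of behaviour a steel-bar style guarantee would rule out.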
Skills
One conclusion that was almost unanimously agreed on during the workshop is the need for a common curriculum for AI education. This could include a minimum set of topics to be covered and practical exercises in AI applications going wrong. It could also be combined with a European certification as an AI engineer. Outside this, the workshop favoured more focus on the processes (see below) rather than focus on the individuals creating AI applications. It was also pointed out that ACM is currently working on standards for AI degrees.
Process
There was strong agreement in the workshop that the processes around the applications of AI are highly important. Good AI algorithms and well-curated data are extremely important for a good AI application, and a good process surrounding these is critical. A principal purpose of a good process is to continuously improve quality. All agreed that there is vast experience that should be collected, curated and unified, compiled and condensed, and used for teaching and continuous improvement. However, there was some debate about how this should be done, since many perceive a general reluctance to report problems, due to stigma and financial risk. There is also a large amount of work involved in curating and condensing the information. One proposal is to use the data factories (i.e. experimental facilities that facilitate access to data for the development of methods and applications) to gather and disseminate best practice. A complementary proposal is an official European Commission publication, acting as an official channel to communicate best practices from multiple sources. Looking further into the future, one could also try to establish a research field in the use of AI, distinct from the development of AI, in order to stress the importance of this. Another suggestion was the creation of standard benchmarks. The difficulties here are keeping the set up to date and the benchmarks' strong dependence on the application domain. A solution would be to rely on institutes and universities to maintain the benchmarks under the patronage of the European Commission. A point of strong agreement is that AI is software and that it is almost always a part of a larger software system. In many respects we need to view the application of AI as a process and not as a product. We therefore need a software engineering approach to the use of AI, in contrast to the mostly mathematical and algorithmic approaches used so far.
Testing
Testing and validation are integral parts of AI, and there is a large body of publications on this with respect to core AI. However, testing the entire system that AI is part of is a different matter. This is especially difficult since the effects that errors in the AI part have on the surrounding systems are difficult to anticipate. It is also impossible to exhaustively test the core AI module once it has even a moderate number of inputs.
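A back-of-the-envelope calculation illustrates why exhaustive testing breaks down so quickly. The sketch below assumes, purely for illustration, that each input is discretised into 10 levels and that one million test cases can be run per second; both figures and the function names are hypothetical.

```python
def exhaustive_test_count(num_inputs, levels_per_input):
    """Number of distinct input combinations on a full grid."""
    return levels_per_input ** num_inputs


def years_to_run(num_cases, cases_per_second):
    """Time needed to run all cases, in years of 365 days."""
    seconds_per_year = 365 * 24 * 3600
    return num_cases / (cases_per_second * seconds_per_year)


# Assumed figures: 10 levels per input, one million test cases per second.
for num_inputs in (5, 10, 20, 30):
    cases = exhaustive_test_count(num_inputs, levels_per_input=10)
    years = years_to_run(cases, cases_per_second=1_000_000)
    print(f"{num_inputs:>2} inputs: {cases:.1e} cases, about {years:.1e} years")
```

Under these assumptions, a module with 20 inputs already needs 10^20 test cases, which corresponds to millions of years of testing. Real inputs are often continuous, which makes the full input space larger still.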
While the "unknown unknowns" will always exist, their effects can be reduced with good processes. Some interesting questions evolved from the concept of the contract between the AI specialist and the problem owner. What are the properties of the component that the AI specialist delivers? How should they be described? What do they mean legally? How would one test for legal conformity against what are actually statistical properties in high dimensions? These are questions that need to be answered.
Quality of data and methods
The quality of data is an obvious weakness of AI. This is especially true if the system continuously learns from new data, as there is generally poor control over bias in this stream, and according to experience, the quality of data tends to deteriorate over time. Many of the participants advocated requiring AI providers and users to openly publish all their data. This was criticised based on issues of individual privacy and business considerations. The proposed solution is to create AI auditors that will perform audits in a similar way to financial audits but on processes and data practices related to AI. It may be possible to create standards for training data with corresponding certification (“Only organic data used”). Given the rapid development in the area, it was suggested that this should be done by industry consortia or institutes, as formal standardisation processes would be too slow. If data is shared in several steps, a pedigree of the data needs to be established. A complementary approach that was suggested is to provide good public datasets on which systems can be trained. A question is then how to maintain quality and relevance over time. This would require a curator, which could be – as mentioned above – under the auspices of the European Commission and might be part of the mission of data factories. There also need to be best practices on how to use data for training, testing and validation, including the importance of cross-validation and permutation.
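As an illustration of one such practice, the following is a minimal sketch of k-fold cross-validation using only the Python standard library. It reads "permutation" as shuffling the records before splitting them, which is one possible interpretation and not necessarily the one intended at the workshop. The function name `k_fold_indices` and its parameters are hypothetical.

```python
import random


def k_fold_indices(num_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= num_samples:
        raise ValueError("k must be between 2 and num_samples")
    indices = list(range(num_samples))
    # Permute the records first so that folds do not follow storage order.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Each record is used for testing exactly once and for training k - 1 times.
for train, test in k_fold_indices(num_samples=10, k=5):
    print("test:", sorted(test), "train size:", len(train))
```

A model would be trained on each `train` split and evaluated on the matching `test` split, and the k scores would then be reported together rather than relying on a single, possibly lucky, split.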
Please note that the ideas presented do not necessarily reflect the opinions of any individual participant in the workshop.
The participants of the workshop were:
- Björn Levin, RISE
- Gabriel David, INESC TEC
- Daniel Gillblad, RISE
- Arnaud Gotlieb, Simula Research Laboratory
- Olivier Grisel, Inria
- Alípio Jorge, INESC TEC
- Bert Kappen, Radboud University
- Fabio Martinelli, CNR
- Michael Mock, Fraunhofer IAIS
- Anirban Mukhopadhyay, TU Darmstadt
- Ana Paiva, INESC-ID
- Han La Poutré, CWI
- Andreas Rauber, TU Wien
Björn Levin, RISE, Sweden