by Anirban Mukhopadhyay, David Kügler (TU Darmstadt), Andreas Bucher (University Hospital Frankfurt), Dieter Fellner (Fraunhofer IGD and TU Darmstadt) and Thomas Vogl (University Hospital Frankfurt)
From screening diseases to personalised precision treatments, AI is showing promise in healthcare. But how comfortable should we feel about giving black box algorithms the power to heal or kill us?
In healthcare, trust is the basis of the doctor-patient relationship. A patient expects the doctor to act reliably and with precision, and to explain options and decisions. The same accuracy and transparency should be expected of the computational systems that are redefining the healthcare workflow. Since such systems have inherent uncertainties, it is imperative to understand a) the reasoning behind their decisions and b) why mistakes occur. Anything short of this transparency will erode trust in these systems and consequently harm the doctor-patient relationship.
Figure 1: Doctors and patients are important stakeholders in the discussion about safe and transparent AI. Involving them in solutions is crucial for successful applications.
Current solutions for transparency in deep learning (used here synonymously with AI) centre around the generation of heat-maps. These highlight the image regions that most influence a deep learning decision. While informative, such methods are insufficient when adopted directly in healthcare, because the actual reasoning patterns remain opaque and leave a lot of room for guesswork. Here, deep generative networks show promise by generating visual clues as to why a decision was made.
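To make the heat-map idea concrete, the sketch below uses occlusion sensitivity, one common way of producing such maps: each image patch is blanked out in turn and the resulting drop in the model's score is recorded. This is an illustration only and not necessarily the kind of heat-map the cited work relies on; `occlusion_heatmap`, `toy_score` and the 4x4 image are hypothetical stand-ins for a real network and a real scan.

```python
from typing import Callable, List

Image = List[List[float]]


def occlusion_heatmap(
    image: Image,
    score: Callable[[Image], float],
    patch: int = 1,
    baseline: float = 0.0,
) -> Image:
    """Return, per pixel, how much the score drops when its patch is occluded."""
    height, width = len(image), len(image[0])
    reference = score(image)
    heat = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            # Copy the image and blank out one patch with the baseline value.
            occluded = [row[:] for row in image]
            for r in rows:
                for c in cols:
                    occluded[r][c] = baseline
            drop = reference - score(occluded)
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat


def toy_score(image: Image) -> float:
    """Hypothetical stand-in classifier: mean intensity of the top-left 2x2 region."""
    return sum(image[r][c] for r in range(2) for c in range(2)) / 4.0


image = [[1.0] * 4 for _ in range(4)]
heat = occlusion_heatmap(image, toy_score)
for row in heat:
    print(row)
```

The map is non-zero only in the top-left corner, which is where the toy model looks. It shows where the score is sensitive but says nothing about why, which is the gap described above.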
We believe transparency in image-based algorithmic decision-making can only be achieved if expert computer scientists and healthcare professionals (radiologists, pathologists etc.) collaborate closely in an equal-share environment. In Central Germany, TU Darmstadt and Goethe University Frankfurt have formed an interdisciplinary expert working group of computer scientists and radiologists.
We identified three challenges that render interpretable and robust AI in medical applications difficult, and started researching systematic solutions:
- Bias, quality and availability of medical data for data-driven algorithms,
- Strict requirements by the regulatory approval process and
- Integration of AI into the doctor-patient relationship.
Bias, quality and availability of medical data for data-driven algorithms
Human beings are unique. The considerable inherent variability of human physiology is generally addressed by curated medical datasets and large case numbers. Additional variability is introduced by the complexity of healthcare facilities. When dealing with diseases, both the number of relevant variables and the underlying causality are often unknown. Furthermore, data acquisition is sometimes limited by the need to restrict some investigations or treatments, owing to negative side effects on human health (e.g. radiation exposure in computed tomography). In comparison, other AI application areas can draw on vast Big Data databases built by crawling the web or collected from users. The problem is compounded by the fact that the indication, acquisition and, in part, interpretation of medical image data have by and large not undergone the standardisation already present in other clinical tests (e.g. blood sample tests). As such, clinics are not currently structured to collect data for traditional data-driven algorithms. Our initial work has shown that specialised handling of such data significantly improves the performance of deep learning algorithms.
In day-to-day practice, medical annotations often remain educated guesses. Their uncertainty is accepted when justifying individual treatment decisions based on averaged cut-off values, yet data-driven algorithms usually treat the same annotations as ground truth. To resolve this contradiction, we model the uncertainty of annotations and incorporate it into AI-based methods.
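One simple way to carry annotation uncertainty into training, sketched here under our own assumptions and not taken from the authors' method, is to replace a single hard label with the distribution of several annotators' votes and to train against it with a soft-label cross-entropy. The helper names `labels_from_votes` and `soft_label_cross_entropy`, and the example votes, are hypothetical.

```python
import math
from typing import List, Sequence


def labels_from_votes(votes: Sequence[int], num_classes: int) -> List[float]:
    """Turn annotator votes (class indices) into a probability distribution."""
    counts = [0] * num_classes
    for vote in votes:
        counts[vote] += 1
    return [count / len(votes) for count in counts]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_cross_entropy(
    logits: Sequence[float], label_probs: Sequence[float]
) -> float:
    """Cross-entropy between a soft label distribution and the model's softmax."""
    probs = softmax(logits)
    return -sum(q * math.log(p) for q, p in zip(label_probs, probs) if q > 0)


# Four hypothetical annotators: three say "finding present" (1), one says "absent" (0).
soft_label = labels_from_votes([1, 1, 0, 1], num_classes=2)

# A model whose confidence matches the annotators' disagreement ...
calibrated = soft_label_cross_entropy([0.0, math.log(3.0)], soft_label)
# ... is penalised less than one that is far more certain than the annotators.
overconfident = soft_label_cross_entropy([0.0, 5.0], soft_label)
print(soft_label, calibrated, overconfident)
```

With hard labels the overconfident prediction would be rewarded. With soft labels the loss is smallest when the model's confidence mirrors the disagreement among annotators.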
Strict requirements by the regulatory approval process
The volume of documentation and the regulatory complexity for licensing in healthcare are high for good reason: the stakes are high. Through regularisation and a high number of parameters, deep learning achieves high accuracy and good generalisation, but it also obscures insight into reasoning patterns and learned knowledge. One unexplained phenomenon is that of adversarial examples: crafted images that are only minimally perturbed lead to AI assessments that contradict those of the unperturbed image (Figure 2). Our work on single pixel flips introduces a systematic analysis of patterns in adversarial examples.
Figure 2: With perturbations that are imperceptible to humans, deep learning is often led to both wrong decisions and correspondingly wrong interpretations.
Effective adversarial detection and prevention methods are required to avoid harm to patients. If this vulnerability is ignored, a malicious attacker, be it a competitor or a criminal organisation, can use it to damage or extort medical device manufacturers and hospitals. Improvements are also required to meet GDPR and documentation requirements. Based on our initial work, we focus on a systematic understanding of such adversarial examples.
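The notion of a single pixel flip can be illustrated with a brute-force search: change one pixel at a time and record every change that alters the predicted class. The sketch below is a simplified illustration under our own assumptions, not the analysis from the cited work; the tiny linear classifier, its weights and the 2x2 image are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]

# Hypothetical linear classifier standing in for a trained network.
WEIGHTS = [[0.2, -0.1], [0.1, -1.0]]
BIAS = -0.05


def classify(image: Image) -> int:
    """Return class 1 if the weighted pixel sum plus bias is positive, else 0."""
    total = BIAS + sum(
        w * x
        for weight_row, pixel_row in zip(WEIGHTS, image)
        for w, x in zip(weight_row, pixel_row)
    )
    return 1 if total > 0 else 0


def single_pixel_flips(
    image: Image,
    model: Callable[[Image], int],
    values: Sequence[float] = (0.0, 1.0),
) -> List[Tuple[int, int, float, int]]:
    """List every (row, col, new_value, new_label) that changes the prediction."""
    original = model(image)
    flips = []
    for r, row in enumerate(image):
        for c, old in enumerate(row):
            for value in values:
                if value == old:
                    continue
                # Perturb exactly one pixel on a copy of the image.
                perturbed = [line[:] for line in image]
                perturbed[r][c] = value
                label = model(perturbed)
                if label != original:
                    flips.append((r, c, value, label))
    return flips


image = [[1.0, 0.0], [0.0, 0.0]]
flips = single_pixel_flips(image, classify)
print(classify(image), flips)
```

In this toy setting two of the four pixels can each flip the decision on their own. Enumerating such pixels over many images is one way to start looking for patterns, although exhaustive search of this kind scales poorly to full-size medical images.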
Integration of AI into the doctor-patient relationship
How would you feel if you saw your doctor “google” all your questions? Theoretical machine learning research on interpretability often ignores the crucial “humane” aspect of trust between patient and medical expert, and the role a doctor would take as “black-box vocalizer” in this context. When implemented wisely, AI should enhance, not replace, the decision-making process. The ethical obligation to make the most informed decision on what is often a life-altering or life-ending change for the patient might make AI a necessity in the near future.
In light of the GDPR “Right-to-Explanation” and the high standards required for the documentation of all medical examinations and incidents, the black-box behaviour of obscuring the reasoning is a significant obstacle to the clinical implementation of AI in healthcare. As closely collaborating experts, we are developing courses and guidelines to make doctors AI-ready.
In retrospective in-silico studies, deep learning-based algorithms already show promise of becoming the single most successful concept in medical image analysis. In order to realise the potential impact these algorithms can have on our healthcare, we need to introduce trust-focused prospective clinical studies. Transparency in decision-making has become a key quality we expect from our doctors. As deep learning is introduced into healthcare, we need standardised methods to guarantee results and transparency to doctors.
 J. C. Y. Seah, et al.: “Chest radiographs in congestive heart failure: visualizing neural network learning”, Radiology, 2018.
 M. Sahu, et al.: “Addressing multi-label imbalance problem of surgical tool detection using CNN”, IJCARS, 2017.
 D. Kügler, et al.: “Exploring Adversarial Examples”, Understanding and Interpreting Machine Learning in Medical Image Computing Applications, Springer, Cham, 2018.
Informatik, Technische Universität Darmstadt, Germany