by Stefanie Grimm, Stefanie Schwaar and Patrick Holzer (Fraunhofer ITWM)

As processes are digitalised, new possibilities arise for checking billing transactions efficiently. Our previous research has led to the development of machine-learning-based auditing methods for several industries. To take this approach to the next level, we are helping organisations to collaborate through federated learning while complying with confidentiality and security restrictions.

Federated learning trains models on multiple clients individually, without exchanging training data, and then combines the locally trained models into a joint model. In the Department of Financial Mathematics at the Fraunhofer Institute for Industrial Mathematics, we employ centralised federated learning in a cross-silo setting, i.e., models are sent and aggregated via a central server, and we assume a small number of clients, most of which have large datasets. The basic procedure is to transmit a model, e.g., the architecture of a neural net, to all clients, train the model on each client separately using well-known algorithms, send the training results, e.g., the weights of a neural net, back to a central server and aggregate them into a global model, as visualised in Figure 1. This process is repeated until a stopping criterion is met. There are many ways of performing this procedure. Our aim is to find the most appropriate variants for use cases arising in fraud detection for accounting audits.

Figure 1: Basic centralised federated learning procedure.
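
The procedure in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the framework described in this article: it assumes a toy linear model y = w*x + b, plain gradient descent as the local training algorithm, and a dataset-size-weighted average of the client weights as the aggregation rule (in the style of federated averaging). The function names `local_train`, `aggregate` and `federated_training`, the learning rate and the stopping tolerance are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (feature x, target y)
Weights = List[float]         # [slope w, intercept b]


def local_train(weights: Weights, data: Sequence[Sample],
                lr: float = 0.05, epochs: int = 20) -> Weights:
    """Client side: fit y = w*x + b on private data by gradient descent."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def aggregate(client_weights: Sequence[Weights],
              client_sizes: Sequence[int]) -> Weights:
    """Server side: average the client weights, weighted by dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(cw[i] * n for cw, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def federated_training(clients: Sequence[Sequence[Sample]],
                       rounds: int = 200, tol: float = 1e-6) -> Weights:
    """Repeat send -> local training -> aggregation until the model settles."""
    global_weights: Weights = [0.0, 0.0]
    for _ in range(rounds):
        # Only weights leave the clients; the raw samples never do.
        updates = [local_train(global_weights, data) for data in clients]
        new_weights = aggregate(updates, [len(d) for d in clients])
        change = max(abs(a - b) for a, b in zip(new_weights, global_weights))
        global_weights = new_weights
        if change < tol:  # assumed stopping criterion: weights stopped moving
            break
    return global_weights


# Two clients whose private samples all follow y = 2x + 1.
client_a = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]
client_b = [(1.5, 4.0), (2.0, 5.0)]
w, b = federated_training([client_a, client_b])
print(f"slope={w:.3f}, intercept={b:.3f}")
```

In this sketch the server only ever sees the two-element weight lists returned by each client. Other model types would need a different aggregation rule, which is one of the many variants mentioned above.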

Account auditing is required in various areas: fraudulent claims affect both private sector companies and public entities, and the consequences of detected fraud range from minor accounts receivable to criminal prosecution. Thus, the requirements and objectives can differ substantially between applications. Nevertheless, most cases share one feature: the growing amount of data makes it impossible to audit all claims and billings individually. This is where machine learning algorithms come into the picture. Depending on variables such as data structure and the aim of the investigation, data scientists can apply algorithms to address different objectives – e.g., outlier detection, change detection or classification.

This brings us to the question: how can federated learning help with fraud detection in accounting audits? Even though most individual organisations would possess enough data to meet the requirements of common machine learning algorithms, there can still be significant benefits to collaborating with others. In particular, organisations with a diverse range of data can benefit from collaborating with others with the same type of data. Yet, sharing data is often not an option due to data security and confidentiality obligations. By sharing training results, it is not only possible to increase the data basis but also to spot undetected fraudulent structures. Additionally, although federated learning itself is still a young research area, it promises to overcome privacy, organisational and technical obstacles to artificial intelligence methods in application domains with advanced data integrity requirements, and to make collaboration possible.

In one of the first studies in this area to date, Yang et al. [1] investigated the application of federated learning–based methods in credit card fraud detection. Employing a real-world credit card transaction dataset, they experimentally demonstrated that federated learning methods can be used to implement a working fraud detection system that does not require financial institutions to share private data with each other. Further, a study by Suzumura et al. [2] (not yet peer-reviewed) shows improved results when employing a centralised cross-silo fraud detection setting.

We are investigating research questions that arise in fraud-detection applications. We are using our own framework for federated learning, which is designed for highly diverse models – e.g., random forests, neural networks or linear regression – and was jointly developed by our Departments of High Performance Computing and Financial Mathematics. We have integrated methods based on horizontal as well as vertical splits. Horizontal federated learning, i.e., learning across datasets that share the same feature base, can improve results by increasing the amount of training data and thus improving the goodness of fit of the models. While horizontal federated learning is sometimes dismissed as being unrealistic when pairing models across organisations, accounting data often has to follow certain forms, especially in highly regulated areas. For applications that do not fulfil these consistency assumptions, we use domain knowledge and break down the datasets into categories of features. Beyond the horizontal approach, which aims to enlarge the data basis and share the labelling cost, vertical federated learning, i.e., learning across datasets that share the same sample space, also gives us interesting insights into fraudulent structures. For example, the claims of a suspect at a single organisation might be unsuspicious, but the vertical combination across organisations may be worth reporting. Therefore, as a first result, we are combining our knowledge in anomaly detection and classification, the domain knowledge of our partners, and the aforementioned federated techniques to provide a decision support system for fraud detection in accounting audits.
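
The difference between the two splits can be made concrete with a small Python sketch. Everything in it is hypothetical and chosen only for illustration: the table layout, the feature names, the locally computed suspicion scores and the threshold are assumptions, and `split_type` and `flag_combined` are not part of the framework described above. Real vertical federated learning would also need privacy-preserving record matching and aggregation, which this sketch leaves out.

```python
from typing import Dict, List, Set

# Hypothetical layout: sample id -> {feature name -> value}
Table = Dict[str, Dict[str, float]]


def feature_names(table: Table) -> Set[str]:
    """Collect every feature name that appears in a table."""
    names: Set[str] = set()
    for row in table.values():
        names.update(row)
    return names


def split_type(a: Table, b: Table) -> str:
    """Classify how the tables of two organisations relate to each other."""
    shared_ids = set(a) & set(b)
    shared_features = feature_names(a) & feature_names(b)
    if feature_names(a) == feature_names(b) and not shared_ids:
        return "horizontal"  # same features, different samples
    if shared_ids and not shared_features:
        return "vertical"    # same samples, different features
    return "mixed"


def flag_combined(partial_scores: List[Dict[str, float]],
                  threshold: float) -> List[str]:
    """Sum per-organisation scores for ids known to all parties
    and return the ids whose combined score exceeds the threshold."""
    common = set.intersection(*(set(s) for s in partial_scores))
    return sorted(
        i for i in common
        if sum(s[i] for s in partial_scores) > threshold
    )


# Horizontal: two organisations record the same fields for different claims.
org_a = {"a1": {"amount": 120.0, "items": 3.0},
         "a2": {"amount": 80.0, "items": 1.0}}
org_b = {"b1": {"amount": 200.0, "items": 5.0}}

# Vertical: two organisations hold different fields about the same claimants.
org_x = {"c7": {"claims_per_month": 4.0}, "c9": {"claims_per_month": 1.0}}
org_y = {"c7": {"avg_claim_amount": 950.0}, "c9": {"avg_claim_amount": 100.0}}

# Assumed local suspicion scores; each stays below a local alert level of 1.0.
scores_x = {"c7": 0.6, "c9": 0.1}
scores_y = {"c7": 0.7, "c9": 0.2}

print(split_type(org_a, org_b))
print(split_type(org_x, org_y))
print(flag_combined([scores_x, scores_y], threshold=1.0))
```

Here claimant `c7` stays below the alert level at each organisation individually and is flagged only once the two partial scores are combined, which mirrors the example of a suspect whose claims look unsuspicious at any single organisation.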

[1] W. Yang et al.: “FFD: a federated learning based method for credit card fraud detection”, IEEE BigData 2019. Springer, Cham.
[2] T. Suzumura et al.: “Towards federated graph learning for collaborative financial crimes detection”, arXiv:1909.12946, 2019.

Please contact:
Stefanie Grimm
Fraunhofer Institute for Industrial Mathematics (ITWM), Germany