by Ahmet Soylu and Till C. Lech (SINTEF)
Annually, around 14% of the EU’s GDP is spent on the procurement of goods and services by over 250,000 public authorities. Improving the effectiveness, efficiency, transparency and accountability of government procurement is therefore in the public’s interest. The increasing amount of open procurement data enables us to analyse public spending to deliver better quality and more economical public services, prevent fraud and corruption, and build healthy and sustainable economies.
TheyBuyForYou [L1] is a three-year project, funded by the European Union's Horizon 2020 programme, that aims to build a technology platform consisting of a set of modular web-based services and APIs to publish, curate, integrate, analyse, and visualise an open, comprehensive, cross-border and cross-lingual procurement knowledge graph, including public spending and corporate data from multiple sources across the EU.
Many of the tools in use by governments are not optimised for government use or are subject to restrictive contracts that create unnecessary complications when it comes to publishing open data. Other contracts, such as contracts for tender advertising portals, are hampering the progress of transparency because the portals claim copyright over all data published in them, even though their public-sector clients are the authors and the data on tender opportunities are required by law to be published openly. The technical landscape for managing such contracts is very heterogeneous: for example, even in medium-sized cities, contracts are handled using different tools and formats across departments, including relational databases, Excel spreadsheets, and Lotus Notes. This makes it difficult to achieve a high-level overview of processes and decisions. There are various initiatives, such as the Open Contracting Data Standard (OCDS), that aim to create de jure and de facto standards for electronic procurement. However, these are mostly oriented towards achieving interoperability (i.e., addressing communication between systems), are document-oriented (i.e., the structure of the information is commonly provided by the content of the documents that are exchanged), and provide no standardised practices for referring to third parties, companies participating in the process, or even the main object of contracts. In short, there is enormous heterogeneity in systems and processes. The Semantic Web approach has the potential to benefit the procurement domain by allowing the reuse of existing vocabularies, ontologies, and standards.
The TheyBuyForYou project explores how procurement knowledge graphs, paired with data management, analytics and interaction design, could be used to reform four key procurement areas: (i) economic development: facilitating better economic outcomes from public spending for SMEs; (ii) demand management: spotting trends in public spending to achieve long-term goals such as savings; (iii) competitive markets: promoting healthier competition and identifying collusion and other irregularities; and (iv) supplier intelligence: optimising supply chains. The project develops an integrated technology platform with data, core services, open APIs and online tools, which will be validated in different business cases.
The project is underway and we have integrated two high-quality datasets, company data (i.e., legal entities) and procurement data (e.g., tenders and contracts), according to an ontology network to form an interconnected knowledge graph for public procurement. We ingest data from two main providers: OpenCorporates [L2] for supplier data (i.e., company) and OpenOpps [L3] for procurement data. OpenOpps has gathered over 2,000,000 tender documents from more than 300 publishers through web scraping and by using open APIs, while OpenCorporates currently has 1,400,000 entities collected from national registers. The data ingestion process comprises several steps using the data APIs of both providers, including data curation (e.g., handling missing values and duplicates), matching suppliers appearing in tender data against company data (i.e., reconciliation), and translating datasets into the underlying graph data representation (i.e., RDF) with respect to the ontology network. The current release of the knowledge graph includes 2,300,000 triples (i.e., records) [L4]. An example query and its results are depicted in Figure 1. The example query lists the top ten companies that constitute the major suppliers in the Norwegian jurisdiction, where the jurisdiction data comes from OpenCorporates and the contract data comes from OpenOpps. In addition to the knowledge graph and data ingestion components, we are developing a set of online toolkits including cross-language document comparison and analytics components; anomaly detection components; a comprehensive set of guidelines for data visualisation and interaction design; and a design for a story-telling tool.
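The three ingestion steps named above (curation, reconciliation, and translation to RDF) can be illustrated with a minimal Python sketch. It is not the project's implementation: the record fields, the `http://example.org/tbfy-sketch/` namespace, the `hasSupplier` predicate, and the exact-match-on-normalised-name strategy are all assumptions chosen for illustration, and the sample records are invented.

```python
import re

# Hypothetical namespace and predicate, for illustration only;
# these are NOT the terms of the project's ontology network.
EX = "http://example.org/tbfy-sketch/"

# Legal-form suffixes ignored when comparing names (an assumed, partial list).
LEGAL_SUFFIXES = {"as", "asa", "ltd", "limited", "plc"}


def normalise(name):
    """Lower-case a name, replace punctuation with spaces, drop legal suffixes."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(t for t in cleaned.split() if t not in LEGAL_SUFFIXES)


def curate(tenders):
    """Curation: skip records with a missing supplier and drop duplicates."""
    seen, result = set(), []
    for t in tenders:
        supplier = (t.get("supplier") or "").strip()
        if not supplier:
            continue  # missing value
        key = (t["tender_id"], normalise(supplier))
        if key in seen:
            continue  # duplicate of an earlier record
        seen.add(key)
        result.append({"tender_id": t["tender_id"], "supplier": supplier})
    return result


def reconcile(tenders, companies):
    """Reconciliation: look up each supplier in the company data by normalised name."""
    index = {normalise(c["name"]): c["company_id"] for c in companies}
    return [(t["tender_id"], index.get(normalise(t["supplier"]))) for t in tenders]


def to_ntriples(matches):
    """Translation: emit one N-Triples line per reconciled tender/supplier pair."""
    return [
        f"<{EX}tender/{tid}> <{EX}hasSupplier> <{EX}company/{cid}> ."
        for tid, cid in matches
        if cid is not None
    ]


# Invented sample records.
companies = [
    {"company_id": "no-1", "name": "Acme AS"},
    {"company_id": "gb-2", "name": "Widget Ltd."},
]
tenders = [
    {"tender_id": "t1", "supplier": "ACME AS"},
    {"tender_id": "t1", "supplier": "Acme as"},        # duplicate
    {"tender_id": "t2", "supplier": None},             # missing value
    {"tender_id": "t3", "supplier": "Widget Limited"},
    {"tender_id": "t4", "supplier": "Unknown Co"},     # no matching company
]

curated = curate(tenders)
matches = reconcile(curated, companies)
triples = to_ntriples(matches)
for line in triples:
    print(line)
```

Exact matching on normalised names is the simplest possible reconciliation strategy; real supplier names vary far more, which is one reason a sketch like this would leave many tenders unmatched.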
Figure 1: An example query executed on the TBFY knowledge graph integrating two datasets.
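The logic of such a query can be approximated in a few lines of Python over in-memory data. This is only a sketch under stated assumptions: the identifiers and records are invented, the jurisdiction code `"no"` is assumed, and ranking suppliers by their number of contracts is an assumed criterion, since the text does not state how "major" is measured.

```python
from collections import Counter

# Invented data: company -> jurisdiction (the kind of fact that would come
# from the company dataset) and contract -> supplier (from the procurement dataset).
company_jurisdiction = {"c1": "no", "c2": "no", "c3": "gb"}
contract_supplier = [
    ("k1", "c1"),
    ("k2", "c1"),
    ("k3", "c2"),
    ("k4", "c3"),
    ("k5", "c1"),
]


def top_suppliers(contracts, jurisdictions, jurisdiction, limit=10):
    """Rank suppliers registered in `jurisdiction` by number of contracts."""
    counts = Counter(
        supplier
        for _, supplier in contracts
        if jurisdictions.get(supplier) == jurisdiction
    )
    return counts.most_common(limit)


top = top_suppliers(contract_supplier, company_jurisdiction, "no")
print(top)
```

The point of the sketch is the join: neither dataset alone can answer the question, because the jurisdiction and the contract award are recorded by different providers.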
For the remainder of the project, we will ingest more data and focus on data quality issues, which are turning out to be a considerable challenge. Currently, we are working on various approaches to improve data quality, ranging from machine learning to crowd-sourcing. Finally, we are consolidating an integrated platform with data access APIs and online tools supporting different data analytics tasks. A series of business cases is being built on top of the knowledge graph and online tools to realise the aforementioned four key innovation areas.
The project consortium is composed of SINTEF (coordinator, Norway), University of Southampton (UK), OpenOpps (UK), OpenCorporates (UK), Cerved (Italy), Ministrstvo za Javno Upravo (Slovenia), Ayuntamiento de Zaragoza (Spain), Jozef Stefan Institute (Slovenia), Oesía Networks SL (Spain), and Universidad Politecnica de Madrid (Spain).
 Simperl et al.: “Towards a Knowledge Graph Based Platform for Public Procurement”, in Proc. of MTSR 2018.
 Soylu et al.: “Towards an Ontology for Public Procurement Based on the Open Contracting Data Standard”, in Proc. of I3E 2019.
 Soylu et al.: “An Overview of the TBFY Knowledge Graph for Public Procurement”, in Proc. of ISWC Satellites 2019.
Ahmet Soylu, SINTEF, Norway