by Jennifer Edmond and Georgina Nugent Folan (Trinity College Dublin)
The KPLEX project is looking at big data from a rich data perspective. It uses humanities knowledge to explore bias in big data approaches to knowledge creation.
The KPLEX Project is an H2020-funded project tasked with investigating the complexities of humanities and cultural data, and the implications of digitisation for the unique, complex and messy data that humanities and cultural researchers are accustomed to dealing with. The drive for ever greater integration of digital humanities (DH) data is complicated by the uncomfortable truth that much of the information that should be the cornerstone of our decision making, namely rich data about the history of our economies, societies and cultures, isn’t digitally available. This gap, along with the “epistemics of the algorithm”, is a key concern of the KPLEX project. We are working to expand awareness of the risks inherent in big data for DH and cultural research, and to suggest ways in which phenomena that resist datafication can still be represented (if only by their absence) in knowledge creation approaches reliant upon the interrogation of large data corpora.
KPLEX addresses the repercussions of the dissociation of data sources from the people, institutions and conditions that created them. In a rapidly evolving DH environment where large-scale data aggregation is becoming ever more accepted as the gold standard, the KPLEX project is defining and describing some of the key aspects of data that are at risk of being left out of our knowledge creation processes, and the strategies researchers have developed to deal with these complexities.
The KPLEX team is diverse and has adopted a comparative, multidisciplinary, and multi-sectoral approach to this problem, focussing on four key challenges to the knowledge creation capacity of big data approaches:
1) redefining what data is and the terms we use to speak about it;
2) the manner in which data that are not digitised or shared become “hidden” from aggregation systems;
3) the fact that data is human-created, and lacks the objectivity often ascribed to the term;
4) the subtle ways in which data that are complex almost always become simplified before they can be aggregated.
We approach these questions with a humanities research perspective and remain committed to humanities methodologies and forms of knowledge, but we make use of social science research tools to look at both the humanistic and computer science approaches to the term “data” and its many possible meanings and implications. Our core shared discourse of the digital humanities allows us to use these methods and knowledge in a contextualised, socially relevant manner, a strength of our consortium that is further enhanced by our inclusion of both ethnographic/anthropological and industrial perspectives.
Led by Trinity College Dublin, the KPLEX team spans four countries, taking in Freie Universität Berlin (Germany), DANS-KNAW (The Netherlands) and TILDE (Latvia). Each of the KPLEX project partners addresses an integrated set of research questions and challenges. The research teams have been assembled to pursue a set of questions that are humanist-led but broadly interdisciplinary, spanning humanities and digital humanities, data management, anthropology and computer science. They also include stakeholders from outside academic research who are able to inform the project’s evidence gathering and analysis of the challenges, with participation from both a technology SME (TILDE) and a major national ICT research centre (ADAPT, Ireland). In addition, KPLEX takes in the experiences of a large number of major European digital research infrastructure projects federating cultural heritage data for use by researchers, through the contributions of TCD (Dublin) and DANS-KNAW (The Hague). These projects (including CENDARI, EHRI, DARIAH-EU, DASISH, PARTHENOS, ARIADNE and HaS) have all faced, and made progress on, the issues surrounding the federation and sharing of cultural heritage data. Two further projects that deal with non-scientific aspects of researcher epistemics are also engaged, namely the “Scholarly Primitives and Renewed Knowledge Led Exchanges” project (SPARKLE, based at TCD) and the “Affekte der Forscher” project (based at FUB). These give the KPLEX team a firm baseline of knowledge for dealing with the question of how epistemics creates and marks data.
The KPLEX project kicked off in January 2017 and will conclude in March 2018, presenting its results via a composite white paper that unites the findings of each research team, with each research team also producing a peer-reviewed academic paper on its findings. Over the coming months the project will be represented at DH conferences in Liverpool (“Ways of Being in the Digital Age”), Austria (“Data First!? Austrian DH Conference”), Manchester (“Researching Digital Cultural Heritage International Conference”) and Tallinn (“Metadata and Semantics Research Conference”).
[L1] https://kplex-project.com/, Twitter: @KPLEXProject, Facebook: KPLEXProject
 T Presner: “The Ethics of the Algorithm”, in Probing the Ethics of Holocaust Culture.
 J Edmond: “Will Historians Ever Have Big Data?” In Computational History and Data-Driven Humanities. doi:10.1007/978-3-319-46224-0_9.
 L Gitelman, ed., “‘Raw Data’ is an Oxymoron”.
Jennifer Edmond, Georgina Nugent Folan, Trinity College Dublin, Ireland