by Mihály Héder (MTA-SZTAKI)
Engineering research literature tends to have fewer citations per document than other areas of science. Natural language processing, argumentation structure analysis, and pattern recognition in graphs can help to explain this by providing an understanding of the scientific impact mechanisms between scientists, engineers and society as a whole.
Institutionalised engineering research often needs to follow research policy that was designed with natural science in mind. In this setting, individual academic advancement as well as research funding largely depends on scientific indicators. A dominant way of measuring impact in science is counting citations at different levels and in different dimensions: for individual articles and journals as well as for individuals and groups. If an engineering research group does not deliver on these metrics, it might compensate for this with the revenue it generates. But this strategy increasingly pushes a group from basic engineering research activity to applied and short-term profitable research, since basic research does not result in income.
Our main research question is: why are there fewer citations per document in this field than in almost any other branch of science? This has direct consequences for the aforementioned scientific metrics of the field. We are also addressing two further questions: are conventional citations good indicators for engineering research at all? And what would be the ideal impact measurement mechanism for engineering research?
Our work approaches the problem by investigating both the argumentative structure of engineering research articles and the directed graphs that represent citations between papers. For investigating individual papers we use natural language processing, including named entity recognition, clustering, classification and keyword analysis. In our work we rely on publicly available Open Access registries and citation databases represented in linked open datasets.
We are testing several initial hypotheses which might explain the low number of citations:
- There are virtually no long debates that manifest themselves in debate-starter papers and follow-ups;
- The audience on which engineering research has an impact does not itself publish in large numbers;
- There are some additional effects: references to standards, design documents and patents are often just implied and not made explicit – and even when they are, they do not count.
The first hypothesis is tested with graph pattern matching. In this case we are defining an abstract pattern in the citation graphs of noted debates in the fields of philosophy of science and philosophy of technology. Then, we attempt to recognise similar patterns in the engineering research literature.
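The idea of matching a debate pattern in a citation graph can be illustrated with a minimal sketch. The pattern used here is an assumption made for illustration, not the project's actual definition: a "debate" is taken to be a chain of papers in which every follow-up cites both the previous paper in the chain and the debate-starter. The function name `find_debate_chains` and the `min_length` parameter are hypothetical.

```python
from collections import defaultdict

def find_debate_chains(citations, min_length=3):
    """Find chains [starter, reply_1, reply_2, ...] in a citation graph.

    `citations` is an iterable of (citing_paper, cited_paper) pairs.
    Assumed pattern (illustrative only): each reply cites both the
    previous paper in the chain and the starter paper.
    """
    cited_by = defaultdict(set)  # paper -> papers that cite it
    cites = defaultdict(set)     # paper -> papers it cites
    for citing, cited in citations:
        cited_by[cited].add(citing)
        cites[citing].add(cited)

    chains = []

    def extend(chain):
        starter, last = chain[0], chain[-1]
        extended = False
        for candidate in sorted(cited_by[last]):
            # A valid follow-up is new to the chain and also cites the starter.
            if candidate not in chain and starter in cites[candidate]:
                extended = True
                extend(chain + [candidate])
        # Record only chains that cannot be extended and are long enough.
        if not extended and len(chain) >= min_length:
            chains.append(chain)

    # sorted() copies the keys, so lookups inside extend() cannot
    # disturb this iteration.
    for starter in sorted(cited_by):
        extend([starter])
    return chains


# Toy data: A starts a debate, B replies, C answers B, D answers C;
# X cites A once and never returns to the discussion.
example = [("B", "A"), ("C", "B"), ("C", "A"),
           ("D", "C"), ("D", "A"), ("X", "A")]
debates = find_debate_chains(example)
```

Under this assumed pattern, a paper that is cited by many unrelated papers (a hub) produces no chain, whereas a sequence of replies does. Overlapping chains are reported separately, so a real analysis would probably need to deduplicate them.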
To test the second hypothesis we look into additional sources, such as standards and software code that are known to be applications of certain research papers. Then we investigate the publication history of the creators of those applications, to see whether they report those applications in publications. Here we rely on publicly available citation databases and searchable databases provided by large publishers and internet search firms.
For the third hypothesis we have defined several types of “implicit citations”. Implicit citations are cases where the impact of a research article is clear in some work or artefact – standards, software code, patents, etc. – but, because of the nature of the work, the impact never appears as a citation in any database. A typical example is the use of a particular algorithm in software. While it is not appropriate to treat these “implicit citations” as equal to citations from prestigious papers, they still point to the research’s effects on industry and society. Public funding is often justified by the advantages a research direction eventually brings to the taxpayer and society, so an objective metric for such effects is valuable.
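One way to count implicit citations without treating them as equal to conventional ones is a weighted sum. The sketch below is purely illustrative: the evidence categories follow the text, but the weights, the `Evidence` type and the `impact_score` function are hypothetical and are not part of the project's proposal.

```python
from dataclasses import dataclass

# Hypothetical weights, chosen only to illustrate that implicit
# citations count for less than citations from papers.
WEIGHTS = {
    "paper": 1.0,
    "patent": 0.5,
    "standard": 0.5,
    "software": 0.25,
}

@dataclass
class Evidence:
    kind: str    # one of the keys of WEIGHTS
    source: str  # free-text description of where the impact was found

def impact_score(evidence, weights=WEIGHTS):
    """Sum the weights of all evidence items.

    Raises KeyError for an evidence kind that has no weight.
    """
    return sum(weights[item.kind] for item in evidence)


# Made-up evidence for a single, imaginary article.
evidence = [
    Evidence("paper", "journal article citing the work"),
    Evidence("paper", "conference paper citing the work"),
    Evidence("patent", "patent relying on the method"),
    Evidence("software", "library implementing the algorithm"),
]
score = impact_score(evidence)
```

How the weights should be set is exactly the kind of question a metric of this type would have to answer; the values above carry no empirical meaning.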
The preliminary results indicate that the low number of citations in the field of engineering research can, to a great extent, be explained by the hypotheses above. We have also identified other unanticipated factors, namely a proxy effect that lowers the overall number of citations, as well as a tendency in the field to mention a well-known named entity (such as the name of an algorithm) without referencing it in a bibliographically correct way.
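The named-entity effect can be made concrete with a small sketch that flags entities mentioned in the body of a paper without a matching entry in its reference list. This is a simplified illustration under stated assumptions: the lexicon `KNOWN_ENTITIES` is a hypothetical, hand-made mapping from an entity name to the title of its originating paper, and plain substring matching stands in for proper named entity recognition.

```python
import re

# Hypothetical lexicon: entity name -> title of the originating paper.
KNOWN_ENTITIES = {
    "Kalman filter":
        "A New Approach to Linear Filtering and Prediction Problems",
    "Dijkstra's algorithm":
        "A note on two problems in connexion with graphs",
}

def uncited_entities(body_text, references, lexicon=KNOWN_ENTITIES):
    """Return entity names that occur in the text but whose originating
    paper does not appear in any reference string."""
    lowered_refs = [ref.lower() for ref in references]
    flagged = []
    for name, title in lexicon.items():
        mentioned = re.search(re.escape(name), body_text, flags=re.IGNORECASE)
        cited = any(title.lower() in ref for ref in lowered_refs)
        if mentioned and not cited:
            flagged.append(name)
    return flagged


# Made-up input: both entities are mentioned, only one is referenced.
body = ("We smooth the signal with a Kalman filter and compute routes "
        "with Dijkstra's algorithm.")
refs = ["E. W. Dijkstra: A note on two problems in connexion with graphs. "
        "Numerische Mathematik, 1959."]
missing = uncited_entities(body, refs)
```

Each flagged name is a candidate implicit citation: the originating work clearly had an impact on the paper, yet no citation database would record it.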
Science metrics on which researchers need to deliver are ways of limiting the freedom of inquiry, since they prescribe the publication types researchers need to use, as well as the places – prestigious journals, conferences – where they must publish. This is usually done in good faith and with the intent of improving the quality of science, but the usual metrics can be detrimental to the cause of an envisioned engineering science. Since the need for some kind of metrics is likely to remain, the project will propose alternatives that measure engineering research impact more inclusively by incorporating design documents, patents, standards, and source code usage.
S. Teufel: The Structure of Scientific Articles: Applications to Citation Indexing and Summarization. Center for the Study of Language and Information, Lecture Notes, 2010.
B. Martin: The use of multiple indicators in the assessment of basic research. Scientometrics, 1996, 36(3): 343-362.
SZTAKI, Hungarian Academy of Sciences, Hungary
36 1 279 6027