by Jiří Šmerda and Radka Findeisová

A group at Masaryk University has developed a method that creates comparable statistical reports on the use of heterogeneous digital libraries. It achieves this by analysing network traffic to selected digital library repositories. Such statistical reports are crucial, particularly to aid institutions in evaluating and optimizing their digital library portfolios.

Many research institutions subscribe to several digital library providers, and the subscription fees are often substantial. Institutions must therefore evaluate which digital libraries are used most frequently, which should be used frequently (but aren't), and which organizational unit uses each library the most. The results of this evaluation are used to optimize the portfolio of digital libraries to which the institution subscribes.

Digital library providers usually offer their own detailed statistical reports. These are useful for analysing the utilization of a single library on its own, but two difficulties arise. First, each provider offers its reports in a different format, making it hard to compare usage values across providers. Second, and especially in larger institutions, it is often unnecessary for the entire institution to subscribe to a given library, since only certain organizational units require access. Summary reports for the whole institution are therefore of limited help, because such institutions want to break the usage figures down by organizational unit.

We have developed a method to deal with these problems. It uses data on network traffic collected by a hardware probe. The probe is attached to the point at which the institution is connected to the Internet. It collects the network traffic going to and from all computers located in the institution's local network. The collected data are filtered according to the digital libraries we want to monitor, and the results are aggregated, visualized and collated into reports. We measure the amount of data transferred from the digital library servers, the number of connections and the number of unique IP addresses that are connected to the digital library servers.
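
The filtering and aggregation step described above can be illustrated with a minimal Python sketch. It is not the MyLibScope implementation: the `Flow` record, the address ranges, the library and faculty names, and the idea of recognizing libraries and organizational units by subnet are all assumptions made for illustration. The sketch only shows how the three measured quantities (data transferred from the library servers, number of connections, number of unique IP addresses) could be derived from flow records.

```python
from collections import defaultdict
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Flow:
    """One connection seen by the probe (hypothetical record layout)."""
    client_ip: str          # computer inside the institution
    server_ip: str          # remote server
    bytes_from_server: int  # data transferred from the server to the client


# Hypothetical address ranges; real ones would come from configuration.
LIBRARY_NETWORKS = {
    "Library A": ip_network("198.51.100.0/24"),
    "Library B": ip_network("203.0.113.0/24"),
}
UNIT_NETWORKS = {
    "Faculty X": ip_network("10.1.0.0/16"),
    "Faculty Y": ip_network("10.2.0.0/16"),
}


def lookup(address, networks):
    """Return the name of the first network containing address, or None."""
    addr = ip_address(address)
    for name, network in networks.items():
        if addr in network:
            return name
    return None


def aggregate(flows):
    """Keep flows to monitored libraries and sum them per (unit, library)."""
    stats = defaultdict(lambda: {"bytes": 0, "connections": 0, "clients": set()})
    for flow in flows:
        library = lookup(flow.server_ip, LIBRARY_NETWORKS)
        if library is None:
            continue  # traffic to a server we do not monitor
        unit = lookup(flow.client_ip, UNIT_NETWORKS) or "unknown"
        entry = stats[(unit, library)]
        entry["bytes"] += flow.bytes_from_server
        entry["connections"] += 1
        entry["clients"].add(flow.client_ip)
    return {
        key: {
            "bytes": value["bytes"],
            "connections": value["connections"],
            "unique_ips": len(value["clients"]),
        }
        for key, value in stats.items()
    }
```

Keying the totals by organizational unit and library is what would allow the same data to be reported either for the whole institution or broken down by unit.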

Figure 1: Dynamic mind maps in the MyLibScope analytical desktop application, with expanded nodes and edges showing more information about usage within a selected faculty and digital library.

This research is taking place at the Institute of Computer Science at Masaryk University in the Czech Republic, in collaboration with the Faculty of Informatics. It has resulted in an application called MyLibScope, which measures the usage of over a hundred digital library repositories by the roughly 10,000 computers in the university network.

The application consists of three parts. The server part runs in the background, communicates with the network probe, collects and aggregates filtered data, and prepares them for visualization. The Web part takes the data and creates reports, which are available on the Masaryk University library Web site. The third part comprises an analytical desktop application that uses advanced methods of visualization to combine dynamic graphs, tables, forms and statistical graphs. This enables users to see the communication between faculties and digital library servers in a dynamic mind map: faculties and digital library servers are shown as nodes, while edges between nodes represent their communication. The size of the node illustrates the amount of data transferred to and from the node. The thickness of the edge shows the amount of data transferred during the communication between two nodes. The values are relative to the time period a user selects for analysis. A user can interactively open details about each node and edge and see detailed usage graphs for faculties or digital libraries.
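
As a rough illustration of how node sizes and edge thicknesses might be derived from the transferred data, the following Python sketch scales both linearly against the largest value in the selected time period. The linear scaling, the function name `build_graph` and the size range are assumptions; the article does not state which mapping the desktop application actually uses. The sketch also assumes that faculty and library names are distinct.

```python
from collections import defaultdict


def build_graph(edge_bytes, min_size=1.0, max_size=10.0):
    """Turn per-edge byte counts into relative node sizes and edge widths.

    edge_bytes maps (faculty, library) to the bytes transferred between
    them during the time period selected by the user.
    """
    # A node's volume is the sum of the data on all edges touching it.
    node_bytes = defaultdict(int)
    for (faculty, library), amount in edge_bytes.items():
        node_bytes[faculty] += amount
        node_bytes[library] += amount

    def scale(value, largest):
        # Linear scaling relative to the largest value in the period.
        if largest == 0:
            return min_size
        return min_size + (max_size - min_size) * value / largest

    largest_node = max(node_bytes.values(), default=0)
    largest_edge = max(edge_bytes.values(), default=0)
    node_sizes = {name: scale(v, largest_node) for name, v in node_bytes.items()}
    edge_widths = {edge: scale(v, largest_edge) for edge, v in edge_bytes.items()}
    return node_sizes, edge_widths
```

Because every value is scaled against the maximum for the chosen period, selecting a different period changes the sizes and thicknesses even if the absolute traffic of a given faculty stays the same.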

This work is just the beginning. We look forward to analysing usage data over longer periods, seeing trends and anomalies, and thoroughly evaluating the use of digital libraries at Masaryk University. In the future, we also plan to enrich the application with other data sources like the university information system or to include reports from digital library providers.

MyLibScope is the result of a broader project that aims to improve work with digital libraries for both end users and library administrators. The internal Masaryk University project 'Digital Libraries at Masaryk University' started at the beginning of 2007. MyLibScope was developed and deployed at the end of 2008 and will run in a pilot phase from the beginning of 2009. The application uses hardware probes by INVEA-TECH a.s. and is built on software technology by Mycroft Mind a.s.

Please contact:
Jiří Šmerda
Institute of Computer Science, Masaryk University, Czech Republic
Tel: +420 549497676
E-mail: smerda@ics.muni.cz
