by András Micsik, Péter Pallinger, László Kovács and András Benczúr
The KOPI Online Plagiarism Search Portal announced cross-language (translational) plagiarism detection in 2011. This new feature requires more data-intensive processing than the traditional monolingual plagiarism search offered since 2004. As the service suffered from performance problems, we sought a platform on which to try and test automated scaling solutions for it. Within a BonFIRE open call experiment we successfully implemented and analysed elastic scaling solutions for KOPI.
BonFIRE is an EU-funded Future Internet project with the participation of several major European research clouds. BonFIRE offers a multi-site testbed with heterogeneous compute, storage and networking resources in a cloud federation for large-scale testing of applications, services and systems, targeting the Internet of Services community. BonFIRE provides an API and a portal, both supporting the uniform management of compute nodes, data blocks and network connections across the federated environment of seven clouds. Specific features of BonFIRE include network bandwidth control, Amazon integration, ubiquitous monitoring and other practical add-ons to regular cloud infrastructures.
The KOPI service works asynchronously: it accepts requests in the form of uploaded documents, which are checked for copied content against various databases. A report is then sent to the user listing the copied parts and their original sources. Processing of incoming requests is based on a queue located on the KOPI Frontend (Figure 1), from which KOPI Engines take requests and to which they return results after processing. For each document, a KOPI Engine submits hundreds of search requests to a Fulltext Search Cluster; processing a single document typically takes 30-50 minutes.
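As a rough illustration, an engine's processing loop can be sketched as follows; the queue and cluster interfaces, the fixed-size chunking and the report format are simplified assumptions rather than the actual implementation.

  # Sketch of a KOPI Engine worker loop (queue and cluster interfaces are illustrative).
  def run_engine(frontend_queue, search_cluster)
    loop do
      request   = frontend_queue.pop                            # wait for the next uploaded document
      fragments = request.document.scan(/.{1,500}/m)            # placeholder chunking into searchable pieces
      hits      = fragments.map { |f| search_cluster.query(f) } # hundreds of searches per document
      report    = { id: request.id, matches: hits.compact }     # copied parts and their original sources
      frontend_queue.push_result(report)                        # hand the report back to the Frontend
    end
  end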
Within the experiment, called KOPFire, we created a realistic test version of the KOPI service consisting of four different virtual machine (VM) types, together with suitable test data including a test index, test requests and usage patterns. We partitioned the fulltext index and developed an aggregator for the search results. The search cluster typically used 6-11 VMs with 11-21 cores.
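The aggregator essentially fans each query out to the index partitions and merges the returned hits. A minimal sketch, assuming each partition exposes a search method and returns scored hits, could be:

  # Hypothetical aggregator: query all index partitions in parallel and merge the hits.
  def aggregate_search(partitions, query)
    partitions
      .map { |p| Thread.new { p.search(query) } }   # one search thread per partition
      .flat_map(&:value)                            # wait for and collect the hit lists
      .sort_by { |hit| -hit[:score] }               # rank the merged results by score
  end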
Our main measurements examined the effect of increasing the throughput of individual service components. We also collected the possible atomic scaling actions together with the time needed to perform each of them. As a result, we could determine the optimal ratio of system components and the most suitable configuration options.
We measured performance in document characters processed per second (cps). Over a longer period, the average cps characterizes the processing speed of the service, so the size of the document queue and the current processing speed can be used to trigger scaling actions.
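For example, a simple trigger can compare the cps required to clear the current backlog within a target time against the measured cps; the target time and helper names in the sketch below are illustrative assumptions, not the exact rule we used.

  # Illustrative scaling trigger based on the cps metric (names and target time are assumptions).
  def scale_up_needed?(queued_documents, current_cps, target_seconds = 3600)
    pending_chars = queued_documents.sum { |doc| doc.length }   # total characters waiting in the queue
    required_cps  = pending_chars.to_f / target_seconds         # speed needed to clear the backlog in time
    required_cps > current_cps
  end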
We created an automated scaling solution that can speed up or slow down document processing depending on the number and size of documents waiting in the queue. When we run out of capacity in one cloud, we can expand to other clouds in the federation provided by BonFIRE.
The scaling solution is based on a set of Ruby scripts that use the BonFIRE API to manage cloud resources and BonFIRE Monitoring to collect measurements about the virtual machines. The monitored data include both built-in metrics and our own additional metrics. The scaling script detects situations in which scaling is beneficial, selects and executes appropriate scaling actions, and ensures that new components are properly configured to cooperate with the existing ones.
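In outline, the script is a periodic control loop of the following form; the bonfire and monitor objects stand in for our wrappers around the BonFIRE API and BonFIRE Monitoring, and all method names here are illustrative rather than the real API.

  # Sketch of the scaling control loop (wrapper objects and method names are assumptions).
  loop do
    metrics = monitor.collect             # built-in and custom metrics for every VM
    action  = algorithm.decide(metrics)   # pluggable policy: :add_engine, :remove_engine or :noop
    case action
    when :add_engine
      vm = bonfire.create_compute(type: 'kopi-engine')      # request a new VM in the federation
      configure_engine(vm, frontend: kopi_frontend)         # register it with the Frontend queue
    when :remove_engine
      bonfire.delete_compute(least_loaded_engine(metrics))  # release capacity that is no longer needed
    end
    sleep 60
  end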
Various algorithms can be plugged into the scaling script. We implemented and compared several algorithms using greedy, lazy or speed-oriented adaptation. Samples taken from the usage statistics of the real service were used to test and tune the scaling algorithms. Although there is no single best algorithm for all usage patterns, most of them raise the throughput of the service to an acceptable level, and during the experiment we could scale the performance of the service over a range of roughly 1:25.
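To give a flavour of how such plug-ins differ, two illustrative policies are sketched below; the class names and thresholds are invented for the example and do not reproduce our actual algorithms.

  # Two illustrative plug-in policies (thresholds chosen only for the example).
  class GreedyPolicy
    # Adds an engine as soon as demand exceeds the current speed.
    def decide(m)
      m[:required_cps] > m[:current_cps] ? :add_engine : :noop
    end
  end

  class LazyPolicy
    # Tolerates short bursts and reacts only to a clear imbalance.
    def decide(m)
      return :add_engine    if m[:required_cps] > 1.5 * m[:current_cps]
      return :remove_engine if m[:required_cps] < 0.5 * m[:current_cps]
      :noop
    end
  end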
Furthermore, the scaling script was extended to continuously check the state of service components and to replace failed ones. As a side effect of elastic scaling, we thus also gain improved fault tolerance for the service.
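Conceptually, this amounts to a periodic health check within the same control loop, along the lines of the sketch below (again using assumed wrapper names rather than the real API).

  # Illustrative health check and replacement of failed engines (wrapper names are assumptions).
  engines.each do |vm|
    next if monitor.alive?(vm)                               # heartbeat metrics still arriving for this VM?
    bonfire.delete_compute(vm)                               # tear down the failed instance
    replacement = bonfire.create_compute(type: 'kopi-engine')
    configure_engine(replacement, frontend: kopi_frontend)   # wire it into the service again
  end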
These experiments helped us immensely in finding an appropriate scaling solution, which ultimately enables us to provide a faster and more reliable service to our growing user community.
Links:
BonFIRE project home page: http://www.bonfire-project.eu/
KOPI home page: http://kopi.sztaki.hu/?lang=eng
Please contact:
András Micsik
SZTAKI, Hungary
Tel: +36 1 279 6248
E-mail: