Code duplication is considered a widespread code smell, a symptom of bad code development practices or potential design issues. Code smells are also considered to be indicators of poor software maintainability. The refactoring cost associated with removing code clones can be very high, partly because of the number of different decisions that must be made regarding the kind of refactoring steps to apply. Here, we describe a tool that has been developed to suggest the best refactoring steps that could be taken to remove clones in Java code. Our approach is based on the classification of clones, in terms of their location in a class hierarchy, so that decisions can be made from a restricted set of refactorings that have been evaluated using multiple criteria.
Code clones [1] have been widely studied, and the literature on the topic is extensive. This work has identified a number of different clone types. Clones affect virtually all non-trivial software systems: the share of duplicated lines is usually estimated at between 5% and 20%, but can reach 50% in some cases [2]. Many studies have investigated the factors that lead to clones being introduced, and their results have enabled several detection criteria and techniques to be developed.
When addressing the issue of duplicated code management, we have to consider the following aspects:
- which instances are worth refactoring and which are not; and
- once an instance has been judged worth refactoring, which technique should be applied to remove the duplication.
Refactoring duplicated code is a task in which code fragments are merged or moved to other locations, for example other functions, methods or classes. Moving code means that the computational logic belonging to a specific entity of the system is moved: it should be approached with caution as relocation can break the original design coherence, reducing cohesion and/or moving responsibilities to unsuitable entities. There are a number of refactoring techniques available, each having its own pros and cons in both design and lower-level aspects.
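As a small illustration of merging duplicated fragments, the sketch below shows a clone pair inside one class and the result of applying Extract Method [1]. The classes `ReportBefore` and `ReportAfter` and the helper `checkedSum` are hypothetical names invented for this example; they are not taken from DCRA or from any analysed system.

```java
// Before: both methods contain the same validate-and-sum fragment (a clone pair).
final class ReportBefore {
    static int total(int[] values) {
        int sum = 0;
        for (int v : values) {
            if (v < 0) throw new IllegalArgumentException("negative value");
            sum += v;
        }
        return sum;
    }

    static double average(int[] values) {
        int sum = 0;
        for (int v : values) {
            if (v < 0) throw new IllegalArgumentException("negative value");
            sum += v;
        }
        return values.length == 0 ? 0.0 : (double) sum / values.length;
    }
}

// After Extract Method: the duplicated fragment lives in one private helper.
final class ReportAfter {
    static int total(int[] values) {
        return checkedSum(values);
    }

    static double average(int[] values) {
        return values.length == 0 ? 0.0 : (double) checkedSum(values) / values.length;
    }

    // Hypothetical helper holding the formerly duplicated logic.
    private static int checkedSum(int[] values) {
        int sum = 0;
        for (int v : values) {
            if (v < 0) throw new IllegalArgumentException("negative value");
            sum += v;
        }
        return sum;
    }
}
```

Here the extracted code stays in the class that already owned it, so cohesion is not affected; the design risks described above arise when the shared fragment has to move to a different class.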
In this study, we propose an approach that automatically evaluates and selects suitable refactoring techniques based on the classification of the clones, thus reducing human involvement in the process. We focused on the following aspects:
- an analysis of the location of each clone pair resulting in a specific set of applicable refactoring techniques,
- the ranking of the applicable refactoring techniques based on a set of weighting criteria, and
- the aggregation of the critical clone information and best refactoring techniques, according to those numerical criteria.
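The ranking step listed above can be pictured as a weighted sum over a few numerical criteria. The following is a minimal sketch under stated assumptions, not DCRA's implementation: the class names `Candidate` and `RefactoringRanker`, the two criteria (a LOC reduction and an object-oriented quality score, both assumed to be normalised to [0, 1]), and the weights are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical candidate: one refactoring technique applicable to a clone pair.
final class Candidate {
    final String technique;
    final double locReduction; // assumed normalised to [0, 1]
    final double ooQuality;    // assumed normalised to [0, 1]

    Candidate(String technique, double locReduction, double ooQuality) {
        this.technique = technique;
        this.locReduction = locReduction;
        this.ooQuality = ooQuality;
    }

    // Weighted sum of the two criteria.
    double score(double locWeight, double qualityWeight) {
        return locWeight * locReduction + qualityWeight * ooQuality;
    }
}

final class RefactoringRanker {
    // Returns a new list ordered from the highest to the lowest weighted score.
    static List<Candidate> rank(List<Candidate> candidates,
                                double locWeight, double qualityWeight) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(
                (Candidate c) -> c.score(locWeight, qualityWeight)).reversed());
        return sorted;
    }
}
```

Calling `RefactoringRanker.rank(candidates, 0.5, 0.5)` would weight both criteria equally; changing the weights shifts the ranking towards smaller code or towards better use of object-oriented constructs.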
Figure 1: Duplicate code data flow through DCRA components.
In line with this vision, we developed a tool which suggests the ‘best’ refactoring techniques for code clones in Java and named it the Duplicated Code Refactoring Advisor (DCRA; Figure 1). The tool consists of four components, each designed with a specific goal. Every component enriches the information obtained on the duplicate code and the whole elaboration process identifies a suitable list of techniques that could be applied to the most problematic duplications. The four components are:
- the Clone Detector, which is an external tool for detecting clone pairs (we are currently using a well known tool called NiCad [3]);
- the Clone Detailer, which analyzes the Clone Detector output and characterises every clone, detailing information such as clone location, size and type;
- the Refactoring Advisor, which visits a decision tree to choose the possible refactoring techniques for each clone pair; this component makes its suggestions on the basis of the clone location and the variables contained in the clone; suggestions are then ranked according to several features, e.g., the resulting variation in Lines of Code (LOC) and an evaluation of the quality obtained by applying the technique, in terms of how well it exploits object-oriented programming constructs; and
- the Refactoring Advice Aggregator, which aggregates the available information on clones and refactoring techniques, groups it by class or package and then sorts it by refactoring significance or clone pair impact, thus providing a summary report that captures the most interesting information about clones, e.g., which clones are the largest and which should be the easiest (or most convenient) to remove.
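To make the Refactoring Advisor's location-based decision more concrete, here is a minimal sketch of how a clone pair's position in the class hierarchy might narrow the set of candidate refactorings. The location categories, the `identicalBodies` flag and the mapping itself are assumptions for illustration only; they follow Fowler's general advice on duplicated code [1] and do not reproduce DCRA's actual decision tree.

```java
import java.util.List;

// Hypothetical location categories for a clone pair; DCRA's own categories may differ.
enum CloneLocation { SAME_METHOD, SAME_CLASS, SIBLING_CLASSES, UNRELATED_CLASSES }

final class AdvisorSketch {
    // Returns alternative candidate refactorings (in no particular order)
    // for a clone pair found at the given location.
    static List<String> suggest(CloneLocation location, boolean identicalBodies) {
        switch (location) {
            case SAME_METHOD:
            case SAME_CLASS:
                // Both fragments belong to one class: keep the logic where it is.
                return List.of("Extract Method");
            case SIBLING_CLASSES:
                // Fragments in subclasses of a common parent can move upwards.
                return identicalBodies
                        ? List.of("Pull Up Method")
                        : List.of("Form Template Method");
            case UNRELATED_CLASSES:
                // No shared ancestor: a new home for the shared logic is needed.
                return List.of("Extract Class", "Extract Superclass");
            default:
                throw new IllegalArgumentException("Unknown location: " + location);
        }
    }
}
```

Restricting the candidates in this way keeps the later ranking step small: only the techniques that make sense for a given location need to be scored.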
In developing this approach, our dual aim was to filter out which clone pairs are worth refactoring and to suggest the best refactoring techniques for those clone pairs. We have provided an automated technique, based on a classification of code clones, for selecting the best refactoring techniques in a given situation. We tested the Clone Detailer module on 50 systems of the Qualitas Corpus (Tempero et al.) and validated all the modules of our DCRA tool on four systems of the same corpus. The tool suggested a successful refactoring in most cases.
Our aim is that DCRA will concretely reduce the human involvement currently required in duplicated code refactoring procedures and thus reduce the overall effort required from software developers.
References:
[1] M. Fowler: “Refactoring. Improving the Design of Existing Code”, Addison-Wesley, 1999
[2] M. F. Zibran, C. K. Roy: “The road to software clone management: A survey”, The Univ. of Saskatchewan, Dept. Computer Science, Tech. Rep. 2012-03, Feb. 2012, http://www.cs.usask.ca/documents/techreports/2012/TR-2012-03.pdf
[3] C. Roy, J. Cordy: “NICAD: Accurate detection of near-miss intentional clones using flexible pretty-printing and code normalization”, in proc. of ICPC 2008, Amsterdam, pp. 172–181.
Please contact:
Francesca Arcelli Fontana,
Marco Zanoni
University of Milano Bicocca, Italy