by Gianluigi Folino, Massimo Guarascio, Luigi Pontieri and Paolo Zicari (CNR-ICAR)
Embedding intelligence and explainable tools in the new generation of ticket-management systems is crucial for supporting customer-support activities. To this end, we defined a comprehensive ticket-classification framework that integrates deep ensemble methods and AI-based interpretation techniques, helping both the operator to identify misclassification errors and the analyst to improve the model. Tests on real data demonstrate the quality of the predictions returned by the framework and the practical value of their associated explanations.
Ticket-management systems (TMS) are now widely used to improve the organisation, efficiency and effectiveness of customer support, with a significant impact on costs and revenues, customer retention and public brand image. Equipping these systems with intelligent tools that provide reliable classifications and speed up ticket assignment is a challenging task for both industrial and academic research, as it requires coping with several issues (e.g. data scarcity, noisy and skewed data, and the difficulty of processing natural language).
Customers open tickets through different channels (e.g. phone calls, emails, web forms, live chats and, more recently, social media such as Facebook and Twitter). Tickets can then be routed differently depending on their properties, such as urgency, impact, specific area of interest or resource-allocation scheme. Natural language processing and machine learning techniques have been widely used in the literature to develop automatic ticket-classification approaches and boost the capacity of customer-support systems.
In this respect, deep learning (DL) is used effectively and efficiently to process text data. Despite the great potential of DL-based text classifiers, the performance of current solutions can be affected by several challenging issues that frequently occur in real-life applications. First, large corpora of labelled data are usually necessary to adequately train a deep model (the deeper and more complex the model, the more example data are needed). Second, configuring the topology and hyper-parameters of a deep neural network (DNN) architecture is a difficult task that entails long and careful design and tuning activities to make the DNN perform well. Finally, training data typically exhibit an unbalanced distribution, that is, some classes are more frequent than others (e.g. in the TMS scenario, tickets with high urgency are less frequent than those with lower urgency). These issues increase the risk of learning DNN-based classifiers that overfit the training data and rely on non-general, biased and unreliable classification patterns hinging on spurious features. Moreover, the black-box nature of a DNN model makes it hard to understand which features of a data instance drove the model to its classification decision.
To cope with these issues, we have defined a ticket-classification framework based on ensemble DNN classifiers, leveraging different types of neural architectures (LSTM, CNN, GRU and transformers) as base classifiers to promote diversity, expressiveness, and robustness to overfitting and class-imbalance risks. The framework introduces two novel ensemble combination strategies based on stacking and mixture-of-experts (MoE) architectures, both leveraging an ad hoc sub-net, named High-Level Feature Extractor (HLFE), which extracts compressed text representations (latent space). The explanation module shown in Figure 1 is integrated within the framework to support a continuous, human-in-the-loop scheme for discovering, using, validating and improving the ensemble DL model.
Figure 1: The Human-in-the-loop scheme of the proposed intelligent ticket classification framework.
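As a rough illustration of the mixture-of-experts idea mentioned above, the sketch below combines the class distributions returned by several base classifiers using softmax-normalised gate scores. It is a minimal sketch under stated assumptions, not the framework's implementation: the function names, the probabilities and the gate scores are all hypothetical, and how the actual gate uses the HLFE features is not described in this article.

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_combine(expert_probs, gate_scores):
    """Weighted average of the experts' class distributions.

    expert_probs: one probability vector per base classifier.
    gate_scores:  one raw score per base classifier (hypothetical gate output).
    """
    weights = softmax(gate_scores)
    n_classes = len(expert_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(n_classes)
    ]

# Hypothetical outputs of four base classifiers for one ticket, three classes.
expert_probs = [
    [0.7, 0.2, 0.1],  # e.g. an LSTM-based classifier
    [0.6, 0.3, 0.1],  # e.g. a CNN-based classifier
    [0.2, 0.5, 0.3],  # e.g. a GRU-based classifier
    [0.8, 0.1, 0.1],  # e.g. a transformer-based classifier
]
# Hypothetical gate scores; assumed here to come from a gating sub-net.
gate_scores = [1.0, 0.5, -1.0, 2.0]

combined = moe_combine(expert_probs, gate_scores)
predicted_class = max(range(len(combined)), key=combined.__getitem__)
```

Because the gate weights sum to one, the combined vector is again a probability distribution, and the experts the gate trusts most for a given ticket dominate the final decision.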
In more detail, two kinds of explanation artefacts are provided. The former, based on the post-hoc explanation algorithm LIME, extracts the subset of terms that most likely drove the model's prediction. The latter yields per-class word-cloud representations, obtained by computing the distributions of words among the neighbours of the test instance that belong to each class.
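To give a feel for term-level explanations, the sketch below scores each word of a ticket by how much the predicted probability drops when that word is removed. This is a simple leave-one-word-out (occlusion) procedure, deliberately simpler than LIME, which instead fits a local surrogate model on many random perturbations. The scoring function `toy_urgency_score` and its term weights are hypothetical stand-ins for the trained ensemble.

```python
def toy_urgency_score(text):
    """Hypothetical stand-in for the model's probability of the critical class."""
    urgent_terms = {"server": 0.4, "connection": 0.3, "password": 0.2}
    words = text.split()
    score = 0.05 + sum(w for term, w in urgent_terms.items() if term in words)
    return min(1.0, score)

def occlusion_importance(text, score_fn):
    """Return (word, drop in score when the word is removed) for each word."""
    words = text.split()
    base = score_fn(text)
    result = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        result.append((word, base - score_fn(reduced)))
    return result

ticket = "the server connection is lost"
importance = occlusion_importance(ticket, toy_urgency_score)
# Words sorted from most to least influential for the prediction.
ranked = sorted(importance, key=lambda pair: pair[1], reverse=True)
```

Words whose removal changes the score the most are the ones an operator would expect to see highlighted in the ticket text.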
Tests assessed the effectiveness and usefulness of the proposed framework on two real-life, unbalanced ticket datasets from customer-support companies. The first dataset is a phone-company customer-care ticket collection in the form of SMS messages and short Facebook chat texts. The second is a publicly available customer-support ticket dataset containing about 50,000 tickets submitted via email by the customers of the Endava company to the helpdesk. Comparison tests showed clear improvements in terms of the F1, AUC and G-measure metrics for the novel deep ensemble approach, both over the baseline DNNs and over other state-of-the-art machine learning and ensemble-based methods (i.e. Gradient Boosting, Random Forest). For all the competitors in the comparison, the implementations available in the popular machine learning library scikit-learn were adopted, with settings chosen through a grid-search procedure.
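The sketch below shows why metrics such as F1 and G-measure are more informative than accuracy on unbalanced ticket data. It assumes that G-measure denotes the geometric mean of precision and recall; the article does not define it, and the geometric mean of sensitivity and specificity is another common reading. The labels and predictions are made up for illustration.

```python
import math

def precision_recall(y_true, y_pred, positive):
    """Precision and recall of one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def g_measure(precision, recall):
    """Geometric mean of precision and recall (assumed definition)."""
    return math.sqrt(precision * recall)

# Made-up example: 8 low-urgency tickets and only 2 critical ones.
y_true = ["low"] * 8 + ["critical"] * 2
y_pred = ["low"] * 7 + ["critical"] + ["critical", "low"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
p, r = precision_recall(y_true, y_pred, positive="critical")
f1_critical = f1(p, r)
g_critical = g_measure(p, r)
```

In this toy case the accuracy is 0.8, yet only half of the critical tickets are recovered, which the F1 and G-measure values of the minority class make visible.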
The proposed explanation workflow was validated through several tests that produced, for each ticket, the LIME representation and the word cloud of its k nearest neighbour tickets, found by using the cosine similarity in the latent-space representation of the tickets. Figure 2 shows an explanation obtained with our LIME-based procedure (on the left) and with the word cloud (on the right) for a ticket classified in the critical (highly urgent) class. The terms that LIME deems most influential for this prediction (highlighted in blue in the ticket message) confirm that the model focuses on concepts (namely: server, connection, password, tester, sent) that are genuinely relevant for deciding the class of the ticket. The relevance of these terms was confirmed by the word cloud derived from the k = 40 nearest neighbours.
Figure 2: The explanation output (left: LIME-based explanation; right: word cloud).
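The neighbour-based word clouds can be approximated as follows: rank the labelled tickets by cosine similarity to the query ticket in the latent space, keep the top k, and count words separately for each class. This is a minimal sketch; the two-dimensional vectors, texts and labels are invented, and in the framework the latent vectors would come from the HLFE sub-net rather than being written by hand.

```python
import math
from collections import Counter, defaultdict

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def per_class_word_counts(query_vec, tickets, k):
    """Count words per class among the k tickets most similar to the query.

    tickets: list of (latent_vector, text, label) triples.
    """
    ranked = sorted(tickets, key=lambda t: cosine(query_vec, t[0]), reverse=True)
    counts = defaultdict(Counter)
    for _, text, label in ranked[:k]:
        counts[label].update(text.lower().split())
    return counts

# Invented labelled tickets with hand-written 2-D latent vectors.
tickets = [
    ([0.9, 0.1], "server connection down", "critical"),
    ([0.8, 0.2], "server password reset failed", "critical"),
    ([0.1, 0.9], "invoice copy request", "low"),
    ([0.7, 0.4], "question about server invoice", "low"),
]
counts = per_class_word_counts([1.0, 0.0], tickets, k=3)
top_critical = counts["critical"].most_common(1)[0][0]
```

The per-class counters hold the word frequencies from which a word cloud for each class could be rendered.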
Moreover, the k-nearest-neighbour-based explanation helps customer-support operators explain classification decisions by showing similar labelled examples. Taken together, these explanation artefacts support the model's prediction for the considered ticket. The explanation-quality tests demonstrated the practical added value of the proposed artefacts, which form a toolset of interpretation methods that helps the operator recognise misclassification errors and the analyst improve and fine-tune the model.
This work is a result of research activities conducted in the context of the H2020 project “HumanE-AI-Net” [L1], funded by the European Commission (grant no. 952026).
S. Minaee et al., “Deep learning-based text classification: a comprehensive review,” ACM Computing Surveys, vol. 54, no. 3, pp. 62:1–62:40, 2022.
P. Zicari et al., “Discovering accurate deep learning based predictive models for automatic customer support ticket classification,” in Proc. ACM SAC, 2021, pp. 1098–1101. Available: https://doi.org/10.1145/3412841.3442109
P. Zicari et al., “Combining deep ensemble learning and explanation for intelligent ticket management,” Expert Systems with Applications, vol. 206, 2022. Available: https://doi.org/10.1016/j.eswa.2022.117815
Gianluigi Folino, CNR-ICAR, Italy