by Vladimir Samsonov (Cybernetics Lab IMA & RWTH Aachen University), Mohamed Behery and Gerhard Lakemeyer (RWTH Aachen University)
Continually refined and adjusted methods for production planning are among the cornerstones of manufacturing excellence. However, the heuristic and metaheuristic methods developed for these tasks are often hard to deploy, or yield suboptimal results under the constantly changing conditions and short response times of modern production planning. Within the DFG-funded Cluster of Excellence “Internet of Production”, a team of researchers from RWTH Aachen University is investigating the use of novel deep learning algorithms to facilitate complex decision-making processes along the manufacturing chain.
Modern manufacturing is a highly complex and dynamic international “ecosystem”. Every company involved has to interact with multiple players and fulfil numerous requirements and constraints while compromising between opposing goals. The main task of production planning is to reach and maintain a balance between these constantly changing factors.
Assigning orders to production machines in a way that ensures high machine utilisation, does not exceed available production capacity, minimises capital costs, and meets delivery dates is a challenging combinatorial task. The size of the solution space grows exponentially with the number of orders to be manufactured and quickly surpasses human capabilities. Time limitations add a further dimension of complexity in the case of short-term planning. Events such as machine breakdowns, changes in production priorities, material availability, or personnel shortages can hardly be foreseen and may require a quick revision of the entire production plan. This calls for approaches that can find solutions to complex combinatorial tasks fulfilling multiple goals and constraints within the short time window available for the decision.
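To give a feel for this growth, the following minimal sketch counts the number of distinct schedules when n orders are split into sequence-sensitive queues on m machines. The counting formula (n + m − 1)! / (m − 1)! for distributing n distinct orders into m ordered queues is a standard combinatorial identity, not taken from the article itself:

```python
from math import factorial

def schedule_count(n_orders: int, n_machines: int) -> int:
    # Number of ways to split n distinct orders into m ordered
    # machine queues: (n + m - 1)! / (m - 1)!
    return factorial(n_orders + n_machines - 1) // factorial(n_machines - 1)

for n in (10, 12, 15):
    print(f"{n}x3: {schedule_count(n, 3):,} possible schedules")
```

Already at fifteen orders on three machines, the count exceeds 10^14 candidate schedules, which is why exhaustive search is out of the question under short-term planning deadlines.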
Deep learning has demonstrated a number of successes on complex tasks with large solution spaces, such as defeating the world champion at the game of Go [1], and has given rise to the new field of “neural combinatorial optimization”, which addresses combinatorial tasks [2]. We adopt a reinforcement learning approach [3] to the task of weekly production scheduling for foil extrusion. In this setting, in addition to the aforementioned constraints, the setup waste depends heavily on the production sequence.
We use historical production data and machine learning to learn the complex setup dependencies between thousands of foil types. A trained regression model approximates the setup waste for previously unseen product combinations and serves as the cost function while building a new production schedule. As part of the validation, the dependencies learned by the regression model are extracted with the help of machine learning interpretability methods and checked against expert knowledge. The trained reinforcement learning agent is benchmarked against two established solvers, Gurobi and Google OR-Tools, which are based on exact and metaheuristic approaches respectively. Our first validation and comparison runs cover 2,000 different production scenarios of relatively small problem sizes. They yield insights into the strengths and weaknesses of the scheduling methods involved. Figure 1 shows the average resulting setup waste for exemplary planning tasks involving three extrusion machines and ten, twelve, and fifteen orders (referred to as 10x3, 12x3, and 15x3). Figure 2 shows the time each approach requires to find a scheduling solution.
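The interplay between the learned cost function and schedule construction can be sketched as follows. This is a toy illustration, not the authors' implementation: the setup-waste predictor is a stand-in for the trained regression model, the foil names and cost values are invented, and the one-step greedy placement is only a baseline that a reinforcement learning policy would improve on by looking beyond the immediate next decision:

```python
def predicted_setup_waste(prev_foil, next_foil):
    # Stand-in for the trained regression model: a toy rule that
    # charges a fixed changeover cost when the foil type switches.
    if prev_foil is None:        # first order on a machine: no changeover
        return 0.0
    return 0.0 if prev_foil == next_foil else 5.0

def schedule_greedy(orders, n_machines):
    """Place each order on the machine where the predicted setup
    waste is lowest. An RL agent instead learns a policy whose
    placements also account for future changeovers."""
    last = [None] * n_machines               # last foil type per machine
    plan = [[] for _ in range(n_machines)]   # production queue per machine
    total_waste = 0.0
    for foil in orders:
        costs = [predicted_setup_waste(last[m], foil) for m in range(n_machines)]
        m = costs.index(min(costs))
        plan[m].append(foil)
        last[m] = foil
        total_waste += costs[m]
    return plan, total_waste

plan, waste = schedule_greedy(["A", "A", "B", "B", "A", "C"], 3)
print(plan, waste)
```

The same structure carries over to the RL formulation: the state is the set of remaining orders plus the last foil per machine, an action places one order on one machine, and the negative predicted setup waste serves as the reward signal.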
Figure 1: Average setup waste [kg]; lower is better.
Figure 2: Average scheduling time [mins]; lower is better.
For the considered production scenarios, the metaheuristic and exact solvers find solutions that are 1.13% to 10% closer to the optimum than those of our approach. Nevertheless, our approach solves each problem instance at least 20 times faster. On the 15x3 scheduling task, for example, the exact and metaheuristic methods take 54 and 57 minutes respectively, while the reinforcement learning agent solves the same task in under 2.3 minutes. This is a crucial advantage when reacting to an unexpected production deviation.
To conclude, our results encourage the use of reinforcement learning for short-term production planning and scheduling. Future work involves closing the demonstrated optimality gap, extending neural combinatorial optimisation methods to multi-step job shop production environments, and addressing the continuous learning, validation, safety, and decision transparency of trained reinforcement learning agents for deployment in real manufacturing environments.
[1] D. Silver, et al.: “Mastering the game of Go without human knowledge”, Nature, 2017.
[2] I. Bello, et al.: “Neural combinatorial optimization with reinforcement learning”, arXiv preprint arXiv:1611.09940, 2016.
[3] M. Nazari, et al.: “Reinforcement learning for solving the vehicle routing problem”, NIPS, 2018.
Vladimir Samsonov, Cybernetics Lab IMA & IfU, RWTH Aachen University
Gerhard Lakemeyer, KBSG, RWTH Aachen University