Deep Q-Networks for Minimizing Total Tardiness on a Single Machine

Bibliographic Details
Main Authors: Kuan Wei Huang, Bertrand M. T. Lin
Format: Article
Language: English
Published: MDPI AG, 2024-12-01
Series: Mathematics
Subjects: single-machine scheduling; total tardiness; machine learning; deep Q-networks; heuristic rules; short-term rewards
Online Access: https://www.mdpi.com/2227-7390/13/1/62
author Kuan Wei Huang
Bertrand M. T. Lin
collection DOAJ
description This paper considers the single-machine scheduling problem of minimizing total tardiness. Because the problem is computationally intractable, exact approaches such as dynamic programming and branch-and-bound algorithms struggle to produce optimal solutions for large-scale instances in reasonable time. Deep Q-Networks (DQNs), within the reinforcement learning paradigm, offer a robust and adaptive way to transcend these limitations. This study introduces a novel DQN-based approach that models the complexities of job scheduling for tardiness minimization through informed action selection, using look-ahead mechanisms within a defined state space. The framework incorporates seven distinct reward-shaping strategies, among which the Minimum Estimated Future Tardiness strategy most notably enhances the DQN model’s performance: it achieves an average improvement of 14.33% over Earliest Due Date (EDD), 11.90% over Shortest Processing Time (SPT), 17.65% over Least Slack First (LSF), and 8.86% over Apparent Tardiness Cost (ATC). The Number of Delayed Jobs strategy secures an average improvement of 11.56% over EDD, 9.10% over SPT, 15.01% over LSF, and 5.99% over ATC, while requiring minimal computational resources. A computational study demonstrates the DQN’s strong performance relative to traditional heuristics, underscoring the capacity of advanced machine learning techniques to improve industrial scheduling processes and operational efficiency.
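For readers unfamiliar with the baselines named in the abstract, the total tardiness objective and two of the dispatching rules (EDD and SPT) can be sketched as follows. This is an illustrative sketch only, not code from the paper; the job data are invented for demonstration.

```python
# Illustrative sketch (not from the paper): total tardiness of a schedule,
# plus the EDD and SPT dispatching rules used as comparison baselines.

def total_tardiness(jobs):
    """jobs: list of (processing_time, due_date) tuples in processing order."""
    t, tardiness = 0, 0
    for p, d in jobs:
        t += p                       # completion time C_j of this job
        tardiness += max(0, t - d)   # tardiness T_j = max(0, C_j - d_j)
    return tardiness

def edd(jobs):
    """Earliest Due Date: process jobs in nondecreasing due-date order."""
    return sorted(jobs, key=lambda j: j[1])

def spt(jobs):
    """Shortest Processing Time: process jobs in nondecreasing time order."""
    return sorted(jobs, key=lambda j: j[0])

jobs = [(5, 5), (2, 9), (3, 4)]
print(total_tardiness(edd(jobs)), total_tardiness(spt(jobs)))  # prints: 4 6
```

On this toy instance EDD beats SPT; the paper's point is that a learned DQN policy can outperform such fixed rules on average.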
id doaj-art-e4b70ba8a5e94fb293c34f7b59f4de92
institution Kabale University
issn 2227-7390
doi 10.3390/math13010062 (Mathematics, vol. 13, no. 1, article 62, 2024-12-01)
affiliation Kuan Wei Huang: Institute of Information Management, Institute of Hospital and Health Care Administration, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
affiliation Bertrand M. T. Lin: Institute of Information Management, Institute of Hospital and Health Care Administration, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
title Deep Q-Networks for Minimizing Total Tardiness on a Single Machine
topic single-machine scheduling
total tardiness
machine learning
deep Q-networks
heuristic rules
short-term rewards
url https://www.mdpi.com/2227-7390/13/1/62