Deep Q-Networks for Minimizing Total Tardiness on a Single Machine
This paper considers the single-machine scheduling problem of total tardiness minimization. Due to its computational intractability, exact approaches such as dynamic programming algorithms and branch-and-bound algorithms struggle to produce optimal solutions for large-scale instances in a reasonable time.
Main Authors: | Kuan Wei Huang, Bertrand M. T. Lin |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-12-01 |
Series: | Mathematics |
Subjects: | single-machine scheduling; total tardiness; machine learning; deep Q-networks; heuristic rules; short-term rewards |
Online Access: | https://www.mdpi.com/2227-7390/13/1/62 |
_version_ | 1841549184483721216 |
---|---|
author | Kuan Wei Huang; Bertrand M. T. Lin |
author_facet | Kuan Wei Huang; Bertrand M. T. Lin |
author_sort | Kuan Wei Huang |
collection | DOAJ |
description | This paper considers the single-machine scheduling problem of total tardiness minimization. Due to its computational intractability, exact approaches such as dynamic programming algorithms and branch-and-bound algorithms struggle to produce optimal solutions for large-scale instances in a reasonable time. The advent of Deep Q-Networks (DQNs) within the reinforcement learning paradigm offers a viable way to transcend these limitations, providing a robust and adaptive alternative. This study introduces a novel approach that utilizes DQNs to model the complexities of job scheduling for minimizing tardiness through informed, look-ahead-based selection of actions within a defined state space. The framework incorporates seven distinct reward-shaping strategies, among which the Minimum Estimated Future Tardiness strategy notably enhances the DQN model’s performance. Specifically, it achieves an average improvement of 14.33% over Earliest Due Date (EDD), 11.90% over Shortest Processing Time (SPT), 17.65% over Least Slack First (LSF), and 8.86% over Apparent Tardiness Cost (ATC). The Number of Delayed Jobs strategy, by contrast, secures an average improvement of 11.56% over EDD, 9.10% over SPT, 15.01% over LSF, and 5.99% over ATC, while requiring minimal computational resources. The results of a computational study demonstrate the DQN’s strong performance relative to traditional heuristics, underscoring the capacity of advanced machine learning techniques to improve industrial scheduling processes and operational efficiency. |
format | Article |
id | doaj-art-e4b70ba8a5e94fb293c34f7b59f4de92 |
institution | Kabale University |
issn | 2227-7390 |
language | English |
publishDate | 2024-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Mathematics |
spelling | doaj-art-e4b70ba8a5e94fb293c34f7b59f4de92 | 2025-01-10T13:18:08Z | eng | MDPI AG | Mathematics | 2227-7390 | 2024-12-01 | 13(1), 62 | 10.3390/math13010062 | Deep Q-Networks for Minimizing Total Tardiness on a Single Machine | Kuan Wei Huang; Bertrand M. T. Lin (both: Institute of Information Management, Institute of Hospital and Health Care Administration, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan) | https://www.mdpi.com/2227-7390/13/1/62 | single-machine scheduling; total tardiness; machine learning; deep Q-networks; heuristic rules; short-term rewards |
spellingShingle | Kuan Wei Huang; Bertrand M. T. Lin | Deep Q-Networks for Minimizing Total Tardiness on a Single Machine | Mathematics | single-machine scheduling; total tardiness; machine learning; deep Q-networks; heuristic rules; short-term rewards |
title | Deep Q-Networks for Minimizing Total Tardiness on a Single Machine |
title_full | Deep Q-Networks for Minimizing Total Tardiness on a Single Machine |
title_fullStr | Deep Q-Networks for Minimizing Total Tardiness on a Single Machine |
title_full_unstemmed | Deep Q-Networks for Minimizing Total Tardiness on a Single Machine |
title_short | Deep Q-Networks for Minimizing Total Tardiness on a Single Machine |
title_sort | deep q networks for minimizing total tardiness on a single machine |
topic | single-machine scheduling; total tardiness; machine learning; deep Q-networks; heuristic rules; short-term rewards |
url | https://www.mdpi.com/2227-7390/13/1/62 |
work_keys_str_mv | AT kuanweihuang deepqnetworksforminimizingtotaltardinessonasinglemachine AT bertrandmtlin deepqnetworksforminimizingtotaltardinessonasinglemachine |
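The abstract benchmarks the DQN against four classic dispatching rules (EDD, SPT, LSF, ATC). As an illustrative sketch only — not the paper's code — the standard formulations of these baselines and the total-tardiness objective can be written as follows; the toy job data, function names, and the ATC scaling parameter `k` are assumptions for illustration:

```python
import math

def total_tardiness(jobs, order):
    """Total tardiness = sum of max(0, C_j - d_j), with C_j the completion time."""
    t, tard = 0, 0
    for i in order:
        t += jobs[i][0]                 # add processing time -> completion time
        tard += max(0, t - jobs[i][1])  # lateness beyond the due date counts
    return tard

def edd(jobs):
    # Earliest Due Date: schedule jobs in nondecreasing due-date order.
    return sorted(range(len(jobs)), key=lambda i: jobs[i][1])

def spt(jobs):
    # Shortest Processing Time: schedule shortest jobs first.
    return sorted(range(len(jobs)), key=lambda i: jobs[i][0])

def lsf(jobs):
    # Least Slack First (static variant): sort by due date minus processing time.
    return sorted(range(len(jobs)), key=lambda i: jobs[i][1] - jobs[i][0])

def atc(jobs, k=2.0):
    # Apparent Tardiness Cost: at each decision epoch pick the job maximizing
    # exp(-max(0, d_j - t - p_j) / (k * p_bar)) / p_j, where p_bar is the mean
    # processing time of the remaining jobs. k = 2.0 is an illustrative choice.
    remaining = set(range(len(jobs)))
    t, order = 0, []
    while remaining:
        p_bar = sum(jobs[i][0] for i in remaining) / len(remaining)
        best = max(remaining,
                   key=lambda i: math.exp(-max(0, jobs[i][1] - t - jobs[i][0])
                                          / (k * p_bar)) / jobs[i][0])
        order.append(best)
        t += jobs[best][0]
        remaining.remove(best)
    return order

jobs = [(4, 5), (2, 7), (6, 9), (3, 6)]  # (processing time, due date)
for name, rule in [("EDD", edd), ("SPT", spt), ("LSF", lsf), ("ATC", atc)]:
    print(name, total_tardiness(jobs, rule(jobs)))
```

On this toy instance the rules already disagree (EDD yields total tardiness 9, SPT and ATC 10, LSF 16), which is the gap the paper's DQN policies aim to close through learned, look-ahead action selection.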