Generating Authentic Grounded Synthetic Maintenance Work Orders

Large language models (LLMs) are promising for generating synthetic technical data, particularly for industrial maintenance where real datasets are often limited and unbalanced. This study generates synthetic maintenance work orders (MWOs) that are grounded to accurately represent engineering knowle...

Bibliographic Details
Main Authors:	Allison Lau, Jadeyn Feng, Melinda Hodkiewicz, Caitlin Woods, Michael Stewart, Adriano Polpo
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Maintenance work orders; large language models; GPT; knowledge graphs; grounded synthetic data; synthetic data generation
Online Access:	https://ieeexplore.ieee.org/document/11124200/

_version_	1849223086678736896
author	Allison Lau, Jadeyn Feng, Melinda Hodkiewicz, Caitlin Woods, Michael Stewart, Adriano Polpo
collection	DOAJ
description	Large language models (LLMs) are promising for generating synthetic technical data, particularly for industrial maintenance, where real datasets are often limited and unbalanced. This study generates synthetic maintenance work orders (MWOs) that are both grounded, accurately representing engineering knowledge, and authentic, reflecting technician language, jargon, and abbreviations. First, we extracted valid engineering paths from a knowledge graph constructed using the MaintIE gold-annotated industrial MWO dataset. Each path encodes engineering knowledge as a triple. These paths are used to constrain the output of an LLM (GPT-4o mini) to generate grounded synthetic MWOs using few-shot prompting. The synthetic MWOs are made authentic by incorporating human-like elements, such as contractions, abbreviations, and typos. Evaluation results show that the synthetic data is 86% as natural and 95% as correct as real MWOs. Turing test experiments reveal that subject matter experts could distinguish real from synthetic data only 51% of the time while exhibiting near-zero agreement, indicating random guessing. Statistical hypothesis testing confirms the results from the Turing test. This research offers a generic approach to extracting legitimate paths from a knowledge graph to ensure that the synthetic data generated are grounded in engineering knowledge while reflecting the style and language of the technicians who write them. To enable replication and reuse, code, data and documentation are available at https://github.com/nlp-tlp/LLM-KG-Synthetic-MWO
format	Article
id	doaj-art-1e1e4441dabb47e3b5cf792675b85d80
institution	Kabale University
issn	2169-3536
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-1e1e4441dabb47e3b5cf792675b85d80; 2025-08-25T23:12:57Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 145888-145904; DOI 10.1109/ACCESS.2025.3598751; document 11124200
	Allison Lau; https://orcid.org/0009-0007-0817-8099; Department of Computer Science and Software Engineering, The University of Western Australia, Perth, WA, Australia
	Jadeyn Feng; https://orcid.org/0009-0007-5591-153X; Department of Computer Science and Software Engineering, The University of Western Australia, Perth, WA, Australia
	Melinda Hodkiewicz; https://orcid.org/0000-0002-7336-3932; School of Engineering, The University of Western Australia, Perth, WA, Australia
	Caitlin Woods; Department of Computer Science and Software Engineering, The University of Western Australia, Perth, WA, Australia
	Michael Stewart; Department of Computer Science and Software Engineering, The University of Western Australia, Perth, WA, Australia
	Adriano Polpo; https://orcid.org/0000-0002-5959-1808; Department of Mathematics and Statistics, The University of Western Australia, Perth, WA, Australia
title	Generating Authentic Grounded Synthetic Maintenance Work Orders
topic	Maintenance work orders; large language models; GPT; knowledge graphs; grounded synthetic data; synthetic data generation
url	https://ieeexplore.ieee.org/document/11124200/
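
The description above says that each knowledge-graph path is a triple and that these paths constrain an LLM through few-shot prompting. The following is a minimal sketch of that idea only, not the authors' implementation (their code is in the repository linked in the description). The `Path` class, the relation names, the example work orders, the prompt wording, and the `is_grounded` check are all hypothetical and chosen for illustration. No LLM is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """One knowledge-graph path held as a (head, relation, tail) triple."""
    head: str
    relation: str
    tail: str


# Hypothetical few-shot pairs: a path and a short text a technician might write.
# These are illustrative only and are not taken from the MaintIE dataset.
FEW_SHOT = [
    (Path("pump", "has_state", "leaking"), "pump leaking at seal"),
    (Path("engine", "has_activity", "replace"), "replace engine"),
]


def build_prompt(path, examples=FEW_SHOT):
    """Assemble a few-shot prompt that restricts the model to one path."""
    lines = [
        "Write one short maintenance work order for the given path.",
        "Use only the equipment and the state or activity named in the path.",
        "",
    ]
    for ex_path, ex_text in examples:
        lines.append(f"Path: {ex_path.head} | {ex_path.relation} | {ex_path.tail}")
        lines.append(f"Work order: {ex_text}")
        lines.append("")
    # The final, unanswered entry is the one the model is asked to complete.
    lines.append(f"Path: {path.head} | {path.relation} | {path.tail}")
    lines.append("Work order:")
    return "\n".join(lines)


def is_grounded(text, path):
    """Crude assumed check: both ends of the triple appear in the text."""
    lowered = text.lower()
    return path.head.lower() in lowered and path.tail.lower() in lowered


if __name__ == "__main__":
    target = Path("valve", "has_state", "stuck")
    print(build_prompt(target))
    print(is_grounded("Valve stuck open", target))
```

The string returned by `build_prompt` would be sent to a model of choice, and a substring check like `is_grounded` would only be a first filter: it runs before abbreviations or typos are introduced, since those would defeat exact matching.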

Similar Items