Information extraction from historical well records using a large language model

Abstract To reduce environmental risks and impacts from orphaned wells (abandoned oil and gas wells), it is essential to first locate and then plug these wells. Manual reading and digitizing of information from historical documents is not feasible, given the large number of wells. Here, we propose a...

Bibliographic Details
Main Authors:	Zhiwei Ma, Javier E. Santos, Greg Lackey, Hari Viswanathan, Daniel O’Malley
Format:	Article
Language:	English
Published:	Nature Portfolio 2024-12-01
Series:	Scientific Reports
Online Access:	https://doi.org/10.1038/s41598-024-81846-5

author	Zhiwei Ma; Javier E. Santos; Greg Lackey; Hari Viswanathan; Daniel O’Malley
author_facet	Zhiwei Ma; Javier E. Santos; Greg Lackey; Hari Viswanathan; Daniel O’Malley
author_sort	Zhiwei Ma
collection	DOAJ
description	Abstract To reduce environmental risks and impacts from orphaned wells (abandoned oil and gas wells), it is essential to first locate and then plug these wells. Manual reading and digitizing of information from historical documents is not feasible, given the large number of wells. Here, we propose a new computational approach for rapidly and cost-effectively characterizing these wells. Specifically, we leverage the advanced capabilities of large language models (LLMs) to extract vital information including well location and depth from historical records of orphaned wells. In this paper, we present an information extraction workflow based on open-source Llama 2 models and test it on a dataset of 160 well documents. When applied to clean, PDF-based reports, the developed workflow achieves an overall accuracy of 100%, accounting for both text conversion and LLM analysis. However, it struggles with unstructured image-based well records, where accuracy drops to 70%. The workflow provides significant benefits over manual human digitization because it reduces labor and increases automation. Additionally, more detailed prompting leads to improved information extraction, and LLMs with more parameters typically perform better. Given that a vast amount of geoscientific information is locked up in old documents, this work demonstrates that recent breakthroughs in LLMs allow us to access and utilize this information more effectively.
format	Article
id	doaj-art-6f72b5f8e99d4dfc9b23a6cea9a09ebf
institution	Kabale University
issn	2045-2322
language	English
publishDate	2024-12-01
publisher	Nature Portfolio
record_format	Article
series	Scientific Reports
spelling	doaj-art-6f72b5f8e99d4dfc9b23a6cea9a09ebf; 2025-01-05T12:25:16Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2024-12-01; 141114; 10.1038/s41598-024-81846-5; Information extraction from historical well records using a large language model; Zhiwei Ma (Earth & Environmental Sciences Division, Los Alamos National Laboratory); Javier E. Santos (Earth & Environmental Sciences Division, Los Alamos National Laboratory); Greg Lackey (Geological and Environmental Systems Directorate, National Energy Technology Laboratory); Hari Viswanathan (Earth & Environmental Sciences Division, Los Alamos National Laboratory); Daniel O’Malley (Earth & Environmental Sciences Division, Los Alamos National Laboratory); https://doi.org/10.1038/s41598-024-81846-5
title	Information extraction from historical well records using a large language model
url	https://doi.org/10.1038/s41598-024-81846-5

Similar Items