Information extraction from historical well records using a large language model

Abstract To reduce environmental risks and impacts from orphaned wells (abandoned oil and gas wells), it is essential to first locate and then plug these wells. Manual reading and digitizing of information from historical documents is not feasible, given the large number of wells. Here, we propose a...

Bibliographic Details
Main Authors: Zhiwei Ma, Javier E. Santos, Greg Lackey, Hari Viswanathan, Daniel O’Malley
Format: Article
Language: English
Published: Nature Portfolio 2024-12-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-024-81846-5
Collection: DOAJ
Description: To reduce environmental risks and impacts from orphaned wells (abandoned oil and gas wells), it is essential to first locate and then plug these wells. Manual reading and digitizing of information from historical documents is not feasible, given the large number of wells. Here, we propose a new computational approach for rapidly and cost-effectively characterizing these wells. Specifically, we leverage the advanced capabilities of large language models (LLMs) to extract vital information including well location and depth from historical records of orphaned wells. In this paper, we present an information extraction workflow based on open-source Llama 2 models and test it on a dataset of 160 well documents. The developed workflow achieves an overall accuracy of 100%, accounting for both text conversion and LLM analysis when applied to clean, PDF-based reports. However, it struggles with unstructured image-based well records, where accuracy drops to 70%. The workflow provides significant benefits over manual human digitization, because it reduces labor and increases automation. Additionally, more detailed prompting leads to improved information extraction, and LLMs with more parameters typically perform better. Given that a vast amount of geoscientific information is locked up in old documents, this work demonstrates that recent breakthroughs in LLMs allow us to access and utilize this information more effectively.
Record ID: doaj-art-6f72b5f8e99d4dfc9b23a6cea9a09ebf
Institution: Kabale University
ISSN: 2045-2322
Author Affiliations:
Zhiwei Ma: Earth & Environmental Sciences Division, Los Alamos National Laboratory
Javier E. Santos: Earth & Environmental Sciences Division, Los Alamos National Laboratory
Greg Lackey: Geological and Environmental Systems Directorate, National Energy Technology Laboratory
Hari Viswanathan: Earth & Environmental Sciences Division, Los Alamos National Laboratory
Daniel O’Malley: Earth & Environmental Sciences Division, Los Alamos National Laboratory
Citation: Scientific Reports, vol. 14, no. 1, pp. 1-14 (2024-12-01)