Information extraction from historical well records using a large language model
Main Authors: | Zhiwei Ma, Javier E. Santos, Greg Lackey, Hari Viswanathan, Daniel O’Malley |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2024-12-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-024-81846-5 |
_version_ | 1841559541700886528 |
---|---|
author | Zhiwei Ma, Javier E. Santos, Greg Lackey, Hari Viswanathan, Daniel O’Malley |
collection | DOAJ |
description | Abstract: To reduce environmental risks and impacts from orphaned wells (abandoned oil and gas wells), it is essential to first locate and then plug these wells. Manual reading and digitizing of information from historical documents is not feasible, given the large number of wells. Here, we propose a new computational approach for rapidly and cost-effectively characterizing these wells. Specifically, we leverage the advanced capabilities of large language models (LLMs) to extract vital information, including well location and depth, from historical records of orphaned wells. In this paper, we present an information extraction workflow based on open-source Llama 2 models and test it on a dataset of 160 well documents. When applied to clean, PDF-based reports, the developed workflow achieves an overall accuracy of 100%, accounting for both text conversion and LLM analysis. However, it struggles with unstructured, image-based well records, where accuracy drops to 70%. The workflow provides significant benefits over manual digitization because it reduces labor and increases automation. Additionally, more detailed prompting leads to improved information extraction, and LLMs with more parameters typically perform better. Given that a vast amount of geoscientific information is locked up in old documents, this work demonstrates that recent breakthroughs in LLMs allow us to access and utilize this information more effectively. |
format | Article |
id | doaj-art-6f72b5f8e99d4dfc9b23a6cea9a09ebf |
institution | Kabale University |
issn | 2045-2322 |
language | English |
publishDate | 2024-12-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj-art-6f72b5f8e99d4dfc9b23a6cea9a09ebf; 2025-01-05T12:25:16Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2024-12-01; 14; 1; 1; 14; 10.1038/s41598-024-81846-5; Information extraction from historical well records using a large language model; Zhiwei Ma (Earth & Environmental Sciences Division, Los Alamos National Laboratory), Javier E. Santos (Earth & Environmental Sciences Division, Los Alamos National Laboratory), Greg Lackey (Geological and Environmental Systems Directorate, National Energy Technology Laboratory), Hari Viswanathan (Earth & Environmental Sciences Division, Los Alamos National Laboratory), Daniel O’Malley (Earth & Environmental Sciences Division, Los Alamos National Laboratory); https://doi.org/10.1038/s41598-024-81846-5 |
title | Information extraction from historical well records using a large language model |
url | https://doi.org/10.1038/s41598-024-81846-5 |
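The abstract describes a two-stage workflow: convert each well record to text, then prompt a Llama 2 model to return fields such as well location and depth, and score the results for accuracy. As a hypothetical illustration of the glue around such a model (not the authors' code), the sketch below builds an extraction prompt, parses a JSON object out of a free-form model reply, and computes a document-level accuracy. The field names (`latitude`, `longitude`, `depth_ft`), the prompt wording, the JSON reply format, and the rule that a document counts as correct only when every field matches are all assumptions; the model call itself is omitted.

```python
import json
import re

# Hypothetical target fields; the record says only "well location and depth".
FIELDS = ("latitude", "longitude", "depth_ft")


def build_prompt(document_text: str, fields: tuple[str, ...] = FIELDS) -> str:
    """Wrap converted (OCR or PDF) text in an instruction asking for a JSON object."""
    keys = ", ".join(f'"{name}"' for name in fields)
    return (
        "You are reading a historical oil and gas well record.\n"
        f"Return only a JSON object with the keys {keys}. "
        "Use null for any value the record does not state.\n\n"
        f"Record:\n{document_text}"
    )


def parse_response(response: str, fields: tuple[str, ...] = FIELDS) -> dict:
    """Parse a model reply; missing or malformed JSON yields None for every field."""
    empty = {name: None for name in fields}
    # Take the span from the first '{' to the last '}' in the reply.
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if match is None:
        return empty
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return empty
    # Keep only the expected keys; any extra keys the model adds are dropped.
    return {name: data.get(name) for name in fields}


def document_accuracy(
    predictions: list[dict], truths: list[dict], fields: tuple[str, ...] = FIELDS
) -> float:
    """Share of documents whose extracted fields all equal the ground truth (assumed rule)."""
    if not truths:
        return 0.0
    correct = sum(
        all(pred.get(name) == truth.get(name) for name in fields)
        for pred, truth in zip(predictions, truths)
    )
    return correct / len(truths)


if __name__ == "__main__":
    # Toy strings standing in for model replies; these are not real well data.
    good = parse_response('Sure! {"latitude": 35.0, "longitude": -106.0, "depth_ft": 1200}')
    bad = parse_response("I could not find the coordinates.")
    truth = {"latitude": 35.0, "longitude": -106.0, "depth_ft": 1200}
    print(good)
    print(document_accuracy([good, bad], [truth, truth]))  # 0.5
```

Parsing defensively is a deliberate choice here: chat-tuned models often wrap the requested JSON in extra prose, so the parser tolerates surrounding text and falls back to empty fields rather than raising. A real pipeline would likely also need to normalize units and coordinate formats before comparing against ground truth.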