Annotated corpus for traditional formula-disease relationships in biomedical articles
Main Authors: | Sangjun Yea, Ho Jang, Soyoung Kim, Sanghun Lee, Jaeuk U. Kim |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | Scientific Data |
Online Access: | https://doi.org/10.1038/s41597-025-04377-2 |
author | Sangjun Yea, Ho Jang, Soyoung Kim, Sanghun Lee, Jaeuk U. Kim |
collection | DOAJ |
description | The Traditional Formula (TF), a combination of herbs prepared in accordance with traditional medicine principles, is increasingly garnering global attention as an alternative to modern medicine. In particular, there is growing interest in exploring the therapeutic effects of TFs across various diseases. Much of the state-of-the-art knowledge about the relationship between TFs and diseases is found in scientific publications, where manual knowledge extraction is impractical. Natural Language Processing (NLP) is therefore being employed to search and extract crucial knowledge from unstructured literature efficiently and accurately. However, the absence of a high-quality, manually annotated corpus focusing on TF-disease relationships hampers the use of NLP in traditional medicine and modern biomedical science. This article introduces the Traditional Formula-Disease Relationship (TFDR) corpus, a manually annotated corpus designed to facilitate the automatic extraction of TF-disease relationships from the biomedical literature. The TFDR corpus is drawn from 740 PubMed abstracts and contains a total of 6,211 TF mentions, 7,166 disease mentions, and 1,109 relationships between them within 744 key sentences. |
id | doaj-art-2214fd8bb09e416fbd3ac04eabc5b7f7 |
institution | Kabale University |
issn | 2052-4463 |
affiliations | Korean medicine data division, Korea Institute of Oriental Medicine; Korean convergence medical science, University of Science and Technology |