Comparing a Thai Words Segmentation Methods in the LST20 Dataset

In this era of globalization where information is widely available, organizations are increasingly placing importance on using information to enhance their business. Although data is easily available, there are still challenges in natural language processing tasks, especially, the division of Thai w...

Bibliographic Details
Main Authors: Krittapol Damrongkamoltip, Khatcha Ruenlek, Wasit Limprasert, Prachya Boonkwan
Format: Article
Language: English
Published: Surindra Rajabhat University, Faculty of Science and Technology, Department of Computer Education 2024-08-01
Series: Journal of Computer and Creative Technology
Subjects: thai words; segmentation methods; lst20 dataset
Online Access: https://so13.tci-thaijo.org/index.php/jcct/article/view/679
_version_ 1841526346369466368
author Krittapol Damrongkamoltip
Khatcha Ruenlek
Wasit Limprasert
Prachya Boonkwan
author_facet Krittapol Damrongkamoltip
Khatcha Ruenlek
Wasit Limprasert
Prachya Boonkwan
author_sort Krittapol Damrongkamoltip
collection DOAJ
description In this era of globalization, where information is widely available, organizations increasingly rely on information to enhance their business. Although data is easy to obtain, natural language processing tasks still pose challenges, especially the segmentation of Thai text, which lacks clear word boundaries. This makes it difficult to identify the word groups in a sentence appropriately. This study therefore evaluates the performance of word segmentation methods, covering both dictionary-based methods and methods that learn from data, across six techniques. The goal is to verify the literal-level accuracy and processing time of each method and technique on the LST20 dataset, which contains 3,745 documents and covers 15 news categories. The results show that learning from data is the more efficient approach.
format Article
id doaj-art-5dec21d36fa24d30af9c621360dbbe12
institution Kabale University
issn 2985-1580
2985-1599
language English
publishDate 2024-08-01
publisher Surindra Rajabhat University, Faculty of Science and Technology, Department of Computer Education
record_format Article
series Journal of Computer and Creative Technology
spelling doaj-art-5dec21d36fa24d30af9c621360dbbe12
2025-01-17T03:08:58Z
eng
Surindra Rajabhat University, Faculty of Science and Technology, Department of Computer Education
Journal of Computer and Creative Technology
2985-1580
2985-1599
2024-08-01
2 2 61 70
10.14456/jcct.2024.7684
Comparing a Thai Words Segmentation Methods in the LST20 Dataset
Krittapol Damrongkamoltip 0
Khatcha Ruenlek 1
https://orcid.org/0009-0008-0666-2634
Wasit Limprasert 2
Prachya Boonkwan 3
Student, Data Science and Innovation, College of Interdisciplinary Studies, Thammasat University, Pathum Thani 12121, Thailand
Student, Data Science and Innovation, College of Interdisciplinary Studies, Thammasat University, Pathum Thani 12121, Thailand
Assistant Professor, Dr., Data Science and Innovation, College of Interdisciplinary Studies, Thammasat University, Pathum Thani 12121, Thailand
Lecturer, Dr., Language and Semantic Technology Research Team, National Electronics and Computer Technology Center, Pathum Thani 12121, Thailand
In this era of globalization, where information is widely available, organizations increasingly rely on information to enhance their business. Although data is easy to obtain, natural language processing tasks still pose challenges, especially the segmentation of Thai text, which lacks clear word boundaries. This makes it difficult to identify the word groups in a sentence appropriately. This study therefore evaluates the performance of word segmentation methods, covering both dictionary-based methods and methods that learn from data, across six techniques. The goal is to verify the literal-level accuracy and processing time of each method and technique on the LST20 dataset, which contains 3,745 documents and covers 15 news categories. The results show that learning from data is the more efficient approach.
https://so13.tci-thaijo.org/index.php/jcct/article/view/679
thai words
segmentation methods
lst20 dataset
spellingShingle Krittapol Damrongkamoltip
Khatcha Ruenlek
Wasit Limprasert
Prachya Boonkwan
Comparing a Thai Words Segmentation Methods in the LST20 Dataset
Journal of Computer and Creative Technology
thai words
segmentation methods
lst20 dataset
title Comparing a Thai Words Segmentation Methods in the LST20 Dataset
title_full Comparing a Thai Words Segmentation Methods in the LST20 Dataset
title_fullStr Comparing a Thai Words Segmentation Methods in the LST20 Dataset
title_full_unstemmed Comparing a Thai Words Segmentation Methods in the LST20 Dataset
title_short Comparing a Thai Words Segmentation Methods in the LST20 Dataset
title_sort comparing a thai words segmentation methods in the lst20 dataset
topic thai words
segmentation methods
lst20 dataset
url https://so13.tci-thaijo.org/index.php/jcct/article/view/679