Comparing a Thai Words Segmentation Methods in the LST20 Dataset
Main Authors: | Krittapol Damrongkamoltip, Khatcha Ruenlek, Wasit Limprasert, Prachya Boonkwan |
---|---|
Format: | Article |
Language: | English |
Published: | Surindra Rajabhat University, Faculty of Science and Technology, Department of Computer Education, 2024-08-01 |
Series: | Journal of Computer and Creative Technology |
Subjects: | thai words, segmentation methods, lst20 dataset |
Online Access: | https://so13.tci-thaijo.org/index.php/jcct/article/view/679 |
_version_ | 1841526346369466368 |
---|---|
author | Krittapol Damrongkamoltip, Khatcha Ruenlek, Wasit Limprasert, Prachya Boonkwan |
collection | DOAJ |
description | In this era of globalization, where information is widely available, organizations increasingly rely on information to enhance their business. Although data is easy to obtain, natural language processing tasks still pose challenges, especially Thai word segmentation, because Thai text does not clearly mark word boundaries. This makes it difficult to identify the words in a sentence correctly. This study therefore evaluates word segmentation methods of two kinds, dictionary-based and learning from data, across six techniques, with the main goals of verifying the character-level accuracy and the processing time of each method and technique. The evaluation uses the LST20 dataset, which contains 3,745 documents covering 15 news categories. The results show that learning from data is the more efficient approach. |
format | Article |
id | doaj-art-5dec21d36fa24d30af9c621360dbbe12 |
institution | Kabale University |
issn | 2985-1580, 2985-1599 |
language | English |
publishDate | 2024-08-01 |
publisher | Surindra Rajabhat University, Faculty of Science and Technology, Department of Computer Education |
series | Journal of Computer and Creative Technology |
spelling | doaj-art-5dec21d36fa24d30af9c621360dbbe12; 2025-01-17T03:08:58Z; eng; Surindra Rajabhat University, Faculty of Science and Technology, Department of Computer Education; Journal of Computer and Creative Technology; ISSN 2985-1580, 2985-1599; 2024-08-01; vol. 2, no. 2, pp. 61-70; DOI 10.14456/jcct.2024.7684; Comparing a Thai Words Segmentation Methods in the LST20 Dataset; Krittapol Damrongkamoltip (Student, Data Science and Innovation, College of Interdisciplinary Studies, Thammasat University, Pathum Thani 12121, Thailand); Khatcha Ruenlek, https://orcid.org/0009-0008-0666-2634 (Student, Data Science and Innovation, College of Interdisciplinary Studies, Thammasat University, Pathum Thani 12121, Thailand); Wasit Limprasert (Assistant Professor, Dr., Data Science and Innovation, College of Interdisciplinary Studies, Thammasat University, Pathum Thani 12121, Thailand); Prachya Boonkwan (Lecturer, Dr., Language and Semantic Technology Research Team, National Electronics and Computer Technology Center, Pathum Thani 12121, Thailand); abstract: same as description; https://so13.tci-thaijo.org/index.php/jcct/article/view/679; keywords: thai words, segmentation methods, lst20 dataset |
title | Comparing a Thai Words Segmentation Methods in the LST20 Dataset |
topic | thai words, segmentation methods, lst20 dataset |
url | https://so13.tci-thaijo.org/index.php/jcct/article/view/679 |
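The description contrasts dictionary-based segmentation with learning from data and scores each technique on accuracy and processing time. The record does not name the six techniques or define the accuracy metric, so the Python sketch below is only an illustration under stated assumptions. Greedy longest matching stands in for a dictionary-based method (it is not claimed to be one of the paper's techniques). Character-level accuracy is assumed to mean precision, recall, and F1 over word-start positions, with one label per character. The toy dictionary, the sample sentence, and every function name (`longest_matching`, `word_starts`, `char_level_scores`, `timed_segment`) are hypothetical and are not taken from the paper or from LST20.

```python
"""Hypothetical sketch, not the paper's code: a dictionary-based Thai
segmenter (greedy longest matching), character-level boundary scoring,
and a simple processing-time measurement."""

import time
from typing import Iterable, List, Set, Tuple


def longest_matching(text: str, dictionary: Set[str], max_len: int = 20) -> List[str]:
    """Greedy left-to-right longest matching.

    At each position, take the longest dictionary word (up to max_len code
    points) that starts there; if no word of length >= 2 matches, emit a
    single code point as an unknown token.
    """
    tokens: List[str] = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: one code point as an unknown token
        for j in range(min(len(text), i + max_len), i + 1, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        tokens.append(match)
        i += len(match)
    return tokens


def word_starts(tokens: List[str]) -> Set[int]:
    """Character offsets at which each token begins."""
    starts: Set[int] = set()
    pos = 0
    for tok in tokens:
        starts.add(pos)
        pos += len(tok)
    return starts


def char_level_scores(pred: List[str], gold: List[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted word starts against gold word starts."""
    if "".join(pred) != "".join(gold):
        raise ValueError("prediction and gold must cover the same text")
    p, g = word_starts(pred), word_starts(gold)
    tp = len(p & g)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def timed_segment(texts: Iterable[str], dictionary: Set[str]) -> Tuple[List[List[str]], float]:
    """Segment every text and return the results with the wall-clock seconds taken."""
    start = time.perf_counter()
    results = [longest_matching(t, dictionary) for t in texts]
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # Toy dictionary and gold segmentation, invented for illustration only.
    toy_dict = {"ตัด", "คำ", "ภาษา", "ไทย", "ภาษาไทย"}
    gold = ["ตัด", "คำ", "ภาษา", "ไทย"]
    (pred,), seconds = timed_segment(["ตัดคำภาษาไทย"], toy_dict)
    precision, recall, f1 = char_level_scores(pred, gold)
    print(" | ".join(pred))  # ตัด | คำ | ภาษาไทย
    print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")  # P=1.000 R=0.750 F1=0.857
    print(f"time: {seconds:.6f} s")
```

In the toy run, the greedy matcher prefers the longer entry "ภาษาไทย", while the invented gold standard splits it into two words. Precision therefore stays at 1.0 but recall drops to 0.75. Because the fallback emits one code point at a time, an unmatched Thai vowel or tone mark can be cut off from its base consonant. Practical dictionary-based tools often work on larger units, such as Thai character clusters, and use maximal rather than greedy matching. A data-driven segmenter could replace `longest_matching` while the scoring and timing functions stay unchanged.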