TURSpider: A Turkish Text-to-SQL Dataset and LLM-Based Study
This paper introduces TURSpider, a novel Turkish Text-to-SQL dataset developed through human translation of the widely used Spider dataset, aimed at addressing the current lack of complex, cross-domain SQL datasets for the Turkish language. TURSpider incorporates a wide range of query difficulties,...
Main Authors: | Ali Bugra Kanburoglu, Faik Boray Tek |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2024-01-01 |
Series: | IEEE Access |
Subjects: | Text-to-SQL, LLM, large language models, Turkish, dataset, TURSpider |
Online Access: | https://ieeexplore.ieee.org/document/10753591/ |
_version_ | 1846160744746319872 |
---|---|
author | Ali Bugra Kanburoglu; Faik Boray Tek |
author_facet | Ali Bugra Kanburoglu; Faik Boray Tek |
author_sort | Ali Bugra Kanburoglu |
collection | DOAJ |
description | This paper introduces TURSpider, a novel Turkish Text-to-SQL dataset developed through human translation of the widely used Spider dataset, aimed at addressing the current lack of complex, cross-domain SQL datasets for the Turkish language. TURSpider incorporates a wide range of query difficulties, including nested queries, to create a comprehensive benchmark for Turkish Text-to-SQL tasks. The dataset enables cross-language comparison and significantly enhances the training and evaluation of large language models (LLMs) in generating SQL queries from Turkish natural language inputs. We fine-tuned several Turkish-supported LLMs on TURSpider and evaluated their performance in comparison to state-of-the-art models like GPT-3.5 Turbo and GPT-4. Our results show that fine-tuned Turkish LLMs demonstrate competitive performance, with one model even surpassing GPT-based models on execution accuracy. We also apply the Chain-of-Feedback (CoF) methodology to further improve model performance, demonstrating its effectiveness across multiple LLMs. This work provides a valuable resource for Turkish NLP and addresses specific challenges in developing accurate Text-to-SQL models for low-resource languages. |
format | Article |
id | doaj-art-26b12b8a171b4491a8cbf80f1884a79f |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2024-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-26b12b8a171b4491a8cbf80f1884a79f; 2024-11-22T00:00:54Z; eng; IEEE; IEEE Access; 2169-3536; 2024-01-01; vol. 12, pp. 169379-169387; 10.1109/ACCESS.2024.3498841; 10753591; TURSpider: A Turkish Text-to-SQL Dataset and LLM-Based Study; Ali Bugra Kanburoglu (https://orcid.org/0009-0003-9031-1485); Faik Boray Tek (https://orcid.org/0000-0002-8649-6013); Department of Computer Engineering, Işık University, Istanbul, Türkiye; Department of Artificial Intelligence and Data Engineering, Istanbul Technical University, Istanbul, Türkiye; https://ieeexplore.ieee.org/document/10753591/; Text-to-SQL; LLM; large language models; Turkish; dataset; TURSpider |
title | TURSpider: A Turkish Text-to-SQL Dataset and LLM-Based Study |
topic | Text-to-SQL; LLM; large language models; Turkish; dataset; TURSpider |
url | https://ieeexplore.ieee.org/document/10753591/ |