Semantic Schema Extraction in NoSQL Databases using BERT Embeddings
NoSQL databases, valued for flexibility and scalability, pose analytics challenges due to their schema-less nature. Automatic schema extraction is crucial, with existing techniques limited in handling nested structures. Leveraging Natural Language Processing (NLP) advancements, this paper introduces...
Main Authors: | Saad Belefqih, Ahmed Zellou, Mouna Berquedich |
---|---|
Format: | Article |
Language: | English |
Published: | Ubiquity Press, 2024-12-01 |
Series: | Data Science Journal |
Subjects: | nosql databases; data integration; schema extraction; semantic textual similarity |
Online Access: | https://account.datascience.codata.org/index.php/up-j-dsj/article/view/1688 |
_version_ | 1841554956960661504 |
---|---|
author | Saad Belefqih; Ahmed Zellou; Mouna Berquedich |
author_facet | Saad Belefqih; Ahmed Zellou; Mouna Berquedich |
author_sort | Saad Belefqih |
collection | DOAJ |
description | NoSQL databases, valued for flexibility and scalability, pose analytics challenges due to their schema-less nature. Automatic schema extraction is crucial, with existing techniques limited in handling nested structures. Leveraging Natural Language Processing (NLP) advancements, this paper introduces a novel BERT Embeddings-Based approach for extracting schemas from NoSQL databases. The method analyzes semantic relationships within triplets from JSON documents through four stages: triplet extraction, preprocessing, BERT Embedding generation, and similarity analysis. Evaluation on real datasets demonstrates over 83% accuracy in extracting valid nested schema components. The study reveals interdisciplinary intersections, using NLP to unveil structures in scenarios lacking explicit schemas, showcasing significant potential for autonomous schema extraction from raw, unstructured data formats. |
format | Article |
id | doaj-art-a7ba0f4eb09946f1ac90bfd467cf0c2c |
institution | Kabale University |
issn | 1683-1470 |
language | English |
publishDate | 2024-12-01 |
publisher | Ubiquity Press |
record_format | Article |
series | Data Science Journal |
spelling | doaj-art-a7ba0f4eb09946f1ac90bfd467cf0c2c; 2025-01-08T07:55:17Z; eng; Ubiquity Press; Data Science Journal; ISSN 1683-1470; 2024-12-01; vol. 23, 57; DOI 10.5334/dsj-2024-057; article 1688; Semantic Schema Extraction in NoSQL Databases using BERT Embeddings; Saad Belefqih (https://orcid.org/0009-0001-5972-0147), Software Project Management Research Team, ENSIAS, Mohammed V University in Rabat; Ahmed Zellou (https://orcid.org/0000-0002-4688-912X), Software Project Management Research Team, ENSIAS, Mohammed V University in Rabat; Mouna Berquedich (https://orcid.org/0000-0002-2597-0455), Green Tech Institute, Mohammed VI Polytechnique University, Benguerir; https://account.datascience.codata.org/index.php/up-j-dsj/article/view/1688; nosql databases, data integration, schema extraction, semantic textual similarity |
title | Semantic Schema Extraction in NoSQL Databases using BERT Embeddings |
topic | nosql databases; data integration; schema extraction; semantic textual similarity |
url | https://account.datascience.codata.org/index.php/up-j-dsj/article/view/1688 |
work_keys_str_mv | AT saadbelefqih semanticschemaextractioninnosqldatabasesusingbertembeddings AT ahmedzellou semanticschemaextractioninnosqldatabasesusingbertembeddings AT mounaberquedich semanticschemaextractioninnosqldatabasesusingbertembeddings |
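
The description in this record names four stages: triplet extraction, preprocessing, BERT Embedding generation, and similarity analysis. The following is only a rough sketch of how such a pipeline might fit together, not the authors' implementation. The record does not define the triplets, so the `(parent, key, value_type)` shape is an assumption, as are all function names. To keep the example dependency-free, `embed` is a toy character n-gram vector standing in for a BERT embedding; a real system would presumably swap in a pretrained model at that step.

```python
import json
import math
import re
from collections import Counter


def extract_triplets(doc, parent="root"):
    """Stage 1 (assumed shape): yield (parent, key, value_type) for every field."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield (parent, key, type(value).__name__)
            yield from extract_triplets(value, parent=key)
    elif isinstance(doc, list):
        # Array elements inherit the parent of the array itself.
        for item in doc:
            yield from extract_triplets(item, parent=parent)


def preprocess(triplet):
    """Stage 2: lowercase phrase with camelCase and snake_case split into words."""
    text = " ".join(triplet)
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.sub(r"[_\W]+", " ", text).strip().lower()


def embed(text, n=3):
    """Stage 3 stand-in: character n-gram counts (NOT a BERT embedding)."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Stage 4: cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two hypothetical documents whose fields differ only in naming convention.
doc_a = json.loads('{"user": {"firstName": "Ada", "address": {"city": "Rabat"}}}')
doc_b = json.loads('{"user": {"first_name": "Alan", "address": {"city": "Oslo"}}}')

phrases_a = [preprocess(t) for t in extract_triplets(doc_a)]
phrases_b = [preprocess(t) for t in extract_triplets(doc_b)]

# Compare every triplet phrase of doc_a with every triplet phrase of doc_b.
for pa in phrases_a:
    for pb in phrases_b:
        print(f"{pa!r} vs {pb!r}: {cosine(embed(pa), embed(pb)):.2f}")
```

In this sketch, fields such as `firstName` and `first_name` collapse to the same phrase after preprocessing and therefore score as near-identical, which is the kind of grouping a similarity threshold could use to merge schema components. Where that threshold should sit is not stated in the record.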