Semantic Schema Extraction in NoSQL Databases using BERT Embeddings

Bibliographic Details
Main Authors: Saad Belefqih, Ahmed Zellou, Mouna Berquedich
Format: Article
Language: English
Published: Ubiquity Press 2024-12-01
Series: Data Science Journal
Online Access: https://account.datascience.codata.org/index.php/up-j-dsj/article/view/1688
collection DOAJ
description NoSQL databases are valued for their flexibility and scalability, but their schema-less nature makes analytics difficult. Automatic schema extraction is therefore crucial, yet existing techniques handle nested structures poorly. Building on advances in Natural Language Processing (NLP), this paper introduces a novel BERT embeddings-based approach for extracting schemas from NoSQL databases. The method analyzes semantic relationships within triplets drawn from JSON documents in four stages: triplet extraction, preprocessing, BERT embedding generation, and similarity analysis. Evaluation on real datasets shows over 83% accuracy in extracting valid nested schema components. The study highlights an interdisciplinary intersection: NLP can reveal structure where no explicit schema exists, which suggests significant potential for autonomous schema extraction from raw, unstructured data formats.
id doaj-art-a7ba0f4eb09946f1ac90bfd467cf0c2c
institution Kabale University
issn 1683-1470
doi 10.5334/dsj-2024-057
volume 23
article 57
last_indexed 2025-01-08T07:55:17Z
orcid Saad Belefqih https://orcid.org/0009-0001-5972-0147
Ahmed Zellou https://orcid.org/0000-0002-4688-912X
Mouna Berquedich https://orcid.org/0000-0002-2597-0455
affiliation Saad Belefqih: Software Project Management Research Team, ENSIAS, Mohammed V University in Rabat
Ahmed Zellou: Software Project Management Research Team, ENSIAS, Mohammed V University in Rabat
Mouna Berquedich: Green Tech Institute, Mohammed VI Polytechnique University, Benguerir
topic nosql databases
data integration
schema extraction
semantic textual similarity
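
The four stages named in the description (triplet extraction, preprocessing, embedding generation, similarity analysis) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The record does not define the triplet format, so the sketch assumes (parent, field name, value type). To stay self-contained it also replaces BERT with a placeholder token-count vector; a real pipeline would replace `embed` with a BERT encoder. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def extract_triplets(doc, parent="root"):
    """Stage 1: walk a parsed JSON object and collect triplets.

    Assumed triplet layout: (parent, field name, value type).
    """
    triplets = []
    for key, value in doc.items():
        if isinstance(value, dict):
            triplets.append((parent, key, "object"))
            triplets.extend(extract_triplets(value, parent=key))
        elif isinstance(value, list):
            triplets.append((parent, key, "array"))
            for item in value:
                if isinstance(item, dict):
                    triplets.extend(extract_triplets(item, parent=key))
        else:
            triplets.append((parent, key, type(value).__name__))
    return triplets


def preprocess(triplet):
    """Stage 2: split camelCase and snake_case names into lowercase tokens."""
    text = " ".join(triplet)
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(tokens):
    """Stage 3 placeholder: a sparse token-count vector, NOT a BERT embedding."""
    return Counter(tokens)


def cosine(u, v):
    """Stage 4: cosine similarity between two sparse vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    doc = {"user": {"firstName": "Ada", "address": {"city": "Rabat"}}, "tags": ["a"]}
    triplets = extract_triplets(doc)
    vectors = [embed(preprocess(t)) for t in triplets]
    for t, v in zip(triplets, vectors):
        print(t, round(cosine(vectors[1], v), 3))
```

With a contextual encoder in place of `embed`, the same cosine step could compare triplets whose field names differ in wording but not in meaning, which a token-count vector cannot do.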