Identifying Top-Performing Students via VKontakte Social Media Communities Using Advanced NLP Techniques

Bibliographic Details
Main Authors: Sergei S. Gorshkov, Dmitry I. Ignatov, Anastasia Yu. Chernysheva, Vyacheslav L. Goiko, Vitaliy V. Kashpur
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10812733/

Abstract
Identifying potentially high-performing students matters to universities seeking to improve educational outcomes, to companies seeking to recruit top talent early, and to advertising platforms optimizing targeted marketing. This paper introduces an algorithm that identifies students with exceptional academic performance by analyzing their subscriptions to communities on the social network VKontakte (VK). The study examines a sample of 4,445 Tomsk State University students with publicly accessible VK profiles. Each community is represented by a vector built from text embeddings, topic modeling, sentiment and emotion analysis, and text complexity metrics. The embeddings are produced by a separately trained model, which we have made publicly available on Hugging Face. Attention mechanisms integrate these heterogeneous features, letting the model weigh their importance dynamically and capture the interactions among them. The community representations are then combined into a digital user profile that reflects each student's interests as expressed through their community subscriptions. The machine learning pipeline also uses stacking to combine the predictions of multiple models, improving robustness and classification performance. Through a series of experiments, we developed a classifier that distinguishes high-performing from low-performing students on the basis of these profiles. The approach also allowed us to identify and interpret the key factors that differentiate high-performing students from their lower-performing peers, including factors positively and negatively associated with academic performance.
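
The abstract states that per-community features (embeddings, topics, sentiment and emotion, text complexity) are integrated with attention and then aggregated into a user profile, but it does not specify the architecture. The following is a minimal sketch of one common way to do this, under stated assumptions: each feature group is linearly projected to a shared size, scored against a query vector with scaled dot-product attention, and combined using softmax weights; the user profile is then the mean of the subscribed communities' vectors. The function names (`fuse_community`, `user_profile`), the fixed projection matrices and query vector, and the choice of mean pooling are all hypothetical. In a trained model the projections and query would be learned parameters, and this is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[Vector]  # row-major, shape (d_out, d_in)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(vec: Sequence[float], weights: Matrix) -> Vector:
    # Linear map from a group's native size (d_in) to the shared size (d_out).
    return [dot(row, vec) for row in weights]


def fuse_community(
    groups: Dict[str, Vector],
    projections: Dict[str, Matrix],
    query: Vector,
) -> Tuple[Vector, Dict[str, float]]:
    """Attention-pool one community's feature groups into a single vector.

    Hypothetical setup: each group (e.g. "embedding", "topics", "complexity")
    is projected to len(query) dimensions, scored against the query with
    scaled dot-product attention, and combined by softmax weights.
    """
    names = sorted(groups)
    projected = [project(groups[n], projections[n]) for n in names]
    scale = math.sqrt(len(query))
    alphas = softmax([dot(query, p) / scale for p in projected])
    fused = [sum(a * p[i] for a, p in zip(alphas, projected)) for i in range(len(query))]
    # Return the per-group weights too, so their relative importance can be inspected.
    return fused, dict(zip(names, alphas))


def user_profile(subscriptions: Sequence[str], community_vectors: Dict[str, Vector]) -> Vector:
    """Mean-pool the vectors of known communities a user subscribes to (assumed pooling)."""
    vecs = [community_vectors[c] for c in subscriptions if c in community_vectors]
    if not vecs:
        raise ValueError("no known communities among subscriptions")
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]


if __name__ == "__main__":
    # Toy feature groups of different native sizes, all projected to 2 dimensions.
    groups = {
        "embedding": [1.0, 0.0, 0.5],
        "topics": [0.2, 0.8],
        "complexity": [3.0],
    }
    projections = {
        "embedding": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
        "topics": [[1.0, 0.0], [0.0, 1.0]],
        "complexity": [[0.1], [0.0]],
    }
    fused, weights = fuse_community(groups, projections, query=[1.0, 1.0])
    print(fused, weights)
    print(user_profile(["a", "b"], {"a": [1.0, 2.0], "b": [3.0, 4.0]}))
```

Returning the attention weights alongside the fused vector is a deliberate design choice: it is one way to see which feature group dominates a community's representation, in the spirit of the interpretability the abstract describes. The stacking step mentioned in the abstract would operate downstream on these user profiles and is not shown here.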

Publication Details
Volume: 13
Pages: 962-979
DOI: 10.1109/ACCESS.2024.3521857
ISSN: 2169-3536
Keywords: Digital footprint; domain adaptation; educational data mining; information technologies in education; natural language processing

Author Affiliations
Sergei S. Gorshkov (https://orcid.org/0000-0001-5958-5224): Department of Computer Science, National Research University Higher School of Economics, Moscow, Russia
Dmitry I. Ignatov (https://orcid.org/0000-0002-6584-8534): Department of Computer Science, National Research University Higher School of Economics, Moscow, Russia
Anastasia Yu. Chernysheva (https://orcid.org/0000-0003-0812-8941): Skolkovo Institute of Science and Technology, Skolkovo, Russia
Vyacheslav L. Goiko: Laboratory of Big Data in Social Sciences, Tomsk State University, Tomsk, Russia
Vitaliy V. Kashpur: Department of Sociology, Tomsk State University, Tomsk, Russia