Identifying Top-Performing Students via VKontakte Social Media Communities Using Advanced NLP Techniques
Identifying potentially high-performing students is crucial for universities aiming to enhance educational outcomes, for companies seeking to recruit top talents early, and for advertising platforms looking to optimize targeted marketing. This paper introduces an algorithm designed to identify stude...
Main Authors: | Sergei S. Gorshkov, Dmitry I. Ignatov, Anastasia Yu. Chernysheva, Vyacheslav L. Goiko, Vitaliy V. Kashpur |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2025-01-01 |
Series: | IEEE Access |
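Subjects: | Digital footprint; domain adaptation; educational data mining; information technologies in education; natural language processing |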
Online Access: | https://ieeexplore.ieee.org/document/10812733/ |
_version_ | 1841554065445617664 |
---|---|
author | Sergei S. Gorshkov; Dmitry I. Ignatov; Anastasia Yu. Chernysheva; Vyacheslav L. Goiko; Vitaliy V. Kashpur |
collection | DOAJ |
description | Identifying potentially high-performing students is crucial for universities aiming to enhance educational outcomes, for companies seeking to recruit top talents early, and for advertising platforms looking to optimize targeted marketing. This paper introduces an algorithm designed to identify students with exceptional academic performance by analyzing their subscriptions to communities on the social network VKontakte. The study examines a sample of 4445 students from Tomsk State University with publicly accessible VK profiles. The research methodology involves generating vector representations for each community based on embeddings, topic modeling, sentiment and emotion analysis, as well as text complexity metrics. To generate the embeddings, a separate model was trained and made publicly available on HuggingFace. The integration of diverse features was achieved using attention mechanisms, allowing the model to dynamically weigh their importance and capture intricate interrelations. These representations are then used to construct a digital user profile, capturing the students’ interests as reflected in their community subscriptions. Additionally, the machine learning pipeline incorporated stacking to combine predictions from multiple models, enhancing robustness and classification performance. Through a series of experiments, we developed a machine learning algorithm that effectively distinguishes between high- and low-performing students based on these profiles. This approach also enabled the identification and interpretation of key factors differentiating high-performing students from their lower-performing peers. Additionally, we investigated the factors positively and negatively associated with academic performance. |
format | Article |
id | doaj-art-c5fcb02b1b1f4689bc43f20ede0f5828 |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-c5fcb02b1b1f4689bc43f20ede0f5828; 2025-01-09T00:02:28Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 962-979; DOI 10.1109/ACCESS.2024.3521857; document 10812733; Identifying Top-Performing Students via VKontakte Social Media Communities Using Advanced NLP Techniques; Sergei S. Gorshkov (https://orcid.org/0000-0001-5958-5224; Department of Computer Science, National Research University Higher School of Economics, Moscow, Russia); Dmitry I. Ignatov (https://orcid.org/0000-0002-6584-8534; Department of Computer Science, National Research University Higher School of Economics, Moscow, Russia); Anastasia Yu. Chernysheva (https://orcid.org/0000-0003-0812-8941; Skolkovo Institute of Science and Technology, Skolkovo, Russia); Vyacheslav L. Goiko (Laboratory of Big Data in Social Sciences, Tomsk State University, Tomsk, Russia); Vitaliy V. Kashpur (Department of Sociology, Tomsk State University, Tomsk, Russia); https://ieeexplore.ieee.org/document/10812733/; Digital footprint; domain adaptation; educational data mining; information technologies in education; natural language processing |
title | Identifying Top-Performing Students via VKontakte Social Media Communities Using Advanced NLP Techniques |
topic | Digital footprint; domain adaptation; educational data mining; information technologies in education; natural language processing |
url | https://ieeexplore.ieee.org/document/10812733/ |
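
The description in this record says that community vectors are combined with attention mechanisms into a digital user profile. The record gives no architectural detail, so the following is only a minimal sketch of one generic way such pooling can work: a softmax-weighted average of community vectors. The function names `softmax` and `attention_pool`, the `query` vector, and the toy numbers are all hypothetical and are not taken from the article.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors: Sequence[Sequence[float]],
                   query: Sequence[float]) -> List[float]:
    """Pool community vectors into one profile vector.

    Each community gets a score dot(vector, query); the scores are turned
    into weights with softmax, and the profile is the weighted average.
    In a trained model the query would be learned; here it is fixed
    (an assumption made purely for illustration).
    """
    if not vectors:
        raise ValueError("need at least one community vector")
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(query)
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


# Hypothetical 3-dimensional vectors for two communities a student follows.
communities = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]

# A zero query scores every community equally, so pooling reduces to a plain mean.
uniform_profile = attention_pool(communities, [0.0, 0.0, 0.0])

# A query aligned with the third dimension puts more weight on the first community.
focused_profile = attention_pool(communities, [0.0, 0.0, 1.0])
```

With a zero query the result is the simple average of the community vectors; a non-zero query shifts weight toward the communities whose vectors align with it, which is the sense in which attention "dynamically weighs" the inputs.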