Automated text processing: topic segmentation of educational texts

Bibliographic Details
Main Authors: Marina I. Solnyshkina, Iskander E. Yarmakeev, Elzara V. Gafiyatova, Farida Kh. Ismaeva
Format: Article
Language: English
Published: Samara State Technical University, 2019-09-01
Series: Вестник Самарского государственного технического университета. Серия: Психолого-педагогические науки (Vestnik of Samara State Technical University. Series: Psychological and Pedagogical Sciences)
Online Access: https://vestnik-pp.samgtu.ru/1991-8569/article/viewFile/52421/35874
Description
Summary: The article explores the problem of automatic quantitative assessment of text complexity and thematic segmentation of texts. The authors briefly describe the state of affairs in this area, noting in particular that existing readability formulas are genre-dependent and lose their reliability when applied to texts of other genres. Based on a corpus of educational texts and an analysis of quantitative text parameters, we suggest a new way of ranking texts so that they correspond to the linguistic abilities of pupils. The study was carried out on the material of UMK Spotlight 11; the corpus used in the study comprises 38 texts totalling 12,891 tokens. The methods used were topic segmentation, component analysis, statistical analysis, and Flesch-Kincaid readability. The text complexity assessment showed the following dynamics: (1) texts with tasks testing skimming skills progress from more complex to less complex (-0.2); (2) texts with tasks testing scanning skills progress from less complex to more complex (+0.4); (3) in texts for intensive reading, text readability rose by 5.2. Thematic segmentation performed with MonkeyLearn revealed vocabulary belonging to 15 topics, each of which is offered to students 3-5 times on average during the school year. The most frequent theme is "Humanities", which appears in 9 modules. Notably, the textbook authors offer the topics Gardening, Computers & Internet, Science & Mathematics, and Entertainment & Recreation only once during the school year.
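The summary names Flesch-Kincaid readability but does not say whether the authors used the Grade Level or the Reading Ease variant, or which tool computed it. The following is a minimal sketch of the standard Flesch-Kincaid Grade Level formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59, assuming a crude vowel-group syllable heuristic and regex-based sentence splitting. The helper names `count_syllables` and `flesch_kincaid_grade` are hypothetical and are not taken from the article.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A final 'e' is usually silent ("make"), except in endings like "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    # Every word has at least one syllable.
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level using the published constants 0.39, 11.8 and 15.59."""
    # Assumption: sentences end in '.', '!' or '?'; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of Latin letters, optionally with one apostrophe part.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    sample = "The cat sat on the mat."
    print(round(flesch_kincaid_grade(sample), 2))
```

Higher grade values indicate harder text, so comparing scores across a sequence of textbook texts is one way to express whether complexity rises or falls. Because the syllable heuristic is approximate, scores from this sketch may differ from those produced by dedicated readability tools.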
ISSN: 1991-8569; 2712-892X