Evaluating Handwritten Answers Using DeepSeek: A Comparative Analysis of Deep Learning-Based Assessment
Abstract: Artificial intelligence is revolutionizing the education sector by making learning more accessible, efficient, and customized, and recent advances have sparked significant interest in automating the evaluation of handwritten answers. Traditional handwritten evaluation is influenced by the evaluator's mental and physical state, environmental factors, human bias, emotional swings, and logistical challenges such as storage and retrieval. Although sequence-to-sequence neural networks and other existing AI evaluation methods have shown promise, they are constrained by their reliance on high-performance hardware such as GPUs, lengthy training periods, and difficulty handling varied scenarios. Bidirectional Encoder Representations from Transformers (BERT) overcame the drawbacks of earlier NLP techniques such as Bag of Words, TF-IDF, and Word2Vec, but BERT-based scoring still depends on surface-level keyword similarity, so accuracy suffers when a correct answer uses different wording. This study presents a technique that combines optical character recognition (OCR) with the DeepSeek-R1 1.5B model to create a robust, efficient, and accurate grading system. To overcome these challenges, the proposed evaluation technique uses the Google Cloud Vision API to extract handwritten responses and convert them into machine-readable text, providing pre-processed input for subsequent evaluation. The main aim of this study is a scalable, automated, and effective system for grading handwritten responses that combines DeepSeek for response evaluation with the Google Cloud Vision API for text extraction. To assess the proposed DeepSeek evaluation method, its results are compared with cosine similarity metrics. Across multiple assignments, DeepSeek's independent evaluation method gave the best results: the lowest MAE (0.0580), the lowest RMSE (0.147), and the strongest correlation (0.895). These findings indicate that the proposed technique is reliable and accurate.
| Main Authors: | Sanskar Bansal, Vinay Gupta, Eshita Gupta, Peeyush Garg |
|---|---|
| Affiliations: | Department of Electrical Engineering, Manipal University Jaipur (Bansal, V. Gupta, Garg); Department of AI & Machine Learning, Manipal University Jaipur (E. Gupta) |
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-08-01 |
| Series: | International Journal of Computational Intelligence Systems, Vol. 18, No. 1, pp. 1-16 |
| ISSN: | 1875-6883 |
| Subjects: | Large language model; DeepSeek; AI-based evaluation technique; Evaluating handwritten answer sheet |
| Online Access: | https://doi.org/10.1007/s44196-025-00946-w |
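The record describes a two-stage pipeline: handwritten responses are first converted into machine-readable text with the Google Cloud Vision API, and the resulting text is then graded by the DeepSeek-R1 1.5B model. Below is a minimal sketch of the OCR stage using the standard google-cloud-vision Python client; the file path is a hypothetical placeholder, and the authors' exact preprocessing is not specified in the record.

```python
# Minimal sketch of the OCR stage, assuming the google-cloud-vision client
# and a locally stored scan; "answer_sheet.jpg" is a hypothetical path.
from google.cloud import vision

def extract_handwritten_text(image_path: str) -> str:
    """Convert a scanned handwritten answer into machine-readable text."""
    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    # document_text_detection is the Vision API feature suited to dense
    # and handwritten text, as opposed to sparse scene text.
    response = client.document_text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text

student_text = extract_handwritten_text("answer_sheet.jpg")
```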
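For the grading stage, the record names the DeepSeek-R1 1.5B model but gives neither the authors' prompt nor their deployment. The sketch below assumes the publicly released deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B checkpoint loaded through Hugging Face transformers and a simple rubric prompt; it illustrates the idea, not the paper's implementation.

```python
# Hedged sketch of LLM-based grading. The checkpoint name and the prompt
# wording are assumptions; the paper's exact setup is not in the record.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def grade(reference: str, student: str) -> str:
    """Ask the model for a 0-1 score of the student answer."""
    prompt = (
        "Grade the student answer against the reference on a 0-1 scale "
        "and reply with only the score.\n"
        f"Reference answer: {reference}\n"
        f"Student answer: {student}\n"
        "Score:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=64)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```

Note that R1-style models typically emit chain-of-thought reasoning before a final answer, so in practice the generated text would need parsing to isolate a numeric score.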
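The record also reports that DeepSeek's scores were compared with a cosine similarity baseline using MAE, RMSE, and correlation against reference grades. A sketch of that comparison follows; the TF-IDF representation is an assumption (the record does not say which text representation the baseline uses), and the score arrays are placeholders, not the paper's data.

```python
# Sketch of the cosine-similarity baseline and the three reported metrics
# (MAE, RMSE, correlation). TF-IDF vectors are an assumed representation.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cosine_score(reference: str, student: str) -> float:
    """Baseline similarity between reference and student answers."""
    tfidf = TfidfVectorizer().fit_transform([reference, student])
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

def evaluate(predicted: np.ndarray, human: np.ndarray) -> dict:
    """MAE, RMSE, and Pearson correlation against human grades."""
    errors = predicted - human
    return {
        "MAE": float(np.mean(np.abs(errors))),
        "RMSE": float(np.sqrt(np.mean(errors ** 2))),
        "correlation": float(np.corrcoef(predicted, human)[0, 1]),
    }

# Placeholder arrays only; the paper's reported best values were
# MAE 0.0580, RMSE 0.147, and correlation 0.895.
print(evaluate(np.array([0.9, 0.7, 0.4]), np.array([1.0, 0.6, 0.5])))
```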