Evaluating Handwritten Answers Using DeepSeek: A Comparative Analysis of Deep Learning-Based Assessment

Bibliographic Details
Main Authors: Sanskar Bansal, Vinay Gupta, Eshita Gupta, Peeyush Garg
Format: Article
Language: English
Published: Springer 2025-08-01
Series: International Journal of Computational Intelligence Systems
Subjects:
Online Access: https://doi.org/10.1007/s44196-025-00946-w
Description
Summary: Artificial intelligence is revolutionizing the education sector by making learning more accessible, efficient, and customized. Recent advances in artificial intelligence have sparked significant interest in automating the evaluation of handwritten answers. Traditional handwritten evaluation is influenced by the evaluator's mental and physical state, environmental factors, human bias, emotional swings, and logistical challenges such as storage and retrieval. Although sequence-to-sequence neural networks and other existing AI evaluation methods have shown promise, they are constrained by their reliance on high-performance hardware such as GPUs, by lengthy training periods, and by difficulty handling varied scenarios. The state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) model has overcome the drawbacks of earlier NLP techniques such as Bag of Words, TF-IDF, and Word2Vec. However, BERT depends on surface-level keyword similarity: when a response expresses the correct idea using different keywords, its accuracy degrades. This study presents a technique that combines optical character recognition (OCR) with the DeepSeek-R1 1.5B model to create a robust, efficient, and accurate grading system. To overcome these challenges, the proposed evaluation technique uses the Google Cloud Vision API to extract handwritten responses and convert them into machine-readable text, providing pre-processed input for subsequent evaluation. The main aim of this study is to develop a scalable, automated, and effective system for grading handwritten responses by combining DeepSeek for response evaluation with the Google Cloud Vision API for text extraction. To assess the performance of the proposed DeepSeek evaluation method, its results are compared with cosine similarity metrics. Across multiple assignments, DeepSeek's independent evaluation method gave the best results: the lowest MAE (0.0580), the lowest RMSE (0.147), and the strongest correlation (0.895). These findings show that the proposed technique is reliable and accurate.
ISSN: 1875-6883
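
For readers who want a concrete picture of the pipeline the abstract describes, the Python sketch below illustrates the kind of components involved: handwritten-text extraction with the Google Cloud Vision API, a TF-IDF cosine-similarity baseline, prompt-based grading with a DeepSeek-R1 1.5B model, and the MAE, RMSE, and correlation metrics reported above. The model checkpoint (deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B), the grading prompt, the 0-to-1 score scale, and the score-parsing step are assumptions for illustration; the record does not publish these implementation details.

import re
import numpy as np
from scipy.stats import pearsonr
from google.cloud import vision
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

def ocr_handwritten_answer(image_path: str) -> str:
    # Google Cloud Vision's document_text_detection supports handwriting;
    # it returns the full page transcription as machine-readable text.
    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.document_text_detection(image=image)
    return response.full_text_annotation.text

def cosine_score(student: str, reference: str) -> float:
    # Baseline comparator: TF-IDF cosine similarity in [0, 1] between the
    # student's transcribed answer and the reference answer.
    tfidf = TfidfVectorizer().fit_transform([student, reference])
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

def deepseek_score(question: str, reference: str, student: str) -> float:
    # Assumed prompt-based grading with a distilled DeepSeek-R1 1.5B model;
    # the paper's exact prompt and decoding settings are not published here.
    grader = pipeline("text-generation",
                      model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
    prompt = (f"Question: {question}\n"
              f"Reference answer: {reference}\n"
              f"Student answer: {student}\n"
              "On a scale from 0 to 1, grade the student answer. Score:")
    text = grader(prompt, max_new_tokens=256)[0]["generated_text"]
    # Naive parse: take the last number in the model's completion as the score.
    numbers = re.findall(r"\d*\.\d+|\d+", text[len(prompt):])
    return float(numbers[-1]) if numbers else 0.0

def evaluation_metrics(predicted, human):
    # MAE, RMSE, and Pearson correlation against human-assigned scores,
    # the three metrics reported in the abstract.
    predicted, human = np.asarray(predicted), np.asarray(human)
    mae = float(np.mean(np.abs(predicted - human)))
    rmse = float(np.sqrt(np.mean((predicted - human) ** 2)))
    corr, _ = pearsonr(predicted, human)
    return mae, rmse, corr

Under this setup, a lower MAE and RMSE and a higher correlation against human grades indicate a better automated evaluator, which is how the abstract ranks DeepSeek's independent evaluation above the cosine-similarity baseline.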