
A Study of Sentence-BERT Based Essay Off-topic Detection

Published: 27 July 2023

Abstract

Automated essay scoring systems are widely used in education, and off-topic essay detection is an integral part of them. Traditional off-topic detection relies on text features represented in a vector space; this approach captures only the surface structure of essay sentences and depends on hand-crafted features. This paper proposes using the Sentence-BERT model to detect off-topic essays. The method first collects a large amount of high-quality data to build a corpus of off-topic essays. Two pre-trained models arranged as a Siamese (twin) network then embed the sentences of the essay topic and of the essay body, producing semantically rich sentence vectors. After the sentence vectors are mean-pooled, cosine similarity measures how close the topic is to the body, and the optimal threshold for flagging an essay as off-topic is selected through repeated training. Experimental results show that the proposed method improves accuracy, recall, and F1 by 9.5%, 11.2%, and 10.4%, respectively, over a Siamese network based on C-BGRU (Convolutional-Bidirectional Gated Recurrent Unit), and that it also performs well on topics with different degrees of divergence.
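
The scoring step described in the abstract (mean pooling, cosine similarity, and a decision threshold) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the sentence vectors have already been produced by a Sentence-BERT encoder and uses tiny hand-made vectors in their place. The function names, the choice to flag similarity strictly below the threshold, and the grid search over candidate thresholds scored by F1 are all assumptions made for illustration; the abstract says only that the threshold is selected through training.

```python
import numpy as np


def mean_pool(sentence_vectors):
    """Average a (num_sentences, dim) array into a single (dim,) vector."""
    return np.asarray(sentence_vectors, dtype=float).mean(axis=0)


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom > 0 else 0.0


def topic_body_similarity(topic_vectors, body_vectors):
    """Compare the pooled topic vector with the pooled body vector."""
    return cosine_similarity(mean_pool(topic_vectors), mean_pool(body_vectors))


def is_off_topic(topic_vectors, body_vectors, threshold):
    """Assumed rule: flag the essay when similarity falls below the threshold."""
    return topic_body_similarity(topic_vectors, body_vectors) < threshold


def f1_score(predicted, actual):
    """F1 for the positive class (True means off-topic)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_threshold(similarities, off_topic_labels, candidates):
    """Assumed procedure: pick the candidate threshold with the highest F1."""
    best_threshold, best_f1 = None, -1.0
    for t in candidates:
        predicted = [s < t for s in similarities]
        score = f1_score(predicted, off_topic_labels)
        if score > best_f1:
            best_threshold, best_f1 = t, score
    return best_threshold, best_f1


# Toy 3-dimensional stand-ins for Sentence-BERT sentence embeddings.
topic = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
on_topic_body = [[0.9, 0.1, 0.0], [1.0, 0.1, 0.1]]
off_topic_body = [[0.0, 1.0, 0.0], [0.0, 0.9, 0.3]]

print(is_off_topic(topic, on_topic_body, threshold=0.5))
print(is_off_topic(topic, off_topic_body, threshold=0.5))
```

With real data, the toy arrays would be replaced by encoder output for each sentence of the topic and the body, and `select_threshold` would be run on similarities computed over a labelled validation set.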



          Published In

          CNIOT '23: Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things
          May 2023
          1025 pages
          ISBN: 9798400700705
          DOI: 10.1145/3603781
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Author Tags

          1. Cosine similarity
          2. Off-topic essay detection
          3. Pre-training models
          4. Siamese network

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Conference

          CNIOT'23

          Acceptance Rates

          Overall Acceptance Rate 39 of 82 submissions, 48%
