Abstract
Error accumulation is a persistent problem in traditional offline handwritten mathematical expression recognition (OHMER) because offline handwritten formulas have a two-dimensional structure and are written in arbitrary styles. In this study, an OHMER method based on YOLOv5s was proposed. First, YOLOv5s was used to recognize the category and spatial location of each symbol in the expression image. Second, a spatial attention mechanism was introduced into YOLOv5s to enlarge the differences among symbol categories and improve accuracy. Third, a bidirectional long short-term memory network (BiLSTM) was introduced to give the symbols context-related information. Finally, the contextual relevance of the symbols was strengthened by increasing the number of BiLSTM layers, achieving an accuracy of 95.67%. A mathematical expression relationship tree was then built from the symbol recognition results, and clustering theory was used to analyze the two-dimensional structure of expressions. The expression recognition accuracy on the CROHME 2019 test set was 65.47%. The recognition rate of the resulting model, YOLOv5s_SB3CT, is second only to that of PAL, and it is higher than that of PAL when fewer than three errors are tolerated. This finding indicates that the proposed model is more fault-tolerant and stable than other models.
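To make the step from symbol detections to a two-dimensional structure more concrete, the following is a minimal sketch and not the method of this article. The article analyzes structure with clustering theory and a relationship tree; the sketch substitutes a much simpler geometric rule, in which a symbol is treated as a superscript or subscript when its vertical center is offset from the last baseline symbol by more than a fixed fraction of that symbol's height. The `Symbol` class, the `relation` and `to_latex` functions, and the `ratio` threshold of 0.3 are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Symbol:
    """One detected symbol: class label plus bounding box (y grows downward)."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def cy(self) -> float:
        return (self.y1 + self.y2) / 2

    @property
    def height(self) -> float:
        return self.y2 - self.y1


def relation(base: Symbol, cur: Symbol, ratio: float = 0.3) -> str:
    """Classify cur relative to base using an assumed vertical-offset threshold."""
    offset = cur.cy - base.cy
    if offset < -ratio * base.height:
        return "superscript"
    if offset > ratio * base.height:
        return "subscript"
    return "right"


def to_latex(symbols: List[Symbol]) -> str:
    """Read symbols left to right and group raised or lowered runs into scripts."""
    ordered = sorted(symbols, key=lambda s: s.x1)
    parts = []
    base = None    # last symbol placed on the baseline
    script = None  # kind of script group currently open, if any
    for sym in ordered:
        rel = "right" if base is None else relation(base, sym)
        if rel == "right":
            if script:
                parts.append("}")
                script = None
            parts.append(sym.label)
            base = sym
        else:
            if script != rel:
                if script:
                    parts.append("}")
                parts.append("^{" if rel == "superscript" else "_{")
                script = rel
            parts.append(sym.label)
    if script:
        parts.append("}")
    return "".join(parts)


# Hypothetical detections for a handwritten "x^2 + y_1".
detections = [
    Symbol("x", 0, 20, 20, 40),
    Symbol("2", 22, 5, 32, 20),
    Symbol("+", 40, 22, 55, 38),
    Symbol("y", 60, 20, 80, 44),
    Symbol("1", 82, 38, 90, 50),
]
print(to_latex(detections))  # x^{2}+y_{1}
```

A fixed threshold like this breaks down for very flat base symbols such as a minus sign or a fraction bar, and it cannot represent nested scripts or fractions, which is one reason a tree built from clustered symbol groups is a better fit for real expressions.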
Data availability
The data set used in this article comes from CROHME; the other data supporting this study's findings are available from the corresponding author upon reasonable request.
Acknowledgements
This work is partly supported by the National Natural Science Foundation of China (No. 61562009), the Open Fund Project in the Semiconductor Power Device Reliability Engineering Center of Ministry of Education (No. ERCMEKFJJ2019-06), Guizhou Provincial Science and Technology Projects (No. [2023]060), and Guizhou Provincial Science and Technology Support Plan (No. [2022]003).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Li, F., Fang, H., Wang, D. et al. Offline handwritten mathematical expression recognition based on YOLOv5s. Vis Comput 40, 1439–1452 (2024). https://doi.org/10.1007/s00371-023-02859-1
DOI: https://doi.org/10.1007/s00371-023-02859-1