Offline handwritten mathematical expression recognition based on YOLOv5s

  • Original article
The Visual Computer

Abstract

Traditional offline handwritten mathematical expression recognition (OHMER) suffers from error accumulation because offline handwritten mathematical formulas have a two-dimensional structure and are written in arbitrary styles. This study proposes an OHMER method based on YOLOv5s. First, YOLOv5s is used to recognize the category and spatial location of each symbol in the expression image. Second, a spatial attention mechanism is introduced into YOLOv5s to enlarge the differences among symbol categories and improve accuracy. Then, a bidirectional long short-term memory network (BiLSTM) is introduced to give the symbols context-related information. Finally, the contextual relevance of the symbols is improved by increasing the number of BiLSTM layers, achieving an accuracy of 95.67%. A relationship tree of the mathematical expression is then built from the symbol recognition results, and clustering theory is used to analyze the two-dimensional structure of the expression. The expression recognition accuracy on the CROHME 2019 test set is 65.47%. The recognition rate of the proposed model, YOLOv5s_SB3CT, is second only to that of PAL; however, when the error is less than three, the recognition rate of YOLOv5s_SB3CT is higher than that of PAL. This finding indicates that the proposed model is more fault-tolerant and stable than the other models.
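The comparison with PAL at different error levels can be illustrated with a small sketch of an error-tolerant recognition rate. This is not the evaluation code used in the article: it assumes, for illustration only, that each expression is a sequence of LaTeX-like tokens and that the number of errors is the edit distance between the predicted and ground-truth sequences. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (assumed error measure)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete a token from a
                            curr[j - 1] + 1,      # insert a token from b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def recognition_rate(predictions: List[Sequence[str]],
                     references: List[Sequence[str]],
                     max_errors: int = 0) -> float:
    """Fraction of expressions recognized with at most max_errors errors."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(1 for p, r in zip(predictions, references)
               if edit_distance(p, r) <= max_errors)
    return hits / len(references)


# Toy data (hypothetical): four ground-truth expressions and four predictions.
references = [
    ["x", "^", "2", "+", "1"],
    ["a", "+", "b"],
    ["\\frac", "{", "1", "}", "{", "2", "}"],
    ["y", "=", "x"],
]
predictions = [
    ["x", "^", "2", "+", "1"],            # exact match
    ["a", "-", "b"],                      # one substituted symbol
    ["\\frac", "{", "7", "}", "{", "2"],  # one substitution, one missing token
    ["z"],                                # three errors
]

for k in range(3):
    print(f"rate with at most {k} error(s):",
          recognition_rate(predictions, references, k))
```

A model whose rate rises quickly as the tolerated number of errors grows makes mostly small, local mistakes, which is the sense in which a higher rate under a small error budget points to fault tolerance.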

Data availability

The data set used in this article comes from CROHME; the other data supporting this study’s findings are available from the corresponding author upon reasonable request.

Acknowledgements

This work was partly supported by the National Natural Science Foundation of China (No. 61562009), the Open Fund Project in Semiconductor Power Device Reliability Engineering Center of Ministry of Education (No. ERCMEKFJJ2019-06), the Guizhou Provincial Science and Technology Projects (No. [2023]060), and the Guizhou Provincial Science and Technology Support Plan (No. [2022]003).

Author information

Corresponding author

Correspondence to Benliang Xie.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, F., Fang, H., Wang, D. et al. Offline handwritten mathematical expression recognition based on YOLOv5s. Vis Comput 40, 1439–1452 (2024). https://doi.org/10.1007/s00371-023-02859-1

  • DOI: https://doi.org/10.1007/s00371-023-02859-1
