Labeling Document Images for E-Commence Products with Tree-Based Segment Re-organizing and Hierarchical Transformer

  • Conference paper
  • Published in: Document Analysis and Recognition – ICDAR 2021 Workshops (ICDAR 2021)
  • Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12917)


Abstract

Document images of products are widely used in e-commerce. As a special kind of data, their contents are highly diverse: text can be scattered anywhere among pictures, and both short text snippets and long text chunks occur. To predict text labels in document images, we propose a two-stage approach. The first stage, tree-based segment re-organizing, recovers text order and text connections through hierarchical clustering, segment reordering, and segment merging. The second stage, a hierarchical transformer, generates segment embeddings and predicts segment labels using a segment-level and a document-level encoder. We empirically study the effect of incorporating different features and compare two kinds of attention for aggregating context, in which distance and direction are measured in 1D and 2D, respectively. Experiments on a real-world dataset show that the proposed segment re-organizing method reduces the input size fed to the labeling model by about 40% with negligible impact on performance. For the hierarchical transformer, we show empirically that a document encoder using 1D attention is more effective than one using 2D attention.
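
The two stages can be illustrated with short sketches. The following is a minimal Python sketch, not the authors' implementation, of the first stage (tree-based segment re-organizing): OCR segments, represented as (text, bounding box) pairs, are grouped by agglomerative clustering on their box centers, re-ordered top-to-bottom and left-to-right within each cluster, and merged into longer chunks. The distance threshold, the single-linkage criterion, and the simple reading-order sort are illustrative assumptions.

```python
# Sketch of stage 1: tree-based segment re-organizing (hierarchical clustering,
# segment reordering, segment merging). Assumed inputs: OCR segments with
# axis-aligned bounding boxes in image coordinates.
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


@dataclass
class Segment:
    text: str
    box: Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def reorganize(segments: List[Segment], dist_threshold: float = 60.0) -> List[str]:
    """Cluster, reorder, and merge OCR segments into larger text chunks."""
    centers = np.array([[(s.box[0] + s.box[2]) / 2.0, (s.box[1] + s.box[3]) / 2.0]
                        for s in segments])
    # Build the cluster tree on 2-D box centers, then cut it at a distance threshold.
    tree = linkage(centers, method="single")
    labels = fcluster(tree, t=dist_threshold, criterion="distance")

    merged: List[str] = []
    # Visit clusters from top of the page to bottom (smallest y of any member).
    for cluster_id in sorted(set(labels),
                             key=lambda c: centers[labels == c][:, 1].min()):
        members = [s for s, lab in zip(segments, labels) if lab == cluster_id]
        # Reading order inside a cluster: top-to-bottom, then left-to-right.
        members.sort(key=lambda s: (s.box[1], s.box[0]))
        merged.append(" ".join(s.text for s in members))
    return merged


if __name__ == "__main__":
    demo = [Segment("Price:", (10, 10, 60, 30)),
            Segment("$9.99", (65, 10, 110, 30)),
            Segment("Ships worldwide", (10, 400, 200, 420))]
    print(reorganize(demo))  # ['Price: $9.99', 'Ships worldwide']
```

The second stage can be sketched in the same hedged spirit: a segment-level encoder pools token embeddings into one vector per segment (in the paper a pretrained encoder such as the ALBERT model linked in the notes below plays this role), a document-level transformer encoder exchanges context across the segment sequence produced by stage one (the 1D-attention variant, i.e. plain self-attention over the ordered segments), and a linear head predicts a label per segment. All shapes, layer counts, and the mean-pooling choice are assumptions for illustration, not the paper's exact architecture.

```python
# Sketch of stage 2: hierarchical transformer (segment-level encoder,
# document-level encoder with 1D attention, per-segment label head).
import torch
import torch.nn as nn


class HierarchicalLabeler(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 128,
                 num_labels: int = 5, nhead: int = 4):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        seg_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.segment_encoder = nn.TransformerEncoder(seg_layer, num_layers=2)
        doc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.document_encoder = nn.TransformerEncoder(doc_layer, num_layers=2)
        self.classifier = nn.Linear(d_model, num_labels)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (num_segments, seg_len) for a single document.
        tokens = self.segment_encoder(self.tok_emb(token_ids))
        seg_emb = tokens.mean(dim=1)                           # pool tokens -> segment vectors
        doc_ctx = self.document_encoder(seg_emb.unsqueeze(0))  # segments as a 1D sequence
        return self.classifier(doc_ctx.squeeze(0))             # (num_segments, num_labels)


model = HierarchicalLabeler(vocab_size=21128)      # vocab size is an assumption
logits = model(torch.randint(0, 21128, (6, 32)))   # 6 segments of 32 tokens each
print(logits.shape)                                # torch.Size([6, 5])
```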


Notes

  1. https://github.com/paddlepaddle/paddleocr/.
  2. https://www.jd.com/.
  3. https://huggingface.co/voidful/albert_chinese_tiny.


Author information

Correspondence to Peng Li.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Li, P., Yuan, P., Li, Y., Bao, Y., Yan, W. (2021). Labeling Document Images for E-Commence Products with Tree-Based Segment Re-organizing and Hierarchical Transformer. In: Barney Smith, E.H., Pal, U. (eds) Document Analysis and Recognition – ICDAR 2021 Workshops. ICDAR 2021. Lecture Notes in Computer Science, vol 12917. Springer, Cham. https://doi.org/10.1007/978-3-030-86159-9_31

  • DOI: https://doi.org/10.1007/978-3-030-86159-9_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86158-2

  • Online ISBN: 978-3-030-86159-9

  • eBook Packages: Computer Science (R0)
