MAA-PTG: multimodal aspect-aware product title generation

Mengli Zhang ORCID: orcid.org/0000-0002-7411-2561¹,
Zhou Gang¹,
Wanting Yu¹,
Ningbo Huang¹ &
Wenfen Liu²

Abstract

For e-commerce platforms, high-quality product titles are vital to facilitating transactions. A concise, accurate, and informative product title not only stimulates consumers' desire to buy but also gives them a precise shopping guide. However, previous work relies mainly on manual rules and templates, which limits the generalization ability of the model and leaves dominant product aspects out of the generated titles. In this paper, we propose a Transformer-based Multimodal Aspect-Aware Product Title Generation model, denoted MAA-PTG, which integrates the visual and textual information of a product to generate a valuable title. Specifically, on the decoder side, we construct an image cross-attention layer to incorporate local image features. We then explore various strategies for fusing product aspects and global image features. During training, we adopt an aspect-based reward augmented maximum likelihood (RAML) strategy that encourages the model to generate titles covering the key product aspects. We construct an e-commerce product dataset consisting of product-title pairs. Experimental results on this dataset show that, compared with competitive methods, MAA-PTG has significant advantages in ROUGE score and human evaluation.
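
As a rough illustration of the RAML idea mentioned in the abstract, the sketch below weights candidate titles by an exponentiated reward, q(y) proportional to exp(r(y) / tau), and uses those weights in a negative log-likelihood. The abstract does not specify the reward, so the aspect-coverage reward, the function names, the temperature `tau`, and the toy candidates are all assumptions made for this example and should not be read as the authors' implementation.

```python
import math


def aspect_coverage(title_tokens, key_aspects):
    """Hypothetical reward: fraction of key aspects that appear in the title."""
    if not key_aspects:
        return 0.0
    covered = sum(1 for aspect in key_aspects if aspect in title_tokens)
    return covered / len(key_aspects)


def raml_weights(candidates, key_aspects, tau=0.5):
    """Exponentiated-payoff weights: q(y) proportional to exp(r(y) / tau)."""
    rewards = [aspect_coverage(c, key_aspects) for c in candidates]
    max_reward = max(rewards)  # subtract the max for numerical stability
    exps = [math.exp((r - max_reward) / tau) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]


def raml_loss(log_probs, weights):
    """Weighted negative log-likelihood: -sum_y q(y) * log p(y | x)."""
    return -sum(w * lp for w, lp in zip(weights, log_probs))


# Toy data (assumed): three candidate titles and the product's key aspects.
candidates = [
    ["cotton", "summer", "dress", "floral"],
    ["summer", "dress"],
    ["new", "arrival", "hot", "sale"],
]
key_aspects = ["cotton", "dress", "floral"]

weights = raml_weights(candidates, key_aspects)
# Placeholder model log-probabilities for each candidate (assumed values).
loss = raml_loss([-2.0, -1.0, -0.5], weights)
```

Candidates that cover more key aspects receive larger weights, so the likelihood term pulls the model toward aspect-rich titles. A lower `tau` concentrates the weight on the best-covered candidate, and a higher `tau` spreads it more evenly.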

Code Availability

The data that support the findings of this study are not openly available due to business sensitivity and are available from the corresponding author upon reasonable request.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61862011) and the Guangxi Science and Technology Foundation (2019GXNSFGA245004).

Author information

Authors and Affiliations

State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, 450000, China
Mengli Zhang, Zhou Gang, Wanting Yu & Ningbo Huang
Guilin University of Electronic Technology, Guilin, 541000, China
Wenfen Liu

Corresponding author

Correspondence to Mengli Zhang.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, M., Gang, Z., Yu, W. et al. MAA-PTG: multimodal aspect-aware product title generation. J Intell Inf Syst 59, 213–235 (2022). https://doi.org/10.1007/s10844-022-00695-8

Received: 05 October 2021
Revised: 09 December 2021
Accepted: 16 January 2022
Published: 22 February 2022
Issue Date: August 2022
DOI: https://doi.org/10.1007/s10844-022-00695-8
