Abstract
For e-commerce platforms, high-quality product titles are vital to facilitating transactions. A concise, accurate, and informative product title not only stimulates consumers' desire to buy the product, but also gives them precise shopping guidance. However, previous work is mainly based on manual rules and templates, which limits the generalization ability of the models and yields titles that omit dominant product aspects. In this paper, we propose a Transformer-based Multimodal Aspect-Aware Product Title Generation model, denoted MAA-PTG, which integrates the visual and textual information of a product to generate a valuable title. Specifically, on the decoder side, we construct an image cross-attention layer to incorporate local image features. We then explore various strategies for fusing product aspects with global image features. During training, we also adopt an aspect-based reward augmented maximum likelihood (RAML) strategy that encourages the model to generate product titles covering the key product aspects. We construct an e-commerce product dataset consisting of product-title pairs. Experimental results on this dataset show that, compared with competitive methods, MAA-PTG has significant advantages in both ROUGE score and human evaluation.
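The aspect-based RAML idea mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the reward function `aspect_reward` (fraction of key aspects covered by a candidate title), the temperature `tau`, and the toy candidates are all assumptions made for illustration. Only the general RAML form, in which candidates are weighted by an exponentiated, normalized reward, follows the standard formulation.

```python
import math

def aspect_reward(candidate_tokens, aspects):
    """Hypothetical reward: fraction of key aspects present in the candidate title."""
    if not aspects:
        return 0.0
    present = set(candidate_tokens)
    return sum(1 for a in aspects if a in present) / len(aspects)

def payoff_distribution(rewards, tau=0.5):
    """Exponentiated payoff distribution q(y) proportional to exp(r(y) / tau)."""
    m = max(rewards)  # subtract the max for numerical stability
    exps = [math.exp((r - m) / tau) for r in rewards]
    z = sum(exps)
    return [e / z for e in exps]

def raml_loss(log_probs, rewards, tau=0.5):
    """Reward-weighted negative log-likelihood over candidate titles."""
    q = payoff_distribution(rewards, tau)
    return -sum(w * lp for w, lp in zip(q, log_probs))

# Toy example (assumed data): three candidate titles for one product.
aspects = ["cotton", "slim", "long-sleeve"]
candidates = [
    "men cotton slim long-sleeve shirt".split(),
    "men cotton shirt".split(),
    "men shirt".split(),
]
rewards = [aspect_reward(c, aspects) for c in candidates]
weights = payoff_distribution(rewards, tau=0.5)
# Assumed model log-probabilities for the three candidates.
loss = raml_loss([-1.0, -2.0, -3.0], rewards, tau=0.5)
```

In this sketch, candidates that cover more of the key aspects receive larger weights, so the training signal favors titles that mention them. A lower `tau` concentrates the weight on the highest-reward candidates.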
Code Availability
The data that support the findings of this study are not openly available due to business sensitivity and are available from the corresponding author upon reasonable request.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (61862011) and the Guangxi Science and Technology Foundation (2019GXNSFGA245004).
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Cite this article
Zhang, M., Gang, Z., Yu, W. et al. MAA-PTG: multimodal aspect-aware product title generation. J Intell Inf Syst 59, 213–235 (2022). https://doi.org/10.1007/s10844-022-00695-8
DOI: https://doi.org/10.1007/s10844-022-00695-8