DOI:10.1145/3664647.3680946
research-article

LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM

Published: 28 October 2024

Abstract

Although large multi-modality models (LMMs) have seen extensive exploration and application in various quality assessment studies, their integration into Point Cloud Quality Assessment (PCQA) remains unexplored. Given LMMs' exceptional performance and robustness in low-level vision and quality assessment tasks, this study aims to investigate the feasibility of imparting PCQA knowledge to LMMs through text supervision. To achieve this, we transform quality labels into textual descriptions during the fine-tuning phase, enabling LMMs to derive quality rating logits from 2D projections of point clouds. To compensate for the loss of perception in the 3D domain, structural features are extracted as well. These quality logits and structural features are then combined and regressed into quality scores. Our experimental results affirm the effectiveness of our approach, showcasing a novel integration of LMMs into PCQA that enhances model understanding and assessment accuracy. We hope our contributions can inspire subsequent investigations into the fusion of LMMs with PCQA, fostering advancements in 3D visual quality analysis and beyond. The code is available at https://github.com/zzc-1998/LMM-PCQA.
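
The two data-handling steps named in the abstract, turning numeric quality labels into text and fusing rating logits with structural features before regression, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the five rating words, the sentence template, the equal-width binning, and the use of closed-form ridge regression are all assumptions made here, and the names `score_to_text`, `describe`, `fuse_and_fit`, and `predict` are hypothetical. The released code at the URL above is the authoritative version.

```python
import numpy as np

# Hypothetical five-level rating vocabulary (an assumption, not taken from the paper).
LEVELS = ["bad", "poor", "fair", "good", "excellent"]


def score_to_text(score, lo, hi):
    """Map a numeric quality label in [lo, hi] to a rating word by equal-width binning."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Clamp the score into the label range, then normalise it to [0, 1].
    frac = (min(max(score, lo), hi) - lo) / (hi - lo)
    # The top of the range falls into the last bin rather than past it.
    idx = min(int(frac * len(LEVELS)), len(LEVELS) - 1)
    return LEVELS[idx]


def describe(score, lo, hi):
    """Build a textual supervision target; the sentence template is hypothetical."""
    return f"The quality of the point cloud is {score_to_text(score, lo, hi)}."


def _design_matrix(logits, structural):
    # Concatenate rating logits, structural features, and a constant bias column.
    ones = np.ones((logits.shape[0], 1))
    return np.hstack([logits, structural, ones])


def fuse_and_fit(logits, structural, targets, ridge=1e-3):
    """Fit a ridge regressor on the fused features (a stand-in for the paper's regressor)."""
    X = _design_matrix(logits, structural)
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ targets)


def predict(weights, logits, structural):
    """Regress fused features into quality scores with previously fitted weights."""
    return _design_matrix(logits, structural) @ weights
```

Here `logits` would hold one row of rating-word logits per point cloud, taken from the fine-tuned LMM on its 2D projections, and `structural` one row of 3D structural features. Both are plain arrays in this sketch, so any feature extractor can be plugged in.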

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN:9798400706868
DOI:10.1145/3664647
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. large multi-modality model
  2. point cloud quality assessment

Qualifiers

  • Research-article

Conference

MM '24
MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 64
  • Downloads (Last 12 months): 64
  • Downloads (Last 6 weeks): 64
Reflects downloads up to 30 Nov 2024
