LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM

Authors:

Guangtao Zhai

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

Pages 7783 - 7792

https://doi.org/10.1145/3664647.3680946

Published: 28 October 2024

Abstract

Although large multi-modality models (LMMs) have seen extensive exploration and application in various quality assessment studies, their integration into Point Cloud Quality Assessment (PCQA) remains unexplored. Given LMMs' exceptional performance and robustness in low-level vision and quality assessment tasks, this study aims to investigate the feasibility of imparting PCQA knowledge to LMMs through text supervision. To achieve this, we transform quality labels into textual descriptions during the fine-tuning phase, enabling LMMs to derive quality rating logits from 2D projections of point clouds. To compensate for the loss of perception in the 3D domain, structural features are extracted as well. These quality logits and structural features are then combined and regressed into quality scores. Our experimental results affirm the effectiveness of our approach, showcasing a novel integration of LMMs into PCQA that enhances model understanding and assessment accuracy. We hope our contributions can inspire subsequent investigations into the fusion of LMMs with PCQA, fostering advancements in 3D visual quality analysis and beyond. The code is available at https://github.com/zzc-1998/LMM-PCQA.
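
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the five-level vocabulary, the 0 to 5 score range, the prompt wording, and the helper names (`score_to_level`, `make_training_text`, `combine`, `linear_score`) are all assumptions introduced here for illustration. The sketch shows how a numeric quality label might be turned into a text target for fine-tuning, and how rating logits and structural features might be concatenated before a simple regression.

```python
# Hypothetical five-level rating vocabulary (an assumption, not from the abstract).
LEVELS = ["bad", "poor", "fair", "good", "excellent"]


def score_to_level(score, lo=0.0, hi=5.0):
    """Map a numeric quality label in [lo, hi] to one of the text levels."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Normalise to [0, 1], then pick one of len(LEVELS) equal-width bins.
    t = min(max((score - lo) / (hi - lo), 0.0), 1.0)
    index = min(int(t * len(LEVELS)), len(LEVELS) - 1)
    return LEVELS[index]


def make_training_text(score):
    """Build a text target for supervision (the wording is a placeholder)."""
    return f"The quality of the point cloud is {score_to_level(score)}."


def combine(level_logits, structural_features):
    """Concatenate rating logits (from 2D projections) with 3D structural features."""
    return list(level_logits) + list(structural_features)


def linear_score(features, weights, bias):
    """Stand-in linear regressor that maps combined features to a quality score."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias
```

For example, `make_training_text(2.5)` yields a sentence ending in "fair" under these assumed bins. In practice the regressor would be fitted on training data rather than given hand-set weights.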

Index Terms

LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM
1. Computing methodologies
  1. Artificial intelligence
2. Human-centered computing
  1. Visualization
    1. Visualization design and evaluation methods

Information & Contributors

Information

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

October 2024

11719 pages

ISBN: 9798400706868

DOI: 10.1145/3664647

General Chairs:
Jianfei Cai (Monash University, Australia)
Mohan Kankanhalli (NUS, Singapore)
Balakrishnan Prabhakaran (UT Dallas, USA)
Susanne Boll (University of Oldenburg, Germany)

Program Chairs:
Ramanathan Subramanian (University of Canberra & IIT Ropar, Australia)
Liang Zheng (Australian National University, Australia)
Vivek K. Singh (Rutgers University, USA)
Pablo Cesar (Centrum Wiskunde & Informatica, Netherlands)
Lexing Xie (Australian National University, Australia)
Dong Xu (University of Hong Kong, Hong Kong)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '24

Sponsor:

SIGMM

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne VIC, Australia
