VIVID: Virtual Environment for Visual Deep Learning

Authors:

Ming-Syan Chen

MM '18: Proceedings of the 26th ACM international conference on Multimedia

Pages 1356 - 1359

https://doi.org/10.1145/3240508.3243653

Published: 15 October 2018

Abstract

Due to advances in deep reinforcement learning and the demand for large amounts of training data, virtual-to-real learning has recently attracted considerable attention from the computer vision community. As state-of-the-art 3D engines can generate photo-realistic images suitable for training deep neural networks, researchers have gradually applied 3D virtual environments to learn different tasks, including autonomous driving, collision avoidance, and image segmentation. Although many open-source simulation environments are readily available, most of them either provide only small scenes or offer limited interaction with objects in the environment. To facilitate visual recognition learning, we present a new Virtual Environment for Visual Deep Learning (VIVID), which offers large-scale, diversified indoor and outdoor scenes. Moreover, VIVID leverages an advanced human skeleton system, which enables us to simulate numerous complex human actions. VIVID has a wide range of applications and can be used for learning indoor navigation, action recognition, event detection, etc. We also release several deep learning examples in Python to demonstrate the capabilities and advantages of our system.

Index Terms

VIVID: Virtual Environment for Visual Deep Learning
1. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
      1. Virtual reality
  2. Machine learning

Published In

MM '18: Proceedings of the 26th ACM international conference on Multimedia

October 2018

2167 pages

ISBN:9781450356657

DOI:10.1145/3240508

General Chairs:
Susanne Boll, University of Oldenburg, Germany
Kyoung Mu Lee, Seoul National University, Korea
Jiebo Luo, University of Rochester, USA
Wenwu Zhu, Tsinghua University, China

Program Chairs:
Hyeran Byun, Yonsei University, Korea
Chang Wen Chen, State Univ. of New York at Buffalo, USA
Rainer Lienhart, University of Augsburg, Germany
Tao Mei, JD AI, China

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '18

Sponsor:

SIGMM

MM '18: ACM Multimedia Conference

October 22 - 26, 2018

Seoul, Republic of Korea

Acceptance Rates

MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Total Citations: 15
Total Downloads: 494
Downloads (Last 12 months): 34
Downloads (Last 6 weeks): 6

Reflects downloads up to 10 Nov 2024

Cited By

Shen M, Tan Z, Niyato D, Liu Y, Kang J, Xiong Z, Zhu L, Wang W, and Shen X. 2024. Artificial Intelligence for Web 3.0: A Comprehensive Survey. ACM Computing Surveys 56:10 (1-39). Online publication date: 14-May-2024.
https://dl.acm.org/doi/10.1145/3657284

Luo H, Luo J, and Vasilakos A. 2024. BC4LLM: A perspective of trusted artificial intelligence when blockchain meets large language models. Neurocomputing 599 (128089). Online publication date: Sep-2024.
https://doi.org/10.1016/j.neucom.2024.128089

Yao D, Zhu M, Zhu H, Cai W, and Zhou L. 2024. Integrating synthetic datasets with CLIP semantic insights for single image localization advancements. ISPRS Journal of Photogrammetry and Remote Sensing 218 (198-213). Online publication date: Dec-2024.
https://doi.org/10.1016/j.isprsjprs.2024.10.027

Shen M, Tang X, Wang W, and Zhu L. 2024. Introduction of Web 3.0. Security and Privacy in Web 3.0 (1-14). Online publication date: 10-Jul-2024.
https://doi.org/10.1007/978-981-97-5752-7_1

Madan N, Siemon M, Gjerde M, Petersson B, Grotuzas A, Esbensen M, Nikolov I, Philipsen M, Nasrollahi K, and Moeslund T. 2023. ThermalSynth: A Novel Approach for Generating Synthetic Thermal Human Scenarios. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW) (130-139). Online publication date: Jan-2023.
https://doi.org/10.1109/WACVW58289.2023.00018

Lai K, Chung Y, Su J, Lai C, and Huang Y. 2023. AI Wings: An AIoT Drone System for Commanding ArduPilot UAVs. IEEE Systems Journal 17:2 (2213-2224). Online publication date: Jun-2023.
https://doi.org/10.1109/JSYST.2022.3189011

You W, Huang C, Hu K, Liu T, and Lai K. 2023. Augmented Reality for Real Object Detection. 2023 International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan) (803-804). Online publication date: 17-Jul-2023.
https://doi.org/10.1109/ICCE-Taiwan58799.2023.10227067

Gokce Narin N. 2023. The Role of Artificial Intelligence and Robotic Solution Technologies in Metaverse Design. Metaverse (45-63). Online publication date: 13-Oct-2023.
https://doi.org/10.1007/978-981-99-4641-9_4

Li J, Cai S, Yang Q, and Huang H. 2023. How to Enrich Metaverse? Blockchains, AI, and Digital Twin. From Blockchain to Web3 & Metaverse (27-61). Online publication date: 25-May-2023.
https://doi.org/10.1007/978-981-99-3648-9_2

Yang Q, Zhao Y, Huang H, Xiong Z, Kang J, and Zheng Z. 2022. Fusing Blockchain and AI With Metaverse: A Survey. IEEE Open Journal of the Computer Society 3 (122-136). Online publication date: 2022.
https://doi.org/10.1109/OJCS.2022.3188249
