- research-article, October 2018
Unprecedented Usage of Pre-trained CNNs on Beauty Product
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 2068–2072
https://doi.org/10.1145/3240508.3266433
How does a pre-trained Convolutional Neural Network (CNN) model perform on beauty and personal care items (i.e., Perfect-500K)? This is the question we attempt to answer in this paper by adopting several well-known deep learning models pre-trained on ...
- research-article, October 2018
On Reducing Effort in Evaluating Laparoscopic Skills
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 815–819
https://doi.org/10.1145/3240508.3243934
Training and evaluation of laparoscopic skills have become an important aspect of young surgeons' education. The evaluation process is currently performed manually by experienced surgeons through reviewing video recordings of laparoscopic procedures for ...
- research-article, October 2018
VIVID: Virtual Environment for Visual Deep Learning
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 1356–1359
https://doi.org/10.1145/3240508.3243653
Due to advances in deep reinforcement learning and the demand for large training data, virtual-to-real learning has recently gained considerable attention from the computer vision community. As state-of-the-art 3D engines can generate photo-realistic images ...
- research-article, October 2018
Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 1976–1983
https://doi.org/10.1145/3240508.3241911
Speechreading, or lipreading, is the technique of understanding speech and extracting phonetic features from a speaker's visual features, such as the movement of the lips, face, teeth, and tongue. It has a wide range of multimedia applications, such as surveillance, ...
- demonstration, October 2018
Magical Rice Bowl: A Real-time Food Category Changer
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 1244–1246
https://doi.org/10.1145/3240508.3241391
In this demo, we demonstrate "Real-time Food Category Change" based on a Conditional Cycle GAN (cCycle GAN) using large-scale food image data collected from the Twitter Stream. Conditional Cycle GAN is an extension of CycleGAN, which enables "Food ...
- research-article, October 2018
Partial Multi-view Subspace Clustering
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 1794–1801
https://doi.org/10.1145/3240508.3240679
For many real-world multimedia applications, data are often described by multiple views. Therefore, multi-view learning research is of great significance. Traditional multi-view clustering methods assume that each view has complete data. However, ...
- research-article, October 2018
A Large-scale RGB-D Database for Arbitrary-view Human Action Recognition
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 1510–1518
https://doi.org/10.1145/3240508.3240675
Current research mainly focuses on single-view and multi-view human action recognition, which can hardly satisfy the requirement of human-robot interaction (HRI) applications to recognize actions from arbitrary views. The lack of databases also sets up ...
- research-article, October 2018
Generating Defensive Plays in Basketball Games
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 1580–1588
https://doi.org/10.1145/3240508.3240670
In this paper, we present a method to generate realistic defensive plays in a basketball game based on the ball and the offensive team's movements. Our system allows players and coaches to simulate how the opposing team will react to a newly developed ...
- research-article, October 2018
SibNet: Sibling Convolutional Encoder for Video Captioning
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 1425–1434
https://doi.org/10.1145/3240508.3240667
Video captioning is a challenging task owing to the complexity of understanding the copious visual information in videos and describing it using natural language. Different from previous work that encodes video information using a single flow, in this ...
- research-article, October 2018
Learning Local Descriptors with Adversarial Enhancer from Volumetric Geometry Patches
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 1466–1474
https://doi.org/10.1145/3240508.3240666
Local matching problems (e.g., keypoint matching, geometry registration) are significant but challenging tasks in the computer vision field. In this paper, we propose to learn a robust local 3D descriptor from volumetric point patches to tackle the local ...
- research-article, October 2018
Enhancing Visual Question Answering Using Dropout
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 1002–1010
https://doi.org/10.1145/3240508.3240662
Using dropout in Visual Question Answering (VQA) is a common practice to prevent overfitting. However, in multi-path networks, the current way to use dropout may cause two problems: the co-adaptations of neurons and the explosion of output variance. In ...
- research-article, October 2018
User-Guided Deep Anime Line Art Colorization with Conditional Adversarial Networks
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 1536–1544
https://doi.org/10.1145/3240508.3240661
Scribble-color-based line art colorization is a challenging computer vision problem, since neither greyscale values nor semantic information is present in line art, and the lack of authentic illustration-line art training pairs also increases ...
- research-article, October 2018
Fully Point-wise Convolutional Neural Network for Modeling Statistical Regularities in Natural Images
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 984–992
https://doi.org/10.1145/3240508.3240653
Modeling statistical regularity plays an essential role in ill-posed image processing problems. Recently, deep-learning-based methods have been presented to implicitly learn statistical representations of pixel distributions in natural images and ...
- research-article, October 2018
Attentive Recurrent Neural Network for Weak-supervised Multi-label Image Classification
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 1092–1100
https://doi.org/10.1145/3240508.3240649
Multi-label image classification is a fundamental and challenging task in computer vision that has recently achieved significant progress by exploiting semantic relations among labels. However, the spatial positions of labels for multi-label images are ...
- research-article, October 2018
End-to-End Blind Quality Assessment of Compressed Videos Using Deep Neural Networks
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 546–554
https://doi.org/10.1145/3240508.3240643
Blind video quality assessment (BVQA) algorithms are traditionally designed with a two-stage approach: a feature extraction stage that typically computes hand-crafted spatial and/or temporal features, and a regression stage working in the feature space ...
- research-article, October 2018
ThoughtViz: Visualizing Human Thoughts Using Generative Adversarial Network
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 950–958
https://doi.org/10.1145/3240508.3240641
Studying human brain signals has always attracted great attention from the scientific community. In Brain Computer Interface (BCI) research, for example, changes in brain signals in relation to specific tasks (e.g., thinking something) are detected and ...
- research-article, October 2018
An ADMM-Based Universal Framework for Adversarial Attacks on Deep Neural Networks
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 1065–1073
https://doi.org/10.1145/3240508.3240639
Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks. That is, adversarial examples, obtained by adding delicately crafted distortions onto original legal inputs, can mislead a DNN into classifying them as any target label. In a ...
- research-article, October 2018
Learning and Fusing Multimodal Deep Features for Acoustic Scene Categorization
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 1892–1900
https://doi.org/10.1145/3240508.3240631
Convolutional Neural Networks (CNNs) have recently been widely applied to audio classification, with promising results. Previous CNN-based systems mostly learn from two-dimensional time-frequency representations such as MFCC and ...
- research-article, October 2018
BeautyGAN: Instance-level Facial Makeup Transfer with Deep Generative Adversarial Network
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 645–653
https://doi.org/10.1145/3240508.3240618
Facial makeup transfer aims to translate the makeup style from a given reference makeup face image to another non-makeup one while preserving face identity. Such an instance-level transfer problem is more challenging than conventional domain-level ...
- research-article, October 2018
WildFish: A Large Benchmark for Fish Recognition in the Wild
MM '18: Proceedings of the 26th ACM international conference on Multimedia, Pages 1301–1309
https://doi.org/10.1145/3240508.3240616
Fish recognition is an important task for understanding the marine ecosystem and biodiversity. It is often challenging to identify fish species in the wild, due to the following difficulties. First, most fish benchmarks are small-scale, which may limit the ...