Symmetry in Artificial Visual Perception and Its Application

A special issue of Symmetry (ISSN 2073-8994). This special issue belongs to the section "Computer".

Deadline for manuscript submissions: closed (30 November 2021) | Viewed by 47884

Special Issue Editor


Prof. Dr. Janghoon Yang
Guest Editor
Department of New Media, Seoul Media Institute of Technology, Seoul 07590, Korea
Interests: artificial intelligence; control theory; neuroscience; affective computing; reinforcement learning; intervention for special education; wireless systems

Special Issue Information

Dear Colleagues,

Symmetry in the human neural structure has a great impact on perception. People tend to perceive an object based on salient visual regularity, and the concept of symmetry is often adopted in deep neural networks to construct an efficient network structure tailored for a specific task. With the enormous advancement of artificial intelligence, artificial visual perception now outperforms humans in some specific tasks. However, even tasks at which artificial intelligence is known to be highly successful often fail in the presence of unexpected uncertainty. While much research on artificial visual perception has studied object classification or segmentation, artificial visual perception is also starting to make progress in cyber-physical systems through convergence with other fields such as control and robotics. In the future, a virtual human with artificial vision capabilities beyond the human biological vision system is expected, and its operation is likely to be realized through interaction among many different coupled subsystems. This implies that the task of visual perception will probably be modeled in conjunction with other sensory or motor systems. Recent advances in multimodal deep learning offer a glimpse of such a virtual human. Advances in artificial visual perception are already producing fascinating applications in fields such as autonomous robots, human–computer interfaces, autonomous driving, smart factories, medical systems, national security, and even the fashion industry.

This Special Issue aims to highlight and advance contemporary research on artificial visual perception and its application to various fields. Theory can be developed further by exploiting ideas from other fields, while applications can be improved through a deeper understanding of the underlying principles. We invite both original research and review articles that organize recent research results in a unified and systematic way.

Suggested topics include but are not limited to:

  • Artificial visual perception with partial information.
  • Artificial visual perception for sensorimotor control.
  • Visual perceptual learning.
  • Artificial specificity and plasticity.
  • Artificial vision in autonomous virtual humans.
  • Multimodal deep learning.
  • Multimodal reinforcement learning.
  • Multimodal association learning.
  • Collaborative artificial visual perception.
  • Generative adversarial networks for visual perception. 

Prof. Dr. Janghoon Yang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Symmetry is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • visual perception
  • artificial intelligence
  • multimodal learning
  • virtual human
  • computer vision
  • vision-based control

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (9 papers)


Research

24 pages, 510 KiB  
Article
Intelligent Detection of False Information in Arabic Tweets Utilizing Hybrid Harris Hawks Based Feature Selection and Machine Learning Models
by Thaer Thaher, Mahmoud Saheb, Hamza Turabieh and Hamouda Chantar
Symmetry 2021, 13(4), 556; https://doi.org/10.3390/sym13040556 - 27 Mar 2021
Cited by 39 | Viewed by 3640
Abstract
Fake or false information on social media platforms is a significant challenge that deliberately misleads users through rumors, propaganda, or deceptive information about a person, organization, or service. Twitter is one of the most widely used social media platforms, especially in the Arab region, where the number of users is steadily increasing, accompanied by an increase in the rate of fake news. This has drawn researchers' attention to providing a safe online environment free of misleading information. This paper proposes a smart classification model for the early detection of fake news in Arabic tweets utilizing Natural Language Processing (NLP) techniques, Machine Learning (ML) models, and the Harris Hawks Optimizer (HHO) as a wrapper-based feature selection approach. An Arabic Twitter corpus composed of 1862 previously annotated tweets was used to assess the efficiency of the proposed model. The Bag of Words (BoW) model with different term-weighting schemes is used for feature extraction. Eight well-known learning algorithms are investigated with varying combinations of features, including user-profile, content-based, and word features. The reported results show that Logistic Regression (LR) with Term Frequency-Inverse Document Frequency (TF-IDF) achieves the best results. Moreover, feature selection based on the binary HHO algorithm plays a vital role in reducing dimensionality, thereby enhancing the learning model's performance for fake news detection. Interestingly, the proposed BHHO-LR model yields an improvement of 5% over previous work on the same dataset.
(This article belongs to the Special Issue Symmetry in Artificial Visual Perception and Its Application)
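As a rough illustration of the wrapper-based selection idea described above, the sketch below scores binary feature masks by the cross-validated accuracy of a logistic-regression classifier on TF-IDF features. This is a minimal sketch assuming scikit-learn and a toy corpus; the random-mask loop merely stands in for the binary HHO search, whose update rules are considerably more involved.

```python
# Minimal wrapper-style feature selection sketch (not the authors' code).
# The toy corpus and random-mask search are placeholders for the real
# 1862-tweet dataset and the binary HHO optimizer.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
tweets = [f"placeholder tweet number {i} with filler words" for i in range(40)]
labels = rng.integers(0, 2, size=40)           # 0 = credible, 1 = fake

X = TfidfVectorizer().fit_transform(tweets).toarray()

def fitness(mask):
    """Mean CV accuracy of LR restricted to the selected feature subset."""
    if mask.sum() == 0:
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, mask.astype(bool)], labels, cv=5).mean()

best_mask, best_fit = None, -1.0
for _ in range(20):                            # binary HHO would guide this search
    mask = rng.integers(0, 2, size=X.shape[1])
    score = fitness(mask)
    if score > best_fit:
        best_mask, best_fit = mask, score
print(f"best subset: {int(best_mask.sum())} features, CV accuracy {best_fit:.2f}")
```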
Figures

Figure 1: Behavior of E during two runs and 500 iterations [20].
Figure 2: Conceptual diagram of the proposed methodology.
Figure 3: Process of tweets collection and preprocessing.
Figure 4: V-shaped transfer functions.
Figure 5: Accuracy and F1-score for all classifiers using different vectorization methods.
Figure 6: Accuracy and F1-score for LR and XGBoost using different n-grams.
Figure 7: Convergence trends of VBHHO variants.
Figure 8: Average of the best-selected features by VBHHO variants.
Figure 9: Average of the best-selected features by all algorithms.
Figure 10: Convergence curves of all algorithms.

15 pages, 2167 KiB  
Article
Human Symmetry Uncertainty Detected by a Self-Organizing Neural Network Map
by Birgitta Dresp-Langley and John M. Wandeto
Symmetry 2021, 13(2), 299; https://doi.org/10.3390/sym13020299 - 10 Feb 2021
Cited by 9 | Viewed by 2308
Abstract
Symmetry in biological and physical systems is a product of self-organization driven by evolutionary processes, or of mechanical systems under constraints. Symmetry-based feature extraction or representation by neural networks may unravel the most informative contents in large image databases. Despite significant achievements of artificial intelligence in the recognition and classification of regular patterns, the problem of uncertainty remains a major challenge for ambiguous data. In this study, we present an artificial neural network that detects symmetry uncertainty states in human observers. To this end, we exploit a neural network metric, the quantization error in the output of a biologically inspired Self-Organizing Map (SOM-QE). Shape pairs with perfect geometric mirror symmetry but a non-homogeneous appearance, caused by local variations in hue, saturation, or lightness within and/or across the shapes in a given pair, produce, as shown here, a longer choice response time (RT) for "yes" responses relative to symmetry. These data are consistently mirrored by the variations in the SOM-QE from unsupervised neural network analysis of the same stimulus images. The neural network metric is thus capable of detecting and scaling human symmetry uncertainty in response to patterns. Such capacity is tightly linked to the metric's proven selectivity to local contrast and color variations in large and highly complex image data.
(This article belongs to the Special Issue Symmetry in Artificial Visual Perception and Its Application)
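The SOM-QE idea can be sketched with the third-party minisom package (not the authors' implementation): train a small self-organizing map on the pixels of one image, then read the quantization error it produces on another image as a scalar change signal.

```python
# Hedged SOM-QE sketch using minisom; the random pixel arrays are
# stand-ins for the study's stimulus images.
import numpy as np
from minisom import MiniSom

def som_qe(reference, test, size=4, iters=5000, seed=0):
    """Quantization error of `test` w.r.t. a SOM trained on `reference`;
    both are (n_pixels, 3) RGB arrays with values in [0, 1]."""
    som = MiniSom(size, size, 3, sigma=1.0, learning_rate=0.5,
                  random_seed=seed)
    som.train_random(reference, iters)
    return som.quantization_error(test)

rng = np.random.default_rng(0)
base = rng.random((1000, 3))                  # stand-in reference pixels
variant = np.clip(base + 0.05 * rng.standard_normal(base.shape), 0, 1)
print(som_qe(base, base), som_qe(base, variant))  # QE grows with local change
```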
Figures

Figure 1: Hick's Law [53,54] postulates that, provided the error rate in the given psychophysical task is low, sensory system uncertainty (SU) increases linearly with the amount of transmitted information (I) in a set (top graph). The law presumes a direct relationship between choice RT and sensory system uncertainty (SU) where RT increases linearly with the amount of transmitted information/stimulus uncertainty (bottom graph).
Figure 2: Copies of the test images, for illustration. Mirror symmetric shape pairs are displayed on a medium grey background. Visual symmetry uncertainty in the shape pairs was varied by giving shape elements variable amounts of color information, resulting in variations in appearance. The condition with the highest amount of locally different color information is MULTICOL2.
Figure 3: Statistically significant differences in average RT (top) for the comparison between BLUE and RED shape pairs with appearance levels 1 and 2. The corresponding Self-Organizing Map Quantization Error (SOM-QE) values (bottom) from the neural network analysis are plotted in the graph below.
Figure 4: Statistically significant differences in average RT (top) for the comparison between BLUE and RED shape pairs with appearance levels 1, 3 and 4. The corresponding SOM-QE values (bottom) from the neural network analysis are plotted in the graph below. The difference in average RT between BLUE3 and BLUE4 is the only one here that is not statistically significant (see Section 3.1.1).
Figure 5: Differences in average RT (top) for the comparison between BLUE and RED shape pairs with appearance level 1 and the multicolored MULTICOL shape pairs with appearance levels 1 and 2. The differences between BLUE and RED shape pairs of any appearance level are not statistically significant (see Section 3.1.1). The differences between image conditions BLUE1 or RED1 and MULTICOL1 and between BLUE2 or RED2 and MULTICOL2 are highly significant, as is the difference between MULTICOL1 and MULTICOL2 (see Section 3.1.2). The corresponding SOM-QE values (bottom) from the neural network analysis are plotted in the graph below.
Figure 6: The tight link between variations in RT reflecting different levels of human uncertainty and the variations in the SOM-QE metric from the neural network analyses, brought to the fore by linear regression analysis on the RT data for shape pairs with varying levels of appearance in BLUE, RED and MULTICOL shapes, and linear regression analysis on the SOM-QE data for exactly the same shape pairs.

26 pages, 19861 KiB  
Article
Transfer Detection of YOLO to Focus CNN’s Attention on Nude Regions for Adult Content Detection
by Nouar AlDahoul, Hezerul Abdul Karim, Mohd Haris Lye Abdullah, Mohammad Faizal Ahmad Fauzi, Abdulaziz Saleh Ba Wazir, Sarina Mansor and John See
Symmetry 2021, 13(1), 26; https://doi.org/10.3390/sym13010026 - 25 Dec 2020
Cited by 17 | Viewed by 9650
Abstract
Video pornography and nudity detection aims to detect and classify people in videos as nude or normal for censorship purposes. Recent literature has demonstrated pornography detection utilising a convolutional neural network (CNN) to extract features directly from whole frames and a support vector machine (SVM) to classify the extracted features into two categories. However, existing methods cannot detect small-scale pornographic and nude content in frames with diverse backgrounds. This limitation leads to a high false-negative rate (FNR) and the misclassification of nude frames as normal ones. To address this, this paper tackles the limitation of existing convolutional-only approaches by focusing the visual attention of the CNN on the expected nude regions inside the frames to reduce the FNR. The You Only Look Once (YOLO) object detector was transferred to the pornography and nudity detection application to detect persons as regions of interest (ROIs), which were then passed to a CNN and SVM for nude/normal classification. Several experiments were conducted to compare the performance of various CNNs and classifiers using our proposed dataset. It was found that ResNet101 with a random forest outperformed the other models, achieving an F1-score of 90.03% and an accuracy of 87.75%. Furthermore, an ablation study was performed to demonstrate the impact of adding YOLO before the CNN. YOLO–CNN was shown to outperform CNN-only in terms of accuracy, which increased from 85.5% to 89.5%. Additionally, a new benchmark dataset with challenging content, including various human sizes and backgrounds, was proposed.
(This article belongs to the Special Issue Symmetry in Artificial Visual Perception and Its Application)
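A minimal sketch of the two-stage pipeline follows, with two named stand-ins: torchvision's Faster R-CNN person detector replaces YOLO, and a headless ResNet50 replaces the paper's ResNet101 feature extractor; the 0.7 score threshold is an assumption.

```python
# Hedged two-stage sketch (detector ROIs -> CNN features -> classifier),
# not the authors' exact pipeline.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50, ResNet50_Weights
from torchvision.models.detection import (fasterrcnn_resnet50_fpn,
                                          FasterRCNN_ResNet50_FPN_Weights)

detector = fasterrcnn_resnet50_fpn(
    weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT).eval()
backbone = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
backbone.fc = torch.nn.Identity()              # expose 2048-d features

def person_roi_features(frame):
    """frame: float tensor (3, H, W) in [0, 1] -> list of 2048-d ROI features."""
    feats = []
    with torch.no_grad():
        det = detector([frame])[0]
        for box, label, score in zip(det["boxes"], det["labels"], det["scores"]):
            if label.item() == 1 and score > 0.7:  # COCO class 1 = person
                x1, y1, x2, y2 = box.int().tolist()
                if x2 > x1 and y2 > y1:
                    roi = frame[:, y1:y2, x1:x2].unsqueeze(0)
                    roi = F.interpolate(roi, size=(224, 224), mode="bilinear")
                    feats.append(backbone(roi).squeeze(0))
    return feats

# Features from labeled training frames would then fit a classifier such as
# sklearn's RandomForestClassifier, mirroring the YOLO-CNN-RF chain.
rois = person_roi_features(torch.rand(3, 480, 640))
```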
Figures

Figure 1: A few samples of frames in the NPDI dataset.
Figure 2: A few samples of normal frames with various activities, scales, positions, orientations, viewpoints, and cloth colours. The images have various resolutions, but they were resized to the same size.
Figure 3: A few samples of nude frames with various backgrounds, activities, scales, positions, orientations, viewpoints, and skin colours. The images have various resolutions, but they were resized to the same size.
Figure 4: A few samples of images after data augmentation.
Figure 5: A few samples from the human-only images detected and extracted by YOLO3 using the MMU dataset.
Figure 6: A few samples of frames in the testing film dataset.
Figure 7: Illustration of the proposed pornography and nudity detection system.
Figure 8: YOLO architecture [41].
Figure 9: Random forest architecture.
Figure 10: Flow chart of the proposed method (on the right) and the state-of-the-art CNN-only approach (on the left) for nudity and pornography detection.
Figure 11: Comparison between the proposed YOLO-ResNet50 and ResNet50-only [10] in terms of accuracy using the testing film dataset.
Figure 12: Confusion matrix of CNN-only (first row) and YOLO–CNN (second row) for each fold.
Figure 13: Receiver operating characteristic (ROC) curves of YOLO–CNN and CNN-only using the MMU dataset.
Figure 14: Comparison between the various feature extractors and classifiers in terms of the F1-score.
Figure 15: CAM for a few samples: (a) input image; (b) ResNet50 CAM; (c) InceptionV3 CAM; (d) ResNet101 CAM.

16 pages, 1100 KiB  
Article
Method for Effectiveness Assessment of Electronic Warfare Systems in Cyberspace
by Seungcheol Choi, Oh-Jin Kwon, Haengrok Oh and Dongkyoo Shin
Symmetry 2020, 12(12), 2107; https://doi.org/10.3390/sym12122107 - 18 Dec 2020
Cited by 7 | Viewed by 9347
Abstract
Current electronic warfare (EW) systems, along with the rapid development of information and communication technology, are essential elements of the modern battlefield associated with cyberspace. In this study, an efficient evaluation framework is proposed to assess the effectiveness of the various types of EW systems that operate in cyberspace, which is recognized as an indispensable factor affecting modern military operations. The proposed method classifies EW systems into primary categories and sub-categories according to EW type and identifies items for measuring the effectiveness of each EW system, considering the characteristics of cyberspace when evaluating the damage caused by cyberattacks. A scenario with an integrated EW system incorporating two or more different types of EW equipment is provided to confirm the effectiveness of the proposed framework in cyber electromagnetic warfare. The scenario gives an example of assessing the effectiveness of EW systems under cyberattack. Finally, the proposed method is demonstrated by assessing the effectiveness of the EW systems in this scenario.
(This article belongs to the Special Issue Symmetry in Artificial Visual Perception and Its Application)
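The scoring step of such a framework might look like the following hypothetical sketch, where each EW system is assessed as a weighted average of normalized measurement items. The item names, weights, and scores are invented for illustration and are not taken from the paper.

```python
# Hypothetical effectiveness-scoring sketch; all items and numbers below
# are illustrative assumptions, not the paper's measurement items.
from dataclasses import dataclass

@dataclass
class MeasurementItem:
    name: str
    weight: float       # relative importance within the system
    score: float        # normalized measurement in [0, 1]

def effectiveness(items):
    """Weighted average of normalized measurement scores."""
    total = sum(i.weight for i in items)
    return sum(i.weight * i.score for i in items) / total

electronic_attack = [
    MeasurementItem("jamming-to-signal ratio", 0.5, 0.8),
    MeasurementItem("link throughput degradation", 0.3, 0.6),
    MeasurementItem("target availability loss", 0.2, 0.7),
]
print(f"EA effectiveness: {effectiveness(electronic_attack):.2f}")  # 0.72
```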
Figures

Figure 1: Electronic warfare sub-divisions and applications [6].
Figure 2: Overview of the electronic warfare support (ES) test and evaluation (T&E) process [6].
Figure 3: Definition of cyber and electromagnetic activities (CEMA) [8].
Figure 4: Cyber and electromagnetic activities elements [17].
Figure 5: Overview of the effectiveness assessment. (a) Cyber battle damage assessment framework, and (b) workflow of EW damage assessment, which is a sub-system of the cyberspace battle damage assessment framework (CBDAF).

10 pages, 1469 KiB  
Article
Image Denoising Based on Bivariate Distribution
by Ping Zhao, Xingyu Zhao and Chun Zhao
Symmetry 2020, 12(11), 1909; https://doi.org/10.3390/sym12111909 - 20 Nov 2020
Cited by 3 | Viewed by 2327
Abstract
The literature has shown that the performance of a de-noising algorithm is greatly influenced by the dependencies between wavelet coefficients. In this paper, a symmetric bivariate probability density function (PDF) is proposed that takes the dependencies between coefficients into account. The bivariate Cauchy distribution and the bivariate Student's t-distribution are special cases of the proposed bivariate PDF. An estimation method is given for one of the parameters of the PDF, while the other parameter can take any real number greater than 2. The algorithm adopts a maximum a posteriori estimator employing the dual-tree complex wavelet transform (DTCWT). Compared with the existing best results, the method is faster and more efficient than previous numerical integration techniques. The bivariate shrinkage function of the proposed algorithm can be expressed explicitly, and the proposed method is simple to implement.
(This article belongs to the Special Issue Symmetry in Artificial Visual Perception and Its Application)
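The paper's shrinkage function itself is new, but the flavor of an explicit bivariate shrinkage rule can be seen in the classic Sendur-Selesnick formula that this line of work generalizes; a numpy sketch:

```python
# Classic bivariate MAP shrinkage (Sendur-Selesnick), shown as a
# reference point; this is NOT the paper's new shrinkage function.
import numpy as np

def bivariate_shrink(w1, w2, sigma_n, sigma):
    """Shrink child coefficient w1 using its parent w2, given the noise
    std sigma_n and the marginal signal std sigma."""
    r = np.sqrt(w1**2 + w2**2)
    gain = np.maximum(r - np.sqrt(3.0) * sigma_n**2 / sigma, 0.0)
    return gain / np.maximum(r, 1e-12) * w1

w1 = np.array([0.5, 2.0, -3.0])
w2 = np.array([0.4, 1.5, -2.5])
print(bivariate_shrink(w1, w2, sigma_n=1.0, sigma=2.0))  # small w1 -> 0
```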
Figures

Figure 1: Joint bivariate probability density function for the coefficients.
Figure 2: Histogram of the high-band dual-tree complex wavelet coefficients of the boat image. Solid line: fitted curve of the same coefficients scaled by the proposed marginal PDF, for θ = 4. Dashed line: fitted curve of the Gaussian PDF.
Figure 3: New bivariate shrinkage function derived from the model proposed in (2).
Figure 4: Block diagram of the proposed denoising method.
Figure 5: (a) Original image. (b) Noisy image with PSNR = 20.17 dB (σ_n = 25). (c) Denoised image using the method of [8]: PSNR = 28.52 dB. (d) Denoised image using the new bivariate shrinkage function given in (19): PSNR = 28.7222 dB.
Figure 6: (a) Original image. (b) Noisy image with PSNR = 20.17 dB (σ_n = 25). (c) Denoised image using the method of [8]: PSNR = 28.93 dB. (d) Denoised image using the new bivariate shrinkage function given in (19): PSNR = 29.0215 dB.

16 pages, 3722 KiB  
Article
Predicting Perceptual Quality in Internet Television Based on Unsupervised Learning
by Jaroslav Frnda, Jan Nedoma, Radek Martinek and Michael Fridrich
Symmetry 2020, 12(9), 1535; https://doi.org/10.3390/sym12091535 - 17 Sep 2020
Cited by 4 | Viewed by 2251
Abstract
Quality of service (QoS) and quality of experience (QoE) are two major concepts for the quality evaluation of video services. QoS analyzes the technical performance of a network transmission chain (e.g., utilization or packet loss rate). Subjective evaluation (QoE), on the other hand, relies on the observer's opinion, so it cannot immediately provide output in the form of a score (its time requirements are extensive). Although several well-known methods for objective evaluation exist (trying to adopt psychological principles of the human visual system via mathematical models), each of them has its own rating scale without a symmetric conversion to a standardized subjective output such as the MOS (mean opinion score), typically represented by a five-point rating scale. This makes it difficult for network operators to recognize when they have to apply resource reservation control mechanisms. For this reason, we propose an application (classifier) that derives the subjective end-user quality perception from the score of an objective assessment and selected parameters of each video sequence. Our model integrates the unique benefits of unsupervised learning and clustering techniques, such as overfitting avoidance and small dataset requirements; in fact, most published papers are based on regression models or supervised clustering. In this article, we also investigate the possibility of using a graphical SOM (self-organizing map) representation called a U-matrix as a feature selection method.
(This article belongs to the Special Issue Symmetry in Artificial Visual Perception and Its Application)
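The U-matrix step can be sketched with the third-party minisom package (not the authors' code): train a SOM on per-sequence feature vectors and read the U-matrix, whose high values mark cluster borders. The four feature columns mirror the paper's inputs but hold random stand-in data here.

```python
# Hedged U-matrix sketch with minisom; random data stands in for the
# real (content type, SSIM, packet loss rate, bitrate) measurements.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(1)
# columns: content type, SSIM index, packet loss rate, bitrate (scaled to [0, 1])
X = rng.random((200, 4))

som = MiniSom(8, 8, 4, sigma=1.5, learning_rate=0.5, random_seed=1)
som.train_random(X, 10000)

u_matrix = som.distance_map()                   # (8, 8) inter-neuron distances
winners = np.array([som.winner(x) for x in X])  # node assignment per sample
print(u_matrix.round(2))
```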
Figures

Figure 1: SSIM metric scheme.
Figure 2: Absolute category rating (ACR) testing procedure [15].
Figure 3: Dataset making procedure.
Figure 4: Test sequences [2,17] (clockwise): (a) a high-motion scene (fast camera rotation); (b) night scene (people sitting next to a fire); (c) low-motion scene (moving bulldozer, static background); (d) Shanghai marathon (static shooting).
Figure 5: Area of weight modification reducing over time.
Figure 6: Cluster identification by U-matrix.
Figure 7: The diagram of self-organizing map (SOM) classifier modelling.
Figure 8: The U-matrix weight plane for each parameter of the training set (content type, SSIM index, packet loss rate, and bitrate). Input 1 has connections with output neurons that are very dissimilar to those of inputs 2, 3, and 4.
Figure 9: Kohonen map topology with designed clusters.
Figure 10: The hit diagram: training set (left) and cross-validation (right). Three inputs were wrongly classified during the cross-validation.
Figure 11: The receiver operating characteristic (ROC) curve for the current model and our previous deep learning model published in [2].

16 pages, 14060 KiB  
Article
Learning Multimodal Representations by Symmetrically Transferring Local Structures
by Bin Dong, Songlei Jian and Kai Lu
Symmetry 2020, 12(9), 1504; https://doi.org/10.3390/sym12091504 - 13 Sep 2020
Viewed by 2631
Abstract
Multimodal representations play an important role in multimodal learning tasks, including cross-modal retrieval and intra-modal clustering. However, existing multimodal representation learning approaches focus on building one common space by aligning different modalities and ignore the complementary information across modalities, such as intra-modal local structures. In other words, they focus only on object-level alignment and ignore structure-level alignment. To tackle this problem, we propose MTLS, a novel symmetric multimodal representation learning framework that transfers local structures across different modalities. A customized soft metric learning strategy and an iterative parameter learning process are designed to symmetrically transfer local structures and enhance the cluster structures in intra-modal representations. A bidirectional retrieval loss based on multi-layer neural networks is utilized to align the two modalities. MTLS is instantiated with image and text data and shows superior performance on image-text retrieval and image clustering, outperforming state-of-the-art multimodal learning methods by up to 32% in terms of R@1 on text-image retrieval and 16.4% in terms of AMI on clustering.
(This article belongs to the Special Issue Symmetry in Artificial Visual Perception and Its Application)
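A bidirectional retrieval loss of the kind the modality-aligning step builds on can be sketched as a symmetric hinge loss over a batch of paired embeddings, in the style of VSE-type models; the margin and the random batch below are assumptions.

```python
# Hedged sketch of a symmetric (bidirectional) hinge retrieval loss;
# not the authors' exact objective.
import torch
import torch.nn.functional as F

def bidirectional_retrieval_loss(img, txt, margin=0.2):
    """img, txt: (B, d) L2-normalized embeddings of paired instances."""
    sim = img @ txt.t()                           # (B, B) cosine similarities
    pos = sim.diag().view(-1, 1)                  # matched-pair similarities
    cost_i2t = (margin + sim - pos).clamp(min=0)      # image -> text direction
    cost_t2i = (margin + sim - pos.t()).clamp(min=0)  # text -> image direction
    mask = torch.eye(sim.size(0), dtype=torch.bool)   # ignore the positives
    return (cost_i2t.masked_fill(mask, 0).mean()
            + cost_t2i.masked_fill(mask, 0).mean())

img = F.normalize(torch.randn(32, 256), dim=1)
txt = F.normalize(torch.randn(32, 256), dim=1)
print(bidirectional_retrieval_loss(img, txt))
```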
Figures

Figure 1: A toy example of the original unimodal representations (a) and the multimodal representations learned by MTLS (b). The same colors (circle and triangle pair) indicate a paired data instance. The proposed MTLS not only aligns data instances from two modalities but also transfers local cluster structures between them.
Figure 2: Multimodal representation learning framework by transferring local structures (MTLS). MTLS transforms initial data into multimodal representations via the representation encoding process. Then MTLS optimizes the multimodal representations through the local structure transferring and modality aligning processes. Specifically, the multimodal representation in each modality is alternately optimized until the loss value remains stable.
Figure 3: Image-to-text retrieval by our approach MTLS, MTFN [19] and VSE++ [44]. For each query image, we provide the top-5 ranked captions by MTLS, MTFN and VSE++ at the right-hand side of the image, and the ground-truth ones are marked in red.
Figure 4: Text-to-image retrieval by our approach MTLS, MTFN [19] and VSE++ [44]. For each query text, we provide the top-5 ranked images from left to right retrieved by MTLS, MTFN and VSE++, and the ground-truth ones are outlined by a red box.
Figure 5: The t-SNE visualization of multimodal image representations from ResNet152, VSE, VSE++, Order, TBNN, MTFN, and the proposed MTLS.

15 pages, 1144 KiB  
Article
Braille Recognition for Reducing Asymmetric Communication between the Blind and Non-Blind
by Bi-Min Hsu
Symmetry 2020, 12(7), 1069; https://doi.org/10.3390/sym12071069 - 30 Jun 2020
Cited by 12 | Viewed by 7804
Abstract
Assistive braille technology has existed for many years with the purpose of aiding the blind in performing common tasks such as reading, writing, and communicating with others. Such technologies are aimed at helping those who are visually impaired to better adapt to the visual world. However, an obvious gap exists in current technology when it comes to symmetric two-way communication between the blind and non-blind, as little technology allows non-blind individuals to understand the braille system. This research presents a novel approach to converting images of braille into English text by employing a convolutional neural network (CNN) model and a ratio character segmentation algorithm (RCSA). Further, a new dataset was constructed, containing a total of 26,724 labeled braille images and consisting of 37 braille symbols that correspond to 71 different English characters, including the alphabet, punctuation, and numbers. The CNN model yielded a prediction accuracy of 98.73% on the test set. The functionality of this artificial intelligence (AI) based recognition system could be tested through accessible user interfaces in the future.
(This article belongs to the Special Issue Symmetry in Artificial Visual Perception and Its Application)
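The ratio-based segmentation idea can be pictured with the hypothetical sketch below, which cuts a braille line image into equal-width character cells derived from a fixed width-to-height ratio; the ratio value is an assumption, not the paper's calibrated constant.

```python
# Hypothetical ratio-based character segmentation sketch; each cell
# would then be fed to the CNN classifier.
import numpy as np

def ratio_segment(line_img, cell_ratio=0.78):
    """line_img: (H, W) grayscale braille line. Returns a list of
    equal-width cells whose width is cell_ratio * line height."""
    h, w = line_img.shape
    cell_w = max(1, int(round(h * cell_ratio)))   # cell width from line height
    return [line_img[:, x:x + cell_w]
            for x in range(0, w - cell_w + 1, cell_w)]

line = np.zeros((60, 600))         # stand-in for a cropped braille line image
cells = ratio_segment(line)
print(len(cells), cells[0].shape)  # number of candidate character cells
```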
Figures

Figure 1: Illustration of the basic structure of the braille cell.
Figure 2: Illustration of asymmetric communication between visually impaired and non-visually impaired individuals.
Figure 3: Diagram of a simple artificial neural network (ANN) model consisting of an input layer (three input nodes), a hidden layer (five hidden nodes), and an output layer (two output nodes).
Figure 4: Illustration showing how convolutions work on an image with a single channel for simplicity (adapted from https://github.com/vdumoulin/conv_arithmetic/blob/master/gif/padding_strides.gif).
Figure 5: (a) Sample cropped images from the braille dataset. (b) The 37 different symbols in the braille dataset, which make up 71 braille character classes. Note: The first 10 letters of the alphabet have the same braille symbols as the 10 numerical digits, so the "#" symbol preceding the digit character is used to denote a numerical symbol. Similarly, the "CAPS" symbol precedes a letter symbol to denote a capital letter.
Figure 6: Overview of the steps involved in optical braille recognition (OBR).
Figure 7: Diagram of the proposed approach for the ratio character segmentation algorithm (RCSA) and model training and prediction.
Figure 8: Diagram of the architecture of the convolutional neural network (CNN) model.
Figure 9: Plot showing the training and validation loss as a function of the number of training epochs.
Figure 10: Flow chart showing the typical steps needed to predict paragraphs of braille (in green). The combination of the CNN model and the RCSA algorithm with the addition of a space character in the dataset (in blue) enables the jump from predicting braille characters to braille lines.

Full article ">
15 pages, 2174 KiB  
Article
Cloud Detection for Satellite Imagery Using Attention-Based U-Net Convolutional Neural Network
by Yanan Guo, Xiaoqun Cao, Bainian Liu and Mei Gao
Symmetry 2020, 12(6), 1056; https://doi.org/10.3390/sym12061056 - 25 Jun 2020
Cited by 64 | Viewed by 6320
Abstract
Cloud detection is an important and difficult task in the pre-processing of satellite remote sensing data. The results of traditional cloud detection methods are often unsatisfactory in complex environments or in the presence of various noise disturbances. With the rapid development of artificial intelligence technology, deep learning methods have achieved great success in many fields such as image processing, speech recognition, and autonomous driving. This study proposes a deep learning model suitable for cloud detection, Cloud-AttU, which is based on a U-Net network and incorporates an attention mechanism. The Cloud-AttU model adopts a symmetric Encoder-Decoder structure, which achieves the fusion of high-level and low-level features through skip-connection operations, making the output results contain richer multi-scale information. This symmetrical network structure is concise and stable, significantly enhancing the quality of image segmentation. Based on the characteristics of cloud detection, the model is improved by introducing an attention mechanism that allows the model to learn more effective features and distinguish between cloud and non-cloud pixels more accurately. The experimental results show that the proposed method has a significant accuracy advantage over traditional cloud detection methods. The proposed method also achieves good results in the presence of snow/ice disturbance and other bright non-cloud objects, with strong resistance to disturbance. The Cloud-AttU model achieved excellent results in cloud detection tasks, indicating that this symmetric network architecture has great potential for application in satellite image processing and deserves further research.
(This article belongs to the Special Issue Symmetry in Artificial Visual Perception and Its Application)
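The attention-gate ingredient can be sketched as the additive gate familiar from the attention U-Net literature; the channel sizes are illustrative, and this is not the authors' exact Cloud-AttU configuration.

```python
# Minimal additive attention-gate sketch in PyTorch; a simplified
# stand-in for the gate used on the skip connections of Cloud-AttU.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, f_g, f_l, f_int):
        super().__init__()
        self.w_g = nn.Conv2d(f_g, f_int, kernel_size=1)   # gating (decoder) path
        self.w_x = nn.Conv2d(f_l, f_int, kernel_size=1)   # skip (encoder) path
        self.psi = nn.Sequential(nn.Conv2d(f_int, 1, kernel_size=1),
                                 nn.Sigmoid())

    def forward(self, g, x):
        # g: decoder feature map, x: encoder skip feature map (same H, W)
        a = torch.relu(self.w_g(g) + self.w_x(x))
        return x * self.psi(a)        # rescale skip features by attention map

gate = AttentionGate(f_g=128, f_l=64, f_int=32)
g = torch.randn(1, 128, 32, 32)
x = torch.randn(1, 64, 32, 32)
print(gate(g, x).shape)               # torch.Size([1, 64, 32, 32])
```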
Figures

Figure 1: U-Net architecture diagram modified from the original study [27]. Green/yellow boxes indicate multi-channel feature maps; red arrows indicate 3 × 3 convolution for feature extraction; cyan arrows indicate skip-connection for feature fusion; downward orange arrows indicate max pooling for dimension reduction; upward orange arrows indicate up-sampling for dimension recovery.
Figure 2: The structure of the Cloud-AttU model. All the orange/white boxes correspond to multi-channel feature maps. Cloud-AttU is equipped with skip connections that adaptively rescale feature maps in the encoding path with weights learned from the correlation of feature maps in the decoding path.
Figure 3: The diagram of the attention gate in Cloud-AttU.
Figure 4: Cloud detection results for different scenes over the Landsat-Cloud dataset [48]. The first row shows the RGB images (top), the second row shows the ground truths (middle) and the third row shows the predictions of the Cloud-AttU model (bottom). Yellow indicates that cloud exists and purple indicates that no cloud exists.
Figure 5: Cloud detection results for different scenes over the Landsat-Cloud dataset [48]. The first column is the RGB image (left), the second column is the ground truth (center left), the third column is the predictions of the Cloud-Net model (center right) and the fourth column is the predictions of the Cloud-AttU model (right). Yellow indicates that cloud exists and purple indicates that no cloud exists.
Figure 6: Cloud detection results under the influence of snow and ice ground over the Landsat-Cloud dataset [48]. The first column is the RGB image (left), the second column is the ground truth (center left), the third column is the prediction of the Cloud-Net model (center right), and the fourth column is the prediction of the Cloud-AttU model (right). Yellow indicates the presence of clouds and purple indicates the absence of clouds.
Figure 7: Cloud detection results under the influence of other factors over the Landsat-Cloud dataset [48]. The first column is the RGB image (left), the second column is the ground truth (center left), the third column is the prediction of the Cloud-Net model (center right), and the fourth column is the prediction of the Cloud-AttU model (right). Yellow indicates the presence of clouds and purple indicates the absence of clouds.
