Special Section on Recent Advances in Multimedia Signal Processing Techniques and Applications
-
Haizhou LI
2012 Volume E95.D Issue 5 Pages
1181
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
-
Sadaoki FURUI
Article type: INVITED PAPER
Subject area: Speech Processing
2012 Volume E95.D Issue 5 Pages
1182-1194
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
This paper presents our recent work in regard to building Large Vocabulary Continuous Speech Recognition (LVCSR) systems for the Thai, Indonesian, and Chinese languages. For Thai, since there is no word boundary in the written form, we have proposed a new method for automatically creating word-like units from a text corpus, and applied topic and speaking style adaptation to the language model to recognize spoken-style utterances. For Indonesian, we have applied proper noun-specific adaptation to acoustic modeling, and rule-based English-to-Indonesian phoneme mapping to solve the problem of large variation in proper noun and English word pronunciation in a spoken-query information retrieval system. In spoken Chinese, long organization names are frequently abbreviated, and abbreviated utterances cannot be recognized if the abbreviations are not included in the dictionary. We have proposed a new method for automatically generating Chinese abbreviations, and by expanding the vocabulary using the generated abbreviations, we have significantly improved the performance of spoken query-based search.
-
Kuan-Yu CHEN, Hsin-Min WANG, Berlin CHEN
Article type: PAPER
Subject area: Speech Processing
2012 Volume E95.D Issue 5 Pages
1195-1205
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
This paper describes the application of two attractive categories of topic modeling techniques to the problem of spoken document retrieval (SDR), viz. document topic model (DTM) and word topic model (WTM). Apart from using the conventional unsupervised training strategy, we explore a supervised training strategy for estimating these topic models, envisioning a scenario in which user query logs along with click-through information of relevant documents can be utilized to build an SDR system. This attempt has the potential to associate relevant documents with queries even if they do not share any of the query words, thereby improving retrieval quality over the baseline system. Likewise, we also study a novel use of pseudo-supervised training to associate relevant documents with queries through a pseudo-feedback procedure. Moreover, in order to lessen SDR performance degradation caused by imperfect speech recognition, we investigate leveraging different levels of index features for topic modeling, including words, syllable-level units, and their combination. We provide a series of experiments conducted on the TDT (TDT-2 and TDT-3) Chinese SDR collections. The empirical results show that the methods derived from our proposed modeling framework are very effective when compared with a few existing retrieval approaches.
-
Xiaoxuan WANG, Lei XIE, Mimi LU, Bin MA, Eng Siong CHNG, Haizhou LI
Article type: PAPER
Subject area: Speech Processing
2012 Volume E95.D Issue 5 Pages
1206-1215
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
In this paper, we propose the integration of multimodal features using conditional random fields (CRFs) for the segmentation of broadcast news stories. We study story boundary cues from lexical, audio and video modalities, where lexical features consist of lexical similarity, chain strength and overall cohesiveness; acoustic features involve pause duration, pitch, speaker change and audio event type; and visual features contain shot boundaries, anchor faces and news title captions. These features are extracted at a sequence of boundary candidate positions in the broadcast news. A linear-chain CRF is used to label each candidate as boundary or non-boundary based on the multimodal features. Important interlabel relations and contextual feature information are effectively captured by the sequential learning framework of CRFs. Story segmentation experiments show that the CRF approach outperforms other popular classifiers, including decision trees (DTs), Bayesian networks (BNs), naive Bayesian classifiers (NBs), multilayer perceptron (MLP), support vector machines (SVMs) and maximum entropy (ME) classifiers.
-
Sungjin LEE, Hyungjong NOH, Jonghoon LEE, Kyusong LEE, Gary Geunbae LE ...
Article type: PAPER
Subject area: Speech Processing
2012 Volume E95.D Issue 5 Pages
1216-1228
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
Although there have been enormous investments in English education all around the world, the style of English instruction has changed little. Considering the shortcomings of the current teaching-learning methodology, we have been investigating advanced computer-assisted language learning (CALL) systems. This paper aims at summarizing a set of POSTECH approaches including theories, technologies, systems, and field studies and providing relevant pointers. On top of state-of-the-art spoken dialog system technologies, a variety of adaptations have been applied to overcome problems caused by the numerous errors and variations naturally produced by non-native speakers. Furthermore, a number of methods have been developed for generating educational feedback that helps learners become proficient. Integrating these efforts resulted in the intelligent educational robots Mero and Engkey and the virtual 3D language learning game Pomy. To verify the effects of our approaches on students' communicative abilities, we have conducted a field study at an elementary school in Korea. The results showed that our CALL approaches can be enjoyable and fruitful activities for students. Although the results of this study bring us a step closer to understanding computer-based education, more studies are needed to consolidate the findings.
-
Yi Ren LENG, Huy Dat TRAN, Norihide KITAOKA, Haizhou LI
Article type: PAPER
Subject area: Audio Processing
2012 Volume E95.D Issue 5 Pages
1229-1237
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
Conventional features for Automatic Speech Recognition and Sound Event Recognition such as Mel-Frequency Cepstral Coefficients (MFCCs) have been shown to perform poorly in noisy conditions. We introduce an auditory feature based on the gammatone filterbank, the Selective Gammatone Envelope Feature (SGEF), for Robust Sound Event Recognition, where channel selection and the filterbank envelope are used to reduce the effect of noise for specific noise environments. In experiments with Hidden Markov Model (HMM) recognizers, we show that our feature outperforms MFCCs significantly in four different noisy environments at various signal-to-noise ratios.
-
Jae Gon KIM, Jun-Dong CHO
Article type: PAPER
Subject area: Image Processing
2012 Volume E95.D Issue 5 Pages
1238-1247
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
In this paper, we propose an optimized virtual re-convergence system to reduce the visual fatigue caused by binocular stereoscopy. Our idea is to apply virtual re-convergence based on an optimized disparity map that contains more depth information in the negative disparity area than in the positive area. Our system therefore uses a dedicated search-range scheme, especially for negative disparity exploration. In addition, we use a so-called Global-Shift Value (GSV), the total shift value of each image in the stereo pair, to converge on the main object, which most affects visual fatigue. The experimental result, a subjective assessment by participants, shows that the proposed method makes stereoscopy significantly more comfortable and attractive to view than existing methods.
-
Zhuo YANG, Sei-ichiro KAMATA
Article type: PAPER
Subject area: Image Processing
2012 Volume E95.D Issue 5 Pages
1248-1255
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
Polar and Spherical Fourier analysis can be used to extract rotation invariant features for image retrieval and pattern recognition tasks. They have been shown to outperform other methods in describing rotation invariant features of two and three dimensional images. Based on mathematical properties of trigonometric functions and associated Legendre polynomials, fast algorithms are proposed to increase the computation speed for multimedia applications such as real time systems and large multimedia databases. The symmetric points are computed simultaneously. Inspired by relative prime number theory, a systematic analysis is given in this paper, and a novel algorithm is derived that provides even faster computation. The proposed method is 9-15% faster than previous work. Experimental results on two and three dimensional images are given to illustrate the effectiveness of the proposed method. Multimedia signal processing applications that need real time polar and spherical Fourier analysis can benefit from this work.
-
Kitti KOONSANIT, Chuleerat JARUSKULCHAI
Article type: PAPER
Subject area: Image Processing
2012 Volume E95.D Issue 5 Pages
1256-1263
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
Nowadays, clustering is a popular tool for exploratory data analysis, with one technique being K-means clustering. Determining the appropriate number of clusters is a significant problem in K-means clustering because the results of the K-means technique depend on the number of clusters. The appropriate number of clusters is often needed in advance as an input parameter to the K-means algorithm. We propose a new method for automatic determination of the appropriate number of clusters using an extended co-occurrence matrix technique called a tri-co-occurrence matrix technique for multispectral imagery in the pre-clustering steps. The proposed method was tested using a dataset with a known number of clusters. The experimental results were compared with ground truth images and evaluated in terms of accuracy, with the numerical result of the tri-co-occurrence providing an accuracy of 84.86%. The results from the tests confirmed the effectiveness of the proposed method in finding the appropriate number of clusters and were compared with the original co-occurrence matrix technique and other algorithms.
-
Kenjiro SUGIMOTO, Koji INOUE, Yoshimitsu KUROKI, Sei-ichiro KAMATA
Article type: PAPER
Subject area: Image Processing
2012 Volume E95.D Issue 5 Pages
1264-1271
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
This paper presents a color-based method for medicine package recognition, called a linear manifold color descriptor (LMCD). It describes a color distribution (a set of color pixels) of a color package image as a linear manifold (an affine subspace) in the color space, and recognizes an anonymous package by linear manifold matching. Mainly due to the low dimensionality of color spaces, LMCD can provide a more compact description and faster computation than description styles based on histograms and dominant colors. This paper also proposes distance-based dissimilarities for linear manifold matching. Specially designed for color distribution matching, the proposed dissimilarities are theoretically more appropriate than J-divergence and canonical angles. Experiments on medicine package recognition validate that LMCD outperforms competitors including MPEG-7 color descriptors in terms of description size, computational cost and recognition rate.
-
Toshiyuki UTO, Yuka TAKEMURA, Hidekazu KAMITANI, Kenji OHUE
Article type: PAPER
Subject area: Image Processing
2012 Volume E95.D Issue 5 Pages
1272-1279
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
This paper describes a blind watermarking scheme through cyclic signal processing. With the spread of high-speed networks, there is a growing demand for copyright protection of multimedia data. For efficient watermarking of images, there are two major approaches: a quantization-based method and a correlation-based method. In this paper, we propose a correlation-based watermarking technique for three-dimensional (3-D) polygonal models using fast Fourier transforms (FFTs). To generate a watermark with desirable properties, similar to a pseudonoise signal, an impulse signal on a two-dimensional (2-D) space is spread through the FFT, multiplication by a complex sinusoid signal, and the inverse FFT. This watermark, i.e., the spread impulse signal, in a transform domain is converted to a spatial domain by an inverse wavelet transform, and embedded into 3-D data aligned by principal component analysis (PCA). In the detection procedure, after realigning the watermarked mesh model through PCA, we map the 3-D data onto the 2-D space via block segmentation and an averaging operation. The 2-D data are processed by the inverse system, i.e., the FFT, division by the complex sinusoid signal, and the inverse FFT. From the resulting 2-D signal, we detect the position of the maximum value as a signature. For 3-D bunny models, detection rates and information capacity are shown to evaluate the performance of the proposed method.
-
Pengyi HAO, Sei-ichiro KAMATA
Article type: PAPER
Subject area: Video Processing
2012 Volume E95.D Issue 5 Pages
1280-1287
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
We are interested in retrieving video shots or videos containing particular people from a video dataset. Owing to the large variations in pose, illumination conditions, occlusions, hairstyles and facial expressions, face tracks have recently been researched in the fields of face recognition, face retrieval and name labeling from videos. However, when the number of face tracks is very large, conventional methods, which match all or some pairs of faces in face tracks, will not be effective. Therefore, in this paper, an efficient method for finding a given person in a video dataset is presented. In our study, in addition to studying face tracks in a single video, we also consider how to organize all the faces in the videos of a dataset and how to improve the search quality in the query process. Different videos may include the same person; thus, the management of individuals in different videos will be useful for their retrieval. The proposed method includes the following three points. (i) Face tracks of the same person appearing for a period in each video are first connected on the basis of scene information with a time constraint, then all the people in one video are organized by a proposed hierarchical clustering method. (ii) After obtaining the organizational structure of all the people in one video, the people are organized into an upper layer by affinity propagation. (iii) Finally, in the process of querying, a remeasuring method based on the index structure of videos is performed to improve the retrieval accuracy. We also build a video dataset that contains six types of videos: films, TV shows, educational videos, interviews, press conferences and domestic activities. The formation of face tracks in the six types of videos is first researched, then experiments are performed on this video dataset containing more than 1 million faces and 218,786 face tracks. The results show that the proposed approach has high search quality and a short search time.
-
Ichiro IDE, Tomoyoshi KINOSHITA, Tomokazu TAKAHASHI, Hiroshi MO, Norio ...
Article type: PAPER
Subject area: Video Processing
2012 Volume E95.D Issue 5 Pages
1288-1300
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
Recent advances in digital storage technology have enabled us to archive a large volume of video data. Thanks to this trend, we have archived more than 1,800 hours of video data from a daily Japanese news show in the last ten years. When considering the effective use of such a large news video archive, we assume that analysis of its chronological and semantic structure becomes important. We also consider that presenting the development of news topics helps users understand current affairs more than providing a list of relevant news stories, as most current news video retrieval systems do. Therefore, in this paper, we propose a structuring method for a news video archive, together with an interface that visualizes the structure, so that users can efficiently track the development of news topics according to their interests. The proposed news video structure, namely the “topic thread structure”, is obtained as a result of an analysis of the chronological and semantic relation between news stories. Meanwhile, the proposed interface, namely “mediaWalker II”, allows users to track the development of news topics along the topic thread structure, and at the same time watch the video footage corresponding to each news story. Analyses of the topic thread structures obtained by applying the proposed method to actual news video footage revealed interesting and comprehensible relations between news topics in the real world. At the same time, analyses of their size quantified the efficiency of tracking a user's topic-of-interest based on the proposed topic thread structure. We consider this a first step towards facilitating video authoring by users based on existing contents in a large-scale news video archive.
-
Takayuki NAKACHI, Kan TOYOSHIMA, Yoshihide TONOMURA, Tatsuya FUJII
Article type: PAPER
Subject area: Video Processing
2012 Volume E95.D Issue 5 Pages
1301-1312
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
In this paper, we propose a layered multicast encryption scheme that provides flexible access control to motion JPEG2000 code streams. JPEG2000 generates layered code streams and offers flexible scalability in characteristics such as resolution and SNR. The layered multicast encryption proposal allows a sender to multicast the encrypted JPEG2000 code streams such that only designated groups of users can decrypt the layered code streams. While keeping the layering functionality, the proposed method offers useful properties such as 1) video quality control using only one private key, 2) guaranteed security, and 3) low computational complexity comparable to conventional non-layered encryption. Simulation results show the usefulness of the proposed method.
-
Lei SUN, Jie LENG, Jia SU, Yiqing HUANG, Hiroomi MOTOHASHI, Takeshi IK ...
Article type: PAPER
Subject area: Video Processing
2012 Volume E95.D Issue 5 Pages
1313-1323
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
Scalable Video Coding (SVC) was standardized as an extension of H.264/AVC with the intention of providing flexible adaptation to heterogeneous networks and different end-user requirements, which provides great scalability in multi-point applications such as video conferencing. However, due to the existence of H.264/AVC-based systems, transcoding between AVC and SVC becomes necessary. Most existing works focus on temporal transcoding, quality transcoding or SVC-to-AVC spatial transcoding, while the straightforward re-encoding method requires high computational cost. This paper proposes a low-complexity AVC-to-SVC spatial transcoder based on coarse-level mode mapping for video conferencing scenes. First, to omit unnecessary motion estimations (ME) for layers with reduced resolution, an ME skipping scheme based on AVC mode distribution is proposed with an adaptive search range. Then a probability-profile based scheme is proposed for further mode skipping. After that, three coarse-level mode-mapping methods are presented for fast mode decision, and the adaptive usage of the three methods is discussed. Finally, motion vector (MV) refinement is introduced for further lower-layer time reduction. As for the top layer, direct encapsulation is proposed to preserve better quality, and another scheme involving inter-layer predictions is also provided for bandwidth-crucial applications. Simulation results show that the proposed transcoder achieves up to 92.6% time reduction without significant coding efficiency loss compared to the re-encoding method.
-
Peng SONG, Shuhong XU, Wee Teck FONG, Ching Ling CHIN, Gim Guan CHUA, ...
Article type: PAPER
Subject area: Signal Processing
2012 Volume E95.D Issue 5 Pages
1324-1331
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
The development of new technologies has undoubtedly promoted the advances of modern education, among which Virtual Reality (VR) technologies have made the education more visually accessible for students. However, classroom education has been the focus of VR applications whereas not much research has been done in promoting sports education using VR technologies. In this paper, an immersive VR system is designed and implemented to create a more intuitive and visual way of teaching tennis. A scalable system architecture is proposed in addition to the hardware setup layout, which can be used for various immersive interactive applications such as architecture walkthroughs, military training simulations, other sports game simulations, interactive theaters, and telepresent exhibitions. Realistic interaction experience is achieved through accurate and robust hybrid tracking technology, while the virtual human opponent is animated in real time using shader-based skin deformation. Potential future extensions are also discussed to improve the teaching/learning experience.
-
Jing WANG, Guangda SU
Article type: LETTER
Subject area: Image Processing
2012 Volume E95.D Issue 5 Pages
1332-1335
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
Sparse representation based classification (SRC) has emerged as a new paradigm for solving face recognition problems. Further research found that the main limitation of SRC is the assumption of pixel-accurate alignment between the test image and the training set. A. Wagner used a series of linear programs that iteratively minimize the sparsity of the registration error. In this paper, we propose another face registration method, called the three-point positioning method. Experiments show that our proposed method achieves better performance.
-
Chien-Sheng CHEN, Jium-Ming LIN, Wen-Hsiung LIU, Ching-Lung CHI
Article type: LETTER
Subject area: Signal Processing
2012 Volume E95.D Issue 5 Pages
1336-1340
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
To achieve more accurate measurements of the mobile station (MS) location, it is possible to integrate many kinds of measurements. In this paper, we propose several simpler methods that utilize the time of arrival (TOA) at three base stations (BSs) and the angle of arrival (AOA) information at the serving BS to estimate the location of the MS in non-line-of-sight (NLOS) environments. From a geometric viewpoint, each TOA value measured at a BS defines a circle. Rather than applying the nonlinear circular lines of position (LOP), the proposed methods use linear LOP, which makes determining the MS location much easier. Numerical results demonstrate that the calculation time using linear LOP is much less than that using circular LOP. Although the location precision using linear LOP is reduced slightly, the proposed methods can still provide a precise solution for the MS location while greatly reducing the computational effort. In addition, the proposed methods can mitigate the NLOS effect with little effort, simply by applying the weighted sum of the intersections between different linear LOP and the AOA line, without requiring a priori knowledge of NLOS error statistics. Simulation results show that the proposed methods can always yield superior performance in comparison with the Taylor series algorithm (TSA) and the hybrid lines of position algorithm (HLOP).
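The geometric idea in this abstract can be illustrated with a minimal sketch. Subtracting the circle equation of one BS from that of another cancels the quadratic terms and leaves a straight line (a linear LOP); two such lines intersect at the MS position when the ranges are noise-free. The function names and the noise-free line-of-sight setting are assumptions made for illustration only; the paper's weighting of intersections with the AOA line for NLOS mitigation is not reproduced here.

```python
import math


def linear_lop(bs_i, r_i, bs_j, r_j):
    """Return (a, b, c) of the line a*x + b*y = c obtained by subtracting
    the TOA circle of BS j from the TOA circle of BS i.

    Circle i: (x - xi)^2 + (y - yi)^2 = r_i^2 (and likewise for j).
    """
    (xi, yi), (xj, yj) = bs_i, bs_j
    a = 2.0 * (xj - xi)
    b = 2.0 * (yj - yi)
    c = r_i ** 2 - r_j ** 2 + xj ** 2 + yj ** 2 - xi ** 2 - yi ** 2
    return a, b, c


def intersect(line1, line2):
    """Intersect two lines given as (a, b, c) using Cramer's rule."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel; no unique intersection")
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y


def locate(bs, ranges):
    """Estimate the MS position from three BS positions and TOA ranges
    (hypothetical helper: intersects the LOPs of pairs (0, 1) and (0, 2))."""
    l01 = linear_lop(bs[0], ranges[0], bs[1], ranges[1])
    l02 = linear_lop(bs[0], ranges[0], bs[2], ranges[2])
    return intersect(l01, l02)
```

For example, with BSs at (0, 0), (10, 0) and (0, 10) and exact ranges to an MS at (3, 4), `locate` recovers (3, 4) by solving two linear equations instead of intersecting circles.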
-
Woosung JUNG, Eunjoo LEE, Chisu WU
Article type: SURVEY PAPER
Subject area: Software Engineering
2012 Volume E95.D Issue 5 Pages
1384-1406
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
This paper presents the fundamental concepts, overall process and recent research issues of Mining Software Repositories (MSR). Data sources such as source control systems, bug tracking systems and archived communications, as well as the data types and techniques used for general MSR problems, are also presented. Finally, evaluation approaches, opportunities and challenges are given.
-
Yangjie CAO, Hongyang SUN, Depei QIAN, Weiguo WU
Article type: PAPER
Subject area: Fundamentals of Information Systems
2012 Volume E95.D Issue 5 Pages
1407-1416
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
The proliferation of many-core architectures has led to the explosive development of parallel applications using programming models such as OpenMP, TBB, and Cilk/Cilk++. With an increasing number of cores, however, it becomes even harder to efficiently schedule parallel applications on these resources, since current many-core runtime systems still lack effective mechanisms to support collaborative scheduling of these applications. In this paper, we study feedback-driven adaptive scheduling based on work stealing, which provides an efficient solution for concurrently executing a set of applications on many-core systems. To dynamically estimate the number of cores desired by each application, a stable feedback-driven adaptive algorithm, called SAWS, is proposed using active workers and the length of active deques, which captures the runtime characteristics of the applications well. Furthermore, a prototype system is built by extending the Cilk runtime system, and the experimental results, which are obtained on a Sun Fire server, show that SAWS has advantages for scheduling concurrent parallel applications. Specifically, compared with the existing algorithms A-Steal and WS-EQUI, SAWS improves performance by up to 12.43% and 21.32%, respectively, with respect to mean response time, and by 25.78% and 46.98%, respectively, with respect to processor utilization.
-
Kuo-Yi CHEN, Chin-Yang LIN, Tien-Yan MA, Ting-Wei HOU
Article type: PAPER
Subject area: Software System
2012 Volume E95.D Issue 5 Pages
1417-1426
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
With more digital home appliances and network devices having OSGi as the software management platform, the power-saving capability of the OSGi platform has become a critical issue. This paper is aimed at improving the power-efficiency of the OSGi platform, i.e. reducing the energy consumption with minimum performance degradation. The key to this study is an efficient power-saving technique which exploits the runtime information already available in a Java virtual machine (JVM), the base software of the OSGi platform, to best determine the timing of performing DVFS (Dynamic Voltage and Frequency Scaling). This, technically, involves a phase detection scheme that identifies the memory phase of the OSGi-enabled device/server in a correct and almost effortless way. The overhead of the power-saving procedure is thus minimized, and the system performance is well maintained. We have implemented and evaluated the proposed power-saving approach on an OSGi server, where the Apache Felix OSGi implementation and the DaCapo benchmarks were applied. The results show that this approach can achieve real power-efficiency for the OSGi platform, in which the power consumption is significantly reduced and the performance remains highly competitive, compared with the other power-saving techniques.
-
Chang-Sup PARK, Jun Pyo PARK, Yon Dohn CHUNG
Article type: PAPER
Subject area: Data Engineering, Web Information Systems
2012 Volume E95.D Issue 5 Pages
1427-1435
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
Wireless broadcasting of heterogeneous XML data has become popular in many applications, where energy-efficient processing of user queries at the mobile client is a critical issue. This paper proposes a new index structure for a wireless stream of heterogeneous XML data to enhance tuning time performance in processing path queries on the stream. The index, called PrefixSummary, stores for each location path in the XML data the address of a bucket in the stream which contains an XML node satisfying the location path and appearing first in the stream. We present algorithms to generate a broadcast stream with the proposed index and to process a path query on the stream efficiently by exploiting the index. We also suggest a replication scheme of PrefixSummary within a broadcast cycle to reduce latency in query processing. By analysis and experiment we show that the proposed PrefixSummary approach can reduce tuning time for processing path queries significantly, while it can also achieve reasonable access time performance by means of replication of the index over the broadcast stream.
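The mapping described in this abstract (each location path points to the first bucket in the stream that contains a matching node) can be sketched as a simple dictionary. This is only an illustration under assumptions: the bucket representation, the function names and the exact-path lookup are hypothetical, and the paper's actual index layout, query algorithm and replication scheme are not reproduced.

```python
def build_prefix_summary(stream):
    """Build a path -> first-bucket-address map.

    `stream` is an iterable of (bucket_address, location_paths) pairs given
    in broadcast order, where location_paths lists the location paths of the
    XML nodes stored in that bucket (an assumed, simplified representation).
    """
    index = {}
    for address, paths in stream:
        for path in paths:
            # setdefault keeps the address of the earliest bucket only.
            index.setdefault(path, address)
    return index


def first_bucket(index, path):
    """Return the address of the first bucket holding a node that satisfies
    `path`, or None if the path does not occur in the stream."""
    return index.get(path)
```

With such a map, a client answering a path query could sleep until the returned bucket address instead of listening to every bucket, which is the tuning-time saving the abstract refers to.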
-
Xuemin ZHAO, Yuhong GUO, Jian LIU, Yonghong YAN, Qiang FU
Article type: PAPER
Subject area: Information Network
2012 Volume E95.D Issue 5 Pages
1436-1445
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
In this paper, a logarithmic adaptive quantization projection (LAQP) algorithm for digital watermarking is proposed. Conventional quantization index modulation uses a fixed quantization step in the watermark embedding procedure, which leads to poor fidelity. Moreover, the conventional methods are sensitive to the value-metric scaling attack. The LAQP method combines the quantization projection scheme with a perceptual model. In comparison to some conventional quantization methods with a perceptual model, LAQP only needs to calculate the perceptual model in the embedding procedure, avoiding the decoding errors introduced by differences between the perceptual models used in the embedding and decoding procedures. Experimental results show that the proposed watermarking scheme achieves better fidelity and is robust against common signal processing attacks. More importantly, the proposed scheme is invariant to the value-metric scaling attack.
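As background for the baseline this abstract criticizes, the following is a minimal sketch of conventional scalar quantization index modulation with a fixed step. It is not the LAQP algorithm; the function names and the single-sample setting are assumptions for illustration. Each bit selects one of two interleaved quantization lattices, and the decoder picks the nearer lattice.

```python
def qim_embed(x, bit, step):
    """Embed one bit into sample x by quantizing it to the lattice for `bit`.

    Bit 0 uses multiples of `step`; bit 1 uses the same lattice shifted by
    step / 2.
    """
    offset = bit * step / 2.0
    return round((x - offset) / step) * step + offset


def qim_decode(y, step):
    """Decode one bit by choosing the lattice whose nearest point is closer."""
    d0 = abs(y - qim_embed(y, 0, step))
    d1 = abs(y - qim_embed(y, 1, step))
    return 0 if d0 <= d1 else 1
```

Because the decoder assumes the same fixed step as the embedder, scaling the watermarked value moves it across lattice boundaries: a sample carrying bit 1 at value 10 with step 4 decodes as bit 0 after scaling by 1.2. This illustrates the scaling sensitivity mentioned in the abstract.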
-
HyunYong LEE, Masahiro YOSHIDA, Akihiro NAKAO
Article type: PAPER
Subject area: Information Network
2012 Volume E95.D Issue 5 Pages
1446-1453
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
Despite its great success, BitTorrent suffers from the content unavailability problem where peers cannot complete their content downloads due to some missing chunks, which is caused by a shortage of seeders who hold the content in its entirety. The multi-swarm collaboration approach is a natural choice for improving content availability, since content unavailability cannot be overcome by one swarm easily. Most existing multi-swarm collaboration approaches, however, suffer from content-related limitations, which limit their application scopes. In this paper, we introduce a new kind of multi-swarm collaboration utilizing a swarm as temporal storage. In a nutshell, the collaborating swarms cache some chunks of each other that are likely to be unavailable before the content unavailability happens and share the cached chunks when the content unavailability happens. Our approach enables any swarms to collaborate with each other without the content-related limitations. Simulation results show that our approach increases the number of download completions by over 50% (26%) compared to normal BitTorrent (existing bundling approach) with low overhead. In addition, our approach shows around 30% improved download completion time compared to the existing bundling approach. The results also show that our approach enables the peers participating in our approach to enjoy better performance than other peers, which can be a peer incentive.
-
Kai LI, Yanmeng GUO, Qiang FU, Junfeng LI, Yonghong YAN
Article type: PAPER
Subject area: Speech and Hearing
2012 Volume E95.D Issue 5 Pages
1454-1464
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
Traditional two-microphone noise reduction algorithms for dealing with highly nonstationary directional noises generally use the direction of arrival or phase difference information. The performance of these algorithms deteriorates when diffuse noises coexist with nonstationary directional noises in realistic adverse environments. In this paper, we present a two-channel noise reduction algorithm using a spatial-information-based speech estimator and a spatial-information-controlled soft-decision noise estimator to improve the noise reduction performance in realistic nonstationary noisy environments. A target presence probability estimator based on Bayes' rule using both phase difference and magnitude squared coherence is proposed for the soft decision in noise estimation, so that the two cues can share complementary advantages when both directional noises and diffuse noises are present. The performance of the proposed two-microphone noise reduction algorithm is evaluated in terms of noise reduction, log-spectral distance (LSD), and word recognition rate (WRR) of a distant-talking ASR system in a real room's noisy environment. Experimental results show that the proposed algorithm achieves better noise suppression than the comparative dual-channel noise reduction algorithms without further distorting the desired signal components.
-
Takanobu OBA, Takaaki HORI, Atsushi NAKAMURA, Akinori ITO
Article type: PAPER
Subject area: Speech and Hearing
2012 Volume E95.D Issue 5 Pages
1465-1474
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
This paper describes a technique for overcoming the model shrinkage problem in automatic speech recognition (ASR), which allows application developers and users to control the model size with less degradation of accuracy. Recently, models for ASR systems have tended to be large, and this can constitute a bottleneck for developers and users without special knowledge of ASR when they introduce the ASR function. Specifically, although discriminative language models (DLMs) have gained increasing attention as an approach for improving recognition accuracy, they are usually designed in a high-dimensional parameter space. Our proposed method can be applied to linear models including DLMs, in which the score of an input sample is given by the inner product of its features and the model parameters. It can shrink models with simple computation by obtaining simple statistics, which are square sums of feature values appearing in a data set. Our experimental results show that our proposed method can shrink a DLM with little degradation in accuracy and performs properly whether or not the data for obtaining the statistics are the same as the data for training the model.
-
Nitin SINGHAL, Jin Woo YOO, Ho Yeol CHOI, In Kyu PARK
Article type: PAPER
Subject area: Image Processing and Video Processing
2012 Volume E95.D Issue 5 Pages
1475-1484
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
In this paper, we analyze the key factors underlying the implementation, evaluation, and optimization of image processing and computer vision algorithms on embedded GPUs using the OpenGL ES 2.0 shader model. First, we present the characteristics of the embedded GPU and its inherent advantages when compared to the embedded CPU. Additionally, we propose techniques to achieve increased performance with optimized shader design. To show the effectiveness of the proposed techniques, we employ cartoon-style non-photorealistic rendering (NPR), speeded-up robust feature (SURF) detection, and stereo matching as our example algorithms. Performance is evaluated in terms of the execution time and the speed-up achieved in comparison with the implementation on an embedded CPU.
-
Jong-Min LEE, Whoi-Yul KIM
Article type: PAPER
Subject area: Image Recognition, Computer Vision
2012 Volume E95.D Issue 5 Pages
1485-1493
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
Determining the rotation angle between two images is essential when comparing images that may include rotational variation. While there are three representative methods that utilize the phases of Zernike moments (ZMs) to estimate rotation angles, very little work has been done to compare the performances of these methods. In this paper, we compare the performances of these three methods and propose a new, angular radial transform (ART)-based method. Our method extends Revaud et al.'s method [1] and uses the phase of angular radial transform coefficients instead of ZMs. We show that our proposed method outperforms the ZM-based method using the MPEG-7 shape dataset when computation times are compared or in terms of the root mean square error vs. coverage.
-
Wei ZHOU, Alireza AHRARY, Sei-ichiro KAMATA
Article type: PAPER
Subject area: Image Recognition, Computer Vision
2012 Volume E95.D Issue 5 Pages
1494-1505
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
In this paper, we propose a novel approach for presenting the local features of digital images using 1D Local Patterns by Multi-Scans (1DLPMS). We also consider extensions and simplifications of the proposed approach for facial image analysis. The proposed approach consists of three steps. In the first step, the gray values of pixels in the image are represented as a vector giving the local neighborhood intensity distributions of the pixels. Then, multi-scans are applied to capture different spatial information in the image, with the advantage of less computation than traditional approaches such as Local Binary Patterns (LBP). The second step is encoding the local features based on different encoding rules using 1D local patterns. This transformation is expected to be less sensitive to illumination variations while preserving the appearance of images embedded in the original gray scale. In the final step, Grouped 1D Local Patterns by Multi-Scans (G1DLPMS) is applied to make the proposed approach computationally simpler and easy to extend. Next, we further formulate a boosted algorithm to extract the most discriminant local features. The evaluation results demonstrate that the proposed approach outperforms the conventional approaches in terms of accuracy in applications of face recognition, gender estimation and facial expression.
-
Kazuhiro TOKUNAGA, Nobuyuki KAWABATA, Tetsuo FURUKAWA
Article type: PAPER
Subject area: Biocybernetics, Neurocomputing
2012 Volume E95.D Issue 5 Pages
1506-1518
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
We propose a novel modular network called the Self-Evolving Modular Network (SEEM). The SEEM has a modular network architecture with a graph structure and the following advantages: (1) new modules are added incrementally to allow the network to adapt in a self-organizing manner, and (2) the graph's paths are formed based on the relationships between the models represented by modules. The SEEM is expected to be applicable to evolving functions of an autonomous robot in a self-organizing manner through interaction with the robot's environment and categorizing large-scale information. This paper presents the architecture and an algorithm for the SEEM. Moreover, the performance characteristics and effectiveness of the network are shown by simulations using cubic functions and a set of 3D objects.
-
Chaochao FENG, Zhonghai LU, Axel JANTSCH, Minxuan ZHANG
Article type: LETTER
Subject area: Computer System
2012 Volume E95.D Issue 5 Pages
1519-1522
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
In this paper, we propose a 1-cycle high-performance 3D bufferless router with a 3-stage permutation network. The proposed router utilizes the 3-stage permutation network instead of the serialized switch allocator and 7×7 crossbar to achieve a frequency of 1.25 GHz in TSMC 65 nm technology. Compared with the other two 3D bufferless routers, the proposed router occupies less area and consumes less power. Simulation results under both synthetic and application workloads illustrate that the proposed router achieves lower average packet latency than the other two 3D bufferless routers.
-
Hwan Sik YUN, Kiho CHO, Nam Soo KIM
Article type: LETTER
Subject area: Information Network
2012 Volume E95.D Issue 5 Pages
1523-1526
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
Acoustic data transmission is a technique which embeds data in a sound wave imperceptibly and detects it at a receiver. The data are embedded in an original audio signal and transmitted through the air by playing back the data-embedded audio using a loudspeaker. At the receiver, the data are extracted from the received audio signal captured by a microphone. In our previous work, we proposed an acoustic data transmission system designed based on phase modification of the modulated complex lapped transform (MCLT) coefficients. In this paper, we propose the spectral magnitude adjustment (SMA) technique which not only enhances the quality of the data-embedded audio signal but also improves the transmission performance of the system.
-
Xiaodong DENG, Mengtian RONG, Tao LIU
Article type: LETTER
Subject area: Information Network
2012 Volume E95.D Issue 5 Pages
1527-1530
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
As RFID technology is being more widely adopted, it is fairly common to read mobile tags using RFID systems, such as packages on conveyor belts and unit loads on pallet jacks or forklift trucks. In RFID systems, multiple tags use a shared medium for communicating with a reader. It is quite possible that tags will exit the reading area without being read, which results in tag leaking. In this letter, a reliable tag anti-collision algorithm for mobile tags is proposed. It reliably estimates the expectation of the number of tags arriving during a time slot when new tags continually enter the reader's reading area and no tag leaves without being read. In addition, it gives priority to tags that arrived early among read cycles and applies the expectation of the number of tags arriving during a time slot to the determination of the number of slots in the initial inventory round of the next read cycle. Simulation results show that the reliability of the proposed algorithm is close to that of the DFSA algorithm when the expectation of the number of tags entering the reading area during a time slot is given, and is better than that of the DFSA algorithm when the number of time slots in the initial inventory round of the next read cycle is set to 1, assuming that the number of tags arriving during a time slot follows a Poisson distribution.
-
Jeonghun YOON, Dae-Won KIM
Article type: LETTER
Subject area: Artificial Intelligence, Data Mining
2012 Volume E95.D Issue 5 Pages
1531-1535
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
Classification based on predictive association rules (CPAR) is a widely used associative classification method. Despite its efficiency, the analysis results obtained by CPAR will be influenced by missing values in the data sets, and thus it is not always possible to correctly analyze the classification results. In this letter, we improve CPAR to deal with the problem of missing data. The effectiveness of the proposed method is demonstrated using various classification examples.
-
Shi-Ze GUO, Zhe-Ming LU, Guang-Yu KANG, Zhe CHEN, Hao LUO
Article type: LETTER
Subject area: Artificial Intelligence, Data Mining
2012 Volume E95.D Issue 5 Pages
1536-1538
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
The small-world property is common to many real-life social, technological and biological networks. Small-world networks distinguish themselves from others by their high clustering coefficient and short average path length. In the past dozen years, many probabilistic small-world networks and some deterministic small-world networks have been proposed utilizing various mechanisms. In this Letter, we propose a new deterministic small-world network model by first constructing a binary-tree structure and then adding links between each pair of brother nodes and links between each grandfather node and its four grandson nodes. Furthermore, we give the analytic solutions to several topological characteristics, which show that the proposed model is a small-world network.
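The construction described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a complete binary tree with heap-style node labels (the children of node v are 2v and 2v+1), and the function names and the depth of 6 are hypothetical. It measures the two quantities the abstract names, the clustering coefficient and the average path length, numerically rather than analytically.

```python
from collections import deque
from itertools import combinations


def build_network(depth):
    """Complete binary tree plus brother links and grandfather-grandson links."""
    n = 2 ** (depth + 1) - 1
    adj = {v: set() for v in range(1, n + 1)}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for v in range(1, n + 1):
        left, right = 2 * v, 2 * v + 1
        if right <= n:
            link(v, left)        # tree edge
            link(v, right)       # tree edge
            link(left, right)    # link between brother nodes
        for g in range(4 * v, 4 * v + 4):
            if g <= n:
                link(v, g)       # grandfather-grandson link
    return adj


def clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_path_length(adj):
    """Mean shortest-path length over all ordered node pairs (BFS per node)."""
    total = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    n = len(adj)
    return total / (n * (n - 1))


adj = build_network(depth=6)
mean_c = sum(clustering(adj, v) for v in adj) / len(adj)
print(f"nodes={len(adj)} clustering={mean_c:.3f} "
      f"path_length={average_path_length(adj):.3f}")
```

Running the sketch for increasing depths gives a quick numerical check of how the clustering coefficient and average path length behave as the network grows.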
-
Kwanho KIM, Jae-Yoon JUNG, Jonghun PARK
Article type: LETTER
Subject area: Office Information Systems, e-Business Modeling
2012 Volume E95.D Issue 5 Pages
1539-1542
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
Information diffusion analysis in social networks is of significance since it enables us to deeply understand dynamic social interactions among users. In this paper, we introduce approaches to discovering the information diffusion process in social networks based on process mining. Process mining techniques are applied from three perspectives: social network analysis, process discovery and community recognition. We then present experimental results using real-life social network data. The proposed techniques are expected to be employed as new analytical tools in online social networks such as blogs and wikis for company marketers, politicians, news reporters and online writers.
-
Jin Soo SEO
Article type: LETTER
Subject area: Speech and Hearing
2012 Volume E95.D Issue 5 Pages
1543-1546
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
Speaker change detection involves the identification of the time indices of an audio stream, where the identity of the speaker changes. This paper proposes novel measures for speaker change detection over the centroid model, which divides the feature space into non-overlapping clusters for effective speaker-change comparison. The centroid model is a computationally-efficient variant of the widely-used mixture-distribution based background models for speaker recognition. Experiments on both synthetic and real-world data were performed; the results show that the proposed approach yields promising results compared with the conventional statistical measures.
-
Cagatay KARABAT, Hakan ERDOGAN
Article type: LETTER
Subject area: Image Recognition, Computer Vision
2012 Volume E95.D Issue 5 Pages
1547-1551
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
Face image hashing is an emerging method used in biometric verification systems. In this paper, we propose a novel face image hashing method based on a new technique called discriminative projection selection. We apply the Fisher criterion for selecting the rows of a random projection matrix in a user-dependent fashion. Moreover, another contribution of this paper is to employ a bimodal Gaussian mixture model at the quantization step. Our simulation results on three different databases demonstrate that the proposed method has superior performance in comparison to previously proposed random projection based methods.
-
Chenbo SHI, Guijin WANG, Xiaokang PEI, Bei HE, Xinggang LIN
Article type: LETTER
Subject area: Image Recognition, Computer Vision
2012 Volume E95.D Issue 5 Pages
1552-1555
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
In this paper, we propose an interleaving updating framework of disparity and confidence map (IUFDCM) for stereo matching to eliminate the redundant and interfering information from unreliable pixels. Whereas other propagation algorithms use matching cost as messages, IUFDCM updates the disparity map and the confidence map in an interleaving manner. Based on the Confidence-based Support Window (CSW), the disparity map is updated adaptively to alleviate the effect of input parameters. Reassigning unreliable pixels based on reliable messages keeps the ground truth with larger probability. Consequently, the confidence map is updated according to the previous disparity map and the left-right consistency. The top ranks on the Middlebury benchmark corresponding to different error thresholds demonstrate that our algorithm is competitive with the best stereo matching algorithms at present.
-
Hong BAO, De XU, Yingjun TANG
Article type: LETTER
Subject area: Image Recognition, Computer Vision
2012 Volume E95.D Issue 5 Pages
1556-1559
Published: May 01, 2012
Released on J-STAGE: May 01, 2012
JOURNAL
FREE ACCESS
Visual saliency detection provides an alternative methodology for image description in many applications such as adaptive content delivery and image retrieval. One of the main aims of visual attention in computer vision is to detect and segment the salient regions in an image. In this paper, we employ matrix decomposition to detect salient objects in natural images. To efficiently eliminate high-contrast noise regions in the background, we integrate global context information into saliency detection. Therefore, the most salient region can be easily selected as the one which is globally most isolated. The proposed approach intrinsically provides an alternative methodology to model attention with low implementation complexity. Experiments show that our approach achieves much better performance than the existing state-of-the-art methods.