research-article

Linguistic summarization using a weighted N-gram language model based on the similarity of time-series data

Authors: Kasumi Aoki, Ichiro Kobayashi

2016 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)

Pages 595 - 601

https://doi.org/10.1109/FUZZ-IEEE.2016.7737741

Published: 01 July 2016

Abstract

This paper describes a method for verbalizing trends in time-series data. As an example, we use the Nikkei Stock Average and develop a method that generates natural language sentences describing how the stock price moves in the market. The method has three steps. First, we cluster all the time series, including the newly observed series that is the target to be verbalized, by spectral clustering with Dynamic Time Warping (DTW) distance as the similarity metric. Second, we build a bi-gram language model for the target series as a weighted combination of the bi-gram language models of the other time series in the same cluster, with each weight determined by the similarity between the target series and that series. Third, we generate a linguistic summary of the target series by using dynamic programming to find the most likely word sequence under the weighted bi-gram model. In experiments with various numbers of clusters in the spectral clustering, we confirmed that the method generates natural language sentences that properly describe the stock price trends.
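The steps in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy series and sentences, the conversion of a DTW distance d into a weight 1 / (1 + d), and all function names are assumptions made for illustration. The spectral clustering step is omitted, so the listed series are simply treated as members of one cluster.

```python
import math
from collections import defaultdict

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) Dynamic Time Warping distance."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def bigram_model(sentence):
    """Relative-frequency bi-gram probabilities P(w2 | w1) for one sentence."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    counts = defaultdict(lambda: defaultdict(int))
    for w1, w2 in zip(words, words[1:]):
        counts[w1][w2] += 1
    return {w1: {w2: c / sum(nxt.values()) for w2, c in nxt.items()}
            for w1, nxt in counts.items()}

def weighted_bigram(target, cluster):
    """Mix the members' bi-gram models, weighted by similarity to the target.

    cluster is a list of (series, sentence) pairs. The weight 1 / (1 + DTW)
    is an assumption of this sketch; weights are normalized to sum to 1.
    The mixed scores are not renormalized per preceding word.
    """
    raw = [1.0 / (1.0 + dtw_distance(target, s)) for s, _ in cluster]
    total = sum(raw)
    mixed = defaultdict(lambda: defaultdict(float))
    for w, (_, sentence) in zip(raw, cluster):
        for w1, nxt in bigram_model(sentence).items():
            for w2, p in nxt.items():
                mixed[w1][w2] += (w / total) * p
    return mixed

def most_likely_sentence(model, max_len=20):
    """Viterbi-style DP: keep, per word, the best-scoring path ending there."""
    best = {"<s>": (0.0, [])}
    finished = (-math.inf, [])
    for _ in range(max_len + 1):
        step = {}
        for w1, (score, path) in best.items():
            for w2, p in model.get(w1, {}).items():
                cand = score + math.log(p)
                if w2 == "</s>":
                    if cand > finished[0]:
                        finished = (cand, path)
                elif cand > step.get(w2, (-math.inf,))[0]:
                    step[w2] = (cand, path + [w2])
        best = step
    return " ".join(finished[1])

# Toy data (hypothetical): previously described series in the same cluster.
cluster = [
    ([10, 11, 12, 13, 15], "the price rose steadily"),
    ([10, 12, 15, 18, 20], "the price rose sharply"),
    ([14, 13, 12, 11, 10], "the price fell slightly"),
]
target = [10, 11, 12, 13, 14]  # newly observed series to verbalize
model = weighted_bigram(target, cluster)
summary = most_likely_sentence(model)
print(summary)
```

On this toy input the first series is by far the closest to the target under DTW, so its bi-grams dominate the mixture and the decoded sentence follows its wording. A real system would need many more sentences per series and some form of smoothing, neither of which is shown here.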

Cited By

Qian X, Koh E, Du F, Kim S, Chan J, Rossi R, Malik S, Lee T (2021). Generating Accurate Caption Units for Figure Captioning. Proceedings of the Web Conference 2021, pp. 2792-2804. Online publication date: 19-Apr-2021.
https://dl.acm.org/doi/10.1145/3442381.3449923

Publisher: IEEE Press
