Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media (such as blog articles, forum posts, product reviews, and tweets). This has led to an increasing demand for powerful software tools that help people manage and analyze vast amounts of text data effectively and efficiently. Unlike data generated by computer systems or sensors, text data are usually generated directly by humans and capture semantically rich content. As such, text data are especially valuable for discovering knowledge about human opinions and preferences, in addition to the many other kinds of knowledge that we encode in text. In contrast to structured data, which conform to well-defined schemas and are thus relatively easy for computers to handle, text has less explicit structure, so a computer must process it before it can make use of the content it encodes. Current natural language processing technology has not yet reached the point where a computer can precisely understand natural language text, but a wide range of statistical and heuristic approaches to managing and analyzing text data have been developed over the past few decades. These approaches are usually very robust and can be applied to text data in any natural language and about any topic.
This book provides a systematic introduction to many of these approaches, with an emphasis on the most useful knowledge and skills required to build a variety of practical text information systems. Because humans understand natural language far better than computers do, effective human involvement in a text information system is generally needed, and text information systems often serve as intelligent assistants for humans. Depending on how a text information system collaborates with humans, we distinguish two kinds of text information systems. The first kind is information retrieval systems, which include search engines and recommender systems. They help users find, within a large collection of text data, the most relevant text data actually needed to solve a specific application problem, thus effectively turning big raw text data into much smaller relevant text data that humans can process more easily. The second kind is text mining application systems. They help users analyze patterns in text data to extract and discover actionable knowledge that is directly useful for task completion or decision making, thus providing more direct task support. This book covers the major concepts, techniques, and ideas in information retrieval and text data mining from a practical viewpoint. It includes many hands-on exercises designed with a companion software toolkit (MeTA) to help readers learn how to apply information retrieval and text mining techniques to real-world text data, and how to experiment with and improve some of the algorithms for interesting application tasks. This book can be used as a textbook by computer science undergraduate and graduate students and by library and information scientists, or as a reference book for practitioners working on problems in managing and analyzing text data.
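As a rough illustration of how a retrieval system narrows a large collection down to a small set of relevant documents, the following is a minimal sketch in Python. It scores each document against a query with a simple TF-IDF weighting chosen only for brevity. The function names `tokenize` and `rank`, the whitespace tokenizer, and the exact weighting formula are illustrative assumptions, not the book's method or MeTA's API.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenization (an assumption for this sketch;
    # real toolkits use much richer text analyzers).
    return text.lower().split()


def rank(query: str, docs: list[str], k: int = 2) -> list[tuple[int, float]]:
    """Return up to k (doc_index, score) pairs, best first, using TF-IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term in tf:
                # Log-scaled term frequency times a smoothed inverse document frequency.
                score += (1 + math.log(tf[term])) * math.log((n + 1) / df[term])
        scores.append((i, score))
    # Keep only documents matching at least one query term, highest score first.
    matched = [pair for pair in scores if pair[1] > 0]
    matched.sort(key=lambda pair: pair[1], reverse=True)
    return matched[:k]


docs = [
    "search engines help users find relevant web pages",
    "topic models discover themes in text data",
    "recommender systems suggest relevant products to users",
    "sentiment analysis mines opinions from product reviews",
]
print(rank("relevant search results", docs))
```

Here the four-document "collection" is reduced to the two documents that share query terms, with the document matching the rarer term "search" ranked first. Production systems would instead use an inverted index and a tuned ranking function such as BM25.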
Chapters
- C. C. Aggarwal. 2015. Data Mining - The Textbook. Springer. DOI: 10.1007/978-3-319-14142-8.Google Scholar
- C. C. Aggarwal and C. Zhai, editors. 2012. Mining Text Data. Springer. DOI: 10.1007/978-1-4614-3223-4.Google Scholar
- J. Allen. 1995. Natural Language Understanding. 2nd ed. Benjamin-Cummings Publishing Co., Inc., Redwood City, CA.Google Scholar
- G. Amati and C. J. Van Rijsbergen. October 2002. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Trans. Inf. Syst., 20(4):357–389. DOI: 10.1145/582415.582416.Google ScholarDigital Library
- A. U. Asuncion, M. Welling, P. Smyth, and Y. W. Teh. 2009. On smoothing and inference for topic models. In UAI 2009, Proc. of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, June 18-21, 2009, pp. 27–34.Google Scholar
- R. A. Baeza-Yates and B. A. Ribeiro-Neto. 2011. Modern Information Retrieval - the concepts and technology behind search. 2nd ed. Pearson Education Ltd., Harlow, UK. http://www.mir2ed.org/.Google Scholar
- Y. Bar-Hillel, The Present Status of Automatic Translation of Languages, in Advances in Computers, vol. 1 (1960), pp. 91–163.Google Scholar
- R. Belew. 2008. Finding Out About: A Cognitive Perspective on Search Engine Technology and the WWW. Cambridge University Press.Google Scholar
- N. J. Belkin and W. B. Croft. 1992. Information filtering and information retrieval: Two sides of the same coin? Commun. ACM, 35(12):29–38. DOI: 10.1145/138859.138861.Google ScholarDigital Library
- C. M. Bishop. 2006. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ.Google Scholar
- D. M. Blei, A. Y. Ng, and M. I. Jordan. March 2003. Latent Dirichlet Allocation. J. of Mach. Learn. Res., 3:993–1022.Google Scholar
- J. S. Breese, D. Heckerman, and C. Kadie. 1998. Empirical analysis of predictive algorithms for collaborative filtering. In Proc. of the Fourteenth Conference on Uncertainty in Artificial Intelligence, UAI'98, Morgan Kaufmann Publishers Inc. pp. 43–52, San Francisco, CA. http://dl.acm.org/citation.cfm?id=2074094.2074100.Google Scholar
- P. F. Brown, P. V. deSouza, R. L. Mercer, V. J. Della Pietra, and J. C. Lai. 1992. Class-based N-gram Models of Natural Language. Comput. Linguist., 18(4):467–479.Google Scholar
- C. Buckley. 1994. Automatic query expansion using smart: Trec 3. In Proc. of The third Text REtrieval Conference (TREC-3, pp. 69–80.Google Scholar
- S. Büttcher, C. Clarke, and G. V. Cormack. 2010. Information Retrieval: Implementing and Evaluating Search Engines. The MIT Press.Google ScholarDigital Library
- F. Cacheda, V. Carneiro, D. Fernández, and V. Formoso. 2011. Comparison of collaborative filtering algorithms: Limitations of current techniques and proposals for scalable, high-performance recommender systems. ACM Trans. Web, 5(1):2:1–2:33. DOI: 10.1145/1921591.1921593.Google ScholarDigital Library
- C. Campbell and Y. Ying. 2011. Learning with Support Vector Machines. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers. DOI: 10.2200/S00324ED1V01Y201102AIM010.Google Scholar
- J. Carbonell and J. Goldstein. 1998. The Use of MMR, Diversity-based Reranking for Reordering Documents and Producing Summaries. In Proc. of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '98,ACM, pp. 335–336, New York. DOI: 10.1145/290941.291025Google ScholarDigital Library
- C.-C. Chang and C.-J. Lin. 2011. LIBSVM: A Library for Support Vector Machines. ACM Trans. Intell. Syst. Technol., 2(3):27:1–27:27.Google ScholarDigital Library
- J. Chang, S. Gerrish, C. Wang, J. L. Boyd-graber, and D. M. Blei. 2009. Reading Tea Leaves: How Humans Interpret Topic Models. In Y. Bengio, D. Schuurmans, J.D. Lafferty, C.K.I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems, Curran Associates, Inc. 22, pp. 288–296.Google Scholar
- K. W. Church and P. Hanks. 1990. Word association norms, mutual information, and lexicography. Comput. Linguist., 16(1):22–29. http://dl.acm.org/citation.cfm?id=89086.89095.Google Scholar
- T. Cover and J. Thomas. 1991. Elements of Information Theory. New York: Wiley. DOI: 10.1002/047174882XGoogle ScholarCross Ref
- B. Croft, D. Metzler, and T. Strohman. 2009. Search Engines: Information Retrieval in Practice, 1st ed., Addison-Wesley Publishing Company.Google ScholarDigital Library
- D. Das and A. F. T. Martins. 2007. A Survey on Automatic Text Summarization. Technical report, Literature Survey for the Language and Statistics II course at Carnegie Mellon University.Google Scholar
- R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. 2008. LIBLINEAR: A Library for Large Linear Classification. J. Mach. Learn. Res., 9:1871–1874.Google ScholarDigital Library
- H. Fang, T. Tao, and C. Zhai. 2004. A formal study of information retrieval heuristics. In Proc. of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '04, ACM, pp. 49–56, New York. DOI: 10.1145/1008992.1009004.Google ScholarDigital Library
- H. Fang, T. Tao, and C. Zhai. April 2011. Diagnostic evaluation of information retrieval models. ACM Trans. Inf. Syst., 29(2):7:1–7:42. DOI: 10.1145/1961209.1961210.Google ScholarDigital Library
- R. Feldman and J. Sanger. 2007. The Text Mining Handbook - Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press.Google Scholar
- E. A. Fox, M. A. Gon„alves, and R. Shen. 2012. Theoretical Foundations for Digital Libraries: The 5S (Societies, Scenarios, Spaces, Structures, Streams) Approach. Synthesis Lectures on Information Concepts, Retrieval, and Services. Morgan & Claypool Publishers. DOI: 10.2200/S00434ED1V01Y201207ICR022.Google Scholar
- W. B. Frakes and R. A. Baeza-Yates, editors. 1992. Information Retrieval: Data Structures & Algorithms. Prentice-Hall,Google ScholarDigital Library
- K. Ganesan, C. Zhai, and J. Han. 2010. Opinosis: A graph-based approach to abstractive summarization of highly redundant opinions. In Proc. of the 23rd International Conference on Computational Linguistics, COLING '10, Association for Computational Linguistics, pp. 340–348, Stroudsburg, PA.Google Scholar
- K. Ganesan, C. Zhai, and E. Viegas. 2012. Micropinion generation: an unsupervised approach to generating ultra-concise summaries of opinions. In Proc. of the 21st World Wide Web Conference 2012, WWW 2012, Lyon, France, April 16-20, 2012, pages 869–878. DOI: 10.1145/2187836.2187954Google ScholarDigital Library
- J. Gantz, and D. Reinsel. 2012. The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East, IDC Report, December, 2012.Google Scholar
- A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. 1995. Bayesian Data Analysis. Chapman & Hall.Google Scholar
- S.Ghemawat, H. Gobioff, and S.-T. Leung. 2003. The Google file system. In Proc. of the nineteenth ACM symposium on Operating systems principles (SOSP '03). ACM, New York, 29–43.Google Scholar
- M. A. Gon„alves, E. A. Fox, L. T. Watson, and N. A. Kipp. 2004. Streams, structures, spaces, scenarios, societies (5s): A formal model for digital libraries. ACM Trans. Inf. Syst., 22(2):270–312. DOI: 10.1145/984321.984325.Google ScholarDigital Library
- D. A. Grossman and O. Frieder. Kluwer, 2004. Information Retrieval - Algorithms and Heuristics, Second Edition, vol. 15 of The Kluwer International Series on Information Retrieval. DOI: 10.1007/978-1-4020-3005-5.Google Scholar
- G. Hamerly and C. Elkan. 2003. Learning the k in k-means. In Advances in Neural Information Processing Systems 16 [Neural Information Processing Systems, NIPS December 8-13, 2003, Vancouver and Whistler, British Columbia, Canada], pp. 281–288.Google Scholar
- J. Han. 2005. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA.Google Scholar
- D. Harman. 2011. Information Retrieval Evaluation. Synthesis Lectures on Information Concepts, Retrieval, and Services. Morgan & Claypool Publishers. DOI: < 10.1145/215206.215351Google Scholar
- M. A. Hearst. 2009. Search User Interfaces. 1st ed. Cambridge University Press, New York.Google Scholar
- J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. 2004. Evaluating Collaborative Filtering Recommender Systems. ACM Trans. Inf. Syst., 22(1):5–53. DOI: 10.1145/963770.963772Google ScholarDigital Library
- J. L. Hodges and E. L. Lehmann. 1970. Basic Concepts of Probability and Statistics. Holden Day, San Francisco.Google Scholar
- T. Hofmann. 1999. Probabilistic Latent Semantic Analysis. In Proc. of the Fifteenth Conference on Uncertainty in Artificial Intelligence, UAI'99, Morgan Kaufmann Publishers Inc., pp. 289–296, San Francisco, CA. DOI: 10.1145/312624.312649Google ScholarDigital Library
- A. Huang. 2008. Similarity Measures for Text Document Clustering. In Proc. of the Sixth New Zealand Computer Science Research Student Conference (NZCSRSC2008), Christchurch, New Zealand, pages 49–56.Google Scholar
- F. Jelinek. 1997. Statistical Methods for Speech Recognition. MIT Press, Cambridge, MA.Google Scholar
- J. Jiang. 2012. Information extraction from text, In Charu C. Aggarwal and ChengXiang Zhai (Eds.), Mining Text Data, Springer, pp. 11–41.Google Scholar
- S. Jiang and C. Zhai. 2014. Random walks on adjacency graphs for mining lexical relations from big text data. In 2014 IEEE International Conference on Big Data, Big Data 2014, Washington, DC, USA, October 27-30, pages 549–554. DOI: 10.1109/BigData.2014.7004272.Google ScholarCross Ref
- Y. Jo and A. H. Oh. 2011. Aspect and sentiment unification model for online review analysis. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, WSDM '11, ACM, pp. 815–824, New York. DOI: 10.1145/1935826.1935932.Google ScholarDigital Library
- T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay. 2007. Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. ACM Trans. Inf. Syst., 25(2). DOI: 10.1145/1229179.1229181.Google ScholarDigital Library
- D. Jurafsky and J. H. Martin. 2009. Speech and Language Processing. 2nd ed. Prentice-Hall, Inc., Upper Saddle River, NJ.Google Scholar
- D. Kelly. 2009. Methods for Evaluating Interactive Information Retrieval Systems with Users. Foundations and Trends in Information Retrieval, 3(1-2):1–224. DOI: 10.1561/1500000012Google ScholarDigital Library
- D. Kelly and J. Teevan. 2003. Implicit feedback for inferring user preference: A bibliography. SIGIR Forum, 37(2):18–28. DOI: 10.1145/959258.959260.Google ScholarDigital Library
- H. D. Kim, M. Castellanos, M. Hsu, C. Zhai, T. Rietz, and D. Diermeier. 2013. Mining causal topics in text data: iterative topic modeling with time series feedback. In Proc. of the 22nd ACM international conference on Conference on information and knowledge management, CIKM '13, ACM pages 885–890, New York, NY. DOI: 10.1145/2505515.2505612.Google ScholarDigital Library
- J. M. Kleinberg. 1999. Authoritative sources in a hyperlinked environment. J. ACM, 46(5):604–632. DOI: 10.1145/324133.324140.Google ScholarDigital Library
- J. M. Kleinberg. 2002. An impossibility theorem for clustering. In Advances in Neural Information Processing Systems 15 [Neural Information Processing Systems, NIPS 2002, December 9-14, 2002, Vancouver, British Columbia, Canada], pp. 446–453. http://papers.nips.cc/paper/2340-an-impossibility-theorem-for-clustering.Google Scholar
- D. Koller and N. Friedman. 2009. Probabilistic Graphical Models: Principles and Techniques - Adaptive Computation and Machine Learning. The MIT Press.Google Scholar
- J. Lafferty and C. Zhai. 2003. Probabilistic relevance models based on document and query generation. In W. Bruce Croft and John Lafferty, editors, Language Modeling and Information Retrieval. Kluwer Academic Publishers. DOI: 10.1007/978-94-017-0171-6\_1Google Scholar
- D. Lin. 1999. Automatic identification of non-compositional phrases. In Proc. of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, ACL '99, Association for Computational Linguistics, pages 317–324, Stroudsburg, PA. DOI: 10.3115/1034678.1034730.Google ScholarDigital Library
- J.Lin and C. Dyer. 2010. Data-Intensive Text Processing with MapReduce. Morgan and Claypool Publishers. DOI: 10.2200/S00274ED1V01Y201006HLT007.Google Scholar
- Bing Liu. 2012. Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers. DOI: 10.2200/S00416ED1V01Y201204HLT016.Google Scholar
- T.-Y. Liu. 2009. Learning to rank for information retrieval. Found. Trends Inf. Retr., 3(3):225–331. DOI: 10.1561/1500000016.Google Scholar
- Y. Lv and C. Zhai. 2009. A comparative study of methods for estimating query language models with pseudo feedback. In Proc. of the 18th ACM Conference on Information and Knowledge Management, CIKM '09, ACM, pp. 1895–1898, New York. DOI: 10.1145/1645953.1646259.Google ScholarCross Ref
- Y. Lv and C. Zhai. 2010. Positional relevance model for pseudo-relevance feedback. In Proc. of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '10, ACM, pages 579–586, New York. DOI: 10.1145/1835449.1835546.Google ScholarDigital Library
- Y. Lv and C. Zhai. 2011. Lower-bounding Term Frequency Normalization. In Proc. of the 20th ACM International Conference on Information and Knowledge Management, CIKM '11, pp. 7–16. DOI: 10.1145/2063576.2063584Google ScholarDigital Library
- P. Lyman, H. R. Varian, K. Swearingen, P. Charles, N. Good, L.L. Jordan, and J. Pal. 2003. How much information? http://www2.sims.berkeley.edu/research/projects/how-much-info-2003.Google Scholar
- C. D. Manning and H. Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.Google Scholar
- C. D. Manning, P. Raghavan, and H. Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press, New York.Google Scholar
- M. E. Maron and J. L. Kuhns. 1960. On relevance, probabilistic indexing and information retrieval. Journal of the ACM, 7:216–244. DOI: 10.1145/321033.321035Google ScholarDigital Library
- S. Massung and C. Zhai. 2015. SyntacticDiff: Operator-Based Transformation for Comparative Text Mining. In Proc. of the 3rd IEEE International Conference on Big Data, pp. 571–580.Google Scholar
- S. Massung and C. Zhai. 2016. Non-Native Text Analysis: A Survey. The Journal of Natural Language Engineering, 22(2):163–186. DOI: 10.1017/S1351324915000303Google ScholarCross Ref
- S. Massung, C. Zhai, and J.Hockenmaier. 2013. Structural Parse Tree Features for Text Representation. In IEEE Seventh International Conference on Semantic Computing, pp. 9–13. DOI: 10.1109/ICSC.2013.13Google ScholarDigital Library
- J. D. McAuliffe and D. M. Blei. 2008. Supervised topic models. In J.C. Platt, D. Koller, Y. Singer, and S.T. Roweis, eds., Advances in Neural Information Processing Systems 20, pages 121–128. Curran Associates, Inc.Google Scholar
- G. J. McLachlan and T. Krishnan. 2008. The EM algorithm and extensions. 2nd ed. Wiley Series in Probability and Statistics. Hoboken, NJ., Wiley. http://gso.gbv.de/DB=2.1/CMD?ACT=SRCHA&SRT=YOP&IKT=1016&SRT=YOP&IKT=1016&IKT=1016&TRM=ppn+52983362X&sourceid=fbw_bibsonomy. DOI: 10.1002/9780470191613Google Scholar
- Q. Mei. 2009. Contextual text mining. Ph.D. Dissertation, University of Illinois at Urbana-Champaign.Google Scholar
- Q. Mei and C. Zhai. 2006. A mixture model for contextual text mining. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '06, ACM, pp. 649–655, New York. DOI: 10.1145/1150402.1150482.Google ScholarDigital Library
- Q. Mei, D. Xin, H. Cheng, J. Han, and C. Zhai. 2006. Generating semantic annotations for frequent patterns with context analysis. In Proc. of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '06, ACM, pp. 337–346, New York. DOI: 10.1145/1150402.1150441.Google ScholarDigital Library
- Q. Mei, C. Liu, H. Su, and C. Zhai. 2006. A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In Proc.of the 15th international conference on World Wide Web (WWW '06). ACM. New York, 533–542. DOI: 10.1145/1135777.1135857.Google ScholarDigital Library
- Q. Mei, X. Ling, M. Wondra, H. Su, and C. Zhai. 2007a. Topic sentiment mixture: Modeling facets and opinions in weblogs. In Proc. of the 16th International Conference on World Wide Web, WWW '07, ACM, pp. 171–180, New York. DOI: 10.1145/1242572.1242596.Google ScholarCross Ref
- Q. Mei, X. Shen, and C. Zhai. 2007b. Automatic labeling of multinomial topic models. In Proc. of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, August 12-15, 2007, pp. 490–499. DOI: 10.1145/1281192.1281246.Google ScholarDigital Library
- Q. Mei, D. Cai, D. Zhang, and C. Zhai. 2008. Topic modeling with network regularization. In Proceedings of the 17th International Conference on World Wide Web, WWW '08, ACM, pp. 101–110, New York. DOI: 10.1145/1367497.1367512.Google ScholarDigital Library
- T. Mikolov, M. Karafiát, L. Burget, J. Cernocky, and S. Khudanpur. 2010. Recurrent neural network based language model. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010, pp. 1045–1048. http://www.isca-speech.org/archive/interspeech_2010/i10_1045.html.Google Scholar
- T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, NV, pp. 3111–3119.Google Scholar
- T. M. Mitchell. 1997. Machine learning. McGraw Hill Series in Computer Science. McGraw-Hill.Google Scholar
- M.-F. Moens. 2006. Information Extraction: Algorithms and Prospects in a Retrieval Context (The Information Retrieval Series). Springer-Verlag New York, Inc., Secaucus, NJ. DOI: 10.1007/978-1-4020-4993-4.Google ScholarCross Ref
- I. J. Myung. 2003. Tutorial on maximum likelihood estimation. J. Math. Psychol., 47(1):90–100. DOI: 10.1016/S0022-2496(02)00028-7.Google ScholarDigital Library
- A. Nenkova and K. McKeown. 2012. A survey of text summarization techniques. In Charu C. Aggarwal and C. Zhai, eds, Mining Text Data, pp. 43–76. Springer US. DOI: 10.1007/978-1-4614-3223-4_3.Google ScholarCross Ref
- L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf.Google Scholar
- B. Pang and L. Lee. 2008. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval, 2(1-2):1–135. DOI: 10.1561/1500000011Google ScholarDigital Library
- J. M. Ponte and W. B. Croft. 1998. A language modeling approach to information retrieval. In Proc. of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '98, ACM, pp. 275–281, New York, NY. DOI: 10.1145/290941.291008.Google ScholarDigital Library
- J. R. Quinlan. 1986. Induction of Decision Trees. Machine Learning, 1(1):81–106. DOI: 10.1007/BF00116251.Google ScholarCross Ref
- D. R. Radev, H. Jing, M. Styś, and D. Tam. 2004. Centroid-based summarization of multiple documents. Information Processing & Management, 40(6):919––938. DOI: 10.1016/j.ipm.2003.10.006.Google Scholar
- D. Ramage, D. Hall, R. Nallapati, and C. D. Manning. 2009. Labeled lda: A supervised topic model for credit attribution in multi-labeled corpora. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1, EMNLP '09, Association for Computational Linguistics, pages 248–256, Stroudsburg, PA.Google ScholarDigital Library
- E. Reiter and R. Dale. 2000. Building Natural Language Generation Systems. Cambridge University Press, New York.Google Scholar
- F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor. 2010. Recommender Systems Handbook. 1st ed. Springer-Verlag New York, Inc. DOI: 10.1007/978-0-387-85820-3Google Scholar
- C. J. Van Rijsbergen. 1979. Information Retrieval. 2nd ed. Butterworth-Heinemann, Newton, MA.Google Scholar
- S. Robertson and K. Sparck Jones. 1976. Relevance weighting of search terms. Journal of the American Society for Information Science, 27:129–146.Google ScholarCross Ref
- S. E. Robertson. 1997. Readings in Information Retrieval. In The Probability Ranking Principle in IR, San Francisco, CA, Morgan Kaufmann Publishers Inc. pp. 281–286.Google Scholar
- S. Robertson and H. Zaragoza. 2009. The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr., 3(4):333–389. DOI: 10.1561/1500000019.Google Scholar
- S. Robertson, H. Zaragoza, and M. Taylor. 2004. Simple BM25 Extension to Multiple Weighted Fields. In Proc. of the Thirteenth ACM International Conference on Information and Knowledge Management, CIKM '04, pp. 42–49. DOI: 10.1145/1031171.1031181Google ScholarDigital Library
- C. Roe. 2012. The growth of unstructured data: what to do with all those zettabytes? http://www.dataversity.net/the-growth-of-unstructured-data-what-are-we-going-to-do-with-all-those-zettabytes/.Google Scholar
- R. Rosenfeld. 2000. Two decades of statistical language modeling: Where do we go from here. In Proceedings of the IEEE.Google ScholarCross Ref
- G. Salton. 1989. Automatic Text Processing: The Transformation, Analysis and Retrieval of Information by Computer. Addison-Wesley.Google Scholar
- G. Salton and M. McGill. 1983. Introduction to Modern Information Retrieval. McGraw-Hill.Google Scholar
- G. Salton, A. Wong, and C. S. Yang. 1975. A vector space model for automatic indexing. Commun. ACM, 18(11):613–620.Google Scholar
- G. Salton and C. Buckley. 1990. Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 41:288–297.Google ScholarCross Ref
- M. Sanderson. 2010. Test Collection Based Evaluation of Information Retrieval Systems. Foundations and Trends in Information Retrieval, 4(4):247–375.Google Scholar
- M. Sanderson and W. B. Croft. 2012. The history of information retrieval research. Proc. of the IEEE, 100(Centennial-Issue):1444–1451, 2012. DOI: 10.1109/JPROC.2012.2189916.Google ScholarCross Ref
- S. Sarawagi. 2008. Information extraction. Found. Trends databases, 1(3):261–377. DOI: 10.1561/1900000003.Google Scholar
- F. Sebastiani. 2002. Machine learning in automated text categorization. ACM Comput. Surv., 34(1):1–47. DOI: 10.1145/505282.505283.Google ScholarDigital Library
- G. Shani and A. Gunawardana. 2011. Evaluating Recommendation Systems. In Recommender Systems Handbook, 2nd ed., pp. 257–297. Springer, New York, NY. DOI: 10.1007/978-0-387-85820-3_8.Google Scholar
- F. Silvestri. 2010. Mining query logs: Turning search usage data into knowledge. Found. Trends Inf. Retr., 4:1–174. DOI: 10.1561/1500000013Google ScholarDigital Library
- A. Singhal, C. Buckley, and Mandar Mitra. 1996. Pivoted document length normalization. In Proc. of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '96,ACM, pp. 21–29, New York. DOI: 10.1145/243199.243206.Google ScholarDigital Library
- N. Smith. 2010. Text-driven forecasting. http://www.cs.cmu.edu/\~nasmith/papers/smith.whitepaper10.pdf.Google Scholar
- Mark D. Smucker, James Allan, and Ben Carterette. 2007. A Comparison of Statistical Significance Tests for Information Retrieval Evaluation. In Proc. of the Sixteenth ACM Conference on Conference on Information and Knowledge Management, CIKM '07, ACM, pp. 623–632, New York. DOI: 10.1145/1321440.1321528.Google ScholarDigital Library
- K. Sparck Jones and P. Willett, eds. 1997. Readings in Information Retrieval. San Francisco, CA, Morgan Kaufmann Publishers Inc.Google Scholar
- N. Spirin and J. Han. May 2012. Survey on Web Spam Detection: Principles and Algorithms. SIGKDD Explor. Newsl., 13(2):50–64. DOI: 10.1145/2207243.2207252.Google ScholarDigital Library
- E. Stamatatos. 2009. A Survey of Modern Authorship Attribution Methods. J. Am. Soc. Inf. Sci. Technol., 60(3):538–556. DOI: 10.1002/asi.v60:3Google ScholarCross Ref
- M. Steinbach, G. Karypis, and V. Kumar. 2000. A comparison of document clustering techniques. In KDD Workshop on Text Mining.Google Scholar
- J. Steinberger and K. Jezek. 2009. Evaluation measures for text summarization. Computing and Informatics, 28(2):251–275.Google Scholar
- M. Steyvers and T. Griffiths. 2007. Probabilistic topic models. Handbook of Latent Semantic Analysis, 427(7):424–440.Google Scholar
- Y. Sun and J. Han. 2012. Mining Heterogeneous Information Networks: Principles and Methodologies. Morgan & Claypool Publishers. DOI: 10.2200/S00433ED1V01Y201207DMK005.Google Scholar
- I. Titov and R. McDonald. 2008. Modeling online reviews with multi-grain topic models. In Proc. of the 17th International Conference on World Wide Web, WWW '08, ACM, pp. 111–120, New York. DOI: 10.1145/1367497.1367513.Google ScholarDigital Library
- H. Turtle and W. B. Croft. 1990. Inference networks for document retrieval. In Proc. of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '90, ACM, pp. 1–24, New York. DOI: 10.1145/96749.98006.Google ScholarDigital Library
- Princeton University. 2010. About wordnet. http://wordnet.princeton.edu.Google Scholar
- C. J. van Rijsbergen. 1979. Information Retrieval. Butterworths.Google Scholar
- H. Wang, Yue Lu, and C. Zhai. 2010. Latent Aspect Rating Analysis on Review Text Data: A Rating Regression Approach. In Proc. of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '10, ACM, pp. 783–792, New York. DOI: 10.1145/1835804.1835903.Google ScholarDigital Library
- H. Wang, Y. Lu, and C. Zhai. 2011. Latent Aspect Rating Analysis Without Aspect Keyword Supervision. In Proc. of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '11, ACM, pp. 618–626, New York. DOI: 10.1145/2020408.2020505.Google ScholarDigital Library
- J. Weizenbaum. 1966. ELIZA—A Computer Program for the Study of Natural Language Communication Between Man and Machine, Communications of the ACM 9 (1): 36–45, DOI: 10.1145/265153.365168.Google Scholar
- J. S. Whissell and C. L. A. Clarke. 2013. Effective Measures for Inter-document Similarity. In Proc. of the 22nd ACM International Conference on Conference on Information & Knowledge Management, CIKM '13, ACM, pages 1361––1370, New York. DOI: 10.1145/2505515.2505526.Google ScholarDigital Library
- R. W. White and R. A. Roth. 2009. Exploratory Search: Beyond the Query-Response Paradigm. Synthesis Lectures on Information Concepts, Retrieval, and Services. Morgan & Claypool Publishers. DOI: < 10.2200/S00174ED1V01Y200901ICR003.Google ScholarDigital Library
- R. W. White, B. Kules, S. M. Drucker, and m.c. schraefel. 2006. Introduction. Commun. ACM, 49(4):36–39. DOI: 10.1145/1121949.1121978.Google ScholarDigital Library
- I. H. Witten, A. Moffat, and T. C. Bell. 1999. Managing Gigabytes (2Nd Ed.): Compressing and Indexing Documents and Images. Morgan Kaufmann Publishers Inc., San Francisco, CA.Google Scholar
- C.F J. Wu. 1983. On the convergence properties of the EM algorithm. Ann. of stat., 95–103.Google Scholar
- J. Xu and W. B. Croft. 1996. Query expansion using local and global document analysis. In Proc. of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '96, ACM, pp. 4–11, New York. DOI: 10.1145/243199.243202.
- Y. Yang. 1999. An evaluation of statistical approaches to text categorization. Journal of Information Retrieval, 1:67–88.
- C. Zhai. 1997. Exploiting context to identify lexical atoms: A statistical view of linguistic context. In Proc. of the International and Interdisciplinary Conference on Modelling and Using Context (CONTEXT-97), pp. 119–129, Rio de Janeiro, Brazil.
- C. Zhai. 2008. Statistical Language Models for Information Retrieval. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers. DOI: 10.2200/S00158ED1V01Y200811HLT001.
- C. Zhai and J. Lafferty. 2001. Model-based Feedback in the Language Modeling Approach to Information Retrieval. In Proc. of the Tenth International Conference on Information and Knowledge Management, CIKM '01, ACM, pp. 403–410, New York. DOI: 10.1145/502585.502654.
- C. Zhai and J. Lafferty. 2004. A Study of Smoothing Methods for Language Models Applied to Information Retrieval. ACM Trans. Inf. Syst., 22(2):179–214.
- C. Zhai, P. Jansen, E. Stoica, N. Grot, and D. A. Evans. 1998. Threshold Calibration in CLARIT Adaptive Filtering. In Proc. of the Seventh Text REtrieval Conference (TREC-7), pp. 149–156.
- C. Zhai, P. Jansen, and D. A. Evans. 2000. Exploration of a heuristic approach to threshold learning in adaptive filtering. In Proc. of SIGIR 2000, ACM, pp. 360–362. DOI: 10.1145/345508.345652.
- C. Zhai, A. Velivelli, and B. Yu. 2004. A cross-collection mixture model for comparative text mining. In Proc. of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, ACM, pp. 743–748, New York. DOI: 10.1145/1014052.1014150.
- D. Zhang, C. Zhai, J. Han, A. Srivastava, and N. Oza. 2009. Topic modeling for OLAP on multidimensional text databases: topic cube and its applications. Stat. Anal. Data Min., 2(5–6):378–395. DOI: 10.1002/sam.v2.5/6.
- J. Zhu, A. Ahmed, and E. P. Xing. 2009. MedLDA: Maximum margin supervised topic models for regression and classification. In Proc. of the 26th Annual International Conference on Machine Learning, ICML '09, ACM, pp. 1257–1264, New York. DOI: 10.1145/1553374.1553535.
- G. K. Zipf. 1949. Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge, MA.
Cited By
- Karousos N, Vorvilas G, Pantazi D and Verykios V (2024). A Hybrid Text Summarization Technique of Student Open-Ended Responses to Online Educational Surveys, Electronics, 10.3390/electronics13183722, 13:18, (3722)
- Xiong S, Tian W, Si H, Zhang G and Shi L (2024). A Survey of the Applications of Text Mining for the Food Domain, Algorithms, 10.3390/a17050176, 17:5, (176)
- Hu Z, Ma H, Xiong J, Gao P and Divakaran P. Convergence or Divergence: A Computational Text Analysis of Stakeholder Concerns on Manufacturing Upgrading in China, IEEE Transactions on Engineering Management, 10.1109/TEM.2022.3159344, 71, (1285-1295)
- Zhang W, Yan R and Yuan L. How Generative AI Was Mentioned in Social Media and Academic Field? A Text Mining Based on Internet Text Data, IEEE Access, 10.1109/ACCESS.2024.3379010, 12, (43940-43947)
- Tzirides A (2024). Artificial Intelligence Integration in Translingual Language Learning: Enhancing Communication and Digital Literacy, Trust and Inclusion in AI-Mediated Education, 10.1007/978-3-031-64487-0_12, (261-286)
- Cope B and Kalantzis M (2024). On Cyber-Social Learning: A Critique of Artificial Intelligence in Education, Trust and Inclusion in AI-Mediated Education, 10.1007/978-3-031-64487-0_1, (3-34)
- Kassimi M, Abdellatif H and Essayad A (2024). Mono-Lingual Search Engine: Combining Keywords with Context for Semantic Search Engine, Advances in Intelligent System and Smart Technologies, 10.1007/978-3-031-47672-3_34, (353-363)
- Phan H, Vinh N and Huu N (2023). An Efficient System for Personal Information Search in Cyberspace Using Facial Recognition Technology, 2023 12th International Conference on Control, Automation and Information Sciences (ICCAIS), 10.1109/ICCAIS59597.2023.10382379, 979-8-3503-2878-3, (566-571)
- Okatan B and Çam H (2023). Analysis of customer reviews for digital banking applications with text mining methods, Gümüşhane Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 10.17714/gumusfenbil.1361431
- Tao D, Hu R, Zhang D, Laber J, Lapsley A, Kwan T, Rathke L, Rundensteiner E and Feng H (2023). A Novel Foodborne Illness Detection and Web Application Tool Based on Social Media, Foods, 10.3390/foods12142769, 12:14, (2769)
- Çullu B and Okursoy A (2023). Investigation of Cargo Companies' Service Quality Using Text Mining, Anadolu Üniversitesi Sosyal Bilimler Dergisi, 10.18037/ausbd.1205507, 23:2, (399-422)
- Alhoori H, Fox E, Frommholz I, Liu H, Coupette C, Rieck B, Ghosal T and Wu J (2023). Who can Submit an Excellent Review for this Manuscript in the Next 30 Days? - Peer Reviewing in the Age of Overload, 2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 10.1109/JCDL57899.2023.00077, 979-8-3503-9931-8, (319-320)
- Reid A (2023). Closing the Affordable Housing Gap: Identifying the Barriers Hindering the Sustainable Design and Construction of Affordable Homes, Sustainability, 10.3390/su15118754, 15:11, (8754)
- Nakamura Y, Nagaoka T, Kitagawa T, Inoki M and Honiden S (2023). Understanding Support Method for Requirements Specification Using Description Status Based on Page Trend, 2023 8th International Conference on Information and Network Technologies (ICINT), 10.1109/ICINT58947.2023.00016, 979-8-3503-0145-8, (43-48)
- Vorvilas G, Liapis A, Korovesis A, Aggelopoulou D, Karousos N and Efstathopoulos E (2023). Conducting Remote Electronic Examinations in Distance Higher Education: Students’ Perceptions, Turkish Online Journal of Distance Education, 10.17718/tojde.971889, 24:2, (167-182)
- Kline S (2023). CGScholar, Promoting Next-Generation Learning Environments Through CGScholar, 10.4018/978-1-6684-5124-3.ch011, (206-229)
- Olteanu A, Cernian A and Gâgă S (2022). Leveraging Machine Learning and Semi-Structured Information to Identify Political Views from Social Media Posts, Applied Sciences, 10.3390/app122412962, 12:24, (12962)
- Purwandari K and Nurlaila I (2022). Sequential Topic Modelling: A Case Study on One Health Conversation on Twitter, 2022 5th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), 10.1109/ISRITI56927.2022.10052987, 978-1-6654-5512-1, (457-461)
- López J and Cuadrado J (2021). An efficient and scalable search engine for models, Software and Systems Modeling, 10.1007/s10270-021-00960-4, 21:5, (1715-1737), Online publication date: 1-Oct-2022.
- Amies A (2022). Machine Learning Approaches with Multilingual Bibliographic, Quotation, and Terminology Databases for the Study of the Chinese Buddhist Canon, 2022 Pacific Neighborhood Consortium Annual Conference and Joint Meetings (PNC), 10.23919/PNC56605.2022.9982732, 978-9-8695-3174-0, (1-7)
- Zuo E, Aysa A, Muhammat M, Zhao Y, Chen B and Ubul K (2022). A food safety prescreening method with domain-specific information using online reviews, Journal of Consumer Protection and Food Safety, 10.1007/s00003-022-01367-z
- Zehtab G and Basiri A (2022). Employees Turnover Rate with Pivoted Length Normalization, 2022 27th International Computer Conference, Computer Society of Iran (CSICC), 10.1109/CSICC55295.2022.9780489, 978-1-6654-8027-7, (1-4)
- Liu X, Wang J, Rui X, Zhang J and Sun G (2022). Application of GIS Technology-Supported Cross Media Fusion Method Based on Deep Learning in Landscape Performance Evaluation, Computational Intelligence and Neuroscience, 2022, Online publication date: 1-Jan-2022.
- (2022). Bibliography, Storage Systems, 10.1016/B978-0-32-390796-5.00023-1, (641-693)
- Thomasian A (2022). Structured, unstructured, and diverse databases, Storage Systems, 10.1016/B978-0-32-390796-5.00018-8, (493-563)
- Sirajzade J, Bouvry P and Schommer C (2022). Deep Mining Covid-19 Literature, Applied Informatics, 10.1007/978-3-031-19647-8_9, (121-133)
- Aggarwal C (2022). An Introduction to Text Analytics, Machine Learning for Text, 10.1007/978-3-030-96623-2_1, (1-17)
- Büyükeke A and Özsoy T (2021). A text mining analysis of customer evaluations in terms of gastronomy tourism, Balıkesir Üniversitesi Sosyal Bilimler Enstitüsü Dergisi, 10.31795/baunsobed.1025204, 24:46-1, (1295-1312)
- Budak V (2021). Geçici Bilgi İhtiyacının Giderilme Sürecinde Kullanıcı Okuma Davranışlarının İncelenmesi [Examining User Reading Behaviors in the Process of Satisfying Temporary Information Needs], Turk Kutuphaneciligi - Turkish Librarianship, 10.24146/tk.955630, 35:4, (1-18)
- Trinko D, Porter E, Dunckley J, Bradley T and Coburn T (2021). Combining Ad Hoc Text Mining and Descriptive Analytics to Investigate Public EV Charging Prices in the United States, Energies, 10.3390/en14175240, 14:17, (5240)
- Gholamian S and Ward P (2021). On the Naturalness and Localness of Software Logs, 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), 10.1109/MSR52588.2021.00028, 978-1-7281-8710-5, (155-166)
- Liapis A, Vorvilas G, Korovesis A, Aggelopoulou D, Karousos N and Efstathopoulos E (2021). Evaluating the remote examination process applied by the Hellenic Open University (HOU) during COVID-19 pandemic: Students’ opinions, 2021 IEEE Global Engineering Education Conference (EDUCON), 10.1109/EDUCON46332.2021.9454107, 978-1-7281-8478-4, (924-927)
- Parlina A, Ramli K and Murfi H (2021). Exposing Emerging Trends in Smart Sustainable City Research Using Deep Autoencoders-Based Fuzzy C-Means, Sustainability, 10.3390/su13052876, 13:5, (2876)
- Mahbub S, Pardede E, Kayes A and Chaudhry S (2021). Detection of Harassment Type of Cyberbullying, Security and Communication Networks, 2021, Online publication date: 1-Jan-2021.
- Kim J, On B and Lee I. High-Quality Train Data Generation for Deep Learning-Based Web Page Classification Models, IEEE Access, 10.1109/ACCESS.2021.3086586, 9, (85240-85254)
- Singh K, Dorendro A, Devi H and Mahanta A (2021). Analysis of Changing Trends in Textual Data Representation, Recent Trends in Image Processing and Pattern Recognition, 10.1007/978-981-16-0507-9_21, (237-251)
- Bachmaier P (2021). Text Mining: Durchführung einer Sentiment Analysis mit SAP HANA [Text Mining: Performing a Sentiment Analysis with SAP HANA], Data Science, 10.1007/978-3-658-33403-1_16, (259-275)
- Hoßfeld H (2021). Text Mining in der Organisationsforschung [Text Mining in Organizational Research], Handbuch Empirische Organisationsforschung, 10.1007/978-3-658-08580-3_35-1, (1-23)
- Seyler D, Li L and Zhai C (2020). Semantic Text Analysis for Detection of Compromised Accounts on Social Networks, 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 10.1109/ASONAM49781.2020.9381432, 978-1-7281-1056-1, (417-424)
- Sim J, Miller P and Swarup S (2020). Tweeting the High Line Life: A Social Media Lens on Urban Green Spaces, Sustainability, 10.3390/su12218895, 12:21, (8895)
- Shah A, Yan X, Khan S, Khurrum W and Khan Q (2020). A multi-modal approach to predict the strength of doctor–patient relationships, Multimedia Tools and Applications, 10.1007/s11042-020-09596-w
- Moreno-Guerrero A, López-Belmonte J, Marín-Marín J and Soler-Costa R (2020). Scientific Development of Educational Artificial Intelligence in Web of Science, Future Internet, 10.3390/fi12080124, 12:8, (124)
- Hendrickx I, Voets T, van Dyk P and Kool R (2020). Using text mining techniques to identify healthcare providers with patient safety problems: an exploratory study (Preprint), Journal of Medical Internet Research, 10.2196/19064
- Fraj M, Hajkacem M and Essoussi N (2020). Self-Organizing Map for Multi-view Text Clustering, Big Data Analytics and Knowledge Discovery, 10.1007/978-3-030-59065-9_30, (396-408)
- Al-Ash H, Putri M, Mursanto P and Bustamam A (2019). Ensemble Learning Approach on Indonesian Fake News Classification, 2019 3rd International Conference on Informatics and Computational Sciences (ICICoS), 10.1109/ICICoS48119.2019.8982409, 978-1-7281-4610-2, (1-6)
- Milne G, Villarroel Ordenes F and Kaplan B (2019). Mindful consumption: Three consumer segment views, Australasian Marketing Journal (AMJ), 10.1016/j.ausmj.2019.09.003, Online publication date: 1-Sep-2019.
- Labhishetty S, Bhavya, Pei K, Boughoula A and Zhai C. Web of Slides, Proceedings of the Sixth (2019) ACM Conference on Learning @ Scale, (1-4)
- Hu M and Pavao-Zuckerman M (2019). Literature Review of Net Zero and Resilience Research of the Urban Environment: A Citation Analysis Using Big Data, Energies, 10.3390/en12081539, 12:8, (1539)
- Husáková M (2019). Ontology-Based Conceptualisation of Text Mining Practice Areas for Education, Computational Collective Intelligence, 10.1007/978-3-030-28374-2_46, (533-542)
- Karmaker Santu S, Geigle C, Ferguson D, Cope W, Kalantzis M, Searsmith D and Zhai C (2018). SOFSAT, ACM SIGKDD Explorations Newsletter, 20:2, (21-30), Online publication date: 11-Dec-2018.
- Gupta D and Berberich K. GYANI, Proceedings of the 27th ACM International Conference on Information and Knowledge Management, (487-496)
- Lee G and Sun A. Seed-driven Document Ranking for Systematic Reviews in Evidence-Based Medicine, The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, (455-464)
- Wu P and Lin K (2018). Unstructured big data analytics for retrieving e-commerce logistics knowledge, Telematics and Informatics, 10.1016/j.tele.2017.11.004, 35:1, (237-244), Online publication date: 1-Apr-2018.
- Castillo E, Cervantes O, Vilariño D, Pinto D, Singh V, Villavicencio A, Mayr-Schlegel P and Stamatatos E (2018). Author profiling using a graph enrichment approach, Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 34:5, (3003-3014), Online publication date: 1-Jan-2018.
- Balog K (2018). Term-Based Models for Entity Ranking, Entity-Oriented Search, 10.1007/978-3-319-93935-3_3, (57-99)
- Balog K (2018). Meet the Data, Entity-Oriented Search, 10.1007/978-3-319-93935-3_2, (25-53)
- Lommatzsch A (2018). A Next Generation Chatbot-Framework for the Public Administration, Innovations for Community Services, 10.1007/978-3-319-93408-2_10, (127-141)
- Correia A, Teodoro M and Lobo V (2018). Statistical Methods for Word Association in Text Mining, Recent Studies on Risk Analysis and Statistical Modeling, 10.1007/978-3-319-76605-8_27, (375-384)
- Aggarwal C (2018). Machine Learning for Text: An Introduction, Machine Learning for Text, 10.1007/978-3-319-73531-3_1, (1-16)
- Wang S, Giridhar P, Wang H, Kaplan L, Pham T, Yener A and Abdelzaher T. StoryLine, Proceedings of the Second International Conference on Internet-of-Things Design and Implementation, (83-93)
- Albishre K, Li Y and Xu Y. Effective pseudo-relevance for Microblog retrieval, Proceedings of the Australasian Computer Science Week Multiconference, (1-6)
- Golani N, Khandelwal I and Tripathy B (2017). Hybrid Intelligent Techniques in Text Mining and Analysis of Social Networks and Media Data, Hybrid Intelligence for Social Networks, 10.1007/978-3-319-65139-2_1, (1-24)
- Correia A and Gonçalves A (2017). Topics Discovery in Text Mining, Recent Advances in Information Systems and Technologies, 10.1007/978-3-319-56535-4_25, (251-256)