Semantic similarity measurement using historical google search patterns

Jorge Martinez-Gil¹ &
José F. Aldana-Montes¹

Abstract

Computing the semantic similarity between terms (or short text expressions) that have the same meaning but are not lexicographically similar is an important challenge in the information integration field. The problem is that techniques for textual semantic similarity measurement often fail to deal with words not covered by synonym dictionaries. In this paper, we address this problem by determining the semantic similarity of terms using the knowledge inherent in the search history logs of the Google search engine. To do this, we have designed and evaluated four algorithmic methods for measuring the semantic similarity between terms using their associated historical search patterns. These methods are: a) frequent co-occurrence of terms in search patterns, b) computation of the relationship between search patterns, c) outlier coincidence in search patterns, and d) forecasting comparisons. We show experimentally that some of these methods correlate well with human judgment when evaluated on general-purpose benchmark datasets, and significantly outperform existing methods when evaluated on datasets containing terms that do not usually appear in dictionaries.
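
As a rough illustration of method b), the sketch below scores two terms by how closely their search-volume time series move together. It assumes the relationship is measured with Pearson's correlation coefficient and that negative correlations are clamped to zero; the abstract does not specify either choice. The function names and the weekly volume figures are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A flat series carries no pattern to compare against.
        return 0.0
    return cov / sqrt(var_x * var_y)


def pattern_similarity(xs, ys):
    """Assumed scoring rule: clamp negative correlation to 0 so the
    result lies in [0, 1]."""
    return max(0.0, pearson(xs, ys))


# Hypothetical normalised weekly search volumes (illustrative only).
flu = [10, 12, 30, 55, 80, 60, 25, 12]
influenza = [8, 11, 28, 50, 78, 58, 22, 10]
sunscreen = [70, 65, 40, 20, 10, 15, 45, 68]

print(pattern_similarity(flu, influenza))
print(pattern_similarity(flu, sunscreen))
```

Under these assumptions, terms whose search interest rises and falls together receive a score close to 1, while terms with opposing seasonal patterns receive 0, even though neither pair shares any lexical overlap.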

Acknowledgements

We would like to thank the reviewers for their time and consideration. We thank Lisa Huckfield for proofreading this manuscript. This work has been funded by the Spanish Ministry of Innovation and Science through REALIDAD: Efficient Analysis, Management and Exploitation of Linked Data (Project Code: TIN2011-25840), and by the Department of Innovation, Enterprise and Science of the Regional Government of Andalucia through Towards a platform for exploiting and analyzing biological linked data (Project Code: P11-TIC-7529).

Author information

Authors and Affiliations

Department of Computer Science, University of Malaga, Boulevard Louis Pasteur 35, Malaga, Spain
Jorge Martinez-Gil & José F. Aldana-Montes

Corresponding author

Correspondence to Jorge Martinez-Gil.

About this article

Cite this article

Martinez-Gil, J., Aldana-Montes, J.F. Semantic similarity measurement using historical google search patterns. Inf Syst Front 15, 399–410 (2013). https://doi.org/10.1007/s10796-012-9404-7

Published: 15 January 2013
Issue Date: July 2013
DOI: https://doi.org/10.1007/s10796-012-9404-7
