Reflective Random Indexing and indirect inference: A scalable method for discovery of implicit connections

Published: 01 April 2010

Abstract

The discovery of implicit connections between terms that do not occur together in any scientific document underlies the model of literature-based knowledge discovery first proposed by Swanson. Corpus-derived statistical models of semantic distance such as Latent Semantic Analysis (LSA) have been evaluated previously as methods for the discovery of such implicit connections. However, LSA in particular is dependent on a computationally demanding method of dimension reduction as a means to obtain meaningful indirect inference, limiting its ability to scale to large text corpora. In this paper, we evaluate the ability of Random Indexing (RI), a scalable distributional model of word associations, to draw meaningful implicit relationships between terms in general and biomedical language. Proponents of this method have achieved comparable performance to LSA on several cognitive tasks while using a simpler and less computationally demanding method of dimension reduction than LSA employs. In this paper, we demonstrate that the original implementation of RI is ineffective at inferring meaningful indirect connections, and evaluate Reflective Random Indexing (RRI), an iterative variant of the method that is better able to perform indirect inference. RRI is shown to lead to more clearly related indirect connections and to outperform existing RI implementations in the prediction of future direct co-occurrence in the MEDLINE corpus.
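The abstract does not spell out either algorithm, so the following is only a minimal sketch under stated assumptions: sparse ternary random vectors, term-by-document counts as context, and a single reflective cycle in which document vectors built from random term vectors are used to retrain the term vectors. The function names, parameters, and the three-term toy corpus are hypothetical illustrations and are not taken from the paper. In the toy corpus, terms A and C never share a document but both co-occur with B, which is the kind of implicit connection the abstract describes.

```python
import numpy as np


def elemental_vectors(n, dim, seeds, rng):
    """Sparse ternary random vectors: `seeds` entries are +1 or -1, the rest 0."""
    vecs = np.zeros((n, dim))
    for row in vecs:  # each row is a view, so assignment edits `vecs` in place
        idx = rng.choice(dim, size=seeds, replace=False)
        row[idx] = rng.choice([-1.0, 1.0], size=seeds)
    return vecs


def normalize(m):
    """Scale each row to unit length, leaving all-zero rows unchanged."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.where(norms == 0, 1.0, norms)


def random_indexing(counts, dim=1024, seeds=8, rng=None):
    """Assumed basic RI: a term vector sums the random vectors of its documents."""
    if rng is None:
        rng = np.random.default_rng(0)
    doc_elemental = elemental_vectors(counts.shape[1], dim, seeds, rng)
    return normalize(counts @ doc_elemental)


def reflective_random_indexing(counts, dim=1024, seeds=8, cycles=1, rng=None):
    """Assumed reflective variant: alternate between document and term vectors."""
    if rng is None:
        rng = np.random.default_rng(0)
    term_vecs = elemental_vectors(counts.shape[0], dim, seeds, rng)
    for _ in range(cycles):
        doc_vecs = normalize(counts.T @ term_vecs)  # documents from terms
        term_vecs = normalize(counts @ doc_vecs)    # terms retrained from documents
    return term_vecs


# Toy term-by-document counts (rows: A, B, C; columns: doc0, doc1).
# A and C never co-occur directly; both co-occur with B.
counts = np.array([[1.0, 0.0],
                   [1.0, 1.0],
                   [0.0, 1.0]])

ri = random_indexing(counts)
rri = reflective_random_indexing(counts)

# Rows are unit length, so the dot product is the cosine similarity.
ri_sim = float(ri[0] @ ri[2])
rri_sim = float(rri[0] @ rri[2])
print(f"cosine(A, C) with basic RI: {ri_sim:.2f}")
print(f"cosine(A, C) with reflective RI: {rri_sim:.2f}")
```

In this sketch the basic vectors for A and C are built from unrelated random document vectors, so they stay close to orthogonal, whereas the reflective pass lets both inherit the shared contribution of B. This is a toy illustration of the idea only and says nothing about the MEDLINE evaluation reported in the paper.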

References

[1]
Swanson, D.R., Fish oil, Raynaud's syndrome, and undiscovered public knowledge. Perspect Biol Med. v30 i1. 7-18.
[2]
Weeber, M., Kors, J.A. and Mons, B., Online tools to support literature-based discovery in the life sciences. Brief Bioinform. v6 i3. 277-286.
[3]
Yetisgen-Yildiz, M. and Pratt, W., A new evaluation methodology for literature-based discovery systems. J Biomed Inform. v42 i4. 633-643.
[4]
Landauer, T.K. and Dumais, S.T., A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol Rev. v104. 211-240.
[5]
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K. and Harshman, R., Indexing by latent semantic analysis. J Am Soc Inform Sci. v41. 391-407.
[6]
Peirce, C.S., Abduction and induction. In: Buchler, J. (Ed.), Philosophical writings of Peirce, Dover, New York.
[7]
Bruza, P., Cole, R., Song, D. and Bari, Z., Towards operational abduction from a cognitive perspective. 2006. Oxford University Press.
[8]
Hristovski, D., Friedman, C., Rindflesch, T.C. and Peterlin, B., Exploiting semantic relations for literature-based discovery. AMIA Annu Symp Proc. v3. 49-53.
[9]
Schvaneveldt Roger, Cohen Trevor. Abductive reasoning and similarity. In: Ifenthaler D, Seel NM, editors. Computer based diagnostics and systematic analysis of knowledge. New York: Springer; in press.
[10]
Swanson, D.R. and Smalheiser, N.R., An interactive system for finding complementary literatures: a stimulus to scientific discovery. Artif Intell. v91. 183-203.
[11]
Gordon, M.D. and Lindsay, R.K., Toward discovery support systems: a replication, re-examination, and extension of Swanson's work on literature-based discovery of a connection between Raynaud's and fish oil. J Am Soc Inform Sci. v47 i2. 116-128.
[12]
Weeber, M., Vos, R., Klein, H., de Jong-van den Berg, L.T.W., Aronson, A.R. and Molema, G., Generating hypotheses by discovering implicit associations in the literature: a case report of a search for new potential therapeutic uses for thalidomide. J Am Med Inform Assoc. v10 i3. 252-259.
[13]
Srinivasan, P., Text mining: generating hypotheses from MEDLINE. J Am Soc Inform Sci Technol. v55 i5. 396-413.
[14]
Ganiz M, Pottenger WM, Janneck CD. Recent advances in literature based discovery. Technical report. Lehigh University; 2005. LU-CSE-05-027.
[15]
Kostoff, R.N., Literature-related discovery (LRD): introduction and background. Technol Forecast Soc Change. v75 i2. 165-185.
[16]
Gordon, M.D. and Dumais, S., Using latent semantic indexing for literature based discovery. J Am Soc Inform Sci. v49 i8. 674-685.
[17]
Lund, K. and Burgess, C., Producing high-dimensional semantic spaces from lexical co-occurrence. Behav Res Methods Instrum Comput. v28. 203-208.
[18]
Cole, R. and Bruza, P., A bare bones approach to literature-based discovery: an analysis of the Raynaud's/fish-oil and migraine-magnesium discoveries in semantic space. In: Lecture notes in computer science: discovery science, vol. 3735. Springer, Berlin/Heidelberg. pp. 84-98.
[19]
Bruza, P.D., Widdows, D. and Woods, J., A quantum logic of down below. In: Engesser, Kurt, Gabbay, Dov, Lehmann, Daniel (Eds.), Handbook of quantum logic and quantum structures: quantum logic, Elsevier. pp. 625-660.
[20]
Cohen, T. and Widdows, D., Empirical distributional semantics: methods and biomedical applications. J Biomed Inform. v42 i2. 390-405.
[21]
Landauer, T.K., Laham, D., Rehder, B. and Schreiner, M.E., How well can passage meaning be derived without using word order? A comparison of latent semantic analysis and humans. In: Shafto, M.G., Langley, P. (Eds.), Proceedings of the 19th annual meeting of the cognitive science society, Erlbaum, Mahwah, NJ. pp. 412-417.
[22]
Laham, D., Latent semantic analysis approaches to categorization. In: Shafto, M.G., Langley, P. (Eds.), Proceedings of the 19th annual meeting of the cognitive science society, Erlbaum, Mahwah, NJ. p. 979.
[23]
Giles, J.T., Wo, L. and Berry, M.W., GTP (General Text Parser) software for text mining. In: Bozdogan, Hamparsum (Ed.), Statistical data mining and knowledge discovery, CRC Press Inc., Boca Raton, FL, USA.
[24]
Kanerva P, Kristofersson J, Holst A. Random indexing of text samples for latent semantic analysis. In: Proceedings of the 22nd annual conference of the cognitive science society; 2000. p. 103-6.
[25]
Karlgren, J. and Sahlgren, M., From words to understanding. Found Real World Intell. 294-308.
[26]
Kanerva, P., Hyperdimensional computing: an introduction to computing in distributed representation with high-dimensional random vectors. Cogn Comput. v1 i2. 139-159.
[27]
Bau III, David and Trefethen, Lloyd, Numerical linear algebra. 1997. Society for Industrial and Applied Mathematics, Philadelphia.
[28]
Widdows D, Ferraro K. Semantic vectors: a scalable open source package and online technology management application. In: 6th International conference on language resources and evaluation (LREC); 2008.
[29]
Vempala SS. The random projection method. In: DIMACS series in discrete mathematics and theoretical computer science, vol. 65, Providence, RI: American Mathematical Society; 2004.
[30]
Johnson, W. and Lindenstrauss, J., Extensions of Lipschitz mappings into a Hilbert space. Contemp Math. v26. 189-206.
[31]
Cohen, T.A., Exploring MEDLINE space with random indexing and pathfinder networks. AMIA Annu Symp Proc. 126-130.
[32]
Burgess, C., Livesay, K. and Lund, K., Explorations in context space: words, sentences, discourse. Discourse Process. v25 i2-3. 211-257.
[33]
Sahlgren M. The word-space model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. Doctoral thesis. Stockholm University, Faculty of Humanities, Department of Linguistics.
[34]
Rapp R. Word sense discovery based on sense descriptor similarity. In: Proceedings of the 9th machine translation summit, New Orleans; 2003. p. 315-22.
[35]
Widdows D. Retraining document and term vectors, and refactoring the interface to sparse vector stores [Internet]. Available from: http://groups.google.com/group/semanticvectors/browse_thread/thread/d5885d4822b09444/8ce877844c1cb0af?lnk=gst&q=retraining#8ce877844c1cb0af.
[36]
Gallant, S.I., Context vectors: a step toward a "Grand Unified Representation". In: Wermter, S., Sun, R. (Eds.), Hybrid neural systems (LNAI 1778), Springer-Verlag, Berlin, Heidelberg. pp. 204-210.
[37]
Widdows, D. and Cohen, T., Semantic vector combinations and the synoptic gospels. In: Bruza, P., Sofge, D., Lawless, W., Van Rijsbergen, C.J., Klusch, M. (Eds.), Proceedings of the 3rd quantum interaction symposium (March 25-27, 2009-DFKI, Saarbruecken), Springer. pp. 251-265.
[38]
Sahlgren M, Holst A, Kanerva P. Permutations as a means to encode order in word space. In: Proceedings of the 30th annual meeting of the cognitive science society (CogSci'08), July 23-26, Washington, DC, USA; 2008.
[39]
Martin, D.I. and Berry, M.W., Mathematical foundations behind latent semantic analysis. In: Landauer, T., McNamara, D., Dennis, S., Kintsch, W. (Eds.), Handbook of latent semantic analysis, Lawrence Erlbaum Associates, Mahwah, NJ.
[40]
Widdows D. Orthogonal negation in vector spaces for modelling word-meanings and document retrieval. In: Proceedings of the 41st annual meeting of the association for computational linguistics (ACL); 2003.
[41]
Ipsen, I. and Wills, R.M., Analysis and computation of Google's PageRank. In: 7th IMACS international symposium on iterative methods in scientific computing, Fields Institute, Toronto, Canada.
[42]
Kanaji, S., Kanaji, T., Furihata, K., Kato, K., Ware, J.L. and Kunicki, T.J., Convulxin binds to native, human glycoprotein Ib alpha. J Biol Chem. v278 i41. 39452-39460.
[43]
Sitbon L, Bruza P. On the relevance of documents for semantic representation. In: Proceedings of the 13th Australasian document computing symposium (ADCS 2008). p. 19-22.
[44]
Hristovski, D., Stare, J., Peterlin, B. and Dzeroski, S., Supporting discovery in medicine by association rule mining in Medline and UMLS. Stud Health Technol Inform. 1344-1348.
[45]
Yetisgen-Yildiz, M. and Pratt, W., Using statistical and knowledge-based approaches for literature-based discovery. J Biomed Inform. v39 i6. 600-611.
[46]
Yang, Y. and Chute, C.G., A linear least squares fit mapping method for information retrieval from natural language texts. In: Proceedings of the 14th conference on computational linguistics, vol. 2. Association for Computational Linguistics, Nantes, France. pp. 447-453.

Published In

Journal of Biomedical Informatics, Volume 43, Issue 2
April 2010
184 pages

Publisher

Elsevier Science

San Diego, CA, United States

Author Tags

  1. Distributional semantics
  2. Implicit associations
  3. Indirect inference
  4. Literature-based discovery
