DOI: 10.1145/2756406.2756920
research-article

Improving Access to Large-scale Digital Libraries Through Semantic-enhanced Search and Disambiguation

Published: 21 June 2015 Publication History

Abstract

The HathiTrust Digital Library (HTDL) holds 13,000,000 volumes comprising 4.5 billion pages of text, and it is currently very difficult for scholars to locate sets of documents relevant to their research using traditional lexically-based retrieval techniques. Existing document search tools and document clustering approaches use purely lexical analysis, which cannot address the inherent ambiguity of natural language. A semantic search approach offers the potential to overcome the shortcomings of lexical search, but even if an appropriate network of ontologies could be decided upon, it would require a full semantic markup of each document. In this paper, we present a conceptual design and report on the initial implementation of a new framework that affords the benefits of semantic search while minimizing the problems associated with applying existing semantic analysis at scale. Our approach avoids the need for complete semantic document markup using pre-existing ontologies by developing an automatically generated Concept-in-Context (CiC) network seeded by a priori analysis of Wikipedia texts and identification of semantic metadata. Our Capisco system analyzes documents by the semantics and context of their content. Search queries are disambiguated interactively, to fully utilize the domain knowledge of the scholar. Our method achieves a form of semantic-enhanced search that simultaneously exploits the proven scale benefits provided by lexical indexing.


      Information & Contributors

      Information

      Published In

      JCDL '15: Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries
      June 2015
      324 pages
      ISBN:9781450335942
      DOI:10.1145/2756406
      General Chairs: Paul Logasa Bogen, Suzie Allard, Holly Mercer, Micah Beck
      Program Chairs: Sally Jo Cunningham, Dion Goh, Geneva Henry
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. disambiguation
      2. semantic classification
      3. semantic search

      Qualifiers

      • Research-article

      Conference

      JCDL '15: 15th ACM/IEEE-CS Joint Conference on Digital Libraries
      June 21 - 25, 2015
      Knoxville, Tennessee, USA

      Acceptance Rates

      JCDL '15 Paper Acceptance Rate 18 of 60 submissions, 30%;
      Overall Acceptance Rate 415 of 1,482 submissions, 28%

      Cited By

      • (2020) Capturing cultural heritage in East Asia and Oceania. Communications of the ACM 63:4 (50-52). DOI: 10.1145/3378548. Online publication date: 20-Mar-2020
      • (2018) A linked open data framework to enhance the discoverability and impact of culture heritage. Journal of Information Science 45:6 (756-766). DOI: 10.1177/0165551518812658. Online publication date: 27-Nov-2018
      • (2018) Seeding Strategies for Semantic Disambiguation. Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (343-344). DOI: 10.1145/3197026.3203874. Online publication date: 23-May-2018
      • (2018) Semantically Enriched Line Search in a Humanities Digital Library. 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (2169-2174). DOI: 10.1109/ICACCI.2018.8554885. Online publication date: Sep-2018
      • (2018) Capisco: low-cost concept-based access to digital libraries. International Journal on Digital Libraries 20:4 (307-334). DOI: 10.1007/s00799-018-0232-3. Online publication date: 14-Mar-2018
      • (2017) Information-seeking in large-scale digital libraries. Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries (253-256). DOI: 10.5555/3200334.3200365. Online publication date: 19-Jun-2017
      • (2017) Visual semantic enrichment for ereading. Proceedings of the 31st British Computer Society Human Computer Interaction Conference (1-6). DOI: 10.14236/ewic/HCI2017.96. Online publication date: 3-Jul-2017
      • (2017) Good Applications for Crummy Entity Linkers? Proceedings of the 13th International Conference on Semantic Systems (81-88). DOI: 10.1145/3132218.3132237. Online publication date: 11-Sep-2017
      • (2017) Information-Seeking in Large-Scale Digital Libraries: Strategies for Scholarly Workset Creation. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL) (1-4). DOI: 10.1109/JCDL.2017.7991583. Online publication date: Jun-2017
      • (2017) Writers of the Lost Paper: A Case Study on Barriers to (Re-) Finding Publications. Digital Libraries: Data, Information, and Knowledge for Digital Lives (212-224). DOI: 10.1007/978-3-319-70232-2_18. Online publication date: 3-Nov-2017
