DOI: 10.1145/2361354.2361400
research-article

Evaluation of BILBO reference parsing in digital humanities via a comparison of different tools

Published: 04 September 2012

Abstract

Automatic bibliographic reference annotation involves tokenizing references and identifying their fields. Recent methods tackle this problem with machine learning techniques such as Conditional Random Fields. However, state-of-the-art methods always train and evaluate their systems on well-structured data in a simple format, such as the bibliography at the end of a scientific article. This is one reason why parsing references that deviate from a regular format does not work well. In our previous work, we established a standard for tokenization and feature selection on less formulaic data such as notes. In this paper, we compare our system, BILBO, with other popular online reference parsing tools on new data from a completely different source. BILBO is built on our own corpora, extracted and annotated from real-world data: digital humanities articles from the Revues.org site of OpenEdition (90% in French). The robustness of the BILBO system allows language-independent tagging. We expect this first evaluation to motivate the development of other efficient techniques for scattered and less formulaic bibliographic references.
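To make the two steps named in the abstract concrete, the following is a minimal sketch of tokenizing a reference string and building per-token features of the kind a sequence labeler such as a Conditional Random Field could consume. The tokenization rule, the function names, and every feature name here are illustrative assumptions; they are not BILBO's actual tokenizer or feature set, which this page does not describe.

```python
import re


def tokenize(reference):
    """Split a reference string into word tokens and single punctuation tokens.

    Assumption: a simple regex split, chosen only for illustration.
    """
    return re.findall(r"\w+|[^\w\s]", reference)


def token_features(tokens, i):
    """Build a feature dict for the token at position i.

    All feature names are hypothetical examples of local and contextual cues.
    """
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_digits": tok.isdigit(),
        # Four-digit number starting with 15-19 or 20, a rough year heuristic.
        "looks_like_year": bool(re.fullmatch(r"(1[5-9]|20)\d{2}", tok)),
        "is_punctuation": not any(ch.isalnum() for ch in tok),
        # Neighboring tokens give the labeler some context.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


reference = (
    "E. Garfield. Citation analysis as a tool in journal evaluation. "
    "Science, 178(60):471--479, November 1972."
)
tokens = tokenize(reference)
features = [token_features(tokens, i) for i in range(len(tokens))]
```

Each entry in `features` would be paired with a field label (author, title, date, and so on) in annotated training data; the labeling model itself is omitted from this sketch.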

Cited By

  • (2024) Comparing free reference extraction pipelines. International Journal on Digital Libraries 25:4 (841-853). DOI: 10.1007/s00799-024-00404-6. Online publication date: 20-Jun-2024
  • (2018) Machine Learning vs. Rules and Out-of-the-Box vs. Retrained. Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (99-108). DOI: 10.1145/3197026.3197048. Online publication date: 23-May-2018
  • (2013) Large Scale Text Mining Approaches for Information Retrieval and Extraction. Innovations in Intelligent Machines-4 (3-45). DOI: 10.1007/978-3-319-01866-9_1. Online publication date: 15-Nov-2013

    Published In

    DocEng '12: Proceedings of the 2012 ACM symposium on Document engineering
    September 2012
    256 pages
    ISBN:9781450311168
    DOI:10.1145/2361354
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. BILBO
    2. automatic annotation
    3. bibliographic reference
    4. comparison

    Qualifiers

    • Research-article

    Conference

    DocEng '12: ACM Symposium on Document Engineering
    September 4 - 7, 2012
    Paris, France

    Acceptance Rates

    Overall Acceptance Rate 194 of 564 submissions, 34%
