research-article

Evaluation of BILBO reference parsing in digital humanities via a comparison of different tools

Authors:

Marin Dacos

DocEng '12: Proceedings of the 2012 ACM symposium on Document engineering

Pages 209 - 212

https://doi.org/10.1145/2361354.2361400

Published: 04 September 2012

Abstract

Automatic bibliographic reference annotation involves tokenizing references and identifying their fields. Recent methods tackle this problem with machine learning techniques such as Conditional Random Fields. However, state-of-the-art methods always train and evaluate their systems on well-structured data in a simple format, such as the bibliography at the end of a scientific article. This is one reason why parsing references that depart from a regular format does not work well. In our previous work, we established a standard for tokenization and feature selection on less formulaic data such as notes. In this paper, we compare our system, BILBO, with other popular online reference parsing tools on new data from a totally different source. BILBO is built on our own corpora, extracted and annotated from real-world data: digital humanities articles from the Revues.org site of OpenEdition (90% in French). The robustness of the BILBO system allows language-independent tagging. We expect that this first evaluation attempt will motivate the development of other efficient techniques for scattered and less formulaic bibliographic references.
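
As a rough illustration of the tokenization and feature-extraction step that precedes field labeling, the sketch below splits a reference string into tokens and computes a few per-token features of the kind commonly fed to a Conditional Random Field. It is not BILBO's tokenizer or feature set: the function names (`tokenize`, `token_features`, `featurize`), the regular expressions, and the chosen features are all assumptions made for this example.

```python
import re

# Hypothetical tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]", re.UNICODE)
# Hypothetical heuristic for four-digit years between 1500 and 2099.
YEAR_RE = re.compile(r"(1[5-9]|20)\d{2}")


def tokenize(reference):
    """Split a raw reference string into word and punctuation tokens."""
    return TOKEN_RE.findall(reference)


def token_features(tokens, i):
    """Build an illustrative feature dictionary for the token at position i."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_digits": tok.isdigit(),
        "is_year_like": bool(YEAR_RE.fullmatch(tok)),
        "is_punct": not any(ch.isalnum() for ch in tok),
        # Neighbouring tokens give the sequence model local context.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def featurize(reference):
    """Return the tokens of a reference and one feature dict per token."""
    tokens = tokenize(reference)
    return tokens, [token_features(tokens, i) for i in range(len(tokens))]
```

A sequence labeler would then be trained on such feature sequences paired with field labels (author, title, date, and so on); the label set and training procedure are outside the scope of this sketch.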

Index Terms

1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results

Published In

DocEng '12: Proceedings of the 2012 ACM symposium on Document engineering

September 2012

256 pages

ISBN: 9781450311168

DOI: 10.1145/2361354

General Chair: Cyril Concolato, Telecom ParisTech, France

Program Chair: Patrick Schmitz, University of California, Berkeley, USA

In-Cooperation

SIGDOC: ACM Special Interest Group for Design of Communications

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

DocEng '12

Sponsor:

SIGWEB

DocEng '12: ACM Symposium on Document Engineering

September 4 - 7, 2012

Paris, France

Acceptance Rates

Overall Acceptance Rate 194 of 564 submissions, 34%
