DOI: 10.5555/2740769.2740783

Disambiguating publication venue titles using association rules

Published: 08 September 2014

Abstract

Research agencies in several countries evaluate the impact of the scientific publications of research groups to guide their investments, and one of the most widely used metrics is the quality of the publication venues where the groups' works were published. Several bibliometric indexes have been formulated to measure the quality of a publication venue. However, given a set of citations extracted, for example, from the curricula vitae of a research group, using bibliometric indexes effectively to evaluate their quality requires correctly identifying the publication venue title of each citation. This task is not easy, since there are no unique identifiers for publication venues. Citations frequently contain abbreviated forms and acronyms, publication venues share similar titles, and venues sometimes change their titles, split, or merge, creating new ones. Traditional digital libraries deal with this problem by creating authority files. In this work, we present a twofold contribution: (i) the creation of a Computer Science publication venue authority file and (ii) the proposal of a method that uses association rules to disambiguate publication venue titles originating from citations. The disambiguator is a supervised learning method that uses the authority file to train a classifier, whose generated model is a set of association rules that identify publication venues. Experiments show that our method obtains better results than three state-of-the-art baselines.
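
The abstract does not spell out how the rules are mined or applied, so the following is only a minimal sketch of the general idea of an associative classifier trained on an authority file, not the authors' method. It assumes rules of the form "set of title tokens -> venue", filtered by a confidence threshold, and a simple confidence-weighted vote at prediction time. All names (`tokenize`, `mine_rules`, `disambiguate`, `TOY_AUTHORITY_FILE`), the toy entries, and the thresholds are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(title):
    """Lowercase a venue string and return its sorted, de-duplicated tokens."""
    return sorted(set(re.findall(r"[a-z0-9]+", title.lower())))


def mine_rules(authority_file, max_len=2, min_confidence=0.6):
    """Mine rules 'token set -> venue' from (title variant, venue id) pairs.

    Confidence is count(token set together with venue) / count(token set).
    The threshold and maximum rule length are illustrative assumptions.
    """
    itemset_count = Counter()
    joint_count = Counter()
    for title, venue in authority_file:
        tokens = tokenize(title)
        for k in range(1, max_len + 1):
            for items in combinations(tokens, k):
                itemset_count[items] += 1
                joint_count[(items, venue)] += 1

    rules = {}
    for (items, venue), n in joint_count.items():
        confidence = n / itemset_count[items]
        if confidence >= min_confidence:
            rules.setdefault(items, {})[venue] = confidence
    return rules


def disambiguate(citation_venue, rules, max_len=2):
    """Return the venue with the highest summed rule confidence, or None."""
    tokens = tokenize(citation_venue)
    scores = Counter()
    for k in range(1, max_len + 1):
        for items in combinations(tokens, k):
            for venue, confidence in rules.get(items, {}).items():
                scores[venue] += confidence
    if not scores:
        return None  # no rule fired: the string matches no known venue
    return scores.most_common(1)[0][0]


# Hypothetical toy authority file: known title variants per venue.
TOY_AUTHORITY_FILE = [
    ("Joint Conference on Digital Libraries", "JCDL"),
    ("JCDL", "JCDL"),
    ("ACM/IEEE Joint Conf. on Digital Libraries", "JCDL"),
    ("International Conference on Very Large Data Bases", "VLDB"),
    ("VLDB", "VLDB"),
    ("Proc. of the Int. Conf. on Very Large Data Bases", "VLDB"),
]

if __name__ == "__main__":
    rules = mine_rules(TOY_AUTHORITY_FILE)
    print(disambiguate("Proc. Joint Conf. Digital Libraries", rules))  # JCDL
    print(disambiguate("Very Large Data Bases Conference", rules))     # VLDB
    print(disambiguate("Nature", rules))                               # None
```

In this sketch, tokens shared equally by both venues (such as "conf" or "on") fall below the confidence threshold and produce no rule on their own, while discriminative tokens and token pairs vote for a single venue. A string that triggers no rule returns `None` instead of a forced guess.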

Published In

JCDL '14: Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries
September 2014
498 pages
ISBN: 9781479955695

Publisher

IEEE Press

Author Tags

  1. association rules
  2. authority file
  3. citation
  4. entity resolution
  5. publication venue

Qualifiers

  • Research-article

Conference

JCDL '14: 14th ACM/IEEE-CS Joint Conference on Digital Libraries
September 8 - 12, 2014
London, United Kingdom

Acceptance Rates

Overall Acceptance Rate 415 of 1,482 submissions, 28%

Cited By

  • (2018) Estimating Similarity Among Entities Aided by the Web when Only the Entity Name is Available. Proceedings of the 24th Brazilian Symposium on Multimedia and the Web, pp. 253-260. DOI: 10.1145/3243082.3243118. Online publication date: 16-Oct-2018
  • (2017) Classifying short unstructured data using the Apache Spark platform. Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries, pp. 129-138. DOI: 10.5555/3200334.3200349. Online publication date: 19-Jun-2017
  • (2017) A Supervised Learning Approach To Entity Matching Between Scholarly Big Datasets. Proceedings of the 9th Knowledge Capture Conference, pp. 1-4. DOI: 10.1145/3148011.3154470. Online publication date: 4-Dec-2017
  • (2016) An Example of Automatic Authority Control. Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, pp. 255-256. DOI: 10.1145/2910896.2925458. Online publication date: 19-Jun-2016
