DOI: 10.1145/3404835.3462794
short-paper

PyA0: A Python Toolkit for Accessible Math-Aware Search

Published: 11 July 2021

Abstract

Mathematical Information Retrieval (MIR) has been actively studied in recent years, and many fruitful results have emerged. Among these, the Approach Zero system is one of the few math-aware search engines able to perform substructure matching efficiently. It was also deployed in ARQMath 2020, the most recent community-wide MIR evaluation, as a strong baseline because of its empirical effectiveness and its ability to handle structured math content. However, to implement a retrieval model that handles structured queries efficiently, Approach Zero is written in C from the ground up and requires special pipelines for processing math content and queries. As a result, the system is not conveniently accessible to, or reusable by, the community as a research tool. In this paper, we present PyA0, an easy-to-use Python toolkit built on Approach Zero that improves its accessibility to researchers. We introduce the toolkit interface and report evaluation results on popular MIR datasets to demonstrate the effectiveness and efficiency of our toolkit. The PyA0 source code is publicly available at https://github.com/approach0/pya0, which includes a link to a notebook demo.

Cited By

  • (2023) JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5660-5672. DOI: 10.1145/3580305.3599850. Online publication date: 6-Aug-2023
  • (2023) One Blade for One Purpose: Advancing Math Information Retrieval using Hybrid Search. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 141-151. DOI: 10.1145/3539618.3591746. Online publication date: 19-Jul-2023
  • (2023) Answer Retrieval for Math Questions Using Structural and Dense Retrieval. Experimental IR Meets Multilinguality, Multimodality, and Interaction, 209-223. DOI: 10.1007/978-3-031-42448-9_18. Online publication date: 18-Sep-2023

Published In

SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2021
2998 pages
ISBN:9781450380379
DOI:10.1145/3404835

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Short-paper

Funding Sources

  • IDSA

Conference

SIGIR '21

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
