research-article

Improving ML-based information retrieval software with user-driven functional testing and defect class analysis

Authors:

Junjie Zhu,

Teng Long,

Wei Wang,

Atif MemonAuthors Info & Claims

ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Pages 1291 - 1301

https://doi.org/10.1145/3540250.3558941

Published: 09 November 2022 Publication History

Get Access

ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Improving ML-based information retrieval software with user-driven functional testing and defect class analysis

Pages 1291 - 1301

Abstract
References

Abstract

Machine Learning (ML) has become the cornerstone of information retrieval (IR) software, as it can drive better user experience by leveraging information-rich data and complex models. However, evaluating the emergent behavior of ML-based IR software can be challenging with traditional software testing approaches: when developers modify the software, they cannot often extract useful information from individual test instances; rather, they seek to holistically verify whether—and where—their modifications caused significant regressions or improvements at scale. In this paper, we introduce not only such a holistic approach to evaluate the system-level behavior of the software, but also the concept of a defect class, which represents a partition of the input space on which the ML-based software does measurably worse for an existing feature or on which the ML task is more challenging for a new feature. We leverage large volumes of functional test cases, automatically obtained, to derive these defect classes, and propose new ways to improve the IR software from an end-user’s perspective. Applying our approach on a real production Search-AutoComplete system that contains a query interpretation ML component, we demonstrate that (1) our holistic metrics successfully identified two regressions and one improvement, where all 3 were independently verified with retrospective A/B experiments, (2) the automatically obtained defect classes provided actionable insights during early-stage ML development, and (3) we also detected defect classes at the finer sub-component level for which there were significant regressions, which we blocked prior to different releases.

References

[1]

Saleema Amershi, Andrew Begel, Christian Bird, Robert DeLine, Harald Gall, Ece Kamar, Nachiappan Nagappan, Besmira Nushi, and Thomas Zimmermann. 2019. Software Engineering for Machine Learning: A Case Study. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). 291–300. https://doi.org/10.1109/ICSE-SEIP.2019.00042

Abstract

References

Index Terms

Recommendations

Question-driven information retrieval systems

Improving video event retrieval by user feedback

Ontology based user query interpretation for semantic multimedia contents retrieval

Comments

Information

Published In

Sponsors

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Login options

Full Access

View options

PDF

eReader

Share

Share this Publication link

Share on social media

Affiliations