DOI: 10.1145/3614407.3643703
research-article
Open access

Error-Tolerant E-Discovery Protocols

Published: 12 March 2024

Abstract

We consider the multi-party classification problem introduced by Dong, Hartline, and Vijayaraghavan (2022) in the context of electronic discovery (e-discovery). Given a request for production from the requesting party, the responding party must provide all documents that are responsive to the request, except those that are legally privileged. Our goal is to find a protocol that verifies that the responding party sends almost all responsive documents while minimizing the disclosure of non-responsive documents. We provide protocols in the challenging non-realizable setting, where the instance may not be perfectly separated by a linear classifier. We demonstrate empirically that our protocol finds almost all responsive documents while incurring only a small disclosure of non-responsive documents. We complement this with a theoretical analysis of our protocol in the single-dimensional setting, and with further experiments on simulated data, which suggest that the non-responsive disclosure incurred by our protocol may be unavoidable.
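To make the recall/disclosure trade-off concrete in the single-dimensional setting, here is a minimal sketch; it is not the authors' protocol. It assumes each document has already been mapped to a real-valued score (in one dimension a linear classifier is simply a threshold on that score) and that ground-truth responsiveness labels are known. The function name `min_disclosure_threshold` and the recall target `1 - eps` are hypothetical choices for illustration. The sketch discloses every document scoring at or above a threshold `t`, picks the largest `t` whose disclosed set still contains at least a `1 - eps` fraction of the responsive documents, and reports how many non-responsive documents that disclosure includes.

```python
from typing import List, Tuple


def min_disclosure_threshold(
    scores: List[float], responsive: List[bool], eps: float
) -> Tuple[float, float, int]:
    """Hypothetical 1-D illustration, not the paper's protocol.

    Every document with score >= t is disclosed. Returns
    (t, recall, non_responsive_disclosed) for the largest threshold t whose
    recall over the responsive documents is at least 1 - eps.
    """
    if len(scores) != len(responsive):
        raise ValueError("scores and responsive must have the same length")
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    total_responsive = sum(responsive)
    if total_responsive == 0:
        raise ValueError("need at least one responsive document")

    # Visit documents from highest to lowest score. Lowering t only adds
    # documents, so recall and non-responsive disclosure both grow
    # monotonically; the first feasible t therefore minimizes the number of
    # non-responsive documents disclosed by any threshold rule meeting the target.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    found = 0      # responsive documents disclosed so far
    non_resp = 0   # non-responsive documents disclosed so far
    i = 0
    while i < len(order):
        t = scores[order[i]]
        # Documents tied at score t are all disclosed by the rule score >= t.
        while i < len(order) and scores[order[i]] == t:
            if responsive[order[i]]:
                found += 1
            else:
                non_resp += 1
            i += 1
        recall = found / total_responsive
        if recall >= 1.0 - eps:
            return t, recall, non_resp
    # At the lowest score every document is disclosed, so recall is 1.0 and
    # the loop above always returns; this line is never reached.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    responsive = [True, True, False, True, False, True]
    print(min_disclosure_threshold(scores, responsive, eps=0.25))  # (0.6, 0.75, 1)
    print(min_disclosure_threshold(scores, responsive, eps=0.0))   # (0.4, 1.0, 2)
```

In this toy input, a 75% recall target discloses one non-responsive document, while full recall discloses two. The sketch takes the labels as given; it does not model the verification between the requesting and responding parties that the protocol itself addresses.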

Supplementary Material

dong.zip: supplemental movie, appendix, image and software files for Error-Tolerant E-Discovery Protocols
Appendix.pdf: appendix (PDF)

References

[1]
Pranjal Awasthi, Maria Florina Balcan, and Philip M. Long. 2017. The Power of Localization for Efficiently Learning Linear Separators with Noise. J. ACM 63, 6, Article 50 (Jan. 2017), 27 pages. https://doi.org/10.1145/3006384
[2]
Wei-Cheng Chang, Felix X Yu, Yin-Wen Chang, Yiming Yang, and Sanjiv Kumar. 2020. Pre-training tasks for embedding-based large-scale retrieval. arXiv preprint arXiv:2002.03932 (2020).
[3]
Kenneth L Clarkson. 1994. More output-sensitive geometric algorithms. In Proceedings 35th Annual Symposium on Foundations of Computer Science. IEEE, 695--702.
[4]
Gordon V Cormack and Maura R Grossman. 2014. Evaluation of machine-learning protocols for technology-assisted review in electronic discovery. In Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval. 153--162.
[5]
Gordon V Cormack and Maura R Grossman. 2017. Technology-Assisted Review in Empirical Medicine: Waterloo Participation in CLEF eHealth 2017. CLEF (working notes) 11 (2017).
[6]
Gordon V Cormack and Mona Mojdeh. 2009. Machine Learning for Information Retrieval: TREC 2009 Web, Relevance Feedback and Legal Tracks. In TREC.
[7]
Jinshuo Dong, Jason Hartline, and Aravindan Vijayaraghavan. 2022. Classification Protocols with Minimal Disclosure. In Proceedings of the 2022 Symposium on Computer Science and Law. 67--76.
[8]
Brown v. Tellermate Holdings Ltd. 2014. Case No. 2:11-cv-1122, 2014 U.S. Dist. LEXIS 90123 (2014). https://casetext.com/case/brown-v-tellermate-holdings-ltd
[9]
Hyles v. New York City. 2016. 10 Civ. 3119 (AT)(AJP) (2016). https://casetext.com/case/hyles-v-nyc
[10]
Moore v. Groupe. 2012. 868 F. Supp. 2d 137 (2012). https://casetext.com/case/moore-v-groupe
[11]
O Goldreich, S Micali, and A Wigderson. 1987. How to play ANY mental game. In Proceedings of the nineteenth annual ACM symposium on Theory of computing. 218--229.
[12]
Shafi Goldwasser, Silvio Micali, and Charles Rackoff. 1989. The Knowledge Complexity of Interactive Proof Systems. SIAM J. Comput. 18, 1 (1989), 186--208.
[13]
Shafi Goldwasser, Guy N. Rothblum, Jonathan Shafer, and Amir Yehudayoff. 2021. Interactive Proofs for Verifying Machine Learning. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 185), James R. Lee (Ed.). Schloss Dagstuhl--Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 41:1--41:19. https://doi.org/10.4230/LIPIcs.ITCS.2021.41
[14]
Maura R Grossman and Gordon V Cormack. 2010. Technology-assisted review in e-discovery can be more effective and more efficient than exhaustive manual review. Rich. JL & Tech. 17 (2010), 1.
[15]
Maura R Grossman and Gordon V Cormack. 2012. Inconsistent responsiveness determination in document review: Difference of opinion or human error. Pace L. Rev. 32 (2012), 267.
[16]
Venkatesan Guruswami and Prasad Raghavendra. 2009. Hardness of Learning Halfspaces with Noise. SIAM J. Comput. 39, 2 (2009), 742--765. https://doi.org/10.1137/070685798
[17]
Bruce Hedin, Stephen Tomlinson, Jason R Baron, and Douglas W Oard. 2009. Overview of the TREC 2009 Legal Track. In TREC.
[18]
Adam Tauman Kalai, Adam R. Klivans, Yishay Mansour, and Rocco A. Servedio. 2008. Agnostically Learning Halfspaces. SIAM J. Comput. 37, 6 (2008), 1777--1805. https://doi.org/10.1137/060649057
[19]
Michael J. Kearns, Robert E. Schapire, and Linda M. Sellie. 1992. Toward Efficient Agnostic Learning. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory (Pittsburgh, Pennsylvania, USA) (COLT '92). Association for Computing Machinery, New York, NY, USA, 341--352. https://doi.org/10.1145/130385.130424
[20]
Michael J. Kearns and Umesh V. Vazirani. 1994. An Introduction to Computational Learning Theory. MIT Press, Cambridge, MA, USA.
[21]
Daniel N Kluttz and Deirdre K Mulligan. 2019. Automated decision support technologies and the legal profession. Berkeley Technology Law Journal 34, 3 (2019), 853--890.
[22]
Antoine Louis and Gerasimos Spanakis. 2021. A statutory article retrieval dataset in French. arXiv preprint arXiv:2108.11792 (2021).
[23]
Alison O'Mara-Eves, James Thomas, John McNaught, Makoto Miwa, and Sophia Ananiadou. 2015. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Systematic reviews 4, 1 (2015), 1--22.
[24]
Allyson Haynes Stuart. 2021. A Right to Privacy for Modern Discovery. Geo. Mason L. Rev. 29 (2021), 675.
[25]
Jan van den Brand, Yin Tat Lee, Aaron Sidford, and Zhao Song. 2020. Solving tall dense linear programs in nearly linear time. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing. 775--788.
[26]
Jie Zou and Evangelos Kanoulas. 2020. Towards question-based high-recall information retrieval: Locating the last few relevant documents for technology-assisted reviews. ACM Transactions on Information Systems (TOIS) 38, 3 (2020), 1--35.

Published In

CSLAW '24: Proceedings of the Symposium on Computer Science and Law
March 2024
161 pages
ISBN:9798400703331
DOI:10.1145/3614407
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Classification
  2. E-Discovery
  3. Multi-party Protocol

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CSLAW '24
CSLAW '24: Symposium on Computer Science and Law
March 12 - 13, 2024
Boston, MA, USA

Article Metrics

  • Total citations: 0
  • Total downloads: 152
  • Downloads (last 12 months): 152
  • Downloads (last 6 weeks): 35

Reflects downloads up to 25 Nov 2024.
