research-article
Open access

Responsible data management

Published: 20 May 2022

Abstract

Perspectives on the role and responsibility of the data-management research community in designing, developing, using, and overseeing automated decision systems.





Published In

Communications of the ACM, Volume 65, Issue 6
June 2022
98 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3538687

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States


Qualifiers

• Research-article
• Popular
• Refereed


