Research article. DOI: 10.1145/3563836.3568727

Domain-Specific Visual Language for Data Engineering Quality

Published: 01 December 2022

Abstract

Data engineering pipelines process large amounts of information, and ensuring that the quality and integrity of the data are maintained throughout is critical for technical, business, and social reasons. Conventional data quality assurance approaches require a large amount of fine-grained testing code, which is laborious to write, easily falls out of sync with the pipeline, and is inscrutable to non-technical stakeholders. An executable higher-level visual approach to expressing quality requirements can serve as a shared representation of these constraints and their implications for all parties, eliminating repetition while increasing accessibility and maintainability. We present a visual programming language for expressing data quality requirements within a pipeline declaratively, structured as a diagram of compositional data flow, transformation, and validation steps.
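
As a rough textual illustration of what composing data flow, transformation, and validation steps declaratively can look like, the following Python sketch chains named steps with the `>>` operator. It is not the visual language presented in the paper: the names `Step`, `transform`, `validate`, and the example `age` field are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable

Rows = list  # a batch of records, each record a dict (illustrative assumption)


@dataclass(frozen=True)
class Step:
    """A named pipeline stage; hypothetical, not taken from the paper."""
    name: str
    run: Callable[[Rows], Rows]

    def __rshift__(self, other: "Step") -> "Step":
        # Composition: the output of this step flows into the next one.
        return Step(f"{self.name} >> {other.name}",
                    lambda rows: other.run(self.run(rows)))


def transform(name: str, fn: Callable[[dict], dict]) -> Step:
    """Apply fn to every record."""
    return Step(name, lambda rows: [fn(r) for r in rows])


def validate(name: str, predicate: Callable[[dict], bool]) -> Step:
    """Pass records through unchanged, failing if any record violates the rule."""
    def check(rows: Rows) -> Rows:
        bad = [r for r in rows if not predicate(r)]
        if bad:
            raise ValueError(f"{name}: {len(bad)} row(s) failed")
        return rows
    return Step(name, check)


# The quality requirements are declared alongside the transformation they guard.
pipeline = (
    validate("age is present", lambda r: r.get("age") is not None)
    >> transform("age to int", lambda r: {**r, "age": int(r["age"])})
    >> validate("age in range", lambda r: 0 <= r["age"] <= 130)
)

clean = pipeline.run([{"age": "41"}])
```

Each requirement is stated once, next to the step it constrains, rather than in a separate body of test code; a diagrammatic notation would show the same chain as connected nodes.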

Cited By

  • (2024) Function+Data Flow: A Framework to Specify Machine Learning Pipelines for Digital Twinning. Proceedings of the 1st ACM International Conference on AI-Powered Software, pages 19–27. DOI: 10.1145/3664646.3664759. Online publication date: 10-Jul-2024
  • (2023) Multiple-Representation Visual Compositional Dataflow Programming. Companion Proceedings of the 7th International Conference on the Art, Science, and Engineering of Programming, pages 39–47. DOI: 10.1145/3594671.3594681. Online publication date: 13-Mar-2023

Published In

PAINT 2022: Proceedings of the 1st ACM SIGPLAN International Workshop on Programming Abstractions and Interactive Notations, Tools, and Environments
November 2022
62 pages
ISBN:9781450399104
DOI:10.1145/3563836

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data engineering
  2. dataflow programming
  3. visual programming

Qualifiers

  • Research-article

Conference

PAINT '22