PADS: a domain-specific language for processing ad hoc data

Published: 12 June 2005

Abstract

PADS is a declarative data description language that allows data analysts to describe both the physical layout of ad hoc data sources and semantic properties of that data. From such descriptions, the PADS compiler generates libraries and tools for manipulating the data, including parsing routines, statistical profiling tools, translation programs to produce well-behaved formats such as XML or those required for loading relational databases, and tools for running XQueries over raw PADS data sources. The descriptions are concise enough to serve as "living" documentation while flexible enough to describe most of the ASCII, binary, and COBOL formats that we have seen in practice. The generated parsing library provides for robust, application-specific error handling.
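The idea of pairing a physical layout with semantic properties, and of reporting errors instead of aborting, can be illustrated with a small Python sketch. This is not PADS syntax and not the API of the generated library: the `Field` class, the `DESCRIPTION` list, the pipe-separated record format, and `parse_record` are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Field:
    """One field of a record (hypothetical, for illustration only).

    `convert` captures the physical layout (how to read the raw text);
    `check` captures an optional semantic property of the value.
    """
    name: str
    convert: Callable[[str], object]
    check: Optional[Callable[[object], bool]] = None


# Assumed example format: host|status|bytes, one record per line.
DESCRIPTION = [
    Field("host", str),
    Field("status", int, lambda s: 100 <= s <= 599),
    Field("bytes", int, lambda b: b >= 0),
]


def parse_record(
    line: str, description: List[Field], sep: str = "|"
) -> Tuple[Dict[str, object], List[str]]:
    """Parse one line and return (values, errors).

    Problems are collected rather than raised, so the caller decides
    how to handle them.
    """
    values: Dict[str, object] = {}
    errors: List[str] = []
    parts = line.rstrip("\n").split(sep)
    if len(parts) != len(description):
        errors.append(f"expected {len(description)} fields, found {len(parts)}")
    for field, raw in zip(description, parts):
        try:
            value = field.convert(raw)
        except ValueError:
            # Physical-layout error: the raw text is not readable as this type.
            errors.append(f"{field.name}: cannot read {raw!r}")
            values[field.name] = None
            continue
        if field.check is not None and not field.check(value):
            # Semantic error: the value parsed but violates its constraint.
            errors.append(f"{field.name}: constraint violated by {value!r}")
        values[field.name] = value
    return values, errors


good = parse_record("example.org|200|5120", DESCRIPTION)
bad = parse_record("example.org|999|-|extra", DESCRIPTION)
```

Here `good` carries an empty error list, while `bad` carries one error for the field count, one for the out-of-range status, and one for the unreadable byte count. Returning errors alongside the values leaves the policy (skip, repair, or stop) to the application, which is the kind of application-specific handling the abstract refers to.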

Published In

ACM SIGPLAN Notices, Volume 40, Issue 6
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
June 2005
325 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1064978

PLDI '05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
June 2005
338 pages
ISBN: 1595930566
DOI: 10.1145/1065010

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data description language
  2. domain-specific languages

Qualifiers

  • Article
