Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/360204.360209acmconferencesArticle/Chapter ViewAbstractPublication PagespoplConference Proceedingsconference-collections
Article

Regular expression pattern matching for XML

Published: 01 January 2001 Publication History

Abstract

We propose regular expression pattern matching as a core feature for programming languages for manipulating XML (and similar tree-structured data formats). We extend conventional pattern-matching facilities with regular expression operators such as repetition (*), alternation (I), etc., that can match arbitrarily long sequences of subtrees, allowing a compact pattern to extract data from the middle of a complex sequence. We show how to check standard notions of exhaustiveness and redundancy for these patterns.Regular expression patterns are intended to be used in languages whose type systems are also based on the regular expression types. To avoid excessive type annotations, we develop a type inference scheme that propagates type constraints to pattern variables from the surrounding context. The type inference algorithm translates types and patterns into regular tree automata and then works in terms of standard closure operations (union, intersection, and difference) on tree automata. The main technical challenge is dealing with the interaction of repetition and alternation patterns with the first-match policy, which gives rise to subtleties concerning both the termination and the precision of the analysis. We address these issues by introducing a data structure representing closure operations lazily.

References

[1]
Serge Abiteboul, Dallan Quass, Jason McHugh, Jennifer Widom, and Janet L. Wiener. The Lorel query language for semistructured data. International Journal on Digital Libraries, 1(1):68-88, 1997.
[2]
Alexander Aiken and Edward L. Wimmers. Solving systems of set constraints (extended abstract). In Proceedings, Seventh Annual IEEE Symposium on Logic in Computer Science, pages 329-340, June 1992.
[3]
R. Burstall, David MacQueen, and Donald Sannella. HOPE: an experimental applicative language. In Proceedings of the 1980 LISP Conference, pages 136- 143, Stanford, California, 1980. Stanford University.
[4]
Luca Cardelli and Giorgio Ghelli. A query language for semistructured data based on the ambient logic. Manuscript, April 2000.
[5]
Luca Cardelli and Andrew D. Gordon. Anytime, anywhere. Modal logics for mobile ambients. In Proceedings of the 27th ACM Symposium on Principles of Programming Languages, pages 365-377, 2000.
[6]
Sophie Cluet and JOrSme Simeon. Using YAT to build a web server. In Intl. Workshop on the Web and Databases (WebDB), 1998.
[7]
Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon Levy, and Dan Suciu. XML-QL: A Query Language for XML. http : llwww, w3. orglTRINOTE-xml-ql.
[8]
Vladimir Gapeyev, Michael Levin, and Benjamin Pierce. Recursive subtyping revealed. In Proceedings of the International Conference on lunctional Programming (ICFP), pages 221-232, 2000.
[9]
R. Gilleron, S. Tison, and M. Tommasi. Set constraints and automata. Technical Report it-292, Laboratoire d'Informatique fondamentale de Lille, UniversitLille 1, 1996.
[10]
Haruo Hosoya and Benjamin Pierce. Regular expression pattern matching for XML. Available through ht'ep://wtw, cis. upemx, edu/'hahos oya/ papers/tapat-full.ps, November 2000.
[11]
Haruo Hosoya and Benjamin C. Pierce. XDuce: A typed XML processing language. In Proceedings of Third International Workshop on the Web and Databases (WebDB2000), May 2000.
[12]
John E. Hopcroft and Jeffrey D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 1979.
[13]
Haruo Hosoya, Jarfme Vouillon, and Benjamin C. Pierce. Regular expression types for XML. In Proceedings of the International Conference on Functional Programming (ICFP), pages 11-22, September 2000.
[14]
Simon L. Peyton Jones, Cordelia V. Hall, Kevin Hammond, Will Partain, and Philip Wadler. The Glasgow Haskell compiler: a technical overview. In Prec. UK Joint Framework for Information Technology (JFIT) Technical Conference, July 93.
[15]
Nils Klarlund, Anders M~ller, and Michael I. Schwartzbach. DSD: A schema language for XML. htCp :/lw,n. brics, dk/DSD/.
[16]
Xavier Leroy, Jr6me Vouillon, Damien Doligez, et al. The Objective Carol system. Software and documentation available on the Web, http :// pauillac, inria, fr/ocaml/, 1996.
[17]
Tova Milo and Dan Suciu. Type inference for queries on semistructured data. In Proceedings of Symposium on Principles of Database Systems, pages 215-226, Philadelphia, May 1999.
[18]
Tova Milo, Dan Suciu, and Victor Vianu. Typechecking for XML transformers. In Proceedings of the Nineteenth A CM SIGMOD-SIGA CT-SIGART Symposium on Principles of Database Systems, pages 11- 22. ACM, May 2000.
[19]
Robin Milner, Mads Tofte, and Robert Harper. The Definition of Standard ML. The MIT Press, 1990.
[20]
Makoto Murata. Transformation of documents and schemas by patterns and contextual conditions. In Principles of Document Processing '96, volume 1293 of Lecture Notes in Computer Science, pages 153- 169. Springer-Verlag, 1997.
[21]
Andreas Neumann and Helmut Seidl. Locating matches of tree patterns in forests. In 18th FSTTCS, volume 1530 of LNCS, pages 134-145, 1998.
[22]
Frank Neven and Thomas Schwentick. Expressive and efficient pattern languages for tree-structured data. In Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 145-156. ACM, 2000.
[23]
Laurence Puel and Ascinder Suhrez. Compiling pattern matching by term decomposition. In 1990 ACM Conference on Lisp and Functional Programming, pages 272-281, June 1990.
[24]
Yannis Papakonstantinou and Victor Vianu. DTD Inference for Views of XML Data. In Proceedings of the Nineteenth A CM SIGMOD-SIGA CT-SIGART Symposium on Principles of Database Systems, pages 35-46, Dallas, Texas, May 2000.
[25]
RELAX (REgular LAnguage description for XML). htp://www.xml.gr.jp/relax/.
[26]
Hermut Seidl. Deciding equivalence of finite tree automata. SIAM Journal of Computing, 19(3):424-437, June 1990.
[27]
Andrew K. Wright and Robert Cartwright. A practical soft type system for scheme. In In Proceedings of A CM Conference on Lisp and Functional Programming, pages 250-262, 1994.
[28]
Extensible markup language (XMLTM). http :// www.w3.org/XML/.
[29]
XML Schema Part O: Primer, W3C Working Draft. http://www. w3. org/TR/xmlschema-0/, 2000.

Cited By

View all
  • (2023)Programming with Union, Intersection, and Negation TypesThe French School of Programming10.1007/978-3-031-34518-0_12(309-378)Online publication date: 11-Oct-2023
  • (2022)Set-theoretic Types for ErlangProceedings of the 34th Symposium on Implementation and Application of Functional Languages10.1145/3587216.3587220(1-14)Online publication date: 31-Aug-2022
  • (2021)String Matching with Wildcards in the Massively Parallel Computation ModelProceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3409964.3461793(275-284)Online publication date: 6-Jul-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
POPL '01: Proceedings of the 28th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
January 2001
304 pages
ISBN:1581133367
DOI:10.1145/360204
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 January 2001

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

POPL01

Acceptance Rates

POPL '01 Paper Acceptance Rate 24 of 126 submissions, 19%;
Overall Acceptance Rate 824 of 4,130 submissions, 20%

Upcoming Conference

POPL '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)24
  • Downloads (Last 6 weeks)2
Reflects downloads up to 27 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Programming with Union, Intersection, and Negation TypesThe French School of Programming10.1007/978-3-031-34518-0_12(309-378)Online publication date: 11-Oct-2023
  • (2022)Set-theoretic Types for ErlangProceedings of the 34th Symposium on Implementation and Application of Functional Languages10.1145/3587216.3587220(1-14)Online publication date: 31-Aug-2022
  • (2021)String Matching with Wildcards in the Massively Parallel Computation ModelProceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3409964.3461793(275-284)Online publication date: 6-Jul-2021
  • (2015)Reusing metadata across components, applications, and languagesScience of Computer Programming10.1016/j.scico.2014.09.00298:P4(617-644)Online publication date: 1-Feb-2015
  • (2014)BiFluXProceedings of the 16th International Symposium on Principles and Practice of Declarative Programming10.1145/2643135.2643141(147-158)Online publication date: 8-Sep-2014
  • (2014)Using regular expressions for mining data in large software repositoriesThe 5th International Conference on Information and Communication Technology for The Muslim World (ICT4M)10.1109/ICT4M.2014.7020649(1-6)Online publication date: Nov-2014
  • (2014)Investigating the factors that influence the quality of open source systemsThe 5th International Conference on Information and Communication Technology for The Muslim World (ICT4M)10.1109/ICT4M.2014.7020589(1-6)Online publication date: Nov-2014
  • (2012)Metadata invariants: checking and inferring metadata coding conventionsProceedings of the 34th International Conference on Software Engineering10.5555/2337223.2337305(694-704)Online publication date: 2-Jun-2012
  • (2012)Regular expression sub-matching using partial derivativesProceedings of the 14th symposium on Principles and practice of declarative programming10.1145/2370776.2370788(79-90)Online publication date: 19-Sep-2012
  • (2012)Metadata invariants: Checking and inferring metadata coding conventions2012 34th International Conference on Software Engineering (ICSE)10.1109/ICSE.2012.6227148(694-704)Online publication date: Jun-2012
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media