Nothing Special   »   [go: up one dir, main page]

Academia.eduAcademia.edu
Program Generators and the Tools to Make Them Yannis Smaragdakis, Shan Shan Huang, David Zook College of Computing Georgia Institute of Technology Atlanta, GA 30332, USA {yannis|ssh|dzook}@cc.gatech.edu ABSTRACT Program generation is among the most promising techniques in the effort to increase the automation of programming tasks. In this paper, we discuss the potential impact and research value of program generation, we give examples of our research in the area, and we outline a future work direction that we consider most interesting. Specifically, we first discuss why program generators have significant applied potential. At the same time we argue that, as a research topic, meta-programming tools (i.e., language tools for writing program generators) may be of greater value. We then illustrate our views on generators and meta-programming tools with our latest work on the Meta-AspectJ metaprogramming language and the GOTECH generator. Finally, we examine the problem of statically determining the safety of a generator and present its intricacies. We limit our focus to one particular kind of guarantee for generated code—ensuring that the generated program is free of compile-time errors. We believe that this research direction will see significant attention and will make a difference in the mainstream adoption of meta-programming technology. Categories and Subject Descriptors D.1.2 [Programming Techniques]: Automatic Programming—program synthesis, program transformation, program verification; D.3.4 [Programming Languages]: Processors General Terms Design, Languages Keywords Program generators, meta-programming, safety guarantees Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PEPM’04, August 24–26, 2004, Verona, Italy. Copyright 2004 ACM 1-58113-835-0/04/0008 ...$5.00. 1. PROGRAM GENERATORS AND METAPROGRAMMING We first present some general thoughts on program generators. We concentrate on frequently-asked questions about the nature and value of generators, as well as the research promise of the area. 1.1 What Are Program Generators? A program generator (or just generator ) is a program that generates other programs. This broad definition is often qualified to include constraints such as “the generated program is expressed in a high-level programming language”. No definition, however, offers strict boundaries distinguishing generators from traditional compilers, text-generating programs, etc. Thus, what really constitutes a generator is often determined intuitively and does not reflect so much a technical distinction as the emphasis in the development of the “generator”. A related concept is meta-programming. Metaprogramming can be described as the creation of a program that computes other programs or reasons about other programs. Meta-programming tools constitute the general platform for implementing generators in a given language setting. To understand what generators are we should look at their common uses. In practice, generators are typically compilers for domain-specific languages (DSLs). A domain-specific language is a special-purpose programming language for a particular software domain. A “domain” can be defined either by its technical structure (e.g., the domain of reactive real-time programs, the domain of LALR language parsers) or by real-world applications (e.g., the database domain, the telephony domain, etc.). The purpose of restricting attention to specific domains is to exploit the domain features and knowledge to increase automation. If we view generators as compilers for DSLs, it is worth asking how they differ from general-purpose language compilers. Indeed, the research and practice of program generators is distinctly different from that of general-purpose compilers. A general-purpose language compiler implements a stable, separately defined specification and can take several man-years to develop. In contrast, a generator is typically co-designed with the DSL that it implements. Thus, the emphasis is not on deeply analyzing a program to infer its properties, but on designing the DSL so that the properties are clearly exposed and on having the generator exploit them with as little effort as possible. The effort of implementing a program generator is typically small—comparable to the effort of implementing a software library for the domain. This is largely the result of leveraging the high-level language (commonly called the object language) in which the generated programs are expressed. The above features of program generators (domainspecificity, language/generator co-design, low-effort development) also define the focus of research in the area. Most research in program generators concentrates either on specific domains that are amenable to a generator-based approach or on meta-programming tools that simplify generator implementation. 1.2 Why Care about Generators? Program generators have been an active research focus since the early days of Computer Science. The main reasons that researchers are attracted to program generators have to do with the inherently interesting conceptual problems of meta-programming and with the potential for practical benefit. For many researchers, meta-programming is an intellectually fascinating topic. After all, computer scientists are people who find computations interesting. What then can be more interesting than computing about computations? The canonical sensationalist example is self-generating programs. It is easy to be intrigued by programs whose task is to output their own program text. For example, we can have: ((lambda (x) (list x (list ’quote x))) ’(lambda (x) (list x (list ’quote x)))) in Lisp or: main(a){a="main(a){a=%c%s%c;printf(a,34,a,34);}"; printf(a,34,a,34);} in C. At the same time, many researchers hold the belief that software engineering tasks can be substantially automated through program generation. Everyday programming often involves rote coding of pieces of functionality. These can be generated mechanically, as long as the domain-specific knowledge is suitably encoded. Most programming practitioners are confident that they can achieve far greater code reuse with the appropriate domain-specific tool. (In fact, we have encountered many cases where people overestimate the potential for automation based on their experience with a small set of examples.) Nevertheless, the question is one of engineering balance between the cost of developing a DSL and the benefit of using it. In the software development arena, program generators compete primarily with lowereffort but also lower-benefit tools, such as domain-specific libraries/APIs. There are currently many thousands specialized software libraries for various domain-specific tasks. A realistic software application employs hundreds of predefined APIs, e.g., for manipulating strings, for file processing, for math functions, for abstractions such as streams and sockets, for graphical interfaces, etc. Replacing some of these APIs with domain-specific languages implemented as program generators may result in significant simplification of the programming tasks in the domain. Libraries/APIs can themselves be thought of as crude programming languages. They have their own simplistic syntax: only function call syntax is allowed and the syntax checking is limited to checking the number of arguments. They have their own semantic restrictions: arguments need to satisfy some preconditions and calls may affect the state of the system, thus needing to occur in specific ordering patterns. Limited static error checking takes place by type checking the function calls in the host language. Libraries/APIs even have their own simple optimization: they typically offer multiple hand-specialized versions of operations for different kinds of arguments, such as special-purpose multiplication operators for sparse arrays in a scientific computing library. The user needs to pick the appropriate operations and to ensure their safety. Users of standard libraries/APIs are often constrained more by the library semantics than by the semantics of the host programming language. This is a common sentiment among users of large parallel processing and scientific computing libraries (like MPI or LAPACK). It is also often expressed by programmers in large projects where each part of the code needs to support the conventions of other parts. In this case, the “domain” is the project itself. A Microsoft programmer once told us “I don’t program in C, I program in Word’s internal API.” The software engineering benefit of domain-specific languages relative to function libraries/APIs is exactly in addressing the above deficiencies of syntax, safety, and performance. A domain-specific language can offer more concise syntax, increasing the ease of development and maintenance; it can perform static error checking to detect common violations of the library semantics; and it can offer better performance through domain-specific optimizations. A common argument (especially of the partial evaluation community) is that the advantages of program generation can be expressed entirely on the performance axis. After all, if performance is not a concern, the syntax and safety benefits of a domain-specific language can be achieved with just an interpreter. Subsequently, if better performance is required, it can be achieved by specializing the interpreter. This specialization can occur either via a general-purpose partial evaluator or via program generation. Thus, according to this view, program generation is an alternative to partial evaluation and it is entirely about code specialization. Although it is tempting and often useful to think entirely in performance terms, we should note that this view is not accurate for many, if not most, of the program generators in actual use. In practice, program generators are often employed not for reasons of performance but for reasons dictated by modularity. Many generation tasks are performed to produce code so that it satisfies conventions of existing external libraries, run-time systems, etc. For example, a domain-specific language for telephony can be generating code in the form needed to interface with a specific telephony standard API. The GOTECH generator that we will later present yields components that can be loaded in a Java application server. None of these tasks is performance related: programs need to be generated in a specific form in order to interface with external clients. 1.3 The Research Value of Meta-Programming Although program generators are very interesting in structure and valuable in practice, individual generators are often less than ideal targets for research. The reason has to do with the domain-specificity of generators. Domain knowledge is by far the primary element of a successful generator. Unfortunately, however, domain knowledge is not accessible to non-domain-experts. Since research is concerned with the encoding and transmission of reusable knowledge, the domain-specificity of generators often limits their research impact to narrow communities. Without domainindependent results it would be hard to speak of research in generators as a whole. In contrast, we believe that meta-programming infrastructure is a much more promising research target, especially for the programming languages and tools research community. Meta-programming tools are the domain-independent part of program generation. They pose all the interesting conceptual research problems of program generation and have a large applied value. In terms of technical problems, metaprogramming has a need for better language constructs, type systems and analyses to ensure safety, etc. Additionally, meta-programming stresses the limits of language processing and analysis technology (e.g., parsing) because the generated code fragments are small and often appear out of their final program context [14]. In terms of applied value, the benefit of generators is tied to the ease with which they can be implemented. A successful meta-programming tool is one that significantly simplifies generator implementation. This simplification can be a result of added expressiveness (i.e., easy program generation and transformation) or added safety (i.e., avoiding errors in coding generators). Perhaps surprisingly, the potential impact of metaprogramming tools is lower for mature, well-understood, high-value domains. For such domains, the importance of domain abstractions is so high that better metaprogramming infrastructure is unlikely to matter much. For instance, it is hard to imagine that good meta-programming tools would make a difference for parser generation (i.e., generators like Yacc, ANTLR), string and system command processing (i.e., implementations of scripting languages like Perl), or database processing (i.e., languages like SQL). In general, although such systems are technically translators for DSLs, they are not part of the focus of program generation. Meta-programming tools can have a greater impact in less mature domains, where it is not yet clear whether applying a domain-specific language approach is appropriate. Good meta-programming infrastructure can bridge the implementation effort gap between a library and a generator, thus making program generation cost-feasible for many more domains. sion”, “statement”, “identifier”, etc.) through type inference and a context-sensitive parsing algorithm. 2.1.1 What Do Aspects Have to Do with Generation? The AspectJ [6] language is an extension of Java and the flagship tool of aspect-oriented programming (AOP): a methodology that advocates decomposing software by aspects of functionality that may affect many functional units. AspectJ supports defining such aspects separately from the main application code and subsequently merging them with that code. For the purposes of our discussion, we can view AspectJ as a sophisticated code manipulation tool. With AspectJ, the user can add superclasses and interfaces to existing classes and can interpose arbitrary code to method executions, field references, exception throwing, and more. Complex enabling predicates can be used to determine whether code should be interposed at a certain point. Such predicates can include, for instance, information on the identity of the caller and callee, whether a call to a method is made while a call to a certain other method is on the stack, etc. Our use of AspectJ in MAJ is not tied to the AOP methodology at all. Instead, the main observation behind Meta-AspectJ is that generators can use AspectJ as a backend language. That is, a generator can output an AspectJ file, which will be passed to the AspectJ compiler together with an existing Java program. Based on the generated AspectJ file, the AspectJ compiler will then transform the Java program. AspectJ offers a convenient vocabulary for expressing program inspection and transformation. This functionality is crucial for program generators. A generator often needs to inspect an existing program and generate code for each field, method, field assignment, etc. encountered. At the same time, a generator often needs to transform its input program—e.g., to make it call the newly generated code. Such transformations are conveniently expressed in the AspectJ vocabulary. For example, consider a simple transformation of a Java program. We may want to take as input a class C and add an interface definition clause to it. At the source code level, this means transforming the code of C from: class C ... {...} to: 2. META-ASPECTJ AND GOTECH In this section we discuss our recent work on the MetaAspectJ meta-programming tool and the GOTECH program generator. Both of these artifacts illustrate well our earlier and later points in this paper, regarding the conceptual and applied value of meta-programming, as well as the kinds of generators that meta-programming tools should support. 2.1 Meta-AspectJ Meta-AspectJ (MAJ) [16] is a language tool for generating Java and AspectJ programs using code templates, i.e., quote and unquote operators. Two elements of MAJ are quite interesting. First, we believe that using the AspectJ language as a back-end simplifies the task of writing a generator. Second, MAJ is a technically mature meta-programming tool— in many respects the most advanced meta-programming tool for Java. For instance, MAJ reduces the need to deal with low level syntactic types for quoted entities (e.g., “expres- class C implements I ... {...} The standard way to effect this code transformation would be to parse the input, find the definition of class C and add the implements clause as a branch in the abstract syntax tree representing the C definition. At the same time, we need to be careful to find the right class C (and not, for instance, some local class with the same name), preserve its existing superclasses and interfaces, etc. An alternative way to do this transformation would be to emit the AspectJ code: aspect S { declare parents: C implements I; } The latter way is simpler and relieves us of dealing with many low-level details. The above example is admittedly simple. For more complicated code transformations (e.g., indirecting all sites where a field value may be changed, transforming all instantiations of a class to instantiations of a different class, etc.) the benefit of expressing them with AspectJ strictly increases. AspectJ offers a convenient level of abstraction in transforming programs. Although AspectJ cannot manipulate low-level code (e.g., one cannot find all the if statements or all for loops in a program and change them) it can transform almost every externally visible element of a Java class (e.g., methods, fields, type information, etc.). As an added practical advantage, AspectJ features a mature compiler implementation that directly outputs Java bytecode. 2.1.2 MAJ Technically MAJ extends Java with two code-template operators for creating AspectJ code fragments: ‘[...] (“quote”) and #[EXPR] or just #IDENTIFIER (“unquote”). (The ellipses, EXPR and IDENTIFIER are meta-variables matching any syntax, expressions and identifiers, respectively.) The quote operator creates abstract syntax tree representations of AspectJ code fragments. Parts of these representations can be variable and are designated by the unquote operator (instances of unquote can only occur inside a quoted code fragment). For example, the value of the MAJ expression ‘[call(* *(..))] is a data structure that represents the abstract syntax tree for the fragment of AspectJ code call(* *(..)). (This AspectJ expression matches all method calls for any method with any argument list and any return type.) Similarly, the MAJ expression ‘[!within(#className)] is a quoted pattern with an unquoted part. Its value depends on the value of the variable className. If, for instance, className holds the identifier “SomeClass”, the value of ‘[!within(#className)] is the abstract syntax tree for the expression !within(SomeClass). MAJ also introduces the new keyword infer that can be used in place of a type name when a new variable is being declared and initialized to a quoted expression. For example, we can write: infer pct1 = ‘[call(* *(..))]; This declares a variable pct1 that can be used just like any other program variable. For instance, we can unquote it: infer adv1 = ‘[void around() : #pct1 { }]; This creates the abstract syntax tree for a piece of AspectJ code defining what should happen (nothing in this case) every time any method gets called. In the above example, the inferred type of variable adv1 will be AdviceDec (for “advice declaration”), which is one of the types for AspectJ abstract syntax tree nodes that MAJ defines. The full set of permitted type qualifiers contains over 20 types, including Identifier, Modifiers, JavaExpr, Stmt, MethodDec, ConstructorDec, etc. In addition to variable definitions, such types can also be used explicitly in the quote/unquote operators. For instance, the fully qualified version of the adv1 example would be: AdviceDec adv1 = ‘(AdviceDec)[void around(): #(Pcd)pct1 {}]; The inference of qualifiers and types of variables holding abstract syntax trees relieves the user from dealing with the specifics of abstract syntax tree types, while maintaining the static checking of the well-formedness of trees. This feature distinguishes MAJ from other meta-programming tools. Having multiple quote/unquote operators is the norm in meta-programming tools for languages with rich surface syntax (e.g., meta-programming tools for Java [1], C [15], and C++ [3]). For instance, let us examine the JTS tool for Java meta-programming—the closest comparable to MAJ. JTS introduces several different kinds of quote/unquote operators: exp{...}exp, $exp(...), stm{...}stm, $stm(...), mth{...}mth, $mth(...), cls{...}cls, $cls(...), etc. Additionally, just like in MAJ, JTS has distinct types for each abstract syntax tree form: AST_Exp, AST_Stmt, AST_FieldDecl, AST_Class, etc. Unlike MAJ, however, the JTS user needs to always specify explicitly the correct operator and tree type for all generated code fragments. For instance, consider the JTS fragment: AST_Exp x = exp{ 7 + i }exp; AST_Stm s = stm{ if (i > 0) return $exp(x); }stm; This written in MAJ is simply: infer x = ‘[7 + i]; infer s = ‘[if (i > 0) return #x;]; The inference of type qualifiers, as in MAJ, requires sophistication in the implementation of the metaprogramming tool. The meta-programming tool needs to be a full-fledged compiler with its own type system (instead of a naive pre-processor, translating to an object language and relying on that language’s type system for checking). Additionally, parsing becomes context-sensitive—i.e., the type of a variable determines how a certain piece of syntax is parsed, which puts the parsing task beyond the capabilities of a (context-free) grammar-based parser generator. To see the above points, consider the MAJ code fragment: infer l = ‘[ #foo class A {} ]; The inferred type of l depends on the type of foo. For instance, if foo is of type Modifiers (e.g., it has the value ‘[public]) then the above code would be equivalent to: ClassDec l = ‘[ #(Modifiers)foo class A {} ]; If, however, foo is of type Import (e.g., it has the value ‘[import java.io.*;]) then the above code would be equivalent to: CompilationUnit l = ‘[ #(Import)foo class A {} ]; Thus, to be able to infer the type of the quoted expression we need to know the types of the unquoted expressions. Furthermore, the types of expressions even influence the parsing and translation of quoted code. The two possible abstract syntax trees are not isomorphic. If the type of foo is Modifiers, this will result in an entirely different parse and translation of the quoted code than if the type of foo is Import (or ClassDec, or InterfaceDec, etc). In the former case, foo just describes a modifier—i.e., a branch of the abstract syntax tree for the definition of class A. In the latter case, the abstract syntax tree value of foo is at the same level as the tree for the class definition. The MAJ type system itself is simple: it has a fixed set of types with a few subtyping relations and a couple of ad hoc conversion rules (e.g., from Java strings to MAJ identifiers). Type inference is quite straightforward: when deriving the type of an expression, the types of its component subexpressions are known and there is a most specific type for each expression. No recursion is possible in the inference logic, since the infer keyword can only be used in variable declarations and the use of a variable in its own initialization expression is not allowed in Java. An added advantage of having a full compiler for the metaprogramming tool is that it can emit better error messages. MAJ differentiates between MAJ type errors (i.e., syntactically invalid generated code) and regular parse errors in the MAJ language. It then emits accurate error messages relative to the original MAJ source. 2.2 Applications: The GOTECH Generator We will next briefly describe our GOTECH generator. GOTECH has two interesting elements. First, it is implemented by producing AspectJ code, thus being a representative of the generators that MAJ can support. Second, it shows the kinds of program manipulations that we need to handle in meta-programming tools. The next section, discussing future directions in statically safe meta-programming, will refer to examples of tasks from GOTECH. 2.2.1 What Is GOTECH? GOTECH [13] (for “General Object-To-EJB Conversion Helper”) is a program generator for Java server-side components. GOTECH takes as input a Java program annotated with JavaDoc comments to describe what parts of the functionality should be remotely executable. It then transforms parts of the program so that they execute over a network instead of running on a local machine. The middleware platform used for distributed computing is J2EE (the protocol for Enterprise Java Beans—EJB). GOTECH takes care of generating code adhering to the EJB conventions (EJB session beans) and makes methods, construction calls, etc. execute on a remote machine. Internally, the modification of the application is performed by generating AspectJ code that transforms existing classes. GOTECH was originally implemented using XDoclet [11]—a simple, text-based meta-programming tool. Subsequently GOTECH has been expressed more safely and conveniently with MAJ. It is interesting to look at the specific program inspection and generation tasks that GOTECH performs. These include the following (for each Java class that has the appropriate GOTECH input annotation): • GOTECH generates two isomorphic interfaces—i.e., Java interfaces whose methods correspond one-to-one to the public methods of the class. These are the local and remote interfaces required for the session bean according to the EJB specification. • An EJB class (session bean) is generated by cloning the functionality of the original Java class. The session bean declares that it implements the generated local and remote interfaces and modifies the original method signatures to throw the exceptions required by the distribution middleware. • An AspectJ aspect is generated to intercept all instantiations and calls to methods of the original Java class. All such object creations and client calls are now performed on the EJB class. • An aspect is generated to make parameter types serializable (i.e., passed by-copy when used as arguments to remote calls) if necessary, according to user-supplied annotations. 2.2.2 Using AspectJ in GOTECH GOTECH illustrates the value of using AspectJ as a generator back-end. The transformations performed by GOTECH (e.g., changing all client calls, making parameter types serializable) would be much harder to do without AspectJ. On the other hand, AspectJ alone would not be able to handle the GOTECH tasks. The GOTECH activities were previously [10] shown impossible to automate with just AspectJ. For example, AspectJ cannot be used to create an interface isomorphic to the public methods of a given class. Additionally, the aspects used are not static— they need to be custom-generated for the particular input application. The customization is not just with respect to the names of classes that need to be transformed. Instead, complex decisions may need to be made. For instance, the GOTECH logic making method parameter types be serializable is roughly “the type should be serializable, if neither it nor its supertypes are already declared to be serializable and it is not a primitive type.” None of these tests can be expressed in AspectJ but they are easy to compute using arbitrary Java code and reflection. Then the right custom aspect gets generated when applicable. 3. GUARANTEED-LEGAL GENERATORS In this section we present some thoughts on valuable research directions for meta-programming tools. We discuss the motivation for statically safe program generation, the state of the art, and what needs to be addressed next to support powerful generators. 3.1 What Is Guaranteed-Legal Generation and Why Do We Need It? As we saw earlier, the value of a MAJ quoted code fragment is an abstract syntax tree for the code fragment. Representing the generated program as abstract syntax trees is a standard technique in structured meta-programming tools— to be contrasted with unstructured, or “text-based” tools. It statically ensures that any generated code is syntactically correct by requiring that the tree be well-formed. This property is often described as “the type safety of the generator implies the syntactic correctness of the generated program.” Nevertheless, we would like to go further than mere syntax checking. The generated program may still contain semantic errors. Offering guarantees on the generated code is one of the toughest and most interesting issues in meta-programming. We are interested in statically safe program generation: statically checking the generator should enforce the safety of the generated program. Since the space of program safety properties is huge and most of these properties are undecidable, we limit our attention to language legality checking. That is, we want to ensure that the generated program does not suffer from errors typically detected by a conventional compiler, such as type mismatch errors, references to undeclared variables, duplicate variable definitions, etc. We use the term guaranteed-legal to describe a generator that always generates programs that will compile correctly. We would like to have meta-programming infrastructure that statically ensures that a generator is guaranteed-legal for all legal inputs. Of course, having any kind of sound static safety check means putting restrictions on the expressible programs. Hence, some safe generators will be conservatively rejected. The challenge is to design a static checking mechanism that is expressive enough for common program generators (e.g., the GOTECH program manipulations). It is often debated whether static legality checking is a valuable feature. After all, the generated program will be checked statically before it runs, so why try to catch these errors before the program is even generated? The answer is that static checking is not intended to detect errors in the generated program or even errors in the generator input, but errors in the generator itself. Although these errors will be detected at compile-time of the generated program, this is at least as late as the run-time of the generator. Thus, static legality checking for generators is analogous to static typing for regular programs. It is a desirable property because it increases confidence in the correctness of the generator under all inputs (and not just the inputs with which the generator writer has tested the generator). To see the problem in an example, consider a program generator that emits programs depending on two input-related conditions: if (pred1()) emit( ‘[int i;] ); ... if (pred2()) emit( ‘[i++;] ); If, for some input, pred2 does not imply pred1, then the generator can emit the reference to variable i without having generated the definition of i. This is an error in the generator and it should be the responsibility of a good metaprogramming language tool to prevent such errors by statically examining the generator. (Of course, it is rather easy to catch this error at generation time of the i++ fragment, but this just shifts the blame: the generator does not produce an invalid program, but fails to produce anything.) In fact, the error may only occur after the generator writer has tested and widely deployed the generator. The error will then be detected by some random user. The analogy to compilers for general-purpose languages is interesting. Consider a C compiler that for some perfectly legal C inputs produces executable files in an illegal binary format—e.g., illegal ELF or .EXE files or machine code for the wrong architecture. These files do not represent programs and will not fail at run-time: the operating system will reject them at load time. Nevertheless, this is small consolation to the C compiler author: his/her artifact was deployed while containing basic errors and has failed during its run-time. If there existed a language tool whose purpose was to help C compiler writers, it would be desirable to statically check that the compiler always produces legal executables for legal inputs. In general meta-programming the problem is both more pronounced (due to the multitude of potential generators relative to the small number of C compilers) and harder (due to the richness of target high-level languages relative to the shallow rules for binary executable formats). ated programs. Specifically, multi-stage languages, such as MetaML [12] and MetaOCaml [2], guarantee that the generated program is type-correct by statically checking the generator. In this sense, multi-stage languages represent the state of the art in static safety checking of generators. Nevertheless, as we will see, staging applies severe restrictions on the structure of the generator and prohibits the expression of arbitrary generators, like GOTECH. Staging language constructs are fairly common in the partial evaluation community. They comprise primitives for quoting, unquoting, and performing run-time evaluation (with an eval/run construct) of code representations. The focus of multi-stage languages, however, is performance. Multi-stage languages serve as assembly languages for partial evaluation. The staging primitives are sufficient for expressing any partial evaluation task, either hand-staged or derived by an automatic partial evaluator through binding time analysis. Staging differs from general program generation, however. The hallmark property of multi-stage languages is the erasure property: erasing the staging constructs should yield a meaningful program. Clearly this property does not hold for arbitrary program generation, even when the meta-language and the object language are the same. In MAJ, for instance, we can write: infer x = ‘[ class A {} ]; Removing the quote construct does not result in a legal Java program, however. For an example of a staged program, consider a simple exponentiation function (we use MAJ-style quotes and unquotes but the function is shown in ML-like pseudo-code, to avoid low-level complexities of explicit typing): exp(n,a) = if (a == 0) 1 else n * exp(n, a-1) If we would like to stage this function with respect to a statically known exponent, the result would be: exp(n,a) = if (a == 0) ‘[1] else ‘[#[n] * #[exp(n, a-1)]] It is easy to see that erasing the staging constructs from the second exp function will yield exactly the first one. (As a side note, in this example the two exp functions are not exactly equivalent. The latter exp is a generator even when all involved values are known: exp(3,4) will yield ‘[3*3*3*3*1]. The run/eval primitive can then evaluate this expression.) In order to offer type correctness guarantees on the generated code, multi-stage languages place rigid syntactic restrictions on the generator. The first restriction is that the binding of an identifier should be clear from the generator code. Specifically, the only way to quote a code fragment containing free variables is in the scope of another quote that contains definitions for these variables. That is, we can write: 3.2 State of the Art and Why We Need More ‘[ int i; #[fun( ‘[ i++; ] )] ] Static safety checking is a hard property to ensure, even when we limit our attention to language legality checks. Nevertheless, an interesting special case of program generation already offers strong legality guarantees for gener- In this fragment, i occurs free in the inner quote, but is bound to a definition in the outer quote. We cannot, however, write two independent quote expressions containing a separate definition and reference, such as: ... ‘[ int i; ] ... ‘[ i++; ] This is a serious restriction for arbitrary program generation. Supporting separate variable definition and reference would dissociate the structure of the generator source code from the structure of the generated program, allowing more freedom in writing the generator. The main problem is not one of scoping—one can imagine a namespace construct that ties together independent variable definitions and references. Instead, the greater issue is control flow. Just as we saw in Section 3.1, separating the definitions and references can result in generated programs referencing undeclared variables. Furthermore, the same quoted definition may be used to create multiple actual variable definitions in the generated program. For instance, consider the fragment: while (pred1()) { emit( ‘[int i;] ); ... } ...‘[i++;] There may be several distinct definitions generated by the quoted fragment ‘[int i;]. It may seem at first that the problem is the name conflicts among the i definitions. Nevertheless, a good meta-programming system should rename the actual variable names, from i to some unique identifier for every iteration, to avoid accidental name conflicts with both user-supplied code and generated declarations. (This is part of the well-studied hygiene problem for metaprogramming [7, 4, 8].) The more important issue, however, is that it is not clear which binding of i is intended in the generated expression i++. Current systems that offer static legality guarantees also restrict the names used in the generated program. In multistage languages, it is not legal to unquote an identifier in a binding position. Instead, all names used in definitions and references are constants. For instance, it is not possible to generate the definition of a variable whose name will be determined at generator run-time, such as: emit( ‘[ int #name; ] ); This is a severe limitation. Recall the program manipulations performed in GOTECH. One of the most straightforward ones is to generate isomorphic interfaces for each input class. Such interfaces need to have methods with names that depend on the generator input. In general, program generators commonly emit both definitions and references with names that are not statically known. For instance, a generated program may be calling existing methods of classes, whose names are discovered at generation time. Allowing non-constant names in quoted expressions is an issue of data flow in the generator. For example, consider a generator that introduces two new names in the same lexical context: emit( ‘[ int #name1; int #name2; ] ); For static language legality checking, we need to know that name1 and name2 do not hold the same value (or we will end up with a duplicate variable definition in the generated program). 3.3 Open Problems and Promising Directions Based on the above observations, meta-programming tools need to advance if they are to support realistic guaranteed-legal generators. There is one general design direction that we are pursuing and we consider particularly promising. Given that the problem is one of data and control flow analysis in the generator, we expect that an easy-toanalyze meta-programming language will be beneficial. For instance, consider a sub-Turing-computable language with controlled iteration and selection. This language can integrate a reflective mechanism (analyzing an input program to extract its elements) with program generation. That is, the language can define iterators over existing programs. The iterators can range over, say, all fields of a class, all arguments of a method, all classes in a package, etc. All program generation should be predicated on an iterator: copies of the quoted code will be generated for each iteration. For example, we could have a code generation expression such as: #for[f in Field(c), ‘[ #[Type(f)] #[Name(f)] ; ]] (The #for primitive is part of our invented syntax, as are the usual ‘[...] and #[...]. Field, Type, and Name are iterator functions.) In this example, definitions for several variables are being generated. The generated variable names are not statically known but they depend on existing names. Thus, static checking can be done, based on the assumption that the input program is legal. For instance, the above code fragment can never generate a duplicate definition: the generated names are in a one-to-one mapping with fields of input class c, which are guaranteed to be uniquely named. Similarly, when generating references, we can use the iterators to match them to generated definitions. For instance, we can refer to the variables generated by the above fragment in code such as: #for[f in Field(c), ‘[insert(#[Name(f)])] ] We know that the emitted code refers to valid variable names because the iterator f ranges over the same values (all fields of class c) when generating both definitions and references. More specifically, we want to define collections of elements of existing programs in a relational view. That is, we will treat the input program as a set of relations (or a single big relation), matching packages and classes, classes and fields, methods and arguments, etc. Each of our iterators will then be defined using a relational calculus query. That is, we can define iterators by selecting elements in a relation through a logical language that supports universal and existential quantifiers. For instance, we can define an iterator over all classes with at least one “synchronized” method. Similarly, we can define an iterator over all classes supporting all methods from a certain interface (i.e., for every interface method, there is a method in the class with identical name and type signature). Subsequently we can use the iterators to determine the control and data flow of program generation. Our controlled-iteration language will be able to interface with a general-purpose language but the analysis will be limited to the controlled-iteration language. For a more complete example, consider a routine generating an isomorphic interface for a given input class. In our syntax, this would be written as: Input [ cl :: Class; ] 4. CONCLUSIONS pm(c::Class) = m::Method(c) : "public" in Modifier(m); total() = ‘[ interface #name[cl, "Isomorphic"] { #for[m in pm(cl), ‘[#[RetType(m)] #[Name(m)] (#for[a in Args(m),‘[#[Type(a)] arg_name]]) ] ] } ] ; The Input clause in the above program declares cl as an input class relation. Then, pm is defined as an iterator over all public methods of the class. Finally, the generator function total generates an interface (named after the input class, with the appended suffix ”Isomorphic”) that contains all public methods of cl with identical signatures. All arguments are named arg_name in the above code: their actual names do not matter and the generated names will be unique to avoid conflicts. The above description is sketchy—we omitted many of the specifics of the syntax and semantics of the proposed language, other language constructs for conditional generation, etc. Nevertheless, the example is hopefully sufficient to outline the interesting conceptual problems. In order to match references to definitions (e.g., for type checking, or checking for undeclared variable references) we need to know which definitions are generated under the conditions for generating the reference. Recall our problem from Section 3.1: if (pred1()) emit( ‘[int i;] ); ... if (pred2()) emit( ‘[i++;] ); We need to know whether pred2() implies pred1(). With our controlled language, all generation is predicated on an iterator. Thus our problem is one of containment of the relations ranged over by two iterators. This is not a decidable problem in the full relational calculus but there are many well-known restrictions that result in decidable containment problems. The restriction could be imposed on the syntax or on the reasoning power. We outlined above just one design direction in the space of guaranteed-legal meta-programming. In the end, the right solution will be an engineering tradeoff between expressiveness and safety. Many interesting open issues remain to be explored, even in the immediate vicinity of the ideas we outlined above: • Can we do a powerful enough data- and control-flow analysis for a general purpose, Turing-complete language that will support static legality checking while allowing to express most common generation tasks? • For controlled-iteration languages, like the one sketched above, what is an expressive logic that allows decidable containment checking? • What is a good balance between the run-time complexity of containment checking and the expressiveness of the logic? In this paper, we gave a focused overview of program generation and meta-programming tools. Our purpose was not to offer a comprehensive survey—good surveys on program generation already exist [5, 9]. Notably, Jones and Glenstrup [5] offer an extensive survey of program generation as it pertains to partial evaluation. Here we presented our views on the practical value of program generators and the research value of meta-programming. Our arguments on how meta-programming infrastructure affects generators were illustrated with two of our recent research artifacts: the Meta-AspectJ tool and the GOTECH generator. Finally, we discussed promising future directions for meta-programming research. We hope that meta-programming will soon mature to become a valued tool in the arsenal of the sophisticated programmer, making program generation a viable option for a multitude of software domains. 5. ACKNOWLEDGMENTS This research was supported by the NSF through grants CCR-0238289 and CCR-0220248, and by the Georgia Electronic Design Center. We want to thank Pete Manolios and the members of the IFIP WG 2.11 for valuable discussions and inspiration. Peter Sestoft’s and Walid Taha’s comments on our draft improved the quality of the paper. 6. REFERENCES [1] D. Batory, B. Lofaso, and Y. Smaragdakis. JTS: tools for implementing domain-specific languages. In Proceedings Fifth International Conference on Software Reuse, pages 143–153, Victoria, BC, Canada, 1998. IEEE. [2] C. Calcagno, W. Taha, L. Huang, and X. Leroy. Implementing multi-stage languages using ASTs, gensym, and reflection. In Generative Programming and Component Engineering (GPCE) Conference, LNCS 2830, pages 57–76. Springer, 2003. [3] S. Chiba. A metaobject protocol for C++. In ACM Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA’95), SIGPLAN Notices 30(10), pages 285–299, Austin, Texas, USA, Oct. 1995. [4] W. Clinger. Macros that work. In Proceedings of the 18th ACM SIGPLAN-SIGACT symposium on Principles of Programming Languages, pages 155–162. ACM Press, 1991. [5] N. D. Jones and A. J. Glenstrup. Program generation, termination, and binding-time analysis. In Generative Programming and Component Engineering (GPCE) Conference, LNCS 2487, pages 1–31. Springer, 2002. [6] G. Kiczales, E. Hilsdale, J. Hugunin, M. Kersten, J. Palm, and W. Griswold. Getting started with AspectJ. Communications of the ACM, 44(10):59–65, 2001. [7] E. Kohlbecker, D. P. Friedman, M. Felleisen, and B. Duba. Hygienic macro expansion. In Proceedings of the 1986 ACM conference on LISP and functional programming, pages 151–161. ACM Press, 1986. [8] Y. Smaragdakis and D. Batory. Scoping constructs for program generators. In Generative and [9] [10] [11] [12] [13] [14] [15] [16] Component-Based Software Engineering Symposium (GCSE), 1999. Earlier version in Technical Report UTCS-TR-96-37. Y. Smaragdakis and D. Batory. Application generators. Encyclopedia of Electrical and Electronics Engineering, 2000. J.G. Webster (ed.), John Wiley and Sons. S. Soares, E. Laureano, and P. Borba. Implementing distribution and persistence aspects with AspectJ. In ACM Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA’02), pages 174–190. ACM Press, 2002. A. Stevens et al. XDoclet Web site, http://xdoclet.sourceforge.net/. W. Taha and T. Sheard. Multi-stage programming with explicit annotations. In Partial Evaluation and Semantics-Based Program Manipulation, Amsterdam, The Netherlands, June 1997, pages 203–217. New York: ACM, 1997. E. Tilevich, S. Urbanski, Y. Smaragdakis, and M. Fleury. Aspectizing server-side distribution. In Proceedings of the Automated Software Engineering (ASE) Conference. IEEE Press, October 2003. E. Visser. Meta-programming with concrete object syntax. In Generative Programming and Component Engineering (GPCE) Conference, LNCS 2487, pages 299–315. Springer, 2002. D. Weise and R. F. Crew. Programmable syntax macros. In SIGPLAN Conference on Programming Language Design and Implementation, pages 156–165, 1993. D. Zook, S. S. Huang, and Y. Smaragdakis. Generating AspectJ programs with Meta-AspectJ. In Proceedings of the 2004 Generative Progamming and Component Engineering (GPCE) Conference. Springer-Verlag, to appear.