bnfc/document/pragmas.html at master · BNFC/bnfc

History

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">

<html>

<head>

<title>New Pragmas in the BNF Converter</title>

</head>

<h1>New Pragmas in the BNF Converter</h1>

<p>

By Aarne Ranta, September 19, 2003.

<p>

This document is a supplement to the BNFC documentation, aimed to explain

the new pragmas introduced in versions 1.3 and 1.9b. Its content will

be incorporated in the BNFC report later.

<p>

<h2>New pragmas in BNFC version 1.3</h2>

These pragmas do not add to the expressive power of BNFC, but are just

shorthands for groups of rules.

<h3>Terminators and separators</h3>

The <tt>terminator</tt> pragma defines a pair of list rules by what

token terminates each element in the list. For instance,

<pre>

terminator Stm ";" ;

</pre>

tells that each statement (<tt>Stm</tt>) is terminated with a semicolon

(<tt>;</tt>). It is a shorthand for the pair of rules

<pre>

[]. [Stm] ::= ;

(:). [Stm] ::= Stm ";" [Stm] ;

</pre>

The qualifier <tt>nonempty</tt> in the pragma makes one-element list

to be the base case. Thus

<pre>

terminator nonempty Stm ";" ;

</pre>

is shorthand for

<pre>

(:[]). [Stm] ::= Stm ";" ;

(:). [Stm] ::= Stm ";" [Stm] ;

</pre>

The terminator can be specified as empty <tt>""</tt>. No token is

introduced then, but e.g.

<pre>

terminator Stm "" ;

</pre>

is translated to

<pre>

[]. [Stm] ::= ;

(:). [Stm] ::= Stm [Stm] ;

</pre>

<p>

The <tt>separator</tt> pragma is similar to <tt>terminator</tt>,

except that the separating token is not attached to the last

element. Thus

<pre>

separator Stm ";" ;

</pre>

means

<pre>

[]. [Stm] ::= ;

(:[]). [Stm] ::= Stm ;

(:). [Stm] ::= Stm ";" [Stm] ;

</pre>

whereas

<pre>

separator nonempty Stm ";" ;

</pre>

means

<pre>

(:[]). [Stm] ::= Stm ;

(:). [Stm] ::= Stm ";" [Stm] ;

</pre>

Notice that, if the empty token <tt>""</tt> is used, there

is no difference between <tt>terminator</tt> and <tt>separator</tt>.

<p>

<b>Problem</b>.

The grammar generated from

a <tt>separator</tt> without <tt>nonempty</tt>

will actually

also accept a list terminating with a semicolon, whereas

the pretty printer "normalizes" it away. This might be considered

as a bug, but a set of rules

forbidding the terminating semicolon would be much more

complicated. The <tt>nonempty</tt> case is strict.

<h3>Coercions</h3>

The <tt>coercions</tt> pragma is a shorthand for a group of rules

translating between precedence levels. For instance,

<pre>

coercions Exp 3 ;

</pre>

is shorthand for

<pre>

_. Exp ::= Exp1 ;

_. Exp1 ::= Exp2 ;

_. Exp2 ::= Exp3 ;

_. Exp3 ::= "(" Exp ")" ;

</pre>

Because of the total coverage of these coercions,

it does not matter in practice if the integer indicating the highest level

(here <tt>3</tt>) is bigger than the highest level actually occurring,

or if there are some other levels without productions in the grammar.

<h3>Rules</h3>

The <tt>rules</tt> pragma is a shorthand for a set of rules from which

labels are generated automatically. For instance,

<pre>

rules Type ::= "int" | "float" | "double" | "long" ;

</pre>

is shorthand for

<pre>

Type_int. Type ::= "int" ;

Type_float. Type ::= "float" ;

Type_double. Type ::= "double" ;

Type_long. Type ::= "long" ;

</pre>

The labels are created automatically. If the production has just

one item, the label looks natural. If it is longer, the type

name indexed with an integer is used. No global checks are

performed when generating these labels. Any label name clashes that

result from them are captured by BNFC type checking on the generated

rules.

<h2>New pragmas in BNFC version 1.9b: layout syntax</h2>

Those who do not know what layout syntax is or who do not like

it can skip this section.

<p>

These new pragmas define a <b>layout syntax</b> for a language.

Before these pragmas were added, layout syntax was not definable in

BNFC. The layout pragmas are only

available for the files generated for Haskell-related tools;

if Java or C++ programmers want to handle layout, they can use

the Haskell layout resolver as a preprocessor to their

front end, before the lexer. In Haskell, the layout resolver

appears, automatically, in its most natural place, which is between the

lexer and the parser. The layout pragmas of BNFC are not

powerful enough to handle the full layout rule of Haskell 98,

but they suffice for the "regular" cases.

<p>

Here is an example, found in the

grammar <tt>layout/Alfa2.cf</tt>.

<pre>

layout "of", "let", "where", "sig", "struct" ;

</pre>

The first line says that <tt>"of", "let", "where", "sig", "struct"</tt>

are <b>layout words</b>, i.e. start a

<b>layout list</b>. A layout list is a list of expressions

normally enclosed in curly brackets and separated by semicolons,

as shown by the Alfa example

<pre>

ECase. Exp ::= "case" Exp "of" "{" [Branch] "}" ;

separator Branch ";" ;

</pre>

When the layout resolver finds the token <tt>of</tt> in the code

(i.e. in the sequence of its lexical tokens), it

checks if the next token is an opening curly bracket. If it

is, nothing special is done until a layout word is encountered again.

The parser will expect the semicolons and the closing bracket

to appear as usual.

<p>

But, if the token <i>t</i>

following <tt>of</tt> is not an opening curly bracket,

a bracket is inserted, and the start column of <i>t</i>

is remembered as the position at which the elements of the

layout list must begin. Semicolons are inserted at those

positions. When a token is eventually encountered left

of the position of <i>t</i> (or an end-of-file), a closing bracket

is inserted at that point.

<p>

Nested layout blocks are allowed, which means that the layout

resolver maintains a stack of positions. Pushing a position

on the stack corresponds to inserting a left bracket, and

popping from the stack corresponds to inserting a right bracket.

<p>

Here is an example of an Alfa source file using layout:

<pre>

c :: Nat = case x of

True -> b

False -> case y of

False -> b

Neither -> d

d = case x of True -> case y of False -> g

x -> b

y -> h

</pre>

Here is what it looks like after layout resolution:

<pre>

c :: Nat = case x of {

True -> b

;False -> case y of {

False -> b

};Neither -> d

};d = case x of {True -> case y of {False -> g

;x -> b

};y -> h} ;

</pre>

<b>Hint</b>. It is good practice to start a new line after

any layout word, to guarantee alpha convertibility.

For instance, if you change the variable name <tt>x</tt>

to <tt>foo</tt>, the second definition above becomes

syntactically incorrect, whereas the first one remains correct.

<p>

There are two more layout-related pragmas. The <tt>layout stop</tt>

pragma, as in

<pre>

layout stop "in" ;

</pre>

tells the resolver that the layout list can be exited with

some stop words, like <tt>in</tt>, which exits a

<tt>let</tt> list. It is no error in the resolver to exit

some other kind of layout list with <tt>in</tt>, but an error will show up

in the parser.

<p>

The <tt>layout toplevel</tt> pragma tells that the whole source file

is a layout list, even though no layout word indicates this.

The position is the first column, and the resolver adds a semicolon

after every paragraph whose first token is at this position. No curly

brackets are added. The Alfa file above is an example of this, with

two such semicolons added.

<p>

To make layout resolution a stand-alone program, e.g. to serve as a

preprocessor, the programmer can modify the file

<tt>layout/ResolveLayoutAlfa.hs</tt> as indicated in the file,

and either compile it or run it in the Hugs interpreter by

<pre>

runhugs ResolveLayoutX.hs <X-source-file>

</pre>

We may add the generation of <tt>ResolveLayoutX.hs</tt>

to a later version of BNFC.

<p>

<b>Bug</b>. The generated layout resolver does not work correctly

if a layout word is the first token on a line.

</body>

</html>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pragmas.html

pragmas.html

Files

pragmas.html

Latest commit

History

pragmas.html

File metadata and controls