PetitParser: splitting a grammar in classes

Alberto Bacchelli
Hi,

Yet another question on PetitParser :)

All the grammars that I find in PetitParser (e.g., PetitXML,
PetitSmalltalk) are defined in a single class called PP...Grammar.
However, the Java grammar has many rules, and including all of them in
a single class does not seem to be the right approach.

For example, I now have a class called PPJavaLexicon, in which I cover
the rules for finding tokens and comments (i.e., the lexical structure
[1]). Next, I would continue working on types, values, and variables
[2]. So I would create another class that references PPJavaLexicon and
uses the rules defined there to define the new ones. Something like:

PPJavaTypes>>typeVariable
 ^ppJavaLexicon identifier

Is this a good approach to split a grammar in more classes,
or would you suggest something different?

Thank you,
 Alberto


[1] http://java.sun.com/docs/books/jls/third_edition/html/lexical.html
[2] http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html
_______________________________________________
Moose-dev mailing list
[hidden email]
https://www.iam.unibe.ch/mailman/listinfo/moose-dev

Re: PetitParser: splitting a grammar in classes

Lukas Renggli
> a single class called PP...Grammar. However, the Java grammar has many rules and
> including all of them in a single class seems not the right approach.

Sure, depending on the size and structure of the grammar you might
want to split it into multiple classes.

> For example, now I have a class called PPJavaLexicon, in which I cover the rules
> for finding tokens and comments (i.e. the lexical structure [1]).
> Then, for example,
> I would continue working on types, values, and variables [2]. So, I would
> create another class that references PPJavaLexicon and uses the
> rules defined there to define the new ones. Something like:
>
> PPJavaTypes>>typeVariable
>  ^ppJavaLexicon identifier

Yes, that's a possibility that works well. It is probably better,
though, to use the accessor #productionAt: to access the cached
productions of the other grammar; otherwise you end up with much
larger grammars than necessary.
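
A minimal sketch of that suggestion, rewriting the method quoted above.
The accessor #lexicon is hypothetical: it is assumed to answer a single,
already-built PPJavaLexicon instance, and how that instance is stored is
left open here.

```smalltalk
PPJavaTypes>>typeVariable
	"Look up the production by name on the existing lexicon instance.
	 #lexicon is a hypothetical accessor (an assumption, not part of
	 PetitParser) answering one shared PPJavaLexicon."
	^ self lexicon productionAt: #identifier
```

The intent is that every rule referring to identifiers shares the parser
object the lexicon has already built, rather than constructing another
copy each time a production method is called.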

> Is this a good approach to split a grammar in more classes,
> or would you suggest something different?

The problem with splitting up the grammar as you propose is that it
becomes harder (though still possible) to use subclassing to customize
the grammar with different production actions.
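
To illustrate what customizing by subclassing looks like when the whole
grammar lives in one class, here is a rough sketch. The class names
PPJavaGrammar and PPJavaParser and the simplified identifier rule are
assumptions made up for this example.

```smalltalk
"Hypothetical grammar class: defines structure only, no actions."
PPJavaGrammar>>identifier
	"Simplified rule: a letter followed by word characters,
	 flattened into a single string."
	^ (#letter asParser , #word asParser star) flatten

"Hypothetical subclass of PPJavaGrammar: same production, plus an action."
PPJavaParser>>identifier
	"Reuse the inherited rule and convert the matched string to a symbol."
	^ super identifier ==> [ :name | name asSymbol ]
```

Once the rules are spread over several classes, a subclass can only
override the productions of its own superclass in this way; the
productions of the other grammar classes need their own subclasses.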

Another (and more traditional) approach is to use a separate lexer.
You can see that in TextLint (check on squeaksource.com). There we
have different lexers for plain text, LaTeX, and HTML, and a very
simple parser for a 'natural language' of words, sentences, and
paragraphs, which can be composed in different ways. For Java such a
split probably doesn't make sense, but it is a good example of how
flexibly PetitParser adapts to different requirements.

Also you might want to look at my work on language embedding,
especially <http://scg.unibe.ch/archive/papers/Reng09cLanguageBoxes.pdf>.
There we programmatically compose different languages modeled as
PPCompositeParser instances at specific join-points.

Lukas

--
Lukas Renggli
www.lukas-renggli.ch
