PetitJava


PetitJava

Alberto Bacchelli
Hi,

As I wrote in my previous e-mail, I am trying to write the Java grammar
for PetitParser. I am following "The Java Language Specification,
Third Edition" [1], the latest book in which Sun specifies the whole
Java language. It covers Java 1.5.

The project, PetitJava, is on SqueakSource [2].

I've just "finished" implementing the literals [3]
and I am testing them.
I found something strange, so there is probably something
in PetitParser that I don't get.

As an example, among the primitives in PPJavaGrammar you find:

<snips>
PPJavaGrammar>>octalEscape
        ^ $\ asParser , (octalDigit / (octalDigit , octalDigit) /
                (zeroToThree , octalDigit , octalDigit))

PPJavaGrammar>>octalDigit
        ^PPPredicateObjectParser anyOf: '01234567'

PPJavaGrammar>>zeroToThree
        ^PPPredicateObjectParser anyOf: '0123'
</snips>

Here are some of my tests, as an example:

<snips>
PPJavaGrammarTest>>testOctalEscape1
        self parse: '\0' rule: #octalEscape

PPJavaGrammarTest>>testOctalEscape2
        self parse: '\00' rule: #octalEscape

PPJavaGrammarTest>>testOctalEscape3
        self parse: '\000' rule: #octalEscape
</snips>

While the first test passes as I expected,
the second and the third do not.
However, they should be recognized, respectively, as:

$\ , octalDigit , octalDigit

and

$\ , zeroToThree , octalDigit , octalDigit

as stated in the "octalEscape" implementation.

Is there something wrong with my implementation,
or have I misunderstood the '/' operator?

Thank you,
 Alberto


[1] http://java.sun.com/docs/books/jls/third_edition/html/j3TOC.html
[2] http://www.squeaksource.com/PetitJava.html
[3] http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.10
_______________________________________________
Moose-dev mailing list
[hidden email]
https://www.iam.unibe.ch/mailman/listinfo/moose-dev

Re: PetitJava

Alberto Bacchelli
[...]
> Is there something wrong with my implementation,
> or have I misunderstood the '/' operator?

I also tried changing the implementation of octalEscape as follows
(putting the longest match as the first option), but without success:

<snip>
PPJavaGrammar>>octalEscape
        ^ $\ asParser , ((zeroToThree , octalDigit , octalDigit) /
                (octalDigit , octalDigit) / octalDigit)
</snip>

Re: PetitJava

Alberto Bacchelli
I tried to reproduce the issue in a workspace:

<snip>
octalDigit := PPPredicateObjectParser anyOf: '01234567'.
octalDigits := octalDigit plus.
zeroToThree := PPPredicateObjectParser anyOf: '0123'.
octalEscape := $\ asParser , ((zeroToThree , octalDigit , octalDigit)
        / (octalDigit , octalDigit) / octalDigit).

(octalEscape end parse: '\000') isPetitFailure    "answers false: the whole input is accepted"
</snip>

In this case, the order of the options
((zeroToThree , octalDigit , octalDigit) / (octalDigit , octalDigit)
/ octalDigit) does seem to matter: with the longest option first, the
parser recognizes all the possibilities.
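A minimal model can make the effect of the ordering concrete. The sketch below is plain Python, not PetitParser: the helpers `char_in`, `seq`, `choice` and `accepts` are hypothetical stand-ins for `anyOf:`, `#,`, `#/` and `end`, assuming PEG-style semantics in which a parser returns the new input position, or None on failure.

```python
def char_in(chars):
    """Match one character from `chars` (stand-in for anyOf:)."""
    def parse(text, pos):
        if pos < len(text) and text[pos] in chars:
            return pos + 1
        return None
    return parse


def seq(*parsers):
    """Sequence (stand-in for #,): every parser must succeed in turn."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*parsers):
    """Ordered choice (stand-in for #/): the first alternative that succeeds wins."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)  # every alternative restarts at `pos`
            if result is not None:
                return result      # later alternatives are never tried
        return None
    return parse


def accepts(parser, text):
    """True if `parser` consumes the whole input (stand-in for `parser end`)."""
    return parser(text, 0) == len(text)


octal_digit = char_in("01234567")
zero_to_three = char_in("0123")
backslash = char_in("\\")

# Shortest alternative first, as in the original octalEscape.
shortest_first = seq(backslash, choice(
    octal_digit,
    seq(octal_digit, octal_digit),
    seq(zero_to_three, octal_digit, octal_digit)))

# Longest alternative first, as in the reordered octalEscape.
longest_first = seq(backslash, choice(
    seq(zero_to_three, octal_digit, octal_digit),
    seq(octal_digit, octal_digit),
    octal_digit))

for text in ["\\0", "\\00", "\\000"]:
    print(repr(text), accepts(shortest_first, text), accepts(longest_first, text))
```

With the shortest alternative first, the choice commits to a single `octalDigit` on '\00', so the trailing '0' is left over and the end-of-input check fails; with the longest first, all three inputs are accepted.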

So, is it correct that I always have to put the longest option first in these cases?
Are there other approaches?

Finally, it does not work in the compiled PPJavaGrammar class.
Is there a way to reset all the variables so that they are
recompiled?

Thank you,
 Alberto

On 27 July 2010 17:08, Alberto Bacchelli <[hidden email]> wrote:
> [...]


Re: PetitJava

Bergel, Alexandre
Hi Alberto,

The Java grammar is quite large. If you want to import Java code into Moose without using inFuzion, an alternative is to take the output of srcML (which is able to process Java) and run XPath queries on it.
The tricky part of representing Java code in Moose is type resolution. It is not trivial (not difficult, just extremely boring). Even though a type evaluator is easy to implement, it is a bit like implementing a semantic evaluator.

Cheers,
Alexandre


On 27 Jul 2010, at 17:26, Alberto Bacchelli wrote:
> [...]

--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.







Re: PetitJava

Alberto Bacchelli
Alexandre, thank you for your suggestion.
I actually used srcML last year for another project,
as you suggested.

Unfortunately, this time I really need such a grammar in PetitParser
(long story).
Since I am not proficient enough with PetitParser, I discarded the
idea of creating a translator
from grammars specified in other syntaxes (e.g., antlr) to PetitParser.
I hope to be able to do it in the future :)

Cheers,
 Alberto

On 27 July 2010 17:39, Bergel, Alexandre <[hidden email]> wrote:
> [...]


Re: PetitJava

Alberto Bacchelli
[...]
> Finally, it does not work in the compiled PPJavaGrammar class.
> Is there a way to reset all the variables so that they are
> recompiled?

This was a silly question. I just noticed that the problem
was a singleton keeping an old instance of PPJavaGrammar alive.
Sorry :)

Re: PetitJava

Lukas Renggli
> In this case, it seems that changing the order of the different options
> ( (zeroToThree , octalDigit , octalDigit) / (octalDigit , octalDigit)
> / octalDigit )
> matters. In fact, with the longest option, it is able to recognize all
> the possibility.

#/ is an ordered choice; see the method comment of #/ and the class
comment of PPChoiceParser.

To summarize: #/ tries the first choice (the receiver): if that works
it returns the result without trying any other choice; if that doesn't
work, it backtracks and tries the next choice, and so on ...
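The behaviour summarized above can be traced with a small Python model. This is only a sketch, not PetitParser's implementation; `lit`, `seq`, `choice` and `traced` are hypothetical helpers, and the log records which alternatives are attempted and at which position.

```python
def lit(s):
    """Match the literal string `s`; return the new position or None."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def seq(*parsers):
    """Run parsers one after another; fail if any of them fails."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*parsers):
    """Ordered choice: return the result of the first alternative that succeeds."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)  # backtrack: each alternative starts at `pos`
            if result is not None:
                return result      # the remaining alternatives are skipped
        return None
    return parse


def traced(name, parser, log):
    """Wrap `parser` so that each attempt is appended to `log`."""
    def parse(text, pos):
        result = parser(text, pos)
        log.append((name, pos, result))
        return result
    return parse


log = []
parser = choice(
    traced("a x", seq(lit("a"), lit("x")), log),  # consumes 'a', then fails on 'b'
    traced("a", lit("a"), log),                   # succeeds, so the choice stops here
    traced("ab", lit("ab"), log),                 # would match more, but is never tried
)
end = parser("ab", 0)
print(end)
print(log)
```

The first alternative fails after consuming 'a', the second restarts at position 0 and wins, and "ab" is never attempted even though it would consume more input.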

> So, is it correct that I always have to put the longest option first in these cases?
> Are there other approaches?

Not necessarily: you put first what has the highest priority. Note
that the ordered choice #/ is similar to, but not exactly the same as,
the unordered choice #| in traditional LR/LALR grammars.
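A toy comparison, assuming the alternatives are plain strings: `ordered_choice` commits to the first match like #/, while `unordered_choice` keeps every alternative that matches, which is the set of possibilities a context-free | describes. Both names are hypothetical, and this is a simplification for illustration; an LR/LALR parser does not literally enumerate alternatives this way.

```python
def ordered_choice(*alternatives):
    """PEG-style choice: only the first matching alternative counts."""
    def parse(text, pos):
        for alt in alternatives:
            if text.startswith(alt, pos):
                return {pos + len(alt)}
        return set()
    return parse


def unordered_choice(*alternatives):
    """CFG-style choice: every matching alternative is a possible end position."""
    def parse(text, pos):
        return {pos + len(alt) for alt in alternatives if text.startswith(alt, pos)}
    return parse


def accepts(parser, text):
    """True if some possible end position consumes the whole input."""
    return len(text) in parser(text, 0)


for make in (ordered_choice, unordered_choice):
    print(make.__name__,
          accepts(make("a", "ab"), "ab"),
          accepts(make("ab", "a"), "ab"))
```

Only the ordered version depends on the order of its alternatives: "a" before "ab" rejects the input "ab", exactly like the shortest-first octalEscape.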

> Since I am not proficient enough with PetitParser, I discarded the
> idea of creating a translator from grammars specified in other
> syntaxes (e.g., antlr) to PetitParser. I hope to be able to do it in the
> future :)

Yes, that would be something awesome to have!

Lukas

--
Lukas Renggli
www.lukas-renggli.ch

Re: PetitJava

Alberto Bacchelli
On 27 July 2010 21:17, Lukas Renggli <[hidden email]> wrote:
[...]
>
> #/ is an ordered choice, see the method comment of #/ and the class
> comment of PPChoiceParser.
>
> To summarize: #/ tries the first choice (the receiver): if that works
> it returns the result without trying any other choice; if that doesn't
> work, it backtracks and tries the next choice, and so on ...

I see. That's why it is important to write a good grammar:
otherwise you would backtrack too often.

>> So, is it correct that I always have to put the longest option first in these cases?
>> Are there other approaches?
>
> Not necessarily, you put first what has the highest priority.

You are right. Let me rephrase my question:
is it correct to always put the longest option first when the others
are a subset of it? :)

> Note that the ordered choice #/ is similar, but not exactly the same as the
> unordered choice #| in traditional LR/LALR grammars.

This was probably my problem: I didn't get it.
Now it seems pretty clear. Thank you!


Ciao,
 Alberto

Re: PetitJava

Lukas Renggli
>>> So, is it correct that I always have to put the longest option first in these cases?
>>> Are there other approaches?
>>
>> Not necessarily, you put first what has the highest priority.
>
> You are right. Let me rephrase my question:
> is it correct to always put the longest option first when the others
> are a subset of it? :)

Yes. In your case the subset (or sub-language) relationship is quite
clear, but in general it can be less obvious.

For example, in the Smalltalk grammar

    identifier / 'true' asParser

does not work, because 'true' is also an identifier. Reordering to

   'true' asParser / identifier

alone does not solve the problem, because the identifier 'truehearted'
would be matched as 'true', and 'hearted' would be left for the following
parser (the identifier parser would never be tried). To make it work
you need something like

   ('true' asParser , #word asParser not) / identifier

which means 'true' not directly followed by another identifier character.
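The three variants can be compared in a small Python model. It is a sketch under stated assumptions, not PetitParser code: `keyword`, `identifier`, `word_char`, `not_`, `guarded`, `choice` and `parse_all` are hypothetical helpers, `WORD` (ASCII letters and digits) is only a rough stand-in for `#word asParser`, and the "following parser" is simply the end of input.

```python
import string

LETTERS = set(string.ascii_letters)
WORD = LETTERS | set(string.digits)  # rough stand-in for #word asParser


def keyword(kw):
    """Match the literal `kw` (like 'true' asParser) and tag it as a keyword."""
    def parse(text, pos):
        if text.startswith(kw, pos):
            return pos + len(kw), ("keyword", kw)
        return None
    return parse


def identifier(text, pos):
    """A letter followed by any number of word characters."""
    if pos >= len(text) or text[pos] not in LETTERS:
        return None
    end = pos + 1
    while end < len(text) and text[end] in WORD:
        end += 1
    return end, ("identifier", text[pos:end])


def word_char(text, pos):
    """A single word character."""
    if pos < len(text) and text[pos] in WORD:
        return pos + 1, text[pos]
    return None


def not_(parser):
    """Negative lookahead (like #not): succeed without consuming iff `parser` fails."""
    def parse(text, pos):
        return (pos, None) if parser(text, pos) is None else None
    return parse


def guarded(parser, guard):
    """`parser , guard`, keeping only the token produced by `parser`."""
    def parse(text, pos):
        first = parser(text, pos)
        if first is None:
            return None
        end, token = first
        second = guard(text, end)
        return None if second is None else (second[0], token)
    return parse


def choice(*parsers):
    """Ordered choice (like #/)."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


def parse_all(parser, text):
    """The token if `parser` consumes the whole input (like `end`), else None."""
    result = parser(text, 0)
    if result is None or result[0] != len(text):
        return None
    return result[1]


identifier_first = choice(identifier, keyword("true"))
keyword_first = choice(keyword("true"), identifier)
with_lookahead = choice(guarded(keyword("true"), not_(word_char)), identifier)

for text in ["true", "truehearted"]:
    print(text, parse_all(identifier_first, text),
          parse_all(keyword_first, text), parse_all(with_lookahead, text))
```

In this model, `identifier_first` never produces the keyword, `keyword_first` rejects "truehearted", and only the variant with the negative lookahead handles both inputs.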

#not (like #and) is a predicate that does not consume input but can
provide unlimited lookahead. This makes PetitParser able to parse
more languages than a typical LR/LALR grammar.
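To illustrate that such a predicate can look arbitrarily far ahead without consuming anything, here is a small Python model with hypothetical helpers `lit`, `digits`, `seq`, `and_` and `not_` (not PetitParser code). The made-up `integer` rule accepts a run of digits only if that same run is not followed by a '.', which requires scanning past every digit, however many there are.

```python
DIGITS = "0123456789"


def lit(s):
    """Match the literal string `s`; return the new position or None."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def digits(text, pos):
    """One or more decimal digits (like #digit asParser plus)."""
    end = pos
    while end < len(text) and text[end] in DIGITS:
        end += 1
    return end if end > pos else None


def seq(*parsers):
    """Run parsers one after another; fail if any of them fails."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def and_(parser):
    """Positive lookahead (like #and): succeed iff `parser` would, consuming nothing."""
    def parse(text, pos):
        return pos if parser(text, pos) is not None else None
    return parse


def not_(parser):
    """Negative lookahead (like #not): succeed iff `parser` would fail, consuming nothing."""
    def parse(text, pos):
        return pos if parser(text, pos) is None else None
    return parse


# Succeeds at a digit run followed by '.', but leaves the position unchanged.
decimal_ahead = and_(seq(digits, lit(".")))

# A digit run that is not followed by '.'.
integer = seq(not_(seq(digits, lit("."))), digits)

print(decimal_ahead("123456789.5", 0))  # prints 0: the lookahead consumed nothing
print(integer("12345", 0), integer("12345.6", 0))
```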

Lukas

--
Lukas Renggli
www.lukas-renggli.ch