Hi,
As I wrote in my previous e-mail, I am trying to write the Java grammar for PetitParser. I am following "The Java Language Specification, Third Edition", which is the latest available book written by Sun with the specification of the whole Java language [1]. It covers Java 1.5. The project, PetitJava, is on SqueakSource [2].

I have just "finished" implementing the literals [3] and I am trying them out with some tests. I found something strange, so there is probably something in PetitParser that I don't get.

As an example, among the primitives in PPJavaGrammar you find:

<snips>
PPJavaGrammar>>octalEscape
    ^ $\ asParser ,
        ( octalDigit /
          (octalDigit , octalDigit) /
          (zeroToThree , octalDigit , octalDigit) )

PPJavaGrammar>>octalDigit
    ^ PPPredicateObjectParser anyOf: '01234567'

PPJavaGrammar>>zeroToThree
    ^ PPPredicateObjectParser anyOf: '0123'
</snips>

I take some of the failing tests as an example:

<snips>
PPJavaGrammarTest>>testOctalEscape1
    self parse: '\0' rule: #octalEscape

PPJavaGrammarTest>>testOctalEscape2
    self parse: '\00' rule: #octalEscape

PPJavaGrammarTest>>testOctalEscape3
    self parse: '\000' rule: #octalEscape
</snips>

While the first test passes as I expected, the second and the third do not. However, they should be recognized as:

    $\ , octalDigit , octalDigit

and

    $\ , zeroToThree , octalDigit , octalDigit

as stated in the "octalEscape" implementation.

Is there something wrong with my implementation, or did I not understand the '/' operator correctly?

Thank you,
Alberto

[1] http://java.sun.com/docs/books/jls/third_edition/html/j3TOC.html
[2] http://www.squeaksource.com/PetitJava.html
[3] http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.10

_______________________________________________
Moose-dev mailing list
[hidden email]
https://www.iam.unibe.ch/mailman/listinfo/moose-dev
[...]
> Is there something wrong with my implementation,
> or didn't I understand correctly the '/' operator?

I also tried to change the implementation of octalEscape in the following way (to put the longest match as the first option), but without good results:

<snip>
PPJavaGrammar>>octalEscape
    ^ $\ asParser ,
        ( (zeroToThree , octalDigit , octalDigit) /
          (octalDigit , octalDigit) /
          octalDigit )
</snip>
I tried to reproduce the issue in my workspace:

<snip>
octalDigit := PPPredicateObjectParser anyOf: '01234567'.
octalDigits := octalDigit plus.
zeroToThree := PPPredicateObjectParser anyOf: '0123'.
octalEscape := $\ asParser ,
    ( (zeroToThree , octalDigit , octalDigit) /
      (octalDigit , octalDigit) /
      octalDigit ).

(octalEscape end parse: '\000') isPetitFailure
</snip>

In this case, it seems that the order of the different options

    ( (zeroToThree , octalDigit , octalDigit) / (octalDigit , octalDigit) / octalDigit )

matters. In fact, with the longest option first, it is able to recognize all the possibilities.

So, is it correct that I always have to put the longest option first in such cases? Are there other approaches?

Finally, it does not work in the compiled PPJavaGrammar class. Is there a way to reset all the variables, so that they will be recompiled again?

Thank you,
Alberto

On 27 July 2010 17:08, Alberto Bacchelli <[hidden email]> wrote:
> [...]
Hi Alberto,
The Java grammar is quite large. If you want to import Java code into Moose without using inFuzion, an alternative is to take the output of srcML (which is able to process Java) and run XPath queries over it.

The tricky part of representing Java code in Moose is type resolution. This is not trivial (not difficult, just extremely boring). Even though a type evaluator is easy to implement, it is a bit like implementing a semantic evaluator.

Cheers,
Alexandre

On 27 Jul 2010, at 17:26, Alberto Bacchelli wrote:
> I tried to reproduce the issue on my workspace:
> [...]

--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
Alexandre, thank you for your suggestion.

Actually, I already used srcML last year for another project, as you suggested. Unfortunately, this time I really need such a grammar in PetitParser (long story).

Since I am not proficient enough with PetitParser, I discarded the idea of creating a translator from grammars specified in other syntaxes (e.g., antlr) to PetitParser. I hope to be able to do it in the future :)

Cheers,
Alberto

On 27 July 2010 17:39, Bergel, Alexandre <[hidden email]> wrote:
> Hi Alberto,
> [...]
[...]
> Finally, it does not work in the compiled PPJavaGrammar class.
> Is there a way to reset all the variables, so that they will be
> recompiled again?

This was a silly question. I just noticed that the problem was a singleton that was keeping the instance of PPJavaGrammar alive. Sorry :)
> In this case, it seems that changing the order of the different options
> ( (zeroToThree , octalDigit , octalDigit) / (octalDigit , octalDigit)
> / octalDigit )
> matters. In fact, with the longest option, it is able to recognize all
> the possibility.

#/ is an ordered choice; see the method comment of #/ and the class comment of PPChoiceParser.

To summarize: #/ tries the first choice (the receiver). If that works, it returns the result without trying any other choice; if that doesn't work, it backtracks and tries the next choice, and so on.

> So, is it correct that I have to put always the longest option in this cases?
> Are there other approaches?

Not necessarily; you put first what has the highest priority.

Note that the ordered choice #/ is similar to, but not exactly the same as, the unordered choice #| in traditional LR/LALR grammars.

> Since I am not proficient enough with PetitParser, I discarded the
> idea of creating a translator from grammars specified in other
> syntaxes (e.g., antlr) to PetitParser. I hope to be able to do it in the
> future :)

Yes, that would be something awesome to have!

Lukas

--
Lukas Renggli
www.lukas-renggli.ch
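The ordered-choice behaviour described above can be reproduced with a small model. The sketch below is written in Python and is only an illustration: the helper names (`any_of`, `seq`, `choice`, `matches_all`) are hypothetical and are not PetitParser's API. They merely mimic `anyOf:`, `,`, `/` and `end`. It shows why the shortest-first grammar rejects '\00': the first alternative succeeds after one digit, the choice is committed, and the leftover digit makes the whole parse fail.

```python
# Hypothetical model of PEG combinators, NOT PetitParser's API.
# A parser is a function (text, pos) -> new position, or None on failure.

def any_of(chars):
    """Match one character out of `chars` (mimics PPPredicateObjectParser anyOf:)."""
    def parse(text, pos):
        return pos + 1 if pos < len(text) and text[pos] in chars else None
    return parse

def seq(*parsers):
    """Match all parsers one after another (mimics #,)."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    """Ordered choice (mimics #/): the first alternative that succeeds wins."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result  # committed: later alternatives are never tried
        return None
    return parse

def matches_all(parser, text):
    """True if the parser consumes the whole input (mimics `parser end`)."""
    return parser(text, 0) == len(text)

backslash = any_of("\\")
octal_digit = any_of("01234567")
zero_to_three = any_of("0123")

# Alternatives ordered as in the first version of octalEscape.
shortest_first = seq(backslash, choice(
    octal_digit,
    seq(octal_digit, octal_digit),
    seq(zero_to_three, octal_digit, octal_digit)))

# Alternatives ordered as in the workspace experiment.
longest_first = seq(backslash, choice(
    seq(zero_to_three, octal_digit, octal_digit),
    seq(octal_digit, octal_digit),
    octal_digit))

for text in ["\\0", "\\00", "\\000"]:
    print(text, matches_all(shortest_first, text), matches_all(longest_first, text))
```

Under this model the shortest-first ordering accepts only '\0', while the longest-first ordering accepts all three inputs, which matches the test results reported earlier in the thread.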
On 27 July 2010 21:17, Lukas Renggli <[hidden email]> wrote:
[...]
>
> #/ is an ordered choice, see the method comment of #/ and the class
> comment of PPChoiceParser.
>
> To summarize: #/ tries the first choice (the receiver): if that works
> it returns the result without trying any other choice; if that doesn't
> work, it backtracks and tries the next choice, and so on ...

I see. That's why it is important to write a good grammar: otherwise you would backtrack too often.

>> So, is it correct that I have to put always the longest option in this cases?
>> Are there other approaches?
>
> Not necessarily, you put first what has the highest priority.

You are right. I should rephrase my question as: is it correct to always put the longest option first when the others are a subset of it? :)

> Note that the ordered choice #/ is similar, but not exactly the same as the
> unordered choice #| in traditional LR/LALR grammars.

This was probably my problem: I didn't get it. Now it seems pretty clear.

Thank you!

Ciao,
Alberto
>>> So, is it correct that I have to put always the longest option in this cases?
>>> Are there other approaches?
>>
>> Not necessarily, you put first what has the highest priority.
>
> You are right. I should rephrase my question in:
> Is it correct to put always the longest option when the others
> are a subset of it? :)

Yes.

In your case the subset (or sub-language) relationship is quite clear, but generally it can be less obvious. For example, in the Smalltalk grammar

    identifier / 'true' asParser

does not work, because 'true' is also an identifier. Reordering to

    'true' asParser / identifier

alone does not solve the problem, because for the identifier 'truehearted' the first choice would succeed on 'true', and 'hearted' would be left for the following parser (the identifier parser would never be touched). To make it work you need something like

    ('true' asParser , #word asParser not) / identifier

which means 'true' not directly followed by another identifier character. #not is a predicate (as is #and) that does not consume input but can provide unlimited lookahead. This makes PetitParser able to parse more languages than a typical LR/LALR grammar.

Lukas

--
Lukas Renggli
www.lukas-renggli.ch
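The keyword-versus-identifier example can be traced with the same kind of model. This is again a hedged Python sketch with hypothetical helper names (`literal`, `word_char`, `plus`, `seq`, `not_`, `first_match`), not PetitParser code, and the identifier rule is simplified to "one or more letters or digits". `first_match` plays the role of the ordered choice and reports which alternative won and what it consumed.

```python
# Hypothetical model of PEG combinators, NOT PetitParser's API.
# A parser is a function (text, pos) -> new position, or None on failure.

def literal(s):
    """Match the exact string `s` (mimics 'true' asParser)."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def word_char(text, pos):
    """Match one letter or digit (mimics #word asParser)."""
    return pos + 1 if pos < len(text) and text[pos].isalnum() else None

def plus(p):
    """Match `p` one or more times, greedily."""
    def parse(text, pos):
        pos = p(text, pos)
        if pos is None:
            return None
        while True:
            nxt = p(text, pos)
            if nxt is None:
                return pos
            pos = nxt
    return parse

def seq(*parsers):
    """Match all parsers one after another (mimics #,)."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def not_(p):
    """Negative lookahead (mimics #not): succeeds, consuming nothing, iff `p` fails."""
    def parse(text, pos):
        return pos if p(text, pos) is None else None
    return parse

def first_match(alternatives, text):
    """Ordered choice over (label, parser) pairs: report the first winner."""
    for label, p in alternatives:
        end = p(text, 0)
        if end is not None:
            return label, text[:end]
    return None

identifier = plus(word_char)  # simplified identifier rule (assumption)
true_kw = literal("true")

naive = [("identifier", identifier), ("true", true_kw)]
reordered = [("true", true_kw), ("identifier", identifier)]
guarded = [("true", seq(true_kw, not_(word_char))), ("identifier", identifier)]

print(first_match(naive, "true"))
print(first_match(reordered, "truehearted"))
print(first_match(guarded, "truehearted"))
print(first_match(guarded, "true"))
```

In this model the naive ordering classifies 'true' as an identifier, the reordered one cuts 'truehearted' after 'true', and only the guarded version handles both inputs as intended.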