Pratt Parsers for PetitParser

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Pratt Parsers for PetitParser

camille teruel
Hello Pharoers and Moosers,

I did a Pratt parser extension for PetitParser.
A Pratt parser (a.k.a top-down operator precedence parser) handles left-recursion and operator precedence.
It handles grouping, prefix, postfix, infix (right- or left-associative) and "multifix” operators (e.g. "if ... then ... else ...", "... ? ... : ...", Smalltalk keyword messages).
Normally Pratt Parsing needs a tokenization phase but here tokenization is done on the fly with other PP parsers.
Apart from tokenization, no backtracking is needed so parsing is quite fast (approximatively 2 times faster than PPExpressionParser).

Here is an exemple of a calculator:

    parser := PPPrattParser new.
    "Numbers"
    parser terminal: #digit asParser plus do: [ :token | token inputValue asNumber ]. 
    parser skip: #space asParser plus.
    "Parentheses"
    parser groupLeft: $( asParser right: $) asParser. 
    "Addition, substraction, multiplication, division: all left infix, * and / have higher precedence than + and -"
    parser leftInfix: $+ asParser precedence: 1 do: [ :left :op :right | left + right ].
    parser leftInfix: $- asParser precedence: 1 do: [ :left :op :right | left - right ].
    parser leftInfix: $* asParser precedence: 2 do: [ :left :op :right | left * right ].
    parser leftInfix: $/ asParser precedence: 2 do: [ :left :op :right | left / right ].
    "Power: right infix with higher precedence than multiplication and division"
    parser rightInfix: $^ asParser precedence: 3 do: [ :left :op :right | left raisedTo: right ].
    "Unary minus: prefix with highest precedence"
    parser prefix: $- asParser precedence: 4 do: [ :op :right | right negated ].

    parser parse: '2*3 + 4^(1/2)*3' ----> 12


To try it:

Gofer it
    smalltalkhubUser: 'CamilleTeruel' project: 'PetitPratt';
    package: 'PetitPratt';
    load

Note that it is in beta stage so it might still change drastically.

@PP Devs: 
I had trouble with the PPContext furthestFailure that is taken into account instead of the failures I return, so I had to redefine #parseWithContext: to return the failures I want. The results given by furthestFailure were not very meaningful in my case (the same is true for PPExpressionParser btw). 
But I guess it was introduced because it gives good results in other cases. 
So would it be possible to change this behavior to let the parser decide if it returns the furthestFailure or the original failure?

Cheers,
Camille
cbc
Reply | Threaded
Open this post in threaded view
|

Re: Pratt Parsers for PetitParser

cbc
Inteteresting....

On Wed, Jun 10, 2015 at 9:36 AM, Camille <[hidden email]> wrote:
Hello Pharoers and Moosers,

I did a Pratt parser extension for PetitParser.

<snip> 
@PP Devs: 
I had trouble with the PPContext furthestFailure that is taken into account instead of the failures I return, so I had to redefine #parseWithContext: to return the failures I want. The results given by furthestFailure were not very meaningful in my case (the same is true for PPExpressionParser btw). 
But I guess it was introduced because it gives good results in other cases. 
So would it be possible to change this behavior to let the parser decide if it returns the furthestFailure or the original failure?

The intent behind the furthestFailure is that it give the failure that gets the furthest into the source stream.  It is most useful when there are embedded choice operators in the parser - the original/pre furthest behaviour would return the last failure, which depending on the incoming stream and the order of the choice options could be significantly not useful.

I ran into this when working with the sql parser, which started off with the outer choice of (by memory):
   ^ selectStatement / insertStatement / updateStatement / deleteStatement
If I was trying to part a select statement that had an error at the very end of the statement, the parser would return an error talking about how the incoming stream failed in deleteStatement.  Not useful.

I would be saddened if this further failure was not available.

-cbc
Reply | Threaded
Open this post in threaded view
|

Re: Pratt Parsers for PetitParser

camille teruel

On 10 Jun 2015, at 19:11, Chris Cunningham <[hidden email]> wrote:

Inteteresting....

On Wed, Jun 10, 2015 at 9:36 AM, Camille <[hidden email]> wrote:
Hello Pharoers and Moosers,

I did a Pratt parser extension for PetitParser.

<snip> 
@PP Devs: 
I had trouble with the PPContext furthestFailure that is taken into account instead of the failures I return, so I had to redefine #parseWithContext: to return the failures I want. The results given by furthestFailure were not very meaningful in my case (the same is true for PPExpressionParser btw). 
But I guess it was introduced because it gives good results in other cases. 
So would it be possible to change this behavior to let the parser decide if it returns the furthestFailure or the original failure?

The intent behind the furthestFailure is that it give the failure that gets the furthest into the source stream.  It is most useful when there are embedded choice operators in the parser - the original/pre furthest behaviour would return the last failure, which depending on the incoming stream and the order of the choice options could be significantly not useful.

I ran into this when working with the sql parser, which started off with the outer choice of (by memory):
   ^ selectStatement / insertStatement / updateStatement / deleteStatement
If I was trying to part a select statement that had an error at the very end of the statement, the parser would return an error talking about how the incoming stream failed in deleteStatement.  Not useful.

I would be saddened if this further failure was not available.

Yes in that case returning the furthest failure gives better results.
However, this don’t give meaningful messages in all cases.
For exemple with the calculator I gave in my previous message, if I parse ‘1+’ I want to get ‘expression expected at: 2’ but instead it returns ‘$- expected at 2'.
I’m not proposing to remove this feature but to let parsers decide to use it or not.
Something like (changes in bold): 

PPParser>>parseWithContext: context
| result |
context initializeFor: self.
result := self parseOn: context.
 
"Return the furthest failure, it gives better results than the last failure"
(result isPetitFailure and: [ self wantsFurthestFailure and: [ context furthestFailure notNil ] ]) 
ifTrue: [ ^ context furthestFailure ].
^ result
PPParser>>wantsFurthestFailure
^ true

Like this, one can return the failures he wants.

PPPrattParser>>wantsFurthestFailure
^ false


Camille


-cbc

cbc
Reply | Threaded
Open this post in threaded view
|

Re: Pratt Parsers for PetitParser

cbc
That sounds very reasonable to me.

-cbc

On Wed, Jun 10, 2015 at 10:45 AM, Camille <[hidden email]> wrote:

On 10 Jun 2015, at 19:11, Chris Cunningham <[hidden email]> wrote:

Inteteresting....

On Wed, Jun 10, 2015 at 9:36 AM, Camille <[hidden email]> wrote:
Hello Pharoers and Moosers,

I did a Pratt parser extension for PetitParser.

<snip> 
@PP Devs: 
I had trouble with the PPContext furthestFailure that is taken into account instead of the failures I return, so I had to redefine #parseWithContext: to return the failures I want. The results given by furthestFailure were not very meaningful in my case (the same is true for PPExpressionParser btw). 
But I guess it was introduced because it gives good results in other cases. 
So would it be possible to change this behavior to let the parser decide if it returns the furthestFailure or the original failure?

The intent behind the furthestFailure is that it give the failure that gets the furthest into the source stream.  It is most useful when there are embedded choice operators in the parser - the original/pre furthest behaviour would return the last failure, which depending on the incoming stream and the order of the choice options could be significantly not useful.

I ran into this when working with the sql parser, which started off with the outer choice of (by memory):
   ^ selectStatement / insertStatement / updateStatement / deleteStatement
If I was trying to part a select statement that had an error at the very end of the statement, the parser would return an error talking about how the incoming stream failed in deleteStatement.  Not useful.

I would be saddened if this further failure was not available.

Yes in that case returning the furthest failure gives better results.
However, this don’t give meaningful messages in all cases.
For exemple with the calculator I gave in my previous message, if I parse ‘1+’ I want to get ‘expression expected at: 2’ but instead it returns ‘$- expected at 2'.
I’m not proposing to remove this feature but to let parsers decide to use it or not.
Something like (changes in bold): 

PPParser>>parseWithContext: context
| result |
context initializeFor: self.
result := self parseOn: context.
 
"Return the furthest failure, it gives better results than the last failure"
(result isPetitFailure and: [ self wantsFurthestFailure and: [ context furthestFailure notNil ] ]) 
ifTrue: [ ^ context furthestFailure ].
^ result
PPParser>>wantsFurthestFailure
^ true

Like this, one can return the failures he wants.

PPPrattParser>>wantsFurthestFailure
^ false


Camille


-cbc


Reply | Threaded
Open this post in threaded view
|

Re: Pratt Parsers for PetitParser

Richard Sargent
Administrator
In reply to this post by camille teruel
camille teruel wrote
> On 10 Jun 2015, at 19:11, Chris Cunningham <[hidden email]> wrote:
>
> Inteteresting....
>
> On Wed, Jun 10, 2015 at 9:36 AM, Camille <[hidden email] <mailto:[hidden email]>> wrote:
> Hello Pharoers and Moosers,
>
> I did a Pratt parser extension for PetitParser.
>
> <snip> 
> @PP Devs:
> I had trouble with the PPContext furthestFailure that is taken into account instead of the failures I return, so I had to redefine #parseWithContext: to return the failures I want. The results given by furthestFailure were not very meaningful in my case (the same is true for PPExpressionParser btw).
> But I guess it was introduced because it gives good results in other cases.
> So would it be possible to change this behavior to let the parser decide if it returns the furthestFailure or the original failure?
>
> The intent behind the furthestFailure is that it give the failure that gets the furthest into the source stream.  It is most useful when there are embedded choice operators in the parser - the original/pre furthest behaviour would return the last failure, which depending on the incoming stream and the order of the choice options could be significantly not useful.
>
> I ran into this when working with the sql parser, which started off with the outer choice of (by memory):
>    ^ selectStatement / insertStatement / updateStatement / deleteStatement
> If I was trying to part a select statement that had an error at the very end of the statement, the parser would return an error talking about how the incoming stream failed in deleteStatement.  Not useful.
>
> I would be saddened if this further failure was not available.

Yes in that case returning the furthest failure gives better results.
However, this don’t give meaningful messages in all cases.
For exemple with the calculator I gave in my previous message, if I parse ‘1+’ I want to get ‘expression expected at: 2’ but instead it returns ‘$- expected at 2'.
I’m not proposing to remove this feature but to let parsers decide to use it or not.
Something like (changes in bold):

PPParser>>parseWithContext: context
        | result |
        context initializeFor: self.
        result := self parseOn: context.
   
        "Return the furthest failure, it gives better results than the last failure"
        (result isPetitFailure and: [ self wantsFurthestFailure and: [ context furthestFailure notNil ] ])
                ifTrue: [ ^ context furthestFailure ].
        ^ result
This screams at me. Why not just delegate to the context and use a context that returns the preferred failure? e.g. end with:
^context preferredResultFor: result.

       
PPParser>>wantsFurthestFailure
        ^ true

Like this, one can return the failures he wants.

PPPrattParser>>wantsFurthestFailure
        ^ false


Camille

>
> -cbc
Reply | Threaded
Open this post in threaded view
|

Re: Pratt Parsers for PetitParser

kurs.jan

That sounds really cool and useful extension.

Regarding the furthest failure, the core of the problem is the distinction between an error and a failure. Error reports on a problem in the input, while failure is information for choice parser combinator. In general, the furthest failure is a better approximation of an error than the last failure, so we use it.

I am not sure what exactly is the problem in case of PrattParser. I guess the last failure gives better results for a user? One has to consider a pratt parser included in the normal parser, e. g. Expressions parsed by pratt in a Java Grammar. Depending where an error occurs, different strategy for choosing the proper failure is necessary :-/

Regarding tokenization, there is a message token, that returns PPTokenParser, which transforms a parsed input into the PPToken object. Perhaps this might be helpful?

Cheers Jan


On Wed, Jun 10, 2015, 20:52 Richard Sargent <[hidden email]> wrote:
camille teruel wrote
>> On 10 Jun 2015, at 19:11, Chris Cunningham &lt;

> cunningham.cb@

> &gt; wrote:
>>
>> Inteteresting....
>>
>> On Wed, Jun 10, 2015 at 9:36 AM, Camille &lt;

> camille.teruel@

>  &lt;mailto:

> camille.teruel@

> &gt;> wrote:
>> Hello Pharoers and Moosers,
>>
>> I did a Pratt parser extension for PetitParser.
>>
>>
> <snip>
>
>> @PP Devs:
>> I had trouble with the PPContext furthestFailure that is taken into
>> account instead of the failures I return, so I had to redefine
>> #parseWithContext: to return the failures I want. The results given by
>> furthestFailure were not very meaningful in my case (the same is true for
>> PPExpressionParser btw).
>> But I guess it was introduced because it gives good results in other
>> cases.
>> So would it be possible to change this behavior to let the parser decide
>> if it returns the furthestFailure or the original failure?
>>
>> The intent behind the furthestFailure is that it give the failure that
>> gets the furthest into the source stream.  It is most useful when there
>> are embedded choice operators in the parser - the original/pre furthest
>> behaviour would return the last failure, which depending on the incoming
>> stream and the order of the choice options could be significantly not
>> useful.
>>
>> I ran into this when working with the sql parser, which started off with
>> the outer choice of (by memory):
>>    ^ selectStatement / insertStatement / updateStatement /
>> deleteStatement
>> If I was trying to part a select statement that had an error at the very
>> end of the statement, the parser would return an error talking about how
>> the incoming stream failed in deleteStatement.  Not useful.
>>
>> I would be saddened if this further failure was not available.
>
> Yes in that case returning the furthest failure gives better results.
> However, this don’t give meaningful messages in all cases.
> For exemple with the calculator I gave in my previous message, if I parse
> ‘1+’ I want to get ‘expression expected at: 2’ but instead it returns ‘$-
> expected at 2'.
> I’m not proposing to remove this feature but to let parsers decide to use
> it or not.
> Something like (changes in bold):
>
> PPParser>>parseWithContext: context
>       | result |
>       context initializeFor: self.
>       result := self parseOn: context.
>
>       "Return the furthest failure, it gives better results than the last
> failure"
>       (result isPetitFailure and: [ self wantsFurthestFailure and: [ context
> furthestFailure notNil ] ])
>               ifTrue: [ ^ context furthestFailure ].
>       ^ result

This screams at me. Why not just delegate to the context and use a context
that returns the preferred failure? e.g. end with:
^context preferredResultFor: result.


>
> PPParser>>wantsFurthestFailure
>       ^ true
>
> Like this, one can return the failures he wants.
>
> PPPrattParser>>wantsFurthestFailure
>       ^ false
>
>
> Camille
>
>>
>> -cbc





--
View this message in context: http://forum.world.st/Pratt-Parsers-for-PetitParser-tp4831456p4831486.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.

Reply | Threaded
Open this post in threaded view
|

Re: Pratt Parsers for PetitParser

camille teruel
In reply to this post by Richard Sargent

On 10 Jun 2015, at 20:31, Richard Sargent <[hidden email]> wrote:

camille teruel wrote
On 10 Jun 2015, at 19:11, Chris Cunningham &lt;

cunningham.cb@

&gt; wrote:

Inteteresting....

On Wed, Jun 10, 2015 at 9:36 AM, Camille &lt;

camille.teruel@

&lt;mailto:

camille.teruel@

&gt;> wrote:
Hello Pharoers and Moosers,

I did a Pratt parser extension for PetitParser.


<snip>

@PP Devs: 
I had trouble with the PPContext furthestFailure that is taken into
account instead of the failures I return, so I had to redefine
#parseWithContext: to return the failures I want. The results given by
furthestFailure were not very meaningful in my case (the same is true for
PPExpressionParser btw). 
But I guess it was introduced because it gives good results in other
cases. 
So would it be possible to change this behavior to let the parser decide
if it returns the furthestFailure or the original failure?

The intent behind the furthestFailure is that it give the failure that
gets the furthest into the source stream.  It is most useful when there
are embedded choice operators in the parser - the original/pre furthest
behaviour would return the last failure, which depending on the incoming
stream and the order of the choice options could be significantly not
useful.

I ran into this when working with the sql parser, which started off with
the outer choice of (by memory):
  ^ selectStatement / insertStatement / updateStatement /
deleteStatement
If I was trying to part a select statement that had an error at the very
end of the statement, the parser would return an error talking about how
the incoming stream failed in deleteStatement.  Not useful.

I would be saddened if this further failure was not available.

Yes in that case returning the furthest failure gives better results.
However, this don’t give meaningful messages in all cases.
For exemple with the calculator I gave in my previous message, if I parse
‘1+’ I want to get ‘expression expected at: 2’ but instead it returns ‘$-
expected at 2'.
I’m not proposing to remove this feature but to let parsers decide to use
it or not.
Something like (changes in bold): 

PPParser>>parseWithContext: context
| result |
context initializeFor: self.
result := self parseOn: context.

"Return the furthest failure, it gives better results than the last
failure"
(result isPetitFailure and: [ self wantsFurthestFailure and: [ context
furthestFailure notNil ] ]) 
ifTrue: [ ^ context furthestFailure ].
^ result

This screams at me. Why not just delegate to the context and use a context
that returns the preferred failure? e.g. end with:
^context preferredResultFor: result.

Because the same context is passed around by many parsers. 
So if you let the context decide, you get the same behavior for all the parsers.





PPParser>>wantsFurthestFailure
^ true

Like this, one can return the failures he wants.

PPPrattParser>>wantsFurthestFailure
^ false


Camille


-cbc





--
View this message in context: http://forum.world.st/Pratt-Parsers-for-PetitParser-tp4831456p4831486.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.

Reply | Threaded
Open this post in threaded view
|

Re: Pratt Parsers for PetitParser

camille teruel
In reply to this post by kurs.jan

On 11 Jun 2015, at 09:30, Jan Kurš <[hidden email]> wrote:

That sounds really cool and useful extension.

Thank you Jan!

Regarding the furthest failure, the core of the problem is the distinction between an error and a failure. Error reports on a problem in the input, while failure is information for choice parser combinator. In general, the furthest failure is a better approximation of an error than the last failure, so we use it.

I am not sure what exactly is the problem in case of PrattParser. I guess the last failure gives better results for a user?

Yes, with furthest failure I get errors from tokenization instead of my errors. 
For exemple with the calculator grammar I gave in my first mail when I parse ‘1+’ the furthestFailure is ‘$- expected at 2’ whereas I return ‘expression expected at 2’ because there's a whole expression missing. Same thing with ‘(1+2’ that returns ‘digit expected at 4’ instead of ‘$) expected at 4’.

But furthest failure gives wrong messages in other cases to. 
Consider this sequence parser:

keyword := #letter asParser plus , $: asParser.
keyword parse: ‘foo'

This returns 'letter expected at 3’, but no matter how many letters I add to the end I’ll still get ‘letter expected’. 
I want to get what is really missing: '$: expected at 3’.
Maybe returning the “latest furthest failure” instead of the “first furthest failure” could solves the problem here (i.e. replacing > with >= in PPContext>>#noteFailure:)?

One has to consider a pratt parser included in the normal parser, e. g. Expressions parsed by pratt in a Java Grammar. Depending where an error occurs, different strategy for choosing the proper failure is necessary :-/

Indeed, my hack (redefining #parseWithContext:) works only when the Pratt parser is the top parser, but a soon as I compose it I’m screwed because only parseOn: is sent to the Pratt parser.
That’s why I wonder if letting the parser decide what to return wouldn’t solve the problem: by default the furthest failure but special parsers can still decide.

Regarding tokenization, there is a message token, that returns PPTokenParser, which transforms a parsed input into the PPToken object. Perhaps this might be helpful?

The Pratt tokens are special: a token points back to the parser that generated it (its “kind”).
PPTokenKind subclasses PPFlatteningParser and generates instances of PPPrattToken.
A PPTokenKind stores the precedence, the action to be executed when a token of this kind is met at the start of an expression (for terminals and prefixes) and the action to be executed when a token is met in the middle of an expression (for postfixes and infixes).

Cheers,
Camille

Cheers Jan


On Wed, Jun 10, 2015, 20:52 Richard Sargent <[hidden email]> wrote:
camille teruel wrote
>> On 10 Jun 2015, at 19:11, Chris Cunningham &lt;

> cunningham.cb@

> &gt; wrote:
>>
>> Inteteresting....
>>
>> On Wed, Jun 10, 2015 at 9:36 AM, Camille &lt;

> camille.teruel@

>  &lt;mailto:

> camille.teruel@

> &gt;> wrote:
>> Hello Pharoers and Moosers,
>>
>> I did a Pratt parser extension for PetitParser.
>>
>>
> <snip>
>
>> @PP Devs:
>> I had trouble with the PPContext furthestFailure that is taken into
>> account instead of the failures I return, so I had to redefine
>> #parseWithContext: to return the failures I want. The results given by
>> furthestFailure were not very meaningful in my case (the same is true for
>> PPExpressionParser btw).
>> But I guess it was introduced because it gives good results in other
>> cases.
>> So would it be possible to change this behavior to let the parser decide
>> if it returns the furthestFailure or the original failure?
>>
>> The intent behind the furthestFailure is that it give the failure that
>> gets the furthest into the source stream.  It is most useful when there
>> are embedded choice operators in the parser - the original/pre furthest
>> behaviour would return the last failure, which depending on the incoming
>> stream and the order of the choice options could be significantly not
>> useful.
>>
>> I ran into this when working with the sql parser, which started off with
>> the outer choice of (by memory):
>>    ^ selectStatement / insertStatement / updateStatement /
>> deleteStatement
>> If I was trying to part a select statement that had an error at the very
>> end of the statement, the parser would return an error talking about how
>> the incoming stream failed in deleteStatement.  Not useful.
>>
>> I would be saddened if this further failure was not available.
>
> Yes in that case returning the furthest failure gives better results.
> However, this don’t give meaningful messages in all cases.
> For exemple with the calculator I gave in my previous message, if I parse
> ‘1+’ I want to get ‘expression expected at: 2’ but instead it returns ‘$-
> expected at 2'.
> I’m not proposing to remove this feature but to let parsers decide to use
> it or not.
> Something like (changes in bold):
>
> PPParser>>parseWithContext: context
>       | result |
>       context initializeFor: self.
>       result := self parseOn: context.
>
>       "Return the furthest failure, it gives better results than the last
> failure"
>       (result isPetitFailure and: [ self wantsFurthestFailure and: [ context
> furthestFailure notNil ] ])
>               ifTrue: [ ^ context furthestFailure ].
>       ^ result

This screams at me. Why not just delegate to the context and use a context
that returns the preferred failure? e.g. end with:
^context preferredResultFor: result.


>
> PPParser>>wantsFurthestFailure
>       ^ true
>
> Like this, one can return the failures he wants.
>
> PPPrattParser>>wantsFurthestFailure
>       ^ false
>
>
> Camille
>
>>
>> -cbc





--
View this message in context: http://forum.world.st/Pratt-Parsers-for-PetitParser-tp4831456p4831486.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.


Reply | Threaded
Open this post in threaded view
|

Re: Pratt Parsers for PetitParser

jgfoster
> On Jun 11, 2015, at 2:47 AM, Camille <[hidden email]> wrote:
>
> But furthest failure gives wrong messages in other cases to.
> Consider this sequence parser:
>
> keyword := #letter asParser plus , $: asParser.
> keyword parse: ‘foo'
>
> This returns 'letter expected at 3’, but no matter how many letters I add to the end I’ll still get ‘letter expected’.
> I want to get what is really missing: '$: expected at 3’.

Indeed. I recently submitted an update to the parser and tests (based on code from Dale Henrichs) for a problem in which trim would cause the error to always be “separator expected” when there was something after the separator that was expected. Better error reporting would certainly be helpful. The ‘letter expected’ explanation is quite wrong.

James Foster
Reply | Threaded
Open this post in threaded view
|

Re: Pratt Parsers for PetitParser

Francisco Garau-2
In reply to this post by camille teruel

Hi Camile

On 11 Jun 2015, at 10:47, Camille <[hidden email]> wrote:


On 11 Jun 2015, at 09:30, Jan Kurš <[hidden email]> wrote:

That sounds really cool and useful extension.

Thank you Jan!

Regarding the furthest failure, the core of the problem is the distinction between an error and a failure. Error reports on a problem in the input, while failure is information for choice parser combinator. In general, the furthest failure is a better approximation of an error than the last failure, so we use it.

I am not sure what exactly is the problem in case of PrattParser. I guess the last failure gives better results for a user?

Yes, with furthest failure I get errors from tokenization instead of my errors. 
For exemple with the calculator grammar I gave in my first mail when I parse ‘1+’ the furthestFailure is ‘$- expected at 2’ whereas I return ‘expression expected at 2’ because there's a whole expression missing. Same thing with ‘(1+2’ that returns ‘digit expected at 4’ instead of ‘$) expected at 4’.

But furthest failure gives wrong messages in other cases to. 
Consider this sequence parser:

keyword := #letter asParser plus , $: asParser.
keyword parse: ‘foo'

This returns 'letter expected at 3’, but no matter how many letters I add to the end I’ll still get ‘letter expected’. 
I want to get what is really missing: '$: expected at 3’.

Any of those error messages wouldn't be too bad if the failing production rule were also mentioned. Something like "keyword rule failed with xx expected at 3"

But yeah, that would require annotating the production rules which can pollute the clarity of the grammar definition...


Maybe returning the “latest furthest failure” instead of the “first furthest failure” could solves the problem here (i.e. replacing > with >= in PPContext>>#noteFailure:)?

One has to consider a pratt parser included in the normal parser, e. g. Expressions parsed by pratt in a Java Grammar. Depending where an error occurs, different strategy for choosing the proper failure is necessary :-/

Indeed, my hack (redefining #parseWithContext:) works only when the Pratt parser is the top parser, but a soon as I compose it I’m screwed because only parseOn: is sent to the Pratt parser.
That’s why I wonder if letting the parser decide what to return wouldn’t solve the problem: by default the furthest failure but special parsers can still decide.

Regarding tokenization, there is a message token, that returns PPTokenParser, which transforms a parsed input into the PPToken object. Perhaps this might be helpful?

The Pratt tokens are special: a token points back to the parser that generated it (its “kind”).
PPTokenKind subclasses PPFlatteningParser and generates instances of PPPrattToken.
A PPTokenKind stores the precedence, the action to be executed when a token of this kind is met at the start of an expression (for terminals and prefixes) and the action to be executed when a token is met in the middle of an expression (for postfixes and infixes).

Cheers,
Camille

Cheers Jan


On Wed, Jun 10, 2015, 20:52 Richard Sargent <[hidden email]> wrote:
camille teruel wrote
>> On 10 Jun 2015, at 19:11, Chris Cunningham &lt;

> cunningham.cb@

> &gt; wrote:
>>
>> Inteteresting....
>>
>> On Wed, Jun 10, 2015 at 9:36 AM, Camille &lt;

> camille.teruel@

>  &lt;mailto:

> camille.teruel@

> &gt;> wrote:
>> Hello Pharoers and Moosers,
>>
>> I did a Pratt parser extension for PetitParser.
>>
>>
> <snip>
>
>> @PP Devs:
>> I had trouble with the PPContext furthestFailure that is taken into
>> account instead of the failures I return, so I had to redefine
>> #parseWithContext: to return the failures I want. The results given by
>> furthestFailure were not very meaningful in my case (the same is true for
>> PPExpressionParser btw).
>> But I guess it was introduced because it gives good results in other
>> cases.
>> So would it be possible to change this behavior to let the parser decide
>> if it returns the furthestFailure or the original failure?
>>
>> The intent behind the furthestFailure is that it give the failure that
>> gets the furthest into the source stream.  It is most useful when there
>> are embedded choice operators in the parser - the original/pre furthest
>> behaviour would return the last failure, which depending on the incoming
>> stream and the order of the choice options could be significantly not
>> useful.
>>
>> I ran into this when working with the sql parser, which started off with
>> the outer choice of (by memory):
>>    ^ selectStatement / insertStatement / updateStatement /
>> deleteStatement
>> If I was trying to part a select statement that had an error at the very
>> end of the statement, the parser would return an error talking about how
>> the incoming stream failed in deleteStatement.  Not useful.
>>
>> I would be saddened if this further failure was not available.
>
> Yes in that case returning the furthest failure gives better results.
> However, this don’t give meaningful messages in all cases.
> For exemple with the calculator I gave in my previous message, if I parse
> ‘1+’ I want to get ‘expression expected at: 2’ but instead it returns ‘$-
> expected at 2'.
> I’m not proposing to remove this feature but to let parsers decide to use
> it or not.
> Something like (changes in bold):
>
> PPParser>>parseWithContext: context
>       | result |
>       context initializeFor: self.
>       result := self parseOn: context.
>
>       "Return the furthest failure, it gives better results than the last
> failure"
>       (result isPetitFailure and: [ self wantsFurthestFailure and: [ context
> furthestFailure notNil ] ])
>               ifTrue: [ ^ context furthestFailure ].
>       ^ result

This screams at me. Why not just delegate to the context and use a context
that returns the preferred failure? e.g. end with:
^context preferredResultFor: result.


>
> PPParser>>wantsFurthestFailure
>       ^ true
>
> Like this, one can return the failures he wants.
>
> PPPrattParser>>wantsFurthestFailure
>       ^ false
>
>
> Camille
>
>>
>> -cbc





--
View this message in context: http://forum.world.st/Pratt-Parsers-for-PetitParser-tp4831456p4831486.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.


Reply | Threaded
Open this post in threaded view
|

Re: Pratt Parsers for PetitParser

kurs.jan
Hi,

Did you tried to inspect the PPFailure in the Moose? There is a tab with a tree structure giving you a pretty good overview, what is going wrong... Or as an alternative, one can call:
myParser enableDebug parse:myInput

Otherwise, PetitParser really needs some nice error reporting system. I would have integrated one, but I am not aware of any suitable solution :(

Cheers,
Jan

On Fri, Jun 12, 2015 at 8:07 AM Francisco Garau <[hidden email]> wrote:

Hi Camile

On 11 Jun 2015, at 10:47, Camille <[hidden email]> wrote:


On 11 Jun 2015, at 09:30, Jan Kurš <[hidden email]> wrote:

That sounds really cool and useful extension.

Thank you Jan!

Regarding the furthest failure, the core of the problem is the distinction between an error and a failure. Error reports on a problem in the input, while failure is information for choice parser combinator. In general, the furthest failure is a better approximation of an error than the last failure, so we use it.

I am not sure what exactly is the problem in case of PrattParser. I guess the last failure gives better results for a user?

Yes, with furthest failure I get errors from tokenization instead of my errors. 
For exemple with the calculator grammar I gave in my first mail when I parse ‘1+’ the furthestFailure is ‘$- expected at 2’ whereas I return ‘expression expected at 2’ because there's a whole expression missing. Same thing with ‘(1+2’ that returns ‘digit expected at 4’ instead of ‘$) expected at 4’.

But furthest failure gives wrong messages in other cases to. 
Consider this sequence parser:

keyword := #letter asParser plus , $: asParser.
keyword parse: ‘foo'

This returns 'letter expected at 3’, but no matter how many letters I add to the end I’ll still get ‘letter expected’. 
I want to get what is really missing: '$: expected at 3’.

Any of those error messages wouldn't be too bad if the failing production rule were also mentioned. Something like "keyword rule failed with xx expected at 3"

But yeah, that would require annotating the production rules which can pollute the clarity of the grammar definition...


Maybe returning the “latest furthest failure” instead of the “first furthest failure” could solves the problem here (i.e. replacing > with >= in PPContext>>#noteFailure:)?

One has to consider a pratt parser included in the normal parser, e. g. Expressions parsed by pratt in a Java Grammar. Depending where an error occurs, different strategy for choosing the proper failure is necessary :-/

Indeed, my hack (redefining #parseWithContext:) works only when the Pratt parser is the top parser, but a soon as I compose it I’m screwed because only parseOn: is sent to the Pratt parser.
That’s why I wonder if letting the parser decide what to return wouldn’t solve the problem: by default the furthest failure but special parsers can still decide.

Regarding tokenization, there is a message token, that returns PPTokenParser, which transforms a parsed input into the PPToken object. Perhaps this might be helpful?

The Pratt tokens are special: a token points back to the parser that generated it (its “kind”).
PPTokenKind subclasses PPFlatteningParser and generates instances of PPPrattToken.
A PPTokenKind stores the precedence, the action to be executed when a token of this kind is met at the start of an expression (for terminals and prefixes) and the action to be executed when a token is met in the middle of an expression (for postfixes and infixes).

Cheers,
Camille

Cheers Jan


On Wed, Jun 10, 2015, 20:52 Richard Sargent <[hidden email]> wrote:
camille teruel wrote
>> On 10 Jun 2015, at 19:11, Chris Cunningham &lt;

> cunningham.cb@

> &gt; wrote:
>>
>> Inteteresting....
>>
>> On Wed, Jun 10, 2015 at 9:36 AM, Camille &lt;

> camille.teruel@

>  &lt;mailto:

> camille.teruel@

> &gt;> wrote:
>> Hello Pharoers and Moosers,
>>
>> I did a Pratt parser extension for PetitParser.
>>
>>
> <snip>
>
>> @PP Devs:
>> I had trouble with the PPContext furthestFailure that is taken into
>> account instead of the failures I return, so I had to redefine
>> #parseWithContext: to return the failures I want. The results given by
>> furthestFailure were not very meaningful in my case (the same is true for
>> PPExpressionParser btw).
>> But I guess it was introduced because it gives good results in other
>> cases.
>> So would it be possible to change this behavior to let the parser decide
>> if it returns the furthestFailure or the original failure?
>>
>> The intent behind the furthestFailure is that it give the failure that
>> gets the furthest into the source stream.  It is most useful when there
>> are embedded choice operators in the parser - the original/pre furthest
>> behaviour would return the last failure, which depending on the incoming
>> stream and the order of the choice options could be significantly not
>> useful.
>>
>> I ran into this when working with the sql parser, which started off with
>> the outer choice of (by memory):
>>    ^ selectStatement / insertStatement / updateStatement /
>> deleteStatement
>> If I was trying to part a select statement that had an error at the very
>> end of the statement, the parser would return an error talking about how
>> the incoming stream failed in deleteStatement.  Not useful.
>>
>> I would be saddened if this further failure was not available.
>
> Yes in that case returning the furthest failure gives better results.
> However, this don’t give meaningful messages in all cases.
> For exemple with the calculator I gave in my previous message, if I parse
> ‘1+’ I want to get ‘expression expected at: 2’ but instead it returns ‘$-
> expected at 2'.
> I’m not proposing to remove this feature but to let parsers decide to use
> it or not.
> Something like (changes in bold):
>
> PPParser>>parseWithContext: context
>       | result |
>       context initializeFor: self.
>       result := self parseOn: context.
>
>       "Return the furthest failure, it gives better results than the last
> failure"
>       (result isPetitFailure and: [ self wantsFurthestFailure and: [ context
> furthestFailure notNil ] ])
>               ifTrue: [ ^ context furthestFailure ].
>       ^ result

This screams at me. Why not just delegate to the context and use a context
that returns the preferred failure? e.g. end with:
^context preferredResultFor: result.


>
> PPParser>>wantsFurthestFailure
>       ^ true
>
> Like this, one can return the failures he wants.
>
> PPPrattParser>>wantsFurthestFailure
>       ^ false
>
>
> Camille
>
>>
>> -cbc





--
View this message in context: http://forum.world.st/Pratt-Parsers-for-PetitParser-tp4831456p4831486.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.


Reply | Threaded
Open this post in threaded view
|

Re: Pratt Parsers for PetitParser

camille teruel

On 12 Jun 2015, at 09:36, Jan Kurš <[hidden email]> wrote:

Hi,

Did you tried to inspect the PPFailure in the Moose? There is a tab with a tree structure giving you a pretty good overview, what is going wrong... Or as an alternative, one can call:
myParser enableDebug parse:myInput


I didn’t know about these features, there are cool :) Thanks!

Otherwise, PetitParser really needs some nice error reporting system. I would have integrated one, but I am not aware of any suitable solution :(

I would be happy to discuss any idea you’ll have on that subject :)


Cheers,
Jan

On Fri, Jun 12, 2015 at 8:07 AM Francisco Garau <[hidden email]> wrote:

Hi Camile

On 11 Jun 2015, at 10:47, Camille <[hidden email]> wrote:


On 11 Jun 2015, at 09:30, Jan Kurš <[hidden email]> wrote:

That sounds really cool and useful extension.

Thank you Jan!

Regarding the furthest failure, the core of the problem is the distinction between an error and a failure. Error reports on a problem in the input, while failure is information for choice parser combinator. In general, the furthest failure is a better approximation of an error than the last failure, so we use it.

I am not sure what exactly is the problem in case of PrattParser. I guess the last failure gives better results for a user?

Yes, with furthest failure I get errors from tokenization instead of my errors. 
For exemple with the calculator grammar I gave in my first mail when I parse ‘1+’ the furthestFailure is ‘$- expected at 2’ whereas I return ‘expression expected at 2’ because there's a whole expression missing. Same thing with ‘(1+2’ that returns ‘digit expected at 4’ instead of ‘$) expected at 4’.

But furthest failure gives wrong messages in other cases to. 
Consider this sequence parser:

keyword := #letter asParser plus , $: asParser.
keyword parse: ‘foo'

This returns 'letter expected at 3’, but no matter how many letters I add to the end I’ll still get ‘letter expected’. 
I want to get what is really missing: '$: expected at 3’.

Any of those error messages wouldn't be too bad if the failing production rule were also mentioned. Something like "keyword rule failed with xx expected at 3"

But yeah, that would require annotating the production rules which can pollute the clarity of the grammar definition...


Maybe returning the “latest furthest failure” instead of the “first furthest failure” could solves the problem here (i.e. replacing > with >= in PPContext>>#noteFailure:)?

One has to consider a pratt parser included in the normal parser, e. g. Expressions parsed by pratt in a Java Grammar. Depending where an error occurs, different strategy for choosing the proper failure is necessary :-/

Indeed, my hack (redefining #parseWithContext:) works only when the Pratt parser is the top parser, but a soon as I compose it I’m screwed because only parseOn: is sent to the Pratt parser.
That’s why I wonder if letting the parser decide what to return wouldn’t solve the problem: by default the furthest failure but special parsers can still decide.

Regarding tokenization, there is a message token, that returns PPTokenParser, which transforms a parsed input into the PPToken object. Perhaps this might be helpful?

The Pratt tokens are special: a token points back to the parser that generated it (its “kind”).
PPTokenKind subclasses PPFlatteningParser and generates instances of PPPrattToken.
A PPTokenKind stores the precedence, the action to be executed when a token of this kind is met at the start of an expression (for terminals and prefixes) and the action to be executed when a token is met in the middle of an expression (for postfixes and infixes).

Cheers,
Camille

Cheers Jan


On Wed, Jun 10, 2015, 20:52 Richard Sargent <[hidden email]> wrote:
camille teruel wrote
>> On 10 Jun 2015, at 19:11, Chris Cunningham &lt;

> cunningham.cb@

> &gt; wrote:
>>
>> Inteteresting....
>>
>> On Wed, Jun 10, 2015 at 9:36 AM, Camille &lt;

> camille.teruel@

>  &lt;mailto:

> camille.teruel@

> &gt;> wrote:
>> Hello Pharoers and Moosers,
>>
>> I did a Pratt parser extension for PetitParser.
>>
>>
> <snip>
>
>> @PP Devs:
>> I had trouble with the PPContext furthestFailure that is taken into
>> account instead of the failures I return, so I had to redefine
>> #parseWithContext: to return the failures I want. The results given by
>> furthestFailure were not very meaningful in my case (the same is true for
>> PPExpressionParser btw).
>> But I guess it was introduced because it gives good results in other
>> cases.
>> So would it be possible to change this behavior to let the parser decide
>> if it returns the furthestFailure or the original failure?
>>
>> The intent behind the furthestFailure is that it give the failure that
>> gets the furthest into the source stream.  It is most useful when there
>> are embedded choice operators in the parser - the original/pre furthest
>> behaviour would return the last failure, which depending on the incoming
>> stream and the order of the choice options could be significantly not
>> useful.
>>
>> I ran into this when working with the sql parser, which started off with
>> the outer choice of (by memory):
>>    ^ selectStatement / insertStatement / updateStatement /
>> deleteStatement
>> If I was trying to part a select statement that had an error at the very
>> end of the statement, the parser would return an error talking about how
>> the incoming stream failed in deleteStatement.  Not useful.
>>
>> I would be saddened if this further failure was not available.
>
> Yes in that case returning the furthest failure gives better results.
> However, this don’t give meaningful messages in all cases.
> For exemple with the calculator I gave in my previous message, if I parse
> ‘1+’ I want to get ‘expression expected at: 2’ but instead it returns ‘$-
> expected at 2'.
> I’m not proposing to remove this feature but to let parsers decide to use
> it or not.
> Something like (changes in bold):
>
> PPParser>>parseWithContext: context
>       | result |
>       context initializeFor: self.
>       result := self parseOn: context.
>
>       "Return the furthest failure, it gives better results than the last
> failure"
>       (result isPetitFailure and: [ self wantsFurthestFailure and: [ context
> furthestFailure notNil ] ])
>               ifTrue: [ ^ context furthestFailure ].
>       ^ result

This screams at me. Why not just delegate to the context and use a context
that returns the preferred failure? e.g. end with:
^context preferredResultFor: result.


>
> PPParser>>wantsFurthestFailure
>       ^ true
>
> Like this, one can return the failures he wants.
>
> PPPrattParser>>wantsFurthestFailure
>       ^ false
>
>
> Camille
>
>>
>> -cbc





--
View this message in context: http://forum.world.st/Pratt-Parsers-for-PetitParser-tp4831456p4831486.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.