Re: [vwnc] Parsing in Smalltalk


Ben Coman
On .10.2018, at 20:14, Steffen Märcker <[hidden email]> wrote:

> Dear all,
>
> I have two questions regarding parsing frameworks.
>
> 1) Do you have any insights on the performance of SmaCC vs. Xtreams
> Parsing vs. PetitParser?
> 2) Has anybody started to port PetitParser 2 from Pharo to VW? Is it  
> worth the effort?
>
> Sorry for cross-posting, I thought this might interest both communities.
>
> Cheers, Steffen

On Fri, 5 Oct 2018 at 04:47, Steffen Märcker <[hidden email]> wrote:
I gave Xtreams-Parsing and PetitParser a shot and would like to share my 
findings.[*]

The task was to parse the modelling language of the probabilistic model 
checker PRISM. I've written a grammar of about 130 definitions in the 
Xtreams DSL, which is close to Bryan Ford's syntax. To avoid doing it all 
again with PetitParser, I wrote a PetitParserGenerator that takes the DSL 
and builds a PetitParser.

The numbers below are just parsing times, no further actions involved. For 
reference I show the times from PRISM (which uses JavaCC), too -- although 
they involve additional verification and normalization steps on the AST.

input  PRISM  Xtreams-Parsing  PetitParser
230kB    14s               9s           2s
544kB   121s              20s           5s
1.1MB   421s              34s           8s
1.4MB  1091s              47s          12s
2.2MB                     63s          16s
2.9MB                     81s          20s
3.8MB                    107s          25s
4.4MB                    123s          30s

Please note that these times are not representative at all. It's just a 
single example and I put zero effort into optimization. However, I am quite 
satisfied with the results.

[*] I was already familiar with the DSL of Xtreams-Parsing, which I like 
very much. I did not consider SmaCC, as I find PEGs easier to use.

Best, Steffen

Thanks for your report, Steffen. Nice to see such comparisons, even if they are a bit apples & oranges.
Will you be implementing those "additional verification and normalization steps"?
It seems they have an exponential or power-law impact on times.
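The "power" reading can be checked roughly by estimating an exponent k in time ≈ c · size^k from two rows of Steffen's first table. This is a back-of-the-envelope sketch for a Pharo Playground, assuming 1.4MB = 1400 kB and using only the 230kB and 1.4MB rows; it is a two-point estimate, not a regression.

```smalltalk
"Two-point estimate of k in  time ~ c * size^k  from the first table (230kB and 1.4MB rows).
 Sizes in kB with 1.4MB taken as 1400 kB; times in seconds."
| sizeRatio prismK xtreamsK petitK |
sizeRatio := 1400 / 230.
prismK := (1091 / 14) ln / sizeRatio ln.    "about 2.4"
xtreamsK := (47 / 9) ln / sizeRatio ln.     "about 0.9"
petitK := (12 / 2) ln / sizeRatio ln.       "about 1.0"
{ prismK. xtreamsK. petitK }
```

An exponent near 1 means roughly linear growth. By this crude estimate the PRISM column grows more like size^2.4, while Xtreams-Parsing and PetitParser stay close to linear; keep in mind that the PRISM times also include the verification and normalization steps.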

cheers -ben 






Re: [vwnc] Parsing in Smalltalk

Jan Kurš
#memoized is one of the most effective optimizations, but also one of the hardest to apply. It cannot be done efficiently in an automated way, because it depends on the input. The best way is to identify repeated invocations of the same parser combinator at the same position for a typical input; PP2 has tooling support for this, and I wrote a chapter about #memoized in PP2 [1]. PP2 does a poor man's version of memoization (based on grammar analysis) automatically, just by calling #optimize.
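To make "repeated invocation of the same parser combinator at the same position" concrete, here is a hand-rolled sketch of per-position memoization for a Pharo Playground. It deliberately does not use the PetitParser API; ruleA, memoA and parseS are hypothetical names for illustration only. The toy grammar is S <- A 'x' / A 'y' with A <- 'a'+, so on the input 'aaaay' the rule A is tried twice at position 1, and a cache keyed by position avoids the second evaluation.

```smalltalk
"Hypothetical, hand-rolled illustration (not the PetitParser API) of memoizing one rule per position.
 Grammar: S <- A 'x' / A 'y'   with   A <- 'a'+"
| input calls ruleA memo memoA parseS plainResult plainCalls memoResult memoCalls |
input := 'aaaay'.
calls := 0.
"A: consume one or more $a from pos; answer the position after the match, or nil on failure"
ruleA := [ :pos |
	| p |
	calls := calls + 1.
	p := pos.
	[ p <= input size and: [ (input at: p) = $a ] ] whileTrue: [ p := p + 1 ].
	p > pos ifTrue: [ p ] ifFalse: [ nil ] ].
"Memoized A: evaluate ruleA at most once per position"
memo := Dictionary new.
memoA := [ :pos | memo at: pos ifAbsentPut: [ ruleA value: pos ] ].
"S: try A then $x; on failure backtrack to position 1 and try A then $y"
parseS := [ :a |
	| next |
	next := a value: 1.
	((next notNil and: [ next <= input size ]) and: [ (input at: next) = $x ])
		ifTrue: [ #x ]
		ifFalse: [
			next := a value: 1.
			((next notNil and: [ next <= input size ]) and: [ (input at: next) = $y ])
				ifTrue: [ #y ]
				ifFalse: [ nil ] ] ].
calls := 0.
plainResult := parseS value: ruleA.
plainCalls := calls.   "2: A is parsed again at position 1 after backtracking"
calls := 0.
memoResult := parseS value: memoA.
memoCalls := calls.    "1: the second attempt is answered from memo"
```

The cache costs a dictionary lookup on every call plus memory per position, so it only pays off where the same rule really is re-invoked at the same position. Wrapping parsers that are not re-invoked adds that overhead for nothing, which fits Steffen's report that memoizing made his parser slower.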

If really needed, send me the parser and the input, and I can check them and suggest optimizations.  

There should be no fundamental issue with porting PP2 to VW. As far as I know, there is an automated tool to do so, right? On the other hand, PP is stable and does not change, whereas PP2 is maintained and updated from time to time (mostly adding optimizations), so there might be some overhead in keeping the VW port in sync with PP2.

Cheers,
Jan

[1]: https://kursjan.github.io/petitparser2/pillar-book/build/Chapters/memoization.html

On Fri, Oct 5, 2018, 13:26 Steffen Märcker <[hidden email]> wrote:
Hi Doru!

> I assume that you tried the original PetitParser. PetitParser2 offers 
> the possibility to optimize the parser (kind of a compilation), and this 
> provides a significant speedup:
> https://github.com/kursjan/petitparser2
>
> Would you be interested in trying this out?

Yes, I'd like to give this a shot, too. However, as far as I know, PP2 is 
only available for Pharo and not VW, isn't it?

Speaking of optimizations, I also tried memoizing the PetitParser parser. 
However, the times got worse instead of better. Is there a rule of thumb 
for where to apply #memoized sensibly? As far as I understand, applying it 
to the root parser does not memoize subsequent parsers, does it?

Kind regards, Steffen
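For the original PetitParser, Jan's advice about repeated invocations at the same position translates into memoizing the particular sub-parser that gets re-tried, not the root. A minimal sketch, assuming PetitParser 1 in Pharo (the selectors asParser, plus, memoized, the sequence operator , and the choice operator /); aRun and start are illustrative names, not part of any grammar in this thread.

```smalltalk
"Sketch for PetitParser 1 (Pharo): memoize the shared sub-parser, not the root.
 aRun is tried at the same position by both alternatives of start."
| aRun start |
aRun := $a asParser plus memoized.
start := (aRun , $x asParser) / (aRun , $y asParser).
start parse: 'aaaay'.
```

As far as I can tell, #memoized wraps only its receiver, so sending it to the root caches the root's result per position but leaves the sub-parsers inside it unmemoized.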


Re: [vwnc] Parsing in Smalltalk

Steffen Märcker
Dear Jan,

I just tried to use PP2 but ran into two issues:

1. PP2 does not load into Pharo 6.1 stable.
2. I use #- to create character classes but was not able to find the  
equivalent in PP2 yet.

> There should be no fundamental issue with porting PP2 to VW. As far as I
> know, there is an automated tool to do so, right?

I am not aware of this tool. Can you give me some hints what exactly to  
look for?

Best, Steffen



Re: [vwnc] Parsing in Smalltalk

Sean P. DeNigris
Administrator
Steffen Märcker wrote
> 1. PP2 does not load into Pharo 6.1 stable.

Can you give more details? IIRC I have PP2 loaded in several 6.1 images.



-----
Cheers,
Sean
--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html


Re: [vwnc] Parsing in Smalltalk

Steffen Märcker
> Can you give more details? IIRC I have PP2 loaded in several 6.1 images.

I did the following:
1)  Download and start Pharo 6.1 stable via the launcher.
2a) Attempt to install PetitParser2 via the CatalogBrowser:
     "Information
     There was an error while trying to install PetitParser2.
     Installation was cancelled."
2b) Attempt to install PP2 via the scripts from GitHub:
     Metacello new
         baseline: 'PetitParser2';
         repository: 'github://kursjan/petitparser2';
         load.
     Metacello new
         baseline: 'PetitParser2Gui';
         repository: 'github://kursjan/petitparser2';
         load.
     "Could not resolve: [BaselineOfPetitParser2] in [...]"

Interestingly, it works in Pharo 7 dev, but there the GUI tools won't load  
because of some issues with their dependencies.

I hope this helps. As I am not familiar with Pharo, I'd appreciate any  
hints.

Best, Steffen


Re: [vwnc] Parsing in Smalltalk

Sean P. DeNigris
Administrator
Steffen Märcker wrote

> I did the following:
> 1)  Download and start Pharo 6.1 stable via the launcher.
> 2b) Attempt to install PP2 via the scripts from GitHub:
>      Metacello new
>          baseline: 'PetitParser2';
>          repository: 'github://kursjan/petitparser2';
>          load.
>      Metacello new
>          baseline: 'PetitParser2Gui';
>          repository: 'github://kursjan/petitparser2';
>          load.

This way worked for me in Pharo #60546 (check in World->System->About). What
exact Pharo version/OS are you on? 32- or 64-bit?
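The version and word size can also be read from inside the image. A small sketch for a Pharo Playground, assuming SystemVersion current and Smalltalk wordSize (4 on a 32-bit image, 8 on a 64-bit one), which I believe are both available in Pharo 6 and 7:

```smalltalk
"Collect the image version and its word size; the OS still has to be read from the host."
{ SystemVersion current printString.
  Smalltalk wordSize = 8
	ifTrue: [ '64-bit image' ]
	ifFalse: [ '32-bit image' ] }
```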



-----
Cheers,
Sean

Re: [vwnc] Parsing in Smalltalk

Steffen Märcker
I am using macOS 10.13.6 and the 32-bit VM:

Pharo 6.0
Latest update: #60546

... the string in the About dialog is wrong; it should be 6.1. I installed it via the  
launcher as "Official Distribution: Pharo 6.1 - 32Bit (stable)". I just  
noticed that the sources file is missing from vms/private/6521/, too.


Re: [vwnc] Parsing in Smalltalk

Steffen Märcker
Reading the code of PetitParser, I wonder why PPRepeatingParser (and  
PP2RepeatingNode as well) initializes 'max' with SmallInteger maxVal instead  
of some notion of infinity, like Float infinity. If I understand the code  
correctly, PPParser>>min: fails if the number of repetitions exceeds  
SmallInteger maxVal, doesn't it?
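For reference, the 32-bit limit can be checked directly in a Pharo Playground, and a Float infinity bound works in ordinary comparisons with integers. This is a sketch under the assumption of a Spur VM (Pharo 5 and later); countA is a hypothetical stand-in for a bounded repetition loop, not the actual code of PPRepeatingParser or PP2RepeatingNode.

```smalltalk
"SmallInteger maxVal depends on the image word size (Spur VMs):
 32-bit: 2^30 - 1 = 1073741823, 64-bit: 2^60 - 1"
| expectedMax countA |
expectedMax := Smalltalk wordSize = 4
	ifTrue: [ (2 raisedTo: 30) - 1 ]
	ifFalse: [ (2 raisedTo: 60) - 1 ].
SmallInteger maxVal = expectedMax.   "true"

"Hypothetical bounded loop (not PetitParser code): with Float infinity as max,
 the upper bound never stops the repetition, so only the input does."
countA := [ :input :max |
	| n |
	n := 0.
	[ n < max and: [ n < input size and: [ (input at: n + 1) = $a ] ] ] whileTrue: [ n := n + 1 ].
	n ].
countA value: 'aaab' value: Float infinity.   "3"
countA value: 'aaab' value: 2.                "2"
```

On a 32-bit image that is roughly 1.07 billion repetitions, which is large but, as Steffen argues, not out of reach for big inputs.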

Best, Steffen



Re: [vwnc] Parsing in Smalltalk

Peter Kenny
In reply to this post by Steffen Märcker
Steffen

I do most of my work using Moose Suite 6.1, which is Pharo 6.1 with a lot of
extras, because it comes with the tools I want (PetitParser, PP2 and
XMLParser) already loaded. The image is huge, but if that's not a problem
for you it could be an easy way to get PP2.

Best wishes

Peter Kenny

-----Original Message-----
From: Pharo-users <[hidden email]> On Behalf Of Steffen
Märcker
Sent: 11 October 2018 16:11
To: [hidden email]
Subject: Re: [Pharo-users] [vwnc] Parsing in Smalltalk


Re: [vwnc] Parsing in Smalltalk

Jan Kurš
In reply to this post by Steffen Märcker
I run PP2 on Travis [1], and it seems Pharo 6.1 loads all configurations, both on Linux and Mac. Pharo 5 and Pharo 6.0 got broken; why is build configuration so hard :'( I don't know how I can support you. I myself had to give up on some tools because I failed to load them.

There is no specific reason to use SmallInteger maxVal... and nobody ever thought it might be too little. 'PP2 min: X' fails if there are fewer than X repetitions. 'PP2 max: X' parses at most X repetitions.

($a asPParser min: 2 max: 3) parse: 'a'.     "Failure"
($a asPParser min: 2 max: 3) parse: 'aa'.    "#($a $a)"
($a asPParser min: 2 max: 3) parse: 'aaa'.   "#($a $a $a)"
($a asPParser min: 2 max: 3) parse: 'aaaa'.  "#($a $a $a)"

Use $- asPParser for characters, e.g.:
$- asPParser parse: '-'


On Thu, Oct 11, 2018 at 8:13 PM Steffen Märcker <[hidden email]> wrote:
Reading the code of PetitParser, I wonder why PPRepeatingParser (and 
PP2RepeatingNode as well) initializes 'max' with SmallInteger maxVal instead 
of some notion of infinity, like Float infinity. If I understand the code 
correctly, PPParser>>min: fails if the number of repetitions exceeds 
SmallInteger maxVal, doesn't it?

Best, Steffen



Re: [vwnc] Parsing in Smalltalk

Steffen Märcker
Hi, I tried it some more times and things are different now:
- image appeared to lock up (1st)
- no network traffic at all (2nd)
- image unresponsive, loading successful after 2 minutes (3rd)
Call me a fool, but I didn't do anything different. Notably, it succeeded  
each time in 7.0. =)

> There is no specific reason to use SmallInteger maxVal...  and nobody  
> ever thought it might be too little.

Maybe it makes sense to change this? It appears to be just wrong, and on  
32-bit the limit is well within practical reach. With a little guidance, I'd  
try to do a first PR myself (if the change is considered sensible).

I was mentioning #min: since it is implemented in terms of 'min: min max:  
SmallInteger maxVal'.

Is there an easy way to create a character class, similar to [1-2x-z]?

Best, Steffen



> Use $- asPParser for characters, e.g:
> $- asPParser parse: '-'
>
> [1]: https://travis-ci.org/kursjan/petitparser2/builds/438358467

Re: [vwnc] Parsing in Smalltalk

Jan Kurš
In reply to this post by Ben Coman
Hi Steffen,

Thanks for the report, the numbers please me :)

Speaking of a tool for porting, I was recently shown this one, but I don't have any experience with it:

Speaking of character ranges, the following are currently available:
#letter asPParser (recognizes characters matching the #isLetter predicate)
#word asPParser (characters matching the #isAlphaNumeric predicate)
#digit (#isDigit predicate)
#hex ([0-9a-fA-F])
#space
#blank
#any

You can find the definitions in PP2NodeFactory; see the implementation of #hex, which is probably closest to what you want. For your convenience, your project can extend the class to fit your needs.

You can also specify: $a asPParser / $b asPParser / $c asPParser / #digit asPParser ... PetitParser2 can recognize this pattern and turns it into a character class during the optimization pass.
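Building on that choice pattern, a range such as [1-2x-z] can be assembled without typing every character. This is a minimal sketch for Pharo with PP2 loaded; charRange is a hypothetical helper (not part of PP2), and it only relies on Character>>asPParser, the choice operator / and #optimize as mentioned in this thread. Whether #optimize actually folds the chain into a character class is Jan's description, not something this snippet checks.

```smalltalk
"Hypothetical helper, not part of PP2: build a choice over an inclusive character range
 ($a asPParser / $b asPParser / ...), the pattern the PP2 optimizer is said to recognize."
| charRange classParser |
charRange := [ :from :to |
	| chars |
	chars := (from asInteger to: to asInteger) collect: [ :code | Character value: code ].
	chars allButFirst
		inject: chars first asPParser
		into: [ :parser :char | parser / char asPParser ] ].
"Roughly the regex class [1-2x-z]"
classParser := (charRange value: $1 value: $2) / (charRange value: $x value: $z).
classParser := classParser optimize.
classParser parse: 'y'.
```

Reassigning the result of #optimize keeps the snippet correct whether #optimize modifies the parser in place or answers a new one.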

On Sat, Oct 13, 2018 at 5:38 PM Steffen Märcker <[hidden email]> wrote:
Hi,

I gave PetitParser 2 a try and I am pretty impressed by the results; 
please see the updated table below. =) Again, that's pure parsing and 
Array-based AST-building. Moving to PP2 was indeed as easy as sending 
#asPParser and working around character ranges ($a - $z). Is there a 
preferred way to do the latter?

Jan mentioned that there might be an automated tool to port stuff to 
VisualWorks. Do you have a name? And again the old question: what is the 
preferred workflow to exchange code between the two dialects? So far I 
have stuck to FileOut30.

input  Prism        Storm  Xtreams.PEG  PP     PP2
size   parse check  check  parse cache  parse  parse optim
230kB   0.1s   10s     6s     9s    3s     2s     4s  0.2s
544kB   0.2s   90s    20s    20s    7s     5s     9s  0.5s
1.1MB   0.4s  392s    46s    34s   13s     8s    15s  1.0s
1.4MB   0.8s 1091s    85s    47s   20s    12s    20s  1.3s
2.2MB                        63s   30s    16s    27s  1.9s
2.9MB                        81s   44s    20s    34s  2.5s
3.8MB                       107s   61s    25s    45s  3.1s
4.4MB                       123s   76s    30s    56s  3.7s

Best, Steffen


On .10.2018, at 05:22, Tudor Girba <[hidden email]> wrote:

> Hi,
>
> Interesting experiment. Thanks for sharing!
>
> I assume that you tried the original PetitParser. PetitParser2 offers 
> the possibility to optimize the parser (kind of a compilation), and this 
> provides a significant speedup:
> https://github.com/kursjan/petitparser2
>
> Would you be interested in trying this out?
>
> Cheers,
> Doru
> --
> www.feenk.com
>
> "No matter how many recipes we know, we still value a chef."