what to use for simple parsing

Peter Uhnak
Hi,

I would like to parse text like
-----
id(param1, param2, ... paramX)
id -> id
id ->> id
-----
id is an alphanumeric string,
param is any string, optionally enclosed in quotes (so both quoted and unquoted strings are needed).

I saw that there are many parsing tools, but since I have no experience in such matters I don't know which would be best.

PetitParser?
SmaCC?
Something else?

Thanks,
Peter

Re: what to use for simple parsing

Thierry Goubier
Hi Peter,

I can answer for SmaCC. It would do the job, be very simple and fairly
fast with a trick or two to handle the param unquoted string.

Thierry


Re: what to use for simple parsing

Martin Bähr
Excerpts from Thierry Goubier's message of 2015-03-28 08:46:46 +0100:
> I can answer for SmaCC. It would do the job, be very simple and fairly
> fast with a trick or two to handle the param unquoted string.

could someone provide (or link to) a comparison between the major parsers?
why are there even multiple ones?

greetings, martin.



Re: what to use for simple parsing

kilon.alios

PetitParser provides a more Pharo-oriented syntax. SmaCC uses a syntax very similar to regex. SmaCC also comes with a lot of parsers for programming languages.

I chose SmaCC because

a) the syntax is very compact and gives me a bird's-eye view of the overall syntax definition, although it is also more difficult to read. PetitParser is the opposite: much easier to read but more verbose.
b) it comes with a very detailed Python parser and many other parsers.

For less powerful parsing you can also use regex, which is the general idea both parsers build on. The advantage of regex is that it is already included in Pharo. However, if you plan to extend your syntax and grow it into something complex, PetitParser or SmaCC will be a far better choice.

I use SmaCC to parse Python data types (lists, dictionaries and custom Python types) into Pharo objects, and so far it has been working like a charm. I could do the same with PetitParser, but having a Python parser saved me time in the short and long run. I also really like the compact syntax; it took me time to get used to regex syntax, but now I find it quite easy to read.

Hope that helps you.

Re: what to use for simple parsing

stepharo
In reply to this post by Martin Bähr
In essence you have

     - PetitParser (read the chapter in Deep into Pharo)
             incremental
             composable
             flexible
             a bit slow

     - SmaCC
             static
             traditional
             I think there is a chapter in the book in progress on GitHub


Stef

Re: what to use for simple parsing

Damien Cassou
In reply to this post by Peter Uhnak


I would start with streams and regular expressions. If that's not
powerful enough I would use PetitParser. If that's not fast enough, I
would try SmaCC and compare speed.
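
As a rough illustration of that first step, here is what streams and regular expressions could look like for the two kinds of line in the question. This is a sketch under assumptions, not code from this thread: all variable names are made up, and it relies on the Regex package bundled with Pharo, where subexpression: 1 answers the whole match and the captured groups start at index 2.

```smalltalk
"Sketch only: hypothetical names, assumes Pharo's bundled Regex package."
| edge stream name args |

"Edge lines: id -> id or id ->> id."
edge := '([a-zA-Z0-9]+) *(->>?) *([a-zA-Z0-9]+)' asRegex.
(edge matches: 'foo ->> bar')
	ifTrue: [
		"Index 1 is the whole match, so the captures start at 2."
		{ edge subexpression: 2. edge subexpression: 3. edge subexpression: 4 } ].

"Call lines: read up to each delimiter with a stream, then split on commas."
stream := 'draw(x, y, 42)' readStream.
name := stream upTo: $(.
args := ((stream upTo: $)) substrings: ',') collect: [ :each | each trimBoth ].
```

Note that the naive comma split breaks as soon as a quoted param contains a comma or a parenthesis, which is about the point where a real parser starts to pay off.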



Re: what to use for simple parsing

Peter Uhnak
Thanks all,

in the end I used PetitParser, and I was really surprised and happy at how easy it was and how far I got with it.

TBH, using regular expressions in Pharo feels extremely uncomfortable to me compared to Perl or Ruby, but maybe that was a design decision by the author, to not be too hacky.

So, at least to me, PetitParser feels like a more practical regex library than Regex itself.
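
For readers wondering what such a grammar might look like, here is a minimal PetitParser sketch for the format in the original question. It is not the grammar actually used; every name in it is hypothetical, and it assumes PetitParser is loaded and that double quotes are the quoting character.

```smalltalk
"Sketch only: every name below is hypothetical, and PetitParser must be loaded."
| id quoted stop unquoted param params call arrow edge statement |

"id: one or more alphanumeric characters, surrounding whitespace ignored."
id := #word asParser plus flatten trim.

"Quoted param: everything between two double quotes, quotes dropped."
quoted := ($" asParser , $" asParser negate star flatten , $" asParser)
	==> [ :nodes | nodes second ].

"Unquoted param: anything up to a comma, parenthesis or quote."
stop := $, asParser / $( asParser / $) asParser / $" asParser.
unquoted := stop negate plus flatten ==> [ :str | str trimBoth ].

param := (quoted / unquoted) trim.

"separatedBy: answers params and commas interleaved; keep the params only."
params := (param separatedBy: $, asParser trim)
	==> [ :nodes | (1 to: nodes size by: 2) collect: [ :i | nodes at: i ] ].

call := (id , $( asParser , params optional , $) asParser)
	==> [ :nodes | { #call. nodes first. nodes third ifNil: [ #() ] } ].

"Ordered choice: try the longer arrow first."
arrow := '->>' asParser / '->' asParser.
edge := (id , arrow , id)
	==> [ :nodes | { #edge. nodes first. nodes second. nodes third } ].

statement := (call / edge) end.

statement parse: 'draw(x, "hello, world", 42)'.
statement parse: 'a ->> b'.
(statement parse: 'a => b') isPetitFailure.
```

The first two parses should answer small arrays tagged #call and #edge, and the last line should answer true because the input matches neither rule. Once it grows, the same grammar would normally move into a PPCompositeParser subclass with one method per rule.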

Peter


Re: what to use for simple parsing

Tudor Girba-2
Not to mention that you can also read it :)

Doru

--

"Every thing has its own flow"

Re: what to use for simple parsing

stepharo
In reply to this post by Peter Uhnak


On 31/3/15 17:17, Peter Uhnák wrote:
> So at least to me PetitParser feels like a more practical regex library than Regex itself.

Yes :)
Maybe getting regexp was a mistake.

Stef



Re: what to use for simple parsing

Martin Bähr
Excerpts from stepharo's message of 2015-04-01 23:00:11 +0200:
> May be this was a mistake to get regexp.

Some people, when confronted with a problem, think "I know, I'll use regular
expressions." Now they have two problems.
(jwz)


Re: what to use for simple parsing

kilon.alios
It depends on your needs. I had never used regex before; Pharo's regex was my first. I used it on a specific case that was quite simple. I love its simplicity and its extremely compact syntax. Perfect fit for my needs, one very happy customer :)

I think it depends on the complexity of the parsing and how you like to work. If you have a simple problem and want a compact syntax, nothing can beat regex, at least compared with SmaCC and PetitParser. When I needed more complex parsing with nested syntax, SmaCC made a lot more sense.


Re: what to use for simple parsing

philippeback
On Thu, Apr 2, 2015 at 9:45 AM, kilon alios <[hidden email]> wrote:
> When I need more complex parsing with nested syntax, SmaCC made a lot more sense.

PetitParser was great for what I wanted to do: parse SNMP results.
What was great was the ability to make one "generic" grammar and subclass it with specific ones.
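
To make the "generic grammar plus specific subclasses" idea concrete, here is a small sketch. It is not the SNMP grammar described above: the class names, rule names and line format are all invented for illustration. Methods are shown in the usual Class >> selector notation and would be entered in the browser; PPCompositeParser expects one instance variable per rule method.

```smalltalk
"Sketch only: hypothetical classes and line format, not the real SNMP grammar."
"Generic grammar: a line looks like 'oid = TypeName: value'."
PPCompositeParser subclass: #SnmpLineGrammar
	instanceVariableNames: 'line oid typeName value'
	classVariableNames: ''
	category: 'SnmpSketch'.

SnmpLineGrammar >> start
	^ line end

SnmpLineGrammar >> line
	^ oid , ' = ' asParser , typeName , ': ' asParser , value

SnmpLineGrammar >> oid
	^ (#digit asParser / $. asParser) plus flatten

SnmpLineGrammar >> typeName
	^ #word asParser plus flatten

SnmpLineGrammar >> value
	^ #any asParser plus flatten

"Specific grammar: only overrides the rules that differ."
SnmpLineGrammar subclass: #SnmpGaugeGrammar
	instanceVariableNames: ''
	classVariableNames: ''
	category: 'SnmpSketch'.

SnmpGaugeGrammar >> typeName
	^ 'Gauge32' asParser ==> [ :str | #gauge32 ]

SnmpGaugeGrammar >> value
	^ #digit asParser plus flatten ==> [ :digits | digits asNumber ]
```

With these definitions, SnmpGaugeGrammar new parse: '1.3.6.1 = Gauge32: 42' should answer an array whose third element is #gauge32, while the inherited line and oid rules are reused unchanged.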

I used Regex at first to do the job but it turned into an unholy mess (not that it couldn't parse what I needed) and PetitParser was there.

No SmaCC usage so far, so can't comment there.

Phil
 


Re: what to use for simple parsing

Damien Pollet-2
In reply to this post by Peter Uhnak
On 31 March 2015 at 17:17, Peter Uhnák <[hidden email]> wrote:
> So at least to me PetitParser feels like a more practical regex library than Regex itself.

In which use-cases is Regex less practical?
I'm thinking it could get a builder with a PetitParser-like API in addition to the current string syntax.

Re: what to use for simple parsing

kilon.alios
> I used Regex at first to do the job but it turned into an unholy mess (not that it couldn't parse what I needed) and PetitParser was there.

It really depends on how you approach this. For example, in my case I quickly found out that it would be insane to put everything in a single string, so I broke the single string into smaller ones and reassembled it. Each string held a very simple regex, quite easy to read if one is familiar with the syntax.
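
A small sketch of that approach, applied to the arrow lines from the start of this thread. The variable names are made up for the example, and it assumes the Regex package bundled with Pharo:

```smalltalk
"Sketch only: hypothetical names; the pattern is assembled from small named pieces."
| id arrow space edge |
id := '[a-zA-Z0-9]+'.
arrow := '->>?'.
space := ' *'.

"Concatenate the pieces, wrapping the interesting ones in groups."
edge := ('(' , id , ')' , space , '(' , arrow , ')' , space , '(' , id , ')') asRegex.

edge matches: 'a -> b'.
edge matches: 'a ->> b'.
```

Each piece stays readable on its own, and only the final concatenation has to be understood as a whole.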

And as Damien said, you could extend String into a more flexible object, but then you get into the realm of SmaCC/PetitParser.

I chose SmaCC because it already offered a solution to my problem. But for the title of this thread, "simple parsing", I think regex is a very good choice, depending on your demands.



Re: what to use for simple parsing

philippeback
Sure works.

Regex

'((XXX Logical Channel) ([0-9])) on (((Upstream)|(Downstream)) ([0-9])) on ((chassis) ([0-9])), ((slot) ([0-9])), ((mac) ([0-9]))' asRegex

But in PP, things were more complex and there were a lot of them, so:

line
    ^ temperatureStatusDescrEntry token asParser
    / temperatureStatusValueEntry token asParser
    / temperatureThresholdEntry token asParser
    / temperatureLastShutdownEntry token asParser
    / temperatureStateEntry token asParser

and things like

temperatureStatusDescrEntry
    ^ temperatureStatusDescrOidPrefix, oidIndex, space, equals, space, stringType, space, displayStringValue.

made my day much easier.

Especially when I had all the tokens I needed:

gauge32Type
    ^ 'Gauge32:' asParser flatten ==> [:str | #gauge32].


Not sure it would have been as flexible with SmaCC (I am not familiar with SmaCC but used Lex/Yacc|Bison in another life).

Phil

Re: what to use for simple parsing

kilon.alios
yeap you use what makes your life easier ;)


Re: what to use for simple parsing

Thierry Goubier
In reply to this post by philippeback


2015-04-02 11:49 GMT+02:00 [hidden email] <[hidden email]>:
> Not sure it would have been as flexible with a SmaCC (I am not familiar with SmaCC but used Lex/Yacc|Bison in another life).

SmaCC is a lot (and I mean a lot) simpler than Flex/Bison, especially for the interaction between Flex and Bison (in short, SmaCC infers all the token/keyword stuff as well as the API between the two objects, behaving like a scannerless system).

For everything like keywords, for example, you don't even bother with the token:

Gauge32Type: "Gauge32:" { #gauge32 } ;

And of course you would write:

TemperatureStatusDescrEntry : TemperatureStatusDescrOidPrefix OidIndex Space "=" Space StringType Space DisplayStringValue ;

(Every time I read PetitParser code, I see the SmaCC grammar, usually in a more verbose form (asParser, asToken)... )

Some of the benefits of SmaCC are not that obvious, in fact. Coming from the Flex/Bison world, what is striking is the multithreading ability of the SmaCC parser infrastructure: parsers have no global/shared space, and you can create as many instances of them as you like, as often as you like... A second benefit, but harder to use, is the automatic generation of AST nodes, with the API, an equality and visitors: this makes all the code that sits behind a SmaCC parser very regular.

However, if you derive grammars on a regular basis, the SmaCC API is not designed for that. It could do it (you could maybe include other grammars, for example), but nobody has expressed that need :)

Thierry