Hi,
I would like to parse text like
-----
id(param1, param2, ... paramX)
id -> id
id ->> id
-----
id is an alphanumeric string; param is any string, optionally enclosed in quotes (so both quoted and unquoted strings are needed).

I saw that there are many tools for parsing, but since I have no experience in such matters I don't know which would be best.

PetitParser?
SmaCC?
Something else?

Thanks,
Peter
|
Hi Peter,
I can answer for SmaCC. It would do the job, and would be very simple and fairly fast, with a trick or two to handle the unquoted param strings.

Thierry
|
Excerpts from Thierry Goubier's message of 2015-03-28 08:46:46 +0100:
> I can answer for SmaCC. It would do the job, be very simple and fairly
> fast with a trick or two to handle the param unquoted string.

could someone provide (or link to) a comparison between the major parsers?
why are there even multiple ones?

greetings, martin.
--
eKita - the online platform for your entire academic life
--
chief engineer eKita.co
pike programmer pike.lysator.liu.se caudium.net societyserver.org
secretary beijinglug.org
mentor fossasia.org
foresight developer foresightlinux.org realss.com
unix sysadmin
Martin Bähr working in china http://societyserver.org/mbaehr/
|
PetitParser provides a more Pharo-oriented syntax. SmaCC uses a syntax very similar to regex, and it comes with a lot of parsers for programming languages.

I chose SmaCC because

a) the syntax is very compact and gives me a bird's-eye view of the overall syntax definition, though it is also more difficult to read. PetitParser is the opposite: much easier to read but more verbose.

b) it comes with a very detailed Python parser and many other parsers.

For less powerful parsing you can also use regex, which is what both parsers are based on as a general idea. The advantage of regex is that it is already included in Pharo. However, if you plan to extend your syntax and grow it into something complex, PetitParser or SmaCC will be a far better choice.

I use SmaCC to parse Python data types (lists, dictionaries and custom Python types) into Pharo objects, and so far it has been working like a charm. I could do the same with PetitParser, but having a Python parser saved me time in the short and long run. I also really like the compact syntax; it took me time to get used to regex syntax, but now I find it quite easy to read.

Hope that helps you.
|
In reply to this post by Martin Bähr
In essence you have
- PetitParser (read the chapter in Deep into Pharo): incremental, composable, flexible, a bit slow
- SmaCC: static, traditional. I think that there is one chapter in a book in progress on GitHub.

Stef
|
In reply to this post by Peter Uhnak
I would start with streams and regular expressions. If that's not powerful enough, I would use PetitParser. If that's not fast enough, I would try SmaCC and compare speed.

--
Damien Cassou
http://damiencassou.seasidehosting.st

"Success is the ability to go from one failure to another without losing enthusiasm." --Winston Churchill
|
Thanks all,

in the end I used PetitParser, and I was really surprised and happy at how easy it was and how far I got with it.

TBH, using regular expressions in Pharo feels extremely uncomfortable to me compared to Perl or Ruby, but maybe that was a design decision by the author to not be too hacky. So at least to me, PetitParser feels like a more practical regex library than Regex itself.

Peter
|
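For readers wondering what such a grammar might look like, here is a minimal PetitParser sketch of the syntax from the original question. It is not Peter's actual code (the thread does not show it): the variable names are made up for illustration, and it assumes that unquoted params contain no commas, parentheses or double quotes.

```smalltalk
| id quoted unquoted param params call arrow statement |

"Hypothetical sketch for a Playground with PetitParser loaded."

"id: one or more letters or digits, surrounding whitespace ignored"
id := #word asParser plus flatten trim.

"quoted param: the text between two double quotes, without the quotes"
quoted := $" asParser , $" asParser negate star flatten , $" asParser
	==> [ :nodes | nodes second ].

"unquoted param (assumption): anything up to a comma, parenthesis or quote"
unquoted := (PPPredicateObjectParser anyOf: ',()"') negate plus flatten
	==> [ :string | string trimBoth ].

param := (quoted / unquoted) trim.
params := (param separatedBy: $, asParser) withoutSeparators.

"id(param1, param2, ... paramX)"
call := id , $( asParser , params optional , $) asParser trim.

"id -> id and id ->> id; the longer arrow has to be tried first"
arrow := id , ('->>' asParser / '->' asParser) , id.

statement := (call / arrow) end.

statement parse: 'foo(bar, "hello, world")'.
statement parse: 'a ->> b'.
(statement parse: 'a => b') isPetitFailure.
```

Each production is an ordinary object, so the pieces can be inspected and tried one at a time, which is a large part of why the result stays readable.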
Not to mention that you can also read it :)

Doru
|
In reply to this post by Peter Uhnak
On 31/3/15 17:17, Peter Uhnák wrote:
Yes :) Maybe it was a mistake to get regexp.

Stef
|
Excerpts from stepharo's message of 2015-04-01 23:00:11 +0200:
> May be this was a mistake to get regexp.

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. (jwz)
|
Depends on your needs. I had never used regex before; Pharo's regex was my first. I used it on a specific case that was quite simple. I love its simplicity and its extremely compact syntax. A perfect fit for my needs, one very happy customer :)

I think it depends on the complexity of the parsing and how you like to work. If you have a simple problem and want a compact syntax, nothing can beat regex, at least compared with SmaCC and PetitParser. When I needed more complex parsing with nested syntax, SmaCC made a lot more sense.
|
On Thu, Apr 2, 2015 at 9:45 AM, kilon alios <[hidden email]> wrote:
PetitParser was great for what I wanted to do: parse SNMP results. What was great was the ability to make one "generic" grammar and subclass it with specific ones.

I used Regex at first to do the job, but it turned into an unholy mess (not that it couldn't parse what I needed), and PetitParser was there.

No SmaCC usage so far, so I can't comment there.

Phil
|
In reply to this post by Peter Uhnak
On 31 March 2015 at 17:17, Peter Uhnák <[hidden email]> wrote:
> So at least to me PetitParser feels like a more practical regex library than Regex itself.

In which use cases is Regex less practical? I'm thinking it could get a builder with a PetitParser-like API in addition to the current string syntax.
|
"I used Regex at first to do the job but it turned into an unholy mess
(not that it couldn't parse what I needed) and PetitParser was there."

It really depends on how you approach this. For example, in my case I quickly found out that it would be insane to put everything in a single string, so I broke the single string into smaller ones and resynthesised it. Each string held a very simple regex, quite easy to read if one is familiar with the syntax.
|
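To illustrate the idea of assembling one regex from smaller strings, here is a minimal sketch for the arrow lines of the original question. It is not the code described above: the variable names are hypothetical, and it assumes the Regex package that ships with Pharo (`asRegex`, `matches:`, `subexpression:`).

```smalltalk
| id arrow spaces pattern |

"Small named pieces, each readable on its own"
id := '([a-zA-Z0-9]+)'.
arrow := '(->>?)'.
spaces := '\s*'.

"Resynthesise the full expression by concatenating the pieces"
pattern := (id , spaces , arrow , spaces , id) asRegex.

"Subexpression 1 is the whole match, so the groups start at 2"
(pattern matches: 'foo ->> bar')
	ifTrue: [
		{ pattern subexpression: 2.
		  pattern subexpression: 3.
		  pattern subexpression: 4 } ]
```

The pieces can be reused across several patterns, which keeps each final expression short even when the overall input format grows.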
Sure works.

Regex

'((XXX Logical Channel) ([0-9])) on (((Upstream)|(Downstream)) ([0-9])) on ((chassis) ([0-9])), ((slot) ([0-9])), ((mac) ([0-9]))' asRegex

But in PP, things were more complex and there were a lot of them, so:

line
	^ temperatureStatusDescrEntry token asParser
	/ temperatureStatusValueEntry token asParser
	/ temperatureThresholdEntry token asParser
	/ temperatureLastShutdownEntry token asParser
	/ temperatureStateEntry token asParser

and things like

temperatureStatusDescrEntry
	^ temperatureStatusDescrOidPrefix, oidIndex, space, equals, space, stringType, space, displayStringValue.

made my day much easier. Especially when I had all the tokens I needed:

gauge32Type
	^'Gauge32:' asParser flatten ==> [:str | #gauge32].

Phil
|
Yep, you use what makes your life easier ;)
|
In reply to this post by philippeback
2015-04-02 11:49 GMT+02:00 [hidden email] <[hidden email]>:
SmaCC is a lot (and I mean a lot) simpler than Flex/Bison, especially for the interaction between Flex and Bison (in short, SmaCC infers all the token/keyword stuff as well as the API between the two objects, behaving like a scannerless system).

For everything like keywords, for example, you don't even bother with the token:

Gauge32Type: "Gauge32:" { #gauge32 } ;

And of course you would write:

TemperatureStatusDescrEntry :
	TemperatureStatusDescrOidPrefix OidIndex Space "=" Space StringType Space DisplayStringValue

(Every time I read PetitParser code, I see the SmaCC grammar, usually in a more verbose form (asParser, asToken)...)

Some of the benefits of SmaCC are not that obvious, in fact. Coming from the Flex/Bison world, what is striking is the multithreading ability of the SmaCC parser infrastructure: the parsers have no global/shared space, and you can create as many instances of them as you like, as often as you like...

A second benefit, but harder to use, is the automatic AST node generation, with the API, an equality and visitors: this makes all the code appearing behind a SmaCC parser very regular.

However, if you derive grammars on a regular basis, the SmaCC API is not designed for that. It could do it (you could maybe include other grammars, for example), but nobody has expressed that need :)

Thierry
|