negative matching using Smacc

Damien Pollet
Hi,

I'm using SmaCC to parse a line-oriented file format. I'm only
interested in some lines that begin with known keywords, and I want to
ignore all other lines without knowing their format.

At the moment I have a scanner rule like this:
File : ((InterestingLine | JunkLine) <newline>)* ;

but how do I define JunkLine to match anything BUT interesting lines?
--
Damien Pollet
type less, do more [ | ] http://typo.cdlm.fasmz.org

Re: negative matching using Smacc

Lukas Renggli
> I'm using SmaCC to parse a line-oriented file format. I'm only
> interested in some lines that begin with known keywords, and I want to
> ignore all other lines without knowing their format.
>
> At the moment I have a scanner rule like this:
> File : ((InterestingLine | JunkLine) <newline>)* ;

Have a look at Pier (or SmallWiki); they do line-based parsing using SmaCC.

Also make sure that you are using a 3.9 image, because earlier versions
had some bugs in Character that made it impossible to write a
line-based parser with SmaCC.

Cheers,
Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch

Re: negative matching using Smacc

Damien Cassou-3
In reply to this post by Damien Pollet
Damien Pollet wrote:

> Hi,
>
> I'm using SmaCC to parse a line-oriented file format. I'm only
> interested in some lines that begin with known keywords, and I want to
> ignore all other lines without knowing their format.
>
> At the moment I have a scanner rule like this:
> File : ((InterestingLine | JunkLine) <newline>)* ;
>
> but how do I define JunkLine to match anything BUT interesting lines?

You should be able to define a priority between rules.

Re: Re: negative matching using Smacc

Damien Pollet
In reply to this post by Lukas Renggli
2006/12/30, Lukas Renggli <[hidden email]>:
> Have a look at Pier (or SmallWiki); they do line-based parsing using SmaCC.

Thanks, but in Pier's case there are simple patterns for all possible
line beginnings, e.g. \+\+ matches a non-link.

In my case if I want to match lines that don't begin with abc, I have
to write ([^a] | a[^b] | ab[^c]) .*
I actually have 4 keywords, each about 8 chars long, so the pattern is
already tedious to write...
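
Since the hand-written pattern grows quickly, one option is to generate it. The following is a minimal sketch in Python (not SmaCC), assuming plain-text keywords; the function name and the keywords "abc", "abd" and "xyz" are hypothetical and only illustrate the construction. For every proper prefix of a keyword it emits one alternative: the prefix, optionally followed by a character that leaves every keyword and then the rest of the line. The output uses Python regex syntax, so it would still need adapting to SmaCC's scanner syntax.

```python
import re

def negative_prefix_pattern(keywords):
    """Return a regex matching lines that start with none of the keywords."""
    # Every proper prefix of every keyword, including the empty prefix.
    prefixes = {kw[:i] for kw in keywords for i in range(len(kw))}
    alternatives = []
    for prefix in sorted(prefixes):
        # A prefix that already begins with a full keyword is an interesting line.
        if any(prefix.startswith(kw) for kw in keywords):
            continue
        # Characters that would continue this prefix towards some keyword.
        continuing = sorted({kw[len(prefix)] for kw in keywords
                             if kw.startswith(prefix) and len(kw) > len(prefix)})
        diverging = "[^" + "".join(re.escape(c) for c in continuing) + "\\r\\n]"
        # The prefix alone (line ends early), or the prefix plus a diverging tail.
        alternatives.append(re.escape(prefix) + "(?:" + diverging + "[^\\r\\n]*)?")
    return "(?:" + "|".join(alternatives) + ")"

# Hypothetical keywords, for illustration only.
pattern = re.compile(negative_prefix_pattern(["abc", "abd", "xyz"]))
```

Note that, unlike the hand-written version above, this sketch also accepts lines that end before a keyword is complete (such as "a" or "ab").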

--
Damien Pollet
type less, do more [ | ] http://typo.cdlm.fasmz.org

Re: Re: negative matching using Smacc

Damien Pollet
In reply to this post by Damien Cassou-3
2006/12/30, Damien Cassou <[hidden email]>:
> > but how do I define JunkLine to match anything BUT interesting lines?
>
> You should be able to define a priority between rules.

I don't know how, other than by ordering them. But does that really
solve my problem? Both rules still match, and the junk one may match
a longer input because it is much more generic.
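
The concern can be illustrated with a small Python sketch of a longest-match ("maximal munch") scanner, where rule order only breaks ties. This assumes SmaCC's scanner picks tokens this way, which is common for generated scanners but is not confirmed in this thread; the rule names and the `next_token` helper are hypothetical.

```python
import re

# Hypothetical token rules in priority order (earlier means higher priority).
RULES = [
    ("keyword", re.compile(r"abc")),
    ("junk", re.compile(r"[^\r\n]+")),
]

def next_token(text, pos=0):
    """Return (rule name, matched text) for the longest match at pos, or None."""
    best = None
    for name, regex in RULES:
        match = regex.match(text, pos)
        # A strictly longer match wins; on a tie the earlier rule is kept.
        if match and (best is None or match.end() > best[1].end()):
            best = (name, match)
    return (best[0], best[1].group()) if best else None
```

Under that assumption, `next_token("abc def")` yields the junk token for the whole line, and the keyword rule only wins when the line is exactly "abc". Ordering alone therefore does not help as long as the generic rule can match more input.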

In fact I'm getting an error while compiling the parser: "A block
compiles more than 1K bytes of code"

--
Damien Pollet
type less, do more [ | ] http://typo.cdlm.fasmz.org

Re: Re: negative matching using Smacc

Lukas Renggli
> > You should be able to define a priority between rules.
>
> I don't know how, other than by ordering them. But does that really
> solve my problem? Both rules still match, and the junk one may match
> a longer input because it is much more generic.

Yes, rules defined first match first. In my experience this only
holds for the parser and not for the scanner, but I don't know the
details.

> In fact I'm getting an error while compiling the parser: "A block
> compiles more than 1K bytes of code"

This means that one of your scanner regular expressions is too
complicated to be compiled into a single method. VisualWorks doesn't
have this problem, as its methods can be up to a couple of GB in size.

To come back to your original problem, I think that the following
grammar should do what you want (untested):

Scanner:
<newline> : \r \n | \n | \r ;
<any> : [^\r\n]+ ;

Parser:
Start
        :
        | Line
        | Start <newline> Line
        | Start <newline> ;
       
Line
        : "abc" <any> { Transcript show: '--> '; show: '2' value; cr }
        | <any> { Transcript show: '1' value; cr } ;
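
For comparison, here is a Python sketch of the behaviour this grammar is aiming for. It is not SmaCC and says nothing about whether the grammar compiles or scans as intended; the `classify_lines` helper and the result labels are hypothetical. It splits the input on the same newline variants and treats a line as interesting only if "abc" is followed by at least one more character, mirroring the `"abc" <any>` alternative.

```python
import re

KEYWORD = "abc"  # the literal used in the Line production above

def classify_lines(text):
    """Return a list of (kind, payload) pairs, one per non-empty line."""
    results = []
    # Same newline variants as the <newline> token: \r\n, \n or \r.
    for line in re.split(r"\r\n|\n|\r", text):
        if not line:
            continue  # empty lines produce no Line in the grammar
        if line.startswith(KEYWORD) and len(line) > len(KEYWORD):
            # Mirrors the first alternative: report what follows the keyword.
            results.append(("interesting", line[len(KEYWORD):]))
        else:
            # Mirrors the second alternative: report the whole line.
            results.append(("junk", line))
    return results
```

A line consisting of "abc" alone is treated as junk here; how the grammar itself would resolve that case is left open.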

--
Lukas Renggli
http://www.lukas-renggli.ch