petit parser help


petit parser help

EstebanLM
Hi Lukas, all
I'm finally working on an HTML PetitParser (a very basic one, based on the PetitParser XML parser) and I have a serious problem (well... besides my complete ignorance of PetitParser, heh...)
I need to match these patterns:

openTag, contents, closeTag (that will be something like "<html> ... </html>")
inlineTag (that will be something like "<br/>")
openTag (that will be something like "<link ...>" or "<img src='anUrl'>")

So, after trying some variants... I came up with this construct:

element
        "[39]   element   ::=   EmptyElemTag | STag content ETag"
        ^ (self inlineTag / (self openTag , content , self closeTag) / self openTag)
                ==> [ :nodes | "(action still empty)" ].

openTag
        ^ $< asParser , qualified , whitespace optional , attributes , whitespace optional , $> asParser

inlineTag
        ^ $< asParser , qualified , whitespace optional , attributes , whitespace optional , '/>' asParser

closeTag
        ^ '</' asParser , qualified , whitespace optional , $> asParser


So... the problem is that the sequence

self openTag, content, self closeTag

matches input like

...
        <link ...>
</html>

pairing the <link ...> open tag with the </html> close tag, and for that reason the resulting tree is invalid.

So, I need a way to ensure the openTag name is equal to the closeTag name.

How can I do that?

Cheers,
Esteban

Re: petit parser help

Casey Ransberger-2
I played with PEGs a while ago. I recommend reading the Wikipedia article on parsing expression grammars. You can use a couple of lookahead-type tricks; pay attention to the part about "syntactic predicates."

I can't remember the exact phrasing in PetitParser, but there should be an "and" predicate and a "not" predicate that can effectively give you lookahead, because they don't consume any input... or something like that.
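To make the predicate idea concrete, here is a small sketch. It assumes PetitParser is loaded. In PetitParser the predicates are the messages #not and #and. Each one succeeds or fails without consuming input. The variable names are illustrative only. Lookahead like this can stop a rule from running past a close tag. On its own, though, it cannot check that the close tag's name matches the open tag's name. That check is the action-block technique discussed later in the thread.

```smalltalk
"Sketch only; assumes PetitParser is loaded, variable names are illustrative.
 #not and #and look ahead: they succeed or fail without consuming input."
| text closeAhead |
"Character data: one or more characters, each first checked not to be $<."
text := ($< asParser not , #any asParser) plus flatten.
"Succeeds only when a close tag comes next, but leaves the input in place."
closeAhead := '</' asParser and.
text parse: 'hello<br/>'.	"'hello'"
(closeAhead parse: '</html>') isPetitFailure.	"false"
(closeAhead parse: '<html>') isPetitFailure.	"true"
```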

After grokking the whole predicates bit, you will probably look at PetitParser again and it'll just make sense. That's how it went for me anyhow, though my memory of it is rather jumbled, since it was about a week of my life a year ago ;)

In reply to Esteban Lorenzano's message of Apr 25, 2011, 2:42 PM.

> Hi Lukas, all
> I'm finally working on a HTML petit parser (a very basic one, based on XML petit parser) and I have a serious problem (well... besides my complete ignorance about petit parser, he...)
> I need to match this pattern:
>
> openTag, contents, closeTag    (that will be something like "<html> ... </html>")
> inlineTag                    (that will be something like "<br/>")
> openTag                    (that will be something like "<link ...>" or "<img src='anUrl'>")
>
> so, after try some variants... I came with this construct:
>
> element
>    "[39]       element       ::=        EmptyElemTag | STag content ETag"
>    
>    ^(self inlineTag / (self openTag, content, self closeTag) / self openTag)
>        ==> [ :nodes | ].
>
> openTag
>    ^ $< asParser, qualified, whitespace optional, attributes, whitespace optional, $> asParser
>
> inlineTag
>    ^ $< asParser, qualified, whitespace optional, attributes, whitespace optional, '/>' asParser
>
> closeTag
>    ^'</' asParser , qualified , whitespace optional , $> asParser
>
>
> so... the problem here is that the statement
>
> self openTag, contents, self closeTag
>
> matchs with
>
> ...
>    <link ...>
> </html>    
>    
> and for that reason, the resulting tree is invalid.
>
> So, I need a way to ensure the openTag name is equal to the closeTag name.
>
> How can I do that?
>
> Cheers,
> Esteban

Reply | Threaded
Open this post in threaded view
|

Re: petit parser help

Ricardo Moran
In reply to this post by EstebanLM
I've only played a little with PetitParser, but I think the answer is in PetitXml>>#element. Its action block compares the "qualified" names of the open and close tags and returns a PPFailure if they differ. It also handles the inline (empty-element) tag in the same block by checking whether the fifth node is '/>'.

element
        "[39]   element   ::=   EmptyElemTag | STag content ETag"
        ^ $< asParser , qualified , attributes , whitespace optional ,
            ('/>' asParser
                / ($> asParser , content , [ :stream | stream position ] asParser ,
                    '</' asParser , qualified , whitespace optional , $> asParser))
            ==> [ :nodes |
                "nodes fifth is either '/>' or the sequence from $> through the close tag"
                nodes fifth = '/>'
                    ifTrue: [ Array with: nodes second with: nodes third with: #() ]
                    ifFalse: [
                        "nodes fifth fifth is the close tag's name; nodes fifth third is the stream position just before '</'"
                        nodes second = nodes fifth fifth
                            ifTrue: [ Array with: nodes second with: nodes third with: nodes fifth second ]
                            ifFalse: [ PPFailure message: 'Expected </' , nodes second qualifiedName , '>' at: nodes fifth third ] ] ]

I hope this helps.
Cheers,
Richo

In reply to Esteban Lorenzano's message of Mon, Apr 25, 2011, 6:42 PM.

Re: petit parser help

Lukas Renggli
> I only played a little with PetitParser but I think the answer is in
> PetitXml>>#element. You see in the action block that it compares the
> "qualified" of the open and close tags and if they're different it returns a
> PPFailure. It also takes care of the inlineTag in the same block by asking
> if the fifth node is '/>'.

This "taking care of the inlineTag" is just an ugly optimization, but
it makes the parser extremely fast :-)

Ideally you start with a grammar as you propose, and then check in
the callback whether the open and close tags are the same, as in the
example above.
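Here is a minimal workspace sketch of that suggestion. It is a grammar in the spirit of Esteban's, with the name check done in the action block. It assumes PetitParser is loaded and uses the API seen in this thread (==>, PPFailure message:at:). The variable names and the simplified tag syntax are illustrative assumptions: there are no attributes and no whitespace. The recursive element is declared as a PPDelegateParser and defined afterwards. The unclosed-tag fallback (/ self openTag) is left out to keep the sketch small.

```smalltalk
"Workspace sketch, not the PetitXml implementation. Assumes PetitParser is
 loaded; all variable names are illustrative. openTag and closeTag answer
 just the tag name, and the element action rejects mismatched names."
| name openTag closeTag text element content |
name := #word asParser plus flatten.
openTag := $< asParser , name , $> asParser ==> [ :nodes | nodes second ].
closeTag := '</' asParser , name , $> asParser ==> [ :nodes | nodes second ].
"Character data: one or more characters up to the next $<."
text := $< asParser negate plus flatten.
"element is recursive, so create it first and set its definition later."
element := PPDelegateParser new.
content := (element / text) star.
element setParser: (openTag , content , closeTag ==> [ :nodes |
	"nodes first is the open tag name, nodes third the close tag name"
	nodes first = nodes third
		ifTrue: [ Array with: nodes first with: nodes second ]
		ifFalse: [ PPFailure message: 'tags not matching' at: 0 ] ]).
(element end parse: '<a><b>x</b></a>') isPetitFailure.	"false"
(element end parse: '<a>x</b>') isPetitFailure.	"true"
```

openTag and closeTag are wrapped in actions that answer only the name, so the check can compare nodes first with nodes third directly. As far as I know, sending , to a bare sequence parser appends to that sequence instead of nesting it. That behavior would shift the node indices.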

Another subtlety you can see in the example above is the parser:

    [ :stream | stream position ] asParser

This parser returns the current position in the stream, which makes it
possible to create the failure at the beginning of the close tag.
Again, this is not required to start with. You can create the failure
object as below; it just won't point to the right place in the input:

    PPFailure message: 'tags not matching' at: 0
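This sketch combines the two options above. The pluggable block parser captures the position just before the close tag. That position is passed to PPFailure message:at:, so the failure points at the mismatched close tag instead of offset 0. It is a workspace sketch that assumes PetitParser is loaded. The variable names are illustrative. To keep it short, element content is plain text only, with no nested elements.

```smalltalk
"Workspace sketch (illustrative names, assumes PetitParser is loaded):
 here answers the current stream position without consuming input, so the
 failure can point at the start of the mismatched close tag."
| name openTag closeTag text here element result |
name := #word asParser plus flatten.
openTag := $< asParser , name , $> asParser ==> [ :nodes | nodes second ].
closeTag := '</' asParser , name , $> asParser ==> [ :nodes | nodes second ].
text := $< asParser negate star flatten.
here := [ :stream | stream position ] asParser.
element := openTag , text , here , closeTag ==> [ :nodes |
	"nodes: open name, text, position before '</', close name"
	nodes first = nodes fourth
		ifTrue: [ Array with: nodes first with: nodes second ]
		ifFalse: [ PPFailure message: 'Expected </' , nodes first , '>' at: nodes third ] ].
result := element parse: '<a>x</b>'.
result isPetitFailure.	"true"
result position.	"4, the offset where </b> starts"
```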

Lukas

--
Lukas Renggli
www.lukas-renggli.ch


Re: petit parser help

Ricardo Moran
Hi Lukas, thanks for answering!
Since all this stuff is pretty new to me, it's sometimes hard to tell whether something is an optimization or the right way to do it :)

Cheers,
Richo

In reply to Lukas Renggli's message of Tue, Apr 26, 2011, 2:27 AM.