Hi Lukas, all
I'm finally working on an HTML petit parser (a very basic one, based on the XML petit parser) and I have a serious problem (well... besides my complete ignorance about petit parser, he...)

I need to match these patterns:

openTag, content, closeTag (something like "<html> ... </html>")
inlineTag (something like "<br/>")
openTag (something like "<link ...>" or "<img src='anUrl'>")

So, after trying some variants, I came up with this construct:

    element
        "[39] element ::= EmptyElemTag | STag content ETag"

        ^(self inlineTag / (self openTag, content, self closeTag) / self openTag)
            ==> [ :nodes | ].

    openTag
        ^ $< asParser, qualified, whitespace optional, attributes, whitespace optional, $> asParser

    inlineTag
        ^ $< asParser, qualified, whitespace optional, attributes, whitespace optional, '/>' asParser

    closeTag
        ^ '</' asParser , qualified , whitespace optional , $> asParser

The problem here is that the statement

    self openTag, content, self closeTag

matches

    ...
    <link ...>
    </html>

and for that reason the resulting tree is invalid.

So I need a way to ensure that the openTag name is equal to the closeTag name.

How can I do that?

Cheers,
Esteban
I played with PEGs a while ago. I recommend reading the Wikipedia article on parsing expression grammars. You can use a couple of lookahead-type tricks; pay attention to the bit about "syntactic predicates."

I can't remember the exact phrasing in PetitParser, but there should be an "and" predicate and a "not" predicate that effectively give you lookahead, because they don't consume any input... or something like that. After grokking the whole predicates bit, you will probably look at PetitParser again and it'll just make sense. That's how it went for me anyhow, though it's already rather jumbled because it was about a week of my life a year ago ;)

On Apr 25, 2011, at 2:42 PM, Esteban Lorenzano <[hidden email]> wrote:

> Hi Lukas, all
> I'm finally working on a HTML petit parser (a very basic one, based on XML petit parser) and I have a serious problem (well... besides my complete ignorance about petit parser, he...)
> I need to match this pattern:
>
> openTag, contents, closeTag (that will be something like "<html> ... </html>")
> inlineTag (that will be something like "<br/>")
> openTag (that will be something like "<link ...>" or "<img src='anUrl'>")
>
> so, after try some variants... I came with this construct:
>
> element
>         "[39] element ::= EmptyElemTag | STag content ETag"
>
>         ^(self inlineTag / (self openTag, content, self closeTag) / self openTag)
>                 ==> [ :nodes | ].
>
> openTag
>         ^ $< asParser, qualified, whitespace optional, attributes, whitespace optional, $> asParser
>
> inlineTag
>         ^ $< asParser, qualified, whitespace optional, attributes, whitespace optional, '/>' asParser
>
> closeTag
>         ^'</' asParser , qualified , whitespace optional , $> asParser
>
>
> so... the problem here is that the statement
>
> self openTag, contents, self closeTag
>
> matchs with
>
> ...
> <link ...>
> </html>
>
> and for that reason, the resulting tree is invalid.
>
> So, I need a way to ensure the openTag name is equal to the closeTag name.
>
> How can I do that?
>
> Cheers,
> Esteban
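For reference, the "and" and "not" predicates mentioned in the reply above could look roughly like this in a workspace. This is only a minimal sketch, assuming PetitParser is loaded; the variable names (closeStart, peekClose, textChar, text) are made up for illustration and do not come from the thread.

```smalltalk
"Workspace sketch; all variable names are hypothetical."
| closeStart peekClose textChar text |
closeStart := '</' asParser.

"and: succeeds if the receiver would match here, without consuming input."
peekClose := closeStart and.

"not: succeeds only if the receiver would fail here; also consumes nothing.
 Followed by #any, it reads one character that does not start a close tag."
textChar := closeStart not , #any asParser.

"Read characters up to, but not including, the next close-tag opener."
text := textChar star flatten.

text parse: 'hello</b>'.
```

Lookahead like this helps decide where content stops, but on its own it does not compare the open and close tag names.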
In reply to this post by EstebanLM
I only played a little with PetitParser but I think the answer is in PetitXml>>#element. You see in the action block that it compares the "qualified" of the open and close tags and if they're different it returns a PPFailure. It also takes care of the inlineTag in the same block by asking if the fifth node is '/>'.
    element
        "[39] element ::= EmptyElemTag | STag content ETag"

        ^ $< asParser , qualified , attributes , whitespace optional , ('/>' asParser / ($> asParser , content , [ :stream | stream position ] asParser , '</' asParser , qualified , whitespace optional , $> asParser)) ==> [ :nodes |
            nodes fifth = '/>'
                ifTrue: [ Array with: nodes second with: nodes third with: #() ]
                ifFalse: [
                    nodes second = nodes fifth fifth
                        ifTrue: [ Array with: nodes second with: nodes third with: nodes fifth second ]
                        ifFalse: [ PPFailure message: 'Expected </' , nodes second qualifiedName , '>' at: nodes fifth third ] ] ]
I hope this helps.

Cheers,
Richo

On Mon, Apr 25, 2011 at 6:42 PM, Esteban Lorenzano <[hidden email]> wrote:

> Hi Lukas, all
> I only played a little with PetitParser but I think the answer is in
> PetitXml>>#element. You see in the action block that it compares the
> "qualified" of the open and close tags and if they're different it returns a
> PPFailure. It also takes care of the inlineTag in the same block by asking
> if the fifth node is '/>'.

This "taking care of the inlineTag" is just an ugly optimization, but it makes the parser extremely fast :-)

Ideally you start with a grammar as you propose and then add a check in the callback that the open and close tags are the same, as in the example above.

Another subtlety in the above example is the parser:

    [ :stream | stream position ] asParser

This object returns the position in the stream and makes it possible to create the failure at the beginning of the close tag. Again, this is not required to start with. You can create the failure object like below; it just will not point you to the right place in the input:

    PPFailure message: 'tags not matching' at: 0

Lukas

--
Lukas Renggli
www.lukas-renggli.ch
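The simpler starting point described above (a plain grammar plus a name check in the callback) might be sketched as follows. This is an illustration under assumptions, not the PetitXml code: the tiny name and text rules are stand-ins for the real qualified, attributes and content rules, all variable names are hypothetical, and the failure is created with the message:at: form quoted in the thread.

```smalltalk
"Workspace sketch, assuming PetitParser is loaded."
| name text element |

"Stand-in for the qualified rule: one or more letters, as a string."
name := #letter asParser plus flatten.

"Stand-in for the content rule: any characters other than $<."
text := ($< asParser not , #any asParser) star flatten.

"The sequence is written flat, so nodes has seven entries:
 $<  name  $>  text  '</'  name  $>"
element := ($< asParser , name , $> asParser , text , '</' asParser , name , $> asParser)
	==> [ :nodes |
		(nodes at: 2) = (nodes at: 6)
			ifTrue: [ Array with: (nodes at: 2) with: (nodes at: 4) ]
			ifFalse: [ PPFailure message: 'tags not matching' at: 0 ] ].

"Matching names: the action block answers the Array."
(element parse: '<b>bold</b>') isPetitFailure.

"Mismatched names: the action block answers the PPFailure."
(element parse: '<b>bold</i>') isPetitFailure.
```

Because the failure is created at position 0, it will not point to the offending close tag; recording the stream position, as in the PetitXml version, addresses that.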
Hi Lukas, thanks for answering!

Since all this stuff is pretty new to me, it is sometimes hard to tell whether something is an optimization or the right way to do it :)

Cheers,
Richo

On Tue, Apr 26, 2011 at 2:27 AM, Lukas Renggli <[hidden email]> wrote: