How to approach this PEGParser Grammar Fix.

tty
Hi Folks,

In Xtreams parsing, the grammarWiki/PEGWikiGenerator combination does not parse the Wikimedia headings.
I have copied grammarWiki to grammarWikiMedia and I am slowly building it up in an attempt to isolate the problem.
The grammar looks like this:


grammarWikiMedia
"
"
^
'Page <- (Heading)*

LineCharacter <- [^\n]
Flow <- Escape / Bold / Italic / LinkShort / LinkFull / LineCharacter
Escape <- "**" / "__" / "[["
Bold <- "*" Flow{"*"}
Italic <- "_" Flow{"_"}
LinkShort <- "[" .{&[>\]]} "]"
LinkFull <- "[" Flow{">"} .{"]"}

Line <- Flow{1,"\n"}
Paragraph <- Line
Empty <- "\n"
Whitespace <- [\t\s]*

Heading <- Heading6 / Heading5 / Heading4 / Heading3 / Heading2 / Heading1
Heading1 <- Whitespace "= " Flow{" ="}
Heading2 <- Whitespace "== " Flow{" =="}
Heading3 <- Whitespace "=== " Flow{" ==="}
Heading4 <- Whitespace "==== " Flow{" ===="}
Heading5 <- Whitespace "===== " Flow{" ====="}
Heading6 <- Whitespace "====== " Flow{" ======"}

'



For the Actor, I have copied the PEGWikiGenerator, saving it as PEGWikiMediaGenerator. I have made some minor additions to support H5 and H6 heading levels per Wikimedia standards.
My problem is that Wikimedia seems to like to wrap its <hN></hN> tags within a paragraph: <p><hN></hN></p>
So, while I can parse this input just ducky:

| wikiGrammar wikiParser input output |
wikiGrammar := PEGParser grammarWikiMedia reading positioning. "This is your grammar converted to an xtream."
wikiParser := PEGParser parserPEG parse: 'Grammar' stream: wikiGrammar actor: PEGParserParser new. "This is the parser generated from your grammar."
input := ' = Heading 1 =  == Heading 2 == === Heading 3 === ==== Heading 4 ==== ===== Heading 5 ===== ====== Heading 6 ======'.
output := wikiParser parse: 'Page' stream: input actor: PEGWikiMediaGenerator new. "An actual compiler doing the most basic stuff."
output inspect.
This produces an XMLElement that looks like this:


<div><h1>Heading 1</h1><h2>Heading 2</h2><h3>Heading 3</h3><h4>Heading 4</h4><h5>Heading 5</h5><h6>Heading 6</h6></div>
When I wrap the headings in <p> tags, as in this input...


| wikiGrammar wikiParser input output |
wikiGrammar := PEGParser grammarWikiMedia reading positioning. "This is your grammar converted to an xtream."
wikiParser := PEGParser parserPEG parse: 'Grammar' stream: wikiGrammar actor: PEGParserParser new. "This is the parser generated from your grammar."
input := '<p>= Heading 1 =</p>  <p>== Heading 2 ==</p> <p>=== Heading 3 ===</p> <p>==== Heading 4 ====</p> <p>===== Heading 5 =====</p> <p> ====== Heading 6 ======</p>'.
output := wikiParser parse: 'Page' stream: input actor: PEGWikiMediaGenerator new. "An actual compiler doing the most basic stuff."
output inspect.
my XMLElement looks like this:

<div/>

I am supposing that I have a wayward Grammar specification.

Where should I focus?

Should I hack at
Heading1 <- Whitespace "= " Flow{" ="}
and change "Whitespace" to something else?

Or should I redefine the 

Line <- Flow{1,"\n"}
Paragraph <- Line

duo?

If a general principle exists that will guide me going forward, I would very much appreciate it.

Thank you in advance.

t
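
A likely explanation for the empty <div/>, offered as a sketch rather than a confirmed diagnosis: Page <- (Heading)* is a zero-or-more repetition, and in a PEG a zero-or-more repetition succeeds even when it matches nothing. Every Heading rule expects optional whitespace followed by "= ", so input that begins with "<p>" matches no heading at all and Page quietly succeeds with no children. The Python below only imitates that behavior with a regular expression; it is not Xtreams code, and the names HEADING, parse_heading and parse_page are hypothetical.

```python
import re

# Hypothetical stand-in for the six Heading rules: optional whitespace,
# one to six "=" signs, a space, the title, a space, the same run of "=".
HEADING = re.compile(r"[ \t]*(={1,6}) (.+?) \1(?!=)")


def parse_heading(text, pos):
    """Try to match one heading at pos; return (level, title, new_pos) or None."""
    m = HEADING.match(text, pos)
    if m is None:
        return None
    return len(m.group(1)), m.group(2), m.end()


def parse_page(text):
    """Imitates Page <- (Heading)* : zero or more headings, never fails."""
    pos, headings = 0, []
    while True:
        result = parse_heading(text, pos)
        if result is None:
            break  # zero-or-more stops quietly at the first non-match
        level, title, pos = result
        headings.append((level, title))
    return headings


plain = " = Heading 1 =  == Heading 2 == === Heading 3 ==="
wrapped = "<p>= Heading 1 =</p>  <p>== Heading 2 ==</p>"

print(parse_page(plain))    # three (level, title) pairs
print(parse_page(wrapped))  # empty list: "<p>" matches no Heading rule
```

If this reading is right, the thing to look at is not Whitespace as such but the fact that no rule in the grammar accepts the "<p>" and "</p>" text around a heading.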









_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners

Re: How to approach this PEGParser Grammar Fix.

tty
Figured out how to approach it.
Study Parsing Expression Grammars (PEG):
https://en.wikipedia.org/wiki/Parsing_expression_grammar




--
Sent from: http://forum.world.st/Squeak-Beginners-f107673.html

Re: How to approach this PEGParser Grammar Fix.

tty
In reply to this post by tty
Slowly grokking the big picture.

A grammar is a bunch of PEG rules.

In grammarWiki (and now grammarWikiMedia) the "top rule" is

Page <- (Preformatted / Code / UnorderedList / OrderedList / Heading / Table
/ Paragraph / Empty)*

which is zero or more repetitions of a choice among subrules. The rule for
Preformatted looks like this:

Preformatted <- "---\n" .{"---\n"}


Now here is the neat thing.

In the PEGActor subclass for that grammar, PEGWikiGenerator, there are a
bunch of methods, and within many of those methods are PRAGMAS.

Here is the Preformatted method:


Preformatted: text

        <action: 'Preformatted' arguments: #(2)>
        <action: 'Code' arguments: #(2)>

        ^self
                newElementTag: Preformatted
                elements: (Array with: (self newText: text))

See those pragmas?
They match up exactly with the Grammar rules.

Somehow, something parses the grammar and then goes looking in the Actor
for pragmas. When a pragma in a method matches a rule, I "think" that a
block is stored in the PEGParser, and when that rule is matched during
the parse, that block is executed.
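
The lookup described above can be sketched in Python as an analogy. This is not how Xtreams is implemented: the decorator merely plays the role of the <action:arguments:> pragma, and ACTIONS, action, preformatted and run_action are hypothetical names. The sketch also assumes that arguments: #(2) means "pass the second part of the matched sequence to the method".

```python
# Hypothetical registry: rule name -> (function, 1-based indices of parts to pass).
ACTIONS = {}


def action(rule, arguments):
    """Plays the role of a pragma: tags a function with a rule name."""
    def register(func):
        ACTIONS[rule] = (func, arguments)
        return func  # returning func lets decorators stack, like two pragmas
    return register


@action("Preformatted", arguments=(2,))
@action("Code", arguments=(2,))
def preformatted(text):
    # Stand-in for building the XML element in the real actor.
    return "<pre>" + text + "</pre>"


def run_action(rule, parts):
    """Called when `rule` has matched; `parts` are the results of its sequence."""
    func, indices = ACTIONS[rule]
    return func(*(parts[i - 1] for i in indices))


# Assumption: Preformatted <- "---\n" .{"---\n"} yields two parts,
# and the action receives only the second one.
print(run_action("Preformatted", ["---\n", "some text"]))
```

One method serving two rules falls out naturally here, just as the Preformatted method above carries pragmas for both 'Preformatted' and 'Code'.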








Re: How to approach this PEGParser Grammar Fix.

tty
In reply to this post by tty
So, to answer my initial questions...

The way to debug this is from the bottom up.

Using the rule

Page <- (Paragraph)*

and a breakpoint in the PEGActor's Paragraph method shows that the rule
for Paragraph

Line <- Flow{1,"\n"}
Paragraph <- Line

while wrong (it wraps the entire page in a paragraph), IS being captured by
the Actor.

Interestingly, the Paragraph is executed before the Page. This tells me that
it's an inverse onion approach.

Get the inner rules correct and work outwards.
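
The order observed at the breakpoint is what a recursive parser would be expected to produce: an outer rule cannot finish until its inner rules have matched, so inner actions fire first. The Python below is a minimal sketch of that ordering only; it is not Xtreams code, the rules are simplified (a line is just text up to a newline), and calls, parse_line, parse_paragraph and parse_page are hypothetical names.

```python
calls = []  # records the order in which rule actions fire


def parse_line(text, pos):
    """Simplified Line: non-empty text up to and including the next newline."""
    end = text.find("\n", pos)
    if end == -1 or end == pos:
        return None
    return text[pos:end], end + 1


def parse_paragraph(text, pos):
    """Paragraph <- Line, with its action run as soon as the rule matches."""
    result = parse_line(text, pos)
    if result is None:
        return None
    line, pos = result
    calls.append("Paragraph")
    return "<p>" + line + "</p>", pos


def parse_page(text):
    """Page <- (Paragraph)*, with its action run only after all children."""
    pos, children = 0, []
    while True:
        result = parse_paragraph(text, pos)
        if result is None:
            break
        child, pos = result
        children.append(child)
    calls.append("Page")
    return "<div>" + "".join(children) + "</div>"


print(parse_page("first\nsecond\n"))
print(calls)  # the Paragraph entries come before the Page entry
```

This is why a breakpoint in an inner rule's method is a useful first check: if it never fires, the outer rule has nothing to assemble.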

So, I have a broken Grammar for WikiText and now I have a method to approach
this problem.

1. try to define a rule.
2. put a breakpoint in the method that has a pragma for that rule.
3. ???
4. profit






