[vwnc] how to parse textual markup (Wikitext et al.)

Thomas Schrader
Hi everybody,

(I have tried hard, but I keep getting in my own way and haven't gotten anywhere; please help...)

I *just* want to write a parser for some lightweight textual markup like the following:

---%<------%<------%<------%<------

A section title
========

Some first paragraph
with linebreak.

A subsection title
---------------------

A first paragraph.

A second paragraph.

  * a list item
  * another list item
  * same

Another subsection title
-----------------------------

A first paragraph.

A second paragraph.

---%<------%<------%<------%<------

This should be parsed into a document tree à la DOM, which can then be
transformed into XHTML or anything else. Performance is not an issue!
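A minimal sketch of such a tree, in Python rather than Smalltalk, with hypothetical node names (Document, Section, Paragraph, BulletList) and a renderer that walks it to emit XHTML. It assumes that a line break inside a paragraph is a soft break (rendered as a space) and that the underline style maps directly to a heading level (`====` to `<h1>`, `----` to `<h2>`):

```python
from dataclasses import dataclass, field
from html import escape

# Hypothetical node types for a DOM-like document tree.
@dataclass
class Paragraph:
    lines: list[str]            # source lines, joined with spaces on output

@dataclass
class BulletList:
    items: list[str]

@dataclass
class Section:
    title: str
    level: int                  # assumed: 1 for "====" underlines, 2 for "----"
    children: list = field(default_factory=list)

@dataclass
class Document:
    children: list = field(default_factory=list)

def to_xhtml(node) -> str:
    """Walk the tree and emit an XHTML fragment, one branch per node type."""
    if isinstance(node, Document):
        return "".join(to_xhtml(child) for child in node.children)
    if isinstance(node, Section):
        heading = f"<h{node.level}>{escape(node.title)}</h{node.level}>"
        body = "".join(to_xhtml(child) for child in node.children)
        return f"<div>{heading}{body}</div>"
    if isinstance(node, Paragraph):
        # Assumption: a line break inside a paragraph is a soft break.
        return f"<p>{escape(' '.join(node.lines))}</p>"
    if isinstance(node, BulletList):
        items = "".join(f"<li>{escape(item)}</li>" for item in node.items)
        return f"<ul>{items}</ul>"
    raise TypeError(f"unknown node type: {type(node).__name__}")

# The start of the example markup, built by hand:
doc = Document([
    Section("A section title", 1, [
        Paragraph(["Some first paragraph", "with linebreak."]),
        Section("A subsection title", 2, [
            Paragraph(["A first paragraph."]),
            BulletList(["a list item", "another list item", "same"]),
        ]),
    ]),
])
print(to_xhtml(doc))
```

Keeping the output side in one walker that dispatches on node type means another target format needs only another walker; the parser itself never has to know about XHTML.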

I tried defining separate state objects, each holding closures to be
evaluated for each kind of input line that arrives, but I couldn't come up
with a decent design that allows sections to nest.
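One way to get nesting without one state object per section kind, sketched in Python under the assumption that the input has already been reduced to flat heading and block events: keep a single stack of open sections and, on each new heading, pop everything at the same or a deeper level before pushing the new section. The event shapes and names here are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    title: str
    level: int                   # 0 is reserved for the invisible document root
    children: list = field(default_factory=list)

def nest(events):
    """Build nested sections from a flat list of events:
    ("heading", level, title) opens a section, ("block", content) adds content.
    """
    root = Section("", 0)
    stack = [root]               # stack[-1] is the innermost section still open
    for event in events:
        if event[0] == "heading":
            _, level, title = event
            # A new heading closes every open section at its own level or deeper.
            # Headings are assumed to have level >= 1, so the root is never popped.
            while stack[-1].level >= level:
                stack.pop()
            section = Section(title, level)
            stack[-1].children.append(section)
            stack.append(section)
        else:
            # Ordinary content belongs to whichever section is innermost right now.
            stack[-1].children.append(event[1])
    return root

tree = nest([
    ("heading", 1, "A section title"),
    ("block", "Some first paragraph with linebreak."),
    ("heading", 2, "A subsection title"),
    ("block", "A first paragraph."),
    ("heading", 2, "Another subsection title"),
    ("block", "A first paragraph."),
])
```

Since the stack is the only state, adding a further underline style would only mean another level number, not another state object with its own closures.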

One problem is that a line doesn't know it is a section or subsection
title until the underline arrives on the line after it. As it is, I end
up with lots of case statements, stack juggling and duplicated code.
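The underline problem goes away if the reader peeks one line ahead before deciding what the current line is, instead of revising earlier decisions. A Python sketch, assuming `=` underlines mean level 1, `-` underlines mean level 2, and list items are optional indentation followed by `*`; the token names are made up for illustration:

```python
import re

# Assumed underline characters and the heading level each one signals.
UNDERLINE_LEVELS = {"=": 1, "-": 2}
BULLET = re.compile(r"^\s*\*\s+(.*)$")

def underline_level(line):
    """Return the heading level if `line` is an underline such as '====', else None."""
    s = line.strip()
    if s and s[0] in UNDERLINE_LEVELS and s == s[0] * len(s):
        return UNDERLINE_LEVELS[s[0]]
    return None

def classify(lines):
    """Yield flat line tokens, peeking one line ahead so a title is
    recognised as soon as its underline is visible."""
    i = 0
    while i < len(lines):
        line = lines[i]
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        level = underline_level(nxt)
        if line.strip() and level is not None:
            yield ("heading", level, line.strip())
            i += 2                      # consume title and underline together
            continue
        match = BULLET.match(line)
        if match:
            yield ("item", match.group(1))
        elif line.strip():
            yield ("text", line.strip())
        else:
            yield ("blank",)
        i += 1

sample = ["A section title", "========", "", "Some first paragraph", "with linebreak."]
for token in classify(sample):
    print(token)
```

Because the title and its underline are consumed as a pair, nothing downstream ever sees a "paragraph that later turned into a heading".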

Perhaps I'm just blind to the obvious...

Many thanks for any hints

Thomas


_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: [vwnc] how to parse textual markup (Wikitext et al.)

Steffen Märcker
Hi,

You can use SmaCC (the Smalltalk Compiler-Compiler); the code and an example are attached.
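SmaCC generates a parser from a grammar, so most of the work becomes writing that grammar down. Without guessing at SmaCC's own grammar syntax or at the contents of the attached files, here is the same grammar-first idea as a hand-written recursive-descent parser in Python (a different technique from a generated parser, but driven by the same kind of grammar). It assumes the input has already been split into line tokens of the form ("heading", level, title), ("text", line), ("item", text) and ("blank",); the grammar and token names are hypothetical:

```python
class Parser:
    """Recursive-descent parser for the (assumed) grammar:

        document   ::= block* section(1)*
        section(n) ::= HEADING(n) block* section(m > n)*
        block      ::= paragraph | list
        paragraph  ::= TEXT+
        list       ::= ITEM+

    BLANK tokens separate blocks and are otherwise ignored.
    """

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("eof",)

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def skip_blanks(self):
        while self.peek()[0] == "blank":
            self.pos += 1

    def blocks(self):
        """block*: runs of TEXT become paragraphs, runs of ITEM become lists."""
        out = []
        self.skip_blanks()
        while self.peek()[0] in ("text", "item"):
            kind = self.peek()[0]
            parts = []
            while self.peek()[0] == kind:
                parts.append(self.take()[1])
            out.append(("paragraph" if kind == "text" else "list", parts))
            self.skip_blanks()
        return out

    def sections(self, parent_level):
        """section(m > parent_level)*: recursion on the level gives nesting."""
        out = []
        while self.peek()[0] == "heading" and self.peek()[1] > parent_level:
            _, level, title = self.take()
            body = self.blocks()
            out.append(("section", title, body + self.sections(level)))
        return out

    def document(self):
        result = ("document", self.blocks() + self.sections(0))
        if self.peek()[0] != "eof":     # e.g. an unknown token kind
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return result

tokens = [
    ("heading", 1, "A section title"), ("blank",),
    ("text", "Some first paragraph"), ("text", "with linebreak."), ("blank",),
    ("heading", 2, "A subsection title"), ("blank",),
    ("item", "a list item"), ("item", "another list item"),
]
print(Parser(tokens).document())
```

Each grammar rule becomes one method, and `sections` calling itself with the current level is what lets sections nest to any depth.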

Steffen



(In reply to Thomas Schrader <[hidden email]>, 21.10.2009, 23:35)


Attachment: example markup.txt (390 bytes)
Attachment: parser example.st (20K)