Do we have a simple markdown parser?

17 messages

Do we have a simple markdown parser?

Tim Mackinnon
Hi guys - do we have a simple markdown parser that is reasonably up to date? I did a quick GitHub scan and a few popped out, but I wasn’t convinced I had found one that “everyone” uses (albeit, everyone might be a small sample).

Ideally I don’t want to get sucked into writing another one (a project for a future time).

Tim

Re: Do we have a simple markdown parser?

darth-cheney
Hi Tim,

I was looking into this the other day (along with the potential native implementation) and remembered this was posted in Discord:

It uses an existing C-based Markdown parsing library via FFI.

Evidently, it is quite difficult to make a Markdown parser because there can be a lot of ambiguity. Would be cool to have a full Smalltalk implementation though.

--
Eric

Re: Do we have a simple markdown parser?

gettimothy (via Pharo Smalltalk Users mailing list)
In reply to this post by Tim Mackinnon
I do not know if it works on Pharo, but XTreams has an XTreams-Parsing section that I am currently working with. The existing Wikitext grammar was my starting point.


The existing grammar is under:

XTreams-Parsing -> PEGParser -> grammars -> grammarWiki


Use case is:


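| wikiGrammar wikiParser input output |
Transcript clear.
"Build a parser for the Wikitext grammar from the PEG grammar definition."
wikiGrammar := PEGParser grammarWiki reading positioning.
wikiParser := PEGParser parserPEG parse: 'Grammar' stream: wikiGrammar actor: PEGParserParser new.
"Placeholder input; wrapped in a read stream like the grammar above (an assumption)."
input := 'your string input goes here' reading.
"Parse the input as a 'Page' and inspect the generated result."
output := wikiParser parse: 'Page' stream: input actor: PEGWikiGenerator new.
output inspect.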

hth


Re: Do we have a simple markdown parser?

Tim Mackinnon
In reply to this post by darth-cheney
Thanks Eric - I did have a peek at Phoedown (as I did recall that announcement) but as my use case is for a generic library for testing (that I could potentially share with the wider community), I wasn’t convinced that requiring a separate dependency would fly (for a standalone project of my own it did seem like a good bet and very comprehensive).



Re: Do we have a simple markdown parser?

Tim Mackinnon
In reply to this post by Pharo Smalltalk Users mailing list
Hmm I hadn’t even thought of Xtreams… I always thought it sounded cool, perhaps it’s a place to start - but as you mention, I’m not sure it really gained traction in the Pharo world. My use case is quite simple as the markdown files are simple configuration (and maybe don’t even need a parser - e.g. https://github.com/servirtium/demo-java-climate-tck/blob/master/src/test/mocks/averageRainfallForEgyptFrom1980to1999Exists.md). I was hoping there might be something simple I could run with to help explore how best to use the config in those files.

Let’s see if anyone else mentions something that’s a no-brainer.

Tim


Re: Do we have a simple markdown parser?

Kasper Osterbye
I have a GitHub Markdown parser which translates into Pillar - and from Pillar you can get many things.

It is lacking several aspects, as I personally was most interested in rendering it inside Pharo. The major thing missing (because I could not figure out how to render them) is tables. But take a look at: https://github.com/kasperosterbye/PillarRichTextRender

But perhaps it is better than nothing - your usage scenario is not quite clear to me.

Best,

Kasper


Re: Do we have a simple markdown parser?

Richard O'Keefe
In reply to this post by Tim Mackinnon
Much depends on what you mean by "Markdown".
The current version of the CommonMark spec https://spec.commonmark.org/0.29/
comes to 124 printed pages.  Mind you, some of that is background, some of it is
advice about how to parse Markdown, and a lot of it is examples (over 600 of them).
If I wanted to process Markdown in Smalltalk, I'd probably use the CommonMark reference
implementation in C (cmark) to convert Markdown to XML and parse the XML.
Alternatively, call libcmark through the foreign function interface.


Re: Do we have a simple markdown parser?

Tim Mackinnon
In reply to this post by Kasper Osterbye
Kasper - I can’t get it to load in a P8 image - it seems to stall halfway through (I’ve seen that a few times with Metacello to be honest - I think there is some escaping timeout that causes issues).

Does it load for you?




Re: Do we have a simple markdown parser?

Kasper Osterbye
@Tim. I just verified that it loads on a fresh P8. I am on a Mac, but that should not make any difference.

image.png

Best,

Kasper

Re: Do we have a simple markdown parser?

Tim Mackinnon
Hey thanks - I took another fresh image and this one seemed to work, so I’m not sure what happened in the other image to cause that issue (I’ve seen it before with other projects too - something seems to cause a stall, and then things don’t seem to work …).

Anyway - thanks for putting it forward - I saw it load PP2, so it seems like something in the right direction for me (although my use case might be possible with a regex, but that is just so nasty).
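
As a rough illustration of the regex route mentioned above, here is a small Playground sketch that pulls the heading lines out of a Markdown string with the Regex package that ships with Pharo. The sample text and the variable names are made up for illustration, and the pattern only recognises `#`-style headings, so this is a sketch under those assumptions rather than a Markdown parser.

```smalltalk
| source headings |
"Hypothetical sample input; any Markdown string would do."
source := '# Header 1
Some text
## Header 2
More text'.

"Keep only the lines made of one or more # characters, a space, and a title."
headings := source lines select: [ :line | line matchesRegex: '#+ .*' ].

"Split each heading into its level (the number of # characters) and its title."
headings collect: [ :line |
	| level |
	level := (line indexOf: Character space) - 1.
	level -> (line allButFirst: level + 1) ]
```

Inspecting the last expression should give one association per heading, pairing level 1 with 'Header 1' and level 2 with 'Header 2'. Anything beyond headings (code blocks, lists, nested structure) quickly gets awkward this way, which is presumably the nastiness referred to above.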

Tim


Re: Do we have a simple markdown parser?

Kasper Osterbye
I do not think PP2 is actually used in the GitHub parser. The GitHub parser was just an add-on to the rest of the stuff, and the Pillar markdown parser used PP2.

Anyways, if you have any questions, feel free to ask.

Best,

Kasper

Re: Do we have a simple markdown parser?

Tim Mackinnon
Hey Kasper - it looks like your GHMParser does exactly what I need, in that it parses the markdown spec I am looking at.

One question - GHMAbstractBlock has a children property - when would you expect that to be populated?

E.g. if the markdown was

# Header 1

Some text

## Header 2

More text

## Header 3

Other text


Would you expect Header2 and 3 to be children of Header1? And equally the text blocks to be children of the respective headers?

At the moment they are just represented as a flat list - which I can use, but I was curious about your thoughts on that children property with respect to headings. Digging a bit more, it seems that only Lists use children (maybe that’s right, but I always viewed markdown a bit like a structured document and hence the idea of sections having children too).
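
To make the flat-versus-nested question concrete, here is a minimal Playground sketch of how a flat block list could be folded into sections with children. It deliberately does not use the GHM classes: the `flat` array of `level -> text` associations (with level 0 standing for a plain text block) and the three-element node arrays are hypothetical stand-ins chosen only for illustration.

```smalltalk
| flat root stack |
"Hypothetical stand-in for the parser's flat output:
 heading level -> text, with level 0 marking a plain text block."
flat := {
	1 -> 'Header 1'. 0 -> 'Some text'.
	2 -> 'Header 2'. 0 -> 'More text'.
	2 -> 'Header 3'. 0 -> 'Other text' }.

"Each node is an Array holding: level, text, children."
root := { 0. 'document'. OrderedCollection new }.
stack := OrderedCollection with: root.

flat do: [ :block |
	| node |
	node := { block key. block value. OrderedCollection new }.
	block key = 0
		ifTrue: [
			"Plain text belongs to the innermost open section."
			stack last third add: node ]
		ifFalse: [
			"Close sections at the same or a deeper level, then nest under what is left."
			[ stack size > 1 and: [ stack last first >= block key ] ]
				whileTrue: [ stack removeLast ].
			stack last third add: node.
			stack addLast: node ] ].

root
```

Inspecting `root` should show Header 1 as the only top-level child, with Some text, Header 2 and Header 3 nested under it, and each remaining text block under its own heading. The flat list is left untouched, so the same idea can sit on top of an existing parse result.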


I’ll see how I get on, as this gives me a big leg up. If it does prove useful, might there be a possibility to either extract it into a separate project, or have a baseline group that just loads the parser?

Tim


Re: Do we have a simple markdown parser?

Ramon Leon-5
In reply to this post by Tim Mackinnon
On 2020-03-24 10:51 a.m., Tim Mackinnon wrote:
> Hi guys - do we have a simple markdown parser that is reasonably up to date?

What's wrong with the real markdown itself? I've used the original Markdown.pl implementation for years same as I would any other shell script, via OSProcess

markdown: someContent
   ^UnixProcess pipeString: someContent throughCommand: (FileDirectory default fullPathFor: 'Markdown.pl')


--
Ramón León



Re: Do we have a simple markdown parser?

darth-cheney
Hi Ramón,

I have a couple of questions. If you are using OSProcess in Pharo 8, how are you installing it and from what repository? If I add the SqueakSource version, I do not have the method that you are referencing.

Additionally, it would be good to see an example of this in OSSubprocess. Using the latest Baseline installation instructions for it from GitHub, and from reading the documentation, I don't see a clear way to do this. The documentation claims that you can redirect stdin to any file or read stream, but the new ZnStreams in P7/P8 DNU on #isOssPipe, and StandardFileStream (for which OSSubprocess was designed, evidently) is being deprecated.

If anyone reading has a good concise example of running a subprocess command with input from a stream (or string), that would be very useful. Thanks.

--
Eric

Re: Do we have a simple markdown parser?

Kasper Osterbye
In reply to this post by Tim Mackinnon
Hi Tim,

I had the same need for more structured access to a parsed document. I believe this class does something like that

My parser and that of Pillar produce what you call "flat" parses, so this thing creates a superstructure on top of the actual tree, and leaves the actual tree intact. It seems I even made a gtInspector for the thing.

If you find it useful, I can try to move it to PillarRichTextRender and add a baseline group or make it a separate project.

Best,

Kasper



Re: Do we have a simple markdown parser?

Ramon Leon-5
In reply to this post by darth-cheney
On 2020-03-26 3:24 p.m., Eric Gade wrote:
> Hi Ramón,
>
> I have a couple of questions. If you are using OSProcess in Pharo 8

I'm not, I don't try and keep up with the latest stuff, too much churn.  But I'd imagine the latest must still be able to pipe out to a command even if the API changed a bit.

>  but the new ZnStreams in P7/P8 DNU on #isOssPipe and StandardFileStream (for which OSSubprocess was designed, evidently) is being deprecated.

And this is why it's not worth riding the bleeding edge.  Go back and find a stable older version you like and stick with it. Let others get all cut up playing with unstable new stuff.

--
Ramón León



Re: Do we have a simple markdown parser?

David T. Lewis
On Fri, Mar 27, 2020 at 08:35:47AM -0700, Ramon Leon wrote:
> On 2020-03-26 3:24 p.m., Eric Gade wrote:
> >Hi Ramón,
> >
> >I have a couple of questions. If you are using OSProcess in Pharo 8
>
> I'm not, I don't try and keep up with the latest stuff, too much churn.  
> But I'd imagine the latest must still be able to pipe out to a command even
> if the API changed a bit.
>

For Pharo 7, you may want to use these:

https://github.com/dtlewis290/OSProcess-Tonel
https://github.com/dtlewis290/CommandShell-Tonel

I have not tried Pharo 8, so I cannot say if it works there.

Dave
