HTML5 Parser


HTML5 Parser

Mohammad Al Houssami (Alumni)

Hello again,

 

I'm building an HTML5 parser in Smalltalk.
I'm building it according to the pseudocode provided by the WHATWG, mainly these two sections:

Tokenization: http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html

Tree Construction: http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html

 

I'm still in the tokenization phase.
The spec defines a state machine. It describes what has to be done when we reach a certain state.

My approach to building the tokenizer is to represent each state as a method.

Each method performs some operations (calling other methods to represent a change of state, returning tokens, etc.).
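To make the state-per-method idea concrete, here is a minimal sketch. It is written in Python rather than Smalltalk purely for illustration, and every name in it (`Tokenizer`, `data_state`, `tag_open_state`, `tag_name_state`, the token tuples) is hypothetical. It covers only three heavily simplified states, not the full set the spec defines. One variation from the description above: instead of calling the next state's method directly, each method returns the next state and a small loop drives the machine, which avoids unbounded call depth on long inputs.

```python
EOF = None  # sentinel for end of input (an assumption of this sketch)


class Tokenizer:
    """Toy tokenizer: one method per state, each returning the next state."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.tokens = []     # emitted tokens as (kind, value) tuples
        self.tag_name = ""   # name of the start tag being built

    def next_char(self):
        """Consume and return the next character, or EOF at end of input."""
        if self.pos >= len(self.text):
            return EOF
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def run(self):
        """Drive the state machine until a state returns None."""
        state = self.data_state
        while state is not None:
            state = state()
        return self.tokens

    def data_state(self):
        ch = self.next_char()
        if ch is EOF:
            self.tokens.append(("eof", None))
            return None
        if ch == "<":
            return self.tag_open_state
        self.tokens.append(("char", ch))
        return self.data_state

    def tag_open_state(self):
        ch = self.next_char()
        if ch is not EOF and ch.isascii() and ch.isalpha():
            self.tag_name = ch.lower()
            return self.tag_name_state
        # Simplified: end tags, comments and doctypes are not handled here.
        # Emit '<' as text and reconsume the character in the data state.
        self.tokens.append(("char", "<"))
        if ch is not EOF:
            self.pos -= 1
        return self.data_state

    def tag_name_state(self):
        ch = self.next_char()
        if ch is EOF:
            self.tokens.append(("eof", None))
            return None
        if ch == ">":
            self.tokens.append(("start_tag", self.tag_name))
            return self.data_state
        # Simplified: attributes and self-closing tags are not handled here.
        self.tag_name += ch.lower()
        return self.tag_name_state
```

For example, `Tokenizer("a<b>c").run()` returns `[("char", "a"), ("start_tag", "b"), ("char", "c"), ("eof", None)]`. The same shape carries over to Smalltalk by having each state method answer the selector of the next state and performing it in a loop.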

 

I will be translating the provided pseudocode into Smalltalk as is.

I am not sure whether this is the best approach, especially since I am still new to Smalltalk.

 

I was told by Stephane Ducasse to use PetitParser.

From a quick reading, I noticed that it is used when a grammar is available. In my case I don't have a grammar, only the pseudocode of a parser. Does anyone have suggestions for such a project, or comments on the approach I am planning to follow?

 

Thanks in advance,

Mohammad 


Re: HTML5 Parser

philippeback
Learn the PetitParser way of doing things. It takes a while to grok.
Well, at least, there is still a ton to learn AFAIAC.

But it is very powerful.

Phil
In reply to Mohammad Al Houssami (Alumni), 2013/3/11.