HTML parser


HTML parser

Sylvain Pralon

Hi,

Can you recommend a parcel that can parse HTML?

I tried an XML parser, but even the Google page is not valid XHTML: the meta tags are not closed.

Maybe the XML parser is a little too strict…

I saw the twoFlower parser, but it seems to be old and unmaintained.

Any ideas?

Thanks

_______________________________________________
Seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside

Re: HTML parser

Philippe Marschall
2007/4/12, Sylvain Pralon <[hidden email]>:

> Hi,
>
> Can you recommend a parcel that can parse HTML?
>
> I tried an XML parser, but even the Google page is not valid XHTML: the
> meta tags are not closed.

And the Google page doesn't claim to be XHTML.

> Maybe the XML parser is a little too strict…

No, as its name says, it parses XML and not HTML.
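
To illustrate the difference, here is a minimal sketch in Python (not Smalltalk), using only the standard library. The sample page and the helper names `parses_as_xml` and `start_tags` are made up for the example; the point is only that a conforming XML parser must reject an unclosed `<meta>` tag, while an HTML parser accepts it.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample page: the <meta> tag is never closed,
# which is legal in HTML but not well-formed XML.
PAGE = (
    '<html><head>'
    '<meta http-equiv="content-type" content="text/html">'
    '<title>Search</title>'
    '</head><body><p>Hello</p></body></html>'
)


def parses_as_xml(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Record the name of every start tag the HTML parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def start_tags(text):
    """Return the start tags found by the lenient HTML parser."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags
```

With these definitions, `parses_as_xml(PAGE)` is false because the XML parser hits `</head>` while `<meta>` is still open, whereas `start_tags(PAGE)` walks the whole page without complaint.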

> I saw the twoFlower parser, but it seems to be old and unmaintained.
>
> Any ideas?

For Squeak there is:
http://www.squeaksource.com/htmlcssparser.html

Philippe



RE: HTML parser

Sylvain Pralon
I am on VisualWorks, so I'll look for an equivalent.


-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On behalf of Philippe Marschall
Sent: Thursday, 12 April 2007 13:32
To: Seaside - general discussion
Subject: Re: [Seaside] HTML parser


Re: HTML parser

Philippe Marschall
In reply to this post by Philippe Marschall
2007/4/12, Sylvain Pralon <[hidden email]>:
> I am on VisualWorks, so I'll look for an equivalent.

If all else fails, you can pass the page through tidy and
get well-formed (and valid) XHTML.
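
A rough sketch of that approach, in Python for illustration. It assumes the HTML Tidy command-line tool is installed and reachable as `tidy` on the PATH; the function name `html_to_xhtml` is made up for the example. The result can then be handed to an ordinary XML parser.

```python
import shutil
import subprocess

# Command-line switches documented by HTML Tidy:
#   -asxhtml  convert the input to well-formed XHTML
#   -numeric  write numeric entities instead of named ones
#   -quiet    suppress nonessential output
#   -utf8     treat input and output as UTF-8
TIDY_COMMAND = ["tidy", "-asxhtml", "-numeric", "-quiet", "-utf8"]


def html_to_xhtml(html):
    """Pipe an HTML string through tidy and return the XHTML it produces."""
    if shutil.which(TIDY_COMMAND[0]) is None:
        raise RuntimeError("tidy executable not found on PATH")
    result = subprocess.run(
        TIDY_COMMAND,
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Tidy exits with 0 when the input was clean, 1 when it only
    # issued warnings, and 2 when it found errors it could not fix.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.decode("utf-8", "replace"))
    return result.stdout.decode("utf-8")
```

Treating exit status 1 as success is deliberate: real-world pages almost always produce warnings, and tidy still writes the repaired document in that case.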

Philippe


Re: HTML parser

tblanchard
In reply to this post by Philippe Marschall
You can probably port it without too much trouble; it mostly relies
on streams, IIRC. It is pretty forgiving of rotten input.
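
To give an idea of what "forgiving" can mean in practice, here is a small Python sketch. It is not the Squeak parser's algorithm, just an illustration built on the standard library's `html.parser`; `Node`, `ForgivingTreeBuilder`, `parse_html`, and `outline` are names invented for the example. It never raises on bad nesting: void elements such as `<meta>` are not expected to close, a closing tag also closes anything left open inside it, and stray closing tags are ignored.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []  # child Nodes and text strings


class ForgivingTreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, together
        # with anything that was left open inside it.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return
        # No matching open element: a stray end tag, silently ignored.

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(data)


def parse_html(text):
    builder = ForgivingTreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the element tree as a list of indented tag names."""
    lines = []
    for child in node.children:
        if isinstance(child, Node):
            lines.append("  " * depth + child.tag)
            lines.extend(outline(child, depth + 1))
    return lines
```

For example, `parse_html('<div><b>bold</div></i><p>text')` yields a `div` containing `b`, followed by a sibling `p`: the unclosed `<b>` is closed along with its `<div>`, and the stray `</i>` is dropped.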

-Todd Blanchard

On Apr 12, 2007, at 5:54 AM, Sylvain Pralon wrote:

> I am on VisualWorks, so I'll look for an equivalent.