HTML parser


HTML parser

Sylvain Pralon

Hi,

Can you recommend a parcel that can parse HTML?

I tried an XML parser, but even the Google page is not valid XHTML: its meta tags are not closed.

Maybe the XML parser is a little too strict…

I saw the twoFlower parser, but it seems to be old and unmaintained.

Any ideas?

Thanks


Re: HTML parser

Josef Springer
Hi Sylvain,

You can parse an XML file with validation turned off:

XML.XMLParser processDocumentInFilename: aFilename beforeScanDo: [:parser | parser validate: false]
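One caveat may be worth illustrating. In the XML specification, validation (checking a document against its DTD) is a separate step from well-formedness (properly nested and closed tags), so switching validation off does not by itself make unclosed tags acceptable. The sketch below shows the distinction in Python rather than Smalltalk, using the standard library's non-validating parser as an analogy; the sample documents and the `parses` helper are made up for illustration, and the VisualWorks parser may behave differently in detail.

```python
import xml.etree.ElementTree as ET

# Well-formed, but invalid against its own DTD (the DTD demands a <to> child).
DTD_INVALID = """<!DOCTYPE note [
  <!ELEMENT note (to)>
  <!ELEMENT to (#PCDATA)>
]>
<note><unexpected/></note>"""

# Typical HTML: the <meta> tag is never closed, so this is not well-formed XML.
NOT_WELL_FORMED = (
    '<html><head><meta name="a" content="b"></head><body></body></html>'
)


def parses(text: str) -> bool:
    """Return True if the non-validating parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print("DTD-invalid but well-formed:", parses(DTD_INVALID))
    print("Unclosed <meta> tag:", parses(NOT_WELL_FORMED))
```

A non-validating parser ignores the DTD violation in the first document but still rejects the second one, which is the situation described in the original question.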

Best regards,
Josef Springer
(Management)

JOOPS, the software company
Postal address: Orlando-di-Lasso Str. 2, D-85640 Putzbrunn
Email: [hidden email]
Phone (office): +49 (0)89 600 6920
Fax: +49 (0)89 600 69220
Web: http://www.joops.com

 



RE: [Seaside] HTML parser

Robin Barendregt
Well, the XML being parsed should at least be well-formed, shouldn't it?
What you could try is to run the HTML through HTML Tidy first and then feed the result to the XML parser.
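The two-stage idea (clean up first, then parse strictly) can be sketched as follows. This is an illustration in Python, not Smalltalk, and it does not use HTML Tidy itself: the `XmlSerializer` class and the `tidy` function are hypothetical stand-ins built on the standard library's lenient `html.parser`, and they only handle simple cases such as void elements and missing closing tags.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# HTML elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class XmlSerializer(HTMLParser):
    """Re-emit HTML as well-formed XML (a crude stand-in for HTML Tidy)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the XML output
        self.open_tags = []  # stack of elements still waiting to be closed

    def handle_starttag(self, tag, attrs):
        rendered = "".join(
            f" {name}={quoteattr(value or '')}" for name, value in attrs
        )
        if tag in VOID:
            self.out.append(f"<{tag}{rendered}/>")  # self-close void elements
        else:
            self.out.append(f"<{tag}{rendered}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID or tag not in self.open_tags:
            return  # stray or redundant closing tag: drop it
        # Close any elements left open inside the one being closed.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))  # re-escape &, < and >

    def result(self) -> str:
        self.close()
        while self.open_tags:  # close whatever is still open at end of input
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def tidy(html: str) -> str:
    """Stage 1: turn lenient HTML into well-formed XML text."""
    serializer = XmlSerializer()
    serializer.feed(html)
    return serializer.result()


RAW_HTML = ('<html><head><meta name="a" content="b"><title>Hi & bye</title>'
            '</head><body><p>one<br>two</body></html>')

# Stage 2: the strict XML parser now accepts the cleaned-up text.
root = ET.fromstring(tidy(RAW_HTML))
```

Real pages contain far messier markup than this sketch handles (comments, doctypes, misnested inline elements, text outside the root element), which is why a dedicated tool such as HTML Tidy is the more robust first stage.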

 
________________________________

From: [hidden email] on behalf of Sylvain Pralon
Sent: Thu 12 April 2007 12:33
To: [hidden email]; [hidden email]
Subject: [Seaside] HTML parser


