Re: [Moose-dev] Getting some tag in an HTML file

Vincent Blondeau

Hi,

Look at the class side of XMLDOMParser: there is a method parse: namespace: validation:. Call that method instead of parse:, passing false for the last two arguments. It should work.
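A minimal sketch of that call, assuming the XMLParser package is loaded and that the full selector is parse:usingNamespaces:validation: (check the class side of XMLDOMParser in your image, since selectors can differ between versions). The variable htmlString and its contents are hypothetical, for illustration only.

```smalltalk
| htmlString document scripts |

"Hypothetical input; replace with the contents of your HTML file."
htmlString := '<html><head><script>alert(1)</script></head><body/></html>'.

"Assumed selector: parse with namespace handling and validation both off."
document := XMLDOMParser
	parse: htmlString
	usingNamespaces: false
	validation: false.

"Collect every <script> element found anywhere in the document."
scripts := document allElementsNamed: 'script'.
```

This only turns off namespace handling and validation; whether that is enough for a given page depends on how badly formed its markup is.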

That said, you should use the SAX parser instead. It is faster and uses less memory, and it makes extracting a single kind of tag very simple.
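A rough sketch of the SAX approach, assuming the SAXHandler class from the XMLParser package and its startElement:attributes:, characters: and endElement: callbacks. The class name ScriptCollector, the package name and the variable htmlString are hypothetical. Methods are shown in the usual Class >> selector notation and would be entered through the browser.

```smalltalk
"Hypothetical handler that keeps only the text found inside <script> elements."
SAXHandler subclass: #ScriptCollector
	instanceVariableNames: 'scripts current'
	classVariableNames: ''
	package: 'ScriptExtraction'

ScriptCollector >> scripts
	"Lazily initialized list of collected script bodies."
	^ scripts ifNil: [ scripts := OrderedCollection new ]

ScriptCollector >> startElement: aQualifiedName attributes: aDictionary
	"Start buffering when a <script> element opens."
	aQualifiedName asLowercase = 'script'
		ifTrue: [ current := WriteStream on: String new ]

ScriptCollector >> characters: aString
	"Buffer text only while inside a <script> element."
	current ifNotNil: [ current nextPutAll: aString ]

ScriptCollector >> endElement: aQualifiedName
	"Store the buffered text when the <script> element closes."
	(aQualifiedName asLowercase = 'script' and: [ current notNil ])
		ifTrue: [
			self scripts add: current contents.
			current := nil ]
```

Usage, under the same assumptions (on: and parseDocument are assumed to be available on SAXHandler):

```smalltalk
| handler |
handler := ScriptCollector on: htmlString.
handler parseDocument.
handler scripts
```

Because the handler only reacts to the events it cares about, no tree of the whole page is built in memory.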

Cheers
Vincent

On 14 August 2015 at 01:31, Alexandre Bergel <[hidden email]> wrote:

>
> Hi!
>
> Together with Nicolas, we are trying to get all the <script …> … </script> elements from HTML files.
> We have tried to use XMLDOMParser, but many web pages are not well formed, so the parser complains.
>
> Has anyone tried to get particular tags from HTML files? This looks like a classic thing to do, so maybe some of you have done it.
> Is there a way to configure the parser to accept broken XML/HTML content?
>
> Cheers,
> Alexandre
> --
> _,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
> Alexandre Bergel http://www.bergel.eu
> ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.