Re: [Moose-dev] Re: Getting some tag in an HTML file

Hannes Hirzel
http://ss3.gemtalksystems.com/ss/Tabular.html

contains an example application of a SAX parser. With SAX you handle only
the elements that are of interest and ignore the rest.
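
To illustrate the "handle only what is of interest" idea, here is a minimal
sketch in Python rather than Pharo. It uses the event-driven
html.parser.HTMLParser from the Python standard library, which is not the
Pharo XMLParser SAX API but follows the same callback style and tolerates
markup that is not well formed. The names ScriptCollector and collect_scripts
are hypothetical, chosen only for this example.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Event-driven handler that keeps only <script> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.scripts = []    # collected (attributes dict, body text) pairs
        self._attrs = None   # attributes of the <script> being read, if any
        self._chunks = []    # text pieces seen inside the current <script>

    def handle_starttag(self, tag, attrs):
        # Every other start tag is simply ignored.
        if tag == "script":
            self._attrs = dict(attrs)
            self._chunks = []

    def handle_data(self, data):
        # Keep text only while we are inside a <script> element.
        if self._attrs is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._attrs is not None:
            self.scripts.append((self._attrs, "".join(self._chunks)))
            self._attrs = None


def collect_scripts(html):
    """Return a list of (attributes, body) for each <script> in html."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return parser.scripts


if __name__ == "__main__":
    # The <p> and <div> below are never closed; the handler does not care.
    page = ('<html><body><p>unclosed <script src="a.js"></script>'
            '<div><script>var x = 1;</script></body>')
    for attrs, body in collect_scripts(page):
        print(attrs, repr(body))
```

The same structure should carry over to a SAX handler in Pharo: react to the
start and end events of the one element you care about, accumulate the
characters in between, and leave all other events unhandled.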

On 8/14/15, Vincent Blondeau <[hidden email]> wrote:

>  Hi,
>
> Look at the class side: there is the method parse: namespace: validation: .
> Call this method instead of parse:, passing false for the last two
> arguments. It should work.
>
> In any case, you should use the SAX parser. It is faster and consumes less
> memory, and it makes it very simple to get only one tag.
>
> Cheers
> Vincent
>
> On 14 August 2015 at 01:31, Alexandre Bergel <[hidden email]> wrote:
>>
>> Hi!
>>
>> Together with Nicolas, we are trying to extract all the
>> <script …> … </script> elements from HTML files.
>> We have tried to use XMLDOMParser, but many web pages are not well
>> formed, so the parser complains.
>>
>> Has anyone tried to get particular tags from HTML files? This looks
>> like a classic thing to do. Maybe some of you have done it.
>> Is there a way to configure the parser to accept broken XML/HTML
>> content?
>>
>> Cheers,
>> Alexandre
>> --
>> _,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
>> Alexandre Bergel  http://www.bergel.eu
>> ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
>>
>>
>> _______________________________________________
>> Moose-dev mailing list
>> [hidden email]
>> https://www.iam.unibe.ch/mailman/listinfo/moose-dev