htmlcssparser package/discovering size of objects


Marcin Tustin
Hello everyone, a slightly involved and multi-part question:
I'm using the package at http://www.squeaksource.com/htmlcssparser (HTML/CSS Parser, or "the parser") to scrape pages: a backlog of about a thousand existing pages, plus two or three new ones a day. I extract parts of each page to put into an RSS feed. If I let the root object of a parse (the Validator's dom object) be garbage collected, the rest of the parse tree stops working properly, because, as far as I can tell, other objects that are only weakly referenced then get collected too.

So, my first question is whether there is a way to estimate the memory overhead of keeping each of these objects around indefinitely.
My second is whether anyone can suggest another way to do it: using a different parser, copying the data into a different structure, or something else.
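
To make the "copy the data into a different structure" idea concrete, here is a minimal sketch in Python rather than Squeak. It does not use the htmlcssparser API at all: Python's standard html.parser stands in for the parser, and the names TitleAndLinks, extract and deep_size are hypothetical, made up for this illustration. The point is only the shape of the approach: pull the few fields the feed needs into plain strings and lists, let everything parser-related be collected, and measure the small record that remains.

```python
import sys
from html.parser import HTMLParser


class TitleAndLinks(HTMLParser):
    """Hypothetical extractor: keeps only the title and link targets as plain strings."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract(html):
    """Parse one page and return a plain dict that holds no reference to the parser."""
    parser = TitleAndLinks()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "links": list(parser.links)}


def deep_size(obj, seen=None):
    """Rough byte estimate: sys.getsizeof summed over obj and its nested containers."""
    seen = set() if seen is None else seen
    if id(obj) in seen:  # count shared objects only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_size(k, seen) + deep_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_size(item, seen) for item in obj)
    return size


page = "<html><head><title> Demo </title></head><body><a href='/a'>A</a></body></html>"
record = extract(page)
print(record, deep_size(record))
```

The byte count is only an estimate and varies by platform, but multiplying it by the number of pages gives a rough idea of what keeping the extracted records (instead of the full parse trees) would cost.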

_______________________________________________
seaside mailing list
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside