Hello everyone, a slightly involved and multi-part question:
I'm using the HTML/CSS Parser package ("the parser") from http://www.squeaksource.com/htmlcssparser to scrape pages (about two or three new pages a day, plus roughly a thousand existing ones) so that I can extract parts of them to put into an RSS feed. If I let the root object of a parse (the Validator's dom object) be garbage collected, the rest of the parse tree stops working properly, as far as I can tell because other objects that are only weakly referenced then get collected too.

So, my first question is whether there's a way to assess how much memory overhead there would be in keeping each of these root objects around indefinitely. My second is whether anyone has advice on another way to do it: using a different parser, copying the data into a different structure, or something else.

_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Well, as the perpetrator of that bit of hackery, I can certainly explain why it gets broken if you let the head object go away.
A node knows its parent through a weak reference, and it knows its offset and length in the original parsed string. The top object owns the parsed string. When a node tries to print itself, it traverses its parents to reach the original text buffer, takes the appropriate substring out of it, and prints that.

This was really useful during debugging, since I could see exactly what hunk of text each node thought it represented (especially since the nodes parse themselves). Reprinting the document should reproduce the original text buffer, or something is wrong somewhere. So that makes for a cheap and cheerful integrity check.

Anyhow, making the parent reference weak was perhaps not a great choice, but it was meant to make some DOM editing operations easier in the future (anticipating possible JavaScript integration).

There are two fixes/workarounds: either never let go of the root, or change the parent code in parsed node to use strong references. It amounts to the same thing.

On Jul 30, 2008, at 7:38 AM, Marcin Tustin wrote:
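To make the mechanism in the reply concrete, here is a minimal sketch of the same idea in Python rather than Squeak Smalltalk. It is not the parser's actual code: the class names `Node` and `Root` and the methods `source` and `text` are hypothetical, chosen only for illustration. It assumes, as described above, that each node stores an offset and length plus a weak reference to its parent, and that only the root owns the text.

```python
import gc
import weakref


class Node:
    """Hypothetical parse node: stores only where its text lives."""

    def __init__(self, start, length, parent=None):
        self.start = start
        self.length = length
        # Weak back-reference: it does not keep the parent alive.
        self._parent = weakref.ref(parent) if parent is not None else None
        self.children = []
        if parent is not None:
            parent.children.append(self)  # parents hold children strongly

    def source(self):
        # Walk up the parent chain until the root supplies the text buffer.
        parent = self._parent() if self._parent is not None else None
        if parent is None:
            raise ReferenceError("root was collected; source text is gone")
        return parent.source()

    def text(self):
        src = self.source()
        return src[self.start:self.start + self.length]


class Root(Node):
    """Hypothetical root: the only object that owns the parsed string."""

    def __init__(self, source):
        super().__init__(0, len(source))
        self._source = source

    def source(self):
        return self._source


def demo():
    doc = "<html><title>Hi</title></html>"
    root = Root(doc)
    title = Node(13, 2, parent=root)  # offset and length of "Hi"
    while_alive = title.text()        # works while the root is reachable
    roundtrip_ok = root.text() == doc  # the "reprint" integrity check
    del root                           # let go of the root
    gc.collect()
    try:
        title.text()
        broke = False
    except ReferenceError:
        broke = True
    return while_alive, roundtrip_ok, broke
```

In this sketch, copying `title.text()` into an ordinary string before dropping the root corresponds to the "copy the data into a different structure" option from the question: the copy stays valid after the tree is gone. Replacing `weakref.ref(parent)` with a plain reference to the parent corresponds to the strong-reference fix, in which holding any node keeps the whole tree and its text alive.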