I've released the underlying technology behind
http://www.badpage.info and placed it on SqueakSource.
Project Description
This is an HTML and CSS parser and DOM that handles rotten HTML and broken CSS quite well. I wrote it to provide validation of web pages, and it is the underlying technology behind http://www.badpage.info. The tag nesting and attribute rules are determined by interpreting the DTDs published by the W3C, which should make it fairly future-proof. The CSS parser understands most of CSS 2 and some of CSS 3, and the CSS selectors can tell whether they match a given DOM node. There is no visual rendering and no calculation of layout.
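To illustrate what "a selector can tell if it matches a DOM node" means, here is a minimal sketch in Python. It is not the released code and does not reflect its API: the names Node, SimpleSelector and selector_matches are hypothetical, and the sketch assumes only tag, #id and .class selectors joined by the descendant combinator (a space).

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical minimal DOM element (not the released library's class)."""
    tag: str
    attrs: dict = field(default_factory=dict)
    parent: Optional["Node"] = None

    def classes(self):
        # The class attribute is a whitespace-separated list of names.
        return self.attrs.get("class", "").split()


@dataclass
class SimpleSelector:
    """One compound selector such as 'div', '#main' or 'p.note'."""
    tag: Optional[str] = None
    id: Optional[str] = None
    classes: tuple = ()

    @classmethod
    def parse(cls, text):
        # Optional tag name (or '*'), then any number of #id / .class parts.
        m = re.fullmatch(r"([a-zA-Z][\w-]*|\*)?((?:[#.][\w-]+)*)", text)
        if m is None or text == "":
            raise ValueError(f"unsupported selector: {text!r}")
        tag = m.group(1)
        parts = re.findall(r"([#.])([\w-]+)", m.group(2))
        ids = [name for kind, name in parts if kind == "#"]
        names = tuple(name for kind, name in parts if kind == ".")
        return cls(
            tag=None if tag in (None, "*") else tag.lower(),
            id=ids[-1] if ids else None,
            classes=names,
        )

    def matches(self, node):
        if self.tag is not None and node.tag.lower() != self.tag:
            return False
        if self.id is not None and node.attrs.get("id") != self.id:
            return False
        # Every class named in the selector must be present on the node.
        return all(name in node.classes() for name in self.classes)


def selector_matches(selector_text, node):
    """Return True if a descendant-combinator selector matches the node."""
    parts = [SimpleSelector.parse(p) for p in selector_text.split()]
    # The rightmost compound selector must match the node itself.
    if not parts or not parts[-1].matches(node):
        return False
    # Remaining parts must match successively higher ancestors.
    ancestor = node.parent
    for part in reversed(parts[:-1]):
        while ancestor is not None and not part.matches(ancestor):
            ancestor = ancestor.parent
        if ancestor is None:
            return False
        ancestor = ancestor.parent
    return True
```

Matching runs right to left: the last part is tested against the node, then the node's ancestors are walked to satisfy the earlier parts. Used for scraping, one would call selector_matches("div#main p.note", node) on each node of a parsed tree and keep the nodes for which it returns True.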
I hereby license it free for almost any use, with the understanding that it may not be used to provide website QA software or services such as might compete with http://badpage.info.
Otherwise, do whatever you like with it. I think it would make a dandy base for a real web browser. I also find it quite useful for scraping web pages.
-----
SqueakMap is not presently responding to requests to send me a new password and I can't remember my old one. When it regains its senses, I'll put it up there as well.
-Todd Blanchard