Is there any kind of HTML parser available for Dolphin (preferably,
something written in Smalltalk)?

Thanks.

Chris Hayes

On Mon, 11 Aug 2003 13:30:30 +0000, Chris Hayes wrote:
> Is there any kind of HTML parser available for Dolphin (preferably,
> something written in Smalltalk)?
>
> Thanks.
>
> Chris Hayes

If you're willing to write your HTML as strict XHTML (all tags closed, quoted attributes, etc.), you could use the ActiveX control to get a DOM, e.g.

    IXMLDOMDocument new
        validateOnParse: false ;
        loadText: '<html></html>' ;
        lastChild

It's not Smalltalk, but it works as advertised.

I wrote a framework on top of that which converts the DOM into a set of Smalltalk HTML/XML classes. That's what I am using to work with HTML in Dolphin. If you're interested, you can download it at:

http://www.reider.net/dolphinsmalltalk/free/software.html

-alan r.

In reply to this post by Chris Hayes-4
Chris,
Rumor control indicates that _some_ html can be parsed by an XML parser. FWIW, I have yet to see that work. Squeak has a nice HTML parser as part of Scamper, and I've used it a couple of times to find errors that I wasn't able to spot myself. IIRC, I had Squeak read the offending HTML from the clipboard, and display it in an object explorer.

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]

In reply to this post by Chris Hayes-4
"Chris Hayes" <[hidden email]> wrote in message
news:WTMZa.6307$[hidden email]...
> Is there any kind of HTML parser available for Dolphin (preferably,
> something written in Smalltalk)?

I have used ActiveX wrapping of Internet Explorer for this. I don't remember if I did it from MS Access or Dolphin, but it should work from Dolphin. I believe you can hide IE, so you can just use it as a parser.

Chris

In reply to this post by Bill Schwab-2
On Mon, 11 Aug 2003 12:37:41 -0500, Bill Schwab wrote:
> Chris,
>
> Rumor control indicates that _some_ html can be parsed by an XML parser.
> FWIW, I have yet to see that work.

I am doing it routinely in Dolphin for non-trivial web pages, using the ActiveX XML parser (see my other response). It's been absolutely unobtrusive except when I forget to close or match up my tags properly. Then it puts out a very good error message.

The one gotcha I hit was using a '<' (less-than operator) in JavaScript. That got interpreted as the beginning of a tag. I just changed it to &lt; and it went away; however, I think the right way was to enclose the contents of the script in <!-- and --> (thinking to myself, so *that's* why they do that... :)

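For anyone trying to reproduce the '<' gotcha described in the post above, here is a minimal workspace sketch. It uses only the messages from Alan's earlier snippet (validateOnParse:, loadText:, lastChild). The variable names and the sample markup are made up for illustration, and it assumes the XML DOM wrapper that provides IXMLDOMDocument is loaded in the image. The CDATA variant is standard XML rather than something suggested in this thread.

```smalltalk
"Sketch only: the sample markup and variable names are illustrative assumptions."
| escaped cdata doc |

"The bare < is written as &lt; so the XML parser does not take it for the start of a tag."
escaped := '<html><head><script>if (a &lt; b) { go(); }</script></head><body></body></html>'.

"Alternative: wrap the script body in a CDATA section, inside which a bare < is allowed."
cdata := '<html><head><script><![CDATA[ if (a < b) { go(); } ]]></script></head><body></body></html>'.

doc := IXMLDOMDocument new.
doc validateOnParse: false.
doc loadText: escaped.    "or: doc loadText: cdata"
doc lastChild             "the html element, as in the earlier snippet"
```

Replacing the &lt; in the first string with a bare < should bring back the parse error described above.
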
In reply to this post by Bill Schwab-2
"Bill Schwab" <[hidden email]> wrote in message news:<bh8jvc$vcqgd$[hidden email]>...
> Chris,
>
> Rumor control indicates that _some_ html can be parsed by an XML parser.
> FWIW, I have yet to see that work. Squeak has a nice HTML parser as part of
> Scamper, and I've used it a couple of times to find errors that I wasn't
> able to spot myself. IIRC, I had Squeak read the offending HTML from the
> clipboard, and display it in an object explorer.

There's also a stricter HTML parser (by which I mean that it will do exactly the right thing with conformant HTML 4, but may give unexpected results for broken markup) on SqueakMap:

http://map1.squeakfoundation.org/sm/package/69778b2e-3884-4490-b18a-1b9a86a201ec

It requires YAXO, which I think has already been ported to Dolphin, and shouldn't be too hard to port itself.

Hi,
While porting the "HTML-Parser" package to Dolphin, it seemed that it needed the ProtoObject framework (prototype objects? see http://russell-allen.com/squeak/prototypes/), which is missing in Dolphin. It appeared to be needed for "HTMLDocument class", which implements the "canContain: aNode" method, where aNode would be an XMLNodeWithElements object at runtime.

Is there another workaround, or have I misunderstood the meaning?

Best regards,

Tk Kuo.

"Avi Bryant" <[hidden email]> wrote:
> "Bill Schwab" <[hidden email]> wrote in message news:<bh8jvc$vcqgd$[hidden email]>...
> There's also a stricter HTML parser (by which I mean that it will do
> exactly the right thing with conformant HTML 4, but may give
> unexpected results for broken markup) on SqueakMap:
> http://map1.squeakfoundation.org/sm/package/69778b2e-3884-4490-b18a-1b9a86a201ec
>
> It requires YAXO, which I think has already been ported to Dolphin,
> and shouldn't be too hard to port itself.

In reply to this post by Christopher J. Demers
Christopher J. Demers wrote:
> > Is there any kind of HTML parser available for Dolphin (preferably,
> > something written in Smalltalk)?
>
> I have used ActiveX wrapping of Internet Explorer for this. I don't
> remember if I did it from MS Access or Dolphin, but it should work
> from Dolphin. I believe you can hide IE, so you can just use it as a
> parser.

I'd be a bit cautious about that approach. I think that, by default, the IE "parser" will also execute any JavaScript, ActiveX, etc., that the page contains. MS give an example of how to change the default, but I, personally, would not trust MS to have got it right if I were considering using IE to parse HTML that might be hostile.

It may well not have mattered for Chris's application, but it's an issue to bear in mind.

(In a previous life I worked on an HTML parser and JavaScript engine for use in a bulk scanning application. The likelihood was high that we'd scan hostile (or accidentally nasty) HTML; it grew disheartening that I had to *keep* explaining that we couldn't possibly use the IE parser....)

-- chris

In reply to this post by Chris Hayes-4
Hey everyone,
Thanks (as always) for the many helpful responses! Hopefully, one of these approaches will do the trick.

Regards,

Chris Hayes

In reply to this post by tgkuo
"kuo" <[hidden email]> wrote in message news:<[hidden email]>...
> Hi,
> In doing the porting of "HTML-Parser" package to Dolphin, it seemed
> that it needed the framework of ProtoObject ( prototype object ? in
> http://russell-allen.com/squeak/prototypes/), ( for "HTMLDocument class",
> implementing "canContain: aNode" method, that would become aNode object (
> XMLNodeWithElements ) at runtime), which is missing in Dolphin.
> Is there other workaround or I misunderstood the meaning?

It definitely doesn't need that framework. ProtoObject is the superclass of Object in Squeak - subclassing from ProtoObject is like subclassing from nil in VW. You also get subclasses of ProtoObject when you load code into Squeak which uses a missing superclass.

My guess is that you loaded HTML-Parser without having YAXO loaded first, which is listed as a dependency. Any classes in the HTML-Parser that were subclasses of YAXO classes would show up as subclasses of ProtoObject instead.

Avi

Thanks for the explanation; it really was due to the missing YAXO.

Some unit tests failed after porting the HTML-Parser code, and I don't know why yet. I'm still working on it, and it may take some time since I'm now restudying XML books to get a clearer and more complete picture. The mechanism and theory underlying YAXO are quite massive, and it needs to keep changing and evolving to conform to the XML (W3C) standards. I've visited its web site at http://www.squeaklet.com/Yax/index.html. I think I could get acquainted with it quickly if good tutorials, test files, or examples were available.

Best regards,

Tk Kuo

"Avi Bryant" <[hidden email]> wrote:
> "kuo" <[hidden email]> wrote in message news:<[hidden email]>...
>
> It definitely doesn't need that framework. ProtoObject is the
> superclass of Object in Squeak - subclassing from ProtoObject is like
> subclassing from nil in VW. You also get subclasses of ProtoObject
> when you load code into Squeak which uses a missing superclass.
>
> My guess is that you loaded HTML-Parser without having YAXO loaded
> first, which is listed as a dependency. Any classes in the
> HTML-Parser that were subclasses of YAXO classes would show up as
> subclasses of ProtoObject instead.
>
> Avi