Hi all,
I'm using the IE WebBrowser control (Shell.Explorer.2) in an HTML parser wrapper class. My implementation was rather simple, creating a new webbrowser instance for each parser instance. But when I was creating multiple parser instances (on the order of 1000s), at a specific point, "Failed to open walkback dialog" would be printed in the Transcript and I had the trace below in the .errors file. When I changed the implementation to share a single webbrowser instance, it ran OK (whew!). I was wondering if this indicated an exhaustion of window handles?

Although using a singleton works, I was hoping to solve this (limitations like multithreads can come back and haunt me). I tried overriding #finalize in the parser class to send AXControlSite>>free, but it seems that #finalize is not called for my parser class. I tried doing (MemoryManager current) collectGarbage; administerLastRites as described (by Blair) in an older post, but still #finalize doesn't get called. Did I miss anything?

Just to test it out, I explicitly sent #free to all the webbrowser instances, but the problem still persists. Any ideas? Thanks in advance.
===================
9:46:50 PM, Sunday, September 05, 2004: Unhandled exception - a Win32Error('Unable to realize a CommandMenuItem(&Copy) (16r579: Invalid menu handle.)')

CommandMenuItem(MenuItem)>>insertIntoMenu:at:info:
[] in Menu>>basicRealize
Array(SequenceableCollection)>>uncheckedFrom:to:keysAndValuesDo:
Array(SequenceableCollection)>>from:to:keysAndValuesDo:
Array(SequenceableCollection)>>from:keysAndValuesDo:
Array(SequenceableCollection)>>keysAndValuesDo:
Menu>>basicRealize
Menu(GraphicsTool)>>realize
Menu(GraphicsTool)>>handle
Menu(GraphicsTool)>>asParameter
Menu>>showIn:position:
RichTextEdit(View)>>trackContextMenu:
RichTextEdit(View)>>wmContextMenu:wParam:lParam:
RichTextEdit(View)>>dispatchMessage:wParam:lParam:
[] in InputState>>wndProc:message:wParam:lParam:cookie:
BlockClosure>>ifCurtailed:
ProcessorScheduler>>callback:evaluate:
InputState>>wndProc:message:wParam:lParam:cookie:
RichTextEdit(View)>>sendMessage:wParam:lParam:
RichTextEdit>>onRightButtonReleased:
===================

--
Regards
Hwee Boon
MotionObj
Hwee Boon,

> I'm using the IE WebBrowser control (Shell.Explorer.2) in an HTML parser
> wrapper class. My implementation was rather simple, creating a new
> webbrowser instance for each parser instance. But when I was creating
> multiple parser instances (on the order of 1000s), at a specific point,
> "Failed to open walkback dialog" would be printed in the Transcript and I
> had the trace below in the .errors file. When I changed the implementation
> to share a single webbrowser instance, it ran OK (whew!). I was wondering
> if this indicated an exhaustion of window handles?

Seems reasonable - it's probably an exhaustion of something.

> Although using a singleton works, I was hoping to solve this (limitations
> like multithreads can come back and haunt me).

I doubt threads will be of much use to you. IIRC, you cannot overlap ActiveX calls.

> I tried overriding
> #finalize in the parser class to send AXControlSite>>free, but it seems
> that #finalize is not called for my parser class.

See #beFinalizable. However, I doubt finalization is going to help you. Andreas Raab has been quite vocal (and convincing) on this topic on the Squeak mailing list, and I am forced to agree based on my own experience. Finalization is a nice safety net, especially for casual experiments in workspaces, but there is no substitute for explicitly releasing things you no longer need, at least when large numbers of expensive external resources are used for short periods of time. I went around with this over ODBC cursors for quite some time.

> Just to test it out, I explicitly sent #free to all the webbrowser
> instances, but the problem still persists. Any ideas? Thanks in advance.

It's probably too late, given that you have so many of them created. Try explicitly freeing them as you no longer need them.

Have you looked at Squeak's HTML parser? I've used it (in Squeak) to see how errant HTML was parsed, and it worked well. It might not be too hard to port, after which you would have a parser under your own control, and would therefore be able to take advantage of background threads.

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]
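A minimal workspace sketch of the "release as you go" pattern Bill describes, assuming nothing about the real parser class: a plain counter stands in for live browser instances, and the blocks stand in for the parsing work and the explicit release. Only the standard #ensure: message is used; which message actually releases the control site is not settled at this point in the thread.

```smalltalk
"Sketch only: 'live' counts stand-in resources, not real browser instances."
| live peak |
live := 0.
peak := 0.
1 to: 1000 do: [:i |
	live := live + 1.	"stand-in for creating a browser instance"
	peak := peak max: live.
	[i printString]	"stand-in for the parsing work"
		ensure: [live := live - 1]].	"stand-in for the explicit release"
Array with: live with: peak	"live is 0 and peak is 1"
```

Because the release sits in the #ensure: block, it runs even if the work block fails, so at most one resource is alive at any time rather than thousands waiting on the garbage collector.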
On Sun, 5 Sep 2004 14:24:13 -0400, Bill Schwab <[hidden email]> wrote:

>> Although using a singleton works, I was hoping to solve this
>> (limitations like multithreads can come back and haunt me).
>
> I doubt threads will be of much use to you. IIRC, you cannot overlap
> ActiveX calls.

What I meant was, I'd be in trouble if somehow I allowed more than one thread to create a parser instance, since they are sharing a single underlying webbrowser. Although I'm definitely not intending to do anything like that, I will be happy if there's a nicer way out :)

> See #beFinalizable. However, I doubt finalization is going to help you.

Aha! Thanks, I didn't realise objects have to be explicitly marked finalizable.

> Andreas Raab has been quite vocal (and convincing) on this topic on the
> Squeak mailing list, and I am forced to agree based on my own experience.
> Finalization is a nice safety net, especially for casual experiments in
> workspaces, but there is no substitute for explicitly releasing things
> you no longer need, at least when large numbers of expensive external
> resources are used for short periods of time. I went around with this
> over ODBC cursors for quite some time.

I did consider this too, see below.

>> Just to test it out, I explicitly sent #free to all the webbrowser
>> instances, but the problem still persists. Any ideas? Thanks in advance.
>
> It's probably too late, given that you have so many of them created. Try
> explicitly freeing them as you no longer need them.

Actually, what I did was to send #free to each AXControlSite instance right after I created/used it, in the same method. I assume this is what you mean? But it does not seem to help, since the code is still failing at the same point.

> Have you looked at Squeak's HTML parser? I've used it (in Squeak) to see
> how errant HTML was parsed, and it worked well. It might not be too
> hard to port, after which you would have a parser under your own control,
> and would therefore be able to take advantage of background threads.

I'll take a look. But my preference would be for the webbrowser control, because I'm already using it in the same application for editing HTML, so however convoluted it may be, at least it has the advantage of being compatible (i.e. it can parse the very same horrible HTML it generated). Perhaps I can try porting it in v2.0 of my app :)

--
Regards
Hwee Boon
MotionObj
Hwee Boon,

> I'll take a look. But my preference would be for the webbrowser control,
> because I'm already using it in the same application for editing
> HTML, so however convoluted it may be, at least it has the advantage of
> being compatible (i.e. it can parse the very same horrible
> HTML it generated).

Are you certain of that? That's a sincere caution based on experience with rich text that didn't make a round trip using nothing but Microsoft's own code.

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]
On Mon, 6 Sep 2004 01:10:44 -0400, Bill Schwab <[hidden email]> wrote:

> Hwee Boon,
>
>> I'll take a look. But my preference would be for the webbrowser control,
>> because I'm already using it in the same application for editing
>> HTML, so however convoluted it may be, at least it has the advantage of
>> being compatible (i.e. it can parse the very same horrible
>> HTML it generated).
>
> Are you certain of that? That's a sincere caution based on experience
> with rich text that didn't make a round trip using nothing but
> Microsoft's own code.

Thanks. The webbrowser control has been doing OK (with its share of quirks of course) as a parser for me _so far_. I hope it stays that way. One reason I am more hopeful about it than, say, your rich text example is that it is notorious for being very forgiving of bad HTML. I plan to investigate how I can replace the webbrowser control for both parsing and editing in a future release. It is good for a fast start, but it gets more torturous the more I use it :(

--
Regards
Hwee Boon
MotionObj
In reply to this post by Bill Schwab-2
Bill,

> Are you certain of that? That's a sincere caution based on
> experience with rich text that didn't make a round trip using nothing
> but Microsoft's own code.

FWIW, all versions of the RichEdit control after 1.0 (possibly 2.0, I haven't got the docs handy) will round trip correctly. It's one of the highlighted selling points :-)

--
Ian

Use the Reply-To address to contact me.
Mail sent to the From address is ignored.
In reply to this post by Bill Schwab-2
"Bill Schwab" <[hidden email]> wrote in message
news:[hidden email]...

> Have you looked at Squeak's HTML parser? I've used it (in Squeak) to see
> how errant HTML was parsed, and it worked well. It might not be too hard to
> port, after which you would have a parser under your own control, and would
> therefore be able to take advantage of background threads.

If anyone is interested, I have just done a partial port of the Squeak HTML parser to Dolphin. Quote from the package comment:

"It is limited to the facilities that I needed for immediate work and found (fairly) easy to transfer. It will, as far as I have been able to check, successfully input and parse a stream on an HTML document, generating an instance of HtmlDocument which consists of instances of appropriate subclasses of HtmlEntity; in the cases I have checked the parse matches that from Squeak.

It will *not* generate formatted text output from the parse; the required class HtmlFormatter and its subclass DHtmlFormatter are present but do not work. All the broken methods are present, in case anyone else feels inclined to take the work further. In consequence, installing this package will produce a long string of error messages on the Transcript. The port has been done in a way which produces a working system with minimum effort."

I don't know whether such a half-completed project is of general interest, but if so I am willing to make it available. I do not intend to take it further myself, since it does all I need at present. It represented about a day's work, but saving that work could be useful. I don't have a working web site, so e-mail me ([hidden email]) if you want it. Usual caveats, of course - it's all experimental, no guarantees of anything, use it at your own risk.

Peter Kenny
Peter,

> If anyone is interested, I have just done a partial port of the Squeak HTML
> parser to Dolphin. Quote from the package comment:
>
> "It is limited to the facilities that I needed for immediate work and found
> (fairly) easy to transfer. It will, as far as I have been able to check,
> successfully input and parse a stream on an HTML document, generating an
> instance of HtmlDocument which consists of instances of appropriate
> subclasses of HtmlEntity; in the cases I have checked the parse matches that
> from Squeak.

An interesting project would be to use SIXX to move Squeak parsings into Dolphin for comparison.

> I don't know whether such a half-completed project is of general interest,

Microsoft seems to do fairly well with half-completed projects :)

> but if so I am willing to make it available. I do not intend to take it
> further myself, since it does all I need at present. It represented about a
> day's work, but saving that work could be useful. I don't have a working
> web site, so e-mail me ([hidden email]) if you want it. Usual
> caveats, of course - it's all experimental, no guarantees of anything, use
> it at your own risk.

Beyond saving work, this strikes me as the kind of effort that is especially likely to benefit from a growing set of unit tests, and it has a logical way to add to the tests. By that I mean: it does something wrong, somebody finds it, makes a new test method, fixes the problem, and benefits from tests accumulated by others. That will work only if the tests are shared along with the code.

I am willing to put it on my web site (labeled as your work of course), but this seems like a good community project, either via SourceForge or Camp Smalltalk. Ideally, it would be packaged for multiple dialects. With luck, it might even apply pressure on Squeak to correctly handle stream exhaustion - one of my long-term goals.

The Squeak maintainer(s) might be willing (or even prefer) to package the code in a way that helps us port new versions to Dolphin. Contacting them is probably the first step. Any takers? If not, let's at least get the package on the web.

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]
>> If anyone is interested, I have just done a partial port of the Squeak
>> HTML parser to Dolphin.

Further to my previous post, I have done some more tests. I have also taken up Bill's offer to mount the package on his web site, but asked him to hold fire until my tests are complete. My tests confirm so far that my port does reproduce accurately the behaviour of the Squeak parser. However, some of that behaviour does not look very sensible, so I am not sure how far it is worthwhile pursuing the port idea.

The first problem is due to Squeak using a different character set, which means that the HTML special characters are converted to something other than intended when displayed. The first thing I saw was that the code came out as the character Ê rather than the expected space; when I looked at the parse of a page from a German newspaper there were mistakes on many accented characters. The fix for this is simple - delete the call to the method Character>>#isoToSqueak.

Much more serious is that the parse can make a complete nonsense of the structure of a web page, because it has built in a test of whether entity A is allowed to contain entity B, and some of the exclusions are of embeddings which occur all over the place in real web pages. For example, in a table cell (tag <td>) the possible contents do not include another table, but web page designers use this all the time to control positioning. Again, in a <table> entity the parser says the next level must be a <tr>, but this is not always so.

The problem is compounded because, when the parser finds an inclusion which it thinks is not permitted, it removes items from its parse stack until it finds one which does allow this inclusion; this in effect terminates the parse of the outer entity, and can leave later elements floating around unconnected (e.g. leaving <tr> or <td> entities which are not sub-entities of any <table>).

I am not sure where to go from here. I could re-think and re-code the entire logic of the parser, or just extend the permitted range of inclusions (as I have already done for a <table> inside a <td>), or give up altogether and try to use the MSIE parser. Like Bill, I like the idea of having a parser under my own control, but I wonder if the effort is worth it. As a first step, I shall run the parser on a wider variety of web pages with some instrumentation to discover what effect the 'illegal inclusion' test is having. I shall report back if anyone is interested.

Peter Kenny
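The stack-popping behaviour Peter describes can be illustrated with a small workspace sketch. This is not the Squeak parser's code: the 'allowed' table and the 'open' block are hypothetical, and the containment rules are a cut-down illustration in which a <td> may not contain a <table>.

```smalltalk
"Sketch only: 'allowed' maps a parent tag to the child tags it accepts.
 The rules are illustrative, not the Squeak parser's real tables."
| allowed stack open |
allowed := Dictionary new.
allowed at: 'body' put: #('table' 'p').
allowed at: 'table' put: #('tr').
allowed at: 'tr' put: #('td').
allowed at: 'td' put: #('p').	"no nested 'table' permitted here"

stack := OrderedCollection new.
open := [:tag |
	"Pop enclosing elements until one accepts the new tag, then push it."
	[stack notEmpty
		and: [((allowed at: stack last ifAbsent: [#()]) includes: tag) not]]
			whileTrue: [stack removeLast].
	stack addLast: tag].

#('body' 'table' 'tr' 'td' 'table') do: [:each | open value: each].
stack asArray	"only 'body' and the inner 'table' remain open"
```

Opening the nested table pops the <td>, <tr> and outer <table> in turn, so the inner table is attached to <body> and any later <tr> or <td> of the outer table has lost its parent. Adding 'table' to the entry for 'td' corresponds to the workaround of extending the permitted inclusions.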
Peter Kenny wrote:

> I am not sure where to go from here. I could re-think and re-code the
> entire logic of the parser, or just extend the permitted range of
> inclusions (as I have already done for a <table> inside a <td>), or give
> up altogether and try to use the MSIE parser. Like Bill, I like the idea
> of having a parser under my own control, but I wonder if the effort is
> worth it. As a first step, I shall run the parser on a wider variety of
> web pages with some instrumentation to discover what effect the 'illegal
> inclusion' test is having. I shall report back if anyone is interested.

[random thoughts]

I'd also like a simple, safe, reliable, lightweight HTML parser and renderer -- but I've pretty much given up on ever finding one :-( The parser's no great problem; I've written one before, and can do it again. But rendering is the tricky bit.

Malformed HTML is a major pain. Unfortunately it's ubiquitous. Two things to look out for that are /very/ common on real web pages are unclosed > and unclosed ". If you want to be able to handle "real" HTML then you /have/ to use a very forgiving parser.

Character sets and page encodings are a problem too. They are difficult to deal with in an 8-bit system like Dolphin. Not a problem I have to worry about for my purposes, though (Hurrah!).

There is no way on God's earth that I'd use a Microsoft supplied HTML parser or renderer for any HTML that I hadn't written myself. The security holes in their code have been (and still are) so huge and so pervasive, and have taken so long to fix, that I can't imagine that code base /ever/ being trustworthy. Then, too, I suspect their ongoing attempts to "fix" the security, whilst remaining unsuccessful, are quite likely to break my applications...

-- chris
In reply to this post by Peter Kenny-2
Peter,

> The first problem is due to Squeak using a different character set, which
> means that the HTML special characters are converted to something other than
> intended when displayed. The first thing I saw was that the code came
> out as the character Ê rather than the expected space; when I looked at the
> parse of a page from a German newspaper there were mistakes on many accented
> characters. The fix for this is simple - delete the call to the method
> Character>>#isoToSqueak.

Hopefully the upcoming Unicode changes will take care of it in Squeak, but that might complicate a port.

> I am not sure where to go from here. I could re-think and re-code the entire
> logic of the parser, or just extend the permitted range of inclusions (as I
> have already done for a <table> inside a <td>),

Clearly that is more than a port. Again, it sounds like there is value in a cross-dialect project.

> or give up altogether and
> try to use the MSIE parser. Like Bill, I like the idea of having a parser
> under my own control, but I wonder if the effort is worth it. As a first
> step, I shall run the parser on a wider variety of web pages with some
> instrumentation to discover what effect the 'illegal inclusion' test is
> having. I shall report back if anyone is interested.

That sounds great. Chris made some excellent points about malformed HTML, and about the risks associated with running arbitrary HTML through Microsoft code; a Smalltalk parser sounds ever more complicated and valuable. As an alternative, have you considered Mozilla? I have no idea whether or not they export their parser, but it might be worth a look.

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]
In reply to this post by Yar Hwee Boon-3
"Yar Hwee Boon" <[hidden email]> wrote in message
news:[hidden email]...

> Hi all,
>
> I'm using the IE WebBrowser control (Shell.Explorer.2) in an HTML parser
> wrapper class. My implementation was rather simple, creating a new
> webbrowser instance for each parser instance. But when I was creating
> multiple parser instances (on the order of 1000s), at a specific point,
> "Failed to open walkback dialog" would be printed in the Transcript and I
> had the trace below in the .errors file. When I changed the implementation
> to share a single webbrowser instance, it ran OK (whew!). I was wondering
> if this indicated an exhaustion of window handles?
>
> Although using a singleton works, I was hoping to solve this (limitations
> like multithreads can come back and haunt me). I tried overriding
> #finalize in the parser class to send AXControlSite>>free, but it seems
> that #finalize is not called for my parser class. I tried doing
> (MemoryManager current) collectGarbage; administerLastRites as described
> (by Blair) in an older post, but still #finalize doesn't get called. Did I
> miss anything?
>
> Just to test it out, I explicitly sent #free to all the webbrowser
> instances, but the problem still persists. Any ideas? Thanks in advance.

Well, AXControlSites are windows, and they need to be closed/destroyed explicitly to free up resources; they are not cleaned up by finalisation. #free should release the contained control, but it will not release the control site itself.

Anyway, regardless of that, you should avoid using a visual control where a non-visual alternative is available and you don't need the visuals. In this case just use the HTML DOM directly, e.g.

    dom := IHTMLDocument2 createObject: 'htmlfile'.
    dom write: #('<HTML><BODY>test</BODY></HTML>').
    dom body outerHTML.

This is assuming you have built the interfaces using the AX Component wizard (listed as the 'Microsoft HTML Object Library' in the registered components list). Full documentation for the (very large) object model is available on MSDN. It is easy to relate the generated methods to the C++ help.

Regards

Blair
On Tue, 14 Sep 2004 13:29:39 +0100, Blair McGlashan <[hidden email]> wrote:
> Well, AXControlSites are windows, and they need to be closed/destroyed
> explicitly to free up resources; they are not cleaned up by finalisation.
> #free should release the contained control, but it will not release the
> control site itself.

Ahh.. are you saying that #free does release the controls, but the problem would have gone away if I had also sent #close?

> Anyway, regardless of that, you should avoid using a visual control where
> a non-visual alternative is available and you don't need the visuals. In
> this case just use the HTML DOM directly, e.g.
>
>     dom := IHTMLDocument2 createObject: 'htmlfile'.
>     dom write: #('<HTML><BODY>test</BODY></HTML>').
>     dom body outerHTML.

Thanks, it never occurred to me to use the underlying DOM directly...

--
Regards
Hwee Boon
MotionObj