Can the IWebBrowser control wrapper be used to parse the HTML DOM?

John Small
Hi,

I wanted to access a web page programmatically but I'm not sure how
to do this.

    IWebBrowser2 new navigate: 'www.google.com'; document

returns invalid call on

    IWebBrowser2>>navigate:

I tried to see how the URLPresenter sets up this control for an external
address.  Is there sample code somewhere in the hierarchy from which
I could cookbook a solution?

Is there a better way to parse an HTML document?

Thanks!

John


Re: Can the IWebBrowser control wrapper be used to parse the HTML DOM?

Ian Bartholomew-17
John,

> I wanted to access a web page programmatically but I'm not sure how
> to do this.
[]
> Is there a better way to parse an HTML document?

I'm not sure what you mean by "parse" here. If you mean you want to get the
textual contents of the page, you can use the following (with an error block,
as I think that's safest on a web link) ...

    iStream := [IStream onURL: 'http://www.google.com']
        on: Error
        do: [:error | ByteArray new readStream].
    iStream contents asString

If you just want to display the HTML page ....

    URLPresenter showOn: 'http://www.google.com'

If you want a parsed representation of the page then I'm afraid I can't
help.

Regards
Ian


Re: Can the IWebBrowser control wrapper be used to parse the HTML DOM?

Blair McGlashan
In reply to this post by John Small
John

You wrote in message news:[hidden email]...

> Hi,
>
> I wanted to access a web page programmatically but I'm not sure how
> to do this.
>
>     IWebBrowser2 new navigate: 'www.google.com'; document
>
> returns invalid call on
>
>     IWebBrowser2>>navigate:
>
> I tried to see how the URLPresenter sets up this control for an external
> address.  Is there sample code somewhere in the hierarchy from which
> I could cookbook a solution?

Just set the value of the model associated with the URLPresenter to the URL
you want to visit. For example:

    ie := URLPresenter show.
    ie value: 'www.google.com'.

As for accessing the document: the IE control is asynchronous, so if you
attempt to access the document immediately after the navigation request you
will more than likely get back nil. You need to wait until the document is
available. There are lots of ways to do this, including listening for the
browser's events (probably #DocumentComplete:, though you'd need to check
MSDN for further details about the events sent by the browser control). BTW:
the selectors of the events triggered can be found by evaluating an
expression such as:

    ie view sink eventNames

Or for a little more detail:

    ie view sink sourceTypeInfo printIDL

You may find it simpler, however, to just poll the control in a loop until
the download is complete. For example:

    [ie view controlDispatch readyState = READYSTATE_COMPLETE]
        whileFalse: [SessionManager inputState pumpMessages].

You may need to disable parts of your application's user interface (if it has
one) while the document is downloading, as otherwise it will be "live". Note
that it is even necessary to wait for the initial about:blank document to
load, which is useful to know if one wants to push generated HTML directly
into the control.

When the document has loaded you can access the document object model:

    ie view controlDispatch document
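
Pulling the steps above together, a rough workspace sketch might look like the following. It is untested and only reuses the expressions already shown in this post; it assumes the control's ready-state property is reachable as #readyState and that the READYSTATE_COMPLETE constant is visible where the code is evaluated.

```smalltalk
"Sketch only: show the browser, navigate, wait for the load, then grab the DOM."
| ie doc |
ie := URLPresenter show.
ie value: 'www.google.com'.

"The control loads asynchronously, so keep pumping window messages
 until it reports that the document is complete."
[ie view controlDispatch readyState = READYSTATE_COMPLETE]
    whileFalse: [SessionManager inputState pumpMessages].

"Only now is the document safe to ask for; before this point it is
 likely to be nil."
doc := ie view controlDispatch document.
```

The loop has no timeout, so a page that never finishes loading would spin forever; a real application would probably want some way to give up.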

The result will be an IDispatch onto the HTMLDocument object. Any access to
the object's properties, or invocation of its methods, will have to go
through #doesNotUnderstand: or explicit calls via #invoke:etc, #getProperty:,
#setProperty:etc. If you are contemplating anything more than simple
manipulation of the DOM, you may want to generate interfaces for at least
part of the object model using the AX component wizard. Be warned, however,
that the full HTML DOM is huge. This is not particularly an issue in a
deployed app, because the stripper will be able to remove most of the
unused interfaces, but you may find it significantly bloats your development
image.
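
To illustrate the two late-bound styles mentioned above, here is a small untested sketch. It assumes `ie` is the URLPresenter from the earlier snippets, that the page has finished loading, and that #getProperty: accepts the property name as a String. `title`, `body` and `innerText` are standard MSHTML DOM properties, not anything Dolphin-specific.

```smalltalk
"Sketch only: late-bound access to the DOM through IDispatch."
| doc title body text |
doc := ie view controlDispatch document.

"Explicit property read by name."
title := doc getProperty: 'title'.

"The same kind of read, relying on #doesNotUnderstand: to forward the
 message to the COM object. This only works if no Smalltalk method of
 the same name gets in the way."
body := doc body.

"The body element is itself an IDispatch, so the same rules apply to it."
text := body getProperty: 'innerText'.
```

For a handful of properties this is probably enough; generated interfaces start to pay off once you are walking large parts of the DOM.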

> Is there a better way to parse an HTML document?

If it is well formed, then the XML parser will do it, but that is a big
'If'. Maybe there is something else available from (for example)
dolphinharbour.org?

Regards

Blair


Re: Can the IWebBrowser control wrapper be used to parse the HTML DOM?

Costas
Blair,

How do I get the IE control to display its menu bar and address bar? In
the view of the control they are already set to true, but they don't
show.


    ie := URLPresenter show.
    ie value: 'www.google.com'.

    ie view controlDispatch AddressBar  ---> true

Regards,

Costas

On Mon, 22 Jul 2002 11:29:14 +0100, "Blair McGlashan"
<[hidden email]> wrote:

>...


Re: Can the IWebBrowser control wrapper be used to parse the HTML DOM?

Blair McGlashan
"Costas" <[hidden email]> wrote in message
news:[hidden email]...
> Blair,
>
> How do I get the IE control to display its menubar and address bar? In
> the view of the control they are already set to true but they don't
> show.
> ...

1) Use Google to do a search for "AddressBar Property"
2) Choose the first search result (an MSDN help page)
3) Note the following comment in the remarks:
        "The WebBrowser object ignores this property"

Similarly for MenuBar. It is the IE browser application, rather than the
IE browser control, which implements the various control bars. I believe it
is possible to automate the entire IE application, but only as a separate
top-level window.

Regards

Blair