posting web forms to external site via VW?

posting web forms to external site via VW?

Rick Flower
Hi all..

After not looking at much Smalltalk code since late
last summer, I'm getting back into it on behalf of my kids' school. I'm
trying to automate some tasks that are currently done manually, and one of
those tasks is to visit a website and do the following:

   1) login via http/https (probably https)
   2) once in, click on a specific link (for report generation)
   3) fill in some fields indicating type of report
   4) click on link to get report
   5) suck up contents of report as a VW temp file or stream?
   6) parse contents (CSV) -- easy part.

I've got some initial code that can do the parsing mentioned in #6, but
I'm not sure what I need to use in order to post web forms to a
3rd-party site, get the responses (processing them if need be), etc. Any ideas?

Is this sort of thing error-prone? Obviously if the page gets updated
(divs renamed, objects renumbered, etc.) then we'll have a problem. I've
not yet looked at the source for the page in question, but am trying to see
if this is doable in VW and, if so, which parcels I need for these
tasks. Thanks in advance!
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: posting web forms to external site via VW?

Mark Roberts
The NetClients package has code for performing GET and POST via
HTTP/HTTPS. You can then do things like:

     aURI readStreamDo: [:rs :meta | pageString := rs contents].

And:

     stream := String new writeStream.
     (HttpRequest post: 'http://localhost/xx/ValueOfFoo')
         addFormKey: 'foo' value: 'bar';
         addFormKey: 'file' value: 'myFile';
         writeOn: stream.
     stream contents.

A description of the API can be found in the Internet Client Developer's
Guide.

I don't know of any code out of the box that can parse the incoming HTML
into a DOM, so I've ended up using code like this:

     beginning := pageString indexOfSubCollection: '<title>' startingAt: 1.
     end := pageString indexOfSubCollection: '</title>' startingAt: beginning.
     titleString := pageString copyFrom: beginning + 7 to: end - 1.

Others may have a more elegant solution.

One potential complication is if the web site uses any session state
encoded using cookies or hidden form elements. Your code may need to
detect those and include something suitable in the requests sent to the
web site.
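
A minimal sketch of picking up one hidden form element, in the same string-and-index style as the title example above. The field name 'token' and the sample markup are made up for illustration, and real pages may order the attributes differently, so treat the marker string as an assumption to adjust per site.

```smalltalk
| pageString marker start stop token |
"Hypothetical page fragment; in practice pageString comes from the GET shown earlier."
pageString := '<input type="hidden" name="token" value="abc123">'.
"Assumed attribute order: name first, then value."
marker := 'name="token" value="'.
token := nil.
start := pageString indexOfSubCollection: marker startingAt: 1.
start = 0 ifFalse:
    [start := start + marker size.
    stop := pageString indexOfSubCollection: '"' startingAt: start.
    stop = 0 ifFalse: [token := pageString copyFrom: start to: stop - 1]].
"token stays nil when the field is missing; otherwise echo it back in the next form."
token isNil ifFalse:
    [(HttpRequest post: 'http://localhost/xx/ValueOfFoo')
        addFormKey: 'token' value: token;
        yourself]
```

The zero checks matter because indexOfSubCollection:startingAt: answers 0 when nothing matches, and copyFrom:to: would then fail on a page whose layout has changed.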

HTH,

M. Roberts
Cincom Systems, Inc.

On 4/20/2010 6:18 AM, [hidden email] wrote:

> [...]
>     1) login via http/https (probably https)
>     2) once in, click on a specific link (for report generation)
>     3) fill in some fields indicating type of report
>     4) click on link to get report
>     5) suck up contents of report as a VW temp file or stream?
>     6) parse contents (CSV) -- easy part.
> [...]

Re: posting web forms to external site via VW?

Henrik Sperre Johansen
In reply to this post by Rick Flower
  On 19.04.2010 23:18, [hidden email] wrote:

> [...]
>     1) login via http/https (probably https)
>     2) once in, click on a specific link (for report generation)
>     3) fill in some fields indicating type of report
>     4) click on link to get report
>     5) suck up contents of report as a VW temp file or stream?
>     6) parse contents (CSV) -- easy part.
> [...]

Here's how I did it some time last year:
- Use a sniffer to collect the actual packet data going back and forth
during a successful manual run-through of the procedure.
- Create requests based on that data for these steps:
     - Login
     - Gathering the available report type parameters
     - Requesting results with parameters selected from the available ones.

- Execute with HttpClient; make sure you have certificates set up nicely
if you plan on using HTTPS*.

- Parse the HttpResponses for the data you need for the next steps.

A possible stumbling block in 7.7 is that when you set contents: to a
string, it automatically changes contentType to 'text'. So if the
server expects a specific MIME type such as
'application/x-www-form-urlencoded', you need to set it AFTER setting
the request contents.
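
A minimal sketch of that ordering, under stated assumptions: the URL and form values are placeholders, the setter name contentType: is inferred from the description above rather than checked against a particular release, and the HTTP classes may need their namespace (Net) imported or spelled out in a workspace.

```smalltalk
| client request response |
client := HttpClient new.
"Placeholder URL and credentials, for illustration only."
request := HttpRequest post: 'https://www.example.com/login'.
request contents: 'user=someone&password=secret'.
"Set the MIME type last, so that contents: cannot overwrite it.
 contentType: is an assumed setter name; check HttpRequest in your image."
request contentType: 'application/x-www-form-urlencoded'.
response := client executeRequest: request
```

If the server still rejects the form, comparing the outgoing request with the sniffer capture from the manual run is a quick way to see which header differs.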

The prereqs for said package I made were:
- HTTPS
- X509 (To check certificate of site I connected to)
- ASN1-Support (To serialize the "valid" X509-certificates in a string
in the image)

All in all, it was easier than I expected :D

Cheers,
Henry

*Speaking of which, is there any way to set up certificates for a client
executing multiple https-requests? I couldn't find a way to make them
"stick" between requests, and had to resort to ugly code like:
[client executeRequest: anHttpsRequest]
    on: Security.SSLBadCertificate
    do:
        [:error |
        "Setting certificate authentication correctly up front seems quite
         impossible when using HttpClient to execute https requests..."
        (error originator
            trustedCertificateMatching: self trustedRootCertificate subjectDNInBytes)
                ifNil:
                    [error originator addTrusted: self trustedRootCertificate.
                    error restart]
                ifNotNil: [error raise]]

Re: posting web forms to external site via VW?

kobetic
In reply to this post by Rick Flower
"Henrik Sperre Johansen"<[hidden email]> wrote:
> *Speaking of which, is there any way to set up certificates for a client
> executing multiple https-requests? I couldn't find a way to make them
> "stick" between requests [...]

You have 2 options:

1) Either you preload the CA certificate into the global default registry:

        X509Registry default addTrusted: trustedRootCertificate

If you don't configure a different registry explicitly, the default is used.

2) If you'd rather not share the registry between different applications in your image, you can use private registries for each. To set up a client with its own private registry you need to do something like this:

        registry := X509Registry new addTrusted: trustedRootCertificate.
        anHttpClient sslContext: (SSLContext newWithSecureCipherSuitesUsing: registry)

HTH,

Martin

Re: posting web forms to external site via VW?

Henrik Sperre Johansen
  On 20.04.2010 00:36, [hidden email] wrote:

> 2) If you'd rather not share the registry between different applications in your image, you can use private registries for each. To set up a client with its own private registry you need to do something like this:
>
> registry := X509Registry new addTrusted: trustedRootCertificate.
> anHttpClient sslContext: (SSLContext newWithSecureCipherSuitesUsing: registry)

Thanks! I got as far as the second part of line 2 in 2) by following the
Security Guide, but the rest of what I read seemed geared toward either
using the context in error handlers or using the context's connection
creation methods.
I should've thought to look at the protocol of NetClient as well as
HttpClient; I couldn't quite figure out how you were supposed to set it
up front by following the logic in a debugger :)

Cheers,
Henry

Re: posting web forms to external site via VW?

Steven Kelly
In reply to this post by Rick Flower
> I don't know of any code out of the box that can parse the incoming HTML
> into a DOM, so I've ended up using code like this:
>
>      beginning := pageString indexOfSubCollection: '<title>' startingAt: 1.
>      end := pageString indexOfSubCollection: '</title>' startingAt: beginning.
>      titleString := pageString copyFrom: beginning + 7 to: end - 1.
>
> Others may have a more elegant solution.

I'm not sure any solution to do with HTML "in the wild" can be elegant,
but I've preferred using streams to strings and indices:

aStream skipThroughAll: '<title>'.
titleString := aStream upToAndSkipThroughAll: '</title>'.

There's some reasonably robust code for parsing HTML in the Webtalk
package in the public repository. The Webtalk class there is a simple
parser for extracting speakable text and links from HTML. It throws most
of the information away, but it should be easy enough to modify
#parseHtml to make it grab what you want. (If you publish a new version,
please use a blessing < Development, otherwise a few blind Finnish
people will find their browser updates to your version! I hadn't thought
the market for this package would be large enough to interest anyone
else :->)

You could also consider using the HTML Tidy library,
http://tidy.sourceforge.net/, if only to preprocess the HTML to make
parsing it easier. I thought BottomFeeder used that, but I'm not seeing
a package for it in the public repository. BottomFeeder also has a
TolerantXML-Parser, which might work for a particular HTML page
(presumably provided it's written in a modern XML-like style).

Wasn't there also a unit testing framework for HTML forms? That might be
useful.

Steve

Re: posting web forms to external site via VW?

kobetic
In reply to this post by Rick Flower
"Steven Kelly"<[hidden email]> wrote:
> Wasn't there also a unit testing framework for HTML forms? That might be
> useful.

The seaside/SUnitToo-Seaside parcel does that. Check out the parcel comment.

HTH,

Martin

Re: posting web forms to external site via VW?

Rick Flower
On Tue, 20 Apr 2010 10:06:11 -0400, [hidden email] wrote:
> "Steven Kelly"<[hidden email]> wrote:
>> Wasn't there also a unit testing framework for HTML forms? That might be
>> useful.
>
> The seaside/SUnitToo-Seaside parcel does that. Check out the parcel
> comment.

Thanks all for the great suggestions!  I'll check them out once I get
my VW7.7NC up and running and loaded with my code again..

-- Rick