Hi all,

After not looking much at any Smalltalk code since late last summer, I'm getting back into it on behalf of my kids' school. I'm trying to automate some tasks that are currently done manually, and one of those tasks is to visit a website and do the following:

1) log in via HTTP/HTTPS (probably HTTPS)
2) once in, click on a specific link (for report generation)
3) fill in some fields indicating the type of report
4) click on a link to get the report
5) suck up the contents of the report as a VW temp file or stream?
6) parse the contents (CSV) -- the easy part.

I've got some initial code that can do the parsing mentioned in #6, but I'm not sure what I need to use in order to post web forms to a 3rd-party site, get responses (processing them if need be), etc. Any ideas?

Is this sort of stuff error prone? Obviously if the page gets updated (divs renamed, objects renumbered, etc.) then we'll have a problem. I've not yet looked at the source for the page in question, but I'm trying to see if this is doable in VW and, if so, what parcels I need for these tasks. Thanks in advance!
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
The NetClients package has code for performing GET and POST via HTTP/HTTPS. You can then do things like:

    aURI readStreamDo: [:rs :meta | pageString := rs contents].

And:

    stream := String new writeStream.
    (HttpRequest post: 'http://localhost/xx/ValueOfFoo')
        addFormKey: 'foo' value: 'bar';
        addFormKey: 'file' value: 'myFile';
        writeOn: stream.
    stream contents.

A description of the API can be found in the Internet Client Developer's Guide.

I don't know of any code out of the box that can parse the incoming HTML into a DOM, so I've ended up using code like this:

    beginning := pageString indexOfSubCollection: '<title>' startingAt: 1.
    end := pageString indexOfSubCollection: '</title>' startingAt: beginning.
    titleString := pageString copyFrom: beginning + 7 to: end - 1.

Others may have a more elegant solution.

One potential complication is if the web site uses any session state encoded using cookies or hidden form elements. Your code may need to detect those and include something suitable in the requests sent to the web site.

HTH,
M. Roberts
Cincom Systems, Inc.
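The index-based title snippet above hard-codes 7 (the size of '<title>') and copies the wrong text or fails when a tag is missing, because indexOfSubCollection:startingAt: answers 0 in that case. A minimal guarded variant, written as a workspace block so it is self-contained, might look like this. The sample page and the 'sessionId' hidden field are made up, and the hidden-field lookup is deliberately naive (it assumes `name=` comes right before `value=` and double-quoted attributes), standing in for the session-state detection mentioned above.

```smalltalk
| textBetween page title sessionId |
"Answer the text between the first openTag and the following closeTag,
 or nil if either marker is missing."
textBetween := [:source :openTag :closeTag |
	| start stop |
	start := source indexOfSubCollection: openTag startingAt: 1.
	start = 0
		ifTrue: [nil]
		ifFalse: [
			"Use the marker's own size instead of a hard-coded 7."
			start := start + openTag size.
			stop := source indexOfSubCollection: closeTag startingAt: start.
			stop = 0
				ifTrue: [nil]
				ifFalse: [source copyFrom: start to: stop - 1]]].

"Made-up sample page with a hypothetical hidden session field."
page := '<html><head><title>Report Request</title></head><body>',
	'<input type="hidden" name="sessionId" value="abc123"></body></html>'.
title := textBetween value: page value: '<title>' value: '</title>'.
"Naive: assumes name= directly precedes value= and double-quoted attributes."
sessionId := textBetween value: page value: 'name="sessionId" value="' value: '"'.
```

Both lookups answer nil rather than raising when the page layout changes, which gives the calling code one obvious place to notice the kind of breakage Rick is worried about.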
In reply to this post by Rick Flower
On 19.04.2010 23:18, [hidden email] wrote:
> [...] visit a website and do the following:
>
> 1) login via http/https (probably https)
> 2) once in, click on a specific link (for report generation)
> 3) fill in some fields indicating type of report
> 4) click on link to get report
> 5) suck up contents of report as a VW temp file or stream?
> 6) parse contents (CSV) -- easy part.

- Use a sniffer to capture the actual packet data going back and forth during a successful manual run-through of the procedure.
- Create requests based on that data for the steps:
  - Login
  - Gathering the available report type parameters
  - Requesting results with parameters selected from the available ones
- Execute them with HttpClient, and make sure you have certificates set up properly if you plan on using HTTPS*.
- Parse the HttpResponses for the data you need for the next steps.
A possible stumbling block in 7.7 is that when you set contents: to a string, it automatically changes contentType to 'text', so if the server expects a specific MIME type like 'application/x-www-form-urlencoded', you need to set it AFTER setting the request contents.

The prerequisites for the package I made for this were:
- HTTPS
- X509 (to check the certificate of the site I connected to)
- ASN1-Support (to serialize the "valid" X509 certificates in a string in the image)

All in all, it was easier than I expected :D

Cheers,
Henry

*Speaking of which, is there any way to set up certificates for a client executing multiple HTTPS requests? I couldn't find a way to make them "stick" between requests, and had to resort to ugly code like:

    [client executeRequest: anHttpsRequest]
        on: Security.SSLBadCertificate
        do: [:error |
            "Setting certificate authentication correctly up front seems quite
             impossible when using HttpClient to execute https requests..."
            (error originator
                trustedCertificateMatching: self trustedRootCertificate subjectDNInBytes)
                    ifNil: [
                        error originator addTrusted: self trustedRootCertificate.
                        error restart]
                    ifNotNil: [error raise]]
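Henrik's step list and his content-type ordering pitfall can be sketched as below. This is a minimal sketch, assuming VW 7.7 NetClients with `HttpRequest` and `HttpClient` visible from the workspace (as in the snippets in this thread) and assuming `contentType:` is the setter whose value `contents:` overwrites. The URLs, form fields, and the `formPost` and `runReport` blocks are hypothetical placeholders for whatever the sniffer capture actually shows.

```smalltalk
| formPost loginRequest reportRequest runReport |
"Hypothetical helper block: build a POST with an already URL-encoded body.
 contents: is sent first because, per the note above, it resets the
 content type in 7.7; contentType: is sent afterwards so it sticks."
formPost := [:url :encodedBody |
	| request |
	request := HttpRequest post: url.
	request contents: encodedBody.
	request contentType: 'application/x-www-form-urlencoded'.
	request].

"Placeholder URLs and form fields; copy the real ones from the sniffer capture."
loginRequest := formPost
	value: 'https://school.example.com/login'
	value: 'user=parent&password=secret'.
reportRequest := formPost
	value: 'https://school.example.com/report'
	value: 'reportType=attendance&format=csv'.

"Not evaluated here, since it needs the network. One HttpClient is reused
 for both requests; any session cookie or hidden field from the login
 response may still have to be copied into reportRequest by hand."
runReport := [:client |
	client executeRequest: loginRequest.
	(client executeRequest: reportRequest) contents].
```

Evaluating `runReport value: HttpClient new` would then answer the report body, assuming `contents` on the response answers the CSV text, ready for the existing CSV parsing code.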
In reply to this post by Rick Flower
"Henrik Sperre Johansen"<[hidden email]> wrote:
> *Speaking of which, is there any way to set up certificates for a client
> executing multiple https-requests? I couldn't find a way to make them
> "stick" between requests, and had to resort to ugly code like: [...]

You have 2 options:

1) Either you preload the CA certificate into the global default registry:

    X509Registry default addTrusted: trustedRootCertificate

If you don't configure a different registry explicitly, the default is used instead.

2) If you'd rather not share the registry between different applications in your image, you can use a private registry for each. To set up a client with its own private registry you need to do something like this:

    registry := X509Registry new.
    registry addTrusted: trustedRootCertificate.
    anHttpClient sslContext: (SSLContext newWithSecureCipherSuitesUsing: registry)

HTH,
Martin
On 20.04.2010 00:36, [hidden email] wrote:
> "Henrik Sperre Johansen"<[hidden email]> wrote:
>> *Speaking of which, is there any way to set up certificates for a client
>> executing multiple https-requests?
> You have 2 options:
>
> 1) Either you preload the CA certificate into the global default registry [...]
>
> 2) If you'd rather not share the registry between different applications
> in your image, you can use private registries for each. [...]
>
> HTH,
>
> Martin

Security Guide, but the rest of what I read seemed geared toward either using the context in error handlers or using the connection-creation methods of the context. I should've thought to look at the protocol of NetClient as well as HttpClient; I couldn't quite figure out how you were supposed to set it up front by following the logic in a debugger :)

Cheers,
Henry
In reply to this post by Rick Flower
> I don't know of any code out of the box that can parse the incoming HTML
> into a DOM, so I've ended up using code like this:
>
>     beginning := pageString indexOfSubCollection: '<title>' startingAt: 1.
>     end := pageString indexOfSubCollection: '</title>' startingAt: beginning.
>     titleString := pageString copyFrom: beginning + 7 to: end - 1.
>
> Others may have a more elegant solution.

I'm not sure any solution that deals with HTML "in the wild" can be elegant, but I've preferred using streams to strings and indices:

    aStream skipThroughAll: '<title>'.
    titleString := aStream upToAndSkipThroughAll: '</title>'.

There's some reasonably robust code for parsing HTML in the Webtalk package in the public repository. The Webtalk class there is a simple parser for extracting speakable text and links from HTML. It throws most of the information away, but it should be easy enough to modify #parseHtml to make it grab what you want. (If you publish a new version, please use a blessing level lower than Development, otherwise a few blind Finnish people will find their browser updating to your version! I hadn't thought the market for this package would be large enough to interest anyone else :->)

You could also consider using the HTML Tidy library, http://tidy.sourceforge.net/, if only to preprocess the HTML to make parsing it easier. I thought BottomFeeder used that, but I'm not seeing a package for it in the public repository. BottomFeeder also has a TolerantXML-Parser, which might work for a particular HTML page (provided it's written in a modern, XML-like style).

Wasn't there also a unit testing framework for HTML forms? That might be useful.

Steve
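Steve notes that the Webtalk class extracts links from HTML. As a rough stand-in (not Webtalk's code), here is a minimal index-based sketch that collects every double-quoted `href` value on a page, which could help locate the report-generation link from step 2. It assumes double-quoted attributes and ignores single-quoted or unquoted ones; the sample page is made up. It uses indexOfSubCollection:startingAt: rather than Steve's stream style only because its answer of 0 makes the not-found case explicit.

```smalltalk
| page marker links position start stop done |
"Made-up sample page; in practice this would be the fetched page string."
page := '<p><a href="report.csv">CSV</a> <a href="help.html">Help</a></p>'.
marker := 'href="'.
links := OrderedCollection new.
position := 1.
done := false.
[done] whileFalse: [
	start := page indexOfSubCollection: marker startingAt: position.
	start = 0
		ifTrue: [done := true]
		ifFalse: [
			"Skip past the marker, then take everything up to the closing quote."
			start := start + marker size.
			stop := page indexOfSubCollection: '"' startingAt: start.
			stop = 0
				ifTrue: [done := true]
				ifFalse: [
					links add: (page copyFrom: start to: stop - 1).
					position := stop + 1]]].
```

After evaluation, `links` holds the href values in page order; picking the report link is then a matter of testing each one for a known fragment of its URL.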
In reply to this post by Rick Flower
"Steven Kelly"<[hidden email]> wrote:
> Wasn't there also a unit testing framework for HTML forms? That might be
> useful.

The seaside/SUnitToo-Seaside parcel does that. Check out the parcel comment.

HTH,
Martin
On Tue, 20 Apr 2010 10:06:11 -0400, [hidden email] wrote:
> "Steven Kelly"<[hidden email]> wrote:
>> Wasn't there also a unit testing framework for HTML forms? That might be
>> useful.
>
> The seaside/SUnitToo-Seaside parcel does that. Check out the parcel
> comment.

Thanks all for the great suggestions! I'll check them out once I get my VW 7.7 NC up and running and loaded with my code again.

--
Rick