I am writing a small program that will automatically download pictures from a web site (a book that has been scanned and is stored as 358 different pictures at 'http://runeberg.org/geodet/0001.html' and so on). Basically, the behaviour that I want is the same as one gets by right-clicking on a picture in Internet Explorer and choosing "save link as...", but automatic.

I have a working solution which uses URLMonLibrary>>urlDownload:toFile: to download the source of the web page, which is then analysed for links to pictures; those pictures are then downloaded and saved as files, also using #urlDownload:toFile:. Since this solution is a bit complex, especially the part that identifies the links to the pictures, I was wondering if there might be a better solution to the problem. I have investigated the Internet Explorer package with its IWebBrowser class, but have found nothing useful so far.

Best regards,

Mikael Svane
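(For illustration, a minimal sketch of the download call described above; the URLMonLibrary default receiver and the local target path are assumptions, only the #urlDownload:toFile: selector comes from the post itself.)

==========
"Fetch one page of the scanned book to a local file for later parsing.
Receiver and target path are assumed, not taken from the original code."
URLMonLibrary default
	urlDownload: 'http://runeberg.org/geodet/0001.html'
	toFile: 'C:\geodet\0001.html'.
==========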
"Mikael Svane" <[hidden email]> wrote in message
news:[hidden email]...
> I was wondering if there might be a better solution to the problem.

I am not sure about the assumptions one can make, or how general a solution you want. You could continue to do it in a way similar to what you do now, but use a Stream on the page source for parsing, since there is no need to save it to a file. You certainly could drive IE via automation, but unless you need to do lots of parsing or web interaction it may not be worth using.

==========
"Read the page source straight from the URL, find the alt attribute of the
scanned image, then scan backwards to pick out the preceding quoted URL."
htmlStream := FileStream
	on: (IStream onURL: 'http://runeberg.org/geodet/0300.html')
	text: true.
refText := 'alt="scanned image"'.
htmlStream skipToAll: refText.
htmlStream position: htmlStream position - refText size.
"Back up until the closing quote of the preceding attribute value is reached."
[htmlStream pop; peek = $"] whileFalse.
endPos := htmlStream position.
"Continue backwards to the opening quote, then step past it."
[htmlStream pop; peek = $"] whileFalse.
htmlStream skip: 1.
startPos := htmlStream position.
relativeImageURL := htmlStream next: endPos - startPos.
==========

I took a look at the site, and I see that, just as the HTML pages have predictable file names, so do the images, e.g.:

http://runeberg.org/img/geodet/0001.5.png
...
http://runeberg.org/img/geodet/0300.5.png

Since you know the format of the URL and the number of pages, why not do something like this:

============================
firstPageNum := 1.
lastPageNum := 358.
urlCol := (firstPageNum to: lastPageNum) collect: [:pageNum |
	'http://runeberg.org/img/geodet/%04d.5.png' sprintfWith: pageNum].
============================

Then just use your existing code to save all the images in urlCol.

Chris
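Putting the two pieces together, a rough sketch of the whole download loop (this assumes URLMonLibrary default as the receiver for the #urlDownload:toFile: call from the original post, and a local directory C:\geodet\ that would already have to exist):

============================
"Sketch: build each image URL and a matching local file name, then save it
with the existing #urlDownload:toFile: call. The receiver (URLMonLibrary
default) and the target directory are assumptions."
(1 to: 358) do: [:pageNum |
	| url fileName |
	url := 'http://runeberg.org/img/geodet/%04d.5.png' sprintfWith: pageNum.
	fileName := 'C:\geodet\%04d.5.png' sprintfWith: pageNum.
	URLMonLibrary default urlDownload: url toFile: fileName].
============================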