Hello,
I currently have this code: https://github.com/RoelofWobben/Rijksmuseam but it seems to be slow.

Can anyone help me with a way to use some sort of cache, so that the page first looks in the cache to see if an image is there? If so, take the image from there; if not, ask the API for the URL of the image.

Roelof |
What do you want the code to do?
Have you profiled the code to see where the time is going?

A quick look at the code shows
- Paintings does one web get
- each Painting does two more web gets
  ! and the first of those seems to be pretty pointless, as it refetches an object that Paintings already fetched and just looked at.

On Sun, 3 Jan 2021 at 01:16, Roelof Wobben via Pharo-users <[hidden email]> wrote:
> Hello, |
I profiled the root document and see this:

<nkdkdknfekaflcfc.png>

So as far as I can see, most of the time is taken by getting all the data for all 10 images.

I hope someone can check whether I'm on the right track and help me figure out faster ways to achieve the same result.

Roelof
|
Roelof,
Working with multiple high-resolution images, as I believe you are doing, is always going to be a real challenge, performance-wise. It just takes time to transfer lots of data.

First you have to make sure that you are not doing too much work (double downloads, using too high resolutions for previews or browsing). Also, make sure your ultimate client (the browser) can cache as well, if applicable (set modification dates on the response).

Next you could cache images locally (on your app server) so that next time you need the same image, you do not need to download it again. Of course, this only helps if your hit rate is higher than zero (if you actually ask for the same image multiple times).

It is also possible to do multiple download requests concurrently: if the other end is fast enough, that can certainly help.

HTH,
Sven

> On 5-1-2021 at 05:16, Richard O'Keefe wrote:
>> Before you take another step, explore the root document.
>>
>> Profiling is easy.
>> Open a Playground.
>> Type an expression such as
>>   3 tinyBenchmarks
>> Right click and select 'Profile it'.
>>
>> More generally, in a browser, look at the "Tool - Profilers" class category. The classic approach was
>>   MessageTally spyOn: [3 tinyBenchmarks]
>> If I understand correctly, 'Profile it' uses TimeProfiler, which has a nicer interface. (This is in Pharo 8.)
>>
>> On Sun, 3 Jan 2021 at 23:03, Roelof Wobben <[hidden email]> wrote:
>>> I want the code to fetch a URL and some data from the Rijksmuseum API.
>>> And as far as I can see, the second get is not pointless, because it gets more detailed info about the painting than the first get.
>>>
>>> I did not profile it because I never learned how to do that in Pharo.
>>>
>>> Roelof
>>>
>>> On 3-1-2021 at 01:09, Richard O'Keefe wrote:
>>>> What do you want the code to do?
>>>> Have you profiled the code to see where the time is going?
>>>> [...]
|
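Richard's profiling recipe quoted above can be tried from a Playground roughly as follows. This is only a sketch: the block called work is a hypothetical stand-in for whatever expression builds the page (the thread does not show it), and it assumes a Pharo image where MessageTally and BlockClosure>>timeToRun are available.

```smalltalk
| work |
"Hypothetical stand-in for the slow code; replace the body with the
 expression that builds the page."
work := [ 200 timesRepeat: [ (1 to: 10000) inject: 0 into: [ :sum :each | sum + each ] ] ].

"Wall-clock time for one run, shown on the Transcript."
Transcript show: 'Elapsed: ' , work timeToRun printString; cr.

"The classic profiler from the quoted message: runs the block again
 and opens a report of where the time went."
MessageTally spyOn: work.
```

If, as Roelof reports, most of the time is spent fetching data, then fewer requests or a cache will probably help more than tuning the Smalltalk code itself.
|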
Thanks,

Right now I'm downloading/fetching the images again every time.

As I see it, the biggest bottleneck is that I have 10 images. For all 10 I'm fetching the image and the data I could display when a user wants it. So that will be 20 calls to the API.

So maybe some cache could be handy. Any hints on how to make a cache in Smalltalk?

Roelof

On 6-1-2021 at 19:19, Sven Van Caekenberghe wrote:
> Roelof,
>
> Working with multiple high resolution images, as I believe you are doing, is always going to be a real challenge, performance wise. [...]
|
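Besides caching, Sven's suggestion of issuing the requests concurrently could be sketched like this for the 10 paintings. Everything named here is an assumption for illustration: the URLs are made up, and fetch is a stand-in block that only sleeps; in the real code it might be something like ZnClient new get: url.

```smalltalk
| urls fetch results lock done |
"Hypothetical URLs; the real ones would come from the API response."
urls := (1 to: 10) collect: [ :i | 'https://example.org/painting/' , i printString ].

"Hypothetical stand-in for one download. In the real code this might be
 something like: ZnClient new get: url"
fetch := [ :url | (Delay forMilliseconds: 200) wait. 'data for ' , url ].

results := Dictionary new.
lock := Mutex new.        "protects the shared Dictionary"
done := Semaphore new.    "signalled once per finished download"

urls do: [ :url |
	[ [ | data |
		data := fetch value: url.
		lock critical: [ results at: url put: data ] ]
			ensure: [ done signal ] ] fork ].

"Wait until every forked download has finished (or failed)."
urls size timesRepeat: [ done wait ].
results size
```

The ensure: block makes sure the semaphore is signalled even if a download fails, so the waiting loop cannot hang on a missing signal. Whether this helps in practice depends on how fast the remote API answers parallel requests.
|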
On Wed, Jan 6, 2021 at 10:34 AM Roelof Wobben via Pharo-users <[hidden email]> wrote:
> So maybe some cache could be handy.
> Any hints how to make a cache in smalltalk.
Dictionary coupled with #at:ifAbsentPut:
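
A minimal Playground sketch of that idea, under stated assumptions: fetch is a hypothetical stand-in for the real download (which might be something like ZnClient new get: aUrl), the URL is made up, and fetchCount exists only to show how often the block actually runs.

```smalltalk
| cache fetch fetchCount url first second |
cache := Dictionary new.
fetchCount := 0.

"Hypothetical stand-in for the real download, which might be
 something like: ZnClient new get: aUrl"
fetch := [ :aUrl | fetchCount := fetchCount + 1. 'image data for ' , aUrl ].

url := 'https://example.org/painting/1.jpg'.   "made-up URL"

"First lookup: the key is absent, so the block runs and its result is stored."
first := cache at: url ifAbsentPut: [ fetch value: url ].

"Second lookup: the key is present, so the block is not evaluated."
second := cache at: url ifAbsentPut: [ fetch value: url ].

"fetchCount is 1 here, and first and second are the same object."
{ fetchCount. first == second }
```

For this to help in a web application, the Dictionary has to outlive a single page render, for example by keeping it in a class-side variable. Note that a plain Dictionary never evicts anything, so it grows with every new URL.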
|
On 6-1-2021 at 19:36, Richard Sargent wrote:
> Dictionary coupled with #at:ifAbsentPut:

That might work. Then I have to look at how to change my code in moreData to achieve this, or maybe the code in getPaintings.

Roelof |
To avoid doing something like that, and for a web scraping tool I wrote, I implemented a basic subclass of ZnClient (called ZnCachingClient) that uses a disk cache where each key/file is the hash of the requested URL.

I haven't used it in a while, but it should continue to work. In any case, I published it in a GitHub repository: https://github.com/eMaringolo/zinc-caching-client

Best regards,
Esteban A. Maringolo

On Wed, Jan 6, 2021 at 3:37 PM Richard Sargent <[hidden email]> wrote:
> On Wed, Jan 6, 2021 at 10:34 AM Roelof Wobben via Pharo-users <[hidden email]> wrote:
>> So maybe some cache could be handy.
>> Any hints how to make a cache in smalltalk.
>
> Dictionary coupled with #at:ifAbsentPut:
|
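The general idea Esteban describes (a disk cache keyed by a hash of the URL) might look roughly like the sketch below. This is not the code of ZnCachingClient, whose API is not shown in the thread; the directory name, the URL and the fetch block are all made up for illustration, and SHA-256 is just one possible hash.

```smalltalk
| cacheDir fetch cachedGet url |
"Directory for cached files; name and location are made up for this sketch."
cacheDir := FileLocator temp / 'painting-cache'.
cacheDir ensureCreateDirectory.

"Hypothetical stand-in for the real download; it must answer a ByteArray."
fetch := [ :aUrl | ('image data for ' , aUrl) asByteArray ].

cachedGet := [ :aUrl | | file |
	"File name is the hex SHA-256 hash of the URL, so any URL maps to a safe name."
	file := cacheDir / (SHA256 hashMessage: aUrl) hex.
	file exists
		ifTrue: [ file binaryReadStreamDo: [ :in | in upToEnd ] ]
		ifFalse: [ | bytes |
			bytes := fetch value: aUrl.
			file binaryWriteStreamDo: [ :out | out nextPutAll: bytes ].
			bytes ] ].

url := 'https://example.org/painting/1.jpg'.   "made-up URL"
cachedGet value: url.   "first call downloads and writes the file"
cachedGet value: url.   "second call reads the file back"
```

Unlike an in-memory Dictionary, files written this way survive an image restart, at the cost of never expiring unless something deletes them.
|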
You can also try https://github.com/ba-st/Superluminal which has caching support among other things.

On Wed, 6 Jan 2021, 16:42 Esteban Maringolo, <[hidden email]> wrote:
> To avoid doing something like that, and for a web scraping tool I |
Just my input on the cache question. Pharo has a class named LRUCache (Least Recently Used Cache), which is very helpful for this kind of thing. If you want to store the data on disk instead of in the image, Esteban's suggestion seems like the way to go.

Best,
Kasper |
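A small sketch of how LRUCache could be used here, assuming the class as it ships in recent Pharo images. The fetch block and the URL are hypothetical stand-ins, and the limit of 20 entries is an arbitrary choice.

```smalltalk
| cache fetch url |
"Hypothetical stand-in for the real download."
fetch := [ :aUrl | 'image data for ' , aUrl ].

cache := LRUCache new.
cache maximumWeight: 20.   "with the default weight of 1 per entry, keep at most 20"

url := 'https://example.org/painting/1.jpg'.   "made-up URL"
cache at: url ifAbsentPut: [ fetch value: url ].   "miss: the block runs"
cache at: url ifAbsentPut: [ fetch value: url ].   "hit: the block is skipped"

cache hitRatio   "fraction of lookups served from the cache"
```

Compared with a plain Dictionary, the cache stays bounded: once the limit is reached, the least recently used entries are dropped.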