is there a better way

Pharo Smalltalk Users mailing list

is there a better way

Hello,

I now have this code: https://github.com/RoelofWobben/Rijksmuseam

but it seems to be slow.

Can anyone help me with a way to use some sort of cache, so that the page
first checks the cache to see whether an image is there?
If so, take the image from there; if not, ask the API for the URL of
the image.

Roelof

Richard O'Keefe

Re: is there a better way

What do you want the code to do?

Have you profiled the code to see where the time is going?

A quick look at the code shows

 - Paintings does one web get
 - each Painting does two more web gets
   ! and the first of those seems to be pretty pointless,
     as it refetches an object that Paintings already fetched
     and just looked at.

Pharo Smalltalk Users mailing list

Re: is there a better way

In reply to this post by Pharo Smalltalk Users mailing list

I profiled the root document and see this:

<nkdkdknfekaflcfc.png>

So as far as I can see, most of the time is taken by getting all the data for all 10 images.

I hope someone can check whether I'm on the right track and help me figure out faster ways to achieve the same result.

Roelof

On 5-1-2021 at 05:16, Richard O'Keefe wrote:

> Before you take another step, explore the root document.
>
> Profiling is easy.
> Open a Playground.
> Type an expression such as
> 3 tinyBenchmarks
> Right click and select 'Profile it'.
>
> More generally, in a browser, look at the "Tool - Profilers"
> class category. The classic approach was
> MessageTally spyOn: [3 tinyBenchmarks]
> If I understand correctly, 'Profile it' uses TimeProfiler,
> which has a nicer interface. (This is in Pharo 8.)
>
> On Sun, 3 Jan 2021 at 23:03, Roelof Wobben <[hidden email]> wrote:
>> I want the code to fetch a URL and some data from the Rijksmuseum API.
>> As far as I can see, the second get is not pointless, because it gets
>> more detailed info about the painting than the first get.
>>
>> I have not profiled it because I never learned how to do that in Pharo.
>>
>> Roelof

Sven Van Caekenberghe-2

Re: is there a better way

Roelof,

Working with multiple high-resolution images, as I believe you are doing, is always going to be a real challenge, performance-wise. It just takes time to transfer lots of data.

First you have to make sure that you are not doing too much work (double downloads, using too high resolutions for previews or browsing). Also, make sure your ultimate client (the browser) can cache as well if applicable (set modification dates on the response).

Next you could cache images locally (on your app server) so that next time you need the same image, you do not need to download it again. Of course, this only helps if your hit rate is higher than zero (if you actually ask for the same image multiple times).

It is also possible to do multiple download requests concurrently: if the other end is fast enough, that can certainly help.

HTH,

Sven
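
The concurrent-download idea above can be sketched in a Playground roughly as follows. This is only an illustration, not code from the thread: the names urls, fetch, results, mutex and done are hypothetical, and fetch is a stand-in for the real request (for example [ :url | ZnClient new get: url ]).

```smalltalk
"Minimal sketch: run one fetch per URL in its own process, then wait for all.
 fetch is a hypothetical stand-in for the real web request."
| urls fetch results mutex done |
urls := #('painting-1' 'painting-2' 'painting-3').
fetch := [ :url | 'data for ', url ].
results := Dictionary new.
mutex := Mutex new.
done := Semaphore new.

urls do: [ :url |
	[ [ | data |
	    data := fetch value: url.
	    "Dictionary is not thread-safe, so guard the shared write."
	    mutex critical: [ results at: url put: data ] ]
		ensure: [ done signal ] ] fork ].

"Block until every forked process has signalled completion."
urls size timesRepeat: [ done wait ].
results
```

The ensure: block is there so that the waiting process is not left hanging if one fetch fails. Whether this actually helps depends, as noted above, on the other end being fast enough.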

Pharo Smalltalk Users mailing list

Re: is there a better way

In reply to this post by Pharo Smalltalk Users mailing list

Thanks,

Right now I'm downloading/fetching the images again every time.

As I see it, the biggest bottleneck is that I have 10 images.
For each of the 10, I fetch the image and the data I could display when a
user wants it.
So that will be 20 calls to the API.

So maybe a cache could be handy.
Any hints on how to make a cache in Smalltalk?

Roelof

Richard Sargent

Re: is there a better way

On Wed, Jan 6, 2021 at 10:34 AM Roelof Wobben via Pharo-users <[hidden email]> wrote:

> So maybe a cache could be handy.
> Any hints on how to make a cache in Smalltalk?

Dictionary coupled with #at:ifAbsentPut:
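
A minimal sketch of that suggestion, runnable in a Playground. The names cache, fetch and fetchCount are hypothetical and not taken from the Rijksmuseam repository; fetch stands in for the real web request (for example [ :url | ZnClient new get: url ]).

```smalltalk
"Minimal sketch of a Dictionary-backed cache.
 fetch is a hypothetical stand-in for the real web request."
| cache fetch fetchCount first second |
cache := Dictionary new.
fetchCount := 0.
fetch := [ :url |
	fetchCount := fetchCount + 1.
	'data for ', url ].

"The block given to at:ifAbsentPut: runs only when the key is missing."
first := cache at: 'painting-1' ifAbsentPut: [ fetch value: 'painting-1' ].
second := cache at: 'painting-1' ifAbsentPut: [ fetch value: 'painting-1' ].

"Only the first lookup called fetch; the second was answered from the cache."
fetchCount
```

In a real application the Dictionary would have to outlive a single request, for instance by keeping it in an instance or class-side variable; where exactly it belongs in this code base is an open design choice.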

Pharo Smalltalk Users mailing list

Re: is there a better way

On 6-1-2021 at 19:36, Richard Sargent wrote:

> Dictionary coupled with #at:ifAbsentPut:

That could work.
Then I have to look at how to change my code in moreData, or maybe the
code in getPaintings, to achieve this.

Roelof

Esteban A. Maringolo

Re: is there a better way

In reply to this post by Richard Sargent

To avoid doing something like that, and for a web scraping tool I
wrote, I implemented a basic subclass of ZnClient (called
ZnCachingClient) that used a disk cache where each key/file was the
hash of the requested URL.

I haven't used it in a while, but it should still work. In any case,
I published it in a GitHub repository:
https://github.com/eMaringolo/zinc-caching-client

Best regards,

Esteban A. Maringolo

Julián Maestri-2

Re: is there a better way

You can also try https://github.com/ba-st/Superluminal which has caching support among other things.

Kasper Osterbye

Re: is there a better way

Just my input on the cache thing. Pharo has a class named LRUCache (Least Recently Used Cache), which is very helpful for such cases.

If you want to store it on disk instead of in the image, Esteban's suggestion seems like the way to go.

Best,

Kasper
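
A minimal sketch of how LRUCache might be used for this, runnable in a Playground. The names fetch and fetchCount are hypothetical, fetch stands in for the real web request, and the small limit of 2 entries is chosen only to make the eviction visible.

```smalltalk
"Minimal sketch using LRUCache.
 fetch is a hypothetical stand-in for the real web request."
| cache fetch fetchCount |
fetchCount := 0.
fetch := [ :key |
	fetchCount := fetchCount + 1.
	'data for ', key ].
cache := LRUCache new.
"Keep at most 2 entries, assuming the default weight of 1 per entry."
cache maximumWeight: 2.

cache at: 'a' ifAbsentPut: [ fetch value: 'a' ].  "miss"
cache at: 'b' ifAbsentPut: [ fetch value: 'b' ].  "miss"
cache at: 'a' ifAbsentPut: [ fetch value: 'a' ].  "hit; 'a' becomes most recently used"
cache at: 'c' ifAbsentPut: [ fetch value: 'c' ].  "miss; should evict 'b'"
cache at: 'b' ifAbsentPut: [ fetch value: 'b' ].  "miss again"

"Number of times the stand-in request actually ran."
fetchCount
```

Compared with a plain Dictionary, the bounded size keeps memory use in check when many different images are requested.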