Cache for ZnClient

Esteban A. Maringolo
Is there any library or setting that adds caching to ZnClient?

I'm doing some web scraping with ZnClient, and sometimes I request
the same pages (~300) more than once. I'd like to save requests
(and time) by simply retrieving what's in the cache, since the content
doesn't change frequently.

Another option might be to put an HTTP proxy in between and send the
requests to that local proxy.

Regards,

Esteban A. Maringolo

Re: Cache for ZnClient

Sven Van Caekenberghe-2
Hi Esteban,

> On 4 Dec 2019, at 21:35, Esteban Maringolo <[hidden email]> wrote:
>
> Is there any library or setting that adds caching to ZnClient?
>
> I'm doing some web scraping with ZnClient, and sometimes I request
> the same pages (~300) more than once. I'd like to save requests
> (and time) by simply retrieving what's in the cache, since the content
> doesn't change frequently.
>
> Another option might be to put an HTTP proxy in between and send the
> requests to that local proxy.
>
> Regards,
>
> Esteban A. Maringolo

No, this does not exist as such, not in public code anyway, AFAIK.

In our codebase there is a RestJsonClient, which holds onto an httpClient, and a CachingRestJsonClient subclass of it, which adds a URL->JSON cache with LRU invalidation based on time and number of entries.
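
The CachingRestJsonClient code is not public, so what follows is only a hypothetical Python sketch of one plausible reading of "time and number of entries based LRU invalidation": entries expire after a fixed age, and when the cache is full the least recently used entry is evicted. It is not Pharo or Zinc API. The class name `TtlLruCache`, its parameters `max_entries` and `ttl_seconds`, and the injectable `clock` are all assumptions made for illustration.

```python
import time
from collections import OrderedDict


class TtlLruCache:
    """Hypothetical URL -> value cache with age-based and LRU count-based eviction.

    Illustration only; not the RestJsonClient/CachingRestJsonClient code.
    """

    def __init__(self, max_entries=100, ttl_seconds=300.0, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = OrderedDict()  # url -> (stored_at, value); least recently used first

    def get(self, url):
        """Return the cached value, or None if it is missing or expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl_seconds:
            # Time-based invalidation: stale entries are dropped when looked up.
            del self._entries[url]
            return None
        # Mark the entry as most recently used.
        self._entries.move_to_end(url)
        return value

    def put(self, url, value):
        self._entries[url] = (self.clock(), value)
        self._entries.move_to_end(url)
        # Count-based invalidation: evict least recently used entries beyond the limit.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)

    def __len__(self):
        return len(self._entries)
```

Expired entries are removed lazily on lookup rather than by a background sweep, which keeps the sketch simple. Because a miss is signalled with `None`, a cached JSON `null` would look like a miss; a real implementation might use a sentinel instead.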

It would not be too hard to build this yourself.

Sven

Re: Cache for ZnClient

Esteban A. Maringolo
Hi Sven,

Yep, building something would be fairly easy; I just wanted to avoid doing it,
or at least to know where I should hook such a cache lookup into ZnClient
before it performs the actual network request.
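
One generic way to get such a hook, sketched here in Python rather than Smalltalk, is to make every request funnel through a single method that does the network I/O, and then override that method in a subclass so it consults a cache first. `HttpClient`, `CachingHttpClient` and `execute` are hypothetical names; the sketch does not claim to identify which ZnClient method is the right one to override.

```python
import urllib.request


class HttpClient:
    """Hypothetical minimal client: every request funnels through execute()."""

    def get(self, url):
        return self.execute("GET", url)

    def execute(self, method, url):
        # The single place where real network I/O happens (the hook point).
        request = urllib.request.Request(url, method=method)
        with urllib.request.urlopen(request) as response:
            return response.read()


class CachingHttpClient(HttpClient):
    """Overrides the hook so GET responses are served from memory when possible."""

    def __init__(self):
        super().__init__()
        self.cache = {}  # url -> response body (bytes)

    def execute(self, method, url):
        if method != "GET":
            # Only GET is treated as safe to cache; other methods always hit the network.
            return super().execute(method, url)
        if url not in self.cache:
            # Cache miss: perform the real request once and remember the body.
            self.cache[url] = super().execute(method, url)
        return self.cache[url]
```

The subclass never duplicates the transport code; it only decides whether the inherited `execute` needs to run at all.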

Regards,



Esteban A. Maringolo

On Wed, Dec 4, 2019 at 6:27 PM Sven Van Caekenberghe <[hidden email]> wrote:

Re: Cache for ZnClient

Esteban A. Maringolo
I created ZnCachingClient, a subclass of ZnClient that caches the responses
to GET requests on disk.

The overview is here:
https://gist.github.com/eMaringolo/bed9974d70c9ab8c6398149716e22b08

I don't differentiate by content type or encoding, because in my
use case I'm only caching text/html responses.
It's simple, but it works pretty well and saved me a lot of time
while debugging the scraper :-)
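
The gist is not reproduced here. As a hypothetical illustration of the same idea in Python (not Esteban's code), a disk cache can name each file after a hash of the URL and, like the approach described above, ignore content type and encoding by storing every body as UTF-8 text. `DiskCache`, `cached_get`, the SHA-256 file naming and the `.html` suffix are assumptions.

```python
import hashlib
from pathlib import Path


class DiskCache:
    """Hypothetical on-disk cache: one file per URL, named by a hash of the URL.

    Ignores content type and encoding: every body is stored as UTF-8 text,
    which only makes sense for text responses such as text/html.
    """

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path_for(self, url):
        # Hashing maps any URL to a safe, fixed-length file name.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.directory / (digest + ".html")

    def get(self, url):
        path = self._path_for(url)
        return path.read_text(encoding="utf-8") if path.exists() else None

    def put(self, url, body):
        self._path_for(url).write_text(body, encoding="utf-8")


def cached_get(url, cache, fetch):
    """Return the body for url, calling fetch(url) only on a cache miss."""
    body = cache.get(url)
    if body is None:
        body = fetch(url)
        cache.put(url, body)
    return body
```

Because the files survive restarts, repeated runs of a scraper under development only fetch each page once, which matches the debugging benefit mentioned above. There is no expiry, so stale pages must be cleared by deleting the directory.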


Regards!

Esteban A. Maringolo

On Wed, Dec 4, 2019 at 6:58 PM Esteban Maringolo <[hidden email]> wrote:
