Is there any library/setting that adds "cache" to ZnClient?
I'm doing some web scraping with ZnClient, and sometimes I request the same pages (~300) more than once. I'd like to save requests (and time) by simply retrieving what's in the cache, since the content doesn't change frequently.

Maybe the other way is to put an HTTP proxy in between and query the local proxy.

Regards,

Esteban A. Maringolo
Hi Esteban,
> On 4 Dec 2019, at 21:35, Esteban Maringolo <[hidden email]> wrote:
>
> Is there any library/setting that adds "cache" to ZnClient?

No, this does not exist as such, not in public code anyway, AFAIK.

In our codebase there is a RestJsonClient, which holds onto an httpClient, and a CachingRestJsonClient subclass that adds a url->json cache with LRU invalidation based on time and number of entries.

It would not be too hard to build this yourself.

Sven
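As an illustration of the kind of cache described above (url -> value, invalidated by age and by number of entries), here is a minimal sketch. It is written in Python rather than Pharo and is not the CachingRestJsonClient code, which is not public; the class name `LruTtlCache`, the `get_or_fetch` method, and the default limits are all assumptions made for the example. The `fetch` argument stands in for whatever performs the real HTTP request.

```python
import time
from collections import OrderedDict


class LruTtlCache:
    """Hypothetical url -> value cache with a size limit (LRU) and an age limit."""

    def __init__(self, max_entries=500, max_age_seconds=3600.0, clock=time.monotonic):
        self.max_entries = max_entries
        self.max_age_seconds = max_age_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._entries = OrderedDict()  # url -> (stored_at, value), least recently used first

    def get_or_fetch(self, url, fetch):
        """Return the cached value for url; call fetch(url) on a miss or an expired entry."""
        now = self.clock()
        entry = self._entries.get(url)
        if entry is not None:
            stored_at, value = entry
            if now - stored_at <= self.max_age_seconds:
                self._entries.move_to_end(url)  # mark as most recently used
                return value
            del self._entries[url]  # too old: drop it and fall through to refetch
        value = fetch(url)
        self._entries[url] = (now, value)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value
```

The same shape should translate to a Dictionary-backed wrapper around an HTTP client: check the cache, fall back to the real request, store the result, then trim.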
Hi Sven,
Yep, building something would be fairly easy; I just wanted to avoid doing it, or at least to know where I should "hook" such a cache lookup into ZnClient before it performs the actual network request.

Regards,

Esteban A. Maringolo

On Wed, Dec 4, 2019 at 6:27 PM Sven Van Caekenberghe <[hidden email]> wrote:
>
> It would not be too hard to build this yourself.
I created ZnCachingClient, a subclass of ZnClient, and I cache the responses to GET requests on disk.

The overview is here: https://gist.github.com/eMaringolo/bed9974d70c9ab8c6398149716e22b08

I don't differentiate by content type or encoding, because for my use case I'm only caching text/html responses.

It's simple, but it works pretty well and saved me a lot of time while debugging the scraper :-)

Regards!

Esteban A. Maringolo

On Wed, Dec 4, 2019 at 6:58 PM Esteban Maringolo <[hidden email]> wrote:
>
> Yeap, building something would be fairly easy, I just wanted to avoid doing it.
> Or to know where I should "hook" such cache lookup in the ZnClient
> before performing the actual network request.
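For readers who cannot open the gist, the general idea of caching GET responses on disk can be sketched as follows. This is a Python illustration, not the ZnCachingClient code from the gist; the `DiskCache` class, the SHA-256 file naming, and the `.html` suffix are assumptions chosen for the example. Like the approach described above, it ignores content type and encoding and treats every body as UTF-8 text.

```python
import hashlib
from pathlib import Path


class DiskCache:
    """Hypothetical on-disk cache: one file per URL, named after a hash of the URL."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path_for(self, url):
        # Hashing avoids characters in the URL that are illegal in file names.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.directory / (digest + ".html")

    def get(self, url, fetch):
        """Return the cached body for url, calling fetch(url) only when no file exists."""
        path = self._path_for(url)
        if path.exists():
            return path.read_text(encoding="utf-8")
        body = fetch(url)  # the real network request happens only on a miss
        path.write_text(body, encoding="utf-8")
        return body
```

There is no expiry in this sketch, so clearing the cache means deleting the directory; that is usually acceptable while debugging a scraper against pages that rarely change.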