How to properly handle ConnectionClosed with Zinc?


How to properly handle ConnectionClosed with Zinc?

hernanmd
Hi,

I have a situation where downloading a lot of data results in ConnectionClosed.

>>httpGet: urlString
    "Get the response from the given url"
   
    ^ [ ... .get: urlString ]
        on: ZnEntityTooLarge, ConnectionClosed
        do: [ : ex |
            ex isResumable
                ifTrue: [ ex resume ]
                ifFalse: [ self handleNonResumableConnection: ex ] ].

I read a mail from Sven saying you could change a global setting in ZnNetworkingUtils, as follows:

>>handleNonResumableConnection: anException

    | msg |
    msg :=  'We got a non-resumable exception while connecting to ' , self url host asString , '. Re-try switching ZdcSocketStream to SocketStream?'.
    (self confirm: msg)
        ifTrue: [
            ZnNetworkingUtils default socketStreamClass: SocketStream.
            anException retry.]   

But I would like to hear about better ways to handle this, or other recommendations.
What do you think?

Cheers,

Hernán



Re: How to properly handle ConnectionClosed with Zinc?

Sven Van Caekenberghe-2
Hi Hernán,

If you do lots of requests (downloads) or if you download large resources (files) and you get some networking exception, all you have to do is catch it, close the connection, maybe wait a couple of seconds and try again. There is (normally) no need for any more magic, unless you found a bug.

ZnClient has some basic support for this, see #numberOfRetries[:] #retryDelay[:] #timeout[:]
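
For reference, a minimal sketch combining those settings with an explicit catch/close/wait/retry (the selector name, the retry counts and the 2-second back-off are illustrative, not from the thread, and the delay and timeout values are assumed to be in seconds):

>>httpGetWithRetry: urlString
    "Hypothetical helper: let ZnClient retry on its own, plus an outer catch/close/wait/retry"

    | client |
    client := ZnClient new
        numberOfRetries: 3;
        retryDelay: 5;
        timeout: 60;
        yourself.
    ^ [ client get: urlString ]
        on: ConnectionClosed
        do: [ :ex |
            client close.        "drop the broken connection"
            2 seconds wait.      "back off briefly before trying again"
            ex retry ]           "re-evaluate the protected block; a real version would cap the attempts"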

Of course, I do not know exactly what you are trying to do, what the other/server end is, what code you have written, what problems you see, what your networking quality is, and so on.

Switching the socket stream implementation is only necessary in very rare cases; it is not recommended that normal users do that.

HTH,

Sven




Re: How to properly handle ConnectionClosed with Zinc?

hernanmd
Hi Sven!

Yes, this is a lot of requests to a public database named GenBank, through a public API called "Entrez". I repeated the test today to send you a log of what's happening. The requests are done in batches of 500 uids because they limit/penalize huge requests. Each uid is human metadata; I have requested 20,000 samples, and Argentina has one of the worst connections in the world.

Another problem I have is that the Transcript only shows everything once the exception is signaled :(
Is there a way to revert to the old behavior where each Transcript show: wrote to the Transcript immediately?

ConnectionClosed wasn't signaled today, but OutOfMemory was.

It would be nice to know how much memory was requested on startup (this is Pharo 4 on Win 8.1), because the advice to "restart the Pharo VM with a larger memory allocation" is useless without any numbers.

Hernán






Attachments: Pharo_OOM.jpg (365K), Zn_Requests.zip (11K)

Re: How to properly handle ConnectionClosed with Zinc?

Sven Van Caekenberghe-2

> On 18 Aug 2016, at 18:35, Hernán Morales Durand <[hidden email]> wrote:
>
> Hi Sven!
>
> Yes, this is lot of requests to a public database named GenBank through a public API called "Entrez". I repeated the test today to send you a log of what's happening. The requests are done in batches of 500 uid's because they limit/penalize huge requests. Each uid is human meta-data, I have requested 20,000 samples, and Argentina has one of the worst connections in the world.

I see that you do 10 requests concurrently, correct? In different threads/processes?

I tried the first request that I saw,

ZnClient new url: 'http://eutils.ncbi.nlm.nih.gov:80/entrez/eutils/efetch.fcgi?retmode=xml&id=188847,188840,188835,188827,188833,188831,5919044,5919041,188829,188824,188816,188841,188828,188822,188809,188838,188836,5919066,5919018,188826,188807,5919042,188812,5919052,5919028,188832,188830,188820,188823,188819,5919056,188784,5919051,5919010,188814,5919036,5919011,188815,188794,5919043,5919009,5919004,188802,5919040,5919003,5919006,5919001,5919037,5919020,188844,188782,188779,5919067,5919045,5918994,188783,5919062,5919019,5918993,188743,188734,5919013,5919000,5918987,188817,5919046,5919007,5918996,5919032,5919029,5919027,5919021,188719,188795,188785,188788,188780,5919026,5919024,5919023,5919022,5919017,5919002,5919064,5918997,5919059,5919048,5919005,188781,5919055,5919047,5919033,5919030,5919025,5918992,188739,5919016,188732,188731,5919035,5918991,5918990,188834,188774,188763,5919031,5919012,5918989,188825,188745,5919015,5918995,188715,188775,188730,188718,5919049,188821,188803,5919065,5919050,5919034,188714,5918988,188720,5919063,5919054,5919014,5919008,188797,5918999,5918998,188717,5919057,188750,188723,188758,188769,188752,188751,188724,188760,188754,5919060,188759,188757,188756,5919061,188771,188736,5919038,188761,5919053,188735,188786,5919058,188818,188753,188733,188845,188755,188837,188777,188725,188846,188744,1002820005,999844743,941582049,941581974,941581869,941581732,941581138,941580894,941580699,850201272,653603877,653603608,653603509,653603323,653603248,653602787,827529357,817037023,807985587,807985574,807985096,806641362,735659021,735658976,735658957,735658862,735658829,735658814,735658794,735658792,735658766,735658757,735658751,695118655,695118621,695118612,695118403,695118088,695118067,700283821,700282771,700281917,575524695,550825751,556709823,556709753,667060998,568246384,610260256,610260144,430766974,397187538,397181812,570436518,374089790,374089754,374089025,374089020,374088999,374088998,374088995,549444460,410062763,472836119,408775754,408775351,536189361,513791441,513791357,513791301,513791217,513790965,513790629,513790321,513790223,513788781,461681792,378744435,401666731,401666186,302377629,302377601,392496866,392496829,392496762,392496587,392496297,392496261,392496213,392496202,392496104,392495779,392495588,381261925,381261506,381253498,326200539,339745356,339744796,339744703,339744578,339744577,305656401,305656233,305656149,305656121,305655967,305655953,305655925,334189684,320339399,329741913,290758881,305387591,318039611,289600384,288806465,288806339,288805758,224474796,224474781,209916429,223015240,223015212,223015170,223015156,223012454,223009682,205278793,189396910,189175349,189175200,189175179,189175178,189175014,189175012,189174852,171675625,171674995,171674925,156077831,119034956,154100682,86450681,78775919,92109049,48596178,32891159,75905872,32348701,32348575,32348436,32348394,61287889,58333751,18845598,941581228,381273587,198445276,198445114,198445030,198444993,116009304,116009141,1046813254,1046813224,1046813183,1046813083,1046813056,1044327069,1043189398,961420771,961420766,961420707,961420612,961420499,961420495,961420492,961420468,961420465,961420273,961420254,961420206,961420158,914343336,1036141119,1031954968,1025815253,1025813517,1025813307,1024312154,961391305,961391239,961391204,961391190,961391164,961390978,1002820131,1002820047,1002819991,1002819935,1002819837,1002552665,1002552627,1002552595,1002552503,999844882,999844783,999844777,999844747,999844727,998453634,998453620,982278317,941583102,941582715,941582531,941582178,941582015,941581822,941581789
,941581731,941581674,941581539,941581502,941581448,941581294,941581131,941581007,941580980,941580783,957761438,821345416,255367289,255367177,255367163,255366085,891497898,850203159,850203157,850202660,850202610,850202608,850202194,653604148,653604144,653604143,653604135,653604084,653604022,653604006,653603906,653603905,653603896,653603888,653603850,653603838,653603811,653603724,653603559,653603399,653603328,653603288,653603266,653603234,653603186,653603183,653603163,653603162,653603154,653603090,653602987,653602982,653602979,653602962,653602868,653602840,653602832,653602802,807985564,807985482,807985463,807985332,807985294,807985282,807985270,807985265,807985240,807985201,807985176,807985141,807985134,807985121,807985088,806642054,806641999,806641444,806641353,806641250,806641208,806641180,806641082,806640718,735659024,735658983,735658901,735658884,735658841,735658805,735658747,756762254,666695871,695118992,695118986,695118959,695118943,695118900,695118876,695118874,695118870,695118824,695118768,695118703,695118692,695118688,695118661,695118589,695118438,695118421,695118375,695118344,695118179,695118173,695118170&db=nuccore'; get; yourself.

Which resulted in about 11 MB of XML, which you parse and convert, I guess?

That would be between 100 and 200 MB, at least, no?

That already puts a heavy burden on the image/VM. Make sure to properly close and cleanup everything as soon as you can.

And all this 40 times (500 x 40 = 20000) ?

I think that goes way over the 2 GB limit (which is rather theoretical; it is more like 1.5 GB).

Unless you clean up very well afterwards and only keep the minimal useful/needed info in memory.
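
A minimal sketch of that kind of per-batch cleanup (batchUrl and the file name are placeholders, and it assumes ZnClient>>#downloadTo: is available to stream the body straight to disk instead of keeping it in the image):

| client |
client := ZnClient new.
[ client
    url: batchUrl;
    downloadTo: 'entrez-batch-01.xml' ]    "write the entity to a file as it arrives"
        ensure: [ client close ]            "always release the connection afterwards"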

> Another problem I have is the Transcript writes everything once the exception is signaled :(
> Is there a way to revert to the old behavior where each Transcript show: writes to Transcript in situ?

Do you block the UI thread (while running your main top level expression) ? Try forking.

> The ClosedConnection didn't signaled today, but OutOfMemory.

See above.




Re: How to properly handle ConnectionClosed with Zinc?

hernanmd

2016-08-18 16:57 GMT-03:00 Sven Van Caekenberghe <[hidden email]>:

> On 18 Aug 2016, at 18:35, Hernán Morales Durand <[hidden email]> wrote:
>
> Hi Sven!
>
> Yes, this is lot of requests to a public database named GenBank through a public API called "Entrez". I repeated the test today to send you a log of what's happening. The requests are done in batches of 500 uid's because they limit/penalize huge requests. Each uid is human meta-data, I have requested 20,000 samples, and Argentina has one of the worst connections in the world.

I see that you do 10 request concurrently, correct ? In different threads/processes ?


Mmmm no, I wrote a #splitDownload method to handle batch requests (historically NCBI limits clients to 3 requests per second)... This is the code (entrezUrlUidLimit = 500):

>>splitDownload

    | index splittedRs |

    splittedRs := OrderedCollection new: self entrezUrlUidLimit.   
    index := 1.
    self uids do: [: id  |
            splittedRs add: id.
            index \\ self entrezUrlUidLimit = 0
                ifTrue: [
                    self bioLog: 'Requesting ' , splittedRs size asString , ' records to Entrez'.
                    self results add: (self genBankFetchRecords: splittedRs).
                    splittedRs := OrderedCollection new: self entrezUrlUidLimit ].
            index := index + 1 ]
        displayingProgress: 'Downloading...' translated.
    " Add remaining records "
    self bioLog: 'Requesting ' , splittedRs size asString , ' records to Entrez'.
    self results add: (self genBankFetchRecords: splittedRs).
    ^ self results
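
One edge case worth guarding against in a method like the above (not raised in the thread): when uids size is an exact multiple of entrezUrlUidLimit, as 20,000 / 500 is, the trailing request after the loop goes out with an empty collection. A small guard, reusing the same selectors, could look like:

    splittedRs isEmpty ifFalse: [
        self bioLog: 'Requesting ' , splittedRs size asString , ' records to Entrez'.
        self results add: (self genBankFetchRecords: splittedRs) ].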

Then I just isolated the code in Pharo 5 and did a break (Alt + . on Windows).

2016-08-19 04:06:33 Requesting 500 records to Entrez
2016-08-19 04:06:33 Executing...BioEFetchSeq
2016-08-19 04:07:17 Requesting 500 records to Entrez
2016-08-19 04:07:17 Executing...BioEFetchSeq
2016-08-19 04:08:20 Requesting 500 records to Entrez
2016-08-19 04:08:20 Executing...BioEFetchSeq
2016-08-19 04:09:28 Requesting 500 records to Entrez
2016-08-19 04:09:28 Executing...BioEFetchSeq
2016-08-19 04:10:34 Requesting 500 records to Entrez
2016-08-19 04:10:34 Executing...BioEFetchSeq
2016-08-19 04:12:09 Requesting 500 records to Entrez
2016-08-19 04:12:09 Executing...BioEFetchSeq
2016-08-19 04:14:05 Requesting 500 records to Entrez
2016-08-19 04:14:05 Executing...BioEFetchSeq
2016-08-19 04:14:54 Requesting 500 records to Entrez
2016-08-19 04:14:54 Executing...BioEFetchSeq
2016-08-19 04:15:58 Requesting 500 records to Entrez
2016-08-19 04:15:58 Executing...BioEFetchSeq
2016-08-19 04:17:38 Requesting 500 records to Entrez
2016-08-19 04:17:38 Executing...BioEFetchSeq
2016-08-19 04:19:12 Requesting 500 records to Entrez
2016-08-19 04:19:12 Executing...BioEFetchSeq
2016-08-19 04:21:01 Requesting 500 records to Entrez
2016-08-19 04:21:01 Executing...BioEFetchSeq
2016-08-19 04:22:41 Requesting 500 records to Entrez
2016-08-19 04:22:41 Executing...BioEFetchSeq
2016-08-19 04:24:17 Requesting 500 records to Entrez
2016-08-19 04:24:17 Executing...BioEFetchSeq
2016-08-19 04:25:29 Requesting 500 records to Entrez
2016-08-19 04:25:29 Executing...BioEFetchSeq
2016-08-19 04:26:34 Requesting 500 records to Entrez
2016-08-19 04:26:34 Executing...BioEFetchSeq
2016-08-19 04:28:48 Requesting 500 records to Entrez
2016-08-19 04:28:48 Executing...BioEFetchSeq
2016-08-19 04:29:37 Requesting 500 records to Entrez
2016-08-19 04:29:37 Executing...BioEFetchSeq
2016-08-19 04:30:53 Requesting 500 records to Entrez
2016-08-19 04:30:53 Executing...BioEFetchSeq
2016-08-19 04:32:43 Requesting 500 records to Entrez
2016-08-19 04:32:43 Executing...BioEFetchSeq
2016-08-19 04:33:56 Requesting 500 records to Entrez
2016-08-19 04:33:56 Executing...BioEFetchSeq
2016-08-19 04:34:29 Requesting 500 records to Entrez
2016-08-19 04:34:29 Executing...BioEFetchSeq

And now the Transcript displays the intervals (all at once, of course); something weird is happening there.
 
> I tried the first request that I saw,
>
> ZnClient new url: '...'; get; yourself. [the same efetch URL quoted in full above]
>
> Which resulted in about 11Mb XML, which you parse and convert I guess ?


Yes, I have a parser for that in BioSmalltalk.
 
> That would be between 100 and 200 Mb, at least, no ?
>
> That already puts a heavy burden on the image/VM. Make sure to properly close and cleanup everything as soon as you can.
>
> And all this 40 times (500 x 40 = 20000) ?
>
> I think that goes way over the 2Gb limit (which is rather theoretical, it is more like 1.5).
>
> Unless you clean up very well afterwards and only keep the minimal useful/needed info in memory.


This is the first time I've tried with human sequences (there are many samples there; we normally work with animals, bacteria and viruses, which are considerably fewer). I don't have the final numbers, but I will serialize each downloaded file.
 
> Another problem I have is the Transcript writes everything once the exception is signaled :(
> Is there a way to revert to the old behavior where each Transcript show: writes to Transcript in situ?

> Do you block the UI thread (while running your main top level expression) ? Try forking.


I know, but I don't want my users opening and closing critical windows while downloading data and causing more disasters.
Is there a way to roll back to the older behavior where a Transcript show: had higher priority? I don't know why this was changed; it is *critical* to see what happens in real time!
 
> The ClosedConnection didn't signaled today, but OutOfMemory.

> See above.


Thanks Sven

Hernán

Re: How to properly handle ConnectionClosed with Zinc?

Sven Van Caekenberghe-2

> On 19 Aug 2016, at 09:44, Hernán Morales Durand <[hidden email]> wrote:
>
>
> 2016-08-18 16:57 GMT-03:00 Sven Van Caekenberghe <[hidden email]>:
> I see that you do 10 request concurrently, correct ? In different threads/processes ?
>
>
> Mmmm no, I wrote #splitDownload methods to handle batch requests (historically NCBI limits to 3 requests per second)... This is the code (entrezUrlUidLimit = 500)

Ah, I think I know what happened. You probably called #logToTranscript multiple times with an older version of Zinc. It seems as if every log event is displayed multiple times. The code should read:

ZnLogEvent class>>#logToTranscript
        self stopLoggingToTranscript.
        ^ self announcer when: ZnLogEvent do: [ :event | Transcript crShow: event ].

ZnLogEvent class>>stopLoggingToTranscript
        self announcer unsubscribe: self
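
A possible usage sketch of those two class-side methods, assuming they are present in the loaded Zinc version:

ZnLogEvent logToTranscript.             "subscribe (any earlier subscription is removed first)"
"... run the batched downloads ..."
ZnLogEvent stopLoggingToTranscript.     "unsubscribe when done"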

> And now Transcript displays the intervals (all at once of course), something weird is happening there.

It is really hard for me to see the context, which is probably complex. Unless you isolate the problem for me so that I can run it in a standard image, I won't be able to say much more.

> > Another problem I have is the Transcript writes everything once the exception is signaled :(
> > Is there a way to revert to the old behavior where each Transcript show: writes to Transcript in situ?
>
> Do you block the UI thread (while running your main top level expression) ? Try forking.
>
>
> I know, but I don't want my users opening and closing critical windows while downloading data causing more disasters.
> Is there a way to rollback to older behavior where a Transcript show: had higher priority? I don't know why this was changed, it is *critical* to see what happens in real time!

To see what I mean, try both of the following expressions with a clean Transcript:

1 to: 10 do: [ :n | ('Now doing ', n asString) crLog. 1 second wait ].

[ 1 to: 10 do: [ :n | ('Now doing ', n asString) crLog. 1 second wait ] ] fork.

If you want to signal progress, there are (better) options for that.

Have a look at ZnClientTest>>#testProgress or even MCHttpRepository>>#displayProgress:during: for examples.
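
As a rough sketch of that progress mechanism (assuming ZnClient>>#signalProgress: and the HTTPProgress notification are available in your image; the URL is just an illustrative Entrez endpoint):

[ ZnClient new
    signalProgress: true;
    get: 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi' ]
        on: HTTPProgress
        do: [ :progress |
            Transcript crShow: progress printString.    "e.g. bytes transferred so far"
            progress resume ]                            "let the transfer continue"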




Re: How to properly handle ConnectionClosed with Zinc?

hernanmd


2016-08-19 5:05 GMT-03:00 Sven Van Caekenberghe <[hidden email]>:


> It is really hard for me to see the context, which is probably complex. Unless you isolate the problem for me so that I can run it in a standard image, I won't be able to say much more.


(Some classes could be missing if I isolate it for a standard image.)



> To see what I mean, try both of the following expressions with a clean Transcript:
>
> 1 to: 10 do: [ :n | ('Now doing ', n asString) crLog. 1 second wait ].
>
> [ 1 to: 10 do: [ :n | ('Now doing ', n asString) crLog. 1 second wait ] ] fork.
>
> If you want to signal progress, there are (better) options for that.
>
> Have a look at ZnClientTest>>#testProgress or even MCHttpRepository>>#displayProgress:during: for examples.


Cool! Thanks, I will check as soon as I can.

Hernán

 