Hi, I have a situation where downloading lot of data results in ConnectionClosed. >>httpGet: urlString "Get the response from the given url" ^ [ ... .get: urlString ] on: ZnEntityTooLarge, ConnectionClosed do: [ : ex | ex isResumable ifTrue: [ ex resume ] ifFalse: [ self handleNonResumableConnection: ex ] ]. I read a mail from Sven you could change a global setting in ZnNetworkingUtils as follows: | msg | msg := 'We got a non-resumable exception while connecting to ' , self url host asString , '. Re-try switching ZdcSocketStream to SocketStream?'. (self confirm: msg) ifTrue: [ ZnNetworkingUtils default socketStreamClass: SocketStream. anException retry.] But I would like to read about better handling, or recommendations. What do you think? Cheers, Hernán |
Hi Hernán,
If you do lots of requests (downloads) or if you download large resources (files) and you get some networking exception, all you have to do is catch it, close the connection, maybe wait a couple of seconds and try again. There is (normally) no need for any more magic, unless you found a bug. ZnClient has some basic support for this, see #numberOfRetries[:] #retryDelay[:] #timeout[:] Of course, I do not know exactly what you are trying to do, what the other/server end is, what code you have written, what problems you see, what you networking quality is, and so on. Switching the socket stream implementation is only necessary in very rare cases, it is not recommended that normal users do that. HTH, Sven > On 18 Aug 2016, at 06:43, Hernán Morales Durand <[hidden email]> wrote: > > Hi, > > I have a situation where downloading lot of data results in ConnectionClosed. > > >>httpGet: urlString > "Get the response from the given url" > > ^ [ ... .get: urlString ] > on: ZnEntityTooLarge, ConnectionClosed > do: [ : ex | > ex isResumable > ifTrue: [ ex resume ] > ifFalse: [ self handleNonResumableConnection: ex ] ]. > > I read a mail from Sven you could change a global setting in ZnNetworkingUtils as follows: > > >>handleNonResumableConnection: anException > > | msg | > msg := 'We got a non-resumable exception while connecting to ' , self url host asString , '. Re-try switching ZdcSocketStream to SocketStream?'. > (self confirm: msg) > ifTrue: [ > ZnNetworkingUtils default socketStreamClass: SocketStream. > anException retry.] > > But I would like to read about better handling, or recommendations. > What do you think? > > Cheers, > > Hernán > > |
Hi Sven! Yes, this is lot of requests to a public database named GenBank through a public API called "Entrez". I repeated the test today to send you a log of what's happening. The requests are done in batches of 500 uid's because they limit/penalize huge requests. Each uid is human meta-data, I have requested 20,000 samples, and Argentina has one of the worst connections in the world.Hernán 2016-08-18 9:33 GMT-03:00 Sven Van Caekenberghe <[hidden email]>: Hi Hernán, |
2016-08-18 16:57 GMT-03:00 Sven Van Caekenberghe <[hidden email]>:
Mmmm no, I wrote #splitDownload methods to handle batch requests (historically NCBI limits to 3 requests per second)... This is the code (entrezUrlUidLimit = 500) >>splitDownload | index splittedRs | splittedRs := OrderedCollection new: self entrezUrlUidLimit. index := 1. self uids do: [: id | splittedRs add: id. index \\ self entrezUrlUidLimit = 0 ifTrue: [ self bioLog: 'Requesting ' , splittedRs size asString , ' records to Entrez'. self results add: (self genBankFetchRecords: splittedRs). splittedRs := OrderedCollection new: self entrezUrlUidLimit ]. index := index + 1 ] displayingProgress: 'Downloading...' translated. " Add remaining records " self bioLog: 'Requesting ' , splittedRs size asString , ' records to Entrez'. self results add: (self genBankFetchRecords: splittedRs). ^ self results Then I just isolated the code in Pharo 5 and did a break (Alt + . on Windows). 2016-08-19 04:06:33 Requesting 500 records to Entrez 2016-08-19 04:06:33 Executing...BioEFetchSeq 2016-08-19 04:07:17 Requesting 500 records to Entrez 2016-08-19 04:07:17 Executing...BioEFetchSeq 2016-08-19 04:08:20 Requesting 500 records to Entrez 2016-08-19 04:08:20 Executing...BioEFetchSeq 2016-08-19 04:09:28 Requesting 500 records to Entrez 2016-08-19 04:09:28 Executing...BioEFetchSeq 2016-08-19 04:10:34 Requesting 500 records to Entrez 2016-08-19 04:10:34 Executing...BioEFetchSeq 2016-08-19 04:12:09 Requesting 500 records to Entrez 2016-08-19 04:12:09 Executing...BioEFetchSeq 2016-08-19 04:14:05 Requesting 500 records to Entrez 2016-08-19 04:14:05 Executing...BioEFetchSeq 2016-08-19 04:14:54 Requesting 500 records to Entrez 2016-08-19 04:14:54 Executing...BioEFetchSeq 2016-08-19 04:15:58 Requesting 500 records to Entrez 2016-08-19 04:15:58 Executing...BioEFetchSeq 2016-08-19 04:17:38 Requesting 500 records to Entrez 2016-08-19 04:17:38 Executing...BioEFetchSeq 2016-08-19 04:19:12 Requesting 500 records to Entrez 2016-08-19 04:19:12 Executing...BioEFetchSeq 2016-08-19 04:21:01 Requesting 500 records to Entrez 2016-08-19 04:21:01 Executing...BioEFetchSeq 2016-08-19 04:22:41 Requesting 500 records to Entrez 2016-08-19 04:22:41 Executing...BioEFetchSeq 2016-08-19 04:24:17 Requesting 500 records to Entrez 2016-08-19 04:24:17 Executing...BioEFetchSeq 2016-08-19 04:25:29 Requesting 500 records to Entrez 2016-08-19 04:25:29 Executing...BioEFetchSeq 2016-08-19 04:26:34 Requesting 500 records to Entrez 2016-08-19 04:26:34 Executing...BioEFetchSeq 2016-08-19 04:28:48 Requesting 500 records to Entrez 2016-08-19 04:28:48 Executing...BioEFetchSeq 2016-08-19 04:29:37 Requesting 500 records to Entrez 2016-08-19 04:29:37 Executing...BioEFetchSeq 2016-08-19 04:30:53 Requesting 500 records to Entrez 2016-08-19 04:30:53 Executing...BioEFetchSeq 2016-08-19 04:32:43 Requesting 500 records to Entrez 2016-08-19 04:32:43 Executing...BioEFetchSeq 2016-08-19 04:33:56 Requesting 500 records to Entrez 2016-08-19 04:33:56 Executing...BioEFetchSeq 2016-08-19 04:34:29 Requesting 500 records to Entrez 2016-08-19 04:34:29 Executing...BioEFetchSeq And now Transcript displays the intervals (all at once of course), something weird is happening there. I tried the first request that I saw, Yes, I have a parser for that in BioSmalltalk. That would be between 100 and 200 Mb, at least, no ? This is the first time I try with human sequences (many samples there, we normally do animals, bacteria and viruses which are considerably fewer), I don't have the final numbers but I will serialize each downloaded file. > Another problem I have is the Transcript writes everything once the exception is signaled :( I know, but I don't want my users opening and closing critical windows while downloading data causing more disasters. Is there a way to rollback to older behavior where a Transcript show: had higher priority? I don't know why this was changed, it is *critical* to see what happens in real time! > The ClosedConnection didn't signaled today, but OutOfMemory. Thanks Sven Hernán |
> On 19 Aug 2016, at 09:44, Hernán Morales Durand <[hidden email]> wrote: > > > 2016-08-18 16:57 GMT-03:00 Sven Van Caekenberghe <[hidden email]>: > > > On 18 Aug 2016, at 18:35, Hernán Morales Durand <[hidden email]> wrote: > > > > Hi Sven! > > > > Yes, this is lot of requests to a public database named GenBank through a public API called "Entrez". I repeated the test today to send you a log of what's happening. The requests are done in batches of 500 uid's because they limit/penalize huge requests. Each uid is human meta-data, I have requested 20,000 samples, and Argentina has one of the worst connections in the world. > > I see that you do 10 request concurrently, correct ? In different threads/processes ? > > > Mmmm no, I wrote #splitDownload methods to handle batch requests (historically NCBI limits to 3 requests per second)... This is the code (entrezUrlUidLimit = 500) Ah, I think I know what happened. You probably called #logToTranscript multiple times with an older version of Zinc. It seems as if every log event is displayed multiple times. The code should read: ZnLogEvent class>>#logToTranscript self stopLoggingToTranscript. ^ self announcer when: ZnLogEvent do: [ :event | Transcript crShow: event ]. ZnLogEvent class>>stopLoggingToTranscript self announcer unsubscribe: self > >>splitDownload > > | index splittedRs | > > splittedRs := OrderedCollection new: self entrezUrlUidLimit. > index := 1. > self uids do: [: id | > splittedRs add: id. > index \\ self entrezUrlUidLimit = 0 > ifTrue: [ > self bioLog: 'Requesting ' , splittedRs size asString , ' records to Entrez'. > self results add: (self genBankFetchRecords: splittedRs). > splittedRs := OrderedCollection new: self entrezUrlUidLimit ]. > index := index + 1 ] > displayingProgress: 'Downloading...' translated. > " Add remaining records " > self bioLog: 'Requesting ' , splittedRs size asString , ' records to Entrez'. > self results add: (self genBankFetchRecords: splittedRs). > ^ self results > > Then I just isolated the code in Pharo 5 and did a break (Alt + . on Windows). > > 2016-08-19 04:06:33 Requesting 500 records to Entrez > 2016-08-19 04:06:33 Executing...BioEFetchSeq > 2016-08-19 04:07:17 Requesting 500 records to Entrez > 2016-08-19 04:07:17 Executing...BioEFetchSeq > 2016-08-19 04:08:20 Requesting 500 records to Entrez > 2016-08-19 04:08:20 Executing...BioEFetchSeq > 2016-08-19 04:09:28 Requesting 500 records to Entrez > 2016-08-19 04:09:28 Executing...BioEFetchSeq > 2016-08-19 04:10:34 Requesting 500 records to Entrez > 2016-08-19 04:10:34 Executing...BioEFetchSeq > 2016-08-19 04:12:09 Requesting 500 records to Entrez > 2016-08-19 04:12:09 Executing...BioEFetchSeq > 2016-08-19 04:14:05 Requesting 500 records to Entrez > 2016-08-19 04:14:05 Executing...BioEFetchSeq > 2016-08-19 04:14:54 Requesting 500 records to Entrez > 2016-08-19 04:14:54 Executing...BioEFetchSeq > 2016-08-19 04:15:58 Requesting 500 records to Entrez > 2016-08-19 04:15:58 Executing...BioEFetchSeq > 2016-08-19 04:17:38 Requesting 500 records to Entrez > 2016-08-19 04:17:38 Executing...BioEFetchSeq > 2016-08-19 04:19:12 Requesting 500 records to Entrez > 2016-08-19 04:19:12 Executing...BioEFetchSeq > 2016-08-19 04:21:01 Requesting 500 records to Entrez > 2016-08-19 04:21:01 Executing...BioEFetchSeq > 2016-08-19 04:22:41 Requesting 500 records to Entrez > 2016-08-19 04:22:41 Executing...BioEFetchSeq > 2016-08-19 04:24:17 Requesting 500 records to Entrez > 2016-08-19 04:24:17 Executing...BioEFetchSeq > 2016-08-19 04:25:29 Requesting 500 records to Entrez > 2016-08-19 04:25:29 Executing...BioEFetchSeq > 2016-08-19 04:26:34 Requesting 500 records to Entrez > 2016-08-19 04:26:34 Executing...BioEFetchSeq > 2016-08-19 04:28:48 Requesting 500 records to Entrez > 2016-08-19 04:28:48 Executing...BioEFetchSeq > 2016-08-19 04:29:37 Requesting 500 records to Entrez > 2016-08-19 04:29:37 Executing...BioEFetchSeq > 2016-08-19 04:30:53 Requesting 500 records to Entrez > 2016-08-19 04:30:53 Executing...BioEFetchSeq > 2016-08-19 04:32:43 Requesting 500 records to Entrez > 2016-08-19 04:32:43 Executing...BioEFetchSeq > 2016-08-19 04:33:56 Requesting 500 records to Entrez > 2016-08-19 04:33:56 Executing...BioEFetchSeq > 2016-08-19 04:34:29 Requesting 500 records to Entrez > 2016-08-19 04:34:29 Executing...BioEFetchSeq > > And now Transcript displays the intervals (all at once of course), something weird is happening there. It is really hard for me to see the context, which is probably complex. Unless you isolate the problem for me so that I can run it in a standard image, I won't be able to say much more. > I tried the first request that I saw, > > ZnClient new url: 'http://eutils.ncbi.nlm.nih.gov:80/entrez/eutils/efetch.fcgi?retmode=xml&id=188847,188840,188835,188827,188833,188831,5919044,5919041,188829,188824,188816,188841,188828,188822,188809,188838,188836,5919066,5919018,188826,188807,5919042,188812,5919052,5919028,188832,188830,188820,188823,188819,5919056,188784,5919051,5919010,188814,5919036,5919011,188815,188794,5919043,5919009,5919004,188802,5919040,5919003,5919006,5919001,5919037,5919020,188844,188782,188779,5919067,5919045,5918994,188783,5919062,5919019,5918993,188743,188734,5919013,5919000,5918987,188817,5919046,5919007,5918996,5919032,5919029,5919027,5919021,188719,188795,188785,188788,188780,5919026,5919024,5919023,5919022,5919017,5919002,5919064,5918997,5919059,5919048,5919005,188781,5919055,5919047,5919033,5919030,5919025,5918992,188739,5919016,188732,188731,5919035,5918991,5918990,188834,188774,188763,5919031,5919012,5918989,188825,188745,5919015,5918995,188715,188775,188730,188718,5919049,188821,188803,5919065,5919050,5919034,188714,5918988,188720,5919063,5919054,5919014,5919008,188797,5918999,5918998,188717,5919057,188750,188723,188758,188769,188752,188751,188724,188760,188754,5919060,188759,188757,188756,5919061,188771,188736,5919038,188761,5919053,188735,188786,5919058,188818,188753,188733,188845,188755,188837,188777,188725,188846,188744,1002820005,999844743,941582049,941581974,941581869,941581732,941581138,941580894,941580699,850201272,653603877,653603608,653603509,653603323,653603248,653602787,827529357,817037023,807985587,807985574,807985096,806641362,735659021,735658976,735658957,735658862,735658829,735658814,735658794,735658792,735658766,735658757,735658751,695118655,695118621,695118612,695118403,695118088,695118067,700283821,700282771,700281917,575524695,550825751,556709823,556709753,667060998,568246384,610260256,610260144,430766974,397187538,397181812,570436518,374089790,374089754,374089025,374089020,374088999,374088998,374088995,549444460,410062763,472836119,408775754,408775351,536189361,513791441,513791357,513791301,513791217,513790965,513790629,513790321,513790223,513788781,461681792,378744435,401666731,401666186,302377629,302377601,392496866,392496829,392496762,392496587,392496297,392496261,392496213,392496202,392496104,392495779,392495588,381261925,381261506,381253498,326200539,339745356,339744796,339744703,339744578,339744577,305656401,305656233,305656149,305656121,305655967,305655953,305655925,334189684,320339399,329741913,290758881,305387591,318039611,289600384,288806465,288806339,288805758,224474796,224474781,209916429,223015240,223015212,223015170,223015156,223012454,223009682,205278793,189396910,189175349,189175200,189175179,189175178,189175014,189175012,189174852,171675625,171674995,171674925,156077831,119034956,154100682,86450681,78775919,92109049,48596178,32891159,75905872,32348701,32348575,32348436,32348394,61287889,58333751,18845598,941581228,381273587,198445276,198445114,198445030,198444993,116009304,116009141,1046813254,1046813224,1046813183,1046813083,1046813056,1044327069,1043189398,961420771,961420766,961420707,961420612,961420499,961420495,961420492,961420468,961420465,961420273,961420254,961420206,961420158,914343336,1036141119,1031954968,1025815253,1025813517,1025813307,1024312154,961391305,961391239,961391204,961391190,961391164,961390978,1002820131,1002820047,1002819991,1002819935,1002819837,1002552665,1002552627,1002552595,1002552503,999844882,999844783,999844777,999844747,999844727,998453634,998453620,982278317,941583102,941582715,941582531,941582178,941582015,941581822,941581789,941581731,941581674,941581539,941581502,941581448,941581294,941581131,941581007,941580980,941580783,957761438,821345416,255367289,255367177,255367163,255366085,891497898,850203159,850203157,850202660,850202610,850202608,850202194,653604148,653604144,653604143,653604135,653604084,653604022,653604006,653603906,653603905,653603896,653603888,653603850,653603838,653603811,653603724,653603559,653603399,653603328,653603288,653603266,653603234,653603186,653603183,653603163,653603162,653603154,653603090,653602987,653602982,653602979,653602962,653602868,653602840,653602832,653602802,807985564,807985482,807985463,807985332,807985294,807985282,807985270,807985265,807985240,807985201,807985176,807985141,807985134,807985121,807985088,806642054,806641999,806641444,806641353,806641250,806641208,806641180,806641082,806640718,735659024,735658983,735658901,735658884,735658841,735658805,735658747,756762254,666695871,695118992,695118986,695118959,695118943,695118900,695118876,695118874,695118870,695118824,695118768,695118703,695118692,695118688,695118661,695118589,695118438,695118421,695118375,695118344,695118179,695118173,695118170&db=nuccore'; get; yourself. > > Which resulted in about 11Mb XML, which you parse and convert I guess ? > > > Yes, I have a parser for that in BioSmalltalk. > > That would be between 100 and 200 Mb, at least, no ? > > That already puts a heavy burden on the image/VM. Make sure to properly close and cleanup everything as soon as you can. > > And all this 40 times (500 x 40 = 20000) ? > > I think that goes way over the 2Gb limit (which is rather theoretical, it is more like 1.5). > > Unless you clean up very well afterwards and only keep the minimal useful/needed info in memory. > > > This is the first time I try with human sequences (many samples there, we normally do animals, bacteria and viruses which are considerably fewer), I don't have the final numbers but I will serialize each downloaded file. > > > Another problem I have is the Transcript writes everything once the exception is signaled :( > > Is there a way to revert to the old behavior where each Transcript show: writes to Transcript in situ? > > Do you block the UI thread (while running your main top level expression) ? Try forking. > > > I know, but I don't want my users opening and closing critical windows while downloading data causing more disasters. > Is there a way to rollback to older behavior where a Transcript show: had higher priority? I don't know why this was changed, it is *critical* to see what happens in real time! To see what I mean, try both of the following expressions with a clean Transcript: 1 to: 10 do: [ :n | ('Now doing ', n asString) crLog. 1 second wait ]. [ 1 to: 10 do: [ :n | ('Now doing ', n asString) crLog. 1 second wait ] ] fork. If you want to signal progress, there are (better) options for that. Have a look at ZnClientTest>>#testProgress or even MCHttpRepository>>#displayProgress:during: for examples. > > > The ClosedConnection didn't signaled today, but OutOfMemory. > > See above. > > > Thanks Sven > > Hernán |
2016-08-19 5:05 GMT-03:00 Sven Van Caekenberghe <[hidden email]>:
(Some classes could be missing) > > Another problem I have is the Transcript writes everything once the exception is signaled :( Cool! Thanks, I will check as soon as I can. Hernán
|
Free forum by Nabble | Edit this page |