Hi everyone,

I just added a new bug entry for an issue we have been experiencing for some time:

Here is the description:

History:

This issue was spotted after we observed strange behavior with Seaside uploads.

Steps to reproduce:

I have been able to set up a small scenario that highlights the issue.

Download Pharo 6.1 on my Mac (Sierra 10.12.6): https://pharo.org/web/download

=> start the pharo image
=> Open a web browser page on: http://localhost:1701/form-test-3

Debugging:

Bob Arning has been able to reproduce the issue with my scenario.

Here are the conclusions so far:

Suspicions:

This issue appeared with Pharo 6.

Some people suggested that it could be a VM issue, and that I should try my little scenario with the latest available VM.

I am not sure where to find the latest available VM.

I did the test using these elements: https://files.pharo.org/get-files/70/pharo-mac-latest.zip/

The issue is still present.

Cyrille Delaunay |
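Hi Cyril,

try with the latest VMs available at:

https://bintray.com/opensmalltalk/vm/cog/

For example, the latest Ubuntu 64-bit VM is at:

https://bintray.com/opensmalltalk/vm/cog/201801170946#files

Regards,

Thierry

2018-01-18 16:42 GMT+01:00 Cyrille Delaunay <[hidden email]>: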
|
Thanks, Cyril.

We know that there is some heap corruption, and it is great that you have a reproducible scenario.

Stef

On Thu, Jan 18, 2018 at 4:51 PM, Thierry Goubier <[hidden email]> wrote:
|
In reply to this post by cdelaunay
As a separate data point:
- I've seen corruption twice out of several runs (the process hits "self halt") in a 32 bit VM.
-- In both cases the first four bytes of the array were 0.
- I haven't yet seen a crash or corruption in the 64 bit VM (it's in the process of looping 1000 times).

OS: Ubuntu 16.04.

Images:

- Pharo-7.0+alpha.build.425.sha.eb0a6fb140ac4a42b1f158ed37717e0650f778b4 (32 Bit)
- Pharo-7.0+alpha.build.436.sha.7e0f6d30dca546f3859b32764483b42f4fdcd63c (64 Bit)

VM:

5.0-201801110739 Thursday 11 January 09:22:24 CET 2018 gcc 4.8.5 [Production Spur 64-bit VM]
CoInterpreter VMMaker.oscog-eem.2302 uuid: 55ec8f63-cdbe-4e79-8f22-48fdea585b88 Jan 11 2018
StackToRegisterMappingCogit VMMaker.oscog-eem.2302 uuid: 55ec8f63-cdbe-4e79-8f22-48fdea585b88 Jan 11 2018
VM: 201801110739 alistair@alistair-xps13:snap/pharo-snap/pharo-vm/opensmalltalk-vm $ Date: Wed Jan 10 23:39:30 2018 -0800 $
Plugins: 201801110739 alistair@alistair-xps13:snap/pharo-snap/pharo-vm/opensmalltalk-vm $
Linux b07d7880072c 4.13.0-26-generic #29~16.04.2-Ubuntu SMP Tue Jan 9 22:00:44 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
plugin path: /snap/pharo7/x1/usr/bin/pharo-vm/5.0-201801110739 [default: /snap/pharo7/x1/usr/bin/pharo-vm/5.0-201801110739/]

The 32 bit VM is the same version as above.

Cheers,
Alistair

On 18 January 2018 at 16:42, Cyrille Delaunay <[hidden email]> wrote:
> Hi everyone,
>
> I just added a new bug entry for an issue we have been experiencing for some time:
>
> https://pharo.fogbugz.com/f/cases/20982/Random-corrupted-data-when-copying-from-very-large-byte-array
>
> Here is the description:
>
> History:
>
> This issue was spotted after we observed strange behavior with Seaside uploads.
> After uploading a big file from a web browser, the modeled file generated within the Pharo image begins with 4 unexpected bytes.
> This issue occurs randomly: sometimes the first 4 bytes are right, sometimes they are wrong.
> This issue only occurs with Pharo 6.
> This issue occurs on all platforms (Mac, Ubuntu, Windows).
>
> Steps to reproduce:
>
> I have been able to set up a small scenario that highlights the issue.
>
> Download Pharo 6.1 on my Mac (Sierra 10.12.6):
> https://pharo.org/web/download
> Then, iterate over this process until the issue shows up:
>
> => start the pharo image
> => execute this piece of code in a playground
>
>     ZnServer startDefaultOn: 1701.
>     ZnServer default maximumEntitySize: 80 * 1024 * 1024.
>     '/Users/cdelaunay/myzip.zip' asFileReference writeStreamDo: [ :out |
>         out binary; nextPutAll: #[80 75 3 4 10 0 0 0 0 0 125 83 67 73 0 0 0 0 0 0].
>         18202065 timesRepeat: [ out nextPut: 0 ] ].
>
> => Open a web browser page on: http://localhost:1701/form-test-3
> => Upload the zip file previously generated ('myzip.zip').
> => If the web page displays: "contents=000000000a00..." (or anything unexpected), THIS IS THE ISSUE!
> => If the web page displays: "contents=504b03040a00..", the upload worked fine. You can close the image (without saving).
>
> Debugging:
>
> Bob Arning has been able to reproduce the issue with my scenario.
> He dived into the code involved in this process until he reached some "basic" methods where he saw the issue occurring.
>
> Here are the conclusions so far:
> => A corruption occurs while reading an input stream with the method ZnUtils class>>readUpToEnd:limit:
> The first 4 bytes may be altered randomly.
> => The first 4 bytes are initially written correctly to an outputStream. But the first 4 bytes of this outputStream get altered (corrupted), sometimes when the inner byte array grows OR when performing the final "outputStream contents".
> => Here is a piece of code that reproduces the issue (still randomly; stopping and restarting the image may change the behavior):
>
>     test4
>         "self test4"
>         | species bufferSize buffer totalRead outputStream answer inputStream ba byte1 |
>         ba := ByteArray new: 18202085.
>         ba atAllPut: 99.
>         1 to: 20 do: [ :i | ba at: i put: (#[80 75 3 4 10 7 7 7 7 7 125 83 67 73 7 7 7 7 7 7] at: i) ].
>         inputStream := ba readStream.
>         bufferSize := 16384.
>         species := ByteArray.
>         buffer := species new: bufferSize.
>         totalRead := 0.
>         outputStream := nil.
>         [ inputStream atEnd ] whileFalse: [ | readCount |
>             readCount := inputStream readInto: buffer startingAt: 1 count: bufferSize.
>             totalRead = 0 ifTrue: [
>                 byte1 := buffer first.
>             ].
>             totalRead := totalRead + readCount.
>
>             outputStream ifNil: [
>                 inputStream atEnd
>                     ifTrue: [ ^ buffer copyFrom: 1 to: readCount ]
>                     ifFalse: [ outputStream := (species new: bufferSize) writeStream ] ].
>             outputStream next: readCount putAll: buffer startingAt: 1.
>             byte1 = outputStream contents first ifFalse: [ self halt ].
>         ].
>         answer := outputStream ifNil: [ species new ] ifNotNil: [ outputStream contents ].
>         byte1 = answer first ifFalse: [ self halt ].
>         ^answer
>
> Suspicions:
>
> This issue appeared with Pharo 6.
>
> Some people suggested that it could be a VM issue, and that I should try my little scenario with the latest available VM.
>
> I am not sure where to find the latest available VM.
>
> I did the test using these elements:
>
> https://files.pharo.org/image/60/latest.zip
>
> https://files.pharo.org/get-files/70/pharo-mac-latest.zip/
>
> The issue is still present.
>
> --
> Cyrille Delaunay |
In reply to this post by Thierry Goubier
I would suspect a bug in primitive 105 on byte objects (it was changed recently in the VM), called by copyFrom: 1 to: readCount. The bug would likely be due to a specific alignment in readCount or something like that. (Assuming you're on 32 bits, since 4 bytes are corrupted.)

When I get better I can have a look (I am currently quite sick).

On Thu, Jan 18, 2018 at 4:51 PM, Thierry Goubier <[hidden email]> wrote:
Clément Béra Pharo consortium engineer Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq |
It does not seem to be related to prim 105.

I am confused. Does the size of the array have an impact at all? It seems the problem shows from the first copy of 16k elements.

I can't really reproduce the bug: it happened once, but I cannot make it happen again. Does the bug happen with the StackVM/PharoS VM? The 32-bit versions are here: http://files.pharo.org/vm/pharoS-spur32/

The StackVM/PharoS VM is the VM without the JIT. Since the bug is unreliable, it may be that it happens only in jitted code, so trying that out may be worth it.

On Thu, Jan 18, 2018 at 7:12 PM, Clément Bera <[hidden email]> wrote:
Clément Béra Pharo consortium engineer Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq |
Hi Clément,
On 19 January 2018 at 17:04, Clément Bera <[hidden email]> wrote:
> Does not seem to be related to prim 105.
>
> I am confused. Has the size of the array an impact at all ?

Yes, I tried reducing the size of the array by a factor of 10 and wasn't able to reproduce the problem at all.

With the full size array it failed over half the time (32 bit).

I ran the test about 180 times on 64 bit and didn't get a single failure.

> It seems the problem shows since the first copy of 16k elements.
>
> I can't really reproduce the bug - it happened once but I cannot do it again. Does the bug happen with the StackVM/PharoS VM you can find here the 32 bits versions : http://files.pharo.org/vm/pharoS-spur32/ ? The StackVM/PharoS VM is the VM without the JIT, it may be since the bug is unreliable that it happens only in jitted code, so trying that out may be worth it.

I'll try and have a look at this over the weekend.

Cheers,
Alistair

> On Thu, Jan 18, 2018 at 7:12 PM, Clément Bera <[hidden email]> wrote:
>> I would suspect a bug in primitive 105 on byte objects (it was changed recently in the VM), called by copyFrom: 1 to: readCount. The bug would likely by due to specific alignment in readCount or something like that. (Assuming you're in 32 bits since the 4 bytes are corrupted).
>>
>> When I get better I can have a look (I am currently quite sick).
>>
>> On Thu, Jan 18, 2018 at 4:51 PM, Thierry Goubier <[hidden email]> wrote:
>>> Hi Cyril,
>>>
>>> try with the last vms available at:
>>>
>>> https://bintray.com/opensmalltalk/vm/cog/
>>>
>>> For example, the last Ubuntu 64bits vm is at:
>>>
>>> https://bintray.com/opensmalltalk/vm/cog/201801170946#files
>>>
>>> Regards,
>>>
>>> Thierry
>>>
>>> 2018-01-18 16:42 GMT+01:00 Cyrille Delaunay <[hidden email]>: |
Hi Clément,
On 19 January 2018 at 17:21, Alistair Grant <[hidden email]> wrote:
> Hi Clément,
>
> On 19 January 2018 at 17:04, Clément Bera <[hidden email]> wrote:
>> Does not seem to be related to prim 105.
>>
>> I am confused. Has the size of the array an impact at all ?
>
> Yes, I tried reducing the size of the array by a factor of 10 and wasn't able to reproduce the problem at all.
>
> With the full size array it failed over half the time (32 bit).
>
> I ran the test about 180 times on 64 bit and didn't get a single failure.
>
>> It seems the problem shows since the first copy of 16k elements.
>>
>> I can't really reproduce the bug - it happened once but I cannot do it again. Does the bug happen with the StackVM/PharoS VM you can find here the 32 bits versions : http://files.pharo.org/vm/pharoS-spur32/ ? The StackVM/PharoS VM is the VM without the JIT, it may be since the bug is unreliable that it happens only in jitted code, so trying that out may be worth it.
>
> I'll try and have a look at this over the weekend.

This didn't fail once in 55 runs.

OS: Ubuntu 16.04
Image: Pharo 6.0 Latest update: #60528
VM:
5.0 #1 Wed Oct 12 15:48:53 CEST 2016 gcc 4.6.3 [Production Spur ITHB VM]
StackInterpreter VMMaker.oscog-EstebanLorenzano.1881 uuid: ed616067-a57c-409b-bfb6-dab51f058235 Oct 12 2016
https://github.com/pharo-project/pharo-vm.git Commit: 01a03276a2e2b243cd4a7d3427ba541f835c07d3 Date: 2016-10-12 14:31:09 +0200 By: Esteban Lorenzano <[hidden email]> Jenkins build #606
Linux pharo-linux 3.2.0-31-generic-pae #50-Ubuntu SMP Fri Sep 7 16:39:45 UTC 2012 i686 i686 i386 GNU/Linux
plugin path: /home/alistair/pharo7/Issue20982/bin/ [default: /home/alistair/pharo7/Issue20982/bin/]

I then went back and attempted to reproduce the failures in my regular 32 bit image, but only got 1 corruption in 10 runs. I've been working in this image without restarting for most of the day.

Quitting out and restarting the image and then running the corruption check resulted in 11 corruptions from 11 runs.

Image: Pharo 7.0 Build information: Pharo-7.0+alpha.build.425.sha.eb0a6fb140ac4a42b1f158ed37717e0650f778b4 (32 Bit)
VM:
5.0-201801110739 Thursday 11 January 09:30:12 CET 2018 gcc 4.8.5 [Production Spur VM]
CoInterpreter VMMaker.oscog-eem.2302 uuid: 55ec8f63-cdbe-4e79-8f22-48fdea585b88 Jan 11 2018
StackToRegisterMappingCogit VMMaker.oscog-eem.2302 uuid: 55ec8f63-cdbe-4e79-8f22-48fdea585b88 Jan 11 2018
VM: 201801110739 alistair@alistair-xps13:snap/pharo-snap/pharo-vm/opensmalltalk-vm $ Date: Wed Jan 10 23:39:30 2018 -0800 $
Plugins: 201801110739 alistair@alistair-xps13:snap/pharo-snap/pharo-vm/opensmalltalk-vm $
Linux b07d7880072c 4.13.0-26-generic #29~16.04.2-Ubuntu SMP Tue Jan 9 22:00:44 UTC 2018 i686 i686 i686 GNU/Linux
plugin path: /snap/core/3748/lib/i386-linux-gnu/ [default: /snap/core/3748/lib/i386-linux-gnu/]

So, as well as restarting the image before running the test, just wondering if the gcc compiler version could have an impact?

HTH,
Alistair |
Hi Alistair, Hi Clément,
On Fri, Jan 19, 2018 at 12:53 PM, Alistair Grant <[hidden email]> wrote:
> Hi Clément,

I suspect that the problem is the same compactor bug I've been trying to reproduce all week, and have just fixed. Could you try and reproduce with a VM built from the latest commit?

Some details:

The SpurPlanningCompactor works by using the fact that all Spur objects have room for a forwarding pointer. The compactor makes three passes:

- the first pass through memory works out where objects will go, replacing their first fields with where they will go, and saving their first fields in a buffer (savedFirstFieldsSpace).
- the second pass scans all pointer objects, replacing their fields with where the objects referenced will go (following the forwarding pointers), and also relocates any pointer fields in savedFirstFieldsSpace
- the final pass slides objects down, restoring their relocated first fields

The buffer used for savedFirstFieldsSpace determines how many passes are used. The system will either use eden (which is empty when compaction occurs) or a large free chunk or allocate a new segment, depending on whatever yields the largest space. So in the right circumstances eden will be used and more than one pass required.

The bug was that when multiple passes are used the compactor forgot to unmark the corpse left behind when the object was moved. Instead of the corpse being changed into free space it was retained, but its first field would be that of the forwarding pointer to its new location, not the actual first field. So on 32-bits a ByteArray that should have been collected would have its first 4 bytes appear to be invalid, and on 64-bits its first 8 bytes. Because the heap on 64-bits can grow larger it could be that the bug shows itself much less frequently than on 32-bits. When compaction can be completed in a single pass all corpses are correctly collected, so most of the time the bug is hidden.
This is the commit:

commit 0fe1e1ea108e53501a0e728736048062c83a66ce
Author: Eliot Miranda <[hidden email]>
Date: Fri Jan 19 13:17:57 2018 -0800

    CogVM source as per VMMaker.oscog-eem.2320

    Spur:
    Fix a bad bug in SpurPlnningCompactor. unmarkObjectsFromFirstFreeObject, used when the compactor requires more than one pass due to insufficient savedFirstFieldsSpace, expects the corpse of a moved object to be unmarked, but copyAndUnmarkObject:to:bytes:firstField: only unmarked the target. Unmarking the corpse before the copy unmarks both.

    This fixes a crash with ReleaseBuilder class>>saveAsNewRelease when non-use of cacheDuring: creates lots of files, enough to push the system into the multi-pass regime.
_,,,^..^,,,_ best, Eliot |
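
Eliot's description predicts a specific signature: only the first machine word of the affected ByteArray is wrong (4 bytes on 32 bits, 8 bytes on 64 bits) and everything after it is intact. The playground sketch below checks a result for that signature. It is only an illustration under stated assumptions: the block name `classify` and the sample bytes are made up for the example, and the word size is passed in explicitly rather than queried from the VM.

```smalltalk
"Sketch: classify the head of a copied ByteArray against the expected header.
 'classify' and the sample data are hypothetical, for illustration only."
| expected classify sample |
expected := #[80 75 3 4 10 7 7 7 7 7 125 83 67 73 7 7 7 7 7 7].
classify := [ :bytes :wordSize | | head |
	head := bytes copyFrom: 1 to: expected size.
	head = expected
		ifTrue: [ #intact ]
		ifFalse: [
			"Compare everything after the first word; if it matches,
			 only the first word was clobbered."
			(head copyFrom: wordSize + 1 to: head size)
				= (expected copyFrom: wordSize + 1 to: expected size)
					ifTrue: [ #firstWordOnly ]
					ifFalse: [ #other ] ] ].

"Sample shaped like the reported 32-bit symptom: first four bytes zeroed."
sample := #[0 0 0 0 10 7 7 7 7 7 125 83 67 73 7 7 7 7 7 7].
classify value: sample value: 4.
```

Evaluating the last line answers #firstWordOnly for this sample. A result of #other on a real run would suggest damage beyond the first word, which the compactor explanation alone would not account for.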
In reply to this post by alistairgrant
Ok, so the bug happens in VMMaker 2265 while the change in prim 105 was introduced in 2273, so that's not it (I was surprised; there are lots of tests for this).

It seems each time I have the problem, it happens when totalRead reaches 18186240. I guess that matches some specific array growth strategy? There are multiple different paths in the code dealing with this case.

---

We had issues in the past with signed vs unsigned operations and things like that, which would make some operations fail for large integers, but it does not seem to be related in this case (I don't see any integer overflowing in the 32-bit range instead of 31 bits).

I don't think that's related to the C compiler either, since the bug happens on Mac and Linux, which are compiled with llvm and gcc respectively.

---

I tried in Pharo 5: with a recent VM it fails, with older VMs it works. I tried only once for each, though.

There is no simple code to debug, so I don't really know where to look. I tried to wrap prim 105 with assertions and they all pass, so the primitive may not be faulty (I could check the diff with old versions though...).

On Fri, Jan 19, 2018 at 9:53 PM, Alistair Grant <[hidden email]> wrote:
> Hi Clément,

Clément Béra
Pharo consortium engineer
Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq |
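
One observation about the number Clément reports, offered as plain arithmetic rather than as an explanation: with the 16384-byte buffer and the 18202085-byte array used in test4, 18186240 is exactly the running total after the last full buffer has been read, so the next read is the final, partial one. The variable names below exist only for this example.

```smalltalk
"Arithmetic check on the sizes used in test4 (names are illustrative)."
| totalSize bufferSize fullReads |
totalSize := 18202085.   "size of the ByteArray in test4"
bufferSize := 16384.     "buffer size in test4"
fullReads := totalSize // bufferSize.   "1110 full buffers"
fullReads * bufferSize.                  "18186240"
totalSize \\ bufferSize.                 "15845 bytes left for the final, partial read"
```

This does not say why the corruption becomes visible at that point; it only shows that the value is the last multiple of the buffer size before the end of the input.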
In reply to this post by Eliot Miranda-2
Hi everyone,

From the last answers from Eliot and Alistair, I understand that using a "GCC 5 VM" may solve the issue. Am I right?

I am willing to give it a try on Mac OS and check whether the issue shows up with such a VM. But where can I download such a VM?

2018-01-23 9:24 GMT+01:00 Alistair Grant <[hidden email]>:
> Hi Eliot,

Cyrille Delaunay |
Hi Cyrille,
On 13 February 2018 at 09:38, Cyrille Delaunay <[hidden email]> wrote:
> Hi everyone,
>
> With last answers from Eliot and Alistair, I understand that using a "GCC 5 VM" may solve the issue.
> Am I right ?

After posting about using gcc 5 I started getting weird results, which makes me wonder if something was wrong during my last testing.

Stuart saw the problem on BSD compiled with clang. I assume that gcc and clang are independent enough that it suggests it isn't a gcc issue. But it needs proper investigation.

> I am willing to give a try on mac os and check if the issue show up with such a vm.
> But where can I download such a vm ?

I think you'd have to compile your own VM at the moment.

Cheers,
Alistair

> 2018-01-23 9:24 GMT+01:00 Alistair Grant <[hidden email]>:
>> Hi Eliot,
>>
>> On 23 January 2018 at 01:47, Eliot Miranda <[hidden email]> wrote:
>>> Hi Alistair,
>>>
>>> On Mon, Jan 22, 2018 at 1:42 AM, Alistair Grant <[hidden email]> wrote:
>>>> Hi Eliot,
>>>>
>>>> On Sat, Jan 20, 2018 at 09:19:04AM +0100, Alistair Grant wrote:
>>>>> Hi Eliot,
>>>>>
>>>>> On 19 January 2018 at 23:04, Eliot Miranda <[hidden email]> wrote:
>>>>>> Hi Alistair, Hi Clément,
>>>>>>
>>>>>> On Fri, Jan 19, 2018 at 12:53 PM, Alistair Grant <[hidden email]> wrote:
>>>>>>> Hi Clément,
>>>>>>>
>>>>>>> On 19 January 2018 at 17:21, Alistair Grant <[hidden email]> wrote:
>>>>>>>> Hi Clément,
>>>>>>>>
>>>>>>>> On 19 January 2018 at 17:04, Clément Bera <[hidden email]> wrote:
>>>>>>>>> Does not seem to be related to prim 105.
>>>>>>
>>>>>> I suspect that the problem is the same compactor bug I've been trying to reproduce all week, and have just fixed. Could you try and reproduce with a VM built from the latest commit?
>>>>>
>>>>> Happy to, but I'm out all day today, so it will be tomorrow or Monday.
>>>>>
>>>>> Cheers,
>>>>> Alistair
>>>>> (on the run...)
>>>>
>>>> I've tested this with 2 images and 3 VMs in all 6 combinations:
>>>>
>>>> - "Old VM": commit date: Wed Jan 10 23:39:30 2018 -0800, gcc 4.8.5
>>>> - "New VM": commit date: Sat Jan 20 13:52:26 2018 +0100, gcc 4.8.5
>>>> - "GCC 5 VM": commit date: Sat Jan 20 13:52:26 2018 +0100, gcc 5.4.0
>>>> - Clean image: commit id: b28d466f
>>>> - Work image: commit id: eb0a6fb1
>>>>
>>>> The gcc 5 is only there because I was playing with it. The results may be useful, or completely misleading. :-)
>>>>
>>>> Each time I ran "5 timesRepeat: [ self test4 ]" with the halts replaced with a count increment.
>>>> test4 is the method provided in Cyrille's original message.
>>>>
>>>> Result summary:
>>>>
>>>> - Old VM + Work image: 5, 5, 5, 0, 0
>>>> - Old VM + Clean image: 5, 5, 0, 0, 0
>>>> - New VM + Work image: 5, 0, 5, 5, 5
>>>> - New VM + Clean image: 0, 0, 1, 5, 5
>>>> - GCC 5 + Work image: 0, 0, 0, 0, 0
>>>> - GCC 5 + Clean image: 0, 0, 0, 0, 0
>>>
>>> This is strong evidence for the issue being a compiler bug with 4.8.x
>>> If exactly the same input source for the VM works with gcc 5 but not with 4.8.x then there is a small chance it is due to the VM relying on undefined behavior, but I doubt it.
>>> Assuming it is a gcc bug then
>>> - it should be documented in the HowToBuild files for the relevant platforms
>>> - CI builds should start using gcc 5 and dispense with gcc 4.8.x
>>> - since the problem is fixed with gcc 5 there seems little point trying to identify which version of gcc introduces the problem and communicating the problem to the gcc maintainers
>>>
>>> What's the status of the bug on Windows and Mac OS X?
>>
>> I can't check MacOS. Clement?
>>
>> I get the problem on Windows (version info below).
>>
>> Note that Max Leske and I have been using gcc 4.8 because of a problem with OSProcess and gcc 5 (it possibly doesn't affect me anymore, I'm currently using OSSubprocess). See:
>>
>> http://lists.squeakfoundation.org/pipermail/vm-dev/2017-May/025216.html
>>
>> I'll try and confirm correct OSSubprocess operation with gcc 5, but it will take a while, I'm battling too many different problems at the moment:
>>
>> - TZ and DST handling on Windows
>> - Windows VM suddenly can't find any of its plugins. Is anyone else seeing strange behaviour here with VMs from the last few days?
>> -- It looks like the plugin handling mechanism is completely different on Windows to Unix, so the VM doesn't report where it is looking and doesn't provide the option to specify the directory. Grrrrrgh.
>> - My linux build environment.
>> - real life :-)
>>
>> Windows version info:
>>
>> OS: Windows 10
>> Image: Pharo 7.0 Build information: Pharo-7.0+alpha.build.439.sha.481071068244d0484241924f79ba791f80701316 (32 Bit)
>> VM:
>> Win32 built on Nov 27 2017 00:05:44 GMT Compiler: 6.4.0 [Production Spur VM]
>> CoInterpreter VMMaker.oscog-nice.2281 uuid: 4beeaee7-567e-1a4b-b0fb-bd95ce302516 Nov 27 2017
>> StackToRegisterMappingCogit VMMaker.oscog-nice.2283 uuid: 2d20324d-a2ab-48d6-b0f6-9fc3d66899da Nov 27 2017
>> VM: 201711262336 https://github.com/OpenSmalltalk/opensmalltalk-vm.git $ Date: Mon Nov 27 00:36:29 2017 +0100 $
>> Plugins: 201711262336 https://github.com/OpenSmalltalk/opensmalltalk-vm.git $
>>
>> Cheers,
>> Alistair
>
> --
> Cyrille Delaunay |
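
The measurements quoted above came from running test4 with the halts replaced by a counter. A playground sketch of that kind of harness follows. It is an assumption about how such a check could look, not Alistair's actual code: the block name `runOnce` and its Boolean result are made up here, the output stream is created up front instead of lazily, and the rest follows test4 from Cyrille's original message.

```smalltalk
"Sketch of a counting harness; 'runOnce' is a hypothetical name.
 It answers true if the first byte of the output ever differs from the input."
| runOnce failures |
runOnce := [ | ba inputStream bufferSize buffer totalRead outputStream byte1 corrupted |
	corrupted := false.
	ba := ByteArray new: 18202085.
	ba atAllPut: 99.
	1 to: 20 do: [ :i |
		ba at: i put: (#[80 75 3 4 10 7 7 7 7 7 125 83 67 73 7 7 7 7 7 7] at: i) ].
	inputStream := ba readStream.
	bufferSize := 16384.
	buffer := ByteArray new: bufferSize.
	totalRead := 0.
	outputStream := (ByteArray new: bufferSize) writeStream.
	[ inputStream atEnd ] whileFalse: [ | readCount |
		readCount := inputStream readInto: buffer startingAt: 1 count: bufferSize.
		totalRead = 0 ifTrue: [ byte1 := buffer first ].
		totalRead := totalRead + readCount.
		outputStream next: readCount putAll: buffer startingAt: 1.
		"Same check as test4, but record the mismatch instead of halting."
		byte1 = outputStream contents first ifFalse: [ corrupted := true ] ].
	byte1 = outputStream contents first ifFalse: [ corrupted := true ].
	corrupted ].

"Count how many of 5 runs showed corruption."
failures := (1 to: 5) count: [ :i | runOnce value ].
failures
```

Given the reports earlier in the thread that results change after restarting the image, it is probably worth noting for each measurement whether the image was freshly started.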