Some of you might know that David T. Lewis has been working on changes to the VM source to make it work fully within 32- or 64-bit address spaces.

As we know, the Squeak VM treated memory addresses, which are unsigned values, as signed integer values. This wrong use of signed math in compare statements and loops could cause the VM to make an incorrect decision, corrupting memory and crashing the VM.

This issue would usually occur if you wanted to use 1GB of memory for your VM and the host operating system then allocated memory for you above the 2GB boundary, or at, say, the 1.5GB boundary. The result was either an instant crash, or a crash much later when your memory needs caused the VM to grow over the 2GB boundary.

Some fixes were done in the past to make the VM mostly run when fully over the 2GB boundary, but at best they were insufficient patches.

Over the last couple of days I reviewed David Lewis' changes, made some fixes, revised the Macintosh OS X support files, and worked up some general test cases to see what happens when you run the macro benchmarks below the 2GB boundary, crossing the 2GB boundary, and when the image is allocated at the 3GB boundary.

This afternoon I'm pleased to say the VM passed all runs of my trivial test cases, so I have checked in the Mac OS Carbon source code changes and David's changes to the Mac OS source tree for further review.

People wanting to build a VM should review the Mac OS build instructions to build a Mac OS Carbon VM, or review the required changes to VMMaker as per the Carbon VM build readme to build a 32-bit clean VM.

I have not:

(a) built a 64-bit VM and tested it.

VM developers should consider the mmap call in the memory allocation routine: you can specify a suggested starting position. On OS X I was able to choose 1GB, 1.5GB, 2GB and 3GB. I have not tested 64-bit VMs at the 0x8000000000000000 boundary. I suspect you could allocate at 0x7FFFFFFFF0000000, then ask for 600MB of memory for the image. That would set the end of memory at 0x8000000015800000, 344MB over the negative sign boundary.

(b) I have not tested or reviewed any of the external plugins for improper use of usqInt.

(c) I have not confirmed the changes work with the Unix VM, or the Windows VM, I have no plans to do so.

--
========================================================================
===
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
========================================================================
===
|
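To make the signed-versus-unsigned problem in the post above concrete, here is a minimal C sketch. It is not taken from the VM sources: the function names are invented for illustration, and fixed-width types stand in for whatever integer types the VM really uses. It shows how a comparison of two addresses flips once one of them sits at or above 2GB, and it reproduces the 600MB end-of-heap arithmetic for the 64-bit case.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only, not VM code. */

/* Wrong: treats 32-bit addresses as signed. Anything at or above
   0x80000000 (2GB) becomes negative on the usual two's-complement
   platforms, so it appears to lie below every low address. */
int below_signed(uint32_t a, uint32_t b) {
    return (int32_t)a < (int32_t)b;
}

/* Right: addresses are unsigned quantities. */
int below_unsigned(uint32_t a, uint32_t b) {
    return a < b;
}

/* End address of a heap of `size` bytes starting at `base`. */
uint64_t heap_end(uint64_t base, uint64_t size) {
    return base + size;
}
```

With a base of 0x7FFFFFFFF0000000 and a size of 600MB, `heap_end` yields 0x8000000015800000, which is the 344MB overshoot past the sign boundary mentioned in the post.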
2007/6/10, John M McIntosh <[hidden email]>:
> (c) I have not confirmed the changes work with the Unix VM, or the
> Windows VM, I have no plans to do so.

What can we do to test the Unix VM?

--
Damien Cassou
|
In reply to this post by johnmci
You make me want to buy a Mac.

Cheers
Philippe
|
In reply to this post by Damien Cassou-3
Well, follow the instructions to build a Unix VM. At the point where you use VMMaker to make the VM, ensure you go to Mac OS/vm/specialChangeSets and load:

    ArraysToGlobalStruct-JMM.1.cs (may already be in image, check source)
    bigCursor-bf.1.cs
    JMM-fixBiasToGrow.1.cs.zip
    VMM38-64bit-imageUpdates.1.cs (may already be in image, check source)
    VMM38-gc-instrument-image.1.cs (may already be in image, check source)
    VmUpdates-dtl
    VmUpdates-1001-dtl.1.cs
    VmUpdates-1002-dtl.1.cs
    VmUpdates-1003-dtl.1.cs
    VmUpdates-1004-dtl.1.cs
    VmUpdates-1005-dtl.1.cs
    VmUpdates-1006-dtl.1.cs
    JMM-VmUpdates32bitclean.2.cs

For 64-bit work, no idea; wasn't there some issue with fixes needed to build it anyway?

Once the VM is built, to test it I suggest you look at the call to mmap in the Unix memory allocation source sqUnixMemory.c and change the start location from zero to, say, 1.5GB, then start up your VM and ask for 600MB of memory. In uxGrowMemoryBy, look at the value of heap + heapSize to see where the heap ends, to ensure your choices are correct.

I then downloaded a 3.5 image, since it contains the macro benchmarks, and ran:

    | suck |
    suck := OrderedCollection new.
    suck add: (ByteArray new: 1024*1024*480).
    97 timesRepeat: [
        Smalltalk macroBenchmarks.
        suck add: (ByteArray new: 1024*1024*1).
        Transcript show: Smalltalk garbageCollectMost; cr.
        Transcript show: Smalltalk garbageCollect; cr].

By adjusting the 1024*1024*480 you want to put the entire active VM memory heap under the 2GB boundary; the timesRepeat: loop then allocates memory and runs the benchmarks to cross over the boundary.

A possible modification is to use a smaller value for 1024*1024*1. Really this should be 4 bytes, in order to march over the boundary in more possible conditions; however, running it would require on the order of 4 million iterations. Maybe someone could devote a week and run with a 4K allocation.

As mentioned earlier, I have not tested 64-bit VMs at the 0x8000000000000000 boundary. I suspect you could allocate at 0x7FFFFFFFF0000000, then ask for 600MB of memory for the image. That would set the end of memory at 0x8000000015800000, 344MB over the negative sign boundary; adjust the initial memory allocation to find the boundary.

The other two test cases are the image below the 2GB boundary, which should be the first couple of benchmark runs for the VM, and the VM fully over the 2GB boundary, which can be set by adjusting mmap to, say, 3GB.
|
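The mmap step described above can be sketched roughly as follows. This is an illustrative stand-alone helper, not the code in sqUnixMemory.c; the function name and the message format are assumptions. The key point is that, without MAP_FIXED, the address passed to mmap is only a suggestion, so the test has to print where the heap really landed and where it ends.

```c
#define _DEFAULT_SOURCE          /* expose MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON   /* BSD / Mac OS X spelling */
#endif

/* Hypothetical helper: ask the kernel for `size` bytes at `hint`.
   The kernel may ignore the hint, so the caller must inspect the
   returned address. Returns NULL if the mapping fails. */
void *alloc_heap_at(uintptr_t hint, size_t size) {
    assert(size > 0);            /* mmap rejects zero-length mappings */
    void *p = mmap((void *)hint, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    /* Report start and end so the tester can see which side of the
       boundary the heap occupies. */
    fprintf(stderr, "heap requested at %lx, allocated at %lx, ends at %lx\n",
            (unsigned long)hint, (unsigned long)(uintptr_t)p,
            (unsigned long)((uintptr_t)p + size));
    return p;
}
```

For the test described above one might call `alloc_heap_at(0x60000000, 600UL * 1024 * 1024)`, that is, a 1.5GB hint with a 600MB request, and then check from the printed line that the range really straddles 0x80000000.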
In reply to this post by Damien Cassou-3
On Sun, Jun 10, 2007 at 12:21:05PM +0200, Damien Cassou wrote:
> 2007/6/10, John M McIntosh <[hidden email]>:
> > (c) I have not confirmed the changes work with the Unix VM, or the
> > Windows VM, I have no plans to do so.
>
> What can we do to test the unix vm?

Well, since you asked ;)

Following up on John's OS X work, I have now built Unix VMs on both 32-bit and 64-bit systems without problems.

The good news is that everything works fine on both platforms, even when I set the base of heap memory to just below 2MB as John suggested for testing.

The bad news is that I cannot get it to fail. My 32-bit system is an older 2.4 Linux kernel, which refuses to mmap things at the requested locations and therefore does not have a problem. On the 64-bit (2.6 kernel) system, I can allocate heap below 2MB, and Squeak is perfectly happy. There probably has never been any issue on 64-bit Linux systems as far as I can tell (but you do need to also load the fix from Mantis 5688 if you are building for a 64-bit system, and some others if you want to run an actual 64-bit image).

So here is what is needed: we need someone with a Linux system that *does* have the memory problem, such that (for example) your Seaside application will crash if you do not run it with the "-memory" option. On that same system, build a new VM with the latest Subversion sources, with VMMaker from SqueakMap, plus the fileins that John has provided in the "platforms/Mac OS/vm/specialChangeSets/VmUpdates-dtl" directory.

No other fileins should be necessary on a 32-bit system, so if you can build this VM and then run the Seaside application without using a "-memory" option on the Squeak command line, then we are probably in good shape.

It is quite likely that this *will* work, but someone with a newer 32-bit Linux system will need to confirm it. Note that some follow-up testing will probably be needed to try forcing Squeak memory allocation at certain specific locations (i.e. right below the 2GB address boundary), but just building and running a new VM to see if it makes a known problem go away would be a big help.

Thanks,
Dave
|
> The good news is that everything works fine on both platforms, even
> when I set the base of heap memory to just below 2MB as John suggested
> for testing.

Well, I assume you mean 2GB for 32-bit systems, but for 64-bit you need to get up to the 0x8000000000000000 boundary.

> The bad news is that I cannot get it to fail. My 32-bit system is
> an older 2.4 Linux kernel, which refuses to mmap things at the
> requested locations and therefore does not have a problem.

In all the crash cases we see the stack context go over the 2GB boundary, expressed as negative values in the VM stack traces. Since we know the for() loop in incCompMove trashes memory when you walk an object move over the 2GB boundary, what you really need is to confirm the image works fine when it starts under 2GB and ends over 2GB.

That, and that really big number for 64-bit systems.
|
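As a rough illustration of why a loop misbehaves when it walks across the 2GB line, the sketch below counts the words visited between two simulated 32-bit addresses. It is not the incCompMove loop, whose exact form is not shown in this thread; the function names are hypothetical and addresses are plain integers so that the example runs on any machine.

```c
#include <stdint.h>

/* Buggy variant: the loop test compares addresses as signed values.
   Once `end` is at or above 0x80000000 it looks negative on the usual
   two's-complement platforms, so the loop body never runs. */
uint32_t words_visited_signed(uint32_t start, uint32_t end) {
    uint32_t n = 0;
    for (uint32_t a = start; (int32_t)a < (int32_t)end; a += 4)
        n++;
    return n;
}

/* Fixed variant: the same walk with an unsigned comparison. */
uint32_t words_visited_unsigned(uint32_t start, uint32_t end) {
    uint32_t n = 0;
    for (uint32_t a = start; a < end; a += 4)
        n++;
    return n;
}
```

Walking from 0x7FFFFFF8 to 0x80000008 should touch four words. The unsigned loop does so, while the signed loop touches none, which is the kind of silently wrong decision that leaves the heap inconsistent.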
In reply to this post by David T. Lewis
David T. Lewis wrote:
> The bad news is that I cannot get it to fail. My 32-bit system is

I have a Debian system on VMware that so far has reliably failed without the -memory option. If you send me your VM I can give it a try.

Michael
|
In reply to this post by johnmci
On Tue, Jun 12, 2007 at 12:12:16AM -0700, John M McIntosh wrote:
> >The good news is that everything works fine on both platforms, even
> >when I set the base of heap memory to just below 2MB as John suggested
> >for testing.
>
> Well I assume you mean 2GB for 32bit systems, but for 64bit you need
> to get up to the 0x8000000000000000 boundary.

On the 64-bit system, it's not allowing me to use anything that high in the address space. I can request a mmap at 0xfff00000000 and the request will be honored:

    uxAllocateMemory: heap requested at fff00000000, allocated at fff00000000

But if I request a higher location, it decides that I am being unreasonable and uses its own assignment:

    uxAllocateMemory: heap requested at ffff00000000, allocated at 2aca47532000

Thus I cannot say if there would be any issues at the 0x8000000000000000 boundary, but I can say that this does not appear to be a possible failure mode on current Linux implementations.

> >The bad news is that I cannot get it to fail. My 32-bit system is
> >an older 2.4 Linux kernel, which refuses to mmap things at the
> >requested locations and therefore does not have a problem.
>
> In all the crash cases we see the stack context go over the 2gb
> boundary expressed as negative values
> in the VM stack traces. Since we know the for() loop in
> incCompMove trashes memory when you walk an object move over
> the 2GB boundary what you really need is to confirm the image works
> fine when it starts under 2gb, and ends over 2gb.

Right.

But I was too hasty. I saw that my mmap request did not work at 0x7FFFFFFF and assumed that my older Linux system did not honor this, but I just tried again with:

    #define SQBASE (0x7FFFFFFF - 2000000)

And this *did* work. Better yet, I can now reproduce the problem reliably:

    uxAllocateMemory: heap requested at 7fe17b7f, allocated at 7fe18000
    sweep failed to find exact end of memory
    -2126938800 SystemDictionary>garbageCollect

I will re-apply our fixes and see what happens. I'm late for work though, so it may not be done this morning.

Dave
|
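The numbers in the output above can be checked by hand. The sketch below is an illustration only: it assumes 4 KiB pages (which is consistent with the request at 7fe17b7f being placed at 7fe18000, though the thread does not state the page size), and the helper names are invented. It also reinterprets the negative value from the stack trace as the unsigned 32-bit address it presumably stands for.

```c
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u   /* assumption: 4 KiB pages */

/* Round an address up to the next page boundary. */
uint32_t page_align_up(uint32_t addr) {
    return (addr + PAGE_SIZE_ASSUMED - 1) & ~(PAGE_SIZE_ASSUMED - 1);
}

/* Reinterpret a signed 32-bit value printed in a trace as the
   unsigned address with the same bit pattern. */
uint32_t as_address(int32_t printed) {
    return (uint32_t)printed;
}
```

Here 0x7FFFFFFF - 2000000 is 0x7FE17B7F, which rounds up to 0x7FE18000, and -2126938800 corresponds to 0x81397D50: an address above the 2GB line and about 21.5MB past the start of the allocated heap.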
In reply to this post by Michael Rueger-6
On Tue, Jun 12, 2007 at 09:39:48AM +0200, Michael Rueger wrote:
> David T. Lewis wrote:
> >The bad news is that I cannot get it to fail. My 32-bit system is
>
> I have a debian system on VMware that so far has reliably failed without
> the -memory.
> If you send me your VM I can give it a try.

Michael,

Thanks for the offer. I think I've got a repeatable failure case now, so I should be able to complete the testing on my system.

Dave
|
In reply to this post by David T. Lewis
OK, I can now confirm that the changes work on 32-bit Linux also. After applying the changes, then forcing the heap to this:

    #define SQBASE (0x7FFFFFFF - 2000000)

I can run Squeak without the crash, allocating 300MB of strings in the image, freeing them and doing a GC, all without problems:

    lewis@dtlewis:/data3/lewis/squeak/sq/Squeak3.9> squeak withfixes
    uxAllocateMemory: heap requested at 7fe17b7f, allocated at 7fe18000

So everything looks good on both 32-bit and 64-bit Linux. I have not tried any 64-bit images, but would not expect any problem there either.

Dave
|
In reply to this post by David T. Lewis
On Jun 12, 2007, at 4:15 AM, David T. Lewis wrote:
> On the 64-bit system, it's not allowing me to use anything that high
> in the address space. I can request a mmap to 0xfff00000000 and the
> request will be honored:

I was just reviewing these emails for Michael and I don't think I commented on this. Allocating at 0xfff00000000 would be fine to test the Squeak oops space over the 0x8000000000000000 boundary.

It wasn't clear if you were able to allocate below the 0x7FFFFFFFFFFFFFFF boundary to test allocating over that boundary. Also, I've never heard whether anyone has been able to run a Squeak image at or near, say, 4GB.
|
>> On the 64-bit system, it's not allowing me to use anything that high
>> in the address space. I can request a mmap to 0xfff00000000 and the
>> request will be honored:
>
> I was just reviewing these emails for michael and I don't think I
> commented on this
> allocating at 0xfff00000000 would be fine to test the squeak oops
> space over the 0x8000000000000000 boundary.
>
> It wasn't clear if you were able to allocate below the
> 0x7FFFFFFFFFFFFFFF boundary to test allocating over
> that boundary. Also I've never heard if any one has been able to
> run a Squeak image at say at or near 4GB

Dave: if you look in sqUnixMemory.c you'll see a facility in the memory allocator to artificially skew all oops by a certain amount. If you calculate the amount after the allocation, you can place the apparent start of memory at any address you like. There is (or was) some corresponding code in sqMemory.h to unskew all oops accordingly and bring them back into the allocated memory, but I've no idea if anyone has removed that since (or even whether it made it out of the 64-bit prototype sources and into the final repository version).

This would seem by far the easiest way to force oops (as seen by the Interpreter) to occupy interesting borderline address ranges.

Cheers,
Ian
|
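The skewing idea Ian describes can be sketched like this. It is a hypothetical illustration and not the code in sqUnixMemory.c or sqMemory.h; every name below is invented. An offset is computed once, after the real allocation, so that the interpreter sees the heap starting at a chosen apparent address, and it is subtracted again whenever the memory is actually touched.

```c
#include <stdint.h>

/* Hypothetical sketch of oop skewing; not the VM's real code. */
static uintptr_t fake_offset = 0;

/* Pick the offset after allocation so that the heap appears to
   start at `apparent_start`. Unsigned wraparound is intentional. */
void set_apparent_start(void *real_start, uintptr_t apparent_start) {
    fake_offset = apparent_start - (uintptr_t)real_start;
}

/* The value the interpreter would treat as the oop. */
uintptr_t skew(void *real) {
    return (uintptr_t)real + fake_offset;
}

/* Map an oop back to a pointer that can be dereferenced. */
void *unskew(uintptr_t oop) {
    return (void *)(oop - fake_offset);
}
```

Choosing an apparent start a few bytes below the top positive signed value, for example `(UINTPTR_MAX >> 1) - 0xF`, makes the oops of even a tiny heap straddle the sign boundary without needing the kernel to map memory there.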
John,

No, I did not test the 0x7FFFFFFFFFFFFFFF boundary at all. Is it important to do so? If so, I'll see if I can set up a test based on Ian's tip.

Ian, are you referring to the SQ_FAKE_MEMORY_OFFSET macro? It looks like that would do what you are suggesting.

Avi, have you been running the VM with these changes in any production situations? If so, any feedback you might be able to provide would be appreciated. Thanks!

Dave
|
David T. Lewis wrote:
> Avi, have you been running the VM with these changes in any production
> situations? If so, any feedback you might be able to provide would be
> appreciated. Thanks!

We've been running VMs with these changes[*] and they have fixed the Linux problems that we had. As an aside, one thing that we ran into (and that I just fixed a couple of days ago) is the fact that Delay in Squeak is not safe. Much of the manipulation of the Delay internal structures (SuspendedDelay and friends) happens from the calling process, and if that calling process gets killed, things go south quickly. Unfortunately, you won't ever run into this unless you run a server (because you won't ever get "truly asynchronous" interrupts to cause this to happen), which makes it all but impossible to recreate this problem on a single machine.

[*] Yours plus John's IGC fix, which turned out to be important.

Cheers,
- Andreas
|
In reply to this post by David T. Lewis
On Jul 14, 2007, at 2:40 PM, David T. Lewis wrote:
> John,
>
> No, I did not test the 0x7FFFFFFFFFFFFFFF boundary at all. Is it
> important
> to do so? If so, I'll see if I can set up a test based on Ian's tip.

Well, that boundary is the magic positive versus negative signed 64-bit integer value. It's really just a cross check to confirm everything works as expected in the 64-bit version. It would appear that you've tested below that value, and above it with your 0xFF testing; it's just the crossing of that value that remains, to dot our i's and cross our t's, so to speak.
|
In reply to this post by Andreas.Raab
On Jul 14, 2007, at 2:48 PM, Andreas Raab wrote:
> We've been running VMs with these changes[*] and they have fixed
> the Linux problems that we had. As an aside, one thing that we ran
> into (and that I just fixed a couple of days ago

So you have these changes where? I was not clear on your comment about server versus desktop and how the issue is triggered. Do dual-processor Intel desktop machines count as server machines? Or is it the application mix: single-user application versus MC or Seaside server?
|
John M McIntosh wrote:
> On Jul 14, 2007, at 2:48 PM, Andreas Raab wrote:
>> We've been running VMs with these changes[*] and they have fixed the
>> Linux problems that we had. As an aside, one thing that we ran into
>> (and that I just fixed a couple of days ago
>
> So you have these changes where? I was not clear on your comment about
> server versus desktop and how the issue is triggered. Do dual processor
> intel desktop machines count as Server machines?

Can't say for sure. Only that our server's MTBF was somewhere between 24 and 48 hours because of that problem. After deploying the fix we've been going for three days straight with no problems (fingers crossed). If we can make it to a week or so I'll post the changes, since deploying them on such short notice was a somewhat desperate measure due to heavy customer complaints.

If you want to look at some code, the problematic places are pretty obvious: Delay>>schedule, Delay>>unschedule, and Delay>>activate are all prone to being terminated while updating Delay-internal structures. When that happens, the result is a total system lockup, since Delay resources are globally shared.

Also, note that these operations run with the client's priority, which makes it very possible for them to be preempted by a higher priority process and cause other problems. For example, consider a low priority process holding the Delay lock and a medium priority process sitting in a tight loop for some reason; this will lock up the entire system, since the timer interrupt watcher won't be able to enter the semaphore.

I have a couple of stack traces showing these and related problems. The one saving grace for us was to have USR1 generate a full stack dump of all processes for forensic reasons. Without that we'd be using Java on the servers by now (no kidding; this is still an option and depends largely on whether we can make Squeak reliable enough as a server).

Cheers,
- Andreas
|