Ronie Salgado Faila uploaded a new version of VMMaker to project VM Maker:
http://source.squeak.org/VMMaker/VMMaker.oscog-rsf.2083.mcz

==================== Summary ====================

Name: VMMaker.oscog-rsf.2083
Author: rsf
Time: 11 January 2017, 4:42:00.330997 am
UUID: 2debebfc-5008-4ab3-b16d-37ab942d9bc0
Ancestors: VMMaker.oscog-eem.2082

Workaround a GCC crash in Windows when building a Lowcode VM. Too much register allocation pressure for calling a builtin memcpy.

=============== Diff against VMMaker.oscog-eem.2082 ===============

Item was changed:
  ----- Method: StackInterpreter>>internalPushShadowCallStackStructure:size: (in category 'internal interpreter access') -----
  internalPushShadowCallStackStructure: structurePointer size: size
      <option: #LowcodeVM>
      shadowCallStackPointer := shadowCallStackPointer - size.
+     self lowcode_mem: shadowCallStackPointer cp: structurePointer y: size!
-     self mem: shadowCallStackPointer cp: structurePointer y: size!

Item was changed:
  ----- Method: StackInterpreter>>lowcodePrimitiveInt32ToPointer (in category 'inline primitive generated code') -----
  lowcodePrimitiveInt32ToPointer
      <option: #LowcodeVM>
      "Lowcode instruction generator"
      | value result |
      <var: #value type: #'sqInt' >
      <var: #result type: #'char*' >
      value := self internalPopStackInt32.
+     result := self cCoerce: (self cCoerce: value to: 'uintptr_t') to: 'char*'.
-     result := self cCoerce: value to: 'uintptr_t'.
      self internalPushPointer: result.
  !

Item was changed:
  ----- Method: StackInterpreter>>lowcodePrimitiveMemcpy32 (in category 'inline primitive generated code') -----
  lowcodePrimitiveMemcpy32
      <option: #LowcodeVM>
      "Lowcode instruction generator"
      | source dest size |
      <var: #source type: #'char*' >
      <var: #dest type: #'char*' >
      <var: #size type: #'sqInt' >
      size := self internalPopStackInt32.
      source := self internalPopStackPointer.
      dest := self internalPopStackPointer.
+     self lowcode_mem: dest cp: source y: size.
-     self mem: dest cp: source y: size.
  !

Item was changed:
  ----- Method: StackInterpreter>>lowcodePrimitiveMemcpy64 (in category 'inline primitive generated code') -----
  lowcodePrimitiveMemcpy64
      <option: #LowcodeVM>
      "Lowcode instruction generator"
      | source dest size |
      <var: #source type: #'char*' >
      <var: #dest type: #'char*' >
      <var: #size type: #'sqLong' >
      size := self internalPopStackInt64.
      source := self internalPopStackPointer.
      dest := self internalPopStackPointer.
+     self lowcode_mem: dest cp: source y: size.
-     self mem: dest cp: source y: size.
  !

Item was changed:
  ----- Method: StackInterpreter>>lowcodePrimitiveMemcpyFixed (in category 'inline primitive generated code') -----
  lowcodePrimitiveMemcpyFixed
      <option: #LowcodeVM>
      "Lowcode instruction generator"
      | source size dest |
      <var: #source type: #'char*' >
      <var: #dest type: #'char*' >
      size := extA.
      source := self internalPopStackPointer.
      dest := self internalPopStackPointer.
+     self lowcode_mem: dest cp: source y: size.
-     self mem: dest cp: source y: size.
      extA := 0.
  !

Item was changed:
  ----- Method: StackInterpreter>>lowcodePrimitivePerformCallStructure (in category 'inline primitive generated code') -----
  lowcodePrimitivePerformCallStructure
      <option: #LowcodeVM>
      "Lowcode instruction generator"
      | resultPointer result function structureSize |
      <var: #resultPointer type: #'char*' >
      <var: #result type: #'char*' >
      function := extA.
      structureSize := extB.
      result := self internalPopStackPointer.
      self internalPushShadowCallStackPointer: result.
      resultPointer := self lowcodeCalloutPointerResult: (self cCoerce: function to: #'char*').
      self internalPushPointer: resultPointer.
      extA := 0.
      extB := 0.
      numExtB := 0.
+
  !

Item was changed:
  ----- Method: StackInterpreter>>lowcodePrimitivePointerAddConstantOffset (in category 'inline primitive generated code') -----
  lowcodePrimitivePointerAddConstantOffset
      <option: #LowcodeVM>
      "Lowcode instruction generator"
      | base offset result |
      <var: #base type: #'char*' >
      <var: #result type: #'char*' >
      offset := extB.
      base := self internalPopStackPointer.
      result := base + offset.
      self internalPushPointer: result.
      extB := 0.
      numExtB := 0.
  !

Item was added:
+ ----- Method: StackInterpreter>>lowcode_mem:cp:y: (in category 'inline primitive support') -----
+ lowcode_mem: destAddress cp: sourceAddress y: bytes
+     "This method is a workaround a GCC bug.
+     In Windows memcpy is putting too much register pressure on GCC when used by Lowcode instructions"
+     <inline: #never>
+     <option: #LowcodeVM>
+     <var: #destAddress type: #'void*'>
+     <var: #sourceAddress type: #'void*'>
+     <var: #bytes type: #'sqInt'>
+
+     "Using memmove instead of memcpy to avoid crashing GCC in Windows."
+     self mem: destAddress mo: sourceAddress ve: bytes!

Item was changed:
  ----- Method: StackToRegisterMappingCogit>>genLowcodePerformCallStructure (in category 'inline primitive generators generated code') -----
  genLowcodePerformCallStructure
      <option: #LowcodeVM>
      "Lowcode instruction generator"
      "Push the result space"
      self ssNativeTop nativeStackPopToReg: TempReg.
      self ssNativePop: 1.
      self PushR: TempReg.
      "Call the function"
      self callSwitchToCStack.
      self MoveCw: extA R: TempReg.
      self CallRT: ceFFICalloutTrampoline.
      "Fetch the result"
      self MoveR: backEnd cResultRegister R: ReceiverResultReg.
      self ssPushNativeRegister: ReceiverResultReg.
      extA := 0.
      extB := 0.
      numExtB := 0.
      ^ 0
  !

Item was changed:
  ----- Method: StackToRegisterMappingCogit>>genLowcodePointerAddConstantOffset (in category 'inline primitive generators generated code') -----
  genLowcodePointerAddConstantOffset
      <option: #LowcodeVM>
      "Lowcode instruction generator"
      | base offset |
      offset := extB.
      (base := backEnd availableRegisterOrNoneFor: self liveRegisters) = NoReg ifTrue:
          [self ssAllocateRequiredReg:
              (base := optStatus isReceiverResultRegLive
                  ifTrue: [Arg0Reg]
                  ifFalse: [ReceiverResultReg])].
      base = ReceiverResultReg ifTrue: [ optStatus isReceiverResultRegLive: false ].
      self ssNativeTop nativePopToReg: base.
      self ssNativePop: 1.
      self AddCq: offset R: base.
      self ssPushNativeRegister: base.
      extB := 0.
      numExtB := 0.
      ^ 0
  !
|
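For readers more at home in C than in Slang, the new lowcode_mem:cp:y: method can be pictured roughly as follows. This is a minimal sketch, not the code VMMaker actually generates: the C name lowcode_memcpy, the LOWCODE_NOINLINE macro and the size_t length parameter are assumptions made for illustration (the Slang method declares bytes as sqInt). It shows the two ingredients of the workaround: the copy goes through memmove rather than memcpy, and the wrapper is kept out of line, which is the role <inline: #never> plays in Slang, so that the block copy is not expanded in the middle of the very large interpreter function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper macro: asks GCC-compatible compilers not to inline
 * the wrapper. Other compilers get no attribute in this sketch. */
#if defined(__GNUC__)
# define LOWCODE_NOINLINE __attribute__((noinline))
#else
# define LOWCODE_NOINLINE
#endif

/* Assumed C counterpart of lowcode_mem:cp:y:. The call sites in the
 * interpreter see only an ordinary function call, so the compiler does not
 * expand a block-copy sequence at each of them. */
LOWCODE_NOINLINE static void *
lowcode_memcpy(void *destAddress, const void *sourceAddress, size_t bytes)
{
    /* memmove, as in the Slang method; it also tolerates overlapping
     * source and destination regions. */
    return memmove(destAddress, sourceAddress, bytes);
}
```

The cost of this approach is one extra call per copy on every platform, including those whose compilers handle the inlined memcpy without trouble.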
Hi Ronie,

    I see this :-)

lowcode_mem: destAddress cp: sourceAddress y: bytes
    "This method is a workaround a GCC bug.
    In Windows memcpy is putting too much register pressure on GCC when used by Lowcode instructions"
    <inline: #never>
    <option: #LowcodeVM>
    <var: #destAddress type: #'void*'>
    <var: #sourceAddress type: #'void*'>
    <var: #bytes type: #'sqInt'>

    "Using memmove instead of memcpy to avoid crashing GCC in Windows."
    self mem: destAddress mo: sourceAddress ve: bytes

Isn't it great when one has to work around compiler bugs?? ;-)

However, let me suggest that this is perhaps a case where a macro would be better. If you added the definition of the macro to StackInterpreter class>>#preambleCCode you'd be able to avoid the overhead with compilers that can correctly inline memcpy. If required, a sqPlatform.h could define a value, say DontInlineMemcpyForLowcode, and then in the preamble you could have the following?

#if DontInlineMemcpyForLowcode
# define memcpy(a,b,c) noinline_memcpy(a,b,c)
#endif

And then noinline_memcpy could be defined in some platform support file, sqWin32Main.c perhaps? The simulator's noinline_memcpy would be defined as <doNotGenerate>.

Anyway, some way of making this platform-dependent would be nice, as you'll get better performance on the other platforms, and on x64. And yes, feel free to ignore me as this perhaps does count as a premature optimization.

On Tue, Jan 10, 2017 at 11:42 PM, <[hidden email]> wrote:
_,,,^..^,,,_ best, Eliot |
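The platform-dependent arrangement suggested above could be sketched in a single C file like this. Several things here are assumptions rather than anything in the VM sources: the default value of DontInlineMemcpyForLowcode, the NOINLINE_ATTR helper, and the LOWCODE_MEMCPY macro name. The sketch routes copies through a separately named macro instead of redefining memcpy itself, purely so that it compiles standalone on systems whose string.h already defines memcpy as a macro; the proposal in the message redefines memcpy directly and places noinline_memcpy in a platform support file.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: a platform header (sqPlatform.h in the proposal) would set
 * this to 1 only where the compiler cannot cope with an inlined memcpy.
 * It defaults to 0 here so that other platforms keep the fast path. */
#ifndef DontInlineMemcpyForLowcode
# define DontInlineMemcpyForLowcode 0
#endif

#if DontInlineMemcpyForLowcode

# if defined(__GNUC__)
#  define NOINLINE_ATTR __attribute__((noinline))
# else
#  define NOINLINE_ATTR
# endif

/* Out-of-line copy: the interpreter only sees a plain function call. */
NOINLINE_ATTR static void *
noinline_memcpy(void *dest, const void *src, size_t n)
{
    return memcpy(dest, src, n);
}

/* Hypothetical macro name used by the Lowcode copy instructions. */
# define LOWCODE_MEMCPY(a, b, c) noinline_memcpy(a, b, c)

#else

/* Everywhere else the compiler stays free to inline memcpy. */
# define LOWCODE_MEMCPY(a, b, c) memcpy(a, b, c)

#endif
```

Compiling the affected platform with -DDontInlineMemcpyForLowcode=1 selects the out-of-line version; all other builds pay no extra call.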
Hi Eliot,
This is very annoying. And for me this is not the first time. It seems that with the heavy inlining in GCC we are putting a bit too much stress on its register allocator.
I was thinking of doing something like this, but I did not know how to do it because of Slang. I will fix it some time later.

Best regards,
Ronie

2017-01-11 16:38 GMT-03:00 Eliot Miranda <[hidden email]>:
|
> On 11 Jan 2017, at 20:38, Eliot Miranda <[hidden email]> wrote:
>
> Hi Ronie,
>

Hi Ronie,

> "Using memmove instead of memcpy to avoid crashing GCC in Windows."
> self mem: destAddress mo: sourceAddress ve: bytes

do you have some example C code to illustrate the issue? I am happy to file a bug report for GCC and do the follow-up.

kind regards
holger
|
Hi Holger,

Try with the attached file. It contains the problematic sources passed through the preprocessor to remove the dependencies on the headers.

../../spurlowcodesrc/vm/gcc3x-cointerp.c:32885:1: error: unable to find a register to spill
 }
 ^
../../spurlowcodesrc/vm/gcc3x-cointerp.c:32885:1: error: este es la insn (this is the instruction):
(insn 13277 83596 83595 1281 (parallel [
            (set (reg:SI 27472 [25785])
                (const_int 0 [0]))
            (set (reg/f:SI 27470 [orig:21149 D.104541 ] [21149])
                (plus:SI (ashift:SI (reg:SI 27472 [25785])
                        (const_int 2 [0x2]))
                    (reg/f:SI 27470 [orig:21149 D.104541 ] [21149])))
            (set (reg/f:SI 27471 [orig:21150 structurePointer ] [21150])
                (plus:SI (ashift:SI (reg:SI 27472 [25785])
                        (const_int 2 [0x2]))
                    (reg/f:SI 27471 [orig:21150 structurePointer ] [21150])))
            (set (mem:BLK (reg/f:SI 27470 [orig:21149 D.104541 ] [21149]) [0 A32])
                (mem:BLK (reg/f:SI 27471 [orig:21150 structurePointer ] [21150]) [0 A8]))
            (use (reg:SI 27472 [25785]))
        ]) ../../spurlowcodesrc/vm/gcc3x-cointerp.c:85260 773 {*rep_movsi}
     (expr_list:REG_UNUSED (reg:SI 27472 [25785])
        (nil)))
../../spurlowcodesrc/vm/gcc3x-cointerp.c:32885: confusión por errores previos, saliendo (confusion due to previous errors, exiting)

Build command line:

i686-w64-mingw32-gcc -x c -MT build/vm/gcc3x-cointerp.o -MMD -MP -MF deps/gcc3x-cointerp.Td -msse2 -ggdb2 -m32 -mno-rtd -mms-bitfields -O2 -march=pentium4 -momit-leaf-frame-pointer -funroll-loops -D_MT -fno-builtin-printf -fno-builtin-putchar -fno-builtin-fprintf -Wall -Wno-unused-variable -Wno-unknown-pragmas -Wno-unused-value -Wno-unused-function -Wno-unused-but-set-variable -I. -I../../spurlowcodesrc/vm -I../../platforms/win32/vm -I../../platforms/Cross/vm -DSTACK_ALIGN_BYTES=16 -DALLOCA_LIES_SO_USE_GETSP=0 -DCOGMTVM=0 -DDEBUGVM=0 -D_WIN32_WINNT=0x0501 -DWINVER=0x0501 -DWIN32 -DWIN32_FILE_SUPPORT -DNO_ISNAN -DNO_SERVICE -DNO_STD_FILE_SUPPORT -D'TZ="CLST"' -DNDEBUG -D'VM_LABEL(foo)=0' -DLSB_FIRST -D'VM_NAME="Squeak"' -DSQUEAK_BUILTIN_PLUGIN -DCROQUET -c gcc3x-cointerp.prep.c -o build/vm/gcc3x-cointerp.o

GCC version:

i686-w64-mingw32-gcc --version
i686-w64-mingw32-gcc (GCC) 5.4.0

On Linux and Mac I am not having this problem.

Best regards,
Ronie

2017-01-12 19:12 GMT-03:00 Holger Freyther <[hidden email]>:
Attachment: gcc3x-cointerp.prep.c.zip (417K)
|