Dear all: With the RoarVM, I was hitting a problem with Clang 3.0 (tags/Apple/clang-211.9). It turned out that Clang does not like the current implementation of bytecodePrimMultiply. From what I can tell, it should be identical with what I found in the SqueakVM sources. RoarVM src: https://github.com/smarr/RoarVM/blob/master/vm/src/interpreter/interpreter_bytecodes.cpp#L316 oop_int_t ri = rcvr.integerValue(); oop_int_t ai = arg.integerValue(); oop_int_t r = ri * ai; if (ai == 0 || (r / ai == ri && Oop::isIntegerValue(r))) { internalPopThenPush(2, Oop::from_int(r)); fetchNextBytecode(); return; } Squeak src: http://squeakvm.org/cgi-bin/viewcvs.cgi/branches/Cog/src/vm/cointerp.c?rev=2497&view=auto rcvr = (rcvr >> 1); arg = (arg >> 1); result = rcvr * arg; if (((arg == 0) || ((result / arg) == rcvr)) && ((result ^ (result << 1)) >= 0)) { /* begin internalPop:thenPush: */ longAtPointerput((localSP += (2 - 1) * BytesPerWord), ((result << 1) | 1)); /* begin fetchNextBytecode */ currentBytecode = byteAtPointer(++localIP); goto l69; } The problematic example is rcvr: 40453, arg: 864001. In that case, the current implementation does not detect the overflow correctly when Clang is used with >= O1 optimization. The big question for me now is, whether we got here a Clang compiler bug, or unspecified C behavior. My quick fix is: oop_int_t ri = rcvr.integerValue(); oop_int_t ai = arg.integerValue(); long long result_with_overflow = (long long)ri * ai; if (ai == 0 || ((result_with_overflow >= -1073741824) && (result_with_overflow <= 1073741823))) { internalPopThenPush(2, Oop::from_int(result_with_overflow)); fetchNextBytecode(); return; } This works obviously, but I assume that the original code is a well considered speed optimizations. Any thoughts? Thanks Stefan -- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525 |
Hi Stefan, so the VMMaker source is probably easier to read. First, thanks for the optimization; clearly (arg = 0 or: [(result // arg) = rcvr and: [self isIntegerValue: result]])
is better than the original ((arg = 0 or: [(result // arg) = rcvr]) and: [self isIntegerValue: result])
ObjectMemory methods for interpreter access isIntegerValue: intValue "Return true if the given value can be represented as a Smalltalk integer value."
"Use a shift and XOR to set the sign bit if and only if the top two bits of the given value are the same, then test the sign bit. Note that the top two bits are equal for exactly those integers in the range that can be represented in 31-bits or 63-bits."
^ (intValue bitXor: (intValue << 1)) >= 0 The comment makes sense but there is perhaps nothing to guarantee that a C compiler must consider the type of signed ^ signed as signed, in which case the compiler would assume it is always >= 0, and the overflow detection would fail.
If you change the method to ObjectMemory methods for interpreter access isIntegerValue: intValue "Return true if the given value can be represented as a Smalltalk integer value."
"Use a shift and XOR to set the sign bit if and only if the top two bits of the given value are the same, then test the sign bit. Note that the top two bits are equal for exactly those integers in the range that can be represented in 31-bits or 63-bits."
^(intValue bitXor: (intValue << 1)) asInteger >= 0 which should force a cast to sqInt, does that fix things?
i.e. it should produce if (((arg == 0) || ((result / arg) == rcvr)) && ((sqInt)((result ^ (result << 1))) >= 0)) {
On Mon, Oct 3, 2011 at 5:46 PM, Stefan Marr <[hidden email]> wrote:
best, Eliot |
Hi Eliot: On 04 Oct 2011, at 19:11, Eliot Miranda wrote: > so the VMMaker source is probably easier to read. Not for me ;) > First, thanks for the optimization; clearly > > (arg = 0 or: [(result // arg) = rcvr and: [self isIntegerValue: result]]) > > > I'd suspect the definition of isIntegerValue:, which is: No, not according to my testing with the RoarVM code base. isIntegerValue works fine. Extra casts do not make a difference, but that might also be because the return type of our isIntegerValue is already our equivalent of sqInt (oop_int_t). So, I am pretty sure it is not that. So what remains is: `(result // arg) = rcvr`/`((r / ai) == ri)` or in context: oop_int_t ri = rcvr.integerValue(); oop_int_t ai = arg.integerValue(); oop_int_t r = ri * ai; if (ai == 0 || (((r / ai) == ri) && Oop::isIntegerValue(r))) { Lets look at some beautiful assembly :-/ This is RoarVM's bytecodePrimMultiply, C++ code for reference below. This is the output with clang and -O2. I think the critical part is: 0x00032f04 <+0068> sar %ebx // oop_int_t ri = rcvr.integerValue(); 0x00032f06 <+0070> sar %edx // oop_int_t ai = arg.integerValue(); 0x00032f08 <+0072> imul %edx,%ebx // oop_int_t r = ri * ai // The following should be: if (ai == 0 || (((r / ai) == ri) && Oop::isIntegerValue(r))) { 0x00032f0b <+0075> lea (%ebx,%ebx,1),%eax 0x00032f0e <+0078> test %edx,%edx 0x00032f10 <+0080> je 0x32f1a <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+90> 0x00032f12 <+0082> xor %eax,%ebx 0x00032f14 <+0084> js 0x3301f <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+351> I don't fully understand what that means, my assembly is rather rusty... The full code is at the end of the mail. But first for comparison my quick fix: oop_int_t ri = rcvr.integerValue(); oop_int_t ai = arg.integerValue(); long long result_with_overflow = (long long)ri * ai; if (ai == 0 || Oop::isIntegerValue(result_with_overflow)) { internalPopThenPush(2, Oop::from_int(result_with_overflow)); fetchNextBytecode(); return; } This results in the following code, relevant for this snippet: 0x00032ee4 <+0068> sar %ebx // oop_int_t ri = rcvr.integerValue(); 0x00032ee6 <+0070> sar %edx // oop_int_t ai = arg.integerValue(); 0x00032ee8 <+0072> mov %ebx,%eax 0x00032eea <+0074> imul %edx 0x00032eec <+0076> test %ebx,%ebx 0x00032eee <+0078> jne 0x32ef5 <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+85> 0x00032ef0 <+0080> lea (%eax,%eax,1),%ecx 0x00032ef3 <+0083> jmp 0x32f15 <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+117> 0x00032ef5 <+0085> mov %eax,%ecx 0x00032ef7 <+0087> xor %eax,%ecx 0x00032ef9 <+0089> mov %eax,%edi 0x00032efb <+0091> sar $0x1f,%edi 0x00032efe <+0094> xor %edx,%edi 0x00032f00 <+0096> or %ecx,%edi 0x00032f02 <+0098> jne 0x3301a <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+378> 0x00032f08 <+0104> lea (%eax,%eax,1),%ecx 0x00032f0b <+0107> mov %ecx,%edx 0x00032f0d <+0109> xor %eax,%edx 0x00032f0f <+0111> js 0x3301a <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+378> Well, I do not really get it. But it sure looks like my quick fix could be a bit more expensive... Best regards Stefan Here the assembly generated for the original code: // Starting with the prolog 0x00032ec0 <+0000> push %ebp 0x00032ec1 <+0001> mov %esp,%ebp 0x00032ec3 <+0003> push %ebx 0x00032ec4 <+0004> push %edi 0x00032ec5 <+0005> push %esi 0x00032ec6 <+0006> sub $0x3c,%esp 0x00032ec9 <+0009> call 0x32ece <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+14> 0x00032ece <+0014> pop %edi 0x00032ecf <+0015> mov 0x8(%ebp),%esi // looks like some test for the Int_Tag 0x00032ed2 <+0018> testb $0x1,0x7196(%esi) 0x00032ed9 <+0025> je 0x3304f <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+399> 0x00032edf <+0031> mov 0x7188(%esi),%eax 0x00032ee5 <+0037> mov -0x4(%eax),%ebx 0x00032ee8 <+0040> mov (%eax),%edx 0x00032eea <+0042> mov %ebx,%ecx // looks like the second test for the Int_Tag 0x00032eec <+0044> and $0x1,%ecx 0x00032eef <+0047> test %ecx,%edx 0x00032ef1 <+0049> je 0x32f3c <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+124> 0x00032ef3 <+0051> test %ecx,%ecx 0x00032ef5 <+0053> je 0x33083 <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+451> 0x00032efb <+0059> test $0x1,%dl 0x00032efe <+0062> je 0x33083 <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+451> 0x00032f04 <+0068> sar %ebx // oop_int_t ri = rcvr.integerValue(); 0x00032f06 <+0070> sar %edx // oop_int_t ai = arg.integerValue(); 0x00032f08 <+0072> imul %edx,%ebx // oop_int_t r = ri * ai // The following should be: if (ai == 0 || (((r / ai) == ri) && Oop::isIntegerValue(r))) { 0x00032f0b <+0075> lea (%ebx,%ebx,1),%eax 0x00032f0e <+0078> test %edx,%edx 0x00032f10 <+0080> je 0x32f1a <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+90> 0x00032f12 <+0082> xor %eax,%ebx 0x00032f14 <+0084> js 0x3301f <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+351> // The following is the successful case, and the remaining body of bytecodePrimMultiply. 0x00032f1a <+0090> or $0x1,%eax 0x00032f1d <+0093> mov %eax,-0x10(%ebp) 0x00032f20 <+0096> mov -0x10(%ebp),%eax 0x00032f23 <+0099> mov %eax,0x8(%esp) 0x00032f27 <+0103> mov %esi,(%esp) 0x00032f2a <+0106> movl $0x2,0x4(%esp) 0x00032f32 <+0114> call 0x36d70 <_ZN18Squeak_Interpreter19internalPopThenPushEi3Oop> 0x00032f37 <+0119> jmp 0x33015 <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+341> 0x00032f3c <+0124> movb $0x1,0x4180(%esi) 0x00032f43 <+0131> mov 0x7184(%esi),%ecx 0x00032f49 <+0137> mov %ecx,0x4168(%esi) 0x00032f4f <+0143> mov %eax,0x416c(%esi) 0x00032f55 <+0149> mov 0x718c(%esi),%eax 0x00032f5b <+0155> mov (%eax),%ecx 0x00032f5d <+0157> and $0x3,%ecx 0x00032f60 <+0160> mov %edx,-0x2c(%ebp) 0x00032f63 <+0163> mov 0x6e3e6(%edi),%edx 0x00032f69 <+0169> movsbl (%edx,%ecx,1),%ecx 0x00032f6d <+0173> shl $0x2,%ecx 0x00032f70 <+0176> neg %ecx 0x00032f72 <+0178> mov (%eax,%ecx,1),%ecx 0x00032f75 <+0181> and $0xfffffffc,%ecx 0x00032f78 <+0184> mov %ecx,0x3c(%esi) 0x00032f7b <+0187> mov %eax,0x4164(%esi) 0x00032f81 <+0193> movb $0x1,0x7195(%esi) 0x00032f88 <+0200> lea -0x18(%ebp),%eax 0x00032f8b <+0203> mov %eax,(%esp) 0x00032f8e <+0206> movl $0x1,0x4(%esp) 0x00032f96 <+0214> call 0x75070 <_ZN17Safepoint_AbilityC1Eb> 0x00032f9b <+0219> mov %ebx,-0x20(%ebp) 0x00032f9e <+0222> mov -0x2c(%ebp),%eax 0x00032fa1 <+0225> mov %eax,-0x28(%ebp) 0x00032fa4 <+0228> mov -0x28(%ebp),%eax 0x00032fa7 <+0231> mov %eax,0x8(%esp) 0x00032fab <+0235> mov -0x20(%ebp),%eax 0x00032fae <+0238> mov %eax,0x4(%esp) 0x00032fb2 <+0242> mov %esi,(%esp) 0x00032fb5 <+0245> call 0x3ccf0 <_ZN18Squeak_Interpreter22primitiveFloatMultiplyE3OopS0_> 0x00032fba <+0250> lea -0x18(%ebp),%eax 0x00032fbd <+0253> mov %eax,(%esp) 0x00032fc0 <+0256> call 0x75120 <_ZN17Safepoint_AbilityD1Ev> 0x00032fc5 <+0261> testb $0x1,0x7195(%esi) 0x00032fcc <+0268> je 0x330b7 <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+503> 0x00032fd2 <+0274> mov 0x72e0(%esi),%eax 0x00032fd8 <+0280> testb $0x1,(%eax) 0x00032fdb <+0283> jne 0x330eb <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+555> 0x00032fe1 <+0289> mov 0x4168(%esi),%eax 0x00032fe7 <+0295> mov %eax,0x7184(%esi) 0x00032fed <+0301> mov 0x416c(%esi),%eax 0x00032ff3 <+0307> mov %eax,0x7188(%esi) 0x00032ff9 <+0313> mov 0x4164(%esi),%eax 0x00032fff <+0319> mov %eax,0x718c(%esi) 0x00033005 <+0325> movb $0x1,0x7196(%esi) 0x0003300c <+0332> testb $0x1,0x4180(%esi) 0x00033013 <+0339> je 0x3301f <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+351> 0x00033015 <+0341> mov %esi,(%esp) 0x00033018 <+0344> call 0x35f30 <_ZN18Squeak_Interpreter17fetchNextBytecodeEv> 0x0003301d <+0349> jmp 0x33047 <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+391> 0x0003301f <+0351> mov 0x8(%esi),%eax 0x00033022 <+0354> mov (%eax),%eax 0x00033024 <+0356> and $0xfffffffc,%eax 0x00033027 <+0359> mov 0x60(%eax),%eax 0x0003302a <+0362> mov (%eax),%eax 0x0003302c <+0364> and $0xfffffffc,%eax 0x0003302f <+0367> mov 0x44(%eax),%eax 0x00033032 <+0370> mov %eax,0x44(%esi) 0x00033035 <+0373> movl $0x1,0x71a0(%esi) 0x0003303f <+0383> mov %esi,(%esp) 0x00033042 <+0386> call 0x36460 <_ZN18Squeak_Interpreter10normalSendEv> 0x00033047 <+0391> add $0x3c,%esp 0x0003304a <+0394> pop %esi 0x0003304b <+0395> pop %edi 0x0003304c <+0396> pop %ebx 0x0003304d <+0397> pop %ebp 0x0003304e <+0398> ret 0x0003304f <+0399> lea 0x5e81e(%edi),%eax 0x00033055 <+0405> mov %eax,0x10(%esp) 0x00033059 <+0409> lea 0x62a4e(%edi),%eax 0x0003305f <+0415> mov %eax,0xc(%esp) 0x00033063 <+0419> lea 0x615ae(%edi),%eax 0x00033069 <+0425> mov %eax,0x4(%esp) 0x0003306d <+0429> lea 0x62a3e(%edi),%eax 0x00033073 <+0435> mov %eax,(%esp) 0x00033076 <+0438> movl $0xd1,0x8(%esp) 0x0003307e <+0446> call 0x6db20 <_Z14assert_failurePKcS0_iS0_S0_> 0x00033083 <+0451> lea 0x5e81e(%edi),%eax 0x00033089 <+0457> mov %eax,0x10(%esp) 0x0003308d <+0461> lea 0x62b61(%edi),%eax 0x00033093 <+0467> mov %eax,0xc(%esp) 0x00033097 <+0471> lea 0x62b23(%edi),%eax 0x0003309d <+0477> mov %eax,0x4(%esp) 0x000330a1 <+0481> lea 0x62b16(%edi),%eax 0x000330a7 <+0487> mov %eax,(%esp) 0x000330aa <+0490> movl $0x3f,0x8(%esp) 0x000330b2 <+0498> call 0x6db20 <_Z14assert_failurePKcS0_iS0_S0_> 0x000330b7 <+0503> lea 0x5e81e(%edi),%eax 0x000330bd <+0509> mov %eax,0x10(%esp) 0x000330c1 <+0513> lea 0x6298b(%edi),%eax C++ code: void Squeak_Interpreter::bytecodePrimMultiply() { Oop rcvr = internalStackValue(1); Oop arg = internalStackValue(0); if (areIntegers(rcvr, arg)) { oop_int_t ri = rcvr.integerValue(); oop_int_t ai = arg.integerValue(); oop_int_t r = ri * ai; if (ai == 0 || (((r / ai) == ri) && Oop::isIntegerValue(r))) { internalPopThenPush(2, Oop::from_int(r)); fetchNextBytecode(); return; } } else { successFlag = true; externalizeIPandSP(); { Safepoint_Ability sa(true); primitiveFloatMultiply(rcvr, arg); } internalizeIPandSP(); if (successFlag) { fetchNextBytecode(); return; } } roots.messageSelector = specialSelector(8); set_argumentCount(1); normalSend(); } -- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525 |
On Tue, 4 Oct 2011, Stefan Marr wrote: snip > Lets look at some beautiful assembly :-/ > > This is RoarVM's bytecodePrimMultiply, C++ code for reference below. This is the output with clang and -O2. > > I think the critical part is: > > 0x00032f04 <+0068> sar %ebx // oop_int_t ri = rcvr.integerValue(); > 0x00032f06 <+0070> sar %edx // oop_int_t ai = arg.integerValue(); > 0x00032f08 <+0072> imul %edx,%ebx // oop_int_t r = ri * ai > // The following should be: if (ai == 0 || (((r / ai) == ri) && Oop::isIntegerValue(r))) { > 0x00032f0b <+0075> lea (%ebx,%ebx,1),%eax This is the first part of the isIntegerValue check (assign the value of r1 + (r1 * 1) to the eax register). > 0x00032f0e <+0078> test %edx,%edx > 0x00032f10 <+0080> je 0x32f1a <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+90> This is the ai == 0 check. > 0x00032f12 <+0082> xor %eax,%ebx > 0x00032f14 <+0084> js 0x3301f <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+351> This is the second part of the #isIntegerValue check (xor 2 * r1 with r1) So the (r / ai) == ri was assumed to be true by the compiler, therefore not checked. Levente |
Hi Levente: On 04 Oct 2011, at 22:25, Levente Uzonyi wrote: > On Tue, 4 Oct 2011, Stefan Marr wrote: > > snip > >> 0x00032f04 <+0068> sar %ebx // oop_int_t ri = rcvr.integerValue(); >> 0x00032f06 <+0070> sar %edx // oop_int_t ai = arg.integerValue(); >> 0x00032f08 <+0072> imul %edx,%ebx // oop_int_t r = ri * ai >> // The following should be: if (ai == 0 || (((r / ai) == ri) && Oop::isIntegerValue(r))) { >> 0x00032f0b <+0075> lea (%ebx,%ebx,1),%eax > > This is the first part of the isIntegerValue check (assign the value of r1 + (r1 * 1) to the eax register). > >> 0x00032f0e <+0078> test %edx,%edx >> 0x00032f10 <+0080> je 0x32f1a <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+90> > > This is the ai == 0 check. >> 0x00032f12 <+0082> xor %eax,%ebx >> 0x00032f14 <+0084> js 0x3301f <_ZN18Squeak_Interpreter20bytecodePrimMultiplyEv+351> > > This is the second part of the #isIntegerValue check (xor 2 * r1 with r1) > So the (r / ai) == ri was assumed to be true by the compiler, therefore not checked. Ok, now the question is probably whether that is a legal transformation, and if so, what the best alternative is. Using the `long long` to be able to check for overflow might not be the most efficient one. Thanks Stefan -- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525 |
In reply to this post by Eliot Miranda-2
On Tue, Oct 04, 2011 at 10:11:33AM -0700, Eliot Miranda wrote: > > Hi Stefan, > > so the VMMaker source is probably easier to read. First, thanks for the > optimization; clearly > > (arg = 0 or: [(result // arg) = rcvr and: [self isIntegerValue: result]]) > > > is better than the original > > ((arg = 0 or: [(result // arg) = rcvr]) and: [self isIntegerValue: result]) > > > I'd suspect the definition of isIntegerValue:, which is: > > ObjectMemory methods for interpreter access > isIntegerValue: intValue > "Return true if the given value can be represented as a Smalltalk integer > value." > "Use a shift and XOR to set the sign bit if and only if the top two bits of > the given value are the same, then test the sign bit. Note that the top two > bits are equal for exactly those integers in the range that can be > represented in 31-bits or 63-bits." > > ^ (intValue bitXor: (intValue << 1)) >= 0 > > The comment makes sense but there is perhaps nothing to guarantee that a C > compiler must consider the type of signed ^ signed as signed, in which case > the compiler would assume it is always >= 0, and the overflow detection > would fail. > > If you change the method to > > ObjectMemory methods for interpreter access > isIntegerValue: intValue > "Return true if the given value can be represented as a Smalltalk integer > value." > "Use a shift and XOR to set the sign bit if and only if the top two bits of > the given value are the same, then test the sign bit. Note that the top two > bits are equal for exactly those integers in the range that can be > represented in 31-bits or 63-bits." > > ^(intValue bitXor: (intValue << 1)) asInteger >= 0 > > which should force a cast to sqInt, does that fix things? > > i.e. it should produce > > if (((arg == 0) > || ((result / arg) == rcvr)) > && ((sqInt)((result ^ (result << 1))) >= 0)) { The current implementation of isIntegerValue does this type declaration properly. Background: http://bugs.squeak.org/view.php?id=5238 The implementation of ObjectMemory>>isIntegerValue: in VMMaker trunk is as follows: isIntegerValue: intValue "Return true if the given value can be represented as a Smalltalk integer value." "Use a shift and XOR to set the sign bit if and only if the top two bits of the given value are the same, then test the sign bit. Note that the top two bits are equal for exactly those integers in the range that can be represented in 31-bits or 63-bits. Operands are coerced to machine integer size so the test will work with 64 bit images on 32 bit hosts. When running on a 32 bit host, the cast to int has little or no performance impact for either 32 bit or 64 bit images. On a 64 bit host, the shift and XOR test is replaced by an explicit range check, which provides the best performance for both 32 bit and 64 bit images. If the range of small integers is enlarged for 64 bit images, this method must be updated accordingly." ^ self isDefined: 'SQ_HOST32' inSmalltalk: [true] comment: 'cast to int for 64 bit image on 32 bit host' ifTrue: ((self cCoerce: intValue to: 'int') bitXor: ((self cCoerce: intValue to: 'int') << 1)) >= 0 ifFalse: (intValue >= 16r-40000000 and: [intValue <= 16r3FFFFFFF]) When translated to C this becomes: sqInt isIntegerValue(sqInt intValue) { return # ifdef SQ_HOST32 // cast to int for 64 bit image on 32 bit host (((((int) intValue)) ^ ((((int) intValue)) << 1)) >= 0) # else ((intValue >= -1073741824) && (intValue <= 1073741823)) # endif // SQ_HOST32 ; } So on a 32-bit platform this ensures that the check will use the following signed comparison: sqInt isIntegerValue(sqInt intValue) { return (((((int) intValue)) ^ ((((int) intValue)) << 1)) >= 0); } Dave > > On Mon, Oct 3, 2011 at 5:46 PM, Stefan Marr <[hidden email]> wrote: > > > > > Dear all: > > > > With the RoarVM, I was hitting a problem with Clang 3.0 > > (tags/Apple/clang-211.9). > > > > It turned out that Clang does not like the current implementation of > > bytecodePrimMultiply. > > > > From what I can tell, it should be identical with what I found in the > > SqueakVM sources. > > > > RoarVM src: > > https://github.com/smarr/RoarVM/blob/master/vm/src/interpreter/interpreter_bytecodes.cpp#L316 > > > > oop_int_t ri = rcvr.integerValue(); > > oop_int_t ai = arg.integerValue(); > > oop_int_t r = ri * ai; > > if (ai == 0 || (r / ai == ri && Oop::isIntegerValue(r))) { > > internalPopThenPush(2, Oop::from_int(r)); > > fetchNextBytecode(); > > return; > > } > > > > Squeak src: > > > > http://squeakvm.org/cgi-bin/viewcvs.cgi/branches/Cog/src/vm/cointerp.c?rev=2497&view=auto > > > > rcvr = (rcvr >> 1); > > arg = (arg >> 1); > > result = rcvr * arg; > > if (((arg == 0) > > || ((result / arg) == rcvr)) > > && ((result ^ (result << 1)) >= 0)) { > > /* begin internalPop:thenPush: */ > > longAtPointerput((localSP += (2 - 1) * BytesPerWord), > > ((result << 1) | 1)); > > /* begin fetchNextBytecode */ > > currentBytecode = byteAtPointer(++localIP); > > goto l69; > > } > > > > > > The problematic example is rcvr: 40453, arg: 864001. > > > > In that case, the current implementation does not detect the overflow > > correctly when Clang is used with >= O1 optimization. > > > > The big question for me now is, whether we got here a Clang compiler bug, > > or unspecified C behavior. > > > > > > My quick fix is: > > > > oop_int_t ri = rcvr.integerValue(); > > oop_int_t ai = arg.integerValue(); > > long long result_with_overflow = (long long)ri * ai; > > if (ai == 0 || ((result_with_overflow >= -1073741824) && > > (result_with_overflow <= 1073741823))) > > { > > internalPopThenPush(2, Oop::from_int(result_with_overflow)); > > fetchNextBytecode(); > > return; > > } > > > > This works obviously, but I assume that the original code is a well > > considered speed optimizations. > > > > Any thoughts? > > > > Thanks > > Stefan > > > > -- > > Stefan Marr > > Software Languages Lab > > Vrije Universiteit Brussel > > Pleinlaan 2 / B-1050 Brussels / Belgium > > http://soft.vub.ac.be/~smarr > > Phone: +32 2 629 2974 > > Fax: +32 2 629 3525 > > > > > > > -- > best, > Eliot |
Free forum by Nabble | Edit this page |