After running Croquet for a while, a floating-point (FP) exception occurred. So I restarted it from scratch, started "Croquet (Master)", and left it to run. I did not enter the world or move anything around. Eventually, the FP exception occurred.

I searched the developers' archives for "floating point exception", but couldn't find anything.

I cannot debug it much; it runs very slowly. But I took a screenshot of it and attached it. It might tell someone something. I'm going to attempt to attach the screenshot to this message, hoping that this mailing list accepts attachments.

This is the standard off-the-shelf SDK 1.0 from the croquetproject website on Linux:

$ uname -a
Linux IVES 2.6.25-gentoo-r7-a #3 SMP PREEMPT Wed Oct 1 14:45:40 PDT 2008 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 5000+ AuthenticAMD GNU/Linux

--
Brad Fuller

Attachment: FPE.png (29K)

That's kind of interesting.

The problem seems to be that a 3D vector is somehow reporting a negative squared length. Taking the square root of a negative number causes Croquet to throw an exception.

The length is computed with dot:, a primitive-backed method in FloatArray, which does this in Smalltalk:

    dot: aFloatVector
        "Primitive. Return the dot product of the receiver and the argument.
        Fail if the argument is not of the same size as the receiver."
        | result |
        <primitive:'primitiveDotProduct' module: 'FloatArrayPlugin'>
        self size = aFloatVector size ifFalse:[^self error:'Must be equal size'].
        result := 0.0.
        1 to: self size do:[:i|
            result := result + ((self at: i) * (aFloatVector at: i)).
        ].
        ^result

If the C code is doing what the Smalltalk claims to be doing, it should never return a negative number for a vector dotted with itself, unless someone's implemented complex floats :)

Steve

On Thu, Nov 6, 2008 at 8:04 AM, Brad Fuller <[hidden email]> wrote:
> After running croquet for a while, a FP exception occurred.

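For readers without a Croquet image at hand, the argument in Steve's message can be checked with a minimal Python sketch of what the Smalltalk fallback computes. The function names `dot` and `length` are illustrative, not Croquet's, and Python floats are 64-bit whereas a FloatArray holds 32-bit floats, so this mirrors only the logic, not the exact arithmetic.

```python
import math

def dot(a, b):
    """Dot product of two equal-sized sequences, mirroring FloatArray>>dot:."""
    if len(a) != len(b):
        raise ValueError("Must be equal size")
    result = 0.0
    for x, y in zip(a, b):
        result += x * y
    return result

def length(v):
    """Euclidean length: the square root of v dotted with itself."""
    return math.sqrt(dot(v, v))

v = [3.0, -4.0, 12.0]
print(dot(v, v))   # 169.0: every term is a square, so the sum cannot be negative
print(length(v))   # 13.0
```

With finite inputs, each product x * x is zero or positive, so the argument handed to the square root cannot be negative.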
Brad - very likely you have one of two problems:

1) A bad Croquet/Squeak VM.
2) A serious bug in your Linux kernel.

The stack trace suggests that the code which computes the length of a 3D vector is trying to take the sqrt of a number less than zero. This should NEVER happen.

What could cause it to happen is a processor floating-point exception flag that is left set and which triggers a faux exception in the sqrt operation.

I don't have your machine or the particular SDK. Try running it on another machine with a different Linux kernel...

(There have been many conflicts in the last few months about using the new GCC to compile the Linux kernel. There is assembly code in the x86 Linux kernel that expects GCC to treat a flag register one way, and GCC just decided to go another way, based on the C Standard, though who knows if it has affected your case.)

Brad Fuller wrote:
> After running croquet for a while, a FP exception occurred.

Thanks, all. I'll see what I can do.

brad

On Thu, Nov 6, 2008 at 9:05 AM, David P. Reed <[hidden email]> wrote:
> Brad - very likely you have one of two problems:
>
> 1) A bad Croquet/Squeak VM.
> 2) a serious bug in your Linux kernel.

--
Brad Fuller

Somehow you took the square root of a negative number. Sorry if I am late to the party in responding; I've been busy.

Regards,
Les H

On Thu, 2008-11-06 at 08:04 -0800, Brad Fuller wrote:
> After running croquet for a while, a FP exception occurred.

Les wrote:
> somehow you took the square root of a negative number. Sorry if I am
> late to the party in responding, I've been busy.

Yeah, but the weird thing is that the dot product of a vector with itself is never negative.

It might be interesting/enlightening to compute the dot products of some random vectors and verify that the results are sane. E.g., we would expect "1@2@3 dot: 4@5@6" to evaluate to 32.0; does it? It would be a very helpful test case if we could identify a pair of vectors that give an incorrect result.

Cheers,
Josh

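The sanity check Josh proposes could look roughly like the following. It is written in Python so it can be run anywhere. On the affected machine the same loop would have to be run inside the Croquet image against the real primitive. The helper name `find_bad_vector`, the value range, and the trial count are arbitrary choices for illustration.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# The example from the message: 1*4 + 2*5 + 3*6
print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0

def find_bad_vector(trials=10000, seed=0):
    """Return the first random vector whose self dot product is not >= 0, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        v = [rng.uniform(-1e6, 1e6) for _ in range(3)]
        d = dot(v, v)
        if not d >= 0.0:   # written this way so that a NaN result is caught too
            return v, d
    return None

print(find_bad_vector())  # None on a healthy floating-point implementation
```

Any vector this loop returned would be exactly the reproducible test case asked for above.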
If you look at the stack, the sqrt in question is in a calculation of the length of a vector.

This calculation is: sqrt(delta_x^2 + delta_y^2 + delta_z^2)

The ^2 makes all summands positive or zero. Thus, the square root NEVER has a negative argument in this case.

However, the FSQRT instruction in the x87 instruction subset does not clear the FPE exception flag. If the exception flag is checked only *after* the FSQRT (as is the convention in most C compilers), any prior FPE-setting instruction can have caused the exception, and in Linux these days, it can actually be inherited from other parallel processes, because of the bug I mentioned.

So in fact, the bug may actually be caused far, far away from where it appears if the problem is in the VM or the Linux kernel. And the Croquet call history is a path that is repeatedly recomputed (many times per second) and always gives the same answer based on the same data if the user is not moving the mouse, so the likelihood that the symptom occurs only after a long idle time due to "local" effects is near zero.

Les wrote:
> somehow you took the square root of a negative number. Sorry if I am
> late to the party in responding, I've been busy.

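The "stale flag" mechanism David describes can be illustrated with a toy model. This is not how the x87 or the Squeak VM is implemented. The class `ToyFPU` and its methods are invented here purely to show how a sticky status flag, inspected only after an operation, can blame a perfectly valid sqrt for something that happened earlier.

```python
import math

class ToyFPU:
    """Toy model of a floating-point unit with one sticky 'invalid' flag."""

    def __init__(self):
        self.invalid = False  # sticky: stays set until explicitly cleared

    def earlier_invalid_operation(self):
        # Stands in for any unrelated earlier code that sets the flag.
        self.invalid = True
        return math.nan

    def sqrt(self, x):
        if x < 0.0:
            self.invalid = True
            return math.nan
        return math.sqrt(x)   # does not clear a flag left by earlier code

    def checked_sqrt(self, x):
        """Inspect the flag only after the operation has run."""
        result = self.sqrt(x)
        if self.invalid:
            raise FloatingPointError("invalid operation flag is set")
        return result

fpu = ToyFPU()
print(fpu.checked_sqrt(9.0))       # 3.0, flag is clear
fpu.earlier_invalid_operation()    # unrelated code sets the flag
try:
    fpu.checked_sqrt(9.0)          # valid argument, but the stale flag is blamed on it
except FloatingPointError as e:
    print("spurious:", e)
```

In this model the error surfaces at the sqrt call even though the sqrt argument is fine, which is the sense in which the cause may lie "far, far away" from the symptom.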
FWIW, the version of #primitiveSqrt I have seems to generate the exception only if self < 0.0. While the FSQRT bug may be a problem elsewhere, it doesn't look like the culprit here.

Cheers,
Bob

On Nov 7, 2008, at 10:49 AM, David P. Reed wrote:
> However, the FSQRT function in the x87 instruction subset does not
> clear the FPE exception. If the exception flag is checked only
> *after* the FSQRT (as is the convention in most C compilers), any
> prior FPE setting instruction can have caused the exception, and in
> Linux these days, it can actually be inherited from other parallel
> processes, because of the bug I mentioned.

That's why it is a mystery! The code in question cannot pass a negative number. Ever.

Bob Arning wrote:
> FWIW, the version of #primitiveSqrt I have seems to generate the
> exception only if self < 0.0. While the FSQRT bug may be a problem
> elsewhere, it doesn't look like the culprit here.

Unless the compiler is broken or the primitive code is being incorrectly generated.

Can someone reproduce this problem in a debugger?

On Fri, Nov 7, 2008 at 8:27 AM, David P. Reed <[hidden email]> wrote:
> That's why it is a mystery! The code in question cannot pass a negative number. Ever.

Short of seeing this in a debugger, the SqueakDebug.log file has a lot of clues. Seeing the file from Brad's case might tell us a lot.

Cheers,
Bob

FloatingPointException: undefined if less than zero.
7 November 2008 11:34:51 am

VM: Mac OS - a SmalltalkImage
Image: Croquet1.0beta [latest update: #2]

SecurityManager state:
Restricted: false
FileAccess: true
SocketAccess: true
Working Dir /Users/bob/Desktop/Miscellaneous/CroquetSDK-1.0.18
Trusted Dir /Users/bob/Desktop/Miscellaneous/CroquetSDK-1.0.18
Untrusted Dir foobar/tooBar/forSqueak/bogus

Float>>primitiveSqrt
    Receiver: -1.0
    Arguments and temporary variables:
        exp: nil
        guess: nil
        eps: nil
        delta: nil
    Receiver's instance variables:
-1.0

Float>>sqrt
    Receiver: -1.0
    Arguments and temporary variables:

    Receiver's instance variables:
-1.0

SmallInteger(Number)>>sqrt
    Receiver: -1
    Arguments and temporary variables:

    Receiver's instance variables:
-1

UndefinedObject>>DoIt
    Receiver: nil
    Arguments and temporary variables:

    Receiver's instance variables:
nil

On Nov 7, 2008, at 11:31 AM, Steve Wart wrote:
> Unless the compiler is broken or the primitive code is being
> incorrectly generated.
>
> Can someone reproduce this problem in a debugger?

I was just examining the code again, and noticed the loop went from 1 to size. I haven't played with Squeak or Cobalt, so the array sizes may not be what I think, but in C and most other modern languages, arrays start at element 0 and go to size-1. If that is the case, could there be an "empty" number at the end of the array? If so, could code have changed that number to a negative, or could an overflow have happened? Also, the code says fail if sizes don't match; what is the fail condition? If a fail returns a minus 1, that could be the source of the error.

Regards,
Les H

On Fri, 2008-11-07 at 11:38 -0500, Bob Arning wrote:
> Short of seeing this in a debugger, the SqueakDebug.log file has a
> lot of clues. Seeing the file from Brad's case might tell us a lot.

And if what is past the end of the array happens to be a -inf or NaN, it may be that the negative would survive the squaring operation (not sure about what happens in these corner cases, but I think that NaN may pass every comparison test).

Russ

At 11:55 AM 11/11/2008, you wrote:
>I was just examining the code again, and noticed the loop went from 1 to
>size. I haven't played with squeak or cobalt, so the array sizes may
>not be what I think, but in C and most other modern languages, arrays
>start at element 0 and go to size-1. If that is the case, could there
>be an "empty" number at the end of the array?

---
Russell M. Taylor II, Ph.D.      [hidden email]
CB #3175, Sitterson Hall         www.cs.unc.edu/~taylorr
University of North Carolina,    Voice: (919) 962-1701
Chapel Hill, NC 27599-3175       FAX: (919) 962-1799

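The corner cases Russ is unsure about are fixed by IEEE 754 and are easy to check. A minimal Python sketch follows (Python floats are IEEE 754 doubles on common platforms; none of this is Croquet code):

```python
import math

nan = math.nan
neg_inf = -math.inf

print(neg_inf * neg_inf)        # inf: squaring -inf gives +inf, so the sign does not survive
print(math.isnan(nan * nan))    # True: NaN squared is still NaN
print(math.isnan(nan + 1.0))    # True: NaN propagates through the sum

# NaN fails every ordered comparison and equality; only != is true.
print(nan < 0.0, nan >= 0.0, nan == nan, nan != nan)   # False False False True
```

So a stray -inf would not produce a negative sum of squares, but a stray NaN would reach the square root as NaN, and a check such as "is it below zero?" would answer false for it.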
Please take me off your email list as of today.

On Nov 11, 2008, at 3:05 PM, Russell M. Taylor II wrote:
> And if what is past the end of the array happens to be a -inf or
> NaN, it may be that the negative would survive the squaring
> operation (not sure about what happens in these corner cases, but I
> think that NaN may pass every comparison test).
>
> Russ

NaN could well be what's happening:

    (Vector3 x: Float nan y: 1 z: 1) length

will produce the error first reported:

FloatingPointException: undefined if less than zero.
11 November 2008 11:04:59 pm

VM: Mac OS - a SmalltalkImage
Image: Croquet1.0beta [latest update: #2]

SecurityManager state:
Restricted: false
FileAccess: true
SocketAccess: true
Working Dir /Users/bob/Desktop/Miscellaneous/CroquetSDK-1.0.18
Trusted Dir /Users/bob/Desktop/Miscellaneous/CroquetSDK-1.0.18
Untrusted Dir foobar/tooBar/forSqueak/bogus

Float>>primitiveSqrt
    Receiver: NaN
    Arguments and temporary variables:
        exp: nil
        guess: nil
        eps: nil
        delta: nil
    Receiver's instance variables:
NaN

Float>>sqrt
    Receiver: NaN
    Arguments and temporary variables:

    Receiver's instance variables:
NaN

Vector3(FloatArray)>>length
    Receiver: a Vector3(NaN 1.0 1.0)
    Arguments and temporary variables:

    Receiver's instance variables:
a Vector3(NaN 1.0 1.0)

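If NaN is the culprit, the useful next question is which component went bad. A minimal Python sketch of such a diagnostic is below. The helper `bad_components` is hypothetical, and an equivalent check would have to be written in Smalltalk to run inside the image.

```python
import math

def length(v):
    return math.sqrt(sum(x * x for x in v))

def bad_components(v):
    """Indices (1-based, as in Smalltalk) of components that are NaN or infinite."""
    return [i for i, x in enumerate(v, start=1) if not math.isfinite(x)]

v = [math.nan, 1.0, 1.0]
print(math.isnan(length(v)))  # True: one NaN component poisons the whole length
print(bad_components(v))      # [1]
```

Note that Python's math.sqrt quietly returns NaN for a NaN argument, whereas the log above shows Croquet signalling a FloatingPointException in the same situation.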
In reply to this post by Russell M. Taylor II
Russell M. Taylor II wrote:
> And if what is past the end of the array happens to be a -inf or NaN, > it may be that the negative would survive the squaring operation (not > sure about what happens in these corner cases, but I think that NaN > may pass every comparison test). > We seem to be veering off into wild speculation. Let's see if we can avoid doing so... There are two methods that Les might be talking about when he was "just examining the code again": - FloatArray>>dot: - FloatArrayPlugin>>primitiveDotProduct: I'm assuming that we're talking about FloatArray>>dot:, since that's the one that is indexed from 1 to size. It's easy to verify that this is the way that FloatArray is supposed to be used. Try evaluating the following expressions: #(1 2 3) asFloatArray at: 1 "answers 1.0" #(1 2 3) asFloatArray at: 2 "answers 2.0" #(1 2 3) asFloatArray at: 3 "answers 3.0" #(1 2 3) asFloatArray at: 0 "error" #(1 2 3) asFloatArray at: 4 "error" So, if we are executing the Smalltalk code in FloatArray>>dot:, we can't do an out-of-bounds access. However, note that under normal conditions, the Smalltalk code in FloatArray>>dot: will never be executed. Instead, it will execute the primitive specified by FloatArrayPlugin>>primitiveDotProduct:. If you're not familiar with Squeak plugins, the gist is that #primitiveFloatPlugin: is translated to C, compiled with a C compiler, and dynamically linked into Croquet. The Smalltalk code in FloatArray>>dot: is evaluated only if the primitive fails (for example, if the two vectors have different lengths). Note that the primitive does bounds-checking; the problem doesn't appear to be there. My guess is still that the problem is something REALLY WEIRD. If it was a straight-forward bug in such heavily-used code, then users like Qwaq would have found it long ago. Cheers, Josh > Russ > > At 11:55 AM 11/11/2008, you wrote: >> I was just examining the code again, and noticed the loop went from 1 to >> size. 
>> I haven't played with squeak or cobalt, so the array sizes may
>> not be what I think, but in C and most other modern languages, arrays
>> start at element 0 and go to size-1. If that is the case, could there
>> be an "empty" number at the end of the array? If so could code have
>> changed that number to a negative, or could an overflow have happened.
>> Also the code says fail if sizes don't match, what is the fail
>> condition? If a fail returns a minus 1, that could be the source of the
>> error.
>>
>> Regards,
>> Les H
>> On Fri, 2008-11-07 at 11:38 -0500, Bob Arning wrote:
>> > Short of seeing this in a debugger, the SqueakDebug.log file has a
>> > lot of clues. Seeing the file from Brad's case might tell us a lot.
>> >
>> > Cheers,
>> > Bob
>> >
>> >
>> > FloatingPointException: undefined if less than zero.
>> > 7 November 2008 11:34:51 am
>> >
>> > VM: Mac OS - a SmalltalkImage
>> > Image: Croquet1.0beta [latest update: #2]
>> >
>> > SecurityManager state:
>> > Restricted: false
>> > FileAccess: true
>> > SocketAccess: true
>> > Working Dir /Users/bob/Desktop/Miscellaneous/CroquetSDK-1.0.18
>> > Trusted Dir /Users/bob/Desktop/Miscellaneous/CroquetSDK-1.0.18
>> > Untrusted Dir foobar/tooBar/forSqueak/bogus
>> >
>> > Float>>primitiveSqrt
>> > Receiver: -1.0
>> > Arguments and temporary variables:
>> > exp: nil
>> > guess: nil
>> > eps: nil
>> > delta: nil
>> > Receiver's instance variables:
>> > -1.0
>> >
>> > Float>>sqrt
>> > Receiver: -1.0
>> > Arguments and temporary variables:
>> >
>> > Receiver's instance variables:
>> > -1.0
>> >
>> > SmallInteger(Number)>>sqrt
>> > Receiver: -1
>> > Arguments and temporary variables:
>> >
>> > Receiver's instance variables:
>> > -1
>> >
>> > UndefinedObject>>DoIt
>> > Receiver: nil
>> > Arguments and temporary variables:
>> >
>> > Receiver's instance variables:
>> > nil
>> >
>> > On Nov 7, 2008, at 11:31 AM, Steve Wart wrote:
>> >
>> > > Unless the compiler is broken or the primitive code is being
>> > > incorrectly generated.
>> > >
>> > > Can someone reproduce this problem in a debugger?
>> > >
>> > > On Fri, Nov 7, 2008 at 8:27 AM, David P. Reed <[hidden email]> wrote:
>> > > That's why it is a mystery! The code in question cannot pass a
>> > > negative number. Ever.
>> > >
>> > >
>> > > Bob Arning wrote:
>> > > FWIW, the version of #primitiveSqrt I have seems to generate the
>> > > exception only if self < 0.0. While the FSQRT bug may be a problem
>> > > elsewhere, it doesn't look like the culprit here.
>> > >
>> > > Cheers,
>> > > Bob
>> > >
>> > >
>> > > On Nov 7, 2008, at 10:49 AM, David P. Reed wrote:
>> > >
>> > > If you look at the stack, the sqrt in question is in a calculation
>> > > of the length of a vector.
>> > >
>> > > This calculation is: sqrt(delta_x^2 + delta_y^2 + delta_z^2)
>> > >
>> > > The ^2 makes all summands positive or zero. Thus, the square root
>> > > NEVER has a negative argument in this case.
>> > >
>> > > However, the FSQRT function in the x87 instruction subset does not
>> > > clear the FPE exception. If the exception flag is checked only
>> > > *after* the FSQRT (as is the convention in most C compilers), any
>> > > prior FPE setting instruction can have caused the exception, and in
>> > > Linux these days, it can actually be inherited from other parallel
>> > > processes, because of the bug I mentioned.
>> > >
>> > > So in fact, the bug may actually be caused far, far away from where
>> > > it appears if the problem is in the VM or the Linux kernel. And
>> > > the Croquet call history is a path that is repeatedly recomputed
>> > > (many times per second) and always gives the same answer based on
>> > > the same data if the user is not moving the mouse, so the
>> > > likelihood that the symptom occurs only after a long idle time due
>> > > to "local" effects is near zero.
>> > >
>> > >
>> > >
>> > > Les wrote:
>> > > somehow you took the squareroot of a negative number. Sorry if I am
>> > > late to the party in responding, I've been busy.
>> > >
>> > > Regards,
>> > > Les H
>> > > On Thu, 2008-11-06 at 08:04 -0800, Brad Fuller wrote:
>> > >
>> > > After running croquet for a while, a FP except ion occured. So, I
>> > > restarted it from scratch and started "Croquet (Master)" and left it
>> > > to run. I did not enter to world and move anything around. Eventually,
>> > > the FP exception occured.
>> > >
>> > > I searched the developers archives for floating point exception, but
>> > > couldn't find anything.
>> > >
>> > > I can not debug it much, it runs very slowly. But, i took a screen
>> > > shot of it and attached it. It might tell someone something. I'm going
>> > > to attempt to attach the screenshot to this msg, hoping that this
>> > > mailing list accepts attachments.
>> > >
>> > > This is the standard off-the-shelf SDK 1.0 from the croquetproject
>> > > website on Linux:
>> > >
>> > > $ uname -a
>> > > Linux IVES 2.6.25-gentoo-r7-a #3 SMP PREEMPT Wed Oct 1 14:45:40 PDT
>> > > 2008 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 5000+
>> > > AuthenticAMD GNU/Linux
>> > >
>
> ---
> Russell M. Taylor II, Ph.D. [hidden email]
> CB #3175, Sitterson Hall www.cs.unc.edu/~taylorr
> University of North Carolina, Voice: (919) 962-1701
> Chapel Hill, NC 27599-3175 FAX: (919) 962-1799 |
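As a rough illustration of the primitive-plus-fallback arrangement Josh describes, here is a minimal C sketch. It is not the generated FloatArrayPlugin code: the names try_dot_product and length_squared are hypothetical, and the only things taken from the thread are the equal-size check and the sum-of-products loop. In this sketch a size mismatch is reported as a failure, not as a sentinel number such as -1, and the loop never indexes past the end of either array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a plugin primitive (not the real generated code).
   Returns false ("primitive failed") when the sizes differ, so the caller
   never receives a made-up numeric result such as -1. */
bool try_dot_product(const float *a, size_t a_len,
                     const float *b, size_t b_len,
                     double *result)
{
    assert(a != NULL && b != NULL && result != NULL);
    if (a_len != b_len) {
        return false;
    }
    double sum = 0.0;
    /* C indexes 0..size-1; the Smalltalk loop visits the same elements as 1..size. */
    for (size_t i = 0; i < a_len; i++) {
        sum += (double)a[i] * (double)b[i];
    }
    *result = sum;
    return true;
}

/* Squared length of a vector: the dot product of the vector with itself.
   Each term is a square, so for finite inputs the sum cannot be negative. */
double length_squared(const float *v, size_t len)
{
    double result = 0.0;
    bool ok = try_dot_product(v, len, v, len, &result);
    assert(ok);  /* sizes are equal by construction */
    (void)ok;    /* avoids an unused-variable warning when NDEBUG is defined */
    return result;
}
```

With v = {3, -4, 0}, length_squared gives 25, and asking for the dot product of a 3-element and a 2-element vector reports failure and leaves the output untouched.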
Of course you are right, Josh. I didn't realize that the function in question was a library function, which is why I made the suggestion I did.

Regards,
Les H

On Wed, 2008-11-12 at 01:05 -0800, Joshua Gargus wrote:
> However, note that under normal conditions, the Smalltalk code in
> FloatArray>>dot: will never be executed. Instead, it will execute the
> primitive specified by FloatArrayPlugin>>primitiveDotProduct:.
|
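On Russell's question about -inf and NaN: under IEEE 754 arithmetic, squaring -inf gives +inf, squaring NaN gives NaN, and every ordered comparison involving NaN is false (only != is true). The C sketch below, with hypothetical helper names, applies the "self < 0.0" test that Bob described for #primitiveSqrt to such sums. It assumes the platform's double follows IEEE 754 and that the code is built without options such as -ffast-math.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Same condition Bob reports for #primitiveSqrt: complain only if self < 0.0.
   The function name is made up for this illustration. */
bool sqrt_check_would_fire(double x)
{
    return x < 0.0;
}

/* Sum of squares, as in a vector-length calculation before the sqrt. */
double sum_of_squares(const double *v, size_t len)
{
    assert(v != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < len; i++) {
        sum += v[i] * v[i];
    }
    return sum;
}
```

So even if a stray -inf or NaN did end up in the vector, the sum of squares would be +inf or NaN, neither of which compares less than zero, and a check of that form would not fire on it.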