Hello gods of FFI and of the VM,

I have been blocked for a while in the development of Smallapack (a Smalltalk interface to LAPACK) on Squeak. I have isolated some very strange behaviour in a small test case (see below): I want to call DLANGE, a LAPACK FORTRAN routine that computes the norm of a matrix, and on the seventh call I always get a strange result (the result is 0.0, but it answers false to = 0.0).

This fails on Unix with image 3.9alpha7029. The VM is:

    squeak -version
    3.7-7 #1 Sat Mar 19 13:12:20 PST 2005 gcc 3.3.5
    Squeak3.7 of '4 September 2004' [latest update: #5989]
    Linux squeak.hpl.hp.com 2.4.27-1-386 #1 Fri Sep 3 06:24:46 UTC 2004 i686 GNU/Linux
    default plugin location: /usr/local/lib/squeak/3.7-7/*.so

The same code seems to work on Windows (I tried it today)... Beware, I presume this can corrupt your image. Maybe I am doing something wrong; could someone explain, please?

Nicolas

----------------------------------------------------------------------

My definition of dlange2.c (translated with f2c, then modified to simply answer 0.0) is:

    /* #include "f2c.h" */
    typedef double doublereal;
    typedef long integer;
    typedef long ftnlen;

    /*< DOUBLE PRECISION FUNCTION DLANGE( NORM, M, N, A, LDA, WORK ) >*/
    doublereal dlange2_(char *norm, integer *m, integer *n, doublereal *a,
                        integer *lda, doublereal *work, ftnlen norm_len)
    {
        return 0.0;
    } /* dlange2_ */

You just compile with:

    gcc -c dlange2.c; ld -shared -o libdlange2.so dlange2.o

and call from Squeak with:

    ExternalLibrary subclass: #DLANGE2Library
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Smallapack-Test-DLANGE'

    DLANGE2Library class>>moduleName
        ^'dlange2'

    DLANGE2Library>>dlange2Withnorm: norm m: m n: n a: a lda: lda work: work length: lengthOfnorm
        <cdecl: double 'dlange2_'( char * long * long * double * long * double * long )>
        ^self externalCallFailed

    DLANGE2Library class>>testDLANGE2
        "DLANGE2Library testDLANGE2"
        | a m n lda norm cm cn clda work |
        m := lda := 3.
        n := 4.
        a := ExternalData fromHandle: (ByteArray new: m*n*8) type: ExternalType double.
        "AS FORTRAN IS PASSING POINTERS, DO ALLOCATE ExternalData"
        cm := ExternalData fromHandle: ((ByteArray new: 4) signedLongAt: 1 put: m; yourself) type: ExternalType long.
        cn := ExternalData fromHandle: ((ByteArray new: 4) signedLongAt: 1 put: n; yourself) type: ExternalType long.
        clda := ExternalData fromHandle: ((ByteArray new: 4) signedLongAt: 1 put: lda; yourself) type: ExternalType long.
        norm := 'M'.
        work := nil.
        ^(1 to: 10) collect: [:i |
            (self new dlange2Withnorm: norm m: cm n: cn a: a lda: clda work: work length: 1) = 0.0]
        "you always obtain false from the seventh entry on..."
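For comparison, the same stub can be driven from plain C, with every argument passed by pointer as the Fortran calling convention expects. This is a minimal sketch, not part of the original test case: `count_zero_results` is a hypothetical helper, the stub is copied from dlange2.c, and `long` is assumed to match the 4-byte slots allocated in the Smalltalk code (true on the i686 Linux box above). If all ten calls compare equal to 0.0 here, that would point away from the stub itself and towards the path between the FFI call and the Smalltalk comparison.

```c
#include <stddef.h>

typedef double doublereal;
typedef long integer;
typedef long ftnlen;

/* Same stub as dlange2.c: ignores its arguments and answers 0.0. */
doublereal dlange2_(char *norm, integer *m, integer *n, doublereal *a,
                    integer *lda, doublereal *work, ftnlen norm_len)
{
    (void)norm; (void)m; (void)n; (void)a;
    (void)lda; (void)work; (void)norm_len;
    return 0.0;
}

/* Hypothetical driver mirroring the Smalltalk loop: calls the stub
   `calls` times with a 3x4 matrix and counts how many results are
   exactly 0.0. WORK is passed as NULL, like `work := nil`. */
int count_zero_results(int calls)
{
    integer m = 3, n = 4, lda = 3;
    doublereal a[3 * 4] = {0};
    char norm = 'M';
    int ok = 0;

    for (int i = 0; i < calls; i++) {
        if (dlange2_(&norm, &m, &n, a, &lda, NULL, 1) == 0.0)
            ok++;
    }
    return ok;
}
```

Called as `count_zero_results(10)`, it should answer 10, since the stub returns the literal 0.0 on every call.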
I am attaching the files...
Attachments: DLANGE2Library.st (1K), libdlange2.so (2K), dlange2.c (336 bytes)
Hi Nicolas,

I just tested this:

    ^(1 to: 10) collect: [:i |
        r := (self new dlange2Withnorm: norm m: cm n: cn a: a lda: clda work: work length: 1) = 0.0.
        Transcript show: r asString.
        r]

and it works... but if I remove the "Transcript show:" I get:

    #(true true false false false false false false false false)

Vincent

2006/6/28, nicolas cellier:
> Nobody alive on the thread?
In reply to this post by Nicolas Cellier-3
Hello Vincent,

Yes, the FFI call always returns 0.0, but the VM then fails somewhere afterwards, when testing = 0.0. If you execute it step by step in the debugger, it does not fail... something weird. I guess we'd better quit without saving the image when such symptoms occur...

Nicolas

bouchet vincent:
> 2006/6/28, nicolas cellier:
>> Nobody alive on the thread?
>> The bug is now at http://bugs.impara.de/view.php?id=3929
>> Nicolas

iFRANCE, express yourself! http://web.ifrance.com
Hi Nicolas,

> And on the seventh call, I always get a strange result (result is 0.0 but will
> answer false to = 0.0).

I don't know a whole lot about FFI, but I know that the generally accepted practice (at least in C, which is the language I'm most familiar with) is that you NEVER test for equality when using floating-point numbers.

You always use some small epsilon and test that the number you've got is within epsilon of the comparison number.

This is because, due to roundoff and other such effects, you can wind up with really tiny numbers, like 1 x 10^-53, which are essentially zero but not equal to zero.

This might not even be relevant to your discussion, but seeing the equality test on 0.0 raised a red flag for me.

--
Dave Hylands
Vancouver, BC, Canada
http://www.DaveHylands.com/
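Dave's advice, expressed in C: a minimal sketch of an absolute-tolerance comparison. `nearly_equal` and the tolerance used below are hypothetical choices for illustration; a suitable epsilon depends on the magnitudes involved in the computation.

```c
#include <stdbool.h>

/* Hypothetical helper: true when x lies within eps of y.
   The absolute value is computed by hand to avoid needing libm. */
bool nearly_equal(double x, double y, double eps)
{
    double d = x - y;
    if (d < 0.0)
        d = -d;
    return d <= eps;
}
```

With this helper, `nearly_equal(1e-53, 0.0, 1e-12)` is true even though `1e-53 == 0.0` is false, which is the roundoff situation described in the message above.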
Hi all,
I think it is a "synchronisation" problem: I remember a similar problem with C++ (in another life), where the value was read before it had been written. (With a Transcript or a debug screen, execution is slower.)
Dave, you are perfectly right, and in my original TestCase I used such a carefully crafted epsilon, based on matrix dimensions and Float precision. And the test (norm < epsilon) also failed... What you see here is the result of my peregrinations in isolating and tracking down the bug.

And since I force the value to 0.0, which has an exact representation in IEEE floating point, there is no problem using equality; this is one of the rare cases where this construct is legitimate. Besides, if I retry the expression in the debugger, it never fails.

If you load my test case, you can replace the test = 0.0 with < 1.0e-5, ~= 1 or whatever; I guess it will still fail on the seventh call (sorry, I have no Unix image at hand to confirm what I say).

Nicolas
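Nicolas's point about exact zero can be checked directly: 0.0 has an exact IEEE 754 representation, and negative zero compares equal to positive zero, so `==` is reliable for a value that was set to the literal 0.0. The sketch below uses hypothetical helpers that are not from the thread; it also shows one possible tolerance scaled by matrix size and machine precision, where the factor max(m, n) is an assumption for illustration and not Nicolas's actual formula.

```c
#include <float.h>
#include <stdbool.h>

/* Exact test: safe when the value is known to be the literal 0.0.
   Both +0.0 and -0.0 satisfy it; tiny nonzero values do not. */
bool is_exact_zero(double x)
{
    return x == 0.0;
}

/* Hypothetical dimension-scaled tolerance: max(m, n) * DBL_EPSILON.
   Real LAPACK-style tests usually also scale by a norm of the data. */
double norm_tolerance(long m, long n)
{
    long k = (m > n) ? m : n;
    return (double)k * DBL_EPSILON;
}
```

For the 3x4 matrix of the test case, `norm_tolerance(3, 4)` gives 4 * DBL_EPSILON, while `is_exact_zero` accepts the stub's 0.0 without any tolerance at all.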
bouchet vincent:
> I think to a "synchronisation" problem: I remember similar problem with c++ (in an other life): The value is read before she was write. (with a Transcript, or a debug screen, the execution speed is slower.)

Interesting, now I have at least two solutions: a Transcript in loops, or stealing a 486DX from the museum. I thought there was a single thread of execution, and I don't have the necessary background to understand what you mean, but I will pass the suggestion on to Ian.

Thank you
Nicolas
On Thu, Jun 29, 2006 at 05:52:25PM +0200, [hidden email] wrote:
> Interesting,
> now i have at least 2 solutions, Transcript in loops or steal a 486DX at the museum.

Nicolas,

This is probably not related to your problem, but I will mention it just in case. The Transcript is a very unreliable way to debug something that may be timing-related, because it must be updated in the Squeak user interface process. Instead of the Transcript, it may be better to write to the console's standard output. If you have OSProcess loaded in your image, you can use OSProcess class>>debugMessage: and OSProcess class>>trace: to write to the console output immediately from the active Squeak process.

Again, I do *not* think that this is directly related to the problem you are trying to solve, but maybe it will help make your debugging more predictable.

Dave
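In the same spirit as writing to the console instead of the Transcript, a trace can also be produced on the C side, outside any Squeak process. A minimal sketch, assuming you are free to rebuild libdlange2.so with instrumentation: the hypothetical variant below counts its invocations and writes one line per call to stderr, flushing explicitly so each line reaches the console as soon as the call happens. `dlange2_call_count` is a hypothetical accessor, not part of the original library.

```c
#include <stdio.h>

typedef double doublereal;
typedef long integer;
typedef long ftnlen;

/* Hypothetical invocation counter for the instrumented stub. */
static int dlange2_calls = 0;

doublereal dlange2_(char *norm, integer *m, integer *n, doublereal *a,
                    integer *lda, doublereal *work, ftnlen norm_len)
{
    (void)a; (void)work;
    dlange2_calls++;
    /* Log the scalar arguments the Fortran-style pointers refer to. */
    fprintf(stderr, "dlange2_ call %d: norm=%c m=%ld n=%ld lda=%ld len=%ld\n",
            dlange2_calls, *norm, (long)*m, (long)*n, (long)*lda,
            (long)norm_len);
    fflush(stderr);  /* make sure the line is written out immediately */
    return 0.0;
}

/* Number of times dlange2_ has been called so far. */
int dlange2_call_count(void)
{
    return dlange2_calls;
}
```

Comparing the console trace with the Smalltalk results would show whether every call actually reaches the library with the expected arguments.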