Hi all,

I do hope someone can help me out here, because it's a real stopper for my app. I was getting sporadic VM crashes when calling my plugin, and I have now narrowed it down to a very small subset of code that reproduces the problem. The crash dump never has a stack trace, and sometimes it fails so badly that no dump is produced at all.

Essentially, what I have is a C function that returns an array of 4096 float values. I have taken away all the application C code and simply load values into an array and return it, so I am quite sure it's not the C code that's breaking but something in the C/Squeak interface. Here is what I have (stripped to the minimum for clarity). For testing I run this flat out in a loop; with 100,000 iterations I usually, but not always, get a crash. Is there anything I can do to isolate this a bit further?

Thanks in advance for any help.

Bob

The C code:
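    #include <string.h>

    void process_test(float *results) {
        float a[4096];
        int i;

        for (i = 0; i < 4096; i++)
            a[i] = (float) i;
        memcpy((void *)results, a, 4096*sizeof(float));
    }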
The plugin code which is translated and compiled with the C code:
    test: data
        self export: true.
        self primitive: 'test' parameters: #(FloatArray).
        (self cCode: 'process_test(data)').
        ^data.
The calling code:
primTest: data
Finally, the method I call:
test
data := FloatArray new: 4096.
*** Confidentiality Notice ***
Proprietary/Confidential
On Sun, Jun 04, 2006 at 01:12:12PM +0100, [hidden email] wrote:
> Is there anything I can do to isolate this a bit further?

Run your VM in a debugger if possible. I have not tried this on Windows, but there is a gdb debugger in the Cygwin tools that should work for this. If you can get the VM to crash while running under a debugger, you should get some clues as to what was going on.

I'm not too familiar with the SmartSyntaxInterpreterPlugin, but I don't see anything obviously wrong with what you are doing. As a general rule, though, intermittent memory-related crashes can be caused by the Squeak garbage collector moving objects while your primitive is running, which leaves any object pointers the primitive is holding pointing at the wrong place. Double-check all of your primitive code to make sure that it does not do anything that allocates new objects (such as allocating an array object in the primitive itself). If it does, you need to protect any object pointers that you are using with #pushRemappableOop: and #popRemappableOop. I don't see this in the code that you supplied, but I mention it because it's a common cause of intermittent problems in primitives.

Dave
In reply to this post by Bob.Cowdery
On Sun, Jun 04, 2006 at 01:12:12PM +0100, [hidden email] wrote:
> Is there anything I can do to isolate this a bit further?

I tried building your test plugin code on my Linux system, and it works just fine. The test method returns the expected array of floats from 0.0 through 4095.0. I ran this test method 1,000,000 times with no errors, then ran it again, still with no errors (2 million iterations total).

I'd be quite surprised if there were any difference between the Windows and Unix VMs in this case, but I guess nothing is impossible.

I'm attaching the actual code that I ran for your reference. You might want to try building this exact code on your system as a double-check, to be sure there is not something else going on in your plugin that somehow affects this test.

HTH,
Dave

testplugin.zip (1K)
In reply to this post by Bob.Cowdery
> I'm attaching the actual code that I ran for your reference. You
> might want to try building this exact code on your system just as
> a double-check to be sure there is not something else going on in
> your plugin that somehow affects this test.

Dave,

Thanks very much for taking the time to do that. I built what you sent back into a separate plugin and ran it with your code, setting it going on two machines. My desktop is still running at 605,000-odd iterations, but my laptop crashed after 108,843 with the dump below. I don't really know what to think at the moment. I guess I need to get a debugger going somehow. This could be a long haul.

My other plugin is still loaded on the machine that crashed, but it wasn't active; the machine that's still running doesn't have it loaded. I will try with it active, though heaven knows why an inactive DLL should make a difference.

Bob

---------------------------------------------------------------------
Sun Jun 04 19:21:53 2006

Exception code: C0000005
Exception addr: 00419EED
Access violation (read access) at 21F41A44
EAX:21F41A44 EBX:80000002 ECX:00000001 EDX:00000000
ESI:11EF59C4 EDI:11EF580C EBP:11EA60D0 ESP:0006FBA8
EIP:00419EED EFL:00210246
FP Control: FFFF037F
FP Status:  FFFF4020
FP Tag:     FFFFFFFF

VM Version: Squeak 3.7.1 (release) from Sep 23 2004
Compiler: gcc 2.95.2 19991024 (release)
Current byte code: 217
Primitive index: 0

Loaded plugins:
    ZipPlugin 23 September 2004 (i)
    TestForBob 4 June 2006 (e)
    SoundGenerationPlugin 23 September 2004 (i)
    SoundPlugin 23 September 2004 (i)
    SDRDttSpDSPPlugin 4 June 2006 (e)
    DirectTransportPlugin 7 December 2004 (e)
    JPEGReadWriter2Plugin 23 September 2004 (i)
    LargeIntegers v1.3 23 September 2004 (i)
    SocketPlugin 23 September 2004 (i)
    Matrix2x3Plugin 23 September 2004 (i)
    FloatArrayPlugin 23 September 2004 (i)
    B2DPlugin 23 September 2004 (i)
    BitBltPlugin 23 September 2004 (i)
    SecurityPlugin 23 September 2004 (i)
    FilePlugin 23 September 2004 (i)
    MiscPrimitivePlugin 23 September 2004 (i)

Stack dump:
    300915876 [] in Delay>schedule
    300916060 [] in Semaphore>critical:
    300916244 BlockContext>ensure:
    300915968 Semaphore>critical:
    300915784 Delay>schedule
    300915508 Delay>wait
    300915600 [] in EventSensor>eventTickler
    300915416 BlockContext>on:do:
    299589928 EventSensor>eventTickler
    299589652 [] in EventSensor>installEventTickler
    299589836 [] in BlockContext>newProcess
In reply to this post by Bob.Cowdery
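David - and anyone else who can help. Things are looking bad now!

I can now crash my VM in about 10 seconds, guaranteed. I wonder if you would be good enough to try this on Linux. I moved your class code to another class so that I could instantiate the calling class, and then ran this:

    [1 to: 1000 do: [:a|data1 := app1 test]]fork.
    [1 to: 1000 do: [:b|data2 := app2 test]]fork.
    [1 to: 1000 do: [:c|data3 := app3 test]]fork.
    [1 to: 1000 do: [:d|data4 := app4 test]]fork.
    [1 to: 1000 do: [:e|data5 := app5 test]]fork.
    [1 to: 1000 do: [:f|data6 := app6 test]]fork.
    [1 to: 1000 do: [:g|data7 := app7 test]]fork.
    [1 to: 1000 do: [:h|data8 := app8 test]]fork.
    [1 to: 1000 do: [:i|data9 := app9 test]]fork.
    [1 to: 1000 do: [:j|data10 := app10 test]]fork.

where app1 to app10 are instances of the calling class. It breaks almost every time (now that I have taken out the Transcript writes, which were slowing it down). Maybe there are re-entrancy problems somewhere and what I am doing is not valid. I am pretty sure the stack trace in the dump is a consequence of where it was in the block execution and has nothing to do with the crash.

Thanks
Bob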
On 6/4/06, [hidden email] <[hidden email]> wrote:
> I can now crash my VM in about 10 seconds, guaranteed.

Cool! You must be close to finding the problem.

> Maybe there are re-entrancy problems somewhere and what I am
> doing is not valid.

It's probably something like that. Is the returned array correct? That is, does it have the proper floating point values in it? If not, maybe you're not storing the data where it's supposed to go. That would do it.

Hope this helps!

--Tom Phoenix
In reply to this post by Bob.Cowdery
Hi Tom,

On 6/4/06, [hidden email] <[hidden email]> wrote:
> I can now crash my VM in about 10 seconds, guaranteed.

>> Cool! You must be close to finding the problem.

> Maybe there are re-entrancy problems somewhere and what I am
> doing is not valid.

>> It's probably something like that. Is the returned array correct?

Don't know if you followed the previous posts. All the values I get back are valid, even when running multiple threads. It fails with just one thread, but it takes a long time; I kicked it hard to see what would happen, and it falls apart fast. What I don't know yet is whether it's an XP VM problem or affects all VMs. This could kill a project I've worked on for 6 months.

Bob
In reply to this post by Bob.Cowdery
Hi Bob,
I've looked at how #asFloat, used in the FloatArrayPlugin, is translated to C: its argument is cast to double. But your C code example defines float *results and float[4096]?

/Klaus

On Sun, 04 Jun 2006 23:02:18 +0200, you <[hidden email]> wrote:
> David - and anyone else that can help. Things are looking bad now!
In reply to this post by Bob.Cowdery
Bob Cowdery wrote on Sun, 4 Jun 2006 13:12:12 +0100
> memcpy((void *)results, a, 4096*sizeof(float));

I haven't looked into interfacing with the VM at all, and so am probably saying something really stupid, but I would think that results[0] would hold the object's header, and overwriting it would cause really bad things to happen.

-- Jecel
In reply to this post by Bob.Cowdery
A good place to borrow from is FloatArrayPlugin>>#primitiveAt
/Klaus
In reply to this post by Bob.Cowdery
On 04.06.2006 at 23:42, Jecel Assumpcao Jr wrote:
> Bob Cowdery wrote on Sun, 4 Jun 2006 13:12:12 +0100
>
>> memcpy((void *)results, a, 4096*sizeof(float));
>
> I haven't looked into interfacing with the VM at all and so am probably
> saying something really stupid, but I would think that results[0] would
> hold the object's header and overwriting it would cause really bad
> things to happen.

That's what I suspected, too.

So, you have to convert between Oops and C pointers, for example using #arrayValueOf: if you know your Oop is an array, or #firstIndexableField: for the general case. Also, be sure to look at the generated C code, and attach it for reference.

Besides, this question would be well suited to the vm list.

Third, Bob, since nobody seems to have told you yet, it would be nice if you would send plain-text messages. I don't like having to crank up the font size for every one of your messages. And your Confidentiality Notice is sort of nonsensical when posting to a public mailing list.

Thanks.

- Bert -
On 4-Jun-06, at 3:05 PM, Bert Freudenberg wrote:
> So, you have to convert between Oops and C pointers, for example
> using #arrayValueOf: if you know your Oop is an array, or
> #firstIndexableField: for the general case. Also, be sure to look
> at the generated C code, and attach it for reference.

The plugin function is generated as:

    EXPORT(sqInt) test(void) {
        float *data;

        data = ((float *) (interpreterProxy->arrayValueOf(interpreterProxy->stackValue(0))));
        if (interpreterProxy->failed()) {
            return null;
        }
        process_test(data);
        if (interpreterProxy->failed()) {
            return null;
        }
        interpreterProxy->popthenPush(2, data);
        return null;
    }

so we can see it is handling the firstIndexableField issue via arrayValueOf().

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Diagnostics are the programs that run when nothing else will.
On Mon, 05 Jun 2006 00:35:40 +0200, tim Rowledge <[hidden email]> wrote:
> so we can see it is handling the firstIndexableField issue via
> arrayValueOf().

Sure. But

    interpreterProxy->popthenPush(2, data);

is not the converse of

    data = ((float *) (interpreterProxy->arrayValueOf(interpreterProxy->stackValue(0))));

simply because data is not an oop.

/Klaus
On 4-Jun-06, at 3:48 PM, Klaus D. Witzel wrote:
> Sure. But interpreterProxy->popthenPush(2, data); is not the
> converse of
>
> data = ((float *) (interpreterProxy->arrayValueOf(interpreterProxy->stackValue(0))));
>
> simply because data is not an oop.

D'oh! Of course. Bob, lose the last line of

    test: data
        self export: true.
        self primitive: 'test' parameters: #(FloatArray).
        (self cCode: 'process_test(data)').
        ^data.

You don't need to return anything anyway, since you are filling in the array you pass in, and as Klaus quite rightly points out, it is returning completely the wrong thing.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
After a number of decimal places, nobody gives a damn.
In reply to this post by Bob.Cowdery
On Sun, Jun 04, 2006 at 10:02:18PM +0100, [hidden email] wrote:
> I can now crash my VM in about 10 seconds, guaranteed. I wonder if you would
> be good enough to try this on Linux.

I tried running this on my system, and I do *not* get a crash.

However, others in this thread pointed out that the return value from your primitive is incorrect. It should return an object (an OOP) rather than a C pointer value. You should definitely fix this, but I don't think it is the cause of your VM crashes as long as you are not actually making use of the return value. However, if you *do* make use of the return value, you will crash the VM. I just tried it, and the VM crashed.

So I would say your next step should be to get rid of the "^ data" as the last line of your primitive and re-run your crash tests. My guess is that you will still get crashes, because I suspect that we are all barking up the wrong tree at this point. But if the crashes stop, then you're home free.
One more thing to think about: you have been running an image with a plugin that returns bogus object pointers after calling your primitive. If these bogus object pointers are somehow accumulating in your image, and if they are later accessed for any reason (possibly including by the garbage collector), then I would guess that it's possible you have an image that is waiting to crash on you at unexpected times. You mentioned that you have two systems (a laptop and something else), and that one of them crashes and the other does not. I think that you should copy the image and changes files from the system that does not crash over to the system that does crash, and see if that image runs OK on both computers. If so, you may want to just save all of your source code and file it in to a clean image to start fresh.

This is pure speculation, but my next best guess would have been to blame the hardware on the system that crashes, and that would almost certainly have been a bad guess ;)

Dave
In reply to this post by Bob.Cowdery
To David, Tim, Klaus, Bert, Tom and Jecel:
I thank you all very much for the input, and I am very pleased to say that removing the ^data from the primitive 'seems' to have cured the problem. At least I can't get the test code to crash now. I wasn't actually using the returned result, but even so it was clearly doing something nasty. I have responded to the specifics below.

Bob

David T Lewis said:
> I tried running this on my system, and I do *not* get a crash.

Initially this looked very much like an XP-only VM problem, and I still don't quite understand why Linux survives.

> should return an object (an OOP) rather than a C pointer value. You should definitely fix this.

Fixed, and it seems to work reliably now.

> You mentioned that you have two systems (a laptop and something else), and that one of them crashes, and the other does not.
> I think that you should copy the image & changes from the system that does not crash over to the system that does crash, and
> see if that image runs OK on both computers.

Both did actually fail in the end. I do start with a new image from time to time, as I haven't quite mastered keeping the image clean, although I am getting better.

Tim said:
> you don't need to return anything anyway since you are filling in
> the array you pass in and as Klaus quite rightly points out it is
> returning completely the wrong thing.

Yup. It also gets rid of some compiler warnings, which in hindsight I should not have ignored.

Bert said:
> Third, Bob, since nobody seems to have told you yet, it would be nice
> if you would send plain-text messages. I don't like having to crank
> up the font size for every of your messages. And your Confidentiality
> Notice is sort of nonsensical when posting to a public mailing list.
> Thanks.

Yes, I know, sorry. The mail server adds the footer, so I can't avoid that. I do usually switch to plain text, but sometimes it inexplicably gets put back into HTML. This one should definitely be plain text according to my settings.

Klaus said:
> I've looked how #asFloat, used in the FloatArrayPlugin, is c-translated:
> its argument is casted to double. But your c-code example defines float
> *result and float[4096] ?

That sounds like it should be a problem, but in practice it seems OK.
On Mon, Jun 05, 2006 at 09:39:31AM +0100, Cowdery, Bob [UK] wrote:
> > I've looked how #asFloat, used in the FloatArrayPlugin, is c-translated:
> > its argument is casted to double. But your c-code example defines float
> > *result and float[4096] ?
> That sound like it should be a problem but in practice seems ok.

A FloatArray contains 32-bit float values (not C doubles), so your declaration is correct.

Dave