Christopher,
We can't get a test case for this problem here (even under NT) so we'll have to rely on your continued sleuthing I'm afraid. That comment swap code that you've found obviously was an attempt to speed thing up and has been there since before 1997. It is a shabby piece of work and your proposed fixed should probably be reinstated into the image regardless of whether it is a true fix for the four byte overwrite issue or not. I doubt it makes any real speed difference but we should test this first anyway. The thing that puzzles us about your fix is that it is not that piece of code (#writeInstanceVariables) that is responsible for writing out the corrupted class names but, rather, #writeClass:withPrefix:. Anyway, we await the results of your further testing with interest. Best Regards, Andy Bower Dolphin Support http://www.object-arts.com --- Are you trying too hard? http://www.object-arts.com/Relax.htm --- "Christopher J. Demers" <[hidden email]> wrote in message news:aoffjn$lug31$[hidden email]... > Bill Schwab <[hidden email]> wrote in message > news:ao2cqm$hs79j$[hidden email]... > > > > Remember the binary filing problem (typically showed up when saving view > > resources) that corrupts some/all of the first four bytes of a class name, > > and was (in my limited experience with it) fairly common on NT? I've now > > seen it (with STB, not in the VC) on win2k sp2. Any ideas? A > reproduceable > > case would be terrific :) > > Ian's message got me thinking about problems with pointers to strings again. > I looked back at STBOutFiler<<writeInstanceVariables: which had previously > seemed interesting to me due to its use of yourAddress asExternalAddress. > > I remmed this line: > stream next: basicSize putAll: objectToSave yourAddress asExternalAddress > startingAt: 1] > and unremmed the line above it: > 1 to: basicSize do: [:i | stream nextPut: (objectToSave basicAt: i) > asInteger]] > > It looks like OA was trying optimize STB performance. The two lines seem > accomplish the same thing. I can't claim this as a fix for sure due to the > random nature of this problem, however after this change I have not had an > STB corruption. I will keep testing. Anyone else experiencing this problem > (that feels brave) can try this "fix". Let us know if it actually fixes > anything. Be aware that this change might make the STB system slower. Be > carefull. > > Chris > > |
Andy,
> The thing that puzzles us about your fix is that it is not that piece of > code (#writeInstanceVariables) that is responsible for writing out the > corrupted class names but, rather, #writeClass:withPrefix:. Anyway, we await > the results of your further testing with interest. The code does run though (I set a breakpoint to verify that my tweaked version was excercising the branch that I altered), so if there is something wrong with "yourAddress asExternalAddress", then it might make a difference. Or, is "yourAddress asExternalAddress" guaranteed to be safe? Have a good one, Bill -- Wilhelm K. Schwab, Ph.D. [hidden email] |
Bill,
> The code does run though (I set a breakpoint to verify that my tweaked > version was excercising the branch that I altered), so if there is something > wrong with "yourAddress asExternalAddress", then it might make a difference. > Or, is "yourAddress asExternalAddress" guaranteed to be safe? Well I wouldn't ever like to use the word "guaranteed" :-) but providing the object in question doesn't go out of scope and get GC'd while that external address is used there shouldn't be a probelm. I can't see how that could happen with the code in question since objectToSave is a parameter and must be in scope until at least the end of the method. Best Regards, Andy Bower Dolphin Support http://www.object-arts.com --- Are you trying too hard? http://www.object-arts.com/Relax.htm --- |
Andy,
> > The code does run though (I set a breakpoint to verify that my tweaked > > version was excercising the branch that I altered), so if there is > something > > wrong with "yourAddress asExternalAddress", then it might make a > difference. > > Or, is "yourAddress asExternalAddress" guaranteed to be safe? > > Well I wouldn't ever like to use the word "guaranteed" :-) but providing the > object in question doesn't go out of scope and get GC'd while that external > address is used there shouldn't be a probelm. I can't see how that could > happen with the code in question since objectToSave is a parameter and must > be in scope until at least the end of the method. Understood, but what about the object returned by #yourAddress? I'm wondering if it gets away, perhaps only when it is or isn't already in some particular format. Have a good one, Bill -- Wilhelm K. Schwab, Ph.D. [hidden email] |
In reply to this post by Andy Bower
Andy,
> object in question doesn't go out of scope and get GC'd while that external > address is used there shouldn't be a probelm. I can't see how that could > happen with the code in question since objectToSave is a parameter and must > be in scope until at least the end of the method. Isn't there at least the theoretical possibility that the object will be moved by GC before or while the bytes from its "external address" are being copied to the stream ? I haven't been able to find any circumstances when byte objects do change their external address, running D5 over W2K, but maybe something happens differently on NT ? -- chris |
In reply to this post by Bill Schwab-2
Bill Schwab <[hidden email]> wrote in message
news:aoh3c3$ls177$[hidden email]... > Chris, > > What does your NT box think of the following? What do _you_ think of it? > Hopefully it preserves the speed boost w/o the bug?? Note that I could be > holding on to the wrong object, so feel encouraged to tweak as you see fit. ... No luck with this code. I tried it both as you wrote it, and with an another variable doing them same thing to the result of asExternalAddress as well. The corruption occurred with both versions. Then I want back to the code that does not use asExternalAddress and did not have any corruptions. I think there is something wrong with using yourAddress and/or asExternalAddress in this way on certain machines. I ran a D5 EXE with my fix on my NT machine last night. I had it looping 100,000 times sending a view back and forth to STB. It eventually consumed all my system resources (either there was a leak, or my half second delay did not let it GC), but it did not cause any STB corruptions. So as you may have noticed I am not using quotes around fix any more. I will keep testing, but I think the fix works. I would encourage you to give it a shot and see if it alleviates your problems. Chris |
In reply to this post by Andy Bower
Andy Bower <[hidden email]> wrote in message
news:aohav2$mb4og$[hidden email]... > > We can't get a test case for this problem here (even under NT) so we'll have > to rely on your continued sleuthing I'm afraid. That comment swap code that > you've found obviously was an attempt to speed thing up and has been there > since before 1997. It is a shabby piece of work and your proposed fixed > should probably be reinstated into the image regardless of whether it is a > true fix for the four byte overwrite issue or not. I doubt it makes any real > speed difference but we should test this first anyway. I generated a Dolphin 5 EXE with my fix, and told it to run a big loop of the same tests that were causing corruptions for me before. I did not see any STB corruptions. The program eventually exhausted my system resources overnight, but that is another matter and since I don't normally try to open and close a view 100,000 times in a row I don't much care. ;) I feel quite confident that not using yourAddress and/or asExternalAddress is the fix for the STB problems on my computer. I only wish I had discovered this sooner. I would suggest making this change to Dolphin if there is not much downside as it will improve the overall reliability of Dolphin on different platforms. This may not be as important for people developing "in house" solutions, but I think it is important for anyone shipping a Dolphin based product to end users. > The thing that puzzles us about your fix is that it is not that piece of > code (#writeInstanceVariables) that is responsible for writing out the > corrupted class names but, rather, #writeClass:withPrefix:. Anyway, we await > the results of your further testing with interest. I wondered about that too and it kept me from looking at that method sooner, here are my thoughts: My most common corruption before was IdentityDictionary. I also saw corruption in the display name of a tab in the class browser once, and lately I have seen it in method names. The later two seem like they certainly could be in instance variable values on some level. I think the first item is as well, since it is contained in an STBIdentityDictionaryProxy which has an instance variable for the class. I just looked at the code, and I am just as puzzled as you are, because I think the class should still be routed to #writeClass:withPrefix: even if it is in an instance variable. Well I can't explain why, but the fix I described appears to have eliminated the STB corruption problem for me on my problem computer. Chris |
In reply to this post by Chris Uppal-3
Chris,
> I haven't been able to find any circumstances when byte objects do change > their external address, running D5 over W2K, but maybe something happens > differently on NT ? But it also happens on 2k, just a lot less often. As an aside, a different machine (sp 3 if memory serves) got into trouble this afternoon. I've pretty much decided to install a hot fix based on your proposed patch. Thanks for providing the option! Bill -- Wilhelm K. Schwab, Ph.D. [hidden email] |
In reply to this post by Christopher J. Demers
Chris,
> No luck with this code. I tried it both as you wrote it, and with an > another variable doing them same thing to the result of asExternalAddress as > well. The corruption occurred with both versions. Then I want back to the > code that does not use asExternalAddress and did not have any corruptions. > I think there is something wrong with using yourAddress and/or > asExternalAddress in this way on certain machines. Thanks - I'm just about to create the hot fix that I mentioned moments ago. I was going to use your _fix_<g> as written anyway, but now it's pretty obviously the right thing to do. > I ran a D5 EXE with my fix on my NT machine last night. I had it looping > 100,000 times sending a view back and forth to STB. It eventually consumed > all my system resources (either there was a leak, or my half second delay > did not let it GC), but it did not cause any STB corruptions. So as you may > have noticed I am not using quotes around fix any more. I will keep > testing, but I think the fix works. I would encourage you to give it a shot > and see if it alleviates your problems. Will do - very shortly. THANKS!!!! Bill -- Wilhelm K. Schwab, Ph.D. [hidden email] |
In reply to this post by Christopher J. Demers
Andy, Chris,
> > The thing that puzzles us about your fix is that it is not that piece of > > code (#writeInstanceVariables) that is responsible for writing out the > > corrupted class names but, rather, #writeClass:withPrefix:. Anyway, we > await > > the results of your further testing with interest. > > I wondered about that too and it kept me from looking at that method sooner, > here are my thoughts: My most common corruption before was > IdentityDictionary. I also saw corruption in the display name of a tab in > the class browser once, and lately I have seen it in method names. The > later two seem like they certainly could be in instance variable values on > some level. I think the first item is as well, since it is contained in an > STBIdentityDictionaryProxy which has an instance variable for the class. I > just looked at the code, and I am just as puzzled as you are, because I > think the class should still be routed to #writeClass:withPrefix: even if it > is in an instance variable. Well I can't explain why, but the fix I > described appears to have eliminated the STB corruption problem for me on my > problem computer. I sorta hate to mention it, but Davorin has raised a question of the same kind of corruption absent STB. Could the problem be something more generic with #yourAddress or #asExternalAddress? Maybe your problematic NT machine could spend some time looking for objects that move around after the address is grabbed, as you theorized might be happening. Perhaps we should create something similar to Davorin's DLL (maybe even the same DLL if he's willing and able to share it with you) and make LOTS of calls??? Failing that, perhaps we could rig up a DLL that even just copies a string from one buffer to another, and have yourAddress asExternalAddress figure in the call; hopefully the corruption would show itself by looking at the results after the call. Another possibility (wild guess only) would be a problem w/ hashing. Davorin, do you use dictionaries in your problematic case? Do you explictly obtain addresses? If so, using which messages to what kinds of objects? Feel free to ask yourself any questions I'm forgetting :) Ok, on to that hot fix. Have a good one, Bill -- Wilhelm K. Schwab, Ph.D. [hidden email] |
Bill Schwab <[hidden email]> wrote in message
news:aoi8l3$m0ugm$[hidden email]... > > I sorta hate to mention it, but Davorin has raised a question of the same > kind of corruption absent STB. Could the problem be something more generic > with #yourAddress or #asExternalAddress? Maybe your problematic NT machine > could spend some time looking for objects that move around after the address ... This thread has turned into a bit of a monster, so I may be having trouble following it or missing messages. I saw a post from "rush" that sounds like what you are referring to. Is "rush" Davorin, or am I missing some posts? If someone wants to write some simple test code I would be happy to run it on my "problem machine" as my time permits. However I probably won't have time to write the test code for this myself. I have already spent more time working on this STB corruption issue than I expected to. Now that is seems to be resolved I have to get back to slaying some other dragons. I would have thought that if this corruption issue could happen in regard to interfacing with external DLL's that the image itself would be a good deal less stable. As I said I don't work in Dolphin on my problem computer any more, but while I was kicking the development environment around to experiment with the STB corruptions I did not experience any instability. Surely Dolphin calls a good many API functions in DLL's during normal use. However this world is full of strange things, so who really knows? Chris |
Chris,
> This thread has turned into a bit of a monster, so I may be having trouble > following it or missing messages. I saw a post from "rush" that sounds like > what you are referring to. Is "rush" Davorin, or am I missing some posts? I hope so :) > If someone wants to write some simple test code I would be happy to run it > on my "problem machine" as my time permits. However I probably won't have > time to write the test code for this myself. I have already spent more time > working on this STB corruption issue than I expected to. Now that is seems > to be resolved I have to get back to slaying some other dragons. Fair enough. I'll throw something together, but like you, I'm over quota on this for the week. Well, the real truth is that I have to give a talk on Thursday, and it's not entirely done. I'll try to write something next week. > I would have thought that if this corruption issue could happen in regard to > interfacing with external DLL's that the image itself would be a good deal > less stable. As I said I don't work in Dolphin on my problem computer any > more, but while I was kicking the development environment around to > experiment with the STB corruptions I did not experience any instability. > Surely Dolphin calls a good many API functions in DLL's during normal use. > However this world is full of strange things, so who really knows? Understood, but one failure every 1500 hours is more than worth pursuing. The horse is already dead, so I say we practice our backhand :) Have a good one, Bill -- Wilhelm K. Schwab, Ph.D. [hidden email] |
"Bill Schwab" <[hidden email]> wrote in message
news:aoihtc$mnn55$[hidden email]... > > what you are referring to. Is "rush" Davorin, or am I missing some posts? > > I hope so :) It's me rush == Davorin Rusevljan, and my appologies to everyone for confusion about my identity. I thought that since there is another Davorin also posting here (Mestric), using my nick name would be a good thing. (not so sure anymore). rush |
In reply to this post by Bill Schwab-2
"Bill Schwab" <[hidden email]> wrote in message
news:aohfk1$mm4tf$[hidden email]... > Andy, > > > > The code does run though (I set a breakpoint to verify that my tweaked > > > version was excercising the branch that I altered), so if there is > > something > > > wrong with "yourAddress asExternalAddress", then it might make a > > difference. > > > Or, is "yourAddress asExternalAddress" guaranteed to be safe? > > > > Well I wouldn't ever like to use the word "guaranteed" :-) but providing > the > > object in question doesn't go out of scope and get GC'd while that > external > > address is used there shouldn't be a probelm. I can't see how that could > > happen with the code in question since objectToSave is a parameter and > must > > be in scope until at least the end of the method. > > Understood, but what about the object returned by #yourAddress? I'm > wondering if it gets away, perhaps only when it is or isn't already in > particular format. Its a SmallInteger representation of the address. But think it through, even if it were a non-immediate object (i.e. one that consumed memory subject to GC) then it must remain in scope for the required time because it is the receiver of the #asExternalAddress method. Regards Blair |
In reply to this post by Christopher J. Demers
"Christopher J. Demers" <[hidden email]> wrote in
message news:aoig26$mgmqp$[hidden email]... > on my "problem machine" as my time permits. However I probably won't have > time to write the test code for this myself. I have already spent more time > working on this STB corruption issue than I expected to. Now that is seems > to be resolved I have to get back to slaying some other dragons. > I would have thought that if this corruption issue could happen in regard to > interfacing with external DLL's that the image itself would be a good deal > less stable. As I said I don't work in Dolphin on my problem computer any > more, but while I was kicking the development environment around to > experiment with the STB corruptions I did not experience any instability. > Surely Dolphin calls a good many API functions in DLL's during normal use. > However this world is full of strange things, so who really knows? Chirs, do not get me wrong, I am, and I think also everyone else is very gratefull to your finding and effort you have putted into it. And certanly no one finds you obligated by any means to debug it any more. My writing is more oriented towards OA, to provide them some information in hope that it can be of some help to them. What I am concerned about is that I suspect that your fix has curred a simptom of a problem, but that cause is still laying somewhere. I say suspect since there is possibility that the bug is in our dll or in our smalltalk code, and that our problem is completely unrelated to the problem you have decribed and fixed. However here are the reasons that make me suspect that this is another manifestation of the same problem (none of them I consider a proof but a hint or a lead): a) 4 bytes corruption in both cases. b) it does not happen allways, but under prolonged period c) the problem seems to be exagerated by slower or more stressed machines d) I am not sure that I understand why the code you have replaced is wrong and how does it exactly generate 4 bytes corruption. e) I am not sure that dialog box text corruption presented by Ian(?) is related to the stb also? Another thing that leads me to suspect the subtle and rare VM problem is that we noticed the problem for the first time started when we switched the VM (98 to 3.x). Another theory can be that In our program we use same "anti pattern" as OA in the image, and the new Dolphin just provoked it. I have looked at our code again, and the place where we are getting the wrong results is something like this: ByteArray fromAddress: self theKrypt buffer length: len sometimes, the byte array obtained this way has first 4 bytes corrupted. again it is possible that the bug is in our dll, but: a) it worked ok in dolphin 98. b) the dll does the crc checks on the data several times, and once very close before the buffer is returned. we can not completely exclude that we do not currupt the buffer from our smalltalk code but I would not rate the probability of the bug beeing there above 15-20% I have looked at the #fromAddress:length: code and it is like: ^anAddress asExternalAddress replaceBytesOf: (self new: anInteger) from: 1 to anInteger startingAt:1 ----- I think we have a few unused per incident support tickets, and if OA thinks this is enough data to get into it, I would be more than happy to spend them. If not I'll wait another occassion, when more data will surface. Finding and solving rare and stubbron bugs is patience and endurance game, but I think not giving up on such problems and solving them even after a few years, greatly improves the quality of the product. (something like aging with vine). But my gut feeling tells me this is a right time to squash this one. rush |
In reply to this post by Bill Schwab-2
"Bill Schwab" <[hidden email]> wrote in message
news:aoi8l3$m0ugm$[hidden email]... > Andy, Chris, > > > > The thing that puzzles us about your fix is that it is not that piece of > > > code (#writeInstanceVariables) that is responsible for writing out the > > > corrupted class names but, rather, #writeClass:withPrefix:. Anyway, we > > await > > > the results of your further testing with interest. > > > > I wondered about that too and it kept me from looking at that method > sooner, > > here are my thoughts: My most common corruption before was > > IdentityDictionary. I also saw corruption in the display name of a tab in > > the class browser once, and lately I have seen it in method names. The > > later two seem like they certainly could be in instance variable values on > > some level. I think the first item is as well, since it is contained in > an > > STBIdentityDictionaryProxy which has an instance variable for the class. > I > > just looked at the code, and I am just as puzzled as you are, because I > > think the class should still be routed to #writeClass:withPrefix: even if > it > > is in an instance variable. Well I can't explain why, but the fix I > > described appears to have eliminated the STB corruption problem for me on > my > > problem computer. > > I sorta hate to mention it, but Davorin has raised a question of the same > kind of corruption absent STB. Could the problem be something more generic > with #yourAddress or #asExternalAddress? Whoa there. The evidence by no means suggests any such general issue. What evidence is there to draw the conclusion that Davorin's problem is related? He mentions that he started to experience corruption in an application that does a lot of FFI work when he moved from D2.1 to D3. This is far more likely to be due to a buffer overrun in the external code; the memory subsystem in 3.x and later is much "tighter", and with less slop such problems are more likely to show up. Furthermore Dolphin runs thousands of API calls per second, because it does most things from the image rather than the VM. The #yourAddress and #asExternalAddress methods are very heavily used. Regarding the STB issue, lets look at the evidence there and try and apply some logical thought rather than jumping to conclusions. There is a distinct pattern, judging from peoples reports. The first four bytes of certain strings, very often class names, get corrupted. We can't be sure this is limited to strings, it could be any byte object, since it would be far less visible in ByteArrays, and anyway those are less common. It doesn't appear to affect "pointer" objects, because if it did this would probably crash the system, although that might not happen if the corruption is caused by another valid Oop being written. What is particularly distinctive about this corruption, is that it always seems to be of the first four bytes. We can visually see that it doesn't go any further than that. We can conclude that it doesn't start any earlier, because the visible part of the string object is preceeded by a header that includes the size of the string. If the header were being corrupted, then the size of the string would very likely get corrupted, and this would be obvious since it would either be truncated, or include extra junk at the end. Ian suggested that perhaps the memory in question was being prematurely re-used. A quite reasonable theory, but on consideration it doesn't fit the observed evidence. If this were the case, then it is unlikely that we would see the corruption that consistently occurs. Firstly it seems very unlikely that the size of the new object would be precisely the same as the one who's memory it is overwriting, and secondly why would the corruption always be limited to that first four bytes? Ian was mislead by the dump output because in the one case the raw string has been printed, where as in the other it is showing octal/hex (can't remember which) escape sequences. This is because the VM's crash dump code uses C++ iostreams, and presumably previous output has left the stream in a different mode when the stack object is printed. A more plausible explanation would be that the string is itself re-using memory that is still in use by another object. If this were a pointer object and its first field was updated, then this might conceivably explain the corruption that has been observed. Finally let us consider Chris' proposed fix. Unfortunately we can't be absolutely sure that this fixes the problem, because we cannot reliably produce it (in fact I have not been able to reproduce it at all yet). Therefore we have to rely on statistical evidence, and he seems confident that it has stopped occurring. However, given that it is an intermittent fault, and we haven't been able to identify a reasonable cause, we can't be sure that this isn't because we have changed the dynamics of the situation, and just hidden whatever problem exists. In particular the fact that most of the corrupted strings that have been observed have been the class names that are not even written by this code, is suggestive. Remember that those class names are written from a string which is a copy of the class name symbol, and that is copied again into a byte array that is then written. No one has reported that the original class name has been permanently corrupted. Therefore it seems very unlikely that the code in question could be directly responsible for the corruption of the class name objects - those names are not even in existence at any time it runs, and they have already been written to the stream. So why does it appear to correct the problem (if it really does)? At the moment its just guess work, but we could speculate that it indicates a possible issue with the stream block write operation. The change switches back to using #nextPut: on a byte-by-byte basis, which would certainly change how the stream is used. However the intermittent nature of the fault does not really support this theory. In summary I don't think we yet have enough to go on. A hotfix is undoubtedly useful, but I won't be satisfied until we can produce a reproduceable case that we can reliably correct, or failing that a reasonably explanation of what is wrong and why the fix works. Regards Blair |
Blair,
> Whoa there. The evidence by no means suggests any such general issue. What > evidence is there to draw the conclusion that Davorin's problem is related? > He mentions that he started to experience corruption in an application that > does a lot of FFI work when he moved from D2.1 to D3. This is far more > likely to be due to a buffer overrun in the external code; the memory > subsystem in 3.x and later is much "tighter", and with less slop such > problems are more likely to show up. Furthermore Dolphin runs thousands of > API calls per second, because it does most things from the image rather than > the VM. The #yourAddress and #asExternalAddress methods are very heavily > used. With respect, such leaps of illogic (which I freely admit is the case - it's legal, in fact encouraged in brainstorming sessions, which is what this has been) have uncovered bugs in Dolphin, and in my code too on several occaisions. Granted, #yourAddress and #asExternalAddress are heavily used. BUT, the concatenation of them occurs only four times that I found with a naive search - that same combination is at the heart of something that Chris found suspicious, so I ran with it just as Chris ran with a suggestion from Ian. I'll write the test unless Davorin has something ready made, and Chris has agreed to run it on a machine that has (thankfully) exhibited an ability to reproduce a long standing bug. That's not meant to trivialize the other excellent analysis that you've provided. Here's hoping you'll soon put it all together and patch the real problem. Have a good one, Bill -- Wilhelm K. Schwab, Ph.D. [hidden email] |
In reply to this post by Blair McGlashan
"Blair McGlashan" <[hidden email]> wrote in message
news:aoja9i$l82g0$[hidden email]... > [.... snip...] > [re the fix] > So why does it appear to correct the problem (if it really does)? At the > moment its just guess work, but we could speculate that it indicates a > possible issue with the stream block write operation. The change switches > back to using #nextPut: on a byte-by-byte basis, which would certainly > change how the stream is used. However the intermittent nature of the fault > does not really support this theory. We have identified a definite bug in WriteStream>>next:putAll:startingAt:, or more specifically the fact that FileStream does not override it. This can be seen by running the attached SUnit. As you will see any block write to a file stream through #next:putAll:startingAt: can fail to get written out if it fits within a "page" (8192 bytes), and if the write in question was the first and last to that page. The reason is simple: The WriteStream>>next:putAll:startingAt: implementation optimizes the case where a block fits within the "collection", by using a block transfer operation. Where the write is too large, it goes through #nextPut:. FileStreams are implemented as a sort of write through cache onto a page of a file. Correct updating relies on a dirty flag to know whether the current page has been updated and needs to be written back when a new page is loaded (or the file stream closed). If any block write to a FileStream occurs when the dirty flag is not yet set, and the write fits within the current page, and no further writes occur to the page that set the dirty flag, then the page will not get written out, and the updates are lost. I have not been able to come up with a scenario which would explain how the bug could create the sort of corruption seen. However, there is no doubt that this is a real bug, and Chris' change (which is essentially to fall back on #nextPut: rather than the #next:putAll:startingAt:) would work around this bug, and this is a possible explanation as for why his patch works. I have also attached a patch for this so that those who are experiencing the problem can try it out. Of course you'll need to use the original STB code, or this patch will not get exercised. Regards Blair ------------------------------------------ "Filed out from Dolphin Smalltalk XP 2002 release 5.10"! TestCase subclass: #FileStreamTest2 instanceVariableNames: '' classVariableNames: '' poolDictionaries: '' classInstanceVariableNames: ''! !FileStreamTest2 methodsFor! failureLog "dialect-specific" "VA - System errorLog" "VW, Dolphin - Transcript" ^Transcript ! isLogging ^true! testNextPutAllStartingAt | stream name data r written | name := File temporaryFilename. r := Random new. #(0 1 512 4095 4096 4097 8191 8192 8193 16383 16384 16385) do: [:i | data := ((1 to: i) collect: [:each | (r next * 256) truncated]) asByteArray. stream := FileStream write: name text: false. [stream next: data size putAll: data startingAt: 1] ensure: [stream close]. stream := FileStream read: name text: false. written := [stream contents] ensure: [stream close]. self assert: data = written description: 'Failed on write of ' , i printString , ' bytes' resumable: true]! ! !FileStreamTest2 categoriesFor: #failureLog!Accessing!public! ! !FileStreamTest2 categoriesFor: #isLogging!Accessing!public! ! !FileStreamTest2 categoriesFor: #testNextPutAllStartingAt!public!unit tests! ! "--------------------------------------------------------------------------- ----- And the patch - run the test first before installing this to see the failure The eventual fix will not be quite like this, because there is a refactoring opportunity here, but this is the simplest, lowest risk, fix. ---------------------------------------------------------------------------- -----"! !FileStream methodsFor! next: size putAll: aSequenceableCollection startingAt: start "Append countInteger elements of aSequenceableCollection from startInteger to the receiver. Answer aSequenceableCollection." | stop | stop := position + size. stop <= writeLimit ifTrue: ["The block fits into the current collection and can be written directly" collection replaceFrom: position + 1 to: stop with: aSequenceableCollection startingAt: start. position := stop. self beDirty] ifFalse: ["The block will not fit within the current collection, so fall back on the slow way" start to: start + size - 1 do: [:i | self nextPut: (aSequenceableCollection at: i)]]. ^aSequenceableCollection! ! !FileStream categoriesFor: #next:putAll:startingAt:!accessing!public! ! |
In reply to this post by Christopher J. Demers
Christopher,
> If someone wants to write some simple test code I would be happy to run it > on my "problem machine" as my time permits. However I probably won't have > time to write the test code for this myself. I have already spent more time > working on this STB corruption issue than I expected to. Now that is seems > to be resolved I have to get back to slaying some other dragons. Could you (when you have time) try rewriting the offending bit to: class isBytes ifTrue: [| was | was := objectToSave yourAddress. stream next: basicSize putAll: was asExternalAddress startingAt: 1. self assert: [was = objectToSave yourAddress]] ifFalse: .... and see if the test throws assertion failures at the same time as the errors. Probably a waste of time, but you never know... -- chris |
In reply to this post by Bill Schwab-2
Bill,
> Thanks for providing the option! You're thanking the wrong "Chris" I'm afraid. But I'm sure that Christopher J Demers took the will for the deed. Whereas I am just: -- chris |
Free forum by Nabble | Edit this page |