Hello all,
I'm not sure what to make of this one. I just spent a couple of hours trying to find the "leak" in an algorithm of mine. It was reading roughly 1200 records, claiming to have processed all of them, and yet writing only about 450 rows into an output text file. One clue should have been that the number of output rows was somewhat random; I did not fully appreciate that until I worked around the problme. I tried an explicit #flush - no help. I looked for logic errors and found none. The file was being written to a Windows hosted share mounted by CIFS (which I am learning to view with contempt) from Ubuntu 9.04. You can see where this is going: writing the file locally gave the expected result. Any ideas on how one might further isolate the problem? My trust in Windows is well known<g>; I have never liked shared directories; I _really_ do not like CIFS as compared (reliability wise) to SMBFS; the network between me and the server is in question too (long story). All of that said, file support in Squeak, and hence so far inherited by Pharo, is not the best code I have seen to date, so it is easy to suspect too. Can one argue that since it worked locally, Pharo is not the problem? The little bit that I know of cifs is not encouraging. It sounds as though things moved from an easily killed process into the kernel which shows an almost Windows-like unwillingness to shut down when it cannot see servers. I have found numerous reports of problems copying large files over cifs, and I have enountered them too. Bill _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
I do not really know if this is related, but yesterday I got a problem with a student saving MCZ files on a USB disk. Maybe the OS was using a buffer, but none of the files were written there, when surprisingly the changes were saved.

Stef
Stef,
If you can think of any ways to reproduce it, let me know - I have reliably bad network connectivity at your service :)

Bill
If you invoke the flush primitive, then on unix/linux/mac-osx/iPhone it does this:
http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man3/fflush.3.html
&
http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man2/write.2.html#//apple_ref/doc/man/2/write

sqInt sqFileFlush(SQFile *f) {
	/* Return the length of the given file. */

	if (!sqFileValid(f)) return interpreterProxy->success(false);
	fflush(getFile(f));
	return 1;
}

On windows it does
http://msdn.microsoft.com/en-us/library/aa364439(VS.85).aspx

sqInt sqFileFlush(SQFile *f) {
	if (!sqFileValid(f)) FAIL();
	/* note: ignores the return value in case of read-only access */
	FlushFileBuffers(FILE_HANDLE(f));
	return 1;
}

I'll note the API doesn't actually check for errors, give feedback, or whatever. *cough* It could be failing and giving you a clue, but you would need to resort to FFI to make the file system calls to get the data to visualize, or build your own VM that returns the result.
--
John M. McIntosh <[hidden email]>  Twitter: squeaker68882
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
John,
That sounds badly, badly, broken. Is it "only" #flush that suffers, or will any of the I/O fail silently?

Bill
Well, the rate of change in the file I/O plugin is rather glacial.
The write code on unix systems is below. You'll note the magic "lastOp" and "position", and a seek or two to ensure that where you thought you wrote to is where you actually wrote to. Also setSize() is busy updating the file size, which later fools you if you are reading from a shared file, because you don't actually ask for the real file size; you get back the size we think it was in the past.

However, the fwrite() does return the bytes written, and there is a check to see if it matches expectations; if not, it sets up the primitive call to fail. So if the write failed, the primitive failure code would run on the Smalltalk side; what that does, or who handles any exception raised, I have no idea...

size_t sqFileWriteFromAt(SQFile *f, size_t count, char* byteArrayIndex, size_t startIndex) {
	/* Write count bytes to the given writable file starting at startIndex
	   in the given byteArray. (See comment in sqFileReadIntoAt for
	   interpretation of byteArray and startIndex). */

	char *src;
	size_t bytesWritten;
	squeakFileOffsetType position;
	FILE *file;

	if (!(sqFileValid(f) && f->writable)) return interpreterProxy->success(false);
	file = getFile(f);
	if (f->lastOp == READ_OP) fseek(file, 0, SEEK_CUR); /* seek between reading and writing */
	src = byteArrayIndex + startIndex;
	bytesWritten = fwrite(src, 1, count, file);

	position = ftell(file);
	if (position > getSize(f)) {
		setSize(f, position); /* update file size */
	}

	if (bytesWritten != count) {
		interpreterProxy->success(false);
	}
	f->lastOp = WRITE_OP;
	return bytesWritten;
}
I don't know if this helps, but for purposes of debugging here is a way to get at the error code from a flush(). No doubt I wrote the primitive by accident, not noticing that there was already a primitiveFileFlush in the FilePlugin.

	aStream := FileStream readOnlyFileNamed: 'foo.tmp'.
	result := OSProcess accessor flushExternalStream: aStream fileID.
	errorString := OSProcess accessor primErrorMessageAt: result.

Dave
Dave,
I'm not a fully trained penguin herder yet, so I have to ask a dumb question: is the idea to connect to the same file that went wrong and get an error condition after the fact, or are you hoping that opening and flushing will cause an error by itself? I suspect the problem is very real, and more "random" than that could hope to uncover.

Bill
On any platform that supports the C library fflush(), there is a numeric error number ("errno") that is set if the fflush() call does not succeed (type "man fflush" on your Linux or OS X console for details). John pointed out that the #flush method does not provide you with the errno value in the event of a failure, so you cannot directly check the errno value to see if there was a problem. I was showing a way that this could be done using some OSProcess calls rather than the normal #flush. My assumption was that you have some kind of file stream in your code, and that you would do the OSProcess thing instead of #flush just for purposes of debugging the problem.

My personal hunch is that you are not likely to see any failure from the #flush calls, even if you are doing the #flush and the data seems to be flushed down the bit bucket. The reason is that the result of the #flush will appear successful as long as the C library fflush() has handed the data over to the operating system. On networked file systems there is a long way to go between the time the data leaves your program (the VM doing fflush()) and the time that something actually lands on the spinning platter. So you can try checking the result of the #flush calls to be sure that they are not failing, but I'm guessing that you won't see any failures.

HTH,
Dave
Lewis > Sent: Sunday, October 25, 2009 3:27 PM > To: [hidden email]; [hidden email] > Subject: Re: [Pharo-project] Shared directories: The bug is quicker than the eye > > I don't know if this helps, but for purposes of debugging here is a way to get at the error code from a flush(). No doubt I wrote the primitive by accident, not noticing that there was already a primitiveFileFlush in the FilePlugin. > > aStream := FileStream readOnlyFileNamed: 'foo.tmp'. > result := OSProcess accessor flushExternalStream: aStream fileID. > errorString := OSProcess accessor primErrorMessageAt: result. > > Dave > > > On Fri, Oct 23, 2009 at 11:57:06AM -0700, John M McIntosh wrote: > > If you invoke the flush primitive then it does this on unix/linux/mac- > > osx/iPhone > > http://developer.apple.com/mac/library/documentation/Darwin/Reference/ > > ManPages/man3/fflush.3.html > > & > > http://developer.apple.com/mac/library/documentation/Darwin/Reference/ > > ManPages/man2/write.2.html#//apple_ref/doc/man/2/write > > > > sqInt sqFileFlush(SQFile *f) { > > /* Return the length of the given file. */ > > > > if (!sqFileValid(f)) return interpreterProxy->success(false); > > fflush(getFile(f)); > > return 1; > > } > > > > On windows it does > > http://msdn.microsoft.com/en-us/library/aa364439(VS.85).aspx > > > > sqInt sqFileFlush(SQFile *f) { > > if (!sqFileValid(f)) FAIL(); > > /* note: ignores the return value in case of read-only access */ > > FlushFileBuffers(FILE_HANDLE(f)); > > return 1; > > } > > > > I'll note the api doesn't actually check for errors, give feedback or > > whatever. > > *cough* it could be failing and give you a clue, but you would need to > > resort to FFI to do the file system calls to get the data to > > visualize, or build your own VM with that returns the result. > > > > > > > > On 2009-10-22, at 5:31 PM, Schwab,Wilhelm K wrote: > > > > > Hello all, > > > > > > I'm not sure what to make of this one. 
I just spent a couple of > > > hours trying to find the "leak" in an algorithm of mine. It was > > > reading roughly 1200 records, claiming to have processed all of > > > them, and yet writing only about 450 rows into an output text file. > > > One clue should have been that the number of output rows was > > > somewhat random; I did not fully appreciate that until I worked > > > around the problme. > > > > > > I tried an explicit #flush - no help. I looked for logic errors and > > > found none. The file was being written to a Windows hosted share > > > mounted by CIFS (which I am learning to view with contempt) from > > > Ubuntu 9.04. You can see where this is going: writing the file > > > locally gave the expected result. > > > > > > Any ideas on how one might further isolate the problem? My trust in > > > Windows is well known<g>; I have never liked shared directories; I > > > _really_ do not like CIFS as compared (reliability wise) to SMBFS; > > > the network between me and the server is in question too (long > > > story). All of that said, file support in Squeak, and hence so far > > > inherited by Pharo, is not the best code I have seen to date, so it > > > is easy to suspect too. Can one argue that since it worked locally, > > > Pharo is not the problem? > > > > > > The little bit that I know of cifs is not encouraging. It sounds as > > > though things moved from an easily killed process into the kernel > > > which shows an almost Windows-like unwillingness to shut down when > > > it cannot see servers. I have found numerous reports of problems > > > copying large files over cifs, and I have enountered them too. 
> > >
> > > Bill

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
|
Dave,
I am pretty certain I am *not* following all of this, but it is something that I will hopefully check in the near future. I have been amazed at Squeak's willingness to suppress information about errors in other situations, so this is worth a careful look.

Bill

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of David T. Lewis
Sent: Wednesday, October 28, 2009 5:20 PM
To: [hidden email]
Subject: Re: [Pharo-project] Shared directories: The bug is quicker than the eye

On any platform that supports the C library fflush(), there is a numeric error number ("errno") that is set if the fflush() call does not succeed (type "man fflush" on your Linux or OS X console for details). John pointed out that the #flush method does not provide you with the errno value in the event of a failure, so you cannot directly check the errno value to see if there was a problem. I was showing a way that this could be done using some OSProcess calls rather than the normal #flush.

My assumption was that you have some kind of file stream in your code, and that you would do the OSProcess thing instead of #flush just for purposes of debugging the problem.

My personal hunch is that you are not likely to see any failure from the #flush calls, even if you are doing the #flush and the data seems to be flushed down the bit bucket. The reason is that the result of the #flush will appear successful as long as the C library fflush() has handed the data over to the operating system. On networked file systems there is a long way to go between the time the data leaves your program (the VM doing fflush()) and the time that something actually lands on the spinning platter. So you can try checking the result of the #flush calls to be sure that they are not failing, but I'm guessing that you won't see any failures.
HTH,
Dave

On Wed, Oct 28, 2009 at 03:10:00PM -0400, Schwab,Wilhelm K wrote:
> Dave,
>
> I'm not a fully trained penguin herder yet, so I have to ask a dumb question: is the idea to connect to the same file that went wrong and get an error condition after the fact, or are you hoping that opening and flushing will cause an error by itself? I suspect the problem is very real, and more "random" than that could hope to uncover.
>
> Bill
>
> [earlier messages in the thread quoted here in full; snipped]

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
|