Shared directories: The bug is quicker than the eye

Shared directories: The bug is quicker than the eye

Schwab,Wilhelm K
Hello all,

I'm not sure what to make of this one.  I just spent a couple of hours trying to find the "leak" in an algorithm of mine.  It was reading roughly 1200 records, claiming to have processed all of them, and yet writing only about 450 rows into an output text file.  One clue should have been that the number of output rows was somewhat random; I did not fully appreciate that until I worked around the problem.

I tried an explicit #flush - no help.  I looked for logic errors and found none.  The file was being written to a Windows-hosted share mounted via CIFS (which I am learning to view with contempt) from Ubuntu 9.04.  You can see where this is going: writing the file locally gave the expected result.

Any ideas on how one might further isolate the problem?  My trust in Windows is well known<g>; I have never liked shared directories; I _really_ do not like CIFS as compared (reliability-wise) to SMBFS; the network between me and the server is in question too (long story).  All of that said, file support in Squeak, and hence so far inherited by Pharo, is not the best code I have seen to date, so it is easy to suspect too.  Can one argue that since it worked locally, Pharo is not the problem?

The little bit that I know of CIFS is not encouraging.  It sounds as though things moved from an easily killed user process into the kernel, which shows an almost Windows-like unwillingness to shut down when it cannot see servers.  I have found numerous reports of problems copying large files over CIFS, and I have encountered them too.

Bill

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Re: Shared directories: The bug is quicker than the eye

Stéphane Ducasse
I do not really know if this is related, but yesterday I ran into a problem with a student saving MCZ files on a USB disk.
Maybe the OS was using a buffer, but none of the files were written there even though, surprisingly, the changes had been saved.

Stef

On Oct 23, 2009, at 2:31 AM, Schwab,Wilhelm K wrote:




Re: Shared directories: The bug is quicker than the eye

Schwab,Wilhelm K
Stef,

If you can think of any ways to reproduce it, let me know - I have reliably bad network connectivity at your service :)

Bill
 

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Stéphane Ducasse
Sent: Friday, October 23, 2009 2:39 AM
To: [hidden email]
Subject: Re: [Pharo-project] Shared directories: The bug is quicker than the eye


Re: Shared directories: The bug is quicker than the eye

johnmci
In reply to this post by Schwab,Wilhelm K
If you invoke the flush primitive, it does this on Unix/Linux/Mac OS X/iPhone:
http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man3/fflush.3.html
&
http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man2/write.2.html#//apple_ref/doc/man/2/write

sqInt sqFileFlush(SQFile *f) {
        /* Flush the given file. */

        if (!sqFileValid(f)) return interpreterProxy->success(false);
        fflush(getFile(f));
        return 1;
}

On Windows it does this:
http://msdn.microsoft.com/en-us/library/aa364439(VS.85).aspx

sqInt sqFileFlush(SQFile *f) {
   if (!sqFileValid(f)) FAIL();
   /* note: ignores the return value in case of read-only access */
   FlushFileBuffers(FILE_HANDLE(f));
   return 1;
}

I'll note the API doesn't actually check for errors, give feedback, or anything of the sort.
*cough* It could be failing in a way that would give you a clue, but you would need to resort to FFI to make the file system calls yourself and get at the data, or build your own VM that returns the result.



On 2009-10-22, at 5:31 PM, Schwab,Wilhelm K wrote:


--
===========================================================================
John M. McIntosh <[hidden email]>   Twitter: squeaker68882
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================






Re: Shared directories: The bug is quicker than the eye

Schwab,Wilhelm K
John,

That sounds badly, badly broken.  Is it "only" #flush that suffers, or will any of the I/O fail silently?

Bill



-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of John M McIntosh
Sent: Friday, October 23, 2009 1:57 PM
To: [hidden email]
Subject: Re: [Pharo-project] Shared directories: The bug is quicker than the eye


Re: Shared directories: The bug is quicker than the eye

johnmci
Well, the rate of change in the file I/O plugin is rather glacial.

The write code on Unix systems is below.  You'll note the magic: "lastOp", "position", and a seek or two to ensure that where you thought you wrote is actually where you wrote.  Also, setSize() is busy updating a cached file size, which later fools you if you are reading from a shared file, because you don't actually ask for the real file size; you get back the size we think it was in the past.

However, fwrite() does return the number of bytes written, and there is a check to see whether that matches expectations; if not, it sets the primitive call up to fail.  So if the write failed, the primitive failure would run Smalltalk code; what that does, or who handles the exception raised, if any, I have no idea...



size_t sqFileWriteFromAt(SQFile *f, size_t count, char* byteArrayIndex, size_t startIndex) {
        /* Write count bytes to the given writable file starting at startIndex
           in the given byteArray. (See comment in sqFileReadIntoAt for interpretation
           of byteArray and startIndex).
        */

        char *src;
        size_t bytesWritten;
        squeakFileOffsetType position;
        FILE *file;

        if (!(sqFileValid(f) && f->writable)) return interpreterProxy->success(false);
        file = getFile(f);
        if (f->lastOp == READ_OP) fseek(file, 0, SEEK_CUR);  /* seek between reading and writing */
        src = byteArrayIndex + startIndex;
        bytesWritten = fwrite(src, 1, count, file);

        position = ftell(file);
        if (position > getSize(f)) {
                setSize(f, position);  /* update file size */
        }

        if (bytesWritten != count) {
                interpreterProxy->success(false);
        }
        f->lastOp = WRITE_OP;
        return bytesWritten;
}

On 2009-10-23, at 12:43 PM, Schwab,Wilhelm K wrote:



Re: Shared directories: The bug is quicker than the eye

David T. Lewis
In reply to this post by johnmci
I don't know if this helps, but for purposes of debugging here is
a way to get at the error code from a flush(). No doubt I wrote the
primitive by accident, not noticing that there was already a
primitiveFileFlush in the FilePlugin.

  aStream := FileStream readOnlyFileNamed: 'foo.tmp'.
  result := OSProcess accessor flushExternalStream: aStream fileID.
  errorString := OSProcess accessor primErrorMessageAt: result.

Dave


On Fri, Oct 23, 2009 at 11:57:06AM -0700, John M McIntosh wrote:


Re: Shared directories: The bug is quicker than the eye

Schwab,Wilhelm K
Dave,

I'm not a fully trained penguin herder yet, so I have to ask a dumb question: is the idea to connect to the same file that went wrong and get an error condition after the fact, or are you hoping that opening and flushing will cause an error by itself?  I suspect the problem is very real, and more "random" than such a test could hope to uncover.

Bill


-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of David T. Lewis
Sent: Sunday, October 25, 2009 3:27 PM
To: [hidden email]; [hidden email]
Subject: Re: [Pharo-project] Shared directories: The bug is quicker than the eye


Re: Shared directories: The bug is quicker than the eye

David T. Lewis
On any platform that supports the C library fflush(), there is a numeric
error number ("errno") that is set if the fflush() call does not succeed
(type "man fflush" on your Linux or OS X console for details).

John pointed out that the #flush method does not provide you with the
errno value in the event of a failure, so you cannot directly check the
errno value to see if there was a problem. I was showing a way that this
could be done using some OSProcess calls rather than the normal #flush.
My assumption was that you have some kind of file stream in your code,
and that you would do the OSProcess thing instead of #flush just for
purposes of debugging the problem.

My personal hunch is that you are not likely to see any failure from
the #flush calls, even if you are doing the #flush and the data seems
to be flushed down the bit bucket. The reason is that the result of the
#flush will appear successful as long as the C library fflush() has
handed the data over to the operating system. On networked file systems
there is a long way to go between the time the data leaves your program
(the VM doing fflush()) and the time that something actually lands
on the spinning platter. So you can try checking the result of the
#flush calls to be sure that they are not failing, but I'm guessing
that you won't see any failures.

HTH,

Dave

On Wed, Oct 28, 2009 at 03:10:00PM -0400, Schwab,Wilhelm K wrote:

> Dave,
>
> I'm not a fully trained penguin herder yet, so I have to ask a dumb question: is the idea to connect to the same file that went wrong and get an error condition after the fact, or are you hoping that opening and flushing will cause an error by itself?  I suspect the problem is very real, and more "random" than that could hope to uncover.
>
> Bill
>
>
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of David T. Lewis
> Sent: Sunday, October 25, 2009 3:27 PM
> To: [hidden email]; [hidden email]
> Subject: Re: [Pharo-project] Shared directories: The bug is quicker than the eye
>
> I don't know if this helps, but for purposes of debugging here is a way to get at the error code from a flush(). No doubt I wrote the primitive by accident, not noticing that there was already a primitiveFileFlush in the FilePlugin.
>
>   aStream := FileStream readOnlyFileNamed: 'foo.tmp'.
>   result := OSProcess accessor flushExternalStream: aStream fileID.
>   errorString := OSProcess accessor primErrorMessageAt: result.
>
> Dave
>
>
> On Fri, Oct 23, 2009 at 11:57:06AM -0700, John M McIntosh wrote:
> > If you invoke the flush primitive then it does this on unix/linux/mac-osx/iPhone
> > http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man3/fflush.3.html
> > &
> > http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man2/write.2.html#//apple_ref/doc/man/2/write
> >
> > sqInt sqFileFlush(SQFile *f) {
> > /* Flush the given file. */
> >
> > if (!sqFileValid(f)) return interpreterProxy->success(false);
> > fflush(getFile(f));
> > return 1;
> > }
> >
> > On windows it does
> > http://msdn.microsoft.com/en-us/library/aa364439(VS.85).aspx
> >
> > sqInt sqFileFlush(SQFile *f) {
> >    if (!sqFileValid(f)) FAIL();
> >    /* note: ignores the return value in case of read-only access */
> >    FlushFileBuffers(FILE_HANDLE(f));
> >    return 1;
> > }
> >
> > I'll note the API doesn't actually check for errors, give feedback or
> > whatever.
> > *cough* it could be failing in a way that would give you a clue, but
> > you would need to resort to FFI to do the file system calls to get
> > the data to visualize, or build your own VM that returns the result.
> >
> >
> >
> > On 2009-10-22, at 5:31 PM, Schwab,Wilhelm K wrote:
> >
> > > Hello all,
> > >
> > > I'm not sure what to make of this one.  I just spent a couple of
> > > hours trying to find the "leak" in an algorithm of mine.  It was
> > > reading roughly 1200 records, claiming to have processed all of
> > > them, and yet writing only about 450 rows into an output text file.  
> > > One clue should have been that the number of output rows was
> > > somewhat random; I did not fully appreciate that until I worked
> > > around the problem.
> > >
> > > I tried an explicit #flush - no help.  I looked for logic errors and
> > > found none.  The file was being written to a Windows hosted share
> > > mounted by CIFS (which I am learning to view with contempt) from
> > > Ubuntu 9.04.  You can see where this is going: writing the file
> > > locally gave the expected result.
> > >
> > > Any ideas on how one might further isolate the problem?  My trust in
> > > Windows is well known<g>; I have never liked shared directories; I
> > > _really_ do not like CIFS as compared (reliability wise) to SMBFS;
> > > the network between me and the server is in question too (long
> > > story).  All of that said, file support in Squeak, and hence so far
> > > inherited by Pharo, is not the best code I have seen to date, so it
> > > is easy to suspect too.  Can one argue that since it worked locally,
> > > Pharo is not the problem?
> > >
> > > The little bit that I know of cifs is not encouraging.  It sounds as
> > > though things moved from an easily killed process into the kernel
> > > which shows an almost Windows-like unwillingness to shut down when
> > > it cannot see servers.  I have found numerous reports of problems
> > > copying large files over cifs, and I have encountered them too.
> > >
> > > Bill
> > >
>
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Re: Shared directories: The bug is quicker than the eye

Schwab,Wilhelm K
Dave,

I am pretty certain I am *not* following all of this, but it is something that I will hopefully check in the near future.  I have been amazed at Squeak's willingness to suppress information about errors in other situations, so this is worth a careful look.

Bill



-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of David T. Lewis
Sent: Wednesday, October 28, 2009 5:20 PM
To: [hidden email]
Subject: Re: [Pharo-project] Shared directories: The bug is quicker than the eye

On any platform that supports the C library fflush(), there is a numeric error number ("errno") that is set if the fflush() call does not succeed (type "man fflush" on your Linux or OS X console for details).

John pointed out that the #flush method does not provide you with the errno value in the event of a failure, so you cannot directly check the errno value to see if there was a problem. I was showing a way that this could be done using some OSProcess calls rather than the normal #flush.
My assumption was that you have some kind of file stream in your code, and that you would do the OSProcess thing instead of #flush just for purposes of debugging the problem.

My personal hunch is that you are not likely to see any failure from the #flush calls, even if you are doing the #flush and the data seems to be flushed down the bit bucket. The reason is that the result of the #flush will appear successful as long as the C library fflush() has handed the data over to the operating system. On networked file systems there is a long way to go between the time the data leaves your program (the VM doing fflush()) and the time that something actually lands on the spinning platter. So you can try checking the result of the #flush calls to be sure that they are not failing, but I'm guessing that you won't see any failures.

HTH,

Dave

On Wed, Oct 28, 2009 at 03:10:00PM -0400, Schwab,Wilhelm K wrote:

> Dave,
>
> I'm not a fully trained penguin herder yet, so I have to ask a dumb question: is the idea to connect to the same file that went wrong and get an error condition after the fact, or are you hoping that opening and flushing will cause an error by itself?  I suspect the problem is very real, and more "random" than that could hope to uncover.
>
> Bill
>
>
> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of
> David T. Lewis
> Sent: Sunday, October 25, 2009 3:27 PM
> To: [hidden email];
> [hidden email]
> Subject: Re: [Pharo-project] Shared directories: The bug is quicker
> than the eye
>
> I don't know if this helps, but for purposes of debugging here is a way to get at the error code from a flush(). No doubt I wrote the primitive by accident, not noticing that there was already a primitiveFileFlush in the FilePlugin.
>
>   aStream := FileStream readOnlyFileNamed: 'foo.tmp'.
>   result := OSProcess accessor flushExternalStream: aStream fileID.
>   errorString := OSProcess accessor primErrorMessageAt: result.
>
> Dave
>
>
> On Fri, Oct 23, 2009 at 11:57:06AM -0700, John M McIntosh wrote:
> > If you invoke the flush primitive then it does this on unix/linux/mac-osx/iPhone
> > http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man3/fflush.3.html
> > &
> > http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man2/write.2.html#//apple_ref/doc/man/2/write
> >
> > sqInt sqFileFlush(SQFile *f) {
> > /* Flush the given file. */
> >
> > if (!sqFileValid(f)) return interpreterProxy->success(false);
> > fflush(getFile(f));
> > return 1;
> > }
> >
> > On windows it does
> > http://msdn.microsoft.com/en-us/library/aa364439(VS.85).aspx
> >
> > sqInt sqFileFlush(SQFile *f) {
> >    if (!sqFileValid(f)) FAIL();
> >    /* note: ignores the return value in case of read-only access */
> >    FlushFileBuffers(FILE_HANDLE(f));
> >    return 1;
> > }
> >
> > I'll note the API doesn't actually check for errors, give feedback
> > or whatever.
> > *cough* it could be failing in a way that would give you a clue, but
> > you would need to resort to FFI to do the file system calls to get
> > the data to visualize, or build your own VM that returns the result.
> >
> >
> >
> > On 2009-10-22, at 5:31 PM, Schwab,Wilhelm K wrote:
> >
> > > Hello all,
> > >
> > > I'm not sure what to make of this one.  I just spent a couple of
> > > hours trying to find the "leak" in an algorithm of mine.  It was
> > > reading roughly 1200 records, claiming to have processed all of
> > > them, and yet writing only about 450 rows into an output text file.  
> > > One clue should have been that the number of output rows was
> > > somewhat random; I did not fully appreciate that until I worked
> > > around the problem.
> > >
> > > I tried an explicit #flush - no help.  I looked for logic errors
> > > and found none.  The file was being written to a Windows hosted
> > > share mounted by CIFS (which I am learning to view with contempt)
> > > from Ubuntu 9.04.  You can see where this is going: writing the
> > > file locally gave the expected result.
> > >
> > > Any ideas on how one might further isolate the problem?  My trust
> > > in Windows is well known<g>; I have never liked shared
> > > directories; I _really_ do not like CIFS as compared (reliability
> > > wise) to SMBFS; the network between me and the server is in
> > > question too (long story).  All of that said, file support in
> > > Squeak, and hence so far inherited by Pharo, is not the best code
> > > I have seen to date, so it is easy to suspect too.  Can one argue
> > > that since it worked locally, Pharo is not the problem?
> > >
> > > The little bit that I know of cifs is not encouraging.  It sounds
> > > as though things moved from an easily killed process into the
> > > kernel which shows an almost Windows-like unwillingness to shut
> > > down when it cannot see servers.  I have found numerous reports of
> > > problems copying large files over cifs, and I have encountered them too.
> > >
> > > Bill
> > >