Proposed mod to FilePlugin>>primitiveFileAtEnd for non-regular files

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Proposed mod to FilePlugin>>primitiveFileAtEnd for non-regular files

alistairgrant
 
FilePlugin currently splits files in to two groups: 1) Stdio streams and
2) everything else.

To test for the end of file, FilePlugin>>primitiveFileAtEnd:

1) Uses feof() for stdio streams.
2) Compares the current position to the file size for everything else.

This returns the expected results for regular files, but fails for
non-regular files, e.g.:

(FileSystem / 'dev' / 'urandom') binaryReadStream next: 8.

returns an empty array.

On Unix, the proper way to check is to read past the end of the file and
then call feof() to confirm that it is in fact at the end.

Pharo (and I assume, Squeak) has plenty of code that assumes that a file
position >= the file size means that the file is at the end, and as
stated above this generally works for regular files.

This patch modifies FilePlugin>>primitiveFileAtEnd to:

a) Keep the current behaviour of using the file position test for
regular files.
b) Keep the current behaviour of using feof() for stdio streams.
c) Use feof() for non-regular files, e.g. /dev/urandom.

This allows existing code to continue to function, and allows
non-regular files to be tested correctly.

After applying the patch, the example code above answers the expected
result, e.g.:

(FileSystem / 'dev' / 'urandom') binaryReadStream next: 8.
" #[179 136 227 226 28 147 197 125]"

On Windows, as far as I can tell, all files are regular, and the
position test is used regularly, so no change is requried.

https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/232
contains the changes.  The CI builds failed, but for different reasons
to do with sista.

If I don't hear any dissenting opinion in a few days I'll integrate this
in.


Cheers,
Alistair
Reply | Threaded
Open this post in threaded view
|

Re: Proposed mod to FilePlugin>>primitiveFileAtEnd for non-regular files

Levente Uzonyi
 

>
> FilePlugin currently splits files in to two groups: 1) Stdio streams and
> 2) everything else.
>
> To test for the end of file, FilePlugin>>primitiveFileAtEnd:
>
> 1) Uses feof() for stdio streams.
> 2) Compares the current position to the file size for everything else.
>
> This returns the expected results for regular files, but fails for
> non-regular files, e.g.:
>
> (FileSystem / 'dev' / 'urandom') binaryReadStream next: 8.
>
> returns an empty array.

The FileStream hierachy in Squeak has no issue with the current
implementation, because the code will call the primitive to read from the
file even if #atEnd returns true (which conforms to what you described as
expected usage). E.g.

StandardFileStream readOnlyFileNamed: '/dev/urandom' do: [ :file |
  file binary; next: 8 ]

will return #[192 229 133 67 128 216 51 114].

>
> On Unix, the proper way to check is to read past the end of the file and
> then call feof() to confirm that it is in fact at the end.
>
> Pharo (and I assume, Squeak) has plenty of code that assumes that a file
> position >= the file size means that the file is at the end, and as
> stated above this generally works for regular files.

It seems that the n+1th stream implementation that was incorporated
into Pharo is not compatible with the rest of the stream libraries (the
original Stream hierarchy - as shown above, Nile, Xtreams, and whatever
else is out there I'm not aware of). If it were, then this issue would
have arisen long before.
So, if I were you, I would first check if the new implementation in Pharo
is correct.

Levente

>
> This patch modifies FilePlugin>>primitiveFileAtEnd to:
>
> a) Keep the current behaviour of using the file position test for
> regular files.
> b) Keep the current behaviour of using feof() for stdio streams.
> c) Use feof() for non-regular files, e.g. /dev/urandom.
>
> This allows existing code to continue to function, and allows
> non-regular files to be tested correctly.
>
> After applying the patch, the example code above answers the expected
> result, e.g.:
>
> (FileSystem / 'dev' / 'urandom') binaryReadStream next: 8.
> " #[179 136 227 226 28 147 197 125]"
>
> On Windows, as far as I can tell, all files are regular, and the
> position test is used regularly, so no change is requried.
>
> https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/232
> contains the changes.  The CI builds failed, but for different reasons
> to do with sista.
>
> If I don't hear any dissenting opinion in a few days I'll integrate this
> in.
>
>
> Cheers,
> Alistair
Reply | Threaded
Open this post in threaded view
|

Re: Proposed mod to FilePlugin>>primitiveFileAtEnd for non-regular files

alistairgrant
 
Hi Levente,

Right, my nice detailed description is all about the symptoms and not
the actual issue.  Sorry about that.

You're also correct that a new Pharo stream (ZnBufferedReadStream) had
a bug in it, which is in the process of being fixed.

The actual issue is the value returned by #atEnd, which is returning
true as you suggested:


| outputs |

outputs := OrderedCollection new.
StandardFileStream readOnlyFileNamed: '/dev/urandom' do: [ :file |
    outputs add: (file binary; next: 8).
    outputs add: file atEnd.
    outputs add: (file binary; next: 8).
    outputs add: file atEnd.
    ].
outputs
" an OrderedCollection(
    #[80 31 245 49 20 97 163 76]
    true
    #[207 210 106 185 26 86 15 49]
    true)"


The proposed modification means that #atEnd returns the correct value
for non-regular files like the example above (false), while still
maintaining the existing behaviour for regular files and stdio
streams.

Hopefully that is clearer,
Alistair


On 26 March 2018 at 18:16, Levente Uzonyi <[hidden email]> wrote:

>
>>
>> FilePlugin currently splits files in to two groups: 1) Stdio streams and
>> 2) everything else.
>>
>> To test for the end of file, FilePlugin>>primitiveFileAtEnd:
>>
>> 1) Uses feof() for stdio streams.
>> 2) Compares the current position to the file size for everything else.
>>
>> This returns the expected results for regular files, but fails for
>> non-regular files, e.g.:
>>
>> (FileSystem / 'dev' / 'urandom') binaryReadStream next: 8.
>>
>> returns an empty array.
>
>
> The FileStream hierachy in Squeak has no issue with the current
> implementation, because the code will call the primitive to read from the
> file even if #atEnd returns true (which conforms to what you described as
> expected usage). E.g.
>
> StandardFileStream readOnlyFileNamed: '/dev/urandom' do: [ :file |
>         file binary; next: 8 ]
>
> will return #[192 229 133 67 128 216 51 114].
>
>>
>> On Unix, the proper way to check is to read past the end of the file and
>> then call feof() to confirm that it is in fact at the end.
>>
>> Pharo (and I assume, Squeak) has plenty of code that assumes that a file
>> position >= the file size means that the file is at the end, and as
>> stated above this generally works for regular files.
>
>
> It seems that the n+1th stream implementation that was incorporated into
> Pharo is not compatible with the rest of the stream libraries (the original
> Stream hierarchy - as shown above, Nile, Xtreams, and whatever else is out
> there I'm not aware of). If it were, then this issue would have arisen long
> before.
> So, if I were you, I would first check if the new implementation in Pharo is
> correct.
>
> Levente
>
>
>>
>> This patch modifies FilePlugin>>primitiveFileAtEnd to:
>>
>> a) Keep the current behaviour of using the file position test for
>> regular files.
>> b) Keep the current behaviour of using feof() for stdio streams.
>> c) Use feof() for non-regular files, e.g. /dev/urandom.
>>
>> This allows existing code to continue to function, and allows
>> non-regular files to be tested correctly.
>>
>> After applying the patch, the example code above answers the expected
>> result, e.g.:
>>
>> (FileSystem / 'dev' / 'urandom') binaryReadStream next: 8.
>> " #[179 136 227 226 28 147 197 125]"
>>
>> On Windows, as far as I can tell, all files are regular, and the
>> position test is used regularly, so no change is requried.
>>
>> https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/232
>> contains the changes.  The CI builds failed, but for different reasons
>> to do with sista.
>>
>> If I don't hear any dissenting opinion in a few days I'll integrate this
>> in.
>>
>>
>> Cheers,
>> Alistair
Reply | Threaded
Open this post in threaded view
|

Re: Proposed mod to FilePlugin>>primitiveFileAtEnd for non-regular files

David T. Lewis
In reply to this post by alistairgrant
 
FYI, OSProcess provides this:

OSProcessAccessor>>isAtEndOfFile: anIOHandle
        "Answer whether the file represented by anIOHandle is at end of file, as determined
        by a call to feof(). This is different from StandardFileStream>>primAtEnd: which answers
        true if the file pointer is at the end of the file, but which does not call feof() to
        determine that an end of file condition has occurred. The difference is significant
        if aSqFileStruct represents a pipe or a device file, which may not be positionable
        in the sense of a conventional disk file."

And the primitive for Unix is:

UnixOSProcessPlugin>>primitiveTestEndOfFileFlag
        "Take a struct SQFile from the stack, and call feof(3) to determine if the file has
        reached end of file. The flag is set only by a previous read operation, so end of
        file is not detected until an actual EOF condition has been detected by a read attempt."

Typical usage is a file stream on an OS pipe stream, in which case OSProcess makes
a distiction between reading to end of file (current available data in a stream)
versus reading all available data until the remote end of the pipe is closed (hence
all of the data that is or ever will be available on that stream).

In my opinion, the distinction between "stdio streams" and "everything else" is a
mis-feature, and it presumably entered the FilePlugin to accommodate Windows. The
console streams on Windows are quite fundamentally different from other streams,
and have pre-assigned handle numbers to identify them. This is not the case for
Unix-based systems, which attempt to treat IO streams as similarly as possible,
and which are free to dup() the handles at will.

The general idea is that stdin, stdout, and stderr should behave as reasonably
as possible regardless of whether they are attached to a positional stream on a
disk file, or to a pipe connected to some other process, or to some sort of special
file. Or to say it another way, The stdin/stdout/stderr streams might be connected
to things that are positionable or non-positionable, and you should not try to guess
which behavior is appropriate based on whether that stream happens to be labeled
as a "stdio stream".

Dave


On Mon, Mar 26, 2018 at 05:32:15PM +0200, Alistair Grant wrote:

>  
> FilePlugin currently splits files in to two groups: 1) Stdio streams and
> 2) everything else.
>
> To test for the end of file, FilePlugin>>primitiveFileAtEnd:
>
> 1) Uses feof() for stdio streams.
> 2) Compares the current position to the file size for everything else.
>
> This returns the expected results for regular files, but fails for
> non-regular files, e.g.:
>
> (FileSystem / 'dev' / 'urandom') binaryReadStream next: 8.
>
> returns an empty array.
>
> On Unix, the proper way to check is to read past the end of the file and
> then call feof() to confirm that it is in fact at the end.
>
> Pharo (and I assume, Squeak) has plenty of code that assumes that a file
> position >= the file size means that the file is at the end, and as
> stated above this generally works for regular files.
>
> This patch modifies FilePlugin>>primitiveFileAtEnd to:
>
> a) Keep the current behaviour of using the file position test for
> regular files.
> b) Keep the current behaviour of using feof() for stdio streams.
> c) Use feof() for non-regular files, e.g. /dev/urandom.
>
> This allows existing code to continue to function, and allows
> non-regular files to be tested correctly.
>
> After applying the patch, the example code above answers the expected
> result, e.g.:
>
> (FileSystem / 'dev' / 'urandom') binaryReadStream next: 8.
> " #[179 136 227 226 28 147 197 125]"
>
> On Windows, as far as I can tell, all files are regular, and the
> position test is used regularly, so no change is requried.
>
> https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/232
> contains the changes.  The CI builds failed, but for different reasons
> to do with sista.
>
> If I don't hear any dissenting opinion in a few days I'll integrate this
> in.
>
>
> Cheers,
> Alistair
Reply | Threaded
Open this post in threaded view
|

Re: Proposed mod to FilePlugin>>primitiveFileAtEnd for non-regular files

alistairgrant
 
On 26 March 2018 at 23:51, David T. Lewis <[hidden email]> wrote:

>
> FYI, OSProcess provides this:
>
> OSProcessAccessor>>isAtEndOfFile: anIOHandle
>         "Answer whether the file represented by anIOHandle is at end of file, as determined
>         by a call to feof(). This is different from StandardFileStream>>primAtEnd: which answers
>         true if the file pointer is at the end of the file, but which does not call feof() to
>         determine that an end of file condition has occurred. The difference is significant
>         if aSqFileStruct represents a pipe or a device file, which may not be positionable
>         in the sense of a conventional disk file."
>
> And the primitive for Unix is:
>
> UnixOSProcessPlugin>>primitiveTestEndOfFileFlag
>         "Take a struct SQFile from the stack, and call feof(3) to determine if the file has
>         reached end of file. The flag is set only by a previous read operation, so end of
>         file is not detected until an actual EOF condition has been detected by a read attempt."
>
> Typical usage is a file stream on an OS pipe stream, in which case OSProcess makes
> a distiction between reading to end of file (current available data in a stream)
> versus reading all available data until the remote end of the pipe is closed (hence
> all of the data that is or ever will be available on that stream).
>
> In my opinion, the distinction between "stdio streams" and "everything else" is a
> mis-feature,

+1

> and it presumably entered the FilePlugin to accommodate Windows. The
> console streams on Windows are quite fundamentally different from other streams,
> and have pre-assigned handle numbers to identify them. This is not the case for
> Unix-based systems, which attempt to treat IO streams as similarly as possible,
> and which are free to dup() the handles at will.
>
> The general idea is that stdin, stdout, and stderr should behave as reasonably
> as possible regardless of whether they are attached to a positional stream on a
> disk file, or to a pipe connected to some other process, or to some sort of special
> file. Or to say it another way, The stdin/stdout/stderr streams might be connected
> to things that are positionable or non-positionable, and you should not try to guess
> which behavior is appropriate based on whether that stream happens to be labeled
> as a "stdio stream".

Right.  Given that the primitive for opening/connecting to a file by
file descriptor is now in to the VM, combined with this proposed patch
it's possible that on Unix we could get rid of the special stdio
stream handling altogether (I haven't gone through all the code to
check).  On Windows it should be possible to move the special handling
out of the VM and in to the image, but again, that is obviously more
work.

Cheers,
Alistair



> On Mon, Mar 26, 2018 at 05:32:15PM +0200, Alistair Grant wrote:
>>
>> FilePlugin currently splits files in to two groups: 1) Stdio streams and
>> 2) everything else.
>>
>> To test for the end of file, FilePlugin>>primitiveFileAtEnd:
>>
>> 1) Uses feof() for stdio streams.
>> 2) Compares the current position to the file size for everything else.
>>
>> This returns the expected results for regular files, but fails for
>> non-regular files, e.g.:
>>
>> (FileSystem / 'dev' / 'urandom') binaryReadStream next: 8.
>>
>> returns an empty array.
>>
>> On Unix, the proper way to check is to read past the end of the file and
>> then call feof() to confirm that it is in fact at the end.
>>
>> Pharo (and I assume, Squeak) has plenty of code that assumes that a file
>> position >= the file size means that the file is at the end, and as
>> stated above this generally works for regular files.
>>
>> This patch modifies FilePlugin>>primitiveFileAtEnd to:
>>
>> a) Keep the current behaviour of using the file position test for
>> regular files.
>> b) Keep the current behaviour of using feof() for stdio streams.
>> c) Use feof() for non-regular files, e.g. /dev/urandom.
>>
>> This allows existing code to continue to function, and allows
>> non-regular files to be tested correctly.
>>
>> After applying the patch, the example code above answers the expected
>> result, e.g.:
>>
>> (FileSystem / 'dev' / 'urandom') binaryReadStream next: 8.
>> " #[179 136 227 226 28 147 197 125]"
>>
>> On Windows, as far as I can tell, all files are regular, and the
>> position test is used regularly, so no change is requried.
>>
>> https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/232
>> contains the changes.  The CI builds failed, but for different reasons
>> to do with sista.
>>
>> If I don't hear any dissenting opinion in a few days I'll integrate this
>> in.
>>
>>
>> Cheers,
>> Alistair