StdioFileStream and text files (long)

Stefan Schmiedl

I have a problem with reading a text file.

My (failing) test case opens a file (exported from some other app)
and checks the file length.

  self assert: (OffaReader on: '010007') contents size = 177935.

The method being tested is:

OffaReader class>>on: aString
  ^self new
    stream: (StdioFileStream read: (self buildPathTo: aString) type: #text)

The message OffaReader>>contents above just returns the contents of its stream.

#buildPathTo: turns the number into a full pathname:

OffaReader class>>buildPathTo: aFilename

  | fixed found |
  fixed := (aFilename , '________' leftString: 8) , '.AN~'.
  found := File
             locateFilename: fixed
             in: (OrderedCollection with: 'c:\somepath\').
  fixed size = found size
    ifTrue: [ FileException signalOn: aFilename ]
    ifFalse: [ ^found ]
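The naming rule in #buildPathTo: can be spelled out in C. `(aFilename , '________' leftString: 8)` takes the first eight characters of the name followed by eight underscores, so '010007' becomes '010007__', and '.AN~' is then appended. This gives the '010007__.AN~' that appears in the wc output below. `build_fixed_name` is a hypothetical helper written only to illustrate the padding. It does not model the File class>>locateFilename:in: lookup.

```c
#include <stddef.h>

/* Hypothetical C rendering of the name mangling in #buildPathTo:
   keep the first 8 characters of the name, pad with '_' up to 8,
   then append the ".AN~" extension. out must hold at least 13 bytes. */
void build_fixed_name(const char *name, char out[13])
{
    static const char ext[] = ".AN~";
    size_t i = 0;
    for (; i < 8 && name[i] != '\0'; i++)    /* copy (or truncate) the name */
        out[i] = name[i];
    for (; i < 8; i++)                       /* pad short names with underscores */
        out[i] = '_';
    for (size_t j = 0; j < sizeof ext; j++)  /* also copies the terminating NUL */
        out[8 + j] = ext[j];
}
```

With "010007" the buffer ends up holding "010007__.AN~". A name longer than eight characters is cut after the eighth, just as `leftString: 8` does.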

Running the test, I get the exception a Signal('End of stream').

I checked the implementation of StdioFileStream>>contents:

  "Answer a <String> or <ByteArray> containing the complete contents of the file
  accessed by the receiver."

  ^self reset; next: self size

Huh? Reading the number of bytes in the file causes 'End of stream'?
More info:

StdioFileStream>>next:into:startingAt: has two interesting local variables:
count = 177935
read  = 177785

ok, let's count the lines:
> wc -l Projekte/roth/Web66-Konvertierung/010007__.AN~
    150 Projekte/roth/Web66-Konvertierung/010007__.AN~

177935 - 177785 = 150, exactly the number of lines, so it seems that we have the culprit.
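Here is why one character per line goes missing. When a stream is opened in text mode, the Windows C runtime collapses each CR-LF pair into a single LF while reading. A file of 150 CR-LF-terminated lines therefore delivers 150 fewer characters than its on-disk size. The C sketch below imitates that translation on an in-memory buffer. `text_mode_translate` is a hypothetical helper, not a CRT function, and it ignores other text-mode behaviour such as Ctrl-Z handling.

```c
#include <stddef.h>

/* Hypothetical helper (not a CRT function): copies n bytes from src to dst,
   collapsing each CR-LF pair into a single LF, roughly as the Windows C
   runtime does when a stream is opened in text mode ("rt").
   Returns the number of characters delivered to dst. */
size_t text_mode_translate(const char *src, size_t n, char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        /* Drop a CR only when an LF follows it; the LF itself is kept. */
        if (src[i] == '\r' && i + 1 < n && src[i + 1] == '\n')
            continue;
        dst[out++] = src[i];
    }
    return out;
}
```

Consider a buffer of three CR-LF-terminated lines, "line 1\r\nline 2\r\nline 3\r\n" (24 bytes). The helper delivers 21 characters, one fewer per line. That is the same pattern as the 177935 requested versus 177785 read above.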

What is really funny is the following: some months ago, I noticed that
StdioFileStream class>>read:text: seemed to have its two mode strings swapped:

Original image: mode: (aBoolean ifTrue: ['rb'] ifFalse: ['rt']))
My image:       mode: (aBoolean ifTrue: ['rt'] ifFalse: ['rb']))

Since the strings get passed on to the C runtime library, and aBoolean = true
for text files, I figured that the switch was OK. One of the Dolphin trainers
confirmed this, IIRC. And now the Dolphin finally strikes back: when I restore
the original setting, my TestRunner shows green!

So should I use the "wrong" types for reading text files?
Should I meddle with the "read" counter above?
Should I avoid StdioFileStream? If so, how do I read files?

Could one of those running Dolphin 5 try this?



Stefan Schmiedl
EDV-Beratung, Programmierung, Schulung
Loreleystr. 5, 94315 Straubing, Germany
Tel. (0 94 21) 74 01 06, Fax (0 94 21) 74 01 21
Public Key:

shhhh ... I can't hear my code!


Re: StdioFileStream and text files (long)

Ian Bartholomew-8

> So should I use the "wrong" types for reading text files?
> Should I meddle with the "read" counter above?
> Should I avoid StdioFileStream? If so, how do I read files?

I've just done a little trial and the problem is that the CRTLibrary that
StdioFileStream uses is converting all cr-lf pairs in an incoming text file
into single lf characters before Dolphin gets to see them - hence the 150
difference.  I guessed that might be the problem but I had to test the
theory because I have never actually used this class, and was wondering why
you did?  Dolphin's FileStream class is as easy to use, has more facilities
and is quite efficient - although speed is the one area where the run time
library version might possibly have an advantage?  If you look in the image
you will see that FileStream is used _much_ more than StdioFileStream which
is, usually, a sign that one class is preferred over another.

I haven't checked it thoroughly but it looks like you can just replace
StdioFileStream with FileStream, anything the former can do is implemented
in the latter?

I would also guess that opening the file in binary mode, which is what you do
when you switch back to the "incorrect" version of StdioFileStream
class>>read:text:, works because the CRTLibrary function is told that the file
is opened in binary mode, and this presumably inhibits the cr removal.
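This guess can be checked against the C runtime directly. The sketch below counts the characters `fgetc` delivers for the same file opened in binary ("rb") and text ("r") mode. The helper name `count_chars` is made up for this example. Dolphin passes 'rt'/'rb', but the sketch uses plain "r" for text mode, because the 't' flag is a Microsoft CRT extension rather than standard C. On Windows the text-mode count should come out lower by one per CR-LF line ending. On POSIX systems text and binary mode behave identically, so both counts match.

```c
#include <stdio.h>

/* Hypothetical helper: returns how many characters fgetc delivers for the
   file at path opened with the given fopen mode, or -1 if it cannot be opened. */
long count_chars(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);
    if (f == NULL)
        return -1;
    long n = 0;
    while (fgetc(f) != EOF)   /* text mode may translate CR-LF before we see it */
        n++;
    fclose(f);
    return n;
}
```

Writing "a\r\nb\r\n" to a file and calling `count_chars(path, "rb")` gives 6. `count_chars(path, "r")` gives 4 with the Microsoft runtime and 6 elsewhere.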



Re: StdioFileStream and text files (long)

Stefan Schmiedl
Hi Ian,

how's the weather at your place?
In front of my window cherry trees are fully blossoming :-)

On Sun, 21 Apr 2002 18:02:58 +0100,
Ian Bartholomew <[hidden email]> wrote:

> Stefan,
>> So should I use the "wrong" types for reading text files?
>> Should I meddle with the "read" counter above?
>> Should I avoid StdioFileStream? If so, how do I read files?
> I've just done a little trial and the problem is that the CRTLibrary that
> StdioFileStream uses is converting all cr-lf pairs in an incoming text file
> into single lf characters before Dolphin gets to see them - hence the 150
> difference.

I thought so.

> I guessed that might be the problem but I had to test the
> theory because I have never actually used this class, and was wondering why
> you did?

well ... it is much easier to find than FileStream :-)
But finally I did and used the "lines" method. It's good that
the files I have to parse don't get too big.

> Dolphin's FileStream class is as easy to use, has more facilities
> and is quite efficient - although speed is the one area where the run time
> library version might possibly have an advantage? If you look in the image
> you will see that FileStream is used _much_ more than StdioFileStream which
> is, usually, a sign that one class is preferred over another.
> I haven't checked it thoroughly but it looks like you can just replace
> StdioFileStream with FileStream, anything the former can do is implemented
> in the latter?

I don't know about speed, I wanted the nextLine method
to parse the file line by line ... been working a lot on
linux lately ;>

> I would also guess that opening the file in binary mode, which you do when
> you switch back to the incorrect version of StdioFileStream
> class>>read:text:, worked as the CRTLibrary function is notified that a file
> is opened in binary mode and this, presumably inhibits the cr removal.

I tried this, but a StdioFileStream stores the "mode" in an instance variable,
and a binary file prohibits using nextLine ... strangely enough, at the end
of that method there is a comment referring to both binary and text mode.


Re: StdioFileStream and text files (long)

Ian Bartholomew-8
In reply to this post by Ian Bartholomew-8
> and is quite efficient - although speed is the one area where the run time
> library version might possibly have an advantage?

Weird. I've just done a test to compare speeds. On a 10 meg file, using the following

Time millisecondsToRun: [ | fs |
    fs := "Stdio"FileStream read: 'working.sml' text: false.
    [fs contents] ensure: [fs close]]

the FileStream version took ~85 ms. The StdioFileStream version alternated
between ~85, ~150 and ~250 ms. Obviously the library was doing some form of
caching, but not very efficiently.

Just shows that although it is more "featured", FileStream is no slower than
StdioFileStream...


Re: StdioFileStream and text files (long)

Ian Bartholomew-8
In reply to this post by Stefan Schmiedl

> how's the weather at your place?
> In front of my window cherry trees are fully blossoming :-)

Lovely. Sun shining on white cherry blossom on one side of the garden, pink
apple blossom on the other - best time of the year. Only downside is some
cloud appearing to hide my view of the planetary alignment - again.

> well ... it is much easier to find than FileStream :-)
> But finally I did and used the "lines" method. It's good that
> the files I have to parse don't get too big.

?. I guess you mean ...

aStream contents lines

which I don't suppose would be too efficient.

> I don't know about speed, I wanted the nextLine method
> to parse the file line by line ... been working a lot on
> linux lately ;>

FileStream understands #nextLine, implemented in the PositionableStream
superclass. The only caveat is that it needs a file with cr-lf line
endings, but that is what your file contained?
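For line-by-line parsing, the CR-LF translation can be sidestepped entirely. Read the bytes untranslated and let the line splitter drop any CR that precedes an LF. The C sketch below does this on an in-memory buffer. `next_line` is a hypothetical helper for illustration. It is not a description of how Dolphin's PositionableStream>>nextLine is actually written.

```c
#include <stddef.h>

/* Hypothetical line splitter: starting at *pos in buf (length n), finds the
   next line, stores its start index in *start and returns its length without
   the line terminator. Accepts both LF and CR-LF endings.
   Returns -1 once the whole buffer has been consumed. */
long next_line(const char *buf, size_t n, size_t *pos, size_t *start)
{
    if (*pos >= n)
        return -1;
    size_t i = *pos;
    while (i < n && buf[i] != '\n')      /* scan to the LF or the end */
        i++;
    size_t end = i;
    if (end > *pos && buf[end - 1] == '\r')
        end--;                           /* drop the CR of a CR-LF pair */
    *start = *pos;
    *pos = (i < n) ? i + 1 : n;          /* continue after the LF */
    return (long)(end - *start);
}
```

For the buffer "a\r\nbc\nd" the helper yields "a", "bc" and "d" in turn, then -1. Because it accepts both endings, the same parser works whether or not something upstream has already turned CR-LF into LF.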

> I tried this, but a StdioFileStream stores the "mode" in an instance variable,
> and a binary file prohibits using nextLine ... strangely enough at the end
> of the method is a comment referring to both binary and text mode.

Hmm. As CRTLibrary appears to strip out cr characters by default (perhaps
there is a switch somewhere to change this?) then the code at the end of
StdioFileStream>>nextLine does seem a bit superfluous.  The only time it would be used
is if you open the file in binary mode and call #nextLine but, as you say,
the test at the start would then fail.  Not having used the class though I
can't be sure.



Re: StdioFileStream and text files (long)

Stefan Schmiedl
On Sun, 21 Apr 2002 19:29:09 +0100,
Ian Bartholomew <[hidden email]> wrote:
> Stefan,
>> how's the weather at your place?
>> In front of my window cherry trees are fully blossoming :-)
> Lovely. Sun shining on white cherry blossom on one side of the garden, pink
> apple blossom on the other - best time of the year. Only downside is some
> cloud appearing to hide my view of the planetary alignment - again.

besides the different placement of trees, it is exactly the same here :-)

>> well ... it is much easier to find than FileStream :-)
>> But finally I did and used the "lines" method. It's good that
>> the files I have to parse don't get too big.
> ?. I guess you mean ...
> aStream contents lines
> which I don't suppose would be too efficient.

exactly, but not important with the small files I use.

>> I don't know about speed, I wanted the nextLine method
>> to parse the file line by line ... been working a lot on
>> linux lately ;>
> FileStream understands #nextLine - implemented in the PositionableStream
> superclass. The only caveat is that it needs a file with cr-lf line
> endings, but that is what your file contained?

ok. I overlooked it. does anybody know a programming language
which actively reminds the programmer, when he's doing something

>> I tried this, but a StdioFileStream stores the "mode" in an instance variable,
>> and a binary file prohibits using nextLine ... strangely enough at the end
>> of the method is a comment referring to both binary and text mode.
> Hmm. As CRTLibrary appears to strip out cr characters by default (perhaps
> there is a switch somewhere to change this?) then the code at the end of
> StdioFileStream does seem a bit superfluous.  The only time it would be used
> is if you open the file in binary mode and call #nextLine but, as you say,
> the test at the start would then fail.  Not having used the class though I
> can't be sure.

seems to be creeper. maybe some day someone will clean it up ...



Re: StdioFileStream and text files (long)

Stefan Schmiedl
In reply to this post by Ian Bartholomew-8
On Sun, 21 Apr 2002 18:36:20 +0100,
Ian Bartholomew <[hidden email]> wrote:

>> and is quite efficient - although speed is the one area where the run time
>> library version might possibly have an advantage?
> Weird. I've just done a test to compare speeds. On a 10 meg file, using the following
> Time millisecondsToRun: [ | fs |
>     fs := "Stdio"FileStream read: 'working.sml' text: false.
>     [fs contents] ensure: [fs close]]
> the FileStream version took ~85 ms. The StdioFileStream version alternated
> between ~85, ~150 and ~250 ms. Obviously the library was doing some form of
> caching, but not very efficiently.

I used my "default.img" (5.5 MB) for reading and got between 112 and 118 ms
on various runs. no large differences here.

> Just shows that although it is more "featured" FileStream is no slower than
> StdioFileStream...

makes me wonder where the "final" implementation of the two streams is.
