StdioFileStream and text files (long)

Stefan Schmiedl
Greetings,

I have a problem with reading a text file.

My (failing) test case opens a file (exported from some other app)
and checks the file length.

testOpenAngebot
  self assert: (OffaReader on: '010007') contents size = 177935.

The method being tested is:

OffaReader class>>on: aString
  ^self new
    stream: (StdioFileStream read: (self buildPathTo: aString) type: #text)

The message OffaReader>>contents above just returns the
StdioFileStream>>contents.

buildPathTo packages the quoted number into its pathname:

OffaReader class>>buildPathTo: aFilename

  | fixed found |
  fixed := (aFilename , '________' leftString: 8) , '.AN~'.
  found := File
             locateFilename: fixed
             in: (OrderedCollection with: 'c:\somepath\').
  fixed size = found size
    ifTrue: [ FileException signalOn: aFilename ]
    ifFalse: [ ^found ]


Now consider the exception: a Signal('End of stream').

I checked the implementation:

StdioFileStream>>contents
  "Answer a <String> or <ByteArray> containing the complete contents of the file
  accessed by the receiver."

  ^self reset; next: self size

Huh? Reading the number of bytes in the file causes 'End of stream'?
More info:

StdioFileStream>>next:into:startingAt: has two interesting local variables:
count = 177935
read  = 177785
      --------
           150

ok, let's count the lines:
> wc -l Projekte/roth/Web66-Konvertierung/010007__.AN~
    150 Projekte/roth/Web66-Konvertierung/010007__.AN~

So it seems that we have the culprit.

What is really funny is the following: some months ago, I noticed that
StdioFileStream class>>read:text: has its mode strings switched:

Original image: mode: (aBoolean ifTrue: ['rb'] ifFalse: ['rt']))
My image:       mode: (aBoolean ifTrue: ['rt'] ifFalse: ['rb']))

Since the strings get passed on to the C runtime library I figured that
since aBoolean = true for text files, the switch was ok. One of the Dolphin
trainers confirmed this, IIRC. And now finally the Dolphin strikes back.
Because, when I restore the original setting, my TestRunner shows green!

So should I use the "wrong" types for reading text files?
Should I meddle with the "read" counter above?
Should I avoid StdioFileStream? If so, how do I read files?

Could one of those running Dolphin 5 try this?

Thanks.

s.

--
Stefan Schmiedl
EDV-Beratung, Programmierung, Schulung
Loreleystr. 5, 94315 Straubing, Germany
Tel. (0 94 21) 74 01 06, Fax (0 94 21) 74 01 21
Public Key: http://xss.de/stefan.public

shhhh ... I can't hear my code!



Re: StdioFileStream and text files (long)

Ian Bartholomew-8
Stefan,

> So should I use the "wrong" types for reading text files?
> Should I meddle with the "read" counter above?
> Should I avoid StdioFileStream? If so, how do I read files?

I've just done a little trial and the problem is that the CRTLibrary that
StdioFileStream uses is converting all cr-lf pairs in an incoming text file
into single lf characters before Dolphin gets to see them - hence the 150
difference.  I guessed that might be the problem but I had to test the
theory because I have never actually used this class, and was wondering why
you did?  Dolphin's FileStream class is as easy to use, has more facilities
and is quite efficient - although speed is the one area where the run time
library version might possibly have an advantage.  If you look in the image
you will see that FileStream is used _much_ more than StdioFileStream which
is, usually, a sign that one class is preferred over another.

I haven't checked it thoroughly, but it looks like you can just replace
StdioFileStream with FileStream, since anything the former can do seems to
be implemented in the latter.

I would also guess that opening the file in binary mode, which is what you do
when you switch back to the incorrect version of StdioFileStream
class>>read:text:, worked because the CRTLibrary function is told that the
file is opened in binary mode, and this presumably inhibits the cr removal.
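
The arithmetic behind the 150-byte shortfall can be checked in a workspace. The following is only a rough sketch: it mimics text-mode translation by dropping every cr from a small sample string, which is a simplification of what the C runtime really does, and the sample text and variable names are made up for illustration.

```smalltalk
"Workspace sketch (hypothetical sample data): how many characters does
 cr-lf to lf translation remove?"
| crlf raw translated |
crlf := String with: Character cr with: Character lf.
raw := 'first line' , crlf , 'second line' , crlf.
"Dropping every cr approximates what text mode does to cr-lf pairs."
translated := raw copyWithout: Character cr.
raw size - translated size
```

Evaluating the last expression gives 2, one character per line. The same reasoning applied to a 150-line file accounts for count (177935) minus read (177785).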

Ian



Re: StdioFileStream and text files (long)

Stefan Schmiedl
Hi Ian,

how's the weather at your place?
In front of my window cherry trees are fully blossoming :-)

On Sun, 21 Apr 2002 18:02:58 +0100,
Ian Bartholomew <[hidden email]> wrote:

> Stefan,
>
>> So should I use the "wrong" types for reading text files?
>> Should I meddle with the "read" counter above?
>> Should I avoid StdioFileStream? If so, how do I read files?
>
> I've just done a little trial and the problem is that the CRTLibrary that
> StdioFileStream uses is converting all cr-lf pairs in an incoming text file
> into single lf characters before Dolphin gets to see them - hence the 150
> difference.

I thought so.

> I guessed that might be the problem but I had to test the
> theory because I have never actually used this class, and was wondering why
> you did?

well ... it is much easier to find than FileStream :-)
But finally I did and used the "lines" method. It's good that
the files I have to parse don't get too big.

> Dolphin's FileStream class is as easy to use, has more facilities
> and is quite efficient - although speed is the one area where the run time
> library version might possibly have an advantage. If you look in the image
> you will see that FileStream is used _much_ more than StdioFileStream which
> is, usually, a sign that one class is preferred over another.
>
> I haven't checked it thoroughly but it looks like you can just replace
> StdioFileStream with FileStream, anything the former can do is implemented
> in the latter?

I don't know about speed, I wanted the nextLine method
to parse the file line by line ... been working a lot on
linux lately ;>

>
> I would also guess that opening the file in binary mode, which is what you do
> when you switch back to the incorrect version of StdioFileStream
> class>>read:text:, worked because the CRTLibrary function is told that the
> file is opened in binary mode, and this presumably inhibits the cr removal.

I tried this, but a StdioFileStream stores the "mode" in an instance variable,
and a binary file prohibits using nextLine ... strangely enough at the end
of the method is a comment referring to both binary and text mode.

Thanks.
s.



Re: StdioFileStream and text files (long)

Ian Bartholomew-8
In reply to this post by Ian Bartholomew-8
> and is quite efficient - although speed is the one area where the run time
> library version might possibly have an advantage.

Weird. I've just done a test to compare speeds. On a 10 meg file using the
following

Time millisecondsToRun: [ | fs |
    fs := "Stdio"FileStream read: 'working.sml' text: false.
    [fs contents] ensure: [fs close]]

the FileStream version took ~85 ms. The StdioFileStream version alternated
between ~85, ~150 and ~250 ms. Obviously the library was doing some form of
caching, but not very efficiently.

Just shows that although it is more "featured" FileStream is no slower than
StdioFileStream...

Ian



Re: StdioFileStream and text files (long)

Ian Bartholomew-8
In reply to this post by Stefan Schmiedl
Stefan,

> how's the weather at your place?
> In front of my window cherry trees are fully blossoming :-)

Lovely. Sun shining on white cherry blossom on one side of the garden, pink
apple blossom on the other - best time of the year. Only downside is some
cloud appearing to hide my view of the planetary alignment - again.

> well ... it is much easier to find than FileStream :-)
> But finally I did and used the "lines" method. It's good that
> the files I have to parse don't get too big.

?. I guess you mean ...

aStream contents lines

which I don't suppose would be too efficient.

> I don't know about speed, I wanted the nextLine method
> to parse the file line by line ... been working a lot on
> linux lately ;>

FileStream understands #nextLine - implemented in the PositionableStream
superclass. The only caveat is that it needs a file with cr-lf line
endings, but that was what your file contained?
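
A minimal workspace sketch of reading line by line with FileStream, assuming the file uses cr-lf line endings. The path is only illustrative (pieced together from the directory and file name mentioned earlier in the thread), and the counter is there just to give the loop something to do.

```smalltalk
"Workspace sketch: walk a cr-lf text file one line at a time.
 The path below is an illustrative assumption."
| fs lineCount |
fs := FileStream read: 'c:\somepath\010007__.AN~' text: true.
lineCount := 0.
[[fs atEnd] whileFalse: [
    fs nextLine.                     "answers the next line as a String"
    lineCount := lineCount + 1]]
        ensure: [fs close].          "always release the file handle"
lineCount
```

Unlike `aStream contents lines`, this never holds the whole file in memory at once, which should matter only for files much larger than the ones discussed here.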

> I tried this, but a StdioFileStream stores the "mode" in an instance variable,
> and a binary file prohibits using nextLine ... strangely enough at the end
> of the method is a comment referring to both binary and text mode.

Hmm. As CRTLibrary appears to strip out cr characters by default (perhaps
there is a switch somewhere to change this?) then the code at the end of
StdioFileStream does seem a bit superfluous.  The only time it would be used
is if you open the file in binary mode and call #nextLine but, as you say,
the test at the start would then fail.  Not having used the class though I
can't be sure.

Ian



Re: StdioFileStream and text files (long)

Stefan Schmiedl
On Sun, 21 Apr 2002 19:29:09 +0100,
Ian Bartholomew <[hidden email]> wrote:
> Stefan,
>
>> how's the weather at your place?
>> In front of my window cherry trees are fully blossoming :-)
>
> Lovely. Sun shining on white cherry blossom on one side of the garden, pink
> apple blossom on the other - best time of the year. Only downside is some
> cloud appearing to hide my view of the planetary alignment - again.

besides the different placement of trees, it is exactly the same here :-)

>
>> well ... it is much easier to find than FileStream :-)
>> But finally I did and used the "lines" method. It's good that
>> the files I have to parse don't get too big.
>
> ?. I guess you mean ...
>
> aStream contents lines
>
> which I don't suppose would be too efficient.

exactly, but not important with the small files I use.

>
>> I don't know about speed, I wanted the nextLine method
>> to parse the file line by line ... been working a lot on
>> linux lately ;>
>
> FileStream understands #nextLine - implemented in the PositionableStream
> superclass. The only caveat is that it needs a file with cr-lf line
> endings, but that was what your file contained?

ok. I overlooked it. does anybody know a programming language
which actively reminds the programmer, when he's doing something
stupid?

>
>> I tried this, but a StdioFileStream stores the "mode" in an instance variable,
>> and a binary file prohibits using nextLine ... strangely enough at the end
>> of the method is a comment referring to both binary and text mode.
>
> Hmm. As CRTLibrary appears to strip out cr characters by default (perhaps
> there is a switch somewhere to change this?) then the code at the end of
> StdioFileStream does seem a bit superfluous.  The only time it would be used
> is if you open the file in binary mode and call #nextLine but, as you say,
> the test at the start would then fail.  Not having used the class though I
> can't be sure.
>

seems to be creeper. maybe some day someone will clean it up ...

s.



Re: StdioFileStream and text files (long)

Stefan Schmiedl
In reply to this post by Ian Bartholomew-8
On Sun, 21 Apr 2002 18:36:20 +0100,
Ian Bartholomew <[hidden email]> wrote:

>
>> and is quite efficient - although speed is the one area where the run time
>> library version might possibly have an advantage.
>
> Weird. I've just done a test to compare speeds. On a 10 meg file using the
> following
>
> Time millisecondsToRun: [ | fs |
>     fs := "Stdio"FileStream read: 'working.sml' text: false.
>     [fs contents] ensure: [fs close]]
>
> the FileStream version took ~85 ms. The StdioFileStream version alternated
> between ~85, ~150 and ~250 ms. Obviously the library was doing some form of
> caching, but not very efficiently.

I used my "default.img" (5.5 MB) for reading and got between 112 and 118 ms
on various runs. no large differences here.

>
> Just shows that although it is more "featured" FileStream is no slower than
> StdioFileStream...
>

makes me wonder, where the "final" implementation of the two streams is.

s.