Greetings,

I have a problem with reading a text file. My (failing) test case opens a file (exported from some other app) and checks the file length.

    testOpenAngebot
        self assert: (OffaReader on: '010007') contents size = 177935.

The method being tested is:

    OffaReader class>>on: aString
        ^self new stream: (StdioFileStream read: (self buildPathTo: aString) type: #text)

The message OffaReader>>contents above just returns the StdioFileStream>>contents. buildPathTo packages the quoted number into its pathname:

    OffaReader class>>buildPathTo: aFilename
        | fixed found |
        fixed := (aFilename , '________' leftString: 8) , '.AN~'.
        found := File locateFilename: fixed in: (OrderedCollection with: 'c:\somepath\').
        fixed size = found size
            ifTrue: [ FileException signalOn: aFilename ]
            ifFalse: [ ^found ]

Now consider the exception: a Signal('End of stream'). I checked the implementation:

    StdioFileStream>>contents
        "Answer a <String> or <ByteArray> containing the complete contents of the file accessed by the receiver."
        ^self reset; next: self size

Huh? Reading the amount of bytes in the file causes 'End of stream'?

More info: StdioFileStream>>next:into:startingAt: has two interesting local variables:

    count = 177935
    read  = 177785
            --------
                 150

ok, let's count the lines:

    > wc -l Projekte/roth/Web66-Konvertierung/010007__.AN~
    150 Projekte/roth/Web66-Konvertierung/010007__.AN~

So it seems that we have the culprit.

What is really funny, is the following: Some months ago, I noticed that StdioFileStream class>>read:text: has some attributes switched:

    Original image:
        mode: (aBoolean ifTrue: ['rb'] ifFalse: ['rt']))
    My image:
        mode: (aBoolean ifTrue: ['rt'] ifFalse: ['rb']))

Since the strings get passed on to the C runtime library I figured that since aBoolean = true for text files, the switch was ok. One of the Dolphin trainers confirmed this, IIRC.

And now finally the Dolphin strikes back. Because, when I restore the original setting, my TestRunner shows green!

So should I use the "wrong" types for reading text files?
Should I meddle with the "read" counter above?
Should I avoid StdioFileStream? If so, how do I read files?

Could one of those running Dolphin 5 try this?

Thanks.
s.
--
Stefan Schmiedl
EDV-Beratung, Programmierung, Schulung
Loreleystr. 5, 94315 Straubing, Germany
Tel. (0 94 21) 74 01 06, Fax (0 94 21) 74 01 21
Public Key: http://xss.de/stefan.public
shhhh ... I can't hear my code!
Stefan,

> So should I use the "wrong" types for reading text files?
> Should I meddle with the "read" counter above?
> Should I avoid StdioFileStream? If so, how do I read files?

I've just done a little trial and the problem is that the CRTLibrary that StdioFileStream uses is converting all cr-lf pairs in an incoming text file into single lf characters before Dolphin gets to see them - hence the 150 difference. I guessed that might be the problem but I had to test the theory because I have never actually used this class, and was wondering why you did? Dolphin's FileStream class is as easy to use, has more facilities and is quite efficient - although speed is the one area where the run time library version might possibly have an advantage? If you look in the image you will see that FileStream is used _much_ more than StdioFileStream which is, usually, a sign that one class is preferred over another.

I haven't checked it thoroughly but it looks like you can just replace StdioFileStream with FileStream; anything the former can do is implemented in the latter?

I would also guess that opening the file in binary mode, which you do when you switch back to the incorrect version of StdioFileStream class>>read:text:, worked because the CRTLibrary function is notified that the file is opened in binary mode and this, presumably, inhibits the cr removal.

Ian
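A minimal workspace sketch of the swap suggested above. It assumes that FileStream class>>read:text: takes the same arguments as the StdioFileStream method and passes cr-lf pairs through untranslated; neither has been checked against the actual file. The path is hypothetical, and #occurrencesOf: is assumed to be available on String as the usual collection protocol.

```smalltalk
"Sketch only. The path is hypothetical; substitute whatever buildPathTo: answers."
| fs text |
fs := FileStream read: 'c:\somepath\010007__.AN~' text: true.
text := [fs contents] ensure: [fs close].

"If no cr-lf translation takes place, this should match the size on disk."
text size.

"If the translation theory holds, the number of cr characters should equal
 the shortfall seen with StdioFileStream: 177935 - 177785 = 150."
text occurrencesOf: Character cr
```

Evaluating the last two expressions separately (display it) gives a quick cross-check of the theory: one cr per line would account exactly for the missing 150 characters.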
Hi Ian,

how's the weather at your place?
In front of my window cherry trees are fully blossoming :-)

On Sun, 21 Apr 2002 18:02:58 +0100, Ian Bartholomew <[hidden email]> wrote:
> Stefan,
>
>> So should I use the "wrong" types for reading text files?
>> Should I meddle with the "read" counter above?
>> Should I avoid StdioFileStream? If so, how do I read files?
>
> I've just done a little trial and the problem is that the CRTLibrary that
> StdioFileStream uses is converting all cr-lf pairs in an incoming text file
> into single lf characters before Dolphin gets to see them - hence the 150
> difference.

I thought so.

> I guessed that might be the problem but I had to test the
> theory because I have never actually used this class, and was wondering why
> you did?

well ... it is much easier to find than FileStream :-)
But finally I did and used the "lines" method. It's good that
the files I have to parse don't get too big.

> Dolphin's FileStream class is as easy to use, has more facilities
> and is quite efficient - although speed is the one area where the run time
> library version might possibly have an advantage?. If you look in the image
> you will see that FileStream is used _much_ more than StdioFileStream which
> is, usually, a sign that one class is preferred over another.
>
> I haven't checked it thoroughly but it looks like you can just replace
> StdioFileStream with FileStream, anything the former can do is implemented
> in the latter?

I don't know about speed, I wanted the nextLine method
to parse the file line by line ... been working a lot on
linux lately ;>

>
> I would also guessed that opening the file in binary mode, which you do when
> you switch back to the incorrect version of StdioFileStream
> class>>read:text:, worked as the CRTLibrary function is notified that a file
> is opened in binary mode and this, presumably inhibits the cr removal.

I tried this, but a StdioFileStream stores the "mode" in an instance variable,
and a binary file prohibits using nextLine ... strangely enough at the end
of the method is a comment referring to both binary and text mode.

Thanks.
s.
--
Stefan Schmiedl
EDV-Beratung, Programmierung, Schulung
Loreleystr. 5, 94315 Straubing, Germany
Tel. (0 94 21) 74 01 06, Fax (0 94 21) 74 01 21
Public Key: http://xss.de/stefan.public
shhhh ... I can't hear my code!
In reply to this post by Ian Bartholomew-8
> and is quite efficient - although speed is the one area where the run time
> library version might possibly have an advantage?.

Weird. I've just done a test to compare speeds. On a 10 meg file using the following

    Time millisecondsToRun: [ | fs |
        fs := "Stdio"FileStream read: 'working.sml' text: false.
        [fs contents] ensure: [fs close]]

the FileStream version took ~85 ms. The StdioFileStream version alternated between ~85, ~150 and ~250 ms. Obviously the library was doing some form of caching, but not very efficiently.

Just shows that although it is more "featured", FileStream is no slower than StdioFileStream...

Ian
In reply to this post by Stefan Schmiedl
Stefan,

> how's the weather at your place?
> In front of my window cherry trees are fully blossoming :-)

Lovely. Sun shining on white cherry blossom on one side of the garden, pink apple blossom on the other - best time of the year. Only downside is some cloud appearing to hide my view of the planetary alignment - again.

> well ... it is much easier to find than FileStream :-)
> But finally I did and used the "lines" method. It's good that
> the files I have to parse don't get too big.

? I guess you mean ...

    aStream contents lines

which I don't suppose would be too efficient.

> I don't know about speed, I wanted the nextLine method
> to parse the file line by line ... been working a lot on
> linux lately ;>

FileStream understands #nextLine - implemented in the PositionableStream superclass. The only caveat is that it needs a file with cr-lf end of lines, but that was what your file contained?

> I tried this, but a StdioFileStream stores the "mode" in an instance variable,
> and a binary file prohibits using nextLine ... strangely enough at the end
> of the method is a comment referring to both binary and text mode.

Hmm. As CRTLibrary appears to strip out cr characters by default (perhaps there is a switch somewhere to change this?) then the code at the end of StdioFileStream>>nextLine does seem a bit superfluous. The only time it would be used is if you open the file in binary mode and call #nextLine but, as you say, the test at the start would then fail. Not having used the class though I can't be sure.

Ian
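A minimal workspace sketch of the line-by-line reading described above, assuming a text file with cr-lf line ends and that FileStream responds to #nextLine and #atEnd as inherited from PositionableStream. The path is hypothetical, and collecting the lines into an OrderedCollection is only for illustration.

```smalltalk
"Sketch only. The path is hypothetical."
| fs lines |
lines := OrderedCollection new.
fs := FileStream read: 'c:\somepath\010007__.AN~' text: true.

"Read one line at a time instead of splitting the whole contents afterwards."
[[fs atEnd] whileFalse: [lines add: fs nextLine]]
    ensure: [fs close].

"Number of lines read."
lines size
```

Compared with `aStream contents lines`, this never needs the whole file as one string, although for small files the difference is unlikely to matter.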
On Sun, 21 Apr 2002 19:29:09 +0100, Ian Bartholomew <[hidden email]> wrote:
> Stefan,
>
>> how's the weather at your place?
>> In front of my window cherry trees are fully blossoming :-)
>
> Lovely. Sun shining on white cherry blossom on one side of the garden, pink
> apple blossom on the other - best time of the year. Only downside is some
> cloud appearing to hide my view of the planetary alignment - again.

besides the different placement of trees, it is exactly the same here :-)

>
>> well ... it is much easier to find than FileStream :-)
>> But finally I did and used the "lines" method. It's good that
>> the files I have to parse don't get too big.
>
> ?. I guess you mean ...
>
> aStream contents lines
>
> which I don't suppose would be too efficient.

exactly, but not important with the small files I use.

>
>> I don't know about speed, I wanted the nextLine method
>> to parse the file line by line ... been working a lot on
>> linux lately ;>
>
> FileStream understands #nextLine - implemented in the PositionableStream
> superclass. The only caveat is that it needs a files with cr-lf end of
> lines, but that was what you file contained?

ok. I overlooked it. does anybody know a programming language
which actively reminds the programmer, when he's doing something
stupid?

>
>> I tried this, but a StdioFileStream stores the "mode" in an instance variable,
>> and a binary file prohibits using nextLine ... strangely enough at the end
>> of the method is a comment referring to both binary and text mode.
>
> Hmm. As CRTLibrary appears to strip out cr characters by default (perhaps
> there is a switch somewhere to change this?) then the code at the end of
> StdioFileStream does seem a bit superfluous. The only time it would be used
> is if you open the file in binary mode and call #nextLine but, as you say,
> the test at the start would then fail. Not having used the class though I
> can't be sure.
>

seems to be creeper. maybe some day someone will clean it up ...

s.
In reply to this post by Ian Bartholomew-8
On Sun, 21 Apr 2002 18:36:20 +0100, Ian Bartholomew <[hidden email]> wrote:
>
>> and is quite efficient - although speed is the one area where the run time
>> library version might possibly have an advantage?.
>
> Weird. I've just done a test to compare speeds. On a 10 meg file using the
> following
>
> Time millisecondsToRun: [ | fs |
>     fs := "Stdio"FileStream read: 'working.sml' text: false.
>     [fs contents] ensure: [fs close]]
>
> the FileStream version took ~85 mS. The StdioFileStream version alternated
> between ~85, ~150 and ~250 mS. Obviously the library was doing some form of
> caching, but not very efficiently.

I used my "default.img" (5.5 MB) for reading and got between
112 and 118 ms on various runs. no large differences here.

>
> Just shows that although it is more "featured" FileStream is no slower than
> StdioFileStream...
>

makes me wonder, where the "final" implementation of the two streams is.

s.