StandardFileStream rant continued [was Problem handling nextLine]

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

StandardFileStream rant continued [was Problem handling nextLine]

Nicolas Cellier
Beside,
look at what happens when we execute:

strm := StandardFileStream oldFileNamed: 'foo'.
[[stream atEnd] whileFalse: [strm upTo: Character cr]] ensure: [strm close].

We
- first read 2000 characters in the buffer,
- keep only a few ones (the first line),
- abandon the rest of the buffer to the garbageCollector,
- position the file back
- and restart reading 2000 chars at next call...
Prodigiously inefficient.
I did not check in VMMaker what real work take place under the carpet,
but the primitives looks enough like corresponding standard OS
functions.

Why not just
- reuse at image level the same buffer,
- avoid spurious garbage collection,
- keep a pointer on available data (already read in buffer),
- avoid moving the file pointer back and forth,
- and avoid reading the same file zone again and again ?
We have to thank our OS and hardware to buffer disk data for us and be
glad Squeak does not handle disk heads with such a policy !

Wanna gain a bit of speed ? Don't wait Cog, there is plenty of room here !

Nicolas

2009/11/17 Nicolas Cellier <[hidden email]>:

> I get it.
> I wanted to use #upToAnyOf: crlf.
> Unfortunately, this one would not tell me whether a cr or a lf was hit
> (or the end of stream).
> So I inlined the sole implementor in the hierarchy naively thinking it
> would just work...
> MY ERROR!
> not only is it sub optimal (slow), #upToAnyOf: is broken in
> StandardFileStream...
>
> Recipe:
> - create an inst var in abstract superclass,
> - use it in many methods,
> - define a subclass that do not use this inst var,
> - define only a subset of superclass messages in the subclass,
> congratulations you created at least as many bugs as the number of
> methods you failed to override.
>
> When I read all this code, I get the feeling that the whole stream
> hierarchy is broken...
> If you trust library consistency, you're in trouble.
> That is the next big core library deserving clean-up in trunk.
> I guess Pharo will have an easier task integrating Nile complete rewrite ;)
>
> OK, end of rant :)
>
> Nicolas
>
> 2009/11/17 Nicolas Cellier <[hidden email]>:
>> Hi Casimiro,
>> it would help if you would provide receiver class information.
>> Otherwise, I see I probably messed up the horrific Stream hierarchy !
>> I will revert this change until a better fix - mean more tests .
>>
>> Cheers
>>
>> 2009/11/17 Casimiro de Almeida Barreto <[hidden email]>:
>>> After today's update, the following code:
>>>
>>>    [ fHandle atEnd ] whileFalse: [
>>>        stringFromFile := fHandle nextLine.
>>>        draw := Draw newFromString: stringFromFile.
>>>        draw isNil
>>>            ifTrue: [
>>>                Transcript show: 'Failed to load draw.'; cr.
>>>                fHandle close.
>>>                ^nil ].
>>>        model draws addLast: draw ].
>>>
>>> Results in the following error:
>>>
>>> nextLine
>>>    "Answer next line (may be empty), or nil if at end.
>>>    Handle a zoo of line delimiters CR, LF, or CR-LF pair"
>>>
>>>    | newStream element crlf |
>>>    self atEnd ifTrue: [^nil].
>>>    crlf := CharacterSet crlf.
>>>    newStream := WriteStream on: (collection species new: 100).
>>>    [self atEnd ifTrue: [^newStream contents].
>>>    crlf includes: (element := self next)]
>>>        whileFalse: [newStream nextPut: element].
>>>    element = Character cr ifTrue: [self peekFor: Character lf]. "handle
>>> an eventual CR LF pair"
>>>    ^newStream contents
>>>
>>> =============================================================
>>>
>>> new: sizeRequested
>>>    "Answer an initialized instance of this class with the number of
>>> indexable
>>>    variables specified by the argument, sizeRequested."
>>>
>>>    ^ (self basicNew: sizeRequested) initialize
>>>
>>> =============================================================
>>>
>>> basicNew: sizeRequested
>>>    "Primitive. Answer an instance of this class with the number
>>>    of indexable variables specified by the argument, sizeRequested.
>>>    Fail if this class is not indexable or if the argument is not a
>>>    positive Integer, or if there is not enough memory available.
>>>    Essential. See Object documentation whatIsAPrimitive."
>>>
>>>    <primitive: 71>
>>>    self isVariable ifFalse:
>>>        [self error: self printString, ' cannot have variable sized
>>> instances'].
>>>    (sizeRequested isInteger and: [sizeRequested >= 0]) ifTrue:
>>>        ["arg okay; space must be low."
>>>        self environment signalLowSpace.
>>>        ^ self basicNew: sizeRequested  "retry if user proceeds"].
>>>    self primitiveFailed
>>>
>>> =============================================================
>>>
>>> error: aString
>>>    "Throw a generic Error exception."
>>>
>>>    ^Error new signal: aString
>>>
>>> Error: UndefinedObject cannot have variable sized instances
>>>
>>>
>>>
>>>
>>>
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: StandardFileStream rant continued [was Problem handling nextLine]

Igor Stasenko
2009/11/18 Nicolas Cellier <[hidden email]>:

> Beside,
> look at what happens when we execute:
>
> strm := StandardFileStream oldFileNamed: 'foo'.
> [[stream atEnd] whileFalse: [strm upTo: Character cr]] ensure: [strm close].
>
> We
> - first read 2000 characters in the buffer,
> - keep only a few ones (the first line),
> - abandon the rest of the buffer to the garbageCollector,
> - position the file back
> - and restart reading 2000 chars at next call...
> Prodigiously inefficient.
> I did not check in VMMaker what real work take place under the carpet,
> but the primitives looks enough like corresponding standard OS
> functions.
>
the call chain in #upTo: endig with primitive:
  count := self primRead: fileID into: aString
  startingAt: startIndex count: n.

So, really, reading 2000 bytes using primitive could be faster than
reading 1 byte at a time (as in CrLfFileStream>>upTo:),
unless a file stream having own read-ahead buffer which is used
internally for all sort of things, including in #next and #next: and
#position:

> Why not just
> - reuse at image level the same buffer,
> - avoid spurious garbage collection,
> - keep a pointer on available data (already read in buffer),
> - avoid moving the file pointer back and forth,
> - and avoid reading the same file zone again and again ?
> We have to thank our OS and hardware to buffer disk data for us and be
> glad Squeak does not handle disk heads with such a policy !
>
> Wanna gain a bit of speed ? Don't wait Cog, there is plenty of room here !
>
Indeed :)

> Nicolas
>
> 2009/11/17 Nicolas Cellier <[hidden email]>:
>> I get it.
>> I wanted to use #upToAnyOf: crlf.
>> Unfortunately, this one would not tell me whether a cr or a lf was hit
>> (or the end of stream).
>> So I inlined the sole implementor in the hierarchy naively thinking it
>> would just work...
>> MY ERROR!
>> not only is it sub optimal (slow), #upToAnyOf: is broken in
>> StandardFileStream...
>>
>> Recipe:
>> - create an inst var in abstract superclass,
>> - use it in many methods,
>> - define a subclass that do not use this inst var,
>> - define only a subset of superclass messages in the subclass,
>> congratulations you created at least as many bugs as the number of
>> methods you failed to override.
>>
>> When I read all this code, I get the feeling that the whole stream
>> hierarchy is broken...
>> If you trust library consistency, you're in trouble.
>> That is the next big core library deserving clean-up in trunk.
>> I guess Pharo will have an easier task integrating Nile complete rewrite ;)
>>
>> OK, end of rant :)
>>
>> Nicolas
>>
>> 2009/11/17 Nicolas Cellier <[hidden email]>:
>>> Hi Casimiro,
>>> it would help if you would provide receiver class information.
>>> Otherwise, I see I probably messed up the horrific Stream hierarchy !
>>> I will revert this change until a better fix - mean more tests .
>>>
>>> Cheers
>>>
>>> 2009/11/17 Casimiro de Almeida Barreto <[hidden email]>:
>>>> After today's update, the following code:
>>>>
>>>>    [ fHandle atEnd ] whileFalse: [
>>>>        stringFromFile := fHandle nextLine.
>>>>        draw := Draw newFromString: stringFromFile.
>>>>        draw isNil
>>>>            ifTrue: [
>>>>                Transcript show: 'Failed to load draw.'; cr.
>>>>                fHandle close.
>>>>                ^nil ].
>>>>        model draws addLast: draw ].
>>>>
>>>> Results in the following error:
>>>>
>>>> nextLine
>>>>    "Answer next line (may be empty), or nil if at end.
>>>>    Handle a zoo of line delimiters CR, LF, or CR-LF pair"
>>>>
>>>>    | newStream element crlf |
>>>>    self atEnd ifTrue: [^nil].
>>>>    crlf := CharacterSet crlf.
>>>>    newStream := WriteStream on: (collection species new: 100).
>>>>    [self atEnd ifTrue: [^newStream contents].
>>>>    crlf includes: (element := self next)]
>>>>        whileFalse: [newStream nextPut: element].
>>>>    element = Character cr ifTrue: [self peekFor: Character lf]. "handle
>>>> an eventual CR LF pair"
>>>>    ^newStream contents
>>>>
>>>> =============================================================
>>>>
>>>> new: sizeRequested
>>>>    "Answer an initialized instance of this class with the number of
>>>> indexable
>>>>    variables specified by the argument, sizeRequested."
>>>>
>>>>    ^ (self basicNew: sizeRequested) initialize
>>>>
>>>> =============================================================
>>>>
>>>> basicNew: sizeRequested
>>>>    "Primitive. Answer an instance of this class with the number
>>>>    of indexable variables specified by the argument, sizeRequested.
>>>>    Fail if this class is not indexable or if the argument is not a
>>>>    positive Integer, or if there is not enough memory available.
>>>>    Essential. See Object documentation whatIsAPrimitive."
>>>>
>>>>    <primitive: 71>
>>>>    self isVariable ifFalse:
>>>>        [self error: self printString, ' cannot have variable sized
>>>> instances'].
>>>>    (sizeRequested isInteger and: [sizeRequested >= 0]) ifTrue:
>>>>        ["arg okay; space must be low."
>>>>        self environment signalLowSpace.
>>>>        ^ self basicNew: sizeRequested  "retry if user proceeds"].
>>>>    self primitiveFailed
>>>>
>>>> =============================================================
>>>>
>>>> error: aString
>>>>    "Throw a generic Error exception."
>>>>
>>>>    ^Error new signal: aString
>>>>
>>>> Error: UndefinedObject cannot have variable sized instances
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
>



--
Best regards,
Igor Stasenko AKA sig.