EndOfStream unused

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
25 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: EndOfStream unused

Paolo Bonzini-2

> Why would it kill performance?  The block is created at the first
> call, and thereafter a reference to it is past, no?  That sounds
> pretty fast to me.

If you run this:

     [ self next == nil ] whileTrue: [ ... ]

and Stream>>next is this

     ^self nextIfAtEnd: [ ^ nil ]

every time you invoke #next, it you create a new
BlockContext/BlockClosure, and that makes it impossible to exploit LIFO
behavior for thisContext.  It kills performance.

Paolo

Reply | Threaded
Open this post in threaded view
|

Re: EndOfStream unused

Nicolas Cellier-3
To close the nextIfAtEnd: alternative see micro benchmarks (source and
results) at http://bugs.squeak.org/view.php?id=6755.

Paolo Bonzini a écrit :

>
>> Why would it kill performance?  The block is created at the first
>> call, and thereafter a reference to it is past, no?  That sounds
>> pretty fast to me.
>
> If you run this:
>
>     [ self next == nil ] whileTrue: [ ... ]
>
> and Stream>>next is this
>
>     ^self nextIfAtEnd: [ ^ nil ]
>
> every time you invoke #next, it you create a new
> BlockContext/BlockClosure, and that makes it impossible to exploit LIFO
> behavior for thisContext.  It kills performance.
>

Yeah, implementing default next like this is definitely a no go

> Paolo
>
>


Reply | Threaded
Open this post in threaded view
|

EndOfStream performance cost [was EndOfStream unused]

Nicolas Cellier-3
In reply to this post by Andreas.Raab
Hi Andreas

To answer these important questions:
- what performance drop/gain can we expect (trade off)?
- should we have to rewrite anything apart core #next methods?

you can see I completed your tests at
http://bugs.squeak.org/view.php?id=6755

Here I take rationale again with a similar more systematic approach.

PROCEDURE:
=========

Analyzing residual costs with Micro bench should tell us the price:
1) == nil test costs = p (like pastEndTest)
2) atEnd test costs = a
3) exception handling cost = e when not handled and = h when handled

4) stream size is n

We get three main ways to write loops:
5) (P) a pastEnd loop costs (residual only)
        n*p   before,
        n*p+e after EndOfStream change.
6) (A) a atEnd test loop costs
        n*a before,
        n*a after.
7) (E) an exception loop
        is impossible before,
        costs h after.

RESULTS:
=======

On my machine:

[200000 timesRepeat: []] timeToRun. 148
[200000 timesRepeat: [$a == nil ifFalse: [nil]]] timeToRun. 167

[5000 timesRepeat: []] timeToRun. 3
[5000 timesRepeat: [EndOfStream signal]] timeToRun. 159

[5000 timesRepeat: []] timeToRun. 3
[5000 timesRepeat: [[EndOfStream signal] on: EndOfStream do: [:exc | exc
return]]] timeToRun. 209

s := ReadStream on: (1 to: 1000) asArray.
[200000 timesRepeat: []] timeToRun. 148
[200000 timesRepeat: [s atEnd ifFalse: [nil]]] timeToRun.  223

e=0.03 h=0.04 a=0.00037 p=0.0001


IS IT WORTH REWRITING (A) loops ?
---------------------------------

Obviously, no hurry...
Since no exception is raised, cost is unchanged
It would be worth only if average n > h/a.
        (ON MY MACHINE n>120)

WHAT THE CHANGE COST ON (P) loops?
---------------------------------

- Introducing the change costs a penalty e on each pastEnd loop (P)
- This is neglectable as soon as n >> e/p
- But does de-optimize small pastEnd loops as noticed by Andreas.

IS IT WORTH REWRITING (P) loops ?
---------------------------------

- rewrite (P) in (E) for small pastEnd loops is no use
   because n*p + e and h are same for small n
- rewrite (P) in (A) would be viable if n*p+e > n*a, n < e/(a-p)
        (ON MY MACHINE n<110)
- rewrite (P) in (E) would be worth for n > h/p.
        (ON MY MACHINE n>400)
   But no hurry, we have seen that penalty is small for such loops.

CONCLUSIONS
-----------

Introducing the change potentially cost a penalty on pastEnd short
stream tight loops.
If necessary, we might re-optimize a little most critical loops with a
(P)->(A) rewrite, but without reaching previous performance.

Suggested by Paolo, my guess however is that:
- Most pastEnd loops are written for files. files are long and
performance drop would be neglectable. A rewrite would benefit but no hurry.
- Most String processing loops use atEnd loop (upper level ReadStream
does not use == nil trick, client code might however uses few).

POSSIBLE PLAN to confirm: trap EndOfStream to trace usage of pastEnd
loops and run maximum activity in squeak...

Introducing this change does not seem worth per se, except maybe for
huge files.
However - TO BE CONFIRMED, it might as well be neutral toward
performances with current VMs.

It would be worth in this case for enabling Stream extensions having
costly atEnd tests and no possible == nil trick.

Apart core #next, no rewrite of basic method is necessary.
No hurry to do massive rewrite of code anyway.

Cheers

Nicolas


Reply | Threaded
Open this post in threaded view
|

Re: EndOfStream performance cost [was EndOfStream unused]

Paolo Bonzini-2
> - Most String processing loops use atEnd loop (upper level ReadStream
> does not use == nil trick, client code might however uses few).

I found only one that doesn't, in InflateStream>>#atEnd, using this code:

(CompiledMethod allInstances
   select: [ :m | | s |
      s := m getSource asString.
      ('*next*' match: s ) and: [
         ('*next == nil*' match: s )  or: [
         ('*next isNil'  match: s ) or: [
         '*next notNil' match: s ]]]])

Here is the source code:

     atEnd
        "Note: It is possible that we have a few bits left,
        representing just the EOB marker. To check for
        this we must force decompression of the next
        block if at end of data."
        super atEnd ifFalse:[^false]. "Primitive test"
        (position >= readLimit and:[state = StateNoMoreData])
            ifTrue:[^true].
        self next == nil ifTrue:[^true].
        position := position - 1.
        ^false

and in this same class, we have:

     pastEndRead
        state = StateNoMoreData ifTrue:[^nil].
        ...
        ^self next

     next
        <primitive: 65>
        position >= readLimit
                ifTrue: [^self pastEndRead]
                ifFalse: [^collection at: (position := position + 1)]

So, the "self next == nil" could be written "self pastEndRead == nil".
Or even better, one could simply rewrite "atEnd and next" like this:

     atEnd
        position < readLimit ifFalse: [^false].
        "for speed, we inline pastEndRead with these changes:"
        state = StateNoMoreData ifTrue:[^true].
        ...
        ^false

     next
        <primitive: 65>
        "position >= readLimit test inlined for speed"
        (position >= readLimit and: [self atEnd]) ifTrue: [ ^nil ].
        ^collection at: (position := position + 1)

Actually, this implementation of #next could be moved up to ReadStream,
so that no override is necessary in InflateStream.  So, it looks like
the only explicit usage (in my image) of "self next == nil" ought to be
eliminated anyway.

In some cases, the "self next == nil" test is used implicitly, for example:

Number>>readExponent:base:
        ('edq' includes: aStream next) ifFalse: [^ nil].
        ...

This code however does not seem to be written with speed in mind.

Paolo

Reply | Threaded
Open this post in threaded view
|

Re: EndOfStream performance cost [was EndOfStream unused]

Nicolas Cellier-3
Yes as expected most are FileStream kind
However, using:

   SystemNavigation new browseAllCallsOn: #next and: #==

I found some other cases that might read pastEnd like for example:
- Number class>>#readExponent:base:from: might read pastEnd
- PositionableStream>>#next:into:startingAt: which has some senders...
- etc...

So I'm now trying to instrument with a tally...
It's hacky, but seems to work:

tally := MessageTally new.
tally spyEvery: 100 on: ['em ezilaitini ot si siht' reverse].
tally class: World class method: #doOneCycle.
    "middle lines begin"
tallyEnd := false.
[(Delay forSeconds: 300) wait. tallyEnd := true] fork.
[[World doOneCycle.  Processor yield.tallyEnd] whileFalse]
                on: EndOfStream
                do: [:exc | tally tally: exc signalerContext by: 1.
                        exc resume].
(StringHolder new contents:
                (String streamContents: [:s | tally report: s]))
        openLabel: 'EndOfStream Spy Results'.
    "middle lines end"
tally close.

If you do not execute tally close, and execute middle lines again, I
think results can be cumulated.

I effectively find Number readFrom: used from Compiler. But got only 40
tallies from ReadStream in 5 minutes...
Gasp, did not find the gold mine of pastEnd...

If someone wants to try to help, load ConnectEndOfStream-M6755-nice.1.cs
(http://bugs.squeak.org/file_download.php?file_id=3064&type=bug) from
http://bugs.squeak.org/view.php?id=6755 and execute above hack in a
workspace, then play your favourite activities...

In case of severe slow down please post results to me or the list.
Thanks

Nicolas

Paolo Bonzini a écrit :

>> - Most String processing loops use atEnd loop (upper level ReadStream
>> does not use == nil trick, client code might however uses few).
>
> I found only one that doesn't, in InflateStream>>#atEnd, using this code:
>
> (CompiledMethod allInstances
>   select: [ :m | | s |
>      s := m getSource asString.
>      ('*next*' match: s ) and: [
>         ('*next == nil*' match: s )  or: [
>         ('*next isNil'  match: s ) or: [
>         '*next notNil' match: s ]]]])
>
> Here is the source code:
>
>     atEnd
>     "Note: It is possible that we have a few bits left,
>     representing just the EOB marker. To check for
>     this we must force decompression of the next
>     block if at end of data."
>     super atEnd ifFalse:[^false]. "Primitive test"
>     (position >= readLimit and:[state = StateNoMoreData])
>         ifTrue:[^true].
>     self next == nil ifTrue:[^true].
>     position := position - 1.
>     ^false
>
> and in this same class, we have:
>
>     pastEndRead
>     state = StateNoMoreData ifTrue:[^nil].
>     ...
>     ^self next
>
>     next
>     <primitive: 65>
>     position >= readLimit
>         ifTrue: [^self pastEndRead]
>         ifFalse: [^collection at: (position := position + 1)]
>
> So, the "self next == nil" could be written "self pastEndRead == nil".
> Or even better, one could simply rewrite "atEnd and next" like this:
>
>     atEnd
>     position < readLimit ifFalse: [^false].
>     "for speed, we inline pastEndRead with these changes:"
>     state = StateNoMoreData ifTrue:[^true].
>     ...
>     ^false
>
>     next
>     <primitive: 65>
>     "position >= readLimit test inlined for speed"
>     (position >= readLimit and: [self atEnd]) ifTrue: [ ^nil ].
>     ^collection at: (position := position + 1)
>
> Actually, this implementation of #next could be moved up to ReadStream,
> so that no override is necessary in InflateStream.  So, it looks like
> the only explicit usage (in my image) of "self next == nil" ought to be
> eliminated anyway.
>
> In some cases, the "self next == nil" test is used implicitly, for example:
>
> Number>>readExponent:base:
>     ('edq' includes: aStream next) ifFalse: [^ nil].
>     ...
>
> This code however does not seem to be written with speed in mind.
>
> Paolo
>
>


12