Streams. Status and where to go?

Re: Streams. Status and where to go?

Igor Stasenko
On 28 February 2010 12:45, Nicolas Cellier
<[hidden email]> wrote:

> 2010/2/28 Igor Stasenko <[hidden email]>:
>> On 28 February 2010 12:00, Nicolas Cellier
>> <[hidden email]> wrote:
>>> 2010/2/28 Igor Stasenko <[hidden email]>:
>>>> Hi, I also did some hacking. I uploaded XTream-Wrappers-sig.1 into SqS/XTream.
>>>>
>>>> There is a basic XtreamWrapper class, which should work transparently
>>>> for any stream (hopefully ;).
>>>> Next, in a subclass I created a converter. Sure, I could also add a
>>>> buffered wrapper, but maybe later :)
>>>>
>>>> Here are some benchmarks. The file I used for testing is a UTF-8 Russian
>>>> text document (attached).
>>>>
>>>> | str |
>>>> str := (StandardFileStream readOnlyFileNamed: 'unitext.txt') readXtream.
>>>> {
>>>> [ str reset. (XtreamUTF8Converter on: str readXtream) upToEnd ] bench.
>>>> [ str reset. (UTF8Decoder new source: str readXtream) upToEnd ] bench.
>>>> }
>>>> #('21.71314741035857 per second.' '14.0371688414393 per second.')
>>>>  #('22.16896345116836 per second.' '14.5186953062848 per second.')
>>>>
>>>> Next, buffered
>>>>
>>>> | str |
>>>> str := (StandardFileStream readOnlyFileNamed: 'unitext.txt') readXtream.
>>>> {
>>>> [ str reset. (XtreamUTF8Converter on: str readXtream buffered) upToEnd ] bench.
>>>> [ str reset. (UTF8Decoder new source: str readXtream buffered) upToEnd ] bench.
>>>> }
>>>> #('58.52976428286057 per second.' '25.44225800039754 per second.')
>>>> #('58.90575079872205 per second.' '25.87064676616916 per second.')
>>>>
>>>>
>>>> I also tried double-buffering, but neither my class nor yours
>>>> currently works with it:
>>>>
>>>> | str |
>>>> str := (StandardFileStream readOnlyFileNamed: 'unitext.txt') readXtream.
>>>> {
>>>> [ str reset. (XtreamUTF8Converter on: str readXtream buffered)
>>>> buffered upToEnd ] bench.
>>>> [ str reset. (UTF8Decoder new source: str readXtream buffered)
>>>> buffered upToEnd ] bench.
>>>> }
>>>>
>>>> Please take a look. There are some quirks, which are not because I
>>>> cleaned up the decoding/encoding code.
>>>> See the XtreamWrapper>>upToEnd implementation.
>>>>
>>>>
>>>
>>> Yes, I published a bit too soon and messed up, because one temp in the text
>>> converter method (source) had the same name as a CharacterDecoder inst var
>>> :(
>>> Here is a second attempt:
>>>
>>> | str |
>>> str := (StandardFileStream readOnlyFileNamed: 'unitext.txt') readXtream binary.
>>> {
>>> [ str reset. (XtreamUTF8Converter on: str readXtream buffered)
>>> buffered upToEnd ] bench.
>>> [ str reset. (UTF8Decoder new source: str readXtream buffered)
>>> buffered upToEnd ] bench.
>>> }
>>> #('118.0347513481126 per second.' '31.38117129722167 per second.')
>>>
>>>
>>> As you can see, the optimistic ASCII version is pessimistic in the case of
>>> non-ASCII...
>>> It creates a composite stream and performs a lot of copies...
>>> This is known and is waiting for a better algorithm :)
>>>
>>
>> whoops.. you got more than 3x speedup, while mine was around 2x.
>> But please, try on ascii files.
>>
>>  | str |
>>  str := (String new: 1000 withAll: $a) asByteArray.
>>  {
>>  [ (XtreamUTF8Converter on: str readXtream binary)  upToEnd ] bench.
>>  [ (UTF8Decoder new source: str readXtream binary)  upToEnd ] bench.
>>  [ str readXtream binary upToEnd ] bench.
>>  }
>>  #('2039.392121575685 per second.' '1158.568286342731 per second.'
>> '92143.1713657269 per second.')
>>
>> So, conversion is roughly 45 to 80 times slower than just copying data :)
>> We need to narrow this gap.
>> One way would be to optimize #readInto:startingAt:count: using batch-mode
>> conversion.
>>
>
> Igor, you also have a problem:
>
> | str |
> str := (StandardFileStream readOnlyFileNamed: 'unitext.txt') readXtream binary.
> (XtreamUTF8Converter on: str readXtream) upToEnd = (StandardFileStream
> readOnlyFileNamed: 'unitext.txt') contents utf8ToSqueak
> -> false
>
> unless it's utf8ToSqueak and leadingChar stuff...
>
Yes, this is because of a wrong quirk in XtreamWrapper>>upToEnd.
I also tried replacing #next: with #nextAvailable:,
but it still does not work right: it returns 8192 characters (4 fully read
buffers) but misses the tail, when the buffer underflows on
reaching the end of the file.
After reverting this method to the same code as in ReadXtream it works
correctly, but will apparently be slower:

(XtreamUTF8Converter on: (StandardFileStream readOnlyFileNamed:
'unitext.txt') readXtream binary) contents size
-> 10153
(StandardFileStream readOnlyFileNamed: 'unitext.txt') contents
utf8ToSqueak size
-> 10153

Please see what can be done in this regard.
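
For readers following along, here is a minimal sketch (in Python, not taken from the thread) of one way a chunked upToEnd can lose the tail in the manner described above. It is not a claim about what XtreamWrapper>>upToEnd actually does. The function names and the 2048-byte chunk size are assumptions, chosen so that four full chunks give the 8192 figure reported above.

```python
import io


def up_to_end(source, chunk_size=2048):
    """Read everything left in `source`, one chunk at a time."""
    chunks = []
    while True:
        chunk = source.read(chunk_size)  # may return fewer than chunk_size bytes
        if not chunk:                    # an empty read means end of stream
            break
        chunks.append(chunk)             # a short chunk is still data: the tail
    return b"".join(chunks)


def up_to_end_dropping_tail(source, chunk_size=2048):
    """Hypothetical buggy variant: only completely filled chunks are kept."""
    chunks = []
    while True:
        chunk = source.read(chunk_size)
        if len(chunk) < chunk_size:      # a short read is treated as "no data"
            break
        chunks.append(chunk)
    return b"".join(chunks)


data = b"a" * 10153                      # same length as the text in the thread
print(len(up_to_end(io.BytesIO(data))))                # 10153
print(len(up_to_end_dropping_tail(io.BytesIO(data))))  # 8192
```

The point of the sketch is only that the final, partially filled buffer has to be appended before the loop stops on end of stream.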

P.S. I benchmarked it without my quirk... heh, it is not much slower:
 | str |
str := (StandardFileStream readOnlyFileNamed: 'unitext.txt') readXtream binary.
{
[ str reset. (XtreamUTF8Converter on: str readXtream buffered) upToEnd ] bench.
[ str reset. (UTF8Decoder new source: str readXtream buffered) upToEnd ] bench.
 }
 #('66.653362602275 per second.' '27.16197323746755 per second.')

--
Best regards,
Igor Stasenko AKA sig.

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Re: Streams. Status and where to go?

Nicolas Cellier
Anyway, the main contributor to UTF8Decoder's inefficiency is
Unicode>>value: and the leadingChar handling:

| str |
str := (StandardFileStream readOnlyFileNamed: 'unitext.txt') readXtream binary.
MessageTally spyOn:
[ (UTF8Decoder new source: str readXtream buffered)
buffered upToEnd ]


 - 133 tallies, 133 msec.

**Tree**
--------------------------------
Process: (40) 37486: nil
--------------------------------
99.2% {132ms} BufferedReadXtream>>checkAvailableDataInBuffer
  99.2% {132ms} BufferedReadXtream>>readNextBuffer
    99.2% {132ms} UTF8Decoder(CharacterDecoder)>>readInto:startingAt:count:
      50.4% {67ms} UTF8Decoder>>next
        |45.1% {60ms} Unicode class>>value:
        |  |30.1% {40ms} Locale>>languageEnvironment
        |  |  |29.3% {39ms} LanguageEnvironment class>>localeID:
        |  |  |  29.3% {39ms} Dictionary>>at:ifAbsent:
        |  |  |    27.1% {36ms} Dictionary>>scanFor:
        |  |  |      |15.0% {20ms} LocaleID>>=
        |  |  |      |  |6.0% {8ms} UndefinedObject(Object)>>=
        |  |  |      |  |6.0% {8ms} primitives
        |  |  |      |  |3.0% {4ms} ByteString(String)>>=
        |  |  |      |  |  2.3% {3ms} primitives
        |  |  |      |9.8% {13ms} LocaleID>>hash
        |  |  |      |  |6.8% {9ms} ByteString(String)>>hash
        |  |  |      |  |2.3% {3ms} UndefinedObject(Object)>>hash
        |  |  |      |  |  1.5% {2ms} primitives
        |  |  |      |2.3% {3ms} primitives
        |  |  |    2.3% {3ms} primitives
        |  |9.8% {13ms} Character class>>leadingChar:code:
        |  |  |5.3% {7ms} Character class>>value:
        |  |  |  |3.0% {4ms} primitives
        |  |  |  |2.3% {3ms} Character>>setValue:
        |  |  |4.5% {6ms} primitives
        |  |2.3% {3ms} primitives
        |  |1.5% {2ms} Latin1Environment(LanguageEnvironment)>>leadingChar
        |  |1.5% {2ms} Locale class>>currentPlatform
        |3.0% {4ms} CollectionReadXtream>>next
        |2.3% {3ms} primitives
      18.8% {25ms} UTF8Decoder(CharacterDecoder)>>readInto:startingAt:count:
        |9.8% {13ms} UTF8Decoder>>next
        |  |9.0% {12ms} Unicode class>>value:
        |  |  6.0% {8ms} Locale>>languageEnvironment
        |  |    |5.3% {7ms} LanguageEnvironment class>>localeID:
        |  |    |  5.3% {7ms} Dictionary>>at:ifAbsent:
        |  |    |    5.3% {7ms} Dictionary>>scanFor:
        |  |    |      3.0% {4ms} LocaleID>>=
        |  |    |        |1.5% {2ms} UndefinedObject(Object)>>=
        |  |    |        |1.5% {2ms} ByteString(String)>>=
        |  |    |        |  1.5% {2ms} ByteString(String)>>compare:with:collated:
        |  |    |      2.3% {3ms} LocaleID>>hash
        |  |    |        1.5% {2ms} UndefinedObject(Object)>>hash
        |  |  2.3% {3ms} Latin1Environment(LanguageEnvironment)>>leadingChar
        |5.3% {7ms} UTF8Decoder(CharacterDecoder)>>readInto:startingAt:count:
        |  |3.0% {4ms} UTF8Decoder>>next
        |  |  |3.0% {4ms} Unicode class>>value:
        |  |  |  2.3% {3ms} Locale>>languageEnvironment
        |  |  |    2.3% {3ms} LanguageEnvironment class>>localeID:
        |  |  |      2.3% {3ms} Dictionary>>at:ifAbsent:
        |  |  |        2.3% {3ms} Dictionary>>scanFor:
        |  |1.5% {2ms} UTF8Decoder(CharacterDecoder)>>readInto:startingAt:count:
        |2.3% {3ms} ByteArray(SequenceableCollection)>>readXtreamFrom:to:
        |  1.5% {2ms} CollectionReadXtream class>>read:from:to:
        |    1.5% {2ms} CollectionReadXtream>>read:from:to:
      13.5% {18ms} ByteArray(SequenceableCollection)>>readXtreamFrom:to:
        |10.5% {14ms} CollectionReadXtream class>>read:from:to:
        |  |6.8% {9ms} CollectionReadXtream>>read:from:to:
        |  |  |3.0% {4ms} primitives
        |  |  |2.3% {3ms} SmallInteger(Magnitude)>>max:
        |  |  |1.5% {2ms} SmallInteger(Magnitude)>>min:
        |  |2.3% {3ms} CollectionReadXtream class(Behavior)>>new
        |  |1.5% {2ms} primitives
        |3.0% {4ms} primitives
      3.8% {5ms} Unicode class>>value:
        |3.0% {4ms} primitives
      2.3% {3ms} CollectionReadXtream>>position
      2.3% {3ms} CollectionReadXtream(ReadXtream)>>endOfStreamAction:
      2.3% {3ms} primitives
      1.5% {2ms} ByteString>>at:put:
      1.5% {2ms} WideString class(Behavior)>>isBytes
      1.5% {2ms} UTF8Decoder(ReadXtream)>>source:
      1.5% {2ms} WideString>>at:put:

**Leaves**
7.5% {10ms} UndefinedObject(Object)>>=
6.8% {9ms} LocaleID>>=
6.8% {9ms} ByteString(String)>>hash
6.0% {8ms} Unicode class>>value:
4.5% {6ms} Character class>>leadingChar:code:
3.8% {5ms} CollectionReadXtream>>position
3.8% {5ms} CollectionReadXtream>>read:from:to:
3.8% {5ms} ByteArray(SequenceableCollection)>>readXtreamFrom:to:
3.8% {5ms} CollectionReadXtream>>next
3.8% {5ms} Latin1Environment(LanguageEnvironment)>>leadingChar
3.8% {5ms} Character class>>value:
3.0% {4ms} UndefinedObject(Object)>>hash
3.0% {4ms} WideString>>at:put:
3.0% {4ms} Dictionary>>scanFor:
3.0% {4ms} SmallInteger(Magnitude)>>max:
3.0% {4ms} Character>>setValue:
3.0% {4ms} UTF8Decoder(CharacterDecoder)>>readInto:startingAt:count:
2.3% {3ms} CollectionReadXtream(ReadXtream)>>endOfStreamAction:
2.3% {3ms} UndefinedObject(ProtoObject)>>scaledIdentityHash
2.3% {3ms} UTF8Decoder>>next
2.3% {3ms} ByteString(String)>>compare:with:collated:
2.3% {3ms} Dictionary>>at:ifAbsent:
2.3% {3ms} CollectionReadXtream class(Behavior)>>new
2.3% {3ms} ByteString(String)>>=
1.5% {2ms} Locale class>>currentPlatform
1.5% {2ms} LocaleID>>hash
1.5% {2ms} Locale>>languageEnvironment
1.5% {2ms} UTF8Decoder(ReadXtream)>>source:
1.5% {2ms} CollectionReadXtream class>>read:from:to:
1.5% {2ms} SmallInteger(Magnitude)>>min:

**Memory**
        old +0 bytes
        young -454,224 bytes
        used -454,224 bytes
        free +454,224 bytes

**GCs**
        full 0 totalling 0ms (0.0% uptime)
        incr 15 totalling 14ms (11.0% uptime), avg 1.0ms
        tenures 0
        root table 0 overflows



Re: Streams. Status and where to go?

Igor Stasenko
On 28 February 2010 14:33, Nicolas Cellier
<[hidden email]> wrote:
> 2010/2/28 Igor Stasenko <[hidden email]>:
>> On 28 February 2010 12:45, Nicolas Cellier
......
>
> Anyway, main contributor of UTF8Decoder inefficiency is
> Unicode>>value: and leadingChar handling:
>

Btw, why is it using #value: and not the more efficient #charFromUnicode: ?

value: code

        | l |
        code < 256 ifTrue: [^ Character value: code].
        l := Locale currentPlatform languageEnvironment leadingChar.
        l = 0 ifTrue: [l := 255].
        ^ Character leadingChar: l code: code.

For Unicode, leadingChar is always 255 (see Unicode>>leadingChar).

So I'm not sure what this Locale code is doing here.

Btw, the converter could use a 'leadingChar' ivar, initialized to
Locale currentPlatform languageEnvironment leadingChar (if that is correct),
or to Unicode>>leadingChar, which I feel is more correct.

And then just build the chars on its own, without using the Unicode global:
  Character leadingChar: leadingChar code: uniCode
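
The caching idea above can be sketched roughly as follows. This is Python rather than Smalltalk and is not the XTream code: LocaleRegistry, PerCharDecoder and CachingDecoder are hypothetical stand-ins, and a "character" is modelled as a plain (leadingChar, code) pair. The per-character variant mirrors the logic of the value: method quoted above. The caching variant does the lookup once, when the decoder is created.

```python
class LocaleRegistry:
    """Hypothetical stand-in for the Locale / LanguageEnvironment lookup."""

    def __init__(self):
        self.lookups = 0                 # counts how often the lookup runs
        self._table = {"en": 0}

    def leading_char(self, locale_id="en"):
        self.lookups += 1
        return self._table[locale_id]


class PerCharDecoder:
    """Asks the registry for every non-Latin-1 character, like value: does."""

    def __init__(self, registry):
        self.registry = registry

    def make_char(self, code):
        if code < 256:
            return (0, code)
        lead = self.registry.leading_char()
        if lead == 0:
            lead = 255
        return (lead, code)


class CachingDecoder:
    """Keeps the leading char in an instance variable, set once."""

    def __init__(self, registry):
        lead = registry.leading_char()
        self.leading_char = 255 if lead == 0 else lead

    def make_char(self, code):
        if code < 256:
            return (0, code)
        return (self.leading_char, code)


codes = [ord(c) for c in "Привет"]       # six Cyrillic code points, all >= 256

slow_registry = LocaleRegistry()
slow = [PerCharDecoder(slow_registry).make_char(c) for c in codes]

fast_registry = LocaleRegistry()
fast_decoder = CachingDecoder(fast_registry)
fast = [fast_decoder.make_char(c) for c in codes]

print(slow == fast)                                # True
print(slow_registry.lookups, fast_registry.lookups)  # 6 1
```

The trade-off is that a cached value would go stale if the platform locale changed while a decoder is alive; whether that matters here is not discussed in the thread.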



--
Best regards,
Igor Stasenko AKA sig.


Re: Streams. Status and where to go?

Nicolas Cellier
Yes, and as already suggested a few months ago by Andreas, and earlier by
me and some others, the Unicode leadingChar had better be 0.
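
To illustrate what a leadingChar of 0 would mean in practice, here is a small Python sketch. It assumes the leading char is stored in the bits above a 22-bit character code, which is how I recall Squeak images of that period packing it. Treat CODE_BITS and the helper name as assumptions, not as something stated in this thread.

```python
CODE_BITS = 22  # assumption: width of the character code field


def leading_char_code(leading_char, code):
    """Pack a leading char and a character code into one integer value."""
    if not 0 <= leading_char < 256:
        raise ValueError("leading char must fit in 8 bits")
    if not 0 <= code < (1 << CODE_BITS):
        raise ValueError("code must fit in the code field")
    return (leading_char << CODE_BITS) + code


cyrillic_a = 0x0410                               # U+0410

print(hex(leading_char_code(0, cyrillic_a)))      # 0x410
print(hex(leading_char_code(255, cyrillic_a)))    # 0x3fc00410
```

Under that assumption, a leading char of 0 makes the stored value equal to the plain Unicode code point, so a decoder would not need to consult the locale at all.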

Nicolas

>> 2.3% {3ms} CollectionReadXtream class(Behavior)>>new
>> 2.3% {3ms} ByteString(String)>>=
>> 1.5% {2ms} Locale class>>currentPlatform
>> 1.5% {2ms} LocaleID>>hash
>> 1.5% {2ms} Locale>>languageEnvironment
>> 1.5% {2ms} UTF8Decoder(ReadXtream)>>source:
>> 1.5% {2ms} CollectionReadXtream class>>read:from:to:
>> 1.5% {2ms} SmallInteger(Magnitude)>>min:
>>
>> **Memory**
>>        old                     +0 bytes
>>        young           -454,224 bytes
>>        used            -454,224 bytes
>>        free            +454,224 bytes
>>
>> **GCs**
>>        full                    0 totalling 0ms (0.0% uptime)
>>        incr            15 totalling 14ms (11.0% uptime), avg 1.0ms
>>        tenures         0
>>        root table      0 overflows
>>

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Re: Streams. Status and where to go?

Igor Stasenko
On 28 February 2010 15:44, Nicolas Cellier
<[hidden email]> wrote:

> 2010/2/28 Igor Stasenko <[hidden email]>:
>> On 28 February 2010 14:33, Nicolas Cellier
>> <[hidden email]> wrote:
>>> 2010/2/28 Igor Stasenko <[hidden email]>:
>>>> On 28 February 2010 12:45, Nicolas Cellier
>> ......
>>>
>>> Anyway, the main contributors to UTF8Decoder inefficiency are
>>> Unicode>>value: and the leadingChar handling:
>>>
>>
>> BTW, why is it using #value: and not the more efficient charFromUnicode: ?
>>
>> value: code
>>
>>        | l |
>>        code < 256 ifTrue: [^ Character value: code].
>>        l := Locale currentPlatform languageEnvironment leadingChar.
>>        l = 0 ifTrue: [l := 255].
>>        ^ Character leadingChar: l code: code.
>>
>> For Unicode, leadingChar is always 255 (see Unicode>>leadingChar).
>>
>> So I'm not sure what the Locale lookup is doing here.
>>
>> BTW, the converter could use a 'leadingChar' ivar, initialized either to
>> Locale currentPlatform languageEnvironment leadingChar (if that is correct)
>> or to Unicode>>leadingChar, which feels more correct to me.
>>
>> Then it can build chars on its own, without going through the Unicode global:
>>  Character leadingChar: leadingChar code: uniCode
>>
>>
>
> Yes, and as already suggested a few months ago by Andreas, and before that
> by me and some others, the Unicode leadingChar should preferably be 0
>
Big +1.
By default, the most straightforward path should be the one that is
universal in our world.
And Unicode can hardly be described as marginal :)
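
For the 'leadingChar' ivar idea quoted above, a minimal workspace sketch could look like this. It assumes an image of this era in which Unicode class>>leadingChar and Character class>>leadingChar:code: both exist; charFor is a made-up block name standing in for the converter's inner loop, not code from XTream-Wrappers-sig.1.

```smalltalk
"Workspace sketch under the assumptions stated above."
| leadingChar charFor |
"Look the leading char up once, instead of once per decoded character."
leadingChar := Unicode leadingChar.
"Hypothetical helper block: builds the Character directly, so no
 Locale / LanguageEnvironment dictionary lookup happens per character."
charFor := [:uniCode |
	uniCode < 256
		ifTrue: [Character value: uniCode]
		ifFalse: [Character leadingChar: leadingChar code: uniCode]].
charFor value: 16r0416.	"a Cyrillic code point, as in the test file"
```

In a real converter the block body would live in the decoding method and leadingChar would be set in an initializer, so the Locale lookups that dominate the profile would drop out of the per-character path.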




--
Best regards,
Igor Stasenko AKA sig.

Re: Streams. Status and where to go?

Nicolas Cellier
In reply to this post by Igor Stasenko
2010/2/28 Igor Stasenko <[hidden email]>:

> On 28 February 2010 12:00, Nicolas Cellier
> <[hidden email]> wrote:
>> 2010/2/28 Igor Stasenko <[hidden email]>:
>>> Hi, i'm also did some hacking. I uploaded XTream-Wrappers-sig.1 into SqS/XTream.
>>>
>>> There is a basic XtreamWrapper class, which should work transparently
>>> for any stream (hopefully ;).
>>> Next, in subclass i created converter. Sure thing i could also add a
>>> buffered wrapper, but maybe later :)
>>>
>>> Here some benchmarks. The file i used to test is utf-8 russian doc
>>> text - in attachment..
>>>
>>> | str |
>>> str := (StandardFileStream readOnlyFileNamed: 'unitext.txt') readXtream.
>>> {
>>> [ str reset. (XtreamUTF8Converter on: str readXtream) upToEnd ] bench.
>>> [ str reset. (UTF8Decoder new source: str readXtream) upToEnd ] bench.
>>> }
>>> #('21.71314741035857 per second.' '14.0371688414393 per second.')
>>>  #('22.16896345116836 per second.' '14.5186953062848 per second.')
>>>
>>> Next, buffered
>>>
>>> | str |
>>> str := (StandardFileStream readOnlyFileNamed: 'unitext.txt') readXtream.
>>> {
>>> [ str reset. (XtreamUTF8Converter on: str readXtream buffered) upToEnd ] bench.
>>> [ str reset. (UTF8Decoder new source: str readXtream buffered) upToEnd ] bench.
>>> }
>>> #('58.52976428286057 per second.' '25.44225800039754 per second.')
>>> #('58.90575079872205 per second.' '25.87064676616916 per second.')
>>>
>>>
>>> I'm also tried double-buffering, but neither my class nor yours
>>> currently works with it:
>>>
>>> | str |
>>> str := (StandardFileStream readOnlyFileNamed: 'unitext.txt') readXtream.
>>> {
>>> [ str reset. (XtreamUTF8Converter on: str readXtream buffered)
>>> buffered upToEnd ] bench.
>>> [ str reset. (UTF8Decoder new source: str readXtream buffered)
>>> buffered upToEnd ] bench.
>>> }
>>>
>>> Please , take a look. There are some quirks which not because i
>>> cleaned up decoding/encoding code.
>>> See XtreamWrapper>>upToEnd implementation.
>>>
>>>
>>
>> Yes I published a bit soon and messed up because one temp from text
>> converter method (source) had same name as CharacterDecoder inst var
>> :(
>> Find a second attempt:
>>
>> | str |
>> str := (StandardFileStream readOnlyFileNamed: 'unitext.txt') readXtream binary.
>> {
>> [ str reset. (XtreamUTF8Converter on: str readXtream buffered)
>> buffered upToEnd ] bench.
>> [ str reset. (UTF8Decoder new source: str readXtream buffered)
>> buffered upToEnd ] bench.
>> }
>> #('118.0347513481126 per second.' '31.38117129722167 per second.')
>>
>>
>> As you can see, the optimistic ASCII version is pessimistic in case of
>> non ASCII...
>> It creates a composite stream and perform a lot of copys...
>> This is known and waiting better algorithm :)
>>
>
> whoops.. you got more than 3x speedup, while mine was around 2x.
> But please, try on ascii files.
>
>  | str |
>  str := (String new: 1000 withAll: $a) asByteArray.
>  {
>  [ (XtreamUTF8Converter on: str readXtream binary)  upToEnd ] bench.
>  [ (UTF8Decoder new source: str readXtream binary)  upToEnd ] bench.
>  [ str readXtream binary upToEnd ] bench.
>  }
>  #('2039.392121575685 per second.' '1158.568286342731 per second.'
> '92143.1713657269 per second.')
>
> So conversion is roughly 45 to 80 times slower than just copying the data :)
> We need to close this gap.
> One way would be to optimize #readInto:startingAt:count: using batch-mode
> conversion.
>

It deserves to be buffered!
Here are the results with the later version:

 | str |
 str := (String new: 1000 withAll: $a) asByteArray.
 {
 [ (XtreamUTF8Converter on: str readXtream binary) buffered upToEnd ] bench.
 [ (UTF8Decoder new source: str readXtream binary) buffered upToEnd ] bench.
 [ str readXtream binary upToEnd ] bench.
 }
 #('3441.711657668466 per second.' '44901.4197160568 per second.'
'168149.7700459908 per second.')

Nicolas

