news from the Xtream front

Nicolas Cellier

news from the Xtream front

To give a concrete view of what improvement we might further get beyond
the excellent changes from Levente, I just tried this in the latest trunk,
with the latest Xtream version:

{
"Trunk: MultiByteFileStream, decoding one character at a time"
[| tmp |
tmp := (MultiByteFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
	ascii;
	wantsLineEndConversion: false;
	converter: UTF8TextConverter new.
1 to: 10000 do: [:i | tmp upTo: Character cr].
tmp close] timeToRun.
"Xtream: buffered file access, UTF-8 decoding, then a second buffer"
[| tmp |
tmp := ((StandardFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
	readXtream ascii buffered
	decodeWith: (UTF8TextConverter new installLineEndConvention: nil))
	buffered.
1 to: 10000 do: [:i | tmp upTo: Character cr].
tmp close] timeToRun.
}

#(1395 84)

The first (1395 ms) is the recently optimized trunk version. Unfortunately,
with MultiByteFileStream at work, you get a looong one-by-one decoding.
The second (84 ms) is the Xtream version with carefully placed #buffered sends.
Hardly believable what you can do with a utf8ToSqueak-like hack and a buffer...

Of course, this version is optimized only for the case of ASCII source
encoded in UTF-8 (the easy case, but the most common one for
source files).
I don't know what happens when encountering a multi-byte UTF-8 char...
... all I know is that performance in this case is likely a disaster
(my code is a bit stupid, but it's too late to correct it now).
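
A minimal workspace sketch of the ASCII fast path described above. This is not the actual Xtream code: the block name decode and the sample strings are made up for illustration, and it only assumes the usual String>>utf8ToSqueak and String>>squeakToUtf8 conversions.

```smalltalk
"UTF-8 bytes below 128 are the same characters in Squeak, so a chunk
 that contains only such bytes needs no decoding at all."
| asciiChunk mixedChunk decode |
decode := [:bytes |
	(bytes allSatisfy: [:b | b < 128])
		ifTrue: [bytes asString]                  "fast path: no conversion"
		ifFalse: [bytes asString utf8ToSqueak]].  "slow path: full UTF-8 decoding"
asciiChunk := 'Object subclass: #Foo' asByteArray.
mixedChunk := 'changé' squeakToUtf8 asByteArray.
decode value: asciiChunk.   "-> 'Object subclass: #Foo'"
decode value: mixedChunk.   "-> 'changé'"
```

The check costs one pass over the buffer, which is why it pays off when it is applied to a whole buffer at once rather than to each character.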

Oh, maybe Levente will just port the idea tomorrow in trunk, so I can
have a bit more rest ;)

Cheers

Nicolas

Levente Uzonyi-2

Re: news from the Xtream front

On Tue, 8 Dec 2009, Nicolas Cellier wrote:

> To give a concrete view of what improvement we might further get beyond
> excellent changes from Levente, I just tried this in latest trunk,
> with latest Xtream version:
>
> {
> [| tmp | tmp := (MultiByteFileStream readOnlyFileNamed: (SourceFiles
> at: 2) name) ascii; wantsLineEndConversion: false; converter:
> UTF8TextConverter new.
> 1 to: 10000 do: [:i | tmp upTo: Character cr]. tmp close] timeToRun.
> [| tmp | tmp := ((StandardFileStream readOnlyFileNamed: (SourceFiles
> at: 2) name) readXtream ascii buffered decodeWith: (UTF8TextConverter
> new installLineEndConvention: nil)) buffered.
> 1 to: 10000 do: [:i | tmp upTo: Character cr]. tmp close] timeToRun.
> }
>
> #(1395 84)
>

Really cool. :)

> The first is the recently optimized trunk version. Unfortunately,
> MultiByteFileStream at work, you get a looong one by one decoding
> The second is the Xtream version with crafted #buffered sends.
> Hardly believable what you can do with a utf8ToSqueak-like hack and a buffer...
>
> Of course, this version is optimized only in case of ASCII source
> encoded in UTF8 (the easy case, but the most common case concerning
> source files).

Don't forget that the sources are sometimes read backwards by the current
code.

> I don't know what happens when encountering a multi-byte utf-8 char...
> ... all I know is that performance in this case is likely a disaster
> (my code is a bit stupid, but it's too late to correct it now)
>

It can still be much better than the current approach.

> Oh, maybe Levente will just port the idea tomorrow in trunk, so I can
> have a bit more rest ;)
>

Well, maybe, I'm working on other hacks, but I'll take a look, I'm
starting to like the idea. ;)

Levente


Andreas.Raab

Re: news from the Xtream front

In reply to this post by Nicolas Cellier

Nicolas Cellier wrote:
> I don't know what happens when encountering a multi-byte utf-8 char...

Easy to test, just grab your favorite non-English book, for example:

HTTPSocket httpGet: 'http://www.gutenberg.org/dirs/etext04/820kc10.txt'.

That should be plenty for a realistic test.

Cheers,
- Andreas

Nicolas Cellier

Re: news from the Xtream front

In reply to this post by Levente Uzonyi-2

2009/12/8 Levente Uzonyi <[hidden email]>:

> On Tue, 8 Dec 2009, Nicolas Cellier wrote:
>
>> To give a concrete view of what improvement we might further get beyond
>> excellent changes from Levente, I just tried this in latest trunk,
>> with latest Xtream version:
>>
>> {
>> [| tmp | tmp := (MultiByteFileStream readOnlyFileNamed: (SourceFiles
>> at: 2) name) ascii; wantsLineEndConversion: false; converter:
>> UTF8TextConverter new.
>> 1 to: 10000 do: [:i | tmp upTo: Character cr]. tmp close] timeToRun.
>> [| tmp | tmp := ((StandardFileStream readOnlyFileNamed: (SourceFiles
>> at: 2) name) readXtream ascii buffered decodeWith: (UTF8TextConverter
>> new installLineEndConvention: nil)) buffered.
>> 1 to: 10000 do: [:i | tmp upTo: Character cr]. tmp close] timeToRun.
>> }
>>
>> #(1395 84)
>>
>
> Really cool. :)
>
>> The first is the recently optimized trunk version. Unfortunately,
>> MultiByteFileStream at work, you get a looong one by one decoding
>> The second is the Xtream version with crafted #buffered sends.
>> Hardly believable what you can do with a utf8ToSqueak-like hack and a
>> buffer...
>>
>> Of course, this version is optimized only in case of ASCII source
>> encoded in UTF8 (the easy case, but the most common case concerning
>> source files).
>
> Don't forget that the sources are sometimes read backwards by the current
> code.
>

Oh yes, like this ?

| file |
[file := MultiByteFileStream newFileNamed: 'mbfs_skip.tst'.
file ascii; wantsLineEndConversion: false; converter: UTF8TextConverter new.
file nextPutAll: 'Ceci doit changé'.
file skip: -1. "Oops - grammatically incorrect"
file nextPutAll: 'er'.
file close.

file := StandardFileStream oldFileNamed: 'mbfs_skip.tst'.
file ascii.
file contentsOfEntireFile.]
ensure: [file close.
FileDirectory default deleteFileNamed: 'mbfs_skip.tst'].
-> 'Ceci doit changÃer' "Oops squeakly incorrect"

Ah ah, MultiByteFileStream lets us see a stream of encoded characters,
but positions over a stream of bytes...
The programmer's only option is to record marks (by asking aMBFS
position) and restore the position using these marks...
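
The byte-level effect of that skip: -1 can be reproduced in memory, without any file. A small sketch, assuming only String>>squeakToUtf8 and ByteArray concatenation; the variable names are illustrative.

```smalltalk
| utf8 patched |
utf8 := 'Ceci doit changé' squeakToUtf8 asByteArray.
"é is encoded as two bytes (16rC3 16rA9), so 16 characters become 17 bytes."
utf8 size.   "-> 17"
"Stepping back one byte, not one character, drops only the trailing 16rA9."
patched := (utf8 copyFrom: 1 to: utf8 size - 1) , 'er' asByteArray.
"Read back byte by byte, the orphaned 16rC3 shows up as Ã."
patched asString.   "-> 'Ceci doit changÃer'"
```

Recording aMBFS position before the write and restoring it afterwards avoids the problem, because a position obtained that way always falls on a character boundary.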

>> I don't know what happens when encountering a multi-byte utf-8 char...
>> ... all I know is that performance in this case is likely a disaster
>> (my code is a bit stupid, but it's too late to correct it now)
>>
>
> It can still be much better than the current approach.
>

Yes it could

>> Oh, maybe Levente will just port the idea tomorrow in trunk, so I can
>> have a bit more rest ;)
>>
>
> Well, maybe, I'm working on other hacks, but I'll take a look, I'm starting
> to like the idea. ;)
>

Making something simple out of the current MultiByteFileStream mess is a
challenge I don't even want to take on, but you seem a bit tougher than
me.

Cheers

Nicolas


Nicolas Cellier

Re: Re: news from the Xtream front

In reply to this post by Andreas.Raab

2009/12/8 Andreas Raab <[hidden email]>:

> Nicolas Cellier wrote:
>>
>> I don't know what happens when encountering a multi-byte utf-8 char...
>
> Easy to test, just grab your favorite non-English book, for example:
>
> HTTPSocket httpGet: 'http://www.gutenberg.org/dirs/etext04/820kc10.txt'.
>
> That should be plenty for a realistic test.
>
> Cheers,
> - Andreas
>
>

Oh, sure, good choice, Jules Verne could have been my neighbour in
Nantes (if I had a long white beard instead of just a few gray hairs).

Nicolas

Igor Stasenko

Re: news from the Xtream front

In reply to this post by Nicolas Cellier

2009/12/8 Nicolas Cellier <[hidden email]>:
> To give a concrete view of what improvement we might further get beyond

>
> #(1395 84)
>
Unbelievable. You must be cheating! :)

--
Best regards,
Igor Stasenko AKA sig.

Nicolas Cellier

Re: news from the Xtream front

2009/12/8 Igor Stasenko <[hidden email]>:
> 2009/12/8 Nicolas Cellier <[hidden email]>:
>> To give a concrete view of what improvement we might further get beyond
>
>>
>> #(1395 84)
>>
> Unbelievable. You must be cheating! :)
>
>

Sure, since we use UTF-8 encoding, but mostly put ASCII characters in
source files, no conversion is needed at all...
The cheat is just to detect that case, that's the #utf8ToSqueak hack.

Nicolas


Levente Uzonyi-2

Re: news from the Xtream front

In reply to this post by Nicolas Cellier

On Tue, 8 Dec 2009, Nicolas Cellier wrote:

>
> Oh yes, like this ?
>
> | file |
> [file := MultiByteFileStream newFileNamed: 'mbfs_skip.tst'.
> file ascii; wantsLineEndConversion: false; converter: UTF8TextConverter new.
> file nextPutAll: 'Ceci doit changé'.
> file skip: -1. "Oops - grammatically incorrect"
> file nextPutAll: 'er'.
> file close.
>
> file := StandardFileStream oldFileNamed: 'mbfs_skip.tst'.
> file ascii.
> file contentsOfEntireFile.]
> ensure: [file close.
> FileDirectory default deleteFileNamed: 'mbfs_skip.tst'].
> -> 'Ceci doit changÃer' "Oops squeakly incorrect"
>
> Ah Ah, MultiByteFileStream let us see a stream of encoded characters,
> but position over a stream of bytes...
> The only programmer choice is to put marks (by inquiring aMBFS
> position) and restore position using these marks...
>

Well, this part is broken, but the current fileIn/fileOut code relies on
this bug/"feature", otherwise it would be easy to fix it in the utf8 case.
Actually I was thinking about CompiledMethod >> #getPreambleFrom:at: or
even worse PositionableStream >> #backChunk.

> Making something simple out of current MultiByteFileStream mess is a
> challenge I don't even want to take, but you seem a bit tougher than
> me.
>

I think the current performance of MultiByteFileStream is acceptable for
general use. According to my measurements the greatest bottleneck is
WriteStream >> #nextPut: for typical operations.

Levente


Gary Chambers-4

Re: news from the Xtream front

For some of our stuff we've had to switch to StandardFileStream (given ASCII
encoding) to get a 10x performance improvement. Would be nice to not have to
do so!

Regards, Gary

----- Original Message -----
From: "Levente Uzonyi" <[hidden email]>
To: "The general-purpose Squeak developers list"
<[hidden email]>
Sent: Tuesday, December 08, 2009 2:12 PM
Subject: Re: [squeak-dev] news from the Xtream front

On Tue, 8 Dec 2009, Nicolas Cellier wrote:

>
> Oh yes, like this ?
>
> | file |
> [file := MultiByteFileStream newFileNamed: 'mbfs_skip.tst'.
> file ascii; wantsLineEndConversion: false; converter: UTF8TextConverter
> new.
> file nextPutAll: 'Ceci doit changé'.
> file skip: -1. "Oops - grammatically incorrect"
> file nextPutAll: 'er'.
> file close.
>
> file := StandardFileStream oldFileNamed: 'mbfs_skip.tst'.
> file ascii.
> file contentsOfEntireFile.]
> ensure: [file close.
> FileDirectory default deleteFileNamed: 'mbfs_skip.tst'].
> -> 'Ceci doit changÃer' "Oops squeakly incorrect"
>
> Ah Ah, MultiByteFileStream let us see a stream of encoded characters,
> but position over a stream of bytes...
> The only programmer choice is to put marks (by inquiring aMBFS
> position) and restore position using these marks...
>


Nicolas Cellier

Re: news from the Xtream front

In reply to this post by Levente Uzonyi-2

2009/12/8 Levente Uzonyi <[hidden email]>:

> On Tue, 8 Dec 2009, Nicolas Cellier wrote:
>
>>
>> Oh yes, like this ?
>>
>> | file |
>> [file := MultiByteFileStream newFileNamed: 'mbfs_skip.tst'.
>> file ascii; wantsLineEndConversion: false; converter: UTF8TextConverter
>> new.
>> file nextPutAll: 'Ceci doit changé'.
>> file skip: -1. "Oops - grammatically incorrect"
>> file nextPutAll: 'er'.
>> file close.
>>
>> file := StandardFileStream oldFileNamed: 'mbfs_skip.tst'.
>> file ascii.
>> file contentsOfEntireFile.]
>> ensure: [file close.
>> FileDirectory default deleteFileNamed: 'mbfs_skip.tst'].
>> -> 'Ceci doit changÃer' "Oops squeakly incorrect"
>>
>> Ah Ah, MultiByteFileStream let us see a stream of encoded characters,
>> but position over a stream of bytes...
>> The only programmer choice is to put marks (by inquiring aMBFS
>> position) and restore position using these marks...
>>
>
> Well, this part is broken, but the current fileIn/fileOut code relies on
> this bug/"feature", otherwise it would be easy to fix it in the utf8 case.
> Actually I was thinking about CompiledMethod >> #getPreambleFrom:at: or even
> worse PositionableStream >> #backChunk.
>

Oh, I see... It seems we're lucky to use a delimiter with charCode < 128!
Among several alternatives:
1) make a generic PositionableXtreamWrapper that memorizes the source
position at some marks (at each buffer, for example).
2) make a reverseXtreamWrapper
...
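
A small sketch of why a delimiter below 128 is safe to search for backwards in raw UTF-8. This is not the #backChunk code itself; the sample text and variable names are made up for illustration.

```smalltalk
"Every byte of a multi-byte UTF-8 sequence is >= 128, so byte 33 ($!)
 can only ever be a real $!, whichever direction we scan in."
| utf8 index |
utf8 := 'été!changé' squeakToUtf8 asByteArray.
index := utf8 size.
[index > 0 and: [(utf8 at: index) ~= $! asciiValue]]
	whileTrue: [index := index - 1].
index.   "-> 6: each é takes two bytes, t takes one, then comes the $!"
```

The answer is a byte position, not a character position, which is exactly the kind of mark a wrapper like alternative 1) would have to remember.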

>> Making something simple out of current MultiByteFileStream mess is a
>> challenge I don't even want to take, but you seem a bit tougher than
>> me.
>>
>
> I think the current performance of MultiByteFileStream is acceptable for
> general use. According to my measurements the greatest bottleneck is
> WriteStream >> #nextPut: for typical operations.
>
>
> Levente
>

You mean streaming on a collection? Didn't someone correct the nextPut:
primitive recently?
Without this primitive, avoid isOctetCharacter and co.; ByteString
at:put: handles that...
See the Xtream implementation:

{
[| ws |
ws := (String new: 10000) writeStream.
1 to: 20000 do: [:i | ws nextPut: $0]] bench.
[| ws |
ws := (String new: 10000) writeXtream.
1 to: 20000 do: [:i | ws nextPut: $0]] bench.
}
#('86.4789294987018 per second.' '128.374325134973 per second.')
1.5x speed up is already something...

Otherwise, you'll have to look at a higher level to see whether you can
use a buffered technique and nextPutAll: instead. That would be a
major speed up (10x or more).
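
To try the nextPutAll: idea, here is a rough sketch in the same style as the benchmark above, using a plain WriteStream. The 1000-character chunk is an arbitrary choice for illustration, and the actual ratio will depend on the image and VM.

```smalltalk
| chunk |
chunk := String new: 1000 withAll: $0.
{
"20000 characters written one at a time"
[| ws |
ws := WriteStream on: (String new: 10000).
20 timesRepeat: [chunk do: [:c | ws nextPut: c]]] bench.
"the same 20000 characters written in 20 bulk sends"
[| ws |
ws := WriteStream on: (String new: 10000).
20 timesRepeat: [ws nextPutAll: chunk]] bench.
}
```

Both blocks write the same data, so the difference between the two bench results is the per-character overhead of nextPut:.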

Nicolas


Eliot Miranda-2

Re: news from the Xtream front

In reply to this post by Levente Uzonyi-2

2009/12/8 Levente Uzonyi <[hidden email]>

> Well, this part is broken, but the current fileIn/fileOut code relies on this bug/"feature", otherwise it would be easy to fix it in the utf8 case.
> Actually I was thinking about CompiledMethod >> #getPreambleFrom:at: or even worse PositionableStream >> #backChunk.

On a tangential note, one can save significant time by having StandardSourceFilesArray cache read-only copies instead of creating new ones all the time. Find attached a change set that we use at Teleplace. You might find more places to use this than I have.
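
The caching idea can be sketched in a workspace as follows. This is not the attached change set: the cache dictionary and the readOnlyAt block are hypothetical names, and the sketch only assumes that (SourceFiles at: index) readOnlyCopy answers a fresh read-only stream, as the existing source-reading code relies on.

```smalltalk
| cache readOnlyAt |
cache := Dictionary new.
"Open a read-only copy the first time an index is asked for, then reuse it."
readOnlyAt := [:index |
	cache at: index ifAbsentPut: [(SourceFiles at: index) readOnlyCopy]].
(readOnlyAt value: 2) == (readOnlyAt value: 2).   "-> true: the same stream is reused"
```

A cached stream has a single position, so each client has to set the position itself before reading and must not assume that it is the only user.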


Attachment: SourceFileReadOnlyCopy.1.cs (9K)

Eliot Miranda-2

Re: news from the Xtream front

On Tue, Dec 8, 2009 at 9:35 AM, Eliot Miranda <[hidden email]> wrote:


> On a tangential note, one can save significant time by having StandardSourceFilesArray cache read-only copies instead of creating new ones all the time. Find attached a change set that we use at Teleplace. You might find more places to use this than I have.

Henrik Johansen wrote to me (11:42 AM):

> On a tangential note, one can save significant time by having StandardSourceFilesArray cache read-only copies instead of creating new ones all the time. Find attached a change set that we use at Teleplace. You might find more places to use this than I have.

I'm not subscribed to SqueakDev, but here's one worthwhile addition:

Benchies:
[CompiledMethod allInstances collect: [:each | each getSource] ] timeToRun

With caching: 17344
Without caching: 25721

Cheers,

Henry


Attachment: RemoteString-text.st (918 bytes)

Eliot Miranda-2

Re: news from the Xtream front

On Tue, Dec 8, 2009 at 12:05 PM, Eliot Miranda <[hidden email]> wrote:

> Henrik Johansen wrote:
>
> I'm not subscribed to SqueakDev, but here's one worthwhile addition:
>
> Benchies:
> [CompiledMethod allInstances collect: [:each | each getSource] ] timeToRun
>
> With caching: 17344
> Without caching: 25721

but beware...

On Tue, Dec 8, 2009 at 12:09 PM, Henrik Johansen <[hidden email]> wrote:

[snip]

I just noticed that for some reason, it seems to screw up accepting new versions of old methods...

Need more time to check out why, yay.


Levente Uzonyi-2

Re: news from the Xtream front

In reply to this post by Eliot Miranda-2

On Tue, 8 Dec 2009, Eliot Miranda wrote:

>> On a tangential note one can save significant time by having
>> StandardSourceFilesArray cache read-only copies instead of creating new ones
>> all the time. Find a change set attached that we use at Teleplace. You
>> might find more places to use this than I have.
>>

Simultaneous access (doesn't have to be parallel) to shared resources
(filestreams in this case) can cause problems.
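
A tiny in-memory illustration of that concern, using ReadStream instead of a file stream. The two "clients" are hypothetical and nothing runs in parallel here.

```smalltalk
| shared first second copyA copyB |
"Two clients sharing one stream also share its position."
shared := ReadStream on: 'abcdef'.
first := shared next: 2.     "client A reads 'ab'"
second := shared next: 2.    "client B expected 'ab' too, but gets 'cd'"
"With one stream per client, the reads do not disturb each other."
copyA := ReadStream on: 'abcdef'.
copyB := ReadStream on: 'abcdef'.
copyA next: 2.               "-> 'ab'"
copyB next: 2.               "-> 'ab'"
```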

> Henrik Johansen wrote:
>
> I'm not subscribed to SqueakDev, but here's one worthwhile addition:
>
> Benchies:
> [CompiledMethod allInstances collect: [:each | each getSource] ] timeToRun
>
>
> With caching: 17344
> Without caching: 25721
>

I guess these numbers are for Pharo (assuming that caching means that the
patch is loaded). I didn't experience any difference in Squeak with the
patch, mainly because #getSource uses the global streams from SourceFiles
(this is a problem, because debugging this method may cause problems if
the debugger is fetching or modifying the source) instead of creating
read-only copies. But other places can benefit, like #timestamp
(~1.5x speedup).

Levente
