To give a concrete view of what improvement we might further get beyond the
excellent changes from Levente, I just tried this in latest trunk, with the latest Xtream version:

{
[| tmp | tmp := (MultiByteFileStream readOnlyFileNamed: (SourceFiles at: 2) name) ascii; wantsLineEndConversion: false; converter: UTF8TextConverter new.
1 to: 10000 do: [:i | tmp upTo: Character cr]. tmp close] timeToRun.
[| tmp | tmp := ((StandardFileStream readOnlyFileNamed: (SourceFiles at: 2) name) readXtream ascii buffered decodeWith: (UTF8TextConverter new installLineEndConvention: nil)) buffered.
1 to: 10000 do: [:i | tmp upTo: Character cr]. tmp close] timeToRun.
}

#(1395 84)

The first is the recently optimized trunk version. Unfortunately, with MultiByteFileStream at work, you get a looong one-by-one decoding.
The second is the Xtream version with crafted #buffered sends.
Hardly believable what you can do with a utf8ToSqueak-like hack and a buffer...

Of course, this version is optimized only in the case of ASCII source encoded in UTF8 (the easy case, but the most common case concerning source files).
I don't know what happens when encountering a multi-byte utf-8 char...
... all I know is that performance in this case is likely a disaster (my code is a bit stupid, but it's too late to correct it now).

Oh, maybe Levente will just port the idea tomorrow in trunk, so I can have a bit more rest ;)

Cheers

Nicolas
On Tue, 8 Dec 2009, Nicolas Cellier wrote:
> To give a concrete view of what improvement we might further get beyond
> excellent changes from Levente, I just tried this in latest trunk,
> with latest Xtream version:
>
> {
> [| tmp | tmp := (MultiByteFileStream readOnlyFileNamed: (SourceFiles
> at: 2) name) ascii; wantsLineEndConversion: false; converter:
> UTF8TextConverter new.
> 1 to: 10000 do: [:i | tmp upTo: Character cr]. tmp close] timeToRun.
> [| tmp | tmp := ((StandardFileStream readOnlyFileNamed: (SourceFiles
> at: 2) name) readXtream ascii buffered decodeWith: (UTF8TextConverter
> new installLineEndConvention: nil)) buffered.
> 1 to: 10000 do: [:i | tmp upTo: Character cr]. tmp close] timeToRun.
> }
>
> #(1395 84)

Really cool. :)

> The first is the recently optimized trunk version. Unfortunately,
> MultiByteFileStream at work, you get a looong one by one decoding
> The second is the Xtream version with crafted #buffered sends.
> Hardly believable what you can do with a utf8ToSqueak-like hack and a buffer...
>
> Of course, this version is optimized only in case of ASCII source
> encoded in UTF8 (the easy case, but the most common case concerning
> source files).

Don't forget that the sources are sometimes read backwards by the current code.

> I don't know what happens when encountering a multi-byte utf-8 char...
> ... all I know is that performance in this case is likely a disaster
> (my code is a bit stupid, but it's too late to correct it now)

It can still be much better than the current approach.

> Oh, maybe Levente will just port the idea tomorrow in trunk, so I can
> have a bit more rest ;)

Well, maybe, I'm working on other hacks, but I'll take a look, I'm starting to like the idea. ;)

Levente
Nicolas Cellier wrote:
> I don't know what happens when encountering a multi-byte utf-8 char...

Easy to test, just grab your favorite non-English book, for example:

HTTPSocket httpGet: 'http://www.gutenberg.org/dirs/etext04/820kc10.txt'.

That should be plenty for a realistic test.

Cheers,
- Andreas
2009/12/8 Levente Uzonyi <[hidden email]>:
> On Tue, 8 Dec 2009, Nicolas Cellier wrote:
>
>> Of course, this version is optimized only in case of ASCII source
>> encoded in UTF8 (the easy case, but the most common case concerning
>> source files).
>
> Don't forget that the sources are sometimes read backwards by the current
> code.

Oh yes, like this?

| file |
[file := MultiByteFileStream newFileNamed: 'mbfs_skip.tst'.
file ascii; wantsLineEndConversion: false; converter: UTF8TextConverter new.
file nextPutAll: 'Ceci doit changé'.
file skip: -1. "Oops - grammatically incorrect"
file nextPutAll: 'er'.
file close.

file := StandardFileStream oldFileNamed: 'mbfs_skip.tst'.
file ascii.
file contentsOfEntireFile.]
ensure: [file close.
FileDirectory default deleteFileNamed: 'mbfs_skip.tst'].
-> 'Ceci doit changÃer' "Oops squeakly incorrect"

Ah ah, MultiByteFileStream lets us see a stream of encoded characters, but it positions over a stream of bytes...
The only choice left to the programmer is to set marks (by asking aMBFS for its position) and to restore the position using these marks...

>> I don't know what happens when encountering a multi-byte utf-8 char...
>> ... all I know is that performance in this case is likely a disaster
>> (my code is a bit stupid, but it's too late to correct it now)
>
> It can still be much better than the current approach.

Yes, it could.

>> Oh, maybe Levente will just port the idea tomorrow in trunk, so I can
>> have a bit more rest ;)
>
> Well, maybe, I'm working on other hacks, but I'll take a look, I'm starting
> to like the idea. ;)

Making something simple out of the current MultiByteFileStream mess is a challenge I don't even want to take, but you seem a bit tougher than me.

Cheers

Nicolas
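
A minimal sketch of why skip: -1 lands in the middle of a character in the example above. It assumes String>>squeakToUtf8 is available in the image (that selector is not used in the thread itself) and that the literal is entered with a real accented é: the é counts as one character but is encoded as two bytes in UTF-8, so the byte count should come out one larger than the character count.

```smalltalk
| text utf8 |
text := 'Ceci doit changé'.
"Same text encoded as UTF-8, one byte per element (assumes String>>squeakToUtf8)."
utf8 := text squeakToUtf8.
{
	text size.                   "number of characters"
	utf8 size.                   "number of bytes once encoded"
	(utf8 last: 2) asByteArray.  "the two bytes that encode the trailing é"
}
```

Stepping back one byte from the end of the file therefore leaves the first byte of é in place and overwrites only the second, which is consistent with the stray Ã in the result shown in the message.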
2009/12/8 Andreas Raab <[hidden email]>:
> Nicolas Cellier wrote:
>>
>> I don't know what happens when encountering a multi-byte utf-8 char...
>
> Easy to test, just grab your favorite non-English book, for example:
>
> HTTPSocket httpGet: 'http://www.gutenberg.org/dirs/etext04/820kc10.txt'.
>
> That should be plenty for a realistic test.
>
> Cheers,
> - Andreas

Oh, sure, good choice, Jules Verne could have been my neighbour in Nantes (if I had a long white beard instead of just a few gray hairs).

Nicolas
2009/12/8 Nicolas Cellier <[hidden email]>:
> To give a concrete view of what improvement we might further get beyond
>
> #(1395 84)

Unbelievable. You must be cheating! :)

--
Best regards,
Igor Stasenko AKA sig.
2009/12/8 Igor Stasenko <[hidden email]>:
> 2009/12/8 Nicolas Cellier <[hidden email]>:
>> To give a concrete view of what improvement we might further get beyond
>>
>> #(1395 84)
>
> Unbelievable. You must be cheating! :)

Sure, since we use UTF-8 encoding but mostly put ASCII characters in source files, no conversion is needed at all...
The cheat is just to detect that case; that's the #utf8ToSqueak hack.

Nicolas
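
The detection described here can be sketched roughly as follows. This is only an illustration, not the actual Xtream or #utf8ToSqueak code: the helper block and the byte arrays are made up, and it assumes Collection>>allSatisfy:, ByteArray>>asString and String>>utf8ToSqueak behave as in a recent Squeak image.

```smalltalk
| decode |
"Hypothetical helper block: decode a ByteArray holding UTF-8 encoded text."
decode := [:bytes |
	(bytes allSatisfy: [:each | each < 128])
		ifTrue: [bytes asString]                  "pure ASCII: bytes map one to one to characters"
		ifFalse: [bytes asString utf8ToSqueak]].  "otherwise fall back to real UTF-8 decoding"
{
	decode value: #[72 101 108 108 111].           "'Hello', ASCII only, takes the fast path"
	decode value: #[99 104 97 110 103 195 169].    "'chang' followed by the two bytes of é, takes the slow path"
}
```

The fast path pays for one scan of the buffer and then copies the bytes without any per-character conversion, which is presumably where most of the gain on source files comes from.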
On Tue, 8 Dec 2009, Nicolas Cellier wrote:
> Oh yes, like this ?
>
> | file |
> [file := MultiByteFileStream newFileNamed: 'mbfs_skip.tst'.
> file ascii; wantsLineEndConversion: false; converter: UTF8TextConverter new.
> file nextPutAll: 'Ceci doit changé'.
> file skip: -1. "Oops - grammatically incorrect"
> file nextPutAll: 'er'.
> file close.
>
> file := StandardFileStream oldFileNamed: 'mbfs_skip.tst'.
> file ascii.
> file contentsOfEntireFile.]
> ensure: [file close.
> FileDirectory default deleteFileNamed: 'mbfs_skip.tst'].
> -> 'Ceci doit changÃer' "Oops squeakly incorrect"
>
> Ah Ah, MultiByteFileStream let us see a stream of encoded characters,
> but position over a stream of bytes...
> The only programmer choice is to put marks (by inquiring aMBFS
> position) and restore position using these marks...

Well, this part is broken, but the current fileIn/fileOut code relies on this bug/"feature", otherwise it would be easy to fix it in the utf8 case.
Actually I was thinking about CompiledMethod >> #getPreambleFrom:at: or even worse PositionableStream >> #backChunk.

> Making something simple out of current MultiByteFileStream mess is a
> challenge I don't even want to take, but you seem a bit tougher than
> me.

I think the current performance of MultiByteFileStream is acceptable for general use. According to my measurements the greatest bottleneck is WriteStream >> #nextPut: for typical operations.

Levente
For some of our stuff we've had to switch to StandardFileStream (given ASCII
encoding) to get a 10x performance improvement. Would be nice to not have to do so!

Regards, Gary

----- Original Message -----
From: "Levente Uzonyi" <[hidden email]>
To: "The general-purpose Squeak developers list" <[hidden email]>
Sent: Tuesday, December 08, 2009 2:12 PM
Subject: Re: [squeak-dev] news from the Xtream front

On Tue, 8 Dec 2009, Nicolas Cellier wrote:

> Ah Ah, MultiByteFileStream let us see a stream of encoded characters,
> but position over a stream of bytes...
> The only programmer choice is to put marks (by inquiring aMBFS
> position) and restore position using these marks...

Well, this part is broken, but the current fileIn/fileOut code relies on this bug/"feature", otherwise it would be easy to fix it in the utf8 case.
Actually I was thinking about CompiledMethod >> #getPreambleFrom:at: or even worse PositionableStream >> #backChunk.

> Making something simple out of current MultiByteFileStream mess is a
> challenge I don't even want to take, but you seem a bit tougher than
> me.

I think the current performance of MultiByteFileStream is acceptable for general use. According to my measurements the greatest bottleneck is WriteStream >> #nextPut: for typical operations.

Levente
2009/12/8 Levente Uzonyi <[hidden email]>:
> On Tue, 8 Dec 2009, Nicolas Cellier wrote:
>
>> Ah Ah, MultiByteFileStream let us see a stream of encoded characters,
>> but position over a stream of bytes...
>> The only programmer choice is to put marks (by inquiring aMBFS
>> position) and restore position using these marks...
>
> Well, this part is broken, but the current fileIn/fileOut code relies on
> this bug/"feature", otherwise it would be easy to fix it in the utf8 case.
> Actually I was thinking about CompiledMethod >> #getPreambleFrom:at: or even
> worse PositionableStream >> #backChunk.

Oh, I see... It seems we're lucky to use a delimiter with charCode < 128!
Among several alternatives:
1) make a generic PositionableXtreamWrapper that memorizes the source position at some mark (at each buffer, for example).
2) make a reverseXtreamWrapper
...

>> Making something simple out of current MultiByteFileStream mess is a
>> challenge I don't even want to take, but you seem a bit tougher than
>> me.
>
> I think the current performance of MultiByteFileStream is acceptable for
> general use. According to my measurements the greatest bottleneck is
> WriteStream >> #nextPut: for typical operations.
>
> Levente

You mean streaming on a collection?
Didn't someone correct the nextPut: primitive recently?
Without this primitive, avoid the isOctetCharacter and co, ByteString at:put: handles that...
See the Xtream implementation:

{
[| ws | ws := (String new: 10000) writeStream. 1 to: 20000 do: [:i | ws nextPut: $0]] bench.
[| ws | ws := (String new: 10000) writeXtream. 1 to: 20000 do: [:i | ws nextPut: $0]] bench.
}

#('86.4789294987018 per second.' '128.374325134973 per second.')

A 1.5x speed up is already something...
Otherwise, you'll have to look at a higher level to see if you cannot use a buffered technique and nextPutAll: instead.
That would be a major speed up (10x or more).

Nicolas
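
To illustrate the last point, here is a rough way one might compare character-by-character writes with a single bulk write. It is a sketch only: it uses the standard WriteStream rather than Xtream, the sizes are arbitrary, the chunk variable is introduced purely for illustration, and no timings are claimed.

```smalltalk
| chunk |
"20000 copies of $0, built once outside the timed blocks."
chunk := String new: 20000 withAll: $0.
{
	"One send of nextPut: per character."
	[| ws | ws := (String new: 10000) writeStream.
	1 to: 20000 do: [:i | ws nextPut: $0]] bench.
	"A single nextPutAll: of the prepared chunk."
	[| ws | ws := (String new: 10000) writeStream.
	ws nextPutAll: chunk] bench.
}
```

Both blocks end up with the same 20000 characters in the stream, so any difference reported by bench comes from the per-character send overhead rather than from the amount of data written.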
2009/12/8 Levente Uzonyi <[hidden email]>
On a tangential note, one can save significant time by having StandardSourceFilesArray cache read-only copies instead of creating new ones all the time. Find attached a change set that we use at Teleplace. You might find more places to use this than I have.
Attachment: SourceFileReadOnlyCopy.1.cs (9K)
On Tue, Dec 8, 2009 at 9:35 AM, Eliot Miranda <[hidden email]> wrote:
>> On a tangential note one can save significant time by having StandardSourceFilesArray cache read-only copies instead of creating new ones all the time. Find a change set attached that we
>> use at Teleplace. You might find more places to use this than I have.

I'm not subscribed to SqueakDev, but here's one worthwhile addition:

Benchies:
[CompiledMethod allInstances collect: [:each | each getSource]] timeToRun

With caching: 17344
Without caching: 25721

Cheers,
Henry
Attachment: RemoteString-text.st (918 bytes)
On Tue, Dec 8, 2009 at 12:05 PM, Eliot Miranda <[hidden email]> wrote:
but beware...
On Tue, Dec 8, 2009 at 12:09 PM, Henrik Johansen <[hidden email]> wrote:
On Tue, 8 Dec 2009, Eliot Miranda wrote:
>> On a tangential note one can save significant time by having
>> StandardSourceFilesArray cache read-only copies instead of creating new ones
>> all the time. Find a change set attached that we use at Teleplace. You
>> might find more places to use this than I have.

Simultaneous access (it doesn't have to be parallel) to shared resources (file streams in this case) can cause problems.

> Henrik Johansen to me
> show details 11:42 AM (20 minutes ago)
>
> I'm not subscribed to SqueakDev, but here's one worthwhile addition:
>
> Benchies:
> [CompiledMethod allInstances collect: [:each | each getSource] ] timeToRun
>
> With caching: 17344
> Without caching: 25721

I guess these numbers are for Pharo (assuming that caching means that the patch is loaded). I didn't experience any difference in Squeak with the patch, mainly because #getSource uses the global streams from SourceFiles instead of creating read-only copies. (This is a problem in itself, because debugging this method may cause trouble if the debugger is fetching or modifying the source.) But other places can benefit, like #timestamp (~1.5x speedup).

Levente

> Cheers,
> Henry