Hi,

I just did a small test of a faster storeOn: for ByteStrings (using next:putAll:startingAt: for sequences not containing quotes, instead of nextPut: for each character), and ran into a weird quirk.

When writing to disk (on Windows), the old storeOn: stores the ø's in the string below as F8, while storeOn2: stores them as C3 B8. (I haven't looked at exactly why they're saved differently.)
As far as I can tell from a Google search, F8 is the Latin-1 (ISO 8859-1) encoding, and C3 B8 is the UTF-8 encoding. Could this be a cause of the "invalid UTF8 character" errors we've been seeing sporadically?

I've attached a changeset with storeOn2:; below is the workspace I used.

Cheers,
Henry

    str := 'asdfasdfadfadfrgjn''fgoibocbxlgjsrgoihrgohgfn''cx,bmxnbøzghøfhzødfhxcvnzdfljfoaurgaorhr8htg0ae8gofhef08hasovhdfhøxo''vh89ah4f9aw8hf'.

    str storeOn: (FileStream newFileNamed: 'test22.txt').
    Time millisecondsToRun: [| fs |
        fs := FileStream oldFileNamed: 'test22.txt'.
        50000 timesRepeat: [str storeOn: fs].
        fs close].
    23494 22869

    str storeOn2: (FileStream newFileNamed: 'test2.txt').
    Time millisecondsToRun: [| fs |
        fs := FileStream oldFileNamed: 'test2.txt'.
        50000 timesRepeat: [str storeOn2: fs].
        fs close].
    1303 1383

'From Pharo1.0beta of 16 May 2008 [Latest update: #10470] on 12 October 2009 at 2:31:08 pm'!

!ByteString methodsFor: 'printing' stamp: 'HenrikSperreJohansen 10/12/2009 14:30'!
storeOn2: aStream
	"Print inside string quotes, doubling embedded quotes."
	| ix startIx |
	aStream nextPut: $'.
	startIx := 1.
	[(ix := self indexOf: $' startingAt: startIx) > 0]
		whileTrue: [
			aStream next: ix + 1 - startIx putAll: self startingAt: startIx.
			aStream nextPut: $'.
			startIx := ix + 1].
	aStream next: self size + 1 - startIx putAll: self startingAt: startIx.
	aStream nextPut: $'.
! !

!ByteString methodsFor: 'printing' stamp: 'HenrikSperreJohansen 10/12/2009 14:30'!
storeString2
	"Answer a String representation of the receiver from which the receiver can be reconstructed."
	| result ws |
	result := String new: self size + 2.
	ws := result writeStream.
	self storeOn2: ws.
	^ws position = result size
		ifTrue: [result]
		ifFalse: [ws contents]
! !

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
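For readers following the F8 versus C3 B8 question, here is a minimal sketch of the two byte sequences. It is written in Python rather than Smalltalk, purely as a language-neutral illustration; none of the names below are Pharo API, and the helper `is_valid_utf8` is hypothetical. It shows only that a lone F8 byte is rejected by a strict UTF-8 decoder. Whether that is what actually triggers the sporadic "invalid UTF8 character" errors is not established in this thread.

```python
# Illustration only: Python, not Smalltalk. Nothing here is Pharo API.
text = "ø"  # code point U+00F8

# The same character under the two encodings discussed in the post.
latin1_bytes = text.encode("latin-1")  # single byte
utf8_bytes = text.encode("utf-8")      # two bytes

print(latin1_bytes.hex())  # f8
print(utf8_bytes.hex())    # c3b8


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The UTF-8 form round-trips; the Latin-1 byte on its own does not,
# because F8 can never start a well-formed UTF-8 sequence.
print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(latin1_bytes))  # False
```

So a file written with one encoding and later read with a strict UTF-8 decoder could plausibly fail on exactly these bytes, which is consistent with the question raised above.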
2009/10/12 Henrik Johansen <[hidden email]>
> Hi, just did a small test of a faster storeOn: for ByteStrings (using

WOW!!! I ran it here on Windows XP with 1 GB RAM, and these are the results:

    str storeOn: (FileStream newFileNamed: 'test22.txt').
    Time millisecondsToRun: [| fs |
        fs := FileStream oldFileNamed: 'test22.txt'.
        50000 timesRepeat: [str storeOn: fs].
        fs close].
    84022

    str storeOn2: (FileStream newFileNamed: 'test2.txt').
    Time millisecondsToRun: [| fs |
        fs := FileStream oldFileNamed: 'test2.txt'.
        50000 timesRepeat: [str storeOn2: fs].
        fs close].
    4990

BIG DIFFERENCE
Mariano Martinez Peck wrote:
> str storeOn: (FileStream newFileNamed: 'test22.txt').
> Time millisecondsToRun: [|fs|
>     fs := FileStream oldFileNamed: 'test22.txt'.
>     50000 timesRepeat: [str storeOn: fs].
>     fs close]. 84022
>
> str storeOn2: (FileStream newFileNamed: 'test2.txt').
> Time millisecondsToRun: [|fs|
>     fs := FileStream oldFileNamed: 'test2.txt'.
>     50000 timesRepeat: [str storeOn2: fs].
>     fs close]. 4990
>
> BIG DIFFERENCE

Yeah, writing one char at a time (plus, when doing that, the broken nextPut: primitive) isn't exactly performant :P
Btw, it works fine for WideStrings too when using storeOn: internally (in printString, for example), but not when writing them to file. :/
Plus, the behaviour of storeOn: is not 100% preserved (a different encoding is used for ø), as I noted in the original post.

Cheers,
Henry
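The difference between writing one character at a time and writing whole runs between quotes can be sketched as follows. This is a Python illustration under stated assumptions, not the Smalltalk changeset itself: the function names `store_on_per_char`, `store_on_chunked`, and `store_string` are hypothetical, and an in-memory `io.StringIO` stands in for the file stream, so it says nothing about actual Pharo timings.

```python
import io


def store_on_per_char(s: str, stream) -> None:
    """One write call per character, doubling each embedded quote."""
    stream.write("'")
    for ch in s:
        stream.write(ch)
        if ch == "'":
            stream.write("'")
    stream.write("'")


def store_on_chunked(s: str, stream) -> None:
    """One write call per run of text up to and including each quote."""
    stream.write("'")
    start = 0
    while True:
        ix = s.find("'", start)
        if ix < 0:
            break
        stream.write(s[start:ix + 1])  # run ending with the quote itself
        stream.write("'")              # second quote doubles it
        start = ix + 1
    stream.write(s[start:])            # remainder after the last quote
    stream.write("'")


def store_string(store_fn, s: str) -> str:
    """Hypothetical helper: run a store function against an in-memory stream."""
    buf = io.StringIO()
    store_fn(s, buf)
    return buf.getvalue()


print(store_string(store_on_chunked, "a'b"))  # 'a''b'
```

Both functions produce the same text; the chunked version simply issues far fewer write calls when quotes are rare, which is the idea behind storeOn2:.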
So what would be the next step? :)

Stef

On Oct 13, 2009, at 3:17 PM, Henrik Johansen wrote:
> Yeah, writing one char at a time (plus, when doing that, the broken
> nextPut: primitive) isn't exactly performant :P
> Btw, it works fine for Widestrings too when using storeOn: internally
> (in printString f.ex.), but not when writing them to file. :/
> Plus, as I indicated, behaviour from storeOn: is not 100% preserved
> (different encoding used for ø), as I noted in the original post.
>
> Cheers,
> Henry
Someone with a clue answering my original question of whether the encoding used when saving to file is wrong or not :)
Alternatively, someone with VM knowledge looking into why the nextPut: primitive is broken...
(And no, it's not just that the stream hasn't been placed in cache yet, or that the collection is full. Placing a counter in nextPut: and printing:

    | string ws |
    Smalltalk at: #NextPutPrimitiveFails put: 0.
    string := String new: 500000.
    ws := string writeStream.
    500000 timesRepeat: [ws nextPut: $a].
    NextPutPrimitiveFails.

gives 500000, i.e. one primitive failure for every one of the 500000 calls.)

Cheers,
Henry

Stéphane Ducasse wrote:
> so what would be the next step :)
>
> Stef