Tobias Pape uploaded a new version of Multilingual to project The Trunk:
http://source.squeak.org/trunk/Multilingual-topa.205.mcz

==================== Summary ====================

Name: Multilingual-topa.205
Author: topa
Time: 14 April 2015, 11:04:27.175 am
UUID: da7bc0f1-5f76-40d0-9bcb-7cf96596ac92
Ancestors: Multilingual-topa.204, Multilingual-cbc.201

Pick up a fix for MultiByteFileStream>>#nextChunk

=============== Diff against Multilingual-topa.204 ===============

Item was changed:
  ----- Method: MultiByteFileStream>>nextChunk (in category 'fileIn/Out') -----
  nextChunk
  	"Answer the contents of the receiver, up to the next terminator character. Doubled terminators indicate an embedded terminator character."

+ 	^(wantsLineEndConversion and: [ lineEndConvention notNil ])
+ 		ifTrue: [converter nextChunkLineEndConvertingFromStream: self]
+ 		ifFalse: [converter nextChunkFromStream: self]!
- 	^converter nextChunkFromStream: self!

Item was added:
+ ----- Method: UTF8TextConverter>>nextChunkLineEndConvertingFromStream: (in category 'fileIn/Out') -----
+ nextChunkLineEndConvertingFromStream: input
+ 	"Answer the contents of input, up to the next terminator character. Doubled terminators indicate an embedded terminator character."
+ 	"Obey line end conversion."
+
+ 	self skipSeparatorsFrom: input.
+ 	^self
+ 		parseLangTagFor: (
+ 			self class decodeByteString: (
+ 				String new: 1000 streamContents: [ :stream |
+ 					[
+ 						stream nextPutAll: (input upTo: $!!).
+ 						input basicNext == $!! ]
+ 							whileTrue: [
+ 								stream nextPut: $!! ].
+ 					input atEnd ifFalse: [ input skip: -1 ] ]))
+ 		fromStream: input!
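For readers unfamiliar with the chunk format that the method comment describes, here is a minimal sketch of the terminator rule, written in Python rather than Smalltalk so it can be run outside an image. The function name `next_chunk` and the (text, position) interface are assumptions made for this illustration; they are not part of Squeak. It only covers separator skipping and the doubled-terminator rule, not language tags, decoding, or line end conversion.

```python
from typing import Tuple

TERMINATOR = "!"


def next_chunk(text: str, pos: int = 0) -> Tuple[str, int]:
    """Return (chunk, new_pos) for the chunk starting at pos (hypothetical helper).

    A chunk runs up to the next single '!'. A doubled '!!' stands for one
    literal '!' inside the chunk.
    """
    n = len(text)
    # Skip leading separators (whitespace) before the chunk.
    while pos < n and text[pos].isspace():
        pos += 1
    out = []
    while pos < n:
        ch = text[pos]
        pos += 1
        if ch != TERMINATOR:
            out.append(ch)
        elif pos < n and text[pos] == TERMINATOR:
            # Doubled terminator: keep one '!' and consume the second.
            out.append(TERMINATOR)
            pos += 1
        else:
            # Single terminator: the chunk ends here.
            break
    return "".join(out), pos


source = "  Transcript show: 'Hi!!'! 3 + 4!"
first, pos = next_chunk(source)
second, pos = next_chunk(source, pos)
print(first)
print(second)
```

The lookahead on the character after a `!` plays the same role as the `basicNext` test followed by `skip: -1` in the diff above: peek one element, and leave it in place if it is not a second terminator.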
This doesn't look right, because it'll decode the input twice. Once in
#upTo:, and once in #decodeByteString:. The latter will fail if its
argument is not a valid UTF-8 string after the first decoding.

Levente
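The double-decoding concern can be illustrated with a small sketch. It uses Python's built-in codecs, not Squeak's UTF8TextConverter, so it only shows the general failure mode; how Squeak itself reports or handles the failure is not shown in this thread. The helper name `decode_twice` is hypothetical, and the Latin-1 step is an assumption that models handing already-decoded characters to a byte-oriented decoder, one byte per character.

```python
def decode_twice(raw: bytes) -> str:
    """Decode UTF-8 bytes, then wrongly decode the result a second time."""
    once = raw.decode("utf-8")  # first (correct) decoding
    # Assumption for illustration: the second decoder sees each character
    # as a single byte, which Latin-1 models for code points below 256.
    return once.encode("latin-1").decode("utf-8")


# Pure ASCII survives, which is why the problem is easy to miss.
print(decode_twice(b"abc"))

# A non-ASCII character is no longer valid UTF-8 after the first decoding.
try:
    decode_twice("é".encode("utf-8"))
    failed = False
except UnicodeDecodeError:
    failed = True
print(failed)

# Worse, some inputs decode "successfully" into different text.
print(decode_twice("Ã©".encode("utf-8")))
```

ASCII-only chunks pass through unchanged, which would explain why a change like this can look fine on typical source code and only misbehave once a chunk contains non-ASCII characters.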
On 14.04.2015, at 13:17, Levente Uzonyi <[hidden email]> wrote:

> This doesn't look right, because it'll decode the input twice. Once in #upTo:, and once in #decodeByteString:. The latter will fail if
> its argument is not a valid UTF-8 string after the first decoding.

Darn.
Do you have an idea?
Probably the magic of #upTo: must be moved
to #decodeByteString: for the UTF-8 converter?

Best
	-Tobias
On Tue, 14 Apr 2015, Tobias Pape wrote:

> Darn.
> Do you have an idea?
> Probably the magic of #upTo: must be moved
> to #decodeByteString: for the UTF-8 converter?

I've uploaded Multilingual-ul.206 to the Inbox with a more general fix.
The asymmetry in line end conversions was a bit surprising to me, but the
tests are green.

Levente
On 15.04.2015, at 01:20, Levente Uzonyi <[hidden email]> wrote:

> I've uploaded Multilingual-ul.206 to the Inbox with a more general fix.
> The asymmetry in line end conversions was a bit surprising to me, but the tests are green.

Thank you, Levente!

Best
	-Tobias