Hello all,
I recently had some problems with UTF-8 encoding in Aida 6 beta on Smalltalk/X. Honza Vrany found that the problem could be in the method properString: aString in class AidaSite, which looks like this:

    properString: aString
        "if two byte string, convert it to one byte, cut twobyte characters, make them $? "
        | stream |
        aString class == ByteString ifTrue: [^aString].
        stream := WriteStream on: String new.
        aString do: [:char |
            stream nextPut: (char asInteger < 256 ifTrue: [char] ifFalse: [$?])].
        ^stream contents

Honza changed it so that it encodes everything into UTF-8 if needed:

    properString: aString
        | stream |
        aString bitsPerCharacter == 8 ifTrue: [^aString].
        ^aString utf8Encoded

We find that method a little tricky, and we are not sure whether the change is safe or whether it could cause problems when the method is called in some special contexts.

Do you think our change is OK? Is that method still needed, now that UTF-8 is becoming the common encoding in Aida?

Kind regards,
Jarda Havlin
_______________________________________________
Aida mailing list
[hidden email]
http://lists.aidaweb.si/mailman/listinfo/aida
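To make the difference between the two versions concrete, here is a minimal sketch in Python (used only as a neutral illustration, not Aida code). The function names are hypothetical. The second function assumes that a single-byte string reaching the method may hold unencoded Latin-1 text rather than already-encoded bytes; if that assumption holds, the changed method would pass such text through without encoding it.

```python
def proper_string_original(text: str) -> str:
    """Model of the original method: code points above 255 become '?'."""
    return "".join(ch if ord(ch) < 256 else "?" for ch in text)


def proper_string_changed(text: str) -> bytes:
    """Model of the changed method (assumption: narrow strings pass through as-is)."""
    if all(ord(ch) < 256 for ch in text):
        # Single-byte string: returned unchanged, byte for byte.
        return text.encode("latin-1")
    # Wide string: encoded as UTF-8.
    return text.encode("utf-8")


sample = "Příliš žluťoučký kůň"
lossy = proper_string_original(sample)         # Czech letters above 255 are lost
encoded = proper_string_changed(sample)        # valid UTF-8 bytes
passed_through = proper_string_changed("café")  # 'é' stays a single byte 0xE9

print(lossy)
print(encoded.decode("utf-8") == sample)
print(passed_through)
```

Under that assumption the output could mix UTF-8 bytes with raw Latin-1 bytes, which is one special context worth checking before relying on the change.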
Hello Jaroslav,
The #properString: method was introduced in VW Aida for probably the same problem you have, so now is the time to find the cause.

Namely, even though rendering the WebElements should always return byte strings only (by converting texts to UTF-8), sometimes they still return a TwoByteString. I didn't have time to find where, so I implemented #properString: instead. But we must get rid of that method ASAP, because it prevents direct streaming to the output request and therefore causes slower response times.

It would be just wonderful if you guys could find the problem yourselves, probably by inserting some watchdog code to see where an element emits a TwoByteString for the first time, because that causes the whole result to become a TwoByteString.

PS: TwoByteString is used in VW for Unicode text, and in Squeak there is WideString; what is it in ST/X?

Best regards
Janko

--
Janko Mivšek
IT consultant (Svetovalec za informatiko)
Eranova d.o.o.
Ljubljana, Slovenija
www.eranova.si
tel: 01 514 22 55
faks: 01 514 22 56
gsm: 031 674 565
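The watchdog idea can be sketched as follows, again in Python as a neutral illustration. The function name and the list of (element name, rendered text) pairs are hypothetical stand-ins for whatever the rendering code actually produces; the point is only to report the first fragment that contains a character too wide for a single byte.

```python
from typing import Iterable, Optional, Tuple


def first_wide_fragment(
    fragments: Iterable[Tuple[str, str]]
) -> Optional[Tuple[str, str]]:
    """Return (element name, offending character) for the first fragment
    containing a code point above 255, or None if every fragment is narrow."""
    for name, text in fragments:
        wide = next((ch for ch in text if ord(ch) > 255), None)
        if wide is not None:
            return name, wide
    return None


# Hypothetical rendered output, in emission order.
rendered = [
    ("header", "<h1>Aida</h1>"),
    ("body", "<p>kůň</p>"),
    ("footer", "<p>ž</p>"),
]
culprit = first_wide_fragment(rendered)
print(culprit)
```

In a real watchdog the check would sit at the point where each element writes to the response, so the report names the element that first widens the result.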
In reply to this post by Jaroslav Havlín
> properString: aString
>     | stream |
>     aString bitsPerCharacter == 8 ifTrue: [^aString].
>     ^aString utf8Encoded
>
> We find that method a little tricky and we are not sure whether that
> change is safe and won't cause some problems when it is called in some
> special contexts.
>
> Do you think our change is ok? Is that method still needed, if UTF-8
> is becoming common encoding in Aida now?

This solution is a good extension of my hack, but let's try to get rid of that method once and for all because, as I said, it just hides the real problem somewhere else and prevents direct streaming of web pages.

Best regards
Janko

--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si
In reply to this post by Janko Mivšek
Hi Janko,
On Wed, 2009-04-01 at 20:00 +0200, Janko Mivšek wrote:
> Hello Jaroslav,
>
> #properString: method was introduced in VW Aida for probably the same
> problem as you have, so now it is a time to find the cause.
>
> Namely, even that rendering the WebElements should always return ASCII
> strings only (by converting texts to UTF-8), sometimes it happens that
> they still return TwoByteString. I didn't have time to find where, so I
> implemented that #properString: instead. But we must get rid of that
> method ASAP, because it prohibits direct streaming to the output request
> and therefore causes slower response time.

Sure, that would be nice. Go on!

> It would be just wonderful if you guys can find the problem by yourself.
> Probably by inserting some watchdog code to see, where some element
> emits TwoByteStrings for a first time, because that causes that a whole
> result become a TwoByteString.

Well, the reason is pretty clear to me, but hard to explain :-)

First of all, Jaroslav's application basically renders some text with national characters (Czech ones :-) to the output. To make all the national characters work seamlessly in Aida on St/X, we are using the CharacterWriteStream class, which is basically a write stream on a single-byte string. Once a character that does not fit in a single byte (i.e. a character whose code point is > 255) is written to a CharacterWriteStream, the underlying single-byte string is converted to a multi-byte string.

CharacterWriteStream saves us from fiddling with string recoding every time something is written to the response stream. The programmer can #nextPutAll: any string, no matter whether it is a single-byte or multi-byte string, and no matter whether it is UTF-8, UTF-16 or even Chinese in GB/Big5 encoding. That's why we are using CharacterWriteStream in Aida on St/X.

> PS: TwoByteString is used in VW for Unicode text, in Squeak there is
> WideString, what is it in ST/X?

Well, in St/X there is a bunch of string-like classes:

CharacterArray
String
Symbol
TwoByteString
Unicode16String
GBEncodedString
BIG5EncodedString
JISEncodedString
KSCEncodedString
FourByteString
Unicode32String
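The widening behaviour described for CharacterWriteStream can be modelled with a toy buffer. This is a Python sketch of the idea only, not the St/X implementation; the class name WideningBuffer and its methods are hypothetical. It stores one byte per character until a code point above 255 arrives, then switches to a wide representation for everything written so far and afterwards.

```python
from typing import List, Optional


class WideningBuffer:
    """Toy model: narrow (one byte per character) storage that widens
    the first time a code point above 255 is written."""

    def __init__(self) -> None:
        self._narrow = bytearray()
        self._wide: Optional[List[int]] = None  # code points, once widened

    def write(self, text: str) -> None:
        for ch in text:
            code = ord(ch)
            if self._wide is None and code > 255:
                # First wide character: convert everything stored so far.
                self._wide = list(self._narrow)
            if self._wide is None:
                self._narrow.append(code)
            else:
                self._wide.append(code)

    @property
    def is_wide(self) -> bool:
        return self._wide is not None

    def contents(self) -> str:
        codes = self._wide if self._wide is not None else self._narrow
        return "".join(chr(code) for code in codes)


buf = WideningBuffer()
buf.write("<p>abc")
narrow_before = not buf.is_wide
buf.write("ů</p>")
print(narrow_before, buf.is_wide, buf.contents())
```

The sketch also shows why a single wide character anywhere in the page turns the whole result into a wide string.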
In reply to this post by Janko Mivšek
Okay, so what about the following solution:

Let's define an output encoding (probably somewhere in AidaSite). Then let the output stream transparently encode all data to that encoding.

This is pretty easy to implement in St/X, and probably in VW too, since there is something like EncodingWriteStream.

Cheers, Jan
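The proposal of one configured output encoding plus a transparently encoding stream can be illustrated with Python's standard io.TextIOWrapper, which plays the role that something like EncodingWriteStream would play in Smalltalk. OUTPUT_ENCODING is a hypothetical stand-in for a site-wide setting; nothing here is Aida API.

```python
import io

OUTPUT_ENCODING = "utf-8"  # hypothetical site-wide setting

# The raw byte stream stands in for the response going to the browser.
raw = io.BytesIO()

# Everything written to `out` is encoded on the way to `raw`,
# so rendering code never has to recode strings itself.
out = io.TextIOWrapper(raw, encoding=OUTPUT_ENCODING, newline="")
out.write("<p>")
out.write("kůň")
out.write("</p>")
out.flush()

body = raw.getvalue()
print(body)
```

With this arrangement the rendering code writes internal Unicode strings only, and the encoding step happens in exactly one place while still streaming.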
Jan Vrany wrote:
> OKay, so what about following solution:
>
> lets define an output encoding (probably somewhere in
> AidaSite). Then, let the output stream transparently
> encodes all data to that encoding.
>
> This is pretty easy implement in St/X and probably
> on VW too, since there is something like EncodingWriteStream

There are already methods for this in the AIDASite class; see the codepage converting methods:

    convert: aString fromCodepage: aSymbol
    convert: aString toCodepage: aSymbol
    convertFromWeb: aString on: aSession
    convertToWeb: aString on: aSession

If you implement the first two methods, at least for UTF-8, Aida should convert properly by itself. Then you just do:

    e addText: anInternalUnicodeString

and you'll get properly UTF-8 formatted text sent to the browser and also back.

Hope this helps
Janko
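What the first two conversion methods need to do can be sketched as a pair of functions, shown in Python as a neutral illustration. The function names, the CODEPAGES table and the codepage keys are hypothetical; only the round trip between internal Unicode text and encoded bytes is the point.

```python
# Hypothetical mapping from codepage symbols to Python codec names.
CODEPAGES = {"utf8": "utf-8", "latin2": "iso-8859-2"}


def convert_to_codepage(text: str, codepage: str) -> bytes:
    """Internal Unicode text -> bytes in the given codepage (outgoing)."""
    return text.encode(CODEPAGES[codepage])


def convert_from_codepage(data: bytes, codepage: str) -> str:
    """Bytes in the given codepage -> internal Unicode text (incoming)."""
    return data.decode(CODEPAGES[codepage])


outgoing = convert_to_codepage("kůň", "utf8")
incoming = convert_from_codepage(outgoing, "utf8")
print(outgoing, incoming)
```

The two directions must be exact inverses for the same codepage, otherwise form input coming back from the browser would not match what was sent.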