Hi,
I've managed to find and modify WAListenerEncoded so that it can process multibyte languages; I've only tested it with Korean as UTF-8. During testing I found the following problem. When I use WAListener/WAListenerEncoded, I cannot get image files registered in a FileLibrary correctly. CSS and script files come through fine, but image files do not. It seems that when I use WAListener, the server sends an image file of 16135 bytes, while the original file is 10819 bytes, and this might be the source of the problem. I cannot open the wrong-sized file even if I truncate it to the original size. Can anyone help me? Thanks in advance.
It appears that you're trying to convince Seaside to send UTF-8, right? I had the same problem when I was working with it. Then again, I didn't need complete texts, only the Euro currency sign. What I did was this: I kept my hands off WAListener and friends. Instead, I assembled the three bytes needed for a Euro sign by hand into a normal String and, lo and behold, it looked right in the browser.
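The hand-assembled Euro sign can be illustrated with a small Python sketch. This is not Squeak code, and the variable names are made up for illustration; it only assumes that a "normal String" holds one byte value per character and is sent out byte for byte.

```python
# The Euro sign is U+20AC; UTF-8 encodes it as three bytes.
euro_utf8 = "\u20ac".encode("utf-8")
print([hex(b) for b in euro_utf8])  # ['0xe2', '0x82', '0xac']

# Assemble those three byte values by hand into an ordinary string,
# one character per byte, as described in the post.
hand_made = "".join(chr(b) for b in (0xE2, 0x82, 0xAC))

# If the string is written to the wire unchanged (Latin-1 maps code
# points 0-255 to the same byte values), the browser receives valid UTF-8.
wire = hand_made.encode("latin-1")
assert wire.decode("utf-8") == "\u20ac"
```

The trick only works as long as nothing on the way out re-encodes the string.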
For Korean your approach is probably better, I guess. Mine was unsatisfying, too, because it assumed incoming text to be encoded in ISO-8859-15 (Latin-9) but emitted text in UTF-8. I thought the right approach might be to enhance the String class with an additional field for the encoding and give it some methods to transcode between encodings. By the way: the method asUtf8 (or something like that) does not work properly with the Euro sign.
For example, when you print this:
'Grüß Gott!' asUtf8
it looks awful, because the new string doesn't know it's UTF-8.
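The effect can be reproduced outside Squeak. The following Python sketch is only an analogy, under the assumption that the resulting byte string is displayed as if each byte were a Latin-1 character:

```python
# "Gruess Gott!" with u-umlaut (U+00FC) and sharp s (U+00DF)
text = "Gr\u00fc\u00df Gott!"

# UTF-8 turns each of the two non-ASCII characters into two bytes.
utf8_bytes = text.encode("utf-8")
print(len(text), len(utf8_bytes))  # 10 12

# A string that "doesn't know it's UTF-8" shows one character per byte.
misread = utf8_bytes.decode("latin-1")
print(repr(misread))

# Nothing is lost: reinterpreting the same bytes as UTF-8 restores the text.
assert misread.encode("latin-1").decode("utf-8") == text
```

The bytes themselves are correct; only the interpretation attached to them is wrong.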
So, it appears that for Seaside to cope with Korean well, you might have to change more than Seaside only.
niko
In reply to this post by Chun, Sungjin
2008/1/3, [hidden email] <[hidden email]>:
> Hi,
>
> I've managed to find and modify WAListenerEncoded so that it can process
> multibyte language - I've only tested it with Korean as UTF-8. During testing
> I found following problem.

Korean as UTF-8 should not work on WAListenerEncoded. If it does, then it's a bug in WAListenerEncoded. The reason is that Korean as UTF-8 violates the contract between the server adapter and you. The *Encoded* adapters give you Strings in Squeak encoding (well, not quite in the case of CJK, because that is not possible since Unicode does not have the concept of language tags) but in turn expect Strings in Squeak encoding. In the case of Korean this means WideStrings. UTF-8 Strings are ByteStrings and should therefore not work.

> When I use WAListener/WAListenerEncoded I cannot get FileLibrary registered
> image files correctly. I can get CSS file or script file correctly, but I cannot get
> image files.

I don't think WAListenerEncoded can ever work for binary files. The problem is that, due to its streaming nature, WAListenerEncoded (unlike WAKomEncoded) can never look at the response. This means it can never decide whether it should do encoding (based on the MIME type), so it always does it. In the case of binary content this is clearly wrong. Your best option (as always) is to serve static files (images, CSS, JavaScript) with Apache or something similar.

> It seems that when I use WAListener, the server sent the image file of the size
> of 16135 byte, but original file size is 10819 byte, and this might be the source
> of the problem. I cannot open wrong sized file even though I cut the size of the
> file to the original one.

WAListener should not do any encoding at all, so images should work. But then again, we don't know what code you changed, so we can't really help you. It would help if you sent us the image so we can test.

Cheers
Philippe
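The decision a non-streaming adapter could make can be sketched in Python. This is not Seaside code: `encode_body` is a hypothetical helper, and treating every `text/*` type as "needs UTF-8" is an assumption made for the example.

```python
def encode_body(content_type: str, body: str) -> bytes:
    """Hypothetical helper: encode text responses, pass binary through.

    `body` holds one character per code point; for binary content every
    code point is assumed to be a raw byte value in the range 0-255.
    """
    if content_type.startswith("text/"):
        return body.encode("utf-8")
    # Binary: write each byte value unchanged.
    return body.encode("latin-1")

png_magic = "\x89PNG\r\n\x1a\n"  # first eight bytes of any PNG file

# Decided by MIME type, the image bytes survive.
assert encode_body("image/png", png_magic) == b"\x89PNG\r\n\x1a\n"

# Encoded unconditionally, the first byte becomes two and the file is corrupt.
assert encode_body("text/html", png_magic) == b"\xc2\x89PNG\r\n\x1a\n"

# Pure ASCII content is unaffected either way.
assert encode_body("text/css", "body { margin: 0 }") == b"body { margin: 0 }"
```

The last assertion may be why the CSS and script files came through intact: UTF-8 leaves ASCII bytes as they are.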
>> It seems that when I use WAListener, the server sent the image file of the size
>> of 16135 byte, but original file size is 10819 byte, and this might be the source
>> of the problem. I cannot open wrong sized file even though I cut the size of the
>> file to the original one.
>
> WAListener should not do any encoding at all so images should work.
> But then again we don't know what code you changed so we can't really
> help you. It would help if you send us the image so we can test.

It's likely that your image's bytes are being UTF-8 encoded. If the byte values are uniformly distributed, you can expect a 50% increase in size (the low 128 byte values are passed through as they are, the high 128 byte values are expanded to two bytes each; (128+128*2)/256 = 1.5), and that's what you are seeing.

Paolo
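The arithmetic can be checked with a few lines of Python. This is a sketch with made-up variable names; the synthetic "file" simply contains every byte value equally often, which real images only approximate.

```python
# A synthetic binary file in which all 256 byte values occur equally often.
original = bytes(range(256)) * 4

# Treat each byte as a character and UTF-8 encode it, as suspected above.
inflated = original.decode("latin-1").encode("utf-8")

ratio = len(inflated) / len(original)
print(ratio)  # 1.5

# The sizes reported in this thread are close to that factor.
reported = 16135 / 10819
print(round(reported, 3))  # 1.491

# Cutting the inflated file back to the original length does not repair it:
# the extra bytes are interleaved with the data, not appended at the end.
truncated = inflated[: len(original)]
assert truncated != original
```

This also matches the observation that the file stays unreadable after being cut to its original size.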
2008/1/4, Paolo Bonzini <[hidden email]>:
> It's likely that your images' bytes are being UTF-8 encoded. If the
> bytes are uniform, you can expect a 50% increase in size (the low 128
> bytes are passed as they are, the high 128 bytes are expanded to two
> bytes; (128+128*2)/256 = 1.5) and that's what you are seeing.

But WAListener is like WAKom: it should not do any encoding unless someone changed the code.

Cheers
Philippe
In reply to this post by Chun, Sungjin
I've found the main reason for the image corruption: WAListenerEncoded uses UTF8Stream *unconditionally*; as you said, it does not decide based on the MIME type.

But I cannot understand why Korean as UTF-8 should not work. I have customized my image so that it supports Korean and other languages (Japanese and Chinese, though there is no font for those two). A WideString for Korean can be flawlessly converted to and from a UTF-8 encoded byte string. Is this the work done by WAListenerEncoded?

Thank you for your help. Now I'm trying to find the content-type of the WAResponse before using UTF8Stream.
In reply to this post by Chun, Sungjin
2008/1/5, [hidden email] <[hidden email]>:
> I've found the main reason of image corruption; that's because WAListenerEncoded
> does use UTF8Stream *unconditionally* as you said it does not decide based on mime
> type.
>
> But I cannot understand why Korean as UTF8 should not work.

Because WAListenerEncoded gives you Strings in Squeak encoding but also expects Strings from you to be in Squeak encoding. If you pass it Strings that are already in UTF-8, they get converted to UTF-8 twice.

> My image is customized by me
> so that it does support Korean and others(Japanese and Chinese but no font for these 2).
> WideString for korean can be flawlessly converted to/from UTF8 encoded byte string.

No, not at all. UTF-8 has no concept of language tags.

Cheers
Philippe
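The double conversion can be demonstrated with a short Python sketch. It is an analogy rather than Squeak code: the wide string is modelled as a Python `str`, and a string "already in UTF-8" as a `str` holding one character per UTF-8 byte.

```python
# Two Hangul syllables (U+D55C, U+AE00) as a proper wide string.
wide = "\ud55c\uae00"

# What the adapter expects: a wide string that it encodes exactly once.
once = wide.encode("utf-8")
print(len(once))  # 6

# What happens when the application hands over a string that is already
# UTF-8: every byte is treated as a character and encoded a second time.
already_utf8 = once.decode("latin-1")
twice = already_utf8.encode("utf-8")
print(len(twice))  # 12

# A browser decoding the result as UTF-8 gets 6 garbage characters,
# not the 2 syllables that were intended.
assert twice.decode("utf-8") != wide
```

Because every byte of the first encoding is above 0x7F here, the second pass doubles the length.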
In reply to this post by Chun, Sungjin
Ah, I've changed/added support for UnicodeEnvironment so that a UTF-8 encoded byte array can be converted to and from Squeak's internal encoding. With this, I can read UTF-8 encoded text (which can include Korean or other languages encoded as UTF-8) from within the Squeak environment, for example in the file list.

A language tag is not required because Unicode already has regions for Korean, Japanese, Chinese and every other language it supports. So we can determine from the byte value sequence which language region the sequence falls in.

Anyway, I'm currently looking for a way to determine the content-type of the WAResponse, so that UTF8Stream is not used if it's not text/html.

Thank you.
In reply to this post by Chun, Sungjin
2008/1/5, [hidden email] <[hidden email]>:
> Ah, I've changed/added support for UnicodeEnvironment so that UTF-8
> encoded byte array be converted to/from squeak's internal encoding.
> With this, I can read UTF-8 encoded text(which can include korean or
> other languages encoded as UTF-8) from squeak environment like
> file list.
>
> Language tag is not required because unicode does already has region for
> korea, japanese or chinese or any other languages supported by unicode.
> So we can determine from byte value sequence, in what language region
> does this byte sequence matches.

Uhm, no. Unicode does Han unification. So for some byte sequences there is no way of telling whether they're Chinese, Japanese or Korean.

Cheers
Philippe
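A concrete case of Han unification, sketched in Python with the standard `unicodedata` module. The choice of U+6F22 as the example character is mine, not from the thread.

```python
import unicodedata

# U+6F22 is written in Chinese text, as kanji in Japanese and as hanja
# in Korean. Unicode assigns all three uses the same code point.
han = "\u6f22"

print(unicodedata.name(han))  # CJK UNIFIED IDEOGRAPH-6F22

# The UTF-8 bytes are therefore identical whatever the source language,
# so the language cannot be recovered from the byte sequence alone.
print(han.encode("utf-8"))  # b'\xe6\xbc\xa2'
```

The character name carries no language, only the block it belongs to.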
In reply to this post by Chun, Sungjin
I do not know about the Han unification part. In fact, Hanja (the Chinese characters) are not included when I say Korean; I mean only Hangul, the Korean alphabet. That does have a dedicated region.
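Both points can be seen side by side in a small Python sketch. The helper `is_hangul_syllable` is hypothetical; the range used is the Unicode Hangul Syllables block, U+AC00 to U+D7A3.

```python
def is_hangul_syllable(ch: str) -> bool:
    """True if ch lies in the Unicode Hangul Syllables block."""
    return 0xAC00 <= ord(ch) <= 0xD7A3

# Precomposed Hangul syllables do have a region of their own.
assert is_hangul_syllable("\ud55c")  # U+D55C
assert is_hangul_syllable("\uae00")  # U+AE00

# A Hanja character used in Korean text sits in the unified CJK block
# instead, where it is indistinguishable from Chinese or Japanese use.
assert not is_hangul_syllable("\u6f22")
```

So a range test identifies Hangul, but it says nothing about any ideographs mixed into the same text.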
In reply to this post by Chun, Sungjin
I don't know about the Han unification part. In fact, Hanja (the Chinese characters) are not included when I say Korean; I mean only Hangul, the Korean alphabet. Hangul does have a dedicated region in Unicode.

----- Original Message -----
From: Philippe Marschall <[hidden email]>
To: The general-purpose Squeak developers list <[hidden email]>
Sent: 08-01-05 20:38:40
Subject: Re: Re: [Q] WAListener and WAFileLibrary problem

2008/1/5, [hidden email] <[hidden email]>:
> Language tag is not required because unicode does already has region for
> korea, japanese or chinese or any other languages supported by unicode.
> So we can determine from byte value sequence, in what language region
> does this byte sequence matches.

Uhm no. Unicode does Han-Unification. So for some byte sequences there is no way of telling whether they're Chinese, Japanese or Korean.

Cheers
Philippe |
In reply to this post by Chun, Sungjin
2008/1/5, [hidden email] <[hidden email]>:
> Ah, I've changed/added support for UnicodeEnvironment so that UTF-8
> encoded byte array be converted to/from squeak's internal encoding.
> With this, I can read UTF-8 encoded text(which can include korean or
> other languages encoded as UTF-8) from squeak environment like
> file list.
>
> Language tag is not required because unicode does already has region for
> korea, japanese or chinese or any other languages supported by unicode.
> So we can determine from byte value sequence, in what language region
> does this byte sequence matches.
>
> Anyway I'm currently finding ways for determining content-type of WAResponse,
> so that if it's not text/html UTF8Stream be not used.

This is the code we use in Seaside 2.9. You'll probably have to adapt it for Seaside 2.7:

    writeResponseForRequest: aRequest on: aStream
        | request response |
        request := self convertRequest: aRequest.
        request responseStream: aStream.
        response := entryPoint handleRequest: request.
        response ifNil: [ ^ self ].
        response class = WAResponse ifTrue: [
            aStream resetBuffers.
            response contentType isBinary ifTrue: [ aStream binary ] ].
        response writeOn: aStream.
        response release

Note that CSS and JS files are text too (and XML as well). This works for things like WAFileLibrary, which creates a new, non-streaming response.

Cheers
Philippe |
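To illustrate why the binary check matters, here is a minimal workspace sketch of what happens when binary data is pushed through a UTF-8 encoder. It does not use any Seaside or UTF8Stream API; it re-implements the two-byte UTF-8 rule by hand, and the sample bytes (the 8-byte PNG signature) are only an assumption for illustration, since the thread does not say what kind of image was served.

```smalltalk
"Each byte is (wrongly) treated as a character code. Codes below 128 stay
 one byte; codes from 128 to 255 become two UTF-8 bytes, so the data grows."
| original encoded |
original := #[137 80 78 71 13 10 26 10].   "PNG signature, illustrative input"
encoded := OrderedCollection new.
original do: [:byte |
    byte < 128
        ifTrue: [encoded add: byte]
        ifFalse: [
            "two-byte form: 110xxxxx 10xxxxxx"
            encoded add: (16rC0 bitOr: (byte bitShift: -6)).
            encoded add: (16r80 bitOr: (byte bitAnd: 16r3F))]].
Transcript show: original size printString, ' bytes in, ', encoded size printString, ' bytes out'; cr.
```

Only the first byte (137) is above 127, so the eight input bytes turn into nine. If the growth reported earlier in the thread (10819 bytes on disk, 16135 bytes sent) came from this effect alone, roughly 5316 bytes of the file would have had the high bit set. That is a guess, not something the thread confirms. It would also explain why cutting the download back to the original length does not help: the extra bytes are spread through the data rather than appended at the end.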
In reply to this post by Chun, Sungjin
Thank you for your help. I'll try this.
|
In reply to this post by Chun, Sungjin
So how do you know whether the UTF-8 byte sequence 0xE4 0xB8 0x8E (U+4E0E) is generic Chinese, traditional Chinese, simplified Chinese, Japanese or Korean?

Cheers
Philippe |
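For readers who want to check the numbers, the byte sequence in the question can be decoded by hand. This is a workspace sketch using only integer arithmetic (no converter classes), and it assumes a well-formed three-byte sequence without validating it.

```smalltalk
"Three-byte UTF-8 layout: 1110xxxx 10xxxxxx 10xxxxxx.
 Strip the marker bits and join the remaining 4 + 6 + 6 payload bits."
| bytes codePoint |
bytes := #[228 184 142].   "16rE4 16rB8 16r8E"
codePoint := (((bytes at: 1) bitAnd: 16r0F) bitShift: 12)
    + (((bytes at: 2) bitAnd: 16r3F) bitShift: 6)
    + ((bytes at: 3) bitAnd: 16r3F).
"codePoint is now 16r4E0E"
Transcript show: (codePoint printStringBase: 16); cr.
```

The result is the code point only. Nothing in the three bytes records which language the text was written in, which is the point of the question.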
Philippe Marschall wrote:
> So how do you know whether the utf-8 byte sequence 0xE4 0xB8 0x8E
> (U+4E0E) is generic Chinese, traditional Chinese, simplified Chinese,
> Japanese or Korean?

Korean only uses CJK characters in a few cases (some personal names, for example). Most of the time Korean is written in a phonetic/syllabic alphabet called Hangul, which occupies a different Unicode block.

Paolo |
In reply to this post by Chun, Sungjin
The value you gave represents the Han character pronounced "Ye" in Korea (Chinese or Japanese would pronounce it differently). As I said in my previous mail, and as Paolo said, that character is Hanja, i.e. a Chinese character, so I cannot determine whether it's Chinese, Japanese or Korean. Likewise, given 'a' or 'A' you cannot say whether it's English or French. But if the given character is Hangul, then I can say it's Hangul, and it probably represents Korean (though other languages can be written in Hangul).

Sorry for my poor English.

PS) Why have we and other East Asian countries been using Han characters? That's because they didn't have their own alphabets. For example, we Koreans also used Hanja until Hangul was invented (in the time of Sejong the Great, if you want to know the history :-). |
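The distinction drawn above can be sketched as a range check on the code point. The block boundaries are the Unicode ones (Hangul Syllables U+AC00 to U+D7AF, CJK Unified Ideographs U+4E00 to U+9FFF). The block name blockOf and the returned labels are made up for this example, and other Hangul and CJK blocks are left out to keep it short.

```smalltalk
"Classify a code point by Unicode block. A Hangul hit suggests Korean text;
 a unified ideograph says nothing about Chinese vs. Japanese vs. Korean."
| blockOf |
blockOf := [:cp |
    (cp between: 16rAC00 and: 16rD7AF)
        ifTrue: ['Hangul Syllables']
        ifFalse: [
            (cp between: 16r4E00 and: 16r9FFF)
                ifTrue: ['CJK Unified Ideographs (language not determinable)']
                ifFalse: ['other']]].
Transcript show: (blockOf value: 16rD55C); cr.   "U+D55C, a Hangul syllable"
Transcript show: (blockOf value: 16r4E0E); cr.   "the character from this thread"
```

The first call lands in the Hangul branch and the second in the unified-ideograph branch, which matches both sides of the discussion: Hangul can be recognized from the code point, Han characters cannot be assigned to a language that way.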
> With your given value, it does represent the Han Character pronounced as (in
> Korea, chinese or japanese would have different pronunciation) "Ye".

In Japanese, its pronunciation is U+3088, or close to "Yo".

> As I said in my previous mail and as Paolo said, above character is Hanja or
> Chinese letter so I cannot determine whether it's Chinese, Japanese or
> Korean. For example, with 'a' or 'A' you cannot say that it's English or French
> or so.

Yes. Imagine if "i" and "j" had been unified in Unicode (i.e., shared the same code point) because their origin is the same (i.e., "j" was/is a longer variation of "i"): using Unicode for many European languages would have been hard.

-- Yoshiki |