It seems to me that the clipboard primitives explicitly use UTF-8 encoding on Win32 platforms.
See, for example, https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/aed5e3391301011cc6b9ee6a353ee563f4ab6dbd/platforms/win32/vm/sqWin32Window.c:

    /* Convert data to Unicode UTF16. */
    MultiByteToWideChar( CP_UTF8, 0, cvt, -1, out, wcharsNeeded );

    /* Send the Unicode text to the clipboard. */
    EmptyClipboard();
    SetClipboardData(CF_UNICODETEXT, h);

and:

    /* Get clipboard data in Unicode format */
    h = GetClipboardData(CF_UNICODETEXT);
    src = GlobalLock(h);

    /* How many bytes do we need to store the UTF8 representation? */
    bytesNeeded = WideCharToMultiByte(CP_UTF8, 0, src, -1,
                                      NULL, 0, NULL, NULL );

    /* Convert Unicode text to UTF8. */
    cvt = tmp = malloc(bytesNeeded);
    WideCharToMultiByte(CP_UTF8, 0, src, -1, tmp, bytesNeeded, NULL, NULL);

So it seems to me that:
1) all the squeakToMac sends found in the various ClipboardInterpreter subclasses (at least the Win32 ones) are completely obsolete;
2) all the exotic ClipboardInterpreter subclasses except UTF8ClipboardInterpreter are themselves obsolete and could simply be withdrawn from service.

Did I miss something, or can I use the high-pressure cleaner in this area?
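Point 1 can be made concrete with a minimal portable C sketch. This is not VM code: the function names `latin1_to_utf8` and `utf8_structurally_valid` are hypothetical. The sketch assumes that a Squeak ByteString holds Latin-1 and that the clipboard primitive treats the bytes it receives as UTF-8, as the `CP_UTF8` calls quoted above do. It also assumes, for illustration, that a string was passed through a MacRoman conversion (as squeakToMac would do) before reaching the primitive.

```c
#include <stddef.h>

/* Encode a Latin-1 byte string as UTF-8. Each Latin-1 byte is already a
   Unicode code point (U+0000..U+00FF), so it needs one or two UTF-8 bytes.
   `out` must have room for 2 * len + 1 bytes. Returns the encoded length. */
static size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[n++] = c;                                   /* ASCII: one byte */
        } else {
            out[n++] = (unsigned char)(0xC0 | (c >> 6));    /* lead byte */
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));  /* continuation */
        }
    }
    out[n] = 0;  /* NUL-terminate, like the -1 length the Win32 code passes */
    return n;
}

/* Structural UTF-8 check: each lead byte must be followed by the right
   number of 10xxxxxx continuation bytes. It does not reject overlong
   forms or surrogates; it is only meant for this illustration. */
static int utf8_structurally_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;
        if (c < 0x80)                extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return 0;               /* stray continuation byte or invalid lead */
        if (extra > len - i - 1)
            return 0;                /* sequence truncated at end of input */
        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;
        i += extra + 1;
    }
    return 1;
}
```

When the Latin-1 bytes for "café" (`63 61 66 E9`) are encoded, the result is the five bytes `63 61 66 C3 A9`, which pass the check. After a Latin-1 to MacRoman step, however, 'é' becomes the single byte 0x8E. That is a lone continuation byte, so a UTF-8 decoder such as the `CP_UTF8` conversion above cannot turn it back into 'é'.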
> On 11.06.2019, at 18:22, Nicolas Cellier wrote:
>
> Did I miss something, or can I use the high pressure cleaner in this area?

Powerwash all the things!
Let's have UTF-8 for everything external (well, except CJK-locale objects, but there we have the leading-char thing anyway, right?)

-t
> On 2019-06-11, at 10:06 AM, Tobias Pape wrote:
>
> Powerwash all the things!

I like the general approach of cleaning things up, but do remember to test that older images still work afterwards. Eliot has nicely explained the requirement many times in the past.

tim
--
tim Rowledge; http://www.rowledge.org/tim
"Bother" said Pooh, as the IRS kicked his door in.
> On 11.06.2019, at 20:07, tim Rowledge wrote:
>
> I like the general approach of cleaning things up but do remember to test that the older images are ok afterwards. Eliot has nicely explained the requirement many times in the past.

However, there were two steps: (a) the change from the “traditional” Squeak encoding (i.e., MacRoman) to Latin-1 (aka ISO-8859-1, hence "squeakToIso"), and (b) the later “rebrand” of bytestring-is-latin1/widestring-is-utf32 as "just" Unicode[1]. Many people have since forgotten step (a). _If_ old images did not work with the changes Nicolas proposes, they would have been hosed long ago anyway.

Best regards
-Tobias

[1]: This is the most amazing encoding "hack" I have seen. I think it is actually why Unicode below code point 256 is the way it is: Latin-1 strings are simply valid Unicode. I know it was not done _because_ of Squeak, but Andreas obviously saw the chance and took it.
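Step (a) and footnote [1] can be illustrated with a small C sketch. Bytes from a Latin-1 ByteString map to Unicode code points unchanged, while bytes from a legacy MacRoman string need a lookup table. The names below are hypothetical. The table is deliberately partial, covering only 0x80 to 0x8F, and this is not Squeak's actual squeakToIso implementation.

```c
#include <stdint.h>

/* Partial MacRoman -> Unicode table: only bytes 0x80..0x8F are filled in,
   for illustration (A-umlaut, A-ring, C-cedilla, ..., e-acute, e-grave).
   A real converter needs all 128 upper-half entries. */
static const uint16_t MACROMAN_80_8F[16] = {
    0x00C4, 0x00C5, 0x00C7, 0x00C9, 0x00D1, 0x00D6, 0x00DC, 0x00E1,
    0x00E0, 0x00E2, 0x00E4, 0x00E3, 0x00E5, 0x00E7, 0x00E9, 0x00E8
};

/* Legacy (pre-Latin-1) Squeak text: the upper half needs a table lookup.
   Returns U+FFFD for bytes this partial table does not cover. */
static uint32_t macroman_to_codepoint(unsigned char b)
{
    if (b < 0x80)
        return b;                        /* ASCII range is shared */
    if (b <= 0x8F)
        return MACROMAN_80_8F[b - 0x80];
    return 0xFFFD;                       /* not covered by this sketch */
}

/* Latin-1 text (today's ByteString): the byte value IS the code point. */
static uint32_t latin1_to_codepoint(unsigned char b)
{
    return b;
}
```

For example, 0x8E is 'é' (U+00E9) in MacRoman but the C1 control character U+008E in Latin-1. Text that still carries MacRoman bytes would therefore be misread if treated as Latin-1. Text that is already Latin-1 needs no table at all.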
On Tue, Jun 11, 2019 at 11:19 AM Tobias Pape wrote:

The story was a bit more complicated, but we documented some ideas here:

http://www.vpri.org/pdf/ohshima_c5.pdf

-- Yoshiki
Dear Yoshiki
> On 11.06.2019, at 20:23, Yoshiki Ohshima wrote:
>
> The story was a bit more complicated, but we documented some ideas here:
>
> http://www.vpri.org/pdf/ohshima_c5.pdf

Thanks for setting me straight! I didn't know this document existed; I will read it. Finally I'll understand some design decisions, e.g., regarding TTCFonts :)

Best regards
-Tobias
How do I load Japanese support into Squeak 5.2? I bounced between the Swiki and SqueakMap without much success.
Chris