
Some Win32 ClipboardInterpreter still use squeakToMac, why???

7 messages

Nicolas Cellier

Some Win32 ClipboardInterpreter still use squeakToMac, why???

It seems to me that Clipboard primitives explicitly use UTF8 encoding on Win32 platforms.

See for example https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/aed5e3391301011cc6b9ee6a353ee563f4ab6dbd/platforms/win32/vm/sqWin32Window.c

/* Convert data to Unicode UTF16. */
MultiByteToWideChar( CP_UTF8, 0, cvt, -1, out, wcharsNeeded );

/* Send the Unicode text to the clipboard. */
EmptyClipboard();
SetClipboardData(CF_UNICODETEXT, h);

and:

/* Get clipboard data in Unicode format */
h = GetClipboardData(CF_UNICODETEXT);
src = GlobalLock(h);

/* How many bytes do we need to store the UTF8 representation? */
bytesNeeded = WideCharToMultiByte(CP_UTF8, 0, src, -1,
NULL, 0, NULL, NULL );

/* Convert Unicode text to UTF8. */
cvt = tmp = malloc(bytesNeeded);
WideCharToMultiByte(CP_UTF8, 0, src, -1, tmp, bytesNeeded, NULL, NULL);

So it seems to me that:
1) all the squeakToMac sends found in various ClipboardInterpreter subclasses (the Win32 ones at least) are completely obsolete
2) all the exotic ClipboardInterpreter subclasses, except UTF8ClipboardInterpreter, are themselves obsolete and could simply be withdrawn from service
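
To make the consequence concrete, here is a minimal sketch in portable C of the only conversion the image side would need if its byte strings are Latin-1 and the primitive expects UTF-8. The function name latin1_to_utf8 and its buffer contract are my own assumptions for illustration; this is not code from the VM or from any ClipboardInterpreter class.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the VM): encode a Latin-1 byte string
 * as UTF-8. Every Latin-1 byte is the Unicode code point of the same value,
 * so bytes below 0x80 are copied and the rest become two-byte sequences.
 * dst must have room for 2 * len + 1 bytes.
 * Returns the number of bytes written, excluding the terminating NUL. */
size_t latin1_to_utf8(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            dst[out++] = c;                                   /* ASCII: unchanged */
        } else {
            dst[out++] = (unsigned char)(0xC0 | (c >> 6));    /* lead byte */
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F));  /* continuation byte */
        }
    }
    dst[out] = '\0';
    return out;
}
```

No MacRoman table appears anywhere in this path, which is why a squeakToMac step in front of a UTF-8 primitive looks out of place.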

Did I miss something, or can I use the high pressure cleaner in this area?

Tobias Pape

Re: Some Win32 ClipboardInterpreter still use squeakToMac, why???

> On 11.06.2019, at 18:22, Nicolas Cellier <[hidden email]> wrote:
>
> It seems to me that Clipboard primitives explicitly use UTF8 encoding on Win32 platforms.
>
> See for example https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/aed5e3391301011cc6b9ee6a353ee563f4ab6dbd/platforms/win32/vm/sqWin32Window.c
>
> /* Convert data to Unicode UTF16. */
> MultiByteToWideChar( CP_UTF8, 0, cvt, -1, out, wcharsNeeded );
>
> /* Send the Unicode text to the clipboard. */
> EmptyClipboard();
> SetClipboardData(CF_UNICODETEXT, h);
>
> and:
>
> /* Get clipboard data in Unicode format */
> h = GetClipboardData(CF_UNICODETEXT);
> src = GlobalLock(h);
>
> /* How many bytes do we need to store the UTF8 representation? */
> bytesNeeded = WideCharToMultiByte(CP_UTF8, 0, src, -1,
> NULL, 0, NULL, NULL );
>
> /* Convert Unicode text to UTF8. */
> cvt = tmp = malloc(bytesNeeded);
> WideCharToMultiByte(CP_UTF8, 0, src, -1, tmp, bytesNeeded, NULL, NULL);
>
> So it seems to me that:
> 1) all the squeakToMac sends found in various ClipboardInterpreter subclasses (the Win32 ones at least) are completely obsolete
> 2) all the exotic ClipboardInterpreter subclasses, except UTF8ClipboardInterpreter, are themselves obsolete and could simply be withdrawn from service
>
> Did I miss something, or can I use the high pressure cleaner in this area?
>

Powerwash all the things!
Let's have UTF-8 for everything external (well, except CJK-locales object, but there we have the leading-char thing anyway, right?)
-t

timrowledge

Re: Some Win32 ClipboardInterpreter still use squeakToMac, why???

> On 2019-06-11, at 10:06 AM, Tobias Pape <[hidden email]> wrote:
>
> Powerwash all the things!

I like the general approach of cleaning things up but do remember to test that the older images are ok afterwards. Eliot has nicely explained the requirement many times in the past.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
"Bother" said Pooh, as the IRS kicked his door in.

Tobias Pape

Re: Some Win32 ClipboardInterpreter still use squeakToMac, why???

> On 11.06.2019, at 20:07, tim Rowledge <[hidden email]> wrote:
>
>> On 2019-06-11, at 10:06 AM, Tobias Pape <[hidden email]> wrote:
>>
>> Powerwash all the things!
>
> I like the general approach of cleaning things up but do remember to test that the older images are ok afterwards. Eliot has nicely explained the requirement many times in the past.

That's true and I am in no way opposed to that.

However, with (a) the change from the "traditional" Squeak encoding (i.e., MacRoman) to Latin-1 (aka ISO-8859-1, hence "squeakToIso") and (b) the later "rebrand" of ByteString-is-Latin-1/WideString-is-UTF-32 to "just" Unicode[1], step (a) became lost to many people.

_If_ old images did not work with the changes Nicolas proposes, they would have been hosed long ago anyway.

Best regards
-Tobias

[1]: this is the most amazing encoding "hack" I have seen, and I think that's actually why Unicode for codepoints <256 is the way it is: that latin1 strings are just that: valid unicode. I know it was not done _because_ of Squeak, but Andreas obviously saw the chance and took it.
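
A small C sketch of the difference, under stated assumptions: the function names are made up for illustration, and the MacRoman table is deliberately truncated to the 0x80..0x8F row. Latin-1 needs no table at all because the byte value already is the code point; MacRoman needs a lookup for everything above ASCII.

```c
#include <stdint.h>

/* Latin-1: the byte value is the Unicode code point, so widening is enough. */
uint32_t latin1_to_codepoint(unsigned char b)
{
    return b;
}

/* MacRoman bytes 0x80..0x8F and their Unicode code points
 * (only this one row of the table is reproduced here). */
static const uint16_t macroman_80_8f[16] = {
    0x00C4, 0x00C5, 0x00C7, 0x00C9, 0x00D1, 0x00D6, 0x00DC, 0x00E1,
    0x00E0, 0x00E2, 0x00E4, 0x00E3, 0x00E5, 0x00E7, 0x00E9, 0x00E8
};

/* Hypothetical helper: maps MacRoman bytes 0x00..0x8F to code points.
 * Bytes from 0x90 up return U+FFFD only because the table above is
 * truncated, not because they are unmapped in MacRoman. */
uint32_t macroman_to_codepoint(unsigned char b)
{
    if (b < 0x80) return b;                       /* ASCII range is shared */
    if (b < 0x90) return macroman_80_8f[b - 0x80];
    return 0xFFFD;
}
```

For example, "é" (U+00E9) is byte 0xE9 in Latin-1 but byte 0x8E in MacRoman, so applying a MacRoman conversion to a string that is already Latin-1 or Unicode changes every accented character.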

Yoshiki Ohshima-3

Re: Some Win32 ClipboardInterpreter still use squeakToMac, why???

On Tue, Jun 11, 2019 at 11:19 AM Tobias Pape <[hidden email]> wrote:

> [1]: this is the most amazing encoding "hack" I have seen, and I think that's actually why Unicode for codepoints <256 is the way it is: that latin1 strings are just that: valid unicode. I know it was not done _because_ of Squeak, but Andreas obviously saw the chance and took it.

The story was a bit more complicated, but we documented some ideas here:

http://www.vpri.org/pdf/ohshima_c5.pdf

-- Yoshiki

Tobias Pape

Re: Some Win32 ClipboardInterpreter still use squeakToMac, why???

Dear Yoshiki

> On 11.06.2019, at 20:23, Yoshiki Ohshima <[hidden email]> wrote:
>
>
> On Tue, Jun 11, 2019 at 11:19 AM Tobias Pape <[hidden email]> wrote:
>
> [1]: this is the most amazing encoding "hack" I have seen, and I think that's actually why Unicode for codepoints <256 is the way it is: that latin1 strings are just that: valid unicode. I know it was not done _because_ of Squeak, but Andreas obviously saw the chance and took it.
>
>
> The story was a bit more complicated, but we documented some ideas here:
>
> http://www.vpri.org/pdf/ohshima_c5.pdf

Thanks for setting me straight!
I didn't know this document existed; I will read it. Finally I'll understand some design decisions, e.g., regarding TTCFonts :)

Best regards
-Tobias

Chris Cunnington-4

Re: Some Win32 ClipboardInterpreter still use squeakToMac, why???

How do I load Japanese into Squeak 5.2? I bounced between the Swiki and SqueakMap without much success.

Chris

> On Jun 11, 2019, at 2:29 PM, Tobias Pape <[hidden email]> wrote:
>
> Dear Yoshiki
>
>> On 11.06.2019, at 20:23, Yoshiki Ohshima <[hidden email]> wrote:
>>
>>
>> On Tue, Jun 11, 2019 at 11:19 AM Tobias Pape <[hidden email]> wrote:
>>
>> [1]: this is the most amazing encoding "hack" I have seen, and I think that's actually why Unicode for codepoints <256 is the way it is: that latin1 strings are just that: valid unicode. I know it was not done _because_ of Squeak, but Andreas obviously saw the chance and took it.
>>
>>
>> The story was a bit more complicated, but we documented some ideas here:
>>
>> http://www.vpri.org/pdf/ohshima_c5.pdf
>
>
> Thanks for setting me straight!
> I didn't know this document existed; I will read it. Finally I'll understand some design decisions, e.g., regarding TTCFonts :)
>
> Best regards
> -Tobias
>
>