
primitiveClipboardText mangling line endings


Levente Uzonyi-2

primitiveClipboardText mangling line endings

Hi All,

I found that it's impossible to copy text from one image to another without
losing all CR characters.
If you open two images, and evaluate [Clipboard default
primitiveClipboardText: #[13].] in one of them, then evaluate
[Clipboard default primitiveClipboardText asByteArray.] in the other,
you'll get #[] - an empty ByteArray. So CRs are lost. The same thing
happens with longer strings too.
Using the same method, I found that LFs and CRLFs are converted to CRs.
I found this issue on Ubuntu 14.04 with Cog 3018 and 3104. I haven't tried
any other VMs or OSes.

Levente
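
For readers following along, here is a toy model of the round trip as reported in this post. It is only a sketch of one mapping that is consistent with the three observations (CR lost, LF becomes CR, CRLF becomes CR); it is not the VM's code, and `model_round_trip` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of what the receiving image appears to see:
   every CR (13) is dropped and every LF (10) is turned into CR.
   Rewrites buf in place and returns the new length. */
size_t model_round_trip(unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 13)
            continue;                              /* CR is lost */
        buf[out++] = (buf[i] == 10) ? 13 : buf[i]; /* LF becomes CR */
    }
    return out;
}
```

Under this model #[13] comes back empty, #[10] comes back as #[13], and #[13 10] also comes back as #[13], which matches the report. Whether the VM really works this way is not established here.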

David T. Lewis

Re: primitiveClipboardText mangling line endings

On Sat, Oct 25, 2014 at 09:04:29PM +0200, Levente Uzonyi wrote:

>
> Hi All,
>
> I found that it's impossible to copy text from one image to another without
> losing all CR characters.
> If you open two images, and evaluate [Clipboard default
> primitiveClipboardText: #[13].] in one of them, then evaluate
> [Clipboard default primitiveClipboardText asByteArray.] in the other,
> you'll get #[] - an empty ByteArray. So CRs are lost. The same thing
> happens with longer strings too.
> Using the same method, I found that LFs and CRLFs are converted to CRs.
> I found this issue on Ubuntu 14.04 with Cog 3018 and 3104. I haven't tried
> any other VMs or OSes.
>

This has been the case on the unix VM for quite a while. I don't recall the
reason for it being done this way, and I'm not sure if it is a bug or a feature.

Does anyone remember?

Dave

Eliot Miranda-2

Re: primitiveClipboardText mangling line endings

In reply to this post by Levente Uzonyi-2

Hi Levente,

On Sat, Oct 25, 2014 at 12:04 PM, Levente Uzonyi <[hidden email]> wrote:

> Hi All,
>
> I found that it's impossible to copy text from one image to another without losing all CR characters.
> If you open two images, and evaluate [Clipboard default primitiveClipboardText: #[13].] in one of them, then evaluate [Clipboard default primitiveClipboardText asByteArray.] in the other, you'll get #[] - an empty ByteArray. So CRs are lost. The same thing happens with longer strings too.
> Using the same method, I found that LFs and CRLFs are converted to CRs.
> I found this issue on Ubuntu 14.04 with Cog 3018 and 3104. I haven't tried any other VMs or OSes.

I can't reproduce this. I just copied some text in one image on linux (running a very recent Spur VM, which has the same platform subsystem as the Cog VM) and in another image evaluated

Clipboard default clipboardText occurrencesOf: Character cr

which answered

Is there some magic X11 setting that could account for your issue?

--
best,

Eliot

Levente Uzonyi-2

Re: primitiveClipboardText mangling line endings

Hi Eliot,

On Sat, 25 Oct 2014, Eliot Miranda wrote:

> Is there some magic X11 setting that could account for your issue?

No, at least not intentionally. I tried to check if there are any settings
that could affect the clipboard encoding, but I didn't find anything. It
seems like only utf-8 is supported.

I'm pretty sure it's related to the VM, because when I copy some text from
another application, and paste it into an image, then the line endings are
converted to CRs, which is very unlikely to happen on linux.

I tried to copy all 7-bit ascii characters to the clipboard (besides zero
which is not possible), because those are the same in utf-8:

Clipboard default primitiveClipboardText: (1 to: 127) asByteArray.

When I checked it with xclip, it turned out that all the bytes are on the
clipboard:

$ xclip -o -selection clipboard | hexdump -b;
0000000 001 002 003 004 005 006 007 010 011 012 013 014 012 016 017 020
0000010 021 022 023 024 025 026 027 030 031 032 033 034 035 036 037 040
0000020 041 042 043 044 045 046 047 050 051 052 053 054 055 056 057 060
0000030 061 062 063 064 065 066 067 070 071 072 073 074 075 076 077 100
0000040 101 102 103 104 105 106 107 110 111 112 113 114 115 116 117 120
0000050 121 122 123 124 125 126 127 130 131 132 133 134 135 136 137 140
0000060 141 142 143 144 145 146 147 150 151 152 153 154 155 156 157 160
0000070 161 162 163 164 165 166 167 170 171 172 173 174 175 176 177
000007f

But in the other image, the bytes were filtered:

Clipboard default primitiveClipboardText asByteArray. #[9 13 27 32 33 34
35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84
85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106
107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124
125 126]

Levente
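
A note on reading the dump above: `hexdump -b` prints octal, so the columns line up with C octal literals. In the dump as quoted, the thirteenth byte reads 012 (LF) rather than 015 (CR), so the CR may already have been rewritten before the text reached the X11 clipboard. The sketch below checks this mechanically; `first_mismatch` and `dump_head` are hypothetical names, and only the first row of the dump is reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* First 16 bytes of the xclip dump, copied as C octal literals
   straight from the hexdump -b output. */
static const unsigned char dump_head[16] = {
    001, 002, 003, 004, 005, 006, 007, 010,
    011, 012, 013, 014, 012, 016, 017, 020
};

/* Hypothetical helper: the sender wrote bytes 1..n, so the byte at
   0-based index i should equal i + 1. Returns the index of the first
   byte that differs, or n if every byte matches. */
size_t first_mismatch(const unsigned char *bytes, size_t n)
{
    assert(bytes != NULL);
    for (size_t i = 0; i < n; i++)
        if (bytes[i] != (unsigned char)(i + 1))
            return i;
    return n;
}
```

Calling `first_mismatch(dump_head, 16)` points at index 12, the slot where byte 13 (CR) was written and 10 (LF) was found.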

Eliot Miranda-2

Re: primitiveClipboardText mangling line endings

On Sat, Oct 25, 2014 at 4:18 PM, Levente Uzonyi <[hidden email]> wrote:

> Hi Eliot,
>
> On Sat, 25 Oct 2014, Eliot Miranda wrote:
>
>> Is there some magic X11 setting that could account for your issue?
>
> No, at least not intentionally. I tried to check if there are any settings that could affect the clipboard encoding, but I didn't find anything. It seems like only utf-8 is supported.
>
> I'm pretty sure it's related to the VM, because when I copy some text from another application, and paste it into an image, then the line endings are converted to CRs, which is very unlikely to happen on linux.
>
> I tried to copy all 7-bit ascii characters to the clipboard (besides zero which is not possible), because those are the same in utf-8:
>
> Clipboard default primitiveClipboardText: (1 to: 127) asByteArray.
>
> When I checked it with xclip, it turned out that all the bytes are on the clipboard:
>
> $ xclip -o -selection clipboard | hexdump -b;
> 0000000 001 002 003 004 005 006 007 010 011 012 013 014 012 016 017 020
> 0000010 021 022 023 024 025 026 027 030 031 032 033 034 035 036 037 040
> 0000020 041 042 043 044 045 046 047 050 051 052 053 054 055 056 057 060
> 0000030 061 062 063 064 065 066 067 070 071 072 073 074 075 076 077 100
> 0000040 101 102 103 104 105 106 107 110 111 112 113 114 115 116 117 120
> 0000050 121 122 123 124 125 126 127 130 131 132 133 134 135 136 137 140
> 0000060 141 142 143 144 145 146 147 150 151 152 153 154 155 156 157 160
> 0000070 161 162 163 164 165 166 167 170 171 172 173 174 175 176 177
> 000007f
>
> But in the other image, the bytes were filtered:
>
> Clipboard default primitiveClipboardText asByteArray. #[9 13 27 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126]

Ugh, *why* does X11 have to be so complicated? Why does Ian's VM code have to be so complicated? I've discovered that there's a -textenc flag for the VM. If you do

squeak -textenc UTF8 myimage.image

in both images then you'll get all 127 characters copied across. I'd immediately make this the default but

a) there is no command-line argument to select the default, whatever that is

b) I *don't know* what the default is called, so I can't figure out a name. It's not that simple to determine. Here's the operative code from platforms/unix/vm-display-X11/sqUnixX11.c:

static char *getSelectionFrom(Atom source)
{
  char * data= NULL;
  size_t bytes= 0;

  /* request the selection */
  Atom target= textEncodingUTF8 ? xaUTF8String : (localeEncoding ? xaCompoundText : XA_STRING);

Further down there's

# if defined(X_HAVE_UTF8_STRING)
  if (uxUTF8Encoding == sqTextEncoding)
    Xutf8TextPropertyToTextList(stDisplay, &textProperty, &strList, &n);
  else
# endif
    XmbTextPropertyToTextList(stDisplay, &textProperty, &strList, &n);

So I guess at one point UTF8 support was added, hence it not being the default. Any objections to us making it the default now?

Ugh...

--
best,

Eliot
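
To make the nested conditional in getSelectionFrom easier to follow, here is a standalone sketch of the same decision. It does not touch X11; the enum, the function names, and the assumption that xaCompoundText corresponds to the COMPOUND_TEXT atom are illustrative and are not taken from sqUnixX11.c.

```c
#include <assert.h>

/* Hypothetical stand-ins for the three X11 target atoms. */
typedef enum {
    TARGET_STRING,
    TARGET_COMPOUND_TEXT,
    TARGET_UTF8_STRING
} target_kind;

/* Mirrors the ternary quoted above: UTF-8 wins if enabled,
   otherwise the locale encoding decides between the other two. */
target_kind choose_target(int textEncodingUTF8, int localeEncoding)
{
    if (textEncodingUTF8)
        return TARGET_UTF8_STRING;
    return localeEncoding ? TARGET_COMPOUND_TEXT : TARGET_STRING;
}

/* Printable name for a target, for logging or debugging. */
const char *target_name(target_kind t)
{
    switch (t) {
    case TARGET_UTF8_STRING:   return "UTF8_STRING";
    case TARGET_COMPOUND_TEXT: return "COMPOUND_TEXT";
    case TARGET_STRING:        return "STRING";
    }
    assert(!"unknown target_kind");
    return "";
}
```

So unless UTF-8 is selected (for example with -textenc UTF8), the request falls through to one of the two older targets.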

Levente Uzonyi-2

Re: primitiveClipboardText mangling line endings

Nice find. The textenc parameter works. I found this in the man page of
squeakvm:

-textenc enc
    specifies the external character encoding to be used by
    Squeak when exchanging clipboard text with other
    applications. The default is UTF-8 on Mac OS X and
    ISO-8859-15 (aka Latin9) on other Unix systems. Note that X11
    applications requesting the selection converted to
    UTF8_STRING data will (correctly) receive the clipboard text
    encoded as UTF-8, regardless of this setting.
    Squeak recognizes a subset of the encoding names defined by
    the IANA. (If you prefer to use the international currency
    symbol rather than the Euro symbol in external text then you
    might want to set this to ISO-8859-1, aka Latin1.)

So the default encoding is Latin9, but that doesn't make any sense to me.

Levente
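
The man page's remark about the Euro sign comes down to a single byte: 0xA4 is the Euro sign (U+20AC) in ISO-8859-15 and the international currency sign (U+00A4) in ISO-8859-1. The sketch below shows what that byte would become when re-encoded as UTF-8. `decode_a4` and `utf8_encode` are hypothetical helpers, not VM functions, and the encoder deliberately covers only code points below U+10000.

```c
#include <assert.h>
#include <stddef.h>

/* Code point for byte 0xA4 under the two encodings the man page names:
   ISO-8859-15 (Latin9) gives U+20AC, ISO-8859-1 (Latin1) gives U+00A4. */
unsigned int decode_a4(int latin9)
{
    return latin9 ? 0x20ACu : 0x00A4u;
}

/* Encode one code point below U+10000 as UTF-8.
   Returns the number of bytes written to out (1 to 3).
   Surrogate code points are not rejected in this sketch. */
size_t utf8_encode(unsigned int cp, unsigned char out[3])
{
    assert(cp < 0x10000u);
    if (cp < 0x80u) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800u) {
        out[0] = (unsigned char)(0xC0u | (cp >> 6));
        out[1] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 2;
    }
    out[0] = (unsigned char)(0xE0u | (cp >> 12));
    out[1] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
    out[2] = (unsigned char)(0x80u | (cp & 0x3Fu));
    return 3;
}
```

With Latin9 the byte becomes the three bytes E2 82 AC; with Latin1 it becomes the two bytes C2 A4. The 7-bit range used in the test earlier in the thread is identical in all three encodings.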


timrowledge

Re: primitiveClipboardText mangling line endings

In reply to this post by Eliot Miranda-2

On 25-10-2014, at 4:52 PM, Eliot Miranda <[hidden email]> wrote:

>
> Ugh, *why* does X11 have to be so complicated?

Because the unix world hates everyone. X11 hates unix for making things too easy.

> Why does Ian's VM code have to be so complicated?

Because he tries to cope with the above multiplied by about fifty different versions, all choosing different ways to express their spite.

We should all just switch to RISC OS. :-)

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Meets quality standards: It compiles without errors.