Hello,
I have a problem with copying and pasting text with diacritics into VisualWorks. For example, instead of the letter 'ž', the ByteArray for '\u017e' appears. That is literally the Unicode code point for 'ž', but it should be stored in two bytes, not in six bytes as a Unicode escape inside the string.

Other observations:
1. Some letters are fine (like 'á', 'é').
2. VisualWorks even changes the system clipboard somehow. If 'ž' is copied from an application, pasted into VisualWorks (incorrectly), and then pasted back into the application, it is wrong there as well.
3. The paste result in VisualWorks differs depending on the source application.

I think the problem is in the Screen>>getExternalSelection: method, with primitive: 910. It succeeds, but returns the wrong result.

I am using VisualWorks 7.4; I also tried 7.6, with the same problems. The OS is Linux (X.org, Gnome environment), with the system encoding set to UTF-8 or ISO8859-2. It works on MS Windows.

Is there a way to solve this? I need to copy and paste text (no pictures or anything more complicated) between applications.

Thank you in advance,
Jura

--
Juraj Kubelka
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
I can only speak about the Mac OS X clipboard, but it should be the same on other OSs. Basically, each application is responsible for what it puts on the clipboard: it provides both the types of data and the data itself. If an application puts corrupt data on the clipboard and you read this data, make sure you validate it somehow.

Kind Regards
Karsten

--
Karsten Kusche - Dipl.Inf. - [hidden email]
Tel: +49 3496 21 43 29
Georg Heeg eK - Köthen
Handelsregister: Amtsgericht Dortmund A 12812
Thank you, Karsten!
Generally, I assumed the other applications are correct. I tried it with OpenOffice.org, Firefox, and gedit (Gnome Text Editor). Copy and paste works among them, but not with VisualWorks.

Can I somehow explore the system clipboard in VW, for example to see the clipboard semantics (whether it is text, HTML, RTF, JPEG, ...)? So far I have only found Screen>>getExternalSelection:, which gets the selection from the clipboard, and I have also read about X Window selections (http://en.wikipedia.org/wiki/X_Window_selection). Or is text the only thing I can get from the system clipboard?

regards,
Juraj

On Thu, Jan 22, 2009 at 1:40 PM, Karsten <[hidden email]> wrote:
In reply to this post by Juraj Kubelka-3
The issue is probably which encoding to use when talking to the clipboard. The primitives aren't the issue; they just get and set bytes. What matters is how the string is encoded before that, and in particular the value of the #default encoding, whose setting comes down to locales. In 7.4 I think that would not have worked, because there wouldn't have been a locale available. 7.6 does have Unix UTF-8 locales, so I would have expected that combination to do better. 7.7 should do a great deal better.
So the interesting question is what

(StreamEncoder new: #default) encoding

returns.

However, I would have expected cut and paste to be consistent within VisualWorks itself, since it is going to use the same encoding in both directions. So the other possibility is that the system clipboard is somehow mutating the data into a different encoding. I can't see any way that VisualWorks would be encoding the information as '\u017e'.

At 07:15 AM 1/22/2009, Juraj Kubelka wrote:

--
Alan Knight [|], Engineering Manager, Cincom Smalltalk
[hidden email]
[hidden email]
http://www.cincom.com/smalltalk
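To illustrate the point about matching encodings, here is a minimal Python sketch (not VisualWorks code; the helper names `round_trip` and `code_points` are hypothetical). It assumes the clipboard only carries raw bytes, so text survives only when writer and reader agree on the encoding. A mismatch produces wrong characters, but not a '\u017e' escape.

```python
def round_trip(text, write_encoding, read_encoding):
    """Encode text as the sender would, then decode it as the receiver would."""
    raw = text.encode(write_encoding)   # bytes placed on the clipboard
    return raw.decode(read_encoding)    # bytes as interpreted by the reader


def code_points(text):
    """Show a string as a list of U+XXXX code points (ASCII-safe output)."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Same encoding in both directions: the single character survives.
print(code_points(round_trip("ž", "utf-8", "utf-8")))
# Sender writes UTF-8, receiver assumes ISO-8859-2: two wrong characters.
print(code_points(round_trip("ž", "utf-8", "iso-8859-2")))
# Sender writes UTF-8, receiver assumes ISO-8859-1: two different wrong characters.
print(code_points(round_trip("ž", "utf-8", "iso-8859-1")))
```

In none of these cases does a backslash appear, which is in line with the suspicion that the escape is produced before the bytes reach the image.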
In reply to this post by Juraj Kubelka-3
(StreamEncoder new: #default) encoding
returns #'ISO-8859-2' on my UTF-8 system, because I set it explicitly. However, I ran the same experiments on an ISO8859-2 system, with the same results.

The clipboard string is converted to Unicode (ByteArray>>asStringEncoding:) right after it is fetched from the system clipboard. Unfortunately, the byte array is #[92 117 48 49 55 101] (= '\u017e') instead of #[197 190].

#[197 190] asStringEncoding: #UTF8.

works fine. It returns 'ž'.

#[92 117 48 49 55 101] asStringEncoding: #UTF8.

returns '\u017e'.

Thank you for any comments.
Juraj
On Thu, Jan 22, 2009 at 3:14 PM, Alan Knight <[hidden email]> wrote:
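The two experiments with byte arrays can be reproduced outside the image. The following Python sketch (an illustration only, not VisualWorks code; `decoded_code_points` is a hypothetical helper) decodes the same two byte arrays as UTF-8. It shows that the second array already consists of six plain ASCII characters, so an ordinary charset decoder has nothing left to turn back into 'ž'.

```python
expected = bytes([197, 190])                   # the bytes expected for 'ž'
received = bytes([92, 117, 48, 49, 55, 101])   # the bytes actually fetched


def decoded_code_points(raw, encoding="utf-8"):
    """Decode raw bytes and list the resulting characters as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in raw.decode(encoding)]


print(decoded_code_points(expected))   # a single character, U+017E
print(decoded_code_points(received))   # backslash, 'u', '0', '1', '7', 'e'
```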
In reply to this post by Alan Knight-2
The primitives always specify ISO-8859-1 to copy and paste selections on X11. If your string has a different encoding, it will simply be misinterpreted. See AR 54602.

--
Ralf Propach, [hidden email]
Tel: +49 231 975 99 38
Fax: +49 231 975 99 20
Georg Heeg eK (Dortmund)
Handelsregister: Amtsgericht Dortmund A 12812
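The ISO-8859-1 restriction may also explain the first observation in the original post, that 'á' and 'é' arrive intact while 'ž' does not. The Python sketch below is an illustration under the assumption that the selection really is transferred as ISO-8859-1; `fits_latin1` is a hypothetical helper that checks whether that encoding can represent a character at all.

```python
def fits_latin1(ch):
    """Return True if the character can be encoded in ISO-8859-1."""
    try:
        ch.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


for ch in "áéž":
    # Print the code point instead of the character to keep the output ASCII.
    print(f"U+{ord(ch):04X}", fits_latin1(ch))
```

Under that assumption, a sending application has to substitute something for 'ž', and an escape such as '\u017e' is one possible substitute.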
I cannot find "AR 54602". Where is it?
Kind Regards,
Juraj

On Thu, Jan 22, 2009 at 4:40 PM, Ralf Propach <[hidden email]> wrote:
In reply to this post by Juraj Kubelka-3
I can reproduce this on my system as well (with the latest VW7.7 build). It seems that our primitives expect the platform to pass the selection over in whatever is the current platform encoding, which doesn't seem to be the case on (at least some) Linux distributions. BTW, mine is currently Fedora 8. Instead the OS is passing it in this "textual" encoding where normal (ASCII only?) symbols seem to be passed as is and the rest get the \u treatment.
I note that this seems ambiguous, because when I copy a literal '\u0224' string (from the CharacterMap application) I get the same thing on the VW end. So there doesn't seem to be any way to reliably distinguish the two cases.

Anyway, here is an interesting bit of the primitive on X11:

    /* Support for retrieving text from specified atoms */
    switch(type) {
    case 0: /* old-style behavior */
        if (getXSelection(display, XA_PRIMARY) == None
            && getXSelection(display, state->clipboard) == None)
            state->fetchedBytes = XFetchBytes(display, &(state->fetchedSize));
        break;
    case 1: /* check only XA_PRIMARY */
        getXSelection(display, XA_PRIMARY);
        break;
    case 2: /* check only XA_SECONDARY */
        getXSelection(display, XA_SECONDARY);
        break;
    case 3: /* check only CLIPBOARD */
        getXSelection(display, state->clipboard);
        break;
    }
    *size = state->fetchedSize;
    *value = state->fetchedBytes;

Note that the type argument is mapped onto the argument of the getExternalSelection: method. So you can do something like

    (Screen default getExternalSelection: 3) asStringEncoding: #UTF8

to try the 3rd case. However, none of them seems to yield what we expect on the image side. Clearly there's more work to do here, and if it is any consolation, we have a significant I18N effort under way to improve things.

In the meantime, you could certainly try post-processing the string after we fetch the selection (ignoring the above issue) by patching Screen>>getExternalSelection: or something like that. If you want to try that, note that it seems you actually don't want to decode the incoming string with UTF8 but rather with ISO8859-1 (before converting the \uXXXX bits).

Martin
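The suggested post-processing can be sketched as follows. This is Python rather than a patch to Screen>>getExternalSelection:, and the function name `decode_selection` is hypothetical. It assumes the raw selection is ISO-8859-1 text in which characters outside that encoding appear as \uXXXX with exactly four hexadecimal digits; that format is inferred from the examples in this thread, not from any specification.

```python
import re

# Backslash, 'u', then exactly four hex digits (assumed escape format).
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_selection(raw):
    """Decode selection bytes as ISO-8859-1, then expand \\uXXXX escapes."""
    text = raw.decode("iso-8859-1")   # every byte maps to one character
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def code_points(text):
    """List a string as U+XXXX code points (ASCII-safe output)."""
    return [f"U+{ord(ch):04X}" for ch in text]


# The bytes reported earlier in the thread become a single character.
print(code_points(decode_selection(bytes([92, 117, 48, 49, 55, 101]))))
# Characters that ISO-8859-1 can represent pass through unchanged.
print(code_points(decode_selection(b"\xe1\xe9")))
# A literal backslash-u string is converted too.
print(code_points(decode_selection(b"\\u0224")))
```

The last call shows the ambiguity described above: a string that really contains the six characters '\u0224' is converted as well, because the bytes carry no marker that distinguishes an escape from literal text.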
In reply to this post by Juraj Kubelka-3
If you check the Resolutions portal you will find 2 resolutions under 7.6 that relate to this problem.

Terry