Unicode clipboard

Chris Petsos
 
Any quick ideas on how we can handle Unicode text going to and from the system clipboard with Squeak?
 
Christos

Re: Unicode clipboard

Michael Rueger-6
 
Chris Petsos wrote:
> Any quick ideas on how we can handle unicode text from and to the system
> clipboard with Squeak?

There has been some work done in Sophie, currently being integrated with
the OLPC image.

Michael

Re: Unicode clipboard

Takashi Yamamiya
 
Michael Rueger wrote:
> Chris Petsos wrote:
>> Any quick ideas on how we can handle unicode text from and to the
>> system clipboard with Squeak?
>
> There has been some work done in Sophie, currently being integrated with
> the OLPC image.

I'm working only on X11 (Linux) with the OLPC.
If you want to try it on Mac or Win32 soon, see the System-Clipboard-Extended
category in Sophie.

- Takashi


Re: Unicode clipboard

Chris Petsos
 

> Michael Rueger wrote:
> > Chris Petsos wrote:
> >> Any quick ideas on how we can handle unicode text from and to the
> >> system clipboard with Squeak?
> >
> > There has been some work done in Sophie, currently being integrated with
> > the OLPC image.
>
> I'm working only for X11 (linux) with the OLPC.
> If you want try on Mac or Win32 soon, see System-Clipboard-Extended
> category in Sophie.
>
> - Takashi

From what I saw, the System-Clipboard-Extended package uses UTF converters for
the internal representation of the data.
The thing is that we are trying to create a VM where the internal
representation of the characters is Unicode.
This means that the VM we use sends Unicode character codes to the image, we
use Unicode fonts, etc.
So a UTF-encoded string will not display properly in our image, unless
we run it through a converter to get our Unicode chars.
I think we will have to patch the VM again so that the clipboard-related
functions also send Unicode streams to the image.
I don't know which of the two solutions is more desirable.

The functions that are called when putting something on or getting something
from the clipboard are
    int clipboardSize(void)
    int clipboardWriteFromAt(int count, int byteArrayIndex, int startIndex)
    int clipboardReadIntoAt(int count, int byteArrayIndex, int startIndex)

in
    sqWin32Window.c

Any help on that Diomidis?

Christos.


Re: Unicode clipboard

johnmci
 
You're welcome to look at
Sophie-Clipboard.st
ClipboardExtendedPlugin.c
JMMExtendedClipBoardPlugin.1.cs

in the Mac OS tree, under plugins/ClipboardExtended, to see how we
extended the clipboard logic for Sophie.

Higher up, the extended clipboard class uses MIME type information to
indicate the data type. At the lower level it's up to the plugin to
determine what, for example, the format argument of
ioReadClipboardData: clipboard format: format
means, where clipboard is a 32-bit value (an address) and format is a string
value.

The method that is likely not clear is
ioGetClipboardFormat: clipboard formatNumber: formatNumber

On the Macintosh you can have an item on the clipboard in many
formats, such as a string in UTF-8, UTF-16, ASCII, or MacRoman.
ioGetClipboardFormat:formatNumber: returns each format type
by its index number, formatNumber.

We converted those results back to MIME types to decide the best
format for reading the clipboard.
Each platform has helper methods to convert the platform format data
to a MIME type, so for example on Windows we had

        clipboardFormatMap
                at: 49510 put: 'text/rtf' asMIMEType;
                at: 1 put: 'text/plain' asMIMEType; "CF_TEXT"
                at: 2 put: 'image/bmp' asMIMEType; "CF_BITMAP"
                at: 12 put: 'audio/wave' asMIMEType; "CF_WAVE"
                at: 13 put: 'text/unicode' asMIMEType; "CF_UNICODETEXT"
                at: 16 put: 'CF_LOCALE'; "CF_LOCALE"

I will note that for Windows we used FFI to make the required calls and
did not build a plugin.
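
As a rough C illustration of the lookup described above (this is not the Sophie code, which lives in the image and uses FFI on Windows), a format number can be mapped to a MIME type with a small table. The numbers are the ones from the map above; `mime_for_format` and `best_text_format` are hypothetical names, and the preference order in `best_text_format` is an assumption. CF_LOCALE is left out because the map above does not give it a MIME type. Note also that 49510 lies in the range Windows assigns to formats registered at run time, so that value may not be the same on every system.

```c
#include <stddef.h>

/* One row of the table: a Windows clipboard format number and the MIME
   type it was mapped to in the post above. */
struct format_entry {
    unsigned int format;
    const char  *mime;
};

static const struct format_entry format_map[] = {
    { 49510, "text/rtf"     },  /* registered format; value taken from the post */
    {     1, "text/plain"   },  /* CF_TEXT */
    {     2, "image/bmp"    },  /* CF_BITMAP */
    {    12, "audio/wave"   },  /* CF_WAVE */
    {    13, "text/unicode" },  /* CF_UNICODETEXT */
};

/* Return the MIME type for a format number, or NULL if it is not mapped. */
const char *mime_for_format(unsigned int format)
{
    size_t i;
    for (i = 0; i < sizeof format_map / sizeof format_map[0]; i++) {
        if (format_map[i].format == format)
            return format_map[i].mime;
    }
    return NULL;
}

/* Pick a text format from those currently offered on the clipboard.
   Assumed preference: Unicode text first, then plain text.
   Returns 0 when neither is offered. */
unsigned int best_text_format(const unsigned int *available, size_t count)
{
    static const unsigned int preferred[] = { 13, 1 };
    size_t p, i;
    for (p = 0; p < sizeof preferred / sizeof preferred[0]; p++) {
        for (i = 0; i < count; i++) {
            if (available[i] == preferred[p])
                return preferred[p];
        }
    }
    return 0;
}
```

A caller would enumerate the formats on the clipboard, pass them to `best_text_format`, and then read the data in the chosen format.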

So, for example, for textual data we would process MIME types of
rtf, utf8, unicode, or plain.

Later you use
ioReadClipboardData: clipboard format: format
to actually return the data object.


I'll note that when reading Unicode on the Mac, the data came across as UTF-16
with no byte order mark, so our read method, which returned
WideString data, did:

readWideStringClipboardData
        | bytes |
        "utf16 plain text has no bom"

        bytes := self readClipboardData: 'public.utf16-plain-text'.
        ^bytes ifNil: [bytes] ifNotNil:
                [bytes asString convertFromWithConverter:
                        (UTF16TextConverter new
                                useLittleEndian: (SmalltalkImage current endianness = #little))]
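
The byte-order handling in the read method can be illustrated outside Smalltalk. The following C sketch is not the Sophie implementation; `host_is_little_endian` and `utf16_decode` are hypothetical names. It decodes UTF-16 bytes that carry no byte order mark into code points, with the caller stating the byte order, much as the method above passes the image's endianness to the converter.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when the host stores the low-order byte first. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Assemble one 16-bit code unit from two bytes in the stated order. */
static uint16_t read_unit(const unsigned char *p, int little_endian)
{
    return little_endian ? (uint16_t)(p[0] | (p[1] << 8))
                         : (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode BOM-less UTF-16 bytes into code points.
   Writes at most out_cap code points and returns how many were written.
   Surrogate pairs are combined; an unpaired surrogate is passed through
   unchanged, and a trailing odd byte is ignored. */
size_t utf16_decode(const unsigned char *bytes, size_t nbytes,
                    int little_endian, uint32_t *out, size_t out_cap)
{
    size_t i = 0;
    size_t n = 0;
    while (i + 1 < nbytes && n < out_cap) {
        uint32_t cp = read_unit(bytes + i, little_endian);
        i += 2;
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < nbytes) {
            uint16_t lo = read_unit(bytes + i, little_endian);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(lo - 0xDC00);
                i += 2;
            }
        }
        out[n++] = cp;
    }
    return n;
}
```

Passing `host_is_little_endian()` as the `little_endian` argument corresponds to the assumption made above that BOM-less clipboard data is in the machine's native byte order.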

On writing we did the following, and supplied a byte order mark.

addWideStringClipboardData: aString
        | ba  |

        self clearClipboard.
        ba := aString convertToWithConverter:
                (UTF16TextConverter new useByteOrderMark: true).
        self addClipboardData: ba dataFormat: 'public.utf16-plain-text'



--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================



Re: Unicode clipboard

Andreas.Raab
In reply to this post by Chris Petsos
 
Use UTF-8 to transfer clipboard data. Windows has two nice functions to
efficiently convert it (MultiByteToWideChar and WideCharToMultiByte).
By far the easiest solution.
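
To make the suggestion concrete, here is a portable C sketch of the UTF-8 to UTF-16 step, roughly what MultiByteToWideChar does when called with CP_UTF8. It is an illustration only, not VM code: `decode_one` and `utf8_to_utf16` are hypothetical names, and replacing malformed input with U+FFFD is an assumption of this sketch. Passing a NULL destination returns the number of 16-bit units needed, similar to calling the Windows function with a zero output size.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (len >= 1 bytes available).
   Stores the code point in *cp and returns the bytes consumed.
   Malformed input yields U+FFFD and consumes a single byte. */
static size_t decode_one(const unsigned char *s, size_t len, uint32_t *cp)
{
    uint32_t c = s[0];
    uint32_t min;
    size_t need, i;

    if (c < 0x80) { *cp = c; return 1; }
    else if ((c & 0xE0) == 0xC0) { need = 1; c &= 0x1F; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { need = 2; c &= 0x0F; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { need = 3; c &= 0x07; min = 0x10000; }
    else { *cp = 0xFFFD; return 1; }

    if (need >= len) { *cp = 0xFFFD; return 1; }      /* truncated sequence */
    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        *cp = 0xFFFD;
        return 1;
    }
    *cp = c;
    return need + 1;
}

/* Convert src_len bytes of UTF-8 to UTF-16 code units.
   With dst == NULL, only counts and returns the units required.
   Otherwise writes up to dst_cap units and returns the number written,
   or 0 if dst is too small. */
size_t utf8_to_utf16(const unsigned char *src, size_t src_len,
                     uint16_t *dst, size_t dst_cap)
{
    size_t in = 0, out = 0;
    while (in < src_len) {
        uint32_t cp;
        size_t units;
        in += decode_one(src + in, src_len - in, &cp);
        units = (cp >= 0x10000) ? 2 : 1;
        if (dst != NULL) {
            if (out + units > dst_cap)
                return 0;
            if (units == 1) {
                dst[out] = (uint16_t)cp;
            } else {
                cp -= 0x10000;
                dst[out]     = (uint16_t)(0xD800 + (cp >> 10));
                dst[out + 1] = (uint16_t)(0xDC00 + (cp & 0x3FF));
            }
        }
        out += units;
    }
    return out;
}
```

A typical use is two calls: one with a NULL destination to size the buffer, and a second to fill it.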

Cheers,
   - Andreas


Re: Unicode clipboard

Chris Petsos
In reply to this post by johnmci
 

> You're welcome to look at the
> Sophie-Clipboard.st
> ClipboardExtendedPlugin.c
> JMMExtendedClipBoardPlugin.1.cs

I didn't find JMMExtendedClipBoardPlugin.1.cs.
Is it integrated into Sophie-Clipboard.st?

Christos.


Re: Unicode clipboard

Chris Petsos
In reply to this post by Chris Petsos
 
On Wed, 2007-05-30 at 11:19 +0300, Diomidis Spinellis wrote:

> Sorry for taking so long to reply.  The change needed in sqWin32Window.c
> is to replace the five instances of CF_TEXT with CF_UNICODETEXT.
> However, this solves only the Windows part of the problem.  For this to
> work, the characters we copy/paste must be of type wchar_t.  In the VM
>   (unsigned char *)byteArrayIndex + startIndex appears to point to byte
> characters.  How are Unicode characters represented there?
>
> Diomidis Spinellis - http://www.spinellis.gr

OK, I am halfway there. The trick is that the image converts the
Unicode chars to UTF-8 before sending them to the VM. Thus, byte data
reach the VM in UTF-8 representation. These data are then passed to
        /* the last argument counts WCHARs, while GlobalSize() reports bytes */
        MultiByteToWideChar(CP_UTF8, 0, src, GlobalSize(h) + 1,
                            out, GlobalSize(h2) / sizeof(WCHAR));

Finally, the converted data are sent to the system clipboard with
        SetClipboardData(CF_UNICODETEXT, h2);
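
The steps above cover copying out of the image. Pasting into it would presumably need the opposite conversion, from the UTF-16 data Windows hands back for CF_UNICODETEXT to the UTF-8 bytes the image expects, which is what WideCharToMultiByte with CP_UTF8 does. A portable C sketch of that direction follows; `utf16_to_utf8` is a hypothetical name, this is not the patch discussed in the thread, and replacing unpaired surrogates with U+FFFD is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert src_len UTF-16 code units to UTF-8 bytes.
   With dst == NULL, only counts and returns the bytes required.
   Otherwise writes up to dst_cap bytes and returns the number written,
   or 0 if dst is too small. No terminating NUL is added. */
size_t utf16_to_utf8(const uint16_t *src, size_t src_len,
                     unsigned char *dst, size_t dst_cap)
{
    size_t out = 0;
    size_t i;
    for (i = 0; i < src_len; i++) {
        uint32_t cp = src[i];
        unsigned char tmp[4];
        size_t n;

        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < src_len &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            /* Combine a high and low surrogate into one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(src[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;                       /* unpaired surrogate */
        }

        if (cp < 0x80) {
            tmp[0] = (unsigned char)cp;
            n = 1;
        } else if (cp < 0x800) {
            tmp[0] = (unsigned char)(0xC0 | (cp >> 6));
            tmp[1] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 2;
        } else if (cp < 0x10000) {
            tmp[0] = (unsigned char)(0xE0 | (cp >> 12));
            tmp[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            tmp[2] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 3;
        } else {
            tmp[0] = (unsigned char)(0xF0 | (cp >> 18));
            tmp[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            tmp[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            tmp[3] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 4;
        }

        if (dst != NULL) {
            if (out + n > dst_cap)
                return 0;
            memcpy(dst + out, tmp, n);
        }
        out += n;
    }
    return out;
}
```

As with the copy direction, calling it once with a NULL destination gives the size to allocate before the real conversion.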

You are right about CF_UNICODETEXT, Diomidis. What I have in hand is a very
early solution: just yesterday I managed to copy something from
eToys and paste it into MS Word correctly. But I know it's a matter of
time.
I'll post a complete solution as soon as I finish it.
By the way, Takashi, thanks for your interest. I'll send it as soon as
I can so that we can start testing.
Again, thanks to everyone.

Christos.