Any quick ideas on how we can handle unicode text
from and to the system clipboard with Squeak?
Christos
|
Chris Petsos wrote:
> Any quick ideas on how we can handle unicode text from and to the system
> clipboard with Squeak?

There has been some work done in Sophie, currently being integrated with the OLPC image.

Michael
|
Michael Rueger wrote:
> Chris Petsos wrote:
>> Any quick ideas on how we can handle unicode text from and to the
>> system clipboard with Squeak?
>
> There has been some work done in Sophie, currently being integrated with
> the OLPC image.

I'm working only on X11 (Linux) for the OLPC.
If you want to try it on Mac or Win32 soon, see the System-Clipboard-Extended category in Sophie.

- Takashi
|
> Michael Rueger wrote:
> > Chris Petsos wrote:
> >> Any quick ideas on how we can handle unicode text from and to the
> >> system clipboard with Squeak?
> >
> > There has been some work done in Sophie, currently being integrated with
> > the OLPC image.
>
> I'm working only for X11 (linux) with the OLPC.
> If you want try on Mac or Win32 soon, see System-Clipboard-Extended
> category in Sophie.
>
> - Takashi

From what I saw, the System-Clipboard-Extended package uses UTF converters for the internal representation of the data.
The thing is that we are trying to create a VM where the internal representation of characters is Unicode. This means that the VM we use sends Unicode character codes to the image, we use Unicode fonts, etc.
So a UTF-interpreted string will not display properly in our image, unless we use interpreters for our Unicode chars...
I think we will have to patch the VM again so that the clipboard-related methods also send Unicode streams to the image.
I don't know which of the two solutions is more desirable...

The related functions that are called when putting something on or getting something from the clipboard are

int clipboardSize(void)
int clipboardWriteFromAt(int count, int byteArrayIndex, int startIndex)
int clipboardReadIntoAt(int count, int byteArrayIndex, int startIndex)

in sqWin32Window.c

Any help on that, Diomidis?

Christos.
|
You're welcome to look at

Sophie-Clipboard.st
ClipboardExtendedPlugin.c
JMMExtendedClipBoardPlugin.1.cs

in the mac os tree / plugins/ClipboardExtended to see how we extended the clipboard logic for Sophie.

Higher up, the extended clipboard class uses mimetype information to indicate the data type; at the lower level it's up to the plugin to determine what, for example,

ioReadClipboardData: clipboard format: format

means, where clipboard is a 32-bit value (an address) and format is a string value.

Likely the method that is not clear is

ioGetClipboardFormat: clipboard formatNumber: formatNumber

On the Macintosh you can have an item on the clipboard in many formats, such as a string in utf8, utf16, ascii, or macroman. ioGetClipboardFormat:formatNumber: returns each format type based on the index number formatNumber. We converted those results back to mimetypes and used them to decide the best format for reading the clipboard.

Each platform has helper methods to convert the platform format data to a mimetype, so for example on Windows we had

clipboardFormatMap
    at: 49510 put: 'text/rtf' asMIMEType;
    at: 1 put: 'text/plain' asMIMEType; "CF_TEXT"
    at: 2 put: 'image/bmp' asMIMEType; "CF_BITMAP"
    at: 12 put: 'audio/wave' asMIMEType; "CF_WAVE"
    at: 13 put: 'text/unicode' asMIMEType; "CF_UNICODETEXT"
    at: 16 put: 'CF_LOCALE'; "CF_LOCALE"

I will note that for Windows we used FFI to make the required calls and did not build a plugin.

So, for example, for textual data we would process mime types of rtf, utf8, unicode, or plain. Later you use ioReadClipboardData: clipboard format: format to actually return the data object.

I'll note that when reading unicode on the Mac it came across as UTF16 with no byte order mark, so our method that returned WideString data did:

readWideStringClipboardData
    | bytes |
    "utf16 plain text has no bom"
    bytes := self readClipboardData: 'public.utf16-plain-text'.
    ^bytes
        ifNil: [bytes]
        ifNotNil: [bytes asString convertFromWithConverter:
            (UTF16TextConverter new useLittleEndian: (SmalltalkImage current endianness = #little))]

On writing we did the following and supplied a byte order mark:

addWideStringClipboardData: aString
    | ba |
    self clearClipboard.
    ba := aString convertToWithConverter: (UTF16TextConverter new useByteOrderMark: true).
    self addClipboardData: ba dataFormat: 'public.utf16-plain-text'

On May 22, 2007, at 3:18 AM, Chris Petsos wrote:
>> Michael Rueger wrote:
>>> Chris Petsos wrote:
>>>> Any quick ideas on how we can handle unicode text from and to the
>>>> system clipboard with Squeak?
>>>
>>> There has been some work done in Sophie, currently being integrated with
>>> the OLPC image.
>>
>> I'm working only for X11 (linux) with the OLPC.
>> If you want try on Mac or Win32 soon, see System-Clipboard-Extended
>> category in Sophie.
>>
>> - Takashi
>
> From what i saw System-Clipboard-Extended package uses UTF Converters for
> the internal representation of the data.
> The thing is that we are trying to create a VM where the internal
> representation of the characters will be Unicode.
> This means that the VM we use is sending unicode charcodes to the image, we
> use unicode fonts etc...
> So, a UTF interpreted string will not display properly in our image. Unless,
> we use interpreters for our Unicode chars...
> I think we will have to patch the VM again so that the clipboard related
> methods send again unicode streams to the image.
> Don't know which solution of the two is more desirable...
>
> The related methods that are called when putting to or getting something
> from the clipboard are
> int clipboardSize(void)
> int clipboardWriteFromAt(int count, int byteArrayIndex, int startIndex)
> int clipboardReadIntoAt(int count, int byteArrayIndex, int startIndex)
>
> in
> sqWin32Window.c
>
> Any help on that Diomidis?
>
> Christos.

--
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
|
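For anyone handling the same bytes on the C side: the post above reads UTF-16 that arrives without a byte order mark and writes UTF-16 with one. A minimal portable sketch of both steps follows. It is not the Sophie plugin code; the function names and the `little_endian` flag are hypothetical, and the caller is assumed to know the byte order already, much as the Smalltalk method takes it from the image's endianness.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode raw UTF-16 bytes that carry no byte order mark into 16-bit code
 * units. The byte order is NOT detected; the caller supplies it through
 * little_endian (hypothetical parameter). A trailing odd byte is ignored.
 * Returns the number of code units written (at most max_units). */
size_t utf16_units_from_bytes(const unsigned char *bytes, size_t nbytes,
                              int little_endian,
                              uint16_t *out, size_t max_units)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < nbytes && n < max_units; i += 2) {
        out[n++] = little_endian
            ? (uint16_t)(bytes[i] | (bytes[i + 1] << 8))
            : (uint16_t)((bytes[i] << 8) | bytes[i + 1]);
    }
    return n;
}

/* Store one code unit at out[*pos] in the requested byte order. */
static void put_unit(unsigned char *out, size_t *pos, uint16_t u,
                     int little_endian)
{
    if (little_endian) {
        out[(*pos)++] = (unsigned char)(u & 0xFF);
        out[(*pos)++] = (unsigned char)(u >> 8);
    } else {
        out[(*pos)++] = (unsigned char)(u >> 8);
        out[(*pos)++] = (unsigned char)(u & 0xFF);
    }
}

/* Serialize code units to bytes, prefixed with a byte order mark (U+FEFF)
 * so the reader can recover the byte order.
 * Returns the number of bytes written, or 0 if max_bytes is too small. */
size_t utf16_bytes_with_bom(const uint16_t *units, size_t nunits,
                            int little_endian,
                            unsigned char *out, size_t max_bytes)
{
    size_t need = 2 * (nunits + 1);
    if (need > max_bytes)
        return 0;
    size_t pos = 0;
    put_unit(out, &pos, 0xFEFF, little_endian);
    for (size_t i = 0; i < nunits; i++)
        put_unit(out, &pos, units[i], little_endian);
    return pos;
}
```

With `little_endian` set to 0, writing the single unit 0x0041 should produce the bytes FE FF 00 41; with it set to 1, FF FE 41 00.
|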
In reply to this post by Chris Petsos
Use UTF-8 to transfer clipboard data. Windows has two nice functions to efficiently convert those (MultiByteToWideChar and WideCharToMultiByte). By far the easiest solution.

Cheers,
  - Andreas

Chris Petsos wrote:
> Any quick ideas on how we can handle unicode text from and to the
> system clipboard with Squeak?
>
> Christos
|
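To make the paste direction concrete (system clipboard holds UTF-16, image wants UTF-8), here is a minimal portable sketch of the conversion that WideCharToMultiByte with CP_UTF8 performs on Windows. The function name `utf16_to_utf8` and its error convention are hypothetical; a real VM patch on Windows would more likely call the Win32 function directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert n UTF-16 code units to UTF-8.
 * Returns the number of bytes written, or (size_t)-1 if the input contains
 * an unpaired surrogate or the output buffer (cap bytes) is too small.
 * No terminating NUL is written. */
size_t utf16_to_utf8(const uint16_t *in, size_t n,
                     unsigned char *out, size_t cap)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 >= n || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;  /* low surrogate without a high one */
        }

        size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + len > cap)
            return (size_t)-1;

        if (len == 1) {
            out[o++] = (unsigned char)cp;
        } else if (len == 2) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (len == 3) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

The copy direction (UTF-8 from the image to UTF-16 for the clipboard) is the mirror image of this loop.
|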
In reply to this post by johnmci
> You're welcome to look at the
> Sophie-Clipboard.st
> ClipboardExtendedPlugin.c
> JMMExtendedClipBoardPlugin.1.cs

I didn't find JMMExtendedClipBoardPlugin.1.cs ... Is it integrated in Sophie-Clipboard.st?

Christos.
|
In reply to this post by Chris Petsos
On Wed, 2007-05-30 at 11:19 +0300, Diomidis Spinellis wrote:
> Chris Petsos wrote:
> >> Michael Rueger wrote:
> >>> Chris Petsos wrote:
> >>>> Any quick ideas on how we can handle unicode text from and to the
> >>>> system clipboard with Squeak?
> >>> There has been some work done in Sophie, currently being integrated with
> >>> the OLPC image.
> >> I'm working only for X11 (linux) with the OLPC.
> >> If you want try on Mac or Win32 soon, see System-Clipboard-Extended
> >> category in Sophie.
> >>
> >> - Takashi
> >
> > From what i saw System-Clipboard-Extended package uses UTF Converters for
> > the internal representation of the data.
> > The thing is that we are trying to create a VM where the internal
> > representation of the characters will be Unicode.
> > This means that the VM we use is sending unicode charcodes to the image, we
> > use unicode fonts etc...
> > So, a UTF interpreted string will not display properly in our image. Unless,
> > we use interpreters for our Unicode chars...
> > I think we will have to patch the VM again so that the clipboard related
> > methods send again unicode streams to the image.
> > Don't know which solution of the two is more desirable...
> >
> > The related methods that are called when putting to or getting something
> > from the clipboard are
> > int clipboardSize(void)
> > int clipboardWriteFromAt(int count, int byteArrayIndex, int startIndex)
> > int clipboardReadIntoAt(int count, int byteArrayIndex, int startIndex)
> >
> > in
> > sqWin32Window.c
> >
> > Any help on that Diomidis?
>
> Sorry for taking so long to reply. The change needed in sqWin32Window.c
> is to replace the five instances of CF_TEXT with CF_UNICODETEXT.
> However, this solves only the Windows part of the problem. For this to
> work, the characters we copy/paste must be of type wchar_t. In the VM
> (unsigned char *)byteArrayIndex + startIndex appears to point to byte
> characters. How are Unicode characters represented there?
>
> Diomidis Spinellis - http://www.spinellis.gr

OK, I am halfway there. The trick is that the image converts the Unicode chars to UTF-8 before sending them to the VM. Thus the byte data reach the VM in UTF-8 representation. These data are then passed to

MultiByteToWideChar( CP_UTF8, 0, src, GlobalSize(h) + 1, out, GlobalSize(h2) );

Finally, the converted data are sent to the system clipboard with

SetClipboardData(CF_UNICODETEXT, h2);

You are right about CF_UNICODETEXT, Diomidis.
I have in hand a very premature solution... just yesterday I managed to copy something from eToys and paste it into MS Word correctly. But I know it's a matter of time...
I'll post a complete solution as soon as I complete it.
By the way, Takashi, thanks for your interest. I'll send it as soon as I can so that we can start testing.
Again, thanks to everyone.

Christos.
|
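The copy direction described above (UTF-8 bytes from the image, UTF-16 for CF_UNICODETEXT) can be sketched portably as follows. This is an illustration, not the actual VM patch: `utf8_to_utf16` is a hypothetical name, and the count-only mode when `out` is NULL is modelled on the way MultiByteToWideChar reports the required size when asked with a zero-length output buffer. Note that the capacity here is counted in 16-bit units rather than bytes, which is also how MultiByteToWideChar documents its output size.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert nbytes of UTF-8 to UTF-16 code units.
 * If out is NULL, nothing is stored and the number of code units that
 * WOULD be needed is returned, so the caller can size a buffer first.
 * Returns (size_t)-1 on malformed input or if cap_units is too small.
 * Simplification: overlong forms and encoded surrogates are not rejected. */
size_t utf8_to_utf16(const unsigned char *in, size_t nbytes,
                     uint16_t *out, size_t cap_units)
{
    size_t o = 0, i = 0;
    while (i < nbytes) {
        unsigned char b = in[i];
        uint32_t cp;
        size_t len;

        /* The lead byte determines the sequence length. */
        if (b < 0x80)                { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
        else return (size_t)-1;

        if (i + len > nbytes)
            return (size_t)-1;          /* truncated sequence */
        for (size_t k = 1; k < len; k++) {
            if ((in[i + k] & 0xC0) != 0x80)
                return (size_t)-1;      /* not a continuation byte */
            cp = (cp << 6) | (in[i + k] & 0x3F);
        }
        if (cp > 0x10FFFF)
            return (size_t)-1;
        i += len;

        /* Code points above the BMP need a surrogate pair. */
        size_t units = cp >= 0x10000 ? 2 : 1;
        if (out) {
            if (o + units > cap_units)
                return (size_t)-1;
            if (units == 1) {
                out[o] = (uint16_t)cp;
            } else {
                cp -= 0x10000;
                out[o]     = (uint16_t)(0xD800 | (cp >> 10));
                out[o + 1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
            }
        }
        o += units;
    }
    return o;
}
```

A typical use is two passes: call once with `out` set to NULL to learn the unit count, allocate that many units plus one for a terminating zero, then call again to fill the buffer.
|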