invalid utf8 input detected

invalid utf8 input detected

jhancock
I'm not sure what to expect with UTF-8 encoding in Pharo.  Can someone
point me to some docs or past discussion?

I have tried this today in Pharo 1.3:

     1 - In a workspace: cnString := ''.
     2 - I go to Firefox, copy 4 simplified Chinese characters, and try
to paste them between the quotes of the Smalltalk string in the
workspace.
     3 - The paste operation throws "Error: Invalid utf8 input detected".

The web page I'm copying from is UTF-8 encoded.  I can copy text from
this page and paste it into vim or gEdit just fine.
The error is thrown by UTF8TextConverter>>errorMalformedInput

I would provide a stack trace, but being new to Pharo, I'm not sure how
to grab such a report without doing a screen capture.

thanks, Jon


Re: invalid utf8 input detected

Sven Van Caekenberghe
Hi Jon,

Welcome to Pharo!

This seems to work for me:

| cnString |
cnString := '请收藏我们的网址'.
cnString collect: [ :each | each charCode ] as: Array  

"Print It"

#(35831 25910 34255 25105 20204 30340 32593 22336)

I took the string from www.google.cn, I don't know what it means…

Mind you, due to font limitations, it shows as question marks in the workspace.

I guess this depends on the OS/VM. I did this example in Pharo 1.2.2 on Mac OS X 10.7 using Croquet Closure Cog VM [CoInterpreter VMMaker-oscog.47] Pharo Cog VM.

HTH,

Sven



Re: invalid utf8 input detected

jhancock
Thanks Sven.  I'm using the latest Cog build with the latest Pharo 1.3
build on Linux, with the Ubuntu system fonts loaded, and in Pharo I am
using the same font as my gEdit text editor, which displays correctly.
Does anyone know why this error occurs, or why the characters show up as
question marks?  I am considering using Pharo for some Chinese projects
and need to ensure that all strings throughout Pharo handle UTF-8 well.
Is the Smalltalk code also stored in UTF-8 form?
Also, anyone have a trick for capturing the full stack trace from a
walkback or debugger as a text stream so I can copy into emails?

thanks, Jon



Re: invalid utf8 input detected

Sven Van Caekenberghe

On 27 Jul 2011, at 05:49, Jon Hancock wrote:

> Thanks Sven.  I'm using the latest Cog build with the latest Pharo 1.3 build on Linux, with the Ubuntu system fonts loaded, and in Pharo I am using the same font as my gEdit text editor, which displays correctly.
> Does anyone know why this error occurs, or why the characters show up as question marks?  I am considering using Pharo for some Chinese projects and need to ensure that all strings throughout Pharo handle UTF-8 well.  Is the Smalltalk code also stored in UTF-8 form?
> Also, anyone have a trick for capturing the full stack trace from a walkback or debugger as a text stream so I can copy into emails?

I think one of the regular Linux users might be better placed to answer, as this relates to UX issues.

Internally, Pharo uses Unicode. UTF-8 is one of several possible encodings (and the most widely used one for non-ASCII text). Pasting from the system clipboard into Pharo is a kind of import, probably with conversions going on. This necessarily depends on the VM and OS.
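
To make the distinction concrete, here is a minimal workspace sketch. It assumes the String>>convertToEncoding: and String>>convertFromEncoding: selectors found in Pharo 1.x images (newer images may use different selectors). It builds the string from the first three code points printed earlier in this thread rather than from a pasted literal, so it does not touch the clipboard path at all.

```smalltalk
| cnString utf8 decoded |
"Build the string from code points, avoiding any clipboard paste."
cnString := String streamContents: [ :stream |
	#(35831 25910 34255) do: [ :code |
		stream nextPut: (Character value: code) ] ].

"Encode to UTF-8: the result holds bytes, not characters.
 Each of these code points needs three bytes in UTF-8."
utf8 := cnString convertToEncoding: 'utf-8'.

"Decode the bytes back into the internal Unicode representation."
decoded := utf8 convertFromEncoding: 'utf-8'.

"Print It on the line below to check the round trip."
decoded = cnString
```

If the round trip answers true but pasting still fails, that would suggest the problem lies in the clipboard import rather than in the converter itself.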

On Mac OS X this seems to 'just work' as expected. In your case it seems to fail.

Anyone better placed to help Jon?

Sven


Re: invalid utf8 input detected

Sven Van Caekenberghe
Jon,

On 27 Jul 2011, at 05:49, Jon Hancock wrote:

> Also, anyone have a trick for capturing the full stack trace from a walkback or debugger as a text stream so I can copy into emails?

There is a menu item 'Mail out bug report' when you select a frame in the debugger (there is also a 'File out').
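
If a plain-text trace is all that is needed, the stack can also be captured in code. This is a minimal sketch, assuming the Exception>>signalerContext and ContextPart>>longStack selectors are present in your image; the ZeroDivide is only a stand-in for the real failure.

```smalltalk
| trace |
"Catch the error and answer its stack as a String instead of opening a debugger."
trace := [ 1 / 0 ]
	on: ZeroDivide
	do: [ :error | error signalerContext longStack ].

"Show the text in the Transcript, from where it can be copied into an email."
Transcript show: trace; cr
```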

HTH,

Sven