Re: [Pharo-users] Get "Invalid utf8 input detected" error for a filename

Igor Stasenko

I think this is a VM-related issue.

On 24 March 2011 15:31, Juraj Kubelka <[hidden email]> wrote:

> For now I am able to work around it with the following code:
> #+BEGIN_SRC
> Locale current.  "a Locale(en)"
> Latin2Environment class compile: 'fileNameConverterClass
> ^ISO88592TextConverter'.
> Locale switchToID: (LocaleID isoLanguage: 'cs').
> LanguageEnvironment clearDefault.
> LanguageEnvironment defaultFileNameConverter. "an ISO88592TextConverter"
> #+END_SRC
> It is not perfect: directories with Czech characters in their names still
> cannot be browsed. But I do not mind for now. It is strange anyway, because my
> operating system environment is cs_CZ.UTF-8 and files are encoded in UTF-8,
> yet FileDirectory>>primLookupEntryIn:index: returns file names in a one-byte
> encoding that is neither valid Latin1 nor valid Latin2.
> If anyone knows a better solution, please let me know.
> Thanks in advance.
> Juraj
> On Wed, Mar 23, 2011 at 2:10 PM, Juraj Kubelka <[hidden email]>
> wrote:
>>
>> Hi,
>> If I try to execute:
>> (FileDirectory on: '/home/jura') directoryNames
>> it says "Invalid utf8 input detected"
>> (UTF8TextConverter>>errorMalformedInput), because of a directory named
>> 'Veřejné' (the 'ř' character is the one that goes wrong). In the file system the
>> name is correct, but FileDirectory>>primLookupEntryIn:index: returns a wrong ByteString
>> ('Ve?ejné').
>> It was tested on Pharo 1.1.1 and Moose 4.3, where I am not able to use the
>> Moose Panel tool because of the problem.
>> Can I solve this problem somehow? Set something up...
>> Thank you in advance,
>> Juraj
>



--
Best regards,
Igor Stasenko AKA sig.

Re: [Pharo-users] Get "Invalid utf8 input detected" error for a filename

Denis Kudriashov
 
Hello,

I have a similar problem when running Pharo under a Windows user account whose login contains Russian letters.
A debugger opens in SecurityManager code during the image startUp procedure.
I have asked in the past why Pharo/Squeak runs the SecurityManager logic at all, but never got an explanation.


Re: [Pharo-users] Get "Invalid utf8 input detected" error for a filename

Igor Stasenko

On 24 March 2011 16:15, Denis Kudriashov <[hidden email]> wrote:
>
> Hello,
>
> I have similar problem when running pharo from windows user with russian letters in login.
> Debugger opened in SecurityManager stuff during image startUp procedure.
> I ask in past why pharo/squeak doing SecurityManager logic. But without explanation.
>
Denis,
please file the issue on the Cog tracker (if it does not work on Cog
either), so it won't get lost in the mailing list.
http://code.google.com/p/cog/issues/list

There is already an issue for Linux,
so most probably it does not work on Windows either.




--
Best regards,
Igor Stasenko AKA sig.

Re: [Pharo-users] Get "Invalid utf8 input detected" error for a filename

Henrik Sperre Johansen
In reply to this post by Denis Kudriashov


On Mar 24, 2011, at 4:15 26PM, Denis Kudriashov wrote:

> Hello,
>
> I have similar problem when running pharo from windows user with russian letters in login.
> Debugger opened in SecurityManager stuff during image startUp procedure.
> I ask in past why pharo/squeak doing SecurityManager logic. But without explanation.

It's hard to tell with that as the only info.

For example:
If you're using Cog, it's because it doesn't convert the encoding, but returns your codepage-encoded byte string.
If you're not using Cog, it's either because you
        a) use a Locale in Pharo with a non-UTF8 systemConverterClass. SecurityPlugin on Windows always returns UTF-8 strings.
        b) use Unicode characters in your name that have no representation in your local codepage.
        SecurityPlugin uses the non-W API function to get your username and then does a codepage -> UTF8 conversion, instead of using the W version and doing a UTF16 -> UTF8 conversion.
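
As an illustration of the Cog case above (a sketch only, not code from the VM or the image): a strict UTF-8 decoder rejects a byte string that is really in a one-byte codepage. The function name is_valid_utf8 and the two sample arrays are made up for this example, and ISO 8859-2 is only an assumed codepage; the original report says the bytes actually returned were not valid Latin2 either. The check is simplified and does not reject every overlong or surrogate form.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the n bytes at s look like well-formed UTF-8.
   Simplified check: validates lead bytes and continuation bytes only. */
bool is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t extra;                       /* continuation bytes expected */

        if (c < 0x80)                    extra = 0;   /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF) extra = 1;   /* 2-byte sequence */
        else if ((c & 0xF0) == 0xE0)     extra = 2;   /* 3-byte sequence */
        else if (c >= 0xF0 && c <= 0xF4) extra = 3;   /* 4-byte sequence */
        else return false;                            /* not a valid lead byte */

        if (extra > n - i - 1) return false;          /* truncated sequence */
        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80) return false;
        i += extra + 1;
    }
    return true;
}

/* 'Veřejné' as UTF-8: ř is C5 99, é is C3 A9. */
static const unsigned char name_utf8[] =
    { 'V', 'e', 0xC5, 0x99, 'e', 'j', 'n', 0xC3, 0xA9 };

/* The same name in ISO 8859-2 (assumed codepage): ř is F8, é is E9. */
static const unsigned char name_latin2[] =
    { 'V', 'e', 0xF8, 'e', 'j', 'n', 0xE9 };
```

Calling is_valid_utf8 on name_utf8 succeeds, while name_latin2 fails at the 0xF8 byte, which is the kind of input that makes a UTF-8 converter raise a malformed-input error.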

Cheers,
Henry


Re: [Pharo-users] Get "Invalid utf8 input detected" error for a filename

Igor Stasenko

On 24 March 2011 16:53, Henrik Johansen <[hidden email]> wrote:

>
>
> On Mar 24, 2011, at 4:15 26PM, Denis Kudriashov wrote:
>
>> Hello,
>>
>> I have similar problem when running pharo from windows user with russian letters in login.
>> Debugger opened in SecurityManager stuff during image startUp procedure.
>> I ask in past why pharo/squeak doing SecurityManager logic. But without explanation.
>
> It's hard to tell with that as the only info.
>
> F.ex:
> If you're using Cog, it's because it doesn't convert encoding, but returns your codepage-encoded bytestring.
> If you're using Non-Cog, it's either because you
>        a) use a Locale in Pharo with non-UTF8 systemConverterClass. SecurityPlugin on Windows always returns utf8 strings.
>        b) Use Unicode characters in your name  that have have no representation in your local codepage.
>        SecurityPlugin uses the non-W api function to get your username, then does codepage -> UTF8 conversion, instead of using the W version and doing UTF16 -> UTF8 conversion.
>

Henrik, it would be good if you could help fix this in Cog.


> Cheers,
> Henry
>


--
Best regards,
Igor Stasenko AKA sig.

Re: [Pharo-users] Get "Invalid utf8 input detected" error for a filename

Eliot Miranda-2
In reply to this post by Henrik Sperre Johansen
 


On Thu, Mar 24, 2011 at 8:53 AM, Henrik Johansen <[hidden email]> wrote:


> On Mar 24, 2011, at 4:15 26PM, Denis Kudriashov wrote:
>
>> Hello,
>>
>> I have similar problem when running pharo from windows user with russian letters in login.
>> Debugger opened in SecurityManager stuff during image startUp procedure.
>> I ask in past why pharo/squeak doing SecurityManager logic. But without explanation.
>
> It's hard to tell with that as the only info.
>
> F.ex:
> If you're using Cog, it's because it doesn't convert encoding, but returns your codepage-encoded bytestring.
> If you're using Non-Cog, it's either because you
>        a) use a Locale in Pharo with non-UTF8 systemConverterClass. SecurityPlugin on Windows always returns utf8 strings.
>        b) Use Unicode characters in your name  that have have no representation in your local codepage.
>        SecurityPlugin uses the non-W api function to get your username, then does codepage -> UTF8 conversion, instead of using the W version and doing UTF16 -> UTF8 conversion.

Clearly we need to do a careful merge of the http://www.squeakvm.org/svn/squeak/branches/Cog/platforms tree with http://squeakvm.org/svn/squeak/trunk/platforms tree.  There are important improvements from Cog:
- time based on 64-bit microseconds
- better crash/logging stack backtrace reporting
- concurrent lock-free external semaphore signalling
- threaded stdio
- vm version info
- others?

There are necessary support functions for Cog
- heartbeat
- making method zone executable

There is a raft of Teleplace-specific changes.  I'm not sure which ones we want to cherry-pick.  Andreas, I'd appreciate any views you have here.  But for things like the host-window and sound plugins, I think we simply have to discard the Teleplace changes.  I don't have the cycles to maintain or port these improvements, as typically they require a more specific environment than the current environment of the standard Squeak VM.

I'm happy to do this work; it will just take time (I've at least merged my VMMaker with trunk VMMaker's HostWindowPlugin and SoundPlugin but have yet to commit).  I'm very happy to discuss with others who want to have a go at doing this themselves.  But I would like some editorial control; at least a chance to review.

best,
Eliot

> Cheers,
> Henry



Re: [Pharo-users] Get "Invalid utf8 input detected" error for a filename

Igor Stasenko

On 24 March 2011 19:09, Eliot Miranda <[hidden email]> wrote:

>
>
>
> On Thu, Mar 24, 2011 at 8:53 AM, Henrik Johansen <[hidden email]> wrote:
>>
>>
>> On Mar 24, 2011, at 4:15 26PM, Denis Kudriashov wrote:
>>
>> > Hello,
>> >
>> > I have similar problem when running pharo from windows user with russian letters in login.
>> > Debugger opened in SecurityManager stuff during image startUp procedure.
>> > I ask in past why pharo/squeak doing SecurityManager logic. But without explanation.
>>
>> It's hard to tell with that as the only info.
>>
>> F.ex:
>> If you're using Cog, it's because it doesn't convert encoding, but returns your codepage-encoded bytestring.
>> If you're using Non-Cog, it's either because you
>>        a) use a Locale in Pharo with non-UTF8 systemConverterClass. SecurityPlugin on Windows always returns utf8 strings.
>>        b) Use Unicode characters in your name  that have have no representation in your local codepage.
>>        SecurityPlugin uses the non-W api function to get your username, then does codepage -> UTF8 conversion, instead of using the W version and doing UTF16 -> UTF8 conversion.
>
> Clearly we need to do a careful merge of the http://www.squeakvm.org/svn/squeak/branches/Cog/platforms tree with http://squeakvm.org/svn/squeak/trunk/platforms tree.  There are important improvements from Cog:
> - time based on 64-bit microseconds
> - better crash/logging stack backtrace reporting
> - concurrent lock-free external semaphore signalling
> - threaded stdio
> - vm version info
> - others?
> There are necessary support functions for Cog
> - heartbeat
> - making method zone executable
> There are a raft of Teleplace-specific changes.  I'm not sure which ones we want to cherry-pick.  Andreas, I'd appreciate any views you have here.  But things like the host-window and sound plugins I think we have to simply discard the Teleplace changes.  I don't have the cycles to maintain or port these improvements as typically they require a more specific environment than the current environment of the standard Squeak VM.
> I'm happy to do this work; it will just take time (I've at least merged my VMMaker with trunk VMMaker's HostWindowPlugin and SoundPlugin but have yet to commit).  I'm very happy to discuss with others who want to have a go at doing this themselves.  But I would like some editorial control; at least a chane to review.

Don't worry, Mr. Hudson will do some editorial control :)

To everyone:

it would be nice to gather information about the changes we need to port
from the Squeak VM to Cog.
I propose that we gradually fill the Cog issue tracker with a separate entry per plugin.
I really lack specific information about this, because I was not
part of the VM development process before,
and while I can help with hacking things around, before hacking it
would be good to know what exactly requires attention :)

--
Best regards,
Igor Stasenko AKA sig.

Re: [Pharo-users] Get "Invalid utf8 input detected" error for a filename

Henrik Sperre Johansen
In reply to this post by Igor Stasenko
Igor Stasenko wrote:
> Henrik, it would be good if you can help with fixing this in Cog..

Trunk also includes lots of code for storing the secure directories etc. in the Windows registry.
Reviewing/Rewriting/Removing the code for a proposal of something that could be included in a merge of cog/trunk takes longer than I'd like, especially with limited time and lots of other balls in the air.

I've yet to actually try and compile it, to put it that way :)

Cheers,
Henry

Re: [Pharo-users] Get "Invalid utf8 input detected" error for a filename

Igor Stasenko
 
On 24 March 2011 19:37, Henrik Sperre Johansen
<[hidden email]> wrote:

>
>
> Igor Stasenko wrote:
>
> Trunk also includes lots of code for storing the secure directories etc. in
> the Windows registry.
> Reviewing/Rewriting/Removing the code for a proposal of something that could
> be included in a merge of cog/trunk takes longer than I'd like, especially
> with limited time and lots of other balls in the air.
>
> I've yet to actually try and compile it, to put it that way :)
>
Henrik, I am not proposing that you do it alone. Write a plan, put it on the issue tracker,
and we will slowly get there one day or another.
You are probably the most knowledgeable person in this area, which is why I am
asking what you think
needs to be fixed and where.
With these directions and a plan, it is only a matter of time
to get it done :)

> Cheers,
> Henry
>
> --
> View this message in context: http://forum.world.st/Re-Pharo-users-Get-Invalid-utf8-input-detected-error-for-a-filename-tp3402774p3403388.html
> Sent from the Squeak VM mailing list archive at Nabble.com.
>



--
Best regards,
Igor Stasenko AKA sig.

Re: [Pharo-users] Get "Invalid utf8 input detected" error for a filename

Henrik Sperre Johansen
 
On 24.03.2011 19:45, Igor Stasenko wrote:

>
> On 24 March 2011 19:37, Henrik Sperre Johansen
> <[hidden email]>  wrote:
>>
>> Igor Stasenko wrote:
>>
>> Trunk also includes lots of code for storing the secure directories etc. in
>> the Windows registry.
>> Reviewing/Rewriting/Removing the code for a proposal of something that could
>> be included in a merge of cog/trunk takes longer than I'd like, especially
>> with limited time and lots of other balls in the air.
>>
>> I've yet to actually try and compile it, to put it that way :)
>>
> Henrik i am not proposing to do it alone. Write a plan, put it on issue tracker
> and we will slowly get there one day or another.
> You probably must knowledgeable person in this area, so that's why i
> asking, what you think
> needs to be fixed and where.
> So, by having these directions, and plan it is only a matter of time
> to do that :)

Done, issue #14.
If someone has better suggestions for a direction to aim for than the
ones I suggested there, I'm interested.
(Likewise if someone knows of an obstacle that would make either of them
infeasible.)

Cheers,
Henry





Re: [Pharo-users] Get "Invalid utf8 input detected" error for a filename

stephane ducasse-2
In reply to this post by Eliot Miranda-2

>
> Clearly we need to do a careful merge of the http://www.squeakvm.org/svn/squeak/branches/Cog/platforms tree with http://squeakvm.org/svn/squeak/trunk/platforms tree.  There are important improvements from Cog:
> - time based on 64-bit microseconds
> - better crash/logging stack backtrace reporting
> - concurrent lock-free external semaphore signalling
> - threaded stdio
> - vm version info
> - others?

Eliot, do you know if any of these include the fix that Igor sent a long time ago for the event-input semaphore
(it is missing on Windows)? With it we will be able to remove the polling behavior for events.

Stef

Re: [Pharo-users] Get "Invalid utf8 input detected" error for a filename

Eliot Miranda-2
 


On Thu, Mar 24, 2011 at 2:12 PM, stephane ducasse <[hidden email]> wrote:

>
> Clearly we need to do a careful merge of the http://www.squeakvm.org/svn/squeak/branches/Cog/platforms tree with http://squeakvm.org/svn/squeak/trunk/platforms tree.  There are important improvements from Cog:
> - time based on 64-bit microseconds
> - better crash/logging stack backtrace reporting
> - concurrent lock-free external semaphore signalling
> - threaded stdio
> - vm version info
> - others?

> Elliot do you know if any of these issues include the fix that igor sent long time ago about the semaphore
> for event input (because it is missing on windows) and with it we will be able to remove the polling behavior for events.

No idea. Sorry.

> Stef


Re: [Pharo-users] Get "Invalid utf8 input detected" error for a filename

Igor Stasenko

On 24 March 2011 22:12, Eliot Miranda <[hidden email]> wrote:

>
>
>
> On Thu, Mar 24, 2011 at 2:12 PM, stephane ducasse <[hidden email]> wrote:
>>
>> >
>> > Clearly we need to do a careful merge of the http://www.squeakvm.org/svn/squeak/branches/Cog/platforms tree with http://squeakvm.org/svn/squeak/trunk/platforms tree.  There are important improvements from Cog:
>> > - time based on 64-bit microseconds
>> > - better crash/logging stack backtrace reporting
>> > - concurrent lock-free external semaphore signalling
>> > - threaded stdio
>> > - vm version info
>> > - others?
>>
>> Elliot do you know if any of these issues include the fix that igor sent long time ago about the semaphore
>> for event input (because it is missing on windows) and with it we will be able to remove the polling behavior for events.
>
> No idea. Sorry.

Here it is:
primSetInputSemaphore: semaIndex
        "Set the input semaphore the VM should use for asynchronously
signaling the availability of events. Primitive. Optional."
        <primitive: 93>
        ^nil


On unix (platforms/unix/vm/sqUnixMain.c):
/* set asynchronous input event semaphore  */

sqInt ioSetInputSemaphore(sqInt semaIndex)
{
  if ((semaIndex == 0) || (noEvents == 1))
    success(false);
  else
    inputEventSemaIndex= semaIndex;
  return true;
}

and then you can see there are calls to:

static void signalInputEvent(void)
{
#if DEBUG_EVENTS
  printf("signalInputEvent\n");
#endif
  if (inputEventSemaIndex > 0)
    signalSemaphoreWithIndex(inputEventSemaIndex);
}


On win32 (platforms/win32/sqWin32Window.c)

There is a function which sets it up:

int ioSetInputSemaphore(int semaIndex) {
  inputSemaphoreIndex = semaIndex;
  return 1;
}

but there is no code that signals it.

There is only code that looks like:

 if(inputSemaphoreIndex) {
      recordMouseEvent(lastMessage, nrClicks);
      break;
 }

The fix is ultimately trivial, as you may guess:


sqInputEvent *sqNextEventPut(void) {
  sqInputEvent *evt;
  evt = eventBuffer + eventBufferPut;
  eventBufferPut = (eventBufferPut + 1) % MAX_EVENT_BUFFER;
  if (eventBufferGet == eventBufferPut) {
    /* buffer overflow; drop the last event */
    printf("WARNING: event buffer overflow\n");
    eventBufferGet = (eventBufferGet + 1) % MAX_EVENT_BUFFER;
  }
       
+++  if(inputSemaphoreIndex)
+++    signalSemaphoreWithIndex(inputSemaphoreIndex);

  return evt;
}
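
The effect of the two added lines can be tried outside the VM with a small stand-alone sketch. Everything here is a stand-in and an assumption for illustration: the sqInputEvent struct is reduced to one field, the buffer is only 4 slots, and signalSemaphoreWithIndex is a stub that just counts calls (the counter signalCount is invented for this example) instead of signalling a real Smalltalk semaphore.

```c
#include <stdio.h>

#define MAX_EVENT_BUFFER 4              /* tiny on purpose, for the sketch */

typedef struct { int type; } sqInputEvent;   /* stand-in for the VM struct */

static sqInputEvent eventBuffer[MAX_EVENT_BUFFER];
static int eventBufferGet = 0;
static int eventBufferPut = 0;
static int inputSemaphoreIndex = 0;     /* 0 means "no semaphore registered" */
static int signalCount = 0;             /* hypothetical counter for the stub */

/* Stub: the real VM function signals an external semaphore by index. */
static int signalSemaphoreWithIndex(int index)
{
    (void)index;
    signalCount++;
    return 1;
}

int ioSetInputSemaphore(int semaIndex)
{
    inputSemaphoreIndex = semaIndex;
    return 1;
}

/* Reserve the next slot in the ring buffer and, if a semaphore has been
   registered, signal it so the image does not have to poll for events. */
sqInputEvent *sqNextEventPut(void)
{
    sqInputEvent *evt = eventBuffer + eventBufferPut;
    eventBufferPut = (eventBufferPut + 1) % MAX_EVENT_BUFFER;
    if (eventBufferGet == eventBufferPut) {
        /* buffer overflow; advance the read index past the oldest event */
        fprintf(stderr, "WARNING: event buffer overflow\n");
        eventBufferGet = (eventBufferGet + 1) % MAX_EVENT_BUFFER;
    }
    if (inputSemaphoreIndex)
        signalSemaphoreWithIndex(inputSemaphoreIndex);
    return evt;
}
```

With no semaphore registered, sqNextEventPut() only queues the event; after ioSetInputSemaphore(7), every call also bumps signalCount once, which mirrors what the image side would see as a semaphore signal per event.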




--
Best regards,
Igor Stasenko AKA sig.

Re: [Pharo-users] Get "Invalid utf8 input detected" error for a filename

Igor Stasenko

On 24 March 2011 23:11, Igor Stasenko <[hidden email]> wrote:

> On 24 March 2011 22:12, Eliot Miranda <[hidden email]> wrote:
>>
>>
>>
>> On Thu, Mar 24, 2011 at 2:12 PM, stephane ducasse <[hidden email]> wrote:
>>>
>>> >
>>> > Clearly we need to do a careful merge of the http://www.squeakvm.org/svn/squeak/branches/Cog/platforms tree with http://squeakvm.org/svn/squeak/trunk/platforms tree.  There are important improvements from Cog:
>>> > - time based on 64-bit microseconds
>>> > - better crash/logging stack backtrace reporting
>>> > - concurrent lock-free external semaphore signalling
>>> > - threaded stdio
>>> > - vm version info
>>> > - others?
>>>
>>> Elliot do you know if any of these issues include the fix that igor sent long time ago about the semaphore
>>> for event input (because it is missing on windows) and with it we will be able to remove the polling behavior for events.
>>
>> No idea. Sorry.
>
The fix is in here:
http://gitorious.org/~abrabapupa/cogvm/sig-cog/commit/6dd21e4fc981e16b3f7b5d895d54f4bb22e001f7

Tested on win with Pharo image:

InputEventSensor installEventSensorFramework: InputEventFetcher


--
Best regards,
Igor Stasenko AKA sig.