CompiledMethodTrailers ready for use

Igor Stasenko

CompiledMethodTrailers ready for use

Hello,
I finished this stuff, and it's ready for adoption.

See http://bugs.squeak.org/view.php?id=7428

Does anyone want to help push it into the trunk update stream (using MC configs)?

It works fine on a recent trunk image.
On Pharo, however, I had some problems installing the changes because of
some differences.

Tried on PharoCore-1.1-11106-ALPHA.image

phase2.1.cs
- do not file in the TextEditor changes, since Pharo-core doesn't have it.
- do not file in the last line (reorganizing).

- tests fail because the Pharo String class implements neither
#squeakToUtf8
nor
#utf8ToSqueak

Do we have a uniform way to encode ANY String -> ByteString (utf8)
and back? What does the ANSI standard say about it? Maybe I'm using the wrong methods?

Still, I think we need this thing standardized and common to all
dialects (not just Pharo/Squeak).

--
Best regards,
Igor Stasenko AKA sig.

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Igor Stasenko

Re: CompiledMethodTrailers ready for use

Update:
- fixed the utf8 stuff by using #convertToEncoding: / #convertFromEncoding:
- @Pharoers: do not file in the reorganize crap attached in the *phase* and
*cleanup* changesets.

There is an issue with the #defaultMethodTrailer implementation, which I
forgot to change.
In trunk, I changed it in TCompilingBehavior,
but in Pharo there's no such trait.
There is an additional .cs to fix that
(see the notes on Mantis).

Pfff.. I hope I didn't miss anything this time :)

--
Best regards,
Igor Stasenko AKA sig.

Henrik Sperre Johansen

Re: CompiledMethodTrailers ready for use

In reply to this post by Igor Stasenko

On 20.12.2009 20:04, Igor Stasenko wrote:
> Hello,
> I finished this stuff, and it's ready for adoption.
>
Nice!

> Do we have a uniform way to encode ANY String -> ByteString (utf8)
> and back? What does the ANSI standard say about it? Maybe I'm using the wrong methods?
>

"3.4.6.4 - It is erroneous if stringBody contains any characters that
does not exist in the implementation
defined execution character set used in the representation of character
objects."
So, implementation defined.
As far as I know, every internal String in Squeak and Pharo should be either
latin1 (ByteString) or utf32 (WideString), with the high byte used to
distinguish the language of the string.
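
To make the "high byte" idea concrete, here is a small Python sketch (Python rather than Smalltalk, purely as an illustration). It takes "high byte" literally as bits 24-31 of a 32-bit value; that shift and the names pack_char, unpack_char and fits_byte_string are assumptions made for this sketch, and the actual layout in a Squeak or Pharo image may well differ.

```python
# Hypothetical layout: language tag in the top byte, code point below it.
TAG_SHIFT = 24                      # assumption: "high byte" of a 32-bit value
CODE_MASK = (1 << TAG_SHIFT) - 1    # the low 24 bits hold the code point


def pack_char(language_tag: int, code_point: int) -> int:
    """Combine a language tag (0-255) and a code point into one integer."""
    if not 0 <= language_tag <= 0xFF:
        raise ValueError("language tag must fit in one byte")
    if not 0 <= code_point <= CODE_MASK:
        raise ValueError("code point does not fit below the tag")
    return (language_tag << TAG_SHIFT) | code_point


def unpack_char(value: int) -> tuple[int, int]:
    """Split a packed value back into (language_tag, code_point)."""
    return value >> TAG_SHIFT, value & CODE_MASK


def fits_byte_string(values: list[int]) -> bool:
    """True if every value is an untagged Latin-1 code (0-255)."""
    return all(v <= 0xFF for v in values)
```

With this layout, a string whose characters all have tag 0 and codes below 256 can live in a byte-per-character string, while anything else needs the wide form.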

To me, sending squeakToUtf8 and then using StandardFileStream instead of
FileStream seems safe.
As long as the ByteString's bytes are utf8, utf8ToSqueak works (and in
most other cases as well).
In fact, for non-utf8 strings it's safer than UTF8Decoder, which does
not perform validity checks (it only reads the total number of bytes) when
encountering bytes > 127.
The reason it seems meant mostly for internal use (to me) is that it
silently falls back to assuming the string is already in latin1 (i.e., the
"valid" ByteString format) instead of raising an error like the stream
decoder does. (That error, by the way, would be much nicer if it were a
MalformedUTF8Error or some such...)

ws := StandardFileStream newFileNamed: 'test.txt'.
"Save as latin1"
ws nextPutAll: 'ååå'.
ws close.

"Read with UTF8Decoder"
rs := FileStream oldFileNamed: 'test.txt'.
"Print this, gives a ?"
rs contents.
rs close.

"Read with Latin1Decoder"
rs := StandardFileStream oldFileNamed: 'test.txt'.
"Print this, gives ååå, since it's not valid utf8 and is thus assumed to be latin1"
rs contents utf8ToSqueak.
rs close.
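
The two behaviours contrasted above (a decoder that raises on malformed input versus one that silently falls back to latin1) can be sketched in Python. This is an analogue under stated assumptions, not the Squeak implementation: decode_strict and decode_with_latin1_fallback are names invented here, and Python's own utf-8 codec does validate its input.

```python
def decode_strict(data: bytes) -> str:
    """Decode UTF-8 and raise UnicodeDecodeError on malformed input."""
    return data.decode("utf-8")


def decode_with_latin1_fallback(data: bytes) -> str:
    """Try UTF-8 first; if the bytes are not valid UTF-8, treat them as Latin-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")


latin1_bytes = "ååå".encode("latin-1")   # three 0xE5 bytes, not valid UTF-8
utf8_bytes = "ååå".encode("utf-8")       # two bytes per character
```

The fallback reads both byte sequences back as 'ååå', which is convenient, but it is ambiguous by design: latin1 text that happens to form valid utf8 (for example the two bytes of 'Ã¥') is quietly decoded as something else ('å'). That is one reason to keep such a method for internal use.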
> Still, I think we need this thing standardized and common to all
> dialects (not just Pharo/Squeak).
>
There's really only one way to store characters in a ByteArray (ie.
ByteString) and call it utf8 encoded.
As far as I can tell, Squeak seems to do the right thing :)
I believe Nicolas pushed for implementation in Pharo some time ago, not
sure what happened to that.

Cheers,
Henry

Igor Stasenko

Re: CompiledMethodTrailers ready for use

I seem to have solved this by using #convertToEncoding: / #convertFromEncoding:.
The tests work fine after that. However, I haven't yet tried using source
that contains non-Latin1 characters.

--
Best regards,
Igor Stasenko AKA sig.

Henrik Sperre Johansen

Re: CompiledMethodTrailers ready for use

On 20.12.2009 22:07, Igor Stasenko wrote:

> I seem to have solved this by using #convertToEncoding: / #convertFromEncoding:.
> The tests work fine after that. However, I haven't yet tried using source
> that contains non-Latin1 characters.
>

Converting to utf8 from a ByteString/WideString should not be a problem,
as long as you know the ByteString encoding is latin1. (Which it should be
if you created it by any normal means.)
As long as you are SURE the string you are decoding is utf8 (like, when
you've encoded them all yourself ;) ), convertFromEncoding: shouldn't be
a problem either. (See my previous mail: it's the same decoder as used by
FileStream, so it lacks the validity checks.)
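
The round trip described here, encoding strings yourself and decoding only what you encoded, looks like this as a Python sketch. to_utf8 and from_utf8 are illustrative names chosen for this example; they stand in for, but are not, #convertToEncoding: and #convertFromEncoding:. Note also that Python's decoder validates its input, unlike the decoder discussed in this thread.

```python
def to_utf8(text: str) -> bytes:
    """Encode any string (Latin-1 range or wider) to UTF-8 bytes."""
    return text.encode("utf-8")


def from_utf8(data: bytes) -> str:
    """Decode bytes that are known to be UTF-8; malformed input raises."""
    return data.decode("utf-8")


# Strings covering the ASCII, Latin-1 and wide ranges.
samples = ["plain ascii", "ååå", "日本語"]
round_tripped = [from_utf8(to_utf8(s)) for s in samples]
```

Because every input to from_utf8 was produced by to_utf8, the missing validity checks in a laxer decoder would never be exercised on this path.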

Cheers,
Henry

Igor Stasenko

Re: CompiledMethodTrailers ready for use

2009/12/20 Henrik Sperre Johansen <[hidden email]>:

> Converting to utf8 from a ByteString/WideString should not be a problem,
> as long as you know the ByteString encoding is latin1. (Which it should be
> if you created it by any normal means.)
> As long as you are SURE the string you are decoding is utf8 (like, when
> you've encoded them all yourself ;) ), convertFromEncoding: shouldn't be
> a problem either. (See my previous mail: it's the same decoder as used by
> FileStream, so it lacks the validity checks.)
>

Ok, thanks for the clarification.

I also found other places in Pharo where #(0 0 0 0) is used
as a trailer, in
addTraitSelector: aSymbol withMethod: aCompiledMethod

This needs to be fixed (as do all other places that try to use
arrays to define a trailer).
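
To show why call sites should pass a trailer object rather than a literal array, here is a toy Python sketch. Everything in it is hypothetical: the kinds, the byte layout and the names MethodTrailer and append_trailer are invented for illustration and do not mirror the real CompiledMethodTrailer.

```python
from dataclasses import dataclass

# Hypothetical trailer kinds; the real class defines its own set.
KIND_NO_SOURCE = 0
KIND_SOURCE_POINTER = 1


@dataclass(frozen=True)
class MethodTrailer:
    kind: int = KIND_NO_SOURCE
    payload: bytes = b""

    def encode(self) -> bytes:
        """Serialize as payload, then payload length, then kind byte."""
        if len(self.payload) > 0xFF:
            raise ValueError("payload too long for this toy layout")
        return self.payload + bytes([len(self.payload), self.kind])

    @classmethod
    def decode(cls, method_bytes: bytes) -> "MethodTrailer":
        """Read a trailer back from the end of a byte sequence."""
        length, kind = method_bytes[-2], method_bytes[-1]
        end = len(method_bytes) - 2
        return cls(kind=kind, payload=bytes(method_bytes[end - length:end]))


def append_trailer(bytecodes: bytes, trailer: MethodTrailer) -> bytes:
    """Callers pass a trailer object instead of a literal like (0, 0, 0, 0)."""
    return bytecodes + trailer.encode()
```

Once the encoding lives in one class, a call site that hard-codes four zero bytes no longer matches whatever layout the trailer class chooses, which is why such places have to be found and changed.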

--
Best regards,
Igor Stasenko AKA sig.
