Base64 encoding + UTF8 ?

fstephany
I may have hit a problem with Base64 encoding in Pharo.
It seems that Pharo does not use UTF-8 for its Base64 encoding.

I'm probably missing something related to Base64 encoding...

In Pharo 3.0:

ZnBase64Encoder new encode: 'tamèreenslipdeguerre' asByteArray.
-> 'dGFt6HJlZW5zbGlwZGVndWVycmU='

'tamèreenslipdeguerre' base64Encoded.
-> 'dGFt6HJlZW5zbGlwZGVndWVycmU='

In Ruby 2.0:

Base64.strict_encode64("tamèreenslipdeguerre")
-> dGFtw6hyZWVuc2xpcGRlZ3VlcnJl

http://www.motobit.com/util/base64-decoder-encoder.asp
(iso-8859-1)
tamèreenslipdeguerre -> dGFt6HJlZW5zbGlwZGVndWVycmU=

http://www.motobit.com/util/base64-decoder-encoder.asp
(utf8)
tamèreenslipdeguerre -> dGFtw6hyZWVuc2xpcGRlZ3VlcnJl

Re: Base64 encoding + UTF8 ?

Sven Van Caekenberghe-2
Bonsoir François,

From the class comment of ZnBase64Encoder:

[...]
Note that to encode a String as Base64, you first have to encode the characters as bytes using a character encoder.
[...]

Sending #asByteArray to a String is the same as doing no encoding (or doing null encoding).
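
As a rough illustration in Ruby rather than Pharo (chosen only because the snippet is easy to run; the variable names are made up for this sketch), "no encoding" amounts to writing each character's code point out as a single byte. For this particular string that coincides with its ISO-8859-1 bytes, whereas UTF-8 spends two bytes on the è:

```ruby
# Same text as in the thread, with è written as an escape so the
# snippet does not depend on the source file's encoding.
s = "tam\u00E8reenslipdeguerre"

code_points  = s.codepoints                  # one integer per character
latin1_bytes = s.encode("ISO-8859-1").bytes  # one byte per character
utf8_bytes   = s.encode("UTF-8").bytes       # è becomes two bytes (0xC3 0xA8)

puts code_points == latin1_bytes  # true: code points and Latin-1 bytes coincide here
puts latin1_bytes.length          # 20
puts utf8_bytes.length            # 21
```

This only mirrors the idea; it says nothing about how #asByteArray is implemented in Pharo.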

Consider:

ZnBase64Encoder new encode: (ZnUTF8Encoder new encodeString: 'tamèreenslipdeguerre').

=> 'dGFtw6hyZWVuc2xpcGRlZ3VlcnJl'

ZnBase64Encoder new encode: (ZnByteEncoder iso88591 encodeString: 'tamèreenslipdeguerre').

=> 'dGFt6HJlZW5zbGlwZGVndWVycmU='

ZnBase64Encoder new encode: (ZnNullEncoder new encodeString: 'tamèreenslipdeguerre').

=> 'dGFt6HJlZW5zbGlwZGVndWVycmU='

The last two are often the same, and thus equivalent to #asByteArray, but not always.
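
For comparison, here is a minimal Ruby sketch of the same experiment (Ruby because it already appears in this thread; the variable names are illustrative). It reproduces the two Base64 strings quoted above by choosing the character encoding explicitly before Base64-encoding. The last part shows Ruby's behaviour only: a character outside ISO-8859-1, such as €, cannot be converted to it at all. Whether that is the case meant by "not always" in Pharo is an assumption, not something the thread confirms.

```ruby
require "base64"

s = "tam\u00E8reenslipdeguerre"

# Step 1: characters -> bytes (character encoding).
# Step 2: bytes -> Base64 text.
utf8_b64   = Base64.strict_encode64(s.encode("UTF-8"))
latin1_b64 = Base64.strict_encode64(s.encode("ISO-8859-1"))

puts utf8_b64    # dGFtw6hyZWVuc2xpcGRlZ3VlcnJl
puts latin1_b64  # dGFt6HJlZW5zbGlwZGVndWVycmU=

# A character with no ISO-8859-1 representation makes step 1 fail in Ruby.
begin
  "\u20AC".encode("ISO-8859-1")
  euro_fits = true
rescue Encoding::UndefinedConversionError
  euro_fits = false
end
puts euro_fits   # false
```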

HTH,

Sven

On 11 Jun 2014, at 21:56, François Stephany <[hidden email]> wrote:

> I might hit some problem with Base64 encoding in there.
> It seems that Pharo does not use UTF8 for its Base64 encoding.
>
> I'm probably missing something related to Base64 encoding...
>
> In Pharo 3.0:
>
> ZnBase64Encoder new encode: 'tamèreenslipdeguerre' asByteArray.
> -> 'dGFt6HJlZW5zbGlwZGVndWVycmU='
>
> 'tamèreenslipdeguerre' base64Encoded.
> -> 'dGFt6HJlZW5zbGlwZGVndWVycmU='
>
> In Ruby 2.0:
>
> Base64.strict_encode64("tamèreenslipdeguerre")
> -> dGFtw6hyZWVuc2xpcGRlZ3VlcnJl
>
> http://www.motobit.com/util/base64-decoder-encoder.asp
> (iso-8859-1)
> tamèreenslipdeguerre -> dGFt6HJlZW5zbGlwZGVndWVycmU=
>
> http://www.motobit.com/util/base64-decoder-encoder.asp
> (utf8)
> tamèreenslipdeguerre -> dGFtw6hyZWVuc2xpcGRlZ3VlcnJl



Re: Base64 encoding + UTF8 ?

fstephany
Oh stupid me! 
Thanks a lot Sven, crystal clear explanations (and class comments!), as always :)







Re: Base64 encoding + UTF8 ?

Henrik Sperre Johansen
In reply to this post by Sven Van Caekenberghe-2

On 11 Jun 2014, at 10:28, Sven Van Caekenberghe <[hidden email]> wrote:

> Bonsoir François,
>
> From the class comment of ZnBase64Encoder:
>
> [...]
> Note that to encode a String as Base64, you first have to encode the characters as bytes using a character encoder.
> [...]
>
> Sending #asByteArray to a String is the same as doing no encoding (or doing null encoding).
>
> Consider:
>
> ZnBase64Encoder new encode: (ZnUTF8Encoder new encodeString: 'tamèreenslipdeguerre').
>
> => 'dGFtw6hyZWVuc2xpcGRlZ3VlcnJl'
>
> ZnBase64Encoder new encode: (ZnByteEncoder iso88591 encodeString: 'tamèreenslipdeguerre').
>
> => 'dGFt6HJlZW5zbGlwZGVndWVycmU='
>
> ZnBase64Encoder new encode: (ZnNullEncoder new encodeString: 'tamèreenslipdeguerre').
>
> => 'dGFt6HJlZW5zbGlwZGVndWVycmU='
>
> The last two are often the same, and thus equivalent to #asByteArray, but not always.
>
> HTH,
>
> Sven

In other words, Base64 isn’t really a character encoding; it’s a transfer format whose purpose is to transmit only bytes with "safe" values, ones that have no chance of being interpreted as control sequences by the protocols carrying them.

Encoding Strings -> Bytes is a separate concern.
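
To make that separation concrete, here is a small Ruby sketch (Ruby only because it already appears in this thread; the variable names are illustrative). Decoding Base64 gives back raw bytes with no character encoding attached, so the receiver has to know which encoding the sender used to get the original text back:

```ruby
require "base64"

# The ISO-8859-1 result quoted earlier in the thread.
b64 = "dGFt6HJlZW5zbGlwZGVndWVycmU="

# The Base64 text itself uses only "safe" ASCII characters.
puts b64.match?(/\A[A-Za-z0-9+\/]*={0,2}\z/)   # true

# Decoding yields binary data, not text.
raw = Base64.strict_decode64(b64)
puts raw.encoding == Encoding::BINARY          # true

# Interpreting the bytes is a separate, explicit step.
as_latin1 = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")
as_utf8   = raw.dup.force_encoding("UTF-8")

puts as_latin1 == "tam\u00E8reenslipdeguerre"  # true: right guess
puts as_utf8.valid_encoding?                   # false: wrong guess, invalid UTF-8
```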

Cheers,
Henry
