About encoding

Stephane Ducasse-3
Hi Sven,

The website I was using removed the file for my book,
so I copied the file to GitHub.
When I open the file with TextMate, it says that the encoding is western-latin1,
but when I try to load it as follows I get an illegal UTF-8 error.

| lines |
lines := (ZnDefaultCharacterEncoder
  value: ZnCharacterEncoder latin1
  during: [
    ZnClient new
      get: 'https://raw.githubusercontent.com/SquareBracketAssociates/LearningOOPWithPharo/master/resources/listeDeMotsFrancaisFrGut.txt'
]) lines.

Do you have any idea?

Tx

Stef

Re: About encoding

Sven Van Caekenberghe-2

> On 26 Sep 2017, at 17:25, Stephane Ducasse <[hidden email]> wrote:
>
> Hi Sven,
>
> The website I was using removed the file for my book,
> so I copied the file to GitHub.
> When I open the file with TextMate, it says that the encoding is western-latin1,
> but when I try to load it as follows I get an illegal UTF-8 error.
>
> | lines |
> lines := (ZnDefaultCharacterEncoder
>  value: ZnCharacterEncoder latin1
>  during: [
>    ZnClient new
>      get: 'https://raw.githubusercontent.com/SquareBracketAssociates/LearningOOPWithPharo/master/resources/listeDeMotsFrancaisFrGut.txt'
> ]) lines.
>
> Do you have any idea?
>
> Tx
>
> Stef

Any chance you can point me to the original file?

The file is indeed Latin-1 encoded, but GitHub serves it as UTF-8 (it did not change the contents, only the metadata).

The default encoder option only applies when the server says nothing about the encoding; it does not override what the server says.

The only way to read it is to read it as binary (which basically ignores the metadata) and then convert it manually:

(ZnCharacterEncoder latin1 decodeBytes:
  (ZnClient new
        beBinary;
        get: 'https://raw.githubusercontent.com/SquareBracketAssociates/LearningOOPWithPharo/master/resources/listeDeMotsFrancaisFrGut.txt')) lines.

But this is very ugly.

Best convert the original file to UTF-8 before uploading to GitHub.

Sven
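
A minimal sketch of why the load fails, assuming a Pharo image with Zinc loaded. The sample bytes (the word 'élan' in Latin-1) and the variable names are hypothetical and only stand in for the real word list. It shows that bytes written as Latin-1 decode fine with the Latin-1 encoder but are rejected by the UTF-8 encoder, which is the situation created when the server labels a Latin-1 file as UTF-8:

```smalltalk
| latin1Bytes asLatin1 utf8Failed |
"Hypothetical sample: 'élan' as Latin-1 bytes; 233 (16rE9) is 'é'."
latin1Bytes := #[233 108 97 110].

"Decoding with the encoding the bytes were written in works."
asLatin1 := ZnCharacterEncoder latin1 decodeBytes: latin1Bytes.

"Decoding the same bytes as UTF-8 fails: 16rE9 announces a multi-byte
 sequence, but the next byte (108, 'l') is not a continuation byte."
utf8Failed := [ ZnCharacterEncoder utf8 decodeBytes: latin1Bytes. false ]
	on: Error
	do: [ :e | true ].
```

Here `asLatin1` holds the four-character string 'élan' and `utf8Failed` is true. Catching the generic `Error` is a simplification for the sketch; the exact error class signalled by Zinc may differ between versions.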



Re: About encoding

Stephane Ducasse-3
> Any chance you can point me to the original file?

No, they removed it.
Maybe I could try to convert it to UTF-8 (I do not know how to do it).

> The file is indeed Latin-1 encoded, but GitHub serves it as UTF-8 (it did not change the contents, only the metadata).

OK, I see the problem.

> The default encoder option only applies when the server says nothing about the encoding; it does not override what the server says.

Ah, OK.

> The only way to read it is to read it as binary (which basically ignores the metadata) and then convert it manually:
>
> (ZnCharacterEncoder latin1 decodeBytes:
>   (ZnClient new
>         beBinary;
>         get: 'https://raw.githubusercontent.com/SquareBracketAssociates/LearningOOPWithPharo/master/resources/listeDeMotsFrancaisFrGut.txt')) lines.
>
> But this is very ugly.
>
> Best convert the original file to UTF-8 before uploading to GitHub.

OK, I will try to learn how to do it.

Re: About encoding

Sven Van Caekenberghe-2
You can convert it in Pharo, of course:

(FileLocator desktop / 'mots.txt') writeStreamDo: [ :out |
    out << (ZnCharacterEncoder latin1 decodeBytes:
        (ZnClient new
            beBinary;
            get: 'https://raw.githubusercontent.com/SquareBracketAssociates/LearningOOPWithPharo/master/resources/listeDeMotsFrancaisFrGut.txt')) ].

You just take the string as it is in Pharo and write it out to a file; by default it is written UTF-8 encoded.
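
To make the byte-level effect of that conversion visible, here is a small sketch that avoids the network and the file system. The sample bytes and variable names are hypothetical, and the explicit `ZnCharacterEncoder utf8 encodeString:` call only stands in for the encoding that the file stream applies on its own:

```smalltalk
| text utf8Bytes |
"Hypothetical sample: 'élan' as Latin-1 bytes, decoded into a Pharo string."
text := ZnCharacterEncoder latin1 decodeBytes: #[233 108 97 110].

"Encode the same string as UTF-8: 'é' (U+00E9) becomes the two bytes 195 169."
utf8Bytes := ZnCharacterEncoder utf8 encodeString: text.
"utf8Bytes = #[195 169 108 97 110]"
```

The string itself is unchanged by the conversion; only its byte representation differs, which is why the UTF-8 version is one byte longer than the Latin-1 original.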



Re: About encoding

Stephane Ducasse-3
I'm reading your chapter :)
Now I understand the file I found is totally bogus :)
But the first one I found is indeed encoded in latin1.
So I'm trying to convert it.





Re: About encoding

Stephane Ducasse-3
Now inspecting the file contents opens the GTInspector and freezes Pharo :(
I think that I will remove this part of my book.
It is simpler.


Re: About encoding

Stephane Ducasse-3
Here is a script that should work to convert from latin1 to utf-8.
Thanks to your book and trial and error.

| str wstr |
"Read the whole file as Latin-1 and split it into lines."
str := ('listeDeMotsFrancaisFrGut.txt' asFileReference readStreamDo: [ :in |
    (ZnCharacterReadStream on: in binary encoding: #latin1) upToEnd ]) lines.

"Write the lines back out UTF-8 encoded, each one terminated by CRLF."
'listeDeMotsFrancaisFrGutUTF8.txt' asFileReference writeStreamDo: [ :out |
    wstr := ZnCharacterWriteStream on: out binary encoding: #utf8.
    str do: [ :each | wstr nextPutAll: each. wstr crlf ] ].


Re: About encoding

Sven Van Caekenberghe-2

> On 26 Sep 2017, at 18:09, Stephane Ducasse <[hidden email]> wrote:
>
> Here is a script that should work to convert from latin1 to utf-8.
> Thanks to your book and trial and error.
>
> | str wstr |
> str := ('listeDeMotsFrancaisFrGut.txt' asFileReference readStreamDo: [ :in |
>   (ZnCharacterReadStream on: in binary encoding: #latin1)
>      upToEnd ]) lines.
>
> 'listeDeMotsFrancaisFrGutUTF8.txt' asFileReference writeStreamDo: [ :out |
>   wstr := (ZnCharacterWriteStream on: out binary encoding: #utf8).
> str do: [ :each | wstr nextPutAll: each. wstr crlf. ].
> ].

Yes, that is correct (and it uses the newer encoders in both directions).
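
The same stream-based conversion can be tried in memory before touching any files. This is only a sketch: the sample bytes and variable names are hypothetical, and in-memory byte streams replace the binary file streams of the script:

```smalltalk
| latin1Bytes text utf8Bytes roundTrip |
"Hypothetical sample: 'élan' as Latin-1 bytes."
latin1Bytes := #[233 108 97 110].

"Decode through a character read stream, as the script does for the file."
text := (ZnCharacterReadStream on: latin1Bytes readStream encoding: #latin1) upToEnd.

"Encode through a character write stream into a fresh ByteArray."
utf8Bytes := ByteArray streamContents: [ :out |
	(ZnCharacterWriteStream on: out encoding: #utf8) nextPutAll: text ].

"Reading the UTF-8 bytes back should give the same string."
roundTrip := (ZnCharacterReadStream on: utf8Bytes readStream encoding: #utf8) upToEnd.
```

If `roundTrip = text` answers true, the two stream directions agree, which is a cheap sanity check to run before converting a large file.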



Re: About encoding

Stephane Ducasse-3
So it works now. I converted the files to UTF-8 and now the book is not
broken anymore.
Thanks, Sven, for at least taking the time to reply to my email. It helps
my morale.
Stef
