ZnInvalidUTF8 on response from squeaksource

Patrick R.
Hi everyone,

I have been working on bringing http://squeaksource.com/ical/ up to speed for Squeak and wanted to make sure that it also works for Pharo. Therefore, I have created a travis build job for Squeak and Pharo (https://travis-ci.org/codeZeilen/ical-smalltalk/jobs/211298950) which pulls the source from squeaksource.com.

Now the issue is that loading the package in Pharo fails with a GoferException wrapping a ZnInvalidUTF8 Exception. We figured that this might be the result of the squeaksource page delivering the page as iso-8859-1 as it contains special characters. Any ideas on how to get this to work? I do not have access to the ical repository description and I would like to avoid mirroring the whole repository on GitHub.

Thanks and bests
Patrick

Re: ZnInvalidUTF8 on response from squeaksource

Ben Coman



In a fresh 60437 image, in Playground evaluating...

  Metacello new
       configuration: 'ICal';
       repository: 'github://codeZeilen/ical-smalltalk:master/repository';
       onConflict: [:ex | ex allow];
       load.
  ==> Could not resolve: ICal-Core [ICal-Core-PaulDeBruicker.5] in /home/ben/.local/share/Pharo/images/60437-01/pharo-local/package-cache http://squeaksource.com/ical ERROR: 'GoferRepositoryError: Could not access http://squeaksource.com/ical: ZnInvalidUTF8: Illegal continuation byte for utf-8 encoding'


In a new fresh 60437 Image (i.e. empty package-cache)
  World menu > Monticello > +Repository > squeaksource.com...
     MCSqueaksourceRepository
        location: 'http://squeaksource.com/ical'
        user: ''
        password: ''
   ==> open repository then errors "MCRepositoryError: Could not access http://squeaksource.com/ical: ZnInvalidUTF8: Illegal continuation byte for utf-8 encoding"


In Chrome, opening http://www.squeaksource.com/ical
then clicking <Versions>
and the browser's View Page Source,
I see...
   <?xml version="1.0" encoding="iso-8859-1"?>

Googling: zinc iso-8859-1
finds... http://forum.world.st/Problem-using-Zinc-in-Pharo-4-Moose-5-1-td4825329.html
but "ZnByteEncoder iso88591"
errors with "KeyNotFound: key 'iso88591' not found in Dictionary"
and inspecting "ZnByteEncoder byteTextConverters keys sorted"
confirms this key is missing (@Sven, I'm curious why this was removed?)


Now https://en.wikipedia.org/wiki/ISO/IEC_8859-1
indicates IBM819 is an alias
and " ZnByteEncoder newForEncoding: 'ibm819' "
works okay

So in MCHttpRepository>>#loadAllFileNames
changing...
         queryAt: 'C' put: 'M;O=D' ;
         get.
to...
         queryAt: 'C' put: 'M;O=D' .
         ZnDefaultCharacterEncoder 
              value: (ZnByteEncoder newForEncoding: 'ibm819')
              during: [client get].

Then from Monticello opening the previously defined http://squeaksource.com/ical
works!!


Now I was hoping that reverting #loadAllFileNames
and in Playground doing...
    converters := ZnByteEncoder byteTextConverters.
    converters at: 'iso-8859-1' put: (converters at: 'ibm819').
might alleviate the problem, but no luck.


Anyone know a better way to deal with this than hardcoding the encoding into #loadAllFileNames?

cheers -ben

Re: ZnInvalidUTF8 on response from squeaksource

David T. Lewis
squeaksource.com is still running on a quite old image, and I know that it
has problems with multibyte characters. If you are seeing problems related
to this, it's not the fault of Zinc.

If you can confirm that this is what is happening, then I guess it is time
to update that trusty old squeaksource.com image :-)

Dave


Re: ZnInvalidUTF8 on response from squeaksource

Sven Van Caekenberghe-2
Hi,

This is a recurring issue. The problem is that the server serves a resource, in this case text/html, without specifying its encoding. Today, when no encoding is specified, we default to UTF-8. In this case the server silently serves a resource which is ISO-8859-1 encoded.

The error is triggered by accessing the following URL:

ZnClient new get: 'http://squeaksource.com/ical/?C=M;O%3DD'; yourself.

If you inspect the response object inside the http client, you will see that the content-type is text/html. So Zn parses the incoming text using UTF-8 which fails (Zn encoders are strict by default).

Here is how to change the default during a call:

ZnDefaultCharacterEncoder
  value: ZnCharacterEncoder iso88591
  during: [ ZnClient new get: 'http://squeaksource.com/ical/?C=M;O%3DD'; yourself ].
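
The failure itself can probably be reproduced without touching the network. The following is only a sketch: the sample bytes are made up for illustration, and it assumes the image has ZnUTF8Encoder, ZnInvalidUTF8 and the `ZnCharacterEncoder iso88591` accessor used in the snippet above.

```smalltalk
| latin1Bytes utf8Outcome latin1Outcome |

"Hypothetical sample: 'café!' encoded as ISO-8859-1, where é is the single byte 233 (16rE9)."
latin1Bytes := #[ 99 97 102 233 33 ].

"A strict UTF-8 decoder reads 233 as the lead byte of a three-byte sequence,
 so the plain '!' (33) that follows is not a legal continuation byte."
utf8Outcome := [ ZnUTF8Encoder new decodeBytes: latin1Bytes ]
	on: ZnInvalidUTF8
	do: [ :error | error messageText ].

"Decoding the same bytes as ISO-8859-1 maps every byte to exactly one character."
latin1Outcome := ZnCharacterEncoder iso88591 decodeBytes: latin1Bytes.

"Inspect both outcomes side by side."
{ utf8Outcome. latin1Outcome }
```

Evaluated in a Playground, the first element should be the error text of the failed UTF-8 attempt and the second the decoded five-character string.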

The solution would be that the server adds the proper charset specification.

Consider the default in Pharo:

ZnMimeType textHtml => text/html;charset=utf-8

The server should serve this resource using the following Content-Type:

text/html;charset=iso-8859-1

This is the server's responsibility. The page in question is the MC index page, which would normally be dynamically generated. Somewhere the server decides on the encoding. That encoding does not have to change, but it should be properly indicated in the HTTP response headers.
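
As a rough illustration of the difference between the two Content-Type values, the header strings can be parsed with ZnMimeType and asked for their charset parameter. This is a sketch that works on literal strings rather than on a live response; it assumes `ZnMimeType class>>fromString:` and `ZnMimeType>>charSet` behave as in current Zinc.

```smalltalk
| served expected |

"Content-Type as the server currently sends it: no charset parameter."
served := ZnMimeType fromString: 'text/html'.

"Content-Type the server would ideally send for this page."
expected := ZnMimeType fromString: 'text/html;charset=iso-8859-1'.

"The first should have no charset to report, the second should name iso-8859-1."
{ served charSet. expected charSet }
```

When the charset parameter is missing, the client has nothing to go on and falls back to its default, which is where the UTF-8 assumption comes from.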

HTH,

Sven


Re: ZnInvalidUTF8 on response from squeaksource

Patrick R.
In reply to this post by David T. Lewis
Thanks for looking into this :)

@Dave: Can you explain what you would have expected to happen here? I see the point
that squeaksource could also encode the response as UTF-8. However, currently the
page is correctly encoded and delivered in iso-8859-1.  From the error message I read that Zinc
is nevertheless trying to decode it as UTF-8 which fails when it encounters a character with
a code point > 127.

Bests
Patrick

Re: ZnInvalidUTF8 on response from squeaksource

Ben Coman
In reply to this post by Sven Van Caekenberghe-2
On Thu, Mar 16, 2017 at 1:25 AM, Sven Van Caekenberghe <[hidden email]> wrote:
>
> Hi,
>
> This is a recurring issue.


It would be cool if some magic(TM) could raise a dialog with an
explanation and pull-down list to select an encoding - but maybe that
is too much hand holding.


>
> The problem is that the server serves a resource, in this case text/html, without specifying its encoding.

I just bumped into [1] while browsing around to learn more, but I
don't know fully how to interpret it.
What do you make of it saying "An XHTML5 document is served as XML and
has XML syntax. XML parsers do not recognise the encoding declarations
in meta elements. They only recognise the XML declaration. Here is an
example:
    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE html ....

compared to the page having...
    <?xml version="1.0" encoding="iso-8859-1"?>

cheers -ben

[1]    https://www.w3.org/International/questions/qa-html-encoding-declarations



Re: ZnInvalidUTF8 on response from squeaksource

Sven Van Caekenberghe-2

> On 15 Mar 2017, at 19:16, Ben Coman <[hidden email]> wrote:
>
> On Thu, Mar 16, 2017 at 1:25 AM, Sven Van Caekenberghe <[hidden email]> wrote:
>>
>> Hi,
>>
>> This is a recurring issue.
>
>
> It would be cool if some magic(TM) could raise a dialog with an
> explanation and pull-down list to select an encoding - but maybe that
> is too much hand holding.
>
>
>>
>> The problem is that the server serves a resource, in this case text/html, without specifying its encoding.
>
> I just bumped into [1] while browsing around to learn more, but I
> don't know fully how to interpret it.
> What do you make of it saying "An XHTML5 document is served as XML and
> has XML syntax. XML parsers do not recognise the encoding declarations
> in meta elements. They only recognise the XML declaration. Here is an
> example:
>    <?xml version="1.0" encoding="utf-8"?>
>    <!DOCTYPE html ....
>
> compared to the page having...
>    <?xml version="1.0" encoding="iso-8859-1"?>
>
> cheers -ben
>
> [1]    https://www.w3.org/International/questions/qa-html-encoding-declarations

I knew you would be going in this direction, but there is a difference in levels.

HTTP is a protocol that does not concern itself with the contents of the resources (documents) it transports. HTTP headers are used as meta data describing the otherwise opaque content. If the headers are wrong, there is not much that can be done.

I believe our XML framework does this interpretation (changing encoding while parsing) correctly, but that is another level (not the transport but the contents itself, it *knows* about XML).


Re: ZnInvalidUTF8 on response from squeaksource

monty-3
In reply to this post by Ben Coman


> Sent: Wednesday, March 15, 2017 at 2:16 PM
> From: "Ben Coman" <[hidden email]>
> To: "Pharo Development List" <[hidden email]>
> Subject: Re: [Pharo-dev] ZnInvalidUTF8 on response from squeaksource
>
> On Thu, Mar 16, 2017 at 1:25 AM, Sven Van Caekenberghe <[hidden email]> wrote:
> >
> > Hi,
> >
> > This is a recurring issue.
>
>
> It would be cool if some magic(TM) could raise a dialog with an
> explanation and pull-down list to select an encoding - but maybe that
> is too much hand holding.

That's an interesting idea.

>
> >
> > The problem is that the server serves a resource, in this case text/html, without specifying its encoding.
>
> I just bumped into [1] while browsing around to learn more, but I
> don't know fully how to interpret it.
> What do you make of it saying "An XHTML5 document is served as XML and
> has XML syntax. XML parsers do not recognise the encoding declarations
> in meta elements. They only recognise the XML declaration. Here is an
> example:
>     <?xml version="1.0" encoding="utf-8"?>
>     <!DOCTYPE html ....
>
> compared to the page having...
>     <?xml version="1.0" encoding="iso-8859-1"?>
>
> cheers -ben

That isn't Zinc's responsibility; it just handles HTTP. The HTML or XML parser using it should disable Zinc's automatic decoding based on Content-Type and do its own decoding of the raw response (which can still be done using Zinc's decoders) informed by the content of the response and not just its Content-Type. XMLParser and XMLParserHTML both use Zinc this way.
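As a rough illustration of "informed by the content of the response", here is a minimal sketch that reads the encoding name out of an XML declaration in undecoded bytes. It uses only plain collection and string messages; the variable names and the 100-byte limit are assumptions made for this example, and it is not how XMLParser itself is implemented.

```smalltalk
| rawBytes prolog marker start declared |
"Stand-in for an undecoded response body; only its first bytes matter here."
rawBytes := '<?xml version="1.0" encoding="iso-8859-1"?><html/>' asByteArray.

"The declaration is plain ASCII, so a byte-for-byte view of the prolog is safe."
prolog := (rawBytes first: (rawBytes size min: 100)) asString.
marker := 'encoding="'.
start := prolog indexOfSubCollection: marker.

"indexOfSubCollection: answers 0 when the marker is absent."
declared := start = 0
	ifTrue: [ nil ]
	ifFalse: [ ((prolog copyFrom: start + marker size to: prolog size)
		copyUpTo: $") asLowercase ].
```

The name found this way still has to be mapped to an encoder, and which names an image accepts can vary, as the 'ibm819' workaround quoted in this thread shows.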

> [1]    https://www.w3.org/International/questions/qa-html-encoding-declarations
>
>
> >
> > > Today, when no encoding is specified, we default to UTF-8. In this case the server silently serves a resource which is ISO-8859-1 encoded.
> >
> > The error is triggered by accessing the following URL:
> >
> > ZnClient new get: 'http://squeaksource.com/ical/?C=M;O%3DD'; yourself.
> >
> > If you inspect the response object inside the http client, you will see that the content-type is text/html. So Zn parses the incoming text using UTF-8 which fails (Zn encoders are strict by default).
> >
> > Here is how to change the default during a call:
> >
> > ZnDefaultCharacterEncoder
> >   value: ZnCharacterEncoder iso88591
> >   during: [ ZnClient new get: 'http://squeaksource.com/ical/?C=M;O%3DD'; yourself ].
> >
> > The solution would be that the server adds the proper charset specification.
> >
> > Consider the default in Pharo:
> >
> > ZnMimeType textHtml => text/html;charset=utf-8
> >
> > The server should serve this resource using the following Content-Type:
> >
> > text/html;charset=iso-8859-1
> >
> > This is the server's responsibility. The page in question is the MC index page, which would normally be dynamically generated. Somewhere the server decides on the encoding. That encoding does not have to change, but it should be properly indicated in the HTTP response headers.
> >
> > HTH,
> >
> > Sven
> >
> > > On 15 Mar 2017, at 17:42, David T. Lewis <[hidden email]> wrote:
> > >
> > > squeaksource.com is still running on a quite old image, and I know that it
> > > has problems with multibyte characters. If you are seeing problems related
> > > to this, it's not the fault of Zinc.
> > >
> > > If you can confirm that this is what is happening, then I guess it is time
> > > to update that trusty old squeaksource.com image :-)
> > >
> > > Dave


Re: ZnInvalidUTF8 on response from squeaksource

Sven Van Caekenberghe-2
In reply to this post by Patrick R.

> On 15 Mar 2017, at 18:28, Rein, Patrick <[hidden email]> wrote:
>
> Thanks for looking into this :)
>
> @Dave: Can you explain what you would have expected to happen here? I see the point
> that squeaksource could also encode the response as UTF-8. However, currently the
> page is correctly encoded and delivered in iso-8859-1.

It is indeed encoded and delivered correctly, but its content-type (text/html) is missing a charset spec (text/html;charset=iso-8859-1). The server assumes that the client knows its default is ISO-8859-1, while the client cannot know or assume that.

There are two solutions: deliver the index file UTF-8 encoded or deliver the index file ISO-8859-1 encoded as it is now, but use the correct mime-type (text/html;charset=iso-8859-1) - both are good.
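The mismatch can be reproduced without any network access. The sketch below is only an illustration: the sample bytes are made up, and it assumes that `decodeBytes:` is available on the Zinc encoders and that the 'ibm819' alias resolves as reported earlier in this thread.

```smalltalk
| bytes utf8Result latin1Result |
"'Bären' as ISO-8859-1 bytes; 228 is the single byte for the umlaut."
bytes := #[66 228 114 101 110].

"Strict UTF-8 decoding rejects the byte sequence and signals ZnInvalidUTF8."
utf8Result := [ ZnUTF8Encoder new decodeBytes: bytes ]
	on: ZnInvalidUTF8
	do: [ :error | error messageText ].

"Decoding with the ISO-8859-1 alias that worked earlier in this thread."
latin1Result := (ZnByteEncoder newForEncoding: 'ibm819') decodeBytes: bytes.
```

Either fix on the server side removes the guesswork: UTF-8 bytes match the client's default, and an explicit charset tells the client which decoder to pick.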

> From the error message I read that Zinc
> is nevertheless trying to decode it as UTF-8 which fails when it encounters a character with
> a code point > 127.
>
> Bests
> Patrick



Re: ZnInvalidUTF8 on response from squeaksource

Patrick R.
In reply to this post by Ben Coman
Unfortunately, as I am trying to fix a Travis build, I cannot change the call to Zinc.

To be clear about this: I also think that squeaksource should serve UTF-8.
However, at the same time a missing charset in an HTTP response means that the content
should be decoded as ISO-8859-1 [1]. So in general this does seem to me like an issue in Zinc.

I see that this might be a problem to change though, so I will consider moving the project at one point (or removing that damn umlaut :) ).

Bests
Patrick

[1] https://tools.ietf.org/html/rfc2616#section-3.7.1



Re: ZnInvalidUTF8 on response from squeaksource

Sven Van Caekenberghe-2

> On 15 Mar 2017, at 20:52, Rein, Patrick <[hidden email]> wrote:
>
> Unfortunately, as I am trying to fix a Travis build, I can not change the call to Zinc.
>
> To be clear about this: I also think that squeaksource should serve UTF-8.
> However, at the same time a missing charset in a HTTP response means that the content
> should be decoded as ISO-8859-1 [1]. So in general this does seem to me like an issue in Zinc.
>
> I see that this might be a problem to change though, so I will consider moving the project at one point (or removing that damn umlaut :) ).
>
> Bests
> Patrick
>
> [1] https://tools.ietf.org/html/rfc2616#section-3.7.1

Hmm, OK, I never saw that paragraph, interesting.
Thanks for the pointer, I will put it on my todo list to think about.




Re: ZnInvalidUTF8 on response from squeaksource

David T. Lewis
In reply to this post by Patrick R.
On Wed, Mar 15, 2017 at 05:28:37PM +0000, Rein, Patrick wrote:
> Thanks for looking into this :)
>
> @Dave: Can you explain what you would have expected to happen here? I see the point
> that squeaksource could also encode the response as UTF-8. However, currently the
> page is correctly encoded and delivered in iso-8859-1.  From the error message I read that Zinc
> is nevertheless trying to decode it as UTF-8 which fails when it encounters a character with
> a code point > 127.

I did not have anything specific in mind, I just wanted to mention it in
case it was relevant to the problem. Based on the follow-up discussion, I
do not think that it is relevant.

For what it's worth, the specific issue in the older squeaksource image is
that when someone registers an account with a user name that requires a
WideString rather than a ByteString, it ends up scrambling the squeaksource
user interface. I am fairly sure that this is fixed in newer versions of
Squeak and/or Seaside, so for now it is just an annoyance for the person
who keeps squeaksource.com running. Whenever that person gets sufficiently
annoyed, I'm sure that the image will get updated ;-)

Dave


>
> Bests
> Patrick


Re: ZnInvalidUTF8 on response from squeaksource

Ben Coman
In reply to this post by Sven Van Caekenberghe-2
On Thu, Mar 16, 2017 at 6:25 AM, Sven Van Caekenberghe <[hidden email]> wrote:

>
>
> > On 15 Mar 2017, at 20:52, Rein, Patrick <[hidden email]> wrote:
> >
> > Unfortunately, as I am trying to fix a Travis build, I can not change the call to Zinc.
> >
> > To be clear about this: I also think that squeaksource should serve UTF-8.
> > However, at the same time a missing charset in a HTTP response means that the content
> > should be decoded as ISO-8859-1 [1]. So in general this does seem to me like an issue in Zinc.
> >
> > I see that this might be a problem to change though, so I will consider moving the project at one point (or removing that damn umlaut :) ).
> >
> > Bests
> > Patrick
> >
> > [1] https://tools.ietf.org/html/rfc2616#section-3.7.1
(Hypertext Transfer Protocol -- HTTP/1.1)

For easy reference, the pertinent part seems to be...
   When no explicit charset parameter is provided by the sender,
   media subtypes of the "text" type are defined to have a
   default charset value of "ISO-8859-1" when received via HTTP.

Additional support...
Section 6, The 'text/html' Media Type
(http://www.rfc-editor.org/rfc/rfc2854.txt)
[MIME] specifies "The default character set, which must be assumed in
the absence of a charset parameter, is US-ASCII."
[HTTP] Section 3.7.1, defines that "media subtypes of the 'text' type
are defined to have a default charset value of 'ISO-8859-1'".
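
To make that rule concrete, here is a minimal sketch in Python (picked only because it is easy to run outside an image; it is not Zinc code and says nothing about how Zinc is implemented). The helper name `effective_charset` is made up for illustration. It reads the charset parameter from a Content-Type value and, when none is given, falls back to ISO-8859-1 for "text" types as RFC 2616 section 3.7.1 words it.

```python
from email.message import Message


def effective_charset(content_type):
    """Return the charset to decode with, following RFC 2616 section 3.7.1.

    `effective_charset` is a hypothetical helper, not part of any library.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    declared = msg.get_content_charset()  # None when no charset parameter
    if declared is not None:
        return declared
    if msg.get_content_maintype() == "text":
        return "iso-8859-1"  # HTTP/1.1 default for text/* subtypes
    return None  # non-text types have no default charset


print(effective_charset("text/html"))
print(effective_charset("text/html;charset=utf-8"))
```

Under this reading, the squeaksource response (text/html with no charset) would be decoded as ISO-8859-1 rather than UTF-8.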

>
> Hmm, OK, I never saw that paragraph, interesting.
> Thanks for the pointer, I will put it on my todo list to think about.

Food for thought...

https://en.wikipedia.org/wiki/Robustness_principle

which is implied here...
https://www.w3.org/Protocols/rfc2616/rfc2616-sec7.html#sec7.2.1
Any HTTP/1.1 message containing an entity-body SHOULD include a
Content-Type header field defining the media type of that body. If and
only if the media type is not given by a Content-Type field, the
recipient **MAY** attempt to guess the media type via inspection of
its content and/or the name extension(s) of the URI used to identify
the resource.


Perhaps then it's okay to consider the media type not fully defined
when no character encoding is specified, so that content inspection
is acceptable - perhaps only in the face of a character encoding error.
Even though they are pitched at the HTML level, these articles are interesting...

Character encoding in HTML
https://www.w3.org/blog/2008/03/html-charset/

Encoding Divination
http://nikitathespider.com/articles/EncodingDivination.html


On Thu, Mar 16, 2017 at 3:55 AM, monty <[hidden email]> wrote:
>
> That isn't Zinc's responsibility; it just handles HTTP. The HTML or XML parser using it should disable Zinc's automatic decoding based on Content-Type and do its own decoding of the raw response (which can still be done using Zinc's decoders) informed by the content of the response and not just its Content-Type. XMLParser and XMLParserHTML both use Zinc this way.

But a lot of what people want to do from the image is process web data
by parsing HTML or XML.
So is there anything we can do to make it more robust out of the box
(without getting in the way)?
ZnClient is promoted as the "easy" way to use Zinc.  Could it not
have additional smarts to inspect the content
and "auto" divine the encoding? Or maybe we need another user-agent
tool that does this?

cheers -ben

>
>
> > ________________________________________
> > From: Pharo-dev <[hidden email]> on behalf of Ben Coman <[hidden email]>
> > Sent: Wednesday, March 15, 2017 19:16
> > To: Pharo Development List
> > Subject: Re: [Pharo-dev] ZnInvalidUTF8 on response from squeaksource
> >
> > On Thu, Mar 16, 2017 at 1:25 AM, Sven Van Caekenberghe <[hidden email]> wrote:
> >>
> >> Hi,
> >>
> >> This is a recurring issue.
> >
> >
> > It would be cool if some magic(TM) could raise a dialog with an
> > explanation and pull-down list to select an encoding - but maybe that
> > is too much hand holding.
> >
> >
> >>
> >> The problem is that the server serves a resource, in this case text/html, without specifying its encoding.
> >
> > I just bumped into [1] while browsing around to learn more, but I
> > don't know fully how to interpret it.
> > What do you make of it saying "An XHTML5 document is served as XML and
> > has XML syntax. XML parsers do not recognise the encoding declarations
> > in meta elements. They only recognise the XML declaration. Here is an
> > example:
> >    <?xml version="1.0" encoding="utf-8"?>
> >    <!DOCTYPE html ....
> >
> > compared to the page having...
> >    <?xml version="1.0" encoding="iso-8859-1"?>
> >
> > cheers -ben
> >
> > [1]    https://www.w3.org/International/questions/qa-html-encoding-declarations
> >
> >
> >>
> >> Today, when no encoding is specified, we default to UTF-8. In this case the server silently serves a resource which is ISO-8859-1 encoded.
> >>
> >> The error is triggered by accessing the following URL:
> >>
> >> ZnClient new get: 'http://squeaksource.com/ical/?C=M;O%3DD'; yourself.
> >>
> >> If you inspect the response object inside the http client, you will see that the content-type is text/html. So Zn parses the incoming text using UTF-8 which fails (Zn encoders are strict by default).
> >>
> >> Here is how to change the default during a call:
> >>
> >> ZnDefaultCharacterEncoder
> >>  value: ZnCharacterEncoder iso88591
> >>  during: [ ZnClient new get: 'http://squeaksource.com/ical/?C=M;O%3DD'; yourself ].
> >>
> >> The solution would be that the server adds the proper charset specification.
> >>
> >> Consider the default in Pharo:
> >>
> >> ZnMimeType textHtml => text/html;charset=utf-8
> >>
> >> The server should serve this resource using the following Content-Type:
> >>
> >> text/html;charset=iso-8859-1
> >>
> >> This is the server's responsibility. The page in question is the MC index page, which would normally be dynamically generated. Somewhere the server decides on the encoding. That encoding does not have to change, but it should be properly indicated in the HTTP response headers.
> >>
> >> HTH,
> >>
> >> Sven
> >>
> >>> On 15 Mar 2017, at 17:42, David T. Lewis <[hidden email]> wrote:
> >>>
> >>> squeaksource.com is still running on a quite old image, and I know that it
> >>> has problems with multibyte characters. If you are seeing problems related
> >>> to this, it's not the fault of Zinc.
> >>>
> >>> If you can confirm that this is what is happening, then I guess it is time
> >>> to update that trusty old squeaksource.com image :-)
> >>>
> >>> Dave
> >>>
> >>>> On Wed, Mar 15, 2017 at 8:19 PM, Patrick R. <[hidden email]> wrote:
> >>>>>
> >>>>> Hi everyone,
> >>>>>
> >>>>> I have been working on bringing http://squeaksource.com/ical/ up to
> >>>>> speed
> >>>>> for Squeak and wanted to make sure that it also works for Pharo.
> >>>> Therefore,
> >>>>> I have created a travis build job for Squeak and Pharo
> >>>>> (https://travis-ci.org/codeZeilen/ical-smalltalk/jobs/211298950) which
> >>>> pulls
> >>>>> the source from squeaksource.com.
> >>>>>
> >>>>> Now the issue is that loading the package in Pharo fails with a
> >>>>> GoferException wrapping a ZnInvalidUTF8 Exception. We figured that this
> >>>>> might be the result of the squeaksource page delivering the page as
> >>>>> iso-8859-1 as it contains special characters. Any ideas on how to get
> >>>>> this
> >>>>> to work? I do not have access to the ical repository description and I
> >>>> would
> >>>>> like to avoid mirroring the whole repository on GitHub.
> >>>>
> >>>>
> >>>> In a fresh 60437 image, in Playground evaluating...
> >>>>
> >>>> Metacello new
> >>>>      configuration: 'ICal';
> >>>>      repository: 'github://codeZeilen/ical-smalltalk:master/repository';
> >>>>      onConflict: [:ex | ex allow];
> >>>>      load.
> >>>> ==> Could not resolve: ICal-Core [ICal-Core-PaulDeBruicker.5] in
> >>>> /home/ben/.local/share/Pharo/images/60437-01/pharo-local/package-cache
> >>>> http://squeaksource.com/ical ERROR: 'GoferRepositoryError: Could not
> >>>> access
> >>>> http://squeaksource.com/ical: ZnInvalidUTF8: Illegal continuation byte for
> >>>> utf-8 encoding'
> >>>>
> >>>>
> >>>> In a new fresh 60437 Image (i.e. empty package-cache)
> >>>> World menu > Monticello > +Repository > squeaksource.com...
> >>>>    MCSqueaksourceRepository
> >>>>       location: 'http://squeaksource.com/ical'
> >>>>       user: ''
> >>>>       password: ''
> >>>>  ==> open repository then errors "MCRepositoryError: Could not access
> >>>> http://squeaksource.com/ical: ZnInvalidUTF8: Illegal continuation byte for
> >>>> utf-8 encoding"
> >>>>
> >>>>
> >>>> In Chrome, opening http://www.squeaksource.com/ical
> >>>> then clicking <Versions>
> >>>> and the browser's View Page Source,
> >>>> I see...
> >>>>  <?xml version="1.0" encoding="iso-8859-1"?>
> >>>>
> >>>> Googling: zinc iso-8859-1
> >>>> finds...
> >>>> http://forum.world.st/Problem-using-Zinc-in-Pharo-4-Moose-5-1-td4825329.html
> >>>> but "ZnByteEncoder iso88591"
> >>>> errors with "KeyNotFound: key 'iso88591' not found in Dictionary"
> >>>> and inspecting "ZnByteEncoder byteTextConverters keys sorted"
> >>>> confirms this key is missing (@Sven, I'm curious why was this removed? )
> >>>>
> >>>>
> >>>> Now https://en.wikipedia.org/wiki/ISO/IEC_8859-1
> >>>> indicates IBM819 is an alias
> >>>> and " ZnByteEncoder newForEncoding: 'ibm819' "
> >>>> works okay
> >>>>
> >>>> So in MCHttpRepository>>#loadAllFileNames
> >>>> changing...
> >>>>        queryAt: 'C' put: 'M;O=D' ;
> >>>>        get.
> >>>> to...
> >>>>        queryAt: 'C' put: 'M;O=D' .
> >>>>        ZnDefaultCharacterEncoder
> >>>>             value: (ZnByteEncoder newForEncoding: 'ibm819')
> >>>>             during: [client get].
> >>>>
> >>>> Then from Monticello opening the previously defined
> >>>> http://squeaksource.com/ical
> >>>> works!!
> >>>>
> >>>>
> >>>> Now I was hoping that reverting #loadAllFileNames
> >>>> and in Playground doing...
> >>>>   converters := ZnByteEncoder byteTextConverters.
> >>>>   converters at: 'iso-8859-1' put: (converters at: 'ibm819').
> >>>> might alleviate the problem, but no luck.
> >>>>
> >>>>
> >>>> Anyone know a better way to deal with this than hardcoding the encoding
> >>>> into #loadAllFileNames?
> >>>>
> >>>> cheers -ben
> >>>>
> >>>
> >>>
> >>>
> >>
> >>
> >
> >
>
>

Re: ZnInvalidUTF8 on response from squeaksource

Ben Coman
In reply to this post by Patrick R.
On Thu, Mar 16, 2017 at 3:52 AM, Rein, Patrick <[hidden email]> wrote:
> Unfortunately, as I am trying to fix a Travis build, I can not change the call to Zinc.

Actually maybe you can.
Is it possible for Travis to run the following from the command line before loading iCal?

$ pharo your.image loadAllFilenamesHack.st --save --quit

cheers -ben




Attachment: loadAllFilenamesHack.zip (784 bytes)
Re: ZnInvalidUTF8 on response from squeaksource

Sven Van Caekenberghe-2
In reply to this post by Sven Van Caekenberghe-2

> On 15 Mar 2017, at 23:25, Sven Van Caekenberghe <[hidden email]> wrote:
>
>>
>> On 15 Mar 2017, at 20:52, Rein, Patrick <[hidden email]> wrote:
>>
>> Unfortunately, as I am trying to fix a Travis build, I can not change the call to Zinc.
>>
>> To be clear about this: I also think that squeaksource should serve UTF-8.
>> However, at the same time a missing charset in a HTTP response means that the content
>> should be decoded as ISO-8859-1 [1]. So in general this does seem to me like an issue in Zinc.
>>
>> I see that this might be a problem to change though, so I will consider moving the project at one point (or removing that damn umlaut :) ).
>>
>> Bests
>> Patrick
>>
>> [1] https://tools.ietf.org/html/rfc2616#section-3.7.1
>
> Hmm, OK, I never saw that paragraph, interesting.
> Thanks for the pointer, I will put it on my todo list to think about.

I recently added #defaultEncoder: as API to ZnClient and ZnServer to make it easier to change the default (Zn) character encoder used (instead of using the DynamicVariable, which can still be done, and is used behind the scenes).

ZnClient new defaultEncoder: #iso88591; get: 'http://stfx.eu/small.html'.
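
For readers outside the image, the effect of such a default can be shown with a language-neutral toy in Python. This is only an illustration of the idea, not how Zinc implements it; `TinyClient` and `decode_body` are made-up names. A charset declared by the server wins, and the configurable default applies only when nothing was declared. With a strict UTF-8 default, the ISO-8859-1 byte for "ä" is rejected.

```python
class TinyClient:
    """Toy stand-in for an HTTP client with a configurable default encoding."""

    def __init__(self, default_encoding="utf-8"):
        self.default_encoding = default_encoding

    def decode_body(self, raw, declared_charset=None):
        # A charset declared by the server always takes precedence;
        # the default is used only when the server declared nothing.
        encoding = declared_charset or self.default_encoding
        return raw.decode(encoding, errors="strict")


body = "Käse".encode("iso-8859-1")  # bytes as an ISO-8859-1 server sends them

try:
    TinyClient().decode_body(body)
except UnicodeDecodeError as error:
    print("utf-8 default fails:", error.reason)

print(TinyClient(default_encoding="iso-8859-1").decode_body(body))
```

The second client decodes the same bytes without error because every byte value is a valid ISO-8859-1 character.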
