The Trunk: WebClient-Core-topa.112.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

The Trunk: WebClient-Core-topa.112.mcz

commits-2
Tobias Pape uploaded a new version of WebClient-Core to project The Trunk:
http://source.squeak.org/trunk/WebClient-Core-topa.112.mcz

==================== Summary ====================

Name: WebClient-Core-topa.112
Author: topa
Time: 20 September 2017, 5:29:06.096983 pm
UUID: 60b494fc-0652-4a28-be5a-1578963e5aed
Ancestors: WebClient-Core-topa.111

Abide Postel's law for text conversion.

Be conservative in what you do, be liberal in what you accept from others.

=============== Diff against WebClient-Core-topa.111 ===============

Item was changed:
  ----- Method: WebMessage>>getContentWithProgress: (in category 'private') -----
  getContentWithProgress: progressBlockOrNil
  "Reads available content and returns it."
 
  | length result |
  length := self contentLength.
  result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ])
  new: (length ifNil: [ 1000 ])
  streamContents: [ :outputStream |
  self
  streamFrom: stream
  to: outputStream
  size: length
  progress: progressBlockOrNil ].
  self decoderForContentEncoding ifNotNil: [:decoder |
  result := (decoder on: result) upToEnd].
  self textConverterForContentType ifNotNil: [:converter |
+ [result := result convertFromWithConverter: converter]
+ on: InvalidUTF8 "some servers lie"
+ do: [^ result]].
- result := result convertFromWithConverter: converter].
  ^ result
  !

Item was changed:
  ----- Method: WebMessage>>textConverterForContentType (in category 'accessing') -----
  textConverterForContentType
 
  | index contentType |
  contentType := self contentType.
  contentType size < 8 ifTrue: [ ^nil ].
- (contentType beginsWithAnyOf: #('application/' 'image/' 'video/' 'audio/')) ifTrue: [^nil].
  index := contentType findString: 'charset=' startingAt: 1 caseSensitive: false.
  index = 0 ifTrue: [ ^nil ].
  ^TextConverter newForEncoding: (contentType allButFirst: index + 7) "'charset=' size - 1"!


Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: WebClient-Core-topa.112.mcz

Levente Uzonyi
On Wed, 20 Sep 2017, [hidden email] wrote:

> Tobias Pape uploaded a new version of WebClient-Core to project The Trunk:
> http://source.squeak.org/trunk/WebClient-Core-topa.112.mcz
>
> ==================== Summary ====================
>
> Name: WebClient-Core-topa.112
> Author: topa
> Time: 20 September 2017, 5:29:06.096983 pm
> UUID: 60b494fc-0652-4a28-be5a-1578963e5aed
> Ancestors: WebClient-Core-topa.111
>
> Abide Postel's law for text conversion.
>
> Be conservative in what you do, be liberal in what you accept from others.
>
> =============== Diff against WebClient-Core-topa.111 ===============
>
> Item was changed:
>  ----- Method: WebMessage>>getContentWithProgress: (in category 'private') -----
>  getContentWithProgress: progressBlockOrNil
>   "Reads available content and returns it."
>
>   | length result |
>   length := self contentLength.
>   result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ])
>   new: (length ifNil: [ 1000 ])
>   streamContents: [ :outputStream |
>   self
>   streamFrom: stream
>   to: outputStream
>   size: length
>   progress: progressBlockOrNil ].
>   self decoderForContentEncoding ifNotNil: [:decoder |
>   result := (decoder on: result) upToEnd].
>   self textConverterForContentType ifNotNil: [:converter |
> + [result := result convertFromWithConverter: converter]
> + on: InvalidUTF8 "some servers lie"
> + do: [^ result]].
> - result := result convertFromWithConverter: converter].

Why ignore only InvalidUTF8?

Levente

>   ^ result
>  !
>
> Item was changed:
>  ----- Method: WebMessage>>textConverterForContentType (in category 'accessing') -----
>  textConverterForContentType
>
>   | index contentType |
>   contentType := self contentType.
>   contentType size < 8 ifTrue: [ ^nil ].
> - (contentType beginsWithAnyOf: #('application/' 'image/' 'video/' 'audio/')) ifTrue: [^nil].
>   index := contentType findString: 'charset=' startingAt: 1 caseSensitive: false.
>   index = 0 ifTrue: [ ^nil ].
>   ^TextConverter newForEncoding: (contentType allButFirst: index + 7) "'charset=' size - 1"!

Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: WebClient-Core-topa.112.mcz

Tobias Pape

> On 20.09.2017, at 17:56, Levente Uzonyi <[hidden email]> wrote:
>
> On Wed, 20 Sep 2017, [hidden email] wrote:
>
>> Tobias Pape uploaded a new version of WebClient-Core to project The Trunk:
>> http://source.squeak.org/trunk/WebClient-Core-topa.112.mcz
>>
>> ==================== Summary ====================
>>
>> Name: WebClient-Core-topa.112
>> Author: topa
>> Time: 20 September 2017, 5:29:06.096983 pm
>> UUID: 60b494fc-0652-4a28-be5a-1578963e5aed
>> Ancestors: WebClient-Core-topa.111
>>
>> Abide Postel's law for text conversion.
>>
>> Be conservative in what you do, be liberal in what you accept from others.
>>
>> =============== Diff against WebClient-Core-topa.111 ===============
>>
>> Item was changed:
>> ----- Method: WebMessage>>getContentWithProgress: (in category 'private') -----
>> getContentWithProgress: progressBlockOrNil
>> "Reads available content and returns it."
>>
>> | length result |
>> length := self contentLength.
>> result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ])
>> new: (length ifNil: [ 1000 ])
>> streamContents: [ :outputStream |
>> self
>> streamFrom: stream
>> to: outputStream
>> size: length
>> progress: progressBlockOrNil ].
>> self decoderForContentEncoding ifNotNil: [:decoder |
>> result := (decoder on: result) upToEnd].
>> self textConverterForContentType ifNotNil: [:converter |
>> + [result := result convertFromWithConverter: converter]
>> + on: InvalidUTF8 "some servers lie"
>> + do: [^ result]].
>> - result := result convertFromWithConverter: converter].
>
> Why ignore only InvalidUTF8?

Because there's no InvalidTextConversation yet ;)
What would be appropriate? Error?

Best regards
        -Tobias

>
> Levente
>
>> ^ result
>> !
>>
>> Item was changed:
>> ----- Method: WebMessage>>textConverterForContentType (in category 'accessing') -----
>> textConverterForContentType
>>
>> | index contentType |
>> contentType := self contentType.
>> contentType size < 8 ifTrue: [ ^nil ].
>> - (contentType beginsWithAnyOf: #('application/' 'image/' 'video/' 'audio/')) ifTrue: [^nil].
>> index := contentType findString: 'charset=' startingAt: 1 caseSensitive: false.
>> index = 0 ifTrue: [ ^nil ].
>> ^TextConverter newForEncoding: (contentType allButFirst: index + 7) "'charset=' size - 1"!


Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: WebClient-Core-topa.112.mcz

monty-3
In reply to this post by commits-2
This series of content decoding changes is not backwards compatible. Existing code that does its own response decoding with the assumption that WebClient won't do it will now double-decode the response--corrupting it. But breaking existing code is probably worth it here because of the gains in intuitiveness and correctness.

Is there any way to disable the automatic response decoding?

> Sent: Wednesday, September 20, 2017 at 11:29 AM
> From: [hidden email]
> To: [hidden email], [hidden email]
> Subject: [squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
>
> Tobias Pape uploaded a new version of WebClient-Core to project The Trunk:
> http://source.squeak.org/trunk/WebClient-Core-topa.112.mcz
>
> ==================== Summary ====================
>
> Name: WebClient-Core-topa.112
> Author: topa
> Time: 20 September 2017, 5:29:06.096983 pm
> UUID: 60b494fc-0652-4a28-be5a-1578963e5aed
> Ancestors: WebClient-Core-topa.111
>
> Abide Postel's law for text conversion.
>
> Be conservative in what you do, be liberal in what you accept from others.
>
> =============== Diff against WebClient-Core-topa.111 ===============
>
> Item was changed:
>   ----- Method: WebMessage>>getContentWithProgress: (in category 'private') -----
>   getContentWithProgress: progressBlockOrNil
>   "Reads available content and returns it."
>  
>   | length result |
>   length := self contentLength.
>   result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ])
>   new: (length ifNil: [ 1000 ])
>   streamContents: [ :outputStream |
>   self
>   streamFrom: stream
>   to: outputStream
>   size: length
>   progress: progressBlockOrNil ].
>   self decoderForContentEncoding ifNotNil: [:decoder |
>   result := (decoder on: result) upToEnd].
>   self textConverterForContentType ifNotNil: [:converter |
> + [result := result convertFromWithConverter: converter]
> + on: InvalidUTF8 "some servers lie"
> + do: [^ result]].
> - result := result convertFromWithConverter: converter].
>   ^ result
>   !
>
> Item was changed:
>   ----- Method: WebMessage>>textConverterForContentType (in category 'accessing') -----
>   textConverterForContentType
>  
>   | index contentType |
>   contentType := self contentType.
>   contentType size < 8 ifTrue: [ ^nil ].
> - (contentType beginsWithAnyOf: #('application/' 'image/' 'video/' 'audio/')) ifTrue: [^nil].
>   index := contentType findString: 'charset=' startingAt: 1 caseSensitive: false.
>   index = 0 ifTrue: [ ^nil ].
>   ^TextConverter newForEncoding: (contentType allButFirst: index + 7) "'charset=' size - 1"!
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: WebClient-Core-topa.112.mcz

Tobias Pape

> On 27.10.2017, at 18:40, monty <[hidden email]> wrote:
>
> This series of content decoding changes is not backwards compatible. Existing code that does its own response decoding with the assumption that WebClient won't do it will now double-decode the response--corrupting it. But breaking existing code is probably worth it here because of the gains in intuitiveness and correctness.
>
> Is there any way to disable the automatic response decoding?

That's problematic.
Some conversation was always done, you never ended up with mere Strings/ByteArrays.

This is the old version, that opportunistically applied inflate/deflate:
================
getContentWithProgress: progressBlockOrNil
        "Reads available content and returns it."

        | length result |
        length := self contentLength.
        result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ])
                new: (length ifNil: [ 1000 ])
                streamContents: [ :outputStream |
                        self
                                streamFrom: stream
                                to: outputStream
                                size: length
                                progress: progressBlockOrNil ].
        (self headerAt: 'content-encoding') = 'gzip' ifFalse: [ ^result ].
        ^(GZipReadStream on: result) upToEnd
=================


Maybe we need a plain "get me the bytes" variant.
Oh there is:

WebMessage>>streamFrom:to:size:progress:

That's somewhat inelegant but should work…



Best regards
        -Tobias

>
>> Sent: Wednesday, September 20, 2017 at 11:29 AM
>> From: [hidden email]
>> To: [hidden email], [hidden email]
>> Subject: [squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
>>
>> Tobias Pape uploaded a new version of WebClient-Core to project The Trunk:
>> http://source.squeak.org/trunk/WebClient-Core-topa.112.mcz
>>
>> ==================== Summary ====================
>>
>> Name: WebClient-Core-topa.112
>> Author: topa
>> Time: 20 September 2017, 5:29:06.096983 pm
>> UUID: 60b494fc-0652-4a28-be5a-1578963e5aed
>> Ancestors: WebClient-Core-topa.111
>>
>> Abide Postel's law for text conversion.
>>
>> Be conservative in what you do, be liberal in what you accept from others.
>>
>> =============== Diff against WebClient-Core-topa.111 ===============
>>
>> Item was changed:
>>  ----- Method: WebMessage>>getContentWithProgress: (in category 'private') -----
>>  getContentWithProgress: progressBlockOrNil
>>   "Reads available content and returns it."
>>
>>   | length result |
>>   length := self contentLength.
>>   result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ])
>>   new: (length ifNil: [ 1000 ])
>>   streamContents: [ :outputStream |
>>   self
>>   streamFrom: stream
>>   to: outputStream
>>   size: length
>>   progress: progressBlockOrNil ].
>>   self decoderForContentEncoding ifNotNil: [:decoder |
>>   result := (decoder on: result) upToEnd].
>>   self textConverterForContentType ifNotNil: [:converter |
>> + [result := result convertFromWithConverter: converter]
>> + on: InvalidUTF8 "some servers lie"
>> + do: [^ result]].
>> - result := result convertFromWithConverter: converter].
>>   ^ result
>>  !
>>
>> Item was changed:
>>  ----- Method: WebMessage>>textConverterForContentType (in category 'accessing') -----
>>  textConverterForContentType
>>
>>   | index contentType |
>>   contentType := self contentType.
>>   contentType size < 8 ifTrue: [ ^nil ].
>> - (contentType beginsWithAnyOf: #('application/' 'image/' 'video/' 'audio/')) ifTrue: [^nil].
>>   index := contentType findString: 'charset=' startingAt: 1 caseSensitive: false.
>>   index = 0 ifTrue: [ ^nil ].
>>   ^TextConverter newForEncoding: (contentType allButFirst: index + 7) "'charset=' size - 1"!
>>
>>
>>
>


Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: WebClient-Core-topa.112.mcz

monty-3
> Sent: Friday, October 27, 2017 at 1:33 PM
> From: "Tobias Pape" <[hidden email]>
> To: "The general-purpose Squeak developers list" <[hidden email]>
> Subject: Re: [squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
>
>
> > On 27.10.2017, at 18:40, monty <[hidden email]> wrote:
> >
> > This series of content decoding changes is not backwards compatible. Existing code that does its own response decoding with the assumption that WebClient won't do it will now double-decode the response--corrupting it. But breaking existing code is probably worth it here because of the gains in intuitiveness and correctness.
> >
> > Is there any way to disable the automatic response decoding?
>
> That's problematic.
> Some conversation was always done, you never ended up with mere Strings/ByteArrays.
>
> This is the old version, that opportunistically applied inflate/deflate:

Which is part of the HTTP protocol itself and is meant to be applied before transit and removed after in a way that is entirely transparent to the application. By decoding the response content from UTF-8 or some other character encoding, you're directly modifying the content before giving it to the application, and in a way that is permanent and maybe incorrect.

Please review and merge WebClient-Core-monty.114.mcz.

> ================
> getContentWithProgress: progressBlockOrNil
> "Reads available content and returns it."
>
> | length result |
> length := self contentLength.
> result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ])
> new: (length ifNil: [ 1000 ])
> streamContents: [ :outputStream |
> self
> streamFrom: stream
> to: outputStream
> size: length
> progress: progressBlockOrNil ].
> (self headerAt: 'content-encoding') = 'gzip' ifFalse: [ ^result ].
> ^(GZipReadStream on: result) upToEnd
> =================
>
>
> Maybe we need a plain "get me the bytes" variant.
> Oh there is:
>
> WebMessage>>streamFrom:to:size:progress:
>
> That's somewhat inelegant but should work…
>
>
>
> Best regards
> -Tobias
>
> >
> >> Sent: Wednesday, September 20, 2017 at 11:29 AM
> >> From: [hidden email]
> >> To: [hidden email], [hidden email]
> >> Subject: [squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
> >>
> >> Tobias Pape uploaded a new version of WebClient-Core to project The Trunk:
> >> http://source.squeak.org/trunk/WebClient-Core-topa.112.mcz
> >>
> >> ==================== Summary ====================
> >>
> >> Name: WebClient-Core-topa.112
> >> Author: topa
> >> Time: 20 September 2017, 5:29:06.096983 pm
> >> UUID: 60b494fc-0652-4a28-be5a-1578963e5aed
> >> Ancestors: WebClient-Core-topa.111
> >>
> >> Abide Postel's law for text conversion.
> >>
> >> Be conservative in what you do, be liberal in what you accept from others.
> >>
> >> =============== Diff against WebClient-Core-topa.111 ===============
> >>
> >> Item was changed:
> >>  ----- Method: WebMessage>>getContentWithProgress: (in category 'private') -----
> >>  getContentWithProgress: progressBlockOrNil
> >>   "Reads available content and returns it."
> >>
> >>   | length result |
> >>   length := self contentLength.
> >>   result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ])
> >>   new: (length ifNil: [ 1000 ])
> >>   streamContents: [ :outputStream |
> >>   self
> >>   streamFrom: stream
> >>   to: outputStream
> >>   size: length
> >>   progress: progressBlockOrNil ].
> >>   self decoderForContentEncoding ifNotNil: [:decoder |
> >>   result := (decoder on: result) upToEnd].
> >>   self textConverterForContentType ifNotNil: [:converter |
> >> + [result := result convertFromWithConverter: converter]
> >> + on: InvalidUTF8 "some servers lie"
> >> + do: [^ result]].
> >> - result := result convertFromWithConverter: converter].
> >>   ^ result
> >>  !
> >>
> >> Item was changed:
> >>  ----- Method: WebMessage>>textConverterForContentType (in category 'accessing') -----
> >>  textConverterForContentType
> >>
> >>   | index contentType |
> >>   contentType := self contentType.
> >>   contentType size < 8 ifTrue: [ ^nil ].
> >> - (contentType beginsWithAnyOf: #('application/' 'image/' 'video/' 'audio/')) ifTrue: [^nil].
> >>   index := contentType findString: 'charset=' startingAt: 1 caseSensitive: false.
> >>   index = 0 ifTrue: [ ^nil ].
> >>   ^TextConverter newForEncoding: (contentType allButFirst: index + 7) "'charset=' size - 1"!
> >>
> >>
> >>
> >
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: WebClient-Core-topa.112.mcz

Tobias Pape

> On 31.10.2017, at 07:51, monty <[hidden email]> wrote:
>
>> Sent: Friday, October 27, 2017 at 1:33 PM
>> From: "Tobias Pape" <[hidden email]>
>> To: "The general-purpose Squeak developers list" <[hidden email]>
>> Subject: Re: [squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
>>
>>
>>> On 27.10.2017, at 18:40, monty <[hidden email]> wrote:
>>>
>>> This series of content decoding changes is not backwards compatible. Existing code that does its own response decoding with the assumption that WebClient won't do it will now double-decode the response--corrupting it. But breaking existing code is probably worth it here because of the gains in intuitiveness and correctness.
>>>
>>> Is there any way to disable the automatic response decoding?
>>
>> That's problematic.
>> Some conversation was always done, you never ended up with mere Strings/ByteArrays.
>>
>> This is the old version, that opportunistically applied inflate/deflate:
>
> Which is part of the HTTP protocol itself and is meant to be applied before transit and removed after in a way that is entirely transparent to the application. By decoding the response content from UTF-8 or some other character encoding, you're directly modifying the content before giving it to the application, and in a way that is permanent and maybe incorrect.
>

Yes, when the http-header says so?

(see https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html)

I don't see why handling the 'content-encoding' (section 3.5, eg, deflate), the 'transfer-encoding' (section 3.6, eg, chunked), and the 'content-type' (section 3.7, with 'charset') headers differently.
Actually, it probably should also handle the 'charset' (section 3.4) header, too.

Also, HTTP explicitly says that anything not tagged is to be handled as iso-8859-1, so anything that does _not_ tag in the header but only in the content (say, xml whith utf8) will be problematic.

OTOH, you still are able to retrieve the information from the message.

So, say the XML doc says it's utf-8 encoded, you can still ask the message whether the content-type;charset or the charset header already where sent as utf-8, and if not, recode from iso-8859-1.



> Please review and merge WebClient-Core-monty.114.mcz.

I'll review.


Best regards
        -Tobias



>
>> ================
>> getContentWithProgress: progressBlockOrNil
>> "Reads available content and returns it."
>>
>> | length result |
>> length := self contentLength.
>> result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ])
>> new: (length ifNil: [ 1000 ])
>> streamContents: [ :outputStream |
>> self
>> streamFrom: stream
>> to: outputStream
>> size: length
>> progress: progressBlockOrNil ].
>> (self headerAt: 'content-encoding') = 'gzip' ifFalse: [ ^result ].
>> ^(GZipReadStream on: result) upToEnd
>> =================
>>
>>
>> Maybe we need a plain "get me the bytes" variant.
>> Oh there is:
>>
>> WebMessage>>streamFrom:to:size:progress:
>>
>> That's somewhat inelegant but should work…
>>
>>
>>
>> Best regards
>> -Tobias
>>
>>>
>>>> Sent: Wednesday, September 20, 2017 at 11:29 AM
>>>> From: [hidden email]
>>>> To: [hidden email], [hidden email]
>>>> Subject: [squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
>>>>
>>>> Tobias Pape uploaded a new version of WebClient-Core to project The Trunk:
>>>> http://source.squeak.org/trunk/WebClient-Core-topa.112.mcz
>>>>
>>>> ==================== Summary ====================
>>>>
>>>> Name: WebClient-Core-topa.112
>>>> Author: topa
>>>> Time: 20 September 2017, 5:29:06.096983 pm
>>>> UUID: 60b494fc-0652-4a28-be5a-1578963e5aed
>>>> Ancestors: WebClient-Core-topa.111
>>>>
>>>> Abide Postel's law for text conversion.
>>>>
>>>> Be conservative in what you do, be liberal in what you accept from others.
>>>>
>>>> =============== Diff against WebClient-Core-topa.111 ===============
>>>>
>>>> Item was changed:
>>>> ----- Method: WebMessage>>getContentWithProgress: (in category 'private') -----
>>>> getContentWithProgress: progressBlockOrNil
>>>> "Reads available content and returns it."
>>>>
>>>> | length result |
>>>> length := self contentLength.
>>>> result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ])
>>>> new: (length ifNil: [ 1000 ])
>>>> streamContents: [ :outputStream |
>>>> self
>>>> streamFrom: stream
>>>> to: outputStream
>>>> size: length
>>>> progress: progressBlockOrNil ].
>>>> self decoderForContentEncoding ifNotNil: [:decoder |
>>>> result := (decoder on: result) upToEnd].
>>>> self textConverterForContentType ifNotNil: [:converter |
>>>> + [result := result convertFromWithConverter: converter]
>>>> + on: InvalidUTF8 "some servers lie"
>>>> + do: [^ result]].
>>>> - result := result convertFromWithConverter: converter].
>>>> ^ result
>>>> !
>>>>
>>>> Item was changed:
>>>> ----- Method: WebMessage>>textConverterForContentType (in category 'accessing') -----
>>>> textConverterForContentType
>>>>
>>>> | index contentType |
>>>> contentType := self contentType.
>>>> contentType size < 8 ifTrue: [ ^nil ].
>>>> - (contentType beginsWithAnyOf: #('application/' 'image/' 'video/' 'audio/')) ifTrue: [^nil].
>>>> index := contentType findString: 'charset=' startingAt: 1 caseSensitive: false.
>>>> index = 0 ifTrue: [ ^nil ].
>>>> ^TextConverter newForEncoding: (contentType allButFirst: index + 7) "'charset=' size - 1"!


Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: WebClient-Core-topa.112.mcz

Tobias Pape
Followup

> On 31.10.2017, at 17:54, Tobias Pape <[hidden email]> wrote:
>
>
>> On 31.10.2017, at 07:51, monty <[hidden email]> wrote:
>>
>>> Sent: Friday, October 27, 2017 at 1:33 PM
>>> From: "Tobias Pape" <[hidden email]>
>>> To: "The general-purpose Squeak developers list" <[hidden email]>
>>> Subject: Re: [squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
>>>
>>>
>>>> On 27.10.2017, at 18:40, monty <[hidden email]> wrote:
>>>>
>>>> This series of content decoding changes is not backwards compatible. Existing code that does its own response decoding with the assumption that WebClient won't do it will now double-decode the response--corrupting it. But breaking existing code is probably worth it here because of the gains in intuitiveness and correctness.
>>>>
>>>> Is there any way to disable the automatic response decoding?
>>>
>>> That's problematic.
>>> Some conversation was always done, you never ended up with mere Strings/ByteArrays.
>>>
>>> This is the old version, that opportunistically applied inflate/deflate:
>>
>> Which is part of the HTTP protocol itself and is meant to be applied before transit and removed after in a way that is entirely transparent to the application. By decoding the response content from UTF-8 or some other character encoding, you're directly modifying the content before giving it to the application, and in a way that is permanent and maybe incorrect.
>>
>
> Yes, when the http-header says so?
>
> (see https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html)
>
> I don't see why handling the 'content-encoding' (section 3.5, eg, deflate), the 'transfer-encoding' (section 3.6, eg, chunked), and the 'content-type' (section 3.7, with 'charset') headers differently.
> Actually, it probably should also handle the 'charset' (section 3.4) header, too.


Also, the protocol says:

"HTTP/1.1 recipients MUST respect the charset label provided by the sender"


>
> Also, HTTP explicitly says that anything not tagged is to be handled as iso-8859-1, so anything that does _not_ tag in the header but only in the content (say, xml whith utf8) will be problematic.
>
> OTOH, you still are able to retrieve the information from the message.
>
> So, say the XML doc says it's utf-8 encoded, you can still ask the message whether the content-type;charset or the charset header already where sent as utf-8, and if not, recode from iso-8859-1.
>
>
>
>> Please review and merge WebClient-Core-monty.114.mcz.
>
> I'll review.
>
>
> Best regards
> -Tobias
>
>
>
>>
>>> ================
>>> getContentWithProgress: progressBlockOrNil
>>> "Reads available content and returns it."
>>>
>>> | length result |
>>> length := self contentLength.
>>> result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ])
>>> new: (length ifNil: [ 1000 ])
>>> streamContents: [ :outputStream |
>>> self
>>> streamFrom: stream
>>> to: outputStream
>>> size: length
>>> progress: progressBlockOrNil ].
>>> (self headerAt: 'content-encoding') = 'gzip' ifFalse: [ ^result ].
>>> ^(GZipReadStream on: result) upToEnd
>>> =================
>>>
>>>
>>> Maybe we need a plain "get me the bytes" variant.
>>> Oh there is:
>>>
>>> WebMessage>>streamFrom:to:size:progress:
>>>
>>> That's somewhat inelegant but should work…
>>>
>>>
>>>
>>> Best regards
>>> -Tobias
>>>
>>>>
>>>>> Sent: Wednesday, September 20, 2017 at 11:29 AM
>>>>> From: [hidden email]
>>>>> To: [hidden email], [hidden email]
>>>>> Subject: [squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
>>>>>
>>>>> Tobias Pape uploaded a new version of WebClient-Core to project The Trunk:
>>>>> http://source.squeak.org/trunk/WebClient-Core-topa.112.mcz
>>>>>
>>>>> ==================== Summary ====================
>>>>>
>>>>> Name: WebClient-Core-topa.112
>>>>> Author: topa
>>>>> Time: 20 September 2017, 5:29:06.096983 pm
>>>>> UUID: 60b494fc-0652-4a28-be5a-1578963e5aed
>>>>> Ancestors: WebClient-Core-topa.111
>>>>>
>>>>> Abide Postel's law for text conversion.
>>>>>
>>>>> Be conservative in what you do, be liberal in what you accept from others.
>>>>>
>>>>> =============== Diff against WebClient-Core-topa.111 ===============
>>>>>
>>>>> Item was changed:
>>>>> ----- Method: WebMessage>>getContentWithProgress: (in category 'private') -----
>>>>> getContentWithProgress: progressBlockOrNil
>>>>> "Reads available content and returns it."
>>>>>
>>>>> | length result |
>>>>> length := self contentLength.
>>>>> result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ])
>>>>> new: (length ifNil: [ 1000 ])
>>>>> streamContents: [ :outputStream |
>>>>> self
>>>>> streamFrom: stream
>>>>> to: outputStream
>>>>> size: length
>>>>> progress: progressBlockOrNil ].
>>>>> self decoderForContentEncoding ifNotNil: [:decoder |
>>>>> result := (decoder on: result) upToEnd].
>>>>> self textConverterForContentType ifNotNil: [:converter |
>>>>> + [result := result convertFromWithConverter: converter]
>>>>> + on: InvalidUTF8 "some servers lie"
>>>>> + do: [^ result]].
>>>>> - result := result convertFromWithConverter: converter].
>>>>> ^ result
>>>>> !
>>>>>
>>>>> Item was changed:
>>>>> ----- Method: WebMessage>>textConverterForContentType (in category 'accessing') -----
>>>>> textConverterForContentType
>>>>>
>>>>> | index contentType |
>>>>> contentType := self contentType.
>>>>> contentType size < 8 ifTrue: [ ^nil ].
>>>>> - (contentType beginsWithAnyOf: #('application/' 'image/' 'video/' 'audio/')) ifTrue: [^nil].
>>>>> index := contentType findString: 'charset=' startingAt: 1 caseSensitive: false.
>>>>> index = 0 ifTrue: [ ^nil ].
>>>>> ^TextConverter newForEncoding: (contentType allButFirst: index + 7) "'charset=' size - 1"!
>
>


Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: WebClient-Core-topa.112.mcz

Tobias Pape
Followup 2:

> On 31.10.2017, at 18:03, Tobias Pape <[hidden email]> wrote:
>
> Followup
>> On 31.10.2017, at 17:54, Tobias Pape <[hidden email]> wrote:
>>
>>
>>> On 31.10.2017, at 07:51, monty <[hidden email]> wrote:
>>>
>>>> Sent: Friday, October 27, 2017 at 1:33 PM
>>>> From: "Tobias Pape" <[hidden email]>
>>>> To: "The general-purpose Squeak developers list" <[hidden email]>
>>>> Subject: Re: [squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
>>>>
>>>>
>>>>> On 27.10.2017, at 18:40, monty <[hidden email]> wrote:
>>>>>
>>>>> This series of content decoding changes is not backwards compatible. Existing code that does its own response decoding with the assumption that WebClient won't do it will now double-decode the response--corrupting it. But breaking existing code is probably worth it here because of the gains in intuitiveness and correctness.
>>>>>
>>>>> Is there any way to disable the automatic response decoding?
>>>>
>>>> That's problematic.
>>>> Some conversation was always done, you never ended up with mere Strings/ByteArrays.
>>>>
>>>> This is the old version, that opportunistically applied inflate/deflate:
>>>
>>> Which is part of the HTTP protocol itself and is meant to be applied before transit and removed after in a way that is entirely transparent to the application. By decoding the response content from UTF-8 or some other character encoding, you're directly modifying the content before giving it to the application, and in a way that is permanent and maybe incorrect.
>>>
>>
>> Yes, when the http-header says so?
>>
>> (see https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html)
>>
>> I don't see why handling the 'content-encoding' (section 3.5, eg, deflate), the 'transfer-encoding' (section 3.6, eg, chunked), and the 'content-type' (section 3.7, with 'charset') headers differently.
>> Actually, it probably should also handle the 'charset' (section 3.4) header, too.
>
>
> Also, the protocol says:
>
> "HTTP/1.1 recipients MUST respect the charset label provided by the sender"


For html (and xml) it is stated that the header takes precedence regardless:

https://www.w3.org/International/questions/qa-html-encoding-declarations.en
"The HTTP header information has the highest priority when it conflicts with in-document declarations […]"


>
>
>>
>> Also, HTTP explicitly says that anything not tagged is to be handled as iso-8859-1, so anything that does _not_ tag in the header but only in the content (say, xml whith utf8) will be problematic.
>>
>> OTOH, you still are able to retrieve the information from the message.
>>
>> So, say the XML doc says it's utf-8 encoded, you can still ask the message whether the content-type;charset or the charset header already where sent as utf-8, and if not, recode from iso-8859-1.
>>
>>
>>
>>> Please review and merge WebClient-Core-monty.114.mcz.
>>
>> I'll review.
>>
>>
>> Best regards
>> -Tobias
>>
>>
>>
>>>
>>>> ================
>>>> getContentWithProgress: progressBlockOrNil
>>>> "Reads available content and returns it."
>>>>
>>>> | length result |
>>>> length := self contentLength.
>>>> result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ])
>>>> new: (length ifNil: [ 1000 ])
>>>> streamContents: [ :outputStream |
>>>> self
>>>> streamFrom: stream
>>>> to: outputStream
>>>> size: length
>>>> progress: progressBlockOrNil ].
>>>> (self headerAt: 'content-encoding') = 'gzip' ifFalse: [ ^result ].
>>>> ^(GZipReadStream on: result) upToEnd
>>>> =================
>>>>
>>>>
>>>> Maybe we need a plain "get me the bytes" variant.
>>>> Oh there is:
>>>>
>>>> WebMessage>>streamFrom:to:size:progress:
>>>>
>>>> That's somewhat inelegant but should work…
>>>>
>>>>
>>>>
>>>> Best regards
>>>> -Tobias
>>>>
>>>>>
>>>>>> Sent: Wednesday, September 20, 2017 at 11:29 AM
>>>>>> From: [hidden email]
>>>>>> To: [hidden email], [hidden email]
>>>>>> Subject: [squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
>>>>>>
>>>>>> Tobias Pape uploaded a new version of WebClient-Core to project The Trunk:
>>>>>> http://source.squeak.org/trunk/WebClient-Core-topa.112.mcz
>>>>>>
>>>>>> ==================== Summary ====================
>>>>>>
>>>>>> Name: WebClient-Core-topa.112
>>>>>> Author: topa
>>>>>> Time: 20 September 2017, 5:29:06.096983 pm
>>>>>> UUID: 60b494fc-0652-4a28-be5a-1578963e5aed
>>>>>> Ancestors: WebClient-Core-topa.111
>>>>>>
>>>>>> Abide Postel's law for text conversion.
>>>>>>
>>>>>> Be conservative in what you do, be liberal in what you accept from others.
>>>>>>
>>>>>> =============== Diff against WebClient-Core-topa.111 ===============
>>>>>>
>>>>>> Item was changed:
>>>>>> ----- Method: WebMessage>>getContentWithProgress: (in category 'private') -----
>>>>>> getContentWithProgress: progressBlockOrNil
>>>>>> "Reads available content and returns it."
>>>>>>
>>>>>> | length result |
>>>>>> length := self contentLength.
>>>>>> result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ])
>>>>>> new: (length ifNil: [ 1000 ])
>>>>>> streamContents: [ :outputStream |
>>>>>> self
>>>>>> streamFrom: stream
>>>>>> to: outputStream
>>>>>> size: length
>>>>>> progress: progressBlockOrNil ].
>>>>>> self decoderForContentEncoding ifNotNil: [:decoder |
>>>>>> result := (decoder on: result) upToEnd].
>>>>>> self textConverterForContentType ifNotNil: [:converter |
>>>>>> + [result := result convertFromWithConverter: converter]
>>>>>> + on: InvalidUTF8 "some servers lie"
>>>>>> + do: [^ result]].
>>>>>> - result := result convertFromWithConverter: converter].
>>>>>> ^ result
>>>>>> !
>>>>>>
>>>>>> Item was changed:
>>>>>> ----- Method: WebMessage>>textConverterForContentType (in category 'accessing') -----
>>>>>> textConverterForContentType
>>>>>>
>>>>>> | index contentType |
>>>>>> contentType := self contentType.
>>>>>> contentType size < 8 ifTrue: [ ^nil ].
>>>>>> - (contentType beginsWithAnyOf: #('application/' 'image/' 'video/' 'audio/')) ifTrue: [^nil].
>>>>>> index := contentType findString: 'charset=' startingAt: 1 caseSensitive: false.
>>>>>> index = 0 ifTrue: [ ^nil ].
>>>>>> ^TextConverter newForEncoding: (contentType allButFirst: index + 7) "'charset=' size - 1"!


Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: WebClient-Core-topa.112.mcz

monty-3


> Sent: Tuesday, October 31, 2017 at 2:23 PM
> From: "Tobias Pape" <[hidden email]>
> To: "The general-purpose Squeak developers list" <[hidden email]>
> Subject: Re: [squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
>
> Followup 2:
>
> > On 31.10.2017, at 18:03, Tobias Pape <[hidden email]> wrote:
> >
> > Followup
> >> On 31.10.2017, at 17:54, Tobias Pape <[hidden email]> wrote:
> >>
> >>
> >>> On 31.10.2017, at 07:51, monty <[hidden email]> wrote:
> >>>
> >>>> Sent: Friday, October 27, 2017 at 1:33 PM
> >>>> From: "Tobias Pape" <[hidden email]>
> >>>> To: "The general-purpose Squeak developers list" <[hidden email]>
> >>>> Subject: Re: [squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
> >>>>
> >>>>
> >>>>> On 27.10.2017, at 18:40, monty <[hidden email]> wrote:
> >>>>>
> >>>>> This series of content decoding changes is not backwards compatible. Existing code that does its own response decoding with the assumption that WebClient won't do it will now double-decode the response--corrupting it. But breaking existing code is probably worth it here because of the gains in intuitiveness and correctness.
> >>>>>
> >>>>> Is there any way to disable the automatic response decoding?
> >>>>
> >>>> That's problematic.
> >>>> Some conversation was always done, you never ended up with mere Strings/ByteArrays.
> >>>>
> >>>> This is the old version, that opportunistically applied inflate/deflate:
> >>>
> >>> Which is part of the HTTP protocol itself and is meant to be applied before transit and removed after in a way that is entirely transparent to the application. By decoding the response content from UTF-8 or some other character encoding, you're directly modifying the content before giving it to the application, and in a way that is permanent and maybe incorrect.
> >>>
> >>
> >> Yes, when the http-header says so?
> >>
> >> (see https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html)
> >>
> >> I don't see why handling the 'content-encoding' (section 3.5, eg, deflate), the 'transfer-encoding' (section 3.6, eg, chunked), and the 'content-type' (section 3.7, with 'charset') headers differently.

Transfer-Encoding/Content-Encoding don't change the underlying message body the application is given. The same byte sequence the server started with, for example, a UTF-16 encoded text file read from disk, is available to the application after the HTTP client reverses the GZip compression applied by the server before transit. But decoding from a particular character encoding permanently transforms the message body before it's given to the application.

> >> Actually, it probably should also handle the 'charset' (section 3.4) header, too.

There is no charset header, only a charset parameter for certain headers like Content-Type.

> > Also, the protocol says:
> >
> > "HTTP/1.1 recipients MUST respect the charset label provided by the sender"
>
>
> For html (and xml) it is stated that the header takes precedence regardless:
>
> https://www.w3.org/International/questions/qa-html-encoding-declarations.en
> "The HTTP header information has the highest priority when it conflicts with in-document declarations […]"
 
For HTML5, you're wrong. BOMs in the content take precedence over the Content-Type charset parameter:

"That this step happens before the next one honoring the HTTP Content-Type header is a willful violation of the HTTP specification, motivated by a desire to be maximally compatible with legacy content. [HTTP]"

https://www.w3.org/TR/html5/syntax.html#determining-the-character-encoding

And even if .114.mcz is rejected, .113.mcz is still worth integrating.

> >
> >
> >>
> >> Also, HTTP explicitly says that anything not tagged is to be handled as iso-8859-1, so anything that does _not_ tag in the header but only in the content (say, xml whith utf8) will be problematic.
> >>
> >> OTOH, you still are able to retrieve the information from the message.
> >>
> >> So, say the XML doc says it's utf-8 encoded, you can still ask the message whether the content-type;charset or the charset header already where sent as utf-8, and if not, recode from iso-8859-1.
> >>
> >>
> >>
> >>> Please review and merge WebClient-Core-monty.114.mcz.
> >>
> >> I'll review.
> >>
> >>
> >> Best regards
> >> -Tobias
> >>
> >>
> >>
> >>>
> >>>> ================
> >>>> getContentWithProgress: progressBlockOrNil
> >>>> "Reads available content and returns it."
> >>>>
> >>>> | length result |
> >>>> length := self contentLength.
> >>>> result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ])
> >>>> new: (length ifNil: [ 1000 ])
> >>>> streamContents: [ :outputStream |
> >>>> self
> >>>> streamFrom: stream
> >>>> to: outputStream
> >>>> size: length
> >>>> progress: progressBlockOrNil ].
> >>>> (self headerAt: 'content-encoding') = 'gzip' ifFalse: [ ^result ].
> >>>> ^(GZipReadStream on: result) upToEnd
> >>>> =================
> >>>>
> >>>>
> >>>> Maybe we need a plain "get me the bytes" variant.
> >>>> Oh there is:
> >>>>
> >>>> WebMessage>>streamFrom:to:size:progress:
> >>>>
> >>>> That's somewhat inelegant but should work…
> >>>>
> >>>>
> >>>>
> >>>> Best regards
> >>>> -Tobias
> >>>>
> >>>>>
> >>>>>> Sent: Wednesday, September 20, 2017 at 11:29 AM
> >>>>>> From: [hidden email]
> >>>>>> To: [hidden email], [hidden email]
> >>>>>> Subject: [squeak-dev] The Trunk: WebClient-Core-topa.112.mcz
> >>>>>>
> >>>>>> Tobias Pape uploaded a new version of WebClient-Core to project The Trunk:
> >>>>>> http://source.squeak.org/trunk/WebClient-Core-topa.112.mcz
> >>>>>>
> >>>>>> ==================== Summary ====================
> >>>>>>
> >>>>>> Name: WebClient-Core-topa.112
> >>>>>> Author: topa
> >>>>>> Time: 20 September 2017, 5:29:06.096983 pm
> >>>>>> UUID: 60b494fc-0652-4a28-be5a-1578963e5aed
> >>>>>> Ancestors: WebClient-Core-topa.111
> >>>>>>
> >>>>>> Abide Postel's law for text conversion.
> >>>>>>
> >>>>>> Be conservative in what you do, be liberal in what you accept from others.
> >>>>>>
> >>>>>> =============== Diff against WebClient-Core-topa.111 ===============
> >>>>>>
> >>>>>> Item was changed:
> >>>>>> ----- Method: WebMessage>>getContentWithProgress: (in category 'private') -----
> >>>>>> getContentWithProgress: progressBlockOrNil
> >>>>>> "Reads available content and returns it."
> >>>>>>
> >>>>>> | length result |
> >>>>>> length := self contentLength.
> >>>>>> result := (stream isBinary ifTrue:[ ByteArray ] ifFalse: [ ByteString ])
> >>>>>> new: (length ifNil: [ 1000 ])
> >>>>>> streamContents: [ :outputStream |
> >>>>>> self
> >>>>>> streamFrom: stream
> >>>>>> to: outputStream
> >>>>>> size: length
> >>>>>> progress: progressBlockOrNil ].
> >>>>>> self decoderForContentEncoding ifNotNil: [:decoder |
> >>>>>> result := (decoder on: result) upToEnd].
> >>>>>> self textConverterForContentType ifNotNil: [:converter |
> >>>>>> + [result := result convertFromWithConverter: converter]
> >>>>>> + on: InvalidUTF8 "some servers lie"
> >>>>>> + do: [^ result]].
> >>>>>> - result := result convertFromWithConverter: converter].
> >>>>>> ^ result
> >>>>>> !
> >>>>>>
> >>>>>> Item was changed:
> >>>>>> ----- Method: WebMessage>>textConverterForContentType (in category 'accessing') -----
> >>>>>> textConverterForContentType
> >>>>>>
> >>>>>> | index contentType |
> >>>>>> contentType := self contentType.
> >>>>>> contentType size < 8 ifTrue: [ ^nil ].
> >>>>>> - (contentType beginsWithAnyOf: #('application/' 'image/' 'video/' 'audio/')) ifTrue: [^nil].
> >>>>>> index := contentType findString: 'charset=' startingAt: 1 caseSensitive: false.
> >>>>>> index = 0 ifTrue: [ ^nil ].
> >>>>>> ^TextConverter newForEncoding: (contentType allButFirst: index + 7) "'charset=' size - 1"!
>
>
>