Are we missing some string encoding in SqueakSSL/WebClient?

Christoph Thiede

Hi all,


While experimenting with WebClient and WebUtils to send textual data to a server via HTTP(S), I found that posting a request containing non-ASCII characters as multipart/form-data yields a primitive failure from SqueakSSL/primitiveEncrypt (it occurs on both Win32 and emulated Linux/Ubuntu). When I converted the text manually using #squeakToUtf8 before putting it into the content data, everything worked fine and the server received the correct text without any encoding problems.
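To illustrate the idea outside of Squeak, here is a minimal Python sketch (not Squeak code). It rests on an assumption the thread does not confirm: that the primitive fails because it expects one byte per character and the text contains characters that do not fit into a byte. The sample string is made up for illustration.

```python
# Hypothetical sample text containing non-ASCII characters.
text = "Grüße, 世界"

# Assumption: a byte-oriented primitive needs one value in 0..255 per element.
# Code points above 255 cannot be stored that way, so the conversion fails.
try:
    bytes(ord(ch) for ch in text)
    fits_in_bytes = True
except ValueError:
    fits_in_bytes = False

# Encoding to UTF-8 first yields a plain byte sequence that round-trips.
encoded = text.encode("utf-8")
decoded = encoded.decode("utf-8")
```

In this sketch the explicit UTF-8 step plays the role that #squeakToUtf8 plays in the manual workaround described above.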


So for my application, I can manually convert the request string with #squeakToUtf8 before posting it, but I wonder whether this should really be the responsibility of every Squeak developer rather than of WebClient itself. For illustration: if you use the normal #htmlSubmit:fields: instead of #httpPost:multipartFields:, WebClient does all the necessary conversion itself (#encodeUrlEncodedForm:, String >> #encodeForHTTP). #encodeMultipartForm:boundary:, however, does not perform any conversion.


I am a complete newbie to all this web stuff, so maybe I am missing something important. What do you think? Should we add the #squeakToUtf8 conversion to WebUtils class >> #encodeMultipartForm:boundary:, and the reverse conversion to the corresponding decode message? Or would this rather be the responsibility of SqueakSSL? I am looking forward to your help!


Best,

Christoph



Carpe Squeak!

Re: Are we missing some string encoding in SqueakSSL/WebClient?

Levente Uzonyi
Hi Christoph,

On Sat, 5 Sep 2020, Thiede, Christoph wrote:

>
> Hi all,
>
>
> while doing some experiments with WebClient & WebUtils in order to send textual data to a server via HTTP(S), I found out that posting a request containing non-ASCII characters in a multipart/form-data yields a primitive
> failure from SqueakSSL/primitiveEncrypt (occurs on both Win32 + emulated Linux/Ubuntu). When I converted the text manually using #squeakToUtf8 before putting it into the contents data, everything worked fine and the server
> receives the correct text without any encoding problems.
>
>
> So for my application, I can manually #squeakToUtf8-convert the request string before posting it, but I wonder whether this should really be the responsibility of every Squeak developer and not rather one of the WebClient?
> For illustration: If you use normal #htmlSubmit:fields: instead of #httpPost:multipartFields:, WebClient does all necessary conversion (#encodeUrlEncodedForm:, String >> #encodeForHTTP) itself. Anyway,
> #encodeMultipartForm:boundary: does not care about conversion.
>
>
> I am only a bloody newbie to all this web stuff, so maybe I am missing something important. What do you think? Should we add #squeakToUtf8 conversion in WebUtils class >> #encodeMultipartForm:boundary:, and in the decode
> message vice versa? Or would this rather be a responsibility of the SqueakSSL protocol? I am looking forward to your help!

Encoding is definitely missing there. multipart/form-data is a mess [1]
though, so we should take care to generate only what presumably every
implementation supports, while still being able to parse what other
sources can generate.


Levente
[1] https://tools.ietf.org/html/rfc7578



Re: Are we missing some string encoding in SqueakSSL/WebClient?

Christoph Thiede

Hi Levente,


Unfortunately, I cannot see which component you are referring to: WebUtils or SqueakSSL?

However, I recently found out myself that changing the encoding of the entire post stream is not a solution, because it breaks other things, for example base64-encoded data. So we probably want some kind of encoding/decoding logic in the WebUtils part. I will upload a relevant inbox version as soon as possible. :-)
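The difference between encoding each part and re-encoding the whole stream can be sketched in Python (again not Squeak code, and not the actual WebUtils change). The function name `encode_multipart`, the field names, and the boundary are hypothetical. The breakage shown, binary bytes being corrupted when the finished body is treated as Latin-1 text and converted to UTF-8, is one assumed failure mode, not necessarily the one I ran into.

```python
def encode_multipart(fields, boundary):
    """Build a multipart/form-data body.

    Text values (str) are encoded as UTF-8 per part; values that are
    already bytes are passed through unchanged.
    """
    delimiter = b"--" + boundary.encode("ascii")
    out = bytearray()
    for name, value in fields:
        out += delimiter + b"\r\n"
        header = 'Content-Disposition: form-data; name="%s"' % name
        out += header.encode("utf-8") + b"\r\n"
        if isinstance(value, str):
            out += b"Content-Type: text/plain; charset=utf-8\r\n\r\n"
            out += value.encode("utf-8")
        else:
            out += b"Content-Type: application/octet-stream\r\n\r\n"
            out += bytes(value)
        out += b"\r\n"
    out += delimiter + b"--\r\n"
    return bytes(out)


# Hypothetical payload: one text field and one binary field.
binary = bytes([0x89, 0x50, 0xFF, 0x00])
body = encode_multipart([("message", "Grüße"), ("blob", binary)], "XyZ")

# Re-encoding the finished body as a whole mangles every byte above 127,
# including the binary part and the text that was already UTF-8.
reencoded = body.decode("latin-1").encode("utf-8")
```

Here `body` still contains the binary field byte for byte, while `reencoded` does not, which is why the conversion seems to belong to the individual text fields rather than to the stream as a whole.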

How can we be sure not to break any existing implementations (backward compatibility)? In a nutshell, I think we can't (without doing any kind of error-prone "encoding guess"), but the next release will be a new major version, so I think this will allow us to make a breaking change?

Best,
Christoph

From: Squeak-dev <[hidden email]> on behalf of Levente Uzonyi <[hidden email]>
Sent: Sunday, 6 September 2020 18:22:54
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] Are we missing some string encoding in SqueakSSL/WebClient?
 

Carpe Squeak!

Re: Are we missing some string encoding in SqueakSSL/WebClient?

Christoph Thiede

Hi Levente, hi all,


please see WebClient-Core-ct.125. I am looking forward to your review! :-)


In my image, WebClient-Tests keeps failing with 9 failures and 1 error, but I get the same results in a fresh Trunk image, so I guess there is no obvious regression.


Best,

Christoph


From: Squeak-dev <[hidden email]> on behalf of Thiede, Christoph
Sent: Tuesday, 8 September 2020 08:37:24
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] Are we missing some string encoding in SqueakSSL/WebClient?
 


Carpe Squeak!