requests and encodings (was Re: fix for issue 21)

requests and encodings (was Re: fix for issue 21)

Julian Fitzell-2
Moving to seaside-dev...

On Sat, Jun 28, 2008 at 2:56 PM, Philippe Marschall
<[hidden email]> wrote:

> 2008/6/27, Julian Fitzell <[hidden email]>:
>>  On Fri, Jun 27, 2008 at 1:07 PM, Philippe Marschall
>>  <[hidden email]> wrote:
>>  > 2008/6/26, Julian Fitzell <[hidden email]>:
>>  >>  - I wonder whether we should add a "path" instVar to WARequest.
>>  >>  Currently the (unfortunately-named) "url" instvar doesn't provide any
>>  >>  way to tell the difference between a '/' and a '%2f' in the original
>>  >>  URL. I broke my fix up into two methods so that we could store the
>>  >>  result of #pathSegmentsFrom: in another instvar.
>>  >
>>  > Ideally IMHO "url" would hold a WAUrl that is the request URL parsed.
>>  > I don't know though if this is enough and what it all will break.
>>  > Right now "url" is also always utf-8 decoded which made me create
>>  > issue 79.
>>
>>
>> Well, I thought that too but it would kind of break things to change
>>  it from a string to a WAUrl. Also, after more thought, I realized that
>>  an HTTP request doesn't have a protocol, port, or (necessarily)
>>  server.
>
> Yes it does. The server is in the HOST header. The protocol is either
> http or https we can get this from the configuration. Same for the
> port.

Yeah, ok, I suppose you /could/ fake it with the information from the
configuration (there is no Host: header in HTTP/1.0 but that's likely
not a big problem these days). Isn't that misleading though, since the
user might actually have connected differently (particularly for an
initial connection, where Seaside's configuration doesn't enter into
the equation)? You could also presumably find the port and protocol of
the Kom connection from Kom itself somehow...

In either case, it seems to me that changing #url from a string to a
WAUrl would break existing code. Maybe it's desirable anyway... code
that does break would not be difficult to fix, and it would probably
break pretty obviously.

>>  >>  - do you know if the header values in HTTPRequest also need to be
>>  >>  decoded? They aren't currently and I don't know if they support UTF-8
>>  >>  values or not...
>>  >
>>  > If they are really UTF-8 that would be good. An example is cookie
>>  > values which are transmitted through headers. See also issue 63.
>>  > Before adding such a thing, please make sure it really works with IE
>>  > 6, Firefox 2, Safari 2 and Opera 9 with utf-8, ISO-8859-1 and utf-16.
>>  > Ideally also Big5 and  Shift JIS though I have to admit I never tested
>>  > with those. Unfortunately the HTTP spec/theory and browsers/reality
>>  > are different.
>>
>>
>> Are you suggesting auto-detecting the encoding of headers sent by the
>>  browser?
>
> No not at all. But in Seaside 2.9 we now know the encoding of the web
> application. Even if there is a spec, you will simply have to try all
> browsers with at least iso-8859-1 and utf-8. Either there is a rule or
> we can't support it. It's as simple as that. A short googling suggests
> that headers are ASCII. We might or might not want to support a custom
> encoding for cookie values.
>
>> I don't think the browser specifies an encoding in the
>>  headers does it? I'm not sure I want to tackle this mess right now but
>>  I'll keep it in mind. :)
>
> It can, in the content-type header. Not that it often does.
>
>>  I'd have to think about this more but if we are supporting all those
>>  encodings, wouldn't it be nice to have a pair of encoders: one for
>>  what we want our Response encoding to be and one for the encoding we
>>  want to use internally (convert Request data *TO* and Response data
>>  *from*). So you could use a UTF-8 converter for "outside" and a Squeak
>>  encoding converter for "inside"; all incoming data would be converted
>>  to Squeak encoding and anything going out would be converted from
>>  Squeak encoding to UTF-8. If you had UTF-8 encoders for both then you
>>  wouldn't have to do any encoding going out but incoming might still
>>  have to be converted to UTF-8 if it was, for example, UTF-16.
>
> No, internally we ideally want only Squeak/Smalltalk encoding.
> Otherwise we can throw String away and just use ByteArray. The problem
> is that WideStrings are bugged and slow and for legacy reasons we have
> to support "null encoding". Everything else is insanity. Same goes for
> using utf-8 internally and utf-16 externally. Second, for some external
> parts (like URLs) the external encoding is given.

It doesn't appear quite that simple to me... if you have data in UTF
format in a database, you might well prefer to use UTF encoding
internally (or at the very least be able to specify the encoding of that
data when giving it to the canvas). Squeak encoding doesn't
support anything outside basic accented characters, does it? Same goes
for incoming form data if you have to put it in a database... you
don't want to be storing it in Squeak encoding.

Julian
_______________________________________________
seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside

Re: requests and encodings (was Re: fix for issue 21)

Philippe Marschall
2008/6/28, Julian Fitzell <[hidden email]>:

> Moving to seaside-dev...
>
>  On Sat, Jun 28, 2008 at 2:56 PM, Philippe Marschall
>  <[hidden email]> wrote:
>  > 2008/6/27, Julian Fitzell <[hidden email]>:
>  >>  On Fri, Jun 27, 2008 at 1:07 PM, Philippe Marschall
>  >>  <[hidden email]> wrote:
>  >>  > 2008/6/26, Julian Fitzell <[hidden email]>:
>  >>  >>  - I wonder whether we should add a "path" instVar to WARequest.
>  >>  >>  Currently the (unfortunately-named) "url" instvar doesn't provide any
>  >>  >>  way to tell the difference between a '/' and a '%2f' in the original
>  >>  >>  URL. I broke my fix up into two methods so that we could store the
>  >>  >>  result of #pathSegmentsFrom: in another instvar.
>  >>  >
>  >>  > Ideally IMHO "url" would hold a WAUrl that is the request URL parsed.
>  >>  > I don't know though if this is enough and what it all will break.
>  >>  > Right now "url" is also always utf-8 decoded which made me create
>  >>  > issue 79.
>  >>
>  >>
>  >> Well, I thought that too but it would kind of break things to change
>  >>  it from a string to a WAUrl. Also, after more thought, I realized that
>  >>  an HTTP request doesn't have a protocol, port, or (necessarily)
>  >>  server.
>  >
>  > Yes it does. The server is in the HOST header. The protocol is either
>  > http or https we can get this from the configuration. Same for the
>  > port.
>
>  Yeah, ok, I suppose you /could/ fake it with the information from the
>  configuration (there is no Host: header in HTTP/1.0 but that's likely
>  not a big problem these days). Isn't that misleading though, since the
>  user might actually have connected differently (particularly for an
>  initial connection, where Seaside's configuration doesn't enter into
>  the equation)? You could also presumably find the port and protocol of
>  the Kom connection from Kom itself somehow...

Well then, let's exclude the port and scheme:

WAUrl new parsePath: '/ch/de/index.html'

works quite well.
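
On the '/' versus '%2f' question quoted further up, here is a minimal sketch in Python (not Smalltalk, and not Seaside's actual WAUrl code) of why the order of splitting and decoding matters. The name path_segments and the sample path are made up for illustration; urllib.parse.unquote is the only library call used.

```python
from urllib.parse import unquote


def path_segments(raw_path):
    """Split the raw path on '/' first, then percent-decode each segment."""
    return [unquote(segment) for segment in raw_path.split('/') if segment]


raw = '/files/a%2Fb/index.html'

# Decoding the whole path first turns the escaped slash into a separator.
decoded_first = [s for s in unquote(raw).split('/') if s]

# Splitting first keeps 'a/b' together as one segment.
split_first = path_segments(raw)

print(decoded_first)
print(split_first)
```

Storing the already-split segments, as suggested for a separate "path" instVar, would preserve that distinction.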

>  In either case, it seems to me that changing #url from a string to a
>  WAUrl would break existing code. Maybe it's desirable...

Breaking client code is never desirable.

> not a
>  difficult fix to code that does break and it would probably break
>  pretty obviously.

and there should be pretty few users.

>  >>  >>  - do you know if the header values in HTTPRequest also need to be
>  >>  >>  decoded? They aren't currently and I don't know if they support UTF-8
>  >>  >>  values or not...
>  >>  >
>  >>  > If they are really UTF-8 that would be good. An example is cookie
>  >>  > values which are transmitted through headers. See also issue 63.
>  >>  > Before adding such a thing, please make sure it really works with IE
>  >>  > 6, Firefox 2, Safari 2 and Opera 9 with utf-8, ISO-8859-1 and utf-16.
>  >>  > Ideally also Big5 and  Shift JIS though I have to admit I never tested
>  >>  > with those. Unfortunately the HTTP spec/theory and browsers/reality
>  >>  > are different.
>  >>
>  >>
>  >> Are you suggesting auto-detecting the encoding of headers sent by the
>  >>  browser?
>  >
>  > No not at all. But in Seaside 2.9 we now know the encoding of the web
>  > application. Even if there is a spec, you will simply have to try all
>  > browsers with at least iso-8859-1 and utf-8. Either there is a rule or
>  > we can't support it. It's as simple as that. A short googling suggests
>  > that headers are ASCII. We might or might not want to support a custom
>  > encoding for cookie values.
>  >
>  >> I don't think the browser specifies an encoding in the
>  >>  headers does it? I'm not sure I want to tackle this mess right now but
>  >>  I'll keep it in mind. :)
>  >
>  > It can, in the content-type header. Not that it often does.
>  >
>  >>  I'd have to think about this more but if we are supporting all those
>  >>  encodings, wouldn't it be nice to have a pair of encoders: one for
>  >>  what we want our Response encoding to be and one for the encoding we
>  >>  want to use internally (convert Request data *TO* and Response data
>  >>  *from*). So you could use a UTF-8 converter for "outside" and a Squeak
>  >>  encoding converter for "inside"; all incoming data would be converted
>  >>  to Squeak encoding and anything going out would be converted from
>  >>  Squeak encoding to UTF-8. If you had UTF-8 encoders for both then you
>  >>  wouldn't have to do any encoding going out but incoming might still
>  >>  have to be converted to UTF-8 if it was, for example, UTF-16.
>  >
>  > No, internally we ideally want only Squeak/Smalltalk encoding.
>  > Otherwise we can throw String away and just use ByteArray. The problem
>  > is that WideStrings are bugged and slow and for legacy reasons we have
>  > to support "null encoding". Everything else is insanity. Same goes for
>  > using utf-8 internally and utf-16 externally. Second, for some external
>  > parts (like URLs) the external encoding is given.
>
>  It doesn't appear quite that simple to me... if you have data in UTF
>  format in a database, you might well prefer to use UTF encoding
>  internally

There is no such thing as UTF encoding. Using an encoding other than
Squeak fixes #= but breaks _every_ method except #,. The only reason
you might want this is to avoid the performance penalties of
WideString. But then again have you profiled your application and can
you prove to me that WideStrings are your performance bottleneck? Else
this is pure premature optimization.
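
To illustrate the general point about keeping externally encoded bytes inside a string type, here is a small sketch in Python rather than Squeak. It is only an analogy: Python's bytes stands in for a String holding raw UTF-8, and str stands in for a properly decoded String. The sample word is arbitrary.

```python
text = 'Zürich'                 # a real character string
raw = text.encode('utf-8')      # the same text kept as raw UTF-8 bytes

# Equality and concatenation still behave as expected on the raw bytes.
same_equality = raw == 'Zürich'.encode('utf-8')
concat_ok = (raw + raw).decode('utf-8') == text + text

# Size counts bytes, not characters.
sizes = (len(text), len(raw))

# Case conversion on bytes only touches ASCII letters, so the umlaut is missed.
upper_text = text.upper()
upper_raw = raw.upper().decode('utf-8')

# Truncating by index can cut a multi-byte character in half.
try:
    raw[:2].decode('utf-8')
    truncation_ok = True
except UnicodeDecodeError:
    truncation_ok = False

print(same_equality, concat_ok, sizes, upper_text, upper_raw, truncation_ok)
```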

> (or at very least be able to specify the encoding of that
>  data when giving it to the canvas).

No, you must adhere to the Seaside contract. You give Strings to
Seaside in the same encoding you expect Seaside to give Strings to
you. Everything else is a pure horror.
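
One way to picture that contract is a single codec at the boundary, with everything inside working on native strings. The sketch below is Python, and the class and method names (BoundaryCodec, decode_request, encode_response) are hypothetical; it is not how Seaside implements its encoding support.

```python
class BoundaryCodec:
    """Convert between external bytes and native strings at the edge only."""

    def __init__(self, external='utf-8'):
        # The external encoding is a property of the application, set once.
        self.external = external

    def decode_request(self, raw_bytes):
        """Incoming bytes become a native string before the application sees them."""
        return raw_bytes.decode(self.external)

    def encode_response(self, text):
        """Native strings are encoded only when they leave the application."""
        return text.encode(self.external)


latin1 = BoundaryCodec('iso-8859-1')
utf8 = BoundaryCodec('utf-8')

# The application code sees the same native string either way.
from_latin1 = latin1.decode_request(b'caf\xe9')
from_utf8 = utf8.decode_request(b'caf\xc3\xa9')
print(from_latin1 == from_utf8)
```

The application never needs to know which external encoding was in use, which is the property being argued for here.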

> Squeak encoding doesn't
>  support anything outside basic accented characters, does it?

Squeak supports a superset of Unicode including astral planes.

> Same goes
>  for incoming form data if you have to put it in a database... you
>  don't want to be putting it in in Squeak encoding.

That's between you and your database driver. That doesn't include
Seaside at all.

I still think this belongs on seaside-dev.

Cheers
Philippe

Re: Re: requests and encodings (was Re: fix for issue 21)

Julian Fitzell-3
Hi Philippe,

I did send my previous message to seaside-dev.

I feel like maybe I've offended you somehow, which was absolutely not
my intention. If so, I apologize. As I said, I love the pluggability
of this new encoding stuff... it's very clean and well done. My
intention was only to fix the bug in issue 21, which I did. The rest
was just thinking aloud.

My knowledge of the encoding in Squeak must be out of date (I was
familiar with it before the internationalization stuff went in). At
the time, MacRoman was used and, as I understand it, MacRoman only has
256 characters. Obviously you want to be dealing with string literals,
etc. in Squeak's encoding but data coming out of an existing database
is going to be in something else and outputting data from such a
database is going to be a common case.

Assuming my understanding of MacRoman is correct, you obviously can't
convert UTF-8 database data to MacRoman, then back to UTF-8 for output
back to the browser because the conversion would be lossy. It sounds
like you're saying MacRoman is no longer the encoding used. As long as
the full character space is available in the native encoding, then I
agree that having Seaside deliver everything in that native encoding
is a reasonable implementation.
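
The lossy round trip can be sketched as follows. This is Python, using its standard 'mac_roman' codec as a stand-in for a 256-character internal encoding; the function name round_trip and the sample strings are made up for illustration.

```python
def round_trip(text, internal='mac_roman'):
    """Push text through a limited internal encoding and back out again."""
    # Characters the internal encoding cannot represent are replaced by '?'.
    internal_bytes = text.encode(internal, errors='replace')
    return internal_bytes.decode(internal)


# Data as it might arrive from a UTF-8 database.
accented = 'café'.encode('utf-8').decode('utf-8')
japanese = '日本語'.encode('utf-8').decode('utf-8')

print(round_trip(accented))   # accented Latin text survives
print(round_trip(japanese))   # anything outside the 256 characters is lost
```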

I don't necessarily agree that being able to specify the encoding of a
piece of data is "pure horror" but I agree what is there now is going
to be adequate as long as the internal encoding is appropriate for the
task. Again, sorry for any offense.

Julian

On Sun, Jun 29, 2008 at 9:48 PM, Philippe Marschall
<[hidden email]> wrote:

> [...]

Re: Re: requests and encodings (was Re: fix for issue 21)

Philippe Marschall
2008/6/29, Julian Fitzell <[hidden email]>:
> Hi Philippe,
>
>  I did send my previous message to seaside-dev.

All my headers show your message went to
[hidden email] and not
[hidden email]

>  I feel like maybe I've offended you somehow, which was absolutely not
>  my intention. If so, I apologize. As I said, I love the pluggability
>  of this new encoding stuff... it's very clean and well done. My
>  intention was only to fix the bug in issue 21, which I did. The rest
>  was just thinking aloud.
>
>  My knowledge of the encoding in Squeak must be out of date (I was
>  familiar with it before the internationalization stuff went in). At
>  the time, MacRoman was used and, as I understand it, MacRoman only has
>  256 characters.

That was the state of Squeak 3.7 to my knowledge. Squeak 3.8 switched
to layer-violated Unicode. So all the MacRoman issues fall away and
new ones appear, like #=, and a lot of methods in WideString broke. I
don't know if all of this is fixed in Squeak 3.10, but
WideString had some show stoppers in Squeak 3.8 and 3.9.

Seriously, when dealing with Strings we must be sure that they are
Strings. That is only the case if the String has Smalltalk encoding.
Otherwise the String is a mere ByteArray. A byte in it has no semantics at
all. It is not possible to do anything meaningful with such an
abstraction because we cannot assume anything about it. Say it has the
byte value 60 in it. Is that $<? We don't know and can't know because
it has no semantics.
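
A concrete case of the byte-60 problem, sketched in Python rather than Smalltalk. The two-byte sample is chosen for illustration: the same bytes mean different things depending on which encoding is assumed.

```python
data = bytes([60, 1])   # two raw bytes, the first with value 60

# Read as ISO-8859-1, the first byte is '<' followed by a control character.
as_latin1 = data.decode('latin-1')

# Read as little-endian UTF-16, both bytes together form one letter, U+013C.
as_utf16 = data.decode('utf-16-le')

print(repr(as_latin1), repr(as_utf16))
print('<' in as_latin1, '<' in as_utf16)
```

Code that scans the raw bytes for 60 in order to escape '<' would do the wrong thing in the second case.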

Cheers
Philippe


Re: Re: requests and encodings (was Re: fix for issue 21)

Julian Fitzell-3
Oops... you're right. I didn't know there was a separate seaside-dev
so missed the distinction you were making.

I've moved this over there now, though I think you've sufficiently
clarified the issue for me at this point anyway.

Julian

On Sun, Jun 29, 2008 at 11:19 PM, Philippe Marschall
<[hidden email]> wrote:

> 2008/6/29, Julian Fitzell <[hidden email]>:
>> Hi Philippe,
>>
>>  I did semd my previous message to seaside-dev.
>
> All my headers show your message went to
> [hidden email] and not
> [hidden email]
>
>>  I feel like maybe I've offended you somehow, which was absolutely not
>>  my intention. If so, I apologize. As I said, I love the pluggability
>>  of this new encoding stuff... it's very clean and well done. My
>>  intention was only to fix the bug in issue 21, which I did. The rest
>>  was just thinking aloud.
>>
>>  My knowledge of the encoding in Squeak must be out of date (I was
>>  familiar with it before the internationalization stuff went in). At
>>  the time, MacRoman was used and, as I understand it, MacRoman only has
>>  256 characters.
>
> That was the state of Squeak 3.7 to my knowledge. Squeak 3.8 switched
> to layer violated Unicode. So all the MacRoman issues fall away and
> new ones appear like #= and a lot of methods in WideString broke. I
> don't know if everything of this is fixed in Squeak 3.10 but
> WideString had some show stoppers in Squeak 3.8 and 3.9.
>
> Seriously when dealing with Strings we must be sure that they are
> Strings. That is only the case if the String has Smalltalk encoding.
> Else the String is a mere ByteArray. A byte in it has no semantics at
> all. It is not possible to do anything meaningful at all with such an
> abstraction because we can not assume anything about it. So it has the
> byte value 60 in it. Is that $<? We don't know and can't know because
> it has no semantics.
>
> Cheers
> Philippe
>
>> Obviously you want to be dealing with string literals,
>>  etc. in squeak's encoding but data coming out of an existing database
>>  is going to be in something else and outputting data from such a
>>  database is going to be a common case.
>>
>>  Assuming my understanding of MacRoman is correct, you obviously can't
>>  convert UTF-8 database data to MacRoman, then back to UTF-8 for output
>>  back to the browser because the conversion would be lossy. It sounds
>>  like you're saying MacRoman is no longer the encoding used. As long as
>>  the full character space is available in the native encoding, then I
>>  agree that having seaside deliver everything in that native encoding
>>  is a reasonable implementation.
>>
>>  I don't necessarily agree that being able to specify the encoding of a
>>  piece of data is "pure horror" but I agree what is there now is going
>>  to be adequate as long as the internal encoding is appropriate for the
>>  task. Again, sorry for any offense.
>>
>>  Julian
>>
>>  On Sun, Jun 29, 2008 at 9:48 PM, Philippe Marschall
>>  <[hidden email]> wrote:
>>  > 2008/6/28, Julian Fitzell <[hidden email]>:
>>  >> Moving to seaside-dev...
>>  >>
>>  >>  On Sat, Jun 28, 2008 at 2:56 PM, Philippe Marschall
>>  >>  <[hidden email]> wrote:
>>  >>  > 2008/6/27, Julian Fitzell <[hidden email]>:
>>  >>  >>  On Fri, Jun 27, 2008 at 1:07 PM, Philippe Marschall
>>  >>  >>  <[hidden email]> wrote:
>>  >>  >>  > 2008/6/26, Julian Fitzell <[hidden email]>:
>>  >>  >>  >>  - I wonder whether we should add a "path" instVar to WARequest.
>>  >>  >>  >>  Currently the (unfortunately-named) "url" instvar doesn't provide any
>>  >>  >>  >>  way to tell the difference between a '/' and a '%2f' in the original
>>  >>  >>  >>  URL. I broke my fix up into two methods so that we could store the
>>  >>  >>  >>  result of #pathSegmentsFrom: in another instvar.
>>  >>  >>  >
>>  >>  >>  > Ideally IMHO "url" would hold a WAUrl that is the request URL parsed.
>>  >>  >>  > I don't know though if this is enough and what it all will break.
>>  >>  >>  > Right now "url" is also always utf-8 decoded which made me create
>>  >>  >>  > issue 79.
>>  >>  >>
>>  >>  >>
>>  >>  >> Well, I thought that too but it would kind of break things to change
>>  >>  >>  it from a string to a WAUrl. Also, after more thought, I realized that
>>  >>  >>  an HTTP request doesn't have a protocol, port, or (necessarily)
>>  >>  >>  server.
>>  >>  >
>>  >>  > Yes it does. The server is in the HOST header. The protocol is either
>>  >>  > http or https we can get this from the configuration. Same for the
>>  >>  > port.
>>  >>
>>  >>  Yeah, ok, I suppose you /could/ fake it with the information from the
>>  >>  configuration (there is no Host: header in HTTP/1.0 but that's likely
>>  >>  not a big problem these days). Is that misleading though since the
>>  >>  user might actually have connected differently (particularly for an
>>  >>  initial connection where seaside's configuration doesn't enter into
>>  >>  the equation)? You could also presumably find the port and protocol of
>>  >>  the Kom connection from Kom itself somehow...
>>  >
>>  > Well then, let's exclude the port and scheme:
>>  >
>>  > WAUrl new parsePath: '/ch/de/index.html'
>>  >
>>  > works quite well.
>>  >
>>  >>  In either case, it seems to me that changing #url from a string to a
>>  >>  WAUrl would break existing code. Maybe it's desirable...
>>  >
>>  > Breaking client code is never desirable.
>>  >
>>  >> not a
>>  >>  difficult fix to code that does break and it would probably break
>>  >>  pretty obviously.
>>  >
>>  > and there should be pretty few users.
>>  >
>>  >>  >>  >>  - do you know if the header values in HTTPRequest also need to be
>>  >>  >>  >>  decoded? They aren't currently and I don't know if they support UTF-8
>>  >>  >>  >>  values or not...
>>  >>  >>  >
>>  >>  >>  > If they are really UTF-8 that would be good. An example is cookie
>>  >>  >>  > values which are transmitted through headers. See also issue 63.
>>  >>  >>  > Before adding such a thing, please make sure it really works with IE
>>  >>  >>  > 6, Firefox 2, Safari 2 and Opera 9 with utf-8, ISO-8859-1 and utf-16.
>>  >>  >>  > Ideally also Big5 and  Shift JIS though I have to admit I never tested
>>  >>  >>  > with those. Unfortunately the HTTP spec/theory and browsers/reality
>>  >>  >>  > are different.
>>  >>  >>
>>  >>  >>
>>  >>  >> Are you suggesting auto-detecting the encoding of headers sent by the
>>  >>  >>  browser?
>>  >>  >
>>  >>  > No, not at all. But in Seaside 2.9 we now know the encoding of the web
>>  >>  > application. Even if there is a spec, you will simply have to try all
>>  >>  > browsers with at least iso-8859-1 and utf-8. Either there is a rule or
>>  >>  > we can't support it. It's as simple as that. A short googling suggests
>>  >>  > that headers are ASCII. We might or might not want to support a custom
>>  >>  > encoding for cookie values.
>>  >>  >
>>  >>  >> I don't think the browser specifies an encoding in the
>>  >>  >>  headers does it? I'm not sure I want to tackle this mess right now but
>>  >>  >>  I'll keep it in mind. :)
>>  >>  >
>>  >>  > It can, in the content-type header. Not that it often does.
>>  >>  >
>>  >>  >>  I'd have to think about this more but if we are supporting all those
>>  >>  >>  encodings, wouldn't it be nice to have a pair of encoders: one for
>>  >>  >>  what we want our Response encoding to be and one for the encoding we
>>  >>  >>  want to use internally (convert Request data *TO* and Response data
>>  >>  >>  *from*). So you could use a UTF-8 converter for "outside" and a Squeak
>>  >>  >>  encoding converter for "inside"; all incoming data would be converted
>>  >>  >>  to Squeak encoding and anything going out would be converted from
>>  >>  >>  Squeak encoding to UTF-8. If you had UTF-8 encoders for both then you
>>  >>  >>  wouldn't have to do any encoding going out but incoming might still
>>  >>  >>  have to be converted to UTF-8 if it was, for example, UTF-16.
>>  >>  >
>>  >>  > No, internally we ideally want only Squeak/Smalltalk encoding.
>>  >>  > Otherwise we can throw String away and just use ByteArray. The problem
>>  >>  > is that WideStrings are bugged and slow and for legacy reasons we have
>>  >>  > to support "null encoding". Everything else is insanity. Same goes for
>>  >>  > using utf-8 internally and utf-16 externally. Second for some external
>>  >>  > parts (like URLs) the external encoding is given.
>>  >>
>>  >>  It doesn't appear quite that simple to me... if you have data in UTF
>>  >>  format in a database, you might well prefer to use UTF encoding
>>  >>  internally
>>  >
>>  > There is no such thing as UTF encoding. Using an encoding other than
>>  > Squeak fixes #= but breaks _every_ method except #,. The only reason
>>  > you might want this is to avoid the performance penalties of
>>  > WideString. But then again have you profiled your application and can
>>  > you prove to me that WideStrings are your performance bottleneck? Else
>>  > this is pure premature optimization.
>>  >
>>  >> (or at very least be able to specify the encoding of that
>>  >>  data when giving it to the canvas).
>>  >
>>  > No, you must adhere to the Seaside contract. You give Strings to
>>  > Seaside in the same encoding you expect Seaside to give Strings to
>>  > you. Everything else is a pure horror.
>>  >
>>  >> Squeak encoding doesn't
>>  >>  support anything outside basic accented characters, does it?
>>  >
>>  > Squeak supports a superset of Unicode including astral planes.
>>  >
>>  >> Same goes
>>  >>  for incoming form data if you have to put it in a database... you
>>  >>  don't want to be putting it in in Squeak encoding.
>>  >
>>  > That's between you and your database driver. That doesn't include
>>  > Seaside at all.
>>  >
>>  > I still think this belongs to seaside-dev.
>>  >
>>  > Cheers
>>  > Philippe
>>
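The '/' versus '%2f' problem raised at the start of the thread can be illustrated outside Seaside. The following is only a sketch in Python, not Seaside code: the function name path_segments_from is a hypothetical stand-in that echoes #pathSegmentsFrom:, and the sample path is made up. It assumes the raw, still percent-encoded path is available, and shows why the order of splitting and decoding matters.

```python
from urllib.parse import unquote


def path_segments_from(raw_path):
    """Split the raw (still percent-encoded) path on '/', then decode each segment.

    Hypothetical helper for illustration only; not part of Seaside.
    """
    return [unquote(segment) for segment in raw_path.split("/") if segment]


raw = "/files/a%2Fb/index.html"  # made-up example path

# Decoding first turns '%2F' into '/', so the escaped slash is
# indistinguishable from a real path separator afterwards.
decoded_first = [s for s in unquote(raw).split("/") if s]

# Splitting first keeps the escaped slash inside its own segment.
split_first = path_segments_from(raw)

print(decoded_first)
print(split_first)
```

Once only the decoded string is stored, the second result cannot be recovered from it, which is the argument for keeping the parsed segments alongside (or instead of) the decoded URL string.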
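Two of the encoding points made in this exchange can be shown with a small sketch. It is written in Python rather than Squeak, so it only illustrates the general idea and says nothing about how Squeak or Seaside implement strings. The sample strings are made up. The first part shows that a byte value such as 60 only means $< once an encoding is assumed; the second shows that converting text through a 256-character encoding such as MacRoman and back is lossy for anything outside that set.

```python
# Part 1: the same bytes mean different things under different encodings.
raw = bytes([60, 60])

as_ascii = raw.decode("ascii")         # two '<' characters
as_utf16 = raw.decode("utf-16-be")     # one character, U+3C3C

# Part 2: a round trip through a 256-character encoding loses data.
original = "日本語"                     # pretend this came from a UTF-8 database
utf8_bytes = original.encode("utf-8")

text = utf8_bytes.decode("utf-8")
# MacRoman has no code points for these characters, so they are replaced by '?'.
mac_bytes = text.encode("mac_roman", errors="replace")
round_tripped = mac_bytes.decode("mac_roman")

# Text that fits into MacRoman survives the same round trip.
accented = "café"
accented_round_trip = accented.encode("mac_roman").decode("mac_roman")

print(as_ascii, hex(ord(as_utf16)))
print(round_tripped, accented_round_trip)
```

This matches the conclusion reached in the thread: as long as the internal encoding covers the full character space, converting incoming data to it and converting back on output does not lose anything.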

Accessing POSTed information directly

Carl Gundel
How can I read POSTed information directly in my Seaside app?  I don't  
need to do this for Seaside components of course, but for elements on  
the page that are generated by Javascript (that I didn't write).

-Carl Gundel
http://www.libertybasic.com
http://www.runbasic.com

RE: Accessing POSTed information directly

Ramon Leon-5
> How can I read POSTed information directly in my Seaside app? I don't
> need to do this for Seaside components of course, but for elements on
> the page that are generated by Javascript (that I didn't write).
>
> -Carl Gundel
> http://www.libertybasic.com
> http://www.runbasic.com

Use self fieldsAt: #key, which delegates to the session, which in turn
delegates to the currentRequest. If you want to poke around, evaluate
self session currentRequest inspect; you'll find everything you need in there.

Ramon Leon
http://onsmalltalk.com
