The opposite of encodeForHTTP

Davide Varvello
Hi there,
 Is there an opposite of String>>encodeForHTTP somewhere?
TIA
 Davide

Re: The opposite of encodeForHTTP

Sven Van Caekenberghe
String>>#unescapePercents
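
For readers unfamiliar with percent-encoding, the round trip that an encode/decode pair like this is meant to form can be sketched outside Pharo. The snippet below is only an illustration in Python, using the standard `urllib.parse` functions `quote` and `unquote`; the sample string is made up, and nothing here claims how the Pharo methods treat any particular character.

```python
from urllib.parse import quote, unquote

original = "café & crème"

# Encode every reserved or non-ASCII character as %XX (UTF-8 bytes).
encoded = quote(original, safe="")

# Decoding reverses the process and should give back the original text.
decoded = unquote(encoded)

print(encoded)  # caf%C3%A9%20%26%20cr%C3%A8me
print(decoded == original)  # True
```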

On 18 Jul 2012, at 12:42, Davide Varvello wrote:

> Hi there,
> Is there an opposite of String>>encodeForHTTP somewhere?
> TIA
> Davide

Re: The opposite of encodeForHTTP

Davide Varvello
Thanks Sven,
 I was looking for String>>decode..whatever... with no luck :-)
Cheers

Re: The opposite of encodeForHTTP

EstebanLM
I wonder why you were lost, with such a clear method name... ;)

On Jul 18, 2012, at 2:02 PM, Davide Varvello wrote:

> Thanks Sven,
> I was looking for String>>decode..whatever... with no luck :-)
> Cheers

Re: The opposite of encodeForHTTP

Davide Varvello


On Wednesday, July 18, 2012 2:08 PM, EstebanLM wrote:

> I wonder why you were lost, with such a clear method name... ;)

:-)




Re: The opposite of encodeForHTTP

Stéphane Ducasse
In reply to this post by Davide Varvello
Let us fix it and propose a decodeFromHTTP method

Stef

On Jul 18, 2012, at 2:02 PM, Davide Varvello wrote:

> Thanks Sven,
> I was looking for String>>decode..whatever... with no luck :-)
> Cheers

Re: The opposite of encodeForHTTP

Davide Varvello
Good, Stef. I opened a new feature request as a reminder here: http://code.google.com/p/pharo/issues/detail?id=6430
 
Davide


Re: The opposite of encodeForHTTP

NorbertHartl
In reply to this post by Stéphane Ducasse
IMHO that would worsen the problem :)

encodeForHTTP is not a good name. The encoding is defined for URLs and has nothing to do with HTTP. It is usually called "url safe encoded" or just "url encoded". By analogy with the base64 methods, I would propose

urlEncoded
urlDecoded

or

urlSafeEncoded
urlSafeDecoded

my 2 cents,

Norbert

On 19.07.2012, at 21:47, Stéphane Ducasse wrote:

> Let us fix it and propose a decodeFromHTTP method
>
> Stef

Re: The opposite of encodeForHTTP

Brenda Larcom
In reply to this post by Davide Varvello
I suppose I could unlurk at this point.  :)

I'm a security geek (specifically, a secure development geek focusing on security architecture) in my day job, and I have a long-unmaintained architecture security analysis tool written in Squeak (http://www.octotrike.org/ for the curious), which I have been unmothballing.  We are considering switching to Pharo, partly because we are planning to add some P2P collaboration features that we think will have an HTTP layer in there somewhere & partly because we like it small, tidy, and self-compatible.  Hence my lurking.

I've done some work on how data validation should be done for security purposes, for my day job.  This includes output encoding and decoding, like what Davide is talking about.  It's pretty tricky to get right because of the large number of contexts, with subtly different rules.  For example, I would expect encodeForHTTP to be appropriate for HTTP headers, except that two things you usually want to put in HTTP headers are URIs and cookies, each of which has different rules (for different subparts, even) for what should be encoded.  The differences don't seem like much, but in the wild, my coworkers & I see these sorts of differences lead to vulnerabilities on a daily basis.
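
A small illustration of the "subtly different rules" point, using Python's standard `urllib.parse` as a stand-in (this is not Pharo code, and the sample value is made up): the same string comes out three different ways depending on which part of a URI it is headed for.

```python
from urllib.parse import quote, quote_plus

value = "a/b c&d"

# Default quote() leaves "/" alone: reasonable for a whole path,
# wrong for a single path segment that happens to contain a slash.
as_path = quote(value)

# safe="" escapes "/" too: what a single path segment needs.
as_segment = quote(value, safe="")

# quote_plus() follows HTML form rules: spaces become "+".
as_form_value = quote_plus(value)

print(as_path)        # a/b%20c%26d
print(as_segment)     # a%2Fb%20c%26d
print(as_form_value)  # a%2Fb+c%26d
```

Picking the wrong one of these for a given context is exactly the kind of small difference that can change what the receiving side parses.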

From a security architecture perspective, the absolute best way to handle encoding & decoding for a structured object like an HTTP request or response (or a URI, or a cookie, or an HTML document, or..) is to use a validating parser.  Basically, when you get an HTTP request, parse it & put it in an object structured like the request.  At that time, you know the meaning of each portion of the string you are parsing, so you can interpret the bits correctly/safely.  The object(s) should store the individual strings that are actually content (vs. structure & constants) in a decoded state.  The developer should get everything from the objects, in decoded form, and put everything into the objects in decoded form.  Then, when it is time to send the response, the objects encode everything safely/canonically based on the exact type of objects they are.  This design concentrates the hard stuff (encoding, decoding, canonicalization, layering encodings on top of each other) near the interfaces, at the first/last possible moment enough context is known to interpret the information accurately.  It separates the mechanics of using a protocol or format from the intent of using the protocol.  It lets someone like me easily QA both the library and application code for security.  It is also simple for the developer to use safely (all the dev needs to think about is what objects/content they want to assemble, and the data validation at that layer is taken care of automatically) & is therefore the only design pattern I have seen consistently avoid all encoding-related vulnerabilities in the wild.  

So what does this mean?  Basically, from a security perspective, encoding & decoding methods should live in the objects they encode and decode, and never be called from outside code.  That is, there should be an HTTPHeader>>fromString: or fromStream: method, which is called from an HTTPResponse>>fromString: or fromStream: method, and no String>>decodeFromHTTP.  Adding a String>>decodeFromHTTP method is easy from the library maintainer's point of view, approximately correct (way more correct than no method at all), and it matches what most languages are doing these days, but it shifts the burden of all that thought about the specific HTTP header & context to the application developer, who is usually just trying to write an application, not learn every single detail of the HTTP & gazillion other standards he would need to do this safely.
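
The pattern described above can be sketched roughly as follows. This is a minimal illustration in Python, not Pharo, and `QueryString`, `from_string` and `to_string` are hypothetical names invented for the example; only `quote`, `unquote` and `re` come from the standard library. The object parses and validates at the border, holds decoded values, and encodes again only on the way out.

```python
import re
from urllib.parse import quote, unquote


class QueryString:
    """Hypothetical example class: holds query parameters in decoded form."""

    # A percent sign that is not followed by two hex digits is malformed.
    _BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")

    def __init__(self, pairs=None):
        self.pairs = list(pairs or [])  # (name, value) tuples, always decoded

    @classmethod
    def from_string(cls, raw):
        """Validating parse: reject malformed input instead of guessing."""
        pairs = []
        for field in (raw.split("&") if raw else []):
            if cls._BAD_ESCAPE.search(field):
                raise ValueError(f"malformed percent escape in {field!r}")
            name, _, value = field.partition("=")
            if not name:
                raise ValueError(f"missing parameter name in {field!r}")
            # errors="strict" rejects byte sequences that are not valid UTF-8.
            pairs.append((unquote(name, errors="strict"),
                          unquote(value, errors="strict")))
        return cls(pairs)

    def to_string(self):
        """Encode at the last moment, when the target context is known."""
        return "&".join(f"{quote(n, safe='')}={quote(v, safe='')}"
                        for n, v in self.pairs)


q = QueryString.from_string("q=a%26b&lang=en")
print(q.pairs)        # [('q', 'a&b'), ('lang', 'en')]
q.pairs.append(("note", "50% off"))
print(q.to_string())  # q=a%26b&lang=en&note=50%25%20off
```

Application code in this sketch only ever sees `a&b` and `50% off`; it never calls an encode or decode function itself, which is the property being argued for.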

Since this is a suggestion for a substantial architecture change that would cause significant backwards compatibility issues throughout the entire Web application stack, and I'm new to Pharo to boot, I am expecting some interesting discussion to occur next.  Or maybe profound silence.  :)

In my back pocket somewhere amongst the code I am unmothballing, I have 95% of a thoroughly documented URI implementation and test suite that follows this pattern and is pedantically compliant with one or another of the URI RFCs (it's old, may not be the most recent).  I believe Spoon & Slate are using a previous version of it or its derivatives.  I'll need a fully pedantic HTTP parsing stack to feel comfortable releasing a P2P architecture security analysis tool (high-value target, large attack surface, potentially very large professional embarrassment), so whatever isn't available, I expect we'll end up writing.  If Pharo folks are interested in this pattern, I would love to contribute my libraries/changes as I finish them, get advice on backward compatibility, performance, and APIs people would like to see, review whatever related code you'd like for security issues, and/or collaborate with any other developer who is interested.

Brenda



Re: The opposite of encodeForHTTP

NorbertHartl
Brenda,

These are all good points from, as you said, a "security architecture perspective", and we should improve on that. The Zinc HTTP Components already do a good job of structuring the entities as they should be. I think security add-ons can hook onto what is already there. There is a huge number of things to consider. Even within a single URL, the different components have different encoding needs.
On the other hand, security is not a major goal in a lot of the use cases I can imagine. There is (for me, at least) a triangle of security, performance, and usability that makes it hard to have a single approach that fits them all. And we Smalltalkers tend to value freedom very highly when it comes to programming. In other words, I would say we like to preserve the freedom to design an insecure application at will :) The best way to solve those issues is to be modular, meaning a layer that can be put on top of the existing stuff to fulfill a particular use case.
The things you describe are present in a lot of environments. I usually call this an "at the border of a system" problem. Things like strings inside an environment are harmless. Problems appear when you cross system borders, meaning you cross interpretation schemes. And this is a broader topic than just HTTP.
If we look at a widely known problem like SQL injection, there is a need not only for proper entity handling but also for stacking validators and converters for different problems. It is such a big thing because you have a URL that goes through middleware and ends up in a storage system like an SQL database. Here you cross at least two borders: HTTP to middleware and middleware to database. So you need to stack up converters and validators for HTTP, probably shell escapes in the middleware, and finally for SQL. I think that if you can assemble those things according to the layers you use, a security approach is doable. And for the same reason it goes so terribly wrong everywhere.
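
The two border crossings mentioned here (HTTP to middleware, middleware to database) can be illustrated with a small sketch. It is written in Python with the standard `sqlite3` and `urllib.parse` modules rather than in Pharo, and the `users` table and `find_user` function are made up for the example. Each border gets its own converter: percent-decoding on the way in, a parameterized query on the way to SQL.

```python
import sqlite3
from urllib.parse import unquote


def find_user(conn, raw_query_value):
    """Hypothetical handler: look up a user named by a URL query value."""
    # Border 1 (HTTP -> middleware): undo the percent-encoding.
    name = unquote(raw_query_value)
    # Border 2 (middleware -> database): let the driver bind the value,
    # so the decoded text is never spliced into the SQL string.
    cur = conn.execute("SELECT id FROM users WHERE name = ?", (name,))
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

print(find_user(conn, "alice"))                             # [(1,)]
# Decodes to: alice' OR '1'='1  (treated as a plain name, matches nothing)
print(find_user(conn, "alice%27%20OR%20%271%27%3D%271"))    # []
```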
So what does this modular thing mean? Having a lot of possibilities to fulfill certain needs without restricting everyone to a single scheme.
My advice would be to have a look at the Zinc components and propose improvements from your perspective. Then publish your results here, and there will be a lot of clever people finding a good way to integrate them in a modular way.

I hope this helps,

Norbert


Re: The opposite of encodeForHTTP

Brenda Larcom
Thanks, Norbert; I'll take a look at Zinc, see how my existing code might integrate, and propose something specific.  I personally think having an insecure option for things like URIs and HTTP that are inherently on borders almost all the time is unwise, but I'm happy to resolve my personal issues via documentation.  :)

One reason validating parsers are so powerful is that when layers stack as you mentioned, the security starts working as soon as the functional part does.  I agree, such a parser definitely belongs at the borders of interpretation schemes, not inside them.  Inside them it'll just use up time without providing value.  Conveniently, the tool people naturally reach for at interpretation borders usually has a parser in it someplace.

And yes, there do seem to be a particular lot of fiddly bits in URIs.  So fiddly a few of the examples in the RFCs (as usual) don't match the rest of the spec.

Brenda


Re: The opposite of encodeForHTTP

NorbertHartl

On 20.07.2012, at 20:53, Brenda Larcom wrote:

> Thanks, Norbert; I'll take a look at Zinc, see how my existing code might integrate, and propose something specific.  I personally think having an insecure option for things like URIs and HTTP that are inherently on borders almost all the time is unwise, but I'm happy to resolve my personal issues via documentation.  :)

You said "almost" yourself :) I just wanted to say that different people have different ideas. Restricting software to what we can imagine prevents other people from realizing amazing things we couldn't imagine.

> One reason validating parsers are so powerful is that when layers stack as you mentioned, the security starts working as soon as the functional part does.  I agree, such a parser definitely belongs at the borders of interpretation schemes, not inside them.  Inside them it'll just use up time without providing value.  Conveniently, the tool people naturally reach for at interpretation borders usually has a parser in it someplace.
>
> And yes, there do seem to be a particular lot of fiddly bits in URIs.  So fiddly a few of the examples in the RFCs (as usual) don't match the rest of the spec.

Agreed. I'm eager to see what you'll come up with.

Norbert

On Jul 20, 2012, at 10:50 AM, Norbert Hartl <[hidden email]> wrote:

Brenda,

these are all good points, as you said from a "security architecture perspective", and we should improve on that. The Zinc HTTP components already do a good job of structuring the entities as they should be. I think security add-ons can hook onto what is already there. There is a huge number of things to consider. Even within a single URL, the different components have different encoding needs.
On the other hand, security is not a major goal in a lot of use cases I can imagine. There is (at least for me) a triangle of security, performance, and usability that makes it hard for a single approach to fit them all. And we Smalltalkers tend to value freedom very highly when it comes to programming. In other words, I would say we like to preserve the freedom to design an insecure application at will :) The best way to solve those issues is to be modular, meaning a layer that can be put on top of the existing stuff to fulfill a particular use case.
The things you describe are present in a lot of environments. I mostly call this an "at the border of a system" problem. Things like strings inside an environment are harmless. Problems appear when you cross system borders, meaning you cross interpretation schemes. And this is a broader topic than just HTTP.
If we look at a widely known problem like SQL injection, there is a need not only for proper entity handling but also for stacking validators and converters for different problems. It is such a big thing because you have a URL that goes through middleware and ends up in a storage system like an SQL database. Here you cross at least two borders: HTTP to middleware and middleware to database. So you need to stack up converters and validators for HTTP, probably shell escapes in the middleware, and finally for SQL. I think a security approach is doable if you can assemble those things according to the layers you use. And it is for the same reason that it goes so terribly wrong everywhere.
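
The two borders described above can be sketched roughly as follows. This is only an illustration in Python (not Pharo code), and the table, column, URL, and function name are all made up: the query string is percent-decoded exactly once at the HTTP border, and the decoded value crosses the database border as a bound parameter instead of being spliced into SQL text.

```python
import sqlite3
from urllib.parse import parse_qs, urlsplit


def find_user(conn, raw_url):
    """Cross two borders: URL -> plain string, plain string -> SQL parameter."""
    # Border 1 (HTTP -> middleware): parse_qs percent-decodes the query once.
    query = parse_qs(urlsplit(raw_url).query)
    name = query.get("name", [""])[0]
    # Border 2 (middleware -> database): bind the value, never concatenate it.
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()


# Hypothetical in-memory database with a single user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

# A classic injection attempt, percent-encoded inside the URL.
attack = "http://example.org/find?name=alice%27%20OR%20%271%27%3D%271"
```

Because each border has its own converter, the injection attempt is just an unusual name that matches no row, while the plain request for alice still finds her.
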
So what does this modular thing mean? It means having many ways to fulfill particular needs without restricting everyone to a single scheme.
My advice would be to have a look at the Zinc components and propose improvements from your perspective. Then publish your results here, and there will be a lot of clever people finding a good way to integrate them in a modular way.

I hope this helps,

Norbert

On 20.07.2012 at 18:25, Brenda Larcom wrote:

I suppose I could unlurk at this point.  :)

I'm a security geek (specifically, a secure development geek focusing on security architecture) in my day job, and I have a long unmaintained architecture security analysis tool written in Squeak (http://www.octotrike.org/ for the curious), which I have been unmothballing.  We are considering switching to Pharo, partly because we are planning to add some P2P collaboration features we think have an HTTP layer in there somewhere & partly because we like it small, tidy, and self-compatible.  Hence my lurking.

I've done some work on how data validation should be done for security purposes, for my day job.  This includes output encoding and decoding, like what Davide is talking about.  It's pretty tricky to get right because of the large number of contexts with subtly different rules.  For example, I would expect encodeForHTTP to be appropriate for HTTP headers, except that two things you usually want to put in HTTP headers are URIs and cookies, each of which has different rules (for different subparts, even) for what should be encoded.  The differences don't seem like much, but in the wild, my coworkers & I see these sorts of differences lead to vulnerabilities on a daily basis.
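
A small illustration of how the rules differ between subparts of a URI, using Python's standard urllib.parse rather than anything from Pharo (the sample value is made up). The same string needs three different encodings depending on where it is going:

```python
from urllib.parse import quote, quote_plus

value = "a/b c&d=e"

# Inside a single path segment "/" is data, so it must be escaped.
as_path_segment = quote(value, safe="")
# Across a whole path "/" is structure, so it stays literal (quote's default).
as_whole_path = quote(value)
# In a form-style query value a space is conventionally written as "+".
as_query_value = quote_plus(value)

for label, encoded in [
    ("path segment", as_path_segment),
    ("whole path", as_whole_path),
    ("query value", as_query_value),
]:
    print(label, "->", encoded)
```

A single context-free method on String has to pick one of these behaviours, which is right in one position and wrong in the others.
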

From a security architecture perspective, the absolute best way to handle encoding & decoding for a structured object like an HTTP request or response (or a URI, or a cookie, or an HTML document, or..) is to use a validating parser.  Basically, when you get an HTTP request, parse it & put it in an object structured like the request.  At that time, you know the meaning of each portion of the string you are parsing, so you can interpret the bits correctly/safely.  The object(s) should store the individual strings that are actually content (vs. structure & constants) in a decoded state.  The developer should get everything from the objects, in decoded form, and put everything into the objects in decoded form.  Then, when it is time to send the response, the objects encode everything safely/canonically based on the exact type of objects they are.  This design concentrates the hard stuff (encoding, decoding, canonicalization, layering encodings on top of each other) near the interfaces, at the first/last possible moment enough context is known to interpret the information accurately.  It separates the mechanics of using a protocol or format from the intent of using the protocol.  It lets someone like me easily QA both the library and application code for security.  It is also simple for the developer to use safely (all the dev needs to think about is what objects/content they want to assemble, and the data validation at that layer is taken care of automatically) & is therefore the only design pattern I have seen consistently avoid all encoding-related vulnerabilities in the wild.  
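
A minimal sketch of that pattern for a URL, assuming Python's urllib.parse as the underlying tokenizer. ParsedUrl and its fields are hypothetical names for illustration, and this is far from a pedantically RFC-compliant implementation: parsing validates and decodes once at the border, the object holds only decoded content, and serialization re-encodes each part according to its position.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
from urllib.parse import parse_qsl, quote, unquote, urlencode, urlsplit, urlunsplit


@dataclass
class ParsedUrl:
    """Hypothetical URL object; every field is stored in decoded form."""
    scheme: str
    host: str
    port: Optional[int]
    segments: List[str]            # decoded path segments
    params: List[Tuple[str, str]]  # decoded (key, value) query pairs

    @classmethod
    def from_string(cls, text: str) -> "ParsedUrl":
        parts = urlsplit(text)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            raise ValueError("not an absolute http(s) URL: %r" % text)
        if parts.username is not None or parts.fragment or ":" in parts.hostname:
            raise ValueError("userinfo, fragments and IPv6 hosts are out of scope here")
        # Split on the structural "/" first, then decode each segment once.
        segments = [unquote(s, errors="strict") for s in parts.path.split("/")[1:]]
        params = parse_qsl(parts.query, keep_blank_values=True)
        # parts.port raises ValueError itself when the port is not a valid number.
        return cls(parts.scheme, parts.hostname, parts.port, segments, params)

    def to_string(self) -> str:
        netloc = self.host if self.port is None else "%s:%d" % (self.host, self.port)
        # Encode per position: "/" inside a segment is data and gets escaped.
        path = "/" + "/".join(quote(s, safe="") for s in self.segments)
        return urlunsplit((self.scheme, netloc, path, urlencode(self.params), ""))


url = ParsedUrl.from_string("http://Example.org/a%2Fb/c%20d?q=x%26y")
```

Application code only ever touches url.segments and url.params, which are plain decoded strings, and to_string() is the single place where encoding decisions are made.
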

So what does this mean?  Basically, from a security perspective, encoding & decoding methods should live in the objects they encode and decode, and never be called from outside code.  That is, there should be an HTTPHeader>>fromString: or fromStream: method, which is called from an HTTPResponse>>fromString: or fromStream: method, and no String>>decodeFromHTTP.  Adding a String>>decodeFromHTTP method is easy from the library maintainer's point of view, approximately correct (way more correct than no method at all), and it matches what most languages are doing these days, but it shifts the burden of all that thought about the specific HTTP header & context to the application developer, who is usually just trying to write an application, not learn every single detail of HTTP & the gazillion other standards he would need to do this safely.
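
To make the "methods live in the objects" idea concrete, here is a rough Python analogue of an HTTPHeader>>fromString: style design. The Header class and its method names are hypothetical, and the checks are deliberately simplified: the object owns both parsing and serialization, so a value containing a line break can never be smuggled into a response.

```python
import re

# "token" characters allowed in a header field name (RFC 7230 / RFC 9110 tchar).
_TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


class Header:
    """Hypothetical header object that owns its own parsing and serialization."""

    def __init__(self, name, value):
        if not _TOKEN.fullmatch(name):
            raise ValueError("invalid header name: %r" % name)
        if "\r" in value or "\n" in value or "\0" in value:
            raise ValueError("control character in header value")
        self.name = name
        self.value = value.strip(" \t")

    @classmethod
    def from_line(cls, line):
        # Expects one header line without its trailing CRLF.
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("missing ':' in header line: %r" % line)
        return cls(name, value)

    def to_line(self):
        # Safe to emit: name and value were validated on the way in.
        return "%s: %s" % (self.name, self.value)


header = Header.from_line("Content-Type: text/html")
```

An application that builds headers only through such objects cannot forget the check, which is the point of keeping the logic out of String.
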

Since this is a suggestion for substantial architecture change that would cause significant backwards compatibility issues throughout the entire Web application stack, and I'm new to Pharo to boot, I am expecting some interesting discussion to occur next.  Or maybe profound silence.  :)

In my back pocket somewhere amongst the code I am unmothballing, I have 95% of a thoroughly documented URI implementation and test suite that follows this pattern and is pedantically compliant with one or another of the URI RFCs (it's old, may not be the most recent).  I believe Spoon & Slate are using a previous version of it or its derivatives.  I'll need a fully pedantic HTTP parsing stack to feel comfortable releasing a P2P architecture security analysis tool (high value target, large attack surface, potentially very large professional embarrassment), so whatever isn't available, I expect we'll end up writing.  If Pharo folks are interested in this pattern, I would love to contribute my libraries/changes as I finish them, get advice on backward compatibility, performance, and APIs people would like to see, review whatever related code you'd like for security issues, and/or collaborate with any other developer who is interested.

Brenda


On Jul 20, 2012, at 1:47 AM, Davide Varvello <[hidden email]> wrote:

Good, Stef. I opened a new feature request as a reminder here: http://code.google.com/p/pharo/issues/detail?id=6430
 
Davide




From: Stéphane Ducasse [via Smalltalk] <[hidden email]>
To: Davide Varvello <[hidden email]>
Sent: Thursday, July 19, 2012 10:43 PM
Subject: Re: The opposite of encodeForHTTP

Let us fix it and propose a decodeFromHTTP method

Stef

On Jul 18, 2012, at 2:02 PM, Davide Varvello wrote:

> Thanks Sven,
> I was looking for String>>decode..whatever... with no luck :-)
> Cheers
>
>








Re: The opposite of encodeForHTTP

Davide Varvello
In reply to this post by NorbertHartl
Right, I second urlEncoded and urlDecoded
Davide

Norbert Hartl wrote
IMHO that would worsen the problem :)

encodeForHTTP is not a good name. The encoding is defined for URLs and has nothing to do with HTTP. It is mostly called "url safe encoded" or just "url encoded". Following the naming used for base64, I would propose

urlEncoded
urlDecoded

or

urlSafeEncoded
urlSafeDecoded
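
The symmetry being proposed (an encoded/decoded pair, as with base64) can be shown with a short round trip. This uses Python's standard library purely as an illustration; the Pharo selectors above are proposals in this thread, not something this snippet calls.

```python
import base64
from urllib.parse import quote, unquote

text = "hello world/50%"

# URL (percent) encoding and its inverse.
url_encoded = quote(text, safe="")
url_decoded = unquote(url_encoded)

# base64 has the same encode/decode pairing, which motivates the naming.
b64_encoded = base64.b64encode(text.encode("utf-8"))
b64_decoded = base64.b64decode(b64_encoded).decode("utf-8")

print(url_encoded)
print(url_decoded == text, b64_decoded == text)
```
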

my 2 cents,

Norbert

On 19.07.2012 at 21:47, Stéphane Ducasse wrote:

> Let us fix it and propose a decodeFromHTTP method
>
> Stef
>
> On Jul 18, 2012, at 2:02 PM, Davide Varvello wrote:
>
>> Thanks Sven,
>> I was looking for String>>decode..whatever... with no luck :-)
>> Cheers
>>
>>
>
>

Re: The opposite of encodeForHTTP

Stéphane Ducasse
In reply to this post by Brenda Larcom

On Jul 20, 2012, at 6:25 PM, Brenda Larcom wrote:

> I suppose I could unlurk at this point.  :)
>
> I'm a security geek (specifically, a secure development geek focusing on security architecture) in my day job, and I have a long unmaintained architecture security analysis tool written in Squeak (http://www.octotrike.org/ for the curious), which I have been unmothballing.  We are considering switching to Pharo, partly because we are planning to add some P2P collaboration features we think have an HTTP layer in there somewhere & partly because we like it small, tidy, and self-compatible.  Hence my lurking.

Welcome and I would love to have more people working on these areas :).

> I've done some work on how data validation should be done for security purposes, for my day job.  This includes output encoding and decoding, like what Davide is talking about.  It's pretty tricky to get right because of the large number of contexts, with subtly different rules.  E.g. I would expect encodeForHTTP to be appropriate for HTTP headers, except that e.g. two things you usually want to put in HTTP headers are URIs and cookies, each of which have different rules (for different subparts, even) for what should be encoded.  The differences don't seem like much, but in the wild, my coworkers & I see these sorts of differences lead to vulnerabilities on a daily basis.
>
> From a security architecture perspective, the absolute best way to handle encoding & decoding for a structured object like an HTTP request or response (or a URI, or a cookie, or an HTML document, or..) is to use a validating parser.  Basically, when you get an HTTP request, parse it & put it in an object structured like the request.  At that time, you know the meaning of each portion of the string you are parsing, so you can interpret the bits correctly/safely.  The object(s) should store the individual strings that are actually content (vs. structure & constants) in a decoded state.  The developer should get everything from the objects, in decoded form, and put everything into the objects in decoded form.  Then, when it is time to send the response, the objects encode everything safely/canonically based on the exact type of objects they are.  This design concentrates the hard stuff (encoding, decoding, canonicalization, layering encodings on top of each other) near the interfaces, at the first/last possible moment enough context is known to interpret the information accurately.  It separates the mechanics of using a protocol or format from the intent of using the protocol.  It lets someone like me easily QA both the library and application code for security.  It is also simple for the developer to use safely (all the dev needs to think about is what objects/content they want to assemble, and the data validation at that layer is taken care of automatically) & is therefore the only design pattern I have seen consistently avoid all encoding-related vulnerabilities in the wild.  
>
> So what does this mean?  Basically, from a security perspective, encoding & decoding methods should live in the objects they encode and decode, and never be called from outside code.  That is, there should be an HTTPHeader>>fromString: or fromStream: method, which is called from an HTTPResponse >>fromString: or fromStream: method, and no String>>decodeFromHTTP.   Adding a String>>decodeFromHTTP method is easy from the library maintainer's point of view, approximately correct (way more correct than no method at all), and it matches what most languages are doing these days, but it shifts the burden of all that thought about the specific HTTP header & context to the application developer, who is usually just trying to write an application, not learn every single detail of the HTTP & gazillion other standards he would need to do this safely.
>
> Since this is a suggestion for substantial architecture change that would cause significant backwards compatibility issues throughout the entire Web application stack, and I'm new to Pharo to boot, I am expecting some interesting discussion to occur next.  Or maybe profound silence.  :)

Thanks for the explanation. It makes sense. String is a dead object that just counts and assembles characters.
So here is what I would love to see, if you are interested:
        - How can we improve the infrastructure of Pharo?
          Step by step or via a big refactoring? :)

        - I would add a simple decodeFromHTTP as a convenience method and in the future point to the validators.


> In my back pocket somewhere amongst the code I am unmothballing, I have 95% of a thoroughly documented URI implementation and test suite that follows this pattern and is pedantically compliant with one or another of the URI RFCs (it's old, may not be the most recent).

Bring it to life. We were discussing internally that we would like to have a decent URI implementation and that we would like to massively clean up
the URL/URI …. with ZnURL, whatever. So it would be great to have a good part to start from.
Now what I see from your mail :) is that you are a kind of perfectionist (I know some of them), so you should pay attention and
force yourself to be happy with 80% and release it:
        - 1 your 80% may be the 95% of somebody else
        - 2 releasing often and making progress is the best way to finish. :)

>  I believe Spoon & Slate are using a previous version of it or its derivatives.  I'll need a fully pedantic HTTP parsing stack to feel comfortable releasing a P2P architecture security analysis tool (high value target, large attack surface, potentially very large professional embarrassment), so whatever isn't available, I expect we'll end up writing.  If Pharo folks are interested in this pattern,

Yes, I am. I will let the others reply to you because I'm far down in the south of France, but I'm quite sure that we are all interested.

> I would love to contribute my libraries/changes as I finish them, get advice on backward compatibility, performance, and APIs people would like to see, review whatever related code you'd like for security issues, and/or collaborate with any other developer who is interested.

I would love to learn from your expertise.

Stef

>
> Brenda
>
>
> On Jul 20, 2012, at 1:47 AM, Davide Varvello <[hidden email]> wrote:
>
>> Good Stef, I opened a new feature as reminder here: http://code.google.com/p/pharo/issues/detail?id=6430
>>  
>> Davide
>>
>>
>> From: Stéphane Ducasse [via Smalltalk] <[hidden email]>
>> To: Davide Varvello <[hidden email]>
>> Sent: Thursday, July 19, 2012 10:43 PM
>> Subject: Re: The opposite of encodeForHTTP
>>
>> Let us fix it and propose a decodeFromHTTP method
>>
>> Stef
>>
>> On Jul 18, 2012, at 2:02 PM, Davide Varvello wrote:
>>
>> > Thanks Sven,
>> > I was looking for String>>decode..whatever... with no luck :-)
>> > Cheers
>> >
>> >
>>
>>
>>
>>



Re: The opposite of encodeForHTTP

Stéphane Ducasse
In reply to this post by Davide Varvello

On Jul 21, 2012, at 2:13 PM, Davide Varvello wrote:

> Right, I seconded urlEncoded and urlDecoded

Go for it.
We will deprecate encodeForHTTP or keep it for backward compatibility.
My point is: let us steadily improve the situation.
And if tomorrow something nicer exists, then we just replace and throw away what we did :).

Stef

> Davide
>
>
> Norbert Hartl wrote
>>
>> IMHO that would worsen the problem :)
>>
>> encodeForHTTP is not a good name. The encoding is defined for URLs and has
>> nothing to do with HTTP. It is mostly called "url safe encoded" or just
>> "url encoded". Doing it similar as base64 I would propose
>>
>> urlEncoded
>> urlDecoded
>>
>> or
>>
>> urlSafeEncoded
>> urlSafeDecoded
>>
>> my 2 cents,
>>
>> Norbert
>>
>> On 19.07.2012 at 21:47, Stéphane Ducasse wrote:
>>
>>> Let us fix it and propose a decodeFromHTTP method
>>>
>>> Stef
>>>
>>> On Jul 18, 2012, at 2:02 PM, Davide Varvello wrote:
>>>
>>>> Thanks Sven,
>>>> I was looking for String>>decode..whatever... with no luck :-)
>>>> Cheers
>>>>
>>>>
>>>
>>>
>>
>
>
>
>
>