I lack the time to parse this in relation to VW, so I throw out some questions to more knowledgeable people here.

"Efficient Denial of Service Attacks on Web Application Platforms"
http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf

The short version: if you hand a POST request to a web server it will typically keep the names of the POST variables in a hashed collection. If you know the hashing algorithm you can craft these names such that all of them have the same hash. Put several million onto a single POST request and the web server will churn for minutes on a single request. This DoS attack can also target other hashed collections, such as element names in a parsed XML document.

Question 1: is the VW hashing algorithm for strings structured such that it is cheap to generate millions of strings with the same hash value?

Question 2: can we generically counter such an attack? The paper proposes random hash seeding, which seems inappropriate for image-based development. Should we consider using a new hash function that is (cryptographically) one-way?

R
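To make the attack concrete, here is a toy illustration in Smalltalk. It uses a deliberately weak multiplicative hash with factor 33 (the kind of function the paper attacks), not VW's actual String>>hash, and shows how 2^n colliding keys can be enumerated from two colliding two-character blocks:

    "Toy multiplicative hash with factor 33 -- NOT VisualWorks' real String>>hash."
    | toyHash blocks keys |
    toyHash := [:s | s inject: 0 into: [:h :c | h * 33 + c asInteger]].
    "'Ez' and 'FY' hash identically under factor 33:
     $E asInteger * 33 + $z asInteger = 2399 = $F asInteger * 33 + $Y asInteger."
    blocks := #('Ez' 'FY').
    "Every concatenation of n such blocks collides, giving 2^n equal-hash keys."
    keys := OrderedCollection with: ''.
    5 timesRepeat:
        [keys := keys inject: OrderedCollection new into:
            [:acc :prefix | blocks do: [:b | acc add: prefix , b]. acc]].
    (keys collect: toyHash) asSet size    "1 -- all 32 keys share the same hash"

An attacker does the same thing against the server's real hash function and then puts the resulting names into one POST body.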
On Wed, Jan 4, 2012 at 7:59 AM, Reinout Heeck <[hidden email]> wrote:
Can you not determine an upper limit for the number of variables in a valid POST and reject any that contain more variables than the limit?
best, Eliot
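Eliot's suggestion can be sketched as a pre-parse guard. The names here (rawBody, maxFields, the error message) are made up and this is not VisualWave or Seaside API; the point is simply to count the '&'-separated pairs in the raw application/x-www-form-urlencoded body before anything is put into a hashed collection:

    | rawBody maxFields fieldCount |
    rawBody := 'a=1&b=2&c=3'.    "stand-in for the undecoded POST body"
    maxFields := 1000.
    fieldCount := 1 + (rawBody occurrencesOf: $&).
    fieldCount > maxFields
        ifTrue: [self error: 'POST rejected: too many form fields'].
    "only now hand rawBody to the normal form decoder"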
In reply to this post by Reinout Heeck-2
Well that's clever.
Without looking at it in too much detail, the VW algorithm is not that
complicated, so I imagine it would be possible to engineer many strings
with the same hash. And post values are kept in dictionaries both in
Seaside and in WebToolkit/VisualWave, so I suspect that attack would
work. The server will, at least in the WebToolkit case, attempt to kill
requests if the server is being bogged down, but if there are enough of
them thrashing it, that probably wouldn't help. The obvious generic counter would be to use a sorted dictionary that uses binary search rather than a hash table. But since the attack relies on sending millions of strings, and the strings are probably not short, filtering out all POST requests that have, say, more than a thousand parameters might be a generic counter. It would be hard to filter on request size, because you might have one POST parameter that is many megabytes as a file upload.
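A minimal sketch of the sorted-dictionary idea mentioned above: keep the name/value pairs as a sorted Array of Associations and look keys up by binary search, so the worst case is O(log n) comparisons regardless of hash behaviour. Sample data only; in practice the pairs would come from the parsed request:

    | pairs lookup |
    pairs := ((Array with: 'beta' -> 2 with: 'alpha' -> 1 with: 'gamma' -> 3)
        asSortedCollection: [:a :b | a key <= b key]) asArray.
    lookup := [:k | | lo hi mid found |
        lo := 1. hi := pairs size. found := nil.
        [found isNil and: [lo <= hi]] whileTrue:
            [mid := (lo + hi) // 2.
             (pairs at: mid) key = k
                ifTrue: [found := (pairs at: mid) value]
                ifFalse: [(pairs at: mid) key < k
                    ifTrue: [lo := mid + 1]
                    ifFalse: [hi := mid - 1]]].
        found].
    lookup value: 'beta'    "2; answers nil when the key is absent"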
In reply to this post by Reinout Heeck-2
Cross posted here, with a couple edits.
-------- Original Message --------
Subject: Re: [Pharo-project] string hash collisions?
Date: Mon, 2 Jan 2012 00:00:35 -0500
From: Andres Valloud <[hidden email]>
Reply-To: [hidden email], [hidden email]
To: [hidden email]

Part of why those attacks are so easy is that the factor for the multiplicative hash function [they attacked] is very poor (i.e.: 33, 31, etc). There is nothing spectacular about this reasoning; those "cool looking" factors chosen using the digital (finger) method ["finger method" is the original K&R verbiage] should have been stamped out a long time ago. The hashing book I wrote has a ton of additional details on how to choose a factor properly if you want to use multiplicative hash functions, see here:

http://www.lulu.com/spotlight/avsmalltalkbooks

Of course you can avoid all that trouble by using a crypto hash, at the cost of the (mostly unnecessary) crypto hashing.

The authors suggest using a randomized hash function, but how do persistent hash tables survive across invocations of the randomizer? Moreover, I suspect the following:

1. The multiplicative hash function they show as "improved" in Perl 5.8.1 seems to use effectively 1025 as a factor, which is generally terrible (slide 77).
2. How is the seed determined? What criteria are used to guarantee it is a good seed value?
3. It is likely you can still abuse the hash function with series of null bytes, or by constructing strings that abuse that poor hash function into producing tons of collisions mod 2^x, x < 32.

To sum it up...

1. If you use multiplicative hash functions, then you must choose the factor properly [to avoid extremely easy to find collisions in practice]. Go read the literature.
2. If you are expecting attackers, consider making it difficult to find collisions in your hash function. A way to make it REALLY hard is to use a crypto hash. But the applicability depends on the application. Generally speaking, making something like a crypto hash function the default hash function is a bad idea.
3. Just "randomizing" a hash function [as suggested by the slides] can be as useful as "randomizing" a random number generator. Obfuscation is not equivalent to a fix. Without a clear understanding of why the results have to be good, chances are the results will be poor.

On 12/30/11 7:58 , Philippe Marschall wrote:
> Hi
>
> As you probably noted string hash collisions are all the rage these days [1].
> Has anybody looked into whether this applies to Pharo as well?
>
> [1] http://events.ccc.de/congress/2011/Fahrplan/events/4680.en.html
>
> Cheers
> Philippe
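The "series of null bytes" remark in point 3 is easy to reproduce with a toy version of such a function. Using 1025 as the factor (again a toy, not Perl's or VW's actual code), every string made entirely of NUL characters hashes to 0, so strings of different lengths all collide:

    | h1025 nuls |
    h1025 := [:s | s inject: 0 into: [:h :c | (h * 1025 + c asInteger) \\ 16r100000000]].
    nuls := (1 to: 5) collect: [:n | String new: n withAll: (Character value: 0)].
    nuls collect: h1025    "#(0 0 0 0 0) -- five distinct strings, one hash value"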
In reply to this post by Reinout Heeck-2
> Question 1: is the VW hashing algorithm for strings structured such that
> it is cheap to generate millions of strings with the same hash value?

I don't think it's THAT cheap, but with a 28 bit hash value and today's computers, a motivated attacker won't have much trouble brute forcing whatever number of collisions needed.

> Question 2: can we generically counter such an attack? The paper
> proposes random hash seeding which seems inappropriate for image-based
> development. Should we consider using a new hash function that is
> (cryptographically) one-way?

Yes, applications that expect attackers can e.g. subclass Dictionary and replace the send of #hash with the crypto hash that suits their needs. Using something like a crypto hash as a default hash for the whole system is going to be extremely painful for performance, because most uses don't need a crypto hash and its associated expense. Generally speaking, crypto hashes tend to be much more costly than non-crypto hashes.

Andres.
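One way to act on the "replace the send of #hash" suggestion without touching the collection internals is to wrap the untrusted keys instead. A rough sketch; HardenedKey and #strongDigestOf: are hypothetical names, not existing VW classes or selectors, and the digest would come from whatever crypto library is available:

    "Assume a class HardenedKey with instance variables 'string' and 'digest'."

    HardenedKey class >> on: aString
        ^self new setString: aString

    HardenedKey >> setString: aString
        string := aString.
        digest := self strongDigestOf: aString
            "plug in SHA-256 or similar here, reduced to an Integer"

    HardenedKey >> key
        ^string

    HardenedKey >> hash
        ^digest

    HardenedKey >> = other
        ^other class == self class and: [string = other key]

Code filling a dictionary from request data would then do something like dict at: (HardenedKey on: fieldName) put: fieldValue, so the cost of the stronger hash stays confined to that one dictionary.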
In reply to this post by Eliot Miranda-2
On 1/4/2012 8:31 PM, Eliot Miranda wrote:
Hi Eliot!

The POST variables are merely one collection open to abuse. Consider that:

I could attack by giving your web server many HTTP headers with names that generate hash clashes.
I could send a SOAP payload that is crafted such that a dictionary in the XML parser is attacked.
If you use VW to read email I could send you a specially crafted email so the MIME parser will churn over hash clashes.
I could publish some stuff into the open Store repository such that all Store clients will churn over hash clashes when browsing the repository.
I could open an SSL connection to a VW server and present it a certificate that is crafted to make the server churn.
etc, etc, etc...

I'm not sure whether all of these scenarios are actually possible, however finding that out would require research effort. Hence my inclination to look for a generic fix; it seems impossible/error-prone to fix all the separate use cases.

On a global level we might be able to detect when a hashed collection has a certain percentage of identical hashes (>50% ?) and then raise an exception. Could we do this cheaply though? Alternatively we could switch to cryptographically more secure hashes, but that would be excessively expensive....

R
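A cheap variant of that clustering check can at least be prototyped at the application boundary, run over incoming names before they are stuffed into a Dictionary; whether something like it can be done generically inside the collection classes without slowing down every add: is a separate question. In this sketch fieldNames is a stand-in for whatever the request parser produced:

    | fieldNames tally worstCount |
    fieldNames := #('alpha' 'beta' 'gamma' 'delta').    "stand-in for the parsed parameter names"
    tally := Bag new.
    fieldNames do: [:each | tally add: each hash].
    worstCount := tally asSet inject: 0 into: [:m :h | m max: (tally occurrencesOf: h)].
    worstCount > (fieldNames size // 2)
        ifTrue: [self error: 'suspicious hash clustering in request parameters']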
In reply to this post by Andres Valloud-6
On 1/5/2012 1:00 AM, Andres Valloud wrote:
>> Question 1: is the VW hashing algorithm for strings structured such that
>> it is cheap to generate millions of strings with the same hash value?
> I don't think it's THAT cheap, but with a 28 bit hash value and today's
> computers, a motivated attacker won't have much trouble brute forcing
> whatever number of collisions needed.

Doesn't this imply that a cryptographic hash algorithm will be insecure anyway unless we allow hash values bigger than 28 bits?

>> Question 2: can we generically counter such an attack? The paper
>> proposes random hash seeding which seems inappropriate for image-based
>> development. Should we consider using a new hash function that is
>> (cryptographically) one-way?
> Yes, applications that expect attackers can e.g. subclass Dictionary and
> replace the send of #hash with the crypto hash that suits their needs.
> Using something like a crypto hash as a default hash for the whole system
> is going to be extremely painful for performance, because most uses don't
> need a crypto hash and its associated expense.

Hmm, that seems to imply I'll have to give up the abstraction that the Smalltalk library gave me. I will need to know whether a piece of code that my server uses makes use of 'unsafe' hashing, I will need to know deep details of the libraries I use -> no abstraction. Alternatively all libraries will need to be coded using a 'safe' variant of hashed collection so I can trust my server is safe against this attack.

> Generally speaking, crypto hashes tend to be much more costly than
> non-crypto hashes.
>
> Andres.
In reply to this post by Reinout Heeck-2
"Reinout Heeck"<[hidden email]> wrote:
> >> Question 1: is the VW hashing algorithm for strings structured such that
> >> it is cheap to generate millions of strings with the same hash value?
> > I don't think it's THAT cheap, but with a 28 bit hash value and today's
> > computers, a motivated attacker won't have much trouble brute forcing
> > whatever number of collisions needed.
> Doesn't this imply that a cryptographic hash algorithm will be insecure
> anyway unless we allow hash values bigger than 28 bits?

Depends on what you mean by "cryptographic", but the definition of a "cryptographically secure" hash function requires it to be "one-way" and "collision-resistant", the latter meaning that the amount of work needed to find a collision makes it infeasible. I don't see how you can achieve that with 28 bits. Taking the birthday paradox into account, you'll need at most 16K samples (on average) to find a collision no matter how sophisticated the function is. So unless a single hash operation is enormously computationally expensive (making it completely impractical), I don't see how it can be secure.
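The 16K figure is just the birthday bound for a 28-bit hash:

    (2 raisedTo: 28) sqrt    "16384.0"
    "i.e. roughly 2 raisedTo: 14 random strings before two of them share a
     28-bit hash, regardless of how good the function is."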
In reply to this post by Reinout Heeck-2
"Andres Valloud"<[hidden email]> wrote:
> The authors suggest using a randomized hash function, but how do
> persistent hash tables survive across invocations of the randomizer?

How about a "keyed hash" where a new key is generated for each hashed collection instance and persisted with it. The important requirement is that the key of any particular collection isn't predictable, so you may want a cryptographically secure random generator, but that's a cost paid once when the collection is created, you don't pay for that on every lookup. Could be feasible for a reasonable number of use cases.
In reply to this post by Reinout Heeck-2
Reinout Heeck wrote:
> Hmm, that seems to imply I'll have to give up the abstraction that the
> Smalltalk library gave me.

Unfortunately that tends to be the nature of security problems: something far down in a library you use turns out to have a problem with hashing, or a buffer overflow, or the permissions it uses, etc. A lot of algorithm design thinks in terms of a hypothetical adversary, but doesn't really build for the assumption that a concrete adversary will actually show up on the virtual doorstep one day.

In randomized algorithm design, one of the criteria is that although an algorithm may have theoretical worst-case performance, that worst case should not depend on the input. That is, the randomized component might give worst-case performance in one run, but running with the same input again would not. I'm not sure if that can be applied to this sort of problem, but it would be a general solution. The other general solution would be to use data structures with better worst-case scenarios.
In reply to this post by Reinout Heeck-2
On 1/5/2012 8:30 AM, Reinout Heeck wrote:
> On 1/5/2012 1:00 AM, Andres Valloud wrote:
>>> Question 1: is the VW hashing algorithm for strings structured such that
>>> it is cheap to generate millions of strings with the same hash value?
>> I don't think it's THAT cheap, but with a 28 bit hash value and today's
>> computers, a motivated attacker won't have much trouble brute forcing
>> whatever number of collisions needed.
> Doesn't this imply that a cryptographic hash algorithm will be insecure
> anyway unless we allow hash values bigger than 28 bits?

No, what I meant is that the current default non-crypto hash is 28 bits, so even if it is obfuscated you can still brute force it easily. With a crypto hash you'd have large integer hashes, but I wouldn't necessarily worry too much about it unless you're continuously serving several thousand requests per second (and then of course you'd have other serious concerns, such as when the GC is allowed to run etc).

> Alternatively all libraries will need to be coded using a 'safe' variant
> of hashed collection so I can trust my server is safe against this attack.

But that also runs the risk of adding a lot of complexity just in case someone needs it, and still not being enough for a specific application we're not yet aware of.

Andres.
In reply to this post by Reinout Heeck-2
As a default hash, yes I'd say crypto hashes would be too expensive for little to no gain. For this specific instance, however... I'm sure you can get a crypto hash of a reasonably sized string in a millisecond. How many requests per second are you handling? And how many milliseconds of network latency to the client using the app?

On 1/5/2012 7:51 AM, Reinout Heeck wrote:
> Alternatively we could switch to cryptographically more secure hashes
> but that would be excessively expensive....
On Thu, Jan 5, 2012 at 2:01 PM, Andres Valloud <[hidden email]> wrote:
> As a default hash, yes I'd say crypto hashes would be too expensive for
> little to no gain. For this specific instance, however...

I wonder if we are chasing the wrong problem here. The original issue had to do with too many POST values sent as a Seaside request. Most if not all of the hashing issues could be avoided by adding an upper bound (2k - 5k) to the number of values processed by the Seaside request handler. Yes, the overall size of the request will be quite big, but once the handler passes the upper limit it's easy to throw an exception and stop the processing. That way you never have to worry about hashing millions of bogus POST values.
Yes, that too... how you approach the security issues you're facing really depends on the application.

On 1/5/2012 2:48 PM, Jon Paynter wrote:
In reply to this post by Reinout Heeck-2
>> I lack the time to parse this in relation to VW

Sorry, but the explanation I'd give for how this relates to VW will take longer to parse... :)

Hash collisions typically happen due to a combination of hash range and hash table size. The smallest of either of these typically determines the number of
collisions; however, VW is different. The example in the document shows a hash->bucket design in which the hash value is used to find a sequence that is slowly searched. The attack is to add unequal
objects of equal hash value to maximize the amount of linear search and growth in each hash bucket. An example of hash->bucket implementation in VW is the class-side behavior Symbol uses to intern strings. The trick would also be to minimize the hash variance
of the data added to reduce the chance that a rehash will increase the size of the hash table and that distribution into buckets will not be improved once the hash table size is increased.
VW hashed collections have logical hash buckets with search indexes that are determined by the size of the hash table. VW grows the ENTIRE hash table after
an addition when #fullCheck notices there are less than 25% free slots. VW uses the hash value as a starting point for doing a linear search that is limited only by the discovery of a nil slot or equal object. VW is more affected by the distribution of empty
slots than it is by hash table size. VW needs to have the free slots because without them the #find*OrNil: methods would have the cost of searching an increasing percentage of the hash table as items are added. The free slots serve as a stop that a separate
physical bucket would normally provide. This is more costly (like #fixCollisionsFrom:) than a physical hash bucket design that can manage buckets individually. VW attempts to expand the distribution range in methods like #initialIndexFor:boundedBy:, but that
would have little benefit when there are many equal hash values for unequal objects. VW has the linear search costs that the attack exploits, searches more than once for some operations, has a larger scope of growth, and searches through objects that would
otherwise belong to another bucket. The VW implementation would be more susceptible to a hash collision attack than the hash->bucket design that the document describes how to exploit.

    "This artificially forces a highly saturated hash table to simulate
     lookup cost alone that could be incurred with a hash collision attack."
    | normalSet normalResults slowerSet slowerResults notIncluded |
    normalSet := Set new: Symbol tableSize.
    notIncluded := Set new: Symbol tableSize.
    Symbol table size to: 1 by: -1 do: [:bucketIndex |
        (bucketIndex < 6 ifTrue: [notIncluded] ifFalse: [normalSet])
            addAll: (Symbol table at: bucketIndex)].
    notIncluded := notIncluded asArray.
    normalResults := Array
        with: normalSet basicSize
        with: normalSet size
        with: (normalSet basicSize / normalSet size) asFloat
        with: (Time millisecondsToRun:
            [10 timesRepeat: [notIncluded do: [:ea | normalSet includes: ea]]]).
    slowerSet := normalSet copyEmpty: normalSet size + 1.
    normalSet do: [:each | slowerSet noCheckAdd: each].
    slowerResults := Array
        with: slowerSet basicSize
        with: slowerSet size
        with: (slowerSet basicSize / slowerSet size) asFloat
        with: (Time millisecondsToRun:
            [10 timesRepeat: [notIncluded do: [:ea | slowerSet includes: ea]]]).
    Array with: normalResults with: slowerResults

    #(#(280277 128057 2.18869 1) #(128431 128057 1.00292 1936))

The performance difference was 1 ms vs. 1936 ms. A hash collision attack would cause inefficient searches through a long series of overlapping "buckets" much like this code simulates. Because VW overlaps logical
buckets, a search that happens in a logical bucket position after an attacked bucket position would experience some of the cost that would otherwise be isolated to just one physical bucket. It wouldn't matter how many free slots VW grows the collection for
because the free spaces would be clustered distant from the start of the search from colliding items. Others have mentioned that the 28 bit computed hash range from a VW string may be of sufficient range to make it extremely difficult to do a string-based collision
attack. I agree it would be difficult to simulate an actual hash collision attack using a hashed collection of strings with 28 bits of hash value range; however, I suspect the assumption is being made that the hash values from other languages have less range
than 28 bits. I couldn't say one way or another. If 28 bits is enough range to be considered attack resistant then other languages could compute a 28 bit hash value from a string just like VW does.
Only a small variance of the hash value is needed to reduce a distributed "meet in the middle" collision attack. VW gets some automatic variance from the size
of the hash table. A physical buckets design doesn't need to adjust the size of the hash table as frequently as VW collections do, but VW's more frequent growth of the hash table can reduce some of the computed collisions from #initialIndexFor:boundedBy:.
Assuming that VW string's 28-bit hash range avoids the problem... The hash values for non-strings are not externally controllable in a way that could cause
a hash collision through passing data. For some, that will be the answer they are looking for, case closed. But hold on... While VW is generally safe from direct external attack by data, the performance problems that the attack hopes to exploit are so common
in VW that it is accepted as normal. Most objects in VW default to the 14 bit (16,383) range of values (#maximumIdentityHashValue) from #identityHash. This narrow range causes many smaller/distributed hash collisions. Many now recognize that the inefficiency
of VW hashed collections becomes more pronounced once it contains more than 16K items. Few recognize that the collision costs exist for smaller collections--because the slowness is pervasive and consistent.
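A quick way to see the 14-bit limitation Paul describes (numbers per his description; the exact constant is #maximumIdentityHashValue on a given VM) is to tally identity hashes for a pile of fresh objects:

    ((1 to: 50000) collect: [:i | Object new identityHash]) asSet size
        "at most 16384 distinct values for 50000 objects, so identity-hashed
         collections above that size necessarily carry long collision chains"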
In a sense, VW comes pre-attacked with the performance impact of the "meet in the middle" approach of the document. If VW's 28-bit hash range is exploitable
then the design VW uses for hashed collections would likely crash harder and faster than a design that doesn't have the problems I've described. The main difference is that VW would crash in acquiring more memory to grow collections while others might just
get very slow but linger on. The limited range that VW returns for #identityHash is one of the biggest performance problems that 32-bit VW has because it causes poor distribution into the
logical hash buckets. There are many ways to fix the problem by changing the collections or computations. One way to avoid the problem is to use a 64-bit VM that can offer a greater #maximumIdentityHashValue. Upgrading to 64-bit is easier than reimplementing
VW's hashed collections so it is the option Cincom would encourage.

Regards,

Paul Baumann

From: [hidden email] [mailto:[hidden email]] On Behalf Of Reinout Heeck
In reply to this post by Reinout Heeck-2
On Jan 4, 2012, at 7:59 AM, Reinout Heeck wrote:
Naive question from me. Can one step back and just ditch hashing altogether by using a binary search tree collection? In this particular domain, it sounds like the keys are always strings.

I wrote a simple non-balancing tree (maybe I'll fix that later; I was reading on the scapegoat balancing algorithm and found it intriguing) and had some fun with polymorphism. Even with that naive approach, for a randomly distributed tree (meaning not worst-case linear, but not best-case balance either), I was seeing times that were not that much slower than using a Dictionary.

I did not figure out how to create a pathological hash collision. If someone wants to tell me how to compute 1000 strings with the same hash, I'll compare that. :)

--
Travis Griggs
Objologist
"The best way to know you have a mind is to change it" -Judge Pierre Leval
In reply to this post by Paul Baumann
Thanks Paul, I always learn something new from your posts!

I just spotted VW support Resolution 100401, which “implements Dictionaries using buckets to resolve hash collisions”. The resolution talks about improving the performance of finalization, which uses IdentityDictionary and is subject to the consecutive slot problem. It looks like the resolution just provides the new Dictionary classes, without actually changing existing code to use them. Hopefully still some use to others, and maybe a foundation to build on for a future solution integrated in VW – e.g. to replace IdentityDictionary.

All the best,

Steve

From: [hidden email] [mailto:[hidden email]] On Behalf Of Paul Baumann
In reply to this post by mkobetic
On 1/5/2012 6:31 PM, [hidden email] wrote:
"Andres Valloud"[hidden email] wrote:The authors suggest using a randomized hash function, but how do persistent hash tables survive across invocations of the randomizer?How about a "keyed hash" where a new key is generated for each hashed collection instance and persisted with it. ooh, that sounds like a useful idea ! Simple enough to be worth trying across the entire library too :-) R - The important requirement is that the key of any particular collection isn't predictable, so you may want a cryptographically secure random generator, but that's a cost paid once when the collection is created, you don't pay for that on every lookup. Could be feasible for reasonable number of use cases. _______________________________________________ vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc _______________________________________________ vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc |
In reply to this post by Andres Valloud-6
On 1/5/2012 11:01 PM, Andres Valloud wrote:
> As a default hash, yes I'd say crypto hashes would be too expensive for
> little to no gain. For this specific instance, however... I'm sure you
> can get a crypto hash of a reasonably sized string in a millisecond. How
> many requests per second are you handling? And how many milliseconds of
> network latency to the client using the app?
>
> On 1/5/2012 7:51 AM, Reinout Heeck wrote:
>> Alternatively we could switch to cryptographically more secure hashes
>> but that would be excessively expensive....

Our app is relatively low traffic but handling high value. Given other thoughts in this thread I'm turning towards the idea that hashes should be unpredictable rather than cryptographically strong.

The bitcoin mining scene shows how my current graphics card can compute 2^32 double-SHA256 hashes in about a minute, so 28-bit VW hash clashes should be trivial to generate with it.

https://en.bitcoin.it/wiki/Mining_hardware_comparison#AMD_.28ATI.29