On 1/5/2012 11:48 PM, Jon Paynter wrote:
That would handle only the one collection that is exposed, and it assumes the use of Seaside and web requests. I envisage this attack working with *any* hashed collection open to manipulation by mutually distrustful users of a system (not necessarily web-based). R -
In reply to this post by Reinout Heeck-2
Ok, but I strongly suspect that "randomized" hashes are just a wee bit more complicated to attack than non-"randomized" hashes... if you don't have a good theory about why the "randomization" is supposed to give you good results, you shouldn't trust it. If you have low volume, I don't see why you wouldn't just use crypto hashes and be done with it. Then again, it's your choice.

On 1/9/2012 2:20 AM, Reinout Heeck wrote:
> On 1/5/2012 11:01 PM, Andres Valloud wrote:
>> As a default hash, yes I'd say crypto hashes would be too expensive for
>> little to no gain. For this specific instance, however... I'm sure you
>> can get a crypto hash of a reasonably sized string in a millisecond. How
>> many requests per second are you handling? And how many milliseconds of
>> network latency to the client using the app?
>>
>> On 1/5/2012 7:51 AM, Reinout Heeck wrote:
>>> Alternatively we could switch to cryptographically more secure hashes
>>> but that would be excessively expensive....
>
> Our app is relatively low traffic but handling high value.
>
> Given other thoughts in this thread I'm turning towards the idea that
> hashes should be unpredictable rather than cryptographically strong.
>
> The bitcoin mining scene shows how my current graphics card can compute
> 2^32 double SHA-256 hashes in about a minute, so 28-bit VW hash
> collisions should be trivial to generate with it.
>
> https://en.bitcoin.it/wiki/Mining_hardware_comparison#AMD_.28ATI.29
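As a rough check of the millisecond claim, one can time a cryptographic hash over a header-sized string in a workspace. A minimal sketch, assuming the VW Security parcel is loaded and that its SHA class answers a digest via #hashMessage: (verify the exact API in your image before relying on it):

| msg |
msg := String new: 1024 withAll: $x.
Time millisecondsToRun:
    [1000 timesRepeat:
        [Security.SHA new hashMessage: msg asByteArray]]
"Divide the result by 1000 for the per-hash cost in milliseconds."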
In reply to this post by Reinout Heeck-2
Or, as I suggested previously and Travis re-suggested: don't use hashing. These are all strings. Use a sorted collection with binary search, or a search tree. No pathological worst cases to exploit.
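For reference, a membership test along those lines needs no hashing at all, and its worst case stays O(log n) no matter which keys an attacker chooses. A minimal workspace sketch of binary search over a sorted Array of strings:

| keys binarySearch |
keys := #('accept' 'content-length' 'cookie' 'host') asSortedCollection asArray.
binarySearch :=
    [:array :key |
    | low high mid found |
    low := 1. high := array size. found := false.
    [found not and: [low <= high]] whileTrue:
        [mid := (low + high) // 2.
        (array at: mid) = key
            ifTrue: [found := true]
            ifFalse:
                [(array at: mid) < key
                    ifTrue: [low := mid + 1]
                    ifFalse: [high := mid - 1]]].
    found].
binarySearch value: keys value: 'cookie'    "answers true"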
On Mon, Jan 9, 2012 at 11:32 AM, Alan Knight <[hidden email]> wrote:
Bar a huge quantity of strings, which was where I pitched in at the beginning: surely one wants to limit the stress a request can put on a server? If one puts a bound on request size and/or complexity, then in this example hashed collections could work better than search trees. But if one doesn't protect against overload, both approaches are vulnerable, no?
best,
Eliot
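One way to implement the bound Eliot describes is to stop building the collection once a request exceeds a fixed field budget, before any pathological cost can accumulate. A hypothetical parser hook (the method and variable names here are invented for illustration):

addFields: parsedPairs into: fields
    "Cap how many form fields a single request may define; reject the
    request rather than letting an attacker grow the dictionary at will."
    | maxFields |
    maxFields := 256.
    parsedPairs do:
        [:pair |
        fields size >= maxFields
            ifTrue: [^self error: 'request exceeds field budget'].
        fields at: pair key put: pair value].
    ^fields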
In reply to this post by Alan Knight-2
I'm sure someone has already suggested it... but there are a lot more vectors of attack here than exploiting hash collisions to drill a CPU. The number of headers you send in a request could be endless; the value part of the key-value pair on an individual line could be endless. Ultimately, you're using the headers of the HTTP request to determine the restrictions on the body of the HTTP request. For example, if it's a database query that takes too long, you can terminate it prematurely -- but you can't make that decision until the request headers are complete.

My personal favourite answer for dealing with the headers dictionary is to use an enumerated index for each known header type -- after all, if you can't use the data, there's no point keeping it in memory (and recording it to a log somewhere introduces new vectors of attack: disk space and disk speed). [Each known header string is matched to a fixed integer value which indexes into a fixed-size array; everything else is rejected.] You could put a hard limit on how large each header value can be, and so on and so forth, but ultimately this goes back to the original design of the HTTP server. Originally, each request that came in would be sent to a new unix process, and this process of course runs under process, user and group restrictions. That includes CPU time, disk time, memory, resource handles, permissions, the works. So, now that the world has decided not to use this technique, we have to reinvent the same level of sandboxing that used to exist, or potentially suffer the consequences.

If you add a timeout to negotiating the headers of a request, you still have to deal with excessive memory usage in that short period of time (just wait for 10-gigabit internet, or move to South Korea); and you also have to be wary of operations that may be invoked that cannot be interrupted (access to volatile resources, or simply faults in VM technology, like when we compute large integers over and over again).

Do we perform a timeout based on negotiating the headers in our webservers? Nope. Are we superduper industry-strength protected from all potential attacks? Nope. Can we be put behind an industry-strength, superduper, protected-from-nearly-all-attacks webserver? Yes. Is it common for Smalltalkers to do this? Yep. Is there documentation on how to do this with, say, Apache? Yep. Are there commercial websites built with Seaside using this technique? You betcha.

I found this particular vulnerability discussion interesting, but ultimately I also felt the original attack was a bit short-sighted. It seems like it was written to attack naive HTTP stack implementations like you'd find in Ruby, Python, Smalltalk, versus the hardened implementations you find deployed on hundreds of millions of websites like Apache, TinyHTTP and, dare I suggest it, IIS. Both sets of technologies have their place. Personally I find the idea of using hash collisions to create a DDoS kind of pointless... I'm sure there are much better ways to take down your foes than exploiting a thin margin like that. Most webapps that aren't built to handle a huge volume of traffic will fail under DDoS simply by handling -legitimate- requests.

So when it comes to dealing with DDoS, how do the big sites actually do it? Well, it turns out you can purchase DDoS protection services from many ISPs. They are packet sniffing at their routers anyway, so why not actually use that to handle DDoS. One example that comes to mind in recent times is the Minecraft servers. Minecraft became immensely popular in a very short period of time, and that overwhelmed the creators. Then LoL, the DDoS group, decided to attack different games, and Minecraft was one of them. They took down the Minecraft servers (HTTP servers for authenticating user sessions so that people can log in) for about half a day before stopping. In response, Mojang purchased DDoS protection from their ISP so that they didn't have to rework everything they already had. They've not been DDoS'd since. Ironically, they still go down, but once again from legitimate traffic making legitimate requests. This is the twitter.com problem of success... a problem I'm sure we'd all love to have.

So, my recommendation is not to rip out dictionary implementations and replace them, or even to rewrite HTTP stacks or implement multiple levels of resource management. If you are paranoid that this is going to happen to you, add defenses in front of your application, not into your application. There are loads of security packages you can get for different operating systems that pre-filter data before it gets to you... they'll even detect SQL injection attacks if you're worried that somehow that will slip through Seaside or Glorp.

Michael
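Michael's enumerated-index idea, sketched: known header names map to fixed slots in an Array, so attacker-chosen names never reach a growing hashed collection. The method and variable names below are invented for illustration:

knownHeaderIndices
    "The keys here are fixed and trusted; attacker-supplied names simply
    fail the lookup and are dropped."
    ^Dictionary new
        at: 'host' put: 1;
        at: 'content-length' put: 2;
        at: 'content-type' put: 3;
        at: 'cookie' put: 4;
        yourself

storeHeader: name value: value into: slots
    "slots is a fixed-size Array, one position per known header; unknown
    headers are ignored rather than stored."
    | index |
    index := self knownHeaderIndices at: name asLowercase ifAbsent: [0].
    index > 0 ifTrue: [slots at: index put: value]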
In reply to this post by Eliot Miranda-2
No, a huge quantity of
strings doesn't make any difference. Well, it's slower than handling a
small quantity of strings, but if you do it with a hash table it's
orders of magnitude worse, because hash tables have bad worst-case
performance and we're assuming an adversary who is deliberately
provoking the worst case. Trees (if balanced) or sorted collections will
work roughly the same regardless of the specifics of the input. A
billion strings will be very roughly 20 million times worse than a
thousand strings, but they won't be a quadrillion times worse.
Filtering out things with a billion post variables in the first place is also not a bad idea, but as Michael points out, probably faster and more reliably done at the front-end before the application has to deal with it.
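Alan's worst case can be reproduced without any string cleverness: give every element the same hash and insertion goes quadratic. A toy sketch (Collider is an invented test class, and the classic class-definition message may need adapting to your dialect):

Object subclass: #Collider
    instanceVariableNames: 'id'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Hashing-Sketches'

Collider class >> id: anInteger
    ^self new setId: anInteger

Collider >> setId: anInteger
    id := anInteger

Collider >> id
    ^id

Collider >> hash
    "Constant hash: every instance lands in the same probe chain."
    ^42

Collider >> = other
    ^(other isKindOf: Collider) and: [id = other id]

"Each add scans the whole existing cluster, so n adds cost O(n^2):"
| s |
s := Set new.
Time millisecondsToRun: [1 to: 5000 do: [:i | s add: (Collider id: i)]]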
In reply to this post by Steven Kelly
Thanks Steven.

Hashed collections that use physical buckets are better when collections are long-lived or very large. The VW collections (with logical buckets) are faster to create and dispose of. I've implemented and shared physical bucket designs a dozen different ways, and I'd be surprised if that Cincom resolution didn't have my name associated with it. Approaches that increase the hash range won out; the need for faster implementations becomes even more limited once you have that. Some big performance problems with Store caches could have been fixed with a better hash collection design. Those Store-specific performance problems had been reported to Cincom, but I recall they believed that their redesign of Store would address them.

Here is an answer for the main topic:

>> Question 2: can we generically counter such an attack?

Yes. With a change to one method that actually increases performance.

A generic fix would need to work when there are fewer hash buckets than hash values (like when indexing 28-bit string hash values into a 100-position hash table) and when there are more hash buckets than hash values (like with 14-bit identityHash).

"Default"
#(0.333333 1.0 #(3->5461))  "table was 33% of hash value range; distribution range was 100% of hash values; distribution was even"
#(1.74998 1.0 #(1->16383))  "table was 175% of hash value range; distribution range was 100% of hash values; distribution was even"

The goal is for the first test to stay at #(0.333333 1.0 #(3->5461)), and for the second test to increase the second column (to above 1.0) and the number of pairs shown in the third column.

String>>hash is said to answer a 28-bit value. SmallInteger maxBits is 29. Variance (that still has SmallInteger efficiency) can be added by shifting the bits up 1 (or multiplying the hash value by 2).

" bitShift: 1 "
#(0.333333 1.0 #(3->5461))
#(1.74998 1.14287 #(1->12287 2->2048))

It is resistant to an algorithmic complexity attack when the maximum hash value is greater than the number of buckets (which would always be the case with strings). There is still an even distribution when there are fewer buckets than hash values. When there are more buckets than hash values, the distribution is into 14% more index positions than there were hash values. A meaningful variance is created, and average bucket size was lower. This trick increases the variance of all the hash values, such that VW hashing is more efficient even when constrained to the traditional VW 14-bit identityHash value.

I've added the implementation to the latest version (currently 1.6) of ICXVwHashTuning. If you want to give it a try, load version 1.6 (or later) and then read the Set class>>hashTuning_documentation method.

Thanks for the kind words.

Regards,
Paul Baumann
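Before adopting any such tweak, it helps to histogram the distribution your own keys actually get for a given table size. A dialect-neutral workspace sketch; the modulo below merely stands in for the real index computation (#initialIndexFor:boundedBy: in VW), so treat the numbers as indicative only:

| data tableSize counts occupancy |
data := (1 to: 10000) collect: [:i | i printString].    "sample keys; substitute your own"
tableSize := 3571.
counts := Array new: tableSize withAll: 0.
data do:
    [:each |
    | slot |
    slot := (each hash \\ tableSize) + 1.
    counts at: slot put: (counts at: slot) + 1].
occupancy := Bag new.
counts do: [:n | occupancy add: n].
occupancy    "a Bag of per-slot counts; its frequencies show how many slots hold 0, 1, 2, ... keys"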
From: Steven Kelly [mailto:[hidden email]]

Thanks Paul, I always learn something new from your posts! I just spotted VW support Resolution 100401, which “implements Dictionaries using buckets to resolve hash collisions”. The resolution talks about improving the performance of finalization, which uses IdentityDictionary and is subject to the consecutive slot problem. It looks like the resolution just provides the new Dictionary classes, without actually changing existing code to use them. Hopefully still of some use to others, and maybe a foundation to build on for a future solution integrated in VW – e.g. to replace IdentityDictionary.

All the best,
Steve

From: [hidden email] [mailto:[hidden email]]
On Behalf Of Paul Baumann

>> I lack the time to parse this in relation to VW

Sorry, but the explanation I'd give for how this relates to VW will take longer to parse... :)

Hash collisions typically happen due to a combination of hash range and hash table size. The smaller of the two typically determines the number of collisions; however, VW is different.

The example in the document shows a hash->bucket design in which the hash value is used to find a sequence that is slowly searched. The attack is to add unequal objects with equal hash values to maximize the amount of linear search and growth in each hash bucket. An example of a hash->bucket implementation in VW is the class-side behavior Symbol uses to intern strings. The trick is also to minimize the hash variance of the added data, to reduce the chance that a rehash will increase the size of the hash table and that distribution into buckets will improve once the hash table grows.

VW hashed collections have logical hash buckets with search indexes that are determined by the size of the hash table. VW grows the ENTIRE hash table after an addition when #fullCheck notices there are fewer than 25% free slots. VW uses the hash value as a starting point for a linear search that is limited only by the discovery of a nil slot or an equal object. VW is more affected by the distribution of empty slots than it is by hash table size. VW needs the free slots because without them the #find*OrNil: methods would have the cost of searching an increasing percentage of the hash table as items are added. The free slots serve as the stop that a separate physical bucket would normally provide. This is more costly (as in #fixCollisionsFrom:) than a physical hash bucket design that can manage buckets individually. VW attempts to expand the distribution range in methods like #initialIndexFor:boundedBy:, but that has little benefit when there are many equal hash values for unequal objects. VW has the linear search costs that the attack exploits, searches more than once for some operations, has a larger scope of growth, and searches through objects that would otherwise belong to another bucket. The VW implementation would be more susceptible to a hash collision attack than the hash->bucket design that the document describes how to exploit.

"This artificially forces a highly saturated hash table to simulate the lookup cost alone that could be incurred with a hash collision attack."
| normalSet normalResults slowerSet slowerResults notIncluded |
normalSet := Set new: Symbol tableSize.
notIncluded := Set new: Symbol tableSize.
Symbol table size to: 1 by: -1 do: [:bucketIndex |
    (bucketIndex < 6 ifTrue: [notIncluded] ifFalse: [normalSet])
        addAll: (Symbol table at: bucketIndex)].
notIncluded := notIncluded asArray.
normalResults := Array
    with: normalSet basicSize
    with: normalSet size
    with: (normalSet basicSize / normalSet size) asFloat
    with: (Time millisecondsToRun:
        [10 timesRepeat: [notIncluded do: [:ea | normalSet includes: ea]]]).
slowerSet := normalSet copyEmpty: normalSet size + 1.
normalSet do: [:each | slowerSet noCheckAdd: each].
slowerResults := Array
    with: slowerSet basicSize
    with: slowerSet size
    with: (slowerSet basicSize / slowerSet size) asFloat
    with: (Time millisecondsToRun:
        [10 timesRepeat: [notIncluded do: [:ea | slowerSet includes: ea]]]).
Array with: normalResults with: slowerResults

#(#(280277 128057 2.18869 1) #(128431 128057 1.00292 1936))

The performance difference was 1 ms vs. 1936 ms.

A hash collision attack would cause inefficient searches through a long series of overlapping "buckets", much like this code simulates. Because VW overlaps logical
buckets, a search that happens in a logical bucket position after an attacked bucket position would experience some of the cost that would otherwise be isolated to just one physical bucket. It wouldn't matter how many free slots VW grows the collection by, because the free spaces would be clustered far from the start of the search from the colliding items.

Others have mentioned that the 28-bit computed hash range of a VW string may be of sufficient range to make it extremely difficult to do a string-based collision attack. I agree it would be difficult to simulate an actual hash collision attack using a hashed collection of strings with 28 bits of hash value range; however, I suspect the assumption is being made that the hash values from other languages have less range than 28 bits. I couldn't say one way or the other. If 28 bits is enough range to be considered attack resistant, then other languages could compute a 28-bit hash value from a string just like VW does.
Only a small variance of the hash value is needed to reduce a distributed "meet in the middle" collision attack. VW gets some automatic variance from the size of the hash table. A physical buckets design doesn't need to adjust the size of the hash table as frequently as VW collections do, but VW's more frequent growth of the hash table can reduce some of the computed collisions from #initialIndexFor:boundedBy:.
Assuming that VW string's 28-bit hash range avoids the problem... The hash values for non-strings are not externally controllable in a way that could cause a hash collision through passing data. For some, that will be the answer they are looking for, case closed. But hold on... While VW is generally safe from direct external attack by data, the performance problems that the attack hopes to exploit are so common in VW that they are accepted as normal. Most objects in VW default to the 14-bit (16,383) range of values (#maximumIdentityHashValue) from #identityHash. This narrow range causes many smaller, distributed hash collisions. Many now recognize that the inefficiency of VW hashed collections becomes more pronounced once they contain more than 16K items. Few recognize that the collision costs exist for smaller collections too--because the slowness is pervasive and consistent.
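To put a rough number on that: with only 16,383 possible identity hash values, a birthday-style estimate says collisions are common well below 16K elements. A workspace sketch of the expected count of objects whose identity hash is shared with at least one other object (standard occupancy formula, floats for speed):

| n m expectedDistinct |
n := 10000.    "objects in the collection"
m := 16383.    "possible identityHash values"
expectedDistinct := m * (1.0 - ((1.0 - (1.0 / m)) raisedTo: n)).
n - expectedDistinct    "roughly 2500 of the 10000 already share a value"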
In a sense, VW comes pre-attacked with the performance impact of the "meet in the middle" approach of the document. If VW's 28-bit hash range is exploitable
then the design VW uses for hashed collections would likely crash harder and faster than a design that doesn't have the problems I've described. The main difference is that VW would crash in acquiring more memory to grow collections while others might just
get very slow but linger on. The limited range that VW returns for #identityHash is one of the biggest performance problems that 32-bit VW has because it causes poor distribution into the
logical hash buckets. There are many ways to fix the problem by changing the collections or computations. One way to avoid the problem is to use a 64-bit VM that can offer a greater #maximumIdentityHashValue. Upgrading to 64-bit is easier than reimplementing
VW's hashed collections, so it is the option Cincom would encourage.

Regards,
Paul Baumann
Keep in mind that hash buckets vs. open addressing linear probing has complex tradeoffs that are not always evident at first sight. For example, with hash buckets you have additional random memory reads, and those cost a non-trivial amount of time. You also get more space overhead, and so GC costs more. With hash buckets it is easier to amortize the cost of growing the collection over time, but it is not the only way. Your mileage will vary with the application. Generally speaking, the ideal situation is open addressing linear probing with a good enough hash function that is not too expensive to calculate. If you don't have enough hash values (e.g.: you're using identityHash), then sure, use hash buckets and address the problem with a data structure other than open addressing linear probing.

Some comments... first, don't forget the Hash Analysis Tool bundle in the public Store repository. If you want to quantitatively measure what happens in your actual application, then that's the way to go.

> String>>hash is said to answer a 28-bit value. SmallInteger maxBits is
> 29. Variance (that still has SmallInteger efficiency) can be added by
> shifting the bits up 1 (or multiplying the hashValue by 2).

Multiplying hash values by constants such as powers of two does not really affect their distribution, particularly because the hash table sizes are well-chosen prime numbers. If the hash values were bad, the hash values * 2 will be just as bad. For this extra bit to matter (because e.g. you couldn't get a hash function to produce reasonably good quality 28-bit hash values for your application), you'd need a hashed collection with at least 2^28 data slots. That does not fit in a 32-bit image (so, ok, the hash table itself does, but it needs to hold immediates or duplicate objects instead of actual distinct objects, because otherwise you quickly run out of address space). In a 64-bit image, such a table and its contents would quickly grow into the 10-gigabyte range. At that point, rehashing cost for such a table is not trivial anyway, so likely something special has to be done.

> It is resistant to an algorithmic complexity attack when the maximum
> hash value is greater than the number of buckets (which would always be
> the case with strings).

I don't understand; the point of the attack was to exploit the fact that there is not one hash bucket per hash value. Also, I don't understand the point of having more hash buckets than hash values.

> Thanks Paul, I always learn something new from your posts! I just
> spotted VW support Resolution 100401, which “implements Dictionaries
> using buckets to resolve hash collisions”. [...]

I suspect an approach would be to provide a loadable library that implements hashed collections differently. In that way, it would not be so traumatic to change the existing implementation of things like Set (e.g.: namespaces depend on it). But read on, we have plans to improve performance.

> VW hashed collections have logical hash buckets with search indexes
> that are determined by the size of the hash table.

The actual strategy is called "open addressing linear probing".
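The powers-of-two remark above is easy to check numerically: modulo a prime table size, h and 2h pile up exactly the same way. A small workspace sketch (the modulo is only a stand-in for the real index computation):

| tableSize worstPileUpFor |
tableSize := 3571.    "a prime, as hash table sizes are"
worstPileUpFor :=
    [:transform |
    | counts |
    counts := Array new: tableSize withAll: 0.
    1 to: 16383 do:
        [:h |
        | slot |
        slot := ((transform value: h) \\ tableSize) + 1.
        counts at: slot put: (counts at: slot) + 1].
    counts inject: 0 into: [:max :n | max max: n]].
Array
    with: (worstPileUpFor value: [:h | h])
    with: (worstPileUpFor value: [:h | h bitShift: 1])
"Both positions answer the same maximum occupancy; the shift changes nothing."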
> VW grows the ENTIRE hash table after an addition when #fullCheck
> notices there are less than 25% free slots. [...] This is more costly
> (like #fixCollisionsFrom:) than a physical hash bucket design that can
> manage buckets individually.

It really depends. If you have a good quality hash function, then open addressing linear probing will likely win against hash buckets. Your mileage will vary.

> VW attempts to expand the distribution range in methods like
> #initialIndexFor:boundedBy:, but that would have little benefit when
> there are many equal hash values for unequal objects.

This bit of the current implementation is just bad and will get ripped out rather sooner than later.

> VW has the linear search costs that the attack exploits

With good hash functions, lookups into an open addressing linear probing based collection will likely find what you're looking for in one probe, two at most in common cases, and rarely with three or more probes. If your target hash bucket size is 10, you already lost. If it is 5, then you could pay a significant amount in memory overhead (e.g.: 12 bytes of overhead on 20 bytes of actual storage per bucket in a 32-bit image). It depends.

> Others have mentioned that the 28 bit computed hash range from a VW
> string may be of sufficient range to make it extremely difficult to do
> a string-based collision attack. [...]

The size of the hash value doesn't matter. If you're expecting attackers will try something like this, then you need hash functions that have good security properties. More "complex" (i.e.: obfuscated) hash functions like those randomizations etc. do not in my view confer good security properties automatically. Creating a *good* hash function that is secure is not trivial. The ones that are well known are cryptographic. I reviewed on the order of 50 different non-cryptographic hash functions for the hashing book I wrote. I wouldn't call any of them "secure". Even if finding collisions is not immediately trivial like in the case of unfixed Perl, you can still brute force them because the vast majority are designed to produce 32-bit hash values. Any motivated attacker will surely smash them in no time. But even if they produced 256-bit hash values, so what? Non-crypto hashes were not designed to withstand security attacks, so it is very likely they will get smashed anyway with just a bit of patience. That's why you need crypto hashes if you expect attackers will target hashing.

> Only a small variance of the hash value is needed to reduce a
> distributed "meet in the middle" collision attack. [...]

With hash buckets you can amortize the cost of rehashing over time at the expense of increasing computing costs. Eventually, though, rehashing makes sense. FYI, there are incremental rehashing solutions to this problem.

I don't understand the variance comment. All you need to do is find collisions with a hash value of e.g. 20000, and then send the server more than 20000 total requests with those collisions. There is no additional "variance" (I'd call that "scattering" or "scaling") applied, and performance still stinks. And if you really want to get clever, all you need to do is simulate what happens when you do

Set withAll: (1 to: 100000)

So, really, you can still provoke awful behavior even without finding collisions. This is not an issue of a base library that was never designed with security in mind.

> Assuming that VW string's 28-bit hash range avoids the problem... [...]
> Few recognize that the collision costs exist for smaller
> collections--because the slowness is pervasive and consistent.

1. We have current plans, to be put into execution sooner rather than later, which will make the performance of identity based hashed collections much more similar to that of a hash bucket approach. This is regardless of the 16k object count. The current plan does not call for hash buckets to accomplish this.

2. As mentioned above, the scaling done by hashed collections will be ripped out.

> In a sense, VW comes pre-attacked

... in that the base library is not designed with security in mind. The security needs of applications vary. Security is best achieved when incorporated in the application design from the start. Experts claim retrofitting security onto an existing app is harder.

> The limited range that VW returns for #identityHash is one of the
> biggest performance problems that 32-bit VW has [...] Upgrading to
> 64-bit is easier than reimplementing VW's hashed collections so it is
> the option Cincom would encourage.

It really depends. For example, in your application you might be able to find a way to implement identityHash in terms of hash without violating any implementation assumptions. Upgrading to 64 bit is an option if you are stuck with identityHash, but eventually the issue will likely come back as your image grows into the 10s of gigabytes. Sooner or later someone will have to look at application use and choose a reasonable strategy that fits the application's needs.

But why not increase the identity hash range to 32 and 64 bits, right? Well, because that increase costs us in memory use and, because of additional memory reads by the CPU, in performance. Plus, if you don't need the larger identity hash values, you get a performance penalty for no gain. It seems to me this is more of a balancing act, and that no solution will be ideal for everyone.

I'm not sure about the cost of reimplementing hashed collections. How many times have we as a whole written a new type of hashed collection already? Maybe we should really consolidate efforts and make a loadable library. Who knows, maybe I will try that soon.

======================
A small digression: in my personal opinion, I'd like to delete most current hashed collections because they change behavior by subclassing instead of using composition and delegation. In addition, they hardcode the storage strategy, and that is why we have to reimplement them from scratch when we need something special. In other words, even the storage strategy should be dealt with by composition and delegation. But that's my personal opinion anyhow.
======================

Finally, if your app does not need a 64-bit address space, then the 32-bit address space version could run faster because it will tend to use less memory overall.

Andres.
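Andres's last trick above can be tried directly; no collisions are involved, yet consecutive SmallInteger values can produce long probe clusters. A sketch comparing consecutive against scattered values (how badly the consecutive run loses, if at all, depends on the image's index computation, so treat this as a probe, not a verdict):

| rand scattered |
rand := Random new.
scattered := (1 to: 100000) collect: [:i | (rand next * 100000000) truncated].
Array
    with: (Time millisecondsToRun: [Set withAll: (1 to: 100000)])
    with: (Time millisecondsToRun: [Set withAll: scattered])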
In reply to this post by Paul Baumann
Finally, one comment before I forget. Make sure to test new hashed collections in the same conditions they will be used in practice. This seems obvious at first glance, but it is trickier than it appears.

Let's say the intent is, e.g., to replace Set with NewSet. Benchmarking NewSet against Set with strings alone may not be enough. Since Set is running the system, some of Set's message sends will likely be of the slow megamorphic kind (e.g.: sending hash and = to a multitude of receiver classes in findElementOrNil:). Those same message sends in NewSet will likely be the much faster mono- or polymorphic kinds instead (because NewSet only has to deal with strings and nil). Such timing tests could be misleading. Therefore, measurements of new collections should take the effect of megamorphic message sends into consideration. For the sake of illustration, contrast the following two expressions:

| c |
c := Object withAllSubclasses reverse.
Time millisecondsToRun: [1000 timesRepeat: [c do: [:x | x yourself]]]

| c |
c := Object withAllSubclasses reverse.
c atAllPut: c first.
Time millisecondsToRun: [1000 timesRepeat: [c do: [:x | x yourself]]]

On 1/11/12 13:27, Paul Baumann wrote:
> [snip]
FYI, if it helps, we ship a pluggable, reasonably thread-safe implementation of a hash bucketed set in Parcels/UniqueObjectTable.pcl. There is also an example parcel showing how to make all bytecode arrays unique with it.

On 1/11/2012 7:31 PM, Andres Valloud wrote:
> [snip]
>> >> " bitShift: 1" >> >> #(0.333333 1.0 #(3->5461)) >> >> #(1.74998 1.14287 #(1->12287 2->2048)) >> >> It is resistant to an algorithmic complexity attack when the maximum >> hash value is greater than the number of buckets (which would always be >> the case with strings). There is still a even distribution when there >> are fewer buckets than hash values. When there are more buckets than >> hash values then the distribution is into 14% more index positions than >> there were hash values. A meaningful variance is created and average >> bucket size was lower. This trick increases the variance of all the hash >> values such that VW hashing is more efficient even when constrained to >> the traditional VW 14-bit identityHash value. >> >> I've added the implementation to the latest version (currently 1.6) of >> ICXVwHashTuning. If you want to give it a try then load version 1.6 (or >> later) and then read the Set class>>hashTuning_documentation method. >> >> Thanks for the kind words. >> >> Regards, >> >> Paul Baumann >> >> *From:*Steven Kelly [mailto:[hidden email]] >> *Sent:* Monday, January 09, 2012 05:02 >> *To:* Paul Baumann; VWNC >> *Subject:* RE: [vwnc] VW resistance to algorithmic complexity attacks? >> >> Thanks Paul, I always learn something new from your posts! I just >> spotted VW support Resolution 100401, which “implements Dictionaries >> using buckets to resolve hash collisions”. The resolution talks about >> improving the performance of finalization, which uses IdentityDictionary >> and is subject to the consecutive slot problem. It looks like the >> resolution just provides the new Dictionary classes, without actually >> changing existing code to use them. Hopefully still some use to others, >> and maybe a foundation to build on for a future solution integrated in >> VW – e.g. to replace IdentityDictionary. >> >> All the best, >> >> Steve >> >> *From:*[hidden email] [mailto:[hidden email]] *On >> Behalf Of *Paul Baumann >> *Sent:* 7. tammikuuta 2012 1:39 >> *To:* Reinout Heeck; 'VWNC' >> *Subject:* Re: [vwnc] VW resistance to algorithmic complexity attacks? >> >>>> I lack the time to parse this in relation to VW >> >> Sorry, but the explanation I'd give for how this relates to VW will take >> longer to parse... :) >> >> Hash collisions typically happen due to a combination of hash range and >> hash table size. The smallest of either of these typically determines >> the number of collisions; however, VW is different. >> >> The example in the document shows a hash->bucket design in which the >> hash value is used to find a sequence that is slowly searched. The >> attack is to add unequal objects of equal hash value to maximize the >> amount of linear search and growth in each hash bucket. An example of >> hash->bucket implementation in VW is the class-side behavior Symbol uses >> to intern strings. The trick would also be to minimize the hash variance >> of the data added to reduce the chance that a rehash will increase the >> size of the hash table and that distribution into buckets will not be >> improved once the hash table size is increased. >> >> VW hashed collections have logical hash buckets with search indexes that >> are determined by the size of the hash table. VW grows the ENTIRE hash >> table after an addition when #fullCheck notices there are less than 25% >> free slots. VW uses the hash value as a starting point for doing a >> linear search that is limited only by the discovery of a nil slot or >> equal object. 
>> VW is more affected by the distribution of empty slots
>> than it is by hash table size. VW needs to have the free slots because
>> without them the #find*OrNil: methods would have the cost of searching
>> an increasing percentage of the hash table as items are added. The free
>> slots serve as a stop that a separate physical bucket would normally
>> provide. This is more costly (like #fixCollisionsFrom:) than a physical
>> hash bucket design that can manage buckets individually. VW attempts to
>> expand the distribution range in methods like
>> #initialIndexFor:boundedBy:, but that would have little benefit when
>> there are many equal hash values for unequal objects. VW has the linear
>> search costs that the attack exploits, searches more than once for some
>> operations, has a larger scope of growth, and searches through objects
>> that would otherwise belong to another bucket. The VW implementation
>> would be more susceptible to a hash collision attack than the
>> hash->bucket design that the document describes how to exploit.
>>
>> "This artificially forces a highly saturated hash table, to simulate
>> the lookup cost alone that could be incurred with a hash collision attack."
>>
>> | normalSet normalResults slowerSet slowerResults notIncluded |
>> normalSet := Set new: Symbol tableSize.
>> notIncluded := Set new: Symbol tableSize.
>> Symbol tableSize to: 1 by: -1 do: [:bucketIndex |
>>     (bucketIndex < 6 ifTrue: [notIncluded] ifFalse: [normalSet])
>>         addAll: (Symbol table at: bucketIndex)].
>> notIncluded := notIncluded asArray.
>> normalResults := Array
>>     with: normalSet basicSize
>>     with: normalSet size
>>     with: (normalSet basicSize / normalSet size) asFloat
>>     with: (Time millisecondsToRun:
>>         [10 timesRepeat: [notIncluded do: [:ea | normalSet includes: ea]]]).
>> slowerSet := normalSet copyEmpty: normalSet size + 1.
>> normalSet do: [:each | slowerSet noCheckAdd: each].
>> slowerResults := Array
>>     with: slowerSet basicSize
>>     with: slowerSet size
>>     with: (slowerSet basicSize / slowerSet size) asFloat
>>     with: (Time millisecondsToRun:
>>         [10 timesRepeat: [notIncluded do: [:ea | slowerSet includes: ea]]]).
>> Array
>>     with: normalResults
>>     with: slowerResults
>>
>> #(#(280277 128057 2.18869 1) #(128431 128057 1.00292 1936))
>>
>> The performance difference was 1 ms vs. 1936 ms.
>>
>> A hash collision attack would cause inefficient searches through a long
>> series of overlapping "buckets", much like this code simulates. Because
>> VW overlaps logical buckets, a search that happens in a logical bucket
>> position after an attacked bucket position would experience some of the
>> cost that would otherwise be isolated to just one physical bucket. It
>> wouldn't matter how many free slots VW grows the collection for, because
>> the free slots would be clustered far from the starting points of
>> searches for colliding items.
>>
>> Others have mentioned that the 28-bit computed hash range from a VW
>> string may be of sufficient range to make it extremely difficult to do a
>> string-based collision attack. I agree it would be difficult to simulate
>> an actual hash collision attack using a hashed collection of strings
>> with 28 bits of hash value range; however, I suspect the assumption is
>> being made that the hash values from other languages have less range
>> than 28 bits. I couldn't say one way or another.
>> If 28 bits is enough
>> range to be considered attack resistant, then other languages could
>> compute a 28-bit hash value from a string just like VW does.
>>
>> Only a small variance of the hash value is needed to reduce a
>> distributed "meet in the middle" collision attack. VW gets some
>> automatic variance from the size of the hash table. A physical bucket
>> design doesn't need to adjust the size of the hash table as frequently
>> as VW collections do, but VW's more frequent growth of the hash table
>> can reduce some of the computed collisions from
>> #initialIndexFor:boundedBy:.
>>
>> Assuming that VW strings' 28-bit hash range avoids the problem... The
>> hash values for non-strings are not externally controllable in a way
>> that could cause a hash collision through passing data. For some, that
>> will be the answer they are looking for, case closed. But hold on...
>> While VW is generally safe from direct external attack by data, the
>> performance problems that the attack hopes to exploit are so common in
>> VW that they are accepted as normal. Most objects in VW default to the
>> 14-bit (16,383) range of values (#maximumIdentityHashValue) from
>> #identityHash. This narrow range causes many smaller, distributed hash
>> collisions. Many now recognize that the inefficiency of VW hashed
>> collections becomes more pronounced once they contain more than 16K
>> items. Few recognize that the collision costs exist for smaller
>> collections--because the slowness is pervasive and consistent.
>>
>> In a sense, VW comes pre-attacked with the performance impact of the
>> "meet in the middle" approach of the document. If VW's 28-bit hash range
>> is exploitable, then the design VW uses for hashed collections would
>> likely crash harder and faster than a design that doesn't have the
>> problems I've described. The main difference is that VW would crash
>> while acquiring more memory to grow collections, while others might just
>> get very slow but linger on.
>>
>> The limited range that VW returns for #identityHash is one of the
>> biggest performance problems that 32-bit VW has, because it causes poor
>> distribution into the logical hash buckets. There are many ways to fix
>> the problem by changing the collections or computations. One way to
>> avoid the problem is to use a 64-bit VM that can offer a greater
>> #maximumIdentityHashValue. Upgrading to 64-bit is easier than
>> reimplementing VW's hashed collections, so it is the option Cincom would
>> encourage.
>>
>> Regards,
>>
>> Paul Baumann
>>
>> *From:* [hidden email] [mailto:[hidden email]] *On Behalf Of* Reinout Heeck
>> *Sent:* Wednesday, January 04, 2012 11:00
>> *To:* 'VWNC'
>> *Subject:* [vwnc] VW resistance to algorithmic complexity attacks?
>>
>>
>> I lack the time to parse this in relation to VW, so I throw out some
>> questions to more knowledgeable people here.
>>
>> "Efficient Denial of Service Attacks on Web Application Platforms"
>> http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>>
>> The short version:
>> if you hand a POST request to a web server, it will typically keep the
>> names of the POST variables in a hashed collection. If you know the
>> hashing algorithm, you can craft these names such that all of them have
>> the same hash. Put several million onto a single POST request and the
>> web server will churn for minutes on a single request.
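>>
>> As a toy illustration of that churn (EqualHashKey is a hypothetical
>> helper, not a real class: imagine unequal objects that all answer the
>> same value from #hash, standing in for the attacker-crafted names):
>>
>> | s |
>> s := Set new.
>> Time millisecondsToRun:
>>     [1 to: 10000 do: [:i | s add: (EqualHashKey value: i)]]
>>
>> "Every add must probe linearly past all earlier colliding elements
>> before finding a free slot, so the total insertion cost grows
>> quadratically with the number of names."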
>>
>> This DoS attack can also target other hashed collections, like element
>> names in a parsed XML document.
>>
>> Question 1: is the VW hashing algorithm for strings structured such that
>> it is cheap to generate millions of strings with the same hash value?
>>
>> Question 2: can we generically counter such an attack? The paper
>> proposes random hash seeding, which seems inappropriate for image-based
>> development. Should we consider using a new hash function that is
>> (cryptographically) one-way?
>>
>>
>> R
>> -
>>
>> --
>>
>> Soops b.v. <http://www.soops.nl/> *Reinout Heeck, Sr. Software Engineer*
>>
>> *Soops - Specialists in Object Technology*
> _______________________________________________
> vwnc mailing list
> [hidden email]
> http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
>
vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc |
In reply to this post by Reinout Heeck-2
Is it only me, or is the IRC server (irc.parcplace.net) down?

-----------------
Benoit St-Jean
Yahoo! Messenger: bstjean
Blog: endormitoire.wordpress.com
A standpoint is an intellectual horizon of radius zero. (Albert Einstein)
_______________________________________________
vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc |
Georg

Georg Heeg eK, Dortmund und Köthen, HR Dortmund A 12812
Wallstraße 22, 06366 Köthen
Tel. +49-3496-214328, Fax +49-3496-214712

From: [hidden email] [mailto:[hidden email]] On Behalf Of Benoit St-Jean

Is it only me, or is the IRC server (irc.parcplace.net) down?

-----------------
_______________________________________________
vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc |
In reply to this post by Benoit St-Jean
We're looking into it; please be patient.
On 5/31/13 3:26 PM, Benoit St-Jean wrote:
> Is it only me, or is the IRC server (irc.parcplace.net) down?
> -----------------
> Benoit St-Jean
> Yahoo! Messenger: bstjean
> Blog: endormitoire.wordpress.com
> A standpoint is an intellectual horizon of radius zero.
> (Albert Einstein)
_______________________________________________
vwnc mailing list [hidden email] http://lists.cs.uiuc.edu/mailman/listinfo/vwnc |