Performance of the MD5 hash

Mark Pirogovsky
Hello All,

I am verifying files with the MD5 hash algorithm, which I use for
downloads and updates.

I added the following method to the Filename class:

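md5Sig
    "Answer the MD5 digest of the receiver's contents, printed in hexadecimal."
    | rs md5 |
    "Open the file as a binary stream and make sure it is closed afterwards."
    rs := self readStream binary.
    [md5 := Security.MD5 new integerHashStream: rs] ensure: [rs close].
    ^md5 printStringRadix: 16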


Everything works, but it is very slow.

For example, running it on a 14 MB image file takes about 30 to 35
seconds.

A Win32 executable written in C takes about 400 ms on the same file on
the same PC, a difference of almost two orders of magnitude.

Is there anything that can be done to speed up MD5 hashing in ST?

--Mark Pirogovsky

Re: Performance of the MD5 hash

kobetic
That sounds about right. I'm reasonably confident that we are not wasting too much time at the Smalltalk level, i.e. I doubt you can speed it up by an order of magnitude just by tweaking the Smalltalk code. If you manage it, please let us know :-). The main issue is that hashes and symmetric ciphers are often specifically optimized for register-based bit-twiddling, and you can hardly beat hand-optimized assembler there (which is quite likely at the heart of that C code). These are very good examples to use when you want to prove that "Smalltalk is slower than C" :-). Moreover, we pay a hefty price at the Smalltalk level for some of the safety it gives us. Just try iterating over that stream byte by byte without doing anything else; I suspect it will still be slower than the complete hash computation in C.
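To check how much of the cost is plain stream access rather than hashing, a workspace snippet along these lines could time a do-nothing byte-by-byte pass against the full md5Sig call. It is only a rough sketch: the file name 'visual.im' is a hypothetical placeholder, and it assumes Mark's md5Sig method from the original post is installed in Filename.

```smalltalk
"Rough timing sketch. 'visual.im' is a hypothetical placeholder file name."
| file rs readMs hashMs |
file := 'visual.im' asFilename.
rs := file readStream binary.
"Time a pass that only reads each byte and discards it."
readMs := [Time millisecondsToRun: [[rs atEnd] whileFalse: [rs next]]]
        ensure: [rs close].
"Time the full MD5 computation using the md5Sig method from the original post."
hashMs := Time millisecondsToRun: [file md5Sig].
Transcript
    show: 'read only: ', readMs printString, ' ms';
    cr;
    show: 'MD5 (Smalltalk): ', hashMs printString, ' ms';
    cr.
```

If the read-only figure is already a large fraction of the hashing time, that supports the point that stream overhead, not just the MD5 arithmetic, dominates.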

Anyway, this is largely why we started playing with an interface to the OpenSSL crypto library. You can find the resulting parcel in preview/security starting with VW7.4. It contains drop-in replacements for the Smalltalk-based implementations of most of the offending algorithms. Assuming you have OpenSSL installed and accessible, you should be able to just replace Security.MD5 with Security.OpenSSL.MD5, and the rest should work the same.
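Following that suggestion, the OpenSSL-backed version of the method should only need the class reference changed. This is a minimal sketch, assuming the OpenSSL parcel from preview/security is loaded (hash support arrived in VW7.4.1), the OpenSSL library is accessible, and Security.OpenSSL.MD5 really does answer the same integerHashStream: protocol as Security.MD5. The selector md5SigOpenSSL is a hypothetical name chosen so both versions can coexist.

```smalltalk
md5SigOpenSSL
    "Hypothetical OpenSSL-backed variant of md5Sig.
     Assumes Security.OpenSSL.MD5 is a drop-in replacement for Security.MD5."
    | rs md5 |
    rs := self readStream binary.
    [md5 := Security.OpenSSL.MD5 new integerHashStream: rs] ensure: [rs close].
    ^md5 printStringRadix: 16
```

Comparing its answer with md5Sig on the same file is a quick way to confirm that both implementations agree before switching over.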

I blogged about this when 7.4 was coming out: http://www.cincomsmalltalk.com/userblogs/cst/blogView?showComments=true&printTitle=VW_7.4_Spoilers_-_OpenSSL&entry=3311257224

Since then we've added support for the hash algorithms (VW7.4.1) and even RSA and DSA (available with VW7.5), although with the public-key algorithms we are not nearly as far behind (approximately 3-4 times slower, last time I measured).

HTH,

Martin
