Hi all,
I've spent a frustrating day trying to find a library that will give me a reliable, fast file id. I'm managing a directory of images for a web-server with about 500 jpegs. I thought a crc32 or MD5 hash that took the file path as a parameter would be the go.

Anyway, I'm trying http://www.esquadro.com.br/md5server.zip, which doesn't really exist as a url anymore. It works fine for slow usage, but when I throw it a #collect: on 400 files, I get a memory error, which looks like:

    0x12A000, IP 0x10013A7A (C:\Program Files\Common Files\Object Arts\Dolphin Smalltalk 5.1\DolphinVM005.dll)')
    ProcessorScheduler>>gpFault:
    [] in ProcessorScheduler>>vmi:list:no:with:
    BlockClosure>>ifCurtailed:
    ProcessorScheduler>>vmi:list:no:with:
    BSTR>>at:
    BSTR(ExternalAddress)>>replaceBytesOf:from:to:startingAt:
    UnicodeString class>>fromAddress:length:
    BSTR>>asUnicodeString
    BSTR>>asString
    BSTR>>value
    BSTR(DWORDBytes)>>asObject
    MD5IaaMD5>>md5File:
    [] in File class>>md5Find:
    [] in OrderedCollection(SequenceableCollection)>>collect:
    OrderedCollection>>from:to:keysAndValuesDo:
    OrderedCollection(SequenceableCollection)>>from:keysAndValuesDo:
    OrderedCollection(SequenceableCollection)>>keysAndValuesDo:
    OrderedCollection(SequenceableCollection)>>collect:
    File class>>md5Find:
    UndefinedObject>>{unbound}doIt
    ---------------------------END STACK DUMP---------------

It seems to me that there is a lot of memory churn caused by allocating and de-allocating Strings and their close relatives. The main interface to the algorithm is below. Can I safely reuse the instance of BSTR as a buffer in an instance variable, or does a new one need to be allocated each time? Is it a fundamental problem with the external library? I've tried several today and they are all too complex or broken.

    md5File: sA
        "Answer the <BSTR> result of invoking the MD5File() method of the COM object.
        Helpstring: MD5 over a file content"

        | answer |
        answer := BSTR new.
        self MD5File: sA sD: answer.
        ^answer asObject

Thanks,
--Peter Goodall
peterg wrote:
> Hi all,
>
> I've spent a frustrating day trying to find a library which will give
> me reliable fast file id. I'm managing a directory of images for a
> web-server with about 500 jpegs. I thought a crc32 or md-5 hash which
> took the file path as a parameter would be the go.

I can't help you with your Active-X problem, but do you really need to go that route? On my laptop this loop:

    dir := ...wherever...
    hashes := LookupTable new.
    File forAll: '*.jpeg' in: dir do:
        [:each || name stream hash |
        name := each path.
        stream := FileStream read: each path text: false.
        hash := (SecureHashAlgorithm new) hashStream: stream.
        stream close.
        hashes at: name put: hash].

will compute SHA hashes of around 1200 files totalling 70-odd MBytes in about 6 seconds.

FWIW, on the same machine, doing the same loop with a different and much simpler/faster 64-bit hash, implemented in a DLL, took 1.6 seconds. That hash isn't a crc (crcs are mostly about error *detection*), nor is it a crypto-quality hash like SHA (which is generally considered superior to MD5), but it is very fast, and so does give an indication that you are unlikely to get *much* faster than by using Dolphin's SHA implementation.

-- chris
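For a batch of several hundred files it may be worth making sure each stream is closed even if hashing one file fails part-way. The following is only a sketch of that idea, not a tested implementation: it reuses the selectors from the loop above (File forAll:in:do:, FileStream read:text:, hashStream:) and adds the standard ensure:. The directory path is a hypothetical placeholder.

```smalltalk
"Sketch only: same selectors as the loop above, plus ensure: to guarantee the close."
| dir hashes |
dir := 'C:\images'.    "hypothetical directory; substitute your own"
hashes := LookupTable new.
File forAll: '*.jpeg' in: dir do: [:each |
    | stream |
    stream := FileStream read: each path text: false.
    "ensure: runs its block whether the hashing completes or raises an error"
    [hashes at: each path put: (SecureHashAlgorithm new hashStream: stream)]
        ensure: [stream close]].
hashes
```

Evaluating this in a workspace should leave hashes mapping each file path to its hash, much as in the original loop; the only intended difference is that a failure on one file does not leave its stream open.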
Wow thanks Chris!
No substitute for knowing the image is there? Most Grateful.

--Peter Goodall

"Chris Uppal" <[hidden email]> wrote in message news:3f5dea4e$0$33806$[hidden email]...
> peterg wrote:
[...]
> dir := ...wherever...
> hashes := LookupTable new.
> File forAll: '*.jpeg' in: dir do:
>     [:each || name stream hash |
>     name := each path.
>     stream := FileStream read: each path text: false.
>     hash := (SecureHashAlgorithm new) hashStream: stream.
>     stream close.
>     hashes at: name put: hash].
>
> will compute SHA hashes of around 1200 files totalling 70-odd MBytes in
> about 6 seconds.
>
> FWIW, on the same machine doing the same loop with a different and much
> simpler/faster 64-bit hash, implemented in a DLL, took 1.6 seconds. That hash
> isn't a crc (crcs are mostly about error *detection*) nor is it a
> crypto-quality hash like SHA (which is generally considered superior to MD5),
> but it is very fast, and so does give an indication that you are unlikely to
> get *much* faster than by using Dolphin's SHA implementation.
>
> -- chris