When reading any object off the hard drive (represented as the 'byteArray' of a single MaObjectBuffer), Magma always reads 280 bytes. Since the #physicalSize is in the object header, it is then able to check the contents of the buffer to determine the size of the whole object and, if necessary, read more bytes in order to get the whole object. See MaObjectFiler>>#read:bytesInto:and:startingAt:filePosition: for this behavior.

280 bytes is enough for about 40 pointer references, allowing most objects to be read in just one disk access. I refer to it as the #trackSize, to remind me that it is supposed to be how many bytes I think the HD can read in one operation without overrunning its own internal buffers and becoming inefficient. I was curious whether this number is still optimal in 2010, so I ran the following script:

-----------
| stats random |
stats := OrderedCollection new.
random := Random new.
(FileDirectory on: '/home/cmm/test3/cube.001.magma')
    fileNamed: 'objects.2.dat'
    do: [ : stream | | ba fileSize |
        ba := ByteArray new: 10000.
        fileSize := stream size.
        100 to: 10000 by: 100 do: [ : n |
            stream position: 0.
            "Time reads of n bytes, each from a random file position."
            Transcript cr; show: (stats add: n -> ([ stream
                maRead: n "bytes"
                bytesFromPosition: 1
                of: ba
                atFilePosition: (random nextInt: fileSize) ] bench)) ] ].
stats
------------

Note that "objects.2.dat" is a real Magma file, 1.8GB in size. The goal of the script is to bench how fast Squeak can read object buffers off the hard drive when we obviously won't get many (if any) HD cache hits.

I have a cheap Western Digital Caviar HD, which produced the following output:

100->'119 per second.'
200->'98.5 per second.'
300->'106 per second.'
400->'106 per second.'
500->'101 per second.'
600->'102 per second.'
700->'99.9 per second.'
800->'103 per second.'
900->'104 per second.'
1000->'99 per second.'
1100->'97.9 per second.'
1200->'104 per second.'
1300->'111 per second.'
1400->'99.8 per second.'
1500->'107 per second.'
1600->'108 per second.'
1700->'95.6 per second.'
1800->'103 per second.'
1900->'108 per second.'
2000->'102 per second.'
2100->'103 per second.'
2200->'107 per second.'
...
3000->'98.7 per second.'
4000->'102 per second.'
5000->'106 per second.'
6000->'104 per second.'
7000->'101 per second.'
8000->'102 per second.'
9000->'102 per second.'
10000->'107 per second.'

Out of curiosity, I also modified the script to read very small buffers from the HD. Here are the results:

4->'137 per second.'
12->'146 per second.'
20->'154 per second.'
28->'143 per second.'

(The HD busy light was solid ON during the test.)

At first I was puzzled, because Magma has demonstrated much faster objects-per-second read rates than these, even including materialization. What gives?

It's the HD buffering. Most of the time, objects are "clustered" closely together, so that reading one object causes the "next" object to be read to already be in the HD's buffer. Here's the same script, except reading mostly "sequentially" through the file instead of from a random location:

| stats random nextPos |
stats := OrderedCollection new.
random := Random new.
nextPos := 100.
(FileDirectory on: '/home/cmm/test3/cube.001.magma')
    fileNamed: 'objects.2.dat'
    do: [ : stream | | ba fileSize |
        ba := ByteArray new: 10000.
        fileSize := stream size.
        #(4 12 20 28 100 200 300 400 500) do: [ : n |
            stream position: 0.
            "Each read starts n+10 bytes past the previous one instead of at a random position."
            Transcript cr; show: (stats add: n -> ([ stream
                maRead: n "bytes"
                bytesFromPosition: 1
                of: ba
                atFilePosition: ("random nextInt: fileSize" (nextPos := nextPos + n + 10)) ] bench)) ] ].
stats

Now look at the results:

"Reading sequentially rather than at a random position."
4->'1,160,000 per second.'
12->'1,210,000 per second.'
20->'1,100,000 per second.'
28->'973,000 per second.'
...
100->'1,030,000 per second.'
200->'321,000 per second.'
300->'215,000 per second.'
400->'160,000 per second.'
500->'227,000 per second.'

Conclusions:

- Hard-disk seek is definitely a bottleneck with Magma, or any Squeak application that requires random access to a file.
- When objects are clustered closely together, read performance can be dramatically better.
- HDs with fast seek times, such as newer solid-state drives, might perform dramatically better.
- I should consider reducing the trackSize from 280 bytes to ~100 bytes (or make it customizable), because the rate drops off quickly beyond that, and even when a second read is required it could still be faster than an initial read.

- Chris
_______________________________________________
Magma mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/magma
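
As a rough cross-check of the figures in the post above, the measured rates can be converted into time per read and effective throughput. The workspace sketch below does that for three of the reported data points. The `report` block and its labels are made up for illustration and are not part of Magma, and the sketch assumes a Squeak image in which Number>>printShowingDecimalPlaces: is available.

```smalltalk
"Hypothetical helper, not part of Magma: turn a measured rate into
 milliseconds per read and kilobytes per second."
| report |
report := [ : label : bytesPerRead : readsPerSecond |
	Transcript cr;
		show: label;
		show: ': ';
		show: ((1000.0 / readsPerSecond) printShowingDecimalPlaces: 4);
		show: ' ms per read, ';
		show: ((bytesPerRead * readsPerSecond / 1000.0) printShowingDecimalPlaces: 1);
		show: ' KB per second' ].

"Data points copied from the results above."
report value: 'random, 100 bytes' value: 100 value: 119.
report value: 'random, 10000 bytes' value: 10000 value: 107.
report value: 'sequential, 100 bytes' value: 100 value: 1030000.
```

The random reads work out to roughly 8 to 10 ms each whether 100 or 10000 bytes are requested, which is what one would expect if seek time rather than transfer size dominates the cost.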
Very interesting.
Thanks for sharing it.

Facu
On Wed, Nov 24, 2010 at 2:00 PM, Chris Muller <[hidden email]> wrote:
> When reading any object off the hard drive (represented as the ...
In reply to this post by Chris Muller-4
Yep, it's called MagmaCompressor.
But still, if you have a multi-gigabyte repository and an HD buffer of, what, a few K (or even a client cache of 100MB), there is no way around needing to read off the HD.

On Sun, Nov 28, 2010 at 8:31 PM, Elliot Finley <[hidden email]> wrote:
> maybe a defrag utility for Magma that places all objects in a collection
> close together on the disk?
>
> On Wed, Nov 24, 2010 at 10:00 AM, Chris Muller <[hidden email]> wrote:
>> [...]