On Thu, 15 Apr 2010, Juan Vuletich wrote:
> This is an updated version. It includes Levente's suggestion, and Yoshiki's
> and Andreas' alternatives as well, although disabled. All three options now
> work for gray level images with other bit depths besides 1 (I only tested 1
> and 8). I also fixed transparent pixels for gray level images (it was
> completely broken).

Nice, I ran the following benchmark with all three (+1) versions:

Smalltalk garbageCollect.
(1 to: 5) collect: [ :runs |
	[ ImageReadWriter formFromFileNamed: 'test.png' ] timeToRun ].

test.png is Ross' test image which is pretty large.
The results:

#Andreas -> #(37 35 36 37 36).
#Juan -> #(100 99 103 100 101).
#Juan2 -> #(75 74 75 75 76).
#Yoshiki -> #(38 38 39 38 38).

#Juan used the original method from your changeset, the others used the
attached version. It includes three enhancements:
- doesn't create a blitter when it's not necessary, this affects all
  benchmarks
- holds the bitmap in a variable instead of accessing via #bits
- uses #bitShift: instead of #<<

Levente

> Cheers,
> Juan Vuletich

Attachment: PNGReadWriter-copyPixelsGray.st (2K)
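As a rough illustration of the last two enhancements in the list above, here is a minimal workspace sketch. It is not the code from the attached changeset: the blank 512@512 Form, the loop body, and the temporaries `viaAccessor` and `viaTemp` are made up for the example, and whether #<< really costs more than #bitShift: depends on how the image at hand implements it (the assumption here is that #<< is an ordinary method that forwards to #bitShift:).

```smalltalk
"Illustrative sketch only; this is not the code from the attached changeset."
| form bits sum viaAccessor viaTemp |
form := Form extent: 512@512 depth: 8.

"Pattern 1: send #bits inside the loop and shift with #<<."
viaAccessor := [
	sum := 0.
	1 to: form bits size do: [ :i |
		sum := sum + (((form bits at: i) << 1) bitAnd: 16rFF) ] ] timeToRun.

"Pattern 2: fetch the Bitmap once, keep it in a temporary, shift with #bitShift:."
bits := form bits.
viaTemp := [
	sum := 0.
	1 to: bits size do: [ :i |
		sum := sum + (((bits at: i) bitShift: 1) bitAnd: 16rFF) ] ] timeToRun.

"Both timings in milliseconds, first pattern first."
{ viaAccessor. viaTemp }
```

Evaluating this with "print it" shows the two timings side by side. The absolute numbers will vary by machine and VM, so only the ratio between them is meaningful.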
Levente Uzonyi wrote:
> On Thu, 15 Apr 2010, Juan Vuletich wrote:
>
>> This is an updated version. It includes Levente's suggestion, and
>> Yoshiki's and Andreas' alternatives as well, although disabled. All
>> three options now work for gray level images with other bit depths
>> besides 1 (I only tested 1 and 8). I also fixed transparent pixels
>> for gray level images (it was completely broken).
>
> Nice, I ran the following benchmark with all three (+1) versions:
>
> Smalltalk garbageCollect.
> (1 to: 5) collect: [ :runs |
> [ ImageReadWriter formFromFileNamed: 'test.png' ] timeToRun ].
>
> test.png is Ross' test image which is pretty large.
> The results:
>
> #Andreas -> #(37 35 36 37 36).
> #Juan -> #(100 99 103 100 101).
> #Juan2 -> #(75 74 75 75 76).
> #Yoshiki -> #(38 38 39 38 38).
>
> #Juan used the original method from your changeset, the others used
> the attached version. It includes three enhancements:
> - doesn't create a blitter when it's not necessary, this affects all
> benchmarks
> - holds the bitmap in a variable instead of accessing via #bits
> - uses #bitShift: instead of #<<
>
>
> Levente
>

Cool! Will integrate your enhancements into Cuis ASAP!

Cheers,
Juan Vuletich
Andreas Raab wrote:
> On 4/14/2010 1:08 PM, Juan Vuletich wrote:
>> Profiling is indeed your friend.
>> There is some serious inefficiency there. Quickly hacking this (warning:
>> will only work for 1bpp):
>
> [... snip ...]
>
>> gives over 30x speed increase (from 10 seconds down to 310 mSec) on my
>> system. This is not a solution, just some food for thought.
>
> Heh, heh. Very good. But now I'm gonna get serious ...
>
> <pokerface on>
>
> I see your 30x improvement and raise you another ... 6x for a total of
> 200x speedup (from 10secs to 50 msecs). There! Take that! :-)
>
> (but if Igor pulls out some asm I may have to fold :-)
>
> <pokerface off>
>
>
> Cheers,
> - Andreas

So, let's play!

<pokerface on>

Mh... Hard challenge. Let's see. I take your technique, but save all your
objects in new instance variables. And I go from 67 msecs down to
31 msecs. A 110% speed increase over yours! Who wins now? :)

<pokerface off>

Actually, my Smalltalk version is not that bad in many situations. For
example, on 'plogo.png' taken from
http://palmzlib.sourceforge.net/images/dir.html , and evaluating:

(1 to: 100) collect: [ :i |
	Smalltalk garbageCollect.
	[ 1 timesRepeat: [ Form fromFileNamed: 'test.png' ]] timeToRun ]

Yours gives an array with:
self max -> 117
self min -> 107
self average*1.0 -> 108.13

Yoshiki's gives:
self max -> 118
self min -> 108
self average*1.0 -> 110.63

And mine gives:
self max -> 109
self min -> 95
self average*1.0 -> 97.68

So, the bitblt technique only makes sense if we avoid creating the objects
for each scanline, as this is expensive if scanlines are small. This
variant gives:
self max -> 103
self min -> 91
self average * 1.0 -> 95.16

The gain is not as big as in test.png, because this one has fewer
scanlines. But it is still the winner.

Cheers,
Juan Vuletich
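The point above about not creating objects for each scanline can be sketched in isolation. This is only a toy workspace example under stated assumptions: a 64-byte ByteArray stands in for whatever helper objects the real reader creates per scanline, and the scanline count and the temporaries `perLine` and `reused` are made up for the illustration. It does not reproduce the PNGReadWriter code.

```smalltalk
"Illustrative sketch only; ByteArray stands in for the per-scanline helper objects."
| scanlines buffer perLine reused |
scanlines := 200000.

"Allocate a fresh helper object for every scanline."
perLine := [
	1 to: scanlines do: [ :y |
		buffer := ByteArray new: 64.
		buffer at: 1 put: y \\ 256 ] ] timeToRun.

"Allocate once and reuse the same object for every scanline."
buffer := ByteArray new: 64.
reused := [
	1 to: scanlines do: [ :y |
		buffer at: 1 put: y \\ 256 ] ] timeToRun.

"Both timings in milliseconds, per-scanline allocation first."
{ perLine. reused }
```

In a real reader the reused object would live in an instance variable, as described above, so that it survives across calls to the per-scanline method. The relative cost of the allocation grows as the work done per scanline shrinks, which matches the observation that small scanlines suffer most.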
On 4/15/10 8:29 AM, Igor Stasenko wrote:
> On 15 April 2010 05:54, Andreas Raab <[hidden email]> wrote:
>> On 4/14/2010 1:08 PM, Juan Vuletich wrote:
>>> Profiling is indeed your friend.
>>> There is some serious inefficiency there. Quickly hacking this (warning:
>>> will only work for 1bpp):
>> [... snip ...]
>>
>>> gives over 30x speed increase (from 10 seconds down to 310 mSec) on my
>>> system. This is not a solution, just some food for thought.
>> Heh, heh. Very good. But now I'm gonna get serious ...
>>
>> <pokerface on>
>>
>> I see your 30x improvement and raise you another ... 6x for a total of 200x
>> speedup (from 10secs to 50 msecs). There! Take that! :-)
>>
>> (but if Igor pulls out some asm I may have to fold :-)
>>
> Yeah.. one could use MMX/SSE/SIMD instructions which will put on the
> knees anything you can write
> in C, not mentioning smalltalk :)

And now things come full circle. I recall reading an ancient article about
someone who used bitblt as a hardware accelerated matrix multiply (or
somesuch) back in the day.

Lawson
LOL I had been searching for a message by name in my mail app and forgot
to reset the sorting to "by date". That's a rather old thread, isn't it...

> And now things come full circle. I recall reading an ancient article
> about someone who used bitblt as a hardware accelerated matrix
> multiply (or somesuch) back in the day.
>
> Lawson