BitBlt alpha blend rules broken?

timrowledge
The guy that is hacking BitBlt for ARM for me is having all sorts of fun and making good progress. I even have a VM working with an early version and am currently in the process of testing the results.

He's started on some of the more….curious… corners and right now is of the opinion that alphaBlend is getting its maths wrong. I'll just quote him rather than paraphrasing;

> "The next combinationRule I've tackled is alphaBlend. This is clearly only
> applicable to 32bpp source and destination images. Despite what the
> comments in the original source say, this *does* calculate the alpha
> component of the output, rather than zero it. In fact, it's using the
> formula:
>
> da' =    sa + (1-sa).da
> dr' = sa.sr + (1-sa).dr
> dg' = sa.sg + (1-sa).dg
> db' = sa.sb + (1-sa).db
>
> where sa = source alpha, etc.
>
> This is actually incorrect maths for an alpha blend operation with non-
> premultiplied colours, as appears to be the intent. For the RGB colour
> components, the contribution from the first term is da' times too large,
> and the contribution from the second term is da'/da times too large. In
> other words, you only get the correct results if the destination image
> was fully opaque (which implies that the result is likewise).
>
> Nevertheless, I suspect we'd run into problems if we attempted to fix
> this, so I'll endeavour to replicate the behaviour, bugs and all.
>
> Next issue was the fact that for some crazy reason, they'd decided to
> renormalise the colour components after the multiplication by doing a
> divide by 255, rounding to +infinity. This is neither the "best" approach
> (arguably round to nearest with rounding half-values to odd or even would
> be the best), nor is it particularly efficient to implement - divisions
> rarely are. The current code also has a completely pointless bitwise AND
> with 0xFF following the division, since the inputs to the division cannot
> exceed 0xFF*0xFF to begin with."

We're talking about BitBltSimulation>>alphaBlend:with: in this case. It's very old code.
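
To make the quoted objection concrete, here is a minimal floating-point sketch (Python, not the actual BitBlt source) that compares the formula as quoted with the usual "over" operator for non-premultiplied colours. The function names `blend_as_quoted` and `blend_over` are made up for illustration, and every component is assumed to be normalised to the range 0..1.

```python
def blend_as_quoted(sa, sc, da, dc):
    """Alpha and one colour channel, per the formula quoted above."""
    out_a = sa + (1 - sa) * da
    out_c = sa * sc + (1 - sa) * dc
    return out_a, out_c


def blend_over(sa, sc, da, dc):
    """Usual 'over' operator for non-premultiplied colour (assumed reference)."""
    out_a = sa + (1 - sa) * da
    if out_a == 0:
        return 0.0, 0.0  # fully transparent result: colour is arbitrary
    out_c = (sa * sc + (1 - sa) * da * dc) / out_a
    return out_a, out_c


# Opaque destination (da = 1): the two formulas give the same colour.
print(blend_as_quoted(0.5, 1.0, 1.0, 0.0))
print(blend_over(0.5, 1.0, 1.0, 0.0))

# Half-transparent destination (da = 0.5): same alpha, different colour.
print(blend_as_quoted(0.5, 1.0, 0.5, 0.0))
print(blend_over(0.5, 1.0, 0.5, 0.0))
```

With an opaque destination the two agree, which matches the remark that the results are only correct when the destination image is fully opaque.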

I'm not into colour blending maths etc and I can't see why we're using 255 instead of 256 (which would allow use of some ARM pixel engine instructions) nor quite what the intent is here. Thoughts, anyone?

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: IAM: Increase Amperage Above Maximum




Re: BitBlt alpha blend rules broken?

J. Vuletich (mail lists)
Hi Tim,

Quoting tim Rowledge <[hidden email]>:

> The guy that is hacking BitBlt for ARM for me is having all sorts of  
> fun and making good progress. I even have a VM working with an early  
> version and am currently in the process of testing the results.
>
> He's started on some of the more….curious… corners and right now is  
> of the opinion that alphaBlend is getting its maths wrong. I'll just  
> quote him rather than paraphrasing;
>
>> "The next combinationRule I've tackled is alphaBlend. This is clearly only
>> applicable to 32bpp source and destination images. Despite what the
>> comments in the original source say, this *does* calculate the alpha
>> component of the output, rather than zero it.

We're talking about BitBltSimulation >>alphaBlend:with: 'ar (auto  
pragmas 12/08) 4/6/2003 18:52'.
In the original version (with no signature or timestamp, as in  
SqueakV2.sources) it did not set destination alpha, leaving it zero as  
the comment says. So, the comment is outdated.

>> In fact, it's using the
>> formula:
>>
>> da' =    sa + (1-sa).da
>> dr' = sa.sr + (1-sa).dr
>> dg' = sa.sg + (1-sa).dg
>> db' = sa.sb + (1-sa).db
>>
>> where sa = source alpha, etc.
>>
>> This is actually incorrect maths for an alpha blend operation with non-
>> premultiplied colours, as appears to be the intent. For the RGB colour
>> components, the contribution from the first term is da' times too large,
>> and the contribution from the second term is da'/da times too large. In
>> other words, you only get the correct results if the destination image
>> was fully opaque (which implies that the result is likewise).

You're right. In the old version, consistent with the comment, it was  
assumed that destination alpha was 1. When the code was modified to  
use and update da, the bug was introduced. Squeak 32-bit Forms have
non-premultiplied colors, as you assume.

>> Nevertheless, I suspect we'd run into problems if we attempted to fix
>> this, so I'll endeavour to replicate the behaviour, bugs and all.

I'd say most likely nobody is using this. The only effect would be  
more correct visuals, in any case. I think it is best to fix the bug.  
And the comment too!

>> Next issue was the fact that for some crazy reason, they'd decided to
>> renormalise the colour components after the multiplication by doing a
>> divide by 255, rounding to +infinity. This is neither the "best" approach
>> (arguably round to nearest with rounding half-values to odd or even would
>> be the best), nor is it particularly efficient to implement - divisions
>> rarely are.

There is a reason for this:

        255*255 >>8 = 254
        255*255/255 = 255
This means that if you divide by 256, you can no longer set a result  
pixel to 255. A property that it would be good to maintain is that if  
you blend any 2 images with sa = 0, then the destination must be  
unmodified, and if sa = 1, then destination must be an exact copy of  
source. This, of course, for every possible pixel value. The same is  
done, for instance, in #preMultiplyAlpha, #colorFromPixelValue:depth:,  
etc.
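
A quick way to see the difference is to blend one 8-bit channel both ways and check the two end cases for every pixel value. The sketch below is Python rather than the actual BitBlt code. `blend_div255` and `blend_shift8` are hypothetical helper names, alpha is taken as an integer in 0..255, and the round-up before dividing follows the description quoted earlier in this thread.

```python
def blend_div255(sa, s, d):
    """Blend one channel, dividing by 255 and rounding up."""
    return (sa * s + (255 - sa) * d + 254) // 255


def blend_shift8(sa, s, d):
    """Same blend, but 'dividing' with a shift by 8 (that is, by 256)."""
    return (sa * s + (255 - sa) * d) >> 8


# sa = 255 should copy the source; sa = 0 should leave the destination alone.
div_ok = all(
    blend_div255(255, v, w) == v and blend_div255(0, w, v) == v
    for v in range(256) for w in range(256)
)
shift_ok = all(
    blend_shift8(255, v, w) == v and blend_shift8(0, w, v) == v
    for v in range(256) for w in range(256)
)
print(div_ok, shift_ok)            # True False
print(blend_shift8(255, 255, 0))   # 254
```

The shift version loses one level on every non-zero value, so a fully opaque white source can never produce 255 in the destination.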

>> The current code also has a completely pointless bitwise AND
>> with 0xFF following the division, since the inputs to the division cannot
>> exceed 0xFF*0xFF to begin with."

Agreed. That bitAnd: looks pointless to me.

> We're talking about BitBltSimulation>>alphaBlend:with: in this case.
> It's very old code.
>
> I'm not into colour blending maths etc and I can't see why we're  
> using 255 instead of 256 (which would allow use of some ARM pixel  
> engine instructions) nor quite what the intent is here. Thoughts,  
> anyone?
>
> tim
> --
> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
> Strange OpCodes: IAM: Increase Amperage Above Maximum

Cheers,
Juan Vuletich



Re: BitBlt alpha blend rules broken?

timrowledge

On 22-05-2013, at 7:55 PM, Juan Vuletich (mail lists) <[hidden email]> wrote:

> Quoting tim Rowledge <[hidden email]>:
>
>> The guy that is hacking BitBlt for ARM for me is having all sorts of fun and making good progress. I even have a VM working with an early version and am currently in the process of testing the results.
>>
>> He's started on some of the more….curious… corners and right now is of the opinion that alphaBlend is getting its maths wrong. I'll just quote him rather than paraphrasing;
>>
>>> "The next combinationRule I've tackled is alphaBlend. This is clearly only
>>> applicable to 32bpp source and destination images. Despite what the
>>> comments in the original source say, this *does* calculate the alpha
>>> component of the output, rather than zero it.
>
> We're talking about BitBltSimulation >>alphaBlend:with: 'ar (auto pragmas 12/08) 4/6/2003 18:52'.
> In the original version (with no signature or timestamp, as in SqueakV2.sources) it did not set destination alpha, leaving it zero as the comment says. So, the comment is outdated.

Sounds very possible. Comments, eh? Who needs them! Why else would we call it code?


>
>>> In fact, it's using the
>>> formula:
>>>
>>> da' =    sa + (1-sa).da
>>> dr' = sa.sr + (1-sa).dr
>>> dg' = sa.sg + (1-sa).dg
>>> db' = sa.sb + (1-sa).db
>>>
>>> where sa = source alpha, etc.
>>>
>>> This is actually incorrect maths for an alpha blend operation with non-
>>> premultiplied colours, as appears to be the intent. For the RGB colour
>>> components, the contribution from the first term is da' times too large,
>>> and the contribution from the second term is da'/da times too large. In
>>> other words, you only get the correct results if the destination image
>>> was fully opaque (which implies that the result is likewise).
>
> You're right. In the old version, consistent with the comment, it was assumed that destination alpha was 1. When the code was modified to use and update da, the bug was introduced. Squeak 32-bit Forms have non-premultiplied colors, as you assume.

That may make Ben happy.


>
>>> Nevertheless, I suspect we'd run into problems if we attempted to fix
>>> this, so I'll endeavour to replicate the behaviour, bugs and all.
>
> I'd say most likely nobody is using this. The only effect would be more correct visuals, in any case. I think it is best to fix the bug. And the comment too!

Senders of #blend (the Form class name for rule 24) shows 12 results, many just test cases. One method seems to be fairly important (#placeEmbeddedObject:) but I have no idea how heavily used it might be.

>
>>> Next issue was the fact that for some crazy reason, they'd decided to
>>> renormalise the colour components after the multiplication by doing a
>>> divide by 255, rounding to +infinity. This is neither the "best" approach
>>> (arguably round to nearest with rounding half-values to odd or even would
>>> be the best), nor is it particularly efficient to implement - divisions
>>> rarely are.
>
> There is a reason for this:
>
> 255*255 >>8 = 254
> 255*255/255 = 255
> This means that if you divide by 256, you can no longer set a result pixel to 255. A property that it would be good to maintain is that if you blend any 2 images with sa = 0, then the destination must be unmodified, and if sa = 1, then destination must be an exact copy of source. This, of course, for every possible pixel value. The same is done, for instance, in #preMultiplyAlpha, #colorFromPixelValue:depth:, etc.


Ah, of course. Good idea on the sa test, too.

>
>>> The current code also has a completely pointless bitwise AND
>>> with 0xFF following the division, since the inputs to the division cannot
>>> exceed 0xFF*0xFF to begin with."
>
> Agreed. That bitAnd: looks pointless to me.

Thanks for the comments Juan. It all helps make Cuis faster on a Pi...


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
To define recursion, we must first define recursion. After, of course, defining recursion.




Re: BitBlt alpha blend rules broken?

J. Vuletich (mail lists)
Hi Tim,

(below)
...

>
>>> Next issue was the fact that for some crazy reason, they'd decided to
>>> renormalise the colour components after the multiplication by doing a
>>> divide by 255, rounding to +infinity. This is neither the "best" approach
>>> (arguably round to nearest with rounding half-values to odd or even would
>>> be the best), nor is it particularly efficient to implement - divisions
>>> rarely are.
>
> There is a reason for this:
>
> 255*255 >>8 = 254
> 255*255/255 = 255
> This means that if you divide by 256, you can no longer set a result  
> pixel to 255.
...

Yesterday I stumbled upon this old thread and remembered a trick to
avoid the division. I think it would be good to apply it in BitBlt; it
should improve performance on the Pi.

The idea is to approximate 256/255 by 257/256. So, instead of doing
x/255 (slow) or x>>8 (incorrect), you do x*257 >> 16, or better yet
(x<<8 + x) >> 16. Two shifts and an add instead of a division. The
error is less than 1/65535, and negligible for 8 bit output.
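
The approximation is easy to check exhaustively over the range a blend can produce (0 to 255*255). The sketch below is Python with made-up function names, and it compares against truncating division. It suggests that the plain form comes out one too low whenever x is an exact non-zero multiple of 255 (including 255*255 itself), and that adding 1 to x first makes it agree with x // 255 across that range. That adjustment is an assumption of this sketch and is not mentioned elsewhere in the thread.

```python
def div255_approx(x):
    """x/255 approximated as (x * 257) >> 16, using shifts and an add."""
    return ((x << 8) + x) >> 16


def div255_biased(x):
    """Same idea applied to x + 1 (assumed tweak, not from the thread)."""
    y = x + 1
    return ((y << 8) + y) >> 16


limit = 255 * 255
mismatches = [x for x in range(limit + 1) if div255_approx(x) != x // 255]
print(len(mismatches), mismatches[:3])   # 255 [255, 510, 765]
print(div255_approx(limit))              # 254
print(all(div255_biased(x) == x // 255 for x in range(limit + 1)))  # True
```

Note the explicit parentheses around the shift: in Python (and C) `x << 8 + x` would parse as `x << (8 + x)`, unlike Smalltalk's left-to-right binary operators.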

I hope this is still relevant.

Cheers,
Juan Vuletich





Re: BitBlt alpha blend rules broken?

timrowledge

On 09-09-2013, at 9:10 AM, "J. Vuletich (mail lists)" <[hidden email]> wrote:
>
>
> The idea is to approximate 256/255 by 257/256. So, instead of doing x/255 (slow) or x>>8 (incorrect), you do x*257 >> 16, or better yet (x<<8 + x) >> 16. Two shifts and an add instead of a division. The error is less than 1/65535, and negligible for 8 bit output.
>
> I hope this is still relevant.

Ooh, nice. Two instruction cycles for an ARM and probably the second shift could be merged into whatever the following operation is.


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
"Bollocks," said Pooh being more forthright than usual