counting matching bits / bitXnor

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

counting matching bits / bitXnor

Cees De Groot
Hi,

I'm fiddling with some little project where I need to count the number
of bits that are the same between two bitstrings. My first
approximation would be to bitXnor: both integers and count bits in the
result. However, there's no bitXnor: and the standard implementation:

a XNOR b = (a AND b) OR (NOT a AND NOT b)

won't fly because there's not bitNot in Integer, as far as I can tell
(negate draws in all the ugliness of 2-complement arithmetic :)).

It's probably a stupid question that only shows how rusty my bit
fiddling is - but what would be a *good* performing implementation of
something that counts matching bits? (for extra credits: if the two
integers are unequal in size, it's a pattern match so the smaller int
needs to be shifted along the larger int and the largest count is the
one that's returned):

011101 / 101 -> 3
011111 / 101 -> 2
0101 / 0001 -> 3
... etcetera)

--
"Human beings make life so interesting. Do you know, that in a
universe so full of wonders, they have managed to invent boredom. " -
Death, in "The Hogfather"

Reply | Threaded
Open this post in threaded view
|

Re: counting matching bits / bitXnor

Paolo Bonzini-2
Cees de Groot wrote:

> Hi,
>
> I'm fiddling with some little project where I need to count the number
> of bits that are the same between two bitstrings. My first
> approximation would be to bitXnor: both integers and count bits in the
> result. However, there's no bitXnor: and the standard implementation:
>
> a XNOR b = (a AND b) OR (NOT a AND NOT b)
>
> won't fly because there's not bitNot in Integer, as far as I can tell
> (negate draws in all the ugliness of 2-complement arithmetic :)).

There's bitInvert, I think (I have no Squeak image here).  If you don't
have it, bitInvert is just "-1 - self".

> It's probably a stupid question that only shows how rusty my bit
> fiddling is - but what would be a *good* performing implementation of
> something that counts matching bits?

You need to give the width, as (if both numbers have the same sign, at
least) you can always add a zero to the left and get one more matching
bit. :-)   So the method would be something like this:

      bitMatch: b width: n
          ^((self bitXor: b) bitInvert bitAnd: (1 bitShift: n) - 1)
              bitCount

      bitCount
          | n cnt |
          n := self. cnt := 0.
          [ n = 0 ] whileFalse: [
              cnt := cnt + 1.
              n := n bitAnd: n - 1 ].
          ^cnt

 > 0101 / 0001 -> 3

      ^2r0101 bitMatch: 2r0001 width: 4

> integers are unequal in size, it's a pattern match so the smaller int
> needs to be shifted along the larger int and the largest count is the
> one that's returned):
>
> 011101 / 101 -> 3
> 011111 / 101 -> 2

I don't know the answer for this, but I know of an efficient algorithm
for a similar problem, that is to *find* a bitstring in another.  For
this, you can adapt the Knuth-Morris-Pratt string searching algorithm.

KMP works by computing the longest prefix corresponding to all the
partial match (e.g. if you've matched six chars in 0101011 and fail on
the seventh char, you can try again with a four-character partial
match).  You can modify the algorithm in two ways:

1) you can optimize the prefix table because you only have two values.
In the previous case, if you failed to match the seventh char in
0101011, since you wanted a one you know that the target bitstring had a
zero.  So you can continue with a five-character partial match.

2) when you find a match, if you want overlapping matches you can use
the prefix table to find them.  For example, if you are looking for is
0101, the KMP prefix table (not the one modified according to
optimization 1) says that after failing to match the fourth character
you can try matching against the second.  So, that's what you do even
after finding a full match.  Optimization 1 can also be applied to this
table, though in a slightly different way.

You can find code for KMP inside VisualWorks or GNU Smalltalk (see
Stream>>#nextPutAll:).

Paolo

Reply | Threaded
Open this post in threaded view
|

Re: counting matching bits / bitXnor

Cees De Groot
On Dec 17, 2007 1:34 PM, Paolo Bonzini <[hidden email]> wrote:
> There's bitInvert, I think (I have no Squeak image here).  If you don't
> have it, bitInvert is just "-1 - self".
>
Yup - and that draws in a sign and other ugliness. At least, in a
direct implementation of bitXnor:

bitXnor: anInteger
        " return bit-wise XNOR of two numbers. "
       
        ^ (self bitAnd: anInteger) bitOr: (self bitInvert bitAnd: anInteger bitInvert)

0 bitXnor: 0 -> -1

> You need to give the width, as (if both numbers have the same sign, at
> least) you can always add a zero to the left and get one more matching
> bit. :-)   So the method would be something like this:
>
>       bitMatch: b width: n
>           ^((self bitXor: b) bitInvert bitAnd: (1 bitShift: n) - 1)
>               bitCount
>
>       bitCount
>           | n cnt |
>           n := self. cnt := 0.
>           [ n = 0 ] whileFalse: [
>               cnt := cnt + 1.
>               n := n bitAnd: n - 1 ].
>           ^cnt
>
> I don't know the answer for this, but I know of an efficient algorithm
> for a similar problem, that is to *find* a bitstring in another.  For
> this, you can adapt the Knuth-Morris-Pratt string searching algorithm.
>
Yeah, but that's for exact searching... besides, I feel that if I can
get the bitcounting efficient the couple of loop/shift operations
needed to shift the shorter pattern along the larger one should be ok.

Thanks for the suggested implementation - i'll throw it at my unit tests :-)

(if you think - wth is that guy needing that stuff for - the bit
strings are genomes, and i want to make some decisions in an
artificial life algorithm based on 'fit' - how well the genomes match
(and also on 'non-fit' - how far they are apart, but that is step 2).
Just something to keep me busy in the train commute)

--
"Human beings make life so interesting. Do you know, that in a
universe so full of wonders, they have managed to invent boredom. " -
Death, in "The Hogfather"

Reply | Threaded
Open this post in threaded view
|

Re: counting matching bits / bitXnor

Bert Freudenberg
On Dec 17, 2007, at 18:29 , Cees de Groot wrote:

> On Dec 17, 2007 1:34 PM, Paolo Bonzini <[hidden email]> wrote:
>> There's bitInvert, I think (I have no Squeak image here).  If you  
>> don't
>> have it, bitInvert is just "-1 - self".
>>
> Yup - and that draws in a sign and other ugliness. At least, in a
> direct implementation of bitXnor:
>
> bitXnor: anInteger
> " return bit-wise XNOR of two numbers. "
>
> ^ (self bitAnd: anInteger) bitOr: (self bitInvert bitAnd:  
> anInteger bitInvert)
>
> 0 bitXnor: 0 -> -1

And what exactly is wrong with that?

- Bert -



Reply | Threaded
Open this post in threaded view
|

Re: counting matching bits / bitXnor

Cees De Groot
On Dec 17, 2007 1:49 PM, Bert Freudenberg <[hidden email]> wrote:
> > 0 bitXnor: 0 -> -1
>
> And what exactly is wrong with that?
>
err... I'd expect '1' here? The problem is that bitInvert is not
identical, as far as I can tell, with bitNot...

1 bitInvert -> -2

(where '1 bitNot' should result in 0)

And for the problem itself: drawing a quick list of truth tables of
available bit ops quickly pointed out that counting similar bits with
XNOR and counting different bits with XOR are equivalent ;-P. That's
what you get for firing off posts at the end of your lunch break.
Making the number of matching bits a similarity indicated based on the
length of the largest bit string, I currently have (pending more unit
tests ;-)):

findSimilarityBetween: n1 and: n2
        "check how similar n1 and n2 are. Similarity is defined of the maximum
         matching number of bits - if n1 << n2, we'll 'slide' n1 along n2'''"
       
        | small large match thisMatch thisMatchBitcount nBits |
       
        (n1 = n2) ifTrue: [^1.0].
               
        (n1 < n2)
                ifTrue: [small := n1. large := n2]
                ifFalse: [small := n2. large := n1].
        match := 0.
        nBits := large highBitOfMagnitude.
       
        1 to: nBits do: [:i |
                thisMatch := small bitXor: large.
                thisMatchBitcount := nBits - (self countBits: thisMatch).
                (thisMatchBitcount > match)
                        ifTrue: [match := thisMatchBitcount].
                small := small << 1].
       
        ^match / (nBits * 1.0)






--
"Human beings make life so interesting. Do you know, that in a
universe so full of wonders, they have managed to invent boredom. " -
Death, in "The Hogfather"

Reply | Threaded
Open this post in threaded view
|

Re: counting matching bits / bitXnor

Paolo Bonzini-2
Cees de Groot wrote:
> On Dec 17, 2007 1:49 PM, Bert Freudenberg <[hidden email]> wrote:
>>> 0 bitXnor: 0 -> -1
>> And what exactly is wrong with that?
>>
> err... I'd expect '1' here?

No.  If you have arbitrary-precision integers, numbers like "infinitely
many ones plus something else" are negative; numbers like "infinitely
many zeros plus something else" are positive.  0 is infinitely many
zeros, -1 is infinitely many ones -- and XNORing each of the infinitely
many zeros with itself produces infinitely many ones, right?

The problem is that your bitstrings have a length beyond which all bits
should be considered zeros; integers have not.

I hope this is understandable...

Paolo


Reply | Threaded
Open this post in threaded view
|

Re: counting matching bits / bitXnor

Randal L. Schwartz
In reply to this post by Cees De Groot
>>>>> "Cees" == Cees de Groot <[hidden email]> writes:

Cees> Hi,
Cees> I'm fiddling with some little project where I need to count the number
Cees> of bits that are the same between two bitstrings. My first
Cees> approximation would be to bitXnor: both integers and count bits in the
Cees> result. However, there's no bitXnor: and the standard implementation:

Cees> a XNOR b = (a AND b) OR (NOT a AND NOT b)

Cees> won't fly because there's not bitNot in Integer, as far as I can tell
Cees> (negate draws in all the ugliness of 2-complement arithmetic :)).

That looks like NOT (a XOR b) to me.  Wouldn't that already have
a fast implementation?

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[hidden email]> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

Reply | Threaded
Open this post in threaded view
|

Re: counting matching bits / bitXnor

Cees De Groot
In reply to this post by Paolo Bonzini-2
I *understand* it, but it's not what I *want* :).

(granted, "expect" is not the right word)

On Dec 17, 2007 5:05 PM, Paolo Bonzini <[hidden email]> wrote:

> Cees de Groot wrote:
> > On Dec 17, 2007 1:49 PM, Bert Freudenberg <[hidden email]> wrote:
> >>> 0 bitXnor: 0 -> -1
> >> And what exactly is wrong with that?
> >>
> > err... I'd expect '1' here?
>
> No.  If you have arbitrary-precision integers, numbers like "infinitely
> many ones plus something else" are negative; numbers like "infinitely
> many zeros plus something else" are positive.  0 is infinitely many
> zeros, -1 is infinitely many ones -- and XNORing each of the infinitely
> many zeros with itself produces infinitely many ones, right?
>
> The problem is that your bitstrings have a length beyond which all bits
> should be considered zeros; integers have not.
>
> I hope this is understandable...
>
> Paolo
>
>
>



--
"Human beings make life so interesting. Do you know, that in a
universe so full of wonders, they have managed to invent boredom. " -
Death, in "The Hogfather"

Reply | Threaded
Open this post in threaded view
|

Re: counting matching bits / bitXnor

Paolo Bonzini-2
Cees de Groot wrote:
> I *understand* it, but it's not what I *want* :).

The point is that, there is no reason why 0 bitInvert should return 1
rather than 3, 7, or 15.

Your data type is a BitArray, but if you encode it as an Integer, that's
not enough.  You need another instance variable to store the width or
alternatively a mask (i.e. (1 bitShift: width) - 1).  Then, you have
something like this:

(the bitOr:'s in the masks are not typos).

& aBitString
     ^BitString
         value: (self value bitAnd: aBitString value)
         mask: (self mask bitOr: aBitString mask)

| aBitString
     ^BitString
         value: (self value bitOr: aBitString value)
         mask: (self mask bitOr: aBitString mask)

invert
     ^BitString
         value: (self value bitXor: self mask)
         mask: self mask

xor: aBitString
     ^BitString
         value: (self value bitXor: aBitString value)
         mask: (self mask bitOr: aBitString mask)

xnor: aBitString
     ^(self xor: aBitString) invert

count
     | n cnt |
     n := self value. cnt := 0.
     [ n = 0 ] whileFalse: [
         cnt := cnt + 1.
         n := n bitAnd: n - 1 ].
     ^cnt

Paolo


Reply | Threaded
Open this post in threaded view
|

Re: counting matching bits / bitXnor

timrowledge
Use a Bitmap and BitBLT. Dan put loads of stuff in there to mangle bits


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: PWB: Put to Waste Basket



Reply | Threaded
Open this post in threaded view
|

Re: counting matching bits / bitXnor

Tapple Gao
In reply to this post by Cees De Groot
On Mon, Dec 17, 2007 at 01:00:41PM +0100, Cees de Groot wrote:
> It's probably a stupid question that only shows how rusty my bit
> fiddling is - but what would be a *good* performing implementation of
> something that counts matching bits? (for extra credits: if the two
> integers are unequal in size, it's a pattern match so the smaller int
> needs to be shifted along the larger int and the largest count is the
> one that's returned):

doing a google search for "fast bit counting" brings up
http://infolab.stanford.edu/~manku/bitcount/bitcount.html as the
first hit. The first algorithm listed takes time proportional to
the number of 1 bits in the integer, rather than the length of
the bit encoding. It relies on the ability to do a fast zero
check, though, as the loop exit condition

--
Matthew Fulmer -- http://mtfulmer.wordpress.com/
Help improve Squeak Documentation: http://wiki.squeak.org/squeak/808

Reply | Threaded
Open this post in threaded view
|

Re: counting matching bits / bitXnor

Yoshiki Ohshima-2
In reply to this post by timrowledge
> Use a Bitmap and BitBLT. Dan put loads of stuff in there to mangle bits

  Yes.  There is a SqueakMap entry called BitArray that utilizes
Bitmap and BitBlt, and among others, #cardinality returns the number
of set bits.

  I haven't quite understood Cees' requirement, (is it effectively to
find a substring or not?) but BitArray might help.

-- Yoshiki

Reply | Threaded
Open this post in threaded view
|

Re: counting matching bits / bitXnor

Nicolas Cellier-3
In reply to this post by Cees De Groot
Cees de Groot a écrit :

> Hi,
>
> I'm fiddling with some little project where I need to count the number
> of bits that are the same between two bitstrings. My first
> approximation would be to bitXnor: both integers and count bits in the
> result. However, there's no bitXnor: and the standard implementation:
>
> a XNOR b = (a AND b) OR (NOT a AND NOT b)
>
> won't fly because there's not bitNot in Integer, as far as I can tell
> (negate draws in all the ugliness of 2-complement arithmetic :)).
>
> It's probably a stupid question that only shows how rusty my bit
> fiddling is - but what would be a *good* performing implementation of
> something that counts matching bits? (for extra credits: if the two
> integers are unequal in size, it's a pattern match so the smaller int
> needs to be shifted along the larger int and the largest count is the
> one that's returned):
>
> 011101 / 101 -> 3
> 011111 / 101 -> 2
> 0101 / 0001 -> 3
> ... etcetera)
>

I don't understand how you could achieve that with Integer

0101 / 0001 -> 3
00101 / 00001 -> 4
000101 / 000001 -> 5

Same integer, different results...

So yes, BitArray as suggested by Yoshiki is maybe your option

Nicolas


Reply | Threaded
Open this post in threaded view
|

Re: counting matching bits / bitXnor

Bert Freudenberg
In reply to this post by Cees De Groot
On Dec 17, 2007, at 19:04 , Cees de Groot wrote:

> On Dec 17, 2007 1:49 PM, Bert Freudenberg <[hidden email]>  
> wrote:
>>> 0 bitXnor: 0 -> -1
>>
>> And what exactly is wrong with that?
>>
> err... I'd expect '1' here?

How should that method know you are looking for 1-bit numbers only?  
In your logic

     2r0 bitXnor: 2r0 -> 2r1
     2r00 bitXnor: 2r00 -> 2r11

which would be rather weird, wouldn't it?

-1 in two's complement is an endless string of 1s, so that's the  
Right Thing.

If you want a "fixed-width" number then you can just #bitAnd: with  
the number of bits you like:

-1 bitAnd: 2r1  -> 2r1
-1 bitAnd: 2r11 -> 2r11

- Bert -



Reply | Threaded
Open this post in threaded view
|

Re: counting matching bits / bitXnor

Cees De Groot
In reply to this post by Paolo Bonzini-2
On Dec 17, 2007 6:25 PM, Paolo Bonzini <[hidden email]> wrote:
> Your data type is a BitArray, but if you encode it as an Integer, that's
> not enough.  You need another instance variable to store the width or
> alternatively a mask (i.e. (1 bitShift: width) - 1).  Then, you have
> something like this:
>
[...]

thanks. You're right, of course. I'm trying to stuff square pegs in
round holes here. That's probably what you get when your daytime job
largely plays in PHP-land...

Tim - thanks for the tip on Bitblt, I would have never thought to look
there (graphics stuff scares me ;-))

--
"Human beings make life so interesting. Do you know, that in a
universe so full of wonders, they have managed to invent boredom. " -
Death, in "The Hogfather"

Reply | Threaded
Open this post in threaded view
|

Re: counting matching bits / bitXnor

Cees De Groot
In reply to this post by Tapple Gao
I typed 'counting bits' on Google and found the excellent page
http://graphics.stanford.edu/~seander/bithacks.html

Nice to see that a trivial space/time trade-off works soo much better
than funky algorithms, by the way :)

On Dec 17, 2007 6:37 PM, Matthew Fulmer <[hidden email]> wrote:

> doing a google search for "fast bit counting" brings up
> http://infolab.stanford.edu/~manku/bitcount/bitcount.html as the
> first hit. The first algorithm listed takes time proportional to
> the number of 1 bits in the integer, rather than the length of
> the bit encoding. It relies on the ability to do a fast zero
> check, though, as the loop exit condition
>
> --
> Matthew Fulmer -- http://mtfulmer.wordpress.com/
> Help improve Squeak Documentation: http://wiki.squeak.org/squeak/808
>
>



--
"Human beings make life so interesting. Do you know, that in a
universe so full of wonders, they have managed to invent boredom. " -
Death, in "The Hogfather"