Bitwise operations in ByteArray (slicing, etc.)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

Bitwise operations in ByteArray (slicing, etc.)

Esteban A. Maringolo
Is there any package/library that makes bitwise operations as simple
as with an Integer, but for larger numbers (as in a ByteArray).

Something that allows me to "slice" a sequence of bits, or extract
some using the same protocol as with a String of ones and zeros.

Now when I need to work with sequence of bits, I convert an Integer to
a zero padded version of it up a known size, and then do #copyFrom:to:
to extract what I need and read back the number from it.

I could use a bytearray for it, but as its name implies, it is
oriented towards bytes rather than bits (as in the case of Integer).

Now I do stuff like the following to to extract the first 5 bits of a
fixed length 256 bit array (an Integer).

Integer
  readFrom:
    (((SHA256 hashMessage: message)) asInteger
      printStringBase: 2 length: 256 padded: true)
       copyFrom: 1 to: 5)
  base: 2

I understand bitwise operations, but I couldn't find something that
does the above in a conciser way.

Performance in my case isn't critical, but working with strings is
probably two orders of magnitude slower than manipulating bits in
integers or ByteArrays

Regards,

Esteban A. Maringolo

Reply | Threaded
Open this post in threaded view
|

Re: Bitwise operations in ByteArray (slicing, etc.)

Sven Van Caekenberghe-2
Take a 24-bit number and you want to isolate the first 5 (these are actually the last, higher order) bits.

n := 2r101010001111000011110000.
n >> (16+3).

If necessary, you can apply a mask (assume there are bits earlier/later still).

(n >> (16+3)) bitAnd: 2r11111

Large integers behave as bit strings, see the 'bit manipulation' protocol, and are efficient at it.

> On 4 Mar 2018, at 20:29, Esteban A. Maringolo <[hidden email]> wrote:
>
> Is there any package/library that makes bitwise operations as simple
> as with an Integer, but for larger numbers (as in a ByteArray).
>
> Something that allows me to "slice" a sequence of bits, or extract
> some using the same protocol as with a String of ones and zeros.
>
> Now when I need to work with sequence of bits, I convert an Integer to
> a zero padded version of it up a known size, and then do #copyFrom:to:
> to extract what I need and read back the number from it.
>
> I could use a bytearray for it, but as its name implies, it is
> oriented towards bytes rather than bits (as in the case of Integer).
>
> Now I do stuff like the following to to extract the first 5 bits of a
> fixed length 256 bit array (an Integer).
>
> Integer
>  readFrom:
>    (((SHA256 hashMessage: message)) asInteger
>      printStringBase: 2 length: 256 padded: true)
>       copyFrom: 1 to: 5)
>  base: 2
>
> I understand bitwise operations, but I couldn't find something that
> does the above in a conciser way.
>
> Performance in my case isn't critical, but working with strings is
> probably two orders of magnitude slower than manipulating bits in
> integers or ByteArrays
>
> Regards,
>
> Esteban A. Maringolo
>


Reply | Threaded
Open this post in threaded view
|

Re: Bitwise operations in ByteArray (slicing, etc.)

Esteban A. Maringolo
I do bitshifts and masks on Integer.

What if you want to take the bits from the 3rd to the 7th? You have to
do a some arithmetic to get the slice you want.
I'm simply asking for something more dev-friendlier that adds a
"layer" on top of that, but that internally does regular bitwise.

What I don't like about Integers is that you "lose" information about
the zero bits to the left, and with a ByteArray you don't, because the
array is fixed size.

E.g.
(1 << 16) printStringBase: 16. "'10000'"
#[1 0 0] hex "'010000'"

Maybe I'm too lazy asking when I could have done it myself :)

Regards,

Esteban A. Maringolo


2018-03-04 16:40 GMT-03:00 Sven Van Caekenberghe <[hidden email]>:

> Take a 24-bit number and you want to isolate the first 5 (these are actually the last, higher order) bits.
>
> n := 2r101010001111000011110000.
> n >> (16+3).
>
> If necessary, you can apply a mask (assume there are bits earlier/later still).
>
> (n >> (16+3)) bitAnd: 2r11111
>
> Large integers behave as bit strings, see the 'bit manipulation' protocol, and are efficient at it.
>
>> On 4 Mar 2018, at 20:29, Esteban A. Maringolo <[hidden email]> wrote:
>>
>> Is there any package/library that makes bitwise operations as simple
>> as with an Integer, but for larger numbers (as in a ByteArray).
>>
>> Something that allows me to "slice" a sequence of bits, or extract
>> some using the same protocol as with a String of ones and zeros.
>>
>> Now when I need to work with sequence of bits, I convert an Integer to
>> a zero padded version of it up a known size, and then do #copyFrom:to:
>> to extract what I need and read back the number from it.
>>
>> I could use a bytearray for it, but as its name implies, it is
>> oriented towards bytes rather than bits (as in the case of Integer).
>>
>> Now I do stuff like the following to to extract the first 5 bits of a
>> fixed length 256 bit array (an Integer).
>>
>> Integer
>>  readFrom:
>>    (((SHA256 hashMessage: message)) asInteger
>>      printStringBase: 2 length: 256 padded: true)
>>       copyFrom: 1 to: 5)
>>  base: 2
>>
>> I understand bitwise operations, but I couldn't find something that
>> does the above in a conciser way.
>>
>> Performance in my case isn't critical, but working with strings is
>> probably two orders of magnitude slower than manipulating bits in
>> integers or ByteArrays
>>
>> Regards,
>>
>> Esteban A. Maringolo
>>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Bitwise operations in ByteArray (slicing, etc.)

Sven Van Caekenberghe-2
Bits are actually numbered from right to left (seen from how they are printed).

But in a fixed amount of bits you could indeed number them the other way around.

It does not matter that you write the leading zeros or not, everything to the left is zero anyway. When shifting, the right thing happens.

Adding something like Integer>>#bitSliceFrom:to:size: based on your reverse numbering would be wrong, IMHO.

I guess you could write it like

n := 2r11110000101010001111000011110000.
from := 8.
to := 13.
size := 32.
(n >> (size - 13)) bitAnd: ((1 << ( to - from + 1)) - 1).

But the size feels out of place.

> On 4 Mar 2018, at 20:57, Esteban A. Maringolo <[hidden email]> wrote:
>
> I do bitshifts and masks on Integer.
>
> What if you want to take the bits from the 3rd to the 7th? You have to
> do a some arithmetic to get the slice you want.
> I'm simply asking for something more dev-friendlier that adds a
> "layer" on top of that, but that internally does regular bitwise.
>
> What I don't like about Integers is that you "lose" information about
> the zero bits to the left, and with a ByteArray you don't, because the
> array is fixed size.
>
> E.g.
> (1 << 16) printStringBase: 16. "'10000'"
> #[1 0 0] hex "'010000'"
>
> Maybe I'm too lazy asking when I could have done it myself :)
>
> Regards,
>
> Esteban A. Maringolo
>
>
> 2018-03-04 16:40 GMT-03:00 Sven Van Caekenberghe <[hidden email]>:
>> Take a 24-bit number and you want to isolate the first 5 (these are actually the last, higher order) bits.
>>
>> n := 2r101010001111000011110000.
>> n >> (16+3).
>>
>> If necessary, you can apply a mask (assume there are bits earlier/later still).
>>
>> (n >> (16+3)) bitAnd: 2r11111
>>
>> Large integers behave as bit strings, see the 'bit manipulation' protocol, and are efficient at it.
>>
>>> On 4 Mar 2018, at 20:29, Esteban A. Maringolo <[hidden email]> wrote:
>>>
>>> Is there any package/library that makes bitwise operations as simple
>>> as with an Integer, but for larger numbers (as in a ByteArray).
>>>
>>> Something that allows me to "slice" a sequence of bits, or extract
>>> some using the same protocol as with a String of ones and zeros.
>>>
>>> Now when I need to work with sequence of bits, I convert an Integer to
>>> a zero padded version of it up a known size, and then do #copyFrom:to:
>>> to extract what I need and read back the number from it.
>>>
>>> I could use a bytearray for it, but as its name implies, it is
>>> oriented towards bytes rather than bits (as in the case of Integer).
>>>
>>> Now I do stuff like the following to to extract the first 5 bits of a
>>> fixed length 256 bit array (an Integer).
>>>
>>> Integer
>>> readFrom:
>>>   (((SHA256 hashMessage: message)) asInteger
>>>     printStringBase: 2 length: 256 padded: true)
>>>      copyFrom: 1 to: 5)
>>> base: 2
>>>
>>> I understand bitwise operations, but I couldn't find something that
>>> does the above in a conciser way.
>>>
>>> Performance in my case isn't critical, but working with strings is
>>> probably two orders of magnitude slower than manipulating bits in
>>> integers or ByteArrays
>>>
>>> Regards,
>>>
>>> Esteban A. Maringolo
>>>
>>
>>
>


Reply | Threaded
Open this post in threaded view
|

Re: Bitwise operations in ByteArray (slicing, etc.)

Esteban A. Maringolo
2018-03-04 17:15 GMT-03:00 Sven Van Caekenberghe <[hidden email]>:
> Bits are actually numbered from right to left (seen from how they are printed).

I understand bit operations, used it extensively with IP address eons ago.

But if a spec says: "Take the first n bits from the hash", it means
the first significant bits.
so in 2r100101111 the first 3 bits are "100" and not "111".

> But in a fixed amount of bits you could indeed number them the other way around.
> It does not matter that you write the leading zeros or not, everything to the left is zero anyway. When shifting, the right thing happens.

If it were a packet, with a fixed size, and the header started with 0,
then the leading zeros matter.

So in my use case it isn't the same "00000000000000000000000000000000"
than "000000000000000000000000000000000000000000000000" (both in hex
notation).
See [1]

> Adding something like Integer>>#bitSliceFrom:to:size: based on your reverse numbering would be wrong, IMHO.

Yes, it would be wrong. Unless you have to option to specify the
endianess you want to use.

> I guess you could write it like
>
> n := 2r11110000101010001111000011110000.
> from := 8.
> to := 13.
> size := 32.
> (n >> (size - 13)) bitAnd: ((1 << ( to - from + 1)) - 1).
>
> But the size feels out of place.

How so?

Thanks in advance, this is super low priority, I just wanted to check
whether something like that already existed.

Regards!


[1] https://github.com/eMaringolo/pharo-bip39mnemonic/blob/master/src/BIP39Mnemonic-Core.package/BIP39MnemonicTest.class/instance/jsonTestVectors.st


Esteban A. Maringolo

Reply | Threaded
Open this post in threaded view
|

Re: Bitwise operations in ByteArray (slicing, etc.)

Richard Sargent
Administrator
Please, if you do implement something, start with a Collection class and call it something like BitArray. It might be appropriate to subclass under ByteArray, but perhaps not.

On Mar 4, 2018 12:45 PM, "Esteban A. Maringolo" <[hidden email]> wrote:
2018-03-04 17:15 GMT-03:00 Sven Van Caekenberghe <[hidden email]>:
> Bits are actually numbered from right to left (seen from how they are printed).

I understand bit operations, used it extensively with IP address eons ago.

But if a spec says: "Take the first n bits from the hash", it means
the first significant bits.
so in 2r100101111 the first 3 bits are "100" and not "111".

> But in a fixed amount of bits you could indeed number them the other way around.
> It does not matter that you write the leading zeros or not, everything to the left is zero anyway. When shifting, the right thing happens.

If it were a packet, with a fixed size, and the header started with 0,
then the leading zeros matter.

So in my use case it isn't the same "00000000000000000000000000000000"
than "000000000000000000000000000000000000000000000000" (both in hex
notation).
See [1]

> Adding something like Integer>>#bitSliceFrom:to:size: based on your reverse numbering would be wrong, IMHO.

Yes, it would be wrong. Unless you have to option to specify the
endianess you want to use.

> I guess you could write it like
>
> n := 2r11110000101010001111000011110000.
> from := 8.
> to := 13.
> size := 32.
> (n >> (size - 13)) bitAnd: ((1 << ( to - from + 1)) - 1).
>
> But the size feels out of place.

How so?

Thanks in advance, this is super low priority, I just wanted to check
whether something like that already existed.

Regards!


[1] https://github.com/eMaringolo/pharo-bip39mnemonic/blob/master/src/BIP39Mnemonic-Core.package/BIP39MnemonicTest.class/instance/jsonTestVectors.st


Esteban A. Maringolo

Reply | Threaded
Open this post in threaded view
|

Re: Bitwise operations in ByteArray (slicing, etc.)

Pharo Smalltalk Users mailing list
In reply to this post by Esteban A. Maringolo
Hi Esteban,

When you say "extract", what exactly do you mean?

Let's suppose you have 86, i.e. 2r1010110
Which case do you want?

a) 2r1010110 "2 to 5 = 2r10[1011]0 = 2r1011 = 11"

or

b) 2r1010110 "2 to 5 = 2r10[1011]0 = 2r00[1011]0 =  2r0010110 =  22"

Both cases can be dealt with bit shifting and a bitAnd with optimal performance.

-----------------
Benoît St-Jean
Yahoo! Messenger: bstjean
Twitter: @BenLeChialeux
Pinterest: benoitstjean
Instagram: Chef_Benito
IRC: lamneth
Blogue: endormitoire.wordpress.com
"A standpoint is an intellectual horizon of radius zero".  (A. Einstein)


On Sunday, March 4, 2018, 2:31:21 PM EST, Esteban A. Maringolo <[hidden email]> wrote:


Is there any package/library that makes bitwise operations as simple
as with an Integer, but for larger numbers (as in a ByteArray).

Something that allows me to "slice" a sequence of bits, or extract
some using the same protocol as with a String of ones and zeros.

Now when I need to work with sequence of bits, I convert an Integer to
a zero padded version of it up a known size, and then do #copyFrom:to:
to extract what I need and read back the number from it.

I could use a bytearray for it, but as its name implies, it is
oriented towards bytes rather than bits (as in the case of Integer).

Now I do stuff like the following to to extract the first 5 bits of a
fixed length 256 bit array (an Integer).

Integer
  readFrom:
    (((SHA256 hashMessage: message)) asInteger
      printStringBase: 2 length: 256 padded: true)
      copyFrom: 1 to: 5)
  base: 2

I understand bitwise operations, but I couldn't find something that
does the above in a conciser way.

Performance in my case isn't critical, but working with strings is
probably two orders of magnitude slower than manipulating bits in
integers or ByteArrays

Regards,

Esteban A. Maringolo

Reply | Threaded
Open this post in threaded view
|

Re: Bitwise operations in ByteArray (slicing, etc.)

Pharo Smalltalk Users mailing list
In reply to this post by Esteban A. Maringolo
Just put that in Integer.

----------------------------
bitsSliceFrom: start to: end
    "
    Extract bits 4 to 6, i.e. 111 which equals to 7
    2r10000111000 bitsSliceFrom: 4 to: 6
    "
    | num mask  |

    self < 0 ifTrue: [self error: 'This operation is not allowed for negative numbers!'].
    num := self >> (start - 1).
    mask := (1 << ((end - start) + 1)) -  1.

    ^num bitAnd: mask.
-----------------------------
bitsSliceInPlaceFrom: start to: end
    "
    Extract bits 4 to 6, IN PLACE, i.e. 111 which equals to 2r00000111000 = 56
    2r10000111000 bitsSliceInPlaceFrom: 4 to: 6
    "
    |  |

    ^(self bitsSliceFrom: start to: end) << (start - 1)
----------------------------

This should do the job!  hth

-----------------
Benoît St-Jean
Yahoo! Messenger: bstjean
Twitter: @BenLeChialeux
Pinterest: benoitstjean
Instagram: Chef_Benito
IRC: lamneth
Blogue: endormitoire.wordpress.com
"A standpoint is an intellectual horizon of radius zero".  (A. Einstein)


On Sunday, March 4, 2018, 2:59:16 PM EST, Esteban A. Maringolo <[hidden email]> wrote:


I do bitshifts and masks on Integer.

What if you want to take the bits from the 3rd to the 7th? You have to
do a some arithmetic to get the slice you want.
I'm simply asking for something more dev-friendlier that adds a
"layer" on top of that, but that internally does regular bitwise.

What I don't like about Integers is that you "lose" information about
the zero bits to the left, and with a ByteArray you don't, because the
array is fixed size.

E.g.
(1 << 16) printStringBase: 16. "'10000'"
#[1 0 0] hex "'010000'"

Maybe I'm too lazy asking when I could have done it myself :)

Regards,

Esteban A. Maringolo


2018-03-04 16:40 GMT-03:00 Sven Van Caekenberghe <[hidden email]>:

> Take a 24-bit number and you want to isolate the first 5 (these are actually the last, higher order) bits.
>
> n := 2r101010001111000011110000.
> n >> (16+3).
>
> If necessary, you can apply a mask (assume there are bits earlier/later still).
>
> (n >> (16+3)) bitAnd: 2r11111
>
> Large integers behave as bit strings, see the 'bit manipulation' protocol, and are efficient at it.
>
>> On 4 Mar 2018, at 20:29, Esteban A. Maringolo <[hidden email]> wrote:
>>
>> Is there any package/library that makes bitwise operations as simple
>> as with an Integer, but for larger numbers (as in a ByteArray).
>>
>> Something that allows me to "slice" a sequence of bits, or extract
>> some using the same protocol as with a String of ones and zeros.
>>
>> Now when I need to work with sequence of bits, I convert an Integer to
>> a zero padded version of it up a known size, and then do #copyFrom:to:
>> to extract what I need and read back the number from it.
>>
>> I could use a bytearray for it, but as its name implies, it is
>> oriented towards bytes rather than bits (as in the case of Integer).
>>
>> Now I do stuff like the following to to extract the first 5 bits of a
>> fixed length 256 bit array (an Integer).
>>
>> Integer
>>  readFrom:
>>    (((SHA256 hashMessage: message)) asInteger
>>      printStringBase: 2 length: 256 padded: true)
>>      copyFrom: 1 to: 5)
>>  base: 2
>>
>> I understand bitwise operations, but I couldn't find something that
>> does the above in a conciser way.
>>
>> Performance in my case isn't critical, but working with strings is
>> probably two orders of magnitude slower than manipulating bits in
>> integers or ByteArrays
>>
>> Regards,
>>
>> Esteban A. Maringolo
>>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Bitwise operations in ByteArray (slicing, etc.)

Richard O'Keefe
​As I understand it, the OP's problem is that he wants to
count bits from the high end, and Integers don't *have*
a finite high end.

PL/I has bit strings.  Common Lisp has bit strings.
Some Schemes have copied Lisp.  Eiffel _had_ bit strings,
more or less, but dropped them.

Smalltalk/X has a BitArray class; see
http://live.exept.de/ClassDoc/classDocOf:,BitArray
for documentation.

My own Smalltalk also has a BitArray class with a
somewhat different interface.  Although developed
independently, both are implemented as byte arrays
with an associated count.  It's not terribly hard.
For selecting small numbers of bits out of long
bit strings, this is more efficient than shifting
and masking a large integer.

I rather wonder whether it might be possible to do
better by thinking in a somewhat less C-like style.
For example, Erlang handles unpacking network
packets and the like by letting you write bit-string
*patterns*, which the compiler can then generate
special-case code for.

Run-time generation of small amounts of code is
somewhat easier in Smalltalk.  (Most Smalltalks.)
What would it be like if you had a BitArray class
where you could say
    extractor := BitArray extractorForBits: 3 to: 7 of: 256.
    ...
    extractor value: thisPacket
and extractor was a block (possibly cached) with
code specialised for just that case?
Reply | Threaded
Open this post in threaded view
|

Re: Bitwise operations in ByteArray (slicing, etc.)

Stephane Ducasse-3
In reply to this post by Esteban A. Maringolo
On Sun, Mar 4, 2018 at 9:43 PM, Esteban A. Maringolo
<[hidden email]> wrote:
> 2018-03-04 17:15 GMT-03:00 Sven Van Caekenberghe <[hidden email]>:
>> Bits are actually numbered from right to left (seen from how they are printed).
>
> I understand bit operations, used it extensively with IP address eons ago.
>
> But if a spec says: "Take the first n bits from the hash", it means
> the first significant bits.
> so in 2r100101111 the first 3 bits are "100" and not "111".

naive question: why?
To me it looks like a lousy specification.

>
>> But in a fixed amount of bits you could indeed number them the other way around.
>> It does not matter that you write the leading zeros or not, everything to the left is zero anyway. When shifting, the right thing happens.
>
> If it were a packet, with a fixed size, and the header started with 0,
> then the leading zeros matter.
>
> So in my use case it isn't the same "00000000000000000000000000000000"
> than "000000000000000000000000000000000000000000000000" (both in hex
> notation).
> See [1]
>
>> Adding something like Integer>>#bitSliceFrom:to:size: based on your reverse numbering would be wrong, IMHO.
>
> Yes, it would be wrong. Unless you have to option to specify the
> endianess you want to use.
>
>> I guess you could write it like
>>
>> n := 2r11110000101010001111000011110000.
>> from := 8.
>> to := 13.
>> size := 32.
>> (n >> (size - 13)) bitAnd: ((1 << ( to - from + 1)) - 1).
>>
>> But the size feels out of place.
>
> How so?
>
> Thanks in advance, this is super low priority, I just wanted to check
> whether something like that already existed.
>
> Regards!
>
>
> [1] https://github.com/eMaringolo/pharo-bip39mnemonic/blob/master/src/BIP39Mnemonic-Core.package/BIP39MnemonicTest.class/instance/jsonTestVectors.st
>
>
> Esteban A. Maringolo
>

Reply | Threaded
Open this post in threaded view
|

Re: Bitwise operations in ByteArray (slicing, etc.)

Stephane Ducasse-3
In reply to this post by Richard O'Keefe
It looks like having BitArray is a good approach.
Is there a MIT compatible version? I could add it to my collection collection :)
Containers I should migrate it to github.

Stef

On Mon, Mar 5, 2018 at 2:52 PM, Richard O'Keefe <[hidden email]> wrote:

> As I understand it, the OP's problem is that he wants to
> count bits from the high end, and Integers don't *have*
> a finite high end.
>
> PL/I has bit strings.  Common Lisp has bit strings.
> Some Schemes have copied Lisp.  Eiffel _had_ bit strings,
> more or less, but dropped them.
>
> Smalltalk/X has a BitArray class; see
> http://live.exept.de/ClassDoc/classDocOf:,BitArray
> for documentation.
>
> My own Smalltalk also has a BitArray class with a
> somewhat different interface.  Although developed
> independently, both are implemented as byte arrays
> with an associated count.  It's not terribly hard.
> For selecting small numbers of bits out of long
> bit strings, this is more efficient than shifting
> and masking a large integer.
>
> I rather wonder whether it might be possible to do
> better by thinking in a somewhat less C-like style.
> For example, Erlang handles unpacking network
> packets and the like by letting you write bit-string
> *patterns*, which the compiler can then generate
> special-case code for.
>
> Run-time generation of small amounts of code is
> somewhat easier in Smalltalk.  (Most Smalltalks.)
> What would it be like if you had a BitArray class
> where you could say
>     extractor := BitArray extractorForBits: 3 to: 7 of: 256.
>     ...
>     extractor value: thisPacket
> and extractor was a block (possibly cached) with
> code specialised for just that case?

Reply | Threaded
Open this post in threaded view
|

Re: Bitwise operations in ByteArray (slicing, etc.)

SergeStinckwich
In reply to this post by Richard O'Keefe


On Mon, Mar 5, 2018 at 2:52 PM, Richard O'Keefe <[hidden email]> wrote:
​As I understand it, the OP's problem is that he wants to
count bits from the high end, and Integers don't *have*
a finite high end.

PL/I has bit strings.  Common Lisp has bit strings.
Some Schemes have copied Lisp.  Eiffel _had_ bit strings,
more or less, but dropped them.

Smalltalk/X has a BitArray class; see
http://live.exept.de/ClassDoc/classDocOf:,BitArray
for documentation.


​Great documentation ! We need something like that for Pharo classes.

--
Serge Stinckwich
UMI UMMISCO 209 (IRD/UPMC/UY1)
"Programs must be written for people to read, and only incidentally for machines to execute."
http://www.doesnotunderstand.org/
Reply | Threaded
Open this post in threaded view
|

Re: Bitwise operations in ByteArray (slicing, etc.)

Esteban A. Maringolo
In reply to this post by Richard O'Keefe
2018-03-05 10:52 GMT-03:00 Richard O'Keefe <[hidden email]>:
> As I understand it, the OP's problem is that he wants to
> count bits from the high end, and Integers don't *have*
> a finite high end.

Exactly.

I want to work with a block of bits, today the best proxy for that is
using an Integer.

> Smalltalk/X has a BitArray class; see
> http://live.exept.de/ClassDoc/classDocOf:,BitArray
> for documentation.

This class basically does the job.

Thank you for the reference.

Esteban A. Maringolo

Reply | Threaded
Open this post in threaded view
|

Re: Bitwise operations in ByteArray (slicing, etc.)

Esteban A. Maringolo
In reply to this post by Stephane Ducasse-3
2018-03-05 14:02 GMT-03:00 Stephane Ducasse <[hidden email]>:

> On Sun, Mar 4, 2018 at 9:43 PM, Esteban A. Maringolo
> <[hidden email]> wrote:
>> 2018-03-04 17:15 GMT-03:00 Sven Van Caekenberghe <[hidden email]>:
>>> Bits are actually numbered from right to left (seen from how they are printed).
>>
>> I understand bit operations, used it extensively with IP address eons ago.
>>
>> But if a spec says: "Take the first n bits from the hash", it means
>> the first significant bits.
>> so in 2r100101111 the first 3 bits are "100" and not "111".
>
> naive question: why?

Because it says so.
"A checksum is generated by taking the first ENT / 32 bits of its
SHA256 hash. This checksum is appended to the end of the initial
entropy.
Next, these concatenated bits are split into groups of 11 bits, each
encoding a number from 0-2047, serving as an index into a wordlist.
Finally, we convert these numbers into words and use the joined words
as a mnemonic sentence." [1].

> To me it looks like a lousy specification.

It might be, I can't really tell.

But such manipulation could be useful if you are knitting different
parts of a binary packet whose boundaries are not at byte level, but
bit instead. So you can take "these 5 bits, concatenate with this
other 7, add 13 zero bits, then 1 followed by the payload". I'm
assuming a non real case here though, my use case was fulfilled
already.

Regards!

--
Esteban A. Maringolo

[1] https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki#generating-the-mnemonic

Reply | Threaded
Open this post in threaded view
|

Re: Bitwise operations in ByteArray (slicing, etc.)

Richard O'Keefe
I note that the specification in question does not
deal with arbitrary bit strings but with "entropies"
that are 128 to 256 bits long and a multiple of 32 bits.
4 to 8 bits are copied from the front to the end.
(So selecting *this* bit field can be done by taking
the first byte of a ByteArray.)  This makes the sequence
132 to 264 bits.  This is then chopped into 11 bit
subsequences.  These are not arbitrary subsequences and
they are not taken in arbitrary order.  They are a stream.

My own Smalltalk library include BitInputStream and
BitOutputStream, wrapping byte streams.  So we could
do something like

   ent := aByteArray size * 8.  "BIP-39 ENT"
   cs  := ent // 32.            "BIP-39 CS"
   foo := ByteArray new: (ent + cs) // 8.
   o   := BitOutputStream on: foo writeStream.
   i   := BitInputStream on: aByteArray readStream.
   1 to: ent do: [:x | o nextPut: i next].
   i reset.
   o nextUnsigned: cs put: (i nextUnsigned: cs).
   i close.
   o close.
   i := BitInputStream on: foo readStream.
   ans := (1 to: (ent + cs) // 11) collect: [:x |
      WordList at: 1 + (i nextUnsigned: 11)].

Stare at this for a bit, and you realise that you don't
actually need the working byte array foo.
ByteArray
  methods for: 'bitcoin'
    mnemonic
      |ent csn i t|
      (ent between: 128 and: 256)
        ifFalse: [self error: 'wrong size for BIP-39'].
      cs  := ent // 32.
      n   := (ent + cs) // 11.
      i   := BitInputStream on: (ReadStream on: self).
      t   := i nextUnsigned: cs.
      i reset.
      ^(1 to: n) collect: [:index |
        WordList at: 1 + (index = n
          ifTrue:  [((i nextUnsigned: 11 - cs) bitShift: cs) bitOr: t]
          ifFalse: [i nextUnsigned: 11])]

My BitInputStream and BitOutputStream classes are,
um, not really mature.  They aren't *completely*
naive, but they could be a lot better, and in
particular, BitInputStream>>nextUnsigned: and
BitOutputStream>>nextUnsigned:put: are definitely
suboptimal.  I put this out there just to suggest
that there is a completely different way of thinking
about the problem.  (Actually, this isn't *entirely*
unlike using Erlang bit syntax.)

Bit*Streams are useful enough to justify primitive
support.  (Which my classes don't have yet.  I did
say they are not mature...)

This reminds me of a lesson I learned many years
ago: STRINGS ARE WRONG.  (Thank you, designers of
Burroughs Extended Algol!)  When trees aren't the
answer, streams often are.




On 6 March 2018 at 07:21, Esteban A. Maringolo <[hidden email]> wrote:
2018-03-05 14:02 GMT-03:00 Stephane Ducasse <[hidden email]>:
> On Sun, Mar 4, 2018 at 9:43 PM, Esteban A. Maringolo
> <[hidden email]> wrote:
>> 2018-03-04 17:15 GMT-03:00 Sven Van Caekenberghe <[hidden email]>:
>>> Bits are actually numbered from right to left (seen from how they are printed).
>>
>> I understand bit operations, used it extensively with IP address eons ago.
>>
>> But if a spec says: "Take the first n bits from the hash", it means
>> the first significant bits.
>> so in 2r100101111 the first 3 bits are "100" and not "111".
>
> naive question: why?

Because it says so.
"A checksum is generated by taking the first ENT / 32 bits of its
SHA256 hash. This checksum is appended to the end of the initial
entropy.
Next, these concatenated bits are split into groups of 11 bits, each
encoding a number from 0-2047, serving as an index into a wordlist.
Finally, we convert these numbers into words and use the joined words
as a mnemonic sentence." [1].

> To me it looks like a lousy specification.

It might be, I can't really tell.

But such manipulation could be useful if you are knitting different
parts of a binary packet whose boundaries are not at byte level, but
bit instead. So you can take "these 5 bits, concatenate with this
other 7, add 13 zero bits, then 1 followed by the payload". I'm
assuming a non real case here though, my use case was fulfilled
already.

Regards!

--
Esteban A. Maringolo

[1] https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki#generating-the-mnemonic


Reply | Threaded
Open this post in threaded view
|

Re: Bitwise operations in ByteArray (slicing, etc.)

Esteban A. Maringolo
Hi Richard,

Certainly a BitStream is beyond what I initially thought.

But it could be of some use to play with it, is it available somewhere
with a friendly open source license?

I might give it a try using it for my generator (tests are already
passing). Your example is almost complete, but if I understand it
you're adding the size of the checksum instead of the checksum itself.
Nonetheless I think its readability is superior, and I guess that it
is better perfomance and memorywise.

Also it could be useful for Base58Encoder I'm expermenting with.
Encoding is "simple" but requires the use of a really large integer,
I'm stuck at the decoding part now.
After that there is a Base32 encoder (to implement a Bech32 encoder as well).

So I might use it in my road of experimentation with these matters.
Unless I diverge from this and abandon it as it normally happens. :/

Regarding this part:
> This reminds me of a lesson I learned many years
> ago: STRINGS ARE WRONG.  (Thank you, designers of
> Burroughs Extended Algol!)  When trees aren't the
> answer, streams often are.

Can you provide more context to this? I wouldn't mind if it is in a
separate thread or a blog post of your own.

Thanks in advance.

Esteban A. Maringolo

2018-03-05 21:55 GMT-03:00 Richard O'Keefe <[hidden email]>:

> I note that the specification in question does not
> deal with arbitrary bit strings but with "entropies"
> that are 128 to 256 bits long and a multiple of 32 bits.
> 4 to 8 bits are copied from the front to the end.
> (So selecting *this* bit field can be done by taking
> the first byte of a ByteArray.)  This makes the sequence
> 132 to 264 bits.  This is then chopped into 11 bit
> subsequences.  These are not arbitrary subsequences and
> they are not taken in arbitrary order.  They are a stream.
>
> My own Smalltalk library include BitInputStream and
> BitOutputStream, wrapping byte streams.  So we could
> do something like
>
>    ent := aByteArray size * 8.  "BIP-39 ENT"
>    cs  := ent // 32.            "BIP-39 CS"
>    foo := ByteArray new: (ent + cs) // 8.
>    o   := BitOutputStream on: foo writeStream.
>    i   := BitInputStream on: aByteArray readStream.
>    1 to: ent do: [:x | o nextPut: i next].
>    i reset.
>    o nextUnsigned: cs put: (i nextUnsigned: cs).
>    i close.
>    o close.
>    i := BitInputStream on: foo readStream.
>    ans := (1 to: (ent + cs) // 11) collect: [:x |
>       WordList at: 1 + (i nextUnsigned: 11)].
>
> Stare at this for a bit, and you realise that you don't
> actually need the working byte array foo.
> ByteArray
>   methods for: 'bitcoin'
>     mnemonic
>       |ent csn i t|
>       (ent between: 128 and: 256)
>         ifFalse: [self error: 'wrong size for BIP-39'].
>       cs  := ent // 32.
>       n   := (ent + cs) // 11.
>       i   := BitInputStream on: (ReadStream on: self).
>       t   := i nextUnsigned: cs.
>       i reset.
>       ^(1 to: n) collect: [:index |
>         WordList at: 1 + (index = n
>           ifTrue:  [((i nextUnsigned: 11 - cs) bitShift: cs) bitOr: t]
>           ifFalse: [i nextUnsigned: 11])]
>
> My BitInputStream and BitOutputStream classes are,
> um, not really mature.  They aren't *completely*
> naive, but they could be a lot better, and in
> particular, BitInputStream>>nextUnsigned: and
> BitOutputStream>>nextUnsigned:put: are definitely
> suboptimal.  I put this out there just to suggest
> that there is a completely different way of thinking
> about the problem.  (Actually, this isn't *entirely*
> unlike using Erlang bit syntax.)
>
> Bit*Streams are useful enough to justify primitive
> support.  (Which my classes don't have yet.  I did
> say they are not mature...)
>
> This reminds me of a lesson I learned many years
> ago: STRINGS ARE WRONG.  (Thank you, designers of
> Burroughs Extended Algol!)  When trees aren't the
> answer, streams often are.
>
>
>
>
> On 6 March 2018 at 07:21, Esteban A. Maringolo <[hidden email]> wrote:
>>
>> 2018-03-05 14:02 GMT-03:00 Stephane Ducasse <[hidden email]>:
>> > On Sun, Mar 4, 2018 at 9:43 PM, Esteban A. Maringolo
>> > <[hidden email]> wrote:
>> >> 2018-03-04 17:15 GMT-03:00 Sven Van Caekenberghe <[hidden email]>:
>> >>> Bits are actually numbered from right to left (seen from how they are
>> >>> printed).
>> >>
>> >> I understand bit operations, used it extensively with IP address eons
>> >> ago.
>> >>
>> >> But if a spec says: "Take the first n bits from the hash", it means
>> >> the first significant bits.
>> >> so in 2r100101111 the first 3 bits are "100" and not "111".
>> >
>> > naive question: why?
>>
>> Because it says so.
>> "A checksum is generated by taking the first ENT / 32 bits of its
>> SHA256 hash. This checksum is appended to the end of the initial
>> entropy.
>> Next, these concatenated bits are split into groups of 11 bits, each
>> encoding a number from 0-2047, serving as an index into a wordlist.
>> Finally, we convert these numbers into words and use the joined words
>> as a mnemonic sentence." [1].
>>
>> > To me it looks like a lousy specification.
>>
>> It might be, I can't really tell.
>>
>> But such manipulation could be useful if you are knitting different
>> parts of a binary packet whose boundaries are not at byte level, but
>> bit instead. So you can take "these 5 bits, concatenate with this
>> other 7, add 13 zero bits, then 1 followed by the payload". I'm
>> assuming a non real case here though, my use case was fulfilled
>> already.
>>
>> Regards!
>>
>> --
>> Esteban A. Maringolo
>>
>> [1]
>> https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki#generating-the-mnemonic
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: Bitwise operations in ByteArray (slicing, etc.)

Richard O'Keefe
Re "Strings are wrong", the influences on my thinking went like this.

(1) As an undergraduate, I did exercises with the Euler compiler
    (written in Burroughs Algol) and the XPL compiler (written in XPL).
    XPL had immutable strings, rather like AWK or Java, and a PL/I-ish
    set of operations on them.  Burroughs Algol didn't have strings,
    but did have pointers, that you could use to stream through arrays.
    It turned out to be notably easier to do tokenising in Algol than
    in XPL.

(2) A friend of mine, while a masters student, made some pocket money
    by writing a "batch editor" (same application domain as sed, but
    different command set) for IBM mainframes under MVS, which he did
    using PL/I.  PL/I fans used to point to its string operations as
    a good collection, but he had to fight PL/I every step of the way
    to get the string operations he needed.  When I met UNIX V7 on a
    PDP-11, it took me 15 pages of C to accomplish what he needed
    more than 50 pages of PL/I to do.  And my editor had undo...
    The trick was being able to make just the right text abstractions
    that I needed.

(3) A company I worked at had a program that generated assembly code
    for several different machines from a common specification.  The
    original version manipulated instructions as *trees*.  A new hire
    was asked to write a new generator because the old one was
    getting harder to maintain and we wanted it faster.  His program
    represented instructions as *strings*.  It turned out to be
    slower and harder to maintain than the original.

(4) I encountered Eugene Myers' AVL DAG data structure (which quite
    efficiently lets you maintain many versions of a large text) and
    the Rope data structure (basically a lazy concatenation tree)
    that accompanies the Boehm garbage collector.  Both of them get
    a lot of efficiency out of NOT doing concatenations.

(5) Learning Java, I found that Java had taken a bullet to the head
    which Smalltalk had dodged, and which had been very clearly
    explained in the coloured books.  Consider the selectors
    #printOn: and #printString.  There are two ways to implement
    them: the smart (Smalltalk) and the spectacularly stupid (Java).
    Smart way:  #printOn: is basic, #printString is derived.
    Stupid way: #printString is basic and #printOn: is derived.
    Why is this smart?  If you are generating several MB of text
    *in order to write it to an external sink*, then #printOn:
    does the job very efficiently.  But #printString not only
    turns over large amounts of memory, giving the GC more work
    than it really needs, it is fatally easy to fall into
    quadratic time behaviour.

(5) There's the insufficiently famous epigram by Alan Perlis:
    "The string is a stark data structure,
     and everywhere it is passed there is much duplication of process.
     It is a perfect vehicle for hiding information."
    One way to understand this is that it's quite easy to put
    information *into* a string, but it can be quite hard to get
    it out again.  I have even run into examples where the
    getting-it-out-reliably part was accidentally impossible.
    What the heck, let's take Dates as an example (not of impossibility).
    What does '180612' mean?  What does dateString copyFrom: 3 to: 4
    give you?  But aData monthNumber is hard to misunderstand.

(6) Then there's the whole scripting attack (Little Bobby Tables)
    thing.

The rule of thumb that I use is that if you are just accepting a
datum, maybe stashing it somewhere, and eventually giving it back,
then a string is fine.  For example, considering the huge variety
in the way people's names may be structured, strings are pretty
much the ideal way to represent them.  You *shouldn't* be extracting
"the middle initial" because some people don't *have* one and some
people have more than one.  (I am sick of my name being rejected as
invalid by American-designed programs that insist that a surname
cannot have an apostrophe or two capital letters.  No Irish need
apply.)  But if you want to process structured information, you
want to get the data out of string form and into some kind of tree
as soon as you can.  In fact, you don't want a string form in the
first place.  I have seen people manipulating XML with string
operations, and it's not a pretty sight.

And when we're talking about something like (REAL) object-oriented
programming in a language like Pharo, we really should be thinking
in terms of objects other than strings. For example, if we take
VMS as an example, [.FOO]BAR.TXT;1 and <.FOO>BAR.TXT.1 are the
same *filename* while they are different *strings*, and resolving
the filename relative to DSK:[ICK.ACK] isn't terribly similar to
string concatenation.  (Alphabetic case?  Let's not go there.)
This is precisely why there are classes like FileLocator and Path.



On 6 March 2018 at 16:33, Esteban A. Maringolo <[hidden email]> wrote:
Hi Richard,

Certainly a BitStream is beyond what I initially thought.

But it could be of some use to play with it, is it available somewhere
with a friendly open source license?

I might give it a try using it for my generator (tests are already
passing). Your example is almost complete, but if I understand it
you're adding the size of the checksum instead of the checksum itself.
Nonetheless I think its readability is superior, and I guess that it
is better perfomance and memorywise.

Also it could be useful for Base58Encoder I'm expermenting with.
Encoding is "simple" but requires the use of a really large integer,
I'm stuck at the decoding part now.
After that there is a Base32 encoder (to implement a Bech32 encoder as well).

So I might use it in my road of experimentation with these matters.
Unless I diverge from this and abandon it as it normally happens. :/

Regarding this part:
> This reminds me of a lesson I learned many years
> ago: STRINGS ARE WRONG.  (Thank you, designers of
> Burroughs Extended Algol!)  When trees aren't the
> answer, streams often are.

Can you provide more context to this? I wouldn't mind if it is in a
separate thread or a blog post of your own.

Thanks in advance.

Esteban A. Maringolo

2018-03-05 21:55 GMT-03:00 Richard O'Keefe <[hidden email]>:
> I note that the specification in question does not
> deal with arbitrary bit strings but with "entropies"
> that are 128 to 256 bits long and a multiple of 32 bits.
> 4 to 8 bits are copied from the front to the end.
> (So selecting *this* bit field can be done by taking
> the first byte of a ByteArray.)  This makes the sequence
> 132 to 264 bits.  This is then chopped into 11 bit
> subsequences.  These are not arbitrary subsequences and
> they are not taken in arbitrary order.  They are a stream.
>
> My own Smalltalk library include BitInputStream and
> BitOutputStream, wrapping byte streams.  So we could
> do something like
>
>    ent := aByteArray size * 8.  "BIP-39 ENT"
>    cs  := ent // 32.            "BIP-39 CS"
>    foo := ByteArray new: (ent + cs) // 8.
>    o   := BitOutputStream on: foo writeStream.
>    i   := BitInputStream on: aByteArray readStream.
>    1 to: ent do: [:x | o nextPut: i next].
>    i reset.
>    o nextUnsigned: cs put: (i nextUnsigned: cs).
>    i close.
>    o close.
>    i := BitInputStream on: foo readStream.
>    ans := (1 to: (ent + cs) // 11) collect: [:x |
>       WordList at: 1 + (i nextUnsigned: 11)].
>
> Stare at this for a bit, and you realise that you don't
> actually need the working byte array foo.
> ByteArray
>   methods for: 'bitcoin'
>     mnemonic
>       |ent csn i t|
>       (ent between: 128 and: 256)
>         ifFalse: [self error: 'wrong size for BIP-39'].
>       cs  := ent // 32.
>       n   := (ent + cs) // 11.
>       i   := BitInputStream on: (ReadStream on: self).
>       t   := i nextUnsigned: cs.
>       i reset.
>       ^(1 to: n) collect: [:index |
>         WordList at: 1 + (index = n
>           ifTrue:  [((i nextUnsigned: 11 - cs) bitShift: cs) bitOr: t]
>           ifFalse: [i nextUnsigned: 11])]
>
> My BitInputStream and BitOutputStream classes are,
> um, not really mature.  They aren't *completely*
> naive, but they could be a lot better, and in
> particular, BitInputStream>>nextUnsigned: and
> BitOutputStream>>nextUnsigned:put: are definitely
> suboptimal.  I put this out there just to suggest
> that there is a completely different way of thinking
> about the problem.  (Actually, this isn't *entirely*
> unlike using Erlang bit syntax.)
>
> Bit*Streams are useful enough to justify primitive
> support.  (Which my classes don't have yet.  I did
> say they are not mature...)
>
> This reminds me of a lesson I learned many years
> ago: STRINGS ARE WRONG.  (Thank you, designers of
> Burroughs Extended Algol!)  When trees aren't the
> answer, streams often are.
>
>
>
>
> On 6 March 2018 at 07:21, Esteban A. Maringolo <[hidden email]> wrote:
>>
>> 2018-03-05 14:02 GMT-03:00 Stephane Ducasse <[hidden email]>:
>> > On Sun, Mar 4, 2018 at 9:43 PM, Esteban A. Maringolo
>> > <[hidden email]> wrote:
>> >> 2018-03-04 17:15 GMT-03:00 Sven Van Caekenberghe <[hidden email]>:
>> >>> Bits are actually numbered from right to left (seen from how they are
>> >>> printed).
>> >>
>> >> I understand bit operations, used it extensively with IP address eons
>> >> ago.
>> >>
>> >> But if a spec says: "Take the first n bits from the hash", it means
>> >> the first significant bits.
>> >> so in 2r100101111 the first 3 bits are "100" and not "111".
>> >
>> > naive question: why?
>>
>> Because it says so.
>> "A checksum is generated by taking the first ENT / 32 bits of its
>> SHA256 hash. This checksum is appended to the end of the initial
>> entropy.
>> Next, these concatenated bits are split into groups of 11 bits, each
>> encoding a number from 0-2047, serving as an index into a wordlist.
>> Finally, we convert these numbers into words and use the joined words
>> as a mnemonic sentence." [1].
>>
>> > To me it looks like a lousy specification.
>>
>> It might be, I can't really tell.
>>
>> But such manipulation could be useful if you are knitting different
>> parts of a binary packet whose boundaries are not at byte level, but
>> bit instead. So you can take "these 5 bits, concatenate with this
>> other 7, add 13 zero bits, then 1 followed by the payload". I'm
>> assuming a non real case here though, my use case was fulfilled
>> already.
>>
>> Regards!
>>
>> --
>> Esteban A. Maringolo
>>
>> [1]
>> https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki#generating-the-mnemonic
>>
>


Reply | Threaded
Open this post in threaded view
|

Re: Bitwise operations in ByteArray (slicing, etc.)

Richard Sargent
Administrator
+100.

Strings are "dead bits", lacking behaviour specific to what they represent. Your date example is an excellent portrayal of that. They should never be used for holding structured data.

Think of strings (dead bits) in much the same way we think of paper (dead trees).




On Mar 6, 2018 05:46, "Richard O'Keefe" <[hidden email]> wrote:
Re "Strings are wrong", the influences on my thinking went like this.

(1) As an undergraduate, I did exercises with the Euler compiler
    (written in Burroughs Algol) and the XPL compiler (written in XPL).
    XPL had immutable strings, rather like AWK or Java, and a PL/I-ish
    set of operations on them.  Burroughs Algol didn't have strings,
    but did have pointers, that you could use to stream through arrays.
    It turned out to be notably easier to do tokenising in Algol than
    in XPL.

(2) A friend of mine, while a masters student, made some pocket money
    by writing a "batch editor" (same application domain as sed, but
    different command set) for IBM mainframes under MVS, which he did
    using PL/I.  PL/I fans used to point to its string operations as
    a good collection, but he had to fight PL/I every step of the way
    to get the string operations he needed.  When I met UNIX V7 on a
    PDP-11, it took me 15 pages of C to accomplish what he needed
    more than 50 pages of PL/I to do.  And my editor had undo...
    The trick was being able to make just the right text abstractions
    that I needed.

(3) A company I worked at had a program that generated assembly code
    for several different machines from a common specification.  The
    original version manipulated instructions as *trees*.  A new hire
    was asked to write a new generator because the old one was
    getting harder to maintain and we wanted it faster.  His program
    represented instructions as *strings*.  It turned out to be
    slower and harder to maintain than the original.

(4) I encountered Eugene Myers' AVL DAG data structure (which quite
    efficiently lets you maintain many versions of a large text) and
    the Rope data structure (basically a lazy concatenation tree)
    that accompanies the Boehm garbage collector.  Both of them get
    a lot of efficiency out of NOT doing concatenations.

(5) Learning Java, I found that Java had taken a bullet to the head
    which Smalltalk had dodged, and which had been very clearly
    explained in the coloured books.  Consider the selectors
    #printOn: and #printString.  There are two ways to implement
    them: the smart (Smalltalk) and the spectacularly stupid (Java).
    Smart way:  #printOn: is basic, #printString is derived.
    Stupid way: #printString is basic and #printOn: is derived.
    Why is this smart?  If you are generating several MB of text
    *in order to write it to an external sink*, then #printOn:
    does the job very efficiently.  But #printString not only
    turns over large amounts of memory, giving the GC more work
    than it really needs, it is fatally easy to fall into
    quadratic time behaviour.

(5) There's the insufficiently famous epigram by Alan Perlis:
    "The string is a stark data structure,
     and everywhere it is passed there is much duplication of process.
     It is a perfect vehicle for hiding information."
    One way to understand this is that it's quite easy to put
    information *into* a string, but it can be quite hard to get
    it out again.  I have even run into examples where the
    getting-it-out-reliably part was accidentally impossible.
    What the heck, let's take Dates as an example (not of impossibility).
    What does '180612' mean?  What does dateString copyFrom: 3 to: 4
    give you?  But aData monthNumber is hard to misunderstand.

(6) Then there's the whole scripting attack (Little Bobby Tables)
    thing.

The rule of thumb that I use is that if you are just accepting a
datum, maybe stashing it somewhere, and eventually giving it back,
then a string is fine.  For example, considering the huge variety
in the way people's names may be structured, strings are pretty
much the ideal way to represent them.  You *shouldn't* be extracting
"the middle initial" because some people don't *have* one and some
people have more than one.  (I am sick of my name being rejected as
invalid by American-designed programs that insist that a surname
cannot have an apostrophe or two capital letters.  No Irish need
apply.)  But if you want to process structured information, you
want to get the data out of string form and into some kind of tree
as soon as you can.  In fact, you don't want a string form in the
first place.  I have seen people manipulating XML with string
operations, and it's not a pretty sight.

And when we're talking about something like (REAL) object-oriented
programming in a language like Pharo, we really should be thinking
in terms of objects other than strings. For example, if we take
VMS as an example, [.FOO]BAR.TXT;1 and <.FOO>BAR.TXT.1 are the
same *filename* while they are different *strings*, and resolving
the filename relative to DSK:[ICK.ACK] isn't terribly similar to
string concatenation.  (Alphabetic case?  Let's not go there.)
This is precisely why there are classes like FileLocator and Path.



On 6 March 2018 at 16:33, Esteban A. Maringolo <[hidden email]> wrote:
Hi Richard,

Certainly a BitStream is beyond what I initially thought.

But it could be of some use to play with it, is it available somewhere
with a friendly open source license?

I might give it a try using it for my generator (tests are already
passing). Your example is almost complete, but if I understand it
you're adding the size of the checksum instead of the checksum itself.
Nonetheless I think its readability is superior, and I guess that it
is better perfomance and memorywise.

Also it could be useful for Base58Encoder I'm expermenting with.
Encoding is "simple" but requires the use of a really large integer,
I'm stuck at the decoding part now.
After that there is a Base32 encoder (to implement a Bech32 encoder as well).

So I might use it in my road of experimentation with these matters.
Unless I diverge from this and abandon it as it normally happens. :/

Regarding this part:
> This reminds me of a lesson I learned many years
> ago: STRINGS ARE WRONG.  (Thank you, designers of
> Burroughs Extended Algol!)  When trees aren't the
> answer, streams often are.

Can you provide more context to this? I wouldn't mind if it is in a
separate thread or a blog post of your own.

Thanks in advance.

Esteban A. Maringolo

2018-03-05 21:55 GMT-03:00 Richard O'Keefe <[hidden email]>:
> I note that the specification in question does not
> deal with arbitrary bit strings but with "entropies"
> that are 128 to 256 bits long and a multiple of 32 bits.
> 4 to 8 bits are copied from the front to the end.
> (So selecting *this* bit field can be done by taking
> the first byte of a ByteArray.)  This makes the sequence
> 132 to 264 bits.  This is then chopped into 11 bit
> subsequences.  These are not arbitrary subsequences and
> they are not taken in arbitrary order.  They are a stream.
>
> My own Smalltalk library include BitInputStream and
> BitOutputStream, wrapping byte streams.  So we could
> do something like
>
>    ent := aByteArray size * 8.  "BIP-39 ENT"
>    cs  := ent // 32.            "BIP-39 CS"
>    foo := ByteArray new: (ent + cs) // 8.
>    o   := BitOutputStream on: foo writeStream.
>    i   := BitInputStream on: aByteArray readStream.
>    1 to: ent do: [:x | o nextPut: i next].
>    i reset.
>    o nextUnsigned: cs put: (i nextUnsigned: cs).
>    i close.
>    o close.
>    i := BitInputStream on: foo readStream.
>    ans := (1 to: (ent + cs) // 11) collect: [:x |
>       WordList at: 1 + (i nextUnsigned: 11)].
>
> Stare at this for a bit, and you realise that you don't
> actually need the working byte array foo.
> ByteArray
>   methods for: 'bitcoin'
>     mnemonic
>       |ent csn i t|
>       (ent between: 128 and: 256)
>         ifFalse: [self error: 'wrong size for BIP-39'].
>       cs  := ent // 32.
>       n   := (ent + cs) // 11.
>       i   := BitInputStream on: (ReadStream on: self).
>       t   := i nextUnsigned: cs.
>       i reset.
>       ^(1 to: n) collect: [:index |
>         WordList at: 1 + (index = n
>           ifTrue:  [((i nextUnsigned: 11 - cs) bitShift: cs) bitOr: t]
>           ifFalse: [i nextUnsigned: 11])]
>
> My BitInputStream and BitOutputStream classes are,
> um, not really mature.  They aren't *completely*
> naive, but they could be a lot better, and in
> particular, BitInputStream>>nextUnsigned: and
> BitOutputStream>>nextUnsigned:put: are definitely
> suboptimal.  I put this out there just to suggest
> that there is a completely different way of thinking
> about the problem.  (Actually, this isn't *entirely*
> unlike using Erlang bit syntax.)
>
> Bit*Streams are useful enough to justify primitive
> support.  (Which my classes don't have yet.  I did
> say they are not mature...)
>
> This reminds me of a lesson I learned many years
> ago: STRINGS ARE WRONG.  (Thank you, designers of
> Burroughs Extended Algol!)  When trees aren't the
> answer, streams often are.
>
>
>
>
> On 6 March 2018 at 07:21, Esteban A. Maringolo <[hidden email]> wrote:
>>
>> 2018-03-05 14:02 GMT-03:00 Stephane Ducasse <[hidden email]>:
>> > On Sun, Mar 4, 2018 at 9:43 PM, Esteban A. Maringolo
>> > <[hidden email]> wrote:
>> >> 2018-03-04 17:15 GMT-03:00 Sven Van Caekenberghe <[hidden email]>:
>> >>> Bits are actually numbered from right to left (seen from how they are
>> >>> printed).
>> >>
>> >> I understand bit operations, used it extensively with IP address eons
>> >> ago.
>> >>
>> >> But if a spec says: "Take the first n bits from the hash", it means
>> >> the first significant bits.
>> >> so in 2r100101111 the first 3 bits are "100" and not "111".
>> >
>> > naive question: why?
>>
>> Because it says so.
>> "A checksum is generated by taking the first ENT / 32 bits of its
>> SHA256 hash. This checksum is appended to the end of the initial
>> entropy.
>> Next, these concatenated bits are split into groups of 11 bits, each
>> encoding a number from 0-2047, serving as an index into a wordlist.
>> Finally, we convert these numbers into words and use the joined words
>> as a mnemonic sentence." [1].
>>
>> > To me it looks like a lousy specification.
>>
>> It might be, I can't really tell.
>>
>> But such manipulation could be useful if you are knitting different
>> parts of a binary packet whose boundaries are not at byte level, but
>> bit instead. So you can take "these 5 bits, concatenate with this
>> other 7, add 13 zero bits, then 1 followed by the payload". I'm
>> assuming a non real case here though, my use case was fulfilled
>> already.
>>
>> Regards!
>>
>> --
>> Esteban A. Maringolo
>>
>> [1]
>> https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki#generating-the-mnemonic
>>
>


Reply | Threaded
Open this post in threaded view
|

Re: Bitwise operations in ByteArray (slicing, etc.)

Henrik Sperre Johansen
In reply to this post by Esteban A. Maringolo
Esteban A. Maringolo wrote

> Hi Richard,
>
> Certainly a BitStream is beyond what I initially thought.
>
> But it could be of some use to play with it, is it available somewhere
> with a friendly open source license?
>
> I might give it a try using it for my generator (tests are already
> passing). Your example is almost complete, but if I understand it
> you're adding the size of the checksum instead of the checksum itself.
> Nonetheless I think its readability is superior, and I guess that it
> is better perfomance and memorywise.
>
> Also it could be useful for Base58Encoder I'm expermenting with.
> Encoding is "simple" but requires the use of a really large integer,
> I'm stuck at the decoding part now.
> After that there is a Base32 encoder (to implement a Bech32 encoder as
> well).
>
> So I might use it in my road of experimentation with these matters.
> Unless I diverge from this and abandon it as it normally happens. :/

BaseX encode/decode can be simple enough to fit in a small workspace:

base := 58.
base58Lookup :=
'123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'.
myNumber := 12345678901234567890.

decode := [ :string | |decodedNumber|
        decodedNumber := 0.
        string do: [ :char | decodedNumber := decodedNumber * base +(base58Lookup
indexOf: char) - 1 ].
        decodedNumber ].
encode := [ :number | |encodedString numDigits toEncode|
        numDigits := (number numberOfDigitsInBase: base).
        toEncode := number.
        encodedString := String new: numDigits.
        0 to: numDigits - 1 do: [ :i | |lutIndex|
                encodedString at: numDigits - i put: (base58Lookup at: toEncode \\ base +
1).
                toEncode := toEncode // base. ].
        encodedString ].

myString := encode value: myNumber.
(decode value: myString) = myNumber.

The appeal of Base64 is it transforms the // and \\ operations into simple
shifts / masks.

Cheers,
Henry



--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html

Reply | Threaded
Open this post in threaded view
|

Re: Bitwise operations in ByteArray (slicing, etc.)

Esteban A. Maringolo
2018-03-06 12:06 GMT-03:00 Henrik Sperre Johansen
<[hidden email]>:
> Esteban A. Maringolo wrote

> BaseX encode/decode can be simple enough to fit in a small workspace:
>
> base := 58.
> base58Lookup :=
> '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'.
> myNumber := 12345678901234567890.
>
> myString := encode value: myNumber.
> (decode value: myString) = myNumber.

If you're encoding numbers to strings, and viceversa, that's ok.
If you also need to support encoding strings (as bytearrays) that
didn't work for me.

I like your approach, it was more concise and didn't require reversing
the output as the one i used [1]

> The appeal of Base64 is it transforms the // and \\ operations into simple
> shifts / masks.

I guess, every bit and byte operation is simpler when happens in
multiples of 8 :)

Regards!


[1] https://github.com/eMaringolo/pharo-base58/blob/master/src/Base58-Core.package/Base58Encoder.class/instance/encode..st


Esteban A. Maringolo