Is there any package/library that makes bitwise operations as simple
as with an Integer, but for larger numbers (as in a ByteArray). Something that allows me to "slice" a sequence of bits, or extract some using the same protocol as with a String of ones and zeros. Now when I need to work with sequence of bits, I convert an Integer to a zero padded version of it up a known size, and then do #copyFrom:to: to extract what I need and read back the number from it. I could use a bytearray for it, but as its name implies, it is oriented towards bytes rather than bits (as in the case of Integer). Now I do stuff like the following to to extract the first 5 bits of a fixed length 256 bit array (an Integer). Integer readFrom: (((SHA256 hashMessage: message)) asInteger printStringBase: 2 length: 256 padded: true) copyFrom: 1 to: 5) base: 2 I understand bitwise operations, but I couldn't find something that does the above in a conciser way. Performance in my case isn't critical, but working with strings is probably two orders of magnitude slower than manipulating bits in integers or ByteArrays Regards, Esteban A. Maringolo |
Take a 24-bit number and you want to isolate the first 5 (these are actually the last, higher order) bits.
n := 2r101010001111000011110000. n >> (16+3). If necessary, you can apply a mask (assume there are bits earlier/later still). (n >> (16+3)) bitAnd: 2r11111 Large integers behave as bit strings, see the 'bit manipulation' protocol, and are efficient at it. > On 4 Mar 2018, at 20:29, Esteban A. Maringolo <[hidden email]> wrote: > > Is there any package/library that makes bitwise operations as simple > as with an Integer, but for larger numbers (as in a ByteArray). > > Something that allows me to "slice" a sequence of bits, or extract > some using the same protocol as with a String of ones and zeros. > > Now when I need to work with sequence of bits, I convert an Integer to > a zero padded version of it up a known size, and then do #copyFrom:to: > to extract what I need and read back the number from it. > > I could use a bytearray for it, but as its name implies, it is > oriented towards bytes rather than bits (as in the case of Integer). > > Now I do stuff like the following to to extract the first 5 bits of a > fixed length 256 bit array (an Integer). > > Integer > readFrom: > (((SHA256 hashMessage: message)) asInteger > printStringBase: 2 length: 256 padded: true) > copyFrom: 1 to: 5) > base: 2 > > I understand bitwise operations, but I couldn't find something that > does the above in a conciser way. > > Performance in my case isn't critical, but working with strings is > probably two orders of magnitude slower than manipulating bits in > integers or ByteArrays > > Regards, > > Esteban A. Maringolo > |
I do bitshifts and masks on Integer.
What if you want to take the bits from the 3rd to the 7th? You have to do a some arithmetic to get the slice you want. I'm simply asking for something more dev-friendlier that adds a "layer" on top of that, but that internally does regular bitwise. What I don't like about Integers is that you "lose" information about the zero bits to the left, and with a ByteArray you don't, because the array is fixed size. E.g. (1 << 16) printStringBase: 16. "'10000'" #[1 0 0] hex "'010000'" Maybe I'm too lazy asking when I could have done it myself :) Regards, Esteban A. Maringolo 2018-03-04 16:40 GMT-03:00 Sven Van Caekenberghe <[hidden email]>: > Take a 24-bit number and you want to isolate the first 5 (these are actually the last, higher order) bits. > > n := 2r101010001111000011110000. > n >> (16+3). > > If necessary, you can apply a mask (assume there are bits earlier/later still). > > (n >> (16+3)) bitAnd: 2r11111 > > Large integers behave as bit strings, see the 'bit manipulation' protocol, and are efficient at it. > >> On 4 Mar 2018, at 20:29, Esteban A. Maringolo <[hidden email]> wrote: >> >> Is there any package/library that makes bitwise operations as simple >> as with an Integer, but for larger numbers (as in a ByteArray). >> >> Something that allows me to "slice" a sequence of bits, or extract >> some using the same protocol as with a String of ones and zeros. >> >> Now when I need to work with sequence of bits, I convert an Integer to >> a zero padded version of it up a known size, and then do #copyFrom:to: >> to extract what I need and read back the number from it. >> >> I could use a bytearray for it, but as its name implies, it is >> oriented towards bytes rather than bits (as in the case of Integer). >> >> Now I do stuff like the following to to extract the first 5 bits of a >> fixed length 256 bit array (an Integer). >> >> Integer >> readFrom: >> (((SHA256 hashMessage: message)) asInteger >> printStringBase: 2 length: 256 padded: true) >> copyFrom: 1 to: 5) >> base: 2 >> >> I understand bitwise operations, but I couldn't find something that >> does the above in a conciser way. >> >> Performance in my case isn't critical, but working with strings is >> probably two orders of magnitude slower than manipulating bits in >> integers or ByteArrays >> >> Regards, >> >> Esteban A. Maringolo >> > > |
Bits are actually numbered from right to left (seen from how they are printed).
But in a fixed amount of bits you could indeed number them the other way around. It does not matter that you write the leading zeros or not, everything to the left is zero anyway. When shifting, the right thing happens. Adding something like Integer>>#bitSliceFrom:to:size: based on your reverse numbering would be wrong, IMHO. I guess you could write it like n := 2r11110000101010001111000011110000. from := 8. to := 13. size := 32. (n >> (size - 13)) bitAnd: ((1 << ( to - from + 1)) - 1). But the size feels out of place. > On 4 Mar 2018, at 20:57, Esteban A. Maringolo <[hidden email]> wrote: > > I do bitshifts and masks on Integer. > > What if you want to take the bits from the 3rd to the 7th? You have to > do a some arithmetic to get the slice you want. > I'm simply asking for something more dev-friendlier that adds a > "layer" on top of that, but that internally does regular bitwise. > > What I don't like about Integers is that you "lose" information about > the zero bits to the left, and with a ByteArray you don't, because the > array is fixed size. > > E.g. > (1 << 16) printStringBase: 16. "'10000'" > #[1 0 0] hex "'010000'" > > Maybe I'm too lazy asking when I could have done it myself :) > > Regards, > > Esteban A. Maringolo > > > 2018-03-04 16:40 GMT-03:00 Sven Van Caekenberghe <[hidden email]>: >> Take a 24-bit number and you want to isolate the first 5 (these are actually the last, higher order) bits. >> >> n := 2r101010001111000011110000. >> n >> (16+3). >> >> If necessary, you can apply a mask (assume there are bits earlier/later still). >> >> (n >> (16+3)) bitAnd: 2r11111 >> >> Large integers behave as bit strings, see the 'bit manipulation' protocol, and are efficient at it. >> >>> On 4 Mar 2018, at 20:29, Esteban A. Maringolo <[hidden email]> wrote: >>> >>> Is there any package/library that makes bitwise operations as simple >>> as with an Integer, but for larger numbers (as in a ByteArray). >>> >>> Something that allows me to "slice" a sequence of bits, or extract >>> some using the same protocol as with a String of ones and zeros. >>> >>> Now when I need to work with sequence of bits, I convert an Integer to >>> a zero padded version of it up a known size, and then do #copyFrom:to: >>> to extract what I need and read back the number from it. >>> >>> I could use a bytearray for it, but as its name implies, it is >>> oriented towards bytes rather than bits (as in the case of Integer). >>> >>> Now I do stuff like the following to to extract the first 5 bits of a >>> fixed length 256 bit array (an Integer). >>> >>> Integer >>> readFrom: >>> (((SHA256 hashMessage: message)) asInteger >>> printStringBase: 2 length: 256 padded: true) >>> copyFrom: 1 to: 5) >>> base: 2 >>> >>> I understand bitwise operations, but I couldn't find something that >>> does the above in a conciser way. >>> >>> Performance in my case isn't critical, but working with strings is >>> probably two orders of magnitude slower than manipulating bits in >>> integers or ByteArrays >>> >>> Regards, >>> >>> Esteban A. Maringolo >>> >> >> > |
2018-03-04 17:15 GMT-03:00 Sven Van Caekenberghe <[hidden email]>:
> Bits are actually numbered from right to left (seen from how they are printed). I understand bit operations, used it extensively with IP address eons ago. But if a spec says: "Take the first n bits from the hash", it means the first significant bits. so in 2r100101111 the first 3 bits are "100" and not "111". > But in a fixed amount of bits you could indeed number them the other way around. > It does not matter that you write the leading zeros or not, everything to the left is zero anyway. When shifting, the right thing happens. If it were a packet, with a fixed size, and the header started with 0, then the leading zeros matter. So in my use case it isn't the same "00000000000000000000000000000000" than "000000000000000000000000000000000000000000000000" (both in hex notation). See [1] > Adding something like Integer>>#bitSliceFrom:to:size: based on your reverse numbering would be wrong, IMHO. Yes, it would be wrong. Unless you have to option to specify the endianess you want to use. > I guess you could write it like > > n := 2r11110000101010001111000011110000. > from := 8. > to := 13. > size := 32. > (n >> (size - 13)) bitAnd: ((1 << ( to - from + 1)) - 1). > > But the size feels out of place. How so? Thanks in advance, this is super low priority, I just wanted to check whether something like that already existed. Regards! [1] https://github.com/eMaringolo/pharo-bip39mnemonic/blob/master/src/BIP39Mnemonic-Core.package/BIP39MnemonicTest.class/instance/jsonTestVectors.st Esteban A. Maringolo |
Administrator
|
Please, if you do implement something, start with a Collection class and call it something like BitArray. It might be appropriate to subclass under ByteArray, but perhaps not. On Mar 4, 2018 12:45 PM, "Esteban A. Maringolo" <[hidden email]> wrote: 2018-03-04 17:15 GMT-03:00 Sven Van Caekenberghe <[hidden email]>: |
In reply to this post by Esteban A. Maringolo
Hi Esteban, When you say "extract", what exactly do you mean? Let's suppose you have 86, i.e. 2r1010110 Which case do you want? a) 2r1010110 "2 to 5 = 2r10[1011]0 = 2r1011 = 11" or b) 2r1010110 "2 to 5 = 2r10[1011]0 = 2r00[1011]0 = 2r0010110 = 22" Both cases can be dealt with bit shifting and a bitAnd with optimal performance. ----------------- Benoît St-Jean Yahoo! Messenger: bstjean Twitter: @BenLeChialeux Pinterest: benoitstjean Instagram: Chef_Benito IRC: lamneth Blogue: endormitoire.wordpress.com "A standpoint is an intellectual horizon of radius zero". (A. Einstein)
On Sunday, March 4, 2018, 2:31:21 PM EST, Esteban A. Maringolo <[hidden email]> wrote:
Is there any package/library that makes bitwise operations as simple as with an Integer, but for larger numbers (as in a ByteArray). Something that allows me to "slice" a sequence of bits, or extract some using the same protocol as with a String of ones and zeros. Now when I need to work with sequence of bits, I convert an Integer to a zero padded version of it up a known size, and then do #copyFrom:to: to extract what I need and read back the number from it. I could use a bytearray for it, but as its name implies, it is oriented towards bytes rather than bits (as in the case of Integer). Now I do stuff like the following to to extract the first 5 bits of a fixed length 256 bit array (an Integer). Integer readFrom: (((SHA256 hashMessage: message)) asInteger printStringBase: 2 length: 256 padded: true) copyFrom: 1 to: 5) base: 2 I understand bitwise operations, but I couldn't find something that does the above in a conciser way. Performance in my case isn't critical, but working with strings is probably two orders of magnitude slower than manipulating bits in integers or ByteArrays Regards, Esteban A. Maringolo |
In reply to this post by Esteban A. Maringolo
Just put that in Integer. ---------------------------- bitsSliceFrom: start to: end " Extract bits 4 to 6, i.e. 111 which equals to 7 2r10000111000 bitsSliceFrom: 4 to: 6 " | num mask | self < 0 ifTrue: [self error: 'This operation is not allowed for negative numbers!']. num := self >> (start - 1). mask := (1 << ((end - start) + 1)) - 1. ^num bitAnd: mask. ----------------------------- bitsSliceInPlaceFrom: start to: end " Extract bits 4 to 6, IN PLACE, i.e. 111 which equals to 2r00000111000 = 56 2r10000111000 bitsSliceInPlaceFrom: 4 to: 6 " | | ^(self bitsSliceFrom: start to: end) << (start - 1) ---------------------------- This should do the job! hth ----------------- Benoît St-Jean Yahoo! Messenger: bstjean Twitter: @BenLeChialeux Pinterest: benoitstjean Instagram: Chef_Benito IRC: lamneth Blogue: endormitoire.wordpress.com "A standpoint is an intellectual horizon of radius zero". (A. Einstein)
On Sunday, March 4, 2018, 2:59:16 PM EST, Esteban A. Maringolo <[hidden email]> wrote:
I do bitshifts and masks on Integer. What if you want to take the bits from the 3rd to the 7th? You have to do a some arithmetic to get the slice you want. I'm simply asking for something more dev-friendlier that adds a "layer" on top of that, but that internally does regular bitwise. What I don't like about Integers is that you "lose" information about the zero bits to the left, and with a ByteArray you don't, because the array is fixed size. E.g. (1 << 16) printStringBase: 16. "'10000'" #[1 0 0] hex "'010000'" Maybe I'm too lazy asking when I could have done it myself :) Regards, Esteban A. Maringolo 2018-03-04 16:40 GMT-03:00 Sven Van Caekenberghe <[hidden email]>: > Take a 24-bit number and you want to isolate the first 5 (these are actually the last, higher order) bits. > > n := 2r101010001111000011110000. > n >> (16+3). > > If necessary, you can apply a mask (assume there are bits earlier/later still). > > (n >> (16+3)) bitAnd: 2r11111 > > Large integers behave as bit strings, see the 'bit manipulation' protocol, and are efficient at it. > >> On 4 Mar 2018, at 20:29, Esteban A. Maringolo <[hidden email]> wrote: >> >> Is there any package/library that makes bitwise operations as simple >> as with an Integer, but for larger numbers (as in a ByteArray). >> >> Something that allows me to "slice" a sequence of bits, or extract >> some using the same protocol as with a String of ones and zeros. >> >> Now when I need to work with sequence of bits, I convert an Integer to >> a zero padded version of it up a known size, and then do #copyFrom:to: >> to extract what I need and read back the number from it. >> >> I could use a bytearray for it, but as its name implies, it is >> oriented towards bytes rather than bits (as in the case of Integer). >> >> Now I do stuff like the following to to extract the first 5 bits of a >> fixed length 256 bit array (an Integer). >> >> Integer >> readFrom: >> (((SHA256 hashMessage: message)) asInteger >> printStringBase: 2 length: 256 padded: true) >> copyFrom: 1 to: 5) >> base: 2 >> >> I understand bitwise operations, but I couldn't find something that >> does the above in a conciser way. >> >> Performance in my case isn't critical, but working with strings is >> probably two orders of magnitude slower than manipulating bits in >> integers or ByteArrays >> >> Regards, >> >> Esteban A. Maringolo >> > > |
As I understand it, the OP's problem is that he wants to count bits from the high end, and Integers don't *have* a finite high end. PL/I has bit strings. Common Lisp has bit strings. Some Schemes have copied Lisp. Eiffel _had_ bit strings, more or less, but dropped them. for documentation. My own Smalltalk also has a BitArray class with a somewhat different interface. Although developed independently, both are implemented as byte arrays with an associated count. It's not terribly hard. For selecting small numbers of bits out of long bit strings, this is more efficient than shifting and masking a large integer. I rather wonder whether it might be possible to do better by thinking in a somewhat less C-like style. For example, Erlang handles unpacking network packets and the like by letting you write bit-string *patterns*, which the compiler can then generate special-case code for. Run-time generation of small amounts of code is somewhat easier in Smalltalk. (Most Smalltalks.) What would it be like if you had a BitArray class where you could say extractor := BitArray extractorForBits: 3 to: 7 of: 256. ... extractor value: thisPacket and extractor was a block (possibly cached) with code specialised for just that case? |
In reply to this post by Esteban A. Maringolo
On Sun, Mar 4, 2018 at 9:43 PM, Esteban A. Maringolo
<[hidden email]> wrote: > 2018-03-04 17:15 GMT-03:00 Sven Van Caekenberghe <[hidden email]>: >> Bits are actually numbered from right to left (seen from how they are printed). > > I understand bit operations, used it extensively with IP address eons ago. > > But if a spec says: "Take the first n bits from the hash", it means > the first significant bits. > so in 2r100101111 the first 3 bits are "100" and not "111". naive question: why? To me it looks like a lousy specification. > >> But in a fixed amount of bits you could indeed number them the other way around. >> It does not matter that you write the leading zeros or not, everything to the left is zero anyway. When shifting, the right thing happens. > > If it were a packet, with a fixed size, and the header started with 0, > then the leading zeros matter. > > So in my use case it isn't the same "00000000000000000000000000000000" > than "000000000000000000000000000000000000000000000000" (both in hex > notation). > See [1] > >> Adding something like Integer>>#bitSliceFrom:to:size: based on your reverse numbering would be wrong, IMHO. > > Yes, it would be wrong. Unless you have to option to specify the > endianess you want to use. > >> I guess you could write it like >> >> n := 2r11110000101010001111000011110000. >> from := 8. >> to := 13. >> size := 32. >> (n >> (size - 13)) bitAnd: ((1 << ( to - from + 1)) - 1). >> >> But the size feels out of place. > > How so? > > Thanks in advance, this is super low priority, I just wanted to check > whether something like that already existed. > > Regards! > > > [1] https://github.com/eMaringolo/pharo-bip39mnemonic/blob/master/src/BIP39Mnemonic-Core.package/BIP39MnemonicTest.class/instance/jsonTestVectors.st > > > Esteban A. Maringolo > |
In reply to this post by Richard O'Keefe
It looks like having BitArray is a good approach.
Is there a MIT compatible version? I could add it to my collection collection :) Containers I should migrate it to github. Stef On Mon, Mar 5, 2018 at 2:52 PM, Richard O'Keefe <[hidden email]> wrote: > As I understand it, the OP's problem is that he wants to > count bits from the high end, and Integers don't *have* > a finite high end. > > PL/I has bit strings. Common Lisp has bit strings. > Some Schemes have copied Lisp. Eiffel _had_ bit strings, > more or less, but dropped them. > > Smalltalk/X has a BitArray class; see > http://live.exept.de/ClassDoc/classDocOf:,BitArray > for documentation. > > My own Smalltalk also has a BitArray class with a > somewhat different interface. Although developed > independently, both are implemented as byte arrays > with an associated count. It's not terribly hard. > For selecting small numbers of bits out of long > bit strings, this is more efficient than shifting > and masking a large integer. > > I rather wonder whether it might be possible to do > better by thinking in a somewhat less C-like style. > For example, Erlang handles unpacking network > packets and the like by letting you write bit-string > *patterns*, which the compiler can then generate > special-case code for. > > Run-time generation of small amounts of code is > somewhat easier in Smalltalk. (Most Smalltalks.) > What would it be like if you had a BitArray class > where you could say > extractor := BitArray extractorForBits: 3 to: 7 of: 256. > ... > extractor value: thisPacket > and extractor was a block (possibly cached) with > code specialised for just that case? |
In reply to this post by Richard O'Keefe
On Mon, Mar 5, 2018 at 2:52 PM, Richard O'Keefe <[hidden email]> wrote:
Great documentation ! We need something like that for Pharo classes. -- Serge Stinckwich UMI UMMISCO 209 (IRD/UPMC/UY1)
|
In reply to this post by Richard O'Keefe
2018-03-05 10:52 GMT-03:00 Richard O'Keefe <[hidden email]>:
> As I understand it, the OP's problem is that he wants to > count bits from the high end, and Integers don't *have* > a finite high end. Exactly. I want to work with a block of bits, today the best proxy for that is using an Integer. > Smalltalk/X has a BitArray class; see > http://live.exept.de/ClassDoc/classDocOf:,BitArray > for documentation. This class basically does the job. Thank you for the reference. Esteban A. Maringolo |
In reply to this post by Stephane Ducasse-3
2018-03-05 14:02 GMT-03:00 Stephane Ducasse <[hidden email]>:
> On Sun, Mar 4, 2018 at 9:43 PM, Esteban A. Maringolo > <[hidden email]> wrote: >> 2018-03-04 17:15 GMT-03:00 Sven Van Caekenberghe <[hidden email]>: >>> Bits are actually numbered from right to left (seen from how they are printed). >> >> I understand bit operations, used it extensively with IP address eons ago. >> >> But if a spec says: "Take the first n bits from the hash", it means >> the first significant bits. >> so in 2r100101111 the first 3 bits are "100" and not "111". > > naive question: why? Because it says so. "A checksum is generated by taking the first ENT / 32 bits of its SHA256 hash. This checksum is appended to the end of the initial entropy. Next, these concatenated bits are split into groups of 11 bits, each encoding a number from 0-2047, serving as an index into a wordlist. Finally, we convert these numbers into words and use the joined words as a mnemonic sentence." [1]. > To me it looks like a lousy specification. It might be, I can't really tell. But such manipulation could be useful if you are knitting different parts of a binary packet whose boundaries are not at byte level, but bit instead. So you can take "these 5 bits, concatenate with this other 7, add 13 zero bits, then 1 followed by the payload". I'm assuming a non real case here though, my use case was fulfilled already. Regards! -- Esteban A. Maringolo [1] https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki#generating-the-mnemonic |
I note that the specification in question does not deal with arbitrary bit strings but with "entropies" that are 128 to 256 bits long and a multiple of 32 bits. 4 to 8 bits are copied from the front to the end. (So selecting *this* bit field can be done by taking the first byte of a ByteArray.) This makes the sequence 132 to 264 bits. This is then chopped into 11 bit subsequences. These are not arbitrary subsequences and they are not taken in arbitrary order. They are a stream. My own Smalltalk library include BitInputStream and BitOutputStream, wrapping byte streams. So we could do something like ent := aByteArray size * 8. "BIP-39 ENT" cs := ent // 32. "BIP-39 CS" foo := ByteArray new: (ent + cs) // 8. o := BitOutputStream on: foo writeStream. i := BitInputStream on: aByteArray readStream. 1 to: ent do: [:x | o nextPut: i next]. i reset. o nextUnsigned: cs put: (i nextUnsigned: cs). i close. o close. i := BitInputStream on: foo readStream. ans := (1 to: (ent + cs) // 11) collect: [:x | WordList at: 1 + (i nextUnsigned: 11)]. Stare at this for a bit, and you realise that you don't actually need the working byte array foo. ByteArray methods for: 'bitcoin' mnemonic |ent csn i t| (ent between: 128 and: 256) ifFalse: [self error: 'wrong size for BIP-39']. cs := ent // 32. n := (ent + cs) // 11. i := BitInputStream on: (ReadStream on: self). t := i nextUnsigned: cs. i reset. ^(1 to: n) collect: [:index | WordList at: 1 + (index = n ifTrue: [((i nextUnsigned: 11 - cs) bitShift: cs) bitOr: t] ifFalse: [i nextUnsigned: 11])] My BitInputStream and BitOutputStream classes are, um, not really mature. They aren't *completely* naive, but they could be a lot better, and in particular, BitInputStream>>nextUnsigned: and BitOutputStream>>nextUnsigned:put: are definitely suboptimal. I put this out there just to suggest that there is a completely different way of thinking about the problem. (Actually, this isn't *entirely* unlike using Erlang bit syntax.) Bit*Streams are useful enough to justify primitive support. (Which my classes don't have yet. I did say they are not mature...) This reminds me of a lesson I learned many years ago: STRINGS ARE WRONG. (Thank you, designers of Burroughs Extended Algol!) When trees aren't the answer, streams often are. On 6 March 2018 at 07:21, Esteban A. Maringolo <[hidden email]> wrote: 2018-03-05 14:02 GMT-03:00 Stephane Ducasse <[hidden email]>: |
Hi Richard,
Certainly a BitStream is beyond what I initially thought. But it could be of some use to play with it, is it available somewhere with a friendly open source license? I might give it a try using it for my generator (tests are already passing). Your example is almost complete, but if I understand it you're adding the size of the checksum instead of the checksum itself. Nonetheless I think its readability is superior, and I guess that it is better perfomance and memorywise. Also it could be useful for Base58Encoder I'm expermenting with. Encoding is "simple" but requires the use of a really large integer, I'm stuck at the decoding part now. After that there is a Base32 encoder (to implement a Bech32 encoder as well). So I might use it in my road of experimentation with these matters. Unless I diverge from this and abandon it as it normally happens. :/ Regarding this part: > This reminds me of a lesson I learned many years > ago: STRINGS ARE WRONG. (Thank you, designers of > Burroughs Extended Algol!) When trees aren't the > answer, streams often are. Can you provide more context to this? I wouldn't mind if it is in a separate thread or a blog post of your own. Thanks in advance. Esteban A. Maringolo 2018-03-05 21:55 GMT-03:00 Richard O'Keefe <[hidden email]>: > I note that the specification in question does not > deal with arbitrary bit strings but with "entropies" > that are 128 to 256 bits long and a multiple of 32 bits. > 4 to 8 bits are copied from the front to the end. > (So selecting *this* bit field can be done by taking > the first byte of a ByteArray.) This makes the sequence > 132 to 264 bits. This is then chopped into 11 bit > subsequences. These are not arbitrary subsequences and > they are not taken in arbitrary order. They are a stream. > > My own Smalltalk library include BitInputStream and > BitOutputStream, wrapping byte streams. So we could > do something like > > ent := aByteArray size * 8. "BIP-39 ENT" > cs := ent // 32. "BIP-39 CS" > foo := ByteArray new: (ent + cs) // 8. > o := BitOutputStream on: foo writeStream. > i := BitInputStream on: aByteArray readStream. > 1 to: ent do: [:x | o nextPut: i next]. > i reset. > o nextUnsigned: cs put: (i nextUnsigned: cs). > i close. > o close. > i := BitInputStream on: foo readStream. > ans := (1 to: (ent + cs) // 11) collect: [:x | > WordList at: 1 + (i nextUnsigned: 11)]. > > Stare at this for a bit, and you realise that you don't > actually need the working byte array foo. > ByteArray > methods for: 'bitcoin' > mnemonic > |ent csn i t| > (ent between: 128 and: 256) > ifFalse: [self error: 'wrong size for BIP-39']. > cs := ent // 32. > n := (ent + cs) // 11. > i := BitInputStream on: (ReadStream on: self). > t := i nextUnsigned: cs. > i reset. > ^(1 to: n) collect: [:index | > WordList at: 1 + (index = n > ifTrue: [((i nextUnsigned: 11 - cs) bitShift: cs) bitOr: t] > ifFalse: [i nextUnsigned: 11])] > > My BitInputStream and BitOutputStream classes are, > um, not really mature. They aren't *completely* > naive, but they could be a lot better, and in > particular, BitInputStream>>nextUnsigned: and > BitOutputStream>>nextUnsigned:put: are definitely > suboptimal. I put this out there just to suggest > that there is a completely different way of thinking > about the problem. (Actually, this isn't *entirely* > unlike using Erlang bit syntax.) > > Bit*Streams are useful enough to justify primitive > support. (Which my classes don't have yet. I did > say they are not mature...) > > This reminds me of a lesson I learned many years > ago: STRINGS ARE WRONG. (Thank you, designers of > Burroughs Extended Algol!) When trees aren't the > answer, streams often are. > > > > > On 6 March 2018 at 07:21, Esteban A. Maringolo <[hidden email]> wrote: >> >> 2018-03-05 14:02 GMT-03:00 Stephane Ducasse <[hidden email]>: >> > On Sun, Mar 4, 2018 at 9:43 PM, Esteban A. Maringolo >> > <[hidden email]> wrote: >> >> 2018-03-04 17:15 GMT-03:00 Sven Van Caekenberghe <[hidden email]>: >> >>> Bits are actually numbered from right to left (seen from how they are >> >>> printed). >> >> >> >> I understand bit operations, used it extensively with IP address eons >> >> ago. >> >> >> >> But if a spec says: "Take the first n bits from the hash", it means >> >> the first significant bits. >> >> so in 2r100101111 the first 3 bits are "100" and not "111". >> > >> > naive question: why? >> >> Because it says so. >> "A checksum is generated by taking the first ENT / 32 bits of its >> SHA256 hash. This checksum is appended to the end of the initial >> entropy. >> Next, these concatenated bits are split into groups of 11 bits, each >> encoding a number from 0-2047, serving as an index into a wordlist. >> Finally, we convert these numbers into words and use the joined words >> as a mnemonic sentence." [1]. >> >> > To me it looks like a lousy specification. >> >> It might be, I can't really tell. >> >> But such manipulation could be useful if you are knitting different >> parts of a binary packet whose boundaries are not at byte level, but >> bit instead. So you can take "these 5 bits, concatenate with this >> other 7, add 13 zero bits, then 1 followed by the payload". I'm >> assuming a non real case here though, my use case was fulfilled >> already. >> >> Regards! >> >> -- >> Esteban A. Maringolo >> >> [1] >> https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki#generating-the-mnemonic >> > |
Re "Strings are wrong", the influences on my thinking went like this. (1) As an undergraduate, I did exercises with the Euler compiler (written in Burroughs Algol) and the XPL compiler (written in XPL). XPL had immutable strings, rather like AWK or Java, and a PL/I-ish set of operations on them. Burroughs Algol didn't have strings, but did have pointers, that you could use to stream through arrays. It turned out to be notably easier to do tokenising in Algol than in XPL. (2) A friend of mine, while a masters student, made some pocket money by writing a "batch editor" (same application domain as sed, but different command set) for IBM mainframes under MVS, which he did using PL/I. PL/I fans used to point to its string operations as a good collection, but he had to fight PL/I every step of the way to get the string operations he needed. When I met UNIX V7 on a PDP-11, it took me 15 pages of C to accomplish what he needed more than 50 pages of PL/I to do. And my editor had undo... The trick was being able to make just the right text abstractions that I needed. (3) A company I worked at had a program that generated assembly code for several different machines from a common specification. The original version manipulated instructions as *trees*. A new hire was asked to write a new generator because the old one was getting harder to maintain and we wanted it faster. His program represented instructions as *strings*. It turned out to be slower and harder to maintain than the original. (4) I encountered Eugene Myers' AVL DAG data structure (which quite efficiently lets you maintain many versions of a large text) and the Rope data structure (basically a lazy concatenation tree) that accompanies the Boehm garbage collector. Both of them get a lot of efficiency out of NOT doing concatenations. (5) Learning Java, I found that Java had taken a bullet to the head which Smalltalk had dodged, and which had been very clearly explained in the coloured books. Consider the selectors #printOn: and #printString. There are two ways to implement them: the smart (Smalltalk) and the spectacularly stupid (Java). Smart way: #printOn: is basic, #printString is derived. Stupid way: #printString is basic and #printOn: is derived. Why is this smart? If you are generating several MB of text *in order to write it to an external sink*, then #printOn: does the job very efficiently. But #printString not only turns over large amounts of memory, giving the GC more work than it really needs, it is fatally easy to fall into quadratic time behaviour. (5) There's the insufficiently famous epigram by Alan Perlis: "The string is a stark data structure, and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information." One way to understand this is that it's quite easy to put information *into* a string, but it can be quite hard to get it out again. I have even run into examples where the getting-it-out-reliably part was accidentally impossible. What the heck, let's take Dates as an example (not of impossibility). What does '180612' mean? What does dateString copyFrom: 3 to: 4 give you? But aData monthNumber is hard to misunderstand. (6) Then there's the whole scripting attack (Little Bobby Tables) thing. The rule of thumb that I use is that if you are just accepting a datum, maybe stashing it somewhere, and eventually giving it back, then a string is fine. For example, considering the huge variety in the way people's names may be structured, strings are pretty much the ideal way to represent them. You *shouldn't* be extracting "the middle initial" because some people don't *have* one and some people have more than one. (I am sick of my name being rejected as invalid by American-designed programs that insist that a surname cannot have an apostrophe or two capital letters. No Irish need apply.) But if you want to process structured information, you want to get the data out of string form and into some kind of tree as soon as you can. In fact, you don't want a string form in the first place. I have seen people manipulating XML with string operations, and it's not a pretty sight. And when we're talking about something like (REAL) object-oriented programming in a language like Pharo, we really should be thinking in terms of objects other than strings. For example, if we take VMS as an example, [.FOO]BAR.TXT;1 and <.FOO>BAR.TXT.1 are the same *filename* while they are different *strings*, and resolving the filename relative to DSK:[ICK.ACK] isn't terribly similar to string concatenation. (Alphabetic case? Let's not go there.) This is precisely why there are classes like FileLocator and Path. On 6 March 2018 at 16:33, Esteban A. Maringolo <[hidden email]> wrote: Hi Richard, |
Administrator
|
+100. Strings are "dead bits", lacking behaviour specific to what they represent. Your date example is an excellent portrayal of that. They should never be used for holding structured data. Think of strings (dead bits) in much the same way we think of paper (dead trees). On Mar 6, 2018 05:46, "Richard O'Keefe" <[hidden email]> wrote:
|
In reply to this post by Esteban A. Maringolo
Esteban A. Maringolo wrote
> Hi Richard, > > Certainly a BitStream is beyond what I initially thought. > > But it could be of some use to play with it, is it available somewhere > with a friendly open source license? > > I might give it a try using it for my generator (tests are already > passing). Your example is almost complete, but if I understand it > you're adding the size of the checksum instead of the checksum itself. > Nonetheless I think its readability is superior, and I guess that it > is better perfomance and memorywise. > > Also it could be useful for Base58Encoder I'm expermenting with. > Encoding is "simple" but requires the use of a really large integer, > I'm stuck at the decoding part now. > After that there is a Base32 encoder (to implement a Bech32 encoder as > well). > > So I might use it in my road of experimentation with these matters. > Unless I diverge from this and abandon it as it normally happens. :/ BaseX encode/decode can be simple enough to fit in a small workspace: base := 58. base58Lookup := '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'. myNumber := 12345678901234567890. decode := [ :string | |decodedNumber| decodedNumber := 0. string do: [ :char | decodedNumber := decodedNumber * base +(base58Lookup indexOf: char) - 1 ]. decodedNumber ]. encode := [ :number | |encodedString numDigits toEncode| numDigits := (number numberOfDigitsInBase: base). toEncode := number. encodedString := String new: numDigits. 0 to: numDigits - 1 do: [ :i | |lutIndex| encodedString at: numDigits - i put: (base58Lookup at: toEncode \\ base + 1). toEncode := toEncode // base. ]. encodedString ]. myString := encode value: myNumber. (decode value: myString) = myNumber. The appeal of Base64 is it transforms the // and \\ operations into simple shifts / masks. Cheers, Henry -- Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html |
2018-03-06 12:06 GMT-03:00 Henrik Sperre Johansen
<[hidden email]>: > Esteban A. Maringolo wrote > BaseX encode/decode can be simple enough to fit in a small workspace: > > base := 58. > base58Lookup := > '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'. > myNumber := 12345678901234567890. > > myString := encode value: myNumber. > (decode value: myString) = myNumber. If you're encoding numbers to strings, and viceversa, that's ok. If you also need to support encoding strings (as bytearrays) that didn't work for me. I like your approach, it was more concise and didn't require reversing the output as the one i used [1] > The appeal of Base64 is it transforms the // and \\ operations into simple > shifts / masks. I guess, every bit and byte operation is simpler when happens in multiples of 8 :) Regards! [1] https://github.com/eMaringolo/pharo-base58/blob/master/src/Base58-Core.package/Base58Encoder.class/instance/encode..st Esteban A. Maringolo |
Free forum by Nabble | Edit this page |