ByteArray accessors - questions about some of the accessors

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
cbc
Reply | Threaded
Open this post in threaded view
|

ByteArray accessors - questions about some of the accessors

cbc
Hi.

First and foremost, this is not about an issue caused by the recent enhancements to the ByteArray accessors for 64-bit manipulation, but rather a series of questions that has been brewing in my mind during the discussions.

I notice that I can't pass negative numbers to the various #unsigned..At:put:bigEndian:  methods.  While this may be on purpose (since they are UNSIGNED), it is still annoying.  I happen to know that our numbers are stored roughly two-complement, meaning that -1 is the same as 16rFF, 16rFFFF, 16rFFFFFFFF, etc - whatever size I choose (since our digits are never end - we essentially have a infinite length twos compliment negative numbers).  Yet I can't store them with the unsigned, getting an improper store each time I try.  All of these fail:

ba at: 1 put: -1
ba unsignedShortAt: 1 put: -1 bigEndian: false
ba unsignedLongAt: 1 put: -1 bigEndian: false
ba unsignedLong64At: 1 put: -1 bigEndian: false

Is this intentional, and I should stop trying to abuse the methods?  If so, I'll stop.

Still:

ba at: 1 put: -1
ba shortAt: 1 put: -1 bigEndian: false
ba longAt: 1 put: -1 bigEndian: false
ba long64At: 1 put: -1 bigEndian: false

Only the second and third of the above work- we haven't yet added the fourth one, and apparently we signed byte access has never been there.  (And there are uses of signed bytes out there - google it - but I have not personally needed it.  Still, we could/should be complete.)

The other issue found (by accident) is that #longAt:put:bigEndian and #unsignedLongAt:put:bigEndian: allow for arbitrarily large numbers to be passed to them - way bigger than 32 bits - and stores the end 32 bits off of them to the array.  (Except you can't pass SOME negative numbers like -1 to #unsignedLongAt:put:bigEndian: - go figure).  This should be fixed - but it is a very long-standing bug.

-cbc


Reply | Threaded
Open this post in threaded view
|

Re: ByteArray accessors - questions about some of the accessors

David T. Lewis
Hi Chris,

I have not been following these updates too closely, but I can say without
hesitation that using -1 as shorthand for 16rff or 16rffff (or whatever)
in the context of unsigned integer fields would be a bad thing to do, and
if our accessors are treating it as an error, then that is good.

It might be good to document this in the form of unit tests that explain
that the failure is intentional.

Dave

> Hi.
>
> First and foremost, this is not about an issue caused by the recent
> enhancements to the ByteArray accessors for 64-bit manipulation, but
> rather
> a series of questions that has been brewing in my mind during the
> discussions.
>
> I notice that I can't pass negative numbers to the various
> #unsigned..At:put:bigEndian:  methods.  While this may be on purpose
> (since
> they are UNSIGNED), it is still annoying.  I happen to know that our
> numbers are stored roughly two-complement, meaning that -1 is the same as
> 16rFF, 16rFFFF, 16rFFFFFFFF, etc - whatever size I choose (since our
> digits
> are never end - we essentially have a infinite length twos compliment
> negative numbers).  Yet I can't store them with the unsigned, getting an
> improper store each time I try.  All of these fail:
>
> ba at: 1 put: -1
> ba unsignedShortAt: 1 put: -1 bigEndian: false
> ba unsignedLongAt: 1 put: -1 bigEndian: false
> ba unsignedLong64At: 1 put: -1 bigEndian: false
>
> Is this intentional, and I should stop trying to abuse the methods?  If
> so,
> I'll stop.
>
> Still:
>
> ba at: 1 put: -1
> ba shortAt: 1 put: -1 bigEndian: false
> ba longAt: 1 put: -1 bigEndian: false
> ba long64At: 1 put: -1 bigEndian: false
>
> Only the second and third of the above work- we haven't yet added the
> fourth one, and apparently we signed byte access has never been there.
>  (And there are uses of signed bytes out there - google it - but I have
> not
> personally needed it.  Still, we could/should be complete.)
>
> The other issue found (by accident) is that #longAt:put:bigEndian and
> #unsignedLongAt:put:bigEndian: allow for arbitrarily large numbers to be
> passed to them - way bigger than 32 bits - and stores the end 32 bits off
> of them to the array.  (Except you can't pass SOME negative numbers like
> -1
> to #unsignedLongAt:put:bigEndian: - go figure).  This should be fixed -
> but
> it is a very long-standing bug.
>
> -cbc
>
>



Reply | Threaded
Open this post in threaded view
|

Re: ByteArray accessors - questions about some of the accessors

Levente Uzonyi-2
These methods don't validate their arguments, just like most other
Smalltalk methods. I'm pretty sure the unsigned*put* methods would accept
LargeNegativeIntegers as input. The same applies to the signed methods
with larger than acceptable inputs (e.g. shortAt:put:bigEndian: will
accept 65535 and store it as if it were -1).

Levente

On Wed, 9 Sep 2015, David T. Lewis wrote:

> Hi Chris,
>
> I have not been following these updates too closely, but I can say without
> hesitation that using -1 as shorthand for 16rff or 16rffff (or whatever)
> in the context of unsigned integer fields would be a bad thing to do, and
> if our accessors are treating it as an error, then that is good.
>
> It might be good to document this in the form of unit tests that explain
> that the failure is intentional.
>
> Dave
>
>> Hi.
>>
>> First and foremost, this is not about an issue caused by the recent
>> enhancements to the ByteArray accessors for 64-bit manipulation, but
>> rather
>> a series of questions that has been brewing in my mind during the
>> discussions.
>>
>> I notice that I can't pass negative numbers to the various
>> #unsigned..At:put:bigEndian:  methods.  While this may be on purpose
>> (since
>> they are UNSIGNED), it is still annoying.  I happen to know that our
>> numbers are stored roughly two-complement, meaning that -1 is the same as
>> 16rFF, 16rFFFF, 16rFFFFFFFF, etc - whatever size I choose (since our
>> digits
>> are never end - we essentially have a infinite length twos compliment
>> negative numbers).  Yet I can't store them with the unsigned, getting an
>> improper store each time I try.  All of these fail:
>>
>> ba at: 1 put: -1
>> ba unsignedShortAt: 1 put: -1 bigEndian: false
>> ba unsignedLongAt: 1 put: -1 bigEndian: false
>> ba unsignedLong64At: 1 put: -1 bigEndian: false
>>
>> Is this intentional, and I should stop trying to abuse the methods?  If
>> so,
>> I'll stop.
>>
>> Still:
>>
>> ba at: 1 put: -1
>> ba shortAt: 1 put: -1 bigEndian: false
>> ba longAt: 1 put: -1 bigEndian: false
>> ba long64At: 1 put: -1 bigEndian: false
>>
>> Only the second and third of the above work- we haven't yet added the
>> fourth one, and apparently we signed byte access has never been there.
>>  (And there are uses of signed bytes out there - google it - but I have
>> not
>> personally needed it.  Still, we could/should be complete.)
>>
>> The other issue found (by accident) is that #longAt:put:bigEndian and
>> #unsignedLongAt:put:bigEndian: allow for arbitrarily large numbers to be
>> passed to them - way bigger than 32 bits - and stores the end 32 bits off
>> of them to the array.  (Except you can't pass SOME negative numbers like
>> -1
>> to #unsignedLongAt:put:bigEndian: - go figure).  This should be fixed -
>> but
>> it is a very long-standing bug.
>>
>> -cbc
>>
>>
>
>
>
>

cbc
Reply | Threaded
Open this post in threaded view
|

Re: ByteArray accessors - questions about some of the accessors

cbc


On Wed, Sep 9, 2015 at 2:20 PM, Levente Uzonyi <[hidden email]> wrote:
These methods don't validate their arguments, just like most other Smalltalk methods. I'm pretty sure the unsigned*put* methods would accept LargeNegativeIntegers as input. The same applies to the signed methods with larger than acceptable inputs (e.g. shortAt:put:bigEndian: will accept 65535 and store it as if it were -1).


On Wed, 9 Sep 2015, David T. Lewis wrote:

Hi Chris,

I have not been following these updates too closely, but I can say without
hesitation that using -1 as shorthand for 16rff or 16rffff (or whatever)
in the context of unsigned integer fields would be a bad thing to do, and
if our accessors are treating it as an error, then that is good.

It might be good to document this in the form of unit tests that explain
that the failure is intentional.

Dave
So, I agree with Dave on this being a bad practice, and stop doing it.  And his suggestion of documenting these in a form of tests makes sense, too.  But first, what is the right test?

Looking at the behavior of these ByteArray accessors both before and after tweaking them to make them fast, I'd like some consensus on what we want as an answer.  There has been a slight change in the behavior of #longAt:put:bigEndian:, which is more consistent with the signed version, although I think I liked the old behavior better (and would prefer to have both versions change to mirror that behavior).  Once this is set, then consistent tests can also be build for the new unsignedLong64At:put:bigEndian: method as well.

Here is the (rather lengthy) examples:

largeNegative := -16rFFFFFFFFFF.
smallNegative := -16rF.
largePositive := 16rFFFFFFFFFF.
smallPositive := 16rF.

largeNegative isLarge "==> true "
smallPositive isLarge "==> false"
largePositive isLarge "==> true"
smallPositive isLarge "==> false"

largeNegative < 0 "==> true"
smallPositive < 0 "==> false"
largePositive < 0 "==> false"
smallPositive < 0 "==> false"

ba := ByteArray new: 8.

Prior to UL speed/clarity improvements:
------------------------------------------------------------

ba at: 1 put: smallPositive " ==> 15 "
ba at: 1 put: smallNegative " ==> improper store - error"
ba at: 1 put: largePositive " ==> improper store - error"
ba at: 1 put: largeNegative " ==> improper store - error"

ba shortAt: 1 put: smallPositive bigEndian: false " ==> 15 "
ba shortAt: 1 put: smallNegative bigEndian: false " ==> -15 "
ba shortAt: 1 put: largePositive bigEndian: false " ==> improper store - error"
ba shortAt: 1 put: largeNegative bigEndian: false " ==> improper store - error"

ba unsignedShortAt: 1 put: smallPositive bigEndian: false " ==> 15 "
ba unsignedShortAt: 1 put: smallNegative bigEndian: false " ==> improper store - error"
ba unsignedShortAt: 1 put: largePositive bigEndian: false " ==> improper store - error"
ba unsignedShortAt: 1 put: largeNegative bigEndian: false " ==> improper store - error"

ba longAt: 1 put: smallPositive bigEndian: false " ==> 15 "
ba longAt: 1 put: smallNegative bigEndian: false " ==> -15 "
ba longAt: 1 put: largePositive bigEndian: false " ==> 1099511627775"
(ba longAt: 1 bigEndian: false) " ==> -1"
largePositive digitLength " ==> 5"
ba longAt: 1 put: largeNegative bigEndian: false " ==> -1099511627775"
(ba longAt: 1 bigEndian: false) " ==> 1"
largeNegative digitLength " ==>  5"

^^^ This is what the code did - which is not ideal.  Picked digits out of the middle of a number and stored them!

ba unsignedLongAt: 1 put: smallPositive bigEndian: false " ==> 15 "
ba unsignedLongAt: 1 put: smallNegative bigEndian: false " ==> improper store - error"
ba unsignedLongAt: 1 put: largePositive bigEndian: false " ==> improper store - error"
ba unsignedLongAt: 1 put: largeNegative bigEndian: false " ==> improper store - error"


After UL speed/clarity improvements:
------------------------------------------------------------

#at:put: did not change (untouched by UL)

#shortAt:put:bigEndian: and #unsignedShortAt:put:bigEndian: work exactly as before

#longAt:put:bigEndian: works exactly as before.

#unsignedLongAt:put:bigEndian:, however, has changed behavior:

ba unsignedLongAt: 1 put: smallPositive bigEndian: false " ==> 15 "
ba unsignedLongAt: 1 put: smallNegative bigEndian: false " ==> improper store - error"
ba unsignedLongAt: 1 put: largePositive bigEndian: false " ==> 1099511627775"
ba unsignedLongAt: 1 put: largeNegative bigEndian: false " ==> -1099511627775"

(ba unsignedLongAt: 1 bigEndian: false) " ==> 4294967295" 
largePositive digitLength " ==> 5"
(ba unsignedLongAt: 1 bigEndian: false) " ==> 4294967295" 
largeNegative digitLength " ==>  5"

I'd prefer to see both #longAt:put:bigEndian: and #unsignedLongAt:put:bigEndian: raise errors when the largeInteger has more than 5 digits - it should fail as it does for all the other cases.
Note that once we switch to 64Bit VM/Image, I expect that these will fail as well (since they will use the SmallInteger branch of the code).

-cbc