DateAndTime hashing

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

DateAndTime hashing

David T. Lewis
Following the recent UTC updates to Chronology-Core, the hash for DateAndTime is:

DateAndTime>>hash
        ^utcMicroseconds hash

This implies the need for a couple of follow up actions. I should have done
"HashedCollection rehashAll" somewhere in the latest updates. This would be
necessary for any hashed collections keyed by DateAndTime. I also need to
update two unit tests that document the hashing, and that now fail.

Before I do anything about these, does anyone have any comments or concerns
regarding how equality testing and hashing are being done for DateAndTime?

Thanks,
Dave

Reply | Threaded
Open this post in threaded view
|

Re: DateAndTime hashing

David T. Lewis
I don't think that this has a practical impact, but I will mention it
in case I am overlooking something. The hash for a DateAndTime in a
64-bit image will be different from the hash for the same DateAndTime
in a 32-bit image, because the utcMicroseconds value in a 64-bit Spur
image is a SmallInteger, and in a 32-bit image it is a LargePositiveInteger.

This might be a concern if there is some scenario in which two images of
different word size are referring to the same hashed collection (possibly
via Magma or Gemstone?).

Dave


On Sat, Jan 05, 2019 at 03:30:23PM -0500, David T. Lewis wrote:

> Following the recent UTC updates to Chronology-Core, the hash for DateAndTime is:
>
> DateAndTime>>hash
> ^utcMicroseconds hash
>
> This implies the need for a couple of follow up actions. I should have done
> "HashedCollection rehashAll" somewhere in the latest updates. This would be
> necessary for any hashed collections keyed by DateAndTime. I also need to
> update two unit tests that document the hashing, and that now fail.
>
> Before I do anything about these, does anyone have any comments or concerns
> regarding how equality testing and hashing are being done for DateAndTime?
>
> Thanks,
> Dave
>

Reply | Threaded
Open this post in threaded view
|

Re: DateAndTime hashing

Eliot Miranda-2
Hi David,

> On Jan 5, 2019, at 8:17 PM, David T. Lewis <[hidden email]> wrote:
>
> I don't think that this has a practical impact, but I will mention it
> in case I am overlooking something. The hash for a DateAndTime in a
> 64-bit image will be different from the hash for the same DateAndTime
> in a 32-bit image, because the utcMicroseconds value in a 64-bit Spur
> image is a SmallInteger, and in a 32-bit image it is a LargePositiveInteger.

This was taken care of for this kind of scenario in

Name: Kernel-eem.1198
Author: eem
Time: 24 November 2018, 1:44:47.526422 pm
UUID: 100137c4-2514-4b7f-9064-3dcdfe7d8cc9
Ancestors: Kernel-eem.1197

Redefine LargePositiveInteger hash for compatibility between 32-bit
and 64-bit systems.

>
> This might be a concern if there is some scenario in which two images of
> different word size are referring to the same hashed collection (possibly
> via Magma or Gemstone?).
>
> Dave


_,,,^..^,,,_ (phone)

>
>> On Sat, Jan 05, 2019 at 03:30:23PM -0500, David T. Lewis wrote:
>> Following the recent UTC updates to Chronology-Core, the hash for DateAndTime is:
>>
>> DateAndTime>>hash
>>    ^utcMicroseconds hash
>>
>> This implies the need for a couple of follow up actions. I should have done
>> "HashedCollection rehashAll" somewhere in the latest updates. This would be
>> necessary for any hashed collections keyed by DateAndTime. I also need to
>> update two unit tests that document the hashing, and that now fail.
>>
>> Before I do anything about these, does anyone have any comments or concerns
>> regarding how equality testing and hashing are being done for DateAndTime?
>>
>> Thanks,
>> Dave
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: DateAndTime hashing

David T. Lewis
On Sat, Jan 05, 2019 at 08:53:52PM -0800, Eliot Miranda wrote:

> Hi David,
>
> > On Jan 5, 2019, at 8:17 PM, David T. Lewis <[hidden email]> wrote:
> >
> > I don't think that this has a practical impact, but I will mention it
> > in case I am overlooking something. The hash for a DateAndTime in a
> > 64-bit image will be different from the hash for the same DateAndTime
> > in a 32-bit image, because the utcMicroseconds value in a 64-bit Spur
> > image is a SmallInteger, and in a 32-bit image it is a LargePositiveInteger.
>
> This was taken care of for this kind of scenario in
>
> Name: Kernel-eem.1198
> Author: eem
> Time: 24 November 2018, 1:44:47.526422 pm
> UUID: 100137c4-2514-4b7f-9064-3dcdfe7d8cc9
> Ancestors: Kernel-eem.1197
>
> Redefine LargePositiveInteger hash for compatibility between 32-bit
> and 64-bit systems.

Ah, right, I forgot about that update. I was looking at an outdated 32-bit image.

Thanks!
Dave


Reply | Threaded
Open this post in threaded view
|

Re: DateAndTime hashing

Chris Muller-3
> On Sat, Jan 05, 2019 at 08:53:52PM -0800, Eliot Miranda wrote:
> > Hi David,
> >
> > > On Jan 5, 2019, at 8:17 PM, David T. Lewis <[hidden email]> wrote:
> > >
> > > I don't think that this has a practical impact, but I will mention it
> > > in case I am overlooking something. The hash for a DateAndTime in a
> > > 64-bit image will be different from the hash for the same DateAndTime
> > > in a 32-bit image, because the utcMicroseconds value in a 64-bit Spur
> > > image is a SmallInteger, and in a 32-bit image it is a LargePositiveInteger.
> >
> > This was taken care of for this kind of scenario in
> >
> > Name: Kernel-eem.1198
> > Author: eem
> > Time: 24 November 2018, 1:44:47.526422 pm
> > UUID: 100137c4-2514-4b7f-9064-3dcdfe7d8cc9
> > Ancestors: Kernel-eem.1197
> >
> > Redefine LargePositiveInteger hash for compatibility between 32-bit
> > and 64-bit systems.

Shouldn't the condition should be  ^self digitLength <= 4, instead of 8?

See?

32-bit image
   1073741824 hash =    230045764

64-bit image
   1073741824 hash  = 1073741824

I must be missing something.  I can't believe we didn't notice this
and went into trunk before asking this question...




> Ah, right, I forgot about that update. I was looking at an outdated 32-bit image.
>
> Thanks!
> Dave
>
>

Reply | Threaded
Open this post in threaded view
|

Re: DateAndTime hashing

David T. Lewis
On Sun, Jan 06, 2019 at 05:53:23PM -0600, Chris Muller wrote:

> > On Sat, Jan 05, 2019 at 08:53:52PM -0800, Eliot Miranda wrote:
> > > Hi David,
> > >
> > > > On Jan 5, 2019, at 8:17 PM, David T. Lewis <[hidden email]> wrote:
> > > >
> > > > I don't think that this has a practical impact, but I will mention it
> > > > in case I am overlooking something. The hash for a DateAndTime in a
> > > > 64-bit image will be different from the hash for the same DateAndTime
> > > > in a 32-bit image, because the utcMicroseconds value in a 64-bit Spur
> > > > image is a SmallInteger, and in a 32-bit image it is a LargePositiveInteger.
> > >
> > > This was taken care of for this kind of scenario in
> > >
> > > Name: Kernel-eem.1198
> > > Author: eem
> > > Time: 24 November 2018, 1:44:47.526422 pm
> > > UUID: 100137c4-2514-4b7f-9064-3dcdfe7d8cc9
> > > Ancestors: Kernel-eem.1197
> > >
> > > Redefine LargePositiveInteger hash for compatibility between 32-bit
> > > and 64-bit systems.
>
> Shouldn't the condition should be  ^self digitLength <= 4, instead of 8?
>
> See?
>
> 32-bit image
>    1073741824 hash =    230045764
>
> 64-bit image
>    1073741824 hash  = 1073741824
>
> I must be missing something.  I can't believe we didn't notice this
> and went into trunk before asking this question...
>

Eeek!

This is crying out for a new unit test called IntegerTest>>testHash that documents
the expected hash values over various boundary conditions, including maxVal 32/64,
minVal 32/64, and digitLength 4 or 8. The test needs to pass on 32 and 64 bit images.

Dave


>
>
>
> > Ah, right, I forgot about that update. I was looking at an outdated 32-bit image.
> >
> > Thanks!
> > Dave
> >
> >
>

Reply | Threaded
Open this post in threaded view
|

Re: DateAndTime hashing

Eliot Miranda-2
In reply to this post by Chris Muller-3
Hi Chris,

On Sun, Jan 6, 2019 at 3:54 PM Chris Muller <[hidden email]> wrote:
> On Sat, Jan 05, 2019 at 08:53:52PM -0800, Eliot Miranda wrote:
> > Hi David,
> >
> > > On Jan 5, 2019, at 8:17 PM, David T. Lewis <[hidden email]> wrote:
> > >
> > > I don't think that this has a practical impact, but I will mention it
> > > in case I am overlooking something. The hash for a DateAndTime in a
> > > 64-bit image will be different from the hash for the same DateAndTime
> > > in a 32-bit image, because the utcMicroseconds value in a 64-bit Spur
> > > image is a SmallInteger, and in a 32-bit image it is a LargePositiveInteger.
> >
> > This was taken care of for this kind of scenario in
> >
> > Name: Kernel-eem.1198
> > Author: eem
> > Time: 24 November 2018, 1:44:47.526422 pm
> > UUID: 100137c4-2514-4b7f-9064-3dcdfe7d8cc9
> > Ancestors: Kernel-eem.1197
> >
> > Redefine LargePositiveInteger hash for compatibility between 32-bit
> > and 64-bit systems.

Shouldn't the condition should be  ^self digitLength <= 4, instead of 8?

Of c course not.  It should be that all integers of 64-bits or less are their rowen hashes.

See?

32-bit image
   1073741824 hash =    230045764

64-bit image
   1073741824 hash  = 1073741824

No.  You must be using an un-updated image.  Here's what I get:

Smalltalk wordSize 4
(2 raisedTo: 63) hash = (2 raisedTo: 63) true
(SmallInteger maxVal + 1) hash = (SmallInteger maxVal + 1) true


Smalltalk wordSize 8
(2 raisedTo: 63) hash = (2 raisedTo: 63) true
(SmallInteger maxVal + 1) hash = (SmallInteger maxVal + 1) true
 

I must be missing something.  I can't believe we didn't notice this
and went into trunk before asking this question...

Um, please.  I /did/ ask the question, I /did/ test.  I /do know/ that the idea is for integers of 64-bits or less be their own hashes.  Your result clearly indicates that you're using some erroneous configuration.
 




> Ah, right, I forgot about that update. I was looking at an outdated 32-bit image.
>
> Thanks!
> Dave
>
>



--
_,,,^..^,,,_
best, Eliot


Reply | Threaded
Open this post in threaded view
|

Re: DateAndTime hashing

Chris Muller-3
>> > > Name: Kernel-eem.1198
>> > > Author: eem
>> > > Time: 24 November 2018, 1:44:47.526422 pm
>> > > UUID: 100137c4-2514-4b7f-9064-3dcdfe7d8cc9
>> > > Ancestors: Kernel-eem.1197
>> > >
>> > > Redefine LargePositiveInteger hash for compatibility between 32-bit
>> > > and 64-bit systems.
>>
>> Shouldn't the condition should be  ^self digitLength <= 4, instead of 8?
>
>
> Of c course not.  It should be that all integers of 64-bits or less are their rowen hashes.
>>
>>
>> See?
>>
>> 32-bit image
>>    1073741824 hash =    230045764
>>
>> 64-bit image
>>    1073741824 hash  = 1073741824
>
>
> No.  You must be using an un-updated image.  Here's what I get:
>
> Smalltalk wordSize 4
> (2 raisedTo: 63) hash = (2 raisedTo: 63) true
> (SmallInteger maxVal + 1) hash = (SmallInteger maxVal + 1) true
>
>
> Smalltalk wordSize 8
> (2 raisedTo: 63) hash = (2 raisedTo: 63) true
> (SmallInteger maxVal + 1) hash = (SmallInteger maxVal + 1) true

Whew!  You're right, and I knew I wasn't running an updated image, I
thought I could read and understand such little code while watching
some NFL playoff game, but I misread it.  :/   Sorry.

>> I must be missing something.  I can't believe we didn't notice this
>> and went into trunk before asking this question...
>
>
> Um, please.  I /did/ ask the question, I /did/ test.  I /do know/ that the idea is for integers of 64-bits or less be their own hashes.  Your result clearly indicates that you're using some erroneous configuration.

My question was directed at Dave and his question, not you (e.g.,
going in trunk and THEN asking about hash).  But, I can see how you
thought that by my not inlining properly, sorry.  I put a TON of hours
into testing UTCDateAndtime over the holidays.  I had to.

Best,
  Chris

Reply | Threaded
Open this post in threaded view
|

Re: DateAndTime hashing

David T. Lewis
On Mon, Jan 07, 2019 at 01:36:10PM -0600, Chris Muller wrote:

> >> > > Name: Kernel-eem.1198
> >> > > Author: eem
> >> > > Time: 24 November 2018, 1:44:47.526422 pm
> >> > > UUID: 100137c4-2514-4b7f-9064-3dcdfe7d8cc9
> >> > > Ancestors: Kernel-eem.1197
> >> > >
> >> > > Redefine LargePositiveInteger hash for compatibility between 32-bit
> >> > > and 64-bit systems.
> >>
> >> Shouldn't the condition should be  ^self digitLength <= 4, instead of 8?
> >
> >
> > Of c course not.  It should be that all integers of 64-bits or less are their rowen hashes.
> >>
> >>
> >> See?
> >>
> >> 32-bit image
> >>    1073741824 hash =    230045764
> >>
> >> 64-bit image
> >>    1073741824 hash  = 1073741824
> >
> >
> > No.  You must be using an un-updated image.  Here's what I get:
> >
> > Smalltalk wordSize 4
> > (2 raisedTo: 63) hash = (2 raisedTo: 63) true
> > (SmallInteger maxVal + 1) hash = (SmallInteger maxVal + 1) true
> >
> >
> > Smalltalk wordSize 8
> > (2 raisedTo: 63) hash = (2 raisedTo: 63) true
> > (SmallInteger maxVal + 1) hash = (SmallInteger maxVal + 1) true
>
> Whew!  You're right, and I knew I wasn't running an updated image, I
> thought I could read and understand such little code while watching
> some NFL playoff game, but I misread it.  :/   Sorry.
>
> >> I must be missing something.  I can't believe we didn't notice this
> >> and went into trunk before asking this question...
> >
> >
> > Um, please.  I /did/ ask the question, I /did/ test.  I /do know/ that the idea is for integers of 64-bits or less be their own hashes.  Your result clearly indicates that you're using some erroneous configuration.
>
> My question was directed at Dave and his question, not you (e.g.,
> going in trunk and THEN asking about hash).  But, I can see how you
> thought that by my not inlining properly, sorry.  I put a TON of hours
> into testing UTCDateAndtime over the holidays.  I had to.

Thanks Chris, the testing has been very helpful.

Here is what I said a couple of weeks ago prior to moving the updates to trunk
(http://lists.squeakfoundation.org/pipermail/squeak-dev/2018-December/200994.html):

> Once moved to trunk, there will be eight failing tests in Chronology-Tests.
> Each of these represents an issue that I think should be addressed in
> discussion on the squeak-dev list.

Of the eight failing tests that I mentioned, two were for the hash function,
and that is what I was following up on in this thread. I wanted to make sure
that I was not overlooking anything before I updated the tests to match the
new hash function.

As far as I can there is no problem with the hashing now, aside from the
fact that I neglected to do a HashedCollection rehashAll (this still needs
to be done). So the unit tests are updated to match the hash changes, and
those two tests are now green.

But there are still five test failures left to discuss, so more to follow :-)

Dave