What's up on build.squeak.org


Re: A UTC based implementation of DateAndTime

Bert Freudenberg
On 25.05.2014, at 19:48, David T. Lewis <[hidden email]> wrote:

> Performance of the UTC based DateAndTime is generally favorable compared to
> the original. Here is what I see on my system (smaller numbers are better).
>
> LXTestDateAndTimePerformance test results using the original Squeak DateAndTime
> on an interpreter VM:
> {
> #testNow->10143 .
> #testEquals->30986 .
> #testGreaterThan->80199 .
> #testLessThan->75912 .
> #testPrintString->10429 .
> #testStringAsDateAndTime->44657
> }
>
> LXTestDateAndTimePerformance test results using the new UTC based DateAndTime
> on an interpreter VM:
> {
> #testNow->6423 .
> #testEquals->31625 .
> #testGreaterThan->22999 .
> #testLessThan->18514 .
> #testPrintString->12502 .
> #testStringAsDateAndTime->32912
> }
Hi Dave,

just curious: did you test the performance without the LargeInt primitives?

- Bert -






Re: A UTC based implementation of DateAndTime

David T. Lewis
On Wed, May 28, 2014 at 11:49:08AM +0200, Bert Freudenberg wrote:

> On 25.05.2014, at 19:48, David T. Lewis <[hidden email]> wrote:
>
> > Performance of the UTC based DateAndTime is generally favorable compared to
> > the original. Here is what I see on my system (smaller numbers are better).
> >
> > LXTestDateAndTimePerformance test results using the original Squeak DateAndTime
> > on an interpreter VM:
> > {
> > #testNow->10143 .
> > #testEquals->30986 .
> > #testGreaterThan->80199 .
> > #testLessThan->75912 .
> > #testPrintString->10429 .
> > #testStringAsDateAndTime->44657
> > }
> >
> > LXTestDateAndTimePerformance test results using the new UTC based DateAndTime
> > on an interpreter VM:
> > {
> > #testNow->6423 .
> > #testEquals->31625 .
> > #testGreaterThan->22999 .
> > #testLessThan->18514 .
> > #testPrintString->12502 .
> > #testStringAsDateAndTime->32912
> > }
>
> Hi Dave,
>
> just curious: did you test the performance without the LargeInt primitives?
>
> - Bert -
>

For the new UTC implementation with the primitive disabled, fallback code is
much slower for DateAndTime class>>now.

{
        #testNow->36939 .
        #testEquals->29015 .
        #testGreaterThan->21142 .
        #testLessThan->17586 .
        #testPrintString->11809 .
        #testStringAsDateAndTime->30918
}

Dave
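
For anyone who wants to try a comparison of this kind in their own image, here is a rough workspace sketch. It is not the LXTestDateAndTimePerformance code; the repetition count is an arbitrary assumption, and the absolute numbers will depend on the VM and the machine.

```smalltalk
"Workspace sketch: time repeated calls to DateAndTime now.
 The repetition count is an assumption, not the value used by
 LXTestDateAndTimePerformance."
| repetitions elapsed |
repetitions := 100000.
elapsed := Time millisecondsToRun: [
	repetitions timesRepeat: [DateAndTime now]].
Transcript
	show: 'DateAndTime now x ', repetitions printString, ': ', elapsed printString, ' ms';
	cr.
```

Running it once with the primitive available and once with it disabled should show the same kind of gap as the #testNow figures above.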




Re: A UTC based implementation of DateAndTime

Ben Coman
David T. Lewis wrote:
On Mon, May 26, 2014 at 06:16:03PM +0000, J. Vuletich (mail lists) wrote:
  
Quoting "David T. Lewis" [hidden email]:

    
...
      
Ian did a build of the Windows VM that should have the necessary support. 
Try
the Squeak4.1.2.2612 VM from http://squeakvm.org/win32/.
      
That VM fails for
	<primitive: 'primitiveUtcWithOffset'>
	<primitive: 240> primUtcMicrosecondClock
	<primitive: 241> primLocalMicrosecondClock

    
One thing to note - if the primitive is not present, DateAndTime will fall
back on the old logic, and it should produce reasonable results.
      
Yes, Cuis already does that... I'd prefer to be able to assume that  
any future VM will provide 	<primitive: 'primitiveUtcWithOffset'> and  
clean the code. Would that asking for too much?
    

You should expect the following two primitives to be present in all VMs
(comments are from the VMM trunk implementations):

primitiveUtcWithOffset
	"Answer an array with UTC microseconds since the Posix epoch and
	the current seconds offset from GMT in the local time zone. An empty
	two element array may be supplied as a parameter.
	This is a named (not numbered) primitive in the null module (ie the VM)"

primitiveUTCMicrosecondClock
	"Answer the UTC microseconds since the Smalltalk epoch. The value is
	derived from the Posix epoch (see primitiveUTCMicrosecondClock) with a
	constant offset corresponding to elapsed microseconds between the two
	epochs according to RFC 868."

  
Since both answer microseconds, the second name seems a little confusing.  Calling them something like #primitivePosixUtcWithOffset and #primitiveSmalltalkUtc seems more intention-revealing.

cheers -ben
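
To make the relationship between the two answers concrete, here is a small workspace sketch of the constant offset between the epochs. It assumes the Smalltalk epoch of 1 January 1901 and the Posix epoch of 1 January 1970; the sample microsecond value is arbitrary and only there for illustration.

```smalltalk
| daysBetweenEpochs secondsBetweenEpochs posixMicros smalltalkMicros |
"69 years lie between 1901-01-01 and 1970-01-01, 17 of them leap years."
daysBetweenEpochs := (69 * 365) + 17.
secondsBetweenEpochs := daysBetweenEpochs * 86400.	"2177452800"
"Arbitrary example: microseconds since the Posix epoch."
posixMicros := 1401039000000000.
"The same instant expressed as microseconds since the Smalltalk epoch."
smalltalkMicros := posixMicros + (secondsBetweenEpochs * 1000000).
smalltalkMicros	"3578491800000000"
```

So the two primitives describe the same clock and differ only by this fixed amount, which is part of why the names could say which epoch each one uses.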



Re: What's up on build.squeak.org

Frank Shearar-3
On 25 May 2014 21:52, Tobias Pape <[hidden email]> wrote:

>
> On 25.05.2014, at 22:43, Nicolas Cellier <[hidden email]> wrote:
>
>>
>> 2014-05-25 18:34 GMT+02:00 Nicolas Cellier <[hidden email]>:
>>
>> 2014-05-25 2:39 GMT+02:00 Chris Muller <[hidden email]>:
>>
>> Hi, thanks for noticing and investigating this!  Seeing this now, I
>> think we should signal an Error rather than a Warning to be more
>> TestCase friendly but also because persisting empty packages is so
>> painful, it should be an error, hands down.  Better to force a
>> resolution to the issue than silently persist modules of
>> future-pain...
>>
>>
>> In any case, I think you accidently uncovered bugs in Tests-Monticello.
>> I'm looking at a fix for a few hours, and it's really messy.
>>
>>
>> Hooray! After publishing Tests-nice.297 there is now a trunk CI job that finished
>> http://build.squeak.org/job/SqueakTrunk/857/
>>
>> Now we can see we have some regressions...
>> How could we live without CI before?
>>
>> Of course, for dissecting which change introduced which regression after a 2 month interrupt, that's going to be more pain than necessary...
>>
>
> To not make that happen again, can we make the jenkins mail squeak-dev on
> important (read me, not all) information?
> Like:
>         more tests fail
>         errors
>         (possibly one nightly information)

We possibly could, but I'd suggest _not_ doing so because even without
Nicolas's work (thanks very much, Nicolas!) we _normally_ have failing
tests, which means that the normal status of the SqueakTrunk build
would not be green.

Really, the ideal place to be is for the job to usually be green,
because then CI saying anything at all is cause for alarm, not yet
another false positive. And we're just not there, on a number of
fronts.

frank

> best
>         -tobias
>
>
>
>


Re: What's up on build.squeak.org

Frank Shearar-3
On 28 May 2014 21:06, Frank Shearar <[hidden email]> wrote:

> On 25 May 2014 21:52, Tobias Pape <[hidden email]> wrote:
>>
>> On 25.05.2014, at 22:43, Nicolas Cellier <[hidden email]> wrote:
>>
>>>
>>> 2014-05-25 18:34 GMT+02:00 Nicolas Cellier <[hidden email]>:
>>>
>>> 2014-05-25 2:39 GMT+02:00 Chris Muller <[hidden email]>:
>>>
>>> Hi, thanks for noticing and investigating this!  Seeing this now, I
>>> think we should signal an Error rather than a Warning to be more
>>> TestCase friendly but also because persisting empty packages is so
>>> painful, it should be an error, hands down.  Better to force a
>>> resolution to the issue than silently persist modules of
>>> future-pain...
>>>
>>>
>>> In any case, I think you accidently uncovered bugs in Tests-Monticello.
>>> I'm looking at a fix for a few hours, and it's really messy.
>>>
>>>
>>> Hooray! After publishing Tests-nice.297 there is now a trunk CI job that finished
>>> http://build.squeak.org/job/SqueakTrunk/857/
>>>
>>> Now we can see we have some regressions...
>>> How could we live without CI before?
>>>
>>> Of course, for dissecting which change introduced which regression after a 2 month interrupt, that's going to be more pain than necessary...
>>>
>>
>> To not make that happen again, can we make the jenkins mail squeak-dev on
>> important (read me, not all) information?
>> Like:
>>         more tests fail
>>         errors
>>         (possibly one nightly information)
>
> We possibly could, but I'd suggest _not_ doing so because even without
> Nicolas's work (thanks very much, Nicolas!) we _normally_ have failing
> tests, which means that the normal status of the SqueakTrunk build
> would not be green.
>
> Really, the ideal place to be is for the job to usually be green,
> because then CI saying anything at all is cause for alarm, not yet
> another false positive. And we're just not there, on a number of
> fronts.

I should add that it doesn't help that build agents that run for a
long time - Java processes - eventually run out of PermGen space and
stop working. I've pinged the owner of two of the slaves - thanks,
Tony! - and we're tentatively trying out a hack so disgusting I'm not
going to talk about it.

frank

>> best
>>         -tobias


Re: A UTC based implementation of DateAndTime

David T. Lewis
On Wed, May 28, 2014 at 08:13:56AM -0400, David T. Lewis wrote:

> On Wed, May 28, 2014 at 11:49:08AM +0200, Bert Freudenberg wrote:
> > On 25.05.2014, at 19:48, David T. Lewis <[hidden email]> wrote:
> >
> > > Performance of the UTC based DateAndTime is generally favorable compared to
> > > the original. Here is what I see on my system (smaller numbers are better).
> > >
> > > LXTestDateAndTimePerformance test results using the original Squeak DateAndTime
> > > on an interpreter VM:
> > > {
> > > #testNow->10143 .
> > > #testEquals->30986 .
> > > #testGreaterThan->80199 .
> > > #testLessThan->75912 .
> > > #testPrintString->10429 .
> > > #testStringAsDateAndTime->44657
> > > }
> > >
> > > LXTestDateAndTimePerformance test results using the new UTC based DateAndTime
> > > on an interpreter VM:
> > > {
> > > #testNow->6423 .
> > > #testEquals->31625 .
> > > #testGreaterThan->22999 .
> > > #testLessThan->18514 .
> > > #testPrintString->12502 .
> > > #testStringAsDateAndTime->32912
> > > }
> >
> > Hi Dave,
> >
> > just curious: did you test the performance without the LargeInt primitives?
> >
> > - Bert -
> >
>
> For the new UTC implementation with the primitive disabled, fallback code is
> much slower for DateAndTime class>>now.
>
> {
> #testNow->36939 .
> #testEquals->29015 .
> #testGreaterThan->21142 .
> #testLessThan->17586 .
> #testPrintString->11809 .
> #testStringAsDateAndTime->30918
> }
>

I just fixed a glitch in the Jenkins job that keeps the 64-bit image updated, and
it occurred to me that I should try the UTC DateAndTime in a 64-bit image (image
format 68002, see http://build.squeak.org/job/Squeak%2064-bit%20image/).

The results surprised me. I was expecting the 64-bit image to be slower (which
I *think* is the case generally, based on the interactive feel). But the 64-bit image
is faster for the standard DateAndTime, and also for the UTC based DateAndTime.
The UTC DateAndTime on 64-bit image seems to be the fastest of any of the combinations
that I have tested, aside from some degradation in printString processing.

The standard Squeak DateAndTime yields this in a 64-bit image:
{
        #testNow->8225 .
        #testEquals->23880 .
        #testGreaterThan->66041 .
        #testLessThan->62482 .
        #testPrintString->9473 .
        #testStringAsDateAndTime->39384
}

And the new UTC implementation gives this in a 64-bit image:
{
        #testNow->5494 .
        #testEquals->24902 .
        #testGreaterThan->17162 .
        #testLessThan->14169 .
        #testPrintString->11173 .
        #testStringAsDateAndTime->28108
}

For reference, the original standard DateAndTime on a 32-bit image (see above) gave
me these numbers. Note the difference in the basic magnitude operations, testing for
one DateAndTime instance greater than or less than another.
{
        #testNow->10143 .
        #testEquals->30986 .
        #testGreaterThan->80199 .
        #testLessThan->75912 .
        #testPrintString->10429 .
        #testStringAsDateAndTime->44657
}

Dave
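
To read the figures above side by side, the following workspace sketch expresses each UTC 64-bit result as a whole-number percentage of the original 32-bit result. The numbers are copied from this message; the variable names are only for illustration.

```smalltalk
| original32 utc64 |
"Original DateAndTime, 32-bit image."
original32 := {
	#testNow->10143 .
	#testEquals->30986 .
	#testGreaterThan->80199 .
	#testLessThan->75912 .
	#testPrintString->10429 .
	#testStringAsDateAndTime->44657 }.
"UTC based DateAndTime, 64-bit image."
utc64 := Dictionary newFrom: {
	#testNow->5494 .
	#testEquals->24902 .
	#testGreaterThan->17162 .
	#testLessThan->14169 .
	#testPrintString->11173 .
	#testStringAsDateAndTime->28108 }.
"Print each UTC 64-bit time as a percentage of the original (smaller is better)."
original32 do: [:each | | percent |
	percent := (utc64 at: each key) * 100 // each value.
	Transcript show: each key; show: ' '; show: percent printString; show: '%'; cr].
"#testGreaterThan comes out at 21%, #testPrintString at 107%."
```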



Re: A UTC based implementation of DateAndTime

Levente Uzonyi-2
On Tue, 27 May 2014, David T. Lewis wrote:

> On Tue, May 27, 2014 at 09:55:33PM +0200, Nicolas Cellier wrote:
>> 2014-05-27 4:30 GMT+02:00 Chris Muller <[hidden email]>:
>>
>>> The issue actually relates purely to Squeak domain models.  Consider
>>> the case of an all-in-memory object model in Squeak, with no database
>>> involved at all.  It is very feasible an app would want to import a
>>> flat-file dataset that involves creation a few million DateAndTime
>>> instances (along with other objects, of course) to the point where
>>> memory constraints begin to be noticed.
>>>
>>> When dealing with this level of prolifigation-potential of a
>>> particular class, and for such a base data-type we don't want to
>>> endure changing again, I want us to strongly scrutinize the internal
>>> representation.
>>>
>>> In this case, the use of 'utcMicroseconds' introduces a lot of
>>> duplicate bit-patterns in memory that are very hard, if not
>>> impossible, to share.
>>>
>>> The simplest case are two equivalent instances of DateAndTime (read
>>> from separate files).  Despite being equivalent, their
>>> utcMicroseconds' will be separate objects each consuming separate
>>> memory space.  There is no easy way to share the same
>>> 'utcMicroseconds' instance between them.
>>>
>>> But fully-equivalent DateAndTime's is not even half of the concern --
>>> the high-order bits of every DateAndTime's 'utcMicroseconds'
>>> duplicates the same bit pattern, again and again, eating up memory.
>>>
>>> That doesn't happen when the internal representations are, or can be,
>>> canonicalized, as in the case of using SmallIntegers.  Yes, Brent's
>>> original representation requires two additional slots per instance,
>>> but the _contents_ of those slots are SmallIntegers -- shared memory.
>>>
>>>
>> Well, in current 32 bit image format, SmallInteger are not exactly shared,
>> they are immediate values.
>> Each consumes exactly 32 bits.
>>
>> For a compact class like LargePosOrNegInteger, I don't remember what is the
>> header size exactly, but you get 64 bits for data, I would be surprised to
>> see a major difference wrt consumed memory.
>>
>
> Smalltalk compactClassesArray includes: DateAndTime ==> false
> Smalltalk compactClassesArray includes: LargePositiveInteger ==> true
>
> So for the traditional DateAndTime implementation, an instance requires:
>
>  2 words of header (64 bits)
>  3 words for the small integer jdn/seconds/nanos variables
>  1 word for the pointer to the offset object, which is an instance of Duration
>
> In practice, most instances of DateAndTime within an image will share the
> same offset object, so for purposes of estimation assume that this takes
> no extra space.
>
> Thus each instance requires 6 words of space in the object memory (maybe a bit
> more on average if the DateAndTime instances are not sharing the same Duration
> instance for one reason or another).
>
> For the UTC based implementation of DateAndTime, each instance requires:
>
>  2 words of header
>  1 word for the small integer localOffsetSeconds variable
>  1 word for the pointer to the LargePositiveInteger representing utcMicroSeconds
>  1 word of header for the large positive integer
>  2 words of data for the value of the large positive integer
>
> Thus each instance requires 7 words of space in the object memory.
>
> So there is a difference, but it would probably not be a large effect on
> overall space utilization, even assuming complete sharing of the offset
> Duration instances.

I think it's possible to reduce the number of words to 5 at the cost of
reusing integer primitives. If DateAndTime is a variable byte class, then
it can hold the utcMicroSeconds in 8 variable slots (2 words). I don't
know if the LargeInteger primitives would work with it, but I think they
should, so comparison and arithmetic methods could be based on them.

But it's probably not worth caring about this, because Spur will
change these things.


Levente
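
As a rough illustration of what these word counts mean in practice, here is a workspace sketch that totals the three layouts for a hypothetical three million instances in a 32-bit image. The 6 and 7 word breakdowns are taken from the quoted message; the breakdown of the 5 word variant is an assumption, since only the total is given above.

```smalltalk
| bytesPerWord instances traditional utcBased variableByte |
bytesPerWord := 4.	"32-bit image"
instances := 3000000.	"arbitrary stand-in for 'a few million'"
"header + jdn/seconds/nanos + offset pointer"
traditional := 2 + 3 + 1.
"header + localOffsetSeconds + pointer + LargePositiveInteger header + data"
utcBased := 2 + 1 + 1 + 1 + 2.
"Assumed breakdown: header + 8 bytes of microseconds + one word for the offset"
variableByte := 2 + 2 + 1.
"Total bytes for each layout."
{ traditional . utcBased . variableByte } collect: [:words |
	words * bytesPerWord * instances]
"#(72000000 84000000 60000000)"
```

Under these assumptions the spread between the layouts is about 12 MB per three million instances in either direction.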

>
> Dave
>
>
>


Re: A UTC based implementation of DateAndTime

Chris Muller-3
It's probably possible to get it down to 3 words if DateAndTime were
represented as one canonicalized 'date', one canonicalized 'time'
(precise to the second), and one SmallInteger for millis or micros.

The more higher-level parts a DateAndTime can be constructed from,
the better the opportunity for memory optimization.  Conversely, the
more an implementation moves toward a 'binary data'
representation, the fewer of those bits can be shared and, therefore,
the more will be duplicated across instances.
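
A minimal sketch of what canonicalizing could look like, assuming a simple Dictionary used as a lookup table. The table and the block are hypothetical names for illustration and are not part of any existing DateAndTime implementation.

```smalltalk
| canonicalDates canonicalize first second |
"Hypothetical lookup table mapping each Date to its one shared instance."
canonicalDates := Dictionary new.
canonicalize := [:aDate |
	canonicalDates at: aDate ifAbsentPut: [aDate]].
"Two equal Dates created separately, as if read from separate files."
first := canonicalize value: (Date newDay: 2 month: 6 year: 2014).
second := canonicalize value: (Date newDay: 2 month: 6 year: 2014).
first == second	"true: both variables refer to the same shared Date"
```

Every DateAndTime on the same day could then point at one Date object instead of carrying its own copy of those bits.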

On Mon, Jun 2, 2014 at 2:21 PM, Levente Uzonyi <[hidden email]> wrote:

> On Tue, 27 May 2014, David T. Lewis wrote:
>
>> On Tue, May 27, 2014 at 09:55:33PM +0200, Nicolas Cellier wrote:
>>>
>>> 2014-05-27 4:30 GMT+02:00 Chris Muller <[hidden email]>:
>>>
>>>> The issue actually relates purely to Squeak domain models.  Consider
>>>> the case of an all-in-memory object model in Squeak, with no database
>>>> involved at all.  It is very feasible an app would want to import a
>>>> flat-file dataset that involves creation a few million DateAndTime
>>>> instances (along with other objects, of course) to the point where
>>>> memory constraints begin to be noticed.
>>>>
>>>> When dealing with this level of prolifigation-potential of a
>>>> particular class, and for such a base data-type we don't want to
>>>> endure changing again, I want us to strongly scrutinize the internal
>>>> representation.
>>>>
>>>> In this case, the use of 'utcMicroseconds' introduces a lot of
>>>> duplicate bit-patterns in memory that are very hard, if not
>>>> impossible, to share.
>>>>
>>>> The simplest case are two equivalent instances of DateAndTime (read
>>>> from separate files).  Despite being equivalent, their
>>>> utcMicroseconds' will be separate objects each consuming separate
>>>> memory space.  There is no easy way to share the same
>>>> 'utcMicroseconds' instance between them.
>>>>
>>>> But fully-equivalent DateAndTime's is not even half of the concern --
>>>> the high-order bits of every DateAndTime's 'utcMicroseconds'
>>>> duplicates the same bit pattern, again and again, eating up memory.
>>>>
>>>> That doesn't happen when the internal representations are, or can be,
>>>> canonicalized, as in the case of using SmallIntegers.  Yes, Brent's
>>>> original representation requires two additional slots per instance,
>>>> but the _contents_ of those slots are SmallIntegers -- shared memory.
>>>>
>>>>
>>> Well, in current 32 bit image format, SmallInteger are not exactly
>>> shared,
>>> they are immediate values.
>>> Each consumes exactly 32 bits.
>>>
>>> For a compact class like LargePosOrNegInteger, I don't remember what is
>>> the
>>> header size exactly, but you get 64 bits for data, I would be surprised
>>> to
>>> see a major difference wrt consumed memory.
>>>
>>
>> Smalltalk compactClassesArray includes: DateAndTime ==> false
>> Smalltalk compactClassesArray includes: LargePositiveInteger ==> true
>>
>> So for the traditional DateAndTime implementation, an instance requires:
>>
>>  2 words of header (64 bits)
>>  3 words for the small integer jdn/seconds/nanos variables
>>  1 word for the pointer to the offset object, which is an instance of
>> Duration
>>
>> In practice, most instances of DateAndTime within an image will share the
>> same offset object, so for purposes of estimation assume that this takes
>> no extra space.
>>
>> Thus each instance requires 6 words of space in the object memory (maybe a
>> bit
>> more on average if the DateAndTime instances are not sharing the same
>> Duration
>> instance for one reason or another).
>>
>> For the UTC based implementation of DateAndTime, each instance requires:
>>
>>  2 words of header
>>  1 word for the small integer localOffsetSeconds variable
>>  1 word for the pointer to the LargePositiveInteger representing
>> utcMicroSeconds
>>  1 word of header for the large positive integer
>>  2 words of data for the value of the large positive integer
>>
>> Thus each instance requires 7 words of space in the object memory.
>>
>> So there is a difference, but it would probably not be a large effect on
>> overall space utilization, even assuming complete sharing of the offset
>> Duration instances.
>
>
> I think it's possible to reduce the number of words to 5 at the cost of
> reusing integer primitives. If DateAndTime is a variable byte class, then it
> can hold the utcMicroSeconds in 8 variable slots (2 words). I don't know if
> the LargeInteger primitives would work with it, but I think they should, so
> comparison and arithmetic methods could be based on them.
>
> But it's probably not worth to care about this, because Spur will change
> these things.
>
>
> Levente
>
>>
>> Dave
>>
>>
>>
>


Re: A UTC based implementation of DateAndTime

Chris Muller-3
On Mon, Jun 2, 2014 at 3:16 PM, Chris Muller <[hidden email]> wrote:
> It's probably possible to get it down to 3 words if DateAndTime were
> represented as one canonicalized 'date', one canonicalized 'time'
> (precise to the second), and one SmallInteger for millis or micros..

Forgot about the offset so, okay, 4 words.  Plus the object header, 2 words.

Hmm, that is sounding very familiar!  :)

> The more and higher-level parts a DateAndTime can be constructed with,
> the better the opportunity for memory-optimization.  Conversely, the
> more an implementation moves more toward a 'binary data'
> representation, the fewer of those bits can be shared and, therefore,
> the more that will be duplicated across instances.
>
> On Mon, Jun 2, 2014 at 2:21 PM, Levente Uzonyi <[hidden email]> wrote:
>> On Tue, 27 May 2014, David T. Lewis wrote:
>>
>>> On Tue, May 27, 2014 at 09:55:33PM +0200, Nicolas Cellier wrote:
>>>>
>>>> 2014-05-27 4:30 GMT+02:00 Chris Muller <[hidden email]>:
>>>>
>>>>> The issue actually relates purely to Squeak domain models.  Consider
>>>>> the case of an all-in-memory object model in Squeak, with no database
>>>>> involved at all.  It is very feasible an app would want to import a
>>>>> flat-file dataset that involves creation a few million DateAndTime
>>>>> instances (along with other objects, of course) to the point where
>>>>> memory constraints begin to be noticed.
>>>>>
>>>>> When dealing with this level of prolifigation-potential of a
>>>>> particular class, and for such a base data-type we don't want to
>>>>> endure changing again, I want us to strongly scrutinize the internal
>>>>> representation.
>>>>>
>>>>> In this case, the use of 'utcMicroseconds' introduces a lot of
>>>>> duplicate bit-patterns in memory that are very hard, if not
>>>>> impossible, to share.
>>>>>
>>>>> The simplest case are two equivalent instances of DateAndTime (read
>>>>> from separate files).  Despite being equivalent, their
>>>>> utcMicroseconds' will be separate objects each consuming separate
>>>>> memory space.  There is no easy way to share the same
>>>>> 'utcMicroseconds' instance between them.
>>>>>
>>>>> But fully-equivalent DateAndTime's is not even half of the concern --
>>>>> the high-order bits of every DateAndTime's 'utcMicroseconds'
>>>>> duplicates the same bit pattern, again and again, eating up memory.
>>>>>
>>>>> That doesn't happen when the internal representations are, or can be,
>>>>> canonicalized, as in the case of using SmallIntegers.  Yes, Brent's
>>>>> original representation requires two additional slots per instance,
>>>>> but the _contents_ of those slots are SmallIntegers -- shared memory.
>>>>>
>>>>>
>>>> Well, in current 32 bit image format, SmallInteger are not exactly
>>>> shared,
>>>> they are immediate values.
>>>> Each consumes exactly 32 bits.
>>>>
>>>> For a compact class like LargePosOrNegInteger, I don't remember what is
>>>> the
>>>> header size exactly, but you get 64 bits for data, I would be surprised
>>>> to
>>>> see a major difference wrt consumed memory.
>>>>
>>>
>>> Smalltalk compactClassesArray includes: DateAndTime ==> false
>>> Smalltalk compactClassesArray includes: LargePositiveInteger ==> true
>>>
>>> So for the traditional DateAndTime implementation, an instance requires:
>>>
>>>  2 words of header (64 bits)
>>>  3 words for the small integer jdn/seconds/nanos variables
>>>  1 word for the pointer to the offset object, which is an instance of
>>> Duration
>>>
>>> In practice, most instances of DateAndTime within an image will share the
>>> same offset object, so for purposes of estimation assume that this takes
>>> no extra space.
>>>
>>> Thus each instance requires 6 words of space in the object memory (maybe a
>>> bit
>>> more on average if the DateAndTime instances are not sharing the same
>>> Duration
>>> instance for one reason or another).
>>>
>>> For the UTC based implementation of DateAndTime, each instance requires:
>>>
>>>  2 words of header
>>>  1 word for the small integer localOffsetSeconds variable
>>>  1 word for the pointer to the LargePositiveInteger representing
>>> utcMicroSeconds
>>>  1 word of header for the large positive integer
>>>  2 words of data for the value of the large positive integer
>>>
>>> Thus each instance requires 7 words of space in the object memory.
>>>
>>> So there is a difference, but it would probably not be a large effect on
>>> overall space utilization, even assuming complete sharing of the offset
>>> Duration instances.
>>
>>
>> I think it's possible to reduce the number of words to 5 at the cost of
>> reusing integer primitives. If DateAndTime is a variable byte class, then it
>> can hold the utcMicroSeconds in 8 variable slots (2 words). I don't know if
>> the LargeInteger primitives would work with it, but I think they should, so
>> comparison and arithmetic methods could be based on them.
>>
>> But it's probably not worth to care about this, because Spur will change
>> these things.
>>
>>
>> Levente
>>
>>>
>>> Dave
>>>
>>>
>>>
>>


Re: A UTC based implementation of DateAndTime

David T. Lewis
On Mon, Jun 02, 2014 at 09:21:06PM +0200, Levente Uzonyi wrote:

> On Tue, 27 May 2014, David T. Lewis wrote:
>
> >On Tue, May 27, 2014 at 09:55:33PM +0200, Nicolas Cellier wrote:
> >>2014-05-27 4:30 GMT+02:00 Chris Muller <[hidden email]>:
> >>
> >>>The issue actually relates purely to Squeak domain models.  Consider
> >>>the case of an all-in-memory object model in Squeak, with no database
> >>>involved at all.  It is very feasible an app would want to import a
> >>>flat-file dataset that involves creation a few million DateAndTime
> >>>instances (along with other objects, of course) to the point where
> >>>memory constraints begin to be noticed.
> >>>
> >>>When dealing with this level of prolifigation-potential of a
> >>>particular class, and for such a base data-type we don't want to
> >>>endure changing again, I want us to strongly scrutinize the internal
> >>>representation.
> >>>
> >>>In this case, the use of 'utcMicroseconds' introduces a lot of
> >>>duplicate bit-patterns in memory that are very hard, if not
> >>>impossible, to share.
> >>>
> >>>The simplest case are two equivalent instances of DateAndTime (read
> >>>from separate files).  Despite being equivalent, their
> >>>utcMicroseconds' will be separate objects each consuming separate
> >>>memory space.  There is no easy way to share the same
> >>>'utcMicroseconds' instance between them.
> >>>
> >>>But fully-equivalent DateAndTime's is not even half of the concern --
> >>>the high-order bits of every DateAndTime's 'utcMicroseconds'
> >>>duplicates the same bit pattern, again and again, eating up memory.
> >>>
> >>>That doesn't happen when the internal representations are, or can be,
> >>>canonicalized, as in the case of using SmallIntegers.  Yes, Brent's
> >>>original representation requires two additional slots per instance,
> >>>but the _contents_ of those slots are SmallIntegers -- shared memory.
> >>>
> >>>
> >>Well, in current 32 bit image format, SmallInteger are not exactly shared,
> >>they are immediate values.
> >>Each consumes exactly 32 bits.
> >>
> >>For a compact class like LargePosOrNegInteger, I don't remember what is
> >>the
> >>header size exactly, but you get 64 bits for data, I would be surprised to
> >>see a major difference wrt consumed memory.
> >>
> >
> >Smalltalk compactClassesArray includes: DateAndTime ==> false
> >Smalltalk compactClassesArray includes: LargePositiveInteger ==> true
> >
> >So for the traditional DateAndTime implementation, an instance requires:
> >
> > 2 words of header (64 bits)
> > 3 words for the small integer jdn/seconds/nanos variables
> > 1 word for the pointer to the offset object, which is an instance of
> > Duration
> >
> >In practice, most instances of DateAndTime within an image will share the
> >same offset object, so for purposes of estimation assume that this takes
> >no extra space.
> >
> >Thus each instance requires 6 words of space in the object memory (maybe a
> >bit
> >more on average if the DateAndTime instances are not sharing the same
> >Duration
> >instance for one reason or another).
> >
> >For the UTC based implementation of DateAndTime, each instance requires:
> >
> > 2 words of header
> > 1 word for the small integer localOffsetSeconds variable
> > 1 word for the pointer to the LargePositiveInteger representing
> > utcMicroSeconds
> > 1 word of header for the large positive integer
> > 2 words of data for the value of the large positive integer
> >
> >Thus each instance requires 7 words of space in the object memory.
> >
> >So there is a difference, but it would probably not be a large effect on
> >overall space utilization, even assuming complete sharing of the offset
> >Duration instances.
>
> I think it's possible to reduce the number of words to 5 at the cost of
> reusing integer primitives. If DateAndTime is a variable byte class, then
> it can hold the utcMicroSeconds in 8 variable slots (2 words). I don't
> know if the LargeInteger primitives would work with it, but I think they
> should, so comparison and arithmetic methods could be based on them.
>

This probably would work, but I don't think that it would be a good thing
to do. The "microseconds" in the variable name is intended to indicate
the scale, not the actual numeric representation. For example, it would be
a Fraction in the case of parsing a DateAndTime with nanosecond precision
from a string.
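
A small workspace illustration of that point, using an arbitrary nanosecond count rather than anything produced by the actual parsing code:

```smalltalk
| nanosecondsSinceEpoch utcMicroseconds |
"Arbitrary example value with nanosecond resolution."
nanosecondsSinceEpoch := 1401039000123456789.
"Scaling to microseconds keeps the extra precision as a Fraction."
utcMicroseconds := nanosecondsSinceEpoch / 1000.
utcMicroseconds isInteger.	"false"
utcMicroseconds class.	"Fraction"
utcMicroseconds truncated.	"1401039000123456, whole microseconds only"
```

A fixed 8 byte field could hold only the truncated value, so the sub-microsecond part would be lost.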

> But it's probably not worth to care about this, because Spur will
> change these things.
>

Yes, for sure.

Dave



Re: A UTC based implementation of DateAndTime

David T. Lewis
On Sun, May 25, 2014 at 01:48:44PM -0400, David T. Lewis wrote:
> I have been working on a variation of class DateAndTime that replaces its
> instance variables (seconds offset jdn nanos) with two instance variables,
> utcMicroseconds to represent microseconds elapsed since the Posix epoch, and
> localOffsetSeconds to represent the local time zone offset. When instantiating
> the time now, a single call to primitiveUtcWithOffset is used to obtain these
> two values atomically as reported by the underlying platform.

I originally posted this as email to squeak-dev a couple of months ago. I have
now added a page for it on the swiki at http://wiki.squeak.org/squeak/6197 with
the original SAR file attached. I have also put the code into Monticello on SS3
at http://ss3.gemstone.com/ss/UTCDateAndTime. The SS3 repository has a few minor
updates since my original posting.

The SS3 archive has MCM update maps that, if loaded in sequence, can be used
to update a Squeak trunk image from the repository.

I'll probably put this on SqueakMap also, if I can find a nice way to do the
updates for an arbitrary image. This is a tricky business, because instances
of DateAndTime need to change at the same time that the Monticello loader is
making heavy use of DateAndTime to load the MCZs. A good old fashioned SAR
is the safest way to handle it, but Monticello is probably more approachable for
most of us.

Dave


>
> There are several advantages to this representation of DateAndTime, the most
> important of which is that its magnitude is unambiguous regardless of daylight
> savings transitions in local time zones.
>
> This is my attempt to address some historical baggage in Squeak. The VM
> reports time related to the local time zone, and the image attempts to
> convert to UTC (sometimes incorrectly). A UTC based representation makes the
> implementation of time zone tables more straightforward (see for example
> the Olson time zone tables in TimeZoneDatabase on SqueakMap).
>
> I am attaching the source code as a SAR file that can be loaded into a fully
> updated Squeak trunk image. The conversion process is slow, so be patient
> if you load it.
>
> This can be run on either an interpreter VM or Cog, but if you use Cog, please
> use a version dated June 2013 or later (the VM in the Squeak 4.5 all-in-one
> is fine).
>
> I am also attaching a copy of LXTestDateAndTimePerformance, which can be
> used to compare the performance of some basic DateAndTime functions.
>
> Performance of the UTC based DateAndTime is generally favorable compared to
> the original. Here is what I see on my system (smaller numbers are better).
>
> LXTestDateAndTimePerformance test results using the original Squeak DateAndTime
> on an interpreter VM:
> {
> #testNow->10143 .
> #testEquals->30986 .
> #testGreaterThan->80199 .
> #testLessThan->75912 .
> #testPrintString->10429 .
> #testStringAsDateAndTime->44657
> }
>
> LXTestDateAndTimePerformance test results using the new UTC based DateAndTime
> on an interpreter VM:
> {
> #testNow->6423 .
> #testEquals->31625 .
> #testGreaterThan->22999 .
> #testLessThan->18514 .
> #testPrintString->12502 .
> #testStringAsDateAndTime->32912
> }
>
> (CC to Brent Pinkney, author of the excellent Squeak Chronology package)
>
> Dave
>



Re: A UTC based implementation of DateAndTime

David T. Lewis
In reply to this post by Bert Freudenberg
On Wed, May 28, 2014 at 11:49:08AM +0200, Bert Freudenberg wrote:

> On 25.05.2014, at 19:48, David T. Lewis <[hidden email]> wrote:
>
> > Performance of the UTC based DateAndTime is generally favorable compared to
> > the original. Here is what I see on my system (smaller numbers are better).
> >
> > LXTestDateAndTimePerformance test results using the original Squeak DateAndTime
> > on an interpreter VM:
> > {
> > #testNow->10143 .
> > #testEquals->30986 .
> > #testGreaterThan->80199 .
> > #testLessThan->75912 .
> > #testPrintString->10429 .
> > #testStringAsDateAndTime->44657
> > }
> >
> > LXTestDateAndTimePerformance test results using the new UTC based DateAndTime
> > on an interpreter VM:
> > {
> > #testNow->6423 .
> > #testEquals->31625 .
> > #testGreaterThan->22999 .
> > #testLessThan->18514 .
> > #testPrintString->12502 .
> > #testStringAsDateAndTime->32912
> > }
>
> Hi Dave,
>
> just curious: did you test the performance without the LargeInt primitives?
>
> - Bert -
>

Hi Bert,

Sorry for the extremely late reply. I had never tried the UTC DateAndTime
without a LargeIntegersPlugin, but I realize now that the question might be
important for the SqueakJS VM, so I ran some tests using an interpreter VM
with and without LargeIntegersPlugin.

The results were not at all what I expected. Very surprisingly (at least
to me), the plugin seems to make very little difference in the performance of
the UTC DateAndTime. The UTC DateAndTime seems to be a bit slower on
printString and equality testing (?), and quite a bit faster on the other
measures. But the LargeIntegersPlugin does not seem to make much of a
difference at all.

Here is what I saw on my machine today.

Standard DateAndTime, standard VM:
LXTestDateAndTimePerformance new test. ==>
{
        #testNow->9147 .
        #testEquals->28252 .
        #testGreaterThan->27358 .
        #testLessThan->24664 .
        #testPrintString->10955 .
        #testStringAsDateAndTime->48505
}

Standard DateAndTime, no LargeIntegersPlugin:
LXTestDateAndTimePerformance new test. ==>
{
        #testNow->28324 .
        #testEquals->27985 .
        #testGreaterThan->27837 .
        #testLessThan->23783 .
        #testPrintString->11321 .
        #testStringAsDateAndTime->46679
}

UTC DateAndTime, standard VM:
LXTestDateAndTimePerformance new test. ==>
{
        #testNow->5807 .
        #testEquals->28168 .
        #testGreaterThan->19764 .
        #testLessThan->16117 .
        #testPrintString->13212 .
        #testStringAsDateAndTime->36652
}

UTC DateAndTime, no LargeIntegersPlugin:
LXTestDateAndTimePerformance new test. ==>
{
        #testNow->5759 .
        #testEquals->30079 .
        #testGreaterThan->19934 .
        #testLessThan->16315 .
        #testPrintString->14125 .
        #testStringAsDateAndTime->38208
}
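
To make the comparison easier to read, here is a small Python sketch that only restates the four result sets above as ratios (time without the plugin divided by time with it). The numbers are copied from this message; the dictionary layout and function name are just for illustration.

```python
# Timings copied from the four result sets above (smaller is better).
TESTS = ["testNow", "testEquals", "testGreaterThan",
         "testLessThan", "testPrintString", "testStringAsDateAndTime"]

RESULTS = {
    ("standard", "plugin"):    [9147, 28252, 27358, 24664, 10955, 48505],
    ("standard", "no plugin"): [28324, 27985, 27837, 23783, 11321, 46679],
    ("utc", "plugin"):         [5807, 28168, 19764, 16117, 13212, 36652],
    ("utc", "no plugin"):      [5759, 30079, 19934, 16315, 14125, 38208],
}


def slowdown_without_plugin(variant: str) -> dict:
    """Ratio of no-plugin time to plugin time per test (1.0 means no change)."""
    with_plugin = RESULTS[(variant, "plugin")]
    without_plugin = RESULTS[(variant, "no plugin")]
    return {test: without / with_
            for test, with_, without in zip(TESTS, with_plugin, without_plugin)}


for variant in ("standard", "utc"):
    for test, ratio in slowdown_without_plugin(variant).items():
        print(f"{variant:8} {test:24} {ratio:.2f}")
```

Only testNow on the standard DateAndTime moves substantially; every other ratio stays within a few percent of 1.0.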




Re: A UTC based implementation of DateAndTime

Bert Freudenberg

On 2014-08-24, at 22:11, David T. Lewis <[hidden email]> wrote:

> On Wed, May 28, 2014 at 11:49:08AM +0200, Bert Freudenberg wrote:
>> On 25.05.2014, at 19:48, David T. Lewis <[hidden email]> wrote:
>>
>>> Performance of the UTC based DateAndTime is generally favorable compared to
>>> the original. Here is what I see on my system (smaller numbers are better).
>>>
>>> LXTestDateAndTimePerformance test results using the original Squeak DateAndTime
>>> on an interpreter VM:
>>> {
>>> #testNow->10143 .
>>> #testEquals->30986 .
>>> #testGreaterThan->80199 .
>>> #testLessThan->75912 .
>>> #testPrintString->10429 .
>>> #testStringAsDateAndTime->44657
>>> }
>>>
>>> LXTestDateAndTimePerformance test results using the new UTC based DateAndTime
>>> on an interpreter VM:
>>> {
>>> #testNow->6423 .
>>> #testEquals->31625 .
>>> #testGreaterThan->22999 .
>>> #testLessThan->18514 .
>>> #testPrintString->12502 .
>>> #testStringAsDateAndTime->32912
>>> }
>>
>> Hi Dave,
>>
>> just curious: did you test the performance without the LargeInt primitives?
>>
>> - Bert -
>>
>
> Hi Bert,
>
> Sorry for the extremely late reply. I had never tried the UTC DateAndTime
> without a LargeIntegersPlugin, but I realize now that the question might be
> important for the SqueakJS VM, so I ran some tests using an interpreter VM
> with and without LargeIntegersPlugin.

Interesting. So for the old DateAndTime the plugin makes a difference only for testNow, and for the new one not at all. Sounds good to me :)

- Bert -

> The results were not at all what I expected. Very surprisingly (at least
> to me), the plugin seems to make very little difference in the performance of
> the UTC DateAndTime. The UTC DateAndTime seems to be a bit slower on
> printString and equality testing (?), and quite a bit faster on the other
> measures. But the LargeIntegersPlugin does not seem to make much of a
> difference at all.
>
> Here is what I saw on my machine today.
>
> Standard DateAndTime, standard VM:
> LXTestDateAndTimePerformance new test. ==>
> {
> #testNow->9147 .
> #testEquals->28252 .
> #testGreaterThan->27358 .
> #testLessThan->24664 .
> #testPrintString->10955 .
> #testStringAsDateAndTime->48505
> }
>
> Standard DateAndTime, no LargeIntegersPlugin:
> LXTestDateAndTimePerformance new test. ==>
> {
> #testNow->28324 .
> #testEquals->27985 .
> #testGreaterThan->27837 .
> #testLessThan->23783 .
> #testPrintString->11321 .
> #testStringAsDateAndTime->46679
> }
>
> UTC DateAndTime, standard VM:
> LXTestDateAndTimePerformance new test. ==>
> {
> #testNow->5807 .
> #testEquals->28168 .
> #testGreaterThan->19764 .
> #testLessThan->16117 .
> #testPrintString->13212 .
> #testStringAsDateAndTime->36652
> }
>
> UTC DateAndTime, no LargeIntegersPlugin:
> LXTestDateAndTimePerformance new test. ==>
> {
> #testNow->5759 .
> #testEquals->30079 .
> #testGreaterThan->19934 .
> #testLessThan->16315 .
> #testPrintString->14125 .
> #testStringAsDateAndTime->38208
> }
>
>
>




Re: A UTC based implementation of DateAndTime

Sean P. DeNigris
Administrator
In reply to this post by David T. Lewis
David T. Lewis wrote
> the two main responsibilities of DateAndTime. One is
> to represent time as a magnitude (for duration calculation, etc). The other
> is to display time in the frame of reference of a local time zone. It is not
> at all clear to me that those two responsibilities belong in the same class.

There is definitely something fishy with the combination. Consider Pharo's Date, which retroactively applies the current timezone to dates. This is bad because e.g.:
  '7/10/2014' asDate "evaluated the day before DST" ~= '7/10/2014' asDate "evaluated the day after DST"

I created an issue, which has been sitting on the bug tracker. Here's an excerpt from the discussion (https://pharo.fogbugz.com/default.asp?12147#BugEvent.90264):

> I see that a particular location could reasonably be implied in a Date, e.g. if we're saying that a treaty was signed on such and such a date, we mean at the place where it was signed... it may have already been the next day in another time zone.
>
> Now, what is beyond doubt, is that the timezone that would be implied is the timezone of the date itself, not an artifact of when the instance was created. In my example, I create '11/12/2013' asDate, and it assumes my timezone.
> Since it's the summer in NY, that means -4 offset. Now, I evaluate '11/12/2013' asDate again in December. Again it implies my timezone, the offset of which has changed to -5! So now '11/12/2013' asDate (in NY) ~= '11/12/2013' asDate (in NY). Even though the offset of the date in question, which depends only on whether DST was in effect on 11/12/2013, has not changed.
>
> So the current behavior of two equal dates being considered not-equal is as jarring as your counter example of two non-equal dates (different timezone) being considered equal.

There doesn't seem to be an easy, direct solution because I'm assuming we don't have a primitive for "the offset at this location on X date in the past". So the next best thing might be a PlatonicDate (obviously a name-in-progress :)) that represents the abstract idea of a Date without regard to location, and then an object to connect a particular place/offset to that concept (but the second one would have the same problem as we currently have).

Cheers,
Sean

Re: A UTC based implementation of DateAndTime

David T. Lewis
Hi Sean,

On Tue, Aug 26, 2014 at 04:21:09PM -0700, Sean P. DeNigris wrote:

> David T. Lewis wrote
> > the two main responsibilities of DateAndTime. One is
> > to represent time as a magnitude (for duration calculation, etc). The
> > other
> > is to display time in the frame of reference of a local time zone. It is
> > not
> > at all clear to me that those two responsibilities belong in the same class.
>
> There is definitely something fishy with the combination. Consider Pharo's
> Date, which retroactively applies the current timezone to dates. This is bad
> because e.g.:
>   '7/10/2014' asDate "evaluated the day before DST" ~= '7/10/2014' asDate
> "evaluated the day after DST"
>
> I created an issue, which has been sitting on the bug tracker. Here's an
> excerpt from the discussion
> (https://pharo.fogbugz.com/default.asp?12147#BugEvent.90264):
>
> > I see that a particular location could reasonably be implied in a Date,
> > e.g. if we're saying that a treaty was signed on such and such a date, we
> > mean at the place where it was signed... it may have already been the next
> > day in another time zone.
> >
> > Now, what is beyond doubt, is that the timezone that would be implied is
> > the timezone of the date itself, not an artifact of when the instance was
> > created. In my example, I create '11/12/2013' asDate, and it assumes my
> > timezone.
> > Since it's the summer in NY, that means -4 offset. Now, I evaluate
> > '11/12/2013' asDate again in December. Again it implies my timezone, the
> > offset of which has changed to -5! So now '11/12/2013' asDate (in NY) ~=
> > '11/12/2013' asDate (in NY). Even though the offset of the date in
> > question, which depends only on whether DST was in effect on 11/12/2013,
> > has not changed.
> >
> > So the current behavior of two equal dates being considered not-equal is
> > as jarring as your counter example of two non-equal dates (different
> > timezone) being considered equal.

The term "Date" can mean a lot of different things in different contexts,
so I think this is very much a matter of agreeing on definitions. As an
example, I work in the automotive manufacturing industry, where the term
"production date" has a very specific meaning that has very little to do
with time zones or specific points in time. But in general, most common
uses of the term "date" are meant to imply local time.

>
> There doesn't seem to be an easy, direct solution because I'm assuming we
> don't have a primitive for "the offset at this location on X date in the
> past".

You are exactly right. As far as I am aware, there is no straightforward
way to use the C runtime libraries to obtain the offset for an arbitrary
time_t in the context of some specified time zone. So this is not something
that can be easily supported in a primitive. On the other hand, the current
offset in the current time zone is easily obtained, and that is now provided
by the #primitiveUtcWithOffset primitive.

Fortunately, if you have access to the time zone rules, it is quite easy
to do these sorts of calculations in the image. This would typically be done
using the publicly available Olson time zone tables. Several implementations
have been done in Smalltalk, including the one that I did at
http://www.squeaksource.com/TimeZoneDatabase.
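
As a rough illustration of a rule-based lookup (in Python, not Smalltalk), the sketch below hard-codes one simplified rule: the US schedule in effect since 2007, applied to New York at whole-day granularity. It is an assumption-laden stand-in, not the Olson tables and not the TimeZoneDatabase package, and the function names are made up.

```python
from datetime import date, timedelta


def nth_sunday(year: int, month: int, n: int) -> date:
    """Return the n-th Sunday of the given month (n starts at 1)."""
    first = date(year, month, 1)
    days_to_sunday = (6 - first.weekday()) % 7  # weekday(): Monday=0 .. Sunday=6
    return first + timedelta(days=days_to_sunday + 7 * (n - 1))


def new_york_offset_hours(day: date) -> int:
    """UTC offset in hours for New York on the given day.

    Simplified assumption: DST runs from the second Sunday of March up to
    (but not including) the first Sunday of November. Valid only for 2007
    and later, and it ignores the 02:00 switch-over time on transition days.
    """
    dst_start = nth_sunday(day.year, 3, 2)
    dst_end = nth_sunday(day.year, 11, 1)
    return -4 if dst_start <= day < dst_end else -5


# The offset depends on the date being asked about, not on when we ask.
print(new_york_offset_hours(date(2013, 11, 12)))
print(new_york_offset_hours(date(2014, 7, 10)))
```

A real implementation would read the rule history from the time zone tables instead of fixing a single rule, which is what makes lookups for dates before a rule change come out right.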

> So the next best thing might be a PlatonicDate (obviously a
> name-in-progress :)) that represents the abstract idea of a Date without
> regard to location, and then an object to connect a particular place/offset
> to that concept (but the second one would have the same problem as we
> currently have).
>

I like that approach. Interestingly, if you look at an old version of Squeak
(ST-80), the Date class was implemented as you describe. It had no notion of
location or time zone. Maybe there is a better way now that we could represent
your PlatonicDate (aka ST-80?) such that it would collaborate with time zones and
DateAndTime, but possibly would not be tied to them in implementation.
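
One way to picture that split, sketched in Python with hypothetical names (`PlatonicDate` is Sean's working name; `LocatedDate` is made up here): a plain calendar date that knows nothing about location, and a separate object that ties it to an offset when a point in time is needed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True, order=True)
class PlatonicDate:
    """A calendar date with no notion of location or time zone."""
    year: int
    month: int
    day: int


@dataclass(frozen=True)
class LocatedDate:
    """Hypothetical collaborator: a PlatonicDate seen from a given UTC offset."""
    date: PlatonicDate
    offset_seconds: int

    def start_utc(self) -> datetime:
        """The UTC instant at which this date begins at the given offset."""
        local_zone = timezone(timedelta(seconds=self.offset_seconds))
        local_midnight = datetime(self.date.year, self.date.month,
                                  self.date.day, tzinfo=local_zone)
        return local_midnight.astimezone(timezone.utc)


d1 = PlatonicDate(2013, 11, 12)
d2 = PlatonicDate(2013, 11, 12)
print(d1 == d2)  # equality never depends on when or where it is evaluated

# Only the located view is sensitive to the offset that was chosen.
print(LocatedDate(d1, -5 * 3600).start_utc())
print(LocatedDate(d1, -4 * 3600).start_utc())
```

As Sean notes, the located object still has to get its offset from somewhere, so it inherits the original problem unless the offset comes from the time zone rules for that date.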

Dave



Re: A UTC based implementation of DateAndTime

Chris Muller-3
In reply to this post by Sean P. DeNigris
> So the current behavior of two equal dates being considered not-equal is
> as jarring as your counter example of two non-equal dates (different
> timezone) being considered equal.

Indeed!
 

> There doesn't seem to be an easy, direct solution because I'm assuming we
> don't have a primitive for "the offset at this location on X date in the
> past". So the next best thing might be a PlatonicDate (obviously a
> name-in-progress :)) that represents the abstract idea of a Date without
> regard to location, and then an object to connect a particular place/offset
> to that concept (but the second one would have the same problem as we
> currently have).

Well, for me at least, the solution we chose for Squeak has been working well.  Here's the main discussion:





Re: A UTC based implementation of DateAndTime

Nicolas Cellier
In reply to this post by Sean P. DeNigris



2014-08-27 1:21 GMT+02:00 Sean P. DeNigris <[hidden email]>:
> David T. Lewis wrote
> > the two main responsibilities of DateAndTime. One is
> > to represent time as a magnitude (for duration calculation, etc). The
> > other
> > is to display time in the frame of reference of a local time zone. It is
> > not
> > at all clear to me that those two responsibilities belong in the same class.
>
> There is definitely something fishy with the combination. Consider Pharo's
> Date, which retroactively applies the current timezone to dates. This is bad
> because e.g.:
>   '7/10/2014' asDate "evaluated the day before DST" ~= '7/10/2014' asDate
> "evaluated the day after DST"
>
> I created an issue, which has been sitting on the bug tracker. Here's an
> excerpt from the discussion
> (https://pharo.fogbugz.com/default.asp?12147#BugEvent.90264):

For me, this is definitely a bug.
In the current implementation, the date corresponding to a DST change should be either 23h or 25h long,
and this should not depend on whether DST is active at the time you ask.

Of course, if we go into these complications, what should we do when the DST rules change, but we ask for a Date before the change?
Shall we store/maintain the DST rule history for each and every part of the world?

> > I see that a particular location could reasonably be implied in a Date,
> > e.g. if we're saying that a treaty was signed on such and such a date, we
> > mean at the place where it was signed... it may have already been the next
> > day in another time zone.
> >
> > Now, what is beyond doubt, is that the timezone that would be implied is
> > the timezone of the date itself, not an artifact of when the instance was
> > created. In my example, I create '11/12/2013' asDate, and it assumes my
> > timezone.
> > Since it's the summer in NY, that means -4 offset. Now, I evaluate
> > '11/12/2013' asDate again in December. Again it implies my timezone, the
> > offset of which has changed to -5! So now '11/12/2013' asDate (in NY) ~=
> > '11/12/2013' asDate (in NY). Even though the offset of the date in
> > question, which depends only on whether DST was in effect on 11/12/2013,
> > has not changed.
> >
> > So the current behavior of two equal dates being considered not-equal is
> > as jarring as your counter example of two non-equal dates (different
> > timezone) being considered equal.
>
> There doesn't seem to be an easy, direct solution because I'm assuming we
> don't have a primitive for "the offset at this location on X date in the
> past". So the next best thing might be a PlatonicDate (obviously a
> name-in-progress :)) that represents the abstract idea of a Date without
> regard to location, and then an object to connect a particular place/offset
> to that concept (but the second one would have the same problem as we
> currently have).

Of course, a PlatonicDate would be so much simpler...
It's a different thing though.

> -----
> Cheers,
> Sean
> --
> View this message in context: http://forum.world.st/What-s-up-on-build-squeak-org-tp4760266p4774978.html
> Sent from the Squeak - Dev mailing list archive at Nabble.com.



