
Re: A UTC based implementation of DateAndTime

David T. Lewis
On Mon, May 26, 2014 at 02:17:47PM +0000, J. Vuletich (mail lists) wrote:

> Hi David, Folks,
>
> Quoting "David T. Lewis" <[hidden email]>:
>
> >I have been working on a variation of class DateAndTime that replaces its
> >instance variables (seconds offset jdn nanos) with two instance variables,
> >utcMicroseconds to represent microseconds elapsed since the Posix epoch,
> >and localOffsetSeconds to represent the local time zone offset. When
> >instantiating the time now, a single call to primitiveUtcWithOffset is used
> >to obtain these two values atomically as reported by the underlying platform.
> >
> >There are several advantages to this representation of DateAndTime, the
> >most important of which is that its magnitude is unambiguous regardless
> >of daylight savings transitions in local time zones.
> >
> >This is my attempt to address some historical baggage in Squeak. The VM
> >reports time related to the local time zone, and the image attempts to
> >convert to UTC (sometimes incorrectly). A UTC based representation makes
> >the implementation of time zone tables more straightforward (see for
> >example the Olson time zone tables in TimeZoneDatabase on SqueakMap).
> >...
> >Dave
>
> I very much support this approach. I did a bit of testing of  
> <primitive: 'primitiveUtcWithOffset'> . I found that on a Mac, with  
> 'Croquet Closure Cog VM [CoInterpreter VMMaker.oscog-eem.331] Squeak  
> Cog 4.0.2776' 'Mac OS' 'intel' '1092' (from Eliot's site), the second  
> element I get (time zone offset) is -140473411.
>
> The correct value would be -10800, as answered in Windows. I could not  
> test on Linux yet (could not get the VM to run in Ubuntu 14.04 64-bit
> :( ).
>
> Any clue on what's wrong on Mac OS?
>
> BTW, which would be the current non-Cog VMs to try?
>

Oops, I mistakenly said that the Cog VMs could be used. But it looks like
there is a regression or code merge problem of some sort. I'm afraid that
I was testing with my own locally compiled Cog VM and did not notice the
problem.

A unix Mac VM from squeakvm.org/unix should demonstrate the correct behavior.

CC to vm-dev list:

Eliot, the fix for this was here (but it seems to have been overridden by
a more recent change):

   Name: VMMaker.oscog-dtl.286
   Author: dtl
   Time: 4 May 2013, 11:29:25.237 am
   UUID: 8be237d9-7812-4792-9723-90f9cff0c2e9
   Ancestors: VMMaker.oscog-eem.285
   
   Replace broken primitiveUtcWithOffset with a version that works.

Dave
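A wrong offset like the one Juan saw is easy to catch mechanically: -10800 seconds is UTC-03:00, while -140473411 seconds is more than four years. A minimal Python sketch (not Squeak code) of such a plausibility check, assuming that real-world offsets lie between UTC-12:00 and UTC+14:00:

```python
# Plausibility check for the offset element answered by primitiveUtcWithOffset.
# Assumption: real-world UTC offsets lie between UTC-12:00 and UTC+14:00.
MIN_OFFSET_SECONDS = -12 * 3600
MAX_OFFSET_SECONDS = 14 * 3600


def plausible_utc_offset(offset_seconds: int) -> bool:
    """True if offset_seconds could be a real local time zone offset."""
    return MIN_OFFSET_SECONDS <= offset_seconds <= MAX_OFFSET_SECONDS


def format_offset(offset_seconds: int) -> str:
    """Render an offset in seconds as +HH:MM or -HH:MM."""
    sign = "-" if offset_seconds < 0 else "+"
    hours, rest = divmod(abs(offset_seconds), 3600)
    return f"{sign}{hours:02d}:{rest // 60:02d}"


# The value Windows answered versus the value the Mac Cog VM answered.
for reported in (-10800, -140473411):
    if plausible_utc_offset(reported):
        print(reported, "ok", format_offset(reported))
    else:
        print(reported, "suspicious")
```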



Re: A UTC based implementation of DateAndTime

David T. Lewis
In reply to this post by Louis LaBrunda
On Mon, May 26, 2014 at 10:48:16AM -0400, Louis LaBrunda wrote:

>
> On Sun, 25 May 2014 13:48:44 -0400, "David T. Lewis" <[hidden email]>
> wrote:
>
> >I have been working on a variation of class DateAndTime that replaces its
> >instance variables (seconds offset jdn nanos) with two instance variables,
> >utcMicroseconds to represent microseconds elapsed since the Posix epoch, and
> >localOffsetSeconds to represent the local time zone offset. When instantiating
> >the time now, a single call to primitiveUtcWithOffset is used to obtain these
> >two values atomically as reported by the underlying platform.
> >
> >There are several advantages to this representation of DateAndTime, the most
> >important of which is that its magnitude is unambiguous regardless of daylight
> >savings transitions in local time zones.
> >
> Hi Dave,
>
> May I respectfully ask why localOffsetSeconds (to represent the local time
> zone offset) is needed?  It seems to me a UTC time is enough.  Is there
> really a need for the timezone offset the instance was created in?  Does
> every DateAndTime instance need to carry this offset around with it?  I
> would think the offset is only needed if one wants to display a date/time
> as a local value and then one could get the local offset from the VM or a
> program setting the user had previously supplied regardless of where the
> computer was setup to run.  I guess there might be some historic interest
> as to the time zone an instance (or many instances) was created in, but one
> could just keep that as a separate value.

Hi Lou,

Good question. In fact, one of the reasons I like the UTC implementation is
that it helps clarify the two main responsibilities of DateAndTime. One is
to represent time as a magnitude (for duration calculation, etc). The other
is to display time in the frame of reference of a local time zone. It is not
at all clear to me that those two responsibilities belong in the same class.

Dave
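The split between the two responsibilities can be made concrete. In the following Python sketch (class and method names are hypothetical, not the code in Dave's SAR file), the magnitude is the single integer utc_microseconds, and local_offset_seconds is consulted only when a local field such as the hour is requested:

```python
from dataclasses import dataclass

MICROS_PER_SECOND = 1_000_000
SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class UtcDateAndTime:
    """Hypothetical model of the two-variable representation."""

    utc_microseconds: int      # microseconds since the Posix epoch, in UTC
    local_offset_seconds: int  # e.g. -10800 for UTC-03:00

    def _seconds_of_day(self, offset_seconds: int) -> int:
        # Floor division keeps this correct for instants before 1970 too.
        seconds = self.utc_microseconds // MICROS_PER_SECOND + offset_seconds
        return seconds % SECONDS_PER_DAY

    def hour(self) -> int:
        """Hour in the local time zone: the offset is applied only here."""
        return self._seconds_of_day(self.local_offset_seconds) // 3600

    def utc_hour(self) -> int:
        """Hour in UTC: the offset plays no part."""
        return self._seconds_of_day(0) // 3600


# Noon UTC on 2 January 1970, seen from UTC-03:00.
noon_utc = (SECONDS_PER_DAY + 12 * 3600) * MICROS_PER_SECOND
t = UtcDateAndTime(noon_utc, -10800)
print(t.hour(), t.utc_hour())
```

With an offset of 0 the local and UTC fields coincide, so a "pure UTC" instance needs no separate class in this model.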



Re: A UTC based implementation of DateAndTime

Bert Freudenberg

On 26.05.2014, at 17:16, David T. Lewis <[hidden email]> wrote:

> On Mon, May 26, 2014 at 10:48:16AM -0400, Louis LaBrunda wrote:
>>
>> On Sun, 25 May 2014 13:48:44 -0400, "David T. Lewis" <[hidden email]>
>> wrote:
>>
>>> I have been working on a variation of class DateAndTime that replaces its
>>> instance variables (seconds offset jdn nanos) with two instance variables,
>>> utcMicroseconds to represent microseconds elapsed since the Posix epoch, and
>>> localOffsetSeconds to represent the local time zone offset. When instantiating
>>> the time now, a single call to primitiveUtcWithOffset is used to obtain these
>>> two values atomically as reported by the underlying platform.
>>>
>>> There are several advantages to this representation of DateAndTime, the most
>>> important of which is that its magnitude is unambiguous regardless of daylight
>>> savings transitions in local time zones.
>>>
>> Hi Dave,
>>
>> May I respectfully ask why localOffsetSeconds (to represent the local time
>> zone offset) is needed?  It seems to me a UTC time is enough.  Is there
>> really a need for the timezone offset the instance was created in?  Does
>> every DateAndTime instance need to carry this offset around with it?  I
>> would think the offset is only needed if one wants to display a date/time
>> as a local value and then one could get the local offset from the VM or a
>> program setting the user had previously supplied regardless of where the
>> computer was setup to run.  I guess there might be some historic interest
>> as to the time zone an instance (or many instances) was created in, but one
>> could just keep that as a separate value.
>
> Hi Lou,
>
> Good question. In fact, one of the reasons I like the UTC implementation is
> that it helps clarify the two main responsibilities of DateAndTime. One is
> to represent time as a magnitude (for duration calculation, etc). The other
> is to display time in the frame of reference of a local time zone. It is not
> at all clear to me that those two responsibilities belong in the same class.
>
> Dave
We need to be able to distinguish between local and universal time. It would be rather inconvenient if asking a DateAndTime for e.g. the hour would not be made to answer the local hour. Arguably the local time offset could be moved to a subclass, but having a single DateAndTime class is appealing for simplicity reasons, too.

- Bert -






A UTC based implementation of DateAndTime

Louis LaBrunda
On Mon, 26 May 2014 17:29:19 +0200, Bert Freudenberg <[hidden email]>
wrote:

>
>On 26.05.2014, at 17:16, David T. Lewis <[hidden email]> wrote:
>
>> On Mon, May 26, 2014 at 10:48:16AM -0400, Louis LaBrunda wrote:
>>>
>>> On Sun, 25 May 2014 13:48:44 -0400, "David T. Lewis" <[hidden email]>
>>> wrote:
>>>
>>>> I have been working on a variation of class DateAndTime that replaces its
>>>> instance variables (seconds offset jdn nanos) with two instance variables,
>>>> utcMicroseconds to represent microseconds elapsed since the Posix epoch, and
>>>> localOffsetSeconds to represent the local time zone offset. When instantiating
>>>> the time now, a single call to primitiveUtcWithOffset is used to obtain these
>>>> two values atomically as reported by the underlying platform.
>>>>
>>>> There are several advantages to this representation of DateAndTime, the most
>>>> important of which is that its magnitude is unambiguous regardless of daylight
>>>> savings transitions in local time zones.
>>>>
>>> Hi Dave,
>>>
>>> May I respectfully ask why localOffsetSeconds (to represent the local time
>>> zone offset) is needed?  It seems to me a UTC time is enough.  Is there
>>> really a need for the timezone offset the instance was created in?  Does
>>> every DateAndTime instance need to carry this offset around with it?  I
>>> would think the offset is only needed if one wants to display a date/time
>>> as a local value and then one could get the local offset from the VM or a
>>> program setting the user had previously supplied regardless of where the
>>> computer was setup to run.  I guess there might be some historic interest
>>> as to the time zone an instance (or many instances) was created in, but one
>>> could just keep that as a separate value.
>>
>> Hi Lou,
>>
>> Good question. In fact, one of the reasons I like the UTC implementation is
>> that it helps clarify the two main responsibilities of DateAndTime. One is
>> to represent time as a magnitude (for duration calculation, etc). The other
>> is to display time in the frame of reference of a local time zone. It is not
>> at all clear to me that those two responsibilities belong in the same class.
>>
>> Dave
>
>We need to be able to distinguish between local and universal time. It would be rather inconvenient if asking a DateAndTime for e.g. the hour would not be made to answer the local hour. Arguably the local time offset could be moved to a subclass, but having a single DateAndTime class is appealing for simplicity reasons, too.
>- Bert -

I guess a DateAndTime class (without offset) could always answer 0 for the
offset and a DateAndTimeWithOffset subclass could carry and answer the
offset.  Methods could be provided to morph one into the other if desired.

Lou
-----------------------------------------------------------
Louis LaBrunda
Keystone Software Corp.
SkypeMe callto://PhotonDemon
mailto:[hidden email] http://www.Keystone-Software.com
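Lou's alternative can be sketched in Python as follows. Both class names are hypothetical, and the two conversion methods stand in for his "methods to morph one into the other":

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlainDateAndTime:
    """Hypothetical base class: a UTC instant that stores no offset."""

    utc_microseconds: int

    @property
    def offset_seconds(self) -> int:
        return 0  # always answers 0, as Lou suggests

    def with_offset(self, offset_seconds: int) -> "DateAndTimeWithOffset":
        """Same instant, now carrying a local offset."""
        return DateAndTimeWithOffset(self.utc_microseconds, offset_seconds)


@dataclass(frozen=True)
class DateAndTimeWithOffset(PlainDateAndTime):
    """Hypothetical subclass that carries and answers its offset."""

    stored_offset_seconds: int

    @property
    def offset_seconds(self) -> int:
        return self.stored_offset_seconds

    def without_offset(self) -> PlainDateAndTime:
        """Same instant, offset dropped."""
        return PlainDateAndTime(self.utc_microseconds)
```

In this sketch the generated equality compares classes as well as fields, so a PlainDateAndTime and a DateAndTimeWithOffset for the same instant are not equal. A real split would have to decide that question explicitly.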



Re: A UTC based implementation of DateAndTime

J. Vuletich (mail lists)
In reply to this post by David T. Lewis

Quoting "David T. Lewis" <[hidden email]>:

> On Mon, May 26, 2014 at 02:17:47PM +0000, J. Vuletich (mail lists) wrote:
>> Hi David, Folks,
>> ...
>> The correct value would be -10800, as answered in Windows. I could not
>> test on Linux yet (could not get the VM to run in Ubuntu 14.04 64-bit
>> :( ).
>>
>> Any clue on what's wrong on Mac OS?
>>
>> BTW, which would be the current non-Cog VMs to try?
>>
>
> Oops, I mistakenly said that the Cog VMs could be used. But it looks like
> there is a regression or code merge problem of some sort. I'm afraid that
> I was testing with my own locally compiled Cog VM and did not notice the
> problem.
>
> A unix Mac VM from squeakvm.org/unix should demonstrate the correct behavior.
>
> CC to vm-dev list:
>
> Eliot, the fix for this was here (but it seems to have been overridden by
> a more recent change):
>
>    Name: VMMaker.oscog-dtl.286
>    Author: dtl
>    Time: 4 May 2013, 11:29:25.237 am
>    UUID: 8be237d9-7812-4792-9723-90f9cff0c2e9
>    Ancestors: VMMaker.oscog-eem.285
>
>    Replace broken primitiveUtcWithOffset with a version that works.
>
> Dave

Thanks Dave,

I could run it on 64-bit Ubuntu with the VM from squeakvm.org/unix. I'll
try the Mac VM when I get the chance to borrow a Mac again.

One thing that seems to be missing is a Windows interpreter with the  
new primitives, although I don't know if there is a real need for that.

Additionally, besides getting rid of <primitive: 137>
(primLocalSecondsClock, which will overflow in 2037), it would be great
to stop using <primitive: 135> (primMillisecondClock), which overflows
every six days. But for this, we would need a new <primitive: 136>
(primSignal:atMilliseconds:), as it uses the same time base. This
would enable a serious simplification of Delay.


Cheers,
Juan Vuletich
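Both limits follow from simple arithmetic. A Python sketch, assuming the seconds clock counts from 1 January 1901 as an unsigned 32-bit value, and that the millisecond clock is masked to 29 bits (the mask width is an assumption about the classic VM, not something stated in this thread):

```python
from datetime import datetime, timedelta

# Epoch of the Squeak seconds clock: 1 January 1901.
SQUEAK_EPOCH = datetime(1901, 1, 1)

# Assumption: the seconds clock (primitive 137) is an unsigned 32-bit count.
last_second = SQUEAK_EPOCH + timedelta(seconds=2**32 - 1)

# Assumption: the classic VM masks the millisecond clock (primitive 135)
# to 29 bits so that the value is always a positive SmallInteger.
MILLISECOND_CLOCK_MASK = 2**29 - 1
wrap_days = (MILLISECOND_CLOCK_MASK + 1) / (1000 * 86_400)


def elapsed_ms(start: int, now: int) -> int:
    """Milliseconds between two masked readings, correct across at most one wrap."""
    return (now - start) & MILLISECOND_CLOCK_MASK


print(last_second)          # 2037-02-06 06:28:15
print(round(wrap_days, 2))  # 6.21
```

The masking in elapsed_ms is the kind of wraparound handling that Delay needs today and that a clock which never rolls over would make unnecessary.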



Re: A UTC based implementation of DateAndTime

David T. Lewis
On Mon, May 26, 2014 at 04:41:10PM +0000, J. Vuletich (mail lists) wrote:

>
> Quoting "David T. Lewis" <[hidden email]>:
>
> >On Mon, May 26, 2014 at 02:17:47PM +0000, J. Vuletich (mail lists) wrote:
> >>Hi David, Folks,
> >>...
> >>The correct value would be -10800, as answered in Windows. I could not
> >>test on Linux yet (could not get the VM to run in Ubuntu 14.04 64-bit
> >>:( ).
> >>
> >>Any clue on what's wrong on Mac OS?
> >>
> >>BTW, which would be the current non-Cog VMs to try?
> >>
> >
> >Oops, I mistakenly said that the Cog VMs could be used. But it looks like
> >there is a regression or code merge problem of some sort. I'm afraid that
> >I was testing with my own locally compiled Cog VM and did not notice the
> >problem.
> >
> >A unix Mac VM from squeakvm.org/unix should demonstrate the correct
> >behavior.
> >
> >CC to vm-dev list:
> >
> >Eliot, the fix for this was here (but it seems to have been overridden by
> >a more recent change):
> >
> >   Name: VMMaker.oscog-dtl.286
> >   Author: dtl
> >   Time: 4 May 2013, 11:29:25.237 am
> >   UUID: 8be237d9-7812-4792-9723-90f9cff0c2e9
> >   Ancestors: VMMaker.oscog-eem.285
> >
> >   Replace broken primitiveUtcWithOffset with a version that works.
> >
> >Dave
>
> Thanks Dave,
>
> I could run it on 64-bit Ubuntu with the VM from squeakvm.org/unix. I'll
> try the Mac VM when I get the chance to borrow a Mac again.
>
> One thing that seems to be missing is a Windows interpreter with the  
> new primitives, although I don't know if there is a real need for that.
>

Ian did a build of the Windows VM that should have the necessary support. Try
the Squeak4.1.2.2612 VM from http://squeakvm.org/win32/.

One thing to note - if the primitive is not present, DateAndTime will fall
back on the old logic, and it should produce reasonable results.

> Additionally, besides getting rid of <primitive: 137>
> (primLocalSecondsClock, which will overflow in 2037), it would be great
> to stop using <primitive: 135> (primMillisecondClock), which overflows
> every six days. But for this, we would need a new <primitive: 136>
> (primSignal:atMilliseconds:), as it uses the same time base. This
> would enable a serious simplification of Delay.
>

I think that Eliot is planning to update Squeak to use the microsecond clock
primitive, which removes any 2037 issues. I'm not sure if that would include
a change to primSignal:atMilliseconds:

Dave
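The fallback Dave describes is a common pattern. A rough Python sketch of it; the function and type names are hypothetical, and a missing primitive is modelled here as NotImplementedError, which is not how Squeak signals primitive failure:

```python
import time
from typing import Callable, Optional, Tuple

# (microseconds since the Posix epoch, local offset in seconds)
UtcWithOffset = Tuple[int, int]


def now_utc_with_offset(
    primitive: Optional[Callable[[], UtcWithOffset]] = None,
) -> UtcWithOffset:
    """Prefer the VM primitive; fall back to the host clock and time zone rules."""
    if primitive is not None:
        try:
            return primitive()  # one call answers both values together
        except NotImplementedError:
            pass  # models "primitive not present in this VM"
    # Fallback: two separate steps, the instant and then the offset for it.
    utc_micros = time.time_ns() // 1000
    offset_seconds = time.localtime(utc_micros // 1_000_000).tm_gmtoff
    return utc_micros, offset_seconds
```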




re: Please use new threads for new threads.

Chris Muller-3
In reply to this post by ccrraaiigg
Hi Craig, one solution to your question (below).

Also, a counterpart rule to "Please use new threads for new threads"
is, "Please don't start new threads for the same thread".  :-)  e.g.,
by changing the subject twice there are now three separate "threads"
which are really the same thread. (see screenshot)

(As an experiment, I've composed this "reply" anew, but C&P'd the
subject-line from Craig's last post because I want to see whether Gmail
renders this as a new thread or whether it collates it by the subject
line).

>      Short: Yeesh, I'm deleting this thread for sure. :)
>
>      Long:
>
>      This matters to me, because I want to be informed while having
> limited time to spend.
>
>      Naturally, I wondered how I might fix this myself, leaving the
> delicate sensibilities of my fellow raconteurs untrodden. I'm not sure
> what fix would work.

A separate Gmail account dedicated to mailing-list reading and
responding.  It lets the tail wag the dog and still presents threads
collated by subject-line.

> Sometimes a reply has nothing to do with the
> message to which the responder is replying, and the subject line is
> totally different. Sometimes a reply is actually a response, and the
> subject line might be the same ("hyperbole is great!"), somewhat
> different ("perhaps communication is better [was: 'hyperbole is
> great!']"), or totally different. Using simple message ID references and
> the conventions of normal conversation, without turning it into an AI
> project, seems the best we can manage.

I totally agree, in principle!  If there's a graph of message IDs,
the mail clients ought to make use of it over stringy matching!  Just
like I wish Eliot and Bert would use the graph of ancestry in
Monticello rather than stringy name matching for "branches"..   :)
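Threading by the message-ID graph rather than by subject uses the In-Reply-To and References headers defined for Internet mail. A minimal Python sketch with the standard email package; the grouping logic is a simplification of what real mail clients do:

```python
from email import message_from_string
from typing import Dict, List, Optional


def parent_id(raw: str) -> Optional[str]:
    """Return the Message-ID this message replies to, ignoring the Subject line."""
    msg = message_from_string(raw)
    if msg["In-Reply-To"]:
        return msg["In-Reply-To"].strip()
    refs = (msg["References"] or "").split()
    return refs[-1] if refs else None  # last reference is the direct parent


def thread_roots(raw_messages: List[str]) -> Dict[str, str]:
    """Map each Message-ID to the Message-ID at the root of its thread."""
    parents: Dict[str, Optional[str]] = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        parents[msg["Message-ID"].strip()] = parent_id(raw)

    def root(mid: str) -> str:
        # Walk up the reply graph while the parent is a message we have seen.
        seen = {mid}
        parent = parents.get(mid)
        while parent in parents and parent not in seen:
            seen.add(parent)
            mid, parent = parent, parents[parent]
        return mid

    return {mid: root(mid) for mid in parents}
```

A reply whose subject line was changed still lands in the original thread, because only the ID graph is consulted.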


Re: A UTC based implementation of DateAndTime

Chris Muller-3
In reply to this post by Louis LaBrunda
On Mon, May 26, 2014 at 11:12 AM, Louis LaBrunda
<[hidden email]> wrote:

> On Mon, 26 May 2014 17:29:19 +0200, Bert Freudenberg <[hidden email]>
> wrote:
>
>>
>>On 26.05.2014, at 17:16, David T. Lewis <[hidden email]> wrote:
>>
>>> On Mon, May 26, 2014 at 10:48:16AM -0400, Louis LaBrunda wrote:
>>>>
>>>> On Sun, 25 May 2014 13:48:44 -0400, "David T. Lewis" <[hidden email]>
>>>> wrote:
>>>>
>>>>> I have been working on a variation of class DateAndTime that replaces its
>>>>> instance variables (seconds offset jdn nanos) with two instance variables,
>>>>> utcMicroseconds to represent microseconds elapsed since the Posix epoch, and
>>>>> localOffsetSeconds to represent the local time zone offset. When instantiating
>>>>> the time now, a single call to primitiveUtcWithOffset is used to obtain these
>>>>> two values atomically as reported by the underlying platform.
>>>>>
>>>>> There are several advantages to this representation of DateAndTime, the most
>>>>> important of which is that its magnitude is unambiguous regardless of daylight
>>>>> savings transitions in local time zones.
>>>>>
>>>> Hi Dave,
>>>>
>>>> May I respectfully ask why localOffsetSeconds (to represent the local time
>>>> zone offset) is needed?  It seems to me a UTC time is enough.  Is there
>>>> really a need for the timezone offset the instance was created in?  Does
>>>> every DateAndTime instance need to carry this offset around with it?  I
>>>> would think the offset is only needed if one wants to display a date/time
>>>> as a local value and then one could get the local offset from the VM or a
>>>> program setting the user had previously supplied regardless of where the
>>>> computer was setup to run.  I guess there might be some historic interest
>>>> as to the time zone an instance (or many instances) was created in, but one
>>>> could just keep that as a separate value.
>>>
>>> Hi Lou,
>>>
>>> Good question. In fact, one of the reasons I like the UTC implementation is
>>> that it helps clarify the two main responsibilities of DateAndTime. One is
>>> to represent time as a magnitude (for duration calculation, etc). The other
>>> is to display time in the frame of reference of a local time zone. It is not
>>> at all clear to me that those two responsibilities belong in the same class.
>>>
>>> Dave
>>
>>We need to be able to distinguish between local and universal time. It would be rather inconvenient if asking a DateAndTime for e.g. the hour would not be made to answer the local hour. Arguably the local time offset could be moved to a subclass, but having a single DateAndTime class is appealing for simplicity reasons, too.
>>- Bert -
>
> I guess a DateAndTime class (without offset) could always answer 0 for the
> offset and a DateAndTimeWithOffset subclass could carry and answer the
> offset.  Methods could be provided to morph one into the other if desired.

No, one of the core requirements of a DateAndTime has always been to
be able to answer the local time.  It's fine for its internal
representation to be in UTC, but that requirement cannot go away.

If you want all-UTC DateAndTimes, then just specify an offset of 0.


Re: A UTC based implementation of DateAndTime

David T. Lewis
In reply to this post by Bert Freudenberg
On Mon, May 26, 2014 at 05:29:19PM +0200, Bert Freudenberg wrote:

> On 26.05.2014, at 17:16, David T. Lewis <[hidden email]> wrote:
> > On Mon, May 26, 2014 at 10:48:16AM -0400, Louis LaBrunda wrote:
> >>
> >> May I respectfully ask why localOffsetSeconds (to represent the local time
> >> zone offset) is needed?  It seems to me a UTC time is enough.  Is there
> >> really a need for the timezone offset the instance was created in?  Does
> >> every DateAndTime instance need to carry this offset around with it?  I
> >> would think the offset is only needed if one wants to display a date/time
> >> as a local value and then one could get the local offset from the VM or a
> >> program setting the user had previously supplied regardless of where the
> >> computer was setup to run.  I guess there might be some historic interest
> >> as to the time zone an instance (or many instances) was created in, but one
> >> could just keep that as a separate value.
> >
> > Hi Lou,
> >
> > Good question. In fact, one of the reasons I like the UTC implementation is
> > that it helps clarify the two main responsibilities of DateAndTime. One is
> > to represent time as a magnitude (for duration calculation, etc). The other
> > is to display time in the frame of reference of a local time zone. It is not
> > at all clear to me that those two responsibilities belong in the same class.
> >
> > Dave
>
> We need to be able to distinguish between local and universal time. It would be rather inconvenient if asking a DateAndTime for e.g. the hour would not be made to answer the local hour. Arguably the local time offset could be moved to a subclass, but having a single DateAndTime class is appealing for simplicity reasons, too.
>

One thing that seemed awkward to me was the question of how to implement #=. I
chose to let it be a comparison of just the utcMicroseconds magnitude, ignoring
the localOffsetSeconds. That may be the wrong thing to do, although if I think
about a DateAndTime as a magnitude, I expect that it should be true that one
instance is "greater than or: [equal to]" another when utcMicroseconds are equal,
regardless of the local offset.

Dave
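The choice of #= described above can be sketched in Python (the class is hypothetical): equality and ordering look only at utc_microseconds, so two instances for the same instant with different offsets compare equal and "greater than or equal" in both directions. If instances are hashed, the hash must ignore the offset as well; otherwise equal objects would land in different hash buckets:

```python
from dataclasses import dataclass
from functools import total_ordering


@total_ordering
@dataclass(frozen=True, eq=False)
class UtcInstant:
    """Hypothetical model: compares as a magnitude, ignoring the display offset."""

    utc_microseconds: int
    local_offset_seconds: int = 0

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, UtcInstant):
            return NotImplemented
        return self.utc_microseconds == other.utc_microseconds

    def __lt__(self, other: object) -> bool:
        if not isinstance(other, UtcInstant):
            return NotImplemented
        return self.utc_microseconds < other.utc_microseconds

    def __hash__(self) -> int:
        # Must agree with __eq__, so the offset is left out of the hash too.
        return hash(self.utc_microseconds)
```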




Re: A UTC based implementation of DateAndTime

Chris Muller-3
In reply to this post by David T. Lewis
Hi Dave, as someone who works with large systems in Squeak, I'm always
interested in _storage efficiency_ as much as execution efficiency.

DateAndTime, in particular, is a very common domain element with a
high potential for there to be many millions of instances in a given
domain model.

Apps which have millions of objects with merely a Date attribute can
canonicalize them.
And, apps which have millions of Time objects can canonicalize them.

But LargeIntegers are not easy to canonicalize (e.g.,
utcMicroseconds).  So a database system with millions of DateAndTimes
would have to do _two_ reads for every DateAndTime instance instead of
just one today (because SmallIntegers are immediate, while
LargeIntegers require their own storage buffer).

One thing I really like about the current implementation of
DateAndTime is how it carefully avoids LargeIntegers by having
large-grained "platforms" to arrive at the current time.  e.g., each
'jdn' is a chunk of (1000000*60*60*24) microseconds.  Your new
implementation reflects an increase of 86 BILLION utcMicroseconds for
every 1 jdn.

Small, all-in-memory benchmarks may show faster with the LI, but I'm
concerned that large-scale apps might be significantly impacted in the
opposite way.

Would it be possible to re-optimize this part of the representation
while still maintaining internal UTC representation to solve your
concern about daylight-savings?

Thanks.

On Sun, May 25, 2014 at 12:48 PM, David T. Lewis <[hidden email]> wrote:

> I have been working on a variation of class DateAndTime that replaces its
> instance variables (seconds offset jdn nanos) with two instance variables,
> utcMicroseconds to represent microseconds elapsed since the Posix epoch, and
> localOffsetSeconds to represent the local time zone offset. When instantiating
> the time now, a single call to primitiveUtcWithOffset is used to obtain these
> two values atomically as reported by the underlying platform.
>
> There are several advantages to this representation of DateAndTime, the most
> important of which is that its magnitude is unambiguous regardless of daylight
> savings transitions in local time zones.
>
> This is my attempt to address some historical baggage in Squeak. The VM
> reports time related to the local time zone, and the image attempts to
> convert to UTC (sometimes incorrectly). A UTC based representation makes the
> implementation of time zone tables more straightforward (see for example
> the Olson time zone tables in TimeZoneDatabase on SqueakMap).
>
> I am attaching the source code as a SAR file that can be loaded into a fully
> updated Squeak trunk image. The conversion process is slow, so be patient
> if you load it.
>
> This can be run on either an interpreter VM or Cog, but if you use Cog, please
> use a version dated June 2013 or later (the VM in the Squeak 4.5 all-in-one
> is fine).
>
> I am also attaching a copy of LXTestDateAndTimePerformance, which can be
> used to compare the performance of some basic DateAndTime functions.
>
> Performance of the UTC based DateAndTime is generally favorable compared to
> the original. Here is what I see on my system (smaller numbers are better).
>
> LXTestDateAndTimePerformance test results using the original Squeak DateAndTime
> on an interpreter VM:
> {
>         #testNow->10143 .
>         #testEquals->30986 .
>         #testGreaterThan->80199 .
>         #testLessThan->75912 .
>         #testPrintString->10429 .
>         #testStringAsDateAndTime->44657
> }
>
> LXTestDateAndTimePerformance test results using the new UTC based DateAndTime
> on an interpreter VM:
> {
>         #testNow->6423 .
>         #testEquals->31625 .
>         #testGreaterThan->22999 .
>         #testLessThan->18514 .
>         #testPrintString->12502 .
>         #testStringAsDateAndTime->32912
> }
>
> (CC to Brent Pinkney, author of the excellent Squeak Chronology package)
>
> Dave
>
>
>
>
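The storage concern can be quantified. A Python sketch, assuming the 31-bit SmallInteger range of a 32-bit Squeak image: a current utcMicroseconds value needs a LargePositiveInteger, whereas splitting the same UTC magnitude into a day number, seconds, and nanoseconds (roughly the shape of the original jdn/seconds/nanos variables, and one possible answer to the re-optimization question) keeps every field immediate. A split into days and microseconds of the day would not be enough, because a day holds 86,400,000,000 microseconds:

```python
from datetime import datetime, timezone
from typing import Tuple

# Assumption: a 32-bit image, where SmallIntegers are 31-bit signed values.
SMALL_INT_MIN = -(2**30)
SMALL_INT_MAX = 2**30 - 1
MICROS_PER_DAY = 1_000_000 * 60 * 60 * 24  # 86,400,000,000 per day (one jdn)
POSIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def is_small_integer(n: int) -> bool:
    return SMALL_INT_MIN <= n <= SMALL_INT_MAX


def utc_microseconds(moment: datetime) -> int:
    """Exact microseconds since the Posix epoch for a timezone-aware datetime."""
    delta = moment - POSIX_EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def split_fields(micros: int) -> Tuple[int, int, int]:
    """Split a UTC magnitude into (days, seconds of day, nanoseconds)."""
    days, rest = divmod(micros, MICROS_PER_DAY)
    seconds, micros_left = divmod(rest, 1_000_000)
    return days, seconds, micros_left * 1000


micros = utc_microseconds(datetime(2014, 5, 26, 18, 9, tzinfo=timezone.utc))
print(is_small_integer(micros))                                # False
print(all(is_small_integer(f) for f in split_fields(micros)))  # True
print(is_small_integer(micros % MICROS_PER_DAY))               # False
```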


Re: A UTC based implementation of DateAndTime

J. Vuletich (mail lists)
In reply to this post by David T. Lewis

Quoting "David T. Lewis" <[hidden email]>:

> ...
>>
>
> Ian did a build of the Windows VM that should have the necessary support. Try
> the Squeak4.1.2.2612 VM from http://squeakvm.org/win32/.

That VM fails for
        <primitive: 'primitiveUtcWithOffset'>
        <primitive: 240> primUtcMicrosecondClock
        <primitive: 241> primLocalMicrosecondClock
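These primitives are related by a fixed epoch shift and the local offset. A Python sketch of that arithmetic, based on the primitive comments quoted later in the thread (UTC microseconds since the Posix epoch for primitiveUtcWithOffset, since the Smalltalk epoch for the UTC microsecond clock). The function names are hypothetical, and reading primitive 241 as "local rather than UTC microseconds" is an assumption taken from its selector name:

```python
# Seconds from the Smalltalk epoch (1901-01-01) to the Posix epoch (1970-01-01):
# 69 years, 17 of them leap years.
SMALLTALK_TO_POSIX_SECONDS = (69 * 365 + 17) * 86_400  # 2,177,452,800


def utc_micros_since_smalltalk_epoch(posix_micros: int) -> int:
    """What the UTC microsecond clock is described as answering."""
    return posix_micros + SMALLTALK_TO_POSIX_SECONDS * 1_000_000


def local_micros_since_smalltalk_epoch(posix_micros: int, offset_seconds: int) -> int:
    """Assumed meaning of primitive 241: the same instant shifted into local time."""
    return utc_micros_since_smalltalk_epoch(posix_micros) + offset_seconds * 1_000_000
```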

> One thing to note - if the primitive is not present, DateAndTime will fall
> back on the old logic, and it should produce reasonable results.

Yes, Cuis already does that... I'd prefer to be able to assume that  
any future VM will provide <primitive: 'primitiveUtcWithOffset'> and  
clean the code. Would that be asking too much?

>> Additionally, besides getting rid of <primitive: 137>
>> (primLocalSecondsClock, which will overflow in 2037), it would be great
>> to stop using <primitive: 135> (primMillisecondClock), which overflows
>> every six days. But for this, we would need a new <primitive: 136>
>> (primSignal:atMilliseconds:), as it uses the same time base. This
>> would enable a serious simplification of Delay.
>>
>
> I think that Eliot is planning to update Squeak to use the microsecond clock
> primitive, which removes any 2037 issues. I'm not sure if that would include
> a change to primSignal:atMilliseconds:
>
> Dave

I see. But given that the Delay code is fragile and not trivial at  
all, the advantages of relying on a clock that never rolls over
would be significant.

Please, Eliot, consider this when you work on this code.


Cheers,
Juan Vuletich



Re: A UTC based implementation of DateAndTime

Nicolas Cellier
In reply to this post by Chris Muller-3



2014-05-26 20:09 GMT+02:00 Chris Muller <[hidden email]>:
> Hi Dave, as someone who works with large systems in Squeak, I'm always
> interested in _storage efficiency_ as much as execution efficiency.
>
> DateAndTime, in particular, is a very common domain element with a
> high potential for there to be many millions of instances in a given
> domain model.
>
> Apps which have millions of objects with merely a Date attribute can
> canonicalize them.
> And, apps which have millions of Time objects can canonicalize them.
>
> But LargeIntegers are not easy to canonicalize (e.g.,
> utcMicroseconds).  So a database system with millions of DateAndTimes
> would have to do _two_ reads for every DateAndTime instance instead of
> just one today (because SmallIntegers are immediate, while
> LargeIntegers require their own storage buffer).
>
> One thing I really like about the current implementation of
> DateAndTime is how it carefully avoids LargeIntegers by having
> large-grained "platforms" to arrive at the current time.  e.g., each
> 'jdn' is a chunk of (1000000*60*60*24) microseconds.  Your new
> implementation reflects an increase of 86 BILLION utcMicroseconds for
> every 1 jdn.
>
> Small, all-in-memory benchmarks may show faster with the LI, but I'm
> concerned that large-scale apps might be significantly impacted in the
> opposite way.
>
> Would it be possible to re-optimize this part of the representation
> while still maintaining internal UTC representation to solve your
> concern about daylight-savings?
>
> Thanks.

That's more or less the Pharo path.
 

On Sun, May 25, 2014 at 12:48 PM, David T. Lewis <[hidden email]> wrote:
> I have been working on a variation of class DateAndTime that replaces its
> instance variables (seconds offset jdn nanos) with two instance variables,
> utcMicroseconds to represent microseconds elapsed since the Posix epoch, and
> localOffsetSeconds to represent the local time zone offset. When instantiating
> the time now, A single call primitiveUtcWithOffset is used to obtain these
> two values atomically as reported by the underlying platform.
>
> There are several advantages to this representation of DateAndTime, the most
> important of which is that its magnitude is unambiguous regardless of daylight
> savings transitions in local time zones.
>
> This is my attempt to address some historical baggage in Squeak. The VM
> reports time related to the local time zone, and the image attempts to
> convert to UTC (sometimes incorrectly). A UTC based representation makes the
> implementation of time zone tables more straightforward (see for example
> the Olson time zone tables in TimeZoneDatabase on SqueakMap).
>
> I am attaching the source code as a SAR file that can be loaded into a fully
> updated Squeak trunk image. The conversion process is slow, so be patient
> if you load it.
>
> This can be run on either an intepreter VM or Cog, but if you use Cog, please
> use a version dated June 2013 or later (the VM in the Squeak 4.5 all-in-one
> is fine).
>
> I am also attaching a copy of LXTestDateAndTimePerformance, which can be
> used to compare the performance of some basic DateAndTime functions.
>
> Performance of the UTC based DateAndTime is generally favorable compared to
> the original. Here is what I see on my system (smaller numbers are better).
>
> LXTestDateAndTimePerformance test results using the original Squeak DateAndTime
> on an interpreter VM:
> {
>         #testNow->10143 .
>         #testEquals->30986 .
>         #testGreaterThan->80199 .
>         #testLessThan->75912 .
>         #testPrintString->10429 .
>         #testStringAsDateAndTime->44657
> }
>
> LXTestDateAndTimePerformance test results using the new UTC based DateAndTime
> on an interpreter VM:
> {
>         #testNow->6423 .
>         #testEquals->31625 .
>         #testGreaterThan->22999 .
>         #testLessThan->18514 .
>         #testPrintString->12502 .
>         #testStringAsDateAndTime->32912
> }
>
> (CC to Brent Pinkney, author of the excellent Squeak Chronology package)
>
> Dave
>
>
>
>
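To compare the two result sets above side by side, here is a small Python sketch. Python is used only for the arithmetic; the figures are copied from the results above, and the helper name `speedups` is hypothetical. It divides each original figure by the corresponding UTC based figure, so a ratio above 1.0 means the UTC based version was faster in that run.

```python
# Figures copied from the LXTestDateAndTimePerformance results above
# (interpreter VM, smaller is better). Units are whatever the test reports.
original = {
    "testNow": 10143,
    "testEquals": 30986,
    "testGreaterThan": 80199,
    "testLessThan": 75912,
    "testPrintString": 10429,
    "testStringAsDateAndTime": 44657,
}
utc_based = {
    "testNow": 6423,
    "testEquals": 31625,
    "testGreaterThan": 22999,
    "testLessThan": 18514,
    "testPrintString": 12502,
    "testStringAsDateAndTime": 32912,
}


def speedups(before, after):
    """Ratio before/after per test; > 1.0 means the UTC based version is faster."""
    return {name: before[name] / after[name] for name in before}


if __name__ == "__main__":
    for name, ratio in speedups(original, utc_based).items():
        print(f"{name:26s} {ratio:5.2f}x")
```

With these numbers, testGreaterThan and testLessThan come out roughly 3.5x and 4.1x faster, testNow and testStringAsDateAndTime moderately faster, and testEquals and testPrintString slightly slower. These are figures from one system, so small differences may be noise.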





Re: A UTC based implementation of DateAndTime

David T. Lewis
In reply to this post by J. Vuletich (mail lists)
On Mon, May 26, 2014 at 06:16:03PM +0000, J. Vuletich (mail lists) wrote:

>
> Quoting "David T. Lewis" <[hidden email]>:
>
> >...
> >>
> >
> >Ian did a build of the Windows VM that should have the necessary support.
> >Try the Squeak4.1.2.2612 VM from http://squeakvm.org/win32/.
>
> That VM fails for
> <primitive: 'primitiveUtcWithOffset'>
> <primitive: 240> primUtcMicrosecondClock
> <primitive: 241> primLocalMicrosecondClock
>
> >One thing to note - if the primitive is not present, DateAndTime will fall
> >back on the old logic, and it should produce reasonable results.
>
> Yes, Cuis already does that... I'd prefer to be able to assume that
> any future VM will provide <primitive: 'primitiveUtcWithOffset'> and
> clean up the code. Would that be asking too much?

You should expect the following two primitives to be present in all VMs
(comments are from the VMM trunk implementations):

primitiveUtcWithOffset
        "Answer an array with UTC microseconds since the Posix epoch and
        the current seconds offset from GMT in the local time zone. An empty
        two element array may be supplied as a parameter.
        This is a named (not numbered) primitive in the null module (ie the VM)"

primitiveUTCMicrosecondClock
        "Answer the UTC microseconds since the Smalltalk epoch. The value is
        derived from the Posix epoch (see primitiveUTCMicrosecondClock) with a
        constant offset corresponding to elapsed microseconds between the two
        epochs according to RFC 868."
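The two primitive comments above describe two time bases: microseconds since the Posix epoch, and microseconds since the Smalltalk epoch derived from it by a constant offset. As a rough Python sketch, assuming a Smalltalk epoch of 1901-01-01 00:00 UTC (as in Squeak) and the 1900-01-01 epoch of RFC 868, the offset and the interpretation of a [utcMicroseconds, offsetSeconds] pair look like this. The function names are hypothetical, and the -10800 offset in the usage is just an example value.

```python
from datetime import datetime, timedelta, timezone

# Assumed epochs: Smalltalk 1901-01-01 UTC, RFC 868 1900-01-01 UTC.
POSIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SMALLTALK_EPOCH = datetime(1901, 1, 1, tzinfo=timezone.utc)
RFC868_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def epoch_offset_seconds(epoch):
    """Whole seconds from the given epoch to the Posix epoch."""
    return int((POSIX_EPOCH - epoch).total_seconds())


SMALLTALK_OFFSET_US = epoch_offset_seconds(SMALLTALK_EPOCH) * 1_000_000


def posix_to_smalltalk_us(utc_us):
    """Shift Posix-epoch microseconds to Smalltalk-epoch microseconds."""
    return utc_us + SMALLTALK_OFFSET_US


def local_datetime(utc_with_offset):
    """Interpret a two-element [utcMicroseconds, offsetSeconds] pair, in the
    shape primitiveUtcWithOffset is documented to answer, as local time."""
    utc_us, offset_s = utc_with_offset
    tz = timezone(timedelta(seconds=offset_s))
    return (POSIX_EPOCH + timedelta(microseconds=utc_us)).astimezone(tz)


if __name__ == "__main__":
    print(epoch_offset_seconds(SMALLTALK_EPOCH))
    print(local_datetime([1401062400 * 10**6, -10800]))
```

The UTC value and the offset stay separate until display time, which is what makes the magnitude of a DateAndTime independent of the local zone.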

Dave


>
> >>Additionally, besides getting rid of <primitive: 137>
> >>(primLocalSecondsClock, which will overflow in 2037), it would be great
> >>to stop using <primitive: 135> (primMillisecondClock), which overflows
> >>every six days. But for this, we would need a new <primitive: 136>
> >>(primSignal:atMilliseconds:), as it uses the same time base. This
> >>would enable a serious simplification of Delay.
> >>
> >
> >I think that Eliot is planning to update Squeak to use the microsecond
> >clock primitive, which removes any 2037 issues. I'm not sure if that would
> >include a change to primSignal:atMilliseconds:
> >
> >Dave
>
> I see. But given that the Delay code is fragile and not trivial at
> all, the advantages of relying on a clock that never rolls over
> would be significant.
>
> Please, Eliot, consider this when you work on this code.
>
>
> Cheers,
> Juan Vuletich
>
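The overflow dates mentioned above can be checked with a little arithmetic. This Python sketch assumes, without confirming it from the VM sources, that primitive 137 answers an unsigned 32-bit count of seconds since 1901-01-01, and that the millisecond clock is masked to 29 bits so it stays a SmallInteger on a 32-bit image. `elapsed_ms` is a hypothetical helper showing the modular arithmetic that a wrapping clock forces on code such as Delay.

```python
from datetime import datetime, timedelta

# Assumptions: unsigned 32-bit seconds clock since 1901-01-01,
# millisecond clock masked to 29 bits.
SMALLTALK_EPOCH = datetime(1901, 1, 1)
SECONDS_CLOCK_BITS = 32
MILLISECOND_CLOCK_BITS = 29


def seconds_clock_rollover():
    """First instant that no longer fits in the seconds clock."""
    return SMALLTALK_EPOCH + timedelta(seconds=2 ** SECONDS_CLOCK_BITS)


def millisecond_clock_period_days():
    """How long the masked millisecond clock runs before wrapping to zero."""
    return 2 ** MILLISECOND_CLOCK_BITS / (1000 * 60 * 60 * 24)


def elapsed_ms(start, end, bits=MILLISECOND_CLOCK_BITS):
    """Elapsed time between two readings of a wrapping clock.
    Only correct if the true interval is shorter than one full period."""
    return (end - start) % (2 ** bits)


if __name__ == "__main__":
    print(seconds_clock_rollover())
    print(round(millisecond_clock_period_days(), 2))
```

Under those assumptions the seconds clock runs out on 2037-02-06 and the millisecond clock wraps about every 6.2 days, consistent with the "2037" and "every six days" figures in the thread. A clock that never rolls over would make the `elapsed_ms` style of bookkeeping unnecessary.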


re: Please use new threads for new threads.

Bert Freudenberg
In reply to this post by Chris Muller-3
On 2014-05-26, at 19:31, Chris Muller <[hidden email]> wrote:

>
>> Sometimes a reply has nothing to do with the
>> message to which the responder is replying, and the subject line is
>> totally different. Sometimes a reply is actually a response, and the
>> subject line might be the same ("hyperbole is great!"), somewhat
>> different ("perhaps communication is better [was: 'hyperbole is
>> great!']"), or totally different. Using simple message ID references and
>> the conventions of normal conversation, without turning it into an AI
>> project, seems the best we can manage.
>
> I totally agree, in principle!  If there's a graph of message IDs,
> the mail clients ought to make use of it over Stringy matching!  Just
> like I wish Eliot and Bert would use the graph of ancestry in
> Monticello rather than stringy name matching for "branches"..   :)
In case this is not just a meant-to-be-funny remark, you should start a new thread for that topic.

For my part I can't see much of an analogy here.

- Bert -





Re: A UTC based implementation of DateAndTime

David T. Lewis
In reply to this post by Chris Muller-3
On Mon, May 26, 2014 at 01:09:06PM -0500, Chris Muller wrote:

> Hi Dave, as someone who works with large systems in Squeak, I'm always
> interested in _storage efficiency_ as much as execution efficiency.
>
> DateAndTime, in particular, is a very common domain element with a
> high potential for there to be many millions of instances in a given
> domain model.
>
> Apps which have millions of objects with merely a Date attribute can
> canonicalize them.
> And, apps which have millions of Time objects can canonicalize them.
>
> But LargeIntegers are not easy to canonicalize (e.g.,
> utcMicroseconds).  So a database system with millions of DateAndTimes
> would have to do _two_ reads for every DateAndTime instance instead of
> just one today (because SmallIntegers are immediate, while
> LargeIntegers require their own storage buffer).
>
> One thing I really like about the current implementation of
> DateAndTime is how it carefully avoids LargeIntegers by having
> large-grained "platforms" to arrive at the current time.  e.g., each
> 'jdn' is a chunk of (1000000*60*60*24) microseconds.  Your new
> implementation reflects an increase of 86 BILLION utcMicroseconds for
> every 1 jdn.

Understood. But to clarify: the name "utcMicroseconds" reflects only the
precision of the time scale; it is not meant to imply what kind of number
is used to represent it. In fact, for a DateAndTime with nanosecond precision,
utcMicroseconds will typically be a Fraction rather than a LargeInteger. But
microsecond precision is what the primitives currently report, so
these values are LargeIntegers relative to the Posix epoch.

For saving to a database, you could certainly shift the time origin
and/or limit the precision of the time representation. That's more or
less what the current jdn/seconds/nanos representation does.
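To put numbers on the magnitude question, the following Python sketch computes utcMicroseconds for 2014-05-26 00:00 UTC, shows that it is far outside SmallInteger range, and splits it into day/second/microsecond components in the spirit of the jdn/seconds/nanos layout. The function names are hypothetical, the SmallInteger limit assumes a 32-bit image where SmallInteger maxVal is 2^30 - 1, and the day count is relative to 1970 rather than a true Julian day number. It also shows how nanosecond precision turns a microsecond count into a Fraction.

```python
from datetime import datetime, timezone
from fractions import Fraction

MICROS_PER_DAY = 1_000_000 * 60 * 60 * 24   # 86.4 billion microseconds per day
SMALLINT_MAX_32 = 2 ** 30 - 1               # assumed 32-bit SmallInteger maxVal


def posix_micros(dt):
    """Exact microseconds since the Posix epoch for an aware datetime."""
    delta = dt - datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def split_like_jdn(utc_us):
    """Split into (days since 1970, seconds in day, microseconds in second);
    each component stays small, like the jdn/seconds/nanos variables."""
    days, rest = divmod(utc_us, MICROS_PER_DAY)
    seconds, micros = divmod(rest, 1_000_000)
    return days, seconds, micros


def micros_from_nanos(posix_ns):
    """Microseconds carrying nanosecond precision; a Fraction unless the
    nanosecond count is a multiple of 1000."""
    return Fraction(posix_ns, 1000)


if __name__ == "__main__":
    us = posix_micros(datetime(2014, 5, 26, tzinfo=timezone.utc))
    print(us, us.bit_length(), split_like_jdn(us))
```

The 2014 value needs 51 bits, which fits in the two 32-bit words of a LargePositiveInteger, while each component of the split form is small enough to be a SmallInteger.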

>
> Small, all-in-memory benchmarks may show faster with the LI, but I'm
> concerned that large-scale apps might be significantly impacted in the
> opposite way..
>
> Would it be possible to re-optimize this part of the representation
> while still maintaining internal UTC representation to solve your
> concern about daylight-savings?

Sure, but just to clarify: This is not something that I am proposing for Squeak
trunk.  It is a follow up project to my TimeZoneDatabase that I have been meaning
to do for the last 15 years. I finally got around to trying it, so I figured I'd
go ahead and publish the code :)

Dave


>
> Thanks.
>


re: Please use new threads for new threads.

Chris Muller-3
In reply to this post by Bert Freudenberg
>> I totally agree, in principle!  If there's a graph of message IDs,
>> the mail clients ought to make use of it over Stringy matching!  Just
>> like I wish Eliot and Bert would use the graph of ancestry in
>> Monticello rather than stringy name matching for "branches"..   :)
>
> In case this is not just a meant-to-be-funny remark, you should start a new thread for that topic.
>
> For my part I can't see much of an analogy here.

The "In-Reply-To:" hierarchy is to the MCVersionInfo hierarchy as the
Subject-Line of an email is to an expanded MCVersionName of saved
package versions.

Each is a loosey-goosey String-matching strategy replacing a hard,
UUID reference hierarchy.


Re: A UTC based implementation of DateAndTime

David T. Lewis
In reply to this post by Chris Muller-3
On Mon, May 26, 2014 at 01:09:06PM -0500, Chris Muller wrote:

> Hi Dave, as someone who works with large systems in Squeak, I'm always
> interested in _storage efficiency_ as much as execution efficiency.
>
> DateAndTime, in particular, is a very common domain element with a
> high potential for there to be many millions of instances in a given
> domain model.
>
> Apps which have millions of objects with merely a Date attribute can
> canonicalize them.
> And, apps which have millions of Time objects can canonicalize them.
>
> But LargeIntegers are not easy to canonicalize (e.g.,
> utcMicroseconds).  So a database system with millions of DateAndTimes
> would have to do _two_ reads for every DateAndTime instance instead of
> just one today (because SmallIntegers are immediate, while
> LargeIntegers require their own storage buffer).

Hi Chris,

I do not have a lot of experience with database systems, so I would
like to better understand the issue for storage of large numeric values.

I was under the impression that modern SQL databases provide direct
support for large integer data types (e.g. bigint for SQL server), and my
assumption was that object databases such as Magma or GemStone would
make this a non-issue. Why is it that a large (64 bit) integer should
be any more or less difficult to persist than a small integer?

This may be a dumb question but I am curious.

Thanks,
Dave



Re: A UTC based implementation of DateAndTime

Chris Muller-4
The issue actually relates purely to Squeak domain models.  Consider
the case of an all-in-memory object model in Squeak, with no database
involved at all.  It is very feasible an app would want to import a
flat-file dataset that involves creating a few million DateAndTime
instances (along with other objects, of course) to the point where
memory constraints begin to be noticed.

When dealing with this level of proliferation potential for a
particular class, and for such a base data type that we don't want to
endure changing again, I want us to strongly scrutinize the internal
representation.

In this case, the use of 'utcMicroseconds' introduces a lot of
duplicate bit patterns in memory that are very hard, if not
impossible, to share.

The simplest case is two equivalent instances of DateAndTime (read
from separate files).  Despite being equivalent, their
'utcMicroseconds' will be separate objects, each consuming separate
memory space.  There is no easy way to share the same
'utcMicroseconds' instance between them.

But fully equivalent DateAndTimes are not even half of the concern:
the high-order bits of every DateAndTime's 'utcMicroseconds'
duplicate the same bit pattern, again and again, eating up memory.

That doesn't happen when the internal representations are, or can be,
canonicalized, as in the case of using SmallIntegers.  Yes, Brent's
original representation requires two additional slots per instance,
but the _contents_ of those slots are SmallIntegers -- shared memory.
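Canonicalizing, as described above, means keeping one shared object per distinct value. A generic Python sketch of the idea follows; the `Canonicalizer` class is hypothetical and is not how Squeak or any particular database does it. It also illustrates the limitation raised here: only equal values can be shared, so distinct timestamps still get their own objects.

```python
class Canonicalizer:
    """Hypothetical interning table: equal values map to one shared object."""

    def __init__(self):
        self._table = {}

    def intern(self, value):
        # setdefault stores `value` the first time it is seen and afterwards
        # returns that first stored object for any equal value.
        return self._table.setdefault(value, value)

    def __len__(self):
        return len(self._table)


if __name__ == "__main__":
    canon = Canonicalizer()
    # Parsed separately, as if read from two different files.
    a = int("1401062400000000")
    b = int("1401062400000000")
    print(canon.intern(a) is canon.intern(b), len(canon))
```

Every distinct timestamp still adds one entry, so the table only helps when many values repeat exactly.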


On Mon, May 26, 2014 at 8:29 PM, David T. Lewis <[hidden email]> wrote:

> On Mon, May 26, 2014 at 01:09:06PM -0500, Chris Muller wrote:
>> Hi Dave, as someone who works with large systems in Squeak, I'm always
>> interested in _storage efficiency_ as much as execution efficiency.
>>
>> DateAndTime, in particular, is a very common domain element with a
>> high potential for there to be many millions of instances in a given
>> domain model.
>>
>> Apps which have millions of objects with merely a Date attribute can
>> canonicalize them.
>> And, apps which have millions of Time objects can canonicalize them.
>>
>> But LargeIntegers are not easy to canonicalize (e.g.,
>> utcMicroseconds).  So a database system with millions of DateAndTimes
>> would have to do _two_ reads for every DateAndTime instance instead of
>> just one today (because SmallIntegers are immediate, while
>> LargeIntegers require their own storage buffer).
>
> Hi Chris,
>
> I do not have a lot of experience with database systems, so I would
> like to better understand the issue for storage of large numeric values.
>
> I was under the impression that modern SQL databases provide direct
> support for large integer data types (e.g. bigint for SQL server), and my
> assumption was that object databases such as Magma or GemStone would
> make this a non-issue. Why is it that a large (64 bit) integer should
> be any more or less difficult to persist than a small integer?
>
> This may be a dumb question but I am curious.
>
> Thanks,
> Dave
>


Re: A UTC based implementation of DateAndTime

Nicolas Cellier



2014-05-27 4:30 GMT+02:00 Chris Muller <[hidden email]>:
> The issue actually relates purely to Squeak domain models.  Consider
> the case of an all-in-memory object model in Squeak, with no database
> involved at all.  It is very feasible an app would want to import a
> flat-file dataset that involves creating a few million DateAndTime
> instances (along with other objects, of course) to the point where
> memory constraints begin to be noticed.
>
> When dealing with this level of proliferation potential for a
> particular class, and for such a base data type that we don't want to
> endure changing again, I want us to strongly scrutinize the internal
> representation.
>
> In this case, the use of 'utcMicroseconds' introduces a lot of
> duplicate bit patterns in memory that are very hard, if not
> impossible, to share.
>
> The simplest case is two equivalent instances of DateAndTime (read
> from separate files).  Despite being equivalent, their
> 'utcMicroseconds' will be separate objects, each consuming separate
> memory space.  There is no easy way to share the same
> 'utcMicroseconds' instance between them.
>
> But fully equivalent DateAndTimes are not even half of the concern:
> the high-order bits of every DateAndTime's 'utcMicroseconds'
> duplicate the same bit pattern, again and again, eating up memory.
>
> That doesn't happen when the internal representations are, or can be,
> canonicalized, as in the case of using SmallIntegers.  Yes, Brent's
> original representation requires two additional slots per instance,
> but the _contents_ of those slots are SmallIntegers -- shared memory.


Well, in the current 32-bit image format, SmallIntegers are not exactly shared; they are immediate values.
Each consumes exactly 32 bits.

For a compact class like LargePosOrNegInteger, I don't remember exactly what the header size is, but you get 64 bits for data, so I would be surprised to see a major difference with respect to consumed memory.
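The "immediate value" point can be sketched as follows. This Python model assumes the 32-bit V3 object format, in which an object pointer with its low bit set is a SmallInteger carrying a 31-bit signed value (range -2^30 to 2^30 - 1); anything outside that range needs a separate LargePositiveInteger or LargeNegativeInteger object. The `encode`/`decode` helpers are hypothetical.

```python
WORD_BITS = 32
WORD_MASK = 2 ** WORD_BITS - 1
SMALLINT_MIN = -(2 ** 30)
SMALLINT_MAX = 2 ** 30 - 1


def is_small_int(value):
    """True if the value fits in an immediate (tagged) SmallInteger."""
    return SMALLINT_MIN <= value <= SMALLINT_MAX


def encode(value):
    """Tagged 32-bit word: value shifted left one bit, low bit set as the tag."""
    if not is_small_int(value):
        raise OverflowError(f"{value} needs a LargePositive/NegativeInteger object")
    return ((value << 1) | 1) & WORD_MASK


def decode(word):
    """Recover the signed 31-bit value from a tagged word."""
    if not word & 1:
        raise ValueError("low bit clear: this word is an object pointer")
    value = word >> 1
    # Reinterpret the 31-bit field as two's complement.
    return value - 2 ** 31 if value >= 2 ** 30 else value


if __name__ == "__main__":
    print(hex(encode(3)), decode(encode(-1)))
```

Because the value lives in the pointer word itself, two slots holding the same SmallInteger cost one word each, with nothing further to share.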

Nicolas







Re: A UTC based implementation of DateAndTime

David T. Lewis
On Tue, May 27, 2014 at 09:55:33PM +0200, Nicolas Cellier wrote:

> 2014-05-27 4:30 GMT+02:00 Chris Muller <[hidden email]>:
>
> > The issue actually relates purely to Squeak domain models.  Consider
> > the case of an all-in-memory object model in Squeak, with no database
> > involved at all.  It is very feasible an app would want to import a
> > flat-file dataset that involves creating a few million DateAndTime
> > instances (along with other objects, of course) to the point where
> > memory constraints begin to be noticed.
> >
> > When dealing with this level of proliferation potential for a
> > particular class, and for such a base data type that we don't want to
> > endure changing again, I want us to strongly scrutinize the internal
> > representation.
> >
> > In this case, the use of 'utcMicroseconds' introduces a lot of
> > duplicate bit patterns in memory that are very hard, if not
> > impossible, to share.
> >
> > The simplest case is two equivalent instances of DateAndTime (read
> > from separate files).  Despite being equivalent, their
> > 'utcMicroseconds' will be separate objects, each consuming separate
> > memory space.  There is no easy way to share the same
> > 'utcMicroseconds' instance between them.
> >
> > But fully equivalent DateAndTimes are not even half of the concern:
> > the high-order bits of every DateAndTime's 'utcMicroseconds'
> > duplicate the same bit pattern, again and again, eating up memory.
> >
> > That doesn't happen when the internal representations are, or can be,
> > canonicalized, as in the case of using SmallIntegers.  Yes, Brent's
> > original representation requires two additional slots per instance,
> > but the _contents_ of those slots are SmallIntegers -- shared memory.
> >
> >
> Well, in the current 32-bit image format, SmallIntegers are not exactly shared;
> they are immediate values.
> Each consumes exactly 32 bits.
>
> For a compact class like LargePosOrNegInteger, I don't remember exactly what the
> header size is, but you get 64 bits for data, so I would be surprised to
> see a major difference with respect to consumed memory.
>

Smalltalk compactClassesArray includes: DateAndTime ==> false
Smalltalk compactClassesArray includes: LargePositiveInteger ==> true

So for the traditional DateAndTime implementation, an instance requires:

  2 words of header (64 bits)
  3 words for the small integer jdn/seconds/nanos variables
  1 word for the pointer to the offset object, which is an instance of Duration

In practice, most instances of DateAndTime within an image will share the
same offset object, so for purposes of estimation assume that this takes
no extra space.

Thus each instance requires 6 words of space in the object memory (maybe a bit
more on average if the DateAndTime instances are not sharing the same Duration
instance for one reason or another).

For the UTC based implementation of DateAndTime, each instance requires:

  2 words of header
  1 word for the small integer localOffsetSeconds variable
  1 word for the pointer to the LargePositiveInteger representing utcMicroseconds
  1 word of header for the large positive integer
  2 words of data for the value of the large positive integer

Thus each instance requires 7 words of space in the object memory.

So there is a difference, but it would probably not have a large effect on
overall space utilization, even assuming complete sharing of the offset
Duration instances.
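The word counts above can be scaled up to the "few million instances" case raised earlier in the thread. This Python sketch simply transcribes the two layouts, assuming 4-byte words and ignoring any other per-object overhead; the names are hypothetical.

```python
WORD_BYTES = 4  # 32-bit image format, as in the estimate above

# Words per instance, transcribed from the breakdown above.
ORIGINAL_LAYOUT = {
    "object header (non-compact class)": 2,
    "jdn, seconds, nanos (SmallIntegers)": 3,
    "pointer to shared offset Duration": 1,
}
UTC_LAYOUT = {
    "object header (non-compact class)": 2,
    "localOffsetSeconds (SmallInteger)": 1,
    "pointer to utcMicroseconds": 1,
    "LargePositiveInteger header (compact class)": 1,
    "LargePositiveInteger data (8 bytes)": 2,
}


def words(layout):
    """Total words per instance for a layout."""
    return sum(layout.values())


def megabytes(layout, instances):
    """Decimal megabytes used by `instances` objects with this layout."""
    return words(layout) * WORD_BYTES * instances / 1_000_000


if __name__ == "__main__":
    for name, layout in (("original", ORIGINAL_LAYOUT), ("UTC based", UTC_LAYOUT)):
        print(name, words(layout), "words,", megabytes(layout, 5_000_000), "MB")
```

For five million instances that is about 120 MB versus 140 MB, i.e. one extra word (about 17%) per instance under these assumptions.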

Dave

