current-posix-second is a disastrous mistake

current-posix-second is a disastrous mistake Taylor R Campbell 10 Dec 2010 03:28 UTC

At <http://trac.sacrideo.us/wg/wiki/TimeCowan> is a proposal for a
procedure called CURRENT-POSIX-SECOND that returns the number of
seconds that have elapsed since 1970-01-01T00:00:00Z[*], *minus* the
number of those seconds that were leap seconds in UTC.  In the terms
of POSIX, it returns the current POSIX time.

This is a disastrous mistake.

POSIX corrupts the clock seen by programs.  POSIX time is one of two
things: either

(a) a system for naming second-duration intervals on the time line
    delimited by TAI ticks, that lacks names for some seconds
    (twenty-four of them, as of now) and that may have names for
    seconds that don't exist (none yet, as of now); or

(b) a system for naming variable-duration intervals on the time line.

Whichever interpretation one takes, POSIX time behaves extremely
badly.

It requires implementations, even those with extremely accurate local
clocks, either to watch updates to the leap second table, or to count
time inconsistently from systems that do.

It causes some pairs of calls to time, gettimeofday, or clock_gettime,
separated by an interval of more than one SI second, to return the
same (integral part of an) answer, even with extremely accurate clocks
and nobody touching adjtime, settimeofday, or clock_settime.
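
The collision can be checked directly from the formula that POSIX gives
for `Seconds Since the Epoch'.  The following is a minimal sketch in C
that only transcribes that formula; the helper name posix_seconds is
made up for illustration, and its arguments follow the struct tm
conventions (tm_year counts from 1900, tm_yday from January 1).

```c
#include <assert.h>

/* POSIX "Seconds Since the Epoch" for a broken-down UTC time.
   posix_seconds is a hypothetical helper, not a POSIX function. */
long long
posix_seconds(int tm_year, int tm_yday, int tm_hour, int tm_min, int tm_sec)
{
    assert(tm_year >= 70);  /* keeps the integer divisions non-negative */
    return tm_sec + tm_min * 60LL + tm_hour * 3600LL + tm_yday * 86400LL
        + (tm_year - 70) * 31536000LL
        + ((tm_year - 69) / 4) * 86400LL
        - ((tm_year - 1) / 100) * 86400LL
        + ((tm_year + 299) / 400) * 86400LL;
}
```

With this, posix_seconds(108, 365, 23, 59, 60) and
posix_seconds(109, 0, 0, 0, 0), that is, 2008-12-31T23:59:60Z and
2009-01-01T00:00:00Z, both come out as 1230768000: two distinct seconds
share one name.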

The proposal claims that `there is about a 1 in 10^-8 probability that
a computation of elapsed time made by calling this procedure twice
will be off by 1.'  This language suggests that there is some random
chance involved here.  But there isn't: leap seconds aren't drawn
uniformly at random from time.  Instead, in a network of POSIX agents
with reasonably accurate and well-synchronized clocks, every agent
will observe an erratic clock simultaneously, once every few years.

Leap seconds are a calendrical issue, not a timing issue.  Our clocks
don't rewind by a day at the end of February 28th in a leap year.
They continue to tick forward, second by second.  Instead, our
calendars display February 29th.  UTC doesn't rewind by a second just
before a leap second.  Instead, it calls the leap second the sixty-
first second of that minute.
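
As a rough illustration of treating the leap second as a naming matter,
here is a sketch in C that labels a second within one UTC day.  The
function name utc_time_of_day and its ends_in_leap_second flag are
assumptions for the example; a real implementation would look the flag
up in a leap second table.

```c
#include <assert.h>
#include <stdio.h>

/* Write "HH:MM:SS" for the given second of a UTC day into buf.
   A day that ends in a positive leap second has 86401 seconds, and
   its last second is labelled 23:59:60.  Returns 0 on success, -1 if
   second_of_day does not exist on that day. */
int
utc_time_of_day(long second_of_day, int ends_in_leap_second,
    char *buf, size_t len)
{
    long day_length = ends_in_leap_second ? 86401 : 86400;

    assert(buf != NULL && len >= 9);
    if (second_of_day < 0 || second_of_day >= day_length)
        return -1;
    if (second_of_day == 86400) {
        /* The leap second: the sixty-first second of 23:59. */
        snprintf(buf, len, "23:59:60");
        return 0;
    }
    snprintf(buf, len, "%02ld:%02ld:%02ld", second_of_day / 3600,
        (second_of_day / 60) % 60, second_of_day % 60);
    return 0;
}
```

The count of seconds never goes backward or repeats here; only the
label for the extra second is unusual.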

Programs dealing with timing, rather than with calendars, don't care
about leap seconds.  Giving them a clock corrupted by subtracting the
number of leap seconds either breaks natural assumptions badly or
requires extra work to cover up the corruption.  Either way, it wastes
operator and programmer time, costs program complexity, and adds code
paths that are hit dangerously seldom, only once every few years.
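
To give an idea of the extra work, here is a sketch in C of what a
timing program must do to recover an elapsed duration from two POSIX
timestamps.  The names leap_table and si_elapsed are hypothetical, and
the table lists only two leap seconds for illustration; a real one
would carry every entry and need updating.

```c
#include <assert.h>
#include <stddef.h>

/* POSIX timestamps of the seconds that immediately follow a positive
   leap second.  Illustrative subset only. */
static const long long leap_table[] = {
    1136073600LL,   /* 2006-01-01T00:00:00Z, after 2005-12-31T23:59:60Z */
    1230768000LL,   /* 2009-01-01T00:00:00Z, after 2008-12-31T23:59:60Z */
};

/* SI seconds elapsed between POSIX times a and b, with a <= b.
   A timestamp equal to a table entry is ambiguous, since it names both
   the leap second and its successor; this sketch takes it to mean the
   successor. */
long long
si_elapsed(long long a, long long b)
{
    long long leaps = 0;
    size_t i;

    assert(a <= b);
    for (i = 0; i < sizeof leap_table / sizeof leap_table[0]; i++)
        if (a < leap_table[i] && leap_table[i] <= b)
            leaps++;
    return (b - a) + leaps;
}
```

From 2008-12-31T23:59:59Z to 2009-01-01T00:00:01Z the plain difference
of POSIX times is 2, while si_elapsed(1230767999, 1230768001) gives the
3 seconds that actually passed.  The branch that increments leaps is
exactly the kind of code path that runs once every few years.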

Programs dealing with calendars, and displaying or interpreting time
in civil formats, need to be aware of leap seconds, in order, for
example, to interpret the text `2008-12-31T23:59:60Z' in the ISO 8601
format; and they need to be aware of time zones, and daylight saving
time rules, and so on.  Giving them a corrupted clock and system for
naming seconds makes them fail to interoperate with the real world no
matter how up-to-date their leap second tables and time zone databases
are.

For example, the BSD date utility misrepresents UTC:

% date -u -j +%Y-%m-%dT%H:%M:%SZ 200812312359.60
2009-01-01T00:00:00Z

The GNU date utility fails to interpret UTC:

% date -u +%Y-%m-%dT%H:%M:%SZ -d '2008-12-31 23:59:59'
2008-12-31T23:59:59Z
% date -u +%Y-%m-%dT%H:%M:%SZ -d '2008-12-31 23:59:60'
date: invalid date `2008-12-31 23:59:60'

POSIX time -- not the leap second -- has extremely serious detrimental
real-world consequences.  POSIX corrupts the clock seen by programs.
Don't do the same for Scheme.  Count the number of seconds since an
epoch -- don't corrupt the count by subtracting the number that had
unusual names in UTC.

[*] Strictly speaking, it is not the number of seconds that have
    elapsed since 1970-01-01T00:00:00Z, but the number of seconds that
    have elapsed since 1972-01-01T00:00:00Z plus 63072000, since the
    modern definition of UTC did not start until 1972.

P.S.  `But how do I get an uncorrupted clock in POSIX?', you ask.
Well, you don't: POSIX corrupts the clock seen by programs.

Fortunately, many popular Unix systems synchronize their clocks with
the NTP Project's ntpd and provide some extra-POSIX system calls to
support it, notably ntp_gettime.  The fragment

   #include <time.h>       /* clock_gettime, struct timespec */

   struct timespec ts;
   if (0 != clock_gettime(CLOCK_REALTIME, &ts)) /* fail */;

stores in ts the number of seconds since 1972-01-01T00:00:00Z plus
63072000 minus the number of those seconds that were leap seconds.
The replacement fragment in the NTPv4 API, not very much longer,

   #include <sys/time.h>   /* timespecadd (a BSD macro) */
   #include <sys/timex.h>  /* ntp_gettime, struct ntptimeval */

   struct timespec ts, leaps;
   struct ntptimeval ntv;
   if (0 > ntp_gettime(&ntv)) /* fail */;
   /* ntv.tai is the current TAI - UTC offset.  TAI's 1972-01-01
      00:00:10 is 1972-01-01T00:00:00Z, the modern UTC epoch.
      Hence ntv.tai - 10 is the number of leap seconds since
      1972-01-01T00:00:00Z.  */
   leaps.tv_sec = ntv.tai - 10;
   leaps.tv_nsec = 0;
   timespecadd(&ntv.time, &leaps, &ts);

stores in ts the number of seconds since 1972-01-01T00:00:00Z plus
63072000.

This works only if someone such as the local ntpd informs the system
about the leap second table, of course.  If not, or if your system has
only a POSIX clock and no ntp_gettime, then what you have is a
corrupted clock, which may as well be a clock in error by several
dozens of seconds, and you had better be prepared for erratic clock
behaviour such as rewinding within that margin.  If being off by 24
bothers you but not enough to fix your operating system, you could add
24 to the number of seconds you get from time, gettimeofday, or
clock_gettime.
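
A sketch of that last resort in C follows.  The macro LEAPS_SINCE_1972
and the function add_leap_seconds are names invented for this example,
and the constant 24 is the count as of this message; it goes stale the
moment another leap second is inserted.

```c
#include <assert.h>
#include <time.h>

/* Leap seconds since 1972-01-01T00:00:00Z as of late 2010.  This is a
   hard-coded assumption, not something the system tells us. */
#define LEAPS_SINCE_1972 24

/* Add the subtracted leap seconds back to a POSIX timestamp. */
struct timespec
add_leap_seconds(struct timespec posix)
{
    assert(posix.tv_nsec >= 0 && posix.tv_nsec < 1000000000L);
    posix.tv_sec += LEAPS_SINCE_1972;
    return posix;
}
```

One would fill a struct timespec with clock_gettime(CLOCK_REALTIME, &ts)
and then pass it through add_leap_seconds.  This fixes only the constant
offset; the result still misbehaves around any future leap second.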