NetBSD 6.1.5 doesn't have ldexpl in math.h

Fri Sep 15 01:26:46 UTC 2017

Fred Wright via devel <devel at ntpsec.org>:
> IMO, if a proper cost-benefit analysis of the use of long doubles in the
> NTP context were conducted, it would result in a resounding thumbs down.

Thank you, Fred.  I found your contribution measured and valuable even
though I'm not certain I understood all of the issues you were
raising. :-)

I'm not wedded to using long doubles. It was a direct response to this issue
report from gemiller:

    https://gitlab.com/NTPsec/ntpsec/issues/270

    The new, reverted code, in step_systime() has a loss of precision:

	   fp_sys = dtolfp(sys_residual);
	   fp_ofs = dtolfp(step);
	   fp_ofs += fp_sys;

    sys_residual and step are double and only have 53 bits of
    precision. But the l_fp needs 64 bits of precision, arguably 65 bits
    after 2026. Initial steps may be large, such as when a host has no
    valid RTC and thinks the current time is 1Jan70.  The C standard does
    not specify the precision of a double. C99 Annex F makes IEEE754
    compliance optional and very few C compilers are fully IEEE754
    compliant. C doubles may be 24 bits, 53 bits, or something
    else. Only rarely would a C double be able to hold a 65 bit number
    without loss of precision.

    Best to avoid any doubt about precision and perform all the
    computations as long double or better as timespec(64). The fix might
    be increasing the precision of sys_residual and step before calling
    step_systime().

    timespec(64) is my notation for a timespec containing a time64_t
    tv_sec and long tv_nsec.

    The replaced code used timespec(64) on 64 bit binaries and thus worked
    well past 2200.
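
The precision loss described in the report can be reproduced in a few
lines. What follows is a minimal sketch, not NTPsec code: it assumes
l_fp can be modeled as a 64-bit unsigned 32.32 fixed-point value, and
fixed64, FRAC_SCALE and roundtrip_double() are hypothetical names
invented for the illustration. It also assumes the common case of a
double with a 53-bit significand.

```c
#include <stdint.h>

/* Hypothetical stand-in for l_fp: 32 bits of seconds, 32 bits of fraction. */
typedef uint64_t fixed64;

#define FRAC_SCALE 4294967296.0   /* 2^32, exactly representable as a double */

/*
 * Convert a fixed-point time to seconds as a double and back again.
 * The volatile store forces the intermediate value to be rounded to
 * double rather than kept in a wider register.
 */
static fixed64 roundtrip_double(fixed64 v)
{
    volatile double seconds = (double)v / FRAC_SCALE;
    return (fixed64)(seconds * FRAC_SCALE);
}
```

With a 53-bit significand, a value such as 0xD000000000000001 (a large
seconds field plus one unit in the last place of the fraction) comes
back as 0xD000000000000000, while a small value such as
0x0000000100000001 survives the round trip unchanged.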

When I posted the fix I wrote this:

    There are a couple of different issues tangled together
    here. Let's do proper separation of concerns before trying
    anything risky.

    As a first step, I've addressed the concern about loss of
    precision in what I think is a simpler way than changing the
    argument signature of step_systime() away from using a float type
    (that might be a good idea but it's a separate discussion).

    Since the underlying problem seems to be that step and
    sys_residual have a float type that doesn't fully cover the range
    of l_fp, I've fixed that. There's now a doubletime_t typedef that
    is long double and thus a minimum of 80 bits (except under
    Microsoft C but who cares). This easily handles the full range of
    l_fp. I've tweaked all the appropriate type converters and tried
    to use doubletime_t everywhere that the full precision of an l_fp
    is required.

    Please review this change carefully, hunting for any places I
    might have missed where double variables need to become
    doubletime_t. My goal is for all the floating-point time
    operations requiring that full range to use this type.

    Pivoting is a separate concern - there be dragons at that edge of
    the map. We have a note about that in devel/TODO so it doesn't
    need to be tracked by this issue, which I want to get closed for
    1.0. Please reopen this if you find any changes required for the
    doubletime_t cleanup.
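
Whether a long double really covers the full range of l_fp on a given
platform can be checked rather than assumed. The sketch below is an
illustration only: the typedef name doubletime_t comes from the comment
above, but its definition here, the 32.32 fixed-point model of l_fp,
and the helpers doubletime_covers_lfp(), fixed_to_dt() and
dt_to_fixed() are assumptions made up for the example. It deliberately
scales by a constant instead of calling ldexpl(), since that function
is what NetBSD 6.1.5 lacks.

```c
#include <float.h>
#include <stdint.h>

/* Name taken from the post; this definition is an assumption. */
typedef long double doubletime_t;

#define DT_FRAC_SCALE 4294967296.0L   /* 2^32 as a long double */

/*
 * Returns 1 if doubletime_t has at least 64 significand digits, enough
 * to hold every bit of a 64-bit fixed-point value (assuming FLT_RADIX
 * is 2, as it is on mainstream hardware).
 */
static int doubletime_covers_lfp(void)
{
    return LDBL_MANT_DIG >= 64;
}

/* 32.32 fixed point to seconds; scaling by 2^32 only shifts the exponent. */
static doubletime_t fixed_to_dt(uint64_t v)
{
    return (doubletime_t)v / DT_FRAC_SCALE;
}

/* Seconds back to 32.32 fixed point; input must be in [0, 2^32). */
static uint64_t dt_to_fixed(doubletime_t d)
{
    return (uint64_t)(d * DT_FRAC_SCALE);
}
```

On a platform where doubletime_covers_lfp() returns 1, the conversion
pair should round-trip any 64-bit value exactly. Where it returns 0
(long double no wider than double), the type buys nothing over double,
and a build could refuse to proceed with a preprocessor test on
LDBL_MANT_DIG.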

That was the last comment in the bug thread.

I chose the recommendation to move to long double because I was (and still am)
trying to narrow the footprint of the NTP homebrew types. There are several
reasons I want to do this, all basically long-term ones involving gradual
reduction of global complexity in several places where it's still pretty bad.

That goal can be traded away, but I want to have a clearer idea of why
the trade is necessary before I do it.  Also feature freeze is supposed to be
tomorrow and my reluctance to do changes that might be subtly destabilizing
is going to rise dramatically.

I am not enough of a floating-point guru to really evaluate or
critique Gary's arguments about the original loss of precision, nor to
judge the efficacy of his fix, nor to understand Fred's assertion that
the applied fix is somehow useless.  I just followed Gary's
instructions with my "is this an invariant-breaker?" sensors turned up
to max gain.

That seemed to have been sufficient; the code worked.  Beyond that I
admit to feeling pretty clueless about what's going on here.

So, did I make an ignorant mistake?  Can this fix be rescued?  Is
someone else better equipped than me for the rescue?  (Translation:
I'd really love to dump this mess on Fred or Gary.)

> All the fuss over long doubles has distracted folks from a more legitimate
> issue with NetBSD 6.1.5, which is that python-config returns a nonworking
> build setup for the C extension.  But a workaround should be possible, and
> it's only in the build procedure, not the code.

> Well, [the gettime(2)/settime(2)] fallback code was broken in
> multiple ways, anyway, as was the comparable code in GPSD before I
> fixed it.  With *correct* fallback code, (a) and (b) are both
> inapplicable.

These are good things to know, but...

> Limiting support to a OS version that's not even a year old is rather
> heavy-handed, especially when there isn't a really good reason for it.
> And the same fallback that works for 10.11 works at least as far back as
> 10.5 (all of which are supported by classic ntpd, BTW).

OK.  Fred, our convention here is that Mark decides porting scope on
considered advice from the senior devs. We treat him as the product
strategist even though we're not working inside a corporate structure
where that makes obvious sense, simply because he's good at the view
from 30K feet and knows where a lot of the corporate bodies are buried.
Final decision will be his.

That said, I'm going to push - not hard, not hill-to-die-on, just
moderately - for remaining strict about our C99 conformance policy
and culling old releases/minor platforms that can't meet it.

A significant part of *my* job as architect is to defend us against
complexity creep. Of course what I'm actually defending us against is an
increase in expected defect rates, but everyone here understands that
link.

I want to ditch the NetBSD 6 and old MacOS shims because they're
defect attractors. Not only that: by visibly compromising on our "C99
or GTFO" policy we're legitimizing future exceptions and "It's just
one little shim. What harm can it do?" which can come around to bite
us in the ass.

These compromises have a way of accumulating that explains a lot of the
sorry state this code was in when we forked it.

Maybe Mark makes a product-strategy decision to eat that risk, but if
he does, it is still my responsibility to be the guy who pulls against it.

One final point:  I actually think we're in a better position than most
projects to be "harsh", as you put it. Security people get it about
reducing attack surface; when you're trying to justify snipping off these
old warts even though someone is inconvenienced, that's the closest
thing you'll ever find to a sovereign excuse.
--
    Eric S. Raymond <http://www.catb.org/~esr/>

My work is funded by the Internet Civil Engineering Institute: https://icei.org
Please visit their site and donate: the civilization you save might be your own.