Testing float-valued functions

Fri Oct 6 11:52:47 UTC 2017

This is a training assignment for Ian Bruene.  I'm describing the
background in public because this is an instance of a generic error
other programmers are likely to run into, and all need to guard
against it.

As it presently exists, the pylib/packet.py function def
ntp_to_posix(t) is incorrect.  Here is how it now reads:

    @staticmethod
    def ntp_to_posix(t):
        "Scale from NTP time to POSIX time"
        # Note: assumes we're in the same NTP era as the transmitter...
        return (t >> 32) - SyncPacket.UNIX_EPOCH

The problem with this code is that it throws away the low 32 bits of
information in t, the fractional-second part.  Writing it this way was
a coding error on my part which produced this bug:

https://gitlab.com/NTPsec/ntpsec/issues/402

The fractional-seconds part of ntpdig's report was simply missing
because ntp_to_posix() had discarded it.

I fixed this on 2 Oct with the commit "Address GitLab issue #402:
ntpdig: no fraction of seconds", which changed the function to return
a float, as follows:

    def ntp_to_posix(t):
        "Scale from NTP time to POSIX time"
        # Do not do the obvious thing with a shift, the low 32 bits
        # need to become decimal fractional seconds.
        # Note: assumes we're in the same NTP era as the transmitter...
        return (t / (2**32)) - SyncPacket.UNIX_EPOCH

This code is functionally correct, but unmasked an issue wth the test
code for ntp_to_posix() and posix_to_ntp().  Here is that code:

    def test_ntp_to_posix(self):
        f = self.target.ntp_to_posix
        # Test the Timeless Void Before Existence
        self.assertEqual(f(-1), -2208988801)  # NTP can see the Timeless Void
        # Test beginning of NTP epoch
        self.assertEqual(f(0), -2208988800)
        # Test just before the Age of Unix
        self.assertEqual(f(2208988799 * 2**32), -1)
        # Test the beginning of the Age of Unix
        self.assertEqual(f(2208988800 * 2**32), 0)
        # Test the End of Time
        self.assertEqual(f(0xFFFFFFFF00000000), 2085978495)
        # Test the Timeless Void Beyond Existence
        self.assertEqual(f(0x10000000000000000), 2085978496)  # It doesn't clip

    def test_posix_to_ntp(self):
        f = self.target.posix_to_ntp
        # Test the Timeless Void Before Existence
        self.assertEqual(f(-2208988801), -0x100000000)  # It doesn't clip
        # Test beginning of NTP epoch
        self.assertEqual(f(-2208988800), 0)
        # Test just before the Age of Unix
        self.assertEqual(f(-1), 2208988799 * 2**32)
        # Test the beginning of the Age of Unix
        self.assertEqual(f(0), 2208988800 * 2**32)
        # Test the End of Time
        self.assertEqual(f(2085978495), 0xFFFFFFFF00000000)
        # Test the Timeless Void Beyond Existence
        self.assertEqual(f(2085978496), 0x10000000000000000)  # It doesn't clip

The specific problem we began to see was that the assertion

        self.assertEqual(f(-1), -2208988801

began to fail.  But not everywhere - on 18 of 20 platforms in GitLab CI.

In fact, all this test code is subtly wrong, and it is just blind luck
nothing went sproing sooner.  Any of those assertions could have gone
toes-up at any time. The tests in posixize() are wrong, too. 

The problem here is that float representation is not exact, and the
"same" float calculation on different platforms may yield slightly
different results.  Thus comparing floats to floats is flaky,
and comparing ints to floats is flaky.  This is true no matter
what language you are using; it's an intrinsic property of the
float data representation.

This is why the unit-test framework has an assertAlmostEqual(x, y)
function - it's there exactly to deal with float fuzziness.

The flakiness of floats is a trap lying in wait for even experienced
programmers - it's easy to forget that functions are float-valued
rather than int. Little blame attaches to a novice getting caught,
but one should learn from it.

Ian, your assignment is to fix this and verify the fix.  Please do
this as immediately as possible.

A question for you to ponder: should you replace all assertEqual calls
or only those on which we have seen failures?  If the latter, what
else do you need to do?

Be prepared to discuss the tradeoffs and justify your answer.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

Probably fewer than 2% of handguns and well under 1% of all guns will
ever be involved in a violent crime. Thus, the problem of criminal gun
violence is concentrated within a very small subset of gun owners,
indicating that gun control aimed at the general population faces a
serious needle-in-the-haystack problem.
	-- Gary Kleck, "Point Blank: Handgun Violence In America"