Graphs from NTP servers
Hal Murray
hmurray at megapathdsl.net
Mon Feb 8 22:57:15 UTC 2016
I'll turn this into a web page, but this is what I have now.
Corrections/feedback encouraged. Off-list is fine.
The place to start is a system's loopstats file. This is from a low cost
DigitalOcean cloud server in San Francisco.
http://users.megapathdsl.net/~hmurray/ntpsec/SFO-self.png
That is the system's opinion of how good its clock is. There are two types
of errors to consider. The first is the wiggles in that graph. That tells
you how stable the local clock is. In this case, except for a few spikes
early on, the system mostly thinks it is within 1/2 ms of the correct time.
So as long as we are interested in millisecond accuracy rather than
microseconds, this system is probably a good place to stand while looking at
other servers and/or the internet connections from here to there.
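
A minimal sketch of pulling the offset column out of loopstats and checking how often it stays within 1/2 ms. It assumes the usual loopstats layout (MJD, seconds past midnight, offset in seconds, frequency in PPM, jitter in seconds, ...); the sample lines and the function names are invented for illustration, not taken from the server above.

```python
from typing import List, NamedTuple


class LoopSample(NamedTuple):
    mjd: int          # Modified Julian Day
    secs: float       # seconds past midnight UTC
    offset_s: float   # clock offset, seconds
    freq_ppm: float   # frequency offset, parts per million
    jitter_s: float   # RMS jitter, seconds


def parse_loopstats_line(line: str) -> LoopSample:
    """Parse one loopstats line (assumed layout, see lead-in)."""
    f = line.split()
    return LoopSample(int(f[0]), float(f[1]), float(f[2]),
                      float(f[3]), float(f[4]))


def fraction_within(samples: List[LoopSample], limit_s: float = 0.0005) -> float:
    """Fraction of samples whose |offset| is at most limit_s."""
    if not samples:
        return 0.0
    inside = sum(1 for s in samples if abs(s.offset_s) <= limit_s)
    return inside / len(samples)


# Invented sample data, only to exercise the parser.
SAMPLE = """\
57426 100.000 0.000120000 13.778 0.000351733 0.0133806 6
57426 164.000 -0.000300000 13.779 0.000340000 0.0133806 6
57426 228.000 0.000900000 13.781 0.000500000 0.0133806 6
"""

samples = [parse_loopstats_line(l) for l in SAMPLE.splitlines()]
print(f"{fraction_within(samples):.2f} of samples within 1/2 ms")
```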
The other type is systematic error, for example, using the wrong
edge of a PPS pulse or asymmetric network delays. Those don't show up in
loopstats. You can't detect them without digging deeper.
Both types of errors are something you need to keep in mind when looking at
graphs.
After the typical request-response packet exchange, an NTP client has 4 time
stamps:
  The time the request left the client
  The time the request arrived at the server
  The time the response left the server
  The time the response arrived at the client
Note that there are two different clocks used to make those time stamps,
either of which may be inaccurate.
NTP servers also act as clients to get their time from lower stratum servers.
ntpd logs those time stamps in the rawstats file. If you use the "noselect"
option on a "server" line in your config file, you can collect info without
letting dirty data corrupt your local clock.
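
A rough sketch of reading one rawstats line and getting the round trip time from it. The field layout is an assumption based on the classic ntpd rawstats format (MJD, seconds past midnight, peer address, local address, then the 4 time stamps in the order listed above); newer versions append more fields, which this ignores. Check it against your own files. The addresses and time stamps below are made up.

```python
from decimal import Decimal
from typing import NamedTuple


class RawSample(NamedTuple):
    peer: str     # address of the remote server
    t1: Decimal   # request left the client
    t2: Decimal   # request arrived at the server
    t3: Decimal   # response left the server
    t4: Decimal   # response arrived at the client


def parse_rawstats_line(line: str) -> RawSample:
    """Parse one rawstats line (assumed layout, see lead-in)."""
    f = line.split()
    if len(f) < 8:
        raise ValueError("expected at least 8 fields")
    # Decimal keeps the full fraction of the large NTP-era second counts.
    return RawSample(f[2], Decimal(f[4]), Decimal(f[5]),
                     Decimal(f[6]), Decimal(f[7]))


# Invented example line (documentation addresses, made-up time stamps).
LINE = ("57426 3600.123 192.0.2.10 198.51.100.7 "
        "3664000000.100000000 3664000000.140000000 "
        "3664000000.140100000 3664000000.175100000")

s = parse_rawstats_line(LINE)
# Round trip time: total elapsed at the client minus time spent in the server.
rtt_ms = float((s.t4 - s.t1) - (s.t3 - s.t2)) * 1000.0
print(s.peer, f"{rtt_ms:.3f} ms")
```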
Here is a graph of the round trip times from San Francisco to several servers
on the east coast:
http://users.megapathdsl.net/~hmurray/ntpsec/SFO-east-rtt.png
The steps in the green and red dots are due to routing changes. The fuzz on
the blue dots is queuing delays on some overloaded link. The cap on the fuzz
indicates that the overloaded link has 10 ms of buffering. There are a few
scattered red dots. The ones that indicate extra delays are typical network
glitches. I don't have a good story for the ones at 14 and 15 hours that
indicate reduced time. My guess would be a transient network path that was a
few ms shorter but didn't happen often enough to show up clearly.
Normally, ntpd assumes that the network delays are symmetrical. That lets it
compute the offset between the local clock and the remote clock. Here is a
graph of results of that calculation:
http://users.megapathdsl.net/~hmurray/ntpsec/SFO-east-off.png
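
For reference, the symmetric-delay calculation can be sketched as below. This is the standard NTP offset formula, not code lifted from ntpd; the function name and the example numbers (35 ms each way, server 2 ms ahead) are made up.

```python
def ntp_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    """Remote clock minus local clock, assuming equal delay each way.

    t1: request left the client      (client clock)
    t2: request arrived at server    (server clock)
    t3: response left the server     (server clock)
    t4: response arrived at client   (client clock)
    """
    return ((t2 - t1) + (t3 - t4)) / 2.0


# Made-up exchange: 35 ms each way, server clock 2 ms ahead of the client,
# 0.1 ms spent inside the server.
t1 = 0.0
t2 = t1 + 0.035 + 0.002     # arrival, stamped by the fast server clock
t3 = t2 + 0.0001            # server turnaround
t4 = t3 - 0.002 + 0.035     # back on the client clock, after the return trip

print(f"offset = {ntp_offset(t1, t2, t3, t4) * 1000:.3f} ms")
```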
If, instead, you assume that both clocks are accurate, you can compute the
network transit delays in each direction. I picked well-run servers for this
experiment, so that assumption is probably valid. The limiting factor is
probably the ms or so of error on the local clock.
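
A small sketch of that calculation, assuming both clocks are right. The function name and the numbers are hypothetical.

```python
from typing import Tuple


def one_way_delays(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """Return (out, back) transit times, trusting both clocks.

    out  = server receive time minus client send time
    back = client receive time minus server send time
    """
    return t2 - t1, t4 - t3


# Made-up exchange: 30 ms out, 0.1 ms in the server, 35 ms back.
t1, t2, t3, t4 = 0.0, 0.030, 0.0301, 0.0651

out, back = one_way_delays(t1, t2, t3, t4)
rtt = (t4 - t1) - (t3 - t2)
print(f"out = {out * 1000:.1f} ms, back = {back * 1000:.1f} ms")
```

Any error in either clock lands directly in these numbers: it adds to one direction and subtracts from the other, while out + back still equals the round trip time.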
Here is a graph of the delays to/from rackety:
http://users.megapathdsl.net/~hmurray/ntpsec/SFO-rackety-out-back.png
That shows that the congestion is on the return path. It also shows that the
return path takes about 5 ms longer than the forward path.
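
To put a number on what that asymmetry does to the usual offset calculation: with accurate clocks, the symmetric-delay formula reduces to (out - back) / 2, so a return path 5 ms longer than the forward path shows up as a 2.5 ms apparent offset. A sketch, with made-up absolute delays (only the 5 ms difference comes from the graph):

```python
def apparent_offset(out_delay: float, back_delay: float) -> float:
    """Offset a client would compute when both clocks are actually correct.

    With accurate clocks t2 - t1 = out_delay and t3 - t4 = -back_delay,
    so ((t2 - t1) + (t3 - t4)) / 2 becomes (out_delay - back_delay) / 2.
    """
    return (out_delay - back_delay) / 2.0


# Hypothetical delays: 30 ms out, 35 ms back (5 ms asymmetry).
bias = apparent_offset(0.030, 0.035)
print(f"apparent offset = {bias * 1000:.1f} ms")
```

The sign is negative here because the convention is remote minus local: the client would conclude the server is behind by half the asymmetry.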
Here is the out/back graph for the NIST systems:
http://users.megapathdsl.net/~hmurray/ntpsec/SFO-nist-out-back.png
The first thing to notice is that the outgoing path takes over twice as long
as the return path. Going back to the round trip time graph, it's
suspicious that systems located relatively near each other have such large
differences in round trip times. The return times are close to the times
to/from rackety.
Note that there are only a few steps in the bottom/return path, and the steps
in the top/forward path match the steps in the round trip time, so most of the
routing changes are on the long forward path.
There is an interesting event associated with time-d from 17.5 to 18.5 hours.
Note that the out/back steps are mirror images of each other and that there
is no change in the round trip time during that time slot. That would happen
if the time on the remote system was offset. It could also happen with some
unlikely changes in routing.
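
A quick sketch of why a remote clock offset produces that mirror-image pattern. The path delays and the 2 ms clock error are invented; `observe` is a hypothetical helper, not part of any NTP tool.

```python
from typing import Tuple


def observe(fwd: float, ret: float, remote_err: float,
            turnaround: float = 0.0001) -> Tuple[float, float, float]:
    """Simulate one exchange and return (out, back, rtt) as a client sees them.

    fwd, ret    true one-way delays, seconds
    remote_err  how far the server clock is ahead of true time, seconds
    """
    t1 = 0.0                             # client clock, assumed correct
    t2 = t1 + fwd + remote_err           # stamped by the (wrong) server clock
    t3 = t2 + turnaround                 # also on the server clock
    t4 = t1 + fwd + turnaround + ret     # client clock, unaffected by the error
    out = t2 - t1
    back = t4 - t3
    rtt = (t4 - t1) - (t3 - t2)
    return out, back, rtt


before = observe(0.060, 0.030, 0.0)      # server clock correct
during = observe(0.060, 0.030, 0.002)    # server clock 2 ms fast

for name, (out, back, rtt) in (("before", before), ("during", during)):
    print(f"{name}: out={out * 1000:.1f} back={back * 1000:.1f} "
          f"rtt={rtt * 1000:.1f} ms")
```

The out time goes up by the clock error, the back time goes down by the same amount, and the round trip time doesn't move, since the error cancels in t3 - t2.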
Here is the round trip time graph for the nearby clocks used as references by
this system:
http://users.megapathdsl.net/~hmurray/ntpsec/SFO-local-rtt.png
And the corresponding offset graph:
http://users.megapathdsl.net/~hmurray/ntpsec/SFO-local-off.png
The routing to all 3 clocks is stable, but something is off by 1/2 ms.
Here is the out/back graph:
http://users.megapathdsl.net/~hmurray/ntpsec/SFO-local-out-back.png
(I dropped one of the HP clocks to reduce clutter.)
The mirror image pattern is due to offsets/errors in the local clock. (It
could be due to errors in the remote clocks, but all 3 have GPS/PPS inputs
and the return paths all agree.)
Note the 1/2 ms offset between the two out times. In order to figure out
which clock/path is correct, I'll have to find at least one more good clock.
(The 2 clocks at HP are on the same subnet so they only get one vote.)
--
These are my opinions. I hate spam.