[Git][NTPsec/ntpsec][master] 3 commits: In ntpq, polish and document Hal's direct-mode feature.

Thu Dec 22 13:31:55 UTC 2016

Eric S. Raymond pushed to branch master at NTPsec / ntpsec


Commits:
f5f298b6 by Eric S. Raymond at 2016-12-22T08:31:19-05:00
In ntpq, polish and document Hal's direct-mode feature.

- - - - -
b599d523 by Eric S. Raymond at 2016-12-22T08:31:19-05:00
Documentation polishing.

- - - - -
5835482f by Eric S. Raymond at 2016-12-22T08:31:19-05:00
Attempting cleanup after Hal's direct-mode patch.

- - - - -


6 changed files:

- + docs/includes/mrufail.txt
- docs/includes/ntpmon-body.txt
- docs/includes/ntpq-body.txt
- ntpclients/ntpmon
- ntpclients/ntpq
- pylib/packet.py


Changes:

=====================================
docs/includes/mrufail.txt
=====================================

--- /dev/null
+++ b/docs/includes/mrufail.txt
@@ -0,0 +1,19 @@
+// Explain the MRU stall problem and why ntpq has 'direct' mode.
+
+This program will behave in apparently buggy and only semi-predictable
+ways when fetching MRU lists from _any_ server with sufficiently high
+traffic.
+
+The problem is fundamental. The Mode 6 protocol can't ship (and your
+client cannot accept) MRU records as fast as the daemon accepts
+incoming traffic. Under these circumstances, the daemon will
+repeatedly fail to ship an entire report, leading to long
+hangs as your client repeatedly re-sends the request. Eventually the
+Mode 6 client library will throw an error indicating that a maximum
+number of restarts has been exceeded.
+
+To avoid this problem, avoid monitoring over links that don't have
+enough capacity to handle the monitored server's _entire_ NTP load.
+
+
+
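
Editor's illustration of the stall described in mrufail.txt above. This is a toy model, not ntpsec code: the function name, its parameters, and the per-round accounting are all hypothetical, and the real Mode 6 exchange is more involved. It assumes only what the text states, namely that the fetch cannot complete when records arrive at least as fast as they can be shipped, and that the client eventually gives up.

```python
def fetch_report(backlog, arrivals_per_round, shipped_per_round, max_rounds=50):
    """Toy model: return the round in which the client catches up, or None.

    backlog            -- MRU records outstanding when the fetch starts
    arrivals_per_round -- new records the server accumulates per round trip
    shipped_per_round  -- records the client retrieves per round trip
    max_rounds         -- stand-in for the client's restart limit
    """
    for round_no in range(1, max_rounds + 1):
        # The client drains what it can in this round trip.
        backlog = max(0, backlog - shipped_per_round)
        if backlog == 0:
            return round_no
        # Meanwhile the server keeps accepting traffic.
        backlog += arrivals_per_round
    return None  # gave up, analogous to exceeding the restart limit


if __name__ == "__main__":
    print(fetch_report(100, 10, 30))  # light traffic: the fetch completes
    print(fetch_report(100, 40, 30))  # heavy traffic: the fetch never completes
```

In this model the fetch completes only while shipped_per_round exceeds arrivals_per_round, which is the capacity condition the advice above is driving at.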


=====================================
docs/includes/ntpmon-body.txt
=====================================
--- a/docs/includes/ntpmon-body.txt
+++ b/docs/includes/ntpmon-body.txt
@@ -63,7 +63,7 @@ p:: Change peer display to default mode, showing refid.
 
 q:: Cleanly terminate the program.
 
-s:: Show all hosts, not just reachable ones.
+s:: Toggle display of only reachable hosts (default is all hosts).
 
 w:: Toggle wide mode.
 
@@ -88,4 +88,9 @@ appear to hang when monitoring hosts with extremely long MRU lists -
 in particular, public pool hosts. Correct behavior requires a Mode 6
 protocol extension not yet present in those versions.
 
+Even with this extension, monitoring a sufficiently high-traffic
+server sometimes fails.
+
+include::mrufail.txt[]
+
 // end


=====================================
docs/includes/ntpq-body.txt
=====================================
--- a/docs/includes/ntpq-body.txt
+++ b/docs/includes/ntpq-body.txt
@@ -271,6 +271,13 @@ ind assid status conf reach auth condition last_event cnt
 +monstats+::
   Display monitor facility statistics.
 
++direct::
+  Normally, the mrulist command retrieves an entire MRU report (possibly
+  consisting of more than one MRU span), sorts it, and presents the
+  result. But attempting to fetch an entire MRU report may fail on a
+  server so loaded that none of its MRU entries age out before they
+  are shipped. With this option, each segment is reported as it arrives.
+
 +mrulist+ [+limited+ | +kod+ | +mincount=+'count' | +laddr=+'localaddr' | +sort=+'sortorder' | +resany=+'hexmask' | +resall=+'hexmask']::
   Obtain and print traffic counts collected and maintained by the
   monitor facility. This is useful for tracking who _uses_ or
@@ -290,7 +297,7 @@ The _sortorder_ defaults to +lstint+ and may be any of +addr+,
 +count+, +avgint+, +lstint+, or any of those preceded by a minus sign
 (hyphen) to reverse the sort order. The output columns are:
 +
-include::mrufmt.txt
+include::mrufmt.txt[]
 
 +mreadvar+ 'assocID' 'assocID' [ 'variable_name' [ = 'value'[ ... ]::
 +mrv+ 'assocID' 'assocID' [ 'variable_name' [ = 'value'[ ... ]::
@@ -529,4 +536,11 @@ The -O (--old-rv) option of legacy versions has been retired.
 
 The command ":config" is no longer accepted; use "config" instead.
 
+== Known Limitations ==
+
+include::mrufail.txt[]
+
+You may be able to retrieve partial data in very high-traffic
+conditions by using the 'direct' option.
+
 // end


=====================================
ntpclients/ntpmon
=====================================
--- a/ntpclients/ntpmon
+++ b/ntpclients/ntpmon
@@ -12,7 +12,7 @@ Any keystroke causes a poll and update. Keystroke commands:
 'o': Change peer display to opeers mode, showing destination address.
 'p': Change peer display to default mode, showing refid.
 'q': Cleanly terminate the program.
-'s': Show all hosts, not just reachable ones.
+'s': Toggle display of only reachable hosts (default is all hosts).
 'w': Toggle wide mode.
 'x': Cleanly terminate the program.
 ' ': Rotate through a/n/o/p display modes. 
@@ -200,7 +200,7 @@ if __name__ == '__main__':
 
                         # Now the MRU report
                         limit = stdscr.getmaxyx()[0] - len(peers)
-                        span = session.mrulist(recent=limit)
+                        span = session.mrulist(variables={'recent':limit})
                         mru_report.now = time.time()
 
                         # Nyquist-interval sampling - half the


=====================================
ntpclients/ntpq
=====================================
--- a/ntpclients/ntpq
+++ b/ntpclients/ntpq
@@ -673,6 +673,12 @@ usage: version
         else:
             print("Direct mode is off")
 
+    def help_direct(self):
+        self.say("""\
+function: toggle direct-mode MRU output
+usage: direct
+""")
+
     def do_raw(self, line):
         "do raw mode variable output"
         self.rawmode = True


=====================================
pylib/packet.py
=====================================
--- a/pylib/packet.py
+++ b/pylib/packet.py
@@ -1399,8 +1399,14 @@ class ControlSession:
                 if span.is_complete():
                     break
 
-                # Snooze for a bit between queries to let ntpd catch
-                # up with other duties.
+                # The C version of ntpq used to snooze for a bit
+                # between MRU queries to let ntpd catch up with other
+                # duties.  It turns out this is a pretty bad idea.  Above
+                # a certain traffic threshold, servers accumulate MRU records
+                # enough faster than this protocol loop can capture them that
+                # you never get a complete span.  The last thing you want to
+                # do when trying to keep up with a high-traffic server is stall
+                # in the read loop.
                 ## time.sleep(0.05)
 
                 # If there were no errors, increase the number of rows



View it on GitLab: https://gitlab.com/NTPsec/ntpsec/compare/fd505303bd14baf88c4f508eadfa0e40c556e2d1...5835482f8a7a7e0e20e0e0aae7277364e1f4f478