Logfile visualization tools - request for comment

Tue Jul 19 03:24:32 UTC 2016

Quoting "Eric S. Raymond" <esr at thyrsus.com>:
> I've spent the last week reading code and preparing for a serious
> effort to write logfile visualization tools for NTPsec.
>
> There are at least two good reasons to do this, one retrospective and
> one prospective.  The retrospective one is that the stats and
> data-reduction tools now in the distribution are a huge mess. They're
> archaic, often embodying assumptions that have long since passed their
> sell-by date (one pair of tools relies, for example, on mode 7, which
> we've eliminated).  They're poorly documented or not documented at
> all. They're written in Perl, which is a serious maintainability
> problem. The whole area cries to be cleaned up - or better yet, nuked
> and replaced with better code.
>
> The prospective reason is that I need a way to make sense out of my test
> farm data.  I want to be able to answer a bunch of questions, beginning with
> "How important are check servers to a machine with a GPS?"

One of my NTP modules had an annoying habit of drifting by several
milliseconds while still producing a PPS signal (which is odd, since it
had been reporting an unlocked status for hours).  I think it was
physically damaged in shipment.  I don't use that module anymore, but
having other servers configured was useful for verifying what was
going on.

Under the right conditions, LAN stratum 1 sources can measure offset
wander in the tens of microseconds.  For example:
https://dan.drown.org/rpi/pi2.html
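
As a rough illustration of reading offset wander out of a loopstats log,
here is a minimal Python sketch.  It assumes the classic ntpd loopstats
line layout (MJD, seconds past midnight, offset in seconds, frequency in
PPM, RMS jitter, RMS wander, time constant) and uses only the offset
column.  The function names are hypothetical, and the 1st-99th
percentile span is just one reasonable way to put a number on wander:

```python
"""Summarize clock offset wander from loopstats-style lines (sketch)."""
import math


def parse_offsets_us(lines):
    """Return offsets in microseconds from loopstats-style lines.

    Assumes the third field is the offset in seconds; short or
    malformed lines are skipped rather than treated as errors.
    """
    offsets = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            offsets.append(float(fields[2]) * 1e6)
        except ValueError:
            continue
    return offsets


def percentile(sorted_values, pct):
    """Linear-interpolation percentile of an already sorted list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = (len(sorted_values) - 1) * pct / 100.0
    lo = math.floor(rank)
    hi = math.ceil(rank)
    frac = rank - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def offset_spread_us(lines, low=1, high=99):
    """Width of the low..high percentile band of offsets, in microseconds."""
    values = sorted(parse_offsets_us(lines))
    return percentile(values, high) - percentile(values, low)
```

Feeding it the lines of a day's loopstats file gives a single figure
that can be compared across machines or configurations.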

> The path forward that I'm considering is a Python translation of the
> NTP branch of David Drown's chrony-graph software. It makes beautiful and
> interesting visualizations, embodying a lot of domain knowledge about
> which statistics and relationships are interesting.  And of course, that last
> part is where my own knowledge is weakest. Co-opting his work will let me
> concentrate on the software-engineering aspect of the problem.

My first name is Daniel.  David is my dad's name by random chance.  So  
unless he wrote NTP visualization software... :)

> I'm thinking Python translation for two reasons.  One is our general
> Python-and-sh policy for scripting, to reduce maintenance complexity
> down the road.
>
> Another is that, as Gary Miller has pointed out, ddrown's collection of
> shellscripts and Perl has terrible locality.  Gary says he can see in
> his graphs artifacts from chrony-graph's disk overhead, and I have no
> reason to disbelieve that. Gary suggests that a symbiont daemon, keeping
> intermediate data in memory until the final graphs need to be produced,
> would produce less noise.
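
Gary's suggestion amounts to keeping the working set in memory between
samples instead of rewriting intermediate files on every run.  A minimal
Python sketch of that idea, assuming a long-running process fed one
offset sample at a time in timestamp order; the class name and the
one-day retention window are hypothetical choices, not something
chrony-graph or NTPsec provides today:

```python
"""In-memory sample store for a long-running stats daemon (sketch)."""
from collections import deque


class OffsetWindow:
    """Keep recent samples in memory; no intermediate files.

    Assumes timestamps arrive in nondecreasing order.  Samples older
    than max_age seconds (relative to the newest one) are dropped as
    new samples arrive, so memory use stays bounded.
    """

    def __init__(self, max_age=86400.0):
        self.max_age = max_age
        self._samples = deque()  # (timestamp, offset) pairs, oldest first

    def add(self, timestamp, offset):
        self._samples.append((timestamp, offset))
        cutoff = timestamp - self.max_age
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def offsets(self):
        """Snapshot of retained offsets, e.g. for hourly plotting."""
        return [offset for _, offset in self._samples]

    def __len__(self):
        return len(self._samples)
```

A deque makes both appending new samples and expiring old ones cheap,
which is the main thing such a daemon would do between plot runs.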

I wouldn't be surprised if those artifacts came from processor activity
rather than disk activity, actually.

On the Intel machine I generate all my graphs on, the time spent breaks
down like this:

1. bin/run (excluding bin/plot), log filtering/processing = ~2 seconds
2. bin/plot = ~9 seconds
2a. bin/plot - just calls to bin/percentile and bin/histogram (Perl) = ~2 seconds
2b. bin/plot - just calls to gnuplot = ~7 seconds
3. bin/copy-to-website, copying html/png to remote system = ~1 second

total script time:
real    0m12.166s
user    0m8.503s
sys     0m2.039s
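
A per-stage breakdown like the one above can be gathered in Python by
wrapping each stage in a wall-clock timer.  A minimal sketch, assuming
each stage can be expressed as a callable; the helper names are
hypothetical:

```python
"""Time a sequence of pipeline stages and report each one (sketch)."""
import time


def time_stages(stages):
    """Run (name, callable) pairs in order; return [(name, seconds)].

    Uses time.perf_counter(), a monotonic wall-clock timer, so the
    figures correspond to the "real" column of time(1), not user/sys.
    """
    results = []
    for name, func in stages:
        start = time.perf_counter()
        func()
        results.append((name, time.perf_counter() - start))
    return results


def report(results):
    """Format results in the same style as the breakdown above."""
    lines = [f"{name} = ~{secs:.1f} seconds" for name, secs in results]
    total = sum(secs for _, secs in results)
    lines.append(f"total = ~{total:.1f} seconds")
    return "\n".join(lines)
```

A stage could be something like
`("bin/plot", lambda: subprocess.run(["bin/plot"], check=True))`,
assuming the stage can be invoked on its own.  Since bin/run calls
bin/plot, nested stages would have to be timed from inside the parent
script instead.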

Disk activity during this time:

0 read operations (everything came from cache)
144 write operations totaling 16MB taking 136ms

These times will be much longer on a Raspberry Pi, but running the
scripts once an hour shouldn't have a drastic impact on the system.

I experimented a bit to see if I could speed this up at all.  The
biggest win was limiting the output of the bin/histogram program.  With
that change, gnuplot is much faster (and the temporary data file
loopstats.history is much smaller):

2. bin/plot = ~3 seconds
2a. bin/plot - just calls to bin/percentile and bin/histogram (Perl) = ~1 second
2b. bin/plot - just calls to gnuplot = ~2 seconds

total script time:
real    0m5.984s
user    0m3.797s
sys     0m0.715s
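
The exact change to bin/histogram isn't described here.  One plausible
way to bound its output, sketched in Python as an assumption rather
than a description of what bin/histogram actually does, is to fix the
number of bins and clip the range to a central percentile band, so
gnuplot receives a bounded number of rows however many samples exist
(the function name and defaults are hypothetical):

```python
"""Bounded-size histogram for plotting (sketch)."""


def limited_histogram(values, max_bins=100, low_pct=1.0, high_pct=99.0):
    """Return (bin_center, count) rows, at most max_bins of them.

    Bins cover only the low_pct..high_pct percentile range (using a
    simple floor-rank cut), so outliers don't stretch the x axis.
    """
    if not values or max_bins < 1:
        return []
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int((n - 1) * low_pct / 100.0)]
    hi = ordered[int((n - 1) * high_pct / 100.0)]
    if hi == lo:
        return [(lo, sum(1 for v in ordered if v == lo))]
    width = (hi - lo) / max_bins
    counts = [0] * max_bins
    for v in ordered:
        if lo <= v <= hi:
            # Values equal to hi land in the last bin, not past it.
            idx = min(int((v - lo) / width), max_bins - 1)
            counts[idx] += 1
    return [(lo + (i + 0.5) * width, c) for i, c in enumerate(counts)]
```

The output size then depends on max_bins alone, not on how long the
log history is.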

> So, translate chrony-graph to Python.  But this would leave us with
> a coordination problem. It means either ddrown has to be prepared to
> let the Python version be his new mainline, or we have to cross-port
> all his improvements after the fork.
>
> David (*Daniel), do you have any suggestions for making this less painful?

I don't see a compelling reason to switch to Python.  I guess I don't
see the pain points.