WIBDR: Traffic analysis
Hal Murray
hmurray at megapathdsl.net
Fri May 29 07:11:08 UTC 2020
WIBDR == What I've Been Doing Recently
Maybe if we use a tag like that occasionally, it will encourage others to
report on their adventures, or some interesting details of plain old boring
work.
----------
I'm not sure how/why I got started on this, but I've been trying to learn more
about the traffic that NTP servers see. I have 2 servers in the pool to use
as sources of data.
One thing is pretty obvious. There are a lot of bad guys out there, trying
all sorts of things. The KE server gets much more abusive traffic than
legitimate traffic.
NTS KE serves good: 12613
NTS KE serves_bad: 28461
If you look at your log file, you will see things like:
28 May 18:36:16 ntpd[52023]: NTSs: SSL accept from 172.58.239.64:37024 failed:
wrong version number, took 0.006 sec
28 May 18:36:36 ntpd[52023]: NTSs: SSL accept from 172.58.239.64:29722 failed:
wrong version number, took 0.008 sec
28 May 18:37:18 ntpd[52023]: NTSs: SSL accept from 76.216.52.221:49541 failed:
wrong version number, took 0.002 sec
There are several cases of KE server TLS errors where I special-cased the
logging down to one line each to reduce clutter. I think "wrong version
number" comes from ssh attempts. That code is in nts_ke_accept_fail() in
nts_server.c if you ever want to add another.
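A generic sketch of that sort of special-casing looks roughly like this. It
is not the actual code from nts_server.c; the noisy[] table, the
log_accept_failure() name, and the reason strings other than "wrong version
number" are just illustrative.

/* Sketch: collapse a few known-noisy TLS accept failures into one log line
 * and fall back to the full OpenSSL error queue for anything unexpected. */
#include <openssl/err.h>
#include <stdio.h>
#include <string.h>

/* Failure reasons treated as routine scanner/abuse noise (illustrative). */
static const char *noisy[] = {
    "wrong version number",   /* e.g. plaintext or ssh sent to the TLS port */
    "unknown protocol",
    "http request",
};

static void log_accept_failure(const char *peer, double seconds)
{
    unsigned long err = ERR_peek_error();
    const char *reason = ERR_reason_error_string(err);

    for (size_t i = 0; i < sizeof(noisy) / sizeof(noisy[0]); i++) {
        if (reason != NULL && strcmp(reason, noisy[i]) == 0) {
            /* One line per failure keeps the log readable. */
            printf("NTSs: SSL accept from %s failed: %s, took %.3f sec\n",
                   peer, reason, seconds);
            ERR_clear_error();
            return;
        }
    }
    /* Unexpected failure: dump the whole error queue for debugging. */
    printf("NTSs: SSL accept from %s failed, took %.3f sec\n", peer, seconds);
    ERR_print_errors_fp(stdout);
}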
-------------
The second part is individual NTP requests.
I recently (a month or two ago) cleaned up the rate limiting. ntpq/mrulist now
has a "score" column and a "dropped" column. I was happy when I figured out
how to do the math. score is in packets per second; decay_time is in seconds.
mon->score *= expf(-since_last/mon_data.decay_time);
mon->score += 1.0/mon_data.decay_time;
The first line applies the exponential decay since the score was last updated,
when the previous packet arrived. The mru slot holds the time of the last
packet, so since_last is easy to calculate. The second line adds in this
packet's contribution. Graphically, each packet's contribution to the score
integrates to the same total (1) regardless of the decay time: dividing the
initial bump by the decay time compensates for it hanging around longer,
since it decays more slowly.
If the limit is 1 and the decay time is 20 (the defaults), you can get a burst
of 20 packets before any start getting dropped.
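As an illustration, here is a standalone simulation of that burst behavior.
It is not the ntpd code; the 1 ms packet spacing, the loop, and the over-limit
comparison are assumptions for the example. With the defaults, the score only
crosses the limit on the 21st back-to-back packet.

/* Sketch: replay the score update for a tight burst, assuming the
 * defaults of limit = 1.0 and decay_time = 20 seconds. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const float decay_time = 20.0;   /* seconds (default) */
    const float limit = 1.0;         /* packets per second (default) */
    float score = 0.0;               /* packets per second */
    float since_last = 0.001;        /* tight burst: ~1 ms between packets */

    for (int pkt = 1; pkt <= 25; pkt++) {
        score *= expf(-since_last / decay_time); /* decay since last packet */
        score += 1.0 / decay_time;               /* this packet's contribution */
        printf("packet %2d  score %.3f%s\n", pkt, score,
               score > limit ? "  over limit" : "");
    }
    return 0;
}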
-------------
With the default rate limiting, lots of packets get dropped.
From a pool server that has been up for 7 1/2 days:
Ctrl-C will stop MRU retrieval and display partial results.
lstint avgint rstr r m v count score drop rport remote address
=====================================================================
17358 0.494 e0 L 3 3 1170863 2240.073 1170526 123 108.161.83.242
4209 0.496 e0 L 3 3 1269062 1619.395 1268017 634 67.216.65.10
4855 0.421 c0 . 3 3 1269863 0.237 1269581 123 50.233.222.130
32946 0.140 e0 L 3 3 1337989 6322.806 1337808 123 190.106.77.82
2558 0.355 c0 . 3 3 1359788 0.206 1359539 123 207.189.100.126
24754 0.296 e0 L 3 3 1467428 6280.005 1467119 123 12.183.201.66
43956 0.081 e0 L 3 4 1866084 6748.549 1865962 123 65.158.5.78
1450 0.201 c0 . 3 4 1998916 0.050 1998725 123 216.103.178.80
7084 0.251 c0 . 3 3 2461501 0.206 2460590 123 67.204.10.106
1603 0.191 c0 . 3 3 3243349 0.051 3242562 123 206.40.97.188
12933 0.166 e0 L 3 3 3729992 5879.169 3728988 634 63.98.240.2
13431 0.079 e0 L 3 4 7717268 6887.582 7716776 123 8.36.94.10
# Collected 12 slots in 1.087 seconds
That system has a big mru list, so I can see things like this. If the bad guys
are sending packets often enough, they are likely to stay in the mru list
forever. The bottom slot is 12 packets per second, averaged over 7 days
(7.7 million packets at an average interval of 0.079 seconds)!
We took a closer look with tcpdump. There are conspicuous 10 second bursts.
They range from 2,000 packets per second to over 20,000. Occasionally, there
are back-to-back 10 second bursts.
Steven tracked those back to a bug in FortiGate firewalls.
https://community.ntppool.org/t/ntp-bursts-from-fortigate-firewalls/1661
There is another layer of bursts. These last 2 or 4 minutes with 10-100
packets per second. I assume they are several to many systems behind a NAT
box getting used while my system is in the DNS rotation. I got one place to
confirm that it was a "large lab" using NAT, but I don't know yet how many
systems there are.
I'm still scratching my head about what the right limit should be. Ideally,
any NAT system with enough traffic for me to notice would set up its own NTP
servers and tell its clients about them via DHCP.
--
These are my opinions. I hate spam.