mrulist direct mode, monitoring pool servers
hmurray at megapathdsl.net
Tue Dec 20 21:31:07 UTC 2016
I implemented a direct mode. It writes out each batch of slots as soon as it
gets them. Any sort options are ignored. There will be duplicates of any
slots that get updated after they are retrieved. I think the filtering stuff
should still work but I didn't try it.
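To make the behaviour concrete, here is a minimal sketch of the idea in Python. It is not the actual code: `write_direct`, `fake_batches`, and the `(address, count, last_seen)` slot layout are hypothetical names made up for illustration.

```python
import io


def write_direct(batches, out):
    """Write each batch of slots as soon as it arrives; return slots written.

    Nothing is kept in memory between batches, so there is no sorting and
    no de-duplication: a slot that the server updates after it was fetched
    will show up again in a later batch.
    """
    written = 0
    for batch in batches:
        for address, count, last_seen in batch:
            out.write("%s %d %.3f\n" % (address, count, last_seen))
            written += 1
        out.flush()  # push the batch out before waiting for the next one
    return written


def fake_batches():
    """Stand-in for the network retrieval (hypothetical data)."""
    yield [("192.0.2.1", 3, 100.0), ("192.0.2.2", 1, 101.0)]
    # 192.0.2.1 was updated on the server after the first batch was fetched.
    yield [("192.0.2.3", 7, 102.0), ("192.0.2.1", 4, 103.0)]


if __name__ == "__main__":
    buf = io.StringIO()
    write_direct(fake_batches(), buf)
    print(buf.getvalue(), end="")
```

With this layout, duplicates could be dropped in a later pass by keeping only the last line seen for each address.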
The code and UI need more work, but as a proof of concept it managed to
capture everything from a busy server.
I think collecting data from a busy server will always be "interesting". I
know about two issues.
The first is the race between collecting data and having slots get moved or
recycled while you are collecting. This is obviously easier if you can run
on the same system as the server so there are no network delays.
If we can't go fast enough, we should be able to get some of the data and/or
some estimates of how much we are missing. We can probably test that by
running over a network. (That will also test the lost packet code.) We need
to be sure to debug this case/mode so we will have useful tools when the next
big burst of traffic hits the pool.
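One rough way to put a number on "how much we are missing" might look like the sketch below. It assumes the server can report how many slots it currently holds, which may or may not be available in this form. `coverage_estimate` and its parameters are hypothetical names.

```python
def coverage_estimate(retrieved_addresses, server_slot_count):
    """Return (unique, missing, fraction_missing) for one collection run.

    retrieved_addresses: iterable of addresses seen, duplicates allowed.
    server_slot_count:   slots the server says it holds (assumed available).
    """
    unique = len(set(retrieved_addresses))
    # Clamp at zero: slots recycled during the run can make us see more
    # distinct addresses than the server holds at any one moment.
    missing = max(server_slot_count - unique, 0)
    fraction = missing / server_slot_count if server_slot_count else 0.0
    return unique, missing, fraction
```

This is crude. The set of addresses costs memory on the collecting side, and the server's count keeps moving while the retrieval runs, so the result is only an estimate.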
The other issue is memory and CPU on the system collecting the data. I don't
know which limit will kick in first. It takes a lot of CPU, but that's not a
problem as long as you can keep up with the server. I think that translates
into a threshold for how busy a server you can grab complete data from. I
think memory will be the more serious issue when not using direct mode. I saw
troubles before switching to direct mode, though the old way should work on a
system with more memory or less traffic. Direct mode doesn't use much memory,
so there it probably won't be a problem.
My reference is a pool server in the cloud. For $5 per month you get 512
megabytes. I had 150 megabytes allocated to the MRU list. That's about a
million slots. I had the pool bandwidth adjusted so that covered well over a
day. I was grabbing data with a script that ran once a day from a cron job.
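The numbers above work out roughly as follows. This is only back-of-the-envelope arithmetic; which meaning of "megabyte" applies is an assumption, so both are shown.

```python
MRU_BYTES_DECIMAL = 150 * 10**6      # 150 MB if "megabyte" means 10^6 bytes
MRU_BYTES_BINARY = 150 * 2**20       # 150 MB if it means 2^20 bytes
SLOTS = 1_000_000                    # "about a million slots"
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_slot_decimal = MRU_BYTES_DECIMAL / SLOTS   # 150.0
bytes_per_slot_binary = MRU_BYTES_BINARY / SLOTS     # about 157.3
new_addresses_per_second = SLOTS / SECONDS_PER_DAY   # about 11.57

if __name__ == "__main__":
    print(bytes_per_slot_decimal, bytes_per_slot_binary)
    print(round(new_addresses_per_second, 2))
```

So a slot costs on the order of 150 bytes, and a million slots last a full day only if fewer than about 11.6 previously unseen addresses arrive per second on average.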
The old C code worked before the recent burst of pool traffic. It didn't
work during the burst. The new Python code got tangled up with the burst so
I don't know how well it would have worked before the burst. I think it
would have run out of memory. I don't have a cron job working yet.
Any suggestions for a UI/CLI?
Currently, the direct command sets a flag that gets passed down similar to
the hostnames flag. That seems pretty ugly to me, mostly because it gets
used in several places rather than only one as with hostnames. (The
hostnames stuff seems ugly too, but direct is uglier.)
Maybe it should be a separate program. I'm assuming that mode will mostly be
used from a cron job.
I'd like to also collect statistics on the data collection process. How many
retransmissions and such. Maybe that should go to stderr? Maybe a command
We could implement another stats file and have the server write stuff when
slots are recycled. That might need some rate limits. This feels like the
sort of problem that has a cliff rather than a slope.
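For the client-side statistics on the collection process, a small counter object reported on stderr might be enough. The sketch below is one possible shape, not existing code; the class and field names are hypothetical.

```python
import sys


class CollectionStats:
    """Counters for one retrieval run (field names are hypothetical)."""

    def __init__(self):
        self.requests = 0
        self.retransmissions = 0
        self.timeouts = 0
        self.slots = 0

    def summary(self):
        """Return the counters as one line of key=value pairs."""
        return ("requests=%d retransmissions=%d timeouts=%d slots=%d"
                % (self.requests, self.retransmissions,
                   self.timeouts, self.slots))

    def report(self, stream=sys.stderr):
        # Writing to stderr keeps the statistics out of the slot data
        # that goes to stdout.
        stream.write(self.summary() + "\n")


if __name__ == "__main__":
    stats = CollectionStats()
    stats.requests = 2
    stats.retransmissions = 1
    stats.report()
```

Keeping the statistics on a separate stream means a cron job can redirect the slot data to a file and still see the summary line separately.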
These are my opinions. I hate spam.