mrulist direct mode, monitoring pool servers

Hal Murray hmurray at megapathdsl.net
Tue Dec 20 21:31:07 UTC 2016


I implemented a direct mode.  It writes out each batch of slots as soon as it 
gets them.  Any sort options are ignored.  There will be duplicates of any 
slots that get updated after they are retrieved.  I think the filtering stuff 
should still work but I didn't try it.
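
A rough sketch of what direct mode amounts to, not the actual ntpq code.  The
slot dictionaries, the field names, and both helper functions are made up for
illustration.  It writes each batch out as it arrives, and shows one way a
post-processing step could collapse the duplicates by letting the last line
for an address win.

```python
import sys

def direct_dump(batches, out=sys.stdout):
    """Write each batch of slots as soon as it arrives; keep nothing in memory."""
    written = 0
    for batch in batches:              # any iterable, e.g. a generator
        for slot in batch:
            out.write("%s %d\n" % (slot["addr"], slot["count"]))
            written += 1
        out.flush()                    # push this batch out before the next one
    return written

def keep_latest(lines):
    """Post-process a direct-mode dump: the last line seen for an address wins."""
    latest = {}
    for line in lines:
        addr, count = line.split()
        latest[addr] = int(count)
    return latest

def fake_batches():
    # Hypothetical data: 10.0.0.1 is updated after it was first retrieved,
    # so it shows up in both batches.
    yield [{"addr": "10.0.0.1", "count": 3}, {"addr": "10.0.0.2", "count": 1}]
    yield [{"addr": "10.0.0.3", "count": 7}, {"addr": "10.0.0.1", "count": 4}]

if __name__ == "__main__":
    direct_dump(fake_batches())
```

Because nothing is sorted or held back, memory use stays at roughly one batch
no matter how long the server's list is.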

The code and UI need more work, but as a proof of concept it managed to 
capture everything from a busy server.

I think collecting data from a busy server will always be "interesting".  I 
know of two issues.

The first is the race between collecting data and having slots get moved or 
recycled while you are collecting.  This is obviously easier if you can run 
on the same system as the server so there are no network delays.

If we can't go fast enough, we should be able to get some of the data and/or 
some estimates of how much we are missing.  We can probably test that by 
running over a network.  (That will also test the lost packet code.)  We need 
to be sure to debug this case/mode so we will have useful tools when the next 
big burst of traffic hits the pool.

The other issue is memory and CPU on the system collecting the data.  I don't 
know which limit will kick in first.  It takes a lot of CPU, but that's not a 
problem as long as you can keep up with the server.  I think that translates 
into a threshold for how busy a server you can grab complete data from.  
Without direct mode, I think memory will be a serious issue.  I saw troubles 
before switching to direct mode, though the old mode should still work on a 
system with more memory or less traffic.  Direct mode doesn't use much memory, 
so memory probably won't be a problem there.


My reference is a pool server in the cloud.  For $5 per month you get 512 
megabytes.  I had 150 megabytes allocated to the MRU list.  That's about a 
million slots.  I had the pool bandwidth adjusted so that covered well over a 
day.  I was grabbing data with a script that ran once a day from a cron job.  
The old C code worked before the recent burst of pool traffic.  It didn't 
work during the burst.  The new Python code got tangled up with the burst so 
I don't know how well it would have worked before the burst.  I think it 
would have run out of memory.  I don't have a cron job working yet.
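
For scale, here is the back-of-envelope arithmetic behind those numbers.  It
assumes "150 megabytes" is decimal and "about a million" is exactly 1,000,000,
and it assumes a new slot is only needed for an address that is not already
on the list, so the rate is for previously unseen clients, not total packets.

```python
MRU_BYTES = 150 * 1000 * 1000      # memory allocated to the MRU list
SLOTS = 1000 * 1000                # "about a million slots"
SECONDS_PER_DAY = 24 * 60 * 60

# Average memory cost of one slot.
bytes_per_slot = MRU_BYTES / SLOTS

# If the list covers a full day, new addresses must arrive more slowly than
# this on average; faster than this and slots get recycled within the day.
new_addrs_per_sec = SLOTS / SECONDS_PER_DAY

if __name__ == "__main__":
    print("bytes per slot: %.0f" % bytes_per_slot)
    print("new addresses/sec to fill the list in one day: %.1f" % new_addrs_per_sec)
```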

---------

Any suggestions for a UI/CLI?

Currently, the direct command sets a flag that gets passed down similar to 
the hostnames flag.  That seems pretty ugly to me, mostly because it gets 
used in several places rather than only one as with hostnames.  (The 
hostnames stuff seems ugly too, but direct is uglier.)

Maybe it should be a separate program.  I'm assuming that mode will mostly be 
used from a cron job.

I'd like to also collect statistics on the data collection process: how many 
retransmissions and such.  Maybe that should go to stderr?  Maybe a command 
line switch?
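
One possible shape for that, as a sketch only.  The --stats switch, the
FetchStats class, and the counter names are all hypothetical; nothing like
them exists in the current code.  The point is that the slot data stays on
standard output while the collection statistics go to standard error, so a
cron job can redirect them separately.

```python
import argparse
import sys

class FetchStats:
    """Counters for one collection run (field names are illustrative)."""
    def __init__(self):
        self.requests = 0
        self.retransmits = 0
        self.slots = 0

    def summary(self):
        return "requests=%d retransmits=%d slots=%d" % (
            self.requests, self.retransmits, self.slots)

def build_parser():
    parser = argparse.ArgumentParser(description="direct-mode collector sketch")
    # Hypothetical switch: only report statistics when asked.
    parser.add_argument("--stats", action="store_true",
                        help="print collection statistics to stderr on exit")
    return parser

def finish(stats, args, err=sys.stderr):
    """Report the counters on the error stream if --stats was given."""
    if args.stats:
        err.write(stats.summary() + "\n")

if __name__ == "__main__":
    # An explicit argument list keeps the demo independent of sys.argv.
    args = build_parser().parse_args(["--stats"])
    stats = FetchStats()
    stats.requests, stats.retransmits, stats.slots = 10, 2, 480
    finish(stats, args)
```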

We could implement another stats file and have the server write stuff when 
slots are recycled.  That might need some rate limits.  This feels like the 
sort of problem that has a cliff rather than a slope.
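
If the server did log recycled slots, a token bucket is one common way to cap
the write rate.  This is a generic sketch in Python rather than anything from
the server code, and the class name and parameters are invented.  It allows a
steady rate plus a short burst and counts what it drops, which would also
give a rough measure of how far over the cliff things went.

```python
import time

class RateLimiter:
    """Token bucket: about `rate` events per second, bursts up to `burst`."""
    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)
        self.burst = float(burst)
        self.clock = clock             # injectable so it can be tested
        self.tokens = float(burst)     # start with a full bucket
        self.last = clock()
        self.dropped = 0               # events refused so far

    def allow(self):
        now = self.clock()
        # Refill for the time that has passed, never beyond the burst size.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        self.dropped += 1
        return False

if __name__ == "__main__":
    limiter = RateLimiter(rate=100, burst=20)
    logged = sum(1 for _ in range(1000) if limiter.allow())
    print("logged %d, dropped %d" % (logged, limiter.dropped))
```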

-- 
These are my opinions.  I hate spam.




