ISC_PLATFORM_USEBACKTRACE

Gary E. Miller gem at rellim.com
Wed May 31 21:21:24 UTC 2017


Yo Hal!

On Wed, 31 May 2017 14:01:48 -0700
Hal Murray <hmurray at megapathdsl.net> wrote:

> >> Can you try something like
> >>     gdb <your ntpd>
> >>     print /a <your hex number>  
> 
> > Nope.  We have LTO, ASLR, and a bunch of other things making the
> > addresses not repeatable.  Every time I intentionally crash ntpd at
> > the same spot the stack IPs are unique.   
> 
> Can we do something like print out the address of main during
> startup, and then subtract that from the stack PCs before printing
> them.  Then to decode one, add the current value of main before
> asking gdb to decode it.

Feel free to try that.  I spent way too many years manually doing
stack traces, I'm not gonna do it again, and we certainly cannot
expect users to do that.
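
For anyone who does want to try Hal's idea, here is a rough sketch of the arithmetic. Everything named here (anchor_fn, pc_to_offset, offset_to_pc, log_anchor) is made up for illustration and is not ntpd code. It assumes the addresses being logged live in the same executable as the anchor; frames inside shared libraries have their own load base and will not decode this way.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical anchor symbol. ntpd would use main() or any other function
 * that is linked into the executable itself. */
static void anchor_fn(void) {}

/* Turn an absolute code address into an offset from the anchor. */
static intptr_t pc_to_offset(uintptr_t pc)
{
    return (intptr_t)(pc - (uintptr_t)&anchor_fn);
}

/* Inverse: rebuild an absolute address from a logged offset. */
static uintptr_t offset_to_pc(intptr_t off)
{
    return (uintptr_t)&anchor_fn + (uintptr_t)off;
}

/* Log the anchor once at startup so logged offsets can be decoded later. */
static void log_anchor(FILE *out)
{
    fprintf(out, "anchor at %#jx\n", (uintmax_t)(uintptr_t)&anchor_fn);
}
```

The offsets should stay stable from run to run for a given binary even when the load address moves, which is the property the raw stack IPs lack. Decoding still means adding the offset back to wherever the debugger places the anchor, so it remains a manual step.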

I'm trying to get backtrace() to work.  And the problem is not
backtrace() but how to catch the SIGSYS in a useful way.  If I can
catch the SIGSYS properly then backtrace() will be fine.  At least for
gcc.
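
A minimal sketch of the backtrace() side, assuming glibc's <execinfo.h>. The helper names (prime_backtrace, dump_backtrace) and MAX_FRAMES are illustrative, not existing ntpd functions. The priming call is an assumption on my part: the strace further down shows libgcc_s.so.1 being opened right after the SIGSYS, which looks like backtrace() doing lazy loading on first use, so calling it once at startup may keep that work out of the handler.

```c
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Call once at startup, before the seccomp filter is installed, so that any
 * lazy loading done by backtrace() happens outside the signal handler. */
static int prime_backtrace(void)
{
    void *frames[MAX_FRAMES];
    return backtrace(frames, MAX_FRAMES);
}

/* Write the current call stack to fd and return the number of frames.
 * backtrace_symbols_fd() is used rather than backtrace_symbols() because
 * it does not call malloc(). */
static int dump_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, n, fd);
    return n;
}
```

Usage would be prime_backtrace() early in main() and dump_backtrace(STDERR_FILENO) from the trap handler. Function names in the output depend on the symbols being exported, for example by linking with -rdynamic.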

From what I can tell catchTrap() is doing the wrong thing, or at
least the suboptimal thing.  ntpd has at least a dozen ways to
handle traps/assertions, none of them close to optimal.  If catchTrap()
is done properly it can tell the user what signal it caught directly,
no need for backtrace().  But backtrace() would be a plus.

catchTrap() needs to be called as a (*sa_sigaction)(), not as an
(*sa_handler)().  All nice and POSIX too.
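
To make the sa_sigaction point concrete, here is a hedged sketch of a three-argument handler installed with SA_SIGINFO. The names (trap_action, install_trap_action, the trap_* variables) are hypothetical and this is not the current catchTrap(). The si_syscall field is Linux-specific, so it is guarded.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Details captured by the handler; the names are illustrative only. */
static volatile sig_atomic_t trap_signo;
static volatile sig_atomic_t trap_code;
static volatile sig_atomic_t trap_syscall = -1;

/* Three-argument handler: info says which signal arrived and why. */
static void trap_action(int signo, siginfo_t *info, void *ucontext)
{
    (void)ucontext;
    trap_signo = signo;
    trap_code = info->si_code;
#ifdef si_syscall
    if (signo == SIGSYS)
        trap_syscall = info->si_syscall;   /* Linux-specific field */
#endif
}

/* Install trap_action for signo through sa_sigaction, not sa_handler. */
static int install_trap_action(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = trap_action;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

With install_trap_action(SIGSYS) in place, the handler has the signal number, si_code, and on Linux the offending syscall number, which is most of what the user needs to see even without a backtrace.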

> > 05-31T13:10:38 ntpd[7311]: sandbox: seccomp enabled.
> > Bad system call
> > So the code that is supposed to catch that is not really working.
> > No backtrace will work until I can actually catch the bad call.   
> 
> You might try using gdb with a break at catchTrap.  It gets confused
> if the log stuff gets another trap when it tries to print the "got a
> trap" message.

But the point is to NOT have to use gdb.  By the time catchTrap() is
called way too much information is already lost.  I'm gonna have
to rewrite catchTrap() and how it is triggered.

It seems like I'm digging a deep hole, but at least nothing is
collapsing yet.

> There is something strange going on that I don't understand yet.

A lot of strange things, and it just keeps getting worse the deeper I
dig.

What I'm seeing is that working inside the caught signal makes it
trivial to create further signals.  strace is showing me the signal
recursion, but I have no idea how to fix it:

recvmsg(10, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base={{len=0, type=0 /* NLMSG_??? */, flags=0, seq=0, pid=0}}, iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 47
--- SIGSYS {si_signo=SIGSYS, si_code=SYS_SECCOMP, si_call_addr=0x7f068ff7ba60, si_syscall=__NR_recvmsg, si_arch=AUDIT_ARCH_X86_64} ---
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 11
fstat(11, {st_mode=S_IFREG|0644, st_size=215050, ...}) = 0
mmap(NULL, 215050, PROT_READ, MAP_PRIVATE, 11, 0) = 0x7f069175e000
close(11) = 0
open("/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 11
read(11, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200*\0\0\0\0\0\0"..., 832) = 832
fstat(11, {st_mode=S_IFREG|0644, st_size=92528, ...}) = 0
mmap(NULL, 2188336, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 11, 0) = 0x7f068ebe5000
mprotect(0x7f068ebfb000, 2093056, PROT_NONE) = 0
mmap(0x7f068edfa000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 11, 0x15000) = 0x7f068edfa000
close(11) = 0
mprotect(0x7f068edfa000, 4096, PROT_READ) = 0
munmap(0x7f069175e000, 215050) = 0
futex(0x7f069022a0f0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7f068edfb1a0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(2, "05-31T14:16:39 ", 15) = 15
write(2, "ntpd[9543]: ", 12) = 12
write(2, "SIGSYS: got a trap. Probably sec"..., 56) = 56
write(4, "05-31T14:16:39 ntpd[9543]: SIGSY"..., 83) = 83
write(2, "05-31T14:16:39 ", 15) = 15
write(2, "ntpd[9543]: ", 12) = 12
write(2, "SIGSYS: got a trap. Probably sec"..., 56) = 56
write(4, "05-31T14:16:39 ntpd[9543]: SIGSY"..., 83) = 83
write(2, "05-31T14:16:39 ", 15) = 15
write(2, "ntpd[9543]: ", 12) = 12
write(2, "SIGSYS: got a trap. Probably sec"..., 56) = 56
[...]

I have no idea how the SIGSYSes after the first one are happening...
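
Whatever the cause turns out to be, one defensive option is a one-shot handler that does nothing but async-signal-safe work. This is only a sketch under that assumption, not a diagnosis of the recursion. The names (bail_action, install_one_shot, is_default_action, bail_count) are hypothetical. SA_RESETHAND puts the signal back to its default action as soon as the handler is entered, so a repeat should terminate the process instead of running the handler again.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t bail_count;

/* Report with write() only, which is async-signal-safe. */
static void bail_action(int signo, siginfo_t *info, void *ucontext)
{
    static const char msg[] = "caught fatal signal, bailing\n";
    ssize_t r;
    (void)signo;
    (void)info;
    (void)ucontext;
    bail_count++;
    r = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)r;
    /* A real handler would call _exit() here; it is left out so the
     * sketch can be exercised without killing the process. */
}

/* Install bail_action so that it runs at most once for signo. */
static int install_one_shot(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = bail_action;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;  /* back to SIG_DFL on entry */
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Returns 1 if signo is currently set to the default action, 0 if not,
 * -1 on error. */
static int is_default_action(int signo)
{
    struct sigaction cur;
    if (sigaction(signo, NULL, &cur) != 0)
        return -1;
    return cur.sa_handler == SIG_DFL;
}
```

The trade-off is that the normal logging path is not used inside the handler, so the message is a fixed string. That avoids making extra syscalls from signal context that the seccomp filter might also reject.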

>  I'm
> getting troubles with -u 38.  I just tried bisect and I think it
> points to changes that don't make sense.  Maybe my good/bad test case
> isn't working right.  ??? There was another issue last night for a
> seg fault.  Issues 328 and 329.

This backtrace problem is a large one, I've no spare cycles until
I get some progress on it.

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
	gem at rellim.com  Tel:+1 541 382 8588

	    Veritas liberabit vos. -- Quid est veritas?
    "If you can’t measure it, you can’t improve it." - Lord Kelvin