The end of the beginning is in sight

Eric S. Raymond esr at thyrsus.com
Sat Jan 7 04:43:01 UTC 2017


(mutt hiccupped. You might see this twice.)

Kurt Roeckx <kurt at roeckx.be>:
> On Fri, Jan 06, 2017 at 12:14:35PM -0500, Eric S. Raymond wrote:
> > and the other is ripping out all
> > the interface-scanning stuff so we lose the dependency on
> > getifaddrs(3) and use wildcard interfaces only.
> 
> Are you sure this is going to work? As far as I know there are (or
> were) good reasons to do this, but I can't remember them
> currently. But it's at least something that's specific to UDP.

If you can remember a blocker, please tell us about it before I
put a large amount of work into this.

A little Googling found this from 2012:

https://blog.powerdns.com/2012/10/08/on-binding-datagram-udp-sockets-to-the-any-addresses/

Enter the very powerful recvmsg(2). Recvmsg() can return a boatload
of parameters per datagram, as requested via setsockopt().

One of the parameters we can request is the original destination IP
address of the packet.

IPv6

For IPv6, this is actually standardized in RFC 3542, which tells us to
request parameter IPV6_RECVPKTINFO via setsockopt(), which will lead
to the delivery of the IPV6_PKTINFO parameter when we use recvmsg(2).

This parameter is sent to us as a struct in6_pktinfo, and its
ipi6_addr member contains the original destination IPv6 address of the
query.
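
The receive side described above might be sketched like this. The
helper names enable_ipv6_pktinfo() and ipv6_dst_from_msg() are made up
for illustration and come from neither PowerDNS nor NTPsec; the
_GNU_SOURCE define is an assumption about glibc, which only exposes
struct in6_pktinfo when it is set.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* assumption: glibc hides in6_pktinfo otherwise */
#endif
#ifndef __APPLE_USE_RFC_3542
#define __APPLE_USE_RFC_3542    /* needed on OSX, harmless elsewhere */
#endif
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Ask the kernel to attach IPV6_PKTINFO to each datagram received on fd. */
int enable_ipv6_pktinfo(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_IPV6, IPV6_RECVPKTINFO, &on, sizeof on);
}

/* Walk the ancillary data that recvmsg(2) left in msg and copy out the
 * original destination address.  Returns 0 on success, or -1 if no
 * IPV6_PKTINFO item is present. */
int ipv6_dst_from_msg(struct msghdr *msg, struct in6_addr *dst)
{
    struct cmsghdr *c;

    assert(msg != NULL && dst != NULL);
    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IPV6 && c->cmsg_type == IPV6_PKTINFO) {
            struct in6_pktinfo pi;
            memcpy(&pi, CMSG_DATA(c), sizeof pi);   /* avoid alignment traps */
            *dst = pi.ipi6_addr;
            return 0;
        }
    }
    return -1;
}
```

The caller is expected to point msg_control at a buffer of at least
CMSG_SPACE(sizeof(struct in6_pktinfo)) bytes before calling recvmsg().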

When replying to a packet from a socket bound to ::, we have the
reverse problem: how to specify which *source* address to use. To do
so, use sendmsg(2) and specify an IPV6_PKTINFO parameter, which again
contains a struct in6_pktinfo.

And we are done!
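
For the reply path, a minimal sketch of building the IPV6_PKTINFO
ancillary item might look like the following. set_ipv6_src() is a
hypothetical helper name, and leaving ipi6_ifindex at zero assumes you
are happy to let the kernel choose the outgoing interface.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* assumption: glibc hides in6_pktinfo otherwise */
#endif
#ifndef __APPLE_USE_RFC_3542
#define __APPLE_USE_RFC_3542    /* needed on OSX, harmless elsewhere */
#endif
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Attach an IPV6_PKTINFO item naming src as the source address to msg.
 * cbuf must be suitably aligned for struct cmsghdr and at least
 * CMSG_SPACE(sizeof(struct in6_pktinfo)) bytes long.
 * Returns 0 on success, -1 if the buffer is too small. */
int set_ipv6_src(struct msghdr *msg, void *cbuf, size_t cbuflen,
                 const struct in6_addr *src)
{
    struct in6_pktinfo pi;
    struct cmsghdr *c;

    assert(msg != NULL && cbuf != NULL && src != NULL);
    if (cbuflen < CMSG_SPACE(sizeof pi))
        return -1;

    memset(cbuf, 0, CMSG_SPACE(sizeof pi));
    msg->msg_control = cbuf;
    msg->msg_controllen = CMSG_SPACE(sizeof pi);

    c = CMSG_FIRSTHDR(msg);
    c->cmsg_level = IPPROTO_IPV6;
    c->cmsg_type = IPV6_PKTINFO;
    c->cmsg_len = CMSG_LEN(sizeof pi);

    memset(&pi, 0, sizeof pi);      /* ipi6_ifindex 0: kernel picks interface */
    pi.ipi6_addr = *src;
    memcpy(CMSG_DATA(c), &pi, sizeof pi);
    return 0;
}
```

After this, fill in msg_name and msg_iov as usual and hand the msghdr
to sendmsg().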

To get this to work on OSX, please #define __APPLE_USE_RFC_3542, but
otherwise this feature is portable across FreeBSD, OSX and
Linux. (Please let me know about Windows, I want to make this page as
valuable as possible).

IPv4

For IPv4 the situation is more complicated. Linux and the BSDs picked
a slightly different way to do things, since they did not have an RFC
to guide them. Confusingly, the Linux manpages document this
incorrectly (I’ll submit a patch to the manpages as soon as everybody
agrees that this page describes things correctly).

For BSD, use a setsockopt() called IP_RECVDSTADDR to request the
original destination address. This then arrives as an IP_RECVDSTADDR
option over recvmsg(), which carries a struct in_addr. That struct
holds only the address, so you do NOT get other details such as the
destination port number.
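
A hedged sketch of the BSD variant follows. The function names are
invented for illustration. Because IP_RECVDSTADDR is not defined on
Linux, the code is guarded with #ifdef and simply reports failure on
systems that lack it; that fallback is my assumption about what a
portable caller would want, not something the article specifies.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Request the original destination address on a BSD-style stack.
 * Returns 0 on success, -1 on error or if the option does not exist. */
int enable_ipv4_dstaddr(int fd)
{
#ifdef IP_RECVDSTADDR
    int on = 1;
    return setsockopt(fd, IPPROTO_IP, IP_RECVDSTADDR, &on, sizeof on);
#else
    (void)fd;
    errno = ENOPROTOOPT;        /* option not available on this platform */
    return -1;
#endif
}

/* Extract the destination address from the ancillary data in msg.
 * Returns 0 on success, -1 if no IP_RECVDSTADDR item is present. */
int ipv4_dst_from_msg_bsd(struct msghdr *msg, struct in_addr *dst)
{
    assert(msg != NULL && dst != NULL);
#ifdef IP_RECVDSTADDR
    {
        struct cmsghdr *c;
        for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
            if (c->cmsg_level == IPPROTO_IP &&
                c->cmsg_type == IP_RECVDSTADDR) {
                memcpy(dst, CMSG_DATA(c), sizeof *dst);
                return 0;
            }
        }
    }
#endif
    return -1;
}
```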

For Linux, use the setsockopt() called IP_PKTINFO, which will get you
a parameter over recvmsg() called IP_PKTINFO, which carries a struct
in_pktinfo, which has a 4 byte IP address hiding in its ipi_addr
field.
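
On the Linux side this could be sketched as below. Again the helper
names are hypothetical, and the _GNU_SOURCE define is an assumption
made so that glibc exposes struct in_pktinfo even under strict
compiler modes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* assumption: makes glibc expose in_pktinfo */
#endif
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Ask Linux to attach IP_PKTINFO to each datagram received on fd. */
int enable_ipv4_pktinfo(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof on);
}

/* Copy the original destination address (ipi_addr) out of the ancillary
 * data in msg.  Returns 0 on success, -1 if no IP_PKTINFO item is there. */
int ipv4_dst_from_msg(struct msghdr *msg, struct in_addr *dst)
{
    struct cmsghdr *c;

    assert(msg != NULL && dst != NULL);
    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            struct in_pktinfo pi;
            memcpy(&pi, CMSG_DATA(c), sizeof pi);
            *dst = pi.ipi_addr;
            return 0;
        }
    }
    return -1;
}
```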

Conversely, for sending on Linux, pass an IP_PKTINFO parameter using
sendmsg() and make it contain a struct in_pktinfo.
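
A minimal sketch of the Linux send side, with a made-up helper name
set_ipv4_src(). Per ip(7), the member that selects the local source
address on output is ipi_spec_dst, so that is the one filled in here;
ipi_ifindex is left at zero on the assumption that the kernel should
pick the interface.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* assumption: makes glibc expose in_pktinfo */
#endif
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Attach an IP_PKTINFO item naming src as the source address to msg.
 * cbuf must be suitably aligned for struct cmsghdr and at least
 * CMSG_SPACE(sizeof(struct in_pktinfo)) bytes long.
 * Returns 0 on success, -1 if the buffer is too small. */
int set_ipv4_src(struct msghdr *msg, void *cbuf, size_t cbuflen,
                 struct in_addr src)
{
    struct in_pktinfo pi;
    struct cmsghdr *c;

    assert(msg != NULL && cbuf != NULL);
    if (cbuflen < CMSG_SPACE(sizeof pi))
        return -1;

    memset(cbuf, 0, CMSG_SPACE(sizeof pi));
    msg->msg_control = cbuf;
    msg->msg_controllen = CMSG_SPACE(sizeof pi);

    c = CMSG_FIRSTHDR(msg);
    c->cmsg_level = IPPROTO_IP;
    c->cmsg_type = IP_PKTINFO;
    c->cmsg_len = CMSG_LEN(sizeof pi);

    memset(&pi, 0, sizeof pi);      /* ipi_ifindex 0: kernel picks interface */
    pi.ipi_spec_dst = src;          /* local source address for this packet */
    memcpy(CMSG_DATA(c), &pi, sizeof pi);
    return 0;
}
```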

On FreeBSD, pass the IP_SENDSRCADDR option, and make it contain a
struct in_addr. Again this carries only an address: there is no source
port to set, as your socket is bound to exactly one port number (even
if it covers many IP addresses).

BINDING TO :: FOR IPV6 *AND* IPV4 PURPOSES

On Linux, one can bind to :: and get packets destined for both IPv6
and IPv4. The good news is that this combines well with the above, and
Linux delivers an IPv4 IP_PKTINFO for IPv4 packets, and will also
honour the IP_PKTINFO for outgoing IPv4 packets on such a combined
IPv4/IPv6 socket.

On FreeBSD, and probably other BSD-derived systems, one should bind
explicitly to :: and 0.0.0.0 to cover IPv4 and IPv6. This is probably
better. To get this behaviour on Linux, use the setsockopt()
IPV6_V6ONLY, or set /proc/sys/net/ipv6/bindv6only to 1.
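
To illustrate the IPV6_V6ONLY route, here is a small sketch that
creates a UDP socket on :: which carries IPv6 only, leaving IPv4 to a
separate socket bound to 0.0.0.0. The function name bind_udp6_only()
is hypothetical.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Create a UDP socket bound to [::]:port that refuses IPv4-mapped
 * traffic.  Returns the descriptor, or -1 on any failure. */
int bind_udp6_only(unsigned short port)
{
    struct sockaddr_in6 sa;
    int on = 1;
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;

    /* Must be set before bind() for it to take effect. */
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof on) < 0)
        goto fail;

    memset(&sa, 0, sizeof sa);
    sa.sin6_family = AF_INET6;
    sa.sin6_addr = in6addr_any;
    sa.sin6_port = htons(port);
    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0)
        goto fail;

    return fd;

fail:
    close(fd);
    return -1;
}
```

Passing port 0 lets the kernel choose an ephemeral port, which is
handy for experiments; a real server would pass its service port.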

ACTUAL SOURCE CODE

To see all this in action, head over to
http://wiki.powerdns.com/trac/browser/trunk/pdns/pdns/nameserver.cc –
it contains the relevant setsockopt(), sendmsg() and recvmsg() calls.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

