Troubleshooting Dynamic Host Configuration Protocol (DHCP) Connection in LXD, Part 1: The Dnsmasq Server

While testing LXD, the GNU/Linux container hypervisor, one of the issue I've encountered was certain containers failed to obtain an IP address after booting up. Hence, for the past few days, while scratching my head investigating the issue, I've gained some understanding on how DHCP works and learned a few tricks on how to troubleshoot a DHCP connection.

DHCP is a client/server where the client obtain an IP address from the server. Thus, to troubleshoot any connection issue, we should look in two places, the server and the client side.

Is Dnsmasq up and running?
First, the server end. As I mentioned in my previous post, in LXD, the lxcbr0 bridge interface is basically a virtual switch, through Dnsmasq, provides network infrastructures services like Domain Name System (DNS) and DHCP services. If DHCP is not working, first things to check whether the Dnsmasq has been started correctly. Pay attention to all lines that contains the word 'dnsmasq' and check for any errors.
$ sudo systemctl status lxc-net -l
● lxc-net.service - LXC network bridge setup
   Loaded: loaded (/usr/lib/systemd/system/lxc-net.service; enabled; vendor preset: disabled)
   Active: active (exited) since Wed 2015-11-18 21:04:24 MYT; 1s ago
  Process: 21863 ExecStop=/usr/libexec/lxc/lxc-net stop (code=exited, status=0/SUCCESS)
  Process: 21891 ExecStart=/usr/libexec/lxc/lxc-net start (code=exited, status=0/SUCCESS)
 Main PID: 21891 (code=exited, status=0/SUCCESS)
   Memory: 408.0K
      CPU: 39ms
   CGroup: /system.slice/lxc-net.service
           └─21935 dnsmasq -u nobody --strict-order --bind-interfaces --pid-file=/run/lxc/dnsmasq.pid --listen-address 10.0.3.1 --dhcp-range 10.0.3.2,10.0.3.254 --dhcp-lease-max=253 --dhcp-no-override --except-interface=lo --interface=lxcbr0 --dhcp-leasefile=/var/lib/misc/dnsmasq.lxcbr0.leases --dhcp-authoritative

Nov 18 21:04:24 localhost.localdomain dnsmasq[21935]: started, version 2.75 cachesize 150
Nov 18 21:04:24 localhost.localdomain dnsmasq[21935]: compile time options: IPv6 GNU-getopt DBus no-i18n IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth DNSSEC loop-detect inotify
Nov 18 21:04:24 localhost.localdomain dnsmasq-dhcp[21935]: DHCP, IP range 10.0.3.2 -- 10.0.3.254, lease time 1h
Nov 18 21:04:24 localhost.localdomain dnsmasq-dhcp[21935]: DHCP, sockets bound exclusively to interface lxcbr0
Nov 18 21:04:24 localhost.localdomain dnsmasq[21935]: reading /etc/resolv.conf
Nov 18 21:04:24 localhost.localdomain dnsmasq[21935]: using nameserver 192.168.42.1#53
Nov 18 21:04:24 localhost.localdomain dnsmasq[21935]: read /etc/hosts - 2 addresses
Nov 18 21:04:24 localhost.localdomain systemd[1]: Started LXC network bridge setup.

As LXD is still actively under development, there are still many pending issues, you may want to walk through the '/usr/libexec/lxc/lxc-net' script to investigate more. Although from my experience, is simple service restart 'systemctl restart lxc-net' should be sufficient.

Failed to create listening socket?
Few days back, one of the issue I've experienced is that the Dnsmasq server failed to start due to failure in creating listening socket.
......
Nov 14 20:43:18 localhost.localdomain systemd[1]: Starting LXC network bridge setup...
Nov 14 20:43:18 localhost.localdomain lxc-net[24314]: dnsmasq: failed to create listening socket for 10.0.3.1: Cannot assign requested address
Nov 14 20:43:18 localhost.localdomain dnsmasq[24347]: failed to create listening socket for 10.0.3.1: Cannot assign requested address
Nov 14 20:43:18 localhost.localdomain dnsmasq[24347]: FAILED to start up
Nov 14 20:43:18 localhost.localdomain lxc-net[24314]: Failed to setup lxc-net.
Nov 14 20:43:18 localhost.localdomain systemd[1]: Started LXC network bridge setup.
......

Alternately, you can also check through the Systemd journal log.
$ journalctl -u lxc-net.service 
$ journalctl -u lxc-net.service | grep -i 'failed to'

The question we should raise when looking into this error is which other process is trying to bind to port 53, the default DNS port. There are several ways ways to check this.

Are there any other running Dnsmasq instances? Note that output was formatted to improve readability. Besides the one started by lxc-net service. The other two instances were created by libvirt and vagrant-libvirt.
$ ps -o pid,cmd -C dnsmasq
  PID CMD
 2851 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/vagrant-libvirt.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper

 2852 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/vagrant-libvirt.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper

 2933 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper

 2934 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper

21935 dnsmasq -u nobody --strict-order --bind-interfaces --pid-file=/run/lxc/dnsmasq.pid --listen-address 10.0.3.1 --dhcp-range 10.0.3.2,10.0.3.254 --dhcp-lease-max=253 --dhcp-no-override --except-interface=lo --interface=lxcbr0 --dhcp-leasefile=/var/lib/misc/dnsmasq.lxcbr0.leases --dhcp-authoritative

Is there any process currently listening to port 53 using the same IP address of 10.0.3.1?
$ sudo netstat -anp | grep :53 | grep LISTEN
tcp        0      0 10.0.3.1:53             0.0.0.0:*               LISTEN      21935/dnsmasq       
tcp        0      0 192.168.124.1:53        0.0.0.0:*               LISTEN      2933/dnsmasq        
tcp        0      0 192.168.121.1:53        0.0.0.0:*               LISTEN      2851/dnsmasq        
tcp6       0      0 fe80::fc7b:93ff:fe7a:53 :::*                    LISTEN      21935/dnsmasq   

For my case, which I didn't manage to capture the output, is that another orphaned Dnsmasq instance preventing the 'lxc-net' service from launching a new Dnsmasq instance on lxcbr0 interface. If I remember correctly, this was due to the left over instances by me while debugging the '/usr/libexec/lxc/lxc-net' script.

No comments:

Post a Comment