[musl] DNS resolution happening only after timeout
Srinivasa Raghavan
2017-09-28 10:15:28 UTC
Hi,

When using the "Alpine" docker image, which uses musl-libc, we are seeing a
delay when we run operations like the following in our production environment:
1. ping <name>
2. nslookup <name>
3. traceroute <name>
4. http request from node.js

There is a 5-second delay in name resolution, and only then do the above
commands return a response. The same problem does not occur in the "debian"
docker image (which uses GNU libc).

In our case, the lookup output shows a combination of SERVFAIL, a
"canonical name" entry, and "Non-authoritative answer".

Some findings from trial and error:
1. If I install the "bind-tools" package in Alpine, "nslookup" completes
without delay.
2. If I set "options timeout:1" in /etc/resolv.conf, the name is
resolved after 1 second (instead of 5 seconds).
3. No other change to /etc/resolv.conf (such as setting "domain" or
"search") made any difference.
4. The output of the "host" and "nslookup" commands shows "SERVFAIL".
5. The problem does not occur when run from the host machine (outside the
Alpine container).
6. The problem does not occur when run from another container that uses
GNU libc, such as the "debian" image.
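
Finding 2 above can be illustrated with a minimal sketch of how a resolver might read the "options" line. The function name parse_resolv_options is hypothetical, and the defaults of 5 seconds and 2 attempts are assumptions chosen to match the 5-second delay reported here; this is not code from musl or glibc.

```python
def parse_resolv_options(text, timeout=5, attempts=2):
    """Return (timeout, attempts) from resolv.conf-style text.

    The default values are assumptions for this sketch.
    """
    for line in text.splitlines():
        # Drop trailing comments and split the line into fields.
        fields = line.split("#", 1)[0].split()
        if not fields or fields[0] != "options":
            continue
        for opt in fields[1:]:
            name, _, value = opt.partition(":")
            if name == "timeout" and value.isdigit():
                timeout = int(value)
            elif name == "attempts" and value.isdigit():
                attempts = int(value)
            # Any other option name is ignored by this sketch.
    return timeout, attempts


sample = "nameserver 10.0.0.2\noptions timeout:1\n"
print(parse_resolv_options(sample))
```

In this sketch an unrecognized or misspelled option name leaves the defaults in place, so the spelling of "timeout" matters.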

Sample command outputs attached for reference.

Could you please help us debug and resolve this?

Kind Regards,
R. Srinivasa Raghavan.
Szabolcs Nagy
2017-09-28 10:28:55 UTC
Post by Srinivasa Raghavan
When using "Alpine" docker image which uses musl-libc, we are facing delay
when we do operations like below in our production environment,
1. ping <name>
2. nslookup <name>
3. traceroute <name>
4. http request from node.js
this bug may be related:
https://github.com/rancher/rancher/issues/9961
Rich Felker
2017-09-28 16:55:28 UTC
Post by Szabolcs Nagy
Post by Srinivasa Raghavan
When using "Alpine" docker image which uses musl-libc, we are facing delay
when we do operations like below in our production environment,
1. ping <name>
2. nslookup <name>
3. traceroute <name>
4. http request from node.js
https://github.com/rancher/rancher/issues/9961
Yes, I just filed it after reading the discussion on IRC and this bug
report that was linked as describing similar behavior:

https://github.com/rancher/rancher/issues/4177#issuecomment-332571951

This really requires a fix on the rancher-dns side. I'm not sure
exactly what glibc is doing, but it couldn't be giving the behavior
you want without doing something wrong: it's falling back and trying
different search domains when it hasn't been told that the first one
doesn't exist, only that the nameserver is experiencing a problem.

Rich
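
The distinction drawn above, between being told that a name does not exist and being told only that the nameserver has a problem, corresponds to the RCODE field in the DNS message header. The following is a minimal sketch of that classification; the helper names rcode_of and is_conclusive are hypothetical and this is not how musl or glibc is actually written.

```python
import struct

# Standard DNS response codes relevant to this thread.
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}


def rcode_of(response: bytes) -> int:
    """Extract the 4-bit RCODE from a DNS message header."""
    if len(response) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    _msg_id, flags = struct.unpack("!HH", response[:4])
    return flags & 0x000F


def is_conclusive(rcode: int) -> bool:
    # NOERROR and NXDOMAIN say something definite about the name.
    # SERVFAIL only reports trouble on the server side, so a client
    # cannot treat it as "this name does not exist".
    return rcode in (0, 3)


# A hand-built header: QR=1, RD=1, RA=1, RCODE=2 (SERVFAIL).
header = struct.pack("!HHHHHH", 0x1234, 0x8182, 1, 0, 0, 0)
code = rcode_of(header)
print(RCODE_NAMES.get(code, code), is_conclusive(code))
```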
Srinivasa Raghavan
2017-10-04 13:48:10 UTC
Hi Rich,

Thanks for the reply.

Some updates:
1. Our DNS server is "Infoblox appliance".
2. When we had a delay, we found that there was an "AAAA" query along with
the "A" query.

I did further debugging with "tcpdump" and was able to narrow down the
difference in behavior between the "debian" and "alpine" images.

In debian:
If ipv6 is disabled (net.ipv6.conf.default.disable_ipv6 = 1)
Then the "nslookup" (or name resolution) does *not* do an "AAAA" query

In alpine:
If ipv6 is disabled (net.ipv6.conf.default.disable_ipv6 = 1)
Then the "nslookup" (or name resolution) does an "AAAA" query along with
the "A" query

Is this intentional?

Also, I was wondering if there was any way to disable AAAA query in name
resolution?

Kind Regards,
Srinivasa Raghavan.
Markus Wichmann
2017-10-04 16:46:38 UTC
Post by Srinivasa Raghavan
Hi Rich,
Thanks for the reply.
1. Our DNS server is "Infoblox appliance".
2. When we had a delay, we found that there was a "AAAA" query along with
"A" query.
I did further debugging with "tcpdump" and able to narrow down on the
difference in behavior between "debian" and "alpine" images.
If ipv6 is disabled (net.ipv6.conf.default.disable_ipv6 = 1)
Then the "nslookup" (or name resolution) does *not* do a "AAAA" query
That's probably because glibc's DNS resolver only generates AAAA queries
if it can create an IPv6 socket.
Post by Srinivasa Raghavan
If ipv6 is disabled (net.ipv6.conf.default.disable_ipv6 = 1)
Then the "nslookup" (or name resolution) does an "AAAA" query along with
"A" query
Is this intentional?
Also, I was wondering if there was any way to disable AAAA query in name
resolution?
There does not appear to be a way without changing code. In musl, the
function name_from_dns() will always generate both the AAAA and the A
query unless "family" is explicitly set to one of the address families.
No input from resolv.conf or similar is used for this. And "family"
comes directly from the caller, i.e. nslookup. You'd have to change the
nslookup code to only ask for IPv4 addresses.
Ciao,
Markus
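
As a rough illustration of the point that "family" comes from the caller: an application can ask getaddrinfo() for IPv4 results only by passing AF_INET. The sketch below uses Python's socket.getaddrinfo, which wraps the C library's getaddrinfo(); the helper name resolve is hypothetical. Which DNS queries actually go out on the wire is decided by the libc, so based on the description above one would expect only an "A" query under musl, but that is an expectation and not something this snippet verifies.

```python
import socket


def resolve(host, family=socket.AF_UNSPEC):
    """Return the addresses getaddrinfo() reports for host.

    AF_UNSPEC asks for any address family; AF_INET restricts the
    request to IPv4 and AF_INET6 to IPv6.
    """
    infos = socket.getaddrinfo(host, None, family=family,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    return [info[4][0] for info in infos]


# A numeric address keeps this example independent of any nameserver.
print(resolve("127.0.0.1", socket.AF_INET))
```

An application would call resolve("some.host.name", socket.AF_INET) in the same way; the host name here is only a placeholder.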
Srinivasa Raghavan
2017-10-04 19:28:35 UTC
Hi Markus,

Thanks for the reply.

The problem is not only in nslookup, it is there in ping, tracert, curl,
node.js, wget etc. :(

I will debug and find the exact C API that is used for each of the
scenarios.

I am just wondering if there is any workaround?

A lot of folks are facing this issue (slow DNS name resolution in Alpine
Linux, with some DNS servers), and this may be the root cause?

Kind Regards,
Rsr
Rich Felker
2017-10-04 20:18:50 UTC
Post by Srinivasa Raghavan
I am just wondering if there is any workaround?
A lot of folks are facing this issue (slow DNS name resolution in Alpine
Linux, with some DNS servers), and this may be the root cause?
musl does not have any way to suppress applications' requests for IPv6
lookups. In theory if an application used the AI_ADDRCONFIG option to
request "only give IPv6 results if IPv6 is supported" we could do it,
but there are multiple reasons this hasn't been implemented including
ambiguity as to how exactly it should behave, and I doubt it would
help anyway since most applications don't use this option.

From the info you've provided so far, my best guess is that you have a
buggy nameserver that either stalls or replies with a non-conclusive
message like ServFail when it receives an AAAA query. If this is the
case, there are a few options:

1. If the nameserver is on a device under your control, see if there's
an upgrade/patch to fix the issue.
2. Switch to a different nameserver without the bug like the public
Google ones at 8.8.8.8 etc.
3. Run your own caching/proxy nameserver on localhost and configure it
to reply NxDomain (does not exist) for all AAAA lookups.
4. Use iptables to catch DNS query packets for AAAA records and
redirect them to a dummy server that just always replies with
NxDomain.

Without knowing more about your environment I can't really guess which
ones of these options, if any, might be practical for you but
hopefully at least one is.

Rich
Srinivasa Raghavan
2017-10-04 20:39:48 UTC
Hi Rich,
Thanks for your time and reply.
Will try to get the DNS fixed.
Kind Regards,
R. Srinivasa Raghavan.
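
Options 3 and 4 in Rich's list both rely on something answering AAAA queries with NxDomain. The sketch below shows only the packet-level part of that idea: recognizing an AAAA question and building an NxDomain reply from the query bytes. All function names are hypothetical, the code assumes a query with exactly one uncompressed question, and it is not a complete nameserver (no sockets, no forwarding of other query types).

```python
import struct

QTYPE_A, QTYPE_AAAA = 1, 28


def build_query(name, qtype, query_id=0x1234):
    """Build a minimal DNS query (one question, recursion desired)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(bytes([len(part)]) + part.encode("ascii")
                      for part in name.rstrip(".").split("."))
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def question_end(query):
    """Return the offset just past the first question section."""
    pos = 12                      # the header is 12 bytes
    while query[pos] != 0:        # walk the length-prefixed labels
        pos += 1 + query[pos]
    return pos + 5                # root label + QTYPE + QCLASS


def question_type(query):
    end = question_end(query)
    (qtype,) = struct.unpack("!H", query[end - 4:end - 2])
    return qtype


def nxdomain_reply(query):
    """Echo the question back with QR=1, RA=1 and RCODE=3."""
    query_id, flags = struct.unpack("!HH", query[:4])
    # Keep the opcode and RD bits from the query.
    reply_flags = 0x8000 | (flags & 0x7900) | 0x0080 | 0x0003
    header = struct.pack("!HHHHHH", query_id, reply_flags, 1, 0, 0, 0)
    return header + query[12:question_end(query)]


query = build_query("example.com", QTYPE_AAAA)
if question_type(query) == QTYPE_AAAA:
    reply = nxdomain_reply(query)
    print(len(query), len(reply))
```

A dummy server along the lines of option 4 would wrap nxdomain_reply in a UDP receive/send loop; that part is left out here.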