Back in 2012 there was a discussion about having Fedora default to using a local DNS caching name server [1]:
[1] http://comments.gmane.org/gmane.linux.redhat.fedora.devel/166018
I think this needs to be revisited. While DNSSEC support has historically been a driving factor for implementing this, there is an even more fundamental need due to the poor performance of the system in case the first listed nameserver in /etc/resolv.conf fails for some reason. It is shameful that Linux systems and applications in general still, after 20+ years, can't perform adequately after a primary DNS server failure. The stub resolver in glibc which uses /etc/resolv.conf can decide that the first listed nameserver entry is down, but this decision has to be made over and over in every single process on the system that is doing DNS resolution, resulting in repeated long application hangs/delays. We need an independent, system-wide DNS cache, and always point resolv.conf to 127.0.0.1 to solve this fundamental design problem with how name resolution works on a Linux system. Windows has had a default system-wide DNS cache for over a decade. It is about time that Linux catches up.
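A back-of-the-envelope model of the failure mode described above. It assumes the default per-try timeout of 5 seconds documented in resolv.conf(5). It also assumes, as a simplification, that every lookup in every process starts with the dead first nameserver, while a shared cache pays that timeout once and then remembers it. The numbers are illustrative, not measurements, and the function names are hypothetical.

```python
# Toy model only: all numbers are illustrative.
# resolv.conf(5) documents a default per-try timeout of 5 seconds.
TIMEOUT_S = 5.0

def stub_delay(lookups_per_process, timeout_s=TIMEOUT_S):
    """Each process keeps its own stub-resolver state, so (in this simplified
    model) every lookup first waits out the timeout on the dead nameserver."""
    return sum(lookups_per_process) * timeout_s

def shared_cache_delay(lookups_per_process, timeout_s=TIMEOUT_S):
    """One system-wide daemon pays the timeout once, remembers that the
    upstream is dead, and sends every later query (from any process) elsewhere."""
    return timeout_s if sum(lookups_per_process) > 0 else 0.0

# Three processes doing 4, 10 and 1 lookups respectively.
print(stub_delay([4, 10, 1]), shared_cache_delay([4, 10, 1]))  # 75.0 5.0
```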
Yesterday, a new version of dnsmasq was released [2] that adds full DNSSEC support and provides an alternative to unbound, which dnssec-trigger requires. There has also been great work done to solve the NTP/DNSSEC bootstrap problem [3]. What options are currently available in e.g. NetworkManager for using a local DNS cache, and what is the current status of this integration? Is it ready yet for turning on by default in all Fedora products?
[2] http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2014q2/008416.html
[3] http://comments.gmane.org/gmane.comp.embedded.cerowrt.devel/2244
On Thu, Apr 10, 2014 at 9:41 AM, Chuck Anderson cra@wpi.edu wrote:
Back in 2012 there was a discussion about having Fedora default to using a local DNS caching name server [1]:
...
repeated long application hangs/delays. We need an independent, system-wide DNS cache, and always point resolv.conf to 127.0.0.1 to
I don't think pointing resolv.conf at 127.0.0.1 is the right answer for this. The functionality should be implemented as a 'hosts' service to be listed in nsswitch.conf between files and dns.
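A minimal sketch of what this alternative means mechanically: leave resolv.conf alone and add a caching service to the `hosts:` line of nsswitch.conf, immediately before `dns`. The service name `localcache` is hypothetical; it stands in for whatever NSS module would implement the cache, and no such module is implied to exist.

```python
def insert_hosts_service(nsswitch_text: str, service: str) -> str:
    """Return nsswitch.conf text with `service` placed immediately before
    `dns` on the `hosts:` line. All other lines are returned unchanged."""
    out = []
    for line in nsswitch_text.splitlines():
        key, sep, rest = line.partition(":")
        # Commented-out lines ("#hosts: ...") do not match because of the '#'.
        if sep and key.strip() == "hosts":
            sources = rest.split()
            if service not in sources and "dns" in sources:
                sources.insert(sources.index("dns"), service)
            line = "hosts: " + " ".join(sources)
        out.append(line)
    return "\n".join(out) + "\n"

# "localcache" is a hypothetical NSS module name used only for illustration.
print(insert_hosts_service("passwd: files\nhosts: files dns\n", "localcache"), end="")
```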
On Thu, 10 Apr 2014, Billy Crook wrote:
I don't think pointing resolv.conf at 127.0.0.1 is the right answer for this. The functionality should be implemented as a 'hosts' service to be listed in nsswitch.conf between files and dns.
For security reasons, you really want resolv.conf to only point to 127.0.0.1. Otherwise applications cannot determine the security of the DNSSEC answers without doing full validation inside every application themselves.
See the recent discussions on the DANE mailing list regarding the AD bit:
http://www.ietf.org/mail-archive/web/dane/current/maillist.html
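The argument above reduces to a checkable property of resolv.conf: every listed nameserver must be the local validating resolver. Otherwise an application cannot tell whether an AD bit it sees came from a trusted validator. A minimal sketch of that check, assuming "local" means a loopback address; the function names are hypothetical.

```python
import ipaddress

def nameservers(resolv_conf: str) -> list[str]:
    """Return the addresses from `nameserver` lines, treating lines that
    begin with '#' or ';' as comments (see resolv.conf(5))."""
    servers = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        if not fields or fields[0][0] in "#;":
            continue
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers

def only_loopback(resolv_conf: str) -> bool:
    """True if at least one nameserver is listed and every one is a loopback
    address (127.0.0.0/8 or ::1). Raises ValueError on a malformed address."""
    servers = nameservers(resolv_conf)
    return bool(servers) and all(ipaddress.ip_address(s).is_loopback for s in servers)

print(only_loopback("nameserver 127.0.0.1\n"))                          # True
print(only_loopback("nameserver 127.0.0.1\nnameserver 192.0.2.53\n"))   # False
```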
Paul
On Thu, 10 Apr 2014, Chuck Anderson wrote:
Yesterday, a new version of dnsmasq was released [2] that adds full DNSSEC support and provides an alternative to unbound, which dnssec-trigger requires. There has also been great work done to solve the NTP/DNSSEC bootstrap problem [3]. What options are currently available in e.g. NetworkManager for using a local DNS cache, and what is the current status of this integration? Is it ready yet for turning on by default in all Fedora products?
[2] http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2014q2/008416.html
[3] http://comments.gmane.org/gmane.comp.embedded.cerowrt.devel/2244
In my opinion, the last remaining hurdles are roaming users and captive portals. Nothing else prevents anyone from running with DNSSEC enabled by default (and even on laptops, developers can run with dnssec-triggerd and unbound). If you run a server, you should already be running unbound or bind (or dnsmasq with DNSSEC).
I do not know if dnsmasq contains the proper code for being reconfigured on the fly based on network changes, which is a requirement for adapting to DNSSEC in various situations (such as VPNs, connecting to a corporate LAN/Wifi with internal-only domains, and DHCP/ISP forwarder failures). But libreswan, vpnc and openvpn already have the necessary unbound reconfiguration code to properly support VPNs (e.g. flushing the cache for the VPN domain when (dis)connecting the VPN).
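The VPN behaviour described above (per-domain forwarders, plus flushing the cache for the VPN domain on connect and disconnect) can be sketched as follows. This is a toy model loosely shaped like unbound forward zones, not the code that libreswan, vpnc or openvpn ship; the class and method names are hypothetical.

```python
class ForwardZones:
    """Per-domain forwarders with longest-suffix matching, plus a toy answer
    cache that is flushed for a domain whenever its forwarders change."""

    def __init__(self, default_forwarders):
        self.default = list(default_forwarders)
        self.zones = {}   # domain -> list of forwarder addresses
        self.cache = {}   # normalized query name -> cached answer

    @staticmethod
    def _norm(name):
        # DNS names compare case-insensitively; drop the trailing root dot.
        return name.lower().rstrip(".")

    @staticmethod
    def _in_domain(qname, domain):
        return qname == domain or qname.endswith("." + domain)

    def cache_answer(self, qname, answer):
        self.cache[self._norm(qname)] = answer

    def forwarders_for(self, qname):
        qname = self._norm(qname)
        matches = [d for d in self.zones if self._in_domain(qname, d)]
        if not matches:
            return self.default
        return self.zones[max(matches, key=len)]  # most specific zone wins

    def _flush(self, domain):
        for name in [n for n in self.cache if self._in_domain(n, domain)]:
            del self.cache[name]

    def vpn_up(self, domain, forwarders):
        domain = self._norm(domain)
        self.zones[domain] = list(forwarders)
        self._flush(domain)  # drop answers obtained from the public forwarders

    def vpn_down(self, domain):
        domain = self._norm(domain)
        self.zones.pop(domain, None)
        self._flush(domain)  # drop answers only the VPN servers could give
```

On connect, names under the VPN domain go to the VPN's servers and stale public answers for that domain are discarded; everything else keeps its cached answers.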
What is really needed to complete the solution and make it usable for users, not developers, is the proper integration of dnssec-trigger-like code natively into NM. That must include captive portal checking (which dnssec-triggerd does) and an upstream DNS forwarder checker, with dynamic reconfiguration on the fly. The current dnssec-trigger does a decent job, but it's problematic because it is not native to NM. Fedora already hosts the captive portal detection services for dnssec-trigger.
It would also need a little Anaconda support. When the user is asked to put in a DNS server (for a static server configuration, not DHCP), Anaconda should configure that server as a _forwarder_ in an unbound configuration and ensure DNSSEC is enabled and running after install. For the roaming laptop case, Anaconda should install unbound with the NM-integrated dnssec-trigger replacement.
In a perfect world, where all network applications are NM aware, it would be awesome if NM would launch a secure container to do the probing and captive portal logon using a sandboxed browser window. None of the other applications would even know there is a network. Once the captive portal login has succeeded, the uplink becomes available to all apps and the secure container can be destroyed. This would offer the maximum protection for applications against unsafe DNS, and limit the required acceptance of "DNS lies" to the captive portal login procedure only.
The main issue to make this happen has been that we don't have enough resources to convert the dnssec-trigger code into native NM code.
Paul
Hello Chuck,
Thank you so much for bringing this up.
On Thursday, 10 April 2014 8:12 PM, Chuck Anderson wrote: I think this needs to be revisited. We need an independent, system-wide DNS cache, and always point resolv.conf to 127.0.0.1 to solve this fundamental design problem with how name resolution works on a Linux system.
Totally agree. In fact, there have recently been multiple discussions of this exact topic, and everyone unanimously agreed that, for various reasons, having a local DNS resolver running by default at 127.0.0.1:53 is the best solution. And going forward it'll be even more beneficial.
Paul pointed to one of these discussions in his reply.
I plan to file a feature/change request for this one. I got caught up with other work this past week so could not do it. Will start with it right away.
Thank you! --- Regards -Prasad http://feedmug.com
Hello,
On Thursday, 10 April 2014 11:39 PM, P J P wrote: I plan to file a feature/change request for this one. I got caught up with other work this past week so could not do it. Will start with it right away.
Please see -> https://fedoraproject.org/wiki/Changes/Default_Local_DNS_Resolver
It's a System Wide Change Proposal request up for review.
I have set the target release as F22, because the proposal deadline for F21 was 08 Apr 2014 [1]. Besides, this change would require significant work on the related packages like NetworkManager etc. So F22 seems safer.
If you spot any discrepancies or have additional input or links to relevant documents, please feel free to update the wiki page, or let me know and I'll add it there.
[1] https://fedoraproject.org/wiki/Releases/21/Schedule
Thank you. --- Regards -Prasad http://feedmug.com
On Sat, Apr 12, 2014 at 02:33:59 +0800, P J P pj.pandit@yahoo.co.in wrote:
Please see -> https://fedoraproject.org/wiki/Changes/Default_Local_DNS_Resolver
It's a System Wide Change Proposal request up for review.
I think there should be something explicitly about how this is going to work with captive portals that lie about dns in order to get people's web browsers to go to their sign in page.
On Saturday, 12 April 2014 12:28 AM, Bruno Wolff III wrote: I think there should be something explicitly about how this is going to work with captive portals that lie about dns in order to get people's web browsers to go to their sign in page.
Sorry, I did not get the question. Could you please explain it a bit?
Thank you. --- Regards -Prasad http://feedmug.com
On Sat, Apr 12, 2014 at 03:06:17 +0800, P J P pj.pandit@yahoo.co.in wrote:
On Saturday, 12 April 2014 12:28 AM, Bruno Wolff III wrote: I think there should be something explicitly about how this is going to work with captive portals that lie about dns in order to get people's web browsers to go to their sign in page.
Sorry, I did not get the question. Could you please explain it a bit?
It looks like your proposal is going to break things for people using some wifi hotspots. That isn't good. What is the plan to deal with that?
On Saturday, 12 April 2014 12:40 AM, Bruno Wolff III wrote: It looks like your proposal is going to break things for people using some wifi hotspots.
Why, how?
--- Regards -Prasad http://feedmug.com
On 04/11/2014 03:14 PM, P J P wrote:
On Saturday, 12 April 2014 12:40 AM, Bruno Wolff III wrote: It looks like your proposal is going to break things for people using some wifi hotspots.
Why, how?
It's a hack designed to handle someone that just connected to the network and opened a browser, say. Instead of blocking access, one runs a fake DNS system that responds with the captive portal's IP to every query. The httpd service at that IP responds with an "enter your credentials to get network access" page to all URLs.
An example of such a fake DNS server is the following code, which resolves all A queries to 192.168.123.45 (other query types get NXDOMAIN):
#!/usr/bin/perl
use strict;
use warnings;
use Net::DNS::Nameserver;

# Answer every A query with the portal's address; NXDOMAIN for anything else.
sub reply_handler {
    my ($qname, $qclass, $qtype, $peerhost) = @_;
    my ($rcode, @ans, @auth, @add);

    if ($qtype eq "A") {
        my ($ttl, $rdata) = (3600, "192.168.123.45");
        push @ans, Net::DNS::RR->new("$qname $ttl $qclass $qtype $rdata");
        $rcode = "NOERROR";
    } else {
        $rcode = "NXDOMAIN";
    }

    # Mark the answer as authoritative by setting the 'aa' flag.
    # The answer/authority/additional sections are returned as array refs.
    return ($rcode, \@ans, \@auth, \@add, { aa => 1 });
}

my $ns = Net::DNS::Nameserver->new(
    LocalPort    => 53,               # binding to port 53 requires root
    ReplyHandler => \&reply_handler,  # a code reference, not a call
    Verbose      => 0,
) or die "couldn't create nameserver object\n";

$ns->main_loop;
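The other side of the problem is noticing that you are behind such a server. One common heuristic, sketched here and not taken from dnssec-trigger, is to look up a few random names that should not exist: an honest resolver returns NXDOMAIN, while a catch-all server like the one above returns the same address for all of them. `resolve` is injected so the logic can be tested offline; `parent` is assumed to be a domain you know has no wildcard records. All names are hypothetical.

```python
import secrets
import socket

def system_resolve(name):
    """IPv4 addresses for `name` via the system resolver, or [] on failure."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})

def looks_hijacked(resolve, parent, probes=3):
    """Resolve `probes` random, almost certainly nonexistent names under
    `parent`. Return True only if every one resolves and all answers are
    the same single address, as with the catch-all server above."""
    answers = set()
    for _ in range(probes):
        name = f"{secrets.token_hex(8)}.{parent}"
        addrs = resolve(name)
        if not addrs:  # NXDOMAIN (or failure): no sign of a catch-all
            return False
        answers.update(addrs)
    return len(answers) == 1
```

On a live system this would be called as `looks_hijacked(system_resolve, parent)` with a `parent` domain of your choosing.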
On Fri, 11 Apr 2014, Przemek Klosowski wrote:
On 04/11/2014 03:14 PM, P J P wrote:
On Saturday, 12 April 2014 12:40 AM, Bruno Wolff III wrote: It looks like your proposal is going to break things for people using some wifi hotspots.
Why, how?
It's a hack designed to handle someone that just connected to the network and opened a browser, say. Instead of blocking access, one runs a fake DNS system that responds with the captive portal's IP to every query. The httpd service at that IP responds with an "enter your credentials to get network access" page to all URLs.
An example of such a fake DNS server is the following code, which resolves all A queries to 192.168.123.45 (other query types get NXDOMAIN):
yum install dnssec-trigger, start the service, start the applet, then attack yourself and see. That situation is handled fine: you will be given the choice to join the rogue network (insecurely!) or to operate in "cache-only" mode, meaning you can still get DNS answers for items in your cache, but no new items can be retrieved over the network.
Note that dnssec-trigger can reconfigure unbound in various ways to work around DNS blockage, in order of preference:
- Use fully functional ISP-obtained DNS servers as forwarders
- Become a full recursive server and bypass the ISP DNS servers
- Try DNS over TCP port 53 to well-known remote DNS servers configured in dnssec-triggerd.conf as forwarders
- Try DNS over TCP port 443, wrapped in SSL, to well-known remote DNS servers configured in dnssec-triggerd.conf
- Operate from cache only
It will regularly probe to see if network conditions have improved, so that it can go back to a more preferred method.
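The fallback order above amounts to "the first method whose probe succeeds wins, with cache-only as the floor". A sketch of that selection logic, assuming one boolean probe per method. The `Mode` names and probe callables are hypothetical, and dnssec-trigger's real probes (and its C implementation) are not reproduced here.

```python
from enum import Enum
from typing import Callable, Mapping

class Mode(Enum):
    # Definition order mirrors the preference list (most preferred first).
    ISP_FORWARDERS = 1
    FULL_RECURSION = 2
    TCP53_FORWARDERS = 3
    SSL443_FORWARDERS = 4
    CACHE_ONLY = 5

def pick_mode(probes: Mapping[Mode, Callable[[], bool]]) -> Mode:
    """Return the most preferred mode whose probe succeeds. Cache-only is the
    fallback and needs no probe; modes without a probe are skipped."""
    for mode in Mode:
        if mode is Mode.CACHE_ONLY:
            break
        probe = probes.get(mode)
        if probe is not None and probe():
            return mode
    return Mode.CACHE_ONLY

# ISP servers strip DNSSEC data, but direct recursion works:
print(pick_mode({Mode.ISP_FORWARDERS: lambda: False,
                 Mode.FULL_RECURSION: lambda: True}))  # Mode.FULL_RECURSION
```

Calling `pick_mode` again on a timer gives the periodic re-probing: when a more preferred method starts working, it is selected again.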
I've been running this solution on Fedora for about five years now. It works reasonably well, and anyone on this list could surely try it out. Because of the lack of NM integration I would not call it end-user ready yet.
Paul
On Fri, 11 Apr 2014 16:39:34 -0400 (EDT) Paul Wouters paul@nohats.ca wrote:
...snip...
I've been running this solution on Fedora for about five years now. It works reasonably well, and anyone on this list could surely try it out. Because of the lack of NM integration I would not call it end-user ready yet.
Me too. :)
I hope the NM integration will show up at some point. It's really a pretty nice setup.
kevin
On Fri, 2014-04-11 at 14:45 -0600, Kevin Fenzi wrote:
On Fri, 11 Apr 2014 16:39:34 -0400 (EDT) Paul Wouters paul@nohats.ca wrote:
...snip...
I've been running this solution on Fedora for about five years now. It works reasonably well, and anyone on this list could surely try it out. Because of the lack of NM integration I would not call it end-user ready yet.
Me too. :)
I hope the NM integration will show up at some point. It's really a pretty nice setup.
It's getting worked on, slowly but surely. And since DNSSEC has been getting more interest, we've been having a lot more discussions about how NetworkManager can better serve dnssec-trigger and other DNS consumers like it.
Dan
On Fri, 2014-04-11 at 14:45 -0600, Kevin Fenzi wrote:
On Fri, 11 Apr 2014 16:39:34 -0400 (EDT) Paul Wouters paul@nohats.ca wrote:
...snip...
I've been running this solution on Fedora for about five years now. It works reasonably well, and anyone on this list could surely try it out. Because of the lack of NM integration I would not call it end-user ready yet.
Me too. :)
I hope the NM integration will show up at some point. It's really a pretty nice setup.
I am using it successfully too. Only occasionally does unbound seem to get confused; it's not clear when, but it doesn't happen more than twice a month, and systemctl restart unbound.service fixes it.
Simo.
On Fri, 11 Apr 2014, Simo Sorce wrote:
I hope the NM integration will show up at some point. It's really a pretty nice setup.
I am using it successfully too. Only occasionally does unbound seem to get confused; it's not clear when, but it doesn't happen more than twice a month, and systemctl restart unbound.service fixes it.
Next time please run sudo unbound-control list_forwards and cat /etc/resolv.conf and see if that locates the problem?
The one issue I have is that sometimes NM fails to write resolv.conf in insecure mode, and I end up with no resolvers. The other issue is that in insecure mode (which you are not meant to run in other than during signon or with very broken captive portals) the VPN forward is added to unbound, but unbound is bypassed during secure mode, so internal resources are not available.
Paul
On Fri, Apr 11, 2014 at 10:44:31PM -0400, Paul Wouters wrote:
On Fri, 11 Apr 2014, Simo Sorce wrote:
I hope the NM integration will show up at some point. It's really a pretty nice setup.
I am using it successfully too. Only occasionally does unbound seem to get confused; it's not clear when, but it doesn't happen more than twice a month, and systemctl restart unbound.service fixes it.
Next time please run sudo unbound-control list_forwards and cat /etc/resolv.conf and see if that locates the problem?
The one issue I have is that sometimes NM fails to write resolv.conf in insecure mode, and I end up with no resolvers. The other issue is that in insecure mode (which you are not meant to run in other than during signon or with very broken captive portals) the VPN forward is added to unbound, but unbound is bypassed during secure mode, so internal resources are not available.
I'm proposing that /etc/resolv.conf is never re-written under any circumstances. A local caching resolver should ALWAYS be used and resolv.conf should ALWAYS say:
nameserver 127.0.0.1
so that the applications/services don't hang when ONE external server goes down or becomes unreachable.
All the "magic" for secure/insecure modes during NTP bootstrapping or captive portals has to happen inside unbound (or whatever caching resolver/forwarder is eventually chosen) and it should never be bypassed. That way the forwarder can switch to a second, third, etc. upstream resolver without applications noticing that the first one failed. Or if it is a full iterative resolver, it will internally handle failed authoritative nameservers without applications noticing.
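The key property here is that knowledge of a failed upstream lives in one place. A minimal sketch of a forwarder's upstream selection with a hold-down timer, assuming a 60-second default hold-down (an arbitrary, hypothetical value). unbound's own server-selection logic is more elaborate and is not reproduced here.

```python
import time

class UpstreamSelector:
    """Pick the first upstream not currently marked as failed. A reported
    failure marks that upstream down for `holddown_s` seconds, so every
    client of the local resolver benefits from a single timeout."""

    def __init__(self, upstreams, holddown_s=60.0, clock=time.monotonic):
        self.upstreams = list(upstreams)
        self.holddown_s = holddown_s
        self.clock = clock                 # injectable for testing
        self.down_until = {}               # upstream -> time it may be retried

    def choose(self):
        now = self.clock()
        for upstream in self.upstreams:
            if self.down_until.get(upstream, float("-inf")) <= now:
                return upstream
        # Everything is marked down: retry the first one rather than give up.
        return self.upstreams[0]

    def report_failure(self, upstream):
        self.down_until[upstream] = self.clock() + self.holddown_s
```

Applications keep talking to 127.0.0.1 throughout; only the selector's state changes when an upstream dies or comes back.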
Maybe we should set the file to be immutable after setting it to 127.0.0.1:
chattr +i /etc/resolv.conf
On Sat, 12 Apr 2014, Chuck Anderson wrote:
I'm proposing that /etc/resolv.conf is never re-written under any circumstances. A local caching resolver should ALWAYS be used and resolv.conf should ALWAYS say:
nameserver 127.0.0.1
Cheers. That's a goal I share with you, but...
All the "magic" for secure/insecure modes during NTP bootstrapping or captive portals has to happen inside unbound (or whatever caching resolver/forwarder is eventually chosen) and it should never be bypassed.
Currently, to prevent unbound from either rejecting DNS lies or getting polluted by accepting them, unbound is "taken offline" by the system during hotspot signon. resolv.conf is rewritten to use the DHCP-supplied nameservers to get past the portal. During this time, all applications are exposed to DNS lies. Once the captive portal is done, resolv.conf is changed back to 127.0.0.1 and unbound is "online" again, protecting all applications. If the network is so bad that this cannot work, and the user opts to remain "insecure", the vulnerable situation continues. If the user opts to go "cache only", then resolv.conf is written as 127.0.0.1 but unbound is configured with 127.0.0.127 as forwarder, meaning no new DNS answers will ever come in.
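The three states described above can be summarized as a small table from mode to (resolv.conf contents, unbound forwarders). A sketch of that mapping; the mode strings, `ResolverState` and the function names are hypothetical. `None` for the forwarders means unbound's forwarding configuration is left as it was (in insecure mode unbound is bypassed entirely).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ResolverState:
    resolv_conf: tuple[str, ...]                   # nameservers in /etc/resolv.conf
    unbound_forwarders: Optional[tuple[str, ...]]  # None: left unchanged

def resolver_state(mode: str, dhcp_nameservers: list[str]) -> ResolverState:
    if mode == "secure":
        # Applications go through the local validating unbound.
        return ResolverState(("127.0.0.1",), None)
    if mode == "insecure":
        # Hotspot signon: unbound is bypassed and applications use the
        # DHCP-supplied servers directly, so they are exposed to DNS lies.
        return ResolverState(tuple(dhcp_nameservers), None)
    if mode == "cache-only":
        # resolv.conf still points at unbound, but its only forwarder is
        # 127.0.0.127, so no new answers come in; only the cache is served.
        return ResolverState(("127.0.0.1",), ("127.0.0.127",))
    raise ValueError(f"unknown mode: {mode!r}")

def render_resolv_conf(state: ResolverState) -> str:
    return "".join(f"nameserver {ns}\n" for ns in state.resolv_conf)

print(render_resolv_conf(resolver_state("insecure", ["10.0.0.1"])), end="")
```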
As I said in previous posts, the ideal situation to not mess with resolv.conf on the host is to have a disposable secure container get the "new network" before any application gets network, do the hotspot login, and throw away the container. In that case, resolv.conf on the host (not container) never has to be modified.
Maybe we should set the file to be immutable after setting it to 127.0.0.1:
chattr +i /etc/resolv.conf
That is the trick currently used by dnssec-triggerd to prevent other applications from messing with that file.
Paul
On Sat, Apr 12, 2014 at 11:01:20AM -0400, Paul Wouters wrote:
On Sat, 12 Apr 2014, Chuck Anderson wrote:
Maybe we should set the file to be immutable after setting it to 127.0.0.1:
chattr +i /etc/resolv.conf
That is the trick currently used by dnssec-triggerd to prevent other applications from messing with that file.
Oh crap, that means I'm going to need a "really really don't touch this file" flag, perhaps a one-way flag that can never be un-set.
I'm already setting chattr +i /etc/resolv.conf to stop anything touching the file, and I don't want apps to mess with that flag (or the file).
Rich.
On Sat, 12 Apr 2014, Richard W.M. Jones wrote:
chattr +i /etc/resolv.conf
That is the trick currently used by dnssec-triggerd to prevent other applications from messing with that file.
Oh crap, that means I'm going to need a "really really don't touch this file" flag, perhaps a one-way flag that can never be un-set.
I'm already setting chattr +i /etc/resolv.conf to stop anything touching the file, and I don't want apps to mess with that flag (or the file).
Which is why we need native NM integration, with applications telling NM what to do with resolv.conf so that only NM modifies it (and provides overrides to accommodate your "hardcoded" version). Preferably enforced by SELinux.
Paul
On Sat, Apr 12, 2014 at 04:40:50PM +0100, Richard W.M. Jones wrote:
On Sat, Apr 12, 2014 at 11:01:20AM -0400, Paul Wouters wrote:
On Sat, 12 Apr 2014, Chuck Anderson wrote:
Maybe we should set the file to be immutable after setting it to 127.0.0.1:
chattr +i /etc/resolv.conf
That is the trick currently used by dnssec-triggerd to prevent other applications from messing with that file.
Oh crap, that means I'm going to need a "really really don't touch this file" flag, perhaps a one-way flag that can never be un-set.
I'm already setting chattr +i /etc/resolv.conf to stop anything touching the file, and I don't want apps to mess with that flag (or the file).
Bind mount the file to a read-only filesystem?
On 2014-04-12 11:01 (GMT-0400) Paul Wouters composed:
Chuck Anderson wrote:
Maybe we should set the file to be immutable after setting it to 127.0.0.1:
chattr +i /etc/resolv.conf
That is the trick currently used by dnssec-triggerd to prevent other applications from messing with that file.
I've been doing that myself for years on installations that think my ethernet-only non-wireless LAN host connections need "managing" by NetworkManager, Resolvconf, Wicked or anything else that came along to automagically mis-configure it.
On Sat, 2014-04-12 at 13:11 -0400, Felix Miata wrote:
On 2014-04-12 11:01 (GMT-0400) Paul Wouters composed:
Chuck Anderson wrote:
Maybe we should set the file to be immutable after setting it to 127.0.0.1:
chattr +i /etc/resolv.conf
That is the trick currently used by dnssec-triggerd to prevent other applications from messing with that file.
I've been doing that myself for years on installations that think my ethernet-only non-wireless LAN host connections need "managing" by NetworkManager, Resolvconf, Wicked or anything else that came along to automagically mis-configure it.
So you've gone out of your way to run a daemon but prevent it from working as configured, instead of just reconfiguring it to do what you need.
I have Network Manager and it is extremely simple to configure it to keep fixed DNS Servers as well as have static addresses for ethernet interfaces.
I find that today, except in extremely rare cases, the people who complain about network management tools interfering are people who never tried, or tried once *years* ago and never checked again.
Simo.
On 2014-04-12 16:12 (GMT-0400) Simo Sorce composed:
On Sat, 2014-04-12 at 13:11 -0400, Felix Miata wrote:
On 2014-04-12 11:01 (GMT-0400) Paul Wouters composed:
Chuck Anderson wrote:
Maybe we should set the file to be immutable after setting it to 127.0.0.1:
chattr +i /etc/resolv.conf
That is the trick currently used by dnssec-triggerd to prevent other applications from messing with that file.
I've been doing that myself for years on installations that think my ethernet-only non-wireless LAN host connections need "managing" by NetworkManager, Resolvconf, Wicked or anything else that came along to automagically mis-configure it.
So you've gone out of your way to run a daemon but prevent it from working as configured, instead of just reconfiguring it to do what you need.
What daemon did I go out of my way to run?
I have Network Manager and it is extremely simple to configure it to keep fixed DNS Servers as well as have static addresses for ethernet interfaces.
Simple without X running, where I normally perform my elementary configuration chores in an OFM? When I install, I install minimal, so there is no X available to tweak the installer's defaults.
I find that today, except in extremely rare cases, the people who complain about network management tools interfering are people who never tried, or tried once *years* ago and never checked again.
Maybe so, but then some of us don't need our networks "managed", only just configured at installation time, after which they are left untouched for the remaining lifetime of the installation.
On Sun, 2014-04-13 at 00:52 -0400, Felix Miata wrote:
On 2014-04-12 16:12 (GMT-0400) Simo Sorce composed:
On Sat, 2014-04-12 at 13:11 -0400, Felix Miata wrote:
I've been doing that myself for years on installations that think my ethernet-only non-wireless LAN host connections need "managing" by NetworkManager, Resolvconf, Wicked or anything else that came along to automagically mis-configure it.
So you've gone out of your way to run a daemon but prevent it from working as configured, instead of just reconfiguring it to do what you need.
What daemon did I go out of my way to run?
I had the impression you said you make the file immutable to prevent one of the mentioned daemons above from touching it. Apologies if I misunderstood.
I have Network Manager and it is extremely simple to configure it to keep fixed DNS Servers as well as have static addresses for ethernet interfaces.
Simple without X running, where I normally perform my elementary configuration chores in an OFM? When I install, I install minimal, so there is no X available to tweak the installer's defaults.
I usually edit /etc/sysconfig/network-scripts/ifcfg-eth0 (or similar) for headless servers, what's hard about that ?
I find that today, except in extremely rare cases, the people who complain about network management tools interfering are people who never tried, or tried once *years* ago and never checked again.
Maybe so, but then some of us don't need our networks "managed", only just configured at installation time, after which they are left untouched for the remaining lifetime of the installation.
Not a reason to throw mud at perfectly valid software and perfectly valid use cases.
Defaults need to appeal to the majority of users, they can't please everyone, or they wouldn't be just defaults, they'd be the only option.
We are discussing what defaults make sense for Fedora when it comes to DNS caching and DNSSEC, and my strong belief is that a default DNS cache is a very good idea for the majority of users. Users with special needs like you can simply not bring it in. If you do a minimal install I suspect it won't be present, as in F21 minimal installs will really be tight afaiu.
Simo.
On Sat, Apr 12, 2014 at 04:12:41PM -0400, Simo Sorce wrote:
So you've gone out of your way to run a daemon but prevent it from working as configured, instead of just reconfiguring it to do what you need.
I have to go out of my way to *stop* NetworkManager from running and to configure a fixed IP address.
I have Network Manager and it is extremely simple to configure it to keep fixed DNS Servers as well as have static addresses for ethernet interfaces.
Unfortunately NetworkManager can't handle brief network outages without dropping the connection (RHBZ#1022954 comment 3). This isn't desirable for servers that have ethernet and fixed IP addresses.
Rich.
On Sun, 2014-04-13 at 18:18 +0100, Richard W.M. Jones wrote:
On Sat, Apr 12, 2014 at 04:12:41PM -0400, Simo Sorce wrote:
So you've gone out of your way to run a daemon but prevent it from working as configured, instead of just reconfiguring it to do what you need.
I have to go out of my way to *stop* NetworkManager from running and to configure a fixed IP address.
Oh come on ...
I have Network Manager and it is extremely simple to configure it to keep fixed DNS Servers as well as have static addresses for ethernet interfaces.
Unfortunately NetworkManager can't handle brief network outages without dropping the connection (RHBZ#1022954 comment 3). This isn't desirable for servers that have ethernet and fixed IP addresses.
Let's try to be constructive here: this is a bug. I distinctly remember discussing delaying the dropping of IPs/interfaces when the link goes down for brief moments, and I know that works.
Also, this bug is about DHCP interfaces, not statically configured ones, and it is a bug, not intended behavior; it is also difficult to reproduce.
I am not here to proselytize you on the use of NM, but let's not be extreme: NM has greatly improved and keeps improving with every release, and for a lot of users it's really the only *decent* option. I am a bit annoyed by the constant, gratuitous and useless 'attacks' on it, and I am not even an NM developer and have never contributed to the project with patches (yet).
You don't like it, fine, but let's not come up with the old trite anecdata every time someone mentions NM, it's not useful and derails the conversation to useless bikeshedding.
I'd rather go back to the topic: default local DNS caching name server.
Simo.
On Sun, 13 Apr 2014, Richard W.M. Jones wrote:
So you've gone out of your way to run a daemon but prevent it from working as configured, instead of just reconfiguring it to do what you need.
I have to go out of my way to *stop* NetworkManager from running and to configure a fixed IP address.
Installing for a static server, even if not running NM, should be no problem for the DNS case. The DNS server you put in via kickstart/GUI will be configured as a forwarder for unbound. No need to do anything.
Paul
On Sun, Apr 13, 2014 at 10:18 AM, Richard W.M. Jones rjones@redhat.com wrote:
Unfortunately NetworkManager can't handle brief network outages without dropping the connection (RHBZ#1022954 comment 3). This isn't desirable for servers that have ethernet and fixed IP addresses.
NetworkManager-config-server fixes this; it sets ignore-carrier=*
See https://mail.gnome.org/archives/networkmanager-list/2013-June/msg00159.html
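For a hand-maintained server, the same effect could be reproduced with a small config fragment. A sketch that renders it with Python's configparser, assuming the key belongs in the [main] section (as it did in NetworkManager releases of that era; check NetworkManager.conf(5) for your version). Where the fragment is saved, whether NetworkManager.conf itself or a drop-in file, is also an assumption, and `server_dropin` is a hypothetical name.

```python
import configparser
import io

def server_dropin(ignore_carrier="*"):
    """Render a [main] section with ignore-carrier set, in NetworkManager's
    key=value keyfile style (no spaces around '=')."""
    cp = configparser.ConfigParser(interpolation=None)
    cp.optionxform = str  # keep key names exactly as written
    cp["main"] = {"ignore-carrier": ignore_carrier}
    buf = io.StringIO()
    cp.write(buf, space_around_delimiters=False)
    return buf.getvalue()

print(server_dropin(), end="")
```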
Hello Kevin, Paul
On Saturday, 12 April 2014 2:16 AM, Kevin Fenzi wrote:
I've been running this solution on Fedora for about five years now. It works reasonably well, and anyone on this list could surely try it out. Because of the lack of NM integration I would not call it end-user ready yet.
Me too. :)
Does it work out of the box? I mean, do you just run $ yum install unbound and it works, or are there some steps involved to configure it neatly (for example, for internal domains)?
If so, it'll be extremely helpful to document these steps on the change wiki. Or if there is already a document about it, please link to the same. Or let me know and I'll do it.
Thank you. --- Regards -Prasad http://feedmug.com
On Sat, 2014-04-12 at 02:33 +0800, P J P wrote:
Hello,
On Thursday, 10 April 2014 11:39 PM, P J P wrote: I plan to file a feature/change request for this one. I got caught up with other work this past week so could not do it. Will start with it right away.
Please see -> https://fedoraproject.org/wiki/Changes/Default_Local_DNS_Resolver
It's a System Wide Change Proposal request up for review.
I have set the target release as F22, because the proposal deadline for F21 was 08 Apr 2014 [1]. Besides, this change would require significant work on the related packages like NetworkManager etc. So F22 seems safer.
If you spot any discrepancies or have additional input or links to relevant documents, please feel free to update the wiki page, or let me know and I'll add it there.
NM has had local caching nameserver capability built in since Fedora 12 or something like that. Set 'dns=dnsmasq' in the [main] section of /etc/NetworkManager/NetworkManager.conf and NM will spawn dnsmasq in a local caching nameserver configuration and write 127.0.0.1 to resolv.conf. NM will update that dnsmasq instance whenever your network configuration changes to ensure that dnsmasq has the latest nameservers.
It seems that 'unbound' is getting more love these days though, due to its DNSSEC capabilities, and there is not yet a NetworkManager DNS plugin for unbound/dnssec-trigger. I know some people are working on that though (Thomas Hozza and Pavel Simerda) and I'd expect that to show up in the near future.
Note that hotspot detection is an important part of this, since hotspots will clearly break any kind of DNSSEC validation that happens, and that's something that's being worked out between dnssec-trigger and NetworkManager right now too.
NM in F20+ already has a "dns=none" option that prevents NM from touching resolv.conf, but obviously if NM isn't touching it, the DNS information that NM gets from upstream or your local configuration needs to get to the local caching nameserver somehow. Which is what the existing NM DNS plugins are for, like the dnsmasq one.
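The recipe above amounts to one key in the [main] section of /etc/NetworkManager/NetworkManager.conf. A sketch that sets it with Python's configparser, restricted to the two values discussed in this message (`dnsmasq` and `none`); `set_nm_dns` is a hypothetical name. Note that configparser discards comments when it rewrites a file, so editing the file by hand may be preferable on a real system.

```python
import configparser
import io

ALLOWED_DNS_MODES = {"dnsmasq", "none"}  # the two values mentioned above

def set_nm_dns(conf_text: str, mode: str) -> str:
    """Return NetworkManager.conf text with dns=<mode> in the [main] section,
    keeping other keys. Comments in `conf_text` are not preserved."""
    if mode not in ALLOWED_DNS_MODES:
        raise ValueError(f"unsupported dns mode: {mode!r}")
    cp = configparser.ConfigParser(interpolation=None)
    cp.optionxform = str  # keep key names exactly as written
    cp.read_string(conf_text)
    if not cp.has_section("main"):
        cp.add_section("main")
    cp["main"]["dns"] = mode
    buf = io.StringIO()
    cp.write(buf, space_around_delimiters=False)
    return buf.getvalue()

print(set_nm_dns("[main]\nplugins=ifcfg-rh\n", "dnsmasq"), end="")
```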
Dan
On Fri, 2014-04-11 at 14:21 -0500, Dan Williams wrote:
On Sat, 2014-04-12 at 02:33 +0800, P J P wrote:
Hello,
On Thursday, 10 April 2014 11:39 PM, P J P wrote: I plan to file a feature/change request for this one. I got caught up with other work this past week so could not do it. Will start with it right away.
Please see -> https://fedoraproject.org/wiki/Changes/Default_Local_DNS_Resolver
It's a System Wide Change Proposal request up for review.
I have set the target release as F22, because the proposal deadline for F21 was 08 Apr 2014 [1]. Besides, this change would require significant work on related packages such as NetworkManager, so F22 seems safer.
If you spot any discrepancies or have additional input or links to relevant documents, please feel free to update the wiki page, or let me know and I'll add it there.
NM has had local caching nameserver capability built in since Fedora 12 or something like that. Set 'dns=dnsmasq' in the [main] section of /etc/NetworkManager/NetworkManager.conf and NM will spawn dnsmasq in a local caching nameserver configuration and write 127.0.0.1 to resolv.conf. NM will update that dnsmasq instance whenever your network configuration changes to ensure that dnsmasq has the latest nameservers.
It seems that 'unbound' is getting more love these days though, due to its DNSSEC capabilities, and there is not yet a NetworkManager DNS plugin for unbound/dnssec-trigger. I know some people are working on that though (Thomas Hozza and Pavel Simerda) and I'd expect that to show up in the near future.
Note that hotspot detection is an important part of this, since hotspots will clearly break any kind of DNSSEC validation that happens, and that's something that's being worked out between dnssec-trigger and NetworkManager right now too.
NM in F20+ already has a "dns=none" option that prevents NM from touching resolv.conf, but obviously if NM isn't touching it, the DNS information that NM gets from upstream or your local configuration needs to get to the local caching nameserver somehow. Which is what the existing NM DNS plugins are for, like the dnsmasq one.
"Add domain-specific name server entries into the local name server's configuration file and ensure that applications are able to resolve internal (company-wide) domain names too (try connecting to the company mail/IRC server)"
We want to make sure that any local caching nameserver that we do use doesn't rely exclusively on file-based configuration, or if it does, it's able to re-read that configuration file using SIGHUP or some seamless reload functionality. It *must* also be able to load new configuration without dropping in-process DNS queries on the floor, otherwise users will experience hung DNS a non-trivial amount of the time.
The better way is to add/remove zones + servers from dnsmasq over D-Bus, which NM does not yet do since the patches are not yet upstream, or to use some other socket-based protocol like unbound does with dnssec-trigger.
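To make the reload requirement concrete, here is a minimal sketch of a per-domain forwarder table that can be reloaded without disturbing lookups already in progress: `reload()` parses the whole new configuration first and then swaps it in with a single assignment, and each lookup works from one snapshot of the table. The `ForwarderTable` class and its `zone server...` line format are hypothetical; dnsmasq and unbound have their own syntax, and a daemon would typically call something like `reload()` from a SIGHUP handler or a D-Bus/socket request.

```python
import threading


class ForwarderTable:
    """Hypothetical split-DNS table mapping zones to upstream servers."""

    def __init__(self, default_servers):
        self._default = list(default_servers)
        self._zones = {}
        self._lock = threading.Lock()  # serialises concurrent reloads only

    def reload(self, config_text):
        """Parse 'zone server [server...]' lines, then swap the table in."""
        new_zones = {}
        for line in config_text.splitlines():
            line = line.split("#", 1)[0].strip()  # strip comments/blank lines
            if not line:
                continue
            zone, *servers = line.split()
            new_zones[zone.lower().rstrip(".")] = servers
        with self._lock:
            self._zones = new_zones  # one assignment; readers see old or new

    def servers_for(self, qname):
        """Upstream servers for qname, using the longest matching zone."""
        zones = self._zones  # single snapshot for the whole lookup
        labels = qname.lower().rstrip(".").split(".")
        # Try the longest suffix first: a.corp.example.com, corp.example.com, ...
        for i in range(len(labels)):
            suffix = ".".join(labels[i:])
            if suffix in zones:
                return zones[suffix]
        return self._default


if __name__ == "__main__":
    table = ForwarderTable(["192.0.2.1"])
    table.reload("corp.example.com 10.0.0.53\n")
    print(table.servers_for("mail.corp.example.com"))
    print(table.servers_for("www.fedoraproject.org"))
```

The first lookup goes to the internal server and the second falls through to the default upstream.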
Dan
Hi,
On Saturday, 12 April 2014 12:56 AM, Dan Williams wrote: We want to make sure that any local caching nameserver that we do use doesn't rely exclusively on file-based configuration, or if it does, it's able to re-read that configuration file using SIGHUP or some seamless reload functionality. It *must* also be able to load new configuration without dropping in-process DNS queries on the floor, otherwise users will experience hung DNS a non-trivial amount of the time.
The better way is to add/remove zones + servers from dnsmasq over D-Bus, which NM does not yet do since the patches are not yet upstream, or to use some other socket-based protocol like unbound does with dnssec-trigger.
Sure, makes sense. These workflow bits still need to be worked out. File-based configuration is just an example. What's important is that dynamic name servers augment the local name server rather than replace it.
--- Regards -Prasad http://feedmug.com
Hello Dan,
On Saturday, 12 April 2014 12:51 AM, Dan Williams wrote: NM has had local caching nameserver capability built in since Fedora 12 or something like that. Set 'dns=dnsmasq' in the [main] section of /etc/NetworkManager/NetworkManager.conf and NM will spawn dnsmasq in a local caching nameserver configuration and write 127.0.0.1 to resolv.conf. NM will update that dnsmasq instance whenever your network configuration changes to ensure that dnsmasq has the latest nameservers.
It seems that 'unbound' is getting more love these days though, due to its DNSSEC capabilities, and there is not yet a NetworkManager DNS plugin for unbound/dnssec-trigger. I know some people are working on that though (Thomas Hozza and Pavel Simerda) and I'd expect that to show up in the near future.
Note that hotspot detection is an important part of this, since hotspots will clearly break any kind of DNSSEC validation that happens, and that's something that's being worked out between dnssec-trigger and NetworkManager right now too.
NM in F20+ already has a "dns=none" option that prevents NM from touching resolv.conf, but obviously if NM isn't touching it, the DNS information that NM gets from upstream or your local configuration needs to get to the local caching nameserver somehow. Which is what the existing NM DNS plugins are for, like the dnsmasq one.
That's great. Thank you so much for sharing this information. I'll add it to the wiki page.
About the wifi hotspots breakage, I'm still not clear. IIUC, the way they work is that all client traffic is blocked/redirected to a designated server until the user authenticates/makes a payment/accepts conditions etc. This blocking/redirection is probably done on the gateway or some such entry/exit point, no?
Thank you. --- Regards -Prasad http://feedmug.com
On Sat, 2014-04-12 at 03:35 +0800, P J P wrote:
Hello Dan,
On Saturday, 12 April 2014 12:51 AM, Dan Williams wrote: NM has had local caching nameserver capability built in since Fedora 12 or something like that. Set 'dns=dnsmasq' in the [main] section of /etc/NetworkManager/NetworkManager.conf and NM will spawn dnsmasq in a local caching nameserver configuration and write 127.0.0.1 to resolv.conf. NM will update that dnsmasq instance whenever your network configuration changes to ensure that dnsmasq has the latest nameservers.
It seems that 'unbound' is getting more love these days though, due to its DNSSEC capabilities, and there is not yet a NetworkManager DNS plugin for unbound/dnssec-trigger. I know some people are working on that though (Thomas Hozza and Pavel Simerda) and I'd expect that to show up in the near future.
Note that hotspot detection is an important part of this, since hotspots will clearly break any kind of DNSSEC validation that happens, and that's something that's being worked out between dnssec-trigger and NetworkManager right now too.
NM in F20+ already has a "dns=none" option that prevents NM from touching resolv.conf, but obviously if NM isn't touching it, the DNS information that NM gets from upstream or your local configuration needs to get to the local caching nameserver somehow. Which is what the existing NM DNS plugins are for, like the dnsmasq one.
That's great. Thank you so much for sharing this information. I'll add it to the wiki page.
About the wifi hotspots breakage, I'm still not clear. IIUC, the way they work is that all client traffic is blocked/redirected to a designated server until the user authenticates/makes a payment/accepts conditions etc. This blocking/redirection is probably done on the gateway or some such entry/exit point, no?
They almost always run DNS interceptors, so that the DNS server handed out by DHCP on the captive side returns the portal's address for any request; even "www.google.com" or "www.fedoraproject.org" will redirect you to the login page. Obviously that DNS reply will not be authenticated or secure in any way, because it clearly cannot be a trusted reply from google.com. Thus they will break any kind of DNSSEC or any other kind of authentication that the local caching DNS server might do.
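Because the portal answers every name with its own address, one common way to spot it is to fetch a known plain-HTTP URL and check that the expected content comes back untouched. A minimal sketch, assuming a hypothetical probe URL (`http://probe.example/hotspot.txt`) that normally returns just `OK`; it illustrates the general technique and is not NetworkManager's or dnssec-trigger's actual code.

```python
import urllib.error
import urllib.request

# Hypothetical probe endpoint and expected body; substitute a URL you control.
PROBE_URL = "http://probe.example/hotspot.txt"
EXPECTED_BODY = "OK"


def classify_probe(status, final_url, body, probe_url=PROBE_URL, expected=EXPECTED_BODY):
    """'full' if the expected text came back unmodified from the probe URL
    itself, otherwise 'portal'."""
    if status == 200 and final_url == probe_url and body.strip() == expected:
        return "full"
    return "portal"  # redirected, rewritten, or replaced by a login page


def probe(timeout=5.0):
    """Fetch the probe URL over plain HTTP, so a portal gets a chance to intercept."""
    try:
        with urllib.request.urlopen(PROBE_URL, timeout=timeout) as resp:
            body = resp.read(512).decode("utf-8", "replace")
            return classify_probe(resp.status, resp.geturl(), body)
    except urllib.error.HTTPError:
        return "portal"  # e.g. 511 Network Authentication Required
    except (urllib.error.URLError, OSError):
        return "none"  # no working connectivity at all
```

`urlopen` follows redirects on its own, which is why the final URL is compared as well as the body.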
I think the big issue for me is the use of "trusted" in the proposal. What does that actually mean? Who is doing the trusting? Does it mean that Firefox can "trust" the local caching DNS server, and if so, why would that server be any more "trustworthy" than the upstream nameserver delivered to you by DHCP? If that local nameserver *is* somehow more trustworthy, what specifically does it do to deserve that trust? If it does do something, does that something break hotspot or portal detection and login?
Dan
2014-04-11 21:55 GMT+02:00 Dan Williams dcbw@redhat.com:
I think the big issue for me is the use of "trusted" in the proposal. What does that actually mean? Who is doing the trusting?
The goal is to have DNSSEC validation in system-wide, dedicated code that is trusted for that purpose; i.e. unbound does DNSSEC validation for every application, with a centralized configuration and cache, so no application needs to, or should, do this on its own; it can simply consult the AD bit in the reply.
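Consulting the AD bit is cheap for an application. A minimal sketch of the check on a raw DNS reply, assuming the application already has the reply bytes (how it obtained them is outside the sketch); `ad_bit_set` is a hypothetical helper name.

```python
import struct

# Bits in the 16-bit flags word of the DNS header (RFC 1035 layout; AD as in RFC 4035).
FLAG_QR = 0x8000  # message is a response
FLAG_AD = 0x0020  # Authenticated Data: the resolver says it validated the answer


def ad_bit_set(message: bytes) -> bool:
    """True if `message` is a DNS response with the AD bit set.

    Only meaningful if the resolver that set it is trusted, e.g. a
    validating resolver on 127.0.0.1: the bit itself is not signed.
    """
    if len(message) < 12:
        raise ValueError("shorter than a DNS header (12 bytes)")
    _msg_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & FLAG_QR) and bool(flags & FLAG_AD)
```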
Does it mean that Firefox can "trust" the local caching DNS server,
Yes, for the purposes of DNSSEC validation and making sure the AD bit is set correctly.
and if so, why would that server be any more "trustworthy" than the upstream nameserver delivered to you by DHCP?
The upstream nameserver is often not under the control of the owner of the computer, so it can't be trusted to return the DNSSEC validation status in the AD bit correctly, and the communication channel (e.g. ethernet or unencrypted wifi) allows an attacker to spoof the reply from that upstream nameserver anyway. In the general case, DNSSEC validation needs to happen on the same machine that relies on the results of the validation.
If it does do something, does that something break hotspot or portal detection and login?
Not necessarily, and probably not. The applications are still in control of what to do if the AD bit is not set (e.g. because DNSSEC validation fails because the hotspot has replied with an unvalidated redirection). In this case, Firefox would probably not be willing to use the hotspot-spoofed DNS response as an indication of the correct public key via DANE, but it should still be perfectly willing to make an HTTP connection, including an HTTP connection to "google.com" that redirects it to https://hotspot-owner.com .
Mirek
On Saturday, 12 April 2014 1:35 AM, Miloslav Trmač mitr@volny.cz wrote:
The goal is to have DNSSEC validation in system-wide, dedicated code that is trusted for that purpose; i.e. unbound does DNSSEC validation for every application, with a centralized configuration and cache, so no application needs to, or should, do this on its own; it can simply consult the AD bit in the reply.
...
Not necessarily, and probably not. ...
Thanks so much for the precise responses, Miloslav. But am I the only one who didn't see Dan's earlier mail? Or was it in a different thread?
Thank you. --- Regards - Prasad http://feedmug.com
On Fri, 11 Apr 2014, Dan Williams wrote:
That's great. Thank you so much for sharing this information. I'll add it to the wiki page.
About the wifi hotspots breakage, I'm still not clear. IIUC, the way they work is that all client traffic is blocked/redirected to a designated server until the user authenticates/makes a payment/accepts conditions etc. This blocking/redirection is probably done on the gateway or some such entry/exit point, no?
They almost always run DNS interceptors, so that the DNS server handed out by DHCP on the captive side returns the portal's address for any request; even "www.google.com" or "www.fedoraproject.org" will redirect you to the login page. Obviously that DNS reply will not be authenticated or secure in any way, because it clearly cannot be a trusted reply from google.com. Thus they will break any kind of DNSSEC or any other kind of authentication that the local caching DNS server might do.
There are two captive technologies that need to be checked for: one is port 80/443 interception and the other is DNS interception. dnssec-triggerd checks for both, but its integration with NM is poor because it was written as a standalone daemon plus GNOME applet.
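The DNS-interception check can be approximated by looking up a random name that cannot exist: if it still resolves, something is rewriting answers (captive portals and some ISPs' NXDOMAIN redirection both look like this). A minimal sketch under that assumption; it is a common heuristic, not dnssec-triggerd's actual test, and the helper names are hypothetical. A local resolver that answers reserved names itself would hide the interception from this check.

```python
import secrets
import socket


def random_probe_name(parent="example"):
    """A random name under `parent` that should not exist.

    'example' is a reserved name (RFC 2606), so an honest resolver
    chain should answer NXDOMAIN for anything under it.
    """
    return f"probe-{secrets.token_hex(8)}.{parent}"


def system_resolve(name):
    """Addresses for `name` from the system resolver (raises socket.gaierror)."""
    infos = socket.getaddrinfo(name, None)
    return sorted({info[4][0] for info in infos})


def dns_looks_intercepted(resolve=system_resolve, parent="example"):
    """True if a name that cannot exist still resolves to an address."""
    try:
        addrs = resolve(random_probe_name(parent))
    except socket.gaierror:
        # NXDOMAIN or another lookup failure: no evidence of rewriting.
        # (A real check would tell these apart via the error code.)
        return False
    return bool(addrs)
```

Passing a fake `resolve` function makes the logic testable without touching the network.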
On top of the captive portals, the solution needs to deal with VPN situations and split-view DNS. Unbound handles this with libreswan, vpnc and openvpn already. This also involves flushing the cache when the VPN connection goes up/down. unbound supports all of this via runtime unbound-control daemon reconfiguration and also supports shipped exceptions via /etc/unbound/*.d/ directories.
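The on-the-fly reconfiguration described here maps onto `unbound-control` subcommands. A minimal sketch, assuming the `forward_add`/`forward_remove` subcommands (with optional `+i` to also mark the zone insecure) and `flush_zone` as documented in unbound-control(8); check them against your unbound version. The function names are hypothetical, and by default the sketch only prints the commands instead of running them.

```python
import subprocess


def vpn_up_commands(domain, servers, insecure=False):
    """unbound-control calls for a VPN that supplies its own DNS domain/servers."""
    add = ["unbound-control", "forward_add"]
    if insecure:
        add.append("+i")  # also add a domain-insecure entry for the zone
    add += [domain, *servers]
    # Drop anything cached for the zone from before the tunnel came up.
    return [add, ["unbound-control", "flush_zone", domain]]


def vpn_down_commands(domain, insecure=False):
    """unbound-control calls to undo vpn_up_commands when the tunnel goes down."""
    remove = ["unbound-control", "forward_remove"]
    if insecure:
        remove.append("+i")
    remove.append(domain)
    # Forget the internal answers so the public view is used again.
    return [remove, ["unbound-control", "flush_zone", domain]]


def run(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(vpn_up_commands("corp.example.com", ["10.0.0.53"]))
    run(vpn_down_commands("corp.example.com"))
```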
I am not aware of dnsmasq offering these features at all. dnsmasq also only very recently got beta support for DNSSEC, so I am uncomfortable with using dnsmasq for these tasks.
I think the big issue for me is the use of "trusted" in the proposal. What does that actually mean? Who is doing the trusting? Does it mean that Firefox can "trust" the local caching DNS server, and if so, why would that server be any more "trustworthy" than the upstream nameserver delivered to you by DHCP? If that local nameserver *is* somehow more trustworthy, what specifically does it do to deserve that trust? If it does do something, does that something break hotspot or portal detection and login?
The DNSSEC trust comes from a shipped root key that's local to your machine, augmented by installed trust anchors (e.g. in /etc/unbound.d/keys.d/). This means all the validation happens on the host and cannot be mangled in transit by an attacker. The alternative is to query a non-local DNS server with DNSSEC support and look at the answer's AD bit. But this bit is not cryptographically protected, so it assumes trust in the network.
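To illustrate what anchoring on a local key means in practice, here is a minimal sketch of one step a validator performs: checking that a zone's DNSKEY matches a locally stored DS-style digest (SHA-256, digest type 2), computed over the owner name in canonical wire format followed by the DNSKEY RDATA, as in RFC 4034 section 5.1.4. The key bytes in the example are fake placeholders and the helper names are hypothetical; a real validator must also verify the RRSIG signatures, which this sketch does not do.

```python
import hashlib
import hmac
import struct


def name_to_wire(name: str) -> bytes:
    """Canonical (lower-case) DNS wire format of a domain name."""
    wire = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.lower().encode("ascii")
            if len(raw) > 63:
                raise ValueError("label too long")
            wire += bytes([len(raw)]) + raw
    return wire + b"\x00"  # the root label ends every name


def dnskey_rdata(flags: int, protocol: int, algorithm: int, public_key: bytes) -> bytes:
    """DNSKEY RDATA: Flags (16 bits) | Protocol (8) | Algorithm (8) | Public Key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + public_key


def ds_sha256(owner: str, rdata: bytes) -> bytes:
    """DS digest type 2: SHA-256 over owner name | DNSKEY RDATA."""
    return hashlib.sha256(name_to_wire(owner) + rdata).digest()


def matches_anchor(owner: str, rdata: bytes, anchor_digest_hex: str) -> bool:
    """Compare against a locally stored trust-anchor digest (hex)."""
    return hmac.compare_digest(ds_sha256(owner, rdata).hex(), anchor_digest_hex.lower())


if __name__ == "__main__":
    fake_key = dnskey_rdata(257, 3, 8, b"\x01\x02\x03")  # placeholder, not a real key
    print(ds_sha256("example.com.", fake_key).hex())
```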
The most awesome solution would be if NM, on finding a new network connection, could bring up a secure container with the resolv.conf from the DHCP server, do the captive portal tests, allow the user to log in from a throwaway browser context, and, once accepted on the network and the mangling has stopped, release the actual network to the rest of the OS and destroy the used container.
Paul
On Fri, Apr 11, 2014 at 14:21:30 -0500, Dan Williams dcbw@redhat.com wrote:
NM in F20+ already has a "dns=none" option that prevents NM from touching resolv.conf, but obviously if NM isn't touching it, the DNS information that NM gets from upstream or your local configuration needs to get to the local caching nameserver somehow. Which is what the existing NM DNS plugins are for, like the dnsmasq one.
If you are running a caching resolver you don't need the DNS information from DHCP (except for the hotspot issue) at all. For example, dnscache can be used for this. (It doesn't do DNSSEC though, so it wouldn't provide what is wanted for the proposal.)
Once upon a time, Bruno Wolff III bruno@wolff.to said:
If you are running a caching resolver you don't need the DNS information from DHCP (except for the hotspot issue) at all.
Unless you have a specific reason not to, you should use the DNS server from DHCP. That may be the only DNS server that will work, there may be private DNS info not available anywhere else, etc.
On Fri, 11 Apr 2014, Chris Adams wrote:
Unless you have a specific reason not to, you should use the DNS server from DHCP.
My specific reason is that I don't trust random strangers.
That may be the only DNS server that will work, there may be private DNS info not available anywhere else, etc.
That situation is dealt with; see my other email to the list.
Paul
On Fri, Apr 11, 2014 at 15:33:48 -0500, Chris Adams linux@cmadams.net wrote:
Once upon a time, Bruno Wolff III bruno@wolff.to said:
If you are running a caching resolver you don't need the DNS information from DHCP (except for the hotspot issue) at all.
Unless you have a specific reason not to, you should use the DNS server from DHCP. That may be the only DNS server that will work, there may be private DNS info not available anywhere else, etc.
Split horizon should still work with a caching recursive resolver (since that is based on the IP address the request is coming from). It won't work if the network's DNS server provides alternative data outside of its bailiwick. But if these out-of-bailiwick domains are known to you, you can tell your resolver where to look for them.
If the network operator is just outright breaking things so that you can only connect to their DNS server, then you're going to need to do something about that. But even if that means switching to their server, you might want to know that that kind of thing is going on.
The advantage of using your dns server is that you know what you're getting. Some large ISPs are known to do interesting things with dns information (such as rewrite ttl information) that can cause problems that are avoided by using your own server.
On Fri, 11 Apr 2014, Bruno Wolff III wrote:
Unless you have a specific reason not to, you should use the DNS server from DHCP. That may be the only DNS server that will work, there may be private DNS info not available anywhere else, etc.
Split horizon should still work with a caching recursive resolver (since that is based on the IP address the request is coming from). It won't work if the network's DNS server provides alternative data outside of its bailiwick. But if these out-of-bailiwick domains are known to you, you can tell your resolver where to look for them.
If you don't know there is an exception for a domain (e.g. at the other end of a VPN), then you will get the public answers and might not get where you need to go. Additionally, with DNSSEC there is the problem that the public view cryptographically proves that the internal view does not exist (e.g. internal.fedoraproject.org).
If the network operator is just outright breaking things so that you can only connect to their DNS server, then you're going to need to do something about that. But even if that means switching to their server, you might want to know that that kind of thing is going on.
We support that in the VPN software by using the DOMAIN and DNS servers received via IPsec XAUTH and reconfiguring unbound on the fly to use the internal servers for that domain (and flushing the cache).
The advantage of using your dns server is that you know what you're getting. Some large ISPs are known to do interesting things with dns information (such as rewrite ttl information) that can cause problems that are avoided by using your own server.
Indeed, with DNSSEC we can use them as cache, because we can validate the answers. But those servers should never be "trusted".
Paul
On Fri, Apr 11, 2014 at 16:59:05 -0400, Paul Wouters paul@nohats.ca wrote:
On Fri, 11 Apr 2014, Bruno Wolff III wrote:
If you don't know there is an exception for a domain (e.g. at the other end of a VPN), then you will get the public answers and might not get where you need to go. Additionally, with DNSSEC there is the problem that the public view cryptographically proves that the internal view does not exist (e.g. internal.fedoraproject.org).
With an iterative resolver that may not be true. If the route to the name server that has that information is over the VPN (so that you have the correct source address), you should get the right answer.
Indeed, with DNSSEC we can use them as cache, because we can validate the answers. But those servers should never be "trusted".
That doesn't get you the right answers though, it only tells you that they are lying.
On Fri, 11 Apr 2014, Bruno Wolff III wrote:
If you don't know there is an exception for a domain (e.g. at the other end of a VPN), then you will get the public answers and might not get where you need to go. Additionally, with DNSSEC there is the problem that the public view cryptographically proves that the internal view does not exist (e.g. internal.fedoraproject.org).
With an iterative resolver that may not be true. If the route to the name server that has that information is over the VPN (so that you have the correct source address), you should get the right answer.
Sounds like putting RFC1918 addresses in public DNS? Eww. Also, unbound strips those out unless they come in via forwarders. But also selecting DNS servers does not relate to routing at all, so this is confusing to me.
Indeed, with DNSSEC we can use them as cache, because we can validate the answers. But those servers should never be "trusted".
That doesn't get you the right answers though, it only tells you that they are lying.
I'm not sure what you are trying to say here.
Paul
On Fri, Apr 11, 2014 at 17:46:29 -0400, Paul Wouters paul@nohats.ca wrote:
I'm not sure what you are trying to say here.
It was a comment about ISPs changing TTLs (or other things). DNSSEC can be used to tell you the data might not be authoritative, but doesn't tell you what the correct information is.
On Fri, 11 Apr 2014, Bruno Wolff III wrote:
I'm not sure what you are trying to say here.
It was a comment about ISPs changing TTLs (or other things). DNSSEC can be used to tell you the data might not be authoritative, but doesn't tell you what the correct information is.
First, TTLs you receive from a forwarder can always be manipulated, even with DNSSEC - otherwise caching wouldn't work.
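The TTL point can be made concrete. A validator cannot detect a TTL that was lowered in transit, but it can bound one that was raised, because the RRSIG carries the signed Original TTL and an expiration time. A minimal sketch of the rule from RFC 4035 section 5.3.3; treating the expiration as a plain POSIX timestamp (ignoring 32-bit serial arithmetic) is a simplifying assumption, and the function name is hypothetical.

```python
import time


def validated_ttl(received_ttl, rrsig_original_ttl, rrsig_expiration, now=None):
    """TTL to cache a validated RRset with (RFC 4035, section 5.3.3).

    The minimum of the TTL as received, the RRSIG's Original TTL field,
    and the seconds left until the signature expires.
    """
    if now is None:
        now = int(time.time())
    remaining = max(0, rrsig_expiration - now)
    return min(received_ttl, rrsig_original_ttl, remaining)


if __name__ == "__main__":
    # An upstream inflated the TTL to a day; the signed original says 300 s.
    print(validated_ttl(86400, 300, rrsig_expiration=4600, now=1000))
```

The inflated value is capped at 300, while a lowered TTL (say 60) would simply be accepted as 60.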
Second, I still don't understand the point. Are you suggesting it is better to believe all DNS lies than to not know where the lies lead?
Paul
On Fri, Apr 11, 2014 at 18:44:21 -0400, Paul Wouters paul@nohats.ca wrote:
First, TTLs you receive from a forwarder can always be manipulated, even with DNSSEC - otherwise caching wouldn't work.
Second, I still don't understand the point. Are you suggesting it is better to believe all DNS lies than to not know where the lies lead?
Not better. Just that DNSSEC doesn't really solve everything one might want it to, and hence one might want to avoid ISPs' DNS services in some cases.
On Fri, 11 Apr 2014, Bruno Wolff III wrote:
Second, I still don't understand the point. Are you suggesting it is better to believe all DNS lies than to not know where the lies lead?
Not better. Just that DNSSEC doesn't really solve everything one might want it to, and hence one might want to avoid ISPs' DNS services in some cases.
Which is why we do captive portal detection, but would like to see better NM integration for it.....
Paul
Once upon a time, Bruno Wolff III bruno@wolff.to said:
The advantage of using your dns server is that you know what you're getting.
You'll also lose almost all content-delivery network advantages (most of that is mapped to "close" servers with DNS).
On Fri, 11 Apr 2014, Chris Adams wrote:
Once upon a time, Bruno Wolff III bruno@wolff.to said:
The advantage of using your dns server is that you know what you're getting.
You'll also lose almost all content-delivery network advantages (most of that is mapped to "close" servers with DNS).
Another reason to try to configure your locally running DNSSEC resolver with your ISP's DNS servers as forwarders.
Paul
On Fri, 11 Apr 2014, Bruno Wolff III wrote:
If you are running a caching resolver you don't need the DNS information from DHCP (except for the hotspot issue) at all. For example, dnscache can be used for this. (It doesn't do DNSSEC though, so it wouldn't provide what is wanted for the proposal.)
It's rude to bypass the global DNS caching infrastructure. That would significantly load people's DNS servers with more queries. There is no reason not to try and use ISP's DNS caches.
dnscache is very obsolete software.
Paul
On Fri, Apr 11, 2014 at 16:43:12 -0400, Paul Wouters paul@nohats.ca wrote:
It's rude to bypass the global DNS caching infrastructure. That would significantly load people's DNS servers with more queries. There is no reason not to try and use ISP's DNS caches.
Some ISPs modify dns responses (such as increasing ttls).
dnscache is very obsolete software.
Only in that it lacks DNSSEC support. If that is critical to you, then you might consider it very obsolete.
On Saturday, 12 April 2014 2:13 AM, Paul Wouters wrote: It's rude to bypass the global DNS caching infrastructure. That would significantly load people's DNS servers with more queries. There is no reason not to try and use ISP's DNS caches.
You mean letting the local resolver forward queries to the ISP's name servers? That is the same as using the ISP's name servers in '/etc/resolv.conf'. I wouldn't prefer that without DNSSEC.
dnscache is very obsolete software.
-> http://pjp.dgplug.org/ndjbdns/
There is a new version of it. It does not support DNSSEC, but it is alive and well maintained.
--- Regards -Prasad http://feedmug.com
On Saturday, 12 April 2014 10:33 AM, P J P wrote:
On Saturday, 12 April 2014 2:13 AM, Paul Wouters wrote:
It's rude to bypass the global DNS caching infrastructure. That would significantly load people's DNS servers with more queries. There is no reason not to try and use ISP's DNS caches.
There is also the case that Chuck mentioned, about the ISP's name servers being unreliable.
--- Regards -Prasad http://feedmug.com
On Sat, 2014-04-12 at 13:35 +0800, P J P wrote:
On Saturday, 12 April 2014 10:33 AM, P J P wrote:
On Saturday, 12 April 2014 2:13 AM, Paul Wouters wrote:
It's rude to bypass the global DNS caching infrastructure. That would significantly load people's DNS servers with more queries. There is no reason not to try and use ISP's DNS caches.
There is also the case that Chuck mentioned, about the ISP's name servers being unreliable.
Regards -Prasad http://feedmug.com
That makes no sense.
Say I have freshly installed my fedora system at home. I then boot it up and start to use it. My laptop is caching DNS results all the while from the "unreliable" ISP.
I then go to work and suddenly things don't work.
Having a DNS cache doesn't fix your unreliable ISP: You need to lodge a complaint with your ISP.
See my previous email, about flushing the cache on network change.
On Sat, Apr 12, 2014 at 15:11:35 +0930, William Brown william@firstyear.id.au wrote:
That makes no sense.
Say I have freshly installed my fedora system at home. I then boot it up and start to use it. My laptop is caching DNS results all the while from the "unreliable" ISP.
I then go to work and suddenly things don't work.
Having a DNS cache doesn't fix your unreliable ISP: You need to lodge a complaint with your ISP.
No, but using a caching iterative resolver does. (Barring the ISP rewriting your traffic.)
On Saturday, 12 April 2014 11:11 AM, William Brown wrote: Say I have freshly installed my fedora system at home. I then boot it up and start to use it. My laptop is caching DNS results all the while from the "unreliable" ISP.
I then go to work and suddenly things don't work.
Having a DNS cache doesn't fix your unreliable ISP: You need to lodge a complaint with your ISP.
What, no! That was the case for having a local cache and not forwarding queries to the ISP's name servers at all, because those are not reliable.
See my previous email, about flushing the cache on network change.
Yes I saw. About automatic cache clearance etc. I agree. Those are features to be requested from the DNS software or maybe NM. I've been using 'dnscache' without any trouble whatsoever.
See -> http://pjp.dgplug.org/ndjbdns/ --- Regards -Prasad http://feedmug.com
On Sat, 2014-04-12 at 14:09 +0800, P J P wrote:
On Saturday, 12 April 2014 11:11 AM, William Brown wrote: Say I have freshly installed my fedora system at home. I then boot it up and start to use it. My laptop is caching DNS results all the while from the "unreliable" ISP.
I then go to work and suddenly things don't work.
Having a DNS cache doesn't fix your unreliable ISP: You need to lodge a complaint with your ISP.
What, no! That was the case for having a local cache and not forwarding queries to the ISP's name servers at all, because those are not reliable.
See my previous email, about flushing the cache on network change.
Yes I saw. About automatic cache clearance etc. I agree. Those are features to be requested from the DNS software or maybe NM. I've been using 'dnscache' without any trouble whatsoever.
Umm what? You cannot have your cake and eat it too.
If you flush the cache on interface/route change, the entire point you made about trying not to forward to an unreliable ISP is invalid.
Consider, I get home, and open my laptop. Cache is cleared, and I'm now populating that cache with the contents from the ISP.
But if you weren't to clear the cache, I could be at home caching bad records, then when I go to work they persist.
You cannot have both.
I would rather that cache is flushed on interface change as it prevents so many more issues than making that cache last across potential network boundaries.
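The flush-on-change behavior can be sketched as a small TTL cache that forgets everything whenever the network attachment changes, so an internally resolved `foo.work.com` does not follow you home. The `DnsCache` class and the `network_id` value are hypothetical; in practice NetworkManager (or whatever tracks connectivity) would supply the change notification.

```python
import time


class DnsCache:
    """Tiny name -> addresses cache with TTLs, flushed on network change."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}        # name -> (addresses, expiry time)
        self._network_id = None   # hypothetical id of the current attachment

    def on_network_change(self, network_id):
        """Drop every cached record when we move to a different network."""
        if network_id != self._network_id:
            self._entries.clear()
            self._network_id = network_id

    def put(self, name, addrs, ttl):
        self._entries[name.lower()] = (list(addrs), self._clock() + ttl)

    def get(self, name):
        """Cached addresses, or None if missing or expired."""
        key = name.lower()
        hit = self._entries.get(key)
        if hit is None:
            return None
        addrs, expires = hit
        if self._clock() >= expires:
            del self._entries[key]
            return None
        return addrs
```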
At the end of the day, I cannot stress enough, if you have an ISP with bad DNS caching or that is unreliable, you need to fault your ISP, it is not an issue for the OS to solve.
On Sat, 2014-04-12 at 14:09 +0800, P J P wrote:
On Saturday, 12 April 2014 11:11 AM, William Brown wrote: Say I have freshly installed my fedora system at home. I then boot it up and start to use it. My laptop is caching DNS results all the while from the "unreliable" ISP.
PS: The unreliable ISP I perceive as:
1) They often return no query within an acceptable time period
2) They return invalid or incorrect zone data
3) They mess with TTLs or other zone data
On Saturday, 12 April 2014 12:41 PM, William Brown wrote: PS: The unreliable ISP I perceive as:
- They often return no query within an acceptable time period
- They return invalid or incorrect zone data
- They mess with TTLs or other zone data
Right.
Consider, I get home, and open my laptop. Cache is cleared, and I'm now populating that cache with the contents from the ISP.
No, why contents from ISP? Local resolver will populate cache from root servers, no?
But if you weren't to clear the cache, I could be at home caching bad records, then when I go to work they persist.
So the glitch is that when you are at home, the cache still has office domain addresses cached, which you cannot connect to because you aren't on the office network. Do I understand it right? IMO, that's not bad cache.
You cannot have both. I would rather that cache is flushed on interface change as it prevents so many more issues than making that cache last across potential network boundaries.
Sure, no contention there. IMO, clearing the local cache on interface change could be a feature for NM, because NM is suitably placed to do that.
At the end of the day, I cannot stress enough, if you have an ISP with bad DNS caching or that is unreliable, you need to fault your ISP,
IMO, local resolver can help here.
--- Regards -Prasad http://feedmug.com
Consider, I get home, and open my laptop. Cache is cleared, and I'm now populating that cache with the contents from the ISP.
No, why contents from ISP? Local resolver will populate cache from root servers, no?
This isn't how DNS works ..... You populate your cache from the ISP, who queries above them and so on up to the root server.
http://technet.microsoft.com/en-us/library/cc961401.aspx
But if you weren't to clear the cache, I could be at home caching bad records, then when I go to work they persist.
So the glitch is that when you are at home, the cache still has office domain addresses cached, which you cannot connect to because you aren't on the office network. Do I understand it right? IMO, that's not bad cache.
I should clarify. I cache the record foo.work.com from the office, and it resolves differently externally. When I go home, it no longer resolves to the external IP as I'm using the internally acquired record from cache.
You cannot have both. I would rather that cache is flushed on interface change as it prevents so many more issues than making that cache last across potential network boundaries.
Sure, no contention there. IMO, clearing the local cache on interface change could be a feature for NM, because NM is suitably placed to do that.
Agreed.
At the end of the day, I cannot stress enough, if you have an ISP with bad DNS caching or that is unreliable, you need to fault your ISP,
IMO, local resolver can help here.
On Sat, 2014-04-12 at 16:15 +0800, P J P wrote:
On Saturday, 12 April 2014 12:41 PM, William Brown wrote: PS: The unreliable ISP I perceive as:
- They often return no query within an acceptable time period
- They return invalid or incorrect zone data
- They mess with TTLs or other zone data
Right.
Referencing these together.
A local cache will help with the first point only "sometimes", provided you get the record back at least once.
It won't prevent the second or third, as you will just cache the incorrect data instead. (Provided you clear the cache on network change this isn't a problem ... it just means you hold onto bad data for that session for longer, which creates other issues.)
I personally am actually against DNS cache on systems as it tends to create more problems than it solves.
On 12.04.2014 13:25, William Brown wrote:
Consider, I get home, and open my laptop. Cache is cleared, and I'm now populating that cache with the contents from the ISP.
No, why contents from ISP? Local resolver will populate cache from root servers, no?
This isn't how DNS works ..... You populate your cache from the ISP, who queries above them and so on up to the root server.
No - only if you configure your ISP's servers as forwarders, which makes only partial sense, because it lowers the effective TTL and puts more and more caches between the origin and your machine, some of which may ignore the TTL altogether.
A DNS server doing recursion doesn't ask any forwarder.
Nice, but you need to understand the contents, especially point 3, and points 1-8: it is called "recursion" because you start by asking the root servers, which tell you "hey, I don't know the address, but I know who knows .com", followed by the server for .com answering "well, I don't know the answer, but I can tell you who knows".
On Sat, 12 Apr 2014, Reindl Harald wrote:
A DNS server doing recursion doesn't ask any forwarder.
That's wrong. A DNS server can use a forwarder for some or all of its recursive queries. unbound+dnssec-triggerd mostly cause unbound to do full recursion, but using the ISP nameserver as forwarder for all queries.
Paul
On 12.04.2014 16:55, Paul Wouters wrote:
On Sat, 12 Apr 2014, Reindl Harald wrote:
A DNS server doing recursion doesn't ask any forwarder.
That's wrong. A DNS server can use a forwarder for some or all of its recursive queries. unbound+dnssec-triggerd mostly cause unbound to do full recursion, but using the ISP nameserver as forwarder for all queries.
Oh no - please try to understand what recursion means in the case of DNS.
May I suggest reading some docs, because when I talk about DNS as one who maintains 600 domains as a DNS provider and Registry for the .at domain, and who implemented DNS admin backends years ago, I know what I am talking about.
Recursion is, by definition:
* ask the root server for example.com
* answer of the root is "dunno, but you can ask xxx for .com"
* your DNS asks xxx for example.com
* answer of xxx is "dunno, but you can ask ns1.whoever.tld for example.com"
Forwarding bypasses that: it asks your ISP's (or whatever) configured nameserver and never the root, so no, you don't do recursion in that case. Your forwarder may, or at least the last forwarder does if the DNS server you are asking forwards too, but that's not your business then, and you don't do recursion.
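For readers following the recursion-vs-forwarding argument, the two behaviours can be modelled in a few lines of Python. This is a toy, stdlib-only sketch: the server names, the zone data and the 192.0.2.10 documentation address are invented, and no real DNS packets are sent. `resolve_iteratively` follows referrals down from the root as in the list above; `resolve_via_forwarder` hands the whole question to one upstream server, which does any further work itself.

```python
"""Toy model contrasting resolution from the root with forwarding.

Hypothetical: servers, zones and addresses are invented for illustration.
"""

# Each "server" either knows the answer or refers you to a server closer
# to the name being looked up.
SERVERS = {
    "root": {"referrals": {"com.": "com-tld"}},
    "com-tld": {"referrals": {"example.com.": "ns1.example.com"}},
    "ns1.example.com": {"answers": {"www.example.com.": "192.0.2.10"}},
}


def query(server, name):
    """Ask one server about name; return ("answer", ip) or ("referral", server)."""
    data = SERVERS[server]
    if name in data.get("answers", {}):
        return "answer", data["answers"][name]
    for suffix, next_server in data.get("referrals", {}).items():
        if name.endswith(suffix):
            return "referral", next_server
    raise LookupError(f"{server} cannot help with {name}")


def resolve_iteratively(name):
    """Start at the root and follow referrals until someone answers."""
    path, server = [], "root"
    while True:
        path.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            return value, path
        server = value  # follow the referral one level down


def resolve_via_forwarder(name, forwarder):
    """Hand the whole question to one upstream server (e.g. an ISP resolver).

    The forwarder is modelled as a callable returning the final answer; how it
    found that answer is invisible to us.
    """
    return forwarder(name), ["forwarder"]
```

Here `resolve_iteratively("www.example.com.")` visits root, then the .com server, then ns1.example.com, while the forwarding path records a single hop no matter how the forwarder found the answer. Whether the latter still counts as "recursion" is exactly the terminology point being argued; the sketch only shows who does the walking.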
On Sat, 12 Apr 2014, Reindl Harald wrote:
That's wrong. A DNS server can use a forwarder for some or all of its recursive queries. unbound+dnssec-triggerd mostly cause unbound to do full recursion, but using the ISP nameserver as forwarder for all queries.
Oh no - please try to understand what recursion means in the case of DNS.
May I suggest reading some docs, because when I talk about DNS as one who maintains 600 domains as a DNS provider and Registry for the .at domain, and who implemented DNS admin backends years ago, I know what I am talking about.
If we are going to do appeals to authority for making arguments, I've been doing DNSSEC since 2001, ran an ISP between 1995 and 2005 that had 500+ domains, have pending DNS RFC drafts out there, am acknowledged on many more published DNS RFCs, am a member of a global DNS incident response team, member of DNS-OARC, implemented a failover dual-software dual-setup multi-local DNSSEC signed for a TLD larger than .at praised by the entire DNS industry, performed a DNS outage post-mortem at one of Canada's largest banks and was one of ten ICANN newGTLD Registry Services panel members evaluating the 1500 new TLD submissions for their technical implementations of DNS and Registry Services. I think I know what recursion means.....
Now can we go back to actually discussing technical arguments again?
Thanks.
Paul
On 12.04.2014 17:21, Paul Wouters wrote:
On Sat, 12 Apr 2014, Reindl Harald wrote:
That's wrong. A DNS server can use a forwarder for some or all of its recursive queries. unbound+dnssec-triggerd mostly cause unbound to do full recursion, but using the ISP nameserver as forwarder for all queries.
Oh no - please try to understand what recursion means in the case of DNS.
May I suggest reading some docs, because when I talk about DNS as one who maintains 600 domains as a DNS provider and Registry for the .at domain, and who implemented DNS admin backends years ago, I know what I am talking about.
If we are going to do appeals to authority for making arguments, I've been doing DNSSEC since 2001, ran an ISP between 1995 and 2005 that had 500+ domains, have pending DNS RFC drafts out there, am acknowledged on many more published DNS RFCs, am a member of a global DNS incident response team, member of DNS-OARC, implemented a failover dual-software dual-setup multi-local DNSSEC signed for a TLD larger than .at praised by the entire DNS industry, performed a DNS outage post-mortem at one of Canada's largest banks and was one of ten ICANN newGTLD Registry Services panel members evaluating the 1500 new TLD submissions for their technical implementations of DNS and Registry Services. I think I know what recursion means.....
Now can we go back to actually discussing technical arguments again?
Then stop responding with "That's wrong" when I tell someone forwarding != recursion, because you should know better and use the correct terms.
Now can we go back to actually discussing technical arguments again?
Actually no.
This whole thread has forgotten one major thing ... use cases.
Proposal is to add a local caching DNS server to fedora systems. This may or may not accept a DHCP provided forwarder(?).
Case 1: Standard home user. Has little knowledge of DNS, and a router provided by their ISP. DHCP provides the laptop with the DNS IP of the router, which then acts as a forwarder to the ISP.
Case 2: Moderate home user. They have a little knowledge of DNS, and have setup a system like OpenWRT or gargoyle on their router. They have their own zone, .local . This means that their DHCP provides the DNS ip of the router to clients.
Case 3: Power home user. They likely have their own fedora router server or some other system setup. They run their own bind / named instance on their network, with their own zone or two. They have DHCP setup, perhaps to use DDNS updates to named.
Case 4: Small business workstation. Likely the small business, like the power home user, has their own name server. It may be Windows DNS from AD, or bind.
Case 5: Medium / Large business workstation. It's nearly guaranteed that the business runs their own zones. They have their own extensive, well organised bind / named setup.
Case 6: Fedora server in a small business: Same as the workstation, likely AD or bind in the office.
Case 7: Fedora server in the large business. Same as the workstation.
Case 8: Road warrior to the power home, small business, or large business. Uses VPN, and needs access to the DNS provided by the push dhcp / dns options from their vpn tunnel.
Now, in all of these cases the system local DNS cache *must* forward to the local DNS server. You can't at the OS distinguish between any of these cases with just the DHCP record or lease. It is unreasonable to ask everyone to manually setup DNS on every network they join. You must have the forwarder set to the DNS provided by DHCP so that you can access the local network resources. You cannot suggest bypassing these.
Case 1: The user doesn't know much about DNS. The ISP might be reliable or unreliable. If we assume, as discussed, that the cache is flushed on network change, they will start with an empty cache. With a good ISP, they will get consistent answers to queries, and there is no point in having the cache. If the ISP is unreliable, they will get records that are incorrect (see broken TTLs etc.), or no record at all. The cache will only help once a record has been returned, and will only alleviate pain "later" in the session. So DNS caching *may* help here.
Case 2: The user does know a bit. But when they change name records, they may not be able to work out why a workstation can't resolve names like other clients. We don't want to give Fedora the same reputation as Windows, where you need to turn the network interface off and on all the time to solve issues (to flush the DNS cache). In this example, the user wants their router to (maybe) cache the records, and to take that job off the clients.
Case 3: This user does understand DNS, and they don't need DNS cache. They have bind / named setup, and they would like to rely on that instead. When they change records in their local zones, they don't want to have to flush caches etc. If their ISP is unreliable, or their own DNS is unreliable, a DNS cache will potentially mask this issue delaying them from noticing / solving the problem.
Cases 4, 5, 6 and 7: DNS cache, again, isn't needed. The infrastructure is well set up, and caching is done by the business servers. DNS outages at the business level mean there are other issues, and they will likely be resolved quickly. You don't want to reboot / reset interfaces each time you make a change or as the first response to an issue (again, this would give Fedora a bad name). DNS caching may mask a bigger problem.
Case 8: VPNs are a bit unreliable and have relatively high(ish) latency, but mostly they are quite good, e.g. OpenVPN. A DNS cache *might* help here in case of traffic loss. Again, though, this would be masking a greater issue, and could be better solved with TCP DNS queries rather than UDP.
Only cases 1 and 8 have real reasons to use a local OS DNS cache. However, you cannot distinguish these from the other cases at a network level, i.e. you couldn't easily enable/disable the cache based on some network parameter.
Additionally, case 1 is only needed in the situation that you have a low quality connection, or a low quality ISP forwarder. Both of these are issues you should take up with your ISP, and shouldn't be solved by Fedora.
Case 8 with the VPN, is inherently hard to fix. VPN reliability is always improving and I think it's becoming less of an issue. I would also say it's a "rarer" use case than the others.
In conclusion, I don't perceive a DNS cache in Fedora as a good idea, as it solves few real-world problems, and may in fact create issues, mask issues and create a bad stigma about Fedora network reliability. If it is to become available to users, I would like:
* DNS cache is not the default. It must be enabled per connection (i.e. users in case 1 can enable it if needed)
* DNS cache should be able to be enabled from the NM GUI
* DNS cache should be able to be flushed live from the NM GUI
* DNS cache should be flushed on route or interface state change.
* If two interfaces are active, the default route's DNS cache setting takes precedence.
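The flush-related wishes above can be sketched as a tiny in-memory cache. This is a hypothetical Python model, not NetworkManager's or unbound's code: the class name, the `network_id` notion (think "default route's interface plus its DHCP-provided servers") and the `flush_domain` helper are assumptions made for illustration. It honours TTLs, treats a TTL of 0 as "do not cache", and drops everything when the network changes.

```python
import time


class FlushableDnsCache:
    """Hypothetical local DNS cache that is flushed on network change."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # name -> (address, expiry time)
        self._network_id = None

    def on_network_change(self, network_id):
        # Moving to a different network drops everything, so records learned
        # from e.g. an office split-view zone are not reused at home.
        if network_id != self._network_id:
            self._entries.clear()
            self._network_id = network_id

    def flush_domain(self, domain):
        """Drop cached names at or below domain (a manual or per-zone flush)."""
        domain = domain.rstrip(".").lower()
        for name in list(self._entries):
            bare = name.rstrip(".").lower()
            if bare == domain or bare.endswith("." + domain):
                del self._entries[name]

    def put(self, name, address, ttl):
        if ttl > 0:  # a TTL of 0 means "do not cache"
            self._entries[name] = (address, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expiry = entry
        if self._clock() >= expiry:  # respect the TTL
            del self._entries[name]
            return None
        return address
```

In this model, caching `foo.work.com` at the office and then calling `on_network_change("home")` forgets the internal record, which is the split-view problem raised earlier in the thread; reconnecting to the same network keeps the cache.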
On Sat, Apr 12, 2014 at 5:18 PM, William Brown william@firstyear.id.au wrote:
Now can we go back to actually discussion technical arguments again?
Actually no.
This whole thread has forgotten one major thing ... use cases.
Proposal is to add a local caching DNS server to fedora systems. This may or may not accept a DHCP provided forwarder(?).
[snipped lots of entirely legitimate use cases]
Now, in all of these cases the system local DNS cache *must* forward to the local DNS server. You can't at the OS distinguish between any of these cases with just the DHCP record or lease. It is unreasonable to ask everyone to manually setup DNS on every network they join. You must have the forwarder set to the DNS provided by DHCP so that you can access the local network resources. You cannot suggest bypassing these.
I don't think anyone is suggesting bypassing these. That is, all of your use cases *will still work*, possibly better than they do now, with a systemwide resolver.
Case 1: The user doesn't know much about DNS. The ISP might be reliable or unreliable. If we assume, as discussed, that the cache is flushed on network change, they will start with an empty cache. With a good ISP, they will get consistent answers to queries, and there is no point in having the cache.
There are two good reasons for the cache:
1. DNSSEC. Trusting the AD bit from the ISP is wrong.
2. Caching. It's a lot better to have a correct systemwide cache than a bunch of per-application crappy caches.
Case 3: This user does understand DNS, and they don't need DNS cache. They have bind / named setup, and they would like to rely on that instead. When they change records in their local zones, they don't want to have to flush caches etc. If their ISP is unreliable, or their own DNS is unreliable, a DNS cache will potentially mask this issue delaying them from noticing / solving the problem.
That's what an extremely short TTL is for. Also, anyone relying on local clients not caching within the TTL is already terminally screwed. Consider that Firefox, Chromium, Windows, and I suspect Mac OS and Android have caches. The fact that Linux command line tools have no cache right now is not a feature.
Case 8: VPNs are a bit unreliable and have relatively high(ish) latency, but mostly they are quite good, e.g. OpenVPN.
Hah. OpenVPN screws up its own internal DNS cache on a regular basis for me. This is a common cause of Ctrl-C.
--Andy
On Sat, 2014-04-12 at 17:46 -0700, Andrew Lutomirski wrote:
On Sat, Apr 12, 2014 at 5:18 PM, William Brown william@firstyear.id.au wrote:
Now can we go back to actually discussion technical arguments again?
Actually no.
This whole thread has forgotten one major thing ... use cases.
Proposal is to add a local caching DNS server to fedora systems. This may or may not accept a DHCP provided forwarder(?).
[snipped lots of entirely legitimate use cases]
Now, in all of these cases the system local DNS cache *must* forward to the local DNS server. You can't at the OS distinguish between any of these cases with just the DHCP record or lease. It is unreasonable to ask everyone to manually setup DNS on every network they join. You must have the forwarder set to the DNS provided by DHCP so that you can access the local network resources. You cannot suggest bypassing these.
I don't think anyone is suggesting bypassing these. That is, all of your use cases *will still work*, possibly better than they do now, with a systemwide resolver.
A system wide resolver I am not opposed to. I am against a system wide *caching* resolver.
Case 1: The user doesn't know much about DNS. The ISP might be reliable or unreliable. If we assume, as discussed, that the cache is flushed on network change, they will start with an empty cache. With a good ISP, they will get consistent answers to queries, and there is no point in having the cache.
There are two good reasons for the cache:
1. DNSSEC. Trusting the AD bit from the ISP is wrong.
2. Caching. It's a lot better to have a correct systemwide cache than a bunch of per-application crappy caches.
In this case, a cache *is* helpful, as is DNSSEC. But for the other 6, a cache is a severe detriment.
Case 3: This user does understand DNS, and they don't need DNS cache. They have bind / named setup, and they would like to rely on that instead. When they change records in their local zones, they don't want to have to flush caches etc. If their ISP is unreliable, or their own DNS is unreliable, a DNS cache will potentially mask this issue delaying them from noticing / solving the problem.
That's what an extremely short TTL is for. Also, anyone relying on local clients not caching within the TTL is already terminally screwed. Consider that Firefox, Chromium, Windows, and I suspect Mac OS and Android have caches. The fact that Linux command line tools have no cache right now is not a feature.
I disable the DNS cache in firefox with developer tools.
Additionally, a short TTL is good for this situation, but it can't fix everything.
Short TTLs really rely on every layer of the stack not messing with them. However, that's another issue.
Case 8: VPNs are a bit unreliable and have relatively high(ish) latency, but mostly they are quite good, e.g. OpenVPN.
Hah. OpenVPN screws up its own internal DNS cache on a regular basis for me. This is a common cause of Ctrl-C.
--Andy
I don't use the OpenVPN dns cache.
On Sun, 2014-04-13 at 16:10 +0930, William Brown wrote:
A system wide resolver I am not opposed to. I am against a system wide *caching* resolver.
In this case, a cache *is* helpful, as is DNSSEC. But for the other 6, a cache is a severe detriment.
About the above 2, can you explain *why* ? A bunch of people here, feel that it would be a great improvement, you keep saying it is doomsday, yet I haven't seen a concise explanation of why that would be (maybe I overlooked, apologies if so).
I disable the DNS cache in firefox with developer tools.
So you will be able to do the same by setting 1 configuration option in unbound, or you could disable the resolver entirely.
Can you tell why *everybody* should have the cache disabled by default ?
Additionally, a short TTL is good, for this situation, but it can't fix everything.
Paul mentioned the single configuration option needed to make your resolver tweak the TTL locally, what else do you need ? And again why your preference should be the default ? What compelling arguments can you make ?
Simo.
On Sun, 2014-04-13 at 02:53 -0400, Simo Sorce wrote:
On Sun, 2014-04-13 at 16:10 +0930, William Brown wrote:
A system wide resolver I am not opposed to. I am against a system wide *caching* resolver.
In this case, a cache *is* helpful, as is DNSSEC. But for the other 6, a cache is a severe detriment.
About the above 2, can you explain *why* ? A bunch of people here, feel that it would be a great improvement, you keep saying it is doomsday, yet I haven't seen a concise explanation of why that would be (maybe I overlooked, apologies if so).
I disable the DNS cache in firefox with developer tools.
So you will be able to do the same by setting 1 configuration option in unbound, or you could disable the resolver entirely.
Can you tell why *everybody* should have the cache disabled by default ?
Additionally, a short TTL is good, for this situation, but it can't fix everything.
Paul mentioned the single configuration option needed to make your resolver tweak the TTL locally, what else do you need ? And again why your preference should be the default ? What compelling arguments can you make ?
Simo.
Internal and external zone views in a business. These records may differ, and so would need flushing between network interface state changes.
Additionally, local DNS caches may cause issues and delay diagnosis.
It's also not *needed* in a lot of setups. The business cases were to show that these caching layers already exist on these networks. It would be duplication of effort.
In businesses, it's also commonplace to have a low-ish TTL (say 5 minutes), and when a system is migrated, they swap the A/AAAA records to the new system. The DNS servers on the network are updated, but the workstation has the old record cached. Without a local cache, they would query the local server again, which is relatively cheap, i.e. it keeps users happier even if they only needed to wait 5 minutes. Some people like things to be instant.
It's certainly not the end of the world, but it's adding more complexity, and a potential source of issues.
There is additionally some confusion: it sounds like Paul wants to configure the resolver to forward only queries for the local domain name to the local name servers. But it is impossible to discover all the local domain names that are available.
tl;dr - DNSSEC I believe is a good thing (Even if it's rare). I don't think there are "benefits" to caching except in a minor number of cases where existing DNS caching mechanisms aren't in place. We are adding a layer of caching complexity that doesn't solve a real problem.
On Sun, 2014-04-13 at 16:39 +0930, William Brown wrote:
PS: It also seemed like the proposal was to *bypass* the network's DHCP-provided forwarders. This *is* a serious issue if that's the case.
On Sun, 13 Apr 2014, William Brown wrote:
PS: It also seemed like the proposal was to *bypass* the network's DHCP-provided forwarders. This *is* a serious issue if that's the case.
We only bypass DHCP-provided forwarders that are broken. We actually WANT to use them as much as possible, because we DO believe in caching, and we DO want to use ISP caches wherever possible. But if they run broken or malicious DNS servers, we do our best to bypass those servers.
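Paul's policy can be read as a simple preference order. The Python sketch below is a hypothetical model of that policy, not dnssec-trigger's actual code: the probe results are plain booleans supplied by the caller, and the ordering (DHCP-provided servers first, then full recursion, then an SSL upstream on port 443) is assumed from what is described in this thread rather than taken from dnssec-trigger's source.

```python
from enum import Enum


class Mode(Enum):
    FORWARD_TO_DHCP = "forward queries to the DHCP-provided servers"
    FULL_RECURSION = "recurse from the root servers directly"
    DNS_OVER_SSL = "forward over SSL on port 443 to a fallback server"
    INSECURE = "no working DNSSEC path; ask the user"


def choose_mode(dhcp_servers_ok, direct_udp53_ok, ssl443_ok):
    """Pick how the local resolver should reach the DNS.

    Each argument is the (hypothetical) result of a probe: True if that path
    returned correct, DNSSEC-validatable answers.
    """
    if dhcp_servers_ok:
        # Preferred: reuse the network's caches whenever they behave.
        return Mode.FORWARD_TO_DHCP
    if direct_udp53_ok:
        # Broken or malicious forwarders are bypassed, not trusted.
        return Mode.FULL_RECURSION
    if ssl443_ok:
        # Port 53 is filtered (hotel/hotspot style); tunnel via 443.
        return Mode.DNS_OVER_SSL
    return Mode.INSECURE
```

The point of the ordering is the one Paul makes: a working DHCP forwarder always wins, so the network's own caches (and any internal zones they serve) are used whenever they can be trusted.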
Paul
William Brown wrote:
In businesses, it's also commonplace to have a low-ish TTL (say 5 minutes), and when a system is migrated, they swap the A/AAAA records to the new system. The DNS servers on the network are updated, but the workstation has the old record cached. Without a local cache, they would query the local server again, which is relatively cheap, i.e. it keeps users happier even if they only needed to wait 5 minutes. Some people like things to be instant.
If the admins on that network configured a five-minute TTL, it's because they *want* the clients to cache the records for five minutes. They can set the TTL to zero if they want the servers to be queried for every single lookup.
And if they're migrating a system they will know this more than five minutes in advance, so they can set a zero TTL temporarily.
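The TTL arithmetic behind this is easy to check. A minimal Python sketch, assuming every cache on the path honours the TTL (the thread notes some ISP caches do not); the function name is hypothetical. A cache that fetched the record just before the TTL was lowered keeps it for the old TTL, so the lowering has to happen at least that long before the migration; caches that fetched afterwards hold the old address for at most the new TTL.

```python
def worst_case_staleness(old_ttl, new_ttl, seconds_between_lowering_and_migration):
    """Longest time (seconds) a TTL-respecting cache may still return the old
    address after the migration."""
    # Fetched just before the TTL was lowered: held for old_ttl, of which this
    # much is still left when the migration happens.
    leftover_old = max(0, old_ttl - seconds_between_lowering_and_migration)
    # Fetched after the lowering (up to the moment of migration): held for at
    # most new_ttl seconds.
    return max(leftover_old, new_ttl)
```

With a 5-minute TTL lowered to 0 at least 5 minutes ahead, `worst_case_staleness(300, 0, 600)` is 0, so no TTL-respecting cache, local or on the LAN, returns the old address after the switch; leaving the TTL at 300 gives up to 300 seconds of staleness either way.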
william wrote:
[...] Internal and external zone views in a business. These records may differ, and so would need flushing between network interface state changes.
Sure, let's make it easy to restart the cache upon such transitions.
Additionally, local DNS caches may cause issues and delay diagnosis.
It's also not *needed* in a lot of setups. [...]
The cache-hit latency to a local daemon vs. a shared network daemon can be significantly different.
- FChE
DNS over SSL does NOT work - I get no connectivity whatsoever after following the below steps. Tracking bug at https://bugzilla.redhat.com/show_bug.cgi?id=1119050
Can you please tell me what I am doing wrong?
@ https://fedoraproject.org/wiki/Test_Day:2012-12-11_Network_Manager_and_DNSSE...
it says to do:
sudo yum install dnssec-trigger
sudo systemctl enable dnssec-triggerd.service
sudo systemctl enable unbound.service
sudo reboot
Then to get DNS over SSL
@ https://fedoraproject.org/wiki/QA:Testcase_DNS-over-SSL
it says to do:
sudo iptables -A OUTPUT -o lo -j ACCEPT
sudo iptables -A OUTPUT -p tcp --dport 53 -j DROP
sudo iptables -A OUTPUT -p udp --dport 53 -j DROP
Then we are supposed to click on re-probe.
And now there is no connectivity.
On Sun, 13 Jul 2014, quickbooks office wrote:
DNS over SSL does NOT work - I get no connectivity whatsoever after following the below steps. Tracking bug at https://bugzilla.redhat.com/show_bug.cgi?id=1119050
Can you please tell me what I am doing wrong?
There seems to be some regression with unbound causing packets to go out on port 53 instead of 443 when enabling ssl-upstream. I'm investigating and will run a bisect.
btw to test unbound without using firewall rules of dnssec-trigger, use:
sudo unbound-control forward_add . 80.239.156.220
sudo unbound-control set_option ssl-upstream: yes
This is basically what dnssec-trigger does in the fallback case.
Paul
On Sun, 2014-04-13 at 16:39 +0930, William Brown wrote:
On Sun, 2014-04-13 at 02:53 -0400, Simo Sorce wrote:
On Sun, 2014-04-13 at 16:10 +0930, William Brown wrote:
A system wide resolver I am not opposed to. I am against a system wide *caching* resolver.
In this case, a cache *is* helpful, as is DNSSEC. But for the other 6, a cache is a severe detriment.
About the above 2, can you explain *why* ? A bunch of people here, feel that it would be a great improvement, you keep saying it is doomsday, yet I haven't seen a concise explanation of why that would be (maybe I overlooked, apologies if so).
I disable the DNS cache in firefox with developer tools.
So you will be able to do the same by setting 1 configuration option in unbound, or you could disable the resolver entirely.
Can you tell why *everybody* should have the cache disabled by default ?
Additionally, a short TTL is good, for this situation, but it can't fix everything.
Paul mentioned the single configuration option needed to make your resolver tweak the TTL locally, what else do you need ? And again why your preference should be the default ? What compelling arguments can you make ?
Simo.
Internal and external zone views in a business. These records may differ, and so would need flushing between network interface state changes.
As mentioned unbound does flush away the specific domains, so I will write this one off. We can discuss the fine tuning for sure but it is not a general concern.
Additionally, local DNS caches may cause issues and delay diagnosis.
Just like almost everything, I will write this off as well, as I've seen a lack of caching cause the same difficult-to-diagnose issues (two collaborating applications getting different IPs for the same service, and inconsistencies between the two endpoints causing odd results).
So I do not think this is a generally valid concern, in the sense that the benefits outweigh the potential issues IMO.
It's also not *needed* in a lot of setups. The business cases were to show that these caching layers already exist on these networks. It would be duplication of effort.
Not really: it would reduce unnecessary traffic and give you a bit of server affinity for free, which is generally a good thing in most cases. Whether it is *needed* or not depends on the situation; however, my sense over many years is that a cache brings more benefits than not. I have had a lot more issues with flaky networks on machines without a cache than on those with a cache. Browsing in particular becomes erratic on networks with high packet loss or a flaky DNS server when you do not have a cache, because UDP packets are easily lost while TCP connections can retry and recover more quickly, so DNS is what causes the most issues and delays for the browser.
In businesses, it's also commonplace to have a low-ish TTL (say 5 minutes), and when a system is migrated, they swap the A/AAAA records to the new system. The DNS servers on the network are updated, but the workstation has the old record cached.
If the TTL is 5 minutes, the cache will expire in 5 minutes too.
Without a local cache, they would query the local server again, which is relatively cheap.
And they will do the same with a local cache; local caches *will* respect TTLs, of course!
i.e. it keeps users happier even if they only needed to wait 5 minutes. Some people like things to be instant.
And some people want unicorns; nobody prevents those people from disabling this default. My personal experience is that these are rare events that are not that important, and they can be properly handled by admins lowering the TTLs to very short timeouts, or even 0, in advance of a planned outage.
It's certainly not the end of the world, but it's adding more complexity, and a potential source of issues.
And also a source of benefits. As always it is a matter of balance, and with the advent of DNSSEC I personally think the balance has definitely tipped in favor of a *default* local resolver cache. (Note: *default* means it is not bolted on; you can easily replace it like you do other services. For example, the first thing I do on my machines is throw sendmail out the window and bring in postfix. It's not a big deal, and kickstarts make it trivial too.)
There is additionally some confusion: it sounds like Paul wants to configure the resolver to forward only queries for the local domain name to the local name servers. But it is impossible to discover all the local domain names that are available.
I think the default will be to forward everything to the DHCP provided nameservers, and forward to specific servers per specific domain on resources like VPN (unless otherwise configured, and there are already knobs for that).
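The default Simo describes amounts to a per-name routing decision: the most specific configured domain wins, and everything else goes to the DHCP-provided servers. The Python sketch below models that decision only; the function name, the `forward_zones` mapping and the example addresses are hypothetical, and in unbound itself this role is played by forward zones (e.g. the `unbound-control forward_add` command used elsewhere in this thread), not by this code.

```python
def pick_forwarders(name, dhcp_servers, forward_zones):
    """Return the upstream servers a local resolver would query for name.

    forward_zones maps a lowercase domain without a trailing dot (e.g. one
    pushed by a VPN) to its name servers. The most specific matching domain
    wins; anything else goes to the DHCP-provided servers.
    """
    labels = name.rstrip(".").lower().split(".")
    # Try "host.corp.example", then "corp.example", then "example".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in forward_zones:
            return forward_zones[candidate]
    return dhcp_servers
```

Matching on whole labels rather than raw string suffixes keeps `notcorp.example` from being sent to the servers for `corp.example`. With `{"corp.example": ["10.0.0.53"]}` configured by a VPN, `intranet.corp.example.` goes to 10.0.0.53 while `fedoraproject.org` still goes to the DHCP-provided servers.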
tl;dr - DNSSEC I believe is a good thing (Even if it's rare). I don't think there are "benefits" to caching except in a minor number of cases where existing DNS caching mechanisms aren't in place. We are adding a layer of caching complexity that doesn't solve a real problem.
I guess you do not travel much with a laptop on unreliable networks, there a local cache makes a big difference. I think it is a great default for workstations, and debatable for servers if it weren't for DNSSEC, which again makes it a good default for server too.
If you want machines on your network to not cache locally much, just change your local DNS servers to change TTLs to 0 or a very low amount, problem solved.
HTH, Simo.
On Sun, 13 Apr 2014, William Brown wrote:
Now can we go back to actually discussing technical arguments again?
Actually no.
This whole thread has forgotten one major thing ... use cases.
That was in response to someone using appeal of authority statements, not factual discussions.
Proposal is to add a local caching DNS server to fedora systems. This may or may not accept a DHCP provided forwarder(?).
Yes. It depends on the "trustworthiness" of the network and/or any preconfiguration for your own networks that you join.
Case 1: Standard home user. Has little knowledge of DNS, and a router provided by their ISP. DHCP provides the laptop with the DNS ip of the router, which then acts as a forwarder to the ISP.
Works reasonably well with unbound+dnssec-trigger; could use better NM integration for captive portals.
Case 2: Moderate home user. They have a little knowledge of DNS, and have setup a system like OpenWRT or gargoyle on their router. They have their own zone, .local . This means that their DHCP provides the DNS ip of the router to clients.
Same if their wifi is closed (e.g. WPA2); if their wifi is open, they will need an exception in NM for the .local forward.
Case 3: Power home user. They likely have their own fedora router server or some other system setup. They run their own bind / named instance on their network, with their own zone or two. They have DHCP setup, perhaps to use DDNS updates to named.
Same as above.
Case 4: Small business workstation. Likely the small business, like the power home user, has their own name server. It may be Windows DNS from AD, or bind.
Same as above.
Case 5: Medium / Large business workstation. It's nearly guaranteed that the business runs their own zones. They have their own extensive, well organised bind / named setup.
When connecting to their LAN or secure wifi, same as above for one forwarding zone. Multiple forwarding zones would need to be configured. If it is an enterprise, they might need their corporate CAs as well as their zone configuration, so a corporate rpm package would make sense.
Case 6: Fedora server in a small business: Same as the workstation, likely AD or bind in the office.
Same as previous
Case 7: Fedora server in the large business. Same as the workstation.
Same as previous
Case 8: Road warrior to the power home, small business, or large business. Uses VPN, and needs access to the DNS provided by the push dhcp / dns options from their vpn tunnel.
Same, already works if you only need the one domain that is negotiated via the VPN (eg the IKE XAUTH domain).
Now, in all of these cases the system local DNS cache *must* forward to the local DNS server. You can't at the OS distinguish between any of these cases with just the DHCP record or lease. It is unreasonable to ask everyone to manually setup DNS on every network they join. You must have the forwarder set to the DNS provided by DHCP so that you can access the local network resources. You cannot suggest bypassing these.
We are not suggesting that for LAN or secure wifi. In those cases the forward will be added. However, you don't want those forwards for open wifi, or else I can bring up a "linksys" network, push you a forward for your internal.domain.com, and mislead you into thinking you would be going over your VPN.
Case 1: The user doesn't know much about DNS. The ISP might be reliable or unreliable. If we assume, as discussed, that the cache is flushed on network change, they will have an empty cache.
The cache is never fully flushed. It is only flushed for the domain obtained via DHCP or VPN, because those entries can change. They are not changed for anything else. If the upstream ISP could have spoofed them, so be it - the publisher of the domains could have used DNSSEC to prevent that from happening.
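The selective flush described here can be sketched as dropping only the cached names at or below the DHCP- or VPN-supplied domain and leaving everything else alone. The helpers `in_zone` and `flush_zones` below are hypothetical illustrations operating on a plain dict, not unbound's internals; with unbound itself, `unbound-control flush_zone <name>` removes cached data at or below a name.

```python
from typing import Dict, Iterable


def in_zone(name: str, zone: str) -> bool:
    """True if name equals zone or lies below it (case-insensitive, trailing dot ignored)."""
    name = name.rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    # Compare on a label boundary so "notcorp.example.com" is not inside "corp.example.com".
    return name == zone or name.endswith("." + zone)


def flush_zones(cache: Dict[str, str], zones: Iterable[str]) -> int:
    """Remove cached names under any of the given zones; return how many were removed."""
    zone_list = list(zones)
    doomed = [n for n in cache if any(in_zone(n, z) for z in zone_list)]
    for n in doomed:
        del cache[n]
    return len(doomed)
```

When a VPN announcing corp.example.com comes up, only names under that domain are forgotten; an entry such as www.google.com stays cached until its TTL expires.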
With a good ISP, they will get consistent answers to queries, and there is no point to having the cache. If the ISP is unreliable, they will get records that are incorrect (see broken TTLs etc),
There is no such thing as "broken TTLs". And there is no modern nameserver that believes or honours TTLs of months. The unbound default is cache-max-ttl: 86400. Nothing will be cached for more than one day regardless of the TTL received. Again, if a publisher of a zone wants ISPs to keep their hands off their records, they should use DNSSEC and sign their zone.
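To make the TTL argument concrete, here is a minimal Python sketch of a cache that honours the TTL it receives, caps it at 86400 seconds (the unbound cache-max-ttl default cited above), and never stores an answer whose TTL is 0. The class `TtlCache` is a hypothetical illustration, not unbound's code; a real resolver caches whole RRsets, not strings.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Cap cited in the thread as unbound's default cache-max-ttl (seconds).
CACHE_MAX_TTL = 86400


class TtlCache:
    """Hypothetical name -> answer cache with TTL capping (illustration only)."""

    def __init__(self, max_ttl: int = CACHE_MAX_TTL,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_ttl = max_ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, answer: str, ttl: int) -> None:
        # Never keep a record longer than max_ttl, whatever the upstream TTL says.
        effective = min(ttl, self.max_ttl)
        if effective <= 0:
            # TTL 0 means "do not cache": drop any older copy and store nothing.
            self._entries.pop(name, None)
            return
        self._entries[name] = (answer, self.clock() + effective)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        answer, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: forget it so the next lookup goes upstream again.
            del self._entries[name]
            return None
        return answer
```

With an injected fake clock, a record published with a 30-day TTL is gone after one day, and a record published with TTL 0 is never served from the cache, which is the "use ttl=0" escape hatch mentioned in the thread.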
Case 2: The user does know a bit. But when they change name records they may not be able to solve why a workstation can't resolve names like other clients.
While we could flush the entire cache on (dis)connect, I think that's rather drastic for this kind of odd use-case. If the user runs their own zone and their own records, they should know about DNS and TTLs. But even so, NM could offer an option to flush the DNS cache.
Case 3: This user does understand DNS, and they don't need DNS cache.
That depends. You need caching for DNSSEC validation, so really, every device needs a cache, unless you want to outsource your DNSSEC validation over an insecure transport (LAN). That seems like a very bad idea.
They have bind / named setup, and they would like to rely on that instead.
They can. DNS caches are chained. There is no reason to say you cannot run your own cache and have a network based cache.
When they change records in their local zones, they don't want to have to flush caches etc. If their ISP is unreliable, or their own DNS is unreliable, a DNS cache will potentially mask this issue delaying them from noticing / solving the problem.
This is becoming really contrived. Again, if you think this is a real scenario (I don't think it is), then you could run unbound with ttl=0. But requiring it to automagically understand what a local zone is, and to automagically know when a remote authoritative DNS server changes data, while not being willing to enforce that with ttl=0, and using that as an argument against any unbound-based solution for providing a security feature (DNSSEC), is getting a little unrealistic. If you want your laptop to start validating TLSA and SSHFP and OPENPGPKEY records, you need DNSSEC validation on the device. The question should be "how do you change your network requirements to meet that goal?". Yes, enforcing security comes at a price.
Let me use your scenario based on TLS. You want to be able to change your TLS certificates and the private CA you regenerate at any time, without any browser on your network ever giving you a popup warning. You know you cannot ask this - it goes against the security model. The same applies for DNS with DNSSEC. The security demands we need to do validation and caching and we should try to make that as flexible and painless as possible.
Case 4, 5, 6 and 7: DNS cache, again, isn't needed.
Again, DNSSEC validation on the device requires caching.
The infrastructure is well set up, and caching is done by the business servers. DNS outages at the business level mean there are other issues, and they will likely be resolved quickly. You don't want to reboot / reset interfaces every time you make a change or as the first response to an issue (again, this would give Fedora a bad name). DNS caching may mask a bigger problem.
I don't really understand this paragraph.
Case 8: VPNs are a bit unreliable and have relatively high(ish) latency. But mostly they are quite good, e.g. openvpn. A DNS cache *might* help here in case of traffic loss. Again, this would be masking a greater issue, though, and could be better solved with TCP DNS queries rather than UDP.
The VPN cases already work very well in Fedora. I seamlessly connect and disconnect from the redhat VPN. Resources that are available only via the VPN are never blocked by wrong DNS cache I got from when the VPN was down. VPNs are a non-issue.
In conclusion, I don't perceive a DNS cache in Fedora to be a good idea, as it solves few real-world problems and may in fact create issues, mask issues, and create a bad stigma about Fedora network reliability. If it is to become available to users, I would like:
I believe you will need to re-think that in light of running a validating DNSSEC resolver on your laptop or servers.
- DNS cache is not the default. It must be enabled on a connection (i.e.
users in case 1 can enable it if needed)
- DNS cache should be able to be enabled from the NM GUI
- DNS cache should be able to be flushed live from the NM GUI
- DNS cache should be flushed on route or interface state change.
- If two interfaces are active, the default route DNS cache setting
takes precedence.
You cannot separate dns cache from DNSSEC. DNS caching is not a problem, it is a feature. If you don't want your records cached, use ttl=0.
Paul
On 13.04.2014 03:07, Paul Wouters wrote:
On Sun, 13 Apr 2014, William Brown wrote:
When they change records in their local zones, they don't want to have to flush caches etc. If their ISP is unreliable, or their own DNS is unreliable, a DNS cache will potentially mask this issue delaying them from noticing / solving the problem.
This is becoming really contrived. Again, if you think this is a real scenario (I don't think it is), then you could run unbound with ttl=0
I would run BIND and not unbound in any case, so what now? Would you pull in unbound as a dependency for me?
But requiring it to automagically understand what a local zone is, and to automagically know when a remote authoritative DNS server changes data, while not being willing to enforce that with ttl=0, and using that as an argument against any unbound-based solution for providing a security feature (DNSSEC), is getting a little unrealistic. If you want your laptop to start validating TLSA and SSHFP and OPENPGPKEY records, you need DNSSEC validation on the device. The question should be "how do you change your network requirements to meet that goal?". Yes, enforcing security comes at a price.
Come on, it is *not* a security feature to have a local resolver that may bypass my DHCP-provided DNS, which may be the only one with the correct DNS view.
If you ask it anyway, the result can't be more secure than asking it directly; if you don't, you're breaking the real world.
In other words: if you are on an untrustworthy LAN, you cannot make it more trustworthy without a good chance of breaking things on trustworthy ones.
Let me use your scenario based on TLS. You want to be able to change your TLS certificates and the private CA you regenerate at any time, without any browser on your network ever giving you a popup warning. You know you cannot ask this - it goes against the security model. The same applies for DNS with DNSSEC. The security demands we need to do validation and caching and we should try to make that as flexible and painless as possible
uhm no - there is a CA signed root zone -> signed TLD -> signed domain
And if you believe that on an untrustworthy network you don't know whether you get the signing information at all, fine, but you can hardly enforce that with local software.
If I control the network, I control all the traffic, and without your own satellite link you can't change that.
Case 4, 5, 6 and 7: DNS cache, again, isn't needed.
Again, DNSSEC validation on the device requires caching.
The question is whether I gain anything by doing it on the end device.
The infrastructure is well set up, and caching is done by the business servers. DNS outages at the business level mean there are other issues, and they will likely be resolved quickly. You don't want to reboot / reset interfaces every time you make a change or as the first response to an issue (again, this would give Fedora a bad name). DNS caching may mask a bigger problem.
I don't really understand this paragraph.
Have fun debugging the DNS troubles of a road warrior on your network without realizing that he brings his own DNS server.
In conclusion, I don't perceive a DNS cache in Fedora to be a good idea, as it solves few real-world problems and may in fact create issues, mask issues, and create a bad stigma about Fedora network reliability. If it is to become available to users, I would like:
I believe you will need to re-think that in light of running a validating DNSSEC resolver on your laptop or servers.
no
- DNS cache is not the default. It must be enabled on a connection (i.e.
users in case 1 can enable it if needed)
- DNS cache should be able to be enabled from the NM GUI
- DNS cache should be able to be flushed live from the NM GUI
- DNS cache should be flushed on route or interface state change.
- If two interfaces are active, the default route DNS cache setting
takes precedence.
You cannot separate dns cache from DNSSEC. DNS caching is not a problem, it is a feature. If you don't want your records cached, use ttl=0
The cache is already running in my LAN for good reasons. That DNS cache is pushed with DHCP. That DNS cache already does DNSSEC validation.
If you don't trust the network itself, you are lost anyway.
On Sun, 2014-04-13 at 06:35 +0200, Reindl Harald wrote:
And if you believe that on an untrustworthy network you don't know whether you get the signing information at all, fine, but you can hardly enforce that with local software.
That is the WHOLE point of DNSSEC, really.
If I control the network, I control all the traffic, and without your own satellite link you can't change that.
It is not necessary; integrity protection allows me to find out whether my traffic is getting through or not. If not, tough luck.
Case 4, 5, 6 and 7: DNS cache, again, isn't needed.
Again, DNSSEC validation on the device requires caching.
The question is whether I gain anything by doing it on the end device.
If you need to ask this question I think you missed the whole point of DNSSEC.
Have fun debugging the DNS troubles of a road warrior on your network without realizing that he brings his own DNS server.
Caching DNS is common on many OSs, it is nothing special really.
In conclusion, I don't perceive a DNS cache in Fedora to be a good idea, as it solves few real-world problems and may in fact create issues, mask issues, and create a bad stigma about Fedora network reliability. If it is to become available to users, I would like:
I believe you will need to re-think that in light of running a validating DNSSEC resolver on your laptop or servers.
no
Oh yeah, you have to. You may decide you do not like it. That is fine, but DNSSEC does change things slightly and, IMHO, for the better.
- DNS cache is not the default. It must be enabled on a connection (i.e.
users in case 1 can enable it if needed)
Just to answer William too, I do not think you brought any evidence that this would not be a good default. On the contrary, in the base OS we've felt the need for it for a long time now. It is a pain to do without a decent cache, and nscd is not it!
- DNS cache should be able to be enabled from the NM GUI
A way to toggle it on and off is not a bad idea, but I do not think it needs to be a requirement. Turning off the unbound service is not exactly difficult.
- DNS cache should be able to be flushed live from the NM GUI
I totally agree there should be a way to flush the cache; I am not sure it makes sense to have it in the GUI, but certainly nmcli (or some other command) should offer it.
- DNS cache should be flushed on route or interface state change.
I do not see why; the only reason to flush a cache is when there is a DNS change (new interface, e.g. a VPN coming up or going away).
- If two interfaces are active, the default route DNS cache setting
takes precedence.
This would break VPNs, which are often not the default route. What you want is to send arbitrary queries to the default DNS, and have forwarders per domain for special interfaces like VPNs (unless in the configuration you establish that all DNS traffic should go over the VPN when you connect).
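The scheme proposed here (default servers for arbitrary queries, per-domain forwarders for special interfaces like VPNs) amounts to a longest-suffix match on the query name. A minimal Python sketch, assuming a hypothetical `forward_zones` mapping built from DHCP/VPN data; the function name and the addresses are illustrative only, not NetworkManager or unbound code.

```python
from typing import Dict, List


def choose_upstreams(qname: str, default: List[str],
                     forward_zones: Dict[str, List[str]]) -> List[str]:
    """Pick upstream servers: the most specific matching forward zone, else the default."""
    zones = {z.rstrip(".").lower(): servers for z, servers in forward_zones.items()}
    labels = qname.rstrip(".").lower().split(".")
    # Try the longest suffix first: a.corp.example.com, corp.example.com, example.com, com
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in zones:
            return zones[suffix]
    return default
```

Only zones explicitly present in the mapping are diverted; every other name goes to the default (DHCP-provided) servers, and a more specific zone wins over a less specific one.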
You cannot separate dns cache from DNSSEC. DNS caching is not a problem, it is a feature. If you don't want your records cached, use ttl=0
The cache is already running in my LAN for good reasons
That's a different cache, however if you feel strongly you will be able to turn off the local caching dns server on your machines, at your option.
that DNS cache is pushed with DHCP
Forwarders are pushed via DHCP, not caches.
that DNS cache already does DNSSEC validation
Which is useless in the *general* case. You may think your physical security is perfect, that's great, but for everybody else, trusting the network is not ok; that's why more and more people deploy TLS or GSSAPI in internal networks too. The era of the clear-text trusted private network is coming to an end, whether you like it or not.
If you don't trust the network itself, you are lost anyway.
Let me troll a bit, this is why you do all your banking without HTTPS ? :-)
I am strongly in favor of a DNS cache on Fedora, and I would even seriously consider any proposal of making it the default on Fedora Server too.
Regards, Simo.
On 13.04.2014 08:42, Simo Sorce wrote:
- DNS cache should be flushed on route or interface state change.
I do not see why; the only reason to flush a cache is when there is a DNS change (new interface, e.g. a VPN coming up or going away).
Because if I change my routing from ISP to VPN, I want to access the company servers over the VPN, all of them.
Changing the default route is a common way to make such a switch.
The cache is already running in my LAN for good reasons
That's a different cache, however if you feel strongly you will be able to turn off the local caching dns server on your machines, at your option.
that DNS cache is pushed with DHCP
Forwarders are pushed via DHCP, not caches
Says who? You, or rather the one who built the network and services?
The DNS servers pushed via DHCP are caches because they do not forward anything; they do recursion. If your DNS servers are only forwarders, consider changing that.
Frankly, the main reason I stepped into this thread at all is that people started talking about recursion / forwarding without understanding both terms in the context of DNS.
that DNS cache already does DNSSEC validation
Which is useless in the *general* case. You may think your physical security is perfect, that's great, but for everybody else, trusting the network is not ok; that's why more and more people deploy TLS or GSSAPI in internal networks too. The era of the clear-text trusted private network is coming to an end, whether you like it or not.
If you don't trust the network itself, you are lost anyway.
Let me troll a bit, this is why you do all your banking without HTTPS ? :-)
That is a completely different story: you enter an HTTPS URL manually, or it is triggered by HSTS, so you request an encrypted connection from the very start.
In the case of DNS, nothing is encrypted when resolution starts, and if I properly manipulate the network you are on, I can hide any DNSSEC response from you (deep packet inspection).
I am strongly in favor of a DNS cache on Fedora, and I would even seriously consider any proposal of making it the default on Fedora Server too
As long as it is not a hard-wired dependency... I don't need additional DNS servers on any system.
The systems running BIND are doing that for good reasons.
The systems running Unbound as a local cache are doing that for good reasons (MTA servers).
The systems running dnsmasq are doing it for good reasons (reverse proxy with its own DNS view).
Proposal is to add a local caching DNS server to fedora systems. This may or may not accept a DHCP provided forwarder(?).
Yes. It depends on the "trustworthiness" of the network and/or any preconfiguration for your own networks that you join.
Not really: Every network you join, you have to semi-trust. If you don't trust it, why did you join it?
There are too many cases where the network admin is correct, and does the right thing. It's more the exception I think that the network is truly untrustworthy.
Case 1: Standard home user. Has little knowledge of DNS, and a router provided by their ISP. DHCP provides the laptop with the DNS ip of the router, which then acts as a forwarder to the ISP.
Works reasonably well with unbound+dnssec-trigger; could use better NM integration for captive portals.
But you can't account for every captive portal in the world. This is why the cache is a bad idea, because you can't possibly account for every system that is captive like this.
Case 2: Moderate home user. They have a little knowledge of DNS, and have setup a system like OpenWRT or gargoyle on their router. They have their own zone, .local . This means that their DHCP provides the DNS ip of the router to clients.
Same if their wifi is closed (e.g. WPA2); if their wifi is open, they will need an exception in NM for the .local forward.
What if I call my network .concrete. Or .starfish. Or any other weird thing I have seen on personal networks. Again, you cannot bypass the local network DNS as the forwarder. You must respect it.
Case 3: Power home user. They likely have their own fedora router server or some other system setup. They run their own bind / named instance on their network, with their own zone or two. They have DHCP setup, perhaps to use DDNS updates to named.
Same as above.
Case 4: Small business workstation. Likely the small business, like the power home user, has their own name server. It may be Windows DNS from AD, or bind.
Same as above.
Case 5: Medium / Large business workstation. It's nearly guaranteed that the business runs their own zones. They have their own extensive, well organised bind / named setup.
When connecting to their LAN or secure wifi, same as above for one forwarding zone. Multiple forwarding zones would need to be configured. If it is an enterprise, they might need their corporate CAs as well as their zone configuration, so a corporate rpm package would make sense.
How do you plan to make this work? You can't magically discover all the DNS zones hosted in an enterprise. At my work we run nearly 100 zones, and they are all rooted at different points (i.e. a.com, b.com, c.com). You cannot assume a business has just "a.com" and that you can forward all queries for subtree.a.com to that network server.
Again, you *must* respect the DHCP provided DNS server as the forwarder else you will savagely break things.
Case 6: Fedora server in a small business: Same as the workstation, likely AD or bind in the office.
Same as previous
Case 7: Fedora server in the large business. Same as the workstation.
Same as previous
Case 8: Road warrior to the power home, small business, or large business. Uses VPN, and needs access to the DNS provided by the push dhcp / dns options from their vpn tunnel.
Same, already works if you only need the one domain that is negotiated via the VPN (eg the IKE XAUTH domain).
You can negotiate more than one domain on a VPN .... again, see above.
Now, in all of these cases the system local DNS cache *must* forward to the local DNS server. You can't at the OS distinguish between any of these cases with just the DHCP record or lease. It is unreasonable to ask everyone to manually setup DNS on every network they join. You must have the forwarder set to the DNS provided by DHCP so that you can access the local network resources. You cannot suggest bypassing these.
We are not suggesting that for LAN or secure wifi. In those cases the forward will be added. However, you don't want those forwards for open wifi, or else I can bring up a "linksys" network, push you a forward for your internal.domain.com, and mislead you into thinking you would be going over your VPN.
This is a more serious problem than a caching resolver could hope to solve, as it shows malicious intent.
Case 1: The user doesn't know much about DNS. The ISP might be reliable or unreliable. If we assume, as discussed, that the cache is flushed on network change, they will have an empty cache.
The cache is never fully flushed. It is only flushed for the domain obtained via DHCP or VPN, because those entries can change. They are not changed for anything else. If the upstream ISP could have spoofed them, so be it - the publisher of the domains could have used DNSSEC to prevent that from happening.
No no no!!!! You need to flush *all* entries. Consider what I resolve www.google.com to. That changes *per* ISP because google provides different DNS endpoints and zones to ISPs to optimise traffic! So when I use google at work, I'm now getting a suboptimal route to their servers!
With a good ISP, they will get consistent answers to queries, and there is no point to having the cache. If the ISP is unreliable, they will get records that are incorrect (see broken TTLs etc),
There is no such thing as "broken TTLs". And there is no modern nameserver that believes or honours TTLs of months. The unbound default is cache-max-ttl: 86400. Nothing will be cached for more than one day regardless of the TTL received. Again, if a publisher of a zone wants ISPs to keep their hands off their records, they should use DNSSEC and sign their zone.
So that's a valid point: A non-caching unbound that caps TTLs is a good idea, but as you say, you can't stop a dodgy ISP.
Case 2: The user does know a bit. But when they change name records they may not be able to solve why a workstation can't resolve names like other clients.
While we could flush the entire cache on (dis)connect, I think that's rather drastic for this kind of odd use-case. If the user runs their own zone and their own records, they should know about DNS and TTLs. But even so, NM could offer an option to flush the DNS cache.
But this isn't even an odd use case. There are enough power users in the world who do this. It's not just computer enthusiasts, I know a chemist who did this, and others. You can't just assume a generic case, and then break it for others.
Case 3: This user does understand DNS, and they don't need DNS cache.
That depends. You need caching for DNSSEC validation, so really, every device needs a cache, unless you want to outsource your DNSSEC validation over an insecure transport (LAN). That seems like a very bad idea.
If your lan is insecure, you have other issues. That isn't the problem you are trying to solve.
They have bind / named setup, and they would like to rely on that instead.
They can. DNS caches are chained. There is no reason to say you cannot run your own cache and have a network based cache.
But you don't *need* it. I went to the effort of setting up my own bind to cache; I shouldn't need it on my system. Again, local caches cause all kinds of issues. A home user is likely to toy with things and set a high-ish TTL, say even 10 minutes, and change records on their server. Then their records appear broken, because the local cache hasn't expired yet.
When they change records in their local zones, they don't want to have to flush caches etc. If their ISP is unreliable, or their own DNS is unreliable, a DNS cache will potentially mask this issue delaying them from noticing / solving the problem.
This is becoming really contrived. Again, if you think this is a real scenario (I don't think it is), then you could run unbound with ttl=0. But requiring it to automagically understand what a local zone is, and to automagically know when a remote authoritative DNS server changes data, while not being willing to enforce that with ttl=0, and using that as an argument against any unbound-based solution for providing a security feature (DNSSEC), is getting a little unrealistic. If you want your laptop to start validating TLSA and SSHFP and OPENPGPKEY records, you need DNSSEC validation on the device. The question should be "how do you change your network requirements to meet that goal?". Yes, enforcing security comes at a price.
It's not contrived: this is a common network setup among all the enthusiasts I know, and it is how they set up their home networks. This is why it's a use case.
Let me use your scenario based on TLS. You want to be able to change your TLS certificates and the private CA you regenerate at any time, without any browser on your network ever giving you a popup warning. You know you cannot ask this - it goes against the security model. The same applies for DNS with DNSSEC. The security demands we need to do validation and caching and we should try to make that as flexible and painless as possible.
The issue is that by adding DNSSEC in this way, you are going to cause a great deal of pain because of these caches. Add DNSSEC, but if you need to cache, cache for the minimum time possible.
Case 4, 5, 6 and 7: DNS cache, again, isn't needed.
Again, DNSSEC validation on the device requires caching.
The infrastructure is well set up, and caching is done by the business servers. DNS outages at the business level mean there are other issues, and they will likely be resolved quickly. You don't want to reboot / reset interfaces every time you make a change or as the first response to an issue (again, this would give Fedora a bad name). DNS caching may mask a bigger problem.
I don't really understand this paragraph.
It's linked to the other cases. It's the point that local system caches aren't needed as you have access to highly reliable DNS systems.
Additionally, business networks are "trusted" so you can trust their DNS caches etc. (to a point)
Case 8: VPNs are a bit unreliable and have relatively high(ish) latency. But mostly they are quite good, e.g. openvpn. A DNS cache *might* help here in case of traffic loss. Again, this would be masking a greater issue, though, and could be better solved with TCP DNS queries rather than UDP.
The VPN cases already work very well in Fedora. I seamlessly connect and disconnect from the redhat VPN. Resources that are available only via the VPN are never blocked by wrong DNS cache I got from when the VPN was down. VPNs are a non-issue.
Consider a business with external and internal DNS zones. This becomes an issue in that case. If you have cached, say, "website.example.com" with the external IP, and that host is DMZed somehow on the internal network, then when you switch to the VPN you need to use the internal view of that zone instead. But you can't: the name is cached.
In conclusion, I don't perceive a DNS cache in Fedora to be a good idea, as it solves few real-world problems and may in fact create issues, mask issues, and create a bad stigma about Fedora network reliability. If it is to become available to users, I would like:
I believe you will need to re-think that in light of running a validating DNSSEC resolver on your laptop or servers.
- DNS cache is not the default. It must be enabled on a connection (i.e.
users in case 1 can enable it if needed)
- DNS cache should be able to be enabled from the NM GUI
- DNS cache should be able to be flushed live from the NM GUI
- DNS cache should be flushed on route or interface state change.
- If two interfaces are active, the default route DNS cache setting
takes precedence.
You cannot separate dns cache from DNSSEC. DNS caching is not a problem, it is a feature. If you don't want your records cached, use ttl=0.
No, cache is not a feature. It's a chronic issue. Look at Windows systems: service desks around the world always advise that the first step is to reboot. Why? To flush DNS caches (or other things). When you can't get to a website? Restart the web browser to flush the cache. Intermittent network issues for different people on a network? The cache is allowing some people to work, but masking the issue from them. It's not allowing people to quickly and effectively isolate issues.
DNSSEC is a good idea: Caches are a problem.
If this really is to be used, I cannot stress enough, that a cache must be completely flushed every time the default route or network interface changes. You can't, and I can't possibly conceive every network setup in the world. If you make assumptions like this, systems will break and fedora will be blamed.
William Brown wrote:
The cache is never fully flushed. It is only flushed for the domain obtained via DHCP or VPN, because those entries can change. They are not changed for anything else. If the upstream ISP could have spoofed them, so be it - the publisher of the domains could have used DNSSEC to prevent that from happening.
No no no!!!! You need to flush *all* entries. Consider what I resolve www.google.com to. That changes *per* ISP because google provides different DNS endpoints and zones to ISPs to optimise traffic! So when I use google at work, I'm now getting a suboptimal route to their servers!
You'll still reach Google, but you'll get a suboptimal route for up to five minutes – provided that you managed to go from home to work and reconnect your laptop in less than five minutes. Big deal.
I just looked up www.google.com to check, and I got a TTL of 300 seconds on both A and AAAA records.
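The bound in this reply can be written out: an answer cached just before the network switch can be reused for at most the remainder of its TTL, and not at all if the trip took longer than the TTL. A trivial Python sketch; `stale_window` is a hypothetical helper name.

```python
def stale_window(ttl: int, seconds_since_cached: float) -> float:
    """Seconds a cached answer may still be served after a network switch."""
    # The cached copy expires ttl seconds after it was stored, so whatever
    # remains of the TTL at switch time is the longest it can be reused.
    return max(0.0, float(ttl) - seconds_since_cached)
```

With the 300-second TTL observed for www.google.com, reconnecting two minutes after the lookup leaves at most 180 seconds of a suboptimal route, and a half-hour commute leaves none.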
You need caching for DNSSEC validation, so really, every device needs a cache, unless you want to outsource your DNSSEC validation over an insecure transport (LAN). That seems like a very bad idea.
If your LAN is insecure, you have other issues. That isn't the problem you are trying to solve.
If admins want to set up firewalls, link-layer encryption, intrusion detection and stuff in an attempt to keep all adversaries out of their LAN, and then have the security of servers and workstations depend on the guarantee that the LAN is secure, then they should have to explicitly configure each computer to trust everybody on the LAN. Fedora can *not* assume that it will only ever be connected to secure, isolated networks.
A home user is likely to toy with things and set a high-ish TTL, say even 10 minutes, and change records on their server. Then their records appear broken, because the local cache hasn't expired yet.
The kind of user who runs their own DNS at home and tinkers with settings like that, is the kind of person who will learn from the experience and will thereafter know what DNS caching is.
Intermittent network issues for different people on a network? The cache is allowing some people to work, but masking the issue to them. It's not allowing people to quickly and effectively isolate issues.
You keep repeating this argument, as if it's somehow a bad thing that people can continue to work even when the DNS servers have a temporary problem. To me it sounds more like an argument for why the network admins should disable the cache on their own workstations and leave it enabled on everybody else's, so that the admins will be the first to discover a problem; and that translates to an argument for having a cache by default.
On Sun, 13 Apr 2014, William Brown wrote:
Yes. It depends on the "trustworthiness" of the network and/or the preconfiguration of your own networks that you join.
Not really: Every network you join, you have to semi-trust. If you don't trust it, why did you join it?
You don't always control which networks your device roams on. If I agree to join the "starbucks" network on my street, my phone will connect to any network named "starbucks", even if it is yours. So to mark when the user is _knowingly_ joining a network, we drew the line at "plugged in physically or provided the authentication credentials".
Works reasonably well with unbound+dnssec-trigger; could use better NM integration for captive portals.
But you can't account for every captive portal in the world. This is why the cache is a bad idea, because you can't possibly account for every system that is captive like this.
Yes we can by monitoring for "captivity signs" when a new network is joined. Again, please yum install dnssec-trigger on your laptop and start the dnssec-trigger applet once, and go have a coffee outside. Let us know your experience.
Case 2: Moderate home user. They have a little knowledge of DNS, and have set up a system like OpenWrt or Gargoyle on their router. They have their own zone, .local. This means that their DHCP provides the DNS IP of the router to clients.
Same as above if their WiFi is closed (e.g. WPA2); if their WiFi is open, NM will need an exception for the .local forward.
What if I call my network .concrete. Or .starfish. Or any other weird thing I have seen on personal networks. Again, you cannot bypass the local network DNS as the forwarder. You must respect it.
We will! If your DHCP has:
option domain-name-servers 10.1.2.3;
option domain-name "starfish";
Then unbound would get a forward configured to use 10.1.2.3 for the domain .starfish, basically calling:
sudo unbound-control forward_add starfish 10.1.2.3
sudo unbound-control flush starfish
sudo unbound-control flush_requestlist
When you leave the network, forward_remove is called.
sudo unbound-control forward_remove starfish
sudo unbound-control flush starfish
sudo unbound-control flush_requestlist
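The connect/disconnect cycle above could be wrapped in a hook like the following. This is a sketch, not NetworkManager's or dnssec-trigger's actual script: the function names and the `UNBOUND_CONTROL` override (set it to `echo unbound-control` for a dry run) are hypothetical. It also swaps `flush` for `flush_zone`: `unbound-control flush NAME` removes only the records for NAME itself, whereas `flush_zone NAME` removes everything at or below NAME, which is closer to forgetting a whole internal domain. Run it as root.

```sh
#!/bin/sh
# Hypothetical network up/down hook for a local unbound.
# Set UNBOUND_CONTROL="echo unbound-control" to print instead of run.
: "${UNBOUND_CONTROL:=unbound-control}"

# dhcp_forward_up DOMAIN SERVER...: forward DOMAIN to the DHCP-supplied
# servers and drop anything cached for it from a previous network.
dhcp_forward_up() {
    up_domain=$1; shift
    $UNBOUND_CONTROL forward_add "$up_domain" "$@"
    $UNBOUND_CONTROL flush_zone "$up_domain"
    $UNBOUND_CONTROL flush_requestlist
}

# dhcp_forward_down DOMAIN: stop forwarding DOMAIN and flush it again so
# internal-only answers do not outlive the connection.
dhcp_forward_down() {
    down_domain=$1
    $UNBOUND_CONTROL forward_remove "$down_domain"
    $UNBOUND_CONTROL flush_zone "$down_domain"
    $UNBOUND_CONTROL flush_requestlist
}
```

For example, `dhcp_forward_up starfish 10.1.2.3` on connect and `dhcp_forward_down starfish` on disconnect.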
When connecting to their LAN or secure WiFi, same as above for one forwarding zone. Multiple forwarding zones would need to be configured. If it is an enterprise, they might need their corporate CAs as well as their zone configuration, so a corporate RPM package would make sense.
How do you plan to make this work? You can't magically discover all the DNS zones hosted in an enterprise. At my work we run nearly 100 zones, and they are all rooted at different points (i.e. a.com, b.com, c.com). You cannot assume a business has just "a.com" and that you can forward all queries for subtree.a.com to that network server.
If you are that large a business, you should really have a corporate build rpm package with your enterprise information such as local CA, local zones, etc. DNS forwarder zones can be dropped into /etc/unbound/*.d/ currently. I would expect we would make this software neutral via NM integration, where an NM unbound plugin would use those directories. We could add a per-network option that specifies to use a forward for "." (everything) instead of just the DHCP specified domain, or perhaps even do this for trusted (see above) networks.
However, that should not be the default for open wifi networks for security reasons.
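For the corporate-package route, the forwards could ship as a drop-in file of ordinary unbound `forward-zone:` clauses. In this sketch the helper name `write_forward_zones` and the target path are assumptions; check which `include:` lines your `unbound.conf` actually contains before picking a directory. unbound reads such files at startup, so the running daemon needs a reload, or the same zones can be added live with `unbound-control forward_add`.

```sh
#!/bin/sh
# write_forward_zones FILE SERVER ZONE... (hypothetical helper): write one
# unbound forward-zone clause per ZONE, each pointing at SERVER.
write_forward_zones() {
    wf_file=$1; wf_server=$2; shift 2
    : > "$wf_file"  # truncate first, so re-running does not duplicate zones
    for wf_zone in "$@"; do
        printf 'forward-zone:\n    name: "%s"\n    forward-addr: %s\n\n' \
            "$wf_zone" "$wf_server" >> "$wf_file"
    done
}

# Usage, e.g. from a corporate RPM's %post (the path is an assumption):
#   write_forward_zones /etc/unbound/conf.d/corp-zones.conf 10.1.2.3 \
#       a.com b.com c.com
```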
Again, you *must* respect the DHCP provided DNS server as the forwarder else you will savagely break things.
And not doing anything will cause people to have insecure DNS. So I think the question should be turned around a little bit. There is a need for DNSSEC on the end nodes - how can we best facilitate that while trying to be as supportive of current deployments as we can be? That is what we are trying to do. If you only counter with "I require insecure DNS for my network to function" or "all cache is evil", then you are not open-minded enough to the realities of the requirement of DNSSEC support.
Same; this already works if you only need the one domain that is negotiated via the VPN (e.g. the IKE XAUTH domain).
You can negotiate more than one domain on a VPN... again, see above.
Not with IPsec/XAUTH. If more domains can come in via OpenVPN or something, then I would assume the existing OpenVPN unbound plugin already deals with that case. If not, please file a bug and we will fix it.
We are not suggesting that for LAN or secure WiFi. In those cases the forward will be added. However, you don't want those forwards for open WiFi, or else I can bring up a "linksys" network, push you a forward for your internal.domain.com and mislead you into thinking you would be going over your VPN.
This is a more serious problem than a caching resolver could hope to solve, as it shows malicious intent.
I'm sorry I don't understand what you are trying to say here.
Case 1: The user doesn't know much about DNS. The ISP might be reliable or unreliable. If we assume as discussed that the cache is flushed on network change, they will have an empty cache.
The cache is never fully flushed. It is only flushed for the domain obtained via DHCP or VPN, because those entries can change. They are not changed for anything else. If the upstream ISP could have spoofed them, so be it - the publisher of the domains could have used DNSSEC to prevent that from happening.
No no no!!!! You need to flush *all* entries. Consider what I resolve www.google.com to. That changes *per* ISP because Google provides different DNS endpoints and zones to ISPs to optimise traffic! So when I use Google at work, I'm now getting a suboptimal route to their servers!
Google publishes TTLs for that, which are honoured. If Google requires different records when you switch ISPs, they need to use shorter TTLs. The publisher decides here, not the consumer. Additionally, to resolve these issues, there is a new draft that has been implemented by some (such as OpenDNS, which specifically has this problem at a large scale):
https://tools.ietf.org/html/draft-vandergaast-edns-client-subnet-02
So I consider this a solved problem, even if code and deployment is not there yet at this moment.
So that's a valid point: A non-caching unbound that caps TTLs is a good idea, but as you say, you can't stop a dodgy ISP.
Actually you can! A captive hotspot is not much different from a dodgy ISP. unbound tries its best to not use any DNS server that messes with DNS. So ISPs like Rogers who like to rewrite DNS packets are explicitly not used by unbound - it prefers to become a full recursive server without offloading to any forwarder if the forwarder is that malicious. We even run DNS resolvers as Fedora infrastructure that provide DNS over TCP port 80 and DNS over TLS on port 443 as alternatives to work around these broken ISPs that also block port 53 in an attempt to force you to use their DNS lies.
Case 2: The user does know a bit. But when they change name records they may not be able to solve why a workstation can't resolve names like other clients.
While we could flush the entire cache on (dis)connect, I think that's rather drastic for this kind of odd use-case. If the user runs their own zone and their own records, they should know about DNS and TTLs. But even so, NM could offer an option to flush the DNS cache.
But this isn't even an odd use case. There are enough power users in the world who do this. It's not just computer enthusiasts, I know a chemist who did this, and others. You can't just assume a generic case, and then break it for others.
If you are changing DNS records, you need to understand TTL and cache flushing. If you don't, then sure, you can be the clueless Windows user that reboots their machine. I care much more about some of the more realistic use cases of Fedora machines connected over 3G, where latency matters and flushing the entire cache would cause both more traffic and more latency. And things like pre-fetching, where we renew cached DNS entries that are still being served from cache, to avoid the outage when the record expires.
Case 3: This user does understand DNS, and they don't need DNS cache.
That depends. You need caching for DNSSEC validation, so really, every device needs a cache, unless you want to outsource your DNSSEC validation over an insecure transport (LAN). That seems like a very bad idea.
If your LAN is insecure, you have other issues. That isn't the problem you are trying to solve.
Yes it is. When I'm at the coffee shop, my LAN is insecure. I don't want to trust DNS answers coming in. I want to validate those using DNSSEC on my own device. So I need to run a validating recursive (caching) nameserver for very valid security reasons - so that the guy next to me cannot spoof paypal.com.
They have BIND/named set up, and they would like to rely on that instead.
They can. DNS caches are chained. There is no reason to say you cannot run your own cache and have a network based cache.
But you don't *need* it. I went to the effort of setting up my own BIND to cache, so I shouldn't need one on my system. Again, local caches cause all kinds of issues. A home user is likely to toy with things and set a high-ish TTL, say even 10 minutes, and change records on their server. Then their records appear broken, because the local cache hasn't expired yet.
See above where the same argument was discussed. But also, you would have the exact same problem on many devices on your network that won't throw away that DNS record immediately: in-browser caches, the OS X system-wide cache, and who knows what your PVR, game console and TV do these days. If this worked for you in the past, you were lucky AND you engineered this to work. If you handed that solution to your unknowledgeable chemist, it's time to update their solution to meet the modern demand of facilitating the use of DNSSEC on every device.
When they change records in their local zones, they don't want to have to flush caches etc. If their ISP is unreliable, or their own DNS is unreliable, a DNS cache will potentially mask this issue delaying them from noticing / solving the problem.
This is becoming really contrived. Again, if you think this is a real scenario (I don't think it is), then you could run unbound with ttl=0. But requiring the resolver to automagically understand what a local zone is and when a remote authoritative DNS server changes data, while being unwilling to enforce that with ttl=0, and then using that as an argument against any unbound-based solution that provides a security feature (DNSSEC), is getting a little unrealistic. If you want your laptop to start validating TLSA, SSHFP and OPENPGPKEY records, you need DNSSEC validation on the device. The question should be "how do you change your network requirements to meet that goal?" Yes, enforcing security comes at a price.
It's not contrived: this is a common network setup among all the enthusiasts I know, and it's how they set up their home networks. This is why it's a use case.
I suggest you keep a close eye on the IETF HOMENET people, because DNSSEC is coming into your home automation one way or the other, and if you depend on this system, you will run into trouble in the future.
Let me use your scenario based on TLS. You want to be able to change your TLS certificates and the private CA you regenerate at any time, without any browser on your network ever giving you a popup warning. You know you cannot ask this - it goes against the security model. The same applies for DNS with DNSSEC. The security demands we need to do validation and caching and we should try to make that as flexible and painless as possible.
The issue is that by adding DNSSEC in this way, you are going to cause a great deal of pain because of these caches. Add DNSSEC, but if you need to cache, cache for the minimum time possible.
As I argued in the last few days, I do not see this "great deal of pain" and I've provided an unbound workaround for you, and your corner case can be dealt with via a new NM option.
It's linked to the other cases. It's the point that local system caches aren't needed as you have access to highly reliable DNS systems.
You will just have to come to terms with the fact that caches are needed when you are doing constant DNSSEC validation. So your argument that caches are not needed might have been true in the past, but it no longer is. Now let's work on ensuring your exception cases can be supported in the presence of caches.
Additionally, business networks are "trusted" so you can trust their DNS caches etc. (to a point)
Business networks are never compromised? But as I stated, we already said we will do the forward using the DHCP-supplied nameserver in the case of a LAN or secured WiFi connection.
Case 8: VPNs are a bit unreliable, and have relatively high(ish) latency. But mostly they are quite good, e.g. OpenVPN. A DNS cache *might* help here in case of traffic loss. Again, this would be masking a greater issue though, and could be better solved with TCP DNS queries rather than UDP.
The VPN cases already work very well in Fedora. I seamlessly connect to and disconnect from the Red Hat VPN. Resources that are available only via the VPN are never blocked by a wrong DNS cache entry I got from when the VPN was down. VPNs are a non-issue.
Consider a business with external and internal DNS zones. This becomes an issue in this case. If you have cached, say, "website.example.com" to the external IP, and that is DMZed somehow on the internal network, when you change to VPN, you need to use the internal view of that zone instead. But you can't: the name is cached.
Which is why we flush the cache for the domain in question when we detect a network change. See the above unbound commands used. This is a solved problem. Every day, when my VPN is up I reach bugzilla.redhat.com on its internal IP, and when my VPN is down I reach bugzilla.redhat.com on its external IP. Without any manual intervention. It just works.
No, cache is not a feature. It's a chronic issue.
Then please let us know what you intend to replace DNS with. The reason DNS has worked for over 20 years is because it is a caching system.
Look at Windows systems: service desks around the world always advise rebooting as the first step. Why? To flush DNS caches (or other things). When you can't get to a website? Restart the web browser to flush the cache. Intermittent network issues for different people on a network? The cache is allowing some people to work, but masking the issue from them. It's not allowing people to quickly and effectively isolate issues.
If DNS cache were the only cause for Windows machines needing a reboot, I'm sure Microsoft would have fixed that by now. Let's remain honest here and say there are 1001 reasons why Windows users reboot their machines. DNS might be one of them, but it has no relationship to the discussion we are having right now.
DNSSEC is a good idea: Caches are a problem.
We disagree.
If this really is to be used, I cannot stress enough that a cache must be completely flushed every time the default route or network interface changes. You can't, and I can't, possibly conceive of every network setup in the world. If you make assumptions like this, systems will break and Fedora will be blamed.
Consider some of the options I suggested for addition to NM to accommodate your scenario, or suggest alternatives. If you believe the only solution is "no cache ever", then there is not much more we can talk about. And if the majority of Fedora users prefer an insecure no-cache setup over a DNSSEC-cache solution, I guess I will go elsewhere and stop running Fedora.
Paul
But you can't account for every captive portal in the world. This is why the cache is a bad idea, because you can't possibly account for every system that is captive like this.
Yes we can by monitoring for "captivity signs" when a new network is joined. Again, please yum install dnssec-trigger on your laptop and start the dnssec-trigger applet once, and go have a coffee outside. Let us know your experience.
What is a "captivity-sign" as you so put it?
What if I call my network .concrete. Or .starfish. Or any other weird thing I have seen on personal networks. Again, you cannot bypass the local network DNS as the forwarder. You must respect it.
We will! If your DHCP has:
option domain-name-servers 10.1.2.3;
option domain-name "starfish";
Then unbound would get a forward configured to use 10.1.2.3 for the domain .starfish, basically calling:
sudo unbound-control forward_add starfish 10.1.2.3
sudo unbound-control flush starfish
sudo unbound-control flush_requestlist
When you leave the network, forward_remove is called.
sudo unbound-control forward_remove starfish
sudo unbound-control flush starfish
sudo unbound-control flush_requestlist
Okay, so let's expand this to my workplace, which runs a university network. We have thousands of students connected. Now, we have many zones on our network: services.university.edu, university.edu, medicalcenter.org, ersearch.com, etc.
We can't possibly put all of these into our "domain-name" DHCP option. IIRC it's a single-value attribute anyway.
So how does unbound handle this? Does it bypass my network DNS servers completely for everything that isn't university.edu or a child of? IMO that's not acceptable behaviour.
When connecting to their LAN or secure WiFi, same as above for one forwarding zone. Multiple forwarding zones would need to be configured. If it is an enterprise, they might need their corporate CAs as well as their zone configuration, so a corporate RPM package would make sense.
How do you plan to make this work? You can't magically discover all the DNS zones hosted in an enterprise. At my work we run nearly 100 zones, and they are all rooted at different points (i.e. a.com, b.com, c.com). You cannot assume a business has just "a.com" and that you can forward all queries for subtree.a.com to that network server.
If you are that large a business, you should really have a corporate build rpm package with your enterprise information such as local CA, local zones, etc. DNS forwarder zones can be dropped into /etc/unbound/*.d/ currently. I would expect we would make this software neutral via NM integration, where an NM unbound plugin would use those directories. We could add a per-network option that specifies to use a forward for "." (everything) instead of just the DHCP specified domain, or perhaps even do this for trusted (see above) networks.
However, that should not be the default for open wifi networks for security reasons.
See above: we can't possibly hope to deploy such a package to students and staff with bring-your-own-device. How do you propose we populate all the needed forwarders for our students? (Sure, maybe only a few hundred use Linux/Fedora, but it will cause them to have a negative view of the OS.)
Again, you *must* respect the DHCP provided DNS server as the forwarder else you will savagely break things.
And not doing anything will cause people to have insecure DNS. So I think the question should be turned around a little bit. There is a need for DNSSEC on the end nodes - how can we best facilitate that while trying to be as supportive of current deployments as we can be? That is what we are trying to do. If you only counter with "I require insecure DNS for my network to function" or "all cache is evil", then you are not open-minded enough to the realities of the requirement of DNSSEC support.
Sure, let's agree we "need" DNSSEC, and it follows that we need a cache.
Set cache times to be deliberately low so that silly network admins don't break things (even 300 seconds).
Don't try to bypass the local network DNS: there are more network configurations in the world than you or I can contemplate, and bypassing it *will* break things for people.
Case 1: The user doesn't know much about DNS. The ISP might be reliable or unreliable. If we assume as discussed that the cache is flushed on network change, they will have an empty cache.
The cache is never fully flushed. It is only flushed for the domain obtained via DHCP or VPN, because those entries can change. They are not changed for anything else. If the upstream ISP could have spoofed them, so be it - the publisher of the domains could have used DNSSEC to prevent that from happening.
No no no!!!! You need to flush *all* entries. Consider what I resolve www.google.com to. That changes *per* ISP because Google provides different DNS endpoints and zones to ISPs to optimise traffic! So when I use Google at work, I'm now getting a suboptimal route to their servers!
Google publishes TTLs for that, which are honoured. If Google requires different records when you switch ISPs, they need to use shorter TTLs. The publisher decides here, not the consumer. Additionally, to resolve these issues, there is a new draft that has been implemented by some (such as OpenDNS, which specifically has this problem at a large scale):
https://tools.ietf.org/html/draft-vandergaast-edns-client-subnet-02
So I consider this a solved problem, even if code and deployment is not there yet at this moment.
See also my comments about internal and external zones on a network.
If you want to cache, then you can't assume that what I cache on network A will be valid on network B. Consider the home user with the dodgy ISP that sets all TTLs to, say, 30 days. Do you want that user to take that cached entry to a working network and keep using it for 30 days? (Or whatever unbound sets its TTL max to.)
So that's a valid point: A non-caching unbound that caps TTLs is a good idea, but as you say, you can't stop a dodgy ISP.
Actually you can! A captive hotspot is not much different from a dodgy ISP. unbound tries its best to not use any DNS server that messes with DNS. So ISPs like Rogers who like to rewrite DNS packets are explicitly not used by unbound - it prefers to become a full recursive server without offloading to any forwarder if the forwarder is that malicious. We even run DNS resolvers as Fedora infrastructure that provide DNS over TCP port 80 and DNS over TLS on port 443 as alternatives to work around these broken ISPs that also block port 53 in an attempt to force you to use their DNS lies.
But you can't really tell what's a dodgy DNS server and what's not. There are plenty of good ISPs with well-configured DNS systems that you *should* use as a forwarder. Again, you can't determine what zones exist on this DNS server so that you can use it "just for those" and bypass it for all else.
Consider also that some ISPs force all port 53 traffic to their own DNS servers too. How does unbound know when the ISP is forcing this?
Essentially, what I'm hearing at the moment is that the proposal isn't just a caching DNS server: it's a DNS server that will be:
* DNSSEC
* Caching
* Attempts to always bypass my local DNS forwarder.
Case 2: The user does know a bit. But when they change name records they may not be able to solve why a workstation can't resolve names like other clients.
While we could flush the entire cache on (dis)connect, I think that's rather drastic for this kind of odd use-case. If the user runs their own zone and their own records, they should know about DNS and TTLs. But even so, NM could offer an option to flush the DNS cache.
But this isn't even an odd use case. There are enough power users in the world who do this. It's not just computer enthusiasts, I know a chemist who did this, and others. You can't just assume a generic case, and then break it for others.
If you are changing DNS records, you need to understand TTL and cache flushing. If you don't, then sure, you can be the clueless Windows user that reboots their machine. I care much more about some of the more realistic use cases of Fedora machines connected over 3G, where latency matters and flushing the entire cache would cause both more traffic and more latency. And things like pre-fetching, where we renew cached DNS entries that are still being served from cache, to avoid the outage when the record expires.
Case 3: This user does understand DNS, and they don't need DNS cache.
That depends. You need caching for DNSSEC validation, so really, every device needs a cache, unless you want to outsource your DNSSEC validation over an insecure transport (LAN). That seems like a very bad idea.
If your LAN is insecure, you have other issues. That isn't the problem you are trying to solve.
Yes it is. When I'm at the coffee shop, my LAN is insecure. I don't want to trust DNS answers coming in. I want to validate those using DNSSEC on my own device. So I need to run a validating recursive (caching) nameserver for very valid security reasons - so that the guy next to me cannot spoof paypal.com.
DNSSEC doesn't solve the coffee shop problem: you're still on open wireless and there are plenty of other attacks you are still vulnerable to.
Sure it helps. But this is DNSSEC helping, not the cache.
Look at Windows systems: service desks around the world always advise rebooting as the first step. Why? To flush DNS caches (or other things). When you can't get to a website? Restart the web browser to flush the cache. Intermittent network issues for different people on a network? The cache is allowing some people to work, but masking the issue from them. It's not allowing people to quickly and effectively isolate issues.
If DNS cache were the only cause for Windows machines needing a reboot, I'm sure Microsoft would have fixed that by now. Let's remain honest here and say there are 1001 reasons why Windows users reboot their machines. DNS might be one of them, but it has no relationship to the discussion we are having right now.
That's deflecting the point. The first advice when you can't access some website or service foo, is to reboot for this reason.
DNSSEC is a good idea: Caches are a problem.
We disagree.
If this really is to be used, I cannot stress enough that a cache must be completely flushed every time the default route or network interface changes. You can't, and I can't, possibly conceive of every network setup in the world. If you make assumptions like this, systems will break and Fedora will be blamed.
Consider some of the options I suggested for addition to NM to accommodate your scenario, or suggest alternatives. If you believe the only solution is "no cache ever", then there is not much more we can talk about. And if the majority of Fedora users prefer an insecure no-cache setup over a DNSSEC-cache solution, I guess I will go elsewhere and stop running Fedora.
I'm glad that the NM integration is being considered, that will help. I might not be afraid to touch a CLI, but I do think of users who use the GUI only.
I think that at the end of the day, there are just more network setups than either of us can contemplate. Some of them will be ready for DNSSEC and systems like unbound when the time comes. (As stubborn as I may seem, when it becomes the default, I will try to make my systems work seamlessly with the OOB defaults.)
Consider how many networks don't advertise all their domain names via DHCP, how many networks have more than one zone that unbound can't magically discover, and how many networks have split views. These are all reasons to flush the cache on interface state change, because the world won't magically make its networks "perfect" for Fedora's sake. We need to work in a world of configurations ranging from sane to insane.
DNS as a caching system has worked, because the caches on networks don't move. They have one view of the world and they don't move. If you have a laptop or other system that moves around, and you take that view of the DNS world with you, things will become screwy and might break in subtle ways that ordinary users can't explain.
In summary, all I ask is that:
* If a forwarder exists on the network, unbound uses it for all queries. (You can't know all the internal zones it's holding, and often they are not all advertised).
* If that forwarder returns an invalidly signed DNSSEC zone, then you bypass it for only that zone (i.e. the zone is being tampered with).
* Unbound flushes its cache between interface state changes, because you are moving between networks with different DNS views of the world.
* You keep the DNS cache time short, to help avoid issues with DNS admins who forcefully increase TTLs. Consider Google, with a TTL of 300. Perhaps even give each cached record a cache time of its TTL or 3600 seconds, whichever is lower.
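The last two requests map onto things unbound already has: the `cache-max-ttl` option caps how long any record is kept (its default is one day, 86400 seconds), and flushing the zone "." removes everything, since every name is below the root. The sketch below only illustrates the arithmetic of the proposed cap and a hypothetical interface-change hook; the 3600-second ceiling is William's suggestion, not an unbound default.

```sh
#!/bin/sh
# effective_ttl TTL [CAP]: how long a record stays cached when the
# resolver caps TTLs, i.e. min(TTL, CAP). CAP defaults to the 3600 s
# suggested above; in unbound.conf that would be "cache-max-ttl: 3600".
effective_ttl() {
    et_cap=${2:-3600}
    if [ "$1" -lt "$et_cap" ]; then echo "$1"; else echo "$et_cap"; fi
}

# on_interface_change (hypothetical hook): drop the entire cache, since
# every name lies at or below the root zone ".".
: "${UNBOUND_CONTROL:=unbound-control}"
on_interface_change() {
    $UNBOUND_CONTROL flush_zone .
    $UNBOUND_CONTROL flush_requestlist
}
```

With this cap, Google's 300-second records are unaffected, while the 30-day TTL from the dodgy ISP example would be held for at most an hour.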
I'm trying to think about the "user experience" of Fedora here rather than a technically perfect world. These suggestions would eliminate all the concerns I have with this system and would hopefully make the default experience better. :)
On Mon, 14 Apr 2014, William Brown wrote:
What is a "captivity-sign" as you so put it?
Check for clean port 80. It fetches the url specified in dnssec-triggerd.conf's url: option (default http://fedoraproject.org/static/hotspot.txt)
If it returns a redirect or a page that does not contain the exact text "OK", it knows a hotspot has intercepted the page and will prompt the user to log in to the hotspot. If the user agrees, resolv.conf is filled in with the DHCP-obtained values and it fires off xdg-open to the page http://hotspot-nocache.fedoraproject.org/, which is a special DNS entry with TTL=0 so it can never be cached (so we will go through the DNS lies that are told about the name).
When port 80 becomes clean, it is assumed you have "logged on" and it then runs various DNS/DNSSEC tests against TLD servers for known features and bugs in old DNS software. This will determine if DNS is still messed with. If the forwarder shows broken behaviour, it attempts to bypass it as I described before.
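The port 80 check reduces to: fetch a known URL without following redirects, and treat anything other than a 200 response whose body is "OK" as a hotspot. The sketch below is not dnssec-trigger's code; `hotspot_state` and `probe_hotspot` are hypothetical names, it uses curl rather than whatever HTTP client dnssec-trigger uses internally, and it compares the whole body to "OK" (after stripping line endings), which is one reading of "the exact text". The default URL is the one quoted from dnssec-triggerd.conf and may not still be served.

```sh
#!/bin/sh
# hotspot_state HTTP_CODE BODY: print "clean" if the probe returned 200
# with a body of exactly "OK", else "hotspot" (a redirect or rewritten
# page means something on the path intercepted port 80).
hotspot_state() {
    hs_body=$(printf '%s' "$2" | tr -d '\r\n')
    if [ "$1" = 200 ] && [ "$hs_body" = OK ]; then
        echo clean
    else
        echo hotspot
    fi
}

# probe_hotspot [URL]: fetch URL without following redirects (curl does
# not follow them unless given -L). A failed fetch reports code 000 and
# is therefore treated as "hotspot".
probe_hotspot() {
    ph_url=${1:-http://fedoraproject.org/static/hotspot.txt}
    ph_tmp=$(mktemp) || return 1
    ph_code=$(curl -s -o "$ph_tmp" -w '%{http_code}' --max-time 10 "$ph_url")
    hotspot_state "$ph_code" "$(cat "$ph_tmp")"
    rm -f "$ph_tmp"
}
```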
sudo unbound-control forward_add starfish 10.1.2.3
sudo unbound-control flush starfish
sudo unbound-control flush_requestlist
When you leave the network, forward_remove is called.
sudo unbound-control forward_remove starfish
sudo unbound-control flush starfish
sudo unbound-control flush_requestlist
Okay, so let's expand this to my workplace, which runs a university network. We have thousands of students connected. Now, we have many zones on our network: services.university.edu, university.edu, medicalcenter.org, ersearch.com, etc.
We can't possibly put all of these into our "domain-name" DHCP option. IIRC it's a single-value attribute anyway.
As we indicated, for "trusted" networks (LAN, secure WiFi) a domain of "." will be used, which means "forward everything". This does NOT mean we stop being a recursor. We still recurse because we need to perform DNSSEC validation. We just use the available DNS cache of the local network, which also gets us your internal-only domains.
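The per-network policy described in this thread (forward everything to the local resolver on trusted links and flush, but add no forwards from open WiFi) can be sketched as a function that prints the unbound-control calls it would make. `forward_plan` and the link labels are hypothetical, and NetworkManager's real integration may classify connections differently. For the root it uses `unbound-control forward ADDR`, which sets the forwarders for "." in a running unbound.

```sh
#!/bin/sh
# forward_plan LINK SERVER: print the unbound-control calls for a newly
# joined network. LINK is one of the hypothetical labels "wired",
# "secure-wifi" or "open-wifi"; SERVER is the DHCP-supplied resolver.
forward_plan() {
    case $1 in
        wired|secure-wifi)
            # Trusted link: forward everything to the local resolver
            # (DNSSEC is still validated locally), then flush the cache.
            echo "unbound-control forward $2"
            echo "unbound-control flush_zone ."
            echo "unbound-control flush_requestlist"
            ;;
        open-wifi)
            # Untrusted link: add no forwards, so a rogue access point
            # cannot claim internal names. Whether its resolver is used
            # at all is left to dnssec-trigger's probes.
            :
            ;;
        *)
            echo "unknown link type: $1" >&2
            return 1
            ;;
    esac
}
```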
Sure, lets agree we "need" dnssec, and that follows that we need cache.
Set cache times to be deliberately low so that silly network admin's don't break things (Even 300).
I still don't see a need for artificially lowered cache times.
Don't try to bypass the local network DNS: there are more network configurations in the world than you or I can contemplate, and bypassing it *will* break things for people.
The publisher determines the TTL, not the consumer. And if we add a forward for "." (meaning all domains), then we also run a cache flush for "." (all domains). So I don't think the TTL matters in this case at all.
If you want to cache, then you can't assume that what I cache on network A will be valid on network B. Consider the home user with the dodgy ISP that sets all TTLs to, say, 30 days. Do you want that user to take that cached entry to a working network and keep using it for 30 days? (Or whatever unbound sets its TTL max to.)
Yes. The problem here is the dodgy ISP. If they are dodgy enough, unbound will bypass them anyway. If we need to add an NM option for "don't use this dodgy ISP's DNS servers", we can add that too.
But you can't really tell what's a dodgy DNS and what's not.
Yes we can. Both dnssec-trigger and some other software run various tests for this.
There are plenty of good ISPs with well-configured DNS systems that you *should* use as a forwarder. Again, you can't determine which zones exist in a DNS server so that you can use it "just for those" and bypass it for everything else.
See the earlier discussion. If we are on a wired or secure wifi network with working ISP DNS, we will use it. And flush the cache.
Consider also that some ISPs force all port 53 traffic to their own DNS servers. How does unbound know when the ISP is doing this?
unbound does not really care about transparent proxies on port 53, as long as they don't break DNS (and DNSSEC). If they redirect port 53 to some broken DNS server, unbound will try to work around it. If port 53 is broken, it will attempt DNS over port 80 to various fedoraproject DNS servers, or DNS over TLS on port 443.
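The fallback order described here amounts to trying transports in turn until one works. A minimal sketch, assuming a hypothetical `works` probe callable that stands in for dnssec-trigger's actual DNS/DNSSEC tests; the transport labels are descriptive strings, not unbound option names.

```python
from typing import Callable, Optional

# Order as described in the thread: plain DNS first, then DNS over port 80,
# then DNS over TLS on port 443. Labels are illustrative only.
TRANSPORTS = ("port 53", "DNS over port 80", "DNS over TLS on port 443")


def pick_transport(works: Callable[[str], bool]) -> Optional[str]:
    """Return the first transport whose (hypothetical) probe succeeds, or None if all fail."""
    for transport in TRANSPORTS:
        if works(transport):
            return transport
    return None
```

For example, on a network that breaks port 53, `pick_transport(lambda t: t != "port 53")` returns `"DNS over port 80"`.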
Essentially, what I'm hearing at the moment, is that the proposal isn't just a caching DNS server: It's a DNS server that will be:
- DNSSEC
- Caching
- Attempts to always bypass my local DNS forwarder.
I hope I have now made clear that your third bullet point is not the case.
Sure it helps. But this is DNSSEC helping, not the cache.
I've said everything about caching already. I understand you deem it evil and I explained why I believe you are wrong. We disagree.
If the DNS cache were the only reason Windows machines need a reboot, I'm sure Microsoft would have fixed that by now. Let's be honest and say there are 1001 reasons why Windows users reboot their machines. DNS might be one of them, but it has no relationship to the discussion we are having right now.
That's deflecting the point.
No, bringing up Windows which has nothing to do with anything we are talking about here was deflecting the point.
I'm glad that the NM integration is being considered, that will help. I might not be afraid to touch a CLI, but I do think of users who use the GUI only.
This is why we did not want to force everyone onto dnssec-triggerd. We know that solution is not good enough for non-developers.
DNS as a caching system has worked because the caches on networks don't move: they have one view of the world and they stay put. If you have a laptop or other system that moves around, and you take that view of the DNS world with you, things will become screwy and might break in subtle ways that ordinary users can't explain.
This is a reality already. Every time your phone switches from 3G/LTE to wifi. When I walk across the street, that happens many times. I'm pretty sure my phone won't be flushing its cache all the time.
In summary, all I ask is that:
- If a forwarder exists on the network, unbound uses it for all queries.
Yes, but not for open wifi. Only for physical wire and secured wifi.
- If that forwarder returns an invalidly signed DNSSEC zone, then you bypass it for only that zone (i.e. the zone is being tampered with).
That's not how things work. The DNS server is either capable or not capable of doing DNSSEC. That is not a "per zone" thing. If it fails to return RRSIG signature records for the root zone, there is nothing you can do but forget about that server. (technically speaking, I do what you say, if you consider "." to be "only that zone")
- Unbound flushes its cache on interface state changes, because you are moving between networks with different DNS views of the world.
I am not convinced that is required. It does a lot of damage too.
- That you keep the DNS cache time short, to help avoid issues with DNS admins who forcefully increase TTLs. Consider Google, with a TTL of 300. Perhaps even give each cached record a cache time of its TTL or 3600, whichever is lower.
No. As I have stated repeatedly, we are NOT in the business of modifying DNS records. If people publish long TTLs, we will honor those TTLs. Doing otherwise is akin to launching an attack on the nameservers of those domains, which might not be able to handle such short TTLs. Imagine I run a domain using a nameserver on my DSL line, with a TTL of 7200. The name becomes known, and everyone starts hammering it because middle boxes cut the TTL to 300. That's irresponsible.
I'm trying to think about the "user experience" of Fedora here rather than a technically perfect world. These suggestions would eliminate all the concerns I have with this system and would hopefully make the default experience better. :)
I think we are fairly close to agreement on what's needed. Thank you for discussing this with us. It is clear now that we must flush the entire cache when we use a forwarder for more than one domain (e.g. not the VPN cases) on authenticated networks. That is something I had not considered before.
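The DSL example can be put in numbers. Assuming a single resolver cache that is asked for the name continuously, so that it re-fetches the record every time the TTL expires, it sends one query per TTL period; the helpers below are an illustrative sketch, not part of unbound, and `capped_ttl` models the "TTL or 3600, whichever is lower" suggestion.

```python
SECONDS_PER_DAY = 86400


def refreshes_per_day(ttl_seconds: int) -> float:
    """Queries per day that one continuously busy cache sends to the authoritative server."""
    return SECONDS_PER_DAY / ttl_seconds


def load_multiplier(published_ttl: int, effective_ttl: int) -> float:
    """How many times more often the authoritative server is hit when caches shorten the TTL."""
    return refreshes_per_day(effective_ttl) / refreshes_per_day(published_ttl)


def capped_ttl(published_ttl: int, cap: int = 3600) -> int:
    """The proposed rule: cache for the published TTL or `cap`, whichever is lower."""
    return min(published_ttl, cap)
```

With a published TTL of 7200, cutting it to 300 raises one cache's refreshes from 12 to 288 a day (24 times the load), while the 3600 cap doubles it; the extra load is multiplied again by every cache that applies the cut.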
Paul
On Mon, 2014-04-14 at 00:42 -0400, Paul Wouters wrote:
On Mon, 14 Apr 2014, William Brown wrote:
What is a "captivity-sign" as you so put it?
Check for clean port 80. It fetches the URL specified by the url: option in dnssec-triggerd.conf (default http://fedoraproject.org/static/hotspot.txt).
If it returns a redirect, or a page that does not contain the exact text "OK", it knows a hotspot has intercepted the request and will prompt the user to log in to the hotspot. If the user agrees, resolv.conf is filled in with the DHCP-obtained values and it fires off xdg-open to the page http://hotspot-nocache.fedoraproject.org/, which is a special DNS entry with TTL=0 so it can never be cached (so we will go through the DNS lies that are told about the name).
When port 80 becomes clean, it is assumed you have "logged on", and it then runs various DNS/DNSSEC tests against TLD servers for known features and for bugs in old DNS software. This determines whether DNS is still being messed with. If the forwarder shows broken behaviour, we attempt to bypass it as I described before.
This seems like a sane(ish) way of doing it. What happens if the hotspot page is down? Why not use a mirror-like setup, as with yum, where you try 2 or 3 mirrors and only declare it to be a portal if they all fail?
sudo unbound-control forward_add starfish 10.1.2.3
sudo unbound-control flush starfish
sudo unbound-control flush_requestlist
When you leave the network, forward_remove is called.
sudo unbound-control forward_remove starfish
sudo unbound-control flush starfish
sudo unbound-control flush_requestlist
Okay, so let's expand this to my workplace, which runs a university network. We have thousands of students connected. Now, we have many zones on our network: services.university.edu, university.edu, medicalcenter.org, ersearch.com, etc.
We can't possibly put all of these into our "domain-name" DHCP option. IIRC it's a single-valued attribute anyway.
As we indicated, for "trusted" networks (LAN, secure wifi) a domain of "." will be used, which means "forward everything". This does NOT mean we stop being a recursor. We still recurse, because we need to perform DNSSEC validation. We just use the available DNS cache of the local network, which also gets us your internal-only domains.
This point was not made clear. I will summarise at the bottom...
Yes. The problem here is the dodgy ISP. If they are dodgy enough, unbound will bypass them anyway. If we need to add an NM option for "don't use this dodgy ISP's DNS servers", we can add that too.
But you can't really tell what's a dodgy DNS and what's not.
Yes we can. Both dnssec-trigger and some other software run various tests for this.
*How* can you tell if it's dodgy? You can detect captivity as above, but you can't easily see whether an ISP is, say, tampering with TTLs or with data.
Consider also that some ISPs force all port 53 traffic to their own DNS servers. How does unbound know when the ISP is doing this?
unbound does not really care about transparent proxies on port 53, as long as they don't break DNS (and DNSSEC). If they redirect port 53 to some broken DNS server, unbound will try to work around it. If port 53 is broken, it will attempt DNS over port 80 to various fedoraproject DNS servers, or DNS over TLS on port 443.
How do you setup DNS over TLS?
Again, how can you guarantee that the Fedora infrastructure won't go down? My devil's advocate points out that we are adding more reliance on "third party" infrastructure here. Could it be a case similar to the mirrors, where you can "become" a fedoraproject DNS node to help balance the load?
Essentially, what I'm hearing at the moment, is that the proposal isn't just a caching DNS server: It's a DNS server that will be:
- DNSSEC
- Caching
- Attempts to always bypass my local DNS forwarder.
I hope I have now made clear that your third bullet point is not the case.
It's not as much the case, which makes me happier, but I want to know the conditions on which you decide a DNS server is "dodgy" or not.
I'm glad that the NM integration is being considered, that will help. I might not be afraid to touch a CLI, but I do think of users who use the GUI only.
This is why we did not want to force everyone onto dnssec-triggerd. We know that solution is not good enough for non-developers.
Agreed.
In summary, all I ask is that:
- If a forwarder exists on the network, unbound uses it for all queries.
Yes, but not for open wifi. Only for physical wire and secured wifi.
Okay. Can this point be made clear on the proposal page? Also the conditions for physical wire and secured wifi?
There are also a number of tethering situations that may be misinterpreted as secure, e.g. my phone has a WPA2 hotspot whose DNS goes via 3G. How does unbound treat this? It would only see the secure wifi...
Consider also that some wifi hotspots have their own "local" zones that are needed, so again, I do think that unbound should use the local forwarder irrespective of network security, because otherwise you risk breaking things. Or how would you suggest this is solved? For argument's sake, let's say:
SSID: myawesomeopenhotspot
DHCP provides no domain-name info.
I CNAME all records to my.hotspot. until authenticated.
Your hotspot test will be triggered, but if unbound won't use the local forwarder, you won't be able to resolve my.hotspot. on the insecure wifi.
- If that forwarder returns an invalidly signed DNSSEC zone, then you bypass it for only that zone (i.e. the zone is being tampered with).
That's not how things work. The DNS server is either capable or not capable of doing DNSSEC. That is not a "per zone" thing. If it fails to return RRSIG signature records for the root zone, there is nothing you can do but forget about that server. (technically speaking, I do what you say, if you consider "." to be "only that zone")
Okay that makes sense.
- Unbound flushes its cache on interface state changes, because you are moving between networks with different DNS views of the world.
I am not convinced that is required. It does a lot of damage too.
- That you keep the DNS cache time short, to help avoid issues with DNS admins who forcefully increase TTLs. Consider Google, with a TTL of 300. Perhaps even give each cached record a cache time of its TTL or 3600, whichever is lower.
No. As I have stated repeatedly, we are NOT in the business of modifying DNS records. If people publish long TTLs, we will honor those TTLs. Doing otherwise is akin to launching an attack on the nameservers of those domains, which might not be able to handle such short TTLs. Imagine I run a domain using a nameserver on my DSL line, with a TTL of 7200. The name becomes known, and everyone starts hammering it because middle boxes cut the TTL to 300. That's irresponsible.
Okay, but let's combine these two points. My ISP mucks with the TTL of some website, from say 300 to 30000000. Unbound would respect this, either fully or up to the TTL max (which is still 86400 IIRC). If you aren't flushing the cache between networks, you could end up with:
* Suboptimal routes causing a poor user experience.
* Incorrect cached zone data moving between networks with different DNS views of the world.
Ignoring the TTL change, let's just look at flushing on network state changes. This would solve both of the points listed. You only need to rebuild the cache on the first reconnect to a network, meaning:
* You are caching, for that session, the correct results as that network sees them.
* You get the TTLs for that network (even if they were tampered with).
* You don't take that data to other networks.
Alternatively, consider a per-network cache? I.e. on each network I build a named cache for that network, and when I join the other network I use its cache.
This way the cache:
* Persists across interface changes
* Keeps sane zone data in each network environment.
This would also solve my 3G WPA hotspot case, since the cache built on the hotspot wouldn't pollute my work wifi, for example.
I'm trying to think about the "user experience" of Fedora here rather than a technically perfect world. These suggestions would eliminate all the concerns I have with this system and would hopefully make the default experience better. :)
I think we are fairly close to agreement on what's needed. Thank you for discussing this with us. It is clear now that we must flush the entire cache when we use a forwarder for more than one domain (e.g. not the VPN cases) on authenticated networks. That is something I had not considered before.
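To make the per-network proposal concrete, here is a hypothetical sketch; it is not something unbound offers, and Paul considers it over-engineering in his reply. Answers are stored in a separate dictionary per network, so rejoining a network reuses its cache and nothing leaks between networks. Keying by an SSID-like string is an assumption (a real design would need an identifier a rogue network cannot spoof), and TTL handling is omitted.

```python
from collections import defaultdict
from typing import Dict, Optional


class PerNetworkCache:
    """Hypothetical resolver cache partitioned by network identifier."""

    def __init__(self) -> None:
        self._caches: Dict[str, Dict[str, str]] = defaultdict(dict)
        self._current: Optional[str] = None

    def join(self, network_id: str) -> None:
        """Switch to the cache for network_id (created on first use)."""
        self._current = network_id

    def _cache(self) -> Dict[str, str]:
        if self._current is None:
            raise RuntimeError("not attached to any network")
        return self._caches[self._current]

    def put(self, name: str, answer: str) -> None:
        self._cache()[name] = answer

    def get(self, name: str) -> Optional[str]:
        return self._cache().get(name)
```

An answer cached while tethered to the phone is invisible on the work wifi, and comes back when the phone hotspot is rejoined.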
Thanks. That's what these mailing lists are for, even if it can end up in essay length posts. At the end of the day, I'm sure we both want the best user experience.
In summary (Possibly something to add to the wiki)
* Unbound does captive portal detection. Detail how it's done (See above in this email)
* Unbound tries to find dodgy DNS servers. Detail how this detection is done.
* On an open (insecure) access point, unbound bypasses the local forwarder, except for names listed in the single-valued "options domain-name" attribute from DHCP
* On a secure network (Encrypted wifi, lan) unbound will use the forwarders as provided by DHCP.
* Unbound will flush the cache between authenticated networks. (If I read your last point correctly)
Sincerely,
On Mon, 14 Apr 2014, William Brown wrote:
This seems like a sane(ish) way of doing it. What happens if the hotspot page is down? Why not use a mirror-like setup, as with yum, where you try 2 or 3 mirrors and only declare it to be a portal if they all fail?
It has multiple A records matching the redundancy of the fedora infrastructure.
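Combining the multiple A records with William's mirror idea, a client could try each address the probe name resolves to and let the first one that answers decide, reporting "unreachable" only if none respond. A sketch under that assumption; the helper names are hypothetical, and `probe` stands for a per-address check such as the port 80 test described earlier.

```python
import socket
from typing import Callable, List, Optional, Sequence


def resolve_all(host: str, port: int = 80) -> List[str]:
    """All distinct addresses (IPv4 and IPv6) the host name currently resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    seen: List[str] = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        if sockaddr[0] not in seen:
            seen.append(sockaddr[0])
    return seen


def probe_addresses(addresses: Sequence[str],
                    probe: Callable[[str], Optional[str]]) -> str:
    """Return the verdict of the first address that answers.

    probe(addr) returns "clean" or "hotspot", or None if addr was unreachable.
    """
    for addr in addresses:
        verdict = probe(addr)
        if verdict is not None:
            return verdict  # first responsive address decides
    return "unreachable"
```

The design choice is that one dead server cannot turn a clean network into a false "hotspot" verdict; only a responsive server's answer counts.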
*How* can you tell if it's dodgy? You can detect captivity as above, but you can't easily see whether an ISP is, say, tampering with TTLs or with data.
TTL tampering does not really affect our operation. When caches are chained, you always get lower TTLs. That's how cached data works. Raising the TTL is only allowed within our confines.
Data tampering is detectable by DNSSEC. For domains that are not protected by DNSSEC, we cannot detect tampering. If publishers want data integrity, they will have to sign their zones.
unbound does not really care about transparent proxies on port 53, as long as they don't break DNS (and DNSSEC). If they redirect port 53 to some broken DNS server, unbound will try to work around it. If port 53 is broken, it will attempt DNS over port 80 to various fedoraproject DNS servers, or DNS over TLS on port 443.
How do you setup DNS over TLS?
Unbound already has this capability built in. It is activated via unbound-control (currently driven by dnssec-triggerd, in the future by NM) using the keywords tcp-upstream or ssl-upstream.
Again, how can you guarantee that the Fedora infrastructure won't go down? My devil's advocate points out that we are adding more reliance on "third party" infrastructure here. Could it be a case similar to the mirrors, where you can "become" a fedoraproject DNS node to help balance the load?
Not yet, but it is a thought. Although it is probably more stable and better to look at getting sponsored services from one of the large anycast DNS cloud providers. Note that when you get to the point of needing port 80 or 443 for DNS, you are arguably already on a pretty dysfunctional rogue network that you probably shouldn't be on. It hampers your (DNSSEC) security. So you can see these services as "better than disconnecting from the network".
It's not as much the case, which makes me happier, but I want to know the conditions on which you decide a DNS server is "dodgy" or not.
For a detailed list you will have to check the source code. But it includes things like DNSSEC records, proper wildcard NSEC(3) records, CNAME support, EDNS0 support, packet sizes, etc., covering the known bugs in older versions of common DNS software and cases the IETF has actually experienced in the wild.
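One of these checks, whether the forwarder returns DNSSEC records at all, ties back to the earlier point that a server which fails to return RRSIG records for the root zone has to be abandoned. A minimal standard-library sketch of that single check: it asks for the root DNSKEY set with the EDNS0 DO bit set and looks for RRSIG (type 46) records in the answer section. The function names are hypothetical; it does not verify signatures, match query IDs, or handle truncation, all of which a real test suite would need.

```python
import socket
import struct
from typing import List

TYPE_DNSKEY, TYPE_RRSIG, TYPE_OPT = 48, 46, 41
CLASS_IN = 1


def build_root_dnskey_query(qid: int = 0x1234) -> bytes:
    """'. IN DNSKEY' query with RD set and an EDNS0 OPT record carrying the DO bit."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 1)  # RD=1, 1 question, 1 additional
    question = b"\x00" + struct.pack("!HH", TYPE_DNSKEY, CLASS_IN)  # root name is one zero byte
    # OPT pseudo-RR: CLASS = UDP payload size, TTL = ext-rcode/version/flags, DO = 0x8000.
    opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, 4096, 0x00008000, 0)
    return header + question + opt


def _skip_name(msg: bytes, pos: int) -> int:
    """Return the offset just past the (possibly compressed) name at pos."""
    while True:
        length = msg[pos]
        if length & 0xC0 == 0xC0:  # compression pointer: two bytes, ends the name
            return pos + 2
        if length == 0:  # root label ends the name
            return pos + 1
        pos += 1 + length


def answer_types(msg: bytes) -> List[int]:
    """List the RR types found in the answer section of a DNS response."""
    qdcount, ancount = struct.unpack("!HH", msg[4:8])
    pos = 12
    for _ in range(qdcount):
        pos = _skip_name(msg, pos) + 4  # skip QTYPE and QCLASS
    types = []
    for _ in range(ancount):
        pos = _skip_name(msg, pos)
        rtype, _rclass, _ttl, rdlength = struct.unpack("!HHIH", msg[pos:pos + 10])
        types.append(rtype)
        pos += 10 + rdlength
    return types


def forwarder_returns_rrsigs(response: bytes) -> bool:
    return TYPE_RRSIG in answer_types(response)


def query_forwarder(addr: str, timeout: float = 3.0) -> bytes:
    """Send the query to addr over UDP port 53 and return the raw response."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_root_dnskey_query(), (addr, 53))
        data, _ = sock.recvfrom(4096)
        return data
```

`forwarder_returns_rrsigs(query_forwarder("10.1.2.3"))` would be False for the kind of old DNS proxy that strips DNSSEC records, which is the case where the forwarder gets dropped.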
- If a forwarder exists on the network, unbound uses it for all queries.
Yes, but not for open wifi. Only for physical wire and secured wifi.
Okay. Can this point be made clear on the proposal page? Also the conditions for physical wire and secured wifi?
Yes, we can do that.
There are also a number of tethering situations that may be misinterpreted as secure, e.g. my phone has a WPA2 hotspot whose DNS goes via 3G. How does unbound treat this? It would only see the secure wifi...
We cannot protect against wired or secure wifi running insecure DNS. We do run our tests and if it works install the forward. If it fails to work we won't install the forward.
Consider also that some wifi hotspots have their own "local" zones that are needed, so again, I do think that unbound should use the local forwarder irrespective of network security, because otherwise you risk breaking things.
You sometimes cannot, because those hotspot routers are often old and broken and have crappy DNS proxying/mangling that breaks DNSSEC. That's the premise of how this entire DNS mess started: we could not use DNSSEC when using DNS servers obtained from the network. Anyone injecting rogue domains (what you call "local zones") cannot be distinguished from an attack. Those hotspots should use proper DNS names and/or HTTP redirection for those local zones. Note, though, that dnssec-trigger currently gives you the option to remain in "insecure mode", which uses the local DNS and gives access to those zones.
Or how would you suggest this is solved? For argument's sake, let's say:
SSID: myawesomeopenhotspot
DHCP provides no domain-name info.
I CNAME all records to my.hotspot. until authenticated.
If this does not do HTTP(S) redirection, then it is a very, very broken setup. You would also need to block port 53 to prevent people using 8.8.8.8 to bypass your authentication. So I do not see this in the wild (and I've done the hard work of sitting in many coffee shops for science). Hotspots tend to intercept port 80 to a mini web server that only serves a redirect, sometimes without any DNS name, often with a DNS name only resolvable by using their DNS. And that is fully supported by the current solution.
There are many possible scenarios one could come up with that will not work; there are manual overrides for those. In my years of experience, these kinds of problems are far rarer than, for instance, the hotspot network using the same IP range as my remote VPN network, which prevents me from establishing a VPN connection.
Your hotspot test will be triggered, but if unbound won't use the local forwarder, you won't be able to resolve my.hotspot. on the insecure wifi.
Arguably, CNAMEing my domain away is a breach of the DNS "contract" anyway. The method is wrong. It also runs too many risks of not working for the hotspot vendor.
Okay, but let's combine these two points. My ISP mucks with the TTL of some website, from say 300 to 30000000. Unbound would respect this, either fully or up to the TTL max (which is still 86400 IIRC). If you aren't flushing the cache between networks, you could end up with:
- Suboptimal routes causing a poor user experience.
- Incorrect cached zone data moving between networks with different DNS views of the world.
If we believe that artificially increasing TTLs is a common form of mangling, we can have dnssec-trigger (or its NM-integrated successor) check for it. I'm reluctant to try to solve every _imaginable_ problem out there. If your ISP's badness causes suboptimal routes, then that's not the end of the world, and you have your ISP to blame. One ISP shouldn't make every Fedora user flush caches all the time. Let's deal with this problem when we actually find that it is a real-world problem.
Ignoring the TTL change, let's just look at flushing on network state changes. This would solve both of the points listed. You only need to rebuild the cache on the first reconnect to a network, meaning:
"Only rebuild"? You are asking everyone to do hundreds of queries each time they join their 3G network. Remember, when validating, you don't just fetch one record for a queried A record: you need to recurse and do all the intermediate queries too, because otherwise you don't have the records to do full DNSSEC validation. Flushing the cache is not a reasonable thing to do. We are working hard on ensuring the user _hits_ their cache and gains a speed-up (including pre-fetching). Waiting on various round trips for DNS over 3G is going to cause a lot more delay than a "suboptimal route". Your workaround would actually be detrimental to the user experience.
Note, I'm trying to optimise that path too, see: http://tools.ietf.org/html/draft-ietf-dnsop-edns-chain-query-00
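The validation cost can be made concrete by listing what a validator with a cold cache has to fetch for a single name. A simplified sketch, assuming every label boundary is a zone cut (www.example.com in example.com, in com, in the root) and ignoring NS referrals, glue and negative-answer proofs, all of which add further queries; the function names are hypothetical.

```python
from typing import List, Tuple


def zones_on_path(qname: str) -> List[str]:
    """Ancestor zones of qname from the root down, assuming each label is a zone cut."""
    labels = qname.rstrip(".").split(".")
    zones = ["."]
    for i in range(len(labels) - 1, 0, -1):
        zones.append(".".join(labels[i:]) + ".")
    return zones


def validation_queries(qname: str, qtype: str = "A") -> List[Tuple[str, str]]:
    """Queries a validator needs, starting from an empty cache, to validate one answer."""
    queries = []
    for zone in zones_on_path(qname):
        queries.append((zone, "DNSKEY"))  # the zone's keys
        if zone != ".":
            queries.append((zone, "DS"))  # parent's hash of those keys (root: trust anchor)
    queries.append((qname.rstrip(".") + ".", qtype))  # the record actually asked for
    return queries
```

`validation_queries("www.example.com")` yields six queries, whereas a warm cache needs at most the last one; a page that references dozens of hostnames under different parents multiplies the cold-cache cost.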
Alternatively, consider a per-network cache? I.e. on each network I build a named cache for that network, and when I join the other network I use its cache.
That's a lot of (over)engineering.
This would also solve my 3G WPA hotspot case, since the cache built on the hotspot wouldn't pollute my work wifi, for example.
That was already addressed: a full cache flush happens when you connect to your (secure) work wifi and "." is installed as the forwarded zone.
In summary (Possibly something to add to the wiki)
- Unbound does captive portal detection. Detail how it's done (see above in this email)
- Unbound tries to find dodgy DNS servers. Detail how this detection is done.
- On an open (insecure) access point, unbound bypasses the local forwarder, except for names listed in the single-valued "options domain-name" attribute from DHCP
No, we cannot do that. As I said, a rogue hotspot could give the domain-name "corp.paypal.com" to fool me into thinking I'm connecting to my internal corporate network. We cannot automatically insert those forwards on open wifi, unless the user manually performs an override.
- On a secure network (encrypted wifi, LAN) unbound will use the forwarders as provided by DHCP.
Provided they are functional (e.g. they don't break DNSSEC).
- Unbound will flush the cache between authenticated networks. (If I read your last point correctly)
If we did a "." forward, yes.
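The answers to this summary can be collected into one decision function. This sketches the behaviour as described in this exchange, not NetworkManager or dnssec-trigger code; the `Network` and `Plan` fields and `plan_for` are hypothetical, and in this sketch the manual override only lifts the open-wifi restriction (the forwarders still have to pass the tests).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Network:
    secure: bool  # physical wire or encrypted wifi
    forwarders: List[str] = field(default_factory=list)  # DNS servers from DHCP
    forwarders_pass_tests: bool = False  # result of the DNS/DNSSEC test suite
    user_override: bool = False  # user manually trusts an open network


@dataclass
class Plan:
    forward_zone: Optional[str]  # "." means forward everything
    forwarders: List[str]
    flush_all: bool


def plan_for(net: Network) -> Plan:
    """Decide the forward zone and whether to flush, per the rules summarised above."""
    trusted = net.secure or net.user_override
    if trusted and net.forwarders and net.forwarders_pass_tests:
        # Forward "." but keep validating, and flush so no other network's view leaks in.
        return Plan(forward_zone=".", forwarders=list(net.forwarders), flush_all=True)
    # Open wifi or broken forwarders: keep recursing directly, no DHCP-supplied forwards.
    return Plan(forward_zone=None, forwarders=[], flush_all=False)
```

Note that the DHCP domain-name never enters the decision, matching the point that a rogue hotspot could hand out "corp.paypal.com".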
Paul
unbound does not really care about transparent proxies on port 53, as long as they don't break DNS (and DNSSEC). If they redirect port 53 to some broken DNS server, unbound will try to work around it. If port 53 is broken, it will attempt DNS over port 80 to various fedoraproject DNS servers, or DNS over TLS on port 443.
How do you setup DNS over TLS?
Unbound already has this capability built in. It is activated via unbound-control (currently driven by dnssec-triggerd, in the future by NM) using the keywords tcp-upstream or ssl-upstream.
I meant for say bind, but okay.
It's not as much the case, which makes me happier, but I want to know the conditions on which you decide a DNS server is "dodgy" or not.
For a detailed list you will have to check the source code. But it includes things like DNSSEC records, proper wildcard NSEC(3) records, CNAME support, EDNS0 support, packet sizes, etc., covering the known bugs in older versions of common DNS software and cases the IETF has actually experienced in the wild.
I.e., if I have an out-of-the-box bind9 setup with a few zones, or even hundreds of zones, these cases should never be triggered. I would hate to see the "dodgy DNS" check giving false positives on networks that are actually sane... Such checks need to be conservative in their triggers, IMO.
- If a forwarder exists on the network, unbound uses it for all queries.
Yes, but not for open wifi. Only for physical wire and secured wifi.
Okay. Can this point be made clear on the proposal page? Also the conditions for physical wire and secured wifi?
Yes, we can do that.
Thanks.
Okay, but let's combine these two points. My ISP mucks with the TTL of some website, from say 300 to 30000000. Unbound would respect this, either fully or up to the TTL max (which is still 86400 IIRC). If you aren't flushing the cache between networks, you could end up with:
- Suboptimal routes causing a poor user experience.
- Incorrect cached zone data moving between networks with different DNS views of the world.
If we believe that artificially increasing TTLs is a common form of mangling, we can have dnssec-trigger (or its NM-integrated successor) check for it. I'm reluctant to try to solve every _imaginable_ problem out there. If your ISP's badness causes suboptimal routes, then that's not the end of the world, and you have your ISP to blame. One ISP shouldn't make every Fedora user flush caches all the time. Let's deal with this problem when we actually find that it is a real-world problem.
It actually is quite common with certain Australian ISPs... especially the "cheap" ones (you get what you pay for...).
Even if we ignore the TTL mangling, the first issue (incorrect cached zone data moving between networks) is a real-world issue IMO. As previously mentioned: split-view business networks. I believe you have said this is solved by flushing the "." forwarder between networks that are "secure".
- On an open (insecure) access point, unbound bypasses the local forwarder, except for names listed in the single-valued "options domain-name" attribute from DHCP
No, we cannot do that. As I said, a rogue hotspot could give the domain-name "corp.paypal.com" to fool me into thinking I'm connecting to my internal corporate network. We cannot automatically insert those forwards on open wifi, unless the user manually performs an override.
Okay, this is another point to make clear on the wiki. I thought this was what you were saying happens on open wifi.
- On a secure network (encrypted wifi, LAN) unbound will use the forwarders as provided by DHCP.
Provided they are functional (e.g. they don't break DNSSEC).
Again, can you on the wiki detail the "functional" requirements.
The reason I ask these are documented, is so that when other network admins (like myself) come along, you have already had the argument and provided the justification and detailed explanations of these "edge cases".
- Unbound will flush the cache between authenticated networks. (If I read your last point correctly)
If we did a "." forward, yes.
Moved ...
Ignoring the TTL change, let's just look at flushing on network state changes. This would solve both of the points listed. You only need to rebuild the cache on the first reconnect to a network, meaning:
"Only rebuild"? You are asking everyone to do hundreds of queries each time they join their 3G network. Remember, when validating, you don't just fetch one record for a queried A record: you need to recurse and do all the intermediate queries too, because otherwise you don't have the records to do full DNSSEC validation. Flushing the cache is not a reasonable thing to do. We are working hard on ensuring the user _hits_ their cache and gains a speed-up (including pre-fetching). Waiting on various round trips for DNS over 3G is going to cause a lot more delay than a "suboptimal route". Your workaround would actually be detrimental to the user experience.
Note, I'm trying to optimise that path too, see: http://tools.ietf.org/html/draft-ietf-dnsop-edns-chain-query-00
These two statements really seem to contradict. On one hand you say that moving between secure networks, the "." forwarder gets flushed. But then you say the whole point is that it isn't flushed!
On my 3g tether, and work, both would be secure wifi, so according to this both flush (Which really, I like :) ) But according to what you are saying they shouldn't do that, but they do?
Really, it seems like the only time the cache *won't* flush is when I move from a secure wifi to an insecure wifi. What happens when I move from the insecure wifi back? I would argue that, given not all domains have DNSSEC yet, you can't "trust" the records from the insecure wifi, so at the very least, when the insecure wifi interface goes down, you should flush the non-DNSSEC cached records.
Collecting all this, the current functional state seems to be:
Secure to secure network -> Flush "." cache.
Secure to insecure network -> Keep cache.
Insecure to insecure network -> Keep cache.
Insecure to secure network -> Keep cache.
I think in a perfect world, assuming that insecure networks really are insecure, shouldn't it be:
Secure to secure network -> Flush "." cache.
Secure to insecure network -> Keep cache.
Insecure to insecure network -> Keep DNSSEC cache only.
Insecure to secure network -> Keep DNSSEC cache only.
But considering split-horizon private secure networks etc., shouldn't it really be:
Secure to secure network -> Keep DNSSEC cache only.
Secure to insecure network -> Keep DNSSEC cache only.
Insecure to insecure network -> Keep DNSSEC cache only.
Insecure to secure network -> Keep DNSSEC cache only.
The only records you can really guarantee to be the same in all network views are those signed with DNSSEC. From the secure networks you may have internal views, and on the insecure networks you can't trust unsigned records. And IIRC there are DNSSEC split-view functions too, which may not even let you cache the DNSSEC records anyway...
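The "keep DNSSEC cache only" rule amounts to filtering the cache by validation status on every network change. A hypothetical sketch, not a feature the thread attributes to unbound: each cached RRset records whether it validated as secure, and only those entries survive the transition.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Key = Tuple[str, str]  # (owner name, record type)


@dataclass(frozen=True)
class CacheEntry:
    rdata: str
    secure: bool  # True only if DNSSEC validation succeeded for this RRset


def keep_dnssec_only(cache: Dict[Key, CacheEntry]) -> Dict[Key, CacheEntry]:
    """Drop every entry that did not validate; keep signed ones across the network change."""
    return {key: entry for key, entry in cache.items() if entry.secure}
```

An unsigned internal name such as "intranet.corp." is dropped while a validated public record is kept; as noted above, split-view DNSSEC deployments can still make even signed data view-dependent.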
It gets messy very quickly when you start talking about moving a cache with specific, limited views of the DNS world around between sites ;)
Sure, this won't affect every person who ever uses Fedora, but it will really annoy the ones who are affected by it. Some people will know how to diagnose it; others won't.
Sincerely,
On Tue, 15 Apr 2014, William Brown wrote:
How do you setup DNS over TLS?
Unbound already has this capability built in. It is activated via unbound-control (currently driven by dnssec-triggerd, in the future by NM) using the keywords tcp-upstream or ssl-upstream.
I meant for say bind, but okay.
bind does not support this.
For a detailed list you will have to check the source code. But it includes thing like DNSSEC records, proper wildcard NSEC(3) records, CNAME support, EDNS0 support, packet sizes, etc. The known bugs in older versions of common DNS software. Cases the IETF actually experienced in the wild.
IE, If I have an out of box bind9 setup with a few zones, or even 100s of zones, these cases should never be triggered. I would hate to see the "dodgy DNS" check giving a false positive on networks that are actually sane ... Such checks need to be conservative in their triggers IMO.
Correct. It only happens with bind4/bind8 or broken old bind9s, djbdns/cache, but mostly because of 5-year-old dnsmasq versions embedded in the platforms as "dns proxy".
Even if we ignore the TTL mangling, the first issue of incorrect cached zone data moving between networks is a real-world issue IMO. As previously mentioned: split-view business networks. I believe you have said this is solved by flushing the "." forwarder between networks that are "secure".
Correct. If an ISP starts modifying DNS content, it is simply an attack. You have no trust relationship with them.
The reason I ask that these be documented is so that when other network admins (like myself) come along, you have already had the argument and provided the justification and detailed explanations of these "edge cases".
Understood.
"suboptimal route". Your workaround will actually be detrimental to the user experience.
Note, I'm trying to optimise that path too, see: http://tools.ietf.org/html/draft-ietf-dnsop-edns-chain-query-00
These two statements really seem to contradict. On one hand you say that moving between secure networks, the "." forwarder gets flushed. But then you say the whole point is that it isn't flushed!
The number of flushes should be limited as much as possible. We only flush the cache to accommodate certain networks. Our preference is to never flush, but we accept that sometimes it cannot be avoided if we want to support certain types of DNS deployments.
On my 3G tether and at work, both would be secure wifi, so according to this both flush (which, really, I like :) ). But according to what you are saying they shouldn't do that, but they do?
The price we have to pay to support some kind of setups. We can also add an option that tells us to not flush certain (secure) networks because we know there is no special casing there. Those are tunings we can do later.
Really, it seems like the only time the cache *won't* flush is when I move from a secure wifi to an insecure wifi. What happens when I move from the insecure wifi back? I would like to argue that given not all domains have DNSSEC yet, you can't "trust" the records from the insecure wifi, so at the least on insecure wifi interface down, you should flush the non-dnssec cached records.
Whether the network is "secure" or "insecure" only has an effect on the forwarder state, and thus potentially on certain domains handled by that forwarder. DNSSEC validation is not skipped in those cases, so data can still be trusted. Non-DNSSEC domains are always vulnerable to a MITM. Since their owners can just sign their domains, I don't personally feel that we need to go out of our way to accommodate those insecure setups. If people feel differently, again we could tune this and add a toggle.
Which collecting this seems to mean (Current functional state):
Secure to secure network -> Flush "." cache.
Secure to insecure network -> Keep cache.
Insecure to insecure network -> Keep cache.
Insecure to secure network -> Keep cache.
I think in the perfect world, assuming that insecure networks are insecure shouldn't it be?
Secure to secure network -> Flush "." cache.
Secure to insecure network -> Keep cache.
Insecure to insecure network -> Keep DNSSEC cache only.
Insecure to secure network -> Keep DNSSEC cache only.
I'll think about these a little more. Note that "keep DNSSEC cache only" is currently not an option implemented by unbound.
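To make the transition tables above concrete, here is a hypothetical shell encoding of William's "current functional state" summary (secure to secure flushes, everything else keeps the cache). The network labels, action strings and the UNBOUND_CONTROL override are invented for this sketch. As Paul says, "keep DNSSEC cache only" has no unbound equivalent, so only the flush case maps to a real command; using `unbound-control flush_zone .`, which drops cached data at and below the root, is an assumption about how such a flush could be done.

```shell
# Hypothetical: what the current setup (as summarised above) does to the
# cache when moving between networks. Labels are "secure" or "insecure".
cache_action() {
    from=$1
    to=$2
    case "$from:$to" in
        secure:secure) echo "flush ." ;;
        secure:insecure|insecure:insecure|insecure:secure) echo "keep" ;;
        *) echo "unknown transition: $from -> $to" >&2; return 1 ;;
    esac
}

# Apply the action. Set UNBOUND_CONTROL=echo to print the command
# instead of running it.
apply_cache_action() {
    uc=${UNBOUND_CONTROL:-/usr/sbin/unbound-control}
    case $(cache_action "$1" "$2") in
        "flush .") "$uc" flush_zone . ;;
        keep)      : ;;  # nothing to do
        *)         return 1 ;;
    esac
}
```

William's alternative table would only change the `keep` branches, which is exactly where the missing "DNSSEC cache only" flush would be needed.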
The only records you can really guarantee as being the same on all network views are ones signed by DNSSEC.
Not really, you can have differently signed zones for the same name for the internal and external views. Hopefully with at least the same DNSKEY, but even that could be different. It would, however, require manual configuration of files in /etc/unbound/*.d/
Paul
On Mon, 2014-04-14 at 10:21 -0400, Paul Wouters wrote:
Or how would you suggest this is solved. For arguments sake lets say:
SSID: myawesomeopenhotspot DHCP provides no domain-name info. I CNAME all records to my.hotspot. until authenticated.
If this does not do http(s) redirection, then it is a very, very broken setup. You would also need to block port 53 to prevent people from using 8.8.8.8 to bypass your authentication. So I do not see this in the wild (and I've done the hard work of sitting in many coffee shops for science). Hotspots tend to intercept port 80 to a mini web server that only serves a redirect, sometimes without any DNS name, often with a DNS name only resolvable by using their DNS. And that is fully supported by the current solution.
Many of the captive portals I've seen block all access to anything except the portal login. You don't get to ping anything, you don't get to DNS anything (let alone 8.8.8.8) and you certainly don't get to send traffic outside the portal. Only when you've authenticated does it release you. Sometimes that's done with VLANs and DHCP renewal, sometimes it's done internally with firewall rules or something.
But another scenario I've seen: older Netgear routers which intercept "www.routerlogin.net" as the setup page. The instructions literally are:
1) connect your computer to the router with a cable
2) go to www.routerlogin.net
3) follow the setup guide instructions
Any idea how dnssec-trigger + unbound would handle this? Since it's router setup, maybe spawning a whole new window for the "portal" would work, but you'd want to make sure the window didn't go away or DNS didn't change until the user was done setting up the router.
Dan
On Mon, 14 Apr 2014, Dan Williams wrote:
But another scenario I've seen: older Netgear routers which intercept "www.routerlogin.net" as the setup page. The instructions literally are:
- connect your computer to the router with a cable
- go to www.routerlogin.net
- follow the setup guide instructions
Any idea how dnssec-trigger + unbound would handle this? Since it's router setup, maybe spawning a whole new window for the "portal" would work, but you'd want to make sure the window didn't go away or DNS didn't change until the user was done setting up the router.
I don't know what they do when you query for anything else. If there is no hotspot redirection on port 80/443 and their DNS server works properly, and your wifi was secure, you would then get their forward and the above would work. If it is an open wifi, we would not install the forward and you would not get there. But in the current setup, you can pick "hotspot login" mode, which puts their DNS in place, and then you will reach it. Note that manual hotspot login sessions also require you to manually mark them for "reprobe", because apparently we cannot probe for it once you have manually overridden it. If you switch networks and bring up the VPN, you'll encounter weird things: while still in hotspot mode, the VPN forward put into unbound is not active because you are not using unbound yet (until you hit reprobe to leave "hotspot signon" mode).
Paul
On Mon, 2014-04-14 at 12:00 -0400, Paul Wouters wrote:
On Mon, 14 Apr 2014, Dan Williams wrote:
But another scenario I've seen: older Netgear routers which intercept "www.routerlogin.net" as the setup page. The instructions literally are:
- connect your computer to the router with a cable
- go to www.routerlogin.net
- follow the setup guide instructions
Any idea how dnssec-trigger + unbound would handle this? Since it's router setup, maybe spawning a whole new window for the "portal" would work, but you'd want to make sure the window didn't go away or DNS didn't change until the user was done setting up the router.
I don't know what they do when you query for anything else. If there is no hotspot redirection on port 80/443 and their DNS server works properly, and your wifi was secure, you would then get their forward and the above would work. If it is an open wifi, we would not install
Since the user is setting things up, they can pick whether it's open or protected wifi. We don't control that.
the forward and you would not get there. But in the current setup, you can pick "hotspot login" mode and it puts their DNS in place, and then you will reach it. Note that manual hotspot login sessions require you
Ok, that could be a problem. This is a user setting up wifi on a router they just bought, so it has no upstream connection yet, is not yet configured at all, and they are just following the directions in the printed brochure they got with the router. Which obviously won't say anything about "hotspot login" mode.
Also, this is the procedure you follow if you reset the router to factory defaults, which support people sometimes tell you to do. So we'd run into the issue if/when the user contacted Netgear technical support too.
Dan
to manually mark them for "reprobe" as well, because apparently we cannot probe for it once you have manually overridden it. If you switch networks and bring up the VPN, you'll encounter weird things. While still in hotspot mode, the VPN forward put into unbound is not active because you are not using unbound yet (until you hit reprobe to leave "hotspot signon" mode).
Paul
On Mon, 14 Apr 2014, Dan Williams wrote:
Ok, that could be a problem. This is a user setting up wifi on a router they just bought, so it has no upstream connection yet, is not yet configured at all, and they are just following the directions in the printed brochure they got with the router. Which obviously won't say anything about "hotspot login" mode.
But dnssec-trigger detects that the hotspot detection page didn't return the expected content of "OK" and will suggest that the user go into hotspot signon mode.
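A minimal sketch of the check Paul describes, assuming a plain HTTP probe whose body should be exactly OK. PROBE_URL below is a placeholder, not the URL dnssec-trigger is configured with, and the function names are hypothetical; the real daemon also probes DNS, which is not shown here.

```shell
# Placeholder probe URL (hypothetical); override via the environment.
PROBE_URL=${PROBE_URL:-http://probe.example.org/hotspot.txt}

# Classify a response body: "open" if it is exactly OK, else "portal".
# Command substitution strips the trailing newline, so "OK\n" counts.
classify_probe() {
    if [ "$1" = "OK" ]; then
        echo open
    else
        echo portal
    fi
}

probe_hotspot() {
    # -s: quiet, -f: fail on HTTP errors, -m 5: give up after 5 seconds.
    body=$(curl -sf -m 5 "$PROBE_URL") || body=""
    if [ "$(classify_probe "$body")" = open ]; then
        echo "no hotspot detected"
    else
        echo "suggest hotspot signon mode"
    fi
}
```

Treating an empty body (no connectivity at all) the same as a portal page is a simplification made in this sketch.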
Paul
On Mon, Apr 14, 2014 at 9:06 AM, Dan Williams dcbw@redhat.com wrote:
If you want to get really fancy, you could try to detect a state in which there is no connection to the internet, the router has an address 192.168.*.1, and the router is listening on TCP port 80, and suggest an alternate "you are connected to a possibly unconfigured router" mode.
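Andy's heuristic could be encoded roughly as follows. This is a hypothetical sketch: the function names are invented, the 192.168.*.1 test is the shell glob from his message (it will also accept some strings that are not valid addresses), and how internet reachability is determined is left to the caller. The optional port-80 check simply asks curl whether anything answers HTTP on the gateway.

```shell
# Hypothetical sketch of the "possibly unconfigured router" heuristic.
# Arguments: gateway address, internet reachable (yes/no),
# gateway answers HTTP on port 80 (yes/no).
suggest_router_mode() {
    gw=$1
    internet=$2
    http=$3
    case $gw in
        192.168.*.1) private_gw=yes ;;
        *)           private_gw=no ;;
    esac
    if [ "$internet" = no ] && [ "$private_gw" = yes ] && [ "$http" = yes ]; then
        echo "possibly unconfigured router"
    else
        echo "normal"
    fi
}

# Optional helper: succeeds if anything answers HTTP on the gateway
# within 3 seconds (any status code counts, since -f is not used).
gw_has_http() {
    curl -s -o /dev/null -m 3 "http://$1/"
}
```

A caller might use `if gw_has_http 192.168.1.1; then http=yes; else http=no; fi` before calling `suggest_router_mode`.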
--Andy
On Sun, 2014-04-13 at 16:29 +0930, William Brown wrote:
That depends. You need caching for DNSSEC validation, so really, every device needs a cache, unless you want to outsource your DNSSEC validation over an insecure transport (LAN). That seems like a very bad idea.
If your lan is insecure, you have other issues. That isn't the problem you are trying to solve.
I keep seeing this repeated by you and Harald. I am truly in awe that your networks are *secure*; however, that is not the common case. Networks are routinely breached by zombified machines or are insecure by default (wifi, or very large networks where anyone can plug in). Basically, if any of the machines on the network can be compromised, the network is not secure anymore. Finally, you certainly can't trust networks as large as those of common ISPs.
All these networks need to be treated as insecure by default. You cannot trust a DNS server that is not on your machine to do DNSSEC resolution for you; as soon as you want to start using DANE, TLSA, etc., you are a sitting duck, and people will be able to MITM you extremely easily.
The default needs to cater for these issues. But of course it is just a default, on your network you'll be able to change the resolvers however you want.
The only thing I agree on is that the default MUST use the forwarders provided by the local DHCP unless the user explicitly configured otherwise.
Simo.
On Saturday, 12 April 2014 4:55 PM, William Brown wrote:
This isn't how DNS works ..... You populate your cache from the ISP, who queries above them and so on up to the root server. http://technet.microsoft.com/en-us/library/cc961401.aspx
Hmmn. There are two ways a local resolver can be configured. One is that it contacts root servers and builds its cache from their responses. That's recursive name resolution. The second is that it acts like a stub resolver and forwards client queries to another recursive resolver.
N-DJBDNS supports both these options. Maybe you could install it and see for yourself.
try -> # yum install ndjbdns
I should clarify. I cache the record foo.work.com from the office, and it resolves differently externally. When I go home, it no longer resolves to the external IP as I'm using the internally acquired record from cache.
No. Your foo.work.com address does not resolve to a public internet address, but resolves to an internal company specific address. And when you come home, your domain foo.work.com still resolves to the _same_ internal address, but you are unable to connect to it because you are outside of the office network.
Try connecting over VPN connection from home.
A local cache will help you with 1 "sometimes" provided you get the first record back once.
It won't prevent the second or third as you will just cache the incorrect data instead (Provided you clear cache on network change, this isn't a problem ... it just means you hold onto bad data for that session for longer, which creates other issues.)
I personally am actually against DNS cache on systems as it tends to create more problems than it solves.
Maybe you could try N-DJBDNS -> # yum install ndjbdns
--- Regards -Prasad http://feedmug.com
On Sat, 12 Apr 2014, William Brown wrote:
I should clarify. I cache the record foo.work.com from the office, and it resolves differently externally. When I go home, it no longer resolves to the external IP as I'm using the internally acquired record from cache.
This currently works for the VPN scenario. When you connect the VPN, and the VPN gives you a domain/nameservers, unbound is reconfigured on the fly with those nameservers as forwards. A cache flush is done for the specified domain on connecting to or disconnecting from the VPN. The new proposal for dealing with your scenario consists of two parts:
- LAN and secured WIFI that return a search domain and nameserver IPs will be installed as forwarders in unbound. The current content and request_list will be flushed using unbound-control.
- Open WIFI will do the same only after the user has told NM this network is to be "trusted". The current content and request_list will be flushed using unbound-control.
That should deal with flushing internal-only records when not internal and flushing external records when not external.
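A hypothetical sketch of the two-part proposal, written against the unbound-control commands the libreswan hook in this thread already uses (forward_add, flush_zone, flush_requestlist). The argument layout, the network type names and the UNBOUND_CONTROL dry-run override are assumptions; it also flushes only the DHCP-provided domain, whereas "the current content" in the proposal may mean the whole cache.

```shell
# Hypothetical sketch. Arguments:
#   $1  network type: "lan", "secure-wifi" or "open-wifi"
#   $2  "yes" if the user has marked this network as trusted in NM
#   $3  search domain from DHCP
#   $4+ nameserver addresses from DHCP
# Set UNBOUND_CONTROL=echo for a dry run.
install_dhcp_forwarders() {
    uc=${UNBOUND_CONTROL:-/usr/sbin/unbound-control}
    nettype=$1 trusted=$2 domain=$3
    shift 3
    # Nothing to install if DHCP gave no domain or no nameservers.
    [ -n "$domain" ] && [ $# -gt 0 ] || return 0
    case $nettype in
        lan|secure-wifi) ;;                               # always eligible
        open-wifi) [ "$trusted" = yes ] || return 0 ;;    # only once trusted
        *) return 1 ;;
    esac
    "$uc" forward_add "$domain" "$@"
    "$uc" flush_zone "$domain"
    "$uc" flush_requestlist
}
```

With `UNBOUND_CONTROL=echo`, `install_dhcp_forwarders lan no corp.example 192.0.2.1` just prints the three unbound-control commands it would run.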
If the internal domain is using DNSSEC, further configuration of a trust anchor override might be needed. This can be done in the /etc/unbound/*.d/ directories (commented-out examples are present). Possibly, this directory structure can be replaced by integrated NM support that reconfigures unbound (or dnsmasq) based on the same information.
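One possible shape for such a drop-in, assuming you choose to skip validation for the internal zone with unbound's domain-insecure option rather than load the internal zone's own key with a trust-anchor line. The helper name, zone and addresses are hypothetical; the output would go into a file under one of the /etc/unbound/*.d/ directories.

```shell
# Hypothetical: print a drop-in for an internal zone that is only
# resolvable (or only signed differently) behind the corporate resolvers.
# Usage: internal_zone_conf ZONE ADDR...
internal_zone_conf() {
    zone=$1
    shift
    echo "server:"
    # Skip DNSSEC validation for this zone: simpler, but weaker than
    # configuring the internal zone's own trust anchor.
    echo "    domain-insecure: \"$zone\""
    echo "forward-zone:"
    echo "    name: \"$zone\""
    for addr in "$@"; do
        echo "    forward-addr: $addr"
    done
}
```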
Paul
On Sat, Apr 12, 2014 at 02:09:19PM +0800, P J P wrote:
On Saturday, 12 April 2014 11:11 AM, William Brown wrote: Say I have freshly installed my fedora system at home. I then boot it up and start to use it. My laptop is caching DNS results all the while from the "unreliable" ISP.
I then go to work and suddenly things don't work.
Having a DNS cache doesn't fix your unreliable ISP: You need to lodge a complaint with your ISP.
What, no! that was the case for having local cache and not forwarding queries to the ISP's name servers at all. Because those are not reliable.
I disagree. You can still do DNSSEC validation with a local caching resolver and configure that local resolver to forward all queries to the ISP. That should be tried first, and only if that fails should the ISP resolvers be bypassed, with the local resolver becoming a full iterative recursive querier. We need to respect the DNS caching infrastructure by default.
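Chuck's "forward first, recurse only on failure" policy maps onto `unbound-control forward`, which takes either a list of upstream addresses or `off` to resolve from the root servers. The sketch below is hypothetical: how the forwarders are probed (dnssec-trigger has its own tests) is left to the caller and passed in as pass or fail, and the UNBOUND_CONTROL override is only for dry runs.

```shell
# Hypothetical sketch of "forward first, recurse only if that fails".
# $1 is the probe result ("pass" or "fail"); the remaining arguments are
# the DHCP-provided nameservers. Set UNBOUND_CONTROL=echo for a dry run.
choose_upstream() {
    uc=${UNBOUND_CONTROL:-/usr/sbin/unbound-control}
    probe=$1
    shift
    if [ "$probe" = pass ] && [ $# -gt 0 ]; then
        "$uc" forward "$@"     # use the ISP/DHCP resolvers as forwarders
    else
        "$uc" forward off      # fall back to full iterative recursion
    fi
    # Drop queries still in flight to the previous upstreams.
    "$uc" flush_requestlist
}
```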
Am 12.04.2014 15:31, schrieb Chuck Anderson:
nonsense - there are so many broken ISP nameservers out there, responding with wildcards and so on, that you cannot trust them, and you will realize that, if not before, then after you start running a production mailserver which relies on NXDOMAIN responses for proper operation
there are also a lot of broken DNS servers in general that do not respect the TTL - not so long ago we moved one of our servers into our datacenter, changed the TTL to 5 minutes two days before, and *7 months* later the DNS of my private ISP still answered randomly with the old and the new address
other DNS servers out there still answered with the old one after 7 months; the most broken one just answered with *both*, suggesting round robin to the client - problem: the old IP no longer existed at all
how did i test that? by googling for publicly answering nameservers, asking all the ones i found with a script, and finally asking the tech contacts of the broken ones why they don't start hiring someone with DNS skills
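The kind of script Reindl describes can be sketched as below, assuming dig from bind-utils is installed and that you supply the list of resolvers yourself. Function names and the output format are hypothetical; the comparison step is split out so it can be checked without network access.

```shell
# Hypothetical: ask each resolver for NAME and flag answers that differ
# from EXPECTED (a single address, or a sorted space-separated list).
# Usage: check_resolvers NAME EXPECTED SERVER...
check_resolvers() {
    name=$1 expected=$2
    shift 2
    for ns in "$@"; do
        answer=$(dig +short +time=2 +tries=1 @"$ns" "$name" A | sort | tr '\n' ' ')
        report_answer "$ns" "$expected" "$answer"
    done
}

# Pure comparison step: a resolver returning both old and new
# addresses (the "round robin" case above) is flagged too.
report_answer() {
    ns=$1 expected=$2 answer=$3
    answer=${answer% }   # strip the trailing space left by tr
    if [ "$answer" = "$expected" ]; then
        echo "$ns ok"
    else
        echo "$ns STALE/WRONG: ${answer:-no answer}"
    fi
}
```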
On Sat, Apr 12, 2014 at 04:03:14PM +0200, Reindl Harald wrote:
I don't disagree that there is lots of broken DNS out there. But realistically, we still need to default to using the DHCP-provided DNS servers as forwarders because there are unfortunately lots of circumstances where this is required to resolve corporate DNS names or to allow captive portals to work. If the local caching resolver is intelligent enough, it can handle the common use cases (corporate DNS resolution, VPN into corporate, captive portals) and work around the common failure modes (automatic cache flushing, switching to iterative mode to bypass upstream nameservers when necessary, using both the upstream nameservers AND iterative queries and combining the results) for us.
What we cannot do is have the default be to bypass the upstream DNS resolvers without some way to handle the above cases. If mainstream operating systems started doing that by default, then corporate networks, ISPs, captive portals etc. will probably start blocking DNS to outside servers or redirecting port 53 to their own servers. In fact some already do this. We don't want to escalate the arms race by encouraging this behavior.
Am 12.04.2014 16:16, schrieb Chuck Anderson:
On Sat, Apr 12, 2014 at 04:03:14PM +0200, Reindl Harald wrote:
Am 12.04.2014 15:31, schrieb Chuck Anderson:
I disagree. You can still do DNSSEC validation with a local caching resolver and configure that local resolver to forward all queries to the ISP. That should be tried first, and only if that fails should the ISP resolvers be bypassed, with the local resolver becoming a full iterative recursive querier. We need to respect the DNS caching infrastructure by default.
nonsense - there are so many broken ISP nameservers out there, responding with wildcards and so on, that you cannot trust them, and you will realize that, if not before, then after you start running a production mailserver which relies on NXDOMAIN responses for proper operation
I don't disagree that there is lots of broken DNS out there. But realistically, we still need to default to using the DHCP-provided DNS servers as forwarders because there are unfortunately lots of circumstances where this is required to resolve corporate DNS names or to allow captive portals to work.
if you rely on that and are a road-warrior, don't set up a local dns
If the local caching resolver is intelligent enough, it can handle the common use cases (corporate DNS resolution, VPN into corporate, captive portals) and work around the common failure modes (automatic cache flushing, switching to iterative mode to bypass upstream nameservers when necessary, using both the upstream nameservers AND iterative queries and combining the results) for us.
oh no - go away with such ideas
there is no such thing as "intelligent" in the case of a DNS resolver; the only thing you achieve by trying is unpredictable behavior
What we cannot do is have the default be to bypass the upstream DNS resolvers without some way to handle the above cases. If mainstream operating systems started doing that by default, then corporate networks, ISPs, captive portals etc. will probably start blocking DNS to outside servers or redirecting port 53 to their own servers. In fact some already do this. We don't want to escalate the arms race by encouraging this behavior
"we" should not do anything, because "we" don't have a clue about the end user's network: if he is a road-warrior then he needs to use the DHCP-provided nameservers; if he is a roadrunner with VPN access to his company network from home, he has to enter the DNS servers behind the VPN into his router / dhcp, and after coming home everything works as expected
if the roadrunner has the VPN client directly on his machine, well then he needs to make a decision:
* enter the company nameservers in /etc/resolv.conf and always use VPN
* enter the company LAN hostnames in /etc/hosts to not rely on DNS at all
* manually change /etc/resolv.conf whenever needed
there is nothing to gain by trying auto-magic for sensitive things like DNS
On Sat, 12 Apr 2014, Reindl Harald wrote:
"we" should not do anything - because "we" don't have a clue about the network of the enduser
We know and handle a lot more than you think already using unbound with dnssec-trigger and VPNs. Why don't you give it a shot and give us some feedback on how it works for you on your laptop?
if the roadrunner has the VPN client directly on his machine, well then he needs to make a decision:
They need to make no decision, it has been automated already:
https://github.com/libreswan/libreswan/blob/master/programs/_updown.netkey/_...
if [ -n "$(pidof unbound)" ]; then
    echo "updating local nameserver for ${PLUTO_PEER_DOMAIN_INFO} with ${PLUTO_PEER_DNS_INFO}"
    /usr/sbin/unbound-control forward_add ${PLUTO_PEER_DOMAIN_INFO} ${PLUTO_PEER_DNS_INFO}
    /usr/sbin/unbound-control flush_zone ${PLUTO_PEER_DOMAIN_INFO}
    /usr/sbin/unbound-control flush_requestlist
    return 0
fi
[...]
if [ -n "$(pidof unbound)" ]; then
    echo "flushing local nameserver of ${PLUTO_PEER_DOMAIN_INFO}"
    /usr/sbin/unbound-control forward_remove ${PLUTO_PEER_DOMAIN_INFO}
    /usr/sbin/unbound-control flush_zone ${PLUTO_PEER_DOMAIN_INFO}
    /usr/sbin/unbound-control flush_requestlist
    return 0
fi
It even has fallbacks for when unbound is not running, which do this by editing /etc/resolv.conf - obviously not as preferred as running unbound, but still supported.
Paul
Am 12.04.2014 17:11, schrieb Paul Wouters:
On Sat, 12 Apr 2014, Reindl Harald wrote:
"we" should not do anything - because "we" don't have a clue about the network of the enduser
We know and handle a lot more than you think already using unbound with dnssec-trigger and VPNs. Why don't you give it a shot and give us some feedback on how it works for you on your laptop?
because i stopped using laptops years ago? because my dns setups are way too complex for such magic? because i won't touch NM in this life?
i speak in this thread as one who understands the pain of playing with DNS and what happens if wrong assumptions are made
if the roadrunner has the VPN client directly on his machine, well then he needs to make a decision:
They need to make no decision, it has been automated already:
https://github.com/libreswan/libreswan/blob/master/programs/_updown.netkey/_...
if [ -n "$(pidof unbound)" ]; then
    echo "updating local nameserver for ${PLUTO_PEER_DOMAIN_INFO} with ${PLUTO_PEER_DNS_INFO}"
    /usr/sbin/unbound-control forward_add ${PLUTO_PEER_DOMAIN_INFO} ${PLUTO_PEER_DNS_INFO}
    /usr/sbin/unbound-control flush_zone ${PLUTO_PEER_DOMAIN_INFO}
    /usr/sbin/unbound-control flush_requestlist
    return 0
fi
and if you cross my path with a user's machine, give me a single button to disable that, because you break my setups with it - no, i can't explain our internal infrastructure in public, but there must be no local cache in my way
if a co-worker asks for a dns record after already trying to reach the site, you have no business keeping a negative cache on the client while all 4 internal nameservers already have the answer (two of them are caching servers for external responses and are used as forwarders for non-auth zones)
DNS1: cache
DNS2: cache
DNS3: auth for 600 zones
DNS4: auth for 600 zones
users query DNS1 and DNS2, which do the recursion
DNS3 and DNS4 are for public access from the internet to resolve customer domains
On 12.4.2014 17:25, Reindl Harald wrote:
It seems that this thread contains a lot of hand-waving about problems which can theoretically happen.
I would like to move the discussion to more constructive stage - get your hands dirty! :-)
Instructions for testing on Fedora 20+ are available on: https://fedoraproject.org/wiki/Changes/Default_Local_DNS_Resolver#How_To_Tes...
Please run dnssec-trigger and leave exclamations like "It can't possibly work!" aside. Send constructive bug reports instead.
E.g.:
- My network has static configuration XYZ in /etc/sysconfig/... and it doesn't work. "unbound-control list_forwards" shows an empty list.
- I can't access my corporate mail server when I'm connected to VPN. journalctl -f -u unbound.service shows this:
  unbound[1062]: [1062:0] info: validation failure <mail.corp.com. A IN>: no keys have a DS with algorithm RSASHA1 from 192.168.2.1 for key dnssec-failed.org. while building chain of trust
etc. etc.
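To make such reports easier to put together, a hypothetical helper could gather the outputs mentioned above into one file. Only `unbound-control list_forwards` and `journalctl -u unbound.service` come from the examples; the output path, the 50-line limit and the override variables are illustrative assumptions.

```shell
# Hypothetical: collect resolver state for a bug report into one file.
# UNBOUND_CONTROL and JOURNALCTL can be overridden (e.g. with "echo").
collect_dns_report() {
    uc=${UNBOUND_CONTROL:-/usr/sbin/unbound-control}
    jc=${JOURNALCTL:-journalctl}
    out=${1:-/tmp/dns-report.txt}
    {
        echo "== resolv.conf =="
        cat /etc/resolv.conf 2>/dev/null || echo "(not readable)"
        echo "== unbound forwards =="
        "$uc" list_forwards
        echo "== recent unbound log =="
        "$jc" -u unbound.service -n 50 --no-pager
    } > "$out"
    echo "$out"
}
```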
We need real data.
Thank you very much for your attention.
Hello Petr,
On Tuesday, 15 April 2014 4:02 PM, Petr Spacek wrote: Instructions for testing on Fedora 20+ are available on: https://fedoraproject.org/wiki/Changes/Default_Local_DNS_Resolver#How_To_Tes...
Please run dnssec-trigger and leave exclamations like "It can't possibly work!" aside. Send constructive bug reports instead.
We need real data.
Excellent! Thank you so much for doing this Petr.
I was going to do the same. Summarise the discussion so far, list down the problem areas and invite users to report their problems.
Having real first hand data and bug reports would be extremely effective in developing the NM plugins and integration with NM.
I'll try both configurations on my machine.
Thank you! :) --- Regards -Prasad http://feedmug.com
Hi,
On Tuesday, 15 April 2014 4:02 PM, Petr Spacek wrote: We need real data.
Please see -> https://www.piratepad.ca/p/dnssec-requisites-configurations
I've collected the major functionalities people wish to have with a default DNS resolver, along with a couple of 'unbound' configurations that I tried.
It'll greatly help if others could also list their configuration details on the same page.
Thank you. --- Regards -Prasad http://feedmug.com
On Sat, 12 Apr 2014, Chuck Anderson wrote:
I don't disagree that there is lots of broken DNS out there. But realistically, we still need to default to using the DHCP-provided DNS servers as forwarders because there are unfortunately lots of circumstances where this is required to resolve corporate DNS names or to allow captive portals to work. If the local caching resolver is intelligent enough, it can handle the common use cases (corporate DNS resolution, VPN into corporate, captive portals) and work around the common failure modes (automatic cache flushing, switching to iterative mode to bypass upstream nameservers when necessary, using both the upstream nameservers AND iterative queries and combining the results) for us.
What we cannot do is have the default be to bypass the upstream DNS resolvers without some way to handle the above cases.
Correct, which is why Anaconda should configure the DNS server that comes in via kickstart or from the administrator as a forwarder in unbound.
It is one of the modifications required for this feature.
Paul
On Sat, 12 Apr 2014, Reindl Harald wrote:
Nonsense: there are so many broken ISP nameservers out there, responding with wildcards and so on, that you cannot trust them, and you will realize that, if not before, then after you start running a production mail server which relies on NXDOMAIN responses to operate properly.
That's not what the http://atlas.ripe.net/ data set indicates. Your story seems anecdotal and incidental.
Yes, there are a few bad players out there (like Rogers in Canada) but those are in a minority. That said, I agree that using unbound on your servers will reduce upstream DNS outage problems on your servers. I wouldn't run unbound on every VM though.
Paul
Am 12.04.2014 17:05, schrieb Paul Wouters:
On Sat, 12 Apr 2014, Reindl Harald wrote:
Nonsense: there are so many broken ISP nameservers out there, responding with wildcards and so on, that you cannot trust them, and you will realize that, if not before, then after you start running a production mail server which relies on NXDOMAIN responses to operate properly.
That's not what the http://atlas.ripe.net/ data set indicates. Your story seems anecdotal and incidental.
If you call the year 2012 anecdotal, then yes.
Yes, there are a few bad players out there (like Rogers in Canada) but those are in a minority
It is not a matter of bad players, it is a matter of stupid admins on the ISP side. In the case of our server it was the largest ISP here, and they simply had bugs in their load balancing, resulting in random results (current and outdated) from the same nameserver IP.
Another one was also a large ISP, which in 2013 started giving wrong answers for our IPv4 address because they fucked up their DNS while trying to implement IPv6.
In both cases I know for sure what happened at the ISP. Note that the change was made in 2011, and we are even in the GLUE record.
Another big player at that time was OpenDNS.
Since not many ISP nameservers answer queries from non-customer IP addresses, I found around 50 public nameservers all over the world, and 15 of them were still wrong after more than 7 months. Yes, it is a minority because it's below 50%, but *way too many* for such a critical service as DNS.
And here I have not said a single word about overloaded and unresponsive ISP DNS servers, again and again. For many years we had massive trouble accessing websites, and it was always "could not be found" in Firefox, which means no DNS answer. Guess what: after we stopped using forwarders, nobody has seen that message again.
On Sat, Apr 12, 2014 at 11:05:21AM -0400, Paul Wouters wrote:
On Sat, 12 Apr 2014, Reindl Harald wrote:
Nonsense: there are so many broken ISP nameservers out there, responding with wildcards and so on, that you cannot trust them, and you will realize that, if not before, then after you start running a production mail server which relies on NXDOMAIN responses to operate properly.
That's not what the http://atlas.ripe.net/ data set indicates. Your story seems anecdotal and incidental.
Yes, there are a few bad players out there (like Rogers in Canada) but those are in a minority. That said, I agree that using unbound on your servers will reduce upstream DNS outage problems on your servers. I wouldn't run unbound on every VM though.
Okay, so here is where you and I differ then. We need a solution to run everywhere, on every system, in every use case. The local DNS daemon (note that I didn't say "cache" this time) should be a part of the Base OS like init/systemd is. It should be small, unobtrusive, and do very little, namely the one thing we need: handle failover between multiple DNS servers. I would use the term "DNS proxy" but that term is too overloaded with other connotations and preconceived ideas.
All the other stuff is great, but it should run at a higher level and perhaps be optional like you say. You may not want DNSSEC validation in every VM, or indeed on every server in a corporate datacenter. But you still do need the local DNS daemon to handle failover ONCE for the entire system, rather than the way glibc does it now PER PROCESS.
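To put rough numbers on the per-process cost, here is a back-of-the-envelope model, not a measurement. It assumes glibc's documented resolv.conf(5) default of `options timeout:5`, a first nameserver that silently drops packets, one lookup per process, and that every process pays at least one full timeout before reaching the second server. The function names are hypothetical.

```python
# Back-of-the-envelope model, not a measurement.
# Assumptions: glibc's resolv.conf(5) default "options timeout:5", a first
# nameserver that silently drops queries, and one lookup per process.
GLIBC_DEFAULT_TIMEOUT_S = 5


def per_process_stall(processes: int, timeout_s: float = GLIBC_DEFAULT_TIMEOUT_S) -> float:
    """Total waiting time when every process rediscovers the dead server itself."""
    return processes * timeout_s


def system_wide_stall(processes: int, timeout_s: float = GLIBC_DEFAULT_TIMEOUT_S) -> float:
    """Total waiting time when one local daemon discovers it once for everyone."""
    return timeout_s if processes > 0 else 0


if __name__ == "__main__":
    for n in (1, 20, 200):
        print(f"{n:>3} processes: {per_process_stall(n)} s per-process vs "
              f"{system_wide_stall(n)} s system-wide")
```

With 20 processes this gives 100 seconds of cumulative stall versus 5. Real figures also depend on `attempts`, `rotate` and how many lookups each process makes.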
I omitted "cache" from my description of the local DNS daemon this time around, because after thinking about it more, once you introduce a cache you have to deal with flushing that cache and that complicates things perhaps more than people are willing to accept. Having a cache is not required to handle the most basic failover functionality.
This local non-caching, non-recursive stub DNS daemon could sit in front of, behind, or beside (in place of) other optional full DNSSEC/caching/VPN-aware solutions. For the purposes of this example, let's call this theoretical daemon "dnslookupd":
resolv.conf is configured with 127.0.0.1.
dnslookupd listens on 127.0.0.1:53.
dnslookupd is configured with one or more DNS servers (which could be local daemon(s) on other loopback addresses or ports).
dnslookupd keeps track of up/down DNS servers via some health check mechanism, and switches between them appropriately.
dnslookupd does not cache any results--it simply forwards the queries to one chosen DNS server.
dnslookupd provides an API for other components to configure the list/order of DNS servers (perhaps dbus).
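The list above can be modelled as a small piece of state. This is a minimal sketch of the proposed dnslookupd core under stated assumptions. The class and method names are hypothetical, `configure()` stands in for the D-Bus style API, and the actual socket forwarding on 127.0.0.1:53 is left out. It keeps no answer cache, only an ordered server list with an up/down verdict per server.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Upstream:
    address: str     # a DHCP-provided server, or a local unbound on another loopback address
    up: bool = True  # current health verdict, set by whatever health logic is used


class DnsLookupdState:
    """Hypothetical core of 'dnslookupd': decides which upstream gets the
    next query. No DNS answers are stored anywhere in this object."""

    def __init__(self) -> None:
        self._upstreams: List[Upstream] = []

    def configure(self, addresses: List[str]) -> None:
        # Stand-in for the configuration API (e.g. driven by NetworkManager):
        # replaces the list and its order; every server starts out as 'up'.
        self._upstreams = [Upstream(a) for a in addresses]

    def mark(self, address: str, up: bool) -> None:
        for upstream in self._upstreams:
            if upstream.address == address:
                upstream.up = up

    def choose(self) -> Optional[str]:
        # First healthy server in configured order, or None if all are down.
        for upstream in self._upstreams:
            if upstream.up:
                return upstream.address
        return None
```

Keeping only "which server is next" state is what makes this cheap enough to run everywhere; caching, TTLs and DNSSEC stay in whatever sits behind it.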
On Sat, 12 Apr 2014, Chuck Anderson wrote:
Okay, so here is where you and I differ then. We need a solution to run everywhere, on every system, in every use case.
Sounds like wanting ponies? Obviously I fully agree with a solution that works everywhere, all the time, for everyone, however they want it :)
The local DNS daemon (note that I didn't say "cache" this time) should be a part of the Base OS like init/systemd is. It should be small, unobtrusive, and do very little, namely the one thing we need: handle failover between multiple DNS servers. I would use the term "DNS proxy" but that term is too overloaded with other connotations and preconceived ideas.
Handling failover requires keeping state of previous queries and outstanding requests to determine which servers are bad or not. Mind you, unbound allows you to set a max TTL on any record received using cache-max-ttl=0, so you can very easily implement this idea. I think it is a bad idea, because your solution violates your own principle above: it interferes with my use case of optimising DNS caches, reducing unnecessary latency, and doing things like pre-fetching of low TTL records.
In DNS, the publisher of data tells you how long the data should be valid for. If they want the record not to be cached at all, they can set the TTL to 0. Why should we deploy a daemon that does not provide the very useful feature of caching in general (especially when doing DNSSEC validation) when people who wish to not get cached already have a means out, publish records with TTL=0? If you want to be Akamai, you can!
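A toy model of the TTL handling described here: the cache honours the publisher's TTL, and an administrator ceiling clamps it. The ceiling is loosely modelled on unbound's `cache-max-ttl`, whose documented default is 86400 seconds. An effective TTL of 0 means nothing is stored. This is an illustration only and does not claim to reproduce unbound's internal behaviour. The `clock` parameter is an assumption added so the example can be tested without sleeping.

```python
import time


class TtlCache:
    """Toy DNS answer cache keyed by (name, type)."""

    def __init__(self, max_ttl: int = 86400, clock=time.monotonic):
        self.max_ttl = max_ttl
        self.clock = clock
        self._entries = {}  # (name, rrtype) -> (expiry time, answer)

    def put(self, name: str, rrtype: str, answer: str, ttl: int) -> None:
        effective_ttl = min(ttl, self.max_ttl)
        if effective_ttl <= 0:
            return  # TTL 0, from the publisher or from the clamp: do not cache
        self._entries[(name.lower(), rrtype)] = (self.clock() + effective_ttl, answer)

    def get(self, name: str, rrtype: str):
        key = (name.lower(), rrtype)
        hit = self._entries.get(key)
        if hit is None:
            return None
        expiry, answer = hit
        if self.clock() >= expiry:
            del self._entries[key]  # expired: the caller must ask upstream again
            return None
        return answer
```

Setting `max_ttl=0` turns the object into a pass-through, which is the "non-caching" mode being argued for; the default leaves the publisher in control.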
dnslookupd keeps track of up/down DNS servers via some health check mechanism, and switches between them appropriately.
I tend to call heartbeats/keepalives "make deads". They often do the opposite. Why invent a whole new health check protocol when you can simply send DNS queries and use the existing strategies for preferring the nearest/fastest servers? These kinds of selection/preference protocols are part of any decent DNS implementation. There is no need to re-invent the wheel.
Paul
On Sat, Apr 12, 2014 at 12:06:23PM -0400, Paul Wouters wrote:
On Sat, 12 Apr 2014, Chuck Anderson wrote:
Okay, so here is where you and I differ then. We need a solution to run everywhere, on every system, in every use case.
Sounds like wanting ponies? Obviously I fully agree with a solution that works everywhere, all the time, for everyone, however they want it :)
The local DNS daemon (note that I didn't say "cache" this time) should be a part of the Base OS like init/systemd is. It should be small, unobtrusive, and do very little, namely the one thing we need: handle failover between multiple DNS servers. I would use the term "DNS proxy" but that term is too overloaded with other connotations and preconceived ideas.
Handling failover requires keeping state of previous queries and outstanding requests to determine which servers are bad or not. Mind you, unbound allows you to set a max TTL on any record received using cache-max-ttl=0, so you can very easily implement this idea. I think it is a bad idea, because your solution violates your own principle above: it interferes with my use case of optimising DNS caches, reducing unnecessary latency, and doing things like pre-fetching of low TTL records.
Of course there would be /some/ state kept. It just wouldn't cache the data, it would only use the state of recent queries and response times to determine if that resolver was dead and start sending those queries to another resolver. It would basically do exactly what glibc's stub resolver does now, but ONCE for the entire system rather than having each process do that independently. I would want this daemon to be as lightweight as possible to minimize any interference with optimising DNS caches, latency, etc. and so that it could be used everywhere, just like systemd is used on all Fedora systems and some form of "init" is used on all Linux systems.
Another way to think of this is to separate out the built-in logic in unbound/BIND/dnsmasq/etc. that determines when an authoritative server is dead and apply it to all queries that are made by glibc's stub resolver. Or separate out the logic that glibc uses to determine when a nameserver in /etc/resolv.conf is dead and make that a system-wide daemon.
In DNS, the publisher of data tells you how long the data should be valid for. If they want the record not to be cached at all, they can set the TTL to 0. Why should we deploy a daemon that does not provide the very useful feature of caching in general (especially when doing DNSSEC validation) when people who wish to not get cached already have a means out, publish records with TTL=0? If you want to be Akamai, you can!
Because things get messy once you start caching on the end-user system. Sure, you can optionally have that messiness (and I'd argue that for Fedora Workstation that would be a sane default) but for Fedora Server I think it is too heavyweight of a solution to run everywhere, and you agreed that running this in VMs is probably not desired.
If the lightweight dnslookupd process is configured to forward the request to a local unbound+dnssec-triggerd, then everything from that point will work in the same way it does today with local caching, TTL handling, DNSSEC, etc. But that should be /optional/. I'm arguing that dnslookupd should be on by default everywhere.
dnslookupd keeps track of up/down DNS servers via some health check mechanism, and switches between them appropriately.
I tend to call heartbeats/keepalives "make deads". They often do the opposite. Why invent a whole new health check protocol when you can simply send DNS queries and use the existing strategies for preferring the nearest/fastest servers? These kinds of selection/preference protocols are part of any decent DNS implementation. There is no need to re-invent the wheel.
It doesn't need to do active heartbeats--it could passively watch queries/responses that it is forwarding to the resolver and decide based on that if a server is dead and stop querying it until the next one fails, etc. just like glibc does today.
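A minimal sketch of that passive approach under stated assumptions. A server is declared dead after a few consecutive timeouts on real forwarded queries and is retried after a hold-down period. The threshold, the hold-down timer and all names are hypothetical choices. In particular, the timed retry is a swapped-in policy, not something from the thread, which only says to stop querying a dead server until the next one fails.

```python
import time


class PassiveHealth:
    """Judges upstream health only from the outcome of queries that were
    forwarded anyway; it never sends probes of its own."""

    def __init__(self, max_failures: int = 3, holddown_s: float = 30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.holddown_s = holddown_s
        self.clock = clock
        self._consecutive_failures = {}  # server -> count
        self._down_until = {}            # server -> time it may be retried

    def record_timeout(self, server: str) -> None:
        count = self._consecutive_failures.get(server, 0) + 1
        self._consecutive_failures[server] = count
        if count >= self.max_failures:
            self._down_until[server] = self.clock() + self.holddown_s

    def record_response(self, server: str) -> None:
        # Any answer (even NXDOMAIN) proves the server is alive.
        self._consecutive_failures[server] = 0
        self._down_until.pop(server, None)

    def is_up(self, server: str) -> bool:
        # After the hold-down expires the server gets another chance; a single
        # further timeout puts it straight back into hold-down.
        until = self._down_until.get(server)
        return until is None or self.clock() >= until
```

The forwarding loop would call `record_timeout()` or `record_response()` for each forwarded query and skip any server for which `is_up()` is false.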
For the use cases you desire with full caching and DNSSEC, dnslookupd shouldn't get in the way. All applications/glibc would query 127.0.0.1, which would immediately forward all those requests to the local unbound+dnssec-triggerd setup. Dnslookupd would only take action if unbound died for some reason (and if there was an alternate DNS resolver to switch to).
On Sat, 2014-04-12 at 13:04 -0400, Chuck Anderson wrote:
On Sat, Apr 12, 2014 at 12:06:23PM -0400, Paul Wouters wrote:
On Sat, 12 Apr 2014, Chuck Anderson wrote:
Okay, so here is where you and I differ then. We need a solution to run everywhere, on every system, in every use case.
Sounds like wanting ponies? Obviously I fully agree with a solution that works everywhere, all the time, for everyone, however they want it :)
The local DNS daemon (note that I didn't say "cache" this time) should be a part of the Base OS like init/systemd is. It should be small, unobtrusive, and do very little, namely the one thing we need: handle failover between multiple DNS servers. I would use the term "DNS proxy" but that term is too overloaded with other connotations and preconceived ideas.
Handling failover requires keeping state of previous queries and outstanding requests to determine which servers are bad or not. Mind you, unbound allows you to set a max TTL on any record received using cache-max-ttl=0, so you can very easily implement this idea. I think it is a bad idea, because your solution violates your own principle above: it interferes with my use case of optimising DNS caches, reducing unnecessary latency, and doing things like pre-fetching of low TTL records.
Of course there would be /some/ state kept. It just wouldn't cache the data, it would only use the state of recent queries and response times to determine if that resolver was dead and start sending those queries to another resolver. It would basically do exactly what glibc's stub resolver does now, but ONCE for the entire system rather than having each process do that independently. I would want this daemon to be as lightweight as possible to minimize any interference with optimising DNS caches, latency, etc. and so that it could be used everywhere, just like systemd is used on all Fedora systems and some form of "init" is used on all Linux systems.
Another way to think of this is to separate out the built-in logic in unbound/BIND/dnsmasq/etc. that determines when an authoritative server is dead and apply it to all queries that are made by glibc's stub resolver. Or separate out the logic that glibc uses to determine when a nameserver in /etc/resolv.conf is dead and make that a system-wide daemon.
You can do this today, just write a nsswitch module to handle the host database and connect to it via a pipe from all processes.
In DNS, the publisher of data tells you how long the data should be valid for. If they want the record not to be cached at all, they can set the TTL to 0. Why should we deploy a daemon that does not provide the very useful feature of caching in general (especially when doing DNSSEC validation) when people who wish to not get cached already have a means out, publish records with TTL=0? If you want to be Akamai, you can!
Because things get messy once you start caching on the end-user system.
Citation, please? What kind of messy? If you properly handle TTLs, what gets messed up, *especially* if unbound is configured to automatically flush caches when you change networks?
Sure, you can optionally have that messiness (and I'd argue that for Fedora Workstation that would be a sane default) but for Fedora Server I think it is too heavyweight of a solution to run everywhere, and you agreed that running this in VMs is probably not desired.
If the lightweight dnslookupd process is configured to forward the request to a local unbound+dnssec-triggerd, then everything from that point will work in the same way it does today with local caching, TTL handling, DNSSEC, etc. But that should be /optional/. I'm arguing that dnslookupd should be on by default everywhere.
Can you substantiate what counts as lightweight for you? I have unbound running on my machine and it is basically unnoticeable. Its resident memory is 15MiB, with data and all, right around the same size as other similar daemons like polkitd, systemd-journald, dhclient and dbus-daemon: all stuff you already run on your servers, and I have never heard anybody call them *heavy* weight.
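One way to put numbers behind "lightweight" on Linux is to read the resident set size from /proc. This sketch parses the `VmRSS:` line of /proc/<pid>/status, which the kernel reports in kB. Finding the PID, for example with `pidof unbound`, is left to the reader, and the helper names are hypothetical.

```python
import os
from typing import Optional


def vm_rss_kib(status_text: str) -> Optional[int]:
    """Extract VmRSS (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t   15360 kB"
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def rss_of_pid(pid: int) -> Optional[int]:
    with open(f"/proc/{pid}/status") as status:
        return vm_rss_kib(status.read())


if __name__ == "__main__":
    if os.path.exists("/proc/self/status"):
        print(f"this Python process: {rss_of_pid(os.getpid())} kB resident")
```

Running the same helper against unbound, polkitd, systemd-journald, dhclient and dbus-daemon gives a like-for-like comparison on a given machine.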
dnslookupd keeps track of up/down DNS servers via some health check mechanism, and switches between them appropriately.
I tend to call heartbeats/keepalives "make deads". They often do the opposite. Why invent a whole new health check protocol when you can simply send DNS queries and use the existing strategies for preferring the nearest/fastest servers? These kinds of selection/preference protocols are part of any decent DNS implementation. There is no need to re-invent the wheel.
It doesn't need to do active heartbeats--it could passively watch queries/responses that it is forwarding to the resolver and decide based on that if a server is dead and stop querying it until the next one fails, etc. just like glibc does today.
Are you volunteering to write this daemon? Because if you are not, then I think we are wasting time.
For the use cases you desire with full caching and DNSSEC, dnslookupd shouldn't get in the way. All applications/glibc would query 127.0.0.1, which would immediately forward all those requests to the local unbound+dnssec-triggerd setup. Dnslookupd would only take action if unbound died for some reason (and if there was an alternate DNS resolver to switch to).
I still fail to see the point of writing yet another DNS caching (but non-caching) daemon when we already have unbound, and, as Paul said, if you really want to, you can tell it to override TTLs to 0, effectively causing it not to cache data.
I see no point in writing something new in this area.
Simo.
On Fri, 2014-04-11 at 15:22 -0500, Bruno Wolff III wrote:
On Fri, Apr 11, 2014 at 14:21:30 -0500, Dan Williams dcbw@redhat.com wrote:
NM in F20+ already has a "dns=none" option that prevents NM from touching resolv.conf, but obviously if NM isn't touching it, the DNS information that NM gets from upstream or your local configuration needs to get to the local caching nameserver somehow. Which is what the existing NM DNS plugins are for, like the dnsmasq one.
If you are running a caching resolver, you don't need the DNS information from DHCP at all (except for the hotspot issue). For example, dnscache can be used for this. (It doesn't do DNSSEC though, so it wouldn't provide what is wanted for the proposal.)
Not true, in many networks you want it, for example in corporate networks. You really want to be able to resolve the local resources and they are only resolvable if you consult the local DNS as provided to you by DHCP.
For hotspots in public places that doesn't matter as much of course.
Simo.
On Saturday, 12 April 2014 7:38 AM, Simo Sorce wrote: Not true, in many networks you want it, for example in corporate networks. You really want to be able to resolve the local resources and they are only resolvable if you consult the local DNS as provided to you by DHCP.
True. The local resolver can be configured to resolve internal domains by pointing it at the dynamic name servers. Also, one can set 'DNS1=127.0.0.1' in the /etc/sysconfig network script; that way the dynamic name servers are listed as DNS2, DNS3, etc.
For this very reason, the dynamic name server entries need to be treated as "...transitory name servers to be used by the trusted local resolver".
--- Regards -Prasad http://feedmug.com
On Sat, Apr 12, 2014 at 01:22:32PM +0800, P J P wrote:
On Saturday, 12 April 2014 7:38 AM, Simo Sorce wrote: Not true, in many networks you want it, for example in corporate networks. You really want to be able to resolve the local resources and they are only resolvable if you consult the local DNS as provided to you by DHCP.
True. The local resolver can be configured to resolve internal domains by pointing it at the dynamic name servers. Also, one can set 'DNS1=127.0.0.1' in the /etc/sysconfig network script; that way the dynamic name servers are listed as DNS2, DNS3, etc.
For this very reason, the dynamic name server entries need to be treated as "...transitory name servers to be used by the trusted local resolver".
You cannot rely on DNS2 and DNS3 to be queried UNLESS DNS1=127.0.0.1 fails to respond. This might be a way to mitigate failure of the local caching resolver process, but it is not a way to ensure the ability to resolve internal names from the corporate nameserver. The way to ensure the latter is to configure the local caching resolver to forward to the DHCP-provided nameservers rather than becoming a full iterative resolver.
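This point can be shown with a simplified model of a sequential stub resolver. The model assumes, based on how glibc's res_send() is commonly described, that only a timeout or a SERVFAIL/NOTIMP/REFUSED reply sends the query on to the next nameserver. An NXDOMAIN from 127.0.0.1 is returned to the application as final. Retries (`attempts`), `rotate` and TCP fallback are ignored, and `ask` is a hypothetical stand-in for the network.

```python
from typing import Callable, Optional, Sequence, Tuple

# Replies that make this model try the next server; everything else is final.
TRY_NEXT = {"TIMEOUT", "SERVFAIL", "NOTIMP", "REFUSED"}


def stub_lookup(name: str, servers: Sequence[str],
                ask: Callable[[str, str], str]) -> Tuple[Optional[str], str]:
    """Query servers in resolv.conf order; return (answering server, rcode)."""
    last = "TIMEOUT"
    for server in servers:
        last = ask(server, name)
        if last not in TRY_NEXT:
            return server, last  # NOERROR or NXDOMAIN: stop here
    return None, last  # nobody gave a usable answer
```

With DNS1=127.0.0.1, a corporate server in the second slot is consulted only when the local resolver is down, never when it merely lacks the internal zone.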
On Sat, Apr 12, 2014 at 09:38:03 -0400, Chuck Anderson cra@WPI.EDU wrote:
You cannot rely on DNS2 and DNS3 to be queried UNLESS DNS1=127.0.0.1 fails to respond. This might be a way to mitigate failure of the local caching resolver process, but it is not a way to ensure the ability to resolve internal names from the corporate nameserver. The way to ensure the latter is to configure the local caching resolver to forward to the DHCP-provided nameservers rather than becoming a full iterative resolver.
If they do split-horizon DNS using a zone that can be found normally, things will still work. If they use zones that can't be found that way, then you can make an exception for that zone but still use iteration for everything else.
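The "exception for that zone" can be expressed as a longest-suffix match on domain labels. Names under a configured internal zone go to the DHCP/corporate servers, and everything else is resolved iteratively. unbound's `forward-zone` configuration serves this purpose in practice. The function below is only a hypothetical illustration of the matching rule, not unbound's code.

```python
from typing import Dict, List, Union


def pick_upstream(qname: str, forward_zones: Dict[str, List[str]]) -> Union[List[str], str]:
    """Return the forwarders for the most specific matching zone, or "iterate".
    Zone keys are expected in lowercase without a trailing dot."""
    labels = qname.rstrip(".").lower().split(".")
    # Walk from the most specific suffix to the least specific one, whole
    # labels only: a.corp.example -> corp.example -> example
    for i in range(len(labels)):
        zone = ".".join(labels[i:])
        if zone in forward_zones:
            return forward_zones[zone]
    return "iterate"  # no exception configured: resolve from the root
```

Matching whole labels rather than raw string suffixes keeps `notcorp.example` from being sent to the servers for `corp.example`.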
On Sat, Apr 12, 2014 at 02:33:59AM +0800, P J P wrote:
Hello,
On Thursday, 10 April 2014 11:39 PM, P J P wrote: I plan to file a feature/change request for this one. I got caught up with other work this past week so could not do it. Will start with it right away.
Please see -> https://fedoraproject.org/wiki/Changes/Default_Local_DNS_Resolver
It's a System Wide Change Proposal request up for review.
I have set the target release as F22, because the proposal deadline for F21 was 08 Apr 2014 [1]. Besides, this change would require significant work on the related packages like NetworkManager etc. So F22 seems safer.
If you spot any discrepancies or have additional input or links to relevant documents, please feel free to update the wiki page, or let me know and I'll add it there.
Thank you!
I think there needs to be more emphasis on the /other/ benefit, the whole reason I brought this up this time:
While DNSSEC support has historically been a driving factor for implementing this, there is an even more fundamental need due to the poor performance of the system in case the first listed nameserver in /etc/resolv.conf fails for some reason. It is shameful that Linux systems and applications in general still, after 20+ years, can't perform adequately after a primary DNS server failure. The stub resolver in glibc which uses /etc/resolv.conf can decide that the first listed nameserver entry is down, but this decision has to be made over and over in every single process on the system that is doing DNS resolution, resulting in repeated long application hangs/delays. We need an independent, system-wide DNS cache, and always point resolv.conf to 127.0.0.1 to solve this fundamental design problem with how name resolution works on a Linux system. Windows has had a default system-wide DNS cache for over a decade. It is about time that Linux catches up.
I can have a go at adding some text to the wiki.
On Saturday, 12 April 2014 3:55 AM, Chuck Anderson wrote: I think there needs to be more emphasis on the /other/ benefit, the whole reason I brought this up this time:
Sure; I tried to cover it in the detailed description as
=== ...Apart from trust, these name servers are often known to be flaky and unreliable. Which only adds to the overall bad and at times even frustrating user experience. In such a situation, having a trusted local DNS resolver not only makes sense but is in fact badly needed. It has become a need of the hour. (See: [1], [2], [3]) ===
Also, this thread is linked there at [3]. --- Regards -Prasad http://feedmug.com
On Sat, 2014-04-12 at 02:33 +0800, P J P wrote:
Hello,
On Thursday, 10 April 2014 11:39 PM, P J P wrote: I plan to file a feature/change request for this one. I got caught up with other work this past week so could not do it. Will start with it right away.
Please see -> https://fedoraproject.org/wiki/Changes/Default_Local_DNS_Resolver
It's a System Wide Change Proposal request up for review.
I have set the target release as F22, because the proposal deadline for F21 was 08 Apr 2014 [1]. Besides, this change would require significant work on the related packages like NetworkManager etc. So F22 seems safer.
If you spot any discrepancies or have additional input or links to relevant documents, please feel free to update the wiki page, or let me know and I'll add it there.
I agree with the goal of adding DNSSEC (despite its flaws). However, a caching DNS server can create many headaches unless a number of considerations are addressed.
First, it should be easy for GUI and CLI users to clear / invalidate the cache. This isn't possible on Windows, for example, which is why people are often asked to reboot their computers at the first sign of an issue or migration. Additionally, every time an interface goes up or down, or the default route changes, the cache should be cleared. Consider a user of a corporate network that serves both an internal zone and an external zone: the user may enter or leave the network, and cached records would continue to be served, causing issues.
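A minimal sketch of those two requirements. It assumes the cache is told about link and route changes, for instance by NetworkManager; the class and hook names are hypothetical. It provides a manual `flush()` for GUI/CLI users plus an automatic flush whenever the set of up interfaces or the default gateway changes. TTL handling is omitted to keep the example short.

```python
from typing import Dict, Iterable, Optional, Tuple


class NetworkAwareCache:
    def __init__(self) -> None:
        self._network: Optional[Tuple[frozenset, Optional[str]]] = None
        self._entries: Dict[Tuple[str, str], str] = {}

    def flush(self) -> None:
        """What a 'flush DNS cache' button or command would call."""
        self._entries.clear()

    def on_network_event(self, interfaces_up: Iterable[str],
                         default_gateway: Optional[str]) -> None:
        network = (frozenset(interfaces_up), default_gateway)
        if network != self._network:
            # Joining or leaving the corporate LAN, or bringing a VPN up or
            # down, may change what internal names resolve to: drop everything.
            self.flush()
            self._network = network

    def put(self, name: str, rrtype: str, answer: str) -> None:
        self._entries[(name.lower(), rrtype)] = answer

    def get(self, name: str, rrtype: str) -> Optional[str]:
        return self._entries.get((name.lower(), rrtype))
```

Repeated notifications for an unchanged network leave the cache alone, so routine events do not throw away useful entries.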
Second, it can create issues with "dodgy" hotspots, as mentioned elsewhere. They serve a fake DNS record for every host that resolves to the hotspot. When the client authenticates, they begin to serve the real records. If these records are cached, the hotspot suddenly becomes unusable (especially if they don't set a TTL of, say, 1). This would frustrate users who didn't realise they needed to flush their cache (see the first point).
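One heuristic a local resolver could use here is an assumption, not something proposed in the thread: look up a couple of random, almost certainly nonexistent names. If they all "resolve", the network is rewriting NXDOMAIN, as captive portals and some ISP resolvers do. Answers seen on that network are then suspect until the check passes. The `resolve` callable and the default probe parent domain are hypothetical parameters.

```python
import secrets
from typing import Callable, List


def looks_like_nxdomain_rewriting(resolve: Callable[[str], List[str]],
                                  parent: str = "example.com",
                                  probes: int = 2) -> bool:
    """resolve(name) returns the A records for name, or [] for NXDOMAIN."""
    for _ in range(probes):
        # 16 random hex characters: practically guaranteed not to exist.
        name = f"{secrets.token_hex(8)}.{parent}"
        if not resolve(name):
            return False  # an honest NXDOMAIN: no rewriting seen
    return True  # every random name "resolved"
```

A resolver that sees `True` could refrain from caching, then flush once the check starts passing after the user logs in to the portal.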
Finally, I don't think it should be the default in Fedora's Server product. We often already have a caching BIND server on server networks.
Sincerely,
On Sat, Apr 12, 2014 at 12:38:41PM +0930, William Brown wrote:
I agree with the goal of adding DNSSEC (despite its flaws). However, a caching DNS server can create many headaches unless a number of considerations are addressed.
First, it should be easy for GUI and CLI users to clear / invalidate the cache. This isn't possible on Windows, for example, which is why people are often asked to reboot their computers at the first sign of an issue or migration. Additionally, every time an interface goes up or down, or the default route changes, the cache should be cleared. Consider a user of a corporate network that serves both an internal zone and an external zone: the user may enter or leave the network, and cached records would continue to be served, causing issues.
Second, it can create issues with "dodgy" hotspots, as mentioned elsewhere. They serve a fake DNS record for every host that resolves to the hotspot. When the client authenticates, they begin to serve the real records. If these records are cached, the hotspot suddenly becomes unusable (especially if they don't set a TTL of, say, 1). This would frustrate users who didn't realise they needed to flush their cache (see the first point).
Finally, I don't think it should be the default in Fedora's Server product. We often already have a caching BIND server on server networks.
I agree on all points above except this one. Servers ESPECIALLY need to have a local DNS cache so they can continue working correctly when the primary nameserver goes down. In fact, the server case is even simpler--it can simply forward all requests to the first corporate/datacenter nameserver until it fails, and then fail over to the 2nd or 3rd DNS server in that case. It may not even have to support full DNSSEC validation because you may trust the datacenter's nameserver to do that for you. It certainly wouldn't have to deal with all the corner cases that clients have to deal with that you mention above such as flushing the cache on network link or route changes, VPN connection/disconnection, captive portals, etc.
In fact, this is such an important use case that I'm tempted to create a separate Fedora Change for the Server Product with just this very basic limited functionality because we can do this very simply today without getting bogged down in the quagmire of DNSSEC, NTP bootstrapping, and all the client issues above.
One thing I would like to note is that on machines which don't have a hardware clock, I had problems starting BIND and unbound, because the date was reset to 1970 on each boot, so the root DNS key was not yet valid and there were no working DNS resolvers to update the time via NTP. I had to hardcode some NTP servers' IP addresses to perform the NTP queries at boot time.
This was using the OpenWrt distro on a MIPS router; I don't know whether we could face this kind of problem on ARM machines. I guess all x86 machines have a hardware clock, don't they?
Regards, Juan.
On Mon, Apr 14, 2014 at 02:07:07PM +0200, Juan Orti Alcaine wrote:
One thing I would like to note is that on machines which don't have a hardware clock, I had problems starting BIND and unbound, because the date was reset to 1970 on each boot, so the root DNS key was not yet valid and there were no working DNS resolvers to update the time via NTP. I had to hardcode some NTP servers' IP addresses to perform the NTP queries at boot time.
This was using the OpenWrt distro on a MIPS router; I don't know whether we could face this kind of problem on ARM machines. I guess all x86 machines have a hardware clock, don't they?
The NTP bootstrapping problem is well known. There is an effort to deal with it here (in the context of dnsmasq DNSSEC on OpenWRT/CeroWRT):
http://comments.gmane.org/gmane.comp.embedded.cerowrt.devel/2244
Search for the word "prototype" to find a description of one implementation.
"The nice thing about this switch to dnsmasq is that it does validation of the chain, just ignoring validity times; which presumably would make it harder to exploit as you'd need an actual valid key, rather than just be able to spoof the packets reply of the non-validated query.."
There are many other ideas in that thread.
On Mon, 14 Apr 2014, Juan Orti Alcaine wrote:
One thing I would like to note is that on machines which don't have a hardware clock, I had problems starting BIND and unbound, because the date was reset to 1970 on each boot, so the root DNS key was not yet valid and there were no working DNS resolvers to update the time via NTP. I had to hardcode some NTP servers' IP addresses to perform the NTP queries at boot time.
This was using the OpenWrt distro on a MIPS router; I don't know whether we could face this kind of problem on ARM machines. I guess all x86 machines have a hardware clock, don't they?
That's a problem we are aware of. tlsdate is one method, but I believe the OpenWrt people now also do some other things, possibly saving the time on shutdown so that you have a reasonable time on startup.
For DNSSEC, we found that you need accuracy within a couple of hours, because some RRSIGs in the path to .org (for pool.ntp.org) were pretty short-lived. But I think adding a few NTP servers by IP address could be good for the standard NTP config as well, provided there are IPs in the pool that can be used for that.
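This sketch shows why a clock stuck at 1970 breaks validation. An RRSIG is only valid between its inception and expiration times (RFC 4034, section 3.1.5). The code compares plain integers for clarity, whereas real validators use the RFC 1982 serial-number arithmetic that the RFC prescribes. The saved-timestamp helper is a hypothetical take on the "save the time on shutdown" idea.

```python
from typing import Optional


def rrsig_time_valid(now: int, inception: int, expiration: int) -> bool:
    """True if 'now' (seconds since the epoch) lies inside the signature window.
    Simplification: plain comparison instead of RFC 1982 serial arithmetic."""
    return inception <= now <= expiration


def boot_clock_guess(rtc_seconds: int, saved_at_shutdown: Optional[int]) -> int:
    """Without a battery-backed RTC the clock starts near 1970, so a timestamp
    saved at the last shutdown is a better lower bound on the real time."""
    return max(rtc_seconds, saved_at_shutdown or 0)
```

A saved timestamp only helps if the device was not powered off for longer than the signature lifetimes on the path. Otherwise the guess still falls before the current signatures' inception, which is why tlsdate and NTP servers configured by IP address also come up in this thread.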
Paul
On Thu, 10 Apr 2014 10:41:54 -0400 Chuck Anderson cra@WPI.EDU wrote:
[...] We need an independent, system-wide DNS cache, and always point resolv.conf to 127.0.0.1 to solve this fundamental design problem with how name resolution works on a Linux system. Windows has had a default system-wide DNS cache for over a decade. It is about time that Linux catches up.
I observe you pointedly ignore the existence of nscd (which does not require any changes to resolv.conf). Why is that?
-- Pete
On Fri, 2014-04-25 at 09:56 -0600, Pete Zaitcev wrote:
On Thu, 10 Apr 2014 10:41:54 -0400 Chuck Anderson cra@WPI.EDU wrote:
[...] We need an independent, system-wide DNS cache, and always point resolv.conf to 127.0.0.1 to solve this fundamental design problem with how name resolution works on a Linux system. Windows has had a default system-wide DNS cache for over a decade. It is about time that Linux catches up.
I observe you pointedly ignore the existence of nscd (which does not require any changes to resolv.conf). Why is that?
nscd is ... bad
Simo.
On 25.4.2014 18:19, Simo Sorce wrote:
On Fri, 2014-04-25 at 09:56 -0600, Pete Zaitcev wrote:
On Thu, 10 Apr 2014 10:41:54 -0400 Chuck Anderson cra@WPI.EDU wrote:
[...] We need an independent, system-wide DNS cache, and always point resolv.conf to 127.0.0.1 to solve this fundamental design problem with how name resolution works on a Linux system. Windows has had a default system-wide DNS cache for over a decade. It is about time that Linux catches up.
I observe you pointedly ignore the existence of nscd (which does not require any changes to resolv.conf). Why is that?
nscd is ... bad
Main goal is to have local DNSSEC-validating resolver.
devel@lists.stg.fedoraproject.org