FEATURES.md 12 KB

Features

Dynamic Server Timeout Calculation

Metrics are stored for every server in time series buckets for both the current time span and prior time span in 1 minute, 15 minute, 1 hour, and 1 day intervals, plus a single since-inception bucket (of the server in the c-ares channel).

These metrics are then used to calculate the average latency for queries on each server, which automatically adjusts to network conditions. This average is then multiplied by 5 to come up with a timeout to use for the query before re-queuing it. If there is not sufficient data yet to calculate a timeout (need at least 3 prior queries), then the default of 2000ms is used (or an administrator-set ARES_OPT_TIMEOUTMS).

The timeout is then adjusted to a minimum bound of 250ms which is the approximate RTT of network traffic half-way around the world, to account for the upstream server needing to recurse to a DNS server far away. It is also bounded on the upper end to 5000ms (or an administrator-set ARES_OPT_MAXTIMEOUTMS).

If a server does not reply within the given calculated timeout, the next time the query is re-queued to the same server, the timeout will approximately double thus leading to adjustments in timeouts automatically when a successful reply is recorded.

In order to calculate the optimal timeout, it is highly recommended to ensure ARES_OPT_QUERY_CACHE is enabled with a non-zero qcache_max_ttl (which it is enabled by default with a 3600s default max ttl). The goal is to record the recursion time as part of query latency as the upstream server will also cache results.

This feature requires the c-ares channel to persist for the lifetime of the application.

Failed Server Isolation

Each server is tracked for failures relating to consecutive connectivity issues or unrecoverable response codes. Servers are sorted in priority order based on this metric. Downed servers will be brought back online either when the current highest priority server has failed, or has been determined to be online when a query is randomly selected to probe a downed server.

By default a downed server won't be retried for 5 seconds, and queries will have a 10% chance of being chosen after this timeframe to test a downed server. When a downed server is selected to be probed, the query will be duplicated and sent to the downed server independent of the original query itself. This means that probing a downed server will always use an intended legitimate query, but not have a negative impact of a delayed response in case that server is still down.

Administrators may customize these settings via ARES_OPT_SERVER_FAILOVER.

Additionally, when using ARES_OPT_ROTATE or a system configuration option of rotate, c-ares will randomly select a server from the list of highest priority servers based on failures. Any servers in any lower priority bracket will be omitted from the random selection.

This feature requires the c-ares channel to persist for the lifetime of the application.

Query Cache

Every successful query response, as well as NXDOMAIN responses containing an SOA record are cached using the TTL returned or the SOA Minimum as appropriate. This timeout is bounded by the ARES_OPT_QUERY_CACHE qcache_max_ttl, which defaults to 1hr.

The query is cached at the lowest possible layer, meaning a call into ares_search_dnsrec() or ares_getaddrinfo() may spawn multiple queries in order to complete its lookup, each individual backend query result will be cached.

Any server list change will automatically invalidate the cache in order to purge any possible stale data. For example, if NXDOMAIN is cached but system configuration has changed due to a VPN connection, the same query might now result in a valid response.

This feature is not expected to cause any issues that wouldn't already be present due to the upstream DNS server having substantially similar caching already. However if desired it can be disabled by setting qcache_max_ttl to 0.

This feature requires the c-ares channel to persist for the lifetime of the application.

DNS 0x20 Query Name Case Randomization

DNS 0x20 is the name of the feature which automatically randomizes the case of the characters in a UDP query as defined in draft-vixie-dnsext-dns0x20-00.

For example, if name resolution is performed for www.example.com, the actual query sent to the upstream name server may be Www.eXaMPlE.cOM.

The reason to randomize case characters is to provide additional entropy in the query to be able to detect off-path cache poisoning attacks for UDP. This is not used for TCP connections which are not known to be vulnerable to such attacks due to their stateful nature.

Much research has been performed by Google on case randomization and in general have found it to be effective and widely supported.

This feature is disabled by default and can be enabled via ARES_FLAG_DNS0x20. There are some instances where servers do not properly facilitate this feature and unlike in a recursive resolver where it may be possible to determine an authoritative server is incapable, its much harder to come to any reliable conclusion as a stub resolver as to where in the path the issue resides. Due to the recent wide deployment of DNS 0x20 in large public DNS servers, it is expected compatibility will improve rapidly where this feature, in time, may be able to be enabled by default.

Another feature which can be used to prevent off-path cache poisoning attacks is DNS Cookies.

DNS Cookies

DNS Cookies are are a method of learned mutual authentication between a server and a client as defined in RFC7873 and RFC9018.

This mutual authentication ensures clients are protected from off-path cache poisoning attacks, and protects servers from being used as DNS amplification attack sources. Many servers will disable query throttling limits when DNS Cookies are in use. It only applies to UDP connections.

Since DNS Cookies are optional and learned dynamically, this is an always-on feature and will automatically adjust based on the upstream server state. The only potential issue is if a server has once supported DNS Cookies then stops supporting them, it must clear a regression timeout of 2 minutes before it can accept responses without cookies. Such a scenario would be exceedingly rare.

Interestingly, the large public recursive DNS servers such as provided by Google, CloudFlare, and OpenDNS do not have this feature enabled. That said, most DNS products like BIND enable DNS Cookies by default.

This feature requires the c-ares channel to persist for the lifetime of the application.

TCP FastOpen (0-RTT)

TCP Fast Open is defined in RFC7413 and enables data to be sent with the TCP SYN packet when establishing the connection, thus rivaling the performance of UDP. A previous connection must have already have been established in order to obtain the client cookie to allow the server to trust the data sent in the first packet and know it was not an off-path attack.

TCP FastOpen can only be used with idempotent requests since in timeout conditions the SYN packet with data may be re-sent which may cause the server to process the packet more than once. Luckily DNS requests are idempotent by nature.

TCP FastOpen is supported on Linux, MacOS, and FreeBSD. Most other systems do not support this feature, or like on Windows require use of completion notifications to use it whereas c-ares relies on readiness notifications.

Supported systems also need to be configured appropriately on both the client and server systems.

Linux TFO

In linux a single sysctl value is used with flags to set the desired fastopen behavior.

It is recommended to make any changes permanent by creating a file in /etc/sysctl.d/ with the appropriate key and value. Legacy Linux systems might need to update /etc/sysctl.conf directly. After modifying the configuration, it can be loaded via sysctl -p.

net.ipv4.tcp_fastopen:

  • 1 = client only (typically default)
  • 2 = server only
  • 3 = client and server

MacOS TFO

In MacOS, TCP FastOpen is enabled by default for clients and servers. You can verify via the net.inet.tcp.fastopen sysctl.

If any change is needed, you should make it persistent as per this guidance: Persistent Sysctl Settings

net.inet.tcp.fastopen

  • 1 = client only
  • 2 = server only
  • 3 = client and server (typically default)

FreeBSD TFO

In FreeBSD, server mode TCP FastOpen is typically enabled by default but client mode is disabled. It is recommended to edit /etc/sysctl.conf and place in the values you wish to persist to enable or disable TCP Fast Open. Once the file is modified, it can be loaded via sysctl -f /etc/sysctl.conf.

  • net.inet.tcp.fastopen.server_enable (boolean) - enable/disable server
  • net.inet.tcp.fastopen.client_enable (boolean) - enable/disable client

Event Thread

Historic c-ares integrations required integrators to have their own event loop which would be required to notify c-ares of read and write events for each socket. It was also required to notify c-ares at the appropriate timeout if no events had occurred. This could be difficult to do correctly and could lead to stalls or other issues.

The Event Thread is currently supported on all systems except DOS which does not natively support threading (however it could in theory be possible to enable with something like FSUpthreads).

c-ares is built by default with threading support enabled, however it may disabled at compile time. The event thread must also be specifically enabled via ARES_OPT_EVENT_THREAD.

Using the Event Thread feature also facilitates some other features like System Configuration Change Monitoring, and automatically enables the ares_set_pending_write_cb() feature to optimize multi-query writing.

System Configuration Change Monitoring

The system configuration is automatically monitored for changes to the network and DNS settings. When a change is detected a thread is spawned to read the new configuration then apply it to the current c-ares configuration.

This feature requires the Event Thread to be enabled via ARES_OPT_EVENT_THREAD. Otherwise it is up to the integrator to do their own configuration monitoring and call ares_reinit() to reload the system configuration.

It is supported on Windows, MacOS, iOS and any system configuration that uses /etc/resolv.conf and similar files such as Linux and FreeBSD. Specifically excluded are DOS and Android due to missing mechanisms to support such a feature. On linux file monitoring will result in immediate change detection, however on other unix-like systems a polling mechanism is used that checks every 30s for changes.

This feature requires the c-ares channel to persist for the lifetime of the application.