Oracle Dyn outage

A number of large internet services use Oracle's Dyn service to host their domains' DNS.  This service suffered an outage overnight (27 May 2022).  The outage was caused by a DNS misconfiguration and, although the underlying problem has now been resolved, it will take some time for the broken DNS records to be cleared from DNS caches around the world.

Oracle have said that "On-going impact has been improving as DNS cache clears via normal TTL propagation or DNS flushing. As of 04:30 UTC, DNS resolution response was largely recovered globally and will continue to improve as remaining TTL's expire and refresh cache."  However, it is not clear to us why they believe that DNS resolution had recovered by 04:30 UTC.  The affected DNS records had a fairly long time-to-live (TTL), which we think was probably 1 day, so we expect them to be cleared gradually over the following 24 hours.  If you are struggling to access an important site through an Opendium system, please get in touch so that we can arrange to manually flush your system's DNS cache.

Analysis

Oracle's report does not give a technical overview of the problem, so we thought we'd provide a little analysis for those interested.

In order to connect to a service on the internet, the domain name (we'll use amazon.com in this example) needs to be converted into an IP address.  The process for doing this is known as recursive DNS resolution.  This is a slightly involved process, so we'll skip a few steps.  First of all, to find the IP address of amazon.com, we look to see which DNS servers are responsible for that domain, and in this case we find 6 of them:


amazon.com.    IN    NS    ns1.p31.dynect.net.
amazon.com.    IN    NS    ns2.p31.dynect.net.
amazon.com.    IN    NS    ns3.p31.dynect.net.
amazon.com.    IN    NS    ns4.p31.dynect.net.
amazon.com.    IN    NS    pdns1.ultradns.net.
amazon.com.    IN    NS    pdns6.ultradns.co.uk.

The *.dynect.net servers are part of Oracle's Dyn service, and Amazon also uses UltraDNS to provide an independent backup.  While the Dyn service is broken, users should still be able to get to amazon.com using UltraDNS, although connectivity may be a bit slow and flaky.  Not all internet services have an independent backup though.
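The redundancy point above can be sketched in a few lines of Python.  This is only an illustration: the provider_of heuristic (matching the labels "dynect" and "ultradns" in the server name) and the function names are our own assumptions, not part of any DNS tooling.

```python
from typing import List

# Nameserver names copied from the amazon.com NS records listed above.
AMAZON_NS: List[str] = [
    "ns1.p31.dynect.net.",
    "ns2.p31.dynect.net.",
    "ns3.p31.dynect.net.",
    "ns4.p31.dynect.net.",
    "pdns1.ultradns.net.",
    "pdns6.ultradns.co.uk.",
]


def provider_of(nameserver: str) -> str:
    """Hypothetical heuristic: guess the DNS provider from the host name labels."""
    labels = nameserver.rstrip(".").split(".")
    if "dynect" in labels:
        return "Dyn"
    if "ultradns" in labels:
        return "UltraDNS"
    return "unknown"


def survives_outage(nameservers: List[str], failed_provider: str) -> bool:
    """True if at least one nameserver is not run by the failed provider."""
    return any(provider_of(ns) != failed_provider for ns in nameservers)


print(survives_outage(AMAZON_NS, "Dyn"))      # True: the UltraDNS servers remain
print(survives_outage(AMAZON_NS[:4], "Dyn"))  # False: a Dyn-only domain has no fallback
```

A domain whose NS records all point at one provider fails the check, which is the situation that services without an independent backup were in.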

Now we know the names of the servers responsible for the amazon.com domain, we need to get their IP addresses, and this is done in the same way:


dynect.net.    IN      NS      adc08dnsext01.us.oracle.com.
dynect.net.    IN      NS      adc08dnsext02.us.oracle.com.
dynect.net.    IN      NS      cgydc01dnsext01.us.oracle.com.
dynect.net.    IN      NS      iad-dns-master.oraclecorp.com.
dynect.net.    IN      NS      llg07dnsext01.llg.oracle.com.
dynect.net.    IN      NS      llg07dnsext02.llg.oracle.com.
dynect.net.    IN      NS      rmdc02dnsext01.us.oracle.com.
dynect.net.    IN      NS      rmdc02dnsext02.us.oracle.com.
dynect.net.    IN      NS      sydc01dns03.au.oracle.com.
dynect.net.    IN      NS      trdc01dnsext01.us.oracle.com.
dynect.net.    IN      NS      tvp02dnsext02.tvp.oracle.com.

We can see that dynect.net's DNS servers are largely hosted by Oracle, so to find their IP addresses we first look up the DNS servers responsible for oracle.com and get:


oracle.com.    IN      NS      ns3.p04.dynect.net.
oracle.com.    IN      NS      orcldns1.ultradns.com.
oracle.com.    IN      NS      ns2.p04.dynect.net.
oracle.com.    IN      NS      ns4.p04.dynect.net.
oracle.com.    IN      NS      ns1.p04.dynect.net.
oracle.com.    IN      NS      orcldns2.ultradns.net.

And here you may spot a problem - most of the DNS servers responsible for oracle.com are under the *.dynect.net domain, and we can't find their IP addresses because we haven't yet found the IP addresses for the (oracle.com) DNS servers which are responsible for the dynect.net domain.  At this point the whole thing fails - our attempt to look up amazon.com ends in failure.  The problem was caused by Oracle setting up this loop.
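The loop described above can be found mechanically by treating the delegations as a dependency graph.  The following is a minimal sketch using a subset of the records listed earlier; the names DELEGATIONS, zone_of and find_loop are our own.  It deliberately ignores nameservers in zones not listed here (such as ultradns.net), so it only detects the circular dependency and does not model whether a real resolver could route around it.

```python
from typing import Dict, List, Optional

# Zone -> nameserver host names (a subset of the broken NS records listed above).
DELEGATIONS: Dict[str, List[str]] = {
    "amazon.com.": ["ns1.p31.dynect.net.", "pdns1.ultradns.net."],
    "dynect.net.": ["adc08dnsext01.us.oracle.com.", "llg07dnsext01.llg.oracle.com."],
    "oracle.com.": ["ns1.p04.dynect.net.", "orcldns1.ultradns.com."],
}


def zone_of(host: str) -> Optional[str]:
    """Return the listed zone containing host, or None if it is outside our data."""
    for zone in DELEGATIONS:
        if host == zone or host.endswith("." + zone):
            return zone
    return None


def find_loop(start: str) -> Optional[List[str]]:
    """Depth-first search for a chain of zones that leads back to itself."""

    def visit(zone: str, path: List[str]) -> Optional[List[str]]:
        if zone in path:
            # We have been here before: report the circular part of the path.
            return path[path.index(zone):] + [zone]
        for host in DELEGATIONS.get(zone, []):
            next_zone = zone_of(host)
            if next_zone is not None:
                loop = visit(next_zone, path + [zone])
                if loop:
                    return loop
        return None

    return visit(start, [])


print(find_loop("amazon.com."))  # ['dynect.net.', 'oracle.com.', 'dynect.net.']
```

Replacing the dynect.net entry with servers under dynamicnetworkservices.net makes find_loop return None, because those servers are outside the zones that depend on dynect.net.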

The DNS servers responsible for dynect.net have now been changed to remove the loop:


dynect.net.    IN    NS    ns1.dynamicnetworkservices.net.
dynect.net.    IN    NS    ns2.dynamicnetworkservices.net.
dynect.net.    IN    NS    ns3.dynamicnetworkservices.net.
dynect.net.    IN    NS    ns4.dynamicnetworkservices.net.
dynect.net.    IN    NS    ns5.dynamicnetworkservices.net.
dynect.net.    IN    NS    ns6.dynamicnetworkservices.net.

However, recursive DNS lookups are obviously quite a lot of work, so DNS servers cache the records for some time.  The amount of time the cached records remain valid for is known as the time-to-live (TTL).  Unfortunately, the broken records had quite a long TTL (probably 1 day), so the fixed records won't be used until the cached records in all of the DNS servers that you are using have expired.
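To illustrate why the fix takes time to reach everyone, here is a minimal sketch of a TTL-based cache.  It is not how any particular DNS server is implemented: the TtlCache class is hypothetical, the clock is injectable so that time can be simulated, and the 86400 second TTL in the usage below is our guess of 1 day rather than a confirmed value.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TtlCache:
    """Toy record cache: entries are served until their TTL runs out."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def put(self, name: str, records: List[str], ttl_seconds: float) -> None:
        # Store the records together with the time at which they expire.
        self._entries[name] = (self._clock() + ttl_seconds, records)

    def get(self, name: str) -> Optional[List[str]]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        expires_at, records = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the next lookup fetches fresh records.
            del self._entries[name]
            return None
        return records

    def flush(self) -> None:
        """Manual flush: forget everything, regardless of remaining TTL."""
        self._entries.clear()


# Simulated time, so the example does not need to wait a day.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("dynect.net.", ["adc08dnsext01.us.oracle.com."], ttl_seconds=86400)

now[0] = 12 * 3600            # 12 hours later the broken record is still served
print(cache.get("dynect.net."))
now[0] = 86400                # once the TTL has elapsed the entry is gone
print(cache.get("dynect.net."))
```

Calling flush() has the same effect as waiting for expiry, which is why manually flushing a DNS cache gets the fixed records into use straight away.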

It might appear that the recursive lookups would go on forever, needing to look up a new domain each time, but in reality you eventually find a "glue record" which gives you the IP address of a DNS server.  In fact, it is good practice for DNS providers to ensure that suitable glue records are registered in order to reduce the number of lookups needed.  All of the dynamicnetworkservices.net DNS servers have glue records, so the full process, using the fixed records, would be:

Look to see which DNS servers are responsible for amazon.com:


amazon.com.    IN    NS    ns1.p31.dynect.net.
amazon.com.    IN    NS    ns2.p31.dynect.net.
amazon.com.    IN    NS    ns3.p31.dynect.net.
amazon.com.    IN    NS    ns4.p31.dynect.net.
amazon.com.    IN    NS    pdns1.ultradns.net.
amazon.com.    IN    NS    pdns6.ultradns.co.uk.

Look to see which DNS servers are responsible for dynect.net:


dynect.net.    IN    NS    ns1.dynamicnetworkservices.net.
dynect.net.    IN    NS    ns2.dynamicnetworkservices.net.
dynect.net.    IN    NS    ns3.dynamicnetworkservices.net.
dynect.net.    IN    NS    ns4.dynamicnetworkservices.net.
dynect.net.    IN    NS    ns5.dynamicnetworkservices.net.
dynect.net.    IN    NS    ns6.dynamicnetworkservices.net.

Look to see which DNS servers are responsible for dynamicnetworkservices.net:


dynamicnetworkservices.net.    IN    NS    ns1.dynamicnetworkservices.net.
dynamicnetworkservices.net.    IN    NS    ns6.dynamicnetworkservices.net.
dynamicnetworkservices.net.    IN    NS    ns5.dynamicnetworkservices.net.
dynamicnetworkservices.net.    IN    NS    ns3.dynamicnetworkservices.net.
dynamicnetworkservices.net.    IN    NS    ns2.dynamicnetworkservices.net.
dynamicnetworkservices.net.    IN    NS    ns4.dynamicnetworkservices.net.
ns3.dynamicnetworkservices.net.    IN AAAA    2600:2000:2230::136
ns6.dynamicnetworkservices.net.    IN AAAA    2600:2000:2230::136
ns1.dynamicnetworkservices.net.    IN AAAA    2600:2000:2210::136
ns2.dynamicnetworkservices.net.    IN AAAA    2600:2000:2220::136
ns5.dynamicnetworkservices.net.    IN AAAA    2600:2000:2210::136
ns4.dynamicnetworkservices.net.    IN AAAA    2600:2000:2240::136
ns3.dynamicnetworkservices.net.    IN A    108.59.163.136
ns6.dynamicnetworkservices.net.    IN A    108.59.163.136
ns1.dynamicnetworkservices.net.    IN A    108.59.161.136
ns2.dynamicnetworkservices.net.    IN A    108.59.162.136
ns5.dynamicnetworkservices.net.    IN A    108.59.161.136
ns4.dynamicnetworkservices.net.    IN A    108.59.164.136

Notice that the response to this lookup (above) includes the glue records, which contain the IP addresses of the DNS servers.  We can now use one of those DNS servers to look up the *.dynect.net DNS servers:


ns1.p31.dynect.net.    IN    AAAA    2600:2000:2210::31
ns2.p31.dynect.net.    IN    AAAA    2600:2000:2220::31
ns3.p31.dynect.net.    IN    AAAA    2600:2000:2230::31
ns4.p31.dynect.net.    IN    AAAA    2600:2000:2240::31
ns1.p31.dynect.net.    IN    A    108.59.161.31
ns2.p31.dynect.net.    IN    A    108.59.162.31
ns3.p31.dynect.net.    IN    A    108.59.163.31
ns4.p31.dynect.net.    IN    A    108.59.164.31

And we can use those DNS servers to look up the amazon.com domain itself:


amazon.com.    IN    A    205.251.242.103
amazon.com.    IN    A    54.239.28.85
amazon.com.    IN    A    176.32.103.205
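
The steps above can be replayed as a small simulation.  This is only a sketch: no network queries are made, the tables hold a subset of the records shown above, and the names NS, GLUE, A_RECORDS, server_address and resolve are our own.  "Asking" a server is modelled as a dictionary lookup.

```python
from typing import Dict, List, Optional, Tuple

# Zone -> nameservers (first entries of the fixed NS records above).
NS: Dict[str, List[str]] = {
    "amazon.com.": ["ns1.p31.dynect.net."],
    "dynect.net.": ["ns1.dynamicnetworkservices.net."],
}
# Glue returned alongside the dynamicnetworkservices.net NS records.
GLUE: Dict[str, str] = {"ns1.dynamicnetworkservices.net.": "108.59.161.136"}
# Answers that the authoritative servers would give.
A_RECORDS: Dict[str, List[str]] = {
    "ns1.p31.dynect.net.": ["108.59.161.31"],
    "amazon.com.": ["205.251.242.103", "54.239.28.85", "176.32.103.205"],
}


def zone_of(host: str) -> Optional[str]:
    """Longest listed zone that contains host, or None."""
    matches = [zone for zone in NS if host.endswith("." + zone)]
    return max(matches, key=len) if matches else None


def server_address(host: str, trace: List[str], depth: int = 0) -> str:
    """Find an IP address for a nameserver, following delegations until glue is found."""
    if depth > 10:
        raise RuntimeError("delegation chain too deep (possible loop)")
    if host in GLUE:
        trace.append(f"glue: {host} -> {GLUE[host]}")
        return GLUE[host]
    zone = zone_of(host)
    if zone is None:
        raise LookupError(f"no delegation known for {host}")
    parent = NS[zone][0]
    server_address(parent, trace, depth + 1)  # we need its address before asking it
    address = A_RECORDS[host][0]
    trace.append(f"ask {parent} for {host} -> {address}")
    return address


def resolve(name: str) -> Tuple[List[str], List[str]]:
    """Return the addresses for name and the list of steps taken."""
    trace: List[str] = []
    server = NS[name][0]
    server_address(server, trace)
    trace.append(f"ask {server} for {name}")
    return A_RECORDS[name], trace


addresses, steps = resolve("amazon.com.")
for step in steps:
    print(step)
print(addresses)
```

The glue record is what ends the recursion: without an entry in GLUE, server_address would keep following delegations until it hit the depth limit, which is the simulated equivalent of the loop that caused the outage.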