Diagnosis: Lloyds Bank Outage
One of our customers reported problems accessing Lloyds Bank's corporate payments gateway, having already called Lloyds. Lloyds had told them that there were no problems and to clear cookies, add the site to the ActiveX trusted sites list, etc. When it still didn't work, the conclusion was that it must be a problem with the customer's firewall.
The first thing we did was to point a browser at https://payments.corporate.lloydsbank.com/ (on an independent internet connection) and nothing happened - it just sat there waiting. So clearly Lloyds were having some problems.
We did some slightly lower level debugging:
# openssl s_client -connect payments.corporate.lloydsbank.com:443 -servername payments.corporate.lloydsbank.com
...it sat there for a long time before eventually:
CONNECTED(00000003)
...then more sitting there for a long time before:
write:errno=104
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 0 bytes and written 0 bytes
---
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
---
Oh dear... It should have connected immediately and we should've got a certificate back but instead the web server dropped the connection. Firing up tcpdump to see the actual network traffic, we found that:
- There was a significant delay before the first packet (SYN) even appeared. So something else was going on before the connection was even attempted. A DNS problem was a good bet.
- The first packet (SYN) was resent about 3 times before the web server responded. This would also cause a significant delay in starting the connection.
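The three places a connection can stall (name resolution, the TCP connect and the TLS handshake) can also be timed separately from a script. The following Python sketch is purely illustrative and is not the tooling used for this diagnosis; the helper names `timed` and `probe` are made up for the example, and it uses only the standard library.

```python
import socket
import ssl
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (elapsed_seconds, result_or_exception)."""
    start = time.monotonic()
    try:
        result = fn(*args, **kwargs)
    except OSError as exc:  # covers socket.gaierror, timeouts and ssl.SSLError
        result = exc
    return time.monotonic() - start, result


def probe(host, port=443, timeout=30.0):
    """Time DNS lookup, TCP connect and TLS handshake as separate phases.

    Returns (phases, error), where error is None if every phase succeeded.
    """
    phases = {}

    # Phase 1: name resolution. A long delay here points at DNS.
    elapsed, infos = timed(socket.getaddrinfo, host, port, type=socket.SOCK_STREAM)
    phases["dns"] = elapsed
    if isinstance(infos, Exception):
        return phases, infos

    family, socktype, proto, _, addr = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)

    # Phase 2: TCP connect. Retransmitted SYNs typically show up as
    # a delay of one or more whole seconds here.
    elapsed, err = timed(sock.connect, addr)
    phases["tcp"] = elapsed
    if isinstance(err, Exception):
        sock.close()
        return phases, err

    # Phase 3: TLS handshake, sending SNI much as -servername does.
    ctx = ssl.create_default_context()
    elapsed, tls = timed(ctx.wrap_socket, sock, server_hostname=host)
    phases["tls"] = elapsed
    if isinstance(tls, Exception):
        sock.close()
        return phases, tls

    tls.close()
    return phases, None


# Example (needs network access):
#   phases, error = probe("payments.corporate.lloydsbank.com")
#   print(phases, error)
```

Splitting the phases this way makes it easier to tell a slow DNS lookup from a slow connect, which a browser simply presents as "waiting".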
So, investigating a potential DNS problem:
# dig payments.corporate.lloydsbank.com
resulted in a long wait before failing - definitely a DNS problem then. Let's find the name servers responsible:
# dig payments.corporate.lloydsbank.com ns +trace
<snip>
payments.corporate.lloydsbank.com. IN NS ns-lv6.lloydsbanking.com.
payments.corporate.lloydsbank.com. IN NS ns-lv2.lloydsbanking.com.
payments.corporate.lloydsbank.com. IN NS ns-lv7.lloydsbanking.com.
payments.corporate.lloydsbank.com. IN NS ns-lv3.lloydsbanking.com.
Looking those up gives us:
ns-lv6.lloydsbanking.com. IN A 141.92.88.1
ns-lv2.lloydsbanking.com. IN A 141.92.96.1
ns-lv7.lloydsbanking.com. IN A 195.171.195.169
ns-lv7.lloydsbanking.com. IN A 195.171.195.168
ns-lv7.lloydsbanking.com. IN A 195.171.195.167
ns-lv7.lloydsbanking.com. IN A 195.171.195.166
ns-lv7.lloydsbanking.com. IN A 195.171.195.165
ns-lv7.lloydsbanking.com. IN A 195.171.195.164
ns-lv7.lloydsbanking.com. IN A 195.171.195.163
ns-lv7.lloydsbanking.com. IN A 195.171.195.170
ns-lv3.lloydsbanking.com. IN A 141.92.104.1
So there were 11 name server addresses behind those four names. By querying each of them directly we found that 9 were down. That means that about 82% of DNS requests would time out - at best things were going to be very slow while clients made repeated DNS lookups and waited for each to time out; at worst, clients would fail to find a working DNS server and give up, rendering the website inaccessible.
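The arithmetic behind that figure can be worked through in a few lines of Python. This is a simplified model, not a description of any particular resolver: it assumes the client picks addresses uniformly at random and never retries the same address, whereas real resolvers usually track response times and learn to prefer servers that answer. The function names are made up for the example.

```python
from math import comb

TOTAL = 11  # name server addresses listed above
DOWN = 9    # addresses that did not answer


def p_first_query_fails(down=DOWN, total=TOTAL):
    """Chance that a single randomly chosen address is a dead one."""
    return down / total


def p_all_fail(tries, down=DOWN, total=TOTAL):
    """Chance that `tries` distinct, randomly chosen addresses are all dead."""
    return comb(down, tries) / comb(total, tries)


if __name__ == "__main__":
    print(f"first query fails: {p_first_query_fails():.0%}")
    for tries in (2, 3, 4):
        print(f"{tries} tries all fail: {p_all_fail(tries):.0%}")
```

Under these assumptions the first query fails about 82% of the time (9/11), and even a client that tries three different addresses before giving up still fails about 51% of the time (84/165).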
[Update 20 September 2018: Additionally it appears that all of the DNS servers are misconfigured and are not answering TCP connections. DNS requests can be made over either TCP or UDP, so DNS servers are required to answer both types of request. However, (when they are working) the Lloyds DNS servers are only answering UDP requests.]
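To illustrate the difference between the two transports, here is a minimal Python sketch that sends the same DNS query to a server over UDP and over TCP. It is not the script used for the tests in this article; the function names are invented for the example, it only handles IPv4 server addresses, and it does no parsing of the reply beyond checking that the transaction ID matches.

```python
import socket
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query (qtype 1 = A record, class IN)."""
    # Header: ID, flags (0 = standard query, no recursion), QDCOUNT=1, rest 0.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def query_udp(server, packet, timeout=3.0):
    """Send the query as a single UDP datagram and return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 53))
        data, _ = sock.recvfrom(4096)
        return data


def recv_exact(sock, count):
    """Read exactly `count` bytes from a stream socket."""
    buf = b""
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf


def query_tcp(server, packet, timeout=3.0):
    """Send the query over TCP, where each message has a 2-byte length prefix."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(struct.pack("!H", len(packet)) + packet)
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        return recv_exact(sock, length)


def check(server, name):
    """Return {'udp': ..., 'tcp': ...} with 'Ok', 'Bad reply' or 'Down'."""
    packet = build_query(name)
    results = {}
    for label, fn in (("udp", query_udp), ("tcp", query_tcp)):
        try:
            reply = fn(server, packet)
            results[label] = "Ok" if reply[:2] == packet[:2] else "Bad reply"
        except OSError:  # timeouts, refused or reset connections
            results[label] = "Down"
    return results


# Example (needs network access):
#   print(check("141.92.96.1", "payments.corporate.lloydsbank.com"))
```

A server that answers UDP but not TCP would show up here as `{'udp': 'Ok', 'tcp': 'Down'}`. TCP matters in practice because clients fall back to it when a UDP reply is truncated.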
[Update 27 September 2018: A week on and Lloyds still haven't fixed their DNS servers. We're consistently seeing the same 4 responsive DNS servers and 7 dead ones:
# ./test_dns payments.corporate.lloydsbank.com
ns-lv2.lloydsbanking.com. 141.92.96.1 Ok
ns-lv3.lloydsbanking.com. 141.92.104.1 Ok
ns-lv6.lloydsbanking.com. 141.92.88.1 Ok
ns-lv7.lloydsbanking.com. 195.171.195.163 Ok
ns-lv7.lloydsbanking.com. 195.171.195.164 Down
ns-lv7.lloydsbanking.com. 195.171.195.165 Down
ns-lv7.lloydsbanking.com. 195.171.195.166 Down
ns-lv7.lloydsbanking.com. 195.171.195.167 Down
ns-lv7.lloydsbanking.com. 195.171.195.168 Down
ns-lv7.lloydsbanking.com. 195.171.195.169 Down
ns-lv7.lloydsbanking.com. 195.171.195.170 Down
Also a further problem has been discovered with the Lloyds corporate banking website (https://www.lloydsbankcommercial.com/) - the web server only presents the leaf certificate to clients rather than the entire certificate chain. This is an incorrect configuration and results in some users receiving a security warning from their web browser when accessing the website. In that case, the user would need to bypass the browser's security warning in order to access the Lloyds banking website, which would make them susceptible to numerous attack vectors.]
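The missing-chain symptom can be reproduced from a script. The Python sketch below is only an illustration under a few assumptions: it relies on the standard `ssl` module, which (unlike many browsers) does not use cached or downloaded intermediate certificates, so a server that sends only its leaf certificate normally fails verification. The function names are made up, and the two OpenSSL verify codes are the ones commonly seen in this situation rather than an exhaustive list.

```python
import socket
import ssl

# OpenSSL verify codes commonly reported when an intermediate is missing:
#   20 = unable to get local issuer certificate
#   21 = unable to verify the first certificate
MISSING_ISSUER_CODES = {20, 21}


def classify(verify_code):
    """Map an OpenSSL verify code to a short human-readable category."""
    if verify_code in MISSING_ISSUER_CODES:
        return "chain incomplete or issuer unknown"
    return "other verification failure"


def check_chain(host, port=443, timeout=10.0):
    """Attempt a verified TLS handshake and describe the outcome."""
    ctx = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return "chain verified"
    except ssl.SSLCertVerificationError as exc:
        return f"{classify(exc.verify_code)}: {exc.verify_message}"


# Example (needs network access):
#   print(check_chain("www.lloydsbankcommercial.com"))
```

Browsers that have already cached the intermediate certificate, or that fetch it themselves, may still accept such a server, which would explain why only some users see a warning.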
[Update 15 October 2018: Although Lloyds have switched to using a different set of name servers for payments.corporate.lloydsbank.com, we're now getting reports that lloydslink.online.lloydsbank.com doesn't work. Some investigation shows that this DNS zone is still hosted by the broken set of DNS servers listed above. Lloyds still have not acknowledged that there is any problem.]
To summarise:
- 9 out of 11 of Lloyds' DNS servers were down, resulting in intermittently very slow or even completely broken DNS lookups.
- If you managed to resolve the web server's IP address, it took a long time to accept the connection.
- If you managed to get a connection, the web server might fail to negotiate an encrypted TLS session with the client.
- DNS requests made over TCP fail entirely.
- Lloyds have spent over a week refusing to acknowledge that they even have a problem, instead of asking their network team to look into it.
With multiple Lloyds Bank servers having serious problems, we wouldn't mind betting that they were being attacked. This diagnosis took about 15 minutes, after which we explained the situation to the customer so that they could follow up with Lloyds directly.
As a temporary solution for our customers who are experiencing this problem, we reconfigured their Opendium UTM DNS settings to use cached records from Google's DNS servers when looking up records within the lloydsbank.com domain. Whilst this can't eliminate the problem, using a DNS server shared by many thousands of users statistically improves the chance of a user being able to retrieve a cached DNS record rather than being affected by the broken DNS servers. In real terms, this makes the Lloyds systems usable for our customers.
Lloyds Bank's response has been exemplary:
@LloydsBankBiz @AskLloydsBank Do you have an update about your current online banking outage? Our customers have been reporting problems since yesterday morning, but your service status pages don't mention anything. Full technical details: https://t.co/inwsnjpkGn Many thanks.
— Opendium (@opendium) September 20, 2018
Hi, I'm AH. Thanks for getting in touch. We're not aware of any current issues. If your customers are experiencing difficulties, they'd be best placed contacting us directly.
— Lloyds Bank (@AskLloydsBank) September 20, 2018
Thanks for your reply. Our customers have already tried contacting you directly and have been told that there are no problems, although this is clearly not the case - please can you pass the technical information on to your network team. We are, of course, happy to help further.
— Opendium (@opendium) September 20, 2018
I appreciate your concern; however, we're not having issues. If your customers contact us directly, we'll try to help them on an individual basis. ^AH
— Lloyds Bank (@AskLloydsBank) September 20, 2018
Thanks for your reply. We have relayed your response on to the affected customers. As they have already contacted Lloyds directly and you would not acknowledge the problem, I'm not sure they are too impressed. As I said, we are happy to lend a hand to your network team.
— Opendium (@opendium) September 20, 2018