A 17-hour-long hiccup that prevented Microsoft Azure users with multi-factor authentication (MFA) enabled from logging in to their accounts was caused by an overloaded Redis cache, Microsoft said Tuesday.
The issue suggests Microsoft may not have been ready for the extent of MFA uptake by increasingly security-conscious Azure users.
Redis is an in-memory data store that can optionally persist its data to disk. During the outage, many users deactivated MFA on their accounts, while others were unable to do so. Microsoft fixed the issue in part by "cycling" (restarting) its servers.
Similar problems for Office 365 users, meanwhile, were attributed to a "coding issue" following updates to its MFA services, Microsoft added, saying it is monitoring the situation to ensure service is not interrupted again.
Azure MFA: “Operational Threshold Reached”
“Requests from MFA servers to Redis Cache in Europe reached operational threshold causing latency and timeouts”, Microsoft told customers.
“After attempting to fail over traffic to North America this caused a secondary issue where servers became unhealthy and traffic was throttled to handle increased demand.”
To mitigate the issue, engineers deployed a hotfix that removed the connection between Azure's MFA service and an unnamed backend service. They then "cycled" (rebooted) the affected servers, which allowed authentication requests to succeed, Azure said.
While the issue seems to be largely resolved, some users were still struggling Tuesday, with MFA working via SMS but not via other methods.
The company plans to publish a full root cause analysis over the next few days.
Rik Turner, Principal Analyst, Infrastructure Solutions at Ovum, told Computer Business Review: "These days, cloud service providers (CSPs) are rather like utilities, in that they must not only be ubiquitous, but also always available. The MFA glitch that made Microsoft's IaaS offering, Azure, and its best-known SaaS offering, Office 365, inaccessible to large swathes of users across Europe, APAC and the Americas for varying lengths of time on November 19 was therefore a major issue for the company, and one that it will need to address going forward."
He added: “The fact that the problem appears to have been with Redis, the open-source, in-memory, key-value database developed specifically to address scalability challenges in conventional databases, is an irony that will not have been lost on Microsoft. The fact is, though, that all the major CSPs, whether in IaaS or PaaS or, as in Microsoft’s case, both, have had problems, and will almost certainly continue to do so from time to time, be they with scalability or security. Thus, any schadenfreude on the part of Microsoft’s competitors in these markets is likely to be fairly muted.”
"MFA remains a basic requirement for securing systems, particularly ones in the cloud, and it behoves the CSPs to make the back-end infrastructure supporting it scalable enough to guarantee its ability to deliver. Microsoft gets a 'Must Try Harder' in its end-of-year report, while its cloud customers may be rereading the SLAs in their contract as we speak…"
Frustrated users, meanwhile, may find, if they look further afield, that a number of third-party vendors offer single sign-on (SSO) solutions for cloud users that include MFA, with names like Okta, Ping Identity and Secret Double Octopus all in the mix.