A number of enterprise Office 365 users entered a second day without email today, as Microsoft, after a 20-hour period with no public status updates, said ongoing issues that appeared to be affecting users globally had yet to be resolved. “Telemetry data is indicating connection time outs within the Exchange authentication infrastructure, resulting in impact”, a status update read on Friday morning.
Microsoft Office 365 Outage: No Failover?
Asked by Computer Business Review why there was no failover in place when such incidents happened, a Microsoft spokesman did not answer the question.
They responded: “We’re working to resolve difficulties a limited subset of enterprise customers in Europe are experiencing when attempting to access Exchange Online. Consumers are not affected. Admins can find status updates on the Admin Center.”
One customer, Piers Webster, the Managing Director of executive recruitment agency Initial Talent, told Computer Business Review: “We’ve been down since 9am (Thursday 24th). It’s causing all sorts of headaches and the usual cluelessness and general lack of response from Microsoft. That’s a whole day’s worth of business lost.”
With Outlook’s Twitter handle perhaps insensitively tweeting late last night that the company “Really, really, really, really needs it to be Friday”, one medical company summed up many enterprise users’ feelings in its response:
“And we really really really really really really really really really really really really really really really really really really really need to receive our customers' emails! #OutlookDown” https://t.co/we3ltDwjKG (Just Care Medical, @JustCareMedical, January 25, 2019)
A previous 17-hour outage for users with multi-factor authentication (MFA) was caused by requests from MFA servers to Redis Cache in Europe reaching “operational threshold causing latency and timeouts”, Microsoft told customers in late November.
“After attempting to fail over traffic to North America this caused a secondary issue where servers became unhealthy and traffic was throttled to handle increased demand.”
To mitigate the issue, engineers deployed a hotfix which eliminated the connection between Azure’s MFA service and an unnamed backend service. They then cycled impacted servers, which allowed authentication requests to succeed.
Both that incident and this one raise the question of whether Microsoft’s data centres are equipped to deal with the levels of use they are receiving: more than 135 million people use Office 365 commercial every month.
It is unclear precisely how many users have been affected by the most recent incident: for Microsoft, perhaps a limited subset in the context of broader numbers. For the businesses affected, however, loss of email is always bad news.