View all newsletters
Receive our newsletter - data, insights and analysis delivered to you
  1. Technology
November 28, 2018

Deja Vu All Over Again: Microsoft in Fresh MFA Meltdown

Oops, it did it again

By CBR Staff Writer

Fresh from providing a post mortem of last week’s multi-factor authentification (MFA) Azure and Office 365 login issues, which plagued users globally for 17 hours, Microsoft today admitted the bugs had re-emerged – and yes, once again rebooting its servers had provided a temporary reprieve.

The issue last week was attributed by Microsoft to three root causes, the first two introduced in a roll-out of a code update that began in some data centers on Tuesday, 13 November 2018 and completed on Friday, 16 November 2018.

Read this: Redis Overload to Blame for 17-Hour Azure MFA Login Issue

The issues were found to be activated once a certain traffic threshold was exceeded. Azure was also affected again today, and users were predictably less than happy, with many having recently rolled out MFA to users.

Content from our partners
Scan and deliver
GenAI cybersecurity: "A super-human analyst, with a brain the size of a planet."
Cloud, AI, and cyber security – highlights from DTX Manchester

Microsoft blamed a buggy code roll-out, with the issues activated once a certain traffic threshold is reached.

The change had been intended to better manage connections to its caching services.

“Unfortunately, this change introduced more latency and a race-condition in the new connection management code, under heavy load. This caused the MFA service to slow down processing of requests, initially impacting the West EU data centres (which service APAC and EMEA traffic).”

One of the three root causes it identified “causes accumulation of processes on the MFA backend leading to resource exhaustion on the backend at which point it was unable to process any further requests from the MFA frontend while otherwise appearing healthy in our monitoring.”

The company has pledged to “review our update deployment procedures to better identify similar issues during our development and testing cycles and review the monitoring services to identify ways to reduce detection time and quickly restore service” (both by December 2018).

Microsoft also promised “review our containment process to avoid propagating an issue to other data centers (completion by Jan 2019)”.

Meanwhile, rebooting its servers seems to work…

Topics in this article : , ,
Websites in our network
Select and enter your corporate email address Tech Monitor's research, insight and analysis examines the frontiers of digital transformation to help tech leaders navigate the future. Our Changelog newsletter delivers our best work to your inbox every week.
  • CIO
  • CTO
  • CISO
  • CSO
  • CFO
  • CDO
  • CEO
  • Architect Founder
  • MD
  • Director
  • Manager
  • Other
Visit our privacy policy for more information about our services, how Progressive Media Investments may use, process and share your personal data, including information on your rights in respect of your personal data and how you can unsubscribe from future marketing communications. Our services are intended for corporate subscribers and you warrant that the email address submitted is your corporate email address.