November 18, 2019

Enterprise Users Fuming Over “Giant Debacle” of Chrome White Screen Bug

"We require sovereignty over our environments"/"We don't want to be guinea pigs"

By CBR Staff Writer

System administrators have been left fuming at Google after the company pushed experimental changes out to stable versions of its Chrome browser, which triggered a “white screen of death” for thousands of business users.

The code change last week was silently pushed out as part of a WebContents Occlusion feature designed to suspend Chrome tabs when users move other apps on top of them, in a bid to reduce the browser’s high resource use.

Google said Friday that it had tested it on one percent of users with no negative impact – a comment that did little to assuage sysadmin frustrations.

Google Chrome White Screen: What Happened?

Those running the browser via Windows Server “terminal server” setups – a common configuration in enterprise networks – or accessing Chrome through virtualised environments like Citrix saw Chrome tabs turn completely unresponsive across their networks, impacting thousands of users.

With IT admins typically managing and controlling such updates, the Google Chrome white screen left IT teams scrambling to identify what in their systems had gone wrong as end-users howled at the abrupt outage. (Many businesses will not allow users to quickly download an alternative browser as a replacement.)

Users on the Chromium bug thread expressed huge frustration at the bug itself, at Google’s slow reaction, and at the fact that the change had been pushed out to stable Chrome versions without warning or notification.

As one sysadmin put it in a Chromium bug thread: “In a medical environment with 4000 concurrent users, it becomes a clinical risk to the patients if a web application does not run as it was intended. We absolutely require the ability to disable these random tests. When we deploy a product in a RDS/Citrix environment, we expect it to remain working and unchanged until we update it. We do not use the “autoupdate” function in the product and manage deployment.”

“At my organization, nearly 100% of our users (~300) running in Citrix virtual desktops were directly effected for 2 full business days. Our main line of business application runs in Chrome and the result of the Occlusion flag being enabled, our staff was unable to effectively service customers,” another user wrote.

Another added: “Are we running beta/dev/canary versions of chromium where those experiments should take place? No, most of us are on stable/enterprise channel, and therefore shouldn’t have ours messed with at all.”

“The experiment / flag has been on in beta for ~5 months,” Google’s David Bienvenu said in a Chromium bug thread. “It was turned on for stable (e.g., m77, m78) via an experiment that was pushed to released Chrome Tuesday morning.

“Prior to that, it had been on for about one percent of M77 and M78 users for a month with no reports of issues, unfortunately.”

Another Google engineer added: “Once we received reports of the problem, we were able to revert it immediately. We sincerely apologize for the disruption this caused.”

The apology was not enough for many. On Saturday the thread was still drawing frustrated comments. As one user put it: “Given that we’re an enterprise and will need to be doing RCAs [Root Cause Analyses] for this giant debacle who on the chromium team is going to be writing up the RCA that they will make public for all of us so we can understand the details and your action plan to keep this from happening again?

“I know from experience this is part of your process. Will you commit to making this RCA public?”

Another added: “‘Oops’ and apologies unfortunately don’t unravel the mess and backlog this created for many of us. Like others state, figure out a way to eliminate Enterprise versions from being your test subjects. I cant imagine how many technical professionals were impacted with their employers by your experiment. Some employers don’t tolerate what is perceived as incompetence. Many of us were scrambling to find root cause and validating all layers of the infrastructure when it was your team’s misstep all along.”

The code bug capped a torrid few weeks for Google, which has also had to face a series of Google Cloud outages that left engineers forced to manually fix tasks around the clock for three days. That was also triggered, in part, by inadequate testing prior to code rollout. (GCP said it has now implemented “continuous load testing as part of the deployment pipeline of the component which suffered the performance regression, so that such issues are identified before they reach production in future.”)


 
