Those working in a data centre may, on occasion, have been asked to debug an unresponsive server: pressing a firm finger on a hardware interrupt button (NMI) and triggering the system to dump the state of the frozen kernel to a file for further analysis.
But how do you do that when your server is in the cloud?
Trigger a Kernel Panic by API
The short answer is by API, and on Amazon Web Services you now can.
AWS this week introduced a new EC2:SendDiagnosticInterrupt API that lets cloud and system engineers, or specialists in kernel diagnosis and debugging, trigger a kernel panic in EC2 instances, letting them analyse the resulting crash dump data.
The diagnostic interrupt causes an EC2 instance’s hypervisor to send a non-maskable interrupt (NMI) to the operating system, which will typically respond by entering a kernel panic.
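For readers who want to see what the call looks like in practice, here is a minimal sketch using boto3, the AWS SDK for Python, whose EC2 client exposes the operation as send_diagnostic_interrupt. The helper name and the instance ID are placeholders of ours, not anything from AWS's announcement, and the sketch assumes credentials with permission to call the API are already configured.

```python
def send_diagnostic_interrupt(ec2_client, instance_id, dry_run=False):
    """Ask EC2 to deliver a diagnostic interrupt (NMI) to one instance.

    ec2_client is expected to be a boto3 EC2 client. With dry_run=True,
    EC2 only checks permissions and does not send the interrupt.
    """
    return ec2_client.send_diagnostic_interrupt(
        InstanceId=instance_id,
        DryRun=dry_run,
    )


# Example use (needs the boto3 package and AWS credentials):
#   import boto3
#   client = boto3.client("ec2")
#   send_diagnostic_interrupt(client, "i-0123456789abcdef0")  # placeholder ID
```

Passing the client in, rather than creating it inside the function, keeps the helper easy to exercise against a stub before it is pointed at a real instance.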
“Users”, AWS’s Sébastien Stormacq noted in a blog post this week, will “find in the crash dump invaluable information to analyse the causes of a kernel freeze. Tools like WinDbg (on Windows) and crash (on Linux) can be used to inspect the dump…”
Windows Server AMIs have memory dumps turned on by default, AWS notes, along with an automatic restart once the dump has been saved.
On Amazon Linux 2, users need to install and configure tools to help them do the same thing.
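As a rough illustration of what "configured" means on a Linux guest, the sketch below checks three standard kernel interfaces: a crashkernel= boot argument, the kernel.unknown_nmi_panic sysctl, and whether a capture kernel has been loaded. It assumes the diagnostic interrupt reaches the guest as an unknown NMI and that kdump is the dump mechanism in use; the function names are ours, and this is a readiness check rather than AWS's documented setup procedure.

```python
from pathlib import Path


def crash_dump_readiness(cmdline, unknown_nmi_panic, kexec_crash_loaded):
    """Summarise, from raw file contents, whether an NMI should yield a crash dump."""
    return {
        # Memory must be reserved at boot for the capture kernel.
        "crashkernel_reserved": any(
            arg.startswith("crashkernel=") for arg in cmdline.split()
        ),
        # The kernel must be set to panic on an NMI it cannot attribute to a device.
        "panic_on_unknown_nmi": unknown_nmi_panic.strip() == "1",
        # A capture kernel must already be loaded (the kdump service does this).
        "capture_kernel_loaded": kexec_crash_loaded.strip() == "1",
    }


def read_or_empty(path):
    """Return a file's text, or an empty string if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


def check_this_host():
    """Run the readiness check against the local machine's kernel interfaces."""
    return crash_dump_readiness(
        read_or_empty("/proc/cmdline"),
        read_or_empty("/proc/sys/kernel/unknown_nmi_panic"),
        read_or_empty("/sys/kernel/kexec_crash_loaded"),
    )
```

If any of the three values comes back False, an NMI is unlikely to produce a usable dump on that instance.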
AWS Kernel Panic: Which Instances?
Users can send diagnostic interrupts to supported EC2 instances in all public AWS regions as of this week.
The API covers all instances powered by the AWS Nitro System, except Arm-based ones.
This spans C5, C5d, C5n, i3.metal, I3en, M5, M5a, M5ad, M5d, p3dn.24xlarge, R5, R5a, R5ad, R5d, T3, T3a, and Z1d. Needless to say, various permissions can be set around who can do this…
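Anyone scripting against the API may want to check an instance type before sending the interrupt. The sketch below hard-codes the list given above as a point-in-time snapshot; the function and constant names are ours, and the list will date as AWS adds instance types.

```python
# Families named in the list above; any size within them is treated as supported.
SUPPORTED_FAMILIES = {
    "c5", "c5d", "c5n", "i3en", "m5", "m5a", "m5ad", "m5d",
    "r5", "r5a", "r5ad", "r5d", "t3", "t3a", "z1d",
}
# Entries the list names as one specific size rather than a whole family.
SUPPORTED_TYPES = {"i3.metal", "p3dn.24xlarge"}


def supports_diagnostic_interrupt(instance_type):
    """Return True if the instance type appears in the snapshot list above."""
    name = instance_type.strip().lower()
    family = name.split(".", 1)[0]
    return name in SUPPORTED_TYPES or family in SUPPORTED_FAMILIES
```

Note that i3.metal qualifies while other i3 sizes do not, which is why the two sets are kept separate.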
The feature is free, but the instance is still considered to be running after you’ve triggered the panic, so instance billing will continue as usual.
Ever wanted to peruse an EC2 instance crash dump? Drop Computer Business Review a line with your thoughts on the new API.