Amazon Web Services (AWS) users – like those in most other cloud environments – can be amazed and sometimes terrified by how fast bills rack up.
Finessing workloads to keep costs under control remains an emerging dark art, though many have cottoned on to the more obvious tweaks; the Home Office, for example, says it has cut its bill over 40 percent by using spot instances.
Among the headaches for AWS users has been having to pay for CPUs/servers even when an application that relies on them is not running.
Redshift, the public cloud giant’s large-scale data warehouse service, has been among the culprits and it has not gone unnoticed by rivals.
Oracle CTO Larry Ellison emphasised the point on a March 12 earnings call in which he said: “When your application isn’t running, you’re not paying for servers [in Oracle Cloud]. That is not true of Amazon’s databases. If you have Redshift, you pay for the Redshift processors”.
Twelve hours earlier, Ellison would have been correct. But AWS had, just the day before, made some changes that allow Redshift users to hit pause on cluster cycles and, potentially, trim bills tidily.
AWS Redshift Billing: Hit Pause!
The largely overlooked March 11 announcement revealed that Redshift now supports the ability to “pause and resume a cluster”, and with it, billing for compute. Users (who will still need to pay for storage) can do this in the Redshift console or via API.
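For readers who want to try the API route, the following is a minimal sketch rather than AWS's recommended implementation. It assumes the Python SDK (boto3), whose Redshift client provides `describe_clusters`, `pause_cluster` and `resume_cluster`. The helper name `set_cluster_paused` and the cluster identifier `analytics-dev` are hypothetical, chosen only for illustration.

```python
def set_cluster_paused(client, cluster_id, paused):
    """Pause or resume a cluster, skipping the call if no change is needed.

    `client` is assumed to be a boto3 Redshift client (or anything that
    exposes the same three methods). Returns "pause", "resume" or "noop".
    """
    clusters = client.describe_clusters(ClusterIdentifier=cluster_id)["Clusters"]
    status = clusters[0]["ClusterStatus"]

    if paused and status == "available":
        client.pause_cluster(ClusterIdentifier=cluster_id)
        return "pause"
    if not paused and status == "paused":
        client.resume_cluster(ClusterIdentifier=cluster_id)
        return "resume"
    # Any other status (already in the requested state, or mid-transition):
    # do nothing and let the caller retry later.
    return "noop"


# Example usage (requires boto3 and AWS credentials; identifier is hypothetical):
#   import boto3
#   client = boto3.client("redshift")
#   set_cluster_paused(client, "analytics-dev", paused=True)
```

Checking the cluster status first keeps the helper safe to run repeatedly, for instance from a scheduled job, without raising errors when the cluster is already in the requested state.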
Computer Business Review joined Corey Quinn, Cloud Economist and AWS billing expert at consultancy the Duckbill Group, for a chat about the move.
Corey, Talk us Through this Billing Change?
“The change is that you can now pause and resume clusters, which doesn’t sound like much, except historically you would either have to stop them completely, then rebuild the cluster when you needed it again, or just leave it running.
“Some of these cluster nodes are, for example, 13 bucks an hour. That’s not nothing.
“And surprise: they’re called clusters for a reason! Individual node pricing generally means you’re paying a multiple of that, sometimes a big multiple.
“In many cases, data science teams who are querying Redshift aren’t working around the clock. Sure, some workloads may [constantly] interface with this quite a bit, but the fact that you can pause and resume it, and that when it’s down you’re only paying for storage, is a massive, massive cost-saving opportunity.”
Presumably Not Everyone Can Do This…
“If you’ve reserved Redshift instances on a one- or three-year basis, then this doesn’t help you because, use it or not, you’re paying for it!
“But if you have very cyclical workloads throughout the course of a day, a week, or a month, this is absolutely something worth looking into. A lot of times these big data warehouse clusters are idle, or nearly so, on weekends.
“How much money you can save depends completely on workload.”
Can You Give us a Few Examples?
“Sure. So, do you have long running jobs that don’t require human intervention? Or is this only used by a business analytics team?
“If people are querying this full-time during business hours as part of their job, and they’re located in one place, well, that’s 40 hours a week. That leaves, what, 128 hours a week that you’re paying for historically that you don’t need to now.
“Big multinationals, of course, have distributed teams: people observe different hours. The hassle of spinning down and spinning back up may very well not justify itself. But as a counterpoint, big multinationals tend to be divisional in nature…
“They won’t have ‘The Redshift Cluster’, they will have 100 Redshift clusters and some of them are going to be much better aligned for this than others. It’s a huge win for development clusters, for example.”
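Quinn's back-of-the-envelope arithmetic can be turned into a rough estimate. The sketch below uses only figures from the interview (a 168-hour week, 40 working hours, and a node costing "13 bucks an hour"). The function name and the four-node cluster are hypothetical. It covers on-demand compute only: storage is still billed while a cluster is paused, and reserved nodes see no saving.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def weekly_compute_saving(hourly_rate_per_node, node_count, active_hours_per_week):
    """Estimate weekly on-demand compute spend avoided by pausing when idle.

    Assumes the cluster is paused for every hour it is not in active use,
    which is an upper bound: it ignores pause/resume transition time.
    """
    if not 0 <= active_hours_per_week <= HOURS_PER_WEEK:
        raise ValueError("active hours must be between 0 and 168")
    idle_hours = HOURS_PER_WEEK - active_hours_per_week
    return idle_hours * hourly_rate_per_node * node_count


print(HOURS_PER_WEEK - 40)                   # 128 idle hours, as in the interview
print(weekly_compute_saving(13.0, 1, 40))    # 1664.0 (one node)
print(weekly_compute_saving(13.0, 4, 40))    # 6656.0 (hypothetical four-node cluster)
```

As Quinn notes, the node count is the multiplier that matters: the same 128 idle hours are worth four times as much on a four-node cluster.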