AWS has overhauled how its Amazon Redshift data warehousing service processes queries, in a bid to tackle complaints about latency.
The move — which AWS says should double the speed with which Redshift processes queries that need to be compiled — comes amid pressure from rivals and customers, some of whom have abandoned it for alternatives like Snowflake after finding the service too slow.
Redshift Queries: No Code Cache Can Mean… A Wait.
Redshift generates and compiles code for each query execution. AWS says it does this because compiled code executes faster, as it “eliminates the overhead” of using an interpreter that directly executes instructions.
New or unique queries can be slow as a result, as can queries on new clusters. (Compiled code segments are stored by AWS, so subsequent executions of the same query can run faster because they can skip the compilation phase. This cache survives cluster reboots, but is wiped by updates.)
As AWS notes: “For a new cluster with no code cache or after an existing cluster is upgraded with the latest release, code cache is flushed, and queries must undergo query compilation. As a result, the latency of the query will vary, which may not meet the requirements of some workloads.”
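The caching behaviour described above can be illustrated with a toy model. This is a minimal sketch, not Redshift's implementation: the `CompiledSegmentCache` class, the `toy_compile` function and the choice to key the cache on a hash of the query text are all assumptions made for illustration.

```python
import hashlib


class CompiledSegmentCache:
    """Toy cache of compiled query code (hypothetical, for illustration only)."""

    def __init__(self):
        self._segments = {}
        self.hits = 0
        self.misses = 0

    def get_or_compile(self, query, compile_fn):
        # Assumption: the cache key is a hash of the raw query text.
        key = hashlib.sha256(query.encode("utf-8")).hexdigest()
        if key in self._segments:
            self.hits += 1  # repeat query: compilation is skipped
            return self._segments[key]
        self.misses += 1  # new or unique query: pay the compile cost
        segment = compile_fn(query)
        self._segments[key] = segment
        return segment

    def flush(self):
        # Models an upgrade wiping the cache; counters are left untouched.
        self._segments.clear()


def toy_compile(query):
    # Stand-in for real code generation and compilation.
    return "compiled<" + query + ">"


cache = CompiledSegmentCache()
sql = "SELECT count(*) FROM sales"
cache.get_or_compile(sql, toy_compile)  # first run: miss, compiles
cache.get_or_compile(sql, toy_compile)  # second run: hit, no compile
cache.flush()                           # simulated cluster upgrade
cache.get_or_compile(sql, toy_compile)  # miss again after the flush
print(cache.hits, cache.misses)  # prints "1 2"
```

In this model the same query is fast only between flushes, which is why latency varies on new or freshly upgraded clusters.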
What’s Changed?
The changes, rolled out automatically for users, mean query compilations are now “scaled to a serverless compute engine beyond the compute resources of the leader node of your cluster,” AWS said. It now “processes queries 2x faster when they need to be compiled,” it added in a June 2 blog post.
AWS is “also releasing an unlimited cache to store compiled objects to increase cache hits, from 99.60 percent to 99.95 percent.”
“With this update, unlimited cache minimizes the need to compile code, and when compilation is needed, a scalable compilation farm compiles it in parallel to speed up your workloads. The magnitude of workload speed up depends on its complexity and concurrency,” AWS said.
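The quoted cache hit rates look close together, but the gap is larger when read as miss rates. The short calculation below is a sketch under two assumptions that AWS does not state: that every cache miss triggers a compilation, and that all queries count equally. The function name `compilations_per_million` is hypothetical.

```python
from decimal import Decimal


def compilations_per_million(hit_rate_percent):
    """Queries per million that miss the cache, given a hit rate in percent.

    Assumes every miss means one compilation (an illustrative simplification).
    Decimal is used so that 99.60 and 99.95 are handled exactly.
    """
    miss_percent = Decimal("100") - Decimal(hit_rate_percent)
    return miss_percent / Decimal("100") * Decimal("1000000")


before = compilations_per_million("99.60")
after = compilations_per_million("99.95")
print(int(before), int(after), int(before / after))  # prints "4000 500 8"
```

On those assumptions, the move from 99.60 percent to 99.95 percent cuts the share of queries needing compilation from 0.40 percent to 0.05 percent, an eightfold drop.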
The query performance improvements are now automatically enabled with release number 1.0.13751.
Baby, Come Back
The move comes as several high-profile customers have swapped Redshift for alternatives after finding it too slow.
Sports app Strava was one, last year telling Computer Business Review that “We ran into challenges with scaling Redshift due to our data volumes as we continue to grow, as well as query performance as we had more users hitting the database with both ad-hoc SQL and BI tools.”
Strava opted for Snowflake, saying it doesn’t assume your data is in Amazon S3 buckets, comes with extensions to JDBC, ODBC and dbAPI to simplify data ingestion, and offers the XML support that Redshift lacks.
The move is the latest tweak to Redshift designed to keep customers on board. On March 11, AWS also announced that customers would now be able to “pause and resume a cluster”, and with it, billing for compute.
The need for AWS customers to pay for CPUs/servers even when an application that relies on them is not running had not gone unnoticed by rivals, with Oracle’s Larry Ellison flagging it on an earnings call.