Updating tables causes high processor utilization
Like a lot of folks in the data community, we've been impressed with Redshift. Yet at first, we couldn't figure out why performance was so variable on seemingly simple queries.
The best way to make your SQL queries run faster is to have them do less work. A great way to do less work is to query a materialized view that's already done the heavy lifting. Perhaps the query shares a common core with several other queries that also need to run, and resources are wasted recomputing that common data for every query. Redshift is especially great for this kind of optimization because data on a cluster usually changes infrequently, often as a result of hourly or nightly ETLs.

These speedups degrade if the intermediate results exceed the available RAM and get written to disk. Your query is likely exceeding the available RAM if it causes spikes in your disk usage graph: the disk space spikes as temporary tables are created and destroyed, slowing our queries in the process.
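As a sketch of the precompute-once approach (the table and column names here are hypothetical, not from the original post), a nightly ETL can materialize the heavy aggregation into its own table, and every downstream query reads the small result instead of re-scanning the raw data:

```sql
-- Hypothetical example: precompute daily revenue once per ETL run,
-- instead of re-aggregating raw purchases in every dashboard query.
CREATE TABLE daily_revenue AS
SELECT
  created_at::date AS purchase_date,
  SUM(price)       AS revenue
FROM purchases
GROUP BY 1;

-- Downstream queries now scan the small precomputed table:
SELECT purchase_date, revenue
FROM daily_revenue
WHERE purchase_date >= '2024-01-01';
```

On a cluster refreshed by hourly or nightly ETLs, dropping and rebuilding this table as the last ETL step keeps it fresh enough for most dashboards.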
The most common cause of query underperformance is queries that do not use the tables' sort and dist keys.
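For illustration (the table and keys here are hypothetical), both keys are declared at table creation: the dist key controls which node each row lives on, and the sort key controls the on-disk order within each node:

```sql
-- Hypothetical gameplays table: DISTKEY(user_id) colocates each user's
-- rows on one node, so joins on user_id avoid network shuffles;
-- SORTKEY(created_at) lets date-range scans skip irrelevant blocks.
CREATE TABLE gameplays (
  user_id    INTEGER,
  created_at TIMESTAMP,
  score      INTEGER
)
DISTKEY (user_id)
SORTKEY (created_at);
```

A query that joins on `user_id` and filters on `created_at` can then do dramatically less I/O than the same query against an unkeyed table.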
Lifetime Daily ARPU (average revenue per user) is a common metric and often takes a long time to compute. It shows how the money you're making per user changes over the lifetime of your product.
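As a simplified sketch (assuming hypothetical `purchases` and `gameplays` tables; this is not the original post's full query), the metric divides cumulative revenue by cumulative users, per day:

```sql
-- Simplified sketch of Lifetime Daily ARPU: running revenue divided by
-- running user count, per day. Note: summing daily distinct users
-- double-counts returning players; the exact metric counts distinct
-- users to date, which is what makes the real query so expensive.
WITH daily_revenue AS (
  SELECT created_at::date AS dt, SUM(price) AS rev
  FROM purchases
  GROUP BY 1
),
daily_users AS (
  SELECT created_at::date AS dt, COUNT(DISTINCT user_id) AS users
  FROM gameplays
  GROUP BY 1
)
SELECT
  r.dt,
  SUM(r.rev)   OVER (ORDER BY r.dt ROWS UNBOUNDED PRECEDING)
    / SUM(u.users) OVER (ORDER BY u.dt ROWS UNBOUNDED PRECEDING)
    AS lifetime_arpu
FROM daily_revenue r
JOIN daily_users u ON u.dt = r.dt
ORDER BY r.dt;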
That's a monster query, and it takes minutes to run on a database with 2 billion gameplays and 3 million purchases.
This can be done per session using set wlm_query_slot_count. With more slots allocated, the memory available to the query has increased for all steps, and most are no longer disk-based.
By default, Redshift uses 5 query slots and allocates one slot per query. Requiring all of the query slots means that the query needs to wait until all 5 slots are available before it can run.
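A session can claim more of the queue's slots before running a big query. A minimal sketch using Redshift's `wlm_query_slot_count` session parameter:

```sql
-- Claim all 5 default slots for this session, so the next query gets
-- the queue's full memory allocation instead of one fifth of it.
set wlm_query_slot_count to 5;

-- ... run the expensive query here ...

-- Return to the default of one slot per query.
set wlm_query_slot_count to 1;
```

The trade-off from the text applies: while this session holds all 5 slots, every other query in the queue must wait, so reserve this for the few queries that genuinely spill to disk.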