Redshift Query With Many Rows Always Spills to Disk and Crashes - amazon-redshift

Running query below on AWS DC2.xlarge instance (10 nodes, 15GB RAM, 1.6 TB HDD). Tables has 1,732,721,100 rows, but only five columns. Working memory consumed by the sort step was 22,767,206,400 (23GB). I realize this exceeds the 15GB of physical memory, but won't Redshift page the sort onto the disk? There was over 1.2TB of free space in the clusters when the query ran, and I don't understand how a 23 GB sort filled up over 1.2 TB of disk space. The table columns are optimally encoded, superfluous columns have been dropped, and the sort keys match the partition logic. Only problem is a very high skewness of the table, but my attempt to unskew also result in a disk spill.`
create table xstg_prof_MED_phase2b as
select *, first_value(_random)
over (partition by subscriber_cd, mbr_cd
order by _random desc
rows between unbounded preceding and unbounded following)
from (select * from xstg_prof_MED_phase2a )
order by subscriber_cd, mbr_cd
;;;;;;;;;;;;;;;;;;;;;;;;;
Can someone please help? Or does Redshift just hate me?

I suspect this has more to do with your window function. The way to find out is to run explain on your query and see where the space is being consumed.
The issue is likely the skew. Each node of redshift is a networked computer and has its own memory and dick space. If any one node fills up the query fails. You may need to just deep copy the table before running the query.
Be aware that column encoding only impacts data oon disk. Once it is inflight it is raw (uncompressed). So memory and spill space are consumed by raw data.
In cloudwatch you can monitor each node's space available. If this assessment is correct you will see the table skew result in failure on one node.

Related

Autovacuum and partitioned tables

Postgres doc tells that partitioned tables are not processed by autovacuum. But still I see that last_autovacuum column from pg_stat_user_tables is populated with recent timestamps for live partitions.
Does it mean that these timestamps are set by the background worker which only prevents transaction ID wraparound, without actually performing ANALYZE&VACUUM? Or whatever else could populate them?
Besides, taken that partitions are large and active enough, should I run the both ANALYZE and VACUUM manually on those partitions? If yes, does the order matter?
UPDATE
I'm trying to elaborate, thanks to the comments given.
Taking that vacuum should work the same way on partition as on the regular table, what could be a reason for much faster growth of the occupied disk space after partitioning? Before partitioning it was nearly a linear function of records count.
What is confusing as well, when looking for autovacuum processes running I see that those related to partitions are denoted with "to prevent wraparound", while others are not. Is it absolutely a coincidence or there is something to check?
Documentation describes partitioned table as rather a virtual entity, without its own storage. What is the point in denoting that it is not vacuumed?
The statement from the documentation is true, but misleading. Autovacuum does not process the partitioned table itself, but it processes the partitions, which are regular PostgreSQL tables. So dead tuples get removed, the visibility map gets updated, and so on. In short, there is nothing to worry about as far as vacuuming is concerned. Remember that the partitioned table itself does not hold any data!
What the documentation warns you about is ANALYZE. Autovacuum also launches automatic ANALYZE jobs to collect accurate table statistics. This will be work fine on the partitions, but there are no table statistics collected on the partitioned table itself, so you have to run ANALYZE manually on the partitioned table to get these data. In practice, I find that not to be a problem, since the optimizer generates plans for each individual partition anyway, and there it has accurate statistics.

Postgres count(*) extremely slowly

I know that count(*) in Postgres is generally slow, however I have a database where it's super slow. I'm talking about minutes even hours.
There is approximately 40M rows in a table and the table consists of 29 columns (most of the are text, 4 are double precision). There is an index on one column which should be unique and I've already run vacuum full. It took around one hour to complete but without no observable results.
Database uses dedicated server with 32GB ram. I set shared_buffers to 8GB and work_mem to 80MB but no speed improvement. I'm aware there are some techniques to get approximated count or to use external table to keep the count but I'm not interested in the count specifically, I'm more concerned about performance in general, since now it's awful. When I run the count there are no CPU peeks or something. Could someone point where to look? Can it be that data are structured so badly that 40M rows are too much for postgres to handle?

Redshift vacuum is not reclaiming space

I have a Redshift cluster that consists of 2 nodes with 160 Gb disks.
I'm randomly getting "Disk full" error when running vacuum or any other query. My disk usage is 92%. I did delete more than a half of the old rows in table that is 10515 Mb in size, but even after rebooting the cluster there's no effect and table still of the same size, though count shows new number of rows. I should have a seen at lease small decrease in disk usage, but there's nothing.
Does anyone has any clues what it might be? Is deleting table in this case is the only option?
There are a few possibilities here but first let me check the facts. You have a 2 node dc2.large cluster and it is 92% disk full. This is too full and needs to lowered to be lowered to provide temp space for query execution. You have a table that is 10515 blocks in size. To address the disk space concern you deleted 1/2 of the rows in the table in question and then vacuumed the table. Once complete you didn't see any change to the cluster space nor the size of the table, not one block difference in table size. Do I have this correct?
First possibility is that the vacuum did not complete correctly. You mention that you are getting disk full messages even when vacuuming. So could it be that the vacuum you tried is not completing? You see vacuum need temp space to sort the table data and if you have a cluster that has gotten too full then the vacuum could fail. In this case you can run a delete-only vacuum that will not attempt to sort the table, just reclaim disk space. This will have a higher likelihood of success in a disk full situation.
Another possibility is that the delete of rows didn't complete correctly or wasn't committed before the vacuum was run. This will cause the vacuum to run on the full set of rows.
It is also possible that the table in question is very wide (many columns). This matters because of how Redshift stores data - each block is 1MB in size and each column needs a block for its data. This cluster has 4 slices and if this table is 1,500 columns wide (yes, that is silly wide) the table will take up 6,000 blocks to just store the first 4 rows. Then it takes no additional disk space to add rows until these blocks start to fill up. The table size will move in very large chunks and when removing rows the size may not change except in large chunks. This is unlikely to be what is happening if you are seeing EXACTLY the same number of blocks but if you are just seeing changes in blocks that are less than you expect this could be in play.
There could be some some other misunderstanding happening. A sort-only vacuum won't free up space. The node type isn't what I think it is. The table could live in S3 and be access through spectrum. But based on the description these don't seem likely.
UNSOLICITED ADVICE: You are on the right track by freeing up disk space but you need to take more action than reducing this one table. (I expect you realize this and this is just a start.) You should be operating below 70% disk full in most cases - this varies by workload and table sizes but is a good general rule. This means reducing a great deal of data on your disks or increasing your node count (and cost). Migrating some data to S3 and using Spectrum to access could be an option. If you need more storage w/o more compute you can look at the storage optimized nodes but since you are at the very smallest end of Redshift these likely aren't a win for you. You need to 1) remove unneeded data, 2) move some data to S3 and use Spectrum, or 3) add a node you your cluster.

Amazon RDS PostgreSQL: Sudden increase in Read IOPS

We are using Amazon RDS to host our PostgreSQL databases. Our production instance (db.t3.xlarge, Single-AZ) was running smoothly until suddenly Read IOPS, Read Latency, Read Throughput and Disk Queue Depth metrics in the AWS console increased rapidly and stayed high afterward (with a lower variability) whereas Write IOPS and Write Throughput were normal.
Read IOPS
Read Throughput
Disk Queue Depth
Write IOPS
There were no code changes or deployments on the date of the increase. There were no significant increases in user activity either.
About our DB structure, we have a single table that holds all of our data and in that table, we have these fields: id as UUID (primary key), type as VARCHAR, data as JSONB (holds the actual data), createdAt and updatedAt as timestamp with the time zone. Most of our data columns have sizes bigger than 2 KB so most of the rows are stored in TOAST table. We have 20 (BTREE) indexes that are created for frequently used fields in JSONB.
So far we have tried VACUUM ANALYZE and also completely rebuilding our table: creating a new table, copying all data from the old table, creating all indexes. They didn't change the behavior.
We also tried increasing storage thus increasing IOPS performance. It helped a bit but it is still not the same as before.
What could be the root cause of this problem? How can we fix it permanently (without increasing storage or instance type)? For now, we are looking for easy changes and we will improve our data model in the future.
T3 instances are not suitable for production. Try moving to another family like a C or M type. You may have hit some burst limits that are now causing odd behaviour

Redshift cluster: queries getting hang and filling up space

I have a Redshift cluster with 3 nodes. Every now and then, with users running queries against it, we end in this unpleasant situation where some queries run for way longer than expected (even simple ones, exceeding 15 minutes), and the cluster storage starts increasing to the point that if you don't terminate the long-standing queries it gets to 100% storage occupied.
I wonder why this may happen. My experience is varied, sometimes it's been a single query doing this and sometimes it's been different concurrent queries been run at the same time.
One specific scenario where we saw this happen related to LISTAGG. The type of LISTAGG is varchar(65535), and while Redshift optimizes away the implicit trailing blanks when stored to disk, the full width is required in memory during processing.
If you have a query that returns a million rows, you end up with 1,000,000 rows times 65,535 bytes per LISTAGG, which is 65 gigabytes. That can quickly get you into a situation like what you describe, with queries taking unexpectedly long or failing with “Disk Full” errors.
My team discussed this a bit more on our team blog the other day.
This typically happens when a poorly constructed query spills a too much data to disk. For instance the user accidentally specifies a Cartesian product (every row from tblA joined to every row of tblB).
If this happens regularly you can implement a QMR rule that limits the amount of disk spill before a query is aborted.
QMR Documentation: https://docs.aws.amazon.com/redshift/latest/dg/cm-c-wlm-query-monitoring-rules.html
QMR Rule Candidates query: https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AdminScripts/wlm_qmr_rule_candidates.sql