Vacuum does not reclaim disk space - amazon-redshift

I have a fact table with 9.5M records.
The table uses DISTSTYLE KEY, and is hosted on a Redshift cluster with 2 "small" nodes.
I ran many UPDATE and DELETE operations on the table, and as expected, the "real" number of rows is much higher than 9.5M.
Hence, I ran VACUUM on the table, but to my surprise, after it finished, the number of "rows" the table allocates still did not come back down to 9.5M records.
Could you please advise what may be the reason for such behavior?
What would be the best way to solve it?
A few copy-pastes from my shell:
The fact table I was talking about:
select count(1) from tbl_facts;
9597184
The "real" number of records in the DB:
select * from stv_tbl_perm where id = 332469;
 slice |   id   |   name    |   rows   | sorted_rows | temp | db_id  | insert_pristine | delete_pristine
-------+--------+-----------+----------+-------------+------+--------+-----------------+-----------------
     0 | 332469 | tbl_facts | 24108360 |    24108360 |    0 | 108411 |               0 |               1
     2 | 332469 | tbl_facts | 24307733 |    24307733 |    0 | 108411 |               0 |               1
     3 | 332469 | tbl_facts | 24370022 |    24370022 |    0 | 108411 |               0 |               1
     1 | 332469 | tbl_facts | 24597685 |    24597685 |    0 | 108411 |               0 |               1
  3211 | 332469 | tbl_facts |        0 |           0 |    0 | 108411 |               3 |               0
(Altogether, that is almost 100M records.)
Thanks a lot!

I think you need to run ANALYZE on the particular fact table. ANALYZE updates the statistics linked to the fact table after you run the vacuum (or any other command where the count of rows changes).
Do let us know if this was the case or not (I do not have a table handy where I can test this out) :-)
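For reference, a minimal sketch of the commands being suggested, assuming standard Amazon Redshift syntax and the table name from the question:
-- Reclaim deleted rows and re-sort the table, then refresh its statistics.
VACUUM FULL tbl_facts;
ANALYZE tbl_facts;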

Related

Get the max value for each column in a table

I have a table for player stats like so:
 player_id | game_id | rec | rec_yds | td | pas_att | pas_yds | ...
-----------+---------+-----+---------+----+---------+---------+-----
         1 |       3 |   1 |       5 |  0 |       3 |      20 |
         2 |       3 |   0 |       8 |  1 |       7 |      20 |
         3 |       3 |   3 |       9 |  0 |       0 |       0 |
         4 |       3 |   5 |      15 |  0 |       0 |       0 |
I want to return the max values for every column in the table except player_id and game_id.
I know I can return the max of one single column by doing something like so:
SELECT MAX(rec) FROM stats
However, this table has almost 30 columns, so I would just be repeating the query below for all 30 stats, only replacing the name of the stat.
SELECT MAX(rec) as rec FROM stats
This would get tedious real quick, and won't scale.
Is there any way to loop over the columns, getting every column in the table and returning its max value, like so:
 player_id | game_id | rec | rec_yds | td | pas_att | pas_yds | ...
-----------+---------+-----+---------+----+---------+---------+-----
         4 |       3 |   5 |      15 |  1 |       7 |      20 |
You can get the maximum of multiple columns in a single query:
SELECT
  MAX(rec)     AS rec_max,
  MAX(rec_yds) AS rec_yds_max,
  MAX(td)      AS td_max,
  MAX(pas_att) AS pas_att_max,
  MAX(pas_yds) AS pas_yds_max
FROM stats;
However, there is no way to get an arbitrary set of columns dynamically within the query itself. You could build the query dynamically by loading all column names of the table and then applying conditions such as "except player_id and game_id", but that has to happen outside the query, as sketched below.
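As a sketch of that approach, assuming PostgreSQL: this builds the query text from the catalog, and the generated statement is then run in a second step.
-- Generate "SELECT MAX(col) AS col_max, ... FROM stats", skipping the key columns.
SELECT 'SELECT '
       || string_agg(format('MAX(%I) AS %I', column_name, column_name || '_max'), ', ')
       || ' FROM stats'
FROM information_schema.columns
WHERE table_name = 'stats'
  AND column_name NOT IN ('player_id', 'game_id');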

How to exclude all rows where a column value is different but other columns are the same

There are many similar questions, but they are mainly about how to select one of the duplicates where only a single column is different. I want to exclude all of them from the query, and only get the rows where a particular field isn't different.
I am looking for all the reference_no values where the status is -1, except for those where the status is both -1 and 1 for the same reference_no, as in the table below. The query should return only the row with id 4. How do I do that?
This is using SQL Server 2016.
| id | process_date | status | reference_no |
| --- | ------------ | ------ | ----------- |
| 1 | 12/5/22 | 1 | 789456 |
| 2 | 12/5/22 | -1 | 789456 |
| 3 | 12/5/22 | 1 | 789456 |
| 4 | 12/5/22 | -1 | 321654 |
If I understand correctly, you want a NOT EXISTS check:
select *
from t
where status = -1
  and not exists (
      select *
      from t t2
      where t2.status = 1
        and t2.reference_no = t.reference_no
  );
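If you only need the reference_no values, a sketch of an equivalent formulation with conditional aggregation (same table and column names as above):
-- Keep reference_no groups that contain a -1 status but no 1 status.
select reference_no
from t
group by reference_no
having sum(case when status = -1 then 1 else 0 end) > 0
   and sum(case when status = 1 then 1 else 0 end) = 0;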

Not much improvement on max transaction id after vacuum full

We did a VACUUM FULL on our table and its TOAST table. The dead tuples dropped drastically; however, the max transaction id stayed pretty much the same. My question is: why did the max transaction id not go down when the dead tuples dropped so drastically?
Before
select relname, last_autovacuum, n_tup_upd, n_tup_del, n_tup_hot_upd, n_live_tup, n_dead_tup, n_mod_since_analyze, vacuum_count, autovacuum_count from pg_stat_all_tables where relname in ('examples', 'pg_toast_16450');
    relname     |        last_autovacuum        | n_tup_upd | n_tup_del  | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | vacuum_count | autovacuum_count
----------------+-------------------------------+-----------+------------+---------------+------------+------------+---------------------+--------------+------------------
 examples       | 2022-01-18 23:26:52.432808+00 |  57712813 |       9818 |      48386674 |    3601588 |     306558 |               42208 |            0 |               44
 pg_toast_16450 | 2022-01-17 23:14:42.516933+00 |         0 | 5735566377 |             0 |    3763818 |  805501171 |         11472355929 |            0 |               51
SELECT max(age(datfrozenxid)) FROM pg_database;
max
-----------
199857797
After
select relname, last_autovacuum, n_tup_upd, n_tup_del, n_tup_hot_upd, n_live_tup, n_dead_tup, n_mod_since_analyze, vacuum_count, autovacuum_count from pg_stat_all_tables where relname in ('examples', 'pg_toast_16450');
    relname     |        last_autovacuum        | n_tup_upd |  n_tup_del  | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | vacuum_count | autovacuum_count
----------------+-------------------------------+-----------+-------------+---------------+------------+------------+---------------------+--------------+------------------
 examples       | 2022-02-01 15:41:17.722575+00 | 120692014 |        9818 |      98148003 |    4172134 |      17666 |              150566 |            1 |             4064
 pg_toast_16450 | 2022-02-01 20:49:30.552251+00 |         0 | 16169731895 |             0 |    5557218 |      33365 |         32342853690 |            0 |            15281
SELECT max(age(datfrozenxid)) FROM pg_database;
max
-----------
183888023
Yes, that is as expected. You need plain VACUUM to freeze tuples; VACUUM (FULL) doesn't.
Users tend to be confused because both are triggered by the VACUUM statement, but VACUUM (FULL) is actually something entirely different from plain VACUUM. It is not just "a more thorough VACUUM". The only thing they have in common is that they get rid of dead tuples. VACUUM (FULL) does not modify tuples the way freezing does; it just copies them around (or doesn't, if they are dead).
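As a sketch of the command that does advance the freeze horizon (assuming a reasonably recent PostgreSQL and the table name from the question):
-- Plain VACUUM advances relfrozenxid; the FREEZE option makes it freeze aggressively.
VACUUM (FREEZE, VERBOSE) examples;
-- Check the per-table freeze age afterwards.
SELECT relname, age(relfrozenxid)
FROM pg_class
WHERE relname IN ('examples', 'pg_toast_16450');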

PostgreSQL 9.3: Updating table (order column) from another table, getting same values in rows

I need help with updating a table from another table in a Postgres DB.
Long story short, we ended up with corrupted data in the DB, and now I need to update one table with values from another.
I have a table wfc with this data:
| step_id | command_id | commands_order |
|---------|------------|----------------|
| 1 | 1 | 0 |
| 1 | 2 | 1 |
| 1 | 3 | 2 |
| 1 | 4 | 3 |
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 2 | 3 | 1 |
| 2 | 3 | 1 |
| 2 | 4 | 3 |
and I want to update the values in the commands_order column from another table, so that I get a result like this:
| step_id | command_id | commands_order|
|---------|------------|---------------|
| 1 | 1 | 0 |
| 1 | 2 | 1 |
| 1 | 3 | 2 |
| 1 | 4 | 3 |
| 1 | 1 | 4 |
| 2 | 2 | 0 |
| 2 | 3 | 1 |
| 2 | 3 | 2 |
| 2 | 4 | 3 |
It looked like an easy task, but the problem is that for rows with the same command_id, the update writes the same value into commands_order.
The SQL I tried is:
UPDATE wfc
SET commands_order = CAST(sq.input_step_id AS INTEGER)
FROM (
    SELECT wfp.step_id, wfp.input_command_id, wfp.input_step_id
    FROM wfp
    ORDER BY wfp.step_id, wfp.input_step_id
) AS sq
WHERE wfc.step_id = sq.step_id
  AND wfc.command_id = CAST(sq.input_command_id AS INTEGER);
SQL Fiddle http://sqlfiddle.com/#!15/4efff4/4
I am pretty stuck with this, please help.
Thanks in advance.
Assuming you are trying to number the rows in the order in which they were created, and as long as you understand that ctid will change on UPDATE and with VACUUM FULL, you can do the following:
select step_id, command_id, rank - 1 as commands_order
from (select step_id, command_id, ctid as wfc_ctid,
             rank() over (partition by step_id order by ctid)
      from wfc) as wfc_ordered;
This will give you the wfc table with the ordering that you want. If you do update the original table, the ctids will change, so it's probably safer to create a copy of the table with the above query.
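A sketch of that safer route, assuming a new table name wfc_fixed is free to use:
-- Materialize the reordered rows instead of updating wfc in place,
-- since an UPDATE would change the very ctids the ordering relies on.
create table wfc_fixed as
select step_id, command_id,
       rank() over (partition by step_id order by ctid) - 1 as commands_order
from wfc;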

PostgreSQL simple count query

Trying to scale this down so the answer is simple. I can probably extrapolate the answers here to apply to a bigger data set.
Given the following table:
+------+-----+
| name | age |
+------+-----+
| a | 5 |
| b | 7 |
| c | 8 |
| d | 8 |
| e | 10 |
+------+-----+
I want to make a table that shows the count of people whose age is equal to or greater than x. For instance, the table above would produce:
+--------------+-------+
| at least age | count |
+--------------+-------+
| 5 | 5 |
| 6 | 4 |
| 7 | 4 |
| 8 | 3 |
| 9 | 1 |
| 10 | 1 |
+--------------+-------+
Is there a single query that can accomplish this task? Obviously, it is easy to write a simple function for it, but I'm hoping to be able to do this quickly with one query.
Thanks!
Yes, what you're looking for is a window function.
with cte_age_count as (
    select age,
           count(*) as c_star
    from people
    group by age)
select age,
       sum(c_star) over (order by age desc
                         range between unbounded preceding
                         and current row) as at_least_count
from cte_age_count
order by age;
Not syntax checked ... let me know if it works!
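If you also want rows for ages that no one has exactly, such as 6 and 9 in the sample output, here is a sketch (PostgreSQL) that drives the counts from generate_series instead:
-- One row per candidate age, counting everyone at least that old;
-- the age bounds are taken from the table itself.
select g.age as at_least_age,
       count(p.age) as count
from generate_series((select min(age) from people),
                     (select max(age) from people)) as g(age)
left join people p on p.age >= g.age
group by g.age
order by g.age;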