How do I reload a table without deleting existing views? - amazon-redshift

I have a dataset that spans across many tables by date.
table_name_YYYY_MM_DD
There are many VIEWs created across these date-range tables. However, whenever I need to reload a table, I first have to drop all of these views to remove the dependency constraints:
DROP TABLE IF EXISTS table_name_YYYY_MM_DD cascade;
Is there a way to reload the table, as part of a transaction, such that the views don't need to be deleted? E.g., create a new table and swap the two tables' names, so that the transaction guarantees the views never have to be dropped.

Don't drop the table. Instead, truncate it:
TRUNCATE TABLE table_name_YYYY_MM_DD;
This removes all rows (quickly), but the table itself remains, so dependent views are not affected.
Afterwards, you need to insert the data back into the table, rather than recreating the table.
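For example, a minimal sketch of the full reload under this approach, assuming the data is reloaded with COPY (the S3 path and IAM role below are placeholders); note that in Redshift, TRUNCATE commits the current transaction immediately, so it can't be rolled back:
TRUNCATE TABLE table_name_YYYY_MM_DD;
-- reload the data; the source and credentials here are hypothetical
COPY table_name_YYYY_MM_DD
FROM 's3://your-bucket/path/to/data/'
IAM_ROLE 'arn:aws:iam::123456789012:role/your-redshift-role'
FORMAT AS CSV;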

Related

Postgres View, after alter table to change table name, View still queries it?

Using Postgres database. I have an existing table, and several existing Views that query that table.
Call the table, 'contacts'.
I altered the table, renaming it to 'contacts_backup', and then created a new table with the name the old table used to have, 'contacts'.
Now it appears that if I query the existing views, the data is still retrieved from the renamed table, contacts_backup, and not the new table, 'contacts'.
Can this be? How can I update the Views to query the new table of the same name, and not the renamed contacts_backup?
My new table is actually a foreign table, but shouldn't the principle be the same? I was expecting the existing views to query the new table, not the old renamed one.
What is an efficient way to update the existing views to query from the new table?
This is because PostgreSQL does not store the view definition as an SQL string, but as a parsed query tree.
These parsed query trees don't contain the names of the referenced objects, but only their object identifier (oid), which does not change when you rename an object.
The same is true for table columns. This all holds for foreign tables as well.
When you examine the view definition, for example with pg_get_viewdef, the parse tree is rendered as text, so you will see the changed names.
If you want to change the table that a view refers to, the only solution is to DROP the view and CREATE it again, or to use CREATE OR REPLACE VIEW.
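For example, a minimal sketch (the view name and column list are placeholders): re-issuing the definition re-parses it, so 'contacts' now resolves to the new table's oid:
CREATE OR REPLACE VIEW active_contacts AS
SELECT id, name
FROM contacts;  -- resolved against the current catalog, i.e. the new table
Note that CREATE OR REPLACE VIEW requires the new definition to produce the same output columns as the existing view (it may only add columns at the end).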

How do I avoid accidentally deleting postgres table rows?

I am using Postgres. It seems very easy to accidentally delete all of a table's data.
For example, I tried to delete one table row and accidentally deleted the whole table contents. This is no problem; the table contained mock data. However, the tables that I create in the future will contain important data.
Example of very easy deletion:
polls_db=# DELETE from polls_choice;
DELETE 8
Is there an easy way in Postgres to restore table data, or to soft delete table data whereby the data can be easily restored?
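One standard safeguard, as a minimal sketch (the WHERE predicate below is hypothetical): wrap destructive statements in an explicit transaction, so the result can be inspected and rolled back before it becomes permanent:
BEGIN;
DELETE FROM polls_choice WHERE id = 3;  -- hypothetical predicate
-- inspect the result before making it permanent
SELECT count(*) FROM polls_choice;
ROLLBACK;  -- or COMMIT once the result looks right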

Best practices for performing a table swap in Redshift

We're in the process of running a handful of hourly scripts on our Redshift cluster which build summary tables for data consumers. After assembling a staging table, each script runs a transaction which drops the existing table and replaces it with the staging table, as such:
BEGIN;
DROP TABLE IF EXISTS public.data_facts;
ALTER TABLE public.data_facts_stage RENAME TO data_facts;
COMMIT;
The problem with this operation is that long-running analysis queries will place an AccessShareLock on public.data_facts, preventing it from being dropped and thrashing our ETL cycle. I'm thinking a better solution would be one which renames the existing table, as such:
ALTER TABLE public.data_facts RENAME TO data_facts_old;
ALTER TABLE public.data_facts_stage RENAME TO data_facts;
DROP TABLE public.data_facts_old;
However, this approach presupposes that 1) public.data_facts exists, and 2) public.data_facts_old does not exist.
Do you know if there's a way to conduct this operation safely in SQL, without relying on application logic? (eg. something like ALTER TABLE IF EXISTS).
I haven't tried it, but looking at the documentation of CREATE VIEW, it seems this can be done with late-binding views.
The main idea would be a view public.data_facts that users interact with. Behind the scenes, you can load new data and then swap the view to “point” to the new table.
Bootstrap
-- load data into public.data_facts_v0
CREATE VIEW public.data_facts AS
SELECT * FROM public.data_facts_v0 WITH NO SCHEMA BINDING;
Update
-- load data into public.data_facts_v1
CREATE OR REPLACE VIEW public.data_facts AS
SELECT * FROM public.data_facts_v1 WITH NO SCHEMA BINDING;
DROP TABLE public.data_facts_v0;
The WITH NO SCHEMA BINDING clause makes the view late-binding. "A late-binding view doesn't check the underlying database objects, such as tables and other views, until the view is queried." This means the update can even introduce a table with renamed columns or a completely new structure.
Notes:
It might be a good idea to wrap the swap operations in a transaction, to make sure we don't drop the previous table if the view swap fails; a sketch follows below.
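A minimal sketch of that transactional swap, reusing the placeholder names from above:
BEGIN;
-- re-point the late-binding view at the freshly loaded table
CREATE OR REPLACE VIEW public.data_facts AS
SELECT * FROM public.data_facts_v1 WITH NO SCHEMA BINDING;
-- drop the old table only after the view swap has succeeded
DROP TABLE public.data_facts_v0;
COMMIT;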
You can add a new column, load_time timestamp encode runlength default getdate(), to your target table, and make your ETL do this:
INSERT INTO public.data_facts
SELECT * FROM public.data_facts_staging;
DELETE FROM public.data_facts
WHERE load_time < (SELECT MAX(load_time) FROM public.data_facts);
DROP TABLE public.data_facts_staging;
Note: public.data_facts_staging should have exactly the same structure as public.data_facts, except that the last column of public.data_facts is load_time, so that on insert it will be populated with the current timestamp.
The only drawbacks are that it requires extra disk space for the moment between inserting the new rows and deleting the old ones, and that load_time always has to be the last column. You also have to vacuum the table every time you do this.
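As a sketch, the one-time setup and the per-run maintenance might look like this (if your cluster rejects a function default on ADD COLUMN, define load_time at CREATE TABLE time instead):
-- one-time: add load_time as the last column of the target table
ALTER TABLE public.data_facts
ADD COLUMN load_time timestamp DEFAULT getdate() ENCODE runlength;
-- after each insert/delete cycle: reclaim the space of the deleted rows
VACUUM public.data_facts;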
Another good thing about this is that if your ETL fails and the staging table is empty or missing, you won't lose your data. In the pure-SQL scenario of swapping tables with DDL, you're not protected from dropping the target table when the staging table is missing. In the suggested scenario, if no new rows are inserted, the delete statement deletes nothing (no rows are older than the max load time), so the worst case is simply keeping the old version of the data.
P.S. There is a command that, instead of INSERT ... SELECT ..., just moves the data blocks from the staging table to the target table (ALTER TABLE ... APPEND FROM ...), but it requires the same kind of lock as ALTER TABLE, I believe, so I don't suggest it here.

postgresql copy from one table to another when the table updates

What I would like is this: when a row of a table is updated, a second table that duplicates the original table is updated as well. The problem is that the original table is a master table that depends on other tables. Any idea how to do this? I'm very new to PostgreSQL.
This is what triggers are for, assuming that the source and destination tables are in the same DB. In this case I think you need an AFTER INSERT OR UPDATE OR DELETE trigger.
http://www.postgresql.org/docs/current/static/plpgsql-trigger.html
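A minimal sketch, assuming a source table contacts with primary key id and a mirror table contacts_copy with identical columns (all names here are placeholders):
CREATE OR REPLACE FUNCTION mirror_contacts() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    INSERT INTO contacts_copy VALUES (NEW.*);
  ELSIF TG_OP = 'UPDATE' THEN
    DELETE FROM contacts_copy WHERE id = OLD.id;
    INSERT INTO contacts_copy VALUES (NEW.*);
  ELSIF TG_OP = 'DELETE' THEN
    DELETE FROM contacts_copy WHERE id = OLD.id;
  END IF;
  RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER contacts_mirror
AFTER INSERT OR UPDATE OR DELETE ON contacts
FOR EACH ROW EXECUTE PROCEDURE mirror_contacts();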

PostgreSQL: dynamic row values (?)

Oh helloes!
I have two tables, first one (let's call it NameTable) is preset with a set of values (id, name) and the second one (ListTable) is empty but with same columns.
The question is: how can I insert into ListTable a value that comes from NameTable, so that if I change a name in NameTable, the corresponding values in ListTable are automagically updated as well?
Is there an INSERT for this, or do the tables have to be created in some special manner?
Tried browsing the manual but without success :(
The suggestion to use INSERT ... SELECT is the best method for moving data between tables in the same database.
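For reference, that one-time load would look something like this, given the (id, name) columns described in the question:
INSERT INTO ListTable (id, name)
SELECT id, name
FROM NameTable;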
However, there's another way to deal with the auto-update requirement.
It sounds like these are your criteria:
Table A is defined with columns (x,y)
(x,y) is unique
Table B is also defined with columns (x,y)
Table A is a superset of Table B
Table B is to be loaded with data from Table A and needs to remain in sync with UPDATEs on Table A.
This is a job for a FOREIGN KEY with the option ON UPDATE CASCADE:
ALTER TABLE B ADD FOREIGN KEY (x,y) REFERENCES A (x,y) ON UPDATE CASCADE;
Now, not only will Table B auto-update when Table A is updated, Table B is also protected against containing (x,y) pairs that do not exist in Table A. If you want records to auto-delete from Table B when they are deleted from Table A, add ON DELETE CASCADE.
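A self-contained sketch of the cascade behavior, using the criteria above:
CREATE TABLE A (x int, y text, UNIQUE (x, y));
CREATE TABLE B (
  x int,
  y text,
  FOREIGN KEY (x, y) REFERENCES A (x, y)
    ON UPDATE CASCADE ON DELETE CASCADE
);

INSERT INTO A VALUES (1, 'old');
INSERT INTO B VALUES (1, 'old');
UPDATE A SET y = 'new' WHERE x = 1;  -- B's row becomes (1, 'new') as well
DELETE FROM A WHERE x = 1;           -- B's matching row is deleted too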
Hmmm... I'm a bit confused about exactly what you want to do or why, but here are a few pointers toward things you might want to take a look at: table inheritance, triggers, and rules.
Table inheritance in PostgreSQL allows a table to share the data of another table. If you add a row to the base table, it won't show up in the inherited table; but if you add a row to the inherited table, it will show up in both tables, and updates in either place will be reflected in both (see the sketch at the end of this answer).
Triggers allow you to set up code that will run when insert, update, or delete operations happen on a table. This would let you add the behavior you describe manually.
Rules allow you to set up a rule that replaces a matching query with an alternative query when a specific condition is met.
If you describe your problem further as in why you want this behavior, it might be easier to suggest the right way to go about things :-)
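A minimal sketch of the inheritance behavior described above (table names are illustrative):
CREATE TABLE base_names (id int, name text);
CREATE TABLE list_names () INHERITS (base_names);

INSERT INTO list_names VALUES (1, 'foo');
SELECT * FROM base_names;       -- includes the row from list_names
SELECT * FROM ONLY base_names;  -- excludes rows from inheriting tables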