postgresql copy from one table to another when the table updates - postgresql

What I would like is that when a row of one table is updated, a second table that duplicates the original table is updated as well. The problem is that the original table is a master table that depends on other tables. Any idea how to do this? I'm very new to PostgreSQL.

This is what triggers are for, assuming that the source and destination tables are in the same DB. In this case I think you need an AFTER INSERT OR UPDATE OR DELETE trigger.
http://www.postgresql.org/docs/current/static/plpgsql-trigger.html
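A minimal sketch of such a trigger, assuming a source table master_table with primary key id and a duplicate table master_table_copy with the same columns (all names here are placeholders, not from the question):
CREATE OR REPLACE FUNCTION sync_master_copy() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO master_table_copy VALUES (NEW.*);
    ELSIF TG_OP = 'UPDATE' THEN
        DELETE FROM master_table_copy WHERE id = OLD.id;
        INSERT INTO master_table_copy VALUES (NEW.*);
    ELSIF TG_OP = 'DELETE' THEN
        DELETE FROM master_table_copy WHERE id = OLD.id;
    END IF;
    RETURN NULL; -- the return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER master_table_sync
AFTER INSERT OR UPDATE OR DELETE ON master_table
FOR EACH ROW EXECUTE PROCEDURE sync_master_copy();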

Related

Postgres Select FOR UPDATE NOWAIT

I have a table called account, and I'm selecting a record from this table that I want to be locked, so I'm using SELECT ... FOR UPDATE NOWAIT.
The problem is that when I try to insert a record into another table (transaction), which has a foreign key to this table, the insert takes forever, so I know that it's somehow related to the lock.
Is this the desired functionality?
I'm only inserting a record into another table; the other query might update the locked record but will not delete it, so why shouldn't I be able to insert a relation?
And what is the solution?
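A rough sketch of the scenario being described, with assumed column names (id, account_id, amount), since the exact schema isn't given:
-- Session 1: lock one account row; the lock is held until COMMIT/ROLLBACK.
BEGIN;
SELECT * FROM account WHERE id = 1 FOR UPDATE NOWAIT;

-- Session 2: the foreign-key check on transaction.account_id needs a
-- row-level lock on the referenced account row, which conflicts with
-- FOR UPDATE, so this insert blocks until session 1 ends.
INSERT INTO transaction (account_id, amount) VALUES (1, 100.00);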

How do I copy data from one table to another, perform schema change and keep them in sync until cut off in Postgres?

I have workloads with heavy schema changes and other ETL operations that take locks.
Before doing schema changes on my primary table, I would like to first copy the existing contents from the primary table on to a temporary table, then perform the schema change, then sync all new changes and once the "time is right" (cutoff?), do the cut over and have the temporary table become the primary table.
I know that I can use Triggers in postgres to sync data between two tables, and also use COPY to copy data from one table to another.
But I am not sure how I can copy the existing data first, then set up the trigger so that no data is lost, and then do the cutoff so that the new table becomes the primary.
What I am thinking is -
I issue a COPY table from primary table (TableA) to temp table TableB.
I then perform the schema change in TableB
I then setup Trigger from TableA to TableB for INSERT/UPDATE/DELETE
... Now I am not sure how I can cut over so that TableB becomes TableA. Can I use RENAME, perhaps?
It feels like I could lose changes made between Step 1 and Step 3, before the trigger is in place?
Basically I am trying to ensure no data is lost between the three high-level operations. Is there a better way to do this?
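A rough sketch of that sequence, keeping the TableA/TableB names from the question (new_col and the exact schema change are placeholders; the sync trigger body is omitted and would mirror the AFTER INSERT OR UPDATE OR DELETE trigger sketched earlier on this page):
-- 1. Copy the existing contents; CREATE TABLE AS is one way to do the initial copy.
CREATE TABLE TableB AS SELECT * FROM TableA;

-- 2. Perform the schema change on TableB (placeholder change shown).
ALTER TABLE TableB ADD COLUMN new_col text;

-- 3. Install the sync trigger from TableA to TableB here.
--    Note: rows changed in TableA between steps 1 and 3 are not captured;
--    briefly blocking writes across these steps is one way to close that gap.

-- 4. Cut over: swap the names in one transaction. RENAME takes an
--    ACCESS EXCLUSIVE lock, so other sessions always see exactly one TableA.
BEGIN;
ALTER TABLE TableA RENAME TO TableA_old;
ALTER TABLE TableB RENAME TO TableA;
COMMIT;

-- 5. Once the new table has been verified, drop the old one.
DROP TABLE TableA_old;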

How do I reload a table without deleting existing views?

I have a dataset that spans across many tables by date.
table_name_YYYY_MM_DD
There are many VIEWs created across these date-range tables. However, whenever I need to reload a table, I have to delete all these views to remove the dependency constraints:
DROP TABLE IF EXISTS table_name_YYYY_MM_DD cascade;
Is there a way to reload the table, as part of a transaction, where the views don't need to be deleted? E.g. create a new table and swap their names, so that the transaction guarantees the views don't have to be dropped.
Don't drop the table. Instead, truncate it:
truncate table table_name_YYYY_MM_DD
This removes all rows (quickly), but the table remains. So other dependencies are not affected.
Afterwards, you need to insert the data back into the table, rather than recreating the table.
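A minimal sketch of that reload inside one transaction (where the new rows come from is an assumption; the question doesn't say):
BEGIN;
TRUNCATE TABLE table_name_YYYY_MM_DD;
-- Reload the rows; COPY ... FROM or INSERT ... SELECT both work here.
INSERT INTO table_name_YYYY_MM_DD
SELECT * FROM staging_table_name_YYYY_MM_DD;
COMMIT;
Since TRUNCATE is transactional in PostgreSQL, a failure before COMMIT rolls back to the old contents, and the dependent views are never touched.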

Best practices for performing a table swap in Redshift

We're in the process of running a handful of hourly scripts on our Redshift cluster which build summary tables for data consumers. After assembling a staging table, the script runs a transaction that drops the existing table and replaces it with the staging table, like so:
BEGIN;
DROP TABLE IF EXISTS public.data_facts;
ALTER TABLE public.data_facts_stage RENAME TO data_facts;
COMMIT;
The problem with this operation is that long-running analysis queries will place an AccessShareLock on public.data_facts, preventing it from being dropped and thrashing our ETL cycle. I'm thinking a better solution would be one which renames the existing table, as such:
ALTER TABLE public.data_facts RENAME TO data_facts_old;
ALTER TABLE public.data_facts_stage RENAME TO data_facts;
DROP TABLE public.data_facts_old;
However, this approach presupposes that 1) public.data_facts exists, and 2) public.data_facts_old does not exist.
Do you know if there's a way to conduct this operation safely in SQL, without relying on application logic? (e.g. something like ALTER TABLE IF EXISTS).
I haven't tried it but looking at the documentation of CREATE VIEW it seems that this can be done with late-binding views.
The main idea would be a view public.data_facts that users interact with. Behind the scenes, you can load new data and then swap the view to “point” to the new table.
Bootstrap
-- load data into public.data_facts_v0
CREATE VIEW public.data_facts AS
SELECT * from public.data_facts_v0 WITH NO SCHEMA BINDING;
Update
-- load data into public.data_facts_v1
CREATE OR REPLACE VIEW public.data_facts AS
SELECT * from public.data_facts_v1 WITH NO SCHEMA BINDING;
DROP TABLE public.data_facts_v0;
The WITH NO SCHEMA BINDING means the view will be late-binding. “A late-binding view doesn't check the underlying database objects, such as tables and other views, until the view is queried.” This means the update can even introduce a table with renamed columns or a completely new structure.
Notes:
It might be a good idea to wrap the swap operations in a transaction to make sure we don't drop the previous table if the view swap fails.
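A hedged sketch of that transactional swap, continuing the v0/v1 naming from the update step above:
BEGIN;
CREATE OR REPLACE VIEW public.data_facts AS
SELECT * FROM public.data_facts_v1 WITH NO SCHEMA BINDING;
DROP TABLE public.data_facts_v0;
COMMIT;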
You can add a new load_time column, defined as timestamp ENCODE RUNLENGTH DEFAULT GETDATE(), to your target table, and make your ETL do this:
INSERT INTO public.data_facts
SELECT * FROM public.data_facts_staging;
DELETE FROM public.data_facts
WHERE load_time<(select max(load_time) from public.data_facts);
DROP TABLE public.data_facts_staging;
Note: public.data_facts_staging should have exactly the same structure as public.data_facts, except that the last column of public.data_facts is load_time, so that on insert it is populated with the current timestamp.
The only implication is that it requires extra disk space for the window between inserting the new rows and deleting the old ones, and load_time always has to be the last column. Also, you have to VACUUM the table every time you do this.
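For reference, a hedged sketch of adding that column to an existing target table (Redshift syntax):
ALTER TABLE public.data_facts
ADD COLUMN load_time TIMESTAMP DEFAULT GETDATE() ENCODE RUNLENGTH;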
Another good thing about this is that if your ETL fails and the staging table is empty, or there is no staging table at all, you won't lose your data. In the pure SQL scenario of swapping tables with DDL, you're not protected from dropping the target table when the staging table is missing. In the suggested scenario, if no new rows are inserted, the delete statement deletes nothing (there are no rows older than the max load_time), so the worst case is just keeping the old version of the data.
P.S. There is a command that, instead of INSERT ... SELECT ..., just changes the pointer from the staging table to the target table (ALTER TABLE ... APPEND FROM ...), but I guess it requires the same type of lock as ALTER TABLE, so I don't suggest it.

PostgreSQL v7.4 ALTER TABLE to change column

I have a need to change the length of CHAR columns in tables in a PostgreSQL v7.4 database. This version did not support the ability to directly change the column type or size using the ALTER TABLE statement. So, directly altering a column from a CHAR(10) to CHAR(20) for instance isn't possible (yeah, I know, "use varchars", but that's not an option in my current circumstance). Anyone have any advice/tricks on how to best accomplish this? My initial thoughts:
-- Save the table's data in a new "save" table.
CREATE TABLE save_data AS SELECT * FROM table_to_change;
-- Drop the columns from the first column to be changed on down.
ALTER TABLE table_to_change DROP column_name1; -- for each column starting with the first one that needs to be modified
ALTER TABLE table_to_change DROP column_name2;
...
-- Add the columns back, using the new size for the CHAR column
ALTER TABLE table_to_change ADD column_name1 CHAR(new_size); -- for each column dropped above
ALTER TABLE table_to_change ADD column_name2...
-- Copy the data back from the "save" table
UPDATE table_to_change
SET column_name1=save_data.column_name1, -- for each column dropped/readded above
column_name2=save_data.column_name2,
...
FROM save_data
WHERE table_to_change.primary_key=save_data.primary_key;
Yuck! Hopefully there's a better way? Any suggestions appreciated. Thanks!
Not PostgreSQL, but in Oracle I have changed a column's type by:
Add a new column with a temporary name (e.g. TMP_COL) and the new data type (e.g. CHAR(20))
Run an update query: UPDATE TBL SET TMP_COL = OLD_COL;
Drop OLD_COL
Rename TMP_COL to OLD_COL
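The same idea written against the table from the question, as a hedged sketch (assuming column_name1 is the CHAR(10) column being widened to CHAR(20)):
ALTER TABLE table_to_change ADD tmp_col CHAR(20);
UPDATE table_to_change SET tmp_col = column_name1;
ALTER TABLE table_to_change DROP column_name1;
ALTER TABLE table_to_change RENAME tmp_col TO column_name1;
Note that the widened column ends up at the end of the column list.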
I would dump the table contents to a flat file with COPY, drop the table, recreate it with the correct column setup, and then reload (with COPY again).
http://www.postgresql.org/docs/7.4/static/sql-copy.html
Is it acceptable to have downtime while performing this operation? Obviously what I've just described requires making the table unusable for a period of time; how long depends on the data size and the hardware you're working with.
Edit: But COPY is quite a bit faster than INSERTs and UPDATEs. According to the docs you can make it even faster by using BINARY mode. BINARY makes it less compatible with other PGSQL installs but you won't care about that because you only want to load the data to the same instance that you dumped it from.
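A rough sketch of that dump-and-reload, with a placeholder file path and a guessed table layout (the real column list isn't shown in the question):
-- Dump the current contents to a flat file on the server.
COPY table_to_change TO '/tmp/table_to_change.dat';

-- Recreate the table with the wider CHAR columns.
DROP TABLE table_to_change;
CREATE TABLE table_to_change (
    primary_key   integer PRIMARY KEY,
    column_name1  char(20),
    column_name2  char(20)
);

-- Reload the data.
COPY table_to_change FROM '/tmp/table_to_change.dat';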
The best approach to your problem is to upgrade pg to something less archaic :)
Seriously. 7.4 is going to be removed from "supported versions" pretty soon, so I wouldn't wait for it to happen with 7.4 in production.