TSQL Constraints on Temp Tables - tsql

Very quick and simple question. I am running a script to import data and have declared a temp table and applied check constraints to that table. Obviously if the script is run more than once I check whether the temp table already exists and if so, I drop and recreate the temp table. would that also drop and recreate the check constraints I placed on the temp table?
Logic says yes but I have been known to be wrong about such things.

Yes - dropping the temp table will drop any internal dependencies as well, including your constraints.

Yes, dropping the table will drop constraints, indexes etc that you may have created on it.
Also, if this is something you're doing repeatedly you may want to truncate it instead of drop/recreate it - that is typically a lot faster. (And sometimes, creating a "permanent" temp table can be a good idea for frequently occuring ops)

Related

Huge delete on PostgreSQL table : Deleting 99,9% of the rows of the table

I have a table in my PostgreSQL database that became huge, filled with a lot of useless rows.
As these useless rows represent 99.9% of my table data (about 3.3M rows), I was wondering if deleting them could have a bad impact on my DB :
I know that this operation could take some time and I will be able to block writes on the table during the maintenance operation
But I was wondering if this huge change in the data could also impact performance after the opertation itself.
I found solutions like creating a new table / using TRUNCATE to drop all lines but as this operation will be specific and one shot, I would like to be able to choose the most adapted solution.
I know that Postgre SQL has a VACUUM mechanism but I'm not a DBA expert : Could anyone please confirm that this delete will not impact my table integrity / data structure and that freed space will be reclaimed if needed for new data ?
PostgreSQL 11.12, with default settings on AWS RDS. I don't have any index on my table and the criteria for rows deletion will not be based on the PK
Deleting rows typically does not shrink a PostgreSQL table, sou you would then have to run VACUUM (FULL) to compact it, during which the table is inaccessible.
If you are deleting many rows, both the DELETE and the VACUUM (FULL) will take a long time, and you would be much better off like this:
create a new table that is defined like the old one
INSERT INTO new_tab SELECT * FROM old_tab WHERE ... to copy over the rows you want to keep
drop foreign key constraints that point to the old table
create all indexes and constraints on the new table
drop the old table and rename the new one
By planning that carefully, you can get away with a short down time.

Best practices for performing a table swap in Redshift

We're in the process of running a handful of hourly scripts on our Redshift cluster which build summary tables for data consumers. After assembling a staging table, the script then runs a transaction which deletes the existing table and replaces it with the staging table, as such:
BEGIN;
DROP TABLE IF EXISTS public.data_facts;
ALTER TABLE public.data_facts_stage RENAME TO data_facts;
COMMIT;
The problem with this operation is that long-running analysis queries will place an AccessShareLock on public.data_facts, preventing it from being dropped and thrashing our ETL cycle. I'm thinking a better solution would be one which renames the existing table, as such:
ALTER TABLE public.data_facts RENAME TO data_facts_old;
ALTER TABLE public.data_facts_stage RENAME TO data_facts;
DROP TABLE public.data_facts_old;
However, this approach presupposes that 1) public.data_facts exists, and 2) public.data_facts_old does not exist.
Do you know if there's a way to conduct this operation safely in SQL, without relying on application logic? (eg. something like ALTER TABLE IF EXISTS).
I haven't tried it but looking at the documentation of CREATE VIEW it seems that this can be done with late-binding views.
The main idea would be a view public.data_facts that users interact with. Behind the scenes, you can load new data and then swap the view to “point” to the new table.
Bootstrap
-- load data into public.data_facts_v0
CREATE VIEW public.data_facts AS
SELECT * from public.data_facts_v0 WITH NO SCHEMA BINDING;
Update
-- load data into public.data_facts_v1
CREATE OR REPLACE VIEW public.data_facts AS
SELECT * from public.data_facts_v1 WITH NO SCHEMA BINDING;
DROP TABLE public.data_facts_v0;
The WITH NO SCHEMA BINDING means the view will be late-binding. “A late-binding view doesn't check the underlying database objects, such as tables and other views, until the view is queried.” This means the update can even introduce a table with renamed columns or a completely new structure.
Notes:
It might be a good idea to wrap the swap operations into a transaction to make sure we don't drop the previous table if the VIEW swap failed.
You can add a new load time timestamp encode runlength default getdate() column to your target table, and make your ETL do this:
INSERT INTO public.data_facts
SELECT * FROM public.data_facts_staging;
DELETE FROM public.data_facts
WHERE load_time<(select max(load_time) from public.data_facts);
DROP TABLE public.data_facts_staging;
note: public.data_facts_staging should have exactly the same structure as public.data_facts except that the last column of public.data_facts is load_time, so that on insert it will be populated with the current timestamp.
The only implication is that it would require extra disk space for a moment between you insert new rows and delete the old rows, and load_time has to be always the last column. Also you have to vaccum table every time you do this.
Another good thing about this is that if your ETL fails and staging table is empty or there is no staging table you won't lose your data. In the pure SQL scenario of swapping tables with DDL you're not protected from dropping the target table when staging table is missing. In the suggested scenario if no new rows are inserted the delete statement deletes nothing (there are no rows less than max load time), so worst case is just having the old version of data.
p.s. there is a command that instead of insert ... select ... just changes the pointer from staging to target table (alter table ... append from ...) but it requires the same type of lock as alter table I guess, so I don't suggest this

Implications of using ADD COLUMN on large dataset

Docs for Redshift say:
ALTER TABLE locks the table for reads and writes until the operation completes.
My question is:
Say I have a table with 500 million rows and I want to add a column. This sounds like a heavy operation that could lock the table for a long time - yes? Or is it actually a quick operation since Redshift is a columnar db? Or it depends if column is nullable / has default value?
I find that adding (and dropping) columns is a very fast operation even on tables with many billions of rows, regardless of whether there is a default value or it's just NULL.
As you suggest, I believe this is a feature of the it being a columnar database so the rest of the table is undisturbed. It simply creates empty (or nearly empty) column blocks for the new column on each node.
I added an integer column with a default to a table of around 65M rows in Redshift recently and it took about a second to process. This was on a dw2.large (SSD type) single node cluster.
Just remember you can only add a column to the end (right) of the table, you have to use temporary tables etc if you want to insert a column somewhere in the middle.
Personally I have seen rebuilding the table works best.
I do it in following ways
Create a new table N_OLD_TABLE table
Define the datatype/compression encoding in the new table
Insert data into N_OLD(old_columns) select(old_columns) from old_table Rename OLD_Table to OLD_TABLE_BKP
Rename N_OLD_TABLE to OLD_TABLE
This is a much faster process. Doesn't block any table and you always have a backup of old table incase anything goes wrong

db2 reorganize a table

When I alter a table in db2, I have to reorganize it
so I execute the next query:
Call Sysproc.admin_cmd ('reorg Table myTable');
I m searching an appropriate solution to reorganize a table when it s altered, or reorganize all the schema after making various modifications
You can determine when tables will require a REORG by looking at SYSIBMADM.ADMINTABINFO:
select tabschema, tabname
from sysibmadm.admintabinfo
where reorg_pending = 'Y'
You may also want to look at the NUM_REORG_REC_ALTERS column as this may show you additional tables that don't require reorganization due to various ALTER TABLE statements.
The reorg operation is similar to a defrag in hard disk. It frees empty spaces in pages, and eventually it could reorganize data according to an index. Depending on the features, it creates the compression dictionary and compress data.
As you can see, reorg operation is an administrative task, and it is not necessary each time data is modified. A database could run without reorg.
It order to ease this, DB2 included autonomic features like automatic backup, however this doesn't answer you own question. This will only trigger reorg on tables that need that.
To reorg a table explicitly you need to execute the command reorg http://publib.boulder.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0001966.html
or via the admin_cmd http://publib.boulder.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.sql.rtn.doc/doc/r0023582.html
in db2 config we have:
Automatic reorganization (AUTO_REORG) = OFF
we can set auto_reorg to on

PostgreSQL v7.4 ALTER TABLE to change column

I have a need to change the length of CHAR columns in tables in a PostgreSQL v7.4 database. This version did not support the ability to directly change the column type or size using the ALTER TABLE statement. So, directly altering a column from a CHAR(10) to CHAR(20) for instance isn't possible (yeah, I know, "use varchars", but that's not an option in my current circumstance). Anyone have any advice/tricks on how to best accomplish this? My initial thoughts:
-- Save the table's data in a new "save" table.
CREATE TABLE save_data AS SELECT * FROM table_to_change;
-- Drop the columns from the first column to be changed on down.
ALTER TABLE table_to_change DROP column_name1; -- for each column starting with the first one that needs to be modified
ALTER TABLE table_to_change DROP column_name2;
...
-- Add the columns back, using the new size for the CHAR column
ALTER TABLE table_to_change ADD column_name1 CHAR(new_size); -- for each column dropped above
ALTER TABLE table_to_change ADD column_name2...
-- Copy the data bace from the "save" table
UPDATE table_to_change
SET column_name1=save_data.column_name1, -- for each column dropped/readded above
column_name2=save_date.column_name2,
...
FROM save_data
WHERE table_to_change.primary_key=save_data.primay_key;
Yuck! Hopefully there's a better way? Any suggestions appreciated. Thanks!
Not PostgreSQL, but in Oracle I have changed a column's type by:
Add a new column with a temporary name (ie: TMP_COL) and the new data type (ie: CHAR(20))
run an update query: UPDATE TBL SET TMP_COL = OLD_COL;
Drop OLD_COL
Rename TMP_COL to OLD_COL
I would dump the table contents to a flat file with COPY, drop the table, recreate it with the correct column setup, and then reload (with COPY again).
http://www.postgresql.org/docs/7.4/static/sql-copy.html
Is it acceptable to have downtime while performing this operation? Obviously what I've just described requires making the table unusable for a period of time, how long depends on the data size and hardware you're working with.
Edit: But COPY is quite a bit faster than INSERTs and UPDATEs. According to the docs you can make it even faster by using BINARY mode. BINARY makes it less compatible with other PGSQL installs but you won't care about that because you only want to load the data to the same instance that you dumped it from.
The best approach to your problem is to upgrade pg to something less archaic :)
Seriously. 7.4 is going to be removed from "supported versions" pretty soon, so I wouldn't wait for it to happen with 7.4 in production.