Why doesn't this rule prevent duplicate key violations? - postgresql

(postgresql) I was trying to COPY CSV data into a table, but I was getting duplicate key violation errors and there's no way to tell COPY to ignore those, so following internet wisdom I tried adding this rule:
CREATE OR REPLACE RULE ignore_duplicate_inserts AS
    ON INSERT TO mytable
    WHERE (EXISTS (SELECT mytable.id
                   FROM mytable
                   WHERE mytable.id = new.id)) DO NOTHING;
to circumvent the problem, but I still get those errors. Any ideas why?

Rules by default add things to the current action:
Roughly speaking, a rule causes additional commands to be executed when a given command on a given table is executed.
But an INSTEAD rule allows you to replace the action:
Alternatively, an INSTEAD rule can replace a given command by another, or cause a command not to be executed at all.
So, I think you want to specify INSTEAD:
CREATE OR REPLACE RULE ignore_duplicate_inserts AS
    ON INSERT TO mytable
    WHERE (EXISTS (SELECT mytable.id
                   FROM mytable
                   WHERE mytable.id = new.id)) DO INSTEAD NOTHING;
Without the INSTEAD, your rule is essentially saying "do the INSERT and then do nothing" when you want to say "instead of the INSERT, do nothing" and, AFAIK, the DO INSTEAD NOTHING will do that.
I'm not an expert on PostgreSQL rules but I think adding the "INSTEAD" should work.
UPDATE: Thanks to araqnid we know that:
COPY FROM will invoke any triggers and check constraints on the destination table. However, it will not invoke rules
So a rule isn't going to work in this situation. However, triggers are fired during COPY FROM so you could write a BEFORE INSERT trigger that would return NULL when it detected duplicate rows:
It can return NULL to skip the operation for the current row. This instructs the executor to not perform the row-level operation that invoked the trigger (the insertion or modification of a particular table row).
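A minimal sketch of such a trigger, assuming mytable with primary key id as in the question (the function and trigger names are made up):

CREATE OR REPLACE FUNCTION skip_duplicate_inserts() RETURNS trigger AS $$
BEGIN
    -- returning NULL from a BEFORE trigger skips the row operation
    IF EXISTS (SELECT 1 FROM mytable WHERE id = NEW.id) THEN
        RETURN NULL;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ignore_duplicates_trg
    BEFORE INSERT ON mytable
    FOR EACH ROW EXECUTE PROCEDURE skip_duplicate_inserts();

Note that the EXISTS check, like the one in the rule, is not atomic: a concurrent insert can still slip in between the check and the insert.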
That said, I think araqnid's "load it all into a temporary table, clean it up, and copy it to the final destination" approach would be a more sensible solution for a bulk loading operation like yours.

COPY FROM will not invoke rules (http://www.postgresql.org/docs/9.0/interactive/sql-copy.html#AEN58860)
My approach would be to load the CSV data into a temp table, then use an INSERT...SELECT statement to copy the data into the target table where it doesn't already exist. (If there are duplicates in the CSV data itself, remove those from the temp table first). Something like:
BEGIN;
CREATE TEMP TABLE stage_data(key_column, data_columns...) ON COMMIT DROP;
\copy stage_data from data.csv with csv header
-- prevent any other updates while we are merging input (omit this if you don't need it)
LOCK target_data IN SHARE ROW EXCLUSIVE MODE;
-- insert into target table
INSERT INTO target_data(key_column, data_columns...)
SELECT key_column, data_columns...
FROM stage_data
WHERE NOT EXISTS (SELECT 1 FROM target_data
                  WHERE target_data.key_column = stage_data.key_column);
COMMIT;
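On PostgreSQL 9.5 or later, the NOT EXISTS check could be replaced with ON CONFLICT DO NOTHING, which assumes key_column carries a unique or primary key constraint and is also safe against concurrent inserts, e.g.:

INSERT INTO target_data(key_column, data_columns...)
SELECT key_column, data_columns...
FROM stage_data
ON CONFLICT (key_column) DO NOTHING;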

Related

Cannot drop priorly modified new table in execute block

I'm not well acquainted with the FB database and its subtleties.
When executing the script, this problem occurs:
EXECUTE ibeblock
AS
BEGIN
    -- 1. Create temporary table
    execute statement 'recreate GLOBAL TEMPORARY table TMPTBL (ID bigint) /*on commit delete rows*/;';
    commit;
    -- 2. dummy fill of temporary table
    insert into tmptbl (ID) values (0xFE);
    commit; -- not necessary
    -- 3. perform some actions...
    -- 4. Delete temporary table
    execute statement 'drop table TMPTBL;';
    commit; -- FAILURE!
END
The idea of the script is simple: 1) create a temporary table; 2) fill it with records; 3) perform actions on other DB objects using the populated records; 4) drop the temp table.
For this simulation, step 3 is irrelevant and skipped. Step 4 leads to an error on commit: "This operation is not defined for system tables. unsuccessful metadata update. object TABLE "TMPTBL" is in use.".
Neither triggers nor constraints are defined on the table, so there should be nothing locking the temp table.
Please help with a resolution; hopefully I have just missed something.
P.S.: FB 2.5, with IBExpert 2017.12.13.1 used as the DB management tool.
There are a number of problems with your code:
A global temporary table is intended as a permanent object, it is just the content that is temporary (either for transaction or connection duration). So normally you would create a global temporary table once, and not drop it, but instead reuse its definition.
Although you technically can execute DDL using execute statement, you are not supposed to, and it is not guaranteed to work. Your code is specifically an example of one of the things that will not work.
The problem here is that you are trying to drop the table in the same transaction that used it (though, to be honest, I'm surprised the insert even worked, because normally you can't insert into a table that was created in the same transaction).
The insert you executed on TMPTBL will mark the table in use, and given the transaction isn't committed yet, you can't drop the table: it is in use.
You shouldn't call commit in PSQL code (to be honest, I thought this wasn't even possible).
In short, you need to rethink how you use global temporary tables: define the table once, and do not use execute statement to create it; create it separately.
If you do want to create and drop it and not retain the definition of the global temporary table, then create it before the execute block, commit, then the execute block (with only the inserts and the 'perform some actions'), commit, and then drop it (and commit).
Alternatively, you might get away with executing the create using execute statement ... with autonomous transaction, the inserts and the 'perform some actions' in another execute statement ... with autonomous transaction, and finally the drop in yet another execute statement ... with autonomous transaction. However, that makes your code very brittle, and it is not a recommended approach.
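A minimal sketch of the recommended flow as an isql script, keeping the TMPTBL name from the question (the EXECUTE BLOCK body is where the inserts and the 'perform some actions' would go):

-- create the GTT outside any block, as plain DDL
RECREATE GLOBAL TEMPORARY TABLE TMPTBL (ID bigint) ON COMMIT DELETE ROWS;
COMMIT;

SET TERM #;
EXECUTE BLOCK
AS
BEGIN
    insert into TMPTBL (ID) values (0xFE);
    -- perform some actions...
END#
SET TERM ;#
COMMIT;

DROP TABLE TMPTBL;
COMMIT;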
I have been forced again by the devops guys to find a robust solution for DB structure upgrades. Requirements: safely combine DDL and DML statements; allow creating temporary tables (for heavy selections); leave no garbage behind. Of course, the upgrade is handled within a single connection.
Following the clues given by Mark, I dug deeper and ran lots of experiments.
Here is a template script that actually worked (using the native isql utility):
SET TERM #;

-- 1. Create temporary table
EXECUTE BLOCK
AS
BEGIN
    execute statement 'recreate GLOBAL TEMPORARY table TMPTBL (ID bigint) /*on commit preserve rows*/;';
END#
commit#

-- Data manipulations
EXECUTE BLOCK
AS
    declare xid bigint;
BEGIN
    -- 2. dummy fill of temporary table
    begin
        insert into TMPTBL (ID) values (0xFE);
    end
    -- 3. perform some actions...
    for
        select tt.ID
        from TMPTBL tt
        into :xid
    do
    begin
        -- use :xid var
    end
END#
commit#

-- 4. Delete temporary table
EXECUTE BLOCK
AS
BEGIN
    execute statement 'drop table TMPTBL;';
END#
commit#

SET TERM ;#
Might be useful for someone.
Damn, Firebird does drive me crazy!

Delete rows from a table if table exists in Redshift otherwise ignore deletion

I am using Redshift. I want a query that deletes selected rows from a Redshift table if the table exists, and otherwise just ignores the statement.
Redshift's SQL dialect doesn't contain control-of-flow statements like IF...THEN, so you are not going to be able to do this in a single SQL statement.
Your application or process will need to first query the Redshift table metadata to determine if a table exists e.g.
select 1 from pg_tables where schemaname = 'myschema' and tablename = 'mytable';
If data is returned (i.e. the table exists) then the application or process will execute the delete statement, if no data is returned the application or process does nothing. Basically you need to handle the "if this then do this" logic externally to Redshift.
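As a sketch (the schema, table, and delete condition are placeholders), the application would run the check and, only if it returned a row, issue the delete:

-- step 1: does the table exist? (run from the application)
select 1 from pg_tables where schemaname = 'myschema' and tablename = 'mytable';

-- step 2: issued by the application only if step 1 returned a row
delete from myschema.mytable where some_condition;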
I recommend @Nathan's answer. I would use python/psycopg2 to set up this logic. The first query would check for the table's existence in pg_tables (e.g. SELECT count(1) FROM pg_tables WHERE tablename = 'foo') and store the result in a variable. Then you'd check the result of that variable to decide whether to kick off a second query (your delete).
But maybe you don't want to do it in Python; you're just all about Redshift (it's pretty sweet). You could just run the DELETE query in Redshift. If the table is not present, the query fails and nothing happens. If the table is there, you delete your data. There's no harm in generating an error here.

How to set Ignore Duplicate Key in Postgresql while table creation itself

I am creating a table in PostgreSQL 9.5 where id is the primary key. While inserting rows into the table, if anyone tries to insert a duplicate id, I want it to be ignored instead of raising an exception. Is there any way to specify, at table creation time, that duplicate entries should be ignored?
There are many techniques to resolve the duplicate insertion issue when writing the insertion query itself, e.g. using ON CONFLICT DO NOTHING or a WHERE EXISTS clause. But I want to handle this at the table creation end so that the person writing the insertion query doesn't need to bother with it.
Creating a RULE is one possible solution. Are there other possible solutions? Maybe something like this:
CREATE TABLE dbo.foo (bar int PRIMARY KEY WITH (FILLFACTOR=90, IGNORE_DUP_KEY = ON))
Although this exact statement doesn't work on PostgreSQL 9.5 on my machine.
Add a BEFORE INSERT trigger, or a rule ON INSERT ... DO INSTEAD; otherwise it has to be handled by the inserting query. Both solutions require extra resources on each insert.
An alternative is to expose a function with arguments for the insert that checks for duplicates, so end users call the function instead of writing an INSERT statement.
Note that a WHERE EXISTS sub-query is not atomic, by the way, so you can still get an exception after the check...
On 9.5, ON CONFLICT DO NOTHING is still the best solution.
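A minimal sketch of the function approach combined with ON CONFLICT DO NOTHING (the table foo and the function name are invented for illustration):

CREATE TABLE foo (id int PRIMARY KEY, payload text);

CREATE OR REPLACE FUNCTION insert_foo(p_id int, p_payload text)
RETURNS void AS $$
    -- the primary key's unique index arbitrates the conflict;
    -- rows with a duplicate id are silently skipped
    INSERT INTO foo (id, payload)
    VALUES (p_id, p_payload)
    ON CONFLICT (id) DO NOTHING;
$$ LANGUAGE sql;

-- end users call the function instead of INSERT:
SELECT insert_foo(1, 'first');
SELECT insert_foo(1, 'duplicate, silently ignored');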

PostgreSQL TEMP table alternating between exist and not exist

I'm using PostgreSQL 9.6.2, with Toad client on Mac. Auto-commit is set to ON.
I first created a simple temp table like this:
CREATE TEMP TABLE demo_pairs
AS
WITH t (name, value) AS (VALUES ('a', 'b'), ('c', 'd'))
SELECT * FROM t;
Then something weird happened when I ran:
SELECT * FROM demo_pairs;
Every time I run the select (without re-running the create), it alternates between successfully selecting the values and failing with an error that the table does not exist!
Can anyone help me understand what's going on?
https://www.postgresql.org/docs/current/static/sql-createtable.html
TEMPORARY or TEMP
If specified, the table is created as a temporary table. Temporary tables are automatically dropped at the end of a session, or optionally at the end of the current transaction (see ON COMMIT below). Existing permanent tables with the same name are not visible to the current session while the temporary table exists, unless they are referenced with schema-qualified names. Any indexes created on a temporary table are automatically temporary as well.
If you use a session pooler that can close the session under you, or the session is closed for some other reason (e.g. a network problem), the temp table will be dropped. You can also create the table so that it is dropped at the end of the transaction:
ON COMMIT
The behavior of temporary tables at the end of a transaction block can be controlled using ON COMMIT. The three options are:
PRESERVE ROWS
No special action is taken at the ends of transactions. This is the default behavior.
DELETE ROWS
All rows in the temporary table will be deleted at the end of each transaction block. Essentially, an automatic TRUNCATE is done at each commit.
DROP
The temporary table will be dropped at the end of the current transaction block.
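For example, a sketch reusing the demo_pairs table from the question (the explicit transaction matters, because with autocommit each statement is its own transaction):

BEGIN;
CREATE TEMP TABLE demo_pairs
ON COMMIT DROP
AS
WITH t (name, value) AS (VALUES ('a', 'b'), ('c', 'd'))
SELECT * FROM t;
SELECT * FROM demo_pairs;  -- works inside the transaction
COMMIT;
-- the table is dropped at COMMIT; selecting it now raises "relation does not exist"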

PostgreSQL bulk insert with ActiveRecord

I have a lot of records that originally came from MySQL. I massaged the data so that it will insert successfully into PostgreSQL using ActiveRecord. I can easily do this with row-by-row insertions, i.e. one row at a time, but this is very slow. I want to do a bulk insert, but that fails if any of the rows contains invalid data. Is there any way I can achieve a bulk insert where only the invalid rows fail instead of the whole batch?
COPY
When using SQL COPY for bulk insert (or its equivalent \copy in the psql client), failure is not an option. COPY cannot skip illegal lines. You have to match your input format to the table you import to.
If the data itself (not the decorators) violates your table definition, there are ways to make this a lot more tolerant. For instance: create a temporary staging table with all columns of type text, COPY to it, then fix offending rows with SQL commands before converting to the actual data types and inserting into the actual target table.
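A minimal sketch of that staging route, assuming a target table tbl(col int NOT NULL) (the names and the file are placeholders):

-- staging table: everything as text, so COPY cannot fail on bad values
CREATE TEMP TABLE tbl_stage (col text);
\copy tbl_stage from data.csv with csv

-- repair offending rows while they are still text
UPDATE tbl_stage SET col = '0' WHERE col IS NULL OR col = '';

-- cast and move into the real table
INSERT INTO tbl (col)
SELECT col::int
FROM tbl_stage;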
Consider this related answer:
How to bulk insert only new rows in PostreSQL
Or this more advanced case:
"ERROR: extra data after last expected column" when using PostgreSQL COPY
If NULL values are offending, remove the NOT NULL constraint from your target table temporarily. Fix the rows after COPY, then reinstate the constraint. Or take the route with the staging table, if you cannot afford to soften your rules temporarily.
Sample code:
ALTER TABLE tbl ALTER COLUMN col DROP NOT NULL;
COPY ...
-- repair, like ..
-- UPDATE tbl SET col = 0 WHERE col IS NULL;
ALTER TABLE tbl ALTER COLUMN col SET NOT NULL;
Or you can just fix the source file: COPY tells you the line number of the offending line. Use an editor of your preference to fix it, then retry. I like to use vim for that.
INSERT
For an INSERT (as mentioned in the comments), the check for NULL values is trivial:
To skip a row with a NULL value:
INSERT INTO tbl (col1, ...)
SELECT col1, ...
FROM ...
WHERE col1 IS NOT NULL;
To insert something else instead of a NULL value (an empty string in this example):
INSERT INTO tbl (col1, ...)
SELECT COALESCE(col1, ''), ...
FROM ...;
A common work-around for this is to import the data into a TEMPORARY or UNLOGGED table with no constraints and, where data in the input is sufficiently bogus, text-typed columns.
You can then run INSERT INTO ... SELECT queries against the data to populate the real table with one big query that cleans up the data during import. You can use a lot of CASE expressions for this; the idea is to transform the data in one pass.
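A sketch of what such a cleanup pass might look like (all table and column names, and the CASE rules, are invented for illustration):

INSERT INTO people (id, age)
SELECT
    id_text::int,
    -- normalize bogus ages in one pass
    CASE
        WHEN age_text ~ '^[0-9]+$' THEN age_text::int
        ELSE NULL  -- unparseable values become NULL
    END
FROM staging_people;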
You might be able to do many of the fixes in Ruby as you read the data in, then push the data to PostgreSQL using COPY ... FROM STDIN. This is possible with Ruby's Pg gem; see e.g. https://bitbucket.org/ged/ruby-pg/src/tip/sample/copyfrom.rb .
For more complicated cases, look at ETL tools such as Pentaho Kettle or Talend Studio.