Need a bulk insert tip - sql-server-2008-r2

I need to insert a table data into another table. Where it is not guaranteed that the source table have all rows correctly where some of the not null fields are having null values. So with this source table I need to enter all valid rows into the table and find all unvalid rows which failed to insert and return them.
I know we can do this by validating all rows before hand. But as this is a bulk insert from a csv and parsed by .net code so from db we wil not validate it but directly enter.
We can also do this by running a loop but performance might hit.
so my question is is any way where we can use a single statement for insert and skip rows which has a problem and insert which are valid.

BULK INSERT is all-or-nothing. SQL Server does not have the ability to shunt erroneous rows into a separate table, alas.
The best thing you can do is to validate all data thoroughly before inserting it. If the insert still fails (maybe due to a bug) you need to retry all rows one-by-one and log the errors that are occurring.
You can also bulk insert to a temp table and move the rows from there to the final table one-by-one.

Related

PostgreSQL: Return auto-generated ids from COPY FROM insertion

I have a non-empty PostgreSQL table with a GENERATED ALWAYS AS IDENTITY column id. I do a bulk insert with the C++ binding pqxx::stream_to, which I'm assuming uses COPY FROM. My problem is that I want to know the ids of the newly created rows, but COPY FROM has no RETURNING clause. I see several possible solutions, but I'm not sure if any of them is good, or which one is the least bad:
Provide the ids manually through COPY FROM, taking care to give the values which the identity sequence would have provided, then afterwards synchronize the sequence with setval(...).
First stream the data to a temp-table with a custom index column for ordering. Then do something likeINSERT INTO foo (col1, col2)
SELECT ttFoo.col1, ttFoo.col2 FROM ttFoo
ORDER BY ttFoo.idx RETURNING foo.id
and depend on the fact that the identity sequence produces ascending numbers to correlate them with ttFoo.idx (I cannot do RETURNING ttFoo.idx too because only the inserted row is available for that which doesn't contain idx)
Query the current value of the identity sequence prior to insertion, then check afterwards which rows are new.
I would assume that this is a common situation, yet I don't see an obviously correct solution. What do you recommend?
You can find out which rows have been affected by your current transaction using the system columns. The xmin column contains the ID of the inserting transaction, so to return the id values you just copied, you could:
BEGIN;
COPY foo(col1,col2) FROM STDIN;
SELECT id FROM foo
WHERE xmin::text = (txid_current() % (2^32)::bigint)::text
ORDER BY id;
COMMIT;
The WHERE clause comes from this answer, which explains the reasoning behind it.
I don't think there's any way to optimise this with an index, so it might be too slow on a large table. If so, I think your second option would be the way to go, i.e. stream into a temp table and INSERT ... RETURNING.
I think you can create id with type is uuid.
The first step, you should random your ids after that bulk insert them, by this way your will not need to return ids from database.

Insert bulk data in postreSQL / TimescaleDB and manage errors

I have a script that select rows from InfluxDB, and bulk insert it into TimescaleDB.
I am inserting data each 2000 rows, to make it faster.
Thing is when I get one error, all 2000 rows is ignored. Is it possible to insert the 1999 rows, and ignore the failing one ?
Since PostgreSQL implements ACID transactions, the entire transaction is rollbacked on an error. The minimal granularity for transaction is one statement, e.g., INSERT INTO statement with batch of values, and this is default. So if failure happens, it is not possible to ignore it and commit the rest.
I assume you use INSERT INTO statement. It provides ON CONFLICT clause, which can be used in the case if the observed error is due to conflict.
Another work around is to move into a temporal table and then insert into hypertable after cleaning.
BTW, have you looked to Outflux tool from Timescale if it can help?

Postgres Insert as select from table ignore any errors

I am trying to bulk load records from a temp table to table using insert as select stmt and on conflict strategy do update.
I want load as many records as possible, currently if there any any foreign key violations no records get inserted, everything gets rolled back. Is there a way to insert valid records and skip the faulty records.
In https://dba.stackexchange.com/a/46477 I saw a strategy of going with the foreign table in the query to ignore the faulty rows. I don't want to do that too as I may have many foreign keys on that table and it will make my query more complex and table specific. I would like it to be generic.
Sample use case, if have 100 rows in the temp table and suppose row number 5 and 7 are causing insertion failure, I want to insert the rest 98 records and identify which two rows failed.
I want to avoid inserting record by record and catch the error, as it is not efficient. I am doing this whole exercise to avoid loading a table row by row.
Oracle provides support to catch bulk errors at a shot.
Sample https://stackoverflow.com/a/36430893/8575780
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1422998100346727312
I have already explored options loading using copy, it catches not null constraints and other data type errors, but when foreign key violation happens nothing gets committed.
I am looking something closer to what pgloader is doing when it faces error.
https://pgloader.readthedocs.io/en/latest/pgloader.html#batches-and-retry-behaviour

PostgreSQL bulk insert with ActiveRecord

I've a lot of records that are originally from MySQL. I massaged the data so it will be successfully inserted into PostgreSQL using ActiveRecord. This I can easily do with insertions on row basis i.e one row at a time. This is very slow I want to do bulk insert but this fails if any of the rows contains invalid data. Is there anyway I can achieve bulk insert and only the invalid rows failing instead of the whole bulk?
COPY
When using SQL COPY for bulk insert (or its equivalent \copy in the psql client), failure is not an option. COPY cannot skip illegal lines. You have to match your input format to the table you import to.
If data itself (not decorators) is violating your table definition, there are ways to make this a lot more tolerant though. For instance: create a temporary staging table with all columns of type text. COPY to it, then fix offending rows with SQL commands before converting to the actual data type and inserting into the actual target table.
Consider this related answer:
How to bulk insert only new rows in PostreSQL
Or this more advanced case:
"ERROR: extra data after last expected column" when using PostgreSQL COPY
If NULL values are offending, remove the NOT NULL constraint from your target table temporarily. Fix the rows after COPY, then reinstate the constraint. Or take the route with the staging table, if you cannot afford to soften your rules temporarily.
Sample code:
ALTER TABLE tbl ALTER COLUMN col DROP NOT NULL;
COPY ...
-- repair, like ..
-- UPDATE tbl SET col = 0 WHERE col IS NULL;
ALTER TABLE tbl ALTER COLUMN col SET NOT NULL;
Or you just fix the source table. COPY tells you the number of the offending line. Use an editor of your preference and fix it, then retry. I like to use vim for that.
INSERT
For an INSERT (like commented) the check for NULL values is trivial:
To skip a row with a NULL value:
INSERT INTO (col1, ...
SELECT col1, ...
WHERE col1 IS NOT NULL
To insert sth. else instead of a NULL value (empty string in my example):
INSERT INTO (col1, ...
SELECT COALESCE(col1, ''), ...
A common work-around for this is to import the data into a TEMPORARY or UNLOGGED table with no constraints and, where data in the input is sufficiently bogus, text typed columns.
You can then do INSERT INTO ... SELECT queries against the data to populate the real table with a big query that cleans up the data during import. You can use a lot of CASE statements for this. The idea is to transform the data in one pass.
You might be able to do many of the fixes in Ruby as you read the data in, then push the data to PostgreSQL using COPY ... FROM STDIN. This is possible with Ruby's Pg gem, see eg https://bitbucket.org/ged/ruby-pg/src/tip/sample/copyfrom.rb .
For more complicated cases, look at Pentaho Kettle or Talend Studio ETL tools.

Insert data from staging table into multiple, related tables?

I'm working on an application that imports data from Access to SQL Server 2008. Currently, I'm using a stored procedure to import the data individually by record. I can't go with a bulk insert or anything like that because the data is inserted into two related tables...I have a bunch of fields that go into the Account table (first name, last name, etc.) and three fields that will each have a record in an Insurance table, linked back to the Account table by the auto-incrementing AccountID that's selected with SCOPE_IDENTITY in the stored procedure.
Performance isn't very good due to the number of round trips to the database from the application. For this and some other reasons I'm planning to instead use a staging table and import the data from there. Reading up on my options for approaching this, a cursor that executes the same insert stored procedure on the data in the staging table would make sense. However it appears that cursors are evil incarnate and should be avoided.
Is there any way to insert data into one table, retrieve the auto-generated IDs, then insert data for the same records into another table using the corresponding ID, in a set-based operation? Or is a cursor my only option here?
Look at the OUTPUT clause. You should be able to add it to your INSERT statement to do what you want.
BTW, if you need to output columns into the second table that weren't inserted into the first one, then use MERGE instead of INSERT (as suggested in the comment to the original question) as its OUTPUT clause supports referencing other columns from the source table(s). Otherwise, keeping it with an INSERT is more straightforward, and it does give you access to the inserted identity column.
I'm having experiment to worked out in inserting multiple record into related table using databinding. So, try this!
Hopefully this is very helpful. Follow this link How to insert record into related tables. for more information.