How to ignore duplicate keys while using COPY in PostgreSQL

I am using COPY table_name FROM STDIN to import data. It is
very efficient, but any duplicate-key violation aborts the whole
operation. Is there any way around this?
Why doesn't PostgreSQL just give a warning and copy the rest of the data?
Here's an example:
select * from "Demo1";
Id | Name | Age
---+-------+-----
1 | abc | 20
2 | def | 22
COPY "Demo1" from STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 3 pqr 25
>> 4 xyz 26
>> 5 abc 21
>> \.
ERROR: duplicate key value violates unique constraint "Demo1_Name_key"
DETAIL: Key ("Name")=(abc) already exists.
CONTEXT: COPY Demo1, line 3
Here "Name" field is having unique constraint. Since string "abc" is already present in table. Its ignoring whole process.

You could use either of these two methods to import data:
COPY FROM into a temporary (staging) table first, weed out the primary-key failures there, and import only the valid rows.
Use an FDW (like this example). Foreign data wrappers are recommended for live feeds / very large data sets, since you don't need to create a temporary copy just to handle errors or skip columns/rows; you can directly run a SELECT that skips any column / row and INSERT the result into the destination table.
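For the first approach, a rough sketch using the question's table could look like this (the staging table name is made up; ON CONFLICT needs PostgreSQL 9.5 or later, and on older versions an INSERT ... SELECT with a NOT EXISTS check against the target does the same job):
CREATE TEMP TABLE "Demo1_stage" (LIKE "Demo1" INCLUDING DEFAULTS);
COPY "Demo1_stage" FROM STDIN;
-- move only the rows whose "Name" is not already taken;
-- duplicates are silently skipped instead of aborting the load
INSERT INTO "Demo1" ("Id", "Name", "Age")
SELECT "Id", "Name", "Age" FROM "Demo1_stage"
ON CONFLICT ("Name") DO NOTHING;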

Related

SQL3116W The field value in row and column is missing, but the target column is not nullable. How to specify to use Column Default

I'm using the LOAD command to get data into a table where one of the columns defaults to the current timestamp. I put NULL values in the data being read, thinking that would make the table use the default value, but based on the above error that's not the case. How do I avoid this error?
Here is the full command (the input file is a text file): LOAD FROM ${LOADDIR}/${InputFile}.exp OF DEL MODIFIED BY COLDEL| INSERT INTO TEMP_TABLE NONRECOVERABLE
Try:
LOAD FROM ${LOADDIR}/${InputFile}.exp OF DEL MODIFIED BY USEDEFAULTS COLDEL| INSERT INTO TEMP_TABLE NONRECOVERABLE
The usedefaults modifier has been available in Db2-LUW since V7.x, as long as those versions are fully serviced (i.e. have had the final fixpack correctly applied).
Note that some Db2-LUW versions place restrictions on the use of the usedefaults modifier, as detailed in the documentation; for example, restrictions relating to combinations with other modifiers, load modes, or the target table type.
Always specify your Db2 server version and platform when asking for help, because the answer can depend on them.
You can specify which columns from the input file go into which columns of the table using METHOD P. If you omit the column you want the default for, the load will throw a warning, but the default will be populated:
$ db2 "create table testtab1 (cola int, colb int, colc timestamp not null default)"
DB20000I The SQL command completed successfully.
$ cat tt1.del
1,1,1
2,2,2
3,3,99
$ db2 "load from tt1.del of del method P(1,2) insert into testtab1 (cola, colb)"
SQL27967W The COPY NO recoverability parameter of the Load has been converted
to NONRECOVERABLE within the HADR environment.
SQL3109N The utility is beginning to load data from file
"/home/db2inst1/tt1.del".
SQL3500W The utility is beginning the "LOAD" phase at time "07/12/2021
10:14:04.362385".
SQL3112W There are fewer input file columns specified than database columns.
SQL3519W Begin Load Consistency Point. Input record count = "0".
SQL3520W Load Consistency Point was successful.
SQL3110N The utility has completed processing. "3" rows were read from the
input file.
SQL3519W Begin Load Consistency Point. Input record count = "3".
SQL3520W Load Consistency Point was successful.
SQL3515W The utility has finished the "LOAD" phase at time "07/12/2021
10:14:04.496670".
Number of rows read = 3
Number of rows skipped = 0
Number of rows loaded = 3
Number of rows rejected = 0
Number of rows deleted = 0
Number of rows committed = 3
$ db2 "select * from testtab1"
COLA COLB COLC
----------- ----------- --------------------------
1 1 2021-12-07-10.14.04.244232
2 2 2021-12-07-10.14.04.244232
3 3 2021-12-07-10.14.04.244232
3 record(s) selected.

Postgres is adding a space at the beginning and end of all fields

SLES 12 SP3
Postgres 10.8
I have duplicated a table to migrate data from a DB2 instance. The fields are all of type CHAR, VARCHAR, or TIMESTAMP. I originally tried to use \COPY to pull the data in from a pipe-delimited file, but it put a space at the beginning and end of every field, even when this made the field longer than its defined length. I found a claim online that this was a known issue with \COPY. At that point I dropped the table and used sed and some other tools to convert the pipe-delimited data into an SQL INSERT statement. I again had a leading and trailing space in every field.
There are a lot of columns, but here is an example of what I have:
FLD1 CHAR(6) PRIMARY KEY
FLD2 VARCHAR(8)
FLD3 TIMESTAMP
I am using the short form of INSERT.
INSERT INTO my_table VALUES
('123456', '12345678', '2021-01-01 12:34:56');
But when I do a SELECT, I get (note the leading and trailing spaces):
123456 | 12345678 | 2021-01-01 12:34:56 |
I would point out that the first two fields are now 2 characters longer than their defined lengths.
Does anyone know how I might fix this?
The -A argument to psql (unaligned output) gives me the desired result.
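The padding comes from psql's default aligned output, not from the stored values. A quick way to verify this, reusing the column names from the question (illustrative only):
-- the character count stored in the varchar column matches what was inserted
SELECT fld2, length(fld2) AS chars FROM my_table;
-- switch the current session to unaligned output
-- (the same thing the -A command-line flag does)
\pset format unaligned
SELECT fld1, fld2, fld3 FROM my_table;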

How does Redshift treat guillemets?

I am trying to run a CSV import using the COPY command for some data that includes a guillemet (»). Redshift complains that the column value is too long for the varchar column I have defined. The error in the "Loads" tab of the Redshift GUI displays this character as two dots (..); had it been treated as one character, the value would have fit in the varchar column. It's not clear whether some sort of conversion error is occurring or whether it's just a display issue.
When trying to do plain INSERTs I run into strange behavior as well:
dev=# create table test (name varchar(3));
CREATE TABLE
dev=# insert into test values ('bla');
INSERT 0 1
3 characters treated as 4?
dev=# insert into test values ('bl»');
ERROR: value too long for type character varying(3)
dev=# insert into test values ('b»');
INSERT 0 1
Why does char_length return 2?
dev=# select char_length(name), name from test;
char_length | name
-------------+------
2 | b»
I've checked the client encoding and database encodings and those all seem to be UTF8/UNICODE.
You need to increase the length of your varchar column. Multibyte characters take up more than one byte, and the length in a Redshift VARCHAR definition is byte-based, so your special character needs more than one byte. If it still doesn't work, refer to the Redshift doc page below:
http://docs.aws.amazon.com/redshift/latest/dg/multi-byte-character-load-errors.html
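As a rough illustration of the byte-versus-character distinction (test2 is a made-up table name; double-check the length functions against your Redshift release):
-- » is 2 bytes in UTF-8, so 'bl»' needs 4 bytes even though it is only 3 characters
create table test2 (name varchar(4));
insert into test2 values ('bl»');
-- compare character count with byte count: expect 3 chars, 4 bytes
select name, char_length(name) as chars, octet_length(name) as bytes from test2;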

COPY only some columns from an input CSV?

I have created a table named 'con' in my database, with two columns named 'date' and 'kgs'. I am trying to extract data from a 'hi.rpt' file located at 'H:Sir\data\reporting\hi.rpt' and store the values in the 'con' table.
I have tried this in pgAdmin. When I run:
COPY con (date,kgs)
FROM 'H:Sir\data\reporting\hi.rpt'
WITH DELIMITER ','
CSV HEADER
date AS 'Datum/Uhrzeit'
kgs AS 'Summe'
I get the error:
ERROR: syntax error at or near "date"
LINE 5: date AS 'Datum/Uhrzeit'
^
********** Error **********
ERROR: syntax error at or near "date"
SQL state: 42601
Character: 113
"hi.rpt" file from which i am reading the data look like this:
Datum/Uhrzeit,Sta.,Bez.,Unit,TBId,Batch,OrderNr,Mat1,Total1,Mat2,Total2,Mat3,Total3,Mat4,Total4,Mat5,Total5,Mat6,Total6,Summe
41521.512369(04.09.13 12:17:48),TB01,TB01,005,300,9553,,2,27010.47,0,0.00,0,0.00,3,1749.19,0,0.00,0,0.00,28759.66
41521.547592(04.09.13 13:08:31),TB01,TB01,005,300,9570,,2,27057.32,0,0.00,0,0.00,3,1753.34,0,0.00,0,0.00,28810.66
Is it possible to extract only these two values from the 20 columns in this 'hi.rpt' file, or not?
Or is there just a mistake in the syntax I have written?
What is the correct way to write it?
I don't know where you got that syntax, but COPY doesn't take a list of column aliases like that. See the help:
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | PROGRAM 'command' | STDIN }
[ [ WITH ] ( option [, ...] ) ]
(AS isn't one of the listed options; to see the full syntax, run \h COPY in psql or look at the manual page for COPY online).
There is no mapping facility in COPY that lets you read only some columns of the input CSV. It'd be really useful, but nobody's had the time/interest/funding to implement it yet. It's really only one of many data transform/filtering tasks people want anyway.
PostgreSQL expects the column-list given in COPY to be in the same order, left-to-right, as what's in the CSV file, and have the same number of entries as the CSV file has columns. So if you write:
COPY con (date,kgs)
then PostgreSQL will expect an input CSV with exactly two columns. It'll use the first CSV column for the "date" table column and the second CSV column for the "kgs" table column. It doesn't care what the CSV headers are; they're ignored if you specify WITH (FORMAT CSV, HEADER ON), or treated as normal data rows if you don't specify HEADER.
PostgreSQL 9.3 added FROM PROGRAM to COPY, so you could run a shell command to read the file and filter it. A simple Python or Perl script would do the job.
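For example, a rough sketch using cut to keep only the first and the twentieth field (this assumes the file is readable from the database server at a Unix-style path, which is purely illustrative here, and that the values in the first field can be parsed into the date column's type):
-- '/path/to/hi.rpt' is an illustrative server-side path, not the real location
COPY con (date, kgs)
FROM PROGRAM 'cut -d, -f1,20 /path/to/hi.rpt'
WITH (FORMAT csv, HEADER true);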
If it's a small file, just open a copy in the spreadsheet of your choice as a csv file, delete the unwanted columns, and save it, so only the date and kgs columns remain.
Alternatively, COPY into a staging table that has all the same columns as the CSV, then do an INSERT INTO ... SELECT to transfer just the wanted data into the real target table.
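A minimal sketch of that staging-table route (the staging table and its column names are made up to match the hi.rpt header):
-- one text column per field in the file, so COPY accepts every row as-is
CREATE TABLE con_staging (
    datum_uhrzeit text, sta text, bez text, unit text, tbid text,
    batch text, ordernr text,
    mat1 text, total1 text, mat2 text, total2 text, mat3 text, total3 text,
    mat4 text, total4 text, mat5 text, total5 text, mat6 text, total6 text,
    summe text
);
COPY con_staging FROM 'H:Sir\data\reporting\hi.rpt'
    WITH (FORMAT csv, HEADER true);
-- keep only the two wanted fields; cast/parse as needed for con's column types
INSERT INTO con (date, kgs)
SELECT datum_uhrzeit, summe
FROM con_staging;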

Loading large amount of data into Postgres Hstore

The hstore documentation only talks about using "insert" to load an hstore column one row at a time.
Is there any way to do a bulk upload of several hundred thousand rows, possibly megabytes or gigabytes of data, into a Postgres hstore column?
The COPY command seems to work only for uploading plain CSV columns.
Could someone post an example? Preferably a solution that works with python/psycopg.
The above answers seem incomplete: if you try to copy in multiple columns, including a column with an hstore type, and use a comma delimiter, COPY gets confused, like:
$ cat test
1,a=>1,b=>2,a
2,c=>3,d=>4,b
3,e=>5,f=>6,c
create table b(a int4, h hstore, c varchar(10));
CREATE TABLE;
copy b(a,h,c) from 'test' CSV;
ERROR: extra data after last expected column
CONTEXT: COPY b, line 1: "1,a=>1,b=>2,a"
Similarly:
copy b(a,h,c) from 'test' DELIMITER ',';
ERROR: extra data after last expected column
CONTEXT: COPY b, line 1: "1,a=>1,b=>2,a"
This can be fixed, though, by importing as a CSV and quoting the field to be imported into hstore:
$ cat test
1,"a=>1,b=>2",a
2,"c=>3,d=>4",b
3,"e=>5,f=>6",c
copy b(a,h,c) from 'test' CSV;
COPY 3
select h from b;
h
--------------------
"a"=>"1", "b"=>"2"
"c"=>"3", "d"=>"4"
"e"=>"5", "f"=>"6"
(3 rows)
Quoting is only allowed in CSV format, so importing as CSV is required, but you can explicitly set the field delimiter and quote character to values other than ',' and '"' using the DELIMITER and QUOTE options of COPY.
Both INSERT and COPY appear to work in natural ways for me:
create table b(h hstore);
insert into b(h) VALUES ('a=>1,b=>2'::hstore), ('c=>2,d=>3'::hstore);
select * from b;
h
--------------------
"a"=>"1", "b"=>"2"
"c"=>"2", "d"=>"3"
(2 rows)
$ cat > /tmp/t.tsv
a=>1,b=>2
c=>2,d=>3
^d
copy b(h) from '/tmp/t.tsv';
select * from b;
h
--------------------
"a"=>"1", "b"=>"2"
"c"=>"2", "d"=>"3"
"a"=>"1", "b"=>"2"
"c"=>"2", "d"=>"3"
(4 rows)
You can definitely do this with COPY's binary format.
I am not aware of a Python library that can do this, but I have a Ruby one that can help you understand the column encodings:
https://github.com/pbrumm/pg_data_encoder