How to escape quotes when importing a CSV file in PostgreSQL / YugabyteDB

Using YugabyteDB 2.5.3.1 (PostgreSQL 11.2).
I currently have this table:
create table bum2(id int, the_t text);
I'm looking to import the value "a"bc" from a CSV file into the text column.
I tried with this CSV file:
6,""a""bc""
And:
\copy bum2 from data.csv WITH (FORMAT csv);
And getting:
yugabyte=# select * from bum2;
 id | the_t
----+-------
  6 | abc
(1 row)

You can use doubled quotes to escape the quotes. The CSV file below works:
6,"""a""bc"""
yugabyte=# \copy bum2 from data.csv WITH (FORMAT csv);
COPY 1
yugabyte=# select * from bum2;
 id | the_t
----+--------
  6 | "a"bc"
(1 row)
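If the file instead escapes quotes with backslashes, COPY's ESCAPE option (valid for CSV format only) handles that variant too. A minimal sketch, assuming a hypothetical file escaped.csv whose line is:
6,"\"a\"bc\""
\copy bum2 from escaped.csv WITH (FORMAT csv, ESCAPE '\');
This loads the same "a"bc" value; QUOTE stays the default double quote.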

Related

How to export to S3 from RDS / Aurora using `aws_s3.query_export_to_s3` with a tab delimiter?

I'm trying to run:
SELECT
*
FROM
aws_s3.query_export_to_s3(
'SELECT * FROM <tbl> WHERE <cond>',
aws_commons.create_s3_uri(
'<bucket_name>',
'<file_name>',
'<region>'
),
options :='format csv, HEADER true, delimiter $$\t$$'
)
;
The custom delimiter specification follows the AWS documentation
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/postgresql-s3-export.html#postgresql-s3-export-examples-custom-delimiter
However, the export fails with ERROR: COPY delimiter must be a single one-byte character
The tab delimiter provided in the query complies with the Postgres COPY command.
Any ideas?
You can use E''\t'' instead; it worked, see the code below:
SELECT * from aws_s3.query_export_to_s3('select * from tb',
aws_commons.create_s3_uri('s3-bucket', 'data.csv', 'us-east-1'),
options :='format csv, HEADER true, delimiter E''\t'' '
);
 rows_uploaded | files_uploaded | bytes_uploaded
---------------+----------------+----------------
             2 |              1 |             21
(1 row)
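What makes this work: the options string is spliced into the COPY ... TO statement that aws_s3.query_export_to_s3 runs on the server, so the delimiter has to survive one extra level of SQL quoting. Inside the outer single-quoted string, E''\t'' collapses to the escape-string literal E'\t', which COPY accepts as a single tab byte. For comparison, the equivalent standalone server-side command (a sketch, with a hypothetical output path) would be:
COPY (SELECT * FROM tb) TO '/tmp/data.csv' WITH (format csv, HEADER true, delimiter E'\t');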

Unable to replace dash with null during COPY operation from CSV

I have the following CSV data
"AG","Saint Philip","AG-08"
"AI","Anguilla","-"
"AL","Berat","AL-01"
I want to replace - with NULL
I use the following command
copy subdivision from '/tmp/IP2LOCATION-ISO3166-2.CSV' with delimiter as ',' NULL AS '-' csv;
The copy operation succeeds. However, the - in the 3rd column is copied literally as well, instead of being replaced with NULL.
Do you have an idea what the mistake in my command is? My table is:
CREATE TABLE subdivision(
country_code TEXT NOT NULL,
name TEXT NOT NULL,
code TEXT
);
It comes down to the quoting. If you have this:
"AI","Anguilla",-
"AL","Berat","AL-01"
Then the below works (using the newer COPY options syntax):
copy subdivision
from '/home/postgres/csv_test.csv'
with (format csv, delimiter ',', NULL '-');
COPY 3
\pset null NULL
select * from subdivision ;
 country_code |     name     | code
--------------+--------------+-------
 AG           | Saint Philip | AG-08
 AI           | Anguilla     | NULL
 AL           | Berat        | AL-01
If you keep the original CSV:
"AG","Saint Philip","AG-08"
"AI","Anguilla","-"
"AL","Berat","AL-01"
then you have to do this:
copy subdivision
from '/home/postgres/csv_test.csv'
with (format csv, delimiter ',', NULL '-', FORCE_NULL (code));
select * from subdivision ;
 country_code |     name     | code
--------------+--------------+-------
 AG           | Saint Philip | AG-08
 AI           | Anguilla     | NULL
 AL           | Berat        | AL-01
where FORCE_NULL is (from https://www.postgresql.org/docs/current/sql-copy.html):
FORCE_NULL
Match the specified columns' values against the null string, even if it has been quoted, and if a match is found set the value to NULL. In the default case where the null string is empty, this converts a quoted empty string into NULL. This option is allowed only in COPY FROM, and only when using CSV format.
So to convert quoted values you have to force the conversion by specifying the column(s).
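The mirror-image option FORCE_NOT_NULL also exists, should you ever need the opposite: it stops the named columns from being matched against the null string, so the dash would be kept as literal text. A sketch against the same table and file:
copy subdivision
from '/home/postgres/csv_test.csv'
with (format csv, delimiter ',', NULL '-', FORCE_NOT_NULL (code));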

Importing bytea data into PostgreSQL by using COPY FROM stdin

I generated a (UTF-8) file with an external program for import into PostgreSQL 9.6.1. The problem is the bytea field (PWHASH).
Snippet from this file (using TAB as the delimiter):
COPY USERS (ID,CODE,PWHASH,EMAIL) FROM stdin;
7 test1 E'\\\\x657B954D27B4AC56FA997D24A5FF2563' test@amce.org
\.
When importing with
psql mydb myrole -f test.sql
Everything goes well.
However, if I query the result, the byte array is not 16 bytes but 37:
select passwordhash,length(passwordhash) from users;
                                 passwordhash                                 | length
------------------------------------------------------------------------------+--------
 \x45275c78363537423935344432374234414335364641393937443234413546463235363327 |     37
What is the correct syntax for this?
The format of the input file is wrong. It should be like this:
7 test1 \\x657B954D27B4AC56FA997D24A5FF2563 test@amce.org
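With the line in that form, COPY's text format collapses the doubled backslash to a single one, and the bytea input routine reads \x... as 16 raw bytes. A quick check (a sketch, reusing the column name from the COPY statement):
select pwhash, length(pwhash) from users;
               pwhash               | length
------------------------------------+--------
 \x657b954d27b4ac56fa997d24a5ff2563 |     16
(1 row)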
I believe I will have to "prepare" the data. Something like this:
t=# insert into u select 'x657B954D27B4AC56FA997D24A5FF2563';
INSERT 0 1
Time: 5990.809 ms
t=# select b from u;
b
----------------------------------------------------------------------
\x783635374239353444323742344143353646413939374432344135464632353633
(1 row)
Time: 0.234 ms
t=# insert into u select decode('657B954D27B4AC56FA997D24A5FF2563','hex');
INSERT 0 1
Time: 62.767 ms
t=# select b from u;
b
----------------------------------------------------------------------
\x783635374239353444323742344143353646413939374432344135464632353633
\x657b954d27b4ac56fa997d24a5ff2563
(2 rows)
Time: 0.208 ms
So in your case you can:
-- empty clone of users with PWHASH as text ("where false" keeps the structure, copies no rows)
create table t as select ID,CODE,PWHASH::text,EMAIL from users where false;
COPY t (ID,CODE,PWHASH,EMAIL) FROM stdin;
-- keep only the hex digits that follow the x, then decode them into real bytea
-- (a regex is safer than a fixed substr() offset, since the staged text still carries the E'\\ wrapper)
insert into users select ID,CODE,decode(substring(PWHASH from 'x([0-9a-fA-F]+)'),'hex'),EMAIL from t;

What is the simplest way to migrate data from MySQL to DB2

I need to migrate data from MySQL to DB2. Both DBs are up and running.
I tried mysqldump with --no-create-info --extended-insert=FALSE --complete-insert, and with a few changes on the output (e.g. changing ` to ") I get a satisfactory result, but sometimes I get weird exceptions like:
... does not have an ending string delimiter. SQLSTATE=42603
Ideally I would want a routine that is as general as possible, but as an example here, let's say I have a DB2 table that looks like:
db2 => describe table "mytable"
                                Data type                     Column
Column name                     schema    Data type name      Length     Scale Nulls
------------------------------- --------- ------------------- ---------- ----- ------
id                              SYSIBM    BIGINT                       8     0 No
name                            SYSIBM    VARCHAR                    512     0 No

2 record(s) selected.
Its MySQL counterpart being
mysql> describe mytable;
+-------+--------------+------+-----+---------+----------------+
| Field | Type         | Null | Key | Default | Extra          |
+-------+--------------+------+-----+---------+----------------+
| id    | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| name  | varchar(512) | NO   |     | NULL    |                |
+-------+--------------+------+-----+---------+----------------+
2 rows in set (0.01 sec)
Let's assume the DB2 and MySQL databases are called mydb.
Now, if I do
mysqldump -uroot mydb mytable --no-create-info --extended-insert=FALSE --complete-insert | # mysqldump options: no table create statement, one insert statement per record, output table column names
sed -n -e '/^INSERT/p' | # only keep lines beginning with "INSERT"
sed 's/`/"/g' | # replace ` with "
sed 's/;$//g' | # remove `;` at end of insert query
sed "s/\\\'/''/g" # replace `\'` with `''` , see http://stackoverflow.com/questions/2442205/how-does-one-escape-an-apostrophe-in-db2-sql and http://stackoverflow.com/questions/2369314/why-does-sed-require-3-backslashes-for-a-regular-backslash
I get:
INSERT INTO "mytable" ("id", "name") VALUES (1,'record 1')
INSERT INTO "mytable" ("id", "name") VALUES (2,'record 2')
INSERT INTO "mytable" ("id", "name") VALUES (3,'record 3')
INSERT INTO "mytable" ("id", "name") VALUES (4,'record 4')
INSERT INTO "mytable" ("id", "name") VALUES (5,'" "" '' '''' \"\" ')
This output can be used as DB2 queries and it works well.
Any ideas on how to solve this more efficiently or more generally? Any other suggestions?
After having played around a bit, I came up with the following routine, which I believe to be fairly general, robust and scalable.
1. Run the following command:
mysqldump -uroot mydb mytable --no-create-info --extended-insert=FALSE --complete-insert | # mysqldump options: no table create statement, one insert statement per record, output table column names
sed -n -e '/^INSERT/p' | # only keep lines beginning with "INSERT"
sed 's/`/"/g' | # replace ` with "
sed -e 's/\\"/"/g' | # replace `\"` with `"` (mysql escapes double quotes)
sed "s/\\\'/''/g" > out.sql # replace `\'` with `''` , see http://stackoverflow.com/questions/2442205/how-does-one-escape-an-apostrophe-in-db2-sql and http://stackoverflow.com/questions/2369314/why-does-sed-require-3-backslashes-for-a-regular-backslash
Note: here, unlike in the question, the trailing ; are not removed.
2. Upload the file to the DB2 server:
scp out.sql user@myserver:out.sql
3. Run the queries from the file:
db2 -tvsf /path/to/query/file/out.sql
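If the INSERT-escaping route keeps producing edge cases, a data-only path avoids SQL string quoting altogether: dump a delimited file and load it with DB2's IMPORT utility. A sketch under loud assumptions (tab-delimited dump, matching column order; mytable.txt is a hypothetical path, and the COLDEL0x09 modifier declares a tab column delimiter):
mysql -uroot mydb --batch --skip-column-names -e 'SELECT * FROM mytable' > mytable.txt
db2 "IMPORT FROM mytable.txt OF DEL MODIFIED BY COLDEL0x09 INSERT INTO mytable"
The trade-off is that IMPORT's DEL format has its own quoting rules, so free-text columns still deserve a test run.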

How to deal with missing values when importing CSV to Postgres?

I would like to import a CSV file which has multiple occurrences of missing values. I recoded them into NULL and tried to import the file. I suppose that the attributes which include the NULLs are character values; however, transforming them to numeric is a bit complicated. Therefore I would like to import my whole table with:
\copy player_allstar FROM '/Users/Desktop/Rdaten/Data/player_allstar.csv' DELIMITER ';' CSV WITH NULL AS 'NULL' ';' HEADER
There must be a syntax error, but I tried different combinations and always get:
ERROR: syntax error at or near "WITH NULL"
LINE 1: COPY player_allstar FROM STDIN DELIMITER ';' CSV WITH NULL ...
I also tried:
\copy player_allstar FROM '/Users/Desktop/Rdaten/Data/player_allstar.csv' WITH(FORMAT CSV, DELIMITER ';', NULL 'NULL', HEADER);
and get:
ERROR: invalid input syntax for integer: "NULL"
CONTEXT: COPY player_allstar, line 2, column dreb: "NULL"
I suppose it is caused by the preprocessing with R. The table came with NAs, so I changed them with:
data[data==NA] <- "NULL"
I'm not aware of a different way of changing them to NULL. I think this turns them into strings. Is there a different way to preprocess and keep the NAs (as NULLs in postgres, of course)?
Sample:
pts dreb oreb reb asts stl
11 NULL NULL 8 3 NULL
4 5 3 8 2 1
3 NULL NULL 1 1 NULL
The data type is integer.
Given /tmp/sample.csv:
pts;dreb;oreb;reb;asts;stl
11;NULL;NULL;8;3;NULL
4;5;3;8;2;1
3;NULL;NULL;1;1;NULL
then with a table like:
CREATE TABLE player_allstar (pts integer, dreb integer, oreb integer, reb integer, asts integer, stl integer);
it works for me:
\copy player_allstar FROM '/tmp/sample.csv' WITH (FORMAT CSV, DELIMITER ';', NULL 'NULL', HEADER);
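To confirm the NULLs really arrived, a quick check (a sketch; \pset null just makes them print visibly, as in the dash-to-NULL answer above):
\pset null NULL
select * from player_allstar where dreb is null;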
Your syntax is fine; the problem seems to be in the formatting of your data. Using your syntax, I was able to load data with NULLs successfully:
mydb=# create table test(a int, b text);
CREATE TABLE
mydb=# \copy test from stdin WITH(FORMAT CSV, DELIMITER ';', NULL 'NULL', HEADER);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> col a header;col b header
>> 1;one
>> NULL;NULL
>> 3;NULL
>> NULL;four
>> \.
mydb=# select * from test;
 a |  b
---+------
 1 | one
   |
 3 |
   | four
(4 rows)
mydb=# select * from test where a is null;
 a |  b
---+------
   |
   | four
(2 rows)
In your case you can substitute NULL 'NA' in the copy command, if the original value is 'NA'.
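On the R side, note that data[data == NA] cannot work as intended: any comparison with NA yields NA, so nothing gets replaced (is.na(data) is the test that works). It is simpler to leave the NAs in the data frame and let the writer emit the null string. A sketch, assuming the data frame and path from the question:
# write NAs out as the literal string NULL so COPY's NULL 'NULL' picks them up
write.table(data, "/Users/Desktop/Rdaten/Data/player_allstar.csv",
            sep = ";", na = "NULL", row.names = FALSE)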
You should also make sure that there are no spaces around your data values. For example, if your NULL is represented as NA in your data and the fields are delimited with semicolons:
1;NA <-- good
1 ; NA <-- bad
1<tab>NA <-- bad
etc.