I'm trying to use COPY with the HEADER option, but the header line in my file is in a different order than the column order in the database table.
Is the column order in my file required to match the table?
My code is as below:
COPY table_name (
SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'schema_name'
AND table_name = 'table_name'
)
FROM 'file.csv'
WITH DELIMITER ',' CSV HEADER;
My database table's column order differs from file.csv, and I wanted to select the table's column order and copy the data from the CSV into the table.
You can't issue an SQL query in COPY ... FROM; you can only list the columns.
If the CSV columns are in the order b, a, c, then list them in that order in the COPY FROM command:
copy target_table (b, a, c)
from 'file.csv'
with (delimiter ',', format csv, header)
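If the file lives on the client machine rather than on the server, the same idea should work with psql's client-side \copy (a sketch using the same placeholder names):
\copy target_table (b, a, c) from 'file.csv' with (delimiter ',', format csv, header)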
Assuming the columns we need are in the same order as in the table we are copying from, the next logical step is to simulate a sub-query with a Bash script.
psql schema_origin -c 'COPY table_origin TO stdout' | \
psql schema_destination -c \
"$(echo 'COPY table_destination (' \
$(psql schema_origin -t -c "select string_agg(column_name, ',') \
from information_schema.columns where table_name = 'table_origin'") \
') FROM stdin')"
StackOverflow answer on COPY command
StackExchange answer on fetching column names
StackOverflow answer on fetching results as tuples
I came up with the following setup for making COPY TO/FROM work even for quite sophisticated JSON columns (run the inner SELECT first and paste its output in as the column list, since COPY itself does not accept a sub-query):
COPY "your_schema_name.yor_table_name" (
SELECT string_agg(
quote_ident(column_name),
','
) FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'yuour_table_name'
AND TABLE_SCHEMA = 'your_schema_name'
) FROM STDIN WITH CSV DELIMITER E'\t' QUOTE '\b' ESCAPE '\';
--here rows data
\.
The most important parts:
Be explicit when filtering information_schema.columns and also use table_schema. Otherwise, you may end up with unexpected columns when the same table name occurs in multiple schemas.
Use quote_ident to make sure your command does not break if someone named columns after reserved Postgres keywords like user or unique. Thanks to quote_ident they will be wrapped in double quotes, which makes them safe for importing.
I also found that the following settings:
QUOTE '\b' - quote with a backspace character
DELIMITER E'\t' - delimit with tabs
ESCAPE '\' - escape with a backslash
make both COPY TO and COPY FROM the most reliable, also when dealing with sophisticated/nested JSON columns.
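Putting these pieces together, here is a minimal sketch (schema name, table name, and server-side file path are placeholders) that builds the safely quoted column list and runs the generated COPY through psql's \gexec:
SELECT format(
    'COPY %I.%I (%s) FROM %L WITH (FORMAT csv, HEADER)',
    'your_schema_name',
    'your_table_name',
    string_agg(quote_ident(column_name), ',' ORDER BY ordinal_position),
    '/server/path/to/file.csv'
)
FROM information_schema.columns
WHERE table_schema = 'your_schema_name'
  AND table_name = 'your_table_name'
\gexec
The %I and %L specifiers quote identifiers and literals, so the generated command stays valid even with awkward column names; note the path is read by the server process, so it needs the appropriate file-read privilege.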
Related
I have the following row in an AWS Redshift warehouse table.
name
-------------------
tokenauthserver2018
I queried it with a simple SELECT query:
SELECT name
FROM tablename
When I try to unload it using an UNLOAD query from AWS Redshift, it finishes successfully but produces weird quoting.
"name"
"tokenauthserver2018\
Here is my query
UNLOAD ($TABLE_QUERY$
SELECT name
FROM tablename
$TABLE_QUERY$)
TO 's3://bucket/folder'
MANIFEST VERBOSE HEADER DELIMITER AS ','
NULL AS '' ESCAPE GZIP ADDQUOTES ALLOWOVERWRITE PARALLEL OFF;
I tried unloading without ADDQUOTES as well, but got the following data:
name
"tokenauthserver2018
This is the query for above.
UNLOAD ($TABLE_QUERY$
SELECT name
FROM tablename
$TABLE_QUERY$)
TO 's3://bucket/folder'
MANIFEST VERBOSE HEADER CSV NULL AS '' GZIP ALLOWOVERWRITE PARALLEL OFF;
Amazon support was able to resolve this; I am posting the answer here for anyone interested.
This was due to the presence of the NULL character \0 in my data. As I don't have control over the source data, I used the TRANSLATE function to strip the \0 character:
SELECT
TRANSLATE("name", CHR(0), '') AS "name"
FROM <tablename>
Reference: https://docs.aws.amazon.com/redshift/latest/dg/r_TRANSLATE.html
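For reference, combining the two snippets above, the fix drops straight into the original UNLOAD (same bucket path and table name as in the question):
UNLOAD ($TABLE_QUERY$
SELECT TRANSLATE("name", CHR(0), '') AS "name"
FROM tablename
$TABLE_QUERY$)
TO 's3://bucket/folder'
MANIFEST VERBOSE HEADER CSV NULL AS '' GZIP ALLOWOVERWRITE PARALLEL OFF;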
I have code which creates 6 temp tables, adds data to them, merges them, and exports the result. I can make it work by running F5 on the individual statements, but I want the whole script to run in one go. Can someone help me? I am pretty new.
CREATE TEMP TABLE john1
(email VARCHAR(200));
COPY john1(email) from 'E:\WORK\FXJohn1.csv' DELIMITER ',' CSV HEADER
CREATE TEMP TABLE john2
(email VARCHAR(200));
COPY john2(email) from 'E:\WORK\FXJohn2.csv' DELIMITER ',' CSV HEADER
CREATE TEMP TABLE john3
(email VARCHAR(200));
COPY john3(email) from 'E:\WORK\FXJohn3.csv' DELIMITER ',' CSV HEADER
CREATE TEMP TABLE john4
(email VARCHAR(200));
COPY john4(email) from 'E:\WORK\FXJohn4.csv' DELIMITER ',' CSV HEADER
CREATE TEMP TABLE john5
(email VARCHAR(200));
COPY john5(email) from 'E:\WORK\FXJohn5.csv' DELIMITER ',' CSV HEADER
CREATE TEMP TABLE john6
(email VARCHAR(200));
COPY john6(email) from 'E:\WORK\FXJohn6.csv' DELIMITER ',' CSV HEADER
CREATE TABLE finished AS
(SELECT * FROM john1
UNION
SELECT * FROM john2
UNION
SELECT * FROM john3
UNION
SELECT * FROM john4
UNION
SELECT * FROM john5
UNION
SELECT * FROM john6);
DO $func$
BEGIN
EXECUTE $$
COPY public."finished" TO 'E:\$$ || to_char(CURRENT_DATE, 'YYYY_MM_DD') || $$.csv' DELIMITER ',' CSV HEADER;
$$;
END;
$func$ LANGUAGE plpgsql;
@Rupert
Sorry, but for some reason this script is not running for me; I get this error:
ERROR: syntax error at or near "for" LINE 1: for x in $(ls FXJohn1*.csv);
Am I changing the variables correctly?
for x in $(ls file_name*.csv);
do psql -c "copy table_name from
'/path/to/dir/$x' csv" db_name; done
I change file_name to one of my .csv files in the folder, table_name to the table I've created, and the path to E:\WORK (where all my .csv files are).
Firstly, you can load multiple .csv files into the same table, so let's set that up first:
CREATE TABLE finished
(
email varchar(200)
)
Then you can load multiple files from the same folder using a simple bash script:
for x in $(ls file_name*.csv);
do psql -c "copy table_name from
'/path/to/dir/$x' csv" db_name; done
This saves you from doing multiple COPYs and then the multiple UNIONs.
Then you can run your script:
DO $func$
BEGIN
EXECUTE $$
COPY public."finished" TO 'E:\$$ || to_char(CURRENT_DATE,
'YYYY_MM_DD') || $$.csv'
DELIMITER ',' CSV HEADER;
$$;
END;
$func$ LANGUAGE plpgsql;
This is a follow-up question from this answer for "Save PL/pgSQL output from PostgreSQL to a CSV file".
I need to write a client-side CSV file using psql's \copy command. A one liner works:
db=> \copy (select 1 AS foo) to 'bar.csv' csv header
COPY 1
However, I have long queries that span several lines. I don't need to show the query, as I can't seem to extend this past one line without a parse error:
db=> \copy (
\copy: parse error at end of line
db=> \copy ( \\
\copy: parse error at end of line
db=> \copy ("
\copy: parse error at end of line
db=> \copy "(
\copy: parse error at end of line
db=> \copy \\
\copy: parse error at end of line
Is it possible to use \copy with a query that spans multiple lines? I'm using psql on Windows.
The working solution I have right now is to create a temporary view, which can be declared over multiple lines, then select from it in the \copy command, which fits comfortably on one line.
db=> CREATE TEMP VIEW v1 AS
db-> SELECT i
db-> FROM generate_series(1, 2) AS i;
CREATE VIEW
db=> \cd /path/to/a/really/deep/directory/structure/on/client
db=> \copy (SELECT * FROM v1) TO 'out.csv' csv header
COPY 2
db=> DROP VIEW v1;
DROP VIEW
We can use a heredoc to feed multi-line SQL to psql, collapsing the newlines with tr so that \copy receives a single line:
# Put the SQL in a HEREDOC and flatten the newlines before piping it to psql
tr '\n' ' ' << SQL | \psql mydatabase
\COPY (
SELECT
provider_id,
provider_name,
...
) TO './out.tsv' WITH( DELIMITER E'\t', NULL '')
SQL
Source: https://minhajuddin.com/2017/05/18/how-to-pass-a-multi-line-copy-sql-to-psql/
You can combine the server-side COPY command with psql's \g command to run a multi-line query and write its output to a local file:
db=# COPY (
SELECT department, count(*) AS employees
FROM emp
WHERE role = 'dba'
GROUP BY department
ORDER BY employees
) TO STDOUT WITH CSV HEADER \g department_dbas.csv
COPY 5
I describe this technique in detail here: https://hakibenita.com/postgresql-unknown-features#use-copy-with-multi-line-sql
I want to copy a CSV file to a Postgres table. There are about 100 columns in this table, so I do not want to rewrite them if I don't have to.
I am using the \copy table from 'table.csv' delimiter ',' csv; command, but without the table created first I get ERROR: relation "table" does not exist. If I add a blank table I get no error, but nothing happens. I tried this command two or three times and there was no output or message, but the table was not updated when I checked it through pgAdmin.
Is there a way to import a table with headers included like I am trying to do?
This worked. The first row had column names in it.
COPY wheat FROM 'wheat_crop_data.csv' DELIMITER ';' CSV HEADER
With the Python library pandas, you can easily create column names and infer data types from a csv file.
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('postgresql://user:pass@localhost/db_name')
df = pd.read_csv('/path/to/csv_file')
df.to_sql('pandas_db', engine)
The if_exists parameter can be set to replace or append to an existing table, e.g. df.to_sql('pandas_db', engine, if_exists='replace'). This works for additional input file types as well, docs here and here.
Alternative: from the terminal, when you don't have file permissions on the server
The PostgreSQL documentation says in the Notes for COPY:
The path will be interpreted relative to the working directory of the server process (normally the cluster's data directory), not the client's working directory.
So, generally, using psql or any client, even on a local server, you have problems ... And if you're writing a COPY command for other users, e.g. in a GitHub README, the reader will have problems ...
The only way to use a relative path with the client's permissions is STDIN:
When STDIN or STDOUT is specified, data is transmitted via the connection between the client and the server.
as shown here:
psql -h remotehost -d remote_mydb -U myuser -c \
"copy mytable (column1, column2) from STDIN with delimiter as ','" \
< ./relative_path/file.csv
I have been using this function for a while with no problems. You just need to provide the number of columns in the csv file, and it will take the header names from the first row and create the table for you:
create or replace function data.load_csv_file
(
target_table text, -- name of the table that will be created
csv_file_path text,
col_count integer
)
returns void
as $$
declare
iter integer; -- dummy integer to iterate columns with
col text; -- to keep column names in each iteration
col_first text; -- first column name, e.g., top left corner on a csv file or spreadsheet
begin
set schema 'data';
create table temp_table ();
-- add just enough number of columns
for iter in 1..col_count
loop
execute format ('alter table temp_table add column col_%s text;', iter);
end loop;
-- copy the data from csv file
execute format ('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_file_path);
iter := 1;
col_first := (select col_1
from temp_table
limit 1);
-- update the column names based on the first row which has the column names
for col in execute format ('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
loop
execute format ('alter table temp_table rename column col_%s to %s', iter, col);
iter := iter + 1;
end loop;
-- delete the columns row // using quote_ident or %I does not work here!?
execute format ('delete from temp_table where %s = %L', col_first, col_first);
-- change the temp table name to the name given as parameter, if not blank
if length (target_table) > 0 then
execute format ('alter table temp_table rename to %I', target_table);
end if;
end;
$$ language plpgsql;
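A hypothetical call (server-readable path, CSV whose header row names three columns) would then look like the line below; note the function uses server-side COPY, so the file must be readable by the Postgres server process:
select data.load_csv_file('my_target_table', '/path/to/file.csv', 3);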
## csv with header
$ psql -U$db_user -h$db_host -p$db_port -d DB_NAME \
-c "\COPY TB_NAME FROM 'data_sample.csv' WITH (FORMAT CSV, header);"
## csv without header
$ psql -U$db_user -h$db_host -p$db_port -d DB_NAME \
-c "\COPY TB_NAME FROM 'data_sample.csv' WITH (FORMAT CSV);"
## csv without header, specify column
$ psql -U$db_user -h$db_host -p$db_port -d DB_NAME \
-c "\COPY TB_NAME(COL1,COL2) FROM 'data_sample.csv' WITH (FORMAT CSV);"
All columns in the CSV must match the table's columns (or the specified column list).
More about COPY:
https://www.postgresql.org/docs/9.2/sql-copy.html
You can use d6tstack, which creates the table for you and is faster than pd.to_sql() because it uses native DB import commands. It supports Postgres as well as MySQL and MS SQL.
import pandas as pd
import d6tstack.utils

df = pd.read_csv('table.csv')
uri_psql = 'postgresql+psycopg2://usr:pwd@localhost/db'
d6tstack.utils.pd_to_psql(df, uri_psql, 'table')
It is also useful for importing multiple CSVs, handling data schema changes, and/or preprocessing with pandas (e.g. for dates) before writing to the db; see further down in the examples notebook:
import glob
import d6tstack.combine_csv
d6tstack.combine_csv.CombinerCSV(glob.glob('*.csv'),
apply_after_read=apply_fun).to_psql_combine(uri_psql, 'table')
The hstore documentation only talks about using INSERT to load hstore data one row at a time.
Is there any way to do a bulk upload of several hundred thousand rows,
which could be megabytes or gigabytes, into a Postgres hstore column?
The COPY command seems to work only for uploading CSV file columns.
Could someone post an example? Preferably a solution that works with python/psycopg.
The above answers seem incomplete: if you try to copy in multiple columns, including a column with an hstore type, and use a comma delimiter, COPY gets confused, like:
$ cat test
1,a=>1,b=>2,a
2,c=>3,d=>4,b
3,e=>5,f=>6,c
create table b(a int4, h hstore, c varchar(10));
CREATE TABLE
copy b(a,h,c) from 'test' CSV;
ERROR: extra data after last expected column
CONTEXT: COPY b, line 1: "1,a=>1,b=>2,a"
Similarly:
copy b(a,h,c) from 'test' DELIMITER ',';
ERROR: extra data after last expected column
CONTEXT: COPY b, line 1: "1,a=>1,b=>2,a"
This can be fixed, though, by importing as a CSV and quoting the field to be imported into hstore:
$ cat test
1,"a=>1,b=>2",a
2,"c=>3,d=>4",b
3,"e=>5,f=>6",c
copy b(a,h,c) from 'test' CSV;
COPY 3
select h from b;
h
--------------------
"a"=>"1", "b"=>"2"
"c"=>"3", "d"=>"4"
"e"=>"5", "f"=>"6"
(3 rows)
Quoting is only allowed in CSV format, so importing as CSV is required, but you can explicitly set the field delimiter and quote character to values other than ',' and '"' using the DELIMITER and QUOTE arguments of COPY.
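For example, here is a minimal sketch (with a hypothetical file name) that avoids quoting entirely by switching the delimiter to ';', so the commas inside the hstore pairs are no longer special:
$ cat test_semicolon
1;a=>1,b=>2;a
2;c=>3,d=>4;b
copy b(a,h,c) from 'test_semicolon' with (format csv, delimiter ';');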
Both INSERT and COPY appear to work in natural ways for me:
create table b(h hstore);
insert into b(h) VALUES ('a=>1,b=>2'::hstore), ('c=>2,d=>3'::hstore);
select * from b;
h
--------------------
"a"=>"1", "b"=>"2"
"c"=>"2", "d"=>"3"
(2 rows)
$ cat > /tmp/t.tsv
a=>1,b=>2
c=>2,d=>3
^d
copy b(h) from '/tmp/t.tsv';
select * from b;
h
--------------------
"a"=>"1", "b"=>"2"
"c"=>"2", "d"=>"3"
"a"=>"1", "b"=>"2"
"c"=>"2", "d"=>"3"
(4 rows)
You can definitely do this with COPY in binary format.
I am not aware of a Python library that can do this, but I have a Ruby one that can help you understand the column encodings.
https://github.com/pbrumm/pg_data_encoder