I have a script which creates six temp tables, loads data into them, merges them, and exports the result to a CSV file. I can make it work by selecting individual statements and pressing F5, but I want to run the whole script in one go. Can someone help me? I am pretty new to this.
CREATE TEMP TABLE john1
(email VARCHAR(200));
COPY john1(email) from 'E:\WORK\FXJohn1.csv' DELIMITER ',' CSV HEADER
CREATE TEMP TABLE john2
(email VARCHAR(200));
COPY john2(email) from 'E:\WORK\FXJohn2.csv' DELIMITER ',' CSV HEADER
CREATE TEMP TABLE john3
(email VARCHAR(200));
COPY john3(email) from 'E:\WORK\FXJohn3.csv' DELIMITER ',' CSV HEADER
CREATE TEMP TABLE john4
(email VARCHAR(200));
COPY john4(email) from 'E:\WORK\FXJohn4.csv' DELIMITER ',' CSV HEADER
CREATE TEMP TABLE john5
(email VARCHAR(200));
COPY john5(email) from 'E:\WORK\FXJohn5.csv' DELIMITER ',' CSV HEADER
CREATE TEMP TABLE john6
(email VARCHAR(200));
COPY john6(email) from 'E:\WORK\FXJohn6.csv' DELIMITER ',' CSV HEADER
CREATE TABLE finished AS
(SELECT * FROM john1
UNION
SELECT * FROM john2
UNION
SELECT * FROM john3
UNION
SELECT * FROM john4
UNION
SELECT * FROM john5
UNION
SELECT * FROM john6);
DO $func$
BEGIN
EXECUTE $$
COPY public."finished" TO 'E:\$$ || to_char(CURRENT_DATE, 'YYYY_MM_DD') || $$.csv' DELIMITER ',' CSV HEADER;
$$;
END;
$func$ LANGUAGE plpgsql;
@Rupert Sorry, but for some reason this script is not running for me; I get this error:
ERROR: syntax error at or near "for" LINE 1: for x in $(ls FXJohn1*.csv);
Am I changing the variables correctly?
for x in $(ls file_name*.csv);
(I change file_name to one of my .csv files in the folder)
do psql -c "copy table_name from
(I change table_name to the name of the table I've created)
'/path/to/dir/$x' csv" db_name; done
(I change the path to E:\WORK, where all my csv files are.)
Firstly you can load multiple .csv files into the same table. So let's set that up first:
CREATE TABLE finished
(
email varchar(200)
);
Then you can load multiple files from the same folder using a simple bash script:
for x in $(ls file_name*.csv);
do psql -c "copy table_name from '/path/to/dir/$x' csv" db_name;
done
This saves you doing multiple 'copies' and then the multiple UNIONs.
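If you would rather stay entirely inside SQL (the shell loop above has to be run from a command prompt, not from inside pgAdmin, which is what caused the "syntax error at or near for"), here is a minimal PL/pgSQL sketch of the same idea. It assumes the six files keep the FXJohn1.csv to FXJohn6.csv names and the E:\WORK folder from the question:
DO $import$
BEGIN
    FOR i IN 1..6 LOOP
        -- server-side COPY: the path must be readable by the PostgreSQL service account
        EXECUTE format(
            $$COPY finished(email) FROM 'E:\WORK\FXJohn%s.csv' DELIMITER ',' CSV HEADER$$,
            i);
    END LOOP;
END;
$import$ LANGUAGE plpgsql;
Note that, unlike the UNION in the original script, this keeps duplicate email addresses; use SELECT DISTINCT (or a unique index) afterwards if the deduplication matters.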
Then you can run your script:
DO $func$
BEGIN
EXECUTE $$
COPY public."finished" TO 'E:\$$ || to_char(CURRENT_DATE,
'YYYY_MM_DD') || $$.csv'
DELIMITER ',' CSV HEADER;
$$;
END;
$func$ LANGUAGE plpgsql;
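For reference, the string handed to EXECUTE resolves to an ordinary COPY statement; if the script ran on, say, 2018_03_14 (a date chosen only for illustration), it would be:
COPY public."finished" TO 'E:\2018_03_14.csv' DELIMITER ',' CSV HEADER;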
In order to use COPY (in my case, from a csv file) in PostgreSQL, I need to create the destination table first.
Now, if my table has 60 columns, for instance, it feels weird and inefficient to write this out manually:
CREATE TABLE table_name(
column1 datatype,
column2 datatype,
column3 datatype,
.....
column60 datatype
);
Those who use PostgreSQL - how do you get around this issue?
I usually use file_fdw extension to read data from CSV files.
But unfortunately, file_fdw is not that convenient/flexible for tasks like reading a CSV file with many columns. CREATE TABLE will work with any number of columns, but if it doesn't correspond to the CSV file, it will fail later, when performing a SELECT. So the problem of explicitly creating the table remains. However, it is possible to solve it.
Here is a brute-force approach that doesn't require anything except Postgres. Written in PL/pgSQL, this function tries to create a foreign table with a single column and attempts to SELECT from it. If that fails, the attempt is rolled back and it tries again with 2 columns, and so on, until the SELECT succeeds. All columns are of type text; this is quite a limitation, but it still gives you a ready-to-SELECT table instead of doing the manual work.
create or replace function autocreate_table_to_read_csv(
fdw_server text,
csv text,
table_name text,
max_columns_num int default 100
) returns void as $$
declare
i int;
sql text;
rec record;
begin
execute format('drop foreign table if exists %I', table_name);
for i in 1..max_columns_num loop
begin
select into sql
format('create foreign table %I (', table_name)
|| string_agg('col' || n::text || ' text', ', ')
|| format(
e') server %I options ( filename \'%s\', format \'csv\' );',
fdw_server,
csv
)
from generate_series(1, i) as g(n);
raise debug 'SQL: %', sql;
execute sql;
execute format('select * from %I limit 1;', table_name) into rec;
-- looks OK, so the number of columns corresponds to the first row of CSV file
raise info 'Table % created with % column(s). SQL: %', table_name, i, sql;
exit;
exception when others then
raise debug 'CSV has more than % column(s), making another attempt...', i;
end;
end loop;
end;
$$ language plpgsql;
Once it finds the proper number of columns, it reports it (see raise info).
To see more details, run set client_min_messages to debug; before using the function.
Example of use:
test=# create server csv_import foreign data wrapper file_fdw;
CREATE SERVER
test=# set client_min_messages to debug;
SET
test=# select autocreate_table_to_read_csv('csv_import', '/home/nikolay/tmp/sample.csv', 'readcsv');
NOTICE: foreign table "readcsv" does not exist, skipping
DEBUG: SQL: create foreign table readcsv (col1 text) server csv_import options ( filename '/home/nikolay/tmp/sample.csv', format 'csv' );
DEBUG: CSV has more than 1 column(s), making another attempt...
DEBUG: SQL: create foreign table readcsv (col1 text, col2 text) server csv_import options ( filename '/home/nikolay/tmp/sample.csv', format 'csv' );
DEBUG: CSV has more than 2 column(s), making another attempt...
DEBUG: SQL: create foreign table readcsv (col1 text, col2 text, col3 text) server csv_import options ( filename '/home/nikolay/tmp/sample.csv', format 'csv' );
INFO: Table readcsv created with 3 column(s). SQL: create foreign table readcsv (col1 text, col2 text, col3 text) server csv_import options ( filename '/home/nikolay/tmp/sample.csv', format 'csv' );
autocreate_table_to_read_csv
------------------------------
(1 row)
test=# select * from readcsv limit 2;
col1 | col2 | col3
-------+-------+-------
1313 | xvcv | 22
fvbvb | 2434 | 4344
(2 rows)
Update: I found an implementation of a very similar approach (but without the "brute force": it requires explicit specification of the number of columns in the CSV file) for COPY .. FROM: How to import CSV file data into a PostgreSQL table?
P.S. Actually, this would be a really good task for improving the file_fdw and COPY .. FROM capabilities of Postgres and making them more flexible. For example, for postgres_fdw there is a very handy IMPORT FOREIGN SCHEMA command, which lets you define remote ("foreign") objects very quickly, with just one line; it saves a lot of effort. Having a similar thing for CSV data would be awesome.
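For comparison, an IMPORT FOREIGN SCHEMA call for postgres_fdw is a single statement; the server and schema names below are only placeholders:
IMPORT FOREIGN SCHEMA public
    FROM SERVER my_remote_server
    INTO local_schema;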
This is a follow-up question from this answer for "Save PL/pgSQL output from PostgreSQL to a CSV file".
I need to write a client-side CSV file using psql's \copy command. A one liner works:
db=> \copy (select 1 AS foo) to 'bar.csv' csv header
COPY 1
However, I have long queries that span several lines. I don't need to show the query, as I can't seem to extend this past one line without a parse error:
db=> \copy (
\copy: parse error at end of line
db=> \copy ( \\
\copy: parse error at end of line
db=> \copy ("
\copy: parse error at end of line
db=> \copy "(
\copy: parse error at end of line
db=> \copy \\
\copy: parse error at end of line
Is it possible to use \copy with a query that spans multiple lines? I'm using psql on Windows.
The working solution I have right now is to create a temporary view, which can be declared over multiple lines, then select from it in the \copy command, which fits comfortably on one line.
db=> CREATE TEMP VIEW v1 AS
db-> SELECT i
db-> FROM generate_series(1, 2) AS i;
CREATE VIEW
db=> \cd /path/to/a/really/deep/directory/structure/on/client
db=> \copy (SELECT * FROM v1) TO 'out.csv' csv header
COPY 2
db=> DROP VIEW v1;
DROP VIEW
We may use a HEREDOC to feed the multi-line SQL to psql, using tr to collapse it to a single line first:
# Put the SQL in a HEREDOC; tr collapses newlines so \copy receives a single line
tr '\n' ' ' << SQL | psql mydatabase
\COPY (
SELECT
provider_id,
provider_name,
...
) TO './out.tsv' WITH (DELIMITER E'\t', NULL '')
SQL
Source: https://minhajuddin.com/2017/05/18/how-to-pass-a-multi-line-copy-sql-to-psql/
You can combine the server-side COPY command with psql's \g command to write the result of a multi-line query to a local file:
db=# COPY (
SELECT department, count(*) AS employees
FROM emp
WHERE role = 'dba'
GROUP BY department
ORDER BY employees
) TO STDOUT WITH CSV HEADER \g department_dbas.csv
COPY 5
I describe this technique in detail here: https://hakibenita.com/postgresql-unknown-features#use-copy-with-multi-line-sql
I'm trying to use COPY with the HEADER option, but the header line in my file is in a different order than the column order in the database table.
Does the column order in my file need to match the table?
My code is as below:
COPY table_name (
SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'schema_name'
AND table_name = 'table_name'
)
FROM 'file.csv'
WITH DELIMITER ',' CSV HEADER;
My database table has a different column order from file.csv, and I wanted to select the table's column order and copy the data from the CSV into the table.
You can't issue an SQL query in copy from. You can only list the columns.
If the CSV columns are in the b, a, c order then list that in the copy from command:
copy target_table (b, a, c)
from 'file.csv'
with (delimiter ',', format csv, header);
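If you don't want to type the column list by hand (which seems to be what the SELECT in the question was aiming at), you can generate it from the catalog with a query like the one below and paste the result into the COPY command. Note that it returns the table's own order, so you may still need to rearrange it to match the CSV header; the schema and table names are the question's placeholders:
select string_agg(column_name, ', ' order by ordinal_position)
from information_schema.columns
where table_schema = 'schema_name'
  and table_name = 'table_name';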
Assuming the column order we need is that of the table we are copying from, the next logical step is to simulate a sub-query with a Bash script.
psql schema_origin -c 'COPY table_origin TO stdout' | \
psql schema_destination -c \
"$(echo 'COPY table_destination (' \
$(psql schema_origin -t -c "select string_agg(column_name, ',') \
from information_schema.columns where table_name = 'table_origin'") \
') FROM stdin')"
StackOverflow answer on COPY command
StackExchange answer on fetching column names
StackOverflow answer on fetching results as tuples
I came up with the following setup for making COPY TO/FROM successful even for quite sophisticated JSON columns:
COPY "your_schema_name.yor_table_name" (
SELECT string_agg(
quote_ident(column_name),
','
) FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'yuour_table_name'
AND TABLE_SCHEMA = 'your_schema_name'
) FROM STDIN WITH CSV DELIMITER E'\t' QUOTE '\b' ESCAPE '\';
--here rows data
\.
the most important parts:
Be explicit when filtering information_schema.columns and also use table_schema. Otherwise, you may end up with unexpected columns when the same table name occurs in multiple schemas.
Use quote_ident to make sure your command does not crash if someone named table columns after Postgres reserved keywords such as user or unique. Thanks to quote_ident, such names are wrapped in double quotes, which makes them safe for importing (see the snippet below).
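A quick illustration of the difference quote_ident makes (the column names here are just examples):
SELECT quote_ident('user') AS reserved_word, quote_ident('email') AS ordinary_name;
-- reserved_word | ordinary_name
-- "user"        | email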
I also found the following setup:
QUOTE '\b' - quote with backspace
DELIMITER E'\t' - delimiter with tabs
ESCAPE '\' - and escape with a backslash
to make both COPY TO and COPY FROM as reliable as possible, including when dealing with sophisticated/nested JSON columns.
I want to copy a CSV file to a Postgres table. There are about 100 columns in this table, so I do not want to rewrite them if I don't have to.
I am using the \copy table from 'table.csv' delimiter ',' csv; command but without a table created I get ERROR: relation "table" does not exist. If I add a blank table I get no error, but nothing happens. I tried this command two or three times and there was no output or messages, but the table was not updated when I checked it through PGAdmin.
Is there a way to import a table with headers included like I am trying to do?
This worked. The first row had column names in it.
COPY wheat FROM 'wheat_crop_data.csv' DELIMITER ';' CSV HEADER
With the Python library pandas, you can easily create column names and infer data types from a csv file.
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('postgresql://user:pass@localhost/db_name')
df = pd.read_csv('/path/to/csv_file')
df.to_sql('pandas_db', engine)
The if_exists parameter can be set to replace or append to an existing table, e.g. df.to_sql('pandas_db', engine, if_exists='replace'). This works for additional input file types as well, docs here and here.
Alternative: via the terminal, without server-side file permissions
The pg documentation, in its Notes section, says:
The path will be interpreted relative to the working directory of the server process (normally the cluster's data directory), not the client's working directory.
So, generally, using psql or any client, even on a local server, you will have problems... And if you're writing a COPY command for other users, e.g. in a GitHub README, the reader will have problems...
The only way to express a relative path with client permissions is to use STDIN,
When STDIN or STDOUT is specified, data is transmitted via the connection between the client and the server.
as remembered here:
psql -h remotehost -d remote_mydb -U myuser -c \
"copy mytable (column1, column2) from STDIN with delimiter as ','" \
< ./relative_path/file.csv
I have been using this function for a while with no problems. You just need to provide the number of columns in the csv file, and it will take the header names from the first row and create the table for you:
create or replace function data.load_csv_file
(
target_table text, -- name of the table that will be created
csv_file_path text,
col_count integer
)
returns void
as $$
declare
iter integer; -- dummy integer to iterate columns with
col text; -- to keep column names in each iteration
col_first text; -- first column name, e.g., top left corner on a csv file or spreadsheet
begin
set schema 'data';
create table temp_table ();
-- add just enough number of columns
for iter in 1..col_count
loop
execute format ('alter table temp_table add column col_%s text;', iter);
end loop;
-- copy the data from csv file
execute format ('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_file_path);
iter := 1;
col_first := (select col_1
from temp_table
limit 1);
-- update the column names based on the first row which has the column names
for col in execute format ('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
loop
execute format ('alter table temp_table rename column col_%s to %s', iter, col);
iter := iter + 1;
end loop;
-- delete the columns row // using quote_ident or %I does not work here!?
execute format ('delete from temp_table where %s = %L', col_first, col_first);
-- change the temp table name to the name given as parameter, if not blank
if length (target_table) > 0 then
execute format ('alter table temp_table rename to %I', target_table);
end if;
end;
$$ language plpgsql;
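A hypothetical call, assuming a roughly 100-column file as in the question (the table name and path are placeholders):
select data.load_csv_file('my_table', '/path/to/table.csv', 100);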
## csv with header
$ psql -U$db_user -h$db_host -p$db_port -d DB_NAME \
-c "\COPY TB_NAME FROM 'data_sample.csv' WITH (FORMAT CSV, header);"
## csv without header
$ psql -U$db_user -h$db_host -p$db_port -d DB_NAME \
-c "\COPY TB_NAME FROM 'data_sample.csv' WITH (FORMAT CSV);"
## csv without header, specify column
$ psql -U$db_user -h$db_host -p$db_port -d DB_NAME \
-c "\COPY TB_NAME(COL1,COL2) FROM 'data_sample.csv' WITH (FORMAT CSV);"
All columns in the CSV must match the table's columns (or the specified column list).
More about COPY: https://www.postgresql.org/docs/9.2/sql-copy.html
You can use d6tstack, which creates the table for you and is faster than pd.to_sql() because it uses native DB import commands. It supports Postgres as well as MySQL and MS SQL.
import pandas as pd
import d6tstack.utils

df = pd.read_csv('table.csv')
uri_psql = 'postgresql+psycopg2://usr:pwd@localhost/db'
d6tstack.utils.pd_to_psql(df, uri_psql, 'table')
It is also useful for importing multiple CSVs, handling data schema changes and/or preprocessing with pandas (e.g. for dates) before writing to the db; see further down in the examples notebook:
import glob
import d6tstack.combine_csv

# apply_fun is a user-supplied pandas preprocessing function
d6tstack.combine_csv.CombinerCSV(glob.glob('*.csv'),
apply_after_read=apply_fun).to_psql_combine(uri_psql, 'table')
I want to export a Postgres database into a CSV file. Is this possible?
If it is possible, then how can I do this? I have seen that we can convert a particular table into a CSV file but I don't know about a whole database.
I made this pl/pgsql function to create one .csv file per table (excluding views, thanks to @tarikki):
CREATE OR REPLACE FUNCTION db_to_csv(path TEXT) RETURNS void AS $$
declare
tables RECORD;
statement TEXT;
begin
FOR tables IN
SELECT (table_schema || '.' || table_name) AS schema_table
FROM information_schema.tables t INNER JOIN information_schema.schemata s
ON s.schema_name = t.table_schema
WHERE t.table_schema NOT IN ('pg_catalog', 'information_schema')
AND t.table_type NOT IN ('VIEW')
ORDER BY schema_table
LOOP
statement := 'COPY ' || tables.schema_table || ' TO ''' || path || '/' || tables.schema_table || '.csv' ||''' DELIMITER '';'' CSV HEADER';
EXECUTE statement;
END LOOP;
return;
end;
$$ LANGUAGE plpgsql;
And I use it this way:
SELECT db_to_csv('/home/user/dir');
-- this will create one csv file per table, in /home/user/dir/
You can use this at psql console:
\copy (SELECT foo,bar FROM whatever) TO '/tmp/file.csv' DELIMITER ',' CSV HEADER
Or in a bash console:
psql -P format=unaligned -P tuples_only -P fieldsep=\, -c "SELECT foo,bar FROM whatever" > output_file
I modified jlldoras' brilliant answer by adding one line to prevent the script from trying to copy views:
CREATE OR REPLACE FUNCTION db_to_csv(path TEXT) RETURNS void AS $$
declare
tables RECORD;
statement TEXT;
begin
FOR tables IN
SELECT (table_schema || '.' || table_name) AS schema_table
FROM information_schema.tables t INNER JOIN information_schema.schemata s
ON s.schema_name = t.table_schema
WHERE t.table_schema NOT IN ('pg_catalog', 'information_schema', 'configuration')
AND t.table_type NOT IN ('VIEW')
ORDER BY schema_table
LOOP
statement := 'COPY ' || tables.schema_table || ' TO ''' || path || '/' || tables.schema_table || '.csv' ||''' DELIMITER '';'' CSV HEADER';
EXECUTE statement;
END LOOP;
return;
end;
$$ LANGUAGE plpgsql;
If you want to specify the database and user while exporting, you can just modify the answer given by Piotr as follows:
psql -P format=unaligned -P tuples_only -P fieldsep=\, -c "select * from tableName" > tableName_exp.csv -U <USER> -d <DB_NAME>
Do you want one big CSV file with data from all tables?
Probably not. You want separate files for each table, or one big file with more information than can be expressed in a CSV file header.
Separate files
Other answers show how to create separate files for each table. You can query the database to list all tables with a query like this:
SELECT DISTINCT table_name
FROM information_schema.columns
WHERE table_schema='public'
AND position('_' in table_name) <> 1
ORDER BY 1
One big file
One big file with all tables, in the text format used by the PostgreSQL COPY command, can be created with pg_dump. The output will also contain all the CREATE TABLE, CREATE FUNCTION statements etc., but with Python, Perl or a similar language you can easily extract only the data.
I downloaded a copy of RazorSQL, opened the database server, right-clicked on the database, and selected Export Tables; it gave me the option of CSV, Excel, SQL, etc.