I am writing a SQL script to copy multiple .CSV files into a postgres database like this:
COPY product(title, department) from 'ys.csv' CSV HEADER;
I have multiple files I want to copy in. I don't want:
COPY product(title, department) from 'ys1.csv' CSV HEADER;
COPY product(title, department) from 'ys2.csv' CSV HEADER;
COPY product(title, department) from 'ys3.csv' CSV HEADER;
COPY product(title, department) from 'ys4.csv' CSV HEADER;
COPY product(title, department) from 'ys5.csv' CSV HEADER;
I would like to use a for loop for this instead of multiple copy commands. Is this possible? Thanks
On Linux, pipe the concatenated files to psql and make COPY read from standard input:
cat /path_to/ys*.csv | psql -c 'COPY product(title, department) from stdin CSV HEADER'
Look for the equivalent on other operating systems.
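A client-side sketch of the same idea in Python (this is an illustration, not part of the original answer; file names and the column list are taken from the question): it keeps the first file's header and strips the rest, so the combined stream is safe for COPY ... FROM STDIN CSV HEADER.

```python
import glob
import io
import sys

def concat_csvs_skip_headers(paths):
    """Concatenate CSV files, keeping only the first file's header line."""
    out = io.StringIO()
    for i, path in enumerate(paths):
        with open(path) as f:
            header = f.readline()
            if i == 0:
                out.write(header)  # one header survives, for COPY ... CSV HEADER
            out.write(f.read())    # data rows of every file
    return out.getvalue()

if __name__ == "__main__":
    # pipe the result into psql:
    #   python concat.py | psql -c "COPY product(title, department) FROM STDIN CSV HEADER"
    sys.stdout.write(concat_csvs_skip_headers(sorted(glob.glob("ys*.csv"))))
```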
I tried the answer above but got an error when working with more than one file; I think the second file's header was not cut off.
This worked for me:
# get filenames
IMPFILES=(path/FileNamepart*.csv)
# import the files
for i in "${IMPFILES[@]}"
do
  psql -U user -d database -c "\copy TABLE_NAME from '$i' DELIMITER ';' CSV HEADER"
  # move the imported file
  mv "$i" /FilePath
done
In my case I move every file after it is imported. If an error occurs I know where to look, and I can run the script again if new files are put in that location.
If you want to use the PROGRAM keyword (Postgres 9.3 or later) but each CSV file has a header, you can use awk:
COPY product(title, department) FROM PROGRAM 'awk FNR-1 ys*.csv | cat' DELIMITER ',' CSV;
Starting with Postgres 9.3, you can run a shell command using the PROGRAM keyword within the COPY command.
COPY product(title, department) FROM PROGRAM 'cat ys*.csv' WITH (FORMAT csv, HEADER);
Note that HEADER only skips the first line of the combined stream, so the headers of the second and later files would be loaded as data; the awk variant above avoids that.
You can loop through the filenames using pg_ls_dir.
DO $$
DECLARE
    file_path TEXT;  -- path where your CSV files are
    fn_i TEXT;       -- name of the CSV file currently being inserted
    mytable TEXT;    -- name of the table to insert data into
BEGIN
    -- You probably need to keep the files under the PostgreSQL data path to avoid permission issues.
    file_path := 'C:/Program Files/PostgreSQL/9.6/data/my_csvs/';
    -- You can give columns too, since the value just goes into an EXECUTE statement.
    mytable := 'product(title,department)';
    -- Queue up all files in the directory, prepended with the path.
    CREATE TEMP TABLE files AS
    SELECT file_path || pg_ls_dir AS fn
    FROM pg_ls_dir(file_path);
    LOOP
        fn_i := (SELECT fn FROM files LIMIT 1);  -- pick the first file
        RAISE NOTICE 'fn: %', fn_i;
        EXECUTE 'COPY ' || mytable || ' FROM ''' || fn_i || ''' WITH CSV HEADER';
        DELETE FROM files WHERE fn = fn_i;  -- remove the imported file from the queue
        EXIT WHEN (SELECT count(*) FROM files) = 0;
    END LOOP;
END $$;
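For comparison, the queue logic above can be prototyped client-side. A hedged Python sketch (directory, table, and column names are placeholders from the question) that lists a directory and emits one COPY statement per CSV file; it assumes the paths contain no single quotes:

```python
import glob
import os

def build_copy_statements(directory, table="product(title, department)"):
    """List a directory and build one COPY statement per CSV file in it."""
    statements = []
    for path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        # assumes the path contains no single quotes that would need escaping
        statements.append(
            "COPY {} FROM '{}' WITH (FORMAT csv, HEADER);".format(table, path)
        )
    return statements

for stmt in build_copy_statements("."):
    print(stmt)
```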
Just one more option, using pg_ls_dir and format(), inserting all files from the 'E:\Online_Monitoring\Processed\' folder into the ONLMON_T_Online_Monitoring table.
DO $$
DECLARE
directory_path VARCHAR(500);
rec RECORD;
BEGIN
directory_path := 'E:\\Online_Monitoring\\Processed\\';
FOR rec IN SELECT pg_ls_dir(directory_path) AS file_name
LOOP
EXECUTE format(
'
COPY ONLMON_T_Online_Monitoring
(
item
, storeCode
, data
)
FROM %L
WITH (FORMAT CSV, HEADER);
', directory_path || rec.file_name
);
END LOOP;
END; $$;
I'm attempting to dynamically create a script that gets saved as a bat file that will be scheduled to execute daily via Windows Task Scheduler. The script performs full database backups for each Postgres database using pg_dump.
The current script is as follows:
COPY (SELECT 'pg_dump '|| datname || ' > e:\postgresbackups\FULL\' || datname || '_%date:~4,2%-%date:~7,2%-%date:~10,4%_%time:~0,2%_%time:~3,2%_%time:~6,2%.dump' FROM pg_database) TO 'E:\PostgresBackups\Script\FULL_Postgres_Backup_Job_TEST.bat' (format csv, delimiter ';');
An example of the output is as follows:
pg_dump postgres > e:\postgresbackups\FULL\postgres_%date:~4,2%-%date:~7,2%-%date:~10,4%%time:~0,2%%time:~3,2%_%time:~6,2%.dump
I need help with updating my code so that the output will include double quotes around the name of the dump file; however, when I add this to my COPY script it adds more than what is necessary to the output. I would like the output to look like the following which includes the double-quotes:
pg_dump postgres > "e:\postgresbackups\FULL\postgres_%date:~4,2%-%date:~7,2%-%date:~10,4%%time:~0,2%%time:~3,2%_%time:~6,2%.dump"
Any help would be greatly appreciated!
Thanks to @Mike Organek's comment, my issue has been resolved by switching the format from CSV to TEXT. Now that I enclose the dump filename in double quotes, the output is what I expected and works as intended. The only odd thing is that the output contains a doubled backslash in the filename. My code has been updated as follows:
COPY (SELECT 'pg_dump '|| datname || ' > "e:\postgresbackups\FULL\' || datname || '_%date:~4,2%-%date:~7,2%-%date:~10,4%_%time:~0,2%_%time:~3,2%_%time:~6,2%.dump"' FROM pg_database) TO 'E:\PostgresBackups\Script\FULL_Postgres_Backup_Job.bat' (format text, delimiter ';');
An example of the output that gets created within the bat file is as follows:
pg_dump postgres > "e:\\postgresbackups\\FULL\\postgres_%date:~4,2%-%date:~7,2%-%date:~10,4%_%time:~0,2%_%time:~3,2%_%time:~6,2%.dump"
As you can see, it adds a double backslash; however, the pg_dump executes successfully!
Below is my procedure, executed to upload a file into a table and do joins, etc.
CREATE OR REPLACE FUNCTION sp_product_price()
RETURNS void
LANGUAGE 'plpgsql'
COST 100
AS $BODY$
BEGIN
truncate table prd_product_data;
truncate table price_import;
COPY price_import FROM 'C:\Users\Ram\Documents\prices.csv' CSV HEADER;
truncate table active_product_price;
insert into active_product_price(key,name,price)
SELECT prd.key,prd.name,prd.price FROM prd_product_data prd JOIN price_import import ON prd.name = import.name;
raise notice 'success';
END
$BODY$;
The above procedure gives the error: could not open file "C:\Users\Ram\Documents\prices.csv" for reading: No such file or directory. HINT: COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy.
I have given everyone access to the file in its properties.
I tried \copy in the procedure, but that gives: syntax error at or near "\".
\copy works when I run it in psql, but not in the procedure above.
Is there a way to import the file into the table in the above procedure/function?
The procedure and the COPY statement are running on the database server, so the file C:\Users\Ram\Documents\prices.csv must be on the database server as well (and your database user must either be a superuser or a member of pg_read_server_files).
The only way you can use COPY to load data from the client is COPY ... FROM STDIN, and you won't be able to use that in a procedure.
\copy is a psql command, not an SQL statement, so it cannot be used in a procedure either.
My suggestion is to use \copy, but to forget about the procedure.
Basically, what I want to achieve is something like this:
mypath:= '~/Desktop/' || my_variable || '.csv'
COPY (SELECT * FROM my_table) TO mypath CSV DELIMITER ',' HEADER;
Where the value of my_variable will change dynamically.
Can someone help me on this problem?
Here and here you can find two different options (either via a bash script or a sql script with variables) to solve your problem.
Since you are using Windows, the only viable solution is the one with the variable in the SQL file. On Windows you should also be able to use the psql command-line utility to execute your SQL file and pass the path as a parameter, like this:
-- Example sql file content:
COPY (SELECT * FROM my_table) TO :path CSV DELIMITER ',' HEADER;
Command line example (assuming my_variable is a shell variable):
psql -f /tmp/file.sql -v path="'~/Desktop/${my_variable}.csv'"
Another option I just found is to write the CSV content to stdout and redirect it to a file at the shell level.
Example code (based on this answer Linux Example code):
psql -c "COPY (SELECT * FROM my_table) TO STDOUT CSV DELIMITER ',' HEADER;" > ~/Desktop/"$my_variable".csv
EDIT 1, based on the new requirement that the variable comes from the PostgreSQL database:
I quickly built a For loop which can loop over the result of a separate query and then execute a sql query for each of the values in the result:
Example code:
DO $$
declare
result record;
BEGIN
FOR result IN Select * FROM (VALUES ('one'), ('two'), ('three')) AS t (path) LOOP
RAISE NOTICE 'path: ~/Desktop/%.csv', result.path;
END LOOP;
END; $$;
This produces the following output:
NOTICE: path: ~/Desktop/one.csv
NOTICE: path: ~/Desktop/two.csv
NOTICE: path: ~/Desktop/three.csv
You can substitute the Select * FROM (VALUES ('one'), ('two'), ('three')) AS t (path) with any query that produces a result containing one path per row.
Second, you can substitute the RAISE NOTICE 'path: ~/Desktop/%.csv', result.path; with your copy query.
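To make that substitution concrete, here is a small Python sketch (my_table and the desktop path are the question's placeholders) that builds one COPY ... TO statement per path value, mirroring what the loop would EXECUTE; note a server-side COPY would need an absolute path the server can write to:

```python
def export_statements(names):
    """Build one COPY ... TO statement per name, as the loop above would EXECUTE."""
    stmts = []
    for name in names:
        path = "~/Desktop/{}.csv".format(name)
        stmts.append(
            "COPY (SELECT * FROM my_table) TO '{}' CSV DELIMITER ',' HEADER;".format(path)
        )
    return stmts

for s in export_statements(["one", "two", "three"]):
    print(s)
```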
You can use the EXECUTE command. A working example:
sql_copy_command := '
    COPY abc_schema_name.xyz_table_name FROM ''' || masterFilePath || ''' DELIMITER '';'' CSV HEADER';
RAISE NOTICE 'sql_copy_command: % ', sql_copy_command;
EXECUTE sql_copy_command;
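One caveat with gluing strings like this: the file path must be quoted as a SQL literal, which is what format()'s %L placeholder does for plain strings. A minimal Python sketch of that quoting rule (doubling embedded single quotes), with hypothetical table and path names:

```python
def sql_literal(value):
    """Quote a string as a SQL literal (the effect of %L for plain strings)."""
    return "'" + value.replace("'", "''") + "'"

def build_copy(table, path):
    # the table name is interpolated as-is here; %I / quote_ident handles identifiers
    return "COPY {} FROM {} DELIMITER ';' CSV HEADER".format(table, sql_literal(path))

print(build_copy("abc_schema_name.xyz_table_name", "/data/master file's.csv"))
```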
Is it possible in the psql console to export a file with the current date at the end of the file name?
The name of the exported file should be like table_20140710.csv. Is it possible to do this dynamically? The date format can differ from the above; it isn't that important.
This is example what i mean:
\set curdate current_date
\copy (SELECT * FROM table) To 'C:/users/user/desktop/table_ ' || :curdate || '.csv' WITH DELIMITER AS ';' CSV HEADER
The exception of the \copy meta command not expanding variables is (meanwhile) documented
Unlike most other meta-commands, the entire remainder of the line is always taken to be the arguments of \copy, and neither variable interpolation nor backquote expansion are performed in the arguments.
To workaround you can build, store and execute the command in multiple steps (similar to the solution Clodoaldo Neto has given):
\set filename 'my fancy dynamic name'
\set command '\\copy (SELECT * FROM generate_series(1, 5)) to ' :'filename'
:command
With this, you need to double (escape) the \ in the embedded meta-command. Keep in mind that \set concatenates all further arguments into the second one, so quote spaces between the arguments. You can inspect the command before executing it with \echo :command.
As an alternative to the local \set command, you could also build the command server side with SQL (the best way depends on where the dynamic content is originating):
SELECT '\copy (SELECT * FROM generate_series(1, 5)) to ''' || :'filename' || '''' AS command \gset
Dynamically build the \copy command and store it in a file. Then execute it with \i
First set tuples only output
\t
Set the output to a file
\o 'C:/users/user/desktop/copy_command.txt'
Build the \copy command
select format(
$$\copy (select * from the_table) To 'C:/users/user/desktop/table_%s.csv' WITH DELIMITER AS ';' CSV HEADER$$
, current_date
);
Restore the output to stdout
\o
Execute the generated command from the file
\i 'C:/users/user/desktop/copy_command.txt'
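The generated command can also be previewed outside psql. A Python sketch (the_table and the desktop path are the example's placeholders) that builds the same date-stamped \copy line the format() call produces; note that current_date renders as an ISO date such as 2014-07-10:

```python
from datetime import date

def dated_copy_command(table="the_table", day=None):
    """Build the \\copy line the format() call above generates."""
    day = day or date.today()
    return (
        "\\copy (select * from {t}) To 'C:/users/user/desktop/{t}_{d}.csv' "
        "WITH DELIMITER AS ';' CSV HEADER"
    ).format(t=table, d=day.isoformat())

print(dated_copy_command())
```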
I'm running PostgreSQL 9.2.6 on OS X 10.6.8. I would like to import data from a CSV file with column headers into a database. I can do this with the COPY statement, but only if I first manually create a table with a column for each column in the CSV file. Is there any way to automatically create this table based on the headers in the CSV file?
Per this question I have tried
COPY test FROM '/path/to/test.csv' CSV HEADER;
But I just get this error:
ERROR: relation "test" does not exist
And if I first create a table with no columns:
CREATE TABLE test ();
I get:
ERROR: extra data after last expected column
I can't find anything in the PostgreSQL COPY documentation about automatically creating a table. Is there some other way to automatically create a table from a CSV file with headers?
There is a very good tool that imports tables into Postgres from a csv file.
It is a command-line tool called pgfutter (with binaries for windows, linux, etc.). One of its big advantages is that it recognizes the attribute/column names as well.
The usage of the tool is simple. For example if you'd like to import myCSVfile.csv:
pgfutter --db "myDatabase" --port "5432" --user "postgres" --pw "mySecretPassword" csv myCSVfile.csv
This will create a table (called myCSVfile) with the column names taken from the csv file's header. Additionally the data types will be identified from the existing data.
A few notes: the pgfutter command varies depending on the binary you use, e.g. it could be pgfutter_windows_amd64.exe (rename it if you intend to use the command frequently). The above command has to be executed in a command-line window (e.g. on Windows run cmd and ensure pgfutter is accessible). If you'd like a different table name, add --table "myTable"; to select a particular database schema, use --schema "mySchema". In case you are accessing an external database, use --host "myHostDomain".
A more elaborate example of pgfutter to import myFile into myTable is this one:
pgfutter --host "localhost" --port "5432" --db "myDB" --schema "public" --table "myTable" --user "postgres" --pw "myPwd" csv myFile.csv
Most likely you will change a few data types (from text to numeric) after the import:
alter table myTable
alter column myColumn type numeric
using (trim(myColumn)::numeric)
There is a second approach, which I found here (from mmatt). Basically you call a function within Postgres (last argument specifies the number of columns).
select load_csv_file('myTable','C:/MyPath/MyFile.csv',24)
Here is mmatt's function code, which I had to modify slightly, because I am working on the public schema. (copy&paste into PgAdmin SQL Editor and run it to create the function)
CREATE OR REPLACE FUNCTION load_csv_file(
target_table text,
csv_path text,
col_count integer)
RETURNS void AS
$BODY$
declare
    iter integer;    -- dummy integer to iterate columns with
    col text;        -- variable to keep the column name at each iteration
    col_first text;  -- first column name, e.g., top left corner on a csv file or spreadsheet
begin
    set schema 'public';
    create table temp_table ();

    -- add just enough columns
    for iter in 1..col_count
    loop
        execute format('alter table temp_table add column col_%s text;', iter);
    end loop;

    -- copy the data from the csv file
    execute format('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_path);

    -- rename the columns based on the first row, which holds the column names
    iter := 1;
    col_first := (select col_1 from temp_table limit 1);
    for col in execute format('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
    loop
        execute format('alter table temp_table rename column col_%s to %s', iter, col);
        iter := iter + 1;
    end loop;

    -- delete the header row
    execute format('delete from temp_table where %s = %L', col_first, col_first);

    -- rename the temp table to the name given as parameter, if not blank
    if length(target_table) > 0 then
        execute format('alter table temp_table rename to %I', target_table);
    end if;
end;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION load_csv_file(text, text, integer)
OWNER TO postgres;
Note: There is a common issue with importing text files related to encoding. The csv file should be in UTF-8 format. However, sometimes this is not quite achieved by the programs, which try to do the encoding. I have overcome this issue by opening the file in Notepad++ and converting it to ANSI and back to UTF8.
I am using csvsql to generate the table layout (it will automatically guess the format):
head -n 20 table.csv | csvsql --no-constraints --tables table_name
And then I use \COPY in psql. That's for me the fastest way to import CSV file.
You can also use sed with csvsql in order to get the desired datatype:
head -n 20 table.csv | csvsql --no-constraints --tables table_name | sed 's/DECIMAL/NUMERIC/' | sed 's/VARCHAR/TEXT/' | sed 's/DATETIME/TIMESTAMP/'
Use sqlite as intermediate step.
Steps:
In the command prompt type: sqlite3
In the sqlite3 CLI type: .mode csv
.import my_csv.csv my_table
.output my_table_sql.sql
.dump my_table
Finally execute that sql in your Postgresql
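The same sqlite3 round-trip can be scripted. A sketch using Python's built-in sqlite3 module (file and table names are placeholders); it loads the CSV into an in-memory table and returns the same kind of dump that .output/.dump would write:

```python
import csv
import sqlite3

def csv_to_sql_dump(csv_path, table="my_table"):
    """Load a CSV into an in-memory SQLite table and return an SQL dump of it."""
    conn = sqlite3.connect(":memory:")
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # first row holds the column names
        cols = ", ".join('"{}" TEXT'.format(c) for c in header)
        conn.execute('CREATE TABLE "{}" ({})'.format(table, cols))
        marks = ", ".join("?" for _ in header)
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table, marks), reader
        )
    return "\n".join(conn.iterdump())  # CREATE TABLE + INSERTs, like .dump
```

Everything comes in as TEXT, so you may still need to adjust column types after executing the dump in Postgres.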
You can't find anything in the COPY documentation, because COPY cannot create a table for you.
You need to do that before you can COPY to it.
I achieved it with these steps:
Convert the csv file to utf8
iconv -f ISO-8859-1 -t UTF-8 file.txt -o file.csv
Use this Python script to generate the SQL that creates the table and copies the data
#!/usr/bin/env python3
import os
# pip install python-slugify
from slugify import slugify

origem = 'file.csv'   # source CSV
destino = 'file.sql'  # generated SQL script
arquivo = os.path.abspath(origem)

with open(origem, 'r') as f:
    header = f.readline().split(';')

# slugify the header cells into valid column names, de-duplicating repeats
head_cells = []
for cell in header:
    value = slugify(cell, separator="_")
    if value in head_cells:
        value = value + '_2'
    head_cells.append(value)

fields = []
for cell in head_cells:
    fields.append("  {} text".format(cell))

table = origem.split('.')[0]
sql = "create table {} (\n{}\n);".format(table, ",\n".join(fields))
sql += "\nCOPY {} FROM '{}' DELIMITER ';' CSV HEADER;".format(table, arquivo)
print(sql)

with open(destino, 'w') as d:
    d.write(sql)
Run the script with
python3 importar.py
Optional: Edit the sql script to adjust the field types (all are text by default)
Run the sql script. Short for console
sudo -H -u postgres bash -c "psql mydatabase < file.sql"
Automatic creation seems to be pretty easy with Python + Pandas.
Install the SQLAlchemy library (plus pandas and a PostgreSQL driver such as psycopg2) in your Python environment
pip install SQLAlchemy==1.4.31
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('postgresql://username:password@localhost:5432/mydatabase')
df=pd.read_csv('example.csv')
df.to_sql('table_name', engine)
I haven't used it, but pgLoader (https://pgloader.io/) is recommended by the pgfutter developers (see answer above) for more complicated problems. It looks very capable.
You can create a new table in DBeaver out of a CSV.
For a single table, I did it very simply and quickly online, through one of the many good converters found on the web.
Just google "convert csv to sql online" and choose one.