Add "Auto Increment Primary Key" column in Postgresql when import TXT file [duplicate] - postgresql

I have a CSV file with 10 columns. After creating a PostgreSQL table with 4 columns, I want to copy some of the 10 columns into the table.
The columns of my CSV file are:
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
The columns of my PostgreSQL table should be:
x2 x5 x7 x10

If it is an ad hoc task
Create a temporary table with all the columns in the input file
create temporary table t (x1 integer, ... , x10 text)
Copy from the file into it:
copy t (x1, ... , x10)
from '/path/to/my_file'
with (format csv)
Now insert into the definitive table from the temp:
insert into my_table (x2, x5, x7, x10)
select x2, x5, x7, x10
from t
And drop it:
drop table t
If it is a frequent task
Use the file_fdw extension. As superuser:
create extension file_fdw;
create server my_csv foreign data wrapper file_fdw;
create foreign table my_csv (
x1 integer,
x2 text,
x3 text
) server my_csv
options (filename '/tmp/my_csv.csv', format 'csv' )
;
Grant select permission on the table to the user who will read it:
grant select on table my_csv to the_read_user;
Then whenever necessary read directly from the csv file as if it were a table:
insert into my_table (x2)
select x2
from my_csv
where x1 = 2

You can provide the columns you want to fill with the COPY command. Like so:
\copy your_table (x2,x5,x7,x10) FROM '/path/to/your-file.csv' DELIMITER ',' CSV;
Here's the doc for the COPY command.

As other answers have pointed out, it has long been possible to specify which table columns to copy into. However, without a way to reference column names in the CSV, this was of little use beyond loading into a table whose columns are in a different order.
Fortunately, as of Postgres 9.3, it's possible to copy columns not only from a file or from standard input, but also from a shell command using PROGRAM:
PROGRAM
A command to execute. In COPY FROM, the input is read from standard output of the command, and in COPY TO, the output is written to the standard input of the command.
Note that the command is invoked by the shell, so if you need to pass any arguments to a shell command that come from an untrusted source, you must be careful to strip or escape any special characters that might have a special meaning for the shell. For security reasons, it is best to use a fixed command string, or at least avoid passing any user input in it.
This was the missing piece needed for such an eagerly awaited feature. For example, we can use this option in combination with cut (on a UNIX-like system) to select certain columns by position:
COPY my_table (x2, x5, x7, x10) FROM PROGRAM 'cut -d "," -f 2,5,7,10 /path/to/file.csv' WITH (FORMAT CSV, HEADER)
However, cut has several limitations when manipulating CSVs: it can't properly handle strings with commas (or other delimiters) inside them, and it doesn't allow selecting columns by name.
There are several other open source command-line tools that are better at manipulating CSV files, such as csvkit or miller. Here's an example using miller to select columns by name:
COPY my_table (x2, x5, x7, x10) FROM PROGRAM 'mlr --csv cut -f x2,x5,x7,x10 /path/to/file.csv' WITH (FORMAT CSV, HEADER)

I arrived here looking for a way to load only a subset of columns, but apparently it's not possible directly. So, use awk (or cut) to extract the wanted columns into a new file, new_file:
$ awk '{print $2, $5, $7, $10}' file > new_file
and load new_file instead. You can also pipe the output straight to psql:
$ cut -d \ -f 2,5,7,10 file |
psql -h host -U user -c "COPY table(col1,col2,col3,col4) FROM STDIN DELIMITER ' '" database
Notice COPY, not \COPY.
Update:
As was pointed out in the comments, neither of the above examples can handle quoted delimiters in the data. The same goes for embedded newlines, since awk and cut are not CSV-aware. Quoted delimiters can be handled with GNU awk, though.
This is a three-column file:
$ cat file
1,"2,3",4
Using GNU awk's FPAT variable we can change the order of the fields (or get a subset of them) even when the quoted fields have field separators in them:
$ gawk 'BEGIN{FPAT="([^,]*)|(\"[^\"]+\")";OFS=","}{print $2,$1,$3}' file
"2,3",1,4
Explained:
$ gawk '
BEGIN { # instead of field separator FS
FPAT="([^,]*)|(\"[^\"]+\")" # ... we define field pattern FPAT
OFS="," # output field separator OFS
}
{
print $2,$1,$3 # change field order
# print $2 # or get a subset of fields
}' file
Notice that FPAT is GNU awk only. For other awks it's just a regular variable.

You could take James Brown's suggestion further and do it all in one line:
$ awk -F ',' '{print $2","$5","$7","$10}' file | psql -d db -c "\copy MyTable from STDIN csv header"

If the number of rows COPY reports as imported is not important to you, you could also:
create two tables:
t1 (x1 x2 x3 x4 x5 x6 x7 x8 x9 x10): with all the columns of the CSV file
t2 (x2 x5 x7 x10): with just the columns you need
then create:
a trigger function that inserts the desired columns into t2 instead and returns NULL to prevent the row from being inserted into t1
a trigger for t1 (BEFORE INSERT FOR EACH ROW) that calls this function; a minimal sketch follows below.
Especially with larger CSV files, BEFORE INSERT triggers are also useful for filtering out rows with certain properties beforehand, and you can do type conversions as well.
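A rough sketch of that setup, assuming t1 and t2 already exist with the columns listed above (the function and trigger names are just placeholders):
create or replace function move_to_t2() returns trigger as $$
begin
    -- keep only the wanted columns, writing them to t2
    insert into t2 (x2, x5, x7, x10)
    values (new.x2, new.x5, new.x7, new.x10);
    -- returning NULL prevents the row from being stored in t1
    return null;
end;
$$ language plpgsql;

create trigger t1_move_to_t2
    before insert on t1
    for each row
    execute function move_to_t2();  -- use "execute procedure" on PostgreSQL 10 and older
With the trigger in place, a plain copy t1 from '/path/to/my_file' with (format csv) ends up filling t2 while leaving t1 empty.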

To load data from a spreadsheet (Excel or OpenOffice Calc) into PostgreSQL:
Save the spreadsheet sheet as a CSV file. The preferred method is to open the spreadsheet in OpenOffice Calc and save it from there. In the “Export to text file” window choose Character Set: Unicode (UTF-8), Field Delimiter: “,” and Text Delimiter: “"”. A message will be displayed saying that only the active sheet is saved. Note: the file has to be saved in a folder the server can read, not on the desktop, and has to be saved in UTF-8 (PostgreSQL is set up for UTF-8 encoding by default). If it is saved on the desktop, PostgreSQL will give an “access denied” message and won't load it.
In PostgreSQL, create an empty table with the same number of columns as the spreadsheet.
Note: for each column, the column name and data type have to match. Also, make sure character varying columns are long enough for the data.
Then, in the SQL window in PostgreSQL, run:
copy "ABC"."def" from E'C:\\tmp\\blabla.csv' delimiters ',' CSV HEADER;
NOTE: Here C:\\tmp is the folder where the CSV file “blabla” is saved. “ABC”.“def” is the table created in PostgreSQL, where "ABC" is the schema and "def" is the actual table. Then “execute query” by pressing the green button on top. CSV HEADER is needed when the CSV file starts with a heading row of column names.
If everything is OK, no error message will be displayed and the data from the CSV file will be loaded into the PostgreSQL table. If there is an error message, proceed as follows:
If the error message says that the data is too long for a specific column, increase the column size (see the ALTER TABLE sketch below). This happens mostly with character and character varying columns. Then run “execute query” again.
If the error message says that the data type doesn't match a particular column, change the data type of the PostgreSQL column to match the data in the CSV file.
In your case, after creating the CSV file, delete the unwanted columns and match the columns in the PostgreSQL table.
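Both fixes can be applied with ALTER TABLE; the column names below are placeholders for whichever column the error message points at:
-- widen a character varying column that is too short for the data
alter table "ABC"."def" alter column some_text_column type character varying(500);
-- change a column's type to match what the CSV actually contains
alter table "ABC"."def" alter column some_number_column type numeric
    using some_number_column::numeric;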

One quick way to copy a table to your local directory is:
\copy (select * from table_name) to 'data.csv' CSV;

Related

copy columns of a csv file into postgresql table

I have a CSV file with 12, 11, 10 or 5 columns.
After creating a PostgreSQL table with 12 columns, I want to copy this CSV into the table.
I use this request:
COPY absence(champ1, champ2, num_agent, nom_prenom_agent, code_gestion, code_service, calendrier_agent, date_absence, code_absence, heure_absence, minute_absence, periode_absence)
FROM 'C:\temp\absence\absence.csv'
DELIMITER '\'
CSV
My CSV file contains 80,000 lines.
Example:
20\05\ 191\MARKEY CLAUDIE\GA0\51110\39H00\21/02/2020\1471\03\54\Matin
21\05\ 191\MARKEY CLAUDIE\GA0\51110\39H00\\8130\7H48\Formation avec repas\
30\05\ 191\MARKEY CLAUDIE\GA0\51430\39H00\\167H42\
22\9993\Temps de déplacement\98\37
When I execute the query, I get a message indicating that there is missing data for the lines with fewer than 12 fields.
Is there a trick?
copy is extremely fast and efficient, but less flexible because of that. Specifically, it can't cope with files that have a different number of "columns" on each line.
You can either use a different import tool, or, if you want to stick to built-in tools, copy the file into a staging table that only has a single column, then use Postgres string functions to split the lines into the columns:
create unlogged table absence_import
(
line text
);
\COPY absence_import(line) FROM 'C:\temp\absence\absence.csv' DELIMITER E'\b' CSV
E'\b' is the "backspace" character, which can't really appear in a text file, so no column splitting takes place.
Once you have imported the file, you can split each line using string_to_array() and then insert that into the real table:
insert into absence(champ1, champ2, num_agent, nom_prenom_agent, code_gestion, code_service, calendrier_agent, date_absence, code_absence, heure_absence, minute_absence, periode_absence)
select line[1], line[2], line[3], .....
from (
select string_to_array(line, '\') as line
from absence_import
) t;
If there are non-text columns, you might need to cast the values to the target data type explicitly, e.g. line[3]::int.
You can add additional expressions to deal with missing columns, e.g. something like: coalesce(line[10], 'default value')
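For instance, a select against the staging table can combine both; the int cast on num_agent and the 'Matin' default are assumptions here, so adjust them to your actual column types and defaults:
select line[1] as champ1,                              -- text column, no cast needed
       line[3]::int as num_agent,                      -- explicit cast for a non-text column
       coalesce(line[12], 'Matin') as periode_absence  -- default for a missing trailing field
from (
    select string_to_array(line, '\') as line
    from absence_import
) t;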

Which delimiter to use when loading CSV data into Postgres?

I've come across a problem with loading some CSV files into my Postgres tables. I have data that looks like this:
ID,IS_ALIVE,BODY_TEXT
123,true,Hi Joe, I am looking for a new vehicle, can you help me out?
Now, the problem here is that the text in what is supposed to be the BODY_TEXT column is unstructured email data and can contain any sort of characters, and when I run the following COPY command it's failing because there are multiple , characters within the BODY_TEXT.
COPY sent from ('my_file.csv') DELIMITER ',' CSV;
How can I resolve this so that everything in the BODY_TEXT column gets loaded as-is without the load command potentially using characters within it as separators?
In addition to fixing the source file format, you can do it in PostgreSQL itself.
Load all lines from the file into a temporary table:
create temporary table t (x text);
copy t from 'foo.csv';
Then you can split each string using a regexp like:
select regexp_matches(x, '^([0-9]+),(true|false),(.*)$') from t;
regexp_matches
---------------------------------------------------------------------------
{123,true,"Hi Joe, I am looking for a new vehicle, can you help me out?"}
{456,false,"Hello, honey, there is what I want to ask you."}
(2 rows)
You can use this query to load data to your destination table:
insert into sent(id, is_alive, body_text)
select x[1], x[2], x[3]
from (
select regexp_matches(x, '^([0-9]+),(true|false),(.*)$') as x
from t) t

COPY only some columns from an input CSV?

I have created a table named 'con' in my database, which has two columns named 'date' and 'kgs'. I am trying to extract data from a 'hi.rpt' file copied to this location, 'H:Sir\data\reporting\hi.rpt', and want to store the values in the table 'con' in my database.
I have tried this code in pgAdmin.
When I run:
COPY con (date,kgs)
FROM 'H:Sir\data\reporting\hi.rpt'
WITH DELIMITER ','
CSV HEADER
date AS 'Datum/Uhrzeit'
kgs AS 'Summe'
I get the error:
ERROR: syntax error at or near "date"
LINE 5: date AS 'Datum/Uhrzeit'
^
********** Error **********
ERROR: syntax error at or near "date"
SQL state: 42601
Character: 113
"hi.rpt" file from which i am reading the data look like this:
Datum/Uhrzeit,Sta.,Bez.,Unit,TBId,Batch,OrderNr,Mat1,Total1,Mat2,Total2,Mat3,Total3,Mat4,Total4,Mat5,Total5,Mat6,Total6,Summe
41521.512369(04.09.13 12:17:48),TB01,TB01,005,300,9553,,2,27010.47,0,0.00,0,0.00,3,1749.19,0,0.00,0,0.00,28759.66
41521.547592(04.09.13 13:08:31),TB01,TB01,005,300,9570,,2,27057.32,0,0.00,0,0.00,3,1753.34,0,0.00,0,0.00,28810.66
Is it possible to extract only two values from the 20 columns in this 'hi.rpt' file or not?
Or is there just a mistake in the syntax that I have written?
What is the correct way to write it?
I don't know where you got that syntax, but COPY doesn't take a list of column aliases like that. See the help:
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | PROGRAM 'command' | STDIN }
[ [ WITH ] ( option [, ...] ) ]
(AS isn't one of the listed options; to see the full output run \h copy in psql, or look at the manual for the COPY command online).
There is no mapping facility in COPY that lets you read only some columns of the input CSV. It'd be really useful, but nobody's had the time/interest/funding to implement it yet. It's really only one of many data transform/filtering tasks people want anyway.
PostgreSQL expects the column-list given in COPY to be in the same order, left-to-right, as what's in the CSV file, and have the same number of entries as the CSV file has columns. So if you write:
COPY con (date,kgs)
then PostgreSQL will expect an input CSV with exactly two columns. It'll use the first csv column for the "date" table column and the second csv column for the "kgs" table column. It doesn't care what the CSV headers are, they're ignored if you specify WITH (FORMAT CSV, HEADER ON), or treated as normal data rows if you don't specify HEADER.
PostgreSQL 9.3 added FROM PROGRAM to COPY, so you could run a shell command to read the file and filter it. A simple Python or Perl script would do the job.
If it's a small file, just open a copy in the spreadsheet of your choice as a csv file, delete the unwanted columns, and save it, so only the date and kgs columns remain.
Alternately, COPY to a staging table that has all the same columns as the CSV, then do an INSERT INTO ... SELECT to transfer just the wanted data into the real target table.
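A rough sketch of that staging-table route for this particular file: the staging column names below are just lower-cased versions of the header row, everything is loaded as text, and the numeric cast on summe is an assumption about what kgs should hold, so adjust both as needed.
create unlogged table hi_staging (
    datum_uhrzeit text, sta text, bez text, unit text, tbid text,
    batch text, ordernr text, mat1 text, total1 text, mat2 text,
    total2 text, mat3 text, total3 text, mat4 text, total4 text,
    mat5 text, total5 text, mat6 text, total6 text, summe text
);

copy hi_staging from 'H:Sir\data\reporting\hi.rpt' with (format csv, header);

insert into con ("date", kgs)
select datum_uhrzeit, summe::numeric   -- add parsing/casts to match con's actual column types
from hi_staging;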

Copy a few of the columns of a csv file into a table


Loading large amount of data into Postgres Hstore

The hstore documentation only talks about using "insert" to load hstore data one row at a time.
Is there any way to do a bulk upload of several hundred thousand rows,
which could be megabytes or gigabytes, into a Postgres hstore?
The copy command seems to work only for uploading CSV file columns.
Could someone post an example? Preferably a solution that works with python/psycopg.
The above answers seem incomplete, in that if you try to copy in multiple columns, including a column with an hstore type, and use a comma delimiter, COPY gets confused, like:
$ cat test
1,a=>1,b=>2,a
2,c=>3,d=>4,b
3,e=>5,f=>6,c
create table b(a int4, h hstore, c varchar(10));
CREATE TABLE
copy b(a,h,c) from 'test' CSV;
ERROR: extra data after last expected column
CONTEXT: COPY b, line 1: "1,a=>1,b=>2,a"
Similarly:
copy b(a,h,c) from 'test' DELIMITER ',';
ERROR: extra data after last expected column
CONTEXT: COPY b, line 1: "1,a=>1,b=>2,a"
This can be fixed, though, by importing as a CSV and quoting the field to be imported into hstore:
$ cat test
1,"a=>1,b=>2",a
2,"c=>3,d=>4",b
3,"e=>5,f=>6",c
copy b(a,h,c) from 'test' CSV;
COPY 3
select h from b;
h
--------------------
"a"=>"1", "b"=>"2"
"c"=>"3", "d"=>"4"
"e"=>"5", "f"=>"6"
(3 rows)
Quoting is only allowed in CSV format, so importing as CSV is required, but you can explicitly set the field delimiter and quote character to values other than ',' and '"' using the DELIMITER and QUOTE arguments for COPY.
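For example, assuming a hypothetical input file that uses semicolons between columns and single quotes around the hstore field, something like this should work:
$ cat test2
1;'a=>1,b=>2';a
2;'c=>3,d=>4';b

copy b(a,h,c) from 'test2' with (format csv, delimiter ';', quote '''');
Here the commas inside the hstore value no longer collide with the column delimiter, and the single-quote QUOTE character keeps the field together.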
Both insert and copy appear to work in natural ways for me:
create table b(h hstore);
insert into b(h) VALUES ('a=>1,b=>2'::hstore), ('c=>2,d=>3'::hstore);
select * from b;
h
--------------------
"a"=>"1", "b"=>"2"
"c"=>"2", "d"=>"3"
(2 rows)
$ cat > /tmp/t.tsv
a=>1,b=>2
c=>2,d=>3
^d
copy b(h) from '/tmp/t.tsv';
select * from b;
h
--------------------
"a"=>"1", "b"=>"2"
"c"=>"2", "d"=>"3"
"a"=>"1", "b"=>"2"
"c"=>"2", "d"=>"3"
(4 rows)
You can definitely do this with the copy binary command.
I am not aware of a Python library that can do this, but I have a Ruby one that can help you understand the column encodings.
https://github.com/pbrumm/pg_data_encoder