Not all rows copy from csv into postgresql

I have a CSV file containing over 2 million rows.
When I use the COPY statement in PostgreSQL, it loads a little over 1 million rows.
I am using the statement below:
copy table (
columns[1],
columns[2],
columns[3],
columns[4],
columns[5],
columns[6],
columns[7],
columns[8]
)
from 'C:\Temp\co11700t_bcp\co11700t_bcp.csv' with delimiter ']' quote '"' CSV;
I bulk-copied the data via a cmd file and used Windows Notepad to set the encoding to UTF-8.

Related

PostgreSQL COPY FROM issue

DROP table if exists legislators;
CREATE table legislators
(
...
)
;
COPY legislators
FROM 'C:\data\legislators.csv'
DELIMITER ',' -- DELIMITER is on a different line from the FROM clause, which raises an ERROR
CSV HEADER;
I am trying to import a CSV file into PostgreSQL using HeidiSQL v11. When I execute a query written across multiple lines as above, it raises an error:
ERROR: syntax error at or near "CSV" LINE 1: CSV HEADER;
However, I found that if I write the FROM clause and DELIMITER ',' together on a single line as below, it works:
COPY legislators
FROM 'C:\data\legislators.csv' DELIMITER ',' -- FROM and DELIMITER must be on the same line for this to work
CSV HEADER;
I know SQL basically ignores whitespace, but I am confused why this happens.
I would appreciate it if someone could help me. Thanks.
That's just because HeidiSQL is not very smart about parsing PostgreSQL statements and gets confused. It executes your query as two separate statements, which causes the error.
Use a different client with PostgreSQL.
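For example, the same multi-line statement runs without error from psql (a sketch, assuming the same table and file path as in the question):
COPY legislators
FROM 'C:\data\legislators.csv'
DELIMITER ','
CSV HEADER;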

Why does my postgresql csv export have more rows than the table?

I am trying to copy a table in a postgresql database (version 10.12) via psql. One of the rows contains strings representing xml data. When I query the database for a row count with this query I get a count of about 50,000:
select count(column) from table;
But when I try to export the data to a csv file the output has more than 1,000,000 rows! I don't understand how a csv export could have a different number of rows than the table!
This is the copy command:
\copy (select column from table) to 'directory/output.csv' with csv;
It doesn't seem to matter if I change the delimiter or quote either. I've tried using | as a delimiter and ` as a quote and the number of rows in the csv was the same. Why is the row count different in the csv export?
The row count is not different: the CSV output simply has linefeeds (LF, ASCII code 10) embedded in fields, which is expected in XML.
If you want one line per row with COPY, don't use CSV, use the text format, that is, just omit with csv. Then newlines are encoded with \n instead of being output verbatim.
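For example, reusing the placeholder names from the question, a text-format export would look like this (just a sketch); embedded newlines in the XML are then written as \n and each table row stays on one output line:
\copy (select column from table) to 'directory/output.txt'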

SQL Server OPENQUERY equivalent in PostgreSQL

Is there a query equivalent to SQL Server's OPENQUERY or OPENROWSET that can be used in PostgreSQL to query data from Excel or CSV files?
You can use PostgreSQL's COPY.
As per the docs:
COPY moves data between PostgreSQL tables and standard file-system files. COPY TO copies the contents of a table to a file, while COPY FROM copies data from a file to a table (appending the data to whatever is in the table already). COPY TO can also copy the results of a SELECT query.
COPY works like this:
Importing a table from CSV
Assuming you already have a table in place with the right columns, the command is as follows:
COPY tblemployee FROM '~/empsource.csv' DELIMITER ',' CSV;
Exporting a CSV from a table.
COPY (select * from tblemployee) TO '~/exp_tblemployee.csv' DELIMITER ',' CSV;
It is important to mention that if your data is Unicode or needs strict encoding, always set client_encoding before running any of the above commands.
To set the client_encoding parameter in PostgreSQL:
set client_encoding to 'UTF8'
or
set client_encoding to 'latin1'
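As a rough sketch (reusing the tblemployee table from the examples above), the encoding is set in the same session just before running the import:
set client_encoding to 'UTF8';
COPY tblemployee FROM '~/empsource.csv' DELIMITER ',' CSV;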
Another thing to guard against is nulls while exporting: if some fields are null, PostgreSQL will write '\N' to represent a null field. This is fine but may cause issues if you are trying to import that data into, say, SQL Server.
A quick fix is to modify the export command by specifying what you would prefer as the null placeholder in the exported CSV:
COPY (select * from tblemployee) TO '~/exp_tblemployee.csv' DELIMITER ',' NULL as E'';
Another common requirement is importing or exporting with a header row.
Import a CSV into a table when the column headers are present in the first row of the CSV file:
COPY tblemployee FROM '~/empsource.csv' DELIMITER ',' CSV HEADER
Export a table to CSV with the headers in the first row:
COPY (select * from tblemployee) TO '~/exp_tblemployee.csv' DELIMITER ',' CSV HEADER

"extra data after last expected column" while trying to import a csv file into postgresql

I am trying to copy the contents of a CSV file into my PostgreSQL DB and I get the error "extra data after last expected column".
The content of my CSV is
agency_id,agency_name,agency_url,agency_timezone,agency_lang,agency_phone
100,RATP (100),http://www.ratp.fr/,CET,,
and my postgresql command is
COPY agency (agency_name, agency_url, agency_timezone) FROM 'myFile.txt' CSV HEADER DELIMITER ',';
Here is my table
CREATE TABLE agency (
agency_id character varying,
agency_name character varying NOT NULL,
agency_url character varying NOT NULL,
agency_timezone character varying NOT NULL,
agency_lang character varying,
agency_phone character varying,
agency_fare_url character varying
);
Column | Type | Modifiers
-----------------+-------------------+-----------
agency_id | character varying |
agency_name | character varying | not null
agency_url | character varying | not null
agency_timezone | character varying | not null
agency_lang | character varying |
agency_phone | character varying |
agency_fare_url | character varying |
Your table has 7 columns.
You need to map the 6 fields from the CSV to 6 columns in the table.
You cannot map only 3 fields from the CSV when the file has 6, as you do in:
\COPY agency (agency_name, agency_url, agency_timezone) FROM 'myFile.txt' CSV HEADER DELIMITER ',';
All fields from the CSV file need to be mapped in the COPY FROM command.
And since you specified CSV, the comma delimiter is the default, so you don't need to specify it.
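A corrected version of the command would list every column that appears in the CSV header (a sketch based on the sample file above):
\COPY agency (agency_id, agency_name, agency_url, agency_timezone, agency_lang, agency_phone) FROM 'myFile.txt' CSV HEADER;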
Not sure this counts as an answer, but I just hit this with a bunch of CSV files and found that simply opening them in Excel and re-saving them with no changes made the error go away. In other words, there is possibly some incorrect formatting in the source file that Excel is able to clean up automatically.
This error can also occur when the Postgres table and the CSV file have the same number of columns, even if you have specified delimiter ',' in the \copy command. You also need to specify CSV.
In my case, one of my columns contained comma-separated data, and I executed:
db=# \copy table1 FROM '/root/db_scripts/input_csv.csv' delimiter ','
ERROR: invalid input syntax for integer: "id"
CONTEXT: COPY quiz_quiz, line 1, column id: "id"
It worked after adding CSV:
db=# \copy table1 FROM '/root/db_scripts/input_csv.csv' delimiter ',' CSV
COPY 47871
For future visitors: when I had this problem, it was because I was using a loop that wrote to the same io.StringIO() buffer across iterations instead of creating a fresh one before committing each query to the database.
If you're encountering this problem, make sure your code is like this:
for tableName in tableNames:
    output = io.StringIO()  # fresh buffer for each table
    ...
    output.seek(0)
    cur.copy_expert(f"COPY {tableName} FROM STDIN", output)
    conn.commit()
And not like this:
output = io.StringIO()  # one buffer reused across every iteration
for tableName in tableNames:
    ...
    output.seek(0)
    cur.copy_expert(f"COPY {tableName} FROM STDIN", output)
    conn.commit()
I tried your example and it works fine, but...
your command is missing the leading \ when run from the psql command line:
database=# \COPY agency FROM 'myFile.txt' CSV HEADER DELIMITER ',';
And next time, please include the DDL.
I created the DDL from the CSV headers.

Loading large amount of data into Postgres Hstore

The hstore documentation only talks about using "insert" to load data into an hstore one row at a time.
Is there any way to do a bulk upload of several hundred thousand rows, which could be megabytes or gigabytes, into a Postgres hstore?
The COPY command seems to work only for uploading CSV file columns.
Could someone post an example? Preferably a solution that works with Python/psycopg.
The above answers seem incomplete: if you try to copy in multiple columns, including a column with an hstore type, and use a comma delimiter, COPY gets confused, like:
$ cat test
1,a=>1,b=>2,a
2,c=>3,d=>4,b
3,e=>5,f=>6,c
create table b(a int4, h hstore, c varchar(10));
CREATE TABLE;
copy b(a,h,c) from 'test' CSV;
ERROR: extra data after last expected column
CONTEXT: COPY b, line 1: "1,a=>1,b=>2,a"
Similarly:
copy b(a,h,c) from 'test' DELIMITER ',';
ERROR: extra data after last expected column
CONTEXT: COPY b, line 1: "1,a=>1,b=>2,a"
This can be fixed, though, by importing as a CSV and quoting the field to be imported into hstore:
$ cat test
1,"a=>1,b=>2",a
2,"c=>3,d=>4",b
3,"e=>5,f=>6",c
copy b(a,h,c) from 'test' CSV;
COPY 3
select h from b;
h
--------------------
"a"=>"1", "b"=>"2"
"c"=>"3", "d"=>"4"
"e"=>"5", "f"=>"6"
(3 rows)
Quoting is only allowed in CSV format, so importing as CSV is required, but you can explicitly set the field delimiter and quote character to values other than ',' and '"' using the DELIMITER and QUOTE arguments of COPY.
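For instance (a sketch reusing the same table, with a hypothetical file test2), a file delimited with ';' and quoted with single quotes can be loaded by naming both characters explicitly:
$ cat test2
1;'a=>1,b=>2';a
2;'c=>3,d=>4';b
copy b(a,h,c) from 'test2' CSV DELIMITER ';' QUOTE '''';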
Both INSERT and COPY appear to work in natural ways for me:
create table b(h hstore);
insert into b(h) VALUES ('a=>1,b=>2'::hstore), ('c=>2,d=>3'::hstore);
select * from b;
h
--------------------
"a"=>"1", "b"=>"2"
"c"=>"2", "d"=>"3"
(2 rows)
$ cat > /tmp/t.tsv
a=>1,b=>2
c=>2,d=>3
^d
copy b(h) from '/tmp/t.tsv';
select * from b;
h
--------------------
"a"=>"1", "b"=>"2"
"c"=>"2", "d"=>"3"
"a"=>"1", "b"=>"2"
"c"=>"2", "d"=>"3"
(4 rows)
You can definitely do this with the copy binary command.
I am not aware of a python lib that can do this, but I have a ruby one that can help you understand the column encodings.
https://github.com/pbrumm/pg_data_encoder
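As a rough sketch: assuming a table with an hstore column, say create table b(h hstore) as in one of the answers above, and a file /tmp/hstore.bin already written in PostgreSQL's binary COPY format (for example by an encoder like the one linked), the server-side load itself is just:
COPY b (h) FROM '/tmp/hstore.bin' WITH (FORMAT binary);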