This is what my data looks like -
"to_claim_id" "NEW_PATIENT" "from_rend" "from_bill" "to_rend" "to_bill" "from_date" "to_date" "days_diff"
"10193136348200818391" "102657" "103325" "174597" "1830139" "17497" 20180904 20181002 28
How do I import this data into my database using \copy?
I have tried
\copy public.data from '/data/test' with delimiter E'\t' csv header quote '"'
but I get:
ERROR: value too long for type character varying(25)
That means at least one column in the target table public.data is type varchar(25) and a corresponding value in the CSV file has more characters.
You might change the data type of such columns (temporarily) to just varchar or text, import, and then identify and trim offending values - or just live happily ever after as you probably don't need that restriction to begin with.
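For example, assuming to_claim_id turns out to be the offending column (a guess - check the error's detail to see which column it really is), a single ALTER TABLE lifts the restriction:
ALTER TABLE public.data ALTER COLUMN to_claim_id TYPE text;  -- to_claim_id is hypothetical; widen whichever column the error points at
After that, rerun your \copy command unchanged.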
Related:
Any downsides of using data type "text" for storing strings?
I am using the next cmd to import a csv file to my PostgreSQL table:
\copy myTable from 'myTable_sample_nh.csv' with delimiter E'\t' null as '\x00';
ERROR: invalid input syntax for type numeric: ""
CONTEXT: COPY myTable, line 1, column salary: ""
How can I bypass this error? I mean, during an INSERT, if I do not specify a column, the DB does not assign a value to that column. In this case, the CSV file has no value for a numeric column, so the intention is to assign no value to that column.
'\x00' does not mean a zero byte, it means backslash followed by x followed by 00.
Considering that, it actually works:
create table test(n numeric);
\copy test from stdin with delimiter E'\t' null as '\x00'
>> \x00
>> \.
test=> select n is null from test;
?column?
----------
t
But presumably what you meant is a null byte to represent a SQL NULL. Syntactically, it can be written as E'\x00' but in practice it's unusable: for implementation reasons, null bytes in strings are not supported by Postgres, in pretty much any context. In the context of the null specification of \copy it would be rejected even before submitting input data:
\copy test from stdin with delimiter E'\t' null as E'\x00'
ERROR: invalid byte sequence for encoding "UTF8": 0x00
The solution is to use something else that doesn't appear in the data, or the empty string: null as '' is accepted by \copy.
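Applied to your original command, that would be:
\copy myTable from 'myTable_sample_nh.csv' with delimiter E'\t' null as '';
assuming an empty field in the file always means NULL, which is the case for a numeric column anyway.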
I am using Postgres 9.5.3 (on Ubuntu 16.04) and I have a table with some timestamptz fields:
...
datetime_received timestamptz NULL,
datetime_manufactured timestamptz NULL,
...
I used the following SQL command to generate the CSV file:
COPY (select * from tmp_table limit 100000) TO '/tmp/aa.csv' DELIMITER ';' CSV HEADER;
and used:
COPY tmp_table FROM '/tmp/aa.csv' DELIMITER ';' CSV ENCODING 'UTF-8';
to import into the table.
The example of rows in the CSV file:
CM0030;;INV_AVAILABLE;2016-07-30 14:50:42.141+07;;2016-08-06 00:00:000+07;FAHCM00001;;123;;;;;1.000000;1.000000;;;;;;;;80000.000000;;;2016-07-30 14:59:08.959+07;2016-07-30 14:59:08.959+07;2016-07-30 14:59:08.959+07;2016-07-30 14:59:08.959+07;
But I encounter the following error when running the second command:
ERROR: invalid input syntax for type timestamp with time zone: "datetime_received"
CONTEXT: COPY inventory_item, line 1, column datetime_received: "datetime_received"
My database's timezone is:
show timezone;
TimeZone
-----------
localtime(GMT+7)
(1 row)
Is there any missing step or wrong configuration?
Any suggestions are appreciated!
The error you're seeing means that Postgres is trying (and failing) to convert the string 'datetime_received' to a timestamp value.
This is happening because COPY is trying to insert the header row into your table. You need to include a HEADER clause on the COPY FROM command, just like you did for the COPY TO.
More generally, when using COPY to move data around, you should make sure that the TO and FROM commands are using exactly the same options. Specifying ENCODING for one command and not the other can lead to errors, or silently corrupt data, if your client encoding is not UTF8.
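A sketch of the corrected import, mirroring the export command:
COPY tmp_table FROM '/tmp/aa.csv' DELIMITER ';' CSV HEADER;
Either drop the ENCODING clause as shown here, or specify it on both the COPY TO and the COPY FROM.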
I have a CSV file that I'm trying to load into a PostgreSQL 9.2.4 database using the COPY FROM command. In particular there is a timestamp field that is allowed to be null, however when I load "null values" (actually just "") I get the following error:
ERROR: invalid input syntax for type timestamp with time zone: ""
An example CSV file looks as follows:
id,name,joined
1,"bob","2013-10-02 15:27:44-05"
2,"jane",""
The SQL looks as follows:
CREATE TABLE "users"
(
"id" BIGSERIAL NOT NULL PRIMARY KEY,
"name" VARCHAR(255),
"joined" TIMESTAMP WITH TIME ZONE,
);
COPY "users" ("id", "name", "joined")
FROM '/path/to/data.csv'
WITH (
ENCODING 'utf-8',
HEADER 1,
FORMAT 'csv'
);
According to the documentation, null values should be represented by an empty string that cannot contain the quote character, which is double quote (") in this case:
NULL
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings. This option is not allowed when using binary format.
Note: When using COPY FROM, any data item that matches this string will be stored as a null value, so you should make sure that you use the same string as you used with COPY TO.
I've tried the option NULL '' but that seems to have no effect. Advice, please!
An empty string without quotes works normally:
id,name,joined
1,"bob","2013-10-02 15:27:44-05"
2,"jane",
select * from users;
id | name | joined
----+------+------------------------
1 | bob | 2013-10-03 03:27:44+07
2 | jane |
Maybe it would be simpler to replace "" with an empty string using sed.
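A rough sketch of that, assuming "" only ever appears as a complete empty field (as in your sample) and never inside quoted data; the file names are placeholders:
# strip quoted empty fields so COPY reads them as NULL
sed 's/,""/,/g' data.csv > data_clean.csv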
The FORCE_NULL option for COPY FROM in Postgres 9.4+ would be the most elegant way to solve your problem. Per documentation:
FORCE_NULL
Match the specified columns' values against the null string, even if
it has been quoted, and if a match is found set the value to NULL. In
the default case where the null string is empty, this converts a
quoted empty string into NULL. This option is allowed only in COPY
FROM, and only when using CSV format.
Of course, it converts all matching values in the columns you specify.
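Applied to your command, it could look like this (a sketch for 9.4 or later):
COPY "users" ("id", "name", "joined")
FROM '/path/to/data.csv'
WITH (FORMAT csv, HEADER, FORCE_NULL (joined));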
In older versions, you can COPY to a temporary table with the same table layout - except data type text for the problem column. Then fix offending values and INSERT from there, as sketched below.
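A minimal sketch of that approach (the temp table name and the NULLIF cast are illustrative):
-- staging table with text for the problem column
CREATE TEMP TABLE users_tmp (id bigint, name varchar(255), joined text);
COPY users_tmp FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER);
-- convert quoted empty strings to NULL while moving the rows over
INSERT INTO users (id, name, joined)
SELECT id, name, NULLIF(joined, '')::timestamptz
FROM users_tmp;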
Related:
single quotes appear around value after running copy in postgres 9.2
Could not get it to work. Ended up using this program:
http://neilb.bitbucket.org/csvfix/
With that you can replace empty fields with other values.
So, for example, in your case column 3 needs a timestamp value, so I give it a fake one, in this case '1900-01-01 00:00:00'. If needed, you can delete or filter those rows out once the data is imported.
$CSVFIXHOME/csvfix map -f 3 -fv '' -tv '1900-01-01 00:00:00' -rsep ',' $YOURFILE > $FILEWITHDATES
After that you can import the newly created file.
I am trying to copy the content of a CSV file into my PostgreSQL DB, and I get the error "extra data after last expected column".
The content of my CSV is
agency_id,agency_name,agency_url,agency_timezone,agency_lang,agency_phone
100,RATP (100),http://www.ratp.fr/,CET,,
and my postgresql command is
COPY agency (agency_name, agency_url, agency_timezone) FROM 'myFile.txt' CSV HEADER DELIMITER ',';
Here is my table
CREATE TABLE agency (
agency_id character varying,
agency_name character varying NOT NULL,
agency_url character varying NOT NULL,
agency_timezone character varying NOT NULL,
agency_lang character varying,
agency_phone character varying,
agency_fare_url character varying
);
Column | Type | Modifiers
-----------------+-------------------+-----------
agency_id | character varying |
agency_name | character varying | not null
agency_url | character varying | not null
agency_timezone | character varying | not null
agency_lang | character varying |
agency_phone | character varying |
agency_fare_url | character varying |
Your table has 7 fields; the CSV file has 6.
You need to map those 6 fields from the CSV onto 6 fields in the table.
You cannot map only 3 fields from the CSV when it has 6, like you do in:
\COPY agency (agency_name, agency_url, agency_timezone) FROM 'myFile.txt' CSV HEADER DELIMITER ',';
All fields from the CSV file need to be mapped in the COPY FROM command, as shown below.
And since you specified CSV, the comma is already the default delimiter, so you don't need to spell it out.
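For example, mapping all six columns from your header row (a sketch; adjust the path as needed):
\COPY agency (agency_id, agency_name, agency_url, agency_timezone, agency_lang, agency_phone) FROM 'myFile.txt' CSV HEADER;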
Not sure this counts as an answer, but I just hit this with a bunch of CSV files and found that simply opening them in Excel and re-saving them with no changes made the error go away. In other words, there is possibly some incorrect formatting in the source file that Excel is able to clean up automatically.
This error also occurs even when the Postgres table and the CSV file have the same number of columns and you have specified delimiter ',' in the \copy command: you also need to specify CSV, so that quoted fields containing commas are parsed correctly.
In my case, one of my columns contained comma-separated data and I executed:
db=# \copy table1 FROM '/root/db_scripts/input_csv.csv' delimiter ','
ERROR: invalid input syntax for integer: "id"
CONTEXT: COPY quiz_quiz, line 1, column id: "id"
It worked after adding CSV:
db=# \copy table1 FROM '/root/db_scripts/input_csv.csv' delimiter ',' CSV
COPY 47871
For future visitors: when I had this problem, it was because I was using a loop that wrote to the same io.StringIO() variable before committing the query to the database.
If you're encountering this problem, make sure your code is like this:
for tableName in tableNames:
    output = io.StringIO()  # fresh buffer for each table
    ...
    output.seek(0)
    cur.copy_expert(f"COPY {tableName} FROM STDIN", output)
    conn.commit()
And not like this:
output = io.StringIO()  # one shared buffer: earlier tables' data accumulates across iterations
for tableName in tableNames:
    ...
    output.seek(0)
    cur.copy_expert(f"COPY {tableName} FROM STDIN", output)
    conn.commit()
I tried your example and it works fine, but your command from the psql command line is missing the leading \:
database=# \COPY agency FROM 'myFile.txt' CSV HEADER DELIMITER ',';
And next time, please include the DDL.
I created the DDL from the CSV headers.
I am trying to insert a single row into a log table, but it throws an error message.
The log table structure is like this:
no integer NOT NULL nextval('log_no_seq'::regclass)
ip character varying(50)
country character varying(10)
region character varying(10)
city character varying(50)
postalCode character varying(10)
taken numeric
date date
and my query:
INSERT INTO log (ip,country,region,city,postalCode,taken,date) VALUES
("24.24.24.24","US","NY","Binghamton","11111",1,"2011-11-09")
=> ERROR: column "postalcode" of relation "log" does not exist
second try query : (without postalcode)
INSERT INTO log (ip,country,region,city,taken,date) VALUES
("24.24.24.24","US","NY","11111",1,"2011-11-09")
=> ERROR: column "24.24.24.24" does not exist
I don't know what I did wrong...
And does PostgreSQL not have a datetime type (e.g. 2011-11-09 11:00:10)?
Try single quotes (e.g. '2011-11-09')
PostgreSQL has a "datetime" type: timestamp. Read the manual here.
Double quotes "" are used for identifiers if you want them preserved as-is. As @wildplasser advised, it's best if you never have to use them.
String literals are enclosed in single quotes ''.
Start by reading the chapter Lexical Structure. It is very informative. :)
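For instance, a timestamp value is written as a single-quoted string literal, optionally with an explicit cast:
SELECT '2011-11-09 11:00:10'::timestamp;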
Try rewriting it this way:
INSERT INTO log (ip,country,region,city,"postalCode",taken,date) VALUES
('24.24.24.24','US','NY','Binghamton','11111',1,'2011-11-09');
When you use mixed case in a column name, or reserved words (such as "column", "row", etc.), you have to use double quotes. Values, in contrast, must be enclosed in single quotes, as you can see in the example.