This question was closed 9 years ago as a duplicate of "Postgres error on insert - ERROR: invalid byte sequence for encoding "UTF8": 0x00".
I'm trying to import a tab-separated values file into a PostgreSQL database using the COPY command. The problem is that it fails on one line with the error message:
ERROR: invalid byte sequence for encoding "UTF8": 0x00
The bad line can be found in this file.
It still fails when I try to import this single-line file.
I tried opening the file, but it looks like a normal text file, and I cannot find any way to resolve this problem. The schema of the table looks like this:
CREATE TABLE osm_nodes (
id BIGINT,
longitude double precision,
latitude double precision,
tags TEXT
);
I use the following command to copy the file
cat bad_lines2 | psql -c "COPY osm_nodes FROM STDIN WITH DELIMITER ' '"
(Note: the delimiter above is a literal tab character.)
I am using PostgreSQL 9.2.3.
Thanks for your help.
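To locate where a 0x00 comes from in a file that looks like normal text, one option is to scan for both raw NUL bytes and backslash sequences that COPY's text format turns into a NUL byte (a backslash followed by one to three octal digits is an octal escape, so \0, \00 and \000 all mean byte 0). This is a minimal sketch; find_nul_sources is a hypothetical helper name, and it works line by line, ignoring other escape kinds such as \x00.

```python
import re

# Either an escaped backslash ("\\", consumed so it can't start an escape)
# or a backslash followed by 1-3 octal digits (captured in group 1).
_OCTAL_ESCAPE = re.compile(rb"\\\\|\\([0-7]{1,3})")


def find_nul_sources(data: bytes) -> list[tuple[int, str]]:
    """Return (line_number, reason) for each line that would yield a NUL byte."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        if b"\x00" in line:
            hits.append((line_no, "raw NUL byte"))
        for m in _OCTAL_ESCAPE.finditer(line):
            # group(1) is None when the match was just an escaped backslash.
            if m.group(1) is not None and int(m.group(1), 8) == 0:
                hits.append((line_no, "octal escape for NUL"))
    return hits


# Usage: with open("bad_lines2", "rb") as f: print(find_nul_sources(f.read()))
```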
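I found the error. The text contains "\09". In COPY's text format, a backslash followed by octal digits is an octal escape, so "\0" was decoded as a NUL byte (0x00) followed by a literal "9", which caused this error. Each "\" in the data must be escaped as "\\" so that it is inserted literally.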
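When the input file is generated by a script, the escaping can be done while writing it. Below is a minimal Python sketch for COPY's default text format: backslash, tab, newline and carriage return are escaped, and None is written as \N, the format's default NULL marker. The names escape_copy_text and copy_text_row are hypothetical helpers, not part of any library.

```python
def escape_copy_text(value: str) -> str:
    """Escape one field for COPY ... FROM in the default text format."""
    # Backslash first, so the backslashes introduced below are not doubled again.
    return (
        value.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def copy_text_row(values) -> str:
    """Join fields with tabs; None becomes \\N, the default NULL marker."""
    return "\t".join(
        "\\N" if v is None else escape_copy_text(str(v)) for v in values
    ) + "\n"


# A tags value containing the problematic "\09" sequence from the question.
print(copy_text_row([42, 13.4, 52.5, "name=Foo\\09"]), end="")
```

Doubling the backslash first matters: doing it last would also double the backslashes just added for \t, \n and \r.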
Related
I am trying to read a CSV file containing real numbers that use a comma as the decimal separator. I try to read this file with \copy in psql:
\copy table FROM 'filename.csv' DELIMITER ';' CSV HEADER;
PostgreSQL does not recognize the comma as a decimal point:
psql:filename.sql:44: ERROR: invalid input syntax for type real: "9669,84"
CONTEXT: COPY filename, line 2, column col-3: "9669,84"
I did some googling but could not find any answer other than "change the decimal comma into a decimal point". I tried SET DECIMALSEPARATORCOMMA=ON; but that did not work. I also experimented with different encodings, but I couldn't find out whether the encoding governs the decimal separator (I got the impression it doesn't).
Is there really no solution other than changing the input data?
COPY into a staging table that stores the number in a varchar column. Then do something like this in psql:
--Temporarily change numeric formatting to one that uses ',' as
--decimal separator.
set lc_numeric = "de_DE.UTF-8";
--Below is just an example. In your case the select would be part of
--insert into the target table. Also the first part of to_number
--would be the field from your staging table.
select to_number('9669,84', '99999D999');
9669.84
You might need to change the format string to match all the numbers. For more information on what is available, see Table 9.28 (Template Patterns for Numeric Formatting) in the "Data Type Formatting Functions" section of the PostgreSQL documentation.
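If changing the input data turns out to be acceptable after all, the conversion can be limited to fields that look like decimal-comma numbers, leaving everything else untouched. This is a minimal sketch using Python's standard csv module; convert_decimal_commas is a hypothetical name, the ';' delimiter is taken from the question, and the pattern deliberately skips numbers with thousands separators such as 1.234,56.

```python
import csv
import io
import re

# Matches plain numbers written with a decimal comma, e.g. "9669,84" or "-0,5".
_DECIMAL_COMMA = re.compile(r"^-?\d+,\d+$")


def convert_decimal_commas(text: str, delimiter: str = ";") -> str:
    """Rewrite delimited text so decimal-comma numbers use a decimal point."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        writer.writerow(
            [f.replace(",", ".") if _DECIMAL_COMMA.match(f) else f for f in row]
        )
    return out.getvalue()
```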
I am importing data from a file to PostgreSQL database table using COPY FROM.
Some of the strings in my file contain hex escape sequences (mostly \x0d and \x0a), and I'd like COPY to convert them into the actual characters (carriage return and line feed).
My problem is that they are treated as regular text and remain in the string unchanged.
How can I get the hex values converted?
Here is a simplified example of my situation:
-- The table I am importing to
CREATE TABLE my_pg_table (
id serial NOT NULL,
data text
);
COPY my_pg_table(id, data)
FROM 'location/data.file'
WITH CSV
DELIMITER ' ' -- this is actually a tab
QUOTE ''''
ENCODING 'UTF-8'
Example file:
1 'some data'
2 'some more data \x0d'
3 'even more data \x0d\x0a'
Note: the file is tab delimited.
Now, doing:
SELECT * FROM my_pg_table
returns rows that still contain the literal hex sequences.
Additional info for context:
My task is to export data from Sybase tables (many hundreds of them) and import it into Postgres. I am using UNLOAD to export the data to files like so:
UNLOAD
TABLE my_sybase_table
TO 'location/data.file'
DELIMITED BY ' ' -- this is actually a tab
BYTE ORDER MARK OFF
ENCODING 'UTF-8'
It seems to me that, for a reason I don't understand, hex escapes are only converted with FORMAT TEXT, while FORMAT CSV treats them as a regular string.
Solving the problem in my situation:
Because I had to use TEXT format, the QUOTE option was no longer available, so my files could no longer contain quoted strings. I therefore needed the files in a slightly different format and eventually used this to export my table from Sybase:
UNLOAD
SELECT
COALESCE(cast(id as long varchar), '(NULL)'),
COALESCE(cast(data as long varchar), '(NULL)')
FROM my_sybase_table
TO 'location/data.file'
DELIMITED BY ' ' -- still tab delimited
BYTE ORDER MARK OFF
QUOTES OFF
ENCODING 'UTF-8'
and this to import it into Postgres:
COPY my_pg_table(id, data)
FROM 'location/data.file'
DELIMITER ' ' -- tab delimited
NULL '(NULL)'
ENCODING 'UTF-8'
I used (NULL) because I needed a way to distinguish an empty string from NULL. I cast every column to long varchar to make the mass export/import more convenient.
I'd still be very interested to know why hex escapes aren't converted when using FORMAT CSV.
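PostgreSQL's COPY documentation states that backslash is not a special character in CSV format; only TEXT format decodes backslash escapes, including \x followed by one or two hex digits. To keep using CSV format, the escapes could instead be decoded before loading. This is a minimal Python sketch, assuming the data contains no other backslashes that must survive literally; decode_hex_escapes is a hypothetical name.

```python
import re

# Backslash + "x" + one or two hex digits, the shape TEXT format recognises
# (e.g. "\x0d", "\x0a").
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{1,2})")


def decode_hex_escapes(value: str) -> str:
    """Replace each \\xHH sequence with the character it encodes.

    Assumes the data contains no other backslashes that must be kept.
    """
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)
```

If the decoded text is written back into a CSV file, fields that now contain CR or LF must be quoted.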
I am trying to figure out what the delimiter of this .csv file is. I am importing the .csv with a COPY FROM statement, but it always throws an error: with the delimiter set to E'\t' I get one error, and with '|' I get a different one. I have been trying to import this silly .csv file for 3 days without success. I really need your help. Here is my .csv file: Download here, please
My PostgreSQL code looks like this:
CREATE TABLE movie
(
imdib varchar NOT NULL,
name varchar NOT NULL,
year integer,
rating float ,
votes integer,
runtime varchar ,
directors varchar ,
actors varchar ,
genres varchar
);
My COPY statement:
COPY movie FROM '/home/max/Schreibtisch/imdb_top100t_2015-06-18.csv' (DELIMITER E'\t', FORMAT CSV, NULL '', ENCODING 'UTF8');
SHOW SERVER_ENCODING returns "UTF8". So why the hell can't Postgres read the data from the columns? I really don't get it. I am using 64-bit Ubuntu, and both the .csv file and PostgreSQL have all the permissions they need. Please help me.
These are my errors:
ERROR: missing data for column "name"
CONTEXT: COPY movie, line 1: "tt0468569,The Dark Knight,2008,9,1440667,152 mins.,Christopher Nolan,Christian Bale|Heath Ledger|Aar..."
********** Error **********
ERROR: missing data for column "name"
SQL state: 22P04
Context: COPY movie, line 1: "tt0468569,The Dark Knight,2008,9,1440667,152 mins.,Christopher Nolan,Christian Bale|Heath Ledger|Aar..."
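The error suggests that the whole line, commas included, ended up in the first column, which is what happens when the delimiter given to COPY does not occur in the line. When unsure which delimiter a file uses, Python's standard csv.Sniffer can guess it from a sample. This is a minimal sketch; guess_delimiter is a hypothetical helper, and the candidate list is an assumption you may need to extend.

```python
import csv


def guess_delimiter(sample: str, candidates: str = ",\t|;") -> str:
    """Return the most likely field delimiter among `candidates`.

    Raises csv.Error if no candidate is used consistently across lines.
    """
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


# Usage: with open("imdb_top100t_2015-06-18.csv") as f:
#            print(repr(guess_delimiter(f.read(64 * 1024))))
```

Sniffing is heuristic: a field that itself contains a candidate character (such as the '|'-separated actor lists) can mislead it, so check the result against a few raw lines.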
Use this code instead; it works fine on Linux as well as on Windows:
\COPY movie(imdib,name,year,rating,votes,runtime,directors,actors,genres) FROM 'D:\test.csv' WITH DELIMITER '|' CSV HEADER;
One more thing: add a header line to your CSV file, as shown below:
imdib|name|year|rating|votes|runtime|directors|actors|genres
tt0111161|The Shawshank Redemption|1994|9.3|1468273|142 mins.|Frank Darabont|Tim Robbins|Morgan Freeman
and use a single-byte delimiter such as ',' or '|'.
Hope this works for you!
The following works for me:
COPY movie (imdib,name,year,rating,votes,runtime,directors,actors,genres)
FROM 'imdb_top100t_2015-06-18.csv'
WITH (format csv, header false, delimiter E'\t', NULL '');
Unfortunately the file contains invalid data: on line 12011 the year column contains the value 2015 Video, which can't be converted to an integer, so the import fails. Further down (line 64155) there is an invalid value NA for the rating, which can't be converted to a float, and then one more for the votes.
But if you create the table with all columns as varchar, the above command works for me.
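To find all such values before running COPY, rather than hitting them one at a time, you could scan the file for fields that don't parse as their target type. This is a minimal Python sketch, assuming a tab-delimited file with no quoted fields containing tabs; find_bad_values and the column-to-converter mapping are hypothetical, and Python's int and float only approximate PostgreSQL's integer and float input rules.

```python
from typing import Callable, Iterable


def find_bad_values(
    lines: Iterable[str],
    converters: dict[int, Callable[[str], object]],
    delimiter: str = "\t",
    null: str = "",
) -> list[tuple[int, int, str]]:
    """Return (line_number, column_index, value) for each field that fails to convert.

    column_index is 0-based; fields equal to `null` are skipped, like NULL ''.
    """
    bad = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        for col, convert in converters.items():
            if col >= len(fields) or fields[col] == null:
                continue
            try:
                convert(fields[col])
            except ValueError:
                bad.append((line_no, col, fields[col]))
    return bad


# Columns 2, 3 and 4 are year (integer), rating (float) and votes (integer).
# with open("imdb_top100t_2015-06-18.csv") as f:
#     print(find_bad_values(f, {2: int, 3: float, 4: int}))
```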
I bet it's totally simple and I just don't see it, but I don't get it.
I execute the following command in the DB2 command line processor:
DB2 LOAD FROM "DB_ACC_PASS_REGEXP.del" OF DEL METHOD P (1, 2, 3, 4, 5) MESSAGES "DB_ACC_PASS_REGEXP.del.msg" INSERT INTO DB_ACC_PASS_REGEXP (APP_ID,APREGEXP,EXPLAIN_TEXT,ID,OPT_KZ) NONRECOVERABLE INDEXING MODE REBUILD
This loads the data in the following file into the database:
1,"[a-z]",,1,0
1,"[A-Z]",,2,0
1,"[0-9]",,3,0
1,"[!|\"|§|$|%|&|/|(|)|=|?|`|´|*|+|~|'|#|-|_|.|:|,|;|µ|<|>| |°|^]",,4,0
Here is the problem: the backslash-escaped double quote (\") in the last line.
The problem is that only 3 of these 4 rows are accepted. The last one is rejected because DB2 LOAD doesn't recognize the escape character before the double quotation mark.
If I change the last line to:
1,"[!|x|§|$|%|&|/|(|)|=|?|`|´|*|+|~|'|#|-|_|.|:|,|;|µ|<|>| |°|^]",,4,0
(with the \" replaced by x), there is no problem.
Why doesn't the escape character "\" work?
Edit:
Okay, I just tried it the Oracle way and that works: I escape " with another ", so my line looks like this:
1,"[!|""|§|$|%|&|/|(|)|=|?|`|´|*|+|~|'|#|-|_|.|:|,|;|µ|<|>| |°|^]",,4,0
But that's only one way to do it. It doesn't explain why IBM offers the backslash as an escape character (http://pic.dhe.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=%2Fcom.ibm.db2.luw.admin.cmd.doc%2Fdoc%2Fr0008305.html)
Using LOAD with ASCII/delimited files requires tuning the file type modifiers (see Table 6 and Table 8 of the documentation page you linked). I am not quite sure, but I can't remember using backslash as an escape character in DB2.
You can either use a character delimiter other than double quotes with the chardelx modifier (x being the new delimiter character), or force no character delimiter with the nochardel modifier.
But in your case you need special characters in regular expressions, so you will always need to escape " with "" and ' with ''. I think there is no other way to get this working.
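If the DEL file is produced by a script, the quote doubling can be done at export time. This is a minimal Python sketch assuming DB2's default delimiters (comma between columns, double quote around character data); del_field and del_line are hypothetical helpers, and the sketch doubles only the double-quote character delimiter.

```python
def del_field(value) -> str:
    """Format one value for a DEL file with the default delimiters."""
    if value is None:
        return ""  # empty field between commas, as in the EXPLAIN_TEXT column
    if isinstance(value, (int, float)):
        return str(value)
    # Character data: wrap in double quotes and double any embedded quote.
    return '"' + str(value).replace('"', '""') + '"'


def del_line(values) -> str:
    """Join formatted fields with the comma column delimiter."""
    return ",".join(del_field(v) for v in values)


print(del_line([1, '[!|"|§]', None, 4, 0]))
```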
Hi guys, I have a question, or rather a question about an error.
My function in Postgres looks like this:
CREATE TEMP TABLE aljazeera
(
date character varying,
channel_name character varying,
date1 character varying,
start character varying,
program character varying,
sub character varying,
epis_title character varying
) ;
COPY aljazeera FROM '/opt/transcode/data/epg/output/CSV/ALJAZEERA/pre_g_1.csv' WITH (FORMAT CSV, DELIMITER ',', QUOTE '"');
And my file looks like this:
http://www.speedyshare.com/bNcjM/pre-g-1.csv
(WARNING: possible malware, certainly nasty bundleware on download link. Only use the top link Download: pre-g-1.csv).
When I try to load it into the table, I get this error:
ERROR: unterminated CSV quoted field
CONTEXT: COPY aljazeera, line 175: "2013-12-24,02:00:00","KRAJ PROGRAMA",,,,
2013-12-24,07:30:00,"Sportski magazin (R)",,"Sportski doga..."
I don't know where the problem is. Any advice on this issue?
Once I eventually downloaded the file without that nasty download manager I could reproduce the error:
craig=> \copy aljazeera FROM 'pre_g_1.csv' WITH (FORMAT CSV, DELIMITER ',', QUOTE '"')
ERROR: unterminated CSV quoted field
CONTEXT: COPY aljazeera, line 175: "2013-12-24,02:00:00","KRAJ PROGRAMA",,,,
2013-12-24,07:30:00,"Sportski magazin (R)",,"Sportski doga..."
The error isn't actually on line 175. It is that some prior line has an unbalanced quote. By binary search it was easy to narrow that down to line 26:
2013-12-24,02:00:00","KRAJ PROGRAMA",,,,
I'm sure you can see what the problem is there. You've got a stray quote in the date.
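Instead of a binary search, a quick heuristic is to list the lines with an odd number of quote characters: in well-formed CSV each quoted field contributes an even count (opening quote, closing quote, and doubled "" escapes). This is a different technique from the binary search above; lines_with_odd_quotes is a hypothetical name, and legitimate multi-line quoted fields also produce odd counts, so treat the result as a list of suspects.

```python
def lines_with_odd_quotes(lines, quote='"'):
    """Return 1-based line numbers whose quote count is odd."""
    return [n for n, line in enumerate(lines, start=1) if line.count(quote) % 2]


# Usage:
# with open("pre_g_1.csv", encoding="utf-8") as f:
#     print(lines_with_odd_quotes(f))
```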
BTW, if you ever need to link to chunks of text, you can use http://gist.github.com/, http://pastebin.com/, http://pastebin.ca/, etc. (For PostgreSQL query plans http://explain.depesz.com/ is best). That speedyshare thing is nasty.