How to convert hex characters when using Postgres COPY FROM?

I am importing data from a file into a PostgreSQL table using COPY FROM.
Some of the strings in my file contain hex escape sequences (mostly \x0d and \x0a, i.e. carriage return and line feed) and I'd like COPY to convert them into the actual characters.
My problem is that they are treated as regular text and remain in the strings unchanged.
How can I get the hex values converted?
Here is a simplified example of my situation:
-- The table I am importing to
CREATE TABLE my_pg_table (
id serial NOT NULL,
data text
);
COPY my_pg_table (id, data)
FROM 'location/data.file'
WITH (FORMAT csv, DELIMITER E'\t', QUOTE '''', ENCODING 'UTF-8');
Example file:
1 'some data'
2 'some more data \x0d'
3 'even more data \x0d\x0a'
Note: the file is tab delimited.
Now, doing:
SELECT * FROM my_pg_table
returns rows in which \x0d and \x0a are still literal text rather than control characters.
Additional info for context:
My task is to export data from Sybase tables (many hundreds of them) and import it into Postgres. I am using UNLOAD to export the data to files, like so:
UNLOAD
TABLE my_sybase_table
TO 'location/data.file'
DELIMITED BY ' ' -- this is actually a tab
BYTE ORDER MARK OFF
ENCODING 'UTF-8'

It turns out that backslash escapes such as \x0d are only interpreted when using FORMAT text; FORMAT csv has no backslash escapes at all (quoting is its only escape mechanism), so it loads them as regular characters.
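A minimal sketch to see the difference for yourself in psql (my illustration, not part of the original post):

-- text format: the \x0d below is decoded into a real carriage return
CREATE TEMP TABLE t (s text);
COPY t FROM stdin;
some data \x0d
\.
SELECT s, length(s) FROM t;
-- length() counts the CR as a single character; repeating the COPY
-- WITH (FORMAT csv) would instead keep the four literal characters \x0d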
Solving the problem in my situation:
Because I had to use the text format, the QUOTE option was no longer available, so I couldn't have quoted strings in my files. I needed the files in a slightly different format, and eventually used this to export my table from Sybase:
UNLOAD
SELECT
COALESCE(cast(id as long varchar), '(NULL)'),
COALESCE(cast(data as long varchar), '(NULL)')
FROM my_sybase_table
TO 'location/data.file'
DELIMITED BY ' ' -- still tab delimited
BYTE ORDER MARK OFF
QUOTES OFF
ENCODING 'UTF-8'
and to import it into Postgres:
COPY my_pg_table (id, data)
FROM 'location/data.file'
WITH (FORMAT text, DELIMITER E'\t', NULL '(NULL)', ENCODING 'UTF-8');
I used (NULL) because I needed a way to differentiate between an empty string and NULL. I cast every column to long varchar to make the mass export/import more convenient.
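A quick way to verify after the import that the escapes really became control characters (my addition; strpos() returns 0 when the substring is absent):

SELECT id, strpos(data, E'\r') > 0 AS has_carriage_return
FROM my_pg_table;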
I'd still be very interested to know the rationale for CSV format not interpreting backslash escapes.

Related

PostgreSQL how to read csv file with decimal comma?

I am trying to read a CSV file containing real numbers that use a comma as the decimal separator, using \copy in psql:
\copy table FROM 'filename.csv' DELIMITER ';' CSV HEADER;
psql does not recognize the comma as a decimal point:
psql:filename.sql:44: ERROR: invalid input syntax for type real: "9669,84"
CONTEXT: COPY filename, line 2, column col-3: "9669,84"
I did some googling but could not find any answer other than "change the decimal comma into a decimal point". I tried SET DECIMALSEPARATORCOMMA=ON; but that did not work. I also experimented with encodings, but I couldn't find any indication that the encoding governs the decimal point (I got the impression it doesn't).
Is there really no solution other than changing the input data?
COPY into a staging table where the number goes into a varchar field, then convert it in psql, something like:
-- Temporarily change numeric formatting to one that uses ','
-- as the decimal separator.
SET lc_numeric = 'de_DE.UTF-8';
-- Below is just an example. In your case the SELECT would be part of
-- an INSERT into the target table, and the first argument of to_number()
-- would be the field from your staging table.
SELECT to_number('9669,84', '99999D999');
 9669.84
You might need to change the format string to match all of your numbers. For more information, see Table 9.28, "Template Patterns for Numeric Formatting", in the PostgreSQL data formatting documentation.
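Putting the pieces together, a sketch of the whole staging approach (the staging and target table names are made up for illustration, and it assumes the de_DE.UTF-8 locale is installed on the server):

CREATE TEMP TABLE staging (col3 varchar);  -- hypothetical staging table
\copy staging FROM 'filename.csv' DELIMITER ';' CSV HEADER
SET lc_numeric = 'de_DE.UTF-8';            -- makes 'D' in to_number() expect ','
INSERT INTO target (col3)                  -- hypothetical target table
SELECT to_number(col3, '99999D999') FROM staging;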

How to import CSV data containing local regional-language characters in PostgreSQL

I want to upload a CSV file in which some of the values contain regional-language characters, e.g. if the format is like
FirstName,LastName,DOB,State
Rahul,Gour,25-Mar-1988,Delhi
രാഹുൽ,ഗൗർ,24-മാർ-1987,Kerala
Some lines contain local-language (Malayalam) text, and when I upload this file the special characters show up as "????????????".
Is there any format I can use to upload this data as it is, or can this not be done in PostgreSQL?
Please help.
You have to figure out what kind of encoding the CSV file uses. Most probably it is using the UTF-8 encoding. Afterwards you can just use:
copy tablename (firstname, lastname, dob, state)
from '/path/to/the/file.csv'
with (encoding 'UTF-8', format csv, header);
If the server doesn't have access to the file, you can use the equivalent \copy command in psql.
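For example, a client-side import with the same options might look like this (same hypothetical path and table as above):

\copy tablename (firstname, lastname, dob, state) from '/path/to/the/file.csv' with (encoding 'UTF-8', format csv, header)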

How do I use Postgresql's COPY command to import a file with consecutive single quotes?

I am trying to import a TSV file into PostgreSQL. I have created a table:
CREATE TABLE description (
id TEXT
, effective_time DATE
, active INT
, module_id TEXT
, concept_id TEXT
, language_code TEXT
, type_id TEXT
, term TEXT
, case_significance_id TEXT
);
I have a TSV file like so:
id effectiveTime active moduleId conceptId languageCode typeId term caseSignificanceId
12118017 20170731 1 900000000000207008 6708002 en 900000000000013009 Intrauterine cordocentesis 900000000000448009
12119013 20020131 1 900000000000207008 6709005 en 900000000000013009 Gentamicin 2''-nucleotidyltransferase 900000000000020002
12119013 20170731 1 900000000000207008 6709005 en 900000000000013009 Gentamicin 2''-nucleotidyltransferase 900000000000448009
12120019 20020131 1 900000000000207008 6710000 en 900000000000013009 Nitric oxide 900000000000020002
Note that the middle two entries have two consecutive single quotes acting as the symbol for double-prime (Gentamicin 2''-nucleotidyltransferase).
If I run
psql=# \copy description FROM /path/to/foo.txt WITH DELIMITER AS E'\t';
I get ERROR: missing data for column "effective_time". I think that's because the '' is screwing up the parsing of the column boundaries.
I have tried finding and replacing the '' instances with either \'\' or '''' and using CSV QUOTE E'\'' or CSV QUOTE '''', respectively, but I get the same error.
How do I edit the file or alter the \copy command to import the file correctly?
Haleemur Ali correctly points out that the original file (whose README purports it to comprise "UTF-8 encoded tab-delimited flat files which can be imported into any database") is in fact not tab-separated, which may be my editor's fault. Once I fixed that, the import worked.
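Worth noting (my observation, not from the thread): the default text format used by \copy never treats quote characters specially, so the doubled single quotes in Gentamicin 2''-nucleotidyltransferase import unchanged once the file is genuinely tab-delimited:

\copy description FROM '/path/to/foo.txt' WITH (FORMAT text, DELIMITER E'\t')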

SPOOL - Format columns with French characters

I am creating a file from a SELECT query using sqlplus with the SPOOL command. Some of the columns in my SELECT query contain French characters, which are not written properly to the file.
SELECT RPAD(Column1, 32, ' ') FROM TableX;
If the value of Column1 contains, for example, the character "é", then the output has length 31 instead of 32 and the "é" is not shown correctly in the output file.
How can I format the columns so that I get proper value and length from my columns?
I found out how to resolve my formatting problem.
1. The definition of the selected column must be changed from Column1 VARCHAR2(32 BYTE) to VARCHAR2(32 CHAR).
2. The charset environment variable NLS_LANG must accept French characters: NLS_LANG=FRENCH_FRANCE.WE8ISO8859P15.
Thanks anyway!
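Putting both fixes together, a sketch (assumes altering the column is acceptable; NLS_LANG is set in the shell before starting sqlplus):

-- switch the column's length semantics from bytes to characters
ALTER TABLE TableX MODIFY (Column1 VARCHAR2(32 CHAR));
-- in the shell, before launching sqlplus:
--   export NLS_LANG=FRENCH_FRANCE.WE8ISO8859P15
-- RPAD with its arguments in order: string, target length, pad character
SELECT RPAD(Column1, 32, ' ') FROM TableX;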

ERROR: COPY delimiter must be a single one-byte character

I want to load data from a flat file with the delimiter "~,~" into a PostgreSQL table. I tried it as below, but it looks like there is a restriction on the delimiter. If the COPY statement doesn't allow multiple characters for the delimiter, is there an alternative way to do this?
metadb=# \COPY public.CME_DATA_STAGE_TRANS FROM 'E:\Infor\Outbound_Marketing\7.2.1\EM\metadata\pgtrans.log' WITH DELIMITER AS '~,~'
ERROR: COPY delimiter must be a single one-byte character
\copy: ERROR: COPY delimiter must be a single one-byte character
If you are using Vertica, you could use E'\t' or U&'\0009':
To indicate a non-printing delimiter character (such as a tab), specify the character in extended string syntax (E'...'). If your database has StandardConformingStrings enabled, use a Unicode string literal (U&'...'). For example, use either E'\t' or U&'\0009' to specify tab as the delimiter.
Unfortunately there is no way to load a flat file with a multi-character delimiter like ~,~ in Postgres, unless you want to modify the source code (and recompile, of course) yourself in some way:
/* Only single-byte delimiter strings are supported. */
if (strlen(cstate->delim) != 1)
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("COPY delimiter must be a single one-byte character")));
What you want is to preprocess your input file with some external tool; for example, sed might be the best companion on a GNU/Linux platform:
sed 's/~,~/\t/g' inputFile
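Once the file has been rewritten with tabs (redirect sed's output to a new file; the name pgtrans.tab here is hypothetical), the import itself is straightforward:

\copy public.CME_DATA_STAGE_TRANS FROM 'pgtrans.tab' WITH DELIMITER AS E'\t'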
The obvious thing to do is what all the other answers advise: edit the import file. I would do that, too.
However, as a proof of concept, here are two ways to accomplish this without additional tools.
1) General solution
CREATE OR REPLACE FUNCTION f_import_file(OUT my_count integer)
  RETURNS integer AS
$BODY$
DECLARE
    myfile   text;                         -- read the whole file into that var.
    datafile text := '\path\to\file.txt';  -- pg_read_file() only accepts a relative path inside the database directory!
BEGIN
    myfile := pg_read_file(datafile, 0, 100000000);  -- arbitrary 100 MB max.

    INSERT INTO public.my_tbl
    SELECT (x).*                           -- expand the row type into individual columns
    FROM  (
        SELECT ('(' || regexp_split_to_table(replace(myfile, '~,~', ','), E'\n') || ')')::public.my_tbl AS x
    ) sub;
    -- depending on the file format, you might need additional quoting to form valid row literals.

    GET DIAGNOSTICS my_count = ROW_COUNT;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
This uses a number of pretty advanced features. If anybody is actually interested and needs an explanation, leave a comment to this post and I will elaborate.
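Calling the function is then as simple as (the row count comes back through the OUT parameter):

SELECT f_import_file();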
2) Special case
If you can guarantee that '~' only ever appears as part of the delimiter '~,~', then you can go ahead with a plain COPY in this special case: use '~' as the delimiter and treat the ',' in the middle as an additional column. With '~' as the delimiter, a row like 1~,~2~,~3 splits into 1, ',', 2, ',', 3.
Say, your table looks like this:
CREATE TABLE foo (a int, b int, c int);
Then you can (in one transaction, since ON COMMIT DROP removes the temp table at commit):
CREATE TEMP TABLE foo_tmp (
    a int, tmp1 "char"
  , b int, tmp2 "char"
  , c int
) ON COMMIT DROP;

COPY foo_tmp FROM '\path\to\file.txt' WITH DELIMITER AS '~';

ALTER TABLE foo_tmp DROP COLUMN tmp1;
ALTER TABLE foo_tmp DROP COLUMN tmp2;

INSERT INTO foo SELECT * FROM foo_tmp;
Not quite sure if you're looking for a PostgreSQL solution or just a general one.
If it were me, I would open a copy of the file in vim (or gvim) and run the command :%s/~,~/~/g
That replaces all occurrences of "~,~" with "~".
You can only use a single-character delimiter. Open the file in Notepad, press Ctrl+H, and replace ~,~ with some character that will not interfere, such as |.