SPOOL - Format columns with French characters - encoding

I am creating a file from a SELECT query using sqlplus with the SPOOL command. Some of the columns in my SELECT query have French characters, which are not written properly to the file.
SELECT RPAD(Column1, 32, ' ') FROM TableX;
If the value of Column1 contains, for example, the character "é", then the output has length 31 instead of 32 and the "é" character is not shown correctly in the output file.
How can I format the columns so that I get proper value and length from my columns?

I found out how to resolve my formatting problem.
1. The definition of the selected column must be changed from Column1 VARCHAR2(32 BYTE) to Column1 VARCHAR2(32 CHAR).
2. The charset environment variable NLS_LANG must accept French characters: NLS_LANG=FRENCH_FRANCE.WE8ISO8859P15.
Both fixes are sketched below.
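A minimal sketch of the two changes (the ALTER TABLE form and the shell export are illustrative; adjust to your own table and environment):
-- Size the column in characters rather than bytes, so "é" counts as one.
ALTER TABLE TableX MODIFY (Column1 VARCHAR2(32 CHAR));
-- In the shell, before launching sqlplus:
--   export NLS_LANG=FRENCH_FRANCE.WE8ISO8859P15
SELECT RPAD(Column1, 32, ' ') FROM TableX;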
Thanks anyway!

Related

PostgreSQL how to read csv file with decimal comma?

I am trying to read a CSV file containing real numbers that use a comma as the decimal separator. I read this file with \copy in psql:
\copy table FROM 'filename.csv' DELIMITER ';' CSV HEADER;
psql does not recognize the comma as a decimal point.
psql:filename.sql:44: ERROR: invalid input syntax for type real: "9669,84"
CONTEXT: COPY filename, line 2, column col-3: "9669,84"
I did some googling but could not find any answer other than "change the decimal comma into a decimal point". I tried SET DECIMALSEPARATORCOMMA=ON; but that did not work. I also experimented with encodings, but I couldn't find any indication that encoding governs the decimal point (I got the impression it doesn't).
Is there really no solution other than changing the input data?
COPY into a staging table where you insert the number into a varchar field. Then do something like this in psql:
--Temporarily change numeric formatting to one that uses ',' as
--decimal separator.
set lc_numeric = 'de_DE.UTF-8';
--Below is just an example. In your case the select would be part of
--insert into the target table. Also the first part of to_number
--would be the field from your staging table.
select to_number('9669,84', '99999D999');
9669.84
You might need to change the format string to match all your numbers. For more information on what is available, see the PostgreSQL documentation on data formatting, Table 9.28, "Template Patterns for Numeric Formatting".
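Putting it together, a minimal end-to-end sketch (the staging and target table names and the single varchar column are illustrative, and it assumes the de_DE.UTF-8 locale is available on the server):
CREATE TABLE staging (raw_value varchar);
\copy staging FROM 'filename.csv' DELIMITER ';' CSV HEADER
-- The D pattern of to_number follows lc_numeric's decimal separator.
set lc_numeric = 'de_DE.UTF-8';
INSERT INTO target_table (value)
SELECT to_number(raw_value, '99999D999') FROM staging;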

PostgreSQL invalid byte sequence for encoding utf8 0xbf

I am importing a CSV file related to properties. It has \n between the values. While trying to import it into a table, the following error shows up:
PostgreSQL invalid byte sequence for encoding utf8 0xbf
I tried simply importing the single column only, but it does not work.
Column values will look like this:
"Job No 305385917-001: To attached Garage (Single remain).\n10305 - 132 STREET NW
Plan 23AF Blk 84 Lot 14\n2002995 LERTA LTD O/A LIR HOMES DONTON\nHENORA"
I want to import the above whole into a single column.
COPY edmonton.general_filtered (descriptive)
FROM 'D:/property_own/descriptive_details.csv'
DELIMITER ',' CSV HEADER;
Your COPY statement is correct, but your data are not in UTF8 encoding.
They are probably in Latin-1 or Windows-1252, where 0xBF is ¿.
Specify the encoding correctly, e.g.:
COPY edmonton.general_filtered (descriptive)
FROM 'D:/property_own/descriptive_details.csv'
(FORMAT 'csv', HEADER, ENCODING 'WIN1252');
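An alternative sketch relies on the client encoding instead; when COPY is given no ENCODING option, it assumes the file is in the current client encoding:
SET client_encoding = 'WIN1252';
COPY edmonton.general_filtered (descriptive)
FROM 'D:/property_own/descriptive_details.csv'
DELIMITER ',' CSV HEADER;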

How do I use Postgresql's COPY command to import a file with consecutive single quotes?

I am trying to import a TSV file into PostgreSQL. I have created a table:
CREATE TABLE description (
id TEXT
, effective_time DATE
, active INT
, module_id TEXT
, concept_id TEXT
, language_code TEXT
, type_id TEXT
, term TEXT
, case_significance_id TEXT
);
I have a TSV file like so:
id effectiveTime active moduleId conceptId languageCode typeId term caseSignificanceId
12118017 20170731 1 900000000000207008 6708002 en 900000000000013009 Intrauterine cordocentesis 900000000000448009
12119013 20020131 1 900000000000207008 6709005 en 900000000000013009 Gentamicin 2''-nucleotidyltransferase 900000000000020002
12119013 20170731 1 900000000000207008 6709005 en 900000000000013009 Gentamicin 2''-nucleotidyltransferase 900000000000448009
12120019 20020131 1 900000000000207008 6710000 en 900000000000013009 Nitric oxide 900000000000020002
Note that the middle two entries have two consecutive single quotes acting as the symbol for double-prime (Gentamicin 2''-nucleotidyltransferase).
If I run
psql=# \copy description FROM /path/to/foo.txt WITH DELIMITER AS E'\t';
I get ERROR: missing data for column "effective_time". I think that's because the '' is screwing up the parsing of the column boundaries.
I have tried finding and replacing the '' instances with either \'\' or '''' and using CSV QUOTE E'\'' or CSV QUOTE '''', respectively, but I get the same error.
How do I edit the file or alter the \copy command to import the file correctly?
Haleemur Ali correctly points out that the original file, whose README claims it consists of "UTF-8 encoded tab-delimited flat files which can be imported into any database", is in fact not tab-separated, which may be my editor's fault. It works once I fix that.
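For what it's worth, quote characters are not special in COPY's default text format, so once the file really is tab-separated the consecutive single quotes load as-is. A quick check, assuming the data loaded (the doubled quotes inside the SQL literal are just the standard escape for a single quote):
SELECT term FROM description WHERE term LIKE '%''''%';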

Replacing nonbreaking spaces (%A0) in Postgres

I've got some values in a varchar column that are separated by nonbreaking spaces (urlencoded %A0 instead of %20). I'm trying to replace them with spaces, but can't seem to get the syntax right:
select regexp_replace('hello world', E'\xa0', ' ')
What is the correct way to encode the character in a Postgres regexp_replace function? Or, is there a better way to do the replacement?
Replacing '\xa0' didn't work for me, possibly because my strings were in UTF-8 rather than Latin-1 or another encoding where the character is stored directly as the byte A0. (U+00A0 is encoded as the bytes C2 A0 in UTF-8.)
I found it more practical to replace it as a code point (U+00A0) rather than as the encoded bytes (C2 A0 or A0):
select replace('456321 ', E'\u00a0', '') -- value is E'456321\u00a0'
This may help you (note the E'' prefix so \xa0 is interpreted as a byte escape; this applies when the database encoding stores the character as the single byte A0):
select replace('Hello world', E'\xa0', '')
Ref: PostgreSQL documentation (current), Section 9.4, String Functions and Operators.
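Applied to a whole column, a minimal sketch (the table t and column val are hypothetical):
-- Normalize nonbreaking spaces (U+00A0) to regular spaces.
UPDATE t SET val = replace(val, E'\u00a0', ' ')
WHERE val LIKE '%' || E'\u00a0' || '%';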

How to convert hex characters when using Postgres COPY FROM?

I am importing data from a file to PostgreSQL database table using COPY FROM.
Some of the strings in my file contain hex escape sequences (mostly \x0d and \x0a) and I'd like them to be converted into the actual characters by COPY.
My problem is that they are treated as regular text and remain in the string unchanged.
How can I get the hex values converted?
Here is a simplified example of my situation:
-- The table I am importing to
CREATE TABLE my_pg_table (
id serial NOT NULL,
data text
);
COPY my_pg_table(id, data)
FROM 'location/data.file'
WITH CSV
DELIMITER ' ' -- this is actually a tab
QUOTE ''''
ENCODING 'UTF-8'
Example file:
1 'some data'
2 'some more data \x0d'
3 'even more data \x0d\x0a'
Note: the file is tab delimited.
Now, doing:
SELECT * FROM my_pg_table
would get me results containing hex.
Additional info for context:
My task is to export data from Sybase tables (many hundreds) and import them into Postgres. I am using UNLOAD to export data to files like so:
UNLOAD
TABLE my_sybase_table
TO 'location/data.file'
DELIMITED BY ' ' -- this is actually a tab
BYTE ORDER MARK OFF
ENCODING 'UTF-8'
It seems to me that (for a reason I don't understand) hex escapes are only converted when using FORMAT TEXT, while FORMAT CSV treats them as a regular string.
Solving the problem in my situation:
Because I had to use TEXT, I no longer had the QUOTE option, and so I couldn't have quoted strings in my files anymore. So I needed my files in a slightly different format, and eventually used this to export my table from Sybase:
UNLOAD
SELECT
COALESCE(cast(id as long varchar), '(NULL)'),
COALESCE(cast(data as long varchar), '(NULL)')
FROM my_sybase_table
TO 'location/data.file'
DELIMITED BY ' ' -- still tab delimited
BYTE ORDER MARK OFF
QUOTES OFF
ENCODING 'UTF-8'
and to import it to postgres:
COPY my_pg_table(id, data)
FROM 'location/data.file'
DELIMITER ' ' -- tab delimited
NULL '(NULL)'
ENCODING 'UTF-8'
I used (NULL) because I needed a way to differentiate between an empty string and NULL. I cast every column to long varchar to make my mass export/import more convenient.
I'd still be very interested to know why hex won't convert when using FORMAT CSV. (As far as I can tell from the documentation, backslash escapes belong to COPY's text format, while the CSV format gives backslashes no special meaning.)
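A minimal sketch demonstrating the difference (the scratch table is illustrative; run in psql):
CREATE TABLE copy_test (t text);
-- Text format (the default): the \x0d escape is interpreted as a carriage return.
COPY copy_test FROM stdin;
foo\x0dbar
\.
SELECT t, length(t) FROM copy_test;  -- length is 7: 'foo' || CR || 'bar'
-- With CSV format the same line would load as the literal ten characters.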