PostgreSQL convert Japanese Full-Width to Half-Width - cjk

I am working with Japanese data in which some words contain English letters and numbers.
ＳＹＳＫＥＮ, 松井ケ丘３, コメリＨ&Ｇ, 篠路７-１ are examples (the Latin letters and digits in them are full-width characters).
I want to convert these full-width English letters and numbers to half-width, using a function or any other possible approach.
The output for the input above should look like "SYSKEN, 松井ケ丘3, コメリH&G, 篠路7-1".
If anyone knows the best way to start, I would appreciate it.

How about using translate() function?
-- prepare test data
CREATE TABLE address (
id integer,
name text
);
INSERT INTO address VALUES (1, 'ＳＹＳＫＥＮ, 松井ケ丘３, コメリＨ&Ｇ, 篠路７-１');
-- show test data
SELECT * from address;
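-- convert full-width digits and upper-case Latin letters to half-width
-- (each character of the 2nd argument is replaced by the one at the same position in the 3rd)
UPDATE address SET name = translate(name,
'０１２３４５６７８９ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺ',
'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
);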
-- see the converted data
SELECT * from address;
This code changed the name column to "SYSKEN, 松井ケ丘3, コメリH&G, 篠路7-1".
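The translate() call above only maps the characters listed in its two strings. The full-width forms of printable ASCII occupy U+FF01 to U+FF5E, each exactly 0xFEE0 above its ASCII counterpart (U+0021 to U+007E), so both strings can be generated instead of typed. Below is a minimal Python sketch of the same character-for-character mapping using str.maketrans/str.translate, which behaves like PostgreSQL's translate() for this purpose. The helper name to_half_width is hypothetical, and also mapping the ideographic space U+3000 to an ordinary space is an added assumption. Unicode NFKC normalization (Python's unicodedata.normalize("NFKC", s), or normalize(s, NFKC) in PostgreSQL 13 and later on a UTF8 database) folds these characters too, but it also changes other compatibility characters, for example turning half-width katakana into full-width katakana.

```python
# Full-width forms of printable ASCII: U+FF01..U+FF5E, each 0xFEE0 above U+0021..U+007E.
FULL = "".join(chr(c) for c in range(0xFF01, 0xFF5F))
HALF = "".join(chr(c - 0xFEE0) for c in range(0xFF01, 0xFF5F))

# Like PostgreSQL's translate(): every character of the first string is replaced by the
# character at the same position in the second; kanji and kana are left untouched.
# Assumption: the ideographic space U+3000 should also become an ordinary space.
TABLE = str.maketrans(FULL + "\u3000", HALF + " ")


def to_half_width(s: str) -> str:
    """Hypothetical helper: convert full-width ASCII variants to half-width."""
    return s.translate(TABLE)


if __name__ == "__main__":
    print(to_half_width("ＳＹＳＫＥＮ, 松井ケ丘３, コメリＨ＆Ｇ, 篠路７－１"))
```

The generated FULL and HALF strings could be pasted as the second and third arguments of translate(); note that HALF contains a single quote, which has to be doubled inside an SQL string literal.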

Related

How to remove the # character from a national data item in COBOL

I am facing an issue while converting Unicode data into national characters.
When I convert the Unicode data to national using the NATIONAL-OF function, a junk character like # is appended after the string.
E.g.:
01 Ws-unicode pic X(200).
01 Ws-national pic N(600).
*> Ws-unicode contains これらの変更は, received from the Java side
move function national-of (Ws-unicode, 1208) to Ws-national.
*> after the conversion the value looks like これらの変更は #
I do not want the extra # character added after the conversion.
Please help me find a possible solution. I have tried replacing N'#' with a space using the INSPECT statement.
That worked well but failed in one specific scenario: if the user's input contains a #, that genuine # is also converted to a space.
Below is a snippet of code I used to convert EBCDIC to UTF-8. Before I started capturing string lengths, I was also getting # symbols:
*> WITH POINTER needs a starting value of 1
MOVE 1 TO WS-XML-UTF8-LENGTH
STRING
FUNCTION DISPLAY-OF (
FUNCTION NATIONAL-OF (
WS-EBCDIC-STRING(1:WS-XML-EBCDIC-LENGTH)
WS-EBCDIC-CCSID
)
WS-UTF8-CCSID
)
DELIMITED BY SIZE
INTO WS-UTF8-STRING
WITH POINTER WS-XML-UTF8-LENGTH
END-STRING
SUBTRACT 1 FROM WS-XML-UTF8-LENGTH
This code STRINGs the UTF-8 representation of the EBCDIC string into another variable. The WITH POINTER clause captures the new length of the string + 1 (+1 because the pointer is left at the position after the end of the string).
Using this method, you know exactly how long the second string is and can use it with its exact length.
That should remove the unwanted #s.
EDIT:
One thing I forgot to mention: in my case, the # signs were actually low-values (X'00') when I viewed the actual hex on the mainframe.
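The same idea (use the real length, or drop only the trailing low-value padding, instead of replacing every #) can be sketched outside COBOL. This is a minimal Python illustration, assuming the field holds UTF-8 bytes padded with NUL (X'00') bytes; decode_padded_utf8 is a hypothetical name, and the optional used_len plays the role of WS-XML-UTF8-LENGTH in the answer above. A genuine # inside the data is left alone.

```python
from typing import Optional


def decode_padded_utf8(buf: bytes, used_len: Optional[int] = None) -> str:
    """Decode a fixed-length UTF-8 field that may be padded with low-values (NUL bytes)."""
    if used_len is not None:
        # Preferred: cut at the length captured when the field was filled.
        buf = buf[:used_len]
    # Otherwise remove only the trailing NUL padding; '#' inside the data is untouched.
    return buf.decode("utf-8").rstrip("\x00")


field = "これらの変更は#1".encode("utf-8") + b"\x00" * 8
print(decode_padded_utf8(field))  # これらの変更は#1
```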
Use INSPECT on FUNCTION REVERSE of the field and stop after the first occurrence of #.

How to convert hex characters when using Postgres COPY FROM?

I am importing data from a file to PostgreSQL database table using COPY FROM.
Some of the strings in my file contain hex escape sequences (mostly \x0d and \x0a), and I'd like COPY to convert them into the actual characters.
My problem is that they are treated as regular text and remain in the string unchanged.
How can I get the hex values converted?
Here is a simplified example of my situation:
-- The table I am importing to
CREATE TABLE my_pg_table (
id serial NOT NULL,
data text
);
COPY my_pg_table(id, data)
FROM 'location/data.file'
WITH (FORMAT csv, DELIMITER E'\t', QUOTE '''', ENCODING 'UTF-8');
Example file:
1 'some data'
2 'some more data \x0d'
3 'even more data \x0d\x0a'
Note: the file is tab delimited.
Now, running:
SELECT * FROM my_pg_table
returns rows that still contain the literal hex sequences.
Additional info for context:
My task is to export data from Sybase tables (many hundreds) and import it into Postgres. I am using UNLOAD to export data to files like so:
UNLOAD
TABLE my_sybase_table
TO 'location/data.file'
DELIMITED BY ' ' -- this is actually a tab
BYTE ORDER MARK OFF
ENCODING 'UTF-8'
It seems to me that (for a reason I don't understand) the hex sequences are only converted when using FORMAT TEXT, while FORMAT CSV treats them as a regular string.
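That observation matches PostgreSQL's COPY documentation: backslash escape sequences such as \r, \n, octal \ooo and \xHH (one or two hex digits) are a feature of the text format, while in CSV format the backslash is not a special character. Below is a minimal Python sketch of text-format unescaping for a single field, assuming the default escape rules; it ignores the \N null marker and any encoding conversion, and unescape_copy_text is a hypothetical name, not a PostgreSQL API.

```python
import re

# Single-letter escapes listed for COPY's text format.
_SIMPLE = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}

# A backslash followed by x + 1-2 hex digits, 1-3 octal digits, or any other character.
_ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]{1,2}|[0-7]{1,3}|.)", re.DOTALL)


def unescape_copy_text(field: str) -> str:
    """Decode backslash escapes in one text-format field (ignores the \\N null marker)."""

    def replace(match):
        token = match.group(1)
        if token[0] == "x" and len(token) > 1:
            return chr(int(token[1:], 16))  # \xHH
        if token[0] in "01234567":
            return chr(int(token, 8))  # \ooo
        # \b \f \n \r \t \v; any other backslashed character stands for itself
        return _SIMPLE.get(token, token)

    return _ESCAPE.sub(replace, field)


print(repr(unescape_copy_text(r"even more data \x0d\x0a")))
```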
Solving the problem in my situation:
Because I had to use TEXT format, the QUOTE option was no longer available, so my files could no longer contain quoted strings. I needed the files in a slightly different format and eventually used this to export my table from Sybase:
UNLOAD
SELECT
COALESCE(cast(id as long varchar), '(NULL)'),
COALESCE(cast(data as long varchar), '(NULL)')
FROM my_sybase_table
TO 'location/data.file'
DELIMITED BY ' ' -- still tab delimited
BYTE ORDER MARK OFF
QUOTES OFF
ENCODING 'UTF-8'
and to import it to postgres:
COPY my_pg_table(id, data)
FROM 'location/data.file'
WITH (FORMAT text, DELIMITER E'\t', NULL '(NULL)', ENCODING 'UTF-8');
I used (NULL) because I needed a way to distinguish between an empty string and NULL. I cast every column to long varchar to make the mass export/import more convenient.
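If such files were produced by your own code instead of UNLOAD, the rules of the text format can be sketched in Python. This hypothetical helper writes one tab-delimited line, uses (NULL) as the null marker to match NULL '(NULL)' above, and escapes backslash, tab, CR and LF as \\, \t, \r and \n so they survive the round trip. One assumption to keep in mind: a value that literally equals (NULL) would still be read back as NULL, so the marker must be a string that never occurs in the data.

```python
from typing import Optional, Sequence


def copy_text_line(values: Sequence[Optional[str]], null_marker: str = "(NULL)") -> str:
    """Build one tab-delimited line in COPY's text format (hypothetical helper)."""
    fields = []
    for value in values:
        if value is None:
            fields.append(null_marker)  # read back as NULL via NULL '(NULL)'
            continue
        # Escape the backslash first, then the characters that would break the line layout.
        escaped = (value.replace("\\", "\\\\")
                        .replace("\t", "\\t")
                        .replace("\n", "\\n")
                        .replace("\r", "\\r"))
        fields.append(escaped)
    return "\t".join(fields) + "\n"


print(repr(copy_text_line(["1", "some more data \r\n", None, ""])))
```

An empty string becomes an empty field, and None becomes the marker, which is exactly the distinction the (NULL) trick is meant to preserve.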
I'd still be very interested to know why the hex sequences aren't converted when using FORMAT CSV.

Postgresql, treat text as numbers for getting MAX function result

I still haven't fixed the issue with dates stored as strings, and here comes another problem.
I have a text column that contains only numbers (stored as text).
Using MAX gives an incorrect result because, compared as text, 9 is greater than 30.
Is there an inline function like VAL or CINT that lets me compare and use textual data (numbers only) as numbers in queries such as SELECT, MAX and similar?
What would that look like in the following example:
mCmd = New OdbcCommand("SELECT MAX(myTextColumn) FROM " & myTable, mCon)
You need to use max(to_number(myTextColumn, '999999'))
More details are in the manual: http://www.postgresql.org/docs/current/static/functions-formatting.html
If all "numbers" are integers, you can also use the cast operator: max(myTextColumn::int)
If your text values are properly formatted, you can simply cast them to a numeric type, e.g.: '3.14'::numeric.
If the text is not formatted according to the language settings, you need to use to_number() with a format mask containing the decimal separator: to_number('3.14', '9.99')
To get MAX to work properly, you need to first convert your text field to a numeric type:
mCmd = New OdbcCommand("SELECT MAX(TO_NUMBER(myTextColumn, '99999')) FROM " & myTable, mCon)
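The difference between comparing the values as text and comparing them as numbers can be shown with a tiny Python sketch. Python compares strings by code point while PostgreSQL uses the column's collation, but with an ordinary (non-numeric) collation strings of plain digits order the same way; the sample values are made up for illustration.

```python
values = ["9", "30", "100"]

# As text, comparison goes character by character, so "9" beats "30" and "100".
text_max = max(values)

# Converting first (the idea behind MAX(myTextColumn::int)) compares the numbers.
numeric_max = max(values, key=int)

print(text_max, numeric_max)  # 9 100
```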

to_char(number) function in postgres

I want to convert a number to a character string (of the same length) using the to_char() function.
In Oracle I can write:
SELECT to_char(1234) FROM DUAL
But in PostgreSQL
SELECT to_char(1234)
is not working.
You need to supply a format mask. In PostgreSQL there is no default:
select to_char(1234, 'FM9999');
If you don't know how many digits there are, just estimate the maximum:
select to_char(1234, 'FM999999999999999999');
If the number has fewer digits, this won't have any side effects.
If you don't need any formatting (like decimal point, thousands separator) you can also simply cast the value to text:
select 1234::text
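You have to specify a numeric format, i.e.:
to_char(1234, '9999')
Note that without the FM prefix the result is ' 1234', with a leading space reserved for the sign.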
Take a look here for more info: http://www.postgresql.org/docs/current/static/functions-formatting.html
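To make the effect of the mask width and the FM prefix concrete, here is a rough Python model of to_char() with a mask made only of 9s. It is a sketch under narrow assumptions (non-negative integers only, optional FM prefix, no other template patterns), not PostgreSQL's implementation, and to_char_nines is a hypothetical name.

```python
def to_char_nines(n: int, mask: str) -> str:
    """Rough model of to_char(n, mask) for non-negative n and a mask of '9's,
    optionally prefixed with 'FM'. Not PostgreSQL's implementation."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    fill_mode = mask.startswith("FM")
    width = len(mask) - 2 if fill_mode else len(mask)
    digits = str(n)
    if len(digits) > width:
        raise ValueError("mask too short (PostgreSQL prints '#' characters instead)")
    if fill_mode:
        return digits  # FM suppresses the padding entirely
    # One leading position is reserved for the sign, then the digits are right-aligned.
    return " " + digits.rjust(width)


for mask in ("9999", "999999", "FM9999", "FM999999999999999999"):
    print(repr(to_char_nines(1234, mask)))
```

This is why an over-sized FM mask is harmless, while a plain mask pads the result to the width of the mask plus one.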
CAST function worked for me.
SELECT CAST(integerv AS text) AS textv
FROM (
SELECT 1234 AS integerv
) x
OR
SELECT integerv::text AS textv
FROM (
SELECT 1234 AS integerv
) x
You can use:
1234||''
It works on most databases.

SPOOL - Format columns with French characters

I am creating a file from a SELECT query using SQL*Plus with the SPOOL command. Some of the columns in my SELECT query contain French characters, which are not written properly to the file.
SELECT RPAD(Column1, 32, ' ') FROM TableX;
If the value of Column1 contains, for example, the character "é", the output has a length of 31 instead of 32, and the "é" is not shown correctly in the output file.
How can I format the columns so that I get proper value and length from my columns?
I found out how to resolve my formatting problem.
1. The definition of the selected column must be changed from Column1 VARCHAR2(32 BYTE) to VARCHAR2(32 CHAR).
2. The character-set environment variable NLS_LANG must support French characters: NLS_LANG=FRENCH_FRANCE.WE8ISO8859P15.
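The BYTE vs CHAR length semantics in step 1 come down to whether a length counts bytes or characters, and "é" takes 2 bytes in UTF-8 but 1 byte in ISO-8859-15. A small Python sketch of that general difference follows; the helpers rpad_chars and rpad_bytes are hypothetical and do not model Oracle's RPAD or SQL*Plus, they only show how counting bytes instead of characters produces the 31-instead-of-32 symptom.

```python
def rpad_chars(s: str, width: int, fill: str = " ") -> str:
    """Pad until the value is `width` characters long (CHAR-style length)."""
    return s + fill * max(0, width - len(s))


def rpad_bytes(s: str, width: int, encoding: str, fill: str = " ") -> str:
    """Pad until the encoded value is `width` bytes long (BYTE-style length, 1-byte fill)."""
    return s + fill * max(0, width - len(s.encode(encoding)))


value = "café"
print(len(rpad_chars(value, 32)))                # 32 characters
print(len(rpad_bytes(value, 32, "utf-8")))       # 31 characters: é is 2 bytes in UTF-8
print(len(rpad_bytes(value, 32, "iso8859_15")))  # 32 characters: é is 1 byte in ISO-8859-15
```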
Thx anyway!