Oracle
select regexp_replace(TO_SINGLE_BYTE('Ａ'),'[^ a-zA-Z]','!!',1,0,'im') from dual;
--> Result: A
Postgres
select regexp_replace('Ａ','[^ a-zA-Z]','!!','ig');
--> Result: !!
Question: how can I get the same result from Oracle and Postgres? I would like to get the same result as Oracle.
I searched for a way to convert this function to Postgres, but the answer I found was that there is no such function in Postgres.
That letter is Unicode code point U+FF21 (Fullwidth Latin Capital Letter A) and does not match your pattern.
Use the unaccent function to convert the character to a regular A:
CREATE EXTENSION unaccent;
SELECT ascii(unaccent('Ａ'));
ascii
-------
65
(1 row)
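Putting the two together, a minimal sketch of the original query with unaccent standing in for Oracle's TO_SINGLE_BYTE (this assumes the unaccent extension is installed and that its rules cover all the fullwidth characters in your data):
SELECT regexp_replace(unaccent('Ａ'), '[^ a-zA-Z]', '!!', 'ig');
--> Result: A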
Related
I am converting code from Oracle to Postgres, but the ABS() function shows the result in a different format. How can I get exactly the same format as Oracle?
ORACLE
SQL> select ABS(0.00) from dual;
0
SQL> select ABS(-2.05) from dual;
2.05
SQL> select ABS(2.50) from dual;
2.5
Postgres
select abs(0.00)
0.00
select abs(-2.05)
2.05
select abs(2.50)
2.50
The difference you observe is insignificant; it is just a different way of displaying numeric data: PostgreSQL shows as many zeros as the scale of the type decrees.
If you want numbers to appear in a certain format, always format them using to_char().
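For instance, a hedged sketch of an Oracle-like display (the FM modifier suppresses padding and trailing zeros; the rtrim removes the decimal point that to_char leaves behind on whole numbers like 0):
SELECT rtrim(to_char(abs(0.00), 'FM999999990.99'), '.');
--> Result: 0
SELECT rtrim(to_char(abs(-2.05), 'FM999999990.99'), '.');
--> Result: 2.05
SELECT rtrim(to_char(abs(2.50), 'FM999999990.99'), '.');
--> Result: 2.5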
Use ::REAL in Postgres.
Use the :: cast operator to convert a decimal number containing trailing zeros to a number without the extra zeros.
Example:
select abs(2.50)::REAL;
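Keep in mind that real is a single-precision float (about 6 significant digits), so this cast can round large or very precise values. A hedged alternative that drops the trailing zeros the same way while keeping more digits is double precision:
select abs(2.50)::float8;
--> Result: 2.5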
The dump function in Oracle displays the internal representation of data:
DUMP returns a VARCHAR2 value containing the data type code, length in bytes, and internal representation of expr
For example:
SQL> SELECT DUMP(cast(1 as number)) FROM DUAL;
DUMP(CAST(1ASNUMBER))
--------------------------------------------------------------------------------
Typ=2 Len=2: 193,2
SQL> SELECT DUMP(cast(1.000001 as number)) FROM DUAL;
DUMP(CAST(1.000001ASNUMBER))
--------------------------------------------------------------------------------
Typ=2 Len=5: 193,2,1,1,2
It shows that the first value, 1, uses 2 bytes of storage, while the second example uses 5 bytes.
I suppose the most similar function in PostgreSQL is pg_typeof, but it returns only the type name, without any information about byte usage:
SELECT pg_typeof(33);
 pg_typeof
-----------
 integer
(1 row)
Does anybody know if there is an equivalent function in PostgreSQL?
I don't speak PostgreSQL.
However, the Oracle functionality page says that there's Orafce, which
implements in Postgres some of the functions from the Oracle database that are missing (or behaving differently)
It furthermore mentions the dump function:
dump (anyexpr [, int]): Returns a text value that includes the datatype code, the length in bytes, and the internal representation of the expression
One of the examples looks like this:
postgres=# select pg_catalog.dump('Pavel Stehule',10);
dump
-------------------------------------------------------------------------
Typ=25 Len=17: 68,0,0,0,80,97,118,101,108,32,83,116,101,104,117,108,101
(1 row)
To me, it looks like Oracle's dump:
SQL> select dump('Pavel Stehule') result from dual;
RESULT
--------------------------------------------------------------
Typ=96 Len=13: 80,97,118,101,108,32,83,116,101,104,117,108,101
SQL>
I presume you'll have to visit GitHub and install the package to see whether you can use it or not.
It is not a complete equivalent, but if you want to figure out the byte values used to encode a string in PostgreSQL, you can simply cast the value to bytea, which will give you the bytes in hexadecimal:
SELECT CAST ('schön' AS bytea);
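With a UTF-8 server encoding this returns \x736368c3b66e: the ö is the two bytes 0xc3 0xb6, and the rest are plain ASCII.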
This will work for strings, but not for numbers.
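If it is only the byte count you are after, PostgreSQL's pg_column_size() works for numbers as well. It reports only the size, not the representation, so it is at best a partial stand-in for DUMP:
SELECT pg_column_size(1::numeric), pg_column_size(1.000001::numeric);
The second value occupies more bytes than the first, loosely mirroring the Len= part of Oracle's output.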
I'm trying to query a PostgreSQL table for rows where a column contains ASCII letters:
SELECT p.* FROM person AS p WHERE p.surname LIKE 'e%';
The database contains UTF-8 strings. For example:
Éxample
Ěxample
I need these rows to match a query written with a plain ASCII e (or E). Why does the above query not match them?
You can do this with the unaccent extension:
CREATE EXTENSION unaccent;
SELECT * FROM (
  VALUES
    ('Éxample'),
    ('Ěxample')
) v(s)
WHERE lower(unaccent(s)) LIKE 'e%';
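Both rows come back, because unaccent() strips the diacritics before the comparison:
    s
---------
 Éxample
 Ěxample
(2 rows)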
I am unable to match multibyte characters case-insensitively in Redshift.
create table temp2 (city varchar);
insert into temp2 values('г. красноярск'); -- lower-case value
insert into temp2 values('Г. Красноярск'); -- upper-case value
select * from temp2 where city ilike 'Г. Красноярск'
city
-------------
Г. Красноярск
Only the exact-case row is returned. When I try the following, lower() itself does convert the UTF-8 characters to lower case:
select lower('Г. Красноярск')
lower
-------------
г. красноярск
In Vertica this works fine using the lowerb() function.
Internally the LIKE and ILIKE operators use PostgreSQL's regular expression support.
Support for proper handling of utf-8 multibyte chars in regular expressions was added in PostgreSQL 9.2. Redshift is based on PostgreSQL 8.2 (?) and it looks like they haven't backported that support into their forked version.
See Postgresql regex to match uppercase, Unicode-aware
You can work around this, with limitations, by using LIKE lower('Г. Красноярск') instead. An expression index may be useful.
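A sketch of that workaround against the table above (it relies on lower() handling the characters correctly, which the question itself demonstrates):
select * from temp2 where lower(city) like lower('Г. Красноярск');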
I want to format long numbers with a thousands separator. This can be done with the to_char function:
SELECT TO_CHAR(76543210.98, '999G999G990D00')
But when my PostgreSQL server with UTF-8 encoding runs on a Polish version of Windows, such a SELECT fails with:
ERROR: invalid byte sequence for encoding "UTF8": 0xa0
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
In the to_char pattern, G is described as: group separator (uses locale).
This SELECT works without error when the server runs on Linux with a Polish locale.
As a workaround I use a space instead of G in the format string, but I think there should be a way to set the thousands separator, just like in Oracle:
ALTER SESSION SET NLS_NUMERIC_CHARACTERS=', ';
Is such setting available for PostgreSQL?
If you use psql, you can execute this:
\pset numericlocale
Example:
test=# create temporary table a (a numeric(20,10));
CREATE TABLE
test=# insert into a select random() * 1000000 from generate_series(1,3);
INSERT 0 3
test=# select * from a;
a
-------------------
287421.6944910590
140297.9311533270
887215.3805568810
(3 rows)
test=# \pset numericlocale
Showing locale-adjusted numeric output.
test=# select * from a;
a
--------------------
287.421,6944910590
140.297,9311533270
887.215,3805568810
(3 rows)
I'm pretty sure the error message is literally true: a bare 0xa0 isn't valid UTF-8.
My home server is running PostgreSQL on Windows XP, SP3. I can do this in psql.
sandbox=# show client_encoding;
client_encoding
-----------------
UTF8
(1 row)
sandbox=# show lc_numeric;
lc_numeric
---------------
polish_poland
(1 row)
sandbox=# SELECT TO_CHAR(76543210.98, '999G999G990D00');
to_char
-----------------
76 543 210,98
(1 row)
I don't get an error message, but I get garbage for the separator. Could this be a code page issue?
"As a workaround I use space instead of G in format string"
Let's think about this. If you use a space, then on a web page the value might split at the end of a line or at the boundary of a table cell. I'd think a nonbreaking space might be a better choice.
And a nonbreaking space is 0xa0 as a Unicode code point (U+00A0), not in UTF-8: UTF-8 encodes it as the two bytes 0xc2 0xa0, and a bare 0xa0 can never be the first byte of a UTF-8 character. See UTF-8 Bit Distribution.
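A hedged sketch of that idea, building on the space workaround from the question and swapping the plain space for U+00A0 afterwards (chr(160) returns a nonbreaking space when the database encoding is UTF8; ltrim strips the leading sign column first so it doesn't get replaced too):
SELECT replace(ltrim(to_char(76543210.98, '999 999 990D00')), ' ', chr(160));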
Another possibility is that your client is expecting one byte order and the server is giving it a different one. Since the digits are single-byte characters, the byte order wouldn't matter until, well, it mattered. If the client is expecting a big-endian multibyte character and gets a little-endian one beginning with 0xa0, I'd expect it to die with the error message you saw. I'm not sure I have a way to test this before I go to work today.