PostgreSQL 9.0 replace function not working for one character

I'm working with PostgreSQL 9.0, and I have a table in which I need to replace a character with '' (an empty string). For that I'm using:
update species set engname = replace(engname, '�', '');
Here species is the table and engname is the field (character varying). The contents of one of the rows is:
" -tellifer f��rthii"
Even after running the query, the character is not replaced.
I have also tried:
update species set sciname = regexp_replace(sciname, '�', '')
but the character does not get replaced.
My database is:
CREATE DATABASE myDB
WITH OWNER = Myadmin
ENCODING = 'SQL_ASCII'
TABLESPACE = pg_default
LC_COLLATE = 'C'
LC_CTYPE = 'C'
CONNECTION LIMIT = -1;
We are planning to move to UTF-8 encoding, but the conversion with iconv fails because of this character, so I wanted to replace it with '' first. Can anyone tell me how to remove that character?

This symbol � can stand for many different characters, so you cannot use replace. Probably your client application uses a different encoding than the database; the symbol � signals broken encoding.
The solution is to use the correct encoding:
postgres=# select * from ff;
a
───────────────
žluťoučký kůň
(1 row)
postgres=# set client_encoding to 'latin2'; --setting wrong encoding
SET
postgres=# select * from ff; -- and you can see strange symbols
a
───────────────
�lu�ou�k� k�
(1 row)
postgres=# set client_encoding to 'utf8'; -- setting good encoding
SET
postgres=# select * from ff;
a
───────────────
žluťoučký kůň
(1 row)
Another solution is to replace national or special characters with related ASCII characters: 9.x has the unaccent contrib module for UTF-8 databases, and for some 8-bit encodings there is the function to_ascii(). A minimal sketch with unaccent follows.
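This sketch assumes PostgreSQL 9.1 or later, where CREATE EXTENSION is available; on 9.0 you would load the contrib module by running its SQL script instead:
CREATE EXTENSION unaccent;           -- install the contrib module once per database
SELECT unaccent('žluťoučký kůň');    -- zlutoucky kun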

Related

ERROR: requested character too large for encoding: 14844072

I am converting the following line of code from Oracle to PostgreSQL.
In Oracle:
select CHR(14844072) from dual
Output:
"
"
In PostgreSQL:
select CHR(14844072);
I get an error:
SQL Error [54000]: ERROR: requested character too large for encoding:
14844072
The behavior of the function differs between Oracle and PostgreSQL.
In Oracle the statement is valid. So is, for example:
select CHR(0) from dual;
While in PostgreSQL, you can't SELECT chr(0):
chr(0) is disallowed because text data types cannot store that
character.
Source: https://www.postgresql.org/docs/14/functions-string.html
This is just an example. More specifically: what do you expect for the value 14844072? An empty string is nonsense for PostgreSQL.
In Oracle you have this situation:
For single-byte character sets, if n > 256, then Oracle Database returns the binary equivalent of n mod 256
For multibyte character sets, n must resolve to one entire code point
But:
Invalid code points are not validated, and the result of specifying
invalid code points is indeterminate.
Source: https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions019.htm
In PostgreSQL the function depends on the encoding, but, assuming you use UTF8:
In UTF8 encoding the argument is treated as a Unicode code point. In
other multibyte encodings the argument must designate an ASCII
character
Short answer: you need to work on the application code OR build your own function, something like this (just an example):
CREATE OR REPLACE FUNCTION myCHR(integer) RETURNS TEXT
AS $$
BEGIN
    IF $1 = 0 THEN
        RETURN '';
    ELSIF $1 <= 1114111 THEN  -- highest Unicode code point; replace the number according to your encoding
        RETURN CHR($1);
    ELSE
        RETURN '';
    END IF;
END;
$$ LANGUAGE plpgsql;
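A quick check of the fallback behavior, using the value from the question:
SELECT myCHR(8232);      -- the U+2028 line separator character
SELECT myCHR(14844072);  -- returns '' instead of raising an error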
In Oracle, this function expects a UTF-8 encoded value. Now 14844072 in hex is E280A8, which is the UTF-8 encoding of the Unicode code point hex 2028, the "line separator" character.
In PostgreSQL, chr() expects the code point as its argument. So feed it the decimal value that corresponds to hex 2028:
SELECT chr(8232);
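If you need this conversion generically, one hedged sketch is to reinterpret the Oracle integer's bytes as UTF-8 (this assumes the value is always a complete, valid UTF-8 byte sequence, as in this example):
SELECT convert_from(decode(to_hex(14844072), 'hex'), 'UTF8');  -- yields the same character as chr(8232)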

Some Opencart modules don't support UTF-8 and show ???? instead of characters

In some cases, modules for Opencart don't support RTL languages and UTF-8 characters, and they show ????????? instead of your Persian/Arabic characters. What should I do with these modules to display my characters correctly?
There are several ways:
1) Use SQL queries:
In this case you can run some queries like the ones below:
$this->db->query("SET NAMES 'utf8'");
$this->db->query("SET CHARACTER SET utf8;");
$this->db->query("SET character_set_connection=utf8;");
You should put these queries in your database driver file. Here I am using mysqli, so I put the code in mysqli.php in the directory opencart\system\library\db\mysqli.php, like below:
public function __construct($hostname, $username, $password, $database, $port = '3306') {
    $this->link = new \mysqli($hostname, $username, $password, $database, $port);

    if ($this->link->connect_error) {
        trigger_error('Error: Could not make a database link (' . $this->link->connect_errno . ') ' . $this->link->connect_error);
        exit();
    }

    // Force the connection character set to UTF-8 before any other query runs.
    $this->link->query("SET NAMES 'utf8'");
    $this->link->query("SET CHARACTER SET utf8");
    $this->link->query("SET character_set_connection=utf8");
    $this->link->query("SET SQL_MODE = ''");
}
2) Change the database charset:
In some cases the queries above won't solve your problem; then you should check the collation of every table and of the columns inside the tables, and set it to utf8_general_ci, as sketched below.
Please note, if there are many tables and columns, you can export the database to a .sql file, open it with Notepad, replace every occurrence of latin1 (the charset of my dump; it may be different in yours) with utf8, save it, and import this new file.
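A minimal sketch of the ALTER statements (the table and column names are placeholders, and you must adjust the column type to match your schema):
ALTER TABLE your_table CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;                      -- converts the table default and all its columns
ALTER TABLE your_table MODIFY your_column VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci; -- converts a single column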
3) Change the file encoding:
In this case, you should open the file with Notepad, use File / Save As, and in the Save As window change the encoding to UTF-8 (this mostly helps if the file uses echo or print to output strings).
Hope it helps.

Encoding and decoding in PostgreSQL

Let's say we have the string 'a\b'. I need to encode it first, save it to a file, then read it from the file and put it back in the database.
How do I encode and decode text that has escape characters?
select encode(E'a\b'::bytea, 'base64')
"YQg="
select decode('YQg=', 'base64')
"a\010"
After decoding, I am not getting the string back in its original form.
You're using an E'' string (escape string) and casting to bytea. The result will be a representation of that string in your current database encoding - probably UTF-8.
E'a\b' is the character a then the character represented by the escape \b which is ordinal \x08. PostgreSQL represents this string with a hex-escape when printing to the terminal because it's a non-printable character. The string is still two characters long.
postgres=> SELECT E'a\b';
?column?
----------
a\x08
(1 row)
postgres=> SELECT length(E'a\b');
length
--------
2
(1 row)
The cast to bytea implicitly does a conversion to the current database encoding:
postgres=> SELECT E'a\b'::bytea;
bytea
--------
\x6108
(1 row)
(\x61 is the ASCII ordinal for a in most encodings).
Except you must be on an old PostgreSQL since you have bytea_output = escape, resulting in octal escape output instead:
postgres=> SELECT E'a\b'::bytea;
bytea
-------
a\010
(1 row)
You need to decode the bytea back into a text string, e.g.
convert_from(decode('YQg=', 'base64'), 'utf-8')
... and even then the nonprintable character \b will be printed as \x08 by psql. You can verify that it is really that character inside the database using another client.
BTW, what's going on would be clearer if you instead explicitly encoded it when you stored it rather than relying on a cast to bytea:
encode(convert_to(E'a\b', 'utf-8'), 'base64')
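To make the round trip explicit, here is a small sketch that encodes the string, decodes it, and compares the result against the original:
SELECT convert_from(decode(encode(convert_to(E'a\b', 'UTF8'), 'base64'), 'base64'), 'UTF8') = E'a\b';  -- true: both characters survive the round trip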

bytea type & nulls, Postgres

I'm using a bytea type in PostgreSQL, which, to my understanding, contains just a series of bytes. However, I can't get it to play well with nulls. For example:
=# select length(E'aa\x00aa'::bytea);
length
--------
2
(1 row)
I was expecting 5. Also:
=# select md5(E'aa\x00aa'::bytea);
md5
----------------------------------
4124bc0a9335c27f086f24ba207a4912
(1 row)
That's the MD5 of "aa", not "aa\x00aa". Clearly, I'm Doing It Wrong, but I don't know what I'm doing wrong. I'm also on an older version of Postgres (8.1.11) for reasons outside of my control. (I'll see if this behaves the same on the latest Postgres as soon as I get home...)
Try this:
# select length(E'aa\\000aa'::bytea);
length
--------
5
Update: why didn't the original work? First, understand the difference between one backslash and two:
pg=# select E'aa\055aa', length(E'aa\055aa') ;
?column? | length
----------+--------
aa-aa | 5
(1 row)
pg=# select E'aa\\055aa', length(E'aa\\055aa') ;
?column? | length
----------+--------
aa\055aa | 8
In the first case, I'm writing a literal string: four unescaped characters ('a') and one escaped. The backslash is consumed by the parser in a first pass, which converts the full \055 to a single character ('-' in this case).
In the second case, the first backslash just escapes the second: the pair \\ is translated by the parser to a single \, and the 055 is seen as three separate characters.
Now, when converting a text value to a bytea, escape sequences (in the already parsed text) are parsed/interpreted again! (Yes, this is confusing.)
So, when I write
select E'aa\000aa'::bytea;
in the first parsing, the literal E'aa\000aa' is converted to an internal text value with a null character in the third position (and, depending on your PostgreSQL version, either the null character is interpreted as an end-of-string marker and the text is assumed to have length two, or an illegal-string error is thrown).
Instead, when I write
select E'aa\\000aa'::bytea;
in the first parsing, the literal string "aa\000aa" (eight characters) is seen and assigned to a text value; then, in the cast to bytea, it is parsed again, and the sequence of characters '\000' is interpreted as a null byte.
IMO postgresql kind of sucks here.
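The two-pass parsing can be observed directly (a small sketch using escape strings):
SELECT length(E'aa\\000aa');               -- 8: as text, the backslash escape is still four characters
SELECT octet_length(E'aa\\000aa'::bytea);  -- 5: the bytea parser turns \000 into a single null byte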
You can use regular strings or dollar-quoted strings instead of escaped strings:
# select length('aa\000aa'::bytea);
length
════════
5
(1 row)
# select length($$aa\000aa$$::bytea);
length
════════
5
(1 row)
I think that dollar-quoted strings are a better option because, if the configuration parameter standard_conforming_strings is off, then PostgreSQL recognizes backslash escapes in both regular and escape string constants. However, as of PostgreSQL 9.1, the default is on, meaning that backslash escapes are recognized only in escape string constants.
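With standard_conforming_strings on (the default since 9.1), the hex input format for bytea (available since 9.0) sidesteps escape parsing entirely; a short sketch:
SELECT length('\x6161006161'::bytea);  -- 5: the bytes 61 61 00 61 61
SELECT md5('\x6161006161'::bytea);     -- MD5 over all five bytes, null included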

How to use case insensitive pattern matching with PostgreSQL and Umlauts?

I'm trying to get PostgreSQL 8.4.3 to do case insensitive pattern matching with its ~* operator when the strings contain non-ASCII characters like German umlauts. The database, terminal, and everything else is configured to use UTF-8.
Here's the problem in a nutshell:
SELECT 'Ö' ~* 'ö'; -- false
There are other variants which do work:
SELECT 'Ö' ILIKE 'ö'; -- true
SELECT 'Ö' ~* '[Öö]'; -- true
SELECT LOWER('Ö') ~* 'ö'; -- true
None of these alternatives make me especially happy. ILIKE doesn't use regular expressions. [Öö] involves rewriting the search term. LOWER() is probably the best workaround, but I'd really like to get the ~* operator working like it's supposed to.
Thanks in advance.
This is a bug in PostgreSQL versions prior to 9.0.
It's in the 9.0 changelog: http://www.postgresql.org/docs/9.0/static/release-9-0.html#AEN99075
Here is my test in 9.0 beta2 using Ubuntu:
SELECT 'Ö' ~* 'ö';
?column?
----------
t
(1 row)
I get true with this query:
SELECT 'Ö' ~* 'ö'; -- true
But I did use version 9.0beta2 on OS X 10.5.8 with these settings:
CREATE DATABASE test
WITH OWNER = postgres
ENCODING = 'UTF8'
TABLESPACE = pg_default
LC_COLLATE = 'nl_NL.UTF-8'
LC_CTYPE = 'nl_NL.UTF-8'
CONNECTION LIMIT = -1;
Edit: Same result on version 8.3.7. Looks like you have a problem with the encoding.
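For servers older than 9.0, where the ~* bug applies, one hedged workaround is to fold case on both sides before matching (the table and column names here are hypothetical):
SELECT * FROM species_names WHERE lower(name) ~ lower('ö');  -- case-insensitive match without relying on ~*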