Search is not working with lowercase LIKE for Russian characters - PostgreSQL

Why does this query give no results:
SELECT name FROM users WHERE LOWER(name) LIKE LOWER('%после%');
while this one works fine:
SELECT name FROM users WHERE LOWER(name) LIKE LOWER('%После%');
The name is 'Последователь'. If I search for a Latin string such as 'Post', the search works fine.
Version: PostgreSQL 11.2 (Ubuntu 11.2-100) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0, 64-bit
Server and database encoding are UTF8. Client encoding is UNICODE.

The LOWER function works according to the database locale (LC_CTYPE), so the result will vary depending on how your database is defined.
It will work well in this case:
test=# CREATE DATABASE rus TEMPLATE template0
ENCODING UTF8 LC_COLLATE "ru_RU.utf8" LC_CTYPE "ru_RU.utf8";
CREATE DATABASE
test=# \c rus
You are now connected to database "rus" as user "postgres".
rus=# SELECT LOWER('%после%') = LOWER('%После%');
?column?
----------
t
(1 row)
But it won't work with the C collation, because that does not know how to properly lower-case Cyrillic characters:
rus=# \c test
You are now connected to database "test" as user "postgres".
test=# DROP DATABASE rus;
DROP DATABASE
test=# CREATE DATABASE rus TEMPLATE template0
ENCODING UTF8 LC_COLLATE "C" LC_CTYPE "C";
CREATE DATABASE
test=# \c rus
You are now connected to database "rus" as user "postgres".
rus=# SELECT LOWER('%после%') = LOWER('%После%');
?column?
----------
f
(1 row)
So if you want your query to work well, make sure that you are using a collation that knows how to convert the characters to upper and lower case.
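If you are not sure which locale your database was created with, you can check it from SQL (a quick diagnostic, assuming you can read pg_database):
SELECT datname, datcollate, datctype
FROM pg_database
WHERE datname = current_database();
If datctype is C or POSIX, LOWER() will only fold ASCII letters, and the query with the lowercase pattern will keep returning no rows.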

The LOWER and UPPER functions do not work correctly if the database has the wrong locale.
Step-1. View the locales of the databases in Postgres:
\l
Step-2. View the available locales in the terminal:
locale -a
or filter by your language, for example ru:
locale -a | grep ru
Step-3. Use a locale name from Step-2 (for example ru_RU.utf8) to update the database YOUR_DATABASE_NAME in Postgres:
update pg_database set datcollate='ru_RU.utf8', datctype='ru_RU.utf8' where datname='YOUR_DATABASE_NAME';
or you can create a new database with that locale:
CREATE DATABASE NEW_DATABASE_NAME TEMPLATE template0 ENCODING UTF8 LC_COLLATE "ru_RU.UTF-8" LC_CTYPE "ru_RU.UTF-8";
P.S. My macOS names the locale ru_RU.UTF-8, while Ubuntu names it ru_RU.utf8.
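To verify that the new locale is actually picked up, a minimal check after reconnecting to the database (the sample strings are just an illustration):
SELECT LOWER('ПОСЛЕ') AS lowered,
       LOWER('%после%') = LOWER('%После%') AS patterns_match;
With a working ru_RU locale this returns после and t; with the C locale it returns ПОСЛЕ unchanged and f.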

Look closer. The LIKE operator is sometimes case sensitive, depending on the program you use.
The first statement is:
SELECT name FROM users WHERE LOWER(name) LIKE LOWER('%после%');
while the second is:
SELECT name FROM users WHERE LOWER(name) LIKE LOWER('%После%');
What you are searching for is
Последователь
Note that in the first statement you wrote a lowercase (п) while the second one uses an uppercase (П).
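If the goal is simply a case-insensitive match regardless of how the pattern is typed, PostgreSQL also provides ILIKE, a case-insensitive variant of LIKE; note that, like LOWER, it still relies on the database locale to fold non-ASCII letters:
SELECT name FROM users WHERE name ILIKE '%после%';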

Related

Locale LC_COLLATE="de_DE.UTF-8" but sorting by German words doesn't work in a PostgreSQL DB

I created a database with these settings:
CREATE DATABASE myDB ENCODING = 'UTF8' LC_COLLATE = 'de_DE.utf8' LC_CTYPE = 'de_DE.utf8' OWNER = owner; --locale=de_DE
When I run SHOW lc_collate I get: de_DE.utf8
The operating system is Ubuntu.
I am working in Docker, and my C library is: ldd (Ubuntu GLIBC 2.31-0ubuntu9.2) 2.31
But when sorting, all German special characters go to the end; for example, Ä comes after z, while I expect an order like a, ä, b, ..., z.
I tried to use another image with the command:
RUN localedef -i de_DE -c -f UTF-8 -A /usr/share/locale/locale.alias de_DE.UTF-8
and it worked
but when I run the same command with my current image, which cannot be changed, I get an error.
Any help would be appreciated.
Maybe PostgreSQL offers several different German (DE) collations. To view the full list of Postgres collations you can execute this SQL script:
SELECT c.oid, c.collname, c.collcollate, c.collctype FROM pg_catalog.pg_collation c
ORDER BY c.collname;
For testing I executed this script on my server and got these collations:
de-AT-x-icu de-AT de-AT
de-BE-x-icu de-BE de-BE
de-CH-x-icu de-CH de-CH
de-DE-x-icu de-DE de-DE
de-IT-x-icu de-IT de-IT
de-LI-x-icu de-LI de-LI
de-LU-x-icu de-LU de-LU
de-x-icu de de
Some people don't know that you can create a database with the custom collation you need. In PostgreSQL, when you use the standard CREATE DATABASE command you cannot pick an arbitrary collation; the database collation is taken automatically during creation from the template (ultimately from the regional settings of the operating system). But when you create a database from template0, you can select a custom collation. For example:
CREATE DATABASE test_db
WITH OWNER "postgres"
ENCODING 'UTF8'
LC_COLLATE = 'de-DE'
LC_CTYPE = 'de-DE'
TEMPLATE template0;
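If changing the database locale is not an option (for example because the image cannot be rebuilt), an ICU collation from the list above can also be applied per column or per query; a small sketch with a made-up table, using de-DE-x-icu from the earlier output:
CREATE TABLE words (w text COLLATE "de-DE-x-icu");
SELECT w FROM words ORDER BY w COLLATE "de-DE-x-icu";
With this collation ä sorts next to a instead of after z.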

Invalid locale name: C.UTF-8 even though the collname exists in pg_collation.collname

As this might have something to do with AWS Lightsail, I've also cross-posted this question on AWS.
I'm trying to create a template database using
CREATE DATABASE __edgedbtpl__ OWNER='edgedb' IS_TEMPLATE = TRUE TEMPLATE='template0' ENCODING='UTF8' LC_COLLATE='C' LC_CTYPE='C.UTF-8';
But this fails and gives me the error
ERROR: invalid locale name: "C.UTF-8"
I checked if the PostgreSQL server supports the C.UTF-8 locale, using
SELECT collname FROM pg_collation WHERE lower(replace(collname, '-', '')) = 'c.utf8' LIMIT 1;
which gives me the response
collname
----------
C.utf8
(1 row)
Question
How are the collnames in pg_collation different from SHOW LC_CTYPE and SHOW LC_COLLATE?
SHOW LC_COLLATE and SHOW LC_CTYPE responded with en_US.UTF-8 and not C.UTF-8. So how should I identify whether a certain locale is supported?
Collation names are identifiers, not string literals in PostgreSQL. Use double quotes instead of single quotes. Also, case and spelling matter, so use "C.utf8".
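As an illustration of that identifier syntax (the table name here is made up), the name from pg_collation is used double-quoted in a COLLATE clause:
CREATE TABLE t (name text COLLATE "C.utf8");
SELECT 'Ä' < 'B' COLLATE "C.utf8";
CREATE DATABASE, on the other hand, takes operating-system locale names as single-quoted strings for LC_COLLATE/LC_CTYPE, so the spelling there has to match what locale -a reports on the server.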

PostgreSQL 11 - create database with ICU locale

I want to use an ICU collation that does not depend on the operating system, to avoid sorting differences between Postgres 11 on macOS and Postgres 11 on Ubuntu. My first test was to dump my existing Collate=en_US.UTF-8 database and pg_restore it into a db created with Collate=en-US-x-icu.
The CREATE DATABASE documentation has this to say:
To create a database music with a different locale:
CREATE DATABASE music
LC_COLLATE 'sv_SE.utf8' LC_CTYPE 'sv_SE.utf8'
TEMPLATE template0;
I seem to have the required icu locales already:
select collname, collprovider from pg_collation where collname like 'en_US%';
collname | collprovider
------------------------+--------------
en_US.UTF-8 | c
en_US | c
en_US.ISO8859-15 | c
en_US.ISO8859-1 | c
en_US | c
en_US | c
en-US-x-icu | i 👈
en-US-u-va-posix-x-icu | i 👈
(8 rows)
But no luck when creating a database with either of the ICU locales.
ksysdb=# CREATE DATABASE test LC_COLLATE = 'en-US-x-icu' TEMPLATE template0;
ERROR: invalid locale name: "en-US-x-icu"
I can use LC_COLLATE with other locales:
The LC_COLLATE clause does seem to come with some strings attached, such as watching your encoding and specifying an appropriate template, but it does give helpful error hints with non-ICU locales.
This works, for example: CREATE DATABASE test LC_COLLATE = 'en_US' TEMPLATE template0;
and this one gives a helpful user message:
ksysdb=# CREATE DATABASE test LC_COLLATE = 'en_US.ISO8859-15' TEMPLATE template0;
ERROR: encoding "UTF8" does not match locale "en_US.ISO8859-15"
DETAIL: The chosen LC_COLLATE setting requires encoding "LATIN9".
Note: a related question, PostgreSQL 10 on Linux - LC_COLLATE locale en_US.utf-8 not valid, doesn't seem all that relevant, as its answer talks about generating an OS-level locale to fix the issue, while the ICU locales, as far as I understand, are expressly intended to be independent of the underlying OS.
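For what it's worth, in PostgreSQL 11 the LC_COLLATE/LC_CTYPE of a database can only name libc (operating-system) locales; ICU collations such as en-US-x-icu can only be attached to columns or expressions (database-wide ICU locales arrived in PostgreSQL 15). A sketch with a made-up table:
CREATE TABLE docs (title text COLLATE "en-US-x-icu");
SELECT title FROM docs ORDER BY title COLLATE "en-US-x-icu";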

Failing to create database in Postgres 9.3 with Norwegian (Bokmål) locale

I am trying to create a database using this command:
CREATE DATABASE workgroup WITH TEMPLATE = template0
ENCODING = 'UTF8'
LC_COLLATE = 'Norwegian (Bokmål)_Norway.1252'
LC_CTYPE = 'Norwegian (Bokmål)_Norway.1252';
But it fails with this error:
"ERROR: invalid locale name: "Norwegian (Bokmål)_Norway.1252"
********** Error **********
ERROR: invalid locale name: "Norwegian (Bokmål)_Norway.1252" SQL state: 42809"
I added the Norwegian (Bokmål) keyboard on Windows 7, and the command also failed with a standard 'a' (Bokmal) and without the space.
Creating the DB with this locale:
LC_COLLATE='Estonian_Estonia.1257'
LC_CTYPE='Estonian_Estonia.1257'
works fine.
I've installed Postgres 9.3 on Windows with the Norwegian Bokmal locale, and then queried the database for the locales using these SQL commands:
show LC_COLLATE;
show LC_CTYPE;
SELECT *
FROM pg_settings
WHERE name ~~ 'lc%';
These return empty values for LC_COLLATE and LC_CTYPE.
What should be the LC* values for the Norwegian (Bokmål) locale?
I had a similar problem with a query that worked on Windows and didn't work on Linux:
CREATE DATABASE testingDB
WITH OWNER = postgres
ENCODING = 'UTF8'
TABLESPACE = pg_default
LC_COLLATE = 'Portuguese_Brazil.1252'
LC_CTYPE = 'Portuguese_Brazil.1252'
CONNECTION LIMIT = -1;
When I changed
'Portuguese_Brazil.1252'
to
'pt_BR.UTF8'
I then got another problem:
new collation (pt_BR.UTF8) is incompatible with the collation of the template database (en_US.UTF-8)
which is self-explanatory: my system and the template database use a different locale, so I fixed that as well.
You can find more about collation support for your Postgres version in the documentation.
I believe this should fix your issue:
CREATE DATABASE workgroup WITH TEMPLATE = template0
ENCODING = 'UTF8'
LC_COLLATE = 'nb_NO.UTF8'
LC_CTYPE = 'nb_NO.UTF8';
Try issuing \dOS+ in psql; it will show the list of available collations.
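The same information is also available from SQL; a query along these lines (the pattern is only an example) lists collations whose names look Norwegian:
SELECT collname, collcollate, collctype
FROM pg_collation
WHERE collname ILIKE '%norwegian%' OR collname ILIKE 'nb_NO%';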

Cannot insert German characters in Postgres

I am using UTF8 as encoding for my Postgres 8.4.11 database:
CREATE DATABASE test
WITH OWNER = postgres
ENCODING = 'UTF8'
TABLESPACE = mydata
LC_COLLATE = 'de_DE.UTF-8'
LC_CTYPE = 'de_DE.UTF-8'
CONNECTION LIMIT = -1;
ALTER DATABASE test SET default_tablespace='mydata';
ALTER DATABASE test SET temp_tablespaces=mydata;
And the output of \l
test | postgres | UTF8 | de_DE.UTF-8 | de_DE.UTF-8 |
When I try to insert a German character:
create table x(a text);
insert into x values('ä,ß,ö');
ERROR: invalid byte sequence for encoding "UTF8": 0xe42cdf
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
I am using PuTTY to connect. Any idea?
The key element is the client_encoding - the encoding the server expects from your client. It has to match what is actually sent. What do you get for show client_encoding? Is it UNICODE?
Read more in the chapter Automatic Character Set Conversion Between Server and Client of the manual.
If you are using psql as client, you can set client_encoding with \encoding. Check the encoding your local system uses (on Linux, type locale in the shell) and set a matching client_encoding in psql. You can avoid such complications if you use the same locale on your system as you use as encoding for your PostgreSQL server.
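A quick way to check and adjust this from a session (UTF8 here is just the usual choice when the terminal runs a *.UTF-8 locale):
SHOW client_encoding;
SET client_encoding = 'UTF8';   -- or, in psql: \encoding UTF8
After that, the INSERT from the question should be accepted, as long as the terminal really sends UTF-8 bytes.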
If you use PuTTY (on Windows), make sure to set its "Translation" accordingly. Have a look at Settings: Window - Translation; it must match client_encoding. You can right-click in a running session and choose Change Settings. You can also save these settings with your saved sessions.