Install utf8 collation in PostgreSQL - postgresql

Right now I can choose Encoding: UTF8 when creating a new DB in the pgAdmin 4 GUI.
But there is no option to choose utf8_general_ci as the collation or character type. When I run select * from pg_collation; I don't see any collation relevant to utf8_general_ci.
Coming from a MySQL background, I am confused. Do I have to install a utf8-like collation (e.g. utf8_general_ci, utf8_unicode_ci) in my PostgreSQL 10 or Windows 10?
I just want the PostgreSQL equivalent of the MySQL collation utf8_general_ci.
Thank you

utf8 is an encoding (how to represent Unicode characters as a series of bytes), not a collation (which character sorts before which).
I think the Postgres 10 collation equivalent for utf8_general_ci (or the more modern utf8_unicode_ci) is called und-x-icu. This is an undefined collation (not defined for any real-world language) provided by the ICU library. It sorts characters from most languages quite reasonably.
ICU support is a new feature added in PostgreSQL 10, so this collation isn't available in older PostgreSQL versions or when ICU is disabled during compilation. Before that, Postgres relied on collation support provided by the operating system, which differs between operating systems.
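As a rough sketch of how this looks in practice (assuming PostgreSQL 10 or later built with ICU support; the sample values are made up), the ICU collations can be listed from pg_collation and applied per expression. Note that this only gives a linguistic sort order; equality comparisons with und-x-icu are still case-sensitive.

```sql
-- List the language-neutral ICU collations (collprovider = 'i' means ICU).
SELECT collname
FROM pg_collation
WHERE collprovider = 'i'
  AND collname LIKE 'und%';

-- Sort made-up sample values with the ICU root collation.
SELECT name
FROM (VALUES ('b'), ('A'), ('a'), ('B')) AS t(name)
ORDER BY name COLLATE "und-x-icu";
```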

Related

Create database definition equivalent from MySQL to PostgreSQL

I need to migrate a MySQL table to PostgreSQL.
I need an accent- and case-insensitive database.
In MySQL, my database has the following definition:
CREATE DATABASE gestan
DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
How do I create an equivalent definition in PostgreSQL?
I have read some posts, but they seem outdated.
create database getan
encoding = 'UTF-8';
There is no direct equivalent for the case insensitive collation _ci in Postgres.
You can define a new ICU collation that uses case insensitive comparison, but it can't be used as a default collation for a database, only for the collation on column level.
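A minimal sketch of such a column-level collation, assuming PostgreSQL 12 or later (where the deterministic option was added) built with a reasonably recent ICU. The collation, table, and column names are hypothetical:

```sql
-- Case-insensitive but accent-sensitive collation (ICU comparison strength level 2).
-- For accent- AND case-insensitive comparison, 'und-u-ks-level1' can be used instead.
CREATE COLLATION case_insensitive (
    provider = icu,
    locale = 'und-u-ks-level2',
    deterministic = false
);

-- Hypothetical table: the collation is attached to the column, not the database.
CREATE TABLE customers (
    id   serial PRIMARY KEY,
    name text COLLATE case_insensitive
);

-- With this collation, the comparison below should match both rows.
INSERT INTO customers (name) VALUES ('Gestan'), ('GESTAN');
SELECT * FROM customers WHERE name = 'gestan';
```

Every text column that needs this behavior has to carry the COLLATE clause, since the collation cannot be set as the database default.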
Related questions:
Install utf8 collation in PostgreSQL
https://dba.stackexchange.com/questions/183355
https://dba.stackexchange.com/a/256979

Problems with COLLATE in PostgreSQL 12

My problem:
I work in Windows 10 and my computer is set-up to Portuguese (pt_BR);
I'm building a database in PostgreSQL where I need certain columns to remain in Portuguese, but others to be in en_US - namely, those storing numbers and currency. I need $ instead of R$ and 1,000.00 instead of 1.000,00.
I tried to create columns this way using the COLLATE clause:
CREATE TABLE crm.TESTE (
prodserv_id varchar(30) NOT NULL,
prodserv_name varchar(140) NULL,
fk_prodservs_rep_acronym varchar(4) NULL,
prodserv_price numeric null collate "en_US",
CONSTRAINT pk_prodservs_prodserv_id PRIMARY KEY (prodserv_id)
);
But I get the error message:
SQL Error [42704]: ERROR: collation "en_US" for encoding "UTF8" does not exist
Database metadata shows Default Encoding: UTF8 and Collate Portuguese_Brazil.1252
It will be deployed at my ISP, which runs Linux.
Any suggestions would be greatly appreciated.
Thanks in advance.
A collation defines how strings are compared. It is not applicable to numerical data.
Moreover, PostgreSQL uses the operating system's collations, which causes problems when porting a database from Windows to other operating systems. The collation would be called English on Windows and en_US.utf8 on operating systems that use glibc.
To influence the formatting of numbers and currency symbols, set the lc_numeric and lc_monetary parameters appropriately (English on Windows, en_US elsewhere). Note that while lc_monetary affects the string representation of the money data type, these settings do not influence the string representation of numbers. You need to use to_char like this:
SELECT to_char(1000, '999G999G999D00 L');
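Putting the pieces together, a session on Linux might look like the sketch below. The locale name en_US.UTF-8 is an assumption: it has to be installed on the server's operating system, and on Windows the name would be different, as noted above.

```sql
-- Locale names are assumptions; use names that exist on the server's OS.
SET lc_numeric  = 'en_US.UTF-8';
SET lc_monetary = 'en_US.UTF-8';

-- G = group separator and D = decimal point (both taken from lc_numeric),
-- L = currency symbol (taken from lc_monetary), FM = suppress padding.
SELECT to_char(1000, 'FM999G999G999D00 L');
```

With this approach the numeric column itself needs no COLLATE clause; the formatting is applied only when the value is converted to text.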

Heroku Postgres ignores underscores when sorting

This is driving me bonkers. My Heroku Postgres (9.5.18) DB seems to be ignoring underscores when sorting results:
Query:
SELECT category FROM categories ORDER BY category ASC;
Results:
category
-------------------
z_commercial_overlay
z_district
zr_use_group
zr_uses_footnote
z_special_district
This is new to me. I've never noticed another system where underscores are not respected in sorting, and this is the first time I've noticed Postgres behaving like this.
On my local OSX box (Postgres 10.5) the results are sorted the 'normal' expected way:
category
-------------------
z_commercial_overlay
z_district
z_special_district
zr_use_group
zr_uses_footnote
UPDATE:
Based on the comments, I was able to get the correct sorting by using COLLATE "C"
SELECT category FROM categories ORDER BY category COLLATE "C" ASC;
But I don't understand why this is necessary. BOTH of the Postgres instances show the same default collation value, and all of the table columns were created the same way, with no alternate collation specified.
SHOW lc_collate;
lc_collate
-------------
en_US.UTF-8
SHOW lc_ctype;
lc_ctype
-------------
en_US.UTF-8
So why does the Heroku Postgres DB require the COLLATE declaration?
"I've never encountered another system where underscores are not respected in sorting"
Really? Never used one, or just never paid attention to one?
On Ubuntu 16.04 (and every other modern system I've paid attention to), the system sort tool behaves the same way as long as you are using en_US.
LC_ALL= LANG=en_US.UTF-8 sort
(This produced the same order as the first one you show above.)
"On my local box (Postgres 10.5) the results are sorted the 'normal' expected way:"
"BOTH of the Postgres instances show the same collation value:"
SHOW lc_collate;
lc_collate
en_US.UTF-8
That only shows the default collation for the database. The column could have been declared to use a different collation than the default:
create table categories(category text collate "C");
If your local database is supposed to be using en_US, and is not, then it is busted.
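To check both levels on each instance, something like the following could be run. This is a sketch that assumes the table is named categories, as in the question:

```sql
-- Database-wide defaults, as stored in the catalog.
SELECT datname, datcollate, datctype
FROM pg_database
WHERE datname = current_database();

-- Per-column collations; a NULL collation_name means the column
-- uses the database default.
SELECT column_name, collation_name
FROM information_schema.columns
WHERE table_name = 'categories';
```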

How can I change the "character type" of a database in PostgreSQL?

I'm using PostgreSQL 9.1.
I've set the Collation and the Character Type of the database to Greek_Greece.1253 and I want to change them to utf8.
To change the collation I should use this, right?
But how can I change the character type?
Thanks
EDIT
I meant to write C instead of utf8. I would like to change the Collation and the Character Type to C.
You cannot change the default collation of an existing database. You need to CREATE DATABASE with the collation you need and then dump/restore your schema and data into it.
If you do not want to recreate the database, you can specify a collation for every text column in your db.
Here is the detailed Postgres manual page on collations: Collation Support.
The first line of this manual page states:
LC_COLLATE and LC_CTYPE settings of a database cannot be changed after its creation.
CREATE DATABASE, pg_dump, pg_restore
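Both routes can be sketched as follows. The database, table, and column names are hypothetical, and the UTF8 encoding is an assumption (the C locale works with any encoding):

```sql
-- Option 1: create a new database with C collation and character type.
-- template0 is needed because the locale differs from that of template1.
CREATE DATABASE newdb
    TEMPLATE template0
    ENCODING 'UTF8'
    LC_COLLATE 'C'
    LC_CTYPE 'C';
-- Then copy schema and data across with pg_dump / pg_restore.

-- Option 2: keep the database and override the collation of one column
-- (per-column collations are available from PostgreSQL 9.1).
ALTER TABLE mytable
    ALTER COLUMN mycolumn TYPE text COLLATE "C";
```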

Force Postgres shift to uppercase rather than lowercase before sorting case insensitive?

I am trying to migrate from Pervasive to Postgres. In Pervasive there was something like 'upper.alt', an alternative collation. I don't really know how it works, but I have to make my new Postgres database behave like Pervasive with this collation.
I use Postgres 9.2.4 and utf-8 encoding and LC_COLLATE='Polish_Poland.1250' .
You can try and order with COLLATE "C". That would get what you want in your example. It has side effects though! Effectively everything is ordered according to the byte values of the encoded character.
WITH x(col) AS (
VALUES
('ABC_AAAAA')
,('ABC_BBBBB')
,('ABC_ZZZZZ')
,('ABCAAAAA')
,('ABCBBBBB')
,('ABCZZZZZ')
)
SELECT *
FROM x
ORDER BY col COLLATE "C";
This option to change the collation for individual expressions (as opposed to using a collation defined at creation time of the db) was introduced with Postgres 9.1.
More about collation in the manual here.
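If the byte-value ordering is wanted permanently for a column, one option is to declare the collation on the column itself, so that a plain ORDER BY already uses it. A minimal sketch with a hypothetical table and made-up values:

```sql
-- Hypothetical table: the column is declared with the "C" collation,
-- so ORDER BY col sorts by byte value without a COLLATE clause.
CREATE TABLE items (col text COLLATE "C");

INSERT INTO items VALUES ('ABC_AAAAA'), ('ABCAAAAA'), ('abc');

-- In ASCII, uppercase letters (0x41-0x5A) come before '_' (0x5F),
-- which comes before lowercase letters (0x61-0x7A).
SELECT col FROM items ORDER BY col;
```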