Force Postgres shift to uppercase rather than lowercase before sorting case insensitive? - postgresql

I am trying to migrate to Postgres from Pervasive. In Pervasive there was something like 'upper.alt', an alternative collation. I don't really know how it works, but I have to make my new Postgres database behave like Pervasive with this collation.
I use Postgres 9.2.4 and utf-8 encoding and LC_COLLATE='Polish_Poland.1250' .

You can try and order with COLLATE "C". That would get what you want in your example. It has side effects though! Effectively everything is ordered according to the byte values of the encoded character.
WITH x(col) AS (
   VALUES
     ('ABC_AAAAA')
    ,('ABC_BBBBB')
    ,('ABC_ZZZZZ')
    ,('ABCAAAAA')
    ,('ABCBBBBB')
    ,('ABCZZZZZ')
   )
SELECT *
FROM   x
ORDER  BY col COLLATE "C";
This option to change the collation for individual expressions (as opposed to using a collation defined at creation time of the db) was introduced with Postgres 9.1.
More about collation in the manual here.
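To illustrate what "ordered according to the byte values" means for the rows above, here is a minimal Python sketch. It is not Postgres itself: it assumes UTF-8 encoding and simply sorts by the encoded bytes, which approximates what COLLATE "C" does. The underscore (0x5F) lies between the uppercase ASCII letters (0x41 to 0x5A) and the lowercase ones (0x61 to 0x7A), so it sorts after 'A' but before 'a'. The helper name c_collation_key is made up for this example.

```python
rows = ["ABC_AAAAA", "ABC_BBBBB", "ABC_ZZZZZ", "ABCAAAAA", "ABCBBBBB", "ABCZZZZZ"]


def c_collation_key(s: str) -> bytes:
    """Sort key that compares strings by their UTF-8 byte values."""
    return s.encode("utf-8")


# Uppercase data: '_' (0x5F) is greater than 'A'..'Z', so underscore rows come last.
upper_sorted = sorted(rows, key=c_collation_key)

# Same data lowercased first: '_' is smaller than 'a'..'z', so underscore rows come first.
lower_sorted = sorted((r.lower() for r in rows), key=c_collation_key)

print(upper_sorted)
print(lower_sorted)
```

The difference between the two results is why it matters whether values are shifted to uppercase or to lowercase before a byte-wise comparison.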

Related

Create database definition equivalent from mysql to postgresql

I need to migrate a mysql table to postgresql.
I need an accent and case insensitive database.
In MySQL, my database has the following definition:
CREATE DATABASE gestan
DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
How do I create an equivalent definition to postgresql?
I have read some posts, but they seem outdated.
create database getan
encoding = 'UTF-8';
There is no direct equivalent for the case insensitive collation _ci in Postgres.
You can define a new ICU collation that uses case insensitive comparison, but it can't be used as the default collation for a database, only as a collation at the column level.
Related questions:
Install utf8 collation in PostgreSQL
https://dba.stackexchange.com/questions/183355
https://dba.stackexchange.com/a/256979

Install utf8 collation in PostgreSQL

Right now I can choose Encoding : UTF8 when creating a new DB in pgAdmin4 GUI.
But there is no option to choose utf8_general_ci as collation or character type. When I run select * from pg_collation; I don't see any collation relevant to utf8_general_ci.
Coming from a MySQL background, I am confused. Do I have to install a utf8-like collation (e.g. utf8_general_ci, utf8_unicode_ci) in my PostgreSQL 10 or in Windows 10?
I just want to have the equivalent of mySQL collation utf8_general_ci to PostgreSQL.
Thank you
utf8 is an encoding (how to represent unicode characters as a series of bytes), not a collation (which character goes before which).
I think the Postgres 10 collation equivalent for utf8_general_ci (or more modern utf8_unicode_ci) is called und-x-icu - this is an undefined collation (not defined for any real world language) provided by an ICU library. This collation would sort quite reasonably characters from most languages.
ICU support is a new feature added in PostgreSQL 10, so this collation isn't available for older PostgreSQL versions or when it's disabled during compilation. Before that Postgres was using operating system provided collation support, which differs between operating systems.
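The distinction between encoding and collation can be sketched in a few lines of Python. This is only an illustration under my own assumptions (the sample word and the use of str.casefold as a stand-in for a case-insensitive collation are not from the original answer): the encoding decides which bytes represent the text, while the collation decides how strings are ordered.

```python
word = "żółw"  # sample Polish word, chosen only for illustration

# Encoding: the same characters become different byte sequences.
utf8_bytes = word.encode("utf-8")     # multi-byte for the non-ASCII letters
cp1250_bytes = word.encode("cp1250")  # one byte per character

# Both byte sequences decode back to the same text.
same_text = utf8_bytes.decode("utf-8") == cp1250_bytes.decode("cp1250")

# Collation: the same strings can be ordered by different rules.
letters = ["b", "A", "a", "B"]
code_point_order = sorted(letters)                      # uppercase before lowercase
case_insensitive_order = sorted(letters, key=str.casefold)  # rough stand-in for a _ci collation

print(len(utf8_bytes), len(cp1250_bytes), same_text)
print(code_point_order, case_insensitive_order)
```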

Is it safe to change Collation on Postgres (keeping encoding)?

I have a Postgres 9.3 database which, by mistake, has been set to:
but I need it to be:
Since the Encoding doesn't change, is it safe to dump the DB and restore it later (see here) to a database with the new Collation / Character type?
Perfectly safe -- the collation is just telling Postgres which set of rules to apply when sorting text.
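You can even set it per query in the ORDER BY clause (with COLLATE, available since Postgres 9.1). Note that the LC_COLLATE and LC_CTYPE settings of an existing database cannot be changed with ALTER DATABASE, so dumping and restoring into a new database created with the desired settings is the way to change the default.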

How to make my postgresql database use a case insensitive collation?

In several SO posts OP asked for an efficient way to search text columns in a case insensitive manner.
As far as I understand, the most efficient way is to have a database with a case insensitive collation. In my case I am creating the database from scratch, so I have full control over the DB collation. The only problem is that I have no idea how to define it and could not find any example of it.
Please, show me how to create a database with case insensitive collation.
I am using postgresql 9.2.4.
EDIT 1
The CITEXT extension is a good solution. However, it has some limitations, as explained in the documentation. I will certainly use it, if no better way exists.
I would like to emphasize, that I wish ALL the string operations to be case insensitive. Using CITEXT for every TEXT field is one way. However, using a case insensitive collation would be the best, if at all possible.
Now https://stackoverflow.com/users/562459/mike-sherrill-catcall says that PostgreSQL uses whatever collations the underlying system exposes. I do not mind making the OS expose a case insensitive collation. The only problem is that I have no idea how to do it.
A lot has changed since this question was asked. Native support for case-insensitive collations (nondeterministic ICU collations) was added in PostgreSQL v12. For many use cases this supersedes the citext extension mentioned in the other answers.
In PostgreSQL v12, one can do:
CREATE COLLATION case_insensitive (
   provider = icu,
   locale = 'und-u-ks-level2',
   deterministic = false
);
CREATE TABLE names (
   first_name text,
   last_name text
);
INSERT INTO names VALUES
   ('Anton', 'Egger'),
   ('Berta', 'egger'),
   ('Conrad', 'Egger');
SELECT * FROM names
ORDER BY
   last_name COLLATE case_insensitive,
   first_name COLLATE case_insensitive;
See https://www.postgresql.org/docs/current/collation.html for more information.
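The effect of the case-insensitive ordering on the three rows above can be approximated in Python. This is only a sketch: str.casefold is a rough stand-in for the ICU 'und-u-ks-level2' collation, not the same algorithm, and the case-sensitive variant here compares by code point rather than by any real locale.

```python
names = [("Anton", "Egger"), ("Berta", "egger"), ("Conrad", "Egger")]

# Case-sensitive, code-point comparison: 'Egger' sorts before 'egger'.
case_sensitive = sorted(names, key=lambda n: (n[1], n[0]))

# Case-insensitive: 'Egger' and 'egger' compare equal, so first_name decides.
case_insensitive = sorted(names, key=lambda n: (n[1].casefold(), n[0].casefold()))

print(case_sensitive)
print(case_insensitive)
```

With the case-insensitive key all three last names are treated as equal, so the rows come back in first-name order.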
Before PostgreSQL 12 there were no case insensitive collations, but there is the citext extension:
http://www.postgresql.org/docs/current/static/citext.html
For my purpose the ILIKE keyword did the job.
From the postgres docs:
The key word ILIKE can be used instead of LIKE to make the match
case-insensitive according to the active locale. This is not in the
SQL standard but is a PostgreSQL extension.
This does not change the collation, but this type of query, where I used the lower function, may help someone:
SELECT id, full_name, email FROM nurses WHERE (lower(full_name) LIKE '%bar%' OR lower(email) LIKE '%bar%');
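The logic of that lower(...) LIKE '%bar%' filter can be sketched outside the database as follows. The sample rows and the helper name contains_ci are hypothetical and exist only for this illustration.

```python
def contains_ci(value: str, needle: str) -> bool:
    """Case-insensitive substring test, like lower(value) LIKE '%needle%'."""
    return needle.lower() in value.lower()


# Hypothetical rows: (id, full_name, email)
nurses = [
    (1, "Barbara Smith", "bsmith@example.com"),
    (2, "Ann Lee", "ann@BARcode.example"),
    (3, "Tom Jones", "tom@example.com"),
]

# Keep a row if either column contains "bar" in any letter case.
hits = [
    row[0]
    for row in nurses
    if contains_ci(row[1], "bar") or contains_ci(row[2], "bar")
]

print(hits)
```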
I believe you need to specify your collation as a command line option to initdb when you create the database cluster. Something like
initdb --lc-collate=en_US.UTF-8
It also seems that using PostgreSQL 9.3 on Ubuntu and Mac OS X, initdb automatically creates the database cluster using a case-insensitive collation that is default in the current OS locale, in my case, en_US.UTF-8.
Could you be using an older version of PostgreSQL that does not default to the host locale? Or could it be that you are on an operating system that does not provide any case-insensitive collations for PostgreSQL to choose from?

Records not matching without LTRIM, RTRIM and Upper/Lower function

I have been facing this issue for a long time. I have two tables in different databases with the same columns and exactly the same data types. But when doing a join or any other matching query I get only a few results. I noticed that when I use
LTRIM(RTRIM(UPPER(SourceTable.Column))) =
LTRIM(RTRIM(UPPER(DestinationTable.Column)))
it works fine. Surprisingly, I have seen the same issue on bit and integer columns, and they also work fine when I apply LTRIM, RTRIM and UPPER/LOWER.
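As a rough illustration of why wrapping both sides in LTRIM(RTRIM(UPPER(...))) changes the result, here is a small Python sketch. The sample values and the helper name normalize are hypothetical; the point is only that stray spaces or differing case make a plain equality comparison fail while the normalized comparison succeeds.

```python
def normalize(value: str) -> str:
    """Mimic LTRIM(RTRIM(UPPER(value))): strip spaces on both ends, then uppercase."""
    return value.strip(" ").upper()


# Hypothetical column values from the two tables.
source = ["abc ", " Def", "ghi"]
destination = ["ABC", "DEF", "GHI "]

# Plain equality: nothing matches because of case and stray spaces.
raw_matches = [s for s in source if s in destination]

# Normalized equality: every row finds its partner.
destination_keys = {normalize(d) for d in destination}
normalized_matches = [s for s in source if normalize(s) in destination_keys]

print(raw_matches)
print(normalized_matches)
```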
Below are the collation of the two databases:
Source: SQL_Latin1_General_CP1_CI_AS
Destination: SQL_Latin1_General_CP1_CI_AS
As you can see, they have the same collation, yet I still get this issue. Is there a permanent solution to this?
If the datatypes are exactly the same, it could be that you actually have a different collation on the columns - you can actually have a different collation to the database one, specified at the column level. First port of call for me would be to check that.
MSDN resource, quote:
Column-level collations
When you create or alter a table, you can specify collations for each
character-string column by using the COLLATE clause. If no collation is
specified, the column is assigned the default collation of the database.