Alphanumeric case-insensitive sorting in PostgreSQL

I am new to PostgreSQL and want to sort varchar columns. Let me explain the problem with the example below:
table name: testsorting

 order | name
-------+------
     1 | b
     2 | B
     3 | a
     4 | a1
     5 | a11
     6 | a2
     7 | a20
     8 | A
     9 | a19
Case-sensitive sorting (the default in PostgreSQL) gives:
select name from testsorting order by name;
A
B
a
a1
a11
a19
a2
a20
b
Case-insensitive sorting gives:
select name from testsorting order by UPPER(name);
A
a
a1
a11
a19
a2
a20
B
b
How can I do alphanumeric, case-insensitive sorting in PostgreSQL to get the order below?
a
A
a1
a2
a11
a19
a20
b
B
I don't mind the relative order of capital and small letters, but the order should be "aAbB" or "AaBb", not "ABab".
Please suggest any solution to this in PostgreSQL.

My PostgreSQL sorts the way you want. The way PostgreSQL compares strings is determined by locale and collation. When you create a database with createdb, the -l option sets the locale. You can also check how it is configured in your environment using psql -l:
[postgres@test]$ psql -l
                            List of databases
  Name   |  Owner   | Encoding |  Collate   |   Ctype    | Access privileges
---------+----------+----------+------------+------------+-------------------
 mn_test | postgres | UTF8     | pl_PL.UTF8 | pl_PL.UTF8 |
As you can see, my database uses a Polish collation.
If you created your database with a different collation, you can use another collation in a query, like:
SELECT * FROM sort_test ORDER BY name COLLATE "C";
SELECT * FROM sort_test ORDER BY name COLLATE "default";
SELECT * FROM sort_test ORDER BY name COLLATE "pl_PL";
You can list the available collations with:
SELECT * FROM pg_collation;
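If you prefer to stay inside SQL rather than use psql -l, the per-database collation settings can also be read from the pg_database system catalog:

```sql
-- Collate/ctype setting for each database in the cluster
SELECT datname, datcollate, datctype
FROM pg_database;
```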
EDITED:
Oh, I missed that 'a11' must be before 'a2'.
I don't think a standard collation can solve alphanumeric sorting. For that you will have to split the string into parts, as in Clodoaldo Neto's response. Another option, useful if you frequently have to order this way, is to split the name field into two columns. You can create a trigger on INSERT and UPDATE that splits name into name_1 and name_2, and then:
SELECT name FROM sort_test ORDER BY name_1 COLLATE "en_EN", name_2;
(I changed the collation from Polish to English; you should use your native collation to sort letters like a, ą, c, ć correctly.)
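A minimal sketch of such a trigger, assuming a text column name and new columns name_1 (text prefix) and name_2 (numeric tail); the column, function, and trigger names are illustrative:

```sql
ALTER TABLE sort_test
    ADD COLUMN name_1 text,
    ADD COLUMN name_2 integer;

CREATE OR REPLACE FUNCTION sort_test_split() RETURNS trigger AS $$
BEGIN
    -- Leading non-digit part, e.g. 'a' from 'a11'
    NEW.name_1 := substring(NEW.name from '^[^0-9]*');
    -- Numeric tail as an integer; -1 sorts bare letters like 'a' first
    NEW.name_2 := coalesce(substring(NEW.name from '[0-9]+')::integer, -1);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sort_test_split_trg
    BEFORE INSERT OR UPDATE ON sort_test
    FOR EACH ROW EXECUTE PROCEDURE sort_test_split();
```

(PostgreSQL 11 and later also accept the EXECUTE FUNCTION spelling.)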

If the name is always in the format of one letter followed by n digits, then:
select name
from testsorting
order by
    upper(left(name, 1)),
    (substring(name from 2) || '0')::integer
Appending '0' is what keeps a bare letter like 'a' from failing the cast: substring('a' from 2) is an empty string, and '' || '0' casts to 0, which sorts before 'a1'.

PostgreSQL uses the C library's locale facilities for sorting strings, and the C library is provided by the host operating system. On Mac OS X and BSD-family operating systems, the UTF-8 locale definitions are broken, so the results are as if the collation were "C".
An image of the collation results with Ubuntu 15.04 as the host OS was attached.
Check the FAQ on the PostgreSQL wiki for more details: https://wiki.postgresql.org/wiki/FAQ

For my part, I have used the PostgreSQL module citext with the data type CITEXT instead of TEXT. It makes both sorting and searching on these columns case-insensitive.
The module can be installed with the SQL command CREATE EXTENSION IF NOT EXISTS citext;
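A minimal sketch (table and column names are illustrative). Note that citext only makes comparisons case-insensitive; it does not change digit handling, so 'a11' still sorts before 'a2':

```sql
CREATE EXTENSION IF NOT EXISTS citext;

CREATE TABLE testsorting_ci (name citext);
INSERT INTO testsorting_ci VALUES ('b'), ('B'), ('a'), ('A'), ('a1');

-- 'a' and 'A' compare equal, so they group together in the sort
SELECT name FROM testsorting_ci ORDER BY name;
```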

I agree with Clodoaldo Neto's answer, but don't forget to add an index. For the planner to use it, the index expressions must match the ORDER BY expressions exactly (and each expression needs its own set of parentheses):
CREATE INDEX testsorting_name ON testsorting ((upper(left(name, 1))), (((substring(name from 2) || '0')::integer)));

Answer strongly inspired by this one.
Using a function makes it easier to keep things clean if you need it across different queries.
CREATE OR REPLACE FUNCTION alphanum(str text, OUT prefix text, OUT num int)
AS $$
BEGIN
    prefix := substring(str from '^[^0-9]*');
    num    := coalesce(substring(str from '[0-9]+')::int, -1) + 2000000;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
Then you could use it this way:
SELECT name FROM testsorting ORDER BY alphanum(name);
Test:
WITH x(name) AS (VALUES ('b'), ('B'), ('a'), ('a1'),
('a11'), ('a2'), ('a20'), ('A'), ('a19'))
SELECT name, alphanum(name) FROM x ORDER BY alphanum(name);
 name |  alphanum
------+-------------
 a    | (a,1999999)
 A    | (A,1999999)
 a1   | (a,2000001)
 a2   | (a,2000002)
 a11  | (a,2000011)
 a19  | (a,2000019)
 a20  | (a,2000020)
 b    | (b,1999999)
 B    | (B,1999999)

Related

How to optimize inverse pattern matching in Postgresql?

I have Pg version 13.
CREATE TABLE test_schemes (
pattern TEXT NOT NULL,
some_code TEXT NOT NULL
);
Example data:

  pattern  | some_code
-----------+-----------
 __3_      | c1
 __34      | c2
 1_3_      | a12
 _7__      | a10
 7138      | a19
 _123|123_ | a20
 ___253    | a28
 253       | a29
 2_1       | a30
This table has about 300k rows. I want to optimize a simple query like:
SELECT * FROM test_schemes where '1234' SIMILAR TO pattern
  pattern  | some_code
-----------+-----------
 __3_      | c1
 __34      | c2
 1_3_      | a12
 _123|123_ | a20
The problem is that this simple query will do a full scan of 300k rows to find all the matches. Given this design, how can I make the query faster (any use of special index)?
Internally, SIMILAR TO works much like regular expressions, as would be evident from running EXPLAIN on the query. You may want to just switch to regexes outright, but it is also worth looking at text_pattern_ops indexes to see if you can improve the performance.
If the pipe is the only feature of SIMILAR TO (other than those present in LIKE) which you use, then you could process it into a form you can use with the much faster LIKE.
SELECT * FROM test_schemes where '1234' LIKE any(string_to_array(pattern,'|'))
In my hands this is about 25 times faster, and gives the same answer as your example on your example data (augmented with a few hundred thousand rows of garbage to get the table row count up to about where you indicated). It does assume there is no escaping of any pipes.
If you store the data already broken apart, it is about 3 times faster still, though of course it gives cosmetically different answers.
create table test_schemes2 as select unnest as pattern, some_code from test_schemes, unnest(string_to_array(pattern,'|'));
SELECT * FROM test_schemes2 where '1234' LIKE pattern;

PostgreSQL table name (“relation does not exist”), option to ignore case-sensitivity?

I have been trying to migrate an application EAR in WebSphere Application Server from Oracle to PostgreSQL.
After the migration I see a lot of PostgreSQL "relation does not exist" errors. Based on what I've read here, it's because of the way the developers wrote the SELECT scripts. Most solutions say to quote the table names, since PostgreSQL automatically lowercases unquoted table names in queries. However, since I am not the person who created the EAR file, is there any alternative solution I can apply on the PostgreSQL side to ignore case sensitivity in queries?
PostgreSQL is case-insensitive by default if object names are not quoted (unquoted identifiers are folded to lowercase):
postgres=# \d t
                Table "public.t"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 x      | integer |           |          |
postgres=# select count(*) from t;
count
-------
1
(1 row)
postgres=# select count(*) from T;
count
-------
1
(1 row)
It is only case sensitive if object names are quoted:
postgres=# select count(*) from "t";
count
-------
1
(1 row)
postgres=# select count(*) from "T";
ERROR: relation "T" does not exist
LINE 1: select count(*) from "T";
^
postgres=#
AFAIK there is no parameter to change this behaviour.
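One workaround on the PostgreSQL side, if the migrated schema ended up with quoted mixed-case names, is to expose lowercase views as aliases (the names here are illustrative):

```sql
-- A table created with a quoted, mixed-case name can only be
-- reached by quoting it exactly:
CREATE TABLE "Employees" (id integer);

-- Unquoted references fold to lowercase and would fail,
-- so provide a lowercase view as an alias:
CREATE VIEW employees AS SELECT * FROM "Employees";

SELECT count(*) FROM employees;  -- works without quoting
```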

Postgres ordering of UTF-8 characters

I'm building a small app that includes Esperanto words in my database, so I have words like ĉapelojn and brakhorloĝo, with "special" characters.
Using PostgreSQL 9.4.4 I have a words table with the following schema:
lingvoj_dev=# \d words
Table "public.words"
Column | Type | Modifiers
-------------+-----------------------------+----------------------------------------------------
id | integer | not null default nextval('words_id_seq'::regclass)
translated | character varying(255) |
meaning | character varying(255) |
times_seen | integer |
inserted_at | timestamp without time zone | not null
updated_at | timestamp without time zone | not null
Indexes:
"words_pkey" PRIMARY KEY, btree (id)
But the following query gives some strange output:
lingvoj_dev=# SELECT w."translated" FROM "words" AS w ORDER BY w."translated" desc limit 10;
translated
------------
ĉu
ŝi
ĝi
ĉevaloj
ĉapelojn
ĉapeloj
ĉambro
vostojn
volas
viro
(10 rows)
The ordering is inconsistent - I'd be okay with all of the words starting with special characters being at the end, but all of the words starting with ĉ should be grouped together and they're not! Why do ŝi and ĝi come in between ĉu and ĉevaloj?
The server encoding is UTF8, and the collation is en_AU.UTF-8.
edit: It looks like it's sorting all of the special characters as equivalent - it's ordering correctly based on the second character in each word. How do I make PostgreSQL see that ĉ, ŝ and ĝ are not equivalent?
I'd be okay with all of the words starting with special characters
being at the end...
Use collate "C":
SELECT w."translated"
FROM "words" AS w
ORDER BY w."translated" collate "C" desc limit 10;
See also Different behaviour in “order by” clause: Oracle vs. PostgreSQL
The query can be problematic when using an ORM. The solution may be to recreate the database with the LC_COLLATE = C option, as the OP suggested in a comment. There is one more option: change the collation for a single column:
ALTER TABLE "words" ALTER COLUMN "translated" TYPE text COLLATE "C";

How can I sort the postgres column with certain special characters?

I am trying to sort on a character column in a Postgres database:
Select column1 from table order by column1
Output
dir1
dir2
dir3
#num1
t1
I want the sort to print #num1 first the way sqlite does. Any ideas what I need to change in my query?
A possible solution would be to "disable" your collation setting for this sort:
WITH x(a) AS (VALUES
('dir1')
,('dir2')
,('dir3')
,('#num1')
,('t1')
)
SELECT *
FROM x
ORDER BY a COLLATE "C";
Ad-hoc Collation for individual expressions requires PostgreSQL 9.1 or later.
Most locales would ignore the leading # for sorting. If you switch to "C", characters are effectively sorted by their byte values. This may or may not be what you want, though.
Many related questions, like here:
PostgreSQL UTF-8 binary collation
You can use the ASCII value of the ordered field; note that ascii() looks only at the first character, so add the column itself as a tie-breaker:
SELECT column1 FROM table1 ORDER BY ascii(column1), column1
Special characters' ASCII values are lower than those of letters.
Output
#num1
dir1
dir2
dir3
t1
A brute force version to put # on top in the sort order
SELECT column1
FROM table1
ORDER BY CASE WHEN LEFT(column1, 1) = '#'
THEN 0 ELSE 1 END, column1
Here is SQLFiddle demo.
This may not be exactly what you want

MySQL Select if field is unique or null

Sorry, I can't find an example anywhere, mainly because I can't think of any other way to explain it that doesn't include DISTINCT or UNIQUE (which I've found to be misleading terms in SQL).
I need to select unique values AND null values from one table.
FLAVOURS:

 id | name  | flavour
----+-------+------------
  1 | mark  | chocolate
  2 | cindy | chocolate
  3 | rick  |
  4 | dave  |
  5 | jenn  | vanilla
  6 | sammy | strawberry
  7 | cindy | chocolate
  8 | rick  |
  9 | dave  |
 10 | jenn  | caramel
 11 | sammy | strawberry
I want the kids who have a unique flavour (vanilla, caramel) and the kids who don't have any flavour.
I don't want the kids with duplicate flavours (chocolate, strawberry).
My searches for help always return an answer for how to GROUP BY, UNIQUE and DISTINCT for chocolate and strawberry. That's not what I want. I don't want any repeated terms in a field - I want everything else.
What is the proper MySQL select statement for this?
Thanks!
You can use HAVING to select just some of the groups, so to select the groups where there is only one flavor, you use:
SELECT * from my_table GROUP BY flavour HAVING COUNT(*) = 1
If you then want to select those users that have NULL entries, you use
SELECT * FROM my_table WHERE flavour IS NULL
and if you combine them, you get all entries that either have a unique flavor, or NULL.
SELECT * from my_table GROUP BY flavour HAVING COUNT(*) = 1 AND flavour IS NOT NULL
UNION
SELECT * FROM my_table WHERE flavour IS NULL
I added the "flavour IS NOT NULL" just to ensure that a flavour that is NULL is not picked if it's the single one, which would generate a duplicate.
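For reference, the two branches can also be combined into a single query with a subquery instead of a UNION (same my_table name as above):

```sql
-- Kids with a flavour nobody else has, plus kids with no flavour at all
SELECT name, flavour
FROM my_table
WHERE flavour IS NULL
   OR flavour IN (SELECT flavour
                  FROM my_table
                  GROUP BY flavour
                  HAVING COUNT(*) = 1);
```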
I don't have a database to hand, but you should be able to use a query along the lines of:
SELECT name FROM FLAVOURS WHERE flavour IN (SELECT flavour FROM FLAVOURS GROUP BY flavour HAVING count(flavour) = 1) OR flavour IS NULL;
I apologise if this isn't quite right, but hopefully is a good start.
You need a self-join that looks for duplicates, and then you need to veto those duplicates by looking for cases where there was no match (that's the WHERE t2.flavor IS NULL). Then you're doing something completely different: looking for nulls in the original table with the second line of the WHERE clause (OR t1.flavor IS NULL).
SELECT DISTINCT t1.name, t1.flavor
FROM tablename t1
LEFT JOIN tablename t2
ON t2.flavor = t1.flavor AND t2.ID <> t1.ID
WHERE t2.flavor IS NULL
OR t1.flavor IS NULL
I hope this helps.