PostgreSQL sort order affected by succeeding characters

I want to sort 'test last name' and 'test2 last name' so that the former comes before the latter. My understanding is that characters are compared from left to right until they differ, so the characters after the first difference should no longer matter. However, as shown below, 'test' comes before 'test2', but as soon as I append another character the order changes. Why does this happen? What collation should I use to get the desired order? Note that converting the strings to bytea yields the desired order.
test=# SELECT 'test last name' < 'test2 last name' COLLATE "en_US";
?column?
----------
f
(1 row)
test=# SELECT 'test last' < 'test2 last' COLLATE "en_US";
?column?
----------
f
(1 row)
test=# SELECT 'test ' < 'test2 ' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test' < 'test2' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2 ' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2 ' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2 ' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2 l' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2 l' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test l' < 'test2 l' COLLATE "en_US";
?column?
----------
f
(1 row)
test=# SELECT 'test l' < 'test2l' COLLATE "en_US";
?column?
----------
f
(1 row)
test=# SELECT 'test ' < 'test2l' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test last name'::bytea < 'test2 last name'::bytea;
?column?
----------
t
(1 row)

Whitespace is treated as a special (variable) character in ICU collations.
See the demo: https://www.unicode.org/reports/tr10/#Variable_Weighting_Examples
See also: http://www.unicode.org/reports/tr35/tr35-collation.html#table-collation-settings
A simple explanation: https://unicode-org.github.io/icu/userguide/collation/customization/ignorepunct.html#shift-trimmed
You can run the following test:
CREATE COLLATION coll_shifted(provider = icu, locale = 'en-u-ka-shifted');
CREATE COLLATION coll_noignore(provider = icu, locale = 'en-u-ka-noignore');
SELECT 'test last name' < 'test2 last name' COLLATE coll_shifted
union all
SELECT 'test last name' < 'test2 last name' COLLATE coll_noignore
union all
SELECT 'test last name' < 'test2 last name' COLLATE "en_US";
If you just want to compare by code point, you can use COLLATE "C" or COLLATE "POSIX".

That's how natural language collations work. If you want to compare character by character and have the space character be like other characters, use the C collation:
SELECT 'test last name' < 'test2 last name' COLLATE "C";
?column?
══════════
t
(1 row)
But don't complain if 'Z' < 'a' …
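As a rough illustration of both points, the query below sorts a few literal values under the C collation. The values 'Zebra' and 'apple' are made up for the example; only the two 'test' strings come from the question. Under C, strings are compared byte by byte, so the space (0x20) sorts before '2' (0x32), and every uppercase ASCII letter sorts before every lowercase one.

```sql
-- Sketch only: the VALUES list is illustrative, not data from the question's table.
SELECT name
FROM (VALUES ('test2 last name'),
             ('test last name'),
             ('Zebra'),
             ('apple')) AS t(name)
ORDER BY name COLLATE "C";
-- Byte-wise order: Zebra, apple, test last name, test2 last name
```

If the uppercase-before-lowercase ordering is not acceptable, a custom ICU collation such as the coll_noignore one shown earlier may be a better fit than C.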

Related

in postgresql, how to preserve spaces in char

How can I preserve spaces in char?
I created two tables:
create table test_1 (a int, b char(10)) ;
create table test_2 (a int, b varchar(255));
Then I insert one row into test_1, copy it into test_2, and check the length:
insert into test_1 values (1 ,' ');
insert into test_2 select * from test_1;
select a, length(b) from test_2;
It returns:
| a | length(b) |
| -------- | -------------- |
| 1 | 0 |
I expect the result below, as Oracle returns:
| a | length(b) |
| -------- | -------------- |
| 1 | 10 |
Is there any option I can try?
There is no way to change this behavior.
Observe also the following:
CREATE TABLE test_1 (a int, b char(10));
INSERT INTO test_1 VALUES (1, 'x');
SELECT length(b) FROM test_1;
length
--------
1
(1 row)
SELECT 'y' || b FROM test_1;
?column?
----------
yx
(1 row)
All this is working as required by the SQL standard.
Do yourself a favor and never use char. If you need the value to always have 10 characters, use a check constraint or a trigger that pads values with blanks.
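A minimal sketch of the check-constraint variant, assuming a hypothetical table test_3 that is not part of the question. It relies on varchar keeping trailing blanks (unlike char) and on rpad() to pad the value before it is stored:

```sql
-- Hypothetical table: varchar preserves trailing blanks, and the CHECK
-- constraint rejects any non-NULL value that is not exactly 10 characters long.
CREATE TABLE test_3 (
    a int,
    b varchar(10) CHECK (length(b) = 10)
);

-- rpad() pads with blanks up to the requested length
-- (and truncates values that are longer).
INSERT INTO test_3 VALUES (1, rpad(' ', 10));

SELECT a, length(b) FROM test_3;  -- length(b) is 10
```

With this layout the padding has to happen in the INSERT (or in a trigger); the constraint only guarantees that unpadded values are rejected.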

Flattening Postgres nested JSONB column

I'm looking to see how to flatten data nested in a JSONB column.
As an example, say we have the table users with user_id(int) and siblings(JSONB)
With rows like:
id | JSONB
---------------------
1 | {"brother": {"first_name":"Sam", "last_name":"Smith"}, "sister": {"first_name":"Sally", "last_name":"Smith"}}
2 | {"sister": {"first_name":"Jill"}}
I'm looking for a query that will return a response like:
id | sibling | first_name | last_name
-------------------------------------
1 | "brother" | "Sam" | "Smith"
1 | "sister" | "Sally" | "Smith"
2 | "sister" | "Jill" | null
I developed the following for use in psql.
To test the code, I created a small view t1:
CREATE VIEW t1 AS (
SELECT 1 AS id, '{"brother": {"first_name":"Sam", "last_name":"Smith"}, "sister": {"first_name":"Sally", "last_name":"Smith"}}'::jsonb AS jsonb
UNION SELECT 2, '{"sister": {"first_name":"Jill", "last_name":"Johnson"}}'
UNION SELECT 3, '{"sister": {"first_name":"Jill", "x_name":"Johnson"}}'
);
The first task is to find the list of possible keys:
WITH fields AS (
SELECT DISTINCT jff.key
FROM t1,
jsonb_each(jsonb) AS jf,
jsonb_each(jf.value) AS jff
)
SELECT * FROM fields;
The result is:
key
------------
first_name
last_name
x_name
The next step is to generate the query:
SELECT 'SELECT id, jf.key as sibling, ' || (
WITH fields AS (
SELECT DISTINCT jff.key
FROM t1,
jsonb_each(jsonb) AS jf,
jsonb_each(jf.value) AS jff
)
SELECT string_agg('jf.value->>''' || key || ''' as "' || key || '"', ',' ORDER BY key)
FROM fields
)
|| ' FROM t1, jsonb_each(jsonb) AS jf ORDER BY 1, 2, 3;' AS cmd;
It returns:
cmd
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SELECT id, jf.key as sibling,jf.value->>'first_name' as "first_name",jf.value->>'last_name' as "last_name",jf.value->>'x_name' as "x_name" FROM t1, jsonb_each(jsonb) AS jf ORDER BY 1, 2, 3;
(1 row)
To store the result in a psql variable, I use \gset:
\gset
After that you can run the query:
:cmd
id | sibling | first_name | last_name | x_name
----+---------+------------+-----------+---------
1 | brother | Sam | Smith |
1 | sister | Sally | Smith |
2 | sister | Jill | Johnson |
3 | sister | Jill | | Johnson
(4 rows)
To run it from external languages, you can create a PostgreSQL function that returns the SQL command:
CREATE OR REPLACE FUNCTION build_query(IN tname text, OUT cmd text) AS $sql$
BEGIN
EXECUTE $cmd$
SELECT 'SELECT id, jf.key as sibling, ' || (
WITH fields AS (
SELECT DISTINCT jff.key
FROM $cmd$ || quote_ident(tname) || $cmd$,
jsonb_each(jsonb) AS jf,
jsonb_each(jf.value) AS jff
)
SELECT string_agg('jf.value->>''' || key || ''' as "' || key || '"', ',' ORDER BY key)
FROM fields
)
|| ' FROM $cmd$ || quote_ident(tname) || $cmd$ , jsonb_each(jsonb) AS jf ORDER BY 1, 2, 3;'$cmd$ INTO cmd;
RETURN;
END;
$sql$ LANGUAGE plpgsql;
SELECT * FROM build_query('t1');
cmd
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SELECT id, jf.key as sibling, jf.value->>'first_name' as "first_name",jf.value->>'last_name' as "last_name",jf.value->>'x_name' as "x_name" FROM t1 , jsonb_each(jsonb) AS jf ORDER BY 1, 2, 3;
(1 row)
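If the set of keys is known in advance, the generation step may not be needed at all. A minimal sketch, assuming the view t1 defined above and only the two keys first_name and last_name from the question:

```sql
-- Static variant: one output row per top-level key (brother, sister, ...).
-- ->> returns NULL when the inner object lacks the requested key.
SELECT t1.id,
       jf.key                    AS sibling,
       jf.value ->> 'first_name' AS first_name,
       jf.value ->> 'last_name'  AS last_name
FROM t1
CROSS JOIN LATERAL jsonb_each(t1.jsonb) AS jf
ORDER BY t1.id, jf.key;
```

The dynamic approach is only worth the extra step when the inner keys vary from row to row and every one of them should become its own column.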

can not get all column values separately but instead get all values in one column postgresql --9.5

CREATE OR REPLACE FUNCTION public.get_locations(
location_word varchar(50)
)
RETURNS TABLE
(
country varchar(50),
city varchar(50)
)
AS $$
DECLARE
location_word_ varchar(50);
BEGIN
location_word_:=concat(location_word, '%');
RETURN QUERY EXECUTE format(' (SELECT c.country, ''''::varchar(50) as city FROM webuser.country c
WHERE lower(c.country) LIKE %L LIMIT 1)
UNION
(SELECT c.country,ci.city FROM webuser.country c
JOIN webuser.city ci ON c.country_id=ci.country_id
WHERE lower(ci.city) LIKE %L LIMIT 4)',
location_word_,
location_word_ ) ;
END
$$ language PLPGSQL STABLE;
SELECT public.get_locations('a'::varchar(50));
I get this:
+get_locations +
+record +
------------------
+(Andorra,"") +
+(Germany,Aach) +
+(Germany,Aalen) +
+(Germany,Achim) +
+(Germany,Adenau)+
How can I get the values column by column, as shown below? Otherwise I cannot match the values correctly. I need the countries and the cities in separate columns.
|country | city |
-------------------------------
| Andorra | "" |
| Germany | Aach |
| Germany | Aalen |
| Germany | Achim |
| Germany | Adenau |
Your function is declared as RETURNS TABLE, so you have to use it like a table:
SELECT *
FROM public.get_locations('a'::varchar(50));
Unrelated, but:
Your function is way too complicated, you don't need dynamic SQL, nor do you need a PL/pgSQL function.
You can simplify that to:
CREATE OR REPLACE FUNCTION public.get_locations(p_location_word varchar(50))
RETURNS TABLE(country varchar(50), city varchar(50))
AS $$
(SELECT c.country, ''::varchar(50) as city
FROM webuser.country c
WHERE lower(c.country) LIKE concat(p_location_word, '%')
LIMIT 1)
UNION ALL
(SELECT c.country ,ci.city
FROM webuser.country c
JOIN webuser.city ci ON c.country_id = ci.country_id
WHERE lower(ci.city) LIKE concat(p_location_word, '%')
LIMIT 4)
$$
language SQL;

Postgres ts_rank_cd different result for same tsvector?

I'm testing out a text search query on two separate database servers, both running Postgres 9.4.4.
The row in question has the same data, and I get the same underlying tsvector on both servers:
SELECT
user_id,
TO_TSVECTOR('english', REGEXP_REPLACE(first_name, '[^a-zA-Z0-9]', ' ', 'g')) ||
TO_TSVECTOR('english', REGEXP_REPLACE(last_name, '[^a-zA-Z0-9]', ' ', 'g')) ||
TO_TSVECTOR('english', REGEXP_REPLACE(username, '[^a-zA-Z0-9]', ' ', 'g'))
FROM users_v1 where user_id = 123;
-- On server A:
-- user_id | to_tsvector
-- -----------+----------------------
-- 123 | 'georg':1 'hickman':2
-- (1 row)
-- On server B:
-- user_id | to_tsvector
-- -----------+----------------------
-- 123 | 'georg':1 'hickman':2
-- (1 row)
However I get a different rank when using this vector to run a query:
SELECT username,
TS_RANK_CD(
TO_TSVECTOR('english', REGEXP_REPLACE(first_name, '[^a-zA-Z0-9]', ' ', 'g')) ||
TO_TSVECTOR('english', REGEXP_REPLACE(last_name, '[^a-zA-Z0-9]', ' ', 'g')) ||
TO_TSVECTOR('english', REGEXP_REPLACE(username, '[^a-zA-Z0-9]', ' ', 'g'))
, PLAINTO_TSQUERY('george'))
FROM users WHERE user_id = 123;
-- On server A:
-- user_id | ts_rank_cd
-- -----------+----------------------
-- 123 | 0.2
-- (1 row)
-- On server B:
-- user_id | ts_rank_cd
-- -----------+----------------------
-- 123 | 0.0
-- (1 row)
Is the vector the only input to the rank function, or are there any server settings/anything else that affect the behaviour of ts_rank_cd? Is all the information stored in the vector displayed in the console output, or is there some hidden difference in it which I'm not seeing? If not what could be causing the discrepancy?
Thanks to jjanes's comment, I realised that PLAINTO_TSQUERY also accepts an optional argument to specify the text search configuration. Doing
SELECT PLAINTO_TSQUERY('george');
returned george on one system and georg on the other, but doing
SELECT PLAINTO_TSQUERY('english', 'george');
returned the same on both and caused the expected ranking. Doing
SHOW default_text_search_config;
revealed that pg_catalog.english was set as the default on one system and pg_catalog.simple on the other. The discrepancy can be fixed either by explicitly passing the config when creating the TSQUERY, or by updating the default config so that it is the same on both DBs.
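Both fixes can be sketched as follows. The literal 'George Hickman' stands in for the row's name columns, and mydb is a placeholder database name; neither comes from the original setup.

```sql
-- Option 1: name the configuration explicitly on both sides of the match,
-- so the result no longer depends on default_text_search_config.
SELECT ts_rank_cd(
           to_tsvector('english', 'George Hickman'),
           plainto_tsquery('english', 'george')
       );

-- Option 2: align the default, either for the current session ...
SET default_text_search_config = 'pg_catalog.english';

-- ... or persistently for one database (mydb is a placeholder).
ALTER DATABASE mydb SET default_text_search_config = 'pg_catalog.english';
```

Passing the configuration explicitly is probably the more robust choice, since it keeps the query's behaviour independent of server settings.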

Postgres dump: pg_catalog.setval

Does anyone know what pg_catalog.setval does?
I just did a dump off a PostgreSQL database and got lots of lines with that in it. Not sure what it's for.
You might want to check the fine manual:
setval(regclass, bigint) returns bigint: sets the sequence's current value
Example usage:
# create sequence x;
CREATE SEQUENCE
# select nextval('x');
nextval
---------
1
(1 row)
# select nextval('x');
nextval
---------
2
(1 row)
# select nextval('x');
nextval
---------
3
(1 row)
# select setval('x', 10000);
setval
--------
10000
(1 row)
# select nextval('x');
nextval
---------
10001
(1 row)
# select nextval('x');
nextval
---------
10002
(1 row)
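Dump files typically contain the three-argument form, schema-qualified with pg_catalog (the schema that holds the built-in functions). The third argument, is_called, controls whether the next nextval advances before returning. A small sketch, using a hypothetical sequence x_demo:

```sql
-- x_demo is a made-up sequence name for this example.
CREATE SEQUENCE x_demo;

-- is_called = false: the next nextval() returns exactly the value set here.
SELECT pg_catalog.setval('x_demo', 500, false);
SELECT nextval('x_demo');   -- 500

-- is_called = true (same as the two-argument form): the next nextval()
-- returns the value after the one set here.
SELECT pg_catalog.setval('x_demo', 500, true);
SELECT nextval('x_demo');   -- 501
```

In a dump, these calls restore each sequence to the state it had when the dump was taken, so that newly inserted rows do not reuse existing key values.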