Operator ~<~ in Postgres - postgresql

(Originally part of this question, but it was a bit irrelevant, so I decided to make it a question of its own.)
I cannot find what the operator ~<~ is. The Postgres manual only mentions ~ and similar operators here, but there is no sign of ~<~.
When fiddling in the psql console, I found out that these commands give the same results:
SELECT * FROM test ORDER BY name USING ~<~;
SELECT * FROM test ORDER BY name COLLATE "C";
And these give the reverse ordering:
SELECT * FROM test ORDER BY name USING ~>~;
SELECT * FROM test ORDER BY name COLLATE "C" DESC;
Also some info on the tilde operators:
\do ~*~
List of operators
Schema | Name | Left arg type | Right arg type | Result type | Description
------------+------+---------------+----------------+-------------+-------------------------
pg_catalog | ~<=~ | character | character | boolean | less than or equal
pg_catalog | ~<=~ | text | text | boolean | less than or equal
pg_catalog | ~<~ | character | character | boolean | less than
pg_catalog | ~<~ | text | text | boolean | less than
pg_catalog | ~>=~ | character | character | boolean | greater than or equal
pg_catalog | ~>=~ | text | text | boolean | greater than or equal
pg_catalog | ~>~ | character | character | boolean | greater than
pg_catalog | ~>~ | text | text | boolean | greater than
pg_catalog | ~~ | bytea | bytea | boolean | matches LIKE expression
pg_catalog | ~~ | character | text | boolean | matches LIKE expression
pg_catalog | ~~ | name | text | boolean | matches LIKE expression
pg_catalog | ~~ | text | text | boolean | matches LIKE expression
(12 rows)
Rows 3 and 4 are the operator I'm looking for, but the description is not very informative.

~>=~, ~<=~, ~>~ and ~<~ are text pattern operators (for text, or varchar, which is basically the same), the counterparts of their respective siblings >=, <=, > and <. They sort character data strictly by byte value, ignoring the rules of any collation setting (unlike their siblings). This makes them faster, but also unsuitable for the sort order expected in most languages / countries.
The "C" locale is effectively the same as no locale, meaning no collation rules. That explains why ORDER BY name USING ~<~ and ORDER BY name COLLATE "C" end up doing the same. The latter syntax variant should be preferred: more standard, less error-prone.
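A minimal Python sketch of the difference, assuming the data is stored as UTF-8: sorting by the raw bytes mimics ~<~ and COLLATE "C", while the case-insensitive key is only a hypothetical stand-in for a real linguistic collation (Python's standard library offers no portable way to apply a given PostgreSQL locale).

```python
# Compare byte-wise ordering (what ~<~ / COLLATE "C" does with UTF-8 data)
# with a crude, hypothetical stand-in for a locale-aware collation.

names = ["banana", "Cherry", "apple", "Banana"]


def c_collation_key(s: str) -> bytes:
    # Raw UTF-8 bytes: uppercase ASCII (0x41-0x5A) sorts before lowercase (0x61-0x7A).
    return s.encode("utf-8")


def case_insensitive_key(s: str) -> tuple:
    # Not a real locale: compare case-folded text first, then break ties by code point.
    return (s.casefold(), s)


byte_order = sorted(names, key=c_collation_key)
linguistic_ish = sorted(names, key=case_insensitive_key)

print(byte_order)      # ['Banana', 'Cherry', 'apple', 'banana']
print(linguistic_ish)  # ['apple', 'Banana', 'banana', 'Cherry']
```

Descending order (~>~, or COLLATE "C" DESC) corresponds to the same byte key with reverse=True.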
Detailed explanation in the last chapter of this related answer on dba.SE:
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
Note that ~~ / ~~* are the Postgres operators for LIKE / ILIKE and barely related to the above. Similarly, !~~ / !~~* are the operators for NOT LIKE / NOT ILIKE. (Use the standard LIKE notation instead of these "internal" operators.)
Related:
~~ Operator In Postgres
Symfony2 Doctrine - ILIKE clause for PostgreSQL?
Find rows where text array contains value similar to input

Related

How to convert PSQL's ::json @> ::json to a JPA/JPQL predicate

Say I have a DB table like this:
CREATE TABLE myTable(
id BIGINT,
date TIMESTAMP,
user_ids JSONB
);
user_ids is a JSONB array.
Let a record of this table look like this:
{
"id":13,
"date":"2019-01-25 11:03:57",
"user_ids":[25, 661, 88]
}
I need to query all records where user_ids contains 25. In SQL I can achieve this with the following SELECT statement:
SELECT * FROM myTable WHERE user_ids::jsonb @> '[25]'::jsonb;
Now I need to write a JPA predicate that renders "user_ids::jsonb @> '[25]'::jsonb" as a Hibernate-parseable/executable Criteria, which I then intend to use in a session.createQuery() statement.
In simpler terms, I need to know how to write that PSQL snippet (user_ids::jsonb @> '[25]'::jsonb) as an HQL expression.
Fortunately, every operator in PostgreSQL is backed by a function, and you can find that function in the psql console by typing \doS+ followed by the operator (although some characters in operator names are treated as wildcards in this search, so it can return more results than desired).
Here is the result:
postgres=# \doS+ @>
List of operators
Schema | Name | Left arg type | Right arg type | Result type | Function | Description
------------+------+---------------+----------------+-------------+---------------------+-------------
pg_catalog | @> | aclitem[] | aclitem | boolean | aclcontains | contains
pg_catalog | @> | anyarray | anyarray | boolean | arraycontains | contains
pg_catalog | @> | anyrange | anyelement | boolean | range_contains_elem | contains
pg_catalog | @> | anyrange | anyrange | boolean | range_contains | contains
pg_catalog | @> | box | box | boolean | box_contain | contains
pg_catalog | @> | box | point | boolean | box_contain_pt | contains
pg_catalog | @> | circle | circle | boolean | circle_contain | contains
pg_catalog | @> | circle | point | boolean | circle_contain_pt | contains
pg_catalog | @> | jsonb | jsonb | boolean | jsonb_contains | contains
pg_catalog | @> | path | point | boolean | path_contain_pt | contains
pg_catalog | @> | polygon | point | boolean | poly_contain_pt | contains
pg_catalog | @> | polygon | polygon | boolean | poly_contain | contains
pg_catalog | @> | tsquery | tsquery | boolean | tsq_mcontains | contains
(13 rows)
What you want is jsonb arguments on both sides, and the function for that combination is jsonb_contains. So the equivalent of jsonbcolumn @> jsonbvalue is jsonb_contains(jsonbcolumn, jsonbvalue). However, you can't use the function in either JPQL or CriteriaBuilder unless you register it through a custom Dialect (if you're using Hibernate). If you're using EclipseLink, I don't know the situation there.
From here on, your options are to use native queries, or add your own Hibernate Dialect by extending an existing one.
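The by-eye lookup above (find the row with jsonb on both sides, then read its Function column) can also be scripted. The helper below, operator_functions, is a hypothetical Python sketch that parses pasted \doS+ output; it assumes psql's default aligned format with seven |-separated columns, as in the listing above.

```python
# A few rows of \doS+ @> output, pasted as text (psql's aligned format).
DOS_OUTPUT = """\
   Schema   | Name | Left arg type | Right arg type | Result type |    Function    | Description
------------+------+---------------+----------------+-------------+----------------+-------------
 pg_catalog | @>   | anyarray      | anyarray       | boolean     | arraycontains  | contains
 pg_catalog | @>   | jsonb         | jsonb          | boolean     | jsonb_contains | contains
 pg_catalog | @>   | tsquery       | tsquery        | boolean     | tsq_mcontains  | contains
(3 rows)
"""


def operator_functions(text: str) -> dict:
    """Map (operator, left type, right type) to the implementing function name."""
    mapping = {}
    for line in text.splitlines():
        cols = [c.strip() for c in line.split("|")]
        # Separator and "(n rows)" lines have no "|"; the header starts with "Schema".
        if len(cols) != 7 or cols[0] == "Schema":
            continue
        _schema, name, left, right, _result, function, _description = cols
        mapping[(name, left, right)] = function
    return mapping


funcs = operator_functions(DOS_OUTPUT)
print(funcs[("@>", "jsonb", "jsonb")])  # jsonb_contains
```

Querying the pg_operator system catalog gives the same information without parsing text: its oprname, oprleft, oprright and oprcode columns hold the operator name, the argument types and the implementing function.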
Replacing "@>" with "jsonb_contains()" is not a good idea: an index on the column (for example a GIN index) can support the operator, but not the equivalent function call. Example: https://dbfiddle.uk/-xMuHYAA
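For reference, here is a simplified Python model of the jsonb containment test that @> (jsonb_contains) performs, following the rules in the PostgreSQL JSON documentation: objects match key by key, every element of the right-hand array must be contained in some element of the left-hand array (order and duplicates don't matter), and scalars must be equal. The name contains_model is hypothetical, and the documented special case that a top-level array may contain a bare primitive value is deliberately left out.

```python
import json


def _scalar_equal(a, b) -> bool:
    # JSON true/false must not equal 1/0, although Python's bool is an int subclass.
    if isinstance(a, bool) or isinstance(b, bool):
        return type(a) is type(b) and a == b
    return a == b


def contains_model(left, right) -> bool:
    """Simplified model of left @> right for values decoded with json.loads."""
    if isinstance(left, dict) and isinstance(right, dict):
        # Every key on the right must exist on the left with a contained value.
        return all(k in left and contains_model(left[k], v) for k, v in right.items())
    if isinstance(left, list) and isinstance(right, list):
        # Every right element must be contained in some left element.
        return all(any(contains_model(item, wanted) for item in left) for wanted in right)
    if isinstance(left, (dict, list)) or isinstance(right, (dict, list)):
        # Mismatched structures never contain each other in this model.
        return False
    return _scalar_equal(left, right)


row = json.loads('{"id": 13, "date": "2019-01-25 11:03:57", "user_ids": [25, 661, 88]}')
print(contains_model(row["user_ids"], [25]))  # True
print(contains_model(row["user_ids"], [26]))  # False
```

So user_ids @> '[25]' is true for the sample row regardless of where 25 appears in the array.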

Full text search configuration on postgresql

I'm facing an issue concerning the text search configuration on postgresql.
I have a table users which contains a column name. A user's name may be French, English, Spanish or any other language.
So I need to use PostgreSQL's full text search. The text search configuration I'm using now is simple, but it is not effective at finding suitable results.
I'm trying to combine different text search configuration like this:
(to_tsvector('english', document) || to_tsvector('french', document) || to_tsvector('spanish', document) || to_tsvector('russian', document)) ##
(to_tsquery('english', query) || to_tsquery('french', query) || to_tsquery('spanish', query) || to_tsquery('russian', query))
But this query doesn't give suitable results. For example:
select (to_tsvector('english', 'adam and smith') || to_tsvector('french', 'adam and smith') || to_tsvector('spanish', 'adam and smith') || to_tsvector('russian', 'adam and smith'))
tsvector: 'adam':1,4,7,10 'and':5,8 'smith':3,6,9,12
Using only the original language of the words:
select (to_tsvector('english', 'adam and smith'))
tsvector: 'adam':1 'smith':3
The first thing to notice is that stop words are not taken into account when we combine different configurations with the || operator.
Is there any way to combine different text search configurations and use the suitable language when a user searches for text?
Maybe you think that || is an “or” operator, but it concatenates text search vectors.
Take a look at what happens in your expression.
Running \dF+ french in psql will show you that for asciiword tokens, a French Snowball stemmer is used. It removes stop words and reduces words to their stem. The same goes for English and Russian.
You can use ts_debug to see this in operation:
test=> SELECT * FROM ts_debug('english', 'adam and smith');
alias | description | token | dictionaries | dictionary | lexemes
-----------+-----------------+-------+----------------+--------------+---------
asciiword | Word, all ASCII | adam | {english_stem} | english_stem | {adam}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | and | {english_stem} | english_stem | {}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | smith | {english_stem} | english_stem | {smith}
(5 rows)
test=> SELECT * FROM ts_debug('french', 'adam and smith');
alias | description | token | dictionaries | dictionary | lexemes
-----------+-----------------+-------+---------------+-------------+---------
asciiword | Word, all ASCII | adam | {french_stem} | french_stem | {adam}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | and | {french_stem} | french_stem | {and}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | smith | {french_stem} | french_stem | {smith}
(5 rows)
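Now if you concatenate these four tsvectors, you end up with adam at positions 1, 4, 7 and 10. And because the French and Spanish stemmers do not treat "and" as a stop word, it survives at positions 5 and 8.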
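The position arithmetic can be reproduced with a small Python model in which a tsvector is a dict from lexeme to positions, and || shifts every position of the right-hand vector by the largest position in the left-hand one (weights and PostgreSQL's position limits are ignored). The English and French inputs come from the ts_debug output above; the Spanish and Russian ones are inferred from the combined vector in the question.

```python
# Each tsvector is modelled as {lexeme: [positions]} for 'adam and smith'.
english = {"adam": [1], "smith": [3]}               # "and" dropped as a stop word
french = {"adam": [1], "and": [2], "smith": [3]}    # "and" kept
spanish = {"adam": [1], "and": [2], "smith": [3]}   # inferred from the question
russian = {"adam": [1], "smith": [3]}               # inferred from the question


def concat(left: dict, right: dict) -> dict:
    """Model of tsvector || tsvector: right positions are offset by left's largest position."""
    offset = max((p for ps in left.values() for p in ps), default=0)
    result = {lexeme: list(ps) for lexeme, ps in left.items()}
    for lexeme, ps in right.items():
        result.setdefault(lexeme, []).extend(p + offset for p in ps)
    return result


combined = concat(concat(concat(english, french), spanish), russian)
print(combined)  # {'adam': [1, 4, 7, 10], 'smith': [3, 6, 9, 12], 'and': [5, 8]}
```

This matches the 'adam':1,4,7,10 'and':5,8 'smith':3,6,9,12 vector from the question: concatenation keeps every configuration's lexemes, so a stop word removed by one configuration is still present if another one keeps it.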
There is no good way to use full text search for different languages at once.
But if it is really personal names you are searching, I would do the following:
Create a text search configuration with a simple dictionary for asciiwords, and either use an empty stopword file for the dictionary or one that contains stopwords that are acceptable in all languages.
Personal names normally should not be stemmed, so you avoid that problem. And if you miss a stopword, that's no big deal. It only makes the resulting tsvector (and index) larger, but with personal names there should not be too many stopwords anyway.
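To illustrate the idea, here is a rough Python model of what such a configuration would do with names: lowercase each ASCII word, drop listed stop words (which still use up a position, as with to_tsvector), and never stem. The function name and the one-word stop list are hypothetical; in PostgreSQL the actual work is done by the text search configuration and its simple dictionary.

```python
import re

# Hypothetical stop list: empty, or only words that are stop words in every language you need.
STOPWORDS = {"and"}


def simple_name_vector(text: str, stopwords=STOPWORDS) -> dict:
    """Rough model of a 'simple' dictionary config: lowercase, drop stop words, no stemming."""
    vector = {}
    # Assumption: only ASCII words are considered, like the asciiword token type.
    for pos, word in enumerate(re.findall(r"[A-Za-z]+", text), start=1):
        lexeme = word.lower()
        if lexeme in stopwords:
            continue  # dropped, but its position is still consumed
        vector.setdefault(lexeme, []).append(pos)
    return vector


print(simple_name_vector("Adam and Smith"))                   # {'adam': [1], 'smith': [3]}
print(simple_name_vector("Adam and Smith", stopwords=set()))  # {'adam': [1], 'and': [2], 'smith': [3]}
```

Note that "Smiths" would stay "smiths" here, since no stemmer is involved.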

~~ Operator In Postgres

I have a query in Postgres:
SELECT DISTINCT a.profn FROM tprof a, sap_tstc b, tgrc c
WHERE ((c.grcid ~~ a.grcid)
AND ((c.tcode) = (b.tcode)));
What does ~~ mean?
From section 9.7.1, LIKE, of the PostgreSQL documentation:
The operator ~~ is equivalent to LIKE, and ~~* corresponds to ILIKE. There are also !~~ and !~~* operators that represent NOT LIKE and NOT ILIKE, respectively. All of these operators are PostgreSQL-specific.
It isn't listed in the index of the documentation, which is frustrating.
So I had a look with psql:
regress=> \do ~~
List of operators
Schema | Name | Left arg type | Right arg type | Result type | Description
------------+------+---------------+----------------+-------------+-------------------------
pg_catalog | ~~ | bytea | bytea | boolean | matches LIKE expression
pg_catalog | ~~ | character | text | boolean | matches LIKE expression
pg_catalog | ~~ | name | text | boolean | matches LIKE expression
pg_catalog | ~~ | text | text | boolean | matches LIKE expression
(4 rows)
It's an operator alias for LIKE, that's all.
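Since ~~ is just LIKE, c.grcid ~~ a.grcid in the query above means c.grcid LIKE a.grcid, so the value in a.grcid acts as the pattern. A minimal Python sketch of LIKE semantics, assuming the default backslash escape character: % matches any run of characters, _ matches exactly one, and the whole value must match. The helper names are hypothetical, and ilike only approximates ~~* because Python's case folding is not PostgreSQL's locale-dependent lowercasing.

```python
import re


def like_to_regex(pattern: str, escape: str = "\\") -> str:
    """Translate a SQL LIKE pattern into a Python regex (to be used with fullmatch)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped character matches literally
            i += 2
            continue
        if ch == "%":
            out.append(".*")  # any sequence of zero or more characters
        elif ch == "_":
            out.append(".")  # exactly one character
        else:
            # Ordinary character. PostgreSQL rejects a trailing escape character;
            # this sketch simply treats it literally.
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def like(value: str, pattern: str) -> bool:
    """value ~~ pattern, i.e. value LIKE pattern."""
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None


def ilike(value: str, pattern: str) -> bool:
    """value ~~* pattern, i.e. value ILIKE pattern (approximate case folding)."""
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL | re.IGNORECASE) is not None


print(like("GRC01", "GRC%"))   # True
print(like("grc01", "GRC%"))   # False
print(ilike("grc01", "GRC%"))  # True
```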

Escaping special characters in to_tsquery

How do you escape special characters in a string passed to to_tsquery? For instance, this query:
select to_tsquery('AT&T');
Produces:
NOTICE: text-search query contains only stop words or doesn't contain lexemes, ignored
to_tsquery
------------
(1 row)
Edit: I also noticed the same issue with to_tsvector.
A simple solution is to create the tsquery as follows:
select $$'AT&T'$$::tsquery;
You can make more complex queries:
select $$'AT&T' & Phone | '|Bang!'$$::tsquery;
See the text search docs for more.
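If you build such literals from user input, each term has to be quoted according to the tsquery input syntax: wrap it in single quotes and double any embedded single quotes and backslashes. A hypothetical Python helper for that, assuming the resulting text is sent to PostgreSQL as a bound query parameter and cast with ::tsquery, rather than pasted into the SQL string:

```python
def quote_tsquery_lexeme(term: str) -> str:
    """Quote one term for tsquery input: double backslashes and single quotes, then wrap."""
    return "'" + term.replace("\\", "\\\\").replace("'", "''") + "'"


def and_query(terms) -> str:
    """Build tsquery input text that requires all terms."""
    return " & ".join(quote_tsquery_lexeme(t) for t in terms)


print(and_query(["AT&T", "phone"]))  # 'AT&T' & 'phone'
print(quote_tsquery_lexeme("Joe's"))  # 'Joe''s'
```

Keep in mind that a ::tsquery cast does not normalize the terms, so 'AT&T' only matches a tsvector that contains exactly that lexeme; to_tsvector with the default parser splits AT&T into separate words instead.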
I found this comment very useful; it uses the plainto_tsquery('AT&T') function: https://stackoverflow.com/a/16020565/350195
If you want 'AT&T' to be treated as a search word, you're going to need some customised components, because the default parser splits it as two words:
steve#steve#[local] =# select * from ts_parse('default', 'AT&T');
tokid | token
-------+-------
1 | AT
12 | &
1 | T
(3 rows)
steve#steve#[local] =# select * from ts_debug('simple', 'AT&T');
alias | description | token | dictionaries | dictionary | lexemes
-----------+-----------------+-------+--------------+------------+---------
asciiword | Word, all ASCII | AT | {simple} | simple | {at}
blank | Space symbols | & | {} | |
asciiword | Word, all ASCII | T | {simple} | simple | {t}
(3 rows)
As you can see from the documentation for CREATE TEXT SEARCH PARSER, this is not trivial, as the parser functions appear to need to be written in C.
You might find this post useful, in which someone gets "underscore_word" recognised as a single token: http://postgresql.1045698.n5.nabble.com/Configuring-Text-Search-parser-td2846645.html

Bit masking in Postgres

I have this query
SELECT * FROM "functions" WHERE (models_mask & 1 > 0)
and I get the following error:
PGError: ERROR: operator does not exist: character varying & integer
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
The models_mask is an integer in the database. How can I fix this?
Thank you!
Check out the docs on bit operators for Pg.
Essentially, & only works on two operands of the same type (usually bit or an integer type), so models_mask will have to be cast from varchar to something reasonable like bit or int:
models_mask::int & 1 -or- models_mask::int::bit & b'1'
You can find out which types an operator works with using \doS in psql:
pg_catalog | & | bigint | bigint | bigint | bitwise and
pg_catalog | & | bit | bit | bit | bitwise and
pg_catalog | & | inet | inet | inet | bitwise and
pg_catalog | & | integer | integer | integer | bitwise and
pg_catalog | & | smallint | smallint | smallint | bitwise and
Here is a quick example:
# SELECT 11 & 15 AS int, b'1011' & b'1111' AS bin INTO foo;
SELECT
# \d foo
Table "public.foo"
Column | Type | Modifiers
--------+---------+-----------
int | integer |
bin | "bit" |
# SELECT * FROM foo;
int | bin
-----+------
11 | 1011
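The same logic in Python, as a quick sanity check of the arithmetic: a mask stored as text must be converted to an integer before & can be applied, which is what models_mask::int does in the SQL. The helper name has_flag is hypothetical.

```python
def has_flag(models_mask: str, flag: int) -> bool:
    """Python counterpart of models_mask::int & flag > 0 for a mask stored as text."""
    return (int(models_mask) & flag) > 0


# Applying & directly to the string fails, much like the varchar & integer error above.
try:
    _ = "5" & 1
except TypeError as exc:
    print("TypeError:", exc)

print(has_flag("5", 1))  # True  (0b101 has the lowest bit set)
print(has_flag("4", 1))  # False (0b100 does not)

# The 11 & 15 and b'1011' & b'1111' example from the answer:
print(11 & 15)                         # 11
print(format(0b1011 & 0b1111, "04b"))  # 1011
```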