How to convert PSQL's ::json @> ::json to a JPA/JPQL predicate

Say I have a DB table that looks like this:
CREATE TABLE myTable(
id BIGINT,
date TIMESTAMP,
user_ids JSONB
);
user_ids is a JSONB array.
Let a record of this table look like this:
{
"id":13,
"date":"2019-01-25 11:03:57",
"user_ids":[25, 661, 88]
}
I need to query all records where user_ids contains 25. In SQL I can achieve this with the following SELECT statement:
SELECT * FROM myTable where user_ids::jsonb @> '[25]'::jsonb;
Now I need to write a JPA Predicate that renders "user_ids::jsonb @> '[25]'::jsonb" as a Hibernate-parseable/executable Criteria, which I then intend to use in a session.createQuery() statement.
In simpler terms, I need to know how to write that PSQL snippet (user_ids::jsonb @> '[25]'::jsonb) as an HQL expression.

Fortunately, every operator in PostgreSQL is backed by a function, and you can look up that function in the psql console by typing \doS+ followed by the operator (some operator characters have a special meaning in psql's search patterns, so the search may return more results than desired).
Here is the result:
postgres=# \doS+ @>
List of operators
Schema | Name | Left arg type | Right arg type | Result type | Function | Description
------------+------+---------------+----------------+-------------+---------------------+-------------
pg_catalog | @> | aclitem[] | aclitem | boolean | aclcontains | contains
pg_catalog | @> | anyarray | anyarray | boolean | arraycontains | contains
pg_catalog | @> | anyrange | anyelement | boolean | range_contains_elem | contains
pg_catalog | @> | anyrange | anyrange | boolean | range_contains | contains
pg_catalog | @> | box | box | boolean | box_contain | contains
pg_catalog | @> | box | point | boolean | box_contain_pt | contains
pg_catalog | @> | circle | circle | boolean | circle_contain | contains
pg_catalog | @> | circle | point | boolean | circle_contain_pt | contains
pg_catalog | @> | jsonb | jsonb | boolean | jsonb_contains | contains
pg_catalog | @> | path | point | boolean | path_contain_pt | contains
pg_catalog | @> | polygon | point | boolean | poly_contain_pt | contains
pg_catalog | @> | polygon | polygon | boolean | poly_contain | contains
pg_catalog | @> | tsquery | tsquery | boolean | tsq_mcontains | contains
(13 rows)
What you want is the variant with jsonb arguments on both sides, and the function behind it is called jsonb_contains. So the equivalent of jsonbcolumn @> jsonbvalue is jsonb_contains(jsonbcolumn, jsonbvalue). However, you can't use this function in either JPQL or CriteriaBuilder unless you register it through a custom Dialect (if you're using Hibernate). If you're using EclipseLink, I don't know the situation there.
From here on, your options are to use native queries, or add your own Hibernate Dialect by extending an existing one.
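To make the semantics of @> (jsonb_contains) concrete, here is a rough Python model of the containment rules described in the PostgreSQL JSON documentation. It is an approximation for illustration only, not PostgreSQL's implementation; the helper names jsonb_contains and _scalar_eq are hypothetical, and JSON values are assumed to be decoded into plain Python dicts, lists and scalars.

```python
def _scalar_eq(a, b):
    # JSON booleans and numbers are distinct types; stop Python's True == 1 from leaking in.
    if isinstance(a, bool) or isinstance(b, bool):
        return a is b
    return a == b


def jsonb_contains(left, right, top_level=True):
    """Approximate model of PostgreSQL's jsonb @> operator (function jsonb_contains).

    JSON values are modelled as Python dict / list / str / int / float / bool / None.
    """
    if isinstance(left, dict) and isinstance(right, dict):
        # Every key/value pair on the right must be contained in the left object.
        return all(k in left and jsonb_contains(left[k], v, False) for k, v in right.items())
    if isinstance(left, list) and isinstance(right, list):
        # Every element on the right must be contained in *some* element on the left;
        # element order and duplicates do not matter.
        return all(any(jsonb_contains(item, r, False) for item in left) for r in right)
    if top_level and isinstance(left, list) and not isinstance(right, (list, dict)):
        # Documented special case: a top-level array may contain a primitive value.
        return any(_scalar_eq(item, right) for item in left)
    if isinstance(left, (list, dict)) or isinstance(right, (list, dict)):
        return False  # otherwise the structures must match
    return _scalar_eq(left, right)


# The record from the question: user_ids = [25, 661, 88]
print(jsonb_contains([25, 661, 88], [25]))
```

Because '[25]' is an array on the right-hand side, the match is per element and ignores position, so the record from the question is returned.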

Replacing "@>" with "jsonb_contains()" is not a good idea: an index on the column (for example a GIN index) can be used for the operator, but not for the function call. Example: https://dbfiddle.uk/-xMuHYAA

Related

Any way to find and delete almost similar records with SQL?

I have a table in Postgres DB, that has a lot of almost identical rows. For example:
1. 00Zicky_-_San_Pedro_Danilo_Vigorito_Remix
2. 00Zicky_-_San_Pedro__Danilo_Vigorito_Remix__
3. 0101_-_Try_To_Say__Strictlyjaz_Unit_Future_Rmx__
4. 0101_-_Try_To_Say__Strictlyjaz_Unit_Future_Rmx_
5. 01_-_Digital_Excitation_-_Brothers_Gonna_Work_it_Out__Piano_Mix__
6. 01_-_Digital_Excitation_-_Brothers_Gonna_Work_it_Out__Piano_Mix__
I'm thinking about writing a little Go script to remove the duplicates, but maybe SQL can do it?
Table definition:
\d+ songs
Table "public.songs"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
---------------+-----------------------------+-----------+----------+----------------------------------------+----------+-------------+--------------+-------------
song_id | integer | | not null | nextval('songs_song_id_seq'::regclass) | plain | | |
song_name | character varying(250) | | not null | | extended | | |
fingerprinted | smallint | | | 0 | plain | | |
file_sha1 | bytea | | | | extended | | |
total_hashes | integer | | not null | 0 | plain | | |
date_created | timestamp without time zone | | not null | now() | plain | | |
date_modified | timestamp without time zone | | not null | now() | plain | | |
Indexes:
"pk_songs_song_id" PRIMARY KEY, btree (song_id)
Referenced by:
TABLE "fingerprints" CONSTRAINT "fk_fingerprints_song_id" FOREIGN KEY (song_id) REFERENCES songs(song_id) ON DELETE CASCADE
Access method: heap
I tried several methods to find duplicates, but those methods only find exact matches.
There is no operator that essentially means "A almost = B". (Well, there is full text search, but that seems a little excessive here.) If the only difference is the number of - and _ characters, just remove them and compare the results: if they are equal, one row is a duplicate of the other. You can use the replace() function to remove them. The query below keeps the row with the lowest song_id in each group and deletes the rest (see demo):
delete
from songs s2
where exists ( select null
from songs s1
where s1.song_id < s2.song_id
and replace(replace(s1.song_name, '_',''),'-','') =
replace(replace(s2.song_name, '_',''),'-','')
);
If your table is large this will not be fast, but a functional index may help:
create index song_name_idx on songs
(replace(replace(song_name, '_',''),'-',''));
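To preview which rows the DELETE would remove before running it, the matching rule can be simulated in Python. This is a sketch that assumes the rows are (song_id, song_name) pairs already fetched from the table; normalize and ids_to_delete are hypothetical names, and the song_ids 1 to 6 in the example below are assumed for illustration.

```python
def normalize(name):
    # Same normalization as replace(replace(song_name, '_', ''), '-', '')
    return name.replace("_", "").replace("-", "")


def ids_to_delete(rows):
    """rows: iterable of (song_id, song_name) pairs.

    Returns the song_ids the DELETE would remove: every row for which another
    row with a smaller song_id has the same normalized name.
    """
    first_seen = {}  # normalized name -> lowest song_id seen so far
    doomed = []
    for song_id, name in sorted(rows):  # ascending song_id
        key = normalize(name)
        if key in first_seen:
            doomed.append(song_id)
        else:
            first_seen[key] = song_id
    return doomed


names = [
    "00Zicky_-_San_Pedro_Danilo_Vigorito_Remix",
    "00Zicky_-_San_Pedro__Danilo_Vigorito_Remix__",
    "0101_-_Try_To_Say__Strictlyjaz_Unit_Future_Rmx__",
    "0101_-_Try_To_Say__Strictlyjaz_Unit_Future_Rmx_",
    "01_-_Digital_Excitation_-_Brothers_Gonna_Work_it_Out__Piano_Mix__",
    "01_-_Digital_Excitation_-_Brothers_Gonna_Work_it_Out__Piano_Mix__",
]
print(ids_to_delete(enumerate(names, start=1)))
```

For the six names in the question this yields [2, 4, 6]: the lowest song_id in each group survives, which is exactly what the s1.song_id < s2.song_id condition expresses.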

Bulk update datatype of a column in all relevant tables

An example of some tables with the column I want to change.
+--------------------------------------+------------------+------+
| ?column? | column_name | data_type |
|--------------------------------------+------------------+------|
| x.articles | article_id | bigint |
| x.supplier_articles | article_id | bigint |
| x.purchase_order_details | article_id | bigint |
| y.scheme_articles | article_id | integer |
....
There are some 50 tables that have the column.
I want to change the article_id column from a numeric data type to a textual data type. It is found across several tables. Is there any way to update them all at once? The information schema is read-only, so I cannot run an UPDATE on it. Other than writing individual ALTER statements for all the tables, is there a better way to do it?

What Postgres 13 index types support distance searches?

Original Question
We've had great results using a K-NN search with a GiST index with gist_trgm_ops. Pure magic. I've got other situations, with other data types like timestamp, where distance functions would be quite useful. If I didn't dream it, this is, or was, discoverable through pg_catalog. Looking around, I can't find a way to search on indexes by such properties. I think what I'm after, in this case, is AMPROP_DISTANCE_ORDERABLE under the hood.
I just checked: prior to 9.6, pg_am had a lot more attributes than it does now.
Is there another way to figure out what options various indexes have with a catalog search?
Catalogs
jjanes' answer inspired me to look at the system information functions some more, and to spend a day in the pg_catalog tables. The catalogs for indexes and operators are complicated. The system information functions are a big help. This piece proved super useful for getting a handle on things:
https://postgrespro.com/blog/pgsql/4161264
I think the conclusion is "no, you can't readily figure out what data types and indexes support proximity searches." The relevant attribute is a property of a column in a specific index. However, it looks like nearest-neighbor searching requires a GiST index, and that there are readily-available index operator classes to add K-NN searching to a huge range of common types. Happy for corrections on these conclusions, or the details below.
Built-in Distance Support
https://www.postgresql.org/docs/current/gist-builtin-opclasses.html
From various bits of the docs, it sounds like there are distance (proximity, nearest neighbor, K-NN) operators for GiST indexes on a handful of built-in geometric types.
box
circle
point
poly
B-tree Operator Classes
Not listed as such in the docs, but visible with this query:
select am.amname AS index_method
, opc.opcname AS opclass_name
, opc.opcintype::regtype AS indexed_type
, opc.opcdefault AS is_default
from pg_am am
, pg_opclass opc
where opc.opcmethod = am.oid
and am.amname = 'btree'
order by 1,2;
B-tree GiST Distance Support
https://www.postgresql.org/docs/current/btree-gist.html
The btree_gist extension provides GiST operator classes that emulate B-tree behavior for many common types, and for some of them it also adds a distance (<->) operator. The docs say these native types support distance searches:
int2
int4
int8
float4
float8
timestamp with time zone
timestamp without time zone
time without time zone
date
interval
oid
money
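As a mental model, a K-NN query such as ORDER BY col <-> target LIMIT k returns the k rows whose distance to target is smallest; for timestamp, btree_gist's <-> returns the absolute difference as an interval. The Python sketch below computes that result by brute force, whereas the GiST index produces the same rows without sorting the whole table. The knn helper is hypothetical, and the order of ties is not specified by PostgreSQL.

```python
from datetime import datetime


def knn(values, target, k):
    """Brute-force equivalent of: SELECT col ... ORDER BY col <-> target LIMIT k.

    Distance is the absolute difference, as with btree_gist's <-> on
    numbers and timestamps (timestamps give a timedelta, like an interval).
    """
    return sorted(values, key=lambda v: abs(v - target))[:k]


stamps = [datetime(2021, 1, 1), datetime(2021, 1, 5), datetime(2021, 1, 3)]
print(knn(stamps, datetime(2021, 1, 4, 12), 2))  # Jan 5 (0.5 days away), then Jan 3 (1.5 days)
print(knn([10, 3, 7, 15], 8, 2))                 # 7 (distance 1), then 10 (distance 2)
```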
BRIN Built-in Operator Classes
https://www.postgresql.org/docs/current/brin-builtin-opclasses.html
There are over 70 listed in the internals docs.
GIN Built-in Operator Classes
https://www.postgresql.org/docs/12/gin-builtin-opclasses.html
array_ops
jsonb_ops
jsonb_path_ops
tsvector_ops
Alternative Text Ops
https://www.postgresql.org/docs/current/indexes-opclass.html
There are special operator classes for text comparisons made character-by-character, rather than through a collation. Or so the docs say:
text_pattern_ops
varchar_pattern_ops
bpchar_pattern_ops
pg_trgm
Beyond this, the bundled pg_trgm module provides operator classes for both GIN and GiST (gin_trgm_ops and gist_trgm_ops); only the GiST one supports ordering by the <-> distance operator. I think this shows up as:
text
Note: Postgres 14 modifies pg_trgm to allow you to adjust the "signature length" for the index entry. Longer is possibly more accurate, shorter signatures are smaller on disk. If you've been using pg_trgm, it might be worth experimenting with the signature length in PG 14.
https://www.postgresql.org/docs/current/pgtrgm.html
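For intuition on what gist_trgm_ops orders by, here is a Python approximation of pg_trgm's trigram extraction and similarity(), following the rules in the pg_trgm documentation: text is lower-cased, non-alphanumeric characters are ignored, and each word is padded with two leading spaces and one trailing space. The <-> distance is 1 minus the similarity. This is a simplified model, not the extension's C code; trigrams, similarity and distance are hypothetical Python names.

```python
import re


def trigrams(text):
    """Set of trigrams, roughly as pg_trgm's show_trgm() computes them."""
    grams = set()
    for word in re.findall(r"[^\W_]+", text.lower()):  # alphanumeric runs only
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by distinct trigrams across both strings."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def distance(a, b):
    """What text <-> text orders by: one minus similarity."""
    return 1.0 - similarity(a, b)


print(sorted(trigrams("cat")))
print(similarity("word", "two words"))  # 4 shared trigrams out of 11 distinct ones
```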
SP-GiST Built-in Operator Classes
box_ops
kd_point_ops
network_ops
poly_ops
quad_point_ops
range_ops
text_ops
pg_operator search
Here's a search on pg_operator to look for matches starting from the <-> operator itself:
select oprnamespace::regnamespace::text as schema_name,
oprowner::regrole as owner,
oprname as operator,
oprleft::regtype as left,
oprright::regtype as right,
oprresult::regtype as result,
oprcom::regoperator as commutator
from pg_operator
where oprname = '<->'
order by 1;
Output from one of our servers:
| schema_name | owner | operator | left | right | result | commutator |
+-------------+----------+----------+-----------------------------+-----------------------------+------------------+--------------------------------------------------------------+
| extensions | postgres | <-> | text | text | real | <->(text,text) |
| extensions | postgres | <-> | money | money | money | <->(money,money) |
| extensions | postgres | <-> | date | date | integer | <->(date,date) |
| extensions | postgres | <-> | real | real | real | <->(real,real) |
| extensions | postgres | <-> | double precision | double precision | double precision | <->(double precision,double precision) |
| extensions | postgres | <-> | smallint | smallint | smallint | <->(smallint,smallint) |
| extensions | postgres | <-> | integer | integer | integer | <->(integer,integer) |
| extensions | postgres | <-> | bigint | bigint | bigint | <->(bigint,bigint) |
| extensions | postgres | <-> | interval | interval | interval | <->(interval,interval) |
| extensions | postgres | <-> | oid | oid | oid | <->(oid,oid) |
| extensions | postgres | <-> | time without time zone | time without time zone | interval | <->(time without time zone,time without time zone) |
| extensions | postgres | <-> | timestamp without time zone | timestamp without time zone | interval | <->(timestamp without time zone,timestamp without time zone) |
| extensions | postgres | <-> | timestamp with time zone | timestamp with time zone | interval | <->(timestamp with time zone,timestamp with time zone) |
| pg_catalog | postgres | <-> | box | box | double precision | <->(box,box) |
| pg_catalog | postgres | <-> | path | path | double precision | <->(path,path) |
| pg_catalog | postgres | <-> | line | line | double precision | <->(line,line) |
| pg_catalog | postgres | <-> | lseg | lseg | double precision | <->(lseg,lseg) |
| pg_catalog | postgres | <-> | polygon | polygon | double precision | <->(polygon,polygon) |
| pg_catalog | postgres | <-> | circle | circle | double precision | <->(circle,circle) |
| pg_catalog | postgres | <-> | point | circle | double precision | <->(circle,point) |
| pg_catalog | postgres | <-> | circle | point | double precision | <->(point,circle) |
| pg_catalog | postgres | <-> | point | polygon | double precision | <->(polygon,point) |
| pg_catalog | postgres | <-> | polygon | point | double precision | <->(point,polygon) |
| pg_catalog | postgres | <-> | circle | polygon | double precision | <->(polygon,circle) |
| pg_catalog | postgres | <-> | polygon | circle | double precision | <->(circle,polygon) |
| pg_catalog | postgres | <-> | point | point | double precision | <->(point,point) |
| pg_catalog | postgres | <-> | box | line | double precision | <->(line,box) |
| pg_catalog | postgres | <-> | tsquery | tsquery | tsquery | 0 |
| pg_catalog | postgres | <-> | line | box | double precision | <->(box,line) |
| pg_catalog | postgres | <-> | point | line | double precision | <->(line,point) |
| pg_catalog | postgres | <-> | line | point | double precision | <->(point,line) |
| pg_catalog | postgres | <-> | point | lseg | double precision | <->(lseg,point) |
| pg_catalog | postgres | <-> | lseg | point | double precision | <->(point,lseg) |
| pg_catalog | postgres | <-> | point | box | double precision | <->(box,point) |
| pg_catalog | postgres | <-> | box | point | double precision | <->(point,box) |
| pg_catalog | postgres | <-> | lseg | line | double precision | <->(line,lseg) |
| pg_catalog | postgres | <-> | line | lseg | double precision | <->(lseg,line) |
| pg_catalog | postgres | <-> | lseg | box | double precision | <->(box,lseg) |
| pg_catalog | postgres | <-> | box | lseg | double precision | <->(lseg,box) |
| pg_catalog | postgres | <-> | point | path | double precision | <->(path,point) |
| pg_catalog | postgres | <-> | path | point | double precision | <->(point,path) |
+-------------+----------+----------+-----------------------------+-----------------------------+------------------+--------------------------------------------------------------+
Did I miss any index opts worth knowing about?
Checking Out Live Indexes
Here's a longer-than-it-should-be-because-I-still-find-the-catalogs-confusing query to pull out the columns from each user index, and figure out their more interesting properties. For a nice, short catalog search of much utility, see https://dba.stackexchange.com/questions/186944/how-to-list-all-the-indexes-along-with-their-type-btree-brin-hash-etc
with
basic_details as (
select relnamespace::regnamespace::text as schema_name,
indrelid::regclass::text as table_name,
indexrelid::regclass::text as index_name,
unnest(indkey) as column_ordinal_position , -- WITH ORDINALITY would be nice here, didn't get it working.
generate_subscripts(indkey, 1) + 1 as column_position_in_index --
from pg_index
join pg_class on pg_class.oid = pg_index.indrelid
),
enriched_details as (
select basic_details.schema_name,
basic_details.table_name,
basic_details.index_name,
basic_details.column_ordinal_position,
basic_details.column_position_in_index,
columns.column_name,
columns.udt_name as column_type_name
from basic_details
join information_schema.columns as columns
on columns.table_schema = basic_details.schema_name
and columns.table_name = basic_details.table_name
and columns.ordinal_position = basic_details.column_ordinal_position
where schema_name not like 'pg_%'
)
select *,
-- https://postgrespro.com/blog/pgsql/4161264
coalesce(pg_index_column_has_property(index_name,column_position_in_index,'distance_orderable'), false) as supports_knn_searches,
coalesce(pg_index_column_has_property(index_name,column_position_in_index,'search_array'), false) as supports_in_searches,
coalesce(pg_index_column_has_property(index_name,column_position_in_index,'returnable'), false) as supports_index_only_scans,
(select indexdef
from pg_indexes
where pg_indexes.schemaname = enriched_details.schema_name
and pg_indexes.indexname = enriched_details.index_name) as index_definition
from enriched_details
order by supports_in_searches desc,
schema_name,
table_name,
index_name
The timestamp type supports K-NN searches with GiST indexes, using the <-> operator provided by the btree_gist extension.
You can check if a specific column of a specific index supports it, like this:
select pg_index_column_has_property('pgbench_history_mtime_idx'::regclass,1,'distance_orderable');
As best as I can tell, here's the state of play as of PG 14:
GiST indexes may support nearest-neighbor (K-NN) proximity searches with <->, and have done so since PG 9.1.
SP-GiST added such support as of PG 12.
RUM indexes (not in core) also support K-NN.
In all cases, support is done in the operator class:
https://www.postgresql.org/docs/current/indexes-opclass.html
That's what determines whether distance_orderable works for a specific data type on a specific kind of index. Among the built-in types, some of the geometric and text vector types work out of the box. Beyond that small set, many more types are supported via specific operator classes, such as:
https://www.postgresql.org/docs/current/btree-gist.html
https://www.postgresql.org/docs/current/pgtrgm.html
In the case of SP-GiST, there are a lot fewer types supported than with GiST, once you've installed btree_gist:
https://www.postgresql.org/docs/14/spgist-builtin-opclasses.html
It looks like text_ops and range_ops do not support proximity searches. However, for tsrange etc., other tools likely offer enough options.

Sorting Issue with Underscore in Postgres

I'm trying to sort the data below, but Postgres returns the wrong order.
Can someone please help me here? How can I get the data sorted properly?
Here is the query I use to get the data:
SELECT * FROM TempTable ORDER BY a_test ASC NULLS FIRST;
and it returns this result:
| BB001217 |
| BB001217_000010 |
| BB001217_000011 |
| BB001217_00002 |
| BB001217_00003 |
| BB001218 |
| BB001219 |
| BB001220 |
| BB001220_000010 |
| BB001220_000011 |
| BB001220_00002 |
| BB001220_00003 |
| BB001220_00004 |
| BB001220_00005 |
| BB001220_00006 |
I expected this result instead:
| BB001217 |
| BB001217_00002 |
| BB001217_00003 |
| BB001217_000010 |
| BB001217_000011 |
| BB001218 |
| BB001219 |
| BB001220 |
| BB001220_00002 |
| BB001220_00003 |
| BB001220_00004 |
| BB001220_00005 |
| BB001220_00006 |
| BB001220_000010 |
| BB001220_000011 |
From PostgreSQL v10 on (provided the server was built with ICU support), you can use an ICU collation that provides “natural sorting”:
CREATE COLLATION english_natural (
LOCALE = 'en-US-u-kn-true',
PROVIDER = icu
);
SELECT *
FROM TempTable
ORDER BY a_test COLLATE english_natural
ASC NULLS FIRST;
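The kn-true option tells ICU to compare runs of digits by their numeric value. A rough Python analogue of that idea, handy for checking the expected order, splits each value into alternating text and digit runs. It only approximates ICU's collation, which also applies locale-specific rules to letters and punctuation; natural_key is a hypothetical helper.

```python
import re


def natural_key(s):
    # "BB001217_000010" -> ["BB", 1217, "_", 10, ""]
    # re.split with a capture group alternates text, digits, text, ...,
    # so lists from different values always compare str-to-str and int-to-int.
    return [int(part) if i % 2 else part for i, part in enumerate(re.split(r"(\d+)", s))]


values = ["BB001217_000010", "BB001217", "BB001217_00002", "BB001218"]
print(sorted(values, key=natural_key))
```

Applied to the question's data, sorted(values, key=natural_key) reproduces the expected order, because "_00002" compares as 2 and "_000010" as 10.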
You are storing numbers in a VARCHAR column, so the sorting is character-based, and '10' is considered smaller than '2'.
You need to split the column into two parts, then convert the second to a number and sort on those two:
SELECT *
FROM temptable
ORDER BY split_part(a_test,'_',1),
nullif(split_part(a_test,'_',2),'')::int ASC NULLS FIRST;
Online example: https://rextester.com/RNU44666
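Mirroring the ORDER BY above in Python shows why it yields the expected order: the prefix is compared as text, the suffix as an integer, and a missing suffix (NULL after nullif) sorts first. split_part here is a hypothetical Python re-implementation of the SQL function of the same name, and Python's code-point string comparison stands in for the database collation, which may differ for other data.

```python
def split_part(s, delim, n):
    """Like SQL split_part(): the n-th field (1-based), or '' if there is none."""
    parts = s.split(delim)
    return parts[n - 1] if n <= len(parts) else ""


def sort_key(a_test):
    prefix = split_part(a_test, "_", 1)
    suffix = split_part(a_test, "_", 2)
    # nullif(suffix, '')::int ASC NULLS FIRST: no suffix sorts before any number
    return (prefix, (0, 0) if suffix == "" else (1, int(suffix)))


values = ["BB001217_000010", "BB001217", "BB001217_00002", "BB001218"]
print(sorted(values, key=sort_key))
```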

PG::UndefinedFunction: ERROR: operator does not exist: geometry && box

Why does PostgreSQL complain that the && operator does not exist? (I have PostGIS installed - see below).
mydb=# SELECT "monuments".* FROM "monuments" WHERE
mydb=# (coord && '-10,-10,10,10'::box)
mydb=# ORDER BY created_at DESC ;
ERROR: operator does not exist: geometry && box
LINE 1: ...LECT "monuments".* FROM "monuments" WHERE (coord && '-10...
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
I have PostGIS installed:
mydb=# select postgis_full_version();
NOTICE: Function postgis_topology_scripts_installed() not found. Is topology support enabled and topology.sql installed?
postgis_full_version
----------------------------------------------------------------------------------------------------------------------------------------------------------------
POSTGIS="2.1.0 r11822" GEOS="3.3.8-CAPI-1.7.8" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.10.0, released 2013/04/24" LIBXML="2.9.1" LIBJSON="UNKNOWN" RASTER
And by the way, my table looks like this:
mydb=# \d monuments
id | integer | not null default nextval('monuments_id_seq'::regclass)
coord | geometry(Point,3785) |
Let me know if you need any more info.
box is a built-in PostgreSQL primitive geometric type, like point.
postgres=> \dT box
List of data types
Schema | Name | Description
------------+------+------------------------------------------
pg_catalog | box | geometric box '(lower left,upper right)'
(1 row)
PostGIS uses its own geometry type and doesn't generally interoperate well with the PostgreSQL built-in basic geometric types. These are the supported data type combinations for && with PostGIS 2 on my PostgreSQL 9.3 install:
postgres=# \do &&
List of operators
Schema | Name | Left arg type | Right arg type | Result type | Description
------------+------+---------------+----------------+-------------+-----------------
pg_catalog | && | anyarray | anyarray | boolean | overlaps
pg_catalog | && | anyrange | anyrange | boolean | overlaps
pg_catalog | && | box | box | boolean | overlaps
pg_catalog | && | circle | circle | boolean | overlaps
pg_catalog | && | polygon | polygon | boolean | overlaps
pg_catalog | && | tinterval | tinterval | boolean | overlaps
pg_catalog | && | tsquery | tsquery | tsquery | AND-concatenate
public | && | geography | geography | boolean |
public | && | geometry | geometry | boolean |
public | && | geometry | raster | boolean |
public | && | raster | geometry | boolean |
public | && | raster | raster | boolean |
(12 rows)
You'll see that box is supported for box && box but not box && geometry. Since your coord column is a geometry type, you'll need to convert the box to geometry, so as to end up with geometry && geometry.
Example:
WHERE (coord && geometry(polygon('((-10, -10), (10, 10))'::box)))
The simplest explanation would be that you installed the extension into some schema that is not in your current search_path.
Did you know that you can even schema-qualify operators? Like:
SELECT 3 OPERATOR(pg_catalog.+) 4;
Or:
SELECT * FROM public.monuments
WHERE coord OPERATOR(my_postgis_schema.&&) '-10,-10,10,10'::box;
This way you can make your query independent of the current search_path. It's better, though, to fix the search_path.