PostgreSQL index array of int4range using GIN - custom operator class

Here is my table:
CREATE TABLE
mytable
(
id INT NOT NULL PRIMARY KEY,
val int4range[]
);
I want to index the val column:
CREATE INDEX
ix_mytable_val
ON mytable
USING GIN (INT4RANGE(val, '[]')); -- error, as is GIN(val)
I came up with the following:
CREATE OPERATOR CLASS gin_int4range_ops
DEFAULT FOR TYPE int4range[] USING gin AS
OPERATOR 1 <(anyrange,anyrange),
OPERATOR 2 <=(anyrange,anyrange),
OPERATOR 3 =(anyrange,anyrange),
OPERATOR 4 >=(anyrange,anyrange),
OPERATOR 5 >(anyrange,anyrange),
FUNCTION 1 lower(anyrange),
FUNCTION 2 upper(anyrange),
FUNCTION 3 isempty(anyrange),
FUNCTION 4 lower_inc(anyrange),
FUNCTION 5 upper_inc(anyrange);
But when I try to create the index, it fails (error below). However, if I call the create from within a DO $$ block, it executes.
If the create index executed, I get the error on INSERT INTO instead.
"ERROR: cache lookup failed for type 1"
I also tried this:
OPERATOR 1 &&(anyrange,anyrange),
OPERATOR 2 <#(anyrange,anyrange),
OPERATOR 3 #>(anyrange,anyrange),
OPERATOR 4 =(anyrange,anyrange),
In order to try and solve this, I have rebooted PG, the machine, and vacuumed the DB. I believe there is an error in the CREATE OPERATOR code.
If I can index an array of custom type of (int, int4range), that would be even better.
I've spent quite some time (a full day) wading through documentation, forums, etc., but can find nothing that really helps me to understand how to solve this (i.e. create a working custom operator class).

You need to CREATE OPERATOR CLASS based on Range Functions and Operators, for example:
CREATE OPERATOR CLASS gin_int4range_ops
DEFAULT FOR TYPE int4range[] USING gin AS
OPERATOR 1 =(anyrange,anyrange),
FUNCTION 1 lower(anyrange),
FUNCTION 2 upper(anyrange),
FUNCTION 3 isempty(anyrange),
FUNCTION 4 lower_inc(anyrange),
FUNCTION 5 upper_inc(anyrange);
Now you can CREATE INDEX:
CREATE INDEX ix_mytable4_vhstore_low
ON mytable USING gin (val gin_int4range_ops);
Check also:
Operator Classes and Operator Families
CREATE OPERATOR CLASS
The following query shows all defined operator classes:
SELECT am.amname AS index_method,
opc.opcname AS opclass_name
FROM pg_am am, pg_opclass opc
WHERE opc.opcmethod = am.oid
ORDER BY index_method, opclass_name;
This query shows all defined operator families and all the operators included in each family:
SELECT am.amname AS index_method,
opf.opfname AS opfamily_name,
amop.amopopr::regoperator AS opfamily_operator
FROM pg_am am, pg_opfamily opf, pg_amop amop
WHERE opf.opfmethod = am.oid AND
amop.amopfamily = opf.oid
ORDER BY index_method, opfamily_name, opfamily_operator;

Related

How to create support functions for gin index on custom operator class in postgres?

I created some custom operators for the jsonb type and an operator class for all of them. The problem is that when I create an index
CREATE INDEX idx_name on table USING gin(column_name custom_operator_class)
I get an error
missing support function 2 for attribute 1 of index "idx_name"
I probably need to create support functions for overlap, contains, containedBy and equal, but I cannot find any documentation on how to do that. Everything I found online is for btree, and nothing for gin. Does anybody know how to do this, or know of any material with examples?
If you need more information, I will be glad to provide it. The operators are basically for a recursive search of keys where the date is less than, greater than, or equal to the specified one.
EDIT:
I tried creating support functions like this
CREATE OR REPLACE FUNCTION jb_custom_contains(jsonb, jsonb)
RETURNS bool AS
'SELECT $1 <# $2' LANGUAGE sql IMMUTABLE;
CREATE OR REPLACE FUNCTION jb_custom_contaiedBy(jsonb, jsonb)
RETURNS bool AS
'SELECT $1 #> $2' LANGUAGE sql IMMUTABLE;
CREATE OR REPLACE FUNCTION jb_custom_equals(jsonb, jsonb)
RETURNS bool AS
'SELECT $1 = $2' LANGUAGE sql IMMUTABLE;
CREATE INDEX then doesn't return an error, but the operator doesn't work properly.
To explain your error message:
Support function 2 is defined in src/include/access/gin.h:
/*
* amproc indexes for inverted indexes.
*/
#define GIN_COMPARE_PROC 1
#define GIN_EXTRACTVALUE_PROC 2
#define GIN_EXTRACTQUERY_PROC 3
#define GIN_CONSISTENT_PROC 4
#define GIN_COMPARE_PARTIAL_PROC 5
#define GIN_TRICONSISTENT_PROC 6
#define GIN_OPTIONS_PROC 7
#define GINNProcs 7
That is, support function 2 is the extractValue described in the documentation:
There are two methods that an operator class for GIN must provide:
Datum *extractValue(Datum itemValue, int32 *nkeys, bool **nullFlags)
Returns a palloc'd array of keys given an item to be indexed. The number of returned keys must be stored into *nkeys. If any of the keys can be null, also palloc an array of *nkeys bool fields, store its address at *nullFlags, and set these null flags as needed. *nullFlags can be left NULL (its initial value) if all keys are non-null. The return value can be NULL if the item contains no keys.
So the following is missing in your CREATE OPERATOR CLASS statement:
CREATE OPERATOR CLASS custom_operator_class FOR TYPE jsonb USING gin AS
FUNCTION 2 myextractvaluefunc(jsonb, internal),
...;
You are confusing a strategy number with a support function number.
This is documented in table 37.13 and chapter 66 of the official documentation, and there are examples linked therein.
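To make the role of extractValue a little more concrete, here is a toy Python model of an inverted index. It is only a sketch: the names extract_value and build_inverted_index and the sample rows are made up for illustration, and a real GIN support function has to be written in C against the signature quoted above.

```python
from collections import defaultdict


def extract_value(item):
    """Toy stand-in for GIN support function 2: return the keys of one item."""
    # A real extractValue returns a palloc'd array of Datum keys;
    # here an "item" is simply a list of strings.
    return sorted(set(item))


def build_inverted_index(rows):
    """Map every extracted key to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, item in rows.items():
        for key in extract_value(item):
            index[key].add(row_id)
    return index


# Hypothetical sample data: row id -> indexed value
rows = {1: ["a", "b"], 2: ["b", "c"], 3: ["c"]}
index = build_inverted_index(rows)
```

Without a function playing this role the index has no way to break a stored value into keys, which is why CREATE INDEX refuses an operator class that lacks support function 2.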

Unable to create bloom index

I'm new to bloom indexes. I'm following https://habr.com/en/company/postgrespro/blog/452968/ to learn about these indexes.
When I tried to create a bloom index on my own test table, I got the error below:
SQL Error [42704]: ERROR: data type bigint has no default operator class for access method "bloom"
Hint: You must specify an operator class for the index or define a default operator class for the data type.
That makes sense, because my table has a bigint column and I include that column in the index.
To avoid the error, I tried to create my own operator class for the bigint data type, like below:
CREATE OPERATOR CLASS bigint_ops
DEFAULT FOR TYPE int USING bloom AS
OPERATOR 1 =(bigint,bigint),
FUNCTION 1 hashbigint;
and I got the error below:
SQL Error [42883]: ERROR: could not find a function named "hashbigint"
Any help to avoid this error will be much appreciated.
The hashing function for bigint is hashint8, not hashbigint. I found this by running the query in the post you linked and filtering to where the type is 'bigint'.
testdb=# with t0 as (select distinct
opc.opcintype::regtype::text,
amop.amopopr::regoperator,
ampr.amproc
from pg_am am, pg_opclass opc, pg_amop amop, pg_amproc ampr
where am.amname = 'hash'
and opc.opcmethod = am.oid
and amop.amopfamily = opc.opcfamily
and amop.amoplefttype = opc.opcintype
and amop.amoprighttype = opc.opcintype
and ampr.amprocfamily = opc.opcfamily
and ampr.amproclefttype = opc.opcintype
order by opc.opcintype::regtype::text) select * from t0 where opcintype='bigint';
opcintype | amopopr | amproc
-----------+------------------+----------
bigint | =(bigint,bigint) | hashint8
(1 row)
There's also an error in your CREATE OPERATOR CLASS statement; it needs to be DEFAULT FOR TYPE bigint, not int.
testdb=# create extension bloom;
CREATE EXTENSION
testdb=# CREATE OPERATOR CLASS bigint_ops
DEFAULT FOR TYPE int USING bloom AS
OPERATOR 1 =(bigint,bigint),
FUNCTION 1 hashint8;
ERROR: could not make operator class "bigint_ops" be default for type pg_catalog.int4
DETAIL: Operator class "int4_ops" already is the default.
testdb=# CREATE OPERATOR CLASS bigint_ops
DEFAULT FOR TYPE bigint USING bloom AS
OPERATOR 1 =(bigint,bigint),
FUNCTION 1 hashint8;
CREATE OPERATOR CLASS
testdb=#

How to use array operators for type bytea[]?

Is it possible to use array operators on a type of bytea[]?
For example:
CREATE TABLE test (
metadata bytea[]
);
SELECT * FROM test WHERE test.metadata && ANY($1);
-- could not find array type for data type bytea[]
If it's not possible, is there an alternative approach without changing the type from bytea[]?
postgresql 12.x
Do not use ANY; just compare the arrays directly, using an array constructor and the array operators:
CREATE TABLE test (
metadata bytea[]
);
INSERT INTO public.test (metadata) VALUES('{"x","y"}');
SELECT * FROM test t WHERE metadata && array[E'\x78'::bytea];
When using ANY, the left-hand expression is evaluated and compared to each element of the right-hand array using the given operator, which must yield a Boolean result. So the original SQL was trying to do something like bytea[] && bytea.
This applies not only to bytea[], but to any array type, e.g. text[] or integer[].
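The difference can be modelled in a few lines of Python. This is a simplified sketch, not how PostgreSQL resolves operators internally: overlaps and op_any are hypothetical helpers standing in for && and for "left op ANY(array)", and the error text is invented for the model.

```python
def overlaps(left, right):
    """Model of the array && operator: true if the arrays share an element."""
    if not isinstance(left, list) or not isinstance(right, list):
        raise TypeError("&& needs an array on both sides")
    return any(item in right for item in left)


def op_any(left, op, right_array):
    """Model of `left op ANY(right_array)`: apply op to each element in turn."""
    return any(op(left, element) for element in right_array)


metadata = [b"x", b"y"]  # the bytea[] column value
param = [b"x"]           # the bytea[] query parameter ($1)

# metadata && $1 : both operands are arrays, so the comparison works
direct = overlaps(metadata, param)

# metadata && ANY($1) : each element of $1 is a single bytea, not an array
try:
    op_any(metadata, overlaps, param)
    any_error = None
except TypeError as exc:
    any_error = str(exc)
```

In the model, the direct comparison succeeds while the ANY form fails as soon as it pairs the whole array with a single element, which mirrors the bytea[] && bytea mismatch described above.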

How to create an operator in PostgreSQL for the hstore type with an int4range value

I have a table with an HSTORE column 'ext', where the value is an int4range. An example:
"p1"=>"[10, 18]", "p2"=>"[24, 32]", "p3"=>"[29, 32]", "p4"=>"[18, 19]"
However, when I try to create an expression index on this, I get an error:
CREATE INDEX ix_test3_p1
ON test3
USING gist
(((ext -> 'p1'::text)::int4range));
ERROR: data type text has no default operator class for access method "gist"
SQL state: 42704
Hint: You must specify an operator class for the index or define a default operator class for the data type.
How do I create the operator for this?
NOTE
Each record may have its own unique set of keys. Each key represents an attribute, and each value is that attribute's value range. So not all records will have "p1". Consider this an EAV model in hstore.
I don't get that error - I get "functions in index expression must be marked IMMUTABLE"
CREATE TABLE ht (ext hstore);
INSERT INTO ht VALUES ('p1=>"[10,18]"'), ('p1=>"[99,99]"');
CREATE INDEX ht_test_idx ON ht USING GIST ( ((ext->'p1'::text)::int4range) );
ERROR: functions in index expression must be marked IMMUTABLE
CREATE FUNCTION foo(hstore) RETURNS int4range LANGUAGE SQL AS $$ SELECT ($1->'p1')::int4range; $$ IMMUTABLE;
CREATE INDEX ht_test_idx ON ht USING GIST ( foo(ext) );
SET enable_seq_scan=false;
EXPLAIN SELECT * FROM ht WHERE foo(ext) = '[10,19)';
QUERY PLAN
-----------------------------------------------------------------------
Index Scan using ht_test_idx on ht (cost=0.25..8.52 rows=1 width=32)
Index Cond: (foo(ext) = '[10,19)'::int4range)
I'm guessing the cast isn't immutable because you can change the default format of the range from inclusive...exclusive "[...)" to something else. You presumably won't be doing that though.
Obviously you'll want your real function to deal with things like missing "p1" entries, badly formed range values etc.

Are you able to use a custom Postgres comparison function for ORDER BY clauses?

In Python, I can write a sort comparison function which returns an item in the set {-1, 0, 1} and pass it to a sort function like so:
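from functools import cmp_to_key  # Python 3: a comparison function must be wrapped as a key
sorted(["some","data","with","a","nonconventional","sort"], key=cmp_to_key(custom_function))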
This code will sort the sequence according to the collation order I define in the function.
Can I do the equivalent in Postgres?
e.g.
SELECT widget FROM items ORDER BY custom_function(widget)
Edit: Examples and/or pointers to documentation are welcome.
Yes, you can; you can even create a functional index to speed up the sorting.
Edit: Simple example:
CREATE TABLE foo(
id serial primary key,
bar int
);
-- create some data
INSERT INTO foo(bar) SELECT i FROM generate_series(50,70) i;
-- show the result
SELECT * FROM foo;
CREATE OR REPLACE FUNCTION my_sort(int) RETURNS int
LANGUAGE sql
IMMUTABLE -- required for the function to be usable in an index expression
AS
$$
SELECT $1 % 5; -- get the modulo (remainder)
$$;
-- let's sort!
SELECT *, my_sort(bar) FROM foo ORDER BY my_sort(bar) ASC;
-- make an index as well:
CREATE INDEX idx_my_sort ON foo ((my_sort(bar)));
The manual is full of examples of how to use your own functions; just start playing with it.
SQL: http://www.postgresql.org/docs/current/static/xfunc-sql.html
PL/pgSQL: http://www.postgresql.org/docs/current/static/plpgsql.html
We can avoid confusion about ordering methods by naming them:
"score function": the f in the standard SQL clause select * from t order by f(x), and
"compare function" ("sort function" in the question text): the function passed to Python's sort method.
The ORDER BY clause of PostgreSQL has three mechanisms to sort:
Standard, using a "score function", which can also be used with an INDEX.
Special "standard string-comparison alternatives", by collation configuration (only for text, varchar, etc. data types).
ORDER BY ... USING clause. See this question or docs example. Example: SELECT * FROM mytable ORDER BY somecol USING ~<~ where ~<~ is an operator that embeds a compare function.
Perhaps the "standard way" in an RDBMS (such as PostgreSQL) differs from Python's because indexing is the aim of an RDBMS, and it is easier to index score functions.
Answers to the question:
Direct solution. There is no direct way to use a user-defined function as a compare function, as in the sort method of languages like Python or JavaScript.
Indirect solution. You can use a user-defined compare function in a user-defined operator, and a user-defined operator class to index it. See the PostgreSQL docs:
CREATE OPERATOR with the compare function;
CREATE OPERATOR CLASS, to be indexable.
Explaining compare functions
In Python, the compare function looks like this:
def compare(a, b):
    return 1 if a > b else 0 if a == b else -1
A compare function can use less CPU than a score function. It is also useful for expressing an order when no score function is known.
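As a rough illustration of the two styles in Python 3, the sketch below sorts the same list once with a score function and once with a compare function. The functions score and compare are examples invented here (ordering by word length); functools.cmp_to_key is the standard-library adapter that lets sorted() accept a compare function.

```python
from functools import cmp_to_key

words = ["some", "data", "with", "a", "nonconventional", "sort"]


def score(word):
    """Score function: maps each item to a value that is compared directly."""
    return len(word)


def compare(a, b):
    """Compare function: returns -1, 0 or 1 for a pair of items."""
    return (len(a) > len(b)) - (len(a) < len(b))


# Equivalent of ORDER BY f(x): one score per row
by_score = sorted(words, key=score)

# Pairwise comparison, adapted to the key= interface
by_compare = sorted(words, key=cmp_to_key(compare))
```

Both calls give the same order here because the compare function is derived from the score; the compare style matters when no such per-item score exists.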
See a complete description at
for C language see https://www.gnu.org/software/libc/manual/html_node/Comparison-Functions.html
for Javascript see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/sort#Description
Other typical compare functions
Wikipedia's example to compare tuples:
function tupleCompare((lefta, leftb, leftc), (righta, rightb, rightc))
    if lefta ≠ righta
        return compare(lefta, righta)
    else if leftb ≠ rightb
        return compare(leftb, rightb)
    else
        return compare(leftc, rightc)
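A runnable Python sketch of the same idea, generalized from three fields to tuples of any equal length. The name tuple_compare is chosen here for illustration; it is not a library function, and the sketch assumes both tuples have the same number of fields.

```python
def compare(a, b):
    """Three-way comparison of two scalar values."""
    return 1 if a > b else 0 if a == b else -1


def tuple_compare(left, right):
    """Compare field by field; the first differing field decides the result."""
    # Assumption: left and right have the same length.
    for left_field, right_field in zip(left, right):
        if left_field != right_field:
            return compare(left_field, right_field)
    return 0  # every field is equal


first_field_decides = tuple_compare((1, 9, 9), (2, 0, 0))
last_field_decides = tuple_compare((1, 2, 4), (1, 2, 3))
all_equal = tuple_compare((1, 2, 3), (1, 2, 3))
```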
In Javascript:
function compare(a, b) {
  if (a is less than b by some ordering criterion) {
    return -1;
  }
  if (a is greater than b by the ordering criterion) {
    return 1;
  }
  // a must be equal to b
  return 0;
}
C example from the PostgreSQL docs:
static int
complex_abs_cmp_internal(Complex *a, Complex *b)
{
    double amag = Mag(a),
           bmag = Mag(b);

    if (amag < bmag)
        return -1;
    if (amag > bmag)
        return 1;
    return 0;
}
You could do something like this
SELECT DISTINCT ON (interval_alias) *,
to_timestamp(floor((extract('epoch' FROM index.created_at) / 10)) * 10) AT
TIME ZONE 'UTC' AS interval_alias
FROM index
WHERE index.created_at >= '{start_date}'
AND index.created_at <= '{end_date}'
AND product = '{product_id}'
GROUP BY id, interval_alias
ORDER BY interval_alias;
First, define the expression that will be your ordering column and name it with AS. It can be a function call or any SQL expression. Then use that alias in ORDER BY and you're done!
In my opinion, this is the smoothest way to do such an ordering.
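The ordering expression in that query rounds each timestamp down to the start of its 10-second interval. A minimal Python sketch of the same arithmetic, assuming timezone-aware datetimes; the helper name bucket_start and the sample timestamp are made up for illustration:

```python
import math
from datetime import datetime, timezone


def bucket_start(ts, width_seconds=10):
    """Round a timestamp down to the start of its interval, in UTC."""
    epoch = ts.timestamp()  # like extract('epoch' FROM created_at)
    floored = math.floor(epoch / width_seconds) * width_seconds
    return datetime.fromtimestamp(floored, tz=timezone.utc)  # like to_timestamp(...)


# Hypothetical sample value
created_at = datetime(2024, 1, 1, 12, 0, 17, tzinfo=timezone.utc)
interval_start = bucket_start(created_at)
```

Rows whose created_at falls in the same 10-second window get the same interval value, which is what DISTINCT ON and ORDER BY then operate on.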