Original Question
We've had great results using a K-NN search with a GiST index with gist_trgm_ops. Pure magic. I've got other situations, with other datatypes like timestamp where distance functions would be quite useful. If I didn't dream it, this is, or was, available through pg_catalog. Looking around, I can't find a way to search on indexes by such properties. I think what I'm after, in this case, is AMPROP_DISTANCE_ORDERABLE under-the-hood.
Just checked, and pg_am did have a lot more attributes than it does now, prior to 9.6.
Is there another way to figure out what options various indexes have with a catalog search?
Catalogs
jjanes' answer inspired me to look at the system information functions some more, and to spend a day in the pg_catalog tables. The catalogs for indexes and operators are complicated. The system information functions are a big help. This piece proved super useful for getting a handle on things:
https://postgrespro.com/blog/pgsql/4161264
I think the conclusion is "no, you can't readily figure out what data types and indexes support proximity searches." The relevant attribute is a property of a column in a specific index. However, it looks like nearest-neighbor searching requires a GiST index, and that there are readily-available index operator classes to add K-NN searching to a huge range of common types. Happy for corrections on these conclusions, or the details below.
Built-in Distance Support
https://www.postgresql.org/docs/current/gist-builtin-opclasses.html
From various bits of the docs, it sounds like there are distance (proximity, nearest neighbor, K-NN) operators for GiST indexes on a handful of built-in geometric types.
box
circle
point
polygon
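To make that concrete, here is a minimal sketch of a K-NN search on one of these built-in types. The table and index names (places, places_location_gist) are made up for illustration and are not from our schema.

```sql
-- Hypothetical table; any table with a point column would do.
create table places (
    id       bigint generated always as identity primary key,
    name     text  not null,
    location point not null
);

-- The default GiST operator class for point supports ordering by <->.
create index places_location_gist on places using gist (location);

-- The ten rows nearest to the origin. With the GiST index in place,
-- the planner can return rows in distance order straight from the index.
select id, name, location <-> point '(0,0)' as distance
from places
order by location <-> point '(0,0)'
limit 10;
```

Running the query under EXPLAIN should show whether the index is actually used for the ordering on a given data set.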
B-tree Operator Classes
Not listed as such in the docs, but visible with this query:
select am.amname AS index_method
, opc.opcname AS opclass_name
, opc.opcintype::regtype AS indexed_type
, opc.opcdefault AS is_default
from pg_am am
, pg_opclass opc
where opc.opcmethod = am.oid
and am.amname = 'btree'
order by 1,2;
B-tree GiST Distance Support
https://www.postgresql.org/docs/current/btree-gist.html
The btree_gist extension provides GiST operator classes that implement B-tree-equivalent behavior for common scalar types. The docs say the <-> distance operator is provided for these native types:
int2
int4
int8
float4
float8
timestamp with time zone
timestamp without time zone
time without time zone
date
interval
oid
money
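For the timestamp case from the original question, a sketch might look like the following. The events table and its index are hypothetical names used only for illustration, and the example assumes the schema where btree_gist is installed is on the search_path, since that is where its <-> operators live.

```sql
-- btree_gist supplies both the GiST operator classes and the <-> operators
-- for scalar types such as timestamp with time zone.
create extension if not exists btree_gist;

-- Hypothetical table for illustration.
create table events (
    id         bigint generated always as identity primary key,
    created_at timestamp with time zone not null
);

create index events_created_at_gist on events using gist (created_at);

-- The five events closest in time to a reference moment, before or after it.
select id, created_at
from events
order by created_at <-> timestamptz '2021-06-01 12:00:00+00'
limit 5;
```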
BRIN Built-in Operator Classes
https://www.postgresql.org/docs/current/brin-builtin-opclasses.html
There are over 70 listed in the internals docs.
GIN Built-in Operator Classes
https://www.postgresql.org/docs/12/gin-builtin-opclasses.html
array_ops
jsonb_ops
jsonb_path_ops
tsvector_ops
Alternative Text Operator Classes
https://www.postgresql.org/docs/current/indexes-opclass.html
There are special operator classes for text comparisons made character-by-character, rather than through a collation. Or so the docs say:
text_pattern_ops
varchar_pattern_ops
bpchar_pattern_ops
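As a rough illustration of where these classes matter, here is a sketch of a left-anchored LIKE search. The customers table and index name are assumptions made up for the example; the docs describe these classes as useful when the database does not use the C locale.

```sql
-- Hypothetical table for illustration.
create table customers (
    id        bigint primary key,
    last_name text not null
);

-- A B-tree index that compares text character by character, ignoring collation.
create index customers_last_name_pattern
    on customers (last_name text_pattern_ops);

-- A left-anchored pattern that this kind of index is meant to serve.
select id, last_name
from customers
where last_name like 'Smi%';
```

Ordinary comparisons such as < and >= on the column still want a regular index with the default operator class.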
pg_trgm
Beyond this, the pg_trgm module provides operator classes for GIN and GiST, with the GiST version supporting index-ordered <-> (distance) searches. I think this shows up as:
text
Note: Postgres 13 added operator class parameters, and with them a way to adjust the "signature length" (siglen) of gist_trgm_ops index entries. Longer signatures are possibly more accurate; shorter signatures are smaller on disk. If you've been using pg_trgm on an older release, it might be worth experimenting with the signature length after upgrading.
https://www.postgresql.org/docs/current/pgtrgm.html
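A sketch of a trigram K-NN search with an explicit signature length follows. The products table and the value siglen=32 are arbitrary choices for illustration, not recommendations, and the siglen syntax only works on servers that support operator class parameters.

```sql
create extension if not exists pg_trgm;

-- Hypothetical table for illustration.
create table products (
    id    bigint primary key,
    title text not null
);

-- GiST trigram index with a non-default signature length (in bytes).
create index products_title_trgm
    on products using gist (title gist_trgm_ops(siglen=32));

-- The ten titles most similar to the search string; smaller distance means more similar.
select id, title, title <-> 'postgres index' as distance
from products
order by title <-> 'postgres index'
limit 10;
```

Comparing index size and query timing across a few siglen values on real data is probably the only reliable way to pick one.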
SP-GiST Built-in Operator Classes
box_ops
kd_point_ops
network_ops
poly_ops
quad_point_ops
range_ops
text_ops
pg_operator search
Here's a search on pg_operator to look for matches starting from the <-> operator itself:
select oprnamespace::regnamespace::text as schema_name,
oprowner::regrole as owner,
oprname as operator,
oprleft::regtype as left,
oprright::regtype as right,
oprresult::regtype as result,
oprcom::regoperator as commutator
from pg_operator
where oprname = '<->'
order by 1
Output from one of our servers:
| schema_name | owner | operator | left | right | result | commutator |
+-------------+----------+----------+-----------------------------+-----------------------------+------------------+--------------------------------------------------------------+
| extensions | postgres | <-> | text | text | real | <->(text,text) |
| extensions | postgres | <-> | money | money | money | <->(money,money) |
| extensions | postgres | <-> | date | date | integer | <->(date,date) |
| extensions | postgres | <-> | real | real | real | <->(real,real) |
| extensions | postgres | <-> | double precision | double precision | double precision | <->(double precision,double precision) |
| extensions | postgres | <-> | smallint | smallint | smallint | <->(smallint,smallint) |
| extensions | postgres | <-> | integer | integer | integer | <->(integer,integer) |
| extensions | postgres | <-> | bigint | bigint | bigint | <->(bigint,bigint) |
| extensions | postgres | <-> | interval | interval | interval | <->(interval,interval) |
| extensions | postgres | <-> | oid | oid | oid | <->(oid,oid) |
| extensions | postgres | <-> | time without time zone | time without time zone | interval | <->(time without time zone,time without time zone) |
| extensions | postgres | <-> | timestamp without time zone | timestamp without time zone | interval | <->(timestamp without time zone,timestamp without time zone) |
| extensions | postgres | <-> | timestamp with time zone | timestamp with time zone | interval | <->(timestamp with time zone,timestamp with time zone) |
| pg_catalog | postgres | <-> | box | box | double precision | <->(box,box) |
| pg_catalog | postgres | <-> | path | path | double precision | <->(path,path) |
| pg_catalog | postgres | <-> | line | line | double precision | <->(line,line) |
| pg_catalog | postgres | <-> | lseg | lseg | double precision | <->(lseg,lseg) |
| pg_catalog | postgres | <-> | polygon | polygon | double precision | <->(polygon,polygon) |
| pg_catalog | postgres | <-> | circle | circle | double precision | <->(circle,circle) |
| pg_catalog | postgres | <-> | point | circle | double precision | <->(circle,point) |
| pg_catalog | postgres | <-> | circle | point | double precision | <->(point,circle) |
| pg_catalog | postgres | <-> | point | polygon | double precision | <->(polygon,point) |
| pg_catalog | postgres | <-> | polygon | point | double precision | <->(point,polygon) |
| pg_catalog | postgres | <-> | circle | polygon | double precision | <->(polygon,circle) |
| pg_catalog | postgres | <-> | polygon | circle | double precision | <->(circle,polygon) |
| pg_catalog | postgres | <-> | point | point | double precision | <->(point,point) |
| pg_catalog | postgres | <-> | box | line | double precision | <->(line,box) |
| pg_catalog | postgres | <-> | tsquery | tsquery | tsquery | 0 |
| pg_catalog | postgres | <-> | line | box | double precision | <->(box,line) |
| pg_catalog | postgres | <-> | point | line | double precision | <->(line,point) |
| pg_catalog | postgres | <-> | line | point | double precision | <->(point,line) |
| pg_catalog | postgres | <-> | point | lseg | double precision | <->(lseg,point) |
| pg_catalog | postgres | <-> | lseg | point | double precision | <->(point,lseg) |
| pg_catalog | postgres | <-> | point | box | double precision | <->(box,point) |
| pg_catalog | postgres | <-> | box | point | double precision | <->(point,box) |
| pg_catalog | postgres | <-> | lseg | line | double precision | <->(line,lseg) |
| pg_catalog | postgres | <-> | line | lseg | double precision | <->(lseg,line) |
| pg_catalog | postgres | <-> | lseg | box | double precision | <->(box,lseg) |
| pg_catalog | postgres | <-> | box | lseg | double precision | <->(lseg,box) |
| pg_catalog | postgres | <-> | point | path | double precision | <->(path,point) |
| pg_catalog | postgres | <-> | path | point | double precision | <->(point,path) |
+-------------+----------+----------+-----------------------------+-----------------------------+------------------+--------------------------------------------------------------+
Did I miss any operator classes worth knowing about?
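Having a <-> operator for a type does not by itself mean an index can order by it. As far as I can tell, the link between an operator and an index method lives in pg_amop, where amoppurpose = 'o' marks ordering operators. Here is a sketch of a query along those lines; treat it as a starting point rather than a definitive answer.

```sql
-- Operators registered as ordering ("distance") operators in some operator family.
select am.amname                        as index_method,
       opf.opfname                      as operator_family,
       amop.amoplefttype::regtype::text as indexed_type,
       amop.amoprighttype::regtype::text as compared_type,
       amop.amopopr::regoperator::text  as ordering_operator
from pg_amop as amop
join pg_am as am        on am.oid  = amop.amopmethod
join pg_opfamily as opf on opf.oid = amop.amopfamily
where amop.amoppurpose = 'o'   -- 's' is ordinary search, 'o' is ordering
order by 1, 2, 3, 4;
```

Operator classes from extensions such as btree_gist and pg_trgm only appear once the extension is installed in the current database.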
Checking Out Live Indexes
Here's a longer-than-it-should-be-because-I-still-find-the-catalogs-confusing query to pull out the columns from each user index, and figure out their more interesting properties. For a nice, short catalog search of much utility, see https://dba.stackexchange.com/questions/186944/how-to-list-all-the-indexes-along-with-their-type-btree-brin-hash-etc
with
basic_details as (
select relnamespace::regnamespace::text as schema_name,
indrelid::regclass::text as table_name,
indexrelid::regclass::text as index_name,
unnest(indkey) as column_ordinal_position , -- WITH ORDINALITY would be nice here, didn't get it working.
generate_subscripts(indkey, 1) + 1 as column_position_in_index --
from pg_index
join pg_class on pg_class.oid = pg_index.indrelid
),
enriched_details as (
select basic_details.schema_name,
basic_details.table_name,
basic_details.index_name,
basic_details.column_ordinal_position,
basic_details.column_position_in_index,
columns.column_name,
columns.udt_name as column_type_name
from basic_details
join information_schema.columns as columns
on columns.table_schema = basic_details.schema_name
and columns.table_name = basic_details.table_name
and columns.ordinal_position = basic_details.column_ordinal_position
where schema_name not like 'pg_%'
)
select *,
-- https://postgrespro.com/blog/pgsql/4161264
coalesce(pg_index_column_has_property(index_name,column_position_in_index,'distance_orderable'), false) as supports_knn_searches,
coalesce(pg_index_column_has_property(index_name,column_position_in_index,'search_array'), false) as supports_in_searches,
coalesce(pg_index_column_has_property(index_name,column_position_in_index,'returnable'), false) as supports_index_only_scans,
(select indexdef
from pg_indexes
where pg_indexes.schemaname = enriched_details.schema_name
and pg_indexes.indexname = enriched_details.index_name) as index_definition
from enriched_details
order by supports_in_searches desc,
schema_name,
table_name,
index_name
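On the WITH ORDINALITY comment in the query above: a more compact variant is sketched below, under the assumption that only the index name, column position, and K-NN property are needed. It calls unnest in the FROM clause, which is where WITH ORDINALITY is allowed.

```sql
select i.indexrelid::regclass::text as index_name,
       k.position_in_index,
       coalesce(
           pg_index_column_has_property(i.indexrelid::regclass,
                                        k.position_in_index::int,
                                        'distance_orderable'),
           false) as supports_knn_searches
from pg_index as i
join pg_class as c on c.oid = i.indexrelid
-- WITH ORDINALITY numbers the unnested index columns starting from 1.
cross join lateral unnest(i.indkey) with ordinality as k(attnum, position_in_index)
where c.relnamespace::regnamespace::text not in ('pg_catalog', 'pg_toast', 'information_schema')
order by i.indexrelid::regclass::text, k.position_in_index;
```

This version skips information_schema.columns entirely, so it does not report column names or types.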
The timestamp type supports K-NN with GiST indexes using the <-> operator created by the btree_gist extension.
You can check if a specific column of a specific index supports it, like this:
select pg_index_column_has_property('pgbench_history_mtime_idx'::regclass,1,'distance_orderable');
As best as I can tell, here's the state of play as of PG 14:
GiST indexes may support nearest-neighbor (K-NN) proximity <-> search, and have done so since PostgreSQL 9.1.
SP-GiST added such support as of PG 12.
RUM indexes (not in core) also support K-NN.
In all cases, support is done in the operator class:
https://www.postgresql.org/docs/current/indexes-opclass.html
That's what determines if distance_orderable works for a specific data type on a specific kind of index. Built in, some of the geometric types work out of the box. Other than that small set, many more types are supported via specific operator classes, such as:
https://www.postgresql.org/docs/current/btree-gist.html
https://www.postgresql.org/docs/current/pgtrgm.html
In the case of SP-GiST, there are a lot fewer types supported than with GiST, once you've installed btree_gist:
https://www.postgresql.org/docs/14/spgist-builtin-opclasses.html
It looks like text_ops and range_ops do not support proximity searches. However, for tsrange, etc., there are likely enough options with other tools.
I have a Postgres table with some data. Each row has a date associated with it. I want to extract the rows whose date falls in April. Here is a CSV version of my table data:
,date,location,device,provider,cpu,mem,load,drops,id,latency,gw_latency,upload,download,sap_drops,sap_latency,alert_id
0,2018-02-10 11:52:59.342269+00:00,BEM,10.11.100.1,COD,6.0,23.0,11.75,0.0,,,,,,,,
1,2018-02-10 11:53:04.006971+00:00,VER,10.11.100.1,KOD,6.0,23.0,4.58,0.0,,,,,,,,
2,2018-03-25 20:28:36.186015+00:00,RET,10.11.100.1,POL,7.0,26.0,9.83,0.0,,86.328,5.0,4.33,15.33,0.0,23.0,
3,2018-03-25 20:28:59.155453+00:00,ASR,10.12.100.1,VOL,5.0,14.0,2.67,0.0,,52.406,12.0,2.17,3.17,0.0,28.0,
4,2018-04-01 13:16:44.472119+00:00,RED,10.19.0.1,SEW,6.0,14.0,2.77,0.0,,52.766,2.0,3.25,2.29,0.0,1.0,0.0
5,2018-04-01 13:16:48.478708+00:00,RED,10.19.0.1,POL,6.0,14.0,4.065,0.0,,52.766,1.0,6.63,1.5,0.0,1.0,0.0
6,2018-04-06 21:00:44.769702+00:00,GOK,10.61.100.1,FDE,4.0,22.0,3.08,0.0,,54.406,8.0,3.33,2.83,0.0,19.0,0.0
7,2018-04-06 21:01:07.211395+00:00,WER,10.4.100.1,FDE,3.0,3.0,9.28,0.0,,0.346,2.0,10.54,8.02,0.0,33.0,0.0
8,2018-04-13 11:18:08.411550+00:00,DER,10.19.0.1,CVE,14.0,14.0,7.88,0.0,,50.545,2.0,6.17,9.59,0.0,1.0,0.0
9,2018-04-13 11:18:12.420974+00:00,RTR,10.19.0.1,BOL,14.0,14.0,1.345,0.0,,50.545,1.0,2.26,0.43,0.0,1.0,0.0
So I want only the rows with April dates, so that I end up with a table that looks something like this:
4,2018-04-01 13:16:44.472119+00:00,RED,10.19.0.1,SEW,6.0,14.0,2.77,0.0,,52.766,2.0,3.25,2.29,0.0,1.0,0.0
5,2018-04-01 13:16:48.478708+00:00,RED,10.19.0.1,POL,6.0,14.0,4.065,0.0,,52.766,1.0,6.63,1.5,0.0,1.0,0.0
6,2018-04-06 21:00:44.769702+00:00,GOK,10.61.100.1,FDE,4.0,22.0,3.08,0.0,,54.406,8.0,3.33,2.83,0.0,19.0,0.0
7,2018-04-06 21:01:07.211395+00:00,WER,10.4.100.1,FDE,3.0,3.0,9.28,0.0,,0.346,2.0,10.54,8.02,0.0,33.0,0.0
8,2018-04-13 11:18:08.411550+00:00,DER,10.19.0.1,CVE,14.0,14.0,7.88,0.0,,50.545,2.0,6.17,9.59,0.0,1.0,0.0
9,2018-04-13 11:18:12.420974+00:00,RTR,10.19.0.1,BOL,14.0,14.0,1.345,0.0,,50.545,1.0,2.26,0.43,0.0,1.0,0.0
Now, if I try to extract a particular date with the query below:
select * from metrics_data where date = 2018-04-13;
I get the error message
No operator matches the given name and argument type(s). You might need to add explicit type casts.
How do I get the rows for the month of April and store them in a new table, say april_data?
Below is the structure of my existing table
Column | Type | Modifiers | Storage | Stats target | Description
-------------+--------------------------+-----------+----------+--------------+-------------
date | timestamp with time zone | | plain | |
location | character varying(255) | | extended | |
device | character varying(255) | | extended | |
provider | character varying(255) | | extended | |
cpu | double precision | | plain | |
mem | double precision | | plain | |
load | double precision | | plain | |
drops | double precision | | plain | |
id | integer | | plain | |
latency | double precision | | plain | |
gw_latency | double precision | | plain | |
upload | double precision | | plain | |
download | double precision | | plain | |
sap_drops | double precision | | plain | |
sap_latency | double precision | | plain | |
alert_id | double precision | | plain | |
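The date column in your table has type timestamp with time zone, which is displayed as YYYY-MM-DD HH24:MI:SS.US followed by a UTC offset. Without quotes, 2018-04-13 is not a date literal: it is parsed as integer arithmetic (2018 minus 4 minus 13), so the query compares timestamp with time zone to an integer, and no such operator exists. That is why it throws an error.
So, to fix it, quote the literal and make both sides of the comparison the same type.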
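A quick way to see what the server makes of the unquoted value is to ask it directly. This is just a diagnostic sketch and does not touch any table:

```sql
-- Unquoted: plain integer subtraction, 2018 - 4 - 13 = 2001, of type integer.
select 2018-04-13 as unquoted_value,
       pg_typeof(2018-04-13) as unquoted_type;

-- Quoted and typed: an actual date value.
select date '2018-04-13' as quoted_value,
       pg_typeof(date '2018-04-13') as quoted_type;
```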
In your case I suggest as below:
Match exact 1 day.
select * from metrics_data where date(date) = '2018-04-13';
Match within 1 month.
select * from metrics_data where date BETWEEN '2018-04-01 00:00:00' AND '2018-04-30 23:59:59.999';
OR
select * from metrics_data where date(date) BETWEEN '2018-04-01' AND '2018-04-30';
OR
select * from metrics_data where to_char(date,'YYYY-MM') = '2018-04';
Match only April.
select * from metrics_data where to_char(date,'MM') = '04';
OR
select * from metrics_data where extract(month from date) = 4;
Hopefully this answer will help you.
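To also store the result in a new table, as the question asks, one option is CREATE TABLE ... AS. The sketch below assumes April 2018 should be measured in UTC. It uses a half-open range (>= start, < next month) instead of BETWEEN, so rows in the final fraction of a second of April 30 are not missed, and the bare column stays usable by an ordinary index on date.

```sql
create table april_data as
select *
from metrics_data
where metrics_data.date >= timestamptz '2018-04-01 00:00:00+00'
  and metrics_data.date <  timestamptz '2018-05-01 00:00:00+00';
```

The new table gets the column names and types of metrics_data but none of its indexes or constraints.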
You need single quotes around the string literal.
PostgreSQL will automatically cast it to the correct data type (timestamp with time zone).
You could use the extract function to select only the dates from April:
SELECT * FROM yourtable WHERE extract (month FROM yourtable.date) = 4;
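One caveat, since the column is timestamp with time zone: extract uses the session's TimeZone setting, so rows near a month boundary can land in different months for different sessions. A sketch that pins the zone explicitly, assuming UTC is the zone you care about:

```sql
-- April of any year, with the month evaluated in UTC rather than the session time zone.
select *
from metrics_data
where extract(month from metrics_data.date at time zone 'UTC') = 4;
```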