Bit masking in Postgres - postgresql

I have this query
SELECT * FROM "functions" WHERE (models_mask & 1 > 0)
and the I get the following error:
PGError: ERROR: operator does not exist: character varying & integer
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
The models_mask is an integer in the database. How can I fix this.
Thank you!

Check out the docs on bit operators for Pg.
Essentially & only works on two like types (usually bit or int), so model_mask will have to be CASTed from varchar to something reasonable like bit or int:
models_mask::int & 1 -or- models_mask::int::bit & b'1'
You can find out what types an operator works with using \doS in psql
pg_catalog | & | bigint | bigint | bigint | bitwise and
pg_catalog | & | bit | bit | bit | bitwise and
pg_catalog | & | inet | inet | inet | bitwise and
pg_catalog | & | integer | integer | integer | bitwise and
pg_catalog | & | smallint | smallint | smallint | bitwise and
Here is a quick example for more information
# SELECT 11 & 15 AS int, b'1011' & b'1111' AS bin INTO foo;
SELECT
# \d foo
Table "public.foo"
Column | Type | Modifiers
--------+---------+-----------
int | integer |
bin | "bit" |
# SELECT * FROM foo;
int | bin
-----+------
11 | 1011

Related

What Postgres 13 index types support distance searches?

Original Question
We've had great results using a K-NN search with a GiST index with gist_trgm_ops. Pure magic. I've got other situations, with other datatypes like timestamp where distance functions would be quite useful. If I didn't dream it, this is, or was, available through pg_catalog. Looking around, I can't find a way to search on indexes by such properties. I think what I'm after, in this case, is AMPROP_DISTANCE_ORDERABLE under-the-hood.
Just checked, and pg_am did have a lot more attributes than it does now, prior to 9.6.
Is there another way to figure out what options various indexes have with a catalog search?
Catalogs
jjanes' answer inspired me to look at the system information functions some more, and to spend a day in the pg_catalog tables. The catalogs for indexes and operators are complicated. The system information functions are a big help. This piece proved super useful for getting a handle on things:
https://postgrespro.com/blog/pgsql/4161264
I think the conclusion is "no, you can't readily figure out what data types and indexes support proximity searches." The relevant attribute is a property of a column in a specific index. However, it looks like nearest-neighbor searching requires a GiST index, and that there are readily-available index operator classes to add K-NN searching to a huge range of common types. Happy for corrections on these conclusions, or the details below.
Built-in Distance Support
https://www.postgresql.org/docs/current/gist-builtin-opclasses.html
From various bits of the docs, it sounds like there are distance (proximity, nearest neighbor, K-NN) operators for GiST indexes on a handful of built-in geometric types.
box
circle
point
poly
B-tree Operator Classes
Not listed as such in the docs, but visible with this query:
select am.amname AS index_method
, opc.opcname AS opclass_name
, opc.opcintype::regtype AS indexed_type
, opc.opcdefault AS is_default
from pg_am am
, pg_opclass opc
where opc.opcmethod = am.oid
and am.amname = 'btree'
order by 1,2;
B-tree GiST Distance Support
https://www.postgresql.org/docs/current/btree-gist.html
I guess a B-tree is a special case of a GiST, and there's a B-tree operator class to match. The docs say these native types are supported:
int2
int4
int8
float4
float8
timestamp with time zone
timestamp without time zone
time without time zone
date
interval
oid
money
BRIN Built-in Operator Classes
https://www.postgresql.org/docs/current/brin-builtin-opclasses.html
There are over 70 listed in the internals docs.
GIN Built-in Operator Classes
https://www.postgresql.org/docs/12/gin-builtin-opclasses.html
array_ops
jsonb_ops
jsonb_path_ops
tsvector_ops
Alternative Text Opts
https://www.postgresql.org/docs/current/indexes-opclass.html
There are special operator classes for text comparisons made character-by-character, rather than through a collation. Or so the docs say:
text_pattern_ops
varchar_pattern_ops
bpchar_pattern_ops
pg_trgm
Beyond this, the included pg_trgm module includes operators for GIN and GiST, with the GiST version optimizing <->. I think this shows up as:
text
Note: Postgres 14 modifies pg_trgm to allow you to adjust the "signature length" for the index entry. Longer is possibly more accurate, shorter signatures are smaller on disk. If you've been using pg_trgm, it might be worth experimenting with the signature length in PG 14.
https://www.postgresql.org/docs/current/pgtrgm.html
SP-GiST Built-in Operator Classes
box_ops
kd_point_ops
network_ops
poly_ops
quad_point_ops
range_ops
text_ops
pg_operator search
Here's a search on pg_operator to look for matches starting from the <-> operator itself:
select oprnamespace::regnamespace::text as schema_name,
oprowner::regrole as owner,
oprname as operator,
oprleft::regtype as left,
oprright::regtype as right,
oprresult::regtype as result,
oprcom::regoperator as commutator
from pg_operator
where oprname = '<->'
order by 1
Output from one of our severs:
| schema_name | owner | operator | left | right | result | commutator |
+-------------+----------+----------+-----------------------------+-----------------------------+------------------+--------------------------------------------------------------+
| extensions | postgres | <-> | text | text | real | <->(text,text) |
| extensions | postgres | <-> | money | money | money | <->(money,money) |
| extensions | postgres | <-> | date | date | integer | <->(date,date) |
| extensions | postgres | <-> | real | real | real | <->(real,real) |
| extensions | postgres | <-> | double precision | double precision | double precision | <->(double precision,double precision) |
| extensions | postgres | <-> | smallint | smallint | smallint | <->(smallint,smallint) |
| extensions | postgres | <-> | integer | integer | integer | <->(integer,integer) |
| extensions | postgres | <-> | bigint | bigint | bigint | <->(bigint,bigint) |
| extensions | postgres | <-> | interval | interval | interval | <->(interval,interval) |
| extensions | postgres | <-> | oid | oid | oid | <->(oid,oid) |
| extensions | postgres | <-> | time without time zone | time without time zone | interval | <->(time without time zone,time without time zone) |
| extensions | postgres | <-> | timestamp without time zone | timestamp without time zone | interval | <->(timestamp without time zone,timestamp without time zone) |
| extensions | postgres | <-> | timestamp with time zone | timestamp with time zone | interval | <->(timestamp with time zone,timestamp with time zone) |
| pg_catalog | postgres | <-> | box | box | double precision | <->(box,box) |
| pg_catalog | postgres | <-> | path | path | double precision | <->(path,path) |
| pg_catalog | postgres | <-> | line | line | double precision | <->(line,line) |
| pg_catalog | postgres | <-> | lseg | lseg | double precision | <->(lseg,lseg) |
| pg_catalog | postgres | <-> | polygon | polygon | double precision | <->(polygon,polygon) |
| pg_catalog | postgres | <-> | circle | circle | double precision | <->(circle,circle) |
| pg_catalog | postgres | <-> | point | circle | double precision | <->(circle,point) |
| pg_catalog | postgres | <-> | circle | point | double precision | <->(point,circle) |
| pg_catalog | postgres | <-> | point | polygon | double precision | <->(polygon,point) |
| pg_catalog | postgres | <-> | polygon | point | double precision | <->(point,polygon) |
| pg_catalog | postgres | <-> | circle | polygon | double precision | <->(polygon,circle) |
| pg_catalog | postgres | <-> | polygon | circle | double precision | <->(circle,polygon) |
| pg_catalog | postgres | <-> | point | point | double precision | <->(point,point) |
| pg_catalog | postgres | <-> | box | line | double precision | <->(line,box) |
| pg_catalog | postgres | <-> | tsquery | tsquery | tsquery | 0 |
| pg_catalog | postgres | <-> | line | box | double precision | <->(box,line) |
| pg_catalog | postgres | <-> | point | line | double precision | <->(line,point) |
| pg_catalog | postgres | <-> | line | point | double precision | <->(point,line) |
| pg_catalog | postgres | <-> | point | lseg | double precision | <->(lseg,point) |
| pg_catalog | postgres | <-> | lseg | point | double precision | <->(point,lseg) |
| pg_catalog | postgres | <-> | point | box | double precision | <->(box,point) |
| pg_catalog | postgres | <-> | box | point | double precision | <->(point,box) |
| pg_catalog | postgres | <-> | lseg | line | double precision | <->(line,lseg) |
| pg_catalog | postgres | <-> | line | lseg | double precision | <->(lseg,line) |
| pg_catalog | postgres | <-> | lseg | box | double precision | <->(box,lseg) |
| pg_catalog | postgres | <-> | box | lseg | double precision | <->(lseg,box) |
| pg_catalog | postgres | <-> | point | path | double precision | <->(path,point) |
| pg_catalog | postgres | <-> | path | point | double precision | <->(point,path) |
+-------------+----------+----------+-----------------------------+-----------------------------+------------------+--------------------------------------------------------------+
Did I miss any index opts worth knowing about?
Checking Out Live Indexes
Here's a longer-than-it-should-be-because-I-still-find-the-catalogs-confusing query to pull out the columns from each user index, and figure out their more interesting properties. For a nice, short catalog search of much utility, see https://dba.stackexchange.com/questions/186944/how-to-list-all-the-indexes-along-with-their-type-btree-brin-hash-etc
with
basic_details as (
select relnamespace::regnamespace::text as schema_name,
indrelid::regclass::text as table_name,
indexrelid::regclass::text as index_name,
unnest(indkey) as column_ordinal_position , -- WITH ORDINALITY would be nice here, didn't get it working.
generate_subscripts(indkey, 1) + 1 as column_position_in_index --
from pg_index
join pg_class on pg_class.oid = pg_index.indrelid
),
enriched_details as (
select basic_details.schema_name,
basic_details.table_name,
basic_details.index_name,
basic_details.column_ordinal_position,
basic_details.column_position_in_index,
columns.column_name,
columns.udt_name as column_type_name
from basic_details
join information_schema.columns as columns
on columns.table_schema = basic_details.schema_name
and columns.table_name = basic_details.table_name
and columns.ordinal_position = basic_details.column_ordinal_position
where schema_name not like 'pg_%'
)
select *,
-- https://postgrespro.com/blog/pgsql/4161264
coalesce(pg_index_column_has_property(index_name,column_position_in_index,'distance_orderable'), false) as supports_knn_searches,
coalesce(pg_index_column_has_property(index_name,column_position_in_index,'search_array'), false) as supports_in_searches,
coalesce(pg_index_column_has_property(index_name,column_position_in_index,'returnable'), false) as supports_index_only_scans,
(select indexdef
from pg_indexes
where pg_indexes.schemaname = enriched_details.schema_name
and pg_indexes.indexname = enriched_details.index_name) as index_definition
from enriched_details
order by supports_in_searches desc,
schema_name,
table_name,
index_name
timestamp type supports KNN with GiST indexes using the <-> operator created by the btree_gist extension.
You can check if a specific column of a specific index supports it, like this:
select pg_index_column_has_property('pgbench_history_mtime_idx'::regclass,1,'distance_orderable');
As best as I can tell, here's the state of play as of PG 14:
GiST indexes may support nearest-neighbor (K-NN) proximity <--> search, and always have.
SP-GiST added such support as of PG 12.
RUM indexes (not in core) also support K-NN.
In all cases, support is done in the operator class:
https://www.postgresql.org/docs/current/indexes-opclass.html
That's what determines if distance_orderable works for a specific data type on a specific kind of index. Built-in, some of the geometric and text vector types work out-of-the box. Other than that small set, many more types are supported via specific operator classes, such as:
https://www.postgresql.org/docs/current/btree-gist.html
https://www.postgresql.org/docs/current/pgtrgm.html
In the case of SP-GiST, there are a lot fewer types supported than with GiST, once you've installed btree_gist:
https://www.postgresql.org/docs/14/spgist-builtin-opclasses.html
It looks like text_opts and range_opts do not support proximity searches. However, for tsrange, etc., there are likely enough options with other tools.

how to restore data values that already converted into scientific notation in a table

so i have a problem where i inserted some values into my table that the value automatically converted into a scientific notation (ex: 8.24e+04) does anyone know how restore the original value or how keep the original values in the table?
i'm using double precision as data type for the column and i just noticed that double precision data type often convert long number values into scientific notation.
this is how table looks like after i inserted some values
test=# select * from demo;
| string_col | values |
|------------|-----------------------|
| Rocket | 123228435521 |
| Test | 13328422942213 |
| Power | 1.243343991231232e+15 |
| Pull | 1.233433459353712e+15 |
| Drag | 1244375399128 |
edb=# \d+ demo;
Table "public.demo"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
------------+-----------------------+-----------+----------+---------+----------+--------------+-------------
string_col | character varying(20) | | | | extended | |
values | double precision | | | | plain | |
Access method: heap
this just some dummy table i used to explain my question here.
You'll have to format the number using to_char if you want it in a specific format:
SELECT 31672516735473059594023526::double precision,
to_char(
31672516735473059594023526::double precision,
'999999999999999999999999999.99999999999999FM'
);
float8 │ to_char
═══════════════════════╪════════════════════════════
3.167251673547306e+25 │ 31672516735473058997862400
(1 row)
The result is not exact because the precision of double precision is not high enough.
If you don't want the rounding errors and want to avoid scientific notation as well, use the data type numeric instead.

PostgreSQL: what is the difference between 0 and (0)::smallint as default?

I'm playing around with column types and default values, and came across two columns, both smallint but with different defaults:
has 0
has (0)::smallint
When I change 2 to the value 1 has, both are the same and nothing seems changed. Of course, I googled to find the answer to my question, and found that :: is used for type casting. But I wonder, if it is a default value, why would you need type casting?
There is no difference, and the casting is not necessary.
# create table somesmallints (a smallint default 0, b smallint default 0::smallint, c smallint default (0)::smallint);
CREATE TABLE
# \d+ somesmallints;
Table "public.somesmallints"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+----------+-----------+----------+---------------+---------+--------------+-------------
a | smallint | | | 0 | plain | |
b | smallint | | | (0)::smallint | plain | |
c | smallint | | | (0)::smallint | plain | |
# select 0 = 0::smallint;
?column?
----------
t
(1 row)

Operator ~<~ in Postgres

(Originally part of this question, but it was bit irrelevant, so I decided to make it a question of its own.)
I cannot find what the operator ~<~ is. The Postgres manual only mentions ~ and similar operators here, but no sign of ~<~.
When fiddling in the psql console, I found out that these commands give the same results:
SELECT * FROM test ORDER BY name USING ~<~;
SELECT * FROM test ORDER BY name COLLATE "C";
And these gives the reverse ordering:
SELECT * FROM test ORDER BY name USING ~>~;
SELECT * FROM test ORDER BY name COLLATE "C" DESC;
Also some info on the tilde operators:
\do ~*~
List of operators
Schema | Name | Left arg type | Right arg type | Result type | Description
------------+------+---------------+----------------+-------------+-------------------------
pg_catalog | ~<=~ | character | character | boolean | less than or equal
pg_catalog | ~<=~ | text | text | boolean | less than or equal
pg_catalog | ~<~ | character | character | boolean | less than
pg_catalog | ~<~ | text | text | boolean | less than
pg_catalog | ~>=~ | character | character | boolean | greater than or equal
pg_catalog | ~>=~ | text | text | boolean | greater than or equal
pg_catalog | ~>~ | character | character | boolean | greater than
pg_catalog | ~>~ | text | text | boolean | greater than
pg_catalog | ~~ | bytea | bytea | boolean | matches LIKE expression
pg_catalog | ~~ | character | text | boolean | matches LIKE expression
pg_catalog | ~~ | name | text | boolean | matches LIKE expression
pg_catalog | ~~ | text | text | boolean | matches LIKE expression
(12 rows)
The 3rd and 4th rows is the operator I'm looking for, but the description is a bit insufficient for me.
~>=~, ~<=~, ~>~ and ~<~ are text pattern (or varchar, basically the same) operators, the counterparts of their respective siblings >=, <=, >and <. They sort character data strictly by their byte values, ignoring rules of any collation setting (as opposed to their siblings). This makes them faster, but also invalid for most languages / countries.
The "C" locale is effectively the same as no locale, meaning no collation rules. That explains why ORDER BY name USING ~<~ and ORDER BY name COLLATE "C" end up doing the same. The latter syntax variant should be preferred: more standard, less error-prone.
Detailed explanation in the last chapter of this related answer on dba.SE:
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
Note that ~~ / ~~* are Postgres operators for LIKE / ILIKE and barely related to the above. Similarly, !~~ / !~~* for NOT LIKE / NOT IILKE. (Use standard LIKE notation instead of these "internal" operators.)
Related:
~~ Operator In Postgres
Symfony2 Doctrine - ILIKE clause for PostgreSQL?
Find rows where text array contains value similar to input

Hashing a String to a Numeric Value in PostgreSQL

I need to Convert Strings stored in my Database to a Numeric value. Result can be Integer (preferred) or Bigint. This conversion is to be done at Database side in a PL/pgSQL function.
Can someone please point me to some algorithm or any API's that can be used to achieve this?
I have been searching for this on Google for hours now, could not find anything useful so far :(
Just keep the first 32 bits or 64 bits of the MD5 hash. Of course, it voids the main property of md5 (=the probability of collision being infinitesimal) but you'll still get a wide dispersion of values which presumably is good enough for your problem.
SQL functions derived from the other answers:
For bigint:
create function h_bigint(text) returns bigint as $$
select ('x'||substr(md5($1),1,16))::bit(64)::bigint;
$$ language sql;
For int:
create function h_int(text) returns int as $$
select ('x'||substr(md5($1),1,8))::bit(32)::int;
$$ language sql;
You can create a md5 hash value without problems:
select md5('hello, world');
This returns a string with a hex number.
Unfortunately there is no built-in function to convert hex to integer but as you are doing that in PL/pgSQL anyway, this might help:
https://stackoverflow.com/a/8316731/330315
PostgreSQL has hash functions for many column types. You can use hashtext if you want integer hash values, or hashtextextended if you prefer bigint hash values.
Note that hashXXXextended functions take one extra parameter for seed and 0 means that no seed will be used.
In the commit message the author Robert Haas says:
Just in case somebody wants a 64-bit hash value that is compatible
with the existing 32-bit hash values, make the low 32-bits of the
64-bit hash value match the 32-bit hash value when the seed is 0.
Example
postgres=# SELECT hashtextextended('test string of type text', 0);
hashtextextended
----------------------
-6578719834206879717
(1 row)
postgres=# SELECT hashtext('test string of type text');
hashtext
-------------
-1790427109
(1 row)
What about other data types?
You can check all available hash functions on your backend by checking the output for \df hash*. Below you can see the functions available in PG 14.0.
hanefi=# \df hash*
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+--------------------------+------------------+--------------------------+------
pg_catalog | hash_aclitem | integer | aclitem | func
pg_catalog | hash_aclitem_extended | bigint | aclitem, bigint | func
pg_catalog | hash_array | integer | anyarray | func
pg_catalog | hash_array_extended | bigint | anyarray, bigint | func
pg_catalog | hash_multirange | integer | anymultirange | func
pg_catalog | hash_multirange_extended | bigint | anymultirange, bigint | func
pg_catalog | hash_numeric | integer | numeric | func
pg_catalog | hash_numeric_extended | bigint | numeric, bigint | func
pg_catalog | hash_range | integer | anyrange | func
pg_catalog | hash_range_extended | bigint | anyrange, bigint | func
pg_catalog | hash_record | integer | record | func
pg_catalog | hash_record_extended | bigint | record, bigint | func
pg_catalog | hashbpchar | integer | character | func
pg_catalog | hashbpcharextended | bigint | character, bigint | func
pg_catalog | hashchar | integer | "char" | func
pg_catalog | hashcharextended | bigint | "char", bigint | func
pg_catalog | hashenum | integer | anyenum | func
pg_catalog | hashenumextended | bigint | anyenum, bigint | func
pg_catalog | hashfloat4 | integer | real | func
pg_catalog | hashfloat4extended | bigint | real, bigint | func
pg_catalog | hashfloat8 | integer | double precision | func
pg_catalog | hashfloat8extended | bigint | double precision, bigint | func
pg_catalog | hashhandler | index_am_handler | internal | func
pg_catalog | hashinet | integer | inet | func
pg_catalog | hashinetextended | bigint | inet, bigint | func
pg_catalog | hashint2 | integer | smallint | func
pg_catalog | hashint2extended | bigint | smallint, bigint | func
pg_catalog | hashint4 | integer | integer | func
pg_catalog | hashint4extended | bigint | integer, bigint | func
pg_catalog | hashint8 | integer | bigint | func
pg_catalog | hashint8extended | bigint | bigint, bigint | func
pg_catalog | hashmacaddr | integer | macaddr | func
pg_catalog | hashmacaddr8 | integer | macaddr8 | func
pg_catalog | hashmacaddr8extended | bigint | macaddr8, bigint | func
pg_catalog | hashmacaddrextended | bigint | macaddr, bigint | func
pg_catalog | hashname | integer | name | func
pg_catalog | hashnameextended | bigint | name, bigint | func
pg_catalog | hashoid | integer | oid | func
pg_catalog | hashoidextended | bigint | oid, bigint | func
pg_catalog | hashoidvector | integer | oidvector | func
pg_catalog | hashoidvectorextended | bigint | oidvector, bigint | func
pg_catalog | hashtext | integer | text | func
pg_catalog | hashtextextended | bigint | text, bigint | func
pg_catalog | hashtid | integer | tid | func
pg_catalog | hashtidextended | bigint | tid, bigint | func
pg_catalog | hashvarlena | integer | internal | func
pg_catalog | hashvarlenaextended | bigint | internal, bigint | func
(47 rows)
Caveats
If you want to have consistent hashes across different systems, make sure you have the same collation behaviour.
The built-in collatable data types are text, varchar, and char. If you have different collation options, you can see different hash values. For example the sort order of the strings ‘a-a’ and ‘a+a’ flipped in glibc 2.28 (Debian 10, RHEL 8) compared to earlier releases.
If you wish to use the hashes on the same machine, you need not worry, as long as you do not update glibc or use a different collation.
See more details at: https://www.citusdata.com/blog/2020/12/12/dont-let-collation-versions-corrupt-your-postgresql-indexes/
Must it be an integer? The pg_crypto module provides a number of standard hash functions (md5, sha1, etc). They all return bytea. I suppose you could throw away some bits and convert bytea to integer.
bigint is too small to store a cryptographic hash. The largest non-bytea binary type Pg supports is uuid. You could cast a digest to uuid like this:
select ('{'||encode( substring(digest('foobar','sha256') from 1 for 16), 'hex')||'}')::uuid;
uuid
--------------------------------------
c3ab8ff1-3720-e8ad-9047-dd39466b3c89
This is an implementation of Java's String.hashCode():
CREATE OR REPLACE FUNCTION hashCode(_string text) RETURNS INTEGER AS $$
DECLARE
val_ CHAR[];
h_ INTEGER := 0;
ascii_ INTEGER;
c_ char;
BEGIN
val_ = regexp_split_to_array(_string, '');
FOR i in 1 .. array_length(val_, 1)
LOOP
c_ := (val_)[i];
ascii_ := ascii(c_);
h_ = 31 * h_ + ascii_;
raise info '%: % = %', i, c_, h_;
END LOOP;
RETURN h_;
END;
$$ LANGUAGE plpgsql;