So I have a problem: some values I inserted into my table were automatically converted to scientific notation (e.g. 8.24e+04). Does anyone know how to restore the original values, or how to keep the original values in the table?
I'm using double precision as the data type for the column, and I just noticed that double precision often displays long numbers in scientific notation.
This is how the table looks after I inserted some values:
test=# select * from demo;
| string_col | values |
| Rocket | 123228435521 |
| Test | 13328422942213 |
| Power | 1.243343991231232e+15 |
| Pull | 1.233433459353712e+15 |
| Drag | 1244375399128 |
edb=# \d+ demo;
Table "public.demo"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
string_col | character varying(20) | | | | extended | |
values | double precision | | | | plain | |
Access method: heap
This is just a dummy table I used to explain my question here.

You'll have to format the number using to_char if you want it in a specific format:
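SELECT 31672516735473059594023526::double precision,
       -- format mask: 26 digit positions, FM suppresses the padding
       to_char(31672516735473059594023526::double precision, 'FM99999999999999999999999999');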
float8 │ to_char
3.167251673547306e+25 │ 31672516735473058997862400
(1 row)
The result is not exact because the precision of double precision is not high enough.
If you don't want the rounding errors and want to avoid scientific notation as well, use the data type numeric instead.
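As a minimal sketch of the numeric suggestion (the table name demo_numeric and the sample rows are made up for illustration), a numeric column is stored exactly and printed in plain decimal notation however long the number is:

```sql
-- Hypothetical copy of the demo table, using numeric instead of double precision.
-- "values" is a reserved word, so the column name must be double-quoted.
CREATE TABLE demo_numeric (
    string_col varchar(20),
    "values"   numeric
);

INSERT INTO demo_numeric (string_col, "values") VALUES
    ('Power', 1243343991231232),
    ('Big',   31672516735473059594023526);

-- numeric output never switches to scientific notation
SELECT string_col, "values" FROM demo_numeric;
```

If the values are whole numbers that fit into 64 bits, bigint would probably work just as well and is cheaper than numeric.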


Any way to find and delete almost similar records with SQL?

I have a table in Postgres DB, that has a lot of almost identical rows. For example:
1. 00Zicky_-_San_Pedro_Danilo_Vigorito_Remix
2. 00Zicky_-_San_Pedro__Danilo_Vigorito_Remix__
3. 0101_-_Try_To_Say__Strictlyjaz_Unit_Future_Rmx__
4. 0101_-_Try_To_Say__Strictlyjaz_Unit_Future_Rmx_
5. 01_-_Digital_Excitation_-_Brothers_Gonna_Work_it_Out__Piano_Mix__
6. 01_-_Digital_Excitation_-_Brothers_Gonna_Work_it_Out__Piano_Mix__
I'm thinking about writing a little golang script to remove the duplicates, but maybe SQL can do it?
Table definition:
\d+ songs
Table "public.songs"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
song_id | integer | | not null | nextval('songs_song_id_seq'::regclass) | plain | | |
song_name | character varying(250) | | not null | | extended | | |
fingerprinted | smallint | | | 0 | plain | | |
file_sha1 | bytea | | | | extended | | |
total_hashes | integer | | not null | 0 | plain | | |
date_created | timestamp without time zone | | not null | now() | plain | | |
date_modified | timestamp without time zone | | not null | now() | plain | | |
"pk_songs_song_id" PRIMARY KEY, btree (song_id)
Referenced by:
TABLE "fingerprints" CONSTRAINT "fk_fingerprints_song_id" FOREIGN KEY (song_id) REFERENCES songs(song_id) ON DELETE CASCADE
Access method: heap
I tried several methods to find duplicates, but those methods only find exact matches.
There is no operator that is essentially "A almost = B". (Well, there is full text search, but that seems a little excessive here.) If the only difference is the number of - and _ characters, just strip them out and compare the results. If the stripped names are equal, one of the rows is a duplicate. You can use the replace() function to remove them. So something like:
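-- keeps the row with the lowest song_id in each group of near-duplicates;
-- swap "delete" for "select s2.*" to preview what would be removed
delete
  from songs s2
 where exists ( select null
                  from songs s1
                 where s1.song_id < s2.song_id
                   and replace(replace(s1.song_name, '_',''),'-','') =
                       replace(replace(s2.song_name, '_',''),'-','')
              );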
If your table is large this will not be fast, but a functional index may help:
create index song_name_idx on songs
    (replace(replace(song_name, '_',''),'-',''));
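Before deleting anything, it may be worth checking which rows the normalization treats as the same song. A rough sketch against the songs table from the question (the output column aliases are just illustrative names):

```sql
-- Group songs by the name with '_' and '-' stripped and list groups
-- that contain more than one row.
SELECT replace(replace(song_name, '_', ''), '-', '') AS normalized_name,
       count(*)                                      AS copies,
       min(song_id)                                  AS lowest_song_id,
       array_agg(song_id ORDER BY song_id)           AS all_song_ids
FROM songs
GROUP BY 1
HAVING count(*) > 1
ORDER BY copies DESC, normalized_name;
```

Keep in mind that, per the table definition, fingerprints references songs with ON DELETE CASCADE, so deleting a duplicate song also deletes its fingerprint rows.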

What Postgres 13 index types support distance searches?

Original Question
We've had great results using a K-NN search with a GiST index with gist_trgm_ops. Pure magic. I've got other situations, with other datatypes like timestamp where distance functions would be quite useful. If I didn't dream it, this is, or was, available through pg_catalog. Looking around, I can't find a way to search on indexes by such properties. I think what I'm after, in this case, is AMPROP_DISTANCE_ORDERABLE under-the-hood.
Just checked, and pg_am did have a lot more attributes than it does now, prior to 9.6.
Is there another way to figure out what options various indexes have with a catalog search?
jjanes' answer inspired me to look at the system information functions some more, and to spend a day in the pg_catalog tables. The catalogs for indexes and operators are complicated. The system information functions are a big help. This piece proved super useful for getting a handle on things:
I think the conclusion is "no, you can't readily figure out what data types and indexes support proximity searches." The relevant attribute is a property of a column in a specific index. However, it looks like nearest-neighbor searching requires a GiST index, and that there are readily-available index operator classes to add K-NN searching to a huge range of common types. Happy for corrections on these conclusions, or the details below.
Built-in Distance Support
From various bits of the docs, it sounds like there are distance (proximity, nearest neighbor, K-NN) operators for GiST indexes on a handful of built-in geometric types.
B-tree Operator Classes
Not listed as such in the docs, but visible with this query:
select am.amname AS index_method
, opc.opcname AS opclass_name
, opc.opcintype::regtype AS indexed_type
, opc.opcdefault AS is_default
from pg_am am
, pg_opclass opc
where opc.opcmethod = am.oid
and am.amname = 'btree'
order by 1,2;
B-tree GiST Distance Support
I guess a B-tree is a special case of a GiST, and there's a B-tree operator class to match. The docs say these native types are supported:
timestamp with time zone
timestamp without time zone
time without time zone
BRIN Built-in Operator Classes
There are over 70 listed in the internals docs.
GIN Built-in Operator Classes
Alternative Text Opts
There are special operator classes for text comparisons made character-by-character, rather than through a collation. Or so the docs say:
Beyond this, the included pg_trgm module includes operators for GIN and GiST, with the GiST version optimizing <->. I think this shows up as:
Note: Postgres 14 modifies pg_trgm to allow you to adjust the "signature length" for the index entry. Longer is possibly more accurate, shorter signatures are smaller on disk. If you've been using pg_trgm, it might be worth experimenting with the signature length in PG 14.
SP-GiST Built-in Operator Classes
pg_operator search
Here's a search on pg_operator to look for matches starting from the <-> operator itself:
select oprnamespace::regnamespace::text as schema_name,
oprowner::regrole as owner,
oprname as operator,
oprleft::regtype as left,
oprright::regtype as right,
oprresult::regtype as result,
oprcom::regoperator as commutator
from pg_operator
where oprname = '<->'
order by 1
Output from one of our servers:
| schema_name | owner | operator | left | right | result | commutator |
| extensions | postgres | <-> | text | text | real | <->(text,text) |
| extensions | postgres | <-> | money | money | money | <->(money,money) |
| extensions | postgres | <-> | date | date | integer | <->(date,date) |
| extensions | postgres | <-> | real | real | real | <->(real,real) |
| extensions | postgres | <-> | double precision | double precision | double precision | <->(double precision,double precision) |
| extensions | postgres | <-> | smallint | smallint | smallint | <->(smallint,smallint) |
| extensions | postgres | <-> | integer | integer | integer | <->(integer,integer) |
| extensions | postgres | <-> | bigint | bigint | bigint | <->(bigint,bigint) |
| extensions | postgres | <-> | interval | interval | interval | <->(interval,interval) |
| extensions | postgres | <-> | oid | oid | oid | <->(oid,oid) |
| extensions | postgres | <-> | time without time zone | time without time zone | interval | <->(time without time zone,time without time zone) |
| extensions | postgres | <-> | timestamp without time zone | timestamp without time zone | interval | <->(timestamp without time zone,timestamp without time zone) |
| extensions | postgres | <-> | timestamp with time zone | timestamp with time zone | interval | <->(timestamp with time zone,timestamp with time zone) |
| pg_catalog | postgres | <-> | box | box | double precision | <->(box,box) |
| pg_catalog | postgres | <-> | path | path | double precision | <->(path,path) |
| pg_catalog | postgres | <-> | line | line | double precision | <->(line,line) |
| pg_catalog | postgres | <-> | lseg | lseg | double precision | <->(lseg,lseg) |
| pg_catalog | postgres | <-> | polygon | polygon | double precision | <->(polygon,polygon) |
| pg_catalog | postgres | <-> | circle | circle | double precision | <->(circle,circle) |
| pg_catalog | postgres | <-> | point | circle | double precision | <->(circle,point) |
| pg_catalog | postgres | <-> | circle | point | double precision | <->(point,circle) |
| pg_catalog | postgres | <-> | point | polygon | double precision | <->(polygon,point) |
| pg_catalog | postgres | <-> | polygon | point | double precision | <->(point,polygon) |
| pg_catalog | postgres | <-> | circle | polygon | double precision | <->(polygon,circle) |
| pg_catalog | postgres | <-> | polygon | circle | double precision | <->(circle,polygon) |
| pg_catalog | postgres | <-> | point | point | double precision | <->(point,point) |
| pg_catalog | postgres | <-> | box | line | double precision | <->(line,box) |
| pg_catalog | postgres | <-> | tsquery | tsquery | tsquery | 0 |
| pg_catalog | postgres | <-> | line | box | double precision | <->(box,line) |
| pg_catalog | postgres | <-> | point | line | double precision | <->(line,point) |
| pg_catalog | postgres | <-> | line | point | double precision | <->(point,line) |
| pg_catalog | postgres | <-> | point | lseg | double precision | <->(lseg,point) |
| pg_catalog | postgres | <-> | lseg | point | double precision | <->(point,lseg) |
| pg_catalog | postgres | <-> | point | box | double precision | <->(box,point) |
| pg_catalog | postgres | <-> | box | point | double precision | <->(point,box) |
| pg_catalog | postgres | <-> | lseg | line | double precision | <->(line,lseg) |
| pg_catalog | postgres | <-> | line | lseg | double precision | <->(lseg,line) |
| pg_catalog | postgres | <-> | lseg | box | double precision | <->(box,lseg) |
| pg_catalog | postgres | <-> | box | lseg | double precision | <->(lseg,box) |
| pg_catalog | postgres | <-> | point | path | double precision | <->(path,point) |
| pg_catalog | postgres | <-> | path | point | double precision | <->(point,path) |
Did I miss any index opts worth knowing about?
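One more catalog angle, offered as a sketch rather than a definitive answer: pg_amop records, for each operator family, which operators are usable for searching and which for ordering. As far as I can tell, distance operators such as <-> are registered with amoppurpose = 'o', so a query along these lines should list the index method and type combinations that can order by distance. It only sees operator classes that are actually installed, so extensions like btree_gist or pg_trgm have to be created first:

```sql
-- List ordering ("distance") operators known to each index access method.
SELECT am.amname                   AS index_method,
       opf.opfname                 AS operator_family,
       amop.amoplefttype::regtype  AS left_type,
       amop.amoprighttype::regtype AS right_type,
       amop.amopopr::regoperator   AS ordering_operator
FROM pg_amop amop
JOIN pg_am am        ON am.oid  = amop.amopmethod
JOIN pg_opfamily opf ON opf.oid = amop.amopfamily
WHERE amop.amoppurpose = 'o'   -- 'o' = ordering, 's' = search
ORDER BY 1, 2, 3, 4;
```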
Checking Out Live Indexes
Here's a longer-than-it-should-be-because-I-still-find-the-catalogs-confusing query to pull out the columns from each user index, and figure out their more interesting properties. For a nice, short catalog search of much utility, see
with
basic_details as (
    select relnamespace::regnamespace::text as schema_name,
           indrelid::regclass::text as table_name,
           indexrelid::regclass::text as index_name,
           unnest(indkey) as column_ordinal_position, -- WITH ORDINALITY would be nice here, didn't get it working.
           generate_subscripts(indkey, 1) + 1 as column_position_in_index -- indkey subscripts start at 0
      from pg_index
      join pg_class on pg_class.oid = pg_index.indrelid
),
enriched_details as (
    select basic_details.schema_name,
           basic_details.table_name,
           basic_details.index_name,
           basic_details.column_position_in_index,
           columns.udt_name as column_type_name
      from basic_details
      join information_schema.columns as columns
        on columns.table_schema = basic_details.schema_name
       and columns.table_name = basic_details.table_name
       and columns.ordinal_position = basic_details.column_ordinal_position
     where schema_name not like 'pg_%'
)
select *,
       coalesce(pg_index_column_has_property(index_name,column_position_in_index,'distance_orderable'), false) as supports_knn_searches,
       coalesce(pg_index_column_has_property(index_name,column_position_in_index,'search_array'), false) as supports_in_searches,
       coalesce(pg_index_column_has_property(index_name,column_position_in_index,'returnable'), false) as supports_index_only_scans,
       (select indexdef
          from pg_indexes
         where pg_indexes.schemaname = enriched_details.schema_name
           and pg_indexes.indexname = enriched_details.index_name) as index_definition
  from enriched_details
 order by supports_in_searches desc,
          schema_name,
          table_name,
          index_name;

The timestamp type supports KNN with GiST indexes using the <-> operator created by the btree_gist extension.
You can check if a specific column of a specific index supports it, like this:
select pg_index_column_has_property('pgbench_history_mtime_idx'::regclass,1,'distance_orderable');
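To make that concrete, here is a small sketch. The events table, its columns and the index name are hypothetical, and it assumes the btree_gist extension is available on the server:

```sql
-- btree_gist supplies GiST operator classes (and <->) for scalar types.
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Hypothetical table with a timestamp column.
CREATE TABLE events (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    happened_at timestamp NOT NULL
);

CREATE INDEX events_happened_at_gist ON events USING gist (happened_at);

-- Ask the index whether its first column can be ordered by distance.
SELECT pg_index_column_has_property('events_happened_at_gist'::regclass, 1, 'distance_orderable');

-- K-NN style query: the five rows closest in time to a given moment.
SELECT id, happened_at
FROM events
ORDER BY happened_at <-> timestamp '2021-06-01 12:00:00'
LIMIT 5;
```

Running EXPLAIN on the last query should show whether the planner actually picks the GiST index for the ordering.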
As best as I can tell, here's the state of play as of PG 14:
GiST indexes may support nearest-neighbor (K-NN) proximity <-> search, and have since PG 9.1.
SP-GiST added such support as of PG 12.
RUM indexes (not in core) also support K-NN.
In all cases, support is done in the operator class:
That's what determines if distance_orderable works for a specific data type on a specific kind of index. Built-in, some of the geometric and text vector types work out-of-the box. Other than that small set, many more types are supported via specific operator classes, such as:
In the case of SP-GiST, there are a lot fewer types supported than with GiST, once you've installed btree_gist:
It looks like text_opts and range_opts do not support proximity searches. However, for tsrange, etc., there are likely enough options with other tools.

GIST index creation too slow on PostgreSQL

I have a database in PostgreSQL with the following structure:
Column | Type | Collation | Nullable | Default
vessel_hash | integer | | not null | nextval('samplecol_vessel_hash_seq'::regclass)
status | character varying(50) | | |
station | character varying(50) | | |
speed | character varying(10) | | |
longitude | numeric(12,8) | | |
latitude | numeric(12,8) | | |
course | character varying(50) | | |
heading | character varying(50) | | |
timestamp | character varying(50) | | |
the_geom | geometry | | |
Check constraints:
"enforce_dims_the_geom" CHECK (st_ndims(the_geom) = 2)
"enforce_geotype_geom" CHECK (geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL)
"enforce_srid_the_geom" CHECK (st_srid(the_geom) = 4326)
The database contains ~146.000.000 records and the size of table that contains the data is:
public | samplecol | table | postgres | 31 GB |
I try to create a GIST index on the geometry field the_geom with this command:
create index samplecol_the_geom_gist on samplecol using gist (the_geom);
but it takes too long. It has been running for 2 hours already.
Based on the question "Slow indexing of 300GB Postgis table", before creating the index I executed this in the psql console:
ALTER SYSTEM SET maintenance_work_mem = '1GB';
SELECT pg_reload_conf();
(1 row)
But index creation still takes too long. Does anyone know why? And how to fix this?
I am afraid you'll have to sit it out.
Apart from high maintenance_work_mem, there is not really a tuning option.
Increasing max_wal_size will help somewhat, since you will get fewer checkpoints.
If you can't live with a SHARE lock on the table (which blocks all writes) for that long, try CREATE INDEX CONCURRENTLY, which will be even slower, but won't block concurrent database activity.
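Put together, the suggestions above could look roughly like this. The memory and WAL sizes are placeholder values, not recommendations, and the progress view assumes PostgreSQL 12 or later:

```sql
-- More memory for this session's index build (placeholder value).
SET maintenance_work_mem = '2GB';

-- Fewer checkpoints during the build (placeholder value).
-- ALTER SYSTEM cannot run inside a transaction block.
ALTER SYSTEM SET max_wal_size = '16GB';
SELECT pg_reload_conf();

-- Slower than a plain CREATE INDEX, but does not block writes.
-- Must also be run outside a transaction block.
CREATE INDEX CONCURRENTLY samplecol_the_geom_gist
    ON samplecol USING gist (the_geom);

-- From a second session: watch how far the build has progressed.
SELECT pid, phase, blocks_done, blocks_total, tuples_done, tuples_total
FROM pg_stat_progress_create_index;
```

If a concurrent build fails or is cancelled, it leaves behind an invalid index that has to be dropped before retrying.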

How to select rows in a postgres table based on a date and store it in a new table?

I have a postgres table which has some data. Each row has a date associated with it. I want to extract the rows whose date falls in April. Here is a CSV version of my postgres table data:
0,2018-02-10 11:52:59.342269+00:00,BEM,,COD,6.0,23.0,11.75,0.0,,,,,,,,
1,2018-02-10 11:53:04.006971+00:00,VER,,KOD,6.0,23.0,4.58,0.0,,,,,,,,
2,2018-03-25 20:28:36.186015+00:00,RET,,POL,7.0,26.0,9.83,0.0,,86.328,5.0,4.33,15.33,0.0,23.0,
3,2018-03-25 20:28:59.155453+00:00,ASR,,VOL,5.0,14.0,2.67,0.0,,52.406,12.0,2.17,3.17,0.0,28.0,
4,2018-04-01 13:16:44.472119+00:00,RED,,SEW,6.0,14.0,2.77,0.0,,52.766,2.0,3.25,2.29,0.0,1.0,0.0
5,2018-04-01 13:16:48.478708+00:00,RED,,POL,6.0,14.0,4.065,0.0,,52.766,1.0,6.63,1.5,0.0,1.0,0.0
6,2018-04-06 21:00:44.769702+00:00,GOK,,FDE,4.0,22.0,3.08,0.0,,54.406,8.0,3.33,2.83,0.0,19.0,0.0
7,2018-04-06 21:01:07.211395+00:00,WER,,FDE,3.0,3.0,9.28,0.0,,0.346,2.0,10.54,8.02,0.0,33.0,0.0
8,2018-04-13 11:18:08.411550+00:00,DER,,CVE,14.0,14.0,7.88,0.0,,50.545,2.0,6.17,9.59,0.0,1.0,0.0
9,2018-04-13 11:18:12.420974+00:00,RTR,,BOL,14.0,14.0,1.345,0.0,,50.545,1.0,2.26,0.43,0.0,1.0,0.0
So I want only the rows with April dates, so that I end up with a table which looks something like this:
4,2018-04-01 13:16:44.472119+00:00,RED,,SEW,6.0,14.0,2.77,0.0,,52.766,2.0,3.25,2.29,0.0,1.0,0.0
5,2018-04-01 13:16:48.478708+00:00,RED,,POL,6.0,14.0,4.065,0.0,,52.766,1.0,6.63,1.5,0.0,1.0,0.0
6,2018-04-06 21:00:44.769702+00:00,GOK,,FDE,4.0,22.0,3.08,0.0,,54.406,8.0,3.33,2.83,0.0,19.0,0.0
7,2018-04-06 21:01:07.211395+00:00,WER,,FDE,3.0,3.0,9.28,0.0,,0.346,2.0,10.54,8.02,0.0,33.0,0.0
8,2018-04-13 11:18:08.411550+00:00,DER,,CVE,14.0,14.0,7.88,0.0,,50.545,2.0,6.17,9.59,0.0,1.0,0.0
9,2018-04-13 11:18:12.420974+00:00,RTR,,BOL,14.0,14.0,1.345,0.0,,50.545,1.0,2.26,0.43,0.0,1.0,0.0
Now if I try to extract a particular date with the query below
select * from metrics_data where date = 2018-04-13;
I get the error message
No operator matches the given name and argument type(s). You might need to add explicit type casts.
How do I get the rows for the month of April and store it in a new table say april_data?
Below is the structure of my existing table
Column | Type | Modifiers | Storage | Stats target | Description
date | timestamp with time zone | | plain | |
location | character varying(255) | | extended | |
device | character varying(255) | | extended | |
provider | character varying(255) | | extended | |
cpu | double precision | | plain | |
mem | double precision | | plain | |
load | double precision | | plain | |
drops | double precision | | plain | |
id | integer | | plain | |
latency | double precision | | plain | |
gw_latency | double precision | | plain | |
upload | double precision | | plain | |
download | double precision | | plain | |
sap_drops | double precision | | plain | |
sap_latency | double precision | | plain | |
alert_id | double precision | | plain | |
The type of the date column in your table is timestamp with time zone, which is displayed as YYYY-MM-DD HH24:MI:SS.US plus a time zone offset. In your query the right-hand side, 2018-04-13, is not quoted, so it is evaluated as the integer expression 2018 - 4 - 13. You end up comparing timestamp with time zone = integer, and that throws the error.
So, to fix it, make both sides of the comparison the same type.
In your case I suggest the following:
Match exact 1 day.
select * from metrics_data where date(date) = '2018-04-13';
Match within 1 month.
select * from metrics_data where date BETWEEN '2018-04-01 00:00:00' AND '2018-04-30 23:59:59.999';
select * from metrics_data where date(date) BETWEEN '2018-04-01' AND '2018-04-30';
select * from metrics_data where to_char(date,'YYYY-MM') = '2018-04';
Match only April.
select * from metrics_data where to_char(date,'MM') = '04';
select * from metrics_data where extract(month from date) = 4;
Hopefully this answer will help you.
You need single quotes around the string literal.
PostgreSQL will automatically cast it to the correct data type (timestamp with time zone).
You could use the extract function to select only the dates from April:
SELECT * FROM yourtable WHERE extract(month FROM date) = 4;
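To also cover the "store it in a new table" part of the question, one option is CREATE TABLE ... AS. A minimal sketch, assuming april_data does not exist yet and that April 2018 is the month wanted:

```sql
-- Copy all April 2018 rows into a new table.
-- The half-open range (>= start, < next month) also catches rows in the
-- last fraction of a second of April 30 and can use an index on "date".
CREATE TABLE april_data AS
SELECT *
FROM metrics_data
WHERE date >= '2018-04-01'
  AND date <  '2018-05-01';
```

Since the column is timestamp with time zone, the two boundary literals are interpreted in the session's TimeZone setting. To get April of every year instead, the WHERE clause could be replaced with extract(month FROM date) = 4.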

Bit masking in Postgres

I have this query
SELECT * FROM "functions" WHERE (models_mask & 1 > 0)
and then I get the following error:
PGError: ERROR: operator does not exist: character varying & integer
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
models_mask is supposed to be an integer in the database. How can I fix this?
Thank you!
Check out the docs on bit operators for Pg.
Essentially, & only works on two operands of the same type (usually bit or int), so models_mask will have to be CAST from varchar to something reasonable like bit or int:
models_mask::int & 1 -or- models_mask::int::bit & b'1'
You can find out what types an operator works with using \doS in psql
pg_catalog | & | bigint | bigint | bigint | bitwise and
pg_catalog | & | bit | bit | bit | bitwise and
pg_catalog | & | inet | inet | inet | bitwise and
pg_catalog | & | integer | integer | integer | bitwise and
pg_catalog | & | smallint | smallint | smallint | bitwise and
Here is a quick example for more information
# SELECT 11 & 15 AS int, b'1011' & b'1111' AS bin INTO foo;
# \d foo
Table ""
Column | Type | Modifiers
int | integer |
bin | "bit" |
# SELECT * FROM foo;
int | bin
11 | 1011
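Applied to the query from the question, the cast might look like the sketch below. The first statement is only a sanity check of how the column is really declared, since the error message says character varying even though the column is expected to be an integer:

```sql
-- Check the declared type of the column.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'functions'
  AND column_name = 'models_mask';

-- Cast the varchar column to int before applying the bit mask.
SELECT *
FROM "functions"
WHERE (models_mask::int & 1) > 0;
```

The cast fails if any row holds a non-numeric string, so changing the column type to integer is probably the cleaner long-term fix.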