how to restore data values that already converted into scientific notation in a table - postgresql

so i have a problem where i inserted some values into my table that the value automatically converted into a scientific notation (ex: 8.24e+04) does anyone know how restore the original value or how keep the original values in the table?
i'm using double precision as data type for the column and i just noticed that double precision data type often convert long number values into scientific notation.
this is how table looks like after i inserted some values
test=# select * from demo;
| string_col | values |
|------------|-----------------------|
| Rocket | 123228435521 |
| Test | 13328422942213 |
| Power | 1.243343991231232e+15 |
| Pull | 1.233433459353712e+15 |
| Drag | 1244375399128 |
edb=# \d+ demo;
Table "public.demo"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
------------+-----------------------+-----------+----------+---------+----------+--------------+-------------
string_col | character varying(20) | | | | extended | |
values | double precision | | | | plain | |
Access method: heap
this just some dummy table i used to explain my question here.

You'll have to format the number using to_char if you want it in a specific format:
SELECT 31672516735473059594023526::double precision,
to_char(
31672516735473059594023526::double precision,
'999999999999999999999999999.99999999999999FM'
);
float8 │ to_char
═══════════════════════╪════════════════════════════
3.167251673547306e+25 │ 31672516735473058997862400
(1 row)
The result is not exact because the precision of double precision is not high enough.
If you don't want the rounding errors and want to avoid scientific notation as well, use the data type numeric instead.

Related

Any way to find and delete almost similar records with SQL?

I have a table in Postgres DB, that has a lot of almost identical rows. For example:
1. 00Zicky_-_San_Pedro_Danilo_Vigorito_Remix
2. 00Zicky_-_San_Pedro__Danilo_Vigorito_Remix__
3. 0101_-_Try_To_Say__Strictlyjaz_Unit_Future_Rmx__
4. 0101_-_Try_To_Say__Strictlyjaz_Unit_Future_Rmx_
5. 01_-_Digital_Excitation_-_Brothers_Gonna_Work_it_Out__Piano_Mix__
6. 01_-_Digital_Excitation_-_Brothers_Gonna_Work_it_Out__Piano_Mix__
I think about to writing a little golang script to remove duplicates, but maybe SQL can do it?
Table definition:
\d+ songs
Table "public.songs"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
---------------+-----------------------------+-----------+----------+----------------------------------------+----------+-------------+--------------+-------------
song_id | integer | | not null | nextval('songs_song_id_seq'::regclass) | plain | | |
song_name | character varying(250) | | not null | | extended | | |
fingerprinted | smallint | | | 0 | plain | | |
file_sha1 | bytea | | | | extended | | |
total_hashes | integer | | not null | 0 | plain | | |
date_created | timestamp without time zone | | not null | now() | plain | | |
date_modified | timestamp without time zone | | not null | now() | plain | | |
Indexes:
"pk_songs_song_id" PRIMARY KEY, btree (song_id)
Referenced by:
TABLE "fingerprints" CONSTRAINT "fk_fingerprints_song_id" FOREIGN KEY (song_id) REFERENCES songs(song_id) ON DELETE CASCADE
Access method: heap
Tried several methods to find duplicates, but that methods search only for exact similarity.
There is no operator which is essentially A almost = B. (Well there is full text search, but that seems to be a little excessive here.) If the only difference is the number of - and _ then just get rid of them and compare the the resulting difference. If they are equal, then one is a duplicate. You can use the replace() function to remove them. So something like: (see demo)
delete
from songs s2
where exists ( select null
from songs s1
where s1.song_id < s2.song_id
and replace(replace(s1.name, '_',''),'-','') =
replace(replace(s2.name, '_',''),'-','')
);
If your table is large this will not be fast, but a functional index may help:
create index song_name_idx on songs
(replace(replace(name, '_',''),'-',''));

What Postgres 13 index types support distance searches?

Original Question
We've had great results using a K-NN search with a GiST index with gist_trgm_ops. Pure magic. I've got other situations, with other datatypes like timestamp where distance functions would be quite useful. If I didn't dream it, this is, or was, available through pg_catalog. Looking around, I can't find a way to search on indexes by such properties. I think what I'm after, in this case, is AMPROP_DISTANCE_ORDERABLE under-the-hood.
Just checked, and pg_am did have a lot more attributes than it does now, prior to 9.6.
Is there another way to figure out what options various indexes have with a catalog search?
Catalogs
jjanes' answer inspired me to look at the system information functions some more, and to spend a day in the pg_catalog tables. The catalogs for indexes and operators are complicated. The system information functions are a big help. This piece proved super useful for getting a handle on things:
https://postgrespro.com/blog/pgsql/4161264
I think the conclusion is "no, you can't readily figure out what data types and indexes support proximity searches." The relevant attribute is a property of a column in a specific index. However, it looks like nearest-neighbor searching requires a GiST index, and that there are readily-available index operator classes to add K-NN searching to a huge range of common types. Happy for corrections on these conclusions, or the details below.
Built-in Distance Support
https://www.postgresql.org/docs/current/gist-builtin-opclasses.html
From various bits of the docs, it sounds like there are distance (proximity, nearest neighbor, K-NN) operators for GiST indexes on a handful of built-in geometric types.
box
circle
point
poly
B-tree Operator Classes
Not listed as such in the docs, but visible with this query:
select am.amname AS index_method
, opc.opcname AS opclass_name
, opc.opcintype::regtype AS indexed_type
, opc.opcdefault AS is_default
from pg_am am
, pg_opclass opc
where opc.opcmethod = am.oid
and am.amname = 'btree'
order by 1,2;
B-tree GiST Distance Support
https://www.postgresql.org/docs/current/btree-gist.html
I guess a B-tree is a special case of a GiST, and there's a B-tree operator class to match. The docs say these native types are supported:
int2
int4
int8
float4
float8
timestamp with time zone
timestamp without time zone
time without time zone
date
interval
oid
money
BRIN Built-in Operator Classes
https://www.postgresql.org/docs/current/brin-builtin-opclasses.html
There are over 70 listed in the internals docs.
GIN Built-in Operator Classes
https://www.postgresql.org/docs/12/gin-builtin-opclasses.html
array_ops
jsonb_ops
jsonb_path_ops
tsvector_ops
Alternative Text Opts
https://www.postgresql.org/docs/current/indexes-opclass.html
There are special operator classes for text comparisons made character-by-character, rather than through a collation. Or so the docs say:
text_pattern_ops
varchar_pattern_ops
bpchar_pattern_ops
pg_trgm
Beyond this, the included pg_trgm module includes operators for GIN and GiST, with the GiST version optimizing <->. I think this shows up as:
text
Note: Postgres 14 modifies pg_trgm to allow you to adjust the "signature length" for the index entry. Longer is possibly more accurate, shorter signatures are smaller on disk. If you've been using pg_trgm, it might be worth experimenting with the signature length in PG 14.
https://www.postgresql.org/docs/current/pgtrgm.html
SP-GiST Built-in Operator Classes
box_ops
kd_point_ops
network_ops
poly_ops
quad_point_ops
range_ops
text_ops
pg_operator search
Here's a search on pg_operator to look for matches starting from the <-> operator itself:
select oprnamespace::regnamespace::text as schema_name,
oprowner::regrole as owner,
oprname as operator,
oprleft::regtype as left,
oprright::regtype as right,
oprresult::regtype as result,
oprcom::regoperator as commutator
from pg_operator
where oprname = '<->'
order by 1
Output from one of our severs:
| schema_name | owner | operator | left | right | result | commutator |
+-------------+----------+----------+-----------------------------+-----------------------------+------------------+--------------------------------------------------------------+
| extensions | postgres | <-> | text | text | real | <->(text,text) |
| extensions | postgres | <-> | money | money | money | <->(money,money) |
| extensions | postgres | <-> | date | date | integer | <->(date,date) |
| extensions | postgres | <-> | real | real | real | <->(real,real) |
| extensions | postgres | <-> | double precision | double precision | double precision | <->(double precision,double precision) |
| extensions | postgres | <-> | smallint | smallint | smallint | <->(smallint,smallint) |
| extensions | postgres | <-> | integer | integer | integer | <->(integer,integer) |
| extensions | postgres | <-> | bigint | bigint | bigint | <->(bigint,bigint) |
| extensions | postgres | <-> | interval | interval | interval | <->(interval,interval) |
| extensions | postgres | <-> | oid | oid | oid | <->(oid,oid) |
| extensions | postgres | <-> | time without time zone | time without time zone | interval | <->(time without time zone,time without time zone) |
| extensions | postgres | <-> | timestamp without time zone | timestamp without time zone | interval | <->(timestamp without time zone,timestamp without time zone) |
| extensions | postgres | <-> | timestamp with time zone | timestamp with time zone | interval | <->(timestamp with time zone,timestamp with time zone) |
| pg_catalog | postgres | <-> | box | box | double precision | <->(box,box) |
| pg_catalog | postgres | <-> | path | path | double precision | <->(path,path) |
| pg_catalog | postgres | <-> | line | line | double precision | <->(line,line) |
| pg_catalog | postgres | <-> | lseg | lseg | double precision | <->(lseg,lseg) |
| pg_catalog | postgres | <-> | polygon | polygon | double precision | <->(polygon,polygon) |
| pg_catalog | postgres | <-> | circle | circle | double precision | <->(circle,circle) |
| pg_catalog | postgres | <-> | point | circle | double precision | <->(circle,point) |
| pg_catalog | postgres | <-> | circle | point | double precision | <->(point,circle) |
| pg_catalog | postgres | <-> | point | polygon | double precision | <->(polygon,point) |
| pg_catalog | postgres | <-> | polygon | point | double precision | <->(point,polygon) |
| pg_catalog | postgres | <-> | circle | polygon | double precision | <->(polygon,circle) |
| pg_catalog | postgres | <-> | polygon | circle | double precision | <->(circle,polygon) |
| pg_catalog | postgres | <-> | point | point | double precision | <->(point,point) |
| pg_catalog | postgres | <-> | box | line | double precision | <->(line,box) |
| pg_catalog | postgres | <-> | tsquery | tsquery | tsquery | 0 |
| pg_catalog | postgres | <-> | line | box | double precision | <->(box,line) |
| pg_catalog | postgres | <-> | point | line | double precision | <->(line,point) |
| pg_catalog | postgres | <-> | line | point | double precision | <->(point,line) |
| pg_catalog | postgres | <-> | point | lseg | double precision | <->(lseg,point) |
| pg_catalog | postgres | <-> | lseg | point | double precision | <->(point,lseg) |
| pg_catalog | postgres | <-> | point | box | double precision | <->(box,point) |
| pg_catalog | postgres | <-> | box | point | double precision | <->(point,box) |
| pg_catalog | postgres | <-> | lseg | line | double precision | <->(line,lseg) |
| pg_catalog | postgres | <-> | line | lseg | double precision | <->(lseg,line) |
| pg_catalog | postgres | <-> | lseg | box | double precision | <->(box,lseg) |
| pg_catalog | postgres | <-> | box | lseg | double precision | <->(lseg,box) |
| pg_catalog | postgres | <-> | point | path | double precision | <->(path,point) |
| pg_catalog | postgres | <-> | path | point | double precision | <->(point,path) |
+-------------+----------+----------+-----------------------------+-----------------------------+------------------+--------------------------------------------------------------+
Did I miss any index opts worth knowing about?
Checking Out Live Indexes
Here's a longer-than-it-should-be-because-I-still-find-the-catalogs-confusing query to pull out the columns from each user index, and figure out their more interesting properties. For a nice, short catalog search of much utility, see https://dba.stackexchange.com/questions/186944/how-to-list-all-the-indexes-along-with-their-type-btree-brin-hash-etc
with
basic_details as (
select relnamespace::regnamespace::text as schema_name,
indrelid::regclass::text as table_name,
indexrelid::regclass::text as index_name,
unnest(indkey) as column_ordinal_position , -- WITH ORDINALITY would be nice here, didn't get it working.
generate_subscripts(indkey, 1) + 1 as column_position_in_index --
from pg_index
join pg_class on pg_class.oid = pg_index.indrelid
),
enriched_details as (
select basic_details.schema_name,
basic_details.table_name,
basic_details.index_name,
basic_details.column_ordinal_position,
basic_details.column_position_in_index,
columns.column_name,
columns.udt_name as column_type_name
from basic_details
join information_schema.columns as columns
on columns.table_schema = basic_details.schema_name
and columns.table_name = basic_details.table_name
and columns.ordinal_position = basic_details.column_ordinal_position
where schema_name not like 'pg_%'
)
select *,
-- https://postgrespro.com/blog/pgsql/4161264
coalesce(pg_index_column_has_property(index_name,column_position_in_index,'distance_orderable'), false) as supports_knn_searches,
coalesce(pg_index_column_has_property(index_name,column_position_in_index,'search_array'), false) as supports_in_searches,
coalesce(pg_index_column_has_property(index_name,column_position_in_index,'returnable'), false) as supports_index_only_scans,
(select indexdef
from pg_indexes
where pg_indexes.schemaname = enriched_details.schema_name
and pg_indexes.indexname = enriched_details.index_name) as index_definition
from enriched_details
order by supports_in_searches desc,
schema_name,
table_name,
index_name
timestamp type supports KNN with GiST indexes using the <-> operator created by the btree_gist extension.
You can check if a specific column of a specific index supports it, like this:
select pg_index_column_has_property('pgbench_history_mtime_idx'::regclass,1,'distance_orderable');
As best as I can tell, here's the state of play as of PG 14:
GiST indexes may support nearest-neighbor (K-NN) proximity <--> search, and always have.
SP-GiST added such support as of PG 12.
RUM indexes (not in core) also support K-NN.
In all cases, support is done in the operator class:
https://www.postgresql.org/docs/current/indexes-opclass.html
That's what determines if distance_orderable works for a specific data type on a specific kind of index. Built-in, some of the geometric and text vector types work out-of-the box. Other than that small set, many more types are supported via specific operator classes, such as:
https://www.postgresql.org/docs/current/btree-gist.html
https://www.postgresql.org/docs/current/pgtrgm.html
In the case of SP-GiST, there are a lot fewer types supported than with GiST, once you've installed btree_gist:
https://www.postgresql.org/docs/14/spgist-builtin-opclasses.html
It looks like text_opts and range_opts do not support proximity searches. However, for tsrange, etc., there are likely enough options with other tools.

GIST index creation too slow on PostgreSQL

I have a database in PostgreSQL with the following structure:
Column | Type | Collation | Nullable | Default
-------------+-----------------------+-----------+----------+------------------------------------------------
vessel_hash | integer | | not null | nextval('samplecol_vessel_hash_seq'::regclass)
status | character varying(50) | | |
station | character varying(50) | | |
speed | character varying(10) | | |
longitude | numeric(12,8) | | |
latitude | numeric(12,8) | | |
course | character varying(50) | | |
heading | character varying(50) | | |
timestamp | character varying(50) | | |
the_geom | geometry | | |
Check constraints:
"enforce_dims_the_geom" CHECK (st_ndims(the_geom) = 2)
"enforce_geotype_geom" CHECK (geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL)
"enforce_srid_the_geom" CHECK (st_srid(the_geom) = 4326)
The database contains ~146.000.000 records and the size of table that contains the data is:
public | samplecol | table | postgres | 31 GB |
I try to create a GIST index on the geometry field the_geom with this command:
create index samplecol_the_geom_gist on samplecol using gist (the_geom);
but takes too long. It runs 2 hours already.
Based on this question Slow indexing of 300GB Postgis table
Ask Question, before index creation I execute in psql console:
ALTER SYSTEM SET maintenance_work_mem = '1GB';
ALTER SYSTEM
SELCT pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)
But index creation takes too long. Does anyone know why? An how to fix this?
I am afraid you'll have to sit it out.
Apart from high maintenance_work_mem, there is not really a tuning option.
Increasing max_wal_size will help somewhat, since you will get fewer checkpoints.
If you can't live with an ACCESS EXCLUSIVE lock for that long, try CREATE INDEX CONCURRENTLY, which will be even slower, but won't block concurrent database activity.

How to select rows in a postgres table based on a date and store it in a new table?

I have a postgres table which has some data.Each row has a date associated with it.I want to extract rows for the dates which has the month as April.Here is a csv version of my postgres table data
,date,location,device,provider,cpu,mem,load,drops,id,latency,gw_latency,upload,download,sap_drops,sap_latency,alert_id
0,2018-02-10 11:52:59.342269+00:00,BEM,10.11.100.1,COD,6.0,23.0,11.75,0.0,,,,,,,,
1,2018-02-10 11:53:04.006971+00:00,VER,10.11.100.1,KOD,6.0,23.0,4.58,0.0,,,,,,,,
2,2018-03-25 20:28:36.186015+00:00,RET,10.11.100.1,POL,7.0,26.0,9.83,0.0,,86.328,5.0,4.33,15.33,0.0,23.0,
3,2018-03-25 20:28:59.155453+00:00,ASR,10.12.100.1,VOL,5.0,14.0,2.67,0.0,,52.406,12.0,2.17,3.17,0.0,28.0,
4,2018-04-01 13:16:44.472119+00:00,RED,10.19.0.1,SEW,6.0,14.0,2.77,0.0,,52.766,2.0,3.25,2.29,0.0,1.0,0.0
5,2018-04-01 13:16:48.478708+00:00,RED,10.19.0.1,POL,6.0,14.0,4.065,0.0,,52.766,1.0,6.63,1.5,0.0,1.0,0.0
6,2018-04-06 21:00:44.769702+00:00,GOK,10.61.100.1,FDE,4.0,22.0,3.08,0.0,,54.406,8.0,3.33,2.83,0.0,19.0,0.0
7,2018-04-06 21:01:07.211395+00:00,WER,10.4.100.1,FDE,3.0,3.0,9.28,0.0,,0.346,2.0,10.54,8.02,0.0,33.0,0.0
8,2018-04-13 11:18:08.411550+00:00,DER,10.19.0.1,CVE,14.0,14.0,7.88,0.0,,50.545,2.0,6.17,9.59,0.0,1.0,0.0
9,2018-04-13 11:18:12.420974+00:00,RTR,10.19.0.1,BOL,14.0,14.0,1.345,0.0,,50.545,1.0,2.26,0.43,0.0,1.0,0.0
So I want only the rows which has a month of april data such that I will have a table which looks something like this
4,2018-04-01 13:16:44.472119+00:00,RED,10.19.0.1,SEW,6.0,14.0,2.77,0.0,,52.766,2.0,3.25,2.29,0.0,1.0,0.0
5,2018-04-01 13:16:48.478708+00:00,RED,10.19.0.1,POL,6.0,14.0,4.065,0.0,,52.766,1.0,6.63,1.5,0.0,1.0,0.0
6,2018-04-06 21:00:44.769702+00:00,GOK,10.61.100.1,FDE,4.0,22.0,3.08,0.0,,54.406,8.0,3.33,2.83,0.0,19.0,0.0
7,2018-04-06 21:01:07.211395+00:00,WER,10.4.100.1,FDE,3.0,3.0,9.28,0.0,,0.346,2.0,10.54,8.02,0.0,33.0,0.0
8,2018-04-13 11:18:08.411550+00:00,DER,10.19.0.1,CVE,14.0,14.0,7.88,0.0,,50.545,2.0,6.17,9.59,0.0,1.0,0.0
9,2018-04-13 11:18:12.420974+00:00,RTR,10.19.0.1,BOL,14.0,14.0,1.345,0.0,,50.545,1.0,2.26,0.43,0.0,1.0,0.0
Now If I try to extract a particular date with the below query
select * from metrics_data where date = 2018-04-13;
I get the error message
No operator matches the given name and argument type(s). You might need to add explicit type casts.
How do I get the rows for the month of April and store it in a new table say april_data?
Below is the structure of my existing table
Column | Type | Modifiers | Storage | Stats target | Description
-------------+--------------------------+-----------+----------+--------------+-------------
date | timestamp with time zone | | plain | |
location | character varying(255) | | extended | |
device | character varying(255) | | extended | |
provider | character varying(255) | | extended | |
cpu | double precision | | plain | |
mem | double precision | | plain | |
load | double precision | | plain | |
drops | double precision | | plain | |
id | integer | | plain | |
latency | double precision | | plain | |
gw_latency | double precision | | plain | |
upload | double precision | | plain | |
download | double precision | | plain | |
sap_drops | double precision | | plain | |
sap_latency | double precision | | plain | |
alert_id | double precision | | plain | |
The type of column date in your table is timestamp with time zone which format will be YYYY:MM:DD HH24:MI:SS.MS. In the query you make an operation timestamp with time zone = date, so it will throw an error.
So, if you want to fix it, you should fix one side to the type of other.
In your case I suggest as below:
Match exact 1 day.
select * from metrics_data where date(date) = '2018-04-13';
Match within 1 month.
select * from metrics_data where date BETWEEN '2018-04-01 00:00:00' AND '2018-04-30 23:59:59.999';
OR
select * from metrics_data where date(date) BETWEEN '2018-04-01' AND '2018-04-30';
OR
select * from metrics_data where to_char(date,'YYYY-MM') = '2018-04';
Match only April.
select * from metrics_data where to_char(date,'MM') = '04';
OR
select * from metrics_data where extract(month from date) = 4;
Hopefully this answer will help you.
You need single quotes around the string literal.
PostgreSQL will automatically cast it to the correct data type (timestamp with time zone).
You could use the extract function to select only the dates from April:
SELECT * FROM yourtable WHERE extract (month FROM yourtable.date) = 4;

Bit masking in Postgres

I have this query
SELECT * FROM "functions" WHERE (models_mask & 1 > 0)
and the I get the following error:
PGError: ERROR: operator does not exist: character varying & integer
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
The models_mask is an integer in the database. How can I fix this.
Thank you!
Check out the docs on bit operators for Pg.
Essentially & only works on two like types (usually bit or int), so model_mask will have to be CASTed from varchar to something reasonable like bit or int:
models_mask::int & 1 -or- models_mask::int::bit & b'1'
You can find out what types an operator works with using \doS in psql
pg_catalog | & | bigint | bigint | bigint | bitwise and
pg_catalog | & | bit | bit | bit | bitwise and
pg_catalog | & | inet | inet | inet | bitwise and
pg_catalog | & | integer | integer | integer | bitwise and
pg_catalog | & | smallint | smallint | smallint | bitwise and
Here is a quick example for more information
# SELECT 11 & 15 AS int, b'1011' & b'1111' AS bin INTO foo;
SELECT
# \d foo
Table "public.foo"
Column | Type | Modifiers
--------+---------+-----------
int | integer |
bin | "bit" |
# SELECT * FROM foo;
int | bin
-----+------
11 | 1011