Hash index is taking 1000% more time than Btree in postgres

Hash index is taking 1000% more time than Btree in postgres - postgresql

Is there anyway we can improve the Hash index creation as it is taking more time than Btree

Related

How do I measure if an index is bloated in postgresql?

How do I measure if an index is bloated in postgresql 13? I'm specifically wondering about the primary key index of a table.

Multicolum index vs singel column index for time series data in Postgres

This table started out at short term storage for meter data before it was going to be validated and added to some long term storage tables.
Turns out the clients wants to keep this data for a long time since we saved it and it is growing fast.
create table metering_meterreading
(
id bigserial not null. # Primary Key
created_at timestamp with time zone not null,
updated_at timestamp with time zone not null,
timestamp timestamp with time zone not null, # BTREE index
value numeric(15, 3) not null,
meter_device_id uuid not null, # FK to meter_device, BTREE index
series_id uuid not null # FK to series, BTREE index
organization_id uuid not null. # FK to org , BTREE index
);
I am planning on dropping the primary key since (org_id, meter_device_id, series_id, timestamp) makes it unique. It was just added by my ORM (django) and I didn't care when we started.
But since I pretty much always want to filter in organization, meter_device, and series to get a range of time series data I am wondering if it would be more efficient to have a multicolumn index on (organization_id, meter_device_id, series_id, timestamp) instead of the separate indexes.
I read somewhere that if I had a range it should be the rightmost in the index.
This is still not an super efficient table for timeseries data, since it will grow large, but I am planning in fixing that by partitioning on range, or maybe even use Timescale. But before partitioning I would like it to be as efficient as possible to look up data in it.
I also saw an example somewhere that used a separate table to identify the metric:
create table metric
(
id
organization_id
meter_device_id
series_id
) UNIQE (organization_id, meter_device_id, series_id)
;
create table metering_meterreading
(
metric_id. bigserial, FK to metric, BTREE index
timestamp timestamp with time zone not null, # BTREE index
value numeric(15, 3) not null,
created_at timestamp with time zone not null,
updated_at timestamp with time zone not null,
);
But I am not sure if that is actually better than just putting them all in table. It might impact ingestion rate since there is another table involved now.

If (org_id, meter_device_id, series_id, timestamp) uniquely determine a table row, you need to use a multi-column primary key over all of them. So you automatically have a 4-column index on these columns. Just make sure that timestamp is last in the list, then that index will support your query ideally.

Performance Difference between index on two columns vs conditional index

I have a decently large postgres table with a few billion rows.
However the table could be partitioned by one column (type)
Should we prefer:
An index with two columns
create nonclustered index ix_index1 on table1(type, string_urn_id)
or a conditional index
create nonclustered index ix_index1_alternative on table1(string_urn_id) WHERE type = 'type1'
create nonclustered index ix_index1_alternative2 on table1(string_urn_id) WHERE type = 'type2'
create nonclustered index ix_index1_alternative3 on table1(string_urn_id) WHERE type = 'type3'
....

There is no statement create nonclustered index in PostgreSQL.
What is better depends on the definition of "better". From a maintenance perspective, the single index is better, because you won't have to create a new index whenever you add a new type.
From a performance perspective, only a benchmark with realistic data can tell. Planning time will increase with many indexes, but query performance may be a tad better.
If you partition the table, query performance will decrease, but you can do with a single partitioned index on string_urn_id.

Postgres choosing BTREE instead of BRIN index

I'm running Postgres 9.5 and am playing around with BRIN indexes. I have a fact table with about 150 million rows and I'm trying to get PG to use a BRIN index. My query is:
select sum(transaction_amt),
sum (total_amt)
from fact_transaction
where transaction_date_key between 20170101 and 20170201
I created both a BTREE index and a BRIN index (default pages_per_range value of 128) on column transaction_date_key (the above query is referring to January to February 2017). I would have thought that PG would choose to use the BRIN index however it goes with the BTREE index. Here is the explain plan:
https://explain.depesz.com/s/uPI
I then deleted the BTREE index, did a vacuum / analyze on the the table, and re-ran the query and it did choose the BRIN index however the run time was considerably longer:
https://explain.depesz.com/s/5VXi
In fact my tests were all faster when using the BTREE index rather than the BRIN index. I thought it was supposed to be the opposite?
I'd prefer to use the BRIN index because of its smaller size however I can't seem to get PG to use it.
Note: I loaded the data, starting from January 2017 through to June 2017 (defined via transaction_date_key) as I read that physical table ordering makes a difference when using BRIN indexes.
Does anyone know why PG is choosing to use the BTREE index and why BRIN is so much slower in my case?

It seems like the BRIN index scan is not very selective – it returns 30 million rows, all of which have to be re-checked, which is where the time is spent.
That probably means that transaction_date_key is not well correlated with the physical location of the rows in the table.
A BRIN index works by “lumping together” ranges of table blocks (how many can be configured with the storage parameter pages_per_range, whose default value is 128). The maximum and minimum of the indexed value for eatch range of blocks is stored.
So a lot of block ranges in your table contain transaction_date_key between 20170101 and 20170201, and all of these blocks have to be scanned to compute the query result.
I see two options to improve the situation:
Lower the pages_per_range storage parameter. That will make the index bigger, but it will reduce the number of “false positive” blocks.
Cluster the table on the transaction_date_key attribute. As you have found out, that requires (at least temporarily) a B-tree index on the column.

How Internally BTree and Bitmap Index stored in oracle?

I want to know what value is stored against BTree Index and Bitmap Index in oracle

Search the documentation here for "internal structure of indexes".

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse