Indexing in TimescaleDB - postgresql

Today I read on Hacker News about BRIN indexing with PostgreSQL.
We are working with TimescaleDB for large and simple sensor data series.
Does BRIN indexing in TimescaleDB give additional value or do TimescaleDB features make BRIN indexing obsolete?

To the best of my knowledge, TimescaleDB is just a thin layer that speeds up inserting into partitions; it doesn't boost your queries.
BRIN indexes are only useful in the case when the logical ordering of the rows according to a column is the exact same (or the exact opposite) of the physical ordering of the rows.
In practice, that means that rows have to be inserted in the same order as the column in question (e.g., earlier timestamps get inserted before later ones), and there is never an UPDATE or DELETE.
If this is the case, you can use a BRIN index on that column, which will take almost no space.
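As a sketch, assuming a hypothetical `sensor_data` table whose rows are always appended in timestamp order (table and column names are illustrative):

```sql
-- Assuming rows are inserted in "time" order and never updated or deleted,
-- a BRIN index stays tiny compared to a B-tree on the same column.
CREATE INDEX sensor_data_time_brin
    ON sensor_data USING brin ("time");
```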

Related

Custom Indexes on TimeScaleDB Hypertable

I'm using TimeScaleDB and PostgreSQL to manage time-series data.
When optimizing the table, is it recommended to rely purely on the TimeScaleDB hypertable, or should I also add indexes independently, the same way I would when not using a hypertable?
What is critical in that scenario is the performance of retrieving the data.
TimescaleDB creates an index on the time dimension by default. If your queries often select data on values from other columns, it can be helpful to create indexes on such columns as you would do with normal tables. However, in the case of TimescaleDB all indexes should be compound and include the time dimension column. You might drop the automatically created index on the time dimension after you created new indexes.
As usual, when creating new indexes keep in mind that indexes occupy additional space and require more processing resources to maintain.
Timescale has a blog post with advice on adding indexes.
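A sketch of such a compound index, assuming a hypertable named `conditions` with a `device_id` column and a `time` dimension (names are illustrative):

```sql
-- Filter column first, time dimension last, so queries like
--   WHERE device_id = ... AND time > ...
-- can use the index efficiently.
CREATE INDEX conditions_device_time_idx
    ON conditions (device_id, "time" DESC);
```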

What type of index is most suitable for a low-selective column

I have a table with around 60M records, and potentially it will grow to ~500M soon (then it will grow slowly). The table has a column, say category. The total number of categories is around 20K and grows slowly and occasionally. Records are not distributed evenly among categories: some categories cover 5% of all records, while others are represented by only a very small proportion of records.
I have a number of queries that work with only one or several categories (using = or IN/ANY conditions), and I want to optimize the performance of these queries.
Taking into account the low-selectivity nature of the data in the column, which type of Postgres index will be more beneficial: HASH or B-TREE?
Are there any other ways to optimize performance of these queries?
I can only give a generalized answer to this broad question.
Use B-tree indexes, not hash indexes.
If you have several conditions that are not very selective, create an index on each of the columns, then they can be combined with a bitmap index scan.
In general, a column that is not very selective is not a good candidate for an index. Indexes are not free. They need to be maintained, and at query-time, in most cases, Postgres will still have to go out to the table for each row the index search matches (exception is covering indexes).
With that said, I'm not sure about your selectivity analysis. If the highest percentage you'll filter down to in the worst case is 5%, and most categories are far lower than that, then I'd say you have a very selective column.
As for which index type to use, b-tree versus hash, I generally go with a b-tree index as my standard unless there is a specific need to deviate.
Hash indexes are faster to query than b-tree indexes, but they cannot be used for range lookups, only equality. Hash indexes are not supported on all RDBMSs and, as a result, are less well understood in the community, which can hinder support.
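A minimal sketch of both points, assuming the table is called `items` and the filter columns are `category` and a hypothetical second column `status`:

```sql
-- A plain B-tree index supports =, IN/ANY and range conditions alike.
CREATE INDEX items_category_idx ON items (category);

-- With several not-very-selective conditions, one index per column lets
-- the planner combine them via a bitmap index scan.
CREATE INDEX items_status_idx ON items (status);
```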

Increase postgresql-9.5 performance on 200+ million records

I have 200+ million records in a postgresql-9.5 table. Almost all queries are analytical. To optimize query performance I have tried indexing so far, and it seems that is not sufficient. What other options should I look into?
Depending on your WHERE clause conditions, create a partitioned table (https://www.postgresql.org/docs/10/static/ddl-partitioning.html); it will reduce query cost drastically. Also, if there is a certain fixed value in the WHERE clause, add a partial index on the partitioned table.
Important point: check the order of columns in the WHERE clause and match it when indexing.
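A sketch of declarative range partitioning as it works from PostgreSQL 10 onward (table and column names are illustrative):

```sql
-- Parent table partitioned on the column that appears in WHERE clauses.
CREATE TABLE measurements (
    created_at timestamptz NOT NULL,
    value      double precision
) PARTITION BY RANGE (created_at);

-- One partition per month; queries filtering on created_at only scan
-- the matching partitions.
CREATE TABLE measurements_2018_01 PARTITION OF measurements
    FOR VALUES FROM ('2018-01-01') TO ('2018-02-01');

-- Partial index, useful when the WHERE clause contains a fixed value.
CREATE INDEX measurements_2018_01_high_idx
    ON measurements_2018_01 (created_at)
    WHERE value > 100;
```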
You should upgrade to PostgreSQL v10 so that you can use parallel query.
That enables you to run sequential and index scans with several background workers in parallel, which can speed up these operations on large tables.
A good database layout, good indexing, lots of RAM and fast storage are also important factors for good performance of analytical queries.
If the analysis involves a lot of aggregation, consider materialized views to store the aggregates. Materialized views take up space and need to be refreshed too, but they are very useful for data aggregation.
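A minimal materialized-view sketch for pre-aggregated analytics (view, table, and column names are illustrative):

```sql
-- Compute daily aggregates once instead of per query.
CREATE MATERIALIZED VIEW daily_stats AS
SELECT date_trunc('day', created_at) AS day,
       count(*)                      AS n,
       avg(value)                    AS avg_value
FROM measurements
GROUP BY 1;

-- Must be refreshed periodically to pick up new data.
REFRESH MATERIALIZED VIEW daily_stats;
```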

How does PostgreSQL's CLUSTER differ from a clustered index in SQL Server?

Many posts like this stackoverflow link claim that there is no concept of a clustered index in PostgreSQL. However, the PostgreSQL documentation contains something similar. A few people claim it is similar to a clustered index in SQL Server.
Do you know what the exact difference between these two is, if there is any?
A clustered index or index organized table is a data structure where all the table data are organized in index order, typically by organizing the table in a B-tree structure.
Once a table is organized like this, the order is automatically maintained by all future data modifications.
PostgreSQL does not have such clustering indexes. What the CLUSTER command does is rewrite the table in the order of the index, but the table remains a fundamentally unordered heap of data, so future data modifications will not maintain that index order.
You have to CLUSTER a PostgreSQL table regularly if you want to maintain an approximate index order in the face of data modifications to the table.
Clustering in PostgreSQL can improve performance, because tuples found during an index scan will be close together in the heap table, which can turn random access to the heap to faster sequential access.
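For instance (table and index names are illustrative):

```sql
-- One-time rewrite of the heap in index order; PostgreSQL does NOT
-- maintain this order for subsequent inserts and updates.
CLUSTER mytable USING mytable_time_idx;

-- Re-run periodically; without USING, it reuses the last clustering index.
CLUSTER mytable;
```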

Reorganizing indexes and database size

I have a fragmentation problem on my production database. One of my main data tables is about 6 GB (3 GB of indexes, about 9M records) in size and has 94% (!) index fragmentation.
I know that reorganizing the indexes will solve this problem, BUT my database is on SQL Server 2008R2 Express, which has a 10 GB database limit, and my database is already 8 GB in size.
I have read a few blog posts about this issue, but none gave an answer for my situation.
My Question 1 is:
How much of a size increase (in % or GB) can I expect after reorganizing the indexes on that table?
Question 2:
Will DROP INDEX -> rebuild of the same index take less space? Time is not a factor for me at the moment.
Extra question:
Any other suggestions for database fragmentation? I only know that I should avoid shrinking like fire ;)
Having an index on key columns will improve joins and filters by negating the need for a table scan. A well-maintained index can drastically improve performance.
It is right that GUIDs make a poor choice for indexed columns, but by no means does that mean you should not create these indexes. Ideally a data type of INT or BIGINT would be advisable.
For me, adding NEWSEQUENTIALID() as a default has shown some improvement in counteracting index fragmentation, but if all alternatives fail you may have to do index maintenance (rebuild, reorganize) operations more often than for other indexes. Reorganizing needs some working space, but in your scenario, as time is not a concern, I would disable the index, shrink the DB, and rebuild the index.
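The two maintenance operations can be sketched like this on SQL Server (table and index names are illustrative; REBUILD needs free working space roughly the size of the index, while REORGANIZE works in place):

```sql
-- Reorganize: in-place defragmentation, minimal extra space needed.
ALTER INDEX IX_MainData ON dbo.MainData REORGANIZE;

-- Rebuild: creates a fresh copy of the index; needs working space,
-- but produces a more compact result.
ALTER INDEX IX_MainData ON dbo.MainData REBUILD;
```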