PostgreSQL: order by two boolean columns and one timestamp column

I'm having trouble with a query that becomes ghastly slow as the database grows.
The problem seems to be the sorting, which depends on three conditions - importance, urgency and timestamp.
The query currently in use is plain old
ORDER BY urgent DESC, important DESC, date_published DESC
Fields are boolean for urgent and important, and date_published is an integer (UNIX timestamp).

Create indexes for columns you sort by regularly. You may even set a compound index.

CREATE INDEX foo ON table_name (urgent DESC, important DESC, date_published DESC);
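As a rough illustration of the compound index above, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table name items, the index name items_sort_ix and the sample rows are made up for the example, and booleans are stored as 0/1 because SQLite has no boolean type. It only shows that the index definition mirrors the ORDER BY clause; the planner output of a real PostgreSQL server will look different.

```python
import sqlite3

# Hypothetical sample rows: (id, urgent, important, date_published).
# Booleans are stored as 0/1 because SQLite has no boolean type.
ROWS = [
    (1, 0, 0, 1500000300),
    (2, 1, 0, 1500000100),
    (3, 0, 1, 1500000200),
    (4, 1, 1, 1500000000),
    (5, 1, 1, 1500000400),
]

QUERY = (
    "SELECT id FROM items "
    "ORDER BY urgent DESC, important DESC, date_published DESC"
)


def build_db():
    """Create an in-memory table plus an index that mirrors the ORDER BY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        "id INTEGER PRIMARY KEY, "
        "urgent INTEGER NOT NULL, "
        "important INTEGER NOT NULL, "
        "date_published INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", ROWS)
    conn.execute(
        "CREATE INDEX items_sort_ix ON items "
        "(urgent DESC, important DESC, date_published DESC)"
    )
    return conn


def sorted_ids(conn):
    """Return ids in the order produced by the three-column sort."""
    return [row[0] for row in conn.execute(QUERY)]


def plan_details(conn):
    """Return the planner's description; the wording varies by SQLite version."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
```

Calling sorted_ids(build_db()) lists urgent and important rows first, newest first within each group. Printing plan_details(...) is a quick way to see whether the index is picked up; in PostgreSQL the equivalent check is EXPLAIN ANALYZE.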

Related

ASC time index in timescaleDB

I'm building a local TimescaleDB server and I'm creating my first "production" hypertables. At the moment, all the future consumers of my DB are going to read the data in ASC order, but by default Timescale creates a DESC index on the time column.
My question is: is it worth changing the default behaviour and making the index ASC?
I don't know if it's DESC by default for a good reason and whether changing it would cost me something. I have also read that indexes in PostgreSQL can be read backward, so a DESC index could serve an ASC query, but I don't know if there are performance penalties.
On the other hand, is it safe to simply delete the default index and create a new one with a different order? I'm also not sure whether deleting it would break some Timescale internal functionality.
Thanks for your time,
H25E
For a single-column index, it does not matter at all if it is created ASC or DESC, because indexes can be read in both directions with the same efficiency.
The only time when you really need to specify DESC in an index is if the index is supposed to support an ORDER BY clause like ORDER BY a, b DESC. Then one of the index columns must be sorted ASC and the other DESC — but again it doesn't matter which one is ASC and which DESC, as the index can be read in both directions.
So, for a single column index, there is no need to build the index again, and there was no good reason to create it DESC in the first place (but it doesn't matter).

Efficient retrieval of latest value in a large table PostgreSQL

Currently after working to get an efficient way to query a table in the format below I am using this query...
select distinct on (symbol, date) date, symbol, value, created_time
from "test_table"
where symbol in ('symbol15', 'symbol19', 'symbol36', 'symbol54', 'symbol13', 'symbol90', 'symbol115', 'symbol145', 'symbol165', 'symbol12')
order by symbol, date, created_time desc
With this index...
test_table(symbol, date, created_time)
Below is a sample of the data to show what columns I am working with. The real table has 13 million rows.
date symbol value created_time
2010-01-09 symbol1 101 3847474847
2010-01-10 symbol1 102 3847474847
2010-01-10 symbol1 102.5 3847475500
2010-01-10 symbol2 204 3847474847
2010-01-11 symbol1 109 3847474847
2010-01-12 symbol1 105 3847474847
2010-01-12 symbol2 206 3847474847
Currently it looks like 80+% of the query is spent sorting based on the EXPLAIN ANALYZE. Any idea how to improve the speed of this query? I need to get the latest created_time for each date and symbol combination.
Since your where clause uses only the column symbol, the index you created will not be used.
I advise you to create an index on symbol:
CREATE INDEX ON test_table(symbol);
Also, this is probably a better way to write your query
SELECT date, symbol, MAX(created_time)
FROM "test_table"
WHERE symbol in ('symbol15', 'symbol19', 'symbol36', 'symbol54', 'symbol13', 'symbol90', 'symbol115', 'symbol145', 'symbol165', 'symbol12')
GROUP BY date, symbol
ORDER BY symbol, date
LIMIT 10;
Adding a limit will greatly improve the performance if that is an option.
You should run EXPLAIN ANALYZE SELECT... to get a better understanding of which indexes are used or not and how PostgreSQL is running your query.
You might consider creating a partial or filtered index for this purpose - but be aware it may not work if your IN clause changes by adding more values or adding values not in your filtered index. It also may have some detrimental effects on INSERT speed to your table, as the index will have to evaluate whether your INSERT contains an interesting value - so if you're doing lots of inserts and can't afford any additional penalty there keep that in mind. You should also specify that you want date and created_time descending in the index.
E.g.
CREATE INDEX test_table_ix ON test_table (symbol, date DESC, created_time DESC)
WHERE (symbol in ('symbol15', 'symbol19', 'symbol36', 'symbol54', 'symbol13', 'symbol90', 'symbol115', 'symbol145', 'symbol165', 'symbol12'));
see: https://www.postgresql.org/docs/8.0/static/indexes-partial.html and https://www.postgresql.org/docs/9.6/static/indexes-ordering.html
Your query would then be able to use this index and should see some benefit. Just keep in mind that this index has a maintenance cost, and consider whether your query runs frequently enough to justify it. You might also see a benefit just from applying the sort order to the index.
Without an ability to properly test this over 13 million rows the problem is always going to be the sorting needed to establish "latest". Although I am a little reluctant to propose this here row_number() over() is often a good technique to arrive at "latest".
An index that mimics the way you need to perform the sort to establish "latest" is the most likely to assist, so I expect that an index on symbol, date, created_time desc would be useful.
select date, symbol, value, created_time
from (select date, symbol, value, created_time
, row_number() over(partition by symbol, date order by created_time DESC) rn
from test_table
where symbol in ('symbol15', 'symbol19', 'symbol36', 'symbol54', 'symbol13', 'symbol90', 'symbol115', 'symbol145', 'symbol165', 'symbol12')
) d
where rn = 1
order by symbol, date, created_time desc
;
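To make the row_number() approach concrete, here is a small sketch run against the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (window functions need SQLite 3.25 or later), the index name test_table_ix is borrowed from the answer above, and the IN list is shortened to the two symbols present in the sample, so treat it as an illustration of the query shape and not as a benchmark.

```python
import sqlite3

# Sample rows from the question: (date, symbol, value, created_time).
ROWS = [
    ("2010-01-09", "symbol1", 101.0, 3847474847),
    ("2010-01-10", "symbol1", 102.0, 3847474847),
    ("2010-01-10", "symbol1", 102.5, 3847475500),
    ("2010-01-10", "symbol2", 204.0, 3847474847),
    ("2010-01-11", "symbol1", 109.0, 3847474847),
    ("2010-01-12", "symbol1", 105.0, 3847474847),
    ("2010-01-12", "symbol2", 206.0, 3847474847),
]

# Number the rows of each (symbol, date) group from newest to oldest,
# then keep only the newest one (rn = 1).
LATEST_SQL = """
SELECT date, symbol, value, created_time
FROM (
    SELECT date, symbol, value, created_time,
           row_number() OVER (
               PARTITION BY symbol, date
               ORDER BY created_time DESC
           ) AS rn
    FROM test_table
    WHERE symbol IN ('symbol1', 'symbol2')
) AS d
WHERE rn = 1
ORDER BY symbol, date
"""


def latest_rows():
    """Return the most recently created row for each (symbol, date) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test_table ("
        "date TEXT, symbol TEXT, value REAL, created_time INTEGER)"
    )
    conn.executemany("INSERT INTO test_table VALUES (?, ?, ?, ?)", ROWS)
    # Index that mirrors the partition and sort order of the window.
    conn.execute(
        "CREATE INDEX test_table_ix "
        "ON test_table (symbol, date, created_time DESC)"
    )
    rows = conn.execute(LATEST_SQL).fetchall()
    conn.close()
    return rows
```

For the sample data, latest_rows() returns six rows, and the pair (2010-01-10, symbol1) resolves to the value 102.5, which has the larger created_time.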
The index you are using is already the best. Since you do not show the EXPLAIN ANALYZE output, I suggest you try the VALUES syntax:
select distinct on (symbol, date) date, symbol, value, created_time
from test_table
where symbol in (values ('symbol15'), ('symbol19'), ('symbol36'), ('symbol54'), ('symbol13'), ('symbol90'), ('symbol115'), ('symbol145'), ('symbol165'), ('symbol12'))
order by symbol, date, created_time desc

Postgresql 9.4 slow [duplicate]

I have table
create table big_table (
id serial primary key,
-- other columns here
vote int
);
This table is very big, approximately 70 million rows, I need to query:
SELECT * FROM big_table
ORDER BY vote [ASC|DESC], id [ASC|DESC]
OFFSET x LIMIT n -- I need this for pagination
As you may know, when x is a large number, queries like this are very slow.
For performance optimization I added indexes:
create index vote_order_asc on big_table (vote asc, id asc);
and
create index vote_order_desc on big_table (vote desc, id desc);
EXPLAIN shows that the above SELECT query uses these indexes, but it's very slow anyway with a large offset.
What can I do to optimize queries with OFFSET in big tables? Maybe PostgreSQL 9.5 or even newer versions have some features? I've searched but didn't find anything.
A large OFFSET is always going to be slow. Postgres has to order all rows and count the visible ones up to your offset. To skip all previous rows directly you could add an indexed row_number to the table (or create a MATERIALIZED VIEW including said row_number) and work with WHERE row_number > x instead of OFFSET x.
However, this approach is only sensible for read-only (or mostly) data. Implementing the same for table data that can change concurrently is more challenging. You need to start by defining desired behavior exactly.
I suggest a different approach for pagination:
SELECT *
FROM big_table
WHERE (vote, id) > (vote_x, id_x) -- ROW values
ORDER BY vote, id -- needs to be deterministic
LIMIT n;
Where vote_x and id_x are from the last row of the previous page (for both DESC and ASC). Or from the first if navigating backwards.
Comparing row values is supported by the index you already have - a feature that complies with the ISO SQL standard, but not every RDBMS supports it.
CREATE INDEX vote_order_asc ON big_table (vote, id);
Or for descending order:
SELECT *
FROM big_table
WHERE (vote, id) < (vote_x, id_x) -- ROW values
ORDER BY vote DESC, id DESC
LIMIT n;
Can use the same index.
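The paging loop implied by this approach can be sketched as follows. This is a minimal illustration that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite has supported row-value comparisons since 3.15); the helper names make_db, fetch_page and all_pages and the 20 sample rows are made up for the example.

```python
import sqlite3


def make_db():
    """Build a small stand-in for big_table with many duplicate votes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE big_table (id INTEGER PRIMARY KEY, vote INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO big_table (id, vote) VALUES (?, ?)",
        [(i, i % 4) for i in range(1, 21)],
    )
    conn.execute("CREATE INDEX vote_order_asc ON big_table (vote, id)")
    return conn


def fetch_page(conn, n, last=None):
    """Fetch up to n rows that sort strictly after the (vote, id) pair in last."""
    if last is None:
        # First page: nothing to skip yet.
        sql = "SELECT vote, id FROM big_table ORDER BY vote, id LIMIT ?"
        return conn.execute(sql, (n,)).fetchall()
    sql = (
        "SELECT vote, id FROM big_table "
        "WHERE (vote, id) > (?, ?) "  # row-value comparison
        "ORDER BY vote, id LIMIT ?"
    )
    return conn.execute(sql, (last[0], last[1], n)).fetchall()


def all_pages(conn, n):
    """Walk the whole table page by page without using OFFSET."""
    pages, last = [], None
    while True:
        page = fetch_page(conn, n, last)
        if not page:
            return pages
        pages.append(page)
        last = page[-1]  # vote_x, id_x for the next request
```

Each call only needs the last row of the previous page, so the cost of a page does not grow with how deep into the result set it is.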
I suggest you declare your columns NOT NULL or acquaint yourself with the NULLS FIRST|LAST construct:
PostgreSQL sort by datetime asc, null first?
Note two things in particular:
The ROW values in the WHERE clause cannot be replaced with separated member fields. WHERE (vote, id) > (vote_x, id_x) cannot be replaced with:
WHERE vote >= vote_x
AND id > id_x
That would rule out all rows with id <= id_x, while we only want to do that for the same vote and not for the next. The correct translation would be:
WHERE (vote = vote_x AND id > id_x) OR vote > vote_x
... which doesn't play along with indexes as nicely, and gets increasingly complicated for more columns.
Would be simple for a single column, obviously. That's the special case I mentioned at the outset.
The technique does not work for mixed directions in ORDER BY like:
ORDER BY vote ASC, id DESC
At least I can't think of a generic way to implement this as efficiently. If at least one of the two columns is a numeric type, you could use a functional index with an inverted value on (vote, (id * -1)) and use the same expression in ORDER BY:
ORDER BY vote ASC, (id * -1) ASC
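Here is a small sketch of the inverted-value trick for mixed directions. It again uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (expression indexes and row values both exist in recent SQLite versions); the index name vote_asc_id_desc, the helper next_page and the twelve sample rows are hypothetical.

```python
import sqlite3


def make_db():
    """Build a small stand-in for big_table with an index on (vote, -id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE big_table (id INTEGER PRIMARY KEY, vote INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO big_table (id, vote) VALUES (?, ?)",
        [(i, i % 3) for i in range(1, 13)],
    )
    # Expression index on the inverted id, so both key parts sort ascending.
    conn.execute("CREATE INDEX vote_asc_id_desc ON big_table (vote, (id * -1))")
    return conn


def next_page(conn, n, last=None):
    """Page through rows ordered by vote ASC, id DESC using (vote, id * -1)."""
    if last is None:
        sql = (
            "SELECT vote, id FROM big_table "
            "ORDER BY vote ASC, (id * -1) ASC LIMIT ?"
        )
        return conn.execute(sql, (n,)).fetchall()
    vote_x, id_x = last
    sql = (
        "SELECT vote, id FROM big_table "
        "WHERE (vote, id * -1) > (?, ?) "
        "ORDER BY vote ASC, (id * -1) ASC LIMIT ?"
    )
    # The bound value must be inverted the same way as the indexed expression.
    return conn.execute(sql, (vote_x, id_x * -1, n)).fetchall()
```

Because id is compared through id * -1 on both sides, a larger id sorts earlier within the same vote, which reproduces ORDER BY vote ASC, id DESC.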
Related:
SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'
Improve performance for order by with columns from many tables
Note in particular the presentation by Markus Winand I linked to:
"Pagination done the PostgreSQL way"
Have you tried partitioning the table?
Ease of management, improved scalability and availability, and a
reduction in blocking are common reasons to partition tables.
Improving query performance is not a reason to employ partitioning,
though it can be a beneficial side-effect in some cases. In terms of
performance, it is important to ensure that your implementation plan
includes a review of query performance. Confirm that your indexes
continue to appropriately support your queries after the table is
partitioned, and verify that queries using the clustered and
nonclustered indexes benefit from partition elimination where
applicable.
http://sqlperformance.com/2013/09/sql-indexes/partitioning-benefits

Postgres DESC index on date field

I have a date field on a large table that I mostly query and sort in DESC order. I have an index on that field with the default ASC order. I read that if an index is on a single field it does not matter if it is in ASC or DESC order since an index can be read from both directions. Will I benefit from changing my index to DESC?
Operating systems are generally more efficient at reading files in a forward direction, so you may get a slight speedup by creating a DESC index.
For a big speed up create the DESC index and CLUSTER the table on it.
CLUSTER tablename USING indexname;
Clustering on the ASC index will also give an improvement, but a smaller one.

sql date order by problem

I have an image table with two or more rows that share the same date. When I order by created_date DESC, it works and the rows keep the same positions, but when I change the query and run it again, those rows appear in different positions. I don't have any other ORDER BY field, so I'm a bit confused about why this happens and how I can fix it.
Can you please help with this?
To get reproducible results you need to have columns in your order by clause that together are unique. Do you have an ID column? You can use that to tie-break:
ORDER BY created_date DESC, id
I suspect that this is happening because MySQL is not given any ordering information other than ORDER BY created_date DESC, so it does whatever is most convenient for MySQL depending on its complicated inner workings (caching, indexing, etc.). Assuming you have a unique key id, you could do:
SELECT * FROM table t ORDER BY t.created_date DESC, t.id ASC
Which would give you the same result every time because putting a comma in the arguments following ORDER BY gives it a secondary ordering rule that is executed when the first ordering rule doesn't produce a clear order between two rows.
To get consistent results, you will need to add at least one more column to the ORDER BY clause. Since the values in the created_date column are not unique, the order of rows that share a date is not defined. If you wanted that column to be 'unique', you could define it as a timestamp.
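The tie-breaker idea from the answers above can be tried out with a short sketch. It uses Python's built-in sqlite3 module for convenience, not the asker's actual database; the image table layout, the sample dates and the helper ordered_ids are assumptions made for the example.

```python
import sqlite3

# Hypothetical rows: (id, created_date). Ids 2, 3 and 5 share one date.
IMAGES = [
    (1, "2012-05-01"),
    (2, "2012-05-03"),
    (3, "2012-05-03"),
    (4, "2012-05-02"),
    (5, "2012-05-03"),
]


def ordered_ids(rows):
    """Load rows in the given order and return ids sorted with a tie-breaker."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE image (id INTEGER PRIMARY KEY, created_date TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO image (id, created_date) VALUES (?, ?)", rows)
    # created_date alone leaves ties undefined; the unique id settles them.
    result = conn.execute(
        "SELECT id FROM image ORDER BY created_date DESC, id ASC"
    ).fetchall()
    conn.close()
    return [row[0] for row in result]
```

With the unique id as the second sort key, the result no longer depends on the order in which the rows were inserted or on how the engine happens to read them.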