PostgreSQL Index Only Scan Doesn't Work Properly With GROUP BY

I have a table like:
CREATE TABLE summary
(
id serial NOT NULL,
user_id bigint NOT NULL,
country character varying(5),
product_id bigint NOT NULL,
category_id bigint NOT NULL,
text_id bigint NOT NULL,
text character varying(255),
product_type integer NOT NULL,
event_name character varying(255),
report_date date NOT NULL,
currency character varying(5),
revenue double precision,
last_event_time timestamp
);
My table size is 1786 MB (excluding indexes). I've created the following index:
CREATE INDEX "idx_as_type_usr_productId_eventTime"
ON summary USING btree
(product_type, user_id, product_id, last_event_time)
INCLUDE(event_name);
And my simple query looks like this:
select
event_name,
max(last_event_time)
from summary s
where s.user_id = ? and s.product_id = ? and s.product_type = ?
and s.last_event_time > '2020-03-01' and s.last_event_time < '2020-03-25'
group by event_name;
When I EXPLAIN it, the plan looks like this:
HashAggregate (cost=93.82..96.41 rows=259 width=25) (actual time=9187.533..9187.536 rows=10 loops=1)
Group Key: event_name
Buffers: shared hit=70898 read=10579 dirtied=22650
I/O Timings: read=3876.367
-> Index Only Scan using "idx_as_type_usr_productId_eventTime" on summary s (cost=0.56..92.36 rows=292 width=25) (actual time=0.485..9153.812 rows=87322 loops=1)
Index Cond: ((product_type = 2) AND (user_id = ?) AND (product_id = ?) AND (last_event_time > '2020-03-01 00:00:00'::timestamp without time zone) AND (last_event_time < '2020-03-25 00:00:00'::timestamp without time zone))
Heap Fetches: 35967
Buffers: shared hit=70898 read=10579 dirtied=22650
I/O Timings: read=3876.367
Planning Time: 0.452 ms
Execution Time: 9187.583 ms
Here everything looks fine, but when I execute the query it takes more than 10 seconds, and sometimes more than 30 seconds.
If I execute it without the GROUP BY, it returns quickly, in less than 2 seconds. What effect can the GROUP BY have? The result set is small (around 500 rows).
This table receives about 30 insert/update operations per second. Could that be related to this indexing problem?
Update:
Query without GROUP BY:
select
event_name,
last_event_time
from summary s
where s.user_id = ? and s.product_id = ? and s.product_type = ?
and s.last_event_time > '2020-03-01' and s.last_event_time < '2020-03-25';
EXPLAIN without GROUP BY:
Index Only Scan using "idx_as_type_usr_productId_eventTime" on summary s (cost=0.56..92.36 rows=292 width=25) (actual time=0.023..79.138 rows=87305 loops=1)
Index Cond: ((product_type = ?) AND (user_id = ?) AND (product_id = ?) AND (last_event_time > '2020-03-01 00:00:00'::timestamp without time zone) AND (last_event_time < '2020-03-25 00:00:00'::timestamp without time zone))
Heap Fetches: 22949
Buffers: shared hit=37780 read=12143 dirtied=15156
I/O Timings: read=4418.930
Planning Time: 0.639 ms
Execution Time: 4625.213 ms

There are several problems:
- PostgreSQL has to set hint bits, which dirties the pages and causes writes.
- PostgreSQL has to fetch table rows from disk to check their visibility.
- PostgreSQL has to scan roughly 80000 pages to get 87000 rows, so the index must be badly bloated.
The first two can be taken care of by running
VACUUM summary;
which is always a good idea after a bulk load, and the bloat can be cured by
REINDEX INDEX "idx_as_type_usr_productId_eventTime";
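Given the 30 inserts/updates per second mentioned in the question, it is also worth checking whether autovacuum keeps up with the write load; a minimal sketch using the standard statistics view (table name taken from the question):
-- how many dead tuples have accumulated, and when vacuum/analyze last ran
SELECT n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'summary';
If n_dead_tup stays high and last_autovacuum is old, more aggressive autovacuum settings for this table would help keep the visibility map current and the heap fetches down.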

Related

Postgres indexing for timestamp range does not work

I have the following table with 1,000,000 rows
create table event
(
id serial
constraint event_pk
primary key,
type text not null,
start_date timestamp not null,
end_date timestamp not null,
val text
);
and I need to execute a following SQL query
EXPLAIN (analyse, buffers, format text)
SELECT *
from event
WHERE end_date >= '2010-01-12T18:00:00'::timestamp
AND start_date <= '2010-01-13T00:00:00'::timestamp;
Please note that end_date is being compared with a date that is earlier than the one used for start_date.
The question is: what index should I create for such a query?
I've tried following one:
create index my_index
on event (end_date, start_date desc);
But it doesn't work; I can see that a sequential scan is being used:
Seq Scan on event (cost=0.00..53040.01 rows=1967249 width=57) (actual time=0.142..149.163 rows=1971694 loops=1)
Filter: ((end_date >= '2010-01-12 18:00:00'::timestamp without time zone) AND (start_date <= '2010-01-13 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 28307
Buffers: shared hit=15762 read=7278
Planning:
Buffers: shared hit=4
Planning Time: 0.127 ms
Execution Time: 201.610 ms
I cannot understand why my index is not working, because if I try the following index and query:
create index simple
on event (start_date, end_date DESC);
SELECT *
from event
WHERE event.start_date >= '2010-01-12T18:00:00'::timestamp
AND event.end_date <= '2011-01-13T00:00:00'::timestamp;
the index is used just fine:
Bitmap Heap Scan on event (cost=466.91..23418.68 rows=18035 width=57) (actual time=1.954..8.551 rows=15944 loops=1)
Recheck Cond: ((start_date >= '2010-01-12 00:00:00'::timestamp without time zone) AND (end_date <= '2011-01-13 00:00:00'::timestamp without time zone))
Heap Blocks: exact=7694
Buffers: shared hit=7734 read=26
-> Bitmap Index Scan on simple (cost=0.00..462.40 rows=18035 width=0) (actual time=1.314..1.314 rows=15944 loops=1)
Index Cond: ((start_date >= '2010-01-12 00:00:00'::timestamp without time zone) AND (end_date <= '2011-01-13 00:00:00'::timestamp without time zone))
Buffers: shared hit=55 read=11
Planning:
Buffers: shared hit=8
Planning Time: 0.133 ms
Execution Time: 9.025 ms
This query does not do what I need, but I'm just wondering why this index works for that query while my index above does not work for the query I actually need.
The answer is right there in the execution plan:
Seq Scan on event (...) (actual ... rows=1971694 ...)
Filter: ((end_date >= '2010-01-12 18:00:00'::timestamp without time zone) AND (start_date <= '2010-01-13 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 28307
The query returns almost two million rows, and the filter only removes about 28,000, so roughly 99% of the table matches. With that selectivity it is cheaper to read the table sequentially than to jump back and forth between index and heap, which is why a sequential scan is used here.
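If you want to convince yourself that the index itself is fine, you could, as an experiment (not part of the original answer), try a much more selective window and watch whether the plan switches to an index or bitmap scan, for example:
EXPLAIN (analyse, buffers, format text)
SELECT *
from event
WHERE end_date >= '2010-01-12T18:00:00'::timestamp
  AND end_date <  '2010-01-12T19:00:00'::timestamp
  AND start_date <= '2010-01-13T00:00:00'::timestamp;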

Postgres index not using correct plan

My PostgreSQL version is 10.6. I have created an index, but it is not used for all of the WHERE clause conditions. Below are more details:
Create index concurrently ticket_created_at_portal_id_created_by_id_assigned_group_id_idx on ticket(created_at, portal_id, created_by_id, assigned_group);
EXPLAIN (analyze true, verbose true, costs true, buffers true, timing true ) select * from ticket where status is not null
and (assigned_group in ('447') or created_by_id in ('39731566'))
and portal_id=8
and created_at>='2020-12-07T03:00:10.973'
and created_at<='2021-02-05T03:00:10.973'
order by updated_at DESC limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=18975.23..18975.25 rows=10 width=638) (actual time=278.340..278.345 rows=10 loops=1)
Output: id, action, assigned_agent, assigned_at, assigned_group, attachments, closed_at, created_at, created_by_email, created_by_id, description, first_response_time, parent_id, portal_id, priority, reopened_at, resolution_id, resolved_at, resolved_by_id, resource_id, resource_type, source, status, subject, tags, ticket_category, ticket_id, ticket_sub_category, ticket_sub_sub_category, type, updated_at, custom_fields, updated_by_id, first_assigned_at, mode, sla_breached, agent_assist_tags, comm_vendor, from_email
Buffers: shared hit=2280 read=3105
-> Sort (cost=18975.23..18975.45 rows=87 width=638) (actual time=278.338..278.339 rows=10 loops=1)
Output: id, action, assigned_agent, assigned_at, assigned_group, attachments, closed_at, created_at, created_by_email, created_by_id, description, first_response_time, parent_id, portal_id, priority, reopened_at, resolution_id, resolved_at, resolved_by_id, resource_id, resource_type, source, status, subject, tags, ticket_category, ticket_id, ticket_sub_category, ticket_sub_sub_category, type, updated_at, custom_fields, updated_by_id, first_assigned_at, mode, sla_breached, agent_assist_tags, comm_vendor, from_email
Sort Key: ticket.updated_at DESC
Sort Method: top-N heapsort Memory: 33kB
Buffers: shared hit=2280 read=3105
-> Bitmap Heap Scan on public.ticket (cost=17855.76..18973.35 rows=87 width=638) (actual time=111.871..275.835 rows=1256 loops=1)
Output: id, action, assigned_agent, assigned_at, assigned_group, attachments, closed_at, created_at, created_by_email, created_by_id, description, first_response_time, parent_id, portal_id, priority, reopened_at, resolution_id, resolved_at, resolved_by_id, resource_id, resource_type, source, status, subject, tags, ticket_category, ticket_id, ticket_sub_category, ticket_sub_sub_category, type, updated_at, custom_fields, updated_by_id, first_assigned_at, mode, sla_breached, agent_assist_tags, comm_vendor, from_email
Recheck Cond: (((ticket.assigned_group = '447'::bigint) AND (ticket.portal_id = 8)) OR ((ticket.created_at >= '2020-12-07 03:00:10.973'::timestamp without time zone) AND (ticket.created_at <= '2021-02-05 03:00:10.973'::timestamp without time zone) AND (ticket.portal_id = 8) AND (ticket.created_by_id = '39731566'::bigint)))
Filter: ((ticket.status IS NOT NULL) AND (ticket.created_at >= '2020-12-07 03:00:10.973'::timestamp without time zone) AND (ticket.created_at <= '2021-02-05 03:00:10.973'::timestamp without time zone))
Rows Removed by Filter: 1517
Heap Blocks: exact=2638
Buffers: shared hit=2277 read=3105
-> BitmapOr (cost=17855.76..17855.76 rows=291 width=0) (actual time=106.215..106.216 rows=0 loops=1)
Buffers: shared hit=336 read=2408
-> Bitmap Index Scan on ticket_assigned_group_portal_id_assigned_agent_idx (cost=0.00..11.25 rows=282 width=0) (actual time=10.661..10.661 rows=2776 loops=1)
Index Cond: ((ticket.assigned_group = '447'::bigint) AND (ticket.portal_id = 8))
Buffers: shared hit=4 read=15
-> Bitmap Index Scan on ticket_created_at_portal_id_created_by_id_assigned_group_id_idx (cost=0.00..17844.47 rows=9 width=0) (actual time=95.551..95.551 rows=2 loops=1)
Index Cond: ((ticket.created_at >= '2020-12-07 03:00:10.973'::timestamp without time zone) AND (ticket.created_at <= '2021-02-05 03:00:10.973'::timestamp without time zone) AND (ticket.portal_id = 8) AND (ticket.created_by_id = '39731566'::bigint))
Buffers: shared hit=332 read=2393
Planning time: 43.083 ms
Execution time: 278.556 ms
(25 rows)
ticket_created_at_portal_id_created_by_id_assigned_group_id_idx has all the columns of the WHERE clause except status is not null, but the query still uses a separate index for Index Cond: ((ticket.assigned_group = '447'::bigint) AND (ticket.portal_id = 8)), even though those columns are already present in the second index, ticket_created_at_portal_id_created_by_id_assigned_group_id_idx.
Why is that? Even when I included the status column in the index, the query still used 2 indexes and then did a heavy filter in the bitmap heap scan.
How can we optimize it?
I have also experimented with the table's indexes but am still unable to remove any of them. The same columns are repeated in multiple indexes for different queries; please help me reduce the number of indexes if possible.
All indexes for table are:
"ticket_pkey" PRIMARY KEY, btree (id)
"ticket_ticket_id_idx" UNIQUE, btree (ticket_id)
"uk2uors84i0m8sjxc6oaocuy6oj" UNIQUE CONSTRAINT, btree (ticket_id)
"idx_resource_id" btree (resource_id)
"idx_ticket_created_at" btree (created_at)
"ticket_assigned_agent_idx" btree (assigned_agent)
"ticket_assigned_group_idx" btree (assigned_group)
"ticket_assigned_group_portal_id_assigned_agent_idx" btree (assigned_group, portal_id, assigned_agent)
"ticket_created_at_portal_id_created_by_id_assigned_group_id_idx" btree (created_at, portal_id, created_by_id, assigned_group)
"ticket_created_at_portal_id_status_idx" btree (created_at, portal_id, status)
"ticket_id_resolved_at_assigned_group_status_idx" btree (id, resolved_at, assigned_group, status)
How easy would it be to use a phone book (sorted by last-name then first-name) to find every one with a first name of "Francis" whose last name starts with a letter between K and T? Not very easy, because it is not sorted by first name. You would have to go through the entire middle half of the phone book, reading everyone's first name.
Same here. When the first column in your index is used in a range/inequality query rather than an equality, it makes all the columns after it much less efficient. You would want to put the columns used for equality (and not inside an OR) first. Unfortunately, that is only portal_id. The best thing to put next would depend on how selective each of the other conditions is, which we can't guess from the info provided.
In deciding this, status IS NULL would be the same thing as an equality, but status IS NOT NULL is not, as there are any number of values it could have while still being not null, so it is effectively the same as an inequality. If this condition is highly selective, the best way to incorporate it would be in the WHERE of a partial index.
Because of the OR, you might still be best off with 2 indexes which could be combined in a bitmap or.
...(portal_id, assigned_group, created_at) WHERE status IS NOT NULL;
...(portal_id, created_by_id, created_at) WHERE status IS NOT NULL;
Another approach would be to avoid fetching and sorting all of the matching rows, by walking rows in order by updated_at using an index and stopping after 10 of them are found. An index can be used for walking a column in order as long as only things tested for equality (and without ORs) occur before the ORDER BY column, so:
...(portal_id, updated_at) WHERE status IS NOT NULL;
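Spelled out in full (the index names below are just placeholders, not from the original answer), those suggestions might look like:
CREATE INDEX ticket_portal_group_created_partial_idx
    ON ticket (portal_id, assigned_group, created_at) WHERE status IS NOT NULL;
CREATE INDEX ticket_portal_creator_created_partial_idx
    ON ticket (portal_id, created_by_id, created_at) WHERE status IS NOT NULL;
-- for the "walk updated_at in order and stop after 10 rows" approach
CREATE INDEX ticket_portal_updated_partial_idx
    ON ticket (portal_id, updated_at) WHERE status IS NOT NULL;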

Database design for time series

Approximately every 10 min I insert ~50 records with the same timestamp.
That is ~300 records per hour, ~7,200 records per day, or roughly 2.6 million records per year.
User wants to retrieve all records for the timestamp closest to the asked time.
Design #1 - one table with index on timestamp column:
CREATE TABLE A (t timestamp, value int);
CREATE INDEX a_idx ON A (t);
A single insert statement creates ~50 records with the same timestamp:
INSERT INTO A VALUES
('2019-01-02 10:00', 5),
('2019-01-02 10:00', 12),
('2019-01-02 10:00', 7),
...;
Get all records which are closest to the asked time
(I use the function greatest() available in PostgreSQL):
SELECT * FROM A WHERE t =
(SELECT t FROM A ORDER BY greatest(t - asked_time, asked_time - t) LIMIT 1)
I think this query is not efficient because it requires a full table scan.
I plan to partition table A by timestamp to have 1 partition per year, but the approximate match above will still be slow.
Design #2 - create 2 tables:
1st table: to keep the unique timestamps and auto-incremented PK,
2nd table: to keep data and the foreign key on 1st table PK
CREATE TABLE UNIQ_TIMESTAMP (id SERIAL PRIMARY KEY, t timestamp);
CREATE TABLE DATA (id INTEGER REFERENCES UNIQ_TIMESTAMP (id), value int);
CREATE INDEX data_time_idx ON DATA (id);
Get all records which are closest to the asked time:
SELECT * FROM DATA WHERE id =
(SELECT id FROM UNIQ_TIMESTAMP ORDER BY greatest(t - asked_time, asked_time - t) LIMIT 1)
It should run faster compared to Design #1 because the nested select scans the smaller table.
Disadvantages of this approach:
- I have to insert into 2 tables instead of just one
- I lose the ability to partition the DATA table by timestamp
What would you recommend?
I'd go with the single table approach, perhaps partitioned by year so that it becomes easy to get rid of old data.
Create an index like
CREATE INDEX ON a (date_trunc('hour', t + INTERVAL '30 minutes'));
Then use your query like you wrote it, but add
AND date_trunc('hour', t + INTERVAL '30 minutes')
= date_trunc('hour', asked_time + INTERVAL '30 minutes')
The additional condition acts as a filter and can use the index.
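Putting the pieces together, this is my reading of the suggestion applied to the Design #1 query (a sketch only; asked_time stands in for the literal timestamp you actually pass):
SELECT *
FROM a
WHERE date_trunc('hour', t + INTERVAL '30 minutes')
      = date_trunc('hour', asked_time + INTERVAL '30 minutes')
  AND t = (SELECT t
           FROM a
           WHERE date_trunc('hour', t + INTERVAL '30 minutes')
                 = date_trunc('hour', asked_time + INTERVAL '30 minutes')
           ORDER BY greatest(t - asked_time, asked_time - t)
           LIMIT 1);
The extra condition in the subquery is what lets the expression index narrow the scan to roughly one hour's worth of rows instead of ordering the whole table.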
You can use a UNION of two queries to find all timestamps closest to a given one:
(
select t
from a
where t >= timestamp '2019-03-01 17:00:00'
order by t
limit 1
)
union all
(
select t
from a
where t <= timestamp '2019-03-01 17:00:00'
order by t desc
limit 1
)
That will efficiently make use of an index on t. On a table with 10 million rows (~3 years of data), I get the following execution plan:
Append (cost=0.57..1.16 rows=2 width=8) (actual time=0.381..0.407 rows=2 loops=1)
Buffers: shared hit=6 read=4
I/O Timings: read=0.050
-> Limit (cost=0.57..0.58 rows=1 width=8) (actual time=0.380..0.381 rows=1 loops=1)
Output: a.t
Buffers: shared hit=1 read=4
I/O Timings: read=0.050
-> Index Only Scan using a_t_idx on stuff.a (cost=0.57..253023.35 rows=30699415 width=8) (actual time=0.380..0.380 rows=1 loops=1)
Output: a.t
Index Cond: (a.t >= '2019-03-01 17:00:00'::timestamp without time zone)
Heap Fetches: 0
Buffers: shared hit=1 read=4
I/O Timings: read=0.050
-> Limit (cost=0.57..0.58 rows=1 width=8) (actual time=0.024..0.025 rows=1 loops=1)
Output: a_1.t
Buffers: shared hit=5
-> Index Only Scan Backward using a_t_idx on stuff.a a_1 (cost=0.57..649469.88 rows=78800603 width=8) (actual time=0.024..0.024 rows=1 loops=1)
Output: a_1.t
Index Cond: (a_1.t <= '2019-03-01 17:00:00'::timestamp without time zone)
Heap Fetches: 0
Buffers: shared hit=5
Planning Time: 1.823 ms
Execution Time: 0.425 ms
As you can see it only requires very few I/O operations and that is pretty much independent of the table size.
The above can be used for an IN condition:
select *
from a
where t in (
(select t
from a
where t >= timestamp '2019-03-01 17:00:00'
order by t
limit 1)
union all
(select t
from a
where t <= timestamp '2019-03-01 17:00:00'
order by t desc
limit 1)
);
If you know you will never have more than 100 values close to that requested timestamp, you could remove the IN query completely and simply use a limit 100 in both parts of the union. That makes the query a bit more efficient as there is no second step for evaluating the IN condition, but might return more rows than you want.
If you always look for timestamps in the same year, then partitioning by year will indeed help with this.
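For reference, declarative range partitioning by year (available since PostgreSQL 10) could look roughly like the sketch below; the partition names are placeholders, and it is written as if the table were created from scratch, since an existing plain table cannot simply be switched to a partitioned one in place:
CREATE TABLE a (t timestamp, value int) PARTITION BY RANGE (t);
CREATE TABLE a_2018 PARTITION OF a FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');
CREATE TABLE a_2019 PARTITION OF a FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');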
You can put the UNION query into a function if it is too complicated to write inline:
create or replace function get_closest(p_tocheck timestamp)
returns timestamp
as
$$
select *
from (
(select t
from a
where t >= p_tocheck
order by t
limit 1)
union all
(select t
from a
where t <= p_tocheck
order by t desc
limit 1)
) x
order by greatest(t - p_tocheck, p_tocheck - t)
limit 1;
$$
language sql stable;
Then the query gets as simple as:
select *
from a
where t = get_closest(timestamp '2019-03-01 17:00:00');
Another solution is to use the btree_gist extension, which provides a "distance" operator <->.
Then you can create a GiST index on the timestamp column:
create index on a using gist (t);
and use the following query:
select *
from a where t in (select t
from a
order by t <-> timestamp '2019-03-01 17:00:00'
limit 1);
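Note that the btree_gist extension has to be installed in the database before such a GiST index on a plain timestamp column can be created; a minimal sketch:
-- run once per database
CREATE EXTENSION IF NOT EXISTS btree_gist;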

How to auto-extract or index day from timestamp in Postgres?

We have a Postgres table with a timestamp field created_at. On a regular basis, we need to find all the records with the day field of created_at being a certain number.
We can run a query like
select * from table where extract(day from created_at) = 3;
I suspect this isn't efficient, i.e. it's doing a full-table scan. If so, can I somehow create an index to make the above efficient?
If it's not possible, we can create a separate column called created_at_day and create an index on it.
So we can simply run the query like
select * from table where created_at_day = 3;
Let's say created_at can be updated. Whenever this happens, created_at_day should be updated, too.
Does Postgres provide any support to automatically keep created_at_day in sync with created_at? If so, how?
Of course this can be done in the application logic. So whenever created_at is created or updated, we update the created_at_day column. But just wondering if there's an easier, automated way to do this.
Thanks
You can create an index on extract(day from created_at).
To see the difference:
Create a table
knayak=# create table t as select i ,now()::timestamp + interval '1 days' * i as created_at from generate_series(1,10000) as i;
SELECT 10000
Create normal index on created_at
knayak=# create index ind_created_at on t(created_at);
CREATE INDEX
knayak=# explain analyze select * from t where extract(day from created_at) = 3;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..205.00 rows=50 width=12) (actual time=1.049..6.020 rows=328 loops=1)
Filter: (date_part('day'::text, created_at) = '3'::double precision)
Rows Removed by Filter: 9672
Planning time: 0.392 ms
Execution time: 6.070 ms
(5 rows)
Create index with extract
knayak=# drop index ind_created_at;
DROP INDEX
knayak=# create index ind_created_at on t( extract(day from created_at) );
CREATE INDEX
knayak=# explain analyze select * from t where extract(day from created_at) = 3;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=4.67..61.66 rows=50 width=12) (actual time=0.110..0.260 rows=328 loops=1)
Recheck Cond: (date_part('day'::text, created_at) = '3'::double precision)
Heap Blocks: exact=54
-> Bitmap Index Scan on ind_created_at (cost=0.00..4.66 rows=50 width=0) (actual time=0.093..0.093 rows=328 loops=1)
Index Cond: (date_part('day'::text, created_at) = '3'::double precision)
Planning time: 0.316 ms
Execution time: 0.314 ms
(7 rows)
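One caveat worth adding (my note, not part of the answer above): the planner only considers an expression index when the query contains the indexed expression essentially verbatim, so an equivalent but differently written predicate loses the index. A small sketch against the demo table t:
-- matches the indexed expression, can use ind_created_at
select * from t where extract(day from created_at) = 3;
-- different expression tree (extra cast on top), cannot use ind_created_at
select * from t where extract(day from created_at)::int = 3;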

Limit slows down my postgres query

Hi, I have a simple query on a single table which runs pretty fast, but I want to page my results, and the LIMIT slows down the select incredibly. The table contains about 80 million rows. I'm on Postgres 9.2.
Without LIMIT it takes 330 ms and returns 2100 rows:
EXPLAIN SELECT * from interval where username='1228321f131084766f3b0c6e40bc5edc41d4677e' order by time desc
Sort (cost=156599.71..156622.43 rows=45438 width=108)
Sort Key: "time"
-> Bitmap Heap Scan on "interval" (cost=1608.05..155896.71 rows=45438 width=108)
Recheck Cond: ((username)::text = '1228321f131084766f3b0c6e40bc5edc41d4677e'::text)
-> Bitmap Index Scan on interval_username (cost=0.00..1605.77 rows=45438 width=0)
Index Cond: ((username)::text = '1228321f131084766f3b0c6e40bc5edc41d4677e'::text)
EXPLAIN ANALYZE SELECT * from interval where
username='1228321f131084766f3b0c6e40bc5edc41d4677e' order by time desc
Sort (cost=156599.71..156622.43 rows=45438 width=108) (actual time=1.734..1.887 rows=2131 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 396kB
-> Bitmap Heap Scan on "interval" (cost=1608.05..155896.71 rows=45438 width=108) (actual time=0.425..0.934 rows=2131 loops=1)
Recheck Cond: ((username)::text = '1228321f131084766f3b0c6e40bc5edc41d4677e'::text)
-> Bitmap Index Scan on interval_username (cost=0.00..1605.77 rows=45438 width=0) (actual time=0.402..0.402 rows=2131 loops=1)
Index Cond: ((username)::text = '1228321f131084766f3b0c6e40bc5edc41d4677e'::text)
Total runtime: 2.065 ms
With LIMIT it takes several minutes (I never waited for it to finish):
EXPLAIN SELECT * from interval where username='1228321f131084766f3b0c6e40bc5edc41d4677e' order by time desc LIMIT 10
Limit (cost=0.00..6693.99 rows=10 width=108)
-> Index Scan Backward using interval_time on "interval" (cost=0.00..30416156.03 rows=45438 width=108)
Filter: ((username)::text = '1228321f131084766f3b0c6e40bc5edc41d4677e'::text)
Table definition
-- Table: "interval"
-- DROP TABLE "interval";
CREATE TABLE "interval"
(
uuid character varying(255) NOT NULL,
deleted boolean NOT NULL,
id bigint NOT NULL,
"interval" bigint NOT NULL,
"time" timestamp without time zone,
trackerversion character varying(255),
username character varying(255),
CONSTRAINT interval_pkey PRIMARY KEY (uuid),
CONSTRAINT fk_272h71b2gfyov9fwnksyditdd FOREIGN KEY (username)
REFERENCES appuser (panelistcode) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT uk_hyi5iws50qif6jwky9xcch3of UNIQUE (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE "interval"
OWNER TO postgres;
-- Index: interval_time
-- DROP INDEX interval_time;
CREATE INDEX interval_time
ON "interval"
USING btree
("time");
-- Index: interval_username
-- DROP INDEX interval_username;
CREATE INDEX interval_username
ON "interval"
USING btree
(username COLLATE pg_catalog."default");
-- Index: interval_uuid
-- DROP INDEX interval_uuid;
CREATE INDEX interval_uuid
ON "interval"
USING btree
(uuid COLLATE pg_catalog."default");
Further results
SELECT n_distinct FROM pg_stats WHERE tablename='interval' AND attname='username';
n_distinct=1460
SELECT AVG(length) FROM (SELECT username, COUNT(*) AS length FROM interval GROUP BY username) as freq;
45786.022605591910
SELECT COUNT(*) FROM interval WHERE username='1228321f131084766f3b0c6e40bc5edc41d4677e';
2131
The planner is expecting 45438 rows for username '1228321f131084766f3b0c6e40bc5edc41d4677e', while in reality there are only 2131 rows with it, thus it thinks it will find the 10 rows you want faster by looking backward through the interval_time index.
Try increasing the stats on the username column and see whether the query plan will change.
ALTER TABLE interval ALTER COLUMN username SET STATISTICS 100;
ANALYZE interval;
You can try different values of statistics up to 10000.
If you are still not satisfied with the plan and you are sure that you can do better than the planner and know what you are doing, then you can bypass any index easily by performing some operation over it that does not change its value.
For example, instead of ORDER BY time, you can use ORDER BY time + '0 seconds'::interval. That way any index on the value of time stored in the table will be bypassed. For integer values you can multiply * 1, etc.
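Applied to the query from the question, that trick might look like the sketch below; the expression keeps the same sort order but no longer matches the interval_time index, so the planner is expected to fall back to the much more selective interval_username index:
SELECT *
FROM "interval"
WHERE username = '1228321f131084766f3b0c6e40bc5edc41d4677e'
ORDER BY "time" + '0 seconds'::interval DESC
LIMIT 10;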
The page http://thebuild.com/blog/2014/11/18/when-limit-attacks/ showed that I could force Postgres to do better by using a CTE:
WITH inner_query AS (SELECT * from interval where username='7823721a3eb9243be63c6c3a13dffee44753cda6')
SELECT * FROM inner_query order by time desc LIMIT 10;
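As an aside (my note, not from the linked page): this works on 9.2 because a CTE is always materialized there and acts as an optimization fence, so the username filter is resolved before the ORDER BY ... LIMIT. On PostgreSQL 12 and later the planner may inline the CTE, so the fence has to be requested explicitly, roughly like this:
WITH inner_query AS MATERIALIZED (
    SELECT * FROM "interval"
    WHERE username = '7823721a3eb9243be63c6c3a13dffee44753cda6'
)
SELECT * FROM inner_query ORDER BY "time" DESC LIMIT 10;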