Postgres Explain Plans Are DIfferent - postgresql

I have 2 databases running on the same server.
Both are development databases with the same structure, but different amounts of data.
Running Postgres 9.2 on Linux (Centos/Redhat)
I'm running the following SQL on each database, but getting very different results in the performance.
Note: the write_history table structure is identical in each DB and have the same indexes/constraints.
Here is the SQL:
explain analyze SELECT * FROM write_history WHERE fk_device_rw_id =
'd2c969b8-2609-11e3-80ca-0750cff1e96c' AND fk_write_history_status_id
= 5 ORDER BY update_time DESC LIMIT 1 ;
And the Explain Plans for each DB:
DB1 - PreProd
Limit (cost=57902.39..57902.39 rows=1 width=103) (actual
time=0.056..0.056 rows=0 loops=1)' -> Sort
(cost=57902.39..57908.69 rows=2520 width=103) (actual
time=0.053..0.053 rows=0 loops=1)'
Sort Key: update_time'
Sort Method: quicksort Memory: 25kB'
-> Bitmap Heap Scan on write_history (cost=554.04..57889.79 rows=2520 width=103) (actual time=0.033..0.033 rows=0 loops=1)'
Recheck Cond: (fk_device_rw_id = 'd2c969b8-2609-11e3-80ca-0750cff1e96c'::uuid)'
Filter: (fk_write_history_status_id = 5)'
-> Bitmap Index Scan on idx_write_history_fk_device_rw_id (cost=0.00..553.41 rows=24034
width=0) (actual time=0.028..0.028 rows=0 loops=1)'
Index Cond: (fk_device_rw_id = 'd2c969b8-2609-11e3-80ca-0750cff1e96c'::uuid)' Total runtime: 0.112 ms'
DB2 - QA
Limit (cost=50865.41..50865.41 rows=1 width=108) (actual
time=545.521..545.523 rows=1 loops=1)' -> Sort
(cost=50865.41..50916.62 rows=20484 width=108) (actual
time=545.518..545.518 rows=1 loops=1)'
Sort Key: update_time'
Sort Method: top-N heapsort Memory: 25kB'
-> Bitmap Heap Scan on write_history (cost=1431.31..50762.99 rows=20484 width=108) (actual time=21.931..524.034 rows=22034
loops=1)'
Recheck Cond: (fk_device_rw_id = 'd2cd81a6-2609-11e3-b574-47328bfa4c38'::uuid)'
Rows Removed by Index Recheck: 1401986'
Filter: (fk_write_history_status_id = 5)'
Rows Removed by Filter: 40161'
-> Bitmap Index Scan on idx_write_history_fk_device_rw_id (cost=0.00..1426.19 rows=62074
width=0) (actual time=19.167..19.167 rows=62195 loops=1)'
Index Cond: (fk_device_rw_id = 'd2cd81a6-2609-11e3-b574-47328bfa4c38'::uuid)' Total runtime: 545.588 ms'
So a few questions:
what is the difference between "Sort Method: quicksort Memory: 25kB" and "Sort Method: top-N heapsort Memory: 25kB" ?
What could be the cause for the total runtime to be so different?
Table Row Count:
DB1 : write_history row count: 5,863,565
DB2 : write_history row count: 2,670,888
Please let me know if further information is required.
Thanks for the help.

A top-N sort means that it is sorting in support of an ORDER BY ... LIMIT N, and that it will throw away any tuples once it can show that tuple cannot be in the top N. The decision to switch to a top-N sort is made dynamically during the sort process. Since that sort had zero input tuples, it never made the decision to switch to it. So the difference in the reported methods is an effect, not a cause; and is no importance to this case.
I think the key for you is the bitmap heap scans:
(actual time=0.033..0.033 rows=0 loops=1)
(actual time=21.931..524.034 rows=22034 loops=1)
The smaller database has a lot more rows which match your criteria, so has a lot more work to do.
Also, the amount of rechecking work that needs to be done:
Rows Removed by Index Recheck: 1401986
suggests you have work_mem set to an very small value of work_mem, so your bitmap is overflowing it.

1) Two different sorting algorithms.
http://en.wikipedia.org/wiki/Quicksort
http://en.wikipedia.org/wiki/Heapsort
2) Volume of data being sorted will change the decision.
From memory of database theory; QuickSort performs very badly on large volumes of data due to the sequence it scans through the data. It's particularly bad if sorting requires caching to disk. Heap sort has a much better worst case than quick sort. The query planner knows this and will change the plan accordingly.
The number of rows being sorted is 8X larger. Be very careful to notice the number of rows expected to be selected as well as the number of rows in the table. The query planner uses histograms of the data to get a better picture of what will be selected not simply the number of rows in the table. In DB1 it's 2.5K, in DB2 it's 20K
You can check if these estimates are correct by checking if the resulting set of rows matches the count in the estimate. If not then consider performing an ANALYZE

Related

postgresql search is slow on type text[] column

I have product_details table with 30+ Million records. product attributes text type data is stored into column Value1.
Front end(web) users search for product details and it will be queried on column Value1.
create table product_details(
key serial primary key ,
product_key int,
attribute_key int ,
Value1 text[],
Value2 int[],
status text);
I created gin index on column Value1 to improve search query performance.
query execution improved a lot for many queries.
Tables and indexes are here
Below is one of query used by application for search.
select p.key from (select x.product_key,
x.value1,
x.attribute_key,
x.status
from product_details x
where value1 IS NOT NULL
) as pr_d
join attribute_type at on at.key = pr_d.attribute_key
join product p on p.key = pr_d.product_key
where value1_search(pr_d.value1) ilike '%B s%'
and at.type = 'text'
and at.status = 'active'
and pr_d.status = 'active'
and 1 = 1
and p.product_type_key=1
and 1 = 1
group by p.key
query is executed in 2 or 3 secs if we search %B % or any single or two char words and below is query plan
Group (cost=180302.82..180302.83 rows=1 width=4) (actual time=49.006..49.021 rows=65 loops=1)
Group Key: p.key
-> Sort (cost=180302.82..180302.83 rows=1 width=4) (actual time=49.005..49.009 rows=69 loops=1)
Sort Key: p.key
Sort Method: quicksort Memory: 28kB
-> Nested Loop (cost=0.99..180302.81 rows=1 width=4) (actual time=3.491..48.965 rows=69 loops=1)
Join Filter: (x.attribute_key = at.key)
Rows Removed by Join Filter: 10051
-> Nested Loop (cost=0.99..180270.15 rows=1 width=8) (actual time=3.396..45.211 rows=69 loops=1)
-> Index Scan using products_product_type_key_status on product p (cost=0.43..4420.58 rows=1413 width=4) (actual time=0.024..1.473 rows=1630 loops=1)
Index Cond: (product_type_key = 1)
-> Index Scan using product_details_product_attribute_key_status on product_details x (cost=0.56..124.44 rows=1 width=8) (actual time=0.026..0.027 rows=0 loops=1630)
Index Cond: ((product_key = p.key) AND (status = 'active'))
Filter: ((value1 IS NOT NULL) AND (value1_search(value1) ~~* '%B %'::text))
Rows Removed by Filter: 14
-> Seq Scan on attribute_type at (cost=0.00..29.35 rows=265 width=4) (actual time=0.002..0.043 rows=147 loops=69)
Filter: ((value_type = 'text') AND (status = 'active'))
Rows Removed by Filter: 115
Planning Time: 0.732 ms
Execution Time: 49.089 ms
But if i search for %B s%, query took 75 secs and below is query plan (second time query execution took 63 sec)
In below query plan, DB engine didn't consider index for scan as in above query plan indexes were used. Not sure why ?
Group (cost=8057.69..8057.70 rows=1 width=4) (actual time=62138.730..62138.737 rows=12 loops=1)
Group Key: p.key
-> Sort (cost=8057.69..8057.70 rows=1 width=4) (actual time=62138.728..62138.732 rows=14 loops=1)
Sort Key: p.key
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=389.58..8057.68 rows=1 width=4) (actual time=2592.685..62138.710 rows=14 loops=1)
-> Hash Join (cost=389.15..4971.85 rows=368 width=4) (actual time=298.280..62129.956 rows=831 loops=1)
Hash Cond: (x.attribute_type = at.key)
-> Bitmap Heap Scan on product_details x (cost=356.48..4937.39 rows=681 width=8) (actual time=298.117..62128.452 rows=831 loops=1)
Recheck Cond: (value1_search(value1) ~~* '%B s%'::text)
Rows Removed by Index Recheck: 26168889
Filter: ((value1 IS NOT NULL) AND (status = 'active'))
Rows Removed by Filter: 22
Heap Blocks: exact=490 lossy=527123
-> Bitmap Index Scan on product_details_value1_gin (cost=0.00..356.31 rows=1109 width=0) (actual time=251.596..251.596 rows=2846970 loops=1)
Index Cond: (value1_search(value1) ~~* '%B s%'::text)
-> Hash (cost=29.35..29.35 rows=265 width=4) (actual time=0.152..0.153 rows=269 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 18kB
-> Seq Scan on attribute_type at (cost=0.00..29.35 rows=265 width=4) (actual time=0.010..0.122 rows=269 loops=1)
Filter: ((value_type = 'text') AND (status = 'active'))
Rows Removed by Filter: 221
-> Index Scan using product_pkey on product p (cost=0.43..8.39 rows=1 width=4) (actual time=0.009..0.009 rows=0 loops=831)
Index Cond: (key = x.product_key)
Filter: (product_type_key = 1)
Rows Removed by Filter: 1
Planning Time: 0.668 ms
Execution Time: 62138.794 ms
Any suggestions pls to improve query for search %B s%
thanks
ilike '%B %' has no usable trigrams in it. The planner knows this, and punishes the pg_trgm index plan so much that the planner then goes with an entirely different plan instead.
But ilike '%B s%' does have one usable trigram in it, ' s'. It turns out that this trigram sucks because it is extremely common in the searched data, but the planner currently has no way to accurately estimate how much it sucks.
Even worse, this large number matches means your full bitmap can't fit in work_mem so it goes lossy. Then it needs to recheck all the tuples in any page which contains even one tuple that has the ' s' trigram in it, which looks like it is most of the pages in your table.
The first thing to do is to increase your work_mem to the point you stop getting lossy blocks. If most of your time is spent in the CPU applying the recheck condition, this should help tremendously. If most of your time is spent reading the product_details from disk (so that the recheck has the data it needs to run) then it won't help much. If you had done EXPLAIN (ANALYZE, BUFFERS) with track_io_timing turned on, then we would already know which is which.
Another thing you could do is have the application inspect the search parameter, and if it looks like two letters (with or without a space between), then forcibly disable that index usage, or just throw an error if there is no good reason to do that type of search. For example, changing the part of the query to look like this will disable the index:
where value1_search(pr_d.value1)||'' ilike '%B s%'
Another thing would be to rethink your data representation. '%B s%' is a peculiar thing to search for. Why would anyone search for that? Does it have some special meaning within the context of your data, which is not obvious to the outside observer? Maybe you could represent it in a different way that gets along better with pg_trgm.
Finally, you could try to improve the planning for GIN indexes generally by explicitly estimating how many tuples are going to fail recheck (due to inherent lossiness of the index, not due to overrunning work_mem). This would be a major undertaking, and you would be unlikely to see it in production for at least a couple years, if ever.

Gap in the execution-time parts in the query plan

I have trouble understanding the following query plan (anonyized). There seems to be a gap in how much time the actual parts of the query took and a part is missing.
The relevant part:
-> Sort (cost=306952.44..307710.96 rows=303409 width=40) (actual time=4222.122..4400.534 rows=582081 loops=1)
Sort Key: table1.column3, table1.column2
Sort Method: quicksort Memory: 71708kB
Buffers: shared hit=38058
-> Index Scan using myindex on table1 (cost=0.56..279325.68 rows=303409 width=40) (actual time=0.056..339.565 rows=582081 loops=1)
Index Cond: (((column1)::text = 'xxx1'::text) AND ((column2)::text > 'xxx2'::text) AND ((column2)::text < 'xxx3'::text))
Buffers: shared hit=38058
What happened between the index scan (up until 339.565) and starting to sort the result (4222.122)?
And the full plan (just in case):
GroupAggregate (cost=344290.95..348368.01 rows=30341 width=65) (actual time=4933.739..4933.806 rows=11 loops=1)
Group Key: t.column1, t.current_value, t.previous_value
Buffers: shared hit=38058
-> Sort (cost=344290.95..345045.68 rows=301892 width=72) (actual time=4933.712..4933.719 rows=58 loops=1)
Sort Key: t.column1, t.current_value, t.previous_value
Sort Method: quicksort Memory: 32kB
Buffers: shared hit=38058
-> Subquery Scan on t (cost=306952.44..316813.23 rows=301892 width=72) (actual time=4573.523..4933.607 rows=58 loops=1)
Filter: ((t.current_value)::text <> (t.previous_value)::text)
Rows Removed by Filter: 582023
Buffers: shared hit=38058
-> WindowAgg (cost=306952.44..313020.62 rows=303409 width=72) (actual time=4222.144..4859.579 rows=582081 loops=1)
Buffers: shared hit=38058
-> Sort (cost=306952.44..307710.96 rows=303409 width=40) (actual time=4222.122..4400.534 rows=582081 loops=1)
Sort Key: table1.column3, table1.column2
Sort Method: quicksort Memory: 71708kB
Buffers: shared hit=38058
-> Index Scan using myindex on table1 (cost=0.56..279325.68 rows=303409 width=40) (actual time=0.056..339.565 rows=582081 loops=1)
Index Cond: (((column1)::text = 'xxx1'::text) AND ((column2)::text > 'xxx2'::text) AND ((column2)::text < 'xxx3'::text))
Buffers: shared hit=38058
Planning Time: 0.405 ms
Execution Time: 4941.003 ms
Both cost and actual time data are shown as two numbers in these plans. The first is the setup time. Sloppily described, it's the time the query step takes to deliver its first row. See this. The second is the completion time, or time to last row.
actual time=setup...completion
The setup on a sort step includes the time required to retrieve the result set needing sorting and to actually perform the sort: shuffle the rows around in RAM or hopefully not, on disk. (Because quicksort is generally O(n log(n)) in complexity, that can be long for big result sets. You knew that.)
In your plan, the inner sort handles almost 600K rows that came from an index scan. It's the step where the sort takes time. It used 71,708 kilobytes of RAM.
As far as I can tell, there's nothing anomalous in your plan. How to speed up that sort? You might try shorter or fixed-length column2 and column3 data types, but that's all guesswork without seeing your query and table definitions.

LIMIT with ORDER BY makes query slow

I am having problems optimizing a query in PostgreSQL 9.5.14.
select *
from file as f
join product_collection pc on (f.product_collection_id = pc.id)
where pc.mission_id = 7
order by f.id asc
limit 100;
Takes about 100 seconds. If I drop the limit clause it takes about 0.5:
With limit:
explain (analyze,buffers) ... -- query exactly as above
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.84..859.32 rows=100 width=457) (actual time=102793.422..102856.884 rows=100 loops=1)
Buffers: shared hit=222430592
-> Nested Loop (cost=0.84..58412343.43 rows=6804163 width=457) (actual time=102793.417..102856.872 rows=100 loops=1)
Buffers: shared hit=222430592
-> Index Scan using file_pkey on file f (cost=0.57..23409008.61 rows=113831736 width=330) (actual time=0.048..28207.152 rows=55858772 loops=1)
Buffers: shared hit=55652672
-> Index Scan using product_collection_pkey on product_collection pc (cost=0.28..0.30 rows=1 width=127) (actual time=0.001..0.001 rows=0 loops=55858772)
Index Cond: (id = f.product_collection_id)
Filter: (mission_id = 7)
Rows Removed by Filter: 1
Buffers: shared hit=166777920
Planning time: 0.803 ms
Execution time: 102856.988 ms
Without limit:
=> explain (analyze,buffers) ... -- query as above, just without limit
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=20509671.01..20526681.42 rows=6804163 width=457) (actual time=456.175..510.596 rows=142055 loops=1)
Sort Key: f.id
Sort Method: quicksort Memory: 79392kB
Buffers: shared hit=37956
-> Nested Loop (cost=0.84..16494851.02 rows=6804163 width=457) (actual time=0.044..231.051 rows=142055 loops=1)
Buffers: shared hit=37956
-> Index Scan using product_collection_mission_id_index on product_collection pc (cost=0.28..46.13 rows=87 width=127) (actual time=0.017..0.101 rows=87 loops=1)
Index Cond: (mission_id = 7)
Buffers: shared hit=10
-> Index Scan using file_product_collection_id_index on file f (cost=0.57..187900.11 rows=169535 width=330) (actual time=0.007..1.335 rows=1633 loops=87)
Index Cond: (product_collection_id = pc.id)
Buffers: shared hit=37946
Planning time: 0.807 ms
Execution time: 569.865 ms
I have copied the database to a backup server so that I may safely manipulate the database without something else changing it on me.
Cardinalities:
Table file: 113,831,736 rows.
Table product_collection: 1370 rows.
The query without LIMIT: 142,055 rows.
SELECT count(*) FROM product_collection WHERE mission_id = 7: 87 rows.
What I have tried:
searching stack overflow
vacuum full analyze
creating two column indexes on file.product_collection_id & file.id. (there already are single column indexes on every field touched.)
creating two column indexes on file.id & file.product_collection_id.
increasing the statistics on file.id & file.product_collection_id, then re-vacuum analyze.
changing various query planner settings.
creating non-materialized views.
walking up and down the hallway while muttering to myself.
None of them seem to change the performance in a significant way.
Thoughts?
UPDATE from OP:
Tested this on PostgreSQL 9.6 & 10.4, and found no significant changes in plans or performance.
However, setting random_page_cost low enough is the only way to get faster performance on the without limit search.
With a default random_page_cost = 4, the without limit:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=9270013.01..9287875.64 rows=7145054 width=457) (actual time=47782.523..47843.812 rows=145697 loops=1)
Sort Key: f.id
Sort Method: external sort Disk: 59416kB
Buffers: shared hit=3997185 read=1295264, temp read=7427 written=7427
-> Hash Join (cost=24.19..6966882.72 rows=7145054 width=457) (actual time=1.323..47458.767 rows=145697 loops=1)
Hash Cond: (f.product_collection_id = pc.id)
Buffers: shared hit=3997182 read=1295264
-> Seq Scan on file f (cost=0.00..6458232.17 rows=116580217 width=330) (actual time=0.007..17097.581 rows=116729984 loops=1)
Buffers: shared hit=3997169 read=1295261
-> Hash (cost=23.08..23.08 rows=89 width=127) (actual time=0.840..0.840 rows=87 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 15kB
Buffers: shared hit=13 read=3
-> Bitmap Heap Scan on product_collection pc (cost=4.97..23.08 rows=89 width=127) (actual time=0.722..0.801 rows=87 loops=1)
Recheck Cond: (mission_id = 7)
Heap Blocks: exact=10
Buffers: shared hit=13 read=3
-> Bitmap Index Scan on product_collection_mission_id_index (cost=0.00..4.95 rows=89 width=0) (actual time=0.707..0.707 rows=87 loops=1)
Index Cond: (mission_id = 7)
Buffers: shared hit=3 read=3
Planning time: 0.929 ms
Execution time: 47911.689 ms
User Erwin's answer below will take me some time to fully understand and generalize to all of the use cases needed. In the mean time we will probably use either a materialized view or just flatten our table structure.
This query is harder for the Postgres query planner than it might look. Depending on cardinalities, data distribution, value frequencies, sizes, ... completely different query plans can prevail and the planner has a hard time predicting which is best. Current versions of Postgres are better at this in several aspects, but it's still hard to optimize.
Since you retrieve only relatively few rows from product_collection, this equivalent query with LIMIT in a LATERAL subquery should avoid performance degradation:
SELECT *
FROM product_collection pc
CROSS JOIN LATERAL (
SELECT *
FROM file f -- big table
WHERE f.product_collection_id = pc.id
ORDER BY f.id
LIMIT 100
) f
WHERE pc.mission_id = 7
ORDER BY f.id
LIMIT 100;
Edit: This results in a query plan with explain (analyze,verbose) provided by the OP:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=30524.34..30524.59 rows=100 width=457) (actual time=13.128..13.167 rows=100 loops=1)
Buffers: shared hit=3213
-> Sort (cost=30524.34..30546.09 rows=8700 width=457) (actual time=13.126..13.152 rows=100 loops=1)
Sort Key: file.id
Sort Method: top-N heapsort Memory: 76kB
Buffers: shared hit=3213
-> Nested Loop (cost=0.57..30191.83 rows=8700 width=457) (actual time=0.060..9.868 rows=2880 loops=1)
Buffers: shared hit=3213
-> Seq Scan on product_collection pc (cost=0.00..69.12 rows=87 width=127) (actual time=0.024..0.336 rows=87 loops=1)
Filter: (mission_id = 7)
Rows Removed by Filter: 1283
Buffers: shared hit=13
-> Limit (cost=0.57..344.24 rows=100 width=330) (actual time=0.008..0.071 rows=33 loops=87)
Buffers: shared hit=3200
-> Index Scan using file_pc_id_index on file (cost=0.57..582642.42 rows=169535 width=330) (actual time=0.007..0.065 rows=33 loops=87)
Index Cond: (product_collection_id = pc.id)
Buffers: shared hit=3200
Planning time: 0.595 ms
Execution time: 13.319 ms
You need these indexes (will help your original query, too):
CREATE INDEX idx1 ON file (product_collection_id, id); -- crucial
CREATE INDEX idx2 ON product_collection (mission_id, id); -- helpful
You mentioned:
two column indexes on file.id & file.product_collection_id.
Etc. But we need it the other way round: id last. The order of index expressions is crucial. See:
Is a composite index also good for queries on the first field?
Rationale: With only 87 rows from product_collection, we only fetch a maximum of 87 x 100 = 8700 rows (fewer if not every pc.id has 100 rows in table file), which are then sorted before picking the top 100. Performance degrades with the number of rows you get from product_collection and with bigger LIMIT.
With the multicolumn index idx1 above, that's 87 fast index scans. The rest is not very expensive.
More optimization is possible, depending on additional information. Related:
Can spatial index help a “range - order by - limit” query

Are JSONB indexes slower than native indexes?

I have a large table (30M rows) which has ~10 jsonb B-tree indexes.
When I create a query using few conditions, the query is relatively fast.
When I add more conditions, especially one with a sparse jsonb index (e.g. an integer between 0 and 1,000,000), the query speed drops off dramatically.
I am wondering whether jsonb indexes are slower than native indexes? Would I expect a performance boost by switching to native columns rather than JSON?
Table definition:
id integer
type text
data jsonb
company_index ARRAY
exchange_index ARRAY
eligible boolean
Example query:
SELECT id, data, type
FROM collection.bundles
WHERE ( (ARRAY['.X'] && bundles.exchange_index) AND
type IN ('discussion') AND
( ((data->>'sentiment_score')::bigint > 0 AND
(data->'display_tweet'->'stocktwit'->'id') IS NOT NULL) ) AND
( eligible = true ) AND
((data->'display_tweet'->'stocktwit')->>'id')::bigint IS NULL )
ORDER BY id DESC
LIMIT 50
Output:
Limit (cost=0.56..16197.56 rows=50 width=212) (actual time=31900.874..31900.874 rows=0 loops=1)
Buffers: shared hit=13713180 read=1267819 dirtied=34 written=713
I/O Timings: read=7644.206 write=7.294
-> Index Scan using bundles2_id_desc_idx on bundles (cost=0.56..2401044.17 rows=7412 width=212) (actual time=31900.871..31900.871 rows=0 loops=1)
Filter: (eligible AND ('{.X}'::text[] && exchange_index) AND (type = 'discussion'::text) AND ((((data -> 'display_tweet'::text) -> 'stocktwit'::text) -> 'id'::text) IS NOT NULL) AND (((data ->> 'sentiment_score'::text))::bigint > 0) AND (((((data -> 'display_tweet'::text) -> 'stocktwit'::text) ->> 'id'::text))::bigint IS NULL))
Rows Removed by Filter: 16093269
Buffers: shared hit=13713180 read=1267819 dirtied=34 written=713
I/O Timings: read=7644.206 write=7.294
Planning time: 0.366 ms
Execution time: 31900.909 ms
Note:
There are jsonb B-tree indexes on every jsonb condition used in this query. exchange_index and company_index have GIN indexes.
UPDATE
After Laurenz's changed query:
Limit (cost=150634.15..150634.27 rows=50 width=211) (actual time=15925.828..15925.828 rows=0 loops=1)
Buffers: shared hit=1137490 read=680349 written=2
I/O Timings: read=2896.702 write=0.038
-> Sort (cost=150634.15..150652.53 rows=7352 width=211) (actual time=15925.827..15925.827 rows=0 loops=1)
Sort Key: bundles.id DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=1137490 read=680349 written=2
I/O Timings: read=2896.702 write=0.038
-> Bitmap Heap Scan on bundles (cost=56666.15..150316.40 rows=7352 width=211) (actual time=15925.816..15925.816 rows=0 loops=1)
Recheck Cond: (('{.X}'::text[] && exchange_index) AND (type = 'discussion'::text))
Filter: (eligible AND ((((data -> 'display_tweet'::text) -> 'stocktwit'::text) -> 'id'::text) IS NOT NULL) AND (((data ->> 'sentiment_score'::text))::bigint > 0) AND (((((data -> 'display_tweet'::text) -> 'stocktwit'::text) ->> 'id'::text))::bigint IS NULL))
Rows Removed by Filter: 273230
Heap Blocks: exact=175975
Buffers: shared hit=1137490 read=680349 written=2
I/O Timings: read=2896.702 write=0.038
-> BitmapAnd (cost=56666.15..56666.15 rows=23817 width=0) (actual time=1895.890..1895.890 rows=0 loops=1)
Buffers: shared hit=37488 read=85559
I/O Timings: read=325.535
-> Bitmap Index Scan on bundles2_exchange_index_ops_idx (cost=0.00..6515.57 rows=863703 width=0) (actual time=218.690..218.690 rows=892669 loops=1)
Index Cond: ('{.X}'::text[] && exchange_index)
Buffers: shared hit=7 read=313
I/O Timings: read=1.458
-> Bitmap Index Scan on bundles_eligible_idx (cost=0.00..23561.74 rows=2476877 width=0) (actual time=436.719..436.719 rows=2569331 loops=1)
Index Cond: (eligible = true)
Buffers: shared hit=37473
-> Bitmap Index Scan on bundles2_type_idx (cost=0.00..26582.83 rows=2706276 width=0) (actual time=1052.267..1052.267 rows=2794517 loops=1)
Index Cond: (type = 'discussion'::text)
Buffers: shared hit=8 read=85246
I/O Timings: read=324.077
Planning time: 0.433 ms
Execution time: 15928.959 ms
All your fancy indexes are not used at all, so the problem is not if they are fast or not.
There are several things at play here:
Seeing the dirtied and the written pages during the index scan, I suspect that there are quite a lot of “dead tuples” in your table. When the index scan visits them and notices they are dead, it “kills” those index entries so that subsequent index scans don't have to repeat that work.
If you repeat the query, you will probably notice that the number of blocks and the execution time becomes less.
You can reduce that problem by running VACUUM on the table or making sure autovacuum processes the table often enough.
Your major problem, however, is that the LIMIT clause tempts PostgreSQL to use the following strategy:
Since you only want 50 result rows in an order for which you have an index, just examine the table rows in index order and discard all rows that do not match the complicated condition until you have 50 results.
Unfortunately it has to scan 16093319 rows until it has found its 50 hits. The rows at the “high id” end of the table don't match the condition. PostgreSQL does not know about that correlation.
The solution is to discourage PostgreSQL from going down that route. The easiest way would be to drop all indexes on id, but given its name that is probably unfeasible.
The other way is to keep PostgreSQL from “seeing” the LIMIT clause when it plans the scan:
SELECT id, data, type
FROM (SELECT id, data, type
FROM collection.bundles
WHERE /* all your complicated conditions */
OFFSET 0) subquery
ORDER BY id DESC
LIMIT 50;
Remark: You didn't show your index definitions, but it sounds to be like you have quite a lot of them, possibly too many. Indexes are expensive, so make sure you define only those that give you a clear benefit.

Why does postgres do a table scan instead of using my index?

I'm working with the HackerNews dataset in Postgres. There are about 17M rows about 14.5M of them are comments and about 2.5M are stories. There is a very active user named "rbanffy" who has 25k submissions, about equal split stories/comments. Both "by" and "type" have separate indices.
I have a query:
SELECT *
FROM "hn_items"
WHERE by = 'rbanffy'
and type = 'story'
ORDER BY id DESC
LIMIT 20 OFFSET 0
That runs quickly (it's using the 'by' index). If I change the type to "comment" then it's very slow. From the explain, it doesn't use either index and does a scan.
Limit (cost=0.56..56948.32 rows=20 width=1937)
-> Index Scan using hn_items_pkey on hn_items (cost=0.56..45823012.32 rows=16093 width=1937)
Filter: (((by)::text = 'rbanffy'::text) AND ((type)::text = 'comment'::text))
If I change the query to have type||''='comment', then it will use the 'by' index and executes quickly.
Why is this happening? I understand from https://stackoverflow.com/a/309814/214545 that having to do a hack like this implies something is wrong. But I don't know what.
EDIT:
This is the explain for the type='story'
Limit (cost=72553.07..72553.12 rows=20 width=1255)
-> Sort (cost=72553.07..72561.25 rows=3271 width=1255)
Sort Key: id DESC
-> Bitmap Heap Scan on hn_items (cost=814.59..72466.03 rows=3271 width=1255)
Recheck Cond: ((by)::text = 'rbanffy'::text)
Filter: ((type)::text = 'story'::text)
-> Bitmap Index Scan on hn_items_by_index (cost=0.00..813.77 rows=19361 width=0)
Index Cond: ((by)::text = 'rbanffy'::text)
EDIT:
EXPLAIN (ANALYZE,BUFFERS)
Limit (cost=0.56..59510.10 rows=20 width=1255) (actual time=20.856..545.282 rows=20 loops=1)
Buffers: shared hit=21597 read=2658 dirtied=32
-> Index Scan using hn_items_pkey on hn_items (cost=0.56..47780210.70 rows=16058 width=1255) (actual time=20.855..545.271 rows=20 loops=1)
Filter: (((by)::text = 'rbanffy'::text) AND ((type)::text = 'comment'::text))
Rows Removed by Filter: 46798
Buffers: shared hit=21597 read=2658 dirtied=32
Planning time: 0.173 ms
Execution time: 545.318 ms
EDIT: EXPLAIN (ANALYZE,BUFFERS) of type='story'
Limit (cost=72553.07..72553.12 rows=20 width=1255) (actual time=44.121..44.127 rows=20 loops=1)
Buffers: shared hit=20137
-> Sort (cost=72553.07..72561.25 rows=3271 width=1255) (actual time=44.120..44.123 rows=20 loops=1)
Sort Key: id DESC
Sort Method: top-N heapsort Memory: 42kB
Buffers: shared hit=20137
-> Bitmap Heap Scan on hn_items (cost=814.59..72466.03 rows=3271 width=1255) (actual time=6.778..37.774 rows=11630 loops=1)
Recheck Cond: ((by)::text = 'rbanffy'::text)
Filter: ((type)::text = 'story'::text)
Rows Removed by Filter: 12587
Heap Blocks: exact=19985
Buffers: shared hit=20137
-> Bitmap Index Scan on hn_items_by_index (cost=0.00..813.77 rows=19361 width=0) (actual time=3.812..3.812 rows=24387 loops=1)
Index Cond: ((by)::text = 'rbanffy'::text)
Buffers: shared hit=152
Planning time: 0.156 ms
Execution time: 44.422 ms
EDIT: latest test results
I was playing around with the type='comment' query and noticed if changed the limit to a higher number like 100, it used the by index. I played with the values until I found the critical number was '47'. If I had a limit of 47, the by index was used, if I had a limit of 46, it was a full scan. I assume that number isn't magical, just happens to be the threshold for my dataset or some other variable I don't know. I'm don't know if this helps.
Since there ate many comments by rbanffy, PostgreSQL assumes that it will be fast enough if it searches the table in the order implied by the ORDER BY clause (which can use the primary key index) until it has found 20 rows that match the search condition.
Unfortunately it happens that in the guy has grown lazy lately — at any rate, PostgreSQL has to scan the 46798 highest ids until it has found its 20 hits. (You really shouldn't have removed the Backwards, that confused me.)
The best way to work around that is to confuse PostgreSQL so that it doesn't choose the primary key index, perhaps like this:
SELECT *
FROM (SELECT * FROM hn_items
WHERE by = 'rbanffy'
AND type = 'comment'
OFFSET 0) q
ORDER BY id DESC
LIMIT 20;