How does PostgreSQL execute a query? - postgresql

Can anyone explain why PostgreSQL works this way?
If I execute this query:
SELECT
*
FROM project_archive_doc as PAD, project_archive_doc as PAD2
WHERE
PAD.id = PAD2.id
it will be a simple JOIN and EXPLAIN looks like this:
Hash Join (cost=6.85..13.91 rows=171 width=150)
Hash Cond: (pad.id = pad2.id)
-> Seq Scan on project_archive_doc pad (cost=0.00..4.71 rows=171 width=75)
-> Hash (cost=4.71..4.71 rows=171 width=75)
-> Seq Scan on project_archive_doc pad2 (cost=0.00..4.71 rows=171 width=75)
But if I execute this query:
SELECT *
FROM project_archive_doc as PAD
WHERE
PAD.id = (
SELECT PAD2.id
FROM project_archive_doc as PAD2
WHERE
PAD2.project_id = PAD.project_id
ORDER BY PAD2.created_at
LIMIT 1)
there are no joins and EXPLAIN looks like this:
Seq Scan on project_archive_doc pad (cost=0.00..886.22 rows=1 width=75)
Filter: (id = (SubPlan 1))
SubPlan 1
-> Limit (cost=5.15..5.15 rows=1 width=8)
-> Sort (cost=5.15..5.15 rows=1 width=8)
Sort Key: pad2.created_at
-> Seq Scan on project_archive_doc pad2 (cost=0.00..5.14 rows=1 width=8)
Filter: (project_id = pad.project_id)
Why is that, and is there any documentation or are there articles about this?

Without table definitions and data it's hard to be specific for this case. In general, PostgreSQL is like most SQL databases in that it doesn't treat SQL as a step-by-step program for how to execute a query. It's more like a description of what you want the results to be and a hint about how you want the database to produce those results.
PostgreSQL is free to actually execute the query however it can most efficiently do so, so long as it produces the results you want.
Often it has several choices about how to produce a particular result. It will choose between them based on cost estimates.
It can also "understand" that several different ways of writing a particular query are equivalent, and transform one into another where it's more efficient. For example, it can transform an IN (SELECT ...) into a join, because it can prove they're equivalent.
However, sometimes apparently small changes to a query fundamentally change its meaning, and limit what optimisations/transformations PostgreSQL can make. Adding a LIMIT or OFFSET inside a subquery prevents PostgreSQL from flattening it, i.e. combining it with the outer query by transforming it into a join. It also prevents PostgreSQL from moving WHERE clause entries between the subquery and outer query, because that'd change the meaning of the query. Without a LIMIT or OFFSET clause, it can do both these things because they don't change the query's meaning.
There's some info on the planner here.
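As a rough sketch of how the second query might be rewritten so that no subplan has to run once per outer row: assuming the intent is "the earliest document per project", PostgreSQL's DISTINCT ON can express that in one pass over the table. This is not strictly equivalent to the original. Rows with a NULL project_id are returned here but filtered out by the original, and ties on created_at are broken arbitrarily in both forms unless a tiebreaker is added to the ORDER BY.

```sql
-- One row per project_id: the row with the smallest created_at.
-- Assumption: "earliest document per project" is what the original query is after.
SELECT DISTINCT ON (project_id) *
FROM project_archive_doc
ORDER BY project_id, created_at;
```

Comparing the EXPLAIN output of both forms on real data is the only way to tell whether this is actually cheaper for a given table.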

Related

Postgres 9.4: How to fix Query Planner's Choice of Hash Join in ANY ARRAY lookup which runs 10x slower

I realize of course that figuring out these issues can be complex and require lots of info but I'm hoping there is a known issue or workaround for this particular case. I've narrowed down the change in the query that causes the sub-optimal query plan (this is running Postgres 9.4).
The following query runs in about 50ms. The tag_device table is a junction table with ~2 million entries, the devices table has about 1.5 million entries and the tags table has about 500,000 entries (Note: the actual IP values are just made up).
WITH inner_query AS (
SELECT * FROM tag_device
INNER JOIN tags ON tag_device.tag_id = tags.id
INNER JOIN devices ON tag_device.device_id = devices.id
WHERE devices.device_ip <<= ANY(ARRAY[
'10.0.0.1', '10.0.0.2', '10.0.0.5', '11.1.1.1', '12.2.2.35','13.0.0.1', '15.0.0.8', '1.160.0.1', '17.1.1.24', '18.2.2.1',
'10.0.0.6', '10.0.0.21', '10.0.0.52', '11.1.1.2', '12.2.2.34','13.0.0.2', '15.0.0.7', '1.160.0.2', '17.1.1.23', '18.2.2.2',
'10.0.0.7', '10.0.0.22', '10.0.0.53', '11.1.1.3', '12.2.2.33','13.0.0.3', '15.0.0.6', '1.160.0.3', '17.1.1.22', '18.2.2.3'
]::iprange[])
)
SELECT * FROM inner_query LIMIT 100 OFFSET 0;
A few things to note. device_ip is using the ip4r module (https://github.com/RhodiumToad/ip4r) to provide ip range lookups and this column has a gist index on it. The above query runs in about 50ms using the following query plan:
Limit (cost=140367.19..140369.19 rows=100 width=239)
CTE inner_query
-> Nested Loop (cost=40147.63..140367.19 rows=56193 width=431)
-> Merge Join (cost=40147.20..113345.15 rows=56193 width=261)
Merge Cond: (tag_device.device_id = devices.id)
-> Index Scan using tag_device_device_id_idx on tag_device (cost=0.43..67481.36 rows=1900408 width=51)
-> Materialize (cost=40136.82..40402.96 rows=53228 width=210)
-> Sort (cost=40136.82..40269.89 rows=53228 width=210)
Sort Key: devices.id
-> Bitmap Heap Scan on devices (cost=1489.12..30498.45 rows=53228 width=210)
Recheck Cond: (device_ip <<= ANY ('{10.0.0.1,10.0.0.2,10.0.0.5,11.1.1.1,12.2.2.2,13.0.0.1,15.0.0.2,1.160.0.5,17.1.1.1,18.2.2.2,10.0.0.1,10.0.0.2,10.0.0.5,11.1.1.1,12.2.2.2,13.0.0.1,15.0.0.2,1.160.0.5,17.1.1.1,18.2.2.2 (...)
-> Bitmap Index Scan on devices_iprange_idx (cost=0.00..1475.81 rows=53228 width=0)
Index Cond: (device_ip <<= ANY ('{10.0.0.1,10.0.0.2,10.0.0.5,11.1.1.1,12.2.2.2,13.0.0.1,15.0.0.2,1.160.0.5,17.1.1.1,18.2.2.2,10.0.0.1,10.0.0.2,10.0.0.5,11.1.1.1,12.2.2.2,13.0.0.1,15.0.0.2,1.160.0.5,17.1.1.1,18.2 (...)
-> Index Scan using tags_id_pkey on tags (cost=0.42..0.47 rows=1 width=170)
Index Cond: (id = tag_device.tag_id)
-> CTE Scan on inner_query (cost=0.00..1123.86 rows=56193 width=239)
If I increase the number of IP addresses in the ARRAY, the query plan changes and the query becomes drastically slower. The fast version has 30 items in the array; with 80 items the query runs over 10x slower, although it is otherwise unchanged. The new plan uses hash joins instead of a merge join and a nested loop. Here is the new (much slower) query plan with 80 items in the array:
Limit (cost=204482.39..204484.39 rows=100 width=239)
CTE inner_query
-> Hash Join (cost=85839.13..204482.39 rows=146180 width=431)
Hash Cond: (tag_device.tag_id = tags.id)
-> Hash Join (cost=51368.40..145023.34 rows=146180 width=261)
Hash Cond: (tag_device.device_id = devices.id)
-> Seq Scan on tag_device (cost=0.00..36765.08 rows=1900408 width=51)
-> Hash (cost=45580.57..45580.57 rows=138466 width=210)
-> Bitmap Heap Scan on devices (cost=3868.31..45580.57 rows=138466 width=210)
Recheck Cond: (device_ip <<= ANY ('{10.0.0.1,10.0.0.2,10.0.0.5,11.1.1.1,12.2.2.35,13.0.0.1,15.0.0.8,1.160.0.1,17.1.1.24,18.2.2.1,10.0.0.6,10.0.0.21,10.0.0.52,11.1.1.2,12.2.2.34,13.0.0.2,15.0.0.7,1.160.0.2,17.1.1.23,18.2.2.2 (...)
-> Bitmap Index Scan on devices_iprange_idx (cost=0.00..3833.70 rows=138466 width=0)
Index Cond: (device_ip <<= ANY ('{10.0.0.1,10.0.0.2,10.0.0.5,11.1.1.1,12.2.2.35,13.0.0.1,15.0.0.8,1.160.0.1,17.1.1.24,18.2.2.1,10.0.0.6,10.0.0.21,10.0.0.52,11.1.1.2,12.2.2.34,13.0.0.2,15.0.0.7,1.160.0.2,17.1.1.23,18.2 (...)
-> Hash (cost=16928.88..16928.88 rows=475188 width=170)
-> Seq Scan on tags (cost=0.00..16928.88 rows=475188 width=170)
-> CTE Scan on inner_query (cost=0.00..2923.60 rows=146180 width=239)
The above query with its default query plan runs in about 500ms (over 10 times slower). If I turn off hash joins using SET enable_hashjoin = OFF; then the query plan goes back to using merge joins and runs in ~50ms again with 80 items in the array.
Again the only change here is the number of items in the ARRAY that are being looked up.
Does anyone have any thoughts on why the planner is making the poor choice that results in the massive slow down?
The database fits into memory completely and is on SSDs.
I also want to point out that I'm using a CTE because I ran into an issue where the planner would not use the index on the tag_device table when I added in the limit to the query. Basically the issue described here: http://thebuild.com/blog/2014/11/18/when-limit-attacks/.
Thanks!
The merge join needs a sort on devices.id. With the larger array the planner's row estimate for the devices scan rises from 53228 to 138466, and past some threshold the sort is estimated to be more expensive than building hash tables, so the hash join gets the lower estimated cost. Those costs are only estimates, so the plan the planner considers cheaper can still be slower in wall-clock time, as it is here.
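If disabling hash joins is kept as a workaround, one way to limit its reach is SET LOCAL inside a transaction, so other queries on the same connection keep the default planner settings. A minimal sketch, with the IP list shortened for readability:

```sql
BEGIN;
-- SET LOCAL reverts automatically at COMMIT or ROLLBACK.
SET LOCAL enable_hashjoin = off;

WITH inner_query AS (
    SELECT * FROM tag_device
    INNER JOIN tags ON tag_device.tag_id = tags.id
    INNER JOIN devices ON tag_device.device_id = devices.id
    -- Shortened list; the real query has 80 entries here.
    WHERE devices.device_ip <<= ANY(ARRAY['10.0.0.1', '10.0.0.2']::iprange[])
)
SELECT * FROM inner_query LIMIT 100 OFFSET 0;
COMMIT;
```

Running the same statement under EXPLAIN ANALYZE with and without the setting shows how far the estimated row counts are from the actual ones, which is usually where a bad join choice starts.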

Very bad performance of UNION select query in RedShift / ParAccel

I have two tables in redshift:
tbl_current_day - about 4.5M rows
tbl_previous_day - about 4.5M rows, with the same data exactly as tbl_current_day
In addition, I have a view called qry_both_days defined as follows:
CREATE OR REPLACE VIEW qry_both_days AS
SELECT * FROM tbl_current_day
UNION SELECT * FROM tbl_previous_day;
When I run a query on one of the separate tables, I get very good performance as expected.
For example, the following query runs in 5 seconds:
select count(distinct person_id) from tbl_current_day;
-- (person_id is of type int)
Explain plan:
XN Aggregate (cost=1224379.82..1224379.82 rows=1 width=4)
-> XN Subquery Scan volt_dt_0 (cost=1224373.80..1224378.61 rows=481 width=4)
-> XN HashAggregate (cost=1224373.80..1224373.80 rows=481 width=4)
-> XN Seq Scan on tbl_current_day (cost=0.00..979499.04 rows=97949904 width=4)
Note that width is 4 bytes, as it's supposed to be, as my column is of type int.
HOWEVER, when I run the same query on qry_both_days the query runs 20 times slower, while I would expect it to run only 2 times slower, as it should go over twice as many rows:
select count(distinct person_id) from qry_both_days;
Explain plan:
XN Aggregate (cost=55648338.34..55648338.34 rows=1 width=4)
-> XN Subquery Scan volt_dt_0 (cost=55648335.84..55648337.84 rows=200 width=4)
-> XN HashAggregate (cost=55648335.84..55648335.84 rows=200 width=4)
-> XN Subquery Scan qry_both_days (cost=0.00..54354188.49 rows=517658938 width=4)
-> XN Unique (cost=0.00..49177599.11 rows=517658938 width=190)
-> XN Append (cost=0.00..10353178.76 rows=517658938 width=190)
-> XN Subquery Scan "*SELECT* 1" (cost=0.00..89649.20 rows=4482460 width=190)
-> XN Seq Scan on tbl_current_day (cost=0.00..44824.60 rows=4482460 width=190)
-> XN Subquery Scan "*SELECT* 2" (cost=0.00..90675.00 rows=4533750 width=187)
-> XN Seq Scan on tbl_previous_day (cost=0.00..45337.50 rows=4533750 width=187)
The problem: width is now 190, not 4 bytes as it's supposed to be!!!
Does anybody know how to make RedShift pick only the relevant columns on UNION SELECT?
Thanks!
UNION used by itself removes duplicate rows, i.e., it applies an implied DISTINCT, as per the SQL spec.
That means that a lot more processing is required to prepare the output.
If you do not want DISTINCT results then you should always use UNION ALL to make sure the database is not checking for potential dupes.
Your view is created as SELECT *, so it always queries all the columns to create data for the view.
Then another SELECT is used and only requested columns from the view are returned.
If you have a limited number of column sets (say two or three that are used all the time), I'd create a separate view for each column set.
Another way (even less elegant than the one before) is to name each view after the columns it includes (let's say sorted and separated with '__'), like qry_both_days__age__name__person_id. Then, before each query, check whether the required view exists and create it if it does not.
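Both suggestions can be combined into one narrow view. The sketch below uses a hypothetical view name and assumes only person_id is needed. Since count(DISTINCT ...) removes duplicates itself, switching to UNION ALL does not change the result of this particular query.

```sql
-- Hypothetical narrow view: only the needed column, no duplicate removal.
CREATE OR REPLACE VIEW qry_both_days_person_id AS
SELECT person_id FROM tbl_current_day
UNION ALL
SELECT person_id FROM tbl_previous_day;

SELECT count(DISTINCT person_id) FROM qry_both_days_person_id;
```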

speeding up wildcard text lookups

I have a simple table in Postgres with a bit over 8 million rows. The column of interest holds short text strings, typically one or more words total length less than 100 characters. It is set as 'character varying (100)'. The column is indexed. A simple look up like below takes > 3000 ms.
SELECT a, b, c FROM t WHERE a LIKE '?%'
Yes, for now, the need is to simply find the rows where "a" starts with the entered text. I want to bring the speed of look up down to under 100 ms (the appearance of instantaneous). Suggestions? Seems to me that full text search won't help here as my column of text is too short, but I would be happy to try that if worthwhile.
Oh, btw I also loaded the exact same data in mongodb and indexed column "a". Loading the data in mongodb was amazingly quick (mongodb++). Both mongodb and Postgres are pretty much instantaneous when doing exact lookups. But, Postgres actually shines when doing trailing wildcard searches as above, consistently taking about 1/3 as long as mongodb. I would be happy to pursue mongodb if I could speed that up as this is only a readonly operation.
Update: First, a couple of EXPLAIN ANALYZE outputs
EXPLAIN ANALYZE SELECT a, b, c FROM t WHERE a LIKE 'abcd%'
"Seq Scan on t (cost=0.00..282075.55 rows=802 width=40)
(actual time=1220.132..1220.132 rows=0 loops=1)"
" Filter: ((a)::text ~~ 'abcd%'::text)"
"Total runtime: 1220.153 ms"
I actually want to compare Lower(a) with the search term which is always at least 4 characters long, so
EXPLAIN ANALYZE SELECT a, b, c FROM t WHERE Lower(a) LIKE 'abcd%'
"Seq Scan on t (cost=0.00..302680.04 rows=40612 width=40)
(actual time=4.681..3321.387 rows=788 loops=1)"
" Filter: (lower((a)::text) ~~ 'abcd%'::text)"
"Total runtime: 3321.504 ms"
So I created an index
CREATE INDEX idx_t ON t USING btree (Lower(Substring(a, 1, 4) ));
"Seq Scan on t (cost=0.00..302680.04 rows=40612 width=40)
(actual time=3243.841..3243.841 rows=0 loops=1)"
" Filter: (lower((a)::text) = 'abcd%'::text)"
"Total runtime: 3243.860 ms"
Seems the only time an index is being used is when I am looking for an exact match
EXPLAIN ANALYZE SELECT a, b, c FROM t WHERE a = 'abcd'
"Index Scan using idx_t on geonames (cost=0.00..57.89 rows=13 width=40)
(actual time=40.831..40.923 rows=17 loops=1)"
" Index Cond: ((ascii_name)::text = 'Abcd'::text)"
"Total runtime: 40.940 ms"
Found a solution by implementing an index with varchar_pattern_ops, and am now looking for even quicker lookups.
The PostgreSQL query planner is smart, but not an AI. To make it use an index on an expression use the exact same form of expression in the query.
With an index like this:
CREATE INDEX t_a_lower_idx ON t (lower(substring(a, 1, 4)));
Or simpler in PostgreSQL 9.1:
CREATE INDEX t_a_lower_idx ON t (lower(left(a, 4)));
Use this query:
SELECT * FROM t WHERE lower(left(a, 4)) = 'abcd';
Which is 100% functionally equivalent to:
SELECT * FROM t WHERE lower(a) LIKE 'abcd%'
Or:
SELECT * FROM t WHERE a ILIKE 'abcd%'
But not:
SELECT * FROM t WHERE a LIKE 'abcd%'
This is a functionally different query and you need a different index:
CREATE INDEX t_a_idx ON t (substring(a, 1, 4));
Or simpler with PostgreSQL 9.1:
CREATE INDEX t_a_idx ON t (left(a, 4));
And use this query:
SELECT * FROM t WHERE left(a, 4) = 'abcd';
Left anchored search terms of variable length
Case insensitive. Index:
Edit: Almost forgot: If you run your db with any other locale than the default 'C', you need to specify the operator class explicitly - text_pattern_ops in my example:
CREATE INDEX t_a_lower_idx
ON t (lower(left(a, <insert_max_length>)) text_pattern_ops);
Query:
SELECT * FROM t WHERE lower(left(a, <insert_max_length>)) ~~ 'abcdef%';
Can utilize the index and is almost as fast as the variant with a fixed length.
You may be interested in this post on dba.SE with more details about pattern matching, especially the last part about the operators ~>=~ and ~<~.
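If a maximum prefix length is not wanted, an index on the full lowercased column with a pattern operator class should serve the question's original query form as well. A minimal sketch, with a hypothetical index name; lower(a) returns text, hence text_pattern_ops rather than varchar_pattern_ops:

```sql
-- Expression index usable for left-anchored LIKE in a non-C locale.
CREATE INDEX t_lower_a_pattern_idx ON t (lower(a) text_pattern_ops);

-- The expression in the query must match the indexed expression exactly.
SELECT a, b, c FROM t WHERE lower(a) LIKE 'abcd%';
```

The trade-off against the left(a, n) variant is a larger index, since whole values are stored instead of short prefixes.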
It is documented that a regular expression search generally cannot use a plain B-tree index. The exception is a pattern anchored to the start of the string, i.e. a prefix search such as ^abc.

How can I optimize this postgresql query?

Below is a postgres query that seems to be taking far longer than I would expect. The field_instances table is indexed on both form_instance_id and field_id, and the form_instances table is indexed on workflow_state. So I thought it would be a fast query, but it takes forever. Can anybody help me interpret the query plan and what kinds of indexes to add to speed it up? Thanks.
explain analyze
select form_id,form_instance_id,answer,field_id
from form_instances,field_instances
where workflow_state = 'DRqueued'
and form_instance_id = form_instances.id
and field_id = 'Book_EstimatedDueDate';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=8733.85..95692.90 rows=9277 width=29) (actual time=2550.000..15430.000 rows=11431 loops=1)
Hash Cond: (field_instances.form_instance_id = form_instances.id)
-> Bitmap Heap Scan on field_instances (cost=2681.11..89071.72 rows=47567 width=25) (actual time=850.000..13690.000 rows=51726 loops=1)
Recheck Cond: ((field_id)::text = 'Book_EstimatedDueDate'::text)
-> Bitmap Index Scan on index_field_instances_on_field_id (cost=0.00..2669.22 rows=47567 width=0) (actual time=830.000..830.000 rows=51729 loops=1)
Index Cond: ((field_id)::text = 'Book_EstimatedDueDate'::text)
-> Hash (cost=5911.34..5911.34 rows=11312 width=8) (actual time=1590.000..1590.000 rows=11431 loops=1)
-> Bitmap Heap Scan on form_instances (cost=511.94..5911.34 rows=11312 width=8) (actual time=720.000..1570.000 rows=11431 loops=1)
Recheck Cond: ((workflow_state)::text = 'DRqueued'::text)
-> Bitmap Index Scan on index_form_instances_on_workflow_state (cost=0.00..509.11 rows=11312 width=0) (actual time=650.000..650.000 rows=11509 loops=1)
Index Cond: ((workflow_state)::text = 'DRqueued'::text)
Total runtime: 15430.000 ms
(12 rows)
When you say The field_instances table is indexed on both form_instance_id and field_id you mean that there are separate indexes on form_instance_id and field_id on that table?
Try dropping the index on form_instance_id and put a concatenated index on (form_instance_id, field_id).
An index works by giving you a quick lookup that tells you where the rows are that match your index. It then has to fetch through those rows to do what you want. So you always want your index to be as specific as possible. If you put two indexes on the table, you'll have two different ways to do a lookup, but a query will usually only take advantage of one of them. If you put a concatenated index on the table, you'll be able to look up on the first field in the index, the first two fields, etc efficiently. (So a concatenated index on (a, b) gives you fast lookups on a, even faster lookups on both a and b, but doesn't help you look things up on b)
Right now it is figuring out all possible things in form_instances that have the right state. It separately figures out all of the field_instances that have the right field id. It then does a hash join: it builds a lookup hash from one result set and scans the other for matches.
With my suggestion it should figure out all possible form_instances of interest. It will then go to the index, and figure out all of the field_instances that match on both the form instance and field id, and then it will find exactly the results of interest. Because the index is more specific, the database will have fewer rows of data to deal with to process your query.
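The suggested index could be created like this. The index name is made up here, following the naming pattern visible in the query plan:

```sql
-- Composite index: form_instance_id first, so lookups by form instance
-- (alone or together with field_id) can use it.
CREATE INDEX index_field_instances_on_form_instance_id_and_field_id
    ON field_instances (form_instance_id, field_id);

-- Refresh statistics before comparing plans.
ANALYZE field_instances;
```

Re-running the EXPLAIN ANALYZE from the question afterwards shows whether the planner actually picks the new index.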
http://explain.depesz.com is a fantastic online tool that helps you identify the hot spots visually. I pasted your results into the tool and got this analysis: http://explain.depesz.com/s/VIk
It's hard to tell anything specifically without seeing your tables and indexes, however.
I would need to know the data in your tables to be sure, but just from looking at the SQL and column names I would recommend:
Do you really need an index on workflow_state? Assuming its values are not very selective, the index might not improve the SELECT but will slow down every INSERT or UPDATE.
Try making the field_id check the first condition in your WHERE clause.

Postgresql index on xpath expression gives no speed up

We are trying to create OEBS-analog functionality in PostgreSQL. Let's say we have a form constructor and need to store form results in the database (e.g. email bodies). In Oracle you could use a table with ~150 columns (and some mapping stored elsewhere) to store each field in a separate column. In contrast to Oracle, we would like to store the whole form in a PostgreSQL xml field.
The example of the tree is
<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<object_id>2</object_id>
<pack_form_id>23</pack_form_id>
<prod_form_id>34</prod_form_id>
</row>
We would like to search through this field.
Test table contains 400k rows and the following select executes in 90 seconds:
select *
from params
where (xpath('//prod_form_id/text()'::text, xmlvalue))[1]::text::int=34
So I created this index:
create index prod_form_idx
ON params using btree(
((xpath('//prod_form_id/text()'::text, xmlvalue))[1]::text::int)
);
And it made no difference. Still 90 seconds execution. The EXPLAIN plan shows this:
Bitmap Heap Scan on params (cost=40.29..6366.44 rows=2063 width=292)
Recheck Cond: ((((xpath('//prod_form_id/text()'::text, xmlvalue, '{}'::text[]))[1])::text)::integer = 34)
-> Bitmap Index Scan on prod_form_idx (cost=0.00..39.78 rows=2063 width=0)
Index Cond: ((((xpath('//prod_form_id/text()'::text, xmlvalue, '{}'::text[]))[1])::text)::integer = 34)
I am not a great plan interpreter, but I suppose this means that the index is being used. The question is: where's all the speed? And what can I do to optimize this kind of query?
Well, at least the index is used. You get a bitmap index scan instead of a normal index scan though, which means the xpath() function will be called lots of times.
Let's do a little check :
CREATE TABLE foo ( id serial primary key, x xml, h hstore );
insert into foo (x,h) select XMLPARSE( CONTENT '<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<object_id>2</object_id>
<pack_form_id>' || n || '</pack_form_id>
<prod_form_id>34</prod_form_id>
</row>' ),
('object_id=>2,prod_form_id=>34,pack_form_id=>'||n)::hstore
FROM generate_series( 1,100000 ) n;
test=> EXPLAIN ANALYZE SELECT count(*) FROM foo;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate (cost=4821.00..4821.01 rows=1 width=0) (actual time=24.694..24.694 rows=1 loops=1)
-> Seq Scan on foo (cost=0.00..4571.00 rows=100000 width=0) (actual time=0.006..13.996 rows=100000 loops=1)
Total runtime: 24.730 ms
test=> explain analyze select * from foo where (h->'pack_form_id')='123';
QUERY PLAN
----------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..5571.00 rows=500 width=68) (actual time=0.075..48.763 rows=1 loops=1)
Filter: ((h -> 'pack_form_id'::text) = '123'::text)
Total runtime: 36.808 ms
test=> explain analyze select * from foo where ((xpath('//pack_form_id/text()'::text, x))[1]::text) = '123';
QUERY PLAN
------------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..5071.00 rows=500 width=68) (actual time=4.271..3368.838 rows=1 loops=1)
Filter: (((xpath('//pack_form_id/text()'::text, x, '{}'::text[]))[1])::text = '123'::text)
Total runtime: 3368.865 ms
As we can see,
scanning the whole table with count(*) takes 25 ms
extracting one key/value from a hstore adds a small extra cost, about 0.12 µs/row
extracting one key/value from a xml using xpath adds a huge cost, about 33 µs/row
Conclusions :
xml is slow (but everyone knows that)
if you want to put a flexible key/value store in a column, use hstore
Also since your xml data is pretty big it will be toasted (compressed and stored out of the main table). This makes the rows in the main table much smaller, hence more rows per page, which reduces the efficiency of bitmap scans since all rows on a page have to be rechecked.
You can fix this though. For some reason the xpath() function (which is very slow, since it handles xml) has the same cost (1 unit) as say, the integer operator "+"...
update pg_proc set procost=1000 where proname='xpath';
You may need to tweak the cost value. When given the right info, the planner knows xpath is slow and will avoid a bitmap index scan, using an index scan instead, which doesn't need rechecking the condition for all rows on a page.
Note that this does not solve the row estimates problem. Since you can't ANALYZE the inside of the xml (or hstore) you get default estimates for the number of rows (here, 500). So, the planner may be completely wrong and choose a catastrophic plan if some joins are involved. The only solution to this is to use proper columns.
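A minimal sketch of the "proper columns" approach for the field being searched. The column and index names are hypothetical, and the copy has to be kept in sync with the XML by the application or a trigger, which is not shown:

```sql
-- Extract the value once instead of calling xpath() per row at query time.
ALTER TABLE params ADD COLUMN prod_form_id integer;

UPDATE params
SET prod_form_id = ((xpath('//prod_form_id/text()', xmlvalue))[1])::text::int;

CREATE INDEX params_prod_form_id_idx ON params (prod_form_id);

-- With a real column, ANALYZE can collect statistics for row estimates.
ANALYZE params;

SELECT * FROM params WHERE prod_form_id = 34;
```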