View Cardinality with SELECT - postgresql

How can I view the estimated cardinality of a SELECT query in PostgreSQL prior to execution? Is there a way to do it with the EXPLAIN keyword?
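Yes: in PostgreSQL, EXPLAIN without ANALYZE prints the planner's estimates without executing the query, and the rows= figure on each plan node is the estimated cardinality. A minimal sketch, assuming a hypothetical orders table:

```sql
-- EXPLAIN alone does not run the query; it only prints the plan with estimates.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Sample output (costs and row counts will vary with your data):
--   Index Scan using orders_customer_id_idx on orders  (cost=0.29..8.31 rows=3 width=64)
-- "rows=3" is the planner's estimated cardinality for that node.
```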

Related

PostgreSQL Query Performance Fluctuates

We have a system that loads data and then conducts data QC in PostgreSQL. The QC function's performance fluctuates drastically in one of our environments with no apparent pattern. I was able to track the problem down to the following simple query in the QC function:
WITH foo AS (SELECT full_address, jsonb_agg (gad_rec_id) gad_rec_ids
FROM azgiv.v_full_addresses
WHERE gad_gly_id = 495
GROUP BY full_address
HAVING count(1) > 1)
SELECT gad_nguid, gad_rec_id, foo.full_address
FROM azgiv.v_full_addresses JOIN foo
ON foo.full_address = v_full_addresses.full_address
AND v_full_addresses.gad_gly_id = 495;
When I ran into the slow-performance situation (Fig 2), I had to ANALYZE the table behind the view before the query plan changed back to the fast one (Fig 1). v_full_addresses is a simple view over a partitioned table with a bunch of columns concatenated.
Here are two images of the query plans for the above query. I am a newbie when it comes to understanding query optimization, and any help is greatly appreciated.
If performance improves after you ANALYZE a table, that means that the database's knowledge about the distribution of the data is outdated.
The best remedy is to tell PostgreSQL to collect these statistics more often:
ALTER TABLE some_table SET (autovacuum_analyze_scale_factor = 0.02);
0.02 is five times lower than the default 0.1, so statistics will be gathered five times more often.
If the bad query plans are generated right after a bulk load, you must choose a different strategy. In this case the problem is that it takes up to a minute for auto-analyze to kick in and calculate new statistics.
In that case you should run an explicit ANALYZE at the end of the bulk load.
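For example (table and file names are illustrative), the end of a bulk-load script could be:

```sql
-- load the data, then refresh planner statistics immediately
-- instead of waiting for autoanalyze to kick in
COPY big_table FROM '/tmp/load.csv' WITH (FORMAT csv);
ANALYZE big_table;
```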

create 2 indexes on same column

I have a table with geometry column.
I have 2 indexes on this column:
create index idg1 on tbl using gist(geom)
create index idg2 on tbl using gist(st_geomfromewkb((geom)::bytea))
I have a lot of queries using the geom (geometry) field.
Which index is used? (When and why?)
If there are two indexes on the same column (as shown here), can SELECT queries run slower than with just one index defined on the column?
The use of an index depends on how the index was defined and how the query is invoked. If you SELECT <cols> FROM tbl WHERE geom = <some_value>, then you will use the idg1 index. If you SELECT <cols> FROM tbl WHERE st_geomfromewkb((geom)::bytea) = <some_value>, then you will use the idg2 index (an expression index only matches queries that use the exact same expression).
A good way to know which index will be used for a particular query is to call the query with EXPLAIN (i.e., EXPLAIN SELECT <cols> FROM tbl WHERE geom = <some_value>) -- this will print out the query plan, which access methods, which indexes, which joins, etc. will be used.
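For example, assuming the tbl/geom definitions from the question, a typical spatial query can be checked like this (the envelope coordinates and SRID are made up):

```sql
-- shows whether idg1, idg2, or a sequential scan is chosen for this query
EXPLAIN SELECT * FROM tbl
WHERE geom && ST_MakeEnvelope(0, 0, 10, 10, 4326);
```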
For your question regarding performance, the SELECT queries could run slower because there are more indexes to consider in the query planning phase. In terms of executing a given query plan, a SELECT query will not run slower because by then the query plan has been established and the decision of which index to use has been made.
You will certainly see a performance impact on INSERT/UPDATE/DELETE against the table, as every index must be updated to reflect the changes. That means extra I/O activity on disk to propagate the changes, slowing down the database, especially at scale.
Which index is used depends on the query.
Any query that has
WHERE geom && '...'::geometry
or
WHERE st_intersects(geom, '...'::geometry)
or similar will use the first index.
The second index will only be used for queries that have the expression st_geomfromewkb((geom)::bytea) in them.
This is completely useless: it converts the geometry to EWKB format and back. You should find and rewrite all queries that have this weird construct, then you should drop that index.
Having two indexes on a single column does not slow down your queries significantly (planning will take a bit longer, but I doubt if you can measure that). You will have a performance penalty for every data modification though, which will take almost twice as long as with a single index.

Filtering over database views is much slower than direct query

I noticed a big difference in query plan when doing regular query versus creating database view and then querying the view.
case 1 basic query:
SELECT <somequery> WHERE <some-filter> <some-group-by>
case 2 database view:
CREATE VIEW myview AS SELECT <some-query> <some-group-by>;
SELECT FROM myview WHERE <some-filter>;
I have noticed that in case 2, Postgres joins/aggregates everything possible and only then applies the filter. In case 1 it doesn't touch the rows filtered out by the WHERE clause. So case 2 is a lot slower.
Are there any tricks to work around this while keeping the database view?
Your view has to re-create the dataset to filter from every time you run the SELECT against it.
The easiest fix is to change the view to a materialized view. If your data is not changing every two minutes, a materialized view stores the result set, so your filter then works on the "saved" dataset. The second thing you can do is add indexes on the materialized view (a plain view cannot be indexed).
Example Here: https://hashrocket.com/blog/posts/materialized-view-strategies-using-postgresql
create materialized view matview.account_balances as
select
name,
coalesce(
sum(amount) filter (where post_time <= current_timestamp),
0
) as balance
from accounts
left join transactions using(name)
group by name;
create index on matview.account_balances (name);
create index on matview.account_balances (balance);
This is the simplest way to reduce the runtime of your query.
Hope this helps.
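One caveat the linked example glosses over: a materialized view is a snapshot, so it must be refreshed when the underlying data changes, e.g.:

```sql
-- re-runs the view's query and replaces the stored result;
-- add CONCURRENTLY (requires a unique index on the matview) to avoid blocking readers
REFRESH MATERIALIZED VIEW matview.account_balances;
```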

I need a suggestion on query tuning techniques

select /* all_rows */x1,x2,x3
from view_x
where x1 in
(select a.b1 from mytable a,mytable2 b
where a.b2=b.c2)
view_x is a view that pulls its data from another source (#othertable_dblink).
I have an index on b1, but since view_x is a view, I don't have the privilege to create an index on it.
NOTE: Because of this, mytable and mytable2 are running into something like "table access full".
My question: how can I reduce the time spent on this by preventing the "table access full"?
If there are any query tuning techniques, please let me know.
"Table access full" is not an error, it's a data access path. Sometimes it's even the optimal one.
If you're sure the performance problem is on the sub-select, to speed that up the optimal indexes are likely:
Index on mytable2(c2)
Index on mytable(b2,b1) (in that order)
The fields that need to be indexed to be useful for the join are mytable2.c2 and mytable.b2; an index on mytable.b1 alone won't help the join at all.
But depending on the size of the tables and the number of rows returned by that join, full scans might be the fastest option.
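A sketch of the suggested indexes (the index names are illustrative):

```sql
CREATE INDEX idx_mytable2_c2 ON mytable2 (c2);
-- composite index: b2 first to serve the join condition,
-- b1 second so the subquery's select list can be read from the index
CREATE INDEX idx_mytable_b2_b1 ON mytable (b2, b1);
```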

TSQL: left join on view very slow

In a complex stored procedure we use a view and a LEFT JOIN on this view. It takes 40 seconds to execute.
Now, if we create a TABLE variable, store the view's result in it, and do the LEFT JOIN on this variable instead of the view, it takes 3 seconds...
What can explain this behavior?
The view expands into the main query. So if you have 5 tables in the view, these expand with the extra table into one big query plan with 6 tables. The performance difference is most likely caused by the added complexity and permutations of the extra table you LEFT JOIN with.
Another potential issue: do you LEFT JOIN on a column that has some processing applied to it? That will further kill performance.
Courtesy of Alden W., who gave this response to a similar question of mine.
You may be able to get better performance by specifying the view's ALGORITHM as MERGE. With MERGE, MySQL combines the view with your outer SELECT's WHERE clause and then comes up with an optimized execution plan.
To do this, however, you would have to remove the GROUP BY clause from your view. As it stands, if a GROUP BY is included in the view, MySQL chooses the TEMPTABLE algorithm: a temporary table holding the entire view is created first, and only then is it filtered by your WHERE clause.
If the MERGE algorithm cannot be used, a temporary table must be used instead. MERGE cannot be used if the view contains any of the following constructs:
Aggregate functions (SUM(), MIN(), MAX(), COUNT(), and so forth)
DISTINCT
GROUP BY
HAVING
LIMIT
UNION or UNION ALL
Subquery in the select list
Refers only to literal values (in this case, there is no underlying table)
Here is the link with more info. http://dev.mysql.com/doc/refman/5.0/en/view-algorithms.html
If you can change your view to not include the GROUP BY statement, to specify the view's algorithm the syntax is:
CREATE ALGORITHM = MERGE VIEW...
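A minimal MySQL sketch (table and view names are hypothetical):

```sql
-- MERGE lets MySQL fold the outer WHERE into the view's own query
CREATE ALGORITHM = MERGE VIEW v_open_orders AS
  SELECT id, customer_id, total
  FROM orders
  WHERE status = 'open';

-- this filter is merged into the view definition before planning,
-- so indexes on orders.customer_id can be used
SELECT * FROM v_open_orders WHERE customer_id = 42;
```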