PostgreSQL 12 needs manual ANALYZE for performance

Every couple of days I see a performance drop. When I manually analyze one table, performance returns to normal. That table has 2.7M records. The query is very complex and uses many different tables, unions and joins. The view I query just aggregates some data from another view. If I query the main (aggregated) view, it takes about 1.5-3.5 s. If I query the view one level higher, it takes only 0.2 s.
This issue first occurred when we migrated from 9.5 to 12.3. Analyzing one specific table solves it for a couple of days.
Autoanalyze never runs on this table, so PostgreSQL apparently sees no need to re-analyze it, yet the query planner still goes wrong somehow. I've increased default_statistics_target in the config to 1500.
We never had this issue on 9.5. Maybe it has something to do with JIT, but disabling it in the session does not seem to solve the issue.
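A minimal sketch of how one might check why autoanalyze is not firing, and make it more eager for this one table. The table name my_big_table is a placeholder, and the threshold values are only illustrative assumptions, not tuned recommendations. With the default settings (threshold 50, scale factor 0.1), a 2.7M-row table needs roughly 270,000 changed rows before autoanalyze triggers.

```sql
-- my_big_table is a hypothetical name; use the table that needs the manual ANALYZE.
SELECT relname,
       n_live_tup,
       n_mod_since_analyze,   -- rows changed since the last (auto)analyze
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'my_big_table';

-- Autoanalyze runs once n_mod_since_analyze exceeds
--   autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples.
-- Lowering both for this table only (illustrative values) makes it run sooner.
ALTER TABLE my_big_table
    SET (autovacuum_analyze_scale_factor = 0.01,
         autovacuum_analyze_threshold    = 1000);
```

If n_mod_since_analyze stays well below the trigger point while the plans degrade, per-table settings like these are one thing to try before scheduling a manual ANALYZE.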

Related

Do you need to VACUUM SORT on Redshift Materialized Views?

I cannot seem to find any documentation that indicates how the sort order of an incrementally updated materialized view is maintained. Based on the lack of docs I assume this is just taken care of on REFRESH.
Does anyone know if you should be running VACUUM SORT on views?
I would do so, to be on the safe side. The Materialized View Refresh documentation also mentions that autorefresh can be stopped automatically by Redshift internal processes. There is also some misleading information, such as the vacuum_sort_benefit column for that view being NULL.
But after running VACUUM SORT ONLY my-mv-view, where my-mv-view is the name returned by svv_table_info, the view's sort did improve.
Vacuuming is also suggested for PostgreSQL materialized views because of frozen transaction IDs; this behaviour applies to versions after 9.4. See Routine Vacuuming in the official PostgreSQL documentation for more.
I hope this helps!
Cheers!
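For reference, the check described above might look roughly like this on Redshift. The name my_mv_table is a placeholder for whatever svv_table_info reports for the materialized view, and the LIKE pattern is just an assumption about how you would find it.

```sql
-- Redshift: find the table behind the materialized view and its sort state.
-- '%my_mv%' is a hypothetical pattern; adjust it to your view's name.
SELECT "schema", "table", unsorted, vacuum_sort_benefit
FROM svv_table_info
WHERE "table" LIKE '%my_mv%';

-- Re-sort only (no delete reclamation), using the name returned above.
-- my_mv_table is a placeholder.
VACUUM SORT ONLY my_mv_table;
```

Comparing the unsorted column before and after the vacuum shows whether it made a difference, even when vacuum_sort_benefit is NULL.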

REFRESH MATERIALIZED VIEW suddenly taking more time to complete

We have a materialized view in our Postgres DB (11.12, managed by AWS RDS). We have a scheduled task that updates it every 5 minutes using REFRESH MATERIALIZED VIEW <view_name>. At some specific point last week, the time needed to refresh the view suddenly went from ~1s to ~20s. The view contains ~70k rows, with around 15 columns, all of them being integers, booleans or UUIDs.
Prior to this:
There were no changes in the server configuration.
There were no changes to the view itself. In fact, running EXPLAIN ANALYZE <expression used to create the view> shows that the query still executes in less than a second. If the query is run with a client like Postico, it takes ~20s to fetch all the results (roughly consistent with the time needed to materialize it, although we assume this is due to the time needed for network transmission).
There were no changes in the schema or any significant record increase in the contents of the tables needed to compute the view.
RDS Performance Insights indicate that the query is mostly using CPU resources
I know this is probably not enough to get a solution, but:
Are there any server performance metrics or logs that could help us better understand this situation?
Is this just the normal time the server needs to persist the view to disk? If so, any idea of possible reasons why it started to take so long recently?
Here is a link to the execution plan.
EDIT: creating another materialized view with the same JOINs but selecting fewer columns performs as expected (~1s).
EDIT 2: setting enable_nestloop = false greatly speeds up the REFRESH operation (same performance as before). Would this indicate that refactoring the underlying query could solve the issue?
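For anyone wanting to reproduce the enable_nestloop experiment without changing the setting for the whole session or server, a sketch along these lines should work. The view name my_view is a placeholder.

```sql
BEGIN;
-- SET LOCAL only lasts until the end of this transaction,
-- so other queries keep the default planner behaviour.
SET LOCAL enable_nestloop = off;
REFRESH MATERIALIZED VIEW my_view;  -- my_view is a hypothetical name
COMMIT;
```

This is a diagnostic workaround rather than a fix: if the refresh is only fast with nested loops disabled, the planner is probably misestimating row counts somewhere in the underlying query.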
Try REFRESH MATERIALIZED VIEW CONCURRENTLY.
When you refresh a materialized view, PostgreSQL locks the entire view, so you cannot query it while the refresh runs. To avoid this, you can use the CONCURRENTLY option.
REFRESH MATERIALIZED VIEW CONCURRENTLY view_name;
With the CONCURRENTLY option, PostgreSQL creates a temporary updated version of the materialized view, compares the two versions, and applies only the differences as row-level changes instead of replacing the whole contents.
You can query a materialized view while it is being updated this way. One requirement for the CONCURRENTLY option is that the materialized view must have a UNIQUE index.
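A short sketch of that requirement, assuming a materialized view called my_view with a column id that is unique per row (both names are placeholders). The unique index has to be on plain columns and cover all rows, so no expressions and no WHERE clause.

```sql
-- my_view and id are hypothetical; pick columns that uniquely identify a row.
CREATE UNIQUE INDEX my_view_id_idx ON my_view (id);

-- Readers are not blocked while this runs.
REFRESH MATERIALIZED VIEW CONCURRENTLY my_view;
```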
Original poster here. This is more than a year old, but here's what happened and how we eventually fixed it.
TL;DR:
- REFRESH MATERIALIZED VIEW <query> started to take much longer than executing the query used to construct the view (~20s vs ~1s).
- A couple of weeks after this question was asked, the query itself started to behave similarly (taking ~20s to complete). At this point, EXPLAIN ANALYZE started to show indications of performance issues with the query. So we ended up optimising the underlying query (the biggest performance gain came from replacing some JOINs with a CTE).
- After this, both the REFRESH MATERIALIZED VIEW <query> and the standalone query performed correctly (execution time < 1s).
A still open question here is why the REFRESH MATERIALIZED VIEW <query> and the standalone query had different performance at some point in time. Was the DB query planner choosing different query plans depending on whether it was going to materialize the view or not? If someone knows whether such a thing is possible, please comment.
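One way to look into that open question, offered only as a rough sketch: EXPLAIN ANALYZE accepts CREATE MATERIALIZED VIEW as well as plain SELECT, so the plan used when the result is being materialized can be compared with the standalone plan. The view name, tables and query below are placeholders, and this shows the plan for creating a view, not for REFRESH itself, so treat it as an approximation.

```sql
-- Plan for the standalone query (hypothetical tables and columns).
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id;

-- Plan when the same query feeds a materialized view.
-- EXPLAIN ANALYZE really creates the view, so roll it back afterwards.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
CREATE MATERIALIZED VIEW plan_check_mv AS
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id;
ROLLBACK;
```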
Refreshing the materialized view on every change (or every 5 minutes) is not a good approach, because it defeats the purpose of using a materialized view. Here is one approach I arrived at from my own experience; you may find a more optimal one later. Assume the materialized view uses two tables, and we want to refresh it only when data in one of those tables has changed. To do this, whenever rows in those tables are updated or deleted, we insert one record into a separate table (for example, refresh_materialized); a trigger can do this. That record signals that the materialized view needs to be refreshed.
For example:
insert into refresh_materialized
(
    refresh_status,
    insert_date,
    executed_date
)
values (
    false,
    now(),
    null
);
Then, in our scheduled task, we can use this query:
select count(*) from refresh_materialized
where refresh_status = false;
If count(*) is greater than 0, we refresh the materialized view; otherwise we do nothing. After refreshing the materialized view, we must update this table:
update refresh_materialized
set
    refresh_status = true,
    executed_date = now()
where
    refresh_status = false;
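The trigger mentioned above could be sketched like this. The definition of refresh_materialized is my assumption based on the columns used in the statements above, and source_table, mark_refresh_needed and source_table_changed are placeholder names.

```sql
-- Assumed shape of the flag table; only the column names come from the answer.
CREATE TABLE IF NOT EXISTS refresh_materialized (
    refresh_status boolean     NOT NULL DEFAULT false,
    insert_date    timestamptz NOT NULL DEFAULT now(),
    executed_date  timestamptz
);

-- Hypothetical trigger function: records that a refresh is pending.
CREATE OR REPLACE FUNCTION mark_refresh_needed() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO refresh_materialized (refresh_status, insert_date, executed_date)
    VALUES (false, now(), NULL);
    RETURN NULL;  -- the return value is ignored for AFTER statement-level triggers
END;
$$;

-- One flag row per modifying statement on the (hypothetical) source table.
CREATE TRIGGER source_table_changed
AFTER INSERT OR UPDATE OR DELETE ON source_table
FOR EACH STATEMENT
EXECUTE FUNCTION mark_refresh_needed();  -- EXECUTE PROCEDURE before PostgreSQL 11
```

A statement-level trigger keeps the overhead low, since a bulk update adds a single flag row rather than one per changed row. Repeat the CREATE TRIGGER for each table the materialized view reads from.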

Sync Elasticsearch Postgresql on a Springboot application

I have Postgresql as my primary database and I would like to take advantage of the Elasticsearch as a search engine for my SpringBoot application.
Problem: The queries are quite complex and with millions of rows in each table, most of the search queries are timing out.
Partial solution: I used PostgreSQL materialized views and have a job running that refreshes them every X minutes. But on systems with huge amounts of data and with other database transactions (especially writes) in progress, the views tend to take a long time to refresh (about 10 minutes to refresh 5 views). I realized that the current views are at their capacity and I cannot add more.
That's when I started exploring other options just for the search and landed on Elasticsearch, which works great with the amount of data I have. As a POC, I used Logstash's JDBC input plugin, but it doesn't support the DELETE operation (bummer).
From here, soft delete is an option I cannot take because:
A) Almost all the tables in the PostgreSQL DB are updated every few minutes, and some of them have constraints on the "name" key, which in this case would stay in place until a clean-up job runs.
B) Many tables in my PostgreSQL DB are referenced with CASCADE DELETE, and it's not possible for me to update the schema and JPA queries of 220 tables to check for the soft-delete boolean.
The same question mentioned in the link above also suggests PgSync, which syncs PostgreSQL with Elasticsearch periodically. However, I cannot go with that either, since it has an LGPL license, which is forbidden in our organization.
I'm starting to wonder if anyone else has encountered this strange limitation between Elasticsearch and an RDBMS.
I'm open to other options rather than elasticsearch to solve my need. I just don't know what's the right stack to use. Any help here is much appreciated!

performance of refreshing postgres materialized view

I am exploring materialized views to create a denormalized view, to avoid joining multiple tables and improve read performance. APIs will read from the materialized views to provide data to clients.
I am using Amazon Aurora PostgreSQL (version 11).
I am using a unique index on the materialized view (MV) so that I can use the “refresh concurrently” option.
What I am noticing, though, is that when only a fraction of the rows get updated in one of the source tables and I try to refresh the view, it's pretty slow. In fact, it is slower than populating the view for the first time: populating the MV the first time takes ~30 minutes, while a refresh takes more than an hour, even though less than 1% of the rows have been updated. The main three tables involved in generating the MV have ~18 million, 27 million and 40 million rows.
The timeliness of the materialized view refresh is important so that data is not stale for too long.
I could go with custom tables to store the denormalized data instead of materialized views, but then I would have to implement the logic to refresh the data, so I plan to avoid that if possible.
Is there anything that can be done to speed up the refresh process of the materialized views?
Please let me know if you need more details.
thanks
Kiran
You can create a second materialized view, refresh it (not concurrently), and then swap the names of the two views in a transaction.
I actually have no idea why postgres didn't implement CONCURRENTLY this way.
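A sketch of that swap, assuming two materialized views built from the same defining query: my_view, which clients read, and my_view_next, the spare copy. Both names, and my_view_old, are placeholders.

```sql
-- Slow part: rebuild the spare copy while readers keep using my_view.
REFRESH MATERIALIZED VIEW my_view_next;

-- Fast part: swap the names atomically.
BEGIN;
ALTER MATERIALIZED VIEW my_view      RENAME TO my_view_old;
ALTER MATERIALIZED VIEW my_view_next RENAME TO my_view;
ALTER MATERIALIZED VIEW my_view_old  RENAME TO my_view_next;
COMMIT;
```

The renames need a brief exclusive lock, so they wait for running queries on the view to finish. Also note that index names do not move with the rename, and other views defined on top of my_view keep pointing at the original object rather than the new one, so this works best when clients query the view by name directly.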
Refreshing a materialized view is slow even if little has changed, because every time the view is refreshed, the defining query is run.
Using CONCURRENTLY makes the operation even slower, because it is not a wholesale replacement of the materialized view contents, but modification of the existing data.
Perhaps you could create a denormalized table that is updated by a trigger whenever the underlying tables are modified.
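A minimal sketch of the trigger-maintained table idea, under made-up assumptions: source tables orders (id, customer_id, total) and customers (id, name), and a denormalized table order_summary. None of these names come from the question.

```sql
-- Hypothetical denormalized table that replaces the materialized view.
CREATE TABLE order_summary (
    order_id      bigint PRIMARY KEY,
    customer_name text,
    total         numeric
);

-- Keeps order_summary in step with row changes on orders.
CREATE OR REPLACE FUNCTION sync_order_summary() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        DELETE FROM order_summary WHERE order_id = OLD.id;
        RETURN OLD;
    END IF;

    -- Insert the row, or overwrite it if it already exists.
    INSERT INTO order_summary (order_id, customer_name, total)
    SELECT NEW.id, c.name, NEW.total
    FROM customers c
    WHERE c.id = NEW.customer_id
    ON CONFLICT (order_id) DO UPDATE
        SET customer_name = EXCLUDED.customer_name,
            total         = EXCLUDED.total;
    RETURN NEW;
END;
$$;

CREATE TRIGGER orders_sync_summary
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW
EXECUTE FUNCTION sync_order_summary();
```

This only covers changes to orders; a change to customers.name would need its own trigger, and the sketch does not handle an order's id being updated. That per-table logic is the maintenance cost the question hoped to avoid, but each write then touches only the affected rows instead of re-running the whole defining query.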
Maybe I'm a bit late to the party, but if you are on PostgreSQL 13 or later, you could try this extension: https://github.com/sraoss/pg_ivm
It has some limitations but promises much faster rebuild times.
Here's a bit more on it from pganalyze: https://pganalyze.com/blog/5mins-postgres-15-beta1-incremental-materialized-views-pg-ivm (contrary to the project authors, they want you to be on v14 or later)

ALTER query very slow on tiny table in PostgreSQL

I've got PostgreSQL 9.2 and a tiny database with just a bit of seed data for a website that I'm working on.
The following query seems to run forever:
ALTER TABLE diagnose_bodypart ADD description text NOT NULL;
diagnose_bodypart is a table with fewer than 10 rows. I've let the query run for over a minute with no results. What could be the problem? Any recommendations for debugging this?
Adding a column does not require rewriting a table (unless you specify a DEFAULT). It is a quick operation absent any locks. pg_locks is the place to check, as Craig pointed out.
In general, the most likely cause is long-running transactions. I would look at which workflows are hitting these tables and how long their transactions stay open. Locks of this sort are typically held until the end of the transaction, so committing those transactions will usually fix the problem.
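A sketch of the pg_locks check, written to work on 9.2. It lists the sessions that currently hold a lock on diagnose_bodypart; run it from a second session while the ALTER TABLE is hanging.

```sql
-- Sessions holding a granted lock on the table, oldest transaction first.
SELECT a.pid,
       a.state,        -- 'idle in transaction' is the usual culprit
       a.xact_start,
       l.mode,
       a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'diagnose_bodypart'::regclass
  AND l.granted
ORDER BY a.xact_start;
```

On PostgreSQL 9.6 and later, pg_blocking_pids(pid) gives the blocking sessions for a waiting backend more directly.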