I have a materialized view based on a very complex (and not very efficient) query. This materialized view is used for BI/visualization. The refresh often takes ~4 minutes to complete, which is good enough for my needs. Running EXPLAIN on the underlying query shows a total cost of 2,116,446 with 137,682 rows and a width of 1,976.
However, sometimes REFRESH MATERIALIZED VIEW XXX just never completes. Looking at the top processes (top on Ubuntu), the backend process uses 100% CPU and 8.1% of the server's 28 GB of memory for a while... then all of a sudden it just disappears from the top list. This usually happens after ~4-5 minutes (even though statement_timeout is disabled). The postgres client just keeps waiting forever, and the view never refreshes.
Running the query behind the view directly (i.e. a plain SELECT ...) fails in the same way.
I'm using PostgreSQL 9.5. I've tried increasing effective_cache_size, shared_buffers, and work_mem in the Postgres config, but the result is the same.
Sometimes, after several attempts, the refresh command completes successfully. But it's unpredictable, and right now it just won't work even after multiple attempts and database restarts.
Any suggestions as to what might be the problem?
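A first thing worth checking while the refresh appears hung (a minimal sketch for 9.5, where pg_stat_activity exposes the boolean waiting column rather than wait_event) is whether the refreshing backend is still alive at all and whether it is running or waiting:

-- Is the backend running the refresh still there, and is it working or waiting on a lock?
SELECT pid, state, waiting, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE query ILIKE 'refresh materialized view%';

If the backend has vanished from pg_stat_activity as well as from top while the client keeps waiting, the PostgreSQL server log (and the kernel log, in case the OOM killer got involved) is the next place to look.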
Related
I have a scenario that repeats itself every few hours: there is a sudden increase in ROW EXCLUSIVE locks in the PostgreSQL database. At the same time, some queries are not answered in time, which leads to connection exhaustion, so PostgreSQL stops accepting new clients. After 2-3 minutes the lock and connection counts drop and the system returns to its normal state.
I wonder if autovacuum could be the root cause of this? I see that ANALYZE and VACUUM (not VACUUM FULL) take about 20 seconds to complete on one of the tables. I have INSERT, SELECT, UPDATE and DELETE operations going on from my application, and no DDL commands (ALTER TABLE, DROP TABLE, CREATE INDEX, ...). Can the autovacuum process conflict with queries from my application and cause them to wait until vacuum has completed? Or is it all the application's and my bad design's fault? I should mention that one of my tables has a jsonb column that holds relatively large data for each row (roughly 10 MB).
I have attached an image from the monitoring application that shows the sudden increase in ROW EXCLUSIVE locks.
ROW EXCLUSIVE locks are perfectly harmless; they are taken on tables against which DML statements run, so your graph reveals nothing. You should set log_lock_waits = on and set log_min_duration_statement to a reasonable value. Perhaps you can spot something in the logs. Also, watch out for long-running transactions.
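If it helps, a minimal sketch of turning those two settings on from psql, assuming version 9.4 or later where ALTER SYSTEM is available (the 1000 ms threshold is an arbitrary choice; pick whatever counts as "slow" for your workload):

-- log any statement that waits on a lock longer than deadlock_timeout
ALTER SYSTEM SET log_lock_waits = on;
-- log any statement that runs longer than 1 second (the threshold is an assumption)
ALTER SYSTEM SET log_min_duration_statement = 1000;
-- apply the changes without restarting
SELECT pg_reload_conf();

After that, lock waits and slow statements show up in the server log, which should make the periodic spike much easier to correlate with a specific query or transaction.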
I have PostgreSQL 9.5 installed on a Linux machine, and for the last few days it has continuously shown huge CPU usage (80%-90%). I checked pg_stat_activity for long-running queries or sessions but found nothing to blame, and I also checked that all my tables are properly indexed, yet CPU usage from the Postgres processes keeps spiking. Is there any way to figure out the reason?
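One hedged way to narrow this down, assuming you can enable the pg_stat_statements contrib module that ships with 9.5 (it has to be added to shared_preload_libraries, which requires a restart), is to rank statements by cumulative execution time rather than looking only at what happens to be running when you query pg_stat_activity:

-- once, as superuser, after adding pg_stat_statements to shared_preload_libraries
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- after the workload has run for a while: the biggest consumers of time overall
SELECT round(total_time::numeric, 1) AS total_ms,
       calls,
       round((total_time / calls)::numeric, 2) AS avg_ms,
       left(query, 80) AS query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;

Short, cheap statements executed very frequently can drive CPU to 80-90% without ever showing up as "long-running" in pg_stat_activity, and this view makes that pattern visible.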
We have a large table (1.6 TB) and deleted 60% of the records, and we want to reclaim that space for the OS and file system. We're running PostgreSQL 9.4 (we're stuck on that pending a major software upgrade).
We need that space, as we're down to 100 GB, and when materialized views are refreshed we run out of space on the server.
I tried running VACUUM (FULL, ANALYZE, VERBOSE) schema.tablename and let it run for 24 hours last weekend, but had to cancel it to get the server back online.
I'm running it again this weekend, after dropping the indexes (I'm hoping that will speed it up enough to finish). So far there is no output or indication of progress. I created a tablespace on another SSD array and set it as temp space using temp_tablespaces = 'name_of_other_tablespaces', but du -chs shows it is still empty.
The query shows active, but since disk usage isn't increasing it just feels like it's sitting there, making no noise and pretending it's not there.
This is on a server with 512GB of RAM and a RAID 10 array of very fast enterprise SSDs. Is there any way to get progress and know that something is actually happening and that it's working? Any guesses as to duration, or other suggestions?
I found out what was happening by finally noticing that my VACUUM was waiting for an autovacuum process to finish, which was never going to happen (autovacuum: VACUUM pg_toast.pg_toast_nnnnn (to prevent wraparound)). Once I killed that, the VACUUM ran quite quickly and cleared up over 1 TB of space. Time to celebrate!
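For anyone hitting the same thing, a minimal sketch of how such a stuck worker can be spotted and terminated on 9.4 (pg_terminate_backend requires superuser privileges, and an anti-wraparound autovacuum will simply be launched again later, so the manual VACUUM should be started as soon as the worker is gone):

-- autovacuum workers currently running, and for how long
SELECT pid, now() - xact_start AS running_for, waiting, query
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%';

-- terminate the worker stuck on the toast table (this matches ALL toast autovacuums,
-- so check the output of the first query before running it)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE query LIKE 'autovacuum: VACUUM pg_toast.%';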
I ran two DELETEs on a PostgreSQL 9.3.12 database against a fairly large table. Each one required a table scan and took about 10 minutes to complete.
While they were running, clients weren't impacted. Disk I/O was high, upwards of 70%, but that's fine.
After the second delete finished, disk I/O went to near zero and the load average shot through the roof. Requests were not being completed in a timely manner, and since new requests continued to arrive they all stacked up.
My two theories are:
1. Something in the underlying I/O layer caused all I/O requests to block for some period of time, or
2. Postgres acquired (and held for a non-trivial period of time) a lock needed by clients, either a global one or one related to the table from which rows were deleted. This table is frequently inserted into by clients; if something were holding a lock that blocked inserts, it would definitely explain this behavior.
Any ideas? Load was in excess of 40, which never happens in our environment even during periods of heavy load. Network I/O was high during/after the deletes, but only because the deletes were being streamed to our replication server.
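If it happens again, a hedged way to test the second theory in the moment (a sketch for 9.3, where pg_stat_activity still has the boolean waiting column rather than wait_event) is to look for sessions stuck on ungranted locks:

-- sessions currently waiting on a lock, and for how long
SELECT pid, waiting, now() - query_start AS stuck_for, query
FROM pg_stat_activity
WHERE waiting;

-- which lock requests have not been granted, and on which relation
SELECT pid, locktype, relation::regclass AS relation, mode, granted
FROM pg_locks
WHERE NOT granted;

If the blocked statements are the clients' INSERTs and the blocker is something left over from the delete (for example a still-open transaction), that would point to the lock theory; if nothing is waiting on locks while load is high, the I/O-layer theory becomes more likely.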
I have a dedicated OLTP server with SQL Server 2008 R2, 24 CPU cores and 32 GB of RAM. Earlier, the SQL Server max memory setting had its default value of 0 - 2147483647 MB, and the ETL (mainly stored procedures) performed well. But last week we somehow inadvertently changed the SQL Server max memory setting to 0 - 16 GB, and the overall performance of the ETL degraded; it now takes twice as long as before. I tried to change it back by manually setting it to 2147483647, and also tried running the query below:
EXEC sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
EXEC sp_configure 'max server memory (MB)', 2147483647;
GO
RECONFIGURE;
GO
But I cannot see any improvement in performance. I even restarted the server after the changes, but no luck. I also tried to reset the settings via Tools --> Import and Export Settings --> Reset all settings, but still no luck. Earlier, Task Manager showed SQL Server utilizing 95% of total memory all the time; now memory utilization is very low. I need the earlier setting back.
Can anyone help me restore the default settings? (I cannot reinstall SQL Server, as it's already in production and holds a large amount of data.)
Memory setting changes are dynamic, and SQL Server will acquire memory as it needs to if the OS has free memory available. This can take a while, but there is no need to reboot.
When you rebooted, you did at least a couple of things to "reset" performance. For starters, you flushed the data cache, so everything SQL Server needs to process your stored procedures has to be retrieved from disk. This resolves itself relatively quickly without any additional action: as the SPs are run again, the cache warms up.
Second, you also flushed all query plans. This is a bit trickier. Depending on statistics freshness and how your queries are written, you may have had some very efficient plans earlier while the current/new plans are bad (there are a number of possible reasons for this).
Check that sp_configure shows the correct "run_value". Run through the usual performance and workload monitoring steps. If things get back to "normal", then come back here with specific performance tuning questions.
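A quick sketch of that check (the DMV query is just one convenient way to see how much memory the process is actually holding; both should work on 2008 R2):

-- config_value is what was requested, run_value is what is currently in effect
EXEC sp_configure 'max server memory (MB)';
GO

-- how much memory the SQL Server process is actually using right now, in KB
SELECT physical_memory_in_use_kb
FROM sys.dm_os_process_memory;
GO

If run_value shows 2147483647 and physical memory in use climbs back up as the workload runs, the memory setting itself is no longer the problem, and the remaining slowness is the cold cache and fresh query plans described above.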