To avoid the Windows ghosting effect for one of our applications, we want to increase the ghosting timeout to 5 minutes, i.e. set the HungAppTimeout value to 300000. Is this within the bounds of the allowed values for the HungAppTimeout registry key?
I have a use case that inserts 100,000 rows per minute into a table while, at the other end, a few threads pick up those rows and delete them. This inevitably creates a lot of dead tuples in the table.
My autovacuum settings are:
autovacuum_max_workers = 3
autovacuum_naptime = 1min
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1
autovacuum_vacuum_cost_delay = 20ms
autovacuum_vacuum_cost_limit = -1
From "pg_stat_user_tables" I can find auto-vacuum is running on my table but within a few hours my disk will be full (500 GB) and I can't able to insert any new row.
On the second try, I changed the following settings:
autovacuum_naptime = 60min
autovacuum_vacuum_cost_delay = 0
This time my simulation and autovacuum run well, and disk usage peaks at 180 GB.
My question is: if I change "autovacuum_vacuum_cost_delay" to zero ms, how does autovacuum free the dead-tuple space so that PostgreSQL can reuse it? And why does it not work as intended when the value is 20 ms?
if I change "autovacuum_vacuum_cost_delay" to zero ms, how does autovacuum free the dead-tuple space so that PostgreSQL can reuse it?
The space freed up by VACUUM is recorded in the free space map, from which it is handed out for reuse by future INSERTs.
Another detail: in 9.6 the free space map is only vacuumed once the entire table has been completely vacuumed, so the freed-up space is not findable until then. If the VACUUM never makes it to the very end, because it is too slow or gets interrupted, the space it is freeing will not be reused for INSERTs. This was improved in v11.
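If you want to check whether freed space is actually visible for reuse, the contrib extension pg_freespacemap lets you inspect the free space map directly. A minimal sketch, assuming superuser rights to create the extension and a placeholder table name my_queue_table:

CREATE EXTENSION IF NOT EXISTS pg_freespacemap;

-- Total space the free space map currently advertises as reusable for the table
SELECT count(*) AS pages_tracked,
       pg_size_pretty(sum(avail)::bigint) AS space_available_for_reuse
FROM pg_freespace('my_queue_table');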
why does it not work as intended when the value is 20 ms?
Because vacuum can't keep up at that value. The default values in PostgreSQL are often suitable only for smaller servers, which yours doesn't seem to be. It is appropriate and advisable to change the defaults in this situation. Note that in v12 the default was lowered from 20 ms to 2 ms (and its type was correspondingly changed from int to float, so you can now specify the value with more precision).
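As a sketch of how the setting could be lowered cluster-wide without a restart (assuming PostgreSQL 9.4+ for ALTER SYSTEM and superuser access; the value 0 mirrors what worked in the question):

ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 0;  -- remove autovacuum throttling
SELECT pg_reload_conf();                            -- picked up on reload, no restart needed
SHOW autovacuum_vacuum_cost_delay;                  -- verify the active value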
To summarize, your app creates tons of dead tuples and autovacuum can't keep up. Possible solutions:
This sounds more like a task queue than a regular table. Perhaps a PostgreSQL table is not ideal for this specific use case. Use a solution such as RabbitMQ/Redis instead.
Create time-based range partitions and purge old partitions once they're empty, while disabling autovacuum on this table alone. Consider not deleting rows at all and simply dropping old partitions once you can tell they have been fully processed (see the sketch after this list).
Tune the autovacuum settings so that it works constantly, without naps or throttling. Increasing maintenance_work_mem could also help speed up autovacuum. You may find that you have reached your hard drive's limits; in that case, you will have to optimize the storage so that it can accommodate those expensive INSERT + DELETE + autovacuum operations.
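A minimal sketch of the partitioning approach from the second point, assuming PostgreSQL 10+ declarative partitioning; the table, column, and partition names are made up for illustration:

-- Parent table, partitioned by insertion time
CREATE TABLE job_queue (
    id         bigint      NOT NULL,
    payload    text,
    created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

-- One partition per hour; autovacuum can be switched off per partition
CREATE TABLE job_queue_2019_01_01_00
    PARTITION OF job_queue
    FOR VALUES FROM ('2019-01-01 00:00') TO ('2019-01-01 01:00')
    WITH (autovacuum_enabled = off);

-- Once all rows in an old partition have been processed, drop it instead of
-- deleting its rows; dropping a partition leaves no dead tuples behind
DROP TABLE job_queue_2019_01_01_00;

The point of this layout is that DROP TABLE reclaims the space immediately, whereas DELETE only creates dead tuples for vacuum to clean up later.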
Well, the default value is now 2 ms (see Autovacuum), so your 20 ms value is high:
autovacuum_vacuum_cost_delay (floating point)
"Specifies the cost delay value that will be used in automatic VACUUM operations. If -1 is specified, the regular vacuum_cost_delay value will be used. If this value is specified without units, it is taken as milliseconds. The default value is 2 milliseconds. This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters."
As explained here in the Vacuum documentation:
"
vacuum_cost_delay (floating point)
The amount of time that the process will sleep when the cost limit has been exceeded. If this value is specified without units, it is taken as milliseconds. The default value is zero, which disables the cost-based vacuum delay feature. Positive values enable cost-based vacuuming.
When using cost-based vacuuming, appropriate values for vacuum_cost_delay are usually quite small, perhaps less than 1 millisecond. While vacuum_cost_delay can be set to fractional-millisecond values, such delays may not be measured accurately on older platforms. On such platforms, increasing VACUUM's throttled resource consumption above what you get at 1ms will require changing the other vacuum cost parameters. You should, nonetheless, keep vacuum_cost_delay as small as your platform will consistently measure; large delays are not helpful.
"
We use sequences to generate order numbers at a particular business unit.
Over the last few days, we have noticed strange jumps ranging from 1 to 32 in the sequence numbers, multiple times a day.
The sequence we have configured has a cache value of 1.
Why is this happening and how can we resolve this?
I couldn't find much documentation about this.
PostgreSQL sequences can jump by 32 when you promote a secondary server to primary, and sometimes when PostgreSQL crashes hard and has to recover.
I have only seen it skip 32, and I haven't been able to change it by changing the sequence's cache value.
https://www.postgresql.org/message-id/1296642753.8673.29.camel%40gibralter and the response https://www.postgresql.org/message-id/20357.1296659633%40sss.pgh.pa.us indicate that this is expected behavior in a crash condition. I'm not sure whether this only occurs when you're using streaming replication.
I thought it was a safety feature, in case the primary had committed records that had not yet been replicated to the secondary at the time of promotion. A human could then manually dump and copy those records from the old primary to the new primary before wiping the old primary.
This could indicate that your primary is crashing and recovering several times per day, which is not a good state to be in.
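If you want to rule out the cache as the cause, you can inspect the sequence's definition and current position. A sketch assuming PostgreSQL 10 or later and a placeholder sequence name order_seq:

-- Confirm the configured cache size and the last value handed out
SELECT schemaname, sequencename, cache_size, last_value
FROM pg_sequences
WHERE sequencename = 'order_seq';

-- The cache can be set explicitly, although a crash or failover can still
-- skip values even with CACHE 1
ALTER SEQUENCE order_seq CACHE 1;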
We are running MongoDB on an A3 virtual machine on Azure. We are trying to estimate the running cost of using MongoDB for the following scenario:
The scenario is to insert/update around 2K of data (time-series data) every 5 minutes for each of 100,000 customers. We are using MongoDB on an A3 instance (4 cores) of Windows Server on Azure, which restricts us to 4 TB per shard.
When we estimated the running cost, it came out to approximately $34,000 per month, which includes MongoDB licensing, our MongoDB virtual machines, storage, backup storage, and the worker role.
This is far too costly. We have some ideas to bring the cost down, but we need advice on them, as some of you may have already done this.
Two questions:
1- As of today, we estimate we will need 28 MongoDB instances (with the 4 TB limit). I have read that we can increase the disk size from 4 TB to 64 TB on a Linux VM or a Windows Server 2012 server. This may reduce the number of shards we need. Is running a MongoDB shard with a 64 TB disk possible in Azure?
You may ask why 28 instances..
2- We are calculating the number of shards required based on "number of inserts per core", which itself depends on the number of values inserted into MongoDB per message; each value is 82 bytes. We did some load testing, and it turns out we can only run 8,000 inserts per second, while each core can handle approximately 193 inserts per second, resulting in a need for 41 cores (which is way too high). Dividing those 41 cores by 4 cores per A3 instance gives 11 instances, which is another cost...
We are looking for help to see whether our calculation is wrong or the way we have set things up is wrong.
Any help will be appreciated.
Question 1:
1- As of today, we estimate we will need 28 MongoDB instances (with the 4 TB limit). I have read that we can increase the disk size from 4 TB to 64 TB on a Linux VM or a Windows Server 2012 server. This may reduce the number of shards we need. Is running a MongoDB shard with a 64 TB disk possible in Azure?
According to the documentation here, the maximum you can achieve is 16 TB: 16 data disks attached at a maximum of 1 TB each. So technically the largest single disk you can attach is 1 TB, but you can build a RAID 0 stripe across the 16 attached disks to get 16 TB of storage. This (16 TB) is the maximum amount of storage you can officially get.
According to the Azure documentation, the A3 size can have a maximum of 8 data disks, so a maximum of 8 TB. A4 can handle 16 disks. I would assume your bottleneck here is disk I/O and not the number of cores, so I'm not convinced you need such a big cluster.
I was doing some tests to figure out the performance of replica sets in our environment. The setup consists of one primary and one secondary in the local data center and one secondary in a remote data center.
Each record consists of one field of 512 bytes. The insert counts tested were 100,000 and 500,000.
During week 1, inserts on the primary completed in the following times:
100,000 writes - 5 seconds
500,000 writes - 20 seconds
During week 2:
100,000 writes - 14 seconds
500,000 writes - 66 seconds
I can't figure out what could have caused the rate to drop so much. I have an oplog of size 1 GB and journaling enabled. I am not concerned about replication lag, since there isn't much lag. There are no other I/O-heavy processes running in the environments on which MongoDB is set up. I have also deleted files and restarted the machines, but I still see this drop.
Can anyone let me know what could be the cause?
If these are virtual machines, then you might have a "noisy neighbor". If you're using NAS or SAN storage, then write throughput can be affected by network traffic or by I/O load for other hosts sharing the NAS or SAN.
There is a database with 9 million rows and 3 million distinct entities. This database is loaded every day into MongoDB using the Perl driver. It runs smoothly on the first load; however, from the second load onwards the process slows down considerably and blocks for long stretches every now and then.
I initially suspected this was because of the automatic flushing to disk every 60 seconds, so I tried setting syncdelay to 0 and also tried the nojournal option. I have indexed the fields that are used for the upserts. I have also observed that the blocking is inconsistent and does not always happen at the same point for the same line.
I have 17 GB of RAM and enough hard disk space. I am replicating across two servers with one arbiter. I do not have any significant processes running in the background. Is there an explanation or solution for this blocking?
UPDATE: The mongostat tool shows in the 'res' column that around 3.6 GB is used.