I have a dedicated OLTP server with SQL Server 2008 R2, 24 CPU Cores and 32 GB RAM. Earlier the SQL Server max memory setting had the default value of 0 - 2147483647 MB. And the ETL(mainly stored procedures) had good performance. But last week, somehow we inadvertently changed the SQL Server Max Memory setting to 0 - 16 GB. And the overall performance of ETL degraded and now it is taking twice the time as earlier. I tried to change it back by manually setting it back to 2147483647, also tried running the below query:
EXEC sp_configure'Show Advanced Options',1;
GO
RECONFIGURE;
GO
EXEC sp_configure'max server memory (MB)',2147483647;
GO
RECONFIGURE;
GO
But I cannot see the improvement in the performance. I even restarted the server after the changes but no luck. I also tried to reset the settings via Tools-->Import and Export Settings --> Reset all settings, but still no luck. Earlier through task manager, it was showing that SQL server is utilizing 95% of the total memory all the time.Now the memory utilization is very low. I need the earlier setting back.
Can anyone help me, how I can restore the default settings (I cannot reinstall the SQL Server as its already in production and have large amount of data)
Memory setting changes are dynamic and SQL Server will acquire memory as it needs to IF the OS has free memory available. This could take a while but no need to reboot.
When you rebooted, you did at least a couple of things to "reset" performance. For starters, you flushed the data cache so everything SQL Server needs to process your stored procs have to be retrieved from disk. This resolves itself relatively quickly without any additional action. As the SPs are run again, cache gets warmed up.
Second, you also flushed all query plans. This is a bit trickier. Depending on statistics freshness and how your queries were written, you may have had some very efficient plans earlier but the current/new plans are bad (there are a number of reasons for it).
Check to ensure sp_configure shows the correct "run_value". Run through the usual performance and workload monitoring steps. If things get back to "normal", then come back here with specific perf tuning questions.
Related
We run several Debian instances with PostgreSQL on Google compute engine and lately we have already seen several occurrences of the following problem.
Instance becomes suddenly non responsive. We cannot ssh it and we cannot connect to the database. Internal monitoring using telegraf is also not running during that period, no monitoring data collected.
Google monitoring of CPU activity shows very low usage during that period. GCP logs do not show any migration in fact do not show anything at all. Also all internal logs for instance - postgresql log, syslog, logs from periodical cronjobs - show the same gap. Looks like the instance was sort of frozen during that time. We so far noticed it only with PostgreSQL instances since these are heavily used.
Instances run these variants of OS and PG:
Debian 9 with PG 11.9
Debian 9 with PG 10.13
These incidents usually take 10-15 minutes, but in one case it was 1:20 hours. At the end of the incident some PG process is killed by an OOM killer but activity on the database immediately before the incident starts is usually relatively low, CPU usage and memory usage too. So it looks more like an instance has limited resources when it starts again? If it is even possible...
Any idea what could be the cause of these issues or what shall we look for? As I mentioned generally no info in internal logs on Debian during the period of the incident.
UPDATE: To avoid misunderstanding - instances in question are data warehouse database running on N1-highmen-8 machine (8 CPUs and 52 GB RAM) with 5 TB SSD. Or database collecting metrics from internet - custom machine 20 CPUs with 90 GB RAM and 3 TB SSD. All SW up to date.
UPDATE 2: Neither syslog, nor kern.log nor messages do not show anything for the time intervals during instance was non responsive. Immediately after incident telegraf recorded huge average load on CPUs but actually quite small CPU usage and Google monitoring shows very small CPU usage during the whole incident. Also immediately after the end of the incident always one of postgresql processes is killed by OOM killer causing database to go to the recovery mode.
As for PG work_mem parameter - instance collecting metrics (20 CPUs 90 GB RAM, 3 TB SSD) uses 8MB - it only inserts data but usually runs like 500 - 1000 connections.
Second instance is data warehouse analytical database and uses work_mem 128MB because lower numbers caused very bad query plans on majority of queries and usually runs only like 10 - 30 connections.
There was no unusual number of connections immediately before incidents happened on both databases.
UPDATE 3: Analytical database had today several small incidents of the same character. During the last one we stopped instance from GCP GUI and started it again after few minutes. Maybe it caused migration to the different HW. Since this operation instance is running OK.
I experienced a similar issue but with a MySQL Instance in GCP, the first issue was related with the type of the VM instance I used, I had a f1-micro machine type on this VM Instance and suddenly I wasn’t able to access the ssh. As this type of VM Instance has only 0.6GB of memory, it became out of memory soon, I changed it to a e2-medium that is value by default and it resolved my problems this time.
As the Instance was out of memory the services in the instance started to fail, it was the reason that I can't access my instance.
At another time I started again with similar issues, but this time, the problem was the disk, I only had 10 GB and there was a process filling my disk, when a partition was out of space, the instance started to fail again.
I only resized my disk, now my instance disk is 20GB and is working fine.
Having said that, I suggest increasing your resources per your convenience to enhance your performance, because to have the problems you described is a good indicator that your existing machine type is not a good fit for your workloads you run on that instance.
If your situation is the same as mine, you could change the machine type to adjust your memory and you can follow the next steps for these tasks please visit the following link to get further information about it.
Changing a machine type
1.- Go to the VM Instances page.
2.- In the Name column, click your instance.
From the instance details page, complete the following steps:
a) Click the Stop button to stop the instance, if you have not stopped it yet.
b) After the instance stops, click the Edit button at the top of the page.
c) Under the Machine configuration section, select the machine type you want to use, or create a custom machine type to increase only the Memory.
d) Save your changes and start again your VM Instance.
You can resize your disk following this guide or with the following command:
gcloud compute disks resize DISK_NAME --size DISK_SIZE
Or with the Console:
Go to the Disks page to see a list of zonal persistent disks in your project.
Click the name of the disk that you want to resize.
On the disk details page, click Edit.
In the Size field, enter the new size for your disk.
Click Save to apply your changes to the disk.
After you resize the disk, you must resize the file system so that the operating system can access the additional space.
Note: Do not resize boot disks beyond 2 TB because this is the limit.
Edit1
You mentioned that the logs don’t show information about the issue when the instance is frozen.
Did you try with the kernel logs? I think it could provide a wealth of diagnostic information about this issue.
For Debian, this logs should be in the following path:
/var/log/kern.log
Also the messages log could help
/var/log/messages
You can obtain more information about the logs in this link.
Also, I think it could be a PostgreSQL config problem, for example you could take a look at "work_mem", this parameter specifies the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files. The value defaults is four megabytes (4MB).
You can consult this URL to get more information.
Also I have found a good article that explains how to configure the PostgreSQL for Data Warehouse Usage
Another option could be that the kernel process in charge of identifying memory that could be paged out. You could configure your process to check smaller chunks more often.
This link explains better this configuration.
Additionally, as far as I know a data warehouse server consumes a lot of resources, so it could be a good idea to check if your Instance has enough resources for your workload.
Edit2
I have found an article that describes a similar problem and it said that:
When you consume more memory than is available on your machine you can start to see out of out of memory errors within your Postgres logs, or in worse cases the OOM killer can start to randomly kill running processes to free up memory. An out of memory error in Postgres simply errors on the query you’re running, where as the the OOM killer in Linux begins killing running processes which in some cases might even include Postgres itself.
And this is the recommendation they give.
When you see an out of memory error you either want to increase the overall RAM on the machine itself by upgrading to a larger instance OR you want to decrease the amount of memory that work_mem uses. Yes, you read that right: out-of-memory it’s better to decrease work_mem instead of increase since that is the amount of memory that can be consumed by each process and too many operations are leveraging up to that much memory.
You could see the complete explanation of this article “Configuring memory for Postgres” here, it may help you with this issue.
I am stuck in a problem that PostgreSQL data writes are very slow.
I developed my application in Java (using JDBC) to insert data into a PostgreSQL DB. It works well on our remote development server. However, after I deploy it to the production server, it causes a problem.
The insert speed of PostgreSQL on the production server is only ~150 records/s for 200000K records, while it is ~1000 records/s for the same data set on the development server.
Firstly, I tried to change the configuration in postgresql.conf as follows:
effective_cache_size = 4GB
max_wal_size = 2GB
work_mem = 128MB
shared buffers = 512MB
After I changed the configuration and restarted, it only affects the query speed, while the insert speed does not change (~150 records/s).
I have checked my server memory info, there is a lot of free memory ~4GB. The inserter only uses 0.5% of 8GB (~40MB).
So my questions are:
Is this a problem of a storage disk, such as SSD and HDD or virtual
and physical etc.? Why is the insert speed still very slow, although I have changed the configuration? Is there any way
for increasing the insert speed?
Note: the problem does not relate to the insert query structure.
I have used the same query in the same condition elsewhere (I set up an
environment in 2 servers in the same way). I do not know why the
DEVELOPMENT server (4GB) works better than the PRODUCTION server
(8GB).
The only one of your parameters that has an influence on INSERT performance is max_wal_size. High values prevent frequent checkpoints.
Use iostat -x 1 on the database server to see how busy your disks are. If they are quite busy, you are probably I/O bottlenecked. Maybe the I/O subsystem on your test server is better?
If you are running the INSERTs in many small transactions, you may be bottlenecked by fsync to the WAL. The symptom is a busy disk with not much I/O being performed.
In that case batch the INSERTs in larger transactions. The difference you observe could then be due to different configuration: Maybe you set synchronous_commit or (horribile dictu!) fsync to off on the test server.
Mongodb Background Flushing blocks all the requests:
Server: Windows server 2008 R2
CPU Usage: 10 %
Memory: 64G, Used 7%, 250MB for Mongod
Disk % Read/Write Time: less than 5% (According to Perfmon)
Mongodb Version: 2.4.6
Mongostat Normally:
insert:509 query:608 update:331 delete:*0 command:852|0 flushes:0 mapped:63.1g vsize:127g faults:6449 locked db:Radius:12.0%
Mongostat Before(maybe while) Flushing:
insert:1 query:4 update:3 delete:*0 command:7|0 flushes:0 mapped:63.1g vsize:127g faults:313 locked db:local:0.0%
And Mongostat After Flushing:
insert:1572 query:1849 update:1028 delete:*0 command:2673|0 flushes:1 mapped:63.1g vsize:127g faults:21065 locked db:.:99.0%
As you see when flushes happening lock is 99% just at this point mongod stops responding any read/write operation (mongotop and mongostat also stop). The flushing takes about 7 to 8 seconds to complete which does not increase disk load more than 10%.
Is there any suggestions?
Under Windows server 2008 R2 (and other versions of Windows I would suspect, although I don't know for sure), MongoDB's (2.4 and older) background flush process imposes a global lock, doing substantial blocking of reads and writes, and the length of the flush time tends to be proportional to the amount of memory MongoDB is using (both resident and system cache for memory-mapped files), even if very little actual write activity is going on. This is a phenomenon we ran into at our shop.
In one replica set where we were using MongoDB version 2.2.2, on a host with some 128 GBs of RAM, when most of the RAM was in use either as resident memory or as standby system cache, the flush time was reliably between 10 and 15 seconds under almost no load and could go as high as 30 to 40 seconds under load. This could cause Mongo to go into long pauses of unresponsiveness every minute. Our storage did not show signs of being stressed.
The basic problem, it seems, is that Windows handles flushing to memory-mapped files differently than Linux. Apparently, the process is synchronous under Windows and this has a number of side effects, although I don't understand the technical details well enough to comment.
MongoDb, Inc., is aware of this issue and is working on optimizations to address it. The problem is documented in a couple of tickets:
https://jira.mongodb.org/browse/SERVER-13444
https://jira.mongodb.org/browse/SERVER-12401
What to do?
The phenomenon is tied, to some degree, to the minimum latency of the disk subsystem as measured under low stress, so you might try experimenting with faster disks, if you can. Some improvements have been reported with this approach.
A strategy that worked for us in some limited degree is avoiding provisioning too much RAM. It happened that we really didn't need 128 GBs of RAM, so by dialing back on the RAM, we were able to reduce the flush time. Naturally, that wouldn't work for everyone.
The latest versions of MongoDB (2.6.0 and later) seem to handle the
situation better in that writes are still blocked during the long
flush but reads are able to proceed.
If you are working with a sharded cluster, you could try dividing the RAM by putting multiple shards on the same host. We didn't try this ourselves, but it seems like it might have worked. On the other hand, careful design and testing would be highly recommended in any such scenario to avoid compromising performance and/or high availability
We tried playing with syncdelay. Reducing it didn't help (the long flush times just happened more frequently). Increasing it helped a little (there was more time between flushes to get work done), but increasing it too much can exacerbate the problem severely. We boosted the syncdelay to five minutes (300 seconds), at one point, and were rewarded with a background flush of 20 minutes.
Some optimizations are in the works at MongoDB, Inc. These may be available soon.
In our case, to relieve the pressure on the primary host, we periodically rebooted one of the secondaries (clearing all memory) and then failed over to it. Naturally, there is some performance hit due to re-caching, and I think this only worked for us because our workload is write-heavy. Moreover, this technique not in any sense a solution. But if high flush times are causing serious disruption, this may be one way to "reduce the fever" so to speak.
Consider running on Linux... :-)
Background flush by default does not block read/write. mongod does flush every 60s, unless otherwise specified with -syncDelay parameter. syncDelay uses fsync() operation, which can set to block write while in-memory pages flush to disk. A blocked write could have potential to block reads as well. Read more: http://docs.mongodb.org/manual/reference/command/fsync/
However, normally a flush should not take more than 1000ms (1 second). If it does, it is likely the amount of data flushing to disk is too large for your disk to handle.
Solution: upgrade to a faster disk like SSD, or decrease flush interval (try 30s, rather than the default 60s).
All,
I am running CentOS 6.0 with Postgresql 8.4 and can't seem to figure out how to prevent so much disc swap from occurring. I have 12 gigs of RAM and 4 processors and I am doing some simple updates (1 table at a time). I thought for a minute that the inserts happening in parallel from a script I wrong was causing the large memory usage but when I saw the simple update causing it too I basically threw in the towel and decided to ask for help.
I pasted the conf file here. http://pastebin.com/e0jdBu0J
You can see that I set the buffers relatively low and the connection amounts high. The DB service will not start if I set the shared buffers any higher than 64 megs. Anyone have an idea what may be causing this for me?
Thanks,
Adam
If you're going into swap, increasing shared_buffers will make the problem worse; you'll be taking RAM away from the part that's running out and swapping, instead dedicating memory to the database caching. It's worth fixing SHMMAX etc. just on general principle and for later tuning work, but that's not going to help with this problem.
Guessing at the identify of your memory gobbling source is a crapshoot. Far better to look at data from "top -c" and ps to find which processes are using a lot of it. It's possible for a really bad query to consume way more memory than it should. If you see memory use spike up for a PostgreSQL process running something, check the process ID against the information in pg_stat_tables to see what it's doing.
There are a couple of things that can cause this sort of issue that often surprise people. If you are doing a large number of row updates in a single transaction, and there are foreign key checks or triggers involved, that can run out of memory. The queue of things to check in each of those cases is kept in RAM, and can be surprisingly big.
There are two problems with your PostgreSQL settings that might be related. Databases don't actually work very well if you have a lot more active connections than cores in the server; best performance is normally 2 to 3 active clients per core. And all sorts of things go wrong once you've got more than a few hundred connection. There is some connections^2 behavior that gets ugly there performance wise, and there are some memory issues too. If you really need 1250 connections, you should be using a connection pooler such as pgBouncer or pgpool-II.
And effective_io_concurrency = 1000 is way too high for any hardware on the planet. Useful values for that in a small multiple of how many disks you have in the server. I have no idea what happens as far as memory usage goes when you set it that high, but it's not been tested very well at that range. Normal settings more like 1 to 25. The parameters outlined at Tuning Your PostgreSQL Server are much more important than it is; the concurrency value only impacts one particular type of table scan.
Centos 6 seems to have a very conservative shmmax as a default
Set your shared buffers to that recommended by postgres tuning resources
see for explanation and how to set.
To experiment you can (as root) use sysctl -w kernel.shmmax = n
where n is the value that the startup error message that postgres is trying to allocate on startup. When you identify the value you wish to use permanently then set that in /etc/sysctl.conf
I have a C++ application which is making use of PostgreSQL 8.3 on Windows. We use the libpq interface.
We have a multi-threaded app where each thread opens a connection and keeps using without PQFinish it.
We notice that for each query (especially the SELECT statements) postgres.exe memory consumption would go up. It goes up as high as 1.3 GB. Eventually, postgres.exe crashes and forces our program to create a new connection.
Has anyone experienced this problem before?
EDIT: shared_buffer is currently set to be 128MB in our conf. file.
EDIT2: a workaround that we have in place right now is to call PQfinish for every transaction. But then, this slows down our processing a bit since establishing a connection every time is quite slow.
In PostgreSQL, each connection has a dedicated backend. This backend not only holds connection and session state, but is also an execution engine. Backends aren't particularly cheap to leave lying around, and they cost both memory and synchronization overhead even when idle.
There's an optimum number of actively working backends for any given Pg server on any given workload, where adding more working backends slows things down rather than speeding it up. You want to find that point, and limit the number of backends to around that level. Unfortunately there's no magic recipe for this, it mostly involves benchmarking - on your hardware and with your workload.
If you need more connections than that, you should use a proxy or pooling system that allows you to separate "connection state" from "execution engine". Two popular choices are PgBouncer and PgPool-II . You can maintain light-weight connections from your app to the proxy/pooler, and let it schedule the workload to keep the database server working at its optimum load. If too many queries come in, some wait before being executed instead of competing for resources and slowing down all queries on the server.
See the postgresql wiki.
Note that if your workload is read-mostly, and especially if it has items that don't change often for which you can determine a reliable cache invalidation scheme, you can also potentially use memcached or Redis to reduce your database workload. This requires application changes. PostgreSQL's LISTEN and NOTIFY will help you do sane cache invalidation.
Many database engines have some separation of execution engine and connection state built in to the core database engine's design. Sybase ASE certainly does, and I think Oracle does too, but I'm not too sure about the latter. Unfortunately, because of PostgreSQL's one-process-per-connection model it's not easy for it to pass work around between backends, making it harder for PostgreSQL to do this natively, so most people use a proxy or pool.
I strongly recommend that you read PostgreSQL High Performance. I don't have any relationship/affiliation with Greg Smith or the publisher*, I just think it's great and will be very useful if you're concerned about your DB's performance.
* ... well, I didn't when I wrote this. I work for the same company now.
The memory usage is not necessarily a problem. PostgreSQL uses shared memory for some caching, and this memory does not count towards the size of the process memory usage until it's actually used. The more you use the process, the larger parts of the shared buffers will be active in it's address space.
If you have a large value for shared_buffers, this will happen. If you have it too large, the process can run out of address space and crash, yes.
The problem is probably that you don't close the transaction,
In PostgreSQL even if you do only selects without DML it runs in transaction which need to be rollback.
By adding rollback at the end of the transaction will reduce your memory problem