Looking for any advice I can get.
I have 16 virtual CPUs all writing to a single remote MongoDB server. The machine that's being written to is a 64-bit machine with 32GB RAM, running Windows Server 2008 R2. After a certain amount of time, all the CPUs stop cold (no gradual performance reduction), and any attempt to get a Remote Desktop Connection hangs.
I'm writing from Python via pymongo, and the insert statement is "[collection].insert([document], safe=True)"
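For reference, in current PyMongo the safe=True flag has been replaced by write concerns; the following is a minimal sketch of an equivalent acknowledged insert (the host, database, and collection names are placeholders):

# Minimal sketch: an acknowledged insert with a modern PyMongo client.
# The host, database, and collection names below are placeholders.
from pymongo import MongoClient, WriteConcern

client = MongoClient("mongodb://remote-server:27017")
collection = client["mydb"].get_collection(
    "mycollection",
    write_concern=WriteConcern(w=1),  # acknowledged, like safe=True
)
collection.insert_one({"worker_id": 1, "payload": "example"})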
I decided to more actively monitor my server as the distributed write job progressed, remoting in from time to time and checking the Task Manager. What I see is a steady memory creep, from 0.0GB all the way up to 29.9GB, in a fairly linear fashion. My leading theory is therefore that my writes are filling up the memory and eventually overwhelming the machine.
Am I missing something really basic? I'm new to MongoDB, but I remember that when writing to a MySQL database, inserts are typically followed by commits, where it's the commit statement that actually makes sure the record is written. Here I'm not doing any commits...?
Thanks,
Dave
Try it with journaling turned off and see if the problem remains.
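For reference, on MongoDB 2.x journaling can be disabled with a startup option (the data path below is a placeholder):

mongod --dbpath C:\data\db --nojournal

If the memory creep stops with journaling off, that points at journal flushing rather than the inserts themselves.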
So I've spent the better part of my day (and several searches before that) looking for a workable solution to prevent data loss when the host of a PostgreSQL server installation gets rebooted or shut down. We maintain a number of Azure and on-prem servers, and the number of times someone has inadvertently shut down a server without first ensuring Postgres is done flushing data to disk is far more frequent than it should be. Of note, we are a Windows Server shop.
Our current best practice (which works if followed appropriately) is to stop the Postgres service, then watch disk writes to the Postgres data directory in Resource Monitor. Once nothing is writing to that directory, shut down the host. I have to think that there's a better way to ensure that it doesn't get shut down in a manner that leads to data corruption, regardless of adherence to the best practice (or, in some cases, because Windows Update mandates a reboot regardless of configured settings telling it not to).
Some things I've considered, but have been unable to find solid answers for:
Create a scheduled task that uses the "On an event" trigger to monitor the System log for event 1074. It would have to be configured to "run whether the user is logged in or not". The script would cancel the shutdown command with shutdown /a, then run a script to gracefully shut down Postgres (a rough sketch of this follows the list). I've seen mixed results on whether the scheduled job would reliably trigger before Task Scheduler is terminated in the shutdown sequence.
Create a shutdown script using Group Policy. My question there is: will Windows wait for the script to complete before executing the shutdown?
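For the first idea, here is a minimal sketch of what the triggered script might do, assuming Python is available on the host and pg_ctl is on the PATH (the data directory is a placeholder):

# Minimal sketch: cancel the pending shutdown, then stop PostgreSQL cleanly.
# Assumes pg_ctl is on PATH; the data directory below is a placeholder.
import subprocess

# Abort the pending Windows shutdown (harmless if none is pending).
subprocess.run(["shutdown", "/a"], check=False)

# Request a "fast" shutdown: active sessions are aborted, but all data
# is flushed and the server stops in a consistent state.
subprocess.run(
    ["pg_ctl", "stop", "-D", r"C:\pgdata", "-m", "fast", "-w"],
    check=True,
)

Whether this runs in time is exactly the open question about Task Scheduler noted above.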
How do you deal with data loss in your Postgres server Windows hosts?
First, if you register PostgreSQL as a Windows service, a shutdown of the machine will automatically shut down PostgreSQL first.
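If the service isn't already registered (the standard Windows installer normally does this for you), pg_ctl can register it; a sketch with placeholder service name and data directory:

pg_ctl register -N "postgresql-x64" -D "C:\Program Files\PostgreSQL\data"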
But even without that, a properly configured PostgreSQL server on proper hardware will never suffer data loss (unless you hit a rare PostgreSQL software bug). It is one of the basic requirements for a relational database to survive crashes without data loss.
To enumerate a few things that come to mind:
make sure that the PostgreSQL parameters fsync and synchronous_commit are set to on (a quick way to check is sketched after this list)
make sure that you are using a reliable file system for the data files and the WAL (a Windows network share is not a reliable file system)
make sure any write caches in your storage stack are battery-backed (or disabled)
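For the first item, a quick way to verify the settings from Python with psycopg2 (all connection parameters are placeholders):

# Minimal sketch: verify durability-related settings on a running server.
# All connection parameters below are placeholders.
import psycopg2

conn = psycopg2.connect(host="dbhost", dbname="postgres",
                        user="postgres", password="secret")
with conn.cursor() as cur:
    for setting in ("fsync", "synchronous_commit"):
        cur.execute("SHOW " + setting)  # SHOW takes no bind parameters
        print(setting, "=", cur.fetchone()[0])  # both should print 'on'
conn.close()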
I installed MongoDB on a Windows Server 2008 R2 machine where hotfix KB2731284 is not installed, and I cannot restart the server easily.
In the hotfix description, I got this message "You run an application that uses the FlushViewOfFile() function to clean up memory-mapped files from the paged memory pool." (see https://support.microsoft.com/en-us/kb/2731284)
My question is: when is the function FlushViewOfFile() called? My application just writes to a collection and reads data from it. Do I risk incorrect behavior?
I think you can run MongoDB without applying the hotfix, but I would not recommend it; over time you may run into problems. The MongoDB developers have included some fixes in the server to work around the problem.
A detailed description of the problem can be found here and here.
See also this.
On Windows, Memory Mapped File flushes are synchronous operations. When the OS Virtual Memory Manager is asked to flush a memory mapped file, it makes a synchronous write request to the file cache manager in the OS. This causes large I/O stalls on Windows systems with high Disk IO latency, while on Linux the same writes are asynchronous.
The problem becomes critical on high-latency disk drives like Azure persistent storage (around 10ms per write). This behavior results in very long bg flush times, capping disk IOPS at roughly 100 (one synchronous ~10ms write at a time). On low-latency storage (local storage and AWS) the problem is not as visible.
On Windows 7 and Windows Server 2008 R2, applying the hotfix also gives better file allocation performance, which is relevant for MongoDB.
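If you want to gauge how much flushing is costing you before scheduling the restart for the hotfix, the server reports its flush timings; a minimal pymongo sketch, assuming an mmapv1-era server that exposes the backgroundFlushing section of serverStatus (the host is a placeholder):

# Minimal sketch: read background flush timings from serverStatus.
# Applies to mmapv1-era servers; the host below is a placeholder.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
flushing = client.admin.command("serverStatus").get("backgroundFlushing", {})
print("flushes:", flushing.get("flushes"))
print("average_ms:", flushing.get("average_ms"))
print("last_ms:", flushing.get("last_ms"))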
I'm using SQL Server 2008 R2. I have a stored procedure that runs bcp via xp_cmdshell. On my laptop with a copy of the database, a job with 50,000 records is almost instant, and bcp performance is 71K rows per second.
I run exactly the same stored procedure on the server and it takes 1h 51m, with bcp performance at 7 rows per second (so roughly 10,000x slower). The query that selects the data runs in under a second on the server, BTW. This happened last week too; we restarted the SQL Server instance and it ran quickly again on the server. After about 5 days the performance got really slow again, but this time restarting the SQL instance didn't help.
My command is:
bcp "exec DBNAME.dbo.SPNAME 224,1 "
queryout "\\Server\path\OUTPUT\11111.txt" -c -t\t -Usa -P"PASSWORD" -SSQLSERVER
If I run Activity Monitor, I see my stored procedure's process and it says RUNNABLE.
The server is on a VM with 4 cores and 28GB RAM.
If I run the same bcp command from a DOS shell, I get the same result.
I'm at a loss where to look now. Anyone got any suggestions?
TIA
Mark
To answer the question of "where to look": because the task you are trying to complete involves distributed resources (I'm assuming so because you are using UNC paths), you have to look into the differences between the environments, and when comparing execution between a server and a laptop... that is just about everything:
Storage (and available storage)
CPU (and available CPU)
Network (and available bandwidth)
Memory (and available memory)
SQL Server version/updates
Maintenance schedules (of which the laptop will likely have none)
Concurrent activity (of which the laptop will likely have none)
You seem to have addressed the data itself, but can you confirm that the data and database objects are the same? Is the database on the laptop restored from the server? If not, have you manually compared tables and indexes, and could the laptop have less data?
To troubleshoot, you'll also need much more than Activity Monitor; you'll need Performance Monitor.
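As a first data point before breaking out Performance Monitor, it helps to see what the bcp session is actually waiting on; a minimal sketch using Python with pyodbc, querying sys.dm_exec_requests (driver name, server, and credentials are placeholders):

# Minimal sketch: list what active requests are waiting on.
# Driver, server, and credentials below are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=SQLSERVER;UID=sa;PWD=PASSWORD"
)
cur = conn.cursor()
cur.execute("""
    SELECT session_id, status, command, wait_type,
           wait_time, blocking_session_id
    FROM sys.dm_exec_requests
    WHERE session_id > 50  -- skip system sessions
""")
for row in cur.fetchall():
    print(row)

A recurring wait_type or a nonzero blocking_session_id here narrows things down considerably.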
This is from some time ago (not sure why things like this don't expire on here, but oh well).
Hi, one of our customers is running MongoDB v2.2.3 on 64-bit Windows Server 2008 R2 Enterprise.
We're currently seeing mmap flush times of over 20 seconds every minute.
What is confusing me is that it isn't doing any writes to the disk (disk write bytes is next to 0).
Our program, which accesses the data, has been temporarily turned off, so all that is connected is a mongo shell.
Mongostat and mongotop aren't showing anything.
The database has 130 million records. There are 356 files for mmap.
Any suggestions on what could be causing this?
Thanks
If your working set is significantly larger than memory, and MongoDB is constantly going to disk for reads (and not just the normal spikes when syncing writes to disk), then you really should be sharding to spread the data across multiple machines/instances.
Given the behaviour you have described and that you have a large number of files for mmap, I suspect the underlying performance issue is SERVER-12401 in the MongoDB Jira issue tracker:
On Windows, Memory Mapped File flushes are synchronous operations. When the OS Virtual Memory Manager is asked to flush a memory mapped file, it makes a synchronous write request to the file cache manager in the OS. This causes large I/O stalls on Windows systems with high Disk IO latency, while on Linux the same writes are asynchronous.
There are a few possible ways to improve the flush performance on Windows, including code changes in both the MongoDB server and the Windows O/S. There is some ongoing work to address these issues, now that the synchronous flushing behaviour on Windows has been confirmed.
If you are using higher latency local storage (for example, spinning disks) you may be able to mitigate the issue by upgrading to SSD or better spec'd drives.
I would suggest upvoting/watching SERVER-12401 and the related Jira issues for updates.
It would also be worth upgrading from MongoDB 2.2 to a newer version as 2.2 is now past end-of-life for updates. There have been two major production release branches since then, including significant improvements in general performance/features as well as Windows support.
I have a PHP/Apache server with 12GB of RAM. I have been running Memcached on the same machine with 6GB of allotted RAM.
I wanted to run Memcached on a separate server (same datacenter, VLAN, and subnet), just as I do for MySQL. I set up a separate, identical server with the same Memcached configuration.
I am seeing roughly 10x the page load time using Memcached on the remote server compared to what I get when running it locally. I have primed both caches and I still see the 10x load time from the remote server.
I'm having trouble troubleshooting this.
You're loading 500kb of data per pageload, in all small keys? How many keys per pageload is this?
Latency to a remote server is very low, but running many roundtrips is still a bad idea. Memcached clients support multi-get operations, where you batch many keys into a single request/response with much lower latency.
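For example, with the python-memcached client (the server address and key names are placeholders; the asker's PHP client has an equivalent such as getMulti):

# Minimal sketch: fetch many keys in one roundtrip instead of one per key.
# Assumes the python-memcached client; address and keys are placeholders.
import memcache

mc = memcache.Client(["memcached-host:11211"])

# One network roundtrip for all keys...
values = mc.get_multi(["user:1", "user:2", "page:home"])
print(values)  # dict of key -> value for the keys that were found

# ...instead of one roundtrip per key:
# values = {k: mc.get(k) for k in ["user:1", "user:2", "page:home"]}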
Just for info, DDR3-1333 is about 10667 MB/s.
If you have, let's say, Gigabit Ethernet (roughly 125 MB/s, about 1/85th of that memory bandwidth), I guess that can explain some of the problems you are experiencing...