Can MongoDB send multiple update commands at once? - mongodb

In my project I have to update documents in MongoDB many times. I found that MongoDB supports inserting many documents in a single command with insert_many, but I can't use update_many to update many documents at once because each document has a different filter condition. So I have to update them one by one.
With insert_many, I can insert more than 7000 documents per second. In the same environment, only about 1500 documents per second can be updated. It seems inefficient to send thousands of commands when one would do.
Is it possible to send multiple update commands to MongoDB server at once?
Thanks for your explanation @Blakes Seven. I have rewritten my program with the Bulk API and now update documents with an "unordered" bulk operation. Here is the speed report from my test environment:
1 thread: 12655 doc/s cpu: 150 - 200%
2 threads: 19005 doc/s cpu: 200 - 300%
3 threads: 24433 doc/s cpu: 300 - 400%
4 threads: 28957 doc/s cpu: 400 - 500%
5 threads: 35586 doc/s cpu: 500 - 600%
6 threads: 32942 doc/s cpu: 600+%
In my test environment, the test program and the MongoDB server run on the same machine, and scaling across multiple threads is not ideal there. Even with a single client thread, MongoDB's CPU usage sits between 150% and 200%, so MongoDB clearly executes the operations in parallel; there just seems to be a limit on how much parallelism a single client connection can drive.
Anyway, a single thread is enough for me, and fewer threads are more efficient per thread anyway.
Here is another report from the production environment, where the client and the server run on different machines:
1 thread: 14719 doc/s
2 threads: 26837 doc/s
3 threads: 34908 doc/s
4 threads: 46151 doc/s
5 threads: 47842 doc/s
6 threads: 52522 doc/s
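For reference, the unordered bulk approach described above looks roughly like this with pymongo's bulk_write (the successor of the Bulk API). This is only a minimal sketch; the connection string, collection name, and example data are placeholders:

from pymongo import MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
coll = client.mydb.survey_answers                  # hypothetical db/collection

# Example data: each document gets its own filter and its own update.
pending_changes = [("a1", "done"), ("a2", "failed"), ("a3", "done")]

ops = [
    UpdateOne({"_id": doc_id}, {"$set": {"status": status}})
    for doc_id, status in pending_changes
]

# ordered=False is the "unordered" mode mentioned above: the server may apply
# the operations in any order (and in parallel) and keeps going past
# individual failures instead of stopping at the first error.
result = coll.bulk_write(ops, ordered=False)
print(result.modified_count, "documents modified")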

You can do that with
db.collection.findAndModify()
Please go through the documentation:
https://docs.mongodb.com/manual/reference/method/db.collection.findAndModify/
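In pymongo, the closest equivalent of findAndModify is find_one_and_update; a minimal sketch, with the collection and field names being placeholders:

from pymongo import MongoClient, ReturnDocument

coll = MongoClient().mydb.survey_answers  # hypothetical db/collection

# Atomically find one matching document, apply the update, and return the
# document as it looks after the update.
doc = coll.find_one_and_update(
    {"status": "pending"},               # filter
    {"$set": {"status": "processing"}},  # update
    return_document=ReturnDocument.AFTER,
)
print(doc)

Note that findAndModify touches a single document per call, so for thousands of independent updates the unordered bulk approach above is still the faster route.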

Related

Slow server caused by mongodb instance

I see that MongoDB's resource usage is extremely high. It shows it is using 756% of the CPU and the load is at 4:
22527 root 20 0 0.232t 0.024t 0.023t S 756.2 19.5 240695:16 mongod
I checked the MongoDB logs and found that every query is taking more than 200ms to execute, which is causing the high resource usage and the speed issue.

Unusual sysbench results Raspberry Pi

I have 2 raspberry pi's that I wanted to benchmark for load balancing purpose.
Raspberry pi Model B v1.1 - running Raspbian Jessie
Raspberry pi Model B+ v1.2 - running Raspbian Jessie
I installed sysbench on both systems and ran sysbench --num-threads=1 --test=cpu --cpu-max-prime=10000 --validate run on the first, then changed it to --num-threads=4 on the second, as it's quad-core, and ran both.
The results are not at all what I expected (I obviously expected the multithreaded benchmark to severely outperform the single-threaded one). When I ran the command with a single thread, performance was about the same on both systems. But when I changed the number of threads to 4 on the second Pi, it still took the same amount of time, except that the per-request statistics showed the average request taking about four times as long. I can't seem to grasp why this is.
Here are the results:
Raspberry pi v1.1
Single thread
Maximum prime number checked in CPU test: 20000
Test execution summary:
total time: 1325.0229s
total number of events: 10000
total time taken by event execution: 1324.9665
per-request statistics:
min: 131.00ms
avg: 132.50ms
max: 171.58ms
approx. 95 percentile: 137.39ms
Threads fairness:
events (avg/stddev): 10000.0000/0.00
execution time (avg/stddev): 1324.9665/0.00
Raspberry pi v1.2
Four threads
Maximum prime number checked in CPU test: 20000
Test execution summary:
total time: 1321.0618s
total number of events: 10000
total time taken by event execution: 5283.8876
per-request statistics:
min: 486.45ms
avg: 528.39ms
max: 591.60ms
approx. 95 percentile: 553.98ms
Threads fairness:
events (avg/stddev): 2500.0000/0.00
execution time (avg/stddev): 1320.9719/0.03
"Raspberry pi Model B+ v1.2" has the same CPU as "Raspberry pi Model B v1.1". Both boards are from the first generation of Raspberry Pi and they have 1 core CPU.
For 4 cores you need a Raspberry Pi 2 Model B instead of the Raspberry Pi Model B+.
Yeah, the naming is a bit confusing :(
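If you want to sanity-check how many cores a board actually exposes before choosing --num-threads, a quick check from Python (assuming Python is installed, as it is on Raspbian):

import multiprocessing

# 1 on a first-generation Pi (Model B / B+), 4 on a Pi 2 Model B
print("CPU cores:", multiprocessing.cpu_count())

# With a single core, --num-threads=4 just time-slices that core, so total time
# stays the same while per-request latency grows roughly fourfold.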

Google Cloud SQL Performance

I'm surprised by the poor performance of Google Cloud SQL when I ran sysbench against it. Here is the sysbench result after setting up and running the test with these two commands:
sysbench --test=oltp --oltp-table-size=1000000 --mysql-host=173.194.225.xxx --mysql-db=test --mysql-user=root --mysql-password=MYPASSWORD prepare
sysbench --test=oltp --oltp-table-size=1000000 --mysql-host=173.194.225.xxx --mysql-db=test --mysql-user=root --mysql-password=96220751 --max-time=60 --oltp-read-only=on --max-requests=0 --num-threads=8 run
Sysbench Result:
OLTP test statistics:
queries performed:
read: 7756
write: 0
other: 1108
total: 8864
transactions: 554 (9.13 per sec.)
deadlocks: 0 (0.00 per sec.)
read/write requests: 7756 (127.83 per sec.)
other operations: 1108 (18.26 per sec.)
Test execution summary:
total time: 60.6740s
total number of events: 554
total time taken by event execution: 484.0527
per-request statistics:
min: 856.76ms
avg: 873.74ms
max: 897.26ms
approx. 95 percentile: 890.33ms
Threads fairness:
events (avg/stddev): 69.2500/0.66
execution time (avg/stddev): 60.5066/0.21
Can anyone comment on this result? I ran the test with both the D0 and D4 tiers and I'm getting very similar results. Even a sysbench test on DigitalOcean shows far better performance, as shown below:
OLTP test statistics:
queries performed:
read: 358498
write: 0
other: 51214
total: 409712
transactions: 25607 (426.73 per sec.)
deadlocks: 0 (0.00 per sec.)
read/write requests: 358498 (5974.23 per sec.)
other operations: 51214 (853.46 per sec.)
Test execution summary:
total time: 60.0074s
total number of events: 25607
total time taken by event execution: 479.9015
per-request statistics:
min: 7.50ms
avg: 18.74ms
max: 48.85ms
approx. 95 percentile: 21.88ms
Threads fairness:
events (avg/stddev): 3200.8750/5.73
execution time (avg/stddev): 59.9877/0.00

pymongo cursor getMore takes long time

I am having trouble with the time it takes for my Python script to iterate over a data set. The data set is about 40k documents, which is large enough to cause the pymongo cursor to issue multiple fetches that are internal and abstracted away from the developer. I simplified my script down as much as possible to demonstrate the problem:
from pymongo import Connection
import time

def main():
    starttime = time.time()
    cursor = db.survey_answers.find()
    counter = 0
    lastsecond = -1
    for entry in cursor:
        # print progress at most once per elapsed second
        if int(time.time() - starttime) != lastsecond:
            print "loop number:", counter, " seconds:", int(time.time() - starttime)
            lastsecond = int(time.time() - starttime)
        counter += 1
    print (time.time() - starttime), "seconds for the mongo query to get rows:", counter

connection = Connection(APPSERVER)  # either localhost or hostname depending on test
db = connection.beacon

if __name__ == "__main__":
    main()
My setup is as follows: I have four separate hosts, one APPSERVER running mongos, and three shard hosts, each of which is the primary of one replica set and a secondary of the other two.
I can run this from one of the shard servers (with the connection pointing to the APPSERVER hostname) and I get:
loop number: 0 seconds: 0
loop number: 101 seconds: 2
loop number: 7343 seconds: 5
loop number: 14666 seconds: 8
loop number: 21810 seconds: 10
loop number: 28985 seconds: 13
loop number: 36078 seconds: 15
16.0257680416 seconds for the mongo query to get rows: 41541
So it's obvious what's going on here: the first batch of a cursor request is 100 documents, and each subsequent batch is 4MB worth of data, which for me is just over 7k documents. And each fetch costs 2-3 seconds!
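(For reference, that batching can be tuned explicitly on the cursor; a minimal sketch assuming the same collection as above, though, as the update further down shows, the real culprit here turned out to be BSON decoding speed rather than batch size:)

# Ask the server for larger batches so fewer getMore round-trips are needed.
cursor = db.survey_answers.find().batch_size(10000)
for entry in cursor:
    pass  # process entry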
I thought I could fix this problem by moving my application closer to the mongos instance. I ran the above code on APPSERVER (with the connection pointing to localhost) hoping to decrease the network usage .... but it was worse!
loop number: 0 seconds: 0
loop number: 101 seconds: 9
loop number: 7343 seconds: 19
loop number: 14666 seconds: 28
loop number: 21810 seconds: 38
loop number: 28985 seconds: 47
loop number: 36078 seconds: 53
53.5974030495 seconds for the mongo query to get rows: 41541
The cursor sizes are exactly the same in both tests, which is nice, but each cursor fetch costs 9-10 seconds here!
I know I have four separate hosts that need to communicate, so this can't be instant. But I will need to iterate over collections of maybe 10m records. At 2 seconds per 7k, that would take just shy of an hour! I can't have this!
By the way, I'm new to the Python/Mongo world; I'm used to PHP and MySQL, where I would expect this to process in a fraction of a second:
$q = mysql_query("select * from big_table"); // let's say 10m rows here ....
$c = 0;
while ($r = mysql_fetch_row($q)) {
    $c++;
}
echo $c . " rows examined";
Can somebody explain the gargantuan difference between the pymongo (~1 hour) and php/mysql (<1 sec) approaches I've presented? Thanks!
I was able to figure this out with the help of A. Jesse Jiryu Davis. It turns out I didn't have the C extensions installed. I wanted to run another test without the shards so I could rule out network latency as an issue. I got a fresh, clean host, set up mongo, imported my data, and ran my script, and it took the same amount of time. So I know the sharding/replica sets had nothing to do with the problem.
Before the fix, I was able to print:
pymongo.has_c(): False
pymongo version 2.3
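(That check is just two quick calls on the pymongo module; a minimal sketch:)

import pymongo

# has_c() reports whether the C extensions (fast BSON encode/decode) are built in.
print("pymongo.has_c():", pymongo.has_c())
print("pymongo version", pymongo.version)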
I then followed the instructions to install the dependencies for the C extensions:
yum install gcc python-devel
Then I reinstalled the pymongo driver:
git clone git://github.com/mongodb/mongo-python-driver.git pymongo
cd pymongo/
python setup.py install
I reran my script and it now prints:
pymongo.has_c(): True
pymongo version 2.3+
And it takes about 1.8 seconds to run as opposed to the 16 above. That still seems long to fetch 40k records and iterate over them, but it's a significant improvement.
I will now run these updates on my prod (sharded, replica set) environment to hopefully see the same results.
**UPDATE**
I updated my pymongo driver in my prod environment and there was an improvement, though not as large: it took about 2.5-3.5 seconds over a few tests. I presume the sharded setup is the reason. That still seems incredibly slow just to iterate over 40k records.

Understanding results of mongostat

I am trying to understand the results of mongostat:
example
insert query update delete getmore command flushes mapped vsize res faults locked % idx
0 2 4 0 0 10 0 976m 2.21g 643m 0 0.1 0
0 1 0 0 0 4 0 976m 2.21g 643m 0 0 0
0 0 0 0 0 1 0 976m 2.21g 643m 0 0 0
I see
mapped - 976m
vsize - 2.21g
res - 643m
res - RAM, so ~650MB of my database is in RAM
mapped - total size of database (via memory mapped files)
vsize - ???
I'm not sure why vsize is important or what exactly it means in this context. I'm running an m1.large, so I have about 400GB of disk space + 8GB of RAM.
Can someone help me out here and explain if
1) I am on the right page
2) what stats I should monitor in production
This should give you enough information
mapped - amount of data mmaped (total data size) megabytes
vsize - virtual size of process in megabytes
res - resident size of process in megabytes
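(The same numbers are also exposed programmatically through the serverStatus command; a minimal pymongo sketch, assuming a local mongod with an MMAP-era storage engine, since "mapped" is only reported there:)

from pymongo import MongoClient

# The "mem" section of serverStatus reports resident, virtual and mapped sizes
# in megabytes, matching the res/vsize/mapped columns of mongostat.
mem = MongoClient().admin.command("serverStatus")["mem"]
print(mem.get("resident"), mem.get("virtual"), mem.get("mapped"))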
1) I am on the right page
So mongostat is not really a "live monitor". It's mostly useful for connecting to a specific server and watching for something specific (what's happening when this job runs?). But it's not really useful for tracking performance over time.
Typically, for monitoring the server, you will want to use a tool like Zabbix, Cacti or Munin, or some third-party server monitor. The MongoDB website has a list.
2) what stats I should monitor in production
You should monitor the same basic stats you would monitor on any server:
CPU
Memory
Disk IO
Network traffic
For MongoDB specifically, you will want to run db.serverStatus() and track the following (see the sketch below):
opcounters
connections
indexcounters
Note that these are increasing counters, so you'll have to create the correct "counter type" in your monitoring system (Zabbix, Cacti, etc.). A few of these monitoring programs already have MongoDB plug-ins available.
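A minimal pymongo sketch of pulling those counters, assuming a local mongod (the exact field names, such as "indexCounters", vary by server version):

from pymongo import MongoClient

status = MongoClient().admin.command("serverStatus")

# Monotonically increasing counters: feed these to your monitoring system as
# "counter"-type metrics so it graphs the per-interval rate.
print(status["opcounters"])         # insert / query / update / delete / getmore / command
print(status["connections"])        # current and available connections
print(status.get("indexCounters"))  # present on older MMAP-era servers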
Also note that MongoDB has a "free" monitoring service called MMS. I say "free" because you will be receiving calls from salespeople in exchange for setting up MMS.
Also, you can use these mini tools for watching MongoDB:
http://openmymind.net/2011/9/23/Compressed-Blobs-In-MongoDB/
By the way, I remembered this great online tool from 10gen:
https://mms.10gen.com/user/login