pymongo cursor getMore takes long time - mongodb

I am having trouble with the time it takes for my python script to iterate a data set. The data set is about 40k documents. This is large enough to cause the pymongo cursor to issue multiple fetches which are internal and abstracted away from the developer. I simplified my script down as much as possible to demonstrate the problem:
from pymongo import Connection
import time

def main():
    starttime = time.time()
    cursor = db.survey_answers.find()
    counter = 0
    lastsecond = -1
    for entry in cursor:
        if int(time.time() - starttime) != lastsecond:
            print "loop number:", counter, " seconds:", int(time.time() - starttime)
            lastsecond = int(time.time() - starttime)
        counter += 1
    print (time.time() - starttime), "seconds for the mongo query to get rows:", counter

connection = Connection(APPSERVER)  # either localhost or hostname depending on test
db = connection.beacon

if __name__ == "__main__":
    main()
My setup is as follows: I have 4 separate hosts, one APPSERVER running mongos, and 3 shard hosts, each hosting the primary of its own replica set and secondaries for the other two.
I can run this from one of the shard servers (with the connection pointing to the APPSERVER hostname) and I get:
loop number: 0 seconds: 0
loop number: 101 seconds: 2
loop number: 7343 seconds: 5
loop number: 14666 seconds: 8
loop number: 21810 seconds: 10
loop number: 28985 seconds: 13
loop number: 36078 seconds: 15
16.0257680416 seconds for the mongo query to get rows: 41541
So it's obvious what's going on here: the first batch of a cursor request is 100 documents, and each subsequent batch is 4MB worth of data, which works out to just over 7k documents for me. And each fetch costs 2-3 seconds!
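If the per-batch round trip is what hurts, one thing worth trying (a minimal sketch of my own, not from the original script; the 10000 value is an arbitrary example) is asking the server for larger batches with the standard Cursor.batch_size() method, so fewer getMore round trips are needed:
cursor = db.survey_answers.find().batch_size(10000)  # 10000 is an arbitrary example value
for entry in cursor:
    pass  # iterate as before; each getMore now returns up to 10000 documents, still capped by the server's maximum batch size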
I thought I could fix this problem by moving my application closer to the mongos instance. I ran the above code on APPSERVER (with the connection pointing to localhost) hoping to decrease the network usage .... but it was worse!
loop number: 0 seconds: 0
loop number: 101 seconds: 9
loop number: 7343 seconds: 19
loop number: 14666 seconds: 28
loop number: 21810 seconds: 38
loop number: 28985 seconds: 47
loop number: 36078 seconds: 53
53.5974030495 seconds for the mongo query to get rows: 41541
The cursor sizes are exactly the same in both tests, which is nice, but each cursor fetch costs 9-10 seconds here!
I know I have four separate hosts that need to communicate, so this can't be instant. But I will need to iterate over collections of maybe 10m records. At 2 seconds per 7k, that would take just shy of an hour! I can't have this!
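For reference, a quick back-of-envelope calculation (my own arithmetic, using the batch size and the 2-3 second fetch cost observed above):
docs = 10 * 1000 * 1000   # records I will need to iterate over
per_batch = 7000          # documents returned per getMore, as observed above
secs_per_batch = 2.5      # middle of the observed 2-3 second fetch cost
print docs / per_batch * secs_per_batch / 60.0, "minutes"   # roughly an hour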
By the way, I'm new to the Python/Mongo world; I'm used to PHP and MySQL, where I would expect something like this to process in a fraction of a second:
$q = mysql_query("select * from big_table"); // let's say 10m rows here ....
$c = 0;
while ($r = mysql_fetch_row($q)) {
    $c++;
}
echo $c." rows examined";
Can somebody explain the gargantuan difference between the pymongo (~1 hour) and php/mysql (<1 sec) approaches I've presented? Thanks!

I was able to figure this out with the help of A. Jesse Jiryu Davis. It turns out I didn't have the C extensions installed. I wanted to run another test without the shards so I could rule out network latency as the issue. I got a fresh, clean host, set up mongo, imported my data, and ran my script, and it took the same amount of time. So I know the sharding/replica sets had nothing to do with the problem.
Before the fix, I was able to print:
pymongo.has_c(): False
pymongo version 2.3
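For anyone who wants to reproduce the check, this is roughly how those lines were printed (a trivial snippet of my own, not part of the timing script):
import pymongo
print "pymongo.has_c():", pymongo.has_c()
print "pymongo version", pymongo.version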
I then followed the instructions to install the dependencies for the C extensions:
yum install gcc python-devel
Then I reinstalled the pymongo driver:
git clone git://github.com/mongodb/mongo-python-driver.git pymongo
cd pymongo/
python setup.py install
I reran my script and it now prints:
pymongo.has_c(): True
pymongo version 2.3+
And it takes about 1.8 seconds to run as opposed to the 16 above. That still seems long to fetch 40k records and iterate over them, but it's a significant improvement.
I will now run these updates on my prod (sharded, replica set) environment to hopefully see the same results.
**UPDATE**
I updated my pymongo driver on my prod environment and there was an improvement, though not as dramatic. It took about 2.5-3.5 seconds over a few tests. I presume the sharded setup accounts for the difference. That still seems incredibly slow just to iterate over 40k records.

Related

Java multiple processes write to same folder performance issue

We have 4 Java instances, each with 20 processes writing to folders on Linux (some shared, some not), and we have performance issues when instances write to the same folder. Below is our configuration:
Java Instance 1 - 20 processes writing to FOLDER 1, write throughput about 200000 per hour
Java Instance 2 - 20 processes writing to FOLDER 1, write throughput about 200000 per hour
Java Instance 3 - 20 processes writing to FOLDER 2, write throughput about 400000 per hour
Java Instance 4 - 20 processes writing to FOLDER 4, write throughput about 400000 per hour
The 3rd and 4th instances have double the throughput of Instance 1 and Instance 2.
OS - REDHAT LINUX
CPU - 150+
Memory - 200+
Please advise whether writing to the same folder on a Linux file system causes performance degradation, since Linux treats a folder like a file.
Below is our Java I/O code for reference.
File outFile = new File(outputFilePathTmp);
BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(outFile));
// Marshal the business document into the temp file
marshaller.marshal(baseDocument.getBusinessDocument(), bufferedWriter);
String outputFilePathXml = regroupedXmlPath + File.separator + outputFileName.substring(0, indx) + "- Regrouped.xml";
// Make the output file readable/writable/executable by everyone
outFile.setReadable(true, false);
outFile.setWritable(true, false);
outFile.setExecutable(true, false);
// Record the file size in KB (note: read before the writer is closed)
long outFileLength = outFile.length() / 1024;
baseDocument.setFileSize(outFileLength);
bufferedWriter.close();

Randomly Login Timeout Expired errors from SQL 2000 DTS against SQL2008R2 databases

I have some jobs running on SQL Server 2000 which call stored procedures or queries against remote SQL Servers (different editions).
Each job calls a DTS, and it is the DTS that makes the remote connection and executes the stored procedure or gets query results from the remote server.
This has been working without errors for years. I don't know why, but during the last month I've been getting random errors on these kinds of jobs. I've read some other posts and it seems to be related to a security issue, but I repeat: most of the time the jobs work, and only some runs fail with the error:
Executed as user: SERVER\user. DTSRun: Loading... DTSRun: Executing...
DTSRun OnStart: DTSStep_DTSDynamicPropertiesTask_2 DTSRun OnError:
DTSStep_DTSDynamicPropertiesTask_2, Error = -2147467259 (80004005)
Error string: Login timeout expired Error source: Microsoft OLE DB
Provider for SQL Server Help file: Help context: 0
Error Detail Records: Error: -2147467259 (80004005); Provider Error: 0 (0)
Error string: Login timeout expired Error source: Microsoft OLE DB Provider for SQL Server
Help file: Help context: 0 DTSRun OnFinish:
DTSStep_DTSDynamicPropertiesTask_2 DTSRun: Package execution complete.
Process Exit Code 1. The step failed.
I really don't know what to check. After rebooting the server the problems are still there. Any help from you guys would be appreciated.
EDIT 2019-02-14 16:15 -------------------------------------------------------------------------------------------
One of the solutions I found was to change the Remote Login Timeout property from the default 20 seconds to 30 seconds, or to 0 (zero means no timeout), by executing the following code:
sp_configure 'remote login timeout', 30 --Or 0 seconds for infinite
go
reconfigure with override
go
From: https://support.microsoft.com/es-es/help/314530/error-message-when-you-execute-a-linked-server-query-in-sql-server-tim
I've tried this solution, changing it to 30 seconds, but with the same result. Of course I didn't set it to 0, for obvious reasons; the timeouts are there for something. I also tried 300 seconds (5 minutes to make a login!) and still the same.
EDIT 2019-02-25 11:25 -------------------------------------------------------------------------------------------
Very similar to my problem, still not solved...
https://www.sqlservercentral.com/Forums/Topic1727739-391-1.aspx
For the moment I have a temporary solution, which is to increase the Connect Timeout on the Connection object.
It was blank (probably using its default value).
Since I changed this property (Connection Object > Advanced... > Connect Timeout) to 300 I'm not having the problems on these DTS packages. I left 2 DTS packages without the change to make sure I still had the problem, and those are the only ones that keep failing. The changed ones are working fine now.

Slow server caused by mongodb instance

I see that MongoDB usage is extremely high. It shows it is using 756% of the CPU and the load is at 4:
22527 root 20 0 0.232t 0.024t 0.023t S 756.2 19.5 240695:16 mongod
I checked the MongoDB logs and found that every query is taking more than 200 ms to execute, which is causing the high resource usage and the slowness.
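To pin down which operations are slow, one option (a rough sketch of my own, assuming a reasonably recent pymongo and a database name that is only a placeholder) is to turn on the database profiler so operations slower than 200 ms get recorded in system.profile:
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
db = client.mydb                        # "mydb" is a placeholder for the actual database name
db.command("profile", 1, slowms=200)    # level 1 = record only operations slower than slowms
for op in db.system.profile.find().sort("millis", -1).limit(5):
    print op["op"], op.get("ns"), op["millis"], "ms"   # the five slowest recorded operations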

pgbouncer free_servers - how to increase them

The current settings of a pgbouncer server are the following, and what I don't understand is the 'free_servers' value given by the show lists command when connecting to pgbouncer. Is it a (soft or hard) limit on the number of connections to the PostgreSQL databases used with this instance of pgbouncer?
configuration :
max_client_conn = 2048
default_pool_size = 1024
min_pool_size = 10
reserve_pool_size = 500
reserve_pool_timeout = 1
server_idle_timeout = 600
listen_backlog = 1024
show lists gives :
pgbouncer=# show lists ;
list | items
---------------+--------
databases | 6
pools | 3
free_clients | 185
used_clients | 15
free_servers | 70
used_servers | 30
It seems that there is a limit of 30 + 70 = 100 servers, but I couldn't find it even after reviewing the configuration values with show config, and the documentation doesn't say which setting to change to increase free_servers.
pgbouncer version : 1.7.2
EDIT :
I've just discovered that, for a pool of 6 web servers configured to hit the same PG database, 3 of them can have 200 backend connections (server connections), and 3 of them can only make and maintain 100 connections (as described in the first part). But the configuration is exactly the same in the pgbouncer configuration file, and the servers are cloned VMs. The version of pgbouncer is also the same.
So far, I still haven't found any documentation explaining where this limitation comes from...
This data is just some internal information for PgBouncer.
Server information is stored in a list data structure that is pre-allocated up to a certain size, in this case 100 slots. used_servers = 30, free_servers = 70 means there are 30 slots currently in use and 70 slots free. PgBouncer will automatically grow the list when it's full, hence there's no configuration setting for this.
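If you want to watch those counters over time instead of checking by hand, something like the following works against the admin console (a sketch under my own assumptions: psycopg2 installed, pgbouncer listening on port 6432, and a user listed in admin_users):
import psycopg2

# host/port/user here are assumptions; adjust to your pgbouncer setup
conn = psycopg2.connect(host="127.0.0.1", port=6432, user="pgbouncer", dbname="pgbouncer")
conn.autocommit = True   # the pgbouncer admin console does not support transactions
cur = conn.cursor()
cur.execute("SHOW LISTS")
for name, items in cur.fetchall():
    print name, items     # e.g. free_servers 70, used_servers 30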

mongodb higher faults on Windows than on Linux

I am executing the C# code below:
for (; ; )
{
    Console.WriteLine("Doc# {0}", ctr++);
    // Build a small log document and insert it in a tight loop
    BsonDocument log = new BsonDocument();
    log["type"] = "auth";
    BsonDateTime time = new BsonDateTime(DateTime.Now);
    log["when"] = time;
    log["user"] = "staticString";
    BsonBoolean bol = BsonBoolean.False;
    log["res"] = bol;
    coll.Insert(log);
}
When I run it against a MongoDB instance (version 2.0.2) running on a virtual 64-bit Linux machine with just 512 MB of RAM, I get about 5k inserts with 1-2 faults as reported by mongostat after a few minutes.
When the same code is run against a MongoDB instance (version 2.0.2) running on a physical Windows machine with 8 GB of RAM, I get 2.5k inserts with about 80 faults as reported by mongostat after a few minutes.
Why are more faults occurring on Windows? I can see the following message in the logs:
[DataFileSync] FlushViewOfFile failed 33 file
Journaling is disabled on both instances.
Also, are 5k inserts on a virtual machine with 1-2 faults a good enough speed, or should I be expecting better insert rates?
Looks like this is a known issue - https://jira.mongodb.org/browse/SERVER-1163
The page fault counter on Windows is in fact the total number of page faults, which includes both hard and soft page faults.
Process: Page Faults/sec. This is an indication of the number of page faults that occurred due to requests from this particular process. Excessive page faults from a particular process are an indication usually of bad coding practices. Either the functions and DLLs are not organized correctly, or the data set that the application is using is being called in a less than efficient manner.