Apache Drill MongoDB simple query takes too long

I am trying out Apache Drill to execute a query over a MongoDB connection. Simple COUNT(1) queries are taking too long: on the order of 20 seconds per query. When I connect with any other MongoDB connector and run the same query, it takes milliseconds. I have also seen people online mention their Mongo queries taking 2 seconds. I could live with 2 seconds, but 20 is too much.
Here is the query:
select count(*) from mongo.test.contacts
Here is the Query Profile for the query.

It seems that some optimizations should be applied in your case. It would be very helpful if you created a Jira ticket [1] with details:
the DDL for the MongoDB table, the MongoDB version, and info from the log files (because it is not clear what Drill spent all this time doing).
A simple reproduction of your case can help solve this issue more quickly.
Thanks.
[1] https://issues.apache.org/jira/projects/DRILL/issues/
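In the meantime, a quick diagnostic, offered as a sketch rather than a confirmed remedy: Drill can print its query plan, which shows whether the COUNT is pushed down to MongoDB or whether Drill fetches and counts every document itself (the mongo storage plugin name matches the query above):
explain plan for select count(*) from mongo.test.contacts;
If the physical plan shows a full scan of the collection instead of a pushed-down count, that could account for the 20-second runtime.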

Related

How to disable MongoDB aggregation timeout

I want to run an aggregation on my large data set (about 361K documents) and insert the results into another collection.
I get this error:
I tried to increase the max time, but it has a maximum and it's not enough for my data set. I found https://docs.mongodb.com/manual/reference/method/cursor.noCursorTimeout/ but it seems noCursorTimeout applies only to find, not to aggregation.
Please tell me how I can disable the cursor timeout, or suggest another way to do this.
I am no MongoDB expert, but I will interpret what I know.
MongoDB aggregation cursors don't have a mechanism to adjust the batch size or set cursor timeouts.
Therefore there is no direct way to alter this, and the timeout of an aggregation query depends solely on the cursorTimeoutMillis parameter of the mongod or mongos instance. Its default value is 10 minutes.
Your only option is to change this value with the command below.
use admin
// raise the idle-cursor timeout from the default 10 minutes to 30 minutes
db.runCommand({ setParameter: 1, cursorTimeoutMillis: 1800000 })
However, I strongly advise against using this command, because it overrides a safety mechanism built into MongoDB. MongoDB automatically closes cursors that have been idle for more than 10 minutes so that there is less load on the server. If you change this parameter (say, to 30 minutes), MongoDB will let idle cursors linger in the background for those 30 minutes, which will not only make new queries slower to execute but also increase load and memory usage on the MongoDB side.
You have a couple of workarounds: reduce the number of documents if working in MongoDB Compass, or copy and run the commands in the Mongo shell (I have had success with this method so far).
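Since the goal is to insert the results into another collection, another option is to make the write happen server-side, so that no client cursor stays open to time out. A minimal sketch, assuming a hypothetical source collection named events and a placeholder pipeline; substitute your own stages:
// run the pipeline on the server and write the output to another
// collection; the client never holds a long-lived cursor
db.events.aggregate(
  [
    { $match: { status: "active" } },  // placeholder: your real stages go here
    { $out: "events_processed" }       // final stage writes to the target collection
  ],
  { allowDiskUse: true }               // let memory-hungry stages spill to disk
)
With $out as the final stage, the aggregation returns no documents to the client, so cursorTimeoutMillis never comes into play.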

How to intercept MongoDB Query from Presto Connector

I have written a number of Presto queries that pull from MongoDB collections, but others on our project query Mongo directly. These folks would like to use my queries to save themselves the time of rewriting them.
Is there a way to obtain/extract the MongoDB query language generated by Presto?
I didn't see anything in the MongoDB connector documentation that would indicate how or whether this is possible.
I am aware of the SQL-to-Mongo converters out there, but Presto SQL extends standard SQL to enable things like unwrapping arrays that we encounter with non-relational stores, and in my experience these converters have trouble with such things.
You can set the MongoDB driver log level to DEBUG in log.properties:
org.mongodb=DEBUG
However, it will print many unrelated logs (e.g. health checks). I filed an issue: https://github.com/prestosql/presto/issues/5600
I guess the easiest way is to look into MongoDB while the query is running and get it from there, for example via the profiler:
db.setProfilingLevel(2)            // 2 = profile every operation
db.system.profile.find().pretty()  // inspect what was captured
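On a busy instance the raw profile output is noisy, so it can help to narrow it down; op and ts are standard fields of system.profile documents:
// show the five most recent read operations captured by the profiler
db.system.profile.find({ op: "query" }).sort({ ts: -1 }).limit(5).pretty()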
You may also use some GUIs like MongoVue or Robo 3T - I used MongoVue in the past to evaluate running queries.

How do I query a MongoDB in Apache NiFi based on a dynamic value?

I'm trying to run a GetMongo processor in Apache NiFi. I can get a base query to run just fine and output the records to my hard drive (just for initial testing; it will go to a Hadoop client eventually).
My problem is that I want to run the query every 10 minutes and return ONLY the new records from the last 10 minutes. The query I have tested on my local Mongo client is:
{"createdAt": {$gte: new Date(ISODate().getTime() - 1000 * 60 * 5)}}
At first, I thought it didn't like the dynamic part, so I tried putting in a static timestamp, but NiFi has told me that every single query I have tried is invalid.
There are some decent guides out there, but they are very specific to the SQL processors in NiFi, so I'm wondering if anyone has experience creating a flow based on dynamic queries with Mongo in NiFi. Thanks so much in advance.
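One hedged observation that may explain the "invalid query" errors: GetMongo parses the Query property as JSON, not as mongo shell JavaScript, so shell helpers such as new Date() and ISODate() are not valid there. A static timestamp would need MongoDB Extended JSON form, something like this (the date value is illustrative):
{ "createdAt": { "$gte": { "$date": "2020-06-01T00:00:00Z" } } }
For the dynamic ten-minute window, newer NiFi releases allow Expression Language in the Query property, so the cutoff could in principle be computed with an expression such as ${now():toNumber():minus(600000)}; both points depend on the NiFi and driver versions in use, so treat them as assumptions to verify.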

Alteryx MongoDB Output Performance

I'm trying to write about 1 million rows to my Mongo collection, but it's taking too much time (actually, it never ends).
Looking at the Mongo log, I can see that the insert queries are issued one at a time; there is no bulk operation.
Does Alteryx support bulk insert for Mongo?
I'm using Alteryx 10.1 and MongoDB 3.4
The answer at this moment is no, Alteryx doesn't support bulk insert for Mongo.
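For contrast, here is what a bulk load looks like when issued directly from the Mongo shell, outside Alteryx; the collection name and documents are illustrative. Batching rows into a single insertMany call is exactly what the one-insert-per-row pattern in the log is missing:
// build a batch of documents client-side...
var batch = [];
for (var i = 0; i < 1000; i++) {
  batch.push({ rowId: i, value: "example-" + i });
}
// ...then insert them in one round trip; ordered:false lets the
// server keep going past individual document failures
db.contacts.insertMany(batch, { ordered: false })
A common workaround is therefore to export the data from Alteryx and load it with mongoimport, which does batch its inserts.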

Track MongoDB performance?

Is there a way to track query performance in MongoDB, especially for testing indexes or subdocuments?
In SQL you can run queries and see the execution time and other analytic metrics.
I have a huge MongoDB collection and want to try different variations and indexes, but I'm not sure how to test this; it would be nice to see how long it took to find a record. (I am new to MongoDB.) Thanks.
There are two things here that you'll likely be familiar with.
Explain plans
Slow Logs
Explain Plans
Here are some basic docs on explain. Running explain is as simple as db.foo.find(query).explain(). (Note that this actually runs the query, so if your query is slow, this will be too.)
To understand the output, you'll want to check some of the docs on the slow logs below. You're basically given details about how much of the index was scanned, how many documents were found, etc. As is the case with such performance details, interpretation is really up to you; read the docs above and below to point you in the right direction.
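As a concrete illustration against a hypothetical contacts collection, the executionStats verbosity reports the counters that matter most when testing indexes:
// runs the query and reports execution counters
db.contacts.find({ email: "a@example.com" }).explain("executionStats")
// key fields in the result:
//   executionStats.executionTimeMillis  - how long the query took
//   executionStats.totalKeysExamined    - index entries scanned
//   executionStats.totalDocsExamined    - documents scanned
//   executionStats.nReturned            - documents returned
A well-indexed query keeps totalDocsExamined close to nReturned; a full collection scan shows totalDocsExamined near the collection size.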
Slow Logs
By default, slow logs are active with a threshold of 100ms. Here's a link to the full documentation on profiling. A couple of key points to get you started:
Get/Set profiling:
db.setProfilingLevel(2); // 0 => none, 1 => slow, 2 => all
db.getProfilingLevel();
See slow queries:
db.system.profile.find()
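Once profiling has been running for a while, the profile collection can be filtered; millis and ts are standard fields of its documents:
// operations slower than 100 ms, most recent first
db.system.profile.find({ millis: { $gt: 100 } }).sort({ ts: -1 }).pretty()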
MongoDB has a query profiler you can turn on.
See: http://www.mongodb.org/display/DOCS/Database+Profiler
Based on my experience, I recommend reading the mongod logs with the help of mtools (https://github.com/rueckstiess/mtools); it has lots of features that make the output easier to read.
For instance, I found the mlogfilter command very useful, e.g.:
mlogfilter.exe [PATH_TO_FILE_LOG] --slow --json | mongoimport --db logsTest -c logCollection --drop
Running this command records all the slow queries in a collection.
Activating the MongoDB profiler can be a problem for performance in a production environment, so you can achieve the same results by reading the logs.
Consider also that the system.profile collection is capped, so you should check the size of the capped collection; otherwise you may not find the queries you're looking for.
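If the default size proves too small, the documented procedure for enlarging system.profile is to disable profiling, recreate the capped collection, and re-enable profiling; the 4 MB size below is just an example:
// run with profiling off; size is in bytes
db.setProfilingLevel(0)
db.system.profile.drop()
db.createCollection("system.profile", { capped: true, size: 4 * 1024 * 1024 })
db.setProfilingLevel(1)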