QueryDSL with Spring Data repositories has a performance impact on MongoDB - spring-data

I am using QueryDSL with Spring Data repositories. A search over 300,000 documents takes around 9 seconds. The search parameters are not indexed in MongoDB.
If I use a plain Mongo query instead, it takes milliseconds.
I have checked that predicate construction takes only milliseconds; the time is spent executing the query and retrieving the results.
Is there anywhere the performance characteristics of QueryDSL are documented?
Thanks
Ravi

Related

Slow Queries on MongoDB 5.0 Native Time-Series Collection

I was using MongoDB to store time-series data at 1 Hz. To support this, my central document represented one hour of data per device per user, with 3600 values pre-allocated when the document was created. This design has two drawbacks:
1. Every insert is an update. I need to query for the correct record (by user, by device, by day, by hour), append the latest IoT reading to the list, and update the record.
2. Paged queries require complex custom code. I need to query for a count of all the records matching my search range, then manually build each page of data to be returned.
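A minimal sketch of the bookkeeping those two drawbacks imply, assuming hypothetical field names (`user`, `device`, `day`, `hour`, `values`), UTC hourly buckets, and that each reading is written into the slot for its second of the hour. It shows the bucket filter and update that every insert needs, and how one page of the flattened reading stream maps onto slices of consecutive buckets. `page_runs` assumes one bucket per hour with no gaps across the search range, which real data may not satisfy.

```python
from datetime import datetime, timezone

SLOTS_PER_BUCKET = 3600  # one pre-allocated slot per second of the hour


def bucket_filter(user_id: str, device_id: str, ts: datetime) -> dict:
    """Filter that finds the hourly bucket document holding the reading at `ts`."""
    ts = ts.astimezone(timezone.utc)
    return {"user": user_id, "device": device_id,
            "day": ts.strftime("%Y-%m-%d"), "hour": ts.hour}


def slot_index(ts: datetime) -> int:
    """Second-of-hour position of a 1 Hz reading inside its bucket's array."""
    ts = ts.astimezone(timezone.utc)
    return ts.minute * 60 + ts.second


def reading_update(ts: datetime, value: float) -> dict:
    """Update document that writes one reading into its pre-allocated slot."""
    return {"$set": {f"values.{slot_index(ts)}": value}}


def page_runs(page: int, page_size: int) -> list:
    """Split page `page` of the flattened reading stream into per-bucket slices.

    Returns (bucket_offset, first_slot, end_slot) tuples, where bucket_offset
    counts hourly buckets from the start of the search range and end_slot
    is exclusive.
    """
    pos = page * page_size
    end = pos + page_size
    runs = []
    while pos < end:
        bucket, first = divmod(pos, SLOTS_PER_BUCKET)
        last = min(SLOTS_PER_BUCKET, first + (end - pos))
        runs.append((bucket, first, last))
        pos += last - first
    return runs
```

With PyMongo, a reading would be stored with something like `collection.update_one(bucket_filter(user, device, ts), reading_update(ts, value))`, and each page then needs one fetch per slice returned by `page_runs`: the kind of custom paging code described above.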
I was hoping the MongoDB native time-series collections introduced in 5.0 would give me some performance improvements, and they did, but only on ingestion rate. For the comparison, I pushed 108,000 records into the database as quickly as possible and averaged the response time, then ran paged queries and recorded the range of response times. This is what I observed:
Mongo custom code solution: 30 millisecond inserts, 10-20 millisecond paged queries.
Mongo native time-series: 138 microsecond inserts, 50-90 millisecond paged queries.
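The measurement approach described above (average latency for inserts, min/max range for paged queries) can be reproduced with a small harness like the one below. It is a sketch: `measure` is a hypothetical helper, and the operation passed in would be whatever insert or paged query is under test. It times each call with `time.perf_counter` and reports milliseconds.

```python
import statistics
import time
from typing import Callable


def measure(op: Callable[[], object], runs: int) -> dict:
    """Call `op` `runs` times and summarize wall-clock latency in milliseconds."""
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        op()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples_ms),  # used for the insert figure
        "min_ms": min(samples_ms),               # lower end of the query range
        "max_ms": max(samples_ms),               # upper end of the query range
    }
```

For example, `measure(lambda: insert_one_reading(), 108_000)`, where `insert_one_reading` is hypothetical. For scale, the reported figures give 30 ms / 138 µs ≈ 217x faster inserts with native time-series, while its paged queries (50-90 ms versus 10-20 ms) are about 4.5x to 5x slower at the ends of the reported ranges.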
The difference in insert rate was expected, see #1 above. But for paged queries I didn't expect my custom time-series kluge implementation to be significantly faster than a native implementation. I don't really see a discussion of the advantages to be expected in the Mongo 5.0 documentation. Should I expect to see improvements in this regard?

MongoDB - A very simple query has a high response time

We have an issue with MongoDB query performance. One collection, "deals", has about 30 properties and close to 2,000 documents. It is shown below as an example, but we see a similar issue with almost all the collections in the database. Note that I have created indexes, including compound indexes based on the query use cases, and no COLLSCAN appears for any of the queries. I have tried this with MongoDB 3.6 as well as 4.x, and the result is the same.
db.getCollection("deals").find({isActive:true, branch: ObjectId("5af1de276fcd080007ed79fd")})
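Since no COLLSCAN appears, a next step is to look at how much work the winning plan did and how long the server spent executing it. The sketch below summarizes the output of `db.getCollection("deals").find({...}).explain("executionStats")`, assuming the classic query-engine layout used by 3.6 and 4.x (`queryPlanner.winningPlan` with nested `inputStage`/`inputStages`, and `executionStats` with `nReturned`, `totalKeysExamined`, `totalDocsExamined`, `executionTimeMillis`). `summarize_explain` is a hypothetical helper, not part of any driver.

```python
def plan_stages(plan: dict) -> list:
    """Collect stage names from a classic-engine winningPlan tree, top-down."""
    stages = [plan.get("stage", "?")]
    children = []
    if "inputStage" in plan:
        children.append(plan["inputStage"])
    children.extend(plan.get("inputStages", []))
    for child in children:
        stages.extend(plan_stages(child))
    return stages


def summarize_explain(explain: dict) -> dict:
    """Reduce explain("executionStats") output to a few efficiency numbers."""
    stats = explain["executionStats"]
    stages = plan_stages(explain["queryPlanner"]["winningPlan"])
    returned = max(stats["nReturned"], 1)  # avoid dividing by zero
    return {
        "stages": stages,
        "collscan": "COLLSCAN" in stages,
        "keys_per_doc_returned": stats["totalKeysExamined"] / returned,
        "docs_per_doc_returned": stats["totalDocsExamined"] / returned,
        "time_ms": stats["executionTimeMillis"],
    }
```

If keys and documents examined per document returned are close to 1 and `time_ms` is small, the index is doing its job, and the response time is more likely spent outside query execution (for example network transfer or client-side deserialization of ~30-property documents). A large ratio suggests the compound index does not match the filter well.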

Apache Drill Mongo DB simple query takes too long

I am trying out Apache Drill to execute a query over a Mongo connection. Simple COUNT(1) queries take too long, on the order of 20 seconds per query. When I connect using any other Mongo connector and run the same query, it takes milliseconds. I have also seen people online reporting Mongo queries taking 2 seconds. I can live with 2 seconds, but 20 is too much.
Here is the query:
select count(*) from mongo.test.contacts
Here is the Query Profile for the query.
It seems that some optimizations are needed for your case. It would be very helpful if you created a Jira ticket [1] with the following details:
DDL for the MongoDB table, the MongoDB version, and information from the log files (because it is not clear what Drill was doing all this time).
A simple reproduction of your case can help solve this issue more quickly.
Thanks.
[1] https://issues.apache.org/jira/projects/DRILL/issues/

MongoDB: is aggregation discouraged in favor of simple queries?

We are running a MongoDB instance for some of our price data, and I would like to find the most recent price update for each product that I have in the database.
Coming from a SQL background, my initial thought was to write a query with a subquery, where the subquery is a GROUP BY query: price updates are grouped by product, and then the most recent update can be found for each product.
I talked to a colleague about this approach, and he claimed that the official MongoDB training material says one should prefer simple queries over aggregations. That is, he would run one query per product and find the most recent price update by ordering the results by update date, so the number of queries grows linearly with the number of products.
I agree that such a query is simpler to write than an aggregation, but I would have thought that, performance-wise, it would be faster to go through the collection once and find the results, i.e. the number of queries stays constant regardless of the number of products.
He also claims that MongoDB is better able to optimize simple queries when running in a cluster.
Does anybody know if that is the case?
I searched the internet and could not find any claim that one should prefer simple queries over aggregations.
Another colleague of mine thought that, since MongoDB is a relatively new technology, aggregation queries may not yet have been optimized for clustered MongoDB instances.
Can anybody shed some light on these matters?
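A sketch of both approaches as PyMongo-style documents, assuming hypothetical field names `product`, `price` and `updatedAt`. The pipeline sorts newest-first within each product so that `$first` in `$group` picks each product's latest update in a single pass; `per_product_queries` builds the N separate `find(...).sort(...).limit(1)` requests the colleague describes. `latest_price_in_memory` is a plain-Python reference for what the pipeline should return, useful for checking expectations on sample data.

```python
# Single pass: sort newest-first within each product, then keep the first doc per group.
LATEST_PRICE_PIPELINE = [
    {"$sort": {"product": 1, "updatedAt": -1}},
    {"$group": {
        "_id": "$product",
        "price": {"$first": "$price"},          # price of the newest update
        "updatedAt": {"$first": "$updatedAt"},  # its timestamp
    }},
]


def per_product_queries(products):
    """The per-product alternative: one find/sort/limit(1) request per product."""
    return [
        {"filter": {"product": p}, "sort": [("updatedAt", -1)], "limit": 1}
        for p in products
    ]


def latest_price_in_memory(docs):
    """Plain-Python equivalent of the pipeline: latest price per product."""
    latest = {}
    for doc in docs:
        current = latest.get(doc["product"])
        if current is None or doc["updatedAt"] > current["updatedAt"]:
            latest[doc["product"]] = doc
    return {product: doc["price"] for product, doc in latest.items()}
```

With PyMongo these would run as `collection.aggregate(LATEST_PRICE_PIPELINE)` and, per product, `collection.find(q["filter"]).sort(q["sort"]).limit(q["limit"])`. The pipeline is one round trip regardless of the number of products; the per-product approach costs one round trip per product, though each can be cheap with a suitable index. Which is faster depends on the product count and network latency, so measuring both is the reliable answer.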
Thanks in advance
Here is some information on the aggregation pipeline with sharded MongoDB collections:
Aggregation Pipeline and Sharded Collections
Assuming you have the right indexes in place on your collections, you shouldn't have any problems using MongoDB aggregation.
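As an illustration of what "the right indexes" can mean for a sort-then-group query like the latest-price one in this question: MongoDB can use an index to satisfy a sort when the sort keys are a prefix of the index keys, with directions either all matching or all reversed. The helper below checks that simplified rule. It is a hypothetical sketch that ignores the case where equality conditions on leading index fields let a non-prefix sort use the index; `PRICE_INDEX` and its field names are assumptions.

```python
# Hypothetical compound index: product ascending, then update time descending.
PRICE_INDEX = [("product", 1), ("updatedAt", -1)]


def index_supports_sort(index_keys, sort_keys):
    """True if sort_keys is a prefix of index_keys with all directions
    equal or all reversed (simplified MongoDB sort/index rule)."""
    if len(sort_keys) > len(index_keys):
        return False
    same = reverse = True
    for (s_field, s_dir), (i_field, i_dir) in zip(sort_keys, index_keys):
        if s_field != i_field:
            return False
        same &= s_dir == i_dir
        reverse &= s_dir == -i_dir
    return same or reverse
```

In PyMongo the index would be created with `collection.create_index(PRICE_INDEX)`.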

(Real time) Small data aggregation MongoDB: triggers?

What is a reliable and efficient way to aggregate small data in MongoDB?
Currently, my data that needs to be aggregated is under 1 GB, but can go as high as 10 GB. I'm looking for a real-time or near-real-time strategy (aggregation every 15 minutes).
It seems like Map/Reduce, Hadoop, Storm and the like are all overkill. I know that triggers don't exist, but I found this one post that may be ideal for my situation. Is creating a trigger in MongoDB an ideal solution for real-time aggregation of small data?
MongoDB has two built-in options for aggregating data - the aggregation framework and map-reduce.
The aggregation framework is faster (executing as native C++ code as opposed to a JavaScript map-reduce job) but more limited in the sorts of aggregations that are supported. Map-reduce is very versatile and can support very complex aggregations but is slower than the aggregation framework and can be more difficult to code.
Either of these would be a good option for near real time aggregation.
One further consideration to take into account is that as of the 2.4 release the aggregation framework returns a single document containing its results and is therefore limited to returning 16MB of data. In contrast, MongoDB map-reduce jobs have no such limitation and may output directly to a collection. In the upcoming 2.6 release of MongoDB, the aggregation framework will also gain the ability to output directly to a collection, using the new $out operator.
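A sketch of a 15-minute aggregation run using the aggregation framework with `$out` (available from 2.6 onward, as noted above), so that results go to a collection rather than a single result document. Field names (`ts`, `device`, `value`) and the target collection are hypothetical. `$out` must be the last stage and replaces the target collection's contents on each run, so this version writes the totals for one window at a time.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=15)


def window_start(now: datetime) -> datetime:
    """Start of the 15-minute window containing `now`, in UTC."""
    now = now.astimezone(timezone.utc)
    return now.replace(minute=now.minute - now.minute % 15, second=0, microsecond=0)


def last_complete_window(now: datetime) -> tuple:
    """(start, end) of the most recent fully elapsed 15-minute window."""
    end = window_start(now)
    return end - WINDOW, end


def aggregation_with_out(since: datetime, until: datetime, target: str) -> list:
    """Per-device totals over [since, until), written to `target` via $out."""
    return [
        {"$match": {"ts": {"$gte": since, "$lt": until}}},
        {"$group": {"_id": "$device", "total": {"$sum": "$value"}, "count": {"$sum": 1}}},
        {"$out": target},  # must be last; replaces `target` with this run's output
    ]
```

A scheduler would call `collection.aggregate(aggregation_with_out(start, end, "device_totals"))` with `start, end = last_complete_window(now)` every 15 minutes.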
Based on the description of your use case, I would recommend using map-reduce as I assume you need to output more than 16MB of data. Also, note that after the first map-reduce run you may run incremental map-reduce jobs that run only on the data that is new/changed and merge the results into the existing output collection.
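The incremental pattern can be sketched in plain Python under stated assumptions: hypothetical `device`, `value` and `ts` fields, and an in-memory dict standing in for the output collection. Each run processes only documents newer than the previous high-water mark and merges them into the existing results with the same reduce function, mirroring what the `mapReduce` command does with a `query` on the timestamp and `out: { reduce: <collection> }`. This is why the reduce function must return the same shape it receives: it gets applied again to previously reduced results.

```python
from collections import defaultdict


def map_fn(doc):
    """Emit one (key, value) pair per document."""
    yield doc["device"], {"total": doc["value"], "count": 1}


def reduce_fn(key, values):
    """Combine values for one key; the result has the same shape as each
    input, so it can be reduced again together with later values."""
    return {
        "total": sum(v["total"] for v in values),
        "count": sum(v["count"] for v in values),
    }


def map_reduce(docs):
    """Full (non-incremental) run: group emitted values by key, then reduce."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def incremental_run(output, docs, last_ts):
    """Map-reduce only docs with ts > last_ts, merge into `output` via
    reduce_fn, and return the new high-water mark."""
    new_docs = [d for d in docs if d["ts"] > last_ts]
    for key, value in map_reduce(new_docs).items():
        output[key] = reduce_fn(key, [output[key], value]) if key in output else value
    return max((d["ts"] for d in new_docs), default=last_ts)
```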
As you know, MongoDB doesn't support triggers, but you may easily implement triggers in the application by tailing the MongoDB oplog. This blog post and this SO post cover the topic well.
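A sketch of the application-side "trigger" loop, kept in plain Python so it runs without a server. In a real deployment the entries would come from a tailable cursor on `local.oplog.rs` (for example PyMongo's `find(..., cursor_type=CursorType.TAILABLE_AWAIT)` filtered on `ts` greater than the last processed value). The handler, namespace and field names are hypothetical, and only inserts (`op == "i"`) are handled; updates and deletes would need their own logic.

```python
from typing import Callable, Iterable, Optional


def dispatch_oplog(entries: Iterable[dict], namespace: str,
                   on_insert: Callable[[dict], None]) -> Optional[object]:
    """Pass inserted documents for `namespace` to `on_insert`.

    Oplog entries carry "op" ("i" insert, "u" update, "d" delete, ...),
    "ns" ("database.collection"), "o" (the document or change) and "ts".
    Returns the last "ts" seen, so a restarted tailer can resume after it.
    """
    last_ts = None
    for entry in entries:
        last_ts = entry["ts"]
        if entry.get("ns") == namespace and entry.get("op") == "i":
            on_insert(entry["o"])
    return last_ts


def make_totals_handler(totals: dict) -> Callable[[dict], None]:
    """Hypothetical trigger body: keep a running sum of "value" per "device"."""
    def on_insert(doc: dict) -> None:
        totals[doc["device"]] = totals.get(doc["device"], 0.0) + doc["value"]
    return on_insert
```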