MongoDB query performs slowly when using the $in operator

From some complex queries and aggregation, I got a list of ObjectIds, as below:
var objectIdsCollection = Array of ObjectIds
I have another collection, say Collection1, which has an OtherCollectionObjectId field. Now I need to filter Collection1 by the ObjectIds in that array.
Collection1 {
  _id: ObjectId('xyz'),
  name: ...,
  someOtherAttributes: ...,
  OtherCollectionObjectId: ObjectId('abc')
}
Below is the query I am trying. I need to use await because the result of this query depends on another query.
let queryData = await Document1.aggregate([
  {
    $match: {
      OtherCollectionObjectId: { $in: objectIdsCollection },
      Deleted: false,
    },
  },
]);
But this query performs very slowly; sometimes it takes around a minute to fetch the results.
I tried a couple of suggestions from the internet, but nothing seems to work for this kind of scenario.
Please let me know anything that can improve the performance of this query.
Thanks
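Not from the original post, but the usual first checks for this kind of slowness are a compound index covering both $match predicates, and verifying that the array really holds ObjectId instances rather than hex strings (a string never matches an ObjectId-typed field, so such entries silently match nothing). A minimal sketch; the index command is shown in a comment, and the helper names are mine:

```javascript
// Sketch, assuming the collection/field names from the question.
// 1) A compound index matching the $match predicates lets the query
//    walk the index instead of scanning the collection:
//    db.Collection1.createIndex({ OtherCollectionObjectId: 1, Deleted: 1 })
// 2) Check that the array contains ObjectIds, not hex strings.
//    A plain-JS check (no driver needed) might look like:
function looksLikeHexString(v) {
  return typeof v === 'string' && /^[0-9a-fA-F]{24}$/.test(v);
}
function countSuspectIds(ids) {
  // Any 24-char hex string in the array will silently match nothing
  // against an ObjectId-typed field.
  return ids.filter(looksLikeHexString).length;
}
```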

Related

Mongodb to fetch top 100 results for each category

I have a collection of transactions that has below schema:
{
  _id,
  client_id,
  billings,
  date,
  total
}
What I want to achieve is to get the 10 latest transactions, based on the date, for a list of client IDs. I don't think $slice works well here, as its use case is mostly embedded arrays.
Currently, I am iterating through the client_ids and using find with a limit, but it is extremely slow.
UPDATE
Example
https://mongoplayground.net/p/urKH7HOxwqC
This shows two clients with 10 transactions each on different days; I want to write a query that returns the latest 5 transactions for each.
Any suggestions on how to query the data to make it faster?
The most efficient way would be to just execute multiple queries, one for each client, like so:
const clients = await db.collection.distinct('client_id');
const results = await Promise.all(
  clients.map((clientId) =>
    db.collection.find({ client_id: clientId }).sort({ date: -1 }).limit(5).toArray())
);
To improve performance, make sure you have a compound index on client_id and date. If for whatever reason you can't build these indexes, I'd recommend using the following pipeline (with $bottomN, available starting in MongoDB 5.2):
db.collection.aggregate([
  {
    $group: {
      _id: "$client_id",
      latestTransactions: {
        $bottomN: {
          n: 5,
          sortBy: { date: 1 },
          output: "$$ROOT"
        }
      }
    }
  }
])
Mongo Playground
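A plain-JS mirror of what the $bottomN grouping computes can help sanity-check the pipeline against the playground data. The function name and output shape below are mine, not from the answer:

```javascript
// Plain-JS mirror of the "latest n transactions per client" grouping.
// With sortBy { date: 1 }, $bottomN keeps the n largest dates, i.e. the latest.
function latestPerClient(transactions, n) {
  const byClient = new Map();
  for (const t of transactions) {
    if (!byClient.has(t.client_id)) byClient.set(t.client_id, []);
    byClient.get(t.client_id).push(t);
  }
  const out = {};
  for (const [clientId, list] of byClient) {
    // Newest first, keep n per client
    out[clientId] = [...list].sort((a, b) => b.date - a.date).slice(0, n);
  }
  return out;
}
```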

MongoDB big collection aggregation is slow

I'm having a problem with the time of my MongoDB query, from a Node backend using Mongoose. I have a collection called people with 10M records; every record is queried from the backend and inserted by another part of the system that's written in C++ and needs to be very fast.
This is my Mongoose schema:
{
  _id: { type: String, index: { unique: true } }, // We generate our own _id! Might it be related to the slowness?
  age: { type: Number },
  id_num: { type: String },
  friends: { type: Object }
}
schema.index({ id_num: 1 }, { unique: true, collation: { locale: 'en_US', strength: 2 } });
schema.index({ age: 1 });
schema.index({ id_num: 'text' });
friends is an object that looks like this: {"Adam": true, "Eve": true, ...}.
The values have no meaning; we use dictionaries to deduplicate quickly in C++, and we didn't find a set/unique-list type of field in MongoDB.
The Problem:
We display people in a table with pagination. The table supports sorting, searching, and selecting the number of results.
At first, I queried all people and searched, sorted, and paged them in JS, but when there are a lot of documents this becomes problematic (memory problems).
The next thing I did was try to move those manipulations (searching, sorting, and paging) into the query itself.
I used Mongo's text search, but it doesn't match partial words. Is there any way to search for a partial, case-insensitive string? (I'd prefer not to use regex, to avoid unexpected problems.)
I have to sort before paging, so I tried Mongo's sort. The problem is that when the user wants to sort by "friends", we want to return the people sorted by their number of friends (the number of entries in the object).
The only way I succeeded in pulling it off was using $addFields in the aggregation:
{$addFields: {friends_count: {$size: {$ifNull: [{$objectToArray: '$friends'}, [] ]}}}}
This addition takes forever! When sorting by friends, the query takes about 40s for 8M people; without this part, it takes less than a second.
I use limit and skip for pagination. It works OK, but we have to wait for another very long query each time the user requests the next page.
In the end, this is the interesting part of the code:
const { sortBy, sortDesc, search, page, itemsPerPage } = req.query;
// Search never matches partial strings
const match = search ? { $text: { $search: search } } : {};
const sortByInDB = ['age', 'id_num'];
let sort = { $sort: {} };
const aggregate = [{ $match: match }];
// If sortBy is a simple field, we just use Mongo's sort;
// otherwise we sort by friends and add a friends_count field.
if (sortByInDB.includes(sortBy)) {
  sort.$sort[sortBy] = sortDesc === 'true' ? -1 : 1;
} else {
  sort.$sort[sortBy + '_count'] = sortDesc === 'true' ? -1 : 1;
  // The problematic part of the query:
  aggregate.push({ $addFields: { friends_count: { $size: {
    $ifNull: [{ $objectToArray: '$friends' }, []]
  } } } });
}
const numItems = parseInt(itemsPerPage);
const numPage = parseInt(page);
aggregate.push(sort, { $skip: (numPage - 1) * numItems }, { $limit: numItems });
// Takes a long time (when sorting by "friends")
let users = await User.aggregate(aggregate);
I tried indexing all the simple fields, but the time is still too long.
The only other solution I could think of is making Mongo calculate a "friends_count" field every time a document is created or updated, but I have no idea how to do that without slowing down the C++ side that writes to the DB.
Do you have any creative ideas to help me? I'm lost, and I have to shorten the time drastically.
Thank you!
P.S. Some useful information: the C++ side writes people to the DB in bulk once in a while. We can sync periodically and mostly rely on the data being accurate. If that gives any of you an idea for a performance boost, I'd love to hear it.
Thanks!
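Since the C++ side writes in bulk only occasionally, one option (mine, not from the question) is to maintain friends_count out of band: run an updateMany with an aggregation-pipeline update (MongoDB 4.2+) after each bulk load, then index friends_count so sorting on it is cheap. The collection and field names below are assumptions; the server-side commands are shown in a comment, with a plain-JS mirror of the counting expression:

```javascript
// Sketch: maintain friends_count out of band (names are assumptions).
// Run after each C++ bulk load, then sort on an indexed friends_count:
//
//   db.people.updateMany({}, [
//     { $set: { friends_count: { $size: { $ifNull: [{ $objectToArray: '$friends' }, []] } } } }
//   ]);
//   db.people.createIndex({ friends_count: 1 });
//
// Plain-JS mirror of the $size/$objectToArray expression above:
function friendsCount(friends) {
  // Counts the keys of the friends object; a missing object counts as 0.
  return Object.keys(friends || {}).length;
}
```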

Query Mongoose with priority query conditions

I need to find all documents where the first query condition matches; then, if it can't find more documents matching that condition, it should apply another query condition.
So, for example:
db.users.find({
  $or: [
    { type: typeToSearch }, // First find all users that have type = typeToSearch,
    { active: true }        // then, if there aren't enough, look for active ones
  ]
}).limit(20)
What this query actually does is find active users first (depending on their order in the collection).
What I need is: if there are 18 users with the given type, they should be returned first, and then 2 other active users.
That's a cool feature you are looking for! Nothing in Mongoose will help you with this out of the box, and poking around in npm I don't see anything that will help you there either.
For your two queries you have to do something like this:
const fancyQuery = async limit => {
  const first = await db.users.find({ type: typeToSearch }).limit(limit);
  let second = [];
  if (first.length < limit) {
    second = await db.users.find({
      active: true,
      type: { $ne: typeToSearch }
    }).limit(limit - first.length);
  }
  return [...first, ...second];
};
The only other path I can think of using the query API is to fetch 40 items and then filter the results with JavaScript. I think you'd need to change your query a little to prevent the active = true part from also refetching the same documents as the type part:
db.users.find({
  $or: [
    { type: typeToSearch },
    { active: true, type: { $ne: typeToSearch } }
  ]
}).limit(40)
You'd filter the results first by type, and then by not-that-type and active, up to 20 items.
You might also be able to use an aggregation pipeline to accomplish this, but I don't have such an answer at my fingertips.
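For the aggregation route hinted at above, one sketch (mine, untested against real data) is to compute a priority field with $cond and sort on it, so typed users come first and active ones fill the remainder. The pipeline is shown in a comment; below it is a plain-JS mirror of the same ordering:

```javascript
// Sketch: priority sort via aggregation (field and variable names are mine).
// const pipeline = [
//   { $match: { $or: [{ type: typeToSearch }, { active: true }] } },
//   { $addFields: { priority: { $cond: [{ $eq: ['$type', typeToSearch] }, 0, 1] } } },
//   { $sort: { priority: 1 } },
//   { $limit: 20 }
// ];
//
// Plain-JS mirror of the priority ordering above:
function prioritize(users, typeToSearch, limit) {
  return users
    .filter(u => u.type === typeToSearch || u.active)
    // Stable sort: typed users (priority 0) before merely-active ones (priority 1)
    .sort((a, b) =>
      (a.type === typeToSearch ? 0 : 1) - (b.type === typeToSearch ? 0 : 1))
    .slice(0, limit);
}
```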

Mongodb - return an array of _id of all updated documents

I need to update some documents in one collection, and send an array of the _ids of the updated documents to another collection.
Since update() returns the number of updated items, not their ids, I've come up with the following to get the array:
var docsUpdated = [];
var cursor = myCollection.find(<myQuery>);
cursor.forEach(function(doc) {
  myCollection.update({ _id: doc._id }, <myUpdate>, function(error, response) {
    docsUpdated.push(doc._id);
  });
});
Or I could do:
var docsUpdated = myCollection.distinct("_id", <myQuery>);
myCollection.update(<myQuery>, <myUpdate>, {multi : true});
I'm guessing the second version would be faster because it only calls the database twice. But both seem annoyingly inefficient - is there another way of doing this without multiple database calls? Or am I overcomplicating things?
I think you need the .aggregate() method:
db.orders.aggregate([
  { $group: { _id: "$_id" } }
])
Something along those lines returns all of the _ids in the collection.
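For what it's worth, the second approach from the question can be made race-free by feeding the collected _ids back into the update, so documents that start matching between the two calls aren't silently included. A sketch against a modern Node driver collection handle (the function name is mine; query and update stand in for the question's placeholders):

```javascript
// Sketch: collect _ids first, then update exactly those documents
// (assumes a connected driver `collection`).
async function updateAndCollectIds(collection, query, update) {
  const ids = await collection.distinct('_id', query);
  // Updating by _id avoids a race where new documents start matching
  // `query` between the two calls.
  await collection.updateMany({ _id: { $in: ids } }, update);
  return ids;
}
```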

Meteor collection get last document of each selection

Currently I use the following find query to get the latest document for a certain ID:
Conditions.find({
  caveId: caveId
}, {
  sort: { diveDate: -1 },
  limit: 1,
  fields: { caveId: 1, "visibility.visibility": 1, diveDate: 1 }
});
How can I do the same for multiple IDs, with $in for example?
I tried the following query. The problem is that it limits the result to 1 document across all the found caveIds, but it should apply the limit per caveId.
Conditions.find({
  caveId: { $in: caveIds }
}, {
  sort: { diveDate: -1 },
  limit: 1,
  fields: { caveId: 1, "visibility.visibility": 1, diveDate: 1 }
});
One solution I came up with is using the aggregate functionality.
var conditionIds = Conditions.aggregate([
  { "$match": { caveId: { "$in": caveIds } } },
  // Sort first so $last reliably picks the latest dive per cave
  { "$sort": { diveDate: 1 } },
  {
    $group: {
      _id: "$caveId",
      conditionId: { $last: "$_id" },
      diveDate: { $last: "$diveDate" }
    }
  }
]).map(function(child) { return child.conditionId; });
var conditions = Conditions.find({
  _id: { $in: conditionIds }
}, {
  fields: { caveId: 1, "visibility.visibility": 1, diveDate: 1 }
});
You don't want to use $in here, as noted. You could solve this problem by looping through the caveIds and running the query for each caveId individually.
You're basically looking at a join here: you need all caveIds and then a lookup of the last dive for each.
In my opinion (and it is only an opinion!), this is a database schema/denormalization problem:
You could, as mentioned here, look up all caveIds and then run the single query for each, every time you need the last dives.
However, I think you are much better off recording/updating the last dive inside your cave document, and then looking up all caveIds of interest, pulling only the lastDive field.
That gives you what you need immediately, rather than going through expensive search/sort queries. It comes at the expense of maintaining that field, but that sounds fairly trivial, since you only need to update one field when a new event occurs.
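A minimal sketch of that denormalization (collection and field names are my assumptions, not from the thread): when a new dive is recorded, bump a lastDiveDate on the cave with $max, so the per-cave lookup becomes a single indexed find. The update calls are shown in a comment, with a plain-JS mirror of the $max semantics:

```javascript
// Sketch of the suggested denormalization (names are assumptions):
//   Caves.update({ _id: caveId }, { $max: { lastDiveDate: diveDate } });
//   Caves.find({ _id: { $in: caveIds } }, { fields: { lastDiveDate: 1 } });
//
// Plain-JS mirror of the $max update semantics:
function applyMaxUpdate(doc, field, value) {
  // $max only writes when the new value is greater, or the field is absent.
  if (doc[field] === undefined || value > doc[field]) doc[field] = value;
  return doc;
}
```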