MongoDB improve $not & $elemMatch performance

Let's say the DB contains tens of thousands of docs having the following structure
{
_id: ObjectId("5ef053e819aaa00013a2bd69"),
approvers: [
{
type: "ONE",
details: {
name: "NameOne",
surname: "SurnameOne"
}
},
{
type: "TWO",
details: {
name: "NameTwo",
surname: "SurnameTwo"
}
},
{
type: "THREE",
// details field is missing
}
]
}
I need to select only the docs where there is no approver of type "TWO" or "ONE", or where such an approver is missing its details.
I had an idea to use $not in a combination with $elemMatch:
{
$or: [
{
"approvers.type": {
$not: {
$in: ["ONE", "TWO"]
}
}
},
{
approvers: {
$not: {
$elemMatch: {
type: { $in: ["ONE", "TWO"]},
details: {$exists: true}
}
}
}
}
]
}
The query works, but it is very inefficient because no index is used. As I understand it, the DB engine has to do a full collection scan and, in each doc, check all the array elements.
The collection currently has 75k records, and every approvers array can hold up to 3 elements.
Is there any "trick" to make it more efficient, or is the only option to change the data structure?

This is where a separate collection is beneficial.
Assuming the documents above are projects, a different structure can be put in place:
// approvals
[{
_id: ObjectId,
projectId: ObjectId, // i.e. the _id in your code
type: "one",
details: "some stuff"
},
{
_id: ObjectId,
projectId: ObjectId, // i.e. the _id in your code
type: "two",
details: "some stuff"
},
{
_id: ObjectId,
projectId: ObjectId, // i.e. the _id in your code
type: "three",
details: "some stuff"
}]
Then you can get all the projectIds where type is $nin ["one", "two"], before retrieving the related projects using $in. This should be achievable via aggregation too, though I have never tried it.
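A minimal sketch of such a two-step lookup, assuming a separate approvals collection and a projects collection (both names, and the helper functions, are illustrative and not from the original post). It only builds the filter documents; the commented lines show how they might be passed to the shell. Note that it excludes projects that have a blocking approval rather than selecting by type directly, which is one possible way to express the requirement from the question.

```javascript
// Assumed names: collections "approvals" and "projects"; helpers are hypothetical.
const BLOCKING_TYPES = ["ONE", "TWO"];

// Step 1: approvals of a blocking type that actually carry details.
// An index such as { type: 1, projectId: 1 } could serve this filter.
function blockingApprovalsFilter(types) {
  return { type: { $in: types }, details: { $exists: true } };
}

// Step 2: projects whose _id is not among the ids found in step 1.
function remainingProjectsFilter(blockedProjectIds) {
  return { _id: { $nin: blockedProjectIds } };
}

// In the mongo shell this could be used roughly as:
//   const blocked = db.approvals.distinct("projectId", blockingApprovalsFilter(BLOCKING_TYPES));
//   db.projects.find(remainingProjectsFilter(blocked));
```

Whether this beats the original query depends on how many projects end up in the exclusion list, so it is worth checking with explain() on real data.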

Related

MongoDB Comparison query Object Array Operators

Suppose that there is the applicants collection in a MongoDB database with these documents:
[
{
name: "Tom Hanks",
age: 42,
email: "tom.hanks@example.com",
job:{
"data engineer": 7,
"professor": 3
}
},
{
name: "Ken Smith",
age: 36,
email: "ken.smith@example.com",
job:{
"electronics engineer" : 10,
"database administrator" : 5
}
}
]
I want to write a query that retrieves the applicants who have some experience in the database field.
Tried: db.applications.find({ 'job': { $all: [ '.data.'] } })
However it's not working.
Can someone help, please.
I'm not sure how efficient this is, or whether it's what you're looking for, but you can expand the job object into an array of key/value pairs with the $objectToArray operator and then regex match on the keys:
db.applicants.aggregate([
{
$project: {
jobMatch: {
$objectToArray: "$job"
}
}
},
{
$match: {
"jobMatch.k": {
$regex: "data"
}
}
}
])
You can go deeper and also match on the values, for example requiring a value greater than a certain number with $and. Let me know if you found it useful.
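To make the key/value expansion concrete, here is a plain-JavaScript sketch that mirrors what $objectToArray produces and applies the key regex together with a minimum value. The helper names, the threshold of 6 and the $elemMatch stage at the end are illustrative assumptions, not part of the answer above.

```javascript
// Sample documents from the question (email and age omitted).
const applicants = [
  { name: "Tom Hanks", job: { "data engineer": 7, "professor": 3 } },
  { name: "Ken Smith", job: { "electronics engineer": 10, "database administrator": 5 } },
];

// Plain-JavaScript mirror of $objectToArray: { a: 1 } -> [{ k: "a", v: 1 }].
function objectToArray(obj) {
  return Object.entries(obj).map(([k, v]) => ({ k, v }));
}

// Keep applicants with at least one job whose title matches `pattern`
// and whose value is at least `minValue`, both checked on the same entry.
function findByJob(docs, pattern, minValue = 0) {
  return docs.filter((doc) =>
    objectToArray(doc.job).some((e) => pattern.test(e.k) && e.v >= minValue)
  );
}

// Hypothetical $match stage expressing the same combined condition
// on the jobMatch array produced by the $project stage.
const matchStage = {
  $match: { jobMatch: { $elemMatch: { k: { $regex: "data" }, v: { $gte: 6 } } } },
};
```

With the sample data, findByJob(applicants, /data/) keeps both applicants, while adding a minimum of 6 keeps only the one whose matching job has a value of at least 6.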

How to only return X amount of embedded documents with MongoDB?

I have a large collection called posts, like so:
[{
_id: 349348jf49rk,
user: frje93u45t,
comments: [{
_id: fks9272ewt
user: 49wnf93hr9,
comment: "Hello world"
}, {
_id: j3924je93h
user: 49wnf93hr9,
comment: "Heya"
}, {
_id: 30283jt9dj
user: dje394ifjef,
comment: "Text"
}, {
_id: dkw9278467
user: fgsgrt245,
comment: "Hola"
}, {
_id: 4irt8ej4gt
user: 49wnf93hr9,
comment: "Test"
}]
}]
My comments array can sometimes be hundreds of documents long. How can I return just the 3 newest comments (based on the ID) instead of all of them, together with the total number of comments as totalNumberOfComments? I sometimes need to do this for hundreds of posts. This is what the final result would look like:
[{
_id: 349348jf49rk,
user: frje93u45t,
totalNumberOfComments: 5,
comments: [{
_id: fks9272ewt
user: 49wnf93hr9,
comment: "Hello world"
}, {
_id: j3924je93h
user: 49wnf93hr9,
comment: "Heya"
}, {
_id: 30283jt9dj
user: dje394ifjef,
comment: "Text"
}]
}]
I understand that this could be completed after MongoDB returns the data by splicing, although I think it would be best to do this within the query so that Mongo doesn't have to return all comments for every single post all the time.
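Does this solve your problem? Try plugging in your _id values, and if something is missing, post it here.
Begin with this query:
db.collection.aggregate([
{ $match: { _id: "349348jf49rk" } },
{ $project: {
_id: 1,
user: 1,
totalNumberOfComments: { $size: "$comments" },
// In an aggregation, $slice takes the array as its first argument.
// 3 keeps the first three comments; use -3 to keep the last three.
comments: { $slice: ["$comments", 3] }
}
}
])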
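As a quick illustration of what the projection computes, the sketch below mirrors $size and the two-argument aggregation form of $slice in plain JavaScript. The function name summarizePost is hypothetical; a positive n keeps elements from the start of the array and a negative n keeps them from the end.

```javascript
// Hypothetical in-memory version of the $project stage.
function summarizePost(post, n) {
  const comments = post.comments;
  return {
    _id: post._id,
    user: post.user,
    // Mirrors { $size: "$comments" }.
    totalNumberOfComments: comments.length,
    // Mirrors { $slice: ["$comments", n] }: first n if n >= 0, last |n| otherwise.
    comments: n >= 0 ? comments.slice(0, n) : comments.slice(n),
  };
}

const post = {
  _id: "349348jf49rk",
  user: "frje93u45t",
  comments: [
    { _id: "fks9272ewt", comment: "Hello world" },
    { _id: "j3924je93h", comment: "Heya" },
    { _id: "30283jt9dj", comment: "Text" },
    { _id: "dkw9278467", comment: "Hola" },
    { _id: "4irt8ej4gt", comment: "Test" },
  ],
};
```

If comments are appended in creation order, the newest three are the last three, which corresponds to a negative count.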

How to remove an array value from item in a nested document array

I want to remove "tag4" only for "user3":
{
_id: "doc",
some: "value",
users: [
{
_id: "user3",
someOther: "value",
tags: [
"tag4",
"tag2"
]
}, {
_id: "user1",
someOther: "value",
tags: [
"tag3",
"tag4"
]
}
]
},
{
...
}
Note: This collection holds items referencing many users. Users are stored in a different collection. Unique tags for each user are also stored in the users collection. If a user removes one or more tags from their account, those tags should be deleted from all items.
I tried this query, but it removes "tag4" for all users:
{
"users._id": "user3",
"users.tags": {
$in: ["tag4"]
}
}, {
$pullAll: {
"users.$.tags": ["tag4"]
}
}, {
multi: 1
}
I tried $elemMatch (and $and) in the selector but ended up with the same result only on the first matching document or noticed some strange things happen (sometimes all tags of other users are deleted).
Any ideas how to solve this? Is there a way to "back reference" in $pull conditions?
You need to use $elemMatch in your query object so that it will only match if both the _id and tags parts match for the same element:
db.test.update({
users: {$elemMatch: {_id: "user3", tags: {$in: ["tag4"]}}}
}, {
$pullAll: {
"users.$.tags": ["tag4"]
}
}, {
multi: 1
})
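The difference between the two selectors can be sketched in plain JavaScript. The functions below are illustrative mirrors of the matching rules, not MongoDB APIs: with dotted paths each condition may be satisfied by a different array element, while $elemMatch requires one element to satisfy both.

```javascript
// A document where user3 does NOT have "tag4", but user1 does.
const doc = {
  _id: "doc",
  users: [
    { _id: "user3", tags: ["tag2"] },
    { _id: "user1", tags: ["tag3", "tag4"] },
  ],
};

// Mirror of { "users._id": id, "users.tags": tag }:
// each condition may be satisfied by a different array element.
function matchesDotted(d, id, tag) {
  return d.users.some((u) => u._id === id) && d.users.some((u) => u.tags.includes(tag));
}

// Mirror of { users: { $elemMatch: { _id: id, tags: tag } } }:
// both conditions must hold for the same element.
function matchesElemMatch(d, id, tag) {
  return d.users.some((u) => u._id === id && u.tags.includes(tag));
}
```

For this document the dotted form matches even though user3 has no "tag4", which is how the update can end up touching another user's tags; the $elemMatch form does not match.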

MongoDB aggregation framework approach to a multi-doc query

I am looking into the best way to organize filtering. I have the following document format:
{
_id: "info",
ids: ["id1", "id2", "id3"]
}
{
_id: "id1",
value: 5
}
{
_id: "id2",
value: 1
}
{
_id: "id3",
value: 5
}
I need to make the following query: get all documents by id from doc "info" and then filter them out by value 5. So, that result would be something like:
{
_id: "id1",
value: 5
}
{
_id: "id3",
value: 5
}
I suppose I need to do unwind on ids, but how do I then select all documents that match those values? Or maybe I should just use $in operator somehow to grab all documents and after that do filtering?
Any help is appreciated. Thanks.
If it is only MongoDB shell/script, I would do it like this:
db.ids.find({ _id: { $in: db.ids.findOne({ _id: "info" }).ids }, value: 5 })
There are also worse alternatives, such as the eval command (deprecated since MongoDB 3.0 and removed in 4.2):
db.runCommand({
eval: function(value) {
var ids = db.ids.findOne({ _id: "info" }).ids;
return db.ids.find({ _id: { $in: ids }, value: value }).toArray();
},
args: [5]
})
or the $where operator (low performance because you execute one find for each candidate result with value 5):
db.ids.find({
value: 5,
$where: "db.ids.findOne({ _id: 'info', ids: this._id })"
})
But if you are trying to run the queries through a MongoDB driver, the story might be different.
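From a driver, the first approach would typically become two round trips: fetch the "info" document, then run one $in query with its ids. The sketch below mirrors that logic on an in-memory array so it can run without a server; findReferenced is a hypothetical helper name.

```javascript
// The documents from the question, held in memory for illustration.
const docs = [
  { _id: "info", ids: ["id1", "id2", "id3"] },
  { _id: "id1", value: 5 },
  { _id: "id2", value: 1 },
  { _id: "id3", value: 5 },
];

function findReferenced(collection, infoId, value) {
  // Step 1: like findOne({ _id: infoId }).
  const info = collection.find((d) => d._id === infoId);
  if (!info) return [];
  // Step 2: like find({ _id: { $in: info.ids }, value: value }).
  const wanted = new Set(info.ids);
  return collection.filter((d) => wanted.has(d._id) && d.value === value);
}
```

Calling findReferenced(docs, "info", 5) keeps only the referenced documents whose value is 5.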

Finding the next document in MongoDb

If this is my collection structure:
{ _id: ObjectId("4fdbaf608b446b0477000142"), name: "product 1" }
{ _id: ObjectId("4fdbaf608b446b0477000143"), name: "product 2" }
{ _id: ObjectId("4fdbaf608b446b0477000144"), name: "product 3" }
and I query product 1, is there a way to query the next document, which in this case would be "product 2"?
It is best to add explicit sort() criteria if you want a predictable order of results.
Assuming the order you are after is "insertion order" and you are using MongoDB's default generated ObjectIds, then you can query based on the ObjectId:
// Find next product created
db.products.find({_id: {$gt: ObjectId("4fdbaf608b446b0477000142") }}).limit(1)
Note that this example only works because:
the first four bytes of the ObjectId are calculated from a unix-style timestamp (see: ObjectId Specification)
a query on _id alone will use the default _id index (sorted by id) to find a match
So really, this implicit sort is the same as:
db.products.find({_id: {$gt: ObjectId("4fdbaf608b446b0477000142" )}}).sort({_id:1}).limit(1);
If you added more criteria to the query to qualify how to find the "next" product (for example, a category), the query could use a different index and the order may not be as you expect.
You can check index usage with explain().
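The "next by _id" logic can be sketched without a server. This example assumes the ids are 24-character hex strings in the same letter case, so that plain string comparison orders them the same way as the underlying ObjectId bytes; findNext is a hypothetical helper.

```javascript
// Deliberately stored out of order to show that the explicit sort matters.
const products = [
  { _id: "4fdbaf608b446b0477000143", name: "product 2" },
  { _id: "4fdbaf608b446b0477000142", name: "product 1" },
  { _id: "4fdbaf608b446b0477000144", name: "product 3" },
];

// Mirror of find({ _id: { $gt: id } }).sort({ _id: 1 }).limit(1).
function findNext(docs, id) {
  const later = docs
    .filter((d) => d._id > id)
    .sort((a, b) => (a._id < b._id ? -1 : 1));
  return later.length > 0 ? later[0] : null;
}
```

When there is no later document, the sketch returns null, which corresponds to an empty cursor in the shell.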
Starting in MongoDB 5.0, this is a good use case for the new $setWindowFields aggregation stage:
// { _id: 1, name: "product 1" }
// { _id: 2, name: "product 2" }
// { _id: 3, name: "product 3" }
db.collection.aggregate([
{ $setWindowFields: {
sortBy: { _id: 1 },
output: { next: { $push: "$$ROOT", window: { documents: [1, 1] } } }
}}
])
// { _id: 1, name: "product 1", next: [{ _id: 2, name: "product 2" }] }
// { _id: 2, name: "product 2", next: [{ _id: 3, name: "product 3" }] }
// { _id: 3, name: "product 3", next: [ ] }
This:
sorts documents by their order of insertion using their _ids (ObjectIds contain the timestamp of insertion): sortBy: { _id: 1 }
adds the next field to each document (output: { next: { ... }})
by $pushing the whole document $$ROOT ($push: "$$ROOT")
on a specified span of documents (the window), which in this case is a range containing only the next document: window: { documents: [1, 1] }.
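The window semantics can be mirrored in a few lines of plain JavaScript. This is only an in-memory sketch with a hypothetical helper name (withNext): after sorting, each document receives an array holding the single document one position ahead of it, or an empty array for the last one.

```javascript
// Mirror of sortBy: { _id: 1 } with window: { documents: [1, 1] }.
function withNext(docs) {
  const sorted = [...docs].sort((a, b) => a._id - b._id);
  // slice(i + 1, i + 2) yields the next document, or [] past the end.
  return sorted.map((doc, i) => ({ ...doc, next: sorted.slice(i + 1, i + 2) }));
}

const sample = [
  { _id: 2, name: "product 2" },
  { _id: 1, name: "product 1" },
  { _id: 3, name: "product 3" },
];
```

Applied to sample, withNext produces the same shape as the commented aggregation output above.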
You can get the items back in insertion order using ObjectId. See: http://www.mongodb.org/display/DOCS/Optimizing+Object+IDs#OptimizingObjectIDs
If you want to get them back in some other order then you will have to define what that order is and store more information in your documents.