MongoDB aggregation framework approach to a multi-doc query

I am looking into the best way to organize filtering. I have the following document format:
{
_id: "info",
ids: ["id1", "id2", "id3"]
}
{
_id: "id1",
value: 5
}
{
_id: "id2",
value: 1
}
{
_id: "id3",
value: 5
}
I need to make the following query: get all documents whose ids are listed in doc "info", and then filter them by value 5 (keep only those with value: 5). So the result would be something like:
{
_id: "id1",
value: 5
}
{
_id: "id3",
value: 5
}
I suppose I need to $unwind the ids, but how do I then select all documents that match those values? Or maybe I should just use the $in operator somehow to grab all documents and then filter them?
Any help is appreciated. Thanks.

If this is only for the MongoDB shell or a script, I would do it like this:
db.ids.find({ _id: { $in: db.ids.findOne({ _id: "info" }).ids }, value: 5 })
There are also worse alternatives, such as the eval command (deprecated in MongoDB 3.0 and removed in 4.2):
db.runCommand({
eval: function(value) {
var ids = db.ids.findOne({ _id: "info" }).ids;
return db.ids.find({ _id: { $in: ids }, value: value }).toArray();
},
args: [5]
})
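or the $where operator (low performance, because it executes one find for each candidate document with value 5; also, since MongoDB 2.4, $where expressions cannot access the db object, so this only works on older servers):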
db.ids.find({
value: 5,
$where: "db.ids.findOne({ _id: 'info', ids: this._id })"
})
But if you are trying to run the queries through a MongoDB driver, the story might be different.
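To see the logic of the shell one-liner without a server, here is a minimal plain-JavaScript sketch of its two steps: read the ids array from the "info" document, then keep the documents whose _id is in that array and whose value is 5. An in-memory array stands in for the collection, and the helper names findIds and findByIdsAndValue are hypothetical; this is not driver code.

```javascript
// Sketch only: an in-memory array stands in for the "ids" collection.
const docs = [
  { _id: "info", ids: ["id1", "id2", "id3"] },
  { _id: "id1", value: 5 },
  { _id: "id2", value: 1 },
  { _id: "id3", value: 5 },
];

// Step 1, like db.ids.findOne({ _id: "info" }).ids
function findIds(collection, infoId) {
  const info = collection.find((d) => d._id === infoId);
  return info ? info.ids : [];
}

// Step 2, like db.ids.find({ _id: { $in: ids }, value: value })
function findByIdsAndValue(collection, ids, value) {
  const wanted = new Set(ids);
  return collection.filter((d) => wanted.has(d._id) && d.value === value);
}

console.log(findByIdsAndValue(docs, findIds(docs, "info"), 5));
```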

Related

MongoDB improve $not & $elemMatch performance

Let's say the DB contains tens of thousands of docs with the following structure:
{
_id: ObjectId("5ef053e819aaa00013a2bd69"),
approvers: [
{
type: "ONE",
details: {
name: "NameOne",
surname: "SurnameOne"
}
},
{
type: "TWO",
details: {
name: "NameTwo",
surname: "SurnameTwo"
}
},
{
type: "THREE",
// details field is missing
}
]
}
I need to select only those docs where there's no approver of type "ONE" or "TWO", or where such an approver has missing details.
I had an idea to use $not in combination with $elemMatch:
{
$or: [
{
"approvers.type": {
$not: {
$in: ["ONE", "TWO"]
}
}
},
{
approvers: {
$not: {
$elemMatch: {
type: { $in: ["ONE", "TWO"]},
details: {$exists: true}
}
}
}
}
]
}
The query works, but it's very inefficient because the index is not used. As I understand it, the DB engine has to do a full collection scan and check every array element in each doc.
For reference, the collection has 75k records, and every approvers array holds at most 3 elements.
Is there any "trick" to make it more efficient, or is changing the data structure the only option?
This is where a separate collection is beneficial.
Assuming the documents above are projects, a different structure could be used:
//approvals
[{
_id: ObjectId,
projectId: ObjectId, // i.e. the _id in your code
type: "one",
details: "some stuff"
},
{
_id: ObjectId,
projectId: ObjectId, // i.e. the _id in your code
type: "two",
details: "some stuff"
},
{
_id: ObjectId,
projectId: ObjectId, // i.e. the _id in your code
type: "three",
details: "some stuff"
}]
Then you can get all the projectId values of approvals whose type is $in ["one", "two"] and that have details, and retrieve the remaining projects by excluding those ids with $nin. This should be achievable via aggregation too, though I have never tried.
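A minimal in-memory sketch of this two-step idea, assuming the question's goal (keep projects that have no "one" or "two" approval with details): first collect the projectId values to exclude, then return every other project, which is what a $nin on _id would do. The sample data, the string ids, and the function names are made up for illustration; in MongoDB, step 1 would be a find on the approvals collection, which an index on type could serve.

```javascript
// Sketch: approvals and projects as plain arrays; string ids stand in for ObjectIds.
const approvals = [
  { projectId: "p1", type: "one", details: "some stuff" },
  { projectId: "p1", type: "three" },
  { projectId: "p2", type: "two" }, // details missing
  { projectId: "p3", type: "three", details: "some stuff" },
];
const projects = [{ _id: "p1" }, { _id: "p2" }, { _id: "p3" }, { _id: "p4" }];

// Step 1, like approvals.find({ type: { $in: ["one", "two"] }, details: { $exists: true } })
// reduced to the set of projectId values
function projectsToExclude(approvalDocs) {
  return new Set(
    approvalDocs
      .filter((a) => ["one", "two"].includes(a.type) && a.details !== undefined)
      .map((a) => a.projectId)
  );
}

// Step 2, like projects.find({ _id: { $nin: [...excluded] } })
function projectsWithoutCompleteOneOrTwo(projectDocs, approvalDocs) {
  const excluded = projectsToExclude(approvalDocs);
  return projectDocs.filter((p) => !excluded.has(p._id));
}

console.log(projectsWithoutCompleteOneOrTwo(projects, approvals).map((p) => p._id));
```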

How to return a specific element of an array in a document?

I have a table containing documents set up as follows:
{
_id: 1,
name: { first: 'John', last: 'Doe' },
tools: [ 'Tool1', 'Tool2', 'Tool3' ],
skills: [
{ type: 'carpentry',
years: 3 },
{ type: 'plumbing',
year: 5 },
{ type: 'electrical',
year: 8 }
]
}
I need to write a script that can search each document in the table and return the value of a specific skill, for example: Find the number of years John Doe has in plumbing.
Since I don't need the full document, db.table.find({skills: {$elemMatch: {type:'plumbing'}}}) feels unnecessary and would still require me to search the document to find the value I'm looking for. Is there a way to just return the part of the document I'm looking for?
The desired output would be {type: 'plumbing', year: 5} so that I could then manipulate that data into another field in the document.
Try this:
db.collection.aggregate([
{
"$unwind": "$skills"
},
{
"$match": {
"skills.type": "plumbing"
}
},
{
"$project": {
skills: 1
}
}
])
Or, if you only want the year, project "skills.year" instead of skills in the last stage.
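To show what each stage of the pipeline does, here is a plain-JavaScript imitation run on an in-memory copy of the sample document (no server involved; skillOf is a hypothetical helper). Like the pipeline, it returns the _id plus the matched skill subdocument rather than a bare {type, year} object.

```javascript
// Sketch: the sample document as an in-memory array (field names copied as-is,
// including the years/year inconsistency in the question).
const people = [
  {
    _id: 1,
    name: { first: "John", last: "Doe" },
    tools: ["Tool1", "Tool2", "Tool3"],
    skills: [
      { type: "carpentry", years: 3 },
      { type: "plumbing", year: 5 },
      { type: "electrical", year: 8 },
    ],
  },
];

function skillOf(docs, skillType) {
  return docs
    // $unwind: one copy of the document per skills element
    .flatMap((doc) => doc.skills.map((skill) => ({ ...doc, skills: skill })))
    // $match: keep copies whose skill has the requested type
    .filter((doc) => doc.skills.type === skillType)
    // $project { skills: 1 }: keep only skills (plus _id, included by default)
    .map((doc) => ({ _id: doc._id, skills: doc.skills }));
}

console.log(skillOf(people, "plumbing"));
```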

How to query multiple collections in mongodb (without using $lookup)?

I would like to create a single query that gets data from three different collections from providing a single query parameter. I have seen methods that use $lookup but I do not want to use that as I cannot use it on sharded collections.
Here is an example to explain further.
I have three collections: user, chatroom and chatMessage.
user collection:
{
_id: ObjectId('456'),
username: 'John',
contacts: [
{
_id: ObjectId('AB12'),
name: 'Mary',
idOfContact: ObjectId('123'),
},
{
_id: ObjectId('AB34'),
name: 'Jane',
_idOfContact: ObjectId('234'),
},
{
_id: ObjectId('AB56'),
name: 'Peter',
_idOfContact: ObjectId('345'),
}
],
}
chatroom collection:
{
_id: ObjectId('AB34'),
usersInThisChatRoom: [
ObjectId("456"),
ObjectId("123"),
ObjectId("234"),
]
}
chatMessage collection:
[
{
_id: ObjectId("M01"),
chatRoomObjectId: ObjectId('AB34'),
senderObjectId: ObjectId('456'),
message: 'Hello humans!',
date: ISODate("2019-09-03T07:24:28.742Z"),
},
...(other messages)
]
What I would like to be returned
[
{
chatRoomObjectId: ObjectId('AB34'),
usersInThisChatRoom: [
{
contactName: 'John',
contactUserId: ObjectId('456'),
},
{
contactName: 'Mary',
contactUserId: ObjectId('123'),
},
{
contactName: 'Jane',
contactUserId: ObjectId('234'),
}
],
chatMessages: [
{
_id: ObjectId("M01"),
senderObjectId: ObjectId('456'),
message: 'Hello humans!',
date: ISODate("2019-09-03T07:24:28.742Z"),
},
...(other messages)
]
},
...(other documents)
]
Is there a way to get my desired results by making a single query using the user._id and will that be performance friendly?
Or, do I have to make several queries, one after another, to achieve what I want?
According to this answer, you cannot perform a single query across multiple collections (aside from the $lookup aggregation stage).
This means you either use the $lookup aggregation stage or make several queries to the DB. Note that since MongoDB 5.1, $lookup also works when the from collection is sharded.
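If $lookup is ruled out, the fallback is several queries plus a join in application code. Below is a minimal in-memory sketch of that, with plain arrays standing in for the three collections and string ids standing in for ObjectIds. Resolving contact names from the user documents' username field is an assumption (the contacts array could serve too), and chatRoomsForUser is a hypothetical helper; each filter marks where one find() would go.

```javascript
// Sketch: three "collections" as arrays; string ids stand in for ObjectIds.
const users = [
  { _id: "456", username: "John" },
  { _id: "123", username: "Mary" },
  { _id: "234", username: "Jane" },
];
const chatrooms = [{ _id: "AB34", usersInThisChatRoom: ["456", "123", "234"] }];
const chatMessages = [
  { _id: "M01", chatRoomObjectId: "AB34", senderObjectId: "456", message: "Hello humans!" },
];

function chatRoomsForUser(userId) {
  // Query 1, like chatroom.find({ usersInThisChatRoom: userId })
  const rooms = chatrooms.filter((room) => room.usersInThisChatRoom.includes(userId));
  const roomIds = new Set(rooms.map((room) => room._id));
  const memberIds = new Set(rooms.flatMap((room) => room.usersInThisChatRoom));

  // Query 2, like user.find({ _id: { $in: [...memberIds] } }), to resolve names
  const names = new Map(
    users.filter((u) => memberIds.has(u._id)).map((u) => [u._id, u.username])
  );

  // Query 3, like chatMessage.find({ chatRoomObjectId: { $in: [...roomIds] } })
  const messages = chatMessages.filter((m) => roomIds.has(m.chatRoomObjectId));

  // Join in application code into the shape the question asks for
  return rooms.map((room) => ({
    chatRoomObjectId: room._id,
    usersInThisChatRoom: room.usersInThisChatRoom.map((id) => ({
      contactName: names.get(id),
      contactUserId: id,
    })),
    chatMessages: messages
      .filter((m) => m.chatRoomObjectId === room._id)
      .map(({ chatRoomObjectId, ...rest }) => rest),
  }));
}

console.log(JSON.stringify(chatRoomsForUser("456"), null, 2));
```

This is three round trips instead of one; indexes on usersInThisChatRoom and chatRoomObjectId would be the natural candidates to keep queries 1 and 3 from scanning.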

Get child object before _id in mongo

I'm trying to get the document before the current document in a collection.
The 'problem' is that it is a child (an array element) of the document.
So let's say I have this document in the 'Shops' collection:
{
name: 'Test Shop',
invoices: [
{
_id: ObjectId("5c642436dc12625a909d8115"),
date: '2019-02-13 08:05:42.087Z',
value: 0
},
{
_id: ObjectId("5c6429bcc17f3d2e4c5dfb61"),
date: '2019-02-13 14:29:16.882Z',
value: 1
},
{
_id: ObjectId("5c642b32c17f3d2e4c5dfbdd"),
date: '2019-02-13 12:35:30.275Z',
value: 2
}
]
}
I have the latest invoice object with 'value: 2'
Now I want to fetch the object before this object. The object with 'value: 1'.
I'm trying to do that with this query, but it keeps returning the first object (I think it is the first result of the search):
db.getCollection('shops').find({
'invoices._id': {
$lte: ObjectId("5c642b32c17f3d2e4c5dfbdd")
}
}, {'invoices.$':1}).sort({'invoices.date':1})
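Is there a good way to fetch only the last result of the search, or a better query for this?
Use a $slice projection ({ $slice: -1 } returns the last element of the array; { $slice: [-2, 1] } would return the one before it):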
db.collection.find(
{ },
{ "invoices": { "$slice": -1 }}
)
or an $elemMatch projection:
db.collection.find(
{ },
{ "invoices": { "$elemMatch": { "_id": ObjectId("5c642b32c17f3d2e4c5dfbdd") }}}
)
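To make the two $slice forms concrete, here is a small in-memory sketch of how the projection picks elements (sliceProjection is a hypothetical helper, not a MongoDB API). It works on array position, so it assumes invoices are appended in chronological order; in the sample the dates are not in array order, so the "previous" invoice by position and by date can differ.

```javascript
// Sketch of $slice projection semantics on a plain array.
// n      -> first n elements (last |n| elements if n is negative)
// [s, l] -> start s elements in (|s| from the end if s is negative), return up to l
function sliceProjection(arr, spec) {
  if (typeof spec === "number") {
    return spec >= 0 ? arr.slice(0, spec) : arr.slice(spec);
  }
  const [skip, limit] = spec;
  const start = skip >= 0 ? skip : Math.max(arr.length + skip, 0);
  return arr.slice(start, start + limit);
}

const invoices = [
  { _id: "5c642436dc12625a909d8115", value: 0 },
  { _id: "5c6429bcc17f3d2e4c5dfb61", value: 1 },
  { _id: "5c642b32c17f3d2e4c5dfbdd", value: 2 },
];

console.log(sliceProjection(invoices, -1));      // latest invoice (value 2)
console.log(sliceProjection(invoices, [-2, 1])); // the one before it (value 1)
```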

Mongodb aggregate on subdocument in array

I am implementing a small application using mongodb as a backend. In this application I have a data structure where the documents will contain a field that contains an array of subdocuments.
I use the following use case as a basis:
http://docs.mongodb.org/manual/use-cases/inventory-management/
As you can see from the example, each document has a field called items, which is an array of subdocuments.
{
_id: 42,
last_modified: ISODate("2012-03-09T20:55:36Z"),
status: 'active',
items: [
{ sku: '00e8da9b', qty: 1, item_details: {...} },
{ sku: '0ab42f88', qty: 4, item_details: {...} }
]
}
This fits my needs perfectly, except for one problem:
I want to count each unique item (with "sku" as the unique identifier key) in the entire collection, where each document adds 1 to the count (multiple instances of the same "sku" in the same document still count as 1). E.g. I would like this result:
{ sku: '00e8da9b', doc_count: 1 },
{ sku: '0ab42f88', doc_count: 9 }
After reading up on MongoDB, I am quite confused about how to do this (fast) when you have a complex schema like the one described above. If I have understood the otherwise excellent documentation correctly, such an operation can perhaps be achieved using either the aggregation framework or the map/reduce framework, but this is where I need some input:
Which framework would be better suited to achieve the result I am looking for, given the complexity of the structure?
What kind of indexes would be preferred in order to gain the best possible performance out of the chosen framework?
MapReduce is slow, but it can handle very large data sets. The Aggregation framework, on the other hand, is a little quicker, but will struggle with large data volumes (MongoDB 2.6 later added the allowDiskUse option, which lets memory-hungry stages such as $group spill to disk, and map-reduce is deprecated as of MongoDB 5.0).
The trouble with the structure shown is that you need to "$unwind" the arrays to crack open the data. This means creating a new document for every array item, and the aggregation framework needs to do this in memory. So if you have 1000 documents with 100 array elements each, it will need to build a stream of 100,000 documents in order to group and count them.
You might want to consider whether a different schema layout would serve your queries better, but if you want to do it with the Aggregation framework, here's how you could do it (with some sample data, so the whole script will drop into the shell):
db.so.deleteMany({});
db.so.createIndex({ "items.sku": 1 });
db.so.insertMany([
{
_id: 42,
last_modified: ISODate("2012-03-09T20:55:36Z"),
status: 'active',
items: [
{ sku: '00e8da9b', qty: 1, item_details: {} },
{ sku: '0ab42f88', qty: 4, item_details: {} },
{ sku: '0ab42f88', qty: 4, item_details: {} },
{ sku: '0ab42f88', qty: 4, item_details: {} },
]
},
{
_id: 43,
last_modified: ISODate("2012-03-09T20:55:36Z"),
status: 'active',
items: [
{ sku: '00e8da9b', qty: 1, item_details: {} },
{ sku: '0ab42f88', qty: 4, item_details: {} },
]
},
]);
db.so.aggregate([
{ // optional filter to exclude inactive documents - can be removed
// you'll want an index on this if you use it too
$match: { status: "active" }
},
// unwind creates a doc for every array element
{ $unwind: "$items" },
{
$group: {
// group by unique (doc id, SKU) pair, because you only wanted to count a SKU once per doc id
_id: { _id: "$_id", sku: "$items.sku" },
}
},
{
$group: {
// group by unique SKU, and count them
_id: { sku:"$_id.sku" },
doc_count: { $sum: 1 },
}
}
]
// , { explain: true } // uncomment to see the query plan instead of the results
)
Note that I've $group'd twice, because you said that an SKU can only count once per document, so we need to first sort out the unique doc/sku pairs and then count them up.
If you want the output a little different (in other words, EXACTLY like in your sample) we can $project them.
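A plain-JavaScript imitation of the pipeline on the same two sample documents, to show why the double $group counts each SKU once per document. docCountPerSku is a hypothetical helper; the map of distinct (doc id, sku) pairs plays the role of the first $group, and the final reshaping to { sku, doc_count } is what a $project would do.

```javascript
// Sketch: the two sample documents from the script above, as plain objects.
const so = [
  { _id: 42, status: "active", items: [{ sku: "00e8da9b", qty: 1 }, { sku: "0ab42f88", qty: 4 }, { sku: "0ab42f88", qty: 4 }, { sku: "0ab42f88", qty: 4 }] },
  { _id: 43, status: "active", items: [{ sku: "00e8da9b", qty: 1 }, { sku: "0ab42f88", qty: 4 }] },
];

function docCountPerSku(docs) {
  // $match + $unwind + first $group: distinct (document _id, sku) pairs
  const pairs = new Map();
  for (const doc of docs) {
    if (doc.status !== "active") continue;
    for (const item of doc.items) {
      pairs.set(`${doc._id}|${item.sku}`, item.sku);
    }
  }
  // second $group: count the distinct pairs for each sku
  const counts = new Map();
  for (const sku of pairs.values()) {
    counts.set(sku, (counts.get(sku) || 0) + 1);
  }
  // final reshaping, like a $project to { sku, doc_count }
  return [...counts].map(([sku, doc_count]) => ({ sku, doc_count }));
}

console.log(docCountPerSku(so));
```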
With the latest mongo build (it may be true for other builds too), I've found that a slightly different version of cirrus's answer performs faster and consumes less memory. I don't know exactly why; it seems that with this version mongo has more room to optimize the pipeline.
db.so.aggregate([
{ $unwind: "$items" },
{
$group: {
// create an array of unique skus (a set) per document id
_id: { id: "$_id"},
sku: {$addToSet: "$items.sku"}
}
},
// unroll all sets
{ $unwind: "$sku" },
{
$group: {
// then count, per sku, how many document sets contain it
_id: { sku: "$sku" },
doc_count: { $sum: 1 },
}
}
])
The last $group is keyed on the sku alone: also grouping by "_id" (the document id) would give a count of 1 for every (id, sku) pair, so to match the per-SKU counts asked for in the question, the id must be left out of that key.
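The same counting done the $addToSet way, again as an in-memory sketch with a hypothetical helper (docCountViaSets): build the set of distinct SKUs per document, then count how many sets each SKU appears in. It models the logic only, not the server's execution or memory behavior.

```javascript
// Sketch: sample documents as plain objects (only the sku field matters here).
const carts = [
  { _id: 42, items: [{ sku: "00e8da9b" }, { sku: "0ab42f88" }, { sku: "0ab42f88" }, { sku: "0ab42f88" }] },
  { _id: 43, items: [{ sku: "00e8da9b" }, { sku: "0ab42f88" }] },
];

function docCountViaSets(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // $unwind + $group with $addToSet: the distinct skus of this document
    const skus = new Set(doc.items.map((item) => item.sku));
    // $unwind the set, then $group by sku with { $sum: 1 }
    for (const sku of skus) {
      counts.set(sku, (counts.get(sku) || 0) + 1);
    }
  }
  return [...counts].map(([sku, doc_count]) => ({ sku, doc_count }));
}

console.log(docCountViaSets(carts));
```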