MongoDB - simple sub query example

Given the data:
> db.parameters.find({})
{ "_id" : ObjectId("56cac0cd0b5a1ffab1bd6c12"), "name" : "Speed", "groups" : [ "123", "234" ] }
> db.groups.find({})
{ "_id" : "123", "name" : "Group01" }
{ "_id" : "234", "name" : "Group02" }
{ "_id" : "567", "name" : "Group03" }
I would like to supply a parameter _id and have a query return all groups whose _ids are in the groups array of the given document in the parameters collection.
The straightforward solution seems to be making several DB calls in PyMongo:
1. Get the parameter document from the parameters collection based on the supplied _id
2. For each element of its groups array, fetch the matching document from the groups collection
But this carries a lot of unnecessary overhead. I feel there must be a better, faster way to do this within MongoDB (without running custom JS in the DB). Or should I restructure my data by normalising it a bit (like a table of relationships), abandoning the document-based approach?
Again, please help me find a solution that works through the PyMongo interface.

You can do this within a single query using the aggregation framework. In particular you'd need to run an aggregation pipeline that uses the $lookup operator to do a left join from the parameters collection to the groups collection.
Consider running the following pipeline:
db.parameters.aggregate([
    { "$unwind": "$groups" },
    {
        "$lookup": {
            "from": "groups",
            "localField": "groups",
            "foreignField": "_id",
            "as": "grp"
        }
    },
    { "$unwind": "$grp" }
])
Sample Output
/* 1 */
{
    "_id" : ObjectId("56cac0cd0b5a1ffab1bd6c12"),
    "name" : "Speed",
    "groups" : "123",
    "grp" : {
        "_id" : "123",
        "name" : "Group01"
    }
}

/* 2 */
{
    "_id" : ObjectId("56cac0cd0b5a1ffab1bd6c12"),
    "name" : "Speed",
    "groups" : "234",
    "grp" : {
        "_id" : "234",
        "name" : "Group02"
    }
}
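From PyMongo, the same pipeline is just a list of plain Python dicts passed to aggregate(). A minimal sketch (the groups_for_parameter helper and its param_id argument are illustrative, not part of the original answer; param_id is assumed to be the bson ObjectId supplied by the caller):

```python
def groups_for_parameter(param_id):
    """Build the $lookup pipeline, restricted to one parameters document."""
    return [
        {"$match": {"_id": param_id}},  # select the one parameters doc
        {"$unwind": "$groups"},         # one output doc per group id
        {"$lookup": {                   # left join to the groups collection
            "from": "groups",
            "localField": "groups",
            "foreignField": "_id",
            "as": "grp",
        }},
        {"$unwind": "$grp"},            # flatten the joined array
    ]

# Usage (hypothetical): db.parameters.aggregate(groups_for_parameter(some_id))
```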
If your MongoDB server version does not support the $lookup pipeline operator, then you'd need to execute two queries as follows:
# get the group ids
ids = db.parameters.find_one({ "_id": ObjectId("56cac0cd0b5a1ffab1bd6c12") })["groups"]
# query the groups collection with the ids from previous query
db.groups.find({ "_id": { "$in": ids } })
EDIT: matched the field name in the aggregation query to the field name in example dataset (within the question)

Related

MongoDB Sorting: Equivalent Aggregation Query

I have the following students collection:
{ "_id" : ObjectId("5f282eb2c5891296d8824130"), "name" : "Rajib", "mark" : "1000" }
{ "_id" : ObjectId("5f282eb2c5891296d8824131"), "name" : "Rahul", "mark" : "1200" }
{ "_id" : ObjectId("5f282eb2c5891296d8824132"), "name" : "Manoj", "mark" : "1000" }
{ "_id" : ObjectId("5f282eb2c5891296d8824133"), "name" : "Saroj", "mark" : "1400" }
My requirement is to sort the collection on the 'mark' field in descending order, but the 'mark' field should not appear in the final result. The result should be:
{ "name" : "Saroj" }
{ "name" : "Rahul" }
{ "name" : "Rajib" }
{ "name" : "Manoj" }
The following query works fine:
db.students.find({},{"_id":0,"name":1}).sort({"mark":-1})
My MongoDB version is v4.2.8. The question is: what is the equivalent aggregation query? I tried the following two queries, but neither gives the desired result.
db.students.aggregate([{"$project":{"name":1,"_id":0}},{"$sort":{"mark":-1}}])
db.students.aggregate([{"$project":{"name":1,"_id":0,"mark":1}},{"$sort":{"mark":-1}}])
Why does it work in find()?
As per Cursor.Sort: when a set of results is both sorted and projected, the MongoDB query engine always applies the sorting first.
Why doesn't it work in aggregate()?
As per Aggregation Pipeline: the MongoDB aggregation pipeline consists of stages, and each stage transforms the documents as they pass through the pipeline. Pipeline stages do not need to produce one output document for every input document; e.g., some stages may generate new documents or filter out documents.
What you need to correct:
You should change the pipeline order: if you don't select the mark field in $project, it is no longer available to later stages, so a subsequent $sort has nothing to sort on.
db.students.aggregate([
    { "$sort": { "mark": -1 } },
    { "$project": { "name": 1, "_id": 0 } }
])
Playground: https://mongoplayground.net/p/xtgGl8AReeH
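The effect of the stage order can be simulated in plain Python over the sample documents (a sketch of the pipeline's logic, not MongoDB itself): sort while 'mark' is still present, then project it away.

```python
students = [
    {"name": "Rajib", "mark": "1000"},
    {"name": "Rahul", "mark": "1200"},
    {"name": "Manoj", "mark": "1000"},
    {"name": "Saroj", "mark": "1400"},
]

# Equivalent of $sort first: "mark" is still present on every document.
ranked = sorted(students, key=lambda d: d["mark"], reverse=True)

# Equivalent of $project afterwards: drop everything except "name".
result = [{"name": d["name"]} for d in ranked]

print([d["name"] for d in result])  # ['Saroj', 'Rahul', 'Rajib', 'Manoj']
```

Projecting first would remove the very key the sort needs, which is exactly why the second pipeline in the question returns the wrong order.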

How to find MongoDB documents in one collection not referenced by documents in another collection?

I am looking for an efficient way in MongoDB to determine which documents in one collection are not referenced by documents in another collection.
The database comprises two collections, inventory and tags, where some (not all) documents in inventory reference one of the tags documents:
{
    "_id" : ObjectId("5e8df3c02e197074f39f61ea"),
    "tag" : ObjectId("5e89a1af96d5d8b30aead768"),
    "ean" : "5707196199178",
    "location" : "shelf 1"
},
{
    "_id" : ObjectId("5e8df211727079cdc24e20e1"),
    "ean" : "5707196199178",
    "location" : "shelf 1"
}
The tags documents carry no reference back to documents in inventory:
{
    "_id" : ObjectId("5e7d174fc63ce5b0ca80b89a"),
    "nfc" : { "id" : "04:5f:ae:f2:c2:66:81" },
    "barcode" : { "code" : "29300310", "type" : "EAN8" }
},
{
    "_id" : ObjectId("5e89a1af96d5d8b30aead768"),
    "nfc" : { "id" : "04:48:af:f2:c2:66:80" },
    "barcode" : { "code" : "29300716", "type" : "EAN8" }
},
{
    "_id" : ObjectId("5e7d1756c63ce5b0ca80b89c"),
    "nfc" : { "id" : "04:02:ae:f2:c2:66:81" },
    "barcode" : { "code" : "29300648", "type" : "EAN8" }
}
Since not all documents in tags are used in inventory documents, I cannot simply have them as sub-documents.
Now I need to determine which of the tags documents are not referenced by any inventory document. I would prefer not to maintain back references from tags to inventory, to avoid the risk of inconsistencies (unless MongoDB can do this automatically?).
I'm very new to MongoDB, and from what I've learned so far I'm under the impression that a view is probably what I need. But I seem to lack the proper search terms to find examples that would help me understand enough to proceed. Maybe I need something different; I'm hoping for your input to point me in the right direction.
You need to perform a MongoDB aggregation with the $lookup operator, which allows two collections to be joined.
For tags documents that are not referenced by any inventory document, the joined field will be an empty array.
In the next stage, we filter for empty arrays with the $size operator.
Try the query below:
db.tags.aggregate([
    {
        $lookup: {
            from: "inventory",
            localField: "_id",
            foreignField: "tag",
            as: "join"
        }
    },
    {
        $match: {
            "join": { $size: 0 }
        }
    },
    {
        $project: { join: 0 }
    }
])
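The anti-join this pipeline performs amounts to a set difference, which can be sketched in plain Python over the sample data (string ids stand in for the ObjectIds):

```python
inventory = [
    {"_id": "5e8df3c02e197074f39f61ea", "tag": "5e89a1af96d5d8b30aead768",
     "ean": "5707196199178", "location": "shelf 1"},
    {"_id": "5e8df211727079cdc24e20e1",
     "ean": "5707196199178", "location": "shelf 1"},
]
tags = [
    {"_id": "5e7d174fc63ce5b0ca80b89a"},
    {"_id": "5e89a1af96d5d8b30aead768"},
    {"_id": "5e7d1756c63ce5b0ca80b89c"},
]

# Tags referenced by at least one inventory document.
referenced = {doc["tag"] for doc in inventory if "tag" in doc}

# Equivalent of $lookup + $match {$size: 0}: keep tags with no match.
unreferenced = [t for t in tags if t["_id"] not in referenced]
```

Running the real pipeline server-side avoids shipping both collections to the client to do this difference yourself.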

From two collections, how to filter unmatched data

In the DB I have some sample data as follows:
items(Collection name)
//Object 1
{
    "_id" : 1234,
    "itemCode" : 3001, // (Number)
    "category" : "Biscuts"
}
//Object 2
{
    "_id" : 1235,
    "itemCode" : 3002, // (Number)
    "category" : "Health products"
}
The above is the sample data in the items collection; there are many such objects, each with a unique item code.
orders(Collection name)
{
    "_id" : 1456,
    "customer" : "ram",
    "address" : "india",
    "type" : "order",
    "date" : "2018/08/20",
    "orderId" : "999",
    "itemcode" : "3001" // ('string')
}
The above is the orders sample data. This collection likewise has many objects, with repeating item codes and order IDs.
In the application we have a tab called "items not billed", which shows the items that were never used in any order. So from the above data, how can I show the items which were not used?
For example: from the above data, the resulting itemCode should be 3002, because that item is not used even once. How can I get this output with one DB query?
You can use the following aggregation in MongoDB 4.0 (which introduced the $toString operator used here).
db.items.aggregate([
    {
        $addFields: {
            itemCodeStr: { $toString: "$itemCode" }
        }
    },
    {
        $lookup: {
            from: "orders",
            localField: "itemCodeStr",
            foreignField: "itemcode",
            as: "matched-orders"
        }
    },
    {
        $match: {
            "matched-orders": []
        }
    }
])
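The type mismatch is the crux here: itemCode is a number in items but a string in orders, which is why the $addFields/$toString stage is needed before the join. The pipeline's logic can be sketched in plain Python over the sample data:

```python
items = [
    {"_id": 1234, "itemCode": 3001, "category": "Biscuts"},
    {"_id": 1235, "itemCode": 3002, "category": "Health products"},
]
orders = [
    {"_id": 1456, "orderId": "999", "itemcode": "3001"},
]

# Equivalent of $addFields + $toString: compare codes as strings.
ordered_codes = {o["itemcode"] for o in orders}

# Equivalent of $lookup + $match on an empty join: keep unmatched items.
not_billed = [i for i in items if str(i["itemCode"]) not in ordered_codes]

print([i["itemCode"] for i in not_billed])  # [3002]
```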

$lookup on Embedded Documents in MongoDB: Does order of values matter?

I have two similar collections within the same database which I am trying to merge using $lookup and the aggregate pipeline. Their _ids, which I'm using as the matching field, contain the same values, but in a different order:
Collection1:
{ "_id" : { "State" : "Vermont", "Race" : "Black American or African American" }, "Population" : 6456 }
Collection2:
{ "_id" : { "Race" : "Multiracial", "State" : "Arkansas" }, "Population" : 48996 }
I tried running the aggregate pipeline as follows:
db.Collection1.aggregate([{$lookup: {from: "Collection2", localField: "_id", foreignField: "_id", as: "Population"}}])
However, when I do that, I get:
{ "_id" : { "Race" : "Multiracial", "State" : "Arkansas" }, "Population" : [ ] }
I'd like to get the values for population within the array. I'm fairly new to MongoDB. Is there something wrong with my syntax for the aggregate command, or is it failing because 'Race' and 'State' are listed in a different order within the embedded document _id? Does the order of the values matter for matching on embedded documents?
Thank you so much for your time, and I appreciate any suggestions.
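Yes, the field order matters: BSON compares embedded documents field by field, in order, so { State, Race } and { Race, State } are different values and the $lookup on _id finds no match. The analogous behaviour can be illustrated with Python's OrderedDict, which (unlike a plain dict) is order-sensitive on equality:

```python
from collections import OrderedDict

# Plain Python dicts ignore insertion order when compared...
assert {"State": "Vermont", "Race": "Multiracial"} == \
       {"Race": "Multiracial", "State": "Vermont"}

# ...but OrderedDict, like BSON embedded documents, does not.
a = OrderedDict([("State", "Vermont"), ("Race", "Multiracial")])
b = OrderedDict([("Race", "Multiracial"), ("State", "Vermont")])
assert a != b  # different field order => not equal, so no $lookup match
```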

Save Subset of MongoDB Collection to Another Collection

I have a set like so:
{date: 20120101}
{date: 20120103}
{date: 20120104}
{date: 20120005}
{date: 20120105}
How do I save a subset of those documents with the date '20120105' to another collection?
i.e. db.subset.save(db.full_set.find({date: "20120105"}));
I would advise using the aggregation framework:
db.full_set.aggregate([ { $match: { date: "20120105" } }, { $out: "subset" } ])
It works about 100 times faster than forEach, at least in my case. This is because the entire aggregation pipeline runs in the mongod process, whereas a solution based on find() and insert() has to send all of the documents from the server to the client and then back again. This has a performance penalty even if the server and client are on the same machine.
Here's the shell version:
db.full_set.find({date:"20120105"}).forEach(function(doc){
    db.subset.insert(doc);
});
Note: As of MongoDB 2.6, the aggregation framework makes it possible to do this faster; see melan's answer for details.
Actually, there is an equivalent of SQL's insert into ... select from in MongoDB. First, you convert multiple documents into an array of documents; then you insert the array into the target collection
db.subset.insert(db.full_set.find({date:"20120105"}).toArray())
The most general solution is this:
Make use of the aggregation (answer given by @melan):
db.full_set.aggregate({$match:{your query here...}},{$out:"sample"})
db.sample.copyTo("subset")
This works even when there are documents in "subset" before the operation and you want to preserve those "old" documents and just insert a new subset into it.
Care must be taken, because the copyTo() command replaces the documents with the same _id.
There's no direct equivalent of SQL's insert into ... select from ....
You have to take care of it yourself. Fetch documents of interest and save them to another collection.
You can do it in the shell, but I'd use a small external script in Ruby. Something like this:
require 'mongo'
db = Mongo::Connection.new.db('mydb')
source = db.collection('source_collection')
target = db.collection('target_collection')
source.find(date: "20120105").each do |doc|
  target.insert doc
end
MongoDB's aggregate, together with the $out operator, allows saving a subset into a new collection. Here are the details:
$out takes the documents returned by the aggregation pipeline and writes them to a specified collection.
The $out operation creates a new collection in the current database if one does not already exist.
The collection is not visible until the aggregation completes.
If the aggregation fails, MongoDB does not create the collection.
Syntax:
{ $out: "<output-collection>" }
Example
A collection books contains the following documents:
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 }
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 }
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 }
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
The following aggregation operation pivots the data in the books collection to have titles grouped by authors and then writes the results to the authors collection.
db.books.aggregate([
    { $group : { _id : "$author", books: { $push: "$title" } } },
    { $out : "authors" }
])
After the operation, the authors collection contains the following documents:
{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }
{ "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }
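What $group with $push does here can be sketched in plain Python over the same books data:

```python
from collections import defaultdict

books = [
    {"_id": 8751, "title": "The Banquet", "author": "Dante", "copies": 2},
    {"_id": 8752, "title": "Divine Comedy", "author": "Dante", "copies": 1},
    {"_id": 8645, "title": "Eclogues", "author": "Dante", "copies": 2},
    {"_id": 7000, "title": "The Odyssey", "author": "Homer", "copies": 10},
    {"_id": 7020, "title": "Iliad", "author": "Homer", "copies": 10},
]

# Equivalent of { $group: { _id: "$author", books: { $push: "$title" } } }
by_author = defaultdict(list)
for b in books:
    by_author[b["author"]].append(b["title"])

# $out would then write these grouped documents to the authors collection.
authors = [{"_id": a, "books": t} for a, t in by_author.items()]
```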
For the question asked, use the following query, and you will get a new collection named 'col_20120105' in your database:
db.products.aggregate([
    { $match : { date : "20120105" } },
    { $out : "col_20120105" }
]);
You can also use $merge aggregation pipeline stage.
db.full_set.aggregate([
    { $match: { ... } },
    { $merge: {
        into: { db: 'your_db', coll: 'your_another_collection' },
        on: '_id',
        whenMatched: 'keepExisting',
        whenNotMatched: 'insert'
    }}
])
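From a driver such as PyMongo, the same $match + $merge pipeline is just a list of dicts. A sketch (the helper name and its arguments are illustrative, not an established API):

```python
def copy_subset_pipeline(query, target_db, target_coll):
    """Build a $match + $merge pipeline that upserts matching documents
    into the target collection, keeping existing docs on an _id clash."""
    return [
        {"$match": query},
        {"$merge": {
            "into": {"db": target_db, "coll": target_coll},
            "on": "_id",
            "whenMatched": "keepExisting",
            "whenNotMatched": "insert",
        }},
    ]

# Usage (hypothetical):
# db.full_set.aggregate(copy_subset_pipeline({"date": "20120105"}, "mydb", "subset"))
```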