Use MongoDB aggregation to find set intersection of two sets within the same document - mongodb

I'm trying to use the Mongo aggregation framework to find where there are records that have different unique sets within the same document. An example will best explain this:
Here is a document that is not my real data, but conceptually the same:
db.house.insert(
{
houseId : 123,
rooms: [{ name : 'bedroom',
owns : [
{name : 'bed'},
{name : 'cabinet'}
]},
{ name : 'kitchen',
owns : [
{name : 'sink'},
{name : 'cabinet'}
]}],
uses : [{name : 'sink'},
{name : 'cabinet'},
{name : 'bed'},
{name : 'sofa'}]
}
)
Notice that there are two hierarchies with similar items. It is also possible to use items that are not owned. I want to find documents like this one: where there is a house that uses something that it doesn't own.
So far I've built up the structure using the aggregate framework like below. This gets me to 2 sets of distinct items. However I haven't been able to find anything that could give me the result of a set intersection. Note that a simple count of set size will not work due to something like this: ['couch', 'cabinet'] compare to ['sofa', 'cabinet'].
{'$unwind':'$uses'}
{'$unwind':'$rooms'}
{'$unwind':'$rooms.owns'}
{'$group' : {_id:'$houseId',
use:{'$addToSet':'$uses.name'},
own:{'$addToSet':'$rooms.owns.name'}}}
produces:
{ _id : 123,
use : ['sink', 'cabinet', 'bed', 'sofa'],
own : ['bed', 'cabinet', 'sink']
}
How do I then find the set intersection of use and own in the next stage of the pipeline?

You were not very far from the full solution with aggregation framework - you needed one more thing before the $group step and that is something that would allow you to see if all the things that are being used match up with something that is owned.
Here is the full pipeline
> db.house.aggregate(
{'$unwind':'$uses'},
{'$unwind':'$rooms'},
{'$unwind':'$rooms.owns'},
{$project: { _id:0,
houseId:1,
uses:"$uses.name",
isOkay:{$cond:[{$eq:["$uses.name","$rooms.owns.name"]}, 1, 0]}
}
},
{$group: { _id:{house:"$houseId",item:"$uses"},
hasWhatHeUses:{$sum:"$isOkay"}
}
},
{$match:{hasWhatHeUses:0}})
and its output on your document
{
"result" : [
{
"_id" : {
"house" : 123,
"item" : "sofa"
},
"hasWhatHeUses" : 0
}
],
"ok" : 1
}
Explanation - once you unwrap both arrays you now want to flag the elements where used item is equal to owned item and give them a non-0 "score". Now when you regroup things back by houseId you can check if any used items didn't get a match. Using 1 and 0 for score allows you to do a sum and now a match for item which has sum 0 means it was used but didn't match anything in "owned". Hope you enjoyed this!

So here is a solution not using the aggregation framework. This uses the $where operator and javascript. This feels much more clunky to me, but it seems to work so I wanted to put it out there if anyone else comes across this question.
db.houses.find({'$where':
function() {
var ownSet = {};
var useSet = {};
for (var i=0;i<obj.uses.length;i++){
useSet[obj.uses[i].name] = true;
}
for (var i=0;i<obj.rooms.length;i++){
var room = obj.rooms[i];
for (var j=0;j<room.owns.length;j++){
ownSet[room.owns[j].name] = true;
}
}
for (var prop in ownSet) {
if (ownSet.hasOwnProperty(prop)) {
if (!useSet[prop]){
return true;
}
}
}
for (var prop in useSet) {
if (useSet.hasOwnProperty(prop)) {
if (!ownSet[prop]){
return true;
}
}
}
return false
}
})

For MongoDB 2.6+ Only
As of MongoDB 2.6, there are set operations available in the project pipeline stage. The way to answer this problem with the new operations is:
db.house.aggregate([
{'$unwind':'$uses'},
{'$unwind':'$rooms'},
{'$unwind':'$rooms.owns'},
{'$group' : {_id:'$houseId',
use:{'$addToSet':'$uses.name'},
own:{'$addToSet':'$rooms.owns.name'}}},
{'$project': {int:{$setIntersection:["$use","$own"]}}}
]);

Related

How do I update values in a nested array?

I would like to preface this with saying that english is not my mother tongue, if any of my explanations are vague or don't make sense, please let me know and I will attempt to make them clearer.
I have a document containing some nested data. Currently product and customer are arrays, I would prefer to have them as straight up ObjectIDs.
{
"_id" : ObjectId("5bab713622c97440f287f2bf"),
"created_at" : ISODate("2018-09-26T13:44:54.431Z"),
"prod_line" : ObjectId("5b878e4c22c9745f1090de66"),
"entries" : [
{
"order_number" : "123",
"product" : [
ObjectId("5ba8a0e822c974290b2ea18d")
],
"customer" : [
ObjectId("5b86a20922c9745f1a6408d4")
],
"quantity" : "14"
},
{
"order_number" : "456",
"product" : [
ObjectId("5b878ed322c9745f1090de6c")
],
"customer" : [
ObjectId("5b86a20922c9745f1a6408d5")
],
"quantity" : "12"
}
]
}
I tried using the following query to update it, however that proved unsuccessful as Mongo didn't behave quite as I had expected.
db.Document.find().forEach(function(doc){
doc.entries.forEach(function(entry){
var entry_id = entry.product[0]
db.Document.update({_id: doc._id}, {$set:{'product': entry_id}});
print(entry_id)
})
})
With this query it sets product in the root of the object, not quite what I had hoped for. What I was hoping to do was to iterate through entries and change each individual product and customer to be only their ObjectId and not an array. Is it possible to do this via the mongo shell or do I have to look for another way to accomplish this? Thanks!
In order to accomplish your specified behavior, you just need to modify your query structure a bit. Take a look here for the specific MongoDB documentation on how to accomplish this. I will also propose an update to your code below:
db.Document.find().forEach(function(doc) {
doc.entries.forEach(function(entry, index) {
var productElementKey = 'entries.' + index + '.product';
var productSetObject = {};
productSetObject[productElementKey] = entry.product[0];
db.Document.update({_id: doc._id}, {$set: productSetObject});
print(entry_id)
})
})
The problem that you were having is that you were not updating the specific element within the entries array, but rather adding a new key to the top-level of the document named product. Generally, in order to set the value of an inner document within an array, you need to specify the array key first (entries in this case) and the inner document key second (product in this case). Since you are trying to set specific elements within the entries array, you need to also specify the index in your query object, I have specified above.
In order to update the customer key in the inner documents, simply switch out the product for customer in my above code.
You're trying to add a property 'product' directly into your document with this line
db.Document.update({_id: doc._id}, {$set:{'product': entry_id}});
Try to modify all your entries first, then update your document with this new array of entries.
db.Document.find().forEach(function(doc){
let updatedEntries = [];
doc.entries.forEach(function(entry){
let newEntry = {};
newEntry["order_number"] = entry.order_number;
newEntry["product"] = entry.product[0];
newEntry["customer"] = entry.customer[0];
newEntry["quantity"] = entry.quantity;
updatedEntries.push(newEntry);
})
db.Document.update({_id: doc._id}, {$set:{'entries': updatedEntries}});
})
You'll need to enumerate all the documents and then update the documents one and a time with the value store in the first item of the array for product and customer from each entry:
db.documents.find().snapshot().forEach(function (elem) {
elem.entries.forEach(function(entry){
db.documents.update({
_id: elem._id,
"entries.order_number": entry.order_number
}, {
$set: {
"entries.$.product" : entry.product[0],
"entries.$.customer" : entry.customer[0]
}
}
);
});
});
Instead of doing 2 updates each time you could possibly use the filtered positional operator to do all updates to all arrays items within one update query.

How to get multiple document using array of MongoDb id?

I have an array of ids and I want to get all document of them at once. For that I am writing but it return 0 records.
How can I search using multiple Ids ?
db.getCollection('feed').find({"_id" : { "$in" : [
"55880c251df42d0466919268","55bf528e69b70ae79be35006" ]}})
I am able to get records by passing single id like
db.getCollection('feed').find({"_id":ObjectId("55880c251df42d0466919268")})
MongoDB is type sensitive, which means 1 is different with '1', so are "55880c251df42d0466919268" and ObjectId("55880c251df42d0466919268"). The later one is in ObjectID type but not str, and also is the default _id type of MongoDB document.
You can find more information about ObjectID here.
Just try:
db.getCollection('feed').find({"_id" : {"$in" : [ObjectId("55880c251df42d0466919268"), ObjectId("55bf528e69b70ae79be35006")]}});
I believe you are missing the ObjectId. Try this:
db.feed.find({
_id: {
$in: [ObjectId("55880c251df42d0466919268"), ObjectId("55bf528e69b70ae79be35006")]
}
});
For finding records of multiple documents you have to use "$in" operator
Although your query is fine, you just need to add ObjectId while finding data for Ids
db.feed.find({
"_id" : {
"$in" :
[ObjectId("55880c251df42d0466919268"),
ObjectId("55bf528e69b70ae79be35006")
]
}
});
Just I have use it, and is working fine:
module.exports.obtenerIncidencias386RangoDias = function (empresas, callback) {
let arraySuc_Ids = empresas.map((empresa)=>empresa.sucursal_id)
incidenciasModel.find({'sucursal_id':{$in:arraySuc_Ids}
}).sort({created_at: 'desc'}).then(
(resp)=> {
callback(resp)}
)
};
where:
empresas = ["5ccc642f9e789820146e9cb0","5ccc642f9e789820146e9bb9"]
The Id's need to be in this format : ObjectId("id").
you can use this to transform your string ID's :
const ObjectId = require('mongodb').ObjectId;
store array list in var and pass that array list to find function
var list=db.collection_name.find()
db.collection_name.find({_id:{$in:list}})
db.getCollection({your_collection_name}).find({
"_id" : {
"$in" :
[({value1}),
({value2})
]
}
})

MongoDB findOne query depends of result

lets say that I have collection of something and record can look like that:
{
'_id' : MongoId('xxxxxx'),
'dir' : '/home/',
'category' : 'catname',
'someData' : 'very important info'
}
Is it possible to make only one query like collection.findOne({????}, {'someData' : 1}); to find colection matching this:
find by dir, if not found search for category name, if there is still
nothing find collection with category name is 'default'
or at least, can I say in this query that I want first match dir and if not found I want match category
collection.findOne({
$or : [
{'dir' : 'someCondition'},
{'category' : 'someCondition'}
]
});
Try this:
db.collection.aggregate([
{ '$project':
{
//Keep existing fields
'_id':1,
'dir':1,
'category':1,
'someData':1,
//Compute some new values
'sameDir': {'$eq':['$dir', 'SOMEDIR']},
'sameCategory': {'$eq':['$dir', 'SOMECATEGORY']},
'defaultCategory': {'$eq':['$dir', 'default']},
}
},
{ '$sort':
{
'sameDir': -1,
'sameCategory': -1,
'defaultCategory': -1,
}
},
{ '$limit': 1 }
])
The $sort will keep true values first, so a directory match will come first, followed by a category match, finally followed by a default category. The $limit:1 will keep the one on top (ie. the best match). Of course, make sure to input your own SOMEDIR and SOMECATEGORY values.

Ordering fields from find query with projection

I have a Mongo find query that works well to extract specific fields from a large document like...
db.profiles.find(
{ "profile.ModelID" : 'LZ241M4' },
{
_id : 0,
"profile.ModelID" : 1,
"profile.AVersion" : 2,
"profile.SVersion" : 3
}
);
...this produces the following output. Note how the SVersion comes before the AVersion in the document even though my projection asked for AVersion before SVersion.
{ "profile" : { "ModelID" : "LZ241M4", "SVersion" : "3.5", "AVersion" : "4.0.3" } }
{ "profile" : { "ModelID" : "LZ241M4", "SVersion" : "4.0", "AVersion" : "4.0.3" } }
...the problem is that I want the output to be...
{ "profile" : { "ModelID" : "LZ241M4", "AVersion" : "4.0.3", "SVersion" : "3.5" } }
{ "profile" : { "ModelID" : "LZ241M4", "AVersion" : "4.0.3", "SVersion" : "4.0" } }
What do I have to do get the Mongo JavaScript shell to present the results of my query in the field order that I specify?
I have achieved it by projecting the fields using aliases, instead of including and excluding by 0 and 1s.
Try this:
{
_id : 0,
"profile.ModelID" :"$profile.ModelID",
"profile.AVersion":"$profile.AVersion",
"profile.SVersion":"$profile.SVersion"
}
I get it now. You want to return results ordered by "fields" rather the value of a fields.
Simple answer is that you can't do this. Maybe its possible with the new aggregation framework. But this seems overkill just to order fields.
The second object in a find query is for including or excluding returned fields not for ordering them.
{
_id : 0, // 0 means exclude this field from results
"profile.ModelID" : 1, // 1 means include this field in the results
"profile.AVersion" :2, // 2 means nothing
"profile.SVersion" :3, // 3 means nothing
}
Last point, you shouldn't need to do this, who cares what order the fields come-back in.
You application should be able to make use of the fields it needs regardless of the order the fields are in.
Another solution I applied to achieve this is the following:
db.profiles
.find({ "profile.ModelID" : 'LZ241M4' })
.toArray()
.map(doc => ({
profile: {
ModelID: doc.profile.ModelID,
AVersion: doc.profile.AVersion,
SVersion: doc.profile.SVersion
}
}))
Since version 2.6 (that came out in 2014) MongoDB preserves the order of the document fields following the write operation (source).
P.S. If you are using Python you might find this interesting.

In MongoDB mapreduce, how can I flatten the values object?

I'm trying to use MongoDB to analyse Apache log files. I've created a receipts collection from the Apache access logs. Here's an abridged summary of what my models look like:
db.receipts.findOne()
{
"_id" : ObjectId("4e57908c7a044a30dc03a888"),
"path" : "/videos/1/show_invisibles.m4v",
"issued_at" : ISODate("2011-04-08T00:00:00Z"),
"status" : "200"
}
I've written a MapReduce function that groups all data by the issued_at date field. It summarizes the total number of requests, and provides a breakdown of the number of requests for each unique path. Here's an example of what the output looks like:
db.daily_hits_by_path.findOne()
{
"_id" : ISODate("2011-04-08T00:00:00Z"),
"value" : {
"count" : 6,
"paths" : {
"/videos/1/show_invisibles.m4v" : {
"count" : 2
},
"/videos/1/show_invisibles.ogv" : {
"count" : 3
},
"/videos/6/buffers_listed_and_hidden.ogv" : {
"count" : 1
}
}
}
}
How can I make the output look like this instead:
{
"_id" : ISODate("2011-04-08T00:00:00Z"),
"count" : 6,
"paths" : {
"/videos/1/show_invisibles.m4v" : {
"count" : 2
},
"/videos/1/show_invisibles.ogv" : {
"count" : 3
},
"/videos/6/buffers_listed_and_hidden.ogv" : {
"count" : 1
}
}
}
It's not currently possible, but I would suggest voting for this case: https://jira.mongodb.org/browse/SERVER-2517.
Taking the best from previous answers and comments:
db.items.find().hint({_id: 1}).forEach(function(item) {
db.items.update({_id: item._id}, item.value);
});
From http://docs.mongodb.org/manual/core/update/#replace-existing-document-with-new-document
"If the update argument contains only field and value pairs, the update() method replaces the existing document with the document in the update argument, except for the _id field."
So you need neither to $unset value, nor to list each field.
From https://docs.mongodb.com/manual/core/read-isolation-consistency-recency/#cursor-snapshot
"MongoDB cursors can return the same document more than once in some situations. ... use a unique index on this field or these fields so that the query will return each document no more than once. Query with hint() to explicitly force the query to use that index."
AFAIK, by design Mongo's map reduce will spit results out in "value tuples" and I haven't seen anything that will configure that "output format". Maybe the finalize() method can be used.
You could try running a post-process that will reshape the data using
results.find({}).forEach( function(result) {
results.update({_id: result._id}, {count: result.value.count, paths: result.value.paths})
});
Yep, that looks ugly. I know.
You can do Dan's code with a collection reference:
function clean(collection) {
collection.find().forEach( function(result) {
var value = result.value;
delete value._id;
collection.update({_id: result._id}, value);
collection.update({_id: result.id}, {$unset: {value: 1}} ) } )};
A similar approach to that of #ljonas but no need to hardcode document fields:
db.results.find().forEach( function(result) {
var value = result.value;
delete value._id;
db.results.update({_id: result._id}, value);
db.results.update({_id: result.id}, {$unset: {value: 1}} )
} );
All the proposed solutions are far from optimal. The fastest you can do so far is something like:
var flattenMRCollection=function(dbName,collectionName) {
var collection=db.getSiblingDB(dbName)[collectionName];
var i=0;
var bulk=collection.initializeUnorderedBulkOp();
collection.find({ value: { $exists: true } }).addOption(16).forEach(function(result) {
print((++i));
//collection.update({_id: result._id},result.value);
bulk.find({_id: result._id}).replaceOne(result.value);
if(i%1000==0)
{
print("Executing bulk...");
bulk.execute();
bulk=collection.initializeUnorderedBulkOp();
}
});
bulk.execute();
};
Then call it:
flattenMRCollection("MyDB","MyMRCollection")
This is WAY faster than doing sequential updates.
While experimenting with Vincent's answer, I found a couple of problems. Basically, if you perform updates within a foreach loop, this will move the document to the end of the collection and the cursor will reach that document again (example). This can be circumvented if $snapshot is used. Hence, I am providing a Java example below.
final List<WriteModel<Document>> bulkUpdate = new ArrayList<>();
// You should enable $snapshot if performing updates within foreach
collection.find(new Document().append("$query", new Document()).append("$snapshot", true)).forEach(new Block<Document>() {
#Override
public void apply(final Document document) {
// Note that I used incrementing long values for '_id'. Change to String if
// you used string '_id's
long docId = document.getLong("_id");
Document subDoc = (Document)document.get("value");
WriteModel<Document> m = new ReplaceOneModel<>(new Document().append("_id", docId), subDoc);
bulkUpdate.add(m);
// If you used non-incrementing '_id's, then you need to use a final object with a counter.
if(docId % 1000 == 0 && !bulkUpdate.isEmpty()) {
collection.bulkWrite(bulkUpdate);
bulkUpdate.removeAll(bulkUpdate);
}
}
});
// Fixing bug related to Vincent's answer.
if(!bulkUpdate.isEmpty()) {
collection.bulkWrite(bulkUpdate);
bulkUpdate.removeAll(bulkUpdate);
}
Note : This snippet takes an average of 7.4 seconds to execute on my machine with 100k records and 14 attributes (IMDB dataset). Without batching, it takes an average of 25.2 seconds.