Say I have this model:
{
    _id: 1,
    ref: '1',
    children: [
        {
            ref: '1.1',
            grandchildren: [
                {
                    ref: '1.1.1',
                    visible: true
                }
            ]
        }
    ]
}
I'm aware that the positional operator for nested arrays isn't available yet:
https://jira.mongodb.org/browse/SERVER-831
but I wondered whether it's possible to atomically update a document in the nested array?
In my example, I'd like to update the visible flag to false for the document with ref 1.1.1.
I have the children ref == '1.1' and the grandchildren ref == '1.1.1'.
Thanks.
Yes, this is possible, but only if you know beforehand the index of the children array element that holds the grandchildren object to be updated; the update query can then use the positional operator as follows:
db.collection.update(
    {
        "children.ref": "1.1",
        "children.grandchildren.ref": "1.1.1"
    },
    {
        "$set": {
            "children.0.grandchildren.$.visible": false
        }
    }
)
However, if you don't know the array index positions beforehand, you should consider creating the $set conditions dynamically by using MapReduce. The basic idea with MapReduce is that it uses JavaScript as its query language, but this tends to be considerably slower than the aggregation framework and is not recommended for real-time data analysis.
In your MapReduce operation, you need to define a couple of steps: the map step (which applies an operation to every document in the collection; the operation can either do nothing or emit some object with keys and projected values) and the reduce step (which takes the list of emitted values and reduces it to a single element).
For the map step, you ideally want to emit, for every document in the collection, the index of each children array element along with a key that contains the $set paths.
Your reduce step does nothing and is simply defined as var reduce = function() {};
The final step of the MapReduce operation then writes a separate collection, update_collection, that contains the emitted array objects along with a field holding the $set paths. This collection can be refreshed periodically by re-running the MapReduce operation on the original collection.
Altogether, this MapReduce method would look like:
var map = function () {
    for (var i = 0; i < this.children.length; i++) {
        emit(
            {
                "_id": this._id,
                "index": i
            },
            {
                "index": i,
                "children": this.children[i],
                "update": {
                    "ref": "children." + i.toString() + ".grandchildren.$.ref",
                    "visible": "children." + i.toString() + ".grandchildren.$.visible"
                }
            }
        );
    }
};
var reduce = function(){};
db.collection.mapReduce(
    map,
    reduce,
    {
        "out": {
            "replace": "update_collection"
        }
    }
);
You can then use the cursor from the db.update_collection.find() method to iterate over and update your collection accordingly:
var cur = db.update_collection.find(
    {
        "value.children.ref": "1.1",
        "value.children.grandchildren.ref": "1.1.1"
    }
);
// Iterate through results and update using the update query object set dynamically by using the array-index syntax.
while (cur.hasNext()) {
    var doc = cur.next();
    var update = { "$set": {} };
    // set the update query object
    update["$set"][doc.value.update.visible] = false;
    db.collection.update(
        {
            "children.ref": "1.1",
            "children.grandchildren.ref": "1.1.1"
        },
        update
    );
}
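Worth noting: the SERVER-831 limitation referenced in the question was eventually resolved in MongoDB 3.6 by the filtered positional operator $[<identifier>] together with the arrayFilters option, which makes this a single atomic update with no need to know the array indexes beforehand. A sketch, assuming a 3.6+ server:
db.collection.update(
    { "_id": 1 },
    // update the matching grandchild inside the matching child in one statement
    { "$set": { "children.$[c].grandchildren.$[g].visible": false } },
    { "arrayFilters": [ { "c.ref": "1.1" }, { "g.ref": "1.1.1" } ] }
)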
I've tried several ways of creating an aggregation pipeline which returns just the matching entries from a document's embedded array, and have not found a practical way to do this.
Is there some MongoDB feature which would avoid my very clumsy and error-prone approach?
A document in the 'workshop' collection looks like this...
{
    "_id": ObjectId("57064a294a54b66c1f961aca"),
    "type": "normal",
    "version": "v1.4.5",
    "invitations": [],
    "groups": [
        {
            "_id": ObjectId("57064a294a54b66c1f961acb"),
            "role": "facilitator"
        },
        {
            "_id": ObjectId("57064a294a54b66c1f961acc"),
            "role": "contributor"
        },
        {
            "_id": ObjectId("57064a294a54b66c1f961acd"),
            "role": "broadcaster"
        },
        {
            "_id": ObjectId("57064a294a54b66c1f961acf"),
            "role": "facilitator"
        }
    ]
}
Each entry in the groups array provides a unique ID so that a group member is assigned the given role in the workshop when they hit a URL with that salted ID.
Given an _id matching an entry in the groups array, like ObjectId("57064a294a54b66c1f961acb"), I need to return a single record like this from the aggregation pipeline, essentially returning only the matching entry from the embedded groups array:
{
    "_id": ObjectId("57064a294a54b66c1f961acb"),
    "role": "facilitator",
    "workshopId": ObjectId("57064a294a54b66c1f961aca")
}
In this example, the workshopId has been added as an extra field to identify the parent document, but the rest should be ALL the fields from the original group entry having the matching _id.
The approach I have adopted can just about achieve this but has lots of problems and is probably inefficient (with repetition of the filter clause).
return workshopCollection.aggregate([
    { $match: { groups: { $elemMatch: { _id: groupId } } } },
    { $unwind: "$groups" },
    { $match: { "groups._id": groupId } },
    { $project: {
        _id: "$groups._id",
        role: "$groups.role",
        workshopId: "$_id"
    } }
]).toArray();
Worse, since it explicitly includes named fields from the entry, it will omit any future fields that are added to the records. I also can't generalise this lookup operation to the case of 'invitations' or other embedded named arrays unless I know what the array entries' fields are in advance.
I have wondered if using the $ or $elemMatch operators within a $project stage of the pipeline is the right approach, but so far they have either been ignored or have triggered operator-validity errors when running the pipeline.
QUESTION
Is there another aggregation operator or alternative approach which would help me with this fairly mainstream problem - to return only the matching entries from a document's array?
The implementation below can handle arbitrary queries, serves results as a 'top-level document' and avoids duplicate filtering in the pipeline.
function retrieveArrayEntry(collection, arrayName, itemMatch) {
    var match = {};
    match[arrayName] = { $elemMatch: itemMatch };
    var project = {};
    project[arrayName + ".$"] = true;
    return collection.findOne(
        match,
        project
    ).then(function (doc) {
        if (doc !== null) {
            var result = doc[arrayName][0];
            result._docId = doc._id;
            return result;
        }
        else {
            return null;
        }
    });
}
It can be invoked like so...
retrieveArrayEntry(workshopCollection, "groups", {_id:ObjectId("57064a294a54b66c1f961acb")})
However, it relies on the collection's findOne(...) method instead of aggregate(...), so it is limited to serving the first matching array entry from the first matching document. Projections referencing an array match clause are apparently not possible through aggregate(...) in the way they are through the findXXX() methods.
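If you can rely on MongoDB 3.2 or newer, the $filter and $arrayElemAt aggregation operators sidestep that limitation: the pipeline returns only the matching entry while keeping all of its fields, and states the match condition just once. A sketch under that version assumption (note the matching entry lands under a group field rather than flattened to the top level):
return workshopCollection.aggregate([
    { $match: { "groups._id": groupId } },
    { $project: {
        _id: 0,
        workshopId: "$_id",
        // keep the first (and only) array entry whose _id matches
        group: { $arrayElemAt: [
            { $filter: {
                input: "$groups",
                as: "g",
                cond: { $eq: ["$$g._id", groupId] }
            } },
            0
        ] }
    } }
]).toArray();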
A still more general (but more confusing and less efficient) implementation allows retrieval of multiple matching documents and subdocuments. It works around MongoDB's inconsistent syntax for document versus subdocument matching through the unpackMatch helper, so that an incorrect 'equality' criterion, e.g. ...
{greetings:{_id:ObjectId("437908743")}}
...gets translated into the required syntax for a 'match' criterion (as discussed at Within a mongodb $match, how to test for field MATCHING, rather than field EQUALLING)...
{"greetings._id":ObjectId("437908743")}
Leading to the following implementation...
function unpackMatch(pathPrefix, match) {
    var unpacked = {};
    Object.keys(match).map(function (key) {
        unpacked[pathPrefix + "." + key] = match[key];
    });
    return unpacked;
}
function retrieveArrayEntries(collection, arrayName, itemMatch) {
    var matchDocs = {},
        projectItems = {},
        unwindItems = {},
        matchUnwoundByMap = {};
    matchDocs.$match = {};
    matchDocs.$match[arrayName] = { $elemMatch: itemMatch };
    projectItems.$project = {};
    projectItems.$project[arrayName] = true;
    unwindItems.$unwind = "$" + arrayName;
    matchUnwoundByMap.$match = unpackMatch(arrayName, itemMatch);
    return collection.aggregate([matchDocs, projectItems, unwindItems, matchUnwoundByMap]).toArray().then(function (docs) {
        return docs.map(function (doc) {
            var result = doc[arrayName];
            result._docId = doc._id;
            return result;
        });
    });
}
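Invocation mirrors the single-entry helper, except that the returned promise resolves to an array of matching subdocuments, each annotated with its parent document's _id:
retrieveArrayEntries(workshopCollection, "groups", { _id: ObjectId("57064a294a54b66c1f961acb") })
    .then(function (entries) {
        // each entry is a matching 'groups' subdocument with an extra _docId field
        console.log(entries);
    });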
I currently have a collection that follows a format like this:
{ "_id": ObjectId(...),
"name" : "Name",
"red": 0,
"blue": 0,
"yellow": 1,
"green": 0,
...}
and so on (a bunch of colors). What I would like to do is create a new array field named colors, whose elements are those colors that have a value of 1.
For example:
{ "_id": ObjectId(...),
"name" : "Name",
"colors": ["yellow"]
}
Is this something I can do in the Mongo shell, or should I do it in a program?
I'm pretty sure I can do it using Python, but I'm having difficulty doing it directly in the shell. If it can be done in the shell, can anyone point me in the right direction?
Thanks.
Yes, it can easily be done in the shell, or by adapting the example below to any language.
The key here is to look at the fields that are colors, then construct an update statement that removes those fields from the document while testing each one to see if it qualifies for inclusion in the array, adding the array update to the same statement:
var bulk = db.collection.initializeOrderedBulkOp(),
    count = 0;
db.collection.find().forEach(function (doc) {
    doc.colors = doc.colors || [];
    var update = { "$unset": {} };
    Object.keys(doc).filter(function (key) {
        // anchor the alternation so only these exact keys are excluded
        return !/^(_id|name|colors)$/.test(key);
    }).forEach(function (key) {
        update.$unset[key] = "";
        if (doc[key] == 1)
            doc.colors.push(key);
    });
    update["$addToSet"] = { "colors": { "$each": doc.colors } };
    bulk.find({ "_id": doc._id }).updateOne(update);
    count++;
    if (count % 1000 == 0) {
        bulk.execute();
        bulk = db.collection.initializeOrderedBulkOp();
    }
});
if (count % 1000 != 0)
    bulk.execute();
The Bulk Operations usage means that batches of updates are sent rather than one request and response per document, so this processes much faster than issuing singular updates back and forth.
The main operators here are $unset to remove the existing fields and $addToSet to add the new evaluated array. Both are built up by cycling through the keys of the document that make up the possible colors, excluding the keys you don't want to modify via a regex filter.
Also using $addToSet and this line:
doc.colors = doc.colors || [];
with the purpose of making sure that if any document was already partially converted or otherwise touched by a code change that had already started storing the correct array, those documents are not adversely affected or overwritten by the update process.
tl;dr, spoiler
MongoDB's shell has access to some JavaScript-like methods on its objects. You can query your collection with db.yourCollectionName.find(), which returns a cursor (cursor methods). Then iterate through it to get each document, iterate through each document's keys, conditionally filter out keys like _id and name, and then check whether the value is 1, collecting the qualifying keys somewhere.
Once done, you'd probably want to use db.yourCollectionName.update() or db.yourCollectionName.findAndModify() to find the record by _id and use $set to add a new field whose value is the collection of keys.
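A minimal sketch of that approach in the shell, assuming the collection is named yourCollectionName and skipping the bulk batching shown in the answer above:
db.yourCollectionName.find().forEach(function (doc) {
    var colors = [];
    Object.keys(doc).forEach(function (key) {
        // keep only color fields flagged with 1; _id and name are not colors
        if (key !== "_id" && key !== "name" && doc[key] === 1)
            colors.push(key);
    });
    db.yourCollectionName.update({ _id: doc._id }, { $set: { colors: colors } });
});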
I have a collection of documents like:
{
    "things": {
        "thing1": "red",
        "thing2": "blue",
        "thing3": "green"
    }
}
I want to perform a query on this collection to determine which documents have any keys in things that match a certain value. Is this possible?
EDIT: for conciseness
If you don't know what the keys will be and you need the query to be interactive, then you'll need to use the (notoriously performance-challenged) $where operator, like so (in the shell):
db.test.find({$where: function() {
    for (var field in this.things) {
        if (this.things[field] == "red") return true;
    }
    return false;
}})
If you have a large collection, this may be too slow for your purposes, but it's your only option if your set of keys is unknown.
MongoDB 3.6 Update
You can now do this without $where by using the $objectToArray aggregation operator:
db.test.aggregate([
    // Project things as a key/value array, along with the original doc
    {$project: {
        array: {$objectToArray: '$things'},
        doc: '$$ROOT'
    }},
    // Match the docs with a field value of 'red'
    {$match: {'array.v': 'red'}},
    // Re-project the original doc
    {$replaceRoot: {newRoot: '$doc'}}
])
I'd suggest a schema change so that you can actually do reasonable queries in MongoDB.
From:
{
    "userId": "12347",
    "settings": {
        "SettingA": "blue",
        "SettingB": "blue",
        "SettingC": "green"
    }
}
to:
{
    "userId": "12347",
    "settings": [
        { name: "SettingA", value: "blue" },
        { name: "SettingB", value: "blue" },
        { name: "SettingC", value: "green" }
    ]
}
Then, you could index on "settings.value", and do a query like:
db.settings.ensureIndex({ "settings.value" : 1})
db.settings.find({ "settings.value" : "blue" })
The change really is simple: it moves the setting name and setting value into fully indexable fields and stores the list of settings as an array.
If you can't change the schema, you could try #JohnnyHK's solution, but be warned that it's basically worst case in terms of performance and it won't work effectively with indexes.
Sadly, none of the previous answers address the fact that MongoDB documents can contain nested values in arrays or nested objects.
This query handles those cases correctly:
{$where: function() {
    var deepIterate = function (obj, value) {
        for (var field in obj) {
            if (obj[field] == value) {
                return true;
            }
            var found = false;
            if (typeof obj[field] === 'object') {
                found = deepIterate(obj[field], value);
                if (found) { return true; }
            }
        }
        return false;
    };
    return deepIterate(this, "573c79aef4ef4b9a9523028f")
}}
Since calling typeof on an array or a nested object returns 'object', the query recurses into all nested elements and keeps going until the key with the value is found.
You can try the previous answers on a nested value and the results will be far from what you want.
Stringifying the whole object and searching the string would hurt performance: it creates a copy of the object as a string in RAM (using more memory) and then has to scan that entire string, even though the function context already has the object loaded.
The query itself can work with objectId, string, int and any basic javascript type you wish.
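To actually run it, the clause goes inside an ordinary find call; for instance, to match any document containing the string 'red' at any depth (a condensed version of the same helper):
db.test.find({$where: function() {
    var deepIterate = function (obj, value) {
        for (var field in obj) {
            // direct match, or recurse into arrays and nested objects
            if (obj[field] == value) return true;
            if (typeof obj[field] === 'object' && deepIterate(obj[field], value)) return true;
        }
        return false;
    };
    return deepIterate(this, "red");
}})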
Is there a way to count field names in MongoDB? I have a mongo database of documents with other embedded documents within them. Here is an example of what the data might look like:
{
    "incident": "osint181",
    "summary": "Something happened",
    "actor": {
        "internal": {
            "motive": [
                "Financial"
            ],
            "notes": "",
            "role": [
                "Malicious"
            ],
            "variety": [
                "Cashier"
            ]
        }
    }
}
Another document might look like this:
{
    "incident": "osint182",
    "summary": "Something happened",
    "actor": {
        "external": {
            "motive": [
                "Financial"
            ],
            "notes": "",
            "role": [
                "Malicious"
            ],
            "variety": [
                "Hacker"
            ]
        }
    }
}
As you can see, the actor has changed from internal to external in the second document. What I would like to be able to do is count the number of incidents for each type of actor. My first attempt looked like this:
db.public.aggregate([ { $group: { _id: "$actor", count: { $sum: 1 } } } ]);
But that gave me the entire subdocument, and the count reflected how many documents were exactly the same. Rather, I was hoping to get a count for internal, a count for external, and so on. Is there an elegant way to do that? If not elegant, can someone give me a dirty way of doing it?
The best option for this kind of problem is MongoDB's map-reduce, which will let you iterate through all the keys of a document and easily add your own logic. Check out the map-reduce examples here: http://docs.mongodb.org/manual/applications/map-reduce/
This was the answer I came up with based on the hint from Devesh. I create a map function that looks at the value of actor and checks whether it is an empty JSON object using the isEmptyObject function that I defined. Then I use mapReduce to go through the collection and check whether the actor field is empty. If it is not empty, then rather than returning the value of each key, I return the key itself, which will be named internal, external, or whatever.
The magic here is the scope option in mapReduce, which puts my isEmptyObject in scope for the map function. The results are written to a collection which I named temp. After gathering the information I want from the temp collection, I drop it.
var isEmptyObject = function (obj) {
    for (var name in obj) {
        return false;
    }
    return true;
};
var mapFunction = function () {
    // emit the actor type key (internal, external, ...) rather than its value
    if (isEmptyObject(this.actor)) {
        emit("Unknown", 1);
    } else {
        for (var key in this.actor) { emit(key, 1); }
    }
};
var reduceFunction = function (inKeys, counter) {
    return Array.sum(counter);
};
db.public.mapReduce(mapFunction, reduceFunction, { out: "temp", scope: { isEmptyObject: isEmptyObject } });
foo = db.temp.aggregate([
    { $sort: { value: -1 } }
]);
db.temp.drop();
printjson(foo)
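For what it's worth, on MongoDB 3.4.4 or newer the same per-type counts can be obtained without MapReduce by turning the actor subdocument into an array with the $objectToArray aggregation operator — a sketch:
db.public.aggregate([
    // expose the actor subdocument's keys (internal, external, ...) as k/v pairs
    { $project: { actorTypes: { $objectToArray: "$actor" } } },
    { $unwind: "$actorTypes" },
    // count incidents per actor type key
    { $group: { _id: "$actorTypes.k", count: { $sum: 1 } } },
    { $sort: { count: -1 } }
])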
I'm using MongoDB and need to remove duplicate records. I have a listing collection that looks like so: (simplified)
[
    { "MlsId": "12345" },
    { "MlsId": "12345" },
    { "MlsId": "23456" },
    { "MlsId": "23456" },
    { "MlsId": "0" },
    { "MlsId": "0" },
    { "MlsId": "" },
    { "MlsId": "" }
]
A listing is a duplicate if the MlsId is not "" or "0" and another listing has that same MlsId. So in the example above, the 2nd and 4th records would need to be removed.
How would I find all duplicate listings and remove them? I started looking at MapReduce but couldn't find an example that fit my case.
Here is what I have so far, but it doesn't check if the MlsId is "0" or "":
m = function () {
    emit(this.MlsId, 1);
}
r = function (k, vals) {
    return Array.sum(vals);
}
res = db.Listing.mapReduce(m, r);
db[res.result].find({ value: { $gt: 1 } });
db[res.result].drop();
I have not used MongoDB, but I have used mapreduce, and I think you are on the right track in terms of the mapreduce functions. To exclude the 0 and empty strings, you can add a check in the map function itself, something like:
m = function () {
    if (this.MlsId != 0 && this.MlsId != "") {
        emit(this.MlsId, 1);
    }
}
And the reduce function can stay as you had it; it just sums the counts for each key (note that reduce returns a value rather than emitting):
r = function (k, vals) {
    return Array.sum(vals);
}
After this, you should have a set of key-value pairs in the output such that the key is the MlsId and the value is the number of times that particular ID occurs. I am not sure about the db.drop() part; as you pointed out, it will most probably delete all MlsIds instead of removing only the duplicate ones. To get around this, maybe you can call drop() first and then recreate each MlsId once. Will that work for you?
In MongoDB you can use a query to restrict the documents that are passed in for mapping. You probably want to do that for the ones you don't care about. Then in the reduce function you can ignore the dups and only return one of the docs for each duplicate key.
I'm a little confused about your goal, though. If you just want to find duplicates and remove all but one of them, then you can simply create a unique index on that field with the dropDups option; the process of creating the index will drop the duplicate docs. Keeping the index will ensure that it doesn't happen again.
http://www.mongodb.org/display/DOCS/Indexes#Indexes-DuplicateValues
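A sketch of that index creation; note that dropDups was removed in MongoDB 3.0, so this only works on older servers, and it would also collapse the duplicate '' and '0' listings unless those are dealt with first:
// pre-3.0 servers only: dropDups was removed in MongoDB 3.0
db.Listing.ensureIndex({ MlsId: 1 }, { unique: true, dropDups: true })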
You can use an aggregation operation to find the duplicates: unwind, introduce a dummy $group and $sum stage, and ignore the counts in your next stage. Something like this:
db.myCollection.aggregate([
    {
        $unwind: '$list'
    },
    {
        $group: {
            '_id': {
                'listing_id': '$_id', 'MlsId': '$list.MlsId'
            },
            'count': {
                '$sum': 1
            }
        }
    },
    {
        $group: {
            '_id': '$_id.listing_id',
            'list': {
                '$addToSet': {
                    'MlsId': '$_id.MlsId'
                }
            }
        }
    }
]);
This is how I followed @harri's answer to remove the duplicates:
// the out collection contains duplicated document ids and the number of duplicates
db.createCollection("myDupesCollection")
res = db.sampledDB.mapReduce(m, r, { out: "myDupesCollection" });
// iterate through duplicated docs and remove duplicates (keep one)
db.myDupesCollection.find({ value: { $gt: 1 } }).forEach(function (myDoc) {
    var u_id = myDoc._id;      // the emitted key is the MlsId itself
    var count = myDoc.value;
    // remove() takes a justOne flag rather than a count, so delete one
    // duplicate at a time, leaving a single document (e.g. 3 docs -> remove 2)
    for (var i = 0; i < count - 1; i++) {
        db.sampledDB.remove({ MlsId: u_id }, true);
    }
});