Is there a way to count field names in MongoDB? I have a MongoDB database of documents with other documents embedded in them. Here is an example of what the data might look like.
{
"incident": "osint181",
"summary":"Something happened",
"actor": {
"internal": {
"motive": [
"Financial"
],
"notes": "",
"role": [
"Malicious"
],
"variety": [
"Cashier"
]
}
}
}
Another document might look like this:
{
"incident": "osint182",
"summary":"Something happened",
"actor": {
"external": {
"motive": [
"Financial"
],
"notes": "",
"role": [
"Malicious"
],
"variety": [
"Hacker"
]
}
}
}
As you can see, the actor has changed from internal to external in the second document. What I would like to be able to do is count the number of incidents for each type of actor. My first attempt looked like this:
db.public.aggregate( { $group : { _id : "$actor", count : { $sum : 1 }}} );
But that gave me the entire subdocument and the count reflected how many documents were exactly the same. Rather I was hoping to get a count for internal and a count for external, etc. Is there an elegant way to do that? If not elegant, can someone give me a dirty way of doing that?
The best option for this kind of problem is MongoDB's map-reduce. It lets you iterate over all the keys of a document, and you can easily add more complex logic. See the map-reduce examples here: http://docs.mongodb.org/manual/applications/map-reduce/
This is the answer I came up with, based on the hint from Devesh. I created a map function that looks at the value of actor and checks whether it is an empty JSON object, using an isEmptyObject function that I defined. I then used mapReduce to go through the collection and run that check on every document. If the object is not empty then, rather than returning the value of the key, I return the key itself, which will be named internal, or external, or whatever.
The magic here was the scope option of mapReduce, which puts my isEmptyObject function in scope for the map function. The results are written to a collection which I named temp. After gathering the information I want from the temp collection, I drop it.
var isEmptyObject = function(obj) {
for (var name in obj) {
return false;
}
return true;
};
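var mapFunction = function() {
    // Documents with no actor sub-document are counted as "Unknown"
    if (isEmptyObject(this.actor)) {
        emit("Unknown", 1);
    } else {
        // Emit the field name (internal, external, ...) rather than its value
        for (var key in this.actor) { emit(key, 1); }
    }
};
var reduceFunction = function(inKeys, counter) {
    return Array.sum(counter);
};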
db.public.mapReduce(mapFunction, reduceFunction, {out:"temp", scope:{isEmptyObject:isEmptyObject}} );
// Read the counts sorted by value, highest first, before dropping the collection
var foo = db.temp.find().sort({ value : -1 }).toArray();
db.temp.drop();
printjson(foo);
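To sanity-check the map and reduce logic without a database, the same counting can be sketched in plain JavaScript. The function name countActorTypes and the sample documents are illustrative assumptions, not part of the original answer. The sketch treats a missing or empty actor as "Unknown", as the map function above intends.

```javascript
// Hypothetical helper: mimics the emit(key, 1) / Array.sum steps in memory.
function countActorTypes(docs) {
  var counts = {};
  docs.forEach(function (doc) {
    var keys = Object.keys(doc.actor || {});
    // An empty or missing actor is counted as "Unknown"
    if (keys.length === 0) {
      keys = ["Unknown"];
    }
    keys.forEach(function (key) {
      counts[key] = (counts[key] || 0) + 1;
    });
  });
  return counts;
}

// Sample documents (assumed), shaped like the ones in the question
var sampleDocs = [
  { incident: "osint181", actor: { internal: { motive: ["Financial"] } } },
  { incident: "osint182", actor: { external: { motive: ["Financial"] } } },
  { incident: "osint183", actor: { external: { motive: ["Espionage"] } } },
  { incident: "osint184", actor: {} }
];

var actorCounts = countActorTypes(sampleDocs);
console.log(actorCounts);
```

Each top-level key under actor plays the role of the emitted key, so a document with both internal and external actors would be counted once for each.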
Related
I'm relatively new to Mongo & mongoose, and I've hit a problem.
I have a reasonably (for me anyway) complex query, that will allow the user to search for all entered terms.
so if the query is something like so:
var query = { '$and' : [
{ "foo1" : "bar1" },
{ '$and' : [ { "foo2" : { $ne : null } }, { "foo2" : "bar2" } ] },
{ "foo3" : "bar3" }
]};
Doc.find(query);
but the user can enter any number of combinations for the parameters, i.e. I could search for all items that match foo1 & foo2, or just all items that match foo2, or just foo3, etc.
Is there a way to tell the query to only look for a parameter if it isn't empty, or is there a way to build searches like this programmatically? I have seen other options, for adding parameters like this, but they only seem to add in the standard
{ foo : 'bar' }
format, and for some reason they always seem to get added to query whether they meet the conditions of the if statement or not.
Thanks.
Firstly, you don't need the $and operator for what you want. Comma-separated conditions in a query object are implicitly ANDed.
Your example query should simply be:
var query = {
"foo1": "bar1",
//"foo2": { $ne: null}, is unnecessary as "foo2" is being searched for "bar2" already, so it won't be null
"foo2": "bar2",
"foo3": "bar3"
};
To build this query dynamically, you can check the parameters (say req.body) one by one and add them to query object with bracket notation:
var query = {};
if (req.body.foo1) {
query["foo1"] = req.body.foo1;
}
if (req.body.foo2) {
query["foo2"] = req.body.foo2;
}
if (req.body.foo3) {
query["foo3"] = req.body.foo3;
}
Or, you can loop through the parameters and build the same query object if you are sure what they contain:
var query = {};
for(var key in req.body){
query[key] = req.body[key];
}
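When the request body cannot be fully trusted, the loop above can be restricted to a known list of fields. The following is a minimal sketch under that assumption; the helper name buildQuery and the allowed-field list are hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: copies only allowed, non-empty parameters into the query.
function buildQuery(params, allowedFields) {
  var query = {};
  allowedFields.forEach(function (field) {
    var value = params[field];
    // Skip parameters that are missing or empty strings
    if (value !== undefined && value !== null && value !== "") {
      query[field] = value;
    }
  });
  return query;
}

var allowed = ["foo1", "foo2", "foo3"];
var exampleQuery = buildQuery({ foo1: "bar1", foo3: "", other: "x" }, allowed);
console.log(exampleQuery); // { foo1: 'bar1' }
```

With mongoose this would be used as Doc.find(buildQuery(req.body, allowed)), so that empty or unexpected parameters never reach the query.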
I've got the following doc in my db:
{
"_id": ObjectId("ABCDEFG12345"),
"options" : {
"foo": "bar",
"another": "something"
},
"date" : {
"created": 1234567890,
"updated": 0
}
}
And I want to update options.foo and date.updated at the same time using dot notation, like so:
var mongojs = require('mongojs');
var optionName = 'foo';
var optionValue = 'baz';
var updates = {};
updates['options.' + optionName] = optionValue;
updates['date.updated'] = new Date().getTime();
db.myCollection.findAndModify({
query : {
_id : ObjectId('ABCDEFG12345')
},
update : {
$set : updates
},
upsert : false,
new : true
}, function(error, doc, result) {
console.log(doc.options);
console.log(doc.date);
});
And this results in:
{
foo : 'baz',
another : 'something'
}
{
updated : 1234567890
}
Specifically, my pre-existing date.created field is getting clobbered even though I'm using dot notation.
Why is this only partially working? The options sub-document retains its pre-existing data (options.another), why doesn't the date sub-document retain its pre-existing data?
The behavior described typically happens when the object passed to the $set operator is of the form { "date" : { "updated" : 1234567890 } } rather than { "date.updated" : 1234567890 }, but I'm not familiar enough with how JavaScript handles dots in keys to tell whether that could be the cause on the JS side.
Also, it wouldn't explain why it happens with date and not with options.
If you could print the object stored in the variable updates, which is what gets sent to MongoDB in the update field, that would show on which side the issue is (JS or MongoDB).
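Printing the object, as suggested, could look like the sketch below. It reuses the variable names from the question and assumes a fixed timestamp in place of new Date().getTime() so the output is predictable.

```javascript
var optionName = "foo";
var optionValue = "baz";
var updates = {};
updates["options." + optionName] = optionValue;
updates["date.updated"] = 1234567890; // fixed timestamp, assumed for the example

// Keys built with bracket notation stay flat, dotted strings
console.log(JSON.stringify(updates));
// prints {"options.foo":"baz","date.updated":1234567890}

// A nested object literal, by contrast, would replace the whole sub-document
var nested = { date: { updated: 1234567890 } };
console.log(JSON.stringify(nested));
// prints {"date":{"updated":1234567890}}
```

If the printed object has the flat form, the JavaScript side is building the $set argument as intended.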
I ran your code in a test environment, using the same library you are using. With the mongojs library, querying by a native ObjectId is done with mongojs.ObjectId("####"); see the official documentation.
In the callback of findAndModify, the docs parameter is an array, so I access it like an array.
Note: to concatenate the string I use template literals (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals).
Everything works fine.
say i have this model
{
_id : 1,
ref: '1',
children: [
{
ref:'1.1',
grandchildren: [
{
ref:'1.1.1',
visible: true
}
]
}
]
}
I'm aware that the positional operator for nested arrays isn't available yet.
https://jira.mongodb.org/browse/SERVER-831
but wondered whether it's possible to atomically update the document in the nested array?
In my example, I'd like to update the visible flag to false for the document with ref 1.1.1.
I have the children record ref == '1.1' and the grandchildren ref == '1.1.1'
thanks
Yes, this is possible, but only if you know beforehand the index of the children array element that holds the grandchildren object to be updated. The update query then uses the positional operator as follows:
db.collection.update(
{
"children.ref": "1.1",
"children.grandchildren.ref": "1.1.1"
},
{
"$set": {
"children.0.grandchildren.$.visible": false
}
}
)
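If a copy of the document is already at hand, the index-based path can be worked out in plain JavaScript. This is only a sketch of the path construction; the helper name grandchildPath is hypothetical, and it writes both indexes explicitly instead of using the positional operator.

```javascript
// Hypothetical helper: returns the dotted $set path for a grandchild's field,
// or null when either ref is not found in the document.
function grandchildPath(doc, childRef, grandchildRef, field) {
  for (var i = 0; i < doc.children.length; i++) {
    var child = doc.children[i];
    if (child.ref !== childRef) continue;
    for (var j = 0; j < child.grandchildren.length; j++) {
      if (child.grandchildren[j].ref === grandchildRef) {
        return "children." + i + ".grandchildren." + j + "." + field;
      }
    }
  }
  return null;
}

// Sample document taken from the question
var sampleDoc = {
  _id: 1,
  ref: "1",
  children: [
    { ref: "1.1", grandchildren: [{ ref: "1.1.1", visible: true }] }
  ]
};

var path = grandchildPath(sampleDoc, "1.1", "1.1.1", "visible");
var setSpec = { $set: {} };
setSpec.$set[path] = false;
console.log(JSON.stringify(setSpec));
// prints {"$set":{"children.0.grandchildren.0.visible":false}}
```

Because the indexes are read before the update is sent, the array could change in between, so this shows how the path is built rather than a fully safe update procedure.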
However, if you don't know the array index positions beforehand, you should consider creating the $set conditions dynamically by using MapReduce. The basic idea with MapReduce is that it uses JavaScript as its query language, but this tends to be considerably slower than the aggregation framework and is not recommended for real-time data analysis.
In your MapReduce operation, you need to define two steps: the mapping step (which applies an operation to every document in the collection, and the operation can either do nothing or emit some object with keys and projected values) and the reducing step (which takes the list of emitted values and reduces it to a single element).
For the map step, you ideally want to get, for every document in the collection, the index of each children array element and another key that contains the $set keys.
Your reduce step would be a function that does nothing, simply defined as var reduce = function() {};
The final step in your MapReduce operation then creates a separate collection, update_collection, that contains the emitted children array object along with a field holding the $set keys. This collection can be refreshed periodically by running the MapReduce operation on the original collection again.
Altogether, this MapReduce method would look like:
var map = function(){
for(var i = 0; i < this.children.length; i++){
emit(
{
"_id": this._id,
"index": i
},
{
"index": i,
"children": this.children[i],
"update": {
"ref": "children." + i.toString() + ".grandchildren.$.ref",
"visible": "children." + i.toString() + ".grandchildren.$.visible"
}
}
);
}
};
var reduce = function(){};
db.collection.mapReduce(
map,
reduce,
{
"out": {
"replace": "update_collection"
}
}
);
You can then use the cursor from the db.update_collection.find() method to iterate over and update your collection accordingly:
var cur = db.update_collection.find(
{
"value.children.ref": "1.1",
"value.children.grandchildren.ref": "1.1.1"
}
);
// Iterate through results and update using the update query object set dynamically by using the array-index syntax.
while (cur.hasNext()) {
var doc = cur.next();
var update = { "$set": {} };
// set the update query object
update["$set"][doc.value.update.visible] = false;
db.collection.update(
{
"children.ref": "1.1",
"children.grandchildren.ref": "1.1.1"
},
update
);
};
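On MongoDB 3.6 and later, filtered positional operators combined with the arrayFilters option can target nested array elements without knowing their indexes. The following is a minimal sketch that assumes such a server version; the helper name buildVisibleUpdate and the identifiers c and g are illustrative choices, not part of the original answer.

```javascript
// Hypothetical helper: builds the arguments for an update that uses
// filtered positional operators ($[c], $[g]) together with arrayFilters.
function buildVisibleUpdate(childRef, grandchildRef, visible) {
  return {
    filter: { "children.ref": childRef },
    update: { $set: { "children.$[c].grandchildren.$[g].visible": visible } },
    options: { arrayFilters: [{ "c.ref": childRef }, { "g.ref": grandchildRef }] }
  };
}

var spec = buildVisibleUpdate("1.1", "1.1.1", false);
console.log(JSON.stringify(spec, null, 2));

// In the mongo shell (assuming MongoDB 3.6+), the pieces would be passed as:
// db.collection.updateOne(spec.filter, spec.update, spec.options);
```

The update is applied to a single document in one operation, which is what the question means by updating atomically.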
I want to perform a query on this collection to determine which documents have any keys in things that match a certain value. Is this possible?
I have a collection of documents like:
{
"things": {
"thing1": "red",
"thing2": "blue",
"thing3": "green"
}
}
EDIT: for conciseness
If you don't know what the keys will be and you need it to be interactive, then you'll need to use the (notoriously performance challenged) $where operator like so (in the shell):
db.test.find({$where: function() {
for (var field in this.things) {
if (this.things[field] == "red") return true;
}
return false;
}})
If you have a large collection, this may be too slow for your purposes, but it's your only option if your set of keys is unknown.
MongoDB 3.6 Update
You can now do this without $where by using the $objectToArray aggregation operator:
db.test.aggregate([
// Project things as a key/value array, along with the original doc
{$project: {
array: {$objectToArray: '$things'},
doc: '$$ROOT'
}},
// Match the docs with a field value of 'red'
{$match: {'array.v': 'red'}},
// Re-project the original doc
{$replaceRoot: {newRoot: '$doc'}}
])
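To see what the pipeline works with, the transformation done by $objectToArray can be imitated in plain JavaScript. This is an illustration only: the function below is a stand-in written for this example, not the server's implementation.

```javascript
// Plain-JavaScript imitation of $objectToArray, for illustration only
function objectToArray(obj) {
  return Object.keys(obj).map(function (key) {
    return { k: key, v: obj[key] };
  });
}

var things = { thing1: "red", thing2: "blue", thing3: "green" };
var asArray = objectToArray(things);
console.log(asArray);

// The $match on 'array.v' then amounts to this check
var hasRed = asArray.some(function (pair) { return pair.v === "red"; });
console.log(hasRed); // true
```

Each field becomes a { k, v } pair, which is why the match stage can test 'array.v' without knowing any of the key names.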
I'd suggest a schema change so that you can actually do reasonable queries in MongoDB.
From:
{
"userId": "12347",
"settings": {
"SettingA": "blue",
"SettingB": "blue",
"SettingC": "green"
}
}
to:
{
"userId": "12347",
"settings": [
{ name: "SettingA", value: "blue" },
{ name: "SettingB", value: "blue" },
{ name: "SettingC", value: "green" }
]
}
Then, you could index on "settings.value", and do a query like:
db.settings.createIndex({ "settings.value" : 1 })
db.settings.find({ "settings.value" : "blue" })
The change really is simple: it moves the setting name and setting value into fully indexable fields, and stores the list of settings as an array.
If you can't change the schema, you could try @JohnnyHK's solution, but be warned that it's basically the worst case in terms of performance, and it won't work effectively with indexes.
Sadly, none of the previous answers address the fact that mongo can contain nested values in arrays or nested objects.
THIS IS THE CORRECT QUERY:
{$where: function() {
var deepIterate = function (obj, value) {
for (var field in obj) {
if (obj[field] == value){
return true;
}
var found = false;
if ( typeof obj[field] === 'object') {
found = deepIterate(obj[field], value)
if (found) { return true; }
}
}
return false;
};
return deepIterate(this, "573c79aef4ef4b9a9523028f")
}}
Since calling typeof on an array or a nested object returns 'object', the query recurses into every nested element until it finds a field with the given value.
You can check the previous answers against a nested value; the results will be far from desired.
Stringifying the whole object is a performance hit, since the entire string has to be scanned for a match. It also creates a copy of the object as a string in memory (inefficient, since the query uses more RAM, and slow, since the function context already has the object loaded).
The query itself can work with ObjectId, string, int and any basic JavaScript type you wish.
I'm using MongoDB and need to remove duplicate records. I have a listing collection that looks like so: (simplified)
[
{ "MlsId": "12345" },
{ "MlsId": "12345" },
{ "MlsId": "23456" },
{ "MlsId": "23456" },
{ "MlsId": "0" },
{ "MlsId": "0" },
{ "MlsId": "" },
{ "MlsId": "" }
]
A listing is a duplicate if the MlsId is not "" or "0" and another listing has that same MlsId. So in the example above, the 2nd and 4th records would need to be removed.
How would I find all duplicate listings and remove them? I started looking at MapReduce but couldn't find an example that fit my case.
Here is what I have so far, but it doesn't check if the MlsId is "0" or "":
m = function () {
emit(this.MlsId, 1);
}
r = function (k, vals) {
return Array.sum(vals);
}
res = db.Listing.mapReduce(m,r);
db[res.result].find({value: {$gt: 1}});
db[res.result].drop();
I have not used MongoDB, but I have used MapReduce. I think you are on the right track in terms of the MapReduce functions. To exclude the "0" and empty-string values, you can add a check in the map function itself, something like:
m = function () {
    // Skip listings whose MlsId is "0" or "", since they never count as duplicates
    if (this.MlsId != "0" && this.MlsId != "") {
        emit(this.MlsId, 1);
    }
}
In MongoDB, the reduce function returns the reduced value for the key rather than calling emit, so the reduce function from the question can stay as it is:
r = function (k, vals) {
    return Array.sum(vals);
}
After this, you should have a set of key-value pairs in the output such that the key is the MlsId and the value is the number of times this particular ID occurs. I am not sure about the db.drop() part. As you pointed out, it will most probably delete all MlsIds instead of removing only the duplicate ones. To get around this, maybe you can call drop() first and then recreate each MlsId once. Will that work for you?
In mongodb you can use a query to restrict documents that are passed in for mapping. You probably want to do that for the ones you don't care about. Then in the reduce function you can ignore the dups and only return one of the docs for each duplicate key.
I'm a little confused about your goal though. If you just want to find duplicates and remove all but one of them, then you can just create a unique index on that field and use the dropDups option (available in MongoDB versions before 3.0); the process of creating the index will drop duplicate docs. Keeping the index will ensure that it doesn't happen again.
http://www.mongodb.org/display/DOCS/Indexes#Indexes-DuplicateValues
You can use an aggregation operation to remove duplicates. Unwind, introduce a dummy $group and $sum stage, and ignore the counts in your next stage. Something like this:
db.myCollection.aggregate([
{
$unwind: '$list'
},
{
$group:{
'_id':
{
'listing_id':'$_id', 'MlsId':'$list.MlsId'
},
'count':
{
'$sum':1
}
}
},
{
$group:
{
'_id':'$_id.listing_id',
'list':
{
'$addToSet':
{
'MlsId':'$_id.MlsId'
}
}
}
}
]);
This is how I followed @harri's answer to remove duplicates:
// contains the duplicated documents' ids and the number of duplicates
db.createCollection("myDupesCollection")
res = db.sampledDB.mapReduce(m, r, { out : "myDupesCollection" });
// iterate through duplicated docs and remove duplicates (keep one)
db.myDupesCollection.find({value: {$gt: 1}}).forEach(function(myDoc){
    var u_id = myDoc._id.MlsId;
    var counts = myDoc.value;
    // remove(query, true) deletes at most one matching document per call,
    // so with 3 docs it is called 3-1=2 times
    for (var i = 0; i < counts - 1; i++) {
        db.sampledDB.remove({MlsId: u_id}, true);
    }
});
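The duplicate rule stated in the question (same MlsId, ignoring "" and "0") can be checked in plain JavaScript before running anything against the database. This is a sketch on an in-memory array; the helper name findDuplicateIndexes is hypothetical.

```javascript
// Hypothetical helper: returns the indexes of listings to remove, keeping
// the first listing for each MlsId and never touching "" or "0".
function findDuplicateIndexes(listings) {
  var seen = Object.create(null); // no inherited keys to collide with ids
  var duplicates = [];
  listings.forEach(function (listing, index) {
    var id = listing.MlsId;
    if (id === "" || id === "0") return; // excluded by the stated rule
    if (seen[id]) {
      duplicates.push(index);
    } else {
      seen[id] = true;
    }
  });
  return duplicates;
}

// Sample data from the question
var listings = [
  { MlsId: "12345" }, { MlsId: "12345" },
  { MlsId: "23456" }, { MlsId: "23456" },
  { MlsId: "0" }, { MlsId: "0" },
  { MlsId: "" }, { MlsId: "" }
];

var duplicateIndexes = findDuplicateIndexes(listings);
console.log(duplicateIndexes); // [ 1, 3 ]
```

The zero-based indexes 1 and 3 correspond to the 2nd and 4th records, matching the expected result described in the question.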