I have a MongoDB document like this:
{
    "semantics": [
        {
            "text": "abc gave payment to xyz",
            "action": "gave"
        },
        {
            "text": "abc wanted quicker solution",
            "action": "want"
        }
    ],
    "keyword": [
        {
            "word": "payment",
            "imp": 0.91
        },
        {
            "word": "solution",
            "imp": 0.7
        }
    ]
}
The requirement is to find the action values for those words whose importance is greater than 0.9.
In the case above, payment has an importance above 0.9 and hence should be considered. payment appears in the text of one of the semantics entries, and the action value there is gave.
I am requesting help in constructing a MongoDB query for this.
You can first use mapReduce:
db.collection.mapReduce(
    function() {
        // Emit the whole semantics array for every keyword that is important enough.
        for (var i = 0; i < this.keyword.length; i++) {
            if (this.keyword[i].imp >= 0.9) {
                emit(this.keyword[i].word, this.semantics)
            }
        }
    },
    function(key, values) { },
    {
        out: { merge: 'result' },
        finalize: function(key, semantics) {
            // Pick out the action of each text that mentions the keyword.
            var result;
            for (var i = 0; i < semantics.length; i++) {
                if (semantics[i].text.indexOf(key) != -1) {
                    result = { key: semantics[i].action };
                }
            }
            return result;
        }
    }
)
We don't care about the reduce function here, since the map emits {word with imp >= 0.9: the whole semantics array}.
Later, before the result is stored in the result collection, the finalize function is called; it goes through all the semantics associated with a word and fetches the actions of the texts that contain the keyword.
After this you can run db.result.find() to see the result. You will see some null results, because not every key has a matching text and action, so you need to clean up a bit.
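For instance, a minimal cleanup sketch, assuming the output collection is named result as above (note that an equality match on null also removes documents where the field is missing entirely):

// Remove entries whose finalized value came back null.
db.result.remove({ "value": null });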
I'm having trouble with the MongoDB data below.
I want to get the data at [projects][log][subject].
So I tried this:
'$project': { _id: 0, projects.log.subject: 1 }
but it is not correct syntax.
{
    "_id": ObjectID("569f3a3e9d2540764d8bde59"),
    "A": "book",
    "server": "us",
    "projects": [
        {
            "domainArray": [
                { ~~~~ }
            ],
            "log": [
                {
                    ~~~~~,
                    "subject": "I WANT THIS"
                }
            ],
            "before": "234234234"
        },
        {
            "domainArray": [
                { ~~~~ }
            ],
            "log": [
                {
                    ~~~~~,
                    "subject": "I WANT THIS"
                }
            ],
            "before": "234234234"
        },
        ....
    ] // end of projects
} // end of document
How can I get the data grouped by [subject]? I have no idea how to do this.
Edited:
I expect data like this:
{
"subject":"first",
"subject":"second",
"subject":"third",
"subject":"~~~"
}
Is it possible? I just want to get array of subject.
Not sure whether this is your expected result or not, but please try this:
db.project.aggregate([
    { $project: { projects: 1, _id: 0 } },
    { $unwind: "$projects" },
    { $unwind: "$projects.log" },
    { $project: { subject: "$projects.log.subject", _id: 0 } }
])
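If you want the subjects as a single array rather than one document per subject (closer to the expected output in the question), a small variation with a $group stage should work; this sketch assumes collapsing everything into one result document is acceptable:

db.project.aggregate([
    { $unwind: "$projects" },
    { $unwind: "$projects.log" },
    // Collect every subject into one array on a single result document.
    { $group: { _id: null, subjects: { $push: "$projects.log.subject" } } },
    { $project: { _id: 0, subjects: 1 } }
])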
And the Map-Reduce to produce a word count over the above result is as below:
// map function
var map = function() {
    for (var i in this.projects) {
        for (var j in this.projects[i].log) {
            var arrayWords = this.projects[i].log[j].subject.split(" ");
            for (var k = 0; k < arrayWords.length; k++) {
                emit(arrayWords[k], { occurance: 1 });
            }
        }
    }
};

// reduce function
function reduce(word, arrayOccurance) {
    var totalOccurance = 0;
    for (var i = 0; i < arrayOccurance.length; i++) {
        totalOccurance = totalOccurance + arrayOccurance[i].occurance;
    }
    return { occurance: totalOccurance };
}

// combine both functions into an operation and output the result into the wordOccurance collection
db.project.mapReduce(
    map,
    reduce,
    {
        query: { "projects.log.subject": "~~~~~" },
        out: "wordOccurance"
    }
)

// query the result output
db.wordOccurance.find()
You can change the query in the mapReduce call to match the subjects you want to word-count. Please let me know if my mapReduce function doesn't produce your expected result.
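To inspect the most frequent words first, you can sort on the reduced value; the output documents have the shape { _id: word, value: { occurance: n } }:

db.wordOccurance.find().sort({ "value.occurance": -1 })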
You can also refer below two pages to create and troubleshoot map and reduce function:
Troubleshoot the Map Function
Troubleshoot the Reduce Function
I have two MongoDB databases.
Development DB
{
    _id: "someid",
    "parent1": {
        "key1": "val1",
        "key2": "val2",
        "key3": "val3",
        "key4": "val4",
        "key5": "val5",
        "key6": "val6"
    }
}
Production DB
{
    _id: "someid",
    "parent1": {
        "key1": "val1",
        "key2": "val2",
        "key3": "val3",
        "key10": "val10",
        "key11": "val11",
        "key12": "val12"
    }
}
I want to move my development data to production without losing the keys newly added in production. The output should become:
{
    _id: "someid",
    "parent1": {
        "key1": "val1",
        "key2": "val2",
        "key3": "val3",
        "key4": "val4",
        "key5": "val5",
        "key6": "val6",
        "key10": "val10",
        "key11": "val11",
        "key12": "val12"
    }
}
I can't update using db.collection.update({ _id: ... }, { $set: { "some_key.param2": new_info } }), as I can't spell out the parent path for each and every key.
Depending on your eventual needs, there are a couple of approaches you can take to this:
Cycle the object keys and apply updates: here you essentially "read" the current object, take note of its current state, and apply individual updates per key. Bulk operations help somewhat here:
var bulk = db.target.initializeOrderedBulkOp(),
    count = 0;

db.source.find().forEach(function(doc) {
    Object.keys(doc.parent1).forEach(function(key) {
        // Set the key on existing documents, but only where the value differs.
        var query = { "_id": doc._id };
        query["parent1." + key] = { "$ne": doc.parent1[key] };
        var update = { "$set": {} };
        update.$set["parent1." + key] = doc.parent1[key];
        bulk.find(query).updateOne(update);

        // Upsert path: sets the key only when a new document is inserted.
        query = { "_id": doc._id };
        update = { "$setOnInsert": {} };
        update.$setOnInsert["parent1." + key] = doc.parent1[key];
        bulk.find(query).upsert().updateOne(update);

        count++;
        if (count % 500 == 0) {
            bulk.execute();
            bulk = db.target.initializeOrderedBulkOp();
        }
    });
});

if (count % 500 != 0)
    bulk.execute();
Use a utility to "merge" the results per key, such as the "lodash" library:
// Assumes lodash has been loaded into the shell session as _.
db.source.find().forEach(function(doc) {
    var id = doc._id;
    delete doc._id;
    var result = db.target.findAndModify({
        "query": { "_id": id },
        "update": { "$setOnInsert": doc },
        "upsert": true,
        "new": true
    });
    // Deep-merge the source document over the found/inserted target document.
    var merged = _.merge(result, doc);
    db.target.update({ "_id": merged._id }, merged);
});
The "latter" is generally heavier in "update" and communication load though a bit lighter in overall code. You can also "tweak" this in API code where you can in fact return if the "upsert" in fact resulted in such a thing or whether the document was actually just "found", in which case a decision can be made whether to do the "merge" or not.
Of course I am "abstracting" here, as in reality you source from different "databases" and "connections" rather than just collections as is given as an example. But these are the basic model patterns to follow.
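As a rough sketch of that abstraction in the shell, assuming both databases live on the same server (otherwise you would open separate connections, e.g. via connect() or a driver), and with hypothetical database/collection names:

var source = db.getSiblingDB("development").getCollection("mycoll");
var target = db.getSiblingDB("production").getCollection("mycoll");

source.find().forEach(function(doc) {
    // ...apply either of the update patterns above against `target`.
});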
I have time data in my Mongo database. Each document represents one minute and contains 60 seconds as an object with a value for each second. How do I get the average of all the seconds in one minute?
A document looks like this:
{
"_id" : ObjectId("55575e4062771c26ec5f2287"),
"timestamp" : "2015-05-16T18:12:00.000Z",
"values" : {
"0" : "26.17",
"1" : "26.17",
"2" : "26.17",
...
"58" : "24.71",
"59" : "25.20"
}
}
You could take two approaches here:
Changing the schema and using the aggregation framework to get the average with the $avg operator, OR
Apply Map-Reduce.
Let's look at the first option. As the schema currently stands, it is not possible to use the aggregation framework, because of the dynamic keys in the values subdocument. An ideal schema that would favour the aggregation framework would have the values field be an array of embedded key/value documents, like this:
/* 0 */
{
"_id" : ObjectId("5559d66c9bbec0dd0344e4b0"),
"timestamp" : "2015-05-16T18:12:00.000Z",
"values" : [
{
"k" : "0",
"v" : 26.17
},
{
"k" : "1",
"v" : 26.17
},
{
"k" : "2",
"v" : 26.17
},
...
{
"k" : "58",
"v" : 24.71
},
{
"k" : "59",
"v" : 25.20
}
]
}
With MongoDB 3.6 and newer, use the aggregation framework to transform the hashmap into an array with the $objectToArray operator, then use $avg to calculate the average.
Consider running the following aggregate pipeline:
db.test.aggregate([
{
"$addFields": {
"values": { "$objectToArray": "$values" }
}
}
])
Armed with this new schema, you would then need to update your collection to change the string values to numbers, by iterating the cursor returned from the aggregate method and using bulkWrite as follows:
var bulkUpdateOps = [],
    cursor = db.test.aggregate([
        {
            "$addFields": {
                "values": { "$objectToArray": "$values" }
            }
        }
    ]);

cursor.forEach(doc => {
    const { _id, values } = doc;
    let temp = values.map(item => {
        item.key = item.k;
        item.value = parseFloat(item.v) || 0;
        delete item.k;
        delete item.v;
        return item;
    });
    bulkUpdateOps.push({
        "updateOne": {
            "filter": { _id },
            "update": { "$set": { values: temp } },
            "upsert": true
        }
    });

    if (bulkUpdateOps.length === 1000) {
        db.test.bulkWrite(bulkUpdateOps);
        bulkUpdateOps = [];
    }
});

if (bulkUpdateOps.length > 0) {
    db.test.bulkWrite(bulkUpdateOps);
}
If your MongoDB version does not support the $objectToArray operator in the aggregation framework, you can convert the current schema into the one above with a bit of native JavaScript and the MongoDB find() cursor's forEach() function, as follows (assuming you have a test collection):
var bulkUpdateOps = [],
    cursor = db.test.find();

cursor.forEach(doc => {
    const { _id, values } = doc;
    let temp = Object.keys(values).map(k => {
        let obj = {};
        obj.key = k;
        obj.value = parseFloat(doc.values[k]) || 0;
        return obj;
    });
    bulkUpdateOps.push({
        "updateOne": {
            "filter": { _id },
            "update": { "$set": { values: temp } },
            "upsert": true
        }
    });

    if (bulkUpdateOps.length === 1000) {
        db.test.bulkWrite(bulkUpdateOps);
        bulkUpdateOps = [];
    }
});

if (bulkUpdateOps.length > 0) {
    db.test.bulkWrite(bulkUpdateOps);
}
or
db.test.find().forEach(function (doc) {
    var keys = Object.keys(doc.values),
        values = keys.map(function(k) {
            var obj = {};
            obj.key = k;
            obj.value = parseFloat(doc.values[k]) || 0;
            return obj;
        });
    doc.values = values;
    db.test.save(doc);
});
The collection will now have the above schema, and thus the following aggregation pipeline will give you the average value over the minute:
db.test.aggregate([
    {
        "$addFields": {
            "average": { "$avg": "$values.value" }
        }
    }
])
Or for MongoDB 3.0 and lower
db.test.aggregate([
    { "$unwind": "$values" },
    {
        "$group": {
            "_id": "$timestamp",
            "average": { "$avg": "$values.value" }
        }
    }
])
For the above document, the output would be:
/* 0 */
{
    "result" : [
        {
            "_id" : "2015-05-16T18:12:00.000Z",
            "average" : 25.684
        }
    ],
    "ok" : 1
}
As for the other option, Map-Reduce, the intuition behind the operation is that you use JavaScript to make the necessary transformations and calculate the final average. You need to define three functions:
Map
When you tell Mongo to MapReduce, the function you provide as the map function will receive each document as the this parameter. The purpose of the map is to exercise whatever logic you need in JavaScript and then call emit 0 or more times to produce a reducible value.
var map = function() {
    var obj = this.values;
    var keys = Object.keys(obj);
    // Capture the timestamp up front: inside the forEach callback,
    // `this` no longer refers to the document being mapped.
    var ts = this.timestamp;
    keys.forEach(function(key) {
        var val = parseFloat(obj[key]);
        // One { count, total } pair per second, keyed by the minute's timestamp.
        emit(ts, { count: 1, total: val });
    });
};
For each document you need to emit a key and a value. The key is the first parameter to the emit function and represents how you want to group the values (in this case you will be grouping by the timestamp). The second parameter to emit is the value, which here is a little object containing a count of values (always 1) and the numeric reading for an individual second within the minute.
Reduce
Next you need to define the reduce function, to which Mongo will pass the grouped items you emitted as an array. It's inside the reduce function where you do the aggregation calculations and reduce all the objects to a single object.
var reduce = function(key, values) {
    var result = { count: 0, total: 0 };
    values.forEach(function(value) {
        result.count += value.count;
        result.total += value.total;
    });
    return result;
};
This reduce function returns a single result. It's important for the return value to have the same shape as the emitted values. It's also possible for MongoDB to call the reduce function multiple times for a given key and ask you to process a partial set of values, so if you need to perform some final calculation, you can also give MapReduce a finalize function.
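To illustrate, a re-reduce might hypothetically feed earlier partial results back in as values, which only works because their shape matches the emitted values:

// Hypothetical re-reduce over two partial results for the same minute.
reduce("2015-05-16T18:12:00.000Z", [
    { count: 3, total: 78.51 },  // partial sum over seconds 0-2
    { count: 2, total: 49.91 }   // partial sum over seconds 58-59
]);
// => { count: 5, total: 128.42 }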
Finalize
The finalize function is optional, but if you need to calculate something based on a fully reduced set of data, you'll want to use a finalize function. Mongo will call the finalize function after all the reduce calls for a set are complete. This would be the place to calculate the average of all the second values in a document/timestamp:
var finalize = function (key, value) {
    value.average = value.total / value.count;
    return value;
};
Putting It Together
With the JavaScript in place, all that is left is to tell MongoDB to execute a MapReduce:
var map = function() {
    var obj = this.values;
    var keys = Object.keys(obj);
    // Capture the timestamp; `this` is not the document inside forEach.
    var ts = this.timestamp;
    keys.forEach(function(key) {
        var val = parseFloat(obj[key]);
        emit(ts, { count: 1, total: val });
    });
};

var reduce = function(key, values) {
    var result = { count: 0, total: 0 };
    values.forEach(function(value) {
        result.count += value.count;
        result.total += value.total;
    });
    return result;
};

var finalize = function (key, value) {
    value.average = value.total / value.count;
    return value;
};
db.collection.mapReduce(
    map,
    reduce,
    {
        out: { merge: "map_reduce_example" },
        finalize: finalize
    }
)
And when you query the output collection map_reduce_example with db.map_reduce_example.find(), you get the result:
/* 0 */
{
    "_id" : "2015-05-16T18:12:00.000Z",
    "value" : {
        "count" : 5,
        "total" : 128.42,
        "average" : 25.684
    }
}
References:
A Simple MapReduce with MongoDB and C#
MongoDB documentation on mapReduce
This kind of data structure creates lots of conflicts and is difficult to handle with Mongo operations. In this case you should either change your schema design or, if you are not able to change it, follow this:
Your schema has two major problems: 1) the keys are dynamic and 2) the values are stored as strings, so you should use some scripting to calculate the average. Check the scripts below.
First, calculate the number of values (adapted from this reference):
Object.size = function(obj) {
    var size = 0,
        key;
    for (key in obj) {
        if (obj.hasOwnProperty(key)) size++;
    }
    return size;
};

db.collectionName.find().forEach(function(myDoc) {
    var objects = myDoc.values;
    var value = 0;
    // Get the size of the object
    var size = Object.size(objects);
    for (var key in objects) {
        value = value + parseFloat(objects[key]); // parse string values to float
    }
    var avg = value / size;
    print(value);
    print(size);
    print(avg);
});
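If you want to persist the computed average instead of just printing it, a minimal sketch; the average field name here is an assumption:

db.collectionName.find().forEach(function(myDoc) {
    var objects = myDoc.values;
    var total = 0;
    var size = Object.size(objects);
    for (var key in objects) {
        total += parseFloat(objects[key]);
    }
    // Write the result back onto the same document.
    db.collectionName.update(
        { _id: myDoc._id },
        { $set: { average: total / size } }
    );
});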
My document contains an array like:
{
    "differentialDiagnosis" : "IART/Flutter",
    "explanation" : "The rhythm.",
    "fileName" : "A115a JPEG.jpg",
    "history" : "1 year old with fussiness",
    "interpretationList" : [
        {
            "interpretations" : [
                ObjectId("54efe7c8d6d5ca3d5c580a22"),
                ObjectId("54efe80bd6d5ca3d5c580a26")
            ]
        },
        {
            "interpretations" : [
                ObjectId("54efe80bd6d5ca3d5c580a26"),
                ObjectId("54efe82ad6d5ca3d5c580a28")
            ]
        }
    ]
}
and I want to remove all occurrences of ObjectId("54efe80bd6d5ca3d5c580a26"), but the query I wrote:

db.ekgs.update(
    { "interpretationList.interpretations": ObjectId("54efe80bd6d5ca3d5c580a26") },
    { "$pull": { "interpretationList.$.interpretations": ObjectId("54efe80bd6d5ca3d5c580a26") } }
)

removes only the first occurrence of ObjectId("54efe80bd6d5ca3d5c580a26").
The reason your query removes only the first occurrence is that, as explained in this page of the documentation, "the positional $ operator acts as a placeholder for the first element that matches the query document".
The problem is that it is really tricky to deal with these types of updates when the schema has embedded arrays inside embedded objects inside embedded arrays. To get around this, if you are able to flatten the schema, your update becomes much easier. If your document instead looked like this:
{
    "differentialDiagnosis" : "IART/Flutter",
    "explanation" : "The rhythm.",
    "fileName" : "A115a JPEG.jpg",
    "history" : "1 year old with fussiness",
    "interpretations" : [
        ObjectId("54efe7c8d6d5ca3d5c580a22"),
        ObjectId("54efe80bd6d5ca3d5c580a26"),
        ObjectId("54efe82ad6d5ca3d5c580a28")
    ]
}
Then your query would be as simple as the one below. (Remember to add { "multi": true } as an option if you want to update multiple documents).
db.ekgs.update(
    { "interpretations": ObjectId("54efe80bd6d5ca3d5c580a26") },
    { "$pull": { "interpretations": ObjectId("54efe80bd6d5ca3d5c580a26") } }
);
But I understand that you might not be able to change the schema. In that case, you can try a solution that requires a small script. In the mongo shell, you can use the following bit of JavaScript to do the operation.
// Get cursor with documents requiring updating.
var oid = ObjectId("54efe80bd6d5ca3d5c580a26");
var c = db.ekgs.find({ "interpretationList.interpretations": oid });

// Iterate through cursor, removing oid from each subdocument in interpretationList.
while (c.hasNext()) {
    var isModified = false;
    var doc = c.next();
    var il = doc.interpretationList;
    for (var i in il) {
        var j = il[i].interpretations.length;
        while (j--) {
            // If oid to remove is present, remove it from array
            // and set flag that the document has been modified.
            if (il[i].interpretations[j].str === oid.str) {
                il[i].interpretations.splice(j, 1);
                isModified = true;
            }
        }
    }
    // If modified, update interpretationList for document.
    if (isModified) {
        db.ekgs.update({ "_id": doc._id }, { "$set": { "interpretationList": il } });
    }
}
UPDATE: Example of how it might work using the Node.js driver.
// Get cursor with documents requiring updating.
var oid = new ObjectID("54efe80bd6d5ca3d5c580a26");
var ekgs = db.collection("ekgs");
ekgs.find({ "interpretationList.interpretations": oid },
    function(err, c) {
        if (err) throw err;
        // Iterate through cursor, removing oid from each subdocument in interpretationList.
        c.each(function(err, doc) {
            if (err) throw err;
            // If doc is null then the cursor is exhausted/empty and closed.
            if (doc != null) {
                var isModified = false;
                var il = doc.interpretationList;
                for (var i in il) {
                    var j = il[i].interpretations.length;
                    while (j--) {
                        // If oid to remove is present, remove it from array
                        // and set flag that the document has been modified.
                        if (il[i].interpretations[j].equals(oid)) {
                            il[i].interpretations.splice(j, 1);
                            isModified = true;
                        }
                    }
                }
                // If modified, update interpretationList for document.
                if (isModified) {
                    ekgs.update({ "_id": doc._id },
                        { "$set": { "interpretationList": il } },
                        function(err, res) {
                            if (err) throw err;
                            // Callback.
                            console.log(res);
                        });
                }
            }
        });
    });
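As a side note, MongoDB 3.6 and newer (released after this answer was written) added the all-positional operator $[], which can pull the id from every interpretations array in a single statement; a minimal sketch:

db.ekgs.update(
    { "interpretationList.interpretations": ObjectId("54efe80bd6d5ca3d5c580a26") },
    { "$pull": { "interpretationList.$[].interpretations": ObjectId("54efe80bd6d5ca3d5c580a26") } },
    { "multi": true }
)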
I have recorded changes from an information system in a Mongo database. Every time a set of values is set or changed, a record is saved in the database.
The change collection is in the following form:
{ "user_id": 1, "timestamp": { "date" : "2010-09-22 09:28:02", "timezone_type" : 3, "timezone" : "Europe/Paris" } }, "changes: { "fieldA": "valueA", "fieldB": "valueB", "fieldC": "valueC" } }
{ "user_id": 1, "timestamp": { "date" : "2010-09-24 19:01:52", "timezone_type" : 3, "timezone" : "Europe/Paris" } }, "changes: { "fieldA": "new_valueA", "fieldB": null, "fieldD": "valueD" } }
{ "user_id": 1, "timestamp": { "date" : "2010-10-01 11:11:02", "timezone_type" : 3, "timezone" : "Europe/Paris" } }, "changes: { "fieldD": "new_valueD" } }
Of course there are thousands of records per user with different attributes, which represent millions of records in total. What I want is to see a user's status at a given time. For example, user_id 1 at 2010-09-30 would be:
fieldA: new_valueA
fieldC: valueC
fieldD: valueD
This means I need to flatten all the changes prior to a given date for a given user into a single record. Can I do that directly in Mongo?
Edit: I am using the 2.0 version of mongodb hence cannot benefit from the aggregation framework.
Edit: It seems I have found the answer to my question.
var mapTimeAndChangesByUserId = function() {
    var key = this.user_id;
    var value = { timestamp: this.timestamp.date, changes: this.changes };
    emit(key, value);
}

var reduceMergeChanges = function(user_id, changeset) {
    var mergeFunction = function(a, b) { for (var attr in b) a[attr] = b[attr]; };
    var result = {};
    changeset.forEach(function(e) { mergeFunction(result, e.changes); });
    return { timestamp: changeset.pop().timestamp, changes: result };
}
The reduce function merges the changes in the order they come and returns the result.
db.user_change.mapReduce(
    mapTimeAndChangesByUserId,
    reduceMergeChanges,
    {
        out: { inline: 1 },
        query: { user_id: 1, "timestamp.date": { $lt: "2010-09-30" } },
        sort: { "timestamp.date": 1 }
    });
'results' : [
    {
        "_id": 1,
        "value": {
            "timestamp": "2010-09-24 19:01:52",
            "changes": {
                "fieldA": "new_valueA",
                "fieldB": null,
                "fieldC": "valueC",
                "fieldD": "valueD"
            }
        }
    }
]
Which is fine to me.
You could write an MR (map-reduce) to do this.
Since the fields are a lot like tags, you can modify a nice cookbook example for counting tags here: http://cookbook.mongodb.org/patterns/count_tags/. Of course, instead of counting, you want the latest value applied for each field (an assumption, since this is not clear in your question).
So let's get our map function:
map = function() {
    if (!this.changes) {
        // If there were no changes for some reason, bail on this record
        return;
    }
    // We iterate the changes
    for (var index in this.changes) {
        emit(index /* We emit the field name */, this.changes[index] /* We emit the field value */);
    }
}
And now for our reduce:
reduce = function(key, values) {
    // This part depends on your input query. If you add a sort of
    // date (ts) DESC then you will probably want the first element (0),
    // not the last one as gathered here via values.length - 1
    return values[values.length - 1];
}
And this will output a single document per changed field, of the form:
{
    _id: your_field_ie_fieldA,
    value: whoop
}
You can then iterate the (most likely inline) output and, bam, you have your changes.
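For example, a minimal sketch of consuming the inline output to build the user's status object, assuming the map/reduce functions above and the user_change collection from the question:

var res = db.user_change.mapReduce(map, reduce, {
    out: { inline: 1 },
    query: { user_id: 1, "timestamp.date": { $lt: "2010-09-30" } },
    sort: { "timestamp.date": 1 }
});

// Each result has the shape { _id: fieldName, value: latestValue }.
var status = {};
res.results.forEach(function(r) {
    status[r._id] = r.value;
});
printjson(status);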
This is of course one way of doing it, and it is not designed to be run completely inline with your app; however, that all depends on the size of the data you're working with. It could be run very close to it.
I am unsure whether group or distinct can run on this, but it looks like group might: http://docs.mongodb.org/manual/reference/method/db.collection.group/#db-collection-group. I should note that group is basically an MR wrapper, but you could do something like this (untested, just like the MR above):
db.col.group({
    key: { 'changes.fieldA': 1 /*, ...the rest of the fields */ },
    cond: { 'timestamp.date': { $gt: new Date('01/01/2012') } },
    reduce: function (curr, result) { },
    initial: { }
})
But it does require you to define the keys instead of just iterating them programmatically (though maybe there is a better way).
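If you do want to derive the keys programmatically, a hypothetical sketch that samples one document's changes to build the key object (this assumes one document's fields are representative, which may not hold; note also that db.collection.group was removed in MongoDB 4.2):

// Build the group key from one sample document's change fields.
var sample = db.col.findOne({ user_id: 1 });
var key = {};
for (var field in sample.changes) {
    key["changes." + field] = 1;
}

db.col.group({
    key: key,
    cond: { 'timestamp.date': { $gt: new Date('01/01/2012') } },
    reduce: function (curr, result) { },
    initial: { }
});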