I have a Mongo collection containing documents with a pmID and a replyID, where the replyID is either null or another document's pmID. I'm trying to figure out how, when I select by a pmID, I can also get the PM whose pmID equals that document's replyID, and, if that document's replyID isn't null, the document matching that one, and so on.
{
    pmID: 1,
    replyID: null
},
{
    pmID: 2,
    replyID: null
},
{
    pmID: 3,
    replyID: 1
},
{
    pmID: 4,
    replyID: 3
}
So if I selected on pmID = 4, I'd also want to get 3 and 1. I'm debating just doing one-by-one selects, but I'm hoping there's an easier answer.
Not really sure of your use case here, but at a glance I would personally model this by storing the "ancestors" in order on the object, like so:
{
    "pmId": 1,
    "replyIds": [ ]
},
{
    "pmId": 3,
    "replyIds": [ 1 ]
},
{
    "pmId": 4,
    "replyIds": [ 3, 1 ]
}
In either that order or in reverse, depending on how you implement the ancestor history.
The point being that when issuing a "reply" to an existing object, you should already have the data loaded that tells you whether there are any existing replies. The new object then pushes the id of the object it is being issued in reply to onto the existing "replyIds" array when that new object is created:
// "data" is the loaded "pmId": 3 document being replied to
data.replyIds.unshift( data.pmId );  // prepend the parent's id to the ancestor list
data.pmId = getNewPmId();            // however you generate new ids
db.collection.insert(data);
It's a fairly simple pattern which obviates the need to "recursively walk" through queries on read requests, and also allows for simple writes.
Retrieving all ancestors is just a simple matter of passing the array of replies to $in:
var data = db.collection.findOne({ "pmId": 4 });
db.collection.find({ "pmId": { "$in": data.replyIds } })
In addition to Neil's suggestion to use ancestor arrays, if you're modelling conversations you can group messages together with a conversationId and recall all messages connected to a given one by that id:
{
    "pmId": 1,
    "conversationId" : 0
},
{
    "pmId": 3,
    "conversationId" : 0
},
{
    "pmId": 4,
    "conversationId" : 0
}
> var pm = db.test.findOne({ "pmId" : 3 })
> db.test.find({ "conversationId" : pm.conversationId })
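For completeness, a sketch of how a reply might inherit its parent's conversationId on insert; getNewPmId is the same hypothetical id generator as in the earlier answer:
// A reply inherits the conversationId of the message it answers;
// a brand-new message would start a fresh conversationId instead.
var parent = db.test.findOne({ "pmId": 3 });
db.test.insert({
    "pmId": getNewPmId(),
    "conversationId": parent.conversationId
});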
Is there any way to return only certain fields of the stores that have grade B?
I tried: db.restaurant.find( {"Result.Grade" : "B"} )
But it returns the entire content of every document with grade B.
Thanks!!
The first argument to the find query is the filter. Use the second argument to pass the fields you want to retrieve (the projection):
db.restaurant.find(
{ "Result.Grade" : "B" }, //filter
{ "Name": 1, "Number": 1 } //projection
)
With the MongoDB Node.js driver, use the .project() cursor method:
db.restaurant.find(
    { "Result.Grade" : "B" }
).project({ "Name": 1, "Number": 1 })
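Note that _id is always included by default; if you want only the named fields, exclude it explicitly in the projection:
db.restaurant.find(
    { "Result.Grade" : "B" },
    { "Name": 1, "Number": 1, "_id": 0 }
)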
I have a collection of items,
[ a, b, c, d ]
And I want to group them in pairs such as,
[ [ a, b ], [ b, c ], [ c, d ] ]
This will be used in calculating the differences between each item in the original collection, but that part is solved using several techniques such as the one in this question.
I know that this is possible with map reduce, but I want to know if it's possible with aggregation.
Edit: Here's an example,
The collection of items; each item is an actual document.
[
    { val: 1 },
    { val: 3 },
    { val: 6 },
    { val: 10 }
]
Grouped version:
[
    [ { val: 1 }, { val: 3 } ],
    [ { val: 3 }, { val: 6 } ],
    [ { val: 6 }, { val: 10 } ]
]
The resulting collection (or aggregation result):
[
    { diff: 2 },
    { diff: 3 },
    { diff: 4 }
]
This is something that just cannot be done with the aggregation framework, and the only current MongoDB method available for this type of operation is mapReduce.
The reason is that the aggregation framework has no way of referring to any document in the pipeline other than the present one. This applies to "grouping" pipeline stages as well: even though things are grouped on a "key", you can't really deal with individual documents in the way you want to.
MapReduce, on the other hand, has one feature available that allows you to do what you want here, and it's not even "directly" related to aggregation. It is the ability to have "globally scoped variables" across all stages. And having a "variable" that basically "stores the last document" is all you need to achieve your result.
So it's quite simple code, and there is in fact no "reduction" required:
db.collection.mapReduce(
    function () {
        // "lastVal" persists across map calls via the "scope" option below
        if (lastVal != null)
            emit( this._id, this.val - lastVal );
        lastVal = this.val;
    },
    function() {}, // the reducer is never called: every emitted key is unique
    {
        "scope": { "lastVal": null },
        "out": { "inline": 1 }
    }
)
Which gives you a result much like this:
{
    "results" : [
        {
            "_id" : ObjectId("54a425a99b8bcd6f73e2d662"),
            "value" : 2
        },
        {
            "_id" : ObjectId("54a425a99b8bcd6f73e2d663"),
            "value" : 3
        },
        {
            "_id" : ObjectId("54a425a99b8bcd6f73e2d664"),
            "value" : 4
        }
    ],
    "timeMillis" : 3,
    "counts" : {
        "input" : 4,
        "emit" : 3,
        "reduce" : 0,
        "output" : 3
    },
    "ok" : 1
}
That's really just picking "something unique" as the emitted _id value rather than anything specific, because all this is really doing is producing the difference between values on adjacent documents.
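One caveat: the diffs only make sense if documents reach the mapper in the intended order. A sketch of the same call with an explicit sort option, assuming insertion order (ascending _id) is the pairing order you want:
db.collection.mapReduce(
    function () {
        if (lastVal != null)
            emit( this._id, this.val - lastVal );
        lastVal = this.val;
    },
    function() {},
    {
        "scope": { "lastVal": null },
        "sort": { "_id": 1 },   // make the pairing order explicit
        "out": { "inline": 1 }
    }
)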
Global variables are usually the solution to these types of "pairing" aggregations or producing "running totals". Right now the aggregation framework has no access to global variables, even though they might well be a nice thing to have. The mapReduce framework has them, so it is probably fair to say that they should be available to the aggregation framework as well.
Right now they are not though, so stick with mapReduce.
My database looks like this:
{
    _id: 1,
    values: [ 1, 2, 3, 4, 5 ]
},
{
    _id: 2,
    values: [ 2, 4, 6, 8, 10 ]
}, ...
I'd like to update every value in every document's nested array ("values") that meets some criterion. For instance, I'd like to increment every value that's >= 4 by one, which ought to yield:
{
    _id: 1,
    values: [ 1, 2, 3, 5, 6 ]
},
{
    _id: 2,
    values: [ 2, 5, 7, 8, 11 ]
}, ...
I'm used to working with SQL, where the nested array would be a separate table connected by a unique ID. I'm a little lost in this new NoSQL world.
Thank you kindly,
This sort of update is not really possible in a single statement using nested arrays. The reason is given in the positional $ operator documentation, which states that you can only update the first array element matching the condition in the query.
So a statement like this:
db.collection.update(
{ "values": { "$gte": 4 } },
{ "$inc": { "values.$": 1 } }
)
Will not work in the sense that only the "first" array element that was matched would be incremented. So on your first document you would get this:
{ "_id" : 1, "values" : [ 1, 2, 3, 6, 6 ] }
In order to update the values as you are suggesting you would need to iterate the documents and the array elements to produce the result:
db.collection.find({ "values": { "$gte": 4 } }).forEach(function(doc) {
    for ( var i = 0; i < doc.values.length; i++ ) {
        if ( doc.values[i] >= 4 ) {
            doc.values[i]++;
        }
    }
    db.collection.update(
        { "_id": doc._id },
        { "$set": { "values": doc.values } }
    );
})
Or whatever code equivalent of that basic concept.
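On MongoDB 2.6 or later, the shell's Bulk API can batch those per-document updates into far fewer round trips; a sketch of the same loop:
var bulk = db.collection.initializeUnorderedBulkOp();
db.collection.find({ "values": { "$gte": 4 } }).forEach(function(doc) {
    for ( var i = 0; i < doc.values.length; i++ ) {
        if ( doc.values[i] >= 4 ) {
            doc.values[i]++;
        }
    }
    // Queue the write instead of sending it immediately
    bulk.find({ "_id": doc._id }).updateOne({ "$set": { "values": doc.values } });
});
bulk.execute();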
Generally speaking, this sort of update does not lend itself well to a structure that contains elements in an array. If that is really your need, then the elements are better off listed within a separate collection.
Then again, the presentation of this question is more of a "hypothetical" situation, without your actual use case for performing this sort of update being clear. So if you described what you actually need to do, and how your data really looks, in another question, you might get a more meaningful response in terms of the best approach for you to use.
I have recorded changes from an information system in a mongo database. Every time a set of values are set or changed, a record is saved in the mongo database.
The change collection is in the following form:
{ "user_id": 1, "timestamp": { "date" : "2010-09-22 09:28:02", "timezone_type" : 3, "timezone" : "Europe/Paris" } }, "changes: { "fieldA": "valueA", "fieldB": "valueB", "fieldC": "valueC" } }
{ "user_id": 1, "timestamp": { "date" : "2010-09-24 19:01:52", "timezone_type" : 3, "timezone" : "Europe/Paris" } }, "changes: { "fieldA": "new_valueA", "fieldB": null, "fieldD": "valueD" } }
{ "user_id": 1, "timestamp": { "date" : "2010-10-01 11:11:02", "timezone_type" : 3, "timezone" : "Europe/Paris" } }, "changes: { "fieldD": "new_valueD" } }
Of course there are thousands of records per user with different attributes, which represents millions of records. What I want to do is see a user's status at a given time. For example, user_id 1 at 2010-09-30 would be:
fieldA: new_valueA
fieldC: valueC
fieldD: valueD
This means I need to flatten all the changes prior to a given date for a given user into a single record. Can I do that directly in Mongo?
Edit: I am using the 2.0 version of mongodb hence cannot benefit from the aggregation framework.
Edit: It seems I have found the answer to my question.
var mapTimeAndChangesByUserId = function() {
    var key = this.user_id;
    var value = { timestamp: this.timestamp.date, changes: this.changes };
    emit(key, value);
};

var reduceMergeChanges = function(user_id, changeset) {
    // Apply each changeset in order; later values overwrite earlier ones
    var mergeFunction = function(a, b) { for (var attr in b) a[attr] = b[attr]; };
    var result = {};
    changeset.forEach(function(e) { mergeFunction(result, e.changes); });
    return { timestamp: changeset.pop().timestamp, changes: result };
};
The reduce function merges the changes in the order they come and returns the result.
db.user_change.mapReduce(
    mapTimeAndChangesByUserId,
    reduceMergeChanges,
    {
        out: { inline: 1 },
        query: { user_id: 1, "timestamp.date": { $lt: "2010-09-30" } },
        sort: { "timestamp.date": 1 }
    });
'results' : [
    {
        "_id": 1,
        "value": {
            "timestamp": "2010-09-24 19:01:52",
            "changes": {
                "fieldA": "new_valueA",
                "fieldB": null,
                "fieldC": "valueC",
                "fieldD": "valueD"
            }
        }
    }
]
Which is fine to me.
You could write a MR to do this.
Since the fields are a lot like tags, you can adapt a nice cookbook example of counting tags here: http://cookbook.mongodb.org/patterns/count_tags/. Of course, instead of counting you want the latest value applied for each field (an assumption, since this is not clear in your question).
So lets get our map function:
map = function() {
    if (!this.changes) {
        // If there were no changes for some reason, skip this record
        return;
    }
    // Iterate the changes, emitting the field name as key and the field value as value
    for (var index in this.changes) {
        emit(index, this.changes[index]);
    }
}
And now for our reduce:
reduce = function(key, values){
    // This part depends on your input query. If you add a sort of
    // date (ts) DESC then you will probably want the first element (0),
    // not the last one as taken here.
    return values[values.length - 1];
}
And this will output a single document per changed field, of the form:
{
    _id: your_field_ie_fieldA,
    value: whoop
}
You can then iterate the (most likely inline) output and, bam, you have your changes.
This is of course just one way of doing it, and it is not designed to run completely inline with your app; however, that all depends on the size of the data you're working on. It could run very close.
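For reference, a sketch of how the map and reduce above might be invoked, assuming the same user_change collection and date filtering as the question, with an ascending sort so that values[values.length - 1] really is the latest value:
db.user_change.mapReduce(map, reduce, {
    out: { inline: 1 },
    query: { user_id: 1, "timestamp.date": { $lt: "2010-09-30" } },
    sort: { "timestamp.date": 1 }  // ascending: the last value per field is the newest
});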
I am unsure whether group and distinct can run on this, but it looks like group might: http://docs.mongodb.org/manual/reference/method/db.collection.group/#db-collection-group. However, I should note that group is basically a MR wrapper, so you could do something like (untested, just like the MR above):
db.col.group({
    key: { 'changes.fieldA': 1 /* ...and the rest of the fields */ },
    cond: { 'timestamp.date': { $gt: new Date( '01/01/2012' ) } },
    reduce: function ( curr, result ) { },
    initial: { }
})
But it does require you to define the keys instead of just iterating them programmatically (though maybe there is a better way).
I am trying to count word usage using MongoDB. My collection currently looks like this:
{'_id':###, 'username':'Foo', words:[{'word':'foo', 'count':1}, {'word':'bar', 'count':1}]}
When a new post is made, I extract all the new words to an array, but I'm trying to figure out how to upsert to the words array and increment the count if the word already exists.
In the example above, for example, if the user "Foo" posted "lorem ipsum foo", I'd add "lorem" and "ipsum" to the users words array but increment the count for "foo".
Is this possible in one query? Currently I am using addToSet:
'$addToSet':{'words':{'$each':word_array}}
But that doesn't seem to offer any way of increasing the words count.
Would very much appreciate some help :)
If you're willing to switch from a list to a hash (object), you can do this atomically.
From the docs: "$inc ... increments field by the number value if field is present in the object, otherwise sets field to the number value."
{ $inc : { field : value } }
So, if you could refactor your container and object:
words: [
    {
        'word': 'foo',
        'count': 1
    },
    ...
]
to:
words: {
    'foo': 1,
    'other_word': 2,
    ...
}
you could use the update operation with:
{ $inc: { 'words.foo': 1 } }
which would create { 'foo': 1 } if 'foo' doesn't exist, else increment foo.
E.g.:
> db.bar.insert({ id: 1, words: {} });
> db.bar.find({ id: 1 })
{ ..., "words" : { }, "id" : 1 }
> db.bar.update({ id: 1 }, { $inc: { 'words.foo': 1 } });
> db.bar.find({ id: 1 })
{ ..., "id" : 1, "words" : { "foo" : 1 } }
> db.bar.update({ id: 1 }, { $inc: { 'words.foo': 1 } });
> db.bar.find({ id: 1 })
{ ..., "id" : 1, "words" : { "foo" : 2 } }
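One caveat with the hash approach: BSON field names may not contain "." or start with "$", so raw words need sanitizing before being used as keys. A hypothetical helper:
// Replace characters that are illegal in BSON field names
// with full-width lookalikes (a common convention, not a standard)
function wordKey(word) {
    return word.replace(/\./g, '\uFF0E').replace(/^\$/, '\uFF04');
}

var update = { $inc: {} };
update.$inc['words.' + wordKey('foo.bar')] = 1;  // safe key under "words"
db.bar.update({ id: 1 }, update);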
Unfortunately it is not possible to do this in a single update with your schema. Your schema is a bit questionable and should probably be converted to a dedicated collection of word counters, e.g.:
db.users {_id:###, username:'Foo'}
db.words.counters {_id:###, word:'Word', userId: ###, count: 1}
That will avoid quite a few issues, such as:
Running into maximum document size limits
Forcing mongo to keep moving around your documents as you increase their size
Both scenarios require two updates to do what you want, which introduces atomicity issues. Updating per word by looping through word_array is better and safer (and is possible with both solutions).
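With that layout, the per-word loop becomes an atomic upsert per counter document; a sketch assuming the words.counters collection above and a word_array/userId coming from the application:
// Each word gets its own counter document, created on first sight
word_array.forEach(function(word) {
    db.words.counters.update(
        { "userId": userId, "word": word },
        { "$inc": { "count": 1 } },
        { "upsert": true }
    );
});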