PyMongo query field of documents - mongodb

In my DynamoDB every document has several fields, one of the fields is a document called "engines" that holds several documents (all the engines) that hold several fields, as the picture shows below:
I would like to get all the couples of (engine,definitions) that their definition date is greater than a specific date.
I tried:
cursor=collection.find(
{'engines': { "$elemMatch" :
{ "definitions" :
{'$gt': startdate} } } }
,{'engines':{'$elemMatch':1}},{'engines':{'$elemMatch':{'definitions':1}}} )
but I get:
TypeError: skip must be an instance of int
Can someone help with the query?

You've mixed up the closing } and ended up passing {'engines':{'$elemMatch':{'definitions':1}}} as a skip argument value.
I think you meant:
cursor = collection.find(
{
'engines': {
"$elemMatch": {
"definitions": {
'$gt': startdate
}
}
}
},
{
'engines': {
'$elemMatch': {
'definitions': 1
}
}
}
)

Related

mongodb query: nested elemMatch

Currently, that's my current document:
{
_id: 'd283015f-91e9-4404-9202-093c28d6a931',
referencedGeneralPractitioner: [
{
resourceType: 'practitioner',
cachedIdentifier: [
{
system: { value: 'urn:oid:1.3.6.1.4.1.19126.3' },
value: { value: '14277399B' }
}
]
}
]
}
Here, there's two nested objects arrays: referencedGeneralPractitioner[{cachedIdentifier[{}]}].
Currently, I'm getting results using this query:
{
"referencedGeneralPractitioner":{
"$elemMatch":{
"cachedIdentifier.value.value":"14277399B",
"cachedIdentifier.system.value":"urn:oid:1.3.6.1.4.1.19126.3"
}
}
}
It's getting my desired document, but I don't quite figure out if above query is which I'm really looking for.
I mean, I'm only applying $elemMatch on referencedGeneralPractitioner field array.
Is it really enought?
Should I add a nested $elemMatch on cachedIdentifier?
Any ideas?
It looks like you need to query it like this:
db.collection.find({
"referencedGeneralPractitioner.cachedIdentifier": {
"$elemMatch": {
"value.value": "14277399B",
"system.value": "urn:oid:1.3.6.1.4.1.19126.3"
}
}
})
playground
This is in case you need to find the full document having $and of both values in same element in any of the elements in the nested array , if you need to extract specific element you will need to $filter
if you need to search also based on element in the 1st array level then you need to modify as follow:
{
"referencedGeneralPractitioner": {
"$elemMatch": {
resourceType: 'practitioner',
"cachedIdentifier": {
"$elemMatch": {
"value.value": 1,
"system.value":2
}
}
}
}
}
This will give you all full documents where at same time there is resouceType:"practitioner" and { value.value:3 and system.value: 2 }
Also is important to stress that this will not gona work correctly!:
{
"referencedGeneralPractitioner":{
"$elemMatch":{
"cachedIdentifier.value.value":"14277399B",
"cachedIdentifier.system.value":"urn:oid:1.3.6.1.4.1.19126.3"
}
}
}
Since it will match false positives based on any single value in the nested elements like:
wrong playground

How to project whether field exists

If I have documents with similar structure as below. I am updating them with the results of the computations and I want to know whether the result has already been inserted into a document or not. Let's say for each document I run computation 'c' and computation 'd'. Now I want to display a table of all documents and show whether a computation 'd' has been already carried out. And for this table I do not care about computation 'c'.
{
"_id":1
"a":1,
"resultsOfComputation":{
"c":{large embedded document},
"d":{large embedded document}
}
}
{
"_id":2
"a":1,
"resultsOfComputation":{
"c":{large embedded document}
}
}
I would like to get a result that tells me whether a document contains a specific field. For example, I would like to know whether it contains field "resultsOfComputation.d", no matter what is the value of that field.
An example of the result of the query for "resultsOfComputation.d" would be:
{
"_id":1
"a":1,
"resultsOfComputation":{
"d":true
}
}
{
"_id":2
"resultsOfComputation":{
"d":false
}
}
If "resultsOfComputation.d" is not in the document it can also be undefined, which is also ok:
{
"_id":1
"a":1,
"resultsOfComputation":{
"d":true
}
}
{
"_id":2
"a":1,
"resultsOfComputation":{}
}
In general, the idea is to get all the root elements of the documents, but only true/false/undefined for the selected (one) result of computation, since the result of computation is a large embedded document.
Run the following aggregation pipeline to get the desired results:
db.collection.aggregate([
{
"$project": {
"a": 1,
"resultsOfComputation": {
"d": { "$gt": ["$resultsOfComputation.d", null] }
}
}
}
])
Sample Output
/* 1 */
{
"_id" : 1,
"a" : 1,
"resultsOfComputation" : {
"d" : true
}
}
/* 2 */
{
"_id" : 2,
"a" : 1,
"resultsOfComputation" : {
"d" : false
}
}

Finding Documents Based on Date inside Array

I've got a schema similar to this:
{
'policy': [
{ 'date' : { type: Date } }
]
}
I'm attempting to find these based on the first policy's date, based on a range. At first when my queries like this were silently failing (i.e. returning no results):
{
'$and' : [
{ 'policy.0.date' : { '$gt' : <lower-bound Date> } },
{ 'policy.0.date' : { '$lt' : <higher-bound Date> } }
]
}
I wasn't getting any errors, and I knew there were records there, so I pared it down to looking for all who were above a certain date:
{ 'policy.0.date' : { $gt : startDate } }
After attempting that, I got this error:
[Error: Can't use $gt with Array.]
I've even tried to querying without being based on the first element { 'policy.date' : { $gt : startDate } }, but that returns no records either.
Any ideas on how to query on the date field without resorting to $where?
It would seem actual DB queries return correctly, the issues lies in the version of mongoose used.

Merge changeset documents in a query

I have recorded changes from an information system in a mongo database. Every time a set of values are set or changed, a record is saved in the mongo database.
The change collection is in the following form:
{ "user_id": 1, "timestamp": { "date" : "2010-09-22 09:28:02", "timezone_type" : 3, "timezone" : "Europe/Paris" } }, "changes: { "fieldA": "valueA", "fieldB": "valueB", "fieldC": "valueC" } }
{ "user_id": 1, "timestamp": { "date" : "2010-09-24 19:01:52", "timezone_type" : 3, "timezone" : "Europe/Paris" } }, "changes: { "fieldA": "new_valueA", "fieldB": null, "fieldD": "valueD" } }
{ "user_id": 1, "timestamp": { "date" : "2010-10-01 11:11:02", "timezone_type" : 3, "timezone" : "Europe/Paris" } }, "changes: { "fieldD": "new_valueD" } }
Of course there are thousands of records per user with different attributes which represent millions of records. What I want to do is to see a user status at a given time. By example, the user_id 1 at 2010-09-30 would be
fieldA: new_valueA
fieldC: valueC
fieldD: valueD
This means I need to flatten all the changes prior to a given date for a given user into a single record. Can I do that directly in mongo ?
Edit: I am using the 2.0 version of mongodb hence cannot benefit from the aggregation framework.
Edit: It sounds I have found the answer to my question.
var mapTimeAndChangesByUserId = function() {
var key = this.user_id;
var value = { timestamp: this.timestamp.date, changes: this.changes };
emit(key, value);
}
var reduceMergeChanges = function(user_id, changeset) {
var mergeFunction = function(a, b) { for (var attr in b) a[attr] = b[attr]; };
var result = {};
changeset.forEach(function(e) { mergeFunction(result, e.changes); });
return { timestamp: changeset.pop().timestamp, changes: result };
}
The reduce function merges the changes in the order they come and returns the result.
db.user_change.mapReduce(
mapTimeAndChangesByUserId,
reduceMergeChanges,
{
out: { inline: 1 },
query: { user_id: 1, "timestamp.date": { $lt: "2010-09-30" } },
sort: { "timestamp.date": 1 }
});
'results' : [
"_id": 1,
"value": {
"timestamp": "2010-09-24 19:01:52",
"changes": {
"fieldA": "new_valueA",
"fieldB": null,
"fieldC": "valueC",
"fieldD": "valueD"
}
}
]
Which is fine to me.
You could write a MR to do this.
Since the fields are a lot like tags you can modify a nice cookbook example of counting tags here: http://cookbook.mongodb.org/patterns/count_tags/ of course instead of counting you want the latest value applied (assumption since this is not clear in your question) for that field.
So lets get our map function:
map = function() {
if (!this.changes) {
// If there were not changes for some reason lets bail this record
return;
}
// We iterate the changes
for (index in this.changes) {
emit(index /* We emit the field name */, this.changes[index] /* We emit the field value */);
}
}
And now for our reduce:
reduce = function(values){
// This part is dependant upon your input query. If you add a sort of
// date (ts) DESC then you will prolly want the first index (0) not the last as
// gathered here by values.length
return values[values.length];
}
And this will output a single document per field change of the type:
{
_id: your_field_ie_fieldA,
value: whoop
}
You can then iterate the end of the (most likely) in line output and, bam, you have your changes.
This is of course one way of dong it and is not designed to be run completely in line to your app, however that all depends on the size of the data your working on; it could be run very close.
I am unsure whether the group and distinct can run on this but it looks like it might: http://docs.mongodb.org/manual/reference/method/db.collection.group/#db-collection-group however I should note that group is basically a MR wrapper but you could do something like (untested just like the MR above):
db.col.group( {
key: { 'changes.fieldA': 1, // the rest of the fields },
cond: { 'timestamp.date': { $gt: new Date( '01/01/2012' ) } },
reduce: function ( curr, result ) { },
initial: { }
} )
But it does require you to define the keys instead of just iterating them programmably (maybe a better way).

Sorting by relevance with MongoDB

I have a collection of documents in the following form:
{ _id: ObjectId(...)
, title: "foo"
, tags: ["bar", "baz", "qux"]
}
The query should find all documents with any of these tags. I currently use this query:
{ "tags": { "$in": ["bar", "hello"] } }
And it works; all documents tagged "bar" or "hello" are returned.
However, I want to sort by relevance, i.e. the more matching tags the earlier the document should occur in the result. For example, a document tagged ["bar", "hello", "baz"] should be higher in the results than a document tagged ["bar", "baz", "boo"] for the query ["bar", "hello"]. How can I achieve this?
MapReduce and doing it client-side is going to be too slow - you should use the aggregation framework (new in MongoDB 2.2).
It might look something like this:
db.collection.aggregate([
{ $match : { "tags": { "$in": ["bar", "hello"] } } },
{ $unwind : "$tags" },
{ $match : { "tags": { "$in": ["bar", "hello"] } } },
{ $group : { _id: "$title", numRelTags: { $sum:1 } } },
{ $sort : { numRelTags : -1 } }
// optionally
, { $limit : 10 }
])
Note the first and third pipeline members look identical, this is intentional and needed. Here is what the steps do:
pass on only documents which have tag "bar" or "hello" in them.
unwind the tags array (meaning split into one document per tags element
pass on only tags exactly "bar" or "hello" (i.e. discard the rest of the tags)
group by title (it could be also by "$_id" or any other combination of original document
adding up how many tags (of "bar" and "hello") it had
sort in descending order by number of relevant tags
(optionally) limit the returned set to top 10.
You could potentially use MapReduce for something like that. You'd process each document in the Map step, figuring out how many tags match the query, and assign a score. Then you could sort based on that score.
http://www.mongodb.org/display/DOCS/MapReduce
Something that complex should be done after querying. Either server-side through db.eval (if your client supports this) or just clientside. Here's an example for what you're looking for.
It will retreive all posts with the tags you specified, then sorts them according to the amount of matches.
remove the db.eva( part and translate it to the language your client uses to query to get the clientside effect (
db.eval(function () {
var tags = ["a","b","c"];
return db.posts.find({tags:{$in:tags}}).toArray().sort(function(a,b){
var matches_a = 0;
var matches_b = 0;
a.tags.forEach(function (tag) {
for (t in tags) {
if (tag == t) {
matches_a++;
} else {
matches_b++;
}
}
});
b.tags.forEach(function(tag) {
for (t in tags) {
if (tag == t) {
matches_b++;
} else {
matches_a++;
}
}
});
return matches_a - matches_b;
});
});