Pymongo Query Filtering a versioned data - mongodb

Here's some sample data from my database:
{'data':
[{'is_active': True, 'printer_name': 'A', 'vid': 14510},
{'is_active': True, 'printer_name': 'Changed A', 'vid': 14511}
]
},
{'data':
[{'is_active': False, 'printer_name': 'B', 'vid': 14512}]
}
The vid field here is the version id. Whenever a record is edited, the modified data is pushed into the list and it therefore has a higher vid than its old version.
Now I want to define a method called get_all_active_printers which returns all printers with is_active: True.
Here's my attempt, but it returns both printers when it should not return printer B:
def get_all_active_printers():
    return printers.find(
        {'data.is_active': True}, {"data": {"$slice": -1}, '_id': 0, 'vid': 0})
What's wrong with my query?
Edit 1 in response to comment by WanBachtiar
Here's the actual output from using the command print([c for c in get_all_active_printers()])
[{'data': [{'printer_name': 'Changed A', 'vid': 1451906336.6602068, 'is_active': True, 'user_id': ObjectId('566bbf0d680fdc1ac922be4c')}]}, {'data': [{'printer_name': 'B', 'vid': 1451906343.8941162, 'is_active': False, 'user_id': ObjectId('566bbf0d680fdc1ac922be4c')}]}]
As you can see in the actual output - the is_active value for Printer B is False, but get_all_active_printers still returns B
Here's my version details:
Python 3.4.3
pymongo 3.2
mongodb 2.4.9
On Ubuntu 14.04, if that matters.
Edit 2
I noticed yet another issue: the query returns the vid field, even though I have clearly specified 'vid': 0 in the projection.
Edit 3
I am not sure what you mean when you say
"make sure that there is no other documents for {'printer_name': 'B'}".
Yes, the second document (for printer B) has a second row. That was the first version, from when the printer was created and is_active was true. It later became false. Here's the snapshot of the database:
But I want to filter on the latest data as old data is only for keeping an audit trail.
If I move 'data.is_active': True to the projection, as in the following code:
def get_all_active_printers():
    return printers.find(
        {}, {'data': {'$slice': -1}, 'data.is_active': True, '_id': 0, 'vid': 0})
I get the following error message:
pymongo.errors.OperationFailure: database error: You cannot currently
mix including and excluding fields. Contact us if this is an issue.
So how do I filter based on the latest data, given the snapshot above? Sorry if my question did not make this clear earlier.

Thanks for clarifying the question.
So you want to query for documents whose latest element has is_active: True.
Unfortunately, in your case find({'data.is_active': True}) matches every document in which any element of data has is_active: True, not just the last element of the array. Also, without knowing the length of the array, you cannot reference its last element with the array.<index> dot notation.
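To make the difference concrete, here is a minimal in-memory sketch in plain Python (no MongoDB involved). It contrasts "any element matches", which is how the dotted filter behaves on an array, with "the last element matches", which is what the question needs. The helper names and the second version of printer B (vid 14513) are assumptions made up for illustration.

```python
# Sample documents shaped like the ones in the question. Printer B is assumed
# to have been active when created and deactivated later (see Edit 3).
docs = [
    {"data": [
        {"is_active": True, "printer_name": "A", "vid": 14510},
        {"is_active": True, "printer_name": "Changed A", "vid": 14511},
    ]},
    {"data": [
        {"is_active": True, "printer_name": "B", "vid": 14512},
        {"is_active": False, "printer_name": "B", "vid": 14513},
    ]},
]


def matches_any_version(doc):
    """Mimics {'data.is_active': True}: true if ANY array element is active."""
    return any(version["is_active"] for version in doc["data"])


def matches_latest_version(doc):
    """What the question needs: only the LAST array element is inspected."""
    return bool(doc["data"]) and doc["data"][-1]["is_active"]


any_match = [d["data"][-1]["printer_name"] for d in docs if matches_any_version(d)]
latest_match = [d["data"][-1]["printer_name"] for d in docs if matches_latest_version(d)]
```

Here any_match contains both printers because B's first version was active, while latest_match contains only "Changed A".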
However, there are alternatives:
Update using $push, $each and $position to insert new elements to the front of the array. Mongo Shell example:
/* When updating */
db.printers.update(
    /* Filter document for printer B */
    {"data.printer_name": 'B'},
    /* Push a new document to the front of the array */
    {"$push": {
        "data": {
            $each: [{'is_active': false, 'printer_name': "B", 'vid': 14513 }],
            $position: 0
        }
    }}
);
/* When querying, you now know that the latest version is at index 0 */
db.printers.find(
    {"data.0.is_active": true},
    {"data": { $slice: 1} }
);
Note that $position is new in MongoDB v2.6.
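Since the question uses PyMongo, the same idea might look roughly like the sketch below. It assumes printers is a pymongo Collection (PyMongo 3.0 or later for update_one) and a server that supports $position. The function and constant names are made up for illustration.

```python
def build_prepend_update(new_version):
    """Update document that pushes new_version to the FRONT of 'data'."""
    return {"$push": {"data": {"$each": [new_version], "$position": 0}}}


# With newest-first ordering, the latest version is always element 0.
LATEST_ACTIVE_FILTER = {"data.0.is_active": True}
# Return only the first (latest) element of the array.
LATEST_ONLY_PROJECTION = {"data": {"$slice": 1}, "_id": 0}


def add_version(printers, printer_filter, new_version):
    """printers is assumed to be a pymongo Collection."""
    return printers.update_one(printer_filter, build_prepend_update(new_version))


def get_all_active_printers(printers):
    """Return printers whose latest (first) version is active."""
    return list(printers.find(LATEST_ACTIVE_FILTER, LATEST_ONLY_PROJECTION))
```

This only works if every write goes through add_version, so that element 0 really is the newest version. Arrays written earlier in append order would need a one-off migration first.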
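Use MongoDB aggregation to $unwind the array, $sort by vid, $group, and then $match to filter. For example:
db.printers.aggregate([
    {$unwind: '$data'},
    /* Order versions oldest to newest so that $last picks the latest one */
    {$sort: {'data.vid': 1}},
    /* Group by document only: also grouping by printer_name would put a
       renamed printer ('A' and 'Changed A') into two separate groups */
    {$group: {
        '_id': '$_id',
        'printer_name': {$last: '$data.printer_name'},
        'vid': {$max: '$data.vid'},
        'is_active': {$last: '$data.is_active'}
    }},
    {$match: {"is_active": true}}
]);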
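In PyMongo the aggregation approach could be sketched as follows. This version groups by the document _id only, so that all versions of one printer land in the same group even if it was renamed. It assumes printers is a pymongo Collection. The names are illustrative.

```python
ACTIVE_PRINTERS_PIPELINE = [
    # One output document per array element (per version).
    {"$unwind": "$data"},
    # Oldest to newest, so that $last below refers to the latest version.
    {"$sort": {"data.vid": 1}},
    # Collapse the versions of each document back into a single row.
    {"$group": {
        "_id": "$_id",
        "printer_name": {"$last": "$data.printer_name"},
        "vid": {"$last": "$data.vid"},
        "is_active": {"$last": "$data.is_active"},
    }},
    # Keep only printers whose latest version is active.
    {"$match": {"is_active": True}},
]


def get_all_active_printers(printers):
    """printers is assumed to be a pymongo Collection."""
    return list(printers.aggregate(ACTIVE_PRINTERS_PIPELINE))
```

Each result is a flat document with _id, printer_name, vid and is_active rather than the original nested data array.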
It may be beneficial for you to re-consider the document schema. For example, instead of having an array of documents maybe you should consider having them flat for ease of querying.
{'is_active': true, 'printer_name': 'A', 'vid': 14512}
{'is_active': false, 'printer_name': 'B', 'vid': 14513}
For more examples and discussions of different version tracking schema designs, see these blog posts:
How to track versions with MongoDB
Further thought on how to track versions with MongoDB
Also a useful reference on schema designs : Data Modeling Introduction.
The query is returning vid field, even though have clearly mentioned
'vid': 0 in the projection.
You could hide it with "data.vid": 0 instead of vid:0.
If i move 'data.is_active': True to the projections as in the following code... I get the following error message.
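Projections have rules that you have to follow: apart from _id, a single projection cannot mix included fields (such as 'data.is_active': True) with excluded fields (such as 'vid': 0). Please see projecting fields from query results for more information on projections.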
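The error comes from combining an included field with an excluded one in the same projection; only _id may be excluded alongside inclusions. The helper below is a rough approximation of that rule in plain Python, not the server's actual validation logic. Its name is made up for illustration.

```python
def mixes_inclusion_and_exclusion(projection):
    """Simplified check: True if a projection both includes and excludes fields.

    '_id' is ignored because it may be excluded alongside inclusions, and
    operator values such as {'$slice': -1} are skipped as well.
    """
    modes = set()
    for field, value in projection.items():
        if field == "_id" or isinstance(value, dict):
            continue
        modes.add("include" if value else "exclude")
    return len(modes) > 1


# The projection from Edit 3 includes data.is_active and excludes vid.
rejected = {"data": {"$slice": -1}, "data.is_active": True, "_id": 0, "vid": 0}
# Exclusions only (plus $slice) do not mix modes.
accepted = {"data": {"$slice": -1}, "_id": 0, "data.vid": 0}
```

With these inputs, mixes_inclusion_and_exclusion(rejected) is True and mixes_inclusion_and_exclusion(accepted) is False.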
If you are starting a new project, I would recommend to use the latest stable release of MongoDB, currently it is v3.2.0.
Regards,
Wan.

Related

Mongo best practice to structure nested document array

I've been struggling to find a solution to the following problem and seem to get conflicting advice from various mongodb posts. I am trying to figure out how to correctly represent an "array" of sub-objects such that:
they can be upserted (i.e. updated or new element created if needed, in a single operation)
the ids of the objects are available as values that can be searched, not just keys (that you can't really search in mongo).
I have a structure that I can represent as an array (repr A):
{
_id: 1,
subdocs: [
{ sd_id: 1, title: t1 },
{ sd_id: 2, title: t2 },
...
]
}
or as a nested document (repr B)
{
_id: 1,
subdocs: {
1: { title: t1 },
2: { title: t2 },
...
}
}
I would like to be able to update OR insert (i.e. upsert) new subdocs without having to use extra in-application logic.
In repr B this is straightforward, as I can simply use $set
{$set: {"subdocs.3.title": "t3"}}
in an update with upsert: true.
In repr A it is possible to update an existing record using arrayFilters with something like:
update({_id: 1}, {$set: {"subdocs.$[i].title": "t3"}}, {arrayFilters: [{"i.sd_id": 3}], upsert: true})
The problem is that while the above will update an existing subobject, it will not create a new subobject (i.e. with sd_id: 3) if it does not exist (it is not an upsert). The docs claim that $[] does support upsert but this does not work for me.
While repr B does allow for update/upserts there is no way to search on the ids of the subdocuments because they are now keys rather than values.
The only solution to the above is to use a denormalized representation with e.g. the id being stored as both a key and a value:
subdocs: {
1: { sd_id: 1, title: t1 },
2: { sd_id: 2, title: t2 },
...
}
But this seems precarious (because the values might get out of sync).
So my question is whether there is a way around this? Am I perhaps missing a way to do an upsert in case A?
UPDATE: I found a workaround that lets me effectively use repr A, even though I'm not sure it's optimal. It involves using two writes rather than one:
update({_id: 1, "subdocs.sd_id": {$ne: 3}}, {$push: {subdocs: {sd_id: 3}}})
update({_id: 1}, {$set: {"subdocs.$[i].title": "t3"}}, {arrayFilters: [{"i.sd_id": 3}]})
The first line ensures that we only ever insert one subdoc with sd_id 3 (it only has an effect if the id does not exist), while the second line updates the record (which should now definitely exist). I can probably put these in an ordered bulkWrite to make it all work.
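For readers using Python, the two-write workaround could be sketched as below. It assumes collection is a pymongo Collection, with PyMongo 3.6+ and MongoDB 3.6+ for the array_filters argument. The function names are invented for this example.

```python
def build_subdoc_upsert_steps(doc_id, sd_id, title):
    """Return the two updates from the workaround as plain dictionaries."""
    ensure_exists = {
        # Matches only if no subdoc with this sd_id exists yet.
        "filter": {"_id": doc_id, "subdocs.sd_id": {"$ne": sd_id}},
        "update": {"$push": {"subdocs": {"sd_id": sd_id}}},
        "array_filters": None,
    }
    set_title = {
        "filter": {"_id": doc_id},
        "update": {"$set": {"subdocs.$[i].title": title}},
        "array_filters": [{"i.sd_id": sd_id}],
    }
    return [ensure_exists, set_title]


def upsert_subdoc(collection, doc_id, sd_id, title):
    """collection is assumed to be a pymongo Collection; steps run in order."""
    for step in build_subdoc_upsert_steps(doc_id, sd_id, title):
        collection.update_one(
            step["filter"], step["update"], array_filters=step["array_filters"]
        )
```

The two updates are separate round trips, so they are not applied atomically as a pair. The $ne guard in the first step is what keeps a repeated call from pushing a duplicate subdoc.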

Conditional update of multiple entries in mongodb?

I am a relative newbie in mongodb but have lots of experience in MySQL.
I want to do a simple query that would appear in MySQL as follows:
UPDATE {database}.{table} SET {FIELD1}='{VALUE1}' WHERE {FIELD2} = {VALUE2};
e.g.
UPDATE test.user SET email='test@acc.ie' WHERE ref='12';
I don't want to destroy the database or collection on my first attempt.
I have only ever run SELECT-type queries on mongo, edited individual JSON entries, or dropped an entire database. A select in mongodb looks like the following:
db.getCollection('user').find({email : "test@acc.ie"})
Is the following correct based on the MySQL example?
db.user.update({ref : "12"}, {$set: {email: test@ac.ie}}, { multi: true })
Because this is the response I am getting:
Updated 0 record(s) in 141ms
The syntax looks OK, but the string value being set must be quoted, and the filter value must match the stored type of ref (the examples below assume it is a number rather than the string "12"). Change it to
db.user.update({ref: 12}, {$set: {email: 'test@ac.ie'}}, { multi: true })
or
db.getCollection('user').update({ref: 12}, {$set: {email: 'test@ac.ie'}}, { multi: true })
Note: a value should be quoted if it's a string and unquoted if it's a number. The manual is the best reference for the details.
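For comparison, a PyMongo version of the same multi-document update might look like this sketch. It assumes users is a pymongo Collection (PyMongo 3.0+). The function name and the example address are placeholders.

```python
def set_email_for_ref(users, ref, email):
    """Set email on every document whose ref equals the given value.

    users is assumed to be a pymongo Collection. The type of ref matters:
    12 and "12" are different values as far as the filter is concerned.
    """
    result = users.update_many({"ref": ref}, {"$set": {"email": email}})
    # matched_count: documents selected by the filter;
    # modified_count: documents whose content actually changed.
    return result.matched_count, result.modified_count
```

If a call such as set_email_for_ref(users, 12, "user@example.com") reports zero matches, trying the other type for ref (the string "12") is a quick way to check for a type mismatch.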

Meteor collection get last document of each selection

Currently I use the following find query to get the latest document of a certain ID
Conditions.find({
    caveId: caveId
},
{
    sort: {diveDate: -1},
    limit: 1,
    fields: {caveId: 1, "visibility.visibility": 1, diveDate: 1}
});
How can I do the same for multiple ids, using $in for example?
I tried it with the following query. The problem is that it limits the result to 1 document across all the found caveIds, whereas the limit should apply to each caveId separately.
Conditions.find({
    caveId: {$in: caveIds}
},
{
    sort: {diveDate: -1},
    limit: 1,
    fields: {caveId: 1, "visibility.visibility": 1, diveDate: 1}
});
One solution I came up with is using the aggregate functionality.
var conditionIds = Conditions.aggregate(
    [
        {"$match": { caveId: {"$in": caveIds}}},
        // Sort by date first so that $last picks the most recent dive in each group
        {"$sort": { diveDate: 1 }},
        {
            $group:
            {
                _id: "$caveId",
                conditionId: {$last: "$_id"},
                diveDate: { $last: "$diveDate" }
            }
        }
    ]
).map(function(child) { return child.conditionId});
var conditions = Conditions.find({
    _id: {$in: conditionIds}
},
{
    fields: {caveId: 1, "visibility.visibility": 1, diveDate: 1}
});
You don't want to use $in here as noted. You could solve this problem by looping through the caveIds and running the query on each caveId individually.
You're basically looking at a join query here: you need all caveIds and then look up the last dive for each.
In my opinion (and it is only an opinion) this is a problem of database schema/denormalization:
You could, as mentioned here, look up all caveIds and then run a single query for each, every single time you need to look up the last dives.
However, I think you are much better off recording/updating the last dive inside your cave document, and then looking up all caveIds of interest, pulling only the lastDive field.
That will give you immediately what you need, rather than going through expensive search/sort queries. This is at the expense of maintaining that field in the document, but it sounds like it should be fairly trivial as you only need to update the one field when a new event occurs.
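The "latest document per caveId" logic that the options above are meant to provide can be sketched in plain Python on in-memory data. This is only an illustration of the grouping, not Meteor or MongoDB code. The sample records and the function name are made up.

```python
from datetime import date


def latest_condition_per_cave(conditions):
    """Return {caveId: condition} keeping the newest diveDate per cave."""
    latest = {}
    for condition in conditions:
        cave_id = condition["caveId"]
        current = latest.get(cave_id)
        if current is None or condition["diveDate"] > current["diveDate"]:
            latest[cave_id] = condition
    return latest


# Hypothetical sample data using the field names from the question.
conditions = [
    {"_id": "c1", "caveId": "cave-a", "diveDate": date(2016, 1, 3)},
    {"_id": "c2", "caveId": "cave-a", "diveDate": date(2016, 2, 9)},
    {"_id": "c3", "caveId": "cave-b", "diveDate": date(2016, 1, 20)},
]
latest = latest_condition_per_cave(conditions)
latest_ids = sorted(c["_id"] for c in latest.values())
```

Maintaining a lastDive field on the cave document amounts to keeping this dictionary up to date at write time instead of recomputing it on every read.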

How can I get all the doc ids in MongoDB?

How can I get an array of all the doc ids in MongoDB? I only need a set of ids but not the doc contents.
You can do this in the Mongo shell by calling map on the cursor like this:
var a = db.c.find({}, {_id:1}).map(function(item){ return item._id; })
The result is that a is an array of just the _id values.
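A PyMongo equivalent of the shell one-liner could be sketched as follows, assuming collection is a pymongo Collection. The function name is illustrative.

```python
def get_all_ids(collection):
    """Return a list containing the _id of every document.

    collection is assumed to be a pymongo Collection. Projecting only _id
    keeps the document bodies from being transferred.
    """
    return [doc["_id"] for doc in collection.find({}, {"_id": 1})]
```

For a very large collection it may be preferable to iterate over the cursor lazily rather than build the whole list in memory.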
The way it works in Node is similar.
(This is MongoDB Node driver v2.2, and Node v6.7.0)
db.collection('...')
.find(...)
.project( {_id: 1} )
.map(x => x._id)
.toArray();
Remember to put map before toArray: this map is NOT the JavaScript Array map function but the cursor's own map method, which transforms each document as the cursor is read, so toArray already returns the mapped values.
One way is to simply use the runCommand API (in this example the collection itself is named distinct).
db.runCommand ( { distinct: "distinct", key: "_id" } )
which gives you something like this:
{
"values" : [
ObjectId("54cfcf93e2b8994c25077924"),
ObjectId("54d672d819f899c704b21ef4"),
ObjectId("54d6732319f899c704b21ef5"),
ObjectId("54d6732319f899c704b21ef6"),
ObjectId("54d6732319f899c704b21ef7"),
ObjectId("54d6732319f899c704b21ef8"),
ObjectId("54d6732319f899c704b21ef9")
],
"stats" : {
"n" : 7,
"nscanned" : 7,
"nscannedObjects" : 0,
"timems" : 2,
"cursor" : "DistinctCursor"
},
"ok" : 1
}
However, there's an even nicer way using the actual distinct API:
var ids = db.distinct.distinct('_id', {}, {});
which just gives you an array of ids:
[
ObjectId("54cfcf93e2b8994c25077924"),
ObjectId("54d672d819f899c704b21ef4"),
ObjectId("54d6732319f899c704b21ef5"),
ObjectId("54d6732319f899c704b21ef6"),
ObjectId("54d6732319f899c704b21ef7"),
ObjectId("54d6732319f899c704b21ef8"),
ObjectId("54d6732319f899c704b21ef9")
]
Not sure about the first version, but the latter is definitely supported in the Node.js driver (which I saw you mention you wanted to use). That would look something like this:
db.collection('c').distinct('_id', {}, {}, function (err, result) {
// result is your array of ids
})
I was also wondering how to do this with the MongoDB Node.js driver, like @user2793120. Someone else suggested iterating through the results with .each, which seemed highly inefficient to me. I used MongoDB's aggregation instead:
myCollection.aggregate([
{$match: {ANY SEARCHING CRITERIA FOLLOWING $match'S RULES} },
{$sort: {ANY SORTING CRITERIA, FOLLOWING $sort'S RULES}},
{$group: {_id:null, ids: {$addToSet: "$_id"}}}
]).exec()
The sorting phase is optional. The match one as well if you want all the collection's _ids. If you console.log the result, you'd see something like:
[ { _id: null, ids: [ '56e05a832f3caaf218b57a90', '56e05a832f3caaf218b57a91', '56e05a832f3caaf218b57a92' ] } ]
Then just use the contents of result[0].ids somewhere else.
The key part here is the $group section. You must define a value of null for _id (otherwise, the aggregation will crash), and create a new array field with all the _ids. If you don't mind having duplicated ids (according to your search criteria used in the $match phase, and assuming you are grouping a field other than _id which also has another document _id), you can use $push instead of $addToSet.
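The pipeline described above, with its optional stages and the choice between $addToSet and $push, can be assembled as in the following Python sketch. The builder function and its parameters are hypothetical. The resulting list is what would be passed to a driver's aggregate call.

```python
def build_ids_pipeline(match=None, sort=None, allow_duplicates=False):
    """Build an aggregation pipeline that collects _id values into one array.

    match and sort are optional stage bodies; allow_duplicates switches the
    accumulator from $addToSet to $push.
    """
    pipeline = []
    if match is not None:
        pipeline.append({"$match": match})
    if sort is not None:
        pipeline.append({"$sort": sort})
    accumulator = "$push" if allow_duplicates else "$addToSet"
    # _id: None puts every document into a single group.
    pipeline.append({"$group": {"_id": None, "ids": {accumulator: "$_id"}}})
    return pipeline
```

Because everything ends up in a single result document, this approach is bounded by the 16 MB BSON document limit, so it suits moderately sized result sets.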
Another way to do this on mongo console could be:
var arr=[]
db.c.find({},{_id:1}).forEach(function(doc){arr.push(doc._id)})
printjson(arr)
Hope that helps!!!
Thanks!!!
I struggled with this for a long time, and I'm answering this because I've got an important hint. It seemed obvious that:
db.c.find({},{_id:1});
would be the answer.
It worked, sort of. It would find the first 101 documents and then the application would pause. I didn't let it keep going. This was both in Java using MongoOperations and also on the Mongo command line.
I looked at the mongo logs and saw it was doing a collection scan (COLLSCAN) on a big collection of big documents. I thought: crazy, I'm projecting the _id, which is always indexed, so why would it attempt a collection scan?
I have no idea why it would do that, but the solution is simple:
db.c.find({},{_id:1}).hint({_id:1});
or in Java:
query.withHint("{_id:1}");
Then it was able to proceed along as normal, using stream style:
createStreamFromIterator(mongoOperations.stream(query, MortgageDocument.class)).
map(MortgageDocument::getId).forEach(transformer);
Mongo can do some good things and it can also get stuck in really confusing ways. At least that's my experience so far.
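The same hint can be given from PyMongo. The sketch below assumes collection is a pymongo Collection and streams the ids lazily, in the spirit of the Java stream version above. The function name is made up.

```python
def iter_ids_with_hint(collection):
    """Yield every _id, asking the server to use the _id index.

    collection is assumed to be a pymongo Collection. Cursor.hint accepts
    an index specification as a list of (field, direction) pairs.
    """
    cursor = collection.find({}, {"_id": 1}).hint([("_id", 1)])
    for doc in cursor:
        yield doc["_id"]
```

Whether the hint is needed depends on the server version and the query plan it picks, so checking the plan with explain first is advisable.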
Try an aggregation pipeline, like this:
db.collection.aggregate([
    { $match: { deletedAt: null }},
    { $group: { _id: "$_id"}}
])
This returns an array of documents with this structure:
_id: ObjectId("5fc98977fda32e3458c97edd")
I had a similar requirement to get the ids for a collection with 50+ million rows. I tried many ways. The fastest way to get the ids turned out to be running mongoexport with just the ids.
One of the above examples worked for me, with a minor tweak. I left out the second object, as I was using it with my Mongoose schema.
// With await, no callback is needed; the promise resolves to the array of ids
const idArray = await Model.distinct('_id', {});

insert to specific index for mongo array

Mongo supports arrays of documents inside documents. For example, something like
{_id: 10, "coll": [1, 2, 3] }
Now, imagine I wanted to insert an arbitrary value at an arbitrary index
{_id: 10, "coll": [1, {name: 'new val'}, 2, 3] }
I know you can update values in place with $ and $set, but there is nothing for insertion. It kind of sucks to have to replace the entire array just to insert at a specific index.
Starting with version 2.6 you finally can do this. You have to use the $position operator. For your particular example:
db.students.update(
    { _id: 10},
    { $push: {
        coll: {
            $each: [ {name: 'new val'} ],
            $position: 1
        }
    }}
)
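As a rough Python illustration, the helper below builds the same kind of update document, and a second helper emulates locally what $position does to the array for non-negative positions. Both function names are invented. Only the update document would be sent to MongoDB.

```python
def build_insert_at(field, values, position):
    """Update document that inserts values into field at the given index."""
    return {"$push": {field: {"$each": list(values), "$position": position}}}


def insert_at(array, values, position):
    """Local emulation of $push with $each and a non-negative $position."""
    return array[:position] + list(values) + array[position:]


update = build_insert_at("coll", [{"name": "new val"}], 1)
result = insert_at([1, 2, 3], [{"name": "new val"}], 1)
```

Here result is [1, {'name': 'new val'}, 2, 3], matching the desired document in the question.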
The following will do the trick:
var insertPosition = 1;
var newItem = {name: 'new val'};
db.myCollection.find({_id: 10}).forEach(function(item)
{
    // Rebuild the array with newItem spliced in at insertPosition
    item.coll = item.coll.slice(0, insertPosition).concat(newItem, item.coll.slice(insertPosition));
    db.myCollection.save(item);
});
If the insertPosition is variable (i.e., you don't know exactly where you want to insert it, but you know you want to insert it after the item with name = "foo"), just add a for() loop before the item.coll = assignment to find the insertPosition (and add 1 to it, since you want to insert it AFTER name = "foo").
Handy answer (not selected answer, but highest rated) from this similar post:
Can you have mongo $push prepend instead of append?
utilizes $set to insert 3 at the first position in an array, called "array". Sample from related answer by Sergey Nikitin:
db.test.update({"_id" : ObjectId("513ad0f8afdfe1e6736e49eb")},
{'$set': {'array.-1': 3}})
Regarding your comment:
Well.. with concurrent users this is going to be problematic with any database...
What I would do is the following:
Add a last-modified timestamp to the document. Load the document, let the user modify it, and use the timestamp as a filter when you update the document, updating the timestamp in the same step. If the update matches 0 documents, you know the document was modified in the meantime and you can ask the user to reload it.
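That optimistic check might be sketched in Python as follows. It assumes collection is a pymongo Collection and that the timestamp lives in a field called last_modified. Both the field name and the function name are assumptions for illustration.

```python
def save_if_unchanged(collection, doc_id, loaded_timestamp, changes, new_timestamp):
    """Apply changes only if the document still carries loaded_timestamp.

    Returns True when the update was applied, and False when the document
    was modified (or removed) since it was loaded.
    """
    # Set the new field values and bump the timestamp in the same update.
    update = {"$set": dict(changes, last_modified=new_timestamp)}
    result = collection.update_one(
        {"_id": doc_id, "last_modified": loaded_timestamp}, update
    )
    return result.matched_count == 1
```

Because the filter and the timestamp bump are part of one update on a single document, two users who loaded the same version cannot both succeed.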
Using the $position operator this can be done starting from version 2.5.3.
It must be used with the $each operator. From the documentation:
db.collection.update( <query>,
    { $push: {
        <field>: {
            $each: [ <value1>, <value2>, ... ],
            $position: <num>
        }
    }}
)