Mongodb: find subdocuments whose fields are in non-canonical order

Mongodb: find subdocuments whose fields are in non-canonical order - mongodb

As I switch to subdoc equality testing, I want to ensure all of my documents' subdocs field orders are in the canonical ordering which I will be testing. I cannot figure out a good query for checking this.
That is, given these records:
{'_id': .., 'doc':{'a': 1, 'b': 2}, ..}
{'_id': .., 'doc':{'a': 11, 'b': 21}, ..}
{'_id': .., 'doc':{'a': 0, 'b': 2}, ..}
...
{'_id': .., 'doc':{'b': 0, 'a': 4}, ..}
If I query with 'doc' : {'a': 4, 'b': 0} I won't find the doc w/ keys in the "wrong" order. I'd like to write a one-time query to find all of these docs, but even with $where I don't see how I can check the order of keys (I guess the where could try querying with the canonical order if need be).

So, I wrote a $where which does it. I'm not sure it's the best solution:
db.mycollection.find({$where: function() {
var keys = Object.keys(this['doc']);
var ref = ['a', 'b'];
for (var i=0; i < ref.length; i++) {
if (keys[i] != ref[i]) return true;
}
return false;
}},
{_id: 1})
Obviously this is overkill if my docs really only had the 2 subfields, but that's the pattern.

Related

Mongo remove document if all elements in array match condition

Docs:
{
_id: 1,
items: [{thing: 5}, {thing: 7}]
}
{
_id: 2,
items: [{thing: 5}, {thing: 11}]
}
I would like to remove all docs from the collection above if all elements in array have "thing" < 10. IE for that case doc 1 should be removed, doc 2 should remain.
Is it possible with a query to find only docs where all elements in the array match with a $lt query?
I tried this:
db.mycollection.remove({items: {$all: [{$elemMatch: {thing: {$lt: 11}}}]}})
However that will remove all docs if any of the elements in the array match the condition.

Use double negative (De-Morgan law):
{items: {$not: {$elemMatch: {thing: {$gte: 11}}}}}

Add a new field with large number of rows to existing collection in Mongodb

I have an existing collection with close to 1 million number of docs, now I'd like to append a new field data to this collection. (I'm using PyMongo)
For example, my existing collection db.actions looks like:
...
{'_id':12345, 'A': 'apple', 'B': 'milk'}
{'_id':12346, 'A': 'pear', 'B': 'juice'}
...
Now I want to append a new column field data to this existing collection:
...
{'_id':12345, 'C': 'beef'}
{'_id':12346, 'C': 'chicken'}
...
such that the resulting collection should look like this:
...
{'_id':12345, 'A': 'apple', 'B': 'milk', 'C': 'beef'}
{'_id':12346, 'A': 'pear', 'B': 'juice', 'C': 'chicken'}
...
I know we can do this with update_one with a for loop, e.g
for doc in values:
collection.update_one({'_id': doc['_id']},
{'$set': {k: doc[k] for k in fields}},
upsert=True
)
where values is a list of dictionary each containing two items, the _id key-value pair and new field key-value pair. fields contains all the new fields I'd like to add.
However, the issue is that I have a million number of docs to update, anything with a for loop is way too slow, is there a way to append this new field faster? something similar to insert_many except it's appending to an existing collection?
===============================================
Update1:
So this is what I have for now,
bulk = self.get_collection().initialize_unordered_bulk_op()
for doc in values:
bulk.find({'_id': doc['_id']}).update_one({'$set': {k: doc[k] for k in fields} })
bulk.execute()
I first wrote a sample dataframe into the db with insert_many, the performance:
Time spent in insert_many: total: 0.0457min
then I use update_one with bulk operation to add extra two fields onto the collection, I got:
Time spent: for loop: 0.0283min | execute: 0.0713min | total: 0.0996min
Update2:
I added an extra column to both the existing collection and the new column data, for the purpose of using left join to solve this. If you use left join you can ignore the _id field.
For example, my existing collection db.actions looks like:
...
{'A': 'apple', 'B': 'milk', 'dateTime': '2017-10-12 15:20:00'}
{'A': 'pear', 'B': 'juice', 'dateTime': '2017-12-15 06:10:50'}
{'A': 'orange', 'B': 'pop', 'dateTime': '2017-12-15 16:09:10'}
...
Now I want to append a new column field data to this existing collection:
...
{'C': 'beef', 'dateTime': '2017-10-12 09:08:20'}
{'C': 'chicken', 'dateTime': '2017-12-15 22:40:00'}
...
such that the resulting collection should look like this:
...
{'A': 'apple', 'B': 'milk', 'C': 'beef', 'dateTime': '2017-10-12'}
{'A': 'pear', 'B': 'juice', 'C': 'chicken', 'dateTime': '2017-12-15'}
{'A': 'orange', 'B': 'pop', 'C': 'chicken', 'dateTime': '2017-12-15'}
...

If your updates are really unique per document there is nothing faster than the bulk write API. Neither MongoDB nor the driver can guess what you want to update so you will need to loop through your update definitions and then batch your bulk changes which is pretty much described here:
Bulk update in Pymongo using multiple ObjectId
The "unordered" bulk writes can be slightly faster (although in my tests they weren't) but I'd still vote for the ordered approach for error handling reasons mainly).
If, however, you can group your changes into specific recurring patterns then you're certainly better off defining a bunch of update queries (effectively one update per unique value in your dictionary) and then issue those each targeting a number of documents. My Python is too poor at this point to write that entire code for you but here's a pseudocode example of what I mean:
Let's say you've got the following update dictionary:
{
key: "doc1",
value:
[
{ "field1", "value1" },
{ "field2", "value2" },
]
}, {
key: "doc2",
value:
[
// same fields again as for "doc1"
{ "field1", "value1" },
{ "field2", "value2" },
]
}, {
key: "doc3",
value:
[
{ "someotherfield", "someothervalue" },
]
}
then instead of updating the three documents separately you would send one update to update the first two documents (since they require the identical changes) and then one update to update "doc3". The more knowledge you have upfront about the structure of your update patterns the more you can optimize that even by grouping updates of subsets of fields but that's probably getting a little complicated at some point...
UPDATE:
As per your below request let's give it a shot.
fields = ['C']
values = [
{'_id': 'doc1a', 'C': 'v1'},
{'_id': 'doc1b', 'C': 'v1'},
{'_id': 'doc2a', 'C': 'v2'},
{'_id': 'doc2b', 'C': 'v2'}
]
print 'before transformation:'
for doc in values:
print('_id ' + doc['_id'])
for k in fields:
print(doc[k])
transposed_values = {}
for doc in values:
transposed_values[doc['C']] = transposed_values.get(doc['C'], [])
transposed_values[doc['C']].append(doc['_id'])
print 'after transformation:'
for k, v in transposed_values.iteritems():
print k, v
for k, v in transposed_values.iteritems():
collection.update_many({'_id': { '$in': v}}, {'$set': {'C': k}})

Since your join collection having less documents, you can convert the dateTime to date
db.new.find().forEach(function(d){
d.date = d.dateTime.substring(0,10);
db.new.update({_id : d._id}, d);
})
and do multiple field lookup based on date (substring of dateTime) and _id,
and out to a new collection (enhanced)
db.old.aggregate(
[
{$lookup: {
from : "new",
let : {id : "$_id", date : {$substr : ["$dateTime", 0, 10]}},
pipeline : [
{$match : {
$expr : {
$and : [
{$eq : ["$$id", "$_id"]},
{$eq : ["$$date", "$date"]}
]
}
}},
{$project : {_id : 0, C : "$C"}}
],
as : "newFields"
}
},
{$project : {
_id : 1,
A : 1,
B : 1,
C : {$arrayElemAt : ["$newFields.C", 0]},
date : {$substr : ["$dateTime", 0, 10]}
}},
{$out : "enhanced"}
]
).pretty()
result
> db.enhanced.find()
{ "_id" : 12345, "A" : "apple", "B" : "milk", "C" : "beef", "date" : "2017-10-12" }
{ "_id" : 12346, "A" : "pear", "B" : "juice", "C" : "chicken", "date" : "2017-12-15" }
{ "_id" : 12347, "A" : "orange", "B" : "pop", "date" : "2017-12-15" }
>

MongoDB Composite Key query with one field unknown

I have a database that has a unique combination of two fields (x and i) for every entry. So I have set the _id field to be {_id: {a: x, b: i}}. Now I want to retrieve all values that have a certain value x but that have any value for i.
Example:
{_id: {a: 1, b: 5}},
{_id: {a: 1, b: 3}},
{_id: {a: 2, b: 5}}
{_id: {a: 3, b: 3}}
Now I want to do something like: db.find({_id: {a: 1, b: { $exists: true}}) or even easier: db.find({_id: {a: 1}) that should return:
{_id: {a: 1, b: 5}},
{_id: {a: 1, b: 3}}
Is there any way I could achieve this? Or in other words can you query in any way on this composite primary key? Currently I added the fields to the object itself but this is not really an optimal solution as my data set gets really large.
Edit:
db.someCollection.find({"_id.a": 1, "_id.b": { $exists: true}})
Seems to be a solution, this is however just as slow as adding a as a field (not in the key) to the object. Is there a faster method?

Have you tried this?
db.someCollection.find({"_id.a": 1, "_id.b": { $exists: true}})

How to update field in array, only if other field matches value in MongoDB

I tangle a bit in jungles of query and update selectors of MongoDb.
Situation: We have Companies collection. smth like this
Companies.find({}) =>
{_id: 1, item_amount: 5},
{_id: 2, item_amount: 7},
{_id: 3, item_amount: 10}
And we have Users collection with some structure. User want to buy any item and it may decrement value in Companies and increment in Users, with one condition - user may know where he buy the item.
{userId: _id,
ownedItems:
[{company_id: 2, item_amount: 3},
{company_id: 1, item_amount: 5}]
}
Ok. How i may Users.update(), if user want to buy, for example, 5 items from {_id: 3} (we don't know if user has field for company_id: 3)
I thought it may be smth like this:
Companies.update({_id: 3}, {$inc: {item_amount: - 5}})
&&
Users.update({userId: _id}, {$set: {'ownedItems[x].company_id': 3}, $inc{'ownedItems[x].item_amount': 5})
But of cause there are some problems.
How i may know the [x] for each clint? Sort it? and what if I will need to add new Company?
Is $set will work if field 'ownedItems[x].company_id' does not exist?
May be I may check smth in IF/Else statement. With, for example, $exist, or $cond( for aggregation) selectors. But .find() always return a cursor (is it?), not true or false. So what may do? How may I use aggregation selectors? Like this:
Users.find({_id: userId}, {$cond: [{$eq: ['owenedItem[x].company_id', _someId_]}, {$set:{...}}, {$push: {...}}]})
Will it work?

Sounds like you want to use the $elemMatch operator in conjunction with the '$' positional operator:
http://docs.mongodb.org/manual/reference/operator/elemMatch/
http://docs.mongodb.org/manual/reference/operator/positional/
Let's take your example. Let's say you want to find the element in the user's "ownedItems" list where the "company_id" is 3, and update that element's "item_amount" by 5. That can be done in a single update statement like so:
Users.update( { userId: _id, ownedItems : { $elemMatch : { company_id : 3 } } },
{ $inc : { ownedItems.$.item_amount : 5 } } )
The $elemMatch operator makes it explicit which of the elements of the list you're querying for. You can then use the $ operator in the modifier clause to refer to that matched element and make changes to it.
If the "ownedItems" list of the user does not have any entry matching { "company_id" : 3 } then the above update will not do anything. If that happens, you can do a second update with the $addToSet operator to add a new entry like so:
Users.update( { userId: _id },
{ $addToSet : { ownedItems : { company_id : 3, item_amount : 5 } } } )
http://docs.mongodb.org/manual/reference/operator/addToSet/

Pymongo $size operator

Is $size equivalent operator for query condition in pymongo?
like
{'a': {'$size': 3}}
for {a: [1,2,3]}

I don't quite understand your question, but if you're asking if db.foo.find({a: {$size: 3}}) would return the document {a: [1, 2, 3]}, then the answer is yes.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Mongodb: find subdocuments whose fields are in non-canonical order - mongodb

Related

Mongo remove document if all elements in array match condition

Add a new field with large number of rows to existing collection in Mongodb

MongoDB Composite Key query with one field unknown

How to update field in array, only if other field matches value in MongoDB

Pymongo $size operator

Categories

Resources