$setIsSubset for regular queries in Mongo - mongodb

I am looking to do the equivalent of $setIsSubset http://docs.mongodb.org/manual/reference/operator/aggregation/setIsSubset/ for regular (i.e. NOT aggregate) queries in MongoDB. How can I do this?
Assume that I have the documents
{ 'x' : ['A', 'B'] }
{ 'x' : ['A', 'D'] }
And that
filter = ['A', 'B', 'C']
I want to run something like
find({'x': {'$setIsSubset': filter}})
and expect to get back only
{ 'x' : ['A', 'B'] }
It seems like most query operators match any element rather than all of them. I also need a true subset check, so it seems that $and and $all would not match ['A', 'B'] against ['A', 'B', 'C'].
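For illustration, here is a quick PyMongo sketch of that behaviour (the database and collection names are made up):
from pymongo import MongoClient

docs = MongoClient().test.docs  # hypothetical collection holding the two documents above
docs.insert_many([{'x': ['A', 'B']}, {'x': ['A', 'D']}])

query = ['A', 'B', 'C']
# $all requires the *field* to contain every value in the query list,
# so neither ['A', 'B'] nor ['A', 'D'] matches here:
print(list(docs.find({'x': {'$all': query}})))  # []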

You could try the following in the shell:
var filter = ['A', 'B', 'C']
db.coll2.find({x: {"$not": {"$elemMatch": {"$nin": filter}}}})
Output
{ "_id" : ObjectId("54f4d72f1f22d4a529052760"), "x" : [ "A", "B" ] }

Related

MongoDB - how to perform consecutive queries?

I have a schema where one field is an array of values. The collection might look something like:
{
  _id: 1,
  tags: ['a', 'b']
},
{
  _id: 2,
  tags: ['b', 'a']
},
{
  _id: 3,
  tags: ['a', 'c']
},
{
  _id: 4,
  tags: ['c', 'd']
},
{
  _id: 5,
  tags: ['b', 'e']
}
The user should be able to perform consecutive filter operations, for example,
Filtering for 'a' will return _id:1, _id:2, and _id:3;
A consecutive filter for 'b' will return _id:1 and _id:2 (presumably from the results of step 1 above).
There might be n number of consecutive filter operations.
What is the best way to structure this filter with MongoDB?
Many thanks for your help.
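One way to structure this (a sketch in PyMongo, assuming the chosen tags are accumulated client-side and combined with $all; database and collection names are made up):
from pymongo import MongoClient

coll = MongoClient().test.items  # hypothetical collection holding the documents above

selected = ['a']                  # first filter
print([d['_id'] for d in coll.find({'tags': {'$all': selected}})])  # [1, 2, 3]

selected.append('b')              # consecutive filter for 'b'
print([d['_id'] for d in coll.find({'tags': {'$all': selected}})])  # [1, 2]
Because $all requires every listed tag to be present, each additional tag can only narrow the previous result, which matches the consecutive-filter behaviour described in the question.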

Add a new field with a large number of rows to an existing collection in MongoDB

I have an existing collection with close to 1 million docs, and now I'd like to append new field data to this collection. (I'm using PyMongo.)
For example, my existing collection db.actions looks like:
...
{'_id':12345, 'A': 'apple', 'B': 'milk'}
{'_id':12346, 'A': 'pear', 'B': 'juice'}
...
Now I want to append a new field of data to this existing collection:
...
{'_id':12345, 'C': 'beef'}
{'_id':12346, 'C': 'chicken'}
...
such that the resulting collection should look like this:
...
{'_id':12345, 'A': 'apple', 'B': 'milk', 'C': 'beef'}
{'_id':12346, 'A': 'pear', 'B': 'juice', 'C': 'chicken'}
...
I know we can do this with update_one in a for loop, e.g.:
for doc in values:
    collection.update_one({'_id': doc['_id']},
                          {'$set': {k: doc[k] for k in fields}},
                          upsert=True)
where values is a list of dictionaries, each containing two items: the _id key-value pair and the new field's key-value pair. fields contains all the new fields I'd like to add.
However, the issue is that I have a million docs to update, and anything with a for loop is way too slow. Is there a way to append this new field faster? Something similar to insert_many, except appending to an existing collection?
===============================================
Update1:
So this is what I have for now,
bulk = self.get_collection().initialize_unordered_bulk_op()
for doc in values:
    bulk.find({'_id': doc['_id']}).update_one({'$set': {k: doc[k] for k in fields}})
bulk.execute()
I first wrote a sample dataframe into the db with insert_many; the performance was:
Time spent in insert_many: total: 0.0457min
then I used update_one with a bulk operation to add two extra fields to the collection, and got:
Time spent: for loop: 0.0283min | execute: 0.0713min | total: 0.0996min
Update2:
I added an extra column to both the existing collection and the new column data, so that a left join can be used to solve this. If you use a left join, you can ignore the _id field.
For example, my existing collection db.actions looks like:
...
{'A': 'apple', 'B': 'milk', 'dateTime': '2017-10-12 15:20:00'}
{'A': 'pear', 'B': 'juice', 'dateTime': '2017-12-15 06:10:50'}
{'A': 'orange', 'B': 'pop', 'dateTime': '2017-12-15 16:09:10'}
...
Now I want to append a new field of data to this existing collection:
...
{'C': 'beef', 'dateTime': '2017-10-12 09:08:20'}
{'C': 'chicken', 'dateTime': '2017-12-15 22:40:00'}
...
such that the resulting collection should look like this:
...
{'A': 'apple', 'B': 'milk', 'C': 'beef', 'dateTime': '2017-10-12'}
{'A': 'pear', 'B': 'juice', 'C': 'chicken', 'dateTime': '2017-12-15'}
{'A': 'orange', 'B': 'pop', 'C': 'chicken', 'dateTime': '2017-12-15'}
...
If your updates are really unique per document, there is nothing faster than the bulk write API. Neither MongoDB nor the driver can guess what you want to update, so you will need to loop through your update definitions and then batch your bulk changes, which is pretty much described here:
Bulk update in Pymongo using multiple ObjectId
The "unordered" bulk writes can be slightly faster (although in my tests they weren't) but I'd still vote for the ordered approach for error handling reasons mainly).
If, however, you can group your changes into specific recurring patterns, then you're certainly better off defining a bunch of update queries (effectively one update per unique value in your dictionary) and then issuing those, each targeting a number of documents. My Python is too poor at this point to write that entire code for you, but here's a pseudocode example of what I mean:
Let's say you've got the following update dictionary:
{
  key: "doc1",
  value: [
    { "field1": "value1" },
    { "field2": "value2" }
  ]
}, {
  key: "doc2",
  value: [
    // same fields again as for "doc1"
    { "field1": "value1" },
    { "field2": "value2" }
  ]
}, {
  key: "doc3",
  value: [
    { "someotherfield": "someothervalue" }
  ]
}
Then, instead of updating the three documents separately, you would send one update to change the first two documents (since they require identical changes) and one update to change "doc3". The more knowledge you have upfront about the structure of your update patterns, the more you can optimize this, even by grouping updates of subsets of fields, but that's probably getting a little complicated at some point...
UPDATE:
As per your request below, let's give it a shot.
fields = ['C']
values = [
    {'_id': 'doc1a', 'C': 'v1'},
    {'_id': 'doc1b', 'C': 'v1'},
    {'_id': 'doc2a', 'C': 'v2'},
    {'_id': 'doc2b', 'C': 'v2'}
]

print('before transformation:')
for doc in values:
    print('_id ' + doc['_id'])
    for k in fields:
        print(doc[k])

# Group the _ids by the value they should receive.
transposed_values = {}
for doc in values:
    transposed_values[doc['C']] = transposed_values.get(doc['C'], [])
    transposed_values[doc['C']].append(doc['_id'])

print('after transformation:')
for k, v in transposed_values.items():
    print(k, v)

# One update_many per distinct value instead of one update per document.
for k, v in transposed_values.items():
    collection.update_many({'_id': {'$in': v}}, {'$set': {'C': k}})
Since your join collection has fewer documents, you can convert the dateTime to a date:
db.new.find().forEach(function(d){
    d.date = d.dateTime.substring(0,10);
    db.new.update({_id : d._id}, d);
})
and then do a multiple-field lookup based on date (the substring of dateTime) and _id,
writing the output to a new collection (enhanced):
db.old.aggregate([
    {$lookup: {
        from: "new",
        let: {id: "$_id", date: {$substr: ["$dateTime", 0, 10]}},
        pipeline: [
            {$match: {
                $expr: {
                    $and: [
                        {$eq: ["$$id", "$_id"]},
                        {$eq: ["$$date", "$date"]}
                    ]
                }
            }},
            {$project: {_id: 0, C: "$C"}}
        ],
        as: "newFields"
    }},
    {$project: {
        _id: 1,
        A: 1,
        B: 1,
        C: {$arrayElemAt: ["$newFields.C", 0]},
        date: {$substr: ["$dateTime", 0, 10]}
    }},
    {$out: "enhanced"}
]).pretty()
result
> db.enhanced.find()
{ "_id" : 12345, "A" : "apple", "B" : "milk", "C" : "beef", "date" : "2017-10-12" }
{ "_id" : 12346, "A" : "pear", "B" : "juice", "C" : "chicken", "date" : "2017-12-15" }
{ "_id" : 12347, "A" : "orange", "B" : "pop", "date" : "2017-12-15" }
>

MongoDB: Add and remove from array field at the same time

I want to rename tags in our documents' tags array, e.g. change all tags a in the collection to c. The documents look something like this:
[ { _id: …, tags: ['a', 'b', 'c'] },
{ _id: …, tags: ['a', 'b'] },
{ _id: …, tags: ['b', 'c', 'd'] } ]
I need to keep tags unique. This means, an update like this will not work, because the first document will end up containing tag c twice:
db.docs.update(
{ tags: 'a' },
{ $set: { 'tags.$': 'c' } }
)
So I tried this instead:
db.docs.update(
    { tags: 'a' },
    {
        $pull: { tags: 'a' },
        $addToSet: { tags: 'c' }
    }
)
But this gives a MongoError: Cannot update 'tags' and 'tags' at the same time.
Any chance of renaming the tags with one single update?
According to the official MongoDB documentation, there is no way of expressing a "replace" operation on a set of elements. So I guess there isn't a way to do this in a single update.
Update:
After some more investigation, I came across this document. If I understand it correctly, your query should look like this:
db.docs.update({
    tags: 'a'
}, {
    $set: { 'tags.$': 'c' }
})
Here 'tags.$' is a selector for the first element in the "tags" array that matches the query, so it replaces the first occurrence of 'a' with 'c'. As I understand it, your "tags" array does not contain duplicates, so the first match will be the only match.
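If duplicates are a concern after all (as in the first document of the question), one workaround is to split the rename into two updates rather than one, adding the new tag before pulling the old one; a PyMongo sketch (the collection handle is illustrative):
from pymongo import MongoClient

docs = MongoClient().test.docs  # hypothetical handle to the 'docs' collection

# Step 1: add 'c' to every document that still has tag 'a'; $addToSet keeps tags unique.
docs.update_many({'tags': 'a'}, {'$addToSet': {'tags': 'c'}})

# Step 2: remove the old tag 'a' from those documents.
# Note: the two steps are separate operations, not a single atomic update.
docs.update_many({'tags': 'a'}, {'$pull': {'tags': 'a'}})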

MongoDB: How to obtain value of nested object?

If the following is document
{ 'a': {
    'b': ['a', 'x', 'b'],
    't': ['a', 'z', 'w', 't']
  }
}
I want to be able to obtain the value associated with the nested object. For example, in python, I would do print(dict_name['a']['t']).
I have tried find() and findOne() on both of the commands below
db.my_collection.find({}, { 'a.t': 1 })
db.my_collection.find({ 'a.t': {$exists: true} })
but they haven't been returning the correct data.
How can I query for the document with 'a' as a key and, from that document, obtain the value associated with 't', expecting ['a', 'z', 'w', 't'] to be returned?
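For what it's worth, a minimal PyMongo sketch of that access pattern (project 'a.t' and read the nested value client-side, much like the Python dict example above; database and collection names are illustrative):
from pymongo import MongoClient

coll = MongoClient().test.my_collection  # hypothetical handle to my_collection

doc = coll.find_one({'a.t': {'$exists': True}}, {'a.t': 1})
print(doc['a']['t'])  # ['a', 'z', 'w', 't']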
How about this? :
db.my_collection.aggregate([{"$project":{"_id":"$_id", "t":"$a.t"}}]);
On this test collection
{
"_id" : ObjectId("577ba92187630c1a06c4bcac"),
"a" : {
"b" : [
1,
2
],
"t" : [
2,
3
]
}
}
It gave me the following result:
{ "_id" : ObjectId("577ba92187630c1a06c4bcac"), "t" : [ 2, 3 ] }
You can do the following aggregation:
db.collection.aggregate([
    {
        $project: {
            '_id': '$_id',
            't': '$a.t'
        }
    }
])
This should give you what you are looking for.
Here's a link to $project (http://docs.mongodb.org/manual/reference/operator/aggregation/project/); basically your 'a.t' array is assigned to a new field named 't' (this is what 't': '$a.t' pretty much means).

Find documents in MongoDB whose array field is a subset of a query array

Suppose I insert a set of documents, each with an array field. I would like to find all documents such that their array field is a subset of a query array. For example, if I have the following documents,
collection.insert([
    {
        'name': 'one',
        'array': ['a', 'b', 'c']
    },
    {
        'name': 'two',
        'array': ['b', 'c', 'd']
    },
    {
        'name': 'three',
        'array': ['b', 'c']
    }
])
and I query collection.find({'array': {'$superset': ['a', 'b', 'c']}}), I would expect to see documents one and three, as ['a', 'b', 'c'] and ['b', 'c'] are both subsets of ['a', 'b', 'c']. In other words, I'd like to do the inverse of Mongo's $all query, which selects all documents such that the query array is a subset of the document's array field. Is this possible, and if so, how?
In MongoDB, for array fields:
"$in:[...]" means "intersection" or "any element in",
"$all:[...]" means "subset" or "contain",
"$elemMatch:{...}" means "any element match"
"$not:{$elemMatch:{$nin:[...]}}" means "superset" or "in"
There is a simple way to do this with the aggregation framework or with a find query.
The find query is simple, but you have to use the $elemMatch operator:
> db.collection.find({array:{$not:{$elemMatch:{$nin:['a','b','c']}}}}, {_id:0,name:1})
Note that this says we want to not match an array that has an element which is (at the same time) not equal to 'a', 'b', or 'c'. I added a projection that only returns the name field of the resulting document, which is optional.
To do this within the context of aggregation, you can use $setIsSubset:
db.collection.aggregate([
    // Project the original doc and a new field that indicates if array
    // is a subset of ['a', 'b', 'c']
    {$project: {
        doc: '$$ROOT',
        isSubset: {$setIsSubset: ['$array', ['a', 'b', 'c']]}
    }},
    // Filter on isSubset
    {$match: {isSubset: true}},
    // Project just the original docs
    {$project: {_id: 0, doc: 1}}
])
Note that $setIsSubset was added in MongoDB 2.6.