MongoDB: How to obtain value of nested object?

Given the following document:
{ 'a': {
'b': ['a', 'x', 'b'],
't': ['a', 'z', 'w', 't']
}
}
I want to obtain the value associated with the nested object. In Python, for example, I would do print(dict_name['a']['t']).
I have tried both find() and findOne() with each of the queries below:
db.my_collection.find({}, { 'a.t': 1 })
db.my_collection.find({ 'a.t': {$exists: true} })
but neither returns the data I expect.
How can I query for the document that has 'a' as a key and then, from that document, obtain the value associated with 't', so that ['a', 'z', 'w', 't'] is returned?

How about this?
db.my_collection.aggregate([{"$project":{"_id":"$_id", "t":"$a.t"}}]);
On this test collection
{
"_id" : ObjectId("577ba92187630c1a06c4bcac"),
"a" : {
"b" : [
1,
2
],
"t" : [
2,
3
]
}
}
it gave me the following result:
{ "_id" : ObjectId("577ba92187630c1a06c4bcac"), "t" : [ 2, 3 ] }

You can do the following aggregation:
db.collection.aggregate([
{
$project: {
'_id': '$_id',
't': '$a.t'
}
}
])
This should give you what you are looking for.
See the documentation for $project. The expression 't': '$a.t' assigns the 'a.t' array to a new field named 't'.
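To see why the find() projection in the question did not return a bare array, here is a minimal Python sketch of the two result shapes. The helper names resolve_path and include_projection are made up for illustration, and the sketch only imitates the shape of the results, not the server's behavior in full.

```python
def resolve_path(doc, path):
    """Follow a dotted path such as 'a.t' through nested dicts."""
    value = doc
    for key in path.split("."):
        value = value[key]
    return value


def include_projection(doc, path):
    """Imitate the shape of find({}, {path: 1}): the nesting is preserved."""
    keys = path.split(".")
    result = {}
    if "_id" in doc:
        result["_id"] = doc["_id"]
    cursor = result
    for key in keys[:-1]:
        cursor = cursor.setdefault(key, {})
    cursor[keys[-1]] = resolve_path(doc, path)
    return result


doc = {"_id": 1, "a": {"b": ["a", "x", "b"], "t": ["a", "z", "w", "t"]}}

# Shape of the find() projection: 't' stays wrapped inside 'a'.
nested = include_projection(doc, "a.t")
# Shape of the $project stage with 't': '$a.t': the array is lifted to the top level.
flat = {"_id": doc["_id"], "t": resolve_path(doc, "a.t")}

print(nested)  # {'_id': 1, 'a': {'t': ['a', 'z', 'w', 't']}}
print(flat)    # {'_id': 1, 't': ['a', 'z', 'w', 't']}
```

So the projection does return the data, only still nested under 'a'; the aggregation is what flattens it.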

Related

Add a new field with large number of rows to existing collection in Mongodb

I have an existing collection with close to 1 million documents, and I'd like to add a new field to each of them. (I'm using PyMongo.)
For example, my existing collection db.actions looks like:
...
{'_id':12345, 'A': 'apple', 'B': 'milk'}
{'_id':12346, 'A': 'pear', 'B': 'juice'}
...
Now I want to add a new field to this existing collection, using data like this:
...
{'_id':12345, 'C': 'beef'}
{'_id':12346, 'C': 'chicken'}
...
such that the resulting collection should look like this:
...
{'_id':12345, 'A': 'apple', 'B': 'milk', 'C': 'beef'}
{'_id':12346, 'A': 'pear', 'B': 'juice', 'C': 'chicken'}
...
I know we can do this with update_one in a for loop, e.g.
for doc in values:
    collection.update_one({'_id': doc['_id']},
                          {'$set': {k: doc[k] for k in fields}},
                          upsert=True
                          )
where values is a list of dictionaries, each containing two items: the _id key-value pair and the new field's key-value pair. fields contains the names of all the new fields I'd like to add.
However, I have a million documents to update, and a for loop is far too slow. Is there a faster way to add this new field, something similar to insert_many but for updating an existing collection?
===============================================
Update1:
So this is what I have for now:
bulk = self.get_collection().initialize_unordered_bulk_op()
for doc in values:
    bulk.find({'_id': doc['_id']}).update_one({'$set': {k: doc[k] for k in fields}})
bulk.execute()
I first wrote a sample dataframe into the db with insert_many. The performance:
Time spent in insert_many: total: 0.0457min
Then I used update_one with a bulk operation to add two extra fields to the collection, and I got:
Time spent: for loop: 0.0283min | execute: 0.0713min | total: 0.0996min
Update2:
I added an extra field (dateTime) to both the existing collection and the new data, so that a left join could be used to solve this. With a left join, the _id field can be ignored.
For example, my existing collection db.actions looks like:
...
{'A': 'apple', 'B': 'milk', 'dateTime': '2017-10-12 15:20:00'}
{'A': 'pear', 'B': 'juice', 'dateTime': '2017-12-15 06:10:50'}
{'A': 'orange', 'B': 'pop', 'dateTime': '2017-12-15 16:09:10'}
...
Now I want to add a new field to this existing collection, using data like this:
...
{'C': 'beef', 'dateTime': '2017-10-12 09:08:20'}
{'C': 'chicken', 'dateTime': '2017-12-15 22:40:00'}
...
such that the resulting collection should look like this:
...
{'A': 'apple', 'B': 'milk', 'C': 'beef', 'dateTime': '2017-10-12'}
{'A': 'pear', 'B': 'juice', 'C': 'chicken', 'dateTime': '2017-12-15'}
{'A': 'orange', 'B': 'pop', 'C': 'chicken', 'dateTime': '2017-12-15'}
...
If your updates are really unique per document, there is nothing faster than the bulk write API. Neither MongoDB nor the driver can guess what you want to update, so you will need to loop through your update definitions and batch your bulk changes, as described here:
Bulk update in Pymongo using multiple ObjectId
The "unordered" bulk writes can be slightly faster (although in my tests they weren't), but I'd still vote for the ordered approach, mainly for error handling reasons.
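As a rough illustration of the batching idea, the sketch below splits the update definitions into fixed-size batches using only the standard library. The helper names chunks and build_update and the batch size of 2 are assumptions made for this example. Each (filter, update) pair is kept as a plain tuple so the code runs without a server.

```python
from itertools import islice


def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def build_update(doc, fields):
    """Return the (filter, update) pair for one document."""
    return {"_id": doc["_id"]}, {"$set": {k: doc[k] for k in fields}}


fields = ["C"]
values = [{"_id": i, "C": "v%d" % i} for i in range(5)]

# Hypothetical batch size; a real one would be far larger.
batches = [
    [build_update(doc, fields) for doc in chunk]
    for chunk in chunks(values, 2)
]

for batch in batches:
    # With PyMongo, each batch could be sent roughly like this:
    #   collection.bulk_write([UpdateOne(f, u) for f, u in batch], ordered=True)
    print(len(batch), batch[0])
```

Each batch then costs one round trip to the server instead of one per document.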
If, however, you can group your changes into specific recurring patterns, then you're better off defining a set of update queries (effectively one update per unique value in your dictionary) and issuing each one against all the documents it applies to. My Python is too poor at this point to write the entire code for you, but here's a pseudocode example of what I mean:
Let's say you've got the following update dictionary:
{
key: "doc1",
value:
[
{ "field1", "value1" },
{ "field2", "value2" },
]
}, {
key: "doc2",
value:
[
// same fields again as for "doc1"
{ "field1", "value1" },
{ "field2", "value2" },
]
}, {
key: "doc3",
value:
[
{ "someotherfield", "someothervalue" },
]
}
Then, instead of updating the three documents separately, you would send one update for the first two documents (since they require identical changes) and one update for "doc3". The more you know upfront about the structure of your update patterns, the more you can optimize, even by grouping updates of subsets of fields, but that probably gets complicated at some point.
UPDATE:
As per your request below, let's give it a shot.
fields = ['C']
values = [
    {'_id': 'doc1a', 'C': 'v1'},
    {'_id': 'doc1b', 'C': 'v1'},
    {'_id': 'doc2a', 'C': 'v2'},
    {'_id': 'doc2b', 'C': 'v2'}
]
print('before transformation:')
for doc in values:
    print('_id ' + doc['_id'])
    for k in fields:
        print(doc[k])
# Group the _ids by the value they should receive for 'C'.
transposed_values = {}
for doc in values:
    transposed_values.setdefault(doc['C'], []).append(doc['_id'])
print('after transformation:')
for k, v in transposed_values.items():
    print(k, v)
# One update_many per distinct value instead of one update per document.
# `collection` is the PyMongo collection from the question.
for k, v in transposed_values.items():
    collection.update_many({'_id': {'$in': v}}, {'$set': {'C': k}})
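The transformation above groups by a single field. If several fields are set together, one possible extension is to group by the whole set of changes. This is only a sketch: group_by_change is a hypothetical helper, the extra field 'D' is invented for the example, and the field values are assumed to be hashable.

```python
from collections import defaultdict


def group_by_change(values, fields):
    """Map each distinct combination of field values to the _ids that need it."""
    groups = defaultdict(list)
    for doc in values:
        change = tuple((k, doc[k]) for k in fields)
        groups[change].append(doc["_id"])
    return groups


fields = ["C", "D"]
values = [
    {"_id": "doc1a", "C": "v1", "D": "x"},
    {"_id": "doc1b", "C": "v1", "D": "x"},
    {"_id": "doc2a", "C": "v2", "D": "x"},
]

# One (filter, update) pair per distinct change set.
updates = [
    ({"_id": {"$in": ids}}, {"$set": dict(change)})
    for change, ids in group_by_change(values, fields).items()
]

for query, update in updates:
    # With PyMongo this would be: collection.update_many(query, update)
    print(query, update)
```

Grouping only pays off when many documents share the same change set; with mostly unique values it degenerates into one update per document.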
Since your join collection has fewer documents, you can convert its dateTime to a date:
db.new.find().forEach(function(d){
d.date = d.dateTime.substring(0,10);
db.new.update({_id : d._id}, d);
})
Then do a multi-field lookup based on date (the first 10 characters of dateTime) and _id, and write the output to a new collection (enhanced):
db.old.aggregate(
[
{$lookup: {
from : "new",
let : {id : "$_id", date : {$substr : ["$dateTime", 0, 10]}},
pipeline : [
{$match : {
$expr : {
$and : [
{$eq : ["$$id", "$_id"]},
{$eq : ["$$date", "$date"]}
]
}
}},
{$project : {_id : 0, C : "$C"}}
],
as : "newFields"
}
},
{$project : {
_id : 1,
A : 1,
B : 1,
C : {$arrayElemAt : ["$newFields.C", 0]},
date : {$substr : ["$dateTime", 0, 10]}
}},
{$out : "enhanced"}
]
).pretty()
Result:
> db.enhanced.find()
{ "_id" : 12345, "A" : "apple", "B" : "milk", "C" : "beef", "date" : "2017-10-12" }
{ "_id" : 12346, "A" : "pear", "B" : "juice", "C" : "chicken", "date" : "2017-12-15" }
{ "_id" : 12347, "A" : "orange", "B" : "pop", "date" : "2017-12-15" }
>
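For readers who want to check the join logic without a server, here is a small Python sketch of the same left join on _id plus the date prefix of dateTime. The sample documents are assumptions that combine the _id values from the result above with the dateTime values from the question.

```python
old = [
    {"_id": 12345, "A": "apple", "B": "milk", "dateTime": "2017-10-12 15:20:00"},
    {"_id": 12346, "A": "pear", "B": "juice", "dateTime": "2017-12-15 06:10:50"},
    {"_id": 12347, "A": "orange", "B": "pop", "dateTime": "2017-12-15 16:09:10"},
]
new = [
    {"_id": 12345, "C": "beef", "dateTime": "2017-10-12 09:08:20"},
    {"_id": 12346, "C": "chicken", "dateTime": "2017-12-15 22:40:00"},
]


def left_join(old_docs, new_docs):
    """Attach C to every old document that has a match on (_id, date)."""
    lookup = {(d["_id"], d["dateTime"][:10]): d["C"] for d in new_docs}
    enhanced = []
    for doc in old_docs:
        date = doc["dateTime"][:10]  # same effect as $substr: ["$dateTime", 0, 10]
        row = {"_id": doc["_id"], "A": doc["A"], "B": doc["B"]}
        key = (doc["_id"], date)
        if key in lookup:
            row["C"] = lookup[key]
        row["date"] = date
        enhanced.append(row)
    return enhanced


enhanced = left_join(old, new)
for row in enhanced:
    print(row)
```

As in the result above, the third document keeps its A and B fields but gets no C, because no document in the new data shares its _id.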

Mongodb nested find

My database schema is somewhat like
{
"_id" : ObjectId("1"),
"createdAt" : ISODate("2017-03-10T00:00:00.000+0000"),
"user_list" : [
{
"id" : "a",
"some_flag" : 1,
},
{
"id" : "b",
"some_flag" : 0,
}
]
}
What I want to do is get the document where id is b & some_flag for the user b is 0.
My query is
db.collection.find({
'createdAt': {
$gte: new Date()
},
'user_list.id': 'b',
'user_list.some_flag': 1
}).sort({
createdAt: 1
})
When I run the query in the shell, it returns the doc with _id 1 (which it shouldn't, as the value of some_flag for b is 0).
What happens here is:
the condition 'user_list.id': 'b' matches the nested object where "id" : "b"
the condition 'user_list.some_flag': 1 matches some_flag of the nested object where "id" : "a" (as the value of some_flag is 1 there)
What modifications should I make to compare id and some_flag against the same nested object?
P.S. The amount of data is quite large, and using aggregate would be a performance bottleneck.
You should be using $elemMatch; otherwise MongoDB applies each condition independently to the array items. In your case, 'user_list.some_flag': 1 is matched by the array item with id a, and 'user_list.id': 'b' is matched by the array item with id b. So if you want to query an array field with AND logic on a single element, use $elemMatch as follows:
db.collection.find({
'createdAt': {
$gte: new Date()
},
user_list: {$elemMatch: {id: 'b', some_flag: 1}} // true only if a single array element satisfies both conditions
}).sort({
createdAt: 1
})
You need to try something like:
db.collection.find({
'createdAt': {
$gte: new Date()
},
user_list: {
$elemMatch: {
id: 'b',
some_flag: 1
}
}
}).sort({
createdAt: 1
});
This matches only documents in which a single user_list entry has id b and some_flag 1.
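The difference between the two query styles can be imitated in plain Python. This is a simplified model of the matching semantics for the document in the question, not MongoDB's actual implementation, and both function names are made up for the example.

```python
def matches_independently(doc, conditions):
    """Dotted-field style: each condition may be met by a different element."""
    return all(
        any(item.get(field) == value for item in doc["user_list"])
        for field, value in conditions.items()
    )


def matches_elem_match(doc, conditions):
    """$elemMatch style: one single element must meet every condition."""
    return any(
        all(item.get(field) == value for field, value in conditions.items())
        for item in doc["user_list"]
    )


doc = {
    "user_list": [
        {"id": "a", "some_flag": 1},
        {"id": "b", "some_flag": 0},
    ]
}
conditions = {"id": "b", "some_flag": 1}

print(matches_independently(doc, conditions))  # True: 'b' and 1 come from different elements
print(matches_elem_match(doc, conditions))     # False: no single element has both
```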

$setIsSubset for regular queries in Mongo

I am looking to do the equivalent of $setIsSubset http://docs.mongodb.org/manual/reference/operator/aggregation/setIsSubset/ for regular (i.e. NOT aggregate) queries in MongoDB. How can I do this?
Assume that I have the documents
{ 'x' : ['A', 'B'] }
{ 'x' : ['A', 'D'] }
And that
filter = ['A', 'B', 'C']
I want to do a
find({"x" : {'$setIsSubset':filter}}) and expect only to get back
{ 'x' : ['A', 'B'] }
It seems like most conditional operators match any element, not all of them. I also want x to be a subset of the filter, so it seems that $and and $all would not match [A,B] against [A,B,C].
You could try the following in the shell:
var filter = ['A', 'B', 'C']
db.coll2.find({x: {"$not": {"$elemMatch": {"$nin" : filter }}}})
Output:
{ "_id" : ObjectId("54f4d72f1f22d4a529052760"), "x" : [ "A", "B" ] }
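The double negation in that query reads as "x has no element that is outside the filter", which is the subset test. A small Python model of that logic follows, assuming x is either an array or missing; the function name is invented, and the last two sample documents are additions for illustration.

```python
def no_element_outside(doc, allowed):
    """Model of {x: {$not: {$elemMatch: {$nin: allowed}}}}."""
    values = doc.get("x", [])
    return not any(value not in allowed for value in values)


allowed = ["A", "B", "C"]
docs = [
    {"x": ["A", "B"]},
    {"x": ["A", "D"]},
    {"x": []},   # empty array: no element falls outside the filter
    {},          # missing field: nothing for $elemMatch to match
]

matching = [doc for doc in docs if no_element_outside(doc, allowed)]
print(matching)  # [{'x': ['A', 'B']}, {'x': []}, {}]
```

Documents with an empty or missing x also pass the test, so an extra condition on x would be needed if they should be excluded.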

MongoDB Full Text on an Array within an Array of Elements

I am unable to retrieve documents when an array within an array of elements contains text that should match my search.
Here are two example documents:
{
_id: ...,
'foo': [
{
'name': 'Thing1',
'data': {
'text': ['X', 'X']
}
},{
'name': 'Thing2',
'data': {
'text': ['X', 'Y']
}
}
]
}
{
_id: ...,
'foo': [
{
'name': 'Thing3',
'data': {
'text': ['X', 'X']
}
},{
'name': 'Thing4',
'data': {
'text': ['X', 'Y']
}
}
]
}
By using the following query, I am able to return both documents:
db.collection.find({'foo.data.text': {'$in': ['Y']}})
However, I am unable to return these results using the full text command/index:
db.collection.runCommand("text", {search: "Y"})
I am certain that the full text search is working, as the same command issuing a search against "Thing1" will return the first document, and "Thing3" returns the second document.
I am certain that both foo.data.text and foo.name are in the text index, as shown by db.collection.getIndexes().
I created my index using: db.collection.ensureIndex({'foo.name': 'text', 'foo.data.text': 'text'}). Here are the indexes as shown by the above command:
{
"v" : 1,
"key" : {
"_fts" : "text",
"_ftsx" : 1
},
"ns" : "testing.collection",
"background" : true,
"name" : "my_text_search",
"weights" : {
"foo.data.text" : 1,
"foo.name" : 1,
},
"default_language" : "english",
"language_override" : "language",
"textIndexVersion" : 1
}
Any suggestion on how to get this working with mongo's full text search?
Text search does not currently support indexed fields of nested arrays (at least not explicitly specified ones). An index on "foo.name" works fine as it is only one array deep, but the text search will not recurse through the subarray at "foo.data.text". Note that this behavior may change in the 2.6 release.
But fear not, in the meantime nested arrays can be text-indexed, just not with individually specified fields. You may use the wildcard specifier $** to recursively index ALL string fields in your collection, i.e.
db.collection.ensureIndex({"$**": "text" })
as documented at http://docs.mongodb.org/manual/tutorial/create-text-index-on-multiple-fields/ . Be careful though as this will index EVERY string field and could have negative storage and performance consequences. The simple document structure you describe though should work fine. Hope that helps.
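To get a feel for how much a wildcard index would cover, the following Python sketch walks a document and collects every string value at any depth. It is only a conceptual model of "all string fields" and says nothing about how MongoDB tokenizes or stores the index; collect_strings is a hypothetical helper.

```python
def collect_strings(value):
    """Recursively gather every string found in nested dicts and lists."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        return [s for v in value.values() for s in collect_strings(v)]
    if isinstance(value, list):
        return [s for v in value for s in collect_strings(v)]
    return []


doc = {
    "foo": [
        {"name": "Thing1", "data": {"text": ["X", "X"]}},
        {"name": "Thing2", "data": {"text": ["X", "Y"]}},
    ]
}

strings = collect_strings(doc)
print(strings)  # ['Thing1', 'X', 'X', 'Thing2', 'X', 'Y']
```

Running something like this over a few sample documents gives a rough idea of how many strings the wildcard would pull in.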

Add subdocument array element to subdocument array element in mongoDB

Is this possible?
I have a collection C, with an array of attributes A1.
Each attribute has an array of subattributes A2.
How can I add a subdocument to a specific C.A1 subdocument ?
Here is an example.
db.docs.insert({_id: 1, A1: [{A2: [1, 2, 3]}, {A2: [4, 5, 6]}]})
If you know the index of the subdocument you want to update, you can use dot notation with the index (starting from 0) in the middle:
db.docs.update({_id: 1}, {$addToSet: {'A1.0.A2': 9}})
This results in:
{
"A1" : [
{
"A2" : [
1,
2,
3,
9
]
},
{
"A2" : [
4,
5,
6
]
}
],
"_id" : 1
}
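The path 'A1.0.A2' can be read as "field A1, element 0, field A2". A minimal Python model of that resolution is sketched below, together with the difference between $addToSet (skips values already present) and $push (always appends). The helper names are made up, and the model ignores everything else the real operators do.

```python
def resolve(doc, path):
    """Walk a dotted path; a segment is treated as an index when the current value is a list."""
    target = doc
    for key in path.split("."):
        target = target[int(key)] if isinstance(target, list) else target[key]
    return target


def add_to_set(doc, path, value):
    """Model of $addToSet: append only if the value is not already there."""
    array = resolve(doc, path)
    if value not in array:
        array.append(value)


def push(doc, path, value):
    """Model of $push: always append."""
    resolve(doc, path).append(value)


doc = {"_id": 1, "A1": [{"A2": [1, 2, 3]}, {"A2": [4, 5, 6]}]}

add_to_set(doc, "A1.0.A2", 9)
add_to_set(doc, "A1.0.A2", 9)  # second call changes nothing
push(doc, "A1.1.A2", 4)        # duplicates are allowed with push

print(doc["A1"][0]["A2"])  # [1, 2, 3, 9]
print(doc["A1"][1]["A2"])  # [4, 5, 6, 4]
```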
Yes, this is possible. If you post an example I can show you more specifically what the update query would look like. But here's a shot:
db.c.update({ A1: value }, { $addToSet: { "A1.$.A2": "some value" }})
I haven't actually tried this (I'm not in front of a Mongo instance right now) and I'm going off memory, but that should get you pretty close.
Yes, $push can be used to do the same. Try the code below.
db.c.update({ A1: value }, { $push: { "A1.$.A2": num }});