I am unable to retrieve documents when an array within an array of elements contains text that should match my search.
Here are two example documents:
{
_id: ...,
'foo': [
{
'name': 'Thing1',
'data': {
'text': ['X', 'X']
}
},{
'name': 'Thing2',
'data': {
'text': ['X', 'Y']
}
}
]
}
{
_id: ...,
'foo': [
{
'name': 'Thing3',
'data': {
'text': ['X', 'X']
}
},{
'name': 'Thing4',
'data': {
'text': ['X', 'Y']
}
}
]
}
By using the following query, I am able to return both documents:
db.collection.find({'foo.data.text': {'$in': ['Y']}})
However, I am unable to return these results using the full text command/index:
db.collection.runCommand("text", {search: "Y"})
I am certain that the full text search is working, as the same command issuing a search against "Thing1" will return the first document, and "Thing3" returns the second document.
I have confirmed with db.collection.getIndexes() that both foo.data.text and foo.name are in the text index.
I created my index using: db.collection.ensureIndex({'foo.name': 'text', 'foo.data.text': 'text'}). Here are the indexes as shown by the above command:
{
"v" : 1,
"key" : {
"_fts" : "text",
"_ftsx" : 1
},
"ns" : "testing.collection",
"background" : true,
"name" : "my_text_search",
"weights" : {
"foo.data.text" : 1,
"foo.name" : 1,
},
"default_language" : "english",
"language_override" : "language",
"textIndexVersion" : 1
}
Any suggestion on how to get this working with mongo's full text search?
Text search does not currently support indexed fields of nested arrays (at least not explicitly specified ones). An index on "foo.name" works fine as it is only one array deep, but the text search will not recurse through the subarray at "foo.data.text". Note that this behavior may change in the 2.6 release.
But fear not, in the meantime nested arrays can be text-indexed, just not with individually specified fields. You may use the wildcard specifier $** to recursively index ALL string fields in your collection, i.e.
db.collection.ensureIndex({"$**": "text" })
as documented at http://docs.mongodb.org/manual/tutorial/create-text-index-on-multiple-fields/ . Be careful though as this will index EVERY string field and could have negative storage and performance consequences. The simple document structure you describe though should work fine. Hope that helps.
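To get a feel for what "recursively index ALL string fields" covers, here is a rough Python sketch that walks a document the same way and collects every string it finds. It is only an illustration of the idea, not how MongoDB builds the index; the helper name collect_strings is made up for this example.

```python
def collect_strings(value):
    """Recursively gather every string found in nested dicts and lists."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        found = []
        for child in value.values():
            found.extend(collect_strings(child))
        return found
    if isinstance(value, list):
        found = []
        for child in value:
            found.extend(collect_strings(child))
        return found
    return []  # numbers, None, etc. carry no text


# First example document from the question (without _id)
doc = {
    "foo": [
        {"name": "Thing1", "data": {"text": ["X", "X"]}},
        {"name": "Thing2", "data": {"text": ["X", "Y"]}},
    ]
}
strings = collect_strings(doc)
print(strings)
```

Note that the strings inside foo.data.text are reached, which is the part the explicitly specified index misses.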
I don't know if it is possible.
I'm trying to do an automatic process to update all elements of a nested array in some documents. The array hasn't a fixed length.
Below is a simplified example of the collection:
{
"_id" : ObjectId("5ba2e413a4dd01725a658c63"),
"MyOwnID" : "123456789",
"MyArray" : [
{
Field1: 'FooName1',
Field2: 'FooSpec1',
FieldToUpdate: '...'
},
{
Field1: 'FooName1',
Field2: 'FooSpec2',
FieldToUpdate: '...'
},
{
... More elements ...
}
]
},
{
"_id" : ObjectId("5ba2e413a4dd01725a658c63"),
"MyOwnID" : "987654321",
"MyArray" : [
{
Field1: 'FooName1',
Field2: 'FooSpec1',
FieldToUpdate: '...'
},
{
Field1: 'FooName2',
Field2: 'FooSpec2',
FieldToUpdate: '...'
},
]
}
I tried this and it worked for the first element.
Query for the first element:
db.getCollection('works').findOneAndUpdate(
{ MyOwnID: '123456789', '$and':[ { 'MyArray.Field1': 'FooName1' },{ 'MyArray.Field2': 'FooSpec1' } ] } ,
{ '$set': { 'MyArray.$.FieldToUpdate': 1234} }
)
But when I try to update the second element only the first is updated.
Query for the second element:
db.getCollection('works').findOneAndUpdate(
{ MyOwnID: '123456789', '$and':[ { 'MyArray.Field1': 'FooName1' },{ 'MyArray.Field2': 'FooSpec2' } ] } ,
{ '$set': { 'MyArray.$.FieldToUpdate': 4321} }
)
I tried with arrayFilters option and $elemMatch, both give me an error.
Any options?
You can try below query using $elemMatch
db.getCollection("works").findOneAndUpdate(
{
"MyOwnID": "123456789",
"MyArray": { "$elemMatch": { "Field1": "FooName1", "Field2": "FooSpec2" }}
},
{ "$set": { "MyArray.$.FieldToUpdate": 4321 }}
)
You tried arrayFilters, but probably in the wrong way, because it does work. It's not very clear in the MongoDB docs, but $[myRef] acts as a placeholder that arrayFilters refers to. Knowing that, you can do this to achieve your goal:
db['01'].findOneAndUpdate(
{MyOwnID: '123456789'},
{$set:{"MyArray.$[object].FieldToUpdate":1234}},
{arrayFilters:[{ $and:[{'object.Field1': 'FooName1' },{ 'object.Field2': 'FooSpec1' }]}]}
)
Note that a single filter document (using the $and operator) is needed in arrayFilters, because both conditions refer to the same placeholder. If you pass two separate conditions,
{arrayFilters:[{'object.Field1': 'FooName1' },{ 'object.Field2': 'FooSpec1' }]}
MongoDB will complain about two criteria with the same base placeholder.
While the answer given by @Anthony Winzlet is right and works perfectly, it will only update the first array element matching the conditions defined in $elemMatch. That's why I avoid using it like that: unless you have a unique index that includes MyArray.Field1 and MyArray.Field2, you can't be sure the matching element is unique in your array.
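To make the difference concrete, here is a small pure-Python simulation (no MongoDB involved) of the two behaviours: the positional $ touching only the first matching element, and $[identifier] with arrayFilters touching every match. The function names and the duplicated third array element are assumptions added for illustration.

```python
import copy


def matches(element, criteria):
    """True when the element has every key/value pair in criteria."""
    return all(element.get(k) == v for k, v in criteria.items())


def set_first_match(doc, array_field, criteria, field, value):
    """Mimic '$' with $elemMatch: only the first matching element changes."""
    updated = copy.deepcopy(doc)
    for element in updated[array_field]:
        if matches(element, criteria):
            element[field] = value
            break
    return updated


def set_all_matches(doc, array_field, criteria, field, value):
    """Mimic '$[identifier]' with arrayFilters: every matching element changes."""
    updated = copy.deepcopy(doc)
    for element in updated[array_field]:
        if matches(element, criteria):
            element[field] = value
    return updated


doc = {
    "MyOwnID": "123456789",
    "MyArray": [
        {"Field1": "FooName1", "Field2": "FooSpec1", "FieldToUpdate": None},
        {"Field1": "FooName1", "Field2": "FooSpec2", "FieldToUpdate": None},
        # hypothetical duplicate, added to show the difference
        {"Field1": "FooName1", "Field2": "FooSpec2", "FieldToUpdate": None},
    ],
}
criteria = {"Field1": "FooName1", "Field2": "FooSpec2"}
first_only = set_first_match(doc, "MyArray", criteria, "FieldToUpdate", 4321)
every_match = set_all_matches(doc, "MyArray", criteria, "FieldToUpdate", 4321)
```

With the duplicate present, first_only changes one element and every_match changes two, which is why the uniqueness of the matching element matters.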
I have an existing collection with close to 1 million docs, and now I'd like to append new field data to this collection. (I'm using PyMongo.)
For example, my existing collection db.actions looks like:
...
{'_id':12345, 'A': 'apple', 'B': 'milk'}
{'_id':12346, 'A': 'pear', 'B': 'juice'}
...
Now I want to append a new column field data to this existing collection:
...
{'_id':12345, 'C': 'beef'}
{'_id':12346, 'C': 'chicken'}
...
such that the resulting collection should look like this:
...
{'_id':12345, 'A': 'apple', 'B': 'milk', 'C': 'beef'}
{'_id':12346, 'A': 'pear', 'B': 'juice', 'C': 'chicken'}
...
I know we can do this with update_one in a for loop, e.g.
for doc in values:
    collection.update_one({'_id': doc['_id']},
                          {'$set': {k: doc[k] for k in fields}},
                          upsert=True
                          )
where values is a list of dictionaries, each containing two items: the _id key-value pair and the new field key-value pair. fields contains all the new fields I'd like to add.
However, the issue is that I have a million docs to update, and anything with a for loop is way too slow. Is there a way to append this new field faster? Something similar to insert_many, except that it appends to an existing collection?
===============================================
Update1:
So this is what I have for now,
bulk = self.get_collection().initialize_unordered_bulk_op()
for doc in values:
    bulk.find({'_id': doc['_id']}).update_one({'$set': {k: doc[k] for k in fields} })
bulk.execute()
I first wrote a sample dataframe into the db with insert_many, the performance:
Time spent in insert_many: total: 0.0457min
then I use update_one with bulk operation to add extra two fields onto the collection, I got:
Time spent: for loop: 0.0283min | execute: 0.0713min | total: 0.0996min
Update2:
I added an extra column to both the existing collection and the new column data, for the purpose of using left join to solve this. If you use left join you can ignore the _id field.
For example, my existing collection db.actions looks like:
...
{'A': 'apple', 'B': 'milk', 'dateTime': '2017-10-12 15:20:00'}
{'A': 'pear', 'B': 'juice', 'dateTime': '2017-12-15 06:10:50'}
{'A': 'orange', 'B': 'pop', 'dateTime': '2017-12-15 16:09:10'}
...
Now I want to append a new column field data to this existing collection:
...
{'C': 'beef', 'dateTime': '2017-10-12 09:08:20'}
{'C': 'chicken', 'dateTime': '2017-12-15 22:40:00'}
...
such that the resulting collection should look like this:
...
{'A': 'apple', 'B': 'milk', 'C': 'beef', 'dateTime': '2017-10-12'}
{'A': 'pear', 'B': 'juice', 'C': 'chicken', 'dateTime': '2017-12-15'}
{'A': 'orange', 'B': 'pop', 'C': 'chicken', 'dateTime': '2017-12-15'}
...
If your updates are really unique per document there is nothing faster than the bulk write API. Neither MongoDB nor the driver can guess what you want to update so you will need to loop through your update definitions and then batch your bulk changes which is pretty much described here:
Bulk update in Pymongo using multiple ObjectId
The "unordered" bulk writes can be slightly faster (although in my tests they weren't), but I'd still vote for the ordered approach, mainly for error handling reasons.
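As a rough sketch of the "loop and batch" idea, assuming PyMongo's UpdateOne and Collection.bulk_write: the helper names chunked, build_update_specs and apply_in_batches and the batch size of 1000 are my own choices, not anything prescribed. The PyMongo import is done lazily so the pure helpers can be tried without a server.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def build_update_specs(values, fields):
    """Turn each input dict into a (filter, update) pair keyed on _id."""
    return [
        ({"_id": doc["_id"]}, {"$set": {k: doc[k] for k in fields}})
        for doc in values
    ]


def apply_in_batches(collection, values, fields, batch_size=1000):
    """Send one bulk_write per batch; needs PyMongo and a live collection."""
    from pymongo import UpdateOne  # lazy import: helpers above work without PyMongo

    modified = 0
    for batch in chunked(build_update_specs(values, fields), batch_size):
        requests = [UpdateOne(flt, upd) for flt, upd in batch]
        result = collection.bulk_write(requests, ordered=True)
        modified += result.modified_count
    return modified


# Sample data in the shape used in the question (third row is made up)
values = [
    {"_id": 12345, "C": "beef"},
    {"_id": 12346, "C": "chicken"},
    {"_id": 12347, "C": "pork"},
]
specs = build_update_specs(values, ["C"])
batches = list(chunked(specs, 2))
```

Passing ordered=False instead would request the unordered variant mentioned above.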
If, however, you can group your changes into specific recurring patterns then you're certainly better off defining a bunch of update queries (effectively one update per unique value in your dictionary) and then issue those each targeting a number of documents. My Python is too poor at this point to write that entire code for you but here's a pseudocode example of what I mean:
Let's say you've got the following update dictionary:
{
key: "doc1",
value:
[
{ "field1", "value1" },
{ "field2", "value2" },
]
}, {
key: "doc2",
value:
[
// same fields again as for "doc1"
{ "field1", "value1" },
{ "field2", "value2" },
]
}, {
key: "doc3",
value:
[
{ "someotherfield", "someothervalue" },
]
}
then instead of updating the three documents separately you would send one update to update the first two documents (since they require the identical changes) and then one update to update "doc3". The more knowledge you have upfront about the structure of your update patterns the more you can optimize that even by grouping updates of subsets of fields but that's probably getting a little complicated at some point...
UPDATE:
As per your below request let's give it a shot.
fields = ['C']
values = [
    {'_id': 'doc1a', 'C': 'v1'},
    {'_id': 'doc1b', 'C': 'v1'},
    {'_id': 'doc2a', 'C': 'v2'},
    {'_id': 'doc2b', 'C': 'v2'}
]
print('before transformation:')
for doc in values:
    print('_id ' + doc['_id'])
    for k in fields:
        print(doc[k])
# Group the _ids by the value each of them should receive for 'C'
transposed_values = {}
for doc in values:
    transposed_values.setdefault(doc['C'], []).append(doc['_id'])
print('after transformation:')
for k, v in transposed_values.items():
    print(k, v)
# One update_many per distinct value instead of one update per document
# (collection is assumed to be an existing PyMongo collection)
for k, v in transposed_values.items():
    collection.update_many({'_id': {'$in': v}}, {'$set': {'C': k}})
Since your join collection has fewer documents, you can convert its dateTime to a date:
db.new.find().forEach(function(d){
    d.date = d.dateTime.substring(0,10);
    db.new.update({_id : d._id}, d);
})
Then do a multiple-field lookup based on date (a substring of dateTime) and _id,
and write the output to a new collection (enhanced):
db.old.aggregate(
[
{$lookup: {
from : "new",
let : {id : "$_id", date : {$substr : ["$dateTime", 0, 10]}},
pipeline : [
{$match : {
$expr : {
$and : [
{$eq : ["$$id", "$_id"]},
{$eq : ["$$date", "$date"]}
]
}
}},
{$project : {_id : 0, C : "$C"}}
],
as : "newFields"
}
},
{$project : {
_id : 1,
A : 1,
B : 1,
C : {$arrayElemAt : ["$newFields.C", 0]},
date : {$substr : ["$dateTime", 0, 10]}
}},
{$out : "enhanced"}
]
).pretty()
result
> db.enhanced.find()
{ "_id" : 12345, "A" : "apple", "B" : "milk", "C" : "beef", "date" : "2017-10-12" }
{ "_id" : 12346, "A" : "pear", "B" : "juice", "C" : "chicken", "date" : "2017-12-15" }
{ "_id" : 12347, "A" : "orange", "B" : "pop", "date" : "2017-12-15" }
>
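For anyone who wants to check the join logic without a server, here is a plain-Python sketch of the same left join on _id plus the date prefix of dateTime. It only mirrors the idea of the pipeline above; the function name is hypothetical, and the _id values on the new rows are assumed from the result shown.

```python
def left_join_on_id_and_date(old_docs, new_docs, fields):
    """Left join old_docs to new_docs on (_id, first 10 chars of dateTime)."""
    # Index the smaller collection by its join key
    lookup = {(d["_id"], d["dateTime"][:10]): d for d in new_docs}
    joined = []
    for doc in old_docs:
        date = doc["dateTime"][:10]
        row = {k: v for k, v in doc.items() if k != "dateTime"}
        match = lookup.get((doc["_id"], date))
        if match is not None:
            for field in fields:
                if field in match:
                    row[field] = match[field]
        row["date"] = date
        joined.append(row)
    return joined


old_docs = [
    {"_id": 12345, "A": "apple", "B": "milk", "dateTime": "2017-10-12 15:20:00"},
    {"_id": 12346, "A": "pear", "B": "juice", "dateTime": "2017-12-15 06:10:50"},
    {"_id": 12347, "A": "orange", "B": "pop", "dateTime": "2017-12-15 16:09:10"},
]
new_docs = [
    {"_id": 12345, "C": "beef", "dateTime": "2017-10-12 09:08:20"},
    {"_id": 12346, "C": "chicken", "dateTime": "2017-12-15 22:40:00"},
]
enhanced = left_join_on_id_and_date(old_docs, new_docs, ["C"])
```

Rows without a partner in new_docs (12347 here) are kept but simply lack C, as in the result above.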
If the following is document
{ 'a': {
'b': ['a', 'x', 'b'],
't': ['a', 'z', 'w', 't']
}
}
I want to be able to obtain the value associated with the nested object. For example, in python, I would do print(dict_name['a']['t']).
I have tried find() and findOne() on both of the commands below
db.my_collection.find({}, { 'a.t': 1 })
db.my_collection.find({ 'a.t': {$exists: true} })
but they haven't been returning the correct data.
How can I query for the document with 'a' as a key, then that document, obtain the value associated with 't', expecting ['a', 'z', 'w', 't'] to be returned?
How about this? :
db.my_collection.aggregate([{"$project":{"_id":"$_id", "t":"$a.t"}}]);
On this test collection
{
"_id" : ObjectId("577ba92187630c1a06c4bcac"),
"a" : {
"b" : [
1,
2
],
"t" : [
2,
3
]
}
}
It gave me the following result-
{ "_id" : ObjectId("577ba92187630c1a06c4bcac"), "t" : [ 2, 3 ] }
You can do the following aggregation:
db.collection.aggregate([
{
$project: {
'_id': '$_id',
't': '$a.t'
}
}
])
This should give you what you are looking for.
See the documentation for $project: it basically assigns your 'a.t' array to a new field named 't' (this is what 't': '$a.t' pretty much means).
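If the document is already in Python (for example after a find_one), the '$a.t' path can be mimicked with a small helper. This is only a sketch: get_path is a made-up name, and unlike MongoDB's dotted paths it walks nested dicts only and does not descend into arrays.

```python
def get_path(doc, dotted_path):
    """Follow a dotted path such as 'a.t' through nested dicts."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # path does not exist in this document
        current = current[key]
    return current


doc = {
    "a": {
        "b": ["a", "x", "b"],
        "t": ["a", "z", "w", "t"],
    }
}
value = get_path(doc, "a.t")
print(value)
```

This is the same lookup as dict_name['a']['t'] from the question, except that a missing key gives None instead of raising KeyError.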
I have a document with an array field, similar to this:
{
"_id" : "....",
"Statuses" : [
{ "Type" : 1, "Timestamp" : ISODate(...) },
{ "Type" : 2, "Timestamp" : ISODate(...) },
//Etc. etc.
]
}
How can I update a specific Status item's Timestamp, by specifying its Type value?
From mongodb shell you can do this by
db.your_collection.update(
{ _id: ObjectId("your_objectid"), "Statuses.Type": 1 },
{ $set: { "Statuses.$.Timestamp": "new timestamp" } }
)
The C# equivalent is:
var query = Query.And(
Query.EQ("_id", "your_doc_id"),
Query.EQ("Statuses.Type", 1)
);
var result = your_collection.Update(
    query,
    Update.Set("Statuses.$.Timestamp", "new timestamp"),
    UpdateFlags.Multi,
    SafeMode.True
);
This will update the specific document. You can remove the _id filter if you want to update every matching document in the collection.
Starting with MongoDB 3.6, the $[<identifier>] positional operator may be used. Unlike the $ positional operator — which updates at most one array element per document — the $[<identifier>] operator will update every matching array element. This is useful for scenarios where a given document may have multiple matching array elements that need to be updated.
db.yourCollection.update(
{ _id: "...." },
{ $set: {"Statuses.$[element].Timestamp": ISODate("2021-06-23T03:47:18.548Z")} },
{ arrayFilters: [{"element.Type": 1}] }
);
The arrayFilters option matches the array elements to update, and the $[element] is used within the $set update operator to indicate that only array elements that matched the arrayFilter should be updated.
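For PyMongo users, the same update can be sketched by building the three arguments as plain dicts. The helper name build_status_update is hypothetical; the commented-out call assumes an existing collection object and PyMongo's array_filters keyword on update_one.

```python
from datetime import datetime, timezone


def build_status_update(doc_id, status_type, timestamp):
    """Return (filter, update, array_filters) for a $[element] update."""
    filter_doc = {"_id": doc_id}
    update_doc = {"$set": {"Statuses.$[element].Timestamp": timestamp}}
    array_filters = [{"element.Type": status_type}]
    return filter_doc, update_doc, array_filters


new_timestamp = datetime(2021, 6, 23, 3, 47, 18, 548000, tzinfo=timezone.utc)
# "...." is the same _id placeholder used in the shell example
flt, upd, filters = build_status_update("....", 1, new_timestamp)

# With a live PyMongo collection (not created here):
# collection.update_one(flt, upd, array_filters=filters)
```

Using update_many with the same arguments would apply the change across all documents matching the filter.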