In update method, query parameter containing a list (pymongo) - mongodb

I have a dictionary. I need to insert the values (column 2) into MongoDB against the corresponding keys (column 1).
Say this is the dictionary:
values = {'a': ['1', '2', '3'],
          'b': ['1', '2'],
          'c': ['3', '4']}
Right now I am doing this:
for k, v in values.items():
    col4.update({"name": k}, {"$set": {"fieldName": v}})
But this takes 3 round trips to the db. Is it possible to do it in one go, the way $in works?

In your code you are finding each document by its name field and setting its fieldName to v. There is no update operation in Mongo that can do such a thing in one shot for multiple documents.
However, there is a bulk insert statement, which can be more efficient than multiple inserts or updates: http://docs.mongodb.org/manual/core/bulk-inserts/.
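For what it's worth, newer PyMongo versions can also batch several updates into a single request with bulk_write; here is a minimal sketch of the loop from the question rewritten that way (the database name mydb is an assumption):

from pymongo import MongoClient, UpdateOne

col4 = MongoClient().mydb.col4  # hypothetical database name

values = {'a': ['1', '2', '3'],
          'b': ['1', '2'],
          'c': ['3', '4']}

# one round trip to the server, one UpdateOne operation per key
ops = [UpdateOne({"name": k}, {"$set": {"fieldName": v}})
       for k, v in values.items()]
result = col4.bulk_write(ops)
print(result.modified_count)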
I think I previously didn't quite understand what you were asking and wrote the answer below, but I'm still not sure what you mean by $in. Perhaps you can provide an example of the data before and after the update in the DB; that way it will be absolutely clear what you are trying to achieve.
OLD answer ... (I'll edit it soon)
You need to restructure your loop. Build up a single update document (without running it) by adding {field: newValue} pairs to the $set clause. After the loop is done you will have the equivalent of {$set: {"a": 1, "b": 1, "c": 3}}. Then you can update all fields in one shot.
Here is official documentation:
http://docs.mongodb.org/manual/reference/operator/update/set/
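As a sketch of that idea in PyMongo (note this collapses everything into one update of a single document, with a made-up filter, which may not be what the question wants):

from pymongo import MongoClient

col4 = MongoClient().mydb.col4  # as in the sketch above

values = {'a': ['1', '2', '3'], 'b': ['1', '2'], 'c': ['3', '4']}

# fold every field into one $set document instead of one update per field
col4.update_one({"name": "some_doc"},      # hypothetical filter
                {"$set": dict(values)})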

Related

How should I efficiently delete a lot of records from a MongoDB collection?

I am using Mongo to store multi-tenant data. As part of data cleanup for a tenant I want to delete everything related to that tenant. The tenantId is indexed, but there are a lot of documents, deletion takes a long time, and I have no easy way to track its progress.
Currently I do something like:
db.records.deleteMany({tenantId: x})
Is there a better way?
I'm thinking of doing it in batches, e.g. querying for x records and then building a list of ids to delete. That seems very manual, but is it the recommended way?
Some options that I can think of.
Drop the index before deleting. You can recreate the index after the deletion.
Change the write concern to a lower value, possibly 0, so the request won't wait for acknowledgement.
db.records.deleteMany({tenantId: x}, {writeConcern: {w: 0}});
If there is another field with enough cardinality to split the documents into smaller batches, try including it in the query.
Ex: if anotherField has 0, 1, 2, 3 as its values, then execute the delete command 4 times, each time with a different value.
db.records.deleteMany({tenantId: x, anotherField: 0}, {writeConcern: {w: 0}});
db.records.deleteMany({tenantId: x, anotherField: 1}, {writeConcern: {w: 0}});
db.records.deleteMany({tenantId: x, anotherField: 2}, {writeConcern: {w: 0}});
db.records.deleteMany({tenantId: x, anotherField: 3}, {writeConcern: {w: 0}});
The performance may depend on a variety of factors, but here are some options you can try to improve it.
Bulk operations
Bulk operations might help here. bulk.find(query).remove() is a version of db.collection.remove(query) that is optimized for large numbers of operations. You can read more about it in the MongoDB documentation.
You can use the following way:
Declare a search query:
var query = {tenantId: x};
Initialize and use a bulk:
var bulk = db.yourCollection.initializeUnorderedBulkOp()
bulk.find(query).remove() // or try delete() instead of remove()
bulk.execute()
The idea here is not so much to speed up the removal as to produce less load.
Also you could try bulkWrite()
db.yourCollection.bulkWrite([
  { deleteMany: { "filter": query } }
])
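Separately, since the question mentions having no easy way to see progress: one judgment-call approach (not part of the answer above) is to delete in batches of _ids and report as you go. A PyMongo sketch, with db/collection names assumed:

from pymongo import MongoClient

coll = MongoClient().mydb.records  # hypothetical db/collection names
tenant_id = "x"                    # the tenant being cleaned up

BATCH = 10_000
total = 0
while True:
    # grab one batch of _ids; the tenantId index keeps this cheap
    ids = [d["_id"] for d in coll.find({"tenantId": tenant_id}, {"_id": 1}).limit(BATCH)]
    if not ids:
        break
    total += coll.delete_many({"_id": {"$in": ids}}).deleted_count
    print(f"deleted {total} documents so far")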
TTL indexes
It may not be suitable for your use case, but there's an entirely different approach that avoids doing the removal yourself at all.
If it is acceptable for you to delete data based on a timestamp, then a TTL index might help you. The idea here is that a record is removed automatically once its TTL expires.
Implemented as a special index type, TTL collections make it possible
to store data in MongoDB and have the mongod automatically remove data
after a specified period of time.
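A minimal sketch of that in PyMongo, assuming documents carry a createdAt timestamp and a 30-day lifetime is acceptable (both are assumptions, not part of the question):

from datetime import datetime, timezone
from pymongo import MongoClient

coll = MongoClient().mydb.records  # hypothetical db/collection names

# mongod's TTL monitor removes a document once createdAt is older than 30 days
coll.create_index("createdAt", expireAfterSeconds=30 * 24 * 3600)

# every insert must then carry a real datetime in that field
coll.insert_one({"tenantId": "x", "createdAt": datetime.now(timezone.utc)})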
deleteMany: I think there must be something common between all the documents that you want to remove from the collection.
If you can find that something, you can create a query accordingly;
this will help you remove those records fast.
Let me give you one example: I want to remove all the records where the username field does not exist.
db.collection.deleteMany({ username: {$exists: false} })
The best place to start is to find something that all the records have in common, in order to remove them all at once.
For example the following code deletes all entries that don't contain an email address.
db.users.deleteMany({ email: { $exists: false } })
The MongoDB documentation has great examples; the link is provided below.
https://www.mongodb.com/docs/manual/reference/method/db.collection.deleteMany/#delete-multiple-documents
You might also want to consider dropping the index, since it can be recreated after you're done with the operation.
Finally, you might want to lower the write concern in your operation in order to speed things up. A compiled list of options can be found here:
https://www.mongodb.com/docs/v5.0/reference/write-concern/#w-option
I found a good tutorial on https://www.geeksforgeeks.org/mongodb-delete-multiple-documents-using-mongoshell/ that might help you further.
Apologies for any grammatical mistakes, since English is not my native tongue.
I would suggest two solutions; also, please export your data first, so that if anything goes wrong you will have a backup, or try this in your test DB first.
1) You can use tenantId as the condition: if any of the records have a tenantId field, delete them, so that all of your tenant data is removed with a single query.
db.records.deleteMany({tenantId: {$exists: true}})
// suggestion: if any of your tenant data has a tenantId field that is null, you can also check for a null value to delete those records.
2) Find some data common to all of the records; if there is any, use it as a condition to delete those records.
For example, if all of your tenant data has a common field called type with the same value, use a delete statement like:
db.records.deleteMany({type: 1})

Shift column of MongoDB

I am new to MongoDB. I have a collection in MongoDB and would like to shift a field in it: I would like to place image as the last field and email as the field just before it. How can I do that?
I agree with others who mentioned that the order of keys in a document should not matter.
But if you still want to learn more about it, it is mentioned in the docs that:
MongoDB preserves the order of the document fields following write
operations...
Considering that, you can actually alter the order of fields by removing and re-inserting them (using $unset and $set), or by using the $rename operator, which does exactly that. But you will need a couple of operations: first rename the image field to something else, then rename it back to image, like so:
db.test.updateMany({}, {$rename: {image: 'image_'}})
db.test.updateMany({}, {$rename: {image_: 'image'}})
Since this will actually re-insert the image field, it will cause that field to be last in the document.

How to order the fields of the documents returned by the find query in MongoDB? [duplicate]

I am using PyMongo to insert data (title, description, phone_number ...) into MongoDB. However, when I use mongo client to view the data, it displays the properties in a strange order. Specifically, phone_number property is displayed first, followed by title and then comes description. Is there some way I can force a particular order?
The above question and answer are quite old. Anyhow, if somebody visits this, I feel I should add:
The answer below is completely wrong. Actually, in Mongo, documents ARE ordered key-value pairs. However, PyMongo uses Python dicts for documents, which indeed are not ordered (as of CPython 3.6, dicts do retain insertion order, but this is considered an implementation detail), so this is a limitation of the PyMongo driver.
Be aware that this limitation actually impacts usability: if you query the db for a subdocument, it will only match if the order of the key-value pairs is correct.
Just try the following code yourself:
from pymongo import MongoClient

db = MongoClient().testdb
col = db.testcol

subdoc = {
    'field1': 1,
    'field2': 2,
    'field3': 3
}
document = {
    'subdoc': subdoc
}
col.insert_one(document)
# cursor.count() was removed in PyMongo 4; count_documents is the modern equivalent
print(col.count_documents({'subdoc': subdoc}))
Each time this code gets executed, the 'same' document is added to the collection. Thus, each time we run this snippet, the printed value 'should' increase by one. It does not, because find only matches subdocuments with the correct ordering, while Python dicts may hand the subdocument's keys over in an arbitrary order.
See the following answer for how to use an ordered dict to overcome this: https://stackoverflow.com/a/30787769/4273834
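Sketching the idea from that linked answer: building the subdocument as a bson.son.SON gives the filter a well-defined key order, so the match works (db/collection names reused from the snippet above):

from bson.son import SON
from pymongo import MongoClient

col = MongoClient().testdb.testcol

ordered_subdoc = SON([('field1', 1), ('field2', 2), ('field3', 3)])
col.insert_one({'subdoc': ordered_subdoc})
# the filter's key order is now deterministic, so the subdocument matches
print(col.count_documents({'subdoc': ordered_subdoc}))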
Original answer (2013):
MongoDB documents are BSON objects, unordered dictionaries of key-value pairs. So, you can't rely on or set a specific field order. The only thing you can control is which fields to display and which not to; see the docs on find's projection argument.
Also see related questions on SO:
MongoDB field order and document position change after update
Can MongoDB and its drivers preserve the ordering of document elements
Ordering fields from find query with projection
Hope that helps.

MongoDB C driver _id generation

I use mongo_insert() three times to insert my data in three different collections. The problem is that the "_id" field must be exactly the same in each of the collections, but I do not know how to (ideally) recover and reuse the "_id" field generated in my first mongo_insert...
Please advise me how to do it.
Normally, you would have a different field, like customId, for your private needs, and leave _id for Mongo to generate.
But if you still need it to be exactly the same, there are 2 variants:
1) Set a custom generated _id on each doc yourself.
2) Save the first doc, then read it back, take its _id and set it on the other docs.
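For illustration only, here is the first variant sketched in PyMongo rather than the C driver (the C driver has equivalent OID-generation helpers such as bson_oid_init; the database and collection names here are made up):

from bson import ObjectId
from pymongo import MongoClient

db = MongoClient().mydb  # hypothetical database name

# variant 1: generate the _id client-side once, then reuse it everywhere
shared_id = ObjectId()
for coll in (db.first, db.second, db.third):
    coll.insert_one({"_id": shared_id, "payload": "..."})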

MongoDB: Nested query with arrays, and its performance

I have 2 collections in 2 separate DBs. Both store an array field. I plan to query both at once, so that I get:
All collection 1 documents that have the elements [A,B] in their array field and whose _ids are present in the array field of a specific collection 2 document.
As an example:
docs (collection 1, DB 1):
[{"_id":ObjectId("doc1"), "array1":["A","B"]}, {"_id":ObjectId("doc2"), "array1":["A","C"]}]
user_docs (collection 2, DB 2):
[{"_id":ObjectId("usr1"), "array2": [ObjectId("doc1"),ObjectId("foo")]}, {"_id":ObjectId("usr2"), "array2": [ObjectId("bar"),ObjectId("baz")]}]
I need a query that, given A, B and usr1, returns the doc1 object (because it has A and B in its array1 field and usr1 has it in its array2 field).
I can obviously fetch all docs having A and B in one query and all of usr1's docs in another query, then find the common elements at application level, but is there a better way of doing it in MongoDB?
Thanks for your help.
OK, I'm not sure I understand exactly what you're trying to do from your description. But I don't understand why you would query data across DBs; that just seems very heavy-handed to me. Why can't you store both data sets in the same DB? You can always separate them later if required. I'm not sure this will solve your problem, but it would be a good place to start.
Best of luck.
You will have to query MongoDB twice, since you have no possibility of a join, and combine the results at application level. If you can denormalize, do it: cache the needed data in an embedded doc, so that you only need one query.
I think @Eamonn is right that you shouldn't have to do a query across DBs.
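For what it's worth, here is a sketch of the two-query, application-level approach in PyMongo (db names and usr1_id are assumptions; $all requires both A and B in array1, and $in restricts the search to the ids found on usr1):

from pymongo import MongoClient

client = MongoClient()
docs = client.db1.docs            # collection 1, DB 1
user_docs = client.db2.user_docs  # collection 2, DB 2

# query 1: fetch usr1's array2 (usr1_id is assumed to be known)
usr = user_docs.find_one({"_id": usr1_id}, {"array2": 1})

# query 2: docs whose _id is in usr1's array2 AND whose array1 contains both A and B
matches = list(docs.find({"_id": {"$in": usr["array2"]},
                          "array1": {"$all": ["A", "B"]}}))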