Renaming a Collection Using Pymongo - mongodb

I realize that a collection can be renamed in MongoDB using
db["old_name"].renameCollection("new_name")
But is there an equivalent in PyMongo? I tried the following, but it didn't work either.
db["old_name"].rename_collection("new_name")

According to the documentation, the method is simply named rename.
rename(new_name, session=None, **kwargs)
new_name: new name for this collection
session (optional): a ClientSession
**kwargs (optional): additional arguments to the rename command may be passed as keyword arguments to this helper method (i.e. dropTarget=True)
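As a minimal sketch of the signature above, the call can be wrapped in a small helper. The names `rename_collection` and `drop_target` are hypothetical, and `db` is assumed to be a PyMongo `Database` object obtained elsewhere.

```python
def rename_collection(db, old_name, new_name, drop_target=False):
    """Rename db[old_name] to new_name within the same database."""
    # dropTarget is forwarded to the rename command through **kwargs.
    # With dropTarget=True the server drops an existing collection
    # called new_name first; otherwise the rename fails if it exists.
    return db[old_name].rename(new_name, dropTarget=drop_target)
```

Usage would look like `rename_collection(db, "old_name", "new_name", drop_target=True)`.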

Maybe you can use rename in PyMongo. Just use
db.oldname.rename('newname')
You can check the link:
https://api.mongodb.com/python/current/api/pymongo/collection.html?highlight=rename

An admin command to rename a collection can be executed in like manner.
query = {
'renameCollection': '<source_namespace>',
'to': '<target_namespace>',
}
client.admin.command(query)

query = bson.son.SON([
('renameCollection', 't1.ccd'),
('to', 't2.ccd2'),
])
print(query)
self.mgclient.admin.command(query)
You need to use bson.son.SON because dict is unordered (before Python 3.7, a plain dict did not guarantee key order). Please refer to:
http://api.mongodb.com/python/current/api/bson/son.html#bson.son.SON
http://api.mongodb.com/python/current/api/pymongo/database.html
Note the order of keys in the command document is significant (the
“verb” must come first), so commands which require multiple keys (e.g.
findandmodify) should use an instance of SON or a string and kwargs
instead of a Python dict.
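The key-order point above can be sketched as a small builder that always puts the verb first. The function name `build_rename_command` is hypothetical, and the sketch assumes Python 3.7 or later, where a plain dict keeps insertion order; on older interpreters the SON form shown above is the safer choice.

```python
def build_rename_command(source_namespace, target_namespace, drop_target=False):
    """Build a renameCollection command document with the verb as first key."""
    # Namespaces have the form "<database>.<collection>".
    command = {"renameCollection": source_namespace}
    command["to"] = target_namespace
    if drop_target:
        command["dropTarget"] = True
    return command

cmd = build_rename_command("t1.ccd", "t2.ccd2")
print(list(cmd))  # ['renameCollection', 'to']
```

The resulting document would then be passed to `client.admin.command(cmd)`, as in the answers above.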

Related

How do I create an array for an updateOne() call in mongoc (C library for MongoDB)?

I am completely mystified (and supremely frustrated). How do I create this call using the mongoc library?
I have the following doc structure in the collection
{_id: myOID,
  subscriptions: {
    newProducts: true,
    newBlogPosts: true,
    pressReleases: true,
  }
}
I want to remove one of the subscriptions, for example, the user no longer wants to receive press releases from me.
This works in the mongo shell. Now I need to do it in C code
updateOne({_id: myOID}, [{'$unset': 'subscriptions.pressReleases'}], {})
Note how the update parameter in the Mongo shell is an anonymous array. I need to do that for the bson passed in as the update parameter in the mongoc_collection_update_one() API call.
The C code for updateOne is
mongo_status = mongoc_collection_update_one (mongo_collection,
mongo_query,
mongo_update,
NULL, /* No Opts to pass in */
NULL, /* no reply wanted */
&mongo_error);
Also note that in the aggregate() API, this is done with
{"pipeline" : [{'$unset': 'elists.lunch' }] }
Neither the updateOne() shell function nor the mongoc_collection_update_one() API call accepts that; they want just the array.
How do I create the bson to use as the second parameter for mongoc_collection_update_one() API call?
Joe's answer works and I am able to accomplish what I need to do.
The $unset update operator takes an object, just like $set.
updateOne({_id: myOID},{'$unset':{'subscriptions.pressReleases': true}})
OR perhaps even better
updateOne({_id: myOID},{'$unset':{'subscriptions.pressReleases': {'$exists': true}}})
which will remove the subscription flag no matter what the value is for that field (the value given to $unset does not affect the operation).
Doing it this way does not require an anonymous array (which I still don't know how to create).
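For comparison, the same object-style $unset update can be sketched in Python with PyMongo rather than mongoc. The helper name `unset_fields` is hypothetical, and `collection` is assumed to be a PyMongo `Collection`.

```python
def unset_fields(collection, doc_id, *field_paths):
    """Remove the given (possibly dotted) fields from one document."""
    # $unset takes an object mapping field paths to a placeholder value;
    # the placeholder ("" here) is ignored by the server.
    update = {"$unset": {path: "" for path in field_paths}}
    return collection.update_one({"_id": doc_id}, update)
```

A call such as `unset_fields(collection, my_oid, "subscriptions.pressReleases")` builds the same update document as the shell example above, with no array involved.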

Data factory lookup (dot) in the item() name

I have a Lookup activity that runs a Salesforce query, and I use its elements (item()) in subsequent activities. Until now I had item().name or item().email, but now I have item().NVMStatsSF__Related_Lead__r.FirstName, which has a dot in the field name.
How should I pass it through the body tag so that it is read correctly?
So I have the following data in item()
{
  "NVMStatsSF__Related_Lead__c": "00QE000egrtgrAK",
  "NVMStatsSF__Agent__r.Name": "ABC",
  "NVMStatsSF__Related_Lead__r.Email": "geggegg#gmail.com",
  "NVMStatsSF__Related_Lead__r.FirstName": "ABC",
  "NVMStatsSF__Related_Lead__r.OwnerId": "0025434535IIAW"
}
Now when I use item().NVMStatsSF__Agent__r.Name, it will not parse because of the dot after NVMStatsSF__Agent__r, and it gives me the following error.
'item().NVMStatsSF__Related_Lead__r.Email' cannot be evaluated because property 'NVMStatsSF__Related_Lead__r' doesn't exist, available properties are 'NVMStatsSF__Related_Lead__c, NVMStatsSF__Agent__r.Name, NVMStatsSF__Related_Lead__r.Email, NVMStatsSF__Related_Lead__r.FirstName, NVMStatsSF__Related_Lead__r.OwnerId'.",
"failureType": "UserError",
"target": "WebActivityToAddPerson"
This is because ADF uses '.' to access properties of an object.
Could you find a way to rename the field name which contains '.'?
It seems you need a built-in function that gets a value from an object by key, like getValue(item(), 'key.nestkey'). Unfortunately, there doesn't seem to be such a function, so you may need to handle your key first.
Finally, it worked. I was being silly.
Instead of taking the value from the child table with the dot operator, I just used a subquery.
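The "handle your key first" idea can be illustrated outside ADF with a short Python sketch that renames dotted keys before anything tries to read them as property paths. This is not ADF expression syntax; the function name `flatten_dotted_keys` and the `_` separator are assumptions made for illustration.

```python
def flatten_dotted_keys(item, sep="_"):
    """Return a copy of item with '.' in each top-level key replaced by sep."""
    renamed = {}
    for key, value in item.items():
        new_key = key.replace(".", sep)
        if new_key in renamed:
            raise ValueError(f"key collision after renaming: {new_key!r}")
        renamed[new_key] = value
    return renamed

item = {
    "NVMStatsSF__Related_Lead__c": "00QE000egrtgrAK",
    "NVMStatsSF__Agent__r.Name": "ABC",
}
clean = flatten_dotted_keys(item)
print(clean["NVMStatsSF__Agent__r_Name"])  # ABC
```

After renaming, no key contains a dot, so dot-based property access no longer splits a field name in the wrong place.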

MongoDB findOneAndReplace log if added as new document or replaced

I'm using MongoDB's findOneAndReplace() with upsert = true and returnNewDocument = true, basically as a way to avoid inserting duplicates. But I want to get the _id of the newly inserted document (or the existing document) to pass to a background processing task.
BUT I also want to log whether the document was added as new or a replacement took place.
I can't see any way to use findOneAndReplace() with these parameters and answer that question.
The only thing I can think of is to find and insert in two separate requests, which seems a bit counterproductive.
ps. I'm actually using pymongo's find_one_and_replace() but it seems identical to the JS mongo function.
EDIT: edited for clarification.
Is it not possible to use the replace_one function? In Java I am able to use replaceOne, which returns an UpdateResult. That has a method for finding out whether a document was updated or not. I see replace_one in PyMongo, and it should behave the same way. See the PyMongo documentation and look for replace_one.
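A minimal sketch of that suggestion, assuming `collection` is a PyMongo `Collection`: `replace_one` with `upsert=True` returns an `UpdateResult` whose `upserted_id` is set only when a new document was inserted. The helper name `replace_and_report` is hypothetical. The sketch also assumes the replacement document still matches `find_query`, and it makes no attempt to guard against concurrent writers.

```python
def replace_and_report(collection, find_query, document_data):
    """Upsert document_data and report whether it was inserted or replaced.

    Returns a (document_id, was_inserted) tuple.
    """
    result = collection.replace_one(find_query, document_data, upsert=True)
    if result.upserted_id is not None:
        # No document matched, so the server inserted a new one.
        return result.upserted_id, True
    # An existing document matched; fetch only its _id.
    existing = collection.find_one(find_query, {"_id": 1})
    return existing["_id"], False
```

The replaced case still costs a second query for the `_id`, but the inserted case needs only one request.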
The way I'm going to implement it for now (in python):
import pymongo
def find_one_and_replace_log(collection, find_query,
                             document_data,
                             log=None):
    ''' behaves like find_one_and_replace(upsert=True,
        return_document=pymongo.ReturnDocument.AFTER)
        and records in log whether the document was new
    '''
    if log is None:
        # avoid sharing one mutable default dict between calls
        log = {}
    is_new = False
    document = collection.find_one(find_query)
    if not document:
        # document didn't exist
        # log as NEW
        is_new = True
    new_or_replaced_document = collection.find_one_and_replace(
        find_query,
        document_data,
        upsert=True,
        return_document=pymongo.ReturnDocument.AFTER
    )
    log['new_document'] = is_new
    return new_or_replaced_document

call custom python function on every document in a collection Mongo DB

I want to call a custom python function on some existing attribute of every document in the entire collection and store the result as a new key-value pair in that (same) document. May I know if there's any way to do that (since each call is independent of others) ?
I noticed cursor.forEach, but can't it be done efficiently using just Python?
A simple example would be to split the string in text and store the number of words as a new attribute.
def split_count(text):
    # some complex preprocessing...
    return len(text.split())
# Need something like this...
db.collection.update_many({}, {'$set': {"split": split_count('$text') }}, upsert=True)
But it seems like setting a new attribute in a document based on the value of another attribute in the same document is not possible this way yet. This post is old but the issues seem to be still open.
I found a way to call any custom python function on a collection using parallel_scan in PyMongo.
import threading
# Note: parallel_scan was removed in PyMongo 4.0.
def process_text(cursor):
    for row in cursor.batch_size(200):
        # Any complex preprocessing here...
        split_text = row['text'].split()
        db.collection.update_one({'_id': row['_id']},
                                 {'$set': {'split_text': split_text,
                                           'num_words': len(split_text) }},
                                 upsert=True)
def preprocess(num_threads=4):
    # Get up to max 'num_threads' cursors.
    cursors = db.collection.parallel_scan(num_threads)
    threads = [threading.Thread(target=process_text, args=(cursor,)) for cursor in cursors]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
This is not really faster than cursor.forEach (but not that slow either), but it helps me execute any arbitrarily complex python code and save the results from within Python itself.
Also if I have an array of ints in one of the attributes, doing cursor.forEach converts them to floats which I don't want. So I preferred this way.
But I would be glad to know if there're any better ways than this :)
It is quite unlikely that it will ever be efficient to do this kind of thing in Python, because each document has to make a round trip to the client machine and pass through the Python function there.
In your example code, you are passing the result of a function to a mongodb update query, which won't work. You can't run any python code inside mongodb queries on the db server.
As the answer to your linked question suggests, this type of action has to be performed in the mongo shell, e.g.:
db.collection.find().snapshot().forEach(
    function (elem) {
        splitLength = elem.text.split(" ").length
        db.collection.update(
            {
                _id: elem._id
            },
            {
                $set: {
                    split: splitLength
                }
            }
        );
    }
);
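For readers who accept the round-trip cost and want to stay in Python, the client-side loop can be sketched as follows. The helper name `add_word_counts` and the `num_words` field are hypothetical, `collection` is assumed to be a PyMongo `Collection`, and every `text` value is assumed to be a string.

```python
def add_word_counts(collection, batch_size=200):
    """Store the word count of each document's 'text' field in 'num_words'."""
    updated = 0
    # Fetch only the fields that are needed, to keep the round trip small.
    cursor = collection.find({"text": {"$exists": True}}, {"text": 1})
    for doc in cursor.batch_size(batch_size):
        # Any Python preprocessing can happen here, on the client.
        num_words = len(doc["text"].split())
        collection.update_one({"_id": doc["_id"]},
                              {"$set": {"num_words": num_words}})
        updated += 1
    return updated
```

Each document still travels to the client and back, which is the cost described above; the sketch only shows that the logic itself can live in Python.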

Can not delete collection from mongodb

I cannot delete a collection from the shell.
The collection exists, and my PHP script is accessing it (selecting and updating).
But when I used:
db._registration.drop()
it gives me an error:
Date, JS Error: TypeError: db._registration has no properties (shell): 1
The problem is not with deleting the collection. The problem is with accessing the collection, so you would not be able to update, find or do anything else with it from the shell. As was pointed out in the MongoDB JIRA, this is a bug that occurs when a collection name contains characters like _, - or .
Names like this are acceptable for collections, but they cause a problem in the shell.
You can delete it in shell with this command:
db.getCollection("_registration").drop()
or this
db['my-collection'].drop()
but I would rather rename it (if that is possible and does not require a lot of changes, of course).
You can also use:
db["_registration"].drop()
This syntax works in JS as well.
For some reason the double quotes ("_registration") did not work for me, but single quotes ('_registration') worked.
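The same name-based access can be sketched in Python. The helper name `drop_collection_by_name` is hypothetical, and `db` is assumed to be a PyMongo `Database`.

```python
def drop_collection_by_name(db, name):
    """Drop a collection whose name is awkward for attribute-style access."""
    # Item access takes the name as a plain string, so names such as
    # "_registration" or "my-collection" need no special handling.
    db[name].drop()
```

For example, `drop_collection_by_name(db, "_registration")` mirrors the `db["_registration"].drop()` shell form above.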