I'm using mongo's findOneAndReplace() with upsert = true and returnNewDocument = true
as a way to avoid inserting duplicates. I also want the _id of the newly inserted document (or of the existing document) so I can pass it to a background processing task.
BUT I also want to log if the document was Added-As-New or if a Replacement took place.
I can't see any way to use findOneAndReplace() with these parameters and answer that question.
The only thing I can think of is to find and insert in two separate requests, which seems a bit counter-productive.
ps. I'm actually using pymongo's find_one_and_replace() but it seems identical to the JS mongo function.
EDIT: edited for clarification.
Is it not possible to use the replace_one function? In Java I am able to use replaceOne, which returns an UpdateResult. That has a method for finding out whether a document was updated or not. I see replace_one in pymongo and it should behave the same. Here is the doc: PyMongo Doc. Look for replace_one.
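A minimal sketch of that suggestion, assuming PyMongo 3 or later, where replace_one returns an UpdateResult whose upserted_id is None unless a new document was inserted. The helper name replace_and_report is hypothetical, and the follow-up find_one assumes the replaced document still matches find_query (which should hold when the query is the de-duplication key). A second request is only needed in the "replaced" case.

```python
def replace_and_report(collection, find_query, document_data):
    """Upsert document_data and report whether it was inserted or replaced.

    Returns (doc_id, was_inserted). `collection` is assumed to be a
    pymongo Collection; this helper is illustrative, not part of PyMongo.
    """
    result = collection.replace_one(find_query, document_data, upsert=True)
    if result.upserted_id is not None:
        # No document matched the filter, so a new one was inserted.
        return result.upserted_id, True
    # An existing document was replaced. A replacement keeps the original
    # _id, so look it up (assumes the new document still matches find_query).
    existing = collection.find_one(find_query, {"_id": 1})
    return existing["_id"], False
```

The boolean can be written to the log, and the _id passed on to the background task.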
The way I'm going to implement it for now (in python):
import pymongo

def find_one_and_replace_log(collection, find_query,
                             document_data,
                             log=None):
    ''' behaves like find_one_and_replace(upsert=True,
        return_document=pymongo.ReturnDocument.AFTER)
        and records in log['new_document'] whether the document was new
    '''
    is_new = False
    document = collection.find_one(find_query)
    if not document:
        # document didn't exist
        # log as NEW
        is_new = True
    new_or_replaced_document = collection.find_one_and_replace(
        find_query,
        document_data,
        upsert=True,
        return_document=pymongo.ReturnDocument.AFTER
    )
    # log is optional; a mutable default ({}) would be shared between calls
    if log is not None:
        log['new_document'] = is_new
    return new_or_replaced_document
There are some questions here about how to save the result of a query into a JavaScript variable, but I'm just not able to apply them. My query is more complicated, so I think this question is unique.
Here is the problem. I have a collection named "drives" and a key named "driveDate". I need to save one variable with the smallest date and another with the biggest date.
The query for the smallest date is:
> db.drives.find({},{"_id":0,"driveDate":1}).sort({"driveDate":1}).limit(1)
The result is:
{ "driveDate" : ISODate("2012-01-11T17:24:12.676Z") }
How can I save this to a variable? Can I do something like:
tmp = db.drives.find({},{"_id":0,"driveDate":1}).sort({"driveDate":1}).limit(1)
Thanks!!!
Assuming you're trying to do this in the shell:
tmp = db.drives.find({}, {_id:0, driveDate:1}).sort({driveDate:1}).limit(1).toArray()[0]
find returns a cursor that you need to iterate over to retrieve the actual documents. Calling toArray on the cursor converts it to an array of docs.
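For comparison, here is the same lookup from Python. This is only a sketch assuming PyMongo, where find_one accepts a sort argument and returns a document (or None) directly, so there is no cursor to unwrap. The helper name drive_date_range is made up for illustration, and it assumes every document has a driveDate field.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING

def drive_date_range(collection):
    """Return (smallest, largest) driveDate, or (None, None) if empty.

    `collection` is assumed to be a pymongo Collection (e.g. db.drives).
    """
    projection = {"_id": 0, "driveDate": 1}
    first = collection.find_one({}, projection, sort=[("driveDate", ASCENDING)])
    last = collection.find_one({}, projection, sort=[("driveDate", DESCENDING)])
    if first is None:
        # The collection has no documents at all.
        return None, None
    return first["driveDate"], last["driveDate"]
```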
After some time figuring it out, I got the solution. Here it is, for future reference:
var cursor = db.drives.find({},{"_id":0,"driveDate":1}).sort({"driveDate":1}).limit(1)
Then I can get the date from the cursor like this:
var myDate = cursor.next().driveDate
That's it. Thanks for your help
I want to call a custom python function on some existing attribute of every document in the entire collection and store the result as a new key-value pair in that (same) document. May I know if there's any way to do that (since each call is independent of others) ?
I noticed cursor.forEach but can't it be done just using python efficiently ?
A simple example would be to split the string in text and store the no. of words as a new attribute.
def split_count(text):
    # some complex preprocessing...
    return len(text.split())

# Need something like this...
db.collection.update_many({}, {'$set': {"split": split_count('$text') }}, upsert=True)
But it seems like setting a new attribute in a document based on the value of another attribute in the same document is not possible this way yet. This post is old but the issues seem to be still open.
I found a way to call any custom python function on a collection using parallel_scan in PyMongo.
import threading

def process_text(cursor):
    for row in cursor.batch_size(200):
        # Any complex preprocessing here...
        split_text = row['text'].split()
        db.collection.update_one({'_id': row['_id']},
                                 {'$set': {'split_text': split_text,
                                           'num_words': len(split_text) }},
                                 upsert=True)

def preprocess(num_threads=4):
    # Get up to max 'num_threads' cursors.
    # (parallel_scan exists in PyMongo 3.x; it was removed in PyMongo 4.0)
    cursors = db.collection.parallel_scan(num_threads)
    threads = [threading.Thread(target=process_text, args=(cursor,)) for cursor in cursors]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
This is not really faster than cursor.forEach (but not that slow either), but it helps me execute any arbitrarily complex python code and save the results from within Python itself.
Also if I have an array of ints in one of the attributes, doing cursor.forEach converts them to floats which I don't want. So I preferred this way.
But I would be glad to know if there're any better ways than this :)
It is quite unlikely that it will ever be efficient to do this kind of thing in python. This is because the document would have to make a round trip and go through the python function on the client machine.
In your example code, you are passing the result of a function to a mongodb update query, which won't work. You can't run any python code inside mongodb queries on the db server.
As the answer to your linked question suggests, this type of action has to be performed in the mongo shell, e.g.:
db.collection.find().snapshot().forEach(
    function (elem) {
        var splitLength = elem.text.split(" ").length;
        db.collection.update(
            {
                _id: elem._id
            },
            {
                $set: {
                    split: splitLength
                }
            }
        );
    }
);
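If the processing has to stay in Python, the round trips can at least be batched. A rough sketch, assuming PyMongo 3 or later: the generator below only builds (filter, update) pairs, and the commented usage wraps them in pymongo.UpdateOne for bulk_write. The names word_count_updates and chunked are hypothetical helpers, and every document is assumed to have a text field.

```python
from itertools import islice

def word_count_updates(docs):
    """Yield a (filter, update) pair per document; assumes a 'text' field."""
    for doc in docs:
        words = doc["text"].split()
        yield {"_id": doc["_id"]}, {"$set": {"split": len(words)}}

def chunked(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

# Usage against a real collection (assumes `db` is a pymongo Database):
#   from pymongo import UpdateOne
#   cursor = db.collection.find({}, {"text": 1})
#   for batch in chunked(word_count_updates(cursor), 1000):
#       db.collection.bulk_write([UpdateOne(f, u) for f, u in batch], ordered=False)
```

Each document still travels to the client once, but the writes go back in groups instead of one request per document.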
I'm new to Meteor. I've been stuck on this problem for a while. I can successfully add items to a collection and look at them fully in the console. However, I cannot access all of the read operations in my .js file.
That is, I can use .find() and .findOne() with empty parameters. But when I try to add .sort or an argument I get an error telling me the object is undefined.
Autopublish is turned on, so I'm not sure what the problem is. These calls are being made directly in the client.
This returns something--
Template.showcards.events({
    "click .play-card": function () {
        alert(Rounds.find());
    }
})
And this returns nothing--
Template.showcards.events({
    "click .play-card": function () {
        alert(Rounds.find().sort({player1: -1}));
    }
})
Sorry for the newbie question. Thanks in advance.
Meteor's collection API works a bit differently from the mongo shell's API, which is understandably confusing for new users. You'll need to do this:
Template.showcards.events({
    'click .play-card': function() {
        var sortedCards = Rounds.find({}, {sort: {player1: -1}}).fetch();
        console.log(sortedCards);
    }
});
See this for more details. Also note that logging a cursor (the result of a find) probably isn't what you want. If you want to see the contents of the documents, you need to fetch them.
Rounds.find() returns a cursor, and Meteor cursors have no .sort() method, so pass the sort as an option and fetch the results:
Rounds.find({}, {sort: {player1: -1}}).fetch();
Note that this returns an Array of document objects. So you would do something more like this:
var docs = Rounds.find({}, {sort: {player1: -1}}).fetch();
alert(docs[0]);
For example suppose I insert data as follows
doc1 = [{url: 'http://domain.com/pic1.jpg'}, {url: 'http://domain.com/pic2.jpg'}]
doc2 = [{url: 'http://domain.com/pic3.jpg'}, {url: 'http://domain.com/pic4.jpg'}]
db.picture.insert(doc1)
db.picture.insert(doc2)
How could I replace all 'http' with 'https'?
MongoDB does not have in-built support for search and replace of a portion of a string. You could write a program in your favourite scripting language to do this.
You can use regular expression searching to get back all the URLs that start with "http:":
db.picture.find({url: /^http:/})
You could do that in your program to get the data, then modify it in your program, and update or replace the documents with the new values.
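As a rough illustration of that read-modify-write loop, here is a sketch in Python assuming PyMongo and documents shaped like the ones in the question (a top-level url string). The function names to_https and upgrade_urls are hypothetical.

```python
def to_https(url):
    """Return url with a leading 'http:' scheme swapped for 'https:'."""
    if url.startswith("http:"):
        return "https:" + url[len("http:"):]
    return url

def upgrade_urls(collection):
    """Rewrite every url that starts with 'http:'; return how many changed.

    `collection` is assumed to be a pymongo Collection (e.g. db.picture).
    """
    changed = 0
    # Same regular-expression filter as the shell query above.
    for doc in collection.find({"url": {"$regex": "^http:"}}, {"url": 1}):
        collection.update_one({"_id": doc["_id"]},
                              {"$set": {"url": to_https(doc["url"])}})
        changed += 1
    return changed
```

Only the scheme prefix is touched, so an "http" that appears later in the URL is left alone.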
I'm trying to compare all documents between 2 collections; the comparison should return true if and only if all documents in the 2 collections are exactly equal.
I've been searching for a method on the collection, but couldn't find one that does this.
I experimented with something like this in the mongo shell, but it's not working as I expected:
db.test1 == db.test2
or
db.test1.to_json() == db.test2.to_json()
Please share your thoughts ! Thank you.
You can try using mongodb eval combined with your custom equals function, something like this.
Your methods don't work because in the first case you are comparing object references, which are not the same. In the second case, there is no guarantee that to_json will generate the same string even for the objects that are the same.
Instead, try something like this:
var compareCollections = function(){
    db.test1.find().forEach(function(obj1){
        db.test2.find({/*if you know some properties, you can put them here...if don't, leave this empty*/}).forEach(function(obj2){
            var equals = function(o1, o2){
                // here goes some compare code...modified from the SO link you have in the answer.
            };
            if(equals(obj1, obj2)){
                // Do what you want to do
            }
        });
    });
};
db.eval(compareCollections);
With db.eval you ensure that code will be executed on the database server side, without fetching collections to the client.
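If fetching the documents to the client is acceptable, a client-side comparison is a simpler alternative. The sketch below is in Python and is not the db.eval approach: it walks two iterables of documents in lockstep and stops at the first difference. The name same_documents is hypothetical, both inputs are assumed to be sorted the same way (for example by _id), and Python dict equality ignores key order, which may be looser than "exactly equal" for some uses.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel for "one side ran out of documents"

def same_documents(docs_a, docs_b):
    """True if both iterables yield equal documents in the same order."""
    for a, b in zip_longest(docs_a, docs_b, fillvalue=_MISSING):
        if a is _MISSING or b is _MISSING or a != b:
            return False
    return True

# Usage with PyMongo (assumes `db` is a pymongo Database):
#   same_documents(db.test1.find().sort("_id", 1),
#                  db.test2.find().sort("_id", 1))
```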