I'm handling my user registration logic with MongoDB. I need to insert a user if one does not exist, but I also need to know whether the user already existed before the insert, so I can notify the user that they have already registered.
The update method with upsert does not report how many documents were inserted, and the findAndModify method only finds documents after the insert. Either way, I cannot tell whether such a document already existed before I inserted.
Is there a way to do this?
Update
update and findAndModify are not good examples. I do not want to update my document if the user already exists. I just want to know whether the username exists before inserting. If it does not, then insert it.
I'm not using _id with insert. Should I use username as _id and use it to insert?
Use a unique key (on e-mail address or whatever identifies a user) and check for corresponding error code when trying to insert. That's the way. Not just for MongoDB.
db.users.createIndex({email: 1}, {unique: true})
Now when inserting a duplicate e-mail, check for error codes 11000 and 11001.
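As a minimal sketch of that check in Python: the helper names is_duplicate_key_error and register_user are hypothetical, and users is assumed to be a PyMongo-style collection that already has the unique index on email. With the driver imported you would normally catch pymongo.errors.DuplicateKeyError directly; this sketch inspects the exception's code attribute instead, so it does not depend on the driver being importable.

```python
DUPLICATE_KEY_CODES = {11000, 11001}


def is_duplicate_key_error(exc):
    """Return True if exc carries one of MongoDB's duplicate key codes."""
    return getattr(exc, "code", None) in DUPLICATE_KEY_CODES


def register_user(users, email, name):
    """Insert a user; return True if created, False if the e-mail was taken."""
    try:
        users.insert_one({"email": email, "name": name})
    except Exception as exc:
        if is_duplicate_key_error(exc):
            return False  # the unique index rejected the insert
        raise  # anything else is a real failure
    return True
```

A False return is the signal to tell the user they have already registered; no prior find is needed.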
If you're willing to use _id as your userid, you can use the save() command.
This will create a new record if there's not one already, otherwise will update the existing record.
It's probably better to check for the user and then selectively update, but this is a down and dirty way of doing things:
http://docs.mongodb.org/manual/reference/method/db.collection.save/
I will be listing my full database on /bots, and I want to use id instead of _id (Discord uses id for everything, so I'm accustomed to id, not _id).
I don't even want to save id as _id in the database. So any idea what to do?
This is not possible!
MongoDB automatically creates an _id for every document that is inserted without one.
It serves as a unique value that you can use to identify each document.
The default ObjectId also embeds the time at which it was generated, and _id is always indexed, so lookups and sorts on _id are efficient.
It is also good practice to send the _id to the client (even if it is mapped to an id field), so that you can query more efficiently and avoid exposing users' Discord IDs to everyone.
Hope I could answer your question.
You could read more about it here:
https://docs.mongodb.com/manual/core/document/#the-id-field
How to remove _id in MongoDB and replace with another field as a Primary Key?
I'm trying to understand the best way to do this.
I am getting the name and email, and I want to add them to my collection.
However, if the email already exists, then I don't want to insert the name and email. Is there a way to do this using upsert? I'm trying to understand it from the documentation, but it's a bit confusing for me: http://docs.mongodb.org/manual/reference/method/db.collection.update/ Any help is greatly appreciated.
First of all, you should consider creating a unique index on the email field to ensure that there can be only one document for any particular email:
db.collection.createIndex({email: 1}, {unique: true})
You could also add the sparse option to allow documents without an email.
Then you'll have two options depending on your particular use case: to use upsert, or to use insert ignoring duplicate key errors.
Upsert
Using the following upsert operation
db.collection.update({email: email}, {$set: {name: name}}, {upsert: true})
you will:
create new document if there is no such email yet;
update existing document with new name if the email already exists.
Here is a quotation from the MongoDB documentation explaining upsert behavior when no document matches the query criteria:
The update creates a base document from the equality clauses in the <query> parameter, and then applies the update expressions from the <update> parameter.
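If the goal is to leave an existing document untouched and still learn whether an insert happened, one possible variant (not part of the answer above) is to upsert with $setOnInsert, which is applied only when the upsert actually inserts. A hedged PyMongo-style sketch follows; insert_if_absent is a hypothetical helper name, and users is assumed to be a collection object exposing update_one.

```python
def insert_if_absent(users, email, name):
    """Create the document only when no document matches the e-mail.

    Returns True if a new document was inserted, False if one already
    existed (in which case it is left unchanged).
    """
    result = users.update_one(
        {"email": email},                  # equality clause seeds the new document
        {"$setOnInsert": {"name": name}},  # applied only when the upsert inserts
        upsert=True,
    )
    # upserted_id is None when an existing document matched the filter
    return result.upserted_id is not None
```

Under concurrent requests the unique index is still what prevents duplicates; two racing upserts can otherwise both insert.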
Insert
If you don't want to update name field of an existing document, you should use basic insert operation instead:
db.collection.insert({email: email, name: name})
and ignore any E11000 duplicate key errors (error code 11000).
I have a collection of users with the following schema:
{
_id:ObjectId("123...."),
name:"user_name",
field1:"field1 value",
field2:"field2 value",
etc...
}
The users are looked up by user.name, which must be unique. When a new user is added, I first perform a search, and if no such user is found, I add the new user document to the collection. The search and the insert are not atomic, so when multiple application servers are connected to the DB server, two add_user requests with the same user name can arrive at the same time. Both searches find no such user, and the result is two documents with the same "user.name". In fact, this happened (due to a bug on the client) with just a single app server running Node.js and using the Async library.
I was thinking of using findAndModify, but that doesn't work: I'm not simply updating a field of a document that already exists (where I could use upsert); I want to insert a new document only if the search finds nothing. I can't make the query "not equal to user.name" either, since it would match other users.
First of all, you should maintain a unique index on the name field of the users collection. This can be specified in the schema if you are using Mongoose or by using the statement:
collection.createIndex('name', {unique: true}, callback);
This keeps the name field unique and solves the concurrent-request problem described in your question. With the index in place, you do not need to search first: insert the document and treat a duplicate key error as "user already exists".
When writing data to MongoDB, we check whether the data is already present. If it is, we get its _id and update it using save; otherwise we add the data using insert. I have read that save is the best approach when you provide an _id, since it will update or insert depending on whether that _id is present in the database. Is save the best method, or is there another way?
If you have all data available to save, just run update() each time but use the upsert functionality. Only one query required:
$collection->update(
    ['_id' => $id],
    $data,
    ['upsert' => true]
);
If your _id is generated by MongoDB, you always know there is a record in the database and update is the one to use, but then again you could also use save().
If you generate your own ids (and thus don't know whether a given id is already in the collection), the upsert will always work without having to run an extra query.
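For comparison, the same single-query idea might look like the Python sketch below. It assumes a PyMongo-style collection exposing insert_one and replace_one; save_document is a hypothetical helper name, not a driver method.

```python
def save_document(collection, doc):
    """Insert doc if it has no _id, otherwise upsert it by _id.

    Returns the _id of the stored document.
    """
    if "_id" not in doc:
        # No id supplied: let the driver/server generate one.
        return collection.insert_one(doc).inserted_id
    # Id supplied: replace the matching document, or insert if none matches.
    collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
    return doc["_id"]
```

Either branch is a single round trip, so no prior lookup is needed to decide between insert and update.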
From the documentation
db.collection.save()
Updates an existing document or inserts a new document, depending on its document parameter.
db.collection.insert()
Inserts a document or documents into a collection.
If you use db.collection.insert() in your case, you will get a duplicate key error, since it will try to insert a new document that has the same _id as an existing document. Instead of using save, though, you should use the update method.
I have a collection in which all of my documents have at least these two fields, say name and url (url is unique, so I set up a unique index on it). Now if I try to insert a document with a duplicate url, it gives an error and halts the program. I don't want this behavior; I need something like MySQL's INSERT IGNORE, so that MongoDB skips the document with the duplicate url and continues with the next documents.
Is there some parameter I can pass to the insert command to achieve this behavior? I generally do a batch of inserts using pymongo as:
collection.insert(document_array)
Here collection is a collection and document_array is an array of documents.
So is there some way I can implement the insert or ignore functionality for a multiple document insert?
Set the continue_on_error flag when calling insert(). Note PyMongo driver 2.1 and server version 1.9.1 are required:
continue_on_error (optional): If True, the database will not stop
processing a bulk insert if one fails (e.g. due to duplicate IDs).
This makes bulk insert behave similarly to a series of single inserts,
except lastError will be set if any insert fails, not just the last
one. If multiple errors occur, only the most recent will be reported
by error().
Use insert_many(), and set ordered=False.
This will ensure that all write operations are attempted, even if there are errors:
http://api.mongodb.org/python/current/api/pymongo/collection.html#pymongo.collection.Collection.insert_many
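With ordered=False, PyMongo reports failures by raising BulkWriteError after the writes have been attempted, and its details dictionary lists one entry per failed write. The sketch below shows one way to ignore only the duplicates. split_write_errors and insert_ignoring_duplicates are hypothetical helper names, and the exception is matched by its details attribute rather than by class so that the snippet does not need the driver imported.

```python
DUPLICATE_KEY = 11000


def split_write_errors(details):
    """Split BulkWriteError.details['writeErrors'] into (duplicates, others)."""
    duplicates, others = [], []
    for err in details.get("writeErrors", []):
        target = duplicates if err.get("code") == DUPLICATE_KEY else others
        target.append(err)
    return duplicates, others


def insert_ignoring_duplicates(collection, docs):
    """Insert docs, skipping duplicates; return the number inserted."""
    try:
        result = collection.insert_many(docs, ordered=False)
        return len(result.inserted_ids)
    except Exception as exc:
        details = getattr(exc, "details", None)
        if not isinstance(details, dict) or "writeErrors" not in details:
            raise  # not a bulk write failure
        _, others = split_write_errors(details)
        if others:
            raise  # something other than a duplicate key went wrong
        return details.get("nInserted", 0)
```

Each entry in writeErrors also carries an index field pointing back into docs, which is useful for logging which documents were skipped.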
Try this:
try:
    coll.insert(
        doc_or_docs=doc_array,
        continue_on_error=True)
except pymongo.errors.DuplicateKeyError:
    pass
The insert operation will still throw an exception if an error occurs in the insert (such as trying to insert a duplicate value for a unique index), but it will not affect the other items in the array. You can then swallow the error as shown above.
Why not just put your call to .insert() inside a try: ... except: block and continue if the insert fails?
In addition, you could also use a regular update() call with the upsert flag. Details here: http://www.mongodb.org/display/DOCS/Updating#Updating-update%28%29
If you have your array of documents already in memory in your python script, why not insert them by iterating through them, and simply catch the ones that fail on insertion due to the unique index?
import pymongo

for doc in docs:
    try:
        collection.insert(doc)
    except pymongo.errors.DuplicateKeyError:
        # The unique index on url rejected this document; report and move on
        print('Duplicate url %s' % doc)
Where collection is an instance of a collection created from your connection/database instances and docs is the array of dictionaries (documents) you would currently be passing to insert.
You could also decide what to do with the duplicate keys that violate your unique index within the except block.
It is highly recommended to use upsert
stat.update({'location': d['user']['location']},
            {'$inc': {'count': 1}}, upsert=True, safe=True)
Here stat is the collection. If the visitor's location is already present in the collection, count is incremented by one; otherwise a new document is created with count set to 1.
Here is the link for documentation http://www.mongodb.org/display/DOCS/Updating#Updating-UpsertswithModifiers
What I am doing:
Generate an array of the MongoDB ids I want to insert (a hash of some values in my case)
Remove the ids that already exist (I am using a Redis queue for performance, but you can query MongoDB instead)
Insert the cleaned data!
Redis is perfect for that, but you can use Memcached or a MySQL MEMORY table, according to your needs
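The three steps above can be sketched in Python as follows. The hashing scheme (SHA-1 over a JSON list of the chosen field values), the helper names, and the use of a plain set for the known ids are all assumptions for illustration; the set could be filled from Redis or from a MongoDB query on _id.

```python
import hashlib
import json


def make_id(doc, fields):
    """Derive a deterministic _id from the chosen fields of a document."""
    payload = json.dumps([doc[f] for f in fields], sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def drop_existing(docs, fields, existing_ids):
    """Attach hashed _ids and keep only documents not already known."""
    seen = set(existing_ids)
    fresh = []
    for doc in docs:
        doc_id = make_id(doc, fields)
        if doc_id in seen:
            continue  # already stored, or repeated within this batch
        seen.add(doc_id)
        fresh.append(dict(doc, _id=doc_id))
    return fresh
```

The filtered list can then be passed to a bulk insert. The check and the insert are not atomic, so a concurrent writer can still trigger a duplicate key error on _id, and that error should still be handled.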