mongodb multiple documents insert or update by unique key - mongodb

I would like to get a list of items from an external resource periodically and save them into a collection.
There are several possible solutions but they are not optimal, for example:
Delete the entire collection and save the new list of items
Get all items from the collection using "find({})" and use it to filter out existing items and save those that do not exist.
But a better solution will be to set a unique key and just do kind of "update or insert".
Right now on saving items the unique key already exists I will get an error
is there a way to do it at all?
**upsert won't do the work since it's updating all items with the same value so it's actually good for a single document only

I have a feeling you can achieve what you want simply by using the "normal" insertMany with the ordered option set to false. The documentation states that
Note that one document was inserted: The first document of _id: 13
will insert successfully, but the second insert will fail. This will
also stop additional documents left in the queue from being inserted.
With ordered to false, the insert operation would continue with any
remaining documents.
So you will get "duplicate key" exceptions which, however, you can simply ignore in your case.

Related

Is it possible to run a "dummy" query to see how many documents _would_ be inserted

I am using MongoDB to track unique views of a resource.
Everytime a user views a specific resource for the first time, a new view is logged in the db.
If that same user views the same resource again, the unique compound index on the collection blocks the insert of the duplicate.
For bulk inserts, with { ordered: false }, Mongo allows the new views through and blocks the duplicates. The return value of the insert is an object with an insertedCount property, telling me how many docs made it past the unique index.
In some cases, I want to know how many docs would be inserted before running the query. Then, based on the dummy insertedCount, I would choose to run the query, or not.
Is there a way to test a query and have it do everything except actually inserting the docs?
I could solve this by running some js serverside to get the answer I need. But I would prefer to let the db do those checks

create unique id in mongodb from last inserted id using pymongo

Is there a way I can find the last inserted document and the field, i.e. _id or id such that I can increment and use when inserting a new document?
The issue is that I create my own id count, but I do not store this, now I've deleted records, I cannot seem to add new records because I am attempting to use the same id.
There is no way to check insertion order in MongoDB, because the database does not keep any metadata in the collections regading the documents.
If your _id field is generated server-side then you need to have a very good algorithm for this value in order to provide collision avoidance and uniqueness while at the same time following any sequential constraints that you might have.

Insert to top of collection in mongodb

Is it possible in MongoDB to insert a new document to the top of a collection? Normally, when creating new document, it is appended or added to the end of the collection.
Please shed some light
The order in which documents are returned from a collection is undefined behavior. Usually documents appear in the order they are inserted, but you can not rely on that. There are several circumstances which can cause the documents to get returned in a different order.
When you want documents in a specific order, you need to sort them explicitly when you search for them.

Mongodb query which avoid duplicates items with different _id

I'm working with dump of events provided by 3rd party side. And in this dump they have duplicates events, which is fully identical, but have different _id. So my goal is to provide unique only data, so I can't use duplicates event.
Is it possible to build the query which return me only documents with unique value?
Lets say, in each document i have EventId field, and I need all the events, with unique EventID
I can't change dump, have only read permission to it.
You can use db.collection.distinct() method http://docs.mongodb.org/manual/reference/method/db.collection.distinct/

insert or ignore multiple documents in mongoDB

I have a collection in which all of my documents have at least these 2 fields, say name and url (where url is unique so I set up a unique index on it). Now if I try to insert a document with a duplicate url, it will give an error and halt the program. I don't want this behavior, but I need something like mysql's insert or ignore, so that mongoDB should not insert the document with duplicate url and continue with the next documents.
Is there some parameter I can pass to the insert command to achieve this behavior? I generally do a batch of inserts using pymongo as:
collection.insert(document_array)
Here collection is a collection and document_array is an array of documents.
So is there some way I can implement the insert or ignore functionality for a multiple document insert?
Set the continue_on_error flag when calling insert(). Note PyMongo driver 2.1 and server version 1.9.1 are required:
continue_on_error (optional): If True, the database will not stop
processing a bulk insert if one fails (e.g. due to duplicate IDs).
This makes bulk insert behave similarly to a series of single inserts,
except lastError will be set if any insert fails, not just the last
one. If multiple errors occur, only the most recent will be reported
by error().
Use insert_many(), and set ordered=False.
This will ensure that all write operations are attempted, even if there are errors:
http://api.mongodb.org/python/current/api/pymongo/collection.html#pymongo.collection.Collection.insert_many
Try this:
try:
coll.insert(
doc_or_docs=doc_array,
continue_on_error=True)
except pymongo.errors.DuplicateKeyError:
pass
The insert operation will still throw an exception if an error occurs in the insert (such as trying to insert a duplicate value for a unique index), but it will not affect the other items in the array. You can then swallow the error as shown above.
Why not just put your call to .insert() inside a try: ... except: block and continue if the insert fails?
In addition, you could also use a regular update() call with the upsert flag. Details here: http://www.mongodb.org/display/DOCS/Updating#Updating-update%28%29
If you have your array of documents already in memory in your python script, why not insert them by iterating through them, and simply catch the ones that fail on insertion due to the unique index?
for doc in docs:
try:
collection.insert(doc)
except pymongo.errors.DuplicateKeyError:
print 'Duplicate url %s' % doc
Where collection is an instance of a collection created from your connection/database instances and docs is the array of dictionaries (documents) you would currently be passing to insert.
You could also decide what to do with the duplicate keys that violate your unique index within the except block.
It is highly recommended to use upsert
stat.update({'location': d['user']['location']}, \
{'$inc': {'count': 1}},upsert = True, safe = True)
Here stat is the collection if visitor location is already present in the collection, count is increased by one, else count is set to 1.
Here is the link for documentation http://www.mongodb.org/display/DOCS/Updating#Updating-UpsertswithModifiers
What I am doing :
Generate array of MongoDB ids I want to insert (hash of some values in my case)
Remove existing IDs (I am using a Redis queue bcoz performance, but you can query mongo)
Insert your cleaned data !
Redis is perfect for that, you can use Memcached or Mysql Memory, according your needs