Is it possible to run a "dummy" query to see how many documents _would_ be inserted - mongodb

I am using MongoDB to track unique views of a resource.
Every time a user views a specific resource for the first time, a new view is logged in the db.
If that same user views the same resource again, the unique compound index on the collection blocks the insert of the duplicate.
For bulk inserts, with { ordered: false }, Mongo allows the new views through and blocks the duplicates. The return value of the insert is an object with an insertedCount property, telling me how many docs made it past the unique index.
In some cases, I want to know how many docs would be inserted before running the query. Then, based on the dummy insertedCount, I would choose to run the query, or not.
Is there a way to test a query and have it do everything except actually inserting the docs?
I could solve this by running some JS server-side to get the answer I need, but I would prefer to let the db do those checks.
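One workaround, short of a real "dry run" feature, is to approximate the would-be insertedCount with a read before the write. A minimal sketch with the Node.js driver, assuming a views collection with a unique compound index on { userId: 1, resourceId: 1 } (collection and field names are illustrative, and the batch is assumed to contain no internal duplicates):

    // Estimate how many of these docs would survive the unique index.
    // NOT atomic: a concurrent writer can change the answer between this
    // count and the real insert.
    async function countWouldInsert(views, docs) {
      const existing = await views.countDocuments({
        $or: docs.map(d => ({ userId: d.userId, resourceId: d.resourceId }))
      });
      return docs.length - existing;
    }

    // Usage (inside an async function):
    const wouldInsert = await countWouldInsert(db.collection('views'), docs);
    if (wouldInsert > 0) {
      await db.collection('views').insertMany(docs, { ordered: false });
    }

Because the count and the insert are two separate operations, the estimate can go stale under concurrent writes, which is exactly why a server-side dry run would be nicer.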

Related

mongodb multiple documents insert or update by unique key

I would like to get a list of items from an external resource periodically and save them into a collection.
There are several possible solutions but they are not optimal, for example:
Delete the entire collection and save the new list of items
Get all items from the collection using "find({})" and use it to filter out existing items and save those that do not exist.
But a better solution would be to set a unique key and do a kind of "update or insert".
Right now, when saving an item whose unique key already exists, I get an error.
Is there a way to do this at all?
**Upsert won't do the job, since it updates all matched items with the same value, so it's really only suitable for a single document.
I have a feeling you can achieve what you want simply by using the "normal" insertMany with the ordered option set to false. The documentation states:
Note that one document was inserted: The first document of _id: 13
will insert successfully, but the second insert will fail. This will
also stop additional documents left in the queue from being inserted.
With ordered to false, the insert operation would continue with any
remaining documents.
So you will get "duplicate key" exceptions which, however, you can simply ignore in your case.
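A quick shell illustration of that behavior, using the same _id: 13 clash the documentation describes (the items collection is just a placeholder):

    db.items.insertMany(
      [
        { _id: 13, name: "a" },  // inserts fine
        { _id: 13, name: "b" },  // duplicate _id: produces a write error
        { _id: 14, name: "c" }   // still inserted, because ordered is false
      ],
      { ordered: false }
    );
    // Throws a BulkWriteError listing the duplicate, but afterwards the
    // collection contains _id 13 (the first one) and _id 14.

For your periodic sync this means you can fire the whole batch at the collection and ignore the duplicate-key failures (error code 11000); everything new still gets in.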

When are mongodb indexes updated?

Question
Are mongodb indexes updated before the success of a write operation is reported to the application or do index updates run in the background? If they run in the background: is there a way to wait for an index update to complete?
Background
I have a document

    person1obj = {
      email: 'user@domain.tld',
      [...]
    }

in a people collection where a unique index is applied to the email field. Now I'd like to insert another document

    person2obj = {
      email: 'user@domain.tld',
      [...]
    }
Obviously, I have to change the email field of person1 before person2 can be inserted. With Mongoose, the code looks like:
    mongoose.model('Person').create(person1obj, function (err, person1) {
      // person1 has been saved to the db and 'user@domain.tld' is
      // added to the *unique* email field index

      // change email for person1 and save
      person1.email = 'otheruser@domain.tld';
      person1.save(function (err, person1) {
        // person1 has been updated in the db
        // QUESTION: is it guaranteed that 'user@domain.tld' has been
        // removed from the index?
        // inserting person2 could fail if the index has not yet been updated
        mongoose.model('Person').create(person2obj, function (err, person2) {
          // ...
        });
      });
    });
I have seen my unit tests fail at random with the error E11000 duplicate key error index, which made me wonder whether index updates run in the background.
This question is probably related to MongoDB's write concern, but I couldn't find any documentation on the actual process for index updates.
From the FAQ (emphasis mine):
How do write operations affect indexes?
Any write operation that alters an indexed field requires an update to the index in addition to the document itself. If you update a document that causes the document to grow beyond the allotted record size, then MongoDB must update all indexes that include this document as part of the update operation.
Therefore, if your application is write-heavy, creating too many indexes might affect performance.
At the very least in the case of unique indexes, the indexing does not run in the background. This is evident from the fact that when you try to write a new document with a duplicate key in a field that is supposed to be unique, you get a duplicate key error.
If indexing were to happen asynchronously in the background, Mongo would not be able to tell whether the write actually succeeded. Thus the indexing must happen during the write sequence.
While I have no evidence for this (though Mongo is open source, so if you have enough time you can look it up), I believe that all indexing is done during the write sequence, even for non-unique indexes. It wouldn't make sense to have special logic only for writes that affect a unique index.
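You can see the synchronous behavior directly in the shell (the people collection here is just for illustration):

    db.people.ensureIndex({ email: 1 }, { unique: true });
    db.people.insert({ email: 'a@example.com' });  // succeeds
    db.people.insert({ email: 'a@example.com' });  // fails immediately with E11000

The second insert is rejected as part of the write itself, which is only possible if the index entry from the first insert was already in place by the time that write was acknowledged.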

How to insert new document only if it doesn't already exist in MongoDB

I have a collection of users with the following schema:
    {
      _id: ObjectId("123...."),
      name: "user_name",
      field1: "field1 value",
      field2: "field2 value",
      etc...
    }
The users are looked up by user.name, which must be unique. When a new user is added, I first perform a search, and if no such user is found, I add the new user document to the collection. However, searching for the user and adding a new user if none is found are not atomic as a pair. So, with multiple application servers connected to the DB server, it's possible for two add_user requests with the same user name to arrive at the same time, for neither to find an existing user, and therefore for two documents with the same user.name to be created. In fact this happened (due to a bug on the client) with just a single app server running NodeJS and using the Async library.
I was thinking of using findAndModify, but that doesn't work: I'm not simply updating a field (that may or may not exist) of a document that already exists, where I could use upsert; I want to insert a new document only if the search finds nothing. And I can't make the query "not equal to user.name", since that would match other users.
First of all, you should maintain a unique index on the name field of the users collection. This can be specified in the schema if you are using Mongoose or by using the statement:
collection.ensureIndex('name', {unique: true}, callback);
This will make sure that the name field remains unique and will solve the problem of concurrent requests that you describe in your question. With this index in place, you no longer need to search before inserting.
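A minimal Mongoose sketch of relying on the index instead of a prior search (11000 is MongoDB's duplicate-key error code; the model name is illustrative):

    var mongoose = require('mongoose');

    var userSchema = new mongoose.Schema({
      name: { type: String, unique: true }  // backed by the unique index
      // field1, field2, ...
    });
    var User = mongoose.model('User', userSchema);

    User.create({ name: 'user_name' }, function (err, user) {
      if (err && err.code === 11000) {
        // A concurrent request inserted this name first: treat it as
        // "user already exists" instead of creating a duplicate.
      }
      // ...
    });

Two racing add_user requests can both attempt the insert; the unique index guarantees that exactly one wins, and the loser gets the E11000 error to handle however it likes.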

is there any way to restore predefined schema to mongoDB?

I'm a beginner with MongoDB. I want to know: is there any way to load a predefined schema into MongoDB? (For example, the way Cassandra uses a .cql file for this purpose.)
If there is, please point me to some documentation about the structure of that file and how to restore it.
If there is not, how can I create an index only once, when I create a collection? I think it is wrong to create the index every time I call the insert method or run my program.
p.s: I have a multi-threaded program in which every thread inserts into and updates my Mongo collection. I want to create the index only once.
Thanks.
To create an index on a collection you use the ensureIndex command. You only need to call it once to create the index.
If you call ensureIndex repeatedly with the same arguments, only the first call will create the index; all subsequent calls have no effect.
So if you know what indexes you're going to use for your database, you can create a script that will call that command.
An example insert_index.js file that creates two indexes, for the collA and collB collections:
db.collA.ensureIndex({ a : 1});
db.collB.ensureIndex({ b : -1});
You can call it from a shell like this:
mongo --quiet localhost/dbName insert_index.js
This will create those indexes on a database named dbName on your localhost. It's worth noting that if your database and/or collections do not yet exist, this will create both the database and the collections for which you're adding the indexes.
Edit
To clarify a little bit: MongoDB is schemaless, so you can't restore its schema.
You can only create indexes and collections (by using the createCollection helper).
MongoDB is basically schemaless so there is no definition of a schema or namespaces to be restored.
In the case of indexes, these can be created at any time. The collection does not need to exist yet, nor do the indexed fields need to be present; all of this is sorted out as collections are created and as documents matching the defined fields are inserted.
Commands to create an index are generally the same with each implementation language, for example:
db.collection.ensureIndex({ a: 1, b: -1 })
This will define the index on the target collection in the target database, referencing field "a" and field "b", the latter in descending order. It will work even if the collection, or indeed the database, does not yet exist, establishing a blank namespace in that case.
Subsequent calls to the same index creation method do not actually re-create the index. Where the same index is specified to one that already exists it is effectively skipped as a "no-operation".
As such, you can simply run all your required index creation statements at application startup; anything not already present will be created, and anything that already exists will be left alone.
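In application code the same idea becomes a one-time call at startup. A sketch with the Node.js driver, where createIndex is the modern name for ensureIndex and the collection names mirror the earlier example:

    // Run once at application startup, before any worker threads write.
    // Re-running on every startup is harmless: an identical existing
    // index makes each call a no-op.
    async function ensureIndexes(db) {
      await db.collection('collA').createIndex({ a: 1 });
      await db.collection('collB').createIndex({ b: -1 });
    }

For the multi-threaded program in the question: call this once during startup rather than from each thread, and the indexes will exist before any inserts happen.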

MongoDB - Combine filter with default filter

I'm hoping to do a very common thing: never delete the items that I store, but instead just mark them with a deleted flag. However, for almost every request I will now have to specify deleted: false. Is there a way to have a "default" filter that I can add to, such that I can construct a live_items filter and run queries on top of it?
This was just one guess at a potential answer; in general, I'd just like deleted: false to be the default filter.
Thanks!
In SQL you would do this with a view, but MongoDB didn't support views at the time of writing (read-only views only arrived later, in MongoDB 3.4).
But when queries that exclude deleted items are far more frequent than those that include them, you could remove the deleted items from the main items collection and put them in a separate items_deleted collection. This also has the nice side effect that the performance of the active-items collection doesn't get impaired by a large number of deleted items. The downside is that indexes can't be guaranteed to be unique across both collections.
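A shell sketch of that move-on-delete approach (names are illustrative, and the two statements are not atomic as a pair):

    // "Delete" an item by moving it into items_deleted.
    var doc = db.items.findAndModify({ query: { _id: itemId }, remove: true });
    if (doc !== null) {
      db.items_deleted.insert(doc);
    }
    // Caveat: if the process dies between the two statements, the document
    // is gone from both collections.

Queries against db.items then need no deleted flag at all, which is the whole point of the split.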
I went ahead and just made a Python function that merges the default filter into the specified query:
    def find_items(filt, single=False, live=True):
        # By default, restrict results to items that are not soft-deleted.
        # (dict(filt, deleted=False) merges the flag into a copy of filt and,
        # unlike the dict(filt.items() + live.items()) idiom, also works on
        # Python 3.)
        if live:
            filt = dict(filt, deleted=False)
        if single:
            return db.Item.find_one(filt)
        else:
            return db.Item.find(filt)