MongoDB: Unique Key in Embedded Document

Is it possible to set a unique key for a key in an embedded document?
I have a Users collection with the following sample documents:
{
    Name: "Bob",
    Items: [
        { Name: "Milk" },
        { Name: "Bread" }
    ]
},
{
    Name: "Jim"
}
Is there a way to create an index on the property Items.Name?
I got the following error when I tried to create an index:
> db.Users.ensureIndex({"Items.Name": 1}, {unique: true});
E11000 duplicate key error index: GroceryGuruApp.Users.$Items.Name_1 dup key: { : null }
Any suggestions? Thank you!

Unique indexes exist only at the collection level. To enforce uniqueness and other constraints within a single document you must do it in client code. (Virtual collections would probably allow this; you could vote for that feature.)
What you are trying to do is create an index on the key Items.Name. A document like Jim's has no Items array at all, so the key resolves to null for indexing purposes, and a second document with a null entry violates the unique constraint across the collection.
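One common client-driven pattern (a sketch against the sample Users collection above) is to make the write itself guard against duplicates: the update matches nothing when the name is already present, so the push is skipped atomically.
// Only push "Milk" if Bob doesn't already have an item with that name.
// The $ne condition makes the update match zero documents when the name
// is already in the array, so no duplicate is ever pushed.
db.Users.update(
    { Name: "Bob", "Items.Name": { $ne: "Milk" } },
    { $push: { Items: { Name: "Milk" } } }
);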

You can create a unique compound sparse index to accomplish something like what you are hoping for. It may not be the best option (client side still might be better), but it can do what you're asking depending on specific requirements.
To do it, you'll need to create another field on the same level as Name: Bob that is unique to each top-level record (could do FirstName + LastName + Address, we'll call this key Identifier).
Then create an index like this:
db.Users.ensureIndex({ "Identifier": 1, "Items.Name": 1 }, { unique: true, sparse: true })
A sparse index will ignore documents that don't have the field, so that should get around your null-key issue. Combining your unique Identifier and Items.Name as a compound unique index should ensure that you can't have the same item name twice per person.
Although I should add that I've only been working with Mongo for a couple of months, so take this with a grain of salt; it is based on observed behavior rather than on anything in the documentation.
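For illustration, a minimal sketch of the idea (the Identifier values here are made up for the example):
db.Users.ensureIndex({ "Identifier": 1, "Items.Name": 1 }, { unique: true, sparse: true });

db.Users.insert({ Name: "Bob", Identifier: "bob-1", Items: [{ Name: "Milk" }] });
// A second document with the same Identifier and item name is rejected:
db.Users.insert({ Name: "Bob", Identifier: "bob-1", Items: [{ Name: "Milk" }] }); // E11000
// Caveat: unique constraints apply across separate documents, so pushing a
// duplicate name into one existing document's Items array is not rejected.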
More on MongoDB Indexes
Compound Keys Indexes
Sparse Indexes

An alternative would be to model the items as a hash with the item name as the key.
Items: { "Milk": 1, "Bread": 1 }
I'm not sure whether you're trying to use the index for performance or purely for the constraint. The right approach depends on your use cases and on whether atomic update operations are enough to keep your data consistent.
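A sketch of what updates look like under this model (reusing the field names from the question): because object keys are unique by construction, adding an item is naturally idempotent.
db.Users.update({ Name: "Bob" }, { $set: { "Items.Milk": 1 } });
db.Users.update({ Name: "Bob" }, { $set: { "Items.Bread": 1 } });
// Setting the same key again is a no-op, so uniqueness comes for free:
db.Users.update({ Name: "Bob" }, { $set: { "Items.Milk": 1 } });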

The index will be across all Users, and since you asked for 'unique', no user will be able to have two items with the same name AND no two users will be able to have an item with the same name.
Is that what you want?
Furthermore, it appears that it's objecting to two Users having a 'null' value for Items.Name. Jim clearly does; is there another record like that?
It would be unusual to require uniqueness on an indexed collection like this.
MongoDB does allow unique indexes where it indexes only the first of each value, see
http://www.mongodb.org/display/DOCS/Indexes#Indexes-DuplicateValues, but I suspect the real solution is to not require uniqueness in this case.
If you want to ensure uniqueness only within the Items for a single user you might want to try the $addToSet option. See http://www.mongodb.org/display/DOCS/Updating#Updating-%24addToSet
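A sketch of the $addToSet approach against the sample documents (note that $addToSet compares the whole subdocument, so it only helps when the embedded items are exactly identical):
db.Users.update(
    { Name: "Bob" },
    { $addToSet: { Items: { Name: "Milk" } } }
);
// Running the same update again leaves Bob's Items unchanged, since
// $addToSet only appends a subdocument that isn't already in the array.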

You can use findAndModify to create a sequence/counter function.
function getNextSequence(name) {
    var ret = db.counters.findAndModify({
        query: { _id: name },
        update: { $inc: { seq: 1 } },
        new: true,
        upsert: true
    });
    return ret.seq;
}
Then use it whenever a new id is needed...
db.users.insert({
    _id: getNextSequence("userid"),
    name: "Sarah C."
})
This is from http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/. Check it out.

Related

Custom MongoDB Object _id vs Compound index

So I need to create a lookup collection in MongoDB to verify uniqueness. The requirement is to check whether the same two values are being repeated or not. In SQL, I would do something like this:
SELECT count(id) FROM lookup WHERE key1 = 'value1' AND key2 = 'value2'
If the above query returns a count greater than zero, it means the combination is not unique. I have two solutions in mind but I am not sure which one is more scalable. There are 30M+ docs against which I need to create this mapping.
Solution1:
I create a collection of docs with compound index on key1 and key2
{
    _id: <MongoID>,
    key1: <value1>,
    key2: <value2>
}
Solution2:
I write application logic to create custom _id by concatenating value1 and value2
{
    _id: <value1>_<value2>
}
Personally, I feel the second one is more optimised as it has only a single index and the documents are smaller. But I am not sure if it is good practice to populate _id myself, as the values may not be completely random. What do you think?
Thanks in advance.
Update:
My database already has a lot of indexes which take up memory so I want to keep index size to as low as possible specially for collections which are only used to verify uniqueness.
I would suggest Solution 1, i.e. a compound index over the two separate properties key1 and key2:
db.yourCollection.ensureIndex( { "key1": 1, "key2": 1 }, { unique: true } )
You can still search by an individual field if required: a query on key1 alone can use the leftmost prefix of the compound index. If you make _id a concatenation of the key values, searching by an individual field becomes hard.
Document size is rarely the deciding factor when designing documents in MongoDB.
If in the near future you need to change the key values of a document, that is easy with separate fields (an _id cannot be updated; you would have to delete and re-insert). Keep this in mind if you reference these documents from other collections' documents.
In terms of scalability, the default _id index is roughly sequential, easily shardable, and you can let MongoDB manage it.
If you search with those keys, the compound index will be used; otherwise your other indexes serve the search.
If document size still matters more to you than search flexibility, a variant of Solution 2 is to make _id an embedded document rather than a concatenated string:
{ _id: { key1: <value1>, key2: <value2> } }
This way you can also query on _id.key1 specifically.
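For example, a quick sketch of what querying looks like with an embedded-document _id:
// An exact match on the whole _id uses the default _id index
// (field order must match the stored order):
db.yourCollection.find({ _id: { key1: "value1", key2: "value2" } });

// A dotted query on a sub-field also works, but it cannot use the
// _id index and will scan unless you add a separate index:
db.yourCollection.find({ "_id.key1": "value1" });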
Update:
Yes, if document size is your concern rather than maintainability, and if you are sure the key values of a document will not change later (or, if they might change, the document is not referenced from other collections), then you can use the embedded-document _id approach from above. Just use an object for the keys rather than joining them with an underscore _; you can then add more keys later if you ever need to.
I think Solution 2 is more suitable for your requirement. It is absolutely fine to generate the _id value yourself; many applications populate _id with a UUID. In your case, it makes sense to concatenate value1 and value2 for the _id value, assuming this collection is primarily used for verifying uniqueness (i.e. a kind of temporary table) or for lookups.
Solution 1 is more expensive as it requires an additional index. Again, it depends on whether you are going to use this collection for verifying uniqueness alone or for some other use case as well.
Please note that with Solution 1 you need to create the unique compound index so that inserts with duplicate value combinations are rejected.
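With Solution 2 the duplicate check is just the insert itself. A sketch ('lookup' is a placeholder collection name):
var res = db.lookup.insert({ _id: "value1_value2" });
if (res.hasWriteError()) {
    // E11000 duplicate key error: this combination already exists
}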

Mongo bulk insert and avoid duplicate values for multiple keys

I have two collections: items with 120,000 entries and itemHistories with more than 20 million entries. I periodically update all items and itemHistories by fetching an API that lists all history data for an item.
What I need to do is batch insert the history data into the collection while avoiding duplicates. The history API returns only date, info and item_id values.
Is it possible to batch insert in Mongo so that it doesn't add duplicates for the two values (date, item_id) combined? That is, if there already is an entry with the same date and item_id, don't add it. Essentially date acts as a unique key per item_id: duplicate date values are allowed in the collection, but only if the item_id differs between them.
One item can have close to a million entries so I don't think fetching the history from the collection and comparing it to the API response is going to be optimal.
My current idea was to add another key to the collection called hash, computed as md5(date, info, item_id), and make it a unique index. Suggestions?
After a little digging in the Mongoose and MongoDB documentation I found out that there is a thing called a unique compound index, which solves my problem and answers this question. Since I had never used indexes before, I didn't know such a thing was possible.
You can also enforce a unique constraint on compound indexes. If you use the unique constraint on a compound index, then MongoDB will enforce uniqueness on the combination of the index key values.
For example, to create a unique index on groupNumber, lastname, and firstname fields of the members collection, use the following operation in the mongo shell:
db.members.createIndex( { groupNumber: 1, lastname: 1, firstname: 1 }, { unique: true } )
Source: https://docs.mongodb.org/manual/core/index-unique/
In my case I can use this code below to avoid duplicates:
db.itemHistories.createIndex( { date: 1, item_id: 1 }, { unique: true } )
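With that index in place, the batch insert itself can skip duplicates by using an unordered insert, which keeps going past duplicate key errors instead of aborting the batch. A sketch (insertMany requires MongoDB 3.2+):
try {
    db.itemHistories.insertMany([
        { date: ISODate("2016-01-01T00:00:00Z"), info: "a", item_id: 1 },
        { date: ISODate("2016-01-01T00:00:00Z"), info: "b", item_id: 1 }, // duplicate key
        { date: ISODate("2016-01-02T00:00:00Z"), info: "c", item_id: 1 }
    ], { ordered: false }); // keep going when a duplicate is rejected
} catch (e) {
    // duplicates surface as E11000 write errors; the other rows are still inserted
}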

mongodb sharding, use multiple fields as the shard key?

I have documents with the following schema:
{
    idents: {
        list: ['foo', 'bar', ...],
        id: 123
    },
    ...
}
The field idents.list is an array of strings and always contains at least one element.
The field idents.id may or may not exist.
Over time more entries are added to idents.list, and at some point in the future the field idents.id may be set too.
These two fields are what uniquely identifies a document, so they are the natural candidates for a shard key.
Is it possible to use sharding with this schema?
UPDATE:
Documents are always queried via { 'idents.list': 'foo' } or { $or: [ { 'idents.list': 'foo' }, { 'idents.id': 42 } ] }.
Yes, you can do this. The documentation says:
Use a compound shard key that uses two or three values from all documents that provide the right mix of cardinality with scalable write operations and query isolation.
https://docs.mongodb.org/manual/tutorial/choose-a-shard-key/

Dealing with mongodb unique, sparse, compound indexes

Because MongoDB includes a document in a sparse compound index whenever the document contains one or more of the indexed fields, my unique sparse index is failing: one of the indexed fields is optional, and MongoDB coerces it to null for the purposes of the index.
I need database-level enforcement of uniqueness for the combination of this field and a few others, and having to manage this at the application level via some concatenated string worries me.
As an alternative, I considered setting the default value of the possibly-missing indexed field to 'null ' + anObjectId, because that would let me keep the index without causing errors. Does this seem like a sensible (although hacky) solution? Does anyone know of a better way to enforce database-level uniqueness on a compound index?
Edit: I was asked to elaborate on the actual problem domain a bit more, so here it goes.
We get large data feeds from our customers that we need to integrate into our database. These feeds include three different unique identifiers supplied by the customer, which we use for updating the versions we store in our database when the feeds refresh. I need to tie the uniqueness of these identifiers to the customer, because the same identifier could appear from multiple sources, and we want to allow that.
The document structure looks like this:
{
    "identifiers": {
        "identifierA": ...,
        "identifierB": ...,
        "identifierC": ...
    },
    "client": ...
}
Because each individual identifier is optional (at least one of the three is required), I need to uniquely index the combination of each identifier with the client (e.g. one index is the combination of client plus identifierA). However, each such index must only apply when the identifier exists, and this is not supported by MongoDB (see the hyperlink above).
I was considering the above solution, but I would like to hear if anyone else has solved this or has suggestions.
https://docs.mongodb.org/manual/core/index-partial/
As of MongoDB 3.2 you can create a partial index to support this as well.
db.users.createIndex(
    { name: 1, email: 1 },
    { unique: true, partialFilterExpression: { email: { $exists: true } } }
)
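Applied to the schema from the question, that would be one partial unique index per optional identifier. A sketch (the collection name feeds is assumed):
// Unique per client, but only for documents where identifierA exists:
db.feeds.createIndex(
    { client: 1, "identifiers.identifierA": 1 },
    { unique: true,
      partialFilterExpression: { "identifiers.identifierA": { $exists: true } } }
);
// ...and two more of the same shape for identifierB and identifierC.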
A sparse index avoids indexing a field that doesn't exist.
A unique index avoids documents being inserted that have the same field values.
Unfortunately, as of MongoDB 2.6.7, a sparse compound index (indexing two or more fields) still includes any document that has at least one of the indexed fields, so the unique constraint is enforced even against documents that are missing some of those fields.
Example:
db = db.getSiblingDB("test"); // switch to the test database
db.a.drop();
db.a.insert([
    {},
    { a: 1 },
    { b: 1 },
    { a: 1, b: 1 }
]);
db.a.ensureIndex({ a: 1, b: 1 }, { sparse: true, unique: true });
db.a.insert({ a: 1 }); // throws a duplicate key error, but we wanted this insert to be valid
However, it works as expected for a single index field with sparse and unique properties.
I feel like this is a bug that will get fixed in future releases.
Anyhow, here are two solutions to get around this problem.
1) Add a non-null hash field to each document that is only computed when all the required fields for checking the uniqueness are supplied.
Then create a sparse unique index on the hash field.
function createHashForUniqueCheck(obj) {
    if (obj.firstName && obj.id) {
        // hex_md5() is the MD5 helper built into the mongo shell
        return hex_md5(String(obj.firstName) + String(obj.id));
    }
    return null;
}
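The matching index for option 1 would then be a single-field sparse unique index on the hash. A sketch (the field name uniqueHash is made up); one subtlety is to omit the field entirely when the hash is null, because a sparse index skips missing fields but would still index an explicit null:
db.a.ensureIndex({ uniqueHash: 1 }, { sparse: true, unique: true });

var doc = { firstName: "Bob", id: 7 };
var hash = createHashForUniqueCheck(doc);
if (hash !== null) {
    doc.uniqueHash = hash; // only set when all required fields were present
}
db.a.insert(doc);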
2) On the application side, check for uniqueness before insertion into Mongodb. :-)
sparse index doc
The hash-field approach ended up being sufficient for this.

MongoDB: Unique and sparse compound indexes with sparse values

I'm trying to store the following link:
URL = {
    hostname: 'i.imgur.com',
    webid: 'qkELz.jpg'
}
I want a unique and sparse compound index on these two fields because:
A combination of hostname and webid should be unique.
webid will always be queried with hostname.
webid need not be globally unique.
A URL need not have a webid.
However, when I do this, I get the following error:
MongoError: E11000 duplicate key error index: db.urls.$hostname_1_webid_1 dup key: { : "imgur.com", : null }
I guess that in sparse compound indexes missing fields are still counted (as null), whereas in sparse single-field indexes they are not.
Any way out of this problem? For now I'm just going to index hostname and webid separately.
Keep in mind that MongoDB can only use one index per query (it won't combine two separate single-field indexes to make a query on both fields faster).
That said, if you want to try to check for uniqueness, you could do a query from the app before inserting (which only partially solves the problem, because there's a gap between when you query and when you insert).
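A sketch of that check-then-insert pattern, race window included:
// Not atomic: another writer can insert between the find and the insert,
// so this reduces duplicates but does not eliminate them.
var url = { hostname: 'i.imgur.com', webid: 'qkELz.jpg' };
if (db.urls.findOne({ hostname: url.hostname, webid: url.webid }) === null) {
    db.urls.insert(url);
}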
You might want to vote on this JIRA issue for filtered indexes, which will probably help your use case:
https://jira.mongodb.org/browse/SERVER-785