Query Mongo Collection w/Compound Key - mongodb

I have a collection with a compound key:
db.stuff.insert( {"_id":{"aid":"123","brand":"acme"},"name":"Greg"} )
The compound key ensures uniqueness in a multi-tenant environment. For this application, it is fine that the field order within the key has to stay constant in BSON.
My question is this: Can I find all the 'stuff' with brand = "acme" (i.e. use one part of the compound key in a query)? If it's possible will it utilize the index?

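Sure. The default _id index only covers the whole _id value, so this will not use it unless you add a separate index on "_id.brand", but a simple find() with dot notation works: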
db.stuff.find({"_id.brand" : "acme"});
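On the question's remark about field order: an exact match on the whole _id compares the embedded document field by field, in order, so the application has to build the key the same way every time. A minimal sketch in plain JavaScript, where makeStuffId is a hypothetical helper and JSON.stringify only stands in for BSON serialization (both preserve field order):

```javascript
// Hypothetical helper: always build the compound _id with the same field order.
function makeStuffId(aid, brand) {
  return { aid: aid, brand: brand };
}

// JSON.stringify is used here as a stand-in for BSON; both keep field order.
const stored = JSON.stringify(makeStuffId("123", "acme"));
const reordered = JSON.stringify({ brand: "acme", aid: "123" });

console.log(stored === reordered); // false: same fields, different order
```

Routing every write and every exact-match lookup through one helper like this is one way to keep the order constant.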

Related

Custom MongoDB Object _id vs Compound index

So I need to create a lookup collection in MongoDB to verify uniqueness. The requirement is to check whether the same 2 values are being repeated or not. In SQL, I would do something like this:
SELECT COUNT(id) FROM <table> WHERE key1 = 'value1' AND key2 = 'value2'
If the above query returns a nonzero count, the combination is not unique. I have 2 solutions in mind but I am not sure which one is more scalable. There are 30M+ docs against which I need to create this mapping.
Solution1:
I create a collection of docs with compound index on key1 and key2
{
_id: <MongoID>,
key1: <value1>,
key2: <value2>
}
Solution2:
I write application logic to create custom _id by concatenating value1 and value2
{
_id: <value1>_<value2>
}
Personally, I feel the second one is more optimised as it only has a single index and the size of doc is also smaller. But I am not sure if it is a good practice to create my own _id indexes as they may not be completely random. What do you think?
Thanks in advance.
Update:
My database already has a lot of indexes which take up memory, so I want to keep index size as low as possible, especially for collections which are only used to verify uniqueness.
I would suggest Solution 1, i.e. use a unique compound index on two separate fields, key1 and key2:
db.yourCollection.createIndex( { "key1": 1, "key2": 1 }, { unique: true } )
You can still search by an individual field if required. Note that a compound index on { key1, key2 } supports queries on key1 alone (its leading prefix) but not on key2 alone, which would need its own index. If you build _id by combining the keys into one string, it will be hard to search by an individual field at all.
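The prefix rule for compound indexes can be sketched as a small check. This is a simplified model for plain equality queries, not how the query planner is actually implemented, and matchesIndexPrefix is a hypothetical name:

```javascript
// Simplified model: returns true when the queried fields are exactly
// a leading prefix of the compound index keys (in any query order).
function matchesIndexPrefix(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  if (wanted.size === 0 || wanted.size > indexFields.length) return false;
  return indexFields.slice(0, wanted.size).every((field) => wanted.has(field));
}

const index = ["key1", "key2"];
console.log(matchesIndexPrefix(index, ["key1"]));         // true
console.log(matchesIndexPrefix(index, ["key2", "key1"])); // true
console.log(matchesIndexPrefix(index, ["key2"]));         // false
```

In the shell, explain() on the real query is the way to confirm which index is actually chosen.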
Document size is usually the least of your concerns when designing a MongoDB document.
If you later need to change the key values of a document, that is easy with separate fields, whereas _id cannot be changed once the document is inserted. Keep this in mind if documents in other collections reference this document.
In terms of scalability, the default _id index would be sequential, easily shardable, and you can let MongoDB manage it.
If you search by those keys, the compound index will be used; otherwise your query will use whichever other indexes fit it.
If document size still matters more to you than search flexibility, you can instead put the keys into _id itself, like
{_id:{key1:<value1>,key2:<value2>}}
This way you can still query on _id.key1 specifically, although such a query cannot use the default _id index.
Update:
Yes, if document size is a bigger concern for you than maintainability. If you are sure the keys of a document will not change in the future, or if they may change but the document is not referenced from other collections, then you can use this approach. Just store the keys as an embedded object rather than joining them with an underscore. You can also add more keys later if needed.
I think Solution 2 is more suitable for your requirement. It is absolutely OK to generate the _id value yourself. Many applications populate _id with a UUID. In your case, it makes sense to concatenate value1 and value2 for the _id, assuming this collection is primarily used for verifying uniqueness (i.e. a kind of temporary table) or for lookups.
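One thing to watch with a concatenated _id: if the values can themselves contain the separator, two different pairs can produce the same string. A minimal sketch of the problem and one possible workaround; both function names are hypothetical, and encoding the pair as a JSON array is just one unambiguous option:

```javascript
// Naive concatenation as described in Solution 2.
function naiveLookupId(value1, value2) {
  return `${value1}_${value2}`;
}

// One unambiguous alternative: encode the pair as a JSON array string.
function safeLookupId(value1, value2) {
  return JSON.stringify([String(value1), String(value2)]);
}

// Two different pairs collide with the naive scheme...
console.log(naiveLookupId("a_b", "c") === naiveLookupId("a", "b_c")); // true
// ...but stay distinct with the encoded one.
console.log(safeLookupId("a_b", "c") === safeLookupId("a", "b_c"));   // false
```

If the values are known never to contain an underscore, the naive version is fine and gives a shorter key.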
Solution 1 is expensive as it requires an additional index. Again, it depends on whether you are going to use this collection only for verifying uniqueness or for some other use case as well.
Please note that with Solution 1 the compound index must be created as unique, so that inserting duplicate values is rejected.

How can I set multiple fields as primary key in MongoDB?

I am trying to create a collection with 50+ fields. I understand that the purpose of the primary key is to uniquely identify a record. Since the primary key in MongoDB is the _id that gets created automatically, isn't it obvious that all my records, including duplicates, would go into my DB with a unique _id for every record? Tell me where I'm going wrong. Other articles and discussions are more confusing.
How do I set one or more of the other fields as the primary key? I don't want the default _id as the primary key.
In what way are compound indexes different from a compound primary key?
There is no such notion as a primary key in MongoDB. Terminology matters. Not knowing the terminology is a sure sign someone hasn't read the docs or at least not carefully.
A document in a collection must have an _id field which may be and by default is an ObjectId. This field has an index on it which enforces a unique constraint, so there can not be any two documents with the same value or combination of values in the _id field. Which, by what you describe, presumably is what you want.
My suggestion is to reuse the default _id as often as you can. Additional indices are expensive (RAM-wise). You have two options here: either use a different single value as _id or use multiple values, if the cardinality of the single field isn't enough.
Let us assume you want to record a clickstream per user. Obviously, you need the unique user. But that alone would not be enough, since a user could then only have one entry. Since you need a timestamp for each click anyway, you move it into the _id field:
{
_id:{
user: "some user",
ts: new ISODate()
},
...
}
Unless your Mongo installation is sharded, you can create a unique compound index on multiple fields and use this as a surrogate composite primary key.
db.collection.createIndex( { a: 1, b: 1 }, { unique: true } )
Alternatively you could create your own _id values. However, as the default ObjectId is also a timestamp, personally I find it useful for auditing purposes.
Regarding the difference between a compound index and a composite primary key: by definition, a primary key cannot be defined on missing (null) fields and there can only be one primary key per document. In MongoDB only the _id field can be used as a primary key, as it is added by default when missing. In contrast, a compound index can be applied to missing fields by defining it as sparse, and you can define multiple compound indices on the same collection.
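On the remark above that the default ObjectId doubles as a timestamp: the first 4 bytes of an ObjectId hold the creation time in seconds since the Unix epoch. In the shell, ObjectId has a getTimestamp() method for this; the sketch below does the same by hand in plain JavaScript. The function name objectIdToDate is hypothetical and the id is just a sample value:

```javascript
// Extract the creation time from the first 4 bytes (8 hex characters)
// of an ObjectId given as a 24-character hex string.
function objectIdToDate(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdToDate("507f1f77bcf86cd799439011");
console.log(created.toISOString());
```

The resolution is one second, so this is good for rough auditing rather than precise ordering.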

Additional index in shard collection in MongoDB

I have a sharded collection with a hashed _id as the shard key.
Now I want to add an index on another field (called alias) which doesn't need to be unique, but it should be sparse if possible. This new index would serve only to speed up my query.
Should I create some kind of compound index for this, or is there another procedure, if this is possible in the first place?
In app I make only 2 types of queries :
db.myCollection.findOne({alias : 'some-alias'})
db.myCollection.findOne({_id : 'some-id'})
The second one works as expected (because my shard key is on the hashed _id field), but the first query is the problem.
Thanks,
Ivan

MongoDB: Unique and sparse compound indexes with sparse values

I'm trying to store the following link:
URL = {
hostname: 'i.imgur.com',
webid: 'qkELz.jpg'
}
I want a unique and sparse compound index on these two fields because:
A combination of hostname and webid should be unique.
webid will always be queried with hostname.
webid need not be globally unique.
A URL need not have a webid.
However, when I do this, I get the following error:
MongoError: E11000 duplicate key error index: db.urls.$hostname_1_webid_1 dup key: { : "imgur.com", : null }
I guess in the case of compound indexes, nulls are counted, whereas in regular indexes, they are not.
Any way out of this problem? For now I'm just going to index hostname and webid separately.
Keep in mind that mongodb can only use one index per query (it won't join indexes together to make a query on two fields that have separate indexes faster).
That said, if you want to try to check for uniqueness, you could do a query from the app before inserting (which only partially solves the problem, because there's a gap between when you query and when you insert).
You might want to vote on this JIRA issue for filtered indexes, which will probably help your use case:
https://jira.mongodb.org/browse/SERVER-785
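To see why the error in the question appears: as far as I understand the documented behavior, a sparse compound index skips a document only when none of the indexed fields is present, and a missing field is stored as null. The following is a simplified model of that in plain JavaScript, not MongoDB code; the function names are hypothetical:

```javascript
// Simplified model: a sparse compound index skips a document only when
// none of the indexed fields is present; missing fields become null.
function sparseCompoundKey(doc, fields) {
  if (!fields.some((field) => field in doc)) return null; // not indexed at all
  return JSON.stringify(fields.map((field) => (field in doc ? doc[field] : null)));
}

// Returns the index keys that a unique index would reject as duplicates.
function findDuplicateKeys(docs, fields) {
  const seen = new Set();
  const duplicates = [];
  for (const doc of docs) {
    const key = sparseCompoundKey(doc, fields);
    if (key === null) continue;
    if (seen.has(key)) duplicates.push(key);
    else seen.add(key);
  }
  return duplicates;
}

const urls = [
  { hostname: "i.imgur.com", webid: "qkELz.jpg" },
  { hostname: "imgur.com" }, // no webid
  { hostname: "imgur.com" }, // no webid again
];
console.log(findDuplicateKeys(urls, ["hostname", "webid"])); // [ '["imgur.com",null]' ]
```

Later MongoDB versions offer partial indexes (the partialFilterExpression option of createIndex), which may cover this case; check the documentation for your version.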

Add _id when ensuring index?

I am building a webapp using Codeigniter (PHP) and MongoDB.
I am creating indexes and have one question.
If I am querying on three fields (_id, status, type) and want to
create an index do I need to include _id when ensuring the index like this:
db.comments.ensureIndex({_id: 1, status : 1, type : 1});
or will this do?
db.comments.ensureIndex({status : 1, type : 1});
You would need to explicitly include _id in your ensureIndex call if you wanted to include it in your compound index. But because filtering by _id already provides selectivity of a single document that's very rarely the right thing to do. I think it would only make sense if your documents are very large and you're trying to use covered indexes.
MongoDB will currently only use one index per query with the exception of $or queries. If your common query will always be searching on those three fields (_id, status, type) then a compound index would be helpful.
From within the DB shell you can use the explain() command on your query to get information on the indexes used.
You don't need to explicitly create an index on the _id field; it's done automatically. See the mongo documentation:
The _id Index
For all collections except capped collections, an index is automatically created for the _id field. This index is special and cannot be deleted. The _id index enforces uniqueness for its keys (except for some situations with sharding).