I have a question about MongoDB compound shard keys. Suppose I have a document that is structured like this:
{
"players": [
{
"id": "12345",
"name": "John",
},
{
"id": "23415",
"name": "Doe",
}
]
}
The embedded player documents are always present, and there are always exactly two. I think that "players.0.id" and "players.1.id" should be a good choice of shard key because they are not monotonic and are evenly distributed.
What I can't understand from the documentation is if:
All documents with same "players.0.id" OR same "players.1.id" are supposed to be saved into the same Chunk, or
All documents with same "players.0.id" AND same "players.1.id" are supposed to be saved into the same Chunk.
In other words, if I query the collection to get all games played by John (as player 1 or player 2), will the query be sent to one chunk or to all chunks?
You cannot create a shard key where part of the key is a multikey index (i.e. index on an array field). This is mentioned in Shard Key Index Type:
A shard key index cannot be an index that specifies a multikey index, a text index or a geospatial index on the shard key fields.
If you have exactly two items under the players field, why not create two sub-documents instead of using an array? An array is typically useful for use cases where you have multiple items of indeterminate number in a document. For example, this structure might work for your use case:
{
"players": {
"player_1": {
"id" : 12345,
"name": "John"
},
"player_2": {
"id": 54321,
"name": "Doe"
}
}
}
You can then create an index like:
> db.test.createIndex({'players.player_1.id':1, 'players.player_2.id':1})
To answer your questions, if you're using this shard key, then:
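Documents with the same player_1.id AND the same player_2.id have an identical shard key value, so they land in the same chunk. Documents that share only one of the two values are not guaranteed to be on the same chunk. This will depend on your data distribution.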
If you query John as player_1 OR player_2, the query will be sent to all shards. This is because you have a compound index as the shard key, and you're searching for an exact match on the non-prefix field.
To elaborate on question 2:
The query you're doing is this:
db.test.find({$or: [
{'players.player_1.id':123},
{'players.player_2.id':123}
]})
In a compound index, the entries are sorted first by player_1.id; then, within each player_1.id value, they are sorted by player_2.id. For example, if you have 10 documents with some combination of values for player_1.id and player_2.id, you can visualize the index like this:
player_1.id | player_2.id
------------|-------------
0 | 10
0 | 123
1 | 100
1 | 123
2 | 123
2 | 150
123 | 10
123 | 100
123 | 123
123 | 150
Note that the value player_2.id: 123 occurs multiple times in the table, once for each player_1.id. Also note that for each player_1.id value, the player_2.id values are sorted within it.
This is how MongoDB's compound index works and how it's sorted. There are more nuances to compound indexes than can be explained here, but the details are covered in the Compound Indexes page.
The effect of this ordering is that identical player_2.id values are spread across the whole index. Since the overall index is sorted only in terms of player_1.id, it is not possible to locate an exact player_2.id without also specifying player_1.id. Hence, the above query will be sent to all shards.
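To make the ordering concrete, here is a minimal sketch in plain JavaScript (no MongoDB involved). It assumes the index can be modelled as an array of [player_1.id, player_2.id] pairs using the same ten values as the table above; the helper names positionsOf and isContiguous are made up for this illustration.

```javascript
// Model the compound index as [player_1.id, player_2.id] pairs (illustrative only).
const entries = [
  [123, 150], [0, 123], [2, 150], [1, 100], [123, 10],
  [0, 10], [2, 123], [123, 123], [1, 123], [123, 100],
];

// Sort by player_1.id first, then by player_2.id within each player_1.id.
const index = [...entries].sort((a, b) => a[0] - b[0] || a[1] - b[1]);

// Positions in the sorted index where the given field (0 or 1) equals value.
function positionsOf(field, value) {
  const positions = [];
  index.forEach((entry, i) => {
    if (entry[field] === value) positions.push(i);
  });
  return positions;
}

// True when the positions form one unbroken run, i.e. a single index range.
function isContiguous(positions) {
  return positions.every((p, i) => i === 0 || p === positions[i - 1] + 1);
}

const byPrefix = positionsOf(0, 123);    // matches on player_1.id
const byNonPrefix = positionsOf(1, 123); // matches on player_2.id

console.log(byPrefix, isContiguous(byPrefix));       // [ 6, 7, 8, 9 ] true
console.log(byNonPrefix, isContiguous(byNonPrefix)); // [ 1, 3, 4, 8 ] false
```

The matches on the leading field sit in one contiguous block, while the matches on the second field are scattered, which is why a lookup on player_2.id alone cannot be narrowed to a single range.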
I am working on optimising my queries in mongodb.
In a normal SQL query there is an order in which the WHERE clauses are applied. E.g. for select * from employees where department="dept1" and floor=2 and sex="male", first department="dept1" is applied, then floor=2, and lastly sex="male".
I was wondering whether it happens in a similar way in MongoDB.
E.g.
DBObject search = new BasicDBObject("department", "dept1").append("floor", 2).append("sex", "male");
Here, which match clause will be applied first, or does MongoDB in fact work in this manner at all?
This question basically arises from my background with SQL databases.
Please help.
If there are no indexes, MongoDB has to scan the full collection (a collection scan) to find the required documents. In your case, if you want the fields applied in the order [department, floor, sex], you should create this compound index:
db.employees.createIndex( { "department": 1, "floor": 1, "sex" : 1 } )
As the documentation says (https://docs.mongodb.org/manual/core/index-compound/):
db.products.createIndex( { "item": 1, "stock": 1 } )
The order of the fields in a compound index is very important. In the
previous example, the index will contain references to documents
sorted first by the values of the item field and, within each value of
the item field, sorted by values of the stock field.
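One practical consequence of that sort order is the prefix rule: the index helps most for the leading fields that the query constrains, regardless of the order in which the fields are written in the query document. The sketch below is a rough model of that rule only. The function name usablePrefixLength is hypothetical, and it assumes equality filters; it ignores ranges, sorts, and everything else the query planner takes into account.

```javascript
// Count how many leading index fields are constrained by the query.
// Simplified model: equality filters only, no ranges or sorts.
function usablePrefixLength(indexFields, query) {
  let n = 0;
  for (const field of indexFields) {
    if (!(field in query)) break; // the first gap ends the usable prefix
    n += 1;
  }
  return n;
}

// Field order of the compound index from the answer above.
const indexFields = ["department", "floor", "sex"];

// All three fields present: the whole index key is usable.
console.log(usablePrefixLength(indexFields, { department: "dept1", floor: 2, sex: "male" })); // 3

// Key order inside the query object is irrelevant; only the index order counts.
console.log(usablePrefixLength(indexFields, { sex: "male", department: "dept1" })); // 1

// Without the leading field there is no usable prefix at all.
console.log(usablePrefixLength(indexFields, { floor: 2, sex: "male" })); // 0
```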
I need help. Is there any method available to fetch documents within a range of indexes when using find in MongoDB, like [2:10] (from 2 to 10)?
If you are talking about the "index" position within an array in your document, then you want the $slice operator. The first argument is the index to start from and the second is how many items to return. Counting from 0, index position 2 is the "third" element:
db.collection.find({}, { "list": { "$slice": [ 2, 8 ] } })
Within the collection itself, use the .skip() and .limit() modifiers to move through a range of documents:
db.collection.find({}).skip(2).limit(8)
Keep in mind that in the collection context MongoDB has no concept of "ordered" records; the result order depends on the query and/or the sort order that is given.
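Both forms take a "skip, then limit" pair rather than a start and an end, so [2:10] becomes skip 2 and limit 8. A small plain-JavaScript sketch of that arithmetic follows; the helper windowOf is made up for illustration and assumes a non-negative skip.

```javascript
// Select `limit` items after skipping `skip` items,
// the same window that [skip, limit] describes.
function windowOf(items, skip, limit) {
  return items.slice(skip, skip + limit);
}

// Sample array holding 0..11 so each value equals its index position.
const list = Array.from({ length: 12 }, (_, i) => i);

const result = windowOf(list, 2, 8);
console.log(result); // [ 2, 3, 4, 5, 6, 7, 8, 9 ]
```

The window covers index positions 2 through 9, which is the half-open range [2:10].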
I am querying 3 collections in MongoDB and then creating a new document by taking some fields from the documents of the 3 separate collections. For example: I am taking field 'A' from first collection, field 'B' from second and field 'C' from third.
Using them I am creating a JSON document like
var uploadDoc = {
'A' : <value of A>,
'B' : <value of B>,
'C' : <value of C>,
}
This uploadDoc is being uploaded to another collection.
Question: I wish to upload only distinct values of uploadDoc. By default MongoDB gives each uploadDoc a unique id. How do I insert an uploadDoc into the collection only when another document with the same A, B and C values hasn't been inserted before?
I am using javascript to query the collections and create docs.
Two ways are simple:
use "upserts"
db.collection.update(uploadDoc,uploadDoc,{ "upsert": true })
Use a unique index
db.collection.ensureIndex({ "A": 1, "B": 1, "C": 1 },{ "unique": true });
db.collection.insert(uploadDoc); // Inserting the same values a second time fails with a duplicate key error
Both work. Choose one.
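To illustrate what the unique index on A, B and C enforces, here is a minimal in-memory sketch in plain JavaScript. It does not use MongoDB at all; the class name UniqueABC and its methods are hypothetical, and the real index rejects duplicates on the server rather than in application code.

```javascript
// In-memory stand-in for a collection with a unique index on { A, B, C }.
class UniqueABC {
  constructor() {
    this.seen = new Set(); // keys of documents already stored
    this.docs = [];
  }

  // Build one comparable key from the three indexed fields.
  static keyOf(doc) {
    return JSON.stringify([doc.A, doc.B, doc.C]);
  }

  // Store the document only if no stored document has the same A, B and C.
  insertIfNew(doc) {
    const key = UniqueABC.keyOf(doc);
    if (this.seen.has(key)) return false; // duplicate: rejected
    this.seen.add(key);
    this.docs.push(doc);
    return true;
  }
}

const target = new UniqueABC();
const first = target.insertIfNew({ A: 1, B: "x", C: true });  // new combination
const second = target.insertIfNew({ A: 1, B: "x", C: true }); // same A, B, C
const third = target.insertIfNew({ A: 1, B: "y", C: true });  // B differs

console.log(first, second, third, target.docs.length); // true false true 2
```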
You should use Unique Indexes: doc
You shouldn't use upsert without unique indexes:
To avoid inserting the same document more than once, only use upsert: true if the query field is uniquely indexed.
because
Consider when multiple clients issue the following update with an upsert parameter at the same time:
[cut]
If all update() operations complete the query portion before any client successfully inserts data, and there is no unique index on the name field, then each update operation may result in an insert.
from here
Let's say I have the following document schema:
{
_id: ObjectId(...),
name: "Kevin",
weight: 500,
hobby: "scala",
favoriteFood : "chicken",
pet: "parrot",
favoriteMovie : "Diehard"
}
If I create a compound index on name-weight, I will be able to specify a strict parameter (name == "Kevin"), and a range on weight (between 50 and 200). However, I would not be able to do the reverse: specify a weight and give a "range" of names.
Of course compound index order matters where a range query is involved.
If only exact queries will be used (example: name == "Kevin", weight == 100, hobby == "C++"), then does the order actually matter for compound indexes?
When you have an exact-match query on all of the indexed fields, the order should not matter. But when you want to be sure, the .explain() method on database cursors is your friend. It can tell you which indexes are used and how they are used when you perform a query in the mongo shell.
Important fields of the document returned by explain are:
indexOnly: when it's true, the query was completely covered by the index
n and nscanned: The first one tells you the number of found documents, the second how many documents or index entries had to be examined to find them. The latter shouldn't be notably higher than the first.
millis: number of milliseconds the query took to perform
I have a collection of photos. They can be referenced (one-to-many) from some other collections (events, news, posts, etc)
I can create reference like this:
db.photos.insert({ parent:{ collection: 'events', id: 12345 }})
db.photos.insert({ parent:{ collection: 'events', id: 54321 }})
//or just DBRef
db.photos.ensureIndex({parent:1})
OR
db.photos.insert({ post_id: 12345 })
db.photos.insert({ event_id: 54321 })
db.photos.ensureIndex({post_id:1}, {sparse: true})
db.photos.ensureIndex({event_id:1}, {sparse: true})
In the first case we have one big compound index.
In the second, some number of smaller indexes.
What are the pros and cons of each approach?
First, check how often each field is queried.
Second, create one compound index on the most frequently queried fields.
Third, create one compound index on the least frequently queried fields.
Note:
If a large number of fields are queried together, use a compound index.
In other cases, create single-field indexes.