MongoDB Hashed Sharding

If I choose {a:1, b:1, c:1} as my shard key with a hashed sharding strategy, and my query filters on {a:1}, is the query a targeted operation, or is it broadcast to every shard in the cluster?
If it is a targeted operation, how does MongoDB determine the target, given that the hash of {a:1} is completely different from the hash of {a:1, b:1, c:1}?

The short answer: it can be broadcast to more than one shard.
Look at it this way.
Let's assume you have the following collection:
//1
{
a: 1,
b: 1,
c: 1,
d: 1
},
//2
{
a: 1,
b: 1,
c: 1,
d: 2
},
//3
{
a: 1,
b: 1,
c: 2,
d: 5
}
According to your shard key, docs 1 and 2 must be in the same chunk (let's say, on shard number 1), because their shard-key values are identical, while doc 3 could be stored in a different chunk (let's say, on shard number 2).
Now, if you search for {a: 1}, all three docs should match, meaning that Mongo has to send the query to both shard no. 1 and shard no. 2.
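As for your second question: MongoDB never hashes the combination {a, b, c} as a whole. Before version 4.4 a hashed index could cover only a single field, so a compound hashed index was not possible at all; since 4.4 a compound index may contain one hashed field, e.g. {a: "hashed", b: 1, c: 1}, and only that field's value is hashed.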
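To see why a partial key forces a broadcast under hashing, here is a minimal Python sketch of the question's premise. It assumes the whole key {a, b, c} is hashed as one value (which is not how MongoDB hashes) and that the hash picks one of `n_shards` shards; `shard_for_key`, `target_shards` and the use of `hashlib.md5` are hypothetical stand-ins, not MongoDB's router logic.

```python
import hashlib

# Hypothetical model of the question's premise, NOT MongoDB's real hash
# function or chunk layout: the whole shard key (a, b, c) is hashed together
# and the hash picks one of n_shards shards.
SHARD_KEY = ("a", "b", "c")


def shard_for_key(values, n_shards):
    """Map a complete shard-key tuple to a shard number."""
    digest = hashlib.md5(repr(values).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % n_shards


def target_shards(query, n_shards, shard_key=SHARD_KEY):
    """Return the shards a router would have to contact for an equality query."""
    if all(field in query for field in shard_key):
        # Every shard-key field is pinned, so the hash can be computed.
        values = tuple(query[field] for field in shard_key)
        return [shard_for_key(values, n_shards)]
    # A missing field (e.g. only {a: 1}) means the hash is unknown,
    # so the query has to be broadcast to every shard.
    return list(range(n_shards))
```

Documents 1 and 2 share the key (1, 1, 1) and therefore land on the same shard, but a query on {a: 1} alone gets every shard back, i.e. a broadcast.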

Related

MongoDB linear combination of values for sorting

I'm using MongoDB (any version; I'd happily upgrade to make this work as I'm asking). I have a large collection (~50 million documents) with several numeric fields, e.g.:
{_id: 'doc1', a: 5, b: 2, c: 0},
{_id: 'doc2', a: 4, b: 9, c: 6},
{_id: 'doc3', a: 1, b: 7, c: 4},
{_id: 'doc4', a: 8, b: 1, c: 1},
...
I'd like to sort this collection by various linear combinations of a, b, and c (in reality there are about 20 fields I'd like to combine). For example, I'd like to sort by 3*a + 4*b + 10*c. The weights (3, 4 and 10 in my example) are something I'd like to experiment with rapidly.
I didn't see a simple way for indexes to support efficient sorting on linear combinations of fields. I know I can do this with the aggregation pipeline, but I think it will still require a collection scan for every new set of weights (?).
If I were implementing MongoDB, I can imagine that I could compute an index on the expression 3*a + 4*b + 10*c efficiently by using the indexes on a, b and c, i.e. I wouldn't need a full collection scan to build a new linearly weighted index. I'm not sure whether that's true theoretically, or whether it translates into anything I can do practically to solve this problem.
Any input is welcome!
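For the aggregation route mentioned in the question, here is a minimal Python sketch. It assumes PyMongo-style pipeline documents; `weighted_sort_pipeline` and `weighted_sort_in_memory` are hypothetical helpers, and `score` is an arbitrary output field name. The pipeline computes the weighted sum with `$addFields`, `$multiply` and `$add`, then sorts on it.

```python
def weighted_sort_pipeline(weights, score_field="score", descending=True):
    """Build an aggregation pipeline that sorts by sum(weight * field)."""
    # One {$multiply: [weight, "$field"]} term per weighted field.
    terms = [{"$multiply": [w, "$" + field]} for field, w in weights.items()]
    return [
        {"$addFields": {score_field: {"$add": terms}}},
        {"$sort": {score_field: -1 if descending else 1}},
    ]


def weighted_sort_in_memory(docs, weights, descending=True):
    """Same ordering computed in plain Python, handy for checking a pipeline."""
    def score(doc):
        return sum(w * doc[field] for field, w in weights.items())

    return sorted(docs, key=score, reverse=descending)
```

The pipeline would be run with `collection.aggregate(weighted_sort_pipeline({"a": 3, "b": 4, "c": 10}))`. Because `score` is computed per document, no index can serve this `$sort`, so each new set of weights does mean a full collection scan, as suspected in the question. Appending a `$limit` stage after `$sort` at least lets the server keep only the top documents in memory, and on a collection this size the sort may need `allowDiskUse=True`.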

Why does mongo choose this index in this situation? [duplicate]

To quote the docs:
When creating an index, the number associated with a key specifies the
direction of the index, so it should always be 1 (ascending) or -1
(descending). Direction doesn't matter for single key indexes or for
random access retrieval but is important if you are doing sorts or
range queries on compound indexes.
However, I see no reason why direction of the index should matter on compound indexes. Can someone please provide a further explanation (or an example)?
MongoDB concatenates the compound key in some way and uses it as the key in a BTree.
When finding single items - The order of the nodes in the tree is irrelevant.
If you are returning a range of nodes - The elements close to each other will be down the same branches of the tree. The closer the nodes are in the range the quicker they can be retrieved.
With a single field index - The order won't matter. If they are close together in ascending order they will also be close together in descending order.
When you have a compound key - The order starts to matter.
For example, if the key is A ascending B ascending the index might look something like this:
Row  A  B
1    1  1
2    2  6
3    2  7
4    3  4
5    3  5
6    3  6
7    5  1
A query sorted A ascending, B descending will need to jump around the index to return the rows and will be slower. For example, it will return rows 1, 3, 2, 6, 5, 4, 7.
A ranged query in the same order as the index will simply return the rows sequentially in the correct order.
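The row orders quoted above can be reproduced with a few lines of Python. This is a toy model: sorting tuples stands in for walking B-tree leaves in key order, and `INDEX_ROWS` and `rows_in_order` are hypothetical names holding the table's (Row, A, B) values.

```python
# (row number, A, B) in index order for the key {A: 1, B: 1}.
INDEX_ROWS = [(1, 1, 1), (2, 2, 6), (3, 2, 7), (4, 3, 4), (5, 3, 5), (6, 3, 6), (7, 5, 1)]


def rows_in_order(rows, a_desc=False, b_desc=False):
    """Return row numbers in the order a sort on (A, B) wants them."""
    def key(row):
        _, a, b = row
        # Negating a numeric field flips its direction.
        return (-a if a_desc else a, -b if b_desc else b)

    return [row[0] for row in sorted(rows, key=key)]
```

Both ascending gives rows 1 through 7 (one sequential forward walk) and both descending gives 7 through 1 (one sequential backward walk), while `rows_in_order(INDEX_ROWS, b_desc=True)` gives 1, 3, 2, 6, 5, 4, 7, jumping back and forth inside each group of equal A values.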
Finding a record in a B-tree takes $O(\log n)$ time. Finding a range of records in order takes only $O(\log n + k)$, where $k$ is the number of records to return.
If the records are out of order, the cost could be as high as $O(k \log n)$.
The simple answer that you are looking for is that the direction only matters when you are sorting on two or more fields.
If you are sorting on {a : 1, b : -1}:
Index {a : 1, b : 1} will be slower than index {a : 1, b : -1}
Why indexes
Understand two key points.
While an index is better than no index, the correct index is much better than either.
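MongoDB will generally use only one index per query (index intersection has existed since 2.6, but a well-ordered compound index is usually the better choice), making compound indexes with proper field ordering what you probably want to use.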
Indexes aren't free. They take memory, and impose a performance penalty when doing inserts, updates and deletes. Normally the performance hit is negligible (especially compared to gains in read performance), but that doesn't mean that we can't be smart about creating our indexes.
How Indexes
Identifying which group of fields should be indexed together is about understanding the queries that you are running. The order of the fields used to create your index is critical. The good news is that if you get the order wrong, the index typically won't be used at all (or will be used poorly), which is easy to spot with explain().
Why Sorting
Your queries might need sorting. But sorting can be an expensive operation, so it's important to treat the fields you sort on just like the fields you query on: the sort will be faster if it is backed by an index. There is one important difference, though: the field you are sorting on must be the last field in your index. The only exception to this rule is when the field is also part of your query; then the must-be-last rule doesn't apply.
How Sorting
You can specify a sort on all the keys of the index or on a subset; however, the sort keys must be listed in the same order as they appear in the index. For example, an index key pattern { a: 1, b: 1 } can support a sort on { a: 1, b: 1 } but not on { b: 1, a: 1 }.
The sort must specify the same sort direction (i.e. ascending/descending) for all its keys as the index key pattern or specify the reverse sort direction for all its keys as the index key pattern. For example, an index key pattern { a: 1, b: 1 } can support a sort on { a: 1, b: 1 } and { a: -1, b: -1 } but not on { a: -1, b: 1 }.
Suppose there are these indexes:
{ a: 1 }
{ a: 1, b: 1 }
{ a: 1, b: 1, c: 1 }
Example                                                   Index Used
db.data.find().sort( { a: 1 } )                           { a: 1 }
db.data.find().sort( { a: -1 } )                          { a: 1 }
db.data.find().sort( { a: 1, b: 1 } )                     { a: 1, b: 1 }
db.data.find().sort( { a: -1, b: -1 } )                   { a: 1, b: 1 }
db.data.find().sort( { a: 1, b: 1, c: 1 } )               { a: 1, b: 1, c: 1 }
db.data.find( { a: { $gt: 4 } } ).sort( { a: 1, b: 1 } )  { a: 1, b: 1 }
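The sort rules above can be expressed as a small Python check. This is a simplified model under stated assumptions, not MongoDB's query planner: `index_supports_sort` is a hypothetical name, key patterns are lists of (field, direction) pairs, and fields the query pins with an equality match are treated as constant and dropped from both the index and the sort, which generalizes the exception mentioned under "Why Sorting".

```python
def index_supports_sort(index, sort, equality_fields=()):
    """Simplified check: can scanning `index` forward or backward yield `sort` order?

    index, sort: lists of (field, direction) pairs, direction being 1 or -1.
    equality_fields: fields the query pins to one value (e.g. {a: 5}); they are
    constant inside the scanned range, so they drop out of both patterns.
    """
    pinned = set(equality_fields)
    idx = [(f, d) for f, d in index if f not in pinned]
    srt = [(f, d) for f, d in sort if f not in pinned]
    if not srt:
        return True  # every sort key is pinned, so any scan order satisfies it
    # The sort keys must be a prefix of the remaining index keys, in the same order.
    if [f for f, _ in srt] != [f for f, _ in idx[: len(srt)]]:
        return False
    pairs = list(zip(srt, idx))
    # Directions must all match (forward scan) or all be reversed (backward scan).
    forward = all(s_dir == i_dir for (_, s_dir), (_, i_dir) in pairs)
    backward = all(s_dir == -i_dir for (_, s_dir), (_, i_dir) in pairs)
    return forward or backward
```

For example, `index_supports_sort([("a", 1), ("b", 1)], [("a", -1), ("b", 1)])` returns False, while the same index with `equality_fields=["a"]` and a sort of `[("b", 1)]` returns True.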

Trying to do an upsert with a $push in the most efficient manner using pymongo

I have a search query: if it finds a match, I would like to push onto the matched document's 'vals' array; if it does not find a match, I would like to insert a document made of the search query plus the newVals array.
findDict = {'a': 100, 'b': 250, 'c': 110}
newVals = [{'x': 1, 'y': 2}, {'x': 4, 'y': 7}]
collection.update_one(findDict, {'$push': {'vals': newVals}}, upsert=True)
In the example above, if a match was found for findDict, then newVals would be pushed onto the existing vals array of the matching record.
If no match is found, I would like it to create a new record that looks like this:
{a: 100, b: 250, c: 110, vals: [{x: 1, y: 2}, {x: 4, y: 7}]}
I have to do this several million times, so I'm hoping to do it in the most optimal way. I also have many threads coming in and doing this at once so have to worry about concurrency. The update statement posted above almost seems to work, but it creates an entry like this for some reason if no match is found:
{a: 100, b: 250, c: 110, vals: [[{x: 1, y: 2}, {x: 4, y: 7}]]}
note the array inside the array...
I currently have a unique compound index on a, b, and c; this can be changed if it will help somehow. I think I could do an update with upsert set to False, followed by an insert that fails with a duplicate-key error if the document already exists... but then I would be doing each search twice and killing my efficiency.
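Have you tried using $push with $each? $each makes $push append each element of newVals individually instead of pushing the whole list as one nested array. (Collection.update was removed in PyMongo 4; update_one is the single-document replacement.)
collection.update_one(
    findDict,
    {'$push': {'vals': {'$each': newVals}}},
    upsert=True
)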
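Since the question mentions doing this several million times, sending the upserts in batches with `bulk_write` saves a network round trip per operation. Below is a minimal sketch, assuming PyMongo 3.x or later and the unique index on (a, b, c): `push_each_update` and `upsert_batch` are hypothetical helper names, while `UpdateOne` and `Collection.bulk_write` are PyMongo's own bulk-write API.

```python
def push_each_update(new_vals):
    """Update document that appends every element of new_vals to `vals`."""
    # $each makes $push append the elements one by one instead of
    # pushing the whole list as a single nested array.
    return {"$push": {"vals": {"$each": list(new_vals)}}}


def upsert_batch(collection, batch):
    """Send many (find_dict, new_vals) upserts to a pymongo Collection at once."""
    batch = list(batch)
    if not batch:
        return None  # bulk_write rejects an empty list of operations
    # Imported lazily so push_each_update can be used without the driver installed.
    from pymongo import UpdateOne

    ops = [
        UpdateOne(find_dict, push_each_update(new_vals), upsert=True)
        for find_dict, new_vals in batch
    ]
    # ordered=False: the server may apply the operations in any order and
    # attempts all of them even if one fails.
    return collection.bulk_write(ops, ordered=False)
```

With concurrent writers, two upserts for the same (a, b, c) can still race; the unique index then rejects one of them with a duplicate-key error (surfaced as `pymongo.errors.BulkWriteError`), and retrying that operation turns it into a normal `$push` because the document now exists.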

Mongo indexes in detail

I have a MongoDB collection whose documents look like the following:
collection {
X: 1,
Y: 2,
Z: 3,
T_update: 123,
T_publish: 243,
T_insert: 342
}
I need to create indexes like these:
{X: 1, Y: 1, Z: 1, T_update: 1}
{X: 1, Y: 1, Z: 1, T_publish: 1}
{X: 1, Y: 1, Z: 1, T_insert: 1}
But I see that the X: 1, Y: 1, Z: 1 part is redundant, and only the time parameter, which I intend to use for sorting, changes. Is there a better way to create the above indexes so that I do not have to create three separate indexes?
Also, say I have an index like
{X: 1, Y: 1, Z: 1, T_update: 1}
and I want Mongo to return results where X = 5, Y = any value, Z = 4, sorted by T_update.
Will the above index be useful, or should I create an index such as
{X: 1, Z: 1, T_update: 1}?
I hope that I can avoid it.
The answer here is going to depend on the selectivity of the fields you are indexing - if the criteria you will be using to filter X, Y, or Z are not very selective then they can essentially be left out (or moved to the right of the compound key).
Let's say you are using a filter like "Y is not equal to 1", where 1 is a rare value. Since you will be traversing almost the entire index to return most of the values, and scanning the data, an index on Y will be of less benefit than an index that serves the sort first. In that scenario, if sorting on T_update, it would probably be beneficial to have an index like {T_update: 1, Y: 1}.
In the end, there are lots and lots of permutations here in terms of what might be the most efficient way to index. The real way to figure out the best indexes for your data set is to use explain() and hint() to test the various indexes with your specific query pattern and data set.
