Aggregation Framework & large number of queries - MongoDB

I have the following document structure and insertion rules.
Document : {_id : 1, priority : 1.89, etc...}.
Rules: the priority field must be unique, and an insert operation should affect the smallest possible number of elements.
I want to calculate the new priority (the priority displayed to the user: 1, 2, 3, etc.) at the database level. In SQL I could run the following subquery-based statement:
SELECT *, (SELECT COUNT(fldName) FROM tableName WHERE priority <= tn.priority) AS priority FROM tableName AS tn
In the documentation I didn't find anything similar among $project, $group, $let, etc.
Could you please advise me on this problem?
P.S.
I can't store the data in normalized form; it changes very often and each change affects many elements.
Example :
{_id : 1, priority : 1, etc...}
{_id : 2, priority : 2, etc...}
{_id : 3, priority : 3, etc...}
Insert in the middle:
{_id : 4, priority : 2, etc...}
I then have to shift the other elements:
{_id : 1, priority : 1, etc...}
{_id : 4, priority : 2, etc...}
{_id : 2, priority : 3, etc...}
{_id : 3, priority : 4, etc...}
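For reference: on modern MongoDB (5.0+) the $setWindowFields stage can compute such a rank directly on the database side. A minimal sketch, assuming the collection is named tableName as in the SQL above; since priority is unique, $rank yields the same numbers as the SQL subquery:
db.tableName.aggregate([
    { $setWindowFields: {
        sortBy: { priority: 1 },
        output: { displayPriority: { $rank: {} } }   // 1, 2, 3, ... in priority order
    } }
])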

Related

MongoDB with composite hashed sharding key

I'm trying to understand how sharding will behave in a new MongoDB deployment (v4.4).
Let's say my data is composed of groups (several of them) that contain devices (many more devices than groups).
All these devices will store some time-dependent values over a period of time (3-9 months).
My initial idea is to create a device document for each 24 hours of data (in order to avoid hitting the 16MB/document limit).
This way a device will look something like:
{
    _id: ObjectId(...),
    grp_id: ObjectId(...),      // OR DBRef?
    time_bucket: ISODate(...),
    data: {...}                 // data for the last 24 hours, identified by time_bucket
}
Now on to the sharding keys.
In order to have something non-monotonic I'm thinking of using {grp_id: "hashed"}, since these IDs will be user assigned and hence not monotonic. Also, devices from the same group will land on the same shard (within chunk limits).
But groups have fairly low cardinality and high frequency, so the shard key should become {grp_id: "hashed", _id: 1}.
But this can cause trouble if I insert too much data for a device over time, so I refine the key again: {grp_id: "hashed", _id: 1, time_bucket: 1}. This also localizes data nicely: a query over a certain time period will target only the relevant shard.
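For reference, a sketch of how such a key might be declared (the database and collection names here are placeholders; compound shard keys containing a hashed field require MongoDB 4.4+):
sh.enableSharding("mydb")                          // placeholder database name
sh.shardCollection(
    "mydb.devices",                                // placeholder collection name
    { grp_id: "hashed", _id: 1, time_bucket: 1 }
)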
Now to the question(s):
1. Will this approach work (in theory)?
2. If I already included grp_id: "hashed", will having _id: 1 in the key have any negative impact, since it is a monotonic field?
3. Will the data be split more evenly between shards if I hash the _id instead of the grp_id (since grp_id has very high frequency and low cardinality compared to devices)? The key would be {_id: "hashed", grp_id: 1, time_bucket: 1}. In this case, will grp_id still help with data locality, so that devices from the same group are placed on the same shard?
More generally: I'm a bit confused about how data is distributed when a hashed field is combined with range-based fields in one key, and about the fact that for range-based keys you can also define zones (which have to be managed manually).
I just tested the two sharding methods, {grp_id: "hashed", _id: 1, time_bucket: 1} (initial) vs {_id: "hashed", grp_id: 1, time_bucket: 1} (question 3):
The first approach yields imbalanced shards:
db.adminCommand({listDatabases:1, filter: {name: "test1"}})
{
    "databases" : [
        {
            "name" : "test1",
            "sizeOnDisk" : 22700032,
            "empty" : false,
            "shards" : {
                "sharded-0" : 6377472,
                "sharded-1" : 1425408,
                "sharded-2" : 4210688,
                "sharded-3" : 10686464
            }
        }
    ],
    "totalSize" : 22700032,
    "totalSizeMb" : 21,
    "ok" : 1,
    "operationTime" : Timestamp(1602680982, 2),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1602680982, 2),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
While the second one yields balanced shards:
db.adminCommand({listDatabases:1, filter: {name: "test2"}})
{
    "databases" : [
        {
            "name" : "test2",
            "sizeOnDisk" : 21512192,
            "empty" : false,
            "shards" : {
                "sharded-0" : 5152768,
                "sharded-1" : 5185536,
                "sharded-2" : 5218304,
                "sharded-3" : 5955584
            }
        }
    ],
    "totalSize" : 21512192,
    "totalSizeMb" : 20,
    "ok" : 1,
    "operationTime" : Timestamp(1602682074, 1),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1602682074, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
Both DBs contain 6 groups, each group has 1000 devices, and each device has 30 documents (data from 30 days).
I wonder what's the reason behind that, but I might ask a separate question.

Add a ranking to a returned query document?

My User collection contains the following documents:
{"_id" : 1, point_count: "12"},
{"_id" : 2, point_count: "19"},
{"_id" : 3, point_count: "13"},
{"_id" : 4, point_count: "1233"},
{"_id" : 5, point_count: "1"},
... and about 1000 more
My question is: is it possible to show the rank of each user, based on the point_count field, when I search by id? Let's say I use this query to find the user with id = 4:
db.User.find({_id: 4})
Let's assume the highest point_count in my entire collection is 1233; I'm hoping to get this result:
{"_id" : 4, point_count: "1233", rank: 1}
Rank 1, because 1233 is the highest point_count.
Or when I search for the user with id = 5, my expected result should be:
{"_id" : 5, point_count: "1", rank: 1000..something}
1000..something, because it is the lowest rank and there are 1000+ users in the collection.
Thank you all so much for helping me here!
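A minimal sketch of one way to do this in the shell (assuming MongoDB 4.0+ for $toInt): the rank is 1 plus the number of users with a higher point_count. Note that point_count is stored as a string in the documents above, so it has to be converted before a numeric comparison:
var user = db.User.findOne({ _id: 4 });
var rank = 1 + db.User.countDocuments({
    $expr: { $gt: [ { $toInt: "$point_count" }, { $toInt: user.point_count } ] }
});
user.rank = rank;
printjson(user);   // { "_id" : 4, "point_count" : "1233", "rank" : 1 }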

MongoDB: why this error: "can't append to array using string field name: comments"?

I have a DB structure like below:
{
    "_id" : 1,
    "comments" : [
        {
            "_id" : 2,
            "content" : "xxx"
        }
    ]
}
I push a new subdocument into the comments field. That works:
db.test.update(
{"_id" : 1, "comments._id" : 2},
{$push : {"comments.$.comments" : {_id : 3, content:"xxx"}}}
)
After that the document structure is:
{
    "_id" : 1,
    "comments" : [
        {
            "_id" : 2,
            "comments" : [
                {
                    "_id" : 3,
                    "content" : "xxx"
                }
            ],
            "content" : "xxx"
        }
    ]
}
But when I try to push a new subdocument into the comment whose _id is 3, there is an error:
db.test.update(
    {"_id" : 1, "comments.comments._id" : 3},
    {$push : {"comments.comments.$.comments" : {_id : 4, content : "xxx"}}}
)
error message:
can't append to array using string field name: comments
Well, it makes sense if you think about it. MongoDB has both the advantage and the disadvantage of solving certain things magically.
When you query the database for a specific regular field like this:
{ field : "value" }
The query {field:"value"} makes total sense, it wouldn't in case value is part of an array but Mongo solves it for you, so in case the structure is:
{ field : ["value", "anothervalue"] }
Mongo iterates through all of the elements, matches "value" against the field, and you don't have to think about it. It works perfectly, but only at one level, because with multiple levels it's impossible to guess what you want to do.
In your case the first query works because it fits this pattern:
db.test.update(
{"_id" : 1, "comments._id" : 2},
{$push : {"comments.$.comments" : {_id : 3, content:"xxx"}}}
)
It matches _id at the first level and comments._id at the second level; the result is an array, but Mongo can still resolve it.
But in the second case, think about what you are asking for; let's isolate the where clause:
{"_id" : 1, "comments.comments._id" : 3}
"Give me, from the main collection, the records with _id: 1" (one doc)
"And the comments whose inner comments have _id = 3" (array * array)
The first level, comments._id, is solved easily; the second is not, because comments returns an array, and one more level down is an array of arrays. Mongo gets an array of arrays as a result, and it's not possible to push a document into every record of that array.
The solution is to narrow your where clause so that it yields a unique document within comments (it could be the first one), but that is not a good solution, because you never know in advance at what position the document you're looking for sits. Using the shell, I think the only way to be accurate is to do it in two steps. The following query works (it is not the real solution) in that it "solves" the multiple-array problem by pinning the path to the first element:
db.test.update(
    {"_id" : 1, "comments.0.comments._id" : 3},
    {$push : {"comments.0.comments.$.comments" : {_id : 4, content : "xxx"}}}
)
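Note that on MongoDB 3.6+ there is a cleaner option that the answer above predates: arrayFilters with the $[<identifier>] positional operator can address each nesting level explicitly, so no array position has to be hard-coded. A sketch:
db.test.update(
    { "_id" : 1 },
    { $push : { "comments.$[outer].comments.$[inner].comments" : { _id : 4, content : "xxx" } } },
    { arrayFilters : [ { "outer._id" : 2 }, { "inner._id" : 3 } ] }
)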

Adding unique index in MongoDB ignoring nulls

I'm trying to add a unique index on a group of fields in MongoDB. Not all of those fields are present in every document, and I'd like to index only the documents that have all of them.
So, I'm trying to run this:
db.mycollection.ensureIndex({date:1, type:1, reference:1}, {sparse: true, unique: true})
But I get an E11000 duplicate key error on documents that are missing the type field (there are many of them and they collide, but I just want to ignore them).
Is this possible in MongoDB, or is there some workaround?
Multiple people want this feature, and because there is no workaround, I would recommend voting up the feature request Jira tickets at jira.mongodb.org:
SERVER-785 - support filtered (partial) indexes
SERVER-2193 - sparse indexes only support single field
Note that because SERVER-785 would provide a way to enforce this, SERVER-2193 is marked "won't fix", so it may be more productive to vote up and add your comments to SERVER-785.
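Update for reference: SERVER-785 has since been implemented; MongoDB 3.2 introduced partial indexes, which cover exactly this case. A sketch:
db.mycollection.createIndex(
    { date : 1, type : 1, reference : 1 },
    { unique : true,
      partialFilterExpression : {
          date : { $exists : true },
          type : { $exists : true },
          reference : { $exists : true }
      } }
)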
You can guarantee uniqueness by using an upsert operation instead of a plain insert. This makes sure that if a matching document already exists it is updated, and otherwise it is inserted:
test:Mongo > db.test4.ensureIndex({ a : 1, b : 1, c : 1}, {sparse : 1})
test:Mongo > db.test4.update({a : 1, b : 1}, {$set : { d : 1}}, true, false)
test:Mongo > db.test4.find()
{ "_id" : ObjectId("51ae978960d5a3436edbaf7d"), "a" : 1, "b" : 1, "d" : 1 }
test:Mongo > db.test4.update({a : 1, b : 1, c : 1}, {$set : { d : 1}}, true, false)
test:Mongo > db.test4.find()
{ "_id" : ObjectId("51ae978960d5a3436edbaf7d"), "a" : 1, "b" : 1, "d" : 1 }
{ "_id" : ObjectId("51ae97b960d5a3436edbaf7e"), "a" : 1, "b" : 1, "c" : 1, "d" : 1 }

Chaining time-based sort and limit issue

Lately I've encountered some strange behaviour (strange meaning, IMHO, counter-intuitive) while playing with mongo and sort/limit.
Let's suppose I have the following collection:
> db.fred.find()
{ "_id" : ObjectId("..."), "record" : 1, "time" : ISODate("2011-12-01T00:00:00Z") }
{ "_id" : ObjectId("..."), "record" : 2, "time" : ISODate("2011-12-02T00:00:00Z") }
{ "_id" : ObjectId("..."), "record" : 3, "time" : ISODate("2011-12-03T00:00:00Z") }
{ "_id" : ObjectId("..."), "record" : 4, "time" : ISODate("2011-12-04T00:00:00Z") }
{ "_id" : ObjectId("..."), "record" : 5, "time" : ISODate("2011-12-05T00:00:00Z") }
What I would like is to retrieve, in time order, the 2 records preceding "record": 4, plus record 4 itself (i.e. records 2, 3 and 4).
Naively, I ran something along the lines of:
db.fred.find({time: {$lte: ISODate("2011-12-04T00:00:00Z")}}).sort({time: -1}).limit(2).sort({time: 1})
but it does not work the way I expected:
{ "_id" : ObjectId("..."), "record" : 1, "time" : ISODate("2011-12-01T00:00:00Z") }
{ "_id" : ObjectId("..."), "record" : 2, "time" : ISODate("2011-12-02T00:00:00Z") }
I expected the result to be records 2, 3 and 4.
From what I can tell, it seems the second sort is applied before the limit:
sort({time: -1}) => record 4, record 3, record 2, record 1
sort({time: -1}).limit(2) => record 4, record 3
sort({time: -1}).limit(2).sort({time: 1}) => record 1, record 2
i.e. it's as if the second sort were applied to the cursor returned by find (i.e. the whole set), and only then the limit applied.
What is my mistake here and how can I achieve the expected behavior?
BTW: running mongo 2.0.1 on Ubuntu 11.01
The MongoDB shell lazily evaluates cursors, which is to say that the series of chained operations you've written results in a single query being sent to the server, using the cursor's final state. So when you say sort({time: -1}).limit(2).sort({time: 1}), the second call to sort overrides the sort set by the first call.
To achieve your desired result, you're probably better off reversing the cursor output in your application code, especially if you're limiting to a small result set (here you're using 2). The exact code to do so depends on the language you're using, which you haven't specified.
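In the shell itself, for example, you could materialize the cursor and reverse it (note limit(3), since three documents are wanted: records 2, 3 and 4):
db.fred.find({ time : { $lte : ISODate("2011-12-04T00:00:00Z") } })
       .sort({ time : -1 })
       .limit(3)
       .toArray()
       .reverse()   // records 2, 3 and 4, in ascending time order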
Applying sort() to the same query multiple times makes no sense here; the effective sort order is taken from the last sort() call. So
sort({time: -1}).limit(2).sort({time: 1})
is the same as
sort({time: 1}).limit(2)