MongoDB bucket boundaries - mongodb

How can I use $bucket in a MongoDB aggregation to group the records below into the ranges listed?
Name   value
N1     0
N2     20
N3     0.01
N4     50
N5     10
N6     20
N7     0
N8     11
N9     35
N10    51
The bucket boundaries on the 'value' column should be:
- =0
- >0 and <=10
- >10 and <=20
- >20 and <=30
- >30
Note: the value 0.01 should fall in the range '>0 and <=10'.

Please refer to https://mongoplayground.net/p/V9mjIw1HmYK
db.collection.aggregate([
  {
    $bucket: {
      // Field to group by (use "$value" if the field is named as in the question)
      groupBy: "$N",
      // Boundaries for the buckets: each lower bound is inclusive and each upper
      // bound is exclusive, so 0.01 falls in the first bucket [0, 1) together with 0
      boundaries: [0, 1, 10.1, 20.1, 30.1, 1000],
      // Bucket id for documents which do not fall into a bucket
      default: "Other",
      // Output for each bucket
      output: {
        "count": { $sum: 1 },
        "data": { $push: { "N": "$N" } }
      }
    }
  }
])
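Because $bucket treats each lower boundary as inclusive and each upper boundary as exclusive, ranges such as '>0 and <=10' cannot be written directly as boundaries. One possible workaround, given here only as a sketch, is to label each document with $switch inside a $group stage. The field names "value" and "Name" are taken from the question's table, while the variable name rangePipeline, the label strings and the "other" fallback are illustrative assumptions. It also assumes that value is numeric.

```javascript
// Sketch: label every document with its range, then count per label.
// Assumes numeric "value" fields; labels and the "other" fallback are illustrative.
const rangePipeline = [
  {
    $group: {
      _id: {
        $switch: {
          branches: [
            { case: { $eq: ["$value", 0] }, then: "=0" },
            { case: { $and: [{ $gt: ["$value", 0] }, { $lte: ["$value", 10] }] }, then: ">0 and <=10" },
            { case: { $and: [{ $gt: ["$value", 10] }, { $lte: ["$value", 20] }] }, then: ">10 and <=20" },
            { case: { $and: [{ $gt: ["$value", 20] }, { $lte: ["$value", 30] }] }, then: ">20 and <=30" },
            { case: { $gt: ["$value", 30] }, then: ">30" }
          ],
          // Negative or missing values end up here
          default: "other"
        }
      },
      count: { $sum: 1 },
      names: { $push: "$Name" }
    }
  }
];
// In the shell: db.collection.aggregate(rangePipeline)
```

With this shape, 0.01 matches the second branch and 0 matches the first, which is the split the question asks for.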

Related

How to perform statistics for every n elements in MongoDB

How can I perform basic statistics for every n elements in MongoDB? For example, say I have a total of 100 records like the ones below:
Name   Count   Sample
a      10      x
a      20      y
a      10      z
b      10      x
b      10      y
b      5       z
How do I compute the mean, median, and standard deviation for every 10 records, so that I get 10 results? That is, I want the mean/median/standard deviation for a over every 10 samples until all of its records in the database are covered, and similarly for b, c, and so on.
Excuse me if this is a naive question.
You need some sort of counter to keep track of the row position. In this example I add a rownum field, apply buckets of 3 rows (here n = 3), and return the sum and average of each group of 3. The example can be modified to do some sorting and grouping before the buckets are created to get the desired result.
Please refer to https://mongoplayground.net/p/CL7vQGUWD_S
db.collection.aggregate([
  {
    $set: {
      // Row counter kept in a global variable of the server-side JavaScript scope;
      // the first row gets 1 so that it falls inside the first bucket
      "rownum": {
        "$function": {
          "body": "function() {try {row_number += 1;} catch (e) {row_number = 1;} return row_number;}",
          "args": [],
          "lang": "js"
        }
      }
    }
  },
  {
    $bucket: {
      // Field to group by
      groupBy: "$rownum",
      // Boundaries for the buckets: rows 1-3, 4-6, 7-9, and so on
      boundaries: [1, 4, 7, 10, 13, 16, 19, 22, 25],
      // Bucket id for documents which do not fall into a bucket
      default: "Other",
      // Output for each bucket
      output: {
        "countSUM": { $sum: "$count" },
        "averagePrice": { $avg: "$count" }
      }
    }
  }
])
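On MongoDB 5.0 or later, the row counter could instead be sketched with $setWindowFields and $documentNumber, which numbers rows per Name without server-side JavaScript. The helper name chunkStatsPipeline is hypothetical, the field names "Name" and "Count" follow the question's table, and ordering by _id is an assumption about what "every n records" means. The median is left out of the sketch; MongoDB 7.0 and later provide a $median accumulator that may cover it.

```javascript
// Hypothetical helper: number the rows of each Name, then aggregate every
// `n` consecutive rows into one result document.
function chunkStatsPipeline(n) {
  return [
    {
      $setWindowFields: {
        partitionBy: "$Name",
        sortBy: { _id: 1 }, // assumed ordering; $documentNumber requires a sortBy
        output: { rownum: { $documentNumber: {} } }
      }
    },
    {
      $group: {
        _id: {
          name: "$Name",
          // rownum starts at 1, so rows 1..n map to chunk 0, n+1..2n to chunk 1, ...
          chunk: { $floor: { $divide: [{ $subtract: ["$rownum", 1] }, n] } }
        },
        mean: { $avg: "$Count" },
        stdDev: { $stdDevPop: "$Count" },
        rows: { $sum: 1 }
      }
    },
    { $sort: { "_id.name": 1, "_id.chunk": 1 } }
  ];
}
// In the shell: db.collection.aggregate(chunkStatsPipeline(10))
```

Computing the chunk index with $floor avoids listing bucket boundaries by hand, so the same pipeline works for any number of records.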

Count and range MongoDB

Let's say I have a bunch of documents in this format:
{Person: "X" , Note: 4}
What I need to do is count the number of persons whose Note field falls within the ranges 0-50, 51-100, 101-150, and 151 or more.
Something like this:
// range of Note    // total of persons in this range
0-50                14
51-100              32
101-150             34
151+                21
In MongoDB you have the $lt and $gt operators (and their inclusive forms $lte and $gte) for less-than and greater-than comparisons.
You can then use $count on the matched documents, for example for the 0-50 range:
db.table.aggregate(
  [
    {
      $match: {
        // $gte/$lte keep the end points 0 and 50 inside the range
        Note: { $gte: 0, $lte: 50 }
      }
    },
    {
      $count: "0-50"
    }
  ]
)
It will show result like:
{ "0-50" : 14 }
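A $match/$count pipeline handles one range per query. As a sketch of a single-pass alternative, $bucket can count all ranges at once. This assumes Note holds non-negative integers, so that the boundary 51 separates 50 from 51. The name noteRangesPipeline and the "151+" label are illustrative choices, and the default bucket also collects any document whose Note lies outside the listed boundaries or is missing.

```javascript
// Sketch: count all Note ranges in one aggregation.
// Buckets are [0, 51), [51, 101), [101, 151); everything else goes to "151+".
const noteRangesPipeline = [
  {
    $bucket: {
      groupBy: "$Note",
      boundaries: [0, 51, 101, 151],
      default: "151+",
      output: { count: { $sum: 1 } }
    }
  }
];
// In the shell: db.table.aggregate(noteRangesPipeline)
```

Each result document has the lower boundary of its bucket as _id (0, 51 or 101), or the default label.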

How to use nested mongoDB query to calculate percentage?

This is our coursework today. This is the dataset, and this is the column description:
VARIABLE DESCRIPTIONS:
Column
 1  Class (0 = crew, 1 = first, 2 = second, 3 = third)
10  Age (1 = adult, 0 = child)
19  Sex (1 = male, 0 = female)
28  Survived (1 = yes, 0 = no)
and the last question is
What percentage of passengers survived? (use a nested MongoDB query)
I know that to calculate the percentage I can count the rows where Survived = 1 and the total number of rows, then divide .find({ Survived: 1 }).count() by .find().count(). I also know I can use aggregate to solve the problem, but that does not meet the requirement. Any ideas?
Consider the following data:
db.titanic.insert({ Survived: 1 })
db.titanic.insert({ Survived: 1 })
db.titanic.insert({ Survived: 1 })
db.titanic.insert({ Survived: 0 })
You can use $group with $sum. Passing 1 as the argument gives the total count, while passing "$Survived" adds up the Survived field, which counts the survivors because the field is either 1 or 0. Then you can use $divide to get the surviving fraction.
db.titanic.aggregate([
  {
    $group: {
      _id: null,
      Survived: { $sum: "$Survived" },
      Total: { $sum: 1 }
    }
  },
  {
    $project: {
      _id: 0,
      SurvivedPercentage: { $divide: [ "$Survived", "$Total" ] }
    }
  }
])
which outputs: { "SurvivedPercentage" : 0.75 }
This is a fraction; multiply it by 100 to express it as a percentage.
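The count-based approach mentioned in the question can also be sketched as a small helper. It is written for mongosh, where collection methods such as countDocuments return their result directly. The function name survivedPercentage is hypothetical, and whether this counts as a "nested query" for the coursework is not something the sketch can settle.

```javascript
// Sketch for mongosh: `coll` is assumed to be a collection handle such as
// db.titanic whose countDocuments(filter) returns a number directly.
function survivedPercentage(coll) {
  const total = coll.countDocuments({});
  if (total === 0) return null; // avoid dividing by zero on an empty collection
  const survived = coll.countDocuments({ Survived: 1 });
  return (survived / total) * 100;
}
// In mongosh: survivedPercentage(db.titanic)
```

For the four sample documents above this yields 75, the percentage form of the 0.75 ratio.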

How to group by uniform intervals of data between a maximum and minimum using the MongoDB aggregator?

Let's say I have a whole mess of data that yields a range of integer values for a particular field... I'd like to see those ranked by a grouping of intervals of occurrence, perhaps because I am clustering...like so:
[{
    _id: {
        response_time: "3-4"
    },
    count: 234,
    countries: ['US', 'Canada', 'UK']
}, {
    _id: {
        response_time: "4-5"
    },
    count: 452,
    countries: ['US', 'Canada', 'UK', 'Poland']
}, ...
]
How can I write a quick and dirty way to A) group the collection data by equally spaced intervals over B) a minimum and maximum range using a MongoDB aggregator?
Well, to quickly formulate conditional grouping for the MongoDB aggregation framework, we first adopt this nested $cond pattern, in which every nested condition is itself a { $cond: [...] } expression:
{ $cond: [
    { <conditional> },      // test the conditional
    <truthy_value>,         // assign if true
    { $cond: [              // evaluate if false
        { <conditional> },
        <truthy_value>,
        ...                 // and so forth
    ] }
] }
To do that very quickly, without having to write every last interval out in a deeply nested conditional, we can use this handy recursive algorithm (which you import in your shell script or Node.js script, of course):
var $condIntervalBuilder = function (field, interval, min, max) {
    var cond;
    if (min < max - 1) {
        // Values in (min, min + interval] get the label "<min>-<min + interval>"
        cond = [
            { $and: [{ $gt: [field, min] }, { $lte: [field, min + interval] }] },
            [min, '-', (min + interval)].join('')
        ];
        if ((min + interval) > max) {
            // The next step would pass max: recurse with an interval that ends at max
            cond.push($condIntervalBuilder(field, (max - min), min, max));
        } else {
            min += interval;
            cond.push($condIntervalBuilder(field, interval, min, max));
        }
    } else {
        // Base case (min >= max - 1): end the nesting
        cond = [
            { $gt: [field, max] },
            [max, '<'].join(''), // Accounts for all outside the range
            [min, '<'].join('')  // Lesser upper bound
        ];
    }
    return { $cond: cond };
};
Then, we can invoke it in-line or assign it to a variable that we use elsewhere in our analysis.
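As an illustration of invoking such a builder inside a pipeline, here is a sketch that swaps the recursive $cond nesting for an iterative builder producing a single $switch expression (available since MongoDB 3.4). The names buildIntervalSwitch and responseTimePipeline, the "out of range" label, and the "$country" field are assumptions made for the example; only response_time and the countries output come from the question.

```javascript
// Hypothetical iterative variant: one $switch branch per interval (lo, hi].
function buildIntervalSwitch(field, interval, min, max) {
  if (!(interval > 0)) throw new RangeError("interval must be positive");
  const branches = [];
  for (let lo = min; lo < max; lo += interval) {
    const hi = Math.min(lo + interval, max); // clamp the last interval to max
    branches.push({
      case: { $and: [{ $gt: [field, lo] }, { $lte: [field, hi] }] },
      then: lo + "-" + hi
    });
  }
  // Anything outside (min, max] gets a catch-all label.
  return { $switch: { branches: branches, default: "out of range" } };
}

// Group by 1-unit response_time intervals between 3 and 6.
const responseTimePipeline = [
  {
    $group: {
      _id: { response_time: buildIntervalSwitch("$response_time", 1, 3, 6) },
      count: { $sum: 1 },
      countries: { $addToSet: "$country" } // "$country" is an assumed field name
    }
  }
];
// In the shell: db.collection.aggregate(responseTimePipeline)
```

A flat $switch keeps the expression shallow even with many intervals, whereas the recursive $cond nests one level deeper per interval.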

Select MongoDB documents with custom random probability distribution

I have a collection that looks something like this:
[
  {
    "id": 1,
    "tier": 0
  },
  {
    "id": 2,
    "tier": 1
  },
  {
    "id": 3,
    "tier": 2
  },
  {
    "id": 4,
    "tier": 0
  }
]
Is there a standard way to select n elements where the probability of choosing an element of the lowest tier is p, of the next lowest tier is (1-p)*p, and so on, with uniform random selection of the element within the chosen tier?
So for example, if the most likely thing happens and I run the query against the above example with n = 2 and any p > .5 (which I think will always be true), then I'd get back [{"id": 1, ...}, {"id": 4}]; with n = 3, then [{"id": 4}, {"id": 1}, {"id": 2}], etc.
E.g. here's some pseudo-Python code, given a list of dictionaries like the above as objs:
def f(objs, p, n):
    # get eligible tiers
    tiers_set = set()
    for o in objs:
        tiers_set.add(o["tier"])
    eligible_tiers = sorted(tiers_set)
    # get the tier for each index of results
    tiers = []
    while len(tiers) < min(n, len(objs)):
        tiers.append(select_random_with_initial_p(eligible_tiers, p))
    # get res
    res = []
    for tier in tiers:
        res.append(select_standard_random_in_tier(objs, tier))
    return res
First, enable geospatial indexing on a collection (createIndex replaces the deprecated ensureIndex):
db.docs.createIndex( { random_point: '2d' } )
To create a bunch of documents with random points on the X-axis:
for ( i = 0; i < 10; ++i ) {
    db.docs.insert( { key: i, random_point: [Math.random(), 0] } );
}
Then you can get a random document from the collection like this:
db.docs.findOne( { random_point : { $near : [Math.random(), 0] } } )
Or you can retrieve several documents nearest to a random point:
db.docs.find( { random_point : { $near : [Math.random(), 0] } } ).limit( 4 )
This requires only one query and no null checks, plus the code is clean, simple and flexible. You could even use the Y-axis of the geopoint to add a second randomness dimension to your query.
To make your custom random selection, replace [Math.random(), 0] with a point drawn from the random distribution you need.
Source: Random record from MongoDB
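The tier rule from the question (probability p for the lowest tier, (1-p)*p for the next, and so on) can also be sketched on the client side once the documents have been fetched. This is a minimal sketch that mirrors the question's pseudo-code, not a MongoDB feature. The function names pickTier and selectByTier and the injectable rand parameter are assumptions added for illustration. As in the pseudo-code, elements are drawn with replacement, so the same document can appear more than once.

```javascript
// Pick a tier: the lowest tier with probability p, the next with (1 - p) * p,
// and so on; the last tier absorbs whatever probability remains.
// `rand` is injectable so the sketch can be exercised deterministically.
function pickTier(sortedTiers, p, rand = Math.random) {
  for (let i = 0; i < sortedTiers.length - 1; i++) {
    if (rand() < p) return sortedTiers[i];
  }
  return sortedTiers[sortedTiers.length - 1];
}

// Select min(n, objs.length) elements: choose a tier, then a uniformly
// random element within that tier (with replacement).
function selectByTier(objs, p, n, rand = Math.random) {
  const tiers = [...new Set(objs.map(o => o.tier))].sort((a, b) => a - b);
  const result = [];
  const count = Math.min(n, objs.length);
  for (let i = 0; i < count; i++) {
    const tier = pickTier(tiers, p, rand);
    const pool = objs.filter(o => o.tier === tier);
    result.push(pool[Math.floor(rand() * pool.length)]);
  }
  return result;
}
```

For example, selectByTier(docs, 0.6, 2) returns two documents, each taken from the lowest tier with probability 0.6.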