How to group by uniform intervals of data between a maximum and minimum using the MongoDB aggregator? - mongodb

Let's say I have a whole mess of data that yields a range of integer values for a particular field... I'd like to see those ranked by a grouping of intervals of occurrence, perhaps because I am clustering...like so:
[{
_id: {
response_time: "3-4"
},
count: 234,
countries: ['US', 'Canada', 'UK']
}, {
_id: {
response_time: "4-5"
},
count: 452,
countries: ['US', 'Canada', 'UK', 'Poland']
}, ...
]
How can I write a quick and dirty way to A) group the collection data by equally spaced intervals over B) a minimum and maximum range using a MongoDB aggregator?

Well, in order to quickly formulate a conditional grouping syntax for MongoDB aggregators, we first adopt the pattern, per MongoDB syntax:
{ $cond: [
{ <conditional> }, // test the conditional
<truthy_value>, // assign if true
{ $cond: [ // evaluate if false
{ <conditional> },
<truthy_value>,
... // and so forth
] }
] }
To do that very quickly, without having to write out every last interval in a deeply nested conditional, we can use this handy recursive function (which you import into your shell script or Node.js script, of course):
var ag = {}; // namespace object, so that the recursive calls below resolve
ag.$condIntervalBuilder = function (field, interval, min, max) {
var cond;
if (min < max - 1) {
// Bucket for values in (min, min + interval], labelled "min-(min + interval)"
cond = [
{ '$and': [{ $gt: [field, min] }, { $lte: [field, min + interval] }] },
[min, '-', (min + interval)].join('')
];
if ((min + interval) > max) {
cond.push(ag.$condIntervalBuilder(field, (max - min), min, max));
} else {
min += interval;
cond.push(ag.$condIntervalBuilder(field, interval, min, max));
}
} else {
cond = [
{ $gt: [field, max] },
[ max, '<' ].join(''), // Accounts for all outside the range
[ min, '<' ].join('') // Lesser upper bound
];
}
return { $cond: cond };
};
Then, we can invoke it in-line or assign it to a variable that we use elsewhere in our analysis.
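As a rough illustration of how such a conditional could be plugged into a $group stage, here is a minimal, self-contained sketch. It uses an iterative helper rather than the recursive one, and the names buildIntervalCond, the requests collection, the country field and the "other" fallback label are all assumptions made for the example, not part of the original answer.

```javascript
// Hypothetical helper: builds a nested $cond that labels a value with its interval.
function buildIntervalCond(field, interval, min, max) {
  // Fallback label for values outside (min, max] (assumed label)
  let expr = "other";
  // Collect the lower bounds, then wrap from the last interval inwards
  const lowers = [];
  for (let lo = min; lo < max; lo += interval) lowers.push(lo);
  for (const lo of lowers.reverse()) {
    const hi = Math.min(lo + interval, max); // clamp the last interval to max
    expr = {
      $cond: [
        { $and: [{ $gt: [field, lo] }, { $lte: [field, hi] }] },
        `${lo}-${hi}`,
        expr,
      ],
    };
  }
  return expr;
}

// Group by interval label, counting documents and collecting distinct countries.
const pipeline = [
  {
    $group: {
      _id: { response_time: buildIntervalCond("$response_time", 1, 3, 6) },
      count: { $sum: 1 },
      countries: { $addToSet: "$country" }, // "country" is an assumed field name
    },
  },
  { $sort: { "_id.response_time": 1 } },
];

// With a live connection (not run here): db.requests.aggregate(pipeline)
console.log(JSON.stringify(pipeline[0].$group._id, null, 2));
```

Building the expression in plain JavaScript first makes it easy to print and inspect the nested $cond before sending it to the server.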

Related

MongoDB sort using a custom function

Let's say I have a collection that looks like:
{
_id: 'aaaaaaaaaaaaaaaaaaaaaaaaa',
score: 10,
hours: 50
},
{
_id: 'aaaaaaaaaaaaaaaaaaaaaaaab',
score: 5,
hours: 55
},
{
_id: 'aaaaaaaaaaaaaaaaaaaaaaaac',
score: 15,
hours: 60
}
I want to sort this list by a custom order, namely
value = (score - 1) / (T + 2) ^ G
score: score
T: current_hours - hours
G: some constant
How do I do this? I assume this is going to require writing a custom sorting function that compares the score and hours fields, takes current_hours as an input, performs that comparison and returns the sorted list. Note that hours and current_hours are simply the number of hours that have elapsed since some arbitrary starting point. So if I'm running this query 80 hours after the application started, current_hours takes the value 80.
Creating an additional field value and keeping it constantly updated is probably too expensive for millions of documents.
I know that if this is possible, this is going to look something like
db.items.aggregate([
{ "$project" : {
"_id" : 1,
"score" : 1,
"hours" : 1,
"value" : { SOMETHING HERE, ALSO REQUIRES PASSING current_hours }
}
},
{ "$sort" : { "value" : 1 } }
])
but I don't know what goes into value
I think value will look something like this:
"value": {
$let: {
vars: {
score: "$score",
t: {
"$subtract": [
80,
"$hours"
]
},
g: 3
},
in: {
"$divide": [
{
"$subtract": [
"$$score",
1
]
},
{
"$pow": [
{
"$add": [
"$$t",
2
]
},
"$$g"
]
}
]
}
}
}
Playground example here
Although it's verbose, it should be reasonably straightforward to follow. It uses the arithmetic expression operators to build the calculation that you are requesting. A few specific notes:
We use $let here to set some variables. These include the "runtime" value for current_hours (80 in the example, per the description) and 3 as an example value for G. We also "reuse" score here, which is not strictly necessary but keeps things consistent with the next point.
$ refers to fields in the document, whereas $$ refers to variables. That's why everything in the vars definition uses $ and everything in the actual calculation inside in uses $$. The reference to score inside in could have used just the field path ($score), but I personally prefer the consistency of this approach.
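To show how current_hours and G could be passed in at query time, here is a minimal sketch that builds the whole pipeline from two parameters. It inlines the arithmetic instead of using $let, and the helper names buildRankingPipeline and rankValue are hypothetical. The plain-JavaScript function only mirrors the formula so that the expression can be sanity-checked without a database.

```javascript
// Hypothetical helper: returns the $project/$sort stages for a given "now".
function buildRankingPipeline(currentHours, g) {
  return [
    {
      $project: {
        score: 1,
        hours: 1,
        // value = (score - 1) / (T + 2) ^ G, with T = currentHours - hours
        value: {
          $divide: [
            { $subtract: ["$score", 1] },
            { $pow: [{ $add: [{ $subtract: [currentHours, "$hours"] }, 2] }, g] },
          ],
        },
      },
    },
    { $sort: { value: -1 } }, // highest value first; use 1 for ascending
  ];
}

// Plain-JS version of the same formula, handy for checking the expression.
function rankValue(doc, currentHours, g) {
  return (doc.score - 1) / Math.pow(currentHours - doc.hours + 2, g);
}

const docs = [
  { score: 10, hours: 50 },
  { score: 5, hours: 55 },
  { score: 15, hours: 60 },
];
const values = docs.map((d) => rankValue(d, 80, 3));
console.log(values);

// With a live connection (not run here): db.items.aggregate(buildRankingPipeline(80, 3))
```

Because the pipeline is rebuilt on every call, nothing has to be stored or kept up to date in the documents themselves.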

How to do fast query on String dataType in MongoDB, where values are of type double

I have a field contractValue (among other fields) in a collection contract, and it is of type String. It basically holds a double value like 1200 or 1500, but in some places it may contain a value like $1200 or $1500.
Sample data from collection:
{ ..
..
contractValue: "1200", //This is the one stored as String. I need
// to perform range query over it
..
..
}
{ ..
..
contractValue: "$1500",
..
..
}
I have a requirement to fetch contracts based on their contract values. A query can look like the one below:
{$and: [ {'contractValue': {$gt: 100}}, {'contractValue': {$lt: 1000 }}]}
This query gives me wrong results. It also returns documents with a contractValue like 1238999.
I also need to create an index on contractValue.
Is it possible to create an index on contractValue so that I can run range queries efficiently, i.e. so that every query applies < or > on the index and fetches the exact set of documents, without changing the schema?
How do I handle values like $1200 in the index, so that the indexed value is just the integer 1200 rather than $1200?
Try this:
https://mongoplayground.net/p/TG3Y5tdh9aK
It assumes the string data is either a quoted number or a quoted number with "$" at the front.
db.collection.aggregate([
{
$project: {
"newContractValue": {
"$convert": {
"input": "$contractValue",
"to": "double",
"onError": {
$toDouble: {
"$substr": [
"$contractValue",
1,
{
"$strLenCP": "$contractValue"
}
]
}
}
}
}
}
},
{
$match: {
$and: [
{
"newContractValue": {
$gt: 100
}
},
{
"newContractValue": {
$lt: 1000
}
}
]
}
}
])
This can be used to set a new numeric contractValueNew field from the existing contractValue:
db.getCollection('yourCollection').find({})
.forEach(function(record) {
// Strip a leading "$" if present, then parse the rest as a base-10 integer
var raw = record.contractValue.toString();
var digits = raw.charAt(0) == '$' ? raw.substring(1) : raw;
// save() is deprecated; update only the new field instead
db.getCollection('yourCollection').updateOne(
{ _id: record._id },
{ $set: { contractValueNew: NumberInt(parseInt(digits, 10)) } }
);
})
Try:
db.collection.find({'contractValue': {$gt: 100, $lt: 1000 }})
Create an index on contractValue, but convert all values to numbers first ...
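The answers above amount to "store a numeric copy, index it, and query that". The following is only a sketch of that idea under stated assumptions: parseContractValue and buildBackfillOps are hypothetical helpers, contractValueNew is the field name used in the earlier answer, and the sample documents are made up for illustration. The database calls are left as comments because they need a live connection.

```javascript
// Hypothetical helper: "1200" -> 1200, "$1500" -> 1500, anything else -> null
function parseContractValue(raw) {
  const text = String(raw).trim().replace(/^\$/, "");
  const value = Number(text);
  return text !== "" && Number.isFinite(value) ? value : null;
}

// Builds bulkWrite operations that store the number in contractValueNew.
// Documents whose value cannot be parsed are skipped.
function buildBackfillOps(docs) {
  return docs
    .map((doc) => ({ doc, value: parseContractValue(doc.contractValue) }))
    .filter(({ value }) => value !== null)
    .map(({ doc, value }) => ({
      updateOne: {
        filter: { _id: doc._id },
        update: { $set: { contractValueNew: value } },
      },
    }));
}

// Made-up sample documents for illustration only
const ops = buildBackfillOps([
  { _id: 1, contractValue: "1200" },
  { _id: 2, contractValue: "$1500" },
  { _id: 3, contractValue: "n/a" },
]);
console.log(ops.length);

// With a live connection (not run here):
//   db.contract.bulkWrite(ops)
//   db.contract.createIndex({ contractValueNew: 1 })
//   db.contract.find({ contractValueNew: { $gt: 100, $lt: 1000 } })
```

Once the numeric field exists and is indexed, the range query compares numbers rather than strings, which is what the original query was missing.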

how to perform statistics for every n elements in MongoDB

How do I perform basic statistics for every n elements in MongoDB? For example, suppose I have a total of 100 records like the ones below:
Name  Count  Sample
a     10     x
a     20     y
a     10     z
b     10     x
b     10     y
b     5      z
How do I compute the mean, median and standard deviation for every 10 records, so that I get 10 results? That is, I want the mean/median/std dev for a over every 10 samples until all of its records are covered, and similarly for b, c and so on.
Excuse me if this is a naive question.
You need some sort of counter to keep track of the row position. For example, here I have added a rownum field, applied buckets of 3 (n = 3), and returned the sum and average of each group of 3. This example can be modified to do some sorting and grouping before the buckets are created to get the desired result.
Please refer to https://mongoplayground.net/p/CL7vQGUWD_S
db.collection.aggregate([
{
$set: {
"rownum": {
"$function": {
"body": "function() {try {row_number+= 1;} catch (e) {row_number= 0;}return row_number;}",
"args": [],
"lang": "js"
}
}
}
},
{
$bucket: {
groupBy: "$rownum", // Field to group by
// Boundaries for the buckets: rownum starts at 0, so each bucket spans n = 3 rows
boundaries: [0, 3, 6, 9, 12, 15, 18, 21, 24],
// Bucket id for documents which do not fall into a bucket
default: "Other",
// Output for each bucket
output: {
"countSUM": {
$sum: "$count"
},
"averagePrice": {
$avg: "$count"
}
}
}
}
])
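As a possible alternative to the $function counter, here is a sketch that numbers rows per name with $setWindowFields and then groups every n rows. It assumes MongoDB 5.0+ for $setWindowFields and $documentNumber, and 7.0+ for the $median accumulator. The helper name buildChunkStatsPipeline, the field names Name and Count (taken from the table header) and the ordering by _id are all assumptions.

```javascript
// Hypothetical helper: statistics for every n documents within each Name.
function buildChunkStatsPipeline(n) {
  return [
    {
      // Number the documents 1, 2, 3, ... inside each Name partition
      $setWindowFields: {
        partitionBy: "$Name",
        sortBy: { _id: 1 }, // assumed ordering; pick the field that defines "every n"
        output: { rownum: { $documentNumber: {} } },
      },
    },
    {
      $group: {
        // chunk 0 holds rows 1..n, chunk 1 holds rows n+1..2n, and so on
        _id: {
          name: "$Name",
          chunk: { $floor: { $divide: [{ $subtract: ["$rownum", 1] }, n] } },
        },
        mean: { $avg: "$Count" },
        stdDev: { $stdDevPop: "$Count" },
        median: { $median: { input: "$Count", method: "approximate" } },
      },
    },
    { $sort: { "_id.name": 1, "_id.chunk": 1 } },
  ];
}

const statsPipeline = buildChunkStatsPipeline(10);
console.log(JSON.stringify(statsPipeline, null, 2));

// With a live connection (not run here): db.collection.aggregate(statsPipeline)
```

Deriving the chunk index from the row number avoids listing bucket boundaries by hand, so the same pipeline works for any n.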

Mongoose updates : Increment a counter and reset to 0 on a new date

I have a schema that looks like this:
var Counter = new Schema ({
_id: ObjectId,
date: Date,
counter: Number
})
On each request I send the date of a day and expect the date to be stored and the counter to increase. Currently, when I add one, the counter gets incremented (1, 2, 3, etc.) and the date gets added.
Now here is the problem: I want the counter to reset to 0 when a different date is given (that is, on every new day the counter should start at 0) and then start incrementing again.
This is what I have tried:
Counter.findOneAndUpdate(
{
$set:{
"date: thedate",
},
$inc: {
"counter: counter+1"
}
)
How do I achieve this ?
UPDATE FOR MORE CLARIFICATION
Take this example of two documents
{
"_id": ObjectId("1111"),
"date": "2020-04-13",
"counter": 0,
}
{
"_id": ObjectId("2222"),
"date": "2020-04-29",
"counter": 0,
}
My collection has more than one document. I want to update a document based on its id. In this case I want to update the first document, with id 1111.
Now if I give an input date, say 2020-04-13, and the id '1111', which matches the first document, it should increment the counter to 1. If I give the same date again (with the same id of 1111), it should increment the counter to 2.
If I then give an input date of 2020-04-14 (a different date) for the same first document with id 1111, it should reset the counter to 0.
How do I achieve this?
Since you can run an update with an aggregation pipeline in .update() operations starting with MongoDB 4.2, try the query below:
Counter.findOneAndUpdate(
{ _id: ObjectId("............") }, // filter query - Input of type ObjectId()
/** Update date field with input & re-create 'counter' field based on condition */
[{
$addFields: {
date: inputDate, // Input type of date, In general Date is saved in MongoDB as ISODate("2020-05-06T00:00:00.000Z")
counter: { $cond: [ { $eq: [ "$date", inputDate ] }, { $add: [ "$counter", 1 ] }, 0 ] }
}
}]
)
Test : Test aggregation pipeline here : mongoplayground
I'm still not clear on what you want to achieve, but you can try this method, which splits the find and the update:
Counter.findOne({}) //match inside {} condition
.then(counter => {
if(counter.date){ //apply date condition here
counter.counter+=1;
} else {
counter.counter = 0;
}
counter.save()
.then(updatedCounter => {
// further action you want to take
})
})
.catch(err => {
//handle err here
})
You can use $inc operator in MongoDB here.
Example:
export async function incrementCounter(db, postId) {
return db.collection('posts').updateOne(
{ _id: postId },
{
$inc: {
comments: 1 // Increments by 1. Similarly, -2 will decrement by 2.
}
}
)
}
For the reset functionality, you can use MongoDB Atlas Triggers. You can also use a third-party library like mongodb-cron.
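To make the "increment on the same date, reset on a new date" rule concrete, here is a small sketch. buildCounterUpdate is a hypothetical helper that returns an update pipeline of the kind shown in the earlier answer (MongoDB 4.2+), and applyCounterRule is a plain-JavaScript mirror of the same rule. It assumes dates are stored as strings such as "2020-04-13", as in the example documents.

```javascript
// Hypothetical helper: update pipeline for an update by _id (MongoDB 4.2+).
function buildCounterUpdate(inputDate) {
  return [
    {
      $set: {
        // Evaluated against the stored document, so "$date" is the old date
        counter: {
          $cond: [{ $eq: ["$date", inputDate] }, { $add: ["$counter", 1] }, 0],
        },
        date: inputDate,
      },
    },
  ];
}

// Plain-JS mirror of the same rule, to check the expected behaviour.
// Assumes dates are strings; Date objects would need a value comparison.
function applyCounterRule(doc, inputDate) {
  const sameDay = doc.date === inputDate;
  return { ...doc, date: inputDate, counter: sameDay ? doc.counter + 1 : 0 };
}

let doc = { _id: "1111", date: "2020-04-13", counter: 0 };
doc = applyCounterRule(doc, "2020-04-13"); // same date
doc = applyCounterRule(doc, "2020-04-13"); // same date again
const afterSameDay = doc.counter;
doc = applyCounterRule(doc, "2020-04-14"); // new date
const afterNewDay = doc.counter;
console.log(afterSameDay, afterNewDay);

// With a live connection (not run here):
//   Counter.findOneAndUpdate({ _id: id }, buildCounterUpdate("2020-04-13"))
```

Doing the comparison inside a single update keeps the check and the write atomic, which the separate find-then-save approach does not guarantee.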

How to compare fields into AggregateFunction

Sorry for my English.
I need to compare the user's parameter (MonthYear, 6 characters) with the field TxtDtVts (a date with 7 or 8 characters).
If they match, it must return the fields "CodeTva" and "TauxTVA" for the desired month.
Below is my collection "tickets":
{
"_id" : ObjectId("59e66bdda00472964e6a950b"),
"Pharma" : "HEA00001",
"TxtDtVts" : 2012016, // Or 22012016 (7 or 8 characts)
"TxtHrsVts" : 842,
"NumVts" : 845613,
"NumEmp" : 19,
"NumPoste" : 127,
"PVHT" : 1.0575,
"CodeTva" : 4,
"TauxTVA" : 2.1,
"PVTTC" : 1.08,
}
Here is my endpoint and my aggregate function:
secureRoutes.route('/ticketTVA/month/:MonthYear') // Example 012016
.get(function(req, res){
var mois= req.params.month;
Ticket.aggregate([
{$project:{
TxtDtVts:1,
Correspondance: {
$let: {
vars: {
monthSubstring: { $substr: [ "$TxtDtVts", 0, -1 ] },
moisReq:{$substr: ["$mois",0,-1]},
},
in: { $cmp: [ "$$monthSubstring", "$$moisReq" ] }
}
}
}},
],function (err, result) {
if (err) {
console.log(err);
return;
}
console.log(result);
res.json(result);
});
})
I tried to use $substr to convert the data to strings, store them in vars and use $cmp to compare them.
If I do that (without a $match before it), I get an error: errmsg: 'aggregation result exceeds maximum document size (16MB)'
Moreover, it's a bad approach because the length of TxtDtVts varies (1012016 or 10012016).
How can I compare these two values and, if they match, return "CodeTva" and "TauxTVA"?
Thank you in advance.
Try this on for size: make your input an int, NOT a string. It is easy to convert your input. Then:
db.foo.aggregate([
{ $addFields: {
"rc": {$eq: [{$divide:[{ "$subtract": [ "$date", input ] },1000000]},
{$trunc: {$divide:[{ "$subtract": [ "$date", input ] },1000000]}} ]}
}}
,{ $match: {"rc":true}}
]);
The idea is to take a date like 8022013 (Feb 8, 2013) or 25121970 (Dec 25, 1970) and subtract off the MMYYYY component. So an input of 22013 (the int version of the string "022013"; note how the leading zero drops off) subtracted from 8022013 yields 8000000. Dividing by 1000000 gives the same number (8) whether or not the result is truncated to an integer. If the DB date was 8032013, the diff is 8010000. This yields 8.01 untruncated and 8 truncated, which are not equal.
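I found a solution:
// Build every possible day + MMYYYY value for the requested month, e.g. 1012016 ... 31012016
var jrsDuMois = [];
for (var i = 1; i <= 31; i++) {
jrsDuMois.push(parseInt(i + "" + mois, 10));
}
Then I compare the field TxtDtVts with my array "jrsDuMois":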
{ "$match": {
"TxtDtVts": { "$in": jrsDuMois },
}},
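Putting the pieces of this solution together, a complete pipeline might look like the sketch below. The helper names daysOfMonth and buildTvaPipeline are hypothetical, and the $project stage is an assumption about which fields should come back, based on the question.

```javascript
// Hypothetical helper: "012016" -> [1012016, 2012016, ..., 31012016]
function daysOfMonth(monthYear) {
  const days = [];
  for (let day = 1; day <= 31; day++) {
    days.push(parseInt(`${day}${monthYear}`, 10));
  }
  return days;
}

// Match the tickets of the requested month and keep only the VAT fields.
function buildTvaPipeline(monthYear) {
  return [
    { $match: { TxtDtVts: { $in: daysOfMonth(monthYear) } } },
    { $project: { _id: 0, TxtDtVts: 1, CodeTva: 1, TauxTVA: 1 } },
  ];
}

const tvaPipeline = buildTvaPipeline("012016");
console.log(tvaPipeline[0].$match.TxtDtVts.$in.length);

// In the route handler (not run here):
//   Ticket.aggregate(buildTvaPipeline(req.params.MonthYear))
```

Placing $match first means only the matching tickets reach $project, which also keeps the result well below the 16MB limit mentioned in the question.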