My data :
[
{ total: 7421356 },
{ total: 79421356 },
{ total: 105457854 },
{ total: 1054578540 },
{ total: 10545785400 },
]
I would like to have something like :
[
{ val: 7000000, count: 1 },
{ val: 70000000, count: 1 },
{ val: 100000000, count: 1 },
{ val: 1000000000, count: 1 }
]
Currently I use this pipeline:
{
$addFields: {
length: {
$multiply: [
{
$add: [
{
$strLenCP: {
$toString: "$val",
}
},
-1
]
},
-1
]
},
},
},
{
$project: {
value: {
$trunc: ["$val", "$length"],
},
_id: 0,
}
},
{
$group: {
_id: "$value",
count: {
$sum: 1
}
}
},
{
$project: {
value: "$_id",
count: 1,
_id: 0,
}
},
{
$sort: {
value: 1
}
}
The problem appears with values such as 10545785400.
The computed length seems to be too large, and for the value 7421356 the result is now 0.
I thought documents went through the pipeline individually, but that does not appear to be the case: my first document seems to use the length of my last one.
I hope someone can help, even if my explanation is not very clear.
EDIT: It seems to be a type problem. Values greater than 1,000,000,000 are stored as double, not int32.
EDIT 2: It works with 24760000000 but not with 25661674539, and I really don't understand why. Both are stored as double.
tl;dr
Add $toLong: "$total" before converting the value to a string.
$toString renders the doubles in scientific notation, and that notation is what gets measured for the string length, which is 9 for most or all of the documents. As you said, those numbers are stored as doubles.
Here is a working example that converts the number to a long before converting it to a string and truncating:
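db.collection.aggregate([
  {
    $addFields: {
      // Negative place for $trunc, derived from the digit count of the long value
      length: {
        $multiply: [
          {
            $add: [
              { $strLenCP: { $toString: { $toLong: "$total" } } },
              -2
            ]
          },
          -1
        ]
      }
    }
  },
  {
    $project: {
      // Truncate the original value and store the result as a long
      value: { $toLong: { $trunc: ["$total", "$length"] } },
      _id: 0
    }
  }
])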
Detail
You can't convert those big numbers with $toInt because they do not fit into a 32-bit integer, whose maximum is 2,147,483,647. $toLong produces a 64-bit integer, which is large enough for these values.
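The arithmetic behind the pipeline can be checked outside MongoDB. The following is a minimal sketch in plain JavaScript, not the answer's pipeline itself: `truncateToLeadingDigit` and `countByLeadingDigit` are hypothetical helper names, the input is assumed to be non-negative and below 1e21, and the offset is -1 as in the question's pipeline, so that only the leading digit is kept.

```javascript
// Hypothetical helper: keep only the leading digit of a non-negative number.
// Assumes 0 <= total < 1e21 so that String() does not use exponent notation.
function truncateToLeadingDigit(total) {
  // Digit count of the integer part, analogous to
  // $strLenCP of $toString of $toLong in the pipeline.
  const digits = String(Math.trunc(total)).length;
  // Place value of the leading digit: 10^(digits - 1).
  const factor = 10 ** (digits - 1);
  return Math.trunc(total / factor) * factor;
}

// Count documents per truncated value, like the $group and $sort stages.
function countByLeadingDigit(docs) {
  const counts = new Map();
  for (const { total } of docs) {
    const val = truncateToLeadingDigit(total);
    counts.set(val, (counts.get(val) || 0) + 1);
  }
  return [...counts]
    .map(([val, count]) => ({ val, count }))
    .sort((a, b) => a.val - b.val);
}

// Sample data from the question.
const sample = [
  { total: 7421356 },
  { total: 79421356 },
  { total: 105457854 },
  { total: 1054578540 },
  { total: 10545785400 },
];

const buckets = countByLeadingDigit(sample);
console.log(buckets);
```

Because the digit count is taken from the integer form of the number, the result does not depend on how a double would be printed.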
Related
I am facing a problem in MongoDB. Suppose, I have the following collection.
{ id: 1, issueDate: "07/05/2021", code: "31" },
{ id: 2, issueDate: "12/11/2020", code: "14" },
{ id: 3, issueDate: "02/11/2021", code: "98" },
{ id: 4, issueDate: "01/02/2021", code: "14" },
{ id: 5, issueDate: "06/23/2020", code: "14" },
{ id: 6, issueDate: "07/01/2020", code: "31" },
{ id: 7, issueDate: "07/05/2022", code: "14" },
{ id: 8, issueDate: "07/02/2022", code: "20" },
{ id: 9, issueDate: "07/02/2022", code: "14" }
The date field is in the format MM/DD/YYYY. My goal is to get the count of items for each season: spring (March-May), summer (June-August), autumn (September-November) and winter (December-February).
The result I'm expecting is:
count of fields for each season:
{ "_id" : "Summer", "count" : 6 }
{ "_id" : "Winter", "count" : 3 }
top 2 codes (first and second most recurring) per season:
{ "_id" : "Summer", "codes" : {14, 31} }
{ "_id" : "Winter", "codes" : {14, 98} }
How can this be done?
You should never store date/time values as strings; always store proper Date objects.
You can use the $setWindowFields operator for that:
db.collection.aggregate([
// Convert string into Date
{ $set: { issueDate: { $dateFromString: { dateString: "$issueDate", format: "%m/%d/%Y" } } } },
// Determine the season (0..3)
{
$set: {
season: { $mod: [{ $toInt: { $divide: [{ $add: [{ $subtract: [{ $month: "$issueDate" }, 1] }, 1] }, 3] } }, 4] }
}
},
// Count codes per season
{
$group: {
_id: { season: "$season", code: "$code" },
count: { $count: {} },
}
},
// Rank occurrence of codes per season
{
$setWindowFields: {
partitionBy: "$_id.season",
sortBy: { count: -1 },
output: {
rank: { $denseRank: {} },
count: { $sum: "$count" }
}
}
},
// Get only top 2 ranks
{ $match: { rank: { $lte: 2 } } },
// Final grouping
{
$group: {
_id: "$_id.season",
count: { $first: "$count" },
codes: { $push: "$_id.code" }
}
},
// Some cosmetic for output
{
$set: {
season: {
$switch: {
branches: [
{ case: { $eq: ["$_id", 0] }, then: 'Winter' },
{ case: { $eq: ["$_id", 1] }, then: 'Spring' },
{ case: { $eq: ["$_id", 2] }, then: 'Summer' },
{ case: { $eq: ["$_id", 3] }, then: 'Autumn' },
]
}
}
}
}
])
I will give you some clues:
Use $group with _id set to the $month of issueDate and the $sum accumulator to get a count per month.
You can divide the month by 3 and truncate the result, using $divide and $toInt, then map the result to a season using $cond.
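The month-to-season arithmetic from these clues can be sketched in plain JavaScript. `SEASONS` and `seasonOfMonth` are hypothetical names used only for illustration; the commented expression shows how the same calculation might look inside a pipeline, assuming issueDate has already been converted to a Date.

```javascript
// Equivalent aggregation expression (sketch, assumes issueDate is a Date):
// { $mod: [{ $toInt: { $divide: [{ $month: "$issueDate" }, 3] } }, 4] }

// Index 0..3 produced by trunc(month / 3) mod 4.
const SEASONS = ["Winter", "Spring", "Summer", "Autumn"];

// month is 1..12; December wraps around to index 0 (Winter).
function seasonOfMonth(month) {
  return SEASONS[Math.trunc(month / 3) % 4];
}

for (let month = 1; month <= 12; month++) {
  console.log(month, seasonOfMonth(month));
}
```

Dividing by 3 groups March to May, June to August and September to November into blocks 1, 2 and 3, while the modulo folds December (block 4) back onto January and February (block 0).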
Another option:
db.collection.aggregate([
{
$addFields: {
"season": {
$switch: {
branches: [
{
case: {
$in: [
{
$substr: [
"$issueDate",
0,
2
]
},
[
"06",
"07",
"08"
]
]
},
then: "Summer"
},
{
case: {
$in: [
{
$substr: [
"$issueDate",
0,
2
]
},
[
"03",
"04",
"05"
]
]
},
then: "Spring"
},
{
case: {
$in: [
{
$substr: [
"$issueDate",
0,
2
]
},
[
"12",
"01",
"02"
]
]
},
then: "Winter"
},
{
case: {
$in: [
{
$substr: [
"$issueDate",
0,
2
]
},
[
"09",
"10",
"11"
]
]
},
then: "Autumn"
}
],
default: "No date found."
}
}
}
},
{
$group: {
_id: {
s: "$season",
c: "$code"
},
cnt1: {
$sum: 1
}
}
},
{
$sort: {
cnt1: -1
}
},
{
$group: {
_id: "$_id.s",
codes: {
$push: "$_id.c"
},
cnt: {
$sum: "$cnt1"
}
}
},
{
$project: {
_id: 0,
season: "$_id",
count: "$cnt",
codes: {
"$slice": [
"$codes",
2
]
}
}
}
])
Explained:
Add a season field with $switch, based on the month extracted from the issueDate string.
Group by season and code to count the occurrences of each code per season.
Sort by that count in descending order.
Group by season to build an array of codes, most frequent first, and to sum the per-code counts into the season total.
Project the fields into the desired output and $slice the codes array to keep only the two most frequent codes.
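The steps above can be mirrored in plain JavaScript to check the expected result against the sample data. This is only a sketch of the same logic, not a replacement for the pipeline; `seasonFromDateString` and `topCodesPerSeason` are hypothetical helper names.

```javascript
// Sample documents from the question (only the fields that matter here).
const docs = [
  { issueDate: "07/05/2021", code: "31" },
  { issueDate: "12/11/2020", code: "14" },
  { issueDate: "02/11/2021", code: "98" },
  { issueDate: "01/02/2021", code: "14" },
  { issueDate: "06/23/2020", code: "14" },
  { issueDate: "07/01/2020", code: "31" },
  { issueDate: "07/05/2022", code: "14" },
  { issueDate: "07/02/2022", code: "20" },
  { issueDate: "07/02/2022", code: "14" },
];

// Step 1: derive the season from the MM part of an MM/DD/YYYY string.
function seasonFromDateString(issueDate) {
  const month = Number(issueDate.slice(0, 2));
  if (month >= 3 && month <= 5) return "Spring";
  if (month >= 6 && month <= 8) return "Summer";
  if (month >= 9 && month <= 11) return "Autumn";
  return "Winter";
}

// Steps 2-5: count per season/code, sort descending, keep the top codes.
function topCodesPerSeason(documents, limit = 2) {
  const perSeason = new Map(); // season -> Map(code -> count)
  for (const { issueDate, code } of documents) {
    const season = seasonFromDateString(issueDate);
    if (!perSeason.has(season)) perSeason.set(season, new Map());
    const codes = perSeason.get(season);
    codes.set(code, (codes.get(code) || 0) + 1);
  }
  return [...perSeason].map(([season, codes]) => {
    const ranked = [...codes].sort((a, b) => b[1] - a[1]);
    return {
      season,
      count: ranked.reduce((sum, [, n]) => sum + n, 0),
      codes: ranked.slice(0, limit).map(([code]) => code),
    };
  });
}

const result = topCodesPerSeason(docs);
console.log(result);
```

Codes with equal counts keep their first-seen order here; the pipeline makes no such promise, so ties may need an extra sort key.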
Comment:
Indeed keeping dates in string is not a good idea in general ...
Let's say I have this data:
{"Plane":"5546","Time":"55.0", City:"LA"}
{"Plane":"5548","Time":"25.0", City:"CA"}
{"Plane":"5546","Time":"6.0", City:"LA"}
{"Plane":"5548","Time":"5.0", City:"CA"}
{"Plane":"5555","Time":"15.0", City:"XA"}
{"Plane":"5555","Time":"8.0", City:"XA"}
There is more data; this is just a sample.
I want to group by plane and city and calculate over the times. This is the expected output:
{"_id":["5546","LA"],"Sum":2,"LateRate":1,"Prob":0.5}
Sum is the sum over all records, Late is the sum over the records with Time > 15, and Prob is Late/Sum.
This is the code I have tried, but it is still missing something:
db.Collection.aggregate([
{
$project: {
Sum: 1,
Late: {
$cond: [{ $gt: ["$Time", 15.0] }, 1, 0]
},
prob:1
}
},
{
$group:{
_id:{Plane:"$Plane", City:"$City"},
Sum: {$sum:1},
Late: {$sum: "$Late"}
}
},
{
$addFields: {
prob: {
"$divide": [
"$Late",
"$Sum"
]
}
}
},
])
db.collection.aggregate([
{
$project: {
Time: 1,
Late: {
$cond: [
{
$gt: [
{
$toDouble: "$Time"
},
15.0
]
},
"$Time",
0
]
},
prob: 1,
Plane: 1,
City: 1
}
},
{
$group: {
_id: {
Plane: "$Plane",
City: "$City"
},
Sum: {
$sum: {
"$toDouble": "$Time"
}
},
Late: {
$sum: {
$toDouble: "$Late"
}
}
}
},
{
$addFields: {
prob: {
"$divide": [
"$Late",
"$Sum"
]
}
}
}
])
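$project limits the fields passed to the next stage, so Plane, City and Time have to be included explicitly.
Relational and arithmetic operators do not treat strings as numbers, so Time is converted with $toDouble first.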
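The expected output in the question (Sum 2, Late 1, Prob 0.5 for plane 5546) suggests that counts may be wanted rather than summed times. Under that assumption, a pipeline could look like the sketch below; the `lateStats` function is a hypothetical plain JavaScript mirror of the same logic, included only so the numbers can be checked without a database.

```javascript
// Pipeline sketch, assuming Sum and Late are meant to be counts.
const pipeline = [
  {
    $group: {
      _id: { Plane: "$Plane", City: "$City" },
      Sum: { $sum: 1 },
      // Count a record as late when its numeric Time exceeds 15.
      Late: { $sum: { $cond: [{ $gt: [{ $toDouble: "$Time" }, 15] }, 1, 0] } },
    },
  },
  { $addFields: { Prob: { $divide: ["$Late", "$Sum"] } } },
];

// Sample data from the question.
const flights = [
  { Plane: "5546", Time: "55.0", City: "LA" },
  { Plane: "5548", Time: "25.0", City: "CA" },
  { Plane: "5546", Time: "6.0", City: "LA" },
  { Plane: "5548", Time: "5.0", City: "CA" },
  { Plane: "5555", Time: "15.0", City: "XA" },
  { Plane: "5555", Time: "8.0", City: "XA" },
];

// Plain JavaScript mirror of the pipeline above (hypothetical helper).
function lateStats(documents, threshold = 15) {
  const groups = new Map(); // "Plane|City" -> accumulated counts
  for (const { Plane, City, Time } of documents) {
    const key = `${Plane}|${City}`;
    const g = groups.get(key) || { _id: { Plane, City }, Sum: 0, Late: 0 };
    g.Sum += 1;
    if (Number(Time) > threshold) g.Late += 1; // convert the string first
    groups.set(key, g);
  }
  return [...groups.values()].map((g) => ({ ...g, Prob: g.Late / g.Sum }));
}

const stats = lateStats(flights);
console.log(stats);
```

If summed times are wanted instead, as in the answer's pipeline, replace the two increments with the converted Time value.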
My company has inserted numerical values for certain keys in string format. They can't be converted to integer format for some business reason.
Now coming to the query...
I am writing a mongo aggregate query which calculates annual cost for a particular manufacturer like Unilever across shops. It seems I cannot convert a string to integer inside the $cond and $eq blocks using $toInt method.
Please find below the sample collection.
[
{
_id: "ddfdfdfdggfgfgsg",
rate: "3323",
quantity_packs: "343",
shop_name: "Whole Foods",
manufacturer_name: "Unilever"
},
{
_id: "ddfdfdfsdsds",
rate: "434",
quantity_packs: "453",
shop_name: "Carrefour",
manufacturer_name: "Unilever"
},
{
_id: "dfdfdgcvgfgfvvv",
rate: "343",
quantity_packs: "23",
shop_name: "Target",
manufacturer_name: "Beirsdorf"
}
]
The query is
db.collection.aggregate([
{
$match: {
manufacturer_name: {
$in: [ "Unilever" ]
}
}
},
{
$group: {
_id: {
"Shop Name": "$shop_name"
},
"annual_cost": {
$sum: {
$cond: [
{
$eq: ["manufacturer_name", "Unilever"]
},
{ "$toInt": "$rate"},
0
]
}
},
"other_annual_cost": {
$sum: {
$cond: [
{
$ne: [$manufacturer_name, "Unilever"]
}, {"$toInt" : "$rate"},
0
]
}
},
"annual_qty": {
$sum: {
"$toInt": "$quantity_packs"
}
},
}
},
{
$project: {
"Purchase_Cost": {
$multiply: [ "$annual_cost", "$annual_qty" ]
},
"Other Manu Pur Cost": {
$multiply: ["$other_annual_cost", "$annual_qty"]
}
}
}
])
Current Output
[
{
_id: { 'Shop Name': 'Whole Foods' },
Purchase_Cost: 0
}
]
As $rate is of string type, the multiplication yields 0, as shown above. Ideally the result should show an integer value for the purchase cost, as shown below.
Intended Output
[
{
_id: { 'Shop Name': 'Whole Foods' },
Purchase_Cost: 234
}
]
Any suggestion would be of great help. I want to make this query work somehow.
I have updated the question based on Rajdeep's Answer.
I corrected the following part; please take a look:
"annual_cost": {
$sum: {
$cond: [
{
$eq: [
"$manufacturer_name", //added $
"Unilever"
]
},
{
$toInt: "$rate" //added $toInt
},
0
]
}
}
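To see what the corrected pipeline should produce for the sample collection, the same calculation can be sketched in plain JavaScript. `purchaseCostByShop` is a hypothetical helper, and `parseInt` stands in for $toInt on the string fields.

```javascript
// Sample collection from the question (without _id).
const purchases = [
  { rate: "3323", quantity_packs: "343", shop_name: "Whole Foods", manufacturer_name: "Unilever" },
  { rate: "434", quantity_packs: "453", shop_name: "Carrefour", manufacturer_name: "Unilever" },
  { rate: "343", quantity_packs: "23", shop_name: "Target", manufacturer_name: "Beirsdorf" },
];

// Mirrors $match, $group by shop, and $multiply of the two sums.
function purchaseCostByShop(documents, manufacturer) {
  const totals = new Map(); // shop name -> { annual_cost, annual_qty }
  for (const doc of documents) {
    if (doc.manufacturer_name !== manufacturer) continue; // $match
    const t = totals.get(doc.shop_name) || { annual_cost: 0, annual_qty: 0 };
    t.annual_cost += parseInt(doc.rate, 10); // like $toInt: "$rate"
    t.annual_qty += parseInt(doc.quantity_packs, 10); // like $toInt: "$quantity_packs"
    totals.set(doc.shop_name, t);
  }
  return [...totals].map(([shop, t]) => ({
    _id: { "Shop Name": shop },
    Purchase_Cost: t.annual_cost * t.annual_qty,
  }));
}

const costs = purchaseCostByShop(purchases, "Unilever");
console.log(costs);
```

With the field path written as "$manufacturer_name", the comparison looks at each document's value; without the $ it compares two constant strings, which never match, so every rate is replaced by 0.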
I need to match one of two fields that must not be equal to zero. How to implement it?
I tried these solutions, but with no luck:
Solution 1:
Model.aggregate([
  {
    $project: {
      accountID: "$_id.accountID",
      locationID: "$_id.locationID",
      time: "$_id.time",
      value: "$value",
      actualValue: "$actualValue",
      total: { $add: ["$value", "$actualValue"] },
    },
  },
  {
    $match: {
      total: { $ne: 0 },
    },
  },
])
This solution gives the wrong result when a negative value is added to its positive counterpart. For example, -1500 + 1500 becomes zero, so the document is filtered out.
Solution 2
Model.aggregate([
{
$group: {
_id: {
accountID: "$accountID",
locationID: "$locationID",
time: "$time",
},
value: { $sum: "$values.val" },
actualValue: { $sum: "$values.actualVal" },
},
},
{
$addFields: {
absVal: { $abs: "$value" },
absActualVal: { $abs: "$actualValue" },
},
},
{
$project: {
accountID: "$_id.accountID",
locationID: "$_id.locationID",
time: "$_id.time",
value: "$value",
actualValue: "$actualValue",
total: { $add: ["$absVal", "$absActualVal"] },
},
},
{
$match: {
total: { $ne: 0 },
},
},
])
It works, but the query became one second slower (from 3.5 s to 4.5 s) when searching 1 million documents.
Any suggestions? Thanks in advance.
Some basic boolean logic should suffice, use something like:
Model.aggregate([
{
$match: {
$or: [
{
value: {$ne: 0}
},
{
actualValue: {$ne: 0}
}
]
}
},
{
$project: {
accountID: "$_id.accountID",
locationID: "$_id.locationID",
time: "$_id.time",
value: "$value",
actualValue: "$actualValue",
total: {$add: ["$value", "$actualValue"]},
},
}
])
If you care about efficiency make sure you have a compound index that covers both value and actualValue.
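The difference between the two conditions is easy to check in plain JavaScript. In this sketch the `rows` array is made-up illustration data (only the -1500/1500 pair comes from the question), with `bySum` standing for the $add-then-$ne approach and `byOr` for the $or match.

```javascript
// Hypothetical rows; the first one is the problematic case from the question.
const rows = [
  { value: -1500, actualValue: 1500 },
  { value: 0, actualValue: 0 },
  { value: 0, actualValue: 20 },
];

// Solution 1 from the question: keep rows whose sum is not zero.
const bySum = rows.filter((r) => r.value + r.actualValue !== 0);

// The $or match: keep rows where at least one field is not zero.
const byOr = rows.filter((r) => r.value !== 0 || r.actualValue !== 0);

console.log(bySum.length, byOr.length);
```

The sum-based filter drops the -1500/1500 row because the values cancel out, while the $or condition keeps it without needing $abs or any extra stage.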
My dataset :
{
"codepostal": 84000,
"siren": 520010234,
"type": "home"
},
{
"codepostal": 84000,
"siren": 0,
"type": "home"
},
{
"codepostal": 84000,
"siren": 450123003,
"type": "appt"
} ...
My pipeline (total is an integer) :
var pipeline = [
{
$match: { codepostal: 84000 }
},
{
$group: {
_id: { type: "$type" },
count: { $sum: 1 }
}
},
{
$project: {
percentage: { $multiply: ["$count", 100 / total] }
}
},
{
$sort: { _id: 1 }
}
];
Results :
[ { _id: { type: 'appt' }, percentage: 66 },
{ _id: { type: 'home' }, percentage: 34 } ]
The expected result also counts, per type, how many documents have "siren" set to 0 and how many have another number:
siren = 0 counts as part
siren != 0 counts as pro
[ { _id: { type: 'appt' }, totalPercent: 66, proPercent: 20, partPercent: 80},
{ _id: { type: 'home' }, totalPercent: 34, proPercent: 45, partPercent: 55 } ]
Thanks a lot for your help !!
You can use $cond to get 0 or 1 for pro/part documents depending on the value of the siren field. Then it's easy to calculate totals for each type of document:
[
{
$match: { codepostal: 84000 }
},
{
$group: {
_id: { type: "$type" },
count: { $sum: 1 },
countPro: { $sum: {$cond: [{$eq:["$siren",0]}, 0, 1]} },
countPart: {$sum: {$cond: [{$eq:["$siren",0]}, 1, 0]} }
}
},
{
$project: {
totalPercent: { $multiply: ["$count", 100 / total] },
proPercent: { $multiply: ["$countPro", {$divide: [100, "$count"]}] },
partPercent: { $multiply: ["$countPart", {$divide: [100, "$count"]}] }
}
},
{
$sort: { _id: 1 }
}
]
Note that I used $divide to calculate the pro/part percentages relative to the number of documents within each type group.
For your sample documents (total = 3) output will be:
[
{
"_id" : { "type" : "appt" },
"totalPercent" : 33.3333333333333,
"proPercent" : 100,
"partPercent" : 0
},
{
"_id" : { "type" : "home" },
"totalPercent" : 66.6666666666667,
"proPercent" : 50,
"partPercent" : 50
}
]
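The numbers above can be reproduced with a plain JavaScript sketch of the same grouping. `percentagesByType` is a hypothetical helper, and `total` is assumed here to be the number of documents matching the postal code, which is how it works out for the three sample documents.

```javascript
// Sample documents from the question.
const records = [
  { codepostal: 84000, siren: 520010234, type: "home" },
  { codepostal: 84000, siren: 0, type: "home" },
  { codepostal: 84000, siren: 450123003, type: "appt" },
];

// Mirrors $match, $group with $cond counters, $project and $sort.
function percentagesByType(documents, codepostal) {
  const matched = documents.filter((d) => d.codepostal === codepostal);
  const total = matched.length; // assumption: total = matched documents
  const groups = new Map(); // type -> counters
  for (const d of matched) {
    const g = groups.get(d.type) || { count: 0, countPro: 0, countPart: 0 };
    g.count += 1;
    if (d.siren === 0) g.countPart += 1; // siren = 0 counts as part
    else g.countPro += 1; // anything else counts as pro
    groups.set(d.type, g);
  }
  return [...groups]
    .map(([type, g]) => ({
      _id: { type },
      totalPercent: (g.count * 100) / total,
      proPercent: (g.countPro * 100) / g.count,
      partPercent: (g.countPart * 100) / g.count,
    }))
    .sort((a, b) => a._id.type.localeCompare(b._id.type));
}

const percentages = percentagesByType(records, 84000);
console.log(percentages);
```

The total percentage is relative to all matched documents, while the pro and part percentages are relative to the size of each type group, so the latter two always add up to 100.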