I recently started working with MongoDB for a POC. I have the JSON collection below:
db.ccpsample.insertMany([
{
"ccp_id":1,
"period":601,
"sales":100.00
},
{
"ccp_id":1,
"period":602,
"growth":2.0,
"sales":"NULL" // sales = 100.00*(1+(2.0/100)); 100.00 comes from (ccp_id:1, period:601)
},
{
"ccp_id":1,
"period":603,
"growth":3.0,
"sales":"NULL" // sales = 100.00*(1+(2.0/100))*(1+(3.0/100)); 100.00 comes from (ccp_id:1, period:601), 2.0 comes from (ccp_id:1, period:602)
},
{
"ccp_id":2,
"period":601,
"sales":200.00
},
{
"ccp_id":2,
"period":602,
"growth":2.0,
"sales":"NULL" // sales = 200.00*(1+(2.0/100))
},
{
"ccp_id":2,
"period":603,
"growth":3.0,
"sales":"NULL" // same as above
}
])
I need to calculate the sales field wherever it is NULL, using the document with the same ccp_id and period equal to 601 as the base. I have added a comment on each such line in the collection above to show how sales should be calculated. I tried $graphLookup but had no luck. Can anyone help or suggest an approach?
You can use the aggregation below:
db.ccpsample.aggregate([
{ $sort: { ccp_id: 1, period: 1 } },
{
$group: {
_id: "$ccp_id",
items: { $push: "$$ROOT" },
baseSale: { $first: "$sales" },
growths: { $push: "$growth" }
}
},
{
$unwind: {
path: "$items",
includeArrayIndex: "index"
}
},
{
$project: {
ccp_id: "$items.ccp_id",
period: "$items.period",
growth: "$items.growth",
sales: {
$cond: {
if: { $ne: [ "$items.sales", "NULL" ] },
then: "$items.sales",
else: {
$reduce: {
input: { $slice: [ "$growths", "$index" ] },
initialValue: "$baseSale",
in: { $multiply: [ "$$value", { $add: [1, { $divide: [ "$$this", 100 ] }] } ] }
}
}
}
}
}
}
])
Basically, to calculate the value for the n-th element you need to know three things:
the sales value of the first element ($first in $group)
the array of all growths ($push in $group)
the index n, which indicates how many multiplications you have to perform
To get the index, $push all elements into one array and then use $unwind with the includeArrayIndex option, which writes each element's position in the unwound array to the field index.
The last step calculates the cumulative product. It uses $slice with the index field to decide how many growths to process. Because the 601 document has no growth field, $push adds nothing for it, so the slice is empty for 601, holds one growth for 602, two for 603, and so on.
Then $reduce processes that array and performs the multiplications based on your formula: (1 + (growth/100))
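To sanity-check the arithmetic outside the database, here is a minimal plain-JavaScript sketch of the same steps. It is not MongoDB code: the helper name fillSales is hypothetical, and the sketch assumes that $push leaves out documents that have no growth field, which is what the pipeline above relies on.

```javascript
// Plain-JavaScript mimic of the pipeline above (illustration only).
const docs = [
  { ccp_id: 1, period: 601, sales: 100.0 },
  { ccp_id: 1, period: 602, growth: 2.0, sales: "NULL" },
  { ccp_id: 1, period: 603, growth: 3.0, sales: "NULL" },
  { ccp_id: 2, period: 601, sales: 200.0 },
  { ccp_id: 2, period: 602, growth: 2.0, sales: "NULL" },
  { ccp_id: 2, period: 603, growth: 3.0, sales: "NULL" },
];

// Hypothetical helper: mirrors $sort, $group, $unwind and $reduce.
function fillSales(input) {
  // $sort by ccp_id, then period
  const sorted = [...input].sort(
    (a, b) => a.ccp_id - b.ccp_id || a.period - b.period
  );
  // $group by ccp_id, keeping the documents in order
  const groups = new Map();
  for (const doc of sorted) {
    if (!groups.has(doc.ccp_id)) groups.set(doc.ccp_id, []);
    groups.get(doc.ccp_id).push(doc);
  }
  const result = [];
  for (const items of groups.values()) {
    const baseSale = items[0].sales; // $first: "$sales"
    // Assumption: documents without growth contribute nothing to the array
    const growths = items
      .filter((d) => d.growth !== undefined)
      .map((d) => d.growth);
    items.forEach((item, index) => {
      // $cond: keep a real sales value, otherwise compute the cumulative product
      const sales =
        item.sales !== "NULL"
          ? item.sales
          : growths
              .slice(0, index) // $slice: first `index` growths
              .reduce((value, g) => value * (1 + g / 100), baseSale);
      result.push({ ccp_id: item.ccp_id, period: item.period, sales });
    });
  }
  return result;
}

const filled = fillSales(docs);
console.log(filled);
```

For ccp_id 1 this gives 100 for period 601, about 102 for 602 and about 105.06 for 603, up to floating-point rounding.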
My documents:
[{
"_id":"621c6e805961def3332bcf97",
"title":"monk plus",
"brand":"venture electronics",
"category":"earphones",
"variant":[
{
"price":1100,
"impedance":"16ohm"
},
{
"price":1600,
"impedance":"64ohm"
}],
"salesCount":185,
"buysCount":182,
"viewsCount":250
},
{
"_id":"621c6dab5961def3332bcf92",
"title":"nokia1",
"brand":"nokia",
"category":"mobile phones",
"variant":[
{
"price":10000,
"RAM":"4GB",
"ROM":"32GB"
},
{
"price":15000,
"RAM":"6GB",
"ROM":"64GB"
},
{
"price":20000,
"RAM":"8GB",
"ROM":"128GB"
}],
"salesCount":34,
"buysCount":21,
"viewsCount":80
}]
Expected output:
[{
_id:621c6e805961def3332bcf97
title:"monk plus"
brand:"venture electronics"
category:"earphones"
salesCount:185
viewsCount:250
variant:[
{
price:1100
impedance:"16ohm"
}]
}]
I have tried this aggregation method:
[{
$match: {
'variant.price': {
$gte: 0,$lte: 1100
}
}},
{
$project: {
title: 1,
brand: 1,
category: 1,
salesCount: 1,
viewsCount: 1,
variant: {
$filter: {
input: '$variant',
as: 'variant',
cond: {
$and: [
{
$gte: ['$$variant.price',0]
},
{
$lte: ['$$variant.price',1100]
}
]
}
}
}
}}]
This method returns the expected output. My question is: is there a better approach that returns the same output? Note that the expected output keeps all the listed properties of the document; only the variant array of objects should be filtered, based on the price. I am new to NoSQL databases and curious to learn from the community. Thank you in advance.
There's nothing wrong with your aggregation pipeline, and there are other ways to do it. If you just want to return matching documents, with only the first matching array element, here's another way to do it. (The .$ syntax only returns the first match unfortunately.)
db.collection.find({
// matching conditions
"variant.price": {
"$gte": 0,
"$lte": 1100
}
},
{
title: 1,
brand: 1,
category: 1,
salesCount: 1,
viewsCount: 1,
// only return first array element that matched
"variant.$": 1
})
Try it on mongoplayground.net.
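To make the "first match only" caveat concrete, here is a rough plain-JavaScript mimic (not driver code) of the two behaviours: the positional projection keeps at most one element, while $filter keeps every element in range. The helper names firstOnly and allMatches are hypothetical, and the 0 to 1600 range is chosen only so that both variants of the sample document qualify.

```javascript
// Plain-JavaScript illustration; this does not talk to MongoDB.
const doc = {
  title: "monk plus",
  variant: [
    { price: 1100, impedance: "16ohm" },
    { price: 1600, impedance: "64ohm" },
  ],
};

// Predicate for "price is between min and max, inclusive"
const inRange = (min, max) => (v) => v.price >= min && v.price <= max;

// Roughly what "variant.$": 1 does: keep only the first matching element
function firstOnly(d, min, max) {
  const hit = d.variant.find(inRange(min, max));
  return hit ? [hit] : [];
}

// Roughly what $filter does: keep every matching element
function allMatches(d, min, max) {
  return d.variant.filter(inRange(min, max));
}

console.log(firstOnly(doc, 0, 1600));  // one element (price 1100)
console.log(allMatches(doc, 0, 1600)); // both elements
```

With the original 0 to 1100 range the two approaches return the same single variant for this document, which is why either works for the expected output in the question.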
Or, if you want to use an aggregation pipeline and return all matching documents in entirety except for the filtered array, you could just "overwrite" the array with the elements you want using "$set" (or its alias "$addFields"). Doing this means you won't need to "$project" anything.
db.collection.aggregate([
{
"$match": {
"variant.price": {
"$gte": 0,
"$lte": 1100
}
}
},
{
"$set": {
"variant": {
"$filter": {
"input": "$variant",
"as": "variant",
"cond": {
"$and": [
{ "$gte": [ "$$variant.price", 0 ] },
{ "$lte": [ "$$variant.price", 1100 ] }
]
}
}
}
}
}
])
Try it on mongoplayground.net.
Your solution is good; just make sure to apply your $match and pagination before this step for faster queries.
Assume I have a collection with millions of documents. Below is a sample of what the documents look like:
[
{ _id:"1a1", points:[2,3,5,6] },
{ _id:"1a2", points:[2,6] },
{ _id:"1a3", points:[3,5,6] },
{ _id:"1b1", points:[1,5,6] },
{ _id:"1c1", points:[5,6] },
// ... more documents
]
I want to query a document by _id and return a document that looks like below:
{
_id:"1a1",
totalPoints: 16,
rank: 29
}
I know I can query the whole collection, sort it in descending order, then find the index of the document I want by _id and add one to get its rank. But I have concerns about this method.
If there are millions of documents, isn't this overdoing it, querying the whole collection just to get one document? Is there a way to achieve this without querying the whole collection, or does the whole collection have to be involved because of the ranking?
I cannot save them ranked because the points keep on changing. The actual code is more complex but the take away is that I cannot save them ranked.
Total points is the sum of the points in the points array. The rank is calculated by sorting all documents in descending order. The first document becomes rank 1 and so on.
An aggregation pipeline like the following can get the result you want, but how it performs on a collection of millions of documents remains to be seen.
db.collection.aggregate(
[
{
$group: {
_id: null,
docs: {
$push: { _id: '$_id', totalPoints: { $sum: '$points' } }
}
}
},
{
$unwind: '$docs'
},
{
$replaceWith: '$docs'
},
{
$sort: { totalPoints: -1 }
},
{
$group: {
_id: null,
docs: { $push: '$$ROOT' }
}
},
{
$set: {
docs: {
$map: {
input: {
$filter: {
input: '$docs',
as: 'x',
cond: { $eq: ['$$x._id', '1a3'] }
}
},
as: 'xx',
in: {
_id: '$$xx._id',
totalPoints: '$$xx.totalPoints',
rank: {
$add: [{ $indexOfArray: ['$docs._id', '1a3'] }, 1]
}
}
}
}
}
},
{
$unwind: '$docs'
},
{
$replaceWith: '$docs'
}
])
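As a small sanity check of what the pipeline computes, the same steps can be sketched in plain JavaScript on the five sample documents: sum the points, sort by total in descending order, and take the position plus one as the rank. The helper name rankOf is hypothetical, and like the pipeline it touches every document, so it says nothing about performance at scale.

```javascript
// Plain-JavaScript mimic of the ranking pipeline (illustration only).
const docs = [
  { _id: "1a1", points: [2, 3, 5, 6] },
  { _id: "1a2", points: [2, 6] },
  { _id: "1a3", points: [3, 5, 6] },
  { _id: "1b1", points: [1, 5, 6] },
  { _id: "1c1", points: [5, 6] },
];

// Hypothetical helper: returns { _id, totalPoints, rank } or null if not found.
function rankOf(input, id) {
  // totalPoints: { $sum: "$points" }
  const totals = input.map((d) => ({
    _id: d._id,
    totalPoints: d.points.reduce((sum, p) => sum + p, 0),
  }));
  // $sort: { totalPoints: -1 }
  totals.sort((a, b) => b.totalPoints - a.totalPoints);
  // $indexOfArray + 1
  const index = totals.findIndex((d) => d._id === id);
  return index === -1 ? null : { ...totals[index], rank: index + 1 };
}

console.log(rankOf(docs, "1a3")); // { _id: '1a3', totalPoints: 14, rank: 2 }
```

On this sample, "1a1" has 16 points and rank 1; the rank of 29 in the question would only appear with the full collection.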
I'm trying to get a grand total of all tags.
let topics = await ReadSchema.aggregate([{
$group: {
"_id": "$id",
count: { $size: { "$ifNull": [ "$summary.topics", [] ] } }
}
}]);
I get the error: server error MongoError: unknown group operator '$size'
Bonus points if you can remove duplicate "topics" in the total.
$size cannot be used as an accumulator operator.
Fields in $group must be computed using accumulator operators, such as $accumulator, $addToSet, $avg, $first, $last, $max, $mergeObjects, $min, $push, $stdDevPop, $stdDevSamp and $sum. For more details, refer to the $group documentation.
Wrap the $size operator in $sum:
let topics = await ReadSchema.aggregate([
{
$group: {
"_id": "$id",
count: {
$sum: {
$size: {
"$ifNull": [ "$summary.topics", [] ]
}
}
}
}
}
]);
Remove duplicate topics in the total:
$addToSet collects each document's topics array into a unique array of arrays
$reduce iterates over that array of arrays and builds the union of all topics using $setUnion
$size then gives the total count of unique topics
let topics = await ReadSchema.aggregate([
{
$group: {
_id: "$id",
topics: {
$addToSet: { $ifNull: [ "$summary.topics", [] ] }
}
}
},
{
$project: {
id: 1,
count: {
$size: {
$reduce: {
input: "$topics",
initialValue: [],
in: { $setUnion: ["$$this", "$$value"] }
}
}
}
}
}
])
Suggestions:
Match documents whose topics field is an array in the first stage of your query: $type: 4 checks that the field has the array data type. This filters documents before the $group stage, so you no longer need the $ifNull check there and can remove it.
For query optimization you can put an index on the summary.topics field.
For how indexes work, refer to the index documentation.
To create an index, refer to db.collection.createIndex.
{
$match: {
$and: [
{ "summary.topics": { $type: 4 } },
{ "summary.topics": { $ne: [] } }
]
}
}
I found a faster way to remove duplicates and count size of an embedded array document.
let topics = await ReadSchema.aggregate([
{
$project: {
_id: '0',
topics: { $ifNull: ['$summary.topics', []] },
},
},
{ $unwind: '$topics' },
{ $group: { _id: '0', topics: { $addToSet: '$topics' } } },
{
$project: {
count: { $size: '$topics' },
},
},
]);
First we create a projection with just the topics. The result is one document per input document, each holding that document's topics array. $ifNull defaults to an empty array for documents where the embedded summary.topics array is missing.
We then $unwind those arrays so that each topic becomes its own document, and $group them using $addToSet, which removes duplicates by its nature.
We then $project a new document with a count property that takes the $size of the new array (as duplicates are now removed).
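The effect of these stages can be sketched in plain JavaScript with a Set, which plays the role of $addToSet. The sample documents and the helper name countUniqueTopics are made up for illustration; they are not from the question.

```javascript
// Plain-JavaScript mimic of $project + $unwind + $group/$addToSet + $size.
// Hypothetical sample data: two documents share the topic "b",
// and two documents have no topics array at all.
const docs = [
  { id: 1, summary: { topics: ["a", "b"] } },
  { id: 2, summary: { topics: ["b", "c"] } },
  { id: 3, summary: {} },
  { id: 4 },
];

// Hypothetical helper: counts distinct topics across all documents.
function countUniqueTopics(input) {
  const unique = new Set(); // plays the role of $addToSet
  for (const doc of input) {
    // $ifNull: fall back to an empty array when summary.topics is missing
    const topics = (doc.summary && doc.summary.topics) || [];
    // $unwind: visit each topic individually
    for (const topic of topics) unique.add(topic);
  }
  return unique.size; // $size of the de-duplicated array
}

console.log(countUniqueTopics(docs)); // 3
```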
I have these documents in my collection
{
id:1,
small:[{k:'A',v:1},{k:'B',v:2},{k:'D',v:3}],
big:[{k:'A',v:2},{k:'B',v:3},{k:'C',v:1},{k:'D',v:4}]
},
{
id:2,
small:[{k:'A',v:1},{k:'B',v:2},{k:'D',v:3}],
big:[{k:'A',v:2},{k:'B',v:3},{k:'C',v:1},{k:'D',v:4}]
},
{
id:3,
small:[{k:'A',v:1},{k:'B',v:2},{k:'D',v:3}],
big:[{k:'A',v:2},{k:'B',v:3},{k:'C',v:1},{k:'D',v:4}]
}
Now, I want to get the sum for each key in both lists. I want my output to look like this:
{k:'A',small:3, big:6},
{k:'B',small:6, big:9},
{k:'D',small:9, big:12}
Notice that the output does not contain the key 'C'. This is because I only want to output the keys that exist in the 'small' list. What MongoDB functions should I use for this?
Thanks!
Try below aggregation:
db.col.aggregate([
{ $unwind: "$small" },
{ $unwind: "$big" },
{ $redact: {
$cond: {
if: { $eq: [ "$small.k", "$big.k" ] },
then: "$$KEEP",
else: "$$PRUNE"
}
}
},
{
$group: { _id: "$small.k", small: { $sum: "$small.v" }, big: { $sum: "$big.v" } }
},
{
$sort: { "_id": 1 }
}
])
In general we need exactly one small and one big element in each document (hence the double $unwind). Then we keep only the documents where the keys are equal. That is where C is filtered out, since it has no pair in small; we use $redact for that. The aggregation itself is just a $group with $sum.
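The same pairing logic can be checked in plain JavaScript against the expected output from the question. This is only a sketch: the helper name sumByKey is hypothetical, and it reports the key as k (as in the desired output) where the pipeline reports it as _id.

```javascript
// Plain-JavaScript mimic of double $unwind + $redact + $group + $sort.
const small = [{ k: "A", v: 1 }, { k: "B", v: 2 }, { k: "D", v: 3 }];
const big = [{ k: "A", v: 2 }, { k: "B", v: 3 }, { k: "C", v: 1 }, { k: "D", v: 4 }];
const docs = [1, 2, 3].map((id) => ({ id, small, big }));

// Hypothetical helper: sums small.v and big.v per key present in both lists.
function sumByKey(input) {
  const totals = new Map();
  for (const doc of input) {
    for (const s of doc.small) {      // $unwind: "$small"
      for (const b of doc.big) {      // $unwind: "$big"
        if (s.k !== b.k) continue;    // $redact: prune pairs with different keys
        const t = totals.get(s.k) || { k: s.k, small: 0, big: 0 };
        t.small += s.v;               // $sum: "$small.v"
        t.big += b.v;                 // $sum: "$big.v"
        totals.set(s.k, t);
      }
    }
  }
  // $sort by key
  return [...totals.values()].sort((a, b) => a.k.localeCompare(b.k));
}

const sums = sumByKey(docs);
console.log(sums);
// [ { k: 'A', small: 3, big: 6 },
//   { k: 'B', small: 6, big: 9 },
//   { k: 'D', small: 9, big: 12 } ]
```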
I am quite new to MongoDB. Hopefully I am using the correct terminology to express my problem.
I have the following collection:
Data collection
{
"name":"ABC",
"resourceId":"i-1234",
"volumeId":"v-1234",
"data":"11/6/2013 12AM",
"cost": 0.5
},
{
"name":"ABC",
"resourceId":"v-1234",
"volumeId":"",
"data":"11/6/2013 2AM",
"cost": 1.5
}
I want to query the collection such that if the volumeId of one entry matches another entry's resourceId, the costs of the two entries are summed.
As a result, the cost would be 2.0 in this case.
Basically I want to match the volumeId of one entry to the resourceId of another entry and sum the costs if matched.
I hope I have explained my problem properly. Any help is appreciated. Thanks
Try this aggregation query:
db.col.aggregate([
{
$project: {
resourceId: 1,
volumeId: 1,
cost: 1,
match: {
$cond: [
{$eq: ["$volumeId", ""]},
"$resourceId",
"$volumeId"
]
}
}
},
{
$group: {
_id: '$match',
cost: {$sum: '$cost'},
resId: {
$addToSet: {
$cond: [
{$eq: ['$match', '$resourceId']},
null,
'$resourceId'
]
}
}
}
},
{$unwind: '$resId'},
{$match: {
resId: {
$ne: null
}
}
},
{
$project: {
resourseId: '$resId',
cost: 1,
_id: 0
}
}
])
And you will get the following:
{ "cost" : 2, "resourseId" : "i-1234" }
This is assuming the statement I wrote in the comment is true.
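For reference, the grouping logic of the pipeline can be sketched in plain JavaScript on the two sample documents. The helper name costByResource is hypothetical, and the output field keeps the spelling resourseId used by the pipeline's final $project so the results are comparable.

```javascript
// Plain-JavaScript mimic of the $project/$group/$unwind/$match pipeline.
const docs = [
  { name: "ABC", resourceId: "i-1234", volumeId: "v-1234", cost: 0.5 },
  { name: "ABC", resourceId: "v-1234", volumeId: "", cost: 1.5 },
];

// Hypothetical helper: groups entries by volume and sums their costs.
function costByResource(input) {
  const groups = new Map();
  for (const doc of input) {
    // match: use volumeId, or resourceId when volumeId is empty
    const key = doc.volumeId === "" ? doc.resourceId : doc.volumeId;
    const group = groups.get(key) || { cost: 0, resIds: new Set() };
    group.cost += doc.cost; // $sum: "$cost"
    // $addToSet of resourceId, skipping the volume's own entry
    if (key !== doc.resourceId) group.resIds.add(doc.resourceId);
    groups.set(key, group);
  }
  const out = [];
  for (const group of groups.values()) {
    // $unwind + $match: one row per owning resource, none for null
    for (const resId of group.resIds) {
      out.push({ cost: group.cost, resourseId: resId });
    }
  }
  return out;
}

const costs = costByResource(docs);
console.log(costs); // [ { cost: 2, resourseId: 'i-1234' } ]
```

Note that, as in the pipeline, a volume entry that no resource points to produces no output row in this sketch.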