MongoDB: combining $group aggregation and $strLenBytes

I've a Mongo collection with documents like this:
{
"_id" : ObjectId("5a9d0d44c3a1ce5f14c6940a"),
"topic_id" : "5a7af30613b79405643e7da1",
"value" : "VMware Virtual Platform",
"timestamp" : "2018-03-05 09:26:25.136546",
"insert_ts" : "2018-03-05 09:26:25.136682",
"inserted_by" : 1
},
{
"_id" : ObjectId("5a9d0d44c3a1ce5f14c69409"),
"topic_id" : "5a7af30713b79479f82b4b84",
"value" : "VMware, Inc.",
"timestamp" : "2018-03-05 09:26:25.118931",
"insert_ts" : "2018-03-05 09:26:25.119081",
"inserted_by" : 1
},
{
"_id" : ObjectId("5a9d0d44c3a1ce5f14c69408"),
"topic_id" : "5a7af30713b7946d6d0a8772",
"value" : "Phoenix Technologies LTD 6.00 09/21/2015",
"timestamp" : "2018-03-05 09:26:25.101624",
"insert_ts" : "2018-03-05 09:26:25.101972",
"inserted_by" : 1
}
I would like to fetch some aggregated data from this collection: the oldest timestamp, the document count and the total string length of all values, grouped by topic_id, for documents whose _id is greater than x.
In MySQL, I would write a query like this:
SELECT
MAX(_id) as max_id,
COUNT(*) as message_count,
MIN(timestamp) as min_timestamp,
LENGTH(GROUP_CONCAT(value)) as size
FROM `dev_topic_data_numeric`
WHERE _id > 22000
GROUP BY topic_id
How do I achieve this in MongoDB? This is what I have tried so far:
db.getCollection('topic_data_text').aggregate(
[
{
"$match":
{
"_id": {"$gte": ObjectId("5a9d0aefc3a1ce5f14c68c81") }
}
},
{
"$group":
{
"_id": "$topic_id",
"max_id": {"$max":"$_id"},
"min_timestamp": {"$min": "$timestamp"},
"message_count": {"$sum": 1},
/*"size": {"$strLenBytes": "$value" }*/
}
}
]
);
When I uncomment the $strLenBytes line, the query fails with an error saying that $strLenBytes is not a group operator. The MongoDB documentation does not help me here. How do I have to write it to get the string length?
My expected result should look like this:
{
"_id" : "5a7af30613b79405643e7da1",
"max_id" : ObjectId("5a9d0d44c3a1ce5f14c6940a"),
"min_timestamp" : "2018-03-05 09:26:25.136546",
"message_count" : 1,
"size" : 23,
}
My MongoDB version is 3.4.4.

This is because $strLenBytes is not an accumulator, unlike $sum or $max. The $group stage accumulates values, so the operator applied directly to a field in the $group stage must be an accumulator.
$strLenBytes maps one value to another, one-to-one, so it is typically used in a $project stage.
Adding a $project stage to your aggregation should give you the result you require. Note that you also need to modify the $group stage slightly to pass on the required values. Because value becomes part of the group key here, each distinct topic_id/value pair forms its own group, so this matches the expected output only when all documents of a topic share the same value:
> db.test.aggregate([
{
"$match":
{
"_id": {"$gte": ObjectId("5a9d0aefc3a1ce5f14c68c81") }
}
},
{
"$group":
{
"_id": {"topic_id": "$topic_id", value: "$value"},
"max_id": {"$max":"$_id"},
"min_timestamp": {"$min": "$timestamp"},
"message_count": {"$sum": 1}
}
},
{
"$project":
{
"_id": "$_id.topic_id",
"max_id": "$max_id",
"min_timestamp": "$min_timestamp",
"message_count": "$message_count",
size: {"$strLenBytes": "$_id.value" }
}
}
])
Output using your example documents:
{
"_id": "5a7af30613b79405643e7da1",
"max_id": ObjectId("5a9d0d44c3a1ce5f14c6940a"),
"min_timestamp": "2018-03-05 09:26:25.136546",
"message_count": 1,
"size": 23
}
{
"_id": "5a7af30713b79479f82b4b84",
"max_id": ObjectId("5a9d0d44c3a1ce5f14c69409"),
"min_timestamp": "2018-03-05 09:26:25.118931",
"message_count": 1,
"size": 12
}
{
"_id": "5a7af30713b7946d6d0a8772",
"max_id": ObjectId("5a9d0d44c3a1ce5f14c69408"),
"min_timestamp": "2018-03-05 09:26:25.101624",
"message_count": 1,
"size": 40
}

After testing @kevin-adistambha's answer and some further experimenting, I found another way to get the result I wanted. It may also perform better, but that needs more testing to confirm.
db.getCollection('topic_data_text').aggregate(
[
{
"$match":
{
"_id": {"$gt": ObjectId("5a9f9d8bd5de3ac75f8cc269") }
}
},
{
"$group":
{
"_id": "$topic_id",
"max_id": {"$max":"$_id"},
"min_timestamp": {"$min": "$timestamp"},
"message_count": {"$sum": 1},
"size": {"$sum": {"$strLenBytes": "$value"}}
}
}
]
);
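To make the semantics of nesting $strLenBytes inside $sum concrete, here is a minimal sketch in plain JavaScript (Node.js, no MongoDB connection) of what that $group stage computes. The function name groupByTopic and the second "Grüße" message are hypothetical and only serve the illustration; the $match stage and max_id are left out for brevity.

```javascript
// Plain JavaScript emulation (no MongoDB required) of this $group stage:
//   { _id: "$topic_id", min_timestamp: {$min: "$timestamp"},
//     message_count: {$sum: 1}, size: {$sum: {$strLenBytes: "$value"}} }
function groupByTopic(docs) {
  const groups = new Map();
  for (const doc of docs) {
    let group = groups.get(doc.topic_id);
    if (!group) {
      group = { _id: doc.topic_id, min_timestamp: doc.timestamp, message_count: 0, size: 0 };
      groups.set(doc.topic_id, group);
    }
    // $min: these timestamps are fixed-width strings, so plain string
    // comparison orders them chronologically.
    if (doc.timestamp < group.min_timestamp) group.min_timestamp = doc.timestamp;
    group.message_count += 1;
    // $strLenBytes counts UTF-8 encoded bytes, not characters.
    group.size += Buffer.byteLength(doc.value, "utf8");
  }
  return [...groups.values()];
}

const topicDocs = [
  { topic_id: "5a7af30613b79405643e7da1", value: "VMware Virtual Platform", timestamp: "2018-03-05 09:26:25.136546" },
  // Hypothetical second message for the same topic: 5 characters, 7 bytes.
  { topic_id: "5a7af30613b79405643e7da1", value: "Gr\u00fc\u00dfe", timestamp: "2018-03-05 09:20:00.000000" },
  { topic_id: "5a7af30713b79479f82b4b84", value: "VMware, Inc.", timestamp: "2018-03-05 09:26:25.118931" },
];

const topicStats = groupByTopic(topicDocs);
console.log(topicStats);
```

The first topic ends up with a size of 23 + 7 bytes, which shows that the per-document byte lengths are summed per group rather than taken from a single document.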

Related

MongoDB aggregation: average sales per hour

I have a collection with sales. Now I need to get the average number of sales per hour within a date range.
Up to now I have a query like this:
db.getCollection('sales').aggregate({
"$match": {
$and: [
{ "createdAt": { $gte: ISODate("2018-05-01T00:00:00.000Z") } },
{ "createdAt": { $lt: ISODate("2018-10-30T23:59:00.000Z") } },
]
}
},{
"$project": {
"h":{"$hour":"$createdAt"},
}
},{
"$group":{
"_id": "$h",
"salesPerHour": { $sum: 1 },
},
},{
"$sort": { "salesPerHour": -1 }
});
The result looks like this: {"_id" : 15, "salesPerHour" : 681.0}
How can I get the average value of salesPerHour instead of the sum?
Update 1 => Example document.
{
"_id" : "pX6jj7j4274J9xpSA",
"idFiscalSale" : "48",
"documentYear" : "2018",
"paymentType" : "cash",
"cashReceived" : 54,
"items" : [...],
"customer" : null,
"subTotal" : 23.89,
"taxTotal" : 3.7139,
"total" : 23.89,
"rewardPointsValue" : 0,
"rewardPointsEarned" : 24,
"discountValue" : 0,
"createdAt" : ISODate("2018-04-24T00:00:00.201Z")
}
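You can use the aggregation query below. The first $group counts the sales for each hour of the day, and the second $group averages those counts: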
db.sales.aggregate([
{"$match":{
"createdAt":{
"$gte":ISODate("2018-05-01T00:00:00.000Z"),
"$lt":ISODate("2018-10-30T23:59:00.000Z")
}
}},
{"$group":{
"_id":{"$hour":"$createdAt"},
"salesPerHour":{"$sum":1}
}},
{"$group":{
"_id":null,
"salesPerHour":{"$avg":"$salesPerHour"}
}}
])
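As a rough illustration of what the two $group stages compute, here is a sketch in plain JavaScript (Node.js, no MongoDB connection). The function name averageSalesPerHour and the sample dates are made up for the example. Note that the buckets are hours of the day (0 to 23) pooled across the whole date range, and that an hour without any sale forms no bucket, so it does not pull the average down.

```javascript
// Plain JavaScript emulation of the two $group stages.
function averageSalesPerHour(sales) {
  // Stage 1: { _id: {$hour: "$createdAt"}, salesPerHour: {$sum: 1} }
  const countsByHour = new Map();
  for (const sale of sales) {
    const hour = sale.createdAt.getUTCHours(); // $hour works in UTC by default
    countsByHour.set(hour, (countsByHour.get(hour) || 0) + 1);
  }
  // Stage 2: { _id: null, salesPerHour: {$avg: "$salesPerHour"} }
  const counts = [...countsByHour.values()];
  // The real pipeline returns no document at all for empty input;
  // null stands in for that case here.
  if (counts.length === 0) return null;
  return counts.reduce((sum, n) => sum + n, 0) / counts.length;
}

// Hypothetical sales: three in hour 15 (on two different days), one in hour 9.
const sampleSales = [
  { createdAt: new Date("2018-05-01T15:10:00.000Z") },
  { createdAt: new Date("2018-05-01T15:40:00.000Z") },
  { createdAt: new Date("2018-05-02T15:05:00.000Z") },
  { createdAt: new Date("2018-05-01T09:00:00.000Z") },
];

const avgSalesPerHour = averageSalesPerHour(sampleSales);
console.log(avgSalesPerHour);
```

If an average per calendar hour (per day and hour) is wanted instead, the first grouping key would need to include the date as well as the hour.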
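You can try the aggregation below.
You have to apply the $avg accumulator to the salesPerHour field, which must first be computed by a $group stage that counts the sales per hour:
db.collection.aggregate([
{ "$match": {
"$and": [
{ "createdAt": { "$gte": ISODate("2018-05-01T00:00:00.000Z") }},
{ "createdAt": { "$lt": ISODate("2018-10-30T23:59:00.000Z") }}
]
}},
// Count the sales for each hour of the day
{ "$group": {
"_id": { "$hour": "$createdAt" },
"salesPerHour": { "$sum": 1 }
}},
// Average the per-hour counts
{ "$group": {
"_id": null,
"salesPerHour": { "$avg": "$salesPerHour" }
}}
])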

Find Total based on group by of two mongo field

I have collection data like this:
{
"user_id" : "1",
"branch_id" : "1",
"total" : 100,
},
{
"user_id" : "1",
"branch_id" : "1",
"total" : 200
},
{
"user_id" : "1",
"branch_id" : "3",
"total" : 1400
},
{
"user_id" : "2",
"branch_id" : "1",
"total" : 100
},
{
"user_id" : "2",
"branch_id" : "1",
"total" : 100
},
I would like to get output in the following format:
[
{
"user_id":"1",
"branch_id":"1",
"grand_total":"300"
},
{
"user_id":"1",
"branch_id":"3",
"grand_total":"1400"
},
{
"user_id":"2",
"branch_id":"1",
"grand_total":"200"
}
]
I have tried a MongoDB aggregate query, but it returns undefined.
Basically, I need the total points each user has earned, per user and per branch.
Here is what I have tried, which does not work:
Collection.aggregate(
{
"$group": {
"_id": "$user_id",
"nameCount": { "$sum": 1 },
"branch_id": {
"$sum": {
"$cond": [ {"$branch_id":{"$ne":null}} ]
}
}
}
},
{
"$project": {
"_id": 0,
"name": "$_id",
"nameCount": 1,
"branch_id":1
}
}
);
Please help.
Your aggregation pipeline has to look like this:
{
"$group": {
"_id": {
user_id: "$user_id",
branch_id: "$branch_id"
},
"grand_total": {
"$sum": "$total"
},
}
}, {
"$project": {
"_id": 0,
"user_id": "$_id.user_id",
"branch_id": "$_id.branch_id",
"total": "$grand_total"
}
}
Inside the _id field of your "$group" stage you list the fields that you want to group your documents by. If you only want to group by one field, you can write it as follows:
{"$group": {
"_id": "$user_id"
}
}
If you want to group by multiple fields (as seems to be the case here), write it as follows:
{"$group": {
"_id": {
user_id: "$user_id",
branch_id: "$branch_id"
}
}
}
Every stage of an aggregation pipeline reshapes the documents that pass through it. So, if your $group stage stores the sum of all totals under the name "grand_total", like this:
"grand_total": {
"$sum": "$total"
}
then the field total no longer exists by the time the documents reach your $project stage. Instead, there is a new field (grand_total) that holds the sum.
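For readers who find the compound _id easier to follow as ordinary code, here is a sketch in plain JavaScript (Node.js, no MongoDB connection) of grouping by user_id and branch_id and summing total. The function name totalsByUserAndBranch is hypothetical; the input is the sample data from the question.

```javascript
// Plain JavaScript emulation of:
//   { $group: { _id: { user_id: "$user_id", branch_id: "$branch_id" },
//               grand_total: { $sum: "$total" } } }
function totalsByUserAndBranch(docs) {
  const groups = new Map();
  for (const { user_id, branch_id, total } of docs) {
    // The compound key plays the role of the compound _id.
    const key = JSON.stringify([user_id, branch_id]);
    const group = groups.get(key) || { user_id, branch_id, grand_total: 0 };
    group.grand_total += total;
    groups.set(key, group);
  }
  return [...groups.values()];
}

const pointDocs = [
  { user_id: "1", branch_id: "1", total: 100 },
  { user_id: "1", branch_id: "1", total: 200 },
  { user_id: "1", branch_id: "3", total: 1400 },
  { user_id: "2", branch_id: "1", total: 100 },
  { user_id: "2", branch_id: "1", total: 100 },
];

const branchTotals = totalsByUserAndBranch(pointDocs);
console.log(branchTotals);
```

This sketch returns groups in first-seen order. MongoDB makes no such promise for $group output, so add a $sort stage if the order matters.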

MongoDB aggregate nested array correctly

OK I am very new to Mongo, and I am already stuck.
Db has the following structure (much simplified for sure):
{
{
"_id" : ObjectId("57fdfbc12dc30a46507044ec"),
"keyterms" : [
{
"score" : "2",
"value" : "AA",
},
{
"score" : "2",
"value" : "AA",
},
{
"score" : "4",
"value" : "BB",
},
{
"score" : "3",
"value" : "CC",
}
]
},
{
"_id" : ObjectId("57fdfbc12dc30a46507044ef"),
"keyterms" : [
...
There are several objects. Each object has an array "keyterms", and each entry of this array has a score and a value. There are some duplicates though (not really, since in the real db the keyterms entries have many more fields, but with respect to value and score they are duplicates).
Now I need a query which
selects one object by id
groups its keyterms by value
counts the duplicates
sorts them by score
So I want to have something like this as the result:
// for Object 57fdfbc12dc30a46507044ec
"keyterms": [
{
"score" : "4",
"value" : "BB",
"count" : 1
},
{
"score" : "3",
"value" : "CC",
"count" : 1
},
{
"score" : "2",
"value" : "AA",
"count" : 2
}
]
In SQL I would have written something like this
select
score, value, count(*) as count
from
all_keywords_table_or_some_join
group by
value
order by
score
But, sadly enough, it's not SQL.
In Mongo I managed to write this:
db.getCollection('tests').aggregate([
{$match: {'_id': ObjectId('57fdfbc12dc30a46507044ec')}},
{$unwind: "$keyterms"},
{$sort: {"keyterms.score": -1}},
{$group: {
'_id': "$_id",
'keyterms': {$push: "$keyterms"}
}},
{$project: {
'keyterms.score': 1,
'keyterms.value': 1
}}
])
But there is something missing: the grouping of the keyterms by their value. I cannot shake the feeling that this is the wrong approach altogether. How can I select the keyterms array and then apply an aggregate function only to it? That would make this easy.
BTW I read this
(Mongo aggregate nested array)
but I can't figure it out for my example unfortunately...
You'd want an aggregation pipeline where, after you $unwind the array, you group the flattened documents by the array's value and score keys, aggregate the counts using the $sum accumulator operator and retain the main document's _id with the $first operator.
The next $group stage should then group the documents from the previous stage by that retained _id so as to preserve the original schema, and recreate the keyterms array using the $push operator.
The following demonstration attempts to explain the above aggregation operation:
db.tests.aggregate([
{ "$match": { "_id": ObjectId("57fdfbc12dc30a46507044ec") } },
{ "$unwind": "$keyterms" },
{
"$group": {
"_id": {
"value": "$keyterms.value",
"score": "$keyterms.score"
},
"doc_id": { "$first": "$_id" },
"count": { "$sum": 1 }
}
},
{ "$sort": {"_id.score": -1 } },
{
"$group": {
"_id": "$doc_id",
"keyterms": {
"$push": {
"value": "$_id.value",
"score": "$_id.score",
"count": "$count"
}
}
}
}
])
Sample Output
{
"_id" : ObjectId("57fdfbc12dc30a46507044ec"),
"keyterms" : [
{
"value" : "BB",
"score" : "4",
"count" : 1
},
{
"value" : "CC",
"score" : "3",
"count" : 1
},
{
"value" : "AA",
"score" : "2",
"count" : 2
}
]
}
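The same unwind, group, sort and regroup sequence can be sketched in plain JavaScript (Node.js, no MongoDB connection) to see what each step contributes. The function name countKeyterms is hypothetical, and _id is a plain string here instead of an ObjectId. One caveat carries over to the real pipeline: the scores are stored as strings, so they sort lexicographically, and a score of "10" would sort before "2".

```javascript
// Plain JavaScript emulation of $unwind -> $group -> $sort -> $group/$push.
function countKeyterms(doc) {
  const groups = new Map();
  for (const term of doc.keyterms) { // $unwind: one iteration per array entry
    // $group by { value, score } with count: { $sum: 1 }
    const key = JSON.stringify([term.value, term.score]);
    const group = groups.get(key) || { value: term.value, score: term.score, count: 0 };
    group.count += 1;
    groups.set(key, group);
  }
  // $sort: { "_id.score": -1 }; scores are strings, so this is a string comparison.
  const keyterms = [...groups.values()].sort((a, b) =>
    a.score < b.score ? 1 : a.score > b.score ? -1 : 0
  );
  // Second $group: rebuild a single document with the keyterms array.
  return { _id: doc._id, keyterms };
}

const keytermDoc = {
  _id: "57fdfbc12dc30a46507044ec",
  keyterms: [
    { score: "2", value: "AA" },
    { score: "2", value: "AA" },
    { score: "4", value: "BB" },
    { score: "3", value: "CC" },
  ],
};

const countedKeyterms = countKeyterms(keytermDoc);
console.log(JSON.stringify(countedKeyterms, null, 2));
```

If numeric ordering is needed, storing the scores as numbers rather than strings avoids the lexicographic comparison.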
Meanwhile, I solved it myself:
db.getCollection('tests').aggregate([
{$match: {'_id': ObjectId('57fdfbc12dc30a46507044ec')}},
{$unwind: "$keyterms"},
{$sort: {"keyterms.score": -1}},
{$group: {
'_id': "$keyterms.value",
'keyterms': {$push: "$keyterms"},
'escore': {$first: "$keyterms.score"},
'evalue': {$first: "$keyterms.value"}
}},
{$limit: 15},
{$project: {
"score": "$escore",
"value": "$evalue",
"count": {$size: "$keyterms"}
}}
])

Return specific field in aggregate

I am trying to aggregate the following data:
{
"_id" : ObjectId("527a6b7c24a8874c078b9d10"),
"Name" : "FirstName",
"Link" : "www.mylink.com/123",
"year" : 2013
}
{
"_id" : ObjectId("527a6b7c24a8874c078b9d11"),
"Name" : "FirstName",
"Link" : "www.mylink.com/124",
"year" : 2013
}
{
"_id" : ObjectId("527a6b7c24a8874c078b9d12"),
"Name" : "SecondName",
"Link" : "www.mylink.com/125",
"year" : 2013
}
I want to count the number of occurrences of each Name value, but I also want the corresponding Link field in the output of the aggregate query. At the moment I am doing it like this (which does not return the Link field in the output):
db.coll.aggregate([
{ "$match": { "Year": 2013 } },
{ "$group": {
"_id": {
"Name": "$Name"
},
"count": { "$sum": 1 }
}},
{ "$project": {
"_id": "$_id",
"count": 1
}},
{ $sort: {
count: 1
} }
])
The above returns only Name field and count. But how can I also return the corresponding Link field (could be several) in the output of aggregate query?
Best Regards
db.coll.aggregate([
{ "$match": { "year": 2013 } },
{ "$group": {"_id": "$Name", "Link": {$push: "$Link"}, "count": { "$sum": 1 }}},
{ "$project": {"Name": "$_id", _id: 0, "Link": 1, "count": 1}},
{ $sort: {count: 1} }
])
Results in:
{ "Link" : [ "www.mylink.com/125" ], "count" : 1, "Name" : "SecondName" }
{ "Link" : [ "www.mylink.com/123", "www.mylink.com/124" ], "count" : 2, "Name" : "FirstName" }
OK, so the $match was correct except for a typo: 'Year' should be 'year'.
The $group could be simplified a little. I removed an extra level of nesting so that you get _id: 'FirstName' instead of _id: { 'Name': 'FirstName' }, since we can rename _id to 'Name' in the $project stage anyway.
You needed to add $push or $addToSet to keep the Link values in your grouping. $addToSet keeps only unique values in the array, while $push adds every value, so use whichever suits you.
$project and $sort are straightforward: rename and include or exclude whichever fields you like.
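The difference between $push and $addToSet only shows up when a group contains repeated values. Here is a small sketch in plain JavaScript (Node.js, no MongoDB connection) that collects the links both ways; the function name groupLinksByName and the fourth document, which repeats a link, are hypothetical additions to the sample data.

```javascript
// Plain JavaScript emulation of grouping by Name while collecting Link
// with both $push (keeps every value) and $addToSet (keeps unique values).
function groupLinksByName(docs) {
  const groups = new Map();
  for (const { Name, Link } of docs) {
    const group = groups.get(Name) || { Name, count: 0, pushed: [], unique: new Set() };
    group.count += 1;        // count: { $sum: 1 }
    group.pushed.push(Link); // Link: { $push: "$Link" }
    group.unique.add(Link);  // Link: { $addToSet: "$Link" }
    groups.set(Name, group);
  }
  return [...groups.values()].map((g) => ({
    Name: g.Name,
    count: g.count,
    pushed: g.pushed,
    unique: [...g.unique],
  }));
}

const linkDocs = [
  { Name: "FirstName", Link: "www.mylink.com/123", year: 2013 },
  { Name: "FirstName", Link: "www.mylink.com/124", year: 2013 },
  { Name: "SecondName", Link: "www.mylink.com/125", year: 2013 },
  // Hypothetical extra document that repeats an existing link.
  { Name: "FirstName", Link: "www.mylink.com/123", year: 2013 },
];

const linksByName = groupLinksByName(linkDocs);
console.log(JSON.stringify(linksByName, null, 2));
```

A JavaScript Set keeps insertion order, whereas MongoDB does not specify the order of elements produced by $addToSet, so do not rely on it there.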

Selecting Distinct values from Array in MongoDB

I have a collection named Alpha_Num with the following structure. I am trying to find out which Alphabet-Numerals pair appears the greatest number of times.
Going by the data below, the pair abcd-123 appears twice, and so does the pair efgh-10001, but the second one is not a valid case for me because both occurrences are in the same document.
{
"_id" : 12345,
"Alphabet" : "abcd",
"Numerals" : [
"123",
"456",
"2345"
]
}
{
"_id" : 123456,
"Alphabet" : "efgh",
"Numerals" : [
"10001",
"10001",
"1002"
]
}
{
"_id" : 123456567,
"Alphabet" : "abcd",
"Numerals" : [
"123"
]
}
I tried to use the aggregation framework, something like this:
db.Alpha_Num.aggregate([
{"$unwind":"$Numerals"},
{"$group":
{"_id":{"Alpha":"$Alphabet","Num":"$Numerals"},
"count":{$sum:1}}
},
{"$sort":{"count":-1}}
])
The problem with this query is that it counts the pair efgh-10001 twice.
Question: How do I select distinct values from the array "Numerals" in this situation?
Problem solved.
db.Alpha_Num.aggregate([{
"$unwind": "$Numerals"
}, {
"$group": {
_id: {
"_id": "$_id",
"Alpha": "$Alphabet"
},
Num: {
$addToSet: "$Numerals"
}
}
}, {
"$unwind": "$Num"
}, {
"$group": {
_id: {
"Alpha": "$_id.Alpha",
"Num": "$Num"
},
count: {
"$sum": 1
}
}
}])
Grouping with $addToSet and then unwinding again did the trick. I got the answer from one of the 10gen online courses.
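The idea behind the $addToSet step, deduplicating the numerals inside each document before counting pairs across documents, can be sketched in plain JavaScript (Node.js, no MongoDB connection). The function name countPairsAcrossDocuments is hypothetical, and the final sort by count is an addition that the pipeline above does not include.

```javascript
// Plain JavaScript emulation: dedupe Numerals per document, then count
// each Alphabet-Numerals pair once per document it appears in.
function countPairsAcrossDocuments(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // Equivalent of $unwind + $group with $addToSet per document.
    for (const num of new Set(doc.Numerals)) {
      const pair = `${doc.Alphabet}-${num}`;
      counts.set(pair, (counts.get(pair) || 0) + 1);
    }
  }
  // Most frequent pair first (not part of the original pipeline).
  return [...counts.entries()]
    .map(([pair, count]) => ({ pair, count }))
    .sort((a, b) => b.count - a.count);
}

const alphaNumDocs = [
  { _id: 12345, Alphabet: "abcd", Numerals: ["123", "456", "2345"] },
  { _id: 123456, Alphabet: "efgh", Numerals: ["10001", "10001", "1002"] },
  { _id: 123456567, Alphabet: "abcd", Numerals: ["123"] },
];

const pairCounts = countPairsAcrossDocuments(alphaNumDocs);
console.log(pairCounts);
```

With the sample data, efgh-10001 is counted once because both occurrences sit in the same document, while abcd-123 is counted twice because it appears in two different documents.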