I'm trying to fix a little issue. I have tried searching around for help, but I can't find it anywhere, so I'm asking here.
I'm trying to get the most-viewed products from a visitor log. The data in my MongoDB looks like this:
{
"_id" : ObjectId("56617f12cc8eaaa6010041ab"),
"Product" : {
"UUID" : "c7b3e23e-0bf9-4fd5-b8d3-f559b80c42ed"
},
"Log" : {
"Method" : "visit",
"IP" : "127.0.0.1",
"Added" : "2015-12-04 12:54:58"
}
}
What I want is to group on the Product.UUID field, counting only logs that are no older than 1.5 months. What I have right now looks like this:
db.getCollection('log-product').aggregate([
{
"$group" : {
_id:"$Product.UUID",
total: {$sum : 1}
}
},
{"$sort" : {total: -1}},
{"$limit" : 8}
])
Here I group on Product.UUID, sort descending on the total count, and limit to 8. My problem is that I can't find a way to count how many visitors each individual product has.
I hope somebody out there can help me with this question.
You need to filter "Log.Added" by the time interval first, then pass the results to $group:
db.getCollection('log-product').aggregate([
{
"$match": {
"Log.Added": { $gt: new Date(2015,10, 15), $lt: new Date(2015,11,15) }
}
},
{
"$group" : {
_id:"$Product.UUID",
total: {$sum : 1}
}
},
{"$sort" : {total: -1}},
{"$limit" : 8}
])
You can filter by Log.Added and group by the product UUID and $Log.IP:
var currentDate = new Date();
var dateOffset = (24*60*60*1000) * 45;
var initInterval = new Date(new Date() - dateOffset);
db.getCollection('log-product').aggregate([
{ "$match" : { "Log.Added": {$lte: currentDate, $gte: initInterval}}},
{
"$group" : {
_id:{"product": "$Product.UUID", "visitor":"$Log.IP"},
total: {$sum : 1}
}
},
{"$sort" : {total: -1}},
{"$limit" : 8}
])
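If what you actually need is the number of distinct visitors per product rather than hits per (product, IP) pair, a second $group stage can collapse those pairs. A sketch building on the pipeline above (it assumes Log.Added is stored as a BSON date; in the sample document it appears as a string, which would need converting before date comparisons work):
var currentDate = new Date();
var dateOffset = (24 * 60 * 60 * 1000) * 45; // 45 days, roughly 1.5 months
var initInterval = new Date(currentDate - dateOffset);
db.getCollection('log-product').aggregate([
    // keep only log entries from the last 45 days
    { "$match": { "Log.Added": { $lte: currentDate, $gte: initInterval } } },
    // one group per (product, visitor) pair
    { "$group": { _id: { "product": "$Product.UUID", "visitor": "$Log.IP" } } },
    // count the distinct visitors for each product
    { "$group": { _id: "$_id.product", visitors: { $sum: 1 } } },
    { "$sort": { visitors: -1 } },
    { "$limit": 8 }
])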
I have the collection below and need to find duplicate records in Mongo. How can we find them? Below is one sample document; the collection has more than 10,000 records.
/* 1 */
{
"_id" : 1814099,
"eventId" : "LAS012",
"eventName" : "CustomerTab",
"timeStamp" : ISODate("2018-12-31T20:09:09.820Z"),
"eventMethod" : "click",
"resourceName" : "CustomerTab",
"targetType" : "",
"resourseUrl" : "",
"operationName" : "",
"functionStatus" : "",
"results" : "",
"pageId" : "CustomerPage",
"ban" : "290824901",
"jobId" : "87377713",
"wrid" : "87377713",
"jobType" : "IBJ7FXXS",
"Uid" : "sc343x",
"techRegion" : "W",
"mgmtReportingFunction" : "N",
"recordPublishIndicator" : "Y",
"__v" : 0
}
We can first find the unique ids using
const data = await db.collection.aggregate([
{
$group: {
_id: "$eventId",
id: {
"$first": "$_id"
}
}
},
{
$group: {
_id: null,
uniqueIds: {
$push: "$id"
}
}
}
]).toArray();
And then we can make another query, which will find all the duplicate documents
db.collection.find({_id: {$nin: data[0].uniqueIds}})
This will find all the documents that are redundant.
Another way
To find the event ids which are duplicated
db.collection.aggregate([
{"$group" : { "_id": "$eventId", "count": { "$sum": 1 } } },
{"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } }
])
To get the duplicates from the db, you only need the groups that have a count of more than one, so we can use the $match operator to filter the results. Within the $match pipeline stage, we tell it to look at the count field and keep counts greater than one, using the $gt ("greater than") operator and the number 1. This looks like the following:
db.collection.aggregate([
{$group: {
_id: {eventId: "$eventId"},
uniqueIds: {$addToSet: "$_id"},
count: {$sum: 1}
}
},
{$match: {
count: {"$gt": 1}
}
}
]);
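From those groups you can then collect the _ids you consider redundant, for example everything except the first id in each uniqueIds set. A sketch for the shell (note that $addToSet gives no guaranteed order, so which duplicate survives is arbitrary):
var duplicateIds = [];
db.collection.aggregate([
    { $group: { _id: { eventId: "$eventId" }, uniqueIds: { $addToSet: "$_id" }, count: { $sum: 1 } } },
    { $match: { count: { "$gt": 1 } } }
]).forEach(function (group) {
    // keep one document per eventId and mark the rest as duplicates
    duplicateIds = duplicateIds.concat(group.uniqueIds.slice(1));
});
// duplicateIds now holds the _ids of the redundant documents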
I assume that eventId is a unique id.
I have the following data:
{ "_id" : ObjectId("55fbffbdebdbc43337b08946"), "date" : 1442578343617,
"body" : { "entries" : [
{ "url" : "google.com/randomString", "time" : 143.832},
{ "url" : "youtube.com/randomString", "time" : 170.128},
{ "url" : "google.com/randomString", "time" : 125.428}
] } }
And I want to sum the time it takes to load the google.com web pages.
What I am trying to do is:
db.har.aggregate([
{$match: {date: 1442578343617, "body.entries.url": /google/}},
{ $unwind : "$body.log.entries"},
{ $group : {"_id" : 123,"total" : {$sum:"$body.entries.time"}}}
])
But the result I get is the total sum: { "_id" : 123, "total" : 439.388 }
How do I filter by body.entries.url?
Thank you very much for your time
Here you are unwinding the wrong array: body.log.entries.
You need to first match by the date timestamp to filter documents, then $unwind, and then match body.entries.url again, like:
db.collection.aggregate([{
$match: {
date: 1442578343617
}
}, {
"$unwind": "$body.entries"
}, {
$match: {
"body.entries.url": /google/
}
}, {
$group: {
"_id": null, //you can use any other param here
"total": {
$sum: "$body.entries.time"
}
}
}])
Filtering by url before unwinding keeps all the documents that contain a google url. But it will also keep the other urls of a document that contains google (in this case: youtube). So when you unwind you will still have those youtube urls and never filter them.
So just:
db.har.aggregate([
{$match: {date: 1442578343617}},
{$unwind : "$body.entries"},
{$match: {"body.entries.url": /google/}},
{$group: {"_id" : 123, "total" : {$sum : "$body.entries.time"}}}
])
I have many documents like this one:
{
"_id" : ObjectId("54a94200aa76d3db6cd51977"),
"URL" : "http://...",
"Statistics" : [
{
"Date" : ISODate("2010-05-18T18:07:29.000+0000"),
"Clicks" : NumberInt(250),
},
{
"Date" : ISODate("2010-05-21T12:06:41.000+0000"),
"Clicks" : NumberInt(165),
},
{
"Date" : ISODate("2010-05-30T08:37:50.000+0000"),
"Clicks" : NumberInt(263),
}
]
}
My query looks like this:
db.clicks.aggregate([
{ $match : { 'Statistics.Date' : { $gte: new Date("2010-05-18T00:00:00.000Z"), $lte: new Date("2010-05-18T23:59:59.999Z") } } },
{ $unwind : '$Statistics' },
{ $group : { _id : { year : { $year : '$Statistics.Date' }, month : { $month : '$Statistics.Date' }, day : { $dayOfMonth : '$Statistics.Date' } }, Clicks : { $sum : '$Statistics.Clicks' } } },
{ $sort : { _id : 1 } }
])
When I try to sum up the clicks from a specific date it gives me all dates, instead of only one. What am I doing wrong? Thanks in advance.
Edit 1:
As there are >80,000 documents in that collection I can't do a $unwind before the $match. Also, as far as I know, this would not be a good idea, because it would make the query slower than necessary.
The huge number of documents and the amount of data in them is the reason why I have to use $sum. The document above is just an example; only the structure matches my project.
The above query gives me back something like this:
{
"_id" : [
{
"year" : 2010,
"month" : 5,
"day" : 18
}
],
"Clicks" : 250
},
{
"_id" : [
{
"year" : 2010,
"month" : 4,
"day" : 21
}
],
"Clicks" : 165
},
{
"_id" : [
{
"year" : 2010,
"month" : 5,
"day" : 30
}
],
"Clicks" : 263
}
If I don't use $group I also have to use $limit as the query would exceed 16MB otherwise:
db.clicks.aggregate([
{ $match : { 'Statistics.Date' : { $gte: new Date("2010-05-18T00:00:00.000Z"), $lte: new Date("2010-05-18T23:59:59.999Z") } } },
{ $unwind : '$Statistics' },
{ $limit : 1 }
])
This results in:
{
"_id" : ObjectId("54a94200aa76d3db6cd51977"),
"URL" : "http://...",
"Statistics" : {
"Date" : {
"sec" : 1274166878,
"usec" : 0
},
"Clicks" : 250
}
}
For performance reasons I have to use $group; not using it is not an option.
As I have done all of this in PHP, there may be some errors in the document, queries, and results I mentioned. Hopefully this won't be a problem. I still haven't figured out what's causing my issue. Can anyone help me?
Edit 2:
As this seems to be a performance issue which can't be solved, I'm migrating all the data from the 'Statistics' array into its own collection. Thanks to everyone for your help.
You need to run your $match twice, both before and after the $unwind:
db.clicks.aggregate([
{ $match : { 'Statistics.Date' : {
$gte: new ISODate("2010-05-18T00:00:00.000Z"),
$lte: new ISODate("2010-05-18T23:59:59.999Z") } } },
{ $unwind: '$Statistics' },
{ $match : { 'Statistics.Date' : {
$gte: new ISODate("2010-05-18T00:00:00.000Z"),
$lte: new ISODate("2010-05-18T23:59:59.999Z") } } },
{ $group : {
_id : { year : { $year : '$Statistics.Date' },
month : { $month : '$Statistics.Date' },
day : { $dayOfMonth : '$Statistics.Date' } },
Clicks : { $sum : '$Statistics.Clicks' } } },
{ $sort : { _id : 1 } }
])
The first $match is used to select the documents with at least one Statistics element in the right date range. The second one is used to filter out the other Statistics elements of those docs that aren't in the right date range.
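For the single sample document shown in the question, only the 2010-05-18 entry falls inside the range, so the pipeline should return something like:
{ "_id" : { "year" : 2010, "month" : 5, "day" : 18 }, "Clicks" : 250 }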
Things may have been solved already, but I'm posting an answer for anyone who comes to this question seeking help.
{ $match : { 'Statistics.Date' : { $gte: new Date("2010-05-18T00:00:00.000Z"),
    $lte: new Date("2010-05-18T23:59:59.999Z") } } }
This match filters the main documents. What you want is to filter the sub-documents inside the Statistics array.
The documents that pass the $match still contain the full Statistics array, so after unwinding you may get Statistics sub-documents whose siblings (elements of the same array) were what satisfied the $match condition.
Note: a simple find projection such as db.col_name.find({}, {"Statistics.$": 1}) can also filter the array (the positional $ needs a query condition on the array field), but $project in the aggregation pipeline does not help with filtering an array of sub-documents.
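On MongoDB 3.2 and newer, the $filter expression can do this kind of array filtering inside a $project stage. A sketch under that version assumption, using the same collection and date range as above:
db.clicks.aggregate([
    { $match : { 'Statistics.Date' : {
        $gte: ISODate("2010-05-18T00:00:00.000Z"),
        $lte: ISODate("2010-05-18T23:59:59.999Z") } } },
    { $project : {
        // keep only the Statistics elements inside the date range
        Statistics : { $filter : {
            input: "$Statistics",
            as: "s",
            cond: { $and: [
                { $gte: ["$$s.Date", ISODate("2010-05-18T00:00:00.000Z")] },
                { $lte: ["$$s.Date", ISODate("2010-05-18T23:59:59.999Z")] }
            ] }
        } }
    } }
])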
Using $sort and $group in one aggregation query is behaving strangely.
Test data:
db.createCollection("test");
db.test.insert({
ts : 100,
category : 1
});
db.test.insert({
ts : 80,
category : 1
});
db.test.insert({
ts : 60,
category : 2
});
db.test.insert({
ts : 40,
category : 3
});
When sorting by ts alone everything looks good, but when I use both $sort and $group the result comes back in the wrong order. Query:
db.test.aggregate([
{
$sort : {ts: 1}
},
{
$group:{"_id":"$category"}
}
]);
And the result in reverse order:
{ "_id" : 1 }
{ "_id" : 2 }
{ "_id" : 3 }
Is this a Mongo feature or my misunderstanding? Maybe Mongo first applies the grouping and then can't sort by the now-absent field. This is probably why Mongoose prohibits using distinct with sorting.
You need to first $group and $sort the result. Since you only want the _id field you will need the $project stage.
db.test.aggregate(
[
{ "$group": { "_id": "$category" }},
{ "$sort" : { "ts": 1 }},
{ "$project": { "_id": 1 }}
]
);
If you want to sort the other way, do it like this:
db.test.aggregate([
{
$sort : {ts: -1}
},
{
$group:{"_id":"$category"}
}
]);
Notice the - in front of the 1.
When you first $sort by ts, you are basically sorting all the elements from your collection. Thus, if you were to only run the $sort stage in the aggregation pipeline, you would get the following result:
//Query
db.test.aggregate([
{ $sort: { ts: 1} }
]);
//Output
{ "_id" : ObjectId("55141da6e4c260ae9e00832b"), "ts" : 40, "category" : 3 }
{ "_id" : ObjectId("55141d9fe4c260ae9e00832a"), "ts" : 60, "category" : 2 }
{ "_id" : ObjectId("55141d99e4c260ae9e008329"), "ts" : 80, "category" : 1 }
{ "_id" : ObjectId("55141d93e4c260ae9e008328"), "ts" : 100, "category" : 1 }
In your code, when you add the $group stage, you are basically grouping the above results by the category field, producing the output that you get:
{ "_id" : 1 }
{ "_id" : 2 }
{ "_id" : 3 }
In the end it all depends on what you are trying to achieve.
If you want to return the categories ordered by the ts field, you should only use the $sort stage and then manipulate the resulting data set:
var data = db.test.aggregate([
{$sort: { ts: 1}},
{$project: {
_id: 0,
ts: 1,
category: 1
}
}
]).toArray();
for(var i = 0; i < data.length; i++) {
console.log(data[i].category); // Outputs 3, 2, 1, 1 in that sequence, on separate lines
}
I have a user base stored in mongo. Users may record their date of birth.
I need to run a report aggregating users by age.
I now have a pipeline that groups users by year of birth. However, that is not precise enough because most people are not born on January 1st; so even if they are born in, say, 1970, they may well not be 43 yet.
db.Users.aggregate([
{ $match : { "DateOfBirth" : { $exists : true} } },
{ $project : {"YearOfBirth" : {$year : "$DateOfBirth"} } },
{ $group : { _id : "$YearOfBirth", Total : { $sum : 1} } },
{ $sort : { "Total" : -1 } }
])
Do you know if it's possible to perform some kind of arithmetic within the aggregation framework to exactly calculate the age of a user? Or is this possible with MapReduce only?
It seems like the whole thing is possible with the new Mongo 2.4 version just released, supporting additional Date operations (namely the "$subtract").
Here's how I did it:
db.Users.aggregate([
{ $match : { "DateOfBirth" : { $exists : true} } },
{ $project : {"ageInMillis" : {$subtract : [new Date(), "$DateOfBirth"] } } },
{ $project : {"age" : {$divide : ["$ageInMillis", 31558464000] }}},
// take the floor of the previous number:
{ $project : {"age" : {$subtract : ["$age", {$mod : ["$age",1]}]}}},
{ $group : { _id : "$age", Total : { $sum : 1} } },
{ $sort : { "Total" : -1 } }
])
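As a side note, on MongoDB 3.2 and newer the $subtract/$mod floor trick can be replaced with the $floor operator, which reads a bit clearer. A sketch under that version assumption:
db.Users.aggregate([
    { $match : { "DateOfBirth" : { $exists : true } } },
    { $project : { "ageInMillis" : { $subtract : [new Date(), "$DateOfBirth"] } } },
    // 31558464000 ms is roughly one year, as in the pipeline above
    { $project : { "age" : { $floor : { $divide : ["$ageInMillis", 31558464000] } } } },
    { $group : { _id : "$age", Total : { $sum : 1 } } },
    { $sort : { "Total" : -1 } }
])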
There are not enough date and math operators to project out an exact age. But you might be able to create age ranges by composing a dynamic query:
Define your age ranges as cut-off dates, for example (a shell sketch for computing these follows the list):
dt18 = today - 18
dt25 = today - 25
...
dt65 = today - 65
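In the shell, these cut-offs could be computed from today's date instead of being hard-coded. A small sketch (the helper name is mine; the 18/25/65 boundaries are just the examples above):
var today = new Date();
// birth date cut-off for someone turning n years old today
function yearsAgo(n) {
    return new Date(today.getFullYear() - n, today.getMonth(), today.getDate());
}
var dt18 = yearsAgo(18);
var dt25 = yearsAgo(25);
var dt65 = yearsAgo(65);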
Then do nested conditionals, where you progressively use the cut off dates as age group markers, like so:
db.folks.save({ "_id" : 1, "bd" : ISODate("2000-02-03T00:00:00Z") });
db.folks.save({ "_id" : 2, "bd" : ISODate("2010-06-07T00:00:00Z") });
db.folks.save({ "_id" : 3, "bd" : ISODate("1990-10-20T00:00:00Z") });
db.folks.save({ "_id" : 4, "bd" : ISODate("1964-09-23T00:00:00Z") });
db.folks.aggregate([
{
$project: {
ageGroup: {
$cond: [{
$gt: ["$bd",
ISODate("1995-03-19")]
},
"age0_18",
{
$cond: [{
$gt: ["$bd",
ISODate("1988-03-19")]
},
"age18_25",
"age25_plus"]
}]
}
}
},
{
$group: {
_id: "$ageGroup",
count: {
$sum: 1
}
}
}])