mongodb: get top x rankings(count) of occurrences in documents

I have documents that contain general geolocation information. I am trying to get the top x cities, for example.
{
"_id" : ObjectId("593b6a7068c4281a3f7702c5"),
"clientID" : "1000000000",
"session_id" : "I9Ak2k1taOGHU0Z0000000000",
"location" : {
"country" : "United States",
"city" : "Seattle",
"postal" : "98105",
"traits" : null,
"local" : "America/Los_Angeles"
},
"dateTime" : ISODate("2017-06-09T21:12:56.819+0000"),
"action" : "PLAY",....
}
I am fairly new to Mongo, coming from SQL. I was hoping to do this with aggregation, rather than in application code after a range of docs is returned.
I figured out how to group these documents by session_id and so on. Now I just need to add a count of occurrences per city, sorted descending, and display just those cities.
My group query:
db.Playerstats.aggregate(
// Pipeline
[
// Stage 1 for tests
{
$match: {
clientID: "1000000000"
}
},
// Stage 2
{
$group: {
_id : "$session_id",
start: { $first: "$dateTime"},
stop: { $last: "$dateTime"},
eventID : { $addToSet: "$eventID"},
status : { $addToSet: "$eventStatus"},
browser: { $addToSet: "$browser.userAgent"},
OS : { $addToSet: "$browser.platform"},
city : { $addToSet: "$location.city"},
p2pData : { $sum: "$p2p.totalVerifiedBytes"},
actions : { $addToSet: "$action"}
}
},
// Stage 3
{
$sort: {
start: -1
}
},
],
);
Thanks for any insight.

I just needed to add another grouping to the query for the cities and get that count. Now I can add a limit and a sort to round it out.
{
$group: {
_id: "$city",
count: { $sum: 1 }
}
}
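As a rough sketch of how the grouping, sort, and limit could fit together, the function below builds a pipeline that counts documents per city, sorts by count descending, and keeps the top x. It groups directly on location.city (as in the sample document) rather than on the per-session city set from the question. The names topCitiesPipeline and limit are illustrative and not from the original post.

```javascript
// Hypothetical helper (not from the original post): builds a "top N cities" pipeline.
// Assumes the city is stored at location.city, as in the sample document.
function topCitiesPipeline(clientID, limit) {
  return [
    // Restrict to one client, as in the question's first stage.
    { $match: { clientID: clientID } },
    // One output document per city, with the number of matching documents.
    { $group: { _id: "$location.city", count: { $sum: 1 } } },
    // Most frequent cities first.
    { $sort: { count: -1 } },
    // Keep only the top N.
    { $limit: limit }
  ];
}

// In the mongo shell this could be run as:
// db.Playerstats.aggregate(topCitiesPipeline("1000000000", 5));
```

This counts events per city. To count sessions instead, one option is to group first on { session: "$session_id", city: "$location.city" } and then group again on "$_id.city".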

Related

How can I ensure my aggregation filters out subdocuments that are past expiry date in mongo?

I want to count the number of resetPassword codes (a subdocument array in the Users schema) that are currently active. For a code to be active, its expiry date must be greater than the current date.
Here is my users schema. If someone requests to reset their password, we push a new { code: X, expiresAt, createdAt } object to the array.
id: { type: String, unique: true },
resetPassword: [
{
code: String,
expiresAt: Date,
createdAt: Date,
},
],
I'm having an issue trying to $sum the total number of active reset codes. Here is the query I'm running, which returns an empty array. Note that if I remove the resetPassword.expiresAt: { $gt: nowDateInMillisec() } part of the match, it returns all the codes. I've also tried moving this condition out of the initial $match stage and doing an $unwind followed by a $match on expiresAt, but that didn't work either.
[
{
$match: {
"id": userId,
'resetPassword.expiresAt': {
$gt: nowDateInMillisec(),
},
},
},
{
$group: {
_id: '$id',
totalValidResetCodes: {
$sum: {
$size: '$resetPassword',
},
},
},
},
]
This returns an empty array, even though I've got the expiry dates set to a date in the future.
I also tried the following with the same result (notice how I added $unwind and another $match to the pipeline)
[
{
$match: {
"id": userId,
},
},
{
$unwind: '$resetPassword',
},
{
$match: {
'resetPassword.expiresAt': {
$gt: nowDateInMillisec(),
},
}
},
{
$group: {
_id: '$id',
totalValidResetCodes: {
$sum: {
$size: '$resetPassword',
},
},
},
},
]
nowDateInMillisec() - This simply returns the current date in milliseconds from epoch.
What am I doing wrong?

Instead of the whole $unwind/$match/$group process, you can try $reduce in a $project stage. Note that nowDateInMillisec() needs to return a Date (ISODate) rather than a number:
db.collection.aggregate([
{ $match: { id: 1 } },
{
$project: {
totalValidResetCodes: {
$reduce: {
input: "$resetPassword",
initialValue: 0,
in: {
$add: [
"$$value",
{
$cond: [
{ $gt: ["$$this.expiresAt", nowDateInMillisec()] },
// If you really want to pass timestamp then try below line
// { $gt: ["$$this.expiresAt", { $toDate: nowDateInMillisec() }] },
1,
0
]
}
]
}
}
}
}
}
])

Below is my research. When dates are stored as MongoDB Date values, the comparison should work at millisecond precision as well. The test below only checks down to the minute.
> db.users13.find().pretty();
{
"_id" : ObjectId("5f4e03768379a4e3f957641d"),
"id" : "johnc",
"resetPassword" : [
{
"code" : "abc",
"expiresAt" : ISODate("2020-09-02T09:11:18.394Z"),
"createdAt" : ISODate("2020-09-01T08:11:18.394Z")
},
{
"code" : "mno",
"expiresAt" : ISODate("2020-08-25T09:26:18.394Z"),
"createdAt" : ISODate("2020-08-25T08:11:18.394Z")
}
]
}
{
"_id" : ObjectId("5f4e06938379a4e3f957641f"),
"id" : "katey",
"resetPassword" : [
{
"code" : "j2c",
"expiresAt" : ISODate("2020-09-02T08:48:18.394Z"),
"createdAt" : ISODate("2020-09-01T08:11:18.394Z")
},
{
"code" : "rml",
"expiresAt" : ISODate("2020-09-01T08:26:18.394Z"),
"createdAt" : ISODate("2020-09-01T08:11:18.394Z")
}
]
}
> db.users13.aggregate([
{$unwind:"$resetPassword"},
{$match:{"resetPassword.expiresAt":{$gt:ISODate()}}}
]).pretty();
{
"_id" : ObjectId("5f4e03768379a4e3f957641d"),
"id" : "johnc",
"resetPassword" : {
"code" : "abc",
"expiresAt" : ISODate("2020-09-02T09:11:18.394Z"),
"createdAt" : ISODate("2020-09-01T08:11:18.394Z")
}
}
{
"_id" : ObjectId("5f4e06938379a4e3f957641f"),
"id" : "katey",
"resetPassword" : {
"code" : "j2c",
"expiresAt" : ISODate("2020-09-02T08:48:18.394Z"),
"createdAt" : ISODate("2020-09-01T08:11:18.394Z")
}
}
> ISODate()
ISODate("2020-09-01T08:31:37.059Z")
>

@Joe answered my question in the comments. He pointed out that a milliseconds-since-epoch number in the match filter wouldn't work, since I'm using a Date type in my Mongoose schema.
So instead of $gt: nowDateInMillisec(), I simply used a Date, like so: $gt: new Date().
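A minimal sketch of that fix, assuming the schema shown in the question: the function below builds a pipeline that counts active codes with $filter and $size, comparing expiresAt against a Date value. The helper name activeResetCodesPipeline and the model name Users in the usage comment are hypothetical, and the sketch assumes resetPassword is always present as an array.

```javascript
// Hypothetical helper: counts reset codes whose expiresAt is still in the future.
// `now` should be a Date (not a millisecond number), because expiresAt is stored as a Date.
function activeResetCodesPipeline(userId, now) {
  return [
    { $match: { id: userId } },
    {
      $project: {
        _id: 0,
        id: 1,
        totalValidResetCodes: {
          // $filter keeps only unexpired codes; $size counts what is left.
          $size: {
            $filter: {
              input: "$resetPassword",
              as: "code",
              cond: { $gt: ["$$code.expiresAt", now] }
            }
          }
        }
      }
    }
  ];
}

// Possible usage from Mongoose, assuming a model named Users (hypothetical name):
// Users.aggregate(activeResetCodesPipeline(userId, new Date()));
```

In the $unwind variant from the question, resetPassword is a single subdocument after unwinding, and $size expects an array, so $sum: 1 would be the counter to use in that $group stage.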

How can I count total documents and also grouped counts simultaneously in a MongoDB aggregation?

I have a dataset in a MongoDB collection named visitorsSession, like this:
{ip : '192.2.1.1', country : 'US', type : 'Visitors', date : '2019-12-15T00:00:00.359Z'},
{ip : '192.3.1.8', country : 'UK', type : 'Visitors', date : '2019-12-15T00:00:00.359Z'},
{ip : '192.5.1.4', country : 'UK', type : 'Visitors', date : '2019-12-15T00:00:00.359Z'},
{ip : '192.8.1.7', country : 'US', type : 'Visitors', date : '2019-12-15T00:00:00.359Z'},
{ip : '192.1.1.3', country : 'US', type : 'Visitors', date : '2019-12-15T00:00:00.359Z'}
I am using this MongoDB aggregation:
[{$match: {
nsp : "/hrm.sbtjapan.com",
creationDate : {
$gte: "2019-12-15T00:00:00.359Z",
$lte: "2019-12-20T23:00:00.359Z"
},
type : "Visitors"
}}, {$group: {
_id : "$country",
totalSessions : {
$sum: 1
}
}}, {$project: {
_id : 0,
country : "$_id",
totalSessions : 1
}}, {$sort: {
country: -1
}}]
Using the above aggregation, I am getting results like this:
[{country : 'US',totalSessions : 3},{country : 'UK',totalSessions : 2}]
But I also want the total number of visitors in the result, e.g. totalVisitors : 5.
How can I do this in a MongoDB aggregation?

You can use $facet aggregation stage to calculate total visitors as well as visitors by country in a single pass:
db.visitorsSession.aggregate( [
{
$match: {
nsp : "/hrm.sbtjapan.com",
creationDate : {
$gte: "2019-12-15T00:00:00.359Z",
$lte: "2019-12-20T23:00:00.359Z"
},
type : "Visitors"
}
},
{
$facet: {
totalVisitors: [
{
$count: "count"
}
],
countrySessions: [
{
$group: {
_id : "$country",
sessions : { $sum: 1 }
}
},
{
$project: {
country: "$_id",
_id: 0,
sessions: 1
}
}
],
}
},
{
$addFields: {
totalVisitors: { $arrayElemAt: [ "$totalVisitors.count" , 0 ] },
}
}
] )
The output:
{
"totalVisitors" : 5,
"countrySessions" : [
{
"sessions" : 2,
"country" : "UK"
},
{
"sessions" : 3,
"country" : "US"
}
]
}

You could be better off with two queries for this.
To save the second database round trip, the following aggregation can be used, although IMO it is rather verbose (and might be somewhat expensive if the documents are very large) just to count the documents.
The idea is to put a $group at the top that counts the documents and preserves the originals using $push and $$ROOT, and then to $unwind the resulting array of original docs before any other match/filter operations.
db.collection.aggregate([
{
$group: {
_id: null,
docsCount: {
$sum: 1
},
originals: {
$push: "$$ROOT"
}
}
},
{
$unwind: "$originals"
},
{ $match: "..." }, //and other stages on `originals` which contains the source documents
{
$group: {
_id: "$originals.country",
totalSessions: {
$sum: 1
},
totalVisitors: {
$first: "$docsCount"
}
}
}
]);
Sample output:
[
{
"_id": "UK",
"totalSessions": 2,
"totalVisitors": 5
},
{
"_id": "US",
"totalSessions": 3,
"totalVisitors": 5
}
]

MongoDB two groups Aggregate

Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result. MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single purpose aggregation methods.
I would like to transform this:
{
"_id" : ObjectId("5836b919885383034437d4a7"),
"Identificador" : "G-3474",
"Miembros" : [
{
"_id" : ObjectId("5836b916885383034437d238"),
"Nombre" : "Pilar",
"Email" : "pcarrillocasa#gmail.es",
"Edad" : 24,
"País" : "España",
"Tipo" : "Usuario individual",
"Apellidos" : "Carrillo Casa",
"Teléfono" : 637567234,
"Ciudad" : "Santander",
"Identificador" : "U-3486",
"Información_creación" : {
"Fecha_creación" : {
"Mes" : 4,
"Día" : 22,
"Año" : 2016
},
"Hora_creación" : {
"Hora" : 15,
"Minutos" : 34,
"Segundos" : 20
}
}
}
]
}
into this:
{
"Nombre_Grupo" : "Amigo invisible",
"Ciudades" : [
{
"Ciudad" : "Madrid",
"Miembros": 30
},
{
"Ciudad" : "Almería",
"Miembros": 10
},
{
"Ciudad" : "Badajoz",
"Miembros": 20
}
]
}
using MongoDB.
I tried this:
db.Grupos_usuarios.aggregate([
{ $group: { _id: "$Nombre_Grupo",total: { $sum: "$amount" } },
$group: { _id: "$Ciudad",total: { $sum: "$amount" } } }
])
but I could not get what I needed.
Could somebody help me understand what I am doing wrong?

The following aggregation gets the output you are looking for.
The $unwind stage deconstructs an array field from the input documents and outputs one document per element. These documents are then grouped by Miembros.Ciudad to get the total number of Miembros for each Ciudad. The second $group stage pivots the data, collecting all the Ciudades from the previous grouping into an array. The final $project formats the output.
db.test.aggregate( [
{
$unwind: "$Miembros"
},
{
$group: {
_id: "$Miembros.Ciudad",
total: { $sum: 1 }
}
},
{
$group: {
_id: "Amigo invisible",
Ciudades: { $push: { Ciudad: "$_id", Miembros: "$total"} }
}
},
{
$project: {
Nombre_Grupo: "$_id",
Ciudades: 1,
_id: 0
}
}
] )

Count Distinct Within Date Range

I have a MongoDB database with a collection of site-events. The documents look like:
{
"_id" : ObjectId("5785bb02eac0636f1dc07023"),
"referrer" : "https://example.com",
"_t" : ISODate("2016-07-12T18:10:17Z"),
"_p" : "ucd7+hvjpacuhtgbq1caps4rqepvwzuoxm=",
"_n" : "visited site",
"km screen resolution" : "1680x1050"
},
{
"_id" : ObjectId("5785bb02eac0636f1dc07047"),
"url" : "https://www.example.com/",
"referrer" : "Direct",
"_t" : ISODate("2016-07-12T18:10:49Z"),
"_p" : "txt6t1siuingcgo483aabmses2et5uqk0=",
"_n" : "visited site",
"km screen resolution" : "1366x768"
},
{
"_id" : ObjectId("5785bb02eac0636f1dc07053"),
"url" : "https://www.example.com/",
"referrer" : "Direct",
"_t" : ISODate("2016-07-12T18:10:56Z"),
"_p" : "gcama1az5jxa74wa6o9r4v/3k+zulciqiu=",
"_n" : "visited site",
"km screen resolution" : "1366x768"
}
I want to get a count of the unique persons within a date range. In SQL it would be
SELECT COUNT(DISTINCT(`_p`)) FROM collection WHERE `_t` > '<SOME DATE>' AND `_t` <= '<SOME OTHER DATE>'
So far, I've grouped by date using the aggregation pipeline:
db.siteEvents.aggregate(
[
{
$match : {"_n": "visited site"}
},
{
$group : {
_id: {
year : { $year : "$_t" },
month : { $month : "$_t" },
day : { $dayOfMonth : "$_t" },
_p : "$_p"
},
count: { $sum: 1 }
}
},
{
$group : {
_id : {
year : { $year : "$_id.year" },
month : { $month : "$_id.month" },
day : { $dayOfMonth : "$_id.day" }
},
count: { $sum: 1 }
}
}
]
);
But this gives errors - I believe because of the second grouping _id trying to grab an intermediate field. I'm currently just using the Mongo shell, but if I had to choose an alternative driver it would be PyMongo. I'd like to get this to work in the shell (so I can understand the process).

With an aggregation pipeline, it could look like this:
db.getCollection('siteEvents').aggregate([
{
$match: {
_t: {
$gt: ISODate("2016-07-11T08:10:17.000Z"),
$lt: ISODate("2016-07-12T14:10:17.000Z")
}
}
},
{
$group: {
_id: "$_p"
}
},
{
$group: {
_id: null,
distinctCount: { $sum: 1 }
}
}
])
If you know the resulting set of distinct values won't be large, then you could use a simple query like this:
db.getCollection('siteEvents').distinct(
'_p',
{
_t: {
$gt: ISODate("2016-07-11T08:10:17.000Z"),
$lt: ISODate("2016-07-12T14:10:17.000Z")
}
}).length
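The error in the question most likely comes from applying $year, $month, and $dayOfMonth in the second $group to values that are already plain numbers. If a per-day unique count is wanted, a sketch along these lines may help: the second stage references the date parts from the first stage's _id directly. The helper name uniquePersonsPerDayPipeline and the output field uniquePersons are illustrative and not from the original post.

```javascript
// Hypothetical helper: unique persons (_p) per calendar day for _t in (start, end].
function uniquePersonsPerDayPipeline(start, end) {
  return [
    { $match: { _n: "visited site", _t: { $gt: start, $lte: end } } },
    // Stage 1: collapse events to one document per (day, person) pair.
    {
      $group: {
        _id: {
          year: { $year: "$_t" },
          month: { $month: "$_t" },
          day: { $dayOfMonth: "$_t" },
          _p: "$_p"
        }
      }
    },
    // Stage 2: the date parts are already numbers, so reference them directly
    // instead of applying the date operators a second time.
    {
      $group: {
        _id: { year: "$_id.year", month: "$_id.month", day: "$_id.day" },
        uniquePersons: { $sum: 1 }
      }
    },
    { $sort: { "_id.year": 1, "_id.month": 1, "_id.day": 1 } }
  ];
}

// In the mongo shell this could be run as:
// db.siteEvents.aggregate(uniquePersonsPerDayPipeline(
//   ISODate("2016-07-11T00:00:00Z"), ISODate("2016-07-13T00:00:00Z")));
```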

You can use the $addToSet operator in the $group stage to return an array of distinct "_p" values, then $project the resulting document to return the size of that array, which is the distinct count.
db.siteEvents.aggregate(
[
{"$match": {"_n": "visited site", "_t": {"$gt": <SOME DATE>, "$lt": <SOME OTHER DATE>}}},
{"$group": {
"_id": None,
"_p_values": {"$addToSet": "$_p"}
}},
{"$project": {"_id": 0, "count": {"$size": "$_p_values"}}}
]
)
For a small collection you can simply use distinct, but you need to pass in the query argument.
len(db.siteEvents.distinct("_p", {"_n": "visited site", "_t": {"$gt": <SOME DATE>, "$lt": <SOME OTHER DATE>}}))

Mongodb Aggregate using $group twice

I have a bunch of documents in mongo with the following structure:
{
"_id" : "",
"number" : 2,
"colour" : {
"_id" : "",
"name" : "Green",
"hex" : "00ff00"
},
"position" : {
"_id" : "",
"name" : "Defence",
"type" : "position"
},
"ageGroup" : {
"_id" : "",
"name" : "Minor Peewee",
"type" : "age"
},
"companyId" : ""
}
I'm currently using Mongo's aggregate to group the documents by ageGroup.name which returns:
//Query
Jerseys.aggregate([
{$match: { companyId: { $in: companyId } } },
{$group: {_id: "$ageGroup.name", jerseys: { $push: "$$ROOT" }} }
]);
//returns
{
_id: "Minor Peewee",
jerseys: array[]
}
but I'd like it to also group by position.name within the age groups. ie:
{
_id: "Minor Peewee",
positions: array[]
}
//in positions array...
{
_id: "Defence",
jerseys: array[]
}
// or ageGroups->positions->jerseys if that makes more sense.
I've tried multiple groups, but I don't think I'm setting them up correctly: I always seem to get an array of _ids. I'm using Meteor as the server, and I'm doing this within a Meteor method.

You can use a composite aggregate _id in the first grouping stage.
Then, you can use one of those keys as the "main" _id of the final aggregate and $push the other into another array.
Jerseys.aggregate([
{
$match: { companyId: { $in: companyId } }
},
{
$group: { // each position and age group have an array of jerseys
_id: { position: "$position", ageGroup: "$ageGroup" },
jerseys: { $push: "$$ROOT" }
}
},
{
$group: { // for each age group, create an array of positions
_id: { ageGroup: "$_id.ageGroup" },
positions: { $push: { position: "$_id.position", jerseys:"$jerseys" } }
}
}
]);
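If the output should be keyed by the names, as in the shape sketched in the question (_id: "Minor Peewee" with a positions array), one possible variant is to group on ageGroup.name and position.name instead of the whole subdocuments. This is only a sketch under that assumption, and the helper name jerseysByAgeGroupAndPosition is illustrative.

```javascript
// Hypothetical helper: nests jerseys under position name, and positions under age group name.
function jerseysByAgeGroupAndPosition(companyIds) {
  return [
    { $match: { companyId: { $in: companyIds } } },
    // Stage 1: one document per (age group name, position name) pair.
    {
      $group: {
        _id: { ageGroup: "$ageGroup.name", position: "$position.name" },
        jerseys: { $push: "$$ROOT" }
      }
    },
    // Stage 2: one document per age group name, with its positions collected into an array.
    {
      $group: {
        _id: "$_id.ageGroup",
        positions: { $push: { _id: "$_id.position", jerseys: "$jerseys" } }
      }
    }
  ];
}

// Possible usage, mirroring the call in the answer above:
// Jerseys.aggregate(jerseysByAgeGroupAndPosition(companyId));
```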
