mongodb get entries where id exists multiple times based on count condition - mongodb

I have a collection 'bookings' with the following example structure:
{
"_id" : ObjectId("1"),
"user" : ObjectId("1"),
"event" : ObjectId("1"),
},
{
"_id" : ObjectId("2"),
"user" : ObjectId("1"),
"event" : ObjectId("1"),
},
{
"_id" : ObjectId("3"),
"user" : ObjectId("2"),
"event" : ObjectId("1"),
},
{
"_id" : ObjectId("4"),
"user" : ObjectId("3"),
"event" : ObjectId("1"),
},
{
"_id" : ObjectId("5"),
"user" : ObjectId("4"),
"event" : ObjectId("2"),
},
{
"_id" : ObjectId("6"),
"user" : ObjectId("1"),
"event" : ObjectId("2"),
}
I can't figure out a query that shows all "event" ids in which the same "user" id appears multiple times. Something like this:
{
"event": 1,
"user": 1,
"count": 2
}
Does not have to be this exact output, in other words I just want a way to have a query to get all events for which the same "user" id has more than one entry in this "bookings" collection.
Any suggestions? Thanks!

You just need to do grouping and filtering.
In SQL it would be just as simple as
SELECT count(*) AS cc, user, event FROM t1 GROUP BY user, event HAVING cc > 1
In MongoDB, you can use the aggregation framework to do the equivalent.
It does the same in three pipeline stages: group, filter, project.
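db.bookings.aggregate( [
// group: one bucket per (user, event) pair, counting its bookings
{ $group: { _id: { user: "$user", event: "$event" }, count: { $sum: 1 } } },
// filter: keep only pairs that occur more than once
{ $match: { count: { $gt: 1 } } },
// project: flatten the compound _id into top-level fields
{ $project: { _id: 0,
userId: "$_id.user",
event: "$_id.event",
count: 1
}
}
] )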
This documentation can help you to understand deeper: https://www.mongodb.com/docs/manual/reference/sql-aggregation-comparison/
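The three stages can also be traced by hand. The following is only an illustrative sketch in plain JavaScript, not MongoDB code: it applies the same group, filter and project steps to the sample bookings in memory. The ids are simplified to plain numbers, and the function name duplicateBookings is made up for this example.

```javascript
// Sample bookings from the question, with ObjectIds simplified to numbers (assumption).
const bookings = [
  { _id: 1, user: 1, event: 1 },
  { _id: 2, user: 1, event: 1 },
  { _id: 3, user: 2, event: 1 },
  { _id: 4, user: 3, event: 1 },
  { _id: 5, user: 4, event: 2 },
  { _id: 6, user: 1, event: 2 },
];

// Hypothetical helper mirroring $group -> $match -> $project.
function duplicateBookings(docs) {
  const groups = new Map();
  for (const { user, event } of docs) {
    const key = `${user}|${event}`; // $group: _id = { user, event }
    const group = groups.get(key) || { user, event, count: 0 };
    group.count += 1; // count: { $sum: 1 }
    groups.set(key, group);
  }
  return [...groups.values()]
    .filter((g) => g.count > 1) // $match: { count: { $gt: 1 } }
    .map(({ user, event, count }) => ({ userId: user, event, count })); // $project
}

console.log(duplicateBookings(bookings)); // [ { userId: 1, event: 1, count: 2 } ]
```

Only user 1 booked event 1 twice, so that is the single pair that survives the filter.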

Related

MongoDB - Grouping by inner-documents and retrieving top results

I'm trying to find the most common (and least common) skills stored in the mongo database. I'm using mongoose to retrieve the results.
The User is the root document, which each have an inner Profile document. The profile has an attribute of 'skills' which contain an array of ProfileSkillEntry's which has a title (the skill name).
return User.aggregate([{
$group: {
'_id': '$profile.skills.title',
'count': {
$sum: 1
}
}
}, {
$sort: {
'count': -1
}
}, {
$limit: 5
}]);
I expect it to combine all of the registered Users skills together, find the top 5 occurring and return that. Instead it seems to be grouping per-user and giving invalid results.
Example User document structure:
{
"_id" : ObjectId("..."),
"firstName" : "Harry",
"lastName" : "Potter",
"profile" : {
"_id" : ObjectId("..."),
"skills" : [
{
"_id" : ObjectId("..."),
"title" : "Java",
"description" : "Master",
"dateFrom" : "31/07/2019",
"coreSkill" : true
},
{
"_id" : ObjectId("..."),
"title" : "JavaScript",
"description" : "Proficient",
"dateFrom" : "31/07/2019",
"coreSkill" : false
}
],
}
}
Use the query below and add the $sort and $limit stages as required.
db.test.aggregate(
[{ $unwind: { path: "$profile.skills"} },
{ $group: { _id: "$profile.skills.title",
"count": { $sum: 1 }} }] )
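With the sort and limit from the question appended, the complete pipeline might look like the sketch below. Without $unwind, '$profile.skills.title' resolves to the whole array of titles for each user, so the group key differs per user; $unwind emits one document per skill, which lets the grouping count titles across all users. The variable name topSkillsPipeline is made up here, and the usage line assumes the Mongoose User model from the question.

```javascript
// Sketch: the pipeline as a plain array, to be passed to aggregate().
const topSkillsPipeline = [
  // One output document per element of profile.skills.
  { $unwind: "$profile.skills" },
  // Count how often each skill title occurs across all users.
  { $group: { _id: "$profile.skills.title", count: { $sum: 1 } } },
  // Most common first; use { count: 1 } for the least common.
  { $sort: { count: -1 } },
  { $limit: 5 },
];

// Usage (assumes the User model from the question):
// return User.aggregate(topSkillsPipeline);
console.log(JSON.stringify(topSkillsPipeline));
```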

Optimize $group performance in mongodb

I have a large collection of about 2 million documents like this; each account has many logs:
{
"_id" : 1,
"type" : "login",
"date" : "2057-03-28T02:59:41.176Z",
"link" : DBRef("accounts", ObjectId("5bd9683d4df859ad279b5649"))
},
{
"_id" : 2,
"type" : "login",
"date" : "2057-03-28T02:53:41.176Z",
"link" : DBRef("accounts", ObjectId("5bd9683d4df859ad279b5649"))
},
{
"_id" : 3,
"type" : "login",
"date" : "2057-03-28T02:49:41.176Z",
"link" : DBRef("accounts", ObjectId("5bd9683d4df859ad279b5643"))
}
I'm trying to get the latest log date per account this way, but it takes a couple of seconds to return a result.
db.logs.aggregate(
{
$match: {
"link.$ref": "accounts",
"link.$id": { $in: [ObjectId("5bd9683d4df859ad279b5649"), ObjectId("5bd9683d4df859ad279b5643")]} //array accounts id
}
},
{
$group: {
_id: "$link",
date: {$last: "$date"},
type: {$last: "$type"}
}
}
)
I'm using MongoDB 3.4.

MongoDB filtering out subdocuments with lookup aggregation

Our project database has a capped collection called values which gets updated every few minutes with new data from sensors. These sensors all belong to a single sensor node, and I would like to query the last data from these nodes in a single aggregation. The problem I am having is filtering out just the last of ALL the types of sensors while still having only one (efficient) query. I looked around and found the $group argument, but I can't seem to figure out how to use it correctly in this case.
The database is structured as follows:
nodes:
{
"_id": 681,
"sensors": [
{
"type": "foo"
},
{
"type": "bar"
}
]
}
values:
{
"_id" : ObjectId("570cc8b6ac55850d5740784e"),
"timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
"type" : "foo",
"nodeid" : 681,
"value" : 10
}
{
"_id" : ObjectId("190ac8b6ac55850d5740776e"),
"timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
"type" : "bar",
"nodeid" : 681,
"value" : 20
}
{
"_id" : ObjectId("167bc997bb66750d5740665e"),
"timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
"type" : "bar",
"nodeid" : 200,
"value" : 20
}
{
"_id" : ObjectId("110cc9c6ac55850d5740784e"),
"timestamp" : ISODate("2016-04-09T12:06:46.344Z"),
"type" : "foo",
"nodeid" : 681,
"value" : 12
}
so let's imagine I want the data from node 681, I would want a structure like this:
nodes:
{
"_id": 681,
"sensors": [
{
"_id" : ObjectId("570cc8b6ac55850d5740784e"),
"timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
"type" : "foo",
"nodeid" : 681,
"value" : 10
},
{
"_id" : ObjectId("190ac8b6ac55850d5740776e"),
"timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
"type" : "bar",
"nodeid" : 681,
"value" : 20
}
]
}
Notice how one value of foo is not returned: I only want the latest value when there is more than one (which is always going to be the case). The collection is already ordered by timestamp because it is capped.
I have this query, but it just gets all the values from the database (which is waaay too much to do in a lifetime, let alone one request of the web app), so I was wondering how I would filter it before it gets aggregated.
query:
db.nodes.aggregate(
[
{
$unwind: "$sensors"
},
{
$match:{
nodeid: 681
}
},
{
$lookup:{
from: "values", localField: "sensors.type", foreignField: "type", as: "sensors"
}
}
]
)
Try this
// Pipeline
[
// Stage 1 - sort newest first so that $first below picks the latest reading
{
$sort: {
"timestamp":-1
}
},
// Stage 2 - group by type & nodeid, keeping the first (newest) item in each group
{
$group: {
"_id":{type:"$type",nodeid:"$nodeid"},
"sensors": {"$first":"$$CURRENT"} // use $last instead if the sort is ascending
}
},
// Stage 3 - project the desired fields
{
$project: {
"_id":"$sensors._id",
"timestamp":"$sensors.timestamp",
"type":"$sensors.type",
"nodeid":"$sensors.nodeid",
"value":"$sensors.value"
}
},
// Stage 4 - group and push it to array sensors
{
$group: {
"_id":{nodeid:"$nodeid"},
"sensors": {"$addToSet":"$$CURRENT"}
}
}
]
As far as I understand the document structure, there is no need to use $lookup, as all the data is in the readings (values) collection.
Please see the proposed solution:
db.readings.aggregate([{
$match : {
nodeid : 681
}
},
{
$group : {
_id : {
type : "$type",
nodeid : "$nodeid"
},
readings : {
$push : {
timestamp : "$timestamp",
value : "$value",
id : "$_id"
}
}
}
}, {
$project : {
_id : "$_id",
readings : {
$slice : ["$readings", -1]
}
}
}, {
$unwind : "$readings"
}, {
$project : {
_id : "$readings.id",
type : "$_id.type",
nodeid : "$_id.nodeid",
timestamp : "$readings.timestamp",
value : "$readings.value",
}
}, {
$group : {
_id : "$nodeid",
sensors : {
$push : {
_id : "$_id",
timestamp : "$timestamp",
value : "$value",
type:"$type"
}
}
}
}
])
and output:
{
"_id" : 681,
"sensors" : [
{
"_id" : ObjectId("110cc9c6ac55850d5740784e"),
"timestamp" : ISODate("2016-04-09T12:06:46.344Z"),
"value" : 12,
"type" : "foo"
},
{
"_id" : ObjectId("190ac8b6ac55850d5740776e"),
"timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
"value" : 20,
"type" : "bar"
}
]
}
Any comments welcome!
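Note that $slice with -1 keeps the last pushed reading per group, so the result depends on document order. As a sketch of selecting by timestamp explicitly, the following plain JavaScript (not a MongoDB query) applies the same idea to the sample values from the question in memory. The function name latestPerSensor is made up, and ids and dates are kept as plain strings for simplicity.

```javascript
// Sample documents from the question's values collection.
// Assumption: ObjectIds and ISODates are represented as plain strings here.
const values = [
  { _id: "570cc8b6ac55850d5740784e", timestamp: "2016-04-12T12:06:46.344Z", type: "foo", nodeid: 681, value: 10 },
  { _id: "190ac8b6ac55850d5740776e", timestamp: "2016-04-12T12:06:46.344Z", type: "bar", nodeid: 681, value: 20 },
  { _id: "167bc997bb66750d5740665e", timestamp: "2016-04-12T12:06:46.344Z", type: "bar", nodeid: 200, value: 20 },
  { _id: "110cc9c6ac55850d5740784e", timestamp: "2016-04-09T12:06:46.344Z", type: "foo", nodeid: 681, value: 12 },
];

// Hypothetical helper: newest reading per sensor type for one node.
function latestPerSensor(docs, nodeid) {
  const latest = new Map();
  for (const doc of docs) {
    if (doc.nodeid !== nodeid) continue; // same role as $match
    const seen = latest.get(doc.type);   // same role as grouping by type
    if (!seen || Date.parse(doc.timestamp) > Date.parse(seen.timestamp)) {
      latest.set(doc.type, doc);         // keep only the newest reading
    }
  }
  return { _id: nodeid, sensors: [...latest.values()] };
}

console.log(JSON.stringify(latestPerSensor(values, 681), null, 2));
```

For node 681 this keeps the foo reading with value 10 and the bar reading with value 20, which is the structure the question asks for.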

MongoDB $match not working for part of sub document

In my MongoDB, I am trying to "group by" a set of records using the aggregate() function and the $match query I use does not select the records in the pipeline.
My dataset has sub documents (on which I apply the $match) and it looks like this:
{
"_id" : ObjectId("550994e21cba9597624195aa"),
"taskName" : "task name",
"taskDetail" : "task detail.",
"scheduledStartDate" : ISODate("2015-04-06T09:00:00.000Z"),
"scheduledEndDate" : ISODate("2015-04-06T11:00:00.000Z"),
"user" : {
"id" : "abcd1123",
"name" : "username"
},
"status" : "Assigned"
}
{
"_id" : ObjectId("550994e21cba9597624195aa"),
"taskName" : "task name",
"taskDetail" : "task detail.",
"scheduledStartDate" : ISODate("2015-04-06T09:00:00.000Z"),
"scheduledEndDate" : ISODate("2015-04-06T11:00:00.000Z"),
"user" : {
"id" : "abcd1123",
"name" : "username"
},
"status" : "Assigned"
}
{
"_id" : ObjectId("550994e21cba9597624195aa"),
"taskName" : "task name",
"taskDetail" : "task detail.",
"scheduledStartDate" : ISODate("2015-04-06T09:00:00.000Z"),
"scheduledEndDate" : ISODate("2015-04-06T11:00:00.000Z"),
"user" : {
"id" : "abcd1124",
"name" : "username"
},
"status" : "Assigned"
}
I want to find out count of status for a particular user grouped by status type. For the user "abcd1123", the query that I use is:
db.tasks.aggregate( [
{
$match : {
user : { id : "abcd1123" }
}
},
{
$group: {
_id: "$status",
count: { $sum: 1 }
}
}
] )
The above query does not return any results, because the $match does not put any results into the pipeline. If I modify the query like this:
db.tasks.aggregate( [
{
$match : {
user : { id : "abcd1123", name : "username" }
}
},
{
$group: {
_id: "$status",
count: { $sum: 1 }
}
}
] )
it works. But I don't have the username as input and I need to find only based on user id.
Since user is not an array, I can't use $unwind on it.
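How do I achieve this?
You are very close. Matching on user : { id : "abcd1123" } compares the whole embedded document for exact equality, so it only succeeds when every field is given. Use dot notation to match on the single field instead: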
db.collectionName.aggregate({
"$match": {
"user.id": "abcd1123"
}
},
{
"$group": {
"_id": "$status",
"count": {
"$sum": 1
}
}
})
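The difference between the two $match forms can be illustrated in plain JavaScript. This is only a rough in-memory analogue, not how the server evaluates queries; the helper names exactMatch and fieldMatch are made up, and the sample tasks are trimmed to the relevant fields.

```javascript
// Sample tasks from the question, reduced to the fields that matter here.
const tasks = [
  { user: { id: "abcd1123", name: "username" }, status: "Assigned" },
  { user: { id: "abcd1123", name: "username" }, status: "Assigned" },
  { user: { id: "abcd1124", name: "username" }, status: "Assigned" },
];

// Rough analogue of { user: { id: ... } }: the whole sub-document must be equal.
const exactMatch = (doc, sub) => JSON.stringify(doc.user) === JSON.stringify(sub);

// Rough analogue of { "user.id": ... }: only the one field is compared.
const fieldMatch = (doc, id) => doc.user.id === id;

console.log(tasks.filter((t) => exactMatch(t, { id: "abcd1123" })).length); // 0
console.log(tasks.filter((t) => exactMatch(t, { id: "abcd1123", name: "username" })).length); // 2
console.log(tasks.filter((t) => fieldMatch(t, "abcd1123")).length); // 2
```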

How to do this query in mongo: get newest messages for a list of users

I have a collection of messages with fields user_id, created_time, and content. Given a list of user_id values, I would like to get back a list of messages containing, for each user_id, the newest message from that user. I thought about using a distinct command together with sort in mongo but that doesn't seem to be supported. Is there a way to do this in mongo using a single query?
MongoDB has the aggregation framework, which you can use for tasks that require some manipulation of the data in your collection.
Consider the following dataset:
> db.messages.find().pretty()
{
"_id" : ObjectId("52ecb77486d35a12f3552aa1"),
"user_id" : "fred",
"create_date" : ISODate("1392-09-21T00:00:00Z")
}
{
"_id" : ObjectId("52ecb79286d35a12f3552aa2"),
"user_id" : "fred",
"create_date" : ISODate("1392-06-01T00:00:00Z")
}
{
"_id" : ObjectId("52ecb7a386d35a12f3552aa3"),
"user_id" : "marty",
"create_date" : ISODate("1393-04-06T00:00:00Z")
}
{
"_id" : ObjectId("52ecb7af86d35a12f3552aa4"),
"user_id" : "marty",
"create_date" : ISODate("1386-02-12T00:00:00Z")
}
So in passing this to aggregate we want to group on user_id and get the most recent or maximum create_date
> db.messages.aggregate([
{ $group: { _id: { user_id: "$user_id" }, create_date: { $max: "$create_date" }} }
])
{
"result" : [
{
"_id" : {
"user_id" : "marty"
},
"create_date" : ISODate("1393-04-06T00:00:00Z")
},
{
"_id" : {
"user_id" : "fred"
},
"create_date" : ISODate("1392-09-21T00:00:00Z")
}
],
"ok" : 1
}
That's not bad but you can clean it up with $project
> db.messages.aggregate([
{ $group: { _id: { user_id: "$user_id" }, create_date: { $max: "$create_date" }} },
{ $project: { _id: 0, user_id: "$_id.user_id", create_date: 1} }
])
{
"result" : [
{
"create_date" : ISODate("1393-04-06T00:00:00Z"),
"user_id" : "marty"
},
{
"create_date" : ISODate("1392-09-21T00:00:00Z"),
"user_id" : "fred"
}
],
"ok" : 1
}
So that actually looks like a clean record to use. In recent drivers the value returned from aggregate should be a cursor you can iterate over, so the results are just as easy to work with as those from find.
Additional documentation on operators to use can be found here.
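The grouping above returns only the newest date per user. If the whole message is needed and the input is a list of user ids, as in the question, one possible variant is sketched below. The names userIds and newestMessagesPipeline are made up, the field name create_date follows the sample data above rather than the created_time mentioned in the question, and $replaceRoot requires MongoDB 3.4 or later.

```javascript
// Hypothetical input list of user ids.
const userIds = ["fred", "marty"];

// Sketch: the pipeline as a plain array, to be passed to aggregate().
const newestMessagesPipeline = [
  // Keep only messages from the requested users.
  { $match: { user_id: { $in: userIds } } },
  // Newest first, so that $first below sees the latest message per user.
  { $sort: { create_date: -1 } },
  // One group per user, keeping the entire newest document.
  { $group: { _id: "$user_id", message: { $first: "$$ROOT" } } },
  // Promote the stored document back to the top level.
  { $replaceRoot: { newRoot: "$message" } },
];

// Usage in the shell (assumes the messages collection from the answer):
// db.messages.aggregate(newestMessagesPipeline)
console.log(JSON.stringify(newestMessagesPipeline));
```

Sorting before grouping is what makes $first meaningful here; without the $sort stage the document picked per group would depend on the order in which documents reach the stage.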