I'm making a search page where people can search for other users and filter the results by various categories (gender, industry, occupation, etc.), and I want to show a count next to each facet, like you see on eBay and many e-commerce sites.
I've got it semi-working, but the problem is that I compute the facet counts after the filtering has already taken place. So if the filter includes industry = "science" (for example), I don't get counts for all the other industries; I only get the count of users whose industry is "science".
How can I make sure I get the counts of all the industries if the search query includes filtering by industry?
{
$geoNear: {
near: {
type: "Point",
coordinates: [parseFloat(longitude), parseFloat(latitude)],
},
distanceField: "dist.calculated",
maxDistance: Number((maxDistance || 1500) * 1609.34),
spherical: true,
query: filters,
},
},
{
$project: {
name: 1,
location: 1,
gender: 1,
username: 1,
dist: 1,
area: 1,
industry: 1,
},
},
{
$facet: {
paginatedResults: [
{ $skip: parseInt(skip) },
{ $limit: parseInt(limit) },
],
totalCount: [
{
$count: "count",
},
],
industry: [
{ $match: { industry: { $exists: 1 } } },
{
$sortByCount: "$industry",
},
],
gender: [
{ $match: { gender: { $exists: 1 } } },
{
$sortByCount: "$gender",
},
],
},
},
]);
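One common way to get these counts (a suggestion, not from the original thread) is to keep the facet selections out of the `query` passed to `$geoNear` and apply them inside the `$facet` branches instead: the results and the total use every selection, while each facet's branch uses every selection except its own. The sketch below assumes a hypothetical `buildFacetStage` helper and a `selected` object such as `{ industry: "science" }` that holds only the facet filters.

```javascript
// Hypothetical helper: builds a $facet stage in which each facet's counts
// ignore that facet's own selection ("disjunctive" faceting).
// Assumption: `selected` holds only facet filters, e.g. { industry: "science" },
// and those filters are NOT passed to $geoNear's `query`.
const FACET_FIELDS = ["industry", "gender"];

function buildFacetStage(selected, skip, limit) {
  // Copy of the selections with one field removed.
  const without = (field) => {
    const rest = { ...selected };
    delete rest[field];
    return rest;
  };

  // Results and total count honour every selection.
  const facets = {
    paginatedResults: [{ $match: selected }, { $skip: skip }, { $limit: limit }],
    totalCount: [{ $match: selected }, { $count: "count" }],
  };

  // Each facet applies every selection except its own, so choosing
  // industry = "science" still yields counts for every other industry.
  for (const field of FACET_FIELDS) {
    facets[field] = [
      { $match: { ...without(field), [field]: { $exists: true } } },
      { $sortByCount: `$${field}` },
    ];
  }
  return { $facet: facets };
}

// Usage inside the original pipeline (nonFacetFilters is hypothetical):
// [ { $geoNear: { ..., query: nonFacetFilters } }, { $project: { ... } },
//   buildFacetStage({ industry: "science" }, parseInt(skip), parseInt(limit)) ]
```

The trade-off is that `$geoNear` no longer narrows by the facet fields, so each branch's `$match` runs over the whole geo-filtered set; stages inside `$facet` cannot use indexes.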
Our MongoDB database has a collection of companies and a collection of users. I'm running an aggregation pipeline, the details of which vary based on various filters chosen by an administrator, but below is a simple example. The companies collection has approximately 15 entries, and there are currently around 40,000 entries in the users collection.
Our aggregation returns user data, but it must contain the companyname from the companies collection (there may also be filtering on the company, depending on which options are chosen). A user document contains an array of subdocuments for reports; we need to include the last report date for possible sorting/filtering, too. The only other non-standard thing is an object field, keydata, which needs to be included for searching. This object is different for every user.
With only 40,000 records in the users collection, this aggregation is currently taking approximately five seconds. Eventually, we will have millions of records in the database, so I need a far more performant way of searching records.
Any pointers, please?
[
{
$lookup: {
from: "users",
let: {
cref: "$companyref",
companyname: "$companyname",
},
as: "users",
pipeline: [
{
$match: {
$expr: {
$eq: ["$$cref", "$companyref"],
},
},
},
{
$addFields: {
lastreportdate: {
$arrayElemAt: [
{
$slice: ["$reports.date", -1],
},
0,
],
},
kd: {
$objectToArray: "$keydata",
},
name: {
$concat: ["$firstname", " ", "$lastname"],
},
companyname: "$$companyname",
},
},
{
$match: {
$or: [
{
"kd.v": /searchstring/,
},
{
name: /searchstring/,
},
{
email: /searchstring/,
},
{
userref: /searchstring/,
},
{
username: /searchstring/,
},
],
},
},
{
$project: {
kd: 0,
lastreportdate: 0,
},
},
],
},
},
{
$unwind: {
path: "$users",
preserveNullAndEmptyArrays: false,
},
},
{
$replaceRoot: {
newRoot: "$users",
},
},
{
$project: {
userref: 1,
keydata: 1,
email: 1,
zipcode: 1,
companyref: 1,
companyname: 1,
firstname: 1,
lastname: 1,
name: 1,
reports: 1,
lastlogin: 1,
},
},
]
The companies collection has an index on companyref, and the users collection has an index on userref.
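One pointer, going by the indexes listed above: the join matches companies.companyref against users.companyref, but the only users index mentioned is on userref. The sketch below assumes MongoDB 5.0+ (which accepts `localField`/`foreignField` together with `pipeline`) and a new index on users.companyref; `buildUserLookup` and `escapeRegex` are hypothetical helpers, and escaping also stops characters typed into the search box from being read as regex syntax.

```javascript
// Suggested index (an assumption, run once in the mongo shell):
//   db.users.createIndex({ companyref: 1 })

// Escape regex metacharacters so the search text is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: the question's $lookup rewritten with the concise
// equality join; the sub-pipeline mirrors the original one.
function buildUserLookup(searchString) {
  const re = new RegExp(escapeRegex(searchString));
  return {
    $lookup: {
      from: "users",
      localField: "companyref", // companies.companyref
      foreignField: "companyref", // users.companyref (the field to index)
      let: { companyname: "$companyname" },
      pipeline: [
        {
          $addFields: {
            lastreportdate: { $arrayElemAt: [{ $slice: ["$reports.date", -1] }, 0] },
            kd: { $objectToArray: "$keydata" },
            name: { $concat: ["$firstname", " ", "$lastname"] },
            companyname: "$$companyname",
          },
        },
        {
          $match: {
            $or: [{ "kd.v": re }, { name: re }, { email: re }, { userref: re }, { username: re }],
          },
        },
        { $project: { kd: 0, lastreportdate: 0 } },
      ],
      as: "users",
    },
  };
}
```

The unanchored regexes still have to examine every user of each company, so this can speed up the join, not the text search itself.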
I have two functionalities working individually but want to combine them.
Functionality 1 - Sort users by their geoNear distance.
Functionality 2 - The users should not have already been liked by the current user (look up partnership collection)
How can I update this query to start from the users collection so I can use $geoNear?
The output in the mongoplayground below is correct, except that the resulting users are not sorted by calculatedDist, which is a field calculated by $geoNear.
$geoNear: {
near: { type: "Point", coordinates: [x, y] },
distanceField: "calculatedDist",
spherical: true
}
$geoNear needs the location field, which exists only in the users collection, so I think the query below needs to be modified to start from the users collection.
https://mongoplayground.net/p/7H_NxciKezB
db={
users: [
{
_id: "abc",
name: "abc",
group: 1,
location: {
type: "Point",
coordinates: [
54.23,
67.12
]
},
calculatedDist: 13
},
{
_id: "xyz",
name: "xyyy",
group: 1,
location: {
type: "Point",
coordinates: [
54.23,
67.12
]
},
calculatedDist: 11
},
{
_id: "123",
name: "yyy",
group: 1,
location: {
type: "Point",
coordinates: [
54.23,
67.12
]
},
calculatedDist: 2
},
{
_id: "rrr",
name: "tttt",
group: 1,
location: {
type: "Point",
coordinates: [
54.23,
67.12
]
},
calculatedDist: 11
},
{
_id: "eee",
name: "uuu",
group: 1,
location: {
type: "Point",
coordinates: [
54.23,
67.12
]
},
calculatedDist: 7
},
],
partnership: [
{
_id: "abc_123",
fromUser: "abc",
toUser: "123"
},
{
_id: "eee_rrr",
fromUser: "eee",
toUser: "rrr"
},
{
_id: "rrr_abc",
fromUser: "rrr",
toUser: "abc"
},
{
_id: "abc_rrr",
fromUser: "abc",
toUser: "rrr"
},
{
_id: "xyz_rrr",
fromUser: "xyz",
toUser: "rrr"
},
{
_id: "rrr_eee",
fromUser: "rrr",
toUser: "eee"
},
]
}
As far as I know, $geoNear has to be the first stage, so my query should start with the users collection. That breaks my partnership check, because for that to work I start from the partnership collection.
In the playground above, user eee has a smaller calculated distance than abc, but it appears after abc.
Try this out:
db.partnership.aggregate([
// $geoNear
{
$match: {
$or: [
{
fromUser: "rrr"
},
{
toUser: "rrr"
}
]
}
},
{
$group: {
_id: 0,
from: {
$addToSet: "$fromUser"
},
to: {
$addToSet: "$toUser"
}
}
},
{
$project: {
_id: 0,
users: {
$filter: {
input: {
$setIntersection: [
"$from",
"$to"
]
},
cond: {
$ne: [
"$$this",
"rrr"
]
}
}
}
}
},
{
$lookup: {
from: "users",
let: {
userId: "$users"
},
pipeline: [
{
"$geoNear": {
"near": {
"type": "Point",
"coordinates": [
31.4998,
-61.4065
]
},
"distanceField": "calculatedDist",
"spherical": true
}
},
{
"$match": {
"$expr": {
"$in": [
"$_id",
"$$userId"
]
}
}
}
],
as: "users"
}
},
{
$project: {
users: 1,
count: {
$size: "$users"
}
}
}
])
Here, we use the pipelined form of lookup.
The lookup is on the users collection, and its pipeline has the $geoNear stage as the first stage.
Finally, the $match keeps only the users belonging to a partnership.
Let me know if it works; I can't test it on the playground because $geoNear requires a geospatial index (2dsphere for GeoJSON points).
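To check the expected order by hand, here is an in-memory JavaScript sketch (an illustration only, no database involved) of what the answer's pipeline computes on the playground data for user "rrr". Only `_id` and `calculatedDist` are kept from the sample users, and the function name `partnersByDistance` is hypothetical.

```javascript
// In-memory illustration only (not MongoDB code): mirrors the answer's
// $match, $group/$addToSet and $setIntersection steps, then orders the
// surviving users by calculatedDist, as $geoNear would.
const users = [
  { _id: "abc", calculatedDist: 13 },
  { _id: "xyz", calculatedDist: 11 },
  { _id: "123", calculatedDist: 2 },
  { _id: "rrr", calculatedDist: 11 },
  { _id: "eee", calculatedDist: 7 },
];
const partnership = [
  { fromUser: "abc", toUser: "123" },
  { fromUser: "eee", toUser: "rrr" },
  { fromUser: "rrr", toUser: "abc" },
  { fromUser: "abc", toUser: "rrr" },
  { fromUser: "xyz", toUser: "rrr" },
  { fromUser: "rrr", toUser: "eee" },
];

function partnersByDistance(allUsers, partnerships, me) {
  // $match: partnerships where `me` is on either side.
  const mine = partnerships.filter((p) => p.fromUser === me || p.toUser === me);
  // $group with $addToSet on each side.
  const from = new Set(mine.map((p) => p.fromUser));
  const to = new Set(mine.map((p) => p.toUser));
  // $setIntersection of both sets, then $filter out `me`.
  const ids = [...from].filter((id) => to.has(id) && id !== me);
  // $lookup + $match on _id, in ascending distance order.
  return allUsers
    .filter((u) => ids.includes(u._id))
    .sort((a, b) => a.calculatedDist - b.calculatedDist);
}

console.log(partnersByDistance(users, partnership, "rrr").map((u) => u._id));
// → [ 'eee', 'abc' ]
```

With $geoNear as the first stage of the $lookup sub-pipeline, the users array should come back in this same distance order, with eee before abc.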
Using the stored calculatedDist field instead (sorting inside the $lookup pipeline), it looks like this:
db.partnership.aggregate([
// $geoNear
{
$match: {
$or: [
{
fromUser: "rrr"
},
{
toUser: "rrr"
}
]
}
},
{
$group: {
_id: 0,
from: {
$addToSet: "$fromUser"
},
to: {
$addToSet: "$toUser"
}
}
},
{
$project: {
_id: 0,
users: {
$filter: {
input: {
$setIntersection: [
"$from",
"$to"
]
},
cond: {
$ne: [
"$$this",
"rrr"
]
}
}
}
}
},
{
$lookup: {
from: "users",
let: {
userId: "$users"
},
pipeline: [
{
$sort: {
calculatedDist: 1
}
},
{
"$match": {
"$expr": {
"$in": [
"$_id",
"$$userId"
]
}
}
}
],
as: "users"
}
},
{
$project: {
users: 1,
count: {
$size: "$users"
}
}
}
])
Given a basic Mongoose user schema that uses timestamps (hence createdAt & updatedAt are available by default), I am trying to build a query that includes pagination server-side using aggregation.
So far I have a query that works well when there is data in the database and the filter matches at least one item in the collection. It returns:
{
"users": [
{
"_id": "61c331f4bd87407c01b81324",
"bio": "🚀",
"surname": "Doe",
"name": "John",
"verified": true,
"disabled": false,
"username": "user1235",
"score": 1
}
],
"pagination": {
"total": 1,
"limit": 10,
"page": 1,
"pages": 1
}
}
However, if no item matches the criteria, this just returns an empty array and the pagination part is lost. My guess is that the $facet stage is the part not working as I expect.
const findUsersQuery = await this.userModel.aggregate()
.match(query)
.sort({ createdAt: -1 })
.project({
password: 0, email: 0, roles: 0, __v: 0,
facebookId: 0, googleId: 0, createdAt: 0,
updatedAt: 0, birthday: 0
})
.facet({
total: [{
$count: 'createdAt'
}],
data: [{
$addFields: {
_id: '$_id'
}
}]
})
.unwind('$total')
.project({
users: {
$slice: ['$data', ((page * limit) - limit), {
$ifNull: [limit, '$total.createdAt']
}]
},
pagination: {
total: '$total.createdAt',
limit: {
$literal: limit
},
page: {
$literal: page
},
pages: {
$ceil: {
$divide: ['$total.createdAt', limit]
}
},
}
})
return findUsersQuery[0]
I have tried using $ifNull in several places, even in the $facet stage, to return 0 or an empty array and keep the same response structure (the users array and the pagination piece), to no avail:
.project({
users: {
$ifNull: [
{
$slice: ['$data', ((page * limit) - limit), {
$ifNull: [limit, '$total.createdAt']
}]
},
[]
]
},
pagination: {
total: {
$ifNull: ['$total.createdAt', 0]
},
limit: {
$literal: limit
},
page: {
$literal: page
},
pages: {
$ifNull: [
{
$ceil: {
$divide: ['$total.createdAt', limit]
}
},
0
]
},
}
})
How can I use $ifNull (if that is the best approach) in order to keep the same response structure when no items match the criteria? Like this:
{
"users": [],
"pagination": {
"total": 0,
"limit": 10,
"page": 1,
"pages": 1
}
}
Any help would be very much appreciated.
You do not need $ifNull to keep the response; the problem comes from how $unwind behaves. When the total array from $facet is empty, $unwind drops the whole document.
Use the preserveNullAndEmptyArrays option of $unwind. From the docs:
New in version 3.2: To output documents where the array field is missing, null or an empty array, use the preserveNullAndEmptyArrays option.
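This keeps the document even when the total array is empty. $total.createdAt is then missing, so pagination.total is omitted and pages comes out null; keep the $ifNull defaults from your second attempt if you want total: 0 there. The $unwind stage becomes: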
.unwind({
path: "$total",
preserveNullAndEmptyArrays: true
})
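If you would rather handle the empty case in application code (an alternative, not the answer's approach), a small normalizing function can always return the shape you want. `toPaginatedResponse` is a hypothetical name; `rows` is what `aggregate()` resolves to (`findUsersQuery` in the question), and `pages` is kept at a minimum of 1 to match the desired output.

```javascript
// Hypothetical helper: keeps the { users, pagination } shape even when the
// aggregation returns no document at all.
function toPaginatedResponse(rows, page, limit) {
  const doc = rows[0] || {};
  const total =
    doc.pagination && typeof doc.pagination.total === "number" ? doc.pagination.total : 0;
  return {
    users: Array.isArray(doc.users) ? doc.users : [],
    pagination: {
      total,
      limit,
      page,
      // At least one page, as in the desired output from the question.
      pages: Math.max(1, Math.ceil(total / limit)),
    },
  };
}

// Usage: return toPaginatedResponse(findUsersQuery, page, limit);
```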
I have a document with a subdocument (not referenced). I want to apply the aggregation on the field of the subdocument.
Schema
const MFileSchema = new Schema({
path: String,
malwareNames: [String],
title: String,
severity: String // i want to aggregate bases on this field
});
const ScanSchema = new Schema({
agent: { type: Schema.Types.ObjectId, ref: "Agent" },
completedAt: Date,
startedAt: { type: Date, default: Date.now }, // pass the function, not Date.now(), so each document gets its own timestamp
mFiles: [MFileSchema] // array of malicious files schema
});
Model
let Scan = model("Scan", ScanSchema);
Task
Count the malicious files of each severity across all scan documents belonging to particular agents.
// agents is an array of Agent _ids (the Agent schema isn't important here)
The Aggregation Query I am using
let c = await Scan.aggregate([
{ $match: { agent: agents } },
{ $project: { "mFiles.severity": true } },
{ $group: { _id: "$mFiles.severity", count: { $sum: 1 } } }
]);
console.log(c);
Actual Output
[]
Expected Output
// The value of count in this question is arbitrary
[
{ _id: "Critical", count: 30 },
{ _id: "Moderate", count: 33 },
{ _id: "Clean", count: 500 }
]
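PS: I would also appreciate suggestions for the best resources to learn MongoDB aggregations.
You need to use the $in query operator in the $match stage, and add an $unwind stage before the $group stage. As written, { agent: agents } compares agent with the whole array, so nothing matches; and without $unwind, "$mFiles.severity" is an array, so each document forms its own group.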
db.collection.aggregate([
{
$match: {
agent: {
$in: [
"5e2c98fc3d785252ce5b5693",
"5e2c98fc3d785252ce5b5694"
]
}
}
},
{
$project: {
"mFiles.severity": true
}
},
{
$unwind: "$mFiles"
},
{
$group: {
_id: "$mFiles.severity",
count: {
$sum: 1
}
}
}
])
Sample data:
[
{
"agent": "5e2c98fc3d785252ce5b5693",
"mFiles": [
{
"title": "t1",
"severity": "Critical"
},
{
"title": "t2",
"severity": "Critical"
},
{
"title": "t3",
"severity": "Moderate"
},
{
"title": "t4",
"severity": "Clean"
}
]
},
{
"agent": "5e2c98fc3d785252ce5b5694",
"mFiles": [
{
"title": "t5",
"severity": "Critical"
},
{
"title": "t6",
"severity": "Critical"
},
{
"title": "t7",
"severity": "Moderate"
}
]
}
]
Output:
[
{
"_id": "Moderate",
"count": 2
},
{
"_id": "Critical",
"count": 4
},
{
"_id": "Clean",
"count": 1
}
]
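Without $unwind, "$mFiles.severity" resolves to the whole array of severities for each scan, so $group makes one group per distinct array rather than one per severity. The plain JavaScript sketch below (an illustration, not MongoDB code; `countBySeverity` is a hypothetical name) reproduces what $unwind followed by $group computes on the sample data above.

```javascript
// In-memory illustration only: what $unwind + $group do to the sample data.
const scans = [
  {
    agent: "5e2c98fc3d785252ce5b5693",
    mFiles: [
      { title: "t1", severity: "Critical" },
      { title: "t2", severity: "Critical" },
      { title: "t3", severity: "Moderate" },
      { title: "t4", severity: "Clean" },
    ],
  },
  {
    agent: "5e2c98fc3d785252ce5b5694",
    mFiles: [
      { title: "t5", severity: "Critical" },
      { title: "t6", severity: "Critical" },
      { title: "t7", severity: "Moderate" },
    ],
  },
];

function countBySeverity(docs) {
  const counts = {};
  for (const doc of docs) {
    // $unwind: "$mFiles" emits one document per array element.
    for (const file of doc.mFiles) {
      // $group: { _id: "$mFiles.severity", count: { $sum: 1 } }
      counts[file.severity] = (counts[file.severity] || 0) + 1;
    }
  }
  return counts;
}

console.log(countBySeverity(scans));
// → { Critical: 4, Moderate: 2, Clean: 1 }
```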
For mongoose integration:
// agents must be an array of ObjectIds like this:
// [ObjectId("5e2c98fc3d785252ce5b5693"), ObjectId("5e2c98fc3d785252ce5b5694")]
// Mongoose does not cast values inside aggregation pipelines, so convert plain
// string ids first, e.g. agents = ids.map((id) => new ObjectId(id));
const ObjectId = require("mongoose").Types.ObjectId;
let c = await Scan.aggregate([
{
$match: {
agent: {
$in: agents
}
}
},
{
$project: {
"mFiles.severity": true
}
},
{
$unwind: "$mFiles"
},
{
$group: {
_id: "$mFiles.severity",
count: {
$sum: 1
}
}
}
]);
The best place to learn MongoDB aggregation is the official documentation.
Running an aggregation such as the following:
[
{
"$match":{
"datasourceName":"Startup Failures",
"sheetName":"Data",
"Cost":{
"$exists":true
},
"Status":{
"$exists":true
}
}
},
{
"$group":{
"Count of Cost":{
"$sum":1
},
"Count of Status":{
"$sum":1
},
"_id":null
}
},
{
"$project":{
"Count of Cost":1,
"Count of Status":1
}
}
]
The $exists filters remove every document in which Cost or Status is missing, so both counts in the projection come out the same. I don't want to filter whole documents, only the individual columns: one count should be the number of documents where Cost exists (Count of Cost), and the other the number of documents where Status exists (Count of Status). For my data these are two different numbers.
Here is an aggregation using $facet, which runs several sub-pipelines over the same set of input documents within a single stage. So we query and count Cost and Status as two facets of the same query.
db.test.aggregate( [
{
$match: { fld1: "Data" }
},
{
$facet: {
cost: [
{ $match: { cost: { $exists: true } } },
{ $count: "count" }
],
status: [
{ $match: { status: { $exists: true } } },
{ $count: "count" }
],
}
},
{
$project: {
costCount: { $arrayElemAt: [ "$cost.count" , 0 ] },
statusCount: { $arrayElemAt: [ "$status.count" , 0 ] }
}
}
] )
I get a result of { "costCount" : 4, "statusCount" : 3 }, using the following documents:
{ _id: 1, fld1: "Data", cost: 12, status: "Y" },
{ _id: 2, fld1: "Data", status: "N" },
{ _id: 3, fld1: "Data" },
{ _id: 4, fld1: "Data", cost: 90 },
{ _id: 5, fld1: "Data", cost: 44 },
{ _id: 6, fld1: "Data", cost: 235, status: "N" },
{ _id: 9, fld1: "Stuff", cost: 0, status: "Y" }
NOTE: A similar query using $facet appears in the question "MongoDB Custom sorting on two fields".
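An alternative that avoids $facet (a sketch, not from the original answer) is a single $group that adds 1 per document only when the field is present. The $type expression returns the string "missing" for an absent field, so this mirrors `$exists: true`, which also counts fields set to null. The helper name `buildPresenceCountPipeline` is hypothetical; field names follow the test documents above.

```javascript
// Hypothetical helper: one $group counting, per field, the documents in which
// that field is present.
function buildPresenceCountPipeline(match, fields) {
  const group = { _id: null };
  for (const field of fields) {
    group[`${field}Count`] = {
      // $type yields "missing" when the field is absent from the document.
      $sum: { $cond: [{ $ne: [{ $type: `$${field}` }, "missing"] }, 1, 0] },
    };
  }
  return [{ $match: match }, { $group: group }, { $project: { _id: 0 } }];
}

// Usage (mongo shell):
// db.test.aggregate(buildPresenceCountPipeline({ fld1: "Data" }, ["cost", "status"]))
```

On the documents above this should give the same costCount and statusCount as the $facet version, while reading the matched documents in a single $group.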