Storing/Saving only a particular object from list of objects along with parent object in MongoDB

In mongodb, I have a master table called category
Sample data as below:
{
    "_id" : "63d3e01f43aa4e0ee349f841",
    "subCategories" : [
        { "subCategoryId" : NumberLong(1), "name": "Mobile phones" },
        { "subCategoryId" : NumberLong(2), "name": "XYZ Machine" }
    ]
}
There is another table called product. Sample data as below:
{
    "_id" : "63d3e13b43aa4e0ee349f842",
    "productId" : NumberLong(1),
    "name" : "iphone 14",
    "category" : DBRef("category", "63d3e01f43aa4e0ee349f841")
}
While adding a new product, only 1 category and 1 subcategory from that selected category can be selected. I am using @DBRef, and I am struggling to find a way to save only 1 subcategory within the product table. Right now the reference points to an entire object of the category table, which can contain any number of subcategories.
Is it possible to achieve this using the @DBRef annotation without changing the database structure and without splitting the category table records into separate category and subcategory tables?
Maybe something like this:
{
    "_id" : "63d3e13b43aa4e0ee349f842",
    "productId" : NumberLong(1),
    "name" : "iphone 14",
    "category" : DBRef({"category", "63d3e01f43aa4e0ee349f841"},
    "subCategoryId", 1)
}
I am using MongoDB version 4+ with Java and spring-data-mongo.

I don't think it is possible to achieve your expected behaviour without changing the schema. From the official documentation on DBRef:
"DBRefs are a convention for representing a document, rather than a specific reference type."
So a DBRef points to a whole document, not to a particular entry in an array of sub-documents.
This leaves us with 2 options:
1. Change the category collection to store one document per subcategory, like this:
{
    "categoryId" : "63d3e01f43aa4e0ee349f841", // this is new
    "subCategoryId" : NumberLong(1),
    "name": "Mobile phones"
}
Unfortunately this is ruled out, as changing the schema is not allowed.
2. Add another field to the product schema to store the subcategory id, and use it to locate the subcategory entry in a $lookup:
{
    "_id": "63d3e13b43aa4e0ee349f842",
    "productId": NumberLong(1),
    "name": "iphone 14",
    "category": {
        "$ref": "category",
        "$id": "63d3e01f43aa4e0ee349f841"
    },
    "subCategoryId": NumberLong(1) // this is new
}
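The aggregation:
db.product.aggregate([
    { $match: { "_id": "63d3e13b43aa4e0ee349f842" } },
    { "$lookup": {
        "from": "category",
        "let": {
            categoryId: "$category.$id",
            subCategoryId: "$subCategoryId"
        },
        "pipeline": [
            // keep only the referenced category document
            { $match: { $expr: { $eq: ["$_id", "$$categoryId"] } } },
            // one document per subcategory
            { $unwind: "$subCategories" },
            // keep only the selected subcategory
            { $match: { $expr: { $eq: ["$subCategories.subCategoryId", "$$subCategoryId"] } } }
        ],
        "as": "subCategoryLookup"
    } }
])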
Strictly speaking this is also ruled out, since it adds one more field to the product schema. I would still suggest it, because it is the smallest possible change to the schema.
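If the lookup is done in application code rather than in the database, the second option (an extra subCategoryId field on the product) boils down to a simple search in the category's subCategories array. The following is only a minimal sketch in plain JavaScript: resolveSubCategory is a hypothetical helper, not part of any driver, and the NumberLong values are written as plain numbers so that the snippet runs outside the mongo shell.

```javascript
// Hypothetical helper: returns the single subcategory a product points at,
// or null when the category or the subcategory does not match.
function resolveSubCategory(product, category) {
  if (product.category.$id !== category._id) {
    return null; // the DBRef targets a different category document
  }
  const found = category.subCategories.find(
    (sub) => sub.subCategoryId === product.subCategoryId
  );
  return found === undefined ? null : found;
}

// Sample documents from the question (NumberLong written as plain numbers).
const category = {
  _id: "63d3e01f43aa4e0ee349f841",
  subCategories: [
    { subCategoryId: 1, name: "Mobile phones" },
    { subCategoryId: 2, name: "XYZ Machine" },
  ],
};
const product = {
  _id: "63d3e13b43aa4e0ee349f842",
  productId: 1,
  name: "iphone 14",
  category: { $ref: "category", $id: "63d3e01f43aa4e0ee349f841" },
  subCategoryId: 1, // the extra field from the second option
};

const selected = resolveSubCategory(product, category);
console.log(selected.name); // prints "Mobile phones"
```

The category document itself stays untouched; only the product carries the additional id.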


Remove documents not linked by DBRef mongodb

Hello everyone, I am trying to remove all documents that are not referenced by a DBRef field.
For example, I have 2 collections, User and Order.
Let's say the User collection has this structure:
{
    "_id": "12345",
    "lastName": "Michael",
    "firstName": "Bernard"
}
And the Order collection has this structure:
{
    "_id": "123456",
    "orderDate": "2022-12-26",
    "userWhoOrder": {
        "$ref": "user",
        "$id": "12345"
    },
    "userWhoPay": {
        "$ref": "user",
        "$id": "123456"
    }
}
Note: userWhoOrder and userWhoPay are DBRef fields (and they can point to the same user).
So I want to remove all users who are not present in the order collection (neither as userWhoOrder nor as userWhoPay).
I know I can do this in 2 steps:
1. Get the list of userWhoOrder and userWhoPay ids from the order collection.
2. Filter out the users that are not in the list from step 1 and remove them.
But I want to know if there is a proper way to do this with a single request (using $lookup, for example).
Here is what I tried in order to get the list of users to remove:
db.getCollection("user").aggregate([
    {$lookup: {
        from: "order",
        let: {userId: "$_id"},
        pipeline: [
            {$match: {$expr: {
                $and: [
                    {$ne: ["$userWhoOrder.$id", "$$userId"]},
                    {$ne: ["$userWhoPay.$id", "$$userId"]}
                ]
            }}}
        ],
        as: "result"
    }}
])
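For reference, the two-step approach described above can be sketched in plain JavaScript over in-memory arrays. This is only an illustration of the logic, not the single-request solution being asked for; the function names and the extra sample users are hypothetical.

```javascript
// Step 1: collect every user id that any order refers to,
// through either of the two DBRef fields.
function referencedUserIds(orders) {
  const ids = new Set();
  for (const order of orders) {
    for (const field of ["userWhoOrder", "userWhoPay"]) {
      const ref = order[field];
      if (ref && ref.$id !== undefined) {
        ids.add(ref.$id);
      }
    }
  }
  return ids;
}

// Step 2: keep only the ids of users that no order refers to.
function unreferencedUserIds(users, orders) {
  const referenced = referencedUserIds(orders);
  return users.map((user) => user._id).filter((id) => !referenced.has(id));
}

// Sample data: the first user and the order come from the question,
// the other two users are made up for the example.
const users = [
  { _id: "12345", lastName: "Michael", firstName: "Bernard" },
  { _id: "123456", lastName: "Hypothetical", firstName: "Payer" },
  { _id: "99999", lastName: "Hypothetical", firstName: "Orphan" },
];
const orders = [
  {
    _id: "123456",
    orderDate: "2022-12-26",
    userWhoOrder: { $ref: "user", $id: "12345" },
    userWhoPay: { $ref: "user", $id: "123456" },
  },
];

const toRemove = unreferencedUserIds(users, orders);
console.log(toRemove); // prints [ '99999' ]
```

In the shell, a list like toRemove could then be passed to a delete with an $in filter on _id.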

MongoDB - average of a feature after slicing the max of another feature in a group of documents

I am very new to MongoDB and am trying to work out a couple of queries, which I am not even sure are feasible.
The structure of each document is:
{
    "_id" : {
        "$oid": Text
    },
    "grade": Text,
    "type" : Text,
    "score": Integer,
    "info" : {
        "range" : NumericText,
        "genre" : Text,
        "special": {keys:values}
    }
}
The first query would give me:
per grade (thinking I have to group by "grade")
the highest range (thinking I have to call $max:$range, it should work with a string)
the score average (thinking I have to call $avg:$score)
I tried something like the following, which apparently is wrong:
'$group': {'_id':'$grade',
'highest_range': {'$max':'$info',
'average_score': {'$avg':'$score'}}}
The second query would give the distinct genre records.
Any help is valuable!
ADDITION - providing an example of the document and the output:
{
    "_id" : {
        "$oid": '60491ea71f8'
    },
    "grade": D,
    "type" : Shop,
    "score": 4,
    "info" : {
        "range" : "2",
        "genre" : 'Pet shop',
        "special": {'ClientsParking':True, ...}
    }
}
And the output I am looking for is something along these lines:
[{grade: A, "highest_range":"4", "average_score":3.5},
{grade: B, "highest_range":"7", "average_score":8.3},
{grade: C, "highest_range":"3", "average_score":2.4}]
I think you are looking for this:
{
    '$group': {
        '_id': '$grade',
        'highest_range': { '$max': '$info.range' },
        'average_score': { '$avg': '$score' }
    }
}
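However, $avg works only on numbers. $min and $max do accept strings, but they compare them lexicographically, so "10" sorts before "9". Since range is stored as text, convert it to a number first (for example with $toInt, available since MongoDB 4.0) if you want the numeric maximum.
You could also try { '$first': '$info.range' } or { '$last': '$info.range' }, but these need a preceding $sort to give a proper result. It is not clear what you mean by "highest range".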
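To make the string-versus-number point concrete, here is a small sketch in plain JavaScript that mimics the grouping in memory and converts range to a number before taking the maximum. The groupStage object shows how the same idea might look as a pipeline stage, assuming MongoDB 4.0+ for $toInt; summarizeByGrade and the sample documents are hypothetical.

```javascript
// Assumed pipeline stage: convert the text range to an integer before $max.
const groupStage = {
  $group: {
    _id: "$grade",
    highest_range: { $max: { $toInt: "$info.range" } },
    average_score: { $avg: "$score" },
  },
};

// In-memory equivalent of the stage above.
function summarizeByGrade(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const group = groups.get(doc.grade) || { highest: -Infinity, total: 0, count: 0 };
    group.highest = Math.max(group.highest, Number(doc.info.range));
    group.total += doc.score;
    group.count += 1;
    groups.set(doc.grade, group);
  }
  return [...groups].map(([grade, group]) => ({
    grade: grade,
    highest_range: group.highest,
    average_score: group.total / group.count,
  }));
}

// Made-up documents that follow the structure from the question.
const docs = [
  { grade: "A", score: 3, info: { range: "9" } },
  { grade: "A", score: 4, info: { range: "10" } },
  { grade: "B", score: 8, info: { range: "7" } },
];

console.log("10" < "9"); // prints true: as strings, "10" sorts before "9"
const summary = summarizeByGrade(docs);
console.log(summary);
```

For grade A the numeric maximum is 10, whereas a plain string comparison would have picked "9".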

Compare a date of two elements

My problem is difficult to explain.
On my website I save every action of my visitors (view, click, buy, etc.).
I have a simple collection named "flow" where my data is stored:
{
    "_id" : ObjectId("534d4a9a37e4fbfc0bf20483"),
    "profile" : ObjectId("534bebc32939ffd316a34641"),
    "activities" : [
        {
            "id" : ObjectId("534bebc42939ffd316a3af62"),
            "date" : ISODate("2013-12-13T22:39:45.808Z"),
            "verb" : "like",
            "product" : "5"
        },
        {
            "id" : ObjectId("534bebc52939ffd316a3f480"),
            "date" : ISODate("2013-12-20T19:19:10.098Z"),
            "verb" : "view",
            "product" : "6"
        },
        {
            "id" : ObjectId("534bebc32939ffd316a3690f"),
            "date" : ISODate("2014-01-01T07:11:44.902Z"),
            "verb" : "buy",
            "product" : "5"
        },
        {
            "id" : ObjectId("534bebc42939ffd316a3741b"),
            "date" : ISODate("2014-01-11T08:49:02.684Z"),
            "verb" : "favorite",
            "product" : "26"
        }
    ]
}
I would like to aggregate this data to retrieve the number of people who performed one action (for example "view") and then another one later in time (for example "buy"). To do that I need to compare the "date" values inside my "activities" array...
I tried to use the aggregation framework for this, but I do not see how to write the query.
This is what I have so far:
db.flow.aggregate([
    { $project: { profile: 1, activities: 1, _id: 0 } },
    { $match: { $and: [{'activities.verb': 'view'}, {'activities.verb': 'buy'}] }}, //First verb + second verb
    { $unwind: '$activities' },
    { $match: { 'activities.verb': {$in:['view', 'buy']} } }, //First verb + second verb
    { $group: {
        _id: '$profile',
        view: { $push: { $cond: [ { $eq: [ "$activities.verb", "view" ] } , "$activities.date", null ] } },
        buy: { $push: { $cond: [ { $eq: [ "$activities.verb", "buy" ] } , "$activities.date", null ] } }
    }}
])
Maybe the format of my collection "flow" is not the best for what I want... If you have a better idea, don't hesitate.
Thank you for your help !
Here is the aggregation that will give you the total number of buyers who viewed first and then bought (though not necessarily the same product that they viewed).
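db.flow.aggregate([
    {$match: {"activities.verb":{$all:["view","buy"]}}},
    {$unwind :"$activities"},
    {$match: {"activities.verb":{$in:["view","buy"]}}},
    {$group: { _id: "$profile",
        firstViewed: {$min:{$cond:{
            if: {$eq:["$activities.verb","view"]},
            then : "$activities.date",
            else : new Date(9999,0,1)}}},
        lastBought: {$max:{$cond:{
            if: {$eq:["$activities.verb","buy"]},
            then : "$activities.date",
            else:new Date(1900,0,1)}}}
    }},
    {$project: {viewedThenBought:{$cond:{
        if: {$lt:["$firstViewed","$lastBought"]}, then: 1, else: 0}}}},
    {$group: {_id: null, count: {$sum: "$viewedThenBought"}}}
])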
Here you first pass through the pipeline only the documents that have all the "verbs" you are interested in. When you group the first time, you want to use the earliest "view" and the last "buy" and the next project compares them to see if they viewed before they bought.
The last step gives you the count of all the people who satisfied your criteria.
Be careful to leave out all $project stages that don't actually compute any new fields (like your very first $project). The aggregation framework is smart enough to never pass through any fields that it sees are not used in any later stages, so there is never a need to $project just to "eliminate" fields, as that will happen automatically.
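The "earliest view before latest buy" rule can also be checked in plain JavaScript, which may help when verifying the pipeline's result on a small sample. This is only a sketch of the same logic: countViewedThenBought is a hypothetical name, it assumes one "flow" document per profile, and the sample documents are made up.

```javascript
// Counts the flow documents whose earliest "view" precedes their latest "buy".
function countViewedThenBought(flows) {
  let total = 0;
  for (const flow of flows) {
    let firstViewed = Infinity; // plays the role of the far-future default date
    let lastBought = -Infinity; // plays the role of the far-past default date
    for (const activity of flow.activities) {
      const time = activity.date.getTime();
      if (activity.verb === "view") firstViewed = Math.min(firstViewed, time);
      if (activity.verb === "buy") lastBought = Math.max(lastBought, time);
    }
    if (firstViewed < lastBought) total += 1;
  }
  return total;
}

const flows = [
  { profile: "p1", activities: [ // viewed, then bought
    { verb: "view", date: new Date("2013-12-20T19:19:10.098Z") },
    { verb: "buy", date: new Date("2014-01-01T07:11:44.902Z") },
  ] },
  { profile: "p2", activities: [ // bought before viewing
    { verb: "buy", date: new Date("2013-12-01T10:00:00.000Z") },
    { verb: "view", date: new Date("2013-12-05T10:00:00.000Z") },
  ] },
  { profile: "p3", activities: [ // viewed only
    { verb: "view", date: new Date("2013-12-07T10:00:00.000Z") },
  ] },
];

const viewedThenBought = countViewedThenBought(flows);
console.log(viewedThenBought); // prints 1
```

The Infinity defaults serve the same purpose as the year 9999 and year 1900 placeholder dates: a missing verb can never satisfy the comparison.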
For your query:
I would like to aggregate these data to retrieve the number of people who made an action
Try this:
db.flow.aggregate([
    // De-normalize the array into individual documents
    {"$unwind" : "$activities"},
    // Match for the verbs you are interested in
    {"$match" : {"activities.verb":{$in:["buy", "view"]}}},
    // Group by verb to get the count
    {"$group" : {_id:"$activities.verb", count:{$sum:1}}}
])
The above query would produce an output like:
{
    "result" : [
        { "_id" : "buy", "count" : 1 },
        { "_id" : "view", "count" : 1 }
    ],
    "ok" : 1
}
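Note: MongoDB ANDs multiple conditions implicitly, so an explicit $and is usually not required. In your query ({ $match: { $and: [{'activities.verb': 'view'}, {'activities.verb': 'buy'}] }}), however, both conditions test the same field, so they cannot simply be listed as two keys of one query document (the second would overwrite the first). Keep the $and there, or write the condition as {'activities.verb': {$all: ['view', 'buy']}}. A logical OR always needs the $or operator.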
If you want to use the date in the aggregation query to do queries like how many "views by day", etc.. the Date Aggregation Operators will come in handy.
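As a rough illustration of "views by day", the sketch below shows one way such stages might look using the $year, $month and $dayOfMonth date operators, followed by a plain JavaScript equivalent that can be run without a database. The viewsPerDayStages name, the countPerDay helper and the second sample activity are hypothetical.

```javascript
// Assumed pipeline stages: group the "view" activities by calendar day (UTC).
const viewsPerDayStages = [
  { $unwind: "$activities" },
  { $match: { "activities.verb": "view" } },
  { $group: {
    _id: {
      year: { $year: "$activities.date" },
      month: { $month: "$activities.date" },
      day: { $dayOfMonth: "$activities.date" },
    },
    count: { $sum: 1 },
  } },
];

// In-memory equivalent: counts the activities with the given verb per UTC day.
function countPerDay(flows, verb) {
  const counts = {};
  for (const flow of flows) {
    for (const activity of flow.activities) {
      if (activity.verb !== verb) continue;
      const day = activity.date.toISOString().slice(0, 10); // "YYYY-MM-DD"
      counts[day] = (counts[day] || 0) + 1;
    }
  }
  return counts;
}

const sampleFlows = [
  { profile: "p1", activities: [
    { verb: "view", date: new Date("2013-12-20T19:19:10.098Z") },
    { verb: "view", date: new Date("2013-12-20T23:00:00.000Z") },
    { verb: "buy", date: new Date("2014-01-01T07:11:44.902Z") },
    { verb: "view", date: new Date("2014-01-11T08:49:02.684Z") },
  ] },
];

const viewsPerDay = countPerDay(sampleFlows, "view");
console.log(viewsPerDay); // prints { '2013-12-20': 2, '2014-01-11': 1 }
```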
I see where you are going with this, and I think you are basically on the right track. So here it is more or less unaltered (apart from formatting preference), with a few tweaks at the end:
db.flow.aggregate([
    // Try to $match "first" always to make sure you can use an index
    { "$match": {
        "$and": [
            {"activities.verb": "view"},
            {"activities.verb": "buy"}
        ]
    }},
    // Don't worry, the optimizer "sees" this and will sort of "blend" it
    // with the first stage.
    { "$project": {
        "profile": 1,
        "activities": 1,
        "_id": 0
    }},
    { "$unwind": "$activities" },
    { "$match": {
        "activities.verb": { "$in":["view", "buy"] }
    }},
    { "$group": {
        "_id": "$profile",
        "view": { "$min": { "$cond": [
            { "$eq": [ "$activities.verb", "view" ] },
            "$activities.date",
            null
        ]}},
        "buy": { "$max": { "$cond": [
            { "$eq": [ "$activities.verb", "buy" ] },
            "$activities.date",
            null
        ]}}
    }},
    { "$project": {
        "viewFirst": { "$lt": [ "$view", "$buy" ] }
    }}
])
So essentially the $min and $max operators should be self-explanatory in this context: you are looking for the "first" view to compare with the "last" purchase. To me it would make more sense to match these by product (hint: "grouping"), but I'll leave that part up to you.
The other advantage here is that the null placeholders are ignored by $min and $max whenever there is an actual date to match the "verb", so only real dates are compared. (A false placeholder would not be safe with $min, since booleans sort before dates in the BSON comparison order.) Because the first $match guarantees that both verbs are present, each group ends up with a real date in both fields.
That matters because the next thing you do is $project to "compare" the values and ask the question "Did the 'view' happen before the 'buy'?", which is a logical evaluation using the "less than" $lt operator.
As for the schema itself: if you are storing a lot of these "events", then you are probably better off flattening things out into separate documents and finding some way to mark each with the same "session" identifier, if that is separate from "profile".
Getting away from large arrays (which this seems to lead to) is likely going to help performance and, with care, makes little difference to the aggregation process.
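A flattened layout could look roughly like the output of the sketch below, which turns one "flow" document into one document per activity. This is only one possible shape: flattenFlow is a hypothetical helper, reusing the flow's _id as the "session" marker is an assumption, and the ObjectId values are written as plain strings so the snippet runs outside the shell.

```javascript
// Hypothetical helper: one flat event document per entry of the activities array.
function flattenFlow(flow) {
  return flow.activities.map((activity) => ({
    profile: flow.profile,
    session: flow._id, // assumption: the flow _id doubles as the session marker
    date: activity.date,
    verb: activity.verb,
    product: activity.product,
  }));
}

const flow = {
  _id: "534d4a9a37e4fbfc0bf20483",
  profile: "534bebc32939ffd316a34641",
  activities: [
    { id: "534bebc52939ffd316a3f480", date: new Date("2013-12-20T19:19:10.098Z"), verb: "view", product: "6" },
    { id: "534bebc32939ffd316a3690f", date: new Date("2014-01-01T07:11:44.902Z"), verb: "buy", product: "5" },
  ],
};

const events = flattenFlow(flow);
console.log(events.length); // prints 2
console.log(events[1].verb); // prints "buy"
```

With documents shaped like this, the $unwind stage is no longer needed, and the verbs and dates can be matched and grouped directly.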