Merge two array items into one based on common field in MongoDB

I have the following collection (in JSON format):
/* 1 */
"_id" : NumberLong(1111),
"valueArray" : [
"accountNumber" : NumberLong(12345),
"levels" : [
"accountNumber" : NumberLong(67890),
"levels" : [
/* 2 */
"_id" : NumberLong(2222),
"valueArray" : [
"accountNumber" : NumberLong(33333),
"levels" : [
"accountNumber" : NumberLong(33333),
"levels" : [
"accountNumber" : NumberLong(44444),
"levels" : [
Notice that in the 2nd document two entries have the same accountNumber (33333). I want to merge these into a single entry. The output should look like this:
/* 1 */
"_id" : NumberLong(1111),
"valueArray" : [
"accountNumber" : NumberLong(12345),
"levels" : [
"accountNumber" : NumberLong(67890),
"levels" : [
/* 2 */
"_id" : NumberLong(2222),
"valueArray" : [
"accountNumber" : NumberLong(33333),
"levels" : [
"accountNumber" : NumberLong(44444),
"levels" : [
I tried multiple approaches ($concatArrays, $setUnion, etc.), but I keep ending up with one error or another. Even when I do get some output, it is not in the required format.
Can someone please help?

You can use a double $unwind to flatten the data to one document per valueArray.levels element, and then a double $group to aggregate those values, initially by accountNumber and then by _id. Try:
db.collection.aggregate([
  { $unwind: "$valueArray" },
  { $unwind: "$valueArray.levels" },
  { $group: {
      _id: { _id: "$_id", accountNumber: "$valueArray.accountNumber" },
      levels: { $push: "$valueArray.levels" }
  } },
  { $group: {
      _id: "$_id._id",
      valueArray: {
        $push: { accountNumber: "$_id.accountNumber", levels: "$levels" }
      }
  } }
])
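To sanity-check the expected shape without a server, the same merge can be sketched in plain JavaScript. This is only an in-memory illustration of what the two $group stages compute, not the MongoDB implementation. The function name mergeByAccount and the sample levels values ("A", "B", "C") are made up, since the question does not show the real levels contents.

```javascript
// In-memory sketch: merge valueArray entries that share an accountNumber.
// mergeByAccount and the sample "levels" values are hypothetical.
function mergeByAccount(doc) {
  const byAccount = new Map(); // accountNumber -> merged levels, in first-seen order
  for (const item of doc.valueArray) {
    if (!byAccount.has(item.accountNumber)) byAccount.set(item.accountNumber, []);
    byAccount.get(item.accountNumber).push(...item.levels);
  }
  return {
    _id: doc._id,
    valueArray: [...byAccount].map(([accountNumber, levels]) => ({ accountNumber, levels })),
  };
}

const doc = {
  _id: 2222,
  valueArray: [
    { accountNumber: 33333, levels: ["A"] },
    { accountNumber: 33333, levels: ["B"] },
    { accountNumber: 44444, levels: ["C"] },
  ],
};
const merged = mergeByAccount(doc);
console.log(JSON.stringify(merged));
```

One difference worth knowing: in the pipeline, $unwind on valueArray.levels drops entries whose levels array is empty unless preserveNullAndEmptyArrays is set, whereas this sketch keeps them. $group also makes no promise about output order, while the Map here preserves first-seen order.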


MongoDB - Group by and count value, but treat per record as one

I want to group by follow_user.tags.tag_id and count per record, so that no matter how many times the same tag_id shows up in the same record, it counts only once.
My database structure looks like this:
"external_userid" : "EXID1",
"follow_user" : [
"userid" : "USERID1",
"tags" : [
"tag_id" : "TAG1"
"userid" : "USERID2",
"tags" : [
"tag_id" : "TAG1"
"tag_id" : "TAG2"
"external_userid" : "EXID2",
"follow_user" : [
"userid" : "USERID1",
"tags" : [
"tag_id" : "TAG2"
Here's my query:
{ "$unwind": "$follow_user" }, { "$unwind": "$follow_user.tags" },
{ "$group" : { "_id" : { "follow_user᎐tags᎐tag_id" : "$follow_user.tags.tag_id" }, "COUNT(_id)" : { "$sum" : 1 } } },
{ "$project" : { "total" : "$COUNT(_id)", "tagId" : "$_id.follow_user᎐tags᎐tag_id", "_id" : 0 } }
What I expected:
"total" : 1,
"tagId" : "TAG1"
"total" : 2,
"tagId" : "TAG2"
What I get:
"total" : 2,
"tagId" : "TAG1"
"total" : 2,
"tagId" : "TAG2"
1. $set - Create a new field follow_user_tags.
   1.1. $setUnion - Remove duplicates from the result of 1.1.1.
      1.1.1. $reduce - Concatenate the follow_user.tags.tag_id values into a single array.
2. $unwind - Deconstruct the follow_user_tags array field into multiple documents.
3. $group - Group by follow_user_tags and compute the total count via $sum.
4. $project - Decorate the output document.
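db.collection.aggregate([
  { $set: {
      follow_user_tags: {
        $setUnion: {
          $reduce: {
            input: "$follow_user.tags",
            initialValue: [],
            // the $concatArrays operands were lost in the source; inferred from step 1.1.1
            in: { $concatArrays: ["$$value", "$$this.tag_id"] }
          }
        }
      }
  } },
  { $unwind: "$follow_user_tags" },
  { $group: { _id: "$follow_user_tags", total: { $sum: 1 } } },
  { $project: { _id: 0, tagId: "$_id", total: 1 } }
])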
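The steps above boil down to "de-duplicate tag IDs inside each record, then count across records". As a rough check of that logic, here is a plain JavaScript sketch that runs in memory; countTagsPerRecord is a hypothetical helper name, and the data mirrors the sample from the question.

```javascript
// In-memory sketch: count each tag_id at most once per record.
// countTagsPerRecord is a hypothetical helper, not a MongoDB API.
function countTagsPerRecord(records) {
  const totals = new Map(); // tag_id -> number of records that contain it
  for (const record of records) {
    // Flatten every tag_id in the record and de-duplicate them
    // (the role played by $reduce + $setUnion in the pipeline).
    const distinct = new Set(
      record.follow_user.flatMap((u) => u.tags.map((t) => t.tag_id))
    );
    for (const tagId of distinct) totals.set(tagId, (totals.get(tagId) || 0) + 1);
  }
  return [...totals].map(([tagId, total]) => ({ tagId, total }));
}

const records = [
  { external_userid: "EXID1", follow_user: [
      { userid: "USERID1", tags: [{ tag_id: "TAG1" }] },
      { userid: "USERID2", tags: [{ tag_id: "TAG1" }, { tag_id: "TAG2" }] } ] },
  { external_userid: "EXID2", follow_user: [
      { userid: "USERID1", tags: [{ tag_id: "TAG2" }] } ] },
];
const tagTotals = countTagsPerRecord(records);
console.log(tagTotals); // TAG1 counted once, TAG2 counted twice
```

The original query over-counts TAG1 because the double $unwind produces one document per tag occurrence, so the two TAG1 entries in EXID1 are counted separately.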

mongo query join- get total records as key from another collection

I have two collections:
quiz_customer_record collection
"_id" : ObjectId("5f6ec91cbf74d27430b9c24f"),
"quiz_id" : "5f3a33185a1cd35632b8c98c",
"user_id" : "5efae8bed5c5f06f30a057ff",
"name" : "ABC",
"qualification" : "ttt",
"time_required" : "0:13 Mins",
"questions_attempted" : 2,
"total_quiz_questions" : 2,
"attempt_date" : "2020-09-26T04:52:48.169Z"
/* 4 */
"_id" : ObjectId("5f6eca82bf74d27430b9c252"),
"quiz_id" : "5f3a33185a1cd35632b8c98c",
"user_id" : "5f6ec9ba3b502398598a5ade",
"name" : "Test",
"qualification" : "BSC",
"time_required" : "0:6 Mins",
"questions_attempted" : 2,
"total_quiz_questions" : 2,
"attempt_date" : "2020-09-26T04:58:46.060Z"
dummy collection
/* 1 */
"_id" : ObjectId("5f6ec906bf74d27430b9c24d"),
"user_id" : "5efae8bed5c5f06f30a057ff",
"question_id" : "5f6ec888bf74d27430b9c248",
"quiz_id" : "5f3a33185a1cd35632b8c98c",
"selected_answer" : [
"attempt_date" : "2020-09-26T04:52:25.977Z",
"correct_answer" : [
"result" : true
/* 2 */
"_id" : ObjectId("5f6eca82bf74d27430b9c250"),
"user_id" : "5f6ec9ba3b502398598a5ade",
"question_id" : "5f6ec888bf74d27430b9c248",
"quiz_id" : "5f3a33185a1cd35632b8c98c",
"selected_answer" : [
"attempt_date" : "2020-09-26T04:58:46.060Z",
"correct_answer" : [
"result" : true
/* 3 */
"_id" : ObjectId("5f6eca82bf74d27430b9c251"),
"user_id" : "5f6ec9ba3b502398598a5ade",
"question_id" : "5f6ec8b4bf74d27430b9c24b",
"quiz_id" : "5f3a33185a1cd35632b8c98c",
"selected_answer" : [
"attempt_date" : "2020-09-26T04:58:46.060Z",
"correct_answer" : [
"result" : true
From the second collection (dummy), I want the total number of records per user.
I am using this query, which needs to be modified:
db.quiz_customer_record.aggregate([
  { $match: { quiz_id: "5f3a33185a1cd35632b8c98c" } },
  { $sort: { attempt_date: -1 } },
  { $group: {
      _id: "$user_id",
      result1: { $first: "$attempt_date" },
      quiz_id: { $first: "$quiz_id" },
      o_id: { $first: "$_id" }
  } },
  { $project: {
      _id: "$o_id",
      user_id: "$_id",
      result1: 1
  } }
])
This gives the following result:
/* 1 */
"attempt_date" : "2020-09-26T04:52:48.169Z",
"_id" : ObjectId("5f6ec91cbf74d27430b9c24f"),
"user_id" : "5efae8bed5c5f06f30a057ff"
/* 2 */
"attempt_date" : "2020-09-26T04:58:46.060Z",
"_id" : ObjectId("5f6eca82bf74d27430b9c252"),
"user_id" : "5f6ec9ba3b502398598a5ade"
Expected Result: (as per user_id I need the count of records from dummy collection where quiz_id and attempt_date(result1 from above query) matches)
/* 1 */
"attempt_date" : "2020-09-26T04:52:48.169Z",
"_id" : ObjectId("5f6ec91cbf74d27430b9c24f"),
"user_id" : "5efae8bed5c5f06f30a057ff",
/* 2 */
"attempt_date" : "2020-09-26T04:58:46.060Z",
"_id" : ObjectId("5f6eca82bf74d27430b9c252"),
"user_id" : "5f6ec9ba3b502398598a5ade",
where total_dummy_rec is the total number of records per user in the "dummy" collection.
I am not sure how to approach this to get that result. Please help me find a solution. Thank you!
You can make two changes to your pipeline:
Add a $lookup stage to join the dummy collection: pass the required fields in let and match on them in the lookup's pipeline.
Move $project to the end and count the joined dummy documents using $size.
  { $lookup: {
      from: "dummy",
      let: {
        quiz_id: "$quiz_id",
        user_id: "$_id",
        // the $group stage above stores the latest attempt_date as result1
        attempt_date: "$result1"
      },
      pipeline: [
        { $match: {
            $expr: {
              $and: [
                { $eq: ["$$quiz_id", "$quiz_id"] },
                { $eq: ["$$user_id", "$user_id"] },
                { $eq: ["$$attempt_date", "$attempt_date"] }
              ]
            }
        } }
      ],
      as: "dummy"
  } },
  { $project: {
      _id: "$o_id",
      user_id: "$_id",
      result1: 1,
      total_dummy_rec: { $size: "$dummy" }
  } }
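To see what the join-and-count step produces for the sample data, here is a small in-memory JavaScript sketch. It is not how $lookup is implemented; countDummyRecords is a hypothetical helper, the documents are trimmed to the three fields used for matching, and for readability the grouped fields are called user_id and attempt_date rather than _id and result1.

```javascript
// In-memory sketch of the $lookup + $size step.
// countDummyRecords is hypothetical; documents are trimmed to the matched fields.
function countDummyRecords(latestAttempts, dummy) {
  return latestAttempts.map((a) => ({
    ...a,
    total_dummy_rec: dummy.filter(
      (d) =>
        d.quiz_id === a.quiz_id &&
        d.user_id === a.user_id &&
        d.attempt_date === a.attempt_date
    ).length,
  }));
}

const quizId = "5f3a33185a1cd35632b8c98c";
const latestAttempts = [
  { quiz_id: quizId, user_id: "5efae8bed5c5f06f30a057ff", attempt_date: "2020-09-26T04:52:48.169Z" },
  { quiz_id: quizId, user_id: "5f6ec9ba3b502398598a5ade", attempt_date: "2020-09-26T04:58:46.060Z" },
];
const dummy = [
  { quiz_id: quizId, user_id: "5efae8bed5c5f06f30a057ff", attempt_date: "2020-09-26T04:52:25.977Z" },
  { quiz_id: quizId, user_id: "5f6ec9ba3b502398598a5ade", attempt_date: "2020-09-26T04:58:46.060Z" },
  { quiz_id: quizId, user_id: "5f6ec9ba3b502398598a5ade", attempt_date: "2020-09-26T04:58:46.060Z" },
];
const withCounts = countDummyRecords(latestAttempts, dummy);
console.log(withCounts.map((r) => r.total_dummy_rec));
```

Because attempt_date is compared as an exact string, the first user ends up with a count of 0 on this sample data: the dummy record's timestamp (04:52:25.977Z) differs from the quiz record's (04:52:48.169Z). The second user's two dummy records match exactly.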

Mongodb find maximum scored student embeddded array

I am new to MongoDB; please help me out.
I have the details of more than 500 students, like this:
"_id" : 7,
"name" : "Salena Olmos",
"scores" : [
"score" : 90.37826509157176,
"type" : "exam"
"score" : 42.48780666956811,
"type" : "quiz"
"score" : 96.52986171633331,
"type" : "homework"
/* 2 */
"_id" : 8,
"name" : "Daphne Zheng",
"scores" : [
"score" : 22.13583712862635,
"type" : "exam"
"score" : 14.63969941335069,
"type" : "quiz"
"score" : 75.94123677556644,
"type" : "homework"
I need to find the details of the one student who got the highest score of "type" exam.
The output should be as follows:
"_id" : 7,
"name" : "Salena Olmos",
"scores" : [
"score" : 90.37826509157176,
"type" : "exam"
"score" : 42.48780666956811,
"type" : "quiz"
"score" : 96.52986171633331,
"type" : "homework"
I need the details of just one student from the whole collection. The problem I am facing is that I need to search the embedded array "scores" by both "score" and "type".
Can someone please help me?
Try this
db.collection.aggregate([
  { $group: {
      _id: "$_id",
      scores: { $first: "$scores" },
      data: { $push: "$$ROOT" }
  } },
  { $unwind: "$data" },
  { $match: { "data.scores.type": "exam" } },
  { $sort: { "data.scores.score": -1 } },
  { $project: { _id: 1, name: "$data.name", scores: "$scores" } },
  { $limit: 1 }
])
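For comparison, the goal itself (the one student with the highest exam score, returned with the full scores array) can be sketched in plain JavaScript. This is an in-memory illustration under the assumption that each student has at most one entry of type "exam"; topExamStudent is a hypothetical helper name and the data is taken from the question.

```javascript
// In-memory sketch: pick the student whose "exam" entry has the highest score.
// topExamStudent is hypothetical; assumes at most one exam entry per student.
function topExamStudent(students) {
  let best = null;
  let bestScore = -Infinity;
  for (const s of students) {
    const exam = s.scores.find((x) => x.type === "exam");
    if (exam && exam.score > bestScore) {
      best = s;
      bestScore = exam.score;
    }
  }
  return best; // null when no student has an exam entry
}

const students = [
  { _id: 7, name: "Salena Olmos", scores: [
      { score: 90.37826509157176, type: "exam" },
      { score: 42.48780666956811, type: "quiz" },
      { score: 96.52986171633331, type: "homework" } ] },
  { _id: 8, name: "Daphne Zheng", scores: [
      { score: 22.13583712862635, type: "exam" },
      { score: 14.63969941335069, type: "quiz" },
      { score: 75.94123677556644, type: "homework" } ] },
];
const top = topExamStudent(students);
console.log(top._id, top.name);
```

The point of the sketch is that the comparison uses only the exam entry. In MongoDB, a descending sort on an array field compares the largest element of the array, so sorting on the whole scores array would also take quiz and homework scores into account.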
While this doesn't answer the question, it is related. This one keeps only the subdocuments that match the conditions score "greater than or equal to 90" and type "exam":
$match: {
"scores.score": {
$gte: 90
"scores.type": "exam"
$project: {
name: true,
list: {
$filter: {
input: "$scores",
as: "list",
cond: {
$and: [
$gt: [
$eq: [
which returns
"_id": 7,
"list": [
"score": 90.37826509157176,
"type": "exam"
"name": "Salena Olmos"
If you want the entire document, then add doc: "$$ROOT", to the projection.
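The effect of the $match plus $filter combination can be mimicked in memory with Array.prototype.filter. The sketch below is only an illustration; highExamScores is a hypothetical helper, and it applies both conditions to the same array element, which is what the $filter condition is described as doing.

```javascript
// In-memory sketch of the $match + $filter idea: keep only exam scores >= minScore.
// highExamScores is a hypothetical helper; the data is taken from the question.
function highExamScores(students, minScore) {
  return students
    .map((s) => ({
      _id: s._id,
      name: s.name,
      list: s.scores.filter((x) => x.type === "exam" && x.score >= minScore),
    }))
    .filter((s) => s.list.length > 0); // drop students with no qualifying entry
}

const allStudents = [
  { _id: 7, name: "Salena Olmos", scores: [
      { score: 90.37826509157176, type: "exam" },
      { score: 42.48780666956811, type: "quiz" },
      { score: 96.52986171633331, type: "homework" } ] },
  { _id: 8, name: "Daphne Zheng", scores: [
      { score: 22.13583712862635, type: "exam" },
      { score: 14.63969941335069, type: "quiz" },
      { score: 75.94123677556644, type: "homework" } ] },
];
const filtered = highExamScores(allStudents, 90);
console.log(JSON.stringify(filtered));
```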

How to compare nested array elements with each other and count the total sub documents?

My MongoDB documents look like this:
"_id" : ObjectId("59093a8e1104a53169"),
"createdAt" : ISODate("2017-05-03T02:03:58.249+0000"),
"phone" : "0000000000",
"email" : "",
"dob" : "12/26/1976",
"password" : "*******",
"stripeID" : "***",
"picture" : "htt://g",
"name" : {
"first" : "P",
"last" : "e"
"addresses" : [
"description" : "237 S ABCD, USA",
"_id" : ObjectId("59093bsaaudua"),
"loc" : [
"apartment" : "",
"description" : "787 S Defghsvd USA",
"_id" : ObjectId("5a26b77dfhgswj"),
"loc" : [
"description" : "13210 hdsg sdjhf 90284, USA",
"_id" : ObjectId("5d2482basasas17be1"),
"loc" : [
What I need to do is compare loc[0] with loc[1] for each address, if addresses exists in the document, and find out how many of them have loc[0] === loc[1]. I don't know how to approach this. Any help would be great. Thanks in advance.
That is, across all documents, if any user has an address whose loc array elements are equal, I want to find those addresses. My query should return something like:
"description" : "13210 hdsg sdjhf 90284, USA",
"_id" : ObjectId("5d2482basasas17be1"),
"loc" : [
This should do the trick:
db.collection.aggregate([
  { $unwind: "$addresses" },
  { $match: {
      $expr: {
        $eq: [
          { $arrayElemAt: ["$addresses.loc", 0] },
          { $arrayElemAt: ["$addresses.loc", 1] }
        ]
      }
  } },
  { $replaceRoot: { newRoot: "$addresses" } }
])
If you also want the count, you can do this:
db.collection.aggregate([
  { $unwind: "$addresses" },
  { $match: {
      $expr: {
        $eq: [
          { $arrayElemAt: ["$addresses.loc", 0] },
          { $arrayElemAt: ["$addresses.loc", 1] }
        ]
      }
  } },
  { $replaceRoot: { newRoot: "$addresses" } },
  { $group: {
      _id: null,
      count: { $sum: 1 },
      addresses: { $push: "$$ROOT" }
  } },
  { $project: { _id: 0 } }
])
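The same filter-and-count can be sketched in memory to make the expected output concrete. This is an illustration only: addressesWithEqualLoc is a hypothetical helper, and the loc values are invented because the question does not show them. The sketch also requires loc to hold at least two elements, which is an extra guard not present in the pipeline above.

```javascript
// In-memory sketch: collect addresses whose loc[0] equals loc[1], plus a count.
// addressesWithEqualLoc and all loc values below are hypothetical.
function addressesWithEqualLoc(users) {
  const matches = users
    .flatMap((u) => u.addresses || []) // users without addresses contribute nothing
    .filter((a) => Array.isArray(a.loc) && a.loc.length >= 2 && a.loc[0] === a.loc[1]);
  return { count: matches.length, addresses: matches };
}

const users = [
  { phone: "0000000000", addresses: [
      { description: "237 S ABCD, USA", loc: [-118.2, 34.0] },
      { description: "13210 hdsg sdjhf 90284, USA", loc: [10, 10] } ] },
  { phone: "1111111111" }, // no addresses field at all
];
const equalLoc = addressesWithEqualLoc(users);
console.log(equalLoc.count, equalLoc.addresses.map((a) => a.description));
```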

Partition data around a match query during aggregation

What I have been trying to get my head around is how to perform some kind of partitioning (split by predicate) in a Mongo query. My current query looks like:
{"$match": { $and:[ {$or:[{"toggled":false},{"toggled":true, "status":"INACTIVE"}]} , {"updatedAt":{$gte:1549786260000}} ] }},
{"$unwind" :"$interests"},
{"$group" : {"_id": {"iid": "$interests", "pid":"$publisher"}, "count": {"$sum" : 1}}},
{"$project":{ _id: 0, "iid": "$_id.iid", "pid": "$_id.pid", "count": 1 }}
This results in the following output:
"count" : 3.0,
"iid" : "INT456",
"pid" : "P789"
"count" : 2.0,
"iid" : "INT789",
"pid" : "P789"
"count" : 1.0,
"iid" : "INT123",
"pid" : "P789"
"count" : 1.0,
"iid" : "INT123",
"pid" : "P123"
All good so far, but then I realized that for the documents that match the specific filter {"toggled":true, "status":"INACTIVE"}, I would rather decrement the count (-1), keeping in mind that the eventual value can be negative as well.
Is there a way to somehow partition the data after the match, so that different grouping operations are performed for the two sets of documents?
Something that sounds similar to what I am looking for is $mergeObjects, or maybe $reduce, but there is not much I can relate to in the documentation examples.
Note: I can sense, one straightforward way to deal with this would be to perform two queries, but I am looking for a single query to perform the operation.
Sample documents for the above output would be:
/* 1 */
"_id" : ObjectId("5d1f7******"),
"id" : "CON123",
"title" : "Game",
"content" : {},
"status" : "ACTIVE",
"publisher" : "P789",
"interests" : [
"updatedAt" : NumberLong(1582078628264)
/* 2 */
"_id" : ObjectId("5d1f8******"),
"id" : "CON456",
"title" : "Home",
"content" : {},
"status" : "INACTIVE",
"publisher" : "P789",
"interests" : [
"updatedAt" : NumberLong(1582078628264)
/* 3 */
"_id" : ObjectId("5d0e9******"),
"id" : "CON654",
"title" : "School",
"content" : {},
"status" : "ACTIVE",
"publisher" : "P789",
"interests" : [
"updatedAt" : NumberLong(1582078628264)
/* 4 */
"_id" : ObjectId("5d207*******"),
"id" : "CON789",
"content" : { },
"status" : "ACTIVE",
"publisher" : "P123",
"interests" : [
"updatedAt" : NumberLong(1582078628264)
The result I am looking for, though, is:
"count" : 1.0, (2-1)
"iid" : "INT456",
"pid" : "P789"
"count" : 0.0, (1-1)
"iid" : "INT789",
"pid" : "P789"
"count" : 1.0,
"iid" : "INT123",
"pid" : "P789"
"count" : 1.0,
"iid" : "INT123",
"pid" : "P123"
This aggregation gives the desired result.
db.posts.aggregate( [
{ $match: { updatedAt: { $gte: 1549786260000 } } },
  { $facet: {
      FALSE: [
        { $match: { toggle: false } },
        { $unwind : "$interests" },
        { $group : { _id : { iid: "$interests", pid: "$publisher" }, count: { $sum : 1 } } },
      ],
      TRUE: [
        { $match: { toggle: true, status: "INACTIVE" } },
        { $unwind : "$interests" },
        { $group : { _id : { iid: "$interests", pid: "$publisher" }, count: { $sum : -1 } } },
      ]
  } },
  { $project: { result: { $concatArrays: [ "$FALSE", "$TRUE" ] } } },
  { $unwind: "$result" },
  { $replaceRoot: { newRoot: "$result" } },
  { $group : { _id : "$_id", count: { $sum : "$count" } } },
  { $project:{ _id: 0, iid: "$_id.iid", pid: "$_id.pid", count: 1 } }
] )
The output from the query using the input data from the question post:
{ "count" : 1, "iid" : "INT123", "pid" : "P789" }
{ "count" : 1, "iid" : "INT123", "pid" : "P123" }
{ "count" : 0, "iid" : "INT789", "pid" : "P789" }
{ "count" : 1, "iid" : "INT456", "pid" : "P789" }
[ EDIT ADD 2 ]
This query gets the same result with a different approach:
db.posts.aggregate( [
  { $match: { updatedAt: { $gte: 1549786260000 } } },
  { $unwind : "$interests" },
  { $group : {
      _id : {
        iid: "$interests",
        pid: "$publisher"
      },
      count: {
        $sum: {
          $switch: {
            branches: [
              { case: { $eq: [ "$toggle", false ] },
                then: 1 },
              { case: { $and: [ { $eq: [ "$toggle", true] }, { $eq: [ "$status", "INACTIVE" ] } ] },
                then: -1 }
            ]
          }
        }
      }
  } },
  { $project: {
      _id: 0,
      iid: "$_id.iid",
      pid: "$_id.pid",
      count: 1
  } }
] )
[ EDIT ADD 3 ]
The facet query runs the two facets (TRUE and FALSE) on the same set of documents; it is like two queries running in parallel. However, it duplicates some code and needs additional stages further down the pipeline to shape the documents into the desired output.
The second query avoids the code duplication and has far fewer stages in the aggregation pipeline. In terms of performance, this makes a difference when the input dataset has a large number of documents to process. In general, fewer stages mean fewer passes over the documents (as each stage has to scan the documents output by the previous stage).
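The single-pass idea behind the second query (add 1 or -1 per document depending on a predicate, then sum per group) can be sketched in plain JavaScript. This is an in-memory illustration, not the aggregation itself. signedInterestCounts is a hypothetical helper, the field is called toggle as in the answer's queries (the question's own filter calls it toggled), and the interests and toggle values are invented so that they reproduce the counts stated in the question, which does not show them.

```javascript
// In-memory sketch: +1 for toggle === false, -1 for toggle === true && INACTIVE.
// signedInterestCounts and the interests/toggle values below are hypothetical.
function signedInterestCounts(docs) {
  const counts = new Map(); // JSON key of [iid, pid] -> { iid, pid, count }
  for (const d of docs) {
    let delta;
    if (d.toggle === false) delta = 1;
    else if (d.toggle === true && d.status === "INACTIVE") delta = -1;
    else continue; // neither branch applies, so the document is skipped
    for (const iid of d.interests) {
      const key = JSON.stringify([iid, d.publisher]);
      const entry = counts.get(key) || { iid, pid: d.publisher, count: 0 };
      entry.count += delta;
      counts.set(key, entry);
    }
  }
  return [...counts.values()];
}

const posts = [
  { id: "CON123", status: "ACTIVE", publisher: "P789", toggle: false, interests: ["INT123", "INT456"] },
  { id: "CON456", status: "INACTIVE", publisher: "P789", toggle: true, interests: ["INT456", "INT789"] },
  { id: "CON654", status: "ACTIVE", publisher: "P789", toggle: false, interests: ["INT456", "INT789"] },
  { id: "CON789", status: "ACTIVE", publisher: "P123", toggle: false, interests: ["INT123"] },
];
const signed = signedInterestCounts(posts);
console.log(signed);
```

Each document is visited once and contributes a signed amount to its groups, which is why no second branch or merge step is needed.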