Combining distinct on projection in MongoDB - mongodb

Is there a query I can use on the following collection to get the result at the bottom?
Example:
{
"_id" : ObjectId(xyz),
"name" : "Carl",
"something":"else"
},
{
"_id" : ObjectId(aaa),
"name" : "Lenny",
"something":"else"
},
{
"_id" : ObjectId(bbb),
"name" : "Carl",
"something":"other"
}
I need a query to get this result:
{
"_id" : ObjectId(xyz),
"name" : "Carl"
},
{
"_id" : ObjectId(aaa),
"name" : "Lenny"
},
A set of documents with no identical names. It's not important which _ids are kept.

You can use the aggregation framework to get this shape. The query could look like this:
db.collection.aggregate(
[
{
$group:
{
_id: "$name",
id: { $first: "$_id" }
}
},
{
$project:{
_id:"$id",
name:"$_id"
}
}
]
)
As long as you don't need other fields, this will be sufficient.
If you need other fields, please update the document structure and the expected result.
Since you don't care about the ids, it can be simplified (each name then comes back in the _id field):
db.collection.aggregate([{$group:{_id: "$name"}}])
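To make the $group / $first step concrete, here is a minimal plain-JavaScript sketch that mimics it on an in-memory array. It is not how MongoDB implements the stage; the string ids and the helper name distinctByName are made up for illustration.

```javascript
// Sample documents from the question; plain strings stand in for ObjectId values.
const docs = [
  { _id: "xyz", name: "Carl", something: "else" },
  { _id: "aaa", name: "Lenny", something: "else" },
  { _id: "bbb", name: "Carl", something: "other" },
];

// Hypothetical helper: keeps the first _id seen for each name, similar to
// { $group: { _id: "$name", id: { $first: "$_id" } } } followed by the $project.
function distinctByName(documents) {
  const firstIdByName = new Map();
  for (const doc of documents) {
    if (!firstIdByName.has(doc.name)) {
      firstIdByName.set(doc.name, doc._id);
    }
  }
  // Reshape to { _id, name }, as the $project stage does.
  return Array.from(firstIdByName, ([name, _id]) => ({ _id, name }));
}

const distinctNames = distinctByName(docs);
console.log(distinctNames);
```

The sketch walks the array in order, so "first" is well defined here. In MongoDB, $first depends on the order in which documents reach $group, so add a $sort stage before it if you care which _id is kept.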

Related

MongoDB - Grouping by inner-documents and retrieving top results

I'm trying to find the most common (and least common) skills stored in the mongo database. I'm using mongoose to retrieve the results.
User is the root document, and each user has an inner Profile document. The profile has a 'skills' attribute that contains an array of ProfileSkillEntry documents, each of which has a title (the skill name).
return User.aggregate([{
$group: {
'_id': '$profile.skills.title',
'count': {
$sum: 1
}
}
}, {
$sort: {
'count': -1
}
}, {
$limit: 5
}]);
I expect it to combine the skills of all registered users, find the 5 most frequent, and return those. Instead it seems to be grouping per user and giving invalid results.
Example User document structure:
{
"_id" : ObjectId("..."),
"firstName" : "Harry",
"lastName" : "Potter",
"profile" : {
"_id" : ObjectId("..."),
"skills" : [
{
"_id" : ObjectId("..."),
"title" : "Java",
"description" : "Master",
"dateFrom" : "31/07/2019",
"coreSkill" : true
},
{
"_id" : ObjectId("..."),
"title" : "JavaScript",
"description" : "Proficient",
"dateFrom" : "31/07/2019",
"coreSkill" : false
}
],
}
}
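Please use the query below. Grouping on '$profile.skills.title' without unwinding uses each user's whole array of titles as the key, so unwind the skills array first. Then add the $sort and $limit stages as required.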
db.test.aggregate(
[{ $unwind: { path: "$profile.skills"} },
{ $group: { _id: "$profile.skills.title",
"count": { $sum: 1 }} }] )

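As a rough illustration of the $unwind answer in the skills question above, the following plain-JavaScript sketch counts skills over an in-memory array. It only mimics the pipeline's effect; the second user ("Ron") and the helper name topSkills are assumptions added for the example.

```javascript
// Hypothetical users; only the fields relevant to the count are kept.
const users = [
  { firstName: "Harry", profile: { skills: [{ title: "Java" }, { title: "JavaScript" }] } },
  { firstName: "Ron", profile: { skills: [{ title: "Java" }] } },
];

// Roughly what $unwind + $group + $sort + $limit do together:
// visit every skill entry, count per title, sort by count descending, keep the top N.
function topSkills(userDocs, limit) {
  const counts = new Map();
  for (const user of userDocs) {
    for (const skill of user.profile.skills) {
      counts.set(skill.title, (counts.get(skill.title) || 0) + 1);
    }
  }
  return Array.from(counts, ([title, count]) => ({ _id: title, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}

const top = topSkills(users, 5);
console.log(top);
```

Counting per array element rather than per user is what gives the collection-wide totals the question asks for.
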
Summing a value of a key over multiple documents in MongoDB

I have a collection named users with the following structure to its documents
{
"_id" : <user_id>,
"NAME" : "ABC",
"TIME" : 53.0,
"OBJECTS" : 1
},
{
"_id" : <user_id>,
"NAME" : "ABCD",
"TIME" : 353.0,
"OBJECTS" : 70
}
Now, I want to sum the value of OBJECTS over the entire collection and return the value along with the objects.
Something like this
{
{
"_id" : <user_id>,
"NAME" : "ABC",
"TIME" : 53.0,
"OBJECTS" : 1
},
{
"_id" : <user_id>,
"NAME" : "ABCD",
"TIME" : 353.0,
"OBJECTS" : 70
},
"TOTAL_OBJECTS": 71
}
Or any way wherein I don't have to compute on the received object and can directly access from it. Now, I've tried looking this up but I found none where the hierarchy of the existing documents isn't destroyed.
You can use $group with null as the grouping id. This gathers all documents into one array (using the $$ROOT variable), and another field can hold the sum of OBJECTS, like below:
db.users.aggregate([
{
$group: {
_id: null,
documents: { $push: "$$ROOT" },
TOTAL_OBJECTS: { $sum: "$OBJECTS" }
}
}
])
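The shape of that result can be sketched in plain JavaScript on an in-memory array. This is only an illustration of what grouping on null produces, not MongoDB's implementation; the ids "u1" and "u2" and the helper name withTotal are placeholders.

```javascript
// Sample documents from the question; "u1" and "u2" are placeholder ids.
const userDocs = [
  { _id: "u1", NAME: "ABC", TIME: 53.0, OBJECTS: 1 },
  { _id: "u2", NAME: "ABCD", TIME: 353.0, OBJECTS: 70 },
];

// Hypothetical helper: one output object for the whole collection, similar to
// grouping with _id: null, pushing $$ROOT and summing $OBJECTS.
function withTotal(documents) {
  return {
    _id: null,
    documents: documents.slice(), // untouched originals, like $push: "$$ROOT"
    TOTAL_OBJECTS: documents.reduce((sum, doc) => sum + doc.OBJECTS, 0),
  };
}

const summary = withTotal(userDocs);
console.log(summary.TOTAL_OBJECTS, summary.documents.length);
```

The original documents stay intact inside the documents array, so the hierarchy the question wants to keep is preserved.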
db.users.aggregate(
// Pipeline
[
// Stage 1
{
$group: {
_id: null,
TOTAL_OBJECTS: {
$sum: '$OBJECTS'
},
documents: {
$addToSet: '$$CURRENT'
}
}
},
]
);
In the aggregate query above, I have pushed all documents into an array using the $addToSet operator in the $group stage.

mongoDB query to find the document in nested array

[{
"username":"user1",
"products":[
{"productID":1,"itemCode":"CODE1"},
{"productID":2,"itemCode":"CODE1"},
{"productID":3,"itemCode":"CODE2"},
]
},
{
"username":"user2",
"products":[
{"productID":1,"itemCode":"CODE1"},
{"productID":2,"itemCode":"CODE2"},
]
}]
I want to find all the "productID" of "products" for "user1" such that "itemCode" for the product is "CODE1".
What query in mongoDB should be written to do so?
If you only need to match a single condition, then the dot notation is sufficient.
In Mongo shell:
db.col.find({"products.itemCode" : "CODE1", "username" : "user1"})
This returns the user1 document if at least one of its nested products has itemCode "CODE1". The whole products array comes back, not only the matching entries.
Updated
I wasn't clear on your requirements at first, but this should be it.
If you want each product as a separate entry, then you would need to use the aggregate framework. First split the entries in the array using $unwind, then use $match for your conditions.
db.col.aggregate(
{ $unwind: "$products" },
{ $match: { username: "user1", "products.itemCode": "CODE1" } }
);
response:
{ "_id" : ObjectId("57cdf9c0f7f7ecd0f7ef81b6"), "username" : "user1", "products" : { "productID" : 1, "itemCode" : "CODE1" } }
{ "_id" : ObjectId("57cdf9c0f7f7ecd0f7ef81b6"), "username" : "user1", "products" : { "productID" : 2, "itemCode" : "CODE1" } }
The answer to your question is
db.col.aggregate([
{ $unwind: "$products" },
{ $match: { username: "user1", "products.itemCode": "CODE1" } },
{ $project: { _id: 0, "products.productID": 1 } }
]);
In my case it didn't work without the [ ] brackets.
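For readers who want to see what the $unwind, $match and $project stages each contribute, here is a plain-JavaScript sketch over an in-memory copy of the sample data. It mimics the stages only loosely; the helper name productIdsFor is made up.

```javascript
// Sample collection from the question.
const col = [
  {
    username: "user1",
    products: [
      { productID: 1, itemCode: "CODE1" },
      { productID: 2, itemCode: "CODE1" },
      { productID: 3, itemCode: "CODE2" },
    ],
  },
  {
    username: "user2",
    products: [
      { productID: 1, itemCode: "CODE1" },
      { productID: 2, itemCode: "CODE2" },
    ],
  },
];

// Hypothetical helper that walks through the three stages on an array.
function productIdsFor(collection, username, itemCode) {
  return collection
    // like $unwind: one row per product, with "products" now a single object
    .flatMap((doc) => doc.products.map((product) => ({ username: doc.username, products: product })))
    // like $match: keep rows for the requested user and item code
    .filter((row) => row.username === username && row.products.itemCode === itemCode)
    // like $project: keep only products.productID
    .map((row) => ({ products: { productID: row.products.productID } }));
}

const matches = productIdsFor(col, "user1", "CODE1");
console.log(JSON.stringify(matches));
```

Each output row carries a single productID, which is the per-product result the question asks for.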
You need multiple filters for this, as below, which is simply an AND condition (assuming your collection is named collection1):
db.collection1.find({"username":"user1", "products.itemCode" : "CODE1"})

MongoDB query only the inner document

My mongodb collection looks like this:
{
"_id" : ObjectId("5333bf6b2988dc2230c9c924"),
"name" : "Mongo2",
"notes" : [
{
"title" : "mongodb1",
"content" : "mongo content1"
},
{
"title" : "replicaset1",
"content" : "replca content1"
}
]
}
{
"_id" : ObjectId("5333fd402988dc2230c9c925"),
"name" : "Mongo2",
"notes" : [
{
"title" : "mongodb2",
"content" : "mongo content2"
},
{
"title" : "replicaset1",
"content" : "replca content1"
},
{
"title" : "mongodb2",
"content" : "mongo content3"
}
]
}
I want to query only notes that have the title "mongodb2" but do not want the complete document.
I am using the following query:
> db.test.find({ 'notes.title': 'mongodb2' }, {'notes.$': 1}).pretty()
{
"_id" : ObjectId("5333fd402988dc2230c9c925"),
"notes" : [
{
"title" : "mongodb2",
"content" : "mongo content2"
}
]
}
I was expecting it to return both notes that have title "mongodb2".
Does Mongo return only the first match when we query for a document within a document?
The positional $ operator only returns the first matching array element that it finds.
Using aggregate:
db.test.aggregate([
// Match only the valid documents to narrow down
{ "$match": { "notes.title": "mongodb2" } },
// Unwind the array
{ "$unwind": "$notes" },
// Filter just the array
{ "$match": { "notes.title": "mongodb2" } },
// Reform via group
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"notes": { "$push": "$notes" }
}}
])
So you can use this to "filter" specific documents from the array.
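The net effect of the match, unwind, match, group sequence can be sketched in plain JavaScript on an in-memory copy of the sample data. This is an illustration only; the string ids stand in for ObjectId values and the helper name filterNotes is an assumption.

```javascript
// Sample documents from the question; strings stand in for ObjectId values.
const notesDocs = [
  {
    _id: "5333bf6b2988dc2230c9c924",
    name: "Mongo2",
    notes: [
      { title: "mongodb1", content: "mongo content1" },
      { title: "replicaset1", content: "replca content1" },
    ],
  },
  {
    _id: "5333fd402988dc2230c9c925",
    name: "Mongo2",
    notes: [
      { title: "mongodb2", content: "mongo content2" },
      { title: "replicaset1", content: "replca content1" },
      { title: "mongodb2", content: "mongo content3" },
    ],
  },
];

// Hypothetical helper: keep every matching note per document and drop
// documents that end up with no matching notes.
function filterNotes(documents, title) {
  return documents
    .map((doc) => ({
      _id: doc._id,
      name: doc.name,
      notes: doc.notes.filter((note) => note.title === title),
    }))
    .filter((doc) => doc.notes.length > 0);
}

const filtered = filterNotes(notesDocs, "mongodb2");
console.log(JSON.stringify(filtered, null, 2));
```

Unlike the positional $ projection, this keeps both "mongodb2" notes of the second document.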
$ always refers to the first match, as does the $elemMatch projection operator.
I think you have three options:
1. separate the notes so each is a document of its own
2. accept sending more data over the network and filter client-side
3. use the aggregation pipeline ($match and $project)
I'd probably choose option 1, but you probably have a reason for your data model.

Obtaining $group result with group count

Assuming I have a collection called "posts" (in reality it is a more complex collection, posts is too simple) with the following structure:
> db.posts.find()
{ "_id" : ObjectId("50ad8d451d41c8fc58000003"), "title" : "Lorem ipsum", "author" : "John Doe", "content" : "This is the content", "tags" : [ "SOME", "RANDOM", "TAGS" ] }
I expect this collection to grow to hundreds of thousands, perhaps millions, of documents. I need to query for posts by tags, group the results by tag, and display the results paginated. This is where the aggregation framework comes in. I plan to use the aggregate() method to query the collection:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} }
]);
The catch is that to create the paginator I would need to know the length of the output array. I know that to do that you can do:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} },
{ "$group" : {
_id: null,
total: { $sum: 1 }
} }
]);
But that would discard the output of the previous pipeline stage (the first group). Is there a way to combine the two operations while preserving each stage's output? I know that the output of the whole aggregate operation can be cast to an array in some language and have its contents counted, but the pipeline output may exceed the 16 MB limit. Also, performing the same query again just to obtain the count seems like a waste.
So is obtaining the document result and count at the same time possible? Any help is appreciated.
Use $project to save tag and count into tmp.
Use $push or $addToSet to store tmp into your data list.
Code:
db.test.aggregate([
{$unwind: '$tags'},
{$group:{_id: '$tags', count:{$sum:1}}},
{$project:{tmp:{tag:'$_id', count:'$count'}}},
{$group:{_id:null, total:{$sum:1}, data:{$addToSet:'$tmp'}}}
])
Output:
{
"result" : [
{
"_id" : null,
"total" : 5,
"data" : [
{
"tag" : "SOME",
"count" : 1
},
{
"tag" : "RANDOM",
"count" : 2
},
{
"tag" : "TAGS1",
"count" : 1
},
{
"tag" : "TAGS",
"count" : 1
},
{
"tag" : "SOME1",
"count" : 1
}
]
}
],
"ok" : 1
}
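The idea of returning the per-tag counts and the number of groups together can be sketched in plain JavaScript over an in-memory array. This only mimics the result shape of the pipeline; the second post and the helper name tagCountsWithTotal are assumptions for the example.

```javascript
// Hypothetical posts; only the tags matter for this sketch.
const posts = [
  { title: "Lorem ipsum", tags: ["SOME", "RANDOM", "TAGS"] },
  { title: "Dolor sit", tags: ["RANDOM"] },
];

// Hypothetical helper: count posts per tag, then wrap the per-tag rows
// together with the number of distinct tags in a single result object.
function tagCountsWithTotal(postDocs) {
  const counts = new Map();
  for (const post of postDocs) {
    for (const tag of post.tags) {
      counts.set(tag, (counts.get(tag) || 0) + 1);
    }
  }
  const data = Array.from(counts, ([tag, count]) => ({ tag, count }));
  return { _id: null, total: data.length, data };
}

const tagSummary = tagCountsWithTotal(posts);
console.log(JSON.stringify(tagSummary));
```

Because everything ends up in one object, the paginator can read total and slice data without a second query. In MongoDB that single result document is still subject to the 16 MB limit mentioned in the question.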
I'm not sure you need the aggregation framework for this, other than for counting all the tags, e.g.:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} }
]);
To paginate through the posts for a given tag, you can just use the normal query syntax, like so:
db.posts.find({tags: "RANDOM"}).skip(10).limit(10)
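How the skip and limit values relate to a page number can be sketched in plain JavaScript. The helpers pageBounds and paginate and the 25 made-up post titles are assumptions for illustration; the array slice only stands in for cursor.skip(n).limit(m).

```javascript
// Hypothetical helper: turn a 1-based page number into skip/limit values.
function pageBounds(page, pageSize) {
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

// In-memory stand-in for cursor.skip(n).limit(m).
function paginate(items, page, pageSize) {
  const { skip, limit } = pageBounds(page, pageSize);
  return items.slice(skip, skip + limit);
}

// 25 made-up post titles that are assumed to carry the same tag.
const taggedPosts = Array.from({ length: 25 }, (_, i) => `post-${i + 1}`);
const secondPage = paginate(taggedPosts, 2, 10);
console.log(pageBounds(2, 10), secondPage[0], secondPage.length);
```

Page 2 with a page size of 10 gives skip 10 and limit 10, which matches the query above. Note that skip still has to walk past the skipped documents on the server, so very deep pages get slower.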