Get unique values from arrays per record in Mongodb - mongodb

I have a collection in MongoDB that looks like this:
{
"_id" : ObjectId("56d3e53b965b57e4d1eb3e71"),
"name" : "John",
"posts" : [
{
"topic" : "Harry Potter",
"obj_ids" : [
"1234"
],
"dates_posted" : [
"2014-12-24"
]
},
{
"topic" : "Daniel Radcliffe",
"obj_ids" : [
"1235",
"1236",
"1237"
],
"dates_posted" : [
"2014-12-22",
"2015-01-13",
"2014-12-24"
]
}
],
},
{
"_id" : ObjectId("56d3e53b965b57e4d1eb3e72"),
"name" : "Jane",
"posts" : [
{
"topic" : "Eragon",
"tweet_ids" : [
"1672",
"1673",
"1674"
],
"dates_posted" : [
"2014-12-27",
"2014-11-16"
]
}
],
}
How could I query to get a result like:
{
"name": "John",
"dates": ["2014-12-24", "2014-12-22", "2015-01-13"]
},
{
"name": "Jane",
"dates" : ["2014-12-27", "2014-11-16"]
}
I need the dates to be unique, as "2014-12-24" appears in both elements of "posts" but I need only the one.
I tried doing db.collection.aggregate([{$unwind: "$posts"}, {$group:{_id:"$posts.dates_posted"}}]) and that gave me results like this:
{ "_id" : [ "2014-12-24", "2014-12-22", "2015-01-13", "2014-12-24" ] }
{ "_id" : [ "2014-12-27", "2014-11-16" ] }
How can I remove the duplicates and also get the name corresponding to the dates?

You would need to use the $addToSet operator to maintain unique values. One way of doing it would be to:
unwind posts.
unwind "posts.date_posted", so that the array gets flattened and the value can be aggregated in the group stage.
Then group by _id and accumulate unique values for the date field, along with name.
code:
db.collection.aggregate([
{
$unwind:"$posts"
},
{
$unwind:"$posts.dates_posted"
},
{
$group:
{
"_id":"$_id",
"dates":{$addToSet:"$posts.dates_posted"},
"name":{$first:"$name"}
}
},
{
$project:
{
"name":1,
"dates":1,
"_id":0
}
}
])
The cons of this approach being that, it uses two unwind stages, which is quiet costly, since it would increase the number of documents, input to the subsequent stages, by a multiplication factor of n where n is the number of values in the array that is flattened.

Related

Get distinct values from mongodb array with Springboot

I want to filter all collection from mongoDb document where duplicated elements are there in values field which is list. how we can get only unique data and it's named as group.
My Collection:
{
"_id" : ObjectId("5d4b3d44101cc8e202d30eaf"),
"name" : "motorManufacturer",
"values" : [
"LEAM",
"Bico",
"Bico",
"LEAM"
]
}
{
"_id" : ObjectId("5d4b3d44101cc8e202d30eaf"),
"name" : "motorManufacturer",
"values" : [
"NOV",
"NOV"
"SLB",
"SLB",
"SD",
]
}
I have this Collection. I want only unique data from values in Spingboot.
Expceted Result:
{
"name" : "motorManufacturer",
"values" : [
"Bico",
"LEAM"
]
}
{
"name" : "motorManufacturer",
"values" : [
"NOV"
"SLB",
"SD",
]
]
Use $setUnion to remove duplicates.
db.collection.aggregate([
{
$project: {
"name": 1,
"values": {
$setUnion: ["$values"]
}
}
}
])
Demo
In spring-data-mongodb, this can be written as,
ProjectionOperation projectStage = Aggregation.project().andInclude("name")
.and(SetOperators.SetUnion.arrayAsSet("values")).as("values");
mongoTemplate.aggregate(Aggregation.newAggregation(projectStage),
<your collection class>.class,
<your collection class>.class).getMappedResults();

Omit empty fields from MongoDB query result

Is there a way to omit empty fields (eg empty string, or an empty array) from MongoDB query results' documents (find or aggregate).
Document in DB:
{
"_id" : ObjectId("5dc3fcb388c1c7c5620ed496"),
"name": "Bill",
"emptyString" : "",
"emptyArray" : []
}
Output:
{
"_id" : ObjectId("5dc3fcb388c1c7c5620ed496"),
"name": "Bill"
}
Similar question for Elasticsearch: Omit null fields from elasticsearch results
Please use aggregate function.
If you want to remove key. you use $cond by using $project.
db.Speed.aggregate( [
{
$project: {
name: 1,
"_id": 1,
"emptyString": {
$cond: {
if: { $eq: [ "", "$emptyString" ] },
then: "$$REMOVE",
else: "$emptyString"
}
},
"emptyArray": {
$cond: {
if: { $eq: [ [], "$emptyArray" ] },
then: "$$REMOVE",
else: "$emptyArray"
}
}
}
}
] )
One way this could be done is using cursor.map() which is available on find() and aggregation([]) both.
The idea is to have list of the fields that are present/could be in the documents and filter out by using delete operator to remove the fields (which are empty strings or empty array, both have length property) from returning document.
Mongo Shell:
var fieldsList = ["name", "emptyString", "emptyArray"];
db.collection.find().map(function(d) {
fieldsList.forEach(function(k) {
if (
k in d &&
(Array.isArray(d[k]) ||
(typeof d[k] === "string" || d[k] instanceof String)) &&
d[k].length === 0
) {
delete d[k];
}
});
return d;
});
Test documents:
{
"_id" : ObjectId("5dc426d1f667120607ac5006"),
"name" : "Bill",
"emptyString" : "",
"emptyArray" : [ ]
}
{
"_id" : ObjectId("5dc426d1f667120607ac5007"),
"name" : "Foo",
"emptyString" : "foo",
"emptyArray" : [ ]
}
{
"_id" : ObjectId("5dc426d1f667120607ac5008"),
"name" : "Bar",
"emptyString" : "",
"emptyArray" : [
"foo",
"bar"
]
}
{
"_id" : ObjectId("5dc426d1f667120607ac5009"),
"name" : "May",
"emptyString" : "foobar",
"emptyArray" : [
"foo",
"bar"
]
}
O/P
[
{
"_id" : ObjectId("5dc426d1f667120607ac5006"),
"name" : "Bill"
},
{
"_id" : ObjectId("5dc426d1f667120607ac5007"),
"name" : "Foo",
"emptyString" : "foo"
},
{
"_id" : ObjectId("5dc426d1f667120607ac5008"),
"name" : "Bar",
"emptyArray" : [
"foo",
"bar"
]
},
{
"_id" : ObjectId("5dc426d1f667120607ac5009"),
"name" : "May",
"emptyString" : "foobar",
"emptyArray" : [
"foo",
"bar"
]
}
]
Note: if the number of fields are very large in the documents this may not be very optimal solution since the comparisons are going to happen with all fields in document. You might want to chunk the fieldsList with properties that are suspected to be empty array or string.
I think the easiest way to remove all empty string- and empty array-fields from the output is to add the aggregation stage below. (And yes, "easy" is relative, when you have to create these levels of logic to accomplish such a trivial task...)
$replaceRoot: {
newRoot: {
$arrayToObject: {
$filter: {
input: {
$objectToArray: '$$ROOT'
},
as: 'item',
cond: {
$and: [
{ $ne: [ '$$item.v', [] ] },
{ $ne: [ '$$item.v', '' ] }
]
}
}
}
}
}
Just modify the cond-clause to filter out other types of fields (e.g. null).
btw: I haven't tested the performance of this, but at least it's generic and somewhat readable.
Edit: IMPORTANT! The $replaceRoot-stage does prevent MongoDB from optimizing the pipeline, so if you use it in a View that you run .find() on, it will append a $match-stage to the end of the View's pipeline, in stead of prepending an indexed search at the start of the pipeline. This will have significant impact on the performance. You can safely use it in a custom pipeline though, as long as you have the $match-stage before it. (At least as far as my limited MongoDB knowledge tells me). And if anyone knows how to prépend a $match-stage to a View when querying, then please leave a comment :-)

MongoDB: projection $ when find document into nested arrays

I have the following document of collection "user" than contains two nested arrays:
{
"person" : {
"personId" : 78,
"firstName" : "Mario",
"surname1" : "LOPEZ",
"surname2" : "SEGOVIA"
},
"accounts" : [
{
"accountId" : 42,
"accountRegisterDate" : "2018-01-04",
"banks" : [
{
"bankId" : 1,
"name" : "Bank LTD",
},
{
"bankId" : 2,
"name" : "Bank 2 Corp",
}
]
},
{
"accountId" : 43,
"accountRegisterDate" : "2018-01-04",
"banks" : [
{
"bankId" : 3,
"name" : "Another Bank",
},
{
"bankId" : 4,
"name" : "BCT bank",
}
]
}
]
}
I'm trying to get a query that will find this document and get only this subdocument at output:
{
"bankId" : 3,
"name" : "Another Bank",
}
I'm getting really stucked. If I run this query:
{ "accounts.banks.bankId": "3" }
Gets the whole document. And I've trying combinations of projection with no success:
{"accounts.banks.0.$": 1} //returns two elements of array "banks"
{"accounts.banks.0": 1} //empty bank array
Maybe that's not the way to query for this and I'm going in bad direction.
Can you please help me?
You can try following solution:
db.user.aggregate([
{ $unwind: "$accounts" },
{ $match: { "accounts.banks.bankId": 3 } },
{
$project: {
items: {
$filter: {
input: "$accounts.banks",
as: "bank",
cond: { $eq: [ "$$bank.bankId", 3 ] }
}
}
}
},
{
$replaceRoot : {
newRoot: { $arrayElemAt: [ "$items", 0 ] }
}
}
])
To be able to filter accounts by bankId you need to $unwind them. Then you can match accounts to the one having bankId equal to 3. Since banks is another nested array, you can filter it using $filter operator. This will give you one element nested in items array. To get rid of the nesting you can use $replaceRoot with $arrayElemAt.

add where condition in aggregate and group function in mongodb

I have mongo model lets say MYLIST containing data like:-
{
"_id" : ObjectId("542139f31284ad1461dbc15f"),
"Category" : "CENTER",
"Name" : "STAND",
"Url" : "center/stand",
"Img" : [ {
"url" : "www.google.com/images",
"main" : "1",
"home" : "1",
"id" : "34faf230-43cf-11e4-8743-311ea2261289"
},
{
"url" : "www.google.com/images1",
"main" : "1",
"home" : "0",
"id" : "34faf230-43cf-11e4-8743-311e66441289"
} ]
}
I execute the following query to the MYLIST collection:
db.MYLIST.aggregate([
{ "$group": {
"_id": "$Category",
"Name": { "$addToSet": {
"name": "$Name",
"url": "$Url",
"img": "$Img"
}}
}},
{ "$sort": { "_id" : 1 } }
]);
And I got the following result -
[
{ _id: 'CENTER',
Name:
[ { "name" : "Stand",
"url" : "center/stand",
"img": { "url" : "www.google.com/images" , "main" : "1", "home" : "1", "id" : "350356a0-43cf-11e4-8743-311ea2261289" }
}]
},
{ _id: 'CENTER',
Name:
[ { "name" : "Stand",
"url" : "center/stand",
"img": { "url" : "www.google.com/images1" , "main" : "1", "home" : "0", "id" : "34faf230-43cf-11e4-8743-311ea2261289" }
}]
}
]
As you can see my img key itself is an array of objects, Hence I am getting multiple entries for the same category of each entry in img array.
What I actually need is to get only those images that have some value for home key.
expected result:-
[
{ _id: 'CENTER',
Name:
[ { "name" : "Stand",
"url" : "center/stand",
"img": { "url" : "www.google.com/images" , "main" : "1", "home" : "1", "id" : "350356a0-43cf-11e4-8743-311ea2261289" }
}]
},
]
Hence I would like to add where the condition for img.home > 0 on the above-mentioned query, Could anybody help me to resolve this issue as my relatively new to MongoDB.
Still really not sure if this is what you want or even why you would be using $addToSet on this grouping. But if all you want to do is "filter" the content of the array returned in your result, then what you want to do is $match the array elements to your condition after processing an $unwind pipeline in order to "de-normalize" the content:
db.MYLIST.aggregate([
// If you only want those matching array members it makes sense to match the
// documents that contain them first
{ "$match": { "Img.home": 1 } },
// Unwind to de-normalize or "un-join" the documents
{ "$unwind": "$Img" },
// Match again to "filter" out those elements that do not match
{ "$match": { "Img.home": 1 } },
// Then do your grouping
{ "$group": {
"_id": "$Category",
"Name": {
"$addToSet": {
"name": "$Name",
"url": "$Url",
"img": "$Img"
}
}
}},
// Finally sort
{ "$sort": { "_id" : 1 } }
]);
So the $match pipeline is the equivalent of a general query or "where clause" in SQL terms, and can be used at any stage. It is usually best to have this as a first stage when there is some type of filtering that results from this. It reduces the overall load by reducing documents to be processed even if "all" of the end results are not removed as would be the case of working with an array.
The $unwind stage allows the array elements to be processed just like another document. And of course you can just use another $match pipeline stage after this in order to just match the documents to your query condition.

Aggregation framework flatten subdocument data with parent document

I am building a dashboard that rotates between different webpages. I am wanting to pull all slides that are part of the "Test" deck and order them appropriately. After the query my result would ideally look like.
[
{ "url" : "http://10.0.1.187", "position": 1, "duartion": 10 },
{ "url" : "http://10.0.1.189", "position": 2, "duartion": 3 }
]
I currently have a dataset that looks like the following
{
"_id" : ObjectId("53a612043c24d08167b26f82"),
"url" : "http://10.0.1.189",
"decks" : [
{
"title" : "Test",
"position" : 2,
"duration" : 3
}
]
}
{
"_id" : ObjectId("53a6103e3c24d08167b26f81"),
"decks" : [
{
"title" : "Test",
"position" : 1,
"duration" : 2
},
{
"title" : "Other Deck",
"position" : 1,
"duration" : 10
}
],
"url" : "http://10.0.1.187"
}
My attempted query looks like:
db.slides.aggregate([
{
"$match": {
"decks.title": "Test"
}
},
{
"$sort": {
"decks.position": 1
}
},
{
"$project": {
"_id": 0,
"position": "$decks.position",
"duration": "$decks.duration",
"url": 1
}
}
]);
But it does not yield my desired results. How can I query my dataset and get my expected results in a optimal way?
Well to truly "flatten" the document as your title suggests then $unwind is always going to be employed as there really is not other way to do that. There are however some different approaches if you can live with the array being filtered down to the matching element.
Basically speaking, if you really only have one thing to match in the array then your fastest approach is to simply use .find() matching the required element and projecting:
db.slides.find(
{ "decks.title": "Test" },
{ "decks.$": 1 }
).sort({ "decks.position": 1 }).pretty()
That is still an array but as long as you have only one element that matches then this does work. Also the items are sorted as expected, though of course the "title" field is not dropped from the matched documents, as that is beyond the possibilities for simple projection.
{
"_id" : ObjectId("53a6103e3c24d08167b26f81"),
"decks" : [
{
"title" : "Test",
"position" : 1,
"duration" : 2
}
]
}
{
"_id" : ObjectId("53a612043c24d08167b26f82"),
"decks" : [
{
"title" : "Test",
"position" : 2,
"duration" : 3
}
]
}
Another approach, as long as you have MongoDB 2.6 or greater available, is using the $map operator and some others in order to both "filter" and re-shape the array "in-place" without actually applying $unwind:
db.slides.aggregate([
{ "$project": {
"url": 1,
"decks": {
"$setDifference": [
{
"$map": {
"input": "$decks",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.title", "Test" ] },
{
"position": "$$el.position",
"duration": "$$el.duration"
},
false
]
}
}
},
[false]
]
}
}},
{ "$sort": { "decks.position": 1 }}
])
The advantage there is that you can make the changes without "unwinding", which can reduce processing time with large arrays as you are not essentially creating new documents for every array member and then running a separate $match stage to "filter" or another $project to reshape.
{
"_id" : ObjectId("53a6103e3c24d08167b26f81"),
"decks" : [
{
"position" : 1,
"duration" : 2
}
],
"url" : "http://10.0.1.187"
}
{
"_id" : ObjectId("53a612043c24d08167b26f82"),
"url" : "http://10.0.1.189",
"decks" : [
{
"position" : 2,
"duration" : 3
}
]
}
You can again either live with the "filtered" array or if you want you can again "flatten" this truly by adding in an additional $unwind where you do not need to filter with $match as the result already contains only the matched items.
But generally speaking if you can live with it then just use .find() as it will be the fastest way. Otherwise what you are doing is fine for small data, or there is the other option for consideration.
Well as soon as I posted I realized I should be using an $unwind. Is this query the optimal way to do it, or can it be done differently?
db.slides.aggregate([
{
"$unwind": "$decks"
},
{
"$match": {
"decks.title": "Test"
}
},
{
"$sort": {
"decks.position": 1
}
},
{
"$project": {
"_id": 0,
"position": "$decks.position",
"duration": "$decks.duration",
"url": 1
}
}
]);