aggregate group distinct on array of objects after querying - mongodb

I have an array of products where a product looks like this:
{
"invNumber":445,
"attributes": [
{
"id": "GR1",
"value": "4",
"description": "Re/Rek"
},
{
"id": "WEBAKKUNDE",
"value": "2",
"description": "NO"
},
{
"id": "WEBAKKUNDK",
"value": "1",
"description": "YES"
},
{
"id": "WEBAKMONTO",
"value": "2",
"description": "NO"
},
{
"id": "WEBPAKFTTH",
"value": "2",
"description": "NO"
}
]
}
What I want to do is get all products that have {"id":"WEBAKKUNDE","value":"1"} or {"id":"WEBPAKFTTH","value":"1"}, and from these products then return only the distinct
{"id": "GR1"} objects.
I am trying to do something like this:
db.getCollection('products').aggregate([
{$unwind:'$attributes'},
{$match:{$or:[{$and:[{"attributes.id":"WEBAKKUNDE"},
{"attributes.value":"1"}]},{$and:[{"attributes.id":"WEBPAKFTTH"},
{"attributes.value":"1"}]}]}},
])
but I don't know how to get the distinct objects from the returned products.

You can use the aggregation query below.
$match checks that the array contains an element meeting the input criteria; $filter with $arrayElemAt then extracts the GR1 element.
$group on the GR1 element outputs the distinct values.
Note: if some matching documents may lack a GR1 element, add a GR1 criterion to the $match as well.
db.products.aggregate([
{"$match":{
"attributes":{
"$elemMatch":{
"$or":[
{"id":"WEBAKKUNDE","value":"1"},
{"id":"WEBPAKFTTH","value":"1"}
]
}
}
}},
{"$group":{
"_id":{
"$arrayElemAt":[
{"$filter":{"input":"$attributes","cond":{"$eq":["$$this.id","GR1"]}}},
0
]
}
}}
])
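To make the stages easier to follow, here is a minimal plain-JavaScript sketch that mimics in memory what the pipeline computes. It is not MongoDB itself, and the sample documents (invNumber 446 and 447 in particular) are invented for illustration.

```javascript
// Hypothetical sample data; only the overall shape follows the question.
const products = [
  { invNumber: 445, attributes: [
    { id: "GR1", value: "4", description: "Re/Rek" },
    { id: "WEBAKKUNDE", value: "1", description: "YES" },
  ] },
  { invNumber: 446, attributes: [
    { id: "GR1", value: "4", description: "Re/Rek" },
    { id: "WEBPAKFTTH", value: "1", description: "YES" },
  ] },
  { invNumber: 447, attributes: [
    { id: "GR1", value: "7", description: "Other" },
    { id: "WEBAKKUNDE", value: "2", description: "NO" },
  ] },
];

// Mimics $elemMatch with $or: a single element must satisfy id and value together.
const wanted = (a) =>
  a.value === "1" && (a.id === "WEBAKKUNDE" || a.id === "WEBPAKFTTH");

const distinctGr1 = new Map();
for (const product of products.filter((p) => p.attributes.some(wanted))) {
  // Mimics $filter + $arrayElemAt 0: the first GR1 element, if any.
  const gr1 = product.attributes.find((a) => a.id === "GR1");
  // Mimics $group on the whole sub-document: identical objects collapse.
  // Unlike $group, this sketch simply skips products without a GR1 element.
  if (gr1) distinctGr1.set(JSON.stringify(gr1), gr1);
}

const result = [...distinctGr1.values()];
console.log(result);
```

Products 445 and 446 both qualify and carry the same GR1 object, so only one distinct entry remains; product 447 is excluded because its WEBAKKUNDE value is "2".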

Try the following query:
db.test.aggregate([
{ $match:{ "attributes.id" : "WEBAKKUNDE", "attributes.value":"1" } },
{ $unwind: "$attributes" },
{ $match: { "attributes.id": "GR1" } },
])
Let me explain it:
$match:{ "attributes.id" : "WEBAKKUNDE", "attributes.value":"1" } finds all documents whose attributes array contains an element with that id and an element with that value (not necessarily the same element).
$unwind: "$attributes" gives us one document per array item, so in your example we end up with 5 documents.
$match: { "attributes.id": "GR1" } filters the remaining documents down to those whose id is GR1.
More reading:
https://docs.mongodb.com/manual/reference/operator/aggregation/match
https://docs.mongodb.com/manual/reference/operator/aggregation/unwind
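One thing to watch with the first $match: two dotted-path conditions on an array can each be satisfied by a different element, whereas $elemMatch requires one element to satisfy both. The plain-JavaScript sketch below mimics the two behaviours in memory on a trimmed copy of the question's document; it is an illustration, not MongoDB code.

```javascript
// Trimmed copy of the question's document: WEBAKKUNDE has value "2".
const doc = { invNumber: 445, attributes: [
  { id: "WEBAKKUNDE", value: "2" },
  { id: "WEBAKKUNDK", value: "1" },
] };

// Dotted-path conditions: each condition may be met by a different element.
const dottedMatch =
  doc.attributes.some((a) => a.id === "WEBAKKUNDE") &&
  doc.attributes.some((a) => a.value === "1");

// $elemMatch-style condition: one element must meet both conditions.
const elemMatch = doc.attributes.some(
  (a) => a.id === "WEBAKKUNDE" && a.value === "1"
);

console.log(dottedMatch, elemMatch); // true false
```

So the question's sample document would pass the dotted-path $match even though its WEBAKKUNDE value is "2".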

Related

filtering MongoDB array of Nested objects

I'm using MongoDB Compass to query a large amount of data that I've inherited. I'm often asked to produce reports on the data for various teams, but the documents usually contain too much data for them to parse easily, so I'd like to cut down the reported data as much as possible.
I've got the following example document:
{
"_id": "123456",
"name": "Bob",
"date": "2022-07-01",
"fruit": [
{
"_id": "000001",
"foodName": "apple",
"colour": "red"
},
{
"_id": "000002",
"foodName": "apple",
"colour": "green"
},
{
"_id": "000003",
"foodName": "banana",
"colour": "yellow"
},
{
"_id": "000004",
"foodName": "orange",
"colour": "orange"
}
]
}
using
db.people.find( { "fruit.foodName" : "apple" } )
returns the whole document
I'd like to search for just the apples so that I get the result:
{
"_id": "123456",
"name": "Bob",
"date": "2022-07-01",
"fruit": [
{
"_id": "000001",
"foodName": "apple",
"colour": "red"
},
{
"_id": "000002",
"foodName": "apple",
"colour": "green"
}
]
}
Is that possible?
You will need to use an aggregation with the $filter operator for this. You can't use the query language here because its projection options are limited and only allow a single array element to be projected; since your array can contain more than one matching subdocument, that won't do.
You can read more about query language projections in the MongoDB documentation.
db.collection.aggregate([
{
$match: {
"fruit.foodName": "apple"
}
},
{
$addFields: {
fruit: {
$filter: {
input: "$fruit",
cond: {
$eq: [
"$$this.foodName",
"apple"
]
}
}
}
}
}
])
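As a rough illustration of what the $addFields stage with $filter does to each matched document, here is a plain-JavaScript sketch that performs the same reshaping in memory. It does not talk to MongoDB; the variable names are made up for the example.

```javascript
// The example document from the question.
const person = {
  _id: "123456",
  name: "Bob",
  date: "2022-07-01",
  fruit: [
    { _id: "000001", foodName: "apple", colour: "red" },
    { _id: "000002", foodName: "apple", colour: "green" },
    { _id: "000003", foodName: "banana", colour: "yellow" },
    { _id: "000004", foodName: "orange", colour: "orange" },
  ],
};

// Mimics $addFields + $filter: keep every field, but replace the fruit
// array with only the elements whose foodName is "apple".
const applesOnly = {
  ...person,
  fruit: person.fruit.filter((f) => f.foodName === "apple"),
};

console.log(JSON.stringify(applesOnly, null, 2));
```

The original object is left untouched, just as the pipeline leaves the stored document unchanged and only reshapes the returned copy.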

Mongo - equivalence between distinct and aggregate

Hi: I'm facing some issues when replacing Mongo's native distinct function with an aggregate query.
In my case my query is like:
db.collection.distinct('stuff.shape')
Mongo then returns an array with the distinct values of the field, like
['square','triangle','circle']
but doing it with aggregate
db.collection.aggregate([
{ $match:{ 'stuff.shape':{$exists: true} } },
{ $group:{ '_id': '$stuff.shape'} }
])
returns many elements like
{'_id':['triangle']}
{'_id':['square']}
{'_id':['circle']}
My goal is to get the same list as the native distinct.
This is because the expression I want to "distinct" has some calculated data that I can't put into distinct directly.
Sample data:
[
{
"type": "obligation",
"stuff": {
"name": "must-turn-right",
"shape": "circle"
}
},
{
"type": "information",
"stuff": {
"name": "town_name",
"shape": "square"
}
},
{
"type": "obligation",
"stuff": {
"name": "yeld",
"shape": "triangle"
}
},
{
"type": "danger",
"stuff": {
"name": "beware_of_cattle",
"shape": "triangle"
}
}
]
As @thammada.ts already said, it's not possible to get an array of strings from the aggregate function that is equivalent to the output of the distinct function.
It is possible to create an aggregate query that returns one document with the distinct values as an array in the document by adding an extra $group stage in the aggregation pipeline:
db.collection.aggregate([
{ $match:{ 'stuff.shape':{$exists: true} } },
{ $group:{ '_id': '$stuff.shape'} },
{ $group:{ '_id': null, 'shape': {$push: '$_id'}}}
])
gives output
[{
"_id": null,
"shape": [
"square",
"circle",
"triangle"
]
}]
Aggregation pipeline returns documents. If you want an array of field values, you need to perform that transformation in your application.
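That application-side transformation can be as small as a single map call. The sketch below assumes the documents returned by the two-stage pipeline ($match followed by the first $group) have already been collected into an array; the docs array here is a hand-written stand-in for that result.

```javascript
// Stand-in for the documents returned by the $match + $group pipeline.
const docs = [{ _id: "triangle" }, { _id: "square" }, { _id: "circle" }];

// Turn the array of documents into a plain array of values,
// which is the shape that distinct returns.
const shapes = docs.map((d) => d._id);

console.log(shapes); // [ 'triangle', 'square', 'circle' ]
```

In the shell or a driver, the array would typically come from calling toArray() on the aggregation cursor before mapping.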

MongoDb aggregation framework to group elements of inner array

I'm using MongoDB aggregation framework trying to transform each document:
{
"all": [
{
"type": "A",
"id": "1"
},
{
"type": "A",
"id": "1"
},
{
"type": "B",
"id": "2"
},
{
"type": "A",
"id": "3"
}
]
}
into this:
{
"unique_type_A": [ "3", "1" ]
}
(final result is a collection of n documents with unique_type_A field)
The calculation consists of returning, in an array, all the unique ids of the entities of type A.
I got stuck at the $group step. Does anyone know how to do it?
To apply this logic to each document, you can use the following:
db.collection.aggregate([
{
$unwind: "$all"
},
{
$match: {
"all.type": "A"
}
},
{
$group: {
_id: {
"type": "$all.type",
"oldId": "$_id"
},
unique_type_A: {
$addToSet: "$all.id"
}
}
},
{
$project: {
_id: 0
}
}
])
Here we first $unwind so that we can filter and work with each member of the all array. Then we filter out the members that are not type:"A". The $group stage differs from the usual in its compound _id: it includes the _id carried through $unwind, which refers back to the original document, so that results are grouped per original document. We collect the id values from the all array with $addToSet to keep only unique values, and voilà!
And here is the result per document:
[
{
"unique_type_A": [
"3",
"1"
]
},
{
"unique_type_A": [
"4",
"11",
"5"
]
}
]
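For readers who want to see the per-document logic without a database, here is a plain-JavaScript sketch that mimics the $unwind, $match, and $group with $addToSet steps in memory. The second sample document is invented to match the second result shown above.

```javascript
// Hypothetical input: the first document is from the question,
// the second is made up for illustration.
const docs = [
  { _id: 1, all: [
    { type: "A", id: "1" }, { type: "A", id: "1" },
    { type: "B", id: "2" }, { type: "A", id: "3" },
  ] },
  { _id: 2, all: [
    { type: "A", id: "4" }, { type: "A", id: "11" },
    { type: "A", id: "5" }, { type: "A", id: "4" },
  ] },
];

const result = docs
  // Documents without any type A member produce no group in the pipeline.
  .filter((d) => d.all.some((e) => e.type === "A"))
  // A Set plays the role of $addToSet: duplicates are dropped per document.
  .map((d) => ({
    unique_type_A: [
      ...new Set(d.all.filter((e) => e.type === "A").map((e) => e.id)),
    ],
  }));

console.log(JSON.stringify(result));
```

A Set keeps insertion order, whereas $addToSet makes no ordering promise, so the element order in real pipeline output may differ.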

Results based on $sort in $lookup of mongodb

I have to sort the final results based on the lookup result. Below is my aggregate query:
{ $match : {status:'active'} },
{ $limit : 10},
{ $lookup:
{
from : "metas",
localField : "_id",
foreignField: "post_id",
as : "meta"
}
}
This query produces results like:
{
"_id": "594b6adc2a8c4f294025e46e",
"title": "Test 1",
"created_at": "2017-06-22T06:59:40.809Z",
"meta": [
{
"_id": "594b6b072a8c4f294025e46f",
"post_id": "594b6adc2a8c4f294025e46e",
"views": 1,
},
{
"_id": "594b6b1c2a8c4f294025e471",
"post_id": "594b6adc2a8c4f294025e46e",
}
],
},
{
"_id": "594b6adc2a8c4f29402f465",
"title": "Test 2",
"created_at": "2017-06-22T06:59:40.809Z",
"meta": [
{
"_id": "594b6b072a8c4f294025e46f",
"post_id": "594b6adc2a8c4f29402f465",
"views": 0,
},
{
"_id": "594b6b1c2a8c4f294025e471",
"post_id": "594b6adc2a8c4f29402f465",
}
],
},
{
"_id": "594b6adc2a8c4f29856d442",
"title": "Test 3",
"created_at": "2017-06-22T06:59:40.809Z",
"meta": [
{
"_id": "594b6b072a8c4f294025e46f",
"post_id": "594b6adc2a8c4f29856d442",
"views": 3,
},
{
"_id": "594b6b1c2a8c4f294025e471",
"post_id": "594b6adc2a8c4f29856d442",
}
],
}
Now what I want is to sort these results by 'views' under 'meta', that is, to list the results in descending order of 'meta.views': first the result whose meta has views=3, then views=1, and then views=0.
The $unwind operator splits an array into separate documents, one for each object contained in the array.
For example:
db.collection.aggregate(
// Pipeline
[
// Stage 1
{
$unwind: {
path : "$meta"
}
},
// Stage 2
{
$sort: {
'meta.views':-1
}
},
]
);
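Note that unwinding yields one output document per meta entry rather than one per post. If one row per post is wanted, an alternative is to sort on a value derived from the array, such as the highest views count. The plain-JavaScript sketch below mimics that idea in memory on trimmed sample data; treating a missing views field as 0 is an assumption made here, not something the question specifies.

```javascript
// Trimmed versions of the question's results; only title and views are kept.
const posts = [
  { title: "Test 1", meta: [{ views: 1 }, {}] },
  { title: "Test 2", meta: [{ views: 0 }, {}] },
  { title: "Test 3", meta: [{ views: 3 }, {}] },
];

// Highest views value among a post's meta entries (missing views count as 0).
const maxViews = (post) => Math.max(0, ...post.meta.map((m) => m.views ?? 0));

// Sort a copy in descending order of that derived value.
const sorted = [...posts].sort((a, b) => maxViews(b) - maxViews(a));

console.log(sorted.map((p) => p.title)); // [ 'Test 3', 'Test 1', 'Test 2' ]
```

Inside a pipeline, the same idea could be expressed with an $addFields stage that applies $max to "$meta.views", followed by $sort on the new field.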
Although $lookup does not support sorting, the easiest solution I think, and probably also the fastest, is to create a proper index on the related collection.
In this case, an index on the metas collection on the foreign field post_id and the field on which sorting is wanted views. Make sure to make the index in the correct sorting order.
Not only is the result now sorted, the query is probably also faster now it can use an index.

MongoDB find subdocument and sort the results

I have a collection in MongoDB with a complex structure and subdocuments.
The documents have a structure like this:
doc1 = {
'_id': '12345678',
'url': "http//myurl/...",
'nlp':{
"status": "OK",
"entities": {
"0": {
"type" : "Person",
"relevance": "0.877245",
"text" : "Neelie Kroes"
},
"1": {
"type": "Company",
"relevance": "0.36242",
"text": "ICANN"
},
"2": {
"type": "Company",
"relevance": "0.265175",
"text": "IANA"
}
}
}
}
doc2 = {
'_id': '987456321',
'url': "http//myurl2/...",
'nlp':{
"status": "OK",
"entities": {
"0": {
"type": "Company",
"relevance": "0.96",
"text": "ICANN"
},
"1": {
"type" : "Person",
"relevance": "0.36242",
"text" : "Neelie Kroes"
},
"2": {
"type": "Company",
"relevance": "0.265175",
"text": "IANA"
}
}
}
}
My task is to search for "type" AND "text" inside the subdocument, then sort by "relevance".
With the $elemMatch operator I'm able to perform the query:
db.resource.find({
'nlp.entities': {
'$elemMatch': {'text': 'Neelie Kroes', 'type': 'Person'}
}
});
Perfect. Now I have to sort all the records that have an entity of type "Person" and text "Neelie Kroes" by relevance, descending.
I tried a normal "sort", but, as the manual says about sort() with $elemMatch, the result may not reflect the sort order because the sort() is applied to the elements of the array before the $elemMatch projection.
In fact, _id:987456321 comes first (with a relevance of 0.96, but that relevance belongs to ICANN).
How can I sort my documents by the relevance of the matched subdocument?
P.S.: I can't change the document structure.
As noted I hope your documents actually do have an array, but if $elemMatch is working for you then they should.
At any rate, you cannot sort by an element in an array using find. But there is a case where you can do this using .aggregate():
db.collection.aggregate([
// Match the documents that you want, containing the array
{ "$match": {
"nlp.entities": {
"$elemMatch": {
"text": "Neelie Kroes",
"type": "Person"
}
}
}},
// Project to "store" the whole document for later, duplicating the array
{ "$project": {
"_id": {
"_id": "$_id",
"url": "$url",
"nlp": "$nlp"
},
"entities": "$nlp.entities"
}},
// Unwind the array to de-normalize
{ "$unwind": "$entities" },
// Match "only" the relevant entities
{ "$match": {
"entities.text": "Neelie Kroes",
"entities.type": "Person"
}},
// Sort on the relevance
{ "$sort": { "entities.relevance": -1 } },
// Restore the original document form
{ "$project": {
"_id": "$_id._id",
"url": "$_id.url",
"nlp": "$_id.nlp"
}}
])
So essentially, after the $match condition selects the documents that contain the relevant match, you use $project to "store" the original document in the _id field and $unwind a "copy" of the "entities" array.
The next $match "filters" the array contents to only those ones that are relevant. Then you apply the $sort to the "matched" documents.
As the "original" document was stored under _id, you use $project to "restore" the structure that the document actually had to begin with.
That is how you "sort" on your matched element of an array.
Note that if you had multiple "matches" within an array for a parent document, then you would have to employ an additional $group stage to get the $max value for the "relevance" field in order to complete your sort.
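To illustrate the idea of sorting by the best matching element, here is a plain-JavaScript sketch that mimics it in memory. It assumes entities is an array (flattened here from nlp.entities for brevity) and converts the relevance strings to numbers before comparing, since strings would otherwise be compared character by character; both choices are assumptions for the example.

```javascript
// Trimmed versions of the question's documents, with entities as an array.
const docs = [
  { _id: "12345678", entities: [
    { type: "Person", relevance: "0.877245", text: "Neelie Kroes" },
    { type: "Company", relevance: "0.36242", text: "ICANN" },
  ] },
  { _id: "987456321", entities: [
    { type: "Company", relevance: "0.96", text: "ICANN" },
    { type: "Person", relevance: "0.36242", text: "Neelie Kroes" },
  ] },
];

const isMatch = (e) => e.text === "Neelie Kroes" && e.type === "Person";

// Highest relevance among the matching entities of one document
// (the role the extra $group with $max would play).
const bestRelevance = (d) =>
  Math.max(...d.entities.filter(isMatch).map((e) => Number(e.relevance)));

const sorted = docs
  .filter((d) => d.entities.some(isMatch)) // the initial $match
  .sort((a, b) => bestRelevance(b) - bestRelevance(a)); // $sort descending

console.log(sorted.map((d) => d._id)); // [ '12345678', '987456321' ]
```

The ICANN entry with relevance 0.96 no longer influences the order, because only entities matching both text and type are considered.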