MongoDB find subdocument and sort the results

I have a collection in MongoDB with a complex structure and subdocuments.
The documents have a structure like this:
doc1 = {
'_id': '12345678',
'url': "http//myurl/...",
'nlp':{
"status": "OK",
"entities": {
"0": {
"type" : "Person",
"relevance": "0.877245",
"text" : "Neelie Kroes"
},
"1": {
"type": "Company",
"relevance": "0.36242",
"text": "ICANN"
},
"2": {
"type": "Company",
"relevance": "0.265175",
"text": "IANA"
}
}
}
}
doc2 = {
'_id': '987456321',
'url': "http//myurl2/...",
'nlp':{
"status": "OK",
"entities": {
"0": {
"type": "Company",
"relevance": "0.96",
"text": "ICANN"
},
"1": {
"type" : "Person",
"relevance": "0.36242",
"text" : "Neelie Kroes"
},
"2": {
"type": "Company",
"relevance": "0.265175",
"text": "IANA"
}
}
}
}
My task is to search for "type" AND "text" inside the subdocument, then sort by "relevance".
With the $elemMatch operator I'm able to perform the query:
db.resource.find({
'nlp.entities': {
'$elemMatch': {'text': 'Neelie Kroes', 'type': 'Person'}
}
});
Perfect, now I have to sort all the records with entities of type "Person" and value "Neelie Kroes" by relevance descending.
I tried a normal "sort", but, as the manual says about sort() with $elemMatch, the result may not reflect the sort order, because sort() is applied to the elements of the array before the $elemMatch projection.
In fact, the document with _id 987456321 comes first (with a relevance of 0.96, but that value belongs to ICANN).
How can I sort my documents by the matched subdocument's relevance?
P.S.: I can't change the document structure.

As noted, I hope your documents actually do have an array; if $elemMatch is working for you, then they should.
At any rate, you cannot sort by a matched array element using find(). You can, however, do this using .aggregate():
db.collection.aggregate([
// Match the documents that you want, containing the array
{ "$match": {
"nlp.entities": {
"$elemMatch": {
"text": "Neelie Kroes",
"type": "Person"
}
}
}},
// Project to "store" the whole document for later, duplicating the array
{ "$project": {
"_id": {
"_id": "$_id",
"url": "$url",
"nlp": "$nlp"
},
"entities": "$nlp.entities"
}},
// Unwind the array to de-normalize
{ "$unwind": "$entities" },
// Match "only" the relevant entities
{ "$match": {
"entities.text": "Neelie Kroes",
"entities.type": "Person"
}},
// Sort on the relevance
{ "$sort": { "entities.relevance": -1 } },
// Restore the original document form
{ "$project": {
"_id": "$_id._id",
"url": "$_id.url",
"nlp": "$_id.nlp"
}}
])
So essentially, after doing the $match condition for documents that contain the relevant match, you use $project to "store" the original document in the _id field and $unwind a "copy" of the "entities" array.
The next $match "filters" the array contents down to only the relevant entries. Then you apply the $sort to the "matched" documents.
As the "original" document was stored under _id, you use $project to "restore" the structure that the document actually had to begin with.
That is how you "sort" on your matched element of an array.
Note that if you had multiple "matches" within an array for a parent document, then you would have to employ an additional $group stage to get the $max value for the "relevance" field in order to complete your sort.
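To make the note about multiple matches concrete, here is a minimal in-memory sketch in plain JavaScript. It is not MongoDB API: the helper name sortByMatchedRelevance is made up, the sample documents assume that nlp.entities really is an array, and relevance is converted with Number because the sample stores it as a string. It mirrors the $unwind, $match, $group with $max, and $sort steps:

```javascript
// In-memory illustration only; the function name is hypothetical.
function sortByMatchedRelevance(docs, text, type) {
  return docs
    .map((doc) => {
      // "$unwind" + "$match": keep only the entities that match both fields.
      const matched = doc.nlp.entities.filter(
        (e) => e.text === text && e.type === type
      );
      // "$group" with "$max": best relevance among the matched entities.
      const best = Math.max(...matched.map((e) => Number(e.relevance)));
      return { doc, matchedCount: matched.length, best };
    })
    // Initial "$match": drop documents without any matching entity.
    .filter((row) => row.matchedCount > 0)
    // "$sort": descending on the matched relevance.
    .sort((a, b) => b.best - a.best)
    // Final "$project": restore the original document.
    .map((row) => row.doc);
}

// Assumed sample data, shortened from the question, with entities as arrays.
const resourceDocs = [
  { _id: "987456321", nlp: { entities: [
    { type: "Company", relevance: "0.96", text: "ICANN" },
    { type: "Person", relevance: "0.36242", text: "Neelie Kroes" },
  ] } },
  { _id: "12345678", nlp: { entities: [
    { type: "Person", relevance: "0.877245", text: "Neelie Kroes" },
    { type: "Company", relevance: "0.36242", text: "ICANN" },
  ] } },
];

const sortedDocs = sortByMatchedRelevance(resourceDocs, "Neelie Kroes", "Person");
console.log(sortedDocs.map((d) => d._id)); // 12345678 comes first
```

The ICANN relevance of 0.96 no longer influences the order, because only the matched "Person" entities take part in the sort.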

Related

aggregate group distinct on array of objects after querying

I have array of products where a product looks like this:
{
"invNumber":445,
"attributes": [
{
"id": "GR1",
"value": "4",
"description": "Re/Rek"
},
{
"id": "WEBAKKUNDE",
"value": "2",
"description": "NO"
},
{
"id": "WEBAKKUNDK",
"value": "1",
"description": "YES"
},
{
"id": "WEBAKMONTO",
"value": "2",
"description": "NO"
},
{
"id": "WEBPAKFTTH",
"value": "2",
"description": "NO"
}
]
}
What I want to do is get all products that have {"id":"WEBAKKUNDE","value":"1"} or {"id":"WEBPAKFTTH","value":"1"}, and from these products then return only the distinct
{"id": "GR1"} objects.
I am trying to do something like this:
db.getCollection('products').aggregate([
{$unwind:'$attributes'},
{$match:{$or:[{$and:[{"attributes.id":"WEBAKKUNDE"},
{"attributes.value":"1"}]},{$and:[{"attributes.id":"WEBPAKFTTH"},
{"attributes.value":"1"}]}]}},
])
but I don't know how to get the distinct objects from the returned products.
You can use the aggregation query below.
The $match stage checks whether the array meets the input criteria; $filter with $arrayElemAt then extracts the GR1 element.
The $group stage on the GR1 element outputs the distinct values.
Note: add the GR1 criteria to the $match as well if some of the matching documents may have no GR1 element among their attributes.
db.products.aggregate([
{"$match":{
"attributes":{
"$elemMatch":{
"$or":[
{"id":"WEBAKKUNDE","value":"1"},
{"id":"WEBPAKFTTH","value":"1"}
]
}
}
}},
{"$group":{
"_id":{
"$arrayElemAt":[
{"$filter":{"input":"$attributes","cond":{"$eq":["$$this.id","GR1"]}}},
0
]
}
}}
])
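Why $elemMatch matters here can be shown with a small in-memory sketch in plain JavaScript. This is not MongoDB API; both helper names are made up, and the product is a shortened copy of the one in the question. With $elemMatch a single array element has to satisfy all conditions, whereas two separate dot-notation conditions may each be satisfied by a different element:

```javascript
// In-memory illustration only; helper names are hypothetical.
const sampleProduct = {
  invNumber: 445,
  attributes: [
    { id: "GR1", value: "4", description: "Re/Rek" },
    { id: "WEBAKKUNDE", value: "2", description: "NO" },
    { id: "WEBAKKUNDK", value: "1", description: "YES" },
  ],
};

// Like $elemMatch: one single element must satisfy both conditions.
function matchesSameElement(doc, id, value) {
  return doc.attributes.some((a) => a.id === id && a.value === value);
}

// Like two dot-notation conditions: each may be met by a different element.
function matchesAcrossElements(doc, id, value) {
  return (
    doc.attributes.some((a) => a.id === id) &&
    doc.attributes.some((a) => a.value === value)
  );
}

console.log(matchesSameElement(sampleProduct, "WEBAKKUNDE", "1"));    // false
console.log(matchesAcrossElements(sampleProduct, "WEBAKKUNDE", "1")); // true
```

The second check succeeds only because WEBAKKUNDK happens to have the value "1", which is not what the question asks for.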
Try the following query:
db.test.aggregate([
{ $match: { "attributes": { $elemMatch: { "id": "WEBAKKUNDE", "value": "1" } } } },
{ $unwind: "$attributes" },
{ $match: { "attributes.id": "GR1" } },
])
Here is what each stage does:
The first $match uses $elemMatch to find all documents that have a single attributes element with id "WEBAKKUNDE" and value "1". Two separate dot-notation conditions ("attributes.id" and "attributes.value") would also match documents where different elements satisfy each condition.
$unwind: "$attributes" produces one document per array item, so your example document becomes 5 documents.
$match: { "attributes.id": "GR1" } keeps only the documents whose attribute id is GR1.
More reading:
https://docs.mongodb.com/manual/reference/operator/aggregation/match
https://docs.mongodb.com/manual/reference/operator/aggregation/unwind
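The unwind-then-filter idea, plus the missing "distinct" step, can be sketched in plain JavaScript on in-memory data. This is not MongoDB API: unwindAttributes and distinctByContent are made-up helpers, and the two products use made-up values so that both carry the same GR1 element:

```javascript
// In-memory illustration only; helper names and sample values are hypothetical.
function unwindAttributes(docs) {
  // One output document per array element, with the array replaced by that element.
  return docs.flatMap((doc) =>
    (doc.attributes || []).map((item) => ({ ...doc, attributes: item }))
  );
}

function distinctByContent(values) {
  // De-duplicate by serialized content, similar in spirit to grouping on the value.
  const seen = new Map();
  for (const v of values) seen.set(JSON.stringify(v), v);
  return [...seen.values()];
}

const matchedProducts = [
  { invNumber: 445, attributes: [
    { id: "GR1", value: "4", description: "Re/Rek" },
    { id: "WEBAKKUNDE", value: "1", description: "YES" },
  ] },
  { invNumber: 446, attributes: [
    { id: "GR1", value: "4", description: "Re/Rek" },
    { id: "WEBPAKFTTH", value: "1", description: "YES" },
  ] },
];

const unwoundProducts = unwindAttributes(matchedProducts);
const distinctGr1 = distinctByContent(
  unwoundProducts
    .filter((d) => d.attributes.id === "GR1")
    .map((d) => d.attributes)
);
console.log(unwoundProducts.length, distinctGr1);
```

Four unwound documents come out of the two products, and the two identical GR1 elements collapse into one.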

Results based on $sort in $lookup of mongodb

I have to sort the final results based on the lookup result. Below is my aggregate query:
{ $match : {status:'active'} },
{ $limit : 10},
{ $lookup:
{
from : "metas",
localField : "_id",
foreignField: "post_id",
as : "meta"
}
}
This query produces results like:
{
"_id": "594b6adc2a8c4f294025e46e",
"title": "Test 1",
"created_at": "2017-06-22T06:59:40.809Z",
"meta": [
{
"_id": "594b6b072a8c4f294025e46f",
"post_id": "594b6adc2a8c4f294025e46e",
"views": 1,
},
{
"_id": "594b6b1c2a8c4f294025e471",
"post_id": "594b6adc2a8c4f294025e46e",
}
],
},
{
"_id": "594b6adc2a8c4f29402f465",
"title": "Test 2",
"created_at": "2017-06-22T06:59:40.809Z",
"meta": [
{
"_id": "594b6b072a8c4f294025e46f",
"post_id": "594b6adc2a8c4f29402f465",
"views": 0,
},
{
"_id": "594b6b1c2a8c4f294025e471",
"post_id": "594b6adc2a8c4f29402f465",
}
],
},
{
"_id": "594b6adc2a8c4f29856d442",
"title": "Test 3",
"created_at": "2017-06-22T06:59:40.809Z",
"meta": [
{
"_id": "594b6b072a8c4f294025e46f",
"post_id": "594b6adc2a8c4f29856d442",
"views": 3,
},
{
"_id": "594b6b1c2a8c4f294025e471",
"post_id": "594b6adc2a8c4f29856d442",
}
],
}
Now I want to sort these results by 'views' under 'meta', in descending order of 'meta.views': the first result should be the one with views=3, then views=1, and then views=0.
The $unwind operator splits an array into separate documents, one for each object contained in the array.
For example:
db.collection.aggregate(
// Pipeline
[
// Stage 1
{
$unwind: {
path : "$meta"
}
},
// Stage 2
{
$sort: {
'meta.views':-1
}
},
]
);
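If you want to keep one result per post with its meta array intact, the ordering can also be thought of as "sort by the highest views value inside meta". Below is a minimal in-memory sketch of that idea in plain JavaScript; it is not MongoDB API, the helper name is made up, and treating posts without any views as -1 is an arbitrary assumption. In a pipeline, an $addFields stage with $max over "$meta.views" followed by $sort might play the same role, though that is untested here:

```javascript
// In-memory illustration only; the helper name is hypothetical.
function sortByMaxViews(posts) {
  const maxViews = (post) => {
    // Elements without a numeric "views" field are ignored.
    const views = post.meta
      .map((m) => m.views)
      .filter((v) => typeof v === "number");
    // Posts without any views sort last (-1 is an arbitrary choice).
    return views.length ? Math.max(...views) : -1;
  };
  // Copy before sorting so the input array is left untouched.
  return [...posts].sort((a, b) => maxViews(b) - maxViews(a));
}

// Shortened copies of the posts from the question.
const lookedUpPosts = [
  { title: "Test 1", meta: [{ views: 1 }, {}] },
  { title: "Test 2", meta: [{ views: 0 }, {}] },
  { title: "Test 3", meta: [{ views: 3 }, {}] },
];

const postsByViews = sortByMaxViews(lookedUpPosts);
console.log(postsByViews.map((p) => p.title)); // Test 3, Test 1, Test 2
```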
Although $lookup does not support sorting, the easiest solution, and probably also the fastest, is to create a proper index on the related collection.
In this case, that means an index on the metas collection covering the foreign field post_id and the field to sort on, views. Make sure to create the index with the correct sort order.
The joined results then tend to come back in index order (MongoDB does not document this as a guarantee), and the query is probably also faster because it can use the index.

Looking for sub-documents containing a field in a document's array

Assuming I have the following persons collection:
{
"_id": ObjectId("569d07a38e61973f6aded134"),
"name": "john",
"pets": [
{
"name": "spot",
"type": "dog",
"special": "spot eye"
},
{
"name": "bob",
"type": "cat",
}
]
},
{
"_id": ObjectId("569d07a38e61973f6aded135"),
"name": "susie",
"pets": [
{
"name": "fred",
"type": "cat",
}
]
}
How can I retrieve the persons whose pets have a special field? I want the returned pets array to contain only the pets with a special field.
For example, the expected result from the collection above would be:
{
"_id": ObjectId("569d07a38e61973f6aded134"),
"name": "john",
"pets": [
{
"name": "spot",
"type": "dog",
"special": "spot eye"
}
]
}
I'm trying to implement this in hopefully one query with pymongo, although even just a working MongoDB or mongoose query would be lovely.
I've tried to start with:
db.persons.find({pets:{special:{$exists:true}}});
but that has returned 0 records, even though there should be some.
If the array holds embedded documents, you can query for specific fields in the embedded documents using dot notation.
Without dot notation you are querying array documents for a complete match.
Try the following query:
db.persons.find({'pets.special':{$exists:true}});
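The difference between the two query forms can be sketched in plain JavaScript on in-memory data. This is not MongoDB API and the helper names are made up; it only illustrates "complete match against an element" versus "any element carries the field":

```javascript
// In-memory illustration only; helper names are hypothetical.
const personList = [
  { name: "john", pets: [
    { name: "spot", type: "dog", special: "spot eye" },
    { name: "bob", type: "cat" },
  ] },
  { name: "susie", pets: [{ name: "fred", type: "cat" }] },
];

// Without dot notation: an element must equal the given document as a whole.
function hasPetEqualTo(person, wanted) {
  return person.pets.some((p) => JSON.stringify(p) === JSON.stringify(wanted));
}

// With dot notation ('pets.special' exists): any element carrying the field matches.
function hasPetWithField(person, field) {
  return person.pets.some((p) => field in p);
}

const exactMatches = personList.filter((p) =>
  hasPetEqualTo(p, { special: { $exists: true } })
);
const dottedMatches = personList.filter((p) => hasPetWithField(p, "special"));
console.log(exactMatches.length, dottedMatches.map((p) => p.name));
```

No pet is literally equal to { special: { $exists: true } }, which is why the first form returns nothing, while the second form finds john.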
You can use the aggregation framework to get the desired result. Run the following aggregation pipeline:
db.persons.aggregate([
{
"$match": {
"pets.special": { "$exists": true }
}
},
{
"$project": {
"name": 1,
"pets": {
"$setDifference": [
{
"$map": {
"input": "$pets",
"as": "el",
"in": {
"$cond": [
{ "$gt": [ "$$el.special", null ] },
"$$el", false
]
}
}
},
[false]
]
}
}
}
])
Sample Output
{
"result" : [
{
"_id" : ObjectId("569d07a38e61973f6aded134"),
"name" : "john",
"pets" : [
{
"name" : "spot",
"type" : "dog",
"special" : "spot eye"
}
]
}
],
"ok" : 1
}
The operators that do the work here are $setDifference and $map. The $map operator creates a new array by evaluating a subexpression against each element of the input array. Here the subexpression is a $cond that uses the comparison operator $gt to test whether the element has a special property: it returns the element itself if so, and false otherwise. The $setDifference operator then returns the elements that appear in the first set but not in the second (the relative complement of the second set in the first). With [false] as the second set, it strips the false placeholders and leaves only the pets that have a special property.
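The two steps can be mimicked in plain JavaScript to see the intermediate array. This is an in-memory sketch, not MongoDB API, and the variable names are made up. One difference to keep in mind: $setDifference treats its inputs as sets, so duplicate elements would collapse, which the plain filter below does not do. On MongoDB 3.2 and later, the $filter operator may be a more direct way to express the same thing.

```javascript
// In-memory illustration only; variable names are hypothetical.
const johnsPets = [
  { name: "spot", type: "dog", special: "spot eye" },
  { name: "bob", type: "cat" },
];

// Like "$map" with "$cond": keep the element if it has the field, else emit false.
const mappedPets = johnsPets.map((el) => (el.special != null ? el : false));

// Like "$setDifference" against [false]: drop the false placeholders.
const specialPets = mappedPets.filter((el) => el !== false);

console.log(mappedPets);  // the "spot" element followed by false
console.log(specialPets); // only the "spot" element
```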

How to merge array field in document in Mongo aggregation

I have a requirement to aggregate two records that both have an array field, each with different values. When I aggregate these records, the result should have one array with the unique values from both arrays. Here is an example:
First record
{ Host:"abc.com", ArtId:"123", tags:[ "tag1", "tag2" ] }
Second record
{ Host:"abc.com", ArtId:"123", tags:[ "tag2", "tag3" ] }
After aggregating on Host and ArtId I need a result like this:
{ Host: "abc.com", ArtId: "123", count :"2", tags:[ "tag1", "tag2", "tag3" ]}
I tried $addToSet in the group stage, but it gives me this: tags: [["tag1","tag2"],["tag2","tag3"]]
Could you please help me achieve this in an aggregation?
TLDR;
Modern releases should use $reduce with $setUnion after the initial $group as is shown:
db.collection.aggregate([
{ "$group": {
"_id": { "Host": "$Host", "ArtId": "$ArtId" },
"count": { "$sum": 1 },
"tags": { "$addToSet": "$tags" }
}},
{ "$addFields": {
"tags": {
"$reduce": {
"input": "$tags",
"initialValue": [],
"in": { "$setUnion": [ "$$value", "$$this" ] }
}
}
}}
])
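As a rough picture of what this pipeline computes, here is an in-memory sketch in plain JavaScript. It is not MongoDB API and the helper name is made up; it groups on the compound key, counts the records, and unions the tags:

```javascript
// In-memory illustration only; the helper name is hypothetical.
function groupAndMergeTags(records) {
  const groups = new Map();
  for (const rec of records) {
    // "$group" on the compound key { Host, ArtId }.
    const key = JSON.stringify([rec.Host, rec.ArtId]);
    if (!groups.has(key)) {
      groups.set(key, { Host: rec.Host, ArtId: rec.ArtId, count: 0, tags: new Set() });
    }
    const group = groups.get(key);
    group.count += 1; // like { "$sum": 1 }
    // Like "$reduce" with "$setUnion": fold every tag into one set.
    for (const tag of rec.tags) group.tags.add(tag);
  }
  // Convert each Set back into a plain array for output.
  return [...groups.values()].map((g) => ({ ...g, tags: [...g.tags] }));
}

const tagRecords = [
  { Host: "abc.com", ArtId: "123", tags: ["tag1", "tag2"] },
  { Host: "abc.com", ArtId: "123", tags: ["tag2", "tag3"] },
];

const mergedGroups = groupAndMergeTags(tagRecords);
console.log(mergedGroups);
```

For the two sample records this yields a single group with a count of 2 and the tags tag1, tag2 and tag3.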
You were right in finding the $addToSet operator, but when working with content in an array you generally need to process it with $unwind first. This "de-normalizes" the array entries and essentially makes a "copy" of the parent document for each array entry, with that entry as a singular value in the field. Without it, $addToSet collects whole arrays rather than individual values, which is the behavior you are seeing.
Your "count" poses an interesting problem, but one easily solved through the use of a "double unwind" after an initial $group operation:
db.collection.aggregate([
// Group on the compound key and get the occurrences first
{ "$group": {
"_id": { "Host": "$Host", "ArtId": "$ArtId" },
"tcount": { "$sum": 1 },
"ttags": { "$push": "$tags" }
}},
// Unwind twice because "ttags" is now an array of arrays
{ "$unwind": "$ttags" },
{ "$unwind": "$ttags" },
// Now use $addToSet to get the distinct values, still under the temporary name
{ "$group": {
"_id": "$_id",
"tcount": { "$first": "$tcount" },
"ttags": { "$addToSet": "$ttags" }
}},
// Optionally $project to get the fields out of the _id key
{ "$project": {
"_id": 0,
"Host": "$_id.Host",
"ArtId": "$_id.ArtId",
"count": "$tcount",
"tags": "$ttags"
}}
])
That final bit with $project is also there because I used "temporary" names for each of the fields in other stages of the aggregation pipeline. This is because there is an optimization in $project that "copies" the fields from an existing stage in the order they already appeared "before" any "new" fields are added to the document.
Otherwise the output would look like:
{ "count":2 , "tags":[ "tag1", "tag2", "tag3" ], "Host": "abc.com", "ArtId": "123" }
Where the fields are not in the same order as you might think. Trivial really, but it matters to some people, so worth explaining why, and how to handle.
So $unwind does the work to keep the items separated and not in arrays, and doing the $group first allows you to get the "count" of the occurrences of the "grouping" key.
The $first operator used later "keeps" that "count" value, as it just got "duplicated" for every value present in the "tags" array. It's all the same value anyway so it does not matter. Just pick one.
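The effect of the "double unwind" on the intermediate documents can be traced with a small in-memory sketch in plain JavaScript. This is not MongoDB API; unwindField is a made-up helper and the grouped document is what the first $group would plausibly produce for the two sample records:

```javascript
// In-memory illustration only; `unwindField` is a hypothetical helper.
function unwindField(docs, field) {
  // One output document per array element, with the array replaced by that element.
  return docs.flatMap((doc) =>
    doc[field].map((item) => ({ ...doc, [field]: item }))
  );
}

// Assumed shape after the first "$group": "ttags" is an array of arrays.
const groupedDocs = [
  { _id: { Host: "abc.com", ArtId: "123" }, tcount: 2,
    ttags: [["tag1", "tag2"], ["tag2", "tag3"]] },
];

const unwoundOnce = unwindField(groupedDocs, "ttags");  // one document per inner array
const unwoundTwice = unwindField(unwoundOnce, "ttags"); // one document per tag

// Second "$group": "$first" keeps tcount, "$addToSet" collects the distinct tags.
const regrouped = {
  _id: unwoundTwice[0]._id,
  tcount: unwoundTwice[0].tcount,
  tags: [...new Set(unwoundTwice.map((d) => d.ttags))],
};

console.log(unwoundOnce.length, unwoundTwice.length, regrouped);
```

Every one of the four unwound documents carries the same tcount of 2, which is why picking the first one is enough.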

Exclude fields in matching element retrieved by positional operator

I have a texts collection with documents looking like this:
{
title: 'A title',
author: 'Author Name',
published: 1944,
languages: [
{
code: 'en',
text: 'A long english text by the author...'
},
{
code: 'da',
text: 'En lang dansk tekst skrevet af forfatteren...'
}
// + many more languages
]
}
and would like to make a query that retrieves the title, author and published date, and the text in a given language, so I do this:
texts.findOne(
{ title: titleArg, 'languages.code': languageArg },
{ 'title': 1, 'author': 1, 'published': 1, 'languages.$': 1 } ...
but I would like to return the matching language element WITHOUT mongodb's _id field.
If I do this in the projection:
{ '_id': 0, 'title': 1, 'author': 1, 'published': 1, 'languages.$': 1 }
I get the document back without its main _id, but if I do this:
{ 'languages.$._id': 0, 'title': 1, 'author': 1, 'published': 1, 'languages.$': 1 }
or this:
{ 'languages._id': 0, 'title': 1, 'author': 1, 'published': 1, 'languages.$': 1 }
nothing at all is returned.
Does anyone know how to create a projection that returns an element in an array AND excludes some fields in that element?
You seem to be actually saying that your documents really look like this:
{
"_id": ObjectId("53b25ad420edfc7d0df16a0c"),
"title": "A title",
"author": "Author Name",
"published": 1944,
"languages": [
{
"_id": ObjectId("53b25af720edfc7d0df16a0d"),
"code": "en",
"text": "A long english text by the author..."
},
{
"_id": ObjectId("53b25b0720edfc7d0df16a0e"),
"code": "da",
"text": "En lang dansk tekst skrevet af forfatteren..."
}
]
}
Just to clear up the point, MongoDB does not insert _id values in the array elements. This is something that certain Object Document Mappers (ODMs) do; one such ODM is mongoose. MongoDB and the default drivers only place this field at the "top" level of the documents in a collection, unless another value is specified in its place.
Being specific about the fields within an array that you wish to "project" is beyond the scope of what you can do with .find(). You actually need the .aggregate() method in order to "re-shape" the document and remove all the _id fields the way you want:
db.collection.aggregate([
// Match the document(s) that meet the conditions
{ "$match": {
"title": "A title",
"languages.code": "en"
}},
// Unwind the array to de-normalize for processing
{ "$unwind": "$languages" },
// Match to "filter" the actual array documents
{ "$match": { "languages.code": "en" } },
// Group back the array per document and keep only the wanted fields
{ "$group": {
"_id": "$_id",
"title": { "$first": "$title" },
"author": { "$first": "$author" },
"published": { "$first": "$published" },
"languages": {
"$push": {
"code": "$languages.code",
"text": "$languages.text"
}
}
}},
// Finally project to remove the "root" _id field
{ "$project": {
"_id": 0,
"title": 1,
"author": 1,
"published": 1,
"languages": 1
}}
])
MongoDB 2.6 introduces some new operators to make this process possible in a single $project stage:
db.collection.aggregate([
{ "$match": {
"title": "A title",
"languages.code": "en"
}},
{ "$project": {
"_id": 0,
"title": 1,
"author": 1,
"published": 1,
"languages": {
"$setDifference": [
{ "$map": {
"input": "$languages",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.code", "en" ] },
{ "code": "$$el.code", "text": "$$el.text" },
false
]
}
}},
[false]
]
}
}}
])
The general intent of such operations is usually for more involved document re-shaping than what you are doing, including the need to match multiple array elements which is something you cannot do with positional projection.
But this is also the only present way to "alter" the fields returned from within the array elements as you wish to happen.
Also look at the full list of aggregation operators for a more detailed explanation of each one.
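If it helps to see the intended re-shaping outside of the pipeline syntax, here is a minimal in-memory sketch in plain JavaScript. It is not MongoDB API; the helper name pickLanguage is made up and the ObjectId values are written as plain strings for simplicity:

```javascript
// In-memory illustration only; the helper name is hypothetical.
function pickLanguage(doc, code) {
  // Split off the root _id and the languages array; keep the other fields.
  const { _id, languages, ...rest } = doc;
  return {
    ...rest,
    // Keep only matching elements, copying just the wanted fields (no nested _id).
    languages: languages
      .filter((el) => el.code === code)
      .map((el) => ({ code: el.code, text: el.text })),
  };
}

const textDoc = {
  _id: "53b25ad420edfc7d0df16a0c",
  title: "A title",
  author: "Author Name",
  published: 1944,
  languages: [
    { _id: "53b25af720edfc7d0df16a0d", code: "en", text: "A long english text by the author..." },
    { _id: "53b25b0720edfc7d0df16a0e", code: "da", text: "En lang dansk tekst skrevet af forfatteren..." },
  ],
};

const pickedText = pickLanguage(textDoc, "en");
console.log(pickedText);
```

The result keeps title, author and published, and a languages array holding only the "en" element without any _id fields.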