Hi: I'm facing some issues when replacing the native distinct() function of mongo with an aggregate query.
In my case the query looks like this:
db.collection.distinct('stuff.shape')
mongo then returns an array with the distinct values of the field, like
['square','triangle','circle']
but doing it with aggregate
db.collection.aggregate([
{ $match:{ 'stuff.shape':{$exists: true} } },
{ $group:{ '_id': '$stuff.shape'} }
])
returns several documents instead, like
{'_id': 'triangle'}
{'_id': 'square'}
{'_id': 'circle'}
My goal is to get the same list as the native distinct.
I need the aggregate form because the expression I want to "distinct" on involves some calculated data that I can't pass to distinct directly.
sample data:
[
{
"type": "obligation",
"stuff": {
"name": "must-turn-right",
"shape": "circle"
}
},
{
"type": "information",
"stuff": {
"name": "town_name",
"shape": "square"
}
},
{
"type": "obligation",
"stuff": {
"name": "yeld",
"shape": "triangle"
}
},
{
"type": "danger",
"stuff": {
"name": "beware_of_cattle",
"shape": "triangle"
}
}
]
Link to Mongoplayground
As @thammada.ts already said, it's not possible to get an array of strings from the aggregate function that is equivalent to the output of the distinct function.
It is possible to create an aggregate query that returns one document with the distinct values as an array in the document by adding an extra $group stage in the aggregation pipeline:
db.collection.aggregate([
{ $match:{ 'stuff.shape':{$exists: true} } },
{ $group:{ '_id': '$stuff.shape'} },
{ $group:{ '_id': null, 'shape': {$push: '$_id'}}}
])
which gives the output:
[{
"_id": null,
"shape": [
"square",
"circle",
"triangle"
]
}]
Aggregation pipeline returns documents. If you want an array of field values, you need to perform that transformation in your application.
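For reference, a minimal sketch of that application-side step in the mongo shell (or the Node.js driver), with variable names chosen just for illustration:
var docs = db.collection.aggregate([
  { $match: { 'stuff.shape': { $exists: true } } },
  { $group: { '_id': '$stuff.shape' } }
]).toArray();
// Pull the grouped _id values out client-side
var shapes = docs.map(function (doc) { return doc._id; });
// shapes is now something like ['square', 'circle', 'triangle']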
Related
I'm using MongoDB aggregation framework trying to transform each document:
{
"all": [
{
"type": "A",
"id": "1"
},
{
"type": "A",
"id": "1"
},
{
"type": "B",
"id": "2"
},
{
"type": "A",
"id": "3"
}
]
}
into this:
{
"unique_type_A": [ "3", "1" ]
}
(final result is a collection of n documents with unique_type_A field)
The calculation consists of returning, in an array, all the unique ids of the entities of type A.
I got stuck at the $group step; does anyone know how to do it?
To apply this logic to each document, you can use the following:
db.collection.aggregate([
{
$unwind: "$all"
},
{
$match: {
"all.type": "A"
}
},
{
$group: {
_id: {
"type": "$all.type",
"oldId": "$_id"
},
unique_type_A: {
$addToSet: "$all.id"
}
}
},
{
$project: {
_id: 0
}
}
])
We first $unwind so that we can filter and work with each member of the all array. Then we filter out the members that are not type: "A". The key difference is in the $group stage, which uses a compound _id: we include the _id carried through from the $unwind result, which refers back to the original document, so that the results are grouped per original document. Finally, we collect the ids from the all array with $addToSet to keep only unique values, and voilà!
And here is the result per document:
[
{
"unique_type_A": [
"3",
"1"
]
},
{
"unique_type_A": [
"4",
"11",
"5"
]
}
]
Check the code interactively on Mongoplayground
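If you prefer to avoid $unwind altogether, a possible alternative on MongoDB 3.2 and later (a sketch, not part of the original answer) is to do the filtering and deduplication with array expressions in a single $project:
db.collection.aggregate([
  { $project: {
      _id: 0,
      unique_type_A: {
        $setUnion: [
          { $map: {
              input: {
                $filter: {
                  input: "$all",
                  as: "item",
                  cond: { $eq: [ "$$item.type", "A" ] }   // keep only the type "A" members
                }
              },
              as: "a",
              in: "$$a.id"                                // project just the id
          } },
          []                                              // union with [] to deduplicate
        ]
      }
  } }
])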
I am trying to perform a $lookup on a collection with conditions; the problem I am facing is that I would like to match the text field of all the objects inside an array (the accounts array) in the other (plates) collection.
I have tried using $map as well as $in and $setIntersection, but nothing seems to work, and I am unable to find a way to match the text fields of each of the objects in the array.
My document structures are as follows:
plates collection:
{
"_id": "Batch 1",
"rego" : "1QX-WA-123",
"date" : 1516374000000.0
"accounts": [{
"text": "Acc1",
"date": 1516374000000
},{
"text": "Acc2",
"date": 1516474000000
}]
}
accounts collection:
{
"_id": "Acc1",
"date": 1516374000000
"createdAt" : 1513810712802.0
}
I am trying to achieve something like this:
{
$lookup: {
from: 'plates',
let: { 'accountId': '$_id' },
pipeline: [{
'$match': {
'$expr': { '$and': [
{ '$eq': [ '$account.text', '$$accountId' ] },
{ '$gte': [ '$date', ISODate ("2016-01-01T00:00:00.000Z").getTime() ] },
{ '$lte': [ '$date', ISODate ("2019-01-01T00:00:00.000Z").getTime() ] }
]}
}
}],
as: 'cusips'
}
},
The output I am trying to get is:
{
"_id": "Acc1",
"date": 1516374000000
"createdAt" : 1513810712802.0,
"plates": [{
"_id": "Batch 1",
"rego": "1QX-WA-123"
}]
}
Personally I would initiate the aggregation from the "plates" collection instead, where the initial $match conditions can filter the date range more cleanly. Getting your desired output is then a simple matter of "unwinding" the resulting "accounts" matches and "inverting" the content.
This is easy enough with the MongoDB 3.6 features, which you must have anyway in order to use $lookup with $expr. We don't even need that form of $lookup here:
db.plates.aggregate([
{ "$match": {
"date": {
"$gte": new Date("2016-01-01").getTime(),
"$lte": new Date("2019-01-01").getTime()
}
}},
{ "$lookup": {
"from": "accounts",
"localField": "accounts.text",
"foreignField": "_id",
"as": "accounts"
}},
{ "$unwind": "$accounts" },
{ "$group": {
"_id": "$accounts",
"plates": { "$push": { "_id": "$_id", "rego": "$rego" } }
}},
{ "$replaceRoot": {
"newRoot": {
"$mergeObjects": ["$_id", { "plates": "$plates" }]
}
}}
])
This of course is an "INNER JOIN", which would only return "accounts" entries where there are "plates" matching the conditions.
Doing the "join" from the "accounts" collection means you need additional handling to remove the non-matching entries from the "accounts" array within the "plates" collection:
db.accounts.aggregate([
{ "$lookup": {
"from": "plates",
"let": { "account": "$_id" },
"pipeline": [
{ "$match": {
"date": {
"$gte": new Date("2016-01-01").getTime(),
"$lte": new Date("2019-01-01").getTime()
},
"$expr": { "$in": [ "$$account", "$accounts.text" ] }
}},
{ "$project": { "_id": 1, "rego": 1 } }
],
"as": "plates"
}}
])
Note that the $match on the "date" properties should be expressed as a regular query condition instead of within the $expr block for optimal performance of the query.
The $in is used to compare the "array" of "$accounts.text" values to the local variable defined for the "_id" value of the "accounts" document being joined to. So the first argument to $in is the "single" value and the second is the "array" of just the "text" values which should be matching.
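As a quick illustration of those $in semantics (the literal values here are made up for the example):
{ "$in": [ "Acc1", [ "Acc1", "Acc2" ] ] }   // evaluates to true
{ "$in": [ "Acc3", [ "Acc1", "Acc2" ] ] }   // evaluates to false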
This is also notably a "LEFT JOIN" which returns all "accounts" regardless of whether there are any matching "plates" to the conditions, and therefore you can possibly end up with an empty "plates" array in the results returned. You can filter those out if you didn't want them, but where that was the case the former query form is really far more efficient than this one since the relation is defined and we only ever deal with "plates" which would meet the criteria.
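If you did want to drop the accounts that end up with an empty "plates" array, one way (a sketch) is to append a trailing $match stage to that pipeline:
{ "$match": { "plates.0": { "$exists": true } } }   // keep only accounts with at least one matching plate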
Either method returns the same response from the data provided in the question:
{
"_id" : "Acc1",
"date" : 1516374000000,
"createdAt" : 1513810712802,
"plates" : [
{
"_id" : "Batch 1",
"rego" : "1QX-WA-123"
}
]
}
Which direction you actually take that from really depends on whether the "LEFT" or "INNER" join form is what you really want and also where the most efficient query conditions can be made for the items you actually want to select.
Hmm, not sure how you tried $in, but it works for me:
{
$lookup: {
from: 'plates',
let: { 'accountId': '$_id' },
pipeline: [{
'$match': {
'$expr': { '$and': [
{ '$in': [ '$$accountId', '$accounts.text'] },
{ '$gte': [ '$date', ISODate ("2016-01-01T00:00:00.000Z").getTime() ] },
{ '$lte': [ '$date', ISODate ("2019-01-01T00:00:00.000Z").getTime() ] }
]}
},
}],
as: 'cusips'
}
}
I have an array of products where a product looks like this:
{
"invNumber":445,
"attributes": [
{
"id": "GR1",
"value": "4",
"description": "Re/Rek"
},
{
"id": "WEBAKKUNDE",
"value": "2",
"description": "NO"
},
{
"id": "WEBAKKUNDK",
"value": "1",
"description": "YES"
},
{
"id": "WEBAKMONTO",
"value": "2",
"description": "NO"
},
{
"id": "WEBPAKFTTH",
"value": "2",
"description": "NO"
}
]
}
What I want to do is get all products that have {"id": "WEBAKKUNDE", "value": "1"} or {"id": "WEBPAKFTTH", "value": "1"}, and from these products then only return all the distinct
{"id": "GR1"} objects.
I am trying to do something like this:
db.getCollection('products').aggregate([
{$unwind:'$attributes'},
{$match:{$or:[{$and:[{"attributes.id":"WEBAKKUNDE"},
{"attributes.value":"1"}]},{$and:[{"attributes.id":"WEBPAKFTTH"},
{"attributes.value":"1"}]}]}},
])
but I don't know how to get the distinct objects from the returned products.
You can use the aggregation query below.
$match to check whether the array matches the input criteria, followed by $filter with $arrayElemAt to project the GR1 element.
$group on the GR1 element to output the distinct values.
Note: you will need to add a GR1 criterion to the $match if you expect some matching documents to have attributes without a GR1 element (a sketch of this follows the query below).
db.products.aggregate([
{"$match":{
"attributes":{
"$elemMatch":{
"$or":[
{"id":"WEBAKKUNDE","value":"1"},
{"id":"WEBPAKFTTH","value":"1"}
]
}
}
}},
{"$group":{
"_id":{
"$arrayElemAt":[
{"$filter":{"input":"$attributes","cond":{"$eq":["$$this.id","GR1"]}}},
0
]
}
}}
])
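For example, the extra GR1 criterion mentioned in the note above could be handled by tightening the $match stage like this (a sketch; only the $match changes):
{"$match":{
  "attributes":{
    "$elemMatch":{
      "$or":[
        {"id":"WEBAKKUNDE","value":"1"},
        {"id":"WEBPAKFTTH","value":"1"}
      ]
    }
  },
  "attributes.id":"GR1"            // also require a GR1 element to be present
}}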
Try the following query:
db.test.aggregate([
{ $match:{ "attributes.id" : "WEBAKKUNDE", "attributes.value":"1" } },
{ $unwind: "$attributes" },
{ $match: { "attributes.id": "GR1" } },
])
But let's explain it:
$match: { "attributes.id": "WEBAKKUNDE", "attributes.value": "1" } will find all documents that match the id and value attributes on the documents.
$unwind: "$attributes" will give us one document per array item, so in your example we end up with 5 documents.
$match: { "attributes.id": "GR1" } will filter the remaining documents down to those where the id is GR1 (see the sketch below for collecting only the distinct objects).
More reading:
https://docs.mongodb.com/manual/reference/operator/aggregation/match
https://docs.mongodb.com/manual/reference/operator/aggregation/unwind
Assuming I have the following persons collection:
{
"_id": ObjectId("569d07a38e61973f6aded134"),
"name": "john",
"pets": [
{
"name": "spot",
"type": "dog",
"special": "spot eye"
},
{
"name": "bob",
"type": "cat",
}
]
},
{
"_id": ObjectId("569d07a38e61973f6aded135"),
"name": "susie",
"pets": [
{
"name": "fred",
"type": "cat",
}
]
}
How can I retrieve the persons whose pet(s) have a special field? I'm looking to have the returned pets array only contain the pets with a special field.
For example, the expected result from the collection above would be:
{
"_id": ObjectId("569d07a38e61973f6aded134"),
"name": "john",
"pets": [
{
"name": "spot",
"type": "dog",
"special": "spot eye"
}
]
}
I'm trying to implement this in hopefully one query with pymongo, although even just a working MongoDB or mongoose query would be lovely.
I've tried to start with:
db.persons.find({pets:{special:{$exists:true}}});
but that has returned 0 records, even though there should be some.
If the array holds embedded documents, you can query for specific fields in the embedded documents using dot notation.
Without dot notation you are querying array documents for a complete match.
Try the following query:
db.persons.find({'pets.special':{$exists:true}});
You can use the aggregation framework to get the desired result. Run the following aggregation pipeline:
db.persons.aggregate([
{
"$match": {
"pets.special": { "$exists": true }
}
},
{
"$project": {
"name": 1,
"pets": {
"$setDifference": [
{
"$map": {
"input": "$pets",
"as": "el",
"in": {
"$cond": [
{ "$gt": [ "$$el.special", null ] },
"$$el", false
]
}
}
},
[false]
]
}
}
}
])
Sample Output
{
"result" : [
{
"_id" : ObjectId("569d07a38e61973f6aded134"),
"name" : "john",
"pets" : [
{
"name" : "spot",
"type" : "dog",
"special" : "spot eye"
}
]
}
],
"ok" : 1
}
The operators that make a significant difference are $setDifference and $map. The $map operator in essence creates a new array field whose values are the result of evaluating a subexpression against each element of an array. The $setDifference operator then returns a set with the elements that appear in the first set but not in the second; i.e. it performs a relative complement of the second set with respect to the first. In this case it returns the final pets array containing only the elements that have the special property, based on the conditional operator $cond, which evaluates the expression returned by the comparison operator $gt.
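On MongoDB 3.2 and later, the same projection can be written more directly with $filter; a sketch (not part of the original answer):
db.persons.aggregate([
  { "$match": { "pets.special": { "$exists": true } } },
  { "$project": {
      "name": 1,
      "pets": {
        "$filter": {
          "input": "$pets",
          "as": "pet",
          "cond": { "$gt": [ "$$pet.special", null ] }   // keep only pets where "special" exists
        }
      }
  }}
])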
I have a collection in MongoDB with a complex structure and subdocuments.
The documents have a structure like this:
doc1 = {
'_id': '12345678',
'url': "http//myurl/...",
'nlp':{
"status": "OK",
"entities": {
"0": {
"type" : "Person",
"relevance": "0.877245",
"text" : "Neelie Kroes"
},
"1": {
"type": "Company",
"relevance": "0.36242",
"text": "ICANN"
},
"2": {
"type": "Company",
"relevance": "0.265175",
"text": "IANA"
}
}
}
}
doc2 = {
'_id': '987456321',
'url': "http//myurl2/...",
'nlp':{
"status": "OK",
"entities": {
"0": {
"type": "Company",
"relevance": "0.96",
"text": "ICANN"
},
"1": {
"type" : "Person",
"relevance": "0.36242",
"text" : "Neelie Kroes"
},
"2": {
"type": "Company",
"relevance": "0.265175",
"text": "IANA"
}
}
}
}
My task is to search for "type" AND "text" inside the subdocument, then sort by "relevance".
With the $elemMatch operator I'm able to perform the query:
db.resource.find({
'nlp.entities': {
'$elemMatch': {'text': 'Neelie Kroes', 'type': 'Person'}
}
});
Perfect, now I have to sort all the records with entities of type "Person" and value "Neelie Kroes" by relevance descending.
I tried with a normal sort, but, as the manual says about using sort() with $elemMatch, the result may not reflect the intended order because sort() is applied before the $elemMatch projection.
In fact, _id: 987456321 comes first (with a relevance of 0.96, but that relevance belongs to ICANN).
How can I do, to sort my documents by matched subdocument's relevance?
P.S.: I can't change the document structure.
As noted, I hope your documents actually do have an array, but if $elemMatch is working for you then they should.
At any rate, you cannot sort by an element in an array using find. But there is a case where you can do this using .aggregate():
db.collection.aggregate([
// Match the documents that you want, containing the array
{ "$match": {
"nlp.entities": {
"$elemMatch": {
"text": "Neelie Kroes",
"type": "Person"
}
}
}},
// Project to "store" the whole document for later, duplicating the array
{ "$project": {
"_id": {
"_id": "$_id",
"url": "$url",
"nlp": "$nlp"
},
"entities": "$nlp.entities"
}},
// Unwind the array to de-normalize
{ "$unwind": "$entities" },
// Match "only" the relevant entities
{ "$match": {
"entities.text": "Neelie Kroes",
"entities.type": "Person"
}},
// Sort on the relevance
{ "$sort": { "entities.relevance": -1 } },
// Restore the original document form
{ "$project": {
"_id": "$_id._id",
"url": "$_id.url",
"nlp": "$_id.nlp"
}}
])
So essentially, after the $match condition finds the documents that contain the relevant match, you then use $project to "store" the original document in the _id field and $unwind a "copy" of the "entities" array.
The next $match "filters" the array contents to only those ones that are relevant. Then you apply the $sort to the "matched" documents.
As the "original" document was stored under _id, you use $project to "restore" the structure that the document actually had to begin with.
That is how you "sort" on your matched element of an array.
Note that if you had multiple "matches" within an array for a parent document, then you would have to employ an additional $group stage to get the $max value for the "relevance" field in order to complete your sort.
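A sketch of what that might look like, replacing the single $sort in the pipeline above (note that "relevance" is stored as a string in the sample documents, so $max and $sort compare it lexicographically):
// ...after the second $match on the unwound "entities"...
{ "$group": {
  "_id": "$_id",                                        // the stored original document
  "maxRelevance": { "$max": "$entities.relevance" }     // highest relevance among the matched elements
}},
{ "$sort": { "maxRelevance": -1 } },
{ "$project": {
  "_id": "$_id._id",
  "url": "$_id.url",
  "nlp": "$_id.nlp"
}}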