Filtering a mongodb query result based on the position of a field in an array - mongodb

Apologies for the confusing title, I am not sure how to summarize this.
Suppose I have the following list of documents in a collection:
{ "name": "Lorem", "source": "A" }
{ "name": "Lorem", "source": "B" }
{ "name": "Ipsum", "source": "A" }
{ "name": "Ipsum", "source": "B" }
{ "name": "Ipsum", "source": "C" }
{ "name": "Foo", "source": "B" }
as well as an ordered list of accepted sources, where lower indexes signify higher priority:
sources = ["A", "B"]
My query should:
Take a list of available sources and a list of wanted names
Return a maximum of one document per name.
In case of multiple matches, the document with the most prioritized source should be chosen.
Example:
wanted_names = ['Lorem', 'Ipsum', 'Foo', 'NotThere']
Result:
{ "name": "Lorem", "source": "A" }
{ "name": "Ipsum", "source": "A" }
{ "name": "Foo", "source": "B" }
The results don't necessarily have to be ordered.
Is it possible to do this with a Mongo query alone? If so, could someone point me towards a resource detailing how to accomplish it?
My current solution doesn't support a list of names, and instead relies on a Python script to execute multiple queries:
db.collection.aggregate([
{$match: {
"name": "Lorem",
"source": {
$in: sources
}}},
{$addFields: {
"order": {
$indexOfArray: [sources, "$source"]
}}},
{$sort: {
"order": 1
}},
{$limit: 1}
]);
Note: _id fields are omitted in this question for the sake of brevity

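How about this: in the $group stage, the $min operator picks the lowest source value for each name. Strings are compared alphabetically, so this matches the priority list only when the priorities are in alphabetical order, as with ["A", "B"].
Note: if you prioritize as ['B', 'A'], use $max instead.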
db.collection.aggregate([
{
$match: {
"name": {
$in: [
"Lorem",
"Ipsum",
"Foo",
"NotThere"
]
},
"source": {
$in: [
"A",
"B"
]
}
}
},
{
$group: {
_id: "$name",
source: {
$min: "$source"
}
}
},
{
$project: {
_id: 0,
name: "$_id",
source: 1
}
}
])
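For a priority list that is not in alphabetical (or reverse alphabetical) order, one possible sketch reuses the $indexOfArray idea from the question and then keeps the best-ranked document per name. The helper name buildPriorityPipeline and the temporary fields order and doc are made up for this illustration; only the stage operators are MongoDB's own.

```javascript
// Hypothetical helper: builds an aggregation pipeline that keeps, for each
// wanted name, the document whose source comes first in `sources`.
function buildPriorityPipeline(sources, wantedNames) {
  return [
    // Keep only wanted names that come from an accepted source.
    { $match: { name: { $in: wantedNames }, source: { $in: sources } } },
    // Rank each document by the position of its source in the priority list.
    { $addFields: { order: { $indexOfArray: [sources, "$source"] } } },
    // Best-ranked documents first.
    { $sort: { order: 1 } },
    // Within each name, keep the first (best-ranked) document.
    { $group: { _id: "$name", doc: { $first: "$$ROOT" } } },
    // Restore the original document shape and drop the helper field.
    { $replaceRoot: { newRoot: "$doc" } },
    { $project: { order: 0 } }
  ];
}

const pipeline = buildPriorityPipeline(
  ["B", "A"],
  ["Lorem", "Ipsum", "Foo", "NotThere"]
);
// In the mongo shell this would be run as: db.collection.aggregate(pipeline)
```

Because the ranking comes from the position in sources rather than from string comparison, the same pipeline should work for any ordering of the priority list.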

Related

MongoDb Aggregate Total Count Before Grouping

I have an aggregation pipeline that groups objects and holds count for some specific field for grouped objects. You can reproduce the problem here: https://mongoplayground.net/p/2DGaiQDYDBP .
The schema is like this;
[
{
"_id": {
"$oid": "63ce93ffb6e06322db59fdc0"
},
"fruit": "apple",
"source": "tree",
"is_fruit_important": "true"
},
{
"_id": {
"$oid": "63ce93ffb6e06322db59fdc1"
},
"fruit": "orange",
"source": "tree",
"is_fruit_important": "false"
},
]
The current query groups fruits by source and keeps a count of the important fruits in every group. After running the aggregation I get something like this:
[
{
"count": {
"number_of_important_fruits": 1
},
"objects": [
{
"fruit": "apple",
"id": "63ce93ffb6e06322db59fdc0",
"is_fruit_important": "true",
"source": "tree"
},
{
"fruit": "orange",
"id": "63ce93ffb6e06322db59fdc1",
"is_fruit_important": "false",
"source": "tree"
}
],
"source": {
"source-of": "tree"
}
}
]
Is there a way to add the total number of fruits in the database to the response object? For example, like this:
{
"total-count": 2,
"result": [
{
"count": {
"number_of_important_fruits": 1
},
"objects": [
{
"fruit": "apple",
"id": "63ce93ffb6e06322db59fdc0",
"is_fruit_important": "true",
"source": "tree"
},
{
"fruit": "orange",
"id": "63ce93ffb6e06322db59fdc1",
"is_fruit_important": "false",
"source": "tree"
}
],
"source": {
"source-of": "tree"
}
}
]
}
The two results could be computed in separate aggregation pipelines, but I would prefer not to do that. Any help would be highly appreciated.
Add one additional group stage just before the final $project, using $sum with $size for a total count, or add up the important counts for a total important count.
{$group: {
_id: null,
result: {$push: "$$ROOT"},
"count_total": {$sum: {$size: "$objects"}},
"count_important": {$sum: "$count.number_of_important_fruits"}
}},
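You can simply add a $facet stage to push all your results into result, then apply $size to result to get total-count. Note that $size counts the documents that reach $facet, so after the grouping stages this is the number of groups; to count individual fruits, sum the sizes of the objects arrays as in the previous answer.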
db.collection.aggregate([
...,
{
"$facet": {
"result": [],
"total-important-count": [
{
$group: {
_id: null,
cnt: {
$sum: "$count.number_of_important_fruits"
}
}
}
]
}
},
{
"$addFields": {
"total-count": {
$size: "$result"
},
"total-important-count": {
$first: "$total-important-count.cnt"
}
}
}
])

How can I get random documents with the field starting letters (A to Z) in MongoDB?

I'm trying to build an aggregation with MongoDB. My goal is to get random names starting with the letters A to Z. Each starting letter must appear only once in the response, but I can't figure out how to do that. I used a $match stage with a regex and a $sample stage to get random documents.
Here is my collection;
[
{
"name": "ahmet"
},
{
"name": "barış"
},
{
"name": "ceyhun"
},
{
"name": "aslan"
},
{
"name": "deniz"
},
....
]
Here is my aggregate function;
db.collection.aggregate([
{
$match: {
name: {
$regex: "^a|^b|^c" // must be A to Z
}
}
},
{
"$sample": {
"size": 3 // Must be 26
}
}
])
I expect the response to look like this:
[
{
"name": "ahmet"
},
{
"name": "barış"
},
{
"name": "ceyhun"
},
.... // other words starting with d, e , f but only one word for each letter
]
But I'm getting;
[
{
"_id": ObjectId("5a934e000102030405000001"),
"name": "barış"
},
{
"_id": ObjectId("5a934e000102030405000003"),
"name": "aslan"
},
{
"_id": ObjectId("5a934e000102030405000000"),
"name": "ahmet"
},
// name => aslan, name => ahmet (Two words starting with same letter)
]
I'm new to MongoDB, and I would appreciate it if anyone could show me where I'm going wrong.
You can do something like this:
Edit with guard improvement suggestions* (using $substrCP, $ifNull):
db.collection.aggregate([
{
$group: {
_id: {$substrCP: ["$name", 0, 1]},
name: {$push: "$name"}
}
},
{
$project: {_id: 0,
name: {
$arrayElemAt: [
"$name",
{$toInt: {$multiply: [{$rand: {}}, {$size: {$ifNull: ["$name",[]] }}]}
}
]
}
}
}
])
As you can see on this playground example.
The $group stage collects a list of names for each first letter; $arrayElemAt combined with $rand (available from MongoDB 4.4.2) then keeps one random item from each list.
*Thanks to #Mbay and #Paul for the improvement suggestions
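The same two steps can be tried outside the database. The following is only an in-memory sketch in plain JavaScript, not a MongoDB call; the function name pickOnePerLetter and its optional random parameter are assumptions made for this illustration.

```javascript
// In-memory illustration of the pipeline above (not a MongoDB call).
// `pickOnePerLetter` is a hypothetical name; `random` defaults to Math.random.
function pickOnePerLetter(names, random = Math.random) {
  // $group step: bucket the names by their first code point.
  const buckets = new Map();
  for (const name of names) {
    const first = Array.from(name)[0]; // code-point aware, like $substrCP
    if (first === undefined) continue; // skip empty strings in this sketch
    if (!buckets.has(first)) buckets.set(first, []);
    buckets.get(first).push(name);
  }
  // $project step: floor(random * size) is always a valid index into a bucket.
  const picked = [];
  for (const bucket of buckets.values()) {
    picked.push(bucket[Math.floor(random() * bucket.length)]);
  }
  return picked;
}

const sample = pickOnePerLetter(["ahmet", "barış", "ceyhun", "aslan", "deniz"]);
```

Each call returns one name per starting letter; which of "ahmet" and "aslan" appears varies from call to call.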

Delete objects that met a condition inside an array in mongodb

My collection has array "name" with objects inside. I need to remove only those objects inside array where "name.x" is blank.
"name": [
{
"name.x": [
{
"_id": "607e7fcca57aa56e2a06b57b",
"name": "abc",
"type": "123"
}
],
"_id": {
"$oid": "62232cd70ce38c5007de31e6"
},
"qty": "1.0",
"Unit": "pound,lbs"
},
{
"name.x": [
{
"_id": "607e7fcca57aa56e2a06b430",
"name": "xyz",
"type": "123"
}
],
"_id": {
"$oid": "62232cd70ce38c5007de31e7"
},
"qty": "1.0",
"Unit": "pound,lbs"
},{
"name.x": []
,
"_id": {
"$oid": "62232cd70ce38c5007de31e7"
},
"qty": "1.0",
"Unit": "pound,lbs"
}
]
I tried to collect all the ids where name.x is blank using Python and then used $pull to remove the objects with those ids, but the complete array got deleted. How can I remove only the objects that meet the condition?
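I think an update with an aggregation pipeline meets your requirement, especially because the field name contains a dot (.), which is why $getField (available from MongoDB 5.0) is used to read it.
$set: update the name array field, using $filter to keep only the elements whose name.x field is not an empty array.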
db.collection.update({},
[
{
$set: {
name: {
$filter: {
input: "$name",
cond: {
$ne: [
{
$getField: {
field: "name.x",
input: "$$this"
}
},
[]
]
}
}
}
}
}
],
{
multi: true
})
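To see why the dotted key needs special handling, the same filter can be sketched on a plain JavaScript object. This is only an in-memory illustration, assuming the key is literally "name.x"; dropBlankEntries is a hypothetical name, not a MongoDB API.

```javascript
// In-memory illustration only: the key is literally "name.x", so it has to be
// read with bracket notation, much as $getField reads it inside the pipeline.
function dropBlankEntries(doc) {
  return {
    ...doc,
    name: doc.name.filter((entry) => {
      const x = entry["name.x"];
      // Keep the entry unless "name.x" is an empty array.
      return !(Array.isArray(x) && x.length === 0);
    })
  };
}

const cleaned = dropBlankEntries({
  name: [
    { "name.x": [{ name: "abc", type: "123" }], qty: "1.0" },
    { "name.x": [], qty: "1.0" }
  ]
});
```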

Mongodb querying for all min values that match criteria and indexing

Suppose I had the following:
[
{
"team": "A",
"age": 1,
"name": "Abe"
},
{
"team": "A",
"age": 5,
"name": "Apple"
},
{
"team": "B",
"age": 1,
"name": "Ben"
},
{
"team": "B",
"age": 2,
"name": "Bon"
},
{
"team": "C",
"age": 5,
"name": "Cherry"
}
]
I have the following query:
They must be in either TeamA or TeamB.
After filtering, I want only the youngest.
So in this example, it would return only Abe and Ben.
Preferably, I want it done in a single query. My guess is I have to use an aggregation pipeline, something like
db.People.aggregate(
[
{ $match: { $or: [{ team: 'A' }, { team: 'B' }] } },
// some more stuff
]
);
Question1: I'm not sure what the next step would be. Could someone point me in the right direction?
Question2:
There may be a million records, and I was thinking of adding two indexes:
Index on Team. I'm thinking this will allow it to filter Teams of interest.
Index on Age so that it can grab only the mins.
Would these indexes help or what kind of indexes should I be looking into?
Edit: I'm getting closer, but I'm only interested in the records themselves.
db.collection.aggregate([
{
$match: {
$or: [
{
team: "A"
},
{
team: "B"
}
]
}
},
{
$group: {
_id: "$age",
items: {
$push: "$$ROOT"
}
}
},
{
$sort: {
_id: 1
}
},
{
$limit: 1,
}
])
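One possible continuation of the pipeline above, assuming "youngest" means the minimum age across the selected teams: unwind the items array of the remaining group and promote each element back to a top-level document. The name youngestPipeline is hypothetical, and this is a sketch rather than a tuned query.

```javascript
// Hypothetical continuation of the pipeline from the question.
const youngestPipeline = [
  // Same filter as the $or in the question, written with $in.
  { $match: { team: { $in: ["A", "B"] } } },
  // Collect the full records for each age.
  { $group: { _id: "$age", items: { $push: "$$ROOT" } } },
  // Lowest age first, then keep only that group.
  { $sort: { _id: 1 } },
  { $limit: 1 },
  // Emit one output document per record in the minimum-age group.
  { $unwind: "$items" },
  // Return the record itself instead of the { _id, items } wrapper.
  { $replaceRoot: { newRoot: "$items" } }
];
// In the mongo shell this would be run as: db.collection.aggregate(youngestPipeline)
```

With the sample data, the minimum age among teams A and B is 1, so the surviving group holds the Abe and Ben records.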

How to get sum of child entries for hierarchical documents?

I have a document of the following form:
{
"name": "root1",
"children": [{
"name": "A",
"children": [{
"name": "A1",
"items": 20
}, {
"name": "A2",
"items": 19
}],
"items": 8
}, {
"name": "B",
"items": 12
}],
"items": 1
}
That is, each level has a "name" field, an "items" field, and optionally a children field. I would like to run a query which returns the total number of items for each root. In this example, it should return (since 20+19+8+12+1=60)
{ "_id" : "root1", "items" : 60 }
However, each document can have arbitrarily many levels. This example has two to three levels below the root, but other documents may have more. That is, I cannot do something like
db.myCollection.aggregate( { $unwind : "$children" },
{ $group : { _id : "$name", items: { $sum : "$items" } } } )
What sort of query will work?
There really is no way to descend arrays to arbitrary depths using the aggregation framework. For this sort of structure you need to use mapReduce, where you can do this programmatically:
db.collection.mapReduce(
function () {
var items = 0;
var action = function(current) {
items += current.items;
if ( current.hasOwnProperty("children") ) {
current.children.forEach(function(child) {
action( child );
});
}
};
action( this );
emit( this.name, items );
},
function (key, values) { return Array.sum(values); }, // reduce: add up totals emitted under the same name
{ "out": { "inline": 1 } }
)
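The recursive walk inside the map function can be tried on its own in plain JavaScript. This is a minimal sketch using the document from the question; the name sumItems is made up here, and treating a missing items field as 0 is an assumption.

```javascript
// Plain JavaScript version of the recursive walk (no MongoDB needed).
function sumItems(node) {
  let total = node.items || 0; // assumption: a missing "items" counts as 0
  for (const child of node.children || []) {
    total += sumItems(child); // descend one level, however deep the tree is
  }
  return total;
}

const root1 = {
  name: "root1",
  items: 1,
  children: [
    {
      name: "A",
      items: 8,
      children: [
        { name: "A1", items: 20 },
        { name: "A2", items: 19 }
      ]
    },
    { name: "B", items: 12 }
  ]
};

console.log(sumItems(root1)); // prints 60
```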
If you do not want mapReduce then consider another structure for your data and do things differently:
{ "name": "root1", "items": 1, "path": [], "root": null },
{ "name": "A", "items": 8, "path": ["root1"], "root": "root1" },
{ "name": "A1", "items": 20, "path": ["root1", "A"], "root": "root1" },
{ "name": "A2", "items": 19, "path": ["root1", "A"], "root": "root1" },
{ "name": "B", "items": 12, "path": ["root1"], "root": "root1" }
Then you just have a simple aggregate:
db.collection.aggregate([
{ "$group": {
"_id": {
"$cond": [
"$root",
"$root",
"$name"
]
},
"items": { "$sum": "$items" }
}}
])
So if you take a different approach to mapping a hierarchy then doing things such as aggregating totals for paths is much easier without the recursive inspection that would otherwise be required.
The approach that you need depends on your actual usage requirements.
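If the flat layout is the one you choose, the nested documents have to be converted once. A possible sketch of that conversion in plain JavaScript follows; flatten is a hypothetical helper, and it assumes every node has a name and that children, when present, is an array.

```javascript
// Hypothetical helper: turns one nested document into flat rows with
// "path" (ancestor names) and "root" (top-level ancestor, null for the root).
function flatten(node, path = []) {
  const rows = [
    {
      name: node.name,
      items: node.items,
      path: path,
      root: path.length > 0 ? path[0] : null
    }
  ];
  for (const child of node.children || []) {
    // The child's path is the parent's path plus the parent's own name.
    rows.push(...flatten(child, [...path, node.name]));
  }
  return rows;
}

const nested = {
  name: "root1",
  items: 1,
  children: [
    {
      name: "A",
      items: 8,
      children: [
        { name: "A1", items: 20 },
        { name: "A2", items: 19 }
      ]
    },
    { name: "B", items: 12 }
  ]
};

const rows = flatten(nested);
// Each row could then be inserted as its own document, e.g. with insertMany.
```

The rows come out in the same order as the flat listing in the answer, so the $group stage shown there can be applied to them unchanged.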