Complex count query in MongoDB - mongodb

I have JSON documents with the following structure:
{"data": {
"key1": "value1",
"key2": "value2",
"manualTests": [{"name": "component1", "passed": false, "x": 12},
{"name": "component2", "passed": true},
{"name": "component3", "passed": false, "responseTime": 5}],
"automaticTests": [{"name": "component4", "passed": false},
{"name": "component5", "passed": true, "conversion": "Z"},
{"name": "component6", "passed": false}],
"semiautomaticTests": [{"name": "component7", "passed": true},
{"name": "component8", "passed": true},
{"name": "component9", "passed": true}]
}}
My MongoDB collection contains a very large number of these documents, and I need a list of all the components that have not passed their tests. The desired output would be:
{
"component1": 150,
"component2": 35,
"component3": 17,
"component4": 5,
"component5": 3,
"component6": 1
}
The numbers are random; for each component they show how many times it failed the tests. How do I calculate this in MongoDB? The format is not strict; the main requirement is that the output contains the name of each failed component and its failure count across the whole sample.

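You can try the aggregation query below.
The $match stage keeps only documents in which at least one test has failed.
$project uses $filter to extract the failed components from each test array, and $concatArrays merges them into a single array.
$unwind flattens that array, and $group counts the occurrences of each failed component.
The pipeline reads the three arrays from the top level of the document; if they are nested under data as in the sample, prefix the paths accordingly (for example "data.manualTests.passed" and "$data.manualTests").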
db.colname.aggregate([
{"$match":{
"$or":[
{"manualTests.passed":false},
{"automaticTests.passed":false},
{"semiautomaticTests.passed":false}
]
}},
{"$project":{
"tests":{
"$concatArrays":[
{"$filter":{"input":"$manualTests","as":"mt","cond":{"$eq":["$$mt.passed",false]}}},
{"$filter":{"input":"$automaticTests","as":"at","cond":{"$eq":["$$at.passed",false]}}},
{"$filter":{"input":"$semiautomaticTests","as":"st","cond":{"$eq":["$$st.passed",false]}}}
]
}
}},
{"$unwind":"$tests"},
{"$group":{"_id":"$tests.name","count":{"$sum":1}}}
])
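To see what the pipeline computes, here is a minimal in-memory sketch in plain JavaScript (no MongoDB involved). The helper names countFailures and rowsToObject and the two sample documents are made up for illustration, and missing arrays are treated as empty, which is an assumption of this sketch. The second helper shows one way to reshape rows of the form { _id, count } into the single name-to-count object the question asks for.

```javascript
// Hypothetical sample documents, shaped like the ones the pipeline reads.
const docs = [
  {
    manualTests: [{ name: "component1", passed: false }, { name: "component2", passed: true }],
    automaticTests: [{ name: "component4", passed: false }],
    semiautomaticTests: [{ name: "component7", passed: true }],
  },
  {
    manualTests: [{ name: "component1", passed: false }],
    automaticTests: [{ name: "component4", passed: true }],
    semiautomaticTests: [],
  },
];

// Roughly mirrors $filter + $concatArrays + $unwind + $group.
// Assumption: a missing array is treated as empty here.
function countFailures(documents) {
  const counts = {};
  for (const doc of documents) {
    const failed = [
      ...(doc.manualTests || []),
      ...(doc.automaticTests || []),
      ...(doc.semiautomaticTests || []),
    ].filter((test) => test.passed === false);
    for (const test of failed) {
      counts[test.name] = (counts[test.name] || 0) + 1;
    }
  }
  return counts;
}

// Reshapes aggregation rows such as { _id: "component1", count: 2 }
// into a single { component1: 2 } object.
function rowsToObject(rows) {
  return Object.fromEntries(rows.map((row) => [row._id, row.count]));
}

console.log(countFailures(docs)); // { component1: 2, component4: 1 }
```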

Related

Get items of array by index in MongoDB

So I have a data structure in a Mongo collection (v. 4.0.18) that looks something like this…
{
"_id": ObjectId("242kl4j2lk23423"),
"name": "Doug",
"kids": [
{
"name": "Alice",
"age": 15,
},
{
"name": "James",
"age": 13,
},
{
"name": "Michael",
"age": 10,
},
{
"name": "Sharon",
"age": 8,
}
]
}
In Mongo, how would I get back a projection of this object with only the first two kids? I want the output to look like this:
{
"_id": ObjectId("242kl4j2lk23423"),
"name": "Doug",
"kids": [
{
"name": "Alice",
"age": 15,
},
{
"name": "James",
"age": 13,
}
]
}
It seems like I should easily be able to get them by index, but I'm not seeing anything in the docs about how to do that. The real-world problem I'm trying to solve has nothing to do with kids, and the array could be quite lengthy. I'm trying to break it up and process it in batches without having to load the whole thing into memory in my application.
EDIT (non-sequential indexes):
I realize that since I asked about items 1 and 2, $slice would suffice. However, what if I wanted items 1 and 3? Is there a way to request specific array indexes?
Any ideas or pointers for how to accomplish that?
Thanks!
You are looking for the $slice projection operator, provided the elements you want are adjacent.
https://docs.mongodb.com/manual/reference/operator/projection/slice/
This returns the first two kids:
client.db.collection.find({"name":"Doug"}, { "kids": { "$slice": 2 } })
returns
{'_id': ObjectId('5f85f682a45e15af3a907f51'), 'name': 'Doug', 'kids': [{'name': 'Alice', 'age': 15}, {'name': 'James', 'age': 13}]}
This skips the first kid and returns the next two (the second and third):
client.db.collection.find({"name":"Doug"}, { "kids": { "$slice": [1, 2] } })
returns
{'_id': ObjectId('5f85f682a45e15af3a907f51'), 'name': 'Doug', 'kids': [{'name': 'James', 'age': 13}, {'name': 'Michael', 'age': 10}]}
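Since the goal in the question is to process a long array in batches, the [skip, limit] form can be stepped through one window at a time. A minimal sketch, assuming the array length is already known or fetched separately; sliceWindows is a hypothetical helper that only builds the projection documents and does not talk to MongoDB.

```javascript
// Builds one { <field>: { $slice: [skip, limit] } } projection per batch.
// `field`, `total` and `batchSize` are supplied by the caller.
function sliceWindows(field, total, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const projections = [];
  for (let skip = 0; skip < total; skip += batchSize) {
    projections.push({ [field]: { $slice: [skip, batchSize] } });
  }
  return projections;
}

// Four kids, two per batch.
console.log(JSON.stringify(sliceWindows("kids", 4, 2)));
// [{"kids":{"$slice":[0,2]}},{"kids":{"$slice":[2,2]}}]
```

Each generated document would be passed as the projection argument of a separate find call.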
Edit:
Arbitrary selections such as items 1 and 3 probably need to go through an aggregation pipeline rather than a simple query. Performance shouldn't differ much, assuming you have an index on the $match field.
The steps are outlined below.
It is worth getting at least acquainted with the pipeline operations in the docs:
https://docs.mongodb.com/manual/reference/operator/aggregation/
Your pipeline should:
$match on your desired query
$set a new field kid_selection to element 1 (the second element) and element 3 (the fourth element), since counting starts at 0. The $set stage requires MongoDB 4.2 or later; on older servers such as 4.0, use the equivalent $addFields stage. Notice the $ prefix on "kids" in the kid_selection setter: when referencing a field of the document you're working on, you need to prefix its name with $
$project the whole document, minus the original kids field that we've selected from
client.db.collection.aggregate([
{"$match":{"name":"Doug"}},
{"$set": {"kid_selection": [
{ "$arrayElemAt": [ "$kids", 1 ] },
{ "$arrayElemAt": [ "$kids", 3 ] }
]}},
{ "$project": { "kids": 0 } }
])
returns
{
'_id': ObjectId('5f86038635649a988cdd2ade'),
'name': 'Doug',
'kid_selection': [
{'name': 'James', 'age': 13},
{'name': 'Sharon', 'age': 8}
]
}
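For an arbitrary list of positions, the array of $arrayElemAt expressions can be generated rather than typed by hand. A sketch under stated assumptions: pickIndexesStage is a hypothetical helper name, it only builds the stage document, and it defaults to $addFields because $set exists only on MongoDB 4.2 and later.

```javascript
// Builds a stage that copies the elements at `indexes` of `sourceField`
// into `targetField`. The stage name is a parameter so callers on
// MongoDB 4.2+ can pass "$set" instead of the default "$addFields".
function pickIndexesStage(sourceField, targetField, indexes, stageName = "$addFields") {
  return {
    [stageName]: {
      [targetField]: indexes.map((i) => ({ $arrayElemAt: ["$" + sourceField, i] })),
    },
  };
}

const pipeline = [
  { $match: { name: "Doug" } },
  pickIndexesStage("kids", "kid_selection", [1, 3]),
  { $project: { kids: 0 } },
];

console.log(JSON.stringify(pipeline[1]));
// {"$addFields":{"kid_selection":[{"$arrayElemAt":["$kids",1]},{"$arrayElemAt":["$kids",3]}]}}
```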

Alternate to $strLenCP field in mongoDB 3.0[prior versions]

I'm currently using MongoDB 3.0. I need to find the length of each string in the result of an aggregate command.
For example:
db.getCollection('temp').find()
[
{"key": "value1"},
{"key": "value2" },
{"key": "valuee2"}
]
This query gives the length of the key field:
db.getCollection('temp').aggregate([{
$project: {
"strLength": {"$strLenCP": "$key"}
}
}])
Like:
[
{"strLength": 6},
{"strLength": 6},
{"strLength": 7}
]
But $strLenCP is not supported in versions prior to 3.4. Are there any alternatives?
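For reference, $strLenCP counts UTF-8 code points rather than bytes. Where the operator is unavailable, one possible fallback (not from the original thread) is to compute the same count in application code after fetching the documents. A minimal sketch in plain JavaScript; codePointLength and the rows array are illustrative names, and the rows simply copy the sample documents above.

```javascript
// Counts Unicode code points, which is what $strLenCP reports.
// String.prototype.length would count UTF-16 code units instead.
function codePointLength(value) {
  return Array.from(value).length;
}

// Stand-in for documents already fetched from the collection.
const rows = [{ key: "value1" }, { key: "value2" }, { key: "valuee2" }];
const lengths = rows.map((row) => ({ strLength: codePointLength(row.key) }));

console.log(lengths); // [ { strLength: 6 }, { strLength: 6 }, { strLength: 7 } ]
```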

Is it possible to do a subquery to return an array for the $nin operator in MongoDB?

I have a data set that looks something like:
{"key": "abc", "val": 1, "status": "np"}
{"key": "abc", "val": 2, "status": "p"}
{"key": "def", "val": 3, "status": "np"}
{"key": "ghi", "val": 4, "status": "np"}
{"key": "ghi", "val": 5, "status": "p"}
I want a query that returns documents that have status="np", but only where no other document with the same key has a status of "p". So the only document returned from the data set above would be key="def": "abc" has a document with status "np", but it also has a document with status "p", and the same is true for key="ghi". I came up with something close, but I don't think the $nin operator supports a distinct query.
db.test2.find({$and: [{"status":"np"}, {"key": {$nin:[<distinct value query>]}}]})
If I were to hardcode the value in the $nin array, it would work:
db.test2.find({$and: [{"status":"np"}, {"key": {$nin:['abc', 'ghi']}}]})
I just need to be able to write a find inside the square brackets. I could do something like:
var res=[];
res = db.test2.distinct("key", {"status": "p"});
db.test2.find({$and: [{"status":"np"}, {"key": {$nin:res}}]});
But the problem with this is that in the time between the two queries, another process may update the "status" of a document and then I'd have inconsistent data.
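Try this aggregation, which collects the statuses for each key and keeps only the keys that have "np" but no "p":
db.so.aggregate([
// Collect every status seen for each key
{$group: {'_id': '$key', 'st': {$push: '$status'}}},
// Flag whether the collected statuses contain 'np' and 'p'
{$project: {st: 1, hasNp: {$in: ['np', '$st']}, hasP: {$in: ['p', '$st']}}},
// Keep the keys that have 'np' but no 'p'
{$match: {hasNp: true, hasP: false}}
]);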
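The grouping logic can be checked against the sample data with a small in-memory sketch in plain JavaScript (no MongoDB involved). The function name keysWithNpButNoP is made up for illustration.

```javascript
const docs = [
  { key: "abc", val: 1, status: "np" },
  { key: "abc", val: 2, status: "p" },
  { key: "def", val: 3, status: "np" },
  { key: "ghi", val: 4, status: "np" },
  { key: "ghi", val: 5, status: "p" },
];

function keysWithNpButNoP(documents) {
  // Same idea as $group + $push: collect every status seen for each key.
  const statusesByKey = new Map();
  for (const { key, status } of documents) {
    if (!statusesByKey.has(key)) statusesByKey.set(key, []);
    statusesByKey.get(key).push(status);
  }
  // Same idea as the hasNp / hasP flags followed by $match.
  return [...statusesByKey]
    .filter(([, statuses]) => statuses.includes("np") && !statuses.includes("p"))
    .map(([key]) => key);
}

console.log(keysWithNpButNoP(docs)); // [ 'def' ]
```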

Query on the last element of an array in MongoDB when the array size is stored in a variable

I have a dataset in MongoDB, and this is an example of one of my documents:
{ "conversionDate": "2016-08-01",
"timeLagInDaysHistogram": 0,
"pathLengthInInteractionsHistogram": 4,
"campaignPath": [
{"campaignName": "name1", "source": "sr1", "medium": "md1", "click": "0"},
{"campaignName": "name2", "source": "sr1", "medium": "md1", "click": "0"},
{"campaignName": "name1", "source": "sr2", "medium": "md2", "click": "1"},
{"campaignName": "name3", "source": "sr1", "medium": "md3", "click": "1"}
],
"totalTransactions": 1,
"totalValue": 37.0,
"avgCartValue": 37.0
}
The length of campaignPath is not constant, so each document can have a different number of elements.
I want to find the documents whose last campaignPath element matches "source = sr1".
I know I can't do a query with something like
db.paths.find(
{
'campaignPath.-1.source': "sr1"
}
)
But since I have "pathLengthInInteractionsHistogram" stored, which is equal to the length of campaignPath, can't I do something like:
db.paths.find(
{
'campaignPath.$pathLengthInInteractionsHistogram.source': "sr1"
}
)
Starting with MongoDB 3.2, you can do this with aggregate, which provides the $arrayElemAt operator; it accepts an index of -1 to access the last element.
db.paths.aggregate([
// Project the original doc along with the last campaignPath element
{$project: {
doc: '$$ROOT',
lastCampaign: {$arrayElemAt: ['$campaignPath', -1]}
}},
// Filter on the last campaign's source
{$match: {'lastCampaign.source': 'sr1'}},
// Remove the added lastCampaign field
{$project: {doc: 1}}
])
In earlier releases, you're stuck using $where. This works but performs poorly, because it evaluates JavaScript for every document:
db.paths.find({
$where: 'this.campaignPath[this.pathLengthInInteractionsHistogram-1].source === "sr1"'
})
which you could also do without using pathLengthInInteractionsHistogram:
db.paths.find({$where: 'this.campaignPath[this.campaignPath.length-1].source === "sr1"'})
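The $where strings above assume every document has a non-empty campaignPath; on an empty array the expression would try to read .source of undefined. As a sketch, here is the same check as a plain JavaScript function with a guard. The name lastSourceIs and the sample documents are made up for illustration.

```javascript
// Returns true when the last campaignPath element has the given source.
// The guard covers documents whose array is missing or empty.
function lastSourceIs(doc, source) {
  const path = doc.campaignPath;
  if (!Array.isArray(path) || path.length === 0) return false;
  return path[path.length - 1].source === source;
}

const sample = { campaignPath: [{ source: "sr2" }, { source: "sr1" }] };

console.log(lastSourceIs(sample, "sr1")); // true
console.log(lastSourceIs({ campaignPath: [] }, "sr1")); // false
```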

Mongodb query for arrays of combination two fields

Say I have a document in the database:
{"name": "Jason", "score": 20, "color": "blue"}
I also have an array of data containing objects with name and score. Is there a way to query for the combination of name and score via $in? For example, if I had a list that looked like
var data = [
{"name": "Bob", "score": 12},
{"name": "Jason", "score": 20},
{"name": "Tammy", "score": 19}
];
And I wanted to query the collection to see if any combination of name and score found within data existed within said collection, how could I do that?
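Close. $in is essentially shorthand for an $or of equality checks on a single field, so to match on a combination of fields, use $or directly. Each element of data is already a valid query document, so you can pass the array as is: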
db.collection.find({ "$or": data })
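If the objects in data might carry extra properties, or the array might be empty, the query can be built a little more defensively. A minimal sketch; buildPairQuery is a hypothetical helper that only constructs the filter document.

```javascript
const data = [
  { name: "Bob", score: 12 },
  { name: "Jason", score: 20 },
  { name: "Tammy", score: 19 },
];

// Keeps only name and score from each entry, so extra properties on the
// input objects do not become extra query conditions.
function buildPairQuery(pairs) {
  if (pairs.length === 0) {
    // $or rejects an empty array, so the caller has to handle this case.
    throw new Error("need at least one name/score pair");
  }
  return { $or: pairs.map(({ name, score }) => ({ name, score })) };
}

console.log(JSON.stringify(buildPairQuery(data)));
// {"$or":[{"name":"Bob","score":12},{"name":"Jason","score":20},{"name":"Tammy","score":19}]}
```

The resulting object would be passed as the filter argument of find.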