I'm currently using MongoDB 3.0. I need to find the length of each string in the result of an aggregate command.
For example:
db.getCollection('temp').find()
[
{"key": "value1"},
{"key": "value2" },
{"key": "valuee2"}
]
This query gives the length of the key field:
db.getCollection('temp').aggregate([{
$project: {
"strLength": {"$strLenCP": "$key"}
}
}])
Like:
[
{"strLength": 6},
{"strLength": 6},
{"strLength": 7}
]
But the $strLenCP operator is not supported in versions prior to 3.4. Are there any alternatives?
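As far as I know, the aggregation framework has no string-length operator before 3.4, so one possible workaround is to compute the length client-side while iterating the cursor. The sketch below is plain JavaScript that runs without a server. `strLenCP` is a hypothetical helper name, and the `docs` array is an assumed stand-in for what `db.getCollection('temp').find()` returns.

```javascript
// Hypothetical helper: counts Unicode code points, like $strLenCP does.
// (String.prototype.length would count UTF-16 code units instead.)
function strLenCP(s) {
  return Array.from(s).length;
}

// Assumed stand-in for the documents returned by db.getCollection('temp').find()
const docs = [{ key: "value1" }, { key: "value2" }, { key: "valuee2" }];

// Same shape as the $project stage in the question
const lengths = docs.map(function (doc) {
  return { strLength: strLenCP(doc.key) };
});

console.log(lengths);
```

For plain ASCII values `doc.key.length` gives the same numbers; the helper only matters for characters outside the Basic Multilingual Plane.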
I am fairly new to MongoDB and am trying to find a clean way of evaluating a so-called multiple-choice question.
The data looks like this:
db.test.insertMany([
{
Q2_1: 1,
Q2_2: -77,
Q2_3: 1
},
{
Q2_1: -77,
Q2_2: -77,
Q2_3: 1
},
{
Q2_1: 1,
Q2_2: 0,
Q2_3: 0
},
{
Q2_1: 0,
Q2_2: 1,
Q2_3: 0
}
])
In this example we have 4 probands who gave answers to 3 items.
Every field can contain one of three values: -77, 0 or 1.
-77: the proband did not see the item, so it counts for neither 'base' nor 'abs'.
0: the proband saw the item but did not choose it (counts for 'base' BUT NOT for 'abs').
1: the proband saw the item and chose it (counts for 'base' AND for 'abs').
Now I want a result for every item, where item 1 is Q2_1 (key value 1), and so on.
Item 1 was seen by 3 probands, so its 'base' is 3.
It was chosen by two probands, so its 'abs' is 2,
and therefore its 'perc' is 2/3 = 0.666666.
expected result array:
[
{
"key": 1,
"abs": 2,
"base": 3,
"perc": 0.6666666666
},
{
"key": 2,
"abs": 1,
"base": 2,
"perc": 0.5
},
{
"key": 3,
"abs": 2,
"base": 4,
"perc": 0.5
}
]
Is it possible to do this evaluation in one aggregation query and get this expected result array?
Thanks for the help :-)
Query
$objectToArray to move the data out of the field names (you should not store data in field names; field names are for the schema only, and the MongoDB query language is not designed for data in field names)
$unwind and $replaceRoot
$group with two condition-based accumulators, base and abs
add perc and fix the key: split on _ and take the second part
sort by key
*The query is longer because data in field names does not work well in MongoDB, so try to avoid it.
Playmongo (you can put the mouse in the end of each stage to see what it does)
db.test.aggregate([
  // turn each document into one {k, v} document per field
  {"$unset": ["_id"]},
  {"$set": {"data": {"$objectToArray": "$$ROOT"}}},
  {"$unwind": "$data"},
  {"$replaceRoot": {"newRoot": "$data"}},
  // base counts every value except -77; abs counts only the value 1
  {"$group": {
    "_id": "$k",
    "base": {"$sum": {"$cond": [{"$eq": ["$v", -77]}, 0, 1]}},
    "abs": {"$sum": {"$cond": [{"$eq": ["$v", 1]}, 1, 0]}}}},
  // "Q2_1" -> 1; $toInt makes the key numeric, as in the expected result
  {"$set": {"key": {"$toInt": {"$arrayElemAt": [{"$split": ["$_id", "_"]}, 1]}}}},
  {"$set": {"_id": "$$REMOVE", "perc": {"$divide": ["$abs", "$base"]}}},
  {"$sort": {"key": 1}}
])
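To check the counting rules without a server, the same logic can be sketched in plain JavaScript. This is only an in-memory illustration of what the stages compute, not the pipeline itself; `evaluateItems` and `answers` are hypothetical names, and the data is the sample from the question.

```javascript
// Sample documents from the question (assumed stand-in for db.test)
const answers = [
  { Q2_1: 1, Q2_2: -77, Q2_3: 1 },
  { Q2_1: -77, Q2_2: -77, Q2_3: 1 },
  { Q2_1: 1, Q2_2: 0, Q2_3: 0 },
  { Q2_1: 0, Q2_2: 1, Q2_3: 0 }
];

// Hypothetical helper mirroring $objectToArray + $unwind + $group
function evaluateItems(docs) {
  const groups = new Map();
  for (const doc of docs) {
    for (const [field, value] of Object.entries(doc)) {
      if (field === "_id") continue; // same effect as dropping _id first
      const g = groups.get(field) || { base: 0, abs: 0 };
      if (value !== -77) g.base += 1; // item was seen
      if (value === 1) g.abs += 1; // item was chosen
      groups.set(field, g);
    }
  }
  return Array.from(groups, ([field, g]) => ({
    key: Number(field.split("_")[1]), // "Q2_1" -> 1
    abs: g.abs,
    base: g.base,
    perc: g.abs / g.base
  })).sort((a, b) => a.key - b.key);
}

const itemStats = evaluateItems(answers);
console.log(itemStats);
```

One difference worth knowing: for an item that nobody saw, base is 0. This sketch then yields NaN, whereas $divide raises an error on a zero divisor, so the real pipeline may need a $cond guard around the division.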
So I have a data structure in a Mongo collection (v. 4.0.18) that looks something like this…
{
"_id": ObjectId("242kl4j2lk23423"),
"name": "Doug",
"kids": [
{
"name": "Alice",
"age": 15,
},
{
"name": "James",
"age": 13,
},
{
"name": "Michael",
"age": 10,
},
{
"name": "Sharon",
"age": 8,
}
]
}
In Mongo, how would I get back a projection of this object with only the first two kids? I want the output to look like this:
{
"_id": ObjectId("242kl4j2lk23423"),
"name": "Doug",
"kids": [
{
"name": "Alice",
"age": 15,
},
{
"name": "James",
"age": 13,
}
]
}
It seems like I should easily be able to get them by index, but I'm not seeing anything in the docs about how to do that. The real-world problem I'm trying to solve has nothing to do with kids, and the array could be quite lengthy. I'm trying to break it up and process it in batches without having to load the whole thing into memory in my application.
EDIT (non-sequential indexes):
I noticed that since I asked about item 1 & 2 that $slice would suffice…however, what if I wanted items 1 & 3? Is there a way I can specify specific array indexes to return?
Any ideas or pointers for how to accomplish that?
Thanks!
You are looking for the $slice projection operator, provided the elements you want are adjacent.
https://docs.mongodb.com/manual/reference/operator/projection/slice/
This returns the first two:
client.db.collection.find({"name":"Doug"}, { "kids": { "$slice": 2 } })
returns
{'_id': ObjectId('5f85f682a45e15af3a907f51'), 'name': 'Doug', 'kids': [{'name': 'Alice', 'age': 15}, {'name': 'James', 'age': 13}]}
This skips the first kid and returns the next two (second and third):
client.db.collection.find({"name":"Doug"}, { "kids": { "$slice": [1, 2] } })
returns
{'_id': ObjectId('5f85f682a45e15af3a907f51'), 'name': 'Doug', 'kids': [{'name': 'James', 'age': 13}, {'name': 'Michael', 'age': 10}]}
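The two $slice forms map closely onto JavaScript's Array.prototype.slice, which may make the [skip, limit] form easier to reason about when walking a long array in batches. The sketch below is an in-memory illustration only. `sliceProjection` is a hypothetical helper, and it assumes non-negative arguments (the real operator also accepts negative values, which are not modelled here).

```javascript
const kids = [
  { name: "Alice", age: 15 },
  { name: "James", age: 13 },
  { name: "Michael", age: 10 },
  { name: "Sharon", age: 8 }
];

// Hypothetical helper emulating the two $slice projection forms used above.
// Assumes non-negative numbers only.
function sliceProjection(array, spec) {
  if (Array.isArray(spec)) {
    const [skip, limit] = spec; // { $slice: [skip, limit] }
    return array.slice(skip, skip + limit);
  }
  return array.slice(0, spec); // { $slice: n }
}

const firstTwo = sliceProjection(kids, 2);
const secondAndThird = sliceProjection(kids, [1, 2]);

// Batches of two, expressed as successive [skip, limit] windows
const batches = [];
for (let skip = 0; skip < kids.length; skip += 2) {
  batches.push(sliceProjection(kids, [skip, 2]).map((k) => k.name));
}
console.log(batches);
```

Against a real collection each window would be a separate find with `{ "kids": { "$slice": [skip, limit] } }`, stopping when the returned array is empty.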
Edit:
Arbitrary selections such as items 1 and 3 probably need to go through an aggregation pipeline rather than a simple query. Performance shouldn't differ much, assuming you have an index on the $match field.
Steps of your pipeline should be pretty obvious and you should be able to take it from here.
Hate to point to RTFM, but that's going to be super helpful here to at least be acquainted with the pipeline operations.
https://docs.mongodb.com/manual/reference/operator/aggregation/
Your pipeline should:
$match on your desired query
$addFields a new field kid_selection holding element 1 (the second element) and element 3 (the fourth element), since counting starts at 0. The question is on MongoDB 4.0, where this stage is spelled $addFields; $set is its alias from 4.2 onward. Notice the $ prefix on "kids" inside the kid_selection expression: when referencing a field of the document you're working on, you need to prefix its name with $
$project the whole document, minus the original kids field that we've selected from
client.db.collection.aggregate([
    {"$match":{"name":"Doug"}},
    {"$addFields": {"kid_selection": [
        { "$arrayElemAt": [ "$kids", 1 ] },
        { "$arrayElemAt": [ "$kids", 3 ] }
    ]}},
    { "$project": { "kids": 0 } }
])
returns
{
'_id': ObjectId('5f86038635649a988cdd2ade'),
'name': 'Doug',
'kid_selection': [
{'name': 'James', 'age': 13},
{'name': 'Sharon', 'age': 8}
]
}
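If the wanted indexes vary from call to call, the selection stage can be generated rather than written by hand. The sketch below only builds the pipeline as a plain object and does not talk to a server. `pickIndexesStage` is a hypothetical helper name, and it emits $addFields because that stage exists on the 4.0 server mentioned in the question ($set is the alias available from 4.2).

```javascript
// Hypothetical helper: builds an $addFields stage that picks the given
// zero-based indexes out of an array field with $arrayElemAt.
function pickIndexesStage(arrayField, indexes, outputField) {
  return {
    $addFields: {
      [outputField]: indexes.map(function (i) {
        return { $arrayElemAt: ["$" + arrayField, i] };
      })
    }
  };
}

const pipeline = [
  { $match: { name: "Doug" } },
  pickIndexesStage("kids", [1, 3], "kid_selection"),
  { $project: { kids: 0 } }
];

console.log(JSON.stringify(pipeline, null, 2));
```

The resulting array can be passed as-is to aggregate() in the shell or in a driver.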
I have JSON documents with the following structure:
{"data": {
"key1": "value1",
"key2": "value2",
"manualTests": [{"name": "component1", "passed": false, "x": 12},
{"name": "component2", "passed": true},
{"name": "component3", "passed": false, "responseTime": 5}],
"automaticTests": [{"name": "component4", "passed": false},
{"name": "component5", "passed": true, "conversion": "Z"},
{"name": "component6", "passed": false}],
"semiautomaticTests": [{"name": "component7", "passed": true},
{"name": "component8", "passed": true},
{"name": "component9", "passed": true}]
}}
My MongoDB collection contains a huge number of these, and I need a list of all the components that have not passed their tests. The desired output would be something like:
{
"component1": 150,
"component2": 35,
"component3": 17,
"component4": 5,
"component5": 3,
"component6": 1
}
The numbers are made up; for each component they show how many times it failed. How do I calculate this in MongoDB? The format is not strict; the main requirement is that the output contains the name of each failed component and its failure count across the whole sample.
You can try the aggregation query below.
$match stage to keep only documents with at least one failed component.
$project with $filter to extract the failed components from each test list, wrapped in $concatArrays to merge them into one array.
$unwind to flatten the array and $group to count the failures for each component.
// Paths assume the test arrays are nested under "data", as in the sample document
db.colname.aggregate([
  {"$match":{
    "$or":[
      {"data.manualTests.passed":false},
      {"data.automaticTests.passed":false},
      {"data.semiautomaticTests.passed":false}
    ]
  }},
  {"$project":{
    "tests":{
      "$concatArrays":[
        {"$filter":{"input":"$data.manualTests","as":"mt","cond":{"$eq":["$$mt.passed",false]}}},
        {"$filter":{"input":"$data.automaticTests","as":"at","cond":{"$eq":["$$at.passed",false]}}},
        {"$filter":{"input":"$data.semiautomaticTests","as":"st","cond":{"$eq":["$$st.passed",false]}}}
      ]
    }
  }},
  {"$unwind":"$tests"},
  {"$group":{"_id":"$tests.name","count":{"$sum":1}}}
])
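The counting logic can be checked on a couple of documents in plain JavaScript. This is an in-memory sketch of what the stages compute, not the pipeline itself. `countFailures` is a hypothetical helper, the two `reports` are made-up documents, and the arrays are assumed to sit under `data` as in the sample from the question.

```javascript
// Two made-up documents shaped like the sample in the question
const reports = [
  { data: {
    manualTests: [{ name: "component1", passed: false }, { name: "component2", passed: true }],
    automaticTests: [{ name: "component4", passed: false }],
    semiautomaticTests: [{ name: "component7", passed: true }]
  } },
  { data: {
    manualTests: [{ name: "component1", passed: false }, { name: "component2", passed: false }],
    automaticTests: [{ name: "component4", passed: true }],
    semiautomaticTests: [{ name: "component7", passed: true }]
  } }
];

// Hypothetical helper mirroring $concatArrays + $filter + $unwind + $group
function countFailures(docs) {
  const counts = {};
  for (const doc of docs) {
    const d = doc.data;
    // merge the three test lists of this document
    const all = [].concat(d.manualTests, d.automaticTests, d.semiautomaticTests);
    for (const t of all) {
      // count only explicit failures, one per occurrence
      if (t.passed === false) counts[t.name] = (counts[t.name] || 0) + 1;
    }
  }
  return counts;
}

const failureCounts = countFailures(reports);
console.log(failureCounts);
```

The helper returns a single name-to-count object, which is the shape the question asks for; the pipeline instead returns one `{_id, count}` document per component.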
I have a data set that looks something like:
{"key": "abc", "val": 1, "status": "np"}
{"key": "abc", "val": 2, "status": "p"}
{"key": "def", "val": 3, "status": "np"}
{"key": "ghi", "val": 4, "status": "np"}
{"key": "ghi", "val": 5, "status": "p"}
I want a query that returns the document(s) that have status="np", but only where no other document with the same key has a status value of "p". So the document returned from the data set above would be key="def": "abc" has a document with "np", but it also has a document with "p", and the same is true for key="ghi". I came up with something close, but I don't think the $nin operator supports a distinct query.
db.test2.find({$and: [{"status":"np"}, {"key": {$nin:[<distinct value query>]}}]})
If I were to hardcode the value in the $nin array, it would work:
db.test2.find({$and: [{"status":"np"}, {"key": {$nin:['abc', 'ghi']}}]})
I just need to be able to write a find inside the square brackets. I could do something like:
var res=[];
res = db.test2.distinct("key", {"status": "p"});
db.test2.find({$and: [{"status":"np"}, {"key": {$nin:res}}]});
But the problem with this is that in the time between the two queries, another process may update the "status" of a document and then I'd have inconsistent data.
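Try this:
db.test2.aggregate([
// collect every status seen for each key
{$group: {'_id': '$key', 'st': {$push: '$status'}}},
// flag the keys that have an "np" status and those that have a "p" status
{$project: {st: 1, hasNp: {$in: ['np', '$st']}, hasP: {$in: ['p', '$st']}}},
// keep the keys with "np" and without "p"
{$match: {hasNp: true, hasP: false}}
]);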
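The grouping idea can be traced by hand on the five sample documents. The plain JavaScript below is an in-memory sketch of that logic, not the pipeline itself; `keysWithNpButNoP` and `rows` are hypothetical names.

```javascript
// Sample documents from the question (assumed stand-in for db.test2)
const rows = [
  { key: "abc", val: 1, status: "np" },
  { key: "abc", val: 2, status: "p" },
  { key: "def", val: 3, status: "np" },
  { key: "ghi", val: 4, status: "np" },
  { key: "ghi", val: 5, status: "p" }
];

// Hypothetical helper: group statuses by key, then keep the keys
// that have an "np" status and no "p" status.
function keysWithNpButNoP(docs) {
  const statusesByKey = new Map();
  for (const doc of docs) {
    if (!statusesByKey.has(doc.key)) statusesByKey.set(doc.key, []);
    statusesByKey.get(doc.key).push(doc.status);
  }
  const result = [];
  for (const [key, statuses] of statusesByKey) {
    if (statuses.includes("np") && !statuses.includes("p")) result.push(key);
  }
  return result;
}

const npOnlyKeys = keysWithNpButNoP(rows);
console.log(npOnlyKeys); // [ 'def' ]
```

Because the aggregation does the grouping and the filtering in a single command, it avoids the gap between the two separate queries described in the question.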
I have a dataset in MongoDB and this is an example of a line of my data:
{ "conversionDate": "2016-08-01",
"timeLagInDaysHistogram": 0,
"pathLengthInInteractionsHistogram": 4,
"campaignPath": [
{"campaignName": "name1", "source": "sr1", "medium": "md1", "click": "0"},
{"campaignName": "name2", "source": "sr1", "medium": "md1", "click": "0"},
{"campaignName": "name1", "source": "sr2", "medium": "md2", "click": "1"},
{"campaignName": "name3", "source": "sr1", "medium": "md3", "click": "1"}
],
"totalTransactions": 1,
"totalValue": 37.0,
"avgCartValue": 37.0
}
(The length of campaignPath is not constant, so each document can have a different number of elements.)
I want to find the documents whose last campaignPath element has source = sr1.
I know I can't do a query with something like
db.paths.find(
{
'campaignPath.-1.source': "sr1"
}
)
But since I have "pathLengthInInteractionsHistogram" stored, which is equal to the length of campaignPath, can't I do something like:
db.paths.find(
{
'campaignPath.$pathLengthInInteractionsHistogram.source': "sr1"
}
)
Starting with MongoDB 3.2, you can do this with aggregate which provides the $arrayElemAt operator which accepts a -1 index to access the last element.
db.paths.aggregate([
// Project the original doc along with the last campaignPath element
{$project: {
doc: '$$ROOT',
lastCampaign: {$arrayElemAt: ['$campaignPath', -1]}
}},
// Filter on the last campaign's source
{$match: {'lastCampaign.source': 'sr1'}},
// Remove the added lastCampaign field
{$project: {doc: 1}}
])
In earlier releases, you're stuck using $where. This will work but has poor performance:
db.paths.find({
$where: 'this.campaignPath[this.pathLengthInInteractionsHistogram-1].source === "sr1"'
})
which you could also do without using pathLengthInInteractionsHistogram:
db.paths.find({$where: 'this.campaignPath[this.campaignPath.length-1].source === "sr1"'})
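The $where strings are ordinary JavaScript evaluated once per document, so the predicate can be tried out locally first. The sketch below is plain JavaScript with made-up data. `lastSourceIs` is a hypothetical helper, and its guard for an empty or missing array is an addition: the one-line expressions above would throw a TypeError on such a document, because indexing past the array yields undefined.

```javascript
// Made-up documents; only the fields the predicate needs
const paths = [
  { _id: 1, campaignPath: [{ source: "sr2" }, { source: "sr1" }] },
  { _id: 2, campaignPath: [{ source: "sr1" }, { source: "sr2" }] },
  { _id: 3, campaignPath: [] }
];

// Hypothetical helper: true when the last campaignPath element has the given source
function lastSourceIs(doc, source) {
  const path = doc.campaignPath;
  if (!Array.isArray(path) || path.length === 0) return false; // added guard
  return path[path.length - 1].source === source;
}

const matchedIds = paths
  .filter((doc) => lastSourceIs(doc, "sr1"))
  .map((doc) => doc._id);

console.log(matchedIds); // [ 1 ]
```

Only document 1 matches: document 2 has sr1 in the first position rather than the last, and document 3 has no elements at all.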