MongoDB How to get value from subdocument when field name is unknown - mongodb

I'm trying to get data from one or more subdocuments but I don't know the name of the field that will hold the subdocument. Here are some examples of what the documents look like.
https://github.com/vz-risk/VCDB/blob/master/data/json/0C5DE044-B9B4-408D-9E65-D367EED12AB2.json
https://github.com/vz-risk/VCDB/blob/master/data/json/064F5887-C2DA-4139-B3AA-D55906F8C30A.json
I would like to get the action varieties for these incidents, so in the case of the first one I would like to get action.malware.variety and action.social.variety. In the second example it would be action.hacking.variety and action.malware.variety. So the problem is that I don't know what field is going to hold the subdocument. It could be one of hacking, malware, social, error, misuse, physical, and environmental.
So I would like to $unwind that field and do some stuff with the key name. Is this something that can be done with aggregation or do I need to switch over to mapReduce?

You seem to be talking about a case where you are not sure if all of the hacking, social or malaware parts are there right. I think you want $project first using the $ifNull operator as in:
db.stuff.aggregate([
{$project:
{
hacking: {$ifNull: ["$action.hacking.variety",[null]]},
social: {$ifNull: ["$action.social.variety",[null]]},
malware: {$ifNull: ["$action.malware.variety",[null]]}
}
},
{$unwind: "$hacking"},
{$unwind: "$social"},
{$unwind: "$malware"}
])
That should give you documents with something in each of those values.
Sort of pretty much the same with any of your possible list of values.

Related

Assign an incremental number to a field in a MongoDB aggregation

I have a MongoDB aggregation and each document has a field (groupNumber) like this:
I need that each groupNumber has a different number for each document (could be incremental 1,2,3..)
Most of the solutions that I had found are using "find" like this one , but I think that cannot be used in an aggregation.
Thanks in advance.
You can do this with a lookup, but I don't think it will scale well. I would think this should be persisted in some way, perhaps $out to another collection, so that the numbers are stable and deleting or inserting a document won't change the numbers for any other group.
db.target.aggregate([
{"$lookup":{
"from":"target",
"as":"looked",
"let":{"srcId":"$_id"},
"pipeline":[
{"$match":{"$expr":{"$lte":["$_id","$$srcId"]}}},
{"$group":{"_id":"null", "cnt":{"$sum":1}}}
]
}},
{"$addFields":{"groupNumber":{"$arrayElemAt":["$looked.cnt",0]}}},
{"$project":{"looked":0}}
])

Mongodb optimal query

My use case is different. I am trying to map it to user and orders for easy understanding.
I have to get the following for a user
For each department
For each order type
delivered count
unique orders
Unique order count means user might have ordered the same product, but that count has to be 1 for the same. I have the background logic and identified via duplicate order ids.
db.getCollection('user_orders').aggregate([{"user_id":123},
{$group: {"_id": {"department":"$department", "order_type":"$order_type"},
"del_count":{$sum:"$del_count"},
"unique_order":{$addToSet:{"unique_order":"$unique_order"}}}},
{$project: {"_id":0,
"department":"$_id.department",
"order_type_name":"$_id.order_type",
"unique_order_count": {$size:"$unique_order"},
"del_count":"$del_count"
}},
{$group: {"_id":"$department",
order_types: {$addToSet:
{"order_type_name":"$order_type_name",
"unique_order_count": "$unique_order_count",
"del_count":"$del_count"
}}}}
])
Sorry for my query formatting.
This query is working absolutely fine. I added the second grouping to bring the responses together for all order types of the same department.
Can I do the same in less number of pipelines - efficient ways?
The $project stage appears to be redundant but it's more refactoring rather than performance improvement. Your simplified pipeline can look like below:
db.getCollection('user_orders').aggregate([{$group: {"_id": {"department":"$department", "order_type":"$order_type"},
"del_count":{$sum:"$del_count"},
"unique_order":{$addToSet:{"unique_order":"$unique_order"}}}},
{$group: {"_id":"$_id.department",
order_types: {$addToSet:
{"order_type_name":"$_id.order_type",
"unique_order_count": {$size:"$unique_order"},
"del_count":"$del_count"
}}}}
])

Can an index on a subfield cover queries on projections of that field?

Imagine you have a schema like:
[{
name: "Bob",
naps: [{
time: 2019-05-01T15:35:00,
location: "sofa"
}, ...]
}, ...
]
So lots of people, each with a few dozen naps. You want to find out 'what days do people take the most naps?', so you index naps.time, and then query with:
aggregate([
{$unwind: naps},
{$group: { _id: {$day: "$naps.time"}, napsOnDay: {"$sum": 1 } }
])
But when doing explain(), mongo tells me no index was used in this query, when clearly the index on the time Date field could have been. Why is this? How can I get mongo to use the index for the more optimal query?
Indexes stores pointers to actual documents, and can only be used when working with a material document (i.e. the document that is actually stored on disk).
$match or $sort does not mutate the actual documents, and thus indexes can be used in these stages.
In contrast, $unwind, $group, or any other stages that changes the actual document representations basically loses the connection between the index and the material documents.
Additionally, when those stages are processed without $match, you're basically saying that you want to process the whole collection. There is no point in using the index if you want to process the whole collection.

How to project in MongoDB after sort?

In find operation fields can be excluded, but what if I want to do a find then a sort and just after then the projection. Do you know any trick, operation for it?
doc: fields {Object}, the fields to return in the query. Object of fields to include or exclude (not both), {‘a’:1}
You can run a usual find query with conditions, projections, and sort. I think you want to sort on a field that you don't want to project. But don't worry about that, you can sort on that field even after not projecting it.
If you explicitly select projection of sorting field as "0", then you won't be able to perform that find query.
//This query will work
db.collection.find(
{_id:'someId'},
{'someField':1})
.sort('someOtherField':1)
//This query won't work
db.collection.find(
{_id:'someId'},
{'someField':1,'someOtherField':0})
.sort('someOtherField':1)
However, if you still don't get required results, look into the MongoDB Aggregation Framework!
Here is the sample query for aggregation according to your requirement
db.collection.aggregate([
{$match: {_id:'someId'}},
{$sort: {someField:1}},
{$project: {_id:1,someOtherField:1}},
])

$elemMatch Projection on a Simple Array

Imagine a collection of movies (stored in a MongoDB collection), with each one looking something like this:
{
_id: 123456,
name: 'Blade Runner',
buyers: [1123, 1237, 1093, 2910]
}
I want to get a list of movies, each one with an indication whether buyer 2910 (for example) bought it.
Any ideas?
I know I can change [1123, 1237, 1093, 2910] to [{id:1123}, {id:1237}, {id:1093}, {id:2910}] to allow the use of $elemMatch in the projection, but would prefer not to touch the structure.
I also know I can perhaps use the $unwind operator (within the aggregation framework), but that seems very wasteful in cases where buyer has thousands of values (basically exploding each document into thousands of copies in memory before matching).
Any other ideas? Am I missing something really simple here?
You can use the $setIsSubset aggregation operator to do this:
var buyer = 2910;
db.movies.aggregate(
{$project: {
name: 1,
buyers: 1,
boughtIt: {$setIsSubset: [[buyer], '$buyers']}
}}
)
That will give you all movie docs with a boughtIt field added that indicates whether buyer is contained in the the movie's buyers array.
This operator was added in MongoDB 2.6.
Not really sure of your intent here, but you don't need to change the structure just to use $elemMatch in projection. You can just issue like this:
db.movies.find({},{ "buyers": { "$elemMatch": { "$eq": 2910 } } })
That would filter the returned array elements to just the "buyer" that was indicated, or nothing where this was not present. It is true to point out that the $eq operator used here is not actually documented, but it does exist. So that may not be immediately clear that you can construct a condition in that way.
It seems a little wasteful to me though as you are returning "everything" regardless of whether the "buyer" is present or not. So a "query" seems more logical than a projection:
db.movies.find({ "buyers": 2910 })
And optionally either just keeping only that matched result:
db.movies.find({ "buyers": 2910 },{ "buyers.$": 1})
Set operators in the aggregation framework give you more options with $project which can do more to alter the document. But if you just want to know if someone "bought" the item, then a "query" seems the be logical and fastest way to do so.