(mongo) How could I get the documents that have a value in an array along with size - MongoDB

I have a mongo collection with something like the below:
{
"_id" : ObjectId("59e013e83260c739f029ee21"),
"createdAt" : ISODate("2017-10-13T01:16:24.653+0000"),
"updatedAt" : ISODate("2017-11-11T17:13:52.956+0000"),
"age" : NumberInt(34),
"attributes" : [
{
"year" : "2017",
"contest" : [
{
"name" : "Category1",
"division" : "Department1"
},
{
"name" : "Category2",
"division" : "Department1"
}
]
},
{
"year" : "2016",
"contest" : [
{
"name" : "Category2",
"division" : "Department1"
}
]
},
{
"year" : "2015",
"contest" : [
{
"name" : "Category1",
"division" : "Department1"
}
]
}
],
"name" : {
"id" : NumberInt(9850214),
"first" : "john",
"last" : "afham"
}
}
Now, how can I get the number of documents that have a contest named Category1 more than once, more than twice, and so on?
I tried to use $size and $gt but couldn't get a correct result.

Assuming that a single contest will never contain the same name (e.g. "Category1") value more than once, here is what you can do.
Avoiding $unwind stages should improve performance, particularly on big collections or on documents with many entries in their attributes arrays.
db.collection.aggregate({
$project: {
"numberOfOccurrences": {
$size: { // count the number of matching contest elements
$filter: { // keep only the attributes entries whose contest array contains an entry with name "Category1"
input: "$attributes",
cond: { $in: [ "Category1", "$$this.contest.name" ] }
}
}
}
}
}, {
$match: { // filter the number of documents
"numberOfOccurrences": {
$gt: 1 // put your desired min. number of matching contest entries here
}
}
}, {
$count: "numberOfDocuments" // count the number of matching documents
})
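To sanity-check the counting logic outside MongoDB, here is a minimal plain JavaScript sketch of what the $filter and $size stage computes for one document. The helper name countAttributesWithContest is hypothetical, and the sample is a trimmed copy of the document from the question.

```javascript
// Plain JavaScript mirror of the $filter + $size projection (illustrative only).
// countAttributesWithContest is a hypothetical helper, not a MongoDB API.
function countAttributesWithContest(doc, contestName) {
  return (doc.attributes || []).filter((attr) =>
    // same test as $in: [ contestName, "$$this.contest.name" ]
    (attr.contest || []).some((c) => c.name === contestName)
  ).length;
}

// Trimmed copy of the sample document from the question.
const sampleDoc = {
  attributes: [
    { year: "2017", contest: [
      { name: "Category1", division: "Department1" },
      { name: "Category2", division: "Department1" }
    ] },
    { year: "2016", contest: [ { name: "Category2", division: "Department1" } ] },
    { year: "2015", contest: [ { name: "Category1", division: "Department1" } ] }
  ]
};

const occurrences = countAttributesWithContest(sampleDoc, "Category1");
console.log(occurrences);     // 2
console.log(occurrences > 1); // true: this document would pass $gt: 1
```

The sample document has Category1 in the 2017 and 2015 entries, so it is counted for "more than one time" but not for "more than 2 times".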

Try this on for size.
db.foo.aggregate([
// Start with breaking down attributes:
{$unwind: "$attributes"}
// Next, extract only name = Category1 from the contest array. This will yield
// an array of 0 or 1 because I am assuming that contest names WITHIN
// the contest array are unique. If found and we get an array of 1, turn that
// into a single doc instead of an array of a single doc by taking arrayElemAt 0.
// Otherwise, "x" is not set into the doc AT ALL. All other vars in the doc
// will go away after $project; if you want to keep them, change this to
// $addFields:
,{$project: {x: {$arrayElemAt: [ {$filter: {
input: "$attributes.contest",
as: "z",
cond: {$eq: [ "$$z.name", "Category1" ]}
}}, 0 ]}
}}
// We split up attributes before, creating multiple docs with the same _id. We
// must now "recombine" these _id (OP said he wants # of docs with name).
// We now have to capture all the single "x" that we created above; docs without
// Category1 will have NO "x" and we don't want to include them in the count.
// Also, we KNOW that name can only be Category1 but division could vary, so
// let's capture that in the $push in case we might want it:
,{$group: {_id: "$_id", x: {$push: "$x.division"}}}
// One more pass to compute length of array:
,{$addFields: {len: {$size: "$x"}} }
// And lastly, the filter for one time or two times or n times:
,{$match: {len: {$gt: 2} }}
]);

First, we need to flatten the document by the attributes and contest fields. Then we group by the document's initial _id and the contest name, counting the contests along the way. Finally, we filter the result.
db.person.aggregate([
{ $unwind: "$attributes" },
{ $unwind: "$attributes.contest" },
{$group: {
_id: {initial_id: "$_id", contest: "$attributes.contest.name"},
count: {$sum: 1}
}
},
{$match: {$and: [{"_id.contest": "Category1"}, {"count": {$gt: 1}}]}}]);

How do I join collections based on a substring in MongoDB

Just trying to wrap my head around how to reference a collection based on a substring match.
Say I have one collection with chat information (including cell_number: 10 digits). I have another collection in which every document includes a field for area code (3 digits) and some other information about that area code, like so:
Area codes collection excerpt:
{
"_id" : ObjectId("62cf3d56580efcedf7c3b845"),
"code" : "206",
"region" : "WA",
"abbr" : "WA"
}
{
"_id" : ObjectId("62cf3d56580efcedf7c3b846"),
"code" : "220",
"region" : "OH",
"abbr" : "OH"
}
{
"_id" : ObjectId("62cf3d56580efcedf7c3b847"),
"code" : "226",
"region" : "Ontario",
"abbr" : "ON"
}
What I want to do is be able to grab out just the document from the area codes collection which has a "code" field value matching the first 3 characters of a cell number. A simple example is below. Note that in this example I have hardcoded the cell number for simplicity, but in reality it would be coming from the chat_conversations "parent" collection:
db.chat_conversations.aggregate([
{$match: {wait_start: {$gte: new Date("2020-07-01"), $lt: new Date("2022-07-12")}}},
{$lookup: {
from: "area_codes",
let: {
areaCode: "$code",
cellNumber: "2065551234"
},
pipeline: [
{$match: {
$expr: {
$eq: ["$$areaCode", {$substr: ["$$cellNumber", 0, 3]}]
}
}}
],
as: "area_code"
}},
]).pretty()
Unfortunately nothing I try here seems to work and I just get back an empty array from the area codes collection.
What am I missing here?
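The variables in let are evaluated against the chat_conversations document, so "$code" there does not refer to the field in area_codes.
Define the substring of the cell number in let, and compare it with "$code" inside the pipeline, where field paths refer to area_codes documents.
This query should work for you: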
db.chat_conversations.aggregate([
{$match: {wait_start: {$gte: new Date("2020-07-01"), $lt: new Date("2022-07-12")}}},
{
$lookup: {
from: "area_codes",
let:{
"cell_Number":{$substr:["$cell_number",0,3]} // cell_number is the field name given in the question
},
pipeline:[
{
$match:{
$expr:{
$eq: [ "$code", "$$cell_Number" ]
}
}
}],
as: "data"
}
},
])
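As a rough illustration of what the corrected $lookup does per conversation, here is a plain JavaScript sketch of the prefix join. It assumes the phone field is called cell_number, as in the question text; lookupAreaCode is a hypothetical helper name.

```javascript
// Excerpt of the area_codes collection from the question.
const areaCodes = [
  { code: "206", region: "WA", abbr: "WA" },
  { code: "220", region: "OH", abbr: "OH" },
  { code: "226", region: "Ontario", abbr: "ON" }
];

// Hypothetical helper: mirrors let + $expr/$eq from the pipeline.
function lookupAreaCode(conversation, codes) {
  // "let" side: computed once from the outer (chat_conversations) document
  const prefix = String(conversation.cell_number).substring(0, 3);
  // "pipeline" side: evaluated against each area_codes document
  return codes.filter((c) => c.code === prefix);
}

const joined = lookupAreaCode({ cell_number: "2065551234" }, areaCodes);
console.log(joined); // [ { code: '206', region: 'WA', abbr: 'WA' } ]
```

The point of the split is the same as in the pipeline: the prefix comes from the outer document, while the comparison runs against each candidate document of the joined collection.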

Finding the mongoDB object

I have a sequence: [k1, k4, k6, k10] and MongoDB documents as follows:
{"_id" : "x" , "k" : [ "k1", "k2", "k6"]},
{"_id" : "y" , "k" : [ "k2", "k4", "k10", "k11"]},
{"_id" : "z" , "k" : [ "k4", "k6", "k10", "k12"]}
I have to find the document that has the maximum number of elements matching the sequence. In this case it will be "z", as it has three matching elements in the "k" array, i.e. ["k4", "k6", "k10"].
Is there a MongoDB way of doing this?
You can use this query :
db.data.aggregate([{
$match: {
k: {
$in: ["k1", "k4", "k6", "k10"]
}
}
}, {
$addFields: {
count: {
$size:{
$setIntersection: [
["k1", "k4", "k6", "k10"], "$k"
]
}
}
}
}, {
$sort: {
count: -1
}
}, {
$limit: 1
}])
This query will:
1. filter on relevant records (not required, but it performs better with it)
2. calculate the count of matching elements using the set intersection
3. sort them by count descending
4. return the first record
Note: There can be multiple records with maximum matches, the above query gets you only one.
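If ties matter, the same scoring can be used to return every document that reaches the maximum. The following plain JavaScript sketch shows the idea on the question's data, assuming the trailing spaces in values such as "k6 " are typos; bestMatches is a hypothetical helper name.

```javascript
const wanted = ["k1", "k4", "k6", "k10"];
const docs = [
  { _id: "x", k: ["k1", "k2", "k6"] },
  { _id: "y", k: ["k2", "k4", "k10", "k11"] },
  { _id: "z", k: ["k4", "k6", "k10", "k12"] }
];

// Hypothetical helper: score each document by the size of the set
// intersection (like $size of $setIntersection), then keep all top scorers.
function bestMatches(documents, sequence) {
  const wantedSet = new Set(sequence);
  const scored = documents.map((d) => ({
    _id: d._id,
    count: new Set(d.k.filter((v) => wantedSet.has(v))).size
  }));
  const max = Math.max(0, ...scored.map((s) => s.count));
  // documents with no match at all are never returned
  return scored.filter((s) => max > 0 && s.count === max);
}

const best = bestMatches(docs, wanted);
console.log(best); // [ { _id: 'z', count: 3 } ]
```

Here "x" and "y" both score 2 and "z" scores 3, so only "z" is returned; with tied top scores, all tied documents would come back.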

Aggregate on array of embedded documents

I have a mongodb collection with multiple documents. Each document has an array with multiple subdocuments (or embedded documents i guess?). Each of these subdocuments is in this format:
{
"name": string,
"count": integer
}
Now I want to aggregate these subdocuments to find
1. The top X counts and their names.
2. Same as 1., but the names have to match a regex before sorting and limiting.
I have tried the following for 1. already - it does return me the top X but unordered, so I'd have to order them again which seems somewhat inefficient.
[{
$match: {
_id: id
}
}, {
$unwind: {
path: "$array"
}
}, {
$sort: {
'count': -1
}
}, {
$limit: x
}]
Since i'm rather new to mongodb this is pretty confusing for me. Happy for any help. Thanks in advance.
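After $unwind, each count lives under the array field, so the sort key has to include the array name (for example "array.count"). Sorting on a bare 'count' leaves the results unordered and forces an additional sort later on.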
Given the following document to work with:
{
students: [{
count: 4,
name: "Ann"
}, {
count: 7,
name: "Brad"
}, {
count: 6,
name: "Beth"
}, {
count: 8,
name: "Catherine"
}]
}
As an example, the following aggregation query will match any name containing the letter "h" or "e" (the character class [he] matches either one). This needs to happen after the "$unwind" step in order to only keep the ones you need.
db.tests.aggregate([
{$match: {
_id: ObjectId("5c1b191b251d9663f4e3ce65")
}},
{$unwind: {
path: "$students"
}},
{$match: {
"students.name": /[he]/
}},
{$sort: {
"students.count": -1
}},
{$limit: 2}
])
This is the output given the above mentioned input:
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 8, "name" : "Catherine" } }
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 6, "name" : "Beth" } }
Both names contain the letters "h" and "e", and the output is sorted from high to low.
When setting the limit to 1, the output is limited to:
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 8, "name" : "Catherine" } }
In this case only the highest count has been kept after having matched the names.
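The match, sort, and limit steps can be mirrored in plain JavaScript to check a regex before putting it in the pipeline. This is only a sketch: topByCount is a hypothetical helper, and the lookahead pattern is one possible way to require both letters, not something taken from the answer above.

```javascript
const students = [
  { count: 4, name: "Ann" },
  { count: 7, name: "Brad" },
  { count: 6, name: "Beth" },
  { count: 8, name: "Catherine" }
];

// Hypothetical helper: filter by name ($match), sort by count descending
// ($sort), then keep the first `limit` entries ($limit).
function topByCount(items, pattern, limit) {
  return items
    .filter((s) => pattern.test(s.name))
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}

const top = topByCount(students, /[he]/, 2);
console.log(top.map((s) => s.name)); // [ 'Catherine', 'Beth' ]

// [he] is a character class: one of the letters is enough.
console.log(/[he]/.test("Pete"));              // true ("e" only)
// Requiring both letters needs a different pattern, e.g. two lookaheads.
console.log(/^(?=.*h)(?=.*e)/.test("Pete"));   // false
console.log(/^(?=.*h)(?=.*e)/.test("Beth"));   // true
```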
=====================
Edit for the extra question:
Yes, the first $match can be changed to filter on specific universities.
{$match: {
university: "University X"
}},
That will give one or more matching documents (in case you have a document per year or so) and the rest of the aggregation steps would still be valid.
The following match would retrieve the students for the given university for a given academic year in case that would be needed.
{$match: {
university: "University X",
academic_year: "2018-2019"
}},
That should narrow it down to get the correct documents.

MongoDB: How to match on the elements of an array?

I have two collections as follows:
db.qnames.find()
{ "_id" : ObjectId("5a4da53f97a9ca769a15d49e"), "domain" : "mail.google.com", "tldOne" : "google.com", "clients" : 10, "date" : "2016-12-30" }
{ "_id" : ObjectId("5a4da55497a9ca769a15d49f"), "domain" : "mail.google.com", "tldOne" : "google.com", "clients" : 9, "date" : "2017-01-30" }
and
db.dropped.find()
{ "_id" : ObjectId("5a4da4ac97a9ca769a15d49c"), "domain" : "google.com", "dropDate" : "2017-01-01", "regStatus" : 1 }
I would like to join the two collections and choose the documents for which the 'dropDate' field (from the dropped collection) is larger than the 'date' field (from the qnames collection). So I used the following query:
db.dropped.aggregate( [{$lookup:{ from:"qnames", localField:"domain",foreignField:"tldOne",as:"droppedTraffic"}},
{$match: {"droppedTraffic":{$ne:[]} }},
{$unwind: "$droppedTraffic" } ,
{$match: {dropDate:{$gt:"$droppedTraffic.date"}}} ])
but this query does not filter out the records where dropDate < date. Can anyone give me a clue as to why this happens?
There are two reasons why you are not getting the expected records.
Dates are stored as strings in your collections. To compare them reliably with the comparison operators, convert the values in your documents using new ISODate("your existing date in the collection").
Please note that even after modifying both collections you still need to change your aggregate query: the final $match compares dropDate with the literal string "$droppedTraffic.date", because a query-style $match does not resolve field paths. Comparing two values from the same document requires an aggregation expression.
Sample query to get the desired documents
db.dropped.aggregate([
{$lookup: {
from:"qnames",
localField:"domain",
foreignField:"tldOne",
as:"droppedTraffic"}
},
{$project: {
_id:1, domain:1,
regStatus:1,
droppedTraffic: {
$filter: {
input: "$droppedTraffic",
as:"droppedTraffic",
cond:{ $lt: ["$$droppedTraffic.date", "$dropDate"]} // keep entries with date < dropDate, as the question asks
}
}
}}
])
The approach above uses $filter, which avoids the $unwind operation.
You should use $redact to compare two fields of the same document. Following example should work:
db.dropped.aggregate( [
{$lookup:{ from:"qnames", localField:"domain",foreignField:"tldOne",as:"droppedTraffic"}},
{$match: {"droppedTraffic":{$ne:[]} }},
{$unwind: "$droppedTraffic" },
{
"$redact": {
"$cond": [
{ "$gt": [ "$dropDate", "$droppedTraffic.date" ] }, // keep only dropDate > date, as the question asks
"$$KEEP",
"$$PRUNE"
]
}
}
])
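To see why the question's final $match let every record through, it may help to reproduce the comparison in plain JavaScript. A query-style $match treats "$droppedTraffic.date" as an ordinary string, and this sketch assumes a plain character-by-character string comparison, which is how both JavaScript and MongoDB (without a collation) order these ASCII strings. keepTrafficBeforeDrop is a hypothetical helper name.

```javascript
const dropDate = "2017-01-01";

// What the question's $match effectively compared against: a literal string.
const literal = "$droppedTraffic.date";
console.log(dropDate > literal); // true, because "2" sorts after "$"

// What was intended: compare against the value of the joined field.
const droppedTraffic = [
  { domain: "mail.google.com", clients: 10, date: "2016-12-30" },
  { domain: "mail.google.com", clients: 9, date: "2017-01-30" }
];

// Hypothetical helper: keep traffic entries dated before the drop date.
function keepTrafficBeforeDrop(drop, traffic) {
  return traffic.filter((t) => drop > t.date);
}

const kept = keepTrafficBeforeDrop(dropDate, droppedTraffic);
console.log(kept.map((t) => t.date)); // [ '2016-12-30' ]
```

Strings in YYYY-MM-DD form happen to sort chronologically, which is why the field-to-field comparison works here even without converting to ISODate.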

Compare a date of two elements

My problem is difficult to explain :
In my website I save every action of my visitors (view, click, buy etc).
I have a simple collection named "flow" where my data is registered
{
"_id" : ObjectId("534d4a9a37e4fbfc0bf20483"),
"profile" : ObjectId("534bebc32939ffd316a34641"),
"activities" : [
{
"id" : ObjectId("534bebc42939ffd316a3af62"),
"date" : ISODate("2013-12-13T22:39:45.808Z"),
"verb" : "like",
"product" : "5"
},
{
"id" : ObjectId("534bebc52939ffd316a3f480"),
"date" : ISODate("2013-12-20T19:19:10.098Z"),
"verb" : "view",
"product" : "6"
},
{
"id" : ObjectId("534bebc32939ffd316a3690f"),
"date" : ISODate("2014-01-01T07:11:44.902Z"),
"verb" : "buy",
"product" : "5"
},
{
"id" : ObjectId("534bebc42939ffd316a3741b"),
"date" : ISODate("2014-01-11T08:49:02.684Z"),
"verb" : "favorite",
"product" : "26"
}
]
}
I would like to aggregate this data to retrieve the number of people who performed one action (for example "view") and then another later in time (for example "buy"). To do that I need to compare "date" inside my "activities" array...
I tried to use the aggregation framework to do that, but I do not see how to make this request.
This is my beginning :
db.flows.aggregate([
{ $project: { profile: 1, activities: 1, _id: 0 } },
{ $match: { $and: [{'activities.verb': 'view'}, {'activities.verb': 'buy'}] }}, //First verb + second verb
{ $unwind: '$activities' },
{ $match: { 'activities.verb': {$in:['view', 'buy']} } }, //First verb + second verb,
{
$group: {
_id: '$profile',
view: { $push: { $cond: [ { $eq: [ "$activities.verb", "view" ] } , "$activities.date", null ] } },
buy: { $push: { $cond: [ { $eq: [ "$activities.verb", "buy" ] } , "$activities.date", null ] } }
}
}
])
Maybe the format of my collection "flow" is not the best for what I want... If you have any better idea, don't hesitate.
Thank you for your help !
Here is the aggregation that will give you the total number of buyers who viewed first and then bought (though not necessarily the same product that they viewed).
db.flow.aggregate(
{$match: {"activities.verb":{$all:["view","buy"]}}},
{$unwind :"$activities"},
{$match: {"activities.verb":{$in:["view","buy"]}}},
{$group: {
_id:"$_id",
firstViewed:{$min:{$cond:{
if:{$eq:["$activities.verb","view"]},
then : "$activities.date",
else : new Date(9999,0,1)
}}},
lastBought: {$max:{$cond:{
if:{$eq:["$activities.verb","buy"]},
then:"$activities.date",
else:new Date(1900,0,1)}
}}}
},
{$project: {viewedThenBought:{$cond:{
if:{$gt:["$lastBought","$firstViewed"]},
then:1,
else:0
}}}},
{$group:{_id:null,totalViewedThenBought:{$sum:"$viewedThenBought"}}}
)
Here you first pass through the pipeline only the documents that have all the "verbs" you are interested in. When you group the first time, you want to use the earliest "view" and the last "buy" and the next project compares them to see if they viewed before they bought.
The last step gives you the count of all the people who satisfied your criteria.
Be careful to leave out all $project phases that don't actually compute any new fields (like your very first $project). The aggregation framework is smart enough to never pass through any fields that it sees are not used in any later stages, so there is never a need to $project just to "eliminate" fields, as that will happen automatically.
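The per-profile part of this pipeline (earliest "view" versus latest "buy") can be sketched in plain JavaScript to check the logic against the sample document from the question. viewedThenBought is a hypothetical helper name, and the sample keeps only the fields the comparison needs.

```javascript
// Activities of the sample "flow" document, reduced to the relevant fields.
const activities = [
  { date: new Date("2013-12-13T22:39:45.808Z"), verb: "like", product: "5" },
  { date: new Date("2013-12-20T19:19:10.098Z"), verb: "view", product: "6" },
  { date: new Date("2014-01-01T07:11:44.902Z"), verb: "buy", product: "5" },
  { date: new Date("2014-01-11T08:49:02.684Z"), verb: "favorite", product: "26" }
];

// Hypothetical helper: true when the latest "buy" is later than the
// earliest "view" (the $min / $max / $gt steps of the pipeline).
function viewedThenBought(acts) {
  const times = (verb) =>
    acts.filter((a) => a.verb === verb).map((a) => a.date.getTime());
  const views = times("view");
  const buys = times("buy");
  // the first $match in the pipeline requires both verbs to be present
  if (views.length === 0 || buys.length === 0) return false;
  return Math.max(...buys) > Math.min(...views);
}

console.log(viewedThenBought(activities)); // true
```

As in the pipeline, the viewed and bought products are not required to be the same; the sample profile viewed product "6" and bought product "5", and still counts.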
For your query:
I would like to aggregate these data to retrieve the number of people who made an action
Try this:
db.flows.aggregate([
// De-normalize the array into individual documents
{"$unwind" : "$activities"},
// Match for the verbs you are interested in
{"$match" : {"activities.verb":{$in:["buy", "view"]}}},
// Group by verb to get the count
{"$group" : {_id:"$activities.verb", count:{$sum:1}}}
])
The above query would produce an output like:
{
"result" : [
{
"_id" : "buy",
"count" : 1
},
{
"_id" : "view",
"count" : 1
}
],
"ok" : 1
}
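Note: The $and operator in your query ({ $match: { $and: [{'activities.verb': 'view'}, {'activities.verb': 'buy'}] }}) is needed only because both conditions target the same field; conditions on different fields are ANDed by default. The same filter can be written more compactly as {'activities.verb': {$all: ['view', 'buy']}}. An explicit $or operator is required only if you need a logical OR.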
If you want to use the date in the aggregation query to do queries like how many "views by day", etc.. the Date Aggregation Operators will come in handy.
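One caveat on the $and note: implicit AND only works when the conditions use different keys. When both conditions target the same field, as with 'activities.verb' here, writing them side by side in one object literal silently drops the first one. A small plain JavaScript sketch of that behaviour:

```javascript
// Two conditions on the same key in one object literal: the last one wins.
const withoutAnd = { "activities.verb": "view", "activities.verb": "buy" };
console.log(Object.keys(withoutAnd).length); // 1
console.log(withoutAnd["activities.verb"]);  // buy

// Two ways to keep both conditions on the same field.
const withAnd = {
  $and: [ { "activities.verb": "view" }, { "activities.verb": "buy" } ]
};
const withAll = { "activities.verb": { $all: ["view", "buy"] } };
console.log(withAnd.$and.length);                   // 2
console.log(withAll["activities.verb"].$all.length); // 2
```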
I see where you are going with this and I think you are basically on the right track. So here it is more or less unaltered (but for formatting preference), with a few tweaks at the end:
db.flows.aggregate([
// Try to $match "first" always to make sure you can get an index
{ "$match": {
"$and": [
{"activities.verb": "view"},
{"activities.verb": "buy"}
]
}},
// Don't worry, the optimizer "sees" this and will sort of "blend" it
// with the first stage.
{ "$project": {
"profile": 1,
"activities": 1,
"_id": 0
}},
{ "$unwind": "$activities" },
{ "$match": {
"activities.verb": { "$in":["view", "buy"] }
}},
{ "$group": {
"_id": "$profile",
"view": { "$min": { "$cond": [
{ "$eq": [ "$activities.verb", "view" ] },
"$activities.date",
null
]}},
"buy": { "$max": { "$cond": [
{ "$eq": [ "$activities.verb", "buy" ] },
"$activities.date",
null
]}}
}},
{ "$project": {
"viewFirst": { "$lt": [ "$view", "$buy" ] }
}}
])
So essentially the $min and $max operators should be self-explanatory in this context: you are looking for the "first" view to compare with the "last" purchase. To me it would make more sense to match these by product as well (hint: "grouping"), but I'll leave that part up to you.
The other advantage here is that the null placeholders are always discarded when there is an actual date to match the "verb". Because the first $match guarantees that both verbs are present, every group ends up with real dates, and this turns out to be okay.
That is because the next thing you do is $project to "compare" the values and ask the question "Did the 'view' happen before the 'buy'?" which is a logical evaluation of the "less than" $lt operator.
As for the schema itself. If you are storing a lot of these "events" then you are probably better off flattening things out into separate documents and finding some way to mark each with the same "session" identifier if that is separate to "profile".
Getting away from large arrays (which this seems to lead to) is likely going to help performance and, with care, makes little difference to the aggregation process.