How do I join collections based on a substring in MongoDB

Just trying to wrap my head around how to reference a collection based on a substring match.
Say I have one collection with chat information (including cell_number: 10 digits). I have another collection in which every document includes a field for area code (3 digits) and some other information about that area code, like so:
Area codes collection excerpt:
{
"_id" : ObjectId("62cf3d56580efcedf7c3b845"),
"code" : "206",
"region" : "WA",
"abbr" : "WA"
}
{
"_id" : ObjectId("62cf3d56580efcedf7c3b846"),
"code" : "220",
"region" : "OH",
"abbr" : "OH"
}
{
"_id" : ObjectId("62cf3d56580efcedf7c3b847"),
"code" : "226",
"region" : "Ontario",
"abbr" : "ON"
}
What I want to do is be able to grab out just the document from the area codes collection which has a "code" field value matching the first 3 characters of a cell number. A simple example is below. Note that in this example I have hardcoded the cell number for simplicity, but in reality it would be coming from the chat_conversations "parent" collection:
db.chat_conversations.aggregate([
{$match: {wait_start: {$gte: new Date("2020-07-01"), $lt: new Date("2022-07-12")}}},
{$lookup: {
from: "area_codes",
let: {
areaCode: "$code",
cellNumber: "2065551234"
},
pipeline: [
{$match: {
$expr: {
$eq: ["$$areaCode", {$substr: ["$$cellNumber", 0, 3]}]
}
}}
],
as: "area_code"
}},
]).pretty()
Unfortunately nothing I try here seems to work and I just get back an empty array from the area codes collection.
What am I missing here?

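The variables declared in let are evaluated against the chat_conversations document, so "$code" there does not refer to the field in area_codes.
Define the variable from the conversation's cell number in let, then compare it with "$code" inside the pipeline, where field paths refer to area_codes documents.
This query should work for you: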
db.chat_conversations.aggregate([
{$match: {wait_start: {$gte: new Date("2020-07-01"), $lt: new Date("2022-07-12")}}},
{
$lookup: {
from: "area_codes",
let:{
"cell_Number":{$substr:["$cell_number",0,3]}
},
pipeline:[
{
$match:{
$expr:{
$eq: [ "$code", "$$cell_Number" ]
}
}
}],
as: "data"
}
},
])
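To see why the variable has to be computed from the conversation document, here is a minimal in-memory sketch in plain JavaScript of what the $lookup above does. It does not use MongoDB at all; the sample conversations and the helper name lookupAreaCode are assumptions made for illustration, and the field name cell_number is taken from the question.

```javascript
// In-memory imitation of the $lookup stage. This is NOT MongoDB API,
// only an illustration of which document each expression is evaluated on.
const areaCodes = [
  { code: "206", region: "WA", abbr: "WA" },
  { code: "220", region: "OH", abbr: "OH" },
  { code: "226", region: "Ontario", abbr: "ON" },
];

// Hypothetical sample conversations (not from the original post).
const chatConversations = [
  { cell_number: "2065551234" },
  { cell_number: "2265550000" },
  { cell_number: "9995550000" },
];

// lookupAreaCode is a hypothetical helper name.
function lookupAreaCode(conversation, codes) {
  // Equivalent of `let`: computed from the conversation document.
  const prefix = conversation.cell_number.substring(0, 3);
  // Equivalent of the inner $match: checked against each area code document.
  const matches = codes.filter((doc) => doc.code === prefix);
  // Equivalent of `as: "data"`: always an array, possibly empty.
  return { ...conversation, data: matches };
}

const joined = chatConversations.map((c) => lookupAreaCode(c, areaCodes));
console.log(JSON.stringify(joined, null, 2));
```

A conversation whose prefix has no matching area code still comes back, just with an empty data array, which mirrors how $lookup behaves for unmatched documents.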

Related

How to project single array field

starting from this document schema
{
"_id" : ObjectId("5a5cfde58c8a4a35a89726b4"),
"Names" : [
{ "Code" : "en", "Name" : "ITALY" },
{ "Code" : "it", "Name" : "ITALIA" }
],
"TenantID" : ObjectId("5a5cfde58c8a4a35a89726b2"),
...extra irrelevant fields
}
I need to get an object like this
{
"ID" : ObjectId("5a5cfde58c8a4a35a89726b4"),
"Name" : "ITALY"
}
Filtering by array's Code field (in the sample by 'en').
I wrote an aggregate query, this
db.Countries.aggregate([{$project: {_id:0, 'ID': '$_id', 'Names': {$filter: {input: '$Names', as: 'item', cond: {$eq: ['$$item.Code', 'en']}}}}},{$skip: 10},{$limit: 5}]);
which correctly returns only the documents with 'en' Names values, keeping only the matching elements of the sub-array.
Now I cannot find a way to return only the Name field value in a new 'Name' field.
What is the best-performing way to do this?
I'm currently trying this in the Mongo shell, but later I need to replicate this behavior with the .NET driver.
Using MongoDB 3.6
Thanks
You can try the aggregation query below.
$match to only consider documents where at least one array element matches the input criteria.
$filter the array on the input criteria and $arrayElemAt to project the single matched element.
$let to output the name from the matched object.
db.Countries.aggregate([
{"$match":{"Names.Code":"en"}},
{"$project":{
"_id":0,
"ID": "$_id",
"Name":{
"$let":{
"vars":{
"obj":{
"$arrayElemAt":[
{"$filter":{
"input":"$Names",
"as":"name",
"cond":{"$eq":["$$name.Code","en"]}
}},
0]
}
},
"in":"$$obj.Name"
}
}
}},
{"$skip": 10},
{"$limit": 5}
])
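The three steps of the pipeline above can be mirrored in plain JavaScript to check the logic without a database. This is only a sketch: the second sample document, the placeholder _id strings, and the helper name projectName are assumptions, not part of the original post.

```javascript
// In-memory imitation of $match + $filter/$arrayElemAt + $let. NOT MongoDB API.
// The _id values are placeholder strings and the second document is hypothetical.
const countries = [
  { _id: "id-1", Names: [{ Code: "en", Name: "ITALY" }, { Code: "it", Name: "ITALIA" }] },
  { _id: "id-2", Names: [{ Code: "it", Name: "FRANCIA" }] },
];

// projectName is a hypothetical helper name.
function projectName(docs, code) {
  return docs
    // $match: keep documents with at least one Names entry for the code.
    .filter((doc) => doc.Names.some((n) => n.Code === code))
    .map((doc) => {
      // $filter + $arrayElemAt 0: take the first matching entry.
      const obj = doc.Names.filter((n) => n.Code === code)[0];
      // $let ... in "$$obj.Name": expose only the Name of that entry.
      return { ID: doc._id, Name: obj.Name };
    });
}

const projected = projectName(countries, "en");
console.log(projected);
```

The initial filter plays the same role as the leading $match: without it, documents lacking an 'en' entry would reach the projection with no matched element.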

MongoDB: How to match on the elements of an array?

I have two collections as follows:
db.qnames.find()
{ "_id" : ObjectId("5a4da53f97a9ca769a15d49e"), "domain" : "mail.google.com", "tldOne" : "google.com", "clients" : 10, "date" : "2016-12-30" }
{ "_id" : ObjectId("5a4da55497a9ca769a15d49f"), "domain" : "mail.google.com", "tldOne" : "google.com", "clients" : 9, "date" : "2017-01-30" }
and
db.dropped.find()
{ "_id" : ObjectId("5a4da4ac97a9ca769a15d49c"), "domain" : "google.com", "dropDate" : "2017-01-01", "regStatus" : 1 }
I would like to join the two collections and choose the documents for which the 'dropDate' field (from the dropped collection) is larger than the 'date' field (from the qnames collection). So I used the following query:
db.dropped.aggregate( [{$lookup:{ from:"qnames", localField:"domain",foreignField:"tldOne",as:"droppedTraffic"}},
{$match: {"droppedTraffic":{$ne:[]} }},
{$unwind: "$droppedTraffic" } ,
{$match: {dropDate:{$gt:"$droppedTraffic.date"}}} ])
but this query does not filter the records where dropDate < date. Can anyone give me a clue as to why this happens?
There are two reasons why you are not getting the expected records.
First, the dates are stored as strings in your collections. Strings in a fixed YYYY-MM-DD format happen to sort in date order, but to compare them reliably as dates, convert the existing values with new ISODate("your existing date in the collection").
Second, even after converting both collections you need to change the aggregation query: the final $match treats "$droppedTraffic.date" as a literal string rather than a field reference, so it cannot compare two values from the same document.
Sample query to get the desired documents:
db.dropped.aggregate([
{$lookup: {
from:"qnames",
localField:"domain",
foreignField:"tldOne",
as:"droppedTraffic"}
},
{$project: {
_id:1, domain:1,
regStatus:1,
droppedTraffic: {
$filter: {
input: "$droppedTraffic",
as:"droppedTraffic",
cond:{ $gt: ["$$droppedTraffic.date", "$dropDate"]}
}
}
}}
])
The approach above uses $filter, which avoids the $unwind operation.
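As a rough illustration of the difference between comparing against a literal and comparing two fields, here is a small plain-JavaScript sketch using the sample documents from the question. It is not MongoDB code; it assumes the default binary string ordering, which JavaScript also uses for these ASCII strings, and the variable names are made up for the example.

```javascript
// Sample data copied from the question; dates are still plain strings.
const dropped = { domain: "google.com", dropDate: "2017-01-01", regStatus: 1 };
const qnames = [
  { domain: "mail.google.com", tldOne: "google.com", clients: 10, date: "2016-12-30" },
  { domain: "mail.google.com", tldOne: "google.com", clients: 9, date: "2017-01-30" },
];

// Roughly what {dropDate: {$gt: "$droppedTraffic.date"}} evaluates in a plain
// $match: the right-hand side is the literal text, not the field's value.
// A digit sorts after "$", so this is true for every date string.
const literalComparison = dropped.dropDate > "$droppedTraffic.date";

// What the $filter condition evaluates: one field against another field.
const trafficAfterDrop = qnames.filter(
  (q) => q.tldOne === dropped.domain && q.date > dropped.dropDate
);

console.log(literalComparison);  // true
console.log(trafficAfterDrop.map((q) => q.date));  // [ '2017-01-30' ]
```

Because the literal comparison is true for every document, the original $match let everything through, which matches the behaviour described in the question.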
You should use $redact to compare two fields of the same document. Following example should work:
db.dropped.aggregate( [
{$lookup:{ from:"qnames", localField:"domain",foreignField:"tldOne",as:"droppedTraffic"}},
{$match: {"droppedTraffic":{$ne:[]} }},
{$unwind: "$droppedTraffic" },
{
"$redact": {
"$cond": [
{ "$lte": [ "$dropDate", "$droppedTraffic.date" ] },
"$$KEEP",
"$$PRUNE"
]
}
}
])

(mongo) How could i get the documents that have a value in array along with size

I have a mongo collection with something like the below:
{
"_id" : ObjectId("59e013e83260c739f029ee21"),
"createdAt" : ISODate("2017-10-13T01:16:24.653+0000"),
"updatedAt" : ISODate("2017-11-11T17:13:52.956+0000"),
"age" : NumberInt(34),
"attributes" : [
{
"year" : "2017",
"contest" : [
{
"name" : "Category1",
"division" : "Department1"
},
{
"name" : "Category2",
"division" : "Department1"
}
]
},
{
"year" : "2016",
"contest" : [
{
"name" : "Category2",
"division" : "Department1"
}
]
},
{
"year" : "2015",
"contest" : [
{
"name" : "Category1",
"division" : "Department1"
}
]
}
],
"name" : {
"id" : NumberInt(9850214),
"first" : "john",
"last" : "afham"
}
}
Now, how can I get the number of documents that have a contest named Category1 more than once, more than twice, and so on?
I tried to use $size and $gt but couldn't get a correct result.
Assuming that a single contest will never contain the same name (e.g. "Category1") value more than once, here is what you can do.
The absence of any unwinds will result in improved performance in particular on big collections or data sets with loads of entries in your attributes arrays.
db.collection.aggregate([{
$project: {
"numberOfOccurrences": {
$size: { // count the number of matching attributes entries
$filter: { // drop all attributes entries whose contest array has no entry named "Category1"
input: "$attributes",
cond: { $in: [ "Category1", "$$this.contest.name" ] }
}
}
}
}
}, {
$match: { // filter the documents by that count
"numberOfOccurrences": {
$gt: 1 // put your desired threshold of matching entries here
}
}
}, {
$count: "numberOfDocuments" // count the number of matching documents
}])
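The counting logic of this pipeline can be checked on in-memory data with a few lines of plain JavaScript. This is a sketch only: the second sample person and the helper name countDocuments are assumptions, and the same uniqueness assumption as above applies (a name appears at most once per contest array).

```javascript
// In-memory imitation of $size + $filter, $match and $count. NOT MongoDB API.
const people = [
  {
    _id: 1,
    attributes: [
      { year: "2017", contest: [{ name: "Category1" }, { name: "Category2" }] },
      { year: "2016", contest: [{ name: "Category2" }] },
      { year: "2015", contest: [{ name: "Category1" }] },
    ],
  },
  // Hypothetical second document with a single Category1 entry.
  { _id: 2, attributes: [{ year: "2017", contest: [{ name: "Category1" }] }] },
];

// countDocuments is a hypothetical helper name.
function countDocuments(docs, contestName, threshold) {
  return docs
    // $project: number of attributes entries whose contest list contains the name.
    .map((doc) =>
      doc.attributes.filter((a) => a.contest.some((c) => c.name === contestName)).length
    )
    // $match with $gt, then $count.
    .filter((occurrences) => occurrences > threshold).length;
}

console.log(countDocuments(people, "Category1", 1)); // 1
```

Changing the threshold argument corresponds to changing the $gt value in the $match stage.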
Try this on for size.
db.foo.aggregate([
// Start with breaking down attributes:
{$unwind: "$attributes"}
// Next, extract only name = Category1 from the contest array. This will yield
// an array of 0 or 1 because I am assuming that contest names WITHIN
// the contest array are unique. If found and we get an array of 1, turn that
// into a single doc instead of an array of a single doc by taking arrayElemAt 0.
// Otherwise, "x" is not set into the doc AT ALL. All other vars in the doc
// will go away after $project; if you want to keep them, change this to
// $addFields:
,{$project: {x: {$arrayElemAt: [ {$filter: {
input: "$attributes.contest",
as: "z",
cond: {$eq: [ "$$z.name", "Category1" ]}
}}, 0 ]}
}}
// We split up attributes before, creating multiple docs with the same _id. We
// must now "recombine" these _id (OP said he wants # of docs with name).
// We now have to capture all the single "x" that we created above; docs without
// Category1 will have NO "x" and we don't want to include them in the count.
// Also, we KNOW that name can only be Category 1 but division could vary, so
// let's capture that in the $push in case we might want it:
,{$group: {_id: "$_id", x: {$push: "$x.division"}}}
// One more pass to compute length of array:
,{$addFields: {len: {$size: "$x"}} }
// And lastly, the filter for one time or two times or n times:
,{$match: {len: {$gt: 2} }}
]);
First, we flatten each document by unwinding the attributes and attributes.contest arrays. Then we group by the document's original _id and the contest name, counting the occurrences of each contest along the way. Finally, we filter the result.
db.person.aggregate([
{ $unwind: "$attributes" },
{ $unwind: "$attributes.contest" },
{$group: {
_id: {initial_id: "$_id", contest: "$attributes.contest.name"},
count: {$sum: 1}
}
},
{$match: {$and: [{"_id.contest": "Category1"}, {"count": {$gt: 1}}]}}]);

Compare a date of two elements

My problem is difficult to explain :
In my website I save every action of my visitors (view, click, buy etc).
I have a simple collection named "flow" where my data is registered
{
"_id" : ObjectId("534d4a9a37e4fbfc0bf20483"),
"profile" : ObjectId("534bebc32939ffd316a34641"),
"activities" : [
{
"id" : ObjectId("534bebc42939ffd316a3af62"),
"date" : ISODate("2013-12-13T22:39:45.808Z"),
"verb" : "like",
"product" : "5"
},
{
"id" : ObjectId("534bebc52939ffd316a3f480"),
"date" : ISODate("2013-12-20T19:19:10.098Z"),
"verb" : "view",
"product" : "6"
},
{
"id" : ObjectId("534bebc32939ffd316a3690f"),
"date" : ISODate("2014-01-01T07:11:44.902Z"),
"verb" : "buy",
"product" : "5"
},
{
"id" : ObjectId("534bebc42939ffd316a3741b"),
"date" : ISODate("2014-01-11T08:49:02.684Z"),
"verb" : "favorite",
"product" : "26"
}
]
}
I would like to aggregate this data to retrieve the number of people who performed one action (for example "view") and then another later in time (for example "buy"). To do that I need to compare the "date" values inside my "activities" array...
I tried to use the aggregation framework for this, but I do not see how to write the query.
This is my starting point:
db.flows.aggregate([
{ $project: { profile: 1, activities: 1, _id: 0 } },
{ $match: { $and: [{'activities.verb': 'view'}, {'activities.verb': 'buy'}] }}, //First verb + second verb
{ $unwind: '$activities' },
{ $match: { 'activities.verb': {$in:['view', 'buy']} } }, //First verb + second verb,
{
$group: {
_id: '$profile',
view: { $push: { $cond: [ { $eq: [ "$activities.verb", "view" ] } , "$activities.date", null ] } },
buy: { $push: { $cond: [ { $eq: [ "$activities.verb", "buy" ] } , "$activities.date", null ] } }
}
}
])
Maybe the format of my collection "flow" is not the best for what I want... If you have a better idea, don't hesitate to suggest it.
Thank you for your help !
Here is the aggregation that will give you the total number of buyers who viewed first and then bought (though not necessarily the same product that they viewed).
db.flow.aggregate(
{$match: {"activities.verb":{$all:["view","buy"]}}},
{$unwind :"$activities"},
{$match: {"activities.verb":{$in:["view","buy"]}}},
{$group: {
_id:"$_id",
firstViewed:{$min:{$cond:{
if:{$eq:["$activities.verb","view"]},
then : "$activities.date",
else : new Date(9999,0,1)
}}},
lastBought: {$max:{$cond:{
if:{$eq:["$activities.verb","buy"]},
then:"$activities.date",
else:new Date(1900,0,1)}
}}}
},
{$project: {viewedThenBought:{$cond:{
if:{$gt:["$lastBought","$firstViewed"]},
then:1,
else:0
}}}},
{$group:{_id:null,totalViewedThenBought:{$sum:"$viewedThenBought"}}}
)
Here you first pass through the pipeline only the documents that have all the "verbs" you are interested in. When you group the first time, you want to use the earliest "view" and the last "buy" and the next project compares them to see if they viewed before they bought.
The last step gives you the count of all the people who satisfied your criteria.
Be careful to leave out all $project stages that don't actually compute any new fields (like your very first $project). The aggregation framework works out which fields the later stages use and does not pass the unused ones through, so there is never a need to $project just to "eliminate" fields; that happens automatically.
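The "earliest view, latest buy" comparison can be tried out on in-memory data with plain JavaScript. The sketch below is not MongoDB code; the three sample profiles and the helper name countViewedThenBought are assumptions chosen to cover the view-then-buy, buy-then-view, and view-only cases.

```javascript
// In-memory imitation of the pipeline above. NOT MongoDB API.
// All three sample documents are hypothetical.
const flows = [
  { profile: "p1", activities: [
    { verb: "view", date: new Date("2013-12-20T19:19:10.098Z") },
    { verb: "buy", date: new Date("2014-01-01T07:11:44.902Z") },
  ] },
  { profile: "p2", activities: [
    { verb: "buy", date: new Date("2014-01-01T00:00:00.000Z") },
    { verb: "view", date: new Date("2014-02-01T00:00:00.000Z") },
  ] },
  { profile: "p3", activities: [
    { verb: "view", date: new Date("2014-03-01T00:00:00.000Z") },
  ] },
];

// countViewedThenBought is a hypothetical helper name.
function countViewedThenBought(docs) {
  let total = 0;
  for (const doc of docs) {
    const views = doc.activities.filter((a) => a.verb === "view").map((a) => a.date.getTime());
    const buys = doc.activities.filter((a) => a.verb === "buy").map((a) => a.date.getTime());
    // First $match: both verbs must be present.
    if (views.length === 0 || buys.length === 0) continue;
    const firstViewed = Math.min(...views); // $min over the view dates
    const lastBought = Math.max(...buys);   // $max over the buy dates
    // $project + final $group: count documents where the buy came later.
    if (lastBought > firstViewed) total += 1;
  }
  return total;
}

console.log(countViewedThenBought(flows)); // 1
```

Filtering the dates per verb before taking the minimum and maximum serves the same purpose as the far-future and far-past placeholder dates in the $cond expressions.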
For your query:
I would like to aggregate these data to retrieve the number of people who made an action
Try this:
db.flows.aggregate([
// De-normalize the array into individual documents
{"$unwind" : "$activities"},
// Match for the verbs you are interested in
{"$match" : {"activities.verb":{$in:["buy", "view"]}}},
// Group by verb to get the count
{"$group" : {_id:"$activities.verb", count:{$sum:1}}}
])
The above query would produce an output like:
{
"result" : [
{
"_id" : "buy",
"count" : 1
},
{
"_id" : "view",
"count" : 1
}
],
"ok" : 1
}
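Note: Conditions on different fields in a single query document are combined with AND implicitly, so an explicit $and is usually unnecessary. It is needed in your query ({ $match: { $and: [{'activities.verb': 'view'}, {'activities.verb': 'buy'}] }}) only because both conditions are on the same field; {'activities.verb': {$all: ['view', 'buy']}} is a shorter equivalent. The $or operator is required only when you need a logical OR.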
If you want to use the date in the aggregation query to do queries like how many "views by day", etc.. the Date Aggregation Operators will come in handy.
I see where you are going with this and I think you are basically on the right track. So here it is more or less unaltered (apart from formatting preferences), with a few tweaks at the end:
db.flows.aggregate([
// Try to $match "first" always to make sure you can get an index
{ "$match": {
"$and": [
{"activities.verb": "view"},
{"activities.verb": "buy"}
]
}},
// Don't worry, the optimizer "sees" this and will sort of "blend" it
// with the first stage.
{ "$project": {
"profile": 1,
"activities": 1,
"_id": 0
}},
{ "$unwind": "$activities" },
{ "$match": {
"activities.verb": { "$in":["view", "buy"] }
}},
{ "$group": {
"_id": "$profile",
"view": { "$min": { "$cond": [
{ "$eq": [ "$activities.verb", "view" ] },
"$activities.date",
null
]}},
"buy": { "$max": { "$cond": [
{ "$eq": [ "$activities.verb", "buy" ] },
"$activities.date",
null
]}}
}},
{ "$project": {
"viewFirst": { "$lt": [ "$view", "$buy" ] }
}}
])
So essentially the $min and $max operators should be self-explanatory in this context: you are looking for the "first" view to pair with the "last" purchase. To my mind it would make more sense to match these by product as well (hint: "grouping"), but I'll leave that part up to you.
The other advantage here is that the null values are ignored by $min and $max whenever there is an actual date for the "verb". Otherwise the value stays null, and this turns out to be okay.
That is because the next thing you do is $project to "compare" the values and ask the question "Did the 'view' happen before the 'buy'?", which is a logical evaluation using the "less than" $lt operator.
As for the schema itself: if you are storing a lot of these "events", then you are probably better off flattening things out into separate documents and finding some way to mark each with the same "session" identifier, if that is separate from "profile".
Getting away from large arrays (which this seems to lead to) is likely going to help performance and, with care, makes little difference to the aggregation process.

Mongo order by length of array

Let's say I have mongo documents like this:
Question 1
{
answers:[
{content: 'answer1'},
{content: '2nd answer'}
]
}
Question 2
{
answers:[
{content: 'answer1'},
{content: '2nd answer'},
{content: 'The third answer'}
]
}
Is there a way to order the collection by size of answers?
After a little research I saw suggestions to add another field that would contain the number of answers and use it as a reference, but maybe there is a native way to do it?
I thought you might be able to use $size, but that's only to find arrays of a certain size, not ordering.
From the mongo documentation:
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24size
You cannot use $size to find a range of sizes (for example: arrays with more than 1 element). If you need to query for a range, create an extra size field that you increment when you add elements. Indexes cannot be used for the $size portion of a query, although if other query expressions are included indexes may be used to search for matches on that portion of the query expression.
Looks like you can probably fairly easily do this with the new aggregation framework, edit: which isn't out yet.
http://www.mongodb.org/display/DOCS/Aggregation+Framework
Update: Now that the Aggregation Framework is out...
> db.test.aggregate([
{$unwind: "$answers"},
{$group: {_id:"$_id", answers: {$push:"$answers"}, size: {$sum:1}}},
{$sort:{size:1}}]);
{
"result" : [
{
"_id" : ObjectId("5053b4547d820880c3469365"),
"answers" : [
{
"content" : "answer1"
},
{
"content" : "2nd answer"
}
],
"size" : 2
},
{
"_id" : ObjectId("5053b46d7d820880c3469366"),
"answers" : [
{
"content" : "answer1"
},
{
"content" : "2nd answer"
},
{
"content" : "The third answer"
}
],
"size" : 3
}
],
"ok" : 1
}
I use $project for this:
db.test.aggregate([
{
$project : { answers_count: {$size: { "$ifNull": [ "$answers", [] ] } } }
},
{
$sort: {"answers_count":1}
}
])
This also includes documents whose answers field is empty or missing.
It has a disadvantage (or sometimes an advantage), though: you have to list every field you need explicitly in the $project stage.
You can use the MongoDB aggregation stage $addFields, which adds an extra field to store the count, followed by a $sort stage.
db.test.aggregate([
{
$addFields: { answers_count: {$size: { "$ifNull": [ "$answers", [] ] } } }
},
{
$sort: {"answers_count":1}
}
])
You can use the $size operator to order by array length.
db.getCollection('test').aggregate([
{$project: { "answers": 1, "answer_count": { $size: "$answers" } }},
{$sort: {"answer_count": -1}}])
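To compare the guarded and unguarded variants, here is a small plain-JavaScript sketch of what the $ifNull plus $size plus $sort combination from the answers above computes. It is not MongoDB code; the sample documents (including one without an answers field) and the helper name sortByAnswerCount are assumptions for illustration.

```javascript
// In-memory imitation of $addFields with $size/$ifNull, followed by $sort.
// NOT MongoDB API. Sample documents are hypothetical.
const questions = [
  { _id: 1, answers: [{ content: "answer1" }, { content: "2nd answer" }, { content: "The third answer" }] },
  { _id: 2 }, // no answers field at all
  { _id: 3, answers: [{ content: "answer1" }, { content: "2nd answer" }] },
];

// sortByAnswerCount is a hypothetical helper name.
function sortByAnswerCount(docs) {
  return docs
    .map((doc) => ({
      ...doc, // $addFields keeps the existing fields
      // $ifNull: treat a missing or null answers field as an empty array.
      answers_count: (doc.answers == null ? [] : doc.answers).length,
    }))
    // $sort: { answers_count: 1 } means ascending order.
    .sort((a, b) => a.answers_count - b.answers_count);
}

const sorted = sortByAnswerCount(questions);
console.log(sorted.map((d) => [d._id, d.answers_count]));
```

The null check is the important part: in MongoDB, $size raises an error when its argument is not an array, so the variant without $ifNull is only safe if every document is known to have an answers array.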