MongoDB Aggregate $project - mongodb

I store our web server logs in MongoDB and the schema looks something like this:
[
{
"_id" : 12345,
"url" : "http://www.mydomain.com/xyz/abc.html",
....
},
....
]
I am trying to use the $project operator to reshape this schema a little bit before I start passing my collection through an aggregation pipeline. Basically, I need to add a new field called "type" that will later be used to perform group-by. The logic for the new field is pretty simple.
if "url" contains "pattern_A" then set "type" = "sales lead";
else if "url" contains "pattern_B" then set "type" = "existing client";
...
I'm thinking it would have to be something like this:
db.weblog.aggregate(
{
$project : {
type : { /* how to implement the logic??? */ }
}
}
);
I know how to do this using map-reduce (by setting the "keyf" attribute to a custom JS function that implements the above logic) but am now trying to use the new aggregation framework to do this. I tried to implement the logic using the expression operators but so far couldn't get it to work. Any help/suggestion would be greatly appreciated!

I am sharing my "solution" in case others run into the same need.
After researching for a couple of weeks, and as @asya-kamsky suggested in a comment, I decided to add a computed field to my original MongoDB schema. It's not ideal, because whenever the logic for the computed field changes I have to run a bulk update over every document in my collection, but it was either that or rewrite my code to use MapReduce. I chose the former for now. Looking at the MongoDB Jira board, it appears that many people have asked for more operators to be supported in $project, and I certainly hope the MongoDB dev team gets around to adding them sooner rather than later:
Operator for splitting string based on a separator.
New projection operator $elemMatch
Allow $slice operator in $project
add a $inOrder operator to $project

You need to use a combination of several operators and expressions.
First, the $cond operator in $project lets you implement if-then-else logic.
$cond takes an array of three elements: the first is a boolean expression, and the second and third are the candidate values for the field. If the boolean expression is true, the second element is used as the value; otherwise the third is.
You can nest these, so that the third element is itself a $cond expression, to get if-then-else-if-then chains.
String manipulation is a little awkward, but you do have $substr available.
If you post some examples of what exactly you tried, I may be able to spot why it didn't work.
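As a hedged illustration of the nesting described above, here is a minimal sketch of what such a pipeline could look like. It assumes MongoDB 4.2 or later, where the $regexMatch expression is available for the "contains" test (that operator is not mentioned in the answer above), and it uses "other" as a made-up fallback value. The plain-JavaScript classifyUrl function is only a reference for the same if/else-if logic.

```javascript
// Nested $cond: [ <boolean expression>, <value if true>, <value if false> ].
// Assumption: MongoDB 4.2+ for $regexMatch; "other" is a hypothetical fallback.
const pipeline = [
  {
    $project: {
      url: 1,
      type: {
        $cond: [
          { $regexMatch: { input: "$url", regex: "pattern_A" } },
          "sales lead",
          {
            // The "else" branch is itself a $cond, giving else-if behaviour.
            $cond: [
              { $regexMatch: { input: "$url", regex: "pattern_B" } },
              "existing client",
              "other"
            ]
          }
        ]
      }
    }
  }
];

// Plain-JavaScript reference for the same decision logic.
function classifyUrl(url) {
  if (url.includes("pattern_A")) return "sales lead";
  if (url.includes("pattern_B")) return "existing client";
  return "other";
}

// In the mongo shell this would be run as: db.weblog.aggregate(pipeline)
```

The computed type field can then be used by a later $group stage in the same pipeline.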

Related

MongoDB conditional query on nested document array

Hi, I'm trying to write a conditional query on a nested document array.
I've been reading the documentation for days and couldn't figure out how to make this work.
The DB looks like this:
[
{
"id":1,
"team":"team1",
"players":[
{
"name":"Mario",
"substitutes":[
"Luigi",
"Yoshi"
]
},
{
"name":"Wario",
"substitutes":[
]
}
]
},
{
"id":2,
"team":"team2",
"players":[
{
"name":"Bowser",
"substitutes":[
"Toad",
"Mario"
]
},
{
"name":"Wario",
"substitutes":[
]
}
]
}
]
It's hard for me to put into English, but what I'm trying to do is
find teams that include all of the queried players.
Some of the objects in the players array have substitutes.
For each object in the players array, if a queried player is not the main player ("players.name"), I want to check whether that player is one of the substitutes ("players.substitutes").
Team.find({players:{$in:[ 'Mario', 'Wario' ]}}) (mongoose query)
this will give me an array with 'team1'.
but what I want to get is both teams because 'Mario' is one of the substitutes for 'Bowser'(team2).
I haven't managed to write a working query. I've been trying to avoid $where, since the official MongoDB docs say:
AGGREGATION ALTERNATIVES PREFERRED
Starting in MongoDB 3.6, the $expr operator allows the use of
aggregation expressions within the query language. And, starting in
MongoDB 4.4, the $function and $accumulator allows users to define
custom aggregation expressions in JavaScript if the provided pipeline
operators cannot fulfill your application’s needs.
Given the available aggregation operators:
The use of $expr with aggregation operators that do not use JavaScript
(i.e. non-$function and non-$accumulator operators) is faster than
$where because it does not execute JavaScript and should be preferred
if possible. However, if you must create custom expressions, $function
is preferred over $where.
BUT if it could be easily written with $where operator then it's totally fine.
Any suggestions or ideas that lead to any further would be highly appreciated.
Firstly, your query is incorrect, and it is not obvious what exactly your filter criteria are. So I am giving a few suggestions:
If you want all documents in which a player's name matches your criteria (which returns both documents):
db.Team.find({"players.name":{$in:[ 'Mario', 'Wario' ]}}).pretty()
If you want all documents that have any of the provided player names in a substitutes array (which returns only one, because team1 has no substitutes named Mario or Wario):
db.Team.find({"players.substitutes":{$in:[ 'Mario', 'Wario' ]}}).pretty()
If the names being looked for could be present in either name or substitutes:
db.Team.find({ $or: [{"players.substitutes":{$in:[ 'Mario', 'Wario' ]}}, {"players.name":{$in:[ 'Mario', 'Wario' ]}}] }).pretty()
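If the intent is that every queried player must appear somewhere in the team, either as a main player or as a substitute, one possible reading is to build one $or clause per player and combine the clauses with $and. The sketch below builds that filter and checks the idea against the sample data with a small in-memory stand-in. buildAllPlayersFilter and teamHasPlayer are hypothetical helper names, not part of MongoDB or Mongoose.

```javascript
// Build { $and: [ { $or: [name match, substitute match] }, ... ] },
// with one $or clause per queried player.
function buildAllPlayersFilter(names) {
  return {
    $and: names.map((n) => ({
      $or: [{ "players.name": n }, { "players.substitutes": n }]
    }))
  };
}

// In-memory stand-in for what one $or clause checks on a single team document.
function teamHasPlayer(team, n) {
  return team.players.some(
    (p) => p.name === n || p.substitutes.includes(n)
  );
}

// Sample data from the question.
const teams = [
  { id: 1, team: "team1", players: [
    { name: "Mario", substitutes: ["Luigi", "Yoshi"] },
    { name: "Wario", substitutes: [] }
  ] },
  { id: 2, team: "team2", players: [
    { name: "Bowser", substitutes: ["Toad", "Mario"] },
    { name: "Wario", substitutes: [] }
  ] }
];

const wanted = ["Mario", "Wario"];
const filter = buildAllPlayersFilter(wanted);

// Teams in which every wanted player appears as a name or a substitute.
const matched = teams
  .filter((t) => wanted.every((n) => teamHasPlayer(t, n)))
  .map((t) => t.team);

// With Mongoose the filter would be passed as: Team.find(filter)
```

Unlike the $or/$in query above, which matches a team as soon as any one of the names is found, this form requires each name to be found.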

$elemMatch Projection on a Simple Array

Imagine a collection of movies (stored in a MongoDB collection), with each one looking something like this:
{
_id: 123456,
name: 'Blade Runner',
buyers: [1123, 1237, 1093, 2910]
}
I want to get a list of movies, each one with an indication whether buyer 2910 (for example) bought it.
Any ideas?
I know I can change [1123, 1237, 1093, 2910] to [{id:1123}, {id:1237}, {id:1093}, {id:2910}] to allow the use of $elemMatch in the projection, but would prefer not to touch the structure.
I also know I can perhaps use the $unwind operator (within the aggregation framework), but that seems very wasteful in cases where buyer has thousands of values (basically exploding each document into thousands of copies in memory before matching).
Any other ideas? Am I missing something really simple here?
You can use the $setIsSubset aggregation operator to do this:
var buyer = 2910;
db.movies.aggregate(
{$project: {
name: 1,
buyers: 1,
boughtIt: {$setIsSubset: [[buyer], '$buyers']}
}}
)
That will give you all movie docs with a boughtIt field added that indicates whether buyer is contained in the movie's buyers array.
This operator was added in MongoDB 2.6.
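On MongoDB 3.4 or later, the aggregation $in operator can express the same membership test a little more directly. The sketch below builds such a pipeline. buildBoughtItPipeline is a hypothetical helper name, and the in-memory boughtIt function only mirrors what the expression computes for a single document.

```javascript
// Assumption: MongoDB 3.4+, where the aggregation expression
// { $in: [ <value>, <array expression> ] } is available.
function buildBoughtItPipeline(buyer) {
  return [
    {
      $project: {
        name: 1,
        buyers: 1,
        boughtIt: { $in: [buyer, "$buyers"] }
      }
    }
  ];
}

// In-memory mirror of the membership test for one movie document.
function boughtIt(movie, buyer) {
  return movie.buyers.includes(buyer);
}

// Sample document from the question.
const movie = {
  _id: 123456,
  name: "Blade Runner",
  buyers: [1123, 1237, 1093, 2910]
};

// In the shell: db.movies.aggregate(buildBoughtItPipeline(2910))
```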
Not really sure of your intent here, but you don't need to change the structure just to use $elemMatch in projection. You can just issue like this:
db.movies.find({},{ "buyers": { "$elemMatch": { "$eq": 2910 } } })
That would filter the returned array elements to just the "buyer" that was indicated, or nothing where this was not present. It is true to point out that the $eq operator used here is not actually documented, but it does exist. So it may not be immediately clear that you can construct a condition that way.
It seems a little wasteful to me though as you are returning "everything" regardless of whether the "buyer" is present or not. So a "query" seems more logical than a projection:
db.movies.find({ "buyers": 2910 })
And optionally keeping only the matched element:
db.movies.find({ "buyers": 2910 },{ "buyers.$": 1})
Set operators in the aggregation framework give you more options with $project, which can do more to alter the document. But if you just want to know if someone "bought" the item, then a "query" seems to be the logical and fastest way to do so.

How to compare all records of two collections in mongodb using mapreduce?

I have a use case in which I want to compare each record of two collections in MongoDB and then find the mismatched fields of every record.
Let us take an example, in collection1 I have one record as {id : 1, name : "bks"}
and in collection2 I have a record as {id : 1, name : "abc"}
When I compare the above two records, which share the same key, name is a mismatched field because its value differs.
I am thinking of achieving this with mapReduce in MongoDB, but I am having problems accessing a collection by name in the map function. When I tried to compare in the map function, I got this error: "errmsg" : "exception: ReferenceError: db is not defined near '
Can anyone give me some thoughts on how to compare records using mapreduce?
It might have helped you to read the documentation:
When upgrading to MongoDB 2.4, you will need to refactor your code if your map-reduce operations, group commands, or $where operator expressions include any global shell functions or properties that are no longer available, such as db.
So from your error fragment, you appear to be referencing db in order to access another collection. You cannot do that.
If indeed you are intending to "compare" items in one collection to those in another, then there is no approach other than looping code:
db.collection.find().forEach(function(doc) {
var another = db.anothercollection.findOne({ "_id": doc._id });
// Code to compare
})
There is simply no concept of "joins" as such available to MongoDB, and operations such as mapReduce, aggregate and others strictly work with one collection only.
The exception is db.eval(), but as per all of the strict warnings in the documentation, this is almost always a very bad idea.
Live with your comparison in looping code.
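The "Code to compare" step in the loop above could be filled in along these lines. This is a minimal sketch under stated assumptions: mismatchedFields is a hypothetical helper name, and values are compared by their JSON form, which is enough for flat documents of strings and numbers like the ones in the question.

```javascript
// Return the names of fields whose values differ between two documents.
// Fields present in only one of the documents count as mismatches.
// Assumption: comparing JSON.stringify output is adequate for the data.
function mismatchedFields(a, b) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  return [...keys].filter(
    (k) => JSON.stringify(a[k]) !== JSON.stringify(b[k])
  );
}

// The two sample records from the question.
const left = { id: 1, name: "bks" };
const right = { id: 1, name: "abc" };
const diff = mismatchedFields(left, right);
```

Inside the forEach callback, the helper would be called as mismatchedFields(doc, another) after checking that another is not null.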

Can you match sub-fields with $all in Mongo?

I have a collection of documents, where each document looks like this:
{'name' : 'John', 'locations' :
[
{'place' : 'Paris', 'been' : true},
{'place' : 'Moscow', 'been' : false},
{'place' : 'Berlin', 'been' : true}
]
}
Where the locations array could have any length.
I want to match documents where the been field is true for all elements in the locations array. Looking at the documentation it looks like I should use $and somehow but I'm not sure if it works with sub-fields.
There are several options:
use $ne: db.destinations.find({"locations.been":{$ne:false}})
change your business logic to precompute that value before saving the document. Otherwise, this search must look through all records and then all places. This value could be indexed.
use the $where operator, but, understand the performance implications. It may require a full table scan. In this case, it would.
write a map-reduce function with the filter logic and only emit those that are valid. You'd need to incrementally update it per the docs.
write a query using the aggregation framework. There are a lot of good examples here. Although, like other solutions, this could end up looping through the entire collection.
I think it's impossible to do with standard MongoDB operators like $elemMatch or $all. The only possible way is to write a custom JS query:
db.test.find("return this.locations.every(function(loc){return loc.been});")
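The predicate inside that $where string can be tried outside the database first. Below is a minimal sketch, with allBeen as a hypothetical name and "Jane" and "Sam" as made-up documents added next to the "John" example from the question. Note that every() returns true for an empty locations array, so such documents would match as well.

```javascript
// Same check as the $where body: true only if every element of
// locations has a truthy `been` value.
function allBeen(doc) {
  return doc.locations.every((loc) => loc.been);
}

const docs = [
  { name: "John", locations: [
    { place: "Paris", been: true },
    { place: "Moscow", been: false },
    { place: "Berlin", been: true }
  ] },
  // Hypothetical documents, for illustration only.
  { name: "Jane", locations: [{ place: "Paris", been: true }] },
  { name: "Sam", locations: [] }
];

// Names of the documents that pass the predicate.
const names = docs.filter(allBeen).map((d) => d.name);
```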

In MongoDB, can you index a field to find objects that don't have a value in an array?

I know that multikey indexes allow you to efficiently find objects that have an array as a field, where a particular value is present in that array.
For example, you could store an object:
{
"ar":["book","cat"]
}
And then, provided the "ar" field is indexed, you could say:
db.blah.find({"ar":"cat"})
And it will efficiently find the above object.
However, would something like this work:
db.blah.find({"ar":{$not : "cat"}})
Here I'd like to find all objects where the "ar" array does not contain a "cat". Would this query work, and if it works, would it be efficient? (ie. would it use the index on the "ar" field?)
If you take a look at this MongoDB server issue, the answer is "it can use the index".
However, when using the $not operator, it's often not very efficient to use the index. If 'cat' appears in 2% of the entries, then you still have to read through 98% of the data. At that point, you might as well just read the whole data set one entry at a time.
The $not operator is not used in the way you imply; it is a meta operator that is used only to negate the check of another operator. I think you actually mean to use $ne:
db.blah.find({ "ar" : { $ne : "cat"}})
If you do that and you have an index on "ar", then because it is a negative match you are going to have to scan just about all of the documents in the index to check each one. That is actually less efficient than scanning the table alone, because you have to do the index load/scan and then pull all the data to return results too.
If this is important enough and frequent enough to want to avoid, then why not add a simple field that is true/false (or 0/1 in my example) to record whether the "cat" value is present? Here I am adding such a field to an existing data set; in the future I would suggest setting it when a document is created or modified, since avoiding batch operations is usually a good idea:
db.blah.update({ "ar" : "cat"}, { $set : {"cat_test" : 1 } }, true, true)
db.blah.update({ "ar" : {$ne : "cat"}}, { $set : {"cat_test" : 0 } }, true, true)
db.blah.ensureIndex({ "cat_test" : 1 })
Now you can just run something like:
db.blah.find({ "cat_test" : 0 })
That will make efficient use of an index. Whether or not this is worth it will depend on your data usage and your model, of course.
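Setting the flag at creation or modification time, as suggested above, could be done in application code before the document is written. A minimal sketch, assuming a hypothetical helper named withCatFlag that is called on each document just before it is inserted or saved:

```javascript
// Return a copy of the document with cat_test derived from "ar":
// 1 if the array contains "cat", 0 otherwise (including a missing array).
function withCatFlag(doc) {
  const hasCat = Array.isArray(doc.ar) && doc.ar.includes("cat");
  return { ...doc, cat_test: hasCat ? 1 : 0 };
}

const withCat = withCatFlag({ ar: ["book", "cat"] });
const withoutCat = withCatFlag({ ar: ["book"] });
```

Keeping the flag in step with the array on every write avoids having to rerun the two batch updates later.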