Combine MongoDB aggregate function with $in or $nin

I'm trying to query data with a MongoDB aggregation that combines $match with the $nin operator:
coll = mdb.get_collection_by_name('Client')
query = [
{"$group": {"_id": {"Domain": "$Domain"}, "count":{"$sum":1}}},
{"$match": { "UserName": { "$nin": excluded_users}}},
{"$sort": { "count": -1 } }
]
res = list(coll.aggregate(query))
print(res)
It seems that MongoDB ignores my $match line and also counts records that contain excluded users. I searched for similar examples but didn't find queries that combine $match with $in or $nin. Is that supported?
Am I doing something wrong? What is the best way to combine aggregate queries with $nin or $in operators?
Thanks

A $group stage removes all fields that are not explicitly stated. That means the documents that arrive in your $match stage only have the fields _id and count. There is no longer a UserName field you could match on. If you want to filter by UserName, you need to make sure the UserName values are also part of the documents generated by $group.
Because $group squashes multiple documents into one, you need to specify what to do when the grouped documents have different values for UserName. Options are $last/$first to use the value of the last/first aggregated document (warning: without an explicit $sort before $group, the order is unpredictable), $addToSet to create an array of all unique values, or $push to create an array of all values including duplicates.
But chances are that you want to filter out the documents of the unwanted users before you group them, so a better solution might be to move $match before $group.
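As a minimal sketch of that reordering, the pipeline from the question could be built like this. The function name build_domain_count_pipeline is made up for illustration, and coll is assumed to be the same collection object the question obtains from mdb.get_collection_by_name('Client').

```python
def build_domain_count_pipeline(excluded_users):
    """Count documents per Domain, skipping documents of excluded users."""
    return [
        # Filter first, while UserName still exists on every document.
        {"$match": {"UserName": {"$nin": list(excluded_users)}}},
        # Group the remaining documents by Domain and count them.
        {"$group": {"_id": {"Domain": "$Domain"}, "count": {"$sum": 1}}},
        # Most frequent domains first.
        {"$sort": {"count": -1}},
    ]


# Hypothetical usage with the collection from the question:
# res = list(coll.aggregate(build_domain_count_pipeline(excluded_users)))
```

Placing $match first also lets it act on the original documents, so it may benefit from an index on UserName if one exists.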

Related

Update MongoDB collection, create new field with value of existing field, but field is an array

I am trying to create a new field, and set its value to that of an existing array object that resides in the same document.
I have tried 2 approaches:
db.collection.aggregate( [ { $addFields: { "newField": "$oldField"} } ] )
This works great, but only updates 20 documents, not all documents in the collection.
db.collection.update(
{},
{ $set: {"newField": "$oldField"} },
false,
true
)
This updates all documents in the collection, but sets them all to the string "$oldField", and not the value of the object oldField.
How can I update all documents in my collection, adding a new field and setting its value to that of an existing field, which is an array?
Thank you!
Aggregate doesn't change the database unless you use a stage like $out or $merge as the final stage. Aggregate returns the data to the client, not to the database.
Updates can be done in 2 ways:
the older update operators (they are simple and fast)
a pipeline update (powerful, can use all aggregation operators, but not all aggregation stages)
In your case you need to refer to a field, so you need a pipeline update.
It's the same as the aggregation, except that the pipeline is passed as an argument to the update method.
Query
(you can use updateMany instead of multi:true)
db.collection.update({},
[{"$addFields": {"newField": "$oldField"}}],
{"multi": true})
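For readers using pymongo, the same pipeline update might look roughly like the sketch below. The helper name copy_field_update is invented for this example, coll is assumed to be a pymongo Collection, and pipeline-style updates need MongoDB 4.2 or newer.

```python
def copy_field_update(src, dst):
    """Return (filter, update) arguments for a pipeline-style update."""
    # The update is a list (a pipeline), so "$<src>" is evaluated as a
    # field path instead of being stored as a literal string.
    return {}, [{"$set": {dst: "$" + src}}]


# Hypothetical usage, assuming coll is a pymongo Collection:
# flt, upd = copy_field_update("oldField", "newField")
# result = coll.update_many(flt, upd)
# print(result.modified_count)
```

Inside an update pipeline, $set is an alias for $addFields, so either spelling should behave the same here.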

Can an index on a subfield cover queries on projections of that field?

Imagine you have a schema like:
[{
name: "Bob",
naps: [{
time: 2019-05-01T15:35:00,
location: "sofa"
}, ...]
}, ...
]
So lots of people, each with a few dozen naps. You want to find out 'what days do people take the most naps?', so you index naps.time, and then query with:
aggregate([
{$unwind: "$naps"},
// $day is not an aggregation operator; $dayOfWeek is assumed here
{$group: { _id: {$dayOfWeek: "$naps.time"}, napsOnDay: {"$sum": 1 } } }
])
But when doing explain(), mongo tells me no index was used in this query, when clearly the index on the time Date field could have been. Why is this? How can I get mongo to use the index for the more optimal query?
Indexes store pointers to actual documents, and can only be used when working with a material document (i.e. the document that is actually stored on disk).
$match and $sort do not mutate the actual documents, and thus indexes can be used in these stages.
In contrast, $unwind, $group, and any other stage that changes the document representation basically lose the connection between the index and the material documents.
Additionally, when those stages are processed without a preceding $match, you're basically saying that you want to process the whole collection. There is no point in using the index if you want to process the whole collection.
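If only part of the data is of interest, a leading $match gives the planner a chance to use the naps.time index. The sketch below assumes a date range is an acceptable restriction and that "day" means day of the week ($dayOfWeek). Neither assumption comes from the question, and build_naps_per_day_pipeline is a made-up name.

```python
from datetime import datetime


def build_naps_per_day_pipeline(start, end):
    """Count naps per day of week for naps taken in [start, end)."""
    in_range = {"$gte": start, "$lt": end}
    return [
        # Runs against stored documents, so the index on naps.time may be
        # used to select only people with at least one nap in the range.
        {"$match": {"naps": {"$elemMatch": {"time": in_range}}}},
        {"$unwind": "$naps"},
        # After unwinding, drop the naps that fall outside the range.
        {"$match": {"naps.time": in_range}},
        {"$group": {"_id": {"$dayOfWeek": "$naps.time"},
                    "napsOnDay": {"$sum": 1}}},
        {"$sort": {"napsOnDay": -1}},
    ]


# Hypothetical usage, assuming coll is a pymongo Collection:
# pipeline = build_naps_per_day_pipeline(datetime(2019, 5, 1), datetime(2019, 6, 1))
# print(list(coll.aggregate(pipeline)))
```

Without such a restriction the whole collection has to be read anyway, which is the case the answer describes.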

MongoDB: How To Save Returned Results To Another Collection?

Consider the following:
I have a MongoDB collection named C_a. It contains a very large number of documents (e.g., more than 50,000,000).
For the sake of simplicity let's assume that each document has the following schema:
{
"username" : "Aventinus",
"text": "I love StackOverflow!",
"tags": [
"programming",
"mongodb"
]
}
Using text index I can return all documents which contain the keyword StackOverflow like this:
db.C_a.find({$text:{$search:"StackOverflow"}})
My question is the following:
Considering that the query above may return hundreds of thousands of documents, what is the easiest/fastest way to directly save the returned results into another collection named C_b?
Note: This post explains how to use aggregate to find exact matches (i.e., specific dates). I'm interested in using Text Index to save all the posts which include a specific keyword.
The referenced answer is correct. The example query from that answer can be updated to use your criteria:
db.C_a.aggregate([
{$match: {$text: {$search:"StackOverflow"}}},
{$out:"C_b"}
]);
From the MongoDB documentation for $text:
If using the $text operator in aggregation, the following restrictions also apply.
The $match stage that includes a $text must be the first stage in the pipeline.
A text operator can only occur once in the stage.
The text operator expression cannot appear in $or or $not expressions.
The text search, by default, does not return the matching documents in order of matching scores. Use the $meta aggregation expression in the $sort stage.
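The same pipeline can be expressed from Python. This is only a sketch: build_text_copy_pipeline is a hypothetical helper, and db is assumed to be a pymongo Database that holds C_a with a text index on it.

```python
def build_text_copy_pipeline(keyword, target):
    """Pipeline that writes all text-search matches into another collection."""
    return [
        # $text has to be in the first stage of the pipeline.
        {"$match": {"$text": {"$search": keyword}}},
        # $out has to be the last stage; it writes the results to `target`.
        {"$out": target},
    ]


# Hypothetical usage, assuming db is a pymongo Database:
# db["C_a"].aggregate(build_text_copy_pipeline("StackOverflow", "C_b"))
```

Note that $out replaces the target collection if it already exists, so C_b should not contain data you want to keep.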

How to project in MongoDB after sort?

In a find operation, fields can be excluded, but what if I want to do a find, then a sort, and only after that the projection? Do you know any trick or operation for it?
doc: fields {Object}, the fields to return in the query. Object of fields to include or exclude (not both), {‘a’:1}
You can run a usual find query with conditions, a projection, and a sort. I think you want to sort on a field that you don't want to project. Don't worry about that: you can sort on a field even if it is not projected.
What you cannot do is mix inclusion and exclusion in one projection (apart from _id). If you include one field with 1 and explicitly set the sort field to 0, the find query is rejected.
//This query will work
db.collection.find(
{_id:'someId'},
{'someField':1})
.sort({'someOtherField':1})
//This query won't work: the projection mixes inclusion (1) and exclusion (0)
db.collection.find(
{_id:'someId'},
{'someField':1,'someOtherField':0})
.sort({'someOtherField':1})
However, if you still don't get required results, look into the MongoDB Aggregation Framework!
Here is a sample aggregation query for your requirement:
db.collection.aggregate([
{$match: {_id:'someId'}},
{$sort: {someField:1}},
{$project: {_id:1,someOtherField:1}},
])
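In pymongo, the working find variant could be sketched as follows. The function name is invented for illustration, and coll is assumed to be a pymongo Collection (anything with a compatible find and sort would do).

```python
def find_sorted_without_sort_field(coll, query, shown_field, sort_field):
    """Return a cursor sorted on a field that is not part of the projection."""
    # Inclusion-only projection: the sort field is left out, not set to 0.
    cursor = coll.find(query, {shown_field: 1})
    # 1 means ascending order.
    return cursor.sort(sort_field, 1)


# Hypothetical usage:
# for doc in find_sorted_without_sort_field(
#         coll, {"_id": "someId"}, "someField", "someOtherField"):
#     print(doc)
```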

Aggregate framework can't use indexes

I run this command:
db.ads_view.aggregate({$group: {_id : "$campaign", "action" : {$sum: 1} }});
ads_view: 500,000 documents.
This query takes 1.8s. This is its profile: https://gist.github.com/afecec63a994f8f7fd8a
Index: db.ads_view.ensureIndex({campaign: 1});
But MongoDB doesn't use the index. Does anyone know whether the aggregation framework can use indexes, and how to index this query?
This is a late answer, but since $group in Mongo as of version 4.0 still won't make use of indexes, it may be helpful for others.
To speed up your aggregation significantly, perform a $sort before $group.
So your query would become:
db.ads_view.aggregate([{$sort:{"campaign":1}},{$group: {_id : "$campaign", "action" : {$sum: 1} }}]);
This assumes an index on campaign, which should have been created according to your question. In Mongo 4.0, create the index with db.ads_view.createIndex({campaign:1}).
I tested this on a collection containing more than 5.5 million documents. Without $sort, the aggregation would not have finished even after several hours; with $sort preceding $group, the aggregation takes a couple of seconds.
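A pymongo version of the sorted pipeline might look like this sketch. build_campaign_count_pipeline is a hypothetical helper and coll is assumed to be the ads_view collection; whether the $sort actually helps depends on the server version and the data, so it is worth checking with explain.

```python
def build_campaign_count_pipeline():
    """Count documents per campaign, sorting on the indexed field first."""
    return [
        # A leading $sort on an indexed field can be served by the index.
        {"$sort": {"campaign": 1}},
        {"$group": {"_id": "$campaign", "action": {"$sum": 1}}},
    ]


# Hypothetical usage, assuming coll is a pymongo Collection for ads_view:
# coll.create_index([("campaign", 1)])
# res = list(coll.aggregate(build_campaign_count_pipeline()))
```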
The $group operator is not one of the ones that will use an index currently. The list of operators that do (as of 2.2) are:
$match
$sort
$limit
$skip
From here:
http://docs.mongodb.org/manual/applications/aggregation/#pipeline-operators-and-indexes
Based on the number of yields going on in the gist, I would assume you either have a very active instance or that a lot of this data is not in memory when you are doing the group (it will usually yield on a page fault too), hence the 1.8s.
Note that even if $group could use an index, and your index covered everything being grouped, it would still involve a full scan of the index to do the group, and would likely not be terribly fast anyway.
$group doesn't use an index because it doesn't have to. When you $group your items you're essentially indexing all documents passing through the $group stage of the pipeline using your $group's _id. If you used an index that matched the $group's _id, you'd still have to pass through all the docs in the index so it's the same amount of work.