This seems like a simple enough question but I haven't found an answer:
I'm using MongoDB and I want to perform a query in which I provide search criteria, but I also want to carve out an exception where certain documents are excluded based on criteria. For example, imagine a collection with the fields name, age and gender.
Retrieving everyone below a certain age? Easy: <collection>.find({'age':{'$lt':<maxAge>}})
Retrieving all females below a certain age? Piece of cake: <collection>.find({'gender':'female', 'age':{'$lt':<maxAge>}})
But what about retrieving everyone except those who are both female and below a certain age? You can easily negate a specific field with the '$ne' operator, but how do I exclude everyone who matches a whole set of criteria?
You would have to apply boolean logic (De Morgan's law: NOT (A AND B) is the same as (NOT A) OR (NOT B)) to invert the AND into an OR of the negation of each term individually:
collection.find({$or: [{age: {$gte: maxAge}}, {gender: {$ne: 'female'}}]})
or
collection.find({$or: [{age: {$not: {$lt: maxAge}}}, {gender: {$ne: 'female'}}]})
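To see that the rewritten condition really selects the same people, here is a minimal in-memory sketch in plain JavaScript. The sample data and the names `people`, `negatedAnd` and `orOfNegations` are made up for illustration; nothing here talks to MongoDB.

```javascript
// Hypothetical sample data; in MongoDB these would be documents in the collection.
const people = [
  { name: 'Ann', age: 25, gender: 'female' },
  { name: 'Bea', age: 45, gender: 'female' },
  { name: 'Cal', age: 25, gender: 'male' },
  { name: 'Dan', age: 45, gender: 'male' },
];
const maxAge = 30;

// NOT (female AND age < maxAge)
const negatedAnd = people.filter(p => !(p.gender === 'female' && p.age < maxAge));

// (age >= maxAge) OR (gender != 'female'), the shape of the $or query above
const orOfNegations = people.filter(p => p.age >= maxAge || p.gender !== 'female');

const names = list => list.map(p => p.name);
console.log(names(negatedAnd));    // [ 'Bea', 'Cal', 'Dan' ]
console.log(names(orOfNegations)); // [ 'Bea', 'Cal', 'Dan' ]
```

If you would rather not invert the terms by hand, the $nor operator with a single clause should select the same documents: collection.find({$nor: [{gender: 'female', age: {$lt: maxAge}}]}). Note that the two $or variants can differ when a document has no age field at all: {$not: {$lt: maxAge}} matches such a document, while {$gte: maxAge} does not.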
Consider the following:
I have a MongoDB collection named C_a. It contains a very large number of documents (e.g., more than 50,000,000).
For the sake of simplicity let's assume that each document has the following schema:
{
"username" : "Aventinus",
"text": "I love StackOverflow!",
"tags": [
"programming",
"mongodb"
]
}
Using text index I can return all documents which contain the keyword StackOverflow like this:
db.C_a.find({$text:{$search:"StackOverflow"}})
My question is the following:
Considering that the query above may return hundreds of thousands of documents, what is the easiest/fastest way to directly save the returned results into another collection named C_b?
Note: This post explains how to use aggregate to find exact matches (i.e., specific dates). I'm interested in using Text Index to save all the posts which include a specific keyword.
The referenced answer is correct. The example query from that answer can be updated to use your criteria:
db.C_a.aggregate([
{$match: {$text: {$search:"StackOverflow"}}},
{$out:"C_b"}
]);
From the MongoDB documentation for $text:
If using the $text operator in aggregation, the following restrictions also apply.
The $match stage that includes a $text must be the first stage in the pipeline.
A text operator can only occur once in the stage.
The text operator expression cannot appear in $or or $not expressions.
The text search, by default, does not return the matching documents in order of matching scores. Use the $meta aggregation expression in the $sort stage.
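Putting those restrictions together, a pipeline that copies the matches in relevance order could look roughly like the sketch below. `buildTextCopyPipeline` is a hypothetical helper written for this example, not part of any MongoDB API; it only builds the stage array, which you would then pass to aggregate yourself. The $sort stage is optional if you do not care about the order in which documents land in C_b.

```javascript
// Hypothetical helper: builds the stages for
// "text-search a collection, best matches first, write the result elsewhere".
function buildTextCopyPipeline(keyword, targetCollection) {
  return [
    // $text must sit in the first $match stage of the pipeline.
    { $match: { $text: { $search: keyword } } },
    // Order by relevance; without this, results are not sorted by score.
    { $sort: { score: { $meta: 'textScore' } } },
    // $out has to be the last stage; it replaces the target collection.
    { $out: targetCollection },
  ];
}

const pipeline = buildTextCopyPipeline('StackOverflow', 'C_b');
console.log(JSON.stringify(pipeline, null, 2));
// In the mongo shell: db.C_a.aggregate(pipeline)
```

With hundreds of thousands of matches the in-memory sort may hit the aggregation memory limit; passing {allowDiskUse: true} as the second argument to aggregate is the usual way around that.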
In a find operation fields can be excluded, but what if I want to do a find, then a sort, and only after that the projection? Do you know any trick or operation for this?
doc: fields {Object}, the fields to return in the query. Object of fields to include or exclude (not both), {'a':1}
You can run an ordinary find query with conditions, a projection, and a sort. I think you want to sort on a field that you don't want to project. That is fine: you can sort on a field even if the projection does not include it.
What you cannot do is explicitly exclude the sort field with 0 while including other fields with 1. A projection cannot mix inclusion and exclusion (_id is the only exception), so that find query will fail.
// This query will work
db.collection.find(
{_id:'someId'},
{'someField':1})
.sort({'someOtherField':1})
// This query won't work: the projection mixes inclusion (1) and exclusion (0)
db.collection.find(
{_id:'someId'},
{'someField':1,'someOtherField':0})
.sort({'someOtherField':1})
However, if you still don't get the required results, look into the MongoDB Aggregation Framework!
Here is a sample aggregation query for your requirement (match, then sort, then project):
db.collection.aggregate([
{$match: {_id:'someId'}},
{$sort: {someField:1}},
{$project: {_id:1,someOtherField:1}},
])
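The idea of "sort first, project afterwards" can be illustrated without a database. The sketch below uses a made-up array `docs` as a stand-in for the matched documents; it sorts on someOtherField and only then drops that field, which is the order of operations the pipeline stages express.

```javascript
// Hypothetical documents that already passed the match step.
const docs = [
  { _id: 1, someField: 'b', someOtherField: 3 },
  { _id: 2, someField: 'a', someOtherField: 1 },
  { _id: 3, someField: 'c', someOtherField: 2 },
];

// Sort on someOtherField first, then keep only _id and someField.
const result = [...docs]
  .sort((x, y) => x.someOtherField - y.someOtherField)
  .map(({ _id, someField }) => ({ _id, someField }));

console.log(result);
// [ { _id: 2, someField: 'a' }, { _id: 3, someField: 'c' }, { _id: 1, someField: 'b' } ]
```

The sort key is gone from the output, but it still determined the order.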
I wonder, does the order of conditions in $or query matter?
E.g. can this query be reliable to get either document where role matches userrole or any other document only if the one with userrole is not found?
{$or: [{role: 'userrole'},{}]}, {limit: 1}
Order does not matter.
Since the $or operator gets evaluated document-wise, the query
{$or: [{role: 'userrole'},{}]}
will evaluate to true on each document, hence it will always return every document in the collection.
If, in addition, you use the limit() method, then the first n documents in the collection (according to their internal ordering) will be returned.
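A small in-memory sketch of both points, using a made-up `users` array in place of the collection. The helper `findPreferred` is hypothetical; against a real database the same fallback would be two separate lookups (first by role, then without a filter), which is an assumption about how you would structure it, not something the $or query can do for you.

```javascript
// Hypothetical in-memory stand-in for a collection, in its natural order.
const users = [
  { _id: 1, role: 'guest' },
  { _id: 2, role: 'userrole' },
  { _id: 3, role: 'admin' },
];

// {$or: [{role: 'userrole'}, {}]}: the empty clause {} matches any document.
const matchesOr = doc => doc.role === 'userrole' || true;
console.log(users.filter(matchesOr).length); // 3, so limit 1 just yields the first document

// Fallback behaviour as two lookups: preferred role first, then anything.
function findPreferred(docs, role) {
  return docs.find(d => d.role === role) || docs[0] || null;
}

console.log(findPreferred(users, 'userrole')._id); // 2
console.log(findPreferred(users, 'missing')._id);  // 1
```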
I'm trying to get data from one or more subdocuments but I don't know the name of the field that will hold the subdocument. Here are some examples of what the documents look like.
https://github.com/vz-risk/VCDB/blob/master/data/json/0C5DE044-B9B4-408D-9E65-D367EED12AB2.json
https://github.com/vz-risk/VCDB/blob/master/data/json/064F5887-C2DA-4139-B3AA-D55906F8C30A.json
I would like to get the action varieties for these incidents, so in the case of the first one I would like to get action.malware.variety and action.social.variety. In the second example it would be action.hacking.variety and action.malware.variety. So the problem is that I don't know what field is going to hold the subdocument. It could be one of hacking, malware, social, error, misuse, physical, and environmental.
So I would like to $unwind that field and do some stuff with the key name. Is this something that can be done with aggregation or do I need to switch over to mapReduce?
You seem to be talking about a case where you are not sure whether all of the hacking, social or malware parts are there, right? I think you want $project first, using the $ifNull operator, as in:
db.stuff.aggregate([
{$project:
{
hacking: {$ifNull: ["$action.hacking.variety",[null]]},
social: {$ifNull: ["$action.social.variety",[null]]},
malware: {$ifNull: ["$action.malware.variety",[null]]}
}
},
{$unwind: "$hacking"},
{$unwind: "$social"},
{$unwind: "$malware"}
])
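That should give you documents with something in each of those values (null where the incident has no such action).
The same pattern works for the rest of your possible keys (error, misuse, physical, environmental): add one $ifNull projection and one $unwind per key.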
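If the goal is one row per (action type, variety) pair without hard-coding the key names, the logic can be sketched client-side as below. The incident documents and variety values are made up and trimmed to the fields discussed here, and `actionVarieties` is a hypothetical helper. On newer servers the $objectToArray aggregation operator can turn the action subdocument into key/value pairs in a similar way, but I have not tested that against this data.

```javascript
// Hypothetical incident documents, trimmed to the fields discussed above.
const incidents = [
  { _id: 'a', action: { malware: { variety: ['Backdoor', 'C2'] }, social: { variety: ['Phishing'] } } },
  { _id: 'b', action: { hacking: { variety: ['SQLi'] }, malware: { variety: ['Ransomware'] } } },
];

// Flatten one incident into one row per (action type, variety) pair,
// whatever keys happen to be present under "action".
function actionVarieties(doc) {
  return Object.entries(doc.action || {}).flatMap(([type, sub]) =>
    (sub.variety || []).map(variety => ({ incident: doc._id, type, variety }))
  );
}

const rows = incidents.flatMap(actionVarieties);
console.log(rows);
```

Each row carries the key name in `type`, so you can group or count by it afterwards.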
I need some advice in creating and ordering indexes in mongo.
I have a post collection with 5 properties:
Posts
status
start date
end date
lowerCaseTitle
sortOrder
Almost all the posts will have the same status of 1 and only a handful will have a rejected status. All my queries will filter on status, start and end dates, and sort on sortOrder. I also will have one query that does a regex search on the title.
Should I set up a compound key on {status:1, start:1, end:1, sort:1}? Does it matter which order I put the fields in the compound index - should I put status first in the compound index since it's the most broad? Is it better to do a compound index rather than a single index on each property? Does mongo only use a single index on any given query?
Are there any hints for indexes on lowerCaseTitle if I'm doing a regex query on that?
sample queries are:
db.posts.find({status: {$gte:0}, start: {$lt: today}, end: {$gt: today}}).sort({sortOrder:1})
db.posts.find( {lowerCaseTitle: /japan/, status:{$gte:0}, start: {$lt: today}, end: {$gt: today}}).sort({sortOrder:1})
That's a lot of questions in one post ;) Let me go through them in a practical order :
Every query can use at most one index (with the exception of top level $or clauses and such). This includes any sorting.
Because of the above you will definitely need a compound index for your problem rather than separate per-field indexes.
Low cardinality fields (so, fields with very few unique values across your dataset) should usually not be in the index since their selectivity is very limited.
The order of the fields in your compound index matters, and so does the relative direction of each field in your compound index (e.g. "{name:1, age:-1}"). There's a lot of documentation about compound indexes and index field directions on mongodb.org so I won't repeat all of it here.
Sorts will only use the index if the sort field is in the index and is the field in the index directly after the last field that was used to select the resultset. In most cases this would be the last field of the index.
So, you should not include status in your index at all since once the index walk has eliminated the vast majority of documents based on higher cardinality fields it will at most have 2-3 documents left in most cases which is hardly optimized by a status index (especially since you mentioned those 2-3 documents are very likely to have the same status anyway).
Now, the last note that's relevant in your case is that when you use range queries (and you are) it'll not use the index for sorting anyway. You can check this by looking at the "scanAndOrder" value of your explain() once you test your query. If that value exists and is true it means it'll sort the resultset in memory (scan and order) rather than use the index directly. This cannot be avoided in your specific case.
So, your index should therefore be :
db.posts.createIndex({start:1, end:1})  // ensureIndex() in shells older than 3.0
and your query (order modified for clarity only, query optimizer will run your original query through the same execution path but I prefer putting indexed fields first and in order) :
db.posts.find({start: {$lt: today}, end: {$gt: today}, status: {$gte:0}}).sort({sortOrder:1})
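To make the "scan and order" point concrete, here is a toy in-memory model of what happens. It is only a sketch: the `posts` data is invented, start/end are plain day numbers, and a sorted array stands in for the {start: 1, end: 1} index, which is a big simplification of a real B-tree.

```javascript
// Hypothetical posts; start/end are day numbers to keep the example small.
const posts = [
  { title: 'p1', start: 1, end: 9, sortOrder: 3 },
  { title: 'p2', start: 2, end: 4, sortOrder: 1 },
  { title: 'p3', start: 3, end: 8, sortOrder: 2 },
  { title: 'p4', start: 7, end: 9, sortOrder: 0 },
];
const today = 5;

// Stand-in for the {start: 1, end: 1} index: entries ordered by start, then end.
const index = [...posts].sort((a, b) => a.start - b.start || a.end - b.end);

// Range scan: walk the index and keep entries with start < today < end.
const scanned = index.filter(p => p.start < today && p.end > today);
console.log(scanned.map(p => p.sortOrder)); // [ 3, 2 ]: index order, not sortOrder order

// The "scan and order" step: sort the surviving documents in memory.
const ordered = [...scanned].sort((a, b) => a.sortOrder - b.sortOrder);
console.log(ordered.map(p => p.title)); // [ 'p3', 'p1' ]
```

The range scan hands back documents in index order, so the sort on sortOrder has to happen afterwards on whatever survived the filter.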