In my database I have a field called name. In some records it is an empty string; in others it contains a name.
In my query, I'm currently doing:
db.users.find({}).sort({'name': 1})
However, this returns results with an empty name field first, then the rest in alphabetical order. As expected, doing .sort({'name': -1}) returns results with a name first and then results with an empty string, but in reverse-alphabetical order.
Is there an elegant way to achieve this type of sorting?
How about:
db.users.find({ "name": { "$exists": true } }).sort({'name': 1})
Because after all, when the field you sort on is not actually present, the value used for sorting is null, which is "lower" in the order than any string value. So it makes sense to exclude those results if you really are only looking for documents with a value. Note that $exists does not exclude documents where name is an empty string "", which also sorts before any other string.
If you really want all the results in there, regardless of null or empty content, then I suggest you "weight" them via .aggregate():
db.users.aggregate([
{ "$project": {
"name": 1,
"score": {
"$cond": [
// true for any non-empty string name; null or missing becomes "" first
{ "$gt": [ { "$ifNull": [ "$name", "" ] }, "" ] },
1,
10
]
}
}},
{ "$sort": { "score": 1, "name": 1 } }
])
That moves all documents with a null, missing, or empty name to the "end of the chain" by assigning them the higher score.
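The weighting idea can also be checked outside the database. Below is a minimal sketch in plain JavaScript (not a MongoDB API): it assumes the documents are already loaded into an array of objects, and the `sortNamesEmptyLast` helper and sample data are hypothetical. Names that are non-empty strings get score 1, while empty, null, or missing names get 10, and ties are broken by name, mirroring a `$sort` on `score` then `name`.

```javascript
// Hypothetical helper: emulates a "score" projection plus
// { "$sort": { "score": 1, "name": 1 } } on an in-memory array.
function sortNamesEmptyLast(docs) {
  // Score 1 for a non-empty string name, 10 for "", null, or a missing field.
  const score = (doc) =>
    typeof doc.name === "string" && doc.name !== "" ? 1 : 10;

  return docs
    .map((doc) => ({ doc, score: score(doc) }))
    .sort((x, y) => {
      if (x.score !== y.score) return x.score - y.score; // "score": 1
      const a = x.doc.name ?? "";
      const b = y.doc.name ?? "";
      return a < b ? -1 : a > b ? 1 : 0; // "name": 1, plain code-unit order
    })
    .map((entry) => entry.doc);
}

const users = [
  { name: "Carol" },
  { name: "" },
  { name: "alice" },
  {},
  { name: "Bob" },
];

console.log(sortNamesEmptyLast(users).map((u) => u.name));
```

Without a collation, MongoDB compares strings by simple binary comparison, so uppercase names such as "Bob" sort before lowercase ones such as "alice"; the plain `<` comparison in the sketch behaves the same way for these ASCII names.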
If you want to filter out documents with an empty "name" field, change your query: db.users.find({"name": {"$ne": ""}}).sort({"name": 1})
Related
I am a beginner in MongoDB and am stuck. I am trying to fetch data from a nested array, but it is taking a long time since there are around 50K documents, and the results are not accurate either. Below is the schema structure:
{
"_id": {
"$oid": "6001df3312ac8b33c9d26b86"
},
"City": "Los Angeles",
"State":"California",
"Details": [
{
"Name": "Shawn",
"age": "55",
"Gender": "Male",
"profession": " A science teacher with STEM",
"inDate": "2021-01-15 23:12:17",
"Cars": [
"BMW","Ford","Opel"
],
"language": "English"
},
{
"Name": "Nicole",
"age": "21",
"Gender": "Female",
"profession": "Law student",
"inDate": "2021-01-16 13:45:00",
"Cars": [
"Opel"
],
"language": "English"
}
],
"date": "2021-01-16"
}
Here I am trying to filter by date and Details.Cars, like this:
db.getCollection('news').find({"Details.Cars":"BMW","date":"2021-01-16"})
It is returning the details of other people too, who do not have a BMW. I am only trying to display the details of people like Shawn who have a BMW (or some other specific array value) and match the date, not Nicole. The rest should not appear, but that is not happening.
Any help is appreciated. :)
A combination of $match on the top-level fields and $filter on the array elements will do what you seek.
db.foo.aggregate([
{$match: {"date":"2021-01-16"}}
,{$addFields: {"Details": {$filter: {
input: "$Details",
as: "zz",
cond: { $in: ['BMW','$$zz.Cars'] }
}}
}}
,{$match: {$expr: { $gt:[{$size:"$Details"},0] } }}
]);
Notes:
$unwind is overly expensive for what is needed here and it likely means "reassembling" the data shape later.
We use $addFields where the new field to add (Details) already exists. This effectively means "overwrite in place" and is a common idiom when filtering an array.
The second $match will eliminate docs where the date matches but not a single entry in Details.Cars is a BMW i.e. the array has been filtered down to zero length. Sometimes you want to know this info so if this is the case, do not add the final $match.
I recommend you look into using real dates i.e. ISODate instead of strings so that you can easily take advantage of MongoDB date math and date formatting functions.
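To show what each stage of that pipeline does to the data, here is a minimal in-memory sketch in plain JavaScript. It is not a MongoDB call: `filterDetailsByCar` is a hypothetical helper, and `news` is a trimmed copy of the sample document keeping only the fields the pipeline touches.

```javascript
// Hypothetical in-memory emulation of the $match / $filter / $size pipeline.
function filterDetailsByCar(docs, date, car) {
  return docs
    // { $match: { "date": date } }
    .filter((doc) => doc.date === date)
    // { $addFields: { Details: { $filter: { ..., cond: { $in: [car, "$$zz.Cars"] } } } } }
    .map((doc) => ({
      ...doc,
      Details: doc.Details.filter((person) => person.Cars.includes(car)),
    }))
    // { $match: { $expr: { $gt: [ { $size: "$Details" }, 0 ] } } }
    .filter((doc) => doc.Details.length > 0);
}

const news = [
  {
    City: "Los Angeles",
    date: "2021-01-16",
    Details: [
      { Name: "Shawn", Cars: ["BMW", "Ford", "Opel"] },
      { Name: "Nicole", Cars: ["Opel"] },
    ],
  },
];

const result = filterDetailsByCar(news, "2021-01-16", "BMW");
console.log(result[0].Details.map((person) => person.Name));
```

The aggregation `$in` operator raises an error when its second argument is not an array, and `includes` here likewise assumes every `Details` entry has a `Cars` array.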
It is a common mistake to think that find({"nested.array": value}) will return only the nested object; in fact, this query returns the whole document that contains a nested object with the desired value.
The query is returning the whole document where value BMW exists in the array Details.Cars. So, Nicole is returned too.
To solve this problem:
To get multiple elements that match the criteria, you can use an aggregation pipeline with $unwind to split the array into one document per element, and then match by the criteria you want.
db.collection.aggregate([
{
"$match": { "Details.Cars": "BMW", "date": "2021-01-16" }
},
{
"$unwind": "$Details"
},
{
"$match": { "Details.Cars": "BMW" }
}
])
This query first matches by the criteria, to avoid running $unwind over the whole collection.
Then it uses $unwind to produce one document per Details element, and $match again to keep only the documents you want.
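The shape change that $unwind causes is easy to miss, so here is a minimal in-memory sketch in plain JavaScript (not a MongoDB API). `unwindAndMatch` is a hypothetical helper, and the extra BMW owner "Alex" is invented sample data to show that several matches produce several output documents.

```javascript
// Hypothetical in-memory emulation of: $match -> $unwind "$Details" -> $match.
function unwindAndMatch(docs, date, car) {
  const out = [];
  for (const doc of docs) {
    // First $match: keep whole documents with the date and at least one matching person.
    const hasCar = doc.Details.some((person) => person.Cars.includes(car));
    if (doc.date !== date || !hasCar) continue;
    // $unwind: one output document per Details element (Details becomes an object).
    for (const person of doc.Details) {
      // Second $match: keep only the unwound documents whose person has the car.
      if (person.Cars.includes(car)) out.push({ ...doc, Details: person });
    }
  }
  return out;
}

const sample = [
  {
    date: "2021-01-16",
    Details: [
      { Name: "Shawn", Cars: ["BMW", "Ford", "Opel"] },
      { Name: "Nicole", Cars: ["Opel"] },
      { Name: "Alex", Cars: ["BMW"] }, // hypothetical extra person
    ],
  },
];

for (const doc of unwindAndMatch(sample, "2021-01-16", "BMW")) {
  console.log(doc.Details.Name);
}
```

Each result carries `Details` as a single object rather than an array, which is why this approach may need a later `$group` if you want the original document shape back.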
To get only one element per document (for example, if you match by an _id that is unique within the array), you can use $elemMatch in this way:
db.collection.find({
"Details.Cars": "BMW",
"date": "2021-01-16"
},
{
"Details": {
"$elemMatch": {
"Cars": "BMW"
}
}
})
You can use $elemMatch in either the query or the projection.
Using $elemMatch in the query looks like this:
db.collection.find({
"Details": {
"$elemMatch": {
"Cars": "BMW"
}
},
"date": "2021-01-16"
},
{
"Details.$": 1
})
The result is the same. In the second case you are using the positional $ operator in the projection, which, as the docs say, returns:
The first element that matches the query condition on the array.
That is, the first element where "Cars": "BMW".
You can choose the way you want.
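Both projections return at most one element of `Details`, even when several people own a BMW. A minimal in-memory sketch in plain JavaScript of the `$elemMatch` projection's behaviour, assuming an inclusion projection; `projectFirstMatchingDetail`, the `_id` value, and the second BMW owner are hypothetical.

```javascript
// Hypothetical emulation of the projection { Details: { $elemMatch: { Cars: "BMW" } } }.
function projectFirstMatchingDetail(doc, car) {
  const first = doc.Details.find((person) => person.Cars.includes(car));
  // Inclusion projection: only _id and the projected array field are returned.
  // With $elemMatch projection, the field is left out when no element matches.
  return first === undefined
    ? { _id: doc._id }
    : { _id: doc._id, Details: [first] };
}

const doc = {
  _id: 1, // hypothetical id
  date: "2021-01-16",
  Details: [
    { Name: "Shawn", Cars: ["BMW", "Ford", "Opel"] },
    { Name: "Alex", Cars: ["BMW"] }, // hypothetical second BMW owner
  ],
};

console.log(JSON.stringify(projectFirstMatchingDetail(doc, "BMW")));
```

If you need every matching person rather than the first, the `$unwind` pipeline (or `$filter`) is the better fit.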
With two documents like:
{
"name": "hello",
"family": 1
},
{
"name": "world",
"family": 1,
"category": 2
}
and a query like:
doc.find({$or: [{family: 1}, {category: 2}]})
How can I have the results sorted with the document matching both conditions ("world") as the first result, but still have the document matching only one condition ("hello") as the last result?
I can't use the default $and operator, as I would not see the "hello" document, which does not match both conditions.
I saw how aggregation could help, but for a more complex example than this it would be a lot of computation. I'm guessing this is a common use case and there must be something obvious I'm missing.
You cannot do that sort of query (pun not intended) with a simple .find() statement. What you are asking for involves "weighting", which is applying a "calculated" precedence to values.
Anything involving a "calculation" basically requires conditions to be applied programmatically, and the particular requirement here to "sort" rules out the "JavaScript runner" options like mapReduce, which simply leaves the Aggregation Framework or other handling of the results.
For the aggregation framework approach you would need to $project a calculated "weight" to each matched document based on the conditions:
db.collection.aggregate([
// Same match conditions to filter
{ "$match": { "$or": [{ "family": 1 }, { "category": 2 }] } },
// Assign the "weight" based on conditions
{ "$project": {
"name": 1,
"family": 1,
"weight": {
"$add": [
{ "$cond": {
"if": { "$eq": [ "$family", 1 ] },
"then": 1,
"else": 0
}},
{ "$cond": {
"if": { "$eq": [ "$category", 2 ] },
"then": 1,
"else": 0
}}
]
}
}},
// Then sort "descending" with highest "weight" on top
{ "$sort": { "weight": -1 } }
])
Basically you are using $cond to evaluate whether the returned document actually has data meeting each condition, since in the selection either condition being met is a valid match. Where a condition is met we assign a value, and where it is not the value is 0.
When "both" conditions are met, the $add operation combines the values into the total weight. So here documents that met only one condition have a weight of 1, and those that met both have 2. If you wanted, for example, "family" to have the greater preference, then you would assign 2 in its condition, leaving you with possible document scores of:
3 : For both category and family
2 : For family only
1 : For category only
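To make that arithmetic concrete, here is a minimal plain-JavaScript sketch of the same filter, weight, and sort sequence with "family" weighted 2 and "category" weighted 1, giving the 3/2/1 scores listed above. It runs on an in-memory array rather than in MongoDB; the `weightedOrSort` helper and the extra sample documents are hypothetical.

```javascript
// Hypothetical emulation of $match { $or } + a weighted $add of $cond terms + $sort.
function weightedOrSort(docs) {
  return docs
    // { "$match": { "$or": [{ "family": 1 }, { "category": 2 }] } }
    .filter((d) => d.family === 1 || d.category === 2)
    // "weight": { "$add": [ family ? 2 : 0, category ? 1 : 0 ] }
    .map((d) => ({
      ...d,
      weight: (d.family === 1 ? 2 : 0) + (d.category === 2 ? 1 : 0),
    }))
    // { "$sort": { "weight": -1 } }
    .sort((a, b) => b.weight - a.weight);
}

const docs = [
  { name: "hello", family: 1 },
  { name: "world", family: 1, category: 2 },
  { name: "other", category: 2 }, // hypothetical
  { name: "skip", family: 3 }, // hypothetical, matches neither condition
];

console.log(weightedOrSort(docs).map((d) => `${d.name}:${d.weight}`));
```

MongoDB does not guarantee the relative order of documents with equal weight, so in the real pipeline you may want a secondary sort key such as `"name": 1` if a deterministic order matters.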
You could shorten the syntax of the $project in MongoDB 3.4 or later by using the $addFields pipeline stage instead, which is most useful when you have a "lot" of other document properties you want to return without listing them all in the $project.
Aside from this, the database services don't allow for "calculations" on the "sort". This is considered "manipulation", which is the purpose of the Aggregation Framework.
Whilst you can do the same sort of "weighting" by post processing the result set in client code, the issue here is of course where you want to "limit" the results to return in actions like "paging". This is where running the operations on the server comes into play, and the reason why you use the Aggregation Framework for this.
I want to sort a collection by putting items with specific values before other items.
For example I want all the items with "getthisfirst": "yes" to be before all the others.
{"getthisfirst": "yes"}
{"getthisfirst": "yes"}
{"getthisfirst": "no"}
{"getthisfirst": "maybe"}
This as a general concept is called "weighting". Without any other mechanism in place, you handle this in a MongoDB query by "projecting" a "weight" value into each document.
Your method for "projecting" and altering the fields present in your document is the .aggregate() method, and specifically its $project pipeline stage:
db.collection.aggregate([
{ "$project": {
"getthisfirst": 1,
"weight": {
"$cond": [
{ "$eq": [ "$getthisfirst", "yes" ] },
10,
{ "$cond": [
{ "$eq": [ "$getthisfirst", "maybe" ] },
5,
0
]}
]
}
}},
{ "$sort": { "weight": -1 } }
]);
The $cond operator here is a "ternary" (if/then/else) condition where the first argument is a conditional statement evaluating to a boolean true|false. If true, "then" the second argument is returned as the result; otherwise the "else" or third argument is returned.
In this "nested" case, where "yes" is a match a certain "weight" score is assigned; otherwise we move on to the next condition test, where a match on "maybe" assigns another score, or otherwise the score is 0, since we only have three possibilities to match.
Then the $sort stage is applied in order to, well, "order" (in descending order) the results with the largest "weight" on top.
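With more than a couple of preferred values, writing the nested `$cond` by hand gets tedious. Below is a minimal sketch of a hypothetical helper, `buildWeightExpr`, that builds the same nested expression in plain JavaScript from an ordered list of `[value, weight]` pairs; the resulting object could then be placed under `"weight"` in the `$project` stage.

```javascript
// Hypothetical helper: builds a nested $cond "weight" expression.
// Pairs are tested in order; the first matching value wins, otherwise `fallback`.
function buildWeightExpr(fieldPath, pairs, fallback) {
  // Build from the innermost (last) condition outwards.
  return pairs.reduceRight(
    (elseExpr, [value, weight]) => ({
      $cond: [{ $eq: [fieldPath, value] }, weight, elseExpr],
    }),
    fallback
  );
}

const weight = buildWeightExpr("$getthisfirst", [["yes", 10], ["maybe", 5]], 0);
console.log(JSON.stringify(weight));
```

Building the expression in code keeps the precedence list in one place, so adding a fourth value is a one-element change rather than another level of hand-written nesting.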
I need help to solve the following issue. My collection has a "targets" field.
Each user can have 0 or more targets.
When I run my query I'd like to retrieve the document with the maximum number of matched targets.
Ex:
documents=[{
targets:{
"cluster":"01",
}
},{
targets:{
"cluster":"01",
"env":"DC",
"core":"PO"
}
},{
targets:{
"cluster":"01",
"env":"DC",
"core":"PO",
"platform":"IG"
}
}];
userTarget={
"cluster":"01",
"env":"DC",
"core":"PO"
}
You seem to be asking to return the document where the most conditions were met, and possibly not all conditions. The basic process is an $or query to return the documents that can match either of the conditions. Then you basically need a statement to calculate "how many terms" were met in the document, and return the one that matched the most.
So the combination here is an .aggregate() statement using the initial results from $or to calculate and then sort the results:
// initial targets object
var userTarget = {
"cluster":"01",
"env":"DC",
"core":"PO"
};
// Convert to $or condition
// and the calculation condition to match
var orCondition = [],
scoreCondition = [];
Object.keys(userTarget).forEach(function(key) {
// the documents in the question store these fields under "targets"
var query = {},
cond = { "$cond": [{ "$eq": ["$targets." + key, userTarget[key]] },1,0] };
query["targets." + key] = userTarget[key];
orCondition.push(query);
scoreCondition.push(cond);
});
// Run aggregation
Model.aggregate(
[
// Match with condition
{ "$match": { "$or": orCondition } },
// Calculate a "score" based on matched fields
{ "$project": {
"targets": 1,
"score": {
"$add": scoreCondition
}
}},
// Sort on the greatest "score" (descending)
{ "$sort": { "score": -1 } },
// Return the first document
{ "$limit": 1 }
],
function(err,result) {
if (err) return console.error(err);
// Remember that result is an array, even if limited to one document
console.log(result[0]);
}
)
So before processing the aggregate statement, we are going to generate the dynamic parts of the pipeline operations based on the input in the userTarget object. This would produce an orCondition like this:
{ "$match": {
"$or": [
{ "targets.cluster" : "01" },
{ "targets.env" : "DC" },
{ "targets.core" : "PO" }
]
}}
And the scoreCondition would expand to something like this:
"score": {
"$add": [
{ "$cond": [{ "$eq": [ "$targets.cluster", "01" ] },1,0] },
{ "$cond": [{ "$eq": [ "$targets.env", "DC" ] },1,0] },
{ "$cond": [{ "$eq": [ "$targets.core", "PO" ] },1,0] }
]
}
Those are going to be used in the selection of possible documents and then for counting the terms that could match. In particular the "score" is made by evaluating each condition within the $cond ternary operator, and then either attributing a score of 1 where there was a match, or 0 where there was not a match on that field.
If desired, it would be simple to alter the logic to assign a higher "weight" to each field with a different value going towards the score depending on the deemed importance of the match. At any rate, you simply $add these score results together for each field for the overall "score".
Then it is just a simple matter of applying the $sort to the returned "score", and then using $limit to just return the top document.
It's not super efficient, since even if there is a match for all three conditions, the basic question you are asking of the data cannot presume that there is; hence it needs to look at all data where "at least one" condition was a match, and then work out the "best match" from those possible results.
Ideally, I would personally run an additional query "first" to see if all three conditions were met, and if not then look for the other cases. That still is two separate queries, and would be different from simply just pushing the "and" conditions for all fields as the first statement in $or.
So the preferred implementation I think should be:
Look for a document that matches all given field values; if not then
Run the either/or on every field and count the condition matches.
That way, if all fields match, the first query is fastest, and you only need to fall back to the slower but required implementation shown in the listing if there was no actual result.
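The two-step strategy can be sketched in plain JavaScript over an in-memory array to check its logic; this is not a MongoDB API, and `bestTargetMatch` is a hypothetical helper. Step 1 stands in for a query that requires every field, step 2 for the `$or` plus score pipeline. It assumes the field is named `targets`, as in the question.

```javascript
// Hypothetical in-memory sketch of the two-step approach:
// 1) look for a document matching every field, 2) otherwise score partial matches.
function bestTargetMatch(docs, userTarget) {
  const keys = Object.keys(userTarget);
  // Treat a missing "targets" object as having no fields at all.
  const fieldMatches = (doc, key) => doc.targets?.[key] === userTarget[key];

  // Step 1: like a query that requires all fields to match.
  const exact = docs.find((doc) => keys.every((key) => fieldMatches(doc, key)));
  if (exact !== undefined) return exact;

  // Step 2: like $match { $or } + $add of $cond scores + $sort + $limit 1.
  let best = null;
  let bestScore = 0;
  for (const doc of docs) {
    const score = keys.filter((key) => fieldMatches(doc, key)).length;
    if (score > bestScore) {
      best = doc;
      bestScore = score;
    }
  }
  return best; // null when no field matched in any document
}

const documents = [
  { targets: { cluster: "01" } },
  { targets: { cluster: "01", env: "DC", core: "PO" } },
  { targets: { cluster: "01", env: "DC", core: "PO", platform: "IG" } },
];

console.log(bestTargetMatch(documents, { cluster: "01", env: "DC", core: "PO" }));
```

On ties the sketch keeps the first document it saw, whereas MongoDB's `$sort` does not guarantee an order among equal scores.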
My recent project ran into the same problem as another question:
db.test.update(
{name:"abc123", "config.a":1 },
{$addToSet:{ config:{a:1,b:2} } },
true
)
This produces the following error:
Cannot apply $addToSet to a non-array field
But after changing it to:
db.test.update(
{name:"abc123", "config.a":{$in:[1]} },
{$addToSet:{ config:{a:1,b:2} } },
true
)
It works fine.
I also looked at a related answer.
Can anyone explain what's going on? Does "config.a":1 turn config into an object, whereas "config.a":{$in:[1]} does not?
What you are trying to do here is add a new item to an array only where the item does not exist and also create a new document where it does not exist. You choose $addToSet because you want the items to be unique, but in fact you really want them to be unique by "a" only.
So $addToSet will not do that, and you instead need to "test" for the element being present. But the real problem here is that it is not possible to both do that and "upsert" at the same time. The logic cannot work, as a new document would be created whenever the array element was not found, rather than the element being appended to the array like you want.
The current operation errors by design as $addToSet cannot be used to "create" an array, but only to "add" members to an existing array. But as stated already, you have other problems with achieving the logic.
What you need here is a sequence of update operations that each "try" to perform their expected action. This can only be done with multiple statements:
// attempt "upsert" where document does not exist
// do not alter the document if this is an update
db.test.update(
{ "name": "abc" },
{ "$setOnInsert": { "config": [{ "a": 1, "b": 2 }] }},
{ "upsert": true }
)
// $push the element where "a": 1 does not exist
db.test.update(
{ "name": "abc", "config.a": { "$ne": 1 } },
{ "$push": { "config": { "a": 1, "b": 2 } }}
)
// $set the element where "a": 1 does exist
db.test.update(
{ "name": "abc", "config.a": 1 },
{ "$set": { "config.$.b": 2 } }
)
On a first iteration the first statement will "upsert" the document and create the array with items. The second statement will not match the document because the "a" element has the value that was specified. The third statement will match the document but it will not alter it in a write operation because the values have not changed.
If you now change the input to "b": 3 you get different responses but the desired result:
db.test.update(
{ "name": "abc" },
{ "$setOnInsert": { "config": [{ "a": 1, "b": 3 }] }},
{ "upsert": true }
)
db.test.update(
{ "name": "abc", "config.a": { "$ne": 1 } },
{ "$push": { "config": { "a": 1, "b": 3 } }}
)
db.test.update(
{ "name": "abc", "config.a": 1 },
{ "$set": { "config.$.b": 3 } }
)
So now the first statement matches a document with "name": "abc" but does not do anything, since its only valid operations are on "insert". The second statement does not match because "a" matches the condition. The third statement matches the value of "a" and changes "b" in the matched element to the desired value.
Subsequently changing "a" to another value that does not exist in the array means the first and third statements do nothing, but the second statement adds another member to the array, keeping the content unique by the "a" keys.
Also, submitting a statement with no changes from the existing data will of course result in a response that says nothing was changed on all accounts.
That's how you do your operations. You can also do this with "ordered" Bulk operations so that there is only a single request and response from the server, which reports what was modified or created.
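One way to send the three statements as a single ordered batch is the shell's `db.collection.bulkWrite()` method (MongoDB 3.2+) with `ordered: true`; this may differ from the Bulk API the answer has in mind. The sketch below only builds the operations array in plain JavaScript so it can be inspected; `buildConfigUpsertOps` is a hypothetical helper, and its filters and updates are copied from the three statements above.

```javascript
// Hypothetical helper: builds the three updates above as one ordered batch.
function buildConfigUpsertOps(name, a, b) {
  return [
    // 1. Create the document (and the array) only if it does not exist yet.
    {
      updateOne: {
        filter: { name },
        update: { $setOnInsert: { config: [{ a, b }] } },
        upsert: true,
      },
    },
    // 2. Append the element when no entry with this "a" exists.
    {
      updateOne: {
        filter: { name, "config.a": { $ne: a } },
        update: { $push: { config: { a, b } } },
      },
    },
    // 3. Update "b" on the existing entry with this "a".
    {
      updateOne: {
        filter: { name, "config.a": a },
        update: { $set: { "config.$.b": b } },
      },
    },
  ];
}

const ops = buildConfigUpsertOps("abc", 1, 3);
console.log(JSON.stringify(ops, null, 2));
// In the mongo shell, the batch could then be sent with:
//   db.test.bulkWrite(ops, { ordered: true })
```

With `ordered: true` the server applies the operations in sequence and stops at the first error, which preserves the "insert first, then push or set" order the three statements rely on.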
As explained in this issue on the MongoDB JIRA (https://jira.mongodb.org/browse/SERVER-3946), this can be solved in a single query:
The following update fails because we use $addToSet on a field which we have also included in the first argument (the field which accepts the fields and values to query for). As far as I understand it, you can't use upsert: true in this scenario where we $addToSet to the same field we query with.
db.foo.update({x : "a"}, {$addToSet: {x: "b"}} , {upsert: true}); // FAILS
The solution is to put the condition inside $elemMatch, i.e. {field: {$elemMatch: {$eq: value}}}:
db.foo.update({x: {$elemMatch: {$eq: "a"}}}, {$addToSet: {x: "b"}}, {upsert: true});