mongo $slice query reverse index out of range - mongodb

The following query in mongo behaves strangely:
db.items.findOne({},{ "List": { "$slice": [ skip, 3 ] }})
First:
Instead of returning one object with ["_id","List"] keys only, it returns a full object.
Second:
if skip is negative and |skip| is higher than list.length then it returns the first three elements as though skip==0
I would expect for:
{
"_id" : ObjectId("542babf265f5de9a0d5c2928"),
"List" : [
1,
2,
3,
4,
5
],
"other" : "not_important"
}
query:
db.items.findOne({},{ "List": { "$slice": [-10, 3 ] }})
to get:
{
"_id" : ObjectId("542babf265f5de9a0d5c2928"),
"List" : []
}
instead, I get:
{
"_id" : ObjectId("542babf265f5de9a0d5c2928"),
"List" : [
1,
2,
3
],
"other" : "not_important"
}
Why?
I use MongoDB 2.4.10.

Second: if skip is negative and |skip| is higher than list.length then it returns the first three elements as though skip==0
Yes. That is how the JavaScript Array.prototype.slice() method works, which MongoDB uses internally.
According to the ECMAScript® Language Specification,
If relativeStart is negative, let k be max((len + relativeStart),0);
else let k be min(relativeStart, len).
In your case relativeStart is -10,
k = max((-10 + 5), 0) = 0, where 5 is the length of your array.
Hence k, the effective skip, will always be 0 in these cases.
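The clamping rule quoted above can be reproduced outside the database. This is only a minimal sketch in plain JavaScript: sliceProjection is a hypothetical helper name, not a MongoDB API, and it mimics just the [skip, limit] form of the projection.

```javascript
// Hypothetical helper: mimics the [skip, limit] form of the $slice projection.
function sliceProjection(list, skip, limit) {
  const len = list.length;
  // Same clamping as the quoted rule: a negative start counts from the end,
  // but never goes below index 0; a positive start never goes past the end.
  const start = skip < 0 ? Math.max(len + skip, 0) : Math.min(skip, len);
  return list.slice(start, start + limit);
}

console.log(sliceProjection([1, 2, 3, 4, 5], -10, 3)); // [ 1, 2, 3 ]
console.log(sliceProjection([1, 2, 3, 4, 5], -2, 3));  // [ 4, 5 ]
```

With skip = -10 the start index is clamped to 0, which is why the first three elements come back.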
First: Instead of returning one object with ["_id","List"] keys only, it returns a full object.
Yes, the projection operator works that way. Unless an inclusion or exclusion is explicitly specified in the projection parameter, the whole document is retrieved, with projection operators such as $slice and $elemMatch applied to it.
db.items.findOne({},{"_id":1,"List": { "$slice": [-10, 3 ] }})
would return:
{ "_id" : ObjectId("542babf265f5de9a0d5c2928"), "List" : [ 1, 2, 3 ] }
The second parameter to the findOne() method is not only for simple projection. Fields are restricted only when at least one of the field names has a value of 0 or 1 against it; if none does, the whole document is returned. If any field has a projection operator attached, the operator is applied and the result is projected.
The projection mechanism seems to happen in the below manner, whenever the $slice operator is involved.
By default all the fields would be included for projection.
By Default all the fields whose values are derived based on the projection operator, $slice, if truthy, are always displayed, irrespective of the below.
Steps taking place for exclusion or inclusion.
The list of fields specified in the projection parameter are accumulated in their specified order.
For only the first field encountered with value '0' or '1':
If the field has a value '0', it is excluded, and all the remaining fields are marked to be included.
If the field has a value '1', it is included, and all the remaining fields are marked to be excluded.
For all the subsequent fields, they are excluded or included based on their values.

Whilst this behavior is by design for the $slice operator, it is possible since MongoDB 3.2 to evaluate this and alter the result with the aggregation operator for $slice:
Given the example documents:
{ "_id" : ObjectId("5922846dbcf60428d0f69f6e"), "a" : [ 1, 2, 3, 4 ] }
{ "_id" : ObjectId("5922847cbcf60428d0f69f6f"), "a" : [ 5, 6 ] }
You can give a conditional expression that tests the length of the array with $size and only performs the $slice when that length is greater than or equal to the absolute value of the reverse index, returning an empty array otherwise:
db.collection.aggregate([
{ "$project": {
"a": {
"$cond": {
"if": { "$gte": [ { "$size": "$a" }, 4 ] },
"then": { "$slice": [ "$a", -4, 2 ] },
"else": { "$literal": [] },
}
}
}}
])
Then of course you get:
{ "_id" : ObjectId("5922846dbcf60428d0f69f6e"), "a" : [ 1, 2 ] }
{ "_id" : ObjectId("5922847cbcf60428d0f69f6f"), "a" : [ ] }
So that is how you could get MongoDB to return a "slice" that acts in this way.
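For comparison, the same "empty when out of range" rule can be sketched in plain JavaScript. strictSlice is a hypothetical helper name used only for illustration; it mirrors the $cond / $size / $slice combination above rather than anything built into MongoDB.

```javascript
// Hypothetical helper: like a [skip, limit] slice, but returns [] when a
// negative skip reaches back further than the array is long.
function strictSlice(list, skip, limit) {
  if (skip < 0 && list.length < -skip) return [];
  const start = skip < 0 ? list.length + skip : Math.min(skip, list.length);
  return list.slice(start, start + limit);
}

const docs = [{ a: [1, 2, 3, 4] }, { a: [5, 6] }];
console.log(docs.map((d) => strictSlice(d.a, -4, 2))); // [ [ 1, 2 ], [] ]
```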

Related

Searching with Precedence on Array Order

My gut feeling is that the answer is no, but is it possible to perform a search in Mongodb comparing the similarity of arrays where order is important?
E.g.
I have three documents like so
{'_id':1, "my_list": ["A",2,6,8,34,90]},
{'_id':2, "my_list": ["A","F",2,6,19,8,90,55]},
{'_id':3, "my_list": [90,34,8,6,3,"A"]}
1 and 2 are the most similar, 3 is wildly different irrespective of the fact it contains all of the same values as 1.
Ideally I would do a search similar to {"my_list" : ["A",2,6,8,34,90] } and the results would be document 1 and 2.
It's almost like a regex search with wild cards. I know I can do this in python easily enough, but speed is important and I'm dealing with 1.3 million documents.
Any "comparison" or "selection" is actually more or less subjective to the actual logic applied. But as a general principle you could always consider the product of the matched indices from the array to test against and the array present in the document. For example:
var sample = ["A",2,6,8,34,90];
db.getCollection('source').aggregate([
{ "$match": { "my_list": { "$in": sample } } },
{ "$addFields": {
"score": {
"$add": [
{ "$cond": {
"if": {
"$eq": [
{ "$size": { "$setIntersection": [ "$my_list", sample ] }},
{ "$size": { "$literal": sample } }
]
},
"then": 100,
"else": 0
}},
{ "$sum": {
"$map": {
"input": "$my_list",
"as": "ml",
"in": {
"$multiply": [
{ "$indexOfArray": [
{ "$reverseArray": "$my_list" },
"$$ml"
]},
{ "$indexOfArray": [
{ "$reverseArray": { "$literal": sample } },
"$$ml"
]}
]
}
}
}}
]
}
}},
{ "$sort": { "score": -1 } }
])
Would return the documents in order like this:
/* 1 */
{
"_id" : 1.0,
"my_list" : [ "A", 2, 6, 8, 34, 90],
"score" : 155.0
}
/* 2 */
{
"_id" : 2.0,
"my_list" : ["A", "F", 2, 6, 19, 8, 90, 55],
"score" : 62.0
}
/* 3 */
{
"_id" : 3.0,
"my_list" : [ 90, 34, 8, 6, 3, "A"],
"score" : 15.0
}
The key is that, with $reverseArray applied, $indexOfArray produces larger values for elements nearer the start of the original array, which gives more "weight" to matches at the beginning of the array than to those towards the end.
Of course you should take into account that the second document does in fact contain "most" of the matches, and that having more array entries places a "larger" weight on its initial matches than in the first document.
From the above "A" scores more in the second document than in the first because the array is longer even though both matched "A" in the first position. However there is also some effect that "F" is a mismatch and therefore has a greater negative effect than it would if it was later in the array. Same applies to "A" in the last document, where at the end of the array the match has little bearing on the overall weight.
The counter to this in consideration is to add some logic to consider the "exact match" case, such as here the $size comparison from the $setIntersection of the sample and the current array. This would adjust the scores to ensure that something that matched all provided elements actually scored higher than a document with less positional matches, but more elements overall.
With a "score" in place you can then trim the results ( e.g. with $limit ) or apply whatever other logic you need in order to return only the results actually wanted. But the first step is calculating a "score" to work from.
So it's all generally subjective to what logic actually means a "nearest match", but the $reverseArray and $indexOfArray operations are generally key to putting "more weight" on the earlier index matches rather than the last.
Overall you are looking for "calculation" of logic. The aggregation framework has some of the available operators, but which ones actually apply is up to your end implementation. I'm just showing something that "logically works" to put more weight on "earlier matches" in an array comparison rather than "latter matches", and of course the "most weight" where the arrays are actually the same.
NOTE: Similar logic could be achieved using the includeArrayIndex option of $unwind for earlier versions of MongoDB without the main operators used above. However, the process does require usage of $unwind to deconstruct arrays in the first place, and the performance hit this would incur would probably negate the effectiveness of the operation.
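To check the arithmetic behind the scores, the same calculation can be sketched in plain JavaScript. The score function below is a hypothetical stand-in for the $addFields stage, assuming Array.prototype.indexOf behaves like $indexOfArray here (first match, -1 when absent) and that a Set stands in for $setIntersection.

```javascript
// Hypothetical re-implementation of the "score" expression, for checking only.
function score(myList, sample) {
  const revList = [...myList].reverse();
  const revSample = [...sample].reverse();

  // 100-point bonus when every sample element also occurs in the document array.
  const common = new Set(sample.filter((v) => myList.includes(v)));
  const bonus = common.size === sample.length ? 100 : 0;

  // Sum of products of the reversed indices; a value missing from the
  // sample contributes a negative product because indexOf returns -1.
  const positional = myList.reduce(
    (sum, v) => sum + revList.indexOf(v) * revSample.indexOf(v),
    0
  );
  return bonus + positional;
}

const sample = ["A", 2, 6, 8, 34, 90];
console.log(score(["A", 2, 6, 8, 34, 90], sample));           // 155
console.log(score(["A", "F", 2, 6, 19, 8, 90, 55], sample));  // 62
console.log(score([90, 34, 8, 6, 3, "A"], sample));           // 15
```

In the second document "A" contributes 7 * 5 = 35 against 5 * 5 = 25 in the first, while "F" contributes 6 * -1 = -6, which is the effect described above.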

Sort a mongo collection by a string field [duplicate]

I want to sort a collection by putting items with specific values before other items.
For example I want all the items with "getthisfirst": "yes" to be before all the others.
{"getthisfirst": "yes"}
{"getthisfirst": "yes"}
{"getthisfirst": "no"}
{"getthisfirst": "maybe"}
This as a general concept is called "weighting". So without any other mechanism in place, then you handle this logically in a MongoDB query by "projecting" the values for the "weight" into the document logically.
Your method for "projecting" and altering the fields present in your document is the .aggregate() method, and specifically its $project pipeline stage:
db.collection.aggregate([
{ "$project": {
"getthisfirst": 1,
"weight": {
"$cond": [
{ "$eq": [ "$getthisfirst", "yes" ] },
10,
{ "$cond": [
{ "$eq": [ "$getthisfirst", "maybe" ] },
5,
0
]}
]
}
}},
{ "$sort": { "weight": -1 } }
]);
The $cond operator here is a "ternary" ( if/then/else ) condition where the first argument is a conditional statement evaluating to boolean true|false. If true, "then" the second argument is returned as the result; otherwise the "else" or third argument is returned.
In this "nested" case, where "yes" is a match a certain "weight" score is assigned; otherwise we move on to the next condition test, where a match on "maybe" assigns another score, or otherwise the score is 0 since we only have three possibilities to match.
Then the $sort condition is applied in order to, well, "order" ( in descending order ) the results with the largest "weight" on top.
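The same weighting idea can be sketched in plain JavaScript, which may help when reading the nested $cond. The weight function and the sample array order are assumptions made for illustration; only the 10 / 5 / 0 values come from the pipeline above.

```javascript
// Hypothetical stand-in for the nested $cond: "yes" -> 10, "maybe" -> 5, else 0.
function weight(value) {
  return value === "yes" ? 10 : value === "maybe" ? 5 : 0;
}

const docs = [
  { getthisfirst: "no" },
  { getthisfirst: "yes" },
  { getthisfirst: "maybe" },
  { getthisfirst: "yes" },
];

// Equivalent of $project (add the weight) followed by $sort: { weight: -1 }.
const sorted = docs
  .map((d) => ({ ...d, weight: weight(d.getthisfirst) }))
  .sort((a, b) => b.weight - a.weight);

console.log(sorted.map((d) => d.getthisfirst)); // [ 'yes', 'yes', 'maybe', 'no' ]
```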

retrieve only first two elements from an array in mongoose also if there is only one then to get only that element?

I have a field that is an array of strings, and I want to get only the first two strings from the array. Can someone please tell me how to do that? Also, suppose there is only one photo: how do I get only that first element? I have read about projection and the slice operators but couldn't figure out what to use and how. If there is only one element, would I first have to calculate the size of the array, check whether it is greater than 2, and then get the first two elements, otherwise get only one element?
The operator is of course $slice to return just the required elements by indexed positions.
Consider the following sample:
{ "list" : [ 1, 2, 3 ] }
{ "list" : [ 1, 2 ] }
{ "list" : [ 1 ] }
{ "list" : [ ] }
If you then use the projection part of a query like so:
db.collection.find({}, { "list": { "$slice": [0,2] } })
Then you are asking for the two elements starting from the 0 index position, which is the first two elements.
Then the result is:
{ "list" : [ 1, 2 ] }
{ "list" : [ 1, 2 ] }
{ "list" : [ 1 ] }
{ "list" : [ ] }
So it just does not care how many elements are actually there; it simply retrieves the elements requested.
Language or framework makes no difference. The operator issued to MongoDB is the only thing that matters. Typically, all "projection" ( which is where you use the operator ) is handled in the second argument to a .find() or similar operation.
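As a rough client-side illustration of why no size check is needed, here is a plain JavaScript sketch over the same sample data. firstTwo is a hypothetical helper, and Array.prototype.slice is used only as an analogy for the [0, 2] projection.

```javascript
const docs = [{ list: [1, 2, 3] }, { list: [1, 2] }, { list: [1] }, { list: [] }];

// Hypothetical helper: take up to two elements from the start of the array.
// Shorter arrays simply yield fewer elements, with no length test.
const firstTwo = (list) => list.slice(0, 2);

console.log(docs.map((d) => firstTwo(d.list))); // [ [ 1, 2 ], [ 1, 2 ], [ 1 ], [] ]
```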

Is there a way to prevent mongo queries "branching" on arrays?

If I have the following documents:
{a: {x:1}} // without array
{a: [{x:1}]} // with array
Is there a way to query for {'a.x':1} that will return the first one but not the second one? IE, I want the document where a.x is 1, and a is not an array.
Please note that a future version of MongoDB will incorporate the $isArray aggregation expression. In the meantime...
...the following code will do the trick as the $elemMatch operator matches only documents having an array field:
> db.test.find({"a.x": 1, "a": {$not: {$elemMatch: {x:1}}}})
Given that dataset:
> db.test.find({},{_id:0})
{ "a" : { "x" : 1 } }
{ "a" : [ { "x" : 1 } ] }
{ "a" : [ { "x" : 0 }, { "x" : 1 } ] }
It will return:
> db.test.find({"a.x": 1, "a": {$not: {$elemMatch: {x:1}}}}, {_id:0})
{ "a" : { "x" : 1 } }
Please note this should be considered a short-term solution. The MongoDB team took great care to ensure that [{x:1}] and {x:1} behave the same (see dot-notation or $type for arrays). So you should consider that at some point in the future, $elemMatch might be updated (see JIRA issue SERVER-6050). In the meantime, it may be worth fixing your data model so that it is no longer necessary to distinguish between an array containing one subdocument and a bare subdocument.
You can do this by adding a second term that ensures a has no elements. That second term will always be true when a is a plain subdoc, and always false when a is an array (as otherwise the first term wouldn't have matched).
db.test.find({'a.x': 1, 'a.0': {$exists: false}})

Mongodb query with fields in the same documents

I have the following json:
{
"a1": {"a": "b"},
"a2": {"a": "c"}
}
How can I request all documents where a1 and a2 are not equal in the same document?
You could use $where:
db.myCollection.find( { $where: "this.a1.a != this.a2.a" } )
However, be aware that this won't be very fast, because it will have to spin up the JavaScript engine, iterate over every document, and check the condition for each.
If you need to do this query for large collections, or very often, it's best to introduce a denormalized flag, like areEqual. Still, such low-selectivity fields don't yield good index performance, because the candidate set is still large.
update
Using the new $expr operator, available as of MongoDB 3.6, you can use aggregation expressions in a find query like this:
db.myCollection.find({$expr: {$ne: ["$a1.a", "$a2.a"] } });
Although this comment solves the problem, I think a better match for this use case would be to use $addFields operator available as of version 3.4 instead of $project.
db.myCollection.aggregate([
{"$match":{"a1":{"$exists":true},"a2":{"$exists":true}}},
{"$addFields": {
"aEq": {"$eq":["$a1.a","$a2.a"]}
}
},
{"$match":{"aEq": false}}
]);
To avoid JavaScript use the aggregation framework:
db.myCollection.aggregate([
{"$match":{"a1":{"$exists":true},"a2":{"$exists":true}}},
{"$project": {
"a1":1,
"a2":1,
"aCmp": {"$cmp":["$a1.a","$a2.a"]}
}
},
{"$match":{"aCmp":0}}
])
On our development server the equivalent JavaScript query takes 7x longer to complete.
Update (10 May 2017)
I just realized my answer didn't answer the question, which wanted values that are not equal (sometimes I'm really slow). This will work for that:
db.myCollection.aggregate([
{"$match":{"a1":{"$exists":true},"a2":{"$exists":true}}},
{"$project": {
"a1":1,
"a2":1,
"aEq": {"$eq":["$a1.a","$a2.a"]}
}
},
{"$match":{"aEq": false}}
])
$ne could be used in place of $eq if the match condition was changed to true but I find using $eq with false to be more intuitive.
MongoDB uses JavaScript in the background, so
{"a": "b"} == {"a": "b"}
would be false.
So to compare them you would have to compare the inner values: a1.a == a2.a
To do this in MongoDB you would use the $where operator
db.myCollection.find({$where: "this.a1.a != this.a2.a"});
This assumes that each embedded document will have a property "a". If that isn't the case things get more complicated.
Starting in Mongo 4.4, for those that want to compare sub-documents and not only primitive values (since {"a": "b"} == {"a": "b"} is false), we can use the new $function aggregation operator that allows applying a custom javascript function:
// { "a1" : { "x" : 1, "y" : 2 }, "a2" : { "x" : 1, "y" : 2 } }
// { "a1" : { "x" : 1, "y" : 2 }, "a2" : { "x" : 3, "y" : 2 } }
db.collection.aggregate(
{ $match:
{ $expr:
{ $function: {
body: function(a1, a2) { return JSON.stringify(a1) != JSON.stringify(a2); },
args: ["$a1", "$a2"],
lang: "js"
}}
}
}
)
// { "a1" : { "x" : 1, "y" : 2 }, "a2" : { "x" : 3, "y" : 2 } }
$function takes 3 parameters:
body, which is the function to apply, whose parameters are the two fields to compare.
args, which contains the fields from the record that the body function takes as parameter. In our case, both "$a1" and "$a2".
lang, which is the language in which the body function is written. Only js is currently available.
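One thing to watch with a JSON.stringify comparison is that it depends on key order. The sketch below, in plain JavaScript, shows the effect; stableStringify is a hypothetical helper that sorts keys before serializing, assuming plain JSON-like values and assuming you want key order to be ignored at all.

```javascript
const left = { x: 1, y: 2 };
const right = { y: 2, x: 1 };

// Same keys and values, different key order: the strings differ.
console.log(JSON.stringify(left) === JSON.stringify(right)); // false

// Hypothetical helper: serialize with object keys sorted at every level.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(stableStringify).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const parts = Object.keys(value)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + stableStringify(value[k]));
    return "{" + parts.join(",") + "}";
  }
  return JSON.stringify(value);
}

console.log(stableStringify(left) === stableStringify(right)); // true
```

A function like this could be used as the body passed to $function, but whether reordered keys should count as equal is a design choice for your data.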
Thanks all for solving my problem -- concerning the answers that use aggregate(), one thing that confused me at first is that $eq (or $in, or lots of other operators) has different meaning depending on where it is used. In a find(), or the $match phase of aggregation, $eq takes a single value, and selects matching documents:
db.items.aggregate([{$match: {_id: {$eq: ObjectId("5be5feb45da16064c88e23d4")}}}])
However, in the $project phase of aggregation, $eq takes an Array of 2 expressions, and makes a new field with value true or false:
db.items.aggregate([{$project: {new_field: {$eq: ["$_id", "$foreignID"]}}}])
In passing, here's the query I used in my project to find all items whose list of linked items (due to a bug) linked to themselves:
db.items.aggregate([{$project: {idIn: {$in: ["$_id","$header.links"]}, "header.links": 1}}, {$match: {idIn: true}}])