Should MongoDB report an error when a negative integer is used in dot notation?

MongoDB lets you use dot notation to query JSON sub-keys or array elements (see ref1 or ref2). For instance, if a is an array in the documents, the following query:
db.c.find({"a.1": "foo"})
returns all documents in which the 2nd element of the a array is the string "foo".
So far, so good.
What is a bit surprising is that MongoDB accepts negative values for the index, e.g.:
db.c.find({"a.-1": "foo"})
That doesn't return anything (which makes sense if it is unsupported syntax), but I wonder why MongoDB doesn't return an error for this operation, or whether it has some meaning after all. The documentation (as far as I've checked) doesn't provide any clue.
Any information on this is welcome!

That is not an error. The BSON spec defines a key name as
Zero or more modified UTF-8 encoded characters followed by '\x00'. The (byte*) MUST NOT contain '\x00', hence it is not full UTF-8.
Since "-1" is a valid string by that definition, it is a valid key name.
Quick demo:
> db.test.find({"a.-1":{$exists:true}})
{ "_id" : 0, "a" : { "-1" : 3 } }
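To make the spec rule concrete, here is a minimal Go sketch that checks a key name against the definition quoted above. isValidKeyName is a hypothetical helper, not a driver or server API, and it uses Go's standard UTF-8 validation as an approximation of "modified UTF-8". It checks only the BSON spec rule; MongoDB itself may apply additional field-name rules of its own.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// isValidKeyName reports whether name fits the BSON key-name rule quoted
// above: UTF-8 bytes that must not contain '\x00' (the terminator).
// Hypothetical helper for illustration only.
func isValidKeyName(name string) bool {
	return utf8.ValidString(name) && !strings.Contains(name, "\x00")
}

func main() {
	// "-1" is just another string, so it is as valid a key as "a" or "1".
	for _, k := range []string{"a", "1", "-1", "bad\x00key"} {
		fmt.Printf("%q valid: %v\n", k, isValidKeyName(k))
	}
}
```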
Also note how that spec defines array:
Array - The document for an array is a normal BSON document with integer values for the keys, starting with 0 and continuing sequentially. For example, the array ['red', 'blue'] would be encoded as the document {'0': 'red', '1': 'blue'}. The keys must be in ascending numerical order.
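A minimal Go sketch of the consequence: if an array is just a document with keys "0", "1", ..., then looking up "a.-1" in an array finds no key named "-1", while the same path matches an embedded document that really has a "-1" key. The helpers arrayAsDocument and lookup are hypothetical, and the model deliberately ignores the query engine's other array semantics (such as matching fields inside array elements).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// arrayAsDocument mirrors the BSON array encoding quoted above: a normal
// document whose keys are "0", "1", ... in ascending order.
func arrayAsDocument(arr []interface{}) map[string]interface{} {
	doc := make(map[string]interface{}, len(arr))
	for i, v := range arr {
		doc[strconv.Itoa(i)] = v
	}
	return doc
}

// lookup follows a dotted path such as "a.1" or "a.-1", consulting every
// array as the document it is encoded as. Simplified model only.
func lookup(doc map[string]interface{}, path string) (interface{}, bool) {
	var cur interface{} = doc
	for _, key := range strings.Split(path, ".") {
		if arr, ok := cur.([]interface{}); ok {
			cur = arrayAsDocument(arr)
		}
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false // scalars have no sub-keys
		}
		if cur, ok = m[key]; !ok {
			return nil, false // no such key, e.g. "-1" in an array
		}
	}
	return cur, true
}

func main() {
	withArray := map[string]interface{}{"a": []interface{}{"bar", "foo"}}
	withKey := map[string]interface{}{"a": map[string]interface{}{"-1": 3}}

	fmt.Println(lookup(withArray, "a.1"))  // second element
	fmt.Println(lookup(withArray, "a.-1")) // no key "-1" in the array document
	fmt.Println(lookup(withKey, "a.-1"))   // ordinary key named "-1"
}
```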

Related

How to interpret the indexOf expression function in Azure Data Factory

I'm trying to understand the indexOf expression function in Azure Data Factory.
Example
This example finds the starting index value for the "world" substring in the "hello world" string:
indexOf('hello world', 'world')
And returns this result: 6
I'm confused by what is meant by the 'index value' and how the example arrived at the result 6.
Also, building on the example above, what would the following expression return?
#if(greater(indexof(string(pipeline().parameters.Config),'FilenameMask'),0),pipeline().parameters.Config.FilenameMask,'')
where Config holds a value such as
{"FilenameMask":"accounts*."}
'Config' represents a field in a SQL database table.
Per the docs:
Return the starting position or index value for a substring. This function is not case-sensitive, and indexes start with the number 0.
hello world
01234567890
      ^
      +--- "world" found starting at position 6
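A short Go sketch of the same search. indexOf here is a hypothetical stand-in for the ADF function, assuming it returns -1 when the substring is absent; it compares byte offsets, so positions only line up with characters for ASCII text like this example.

```go
package main

import (
	"fmt"
	"strings"
)

// indexOf approximates the documented ADF behavior: a case-insensitive
// search returning the 0-based start position of searchText in text,
// or -1 when it is not found (an assumption; strings.Index also returns -1).
func indexOf(text, searchText string) int {
	return strings.Index(strings.ToLower(text), strings.ToLower(searchText))
}

func main() {
	fmt.Println(indexOf("hello world", "world")) // 6
	fmt.Println(indexOf("hello world", "WORLD")) // 6: case-insensitive
	fmt.Println(indexOf("hello world", "xyz"))   // -1: not found
}
```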
Regarding the second part of your question, here's the expression rewritten for a bit of clarity:
#if( greater(indexof(string(pipeline().parameters.Config),'FilenameMask'),0)
,pipeline().parameters.Config.FilenameMask
,'')
which can be read as follows:
if the index of the string "FilenameMask" within x is greater than 0 then
return x.FilenameMask
else
return an empty string
where x is pipeline().parameters.Config, which is the value of your "Config" column from the database table. It will hold values such as
{"sparkConfig":{"header":"true"},"FilenameMask":"cashsales*."}
and
{"FilenameMask":"accounts*."}
The ADF expression can also be read as follows:
if the JSON in the Config column contains a "FilenameMask" key then
return the value of the FilenameMask key
else
return an empty string
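The same logic as a minimal Go sketch, assuming Config arrives as raw JSON text (how ADF's string() serializes an object may differ in whitespace). filenameMask is a hypothetical helper. Note why the greater(..., 0) test is safe here: a JSON object always starts with {, so the key can never sit at position 0, and a missing key yields -1, which fails the test.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// filenameMask mimics
//   if(greater(indexof(string(Config),'FilenameMask'),0), Config.FilenameMask, '')
// for a Config value supplied as JSON text. Hypothetical helper.
func filenameMask(config string) string {
	// indexof is case-insensitive, so compare lower-cased strings.
	idx := strings.Index(strings.ToLower(config), strings.ToLower("FilenameMask"))
	if idx <= 0 { // greater(idx, 0) is false: return ''
		return ""
	}
	var obj map[string]interface{}
	if err := json.Unmarshal([]byte(config), &obj); err != nil {
		return ""
	}
	mask, _ := obj["FilenameMask"].(string) // Config.FilenameMask
	return mask
}

func main() {
	fmt.Printf("%q\n", filenameMask(`{"sparkConfig":{"header":"true"},"FilenameMask":"cashsales*."}`))
	fmt.Printf("%q\n", filenameMask(`{"FilenameMask":"accounts*."}`))
	fmt.Printf("%q\n", filenameMask(`{"sparkConfig":{"header":"true"}}`))
}
```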

$expr query operator does not seem to work with array dot notation

I need to find documents in a collection matching a comparison between two dates, one of which is in an array in the document.
I found a solution that combines $arrayElemAt with an aggregation pipeline, but I'm still frustrated by the original problem.
I originally thought the problem came from how I used the $lte operator with Date fields, but I managed to narrow the issue down to a much simpler reproducible case.
db.getCollection('test-expr').insert({
"value1" : 1,
"value2" : 1,
"values" : [
1
]
})
Now trying to find the document with $expr:
db.getCollection('test-expr').find({$expr: {$eq: ["$value1", "$value2"]}})
=> Returns the document I just inserted.
Trying to compare the first element of the array using dot notation:
db.getCollection('test-expr').find({$expr: {$eq: ["$value1", "$values.0"]}})
=> Returns nothing.
I think maybe the dot notation doesn't work in $expr but I couldn't find anything pointing that out in the MongoDB documentation.
As mentioned before, I have found solutions to answer my problem but would like to understand why it cannot be achieved this way.
Am I missing something?
$expr allows only aggregation expression operators. The dot notation you are using will not access the array element of the field "values" : [ 1 ]. Use the $arrayElemAt operator instead, which works as expected.
Your code find({$expr: {$eq: ["$value1", "$value2"]}}) worked because $expr used the aggregation expression operator $eq, not the MongoDB Query Language (MQL) operator $eq. The two operators look alike, but their usage and syntax differ.
The code find({$expr: {$eq: ["$value1", "$values.0"]}}), on the other hand, did not work, as expected: in the aggregation field path $values.0, the 0 is interpreted as a field name, not as an index into an array field.
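In the shell, the $arrayElemAt version of the failing filter would be {$expr: {$eq: ["$value1", {$arrayElemAt: ["$values", 0]}]}}. The Go sketch below models the operator's documented index rules (negative indexes count back from the end, out-of-range indexes produce no result); arrayElemAt is a hypothetical helper, not a driver API.

```go
package main

import "fmt"

// arrayElemAt sketches $arrayElemAt: a non-negative idx counts from the
// start, a negative idx counts back from the end, and an out-of-bounds idx
// yields no result (ok == false).
func arrayElemAt(arr []interface{}, idx int) (interface{}, bool) {
	if idx < 0 {
		idx += len(arr)
	}
	if idx < 0 || idx >= len(arr) {
		return nil, false
	}
	return arr[idx], true
}

func main() {
	doc := map[string]interface{}{"value1": 1, "value2": 1, "values": []interface{}{1}}
	first, ok := arrayElemAt(doc["values"].([]interface{}), 0)
	// Equivalent of $eq: ["$value1", {$arrayElemAt: ["$values", 0]}]
	fmt.Println(ok && first == doc["value1"])
}
```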
I think maybe the dot notation doesn't work in $expr but I couldn't find anything pointing that out in the MongoDB documentation.
Dot notation works fine in $expr too. Here is an example with a sample document:
{ "_id" : 1, "val" : { "0" : 99, "a" : 11 } }
Now using the $expr and dot notation:
db.test.find({ $expr: { $eq: [ "$val.0", 99 ] } } )
db.test.find({ $expr: { $eq: [ "$val.a", 11 ] } } )
Both queries return the document: the match happens with the filter using $expr and dot notation. However, this works only with embedded (sub-)documents, not with array fields.
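A simplified Go model of field-path resolution shows why the two documents behave differently: every segment after $ is a field name, and when the path reaches an array it is applied to each element, with only embedded documents contributing. fieldPath and resolve are hypothetical helpers based on that reading, not MongoDB's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// fieldPath resolves an aggregation-style path such as "$val.a".
func fieldPath(doc map[string]interface{}, path string) interface{} {
	return resolve(doc, strings.Split(strings.TrimPrefix(path, "$"), "."))
}

// resolve treats every segment as a field name. On an array it maps the
// remaining path over the elements and keeps results from documents only,
// so a segment like "0" never acts as an array index in this model.
func resolve(v interface{}, segs []string) interface{} {
	if len(segs) == 0 {
		return v
	}
	switch t := v.(type) {
	case map[string]interface{}:
		next, ok := t[segs[0]]
		if !ok {
			return nil // missing field
		}
		return resolve(next, segs[1:])
	case []interface{}:
		out := []interface{}{}
		for _, elem := range t {
			if _, isDoc := elem.(map[string]interface{}); isDoc {
				if r := resolve(elem, segs); r != nil {
					out = append(out, r)
				}
			}
		}
		return out
	default:
		return nil // scalars have no fields
	}
}

func main() {
	arrDoc := map[string]interface{}{"value1": 1, "values": []interface{}{1}}
	embDoc := map[string]interface{}{"_id": 1, "val": map[string]interface{}{"0": 99, "a": 11}}

	fmt.Println(fieldPath(arrDoc, "$values.0")) // empty: 1 is not a document
	fmt.Println(fieldPath(embDoc, "$val.0"))    // "0" is an ordinary key
	fmt.Println(fieldPath(embDoc, "$val.a"))
}
```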
From the documentation, Aggregation Pipeline Operators says:
These expression operators are available to construct expressions for use in the aggregation pipeline stages.
expressions:
Expressions can include field paths, literals, system variables, expression objects, and expression operators. Expressions can be nested.
Field Paths
Aggregation expressions use field path to access fields in the input documents. To specify a field path, prefix the field name or the dotted field name (if the field is in the embedded document) with a dollar sign $. For example, "$user" to specify the field path for the user field or "$user.name" to specify the field path to "user.name" field.

Building $in array containing both strings and regex patterns

I have a Mongo collection where every document has a sources array property. Searches on this property can be a combination of exact matches and regexes. For example, in the Mongo shell, the query below finds documents where a sources item equals 'gas valves' OR contains 'hose'. This works just as I expect:
db.notice.find({sources:{$in:[/\bhose/i,'gas valves']}})
Things get a little trickier in mgo. Because some items in the $in array can be regexes and others strings, the only way I have figured out to build the query is with $or:
var regEx []bson.RegEx
var matches []string
// do stuff to populate regEx and matches
filter := bson.M{}
filter["$or"] = []bson.M{
    {"sources": bson.M{"$in": regEx}},
    {"sources": bson.M{"$in": matches}},
}
Is there a way to construct one slice with both regexes and strings to use with $in, eliminating the need for $or?
Use []interface{}:
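// A []interface{} can hold bson.RegEx values and plain strings side by side.
matches := []interface{}{
    bson.RegEx{Pattern: "jo.+", Options: "i"},
    "David",
    "Jenna",
}
// db is an *mgo.Database; chain .All or .One on the query to run it.
db.C("people").Find(bson.M{"name": bson.M{"$in": matches}})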
[] denotes a slice and interface{} means any type, so []interface{} is a slice whose elements can be of any type.

How do I find the length of an associative array in AutoHotkey?

If you use the length() function on an associative array, it will return the "largest index" in use within the array. So, if you have any keys which are not integers, length() will not return the actual number of elements within your array. (And this could happen for other reasons as well.)
Is there a more useful version of length() for finding the length of an associative array?
Or do I need to actually cycle through and count each element? I'm not sure how I would do that without knowing all of the possible keys beforehand.
If you have a flat array, then Array.MaxIndex() will return the largest integer in the index. However, this isn't always the best choice, because AutoHotkey allows an array whose first index is not 1, so MaxIndex() could be misleading.
Worse yet, if your object is an associative hashtable whose keys are all strings, MaxIndex() will return an empty value.
So it's probably best to count them.
DesiredDroids := object()
DesiredDroids["C3P0"] := "Gold"
DesiredDroids["R2D2"] := "Blue&White"
; Count the key-value pairs one at a time, whatever the key type
count := 0
for key, value in DesiredDroids
    count++
MsgBox, % "We're looking for " . count . " droid" . ( count=1 ? "" : "s" ) . "."
Output
We're looking for 2 droids.
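For comparison, a Go sketch of the two approaches from this answer, with string keys standing in for AutoHotkey object keys (an assumption; AutoHotkey distinguishes integer and string keys in ways this model ignores). maxIndex and count are hypothetical helpers; in Go itself, len(m) already returns the number of entries.

```go
package main

import (
	"fmt"
	"strconv"
)

// maxIndex imitates MaxIndex() as described above: the largest integer
// key, or nothing (ok == false) when no key is an integer.
func maxIndex(m map[string]string) (int, bool) {
	best, found := 0, false
	for k := range m {
		if n, err := strconv.Atoi(k); err == nil && (!found || n > best) {
			best, found = n, true
		}
	}
	return best, found
}

// count walks every key-value pair, like the for-loop in the script.
func count(m map[string]string) int {
	n := 0
	for range m {
		n++
	}
	return n
}

func main() {
	droids := map[string]string{"C3P0": "Gold", "R2D2": "Blue&White"}
	sparse := map[string]string{"3": "x", "7": "y"}

	fmt.Println(maxIndex(droids)) // no integer keys at all
	fmt.Println(count(droids))
	fmt.Println(maxIndex(sparse)) // 7, although there are only 2 items
	fmt.Println(count(sparse))
}
```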

Behaviour of the project stage operator in projecting Arrays

My question is closely related to this one, but not the same.
I have a sample document in my collection:
db.t.insert({"a":1,"b":2});
My intent is to project a field named combined, of type array, holding the values of both a and b ([1,2]).
I simply try to aggregate with a $project stage:
db.t.aggregate([
{$project:{"combined":[]}}
])
MongoDB throws an error: disallowed field type Array in object expression.
This suggests that a field cannot be projected as an array.
But when I use a $cond operator to project an array, the field does get projected:
db.t.aggregate([
{$project:{"combined":{$cond:[{$eq:[1,1]},["$a","$b"],"$a"]}}}
])
I get the output: {"combined" : [ "$a", "$b" ] }.
Notice that in the output, $a and $b are treated as literals rather than as field paths.
Can anyone explain this behavior? When I make the condition fail,
db.t.aggregate([
{$project:{"combined":{$cond:[{$eq:[1,2]},["$a","$b"],"$a"]}}}
])
I get the expected output where $a is treated as a field path, since $a is not enclosed as an array element.
I've run into this before too and it's annoying, but it's actually working as documented for the literal ["$a", "$b"]; it's less clear why the first query fails with the disallowed field type error. To see why, you have to follow the grammar of the $project stage, which is spread out across the documentation. I'll try to do that here, starting at $project:
The $project stage has the following prototype form:
{ $project: { <specifications> } }
and specifications can be one of the following:
1. <field> : <1 or true or 0 or false>
2. <field> : <expression>
What's an expression? From aggregation expressions,
Expressions can include field paths and system variables, literals, expression objects, and operator expressions.
What are each of those things? A field path/system variable should be familiar: it's a string literal prefixed with $ or $$. An expression object has the form
{ <field1>: <expression1>, ... }
while an operator expression has one of the forms
{ <operator>: [ <argument1>, <argument2> ... ] }
{ <operator>: <argument> }
for some enumerated list of values for <operator>.
What's an <argument>? The documentation isn't clear on it, but from my experience I think it's any expression, subject to the syntax rules of the given operator (see the $cond operator expression in the question).
Arrays fit in only as containers for argument lists and as literals. Literals are literals - their content is not evaluated for field paths or system variables, which is why the array literal argument in the $cond ends up with the value [ "$a", "$b" ]. The expressions in the argument array are evaluated.
The first error about Array being a disallowed value type is a bit odd to me, since an array literal is a valid expression, so according to the documentation it can be a value in an object expression. I don't see any ambiguity in parsing it as part of an object expression, either. It looks like it's just a rule they made to make the parsing easier? You can "dodge" it using $literal to put in a constant array value:
db.collection.aggregate([{ "$project" : { "combined" : { "$literal" : [1, 2] } } }])
I hope this helps explain why things work this way. I was surprised the first time I tried to do something like [ "$a", "$b" ] and it didn't work as I expected. It'd be nice if there were a feature to pack field paths into an array, at least. I've found uses for it when $grouping on ordered pairs of values, as well.
There's a JIRA ticket, SERVER-8141, requesting an $array operator to help with cases like this.