Building $in array containing both strings and regex patterns - mongodb

I have a Mongo collection where every document has a sources array property. Searches on this property can combine exact matches and regex matches. For example, in the Mongo shell, the query below finds documents where a sources item equals 'gas valves' OR contains a word starting with 'hose'. This works just as I expect.
db.notice.find({sources:{$in:[/\bhose/i,'gas valves']}})
Things get a little trickier in mgo. Because some items in the $in array can be regex and the others strings, the only way I have found to build the query is by using $or:
var regEx []bson.RegEx
var matches []string
// do stuff to populate regEx and matches
filter["$or"] = []bson.M{
{"sources":bson.M{"$in":regEx}},
{"sources":bson.M{"$in":matches}},
}
Is there some way I could construct one slice with both regex and string values to use with $in, eliminating the need for the $or?

Use []interface{}:
matches := []interface{}{
    bson.RegEx{Pattern: "jo.+", Options: "i"},
    "David",
    "Jenna",
}
db.C("people").Find(bson.M{"name": bson.M{"$in": matches}})
[] means slice and interface{} means any type. Put together, []interface{} is a slice of any type.

Related

MongoDB should report error when negative integer is used in dot notation?

MongoDB allows dot notation in queries on JSON sub-keys or array elements (see ref1 or ref2). For instance, if a is an array in the documents, the following query:
db.c.find({"a.1": "foo"})
returns all documents in which the 2nd element of the a array is the string "foo".
So far, so good.
What is a bit surprising is that MongoDB accepts negative values for the index, e.g.:
db.c.find({"a.-1": "foo"})
That doesn't return anything (which makes sense if the syntax is unsupported), but I wonder why MongoDB doesn't report an error for this operation, or whether it has some meaning after all. The documentation (as far as I've checked) doesn't give any clue.
Any information on this is welcome!
That is not an error. The BSON spec defines a key name as
Zero or more modified UTF-8 encoded characters followed by '\x00'. The (byte*) MUST NOT contain '\x00', hence it is not full UTF-8.
Since "-1" is a valid string by that definition, it is a valid key name.
Quick demo:
> db.test.find({"a.-1":{$exists:true}})
{ "_id" : 0, "a" : { "-1" : 3 } }
Also note how that spec defines array:
Array - The document for an array is a normal BSON document with integer values for the keys, starting with 0 and continuing sequentially. For example, the array ['red', 'blue'] would be encoded as the document {'0': 'red', '1': 'blue'}. The keys must be in ascending numerical order.
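To make the two spec quotes above a bit more concrete, here is a small plain-JavaScript sketch (no MongoDB involved). It only models the idea that an array is stored as a document keyed by "0", "1", ... and that "-1" is an ordinary key in an embedded document. The names arrayAsDocument, docWithNegativeKey, docWithArray and hasOwnKey are made up for this illustration.

```javascript
// An array viewed as a document with string keys "0", "1", ...
// (mirrors the ['red', 'blue'] example from the BSON spec quote).
const arrayAsDocument = Object.assign({}, ["red", "blue"]);
console.log(JSON.stringify(arrayAsDocument));

// "-1" is a perfectly ordinary key name in an embedded document.
const docWithNegativeKey = { _id: 0, a: { "-1": 3 } };
// An array only ever has the non-negative keys "0", "1", ...
const docWithArray = { _id: 1, a: ["foo"] };

// Hypothetical helper: does the container have this exact key?
function hasOwnKey(container, key) {
  return Object.prototype.hasOwnProperty.call(container, key);
}

console.log(hasOwnKey(docWithNegativeKey.a, "-1")); // key exists in the sub-document
console.log(hasOwnKey(docWithArray.a, "-1"));       // no such key in an array
console.log(hasOwnKey(docWithArray.a, "0"));        // first element
```

Read this way, "a.-1" is simply a key lookup that arrays never satisfy, which would explain an empty result rather than an error.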

MongoDB aggregation split string with a regex

I want to split a string given a regex within a MongoDB aggregation
The documentation says :
The $split operator returns an array. The <string expression> and <delimiter>
inputs must both be strings. Otherwise, the operation
fails with an error.
Do you know how to perform the same thing with a regex?
Ideally, I would like to keep each delimiter together with the part that follows it.
Here is an example of data :
input = "[part1]
aaaa
[part2]
bbbb
[part3]
cccc"
regex = r"(?:^|\n)\[part\d+\]"
output = ["[part1]
aaaa",
"[part2]
bbbb",
"[part3]
cccc"]
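For reference, the desired result can be reproduced outside the aggregation pipeline in plain JavaScript by splitting on a newline that is followed by a [partN] header (a lookahead keeps the header with the text after it). This is only a client-side sketch of the expected output, not an aggregation solution; sampleInput and splitKeepingDelimiter are names invented here. Inside a pipeline, an operator such as $regexFindAll (MongoDB 4.2 and later) might be a starting point, but that is not shown or tested here.

```javascript
// The sample data from the question, written as one string.
const sampleInput = "[part1]\naaaa\n[part2]\nbbbb\n[part3]\ncccc";

// Split on each newline that is immediately followed by "[part<digits>]".
// The lookahead (?=...) does not consume the header, so it stays
// attached to the chunk that follows it.
function splitKeepingDelimiter(text) {
  return text.split(/\n(?=\[part\d+\])/);
}

const splitParts = splitKeepingDelimiter(sampleInput);
console.log(splitParts);
```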

$expr query operator does not seem to work with array dot notation

I need to find documents in a collection matching a comparison between two dates, one of them being in an array in the document.
I found a solution using $arrayElemAt in an aggregation pipeline, but I remain frustrated by the problem.
I originally thought the problem came from how I used the $lte operator with Date fields but I managed to narrow down the issue to a much simpler reproducible one.
db.getCollection('test-expr').insert({
"value1" : 1,
"value2" : 1,
"values" : [
1
]
})
Now trying to find the document with $expr:
db.getCollection('test-expr').find({$expr: {$eq: ["$value1", "$value2"]}})
=> Returns the document I just inserted.
Trying to compare the first element of the array using dot notation:
db.getCollection('test-expr').find({$expr: {$eq: ["$value1", "$values.0"]}})
=> Returns nothing.
I think maybe the dot notation doesn't work in $expr but I couldn't find anything pointing that out in the MongoDB documentation.
As mentioned before, I have found solutions to answer my problem but would like to understand why it cannot be achieved this way.
Am I missing something?
$expr allows only Aggregation Expression Operators. The dot notation you are using will not access the array element of the field "values" : [ 1 ]. You need to use the $arrayElemAt operator instead, and that works fine.
Your code find({$expr: {$eq: ["$value1", "$value2"]}}) worked because $expr used the aggregation expression operator $eq, not the MongoDB Query Language (MQL) operator $eq. Note that the two operators look alike, but their usage and syntax are different.
And the code find({$expr: {$eq: ["$value1", "$values.0"]}}) did not work, as expected. In the aggregation field path $values.0, the 0 is interpreted as a field name, not as an index into an array field.
I think maybe the dot notation doesn't work in $expr but I couldn't
find anything pointing that out in the MongoDB documentation.
The dot notation works fine in $expr also. Here is an example, with sample document:
{ "_id" : 1, "val" : { "0" : 99, "a" : 11 } }
Now using the $expr and dot notation:
db.test.find({ $expr: { $eq: [ "$val.0", 99 ] } } )
db.test.find({ $expr: { $eq: [ "$val.a", 11 ] } } )
Both queries return the document: the match happens with the filter using $expr and dot notation. But this is valid only with embedded (or sub) documents, not with array fields.
From the documentation, Aggregation Pipeline Operators says:
These expression operators are available to construct expressions for use in the aggregation pipeline stages.
expressions:
Expressions can include field paths, literals, system variables,
expression objects, and expression operators. Expressions can be
nested.
Field Paths
Aggregation expressions use field path to access fields in the input
documents. To specify a field path, prefix the field name or the
dotted field name (if the field is in the embedded document) with a
dollar sign $. For example, "$user" to specify the field path for the
user field or "$user.name" to specify the field path to "user.name"
field.
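As a rough illustration of the point above, here is a toy field path resolver in plain JavaScript. It is a simplified model based on my understanding of how aggregation field paths behave, not MongoDB's actual implementation, and resolveFieldPathToy is a made-up name. Every path component is treated as a field name; when the current value is an array, the name is looked up in each embedded document of that array and never used as an index.

```javascript
// Toy model: resolve a dotted field path the way the answer describes,
// i.e. every component is a field NAME, never an array index.
function resolveFieldPathToy(doc, path) {
  let current = doc;
  for (const name of path.split(".")) {
    if (Array.isArray(current)) {
      // Look the name up in each embedded document; scalars are skipped.
      current = current
        .filter((el) => el !== null && typeof el === "object" &&
                        !Array.isArray(el) && name in el)
        .map((el) => el[name]);
    } else if (current !== null && typeof current === "object") {
      current = current[name];
    } else {
      return undefined; // cannot descend into a scalar
    }
  }
  return current;
}

// "values" is an array of scalars, so "values.0" finds no field named "0".
console.log(resolveFieldPathToy({ value1: 1, values: [1] }, "values.0"));
// "val" is an embedded document with a field literally named "0".
console.log(resolveFieldPathToy({ val: { "0": 99, a: 11 } }, "val.0"));
```

In this model the first lookup yields an empty array, which is not equal to value1, so the $eq comparison fails; the second yields 99.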

Behaviour of the project stage operator in projecting Arrays

My question is closely related to this, but not similar.
I have a sample document in my collection:
db.t.insert({"a":1,"b":2});
My intent is to project a field named combined, of type array, containing the values of both a and b together ([1,2]).
I simply try to aggregate with a $project stage:
db.t.aggregate([
{$project:{"combined":[]}}
])
MongoDB throws an error: disallowed field type Array in object expression.
This suggests that a field cannot be projected as an array.
But when I use a $cond operator to project an array, the field gets projected.
db.t.aggregate([
{$project:{"combined":{$cond:[{$eq:[1,1]},["$a","$b"],"$a"]}}}
])
I get the output: {"combined" : [ "$a", "$b" ] }.
Notice that in the output, $a and $b are treated as literals and not as field paths.
Can anyone please explain this behavior? When I make the condition fail,
db.t.aggregate([
{$project:{"combined":{$cond:[{$eq:[1,2]},["$a","$b"],"$a"]}}}
])
I get the expected output where $a is treated as a field path, since $a is not enclosed as an array element.
I've run into this before too and it's annoying, but it's actually working as documented for the literal ["$a", "$b"]; why the first query fails with the disallowed field type error is less clear. To see this, you have to follow the grammar of the $project stage, which is spread out across the documentation. I'll try to do that here. Starting at $project,
The $project stage has the following prototype form:
{ $project: { <specifications> } }
and specifications can be one of the following:
1. <field> : <1 or true or 0 or false>
2. <field> : <expression>
What's an expression? From aggregation expressions,
Expressions can include field paths and system variables, literals, expression objects, and operator expressions.
What are each of those things? A field path/system variable should be familiar: it's a string literal prefixed with $ or $$. An expression object has the form
{ <field1>: <expression1>, ... }
while an operator expression has one of the forms
{ <operator>: [ <argument1>, <argument2> ... ] }
{ <operator>: <argument> }
for some enumerated list of values for <operator>.
What's an <argument>? The documentation isn't clear on it, but from my experience I think it's any expression, subject to the syntax rules of the given operator (examine the operator expression "cond" : ... in the question).
Arrays fit in only as containers for argument lists and as literals. Literals are literals - their content is not evaluated for field paths or system variables, which is why the array literal argument in the $cond ends up with the value [ "$a", "$b" ]. The expressions in the argument array are evaluated.
The first error about Array being a disallowed value type is a bit odd to me, since an array literal is a valid expression, so according to the documentation it can be a value in an object expression. I don't see any ambiguity in parsing it as part of an object expression, either. It looks like it's just a rule they made to make the parsing easier? You can "dodge" it using $literal to put in a constant array value:
db.collection.aggregate([{ "$project" : { "combined" : { "$literal" : [1, 2] } } }])
I hope this helps explain why things work this way. I was surprised the first time I tried to do something like [ "$a", "$b" ] and it didn't work as I expected. It'd be nice if there were a feature to pack field paths into an array, at least. I've found uses for it when $grouping on ordered pairs of values, as well.
There's a JIRA ticket, SERVER-8141, requesting an $array operator to help with cases like this.
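The grammar walk-through above can be mimicked with a tiny toy evaluator in plain JavaScript. It only models the behavior described in this answer (operator arguments are evaluated, array values are taken as literals) and supports just $eq and $cond; it is not how the server is implemented, and evaluateToy is a name invented for this sketch.

```javascript
// Toy evaluator for the rules described above.
function evaluateToy(expr, doc) {
  // Field path: a string prefixed with "$".
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)];
  }
  // Array used as a VALUE: treated as a literal, contents not evaluated.
  if (Array.isArray(expr)) {
    return expr;
  }
  // Operator expression: { <operator>: [ <argument1>, <argument2>, ... ] }
  if (expr !== null && typeof expr === "object") {
    const [op] = Object.keys(expr);
    // The argument LIST is walked and each argument is evaluated.
    const args = expr[op].map((arg) => evaluateToy(arg, doc));
    if (op === "$eq") return args[0] === args[1];
    if (op === "$cond") return args[0] ? args[1] : args[2];
    throw new Error("unsupported operator in this toy model: " + op);
  }
  return expr; // any other literal
}

const sampleDoc = { a: 1, b: 2 };
// Condition true: the array literal comes back untouched.
console.log(evaluateToy({ $cond: [{ $eq: [1, 1] }, ["$a", "$b"], "$a"] }, sampleDoc));
// Condition false: "$a" is a field path and resolves to the value of a.
console.log(evaluateToy({ $cond: [{ $eq: [1, 2] }, ["$a", "$b"], "$a"] }, sampleDoc));
```

The two calls correspond to the two $cond pipelines in the question.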

Searching with multiple keys and "begins with"

What's the best way to perform the following type of search in a collection named "things":
mylist = ['lak', 'dodg', 'ang']
and the return could be:
["lake", "Lakers", "laky", "dodge", "Dodgers", "Angels", "angle"]
Would I need to perform a separate query for each?
To do this, use the MongoDB $in operator to search for all things that match something in your array.
The command you would use would be:
db.things.find( {name: { $in: mylist }} )
But for this to work, the array needs to contain regular expressions. You can either define them directly in the array or, if you want to keep the strings, create another array by looping through the strings and building a regex from each one.
mylist = [/^lak/i, /^dodg/i, /^ang/i]
The ^ makes the pattern match only values that begin with the given string, and the i at the end makes the search case insensitive.
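If you would rather keep the plain strings and derive the patterns, a sketch along these lines could work. The helper names escapeRegExp and toPrefixPatterns are made up here, and the escaping step is an extra precaution (an assumption on my part) in case a search term contains regex metacharacters. The candidateNames array only simulates documents so the snippet runs without a database.

```javascript
// Escape regex metacharacters so a term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Turn each prefix into an anchored, case-insensitive pattern.
function toPrefixPatterns(prefixes) {
  return prefixes.map((prefix) => new RegExp("^" + escapeRegExp(prefix), "i"));
}

const prefixPatterns = toPrefixPatterns(["lak", "dodg", "ang"]);

// In the shell this array would go into the query, e.g.
//   db.things.find({ name: { $in: prefixPatterns } })
// Below, the same patterns are checked against sample names locally.
const candidateNames = ["lake", "Lakers", "laky", "dodge", "Dodgers",
                        "Angels", "angle", "slake"];
const matchedNames = candidateNames.filter((name) =>
  prefixPatterns.some((pattern) => pattern.test(name)));
console.log(matchedNames);
```

"slake" is included as a negative case: it contains "lak" but does not begin with it, so the anchored pattern skips it.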