MongoDB aggregation: split a string with a regex

I want to split a string using a regex within a MongoDB aggregation.
The documentation for $split says:
The $split operator returns an array. The <string expression> and <delimiter> inputs must both be strings. Otherwise, the operation fails with an error.
Do you know how to do the same thing with a regex?
Ideally, I would like to keep each delimiter attached to the piece that follows it.
Here is an example of the data:
input = "[part1]
aaaa
[part2]
bbbb
[part3]
cccc"
regex = r"(?:^|\n)\[part\d+\]"
output = ["[part1]
aaaa",
"[part2]
bbbb",
"[part3]
cccc"]
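MongoDB's $split takes only a literal delimiter, so one hedged option is to fetch the field and split it in application code. The Python sketch below shows the semantics asked for; the helper name split_keep_delimiter is hypothetical. It splits on the newline that precedes each "[partN]" header. The lookahead (?=...) is zero-width, so the header is not consumed and stays attached to the following piece.

```python
import re

# Match only the newline that comes right before a "[partN]" header.
# The lookahead (?=...) does not consume the header, so it stays with the next piece.
HEADER_BOUNDARY = re.compile(r"\n(?=\[part\d+\])")


def split_keep_delimiter(text):
    """Split text before each "[partN]" header, keeping the header with its section."""
    return HEADER_BOUNDARY.split(text)


sample = "[part1]\naaaa\n[part2]\nbbbb\n[part3]\ncccc"
print(split_keep_delimiter(sample))
# ['[part1]\naaaa', '[part2]\nbbbb', '[part3]\ncccc']
```

Unlike the question's pattern (?:^|\n)\[part\d+\], which would consume the header itself, this splits on the newline only. It also produces no empty first element when the text starts with a header.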

Related

Regex expression in q to match specific integer range following string

Using q’s like function, how can we achieve the following match using a single regex string regstr?
q) ("foo7"; "foo8"; "foo9"; "foo10"; "foo11"; "foo12"; "foo13") like regstr
>>> 0111110b
That is, like regstr should match the foo-strings that end in the numbers 8, 9, 10, 11 and 12.
Using regstr:"foo[8-12]" confuses the square brackets (how does it interpret this?) since 12 is not a single digit, while regstr:"foo[1[0-2]|[1-9]]" returns a type error, even without the foo-string complication.
As other comments and answers mention, this can't be done with a single regex. An alternative is to construct the list of strings that you want to compare against:
q)str:("foo7";"foo8";"foo9";"foo10";"foo11";"foo12";"foo13")
q)match:{x in y,/:string z[0]+til 1+neg(-/)z}
q)match[str;"foo";8 12]
0111110b
If your eventual goal is to filter on the matching entries, you can replace in with inter:
q)match:{x inter y,/:string z[0]+til 1+neg(-/)z}
q)match[str;"foo";8 12]
"foo8"
"foo9"
"foo10"
"foo11"
"foo12"
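The same enumerate-then-compare idea, sketched in Python for comparison (match_by_enumeration is a hypothetical name): build the exact strings prefix8 through prefix12, then either flag membership or keep only the matching items, which roughly mirrors the in and inter versions above.

```python
def match_by_enumeration(strings, prefix, lo, hi):
    """Return one boolean per string: True if it equals prefix + str(n) for lo <= n <= hi."""
    targets = {prefix + str(n) for n in range(lo, hi + 1)}  # e.g. {"foo8", ..., "foo12"}
    return [s in targets for s in strings]


strs = ["foo7", "foo8", "foo9", "foo10", "foo11", "foo12", "foo13"]
print(match_by_enumeration(strs, "foo", 8, 12))
# [False, True, True, True, True, True, False]

# Filtering instead of flagging, similar to using inter in q:
print([s for s, ok in zip(strs, match_by_enumeration(strs, "foo", 8, 12)) if ok])
# ['foo8', 'foo9', 'foo10', 'foo11', 'foo12']
```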
A variation on Cillian’s method: test the prefix and numbers separately.
q)range:{x+til 1+y-x}.
q)s:"foo",/:string 82,range 7 13 / include "foo82" in tests
q)match:{min(x~/:;in[;string range y]')#'flip count[x]cut'z}
q)match["foo";8 12;] s
00111110b
Note how the unary derived functions x~/: and in[;string range y]' are paired by #' with the split strings, and min is then used to AND the results:
q)flip 3 cut's
"foo" "foo" "foo" "foo" "foo" "foo" "foo" "foo"
"82" ,"7" ,"8" ,"9" "10" "11" "12" "13"
q)("foo"~/:;in[;string range 8 12]')#'flip 3 cut's
11111111b
00111110b
Compositions rock.
As the comments state, regex support in kdb+ is extremely limited. If the possible numbers of trailing digits are known, as in the example above, you can check against several patterns at once:
q)str:("foo7"; "foo8"; "foo9"; "foo10"; "foo11"; "foo12"; "foo13"; "foo3x"; "foo123")
q)any str like/:("foo[0-9]";"foo[0-9][0-9]")
111111100b
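q's like uses glob-style patterns in which each [...] class matches exactly one character, which is why a multi-digit range cannot be expressed as a single class. Python's fnmatch.fnmatchcase uses a similar glob syntax, so the fixed-digit-count check above can be sketched as follows. The correspondence between the two pattern languages is only approximate and is an assumption for illustration.

```python
from fnmatch import fnmatchcase

# Glob patterns: each [0-9] matches exactly one digit, so one pattern per digit count.
patterns = ["foo[0-9]", "foo[0-9][0-9]"]
strs = ["foo7", "foo8", "foo9", "foo10", "foo11", "foo12", "foo13", "foo3x", "foo123"]

# Rough equivalent of q's `any str like/:patterns`: True if any pattern matches the whole string.
flags = [any(fnmatchcase(s, p) for p in patterns) for s in strs]
print(flags)
# [True, True, True, True, True, True, True, False, False]
```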
Checking for a range like 8-12 is not currently possible with kdb+ regex. One workaround is to write a function that implements this logic. The function range checks which strings in a list start with a given string and end with a number within the specified range.
range:{
/ checking for strings starting with string y
s:((c:count y)#'x)like y;
/ convert remainder of string to long, check if within range
d:("J"$c _'x)within z;
/ find strings satisfying both conditions
s&d
}
Example use:
q)range[str;"foo";8 12]
011111000b
q)str where range[str;"foo";8 12]
"foo8"
"foo9"
"foo10"
"foo11"
"foo12"
This could be made more efficient by checking the trailing digits only on the subset of strings starting with "foo".
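The same prefix-plus-range logic as the q range function, as a Python sketch (in_numeric_range is a hypothetical name). The bounds are inclusive like q's within. The sketch assumes the suffix must be plain ASCII digits. Anything else, such as "3x", counts as a non-match, similar to q's "J"$ producing a null that falls outside the range.

```python
import re


def in_numeric_range(strings, prefix, lo, hi):
    """Flag strings that start with prefix and end with an integer in [lo, hi] (inclusive)."""
    flags = []
    for s in strings:
        suffix = s[len(prefix):]
        ok = (
            s.startswith(prefix)                           # prefix check first...
            and re.fullmatch(r"[0-9]+", suffix) is not None  # ...then digits only
            and lo <= int(suffix) <= hi                    # ...then the range check
        )
        flags.append(ok)
    return flags


strs = ["foo7", "foo8", "foo9", "foo10", "foo11", "foo12", "foo13", "foo3x", "foo123"]
print(in_numeric_range(strs, "foo", 8, 12))
# [False, True, True, True, True, True, False, False, False]
```

Because `and` short-circuits, the digit and range checks run only on strings that already start with the prefix. This is the kind of optimisation the previous sentence suggests.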
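For your example, you can pad each string to a fixed width, fill the padding with a character, and then a simple pattern works:
("."^5$("foo7";"foo8";"foo9";"foo10";"foo11";"foo12";"foo13")) like "foo[189][.0-2]"
This only works for the sample data. The same pattern also matches strings such as "foo1" or "foo90", and because 5$ truncates longer strings, "foo123" matches as well.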

How to get only the numeric value from a text file using PowerShell?

I have a text file, sample.txt, containing:
computer
computer.pc = 1
pc
I want only the number 1, and I want to assign that value to a variable:
$number = Get-Content "sample.txt"
You can extract the number by using the Regex Match method.
Example code to do this:
$number = ([regex]::Match((Get-content "sample.txt"), "\d+")).Value
The pattern \d+ matches one or more decimal digits, and the Match method returns the first match found.
See Quantifiers in Regular Expressions for additional information regarding the quantifiers available.
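For comparison, here is the same first-match idea in Python. first_number is a hypothetical helper. re.search returns the first match or None, much as [regex]::Match returns the first match. Unlike .Value, which is a string, this sketch converts the digits to an int.

```python
import re


def first_number(text):
    """Return the first run of decimal digits in text as an int, or None if there is none."""
    match = re.search(r"\d+", text)  # \d+ = one or more digits; search returns the first match
    return int(match.group()) if match else None


# Reading the file from the question would look like:
#   with open("sample.txt") as f:
#       number = first_number(f.read())
print(first_number("computer\ncomputer.pc = 1\npc"))  # 1
```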

Check if string contains any of the following

I'm trying to check whether a string contains one of four substrings in a simpler way than this:
if (imageUrl.contains('.jpg') ||
imageUrl.contains('.png') ||
imageUrl.contains('.tif') ||
imageUrl.contains('.gif')) {
}
Is there a way to do this? For example checking against a list?
You can use a regex pattern instead of a simple string. Write it as a raw string (r"...") so that \. reaches the regex engine and matches a literal dot:
imageUrl.contains(RegExp(r"\.(jpg|png|tif|gif)"))
This might be somewhat simpler.
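The question also asks about checking against a list. Here is a minimal Python sketch of both options (the names are hypothetical): any() over a tuple of substrings, and a single regex with the same alternation as the answer.

```python
import re

IMAGE_EXTENSIONS = (".jpg", ".png", ".tif", ".gif")
IMAGE_PATTERN = re.compile(r"\.(jpg|png|tif|gif)")  # raw string keeps \. as an escaped dot


def contains_any(url):
    # List-based check: True if any listed substring occurs anywhere in url.
    return any(ext in url for ext in IMAGE_EXTENSIONS)


def contains_pattern(url):
    # Regex-based check: search scans the whole string for the alternation.
    return IMAGE_PATTERN.search(url) is not None


for url in ["photos/cat.png", "photos/cat.jpeg", "photos/catgif"]:
    print(url, contains_any(url), contains_pattern(url))
```

Both are "contains" checks, so a name like "x.png.txt" also matches. If only the ending should count, url.endswith(IMAGE_EXTENSIONS) accepts the tuple directly.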
Regular expressions can solve your problem; they are used to search for patterns in strings.
Some regex examples:
^The        matches any string that starts with "The"
end$        matches any string that ends with "end"
^The end$   exact string match (starts and ends with "The end")
abc*        matches any string that has "ab" followed by zero or more "c"s
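These example patterns can be tried quickly with Python's re.search. It looks for a match anywhere in the string, so only the ^ and $ anchors tie a pattern to the start or end. This is a small illustrative sketch, and the sample strings are made up.

```python
import re

examples = [
    (r"^The", "The cat sat"),         # starts with "The"
    (r"^The", "See The cat"),         # "The" is present, but not at the start
    (r"end$", "This is the end"),     # ends with "end"
    (r"^The end$", "The end"),        # the whole string is exactly "The end"
    (r"^The end$", "The end is near"),
    (r"abc*", "xaby"),                # "ab" followed by zero "c"s still matches
    (r"abc*", "acb"),                 # no "ab" anywhere
]

for pattern, text in examples:
    print(f"{pattern!r:14} {text!r:20} {bool(re.search(pattern, text))}")
```

One Python-specific detail: $ also matches just before a single trailing newline, so "The end\n" would match ^The end$ too.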

Building $in array containing both strings and regex patterns

I have a Mongo collection where every document has a sources array property. Searches on this property can combine exact matches and regexes. For example, in the Mongo shell, the query below finds documents where a sources item equals 'gas valves' OR contains a word starting with 'hose' (case-insensitively). This works just as I expect:
db.notice.find({sources:{$in:[/\bhose/i,'gas valves']}})
Things get a little trickier in mgo. Because some items in the $in array can be regexes and others strings, the only way I have found to build the query is with $or:
var regEx []bson.RegEx
var matches []string
// do stuff to populate regEx and matches
filter["$or"] = []bson.M{
{"sources":bson.M{"$in":regEx}},
{"sources":bson.M{"$in":matches}},
}
Is there some way I could construct one slice containing both regexes and strings to use with $in, eliminating the need for the $or?
Use []interface{}:
matches := []interface{}{
bson.RegEx{Pattern: "jo.+", Options: "i"},
"David",
"Jenna",
}
db.C("people").Find(bson.M{"name": bson.M{"$in": matches}})
[] means slice and interface{} means any type. Put together, []interface{} is a slice of any type.
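Here is the same idea in Python, as a sketch assuming PyMongo. A plain list can hold both compiled regular expressions and strings, and PyMongo's BSON encoder turns compiled re patterns into BSON regexes, so no $or is needed. build_sources_filter is a hypothetical helper, and the collection name notice comes from the shell example.

```python
import re


def build_sources_filter(exact_values, patterns):
    """Build a single {"sources": {"$in": [...]}} filter mixing regexes and plain strings.

    Patterns are compiled case-insensitively (like /.../i in the shell);
    exact values are passed through unchanged.
    """
    regexes = [re.compile(p, re.IGNORECASE) for p in patterns]
    return {"sources": {"$in": regexes + list(exact_values)}}


query = build_sources_filter(["gas valves"], [r"\bhose"])
print(query)
# With a PyMongo database handle (assumed, not created here): db.notice.find(query)
```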

Searching with multiple keys and "begins with"

What's the best way to perform the following type of search in a collection named "things":
mylist = ['lak', 'dodg', 'ang']
and the return could be:
["lake", "Lakers", "laky", "dodge", "Dodgers", "Angels", "angle"]
Would I need to perform a separate query for each?
To do this, use the MongoDB $in operator to find all documents whose name matches something in your array.
The query would be:
db.things.find( {name: { $in: mylist }} )
For this to work, however, the array must contain regular expressions rather than plain strings. You can either define them directly in the array or, if you want to keep the strings, create another array by looping over them and building a regex from each string:
mylist = [/^lak/i, /^dodg/i, /^ang/i]
The ^ makes each pattern match only if the name begins with the value, and the i at the end makes the search case-insensitive.
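The "loop over the strings and build a regex from each" step can be sketched in Python. prefix_patterns is a hypothetical helper, and sending the query through PyMongo is an assumption. re.escape keeps characters such as "." in a prefix from acting as regex syntax. The sample names are the expected results from the question plus a made-up "orange" that contains "ang" but does not start with it.

```python
import re


def prefix_patterns(prefixes):
    """Turn plain prefixes into anchored, case-insensitive regexes (like /^lak/i)."""
    return [re.compile("^" + re.escape(p), re.IGNORECASE) for p in prefixes]


mylist = ["lak", "dodg", "ang"]
query = {"name": {"$in": prefix_patterns(mylist)}}  # e.g. db.things.find(query) with PyMongo

# Local check of the same patterns against some sample names:
names = ["lake", "Lakers", "laky", "dodge", "Dodgers", "Angels", "angle", "orange"]
print([n for n in names if any(p.search(n) for p in prefix_patterns(mylist))])
# ['lake', 'Lakers', 'laky', 'dodge', 'Dodgers', 'Angels', 'angle']
```

The MongoDB documentation notes that case-insensitive regex queries generally cannot use indexes effectively, whereas a case-sensitive prefix such as /^lak/ can.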