Regex expression in q to match specific integer range following string

Regex expression in q to match specific integer range following string - kdb

Using q’s like function, how can we achieve the following match using a single regex string regstr?
q) ("foo7"; "foo8"; "foo9"; "foo10"; "foo11"; "foo12"; "foo13") like regstr
>>> 0111110b
That is, like regstr matches the foo-strings which end in the numbers 8,9,10,11,12.
Using regstr:"foo[8-12]" confuses the square brackets (how does it interpret this?) since 12 is not a single digit, while regstr:"foo[1[0-2]|[1-9]]" returns a type error, even without the foo-string complication.

As the other comments and answers mentioned, this can't be done using a single regex. Another alternative method is to construct the list of strings that you want to compare against:
q)str:("foo7";"foo8";"foo9";"foo10";"foo11";"foo12";"foo13")
q)match:{x in y,/:string z[0]+til 1+neg(-/)z}
q)match[str;"foo";8 12]
0111110b
If your eventual goal is to filter on the matching entries, you can replace in with inter:
q)match:{x inter y,/:string z[0]+til 1+neg(-/)z}
q)match[str;"foo";8 12]
"foo8"
"foo9"
"foo10"
"foo11"
"foo12"

A variation on Cillian’s method: test the prefix and numbers separately.
q)range:{x+til 1+y-x}.
q)s:"foo",/:string 82,range 7 13 / include "foo82" in tests
q)match:{min(x~/:;in[;string range y]')#'flip count[x]cut'z}
q)match["foo";8 12;] s
00111110b
Note how unary derived functions x~/: and in[;string range y]' are paired by #' to the split strings, then min used to AND the result:
q)flip 3 cut's
"foo" "foo" "foo" "foo" "foo" "foo" "foo" "foo"
"82" ,"7" ,"8" ,"9" "10" "11" "12" "13"
q)("foo"~/:;in[;string range 8 12]')#'flip 3 cut's
11111111b
00111110b
Compositions rock.

As the comments state, regex in kdb+ is extremely limited. If the number of trailing digits is known like in the example above then the following can be used to check multiple patterns
q)str:("foo7"; "foo8"; "foo9"; "foo10"; "foo11"; "foo12"; "foo13"; "foo3x"; "foo123")
q)any str like/:("foo[0-9]";"foo[0-9][0-9]")
111111100b
Checking for a range like 8-12 is not currently possible within kdb+ regex. One possible workaround is to write a function to implement this logic. The function range checks a list of strings start with a passed string and end with a number within the range specified.
range:{
/ checking for strings starting with string y
s:((c:count y)#'x)like y;
/ convert remainder of string to long, check if within range
d:("J"$c _'x)within z;
/ find strings satisfying both conditions
s&d
}
Example use:
q)range[str;"foo";8 12]
011111000b
q)str where range[str;"foo";8 12]
"foo8"
"foo9"
"foo10"
"foo11"
"foo12"
This could be made more efficient by checking the trailing digits only on the subset of strings starting with "foo".

For your example you can pad, fill with a char, and then simple regex works fine:
("."^5$("foo7";"foo8";"foo9";"foo10";"foo11";"foo12";"foo13")) like "foo[1|8-9][.|0-2]"

Related

how do we iterate and store the results in a variable in kdb

I have a string say example "https://www.google.com" and a count of paging say 5
how do i iterate the URL to append p/page=incrementing numbers of paging and store the result in a variable as list?
"https://www.google.com/page=1"
"https://www.google.com/page=2"
"https://www.google.com/page=3"
"https://www.google.com/page=4"
"https://www.google.com/page=5"
So the end result will look like this having a variable query_var which will hold a list of string example below
query_var:("https://www.google.com/page=1";"https://www.google.com/page=2";"https://www.google.com/page=3";"https://www.google.com/page=4";"https://www.google.com/page=5");
count query_var \\5

You can use the join function , with the each-right adverb /:
query_var: "https://www.google.com/page=" ,/: string 1_til 6
Make it a function to support a varied count of pages:
f:{"https://www.google.com/page=" ,/: string 1_til x+1}
q)10=count f[10]
1b

Q1
You don’t even need a lambda to make this a function. It’s a sequence of unary functions, so you can compose them.
q)f: "https://www.google.com/page=",/: string ::
q)f 1+til 6
"https://www.google.com/page=1"
"https://www.google.com/page=2"
"https://www.google.com/page=3"
"https://www.google.com/page=4"
"https://www.google.com/page=5"
"https://www.google.com/page=6"
Q2
q)("https://www.google.com";;"info/history")
enlist["https://www.google.com";;"info/history"]
q)"/"sv'("https://www.google.com";;"info/history")#/: string `aapl`msft`nftx
"https://www.google.com/aapl/info/history"
"https://www.google.com/msft/info/history"
"https://www.google.com/nftx/info/history"
List notation is syntactic sugar for enlist. The list with a missing item is a projection of enlist and can be iterated.
Again, a sequence of unaries is all composable without a lambda:
q)g: "/"sv'("https://www.google.com";;"info/history")#/: string ::
q)g `aapl`msft`nftx
"https://www.google.com/aapl/info/history"
"https://www.google.com/msft/info/history"
"https://www.google.com/nftx/info/history"

If you can assume a unique character that doesn't appear elsewhere in your url (e.g. #) then ssr is a simple and easily-readable approach:
ssr["https://www.google.com/page=#";"#";]each string 1+til 5
ssr["https://www.google.com/#/info/history";"#";]each string`aapl`msft`nftx

SPARQL how to make DAY/MONTH always return 2 digits

I have a datetime in my SPARQL-query that I want to transform to a date.
Therefore I do:
BIND(CONCAT(YEAR(?dateTime), "-",MONTH(?dateTime), "-", DAY(?dateTime)) as ?date)
This part of code works but returns for example 2022-2-3, I want it to be 2022-02-03. If the dateTime is 2022-11-23, nothing should change.

You can take the integers you get back from the YEAR, MONTH, and DAY functions and pad them with the appropriate number of zeros (after turning them into strings):
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT * WHERE {
BIND(2022 AS ?yearInt) # this would come from your YEAR(?dateTime) call
BIND(2 AS ?monthInt) # this would come from your MONTH(?dateTime) call
BIND(13 AS ?dayInt) # this would come from your DAY(?dateTime) call
# convert to strings
BIND(STR(?yearInt) AS ?year)
BIND(STR(?monthInt) AS ?month)
BIND(STR(?dayInt) AS ?day)
# pad with zeros
BIND(CONCAT("00", ?year) AS ?paddedYear)
BIND(CONCAT("0000", ?month) AS ?paddedMonth)
BIND(CONCAT("00", ?day) AS ?paddedDay)
# extract the right number of digits from the padded strings
BIND(SUBSTR(?paddedYear, STRLEN(?paddedYear)-3) AS ?fourDigitYear)
BIND(SUBSTR(?paddedDay, STRLEN(?paddedDay)-1) AS ?twoDigitDay)
BIND(SUBSTR(?paddedMonth, STRLEN(?paddedMonth)-1) AS ?twoDigitMonth)
# put it all back together
BIND(CONCAT(?fourDigitYear, "-", ?twoDigitMonth, "-", ?twoDigitDay) as ?date)
}

#gregory-williams gives a portable answer. An alternative is functions from F&O (XPath and XQuery Functions and Operators 3.1) "fn:format-...."
I'm not sure of the coverage in various triplestores - Apache Jena provides fn:format-number, which is needed for the question, but not fn-format-dateTime etc
See
https://www.w3.org/TR/xpath-functions-3/#formatting-the-number
https://www.w3.org/TR/xpath-functions-3/#formatting-dates-and-times
For example:
fn:format-number(1,"000") returns the string "001".
Apache Jena also has a local extension afn:sprintf using the C or Java syntax of sprintf:
afn:sprintf("%03d", 1) returns "001".

Matlab: Function that returns a string with the first n characters of the alphabet

I'd like to have a function generate(n) that generates the first n lowercase characters of the alphabet appended in a string (therefore: 1<=n<=26)
For example:
generate(3) --> 'abc'
generate(5) --> 'abcde'
generate(9) --> 'abcdefghi'
I'm new to Matlab and I'd be happy if someone could show me an approach of how to write the function. For sure this will involve doing arithmetic with the ASCII-codes of the characters - but I've no idea how to do this and which types that Matlab provides to do this.

I would rely on ASCII codes for this. You can convert an integer to a character using char.
So for example if we want an "e", we could look up the ASCII code for "e" (101) and write:
char(101)
'e'
This also works for arrays:
char([101, 102])
'ef'
The nice thing in your case is that in ASCII, the lowercase letters are all the numbers between 97 ("a") and 122 ("z"). Thus the following code works by taking ASCII "a" (97) and creating an array of length n starting at 97. These numbers are then converted using char to strings. As an added bonus, the version below ensures that the array can only go to 122 (ASCII for "z").
function output = generate(n)
output = char(97:min(96 + n, 122));
end
Note: For the upper limit we use 96 + n because if n were 1, then we want 97:97 rather than 97:98 as the second would return "ab". This could be written as 97:(97 + n - 1) but the way I've written it, I've simply pulled the "-1" into the constant.
You could also make this a simple anonymous function.
generate = #(n)char(97:min(96 + n, 122));
generate(3)
'abc'
To write the most portable and robust code, I would probably not want those hard-coded ASCII codes, so I would use something like the following:
output = 'a':char(min('a' + n - 1, 'z'));

...or, you can just generate the entire alphabet and take the part you want:
function str = generate(n)
alphabet = 'a':'z';
str = alphabet(1:n);
end
Note that this will fail with an index out of bounds error for n > 26, so you might want to check for that.

You can use the char built-in function which converts an interger value (or array) into a character array.
EDIT
Bug fixed (ref. Suever's comment)
function [str]=generate(n)
a=97;
% str=char(a:a+n)
str=char(a:a+n-1)
Hope this helps.
Qapla'

Function to split string in matlab and return second number

I have a string and I need two characters to be returned.
I tried with strsplit but the delimiter must be a string and I don't have any delimiters in my string. Instead, I always want to get the second number in my string. The number is always 2 digits.
Example: 001a02.jpg I use the fileparts function to delete the extension of the image (jpg), so I get this string: 001a02
The expected return value is 02
Another example: 001A43a . Return values: 43
Another one: 002A12. Return values: 12
All the filenames are in a matrix 1002x1. Maybe I can use textscan but in the second example, it gives "43a" as a result.

(Just so this question doesn't remain unanswered, here's a possible approach: )
One way to go about this uses splitting with regular expressions (MATLAB's strsplit which you mentioned):
str = '001a02.jpg';
C = strsplit(str,'[a-zA-Z.]','DelimiterType','RegularExpression');
Results in:
C =
'001' '02' ''
In older versions of MATLAB, before strsplit was introduced, similar functionality was achieved using regexp(...,'split').
If you want to learn more about regular expressions (abbreviated as "regex" or "regexp"), there are many online resources (JGI..)
In your case, if you only need to take the 5th and 6th characters from the string you could use:
D = str(5:6);
... and if you want to convert those into numbers you could use:
E = str2double(str(5:6));

If your number is always at a certain position in the string, you can simply index this position.
In the examples you gave, the number is always the 5th and 6th characters in the string.
filename = '002A12';
num = str2num(filename(5:6));
Otherwise, if the formating is more complex, you may want to use a regular expression. There is a similar question matlab - extracting numbers from (odd) string. Modifying the code found there you can do the following
all_num = regexp(filename, '\d+', 'match'); %Find all numbers in the filename
num = str2num(all_num{2}) %Convert second number from str

How is the full substring different from using .text()?

I'm failing to see how taking the full substring is different from just using .text()?
This is a snippet of a larger code set that I'm trying to understand but failing:
$(this).text().substring(0, ($(this).text().length - 1))
Substring takes a portion of the full text/string, but in this case it is taking the whole string, correct?

No, here substring is returning characters 0 to n-1 of an n length string.
x = "hello";
>>> "hello"
x.substring(0, x.length - 1)
>>> "hell"
From the MDN documentation linked:
substring extracts characters from indexA up to but not including indexB. In particular:
If indexA equals indexB, substring returns an empty string.
If indexB is omitted, substring extracts characters to the end of the string.
If either argument is less than 0 or is NaN, it is treated as if it were 0.
If either argument is greater than stringName.length, it is treated as if it were stringName.length.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Regex expression in q to match specific integer range following string - kdb

For your example you can pad, fill with a char, and then simple regex works fine: ("."^5$("foo7";"foo8";"foo9";"foo10";"foo11";"foo12";"foo13")) like "foo[1|8-9][.|0-2]"

Related

how do we iterate and store the results in a variable in kdb

SPARQL how to make DAY/MONTH always return 2 digits

Matlab: Function that returns a string with the first n characters of the alphabet

Function to split string in matlab and return second number

How is the full substring different from using .text()?

Categories

Resources