XSLT Substring not working [duplicate]

Let's say I have this string: "123_1234_12345"
I would like to extract everything before the second "_" (underscore).
I tried:
fn:tokenize("123_1234_12345", '_')[position() le 2]
That returns:
123
1234
What I actually want is:
123_1234
How do I achieve that?
I am using XQuery 1.0

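Regular expressions are flexible and compact. This pattern removes the last underscore and everything after it, which gives the desired result when the string has exactly three parts: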
replace('123_1234_12345', '_[^_]+$', '')
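For readers outside XQuery, here is a rough Python equivalent of the same regex (a sketch; Python and the name drop_last_segment are our own choices, not part of the answer). The second call shows that only the final segment is removed, so longer inputs keep more than two parts.

```python
import re


def drop_last_segment(s: str) -> str:
    """Strip the final "_" and the underscore-free tail after it."""
    # Same pattern as the XQuery answer: "_" followed by non-"_" characters up to the end
    return re.sub(r"_[^_]+$", "", s)


print(drop_last_segment("123_1234_12345"))  # 123_1234
print(drop_last_segment("a_b_c_d"))         # a_b_c (only the last part is removed)
```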
Another solution that may be more readable is to a) tokenize the string, b) keep the tokens you want to preserve, and c) join them again:
string-join(
tokenize('123_1234_12345', '_')[position() = 1 to 2],
'_'
)
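The tokenize/keep/join steps translate directly to Python; this is a sketch, and first_tokens with its n and sep parameters is a hypothetical helper. Unlike the regex, it always keeps exactly the first n parts.

```python
def first_tokens(s: str, n: int = 2, sep: str = "_") -> str:
    """Tokenize on sep, keep the first n tokens, and join them again."""
    tokens = s.split(sep)   # a) tokenize
    kept = tokens[:n]       # b) keep positions 1 to n
    return sep.join(kept)   # c) join with the same separator


print(first_tokens("123_1234_12345"))  # 123_1234
print(first_tokens("a_b_c_d"))         # a_b
```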

Taking the basic idea from Michael Kay's deleted answer, it could be implemented like this:
substring($input, 1, index-of(string-to-codepoints($input), 95)[2] - 1)
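The code-point idea can be sketched in Python to make the index arithmetic explicit (95 is the code point of "_"). XQuery positions are 1-based, so the answer subtracts 1 from the second underscore's position; Python slices are 0-based and exclusive, so slicing up to that index gives the same text. Returning the input unchanged when there is no second underscore is our own assumption; the one-liner above assumes at least two underscores.

```python
def before_second_underscore(s: str) -> str:
    # Positions of every "_" (code point 95), like index-of(string-to-codepoints(s), 95)
    positions = [i for i, ch in enumerate(s) if ord(ch) == 95]
    if len(positions) < 2:
        # Assumption: no second "_" means there is nothing to cut
        return s
    # 0-based exclusive slice up to the second "_"
    return s[: positions[1]]


print(before_second_underscore("123_1234_12345"))  # 123_1234
```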

Related

Regex expression in q to match specific integer range following string

Using q’s like function, how can we achieve the following match using a single regex string regstr?
q) ("foo7"; "foo8"; "foo9"; "foo10"; "foo11"; "foo12"; "foo13") like regstr
>>> 0111110b
That is, like regstr should match the foo-strings that end in the numbers 8, 9, 10, 11, or 12.
Using regstr:"foo[8-12]" confuses the square brackets (how does it interpret this?) since 12 is not a single digit, while regstr:"foo[1[0-2]|[1-9]]" returns a type error, even without the foo-string complication.
As the other comments and answers mentioned, this can't be done using a single regex. Another alternative method is to construct the list of strings that you want to compare against:
q)str:("foo7";"foo8";"foo9";"foo10";"foo11";"foo12";"foo13")
q)match:{x in y,/:string z[0]+til 1+neg(-/)z}
q)match[str;"foo";8 12]
0111110b
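The same "build the candidates" idea, sketched in Python as a translation of the approach (candidates and match_mask are hypothetical names): generate every prefix-plus-number string in the inclusive range, then test each input for membership.

```python
def candidates(prefix, lo, hi):
    # prefix + each integer in the inclusive range, e.g. "foo8" ... "foo12"
    return {prefix + str(n) for n in range(lo, hi + 1)}


def match_mask(strings, prefix, lo, hi):
    wanted = candidates(prefix, lo, hi)
    # One boolean per input string, like q's "in"
    return [s in wanted for s in strings]


strs = ["foo7", "foo8", "foo9", "foo10", "foo11", "foo12", "foo13"]
mask = match_mask(strs, "foo", 8, 12)
print(mask)  # [False, True, True, True, True, True, False]

# Keeping only the matching strings instead of a mask
print([s for s, ok in zip(strs, mask) if ok])  # ['foo8', 'foo9', 'foo10', 'foo11', 'foo12']
```

Using a set keeps each membership test cheap even when the range is large.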
If your eventual goal is to filter on the matching entries, you can replace in with inter:
q)match:{x inter y,/:string z[0]+til 1+neg(-/)z}
q)match[str;"foo";8 12]
"foo8"
"foo9"
"foo10"
"foo11"
"foo12"
A variation on Cillian’s method: test the prefix and numbers separately.
q)range:{x+til 1+y-x}.
q)s:"foo",/:string 82,range 7 13 / include "foo82" in tests
q)match:{min(x~/:;in[;string range y]')#'flip count[x]cut'z}
q)match["foo";8 12;] s
00111110b
Note how the unary derived functions x~/: and in[;string range y]' are paired by #' with the split strings, then min is used to AND the results:
q)flip 3 cut's
"foo" "foo" "foo" "foo" "foo" "foo" "foo" "foo"
"82" ,"7" ,"8" ,"9" "10" "11" "12" "13"
q)("foo"~/:;in[;string range 8 12]')#'flip 3 cut's
11111111b
00111110b
Compositions rock.
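A Python sketch of the split-then-AND idea (our own translation; match_split is a hypothetical name): each string is cut at the prefix length, the head is compared with the prefix, the tail is compared against the stringified range, and the two tests are combined with and, playing the role of min in the q version.

```python
def match_split(prefix, lo, hi, strings):
    allowed = {str(n) for n in range(lo, hi + 1)}  # {"8", "9", "10", "11", "12"}
    k = len(prefix)
    # head == prefix AND tail in allowed, like min over the two boolean lists
    return [s[:k] == prefix and s[k:] in allowed for s in strings]


tests = ["foo82", "foo7", "foo8", "foo9", "foo10", "foo11", "foo12", "foo13"]
print(match_split("foo", 8, 12, tests))
# [False, False, True, True, True, True, True, False]
```

Because the tail is compared as text, a zero-padded string such as "foo08" is rejected by this sketch.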
As the comments state, regex in kdb+ is extremely limited. If the number of trailing digits is known, as in the example above, the following can be used to check multiple patterns:
q)str:("foo7"; "foo8"; "foo9"; "foo10"; "foo11"; "foo12"; "foo13"; "foo3x"; "foo123")
q)any str like/:("foo[0-9]";"foo[0-9][0-9]")
111111100b
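For comparison, Python's fnmatch.fnmatchcase uses shell-style patterns with [0-9] character classes, which behave like q's like for these particular patterns (an assumption for this example, not a general equivalence). A string passes if it fully matches any pattern in the list.

```python
from fnmatch import fnmatchcase

strs = ["foo7", "foo8", "foo9", "foo10", "foo11", "foo12", "foo13", "foo3x", "foo123"]
patterns = ["foo[0-9]", "foo[0-9][0-9]"]

# OR across patterns, like "any str like/: patterns"
result = [any(fnmatchcase(s, p) for p in patterns) for s in strs]
print(result)
```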
Checking for a range like 8-12 is not currently possible within kdb+ regex. One possible workaround is to write a function that implements this logic. The function range below checks, for each string in a list, whether it starts with a given string and ends with a number within the specified range.
range:{
/ checking for strings starting with string y
s:((c:count y)#'x)like y;
/ convert remainder of string to long, check if within range
d:("J"$c _'x)within z;
/ find strings satisfying both conditions
s&d
}
Example use:
q)range[str;"foo";8 12]
011111000b
q)str where range[str;"foo";8 12]
"foo8"
"foo9"
"foo10"
"foo11"
"foo12"
This could be made more efficient by checking the trailing digits only on the subset of strings starting with "foo".
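The prefix-then-number check sketched in Python (our own translation; in_range is a hypothetical name, and int() stands in for q's "J"$ parse). Python's and short-circuits, so the tail is only parsed for strings that start with the prefix, which is the efficiency idea just mentioned.

```python
def in_range(strings, prefix, lo, hi):
    k = len(prefix)
    result = []
    for s in strings:
        tail = s[k:]
        # isdecimal() guards int(); "and" stops early for strings without the prefix
        ok = s.startswith(prefix) and tail.isdecimal() and lo <= int(tail) <= hi
        result.append(ok)
    return result


strs = ["foo7", "foo8", "foo9", "foo10", "foo11", "foo12", "foo13", "foo3x", "foo123"]
print(in_range(strs, "foo", 8, 12))
# [False, True, True, True, True, True, False, False, False]
```

Because the tail is compared numerically, this sketch accepts a zero-padded string such as "foo08".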
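For your example you can pad, fill with a character, and then a simple pattern works. It relies on the inputs having this shape, though: "foo1" and "foo80" to "foo92" would also match.
("."^5$("foo7";"foo8";"foo9";"foo10";"foo11";"foo12";"foo13")) like "foo[189][.0-2]"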
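A Python sketch of the pad-and-fill trick, using fnmatch with the pattern foo[189][.0-2] (an illustration, not q's semantics): every string is cut and padded to width 5 with ".", so the last two positions can be checked with one character class each.

```python
from fnmatch import fnmatchcase

strs = ["foo7", "foo8", "foo9", "foo10", "foo11", "foo12", "foo13"]

# Truncate/pad each string to width 5 with "." so all strings share one shape
padded = [s[:5].ljust(5, ".") for s in strs]
result = [fnmatchcase(p, "foo[189][.0-2]") for p in padded]
print(result)  # [False, True, True, True, True, True, False]
```

The shortcut is only safe for inputs shaped like the example: "foo1" becomes "foo1." and "foo82" stays "foo82", and both match.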

How to strip everything except digits from a string in Scala (quick one liners)

This is driving me nuts... there must be a way to strip out all non-digit characters (or perform other simple filtering) in a String.
Example: I want to turn a phone number ("+72 (93) 2342-7772" or "+1 310-777-2341") into a simple numeric String (not an Int), such as "729323427772" or "13107772341".
I tried "[\\d]+".r.findAllIn(phoneNumber), which returns an Iterator, and then I would have to recombine the pieces into a String somehow... seems horribly wasteful.
I also came up with: phoneNumber.filter("0123456789".contains(_)) but that becomes tedious for other situations. For instance, removing all punctuation... I'm really after something that works with a regular expression so it has wider application than just filtering out digits.
Anyone have a fancy Scala one-liner for this that is more direct?
You can use filter, treating the string as a character sequence and testing the character with isDigit:
"+72 (93) 2342-7772".filter(_.isDigit) // res0: String = 729323427772
You can also use replaceAll with a regular expression:
"+72 (93) 2342-7772".replaceAll("[^0-9]", "") // res1: String = 729323427772
Another approach is to define the collection of valid characters, in this case
val d = '0' to '9'
and then, for val a = "+72 (93) 2342-7772", filter on collection inclusion, for instance with any of these:
for (c <- a if d.contains(c)) yield c
a.filter(d.contains)
a.collect{ case c if d.contains(c) => c }
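For comparison, the three answers map onto Python roughly as follows (a sketch in Python rather than Scala; the function names are hypothetical). The last line shows how the regex version generalizes to other filtering by changing the character class; [^\w\s] is our rough definition of "punctuation".

```python
import re
import string


def keep_digits_isdigit(s):
    # Character-by-character test, like filter(_.isDigit)
    return "".join(ch for ch in s if ch.isdigit())


def keep_digits_regex(s):
    # Delete every character that is not 0-9, like replaceAll("[^0-9]", "")
    return re.sub(r"[^0-9]", "", s)


def keep_only(s, allowed):
    # Filter on membership in an explicit collection, like a.filter(d.contains)
    allowed = set(allowed)
    return "".join(ch for ch in s if ch in allowed)


phone = "+72 (93) 2342-7772"
print(keep_digits_isdigit(phone))       # 729323427772
print(keep_digits_regex(phone))         # 729323427772
print(keep_only(phone, string.digits))  # 729323427772

# Drop anything that is neither a word character nor whitespace
print(re.sub(r"[^\w\s]", "", "Hello, world!"))  # Hello world
```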

Ignore spaces and case in MATLAB

diary_file = tempname();
diary(diary_file);
myFun();
diary('off');
output = fileread(diary_file);
I would like to search for a string in output, but ignore spaces and upper/lower case. Here is an example of what's in output:
the test : passed
number : 4
found = 'thetest:passed'
a = strfind(output,found )
How can I ignore spaces and case when searching output?
Assuming you are not too worried about accidentally matching something like 'thetEst:passed', here is what you can do:
Remove all spaces and compare only lower case:
found = 'With spaces'
found = lower(found(found ~= ' '))
This will return
found =
withspaces
Of course you would also need to do this with each line of output.
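The normalize-both-sides idea can be sketched in Python (normalize is a hypothetical helper; it removes spaces only, like found ~= ' ', and lower-cases). Applying it to the whole output rather than line by line works here because newlines are kept, so matches cannot run across lines.

```python
def normalize(text):
    # Drop spaces and lower-case, like lower(found(found ~= ' ')) in MATLAB
    return text.replace(" ", "").lower()


output = "the test : passed\nnumber : 4"
found = normalize("The Test : Passed")
print(normalize(output).find(found))  # 0, i.e. found at the start
```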
Another way:
regexpi(output(~isspace(output)), found, 'match')
if output is a single string, or
regexpi(regexprep(output,'\s',''), found, 'match')
for the more general case (either class(output) == 'cell' or 'char').
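A rough Python analogue of the regexpi approach (a sketch; find_ignoring_whitespace is a hypothetical name). It strips all whitespace, matches case-insensitively, and returns the match in its original case, which is the double-edged point listed below. One deliberate difference: re.escape treats found as literal text, whereas regexpi treats found as a pattern.

```python
import re


def find_ignoring_whitespace(output, found):
    squashed = re.sub(r"\s", "", output)  # remove ALL whitespace, not just spaces
    return re.findall(re.escape(found), squashed, flags=re.IGNORECASE)


print(find_ignoring_whitespace("The Test : Passed\nnumber : 4", "thetest:passed"))
# ['TheTest:Passed']
```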
Advantages:
fast
robust (all whitespace, not just spaces, is removed)
more flexible (you can return starting/ending indices of the match, tokenize, etc.)
will return original case of the match in output
Disadvantages:
more typing
less obvious (more documentation required)
will return original case of the match in output (yes, there's two sides to that coin)
That last point in both lists is easily forced to lower or upper case using lower() or upper(), but if you want the match in the same case as found, it's a bit more involved:
C = regexpi(output(~isspace(output)), found, 'match');
if ~isempty(C)
    C = found;
end
for single string, or
C = regexpi(regexprep(output, '\s', ''), found, 'match')
C(~cellfun('isempty', C)) = {found}
for the more general case.
You can use lower to convert everything to lowercase to solve your case problem. However ignoring whitespace like you want is a little trickier. It looks like you want to keep some spaces but not all, in which case you should split the string by whitespace and compare substrings piecemeal.
I'd advocate using regex, e.g. like this:
a = regexpi(output, 'the\s*test\s*:\s*passed');
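The same \s* idea in Python, plus a hypothetical helper flexible_pattern (our own addition) that builds such a pattern from any search string by allowing optional whitespace between every pair of characters. This is a sketch; characters are escaped so punctuation in the search string is matched literally.

```python
import re

output = "The  Test :passed\nnumber : 4"

# \s* allows any amount of whitespace (including none) between the pieces
m = re.search(r"the\s*test\s*:\s*passed", output, flags=re.IGNORECASE)
print(m.start() if m else None)  # 0


def flexible_pattern(found):
    # Hypothetical helper: "ab" becomes "a\s*b", ignoring spaces in found itself
    chars = [re.escape(c) for c in found if not c.isspace()]
    return r"\s*".join(chars)


m2 = re.search(flexible_pattern("thetest:passed"), "xx The  Test :passed", re.IGNORECASE)
print(m2.start())  # 3
```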
If you don't care about where the match occurs but only whether there is one, removing all spaces and lower-casing would be a brute-force, and somewhat nasty, possibility:
a = strfind(lower(strrep(output, ' ', '')), found);

Separating file name in parts by identifier

This may be a very simple task for many but I could not find anything appropriate for me.
I have a file name: filenm_A006.2011.269.10.47.G25_2010
I want to separate all its parts (separated by . and _) to use them separately. How can I do it with simple matlab commands?
Kind Regards,
Mushi
I recommend regexp:
fname = 'filenm_A006.2011.269.10.47.G25_2010';
parts = regexp(fname, '[^_.]+', 'match');
parts =
'filenm' 'A006' '2011' '269' '10' '47' 'G25' '2010'
You can now refer to parts{1} through parts{8} for the pieces. Explanation: the regexp pattern [^_.] matches any character other than _ or ., and the + means you want runs of at least one such character. Then 'match' asks the regexp function to return a cell array of all the substrings that match the pattern. There are other output modes; for example, 'start' returns the starting index of each piece.
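The same character-class idea in Python, as a sketch: re.findall plays the role of 'match', and re.finditer exposes start positions, similar to the 'start' mode (Python indices are 0-based, MATLAB's are 1-based).

```python
import re

fname = "filenm_A006.2011.269.10.47.G25_2010"

# [^_.]+ : one or more characters that are neither "_" nor "."
parts = re.findall(r"[^_.]+", fname)
print(parts)
# ['filenm', 'A006', '2011', '269', '10', '47', 'G25', '2010']

# 0-based start index of each piece
starts = [m.start() for m in re.finditer(r"[^_.]+", fname)]
print(starts)  # [0, 7, 12, 17, 21, 24, 27, 31]
```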
Use the command strsplit:
cellArrayOfParts = strsplit(fileName,{'.' '_'});
You can use strsplit to split it:
strsplit('filenm_A006.2011.269.10.47.G25_2010',{'_','.'})
ans =
'filenm' 'A006' '2011' '269' '10' '47' 'G25' '2010'
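Python's str.split accepts only one separator, so re.split with a character class can stand in for strsplit with a cell array of delimiters. This is a sketch; the + is our choice so that a run of adjacent delimiters counts as one break instead of producing empty strings.

```python
import re

fname = "filenm_A006.2011.269.10.47.G25_2010"

# Split on "." or "_"; "+" treats a run such as "._" as a single break
parts = re.split(r"[._]+", fname)
print(parts)
# ['filenm', 'A006', '2011', '269', '10', '47', 'G25', '2010']
```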
Another option is to use regexp, like Peter suggested.

Regex to grab a number from within a string

I would like to grab a number from a string.
The string is: "responseID: 1 is the return value";
Sometimes the number could be 1, or 3, 300, 3000...
I am trying to do this in Objective-C for the iPhone.
I've tried NSPredicate and NSRegularExpression, but I cannot seem to get the right regex to start with.
I tried "*[0-9]+*";
Does this return the "1" , "300" or whatever number to me when I call the regex?
Thank you!
\d+ should match one or more digits (in an Objective-C string literal the backslash must be escaped, i.e. @"\\d+").
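To see what \d+ returns for these inputs, here is a small Python sketch (Python rather than Objective-C; first_number is a hypothetical helper, and the NSRegularExpression calls are not shown). It also shows that the pattern from the question, "*[0-9]+*", fails to compile in Python because a leading * has nothing to repeat.

```python
import re


def first_number(text):
    # \d+ : one or more digits; search returns the first run found
    m = re.search(r"\d+", text)
    return m.group() if m else None


print(first_number("responseID: 1 is the return value"))     # 1
print(first_number("responseID: 3000 is the return value"))  # 3000

# The pattern from the question is rejected by Python's regex engine
try:
    re.compile("*[0-9]+*")
except re.error as exc:
    print("invalid pattern:", exc)
```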
Sorry, I searched again and found exactly what I was looking for:
Objective C: How to grab a number after a sub string from string