PATINDEX incorrect result when looking for dash character "-" - tsql

This simple example shows the issue I've run into, but I don't understand why...
I'm testing for the location of the first character that is either a lower or upper case letter, a single dash, or a period in a string parameter passed to me.
These two pattern matches appear to check the same thing, and yet run this code yourself and it will print a 0 then a 3:
PRINT PATINDEX ( '%[a-z,A-Z,-,.]%', '16-82')
PRINT PATINDEX ( '%[-,a-z,A-Z,.]%', '16-82')
I don't understand why it works only if the dash character is the first one we check for.
Is this a bug? Or working as designed and I missed something... I'm using SQL Server 2016, but I don't think that matters.

A dash within a character group may play either of the two roles:
It may denote the dash itself, like it does in the expression [-abc]
It may denote the "everything inbetween" operator, like it does in the expression [a-z].
In your particular example, the character group [a-z,A-Z,-,.] denotes the following:
Everything from a to z
Comma ,
Everything from A to Z
Everything from , to , (i.e. just the comma again).
Dot .
In fact, you probably wanted to write [-a-zA-Z.]

Related

SIMILAR TO function in postgresql is not working as expected

I am using below code to do calculation
select column1 from tablename where code SIMILAR TO '%(-|_|–)EST[1-2][0-9](-|_)%'
for this column value -CSEST190-KCY18-04-01-L the condition was passed, but in actual I want to ignore this type of data.
The correct value which should pass through the above condition is
-CS-EST19-0-KCY18-04-01-L
-CS_EST19-0-KCY18-04-01-L
Any suggestions, how to avoid this type of confusion?
Easiest way is to go full-regex, instead of using SQL standard SIMILAR TO.
select column1 from tablename where code ~ '[_–-]EST[12][0-9][_-]'
Notice this is does not have to match the full string, and you don't have to add .* on both ends (equivalent of % in LIKE and SIMILAR TO). The reason you got a match on that is, because of the underscore _, which is a single wildcard character.
Also, I switched the order, in the square brackets, so that the dash is the last character. That way it's treated as a character literal, not as a range specifier.

Regex match invalid pattern ios swift 4 [duplicate]

How to rewrite the [a-zA-Z0-9!$* \t\r\n] pattern to match hyphen along with the existing characters ?
The hyphen is usually a normal character in regular expressions. Only if it’s in a character class and between two other characters does it take a special meaning.
Thus:
[-] matches a hyphen.
[abc-] matches a, b, c or a hyphen.
[-abc] matches a, b, c or a hyphen.
[ab-d] matches a, b, c or d (only here the hyphen denotes a character range).
Escape the hyphen.
[a-zA-Z0-9!$* \t\r\n\-]
UPDATE:
Never mind this answer - you can add the hyphen to the group but you don't have to escape it. See Konrad Rudolph's answer instead which does a much better job of answering and explains why.
It’s less confusing to always use an escaped hyphen, so that it doesn't have to be positionally dependent. That’s a \- inside the bracketed character class.
But there’s something else to consider. Some of those enumerated characters should possibly be written differently. In some circumstances, they definitely should.
This comparison of regex flavors says that C♯ can use some of the simpler Unicode properties. If you’re dealing with Unicode, you should probably use the general category \p{L} for all possible letters, and maybe \p{Nd} for decimal numbers. Also, if you want to accomodate all that dash punctuation, not just HYPHEN-MINUS, you should use the \p{Pd} property. You might also want to write that sequence of whitespace characters simply as \s, assuming that’s not too general for you.
All together, that works out to apattern of [\p{L}\p{Nd}\p{Pd}!$*] to match any one character from that set.
I’d likely use that anyway, even if I didn’t plan on dealing with the full Unicode set, because it’s a good habit to get into, and because these things often grow beyond their original parameters. Now when you lift it to use in other code, it will still work correctly. If you hard‐code all the characters, it won’t.
[-a-z0-9]+,[a-z0-9-]+,[a-z-0-9]+ and also [a-z-0-9]+ all are same.The hyphen between two ranges considered as a symbol.And also [a-z0-9-+()]+ this regex allow hyphen.
use "\p{Pd}" without quotes to match any type of hyphen. The '-' character is just one type of hyphen which also happens to be a special character in Regex.
Is this what you are after?
MatchCollection matches = Regex.Matches(mystring, "-");

I can't understand the behaviour of btrim()

I'm currently working with postgresql, I learned about this function btrim, I checked many websites for explanation, but I don't really understand.
Here they mention this example:
btrim('xyxtrimyyx', 'xyz')
It gives trim.
When I try this example:
btrim('xyxtrimyyx', 'yzz')
or
btrim('xyxtrimyyx', 'y')
I get this: xyxtrimyyx
I don't understand this. Why didn't it remove the y?
From the docs you point to, the definition says:
Remove the longest string consisting only of characters in characters
(a space by default) from the start and end of string
The reason your example doesn't work is because the function tries to strip the text from Both sides of the text, consisting only of the characters specified
Lets take a look at the first example (from the docs):
btrim('xyxtrimyyx', 'xyz')
This returns trim, because it goes through xyxtrimyyx and gets up to the t and doesn't see that letter in xyz, so that is where the function stops stripping from the front.
We are now left with trimyyx
Now we do the same, but from the end of the string.
While one of xyz is the last letter, remove that letter.
We do this until m, so we are left with trim.
Note: I have never worked with any form of sql. I could be wrong about the exact way that postgresql does this, But I am fairly certain from the docs that this is how it is done.

Looking for the first underscore then find the 5th space after

I am trying to create a filter where I am looking for the first (or last) occurrence of an underscore (or it can be any character) and then start from there to look for the 5th character.
I am thinking of something along the lines of either right or left char index picking a side to start on. Really trying to look for a good explanation of why your answer is written in that manner.
Example: I am looking for __poptarts_________.
So I would want it to start at the leftmost _ and search for the 5th character after that (p).
You could achieve that by using both SUBSTRING and CHARINDEX
SELECT SUBSTRING (string,(CHARINDEX('_',string,0)+1),5)
In your case which would be:
SELECT SUBSTRING ('I am looking for __poptarts_________',(CHARINDEX('_','I am looking for __poptarts_________',0)+1),5)
Result is _popt because you put two '_' before 'p'

Removing a trailing Space from Regex Matched group

I'm using regular expression lib icucore via RegKit on the iPhone to
replace a pattern in a large string.
The Pattern i'm looking for looks some thing like this
| hello world (P1)|
I'm matching this pattern with the following regular expression
\|((\w*|.| )+)\((\w\d+)\)\|
This transforms the input string into 3 groups when a match is found, of which group 1(string) and group 3(string in parentheses) are of interest to me.
I'm converting these formated strings into html links so the above would be transformed into
Hello world
My problem is the trailing space in the third group. Which when the link is highlighted and underlined, results with the line extending beyond the printed characters.
While i know i could extract all the matches and process them manually, using the search and replace feature of the icu lib is a much cleaner solution, and i would rather not do that as a result.
Many thanks as always
Would the following work as an alternate regular expression?
\|((\w*|.| )+)\s+\((\w\d+)\)\| Where inserting the extra \s+ pulls the space outside the 1st grouping.
Though, given your example & regex, I'm not sure why you don't just do:
\|(.+)\s+\((\w\d+)\)\|
Which will have the same effect. However, both your original regex and my simpler one would both fail, however on:
| hello world (P1)| and on the same line | howdy world (P1)|
where it would roll it up into 1 match.
\|\s*([\w ,.-]+)\s+\((\w\d+)\)\|
will put the trailing space(s) outside the capturing group. This will of course only work if there always is a space. Can you guarantee that?
If not, use
\|\s*([\w ,.-]+(?<!\s))\s*\((\w\d+)\)\|
This uses a lookbehind assertion to make sure the capturing group ends in a non-space character.