Looking for the first underscore then find the 5th space after - tsql

I am trying to create a filter where I am looking for the first (or last) occurrence of an underscore (or it can be any character) and then start from there to look for the 5th character.
I am thinking of something along the lines of either right or left char index picking a side to start on. Really trying to look for a good explanation of why your answer is written in that manner.
Example: I am looking for __poptarts_________.
So I would want it to start at the leftmost _ and search for the 5th character after that (p).

You could achieve that by using both SUBSTRING and CHARINDEX
SELECT SUBSTRING (string,(CHARINDEX('_',string,0)+1),5)
In your case which would be:
SELECT SUBSTRING ('I am looking for __poptarts_________',(CHARINDEX('_','I am looking for __poptarts_________',0)+1),5)
Result is _popt because you put two '_' before 'p'

Related

Extracting Portions of String

I have a field with the following types of string
X000233756_9981900025_201901_EUR_/
I firstly need to take take the characters to the left of the first _
Secondly I need to take the characters between the first and 2nd _
First _ is CHARINDEX('_',[Line_Item_Text],1) AS Position_1
Second _ is CHARINDEX('_',[Line_Item_Text],CHARINDEX('_',[Line_Item_Text],1)+1) AS Position_2
I was then expecting to be able to do
left([Line_Item_Text],CHARINDEX('_',[Line_Item_Text],1)-1) AS Data_1
Substring([Line_Item_Text],CHARINDEX('_',[Line_Item_Text],1)+1),CHARINDEX('_',[Line_Item_Text],CHARINDEX('_',[Line_Item_Text],1)+1) - CHARINDEX('_',[Line_Item_Text],1)+1)) AS Data_2"
Which should give me
X000233756
9981900025
But getting errors with incorrect number of functions when I start adding and subtracting from CHARINDEX Function.
Any ideas where I am going wrong?
TIA
Geoff
Actually, using the base string functions here is going to be an ugly nightmare. You might find that STRING_SPLIT along with some clever logic might be easier:
SELECT value
FROM STRING_SPLIT('X000233756_9981900025_201901_EUR_', '_')
WHERE LEN(value) > 6 AND NOT value LIKE '[A-Z]%';
This answer assumes that the third and fourth components would always be a 6 digit date and 3 letter currency code, and that the first (but not second) component would always start with some letter.
Demo

PATINDEX incorrect result when looking for dash character "-"

This simple example shows the issue I've run into, but I don't understand why...
I'm testing for the location of the first character that is either a lower or upper case letter, a single dash, or a period in a string parameter passed to me.
These two pattern matches appear to check the same thing, and yet run this code yourself and it will print a 0 then a 3:
PRINT PATINDEX ( '%[a-z,A-Z,-,.]%', '16-82')
PRINT PATINDEX ( '%[-,a-z,A-Z,.]%', '16-82')
I don't understand why it works only if the dash character is the first one we check for.
Is this a bug? Or working as designed and I missed something... I'm using SQL Server 2016, but I don't think that matters.
A dash within a character group may play either of the two roles:
It may denote the dash itself, like it does in the expression [-abc]
It may denote the "everything inbetween" operator, like it does in the expression [a-z].
In your particular example, the character group [a-z,A-Z,-,.] denotes the following:
Everything from a to z
Comma ,
Everything from A to Z
Everything from , to , (i.e. just the comma again).
Dot .
In fact, you probably wanted to write [-a-zA-Z.]

I can't understand the behaviour of btrim()

I'm currently working with postgresql, I learned about this function btrim, I checked many websites for explanation, but I don't really understand.
Here they mention this example:
btrim('xyxtrimyyx', 'xyz')
It gives trim.
When I try this example:
btrim('xyxtrimyyx', 'yzz')
or
btrim('xyxtrimyyx', 'y')
I get this: xyxtrimyyx
I don't understand this. Why didn't it remove the y?
From the docs you point to, the definition says:
Remove the longest string consisting only of characters in characters
(a space by default) from the start and end of string
The reason your example doesn't work is because the function tries to strip the text from Both sides of the text, consisting only of the characters specified
Lets take a look at the first example (from the docs):
btrim('xyxtrimyyx', 'xyz')
This returns trim, because it goes through xyxtrimyyx and gets up to the t and doesn't see that letter in xyz, so that is where the function stops stripping from the front.
We are now left with trimyyx
Now we do the same, but from the end of the string.
While one of xyz is the last letter, remove that letter.
We do this until m, so we are left with trim.
Note: I have never worked with any form of sql. I could be wrong about the exact way that postgresql does this, But I am fairly certain from the docs that this is how it is done.

How do I remove the first or last n characters from a value in q/ kdbstudio?

I've looked into the underscore for drop/cut, but this only seems to remove the first or last n entries, not characters. Any ideas?
Depends on what you're using drop cut on.
Can you provide an example of your values?
Below shows how cut can be used on a sting and then a list of strings.
It uses each right to drop a value from each item.
http://code.kx.com/q/ref/adverbs/#each-right
q)1_"12456789"
"2456789"
q)
q)1_("12456789";"12456789")
"12456789"
q)
q)1_/:("12456789";"12456789")
"2456789"
"2456789"
#Connor Gervin had almost what I wanted, but if you want to cast back to a string, you can use `$(-3)_'string sym from tab

Removing a trailing Space from Regex Matched group

I'm using regular expression lib icucore via RegKit on the iPhone to
replace a pattern in a large string.
The Pattern i'm looking for looks some thing like this
| hello world (P1)|
I'm matching this pattern with the following regular expression
\|((\w*|.| )+)\((\w\d+)\)\|
This transforms the input string into 3 groups when a match is found, of which group 1(string) and group 3(string in parentheses) are of interest to me.
I'm converting these formated strings into html links so the above would be transformed into
Hello world
My problem is the trailing space in the third group. Which when the link is highlighted and underlined, results with the line extending beyond the printed characters.
While i know i could extract all the matches and process them manually, using the search and replace feature of the icu lib is a much cleaner solution, and i would rather not do that as a result.
Many thanks as always
Would the following work as an alternate regular expression?
\|((\w*|.| )+)\s+\((\w\d+)\)\| Where inserting the extra \s+ pulls the space outside the 1st grouping.
Though, given your example & regex, I'm not sure why you don't just do:
\|(.+)\s+\((\w\d+)\)\|
Which will have the same effect. However, both your original regex and my simpler one would both fail, however on:
| hello world (P1)| and on the same line | howdy world (P1)|
where it would roll it up into 1 match.
\|\s*([\w ,.-]+)\s+\((\w\d+)\)\|
will put the trailing space(s) outside the capturing group. This will of course only work if there always is a space. Can you guarantee that?
If not, use
\|\s*([\w ,.-]+(?<!\s))\s*\((\w\d+)\)\|
This uses a lookbehind assertion to make sure the capturing group ends in a non-space character.