Extracting Portions of String - substring

I have a field with the following types of string
X000233756_9981900025_201901_EUR_/
Firstly, I need to take the characters to the left of the first _
Secondly I need to take the characters between the first and 2nd _
First _ is CHARINDEX('_',[Line_Item_Text],1) AS Position_1
Second _ is CHARINDEX('_',[Line_Item_Text],CHARINDEX('_',[Line_Item_Text],1)+1) AS Position_2
I was then expecting to be able to do
left([Line_Item_Text],CHARINDEX('_',[Line_Item_Text],1)-1) AS Data_1
Substring([Line_Item_Text],CHARINDEX('_',[Line_Item_Text],1)+1),CHARINDEX('_',[Line_Item_Text],CHARINDEX('_',[Line_Item_Text],1)+1) - CHARINDEX('_',[Line_Item_Text],1)+1)) AS Data_2
Which should give me
X000233756
9981900025
But I am getting errors about an incorrect number of arguments when I start adding to and subtracting from the CHARINDEX function.
Any ideas where I am going wrong?
TIA
Geoff

Actually, using the base string functions here is going to be an ugly nightmare. You might find that STRING_SPLIT along with some clever logic might be easier:
SELECT value
FROM STRING_SPLIT('X000233756_9981900025_201901_EUR_', '_')
WHERE LEN(value) > 6 AND NOT value LIKE '[A-Z]%';
This answer assumes that the third and fourth components would always be a 6 digit date and 3 letter currency code, and that the first (but not second) component would always start with some letter.
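For reference, the errors in the question come from the parentheses: SUBSTRING takes three arguments (expression, start, length), and the expression in the question closes the call after the second one. The length between the two underscores is also Position_2 - Position_1 - 1 rather than + 1. A minimal Go sketch of the same index arithmetic follows. The helpers charIndex and firstTwoFields are hypothetical names for illustration; charIndex only mimics a 1-based CHARINDEX-style lookup and assumes single-byte text.

```go
package main

import (
	"fmt"
	"strings"
)

// charIndex is a hypothetical helper that mimics a 1-based lookup:
// it returns the 1-based position of sep in s at or after the 1-based
// position start, or 0 when sep is not found.
func charIndex(sep, s string, start int) int {
	if start < 1 {
		start = 1
	}
	if start > len(s) {
		return 0
	}
	i := strings.Index(s[start-1:], sep)
	if i < 0 {
		return 0
	}
	return start + i
}

// firstTwoFields returns the text before the first separator and the text
// between the first and second separators. ok is false if fewer than two
// separators are present.
func firstTwoFields(s, sep string) (first, second string, ok bool) {
	p1 := charIndex(sep, s, 1)
	if p1 == 0 {
		return "", "", false
	}
	p2 := charIndex(sep, s, p1+1)
	if p2 == 0 {
		return "", "", false
	}
	first = s[:p1-1]       // same as LEFT(s, p1-1)
	length := p2 - p1 - 1  // characters strictly between the separators
	second = s[p1 : p1+length] // 1-based start p1+1, taking length characters
	return first, second, true
}

func main() {
	a, b, ok := firstTwoFields("X000233756_9981900025_201901_EUR_/", "_")
	fmt.Println(a, b, ok)
}
```

With the sample value from the question, the two positions are 11 and 22, so the second field starts at position 12 and is 10 characters long.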

Selecting cases based upon first few characters in SPSS?

I want to select cases with a particular first 3 characters,
for example cases whose first 3 characters are "I22".
The length of the whole value can vary, e.g. "I228" or "I2279", but they share the first three characters "I22".
I usually use compute variable_name= "I228".
But this is tedious, as I have to enter every variation of "I22", e.g. "I228", "I229" and so on.
It would be much easier if I could just select cases based on the same first 3 characters.
You can use the char.substr function to find out what the first three characters of your string variable are. For example:
if char.substr(variable_name,1,3)="I22" keep_this=1.
or:
select if (char.substr(variable_name,1,3)="I22").

I can't understand the behaviour of btrim()

I'm currently working with PostgreSQL and came across the btrim function. I checked many websites for an explanation, but I still don't really understand it.
Here they mention this example:
btrim('xyxtrimyyx', 'xyz')
It gives trim.
When I try this example:
btrim('xyxtrimyyx', 'yzz')
or
btrim('xyxtrimyyx', 'y')
I get this: xyxtrimyyx
I don't understand this. Why didn't it remove the y?
From the docs you point to, the definition says:
Remove the longest string consisting only of characters in characters
(a space by default) from the start and end of string
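The reason your examples don't work is that the function strips characters from both ends of the string, but only while those characters are in the specified set. Your string starts and ends with x, which is in neither 'yzz' nor 'y', so stripping stops immediately on both sides and nothing is removed.
Let's take a look at the first example (from the docs):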
btrim('xyxtrimyyx', 'xyz')
This returns trim, because it goes through xyxtrimyyx and gets up to the t and doesn't see that letter in xyz, so that is where the function stops stripping from the front.
We are now left with trimyyx
Now we do the same, but from the end of the string.
While one of xyz is the last letter, remove that letter.
We do this until m, so we are left with trim.
Note: I have never worked with any form of SQL. I could be wrong about the exact way that PostgreSQL does this, but I am fairly certain from the docs that this is how it is done.
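The rule quoted from the docs can be sketched as a small function. This is only an illustration of the documented behaviour in Go, not PostgreSQL's actual implementation; trimSet is a hypothetical name, and the sketch assumes single-byte characters.

```go
package main

import (
	"fmt"
	"strings"
)

// trimSet removes, from both ends of s, every leading and trailing
// character that appears in set. Each scan stops at the first character
// that is not in set; characters in the middle are never touched.
func trimSet(s, set string) string {
	start, end := 0, len(s)
	// Strip from the front while the current character is in the set.
	for start < end && strings.IndexByte(set, s[start]) >= 0 {
		start++
	}
	// Strip from the back while the current character is in the set.
	for end > start && strings.IndexByte(set, s[end-1]) >= 0 {
		end--
	}
	return s[start:end]
}

func main() {
	fmt.Println(trimSet("xyxtrimyyx", "xyz")) // x, y, x stripped in front; y, y, x at the back
	fmt.Println(trimSet("xyxtrimyyx", "y"))   // first and last character are x, so nothing is stripped
}
```

The second call returns the input unchanged because both scans hit an x, which is not in the set, before they reach any y.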

How do I remove the first or last n characters from a value in q/ kdbstudio?

I've looked into the underscore for drop/cut, but this only seems to remove the first or last n entries, not characters. Any ideas?
Depends on what you're using drop cut on.
Can you provide an example of your values?
The example below shows how _ can be used on a string and then on a list of strings.
It uses each right to drop a value from each item.
http://code.kx.com/q/ref/adverbs/#each-right
q)1_"12456789"
"2456789"
q)
q)1_("12456789";"12456789")
"12456789"
q)
q)1_/:("12456789";"12456789")
"2456789"
"2456789"
@Connor Gervin had almost what I wanted, but if you want to cast back to a symbol, you can use `$(-3)_'string sym from tab

Looking for the first underscore then find the 5th space after

I am trying to create a filter where I am looking for the first (or last) occurrence of an underscore (or it can be any character) and then start from there to look for the 5th character.
I am thinking of something along the lines of either right or left char index picking a side to start on. Really trying to look for a good explanation of why your answer is written in that manner.
Example: I am looking for __poptarts_________.
So I would want it to start at the leftmost _ and search for the 5th character after that (p).
You could achieve that by using both SUBSTRING and CHARINDEX
SELECT SUBSTRING (string,(CHARINDEX('_',string,0)+1),5)
In your case which would be:
SELECT SUBSTRING ('I am looking for __poptarts_________',(CHARINDEX('_','I am looking for __poptarts_________',0)+1),5)
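The result is _popt, because there are two '_' characters before the 'p': the search starts one position after the first underscore, which is the second underscore.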

Formatting dates as dd_mm_yyyy in Go gives strange values

So, as in the title, I'm trying to format a date in dd_mm_yyyy format using time.Now().Format("02_01_2006"), as shown in this playground session:
http://play.golang.org/p/alAj-OcRZt
First problem, dd_mm_yyyy isn't an acceptable format, only dd_mm_yy is, which is fine I can manipulate the returned string myself.
The problem I have for you is to help me figure out what Go is even trying to do with this input.
You should notice the result you get is:
10_1110009
That is a good few thousand years off, and it has lost the underscore, which only happens for _2. Does this character sequence represent something special here?
Replacing the last underscore with a hyphen or space returns a valid result. dd_mm_yy works fine. Just this particular case seems to completely fly off the handle.
On my local machine (Go playground is on a specific date) the result for today (the 5th) is:
05_01 5016
Which is equally strange, if not more so, as it has substituted a space, which seems to be an ANSIC thing.
This is very likely due to the following bug: https://github.com/golang/go/issues/11334
This has been fixed in Go 1.6beta1
Found an issue from their github:
https://github.com/golang/go/issues/11334
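Basically, _2 is the layout token for the space-padded day of the month, so it is consumed as the day value from the reference time. The rest (006) is then read as a literal 0 followed by 06, the two-digit year, which is why 10 November 2009 on the playground comes out as 10_1110009.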
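On a Go version affected by that issue, one way to sidestep the layout parsing altogether is to build the string from the date fields. The sketch below is a workaround under that assumption, not the fix applied upstream; formatDDMMYYYY and sampleDate are hypothetical names, and the sample date is chosen only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// formatDDMMYYYY builds dd_mm_yyyy directly from the date fields, so the
// result does not depend on how the layout fragment "_2006" is tokenized.
func formatDDMMYYYY(t time.Time) string {
	return fmt.Sprintf("%02d_%02d_%04d", t.Day(), int(t.Month()), t.Year())
}

// sampleDate returns a fixed date (5 January 2016) used for illustration.
func sampleDate() time.Time {
	return time.Date(2016, time.January, 5, 0, 0, 0, 0, time.UTC)
}

func main() {
	t := sampleDate()
	fmt.Println(formatDDMMYYYY(t)) // 05_01_2016
	// The layout-based form is shown for comparison; its output depends on
	// whether the Go version in use includes the fix for the linked issue.
	fmt.Println(t.Format("02_01_2006"))
}
```

Splitting the layout, for example t.Format("02_01") + "_" + t.Format("2006"), should also avoid the _2 token, since the underscore is then never adjacent to the 2 inside a single layout string.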