T-SQL: split string on multiple delimiters

I have been given a T-SQL task: to convert/format names which are in ALL CAPS into Title Case. I have decided that splitting the names into tokens and capitalizing the first letter of each token would be a reasonable approach (I am willing to take advice if there's a better option, especially in T-SQL).
That said, to accomplish this, I'd have to split the name fields on spaces AND dashes, hyphens, etc. Then, once it is tokenized, I can worry about normalizing the case.
Is there any reasonable way to split a string along any delimiter in a list?

If ease & performance is important then grab a copy of PatExtract8k.
Here's a basic example where I split on any character that is not a letter or number ([^a-z0-9]):
-- Sample String
DECLARE @string VARCHAR(8000) = 'abc.123&xyz!4445556__5566^rrr';
-- Basic Use
SELECT pe.* FROM samd.patExtract8K(@string, '[^a-z0-9]') AS pe;
Output:
itemNumber  itemIndex  itemLength  item
----------  ---------  ----------  -------
1           1          3           abc
2           5          3           123
3           9          3           xyz
4           13         7           4445556
5           22         4           5566
6           27         3           rrr
It returns the item you need as well as:
the length of the item (itemLength)
its position in the string (itemIndex)
its ordinal position among the items (itemNumber)
Now against a table. Here we're doing the same thing but I'll explicitly call out the characters I want to use as a delimiter. Here it's any of these characters: *.&,?%/>
-- Sample Table
DECLARE @table TABLE (SomeId INT IDENTITY, SomeString VARCHAR(100));
INSERT @table VALUES('abc***332211,,XXX'),('abc.123&&555%jjj'),('ll/111>ff?12345');
SELECT t.*, pe.*
FROM @table AS t
CROSS APPLY samd.patExtract8K(t.SomeString, '[*.&,?%/>]') AS pe;
This returns:
SomeId  SomeString         itemNumber  itemIndex  itemLength  item
------  -----------------  ----------  ---------  ----------  ------
1       abc***332211,,XXX  1           1          3           abc
1       abc***332211,,XXX  2           7          6           332211
1       abc***332211,,XXX  3           15         3           XXX
2       abc.123&&555%jjj   1           1          3           abc
2       abc.123&&555%jjj   2           5          3           123
2       abc.123&&555%jjj   3           10         3           555
2       abc.123&&555%jjj   4           14         3           jjj
3       ll/111>ff?12345    1           1          2           ll
3       ll/111>ff?12345    2           4          3           111
3       ll/111>ff?12345    3           8          2           ff
3       ll/111>ff?12345    4           11         5           12345
On the other hand, if I wanted to extract the delimiters instead, I could invert the pattern: [^*.&,?%/>]. Now the same query returns:
SomeId  itemNumber  itemIndex  itemLength  item
------  ----------  ---------  ----------  ----
1       1           4          3           ***
1       2           13         2           ,,
2       1           4          1           .
2       2           8          2           &&
2       3           13         1           %
3       1           3          1           /
3       2           7          1           >
3       3           10         1           ?
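Back to the original task: since each item comes back with its position (itemIndex) and length (itemLength), you can title-case each token and write it back in place with STUFF. A minimal sketch, assuming samd.patExtract8K is installed and a case-insensitive collation; the input name is a made-up example. The write-back works because the replacement is exactly the same length as the original token, but it leans on the row-by-row variable assignment quirk, so test before trusting it:
-- Hypothetical input name; split on anything that isn't a letter
DECLARE @name VARCHAR(8000) = 'JOHN O''BRIEN-SMITH';

SELECT @name = STUFF(@name, pe.itemIndex, pe.itemLength,
       UPPER(LEFT(pe.item, 1)) + LOWER(SUBSTRING(pe.item, 2, 8000)))
FROM samd.patExtract8K(@name, '[^a-z]') AS pe;

SELECT @name; -- John O'Brien-Smith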

Related

KDB+: How to retrieve the rows immediately before and after a given row that conform to a specific logic?

Given the following table
time  kind  counter  key1  value
--------------------------------
1     1     1        1     1
2     0     1        1     2
3     0     1        2     3
5     0     1        1     4
5     1     2        2     5
6     0     2        3     6
7     0     2        2     7
8     1     3        3     8
9     1     4        3     9
How would one select the value from the rows immediately before and immediately after each row of kind 1, ordered by time, where the key1 value is the same in both instances? I.e.:
time  value  prevvalue  nextvalue
---------------------------------
1     1      0n         2
5     5      3          7
8     8      6          0n
9     9      6          0n
Here are some of the things I have tried, though to be honest I have no idea how to canonically achieve something like this in q, where the prior value has a variable offset from the current row.
select
prev[value],
next[value],
by key1 where kind<>1
update 0N^prevval,0N^nextval from update prevval:prev value1,nextval:next value1 by key1 from table
Some advice or a pointer on how to achieve this would be great!
Thanks
I was able to use the following code to return a table meeting your requirements. If this is correct, then the expected output you have provided is incorrect; otherwise I have misunderstood the question.
q)table:([] time:1 2 3 5 5 6 7 8 9;kind:1 0 0 0 1 0 0 1 1;counter:1 1 1 1 2 2 2 3 4;key1:1 1 2 1 2 3 2 3 3;value1:1 2 3 4 5 6 7 8 9)
q)tab2:update 0N^prevval,0N^nextval from update prevval:prev value1,nextval:next value1 by key1 from table
q)tab3:select from tab2 where kind=1
time value1 prevval nextval
---------------------------
1    1              2
5    5      3       7
8    8      6       9
9    9      8
The update statement in tab2:
update 0N^prevval,0N^nextval from update prevval:prev value1,nextval:next value1 by key1 from table
is simply adding 2 columns onto the original table with the previous and next values for each row within its key1 group. 0N^ fills the empty fields with nulls.
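As a small illustration of what prev and next do per group (a toy table, not the OP's data):
q)update p:prev v, n:next v by g from ([] g:1 1 2 2; v:10 20 30 40)
g v  p  n
----------
1 10    20
1 20 10
2 30    40
2 40 30
Within each g group, prev shifts v down one row and next shifts it up, leaving nulls at the group boundaries.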
The select statement in tab3:
tab3:select from tab2 where kind=1
is filtering tab2 for rows where kind=1.
The final select statement:
select time,value1,prevval,nextval from tab3
is selecting the columns you want to be returned in the final result.
Hope this answers your question.
Thanks,
Caitlin

select all columns with suffix _test in q kdb

I have a partitioned table, similar to the table below:
q)t:([]date:3#2019.01.01; a:1 2 3; a_test:2 3 4; b_test:3 4 5; c: 6 7 8);
date       a a_test b_test c
----------------------------
2019.01.01 1 2      3      6
2019.01.01 2 3      4      7
2019.01.01 3 4      5      8
Now I want to fetch the date column and all columns whose names have the suffix "_test" from table t.
Expected output:
date       a_test b_test
------------------------
2019.01.01 2      3
2019.01.01 3      4
2019.01.01 4      5
In my original table there are more than 100 columns with names containing _test, so the below is not a practical solution in this case.
q)select date, a_test, b_test from t where date=2019.01.01
I tried various options like the below, but to no avail:
q)delete all except date, *_test from select from t where date=2019.01.01
If the columns you are selecting are variable, then you should use a functional qSQL statement to perform the query. The following can be used in your case:
q)query:{[tab;dt;c]?[tab;enlist (=;`date;dt);0b;(`date,c)!`date,c]}
q)query[t;2019.01.01;cols[t] where cols[t] like "*_*"]
date       a_test b_test
------------------------
2019.01.01 2      3
2019.01.01 3      4
2019.01.01 4      5
In order to craft a particular functional statement, you can parse your query, putting dummy columns in place if you aren't sure what they should be:
q)parse "select date,c1,c2 from tab where date=dt"
?
`tab
,,(=;`date;`dt)
0b
`date`c1`c2!`date`c1`c2
A functional select is probably the best way to go here if you require adding further filters.
?[`t;();0b;{x!x}`date,exec c from meta t where c like "*_test"]
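Run against the sample table t above, that returns the expected result:
date       a_test b_test
------------------------
2019.01.01 2      3
2019.01.01 3      4
2019.01.01 4      5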
The functional form of any select query can be obtained by using the -5! operator on an SQL-style statement.
In the example below I have created a table with 20 fields, each one beginning with either a or b.
I then use the functional form to define which fields I want.
q)tab:{[x] enlist x!count[x]#0}`$"_" sv ' raze string `a`b,/:\:til 10
q){[t;s]?[t;();0b;{[x] x!x} cols[t] where cols[t] like s]}[tab;"b*"]
b_0 b_1 b_2 b_3 b_4 b_5 b_6 b_7 b_8 b_9
---------------------------------------
0   0   0   0   0   0   0   0   0   0
q){[t;s]?[t;();0b;{[x] x!x} cols[t] where cols[t] like s]}[tab;"a*"]
a_0 a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9
---------------------------------------
0   0   0   0   0   0   0   0   0   0
q)-5!" select a,b from c"
?
`c
()
0b
`a`b!`a`b
Alternatively, if I don't require any filtering I can use the # operator, as below:
{[x;s] (cols[x] where cols[x] like s)#x}[tab;"a*"]
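For reference, # applied with a list of column names takes just those columns from a table, which is what the helper above builds up to:
q)`a`b#([] a:1 2; b:3 4; c:5 6)
a b
---
1 3
2 4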

TSQL - Max per group?

I have a table that looks like this:
GroupID  UserID  Value
1        1       10
1        2       20
1        3       30
1        4       40
1        5       45
1        6       49
1        7       80
1        8       90
2        1       2
2        2       24
2        3       34
2        4       48
2        5       56
3        1       etc.
3        2
3        3
3        4
4        1
4        2
4        3
I am trying to write a LEAD expression that will give me the midpoint between each pair of consecutive values. To do this I have written the following:
SELECT
    [GroupID]
  , [UserID] + 0.5
  , (LEAD([Value], 1) OVER (ORDER BY GroupID, UserID) + [Value]) / 2 AS [Value]
FROM dbo.myTable
The problem with this query is that when it gets to the last user in a group, it gives me a bad value, because it averages the [Value] on the current row with the value from the first row of the next group.
What I want to do is stop when it reaches the maximum UserID for each Group. In other words, when it gets to GroupID = 1 and UserID = 8, it should end and start again at the next Group. I do not want a row that looks like this:
GroupID  UserID  Value
1        8.5     46
I could run a DELETE statement after I INSERT the rows into the original table, but I don't have anything to identify when a row is the "maximum" user for its Group. Ideally, I would like to somehow tell the LEAD statement not to calculate it in the first place.
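One way to get there (a sketch, untested against your data): partition the window by GroupID so that LEAD returns NULL on each group's last row, then filter those boundary rows out before inserting:
SELECT GroupID, UserID, Value
FROM (
    SELECT
        [GroupID]
      , [UserID] + 0.5 AS [UserID]
      , (LEAD([Value], 1) OVER (PARTITION BY GroupID ORDER BY UserID) + [Value]) / 2 AS [Value]
    FROM dbo.myTable
) AS mids
WHERE [Value] IS NOT NULL;
The PARTITION BY keeps LEAD from reading across group boundaries, so the GroupID = 1, UserID = 8 row produces NULL and is dropped by the outer WHERE.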

Select max value rows from table column

My table looks like this:
id  name   count
--  -----  -----
1   Mike   0
2   Duke   2
3   Smith  1
4   Dave   6
5   Rich   3
6   Rozie  8
7   Romeo  0
8   Khan   1
I want to select the rows with the highest count, limited to 5 (the TOP 5 names with the maximum count). That would look something like:
id  name   count
--  -----  -----
6   Rozie  8
4   Dave   6
5   Rich   3
2   Duke   2
3   Smith  1
Please help, thanks.
Here is how:
MySQL:
SELECT * FROM tableName ORDER BY count DESC LIMIT 5
MS SQL:
SELECT TOP 5 * FROM tableName ORDER BY count DESC
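If ties at the fifth row should all be kept, T-SQL also supports WITH TIES on the same (hypothetical) table:
SELECT TOP 5 WITH TIES * FROM tableName ORDER BY count DESC
WITH TIES requires the ORDER BY and returns any extra rows whose count equals the fifth row's value.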

tsql sum data and include default values for missing data

I would like a query that will show a sum per group, with a default value for missing data. For example, assume I have a table as follows:
type_lookup:
id  name
--  -------
1   self
2   manager
3   peer
And a table as follows
data:
id  type_lookup_id  value
--  --------------  -----
1   1               1
2   1               4
3   2               9
4   2               1
5   2               9
6   1               5
7   2               6
8   1               2
9   1               1
After running a query I would like a result set as follows:
type_lookup_id  value
--------------  -----
1               13
2               25
3               0
I would like all rows in the type_lookup table to be included in the result set, even if they don't appear in the data table.
It's a bit hard to read your data layout, but something like the following should do the trick:
SELECT tl.id AS type_lookup_id, tl.name, isnull(sum(da.value), 0) AS how_much
from type_lookup tl
left outer join data da
on da.type_lookup_id = tl.id
group by tl.id, tl.name
order by tl.id
The left outer join keeps type_lookup rows that have no matching data rows, and isnull turns their NULL sums into the 0 you want.