How to match US-ASCII characters in T-SQL? - tsql

I want to store URLs in a column. According to RFC 3986, US-ASCII is the character set from which URLs are composed.
SQL Server has the VARCHAR type, which can encode all the characters from the US-ASCII character set, and 128 more that are dependent on the code page.
I want to use a CHECK constraint to ensure the values in the column contain only the printable characters from the US-ASCII character set; in other words, ASCII(@char) >= 32 AND ASCII(@char) < 127 for every character in the string.
I think I can use a LIKE expression to do this in a check constraint, but I can't find the right pattern. I'm trying to adapt Itzik Ben-Gan's trick of matching any character outside the allowed range, which he presents in his article Can I convert this string to an integer?.
In my test harness I create a table variable @TestData of candidates to insert into my column, a table variable @Patterns of patterns to be used with the LIKE operator, and then I select the result of matching each pattern against each candidate:
DECLARE @TestData TABLE (
String VARCHAR(60) COLLATE Latin1_General_CI_AS NOT NULL
);
INSERT INTO @TestData(String)
VALUES
('€ÿ'),
('ab3'),
('http://www.google.com/'),
('http://www.example.com/düsseldorf?neighbourhood=Lörick'),
('1234');
DECLARE @Patterns TABLE (
Pattern VARCHAR(12) COLLATE Latin1_General_CI_AS NOT NULL
);
INSERT INTO @Patterns (Pattern)
VALUES
('%[^0-9]%'),
('%[^' + CHAR(32) + '-' + CHAR(126) + ']%');
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ID,
String,
Pattern,
CASE WHEN String NOT LIKE Pattern THEN 1 ELSE 0 END AS [Match]
FROM @TestData CROSS JOIN @Patterns;
The first row inserted into @Patterns is like the pattern Itzik uses to match non-digit characters. The second row is my attempt to adapt this for characters outside the range of printable US-ASCII characters.
When I execute the above batch, I receive the following result set:
ID String Pattern Match
--- -------------------------------------------------------- ------------ ------
1 €ÿ %[^0-9]% 0
2 ab3 %[^0-9]% 0
3 http://www.google.com/ %[^0-9]% 0
4 http://www.example.com/düsseldorf?neighbourhood=Lörick %[^0-9]% 0
5 1234 %[^0-9]% 1
6 €ÿ %[^ -~]% 0
7 ab3 %[^ -~]% 0
8 http://www.google.com/ %[^ -~]% 0
9 http://www.example.com/düsseldorf?neighbourhood=Lörick %[^ -~]% 0
10 1234 %[^ -~]% 0
As expected, row 5 is a match because the candidate contains only digits. The candidates in rows 1 through 4 do not contain only digits, so they do not match the pattern.
As expected, the candidate in row 6 does not match the pattern because it contains 'high ASCII' characters.
I expect the candidates in rows 7, 8, and 10 to match because they contain only printable US-ASCII characters. But these do not match.
What is wrong with the pattern in the LIKE expression?

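As suggested in the question comments, and in the answer to a similar question, I need to use a binary collation clause. Under a non-binary collation such as Latin1_General_CI_AS, a LIKE range like [ -~] is evaluated using the collation's sort order rather than the character codes, so it does not correspond to the code points 32 through 126.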
If I change the select statement to this:
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ID,
String,
Pattern,
CASE WHEN String NOT LIKE Pattern COLLATE Latin1_General_BIN THEN 1 ELSE 0 END AS [Match]
FROM @TestData CROSS JOIN @Patterns;
I get the following result set:
ID String Pattern Match
--- -------------------------------------------------------- ------------ ------
1 €ÿ %[^0-9]% 0
2 ab3 %[^0-9]% 0
3 http://www.google.com/ %[^0-9]% 0
4 http://www.example.com/düsseldorf?neighbourhood=Lörick %[^0-9]% 0
5 1234 %[^0-9]% 1
6 €ÿ %[^ -~]% 0
7 ab3 %[^ -~]% 1
8 http://www.google.com/ %[^ -~]% 1
9 http://www.example.com/düsseldorf?neighbourhood=Lörick %[^ -~]% 0
10 1234 %[^ -~]% 1
Now the column Match contains the expected values.
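Since the original goal was a CHECK constraint, the same pattern can be moved into one. The following is only a minimal sketch: the table name dbo.UrlDemo, the constraint name, and the column length are made up for illustration, and the binary collation is applied inside the constraint so the range is compared by character code.

```sql
-- Hypothetical table; names and the VARCHAR length are assumptions for illustration.
CREATE TABLE dbo.UrlDemo (
    Url VARCHAR(2048) NOT NULL,
    -- Reject any value containing a character outside CHAR(32)..CHAR(126).
    -- The binary collation makes the range [ -~] compare by character code.
    CONSTRAINT CK_UrlDemo_PrintableAscii
        CHECK (Url NOT LIKE '%[^ -~]%' COLLATE Latin1_General_BIN)
);

-- Passes the constraint: printable US-ASCII only.
INSERT INTO dbo.UrlDemo (Url) VALUES ('http://www.google.com/');

-- Should be rejected by the constraint: contains 'ü' and 'ö'.
INSERT INTO dbo.UrlDemo (Url)
VALUES ('http://www.example.com/düsseldorf?neighbourhood=Lörick');
```

This only checks the character range; it does not validate that the value is a well-formed URL.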

Related

T-SQL: split string on multiple delimiters

I have been given a T-SQL task: to convert/format names which are in ALL CAPS into Title Case. I have decided that splitting the names into tokens, and capitalizing the first letter out of each token, would be a reasonable approach (I am willing to take advice if there's a better option, especially in T-SQL).
That said, to accomplish this, I'd have to split the name fields on spaces AND dashes, hyphens, etc. Then, once it is tokenized, I can worry about normalizing the case.
Is there any reasonable way to split a string along any delimiter in a list?
If ease & performance is important then grab a copy of PatExtract8k.
Here's a basic example where I split on any character that is not a letter or number ([^a-z0-9]):
-- Sample String
DECLARE @string VARCHAR(8000) = 'abc.123&xyz!4445556__5566^rrr';
-- Basic Use
SELECT pe.* FROM samd.patExtract8K(@string,'[^a-z0-9]') AS pe;
Output:
itemNumber itemIndex itemLength item
--------------- ----------- ----------- -------------
1 1 3 abc
2 5 3 123
3 9 3 xyz
4 13 7 4445556
5 22 4 5566
6 27 3 rrr
It returns what you need as well as:
the length of the item (itemLength)
its position in the string (itemIndex)
its ordinal position in the string (itemNumber)
Now against a table. Here we're doing the same thing but I'll explicitly call out the characters I want to use as a delimiter. Here it's any of these characters: *.&,?%/>
-- Sample Table
DECLARE @table TABLE (SomeId INT IDENTITY, SomeString VARCHAR(100));
INSERT @table VALUES('abc***332211,,XXX'),('abc.123&&555%jjj'),('ll/111>ff?12345');
SELECT t.*, pe.*
FROM @table AS t
CROSS APPLY samd.patExtract8K(t.SomeString,'[*.&,?%/>]') AS pe;
This returns:
SomeId SomeString itemNumber itemIndex itemLength item
----------- ------------------- ------------ ---------- ----------- ---------
1 abc***332211,,XXX 1 1 3 abc
1 abc***332211,,XXX 2 7 6 332211
1 abc***332211,,XXX 3 15 3 XXX
2 abc.123&&555%jjj 1 1 3 abc
2 abc.123&&555%jjj 2 5 3 123
2 abc.123&&555%jjj 3 10 3 555
2 abc.123&&555%jjj 4 14 3 jjj
3 ll/111>ff?12345 1 1 2 ll
3 ll/111>ff?12345 2 4 3 111
3 ll/111>ff?12345 3 8 2 ff
3 ll/111>ff?12345 4 11 5 12345
On the other hand - If I wanted to extract the delimiters I could change the pattern like this: [^*.&,?%/>]. Now the same query returns:
SomeId itemNumber itemIndex itemLength item
----------- -------------------- -------------------- ----------- ---------
1 1 4 3 ***
1 2 13 2 ,,
2 1 4 1 .
2 2 8 2 &&
2 3 13 1 %
3 1 3 1 /
3 2 7 1 >
3 3 10 1 ?

select all columns with suffix _test in q kdb

I have a partitioned table, similar to below table:
q)t:([]date:3#2019.01.01; a:1 2 3; a_test:2 3 4; b_test:3 4 5; c: 6 7 8);
date a a_test b_test c
----------------------------
2019.01.01 1 2 3 6
2019.01.01 2 3 4 7
2019.01.01 3 4 5 8
Now, I want to fetch the date column and all columns whose names have the suffix "_test" from table t.
Expected output:
date a_test b_test
------------------------
2019.01.01 2 3
2019.01.01 3 4
2019.01.01 4 5
In my original table, there are more than 100 columns with name having _test so below is not a practical solution in this case.
q)select date, a_test, b_test from t where date=2019.01.01
I tried various options like below, but of no use:
q)delete all except date, *_test from select from t where date=2019.01.01
If the columns you are selecting are variable then you should use a functional qSQL statement to perform the query. The following can be used in your case
q)query:{[tab;dt;c]?[tab;enlist (=;`date;dt);0b;(`date,c)!`date,c]}
q)query[t;2019.01.01;cols[t] where cols[t] like "*_test"]
date a_test b_test
------------------------
2019.01.01 2 3
2019.01.01 3 4
2019.01.01 4 5
In order to craft a particular functional statement, you can parse your query, putting dummy columns in place if you aren't sure what they should be
q)parse "select date,c1,c2 from tab where date=dt"
?
`tab
,,(=;`date;`dt)
0b
`date`c1`c2!`date`c1`c2
A functional select is probably the best way to go here if you require adding further filters.
?[`t;();0b;{x!x}`date,exec c from meta t where c like "*_test"]
The functional form of any select query can be obtained by applying -5! to any qSQL-style statement.
In the example below I have created a table with 20 fields, each one beginning with either a or b.
I then use the functional form to define which fields I want.
q)tab:{[x] enlist x!count[x]#0}`$"_" sv ' raze string `a`b,/:\:til 10
q){[t;s]?[t;();0b;{[x] x!x} cols[t] where cols[t] like s]}[tab;"b*"]
b_0 b_1 b_2 b_3 b_4 b_5 b_6 b_7 b_8 b_9
---------------------------------------
0 0 0 0 0 0 0 0 0 0
q){[t;s]?[t;();0b;{[x] x!x} cols[t] where cols[t] like s]}[tab;"a*"]
a_0 a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9
---------------------------------------
0 0 0 0 0 0 0 0 0 0
q)-5!" select a,b from c"
?
`c
()
0b
`a`b!`a`b
Alternatively, if I don't require any filtering I can use the # operator as in below:
{[x;s] (cols[x] where cols[x] like s)#x}[ tab;"a*"]

Pivot Table in SQL (using Groupby)

I have a table structured as below
Customer_ID Sequence Comment_Code Comment
1 10 0 a
1 11 1 b
1 12 1 c
1 13 1 d
2 20 0 x
2 21 1 y
3 100 0 m
3 101 1 n
3 102 1 o
1 52 0 t
1 53 1 y
1 54 1 u
Sequence number is the unique number in the table
I want the output in SQL as below
Customer_ID Sequence
1 abcd
2 xy
3 mno
1 tyu
Can someone please help me with this. I can provide more details if required.
This looks like a simple gaps/islands problem.
-- Sample Data
DECLARE @table TABLE
(
Customer_ID INT,
[Sequence] INT,
Comment_Code INT,
Comment CHAR(1)
);
INSERT @table
(
Customer_ID,
[Sequence],
Comment_Code,
Comment
)
VALUES (1,10 ,0,'a'),(1,11 ,1,'b'),(1,12 ,1,'c'),(1,13 ,1,'d'),(2,20 ,0,'x'),(2,21 ,1,'y'),
(3,100,0,'m'),(3,101,1,'n'),(3,102,1,'o'),(1,52 ,0,'t'),(1,53 ,1,'y'),(1,54 ,1,'u');
-- Solution
WITH groups AS
(
SELECT
t.Customer_ID,
Grouper = [Sequence] - DENSE_RANK() OVER (ORDER BY [Sequence]),
t.Comment
FROM @table AS t
)
SELECT
g.Customer_ID,
[Sequence] =
(
SELECT g2.Comment+''
FROM groups AS g2
WHERE g.Customer_ID = g2.Customer_ID AND g.Grouper = g2.Grouper
FOR XML PATH('')
)
FROM groups AS g
GROUP BY g.Customer_ID, g.Grouper;
Returns:
Customer_ID Sequence
----------- ----------
1 abcd
1 tyu
2 xy
3 mno
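To see why subtracting DENSE_RANK() from [Sequence] groups the rows, it may help to look at the intermediate values. This is a small illustrative query over the same sample data as the answer; the Rnk column is added here only for display and is not part of the original solution.

```sql
-- Same sample data as in the answer above.
DECLARE @table TABLE (Customer_ID INT, [Sequence] INT, Comment_Code INT, Comment CHAR(1));
INSERT @table (Customer_ID, [Sequence], Comment_Code, Comment)
VALUES (1,10,0,'a'),(1,11,1,'b'),(1,12,1,'c'),(1,13,1,'d'),(2,20,0,'x'),(2,21,1,'y'),
       (3,100,0,'m'),(3,101,1,'n'),(3,102,1,'o'),(1,52,0,'t'),(1,53,1,'y'),(1,54,1,'u');

-- Within a run of consecutive Sequence values, Sequence and its rank both
-- increase by 1 per row, so their difference stays constant for the run.
SELECT
    t.Customer_ID,
    t.[Sequence],
    Rnk     = DENSE_RANK() OVER (ORDER BY t.[Sequence]),
    Grouper = t.[Sequence] - DENSE_RANK() OVER (ORDER BY t.[Sequence]),
    t.Comment
FROM @table AS t
ORDER BY t.[Sequence];
-- Sequence 10..13   -> Rnk 1..4   -> Grouper 9
-- Sequence 20..21   -> Rnk 5..6   -> Grouper 15
-- Sequence 52..54   -> Rnk 7..9   -> Grouper 45
-- Sequence 100..102 -> Rnk 10..12 -> Grouper 90
```

Each distinct Grouper value marks one island, which is why grouping by Customer_ID and Grouper keeps customer 1's two runs separate. Note that this relies on the Sequence values within a run being consecutive, as they are in the sample.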

SQL Renumbering index after group by

I have the following input table:
Seq Group GroupSequence
1 0
2 4 A
3 4 B
4 4 C
5 0
6 6 A
7 6 B
8 0
Output table is:
Line NewSeq GroupSequence
1 1
2 2 A
3 2 B
4 2 C
5 3
6 4 A
7 4 B
8 5
The rules for the input table are:
Any positive integer in the Group column indicates that the rows are grouped together. The entire field may be NULL or blank. A null or 0 indicates that the row is processed on its own. In the above example there are two groups and three 'single' rows.
the GroupSequence column is a single character that sorts within the group. NULL, blank, 'A', 'B' 'C' 'D' are the only characters allowed.
if Group has a positive integer, there must be alphabetic character in GroupSequence.
I need a query that creates the output table with a new column that sequences as shown.
External apps need to iterate through this table in either Line or NewSeq order (same order, different values).
I've tried variations on GROUP BY, PARTITION BY, OVER(), etc., with no success.
Any help much appreciated.
Perhaps this will help
The only trick here is Flg which will indicate a new Group Sequence (values will be 1 or 0). Then it is a small matter to sum(Flg) via a window function.
Edit - Updated Flg method
Example
Declare @YourTable Table ([Seq] int,[Group] int,[GroupSequence] varchar(50))
Insert Into @YourTable Values
(1,0,null)
,(2,4,'A')
,(3,4,'B')
,(4,4,'C')
,(5,0,null)
,(6,6,'A')
,(7,6,'B')
,(8,0,null)
Select Line = Row_Number() over (Order by Seq)
,NewSeq = Sum(Flg) over (Order By Seq)
,GroupSequence
From (
Select *
,Flg = case when [Group] = lag([Group],1) over (Order by Seq) then 0 else 1 end
From @YourTable
) A
Order By Line
Returns
Line NewSeq GroupSequence
1 1 NULL
2 2 A
3 2 B
4 2 C
5 3 NULL
6 4 A
7 4 B
8 5 NULL
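One possible caveat: the Flg expression compares [Group] with the previous row, so two consecutive rows with Group = 0 would compare equal and share a NewSeq, although the stated rules say such rows are processed on their own. The sample data contains no such rows. The following is a sketch of a variant that only continues a group when [Group] is positive; the table variable @Demo and its rows are made up for illustration.

```sql
-- Hypothetical sample with two consecutive single rows (Seq 1 and 2) and a NULL group.
DECLARE @Demo TABLE ([Seq] int, [Group] int, [GroupSequence] varchar(50));
INSERT INTO @Demo VALUES
 (1, 0,    NULL)
,(2, 0,    NULL)
,(3, 4,    'A')
,(4, 4,    'B')
,(5, NULL, NULL);

SELECT Line   = ROW_NUMBER() OVER (ORDER BY Seq)
      ,NewSeq = SUM(Flg) OVER (ORDER BY Seq)
      ,GroupSequence
FROM (
    SELECT *
          -- A row continues the previous group only if it belongs to a real
          -- (positive) group and the previous row has the same group number.
          -- A 0 or NULL group fails the first test and always starts a new number.
          ,Flg = CASE WHEN [Group] > 0
                       AND [Group] = LAG([Group], 1) OVER (ORDER BY Seq)
                      THEN 0 ELSE 1 END
    FROM @Demo
) A
ORDER BY Line;
-- Line 1 -> NewSeq 1, Line 2 -> NewSeq 2, Lines 3 and 4 -> NewSeq 3, Line 5 -> NewSeq 4
```

Like the original, this still assumes that two different groups with the same group number never appear back to back.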

Convert Varchar to Ascii

I'm trying to convert the contents of a VARCHAR field to a unique number that can be easily referenced by a 3rd party.
How can I convert a varchar to the ascii string equivalent? In TSQL? The ASCII() function converts a single character but what can I do to convert an entire string?
I've tried using
CAST(ISNULL(ASCII(Substring(RTRIM(LTRIM(PrimaryContactRegion)),1,1)),'')AS VARCHAR(3))
+ CAST(ISNULL(ASCII(Substring(RTRIM(LTRIM(PrimaryContactRegion)),2,1)),'')AS VARCHAR(3))
....but this is tedious, stupid looking, and just doesn't really work if I had long strings. Or if it is better how would I do the same thing in SSRS?
Try something like this:
DECLARE @YourString varchar(500)
SELECT @YourString='Hello World!'
;WITH AllNumbers AS
(
SELECT 1 AS Number
UNION ALL
SELECT Number+1
FROM AllNumbers
WHERE Number<LEN(@YourString)
)
SELECT
(SELECT
ASCII(SUBSTRING(@YourString,Number,1))
FROM AllNumbers
ORDER BY Number
FOR XML PATH(''), TYPE
).value('.','varchar(max)') AS NewValue
--OPTION (MAXRECURSION 500) --<<needed if you have a string longer than 100
OUTPUT:
NewValue
---------------------------------------
72101108108111328711111410810033
(1 row(s) affected)
just to test it out:
;WITH AllNumbers AS
(
SELECT 1 AS Number
UNION ALL
SELECT Number+1
FROM AllNumbers
WHERE Number<LEN(@YourString)
)
SELECT SUBSTRING(@YourString,Number,1),ASCII(SUBSTRING(@YourString,Number,1)),* FROM AllNumbers
OUTPUT:
Number
---- ----------- -----------
H 72 1
e 101 2
l 108 3
l 108 4
o 111 5
32 6
W 87 7
o 111 8
r 114 9
l 108 10
d 100 11
! 33 12
(12 row(s) affected)
Also, you might want to use this:
RIGHT('000'+CONVERT(varchar(max),ASCII(SUBSTRING(@YourString,Number,1))),3)
to force all ASCII values into 3 digits, I'm not sure if this is necessary based on your usage or not.
Output using 3 digits per character:
NewValue
-------------------------------------
072101108108111032087111114108100033
(1 row(s) affected)
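For reference, here is how the padded expression might be combined with the earlier query into one self-contained batch. It is a sketch rather than a tested script: it assumes a SQL Server version that allows initializing a variable in DECLARE, and it sets MAXRECURSION to 500 to match the declared varchar(500).

```sql
DECLARE @YourString varchar(500) = 'Hello World!';

WITH AllNumbers AS
(
    -- Recursive CTE producing 1..LEN(@YourString), one row per character position.
    SELECT 1 AS Number
    UNION ALL
    SELECT Number + 1
    FROM AllNumbers
    WHERE Number < LEN(@YourString)
)
SELECT
    (SELECT
         -- Left-pad each character code with zeros to exactly 3 digits.
         RIGHT('000' + CONVERT(varchar(3), ASCII(SUBSTRING(@YourString, Number, 1))), 3)
     FROM AllNumbers
     ORDER BY Number
     FOR XML PATH(''), TYPE
    ).value('.', 'varchar(max)') AS NewValue
OPTION (MAXRECURSION 500); -- the default limit of 100 is too low for longer strings
```

Fixed-width codes keep the result unambiguous, since each character always occupies three digits. Note that LEN ignores trailing spaces, so trailing blanks in the input are not encoded.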
Well, I think that a solution to this will be very slow, but I guess that you could do something like this:
DECLARE @count INT, @string VARCHAR(100), @ascii VARCHAR(MAX)
SET @count = 1
SET @string = 'put your string here'
SET @ascii = ''
WHILE @count <= DATALENGTH(@string)
BEGIN
-- ASCII() returns an INT, so cast it before concatenating to avoid a conversion error
SELECT @ascii = @ascii + '&#' + CAST(ASCII(SUBSTRING(@string, @count, 1)) AS VARCHAR(3)) + ';'
SET @count = @count + 1
END
SET @ascii = LEFT(@ascii,LEN(@ascii)-1)
SELECT @ascii
I'm not at a PC with a database engine, so I can't really test this code. If it works, you can create a UDF based on it.