Need to go from hostname to base domain - tsql

I need a function:
f(fqdn,suffix) -> basedomain
with these example inputs and outputs:
f('foobar.quux.somedomain.com','com') -> 'somedomain.com'
f('somedomain.com','com') -> 'somedomain.com'
f('foobar.quux.somedomain.com.br','com.br') -> 'somedomain.com.br'
f('somedomain.com.br','com.br') -> 'somedomain.com.br'
In plain English, if the suffix has n segments, take the last n+1 segments. Find the base domain for the FQDN, allowing for the fact that some FQDNs have more than one suffix element.
The suffixes I need to match are here. I've already got them in my SQL database.
I could write this in C#; it might not be the most elegant, but it would work. However, I would like to have this function either in T-SQL, where it is closest to the data, or in PowerShell, which is where the rest of the utility that consumes this data is going to be. I suppose it would be OK to do it in C#, compile it to an assembly, and then access it from T-SQL or even from PowerShell, if that would execute fastest. If there's some reasonably clever alternative in pure T-SQL or simple PowerShell, I'd like that.
EDIT: One thing I forgot to mention explicitly (but which is clear when reviewing the suffix list, at my link above) is that we must pick the longest matching suffix. Both "br" and "com.br" appear in the suffix list (with similar things happening for uk, pt, etc). So the SQL has to use a window function to make sure the longest matching suffix is found.
Here is how far I got when I was doing the SQL. I had gotten lost in all the substring/reverse functions.
SELECT Domain, suffix
FROM (
SELECT SD.Domain, SL.suffix,
RN=ROW_NUMBER() OVER (
PARTITION BY sd.Domain ORDER BY LEN(SL.suffix) DESC)
FROM SiteDomains SD
INNER JOIN suffixlist SL ON SD.Domain LIKE '%.'+SL.suffix
) AS X
WHERE RN=1
This works ok for finding the right suffix. I'm a little concerned about its performance though.

The following demonstrates matching FQDNs with TLDs and extracting the desired n + 1 domain name segments:
-- Sample data.
declare @SampleTLDs as Table ( TLD VarChar(64) );
insert into @SampleTLDs ( TLD ) values
( 'com' ), ( 'somedomain.com' ), ( 'com.br' );
declare @SampleFQDNs as Table ( FQDN VarChar(64) );
insert into @SampleFQDNs ( FQDN ) values
( 'foobar.quux.somedomain.com' ), ( 'somedomain.com' ),
( 'foobar.quux.somedomain.com.br' ), ( 'somedomain.com.br' );
select * from @SampleTLDs;
select * from @SampleFQDNs;
-- Fiddle about.
select FQDN, TLD,
case
when DotPosition = 0 then FQDN
else Reverse( Left( ReversedPrefix, DotPosition - 1) ) + '.' + TLD
end as Result
from (
select FQDNs.FQDN, TLDs.TLD,
Substring( Reverse( FQDNs.FQDN ), Len( TLDs.TLD ) + 2, 100 ) as ReversedPrefix,
CharIndex( '.', Substring( Reverse( FQDNs.FQDN ), Len( TLDs.TLD ) + 2, 100 ) ) as DotPosition
from @SampleFQDNs as FQDNs inner join
@SampleTLDs as TLDs on FQDNs.FQDN like '%.' + TLDs.TLD or FQDNs.FQDN = TLDs.TLD ) as Edna;
-- To select only the longest matching TLD for each FQDN:
with
ExtendedFQDNs as (
select FQDNs.FQDN, TLDs.TLD, Row_Number() over ( partition by FQDN order by Len( TLDs.TLD ) desc ) as TLDLenRank,
Substring( Reverse( FQDNs.FQDN ), Len( TLDs.TLD ) + 2, 100 ) as ReversedPrefix,
CharIndex( '.', Substring( Reverse( FQDNs.FQDN ), Len( TLDs.TLD ) + 2, 100 ) ) as DotPosition
from @SampleFQDNs as FQDNs inner join
@SampleTLDs as TLDs on FQDNs.FQDN like '%.' + TLDs.TLD or FQDNs.FQDN = TLDs.TLD )
select FQDN, TLD,
case
when DotPosition = 0 then FQDN
else Reverse( Left( ReversedPrefix, DotPosition - 1) ) + '.' + TLD
end as Result
from ExtendedFQDNs
where TLDLenRank = 1;
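For comparison, the same "longest matching suffix, plus one more label" rule can be sketched outside SQL. This is only an illustration in Python, not part of the T-SQL answer; the function name `base_domain` and the in-memory `suffixes` set are assumptions standing in for the suffix table.

```python
def base_domain(fqdn, suffixes):
    """Return the longest matching suffix plus one more label, or None.

    `suffixes` is a hypothetical in-memory set standing in for the suffix table.
    """
    labels = fqdn.lower().split(".")
    # Walk from the longest candidate suffix to the shortest,
    # so 'com.br' is found before 'br'.
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in suffixes:
            # Keep one label to the left of the suffix when there is one.
            return ".".join(labels[max(start - 1, 0):])
    return None


suffixes = {"com", "br", "com.br"}
print(base_domain("foobar.quux.somedomain.com.br", suffixes))  # somedomain.com.br
print(base_domain("somedomain.com", suffixes))                 # somedomain.com
```

Because candidates are tried from longest to shortest, no ranking step is needed here; the first hit is the longest match.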

Here's how I would do it in C#:
string getBaseDomain(string fqdn, string suffix)
{
string[] domainSegs = fqdn.Split('.');
return domainSegs[domainSegs.Length - suffix.Split('.').Length - 1] + "." + suffix;
}
So here it is in PowerShell:
function getBaseDomain
{
Param(
[string]$fqdn,
[string]$suffix
)
$domainSegs = $fqdn.Split(".");
return $domainSegs[$domainSegs.Length - $suffix.Split(".").Length - 1] + "."+$suffix;
}
Seems rather silly now to have wasted stackoverflow.com's time with this. My apologies.

Here is a tsql variant...
declare @fqdn varchar(256) = 'somedomain.com'
declare @suffix varchar(128) = 'com'
select left(@fqdn,CHARINDEX(@suffix,@fqdn) - 2)
if(select CHARINDEX('.',reverse(left(@fqdn,CHARINDEX(@suffix,@fqdn) - 2)))) = 0
begin
select left(@fqdn,CHARINDEX(@suffix,@fqdn) - 2) + '.' + @suffix
end
else
begin
select right(left(@fqdn,CHARINDEX(@suffix,@fqdn) - 2),CHARINDEX('.',reverse(left(@fqdn,CHARINDEX(@suffix,@fqdn) - 2))) - 1) + '.' + @suffix
end


Why does filtering on the same field cause different execution times? (different index usage)

When I run the query and filter by agreement_id, it is slow,
but when I filter by the alias id, it is fast (see the end of the query).
Why does filtering on the same field lead to different execution times?
Links to explain analyze:
slow1, slow2
fast1, fast2
The difference starts at #20, where different indexes are used:
Index Cond: (o.sys_period @> sys_time()) VS Index Cond: (o.agreement_id = 38)
PS. It would be nice if I could contact the developer of this feature (I have one more similar problem).
UPD: I did some experiments. When I remove the window functions from my query, it is fast in either case. So why do window functions prevent index usage in some cases? How can I avoid or work around that?
dbfiddle with minimal test case
Server version is v13.1
Full query:
WITH gconf AS
-- https://www.postgresql.org/docs/current/queries-with.html#QUERIES-WITH-SELECT
NOT MATERIALIZED -- force it to be merged into the parent query
-- it gives a net savings because each usage of the WITH query needs only a small part of the WITH query's full output.
( SELECT
ocd.*,
tstzrange( '2021-05-01', '2021-05-01', '[]') AS acc_period,
(o).agreement_id AS id, -- Required to passthrough WINDOW FUNCTION
(o).id AS order_id,
(ic).consumed_period AS consumed_period,
dense_rank() OVER ( PARTITION BY (o).agreement_id, (o).id ORDER BY (ic).consumed_period ) AS nconf,
row_number() OVER ( wconf ORDER BY (c).sort_order NULLS LAST ) AS nitem,
(sum( ocd.item_cost ) OVER wconf)::numeric( 10, 2) AS conf_cost,
max((ocd.ic).consumed) OVER wconf AS consumed,
CASE WHEN true
THEN (sum( ocd.item_suma ) OVER wconf)::numeric( 10, 2 )
ELSE (sum( ocd.item_cost ) OVER wconf)::numeric( 10, 2 )
END AS conf_suma
FROM order_cost_details( tstzrange( '2021-05-01', '2021-05-01', '[]') ) ocd
WHERE true OR (ocd.ic).consumed_period @> lower( tstzrange( '2021-05-01', '2021-05-01', '[]') )
WINDOW wconf AS ( PARTITION BY (o).agreement_id, (o).id, (ic).consumed_period )
),
gorder AS (
SELECT *,
(conf_suma/6)::numeric( 10, 2 ) as conf_nds,
sum( conf_suma ) FILTER (WHERE nitem = 1) OVER worder AS order_suma
FROM gconf
WINDOW worder AS ( PARTITION BY gconf.id, (o).id )
-- TODO: Ask PG developers: Why changing to (o).agreement_id slows down query?
-- WINDOW worder AS ( PARTITION BY (o).agreement_id, (o).id )
)
SELECT
u.id, consumed_period, nconf, nitem,
(c).id as item_id,
COALESCE( (c).sort_order, pd.sort_order ) as item_order,
COALESCE( st.display, st.name, rt.display, rt.name ) as item_name,
COALESCE( item_qty, (c).amount/rt.unit ) as item_qty,
COALESCE( (p).label, rt.label ) as measure,
item_price, item_cost, item_suma,
conf_cost, consumed, conf_suma, conf_nds, order_suma,
(order_suma/6)::numeric( 10, 2 ) as order_nds,
sum( conf_suma ) FILTER (WHERE nitem = 1 ) OVER wagreement AS total_suma,
sum( (order_suma/6)::numeric( 10, 2 ) ) FILTER (WHERE nitem = 1 AND nconf = 1) OVER wagreement AS total_nds,
pkg.id as package_id,
pkg.link_1c_id as package_1c_id,
COALESCE( pkg.display, pkg.name ) as package,
acc_period
FROM gorder u
LEFT JOIN resource_type rt ON rt.id = (c).resource_type_id
LEFT JOIN service_type st ON st.id = (c).service_type_id
LEFT JOIN package pkg ON pkg.id = (o).package_id
LEFT JOIN package_detail pd ON pd.package_id = (o).package_id
AND pd.resource_type_id IS NOT DISTINCT FROM (c).resource_type_id
AND pd.service_type_id IS NOT DISTINCT FROM (c).service_type_id
-- WHERE (o).agreement_id = 38 -- slow
WHERE u.id = 38 -- fast
WINDOW wagreement AS ( PARTITION BY (o).agreement_id )
As a workaround, we can additionally SELECT an alias for the column used in the PARTITION BY expression. PG then applies the optimization and uses the index.
The answer to the question could be: PG does not apply the optimization when a field of a composite type is used. Notice how it works:
PARTITION | FILTER | IS USED?
------------------------------
ALIAS | ORIG | NO
ALIAS | ALIAS | YES
ORIG | ALIAS | NO
ORIG | ORIG | NO
See this dbfiddle
create table agreement ( ag_id int, name text, cost numeric(10,2) );
create index ag_idx on agreement (ag_id);
insert into agreement (ag_id, name, cost) values ( 1, '333', 22 ),
(1,'333', 33), (1, '333', 7), (2, '555', 18 ), (2, '555', 2), (3, '777', 4);
select * from agreement;
create function initial ()
returns table( agreement_id int, ag agreement ) language sql stable AS $$
select ag_id, t from agreement t;
$$;
select * from initial() t;
explain( analyze, costs, buffers, verbose ) with totals_by_ag as (
select
*,
sum( (t.ag).cost ) over ( partition by agreement_id ) as total
from initial() t
)
select * from totals_by_ag t
where (t.ag).ag_id = 1; -- index is NOT USED
explain( analyze, costs, buffers, verbose ) with totals_by_ag as (
select
*,
sum( (t.ag).cost ) over ( partition by agreement_id ) as total
from initial() t
)
select * from totals_by_ag t
where agreement_id = 1; -- index is used when alias for column is used
explain( analyze, costs, buffers, verbose ) with totals_by_ag as (
select
*,
sum( (t.ag).cost ) over ( partition by (t.ag).ag_id ) as total --renamed
from initial() t
)
select * from totals_by_ag t
where agreement_id = 1; -- index is NOT USED because grouping by original column
explain( analyze, costs, buffers, verbose ) with totals_by_ag as (
select
*,
sum( (t.ag).cost ) over ( partition by (t.ag).ag_id ) as total --renamed
from initial() t
)
select * from totals_by_ag t
where (t.ag).ag_id = 1; -- index is NOT USED even if at both cases original column

T-SQL - WHERE @Parameter LIKE column+'%'

I have the following query:
DECLARE @phone varchar(50) = '972544123123'
SELECT top 1 prefix_number
FROM prefix_numbers
WHERE @phone LIKE LTRIM(RTRIM(prefix_number)) + '%'
ORDER BY len(prefix_number) DESC
It is used to find the shortest prefix for a phone number.
This is not sargable, and results in a table scan.
Do you have any ideas on how to rewrite this?
the output for this is 972, where all possible prefixes are 9725 & 972.
Assumption: The LTrim is a red herring and all prefixes are stored as right-filled fixed-length strings.
The following code should be able to use an index on Prefix in the table of prefixes. It breaks the target phone number (@Phone) into substrings from one character up to the length of the number, then performs a join on equality with the available prefixes. Using top 1 and order by, it retrieves the shortest (or longest) match, if any.
declare @Prefixes as Table ( PrefixId Int Identity, Prefix Char(50) );
insert into @Prefixes ( Prefix ) values
( '914' ), ( '972544' ), ( 'BR549' ), ( '972' );
select PrefixId, '''' + Prefix + '''' as Prefix -- Show blank padding.
from @Prefixes;
declare @Phone as VarChar(50) = '972544123123';
with
Substrings as (
select 1 as Length, Cast( Left( @Phone, 1 ) as VarChar(50) ) as TargetPrefix
union all
select Length + 1, Left( @Phone, Length + 1 )
from Substrings
where Length + 1 <= Len( @Phone )
)
-- select * from Substrings; -- Use this to see the intermediate results.
select top 1 P.PrefixId, P.Prefix
from @Prefixes as P inner join
Substrings as S on P.Prefix = S.TargetPrefix
order by S.Length; -- Ascending for shortest match, descending for longest match.
If you have a tally (or numbers) table it may be more efficient to use it to split the string into substrings.
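The idea of turning one non-sargable LIKE into a handful of equality lookups can be sketched as follows. This is an illustrative Python analogue rather than T-SQL; `matching_prefix` and the in-memory `prefixes` set are assumptions, with set membership playing the role of an index seek on the prefix column.

```python
def matching_prefix(phone, prefixes, longest=True):
    """Return the longest (or shortest) stored prefix of `phone`, or None."""
    # Every leading substring of the number: '9', '97', '972', ...
    candidates = [phone[:n] for n in range(1, len(phone) + 1)]
    if longest:
        candidates.reverse()
    for candidate in candidates:
        # Exact-match lookup; this is what lets an index be used in SQL.
        if candidate in prefixes:
            return candidate
    return None


prefixes = {"914", "972544", "BR549", "972"}
print(matching_prefix("972544123123", prefixes))                 # 972544
print(matching_prefix("972544123123", prefixes, longest=False))  # 972
```

The number of lookups is bounded by the length of the phone number, regardless of how many prefixes are stored.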

SQL 2017 - Comparing values between two tables where certain values can be NULL

I have the following Tables with the following data:
CREATE TABLE TestSource (
InstrumentID int,
ProviderID int,
KPI1 int,
Col2 varchar(255),
KPI3 int
);
CREATE TABLE TestTarget (
InstrumentID int,
ProviderID int,
KPI1 int,
Col2 varchar(255),
KPI3 int
);
INSERT INTO TestSource (InstrumentID,ProviderID,KPI1,Col2,KPI3)
VALUES (123, 27, 1, 'ABC', 10.0 ),
(1234, 27, 2, 'DEF', 10.0 ),
(345, 27, 1, NULL, 0.00 );
INSERT INTO TestTarget (InstrumentID,ProviderID,KPI1,Col2,KPI3)
VALUES (123, 27, 1, 'ABC', 10.0 ),
(1234, 27, 2, 'DEF', 10.0 ),
(345, 27, 1, 'ABC', 0.0 );
I'm trying to compare the values between tables. Here's the query logic I am currently using:
DECLARE @Result NVARCHAR(max)
;WITH
compare_source (InstrumentID,ProviderID,
/*** Source columns to compare ***/
Col1Source, Col2Source,Col3Source
)
as (
select InstrumentID
,ProviderID
,KPI1
--,ISNULL(Col2,'NA') as Col2
,Col2
,KPI3
from TestSource
group by
InstrumentID
,ProviderID
,KPI1
,Col2
,KPI3
),
compare_target (InstrumentID,ProviderID,
/*** Target columns to compare ***/
Col1Target,Col2Target,Col3Target
)
as
(
select
InstrumentID
,ProviderID
,KPI1
--,1
,Col2
,KPI3
from TestTarget
group by
InstrumentID
,ProviderID
,KPI1
,Col2
,KPI3
)
SELECT @Result = STRING_AGG ('InstrumentID = ' + CONVERT(VARCHAR,InstrumentID)
+ ', Col1: ' + CONVERT(VARCHAR,Col1Source) + ' vs ' + CONVERT(VARCHAR,Col1Target)
+ ', Col2: ' + CONVERT(VARCHAR,Col2Source) + ' vs ' + CONVERT(VARCHAR,Col2Target)
+ ', Col3: ' + CONVERT(VARCHAR,Col3Source) + ' vs ' + CONVERT(VARCHAR,Col3Target)
, CHAR(13) + CHAR(10)
)
FROM
(
select
s.InstrumentID
,s.Col1Source
,t.Col1Target
,s.Col2Source
,t.Col2Target
,s.Col3Source
,t.Col3Target
from compare_source s
left join compare_target t on t.InstrumentID = s.InstrumentID and t.ProviderID = s.ProviderID
where not exists
(
select 1 from compare_target t where
s.InstrumentID = t.InstrumentID AND
( s.Col1Source = t.Col1Target ) OR (ISNULL(s.Col1Source, t.Col1Target) IS NULL) AND
( s.Col2Source = t.Col2Target ) OR (ISNULL(s.Col2Source, t.Col2Target) IS NULL) AND
( s.Col3Source = t.Col3Target ) OR (ISNULL(s.Col3Source, t.Col3Target) IS NULL)
)
) diff
PRINT @Result
When there are no NULL values in my tables, the comparison works well. However, as soon as I attempt to insert NULLs in either of the tables, my comparison logic breaks down and does not account for the differences between tables values.
I know that I could easily do an ISNULL on my columns in my individual selects, however, I'd like to keep it as generic as possible and to only do my comparison checks and NULL checks in my final NOT EXISTS comparison WHERE clause.
I've also tried the following logic in my comparison logic without success:
(
select 1 from compare_target t where
s.InstrumentID = t.InstrumentID AND
( s.Col1Source = t.Col1Target OR (s.Col1Source IS NULL AND t.Col1Target IS NULL) ) AND
( s.Col2Source = t.Col2Target OR (s.Col2Source IS NULL AND t.Col2Target IS NULL) ) AND
( s.Col3Source = t.Col3Target OR (s.Col3Source IS NULL AND t.Col3Target IS NULL) )
)
Another issue I am having is that my query cannot distinguish between data formats (for example, it sees the value 0.00 as equivalent to 0.0)
I'm not totally certain as to what I am missing.
Any help to put me on the right path would be great.
Well, the two problems I see are these:
The WHERE clause at the bottom needs extra parentheses to group each OR with its column comparison so that the order of precedence is correct (AND binds more tightly than OR):
select 1 from compare_target t where
s.InstrumentID = t.InstrumentID AND
(( s.Col1Source = t.Col1Target ) OR (ISNULL(s.Col1Source, t.Col1Target) IS NULL)) AND
(( s.Col2Source = t.Col2Target ) OR (ISNULL(s.Col2Source, t.Col2Target) IS NULL)) AND
(( s.Col3Source = t.Col3Target ) OR (ISNULL(s.Col3Source, t.Col3Target) IS NULL))
When you make that change, the one row that is returned has a NULL value in the Col2Source column. So the string you are building for STRING_AGG has a NULL in the middle of it, and the entire string becomes NULL. You will need to use ISNULL either in the subquery in your FROM clause or within the STRING_AGG(), or, I suppose, right where you had it commented out.
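Both points can be illustrated with a small sketch. It is written in Python, with `None` standing in for SQL NULL; the helper names `null_safe_equal`, `concat_or_null`, and `show` are made up for the illustration and only mimic the SQL behaviour described above.

```python
def null_safe_equal(a, b):
    """True when both values are NULL (None) or both hold the same value."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def concat_or_null(*parts):
    """Mimic SQL string '+': any NULL operand makes the whole result NULL."""
    return None if any(p is None for p in parts) else "".join(parts)


def show(value, placeholder="NULL"):
    """Rough analogue of ISNULL(value, 'NULL') applied before concatenation."""
    return placeholder if value is None else str(value)


# Precedence: AND binds tighter than OR, as in SQL.
same_id, col_equal, col_both_null = False, False, True
unparenthesized = same_id and col_equal or col_both_null   # (same_id and col_equal) or col_both_null
parenthesized = same_id and (col_equal or col_both_null)
print(unparenthesized, parenthesized)  # True False

# NULL handling for the row with InstrumentID = 345.
col2_source, col2_target = None, "ABC"
print(null_safe_equal(col2_source, col2_target))                        # False
print(concat_or_null("Col2: ", col2_source, " vs ", col2_target))       # None
print("Col2: " + show(col2_source) + " vs " + show(col2_target))        # Col2: NULL vs ABC
```

Without the parentheses, a row with a different InstrumentID can still satisfy the condition, which is why the NOT EXISTS test misbehaves.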

Get characters before underscore and separated by comma from a string in SQL Server 2008

I tried this query
DECLARE @AdvancedSearchSelectedDropdownName TABLE (
SelectedIds VARCHAR(2048),
AdvanceSearchOptionTypeId INT
)
INSERT INTO @AdvancedSearchSelectedDropdownName
VALUES ('4_0,5_1,6_2,7_3', 23),
('62_3', 21), ('2_4', 23)
DECLARE @selectedIds VARCHAR(MAX) = '';
SELECT @selectedIds +=
CASE WHEN SelectedIds IS NULL
THEN @selectedIds + ISNULL(SelectedIds + ',', '')
WHEN SelectedIds IS NOT NULL
THEN SUBSTRING(SelectedIds, 0, CHARINDEX('_', SelectedIds, 0)) + ','
END
FROM @AdvancedSearchSelectedDropdownName WHERE advanceSearchOptionTypeId = 23
SELECT @selectedIds
Current output: 4,2
Required output: 4,5,6,7,2
We may have n number of comma separated values in the SelectedIds column.
You might go this route:
WITH Casted AS
(
SELECT *
,CAST('<x><y>' + REPLACE(REPLACE(SelectedIds,'_','</y><y>'),',','</y></x><x><y>') + '</y></x>' AS XML) SplittedToXml
FROM @AdvancedSearchSelectedDropdownName
)
SELECT *
FROM Casted;
This will return your data in this form:
<x>
<y>4</y>
<y>0</y>
</x>
<x>
<y>5</y>
<y>1</y>
</x>
<x>
<y>6</y>
<y>2</y>
</x>
<x>
<y>7</y>
<y>3</y>
</x>
Now we can grab all the x and just the first y:
WITH Casted AS
(
SELECT *
,CAST('<x><y>' + REPLACE(REPLACE(SelectedIds,'_','</y><y>'),',','</y></x><x><y>') + '</y></x>' AS XML) SplittedToXml
FROM @AdvancedSearchSelectedDropdownName
)
SELECT Casted.AdvanceSearchOptionTypeId AS TypeId
,x.value('y[1]/text()[1]','int') AS IdValue
FROM Casted
CROSS APPLY SplittedToXml.nodes('/x') A(x);
The result:
TypeId IdValue
23 4
23 5
23 6
23 7
21 62
23 2
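The replace-into-XML trick is not specific to SQL Server. As a rough illustration, here is the same transformation in Python with the standard-library XML parser; `first_ids` is a hypothetical helper name, and the extra `<r>` root element is added only because ElementTree requires a single root.

```python
import xml.etree.ElementTree as ET


def first_ids(selected_ids):
    """Turn '4_0,5_1' into <x><y>4</y><y>0</y></x>... and read the first <y> of each <x>."""
    body = selected_ids.replace("_", "</y><y>").replace(",", "</y></x><x><y>")
    # Wrapper root element is an addition for ElementTree, not part of the SQL version.
    root = ET.fromstring("<r><x><y>" + body + "</y></x></r>")
    return [int(x.find("y").text) for x in root.findall("x")]


rows = [("4_0,5_1,6_2,7_3", 23), ("62_3", 21), ("2_4", 23)]
ids_for_23 = [i for selected, type_id in rows if type_id == 23 for i in first_ids(selected)]
print(",".join(str(i) for i in ids_for_23))  # 4,5,6,7,2
```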
Hint: Do not store comma delimited values!
It is a very bad idea to store your data in this format. You can use a generic format like my XML to store this, or a structure of related side tables. But such constructions tend to turn out to be a real pain in the neck...
After a little re-think. Perhaps something a little more straightforward.
Now, if you have a limited number of _N
Example
;with cte as (
Select *
,RN = Row_Number() over(Order by (Select NULL))
From @AdvancedSearchSelectedDropdownName A
)
Select AdvanceSearchOptionTypeId
,IDs = replace(
replace(
replace(
replace(
replace(
stuff((Select ',' +SelectedIds From cte Where AdvanceSearchOptionTypeId=A.AdvanceSearchOptionTypeId Order by RN For XML Path ('')),1,1,'')
,'_0','')
,'_1','')
,'_2','')
,'_3','')
,'_4','')
From cte A
Group By AdvanceSearchOptionTypeId
Returns
AdvanceSearchOptionTypeId IDs
21 62
23 4,5,6,7,2
If interested in a helper function.
Tired of extracting strings (left, right, charindex, patindex, ...), I modified a split/parse function to accept TWO non-like delimiters. In this case a , and _.
Example
;with cte as (
Select A.AdvanceSearchOptionTypeId
,B.*
,RN = Row_Number() over(Order by (Select NULL))
From @AdvancedSearchSelectedDropdownName A
Cross Apply [dbo].[tvf-Str-Extract](','+A.SelectedIds,',','_') B
)
Select AdvanceSearchOptionTypeId
,IDs = stuff((Select ',' +RetVal From cte Where AdvanceSearchOptionTypeId=A.AdvanceSearchOptionTypeId Order by RN,RetVal For XML Path ('')),1,1,'')
From cte A
Group By AdvanceSearchOptionTypeId
Returns
AdvanceSearchOptionTypeId IDs
21 62
23 4,5,6,7,2
The TVF if Interested
CREATE FUNCTION [dbo].[tvf-Str-Extract] (@String varchar(max),@Delimiter1 varchar(100),@Delimiter2 varchar(100))
Returns Table
As
Return (
with cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cte2(N) As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 N1,cte1 N2,cte1 N3,cte1 N4,cte1 N5,cte1 N6) A ),
cte3(N) As (Select 1 Union All Select t.N+DataLength(@Delimiter1) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter1)) = @Delimiter1),
cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter1,@String,s.N),0)-S.N,8000) From cte3 S)
Select RetSeq = Row_Number() over (Order By N)
,RetPos = N
,RetVal = left(RetVal,charindex(@Delimiter2,RetVal)-1)
From (
Select *,RetVal = Substring(@String, N, L)
From cte4
) A
Where charindex(@Delimiter2,RetVal)>1
)
/*
Max Length of String 1MM characters
Declare @String varchar(max) = 'Dear [[FirstName]] [[LastName]], ...'
Select * From [dbo].[tvf-Str-Extract] (@String,'[[',']]')
*/
*/
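To make the two-delimiter logic easier to follow, here is a rough Python paraphrase of what the function returns. It is a sketch of the idea only (the name `str_extract` is made up, and positions and sequence numbers are omitted): split on the first delimiter, then keep the text before the second delimiter in each piece that contains it past the first character.

```python
def str_extract(text, delimiter1, delimiter2):
    """Return the text found between delimiter1 and the next delimiter2."""
    results = []
    for chunk in text.split(delimiter1):
        end = chunk.find(delimiter2)
        # Mirrors "charindex(@Delimiter2, RetVal) > 1": delimiter2 must be
        # present and must not be the very first thing in the chunk.
        if end > 0:
            results.append(chunk[:end])
    return results


print(str_extract("Dear [[FirstName]] [[LastName]], ...", "[[", "]]"))  # ['FirstName', 'LastName']
print(str_extract(",4_0,5_1,6_2,7_3", ",", "_"))                        # ['4', '5', '6', '7']
```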
Disclaimer: As per first normal form, you should not store multiple values in a single cell. I would suggest avoiding storing data this way.
Still, the approach would be to create a UDF that splits a comma-separated list into a table-valued result. I have not tested the code below, but it gives an idea of how to approach the problem.
Refer to CSV to table approaches
DECLARE @selectedIds varchar(max) = '';
-- dbo.udfForCSVToList is assumed to return one row per list item in a column named value.
SET @selectedIds = STUFF(
(SELECT ',' + SUBSTRING(c.value, 1, CHARINDEX('_', c.value) - 1)
FROM @AdvancedSearchSelectedDropdownName AS tv
CROSS APPLY dbo.udfForCSVToList(tv.SelectedIds) AS c
WHERE tv.AdvanceSearchOptionTypeId = 23
FOR XML PATH('')), 1, 1, ''); -- strip the single leading comma
SELECT @selectedIds;

T-SQL split string by - and space

I'm having a difficult time with T-SQL and I was wondering if somebody could point me in the right direction.
I have the following variable called @input
DECLARE @input nvarchar(100);
SET @input= '27364 - John Smith';
-- SET @input= '27364 - John Andrew Smith';
I need to split this string into 3 parts (ID, FirstName and LastName), or 4 if the string contains a MiddleName. For security reasons I cannot use functions.
My approach was to use Substring and Charindex.
SET @Id = SUBSTRING(@input, 1, CASE CHARINDEX('-', @input)
WHEN 0
THEN LEN(@input)
ELSE
CHARINDEX('-', @input) - 2
END);
SET @FirstName = SUBSTRING(@input, CASE CHARINDEX(' ', @input)
WHEN 0
THEN LEN(@input) + 1
ELSE
CHARINDEX(' ', @input) + 1
END, 1000);
SET @LastName = SUBSTRING(@input, CASE CHARINDEX(' ', @input)
WHEN 0
THEN LEN(@input) + 1
ELSE
CHARINDEX('0', @input) + 1
END, 1000);
Select @PartyCode,@FirstName,@LastName
I am stuck because I don't know how to proceed and also the code has to be smart enough to add a fourth split if Middlename exists.
Any thoughts?
Thanks in advance
Hopefully this is part of a normalization project. This data is breaking 1NF and one really should avoid that...
Try it like this
The advantages
typesafe values
ad-hoc SQL
set based
If you want you might use a CASE WHEN to check if the last part is NULL and place Part2 into Part3 in this case...
DECLARE @input table(teststring nvarchar(100));
INSERT INTO @input VALUES
(N'27364 - John Smith'),(N'27364 - John Andrew Smith');
WITH Splitted AS
(
SELECT CAST(N'<x>' + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(teststring,N' - ',N' '),N'&',N'&amp;'),N'<',N'&lt;'),N'>',N'&gt;'),N' ',N'</x><x>') + N'</x>' AS XML) testXML
FROM @input
)
SELECT testXML.value('/x[1]','int') AS Number
,testXML.value('/x[2]','nvarchar(max)') AS Part1
,testXML.value('/x[3]','nvarchar(max)') AS Part2
,testXML.value('/x[4]','nvarchar(max)') AS Part3
FROM Splitted
The result
Number Part1 Part2 Part3
27364 John Smith NULL
27364 John Andrew Smith
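The splitting rule itself, including the optional middle name mentioned above, can be sketched in a few lines. This is a Python illustration rather than T-SQL; `split_input` is a hypothetical name, and the sketch assumes the ID is numeric and separated from the name by ' - '.

```python
def split_input(text):
    """Split '27364 - John Andrew Smith' into (id, first, middle, last).

    Assumes a numeric ID followed by ' - ' and a space-separated name.
    """
    id_part, _, name_part = text.partition(" - ")
    names = name_part.split()
    first = names[0] if names else None
    last = names[-1] if len(names) > 1 else None
    # Everything between the first and last word counts as the middle name.
    middle = " ".join(names[1:-1]) or None
    return int(id_part), first, middle, last


print(split_input("27364 - John Smith"))         # (27364, 'John', None, 'Smith')
print(split_input("27364 - John Andrew Smith"))  # (27364, 'John', 'Andrew', 'Smith')
```

Unlike the positional Part1..Part3 columns, this puts the last word into the last-name slot whether or not a middle name is present.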
SQL Server 2016 has a new built-in function called STRING_SPLIT().
Assuming that creating user-defined functions is allowed, but CLR functions are not:
CREATE FUNCTION dbo.WORD_SPLIT
(
@String AS nvarchar(4000)
)
RETURNS TABLE
AS
RETURN
(
WITH Spaces AS
(
SELECT Spaced.[value], ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY (SELECT 1)) AS ordinal
FROM STRING_SPLIT(@String, ' ') AS Spaced
)
, Tabs AS
(
SELECT Tabbed.[value], ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY s.ordinal, (SELECT 1)) AS ordinal
FROM Spaces AS s
CROSS APPLY STRING_SPLIT(s.[value], CHAR(9)) AS Tabbed
)
, NewLines1 AS
(
SELECT NewLined1.[value], ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY t.ordinal, (SELECT 1)) AS ordinal
FROM Tabs AS t
CROSS APPLY STRING_SPLIT(t.[value], CHAR(13)) AS NewLined1
)
, NewLines2 AS
(
SELECT NewLined2.[value], ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY nl1.ordinal, (SELECT 1)) AS ordinal
FROM NewLines1 AS nl1
CROSS APPLY STRING_SPLIT(nl1.[value], CHAR(10)) AS NewLined2
)
SELECT LTRIM(RTRIM(nl2.[value])) AS [value], ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY nl2.ordinal, (SELECT 1)) AS ordinal
FROM NewLines2 AS nl2
WHERE LTRIM(RTRIM(nl2.[value])) <> ''
)
GO
Usage:
-- Not normalized
SELECT i.*, split.[value], split.[ordinal]
FROM @input AS i
CROSS APPLY dbo.WORD_SPLIT(i.teststring) AS split
-- Normalized
;WITH Splitted AS
(
SELECT split.[value], split.[ordinal]
FROM @input AS i
CROSS APPLY dbo.WORD_SPLIT(i.teststring) AS split
)
SELECT *
FROM (SELECT [value], 'part' + CONVERT(nvarchar(20), [ordinal]) AS [parts] FROM Splitted) AS s
PIVOT (MAX([value]) FOR [parts] IN ([part1], [part2], [part3], [part4])) AS p
Or, assuming that for security reasons you are not allowed to make schema changes:
WITH Splitting AS
(
SELECT teststring AS [value]
FROM @input
)
, Spaces AS
(
SELECT Spaced.[value], ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY (SELECT 1)) AS ordinal
FROM Splitting AS sp
CROSS APPLY STRING_SPLIT(sp.[value], ' ') AS Spaced
)
, Tabs AS
(
SELECT Tabbed.[value], ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY s.ordinal, (SELECT 1)) AS ordinal
FROM Spaces AS s
CROSS APPLY STRING_SPLIT(s.[value], CHAR(9)) AS Tabbed
)
, NewLines1 AS
(
SELECT NewLined1.[value], ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY t.ordinal, (SELECT 1)) AS ordinal
FROM Tabs AS t
CROSS APPLY STRING_SPLIT(t.[value], CHAR(13)) AS NewLined1
)
, NewLines2 AS
(
SELECT NewLined2.[value], ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY nl1.ordinal, (SELECT 1)) AS ordinal
FROM NewLines1 AS nl1
CROSS APPLY STRING_SPLIT(nl1.[value], CHAR(10)) AS NewLined2
)
, Splitted AS
(
SELECT LTRIM(RTRIM(nl2.[value])) AS [value], ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY nl2.ordinal, (SELECT 1)) AS ordinal
FROM NewLines2 AS nl2
WHERE LTRIM(RTRIM(nl2.[value])) <> ''
)
SELECT *
FROM (SELECT [value], 'part' + CONVERT(nvarchar(20), [ordinal]) AS [parts] FROM Splitted) AS s
PIVOT (MAX([value]) FOR [parts] IN ([part1], [part2], [part3], [part4])) AS p
Hopefully helpful!