Splitting comma delimited cell data - tsql

I have a spreadsheet with multiple columns, one of which is an owner_id column. The problem is that this column contains a comma-delimited list of owner IDs rather than a single one.
I've imported this spreadsheet into my SQL Server 2008 database, completed the other import tasks, and now have a parcel_id column as a result of this process.
I need to create an entry in my parcelOwners table for each parcelID/ownerID pair, but I'm not sure how to go about this with the owner IDs being in a comma-delimited list.
My tables look like this:
ImportData
=================
owner_id varchar,
parcelID int
sample row (owner_id = '13782, 21431', parcelID = 319)
ParcelOwners
=================
ownerID int,
parcelID int
row from ImportData table should look like:
ownerID = 13782, parcelID = 319
ownerID = 21431, parcelID = 319
Is this a common situation for anybody and if so, how do you go about getting around this?

The function below will split your comma-separated column into a table. You will then need to iterate through the returned table and insert one row into your parcelOwners table for each value. To get this to work you will need an outer loop to iterate through the ImportData table and an inner loop to iterate through @temptable for each row. Also, don't forget: if you come to a row in your outer loop with no commas in the owner_id column, there is nothing to split, and the function returns that single value as is.
CREATE FUNCTION dbo.Split(@String varchar(8000), @Delimiter char(1))
returns @temptable TABLE (items varchar(8000))
as
begin
    declare @idx int
    declare @slice varchar(8000)
    select @idx = 1
    if len(@String)<1 or @String is null return
    while @idx!= 0
    begin
        set @idx = charindex(@Delimiter,@String)
        if @idx!=0
            set @slice = left(@String,@idx - 1)
        else
            set @slice = @String
        if(len(@slice)>0)
            insert into @temptable(Items) values(@slice)
        set @String = right(@String,len(@String) - @idx)
        if len(@String) = 0 break
    end
    return
end
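As a quick check of the function above, here is a minimal sketch that calls it with the sample value from the question. It assumes dbo.Split has already been created as shown; the LTRIM is an addition here, to strip the space that follows each comma in the sample data.

```sql
-- Assumes dbo.Split from above already exists in the current database.
-- Splits the sample owner_id value from the question into one row per ID.
SELECT LTRIM(items) AS owner_id
FROM dbo.Split('13782, 21431', ',');
```

This should return one row per owner ID, still typed as varchar, so convert the values to int before inserting them into ParcelOwners.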

You can do this easily leveraging SQL Server's XML functions:
WITH xmlData (xml_owner_id, parcelID) AS (
    /* make into xml */
    SELECT cast('<x>'+replace(owner_id,',','</x><x>')+'</x>' as XML) AS xml_owner_id, parcelID
    FROM ImportData
)
SELECT x.value('.','int') AS owner_id, parcelID /* split up */
FROM xmlData
CROSS APPLY xmlData.xml_owner_id.nodes('//x') AS func(x)

(In response to @senloe's question about how to use the function supplied by @RandomBen)
This answer to a previous question shows how to use OUTER APPLY to apply a function to every row in a table. In your case, and assuming you have already run @RandomBen's code to create the dbo.Split function, the syntax would look something like this:
INSERT INTO ParcelOwners (ownerId, parcelID)
SELECT CONVERT(int, Results.items), ImportData.parcelID
FROM ImportData
OUTER APPLY dbo.Split(ImportData.owner_id, ',') AS Results
(I don't have access to SQL Server right now, so I haven't tried it yet. You can run it without the first line, i.e. just from SELECT onwards, to see what output it is going to generate before you actually do the INSERT).

Related

Redshift how to split a stringified array into separate parts

Say I have a varchar column, religions, that looks like this: ["Christianity", "Buddhism", "Judaism"] (yes, the brackets are part of the string), and I want the string (not an array) split into multiple rows like "Christianity", "Buddhism", "Judaism" so it can be used in a WHERE clause.
Eventually I want to use the results of the query in a where clause like this:
SELECT ...
FROM religions
WHERE name in
(
<this subquery>
)
How can one do this?
You can use the function JSON_PARSE to convert the varchar string into an array. Then you can use the strategy described in Convert varchar array to rows in redshift - Stack Overflow to convert the array to separate rows.
You can do the following.
Create a temporary table with sequence of numbers
Using the sequence and split_part function available in redshift, you can split the values based on the numbers generated in the temporary table by doing a cross join.
To replace the double quote and square brackets, you can use the regexp_replace function in Redshift.
create temp table seq as
with recursive numbers(NUMBER) as
(
select 1 UNION ALL
select NUMBER + 1 from numbers where NUMBER < 28
)
select * from numbers;
select regexp_replace(split_part(val,',',seq.number),'[]["]','') as value
from
(select '["christianity","Buddhism","Judaism"]' as val) -- You can select the actual column from the table here.
cross join
seq
where seq.number <= regexp_count(val,'[,]')+1;

SQL Server - How Do I Create Increments in a Query

First off, I'm using SQL Server 2008 R2
I am moving data from one source to another. In this particular case there is a field called SiteID. In the source it's not a required field, but in the destination it is. So it was my thought, when the SiteID from the source is NULL, to sort of create a SiteID "on the fly" during the query of the source data. Something like a combination of the state plus the first 8 characters of a description field plus a ten digit number incremented.
At first I thought it might be easy to use a combination of date/time + nanoseconds but it turns out that several records can be retrieved within a nanosecond leading to duplicate SiteIDs.
My second idea was to create a table that contained an identity field plus a function that would add a record to increment the identity field and then return it (the function would also delete all records where the identity field is less than the latest saving space). Unfortunately after I got it written, when trying to "CREATE" the function I got a notice that INSERTs are not allowed in functions.
I could (and did) convert it to a stored procedure, but stored procedures are not allowed in select queries.
So now I'm stuck.
Is there any way to accomplish what I'm trying to do?
This script may take time to execute depending on the data present in the table, so first execute on a small sample dataset.
DECLARE @TotalMissingSiteID INT = 0,
        @Counter INT = 0,
        @NewID BIGINT;
DECLARE @NewSiteIDs TABLE
(
    SiteID BIGINT -- Check the datatype
);
SELECT @TotalMissingSiteID = COUNT(*)
FROM SourceTable
WHERE SiteID IS NULL;
WHILE (@Counter < @TotalMissingSiteID)
BEGIN
    WHILE (1 = 1)
    BEGIN
        SELECT @NewID = RAND() * 1000000000000000; -- Add your formula to generate new SiteIDs here
        -- To check if the generated SiteID is already present in the table
        IF (ISNULL((SELECT 1
                    FROM SourceTable
                    WHERE SiteID = @NewID), 0) = 0)
            BREAK;
    END
    INSERT INTO @NewSiteIDs (SiteID)
    VALUES (@NewID);
    SET @Counter = @Counter + 1;
END
INSERT INTO DestinationTable (SiteID) -- Add the extra columns here
SELECT ISNULL(MainTable.SiteID, NewIDs.SiteID) SiteID
FROM (
    SELECT SiteID, -- Add the extra columns here
           ROW_NUMBER() OVER (PARTITION BY SiteID
                              ORDER BY SiteID) SerialNumber
    FROM SourceTable
) MainTable
LEFT JOIN (SELECT SiteID,
                  ROW_NUMBER() OVER (ORDER BY SiteID) SerialNumber
           FROM @NewSiteIDs
) NewIDs
    ON MainTable.SiteID IS NULL
    AND MainTable.SerialNumber = NewIDs.SerialNumber
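A set-based alternative, sketched here rather than taken from the answer above, is to number the rows with ROW_NUMBER() and build the SiteID from that number in the same SELECT, which avoids the loop. The [State] and [Description] column names are assumptions based on the question's description, and the snippet assumes SiteID is a character column.

```sql
-- Sketch only: [State] and [Description] are hypothetical column names,
-- and SiteID is assumed to be a character column in both tables.
INSERT INTO DestinationTable (SiteID) -- add the extra columns here
SELECT COALESCE(
           s.SiteID,
           ISNULL(s.[State], '') + LEFT(ISNULL(s.[Description], ''), 8)
           -- zero-padded, ten-digit running number
           + RIGHT('0000000000' + CAST(s.RowNo AS varchar(10)), 10)
       ) AS SiteID
FROM (
    SELECT SiteID, [State], [Description],
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNo
    FROM SourceTable
) AS s;
```

Every row gets a distinct row number, so the generated values differ from one another within a single run. Whether they can clash with SiteIDs that already exist depends on your data.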

TSQL split comma delimited string

I am trying to create a stored procedure that will split the contents of 3 text boxes on a webpage, each holding user input as a comma-delimited string. We have a field called 'combined_name' in our table that we have to search for first and last name and any known errors or nicknames etc., such as @p1: 'grei,grie', @p2: 'joh,jon,j..'; @p3 is empty.
The reason for the third box is after I get the basics set up we will have does not contain, starts with, ends with and IS to narrow our results further.
So I am looking to get all records that CONTAINS any combination of those. I originally wrote this in LINQ but it didn't work as you cannot query a list and a dataset. The dataset is too large (1.3 million records) to be put into a list so I have to use a stored procedure which is likely better anyway.
Will I have to use 2 SPs, one to split each field and one for the select query, or can this be done with one? What function do I use for contains in T-SQL? I tried using IN in a query but cannot figure out how it works with multiple parameters.
Please note that this will be an internal site that has limited access so worrying about sql injection is not a priority.
I did attempt dynamic SQL but am not getting the correct results back:
CREATE PROCEDURE uspJudgments @fullName nvarchar(100) AS
EXEC('SELECT *
FROM new_judgment_system.dbo.defendants_ALL
WHERE combined_name IN (' + @fullName + ')')
GO
EXEC uspJudgments @fullName = '''grein'', ''grien'''
Even if this did retrieve the correct results how would this be done with 3 parameters?
You may try using this to split the strings and obtain tables of strings. Then, to get all the combinations, you can use a full join of these two tables, and then do your select.
Here is the Table valued function I set up:
ALTER FUNCTION [dbo].[Split] (@sep char(1), @s varchar(8000))
RETURNS table
AS
RETURN (
    WITH splitter_cte AS (
        SELECT CHARINDEX(@sep, @s) as pos, 0 as lastPos
        UNION ALL
        SELECT CHARINDEX(@sep, @s, pos + 1), pos
        FROM splitter_cte
        WHERE pos > 0
    )
    SELECT SUBSTRING(@s, lastPos + 1,
                     case when pos = 0 then 80000
                     else pos - lastPos -1 end) as OutputValues
    FROM splitter_cte
)
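One way to combine the split function with the search, as a rough sketch: check each parameter with EXISTS and LIKE, so a row matches when combined_name contains at least one fragment from each list. This assumes the dbo.Split function above (with its OutputValues column) is in place and uses a hypothetical procedure name, uspJudgmentsSearch. It has not been run against the real table.

```sql
-- Hypothetical procedure; assumes dbo.Split(@sep, @s) from above exists.
CREATE PROCEDURE uspJudgmentsSearch
    @p1 varchar(8000),
    @p2 varchar(8000)
AS
SELECT d.*
FROM new_judgment_system.dbo.defendants_ALL AS d
WHERE EXISTS (SELECT 1
              FROM dbo.Split(',', @p1) AS a
              WHERE d.combined_name LIKE '%' + a.OutputValues + '%')
  AND EXISTS (SELECT 1
              FROM dbo.Split(',', @p2) AS b
              WHERE d.combined_name LIKE '%' + b.OutputValues + '%');
GO
EXEC uspJudgmentsSearch @p1 = 'grei,grie', @p2 = 'joh,jon';
```

A LIKE pattern with a leading wildcard cannot use an ordinary index seek, so on 1.3 million rows this will scan the table; treat it as a starting point. A third parameter could be added with another EXISTS clause of the same shape.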

help with TSQL IN statement with int

I am trying to create the following select statement in a stored proc
@dealerids nvarchar(256)
SELECT *
FROM INVOICES as I
WHERE convert(nvarchar(20), I.DealerID) in (@dealerids)
I.DealerID is an INT in the table, and the parameter for dealerids would be formatted such as
(8820, 8891, 8834)
When I run this with parameters provided I get no rows back. I know these dealerIDs should return rows, because if I query them individually I get back what I expect.
I think I am doing
WHERE convert(nvarchar(20), I.DealerID) in (@dealerids)
incorrectly. Can anyone point out what I am doing wrong here?
Use a table-valued parameter (new in SQL Server 2008). Set it up by creating the actual table parameter type:
CREATE TYPE IntTableType AS TABLE (ID INTEGER PRIMARY KEY)
Your procedure would then be:
Create Procedure up_TEST
    @Ids IntTableType READONLY
AS
SELECT *
FROM ATable a
WHERE a.Id IN (SELECT ID FROM @Ids)
RETURN 0
GO
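To call the procedure from T-SQL, a minimal sketch: declare a variable of the table type, fill it, and pass it in. It assumes IntTableType and up_TEST were created as shown above; the ID values are the ones from the question.

```sql
-- Assumes IntTableType and up_TEST were created as shown above.
DECLARE @Ids IntTableType;

INSERT INTO @Ids (ID)
VALUES (8820), (8891), (8834);

EXEC up_TEST @Ids = @Ids;
```

From client code such as ADO.NET, the same parameter would be passed as a structured type rather than built in T-SQL.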
If you can't use table-valued parameters, see "Arrays and Lists in SQL Server 2005 and Beyond, When Table Value Parameters Do Not Cut it" by Erland Sommarskog. There are many ways to split a string in SQL Server, and this article covers the pros and cons of just about every method. In general, you need to create a split function. This is how a split function can be used:
SELECT
*
FROM YourTable y
INNER JOIN dbo.yourSplitFunction(@Parameter) s ON y.ID=s.Value
I prefer the number table approach to split a string in TSQL but there are numerous ways to split strings in SQL Server, see the previous link, which explains the PROs and CONs of each.
For the Numbers Table method to work, you need to do this one time table setup, which will create a table Numbers that contains rows from 1 to 10,000:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
Once the Numbers table is set up, create this split function:
CREATE FUNCTION [dbo].[FN_ListToTable]
(
     @SplitOn char(1)      --REQUIRED, the character to split the @List string on
    ,@List    varchar(8000)--REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN
(
    ----------------
    --SINGLE QUERY-- --this will not return empty rows
    ----------------
    SELECT
        ListValue
    FROM (SELECT
              LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(@SplitOn, List2, number+1)-number - 1))) AS ListValue
          FROM (
                   SELECT @SplitOn + @List + @SplitOn AS List2
               ) AS dt
              INNER JOIN Numbers n ON n.Number < LEN(dt.List2)
          WHERE SUBSTRING(List2, number, 1) = @SplitOn
         ) dt2
    WHERE ListValue IS NOT NULL AND ListValue!=''
);
GO
You can now easily split a CSV string into a table and join on it:
Create Procedure up_TEST
    @Ids VARCHAR(MAX)
AS
SELECT * FROM ATable a
WHERE a.Id IN (SELECT ListValue FROM dbo.FN_ListToTable(',',@Ids))
You can't use @dealerids like that; you need to use dynamic SQL, like this:
@dealerids nvarchar(256)
EXEC('SELECT *
FROM INVOICES as I
WHERE convert(nvarchar(20), I.DealerID) in (' + @dealerids + ')')
The downside is that you open yourself up to SQL injection attacks unless you specifically control the data going into @dealerids.
There are better ways to handle this depending on your version of SQL Server, which are documented in this great article.
Split @dealerids into a table, then JOIN:
SELECT *
FROM INVOICES as I
JOIN
ufnSplit(@dealerids) S ON I.DealerID = S.ParsedIntDealerID
Assorted split functions here (I'd probably use a numbers table in this case for a small string).

How to refactor this sql query

I have a lengthy query here and am wondering whether it could be refactored.
Declare @A1 as int
Declare @A2 as int
...
Declare @A50 as int
SET @A1 =(Select id from table where code='ABC1')
SET @A2 =(Select id from table where code='ABC2')
...
SET @A50 =(Select id from table where code='ABC50')
Insert into tableB
Select
Case when @A1='somevalue' Then 'x' else 'y' End,
Case when @A2='somevalue' Then 'x' else 'y' End,
..
Case when @A50='somevalue' Then 'x' else 'y' End
From tableC inner join ......
So as you can see from above, there is quite some redundant code. But I can not think of a way to make it simpler.
Any help is appreciated.
If you need the variables assigned, you could pivot your table...
SELECT *
FROM
(
SELECT Code, Id
FROM Table
) t
PIVOT
(MAX(Id) FOR Code IN ([ABC1],[ABC2],[ABC3],[ABC50])) p /* List them all here */
;
...and then assign them accordingly.
SELECT @A1 = [ABC1], @A2 = [ABC2]
FROM
(
SELECT Code, Id
FROM Table
) t
PIVOT
(MAX(Id) FOR Code IN ([ABC1],[ABC2],[ABC3],[ABC50])) p /* List them all here */
;
But I doubt you actually need to assign them at all. I just can't really picture what you're trying to achieve.
Pivoting may help you, as you can still use the CASE statements.
Rob
Without taking the time to develop a full answer, I would start by trying:
select id from table where code in ('ABC1', ... ,'ABC50')
then pivot that, to get one row result set of columns ABC1 through ABC50 with ID values.
Join that row in the FROM.
If 'somevalue', 'x' and 'y' are constant for all fifty expressions, then start from:
select case id when 'somevalue' then 'x' else 'y' end as XY
from table
where code in ('ABC1', ... ,'ABC50')
I am not entirely sure from your example, but it looks like you should be able to do one of a few things.
Create a nice look up table that will tell you for a given value of the select statement what should be placed there. This would be much shorter and should be insanely fast.
Create a simple for loop in your code and generate a list of 50 small queries.
Use sub-selects or generate a list of selects with one round trip to retrieve your @A1-@A50 values and then generate the query with them already in place.
Jacob