SQL Server - How Do I Create Increments in a Query - sql-server-2008-r2

First off, I'm using SQL Server 2008 R2.
I am moving data from one source to another. In this particular case there is a field called SiteID. In the source it's not a required field, but in the destination it is. So my thought was, when the SiteID from the source is NULL, to create a SiteID "on the fly" during the query of the source data. Something like a combination of the state, plus the first 8 characters of a description field, plus a ten-digit incrementing number.
At first I thought it might be easy to use a combination of date/time plus nanoseconds, but it turns out that several records can be retrieved within the same nanosecond, leading to duplicate SiteIDs.
My second idea was to create a table containing an identity field, plus a function that would insert a record to increment the identity field and then return it (the function would also delete all records where the identity field is less than the latest, to save space). Unfortunately, after I got it written, when trying to CREATE the function I got an error saying that INSERTs are not allowed in functions.
I could (and did) convert it to a stored procedure, but stored procedures are not allowed in SELECT queries.
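Roughly, the stored-procedure version of that idea looks like the sketch below (the table and procedure names are made up for illustration):
CREATE TABLE dbo.SiteIDCounter (NextID BIGINT IDENTITY(1,1) PRIMARY KEY, Filler BIT NULL);
GO
CREATE PROCEDURE dbo.GetNextSiteID
    @NewID BIGINT OUTPUT
AS
BEGIN
    -- Each call burns one identity value and hands it back
    INSERT INTO dbo.SiteIDCounter (Filler) VALUES (NULL);
    SET @NewID = SCOPE_IDENTITY();
    -- Keep only the latest row so the table doesn't grow
    DELETE FROM dbo.SiteIDCounter WHERE NextID < @NewID;
END
GO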
So now I'm stuck.
Is there any way to accomplish what I'm trying to do?

This script may take time to execute depending on the amount of data present in the table, so execute it on a small sample dataset first.
DECLARE @TotalMissingSiteID INT = 0,
        @Counter INT = 0,
        @NewID BIGINT;
DECLARE @NewSiteIDs TABLE
(
    SiteID BIGINT -- Check the datatype
);

SELECT @TotalMissingSiteID = COUNT(*)
FROM SourceTable
WHERE SiteID IS NULL;

-- Generate one new, unused SiteID per missing value
WHILE (@Counter < @TotalMissingSiteID)
BEGIN
    WHILE (1 = 1)
    BEGIN
        SELECT @NewID = RAND() * 1000000000000000; -- Add your formula to generate new SiteIDs here
        -- Check whether the generated SiteID is already present in the table
        IF (ISNULL((SELECT 1
                    FROM SourceTable
                    WHERE SiteID = @NewID), 0) = 0)
            BREAK;
    END
    INSERT INTO @NewSiteIDs (SiteID)
    VALUES (@NewID);
    SET @Counter = @Counter + 1;
END

-- Pair each NULL-SiteID source row with one of the generated IDs
INSERT INTO DestinationTable (SiteID) -- Add the extra columns here
SELECT ISNULL(MainTable.SiteID, NewIDs.SiteID) SiteID
FROM (
       SELECT SiteID, -- Add the extra columns here
              ROW_NUMBER() OVER (PARTITION BY SiteID
                                 ORDER BY SiteID) SerialNumber
       FROM SourceTable
     ) MainTable
LEFT JOIN (
       SELECT SiteID,
              ROW_NUMBER() OVER (ORDER BY SiteID) SerialNumber
       FROM @NewSiteIDs
     ) NewIDs
    ON MainTable.SiteID IS NULL
   AND MainTable.SerialNumber = NewIDs.SerialNumber


Is there a more efficient / elegant way to write this code I have?

I'm wondering if anybody can help me out with any or all of this code below. I've made it work, but it seems inefficient to me and is probably quite a bit slower than optimal.
Some basic background on the necessity of this code in the first place:
I have a table of shipping records that does not include the corresponding invoice number. I've looked all through the tables and I continue to do so. In fact, only this morning I discovered that if a packing slip has been generated, I can link the shipping table to the packing slip table via that packing slip ID and grab the invoice number from there. Absent that link, however, I'm forced to guess. In most instances that's not terribly difficult, because the invoice table has number, line and release that can match up. But when there are multiple shipments for a number, line and release (for instance, when a line is partially shipped), there can be multiple answers, only one of which is correct. I am partially helped by the presence of a column in the shipping table that states what the date sequence is for that number, line and release, but there are still circumstances where the process I use for "guessing" can be somewhat ambiguous.
What my procedure does is this. First, it creates a table of data that includes the invoice number if there was a pack slip to link it through.
Next, it dumps all of that data into a second table, this time using (only where the invoice was NULL in the first table) a "guess" about the invoice number, based on partitioning all the shipping records by number, line, release, date sequence and date, then comparing that to the same partitioning of the invoice table and trying to line everything up by date.
Finally, it parses through that table and finds any last nulls and essentially matches them up with the first record of any invoice for that number, line and release.
Both guesses have added characters to show that they are, in fact, guesses.
IF OBJECT_ID('tempdb..#cosTable') IS NOT NULL
    DROP TABLE #cosTable

DECLARE @cosTable2 TABLE (
    ID INT IDENTITY
    ,co_num CoNumType
    ,co_line CoLineType
    ,co_release CoReleaseType
    ,date_seq DateSeqType
    ,ship_date DateType
    ,inv_num NVARCHAR(14)
)

DECLARE
    @co_num_ck CoNumType
    ,@co_line_ck CoLineType
    ,@co_release_ck CoReleaseType
DECLARE @Counter1 INT = 0

-- Step 1: shipping records plus the invoice number where a packing slip links them
SELECT cos.co_num, cos.co_line, cos.co_release, cos.date_seq, cos.ship_date, cos.qty_invoiced, pck.inv_num
INTO #cosTable
FROM co_ship cos
LEFT JOIN pckitem pck
    ON cos.pack_num = pck.pack_num
    AND cos.co_num = pck.co_num
    AND cos.co_line = pck.co_line
    AND cos.co_release = pck.co_release

-- Step 2: where no packing slip exists, "guess" the invoice by lining up row numbers
;WITH cos_Order
AS (
    SELECT co_num, co_line, co_release, qty_invoiced, date_seq, ship_date,
           ROW_NUMBER() OVER (PARTITION BY co_num, co_line, co_release ORDER BY ship_date) AS cosrow
    FROM co_ship
    WHERE qty_invoiced > 0
),
invi_Order
AS (
    SELECT inv_num, co_num, co_line, co_release,
           ROW_NUMBER() OVER (PARTITION BY co_num, co_line, co_release ORDER BY RecordDate) AS invirow
    FROM inv_item
    WHERE qty_invoiced > 0
),
cos_invi
AS (
    SELECT cosO.*, inviO.inv_num
    FROM cos_Order cosO
    LEFT JOIN invi_Order inviO
        ON cosO.co_num = inviO.co_num AND cosO.co_line = inviO.co_line AND cosO.cosrow = inviO.invirow
)
INSERT INTO @cosTable2
SELECT cosT.co_num, cosT.co_line, cosT.co_release, cosT.date_seq, cosT.ship_date,
       COALESCE(cosT.inv_num, '*' + cosi.inv_num) AS inv_num
FROM #cosTable cosT
LEFT JOIN cos_invi cosi
    ON cosT.co_num = cosi.co_num
    AND cosT.co_line = cosi.co_line
    AND cosT.co_release = cosi.co_release
    AND cosT.date_seq = cosi.date_seq
    AND cosT.ship_date = cosi.ship_date

-- Step 3: for any remaining NULLs, fall back to the first invoice for that number/line/release
WHILE @Counter1 < (SELECT MAX(ID) FROM @cosTable2) BEGIN
    SET @Counter1 += 1
    SET @co_num_ck = (SELECT co_num FROM @cosTable2 WHERE ID = @Counter1)
    SET @co_line_ck = (SELECT co_line FROM @cosTable2 WHERE ID = @Counter1)
    SET @co_release_ck = (SELECT co_release FROM @cosTable2 WHERE ID = @Counter1)
    IF EXISTS (SELECT * FROM @cosTable2 WHERE ID = @Counter1 AND inv_num IS NULL)
        UPDATE @cosTable2
        SET inv_num = '^' + (SELECT TOP 1 inv_num FROM @cosTable2 WHERE
            @co_num_ck = co_num AND
            @co_line_ck = co_line AND
            @co_release_ck = co_release)
        WHERE ID = @Counter1 AND inv_num IS NULL
END

SELECT * FROM @cosTable2
ORDER BY co_num, co_line, co_release, date_seq, ship_date
You're in a bad spot - as @craig.white and @HLGEM suggest, you've inherited something without sufficient constraints to make the data correct or safe... and now you have to "synthesize" it. I get that guesses are the best you can do, and you can at least make your guesses reasonable performance-wise.
After that, you should squeal loudly to get some time to fix the db - to apply the constraints needed to prevent further crapification of the data.
Performance-wise, the while loop is a disaster. You'd be better off replacing that whole mess with a single update statement...something like:
update c0
set inv_num = '^' + c1.inv_num
from @cosTable2 c0
left outer join
(
    select
        co_num,
        co_line,
        co_release,
        inv_num
    from @cosTable2
    where inv_num is not null
    group by
        co_num,
        co_line,
        co_release,
        inv_num
) as c1
    on  c0.co_num = c1.co_num
    and c0.co_line = c1.co_line
    and c0.co_release = c1.co_release
where c0.inv_num is null
...which does the same thing the loop does, only in a single statement.
It seems to me that you are trying very hard to solve a problem that should not exist. What you describe is an unfortunately common situation: a process has grown organically, without intent or specific direction, as the business has grown, and that has made automating data extraction nearly impossible. You very much need a set of policies and procedures. For a (very crude and simple) example:
1: An order must exist before a packing slip can be generated.
2: A packing slip must exist before an invoice can be generated.
3: An invoice is created using data from the packing slip and the order (what was requested, what was picked, what do we bill).
Again, this is a crude example just to illustrate the idea.
All of the data MUST be entered at the proper time, or someone has not done their job.
It is not typically within the IT department's skill set to accurately and consistently provide management with good data when such data does not exist.

In DB2, perform an update based on an insert for a large number of rows

In DB2, I need to do an insert, then, using results/data from that insert, update a related table. I need to do it on a million-plus records and would prefer not to lock the entire database. So: 1) how do I 'couple' the insert and update statements? 2) how can I ensure the integrity of the transaction (without locking the whole shebang)?
Some pseudo-code should help clarify:
STEP 1
insert into table1 (neededId, id) select DYNAMICVALUE, id from tableX where needed value is null
STEP 2
update table2 set neededId = (GET THE DYNAMIC VALUE JUST INSERTED) where id = (THE ID JUST INSERTED)
Note: in table1, the ID column is not unique, so I can't just filter on that to find the new DYNAMICVALUE.
This should make it clearer (FTR, this works, but I don't like it, because I'd have to lock the tables to maintain integrity. It would be great if I could run these statements together and let the update refer to the newAddressNumber value.):
/**** RUNNING TOP INSERT FIRST ****/
--insert a new address for each order that does not have an address id
insert into addresses
    (customerId, addressNumber, address)
select
    cust.Id,
    --get next available addressNumber
    ifNull((select max(addy2.addressNumber) from addresses addy2 where addy2.customerId = cust.id), 0) + 1 as newAddressNumber,
    cust.address
from customers cust
where exists (
    --find all customers with at least 1 order where addressNumber is null
    select 1 from orders ord
    where 1=1
    and ord.customerId = cust.id
    and ord.addressNumber is null
)

/**** RUNNING THIS UPDATE SECOND ****/
update orders ord1
set addressNumber = (
    select max(addressNumber) from addresses addy3
    where addy3.customerId = ord1.customerId
)
where 1=1
and ord1.addressNumber is null
The IDENTITY_VAL_LOCAL function is a non-deterministic function that returns the most recently assigned value for an identity column, where the assignment occurred as a result of a single INSERT statement using a VALUES clause.
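A minimal sketch of how IDENTITY_VAL_LOCAL can be used, assuming a hypothetical table with a generated identity column:
-- Hypothetical table; the identity column stands in for your generated value
CREATE TABLE addresses_demo (
    addressId INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    address   VARCHAR(100)
);
-- Single-row INSERT using a VALUES clause
INSERT INTO addresses_demo (address) VALUES ('123 Main St');
-- Returns the identity value assigned by the INSERT above
SELECT IDENTITY_VAL_LOCAL() AS newAddressId FROM SYSIBM.SYSDUMMY1;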

Need to copy all rows with C_PROV_TYPE = '014' and C_SPECIALTY = '300' and insert back 3 rows with the same data and max sequence number + 1, i.e. = 4, 5, 6

I need 3 rows for each one of the two valid rows, with output like below.
The primary key is C_PROCEDURE + C_PROV_TYPE + SPEC_SEQ_NO!
You could try something like this:
INSERT INTO YourTable (
    C_PROCEDURE,
    C_PROV_TYPE,
    I_PT_SPEC_SEQ_NO,
    C_SPECIALTY
)
SELECT
    s.C_PROCEDURE,
    s.C_PROV_TYPE,
    s.MaxSeq + ROW_NUMBER() OVER (
        PARTITION BY s.C_PROCEDURE, s.C_PROV_TYPE
        ORDER BY v.rn, s.I_PT_SPEC_SEQ_NO),
    s.C_SPECIALTY + v.rn
FROM (
    SELECT
        *,
        MAX(I_PT_SPEC_SEQ_NO) OVER (
            PARTITION BY C_PROCEDURE, C_PROV_TYPE
        ) AS MaxSeq
    FROM YourTable
) s
CROSS JOIN (
    VALUES (1), (2), (3)
) v (rn)
WHERE s.C_PROV_TYPE = '014'
  AND s.C_SPECIALTY = '300'
;
Basically, the subquery returns all the YourTable rows supplied with the maximum values of I_PT_SPEC_SEQ_NO for every partition of (C_PROCEDURE, C_PROV_TYPE) using the windowing MAX() function (MAX(...) OVER (...)).
The resulting set of that subquery is then cross-joined to an inline 3-row table (which produces three copies of every row returned) and filtered by the specified values of C_PROV_TYPE and C_SPECIALTY.
New data rows pull C_PROCEDURE and C_PROV_TYPE directly from the subquery. The new C_SPECIALTY values are produced using those from the subquery and the rn values of the inline table. The new sequence numbers are generated with the help of the ROW_NUMBER() function and the maximum sequence numbers returned by the subquery.
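As an isolated illustration of how the cross join to the inline VALUES table multiplies rows (the table and column names below are made up):
-- Each row of t is paired with every rn value, so two rows become six
SELECT t.name, v.rn
FROM (VALUES ('A'), ('B')) AS t (name)
CROSS JOIN (VALUES (1), (2), (3)) AS v (rn);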
As I didn't have access to a working installation of DB2, I was testing my script in SQL Server 2008, trying to stick to features that I understood DB2 supported as well as SQL Server. This SQL Fiddle demo also uses a SQL Server 2008 instance to demonstrate how the query works.

Help with T-SQL IN statement with int

I am trying to create the following SELECT statement in a stored proc:
@dealerids nvarchar(256)
SELECT *
FROM INVOICES as I
WHERE convert(nvarchar(20), I.DealerID) in (@dealerids)
I.DealerID is an INT in the table, and the parameter for dealerids would be formatted such as
(8820, 8891, 8834)
When I run this with the parameter provided I get no rows back. I know these dealer IDs should return rows, because if I query them individually I get back what I expect.
I think I am doing
WHERE convert(nvarchar(20), I.DealerID) in (@dealerids)
incorrectly. Can anyone point out what I am doing wrong here?
Use a table-valued parameter (new in SQL Server 2008). Set it up by creating the actual table parameter type:
CREATE TYPE IntTableType AS TABLE (ID INTEGER PRIMARY KEY)
Your procedure would then be:
Create Procedure up_TEST
    @Ids IntTableType READONLY
AS
SELECT *
FROM ATable a
WHERE a.Id IN (SELECT ID FROM @Ids)
RETURN 0
GO
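A call from T-SQL would then look roughly like this (using the sample dealer IDs from the question as the values):
DECLARE @Ids IntTableType;
INSERT INTO @Ids (ID) VALUES (8820), (8891), (8834);
EXEC up_TEST @Ids;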
If you can't use table-valued parameters, see "Arrays and Lists in SQL Server 2005 and Beyond, When Table Value Parameters Do Not Cut it" by Erland Sommarskog; there are many ways to split a string in SQL Server, and that article covers the pros and cons of just about every method. In general, you need to create a split function. This is how a split function can be used:
SELECT *
FROM YourTable y
INNER JOIN dbo.yourSplitFunction(@Parameter) s ON y.ID = s.Value
I prefer the numbers-table approach to splitting a string in T-SQL, but there are numerous ways to split strings in SQL Server; see the previous link, which explains the pros and cons of each.
For the numbers-table method to work, you need to do this one-time table setup, which will create a table Numbers containing the integers 1 to 10,000:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
Once the Numbers table is set up, create this split function:
CREATE FUNCTION [dbo].[FN_ListToTable]
(
    @SplitOn char(1)      --REQUIRED, the character to split the @List string on
    ,@List varchar(8000)  --REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN
(
    ----------------
    --SINGLE QUERY-- --this will not return empty rows
    ----------------
    SELECT
        ListValue
    FROM (SELECT
              LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(@SplitOn, List2, number+1) - number - 1))) AS ListValue
          FROM (
                   SELECT @SplitOn + @List + @SplitOn AS List2
               ) AS dt
               INNER JOIN Numbers n ON n.Number < LEN(dt.List2)
          WHERE SUBSTRING(List2, number, 1) = @SplitOn
         ) dt2
    WHERE ListValue IS NOT NULL AND ListValue != ''
);
GO
You can now easily split a CSV string into a table and join on it:
Create Procedure up_TEST
    @Ids VARCHAR(MAX)
AS
SELECT * FROM ATable a
WHERE a.Id IN (SELECT ListValue FROM dbo.FN_ListToTable(',', @Ids))
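Calling it with the comma-separated list from the question would then look something like:
EXEC up_TEST @Ids = '8820,8891,8834';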
You can't use @dealerids like that; you need to use dynamic SQL, like this:
@dealerids nvarchar(256)
EXEC('SELECT *
      FROM INVOICES as I
      WHERE convert(nvarchar(20), I.DealerID) in (' + @dealerids + ')')
The downside is that you open yourself up to SQL injection attacks unless you specifically control the data going into @dealerids.
There are better ways to handle this depending on your version of SQL Server, which are documented in this great article.
Split @dealerids into a table, then JOIN:
SELECT *
FROM INVOICES as I
JOIN dbo.ufnSplit(@dealerids) S ON I.DealerID = S.ParsedIntDealerID
Assorted split functions here (I'd probably use a numbers table in this case for a small string).

Splitting comma delimited cell data

I have a spreadsheet with multiple columns, one of which is an owner_id column. The problem is that this column contains a comma-delimited list of owner IDs and not just a single one.
I've imported this spreadsheet into my SQL Server database (2008), have completed other importing tasks, and now have a parcel_id column as a result of this process.
I need to create an entry in my parcelOwners table for each parcelID/ownerID pair, but I'm not sure how to go about this with the owner IDs being in a comma-delimited list.
My tables look like this:
ImportData
=================
owner_id varchar,
parcelID int
sample row (owner_id = '13782, 21431', parcelID = 319)
ParcelOwners
=================
ownerID int,
parcelID int
The sample row from the ImportData table should become:
ownerID = 13782, parcelID = 319
ownerID = 21431, parcelID = 319
Is this a common situation for anybody and if so, how do you go about getting around this?
The function below will split your comma-separated column into a table. You will then need to iterate through that result and insert one row into your parcelOwners table using the data from your single column. To get this to work you will need an outer loop to iterate through the parcelOwners table and an inner loop to iterate through the @temptable result for each row. Also, don't forget: if you come to a row in your outer loop with no commas in the owner_id column, you won't want to do anything.
CREATE FUNCTION dbo.Split(@String varchar(8000), @Delimiter char(1))
returns @temptable TABLE (items varchar(8000))
as
begin
    declare @idx int
    declare @slice varchar(8000)

    select @idx = 1
    if len(@String) < 1 or @String is null return

    while @idx != 0
    begin
        set @idx = charindex(@Delimiter, @String)
        if @idx != 0
            set @slice = left(@String, @idx - 1)
        else
            set @slice = @String

        if (len(@slice) > 0)
            insert into @temptable(Items) values(@slice)

        set @String = right(@String, len(@String) - @idx)
        if len(@String) = 0 break
    end
    return
end
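For example, running the function against the sample owner_id value from the question returns one row per item (note the leading space on the second item, which a later LTRIM or CONVERT can absorb):
SELECT items FROM dbo.Split('13782, 21431', ',');
-- items
-- '13782'
-- ' 21431'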
You can do this easily leveraging SQL Server's XML functions:
WITH xmlData (xml_owner_id, parcelID) AS (
    /* make into xml */
    SELECT cast('<x>' + replace(owner_id, ',', '</x><x>') + '</x>' as XML) AS xml_owner_id, parcelID
    FROM ImportData
)
SELECT x.value('.', 'int') AS owner_id, parcelID /* split up */
FROM xmlData
CROSS APPLY xmlData.xml_owner_id.nodes('//x') AS func(x)
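To load the ParcelOwners table directly, the same query can feed an INSERT (a sketch, assuming the table definitions given in the question):
WITH xmlData (xml_owner_id, parcelID) AS (
    SELECT cast('<x>' + replace(owner_id, ',', '</x><x>') + '</x>' as XML), parcelID
    FROM ImportData
)
INSERT INTO ParcelOwners (ownerID, parcelID)
SELECT x.value('.', 'int'), parcelID
FROM xmlData
CROSS APPLY xmlData.xml_owner_id.nodes('//x') AS func(x);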
(In response to @senloe's question about how to use the function supplied by @RandomBen)
This answer to a previous question shows how to use OUTER APPLY to apply a function to every row in a table. In your case, and assuming you have already run @RandomBen's code to create the dbo.Split function, the syntax would look something like this:
INSERT INTO ParcelOwners (ownerId, parcelID)
SELECT CONVERT(int, Results.items), ImportData.parcelID
FROM ImportData
OUTER APPLY dbo.Split(ImportData.owner_id, ',') AS Results
(I don't have access to SQL Server right now, so I haven't tried it yet. You can run it without the first line, i.e. just from SELECT onwards, to see what output it is going to generate before you actually do the INSERT).