Set repeating group IDs until the first record repeats (bulk load csv file) - tsql

I have a file that I imported via BULK INSERT and I want to assign group IDs/sequences to its rows.
I would like to keep assigning the same ID until the first character of the first record repeats; in this example that character is "A", so the ID should increment each time a row starting with "A" appears.
The challenge is how to set the IDs like this example:
ID  data
1   A000abcefd
1   E00asoaskdaok
1   C000dasdasok
2   A100abcasds
2   E100aandas
2   C100adsokdas

Here is one way to do it, but given the limited info you provided I will make the following assumptions:
- The data in your table has some order to it. This obviously will not work if that is not the case. I used an ID column; use whatever ordering column you have.
- The first row in the table contains the character you are looking for.
CREATE TABLE #tmp(ID int, [data] varchar(20))
INSERT INTO #tmp
VALUES
(1, 'A000abcefd'),
(2, 'E00asoaskdaok'),
(3, 'C000dasdasok'),
(4, 'A100abcasds'),
(5, 'E100aandas'),
(6, 'C100adsokdas')
DECLARE @CHAR varchar(1)
SELECT @CHAR = (SELECT TOP 1 SUBSTRING([data],1,1) FROM #tmp ORDER BY ID)
SELECT SUM(CASE WHEN SUBSTRING([data],1,1) = @CHAR THEN 1 ELSE 0 END)
OVER(ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS SeqNum
,[data]
FROM #tmp
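Note that the windowed SUM with a ROWS frame requires SQL Server 2012 or later. If you are stuck on an older version, the same running count can be sketched with a correlated subquery against the #tmp table above (self-contained, declares its own variable):

```sql
-- Sketch: same group numbering without 2012+ window frames.
-- Counts how many rows starting with the marker character exist
-- at or before the current row.
DECLARE @FirstChar varchar(1);
SELECT TOP 1 @FirstChar = SUBSTRING([data],1,1) FROM #tmp ORDER BY ID;

SELECT (SELECT COUNT(*)
        FROM #tmp t2
        WHERE SUBSTRING(t2.[data],1,1) = @FirstChar
          AND t2.ID <= t1.ID) AS SeqNum
      ,t1.[data]
FROM #tmp t1
ORDER BY t1.ID;
```

The correlated subquery is O(n²) on large inputs, so prefer the window function where the server version allows it.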

Related

How to use a declare statement to update a table

I have this Declare Statement
declare @ReferralLevelData table([Type of Contact] varchar(10));
insert into @ReferralLevelData values ('f2f'),('nf2f'),('Travel'),('f2f'),('nf2f'),('Travel'),('f2f'),('nf2f'),('Travel');
select (row_number() over (order by [Type of Contact]) % 3) +1 as [Referral ID]
,[Type of Contact]
from @ReferralLevelData
order by [Referral ID]
,[Type of Contact];
It does not insert into the table, so I feel this is not working as expected, i.e. it doesn't modify the table.
If it did work, I was hoping to modify the statement to make it an UPDATE.
At the moment the query just prints this result:
1 f2f
1 nf2f
1 Travel
2 f2f
2 nf2f
2 Travel
3 f2f
3 nf2f
3 Travel
EDIT:
I want to update the table to enter recurring data in groups of three.
I have a table of data; it is duplicated twice in the same table to make three sets.
Its "ReferenceID" is the primary key. I want to group the three rows with the same ReferenceID and inject the three values "f2f", "nf2f" and "Travel" into the column called "Type", in any order, but ensure that each ReferenceID row only has one of those values.
Do you mean the following?
declare @ReferralLevelData table(
[Referral ID] int,
[Type of Contact] varchar(10)
);
insert into @ReferralLevelData([Referral ID],[Type of Contact])
select
(row_number() over (order by [Type of Contact]) % 3) +1 as [Referral ID]
,[Type of Contact]
from
(
values ('f2f'),('nf2f'),('Travel'),('f2f'),('nf2f'),('Travel'),('f2f'),('nf2f'),('Travel')
) v([Type of Contact]);
If it suits you then you also can use the next query to generate data:
select r.[Referral ID],ct.[Type of Contact]
from
(
values ('f2f'),('nf2f'),('Travel')
) ct([Type of Contact])
cross join
(
values (1),(2),(3)
) r([Referral ID]);
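Since the edit asks about updating an existing table rather than generating rows, an UPDATE through a CTE may be closer to what you want. This is only a sketch; the table and column names (Referrals, RowID, ReferenceID, [Type]) are assumptions based on your description of three rows per ReferenceID:

```sql
-- Sketch: assign 'f2f' / 'nf2f' / 'Travel' across the three rows of each
-- ReferenceID. Assumes a hypothetical table Referrals(RowID, ReferenceID, [Type])
-- where every ReferenceID appears exactly three times.
;WITH numbered AS
(
    SELECT [Type],
           ROW_NUMBER() OVER (PARTITION BY ReferenceID ORDER BY RowID) AS rn
    FROM Referrals
)
UPDATE numbered
SET [Type] = CASE rn WHEN 1 THEN 'f2f'
                     WHEN 2 THEN 'nf2f'
                     ELSE 'Travel' END;
```

Updating through a CTE that contains ROW_NUMBER() is legal in T-SQL and writes through to the base table, which is what makes this pattern work.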

Best way to avoid duplicates in table?

I've been given a task that requires writing a script to mass-change items in a table (ProductArea):
ProductID int
SalesareaID int
One ProductID can only exist once in each SalesareaID so there can't be any duplicates in this table. But one ProductID can be sold in multiple SalesareaID.
So an example would look something like:
ProductID  SalesareaID
1          1
1          2
1          3
2          2
3          1
Now, some areas have merged. So, if I try to run a straightforward UPDATE to fix this like:
UPDATE ProductArea SET SalesareaID = 4 where SalesareaID IN (2, 3)
it will find (1, 2) and change that to (1, 4). Then it will find (1, 3) and try to change that to (1, 4). But that already exists, so it will crash with a "Cannot insert duplicate key..." error.
Is there a best/recommended way to tell my UPDATE to only update if the resulting (ProductID, SalesareaID) doesn't already exist?
This should work. It uses a window function to remove the rows that would become duplicates before running the update:
declare @T table (prodID int, salesID int, primary key (prodID, salesID));
insert into @T values
(1, 1)
, (1, 2)
, (1, 3)
, (2, 2)
, (3, 1);
with cte as
( select t.*
, row_number() over (partition by t.prodID order by t.salesID) as rn
from @T t
where t.salesID in (2, 3)
)
delete cte where rn > 1;
update @T set salesID = 4 where salesID in (2, 3);
select * from @T;
If you are creating a new merged region from existing regions then I think the easiest thing to do would be to treat the merge as two separate operations.
First you insert entries for the new area based on the existing areas.
INSERT INTO ProductArea (ProductID, SalesareaID)
SELECT DISTINCT ProductID, 4 FROM ProductArea
WHERE SalesareaID IN (2, 3)
Then you remove the entries for the existing areas.
DELETE FROM ProductArea WHERE SalesareaID IN (2, 3)
The SalesareaID of 4 would need to be replaced by the id of the new Salesarea. The 2 and 3 would also need to be replaced by the ids of the areas you are merging to create the new Salesarea.
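A third option, kept as a sketch, is to do the UPDATE itself defensively: update only one row per product (and only when no row for area 4 exists yet), then delete whatever is left in the merged areas. The guard condition is the only non-obvious part:

```sql
-- Sketch: update at most one (ProductID, 2-or-3) row per product to area 4,
-- skipping products that already have an area-4 row, then delete the rest.
-- Uses the real ProductArea table from the question.
UPDATE pa SET SalesareaID = 4
FROM ProductArea pa
WHERE pa.SalesareaID IN (2, 3)
  AND NOT EXISTS (SELECT 1 FROM ProductArea x
                  WHERE x.ProductID = pa.ProductID
                    AND (x.SalesareaID = 4
                         OR (x.SalesareaID IN (2, 3)
                             AND x.SalesareaID < pa.SalesareaID)));

DELETE FROM ProductArea WHERE SalesareaID IN (2, 3);
```

The `x.SalesareaID < pa.SalesareaID` clause is what prevents a product holding both area 2 and area 3 from producing two (ProductID, 4) rows in the same statement, which would violate the unique constraint when the statement's changes are checked.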

GROUP BY clause sees all VARCHAR fields as different

I have witnessed some strange behaviour while trying to GROUP BY a VARCHAR field.
Take the following example, where I try to spot customers that have changed name at least once in the past.
CREATE TABLE #CustomersHistory
(
Id INT IDENTITY(1,1),
CustomerId INT,
Name VARCHAR(200)
)
INSERT INTO #CustomersHistory VALUES (12, 'AAA')
INSERT INTO #CustomersHistory VALUES (12, 'AAA')
INSERT INTO #CustomersHistory VALUES (12, 'BBB')
INSERT INTO #CustomersHistory VALUES (44, '444')
SELECT ch.CustomerId, count(ch.Name) AS cnt
FROM #CustomersHistory ch
GROUP BY ch.CustomerId HAVING count(ch.Name) != 1
Which oddly yields (as if 'AAA' from the first INSERT were different from the second one):
CustomerId  cnt   // (I was expecting)
12          3     // 2
44          1     // 1
Is this behaviour specific to T-SQL?
Why does it behave in this rather counter-intuitive way?
How is it customary to overcome this limitation?
Note: This question is very similar to GROUP BY problem with varchar, where I didn't find the answer to Why
Side Note: Is it good practice to use HAVING count(ch.Name) != 1 instead of HAVING count(ch.Name) > 1 ?
The COUNT() operator will count all rows regardless of value. I think you might want to use a COUNT(DISTINCT ch.Name) which will only count unique names.
SELECT ch.CustomerId, count(DISTINCT ch.Name) AS cnt
FROM #CustomersHistory ch
GROUP BY ch.CustomerId HAVING count(DISTINCT ch.Name) > 1
For more information, take a look at the COUNT() article in Books Online.
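To see the two counts side by side, here is a small sketch against the #CustomersHistory table created in the question:

```sql
-- COUNT(Name) counts every non-NULL row in the group;
-- COUNT(DISTINCT Name) counts unique values only.
-- For CustomerId 12 the first is 3 and the second is 2.
SELECT ch.CustomerId,
       COUNT(ch.Name)          AS totalRows,
       COUNT(DISTINCT ch.Name) AS distinctNames
FROM #CustomersHistory ch
GROUP BY ch.CustomerId;
```

This also answers the "why": GROUP BY was never treating equal strings as different; the plain COUNT was simply counting rows, not distinct values.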

Find duplicate row "details" in table

OrderId  OrderCode  Description
-------------------------------
1        Z123       Stuff
2        ABC999     Things
3        Z123       Stuff
I have duplicates in a table like the above. I'm trying to get a report of which Orders are duplicates, and what Order they are duplicates of, so I can figure out how they got into the database.
So ideally I'd like to get an output something like;
OrderId  IsDuplicatedBy
-----------------------
1        3
3        1
I can't work out how to code this in SQL.
You can use the same table twice in one query and join on the fields you need to check against. The T1.OrderID <> T2.OrderID condition is needed so a row is not reported as a duplicate of itself.
declare @T table (OrderID int, OrderCode varchar(10), Description varchar(50))
insert into @T values
(1, 'Z123', 'Stuff'),
(2, 'ABC999', 'Things'),
(3, 'Z123', 'Stuff')
select
T1.OrderID,
T2.OrderID as IsDuplicatedBy
from @T as T1
inner join @T as T2
on T1.OrderCode = T2.OrderCode and
T1.Description = T2.Description and
T1.OrderID <> T2.OrderID
Result:
OrderID  IsDuplicatedBy
1        3
3        1
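If you only need to flag the duplicated orders (rather than pair each one with its twin), a windowed COUNT is a compact alternative. A sketch, where "Orders" is a placeholder for your real table name:

```sql
-- Sketch: list every order whose (OrderCode, Description) pair occurs
-- more than once. "Orders" is a hypothetical table name.
SELECT OrderID, OrderCode, Description
FROM (
    SELECT o.*,
           COUNT(*) OVER (PARTITION BY OrderCode, Description) AS cnt
    FROM Orders o
) d
WHERE d.cnt > 1;
```

Unlike the self-join, this scales to groups of three or more duplicates without producing a row for every pairing.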

tsql math across multiple dates in a table

I have a #variabletable simply defined as EOMDate(datetime), DandA(float), Coupon(float), EarnedIncome(float)
04/30/2008, 20187.5,17812.5,NULL
05/31/2008, 24640.63, 22265.63, NULL
06/30/2008, 2375, 26718.75,NULL
What I am trying to do is, after the table is populated, go back and calculate the EarnedIncome column.
The formula is DandA for the current month minus DandA for the previous month, plus Coupon.
Where I am having trouble is how to do the UPDATE. For 6/30 the value should be 4453.12, i.e. (2375 - 24640.63) + 26718.75.
I'll gladly take a clubbing over the head to get this resolved. Thanks. Also, this is running under MS SQL 2005, so any CTE/ROW_NUMBER-type solution can be used if possible.
You would need to use a subquery like this:
UPDATE v1
SET EarnedIncome = v1.DandA
    - (SELECT v2.DandA FROM #variabletable v2
       WHERE dbo.GetMonthOnly(v2.EOMDate) = dbo.GetMonthOnly(DATEADD(mm, -1, v1.EOMDate)))
    + v1.Coupon
FROM #variabletable v1
And I was making use of this helper function
DROP FUNCTION GetMonthOnly
GO
CREATE FUNCTION GetMonthOnly
(
    @InputDate DATETIME
)
RETURNS DATETIME
AS
BEGIN
    RETURN CAST(CAST(YEAR(@InputDate) AS VARCHAR(4)) + '/' +
                CAST(MONTH(@InputDate) AS VARCHAR(2)) + '/01' AS DATETIME)
END
GO
There's definitely quite a few ways to do this. You'll find pros and cons depending on how large your data set is, and other factors.
Here's my recommendation...
Declare @table as table
(
EOMDate DateTime,
DandA float,
Coupon Float,
EarnedIncome Float
)
Insert into @table Values('04/30/2008', 20187.5,17812.5,NULL)
Insert into @table Values('05/31/2008', 24640.63, 22265.63, NULL)
Insert into @table Values('06/30/2008', 2375, 26718.75,NULL)
--If we know that EOMDate will only contain one entry per month, and there's *always* one entry a month...
Update @table Set
EarnedIncome=DandA-
(Select top 1 DandA
from @table t2
where t2.EOMDate<T1.EOMDate
order by EOMDate Desc)+Coupon
From @table T1
Select * from @table
--If there's a chance that there could be more per month, or we only want the values from the previous month (do nothing if it doesn't exist)
Update @table Set
EarnedIncome=DandA-(
Select top 1 DandA
From @table T2
Where DateDiff(month, T1.EOMDate, T2.EOMDate)=-1
Order by EOMDate Desc)+Coupon
From @table T1
Select * from @table
--Leave the null, it's good for the data (since technically you cannot calculate it without a prior month).
I like the second method best because it will only calculate if there exists a record for the preceding month.
(add the following to the above script to see the difference)
--Add one for August
Insert into @table Values('08/30/2008', 2242, 22138.62,NULL)
Update @table Set
EarnedIncome=DandA-(
Select top 1 DandA
From @table T2
Where DateDiff(month, T1.EOMDate, T2.EOMDate)=-1
Order by EOMDate Desc
)+Coupon
From @table T1
--August is NULL because there's no July
Select * from @table
It's all a matter of exactly what you want: use the record directly preceding the current record (regardless of date), or only use the record that is a month before the current record.
Sorry about the format... Stackoverflow.com's answer editor and I do not play nice together.
:D
You can use a subquery to perform the calculation; the only problem is what you do with the first month, because there is no previous DandA value. Here I've set it to 0 using IsNull. The query looks like:
Update MyTable
Set EarnedIncome = DandA + Coupon - IsNull((Select Top 1 DandA
From MyTable2
Where MyTable.EOMDate > MyTable2.EOMDate
Order by MyTable2.EOMDate desc), 0)
This also assumes that you only have one record per month in each table, and that there are't any gaps between months.
Another alternative is to calculate the running total when you are inserting your data, and have a constraint guarantee that your running total is correct:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx
There may be a way to do this in a single statement, but in cases like this, I'd be inclined to set up a cursor to walk through each row, computing the new EarnedIncome field for that row, update the row, and then move to the next row.
Ex:
DECLARE @EOMDateVal DATETIME
DECLARE @EarnedIncomeVal FLOAT
DECLARE updCursor CURSOR FOR
SELECT EOMDate FROM #variabletable
OPEN updCursor
FETCH NEXT FROM updCursor INTO @EOMDateVal
WHILE @@FETCH_STATUS = 0
BEGIN
-- Compute @EarnedIncomeVal for this row here.
-- This also gives you a chance to catch data integrity problems
-- that would cause you to fail the whole batch if you compute
-- everything in a subquery.
UPDATE #variabletable SET EarnedIncome = @EarnedIncomeVal
WHERE EOMDate = @EOMDateVal
FETCH NEXT FROM updCursor INTO @EOMDateVal
END
CLOSE updCursor
DEALLOCATE updCursor
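For completeness: on SQL Server 2012 or later this whole calculation collapses into a single statement with LAG. The question targets 2005, so this is only a sketch for readers on newer versions, using the same #variabletable:

```sql
-- Sketch: single-statement version using LAG (SQL Server 2012+ only).
-- LAG(DandA) fetches the previous month's DandA when rows are ordered by EOMDate.
;WITH calc AS
(
    SELECT EarnedIncome,
           DandA - LAG(DandA) OVER (ORDER BY EOMDate) + Coupon AS NewEarned
    FROM #variabletable
)
UPDATE calc
SET EarnedIncome = NewEarned
WHERE NewEarned IS NOT NULL;
```

The WHERE clause leaves the first month NULL, matching the advice above that a value cannot honestly be calculated without a prior month.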