I have a problem with missing rows in a table that is giving me a headache.
As base data, I have the following table:
create table #table
(
id1 int,
id2 int,
ch char(1) not null,
val int
)
insert into #table values (1112, 121, 'A', 12)
insert into #table values (1351, 121, 'A', 13)
insert into #table values (1411, 121, 'B', 81)
insert into #table values (1312, 7, 'C', 107)
insert into #table values (1401, 2, 'A', 107)
insert into #table values (1454, 2, 'D', 107)
insert into #table values (1257, 6, 'A', 1)
insert into #table values (1269, 6, 'B', 12)
insert into #table values (1335, 6, 'C', 12)
insert into #table values (1341, 6, 'D', 5)
insert into #table values (1380, 6, 'A', 3)
The output should be ordered by id2 and follow a fixed sequence of ch, which should repeat until the next id2 begins.
Sequence:
'A'
'B'
'C'
'D'
If the sequence or the pattern is interrupted, the missing rows should be filled with NULL, so that I get this result table:
id1 id2 ch val
----------------------------
1112 121 'A' 12
NULL 121 'B' NULL
NULL 121 'C' NULL
NULL 121 'D' NULL
1351 121 'A' 13
1411 121 'B' 81
NULL 121 'C' NULL
NULL 121 'D' NULL
NULL 7 'A' NULL
NULL 7 'B' NULL
1312 7 'C' 107
NULL 7 'D' NULL
1401 2 'A' 107
NULL 2 'B' NULL
NULL 2 'C' NULL
1454 2 'D' 107
and so on...
What I'm looking for is a way to do this without iterations.
I hope someone can help!
Thanks in advance!
A solution might be this:
create table #table ( id1 int, id2 int, ch char(1) not null, val int )
insert into #table values (1112, 121, 'A', 12)
,(1351, 121, 'A', 13),(1411, 121, 'B', 81),(1312, 7, 'C', 107),(1401, 2, 'A', 107)
,(1454, 2, 'D', 107),(1257, 6, 'A', 1),(1269, 6, 'B', 12),(1335, 6, 'C', 12)
,(1341, 6, 'D', 5),(1380, 6, 'A', 3)
;with foo as
(select
*
,row_number() over (partition by id2 order by id1) rwn
,ascii(isnull(lag(ch,1) over (partition by id2 order by id1),'A'))-ascii('A') prev
,count(*) over (partition by id2,ch) nr
,ascii(ch)-ascii('A') cur
from #table
)
,bar as
(
select
*,case when cur<=prev and rwn>1 then 4 else 0 end + cur-prev step
from foo
)
,foobar as
(
select *,sum(step) over (partition by id2 order by id1 rows unbounded preceding) rownum
from bar
)
,iterations as
(
select id2,max(nr) nr from foo
group by id2
)
,blanks as
(
select
id2,ch chnr,char(ch+ascii('A') )ch,ROW_NUMBER() over (partition by id2 order by c.nr,ch)-1 rownum,c.nr
from iterations a
inner join (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) c(nr)
on c.nr<=a.nr
cross join (values (0),(1),(2),(3)) b(ch)
)
select
b.id1,a.id2,a.ch,b.val
from blanks a
left join foobar b
on a.id2=b.id2 and a.rownum=b.rownum
order by a.id2,a.rownum
I first build the CTE "foo", which numbers the rows within each id2 and fetches the previous value of ch (defaulting to 'A' for the first row).
"bar" then computes how many positions each row lies after the previous one. For instance, if the previous was an A and the current is a C, the step is 2. If the previous was an A and the current is an A, the step is 4, a full cycle.
"foobar" then sums the steps, which numbers the original rows by where they should land in the final output.
"iterations" determines how many times the "ABCD" block should appear for each id2, taken as the highest count of any single ch value.
"blanks" then holds all the final rows: for each id2 it outputs every "ABCD" row that should be in the final output and numbers them in rownum.
Finally I left join "foobar" to "blanks" on id2 and rownum. That gives the correct number of rows, with the original values filled in where they exist.
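The step arithmetic described above can be sketched outside SQL to make it easier to follow. This is only an illustration in Python, not the answer's query: the function name `fill_gaps` is made up, groups are emitted in order of first appearance (an assumption taken from the expected output in the question), and the number of cycles is derived from the last position rather than from the "iterations" CTE.

```python
SEQ = "ABCD"  # the fixed sequence from the question


def fill_gaps(rows):
    """rows: iterable of (id1, id2, ch, val). Returns the padded row list."""
    order, groups = [], {}
    for row in rows:
        if row[1] not in groups:          # remember id2 groups in first-seen order
            groups[row[1]] = []
            order.append(row[1])
        groups[row[1]].append(row)

    out = []
    for id2 in order:
        placed = {}                        # output position -> (id1, val)
        pos, prev = 0, 0                   # prev = 0 mirrors isnull(lag(ch), 'A')
        ordered = sorted(groups[id2], key=lambda r: r[0])
        for n, (id1, _, ch, val) in enumerate(ordered):
            cur = SEQ.index(ch)
            step = cur - prev
            if n > 0 and cur <= prev:      # wrapped around: start a new cycle
                step += len(SEQ)
            pos += step                    # running sum, like "foobar"
            placed[pos] = (id1, val)
            prev = cur
        cycles = pos // len(SEQ) + 1       # assumption: pad up to the last used cycle
        for slot in range(cycles * len(SEQ)):
            id1, val = placed.get(slot, (None, None))
            out.append((id1, id2, SEQ[slot % len(SEQ)], val))
    return out


if __name__ == "__main__":
    sample = [(1112, 121, "A", 12), (1351, 121, "A", 13), (1411, 121, "B", 81),
              (1312, 7, "C", 107), (1401, 2, "A", 107), (1454, 2, "D", 107)]
    for r in fill_gaps(sample):
        print(r)
```

For id2 = 121 the three rows land on positions 0, 4 and 5, which is why a full block of NULL rows separates the two 'A' rows.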
If you can add an extra column to your table that defines which rows of an [id2] belong to the same sequence, you can try this:
create table #table
(
id1 int,
id2 int,
ch char(1) not null,
val int,
category int -- extra column
)
insert into #table values (1112, 121, 'A', 12, 1)
insert into #table values (1351, 121, 'A', 13, 2)
insert into #table values (1411, 121, 'B', 81, 2)
insert into #table values (1312, 7, 'C', 107, 3)
insert into #table values (1401, 2, 'A', 107, 4)
insert into #table values (1454, 2, 'D', 107, 4)
insert into #table values (1257, 6, 'A', 1, 5)
insert into #table values (1269, 6, 'B', 12, 5)
insert into #table values (1335, 6, 'C', 12, 5)
insert into #table values (1341, 6, 'D', 5, 5)
insert into #table values (1380, 6, 'A', 3, 5)
CREATE TABLE #sequence (seq varchar(1))
INSERT INTO #sequence values ('A'), ('B'), ('C'), ('D')
SELECT b.id1, a.id2, a.seq, b.val, a.category
INTO #T1
FROM (
SELECT *
FROM #table
CROSS JOIN #sequence
) A
LEFT JOIN (
SELECT * FROM #table
) B
ON 1=1
AND a.id1 = b.id1
AND a.id2 = b.id2
AND a.seq = b.ch
AND a.val = b.val
;WITH rem_duplicates AS (
SELECT *, dup = ROW_NUMBER() OVER (PARTITION by id2, seq, category ORDER BY id1 DESC)
FROM #T1
) DELETE FROM rem_duplicates WHERE dup > 1
SELECT * FROM #T1 ORDER BY id2 DESC, category ASC, seq ASC
DROP TABLE #T1
I'm a little confused by your output, but try this:
Update
CREATE TABLE #table
(
row INT IDENTITY(1, 1) ,
id1 INT ,
id2 INT ,
ch CHAR(1) NOT NULL ,
val INT
);
CREATE TABLE #Sequence ( ch3 CHAR(1) NOT NULL );
INSERT INTO #Sequence
VALUES ( 'A' );
INSERT INTO #Sequence
VALUES ( 'B' );
INSERT INTO #Sequence
VALUES ( 'C' );
INSERT INTO #Sequence
VALUES ( 'D' );
INSERT INTO #table
VALUES ( 1112, 121, 'A', 12 );
INSERT INTO #table
VALUES ( 1351, 121, 'A', 13 );
INSERT INTO #table
VALUES ( 1411, 121, 'B', 81 );
INSERT INTO #table
VALUES ( 1312, 7, 'C', 107 );
INSERT INTO #table
VALUES ( 1401, 2, 'A', 107 );
INSERT INTO #table
VALUES ( 1454, 2, 'D', 107 );
INSERT INTO #table
VALUES ( 1257, 6, 'A', 1 );
INSERT INTO #table
VALUES ( 1269, 6, 'B', 12 );
INSERT INTO #table
VALUES ( 1335, 6, 'C', 12 );
INSERT INTO #table
VALUES ( 1341, 6, 'D', 5 );
INSERT INTO #table
VALUES ( 1380, 6, 'A', 3 );
SELECT r.id1 ,
fin.id2 ,
ch3 ,
r.val
FROM ( SELECT *
FROM ( SELECT CASE WHEN r.chd - l.chd = 1 THEN 0
ELSE 1
END [gap in sq] ,
l.*
FROM ( SELECT id2 ,
ASCII(ch) chd ,
ch ,
val ,
id1 ,
row
FROM #table
) AS l
LEFT JOIN ( SELECT id2 ,
ASCII(ch) chd ,
row
FROM #table
) AS r ON l.row = r.row - 1
) AS temp ,
#Sequence s
WHERE temp.[gap in sq] = 1
OR ( temp.[gap in sq] = 0
AND s.ch3 = temp.ch
)
) AS fin
LEFT JOIN #table r ON r.id2 = fin.id2
AND r.id1 = fin.id1
AND r.ch = fin.ch3
I was reading an article that explains the difference between JOIN, IN, and EXISTS, but I got confused by its explanation of the different results for NOT IN vs. NOT EXISTS. Can someone clarify why the output of NOT EXISTS differs from that of NOT IN? I tried deleting the NULL row (t2.id = 8) from table t2 and still got the same result.
Here's the SQL script from the article:
CREATE TABLE t1 (id INT, title VARCHAR(20), someIntCol INT)
GO
CREATE TABLE t2 (id INT, t1Id INT, someData VARCHAR(20))
GO
INSERT INTO t1
SELECT 1, 'title 1', 5 UNION ALL
SELECT 2, 'title 2', 5 UNION ALL
SELECT 3, 'title 3', 5 UNION ALL
SELECT 4, 'title 4', 5 UNION ALL
SELECT null, 'title 5', 5 UNION ALL
SELECT null, 'title 6', 5
INSERT INTO t2
SELECT 1, 1, 'data 1' UNION ALL
SELECT 2, 1, 'data 2' UNION ALL
SELECT 3, 2, 'data 3' UNION ALL
SELECT 4, 3, 'data 4' UNION ALL
SELECT 5, 3, 'data 5' UNION ALL
SELECT 6, 3, 'data 6' UNION ALL
SELECT 7, 4, 'data 7' UNION ALL
SELECT 8, null, 'data 8' UNION ALL
SELECT 9, 6, 'data 9' UNION ALL
SELECT 10, 6, 'data 10' UNION ALL
SELECT 11, 8, 'data 11'
And here's the SQL queries and their explanation:
-- IN doesn't get correct results.
-- That's because of how IN treats NULLs and the Three-valued logic
-- NULL is treated as an unknown, so if there's a null in the t2.t1id
-- NOT IN will return either NOT TRUE or NOT UNKNOWN. And neither can be TRUE.
-- when there's a NULL in the t1id column of the t2 table the NOT IN query will always return an empty set.
SELECT t1.*
FROM t1
WHERE t1.id NOT IN (SELECT t1id FROM t2)
-- NOT EXISTS gets correct results
SELECT t1.*
FROM t1
WHERE NOT EXISTS (SELECT * FROM t2 WHERE t1.id = t2.t1id)
GO
DROP TABLE t2
DROP TABLE t1
Here's the link to the article: http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx
Thank you!
As far as I can see, you can use them interchangeably in a lot of cases, but you can't forget the details behind them.
You will often get the same results from NOT IN and NOT EXISTS, but they differ in queries that involve NULL values: of the two, only NOT EXISTS returns rows whose compared value is NULL.
You can see it better in this example:
update cars set c_owner = NULL where c_id = 'BMW03444'
Well... Let's try to see if we have any car in stock that has not been sold yet.
select count(*) from cars where c_owner not in (select c_name from customers);
Output:
COUNT(*): 0
Where's the failure? Quite simple. The car has no owner, so c_owner is NULL, and comparing NULL against the list never yields TRUE: NOT IN evaluates to UNKNOWN and the row is dropped. You are not asking for cars whose owner is missing from the list; you are asking about a car with no owner at all. The correct form is:
select count(*)
from cars c1
where not exists (
select c_owner
from customers c2
where c1.c_owner=c2.customer_id
);
COUNT(*): 1
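This is because NOT IN compares c_owner against specific values. A comparison with NULL evaluates to UNKNOWN, which WHERE treats like FALSE, so the row is not counted.
NOT EXISTS only checks whether the subquery returns any row. A NULL c_owner matches no customer, so the subquery is empty, NOT EXISTS is TRUE, and the row is included.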
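To see why deleting the NULL row from t2 did not change anything in the question, it helps to run the article's data directly. The sketch below does that with SQLite through Python's sqlite3 module, which is an assumption standing in for SQL Server; the NULL handling of NOT IN and NOT EXISTS shown here follows the same three-valued logic. The function name and the optional row with id 5 ('title 7') are hypothetical additions, included so that NOT IN has a row it could return.

```python
import sqlite3


def not_in_vs_not_exists(drop_null_row=False, extra_unmatched=False):
    """Return (titles from NOT IN, titles from NOT EXISTS) for the article's data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE t1 (id INT, title VARCHAR(20), someIntCol INT);
        CREATE TABLE t2 (id INT, t1Id INT, someData VARCHAR(20));
        INSERT INTO t1 VALUES (1,'title 1',5),(2,'title 2',5),(3,'title 3',5),
                              (4,'title 4',5),(NULL,'title 5',5),(NULL,'title 6',5);
        INSERT INTO t2 VALUES (1,1,'data 1'),(2,1,'data 2'),(3,2,'data 3'),
                              (4,3,'data 4'),(5,3,'data 5'),(6,3,'data 6'),
                              (7,4,'data 7'),(8,NULL,'data 8'),(9,6,'data 9'),
                              (10,6,'data 10'),(11,8,'data 11');
    """)
    if drop_null_row:        # what the question tried
        con.execute("DELETE FROM t2 WHERE id = 8")
    if extra_unmatched:      # hypothetical row: a non-NULL id with no match in t2
        con.execute("INSERT INTO t1 VALUES (5, 'title 7', 5)")

    not_in = con.execute(
        "SELECT title FROM t1 "
        "WHERE t1.id NOT IN (SELECT t1Id FROM t2) ORDER BY title").fetchall()
    not_exists = con.execute(
        "SELECT title FROM t1 "
        "WHERE NOT EXISTS (SELECT * FROM t2 WHERE t1.id = t2.t1Id) "
        "ORDER BY title").fetchall()
    con.close()
    return [r[0] for r in not_in], [r[0] for r in not_exists]


if __name__ == "__main__":
    for drop in (False, True):
        for extra in (False, True):
            print(drop, extra, not_in_vs_not_exists(drop, extra))
```

With the original data, ids 1 to 4 all have a match in t2, and the two NULL ids make NOT IN evaluate to UNKNOWN, so NOT IN is empty with or without the NULL row in t2. Only a non-NULL, unmatched id such as the hypothetical 5 shows the effect of removing that row.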
CREATE TABLE #t(LocationCode varchar(10), ResourceId int, TransType char(3))
INSERT #t
SELECT 'STORE 001', 1, 'In' UNION ALL
SELECT 'STORE 002', 2, 'In' UNION ALL
SELECT 'STORE 003', 3, 'In' UNION ALL
SELECT 'STORE 001', 1, 'Out' UNION ALL
SELECT 'STORE 004', 1, 'In' UNION ALL
SELECT 'STORE 004', 4, 'In' UNION ALL
SELECT 'STORE 004', 4, 'Out' UNION ALL
SELECT 'STORE 004', 1, 'Out' UNION ALL
SELECT 'STORE 001', 1, 'In'
DROP TABLE #t
How can I show only the items, with their corresponding location, that have more "Ins" than "Outs"? (Sorry for my bad English.)
LocationCode ResourceId
STORE 001[edited] 1
STORE 002 2
STORE 003 3
Assuming you only want Ins where there isn't a matching Out:
SELECT *
FROM #t AS a
WHERE a.TransType = 'In'
AND NOT EXISTS (
SELECT *
FROM #t AS b
WHERE b.TransType = 'Out'
AND b.LocationCode = a.LocationCode
AND b.ResourceId = a.ResourceId
)
You'd need more data in your schema to be able to match an Out with an In by time.
Try something simpler like this:
SELECT LocationCode, ResourceID
FROM #t
GROUP BY LocationCode, ResourceID
HAVING COUNT(*) % 2 = 1
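The parity check above relies on Ins and Outs strictly alternating for each location and resource. A variant that states the condition directly, assuming the goal is "more Ins than Outs", compares the two counts with conditional aggregation. The sketch runs the query on the question's data using SQLite through Python's sqlite3 module (an assumption, since the thread targets SQL Server); the table is named t instead of #t and the function name is made up.

```python
import sqlite3

DATA = [
    ("STORE 001", 1, "In"), ("STORE 002", 2, "In"), ("STORE 003", 3, "In"),
    ("STORE 001", 1, "Out"), ("STORE 004", 1, "In"), ("STORE 004", 4, "In"),
    ("STORE 004", 4, "Out"), ("STORE 004", 1, "Out"), ("STORE 001", 1, "In"),
]


def more_ins_than_outs(rows=DATA):
    """Return (LocationCode, ResourceId) pairs with more 'In' than 'Out' rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (LocationCode TEXT, ResourceId INT, TransType TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute("""
        SELECT LocationCode, ResourceId
        FROM t
        GROUP BY LocationCode, ResourceId
        HAVING SUM(CASE WHEN TransType = 'In'  THEN 1 ELSE 0 END)
             > SUM(CASE WHEN TransType = 'Out' THEN 1 ELSE 0 END)
        ORDER BY LocationCode, ResourceId
    """).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(more_ins_than_outs())
```

On the sample data this returns the three rows listed in the question. Unlike the modulo test, it does not report a pair that has one more Out than In.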
Here's an example where the transactions are sequenced and two ways to use that sequence:
CREATE TABLE #t(LocationCode varchar(10), ResourceId int, TransType char(3), Seq int UNIQUE NOT NULL)
INSERT #t
SELECT 'STORE 001', 1, 'In', 1 UNION ALL
SELECT 'STORE 002', 2, 'In', 2 UNION ALL
SELECT 'STORE 003', 3, 'In', 3 UNION ALL
SELECT 'STORE 001', 1, 'Out', 4 UNION ALL
SELECT 'STORE 004', 1, 'In', 5 UNION ALL
SELECT 'STORE 004', 4, 'In', 6 UNION ALL
SELECT 'STORE 004', 4, 'Out', 7 UNION ALL
SELECT 'STORE 004', 1, 'Out', 8 UNION ALL
SELECT 'STORE 001', 1, 'In', 9
;WITH Ins AS (
SELECT * FROM #t
WHERE TransType = 'In'
)
,Outs AS (
SELECT * FROM #t
WHERE TransType = 'Out'
)
,Matched AS (
SELECT *,
(SELECT MIN(Seq)
FROM Outs
WHERE Outs.LocationCode = Ins.LocationCode
AND Outs.ResourceID = Ins.ResourceID
AND Outs.Seq > Ins.Seq) AS OutSeq
FROM Ins
)
SELECT *
FROM Matched
WHERE OutSeq IS NULL
;WITH LastIn AS (
SELECT ResourceID, MAX(Seq) AS Seq
FROM #t
WHERE TransType = 'In'
GROUP BY ResourceID
)
SELECT *
FROM LastIn
WHERE NOT EXISTS (
SELECT *
FROM #t outs
WHERE outs.TransType = 'Out'
AND Outs.ResourceID = LastIn.ResourceID
AND outs.Seq > LastIn.Seq)
DROP TABLE #t
Table 1:
AccountId, ReferenceId, Name, (lots of other columns)
Table 2:
AccountId, ReferenceId, (other columns)
How can I do a select to get the following:
AccountId, ReferenceId, [Count(*) in Table2 where accountId and reference ID match.]
1, AB, 1
1, AC, 0
2, AD, 4
2, EF, 0
etc
I'm guessing a join, but that gives me values, not a count.
I tried adding a count, but I get errors.
SELECT T1.AccountId,
T1.ReferenceId,
COUNT(T2.ReferenceId) AS Cnt
FROM Table1 T1
LEFT JOIN Table2 T2
ON T1.AccountId = T2.AccountId
AND T1.ReferenceId = T2.ReferenceId
GROUP BY T1.AccountId,
T1.ReferenceId
Something like:
SELECT t1.AccountId, t1.ReferenceId, COUNT(t2.AccountId)
FROM Table1 t1
LEFT JOIN Table2 t2 ON t1.AccountId = t2.AccountId AND
t1.ReferenceId = t2.ReferenceId
GROUP BY t1.AccountId, t1.ReferenceId
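should work. The trick is to group by both key values so you can aggregate over other values. In this case you simply want to count the matching rows from the other table (you could also sum or average their values). Counting a Table2 column rather than * matters here: unmatched rows have NULL in that column and therefore count as 0.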
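The difference between counting a column of the joined table and counting * is easy to check on a small data set. The sketch below uses SQLite through Python's sqlite3 module (an assumption, the question does not name a database); the rows are invented to match the shape of the expected output in the question, and the Name values and function name are placeholders.

```python
import sqlite3


def counts_per_reference():
    """Return (AccountId, ReferenceId, COUNT(t2.AccountId), COUNT(*)) per Table1 row."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Table1 (AccountId INT, ReferenceId TEXT, Name TEXT);
        CREATE TABLE Table2 (AccountId INT, ReferenceId TEXT);
        -- made-up rows shaped like the question's expected output
        INSERT INTO Table1 VALUES (1,'AB','n1'),(1,'AC','n2'),(2,'AD','n3'),(2,'EF','n4');
        INSERT INTO Table2 VALUES (1,'AB'),(2,'AD'),(2,'AD'),(2,'AD'),(2,'AD');
    """)
    rows = con.execute("""
        SELECT t1.AccountId, t1.ReferenceId,
               COUNT(t2.AccountId) AS Cnt,      -- ignores the NULLs of unmatched rows
               COUNT(*)            AS StarCnt   -- counts the unmatched row itself
        FROM Table1 t1
        LEFT JOIN Table2 t2
               ON t1.AccountId = t2.AccountId
              AND t1.ReferenceId = t2.ReferenceId
        GROUP BY t1.AccountId, t1.ReferenceId
        ORDER BY t1.AccountId, t1.ReferenceId
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in counts_per_reference():
        print(row)
```

For the unmatched pairs (1, AC) and (2, EF) the column count is 0 while COUNT(*) is 1, because the LEFT JOIN still produces one row for them.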
sample data
create table #tbl1 (AccountId INT, ReferenceId int, Name varchar(20))
create table #tbl2 (AccountId INT, ReferenceId int)
insert into #tbl1 select 1, 10, 'White'
insert into #tbl1 select 2, 20, 'Green'
insert into #tbl1 select 3, 30, 'Black'
insert into #tbl1 select 3, 40, 'Red'
insert into #tbl2 select 1, 10
insert into #tbl2 select 1, 10
insert into #tbl2 select 2, 20
insert into #tbl2 select 3, 30
Query
select t.AccountId, t.ReferenceId, t.Name
,(select COUNT(*) from #tbl2 t2
where t.AccountId = t2.AccountId
and t.ReferenceId = t2.ReferenceId) as countt
from #tbl1 t
SELECT t1.AccountId, t1.ReferenceId, COUNT(t2.AccountId)
FROM Table1 t1 LEFT JOIN Table2 t2
ON (t1.AccountId=t2.AccountId AND t1.ReferenceId=t2.ReferenceId)
GROUP BY t1.AccountId, t1.ReferenceId