The problem is to return the rows which contain nulls as well. Below is the SQL code to create the table and populate it with sample data.
I'm expecting the result below, but my query does not show the two rows with null values.
src_t1 id1_t1 id2_t1 val_t1   src_t2 id1_t2 id2_t2 val_t2
                              b      z      z      4
a      w      w      100      b      w      w      1
a      x      x      200      b      x      x      2
a      y      y      300
Data:
CREATE TABLE sample (
src VARCHAR(6)
,id1 VARCHAR(6)
,id2 VARCHAR(6)
,val FLOAT
);
INSERT INTO sample (src, id1, id2, val)
VALUES ('a', 'w', 'w', 100)
,('b', 'w', 'w', 1)
,('a', 'x', 'x', 200)
,('b', 'x', 'x', 2)
,('a', 'y', 'y', 300)
,('b', 'z', 'z', 4)
;
This is my test query. It does not show results when t1.src = 'a' and t1.id1 = 'y' or when t2.src = 'b' and t2.id1 = 'z'.
Why?
What's the correct query?
SELECT t1.src, t1.id1, t1.id2, t1.val
,t2.src as src2, t2.id1, t2.id2, t2.val
FROM sample t1 FULL OUTER JOIN sample t2
ON t1.id1 = t2.id1 AND t1.id2 = t2.id2
WHERE (t1.src = 'a' AND t2.src = 'b')
OR (t1.src IS NULL AND t1.id1 IS NULL AND t1.id2 IS NULL)
OR (t2.src IS NULL AND t2.id1 IS NULL AND t2.id2 IS NULL)
I've also tried moving the conditions from the WHERE clause to the ON clause.
TIA.
The WHERE clause is evaluated after the FULL OUTER JOIN, so filtering on both t1.src and t2.src discards the rows where one side is all NULLs, effectively converting your query into an inner join.
Instead, filter each side before the join, like this:
SELECT t1.src, t1.id1, t1.id2, t1.val
,t2.src as src2, t2.id1, t2.id2, t2.val
FROM (
select * from sample
where src='a'
) t1 FULL OUTER JOIN (
select * from sample
where src='b'
) t2
ON t1.id1 = t2.id1 AND t1.id2 = t2.id2
yielding this result set:
src id1 id2 val src2 id1 id2 val
---- ---- ---- ----------- ---- ---- ---- -----------
a w w 100 b w w 1
a x x 200 b x x 2
NULL NULL NULL NULL b z z 4
a y y 300 NULL NULL NULL NULL
Update:
Note also the use of two sub-queries to clearly separate the source table into two distinct relvars. I missed this for a minute on my first submission.
Actually, I think the solution is a bit cleaner if a CTE is used:
WITH A AS (
select * from sample where src='a'
),
B AS (
select * from sample where src='b'
)
SELECT *
FROM A FULL OUTER JOIN B
ON A.ID1 = B.ID1 AND A.ID2 = B.ID2
;
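If you would rather see a single set of key columns instead of NULLs on one side, you can coalesce the two sides. A minimal sketch against the same sample table (this goes beyond the original answer):
WITH A AS (SELECT * FROM sample WHERE src = 'a'),
     B AS (SELECT * FROM sample WHERE src = 'b')
SELECT COALESCE(A.id1, B.id1) AS id1,   -- take the key from whichever side is present
       COALESCE(A.id2, B.id2) AS id2,
       A.val AS val_a,
       B.val AS val_b
FROM A FULL OUTER JOIN B
  ON A.id1 = B.id1 AND A.id2 = B.id2;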
Related
I am working with SQL Server 2008 Reporting Services. I am trying to split string values into different columns in the same row in an expression, but I can't get the expected output. I have provided input and output details. I have to split the values by space (" ") and by ("-").
Input :
Sample 1:
ASY-LOS,SLD,ME,A1,A5,J4A,J4B,J4O,J4P,J4S,J4T,J7,J10,J2A,J2,S2,S3,S3T,S3S,E2,E2F,E6,T6,8,SB1,E1S,OTH AS2-J4A,J4B,J4O,J4P,J4S,J4T,J7,J1O,J2A,S2,S3,J2,T6,T8,E2,E4,E6,SLD,SB1,OTH
Sample 2:
A1 A2 A3 A5 D2 D3 D6 E2 E4 E5 E6 EOW LH LL LOS OTH P8 PH PL PZ-1,2,T1,T2,T3 R2-C,E,A RH RL S1 S2-D S3
Output should be:
Thank you.
I wrote this before I saw your comment about having to do it in the report. If you can explain why you cannot do this in the dataset query then there may be a way around that.
Anyway, here's one way of doing this using SQL
DECLARE @t table (RowN int identity (1,1), sample varchar(500))
INSERT INTO @t (sample) SELECT 'ASY-LOS,SLD,ME,A1,A5,J4A,J4B,J4O,J4P,J4S,J4T,J7,J10,J2A,J2,S2,S3,S3T,S3S,E2,E2F,E6,T6,8,SB1,E1S,OTH AS2-J4A,J4B,J4O,J4P,J4S,J4T,J7,J1O,J2A,S2,S3,J2,T6,T8,E2,E4,E6,SLD,SB1,OTH'
INSERT INTO @t (sample) SELECT 'A1 A2 A3 A5 D2 D3 D6 E2 E4 E5 E6 EOW LH LL LOS OTH P8 PH PL PZ-1,2,T1,T2,T3 R2-C,E,A RH RL S1 S2-D S3'
drop table if exists #s1
SELECT RowN, sample, SampleIdx = idx, SampleValue = [Value]
into #s1
from @t t
CROSS APPLY
spring..fn_Split(sample, ' ') as x
drop table if exists #s2
SELECT
s1.*
, s2idx = Idx
, s2Value = [Value]
into #s2
FROM #s1 s1
CROSS APPLY spring..fn_Split(SampleValue, '-')
SELECT SampleKey = [1],
[Output] = [2] FROM #s2
PIVOT (
MAX(s2Value)
FOR s2Idx IN ([1],[2])
) p
This produced the following results
If you do not have a split function, here is the script to create the one I use
CREATE FUNCTION [dbo].[fn_Split]
/* Define I/O parameters WARNING! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE! */
(@pString VARCHAR(8000)
,@pDelimiter CHAR(1)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
/*"Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000: enough to cover VARCHAR(8000)*/
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)--10E+1 or 10 rows
,E2(N) AS (SELECT 1 FROM E1 a,E1 b)--10E+2 or 100 rows
,E4(N) AS (SELECT 1 FROM E2 a,E2 b)--10E+4 or 10,000 rows max
/* This provides the "base" CTE and limits the number of rows right up front
for both a performance gain and prevention of accidental "overruns" */
,cteTally(N) AS (
SELECT TOP (ISNULL(DATALENGTH(@pString), 0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E4
)
/* This returns N+1 (starting position of each "element" just once for each delimiter) */
,cteStart(N1) AS (
SELECT 1 UNION ALL
SELECT t.N + 1 FROM cteTally t WHERE SUBSTRING(@pString, t.N, 1) = @pDelimiter
)
/* Return start and length (for use in SUBSTRING later) */
,cteLen(N1, L1) AS (
SELECT s.N1
,ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString, s.N1), 0) - s.N1, 8000)
FROM cteStart s
)
/* Do the actual split.
The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found. */
SELECT
idx = ROW_NUMBER() OVER (ORDER BY l.N1)
,value = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
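As a quick sanity check once the function exists, you can call it directly; here it splits one of the sample tokens on '-' (a minimal usage sketch):
SELECT idx, [value]
FROM dbo.fn_Split('PZ-1,2,T1,T2,T3', '-');
-- idx 1 -> 'PZ', idx 2 -> '1,2,T1,T2,T3'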
I have a column in a postgres database which logs search querystrings for a page on our website.
The column contains data like
"a=2&b=4"
"a=2,3"
"b=4&a=3"
"a=4&a=3"
I'd like to work out the frequency of each value for a certain parameter (a).
value | freq
------|------
3 | 3
2 | 2
4 | 1
Any way to do this in a single SQL statement?
Something like this:
with all_values as (
select string_to_array(split_part(parameter, '=', 2), ',') as query_params
from the_table d,
unnest(string_to_array(d.querystring, '&')) as x(parameter)
where x.parameter like 'a%'
)
select t.value, count(*)
from all_values av, unnest(av.query_params) as t(value)
group by t.value
order by t.value;
Online example: http://rextester.com/OXM67442
Try something like this:
select data_value,count(*) from (
select data_name,unnest(string_to_array(data_values,',')) data_value from (
select split_part(data_array,'=',1) data_name ,split_part(data_array,'=',2) data_values from (
select unnest(string_to_array(mydata,'&')) data_array from mytable
) a
) b
) c where data_name='a' group by 1 order by 1
Assuming the table that keeps the counts is called paramcount:
WITH vals(v) AS
(SELECT regexp_replace(p, '^.*=', '')
FROM regexp_split_to_table(
'b=4&a=3,2',
'&|,'
) p(p)
)
INSERT INTO paramcount (value, freq)
SELECT v, 1 FROM vals
ON CONFLICT (value)
DO UPDATE SET freq = paramcount.freq + 1
WHERE paramcount.value = EXCLUDED.value;
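Note that ON CONFLICT (value) requires a unique constraint or unique index on paramcount.value. A minimal sketch of such a table, since the real definition isn't shown in the question:
CREATE TABLE paramcount (
    value text PRIMARY KEY,   -- ON CONFLICT (value) resolves against this constraint
    freq  integer NOT NULL DEFAULT 0
);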
The approach: get the CSV integers after 'a=', split them into numbers, then count the values:
select v, count(*) from (
SELECT c,unnest(string_to_array(unnest(regexp_matches(c,'a=([0-9,]+)','g')),',')) as v FROM qrs
) x group by v;
Parametrize:
WITH argname(aname) as (values ('a'::TEXT))
select v, count(*)
from (
  SELECT c, unnest(string_to_array(unnest(regexp_matches(c, aname||'=([0-9,]+)', 'g')), ',')) as v
  FROM qrs, argname
) x
group by v;
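For reference, a table matching what this answer assumes (the names qrs and c come from the answer; the rows are the question's sample data):
CREATE TABLE qrs (c text);
INSERT INTO qrs (c) VALUES ('a=2&b=4'), ('a=2,3'), ('b=4&a=3'), ('a=4&a=3');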
I have a table structure like below
DECLARE @XTable TABLE
(
ColA Varchar(20),
ColB Varchar(20),
DateCol DATE
)
INSERT INTO @XTable
VALUES
('A', 'X1', '4/1/2015'), ('A', 'X2', '4/10/2015'), ('A', 'X3', '4/12/2015'),
('A', 'X4', '4/16/2015'), ('B', 'X1', '5/18/2015'), ('B', 'X2', '5/20/2015')
Expected output:
/*
ColA ColB DateCol Diff
A X1 4/1/2015 0
A X2 4/10/2015 9
A X3 4/12/2015 2
A X3 4/12/2015 11
A X4 4/16/2015 15
A X4 4/16/2015 5
A X4 4/16/2015 4
B X1 5/18/2015 0
B X2 5/20/2015 12
*/
For example:
A X4 will have a date difference from each of A X1, A X2 and A X3,
and A X3 will have a date difference from A X1 and A X2.
I am able to get the difference from the previous row via the query below:
;WITH Dataf
AS (
SELECT
*,
ROW_NUMBER() OVER (ORDER BY ColA,ColB, DateCol) AS RowNum
FROM
@XTable
)
SELECT a.ColA, a.ColB, SUM(DATEDIFF(Dd,b.DateCol,a.DateCol)) as TotalTime
FROM
Dataf AS A
LEFT OUTER JOIN Dataf AS B
ON A.RowNum = B.RowNum + 1 and a.ColA = b.ColA
GROUP BY a.ColA, a.ColB
I thought of applying multiple CTEs; here is what I am working on now:
;WITH Dataf
AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY ColA ORDER BY DateCol) AS RowNum
FROM
@XTable
),
CTE AS
(
SELECT ColA, ColB, DateCol, RowNum, NULL AS DateDifference
FROM Dataf WHERE RowNum = 1
UNION ALL
SELECT DF.ColA, DF.ColB, DF.DateCol, DF.RowNum ,
DATEDIFF(DD, CT.DateCol, DF.DateCol) AS DateDifference
FROM Dataf DF
JOIN CTE CT ON DF.ColA = CT.ColA AND DF.RowNum = CT.RowNum + 1
)
SELECT *
FROM CTE
ORDER BY ColA
You can instead use a LEFT OUTER JOIN against the CTE you prepared. Here is how it can be done:
;WITH DataForm
AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Cola ORDER BY DateCol) AS RowNum
FROM
@XTable
)
SELECT ColA, ColB, DateCol, 0
FROM DataForm WHERE RowNum = 1
UNION
SELECT T1.ColA, T1.ColB, T1.DateCol
, DATEDIFF(dd,T2.DateCol, T1.Datecol)
FROM DataForm T1
LEFT OUTER JOIN DataForm T2 ON T1.ColA = T2.ColA
AND T1.RowNum >= T2.RowNum
WHERE DATEDIFF(dd,T2.DateCol, T1.Datecol) > 0
SQL Fiddle Example
I have a table that looks like this:
x y
1 2
2 null
3 null
1 null
11 null
I want to fill the null values by applying a rolling calculation
y_{i+1} = y_{i} + x_{i+1}, with SQL as simple as possible (in place),
so the expected result is:
x y
1 2
2 4
3 7
1 8
11 19
This is to be implemented in PostgreSQL. I could encapsulate it in a window function, but implementing a custom function always seems complex.
WITH RECURSIVE t AS (
    -- anchor: the row that already has a value for y
    select x, y, 1 as rank from my_table where y is not null
    UNION ALL
    -- walk forward one row at a time, adding x to the running value
    SELECT A.x, A.x + t.y AS y, t.rank + 1 AS rank
    FROM t
    inner join (
        -- note: row_number() over () with no ORDER BY relies on the table's physical order
        select row_number() over () as rank, x, y from my_table
    ) A on t.rank + 1 = A.rank
)
SELECT x, y FROM t;
You can iterate over rows using a recursive CTE. But in order to do so, you need a way to jump from row to row. Here's an example using an ID column:
; with recursive cte as
(
select id
, y
from Table1
where id = 1
union all
select cur.id
, prev.y + cur.x
from Table1 cur
join cte prev
on cur.id = prev.id + 1
)
select *
from cte
;
You can see the query at SQL Fiddle. If you don't have an ID column, but you do have another way to order the rows, you can use row_number() to get an ID:
; with recursive sorted as
(
-- Specify your ordering here. This example sorts by the dt column.
select row_number() over (order by dt) as id
, *
from Table1
)
, cte as
(
select id
, y
from sorted
where id = 1
union all
select cur.id
, prev.y + cur.x
from sorted cur
join cte prev
on cur.id = prev.id + 1
)
select *
from cte
;
Here's the SQL Fiddle link.
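If, as in the sample data, only the first row (by id) has a non-null y, you could also avoid recursion entirely and use a running sum. A sketch against the same Table1 with an id column, offered as an alternative technique rather than the answer above:
SELECT x,
       first_value(y) OVER (ORDER BY id)        -- the seed value from the first row
         + sum(x) OVER (ORDER BY id)            -- running total of x up to this row
         - first_value(x) OVER (ORDER BY id)    -- the first x is already baked into the seed
         AS y
FROM Table1;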
I would like to solve this issue while avoiding cursors (FETCH).
Here comes the problem...
1st Table/quantity
------------------
periodid periodstart periodend quantity
1 2010/10/01 2010/10/15 5
2nd Table/sold items
-----------------------
periodid periodstart periodend solditems
14343 2010/10/05 2010/10/06 2
Now I would like to get the following view or just a query result:
Table Table/stock
-----------------------
periodstart periodend itemsinstock
2010/10/01 2010/10/04 5
2010/10/05 2010/10/06 3
2010/10/07 2010/10/15 5
It seems impossible to solve this problem without using cursors, or without using single dates instead of periods.
I would appreciate any help.
Thanks
DECLARE @t1 TABLE (periodid INT,periodstart DATE,periodend DATE,quantity INT)
DECLARE @t2 TABLE (periodid INT,periodstart DATE,periodend DATE,solditems INT)
INSERT INTO @t1 VALUES(1,'2010-10-01T00:00:00.000','2010-10-15T00:00:00.000',5)
INSERT INTO @t2 VALUES(14343,'2010-10-05T00:00:00.000','2010-10-06T00:00:00.000',2)
DECLARE @D1 DATE
SELECT @D1 = MIN(P) FROM (SELECT MIN(periodstart) P FROM @t1
UNION ALL
SELECT MIN(periodstart) FROM @t2) D
DECLARE @D2 DATE
SELECT @D2 = MAX(P) FROM (SELECT MAX(periodend) P FROM @t1
UNION ALL
SELECT MAX(periodend) FROM @t2) D
;WITH
L0 AS (SELECT 1 AS c UNION ALL SELECT 1),
L1 AS (SELECT 1 AS c FROM L0 A CROSS JOIN L0 B),
L2 AS (SELECT 1 AS c FROM L1 A CROSS JOIN L1 B),
L3 AS (SELECT 1 AS c FROM L2 A CROSS JOIN L2 B),
L4 AS (SELECT 1 AS c FROM L3 A CROSS JOIN L3 B),
Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS i FROM L4),
Dates AS(SELECT DATEADD(DAY,i-1,@D1) AS D FROM Nums where i <= 1+DATEDIFF(DAY,@D1,@D2)) ,
Stock As (
SELECT D ,t1.quantity - ISNULL(t2.solditems,0) AS itemsinstock
FROM Dates
LEFT OUTER JOIN @t1 t1 ON t1.periodend >= D and t1.periodstart <= D
LEFT OUTER JOIN @t2 t2 ON t2.periodend >= D and t2.periodstart <= D ),
NStock As (
-- gaps-and-islands: the difference of the two row numbers is constant for consecutive days with the same stock level
select D,itemsinstock, ROW_NUMBER() over (order by D) - ROW_NUMBER() over (partition by itemsinstock order by D) AS G
from Stock)
SELECT MIN(D) AS periodstart, MAX(D) AS periodend, itemsinstock
FROM NStock
GROUP BY G, itemsinstock
ORDER BY periodstart
Hopefully a little easier to read than Martin's. I used different tables and sample data, hopefully extrapolating the right info:
CREATE TABLE [dbo].[Quantity](
[PeriodStart] [date] NOT NULL,
[PeriodEnd] [date] NOT NULL,
[Quantity] [int] NOT NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[SoldItems](
[PeriodStart] [date] NOT NULL,
[PeriodEnd] [date] NOT NULL,
[SoldItems] [int] NOT NULL
) ON [PRIMARY]
INSERT INTO Quantity (PeriodStart,PeriodEnd,Quantity)
SELECT '20100101','20100115',5
INSERT INTO SoldItems (PeriodStart,PeriodEnd,SoldItems)
SELECT '20100105','20100107',2 union all
SELECT '20100106','20100108',1
The actual query is now:
;WITH Dates as (
select PeriodStart as DateVal from SoldItems
union select PeriodEnd from SoldItems
union select PeriodStart from Quantity
union select PeriodEnd from Quantity
), Periods as (
select d1.DateVal as StartDate, d2.DateVal as EndDate
from Dates d1
inner join Dates d2 on d1.DateVal < d2.DateVal
left join Dates d3 on d1.DateVal < d3.DateVal and d3.DateVal < d2.DateVal
where d3.DateVal is null
), QuantitiesSold as (
select StartDate,EndDate,COALESCE(SUM(si.SoldItems),0) as Quantity
from Periods p left join SoldItems si on p.StartDate < si.PeriodEnd and si.PeriodStart < p.EndDate
group by StartDate,EndDate
)
select StartDate,EndDate,q.Quantity - qs.Quantity
from QuantitiesSold qs inner join Quantity q on qs.StartDate < q.PeriodEnd and q.PeriodStart < qs.EndDate
And the result is:
StartDate EndDate (No column name)
2010-01-01 2010-01-05 5
2010-01-05 2010-01-06 3
2010-01-06 2010-01-07 2
2010-01-07 2010-01-08 4
2010-01-08 2010-01-15 5
Explanation: I'm using three Common Table Expressions. The first (Dates) gathers all of the dates we're talking about, from the two tables involved. The second (Periods) selects consecutive pairs of values from the Dates CTE. The third (QuantitiesSold) then finds items in the SoldItems table that overlap these periods and adds their totals together. All that remains in the outer select is to subtract these quantities from the total quantity stored in the Quantity table.
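For what it's worth, on SQL Server 2012 and later the Periods step can be written more directly with LEAD() instead of the triple self-join; a sketch under that version assumption, not part of the original answer:
;WITH Dates as (
    select PeriodStart as DateVal from SoldItems
    union select PeriodEnd from SoldItems
    union select PeriodStart from Quantity
    union select PeriodEnd from Quantity
), Periods as (
    -- pair each distinct date with the next one
    select DateVal as StartDate,
           lead(DateVal) over (order by DateVal) as EndDate
    from Dates
)
select StartDate, EndDate
from Periods
where EndDate is not null;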
John, what you could do is a WHILE loop. Declare and initialise two variables before your loop, one being the start date and the other being the end date. Your loop would then look like this:
WHILE(@StartEnd <= @EndDate)
BEGIN
--processing goes here
SET @StartEnd = DATEADD(DAY, 1, @StartEnd)
END
You would need to store your period definitions in another table, so you could retrieve them and output rows to a temporary table when required, as sketched below.
Let me know if you need any more detailed examples, or if I've got the wrong end of the stick!
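For illustration, a rough sketch of that shape, borrowing the Quantity and SoldItems tables from Damien's setup above (those names, and the #DailyStock temp table, are assumptions, not part of John's original suggestion):
DECLARE @StartEnd DATE, @EndDate DATE
SELECT @StartEnd = MIN(PeriodStart), @EndDate = MAX(PeriodEnd) FROM Quantity

CREATE TABLE #DailyStock (StockDate DATE, ItemsInStock INT)

WHILE (@StartEnd <= @EndDate)
BEGIN
    -- work out the stock level for this single day and keep it
    INSERT INTO #DailyStock (StockDate, ItemsInStock)
    SELECT @StartEnd,
           q.Quantity - ISNULL((SELECT SUM(s.SoldItems)
                                FROM SoldItems s
                                WHERE @StartEnd BETWEEN s.PeriodStart AND s.PeriodEnd), 0)
    FROM Quantity q
    WHERE @StartEnd BETWEEN q.PeriodStart AND q.PeriodEnd

    SET @StartEnd = DATEADD(DAY, 1, @StartEnd)
END

-- #DailyStock then holds one row per day; collapsing equal consecutive days back into
-- periods still needs a grouping step like the ones shown in the other answers.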
Damien,
I am trying to fully understand your solution and test it on a larger scale of data, but I receive the following errors for your code:
Msg 102, Level 15, State 1, Line 20
Incorrect syntax near 'Dates'.
Msg 102, Level 15, State 1, Line 22
Incorrect syntax near ','.
Msg 102, Level 15, State 1, Line 25
Incorrect syntax near ','.
Damien,
Based on your solution I also wanted to get a neat display for StockItems without overlapping dates. How about this solution?
CREATE TABLE [dbo].[SoldItems](
[PeriodStart] [datetime] NOT NULL,
[PeriodEnd] [datetime] NOT NULL,
[SoldItems] [int] NOT NULL
) ON [PRIMARY]
INSERT INTO SoldItems (PeriodStart,PeriodEnd,SoldItems)
SELECT '20100105','20100106',2 union all
SELECT '20100105','20100108',3 union all
SELECT '20100115','20100116',1 union all
SELECT '20100101','20100120',10
;WITH Dates as (
select PeriodStart as DateVal from SoldItems
union
select PeriodEnd from SoldItems
union
select PeriodStart from Quantity
union
select PeriodEnd from Quantity
), Periods as (
select d1.DateVal as StartDate, d2.DateVal as EndDate
from Dates d1
inner join Dates d2 on d1.DateVal < d2.DateVal
left join Dates d3 on d1.DateVal < d3.DateVal and
d3.DateVal < d2.DateVal where d3.DateVal is null
), QuantitiesSold as (
select StartDate,EndDate,SUM(si.SoldItems) as Quantity
from Periods p left join SoldItems si on p.StartDate < si.PeriodEnd and si.PeriodStart < p.EndDate
group by StartDate,EndDate
)
select StartDate,EndDate, qs.Quantity
from QuantitiesSold qs
where qs.quantity is not null