Joining Against Derived Table - tsql

I'm not sure of the terminology here, so let me give an example. I have this query:
SELECT * FROM Events
--------------------
Id Name StartPeriodId EndPeriodId
1 MyEvent 34 32
Here, the PeriodIds specify how long the event lasts; think of them as weeks of the year, defined in another table, if that helps. Notice that the EndPeriodId is not necessarily sequentially after the StartPeriodId. So I could do this:
SELECT * FROM Periods WHERE Id = 34
-----------------------------------
Id StartDate EndDate
34 2009-06-01 2009-08-01
Please do not dwell on this structure, as it's only an example and not how it actually works. What I need to do is come up with this result set:
Id Name PeriodId
1 MyEvent 34
1 MyEvent 33
1 MyEvent 32
In other words, I need to select an event row for each period in which the event exists. I can calculate the Period information (32, 33, 34) easily, but my problem lies in pulling it out in a single query.
This is in SQL Server 2008.

I may be mistaken, and I can't test it because there's no SQL Server available to me right now, but wouldn't that simply be:
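SELECT Events.Id, Events.Name, Periods.Id AS PeriodId
FROM Periods
INNER JOIN Events
-- BETWEEN needs the lower bound first, and StartPeriodId can be greater than EndPeriodId (34 and 32 above)
ON Periods.Id BETWEEN
   CASE WHEN Events.StartPeriodId <= Events.EndPeriodId THEN Events.StartPeriodId ELSE Events.EndPeriodId END
   AND CASE WHEN Events.StartPeriodId <= Events.EndPeriodId THEN Events.EndPeriodId ELSE Events.StartPeriodId END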
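A minimal sketch of this join, run against an in-memory SQLite database through Python's standard sqlite3 module (SQLite stands in for SQL Server here, so this is not tested as T-SQL). It assumes period IDs are consecutive integers, which the question says is not how the real data works; the period IDs 30 to 36 are made up. Because the sample event has StartPeriodId 34 and EndPeriodId 32, the bounds are ordered with CASE before BETWEEN, since BETWEEN 34 AND 32 matches no rows.

```python
import sqlite3

def expand_events(events, period_ids):
    """Return (Id, Name, PeriodId) rows: one per period an event spans."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Events (Id INTEGER, Name TEXT, StartPeriodId INTEGER, EndPeriodId INTEGER)")
    con.execute("CREATE TABLE Periods (Id INTEGER PRIMARY KEY)")
    con.executemany("INSERT INTO Events VALUES (?, ?, ?, ?)", events)
    con.executemany("INSERT INTO Periods VALUES (?)", [(p,) for p in period_ids])
    rows = con.execute("""
        SELECT e.Id, e.Name, p.Id AS PeriodId
        FROM Periods p
        INNER JOIN Events e
          -- put the smaller id first so BETWEEN also works when StartPeriodId > EndPeriodId
          ON p.Id BETWEEN
             CASE WHEN e.StartPeriodId <= e.EndPeriodId THEN e.StartPeriodId ELSE e.EndPeriodId END
             AND
             CASE WHEN e.StartPeriodId <= e.EndPeriodId THEN e.EndPeriodId ELSE e.StartPeriodId END
        ORDER BY e.Id, p.Id DESC
    """).fetchall()
    con.close()
    return rows

# The event from the question; period ids 30..36 are hypothetical.
result = expand_events([(1, "MyEvent", 34, 32)], range(30, 37))
print(result)
```

If the real period IDs are not contiguous, this shortcut does not hold, and the date-based answer below avoids relying on the IDs at all.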

I'm assuming that you want a listing of all periods that fall between the dates of the periods specified by the start/end period IDs.
With CTE_PeriodDate (ID, MaxDate, MinDate)
as (
Select Id, Max(Dates) MaxDate, MinDate=Min(Dates) from (
Select e.ID, StartDate as Dates from Events e
Inner join Periods P on P.ID=StartPeriodID
Union All
Select e.ID, EndDate from Events e
Inner join Periods P on P.ID=StartPeriodID
Union All
Select e.ID, StartDate from Events e
Inner join Periods P on P.ID=EndPeriodID
Union All
Select e.ID, EndDate from Events e
Inner join Periods P on P.ID=EndPeriodID ) as A
group by ID)
Select E.ID, E.Name, P.ID AS PeriodId from CTE_PeriodDate CTE
Inner Join Periods p on
(P.StartDate>=MinDate and P.StartDate<=MaxDate)
and (p.EndDate<=MaxDate and P.EndDate>=MinDate)
Inner Join Events E on E.ID=CTE.ID
It's not the best way to do this, but it does work.
It gets the minimum and maximum dates of the periods specified on the event.
Using these two dates, it joins to the Periods table to find the periods that fall inside that range.
Kris
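The same min/max-date idea, sketched with Python's sqlite3 module and SQLite in place of SQL Server. The weekly periods are hypothetical, and their IDs are deliberately out of date order to mimic the note that EndPeriodId does not necessarily follow StartPeriodId. The period filter is written as StartDate >= MinDate AND EndDate <= MaxDate, which is equivalent to the four-part condition above as long as each period's StartDate is not after its EndDate.

```python
import sqlite3

# Hypothetical weekly periods; ids are deliberately not in date order.
PERIODS = [
    (31, "2009-05-25", "2009-05-31"),
    (34, "2009-06-01", "2009-06-07"),
    (33, "2009-06-08", "2009-06-14"),
    (32, "2009-06-15", "2009-06-21"),
    (35, "2009-06-22", "2009-06-28"),
]

def periods_for_events(events):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Events (Id INTEGER, Name TEXT, StartPeriodId INTEGER, EndPeriodId INTEGER)")
    con.execute("CREATE TABLE Periods (Id INTEGER, StartDate TEXT, EndDate TEXT)")
    con.executemany("INSERT INTO Events VALUES (?, ?, ?, ?)", events)
    con.executemany("INSERT INTO Periods VALUES (?, ?, ?)", PERIODS)
    rows = con.execute("""
        WITH Bounds AS (
            -- start and end dates of both the start period and the end period
            SELECT e.Id, p.StartDate AS D
            FROM Events e JOIN Periods p ON p.Id IN (e.StartPeriodId, e.EndPeriodId)
            UNION ALL
            SELECT e.Id, p.EndDate
            FROM Events e JOIN Periods p ON p.Id IN (e.StartPeriodId, e.EndPeriodId)
        ),
        EventRange AS (
            -- ISO 8601 text sorts chronologically, so MIN/MAX give the date range
            SELECT Id, MIN(D) AS MinDate, MAX(D) AS MaxDate FROM Bounds GROUP BY Id
        )
        SELECT e.Id, e.Name, p.Id AS PeriodId
        FROM EventRange r
        JOIN Periods p ON p.StartDate >= r.MinDate AND p.EndDate <= r.MaxDate
        JOIN Events e ON e.Id = r.Id
        ORDER BY e.Id, p.StartDate
    """).fetchall()
    con.close()
    return rows

result = periods_for_events([(1, "MyEvent", 34, 32)])
print(result)
```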

Related

subquery problem - need to get avg of a sum

I have 2 tables
sales table
weekly sales, store, date
store table
store, type, size
My sales table has multiple years, multiple stores and multiple types. I'm trying to get the average sales by sqft for each store type per year. I have a subquery that shows the sales by sqft for each store, but I'm having trouble rolling it up into my main query to get the average by type.
Does anything jump out at you in my final query?
SELECT
date_part('year', sales.date) AS year,
stores.type,
AVG(sales_by_sqft)
FROM
(SELECT
SUM((sales.weekly_sales)/stores.size) AS sales_by_sqft
FROM SALES
INNER JOIN stores ON sales.store = stores.store
GROUP BY sales.store) AS sq
FROM sales
INNER JOIN stores ON sales.store = stores.store
WHERE date_part('year', date) = 2012
GROUP BY year, stores.type;
I'm getting a syntax error on the second FROM statement.
I figured it out. AVG doesn't work on the money type. Once I changed that column's data type to integer and made the per-store query the subquery of a single FROM clause, it all fell into place:
SELECT
year,
type,
ROUND(AVG(sales_by_sqft),2)AS avg_sales_by_sqft
FROM
(SELECT
date_part('year', sales.date) AS year,
stores.type,
sales.store,
stores.size,
SUM(sales.weekly_sales) AS total_sales,
SUM(sales.weekly_sales)/ AVG(stores.size) AS sales_by_sqft
FROM sales
INNER JOIN stores ON sales.store = stores.store
GROUP BY year, stores.type, sales.store, stores.size) AS sq
GROUP BY 1,2
ORDER BY 1,3 DESC;
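A sketch of the two-level aggregation in the final query: the inner query produces one sales-per-sqft figure per store and year, and the outer query averages those figures per year and store type. It runs on SQLite via Python's sqlite3 module, so strftime('%Y', ...) replaces date_part('year', ...), and the columns are REAL rather than money; the stores and sales rows are made up.

```python
import sqlite3

def avg_sales_per_sqft(stores, sales):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE stores (store INTEGER, type TEXT, size REAL)")
    con.execute("CREATE TABLE sales (store INTEGER, date TEXT, weekly_sales REAL)")
    con.executemany("INSERT INTO stores VALUES (?, ?, ?)", stores)
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales)
    rows = con.execute("""
        SELECT year, type, ROUND(AVG(sales_by_sqft), 2) AS avg_sales_by_sqft
        FROM (
            -- inner level: one sales-per-sqft figure per store and year
            SELECT strftime('%Y', s.date) AS year, st.type AS type, s.store,
                   SUM(s.weekly_sales) / st.size AS sales_by_sqft
            FROM sales s
            JOIN stores st ON st.store = s.store
            GROUP BY strftime('%Y', s.date), st.type, s.store, st.size
        ) AS sq
        -- outer level: average the per-store figures by year and store type
        GROUP BY year, type
        ORDER BY year, avg_sales_by_sqft DESC
    """).fetchall()
    con.close()
    return rows

stores = [(1, "A", 100.0), (2, "A", 200.0), (3, "B", 50.0)]
sales = [(1, "2012-01-06", 1000.0), (1, "2012-01-13", 1000.0), (2, "2012-01-06", 3000.0),
         (3, "2012-01-06", 500.0), (3, "2013-01-04", 250.0)]
result = avg_sales_per_sqft(stores, sales)
print(result)
```

Store 1 contributes 2000 / 100 = 20 and store 2 contributes 3000 / 200 = 15, so type A in 2012 averages the two per-store figures rather than the individual weekly rows.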

Count entities by date that fall between start date and end date (postgres)

Using Postgres, I'm trying to get a count of active entities by day over the past year.
For each entity I have a name, a start date, and an end date. Assume the schema below:
Table x
Entity|Start_date|End_date
x | 2018-01-07 |2018-01-23
y | 2018-01-08 |2018-04-01
z | 2018-01-22 |2018-01-24
What I'm trying to output:
Date|Count
2018-01-01|0
...
2018-01-07|1
...
2018-01-22|3
2018-01-23|3
2018-01-24|2
2018-01-25|1
...
2018-08-15|0
I've created a date table but don't know what to join it on. I feel like I have to create another table and then aggregate it, but I'm not sure what that table would be. If I don't need to create an additional table, great.
Any help would be appreciated!
Edit: FWIW, I've researched this, but I'm not quite sure what I need to research here, i.e. which function or join I'm missing.
Edit 2: added an example.
You can do that by left joining the year's dates to the entities, like this:
select t.d, count(e.entity)
from generate_series(
make_date(date_part('year',current_date)::integer,1,1),
make_date(date_part('year',current_date)::integer,12,31),
'1 day'::interval) t(d)
left join entities e
on t.d between e.start_date and e.end_date
group by t.d
order by t.d;
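A sketch of this left-join approach on SQLite (through Python's sqlite3 module), using the three entities from the question. SQLite has no built-in generate_series, so a recursive CTE produces the calendar days instead; dates are stored as ISO 8601 text, which compares correctly with BETWEEN. count(e.entity) counts only matched rows, so days with no active entity come out as 0.

```python
import sqlite3

ENTITIES = [("x", "2018-01-07", "2018-01-23"),
            ("y", "2018-01-08", "2018-04-01"),
            ("z", "2018-01-22", "2018-01-24")]

def active_counts(first_day, last_day):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE entities (entity TEXT, start_date TEXT, end_date TEXT)")
    con.executemany("INSERT INTO entities VALUES (?, ?, ?)", ENTITIES)
    rows = con.execute("""
        WITH RECURSIVE days(d) AS (
            SELECT date(?)                                    -- first calendar day
            UNION ALL
            SELECT date(d, '+1 day') FROM days WHERE d < date(?)
        )
        SELECT days.d, COUNT(e.entity)                        -- NULLs from the left join count as 0
        FROM days
        LEFT JOIN entities e ON days.d BETWEEN e.start_date AND e.end_date
        GROUP BY days.d
        ORDER BY days.d
    """, (first_day, last_day)).fetchall()
    con.close()
    return dict(rows)

counts = active_counts("2018-01-01", "2018-12-31")
print(counts["2018-01-22"], counts["2018-01-24"], counts["2018-08-15"])
```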
Well, you need to count the days for every Start_date/End_date range in the "x" table (this happens in the left-joined subquery).
Then you just need to create a list of all the days in the year and left join those days to the counted days from the "x" table.
I hope this is what you want and that my explanation makes sense.
WITH year_days as (
select * from generate_series('2018-01-01'::date, '2018-12-31'::date, '1 day'::interval) as d
),
x(Entity, Start_date, End_date) AS (
values
('x','2018-01-07'::date, '2018-01-23'::date),
('y','2018-01-08'::date, '2018-04-01'::date),
('z','2018-01-22'::date, '2018-01-24'::date)
)
select year_days.d, coalesce(t.cnt, 0) as cnt from year_days
left join (
select generate_series(Start_date, End_date, '1 day'::interval) as d , count(*) as cnt from x group by d
) t
on year_days.d = t.d
order by year_days.d
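The expand-then-count idea of this answer, restated as a plain Python sketch (not SQL): each entity's range is expanded into individual days and counted, then every day of the year is looked up in those counts with a default of 0, which plays the role of the left join plus coalesce. The helper names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

def days_between(start, end):
    """Yield every date from start to end inclusive, like generate_series(..., '1 day')."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)

def active_counts(entities, year):
    # Subquery step: expand each (name, start, end) into its days and count entities per day.
    per_day = Counter(day for _name, start, end in entities for day in days_between(start, end))
    # Left join + coalesce step: every day of the year; a Counter returns 0 for missing days.
    return {day: per_day[day] for day in days_between(date(year, 1, 1), date(year, 12, 31))}

entities = [("x", date(2018, 1, 7), date(2018, 1, 23)),
            ("y", date(2018, 1, 8), date(2018, 4, 1)),
            ("z", date(2018, 1, 22), date(2018, 1, 24))]
counts = active_counts(entities, 2018)
print(counts[date(2018, 1, 22)], counts[date(2018, 1, 24)], counts[date(2018, 8, 15)])
```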

Join 2 tables where two sets of numbers overlap within the joining columns

I need to join 2 tables with postgresql where two sets of numbers overlap within the joining columns.
The image below explains it: I need to take a table of congresspeople and their party affiliation and join it with a table of districts (based on when the districts were drawn or redrawn). The result should be the rows showing the date ranges during which the district, state and congressperson were all the same. Wherever a district's dates are known but the congressperson's dates are unknown, the known district dates are filled in for that portion and the congressperson dates are left blank, and vice versa.
For example, for the first rows in the tables:
Congressperson Table:
Arkansas, District 5, Republican: 1940-1945
District Table:
Arkansas, District 5: 1942-1963
Results in the following combinations (Start_Comb and End_Comb):
1940-1942
1942-1945
And for the combination where the district is unknown (1940-1942), the district dates are left blank.
The final set of date columns (gray) is simply the combinations that are only for the district (this is super easy).
In case you're wondering what this is for, I am creating an animated map, kind of like this, but for congressional districts over time:
https://www.youtube.com/watch?v=vQDyn04vtf8
I'll end up with a map in which every known district has a known or unknown party.
I haven't gotten very far; this is what I did:
SELECT *
FROM congressperson
JOIN districts
ON Start_Dist BETWEEN Start_Cong AND End_Cong
WHERE district.A = district.B
OR End_Dist BETWEEN Start_Cong AND Start_Dist
OR Start_Cong = Start_Dist OR End_Cong= End_Dist;
The idea is to first make a list of the unique dates from both tables. Then, for each such date, find the next date (here the dates are grouped by state and district, so the next date is looked up within the same state and district).
So now we have the list of ranges we are looking for. Now we can join the other tables (for this particular task, with left joins) on the required conditions:
select
r.state,
c.start_cong,
c.end_cong,
c.party,
coalesce(c.district, d.district) district,
d.start_dist,
d.end_dist,
start_comb,
end_comb,
case when d.district is not null then start_comb end final_start,
case when d.district is not null then end_comb end final_end
from (
with dates as (
select
*
from (
SELECT
c.state,
c.district,
start_cong date
FROM congressperson c
union
SELECT
c.state,
c.district,
end_cong
FROM congressperson c
union
SELECT
d.state,
d.district,
start_dist
FROM district d
union
SELECT
d.state,
d.district,
end_dist
FROM district d
) DATES
group by
state,
district,
date
order by
state,
district,
date
)
select
dates.state,
dates.district,
dates.date start_comb,
(select
d.date
from
dates d
where
d.state = dates.state and
d.district = dates.district and
d.date > dates.date
order by
d.date
limit 1
) end_comb
from
dates) r
left join congressperson c on
c.state = r.state and
c.district = r.district and
start_comb between c.start_cong and c.end_cong and
end_comb between c.start_cong and c.end_cong
left join district d on
d.state = r.state and
d.district = r.district and
start_comb between d.start_dist and d.end_dist and
end_comb between d.start_dist and d.end_dist
where
end_comb is not null
order by
r.state, coalesce(c.district, d.district), start_comb, end_comb, start_cong, end_cong
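A plain Python sketch of the same three steps (not the answer's SQL): collect the unique boundary dates per state and district from both tables, pair each date with the next one, then attach any congressperson row and district row that covers the whole pair, leaving None where nothing matches (the left joins). The tuple layouts are assumptions, and years stand in for dates as in the question's Arkansas example.

```python
def elementary_ranges(congress, districts):
    """congress: (state, district, party, start, end); districts: (state, district, start, end)."""
    # Step 1: unique boundary dates per (state, district), taken from both tables.
    bounds = {}
    for state, dist, _party, start, end in congress:
        bounds.setdefault((state, dist), set()).update((start, end))
    for state, dist, start, end in districts:
        bounds.setdefault((state, dist), set()).update((start, end))

    rows = []
    for key in sorted(bounds):
        dates = sorted(bounds[key])
        # Step 2: pair each date with the next one; the last date has no successor and is dropped.
        for start_comb, end_comb in zip(dates, dates[1:]):
            # Step 3: "left join" every row whose range covers the whole pair, or None if none does.
            parties = [p for s, d, p, a, b in congress
                       if (s, d) == key and a <= start_comb and end_comb <= b] or [None]
            spans = [(a, b) for s, d, a, b in districts
                     if (s, d) == key and a <= start_comb and end_comb <= b] or [None]
            for party in parties:
                for span in spans:
                    rows.append((key[0], key[1], start_comb, end_comb, party, span))
    return rows

congress = [("Arkansas", 5, "Republican", 1940, 1945)]
districts = [("Arkansas", 5, 1942, 1963)]
result = elementary_ranges(congress, districts)
for row in result:
    print(row)
```

For the Arkansas example this yields 1940-1942 with the party but no district, 1942-1945 with both, and 1945-1963 with the district only.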

How does one avoid Join placing a constraint on aggregate function?

[using postgres]
So the background to my scenario is:
I'm trying to get the average expenses by age where the user is active
cost = 40
Only 2 of the 33 year olds have purchased something
I have 34 members who are 33 years old and active (whether or not they made a payment is irrelevant to this count)
with this in mind money spent per age = 40 / 34 = 1.18
what I am getting right now is = 40 / 2 = 20
I understand that it's constrained by the two users who made a purchase
So where did I get all of this?
select date_part('year', age(birthday)) as age,
avg(cost)
from person
inner join payment on person.person_id = payment.person_id
inner join product on payment.product_id = product.product_id
where
date_part('year', age(birthday))= 33 and user_state = 'active'
group by age
Unfortunately, when using an aggregate function (in this example avg()), it seems avg() is constrained to the result of the inner join. (I've tried a left join to keep access to all users, but it didn't seem to work, since I still got the undesired result of 20.) Is there a way to avoid this? In other words, can I make the avg() call specific to my person table rather than to the result of the join?
If it matters, this is how I am retrieving total sum.
select sum(cost)
from person
inner join payment on person.person_id = payment.person_id
inner join product on payment.product_id = product.product_id
where
date_part('year', age(birthday))= 33
and
user_state = 'active'
= 40
The obvious approach is to count the people I want and compute the sum separately, but I'm trying to avoid going from one query to another.
avg() skips NULLs, so coalesce those NULL values to zero and, of course, use left joins:
select
date_part('year', age(birthday)) as age,
avg(coalesce(cost,0))
from
person
left join
payment on ...
left join
product on ...
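A sketch on SQLite (via Python's sqlite3) comparing the inner-join average, the left-join-plus-coalesce average from this answer, and a per-person average computed as SUM(COALESCE(cost, 0)) / COUNT(DISTINCT person_id). It simplifies the schema: an integer age column stands in for date_part('year', age(birthday)), all 34 people are active and 33, and the payments are made up. The per-person form matters when someone has several payments, because each payment is then its own joined row and the plain left-join average divides by rows, not people.

```python
import sqlite3

LEFT_JOINS = ("LEFT JOIN payment pay ON pay.person_id = p.person_id "
              "LEFT JOIN product pr ON pr.product_id = pay.product_id ")
INNER_JOINS = LEFT_JOINS.replace("LEFT", "INNER")
WHERE = "WHERE p.age = 33 AND p.user_state = 'active'"

def average_spend(payments, n_people=34):
    """payments: (person_id, cost) pairs. People 1..n_people are all 33 and active."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, age INTEGER, user_state TEXT)")
    con.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, cost REAL)")
    con.execute("CREATE TABLE payment (person_id INTEGER, product_id INTEGER)")
    con.executemany("INSERT INTO person VALUES (?, 33, 'active')",
                    [(i,) for i in range(1, n_people + 1)])
    # One hypothetical product per payment, so each payment carries its own cost.
    for product_id, (person_id, cost) in enumerate(payments, start=1):
        con.execute("INSERT INTO product VALUES (?, ?)", (product_id, cost))
        con.execute("INSERT INTO payment VALUES (?, ?)", (person_id, product_id))

    # Inner join: only buyers survive, so the average is per purchase.
    inner = con.execute("SELECT AVG(pr.cost) FROM person p " + INNER_JOINS + WHERE).fetchone()[0]
    # Left join + coalesce: non-buyers count as 0, one row per payment (or per non-buyer).
    left = con.execute("SELECT AVG(COALESCE(pr.cost, 0)) FROM person p "
                       + LEFT_JOINS + WHERE).fetchone()[0]
    # Total spend divided by distinct people, regardless of how many payments each made.
    per_person = con.execute("SELECT SUM(COALESCE(pr.cost, 0)) / COUNT(DISTINCT p.person_id) "
                             "FROM person p " + LEFT_JOINS + WHERE).fetchone()[0]
    con.close()
    return inner, left, per_person

print(average_spend([(1, 20.0), (2, 20.0)]))               # two buyers, one payment each
print(average_spend([(1, 10.0), (1, 10.0), (2, 20.0)]))    # person 1 pays twice
```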

Grouping consecutive dates in PostgreSQL

I have two tables that I need to combine, because some dates are found in table A and not in table B, and vice versa. My desired result is that overlapping ranges, and ranges on consecutive days, be combined.
I'm using PostgreSQL.
Table A
id startdate enddate
--------------------------
101 12/28/2013 12/31/2013
Table B
id startdate enddate
--------------------------
101 12/15/2013 12/15/2013
101 12/16/2013 12/16/2013
101 12/28/2013 12/28/2013
101 12/29/2013 12/31/2013
Desired Result
id startdate enddate
-------------------------
101 12/15/2013 12/16/2013
101 12/28/2013 12/31/2013
Right. I have a query that I think works. It certainly works on the sample records you provided. It uses a recursive CTE.
First, you need to merge the two tables. Next, use a recursive CTE to get the sequences of overlapping dates. Finally, get the start and end dates, and join back to the "merged" table to get the id.
with recursive allrecords as -- this merges the input tables. Add a unique row identifier
(
select *, row_number() over (ORDER BY startdate) as rowid from
(select * from table1
UNION
select * from table2) a
),
path as ( -- the recursive CTE. This gets the sequences
select rowid as parent,rowid,startdate,enddate from allrecords a
union
select p.parent,b.rowid,b.startdate,b.enddate from allrecords b join path p on (p.enddate + interval '1 day')>=b.startdate and p.startdate <= b.startdate
)
SELECT id,g.startdate,g.enddate FROM -- outer query to get the id
-- inner query to get the start and end of each sequence
(select parent,min(startdate) as startdate, max(enddate) as enddate from
(
select *, row_number() OVER (partition by rowid order by parent,startdate) as row_number from path
) a
where row_number = 1 -- We only want the first occurrence of each record
group by parent)g
INNER JOIN allrecords a on a.rowid = parent
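The recursive CTE above can be cross-checked against a simpler technique, swapped in here as a plain Python sketch: sort all rows from both tables by id and start date, then sweep once, extending the current range whenever the next range starts no later than the day after the current end. Unlike the recursive join above, which does not compare ids, this sketch only merges ranges that share an id.

```python
from datetime import date, timedelta

def merge_ranges(*tables):
    """Merge (id, start, end) rows from several tables; overlapping or adjacent ranges become one."""
    rows = sorted(set(row for table in tables for row in table))  # by id, then start, then end
    merged = []
    for id_, start, end in rows:
        # Same id and starts no later than the day after the current range ends: extend it.
        if merged and merged[-1][0] == id_ and start <= merged[-1][2] + timedelta(days=1):
            last = merged[-1]
            merged[-1] = (id_, last[1], max(last[2], end))
        else:
            merged.append((id_, start, end))
    return merged

table_a = [(101, date(2013, 12, 28), date(2013, 12, 31))]
table_b = [(101, date(2013, 12, 15), date(2013, 12, 15)),
           (101, date(2013, 12, 16), date(2013, 12, 16)),
           (101, date(2013, 12, 28), date(2013, 12, 28)),
           (101, date(2013, 12, 29), date(2013, 12, 31))]
result = merge_ranges(table_a, table_b)
print(result)
```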
The fragment below does what you intend (but it will probably be very slow). The problem is that detecting (non)overlapping date ranges is impossible with the standard range operators, since a range could be split into two parts.
So, my code does the following:
1. Split the date ranges from table_A into atomic records, with one date per record.
2. Do the same for table_B.
3. Full outer join these two sets (we are only interested in A_not_in_B and B_not_in_A), remembering which of the left/right outer join wings each record came from.
4. Re-aggregate the resulting records into date ranges.
-- EXPLAIN ANALYZE
--
WITH RECURSIVE ranges AS (
-- Chop up the a-table into atomic date units
WITH ar AS (
SELECT generate_series(a.startdate,a.enddate , '1day'::interval)::date AS thedate
, 'A'::text AS which
, a.id
FROM a
)
-- Same for the b-table
, br AS (
SELECT generate_series(b.startdate,b.enddate, '1day'::interval)::date AS thedate
, 'B'::text AS which
, b.id
FROM b
)
-- combine the two sets, retaining a_not_in_b plus b_not_in_a
, moments AS (
SELECT COALESCE(ar.id,br.id) AS id
, COALESCE(ar.which, br.which) AS which
, COALESCE(ar.thedate, br.thedate) AS thedate
FROM ar
FULL JOIN br ON br.id = ar.id AND br.thedate = ar.thedate
WHERE ar.id IS NULL OR br.id IS NULL
)
-- use a recursive CTE to re-aggregate the atomic moments into ranges
SELECT m0.id, m0.which
, m0.thedate AS startdate
, m0.thedate AS enddate
FROM moments m0
WHERE NOT EXISTS ( SELECT * FROM moments nx WHERE nx.id = m0.id AND nx.which = m0.which
AND nx.thedate = m0.thedate -1
)
UNION ALL
SELECT rr.id, rr.which
, rr.startdate AS startdate
, m1.thedate AS enddate
FROM ranges rr
JOIN moments m1 ON m1.id = rr.id AND m1.which = rr.which AND m1.thedate = rr.enddate +1
)
SELECT * FROM ranges ra
WHERE NOT EXISTS (SELECT * FROM ranges nx
-- suppress partial subassemblies
WHERE nx.id = ra.id AND nx.which = ra.which
AND nx.startdate = ra.startdate
AND nx.enddate > ra.enddate
)
;
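A plain Python sketch of the same steps (not the answer's SQL): explode both tables into one record per id and day, keep only the days that appear on exactly one side (the FULL JOIN ... WHERE ar.id IS NULL OR br.id IS NULL), then glue consecutive days with the same id and side back into ranges. On the question's sample data this yields only 12/15/2013 to 12/16/2013 from table B, because 12/28 to 12/31 appears in both tables and is therefore filtered out.

```python
from datetime import date, timedelta

def explode(rows):
    """One (id, day) record per day of each (id, start, end) range."""
    days = set()
    for id_, start, end in rows:
        day = start
        while day <= end:
            days.add((id_, day))
            day += timedelta(days=1)
    return days

def one_sided_ranges(table_a, table_b):
    a_days, b_days = explode(table_a), explode(table_b)
    # Full join keeping only unmatched rows: days on one side only, tagged with that side.
    moments = sorted([(i, "A", d) for i, d in a_days - b_days] +
                     [(i, "B", d) for i, d in b_days - a_days])
    ranges = []
    for id_, which, day in moments:
        # Extend the previous range if same id and side, and this day directly follows it.
        if ranges and ranges[-1][:2] == (id_, which) and ranges[-1][3] + timedelta(days=1) == day:
            ranges[-1] = (id_, which, ranges[-1][2], day)
        else:
            ranges.append((id_, which, day, day))
    return ranges

table_a = [(101, date(2013, 12, 28), date(2013, 12, 31))]
table_b = [(101, date(2013, 12, 15), date(2013, 12, 15)),
           (101, date(2013, 12, 16), date(2013, 12, 16)),
           (101, date(2013, 12, 28), date(2013, 12, 28)),
           (101, date(2013, 12, 29), date(2013, 12, 31))]
result = one_sided_ranges(table_a, table_b)
print(result)
```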