postgres complicated query - postgresql

I wonder whether it is possible to write such a query. The problem is that I have a table holding numbers per date.
Let's say I have 3 columns: Date, Value, Good/Bad.
E.g.:
2014-03-03 100 Good
2014-03-03 15 Bad
2014-03-04 120 Good
2014-03-04 10 Bad
And I want to select the difference Good - Bad per date:
2014-03-03 85
2014-03-04 110
Is it possible? I have been thinking about it for a while and don't have an idea yet. It would be fairly simple if the Good and Bad values were in separate tables.

The trick is to join your table back to itself, as shown below. myTable as A reads only the Good rows and myTable as B reads only the Bad rows. Those rows are then joined into a single row based on date.
SQL Fiddle Demo
select
a.date
,a.count as Good_count
,b.count as bad_count
,a.count-b.count as diff_count
from myTable as a
inner join myTable as b
on a.date = b.date and b.type = 'Bad'
where a.type = 'Good'
Output returned:
DATE GOOD_COUNT BAD_COUNT DIFF_COUNT
March, 03 2014 00:00:00+0000 100 15 85
March, 04 2014 00:00:00+0000 120 10 110
Another approach would be to use GROUP BY with conditional aggregation instead of the inner join:
select
a.date
,sum(case when type = 'Good' then a.count else 0 end) as Good_count
,sum(case when type = 'Bad' then a.count else 0 end) as Bad_count
,sum(case when type = 'Good' then a.count else 0 end) -
sum(case when type = 'Bad' then a.count else 0 end) as Diff_count
from myTable as a
group by a.date
order by a.date
Both approaches produce the same result.
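If you want to sanity-check the conditional-aggregation version without a Postgres instance, the same logic runs under SQLite via Python's built-in sqlite3 module (a minimal sketch; the table and column names mirror the answer above):

```python
import sqlite3

# In-memory table mirroring the question's sample data.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE myTable (date TEXT, count INTEGER, type TEXT)")
con.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [("2014-03-03", 100, "Good"), ("2014-03-03", 15, "Bad"),
     ("2014-03-04", 120, "Good"), ("2014-03-04", 10, "Bad")],
)

# Conditional aggregation: one pass over the table, no self-join needed.
rows = con.execute("""
    SELECT date,
           SUM(CASE WHEN type = 'Good' THEN count ELSE 0 END) -
           SUM(CASE WHEN type = 'Bad'  THEN count ELSE 0 END) AS diff_count
    FROM myTable
    GROUP BY date
    ORDER BY date
""").fetchall()

print(rows)  # [('2014-03-03', 85), ('2014-03-04', 110)]
```

One practical difference: the self-join drops a date entirely if it has only Good or only Bad rows, while the GROUP BY version keeps it (treating the missing side as 0).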

Related

Window Function For Consecutive Dates

I want to know how many users were active for 3 consecutive days on any given day.
For example, on 2022-11-03 one user (user_id = 111) was active 3 days in a row. Could someone please advise what kind of window function would be needed?
This is my dataset:
user_id  active_date
111      2022-11-01
111      2022-11-02
111      2022-11-03
222      2022-11-01
333      2022-11-01
333      2022-11-09
333      2022-11-10
333      2022-11-11
If you are confident there are no duplicate user_id + active_date rows in the source data, then you can use two LAG functions like this:
SELECT user_id,
active_date,
CASE WHEN active_date - INTERVAL '1 day' = LAG(active_date, 1) OVER (PARTITION BY user_id ORDER BY active_date)
AND active_date - INTERVAL '2 days' = LAG(active_date, 2) OVER (PARTITION BY user_id ORDER BY active_date)
THEN 'Yes'
ELSE 'No'
END AS rowof3
FROM your_table
ORDER BY user_id, active_date;
If there might be duplicates, use this FROM clause instead (Postgres requires an alias on the subquery):
FROM (SELECT DISTINCT user_id, active_date::DATE FROM your_table) AS t
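A quick way to check the two-LAG logic is to run it against the sample data. The sketch below uses Python's sqlite3 (SQLite ≥ 3.25 supports LAG), with SQLite's date(x, '-1 day') standing in for Postgres interval arithmetic:

```python
import sqlite3

# The question's sample data, loaded into an in-memory SQLite table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE activity (user_id INTEGER, active_date TEXT)")
con.executemany("INSERT INTO activity VALUES (?, ?)", [
    (111, "2022-11-01"), (111, "2022-11-02"), (111, "2022-11-03"),
    (222, "2022-11-01"),
    (333, "2022-11-01"), (333, "2022-11-09"), (333, "2022-11-10"), (333, "2022-11-11"),
])

# Two LAGs per user: the row is the third day of a run only if the
# previous two rows are exactly one and two days earlier.
rows = con.execute("""
    SELECT user_id, active_date,
           CASE WHEN date(active_date, '-1 day') =
                     LAG(active_date, 1) OVER (PARTITION BY user_id ORDER BY active_date)
                 AND date(active_date, '-2 days') =
                     LAG(active_date, 2) OVER (PARTITION BY user_id ORDER BY active_date)
                THEN 'Yes' ELSE 'No'
           END AS rowof3
    FROM activity
    ORDER BY user_id, active_date
""").fetchall()

# Only the third day of an unbroken run is flagged:
print([(u, d) for u, d, flag in rows if flag == 'Yes'])
# [(111, '2022-11-03'), (333, '2022-11-11')]
```

Note that LAG returns NULL at the start of each partition, the comparison against NULL is not true, and the CASE falls through to 'No', so the first two rows of every user are handled correctly without extra checks.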

Pivoting results from CTE in Postgres

I have a large SQL statement (PostgreSQL version 11) with many CTEs. I want to use the results from an intermediate CTE to create a pivoted set of results and join it with another CTE.
Below is a small part of my query; the CTE previous_months_actual_sales is the one I need to pivot.
,last_24 as
(
SELECT l_24m::DATE + (interval '1' month * generate_series(0,24)) as last_24m
FROM last_24_month_start LIMIT 24
)
,previous_months_actual_sales as
(
SELECT TO_CHAR(created_at,'YYYY-MM') as dates
,b.code,SUM(quantity) as qty
FROM base b
INNER JOIN products_sold ps ON ps.code=b.code
WHERE TO_CHAR(created_at,'YYYY-MM')
IN(SELECT TO_CHAR(last_24m,'YYYY-MM') FROM last_24)
GROUP BY b.code,TO_CHAR(created_at,'YYYY-MM')
)
SELECT * FROM previous_months_actual_sales
The result of the previous_months_actual_sales CTE is shown below:
dates code qty
"2018-04" "0009" 23
"2018-05" "0009" 77
"2018-06" "0008" 44
"2018-07" "0008" 1
"2018-08" "0009" 89
The expected output based on the above result is,
code     2018-04   2018-05   2018-06   2018-07   2018-08
"0009"   23        77                            89
"0008"                       44        1
Is there a way to achieve this?
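When the set of month columns is fixed and known up front, conditional aggregation gets you there: one MAX(CASE ...) per output column, grouped by code. A sketch (not from the original thread) using SQLite in place of Postgres 11, with the sample rows above:

```python
import sqlite3

# The intermediate CTE's output, materialized as a plain table for the demo.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE previous_months_actual_sales (dates TEXT, code TEXT, qty INTEGER)")
con.executemany("INSERT INTO previous_months_actual_sales VALUES (?, ?, ?)", [
    ("2018-04", "0009", 23), ("2018-05", "0009", 77), ("2018-06", "0008", 44),
    ("2018-07", "0008", 1), ("2018-08", "0009", 89),
])

# One MAX(CASE ...) per month; months with no sales for a code come back NULL.
rows = con.execute("""
    SELECT code,
           MAX(CASE WHEN dates = '2018-04' THEN qty END) AS "2018-04",
           MAX(CASE WHEN dates = '2018-05' THEN qty END) AS "2018-05",
           MAX(CASE WHEN dates = '2018-06' THEN qty END) AS "2018-06",
           MAX(CASE WHEN dates = '2018-07' THEN qty END) AS "2018-07",
           MAX(CASE WHEN dates = '2018-08' THEN qty END) AS "2018-08"
    FROM previous_months_actual_sales
    GROUP BY code
    ORDER BY code DESC
""").fetchall()

print(rows)
# [('0009', 23, 77, None, None, 89), ('0008', None, None, 44, 1, None)]
```

The column list must be written out by hand (or generated by the client), because SQL requires the output columns of a query to be fixed at parse time; for a dynamic month list, Postgres offers crosstab from the tablefunc extension instead.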

Using Lag() function to retrieve values across dates

I am trying to use the LAG() and LEAD() functions in postgres to retrieve values from other rows/records in a table and I am running into some difficulty. The functionality works as intended as long as the LAG or LEAD function is only looking at dates within the same month (i.e. June 2nd can look back to June 1st, but when I try to look back to May 31st, I retrieve a NULL value).
Here's what the table looks like
_date count_daily_active_users count_new_users day1_users users_arriving_today_who_returned_tomorrow day_retained_users
5/27/2013 1742 335 266 207 0.617910448
5/28/2013 1768 241 207 146 0.605809129
5/29/2013 1860 272 146 161 0.591911765
5/30/2013 2596 841 161 499 0.59334126
5/31/2013 2837 703 499 NULL NULL
6/1/2013 12881 10372 0 5446 0.525067489
6/2/2013 14340 6584 5446 2781 0.422387606
6/3/2013 12222 3690 2781 1494 0.404878049
6/4/2013 25861 17254 1494 8912 0.516517909
From that table you can see that on May 31st when I try to 'look ahead' to June 1st to retrieve the number of users who arrived for the first time on May 31st and then returned again on June 1st I get a NULL value. This happens at every month boundary and it happens regardless of the number of days I try to 'look ahead'. So if I look ahead two days, then I'd have NULLs for May 30th and May 31st.
Here's the SQL I wrote
SELECT
timestamp_session::date AS _date
, COUNT(DISTINCT dim_player_key) AS count_daily_active_users
, COUNT(DISTINCT CASE WHEN days_since_birth = 0 THEN dim_player_key ELSE NULL END) AS count_new_users
, COUNT(DISTINCT CASE WHEN days_since_birth != 0 THEN dim_player_key ELSE NULL END) AS count_returning_users
, COUNT(DISTINCT CASE WHEN days_since_birth = 1 THEN dim_player_key ELSE NULL END) AS day1_users -- note: the function is a LAG function instead of a LEAD function because of the sort order
, (NULLIF(LAG(COUNT(DISTINCT CASE WHEN days_since_birth = 0 THEN dim_player_key ELSE NULL END), 1) OVER (order by _date)::float, 0)) as AA
, (NULLIF(LAG(COUNT(DISTINCT CASE WHEN days_since_birth = 1 THEN dim_player_key ELSE NULL END), 1) OVER (order by _date)::float, 0)) as AB
, (NULLIF(LAG(COUNT(DISTINCT CASE WHEN days_since_birth = 0 THEN dim_player_key ELSE NULL END), 0) OVER (order by _date)::float, 0)) as BB
, (NULLIF(LAG(COUNT(DISTINCT CASE WHEN days_since_birth = 1 THEN dim_player_key ELSE NULL END), 0) OVER (order by _date)::float, 0)) as BA
FROM ( SELECT sessions_table.account_id AS dim_player_key,
sessions_table.session_id AS dim_session_key,
sessions_table.title_id AS dim_title_id,
sessions_table.appid AS dim_app_id,
sessions_table.loginip AS login_ip,
sessions_table.logindate AS timestamp_session,
birthdate_table.birthdate AS timestamp_birthdate,
EXTRACT(EPOCH FROM (sessions_table.logindate - birthdate_table.birthdate)) AS count_age_in_seconds,
(date_part('day', sessions_table.logindate)- date_part('day', birthdate_table.birthdate)) AS days_since_birth
FROM
dataset.tablename1 AS sessions_table
JOIN (
SELECT
account_id,
MIN(logindate) AS birthdate
FROM
dataset.tablename1
GROUP BY
account_id )
-- call this sub-table the birthdate_table
birthdate_table ON
sessions_table.account_id = birthdate_table.account_id
-- call this table the outer_sessions_table
) AS outer_sessions_table
GROUP BY
_date
ORDER BY
_date ASC
I think that what I probably need to do is add an additional field in the inner select that reports the date as an integer value, something like the EPOCH time for that date at midnight. But when I tried that (adding a per-day epoch time) it changed all of the values in the output table to 1, and I don't understand why.
Can anyone help me out?
Thanks,
Brad
The problem was with the days_since_birth calculation. I was using
(date_part('day',
sessions_table.logindate)- date_part('day',
birthdate_table.birthdate)) AS days_since_birth
as though it were subtracting the full dates to give me the difference in days, but it just extracts the day of the month from each date and subtracts those, so at the month rollover it returns -27, -29 or -30 (depending on the month). Wrapping it in ABS only hides the sign; the real fix is to subtract the dates themselves, e.g. (sessions_table.logindate::date - birthdate_table.birthdate::date) AS days_since_birth, which in Postgres yields the number of elapsed days as an integer.
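The difference between the two calculations is easy to see in isolation. A small Python sketch, with one day elapsed across a month boundary as in the sample table:

```python
from datetime import date

# One day elapsed across a month boundary, as in the sample data.
login = date(2013, 6, 1)
birth = date(2013, 5, 31)

# What date_part('day', ...) subtraction computes:
# day-of-month minus day-of-month.
day_of_month_diff = login.day - birth.day
print(day_of_month_diff)  # -30, even though only 1 day elapsed

# What the query needs: the difference between the dates themselves
# (in Postgres: logindate::date - birthdate::date).
actual_days = (login - birth).days
print(actual_days)  # 1
```

Note that ABS(-30) is 30, not 1, which is why subtracting the dates directly, rather than taking the absolute value of the day-of-month difference, is the safe fix.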

SQL Not returning Expected Records

This has to be a simple error on my part. I have a table with permits (applicants have one permit) - about ~600 expired last season and ~900 the season before. I need to generate a mailing list of unique applicants that had permits in last two seasons.
SELECT COUNT(*) FROM Backyard_Burn WHERE YEAR(Expiration_Date)= 2014
SELECT COUNT(*) FROM Backyard_Burn WHERE YEAR(Expiration_Date)= 2013
SELECT COUNT(*) FROM Backyard_Burn WHERE YEAR(Expiration_Date)= 2013
AND Applicant_Mail_ID NOT IN(
SELECT Applicant_Mail_ID
FROM Backyard_Burn
WHERE YEAR(Expiration_Date)= 2014)
Which returns : 618, 923, and 0
Why 0 and not a number somewhere near 923 - 618 assuming most are repeat applicants?
NOT IN can be dangerous. The problem is probably that Applicant_Mail_ID takes on NULL values. You can fix this readily with:
SELECT COUNT(*)
FROM Backyard_Burn
WHERE YEAR(Expiration_Date) = 2013 AND
Applicant_Mail_ID NOT IN (SELECT Applicant_Mail_ID
FROM Backyard_Burn
WHERE YEAR(Expiration_Date) = 2014 AND Applicant_Mail_ID IS NOT NULL
);
If any of those values are NULL, then NOT IN can only return FALSE or NULL, so the condition never allows any records through.
For this reason, I think it is better to use NOT EXISTS, which has the semantics you expect when some of the values might be NULL:
SELECT COUNT(*)
FROM Backyard_Burn bb
WHERE YEAR(Expiration_Date) = 2013 AND
NOT EXISTS (SELECT 1
FROM Backyard_Burn bb2
WHERE YEAR(bb2.Expiration_Date) = 2014 AND
bb2.Applicant_Mail_ID = bb.Applicant_Mail_ID
);
EDIT:
By the way, an alternative way of formulating this is to use group by and having:
select Applicant_Mail_ID
from Backyard_Burn
group by Applicant_Mail_ID
having sum(case when year(Expiration_Date) = 2013 then 1 else 0 end) > 0 and
sum(case when year(Expiration_Date) = 2014 then 1 else 0 end) > 0;
This avoids the problem with NULLs and makes it easy to add new conditions, such as applicants who did not have any records in 2012.
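The NULL pitfall is easy to reproduce. A minimal sketch using Python's sqlite3, with made-up table and column names (a plain yr column in place of YEAR(Expiration_Date), since SQLite lacks that function):

```python
import sqlite3

# Applicant 1 only in 2013, applicant 2 in both years,
# plus one 2014 row with a NULL applicant id.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE permits (applicant_id INTEGER, yr INTEGER)")
con.executemany("INSERT INTO permits VALUES (?, ?)", [
    (1, 2013), (2, 2013), (2, 2014), (None, 2014),
])

# NOT IN against a list that contains NULL matches nothing:
bad = con.execute("""
    SELECT COUNT(*) FROM permits
    WHERE yr = 2013
      AND applicant_id NOT IN (SELECT applicant_id FROM permits WHERE yr = 2014)
""").fetchone()[0]

# Filtering the NULLs out of the subquery restores the expected behaviour:
good = con.execute("""
    SELECT COUNT(*) FROM permits
    WHERE yr = 2013
      AND applicant_id NOT IN (SELECT applicant_id FROM permits
                               WHERE yr = 2014 AND applicant_id IS NOT NULL)
""").fetchone()[0]

print(bad, good)  # 0 1
```

The single NULL in the 2014 list is enough to make every `x NOT IN (...)` test evaluate to NULL rather than TRUE, which is exactly the 0-row symptom in the question.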
you need applicants from the last two seasons, so use a greater-than-or-equal comparison on the date
it is better to compare against a full date than to apply YEAR() to the column, since a comparison on the bare column can use an index
to get the unique applicants, use DISTINCT
Which results in:
select count(distinct Applicant_Mail_ID)
from Backyard_Burn
where Expiration_Date >= '20130101';

t-sql subquery and groupby

Below is my query; I select the job count for each full name this way.
SELECT COUNT(sy.FullName) [Count Job],
sy.FullName [FullName],
MIN(CAST(i.vrp_notificationdate AS DATE)) [Oldest Date]
FROM BusinessUnit AS b
INNER JOIN SystemUser AS sy
ON b.BusinessUnitId = sy.BusinessUnitId
INNER JOIN Incident AS i
ON i.OwnerId = sy.SystemUserId
GROUP BY sy.FullName
This query show this table
---------------------------------
Count Job FullName Oldest Date
10 a 2011-10-11
20 B 2011-10-11
55 C 2011-10-11
---------------------------------
But I want to produce the table below instead, for example:
--------------------------------------------------------------
Count Job FullName Oldest Date Open Job Close Job
10 A 2011-10-11 5 5
20 B 2011-10-11 13 7
55 C 2011-10-11 48 7
------------------------------------------------------------
I have a status code column on my Incident table; if the status code is 5, the job is closed. When I add the status code to the GROUP BY, I get the table below, which is not what I want to show:
---------------------------------
Count Job FullName Oldest Date
10 a 2011-10-11
13 B 2011-10-11
48 C 2011-10-11
7 B 2011-10-11
7 C 2011-10-11
---------------------------------
When I use UNION in my T-SQL, I get the error "all queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists."
How can I solve this? Any suggestions?
Thanks.
How about using CASE and SUM?
SELECT COUNT(sy.FullName) [Count Job],
sy.FullName [FullName],
MIN(CAST(i.vrp_notificationdate AS DATE)) [Oldest Date],
SUM(CASE i.status
WHEN 5 THEN 0
ELSE 1
END) [Open Jobs],
SUM(CASE i.status
WHEN 5 THEN 1
ELSE 0
END) [Closed Jobs]
FROM BusinessUnit AS b
INNER JOIN SystemUser AS sy
ON b.BusinessUnitId = sy.BusinessUnitId
INNER JOIN Incident AS i
ON i.OwnerId = sy.SystemUserId
GROUP BY sy.FullName