How to create a pivot table in Postgresql using case when? - postgresql

I want to create a pivot table using PostgreSQL. I could accomplish this in SQLite, and I thought the logic would be similar, but that doesn't seem to be the case.
Here's the sample table:
create table df(
campaign varchar(50),
date date not null,
revenue integer not null
);
insert into df(campaign,date,revenue) values('A','2019-01-01',10000);
insert into df(campaign,date,revenue) values('B','2019-01-02',7000);
insert into df(campaign,date,revenue) values('A','2018-01-01',5000);
insert into df(campaign,date,revenue) values('B','2018-01-01',3500);
Here's my SQLite code to transform the tidy data into a pivot table:
select
sum(case when strftime('%Y', date) = '2019' then revenue else 0 end) as '2019',
sum(case when strftime('%Y', date) = '2018' then revenue else 0 end) as '2018',
campaign
from df
group by campaign
The result looks like this:
2018 2019 campaign
5000 10000 A
3500 7000 B
I tried writing similar code for Postgres, using just the year 2019:
select
sum(case when extract('year' from date) = '2019' then revenue else 0 end) as '2019',
campaign
from df
group by campaign
Somehow the code doesn't work, and I don't understand what's wrong.
Query Error: error: syntax error at or near "'2019'"
What am I missing here?
db-fiddle link:
https://www.db-fiddle.com/f/f1WjMAAxwSPRvB8BrxECN7/0

The function strftime() is used to extract various parts of a date in SQLite, but it is not supported by PostgreSQL.
Use date_part():
select campaign,
sum(case when date_part('year', date) = '2019' then revenue else 0 end) as "2019",
sum(case when date_part('year', date) = '2018' then revenue else 0 end) as "2018"
from df
group by campaign
Or use PostgreSQL's FILTER clause:
select campaign,
sum(revenue) filter (where date_part('year', date) = '2019') as "2019",
sum(revenue) filter (where date_part('year', date) = '2018') as "2018"
from df
group by campaign
Also, don't use single quotes for table/column names; that is what triggers the syntax error at or near "'2019'".
SQLite allows it, but PostgreSQL does not.
It accepts only double quotes for identifiers, which is the SQL standard.
See the demo.
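As a quick check of the double-quote advice, here is a minimal sketch that runs the pivot with double-quoted aliases. It uses Python's built-in sqlite3 module only so that it is self-contained, which means the year is extracted with strftime(); on PostgreSQL you would swap in date_part() or extract() as shown above. The table and rows are the ones from the question.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table df (campaign text, date text not null, revenue integer not null)"
)
conn.executemany(
    "insert into df (campaign, date, revenue) values (?, ?, ?)",
    [
        ("A", "2019-01-01", 10000),
        ("B", "2019-01-02", 7000),
        ("A", "2018-01-01", 5000),
        ("B", "2018-01-01", 3500),
    ],
)

# Conditional aggregation: one sum(case ...) per pivoted year.
# The aliases are double-quoted identifiers, not single-quoted strings.
# strftime() is the SQLite spelling; PostgreSQL needs date_part()/extract().
pivot_sql = """
select campaign,
       sum(case when strftime('%Y', date) = '2019' then revenue else 0 end) as "2019",
       sum(case when strftime('%Y', date) = '2018' then revenue else 0 end) as "2018"
from df
group by campaign
order by campaign
"""
cursor = conn.execute(pivot_sql)
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
```

Double-quoted aliases are accepted by both engines, so this is the portable spelling.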

Related

How to fix more than one columns returned error in PostgreSQL

I was trying to write a query that fetches records from a table and groups by some columns, but the subquery returns a "more than one" error.
When I run the parts independently, I get the results I expect, but combining them is a problem.
select
year as Season,
cal_scheme as Scheme,
(case when cal_scheme='Mt.Elgon' then '1000'
when cal_scheme='West Nile' then '2000'
when cal_scheme='Rwenzori' then '1500' else '' end) as Target,
min(today::date) as startdatetime,
max(today::date)-min(today::date) as No_of_days,
(select count(id) as id from
kcl_internal_edit where new_farmer='' or new_farmer is null
group by year, cal_scheme)as growers
from kcl_internal_edit
group by year, cal_scheme
The expected result is to be as follows:
Season Scheme Target Startdatetime No_of_days growers
2019 Mt.Elgon 1000 28-10-2019 5 5
2019 West Nile 2000 29-05-2019 10 1
2018 Mt.Elgon 1500 29-08-2018 207 3
Your query should look like this:
select
year as Season,
cal_scheme as Scheme,
(case when cal_scheme='Mt.Elgon' then '1000'
when cal_scheme='West Nile' then '2000'
when cal_scheme='Rwenzori' then '1500' else '' end) as Target,
min(today::date) as startdatetime,
max(today::date)-min(today::date) as No_of_days,
count(id) FILTER (WHERE new_farmer='' or new_farmer is null) as growers
from kcl_internal_edit
group by year, cal_scheme;
There is no need for a subselect!
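To see why the conditional aggregate replaces the subselect, here is a small sketch using Python's built-in sqlite3 module. The rows are made up for illustration (they are not the asker's data), and the condition is written as count(case ...) rather than FILTER so that it also runs on older SQLite builds; the two forms count the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table kcl_internal_edit "
    "(id integer, year integer, cal_scheme text, new_farmer text)"
)
# Hypothetical sample rows: new_farmer is '' or NULL for the rows to be counted.
conn.executemany(
    "insert into kcl_internal_edit values (?, ?, ?, ?)",
    [
        (1, 2019, "Mt.Elgon", ""),
        (2, 2019, "Mt.Elgon", None),
        (3, 2019, "Mt.Elgon", "yes"),
        (4, 2019, "West Nile", None),
        (5, 2018, "Mt.Elgon", "yes"),
    ],
)

# The case expression yields id only for matching rows and NULL otherwise;
# count() skips NULLs, so each group gets exactly one number.
growers = conn.execute(
    """
    select year, cal_scheme,
           count(case when new_farmer = '' or new_farmer is null then id end) as growers
    from kcl_internal_edit
    group by year, cal_scheme
    order by year, cal_scheme
    """
).fetchall()
print(growers)
```

The original subselect had its own GROUP BY, so it produced one row per group instead of a single value, which is what the "more than one" error complains about.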

t-sql select max value between two columns, or col one when col two is null

This is not easy for me to describe in the title (please forgive me), but here is my problem:
Suppose you have the following table:
CREATE TABLE Subscriptions (product char(3), start_date datetime, end_date datetime);
INSERT INTO Subscriptions
VALUES('ABC', '2015-01-28 00:00:00', '2016-02-15 00:00:00'),
('ABC', '2016-02-04 12:08:00', NULL),
('DEF', '2013-04-15 00:00:00', '2013-06-10 00:00:00'),
('GHI', '2013-01-11 00:00:00', '2013-04-08 00:00:00');
Now I want to find out for how long a subscription has been either active or passive. I thus need to select the newest end_dates grouped by product, BUT if end_date is null, then I want start_date.
So - I have:
product start_date end_date
ABC 28-01-2015 00:00 15-02-2016 00:00
ABC 04-02-2016 12:08 NULL
DEF 15-04-2013 00:00 10-06-2013 00:00
GHI 11-01-2013 00:00 08-04-2013 00:00
What I want to find in my query:
product relevant_date
ABC 04-02-2016 12:08
DEF 10-06-2013 00:00
GHI 08-04-2013 00:00
I have tried using a union, and that seems to work, but it is very slow, and my question is: is there a more efficient way to solve this (I am using MS SQL Server 2012):
SELECT [product]
,MAX([start_date]) AS start_date
,NULL AS [end_date]
,MAX([start_date]) AS relevant_date
FROM Subscriptions
where end_date IS NULL
GROUP BY product
UNION
SELECT [product]
,NULL
,MAX([end_date])
,MAX([end_date])
FROM Subscriptions
where end_date IS not NULL and product not in (SELECT product FROM Subscriptions
where end_date IS NULL)
GROUP BY product
(If you have a suggestion for another title for my question, I am also all ears!)
For version 2012 or higher you can use a combination of distinct, first_value and isnull, like this:
SELECT DISTINCT
product,
FIRST_VALUE(ISNULL(end_date,start_date))
OVER(PARTITION BY product
ORDER BY ISNULL(end_date, '9999-12-31') DESC) AS EndDate
FROM Subscriptions
Results:
product EndDate
ABC 04.02.2016 12:08:00
DEF 10.06.2013 00:00:00
GHI 08.04.2013 00:00:00
For versions between 2008 and 2012, you can use a cte with row_number to get the same effect:
;WITH CTE AS
(
SELECT product,
ISNULL(end_date,start_date) As relevant_date,
ROW_NUMBER() OVER(PARTITION BY product ORDER BY ISNULL(end_date, '9999-12-31') DESC) As rn
FROM Subscriptions
)
SELECT product,
relevant_date
FROM CTE
WHERE rn = 1
See a live demo on rextester.
If the second ABC row is showing the incorrect start_date, then this query should work:
SELECT S.product
, relevant_date = MAX(ISNULL(S.end_date,S.start_date))
FROM dbo.Subscriptions S
GROUP BY S.product
This should do it:
select s1.product, MAX(case when useStartDate=1 then s1.start_date else s1.end_date end) AS SubscriptionDate
from Subscriptions s1
join (select s2s1.product, max(case when s2s1.end_date is null then 1 else 0 end) AS useStartDate from Subscriptions s2s1 group by s2s1.product) s2 on s1.product=s2.product
group by s1.product
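The answers above do not all apply the same rule, which is easy to miss. As a rough sketch (Python's built-in sqlite3 with the question's rows, and COALESCE standing in for T-SQL's ISNULL), the plain MAX(ISNULL(end_date, start_date)) query picks the latest date overall, while the join-based query lets an open subscription (end_date is NULL) win:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Subscriptions (product text, start_date text, end_date text)"
)
conn.executemany(
    "insert into Subscriptions values (?, ?, ?)",
    [
        ("ABC", "2015-01-28 00:00:00", "2016-02-15 00:00:00"),
        ("ABC", "2016-02-04 12:08:00", None),
        ("DEF", "2013-04-15 00:00:00", "2013-06-10 00:00:00"),
        ("GHI", "2013-01-11 00:00:00", "2013-04-08 00:00:00"),
    ],
)

# Rule 1: latest of coalesce(end_date, start_date) per product.
latest_overall = conn.execute(
    """
    select product, max(coalesce(end_date, start_date)) as relevant_date
    from Subscriptions
    group by product
    order by product
    """
).fetchall()

# Rule 2: if any row of the product has a NULL end_date, use start_date,
# otherwise use end_date (same logic as the join-based answer).
open_row_wins = conn.execute(
    """
    select s1.product,
           max(case when s2.use_start_date = 1 then s1.start_date
                    else s1.end_date end) as relevant_date
    from Subscriptions s1
    join (select product,
                 max(case when end_date is null then 1 else 0 end) as use_start_date
          from Subscriptions
          group by product) s2 on s1.product = s2.product
    group by s1.product
    order by s1.product
    """
).fetchall()

print(latest_overall)
print(open_row_wins)
```

With this data, only the second rule returns 2016-02-04 12:08 for ABC, which is the value the question asks for; the first returns 2016-02-15.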

Monthly retention in Amazon redshift

I'm trying to calculate monthly retention rate in Amazon Redshift and have come up with the following query:
Query 1
SELECT EXTRACT(year FROM activity.created_at) AS Year,
EXTRACT(month FROM activity.created_at) AS Month,
COUNT(DISTINCT activity.member_id) AS active_users,
COUNT(DISTINCT future_activity.member_id) AS retained_users,
COUNT(DISTINCT future_activity.member_id) / COUNT(DISTINCT activity.member_id)::float AS retention
FROM ads.fbs_page_view_staging activity
LEFT JOIN ads.fbs_page_view_staging AS future_activity
ON activity.mongo_id = future_activity.mongo_id
AND datediff ('month',activity.created_at,future_activity.created_at) = 1
GROUP BY Year,
Month
ORDER BY Year,
Month
For some reason this query returns zero retained_users and zero retention. I'd appreciate any help regarding why this may be happening or maybe a completely different query for monthly retention would work.
I modified the query as per another SO post and here it goes:
Query 2
WITH t AS (
SELECT member_id
,date_trunc('month', created_at) AS month
,count(*) AS item_transactions
,lag(date_trunc('month', created_at)) OVER (PARTITION BY member_id
ORDER BY date_trunc('month', created_at))
= date_trunc('month', created_at) - interval '1 month'
OR NULL AS repeat_transaction
FROM ads.fbs_page_view_staging
WHERE created_at >= '2016-01-01'::date
AND created_at < '2016-04-01'::date -- time range of interest.
GROUP BY 1, 2
)
SELECT month
,sum(item_transactions) AS num_trans
,count(*) AS num_buyers
,count(repeat_transaction) AS repeat_buyers
,round(
CASE WHEN sum(item_transactions) > 0
THEN count(repeat_transaction) / sum(item_transactions) * 100
ELSE 0
END, 2) AS buyer_retention
FROM t
GROUP BY 1
ORDER BY 1;
This query gives me the following error:
An error occurred when executing the SQL command:
WITH t AS (
SELECT member_id
,date_trunc('month', created_at) AS month
,count(*) AS item_transactions
,lag(date_trunc('m...
[Amazon](500310) Invalid operation: Interval values with month or year parts are not supported
Details:
-----------------------------------------------
error: Interval values with month or year parts are not supported
code: 8001
context: interval months: "1"
query: 616822
location: cg_constmanager.cpp:145
process: padbmaster [pid=15116]
-----------------------------------------------;
I have a feeling that Query 2 would fare better than Query 1, so I'd prefer to fix the error on that.
Any help would be much appreciated.
Query 1 is syntactically fine, but the join condition defeats it. You are self-joining the table (ads.fbs_page_view_staging) on mongo_id and comparing the same column (created_at). Assuming mongo_id is unique, each row joins only to itself, so datediff('month', ...) always returns 0 and the condition datediff ('month',activity.created_at,future_activity.created_at) = 1 is always false. I tried a similar query, see below.
-- Count distinct events of join_col_id that have lapsed for one month.
SELECT count(distinct E.join_col_id) dist_ct
FROM public.fact_events E
JOIN public.dim_table Z
ON E.join_col_id = Z.join_col_id
WHERE datediff('month', event_time, sysdate) = 1;
-- 2771654 -- dist_ct
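If the diagnosis above is right, one possible fix is to self-join on member_id instead of mongo_id, so that a member's later activity can actually match. The sketch below is only an illustration of that idea: it uses Python's built-in sqlite3, a hypothetical page_views table with made-up rows, and plain year*12+month arithmetic in place of Redshift's datediff('month', ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table page_views (member_id text, created_at text)")
# Hypothetical activity: m1 is active in Jan and Feb, m2 only in Jan,
# m3 in Feb and Mar.
conn.executemany(
    "insert into page_views values (?, ?)",
    [
        ("m1", "2016-01-10"),
        ("m1", "2016-02-05"),
        ("m2", "2016-01-15"),
        ("m3", "2016-02-20"),
        ("m3", "2016-03-01"),
    ],
)

# Join each activity row to the same member's activity exactly one calendar
# month later; unmatched rows keep NULLs and are ignored by count(distinct).
retention = conn.execute(
    """
    select strftime('%Y-%m', a.created_at) as activity_month,
           count(distinct a.member_id) as active_users,
           count(distinct f.member_id) as retained_users,
           count(distinct f.member_id) * 1.0 / count(distinct a.member_id) as retention
    from page_views a
    left join page_views f
      on a.member_id = f.member_id
     and (cast(strftime('%Y', f.created_at) as integer) * 12
          + cast(strftime('%m', f.created_at) as integer))
       - (cast(strftime('%Y', a.created_at) as integer) * 12
          + cast(strftime('%m', a.created_at) as integer)) = 1
    group by activity_month
    order by activity_month
    """
).fetchall()
print(retention)
```

Whether member_id is the right join key for your table is an assumption on my part; the point is only that the key must repeat across months for the one-month condition to ever be true.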

Update Redshift table from query

I'm trying to update a table in Redshift from query:
update mr_usage_au au
inner join(select mr.UserId,
date(mr.ActionDate) as ActionDate,
count(case when mr.EventId in (32) then mr.UserId end) as Moods,
count(case when mr.EventId in (33) then mr.UserId end) as Activities,
sum(case when mr.EventId in (10) then mr.Duration end) as Duration
from mr_session_log mr
where mr.EventTime >= current_date - interval '1 days' and mr.EventTime < current_date
Group By mr.UserId,
date(mr.ActionDate)) slog on slog.UserId=au.UserId
and slog.ActionDate=au.Date
set au.Moods = slog.Moods,
au.Activities=slog.Activities,
au.Durarion=slog.Duration
But I receive the following error:
ERROR: syntax error at or near "au".
This is completely invalid syntax for Redshift (or Postgres). Reminds me of SQL Server ...
Should work like this (at least on current Postgres):
UPDATE mr_usage_au
SET Moods = slog.Moods
, Activities = slog.Activities
, Durarion = slog.Duration
FROM (
select UserId
, ActionDate::date
, count(CASE WHEN EventId = 32 THEN UserId END) AS Moods
, count(CASE WHEN EventId = 33 THEN UserId END) AS Activities
, sum(CASE WHEN EventId = 10 THEN Duration END) AS Duration
FROM mr_session_log
WHERE EventTime >= current_date - 1 -- just subtract integer from a date
AND EventTime < current_date
GROUP BY UserId, ActionDate::date
) slog
WHERE slog.UserId = mr_usage_au.UserId
AND slog.ActionDate = mr_usage_au.Date;
This is generally the case for Postgres and Redshift:
Use a FROM clause to join in additional tables.
You cannot table-qualify target columns in the SET clause.
Also, Redshift was forked from PostgreSQL 8.0.2, which is very long ago. Only some later updates to Postgres were applied.
For instance, Postgres 8.0 did not yet allow a table alias in an UPDATE statement, which is the reason behind the error you see.
I simplified some other details.

2 T-SQL queries that should be the same are not

I have the two queries below, which should produce the same result as far as I can tell, but they actually produce vastly different numbers. Why is "BETWEEN" on dates not the same as specifying the month and year of those dates?
What could be causing this?
SELECT [Account]
, SUM([Amount]) AS [Amount]
FROM [Table]
WHERE [Account] = 'Specific Account'
AND Month([Date]) = 5
AND Year([Date]) = 2015
GROUP BY [Account]
Sum Result: -1,500,000
SELECT [Account]
, SUM([Amount]) AS [Amount]
FROM [Table]
WHERE [Account] = 'Specific Account'
AND [Date] BETWEEN '2015-05-01' AND '2015-05-31'
GROUP BY [Account]
Sum Result: 350,000
I need the first one to be correct because I need to group the results by Month and Year, which would be cumbersome using the second query.
Query that I need ultimately:
SELECT [Account]
, Month([Date]) AS [Month]
, Year([Date]) AS [Year]
, SUM([Amount]) AS [Amount]
FROM [Table]
GROUP BY [Account]
, Month([Date])
, Year([Date])
[Date] BETWEEN '2015-05-01' AND '2015-05-31'
will only include rows on the 31st where the time component is midnight and omit the rest of the day.
You should forget about BETWEEN here: there is no valid string literal you can put on the right-hand side that works correctly for all of datetime, smalldatetime, and datetime2(0) through datetime2(7). Use a half-open range instead:
WHERE [Date] >= '2015-05-01' AND [Date] < '2015-06-01'
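Here is a small sketch of the difference, using Python's built-in sqlite3 and a hypothetical ledger table with made-up amounts. SQLite compares these timestamps as text, so the bounds are written out as midnight explicitly; that mirrors what SQL Server does implicitly when it converts '2015-05-31' to a datetime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table ledger (account text, date text, amount integer)")
# Hypothetical rows: two on May 31 (midnight and mid-afternoon), one on June 1.
conn.executemany(
    "insert into ledger values (?, ?, ?)",
    [
        ("X", "2015-05-01 00:00:00", 100),
        ("X", "2015-05-31 00:00:00", 200),
        ("X", "2015-05-31 14:30:00", 400),
        ("X", "2015-06-01 00:00:00", 800),
    ],
)

# BETWEEN with an upper bound of midnight on the 31st: the 14:30 row is lost.
between_total = conn.execute(
    "select sum(amount) from ledger "
    "where date between '2015-05-01 00:00:00' and '2015-05-31 00:00:00'"
).fetchone()[0]

# Half-open range: everything from May 1 up to, but not including, June 1.
half_open_total = conn.execute(
    "select sum(amount) from ledger "
    "where date >= '2015-05-01 00:00:00' and date < '2015-06-01 00:00:00'"
).fetchone()[0]

print(between_total, half_open_total)
```

The half-open form picks up the afternoon row on the 31st and still excludes June 1, regardless of the precision of the column.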
Try below for your first case, where you are getting more rows.
AND (Month([Date]) = 5 AND Year([Date]) = 2015)
instead of
AND Month([Date]) = 5 AND Year([Date]) = 2015
==Update==
I would suggest using the CONVERT function. Revise your query like this:
CONVERT(varchar(10),DATE_COLUMN,112) between '20150501' and '20150531'