Postgres - convert null result column to zero in crosstab query

I have a crosstab query that is working fine. It is based on customers and their total transactions, split by months of the year.
The only issue is that when there is no data for a column (i.e. no activity for a month), I get a null value, which I would like to convert to zero.
I have tried coalesce on the 'amount' field, but that does not work.
If anyone has any pointers to help I would be very grateful.
The query is:
select *
from crosstab(
$ct$
SELECT sa.id,
company.name,
to_char(sat.transaction_date, 'YYYY-MM') AS my,
COALESCE(sat.amount,0) AS amnt
FROM sales_account_transactions sat
JOIN sales_account sa ON sa.id = sat.sales_account
JOIN company ON sa.company = company.id
WHERE sat.financial_company = 1
AND sat.transaction_date BETWEEN '2018-01-01' AND '2018-03-31'
AND sat.reversed_by = 0
AND sat.original_id = 0
GROUP BY sa.id, company.name, my, amnt
ORDER BY company.name, to_char(sat.transaction_date, 'YYYY-MM');
$ct$,
$$VALUES
('2018-01'), ('2018-02'), ('2018-03')
$$
)
as ct(id int, name text,
"Jan 2018" int, "Feb 2018" int, "Mar 2018" int);

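crosstab() returns NULL for any month that has no source row at all, so a COALESCE inside the source query never sees those cells. Apply it to the output columns: instead of select *, use select coalesce("Jan 2018", 0) as "Jan 2018", ...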
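To see why the fix has to go on the outer select, here is a rough Python sketch of what a pivot does. The rows, customer names and the pivot() helper are made up for illustration and are not part of the original query. A cell with no source row is simply absent, so the default can only be applied when the output columns are built.

```python
# Hypothetical (customer, month, amount) rows standing in for the source query's output.
rows = [
    ("Acme", "2018-01", 100),
    ("Acme", "2018-03", 250),
    ("Bolt", "2018-02", 75),
]
months = ["2018-01", "2018-02", "2018-03"]


def pivot(rows, months, fill=None):
    """Build one record per customer; `fill` replaces cells that had no source row."""
    cells = {}
    for name, month, amount in rows:
        cells.setdefault(name, {})[month] = amount
    return {
        name: [by_month.get(month, fill) for month in months]
        for name, by_month in cells.items()
    }


raw = pivot(rows, months)            # comparable to SELECT * FROM crosstab(...)
fixed = pivot(rows, months, fill=0)  # comparable to COALESCE(col, 0) on each output column
print(raw)
print(fixed)
```

No amount of cleaning the individual amounts in `rows` would change `raw`, because the empty cells never existed as rows in the first place.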

Related

How to create a pivot table in Postgresql using case when?

I want to create a pivot table using postgresql. I could accomplish this using SQLite, and I thought the logic would be similar, but it doesn't seem to be the case.
Here's the sample table:
create table df(
campaign varchar(50),
date date not null,
revenue integer not null
);
insert into df(campaign,date,revenue) values('A','2019-01-01',10000);
insert into df(campaign,date,revenue) values('B','2019-01-02',7000);
insert into df(campaign,date,revenue) values('A','2018-01-01',5000);
insert into df(campaign,date,revenue) values('B','2018-01-01',3500);
Here's my SQLite code to transform the tidy data into a pivot table:
select
sum(case when strftime('%Y', date) = '2019' then revenue else 0 end) as '2019',
sum(case when strftime('%Y', date) = '2018' then revenue else 0 end) as '2018',
campaign
from df
group by campaign
the result would be like this:
2018 2019 campaign
5000 10000 A
3500 7000 B
I tried writing similar code in Postgres, using only the year 2019:
select
sum(case when extract('year' from date) = '2019' then revenue else 0 end) as '2019',
campaign
from df
group by campaign
Somehow the code doesn't work, and I don't understand what's wrong.
Query Error: error: syntax error at or near "'2019'"
What am I missing here?
db-fiddle link:
https://www.db-fiddle.com/f/f1WjMAAxwSPRvB8BrxECN7/0

The function strftime() is used to extract various parts of a date in SQLite, but it is not supported by PostgreSQL.
Use date_part():
select campaign,
sum(case when date_part('year', date) = '2019' then revenue else 0 end) as "2019",
sum(case when date_part('year', date) = '2018' then revenue else 0 end) as "2018"
from df
group by campaign
Or use PostgreSQL's FILTER clause:
select campaign,
sum(revenue) filter (where date_part('year', date) = '2019') as "2019",
sum(revenue) filter (where date_part('year', date) = '2018') as "2018"
from df
group by campaign
Also, don't use single quotes for table or column names.
SQLite allows it, but PostgreSQL does not.
It accepts only double quotes for identifiers, which is the SQL standard; the single-quoted alias is what triggers the syntax error at '2019'.
See the demo.
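As a quick check of the quoting point, here is a sketch run against SQLite through Python's sqlite3 module, chosen only because it needs no server. It is not a PostgreSQL session, and it keeps SQLite's strftime() where PostgreSQL would use date_part(). The part that carries over is the double-quoted aliases, which both engines accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table df(campaign varchar(50), date date not null, revenue integer not null);
    insert into df(campaign, date, revenue) values ('A', '2019-01-01', 10000);
    insert into df(campaign, date, revenue) values ('B', '2019-01-02', 7000);
    insert into df(campaign, date, revenue) values ('A', '2018-01-01', 5000);
    insert into df(campaign, date, revenue) values ('B', '2018-01-01', 3500);
    """
)

# Conditional aggregation with double-quoted column aliases.
query = """
    select campaign,
           sum(case when strftime('%Y', date) = '2019' then revenue else 0 end) as "2019",
           sum(case when strftime('%Y', date) = '2018' then revenue else 0 end) as "2018"
    from df
    group by campaign
    order by campaign
"""
cursor = conn.execute(query)
columns = [col[0] for col in cursor.description]
result = cursor.fetchall()
print(columns)
print(result)
```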

DATE ADD function in PostgreSQL

I currently have the following code in Microsoft SQL Server to get users that viewed on two days in a row.
WITH userviewvideo (date, user_id) AS (
SELECT DISTINCT date, user_id
FROM clickstream_videos
WHERE event_name ='video_play'
and user_id IS NOT NULL
)
SELECT currentday.date AS date,
COUNT(currentday.user_id) AS users_view_videos,
COUNT(nextday.user_id) AS users_view_next_day
FROM userviewvideo currentday
LEFT JOIN userviewvideo nextday
ON currentday.user_id = nextday.user_id
AND DATEADD(DAY, 1, currentday.date) = nextday.date
GROUP BY currentday.date
I am trying to get the DATEADD function to work in PostgreSQL but I've been unable to figure out how to get this to work. Any suggestions?

I don't think PostgreSQL really has a DATEADD function. Instead, just do:
+ INTERVAL '1 day'
SQL Server:
-- Add 1 day to the current date (November 21, 2012)
SELECT DATEADD(day, 1, GETDATE()); -- 2012-11-22 17:22:01.423
PostgreSQL:
-- Add 1 day to the current date (November 21, 2012)
SELECT CURRENT_DATE + INTERVAL '1 day'; -- 2012-11-22 00:00:00 (date + interval gives a timestamp)
SELECT CURRENT_DATE + 1; -- 2012-11-22 (date + integer stays a date)
http://www.sqlines.com/postgresql/how-to/dateadd
EDIT:
If the number of days is dynamic (for example, stored in a column), you can build a string and cast it to an interval:
+ (col_days || ' days')::interval
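For comparison, the same arithmetic can be sketched in Python, where the date versus timestamp distinction is explicit. The fixed date and the col_days value are illustrative assumptions, not taken from the question.

```python
from datetime import date, datetime, timedelta

today = date(2012, 11, 21)  # fixed so the results are reproducible

# Adding whole days to a date keeps it a date, like CURRENT_DATE + 1.
next_day = today + timedelta(days=1)

# A per-row day count (hypothetical), like (col_days || ' days')::interval.
col_days = 3
later = today + timedelta(days=col_days)

# Adding an interval to a date in PostgreSQL yields a timestamp at midnight.
as_timestamp = datetime.combine(today, datetime.min.time()) + timedelta(days=1)

print(next_day, later, as_timestamp)
```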

You can use date + 1 to do the equivalent of dateadd(), but I do not think that your query does what you want to do.
You should use window functions, instead:
with plays as (
select distinct date, user_id
from clickstream_videos
where event_name = 'video_play'
and user_id is not null
), nextdaywatch as (
select date, user_id,
case
when lead(date) over (partition by user_id
order by date) = date + 1 then 1
else 0
end as user_view_next_day
from plays
)
select date,
count(*) as users_view_videos,
sum(user_view_next_day) as users_view_next_day
from nextdaywatch
group by date
order by date;
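The lead() logic above can be sketched in plain Python to make it easier to check by hand. The sample plays and the next_day_counts() helper are made up for illustration. For each user, the dates are sorted and each one is compared with the next one in the list.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical distinct (date, user_id) pairs, like the "plays" CTE.
plays = {
    (date(2024, 1, 1), "u1"),
    (date(2024, 1, 2), "u1"),
    (date(2024, 1, 1), "u2"),
    (date(2024, 1, 3), "u2"),
    (date(2024, 1, 2), "u3"),
}


def next_day_counts(plays):
    """Return {date: (viewers, viewers who also watched the following day)}."""
    dates_by_user = defaultdict(set)
    for day, user in plays:
        dates_by_user[user].add(day)

    viewers = defaultdict(int)
    returned = defaultdict(int)
    for days in dates_by_user.values():
        ordered = sorted(days)
        following = ordered[1:] + [None]  # lead(date) within this user's partition
        for day, nxt in zip(ordered, following):
            viewers[day] += 1
            if nxt == day + timedelta(days=1):
                returned[day] += 1
    return {day: (viewers[day], returned[day]) for day in sorted(viewers)}


counts = next_day_counts(plays)
print(counts)
```

Here u1 watches on two consecutive days, while u2 skips a day, so only u1 counts as a next-day viewer for January 1.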

Capture first character of last group of 1s in a binary series

I have a series something like this:
Month J F M A M J J A S O N D
Status 1 0 0 1 0 1 0 0 1 1 1 1
Using t-SQL, I am trying to capture the month corresponding to the first 1 in the last group of 1s, i.e., September in this example.
Here is the code I'm using:
IF OBJECT_ID('tempdb..#Temp1') IS NOT NULL DROP TABLE #Temp1
;WITH PARTITIONED1 AS
(SELECT , t0.ID
, t0.Year_Month
, t0.Status
, LAST_VALUE(t0.Year_Month) OVER (PARTITION BY t0.ID ORDER BY t0.Year_Month) AS D_YM
, ROW_NUMBER() OVER (PARTITION BY t0.ID ORDER BY t0.Year_Month) AS rn1
FROM #Temp0 t0
However, this just returns the first occurrence of a 1; January here.
I really can't figure this one out, so any help would be very much appreciated.

Be careful with the assumption
"although the ordering is performed in a previous stage":
a sort in an earlier step does not guarantee the order in which later steps process the rows!
Try something like this. It is a very simple approach where you rely on gapless IDs:
DECLARE @tbl TABLE(ID INT IDENTITY,Mnth VARCHAR(100),[Status] TINYINT);
INSERT INTO @tbl VALUES
('J',1)
,('F',0)
,('M',0)
,('A',1)
,('M',0)
,('J',1)
,('J',0)
,('A',0)
,('S',1)
,('O',1)
,('N',1)
,('D',1);
-- The row right after the last 0 is the first 1 of the final group
SELECT a.*
FROM @tbl AS a
WHERE a.ID=(SELECT MAX(b.ID)+1 FROM @tbl AS b WHERE b.[Status]=0)
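The idea can be sketched in Python as a scan from the right-hand end of the series. The first_of_last_run() helper is hypothetical. Unlike the MAX(ID)+1 query, which returns no row when the series ends in a 0 or contains no 0 at all, this variant finds the last 1 first and then walks left to the start of its group.

```python
months = ["J", "F", "M", "A", "M", "J", "J", "A", "S", "O", "N", "D"]
status = [1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1]


def first_of_last_run(status):
    """Index of the first 1 in the last group of 1s, or None if there is no 1."""
    last_one = None
    for i in range(len(status) - 1, -1, -1):
        if status[i] == 1:
            last_one = i
            break
    if last_one is None:
        return None
    start = last_one
    # Walk left while the previous entry still belongs to the same group.
    while start > 0 and status[start - 1] == 1:
        start -= 1
    return start


index = first_of_last_run(status)
print(index, months[index])
```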

This can also be used:
select top 1 [Month] from [table] t where [Status]=1
and not exists
(select id from [table] t1 where t1.[Status]=0 and t1.id>t.id)
order by t.id

I might have overcomplicated this but not knowing the table structure I put the below together:
IF OBJECT_ID('tempdb..#Temp1') IS NOT NULL DROP TABLE #Temp1
CREATE TABLE #Temp1
(
Jan int,
Feb int,
Mar int,
Apr int,
May int,
June int,
July int ,
Aug int,
Sep int,
Oct int,
Nov int,
Dec int
)
insert into #temp1
select
1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1
IF OBJECT_ID('tempdb..#monthTranslate') IS NOT NULL DROP TABLE #monthTranslate
create table #monthTranslate
(
MonthValue varchar(50),
MonthInt int
)
insert into #monthTranslate
select 'Jan',1
union all select 'Feb',2
union all select 'Mar',3
union all select 'Apr',4
union all select 'May',5
union all select 'June',6
union all select 'July',7
union all select 'Aug',8
union all select 'Sep',9
union all select 'Oct',10
union all select 'Nov',11
union all select 'Dec',12
-- Find the max month with a 0 and add 1. Be careful with NULL: it might return January incorrectly, so I'd check for that in a CASE expression.
select max(b.MonthInt)+1
from
(
select
MonthPassVal, months , t.MonthInt
from
(
select Jan, Feb, Mar, Apr, May, June, July, Aug, Sep, Oct, Nov, Dec
from #temp1
) as r
Unpivot
(
MonthPassVal for Months
in (Jan, Feb, Mar, Apr, May, June, July, Aug, Sep, Oct, Nov, Dec)
) as u
inner join #monthTranslate t
on t.MonthValue = months
) as b
where
MonthPassVal=0

2 T-SQL queries that should be the same are not

I have the 2 below queries that should produce the same result as far as I can tell but they are actually producing vastly different numbers. Why is "Between" dates not the same as specifying the month and year of those dates?
What could be causing this?
SELECT [Account]
, SUM([Amount]) AS [Amount]
FROM [Table]
WHERE [Account] = 'Specific Account'
AND Month([Date]) = 5
AND Year([Date]) = 2015
GROUP BY [Account]
Sum Result: -1,500,000
SELECT [Account]
, SUM([Amount]) AS [Amount]
FROM [Table]
WHERE [Account] = 'Specific Account'
AND [Date] BETWEEN '2015-05-01' AND '2015-05-31'
GROUP BY [Account]
Sum Result: 350,000
I need the first one to be correct because I need to group the results by Month and Year, which would be cumbersome using the second query.
Query that I need ultimately:
SELECT [Account]
, Month([Date]) AS [Month]
, Year([Date]) AS [Year]
, SUM([Amount]) AS [Amount]
FROM [Table]
GROUP BY [Account]
, Month([Date])
, Year([Date])

[Date] BETWEEN '2015-05-01' AND '2015-05-31'
will only include rows on the 31st where the time component is midnight and omit the rest of the day.
You should forget about BETWEEN here, as there is no string literal you can put on the right-hand side that works correctly for datetime, smalldatetime, and datetime2(0) through datetime2(7) alike. Use a half-open range instead:
WHERE [Date] >= '2015-05-01' AND [Date] < '2015-06-01'
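A small Python sketch shows how a single row with a time component on the last day can make the two filters disagree. The rows and amounts are invented for illustration.

```python
from datetime import datetime

# Hypothetical (timestamp, amount) rows.
rows = [
    (datetime(2015, 5, 1, 0, 0), 100),
    (datetime(2015, 5, 31, 0, 0), 200),
    (datetime(2015, 5, 31, 14, 30), 400),  # on May 31, but after midnight
    (datetime(2015, 6, 1, 0, 0), 800),
]

lower = datetime(2015, 5, 1)
between_upper = datetime(2015, 5, 31)  # what '2015-05-31' becomes: midnight
next_month = datetime(2015, 6, 1)

# BETWEEN '2015-05-01' AND '2015-05-31': inclusive, but stops at midnight on the 31st.
between_total = sum(amount for ts, amount in rows if lower <= ts <= between_upper)

# [Date] >= '2015-05-01' AND [Date] < '2015-06-01': covers the whole month.
half_open_total = sum(amount for ts, amount in rows if lower <= ts < next_month)

# Month([Date]) = 5 AND Year([Date]) = 2015.
month_total = sum(amount for ts, amount in rows if ts.year == 2015 and ts.month == 5)

print(between_total, half_open_total, month_total)
```

The half-open range agrees with the Month()/Year() filter, while BETWEEN silently drops the 14:30 row.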

Try below for your first case, where you are getting more rows.
AND (Month([Date]) = 5 AND Year([Date]) = 2015)
instead of
AND Month([Date]) = 5 AND Year([Date]) = 2015
Update:
I would suggest using the CONVERT function, revising your query as below:
CONVERT(varchar(10),DATE_COLUMN,112) between '20150501' and '20150531'

T-SQL get months from ranges of date

I have a table with an ID, a start date and an end date. I want to insert into another table the ID and the end of each month between the start date and the end date, e.g.
ID  Start Date  End Date
1   2012-01-01  2012-03-31
2   2012-10-01  2012-12-31
Results:
ID  Month End
1   2012-01-31
1   2012-02-29
1   2012-03-31
2   2012-10-31
2   2012-11-30
2   2012-12-31

This answer makes some assumptions (for example, that no start date is later than its end date), but you should see how it works. It creates a recursive CTE with UNION ALL and uses that to work out the month-end dates.
CREATE TABLE #Dates
(
ID INT IDENTITY PRIMARY KEY,
START_DATE DATETIME2(0) NOT NULL,
END_DATE DATETIME2(0) NOT NULL
)
INSERT INTO #Dates VALUES ('2012-01-01', '2012-03-31'), ('2012-10-01','2012-12-31');
WITH MONTHS ([ID],[Month],[Date], [End])
AS
(
SELECT ID, DATEPART(m,START_DATE) AS [Month], START_DATE AS [Date], DATEADD(s,-1,DATEADD(m,DATEDIFF(m,0,START_DATE)+1,0)) as [End]
FROM #Dates
UNION ALL
SELECT D.ID, DATEPART(m,DATEADD(m,1,[Date])),DATEADD(m,1,[Date]), DATEADD(s,-1,DATEADD(m,DATEDIFF(m,0,DATEADD(m,1,[Date]))+1,0)) as [End]
FROM #Dates D
INNER JOIN MONTHS M
ON D.ID = M.ID
WHERE DATEADD(m,1,[Date]) < [END_DATE]
)
SELECT *
FROM MONTHS ORDER BY ID, Date
DROP TABLE #Dates
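The expected output can be cross-checked with a short Python sketch. The month_ends() helper is hypothetical and assumes every month touched by the range should be listed, which matches the sample data where the ranges start and end on month boundaries.

```python
import calendar
from datetime import date


def month_ends(start, end):
    """Last day of every month from start's month through end's month."""
    result = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        last_day = calendar.monthrange(year, month)[1]  # handles leap years
        result.append(date(year, month, last_day))
        # Step to the next month, rolling over the year after December.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return result


ranges = {
    1: (date(2012, 1, 1), date(2012, 3, 31)),
    2: (date(2012, 10, 1), date(2012, 12, 31)),
}
rows = [(id_, day) for id_, (start, end) in ranges.items() for day in month_ends(start, end)]
for row in rows:
    print(row)
```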