Redshift - Find average sales by month - amazon-redshift

I have the below query that finds the count of sales by month:
select to_char(sale_date,'Mon') as mon,
count (*) as "Sales"
from sales
where to_char(sale_date,'yyyy-mm-dd') between '2018-10-01' and '2018-12-01'
group by 1
I am trying to find the average sales per month. How can I modify the above query to get this output? I am using Redshift.

You may query a CTE and take the average:
WITH cte AS (
select to_char(sale_date,'Mon') as mon, count (*) as "Sales"
from sales
where to_char(sale_date,'yyyy-mm-dd') between '2018-10-01' and '2018-12-01'
group by 1
)
SELECT AVG(Sales)
FROM cte;
Note that ideally you should be grouping by year and month, because a given month can belong to more than one year. If you wanted to keep your current query, but include the average over all months, then you could try:
select
to_char(sale_date,'Mon') as mon,
extract(year from sale_date) as year,
count (*) as "Sales",
avg(count(*)) over () "AvgSales"
from sales
where to_char(sale_date,'yyyy-mm-dd') between '2018-10-01' and '2018-12-01'
group by 1, 2;
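For completeness, a minimal sketch that combines both points: group the CTE by year and month, then average the monthly counts. The table and the filter are taken from the question; the year_month column and the float cast are my additions, not part of the original answer:
WITH monthly AS (
    -- grouping by year-and-month keeps e.g. Oct 2018 and Oct 2019 apart
    select to_char(sale_date, 'YYYY-MM') as year_month,
           count(*) as sales
    from sales
    where to_char(sale_date, 'yyyy-mm-dd') between '2018-10-01' and '2018-12-01'
    group by 1
)
SELECT AVG(sales::float) AS avg_monthly_sales  -- cast so the average is not truncated to an integer
FROM monthly;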

Related

Count distinct dates between two timestamps

I want to count the % of days on which a user was active. A query like this
select
a.id,
a.created_at,
CURRENT_DATE - a.created_at::date as days_since_registration,
NOW() as current_d
from public.accounts a where a.id = 3257
returns
id   | created_at              | days_since_registration | current_d                     | tot_active
3257 | 2022-04-01 22:59:00.000 | 1                       | 2022-04-02 12:00:00.000 +0400 | 2
The person registered less than 24 hours ago (less than a day ago), but there are two distinct dates between the registration and now. Hence, if a user was active one hour before midnight and one hour after midnight, he counts as active on two days within less than a day (active on 200% of days).
What is the right way to count distinct dates and get 2 for a user who registered at 23:00:00, two hours ago?
WITH cte as (
    -- two events for the same user: one just before midnight, one just after
    SELECT 42 as userID, '2022-04-01 23:00:00'::timestamp as d
    union
    SELECT 42, '2022-04-02 01:00:00'::timestamp as d
)
SELECT
    userID,
    count(d),
    max(d)::date - min(d)::date + 1 as NrOfDays,
    count(d) * 100 / (max(d)::date - min(d)::date + 1) as PercentageOnline  -- multiply first so integer division does not truncate to 0
FROM cte
GROUP BY userID;
output:

userid | count | nrofdays | percentageonline
42     | 2     | 2        | 100
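If a user can have several events on the same calendar date, counting distinct dates explicitly may be closer to what is being asked. A sketch along the same lines; the extra third row is an assumption I added to illustrate the deduplication:
WITH cte as (
    SELECT 42 as userID, '2022-04-01 23:00:00'::timestamp as d
    UNION ALL
    SELECT 42, '2022-04-02 01:00:00'::timestamp
    UNION ALL
    SELECT 42, '2022-04-02 09:30:00'::timestamp  -- second event on the same date
)
SELECT
    userID,
    count(distinct d::date) as ActiveDays,       -- 2, not 3
    max(d)::date - min(d)::date + 1 as NrOfDays,
    count(distinct d::date) * 100
        / (max(d)::date - min(d)::date + 1) as PercentageOnline
FROM cte
GROUP BY userID;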

Mixing DISTINCT with GROUP_BY Postgres

I am trying to get a list of:
all months in a specified year that
have at least 2 unique rows based on their date,
and ignore specific column values.
Where I got to is:
SELECT DATE_PART('month', "orderDate") AS month, count(*)
FROM public."Orders"
WHERE "companyId" = 00001 AND "orderNumber" != 1 and DATE_PART('year', ("orderDate")) = '2020' AND "orderNumber" != NULL
GROUP BY month
HAVING COUNT ("orderDate") > 2
The HAVING COUNT sort of works in place of DISTINCT, insofar as I can be reasonably sure that condition filters the data as required.
However, being able to use DISTINCT based on a given date within a month would return a more reliable result. Is this possible with Postgres?
A sample line of data from the table:
Sample Input
"2018-12-17 20:32:00+00"
"2019-02-26 14:38:00+00"
"2020-07-26 10:19:00+00"
"2020-10-13 19:15:00+00"
"2020-10-26 16:42:00+00"
"2020-10-26 19:41:00+00"
"2020-11-19 20:21:00+00"
"2020-11-19 21:22:00+00"
"2020-11-23 21:10:00+00"
"2021-01-02 12:51:00+00"
Without the HAVING COUNT this produces:

month | count
7     | 1
10    | 2
11    | 3
Month 7 can be discarded easily as only 1 record.
Month 10 is the issue: we have two records. But from the data above, those records are from the same day. Similarly, month 11 only has 2 distinct records by day.
The output should therefore ideally be:

month | count
11    | 2
We have only two distinct dates from the 2020 data, and they are from month 11 (November)
I think you just want to take the distinct count of dates for each month:
SELECT
    DATE_PART('month', "orderDate") AS month,
    COUNT(DISTINCT "orderDate"::date) AS count
FROM public."Orders"
WHERE
    "companyId" = 1 AND
    "orderNumber" != 1 AND
    DATE_PART('year', "orderDate") = 2020
GROUP BY
    DATE_PART('month', "orderDate")
HAVING
    COUNT(DISTINCT "orderDate"::date) > 2;
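One detail worth double-checking (my reading, not part of the original answer): the question asks for months with at least 2 unique dates, and the expected output keeps month 11 with a count of 2, so the HAVING condition probably needs >= rather than >:
SELECT
    DATE_PART('month', "orderDate") AS month,
    COUNT(DISTINCT "orderDate"::date) AS count
FROM public."Orders"
WHERE
    "companyId" = 1 AND
    "orderNumber" != 1 AND
    DATE_PART('year', "orderDate") = 2020
GROUP BY
    DATE_PART('month', "orderDate")
HAVING
    COUNT(DISTINCT "orderDate"::date) >= 2;  -- keep months with at least two distinct days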

Using 'over' function results in column "table.id" must appear in the GROUP BY clause or be used in an aggregate function

I'm currently writing an application which shows the growth of the total number of events in my table over time. I currently have the following query to do this:
query = session.query(
count(Event.id).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
This results in the following output:
Count | Year | Month
100   | 2021 | 1
50    | 2021 | 2
75    | 2021 | 3
While this is okay on its own, I want it to display the total number over time, not just the number of events in that month, so the desired output should be:
Count | Year | Month
100   | 2021 | 1
150   | 2021 | 2
225   | 2021 | 3
I read in various places that I should use a window function via SQLAlchemy's over function; however, I can't seem to wrap my head around it, and every time I try using it I get the following error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.GroupingError) column "event.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT count(event.id) OVER (PARTITION BY event.date ORDER...
^
[SQL: SELECT count(event.id) OVER (PARTITION BY event.date ORDER BY EXTRACT(year FROM event.date), EXTRACT(month FROM event.date)) AS count, EXTRACT(year FROM event.date) AS year, EXTRACT(month FROM event.date) AS month
FROM event
WHERE event.date IS NOT NULL GROUP BY year, month]
This is the query I used:
session.query(
count(Event.id).over(
order_by=(
extract('year', Event.date),
extract('month', Event.date)
),
partition_by=Event.date
).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
Could someone show me what I'm doing wrong? I've been searching for hours but can't figure out how to get the desired output, as adding event.id to the GROUP BY would stop my rows from being grouped by month and year.
The final query I ended up using:
query = session.query(
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month'),
func.sum(func.count(Event.id)).over(order_by=(
extract('year', Event.date),
extract('month', Event.date)
)).label('count'),
).filter(
Event.date.isnot(None)
).group_by('year', 'month')
I'm not 100% sure what you want, but I'm assuming you want, for each month, the number of events up to and including that month. You first need to calculate the number of events per month and then sum those counts with a PostgreSQL window function.
You can do that in a single select statement:
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, SUM(COUNT(events.id)) OVER(ORDER BY extract(year FROM events.date), extract(month FROM events.date)) AS total_so_far
FROM events
GROUP BY 1,2
but it might be easier to think about if you split it into two:
SELECT year, month, SUM(events_count) OVER(ORDER BY year, month)
FROM (
    SELECT extract(year FROM events.date) AS year
         , extract(month FROM events.date) AS month
         , COUNT(events.id) AS events_count
    FROM events
    GROUP BY 1,2
) monthly  -- PostgreSQL requires an alias on the derived table
but I'm not sure how to do that in SQLAlchemy.

How to Display Last rolling 12 months in correct month order

I have an SSRS report that shows customer sales for the year and I have been asked to change it to the last 13 rolling months. I have changed my where clause to be:
WHERE (#First12Months.FirstSaleDate BETWEEN DATEADD(MM,-13,#ReportDate) AND (#ReportDate))
( #ReportDate is the last day of the month that needs to be displayed on the right of the matrix.)
This where clause pulls the correct data, but it still displays in my MonthSort order, and I need to change this so that, across the last 12 months, the newest month is on the right and the oldest month is on the left. I cannot work out how to do the sort.
My old sort is MonthSort which gives each month a number where April is 1 through to March = 12:
CASE WHEN Month(#First12Months.FirstSaleDate)<=3 THEN MONTH(#First12Months.FirstSaleDate)+9 ELSE MONTH(#First12Months.FirstSaleDate)-3 END AS MonthSort
but of course this is now incorrect as I need the month from #ReportDate to be number 13 and each month before that chronologically to be 1 number less.
I found this post which is the only one that seems to come close to what I need but unfortunately I simply don't understand what it is saying.
Dynamic table/output each month for report
How do I tell the MonthSort column which number to allocate to the months to get the correct sort order for a rolling 13 months?
As your data is in rows and your SSRS report displays it in columns, you can do the following:
Add a sorting column to your SQL query that uses an analytic function to give the (dense) rank of the month. That rank can then be used as the sorting criterion in SSRS.
Assuming your month column is called month, your query could look like this:
select t.*, dense_rank() over (order by month) rnk from t
That order could also be done descending like this:
select t.*, dense_rank() over (order by month desc) rnk from t
Let's have an example:
with t as (
select 2134 sales, cast('20190101' as date) month union all
select 3456 sales, cast('20190201' as date) month union all
select 234 sales, cast('20190301' as date) month union all
select 4567 sales, cast('20190401' as date) month union all
select 5678 sales, cast('20190501' as date) month union all
select 234 sales, cast('20190601' as date) month union all
select 756 sales, cast('20190701' as date) month union all
select 9 sales, cast('20190801' as date) month union all
select 24356134 sales, cast('20190901' as date) month union all
select 2456134 sales, cast('20191001' as date) month union all
select 234 sales, cast('20191101' as date) month union all
select 675 sales, cast('20191201' as date) month union all
select 86 sales, cast('20200101' as date) month union all
select 786 sales, cast('20200201' as date) month union all
select 715 sales, cast('20200301' as date) month union all
select 156 sales, cast('20200401' as date) month union all
select 123 sales, cast('20200501' as date) month union all
select 687 sales, cast('20200601' as date) month union all
select 45 sales, cast('20200701' as date) month
)
, t1 as (
select sales, month from t where t.month > dateadd(MONTH, -12, getdate())
)
select t1.*, DENSE_RANK() over (order by datefromparts(year(month), month([month]), 1)) rnk from t1
will return the filtered rows together with an rnk column (1 for the oldest month, increasing towards the newest) that can be used as the sort expression in SSRS.
I guess your first query retrieving the data is correct. Changing the column order would have to be done in the SSRS report.
For sorting tablix (table elements in SSRS) have a look here
Posting my own answer:
I have managed to work out a formula that calculates a position number for each column to replace my MonthSort column:
select case when MONTH(saledate) BETWEEN MONTH(dateadd(mm, -11, #ReportDate)) AND 12
            THEN ((MONTH(saledate) + 1) - MONTH(#ReportDate) + 11)
            ELSE ((MONTH(saledate) + 1) + MONTH(#ReportDate) + 11)
       end as position
from table
WHERE (saledate BETWEEN DATEADD(MM, -11, #ReportDate) AND (#ReportDate))
This doesn't quite give me what I wanted, as I wanted 12 months but couldn't work out how to differentiate between the same month this year and the same month last year. E.g. if the report date is 30/6/2020, then with a 12 month parameter it gives two June months (one for each of 2019 and 2020) but places both of them in position 1, when June 2019 should be in position 1 and June 2020 should be in position 13. It works well with 11 months. If anyone can help with getting it to 12 months I would be grateful.
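A hedged alternative sketch (not from the thread): derive the position from the whole-month difference to #ReportDate instead of from the month number alone, which keeps June 2019 and June 2020 apart. The #First12Months table, FirstSaleDate column and #ReportDate are taken from the question above; the 13-slot numbering is an assumption:
-- DATEDIFF(MONTH, ...) counts whole calendar-month boundaries, so the report
-- month gets position 13 and the same month one year earlier gets position 1
select 13 - DATEDIFF(MONTH, FirstSaleDate, #ReportDate) as MonthSort
from #First12Months
WHERE DATEDIFF(MONTH, FirstSaleDate, #ReportDate) BETWEEN 0 AND 12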
I am posting another answer of my own because it is a different way of achieving what I needed as well as the simplest - I was making this issue far more complicated than it needed to be.
In my second effort I added an extra column called SaleYear using: YEAR(SaleDate). I already had a column for MONTH(SaleDate) that I was using in a case when to achieve the Apr to March sort.
I restricted the data within the SQL query to the last 13 months using a where clause:
WHERE (SaleDate BETWEEN DATEADD(MM,-13,#ReportDate) AND DATEADD(minute, - 1, #ReportDate + 1))
And in the SSRS report I added 'Year' as a parent column group to the 'Month' column group. In the column group I added sorting by Year and then by month.
Because I had already restricted the data in the sql query to the last 13 months I have the last 13 rolling months in the correct order.
This is the cleanest and simplest answer.
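A minimal sketch of what that second approach might look like in the dataset query. The Sales table name and the SaleAmount measure are placeholders; SaleDate, #ReportDate and the WHERE clause follow the answer above:
select YEAR(SaleDate)  as SaleYear,   -- parent column group in SSRS
       MONTH(SaleDate) as SaleMonth,  -- child column group, sorted within the year
       SUM(SaleAmount) as Sales       -- placeholder measure
from Sales
WHERE SaleDate BETWEEN DATEADD(MM, -13, #ReportDate)
                   AND DATEADD(minute, -1, #ReportDate + 1)
GROUP BY YEAR(SaleDate), MONTH(SaleDate)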

Customize query of postgresql

I am using a PostgreSQL database. I have two queries, but I don't want to use multiple queries, so is it possible to manage this with a single query?
Query 1 :
select coalesce(sum("dummy"), 0) as sum
from generate_series('2014-09-09 00:00:00'::timestamp, '2014-09-09 23:59:59', '1 minute') minutes(minute)
left join report
    on minutes.minute = date_trunc('minute', report.fetchdate)
    and fetchdate >= '2014-09-09 00:00:00' and fetchdate <= '2014-09-09 23:59:00'
    and entity_id = '0'
group by minute
order by minute
OUTPUT:
The total of the dummy field for each minute of the day, i.e. 24*60 = 1440 rows per day.
Note: this query is used for a single day.
Query 2:
select date(day) as day, coalesce(sum("dummy"), 0) as sum
from generate_series('2014-09-06 00:00:01'::date, '2014-09-12 23:59:59'::date, '1 day'::interval) days(day)
left join report
    on days.day = date_trunc('day', report.fetchdate)
    and entity_id = '0'
group by day
order by day
OUTPUT:
The total of the dummy field for each day between 2014-09-06 and 2014-09-12, i.e. 7 rows (the 6th through the 12th).
Note: this query is used for more than one day.
Required output:
1) The total of the dummy field for each day between the specified dates (the output of the 2nd query).
2) The maximum per-minute count for each day.
Ex: Suppose I search across two days; the result should break each day down by minute and, for each day, show the maximum per-minute count of the dummy field as the "maximum call" for that day.
select
    date_trunc('day', minute) as day,
    sum(minute_sum) as day_sum,          -- daily total (what Query 2 returns)
    max(minute_sum) as max_minute_sum    -- busiest minute of the day
from (
    -- per-minute totals across the whole range (Query 1, extended to several days)
    select
        minute,
        coalesce(sum("dummy"), 0) as minute_sum
    from
        generate_series(
            '2014-09-06'::timestamp,
            '2014-09-13'::timestamp - interval '1 minute',
            '1 minute'
        ) minutes(minute)
    left join
        report on
            minutes.minute = date_trunc('minute', report.fetchdate)
            and entity_id = '0'
    group by minute
) s
group by 1
order by 1
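As a usage note (my assumption about how the range is chosen, not part of the original answer), the same shape also covers the single-day case of Query 1 by narrowing the generate_series bounds to one day:
select
    date_trunc('day', minute) as day,
    sum(minute_sum) as day_sum,
    max(minute_sum) as max_minute_sum
from (
    select
        minute,
        coalesce(sum("dummy"), 0) as minute_sum
    from generate_series(
             '2014-09-09'::timestamp,                       -- the single day of interest
             '2014-09-10'::timestamp - interval '1 minute',
             '1 minute'
         ) minutes(minute)
    left join report
        on minutes.minute = date_trunc('minute', report.fetchdate)
        and entity_id = '0'
    group by minute
) s
group by 1
order by 1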