How can I get the maximum amount of the total amounts for different products in a month in PostgreSQL?

I've just begun using Postgresql recently. I have a table named 'sales'.
create table sales
(
cust varchar(20),
prod varchar(20),
day integer,
month integer,
year integer,
state char(2),
quant integer
);
insert into sales values ('Bloom', 'Pepsi', 2, 12, 2001, 'NY', 4232);
insert into sales values ('Knuth', 'Bread', 23, 5, 2005, 'PA', 4167);
insert into sales values ('Emily', 'Pepsi', 22, 1, 2006, 'CT', 4404);
insert into sales values ('Emily', 'Fruits', 11, 1, 2000, 'NJ', 4369);
insert into sales values ('Helen', 'Milk', 7, 11, 2006, 'CT', 210);
...
There are 500 rows, 10 distinct products and 5 distinct customers in total.
It looks like this:
Now I need to find the most “popular” and least “popular” products (those products with the most and least total sales quantities) and the corresponding total sales quantities (i.e., SUMs) for each of the 12 months (regardless of the year).
The result should be like this:
Now I can only write query like this:
select month,
prod,
sum(quant)
from sales
group by month,prod
order by month,prod;
And it gives me the result like this:
Now I need to pick up the maximum value for each month. For example, the biggest value in the first 10 sums of month 1, and so on...
I also need to get the minimum value of the sums (regardless of the year). And combine them horizontally... I have no idea about this...

Note: for a TLDR, skip to the end.
Your problem is a very interesting textbook case as it involves multiple facets of Postgres.
I often find it very helpful to decompose the problem into multiple subproblems before joining them together for the final result set.
In your case, I see two subproblems: finding the most popular product for each month, and finding the least popular product for each month.
Let's start with the most popular products:
WITH months AS (
SELECT generate_series AS month
FROM generate_series(1, 12)
)
SELECT DISTINCT ON (month)
month,
prod,
SUM(quant)
FROM months
LEFT JOIN sales USING (month)
GROUP BY month, prod
ORDER BY month, sum DESC;
Explanations:
WITH introduces a common table expression (CTE), which acts as a temporary table (for the duration of the query) and helps clarify the query. If you find it confusing, you could also opt for a subquery.
generate_series(1, 12) is a Postgres function which generates a series of integers, in this case from 1 to 12.
the LEFT JOIN allows us to associate each sale with the corresponding month. If no sale can be found for a given month, a row is returned with the month and NULL values in the joined columns. More information on joins can be found in the PostgreSQL documentation. In your case, using LEFT JOIN matters because an INNER JOIN would drop any month that has no sales at all. Note that a product with no sales in a given month still produces no row for that month, so it will not be reported as the least popular product.
GROUP BY is used to sum over the quantities.
at this stage, you may have multiple products for any given month. We only want to keep the one with the largest total quantity for each month. DISTINCT ON is especially useful for that purpose. Given a column, it keeps the first row for each distinct value of that column. It is therefore important to ORDER the rows by month and then by sum, as only the first row of each month will be selected. We want the bigger numbers first, so DESC (for descending order) should be used.
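As a rough illustration of what DISTINCT ON (month) does with the ordered rows, here is a minimal Python sketch. It is not how Postgres implements it, and the sample rows and the helper name distinct_on_month are made up for the example.

```python
from itertools import groupby

# Hypothetical (month, prod, total) rows, as the GROUP BY step might return them.
agg_sales = [
    (1, "Pepsi", 900),
    (1, "Bread", 1500),
    (1, "Milk", 300),
    (2, "Pepsi", 700),
    (2, "Milk", 1200),
]

def distinct_on_month(rows, descending):
    """Order by month, then by total, and keep only the first row of each month."""
    ordered = sorted(rows, key=lambda r: (r[0], -r[2] if descending else r[2]))
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[0])]

most_popular = distinct_on_month(agg_sales, descending=True)
least_popular = distinct_on_month(agg_sales, descending=False)
print(most_popular)   # first row per month when totals are sorted descending
print(least_popular)  # first row per month when totals are sorted ascending
```

The only difference between the two results is the sort direction, which mirrors the DESC in the ORDER BY clause.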
We can now repeat the process for the least popular products:
WITH months AS (
SELECT generate_series AS month
FROM generate_series(1, 12)
)
SELECT DISTINCT ON (month)
month,
prod,
SUM(quant)
FROM months
LEFT JOIN sales USING (month)
GROUP BY month, prod
ORDER BY month, sum;
Conclusion (and TLDR):
Now we need to merge the two queries into one final query.
WITH months AS (
SELECT generate_series AS month
FROM generate_series(1, 12)
), agg_sales AS (
SELECT
month,
prod,
SUM(quant)
FROM months
LEFT JOIN sales USING (month)
GROUP BY month, prod
), most_popular AS (
SELECT DISTINCT ON (month)
month,
prod,
sum
FROM agg_sales
ORDER BY month, sum DESC
), least_popular AS (
SELECT DISTINCT ON (month)
month,
prod,
sum
FROM agg_sales
ORDER BY month, sum
)
SELECT
most_popular.month,
most_popular.prod AS most_popular_prod,
most_popular.sum AS most_pop_total_q,
least_popular.prod AS least_popular_prod,
least_popular.sum AS least_pop_total_q
FROM most_popular
JOIN least_popular USING (month);
Note that I used an intermediate agg_sales CTE to try and make the query a bit clearer and avoid repeating the same operation twice, although it shouldn't be a problem for Postgres' optimizer.
I hope you find my answer satisfactory. Do not hesitate to comment otherwise!
EDIT: although this solution should work as is, I would suggest storing your dates as a single column of type TIMESTAMPTZ. It is often much easier to manipulate dates using that type and it is always good practice in case you need to analyze and audit your database further down the line.
You can get the month of any date by simply using EXTRACT(MONTH FROM date).

Related

Is it possible to use AVG function to give the average of 2 results from within a sub-query that has a UNION?

So I have written a UNION query and then, in order to amalgamate the 2 averages from both sides of the union, I have put the whole union query inside a select query and given it an alias, this isn't the whole thing, but gets the point across I think(?):
select supplier, year, month, avg(average) as average from
(select supplier, year, month, avg(age(tableA.date, tableB.date)) as
average
from tableA join tableB using(supplier)
group by supplier, year, month
UNION ALL
select supplier, year, month, avg(age(tableC.date, tableB.date)) as
average
from tableC join tableB using(supplier)
group by supplier, year, month
) as x
group by supplier, year, month
I have a Maths graduate telling me that you simply can't average an average, but having looked at the data I think that the outer query is treating the average from the inner query as a single amount of time for each inner query and is therefore allowing me to average it outside as though it hasn't already been averaged, if that makes any kind of sense??
Any perspectives on this very welcome.
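To make the concern concrete, here is a small Python sketch with made-up numbers (the lists branch_a and branch_c are hypothetical stand-ins for the rows behind each side of the UNION ALL). The outer avg treats each inner average as a single value, so it only matches the average over all underlying rows when both sides contribute the same number of rows.

```python
from statistics import mean

# Hypothetical delays in days for one supplier, year and month.
branch_a = [2, 4]            # rows behind the first SELECT of the union
branch_c = [10, 10, 10, 10]  # rows behind the second SELECT of the union

# What the outer query computes: the average of the two inner averages.
avg_of_avgs = mean([mean(branch_a), mean(branch_c)])

# The average over all underlying rows.
overall_avg = mean(branch_a + branch_c)

# Weighting each inner average by its row count recovers the overall average.
weighted_avg = (
    mean(branch_a) * len(branch_a) + mean(branch_c) * len(branch_c)
) / (len(branch_a) + len(branch_c))

print(avg_of_avgs, overall_avg, weighted_avg)
```

If the overall average is what is wanted, one option is to return the unaggregated differences from both sides of the UNION ALL and average once in the outer query.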

Continuous aggregates in postgres/timescaledb requires time_bucket-function?

I have a SELECT-query which gives me the aggregated sum(minutes_per_hour_used) of some stuff. Grouped by id, weekday and observed hour.
SELECT id,
extract(dow from observed_date) AS weekday, -- observed_date is type date
observed_hour, -- is type timestamp without timezone, every full hour 00:00:00, 01:00:00, ...
sum(minutes_per_hour_used)
FROM base_table
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;
The result looks nice, but now I would like to store that in a self-maintained view, which only considers/aggregates the last 8 weeks. I thought continuous aggregates were the right way, but I can't make it work (https://blog.timescale.com/blog/continuous-aggregates-faster-queries-with-automatically-maintained-materialized-views/). It seems I need to somehow use the time_bucket function, but I don't know how. Any ideas/hints?
I am using postgres with timescaledb.
EDIT: This gives me the desired output, but I can't put it in a continuous aggregate
SELECT id,
extract(dow from observed_date) AS weekday,
observed_hour,
sum(minutes_per_hour_used)
FROM base_table
WHERE observed_date >= now() - interval '8 weeks'
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;
EDIT: Prepending this with
CREATE VIEW my_view
WITH (timescaledb.continuous) AS
gives me [0A000] ERROR: invalid SELECT query for continuous aggregate
Continuous aggregates require grouping by time_bucket:
SELECT <grouping_exprs>, <aggregate_functions>
FROM <hypertable>
[WHERE ... ]
GROUP BY time_bucket( <const_value>, <partition_col_of_hypertable> ),
[ optional grouping exprs>]
[HAVING ...]
It should be applied to a partitioned column, which is usually the time dimension column used in the hypertable creation. Also ORDER BY is not supported.
In the aggregate query in the question, no time column is used for grouping. Neither weekday nor observed_hour is a valid time column, since their values do not increase with time but repeat regularly: weekday repeats every 7 days and observed_hour repeats every 24 hours. This violates the requirements for continuous aggregates.
Since there is no ready solution for this use case, one approach is to use a continuous aggregate to reduce the amount of data for the targeted query, e.g., by bucketing by day:
CREATE MATERIALIZED VIEW daily
WITH (timescaledb.continuous) AS
SELECT id,
time_bucket('1day', observed_date) AS day,
observed_hour,
sum(minutes_per_hour_used) AS minutes_per_hour_used
FROM base_table
GROUP BY 1, 2, 3;
Then execute the targeted aggregate query on top of it:
SELECT id,
extract(dow from day) AS weekday,
observed_hour,
sum(minutes_per_hour_used)
FROM daily
WHERE day >= now() - interval '8 weeks'
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;
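The two-stage approach relies on the fact that a sum of daily sums equals the sum over the raw rows. A minimal Python sketch of that idea follows, using made-up rows and an integer hour instead of the timestamp column from the question (both are assumptions for the example).

```python
from collections import defaultdict
from datetime import date

# Hypothetical base_table rows: (id, observed_date, hour, minutes_per_hour_used).
rows = [
    (1, date(2024, 1, 1), 8, 30),  # a Monday
    (1, date(2024, 1, 1), 8, 15),
    (1, date(2024, 1, 8), 8, 20),  # the following Monday
    (1, date(2024, 1, 2), 9, 45),  # a Tuesday
]

def dow(d):
    """Day of week numbered like Postgres extract(dow ...): Sunday = 0."""
    return d.isoweekday() % 7

# Stage 1: what a daily bucket would hold (one sum per id, day and hour).
daily = defaultdict(int)
for id_, day, hour, minutes in rows:
    daily[(id_, day, hour)] += minutes

# Stage 2: roll the daily sums up by weekday.
rolled_up = defaultdict(int)
for (id_, day, hour), minutes in daily.items():
    rolled_up[(id_, dow(day), hour)] += minutes

# Direct aggregation over the raw rows, for comparison.
direct = defaultdict(int)
for id_, day, hour, minutes in rows:
    direct[(id_, dow(day), hour)] += minutes

print(dict(rolled_up) == dict(direct))
```

This works for sum (and for count, min and max), but not for avg, which would need the sums and counts carried separately.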
Another approach is to use a PostgreSQL materialized view and refresh it on a regular basis with the help of a custom job, run by the job scheduling framework of TimescaleDB. Note that each refresh recalculates the entire view, which in the example case covers 8 weeks of data. The materialized view can be written in terms of the original table base_table or in terms of the continuous aggregate suggested above.

SQL SSRS aggregate functions

I am trying to figure out the aggregate functions in SQL SSRS to give me the sum of total sales for the given information by YEAR. I need to combine the year, the months within that year and provide the total sum of sales for that year. For example: for 2018 I need to combine months 2-12 and provide the total sum, for 2019 combine months 1-12 and provide the total sum, and so on.
I'm not sure where to begin on this one as I am new to SQL SSRS. Any help would be appreciated!
UPDATE:
Ideally I want this to be the end result:
id Year Price
102140 2019 ($XXXXX.XX)
102140 2018 ($XXXXX.XX)
102140 2017 ($XXXXX.XX)
And so on.
Your query:
Select customer_id
, year_ordered
--, month_ordered
--, extended_price
--, SUM(extended_price) OVER (PARTITION BY year_ordered) AS year_total
, SUM(extended_price) AS year_total
From customer_order_history
Where customer_id = '101646'
Group By
customer_id
, year_ordered
, extended_price
--, month_ordered
Provides multiple rows for each year_ordered, because it is still using each month and that month's SUM of price.
There are two approaches.
Do this in your dataset query:
SELECT Customer_id, year_ordered, SUM(extended_price) AS Price
FROM myTable
GROUP BY Customer_id, year_ordered
This option is best when you will never need the month values themselves in the report (i.e. you don't intend to have a drill down to the month data)
Do this in SSRS
By default you will get a RowGroup called "Details" (look under the main design area and you will see the row groups and column groups).
You can right-click this and add grouping for both customer_id and year_ordered. You can then change the extended_price textbox's value property to =SUM(Fields!extended_price.Value)
You could use a window function in your SQL:
select [year], [month], [price], SUM(PRICE) OVER (PARTITION BY year) as yearTotal
from myTable

Calculate best sale between several sellers

I'm using PostgreSQL.
Let's say there are 5 sellers.
Each month's sales are recorded in the database like this: (userId: 6, january: 10000$, february: 20000$, march: 10000$ ..., december: 50000$, year: 2018)
I need to calculate, possibly with only one query, the best sale of each month in one array of this format: (january: 15000$, february: 30000$, march: 40000$, year: 2018). I don't need the userId. I simply need to compare the sales per month and display the best amount...
For now, I've got this code, which works well, giving me user 6's sales per month for a given year:
SELECT date_trunc('month', date_vente) AS txn_month, sum(prix_vente) as monthly_sum,count(prix_vente) AS monthly_count
FROM crm_vente
WHERE 1=1
AND date_part('year', date_vente) = 2018
AND id_user = 6
GROUP BY txn_month ORDER BY txn_month
I wonder if somebody could tell me what kind of technique I could use to get the best sales for each of the 12 months among the 5 employees.
Could I use a view? Or should I do a for loop in PHP over each user's sales per month, then build a kind of comparative array?
No need to give me a full solution, but maybe some advice on how to do it directly with PostgreSQL? My only solution for now is to use PHP and write some not-so-nice code.
Nice day, I'll check on Monday.
Sorry for my English.
WITH monthly_sales AS (
SELECT
date_trunc('month', date_vente) AS txn_month,
id_user,
sum(prix_vente) AS monthly_sum
FROM crm_vente
WHERE 1=1
AND date_part('year', date_vente) = 2018
GROUP BY txn_month, id_user
ORDER BY txn_month, id_user),
rank_monthly_sales_by_user_id AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY txn_month ORDER BY monthly_sum DESC) AS rank
FROM monthly_sales)
SELECT
txn_month,
monthly_sum
FROM rank_monthly_sales_by_user_id
WHERE rank = 1
ORDER BY txn_month ASC;
First, get the totals per month for each user. This is the first CTE, called monthly_sales, which sums the sales of each user by month.
Next, to get the top user for each month in terms of total sales, you have to rank the rows returned by the previous CTE. This is done by ROW_NUMBER().
ROW_NUMBER() returns the row number within a specified window; in this case it numbers the rows from monthly_sales within each month (it starts again from 1 for each month). The PARTITION BY clause defines the window over which the rows are numbered; here it is the month, since we want to order each user's sales within a month. The ORDER BY clause says how to order the rows from 1 to n. We are using monthly_sum in descending order, so the highest monthly sum gets 1 and the lowest gets n (5 if all five sellers have sales that month).
The final query selects only the rows from rank_monthly_sales_by_user_id that are the top sales for the month (WHERE rank = 1).
This leaves us with an output where each row is a month, with the highest sales total for that month.
Let me know if that was what you needed help with
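For readers who prefer to see the ranking step spelled out, here is a minimal Python sketch of what ROW_NUMBER() OVER (PARTITION BY txn_month ORDER BY monthly_sum DESC) does. The sample rows are made up, and this only illustrates the logic, not how Postgres executes it.

```python
from collections import defaultdict

# Hypothetical (txn_month, id_user, monthly_sum) rows, as monthly_sales might return them.
monthly_sales = [
    ("2018-01", 6, 10000),
    ("2018-01", 7, 15000),
    ("2018-02", 6, 30000),
    ("2018-02", 7, 20000),
]

# PARTITION BY txn_month: collect the rows of each month separately.
by_month = defaultdict(list)
for month, user, total in monthly_sales:
    by_month[month].append((user, total))

# ORDER BY monthly_sum DESC inside each partition, numbering rows from 1.
ranked = []
for month in sorted(by_month):
    ordered = sorted(by_month[month], key=lambda r: r[1], reverse=True)
    for row_number, (user, total) in enumerate(ordered, start=1):
        ranked.append((month, user, total, row_number))

# WHERE rank = 1: keep the best total of each month.
best = [(month, total) for month, _, total, rn in ranked if rn == 1]
print(best)
```

Since the question only asks for the best amount and not the seller, taking the maximum monthly_sum per month would give the same amounts; the row-number approach becomes useful when the seller is also needed.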

Postgres group by quarter

I have a table with columns: topic, person, published_date. I would like to create a query which helps me compute how many times every person wrote on a specific topic in every quarter. Example:
topic person published_date
'world' JS 2016-05-05
'world' JS 2016-05-10
'nature' AR 2016-12-01
should return something like
topic person quarter how_many_times
'world' JS 2 2
'nature' AR 4 1
I'm able to group it by topic and person
select topic, person, published_date, count(*) from table group by topic, person, published_date
but how do I group published_date into quarters?
Assuming that the published_date is a date type column you can use the extract function like this:
select
topic,
person,
extract(quarter from published_date) as quarter,
count(*)
from
table1
group by
topic,
person,
extract(quarter from published_date)
order by
extract(quarter from published_date) asc
If the dates can fall into different years you might want to add the year to the select and group by.
If you want both quarter and year you can use date_trunc:
SELECT
date_trunc('quarter', published_date) AS quarter
This gives the date rounded to the start of the quarter, e.g. 2020-04-01, and has the advantage that subsequent steps in the pipeline can read it like a normal date.
(This compares to extract (extract(quarter FROM published_date)), which gives the number of the quarter, i.e. 1, 2, 3, or 4.)
If someone also needs the year, they can use this:
SELECT (extract(year from published_date)::text || '.Q' || extract(quarter from published_date)::text) as quarter
This will return the value in the form 2018.Q1.
How does it work? It extracts both year and quarter (which are numeric values: double precision before PostgreSQL 14, numeric since), casts them into strings, and concatenates the whole into something readable.
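The arithmetic behind the three variants (quarter number, start of quarter, and a year-plus-quarter label) can be sketched in Python as follows. The function names are made up for the example, and the sample dates come from the question.

```python
from datetime import date

def quarter_number(d):
    """Quarter of the year, 1 to 4, like extract(quarter from ...)."""
    return (d.month - 1) // 3 + 1

def quarter_start(d):
    """First day of the quarter, like date_trunc('quarter', ...)."""
    return date(d.year, 3 * (quarter_number(d) - 1) + 1, 1)

def quarter_label(d):
    """Readable label combining year and quarter, in the form 2018.Q1."""
    return f"{d.year}.Q{quarter_number(d)}"

for d in (date(2016, 5, 5), date(2016, 12, 1)):
    print(d, quarter_number(d), quarter_start(d), quarter_label(d))
```

Grouping by the quarter start or by the label keeps different years apart, while grouping by the quarter number alone merges the same quarter across years.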