PostgreSQL Select Sum of Number, Group by Date - postgresql

I'm not sure whether a question like this already exists; I couldn't find a solution, so sorry if this is a duplicate:
I have a table with dates:
|date (date)| tax (numeric) | (stuff in brackets is the type)
|2012-12-12 | 5.00 |
|2012-12-12 | 10.00 |
|2012-12-13 | 2.00 |
I want my output to look like this:
|date (date)| tax (numeric) | (stuff in brackets is the type)
|2012-12-12 | 15.00 |
|2012-12-13 | 2.00 |
I was thinking of doing a CTE-type query, because in the database I store things as datetime without time zone, along with a bunch of taxes.
This is the start of my query:
with date_select as
(
select CAST(r.datetime as DATE), sum(r.tax) as tax
from suezensalon.receipt r
where r.datetime between '2012-12-12 00:00:00' and '2012-12-18 23:59:59'
group by r.datetime
order by r.datetime
)
This gives me the top table. What is the best way to do this? Is it by 'averaging the date'?

This is what ended up working:
with date_select as
(
select CAST(r.datetime as DATE), sum(r.tax) as tax
from suezensalon.receipt r
where r.datetime between '2012-12-12 00:00:00' and '2012-12-18 23:59:59'
group by r.datetime
order by r.datetime
)
select extract(Month from datetime) || '-' || extract(Day from datetime) || '-' || extract(Year from datetime) as Date, sum(tax)
from date_select
group by Date
order by Date;

I think the simplest option, and most likely to perform well in the presence of indexes, would be something like this:
SELECT
datetime::date as "Date",
sum(tax)
FROM suezensalon.receipt
WHERE datetime >= '2012-12-12 00:00:00'
AND datetime < '2012-12-19 00:00:00'
GROUP BY datetime::date
ORDER BY "Date";
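The half-open range plus grouping on the date part can be tried out without a PostgreSQL server. The sketch below is only a stand-in under stated assumptions: it uses Python's built-in sqlite3 module, a hypothetical in-memory receipt table, and SQLite's date() function in place of datetime::date. The times of day and the 99.00 row are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for suezensalon.receipt; timestamps stored as ISO text.
conn.execute("CREATE TABLE receipt (datetime TEXT, tax REAL)")
conn.executemany(
    "INSERT INTO receipt VALUES (?, ?)",
    [
        ("2012-12-12 09:15:00", 5.00),
        ("2012-12-12 17:40:00", 10.00),
        ("2012-12-13 11:05:00", 2.00),
        ("2012-12-19 00:00:00", 99.00),  # excluded by the exclusive upper bound
    ],
)

rows = conn.execute(
    """
    SELECT date(datetime) AS day, sum(tax) AS tax
    FROM receipt
    WHERE datetime >= '2012-12-12 00:00:00'
      AND datetime <  '2012-12-19 00:00:00'
    GROUP BY date(datetime)
    ORDER BY day
    """
).fetchall()

for day, tax in rows:
    print(day, tax)
```

Grouping on the date expression rather than on the raw timestamp is what collapses the two receipts from 2012-12-12 into one row.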

Related

MySQL group by timestamp difference

I need to write mysql query which will group results by difference between timestamps.
Is it possible?
I have table with locations and every row has created_at (timestamp) and I want to group results by difference > 1min.
Example:
id | lat | lng | created_at
1. | ... | ... | 2020-05-03 06:11:35
2. | ... | ... | 2020-05-03 06:11:37
3. | ... | ... | 2020-05-03 06:11:46
4. | ... | ... | 2020-05-03 06:12:48
5. | ... | ... | 2020-05-03 06:12:52
Result of this data should be 2 groups (1,2,3) and (4,5)
It depends on what you actually want. If you want to group together records that belong to the same minute, regardless of the difference from the previous record, then simple aggregation is enough:
select
date_format(created_at, '%Y-%m-%d %H:%i:00') date_minute,
min(id) min_id,
max(id) max_id,
min(created_at) min_created_at,
max(created_at) max_created_at,
count(*) no_records
from mytable
group by date_minute
On the other hand, if you want to build groups of consecutive records with a gap of less than 1 minute between them, this is a gaps-and-islands problem. Here is one way to solve it using window functions (available in MySQL 8.0):
select
min(id) min_id,
max(id) max_id,
min(created_at) min_created_at,
max(created_at) max_created_at,
count(*) no_records
from (
select
t.*,
sum(case when created_at < lag_created_at + interval 1 minute then 0 else 1 end)
over(order by created_at) grp
from (
select
t.*,
lag(created_at) over(order by created_at) lag_created_at
from mytable t
) t
) t
group by grp
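To see what the lag() comparison and the running sum are doing, the same rule can be traced in plain Python. This is a minimal sketch, not the MySQL implementation: the function name group_by_gap and its max_gap parameter are hypothetical, and the rows are the ones from the question.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (id, created_at)
rows = [
    (1, "2020-05-03 06:11:35"),
    (2, "2020-05-03 06:11:37"),
    (3, "2020-05-03 06:11:46"),
    (4, "2020-05-03 06:12:48"),
    (5, "2020-05-03 06:12:52"),
]

def group_by_gap(rows, max_gap=timedelta(minutes=1)):
    """Return lists of ids; a new list starts when the gap to the previous row is >= max_gap."""
    parsed = sorted(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), id_) for id_, ts in rows
    )
    groups = []
    previous = None  # plays the role of lag(created_at)
    for created_at, id_ in parsed:
        # Same test as the CASE expression: start a new island unless
        # created_at < previous + max_gap.
        if previous is None or created_at >= previous + max_gap:
            groups.append([])
        groups[-1].append(id_)
        previous = created_at
    return groups

groups = group_by_gap(rows)
print(groups)
```

Each time the condition fires, the SQL running sum increases by one, which is what gives every island its own grp value.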

First record by month & by year

I have a Rails application with 20+ years of data.
I'm struggling to create two SQLs:
Fetch the first record of each year (based on filters)
Fetch the first record of each month (based on filters)
I made a DBFiddle here: https://www.db-fiddle.com/f/wjQqrrpaJeiYG8zkExbaos/0
For the first query (yearly), the result should be:
a | b_id | created_at
74780 | 82373 | 2020-01-02 01:34:33 +0000
15670 | 16639 | 2019-02-24 14:33:56 +0000
14586 | 87594 | 2018-01-06 09:14:31 +0000
I can fetch the years and months using date_part('year', created_at) and date_part('month', created_at), but didn't find a way to "glue" them with min(created_at).
Try using a window function with OVER:
with grouped as(
select *, min(created_at) over(partition by date_trunc('year', created_at))
from z order by date_trunc('year', created_at) desc
)
select a, b_id, created_at from grouped where min = created_at
For the first record by month you can use the same approach by replacing all date_trunc('year', created_at) with date_trunc('month', created_at)
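The window-function approach can be checked on a small data set. The sketch below assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module, uses strftime('%Y', ...) in place of date_trunc('year', ...), and adds two made-up later rows to the three rows shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE z (a INTEGER, b_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO z VALUES (?, ?, ?)",
    [
        (74780, 82373, "2020-01-02 01:34:33"),
        (90001, 90002, "2020-03-15 10:00:00"),  # hypothetical later row
        (15670, 16639, "2019-02-24 14:33:56"),
        (90003, 90004, "2019-11-30 08:00:00"),  # hypothetical later row
        (14586, 87594, "2018-01-06 09:14:31"),
    ],
)

first_per_year = conn.execute(
    """
    WITH grouped AS (
        SELECT a, b_id, created_at,
               min(created_at) OVER (
                   PARTITION BY strftime('%Y', created_at)
               ) AS first_created_at
        FROM z
    )
    SELECT a, b_id, created_at
    FROM grouped
    WHERE created_at = first_created_at
    ORDER BY created_at DESC
    """
).fetchall()

for row in first_per_year:
    print(row)
```

Changing the partition expression to strftime('%Y-%m', created_at) gives the first record of each month in this stand-in.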

PostgreSQL Marketing Report

I'm writing out a query that takes ad marketing data from Google Ads, Microsoft, and Taboola and merges it into one table.
The table should have 3 rows, one for each ad company, with 4 columns: traffic source (ad company), money spent, sales, and cost per conversion. Right now I'm just dealing with the first 2 until I get those right. All of the table's data should be restricted to a given month.
Right now I'm getting multiple rows for each traffic source, and some of them merge months of data into the cost column instead of summing the costs within a given month.
WITH google_ads AS
( SELECT 'Google' AS traffic_source,
date_trunc('month', "day"::date) AS month,
SUM(cost / 1000000) AS cost
FROM googleads_campaign AS g
GROUP BY month
ORDER BY month DESC),
taboola AS
( SELECT 'Taboola' AS traffic_source,
date_trunc('month', "date"::date) AS month,
SUM(spent) AS cost
FROM taboola_campaign AS t
GROUP BY month
ORDER BY month DESC),
microsoft AS
( SELECT 'Microsoft' AS traffic_source,
date_trunc('month', "TimePeriod"::date) AS month,
SUM("Spend") AS cost
FROM microsoft_campaign AS m
GROUP BY month
ORDER BY month DESC)
SELECT (CASE
WHEN M.traffic_source='Microsoft' THEN M.traffic_source
WHEN T.traffic_source='Taboola' THEN T.traffic_source
WHEN G.traffic_source='Google' THEN G.traffic_source
END) AS traffic_source1,
SUM(CASE
WHEN G.traffic_source='Google' THEN G.cost
WHEN T.traffic_source='Taboola' THEN T.cost
WHEN M.traffic_source='Microsoft' THEN M.cost
END) AS cost,
(CASE
WHEN G.traffic_source='Google' THEN G.month
WHEN T.traffic_source='Taboola' THEN T.month
WHEN M.traffic_source='Microsoft' THEN M.month
END) AS month1
FROM google_ads G
LEFT JOIN taboola T ON G.month = T.month
LEFT JOIN microsoft M ON G.month = M.month
GROUP BY traffic_source1, month1
Here's an example of the results I'm getting. The month column is simply for testing purposes.
| traffic_source1 | cost | month1 |
|:----------------|:-----------|:---------------|
| Google | 210.00 | 01/09/18 00:00 |
| Google | 1,213.00 | 01/10/18 00:00 |
| Google | 2,481.00 | 01/11/18 00:00 |
| Google | 3,503.00 | 01/12/18 00:00 |
| Google | 7,492.00 | 01/01/19 00:00 |
| Microsoft | 22,059.00 | 01/02/19 00:00 |
| Microsoft | 16,958.00 | 01/03/19 00:00 |
| Microsoft | 7,582.00 | 01/04/19 00:00 |
| Microsoft | 76,125.00 | 01/05/19 00:00 |
| Taboola | 37,205.00 | 01/06/19 00:00 |
| Google | 45,910.00 | 01/07/19 00:00 |
| Google | 137,421.00 | 01/08/19 00:00 |
| Google | 29,501.00 | 01/09/19 00:00 |
Instead, it should look like this (Let's say for the month of July this year, for instance):
| traffic_source | cost |
|----------------|-----------|
| Google | 53,901.00 |
| Microsoft | 22,059.00 |
| Taboola | 37,205.00 |
Any help would be greatly appreciated, thanks!
You can try it this way, filtering on the desired month with a HAVING clause at the end:
WITH google_ads AS
( SELECT 'Google' AS traffic_source,
date_trunc('month', "day"::date) AS month,
SUM(cost / 1000000) AS cost
FROM googleads_campaign AS g
GROUP BY month
ORDER BY month DESC),
taboola AS
( SELECT 'Taboola' AS traffic_source,
date_trunc('month', "date"::date) AS month,
SUM(spent) AS cost
FROM taboola_campaign AS t
GROUP BY month
ORDER BY month DESC),
microsoft AS
( SELECT 'Microsoft' AS traffic_source,
date_trunc('month', "TimePeriod"::date) AS month,
SUM("Spend") AS cost
FROM microsoft_campaign AS m
GROUP BY month
ORDER BY month DESC)
SELECT (CASE
WHEN M.traffic_source='Microsoft' THEN M.traffic_source
WHEN T.traffic_source='Taboola' THEN T.traffic_source
WHEN G.traffic_source='Google' THEN G.traffic_source
END) AS traffic_source1,
SUM(CASE
WHEN G.traffic_source='Google' THEN G.cost
WHEN T.traffic_source='Taboola' THEN T.cost
WHEN M.traffic_source='Microsoft' THEN M.cost
END) AS cost,
(CASE
WHEN G.traffic_source='Google' THEN G.month
WHEN T.traffic_source='Taboola' THEN T.month
WHEN M.traffic_source='Microsoft' THEN M.month
END) AS month1
FROM google_ads G
LEFT JOIN taboola T ON G.month = T.month
LEFT JOIN microsoft M ON G.month = M.month
GROUP BY traffic_source1, month1
HAVING EXTRACT(month from month1) = ... -- desired month number (July is 7)
Using a different table for each ad source is really a very bad idea. It vastly compounds the complexity of queries requiring consolidation. It would be much better to have a single table holding the source along with the other columns. Consider what happens when marketing decides to use 30-40 or more ad suppliers. If you cannot create a single table, then at least standardize column names and types. Also build a view, a materialized view, or a table function (below) which combines all the traffic sources into a single source.
create or replace function consolidated_ad_traffic()
returns table( traffic_source text
, ad_month timestamp with time zone
, ad_cost numeric(11,2)
, ad_sales numeric(11,2)
, conversion_cost numeric(11,6)
)
language sql
AS $$
with ad_sources as
( select 'Google' as traffic_source
, "date" as ad_date
, round(cast (cost AS numeric ) / 1000000.0,2) as cost
, sales
, cost_per_conversion
from googleads_campaign
union all
select 'Taboola'
, "date"
, spent
, sales
, cost_per_conversion
from taboola_campaign
union all
select 'Microsoft'
, "TimePeriod"
, "Spend"
, sales
, cost_per_conversion
from microsoft_campaign
)
select * from ad_sources;
$$;
With a consolidated view of the data you can now write normal selects as though you had a single table, such as:
select * from consolidated_ad_traffic();
select distinct on( traffic_source, to_char(ad_month, 'mm'))
traffic_source
, to_char(ad_month, 'Mon') "For Month"
, to_char(sum(ad_cost) over(partition by traffic_source, to_char(ad_month, 'Mon')), 'FM99,999,999,990.00') monthly_traffic_cost
, to_char(sum(ad_cost) over(partition by traffic_source), 'FM99,999,999,990.00') total_traffic_cost
from consolidated_ad_traffic();
select traffic_source, sum(ad_cost) ad_cost
from consolidated_ad_traffic()
group by traffic_source
order by traffic_source;
select traffic_source
, to_char(ad_month, 'dd-Mon') "For Month"
, sum(ad_cost) "Monthly Cost"
from consolidated_ad_traffic()
where date_trunc('month',ad_month) = date_trunc('month', date '2019-07-01')
and traffic_source = 'Google'
group by traffic_source, to_char(ad_month, 'dd-Mon') ;
Now this won't do much for updating but will drastically ease selection.
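The consolidation idea can be sketched end to end on toy data. The example below is an assumption-heavy stand-in: it uses Python's sqlite3 module, a plain view instead of a PostgreSQL table function, only the date and cost columns, and entirely made-up rows. The Google cost is assumed to be stored in micros, as the division by 1000000 in the question suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE googleads_campaign ("day" TEXT, cost INTEGER);
    CREATE TABLE taboola_campaign ("date" TEXT, spent REAL);
    CREATE TABLE microsoft_campaign ("TimePeriod" TEXT, "Spend" REAL);

    -- Made-up rows, only to exercise the query.
    INSERT INTO googleads_campaign VALUES
        ('2019-07-01', 1500000), ('2019-07-02', 2500000), ('2019-08-01', 9000000);
    INSERT INTO taboola_campaign VALUES ('2019-06-30', 8.0), ('2019-07-03', 3.5);
    INSERT INTO microsoft_campaign VALUES ('2019-07-10', 1.25), ('2019-07-11', 2.25);

    -- One row per source row, with standardized column names.
    CREATE VIEW consolidated_ad_traffic AS
        SELECT 'Google' AS traffic_source, "day" AS ad_date,
               cost / 1000000.0 AS ad_cost
        FROM googleads_campaign
        UNION ALL
        SELECT 'Taboola', "date", spent FROM taboola_campaign
        UNION ALL
        SELECT 'Microsoft', "TimePeriod", "Spend" FROM microsoft_campaign;
    """
)

july_costs = conn.execute(
    """
    SELECT traffic_source, sum(ad_cost) AS ad_cost
    FROM consolidated_ad_traffic
    WHERE ad_date >= '2019-07-01' AND ad_date < '2019-08-01'
    GROUP BY traffic_source
    ORDER BY traffic_source
    """
).fetchall()

for source, cost in july_costs:
    print(source, cost)
```

Because every source contributes rows to one relation, the monthly report is a single GROUP BY with one row per traffic source, with no joins or CASE expressions.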

Postgresql: get first item of an ordered group not working [duplicate]

I have this data:
| id | person_id | date |
|--------|-----------|---------------------|
| 313962 | 1111111 | 2016-04-14 16:00:00 | --> this row
| 313946 | 2222222 | 2015-03-13 15:00:00 | --> this row
| 313937 | 1111111 | 2014-02-12 14:00:00 |
| 313944 | 1111111 | 2013-01-11 13:00:00 |
| ...... | ....... | ................... |
-What I would like to select are the indicated rows, i.e. the rows with the most recent date for each person_id.
-Also the output format for the date must be dd-mm-YYYY
So far I was trying with this:
SELECT
l.person_id,
to_char(DATE(l.date), 'dd-mm-YYYY') AS user_date
FROM login l
group by l.person_id
order by l.date desc
I was trying different approaches, but I have all kind of Aggregation error messages such as:
for select distinct order by expressions must appear
And
must appear in the GROUP BY clause or be used in an aggregate function
Any idea?
There are several ways, but the simplest (and perhaps the most efficient, though it is not standard SQL) is to rely on PostgreSQL's DISTINCT ON:
SELECT DISTINCT ON (person_id )
id, person_id , date
FROM login
ORDER BY person_id , date desc
The date formatting (do you really want that?) can be done in an outer select:
SELECT id,person_id, to_char(DATE(date), 'dd-mm-YYYY') as date
FROM (
SELECT DISTINCT ON (person_id )
id, person_id , date
FROM login
ORDER BY person_id, date desc )
AS XXX;
You can do it with a subquery, something like this:
SELECT
l.person_id,
to_char(DATE(l.date), 'dd-mm-YYYY') AS user_date
FROM login l
where l.date = (select max(date) from login where person_id = l.person_id)
order by l.person_id
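The correlated-subquery variant is portable enough to run on SQLite, which makes it easy to check against the sample rows. This is a sketch under assumptions: Python's sqlite3 module stands in for PostgreSQL, strftime('%d-%m-%Y', ...) replaces to_char(...), and the id column is added to the output to show which row was picked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login (id INTEGER, person_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO login VALUES (?, ?, ?)",
    [
        (313962, 1111111, "2016-04-14 16:00:00"),
        (313946, 2222222, "2015-03-13 15:00:00"),
        (313937, 1111111, "2014-02-12 14:00:00"),
        (313944, 1111111, "2013-01-11 13:00:00"),
    ],
)

latest = conn.execute(
    """
    SELECT l.id, l.person_id, strftime('%d-%m-%Y', l.date) AS user_date
    FROM login l
    WHERE l.date = (SELECT max(date) FROM login WHERE person_id = l.person_id)
    ORDER BY l.person_id
    """
).fetchall()

for row in latest:
    print(row)
```

If two rows for the same person share the maximum date, this form returns both of them, whereas DISTINCT ON keeps exactly one.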
You need something like the following to know which date to grab for each person. Joining on the date as well as on the person keeps only the most recent row per person:
select l.person_id, to_char(DATE(d.maxdate), 'dd-mm-YYYY')
from login l
inner join
(select person_id, max(date) as maxdate
from login group by person_id) d on l.person_id = d.person_id and l.date = d.maxdate
order by d.maxdate desc

Compare interval date by row

I am trying to group dates within a 1 year interval for a given identifier, labeling which is the earliest date and which is the latest date. If there are no dates within a 1 year interval from that date, then it will record its own date as the first and last date. For example, originally the data is:
id | date
____________
a | 1/1/2000
a | 1/2/2001
a | 1/6/2000
b | 1/3/2001
b | 1/3/2000
b | 1/3/1999
c | 1/1/2000
c | 1/1/2002
c | 1/1/2003
And the output I want is:
id | first_date | last_date
___________________________
a | 1/1/2000 | 1/2/2001
b | 1/3/1999 | 1/3/2001
c | 1/1/2000 | 1/1/2000
c | 1/1/2002 | 1/1/2003
I have been trying to figure this out the whole day without success. I can do it for ids with only 2 duplicates, but not for more. Any help would be great.
SELECT id
, min(min_date) AS min_date
, max(max_date) AS max_date
, sum(row_ct) AS row_ct
FROM (
SELECT id, year, min_date, max_date, row_ct
, year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
FROM (
SELECT id
, extract(year FROM the_date)::int AS year
, min(the_date) AS min_date
, max(the_date) AS max_date
, count(*) AS row_ct
FROM tbl
GROUP BY id, year
) sub1
) sub2
GROUP BY id, grp
ORDER BY id, grp;
1) Group all rows per (id, year), in subquery sub1. Record min and max of the date. I added a count of rows (row_ct) for demonstration.
2) Subtract the row_number() from the year in the second subquery sub2. Thus, all rows in succession end up in the same group (grp). A gap in the years starts a new group.
3) In the final SELECT, group a second time, this time by (id, grp), and record min, max and row count again. Voilà. This produces exactly the result you are looking for.
-> SQLfiddle demo.
Related answers:
Return array of years as year ranges
Group by repeating attribute
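The three steps of the first query above can be traced in plain Python on the question's data. This is a minimal sketch, with assumptions: the function name year_ranges is hypothetical, and the sample dates are read as month/day/year. Note that, like the SQL, it groups runs of consecutive calendar years rather than measuring an exact 365-day distance between dates.

```python
from datetime import date
from itertools import groupby

# Sample data from the question, read as month/day/year (an assumption).
rows = [
    ("a", date(2000, 1, 1)), ("a", date(2001, 1, 2)), ("a", date(2000, 1, 6)),
    ("b", date(2001, 1, 3)), ("b", date(2000, 1, 3)), ("b", date(1999, 1, 3)),
    ("c", date(2000, 1, 1)), ("c", date(2002, 1, 1)), ("c", date(2003, 1, 1)),
]

def year_ranges(rows):
    # Step 1: min and max date per (id, year), like subquery sub1.
    per_year = {}
    for id_, d in rows:
        lo, hi = per_year.get((id_, d.year), (d, d))
        per_year[(id_, d.year)] = (min(lo, d), max(hi, d))

    # Step 2: year minus row number stays constant within a run of
    # consecutive years, so it serves as the group key (grp).
    keyed = []
    for id_, items in groupby(sorted(per_year.items()), key=lambda kv: kv[0][0]):
        for row_number, ((_, year), (lo, hi)) in enumerate(items, start=1):
            keyed.append(((id_, year - row_number), lo, hi))

    # Step 3: collapse each (id, grp) run into one first/last pair.
    result = []
    for (id_, _), run in groupby(keyed, key=lambda k: k[0]):
        run = list(run)
        result.append((id_, min(r[1] for r in run), max(r[2] for r in run)))
    return result

ranges = year_ranges(rows)
for id_, first_date, last_date in ranges:
    print(id_, first_date, last_date)
```

For id c, the missing year 2001 changes the value of year minus row number, which is why 2000 ends up alone while 2002 and 2003 are merged.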
select id, min ([date]) first_date, max([date]) last_date
from <yourTbl> group by id
Use this (SQLFiddle Demo):
SELECT id,
min(date) AS first_date,
max(date) AS last_date
FROM mytable
GROUP BY 1
ORDER BY 1