How do I write a GROUP BY query in PostgreSQL?

I'm getting errors from PostgreSQL when writing a GROUP BY query. I'm sure someone will tell me to put all the columns I've selected in the GROUP BY, but that will not give me the correct results.
I'm writing a query that selects all the vehicles in the database and groups the results by vehicle, giving me the total distance and cost for a given period.
Here is how I'm writing the query:
SELECT i.vehicle AS vehicle,
i.costcenter AS costCenter,
i.department AS department,
SUM(i.quantity) AS liters,
SUM(i.totalcost) AS Totalcost,
v.model AS model,
v.vtype AS vtype
FROM fuelissuances AS i
LEFT JOIN vehicles AS v ON i.vehicle = v.id
WHERE i.dates::text LIKE '%2019-03%' AND i.deleted_at IS NULL
GROUP BY i.vehicle;
If I put all the columns from the SELECT list in the GROUP BY, the results will not be correct.
How do I go about this without putting all the columns in the GROUP BY and without creating sub-queries?
The fuel table looks like:
vehicle dates department quantity totalcost
1 2019-01-01 102 12 1200
1 2019-01-05 102 15 1500
1 2019-01-13 102 18 1800
1 2019-01-22 102 10 1000
2 2019-01-01 102 12 1260
2 2019-01-05 102 19 1995
2 2019-01-13 102 28 2940
Vehicle Table
id model vtype
1 1 2
2 4 6
2 5 7
These are the results I expect from the query:
vehicle dates department quantity totalcost model vtype
1 2019-01-01 102 12 1200 1 2
1 2019-01-05 102 15 1500 1 2
1 2019-01-13 102 18 1800 1 2
1 2019-01-22 102 10 1000 1 2
1 2019-01-18 102 10 1000 1 2
1 65 6500
2 2019-01-01 102 12 1260 5 7
2 2019-01-05 102 19 1995 5 7
2 2019-01-13 102 28 2940 5 7
1 45 6195

Your query doesn't really make sense. Apparently there can be multiple departments and costcenters per vehicle in the fuelissuances table - which of those should be returned?
One way to deal with that is to return all of them, e.g. as an array:
SELECT i.vehicle,
array_agg(i.costcenter) as costcenters,
array_agg(i.department) as departments,
SUM(i.quantity) AS liters,
SUM(i.totalcost) AS Totalcost,
v.model,
v.vtype
FROM fuelissuances AS i
LEFT JOIN vehicles AS v ON i.vehicle = v.id
WHERE i.dates >= date '2019-03-01'
and i.dates < date '2019-04-01'
AND i.deleted_at IS NULL
group by i.vehicle, v.model, v.vtype;
Instead of an array, you could also return a comma-separated list of those values, e.g. string_agg(i.costcenter, ',') as costcenters (string_agg expects text, so cast the column with ::text if it has another type).
Adding the columns v.model and v.vtype to the group by won't (shouldn't) change anything: each i.vehicle joins to a single row in vehicles, so model and vtype are constant within each group.
Note that I removed the useless aliases and replaced the condition on the date with a proper range condition that can make use of an index on the dates column.
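To make the grouping concrete, here is a minimal sketch that runs the same shape of query against the January sample data. It uses Python's sqlite3 module with an in-memory database as a stand-in for PostgreSQL, which is an assumption of the sketch; the array_agg columns are left out because SQLite has no array type, the vehicles table is assumed to hold one row per id, and the February row is hypothetical.

```python
import sqlite3

# SQLite (in memory) stands in for PostgreSQL; this is an assumption of the sketch.
conn = sqlite3.connect(":memory:")

# vehicles: one row per id is assumed (the sample in the question lists id 2 twice).
# The last fuelissuances row (February) is hypothetical and only shows the range filter.
conn.executescript("""
CREATE TABLE vehicles (id INTEGER PRIMARY KEY, model INTEGER, vtype INTEGER);
CREATE TABLE fuelissuances (
    vehicle INTEGER, dates TEXT, quantity INTEGER,
    totalcost INTEGER, deleted_at TEXT
);
INSERT INTO vehicles VALUES (1, 1, 2), (2, 5, 7);
INSERT INTO fuelissuances VALUES
    (1, '2019-01-01', 12, 1200, NULL),
    (1, '2019-01-05', 15, 1500, NULL),
    (1, '2019-01-13', 18, 1800, NULL),
    (1, '2019-01-22', 10, 1000, NULL),
    (2, '2019-01-01', 12, 1260, NULL),
    (2, '2019-01-05', 19, 1995, NULL),
    (2, '2019-01-13', 28, 2940, NULL),
    (2, '2019-02-02', 50, 5250, NULL);
""")

# Every non-aggregated column in the SELECT list also appears in GROUP BY.
# ISO-formatted date strings compare correctly as text in SQLite.
query = """
SELECT i.vehicle,
       SUM(i.quantity)  AS liters,
       SUM(i.totalcost) AS totalcost,
       v.model,
       v.vtype
FROM fuelissuances AS i
LEFT JOIN vehicles AS v ON i.vehicle = v.id
WHERE i.dates >= '2019-01-01'
  AND i.dates <  '2019-02-01'
  AND i.deleted_at IS NULL
GROUP BY i.vehicle, v.model, v.vtype
ORDER BY i.vehicle
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each vehicle comes back as a single row, and adding v.model and v.vtype to the GROUP BY does not split the groups because they are fixed per vehicle.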
Edit
Based on your new sample data, you want a running total rather than a "regular" aggregation. This can easily be done using window functions:
SELECT i.vehicle,
i.costcenter,
i.department,
SUM(i.quantity) over (w) AS liters,
SUM(i.totalcost) over (w) AS Totalcost,
v.model,
v.vtype
FROM fuelissuances AS i
LEFT JOIN vehicles AS v ON i.vehicle = v.id
WHERE i.dates >= date '2019-01-01'
and i.dates < date '2019-02-01'
AND i.deleted_at IS NULL
window w as (partition by i.vehicle order by i.dates)
order by i.vehicle, i.dates;
I would not create those "total" lines using SQL, but rather in the front end that displays the data.
Online example: https://rextester.com/CRJZ27446
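For readers who cannot open the online example, the running total can be sketched locally as follows. This again uses Python's sqlite3 (version 3.25 or later for window functions) as an assumed stand-in for PostgreSQL, and only the fuel table from the question with a reduced column set.

```python
import sqlite3

# SQLite (in memory) stands in for PostgreSQL; window functions need SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fuelissuances (
    vehicle INTEGER, dates TEXT, quantity INTEGER, totalcost INTEGER
);
INSERT INTO fuelissuances VALUES
    (1, '2019-01-01', 12, 1200),
    (1, '2019-01-05', 15, 1500),
    (1, '2019-01-13', 18, 1800),
    (1, '2019-01-22', 10, 1000),
    (2, '2019-01-01', 12, 1260),
    (2, '2019-01-05', 19, 1995),
    (2, '2019-01-13', 28, 2940);
""")

# The named window restarts the sum for each vehicle and accumulates by date.
query = """
SELECT i.vehicle,
       i.dates,
       SUM(i.quantity)  OVER w AS liters,
       SUM(i.totalcost) OVER w AS totalcost
FROM fuelissuances AS i
WINDOW w AS (PARTITION BY i.vehicle ORDER BY i.dates)
ORDER BY i.vehicle, i.dates
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The last row of each vehicle carries that vehicle's total, which is the figure a front end could display as the "total" line.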

You need to use nested queries to get the SUMs you want inside that query (TABLES and CONDITIONS are placeholders):
SELECT i.vehicle AS vehicle,
i.costcenter AS costCenter,
i.department AS department,
(SELECT SUM(i.quantity) FROM TABLES WHERE CONDITIONS GROUP BY vehicle) AS liters,
(SELECT SUM(i.totalcost) FROM TABLES WHERE CONDITIONS GROUP BY vehicle) AS Totalcost,
v.model AS model,
v.vtype AS vtype
FROM fuelissuances AS i
LEFT JOIN vehicles AS v ON i.vehicle = v.id
WHERE i.dates::text LIKE '%2019-03%' AND i.deleted_at IS NULL;

Related

Calculate Subtotals using fixed categories in postgresql query

I have a query that returns the count and sum of certain fields on my tables, and also a total. It goes like this:
example:
with foo as(
select s.subcat as subcategory,
sum(s.cost) as costs,
round(((sum(s.cost) / (s.tl)::numeric )*100),2)|| ' %' as pct_cost
from (select ...big query)s group by s.subcat
)
select * from foo
union
select 'Total costs' as subcategory,
sum(costs) as costs,
null as pct_cost
from foo
order by...
Category         Cost  Percentage
x_subcategory 1  5     0.5%
x_subcategory 2  1     0.1%
x_subcategory 3  18    1.8%
y_subcategory 1  7     0.7%
y_subcategory 2  10    1.0%
...              ...   ...
Total            41    5.8%
What I need to do for another report is to get the totals by category. I have to assign these categories based on the value of the subcategory name. The question is how to partition the result so I can get something like this:
Category         Cost  Percentage
x_subcategory 1  5     0.5%
x_subcategory 2  1     0.1%
x_subcategory 3  18    1.8%
X category       24    2.4%
y_subcategory 1  7     0.7%
y_subcategory 2  10    1.0%
Y category       17    1.7%
With GROUP BY and GROUP BY GROUPING SETS I don't get what I want, and with PARTITION I get syntax errors. I'm able to use it in simpler queries, but this one turned out to be very complicated, and I wonder whether it's possible to achieve this in a single PostgreSQL query.
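One possible shape for this, sketched under several assumptions: the table foo stands in for the output of the foo CTE, the category is derived from the first letter of the subcategory name (a hypothetical rule), percentages are left out because the total behind them is not shown, and Python's sqlite3 stands in for PostgreSQL. The subtotal rows are appended with UNION ALL and sorted into place with a helper flag.

```python
import sqlite3

# SQLite (in memory) stands in for PostgreSQL; "foo" mimics the output of the foo CTE.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo (subcategory TEXT, costs INTEGER);
INSERT INTO foo VALUES
    ('x_subcategory 1', 5),
    ('x_subcategory 2', 1),
    ('x_subcategory 3', 18),
    ('y_subcategory 1', 7),
    ('y_subcategory 2', 10);
""")

# Hypothetical rule: the category is the upper-cased first letter of the subcategory.
# is_subtotal sorts each category's subtotal row after its detail rows.
query = """
WITH tagged AS (
    SELECT UPPER(SUBSTR(subcategory, 1, 1)) AS category, subcategory, costs
    FROM foo
)
SELECT label, costs
FROM (
    SELECT category, 0 AS is_subtotal, subcategory AS label, costs
    FROM tagged
    UNION ALL
    SELECT category, 1, category || ' category', SUM(costs)
    FROM tagged
    GROUP BY category
) AS combined
ORDER BY category, is_subtotal, label
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In PostgreSQL, GROUP BY ROLLUP over the category and subcategory might produce the same subtotal rows in one pass; the UNION ALL form is used here only because it is easy to run and check.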

How to select only one result per condition met inside an individual table (no joins)?

I have a table containing all the trips taken by different cars. I've filtered this table down to trips that had multiple stops. Now all I want to do is get the first stop that each car had.
What I've got is:
Car ID
Date_depart
Date_arrive
Count (from a previous table creation)
I've filtered this table by using Car ID + Date_depart and making a count where there are multiple date_arrives for a single date_depart. Now I'm trying to figure out how to get back only the first stop, but I'm completely stuck. Apart from the lateral join X, order by Z limit 1 method, I have no idea how to get back only the first result in this table.
Here's some sample data:
Car ID Date_depart Date_arrive Count
949 2017-01-01 2017-01-05 2
949 2017-01-01 2017-01-09 2
1940 2017-01-09 2017-01-11 3
1940 2017-01-09 2017-01-14 3
1940 2017-01-09 2017-01-28 3
949 2018-04-19 2018-04-23 2
949 2018-04-19 2018-04-26 2
and the expected result would be:
Car ID Date_depart Date_arrive Count
949 2017-01-01 2017-01-05 2
1940 2017-01-09 2017-01-11 3
949 2018-04-19 2018-04-23 2
Any help?

You need DISTINCT ON:
SELECT DISTINCT ON (date_depart, car_id)
*
FROM
trips
ORDER BY date_depart, car_id, date_arrive
This gives you the first row (in date_arrive order) of each (date_depart, car_id) group.
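DISTINCT ON is specific to PostgreSQL. As a runnable illustration of the same "first row per group" idea, the sketch below swaps in ROW_NUMBER() and keeps only the rows numbered 1. It uses Python's sqlite3 as an assumed stand-in, and the column name cnt replaces Count for convenience.

```python
import sqlite3

# SQLite (in memory) stands in for PostgreSQL; it has no DISTINCT ON,
# so ROW_NUMBER() is used as a substitute technique.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (car_id INTEGER, date_depart TEXT, date_arrive TEXT, cnt INTEGER);
INSERT INTO trips VALUES
    (949,  '2017-01-01', '2017-01-05', 2),
    (949,  '2017-01-01', '2017-01-09', 2),
    (1940, '2017-01-09', '2017-01-11', 3),
    (1940, '2017-01-09', '2017-01-14', 3),
    (1940, '2017-01-09', '2017-01-28', 3),
    (949,  '2018-04-19', '2018-04-23', 2),
    (949,  '2018-04-19', '2018-04-26', 2);
""")

# Number the stops within each (car_id, date_depart) group by arrival date,
# then keep the earliest one.
query = """
SELECT car_id, date_depart, date_arrive, cnt
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY car_id, date_depart
               ORDER BY date_arrive
           ) AS rn
    FROM trips AS t
) AS numbered
WHERE rn = 1
ORDER BY date_depart, car_id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The three rows returned match the expected result in the question.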

Running Count Total with PostgreSQL

I'm fairly close to this solution, but I just need a little help getting over the end.
I'm trying to get a running count of the occurrences of client_ids regardless of the date, however I need the dates and ids to still appear in my results to verify everything.
I found part of the solution here but have not been able to modify it enough for my needs.
Here is what the answer should be, counting the occurrences of each client_id sequentially:
id client_id deliver_on running_total
1 138 2017-10-01 1
2 29 2017-10-01 1
3 138 2017-10-01 2
4 29 2013-10-02 2
5 29 2013-10-02 3
6 29 2013-10-03 4
7 138 2013-10-03 3
However, here is what I'm getting:
id client_id deliver_on running_total
1 138 2017-10-01 1
2 29 2017-10-01 1
3 138 2017-10-01 1
4 29 2013-10-02 3
5 29 2013-10-02 3
6 29 2013-10-03 1
7 138 2013-10-03 2
Rather than counting the times the client_id appears sequentially, the code counts the times the id appears in the previous date range.
Here is my code and any help would be greatly appreciated.
Thank you,
SELECT n.id, n.client_id, n.deliver_on, COUNT(n.client_id) AS "running_total"
FROM orders n
LEFT JOIN orders o
ON (o.client_id = n.client_id
AND n.deliver_on > o.deliver_on)
GROUP BY n.id, n.deliver_on, n.client_id
ORDER BY n.deliver_on ASC
* EDIT WITH ANSWER *
I ended up solving my own question. Here is the solution with comments:
-- Set "1" for counting to be used later
WITH DATA AS (
SELECT
orders.id,
orders.client_id,
orders.deliver_on,
COUNT(1) -- Creates a column of "1" for counting the occurrences
FROM orders
GROUP BY 1
ORDER BY deliver_on, client_id
)
SELECT
id,
client_id,
deliver_on,
SUM(COUNT) OVER (PARTITION BY client_id
ORDER BY client_id, deliver_on
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) -- Counts the sequential client_ids based on the number of times they appear
FROM DATA
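A shorter variant of the same idea, as a sketch rather than the poster's method: a windowed COUNT(*) partitioned by client_id gives the running count directly, without the intermediate CTE. It assumes that id reflects the order in which rows should be counted (the sample dates are not in id order), and it uses Python's sqlite3 as a stand-in for PostgreSQL.

```python
import sqlite3

# SQLite (in memory) stands in for PostgreSQL; window functions need SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, client_id INTEGER, deliver_on TEXT);
INSERT INTO orders VALUES
    (1, 138, '2017-10-01'),
    (2, 29,  '2017-10-01'),
    (3, 138, '2017-10-01'),
    (4, 29,  '2013-10-02'),
    (5, 29,  '2013-10-02'),
    (6, 29,  '2013-10-03'),
    (7, 138, '2013-10-03');
""")

# Assumption: id gives the sequence in which occurrences should be counted.
# With an ORDER BY in the window, COUNT(*) covers rows up to the current one.
query = """
SELECT id,
       client_id,
       deliver_on,
       COUNT(*) OVER (PARTITION BY client_id ORDER BY id) AS running_total
FROM orders
ORDER BY id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Ordering the window by id also avoids ambiguous results when several rows of one client share the same deliver_on value.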

PostgreSQL window function & difference between dates

Suppose I have data formatted in the following way (FYI, total row count is over 30K):
customer_id order_date order_rank
A 2017-02-19 1
A 2017-02-24 2
A 2017-03-31 3
A 2017-07-03 4
A 2017-08-10 5
B 2016-04-24 1
B 2016-04-30 2
C 2016-07-18 1
C 2016-09-01 2
C 2016-09-13 3
I need a 4th column, let's call it days_since_last_order, which is 0 where order_rank = 1 and otherwise the number of days since the previous order (the one with rank n-1).
So, the above would return:
customer_id order_date order_rank days_since_last_order
A 2017-02-19 1 0
A 2017-02-24 2 5
A 2017-03-31 3 35
A 2017-07-03 4 94
A 2017-08-10 5 38
B 2016-04-24 1 0
B 2016-04-30 2 6
C 2016-07-18 1 0
C 2016-09-01 2 45
C 2016-09-13 3 12
Is there an easier way to calculate the above with a window function (or similar) rather than joining the entire dataset against itself (e.g. on A.order_rank = B.order_rank - 1) and doing the calculation?
Thanks!

Use the LAG window function:
SELECT
customer_id
, order_date
, order_rank
, COALESCE(
DATE(order_date)
- DATE(LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date))
, 0) AS days_since_last_order
FROM <table_name>
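As a quick check of the LAG approach against the sample data, here is a sketch using Python's sqlite3 as an assumed stand-in for PostgreSQL. SQLite has no date type, so julianday() does the date arithmetic here; in PostgreSQL, subtracting one date from another already yields the number of days. The table name orders is hypothetical.

```python
import sqlite3

# SQLite (in memory) stands in for PostgreSQL; "orders" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id TEXT, order_date TEXT, order_rank INTEGER);
INSERT INTO orders VALUES
    ('A', '2017-02-19', 1), ('A', '2017-02-24', 2), ('A', '2017-03-31', 3),
    ('A', '2017-07-03', 4), ('A', '2017-08-10', 5),
    ('B', '2016-04-24', 1), ('B', '2016-04-30', 2),
    ('C', '2016-07-18', 1), ('C', '2016-09-01', 2), ('C', '2016-09-13', 3);
""")

# LAG returns the previous order_date within the same customer, or NULL for the
# first order; COALESCE turns that NULL difference into 0.
query = """
SELECT customer_id,
       order_date,
       order_rank,
       COALESCE(
           CAST(julianday(order_date)
                - julianday(LAG(order_date) OVER (
                      PARTITION BY customer_id ORDER BY order_date))
                AS INTEGER),
           0) AS days_since_last_order
FROM orders
ORDER BY customer_id, order_date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the window is partitioned by customer_id, the first order of each customer gets 0 rather than a gap measured from the previous customer's last order.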

Retrieve information dynamically from multiple CTE

I have multiple CTEs and I want to pull some information from a couple of them into the next CTE.
So, I have this information from one of the CTEs:
PeriodID StartDate
1 2006-01-01
2 2007-04-25
3 2008-08-16
4 2009-12-08
5 2011-04-017
and this from another:
RecordID Date
100 2007-04-15
101 2008-05-21
102 2008-06-06
103 2008-07-01
104 2009-11-12
And I need to show this in the next one:
RecordID Date PeriodID
100 2007-04-15 1
101 2008-05-21 2
102 2008-06-06 2
103 2008-07-01 2
104 2009-11-12 3
I can use a case/when statement to decide whether the date of a record falls in period 1, 2, 3, 4 or 5, but in some situations the first CTE can return a different number of periods.
Is there a way to do this in the above context?

You can have multiple CTEs defined as follows, and then select from and join them as you would any other table.
with cte1 as (select * ...),
cte2 as (select * ...)
select
cte2.*,
periodid
from cte2
cross apply
(select top 1 * from cte1 where cte2.recorddate> cte1.startdate order by startdate desc) v
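The query above uses CROSS APPLY and TOP 1, which is SQL Server syntax. A more portable sketch of the same lookup is a correlated subquery with ORDER BY ... LIMIT 1, shown here with Python's sqlite3 as an assumed stand-in database. The names periods and records are hypothetical stand-ins for the two CTEs, and period 5 is left out because its date is not readable in the question.

```python
import sqlite3

# SQLite (in memory) is an assumed stand-in; "periods" and "records" are
# hypothetical tables playing the role of cte1 and cte2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE periods (periodid INTEGER, startdate TEXT);
CREATE TABLE records (recordid INTEGER, recorddate TEXT);
INSERT INTO periods VALUES
    (1, '2006-01-01'), (2, '2007-04-25'), (3, '2008-08-16'), (4, '2009-12-08');
INSERT INTO records VALUES
    (100, '2007-04-15'), (101, '2008-05-21'), (102, '2008-06-06'),
    (103, '2008-07-01'), (104, '2009-11-12');
""")

# For each record, pick the latest period that started before the record date.
# This works for any number of periods, so no CASE/WHEN list is needed.
query = """
SELECT r.recordid,
       r.recorddate,
       (SELECT p.periodid
        FROM periods AS p
        WHERE r.recorddate > p.startdate
        ORDER BY p.startdate DESC
        LIMIT 1) AS periodid
FROM records AS r
ORDER BY r.recordid
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A record dated before every period start would get NULL for periodid with this form.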