I have a query that returns the count and sum of certain fields in my tables, plus a grand total. It looks like this:
with foo as(
select s.subcat as subcategory,
sum(s.cost) as costs,
round(((sum(s.cost) / (s.tl)::numeric )*100),2)|| ' %' as pct_cost
from (select ...big query)s group by s.subcat
)
select * from foo
union
select 'Total costs' as subcategory,
sum(costs) as costs,
null as pct_cost
from foo
order by...
Category          Cost   Percentage
x_subcategory 1   5      0.5%
x_subcategory 2   1      0.1%
x_subcategory 3   18     1.8%
y_subcategory 1   7      0.7%
y_subcategory 2   10     1.0%
...               ...    ...
Total             41     5.8%
For another report I need the totals by category. The categories have to be assigned based on the subcategory name. The question is how to partition the result so that I get something like this:
Category          Cost   Percentage
x_subcategory 1   5      0.5%
x_subcategory 2   1      0.1%
x_subcategory 3   18     1.8%
X category        24     2.4%
y_subcategory 1   7      0.7%
y_subcategory 2   10     1.0%
Y category        17     1.7%
With GROUP BY and GROUP BY GROUPING SETS I don't get what I want, and with PARTITION BY I get syntax errors. I can use it in simpler queries, but this one has turned out to be very complicated, and I wonder whether this can be achieved in a single PostgreSQL query.
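One possible way to get the per-category subtotal rows is to compute the subcategory sums once, UNION ALL them with a second aggregation grouped by category, and sort so that each subtotal follows its own subcategories. The sketch below is only an illustration under several assumptions: it runs the SQL through Python's built-in sqlite3 module rather than PostgreSQL, the table name s and the function name category_subtotals are hypothetical, the category is assumed to be the part of the subcategory name before the underscore, and the percentage denominator (s.tl in the query above) is passed in as a parameter. In PostgreSQL, split_part would probably replace the substr/instr combination.

```python
import sqlite3

def category_subtotals(rows, tl):
    """rows: iterable of (subcat, cost); tl: assumed denominator for the percentage."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE s (subcat TEXT, cost INTEGER)")
    conn.executemany("INSERT INTO s VALUES (?, ?)", rows)
    query = """
        WITH foo AS (
            -- One row per subcategory; cat is the (assumed) prefix before '_'.
            SELECT subcat,
                   upper(substr(subcat, 1, instr(subcat, '_') - 1)) AS cat,
                   SUM(cost) AS costs
            FROM s
            GROUP BY subcat
        ),
        combined AS (
            SELECT cat, 0 AS is_subtotal, subcat AS label, costs FROM foo
            UNION ALL
            -- One subtotal row per category.
            SELECT cat, 1 AS is_subtotal, cat || ' category' AS label, SUM(costs) AS costs
            FROM foo
            GROUP BY cat
        )
        SELECT label, costs, round(100.0 * costs / ?, 1) AS pct_cost
        FROM combined
        ORDER BY cat, is_subtotal, label
    """
    result = conn.execute(query, (tl,)).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    sample = [("x_subcategory 1", 5), ("x_subcategory 2", 1), ("x_subcategory 3", 18),
              ("y_subcategory 1", 7), ("y_subcategory 2", 10)]
    for row in category_subtotals(sample, 1000):
        print(row)
```

The is_subtotal flag exists only to control the sort order; it keeps each "X category" row directly below the subcategories it sums.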
I have something similar to the following table, which is a randomly ordered list of thousands of transactions with a Customer_ID and an order_cost for each transaction.
Customer_ID   order_cost
1             $503
53            $7
4             $80
13            $76
6             $270
78            $2
8             $45
910           $89
10            $3
1130          $43
etc...        etc...
I want to group the transactions by Customer_ID, aggregate the cost of all the orders into a "spending" column, and then create a new "decile" column that assigns a number 1-10 to each customer so that, when the spending of all customers in a decile is added up, each decile contains 10% of all the spending.
The resulting table would look something like the table below, where each ascending decile contains fewer customers but the total "spending" of the records in each decile is the same for deciles 1-10. (The numbers in this sample don't actually add up; it only illustrates the concept.)
Customer_ID   spending   Decile
45            $500       1
3             $700       1
349           $800       1
23            $1,000     1
64            $2,000     1
718           $2,100     1
3452          $2,300     1
1276          $2,600     2
10            $3,000     2
34            $4,000     2
etc...        etc...     etc...
So far I have grouped by Customer_ID, aggregated the order_cost to a spending column, ordered each customer in ascending order based on the spending column, and then partitioned all the customers into 5000 groups. From there I manually found the values for each .when statement that would result in deciles 1-10 each containing the right amount of customers so each decile has 10% of the sum of the entire spending column. It's pretty time-consuming to use trial and error to find the right bucket configuration that results in each decile having 10% of the spending column.
I'm trying to find a way to automate this process so I don't have to find the right bucketing ratio for each decile by trial and error.
This is my code so far:
import pyspark.sql.functions as F
from pyspark.sql import Window

# Total spending per customer, split into 5000 equal-sized buckets ordered by spending.
deciles = (table
    .groupBy('Customer_ID')
    .agg(F.sum('order_cost').alias('spending')).alias('a')
    .withColumn('rank', F.ntile(5000).over(Window.partitionBy()
        .orderBy(F.asc('spending'))))
    # Hand-tuned bucket boundaries so each decile holds roughly 10% of total spending.
    .withColumn('rank', F.when(F.col('rank') <= 4628, F.lit(1))
        .when(F.col('rank') <= 4850, F.lit(2))
        .when(F.col('rank') <= 4925, F.lit(3))
        .when(F.col('rank') <= 4965, F.lit(4))
        .when(F.col('rank') <= 4980, F.lit(5))
        .when(F.col('rank') <= 4987, F.lit(6))
        .when(F.col('rank') <= 4993, F.lit(7))
        .when(F.col('rank') <= 4997, F.lit(8))
        .when(F.col('rank') <= 4999, F.lit(9))
        .when(F.col('rank') <= 5000, F.lit(10))
        .otherwise(F.lit(0)))
)

# Attach each customer's decile back onto the individual transactions.
end_table = (table.alias('a').join(deciles.alias('b'), ['Customer_ID'], 'left')
    .selectExpr('a.*', 'b.rank')
)
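One way the trial-and-error step might be automated is to derive the decile from each customer's cumulative share of total spending instead of from hand-picked ntile boundaries. The sketch below shows that idea in plain Python rather than PySpark, so it is an illustration of the arithmetic and not a drop-in replacement; the function name spending_deciles and the tie-breaking by Customer_ID are assumptions. In PySpark the running sum would presumably be an F.sum over a window ordered by spending.

```python
import math
from collections import defaultdict

def spending_deciles(transactions):
    """transactions: iterable of (customer_id, order_cost).

    Returns (customer_id, spending, decile) tuples in ascending spending order,
    where decile is based on the cumulative share of total spending.
    """
    spending = defaultdict(int)
    for customer_id, order_cost in transactions:
        spending[customer_id] += order_cost

    # Ascending by spending; customer_id breaks ties so the order is deterministic.
    ordered = sorted(spending.items(), key=lambda item: (item[1], item[0]))
    total = sum(spent for _, spent in ordered)

    result = []
    running = 0
    for customer_id, spent in ordered:
        running += spent
        if total == 0:
            decile = 1  # Degenerate case: nothing was spent at all.
        else:
            # A cumulative share in (0%, 10%] maps to decile 1, (10%, 20%] to 2, and so on.
            decile = min(10, max(1, math.ceil(10 * running / total)))
        result.append((customer_id, spent, decile))
    return result

if __name__ == "__main__":
    demo = [(1, 5), (1, 5), (2, 90)]
    print(spending_deciles(demo))
```

Because customers are indivisible, the deciles only approximate 10% of spending each; a single very large customer can span several decile boundaries, leaving some decile numbers unused.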
I have two tables, Job and Transactions.
Job:
id   Name
4    Clay
6    Glow
7    Circle
9    Jam
Transactions:
Id   Job_id   Person   Marks
2    6        Amy      0
3    3        Keith    30
5    3        Glass    10
7    9        Know     60
11   6        Play     81
13   6        Play     100
How do I write a query that returns three columns: Job_id (the id of the job), Job_name (the name of the job) and Level, which is one of three possible strings: "Hard", "Easy" or "Medium"?
**Job_id** **Job_name** **Level**
----------------------------------------------------
4 Clay Hard
6 Glow Easy
9 Jam Medium
Level is calculated from the average score in Transactions:
-- If average score for job is lower than or equal to 20, then its level is "Hard".
-- If average score for job is higher than 20 but lower than or equal to 60, then its level is "Medium".
-- If average score for job is higher than 60, then its level is "Easy".
I'm not sure if I should use a subQuery for this, or if there's an easier way.
Thanks!
select j.id, j.name,
       if(avg(t.marks) > 60, 'Easy',
          if(avg(t.marks) <= 60 and avg(t.marks) > 20, 'Medium', 'Hard')) as level
from job j
left join Transactions t on j.id = t.Job_id
where t.id is not null
group by Job_id
I wrote this query just by looking at your data; it would help if you provided the schema and data in SQL Fiddle or similar. You can try the query above.
For SQLite, use the following:
select j.id AS Job_id,
j.name AS Job_name,
CASE
WHEN avg(t.marks)>60 THEN 'EASY'
WHEN avg(t.marks)<=60 and avg(t.marks)>20 THEN 'MEDIUM'
WHEN avg(t.marks)<=20 THEN 'HARD'
END Level
from job j
left join Transactions t
on j.id = t.Job_id
where t.id is not null
group by Job_id;
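To check the CASE logic against the sample data, the query can be run in an in-memory SQLite database. The following is a minimal sketch using Python's built-in sqlite3 module; the function name job_levels is hypothetical, and it uses a plain inner join on the assumption that this is equivalent to the left join filtered with "t.id is not null" above. With the sample rows it returns only jobs 6 and 9, because job 4 has no transactions.

```python
import sqlite3

def job_levels(jobs, transactions):
    """jobs: (id, name) rows; transactions: (id, job_id, person, marks) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE transactions ("
        "id INTEGER PRIMARY KEY, job_id INTEGER, person TEXT, marks INTEGER)"
    )
    conn.executemany("INSERT INTO job VALUES (?, ?)", jobs)
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", transactions)
    rows = conn.execute(
        """
        SELECT j.id AS job_id,
               j.name AS job_name,
               CASE
                   WHEN AVG(t.marks) > 60 THEN 'Easy'
                   WHEN AVG(t.marks) > 20 THEN 'Medium'
                   ELSE 'Hard'
               END AS level
        FROM job j
        JOIN transactions t ON t.job_id = j.id  -- jobs without transactions drop out
        GROUP BY j.id, j.name
        ORDER BY j.id
        """
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    jobs = [(4, "Clay"), (6, "Glow"), (7, "Circle"), (9, "Jam")]
    transactions = [(2, 6, "Amy", 0), (3, 3, "Keith", 30), (5, 3, "Glass", 10),
                    (7, 9, "Know", 60), (11, 6, "Play", 81), (13, 6, "Play", 100)]
    print(job_levels(jobs, transactions))
```

The WHEN branches are evaluated in order, so the second branch does not need to repeat the "<= 60" check.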
I have a large table; here is a snippet of what it looks like:
name class brand rating
12 d 1 3.8
22 d 1 3.9
33 a 2 1.1
12 d 1 2.3
12 a 3 3.4
44 b 1 9.8
22 c 2 3.0
I calculated the average rating as follows:
select avg(rating) over(partition by name,class,brand) as avg_rating from df
I'm aware that Postgres doesn't have a median function, but I would like to calculate the median of that column and have the output in a structure similar to that of my window function for the average.
In the case of an even number of rows, I would like the average of the two middle values.
To get the median, use percentile_cont(0.5), which interpolates between the two middle values when the number of rows is even:
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY rating) FROM df;
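Since percentile_cont is an ordered-set aggregate, getting the same per-row shape as the window average would likely mean computing the median per (name, class, brand) group and joining it back to the rows. The sketch below illustrates that shape in plain Python rather than SQL; the function name median_per_row is hypothetical. It relies on statistics.median, which averages the two middle values for an even number of ratings.

```python
from collections import defaultdict
from statistics import median

def median_per_row(rows):
    """rows: list of (name, class_, brand, rating).

    Returns one median per input row, computed over the row's
    (name, class_, brand) group, mirroring a partitioned window result.
    """
    groups = defaultdict(list)
    for name, class_, brand, rating in rows:
        groups[(name, class_, brand)].append(rating)

    # One median per group, then looked up again for every original row.
    medians = {key: median(values) for key, values in groups.items()}
    return [medians[(name, class_, brand)] for name, class_, brand, _ in rows]

if __name__ == "__main__":
    df = [(12, "d", 1, 3.8), (22, "d", 1, 3.9), (33, "a", 2, 1.1), (12, "d", 1, 2.3),
          (12, "a", 3, 3.4), (44, "b", 1, 9.8), (22, "c", 2, 3.0)]
    print(median_per_row(df))
```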
I'm getting errors from PostgreSQL when writing a GROUP BY query.
I'm sure someone will tell me to put all the columns I've selected in the GROUP BY, but that will not give me the correct results.
I'm writing a query that selects all the vehicles in the database and groups the results by vehicle, giving me the total distance and cost for a given period.
Here is my query:
SELECT i.vehicle AS vehicle,
i.costcenter AS costCenter,
i.department AS department,
SUM(i.quantity) AS liters,
SUM(i.totalcost) AS Totalcost,
v.model AS model,
v.vtype AS vtype
FROM fuelissuances AS i
LEFT JOIN vehicles AS v ON i.vehicle = v.id
WHERE i.dates::text LIKE '%2019-03%' AND i.deleted_at IS NULL
GROUP BY i.vehicle;
If I put all the columns from the SELECT list in the GROUP BY, the results will not be correct.
How do I go about this without putting all the columns in the GROUP BY and without creating sub-queries?
The fuel table looks like this:
vehicle   dates        department   quantity   totalcost
1         2019-01-01   102          12         1200
1         2019-01-05   102          15         1500
1         2019-01-13   102          18         1800
1         2019-01-22   102          10         1000
2         2019-01-01   102          12         1260
2         2019-01-05   102          19         1995
2         2019-01-13   102          28         2940
Vehicle Table
id   model   vtype
1    1       2
2    4       6
2    5       7
These are the results I expect from the query:
vehicle dates department quantity totalcost model vtype
1 2019-01-01 102 12 1200 1 2
1 2019-01-05 102 15 1500 1 2
1 2019-01-13 102 18 1800 1 2
1 2019-01-22 102 10 1000 1 2
1 2019-01-18 102 10 1000 1 2
1 65 6500
2 2019-01-01 102 12 1260 5 7
2 2019-01-05 102 19 1995 5 7
2 2019-01-13 102 28 2940 5 7
1 45 6195
Your query doesn't really make sense. Apparently there can be multiple departments and costcenters per vehicle in the fuelissuances table - which of those should be returned?
One way to deal with that, is to return all of them, e.g. as an array:
SELECT i.vehicle,
array_agg(i.costcenter) as costcenters,
array_agg(i.department) as departments,
SUM(i.quantity) AS liters,
SUM(i.totalcost) AS Totalcost,
v.model,
v.vtype
FROM fuelissuances AS i
LEFT JOIN vehicles AS v ON i.vehicle = v.id
WHERE i.dates >= date '2019-03-01'
and i.dates < date '2019-04-01'
AND i.deleted_at IS NULL
group by i.vehicle, v.model, v.vtype;
Instead of an array, you could also return a comma separated lists of those values, e.g. string_agg(i.costcenter, ',') as costcenters.
Adding the columns v.model and v.vtype won't (shouldn't) change anything as the group by i.vehicle will only return a single vehicle anyway and thus the model and vtype won't change for that in the group.
Note that I removed the useless aliases and replaced the condition on the date with a proper range condition that can make use of an index on the dates column.
Edit
Based on your new sample data, you want a running total rather than a "regular" aggregation. This can easily be done using window functions:
SELECT i.vehicle,
i.costcenter,
i.department,
SUM(i.quantity) over (w) AS liters,
SUM(i.totalcost) over (w) AS Totalcost,
v.model,
v.vtype
FROM fuelissuances AS i
LEFT JOIN vehicles AS v ON i.vehicle = v.id
WHERE i.dates >= date '2019-01-01'
and i.dates < date '2019-02-01'
AND i.deleted_at IS NULL
window w as (partition by i.vehicle order by i.dates)
order by i.vehicle, i.dates;
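The running-total behaviour of the window can be checked against the sample fuel data. The sketch below is an illustration only: it uses Python's built-in sqlite3 module instead of PostgreSQL, assumes the bundled SQLite is version 3.25 or newer (the first release with window functions), keeps only the columns needed for the totals, and uses a hypothetical function name running_totals.

```python
import sqlite3

def running_totals(rows):
    """rows: (vehicle, dates, quantity, totalcost) tuples from the fuel table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fuelissuances ("
        "vehicle INTEGER, dates TEXT, quantity INTEGER, totalcost INTEGER)"
    )
    conn.executemany("INSERT INTO fuelissuances VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT vehicle,
               dates,
               SUM(quantity)  OVER w AS liters,     -- running sum within each vehicle
               SUM(totalcost) OVER w AS totalcost
        FROM fuelissuances
        WINDOW w AS (PARTITION BY vehicle ORDER BY dates)
        ORDER BY vehicle, dates
        """
    ).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    fuel = [(1, "2019-01-01", 12, 1200), (1, "2019-01-05", 15, 1500),
            (1, "2019-01-13", 18, 1800), (1, "2019-01-22", 10, 1000),
            (2, "2019-01-01", 12, 1260), (2, "2019-01-05", 19, 1995),
            (2, "2019-01-13", 28, 2940)]
    for row in running_totals(fuel):
        print(row)
```

Because the window has an ORDER BY, each row's sum covers that vehicle's rows up to and including the current date, so the last row per vehicle carries the vehicle's total.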
I would not create those "total" lines using SQL, but rather in the front end that displays the data.
Online example: https://rextester.com/CRJZ27446
You need to use a nested query to get those SUM you want inside that query.
SELECT i.vehicle AS vehicle,
i.costcenter AS costCenter,
i.department AS department,
(SELECT SUM(i.quantity) FROM TABLES WHERE CONDITIONS GROUP BY vehicle) AS liters,
(SELECT SUM(i.totalcost) FROM TABLES WHERE CONDITIONS GROUP BY vehicle) AS Totalcost,
v.model AS model,
v.vtype AS vtype
FROM fuelissuances AS i
LEFT JOIN vehicles AS v ON i.vehicle = v.id
WHERE i.dates::text LIKE '%2019-03%' AND i.deleted_at IS NULL;
tsql newbie here.
I have a table, similar to this one:
CarId   CarName   UserId   RentedTimes   CrashedTimes
-----------------------------------------------------
1       Ferrari   1        2             0
2       DB9       1        5             0
3       Ferrari   2        4             0
4       Audi      3        1             0
5       Audi      1        1             0
Assuming the table is called 'Cars', I am trying to select the total number of times each car was rented. According to the table above, Ferrari was rented a total of 6 times, DB9 five times and Audi twice.
I tried doing this:
select CarName, SUM(RentedTimes)
from Cars
group by CarName, RentedTimes
order by RentedTimes desc
but it returns two rows for Ferrari, with 2 and 4 as rented times, and so on.
How do I select all cars and the total number of times each was rented, please?
Thanks
Edited the query to include sort order, sorry.
select CarName, SUM(RentedTimes)
from Cars
group by CarName
ORDER BY SUM(RentedTimes) DESC
Try it this way: RentedTimes is removed from the GROUP BY, so the rows are grouped by CarName only.
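To see the effect on the sample data, the corrected query can be run in an in-memory SQLite database. This is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server, with a hypothetical function name total_rentals; the GROUP BY and ORDER BY are the same as in the query above.

```python
import sqlite3

def total_rentals(cars):
    """cars: (CarId, CarName, UserId, RentedTimes, CrashedTimes) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Cars ("
        "CarId INTEGER, CarName TEXT, UserId INTEGER, "
        "RentedTimes INTEGER, CrashedTimes INTEGER)"
    )
    conn.executemany("INSERT INTO Cars VALUES (?, ?, ?, ?, ?)", cars)
    rows = conn.execute(
        """
        SELECT CarName, SUM(RentedTimes) AS TotalRented
        FROM Cars
        GROUP BY CarName              -- one group per car name, not per rental count
        ORDER BY SUM(RentedTimes) DESC
        """
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    cars = [(1, "Ferrari", 1, 2, 0), (2, "DB9", 1, 5, 0), (3, "Ferrari", 2, 4, 0),
            (4, "Audi", 3, 1, 0), (5, "Audi", 1, 1, 0)]
    print(total_rentals(cars))
```

Grouping by RentedTimes as well, as in the original attempt, creates a separate group for every distinct (CarName, RentedTimes) pair, which is why Ferrari appeared twice.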