sql - aggregate count and share by group - postgresql

Given a table t1 like the one below, I need to get the count for each make and each make's share of the total.
+--------+
| make |
+--------+
| toyota |
| audi |
| bmw |
| bmw |
| audi |
+--------+
With the query below I can get the car_cnt per make:
select
make
, count (*) as car_cnt
from t1
group by make
How do I get the share (%) for each make?

Using COUNT as an analytic (window) function, we can compute each make's count and its share of the total row count in a single query, without a separate subquery for the total:
select distinct
make,
count(*) over (partition by make) as car_cnt,
100.0 * count(*) over (partition by make) / count(*) over () as car_pct
from t1
Output:
make car_cnt car_pct
1 audi 2 40
2 bmw 2 40
3 toyota 1 20
Demo here:
Rextester
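For comparison, here is a minimal sketch of a variant that keeps the GROUP BY from the question and divides each group's count by a scalar subquery for the total. It is run through Python's sqlite3 module only so that it is self-contained; the in-memory table and the rounding to one decimal are assumptions made for the demo, not part of the original answer.

```python
import sqlite3

# In-memory stand-in for the t1 table from the question (demo assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (make TEXT)")
conn.executemany(
    "INSERT INTO t1 (make) VALUES (?)",
    [("toyota",), ("audi",), ("bmw",), ("bmw",), ("audi",)],
)

# Multiplying by 100.0 (not 100) forces non-integer division.
share_sql = """
SELECT make,
       count(*) AS car_cnt,
       round(100.0 * count(*) / (SELECT count(*) FROM t1), 1) AS car_pct
FROM t1
GROUP BY make
ORDER BY make
"""
rows = conn.execute(share_sql).fetchall()
for row in rows:
    print(row)
```

The subquery is evaluated against the whole table, so any WHERE filter added to the outer query would need to be repeated inside it.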

Related

PostgreSQL - Setting null values to missing rows in a join statement

SQL newbie here. I'm trying to write a query that generates a scoring table, setting null to a student's grades in a module for which they haven't yet taken their exams (on PostgreSQL).
So I start with tables that look something like this:
student_evaluation:
|student_id| module_id | course_id |grade |
|----------|-----------|-----------|-------|
| 1 | 1 | 1 |3 |
| 1 | 1 | 1 |7 |
| 1 | 2 | 1 |8 |
| 2 | 4 | 2 |9 |
course_module:
| module_id | course_id |
| ---------- | --------- |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
In our use case, a course is made up of several modules. Each module has a single exam, but a student who failed the exam may have a couple of retries. The same module may also be present in different courses, but an exam attempt only counts for one instance of the module (i.e. student A passed module 1's exam on course 1; if course 2 also has module 1, student A has to retake the same exam for course 2 if he also has access to that course).
So the output should look like this:
|student_id| module_id | course_id |grade |
|----------|-----------|-----------|-------|
| 1 | 1 | 1 |3 |
| 1 | 1 | 1 |7 |
| 1 | 2 | 1 |8 |
| 1 | 3 | 1 |null |
| 2 | 4 | 2 |9 |
I feel like this should have been a simple task, but I think I have a very flawed understanding of how outer and cross joins work. I have tried stuff like:
SELECT se.student_id, se.module_id, se.course_id, se.grade FROM student_evaluation se
RIGHT OUTER JOIN course_module ON course_module.course_id = se.course_id
AND course_module.module_id = se.module_id
or
SELECT se.student_id, se.module_id, se.course_id, se.grade FROM student_evaluation se
CROSS JOIN course_module WHERE course_module.course_id = se.course_id
Neither worked. These all feel wrong, but I'm lost as to what would be the proper way to go about this.
Thank you in advance.
I think you need both join types: first use a cross join to build a list of all combinations of students and course modules, then use an outer join to add the grades. The outer join has to match on all three columns so that each grade lands on the right student and module.
SELECT sc.student_id,
       sc.module_id,
       sc.course_id,
       se.grade
FROM student_evaluation se
   RIGHT JOIN (SELECT s.student_id,
                      c.module_id,
                      c.course_id
               FROM (SELECT DISTINCT student_id
                     FROM student_evaluation) AS s
                  CROSS JOIN course_module AS c) AS sc
      USING (student_id, module_id, course_id);
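A full cross join pairs every student with every module of every course, which yields more rows than the expected output in the question. The sketch below is one possible variant, under the assumption (not stated in the question) that a student has access to a course as soon as they have at least one evaluation in it. It uses Python's sqlite3 module purely to be self-contained, and a LEFT JOIN instead of a RIGHT JOIN because older SQLite versions lack the latter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_evaluation (student_id INT, module_id INT, course_id INT, grade INT);
INSERT INTO student_evaluation VALUES (1,1,1,3), (1,1,1,7), (1,2,1,8), (2,4,2,9);
CREATE TABLE course_module (module_id INT, course_id INT);
INSERT INTO course_module VALUES (1,1), (2,1), (3,1), (4,2);
""")

scoring_sql = """
SELECT e.student_id, cm.module_id, cm.course_id, se.grade
FROM (SELECT DISTINCT student_id, course_id
      FROM student_evaluation) AS e          -- assumed enrolment list
JOIN course_module AS cm
  ON cm.course_id = e.course_id              -- every module of those courses
LEFT JOIN student_evaluation AS se           -- grades where they exist, else NULL
  ON se.student_id = e.student_id
 AND se.module_id  = cm.module_id
 AND se.course_id  = cm.course_id
ORDER BY e.student_id, cm.module_id, se.grade
"""
scoring_rows = conn.execute(scoring_sql).fetchall()
for row in scoring_rows:
    print(row)
```

If enrolment is stored in a table of its own, that table would replace the DISTINCT subquery.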

Flatten hierarchy on self-join table

I have data in a self-join hierarchical table where Continents have many Countries have many Regions have many States have many Cities.
Self-joining table structure:
|-------------------------------------------------------------|
| ID | Name | Type | ParentID | IsTopLevel |
|-------------------------------------------------------------|
| 1 | North America | Continent | NULL | 1 |
| 12 | United States | Country | 1 | 0 |
| 113 | Midwest | Region | 12 | 0 |
| 155 | Kansas | State | 113 | 0 |
| 225 | Topeka | City | 155 | 0 |
| 2 | South America | Continent | NULL | 1 |
| 22 | Argentina | Country | 2 | 0 |
| 223 | Southern | Region | 22 | 0 |
| 255 | La Pampa | State | 223 | 0 |
| 777 | Santa Rosa | City | 255 | 0 |
|-------------------------------------------------------------|
I have been able to successfully use a recursive CTE to get the tree structure and depth of each node. Where I am failing is using a pivot to create a nice list of all bottom locations and their corresponding parents at each level.
The expected results:
|------------------------------------------------------------------------------------|
| Continent | Country | Region | State | City | Bottom_Level_ID |
|------------------------------------------------------------------------------------|
| North America | United States | Midwest | Kansas | Topeka | 225 |
| South America | Argentina | Southern | La Pampa | Santa Rosa | 777 |
|------------------------------------------------------------------------------------|
There are a few key points I should clarify.
Every single entry has a bottom level and a top level. There are no
cases where all five Types are not present for a given location.
If I filled out this data, I'd have 50 entries for North America at the
State level, so you can imagine how immense this table is at the
City level for every continent on the planet. Billions of rows.
The reason this is a necessity is because I need to be able to join onto a historical table of all addresses a person has lived at, and journey up the tree. I figure if I have the LocationID from that table, I can just LEFT JOIN onto a View of this query and nab the appropriate columns.
This is an old database, 2005, and I don't have sysadmin or control of the schema.
My CTE Code
--CTE
;WITH Tree
AS (
SELECT ID, Name, ParentID, Type, 1 as Depth
FROM LocationTable
WHERE IsTopLevel = 1
UNION ALL
SELECT L.ID, L.Name, L.ParentID, L.Type, T.Depth+1
FROM Tree T
JOIN LocationTable L
ON L.ParentID = T.ID
)
SELECT * FROM Tree
Good solid data, in a mostly useful format. BUT then I got to thinking about it and isn't the table structure already in this format, so why would I bother doing a depth tree search if I wasn't going to join the entries together at the same time?
Anyway, here was the rest.
The Pivot Attempt
;WITH Tree
AS (
SELECT ID, Name, ParentID, Type
FROM LocationTable
WHERE IsTopLevel = 1
UNION ALL
SELECT L.ID, L.Name, L.ParentID, L.Type
FROM Tree T
JOIN LocationTable L
ON L.ParentID = T.ID
)
select *
from Tree
pivot (
max(Name)
for Type in ([Continent],[Country],[Region],[State],[City])
) pvt
And now I have everything by Type in a column, with nulls for everything else. As I have struggled with before, I need to filter/join the CTE data before I attempt my pivot, but I have no idea where to start with that piece. Everything I have tried is soooooooooo sloooooooow.
Every time I think I understand CTEs and Pivot, something new makes me extremely humbled. Please help me.
If your structure is as clean as you describe it (no gaps, always 5 levels), you might go the easy way and simply join the table to itself once per level.
That said, this data really calls for a classical 1:n table tree, where your Countries, States etc. live in their own tables and link to their parent record.
Make sure there is an index on ParentID and on ID!
-- Table variable standing in for LocationTable
DECLARE @tbl TABLE(ID INT,Name VARCHAR(100),Type VARCHAR(100),ParentID INT,IsTopLevel BIT);
INSERT INTO @tbl VALUES
(1,'North America','Continent',NULL,1)
,(12,'United States','Country',1,0)
,(113,'Midwest','Region',12,0)
,(155,'Kansas','State',113,0)
,(225,'Topeka','City',155,0)
,(2,'South America','Continent',NULL,1)
,(22,'Argentina','Country',2,0)
,(223,'Southern','Region',22,0)
,(255,'La Pampa','State',223,0)
,(777,'Santa Rosa','City',255,0);
-- One self-join per level, starting from the continents (rows without a parent)
SELECT Level1.Name AS Continent
,Level2.Name AS Country
,Level3.Name AS Region
,Level4.Name AS State
,Level5.Name AS City
,Level5.ID AS Bottom_Level_ID
FROM @tbl AS Level1
INNER JOIN @tbl AS Level2 ON Level1.ID=Level2.ParentID
INNER JOIN @tbl AS Level3 ON Level2.ID=Level3.ParentID
INNER JOIN @tbl AS Level4 ON Level3.ID=Level4.ParentID
INNER JOIN @tbl AS Level5 ON Level4.ID=Level5.ParentID
WHERE Level1.ParentID IS NULL;
The result
Continent Country Region State City Bottom_Level_ID
North America United States Midwest Kansas Topeka 225
South America Argentina Southern La Pampa Santa Rosa 777
Another solution, using a CTE, could be:
;WITH Tree
AS (
SELECT cast(NULL as varchar(100)) as C1, cast(NULL as varchar(100)) as C2, cast(NULL as varchar(100)) as C3, cast(NULL as varchar(100)) as C4, Name as C5, ID as B_Level
FROM LocationTable
WHERE IsTopLevel = 1
UNION ALL
SELECT T.C2, T.C3, T.C4, T.C5, L.Name, L.ID
FROM Tree T
JOIN LocationTable L
ON L.ParentID = T.B_Level
)
select *
from Tree
where C1 is not null
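The CTE above works like a sliding window: each recursion step shifts the names one column to the left and appends the child's name, so only rows that have descended four levels have a non-NULL C1. Here is a minimal sketch of the same idea, run through Python's sqlite3 module so it can be tried without a SQL Server instance. The WITH RECURSIVE spelling and the explicit column list are SQLite syntax, and the table contents are the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LocationTable (ID INT, Name TEXT, Type TEXT, ParentID INT, IsTopLevel INT);
INSERT INTO LocationTable VALUES
 (1,'North America','Continent',NULL,1), (12,'United States','Country',1,0),
 (113,'Midwest','Region',12,0), (155,'Kansas','State',113,0),
 (225,'Topeka','City',155,0),
 (2,'South America','Continent',NULL,1), (22,'Argentina','Country',2,0),
 (223,'Southern','Region',22,0), (255,'La Pampa','State',223,0),
 (777,'Santa Rosa','City',255,0);
""")

flatten_sql = """
WITH RECURSIVE Tree(C1, C2, C3, C4, C5, B_Level) AS (
    -- anchor: continents occupy the right-most name column
    SELECT NULL, NULL, NULL, NULL, Name, ID
    FROM LocationTable
    WHERE IsTopLevel = 1
    UNION ALL
    -- step: shift names left by one and append the child
    SELECT T.C2, T.C3, T.C4, T.C5, L.Name, L.ID
    FROM Tree T
    JOIN LocationTable L ON L.ParentID = T.B_Level
)
SELECT * FROM Tree
WHERE C1 IS NOT NULL      -- keep only fully populated (city-level) rows
ORDER BY B_Level
"""
flat_rows = conn.execute(flatten_sql).fetchall()
for row in flat_rows:
    print(row)
```

This relies on the guarantee from the question that every branch has exactly five levels; a shorter branch would never fill C1 and would be filtered out.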

How to get back aggregate values across 2 dimensions using Python Cubes?

Situation
Using Python 3, Django 1.9, Cubes 1.1, and Postgres 9.5.
These are my data tables, in text format:
Store table
------------------------------
| id | code | address |
|-----|------|---------------|
| 1 | S1 | Kings Row |
| 2 | S2 | Queens Street |
| 3 | S3 | Jacks Place |
| 4 | S4 | Diamonds Alley|
| 5 | S5 | Hearts Road |
------------------------------
Product table
------------------------------
| id | code | name |
|-----|------|---------------|
| 1 | P1 | Saucer 12 |
| 2 | P2 | Plate 15 |
| 3 | P3 | Saucer 13 |
| 4 | P4 | Saucer 14 |
| 5 | P5 | Plate 16 |
| and many more .... |
|1000 |P1000 | Bowl 25 |
|----------------------------|
Sales table
----------------------------------------
| id | product_id | store_id | amount |
|-----|------------|----------|--------|
| 1 | 1 | 1 |7.05 |
| 2 | 1 | 2 |9.00 |
| 3 | 2 | 3 |1.00 |
| 4 | 2 | 3 |1.00 |
| 5 | 2 | 5 |1.00 |
| and many more .... |
| 1000| 20 | 4 |1.00 |
|--------------------------------------|
The relationships are:
Sales belongs to Store
Sales belongs to Product
Store has many Sales
Product has many Sales
What I want to achieve
I want to use cubes to be able to do a display by pagination in the following manner:
Given the stores S1-S3:
-------------------------
| product | S1 | S2 | S3 |
|---------|----|----|----|
|Saucer 12|7.05|9 | 0 |
|Plate 15 |0 |0 | 2 |
| and many more .... |
|------------------------|
Note the following:
Even though there were no records in sales for Saucer 12 under Store S3, I displayed 0 instead of null or none.
I want to be able to do sort by store, say descending order for, S3.
The cells indicate the SUM total of that particular product spent in that particular store.
I also want to have pagination.
What I tried
This is the configuration I used:
"cubes": [
{
"name": "sales",
"dimensions": ["product", "store"],
"joins": [
{"master":"product_id", "detail":"product.id"},
{"master":"store_id", "detail":"store.id"}
]
}
],
"dimensions": [
{ "name": "product", "attributes": ["code", "name"] },
{ "name": "store", "attributes": ["code", "address"] }
]
This is the code I used:
result = browser.aggregate(drilldown=['Store','Product'],
order=[("Product.name","asc"), ("Store.name","desc"), ("total_products_sale", "desc")])
I didn't get what I want.
I got it like this:
----------------------------------------------
| product_id | store_id | total_products_sale |
|------------|----------|---------------------|
| 1 | 1 | 7.05 |
| 1 | 2 | 9 |
| 2 | 3 | 2.00 |
| and many more .... |
|---------------------------------------------|
which is the whole table with no pagination, and if a product was not sold in a store, that combination does not show up as zero.
My question
How do I get what I want?
Do I need to create another data table that aggregates everything by store and product before I use cubes to run the query?
Update
I have read more. I realised that what I want is called dicing as I needed to go across 2 dimensions. See: https://en.wikipedia.org/wiki/OLAP_cube#Operations
Cross-posted at Cubes GitHub issues to get more attention.
This is a pure SQL solution using crosstab() from the additional tablefunc module to pivot the aggregated data. It typically performs better than any client-side alternative. If you are not familiar with crosstab(), read this first:
PostgreSQL Crosstab Query
And this about the "extra" column in the crosstab() output:
Pivot on Multiple Columns using Tablefunc
SELECT product_id, product
, COALESCE(s1, 0) AS s1 -- 1. ... displayed 0 instead of null
, COALESCE(s2, 0) AS s2
, COALESCE(s3, 0) AS s3
, COALESCE(s4, 0) AS s4
, COALESCE(s5, 0) AS s5
FROM crosstab(
'SELECT s.product_id, p.name, s.store_id, s.sum_amount
FROM product p
JOIN (
SELECT product_id, store_id
, sum(amount) AS sum_amount -- 3. SUM total of product spent in store
FROM sales
GROUP BY product_id, store_id
) s ON p.id = s.product_id
ORDER BY s.product_id, s.store_id;'
, 'VALUES (1),(2),(3),(4),(5)' -- desired store_id's
) AS ct (product_id int, product text -- "extra" column
, s1 numeric, s2 numeric, s3 numeric, s4 numeric, s5 numeric)
ORDER BY s3 DESC; -- 2. ... descending order for S3
Produces your desired result exactly (plus product_id).
To include products that have never been sold replace [INNER] JOIN with LEFT [OUTER] JOIN.
SQL Fiddle with base query.
The tablefunc module is not installed on sqlfiddle.
Major points
Read the basic explanation in the reference answer for crosstab().
I am including product_id because product.name is hardly unique. This might otherwise lead to sneaky errors conflating two different products.
You don't need the store table in the query if referential integrity is guaranteed.
ORDER BY s3 DESC works, because s3 references the output column where NULL values have been replaced with COALESCE. Else we would need DESC NULLS LAST to sort NULL values last:
PostgreSQL sort by datetime asc, null first?
For building crosstab() queries dynamically consider:
Dynamic alternative to pivot with CASE and GROUP BY
I also want to have pagination.
That last item is fuzzy. Simple pagination can be had with LIMIT and OFFSET:
Displaying data in grid view page by page
I would consider a MATERIALIZED VIEW to materialize results before pagination. If you have a stable page size I would add page numbers to the MV for easy and fast results.
To optimize performance for big result sets, consider:
SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'
Optimize query with OFFSET on large table
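Where the tablefunc module is not available, the pivot with CASE and GROUP BY mentioned above can produce the same shape. The following is a minimal sketch of that alternative, not the crosstab() query itself, run through Python's sqlite3 module to stay self-contained. The three hard-coded store columns, the sample rows, and the fetch_page helper are assumptions for illustration; it uses LEFT JOIN so that a product without sales still appears with zeros, and LIMIT/OFFSET for simple pagination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, code TEXT, name TEXT);
CREATE TABLE sales (id INTEGER PRIMARY KEY, product_id INTEGER,
                    store_id INTEGER, amount REAL);
INSERT INTO product VALUES (1,'P1','Saucer 12'), (2,'P2','Plate 15'), (3,'P3','Saucer 13');
INSERT INTO sales VALUES (1,1,1,7.05), (2,1,2,9.00), (3,2,3,1.00),
                         (4,2,3,1.00), (5,2,5,1.00);
""")

# One SUM(CASE ...) per store column; COALESCE turns "no sales" into 0.
PIVOT_SQL = """
SELECT p.id AS product_id, p.name AS product,
       COALESCE(SUM(CASE WHEN s.store_id = 1 THEN s.amount END), 0) AS s1,
       COALESCE(SUM(CASE WHEN s.store_id = 2 THEN s.amount END), 0) AS s2,
       COALESCE(SUM(CASE WHEN s.store_id = 3 THEN s.amount END), 0) AS s3
FROM product p
LEFT JOIN sales s ON s.product_id = p.id
GROUP BY p.id, p.name
ORDER BY s3 DESC, p.id      -- p.id as tie-breaker keeps pages stable
LIMIT ? OFFSET ?
"""

def fetch_page(page, page_size=2):
    """Hypothetical helper: return one zero-based page of the pivoted result."""
    return conn.execute(PIVOT_SQL, (page_size, page * page_size)).fetchall()

page0 = fetch_page(0)
page1 = fetch_page(1)
print(page0)
print(page1)
```

The tie-breaker in ORDER BY matters for pagination: without a deterministic order, rows with equal s3 values could move between pages from one request to the next.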

Query to combine two tables into one based on timestamp

I have three tables in Postgres. They are all about a single event (an occurrence, not "sports event"). Each table is about a specific item during the event.
table_header columns
gid, start_timestamp, end_timestamp, location, positions
table_item1 columns
gid, side, visibility, item1_timestamp
table_item2 columns
gid, position_id, name, item2_timestamp
I've tried the following query:
SELECT h.gid, h.location, h.start_timestamp, h.end_timestamp, i1.side,
i1.visibility, i2.position_id, i2.name, i2.item2_timestamp AS timestamp
FROM table_header AS h
LEFT OUTER JOIN table_item1 i1 on (i1.gid = h.gid)
LEFT OUTER JOIN table_item2 i2 on (i2.gid = i1.gid AND
i1.item1_timestamp = i2.item2_timestamp)
WHERE h.start_timestamp BETWEEN '2016-03-24 12:00:00'::timestamp AND now()::timestamp
The problem is that I'm losing some data from rows when item1_timestamp and item2_timestamp do not match.
So if I have in table_item1 and table_item2:
gid | item1_timestamp | side gid | item2_timestamp | name
---------------------------- -----------------------------------
1 | 17:00:00 | left 1 | 17:00:00 | charlie
1 | 17:00:05 | right 1 | 17:00:03 | frank
1 | 17:00:10 | left 1 | 17:00:06 | dee
I would want the final output to be:
gid | timestamp | side | name
-----------------------------
1 | 17:00:00 | left | charlie
1 | 17:00:03 | | frank
1 | 17:00:05 | right |
1 | 17:00:06 | | dee
1 | 17:00:10 | left |
based purely on the timestamp (and gid). Naturally I would have the header info in there too, but that's trivial.
I tried playing around with the query I posted, using different JOINs and UNIONs, but I cannot seem to get it right. The one I posted gives the best results I could manage, but it's incomplete.
Side note: every minute or so there will be a new "event". So the gid will be unique to each event and the query needs to ensure that each dataset is paired with data from the same gid. Which is the reason for my i1.gid = h.gid lines. Data between different events should not be compared.
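-- rows from t1, with the matching t2 name where one exists
select t1.gid, t1.timestamp, t1.side, t2.name
from t1
left join t2 on t2.timestamp=t1.timestamp and t2.gid=t1.gid
union
-- rows from t2, with the matching t1 side where one exists;
-- gid and timestamp come from t2 so unmatched rows keep their key
select t2.gid, t2.timestamp, t1.side, t2.name
from t2
left join t1 on t2.timestamp=t1.timestamp and t2.gid=t1.gid
order by gid, timestamp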

DB2, get all rows with 1/100 of a column

I have these rows in my product table:
product_name | product_code | percentage
prod1#00X | 1 | 50
prod2#00X | 2 | 20
prod3#00X | 3 | 30
I want to select all the columns of my table, but show 1/100 of the percentage.
The result should be:
prod1#00X | 1 | 0.50
prod2#00X | 2 | 0.20
prod3#00X | 3 | 0.30
How can I do this?
I want to find a solution other than listing every column like this:
SELECT product_name, product_code, (percentage/100) as percentage FROM product
Note: I have several columns in my table, not only product_name | product_code | percentage.
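Try it this way (the * is qualified with a table alias so it can be combined with other columns, and dividing by 100.0 avoids integer division):
SELECT (p.percentage / 100.0) AS percentage, p.*
FROM product p
Or, using another alias so the computed column does not share a name with the original one:
SELECT p.*, (p.percentage / 100.0) AS NewPercentage
FROM product p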
You can use a SELECT statement such as:
SELECT p.*
, (p.percentage / 100.0) AS newpercentage
FROM product p
WHERE (your condition)
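One detail worth checking is the column type. Assuming percentage is an integer column (the question does not say), dividing by the integer 100 truncates to 0 in DB2 as in most SQL dialects, while dividing by 100.0 keeps the fraction. Here is a minimal sketch of the difference, run through Python's sqlite3 module so it is self-contained; SQLite stands in for DB2 here, and the column names truncated and fraction are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (product_name TEXT, product_code INTEGER, percentage INTEGER)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("prod1#00X", 1, 50), ("prod2#00X", 2, 20), ("prod3#00X", 3, 30)],
)

# p.* returns every existing column without listing them one by one.
division_sql = """
SELECT p.*,
       p.percentage / 100   AS truncated,   -- integer / integer
       p.percentage / 100.0 AS fraction     -- integer / decimal
FROM product p
ORDER BY p.product_code
"""
division_rows = conn.execute(division_sql).fetchall()
for row in division_rows:
    print(row)
```

If the column is already a DECIMAL, both forms give the same value and only the result's scale differs.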