Find the proportion of each X consisting of Y in PostgreSQL? - postgresql

I have a big database of Magic: the Gathering cards and decklists. The table of cards contains the type and converted mana cost of each card (among other things). The decks are stored using two tables: a table of the decks themselves called "decks", and a table called "deckmembers", in which each row contains the ID of a deck, the ID of a card contained in that deck, and the number of copies of that card that appear in the deck.
What I want is a view of this data in which the rows are:
deck: the ID of a deck
cmc: a converted mana cost appearing on at least one card in that deck
proportion: the percentage of the nonland cards in that deck that have that cmc
Or am I better off deriving this data in Python or R or something?
This question is conceptually similar, but no one has answered it.
EDIT:
Since you asked, here's some example data:
cards:
id | name | fulltype | cmc
----+---------------------+-----------------+-----
1 | "Ach! Hans, Run!" | Enchantment | 6
2 | 1996 World Champion | Summon _ Legend | 5
4 | AWOL | Instant | 3
5 | Abandon Hope | Sorcery | 2
6 | Abandon Reason | Instant | 3
decks:
id | name
----+-----------------
1 | RDW
2 | Red Deck Recall
3 | RDW
4 | Red Deck Wins
5 | Red Deck Wins
deckmembers:
deck | card | count
------+-------+-------
1 | 14031 | 1
1 | 15011 | 1
1 | 14263 | 1
1 | 12966 | 1
1 | 12536 | 1
Any deck will have many cards. Any card may appear in many decks. Each card has an integer from 0-12 associated with it, which is called its "converted mana cost" or CMC. That's all you need to know. Don't bother learning to play Magic on my account.
And what I want might look something like:
deck | cmc | perc
------+-------+-------
1 | 1 | 11
1 | 2 | 11
1 | 3 | 11
1 | 4 | 11
1 | 5 | 11
Where "perc" in the first row says that 11 percent of the cards in the deck with the id 1 have cmc 1.

Solved it!
SELECT
d.id,
c.cmc,
(CAST(SUM(m.count) AS FLOAT) /
(SELECT
CAST(SUM(m1.count) AS FLOAT)
FROM deckmembers AS m1
JOIN cards AS c1 ON c1.id=m1.card
WHERE NOT m1.sideboard
AND c1.fulltype NOT LIKE '%Land%'
AND m1.deck=d.id)
) * 100 AS perc
FROM deckmembers AS m
JOIN decks AS d ON d.id=m.deck
JOIN cards AS c ON c.id=m.card
WHERE NOT m.sideboard
AND c.fulltype NOT LIKE '%Land%'
GROUP BY d.id, c.cmc;
I also posted a solution to the simpler version of the problem here.
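For anyone who wants to try the idea without a PostgreSQL server, here is a minimal sketch that runs an equivalent query through Python's built-in sqlite3 module. It replaces the correlated subquery with a join against per-deck totals, which is one alternative way to write it, not the query above. The three cards and their counts are invented for illustration, and the sideboard filter is left out because the sample tables do not show that column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cards (id INTEGER PRIMARY KEY, name TEXT, fulltype TEXT, cmc INTEGER);
    CREATE TABLE deckmembers (deck INTEGER, card INTEGER, count INTEGER);
    -- Hypothetical rows, invented for this sketch.
    INSERT INTO cards VALUES
        (1, 'Card A', 'Instant', 1),
        (2, 'Card B', 'Sorcery', 2),
        (3, 'Card C', 'Basic Land', 0);
    INSERT INTO deckmembers VALUES (1, 1, 3), (1, 2, 1), (1, 3, 4);
""")

# Per-deck nonland totals are computed once in a derived table (t),
# then each (deck, cmc) group is divided by its deck's total.
PROPORTION_SQL = """
    SELECT m.deck, c.cmc, 100.0 * SUM(m.count) / t.total AS perc
    FROM deckmembers AS m
    JOIN cards AS c ON c.id = m.card
    JOIN (
        SELECT m1.deck, SUM(m1.count) AS total
        FROM deckmembers AS m1
        JOIN cards AS c1 ON c1.id = m1.card
        WHERE c1.fulltype NOT LIKE '%Land%'
        GROUP BY m1.deck
    ) AS t ON t.deck = m.deck
    WHERE c.fulltype NOT LIKE '%Land%'
    GROUP BY m.deck, c.cmc, t.total
    ORDER BY m.deck, c.cmc
"""

rows = conn.execute(PROPORTION_SQL).fetchall()
for deck, cmc, perc in rows:
    print(deck, cmc, perc)
```

With these made-up rows, deck 1 has four nonland cards, three of them at cmc 1, so the land is excluded from both the numerator and the denominator.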

Related

PostgreSQL - Setting null values to missing rows in a join statement

SQL newbie here. I'm trying to write a query that generates a scoring table, setting null to a student's grades in a module for which they haven't yet taken their exams (on PostgreSQL).
So I start with tables that look something like this:
student_evaluation:
|student_id| module_id | course_id |grade |
|----------|-----------|-----------|-------|
| 1 | 1 | 1 |3 |
| 1 | 1 | 1 |7 |
| 1 | 2 | 1 |8 |
| 2 | 4 | 2 |9 |
course_module:
| module_id | course_id |
| ---------- | --------- |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
In our use case, a course is made up of several modules. Each module has a single exam, but a student who failed the exam may have a couple of retries. The same module may also be present in different courses, but an exam attempt only counts for one instance of the module (i.e. student A passed module 1's exam on course 1; if course 2 also has module 1, student A has to retake the same exam for course 2 if he also has access to that course).
So the output should look like this:
|student_id| module_id | course_id |grade |
|----------|-----------|-----------|-------|
| 1 | 1 | 1 |3 |
| 1 | 1 | 1 |7 |
| 1 | 2 | 1 |8 |
| 1 | 3 | 1 |null |
| 2 | 4 | 2 |9 |
I feel like this should have been a simple task, but I think I have a very flawed understanding of how outer and cross joins work. I have tried stuff like:
SELECT se.student_id, se.module_id, se.course_id, se.grade FROM student_evaluation se
RIGHT OUTER JOIN course_module ON course_module.course_id = se.course_id
AND course_module.module_id = se.module_id
or
SELECT se.student_id, se.module_id, se.course_id, se.grade FROM student_evaluation se
CROSS JOIN course_module WHERE course_module.course_id = se.course_id
Neither worked. These all feel wrong, but I'm lost as to what would be the proper way to go about this.
Thank you in advance.
I think you need both join types: first use a cross join to build a list of all combinations of students and courses, then use an outer join to add the grades.
SELECT sc.student_id,
       sc.module_id,
       sc.course_id,
       se.grade
FROM student_evaluation AS se
   RIGHT JOIN (SELECT s.student_id,
                      c.module_id,
                      c.course_id
               FROM (SELECT DISTINCT student_id
                     FROM student_evaluation) AS s
                  CROSS JOIN course_module AS c) AS sc
      -- join on all three keys so each grade lands on its own row only
      USING (student_id, module_id, course_id);
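A plain cross join pairs every student with every module of every course, which is more rows than the expected output in the question shows. Below is a minimal sketch of a variation that reproduces that output, run through Python's sqlite3 module. It assumes a student counts as enrolled in a course once they have at least one evaluation in it; that rule is an assumption of this sketch and is not stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student_evaluation (
        student_id INTEGER, module_id INTEGER, course_id INTEGER, grade INTEGER);
    CREATE TABLE course_module (module_id INTEGER, course_id INTEGER);
    -- Sample rows from the question.
    INSERT INTO student_evaluation VALUES (1,1,1,3), (1,1,1,7), (1,2,1,8), (2,4,2,9);
    INSERT INTO course_module VALUES (1,1), (2,1), (3,1), (4,2);
""")

# Step 1: one row per (student, course) the student has touched (assumed enrollment).
# Step 2: expand each pair to all modules of that course.
# Step 3: LEFT JOIN the grades, so modules without an attempt keep a NULL grade.
SCORING_SQL = """
    SELECT sc.student_id, cm.module_id, sc.course_id, se.grade
    FROM (SELECT DISTINCT student_id, course_id
          FROM student_evaluation) AS sc
    JOIN course_module AS cm
      ON cm.course_id = sc.course_id
    LEFT JOIN student_evaluation AS se
      ON se.student_id = sc.student_id
     AND se.module_id = cm.module_id
     AND se.course_id = sc.course_id
    ORDER BY sc.student_id, cm.module_id, se.grade
"""

rows = conn.execute(SCORING_SQL).fetchall()
for row in rows:
    print(row)
```

If enrollment is stored in its own table, that table would replace the DISTINCT subquery and students with no attempts at all would show up too.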

Is there a way to select elements associated with checked items without using multiple SELECT statements?

I'm trying to write a query that selects the ids of the neighborhoods that have every transport type checked in a checkbox list. For instance, if 'Bus' and 'Railway' are checked, it should give me 7,8, and if only 'Railway' is checked, it should give me 7,8,11. The 'transporte' table looks like this:
b_codigo | tipo_transporte
----------+-----------------
1 | Underground
1 | Bus
2 | Bus
2 | Underground
3 | Bus
3 | Underground
4 | Bus
4 | RENFE
4 | Underground
5 | RENFE
5 | Underground
5 | Bus
5 | Tram
6 | Bus
6 | Underground
7 | RENFE
7 | Underground
7 | Bus
7 | Railway (FGC)
8 | Underground
8 | Railway (FGC)
8 | Bus
9 | Underground
9 | Bus
10 | Underground
10 | Bus
11 | Railway (FGC)
11 | Underground
12 | Bus
I tried with a query of the form
SELECT DISTINCT b_codigo
FROM transporte
WHERE (b_codigo, 'checked1') IN (SELECT * FROM transporte)
AND (b_codigo, 'checked2') IN (SELECT * FROM transporte)
AND ...
and another of the form
SELECT b_codigo
FROM transporte
WHERE tipo_transporte = 'checked1'
INTERSECT
SELECT b_codigo
FROM transporte
WHERE tipo_transporte = 'checked2'
INTERSECT
...;
and both give me the same results, but I'm worried about the efficiency of these two queries.
Is there a way of doing the same query without using N SELECT statements with N the number of checked boxes?
One way to do it is to use aggregation:
select b_codigo
from transporte
where tipo_transporte in ('Bus', 'Railway (FGC)')
group by b_codigo
having count(distinct tipo_transporte) = 2
The number compared against in the HAVING clause needs to match the number of elements in the IN list.
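Since the IN list and the HAVING count have to stay in sync, it can help to build both from the same list of checked boxes. The following is a minimal sketch using Python's sqlite3 module and the table from the question; the helper name `neighborhoods_with_all` is hypothetical, and returning an empty list when nothing is checked is an arbitrary choice made here.

```python
import sqlite3

# Rows copied from the question's 'transporte' table.
TRANSPORT_ROWS = [
    (1, "Underground"), (1, "Bus"), (2, "Bus"), (2, "Underground"),
    (3, "Bus"), (3, "Underground"), (4, "Bus"), (4, "RENFE"),
    (4, "Underground"), (5, "RENFE"), (5, "Underground"), (5, "Bus"),
    (5, "Tram"), (6, "Bus"), (6, "Underground"), (7, "RENFE"),
    (7, "Underground"), (7, "Bus"), (7, "Railway (FGC)"),
    (8, "Underground"), (8, "Railway (FGC)"), (8, "Bus"),
    (9, "Underground"), (9, "Bus"), (10, "Underground"), (10, "Bus"),
    (11, "Railway (FGC)"), (11, "Underground"), (12, "Bus"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transporte (b_codigo INTEGER, tipo_transporte TEXT)")
conn.executemany("INSERT INTO transporte VALUES (?, ?)", TRANSPORT_ROWS)


def neighborhoods_with_all(conn, checked):
    """Return ids of neighborhoods that have every checked transport type."""
    wanted = sorted(set(checked))  # de-duplicate so the count is exact
    if not wanted:
        return []  # assumption: no boxes checked means no result
    # One placeholder per checked value; the values are bound, not interpolated.
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT b_codigo
        FROM transporte
        WHERE tipo_transporte IN ({placeholders})
        GROUP BY b_codigo
        HAVING COUNT(DISTINCT tipo_transporte) = ?
        ORDER BY b_codigo
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


print(neighborhoods_with_all(conn, ["Bus", "Railway (FGC)"]))
print(neighborhoods_with_all(conn, ["Railway (FGC)"]))
```

The query text contains a single SELECT no matter how many boxes are checked; only the number of placeholders and the bound count change.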

PostgreSQL One ID multiple values

I have a Postgres table where one id may have multiple Channel values as follows
ID |Channel | Column 3 | Column 4
_____|________|__________|_________
1 | Sports | x | null
1 | Organic| x | z
2 | Organic| null | q
3 | Arts | b | w
3 | Organic| e | r
4 | Sports | sp | t
No ID will have a duplicate channel name, and no ID will be both Sports and Arts. That is, ID 1 could have a Sports and Organic channel, or an Arts and Organic channel, but not two Sports or two Organic entries and not both a Sports and an Arts channel. I want all IDs to be in the query result, but if there is a non-Organic channel I prefer that. The result I want would be
ID |Channel | Column 3 | Column 4
_____|________|__________|_________
1 | Sports | x | null
2 | Organic| null | q
3 | Arts | b | w
4 | Sports | sp | t
I feel like there is some CTE here, a rank and partition or something that could do the trick, but I'm just not getting it. I'm only including Columns 3 and 4 to show there are extra columns.
Does anyone have any ideas on the code to deploy here?
You could use DISTINCT ON with an appropriate ORDER BY clause:
SELECT DISTINCT ON (id)
id, channel, column3, column4
FROM atable
ORDER BY id, channel = 'Organic';
This relies on the fact that FALSE < TRUE.
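To see why the ordering does the work, here is a minimal sketch in plain Python that mimics what DISTINCT ON does with that ORDER BY: sort by id, then by the boolean `channel = 'Organic'`, and keep the first row of each id. It is only an illustration of the logic using the rows from the question, not a description of how PostgreSQL executes the query, and the function name `prefer_non_organic` is hypothetical.

```python
from itertools import groupby

# Rows from the question: (id, channel, column3, column4).
rows = [
    (1, "Sports", "x", None),
    (1, "Organic", "x", "z"),
    (2, "Organic", None, "q"),
    (3, "Arts", "b", "w"),
    (3, "Organic", "e", "r"),
    (4, "Sports", "sp", "t"),
]


def prefer_non_organic(rows):
    """Keep one row per id, preferring a non-Organic channel when one exists."""
    # False sorts before True, so non-Organic rows come first within each id.
    ordered = sorted(rows, key=lambda r: (r[0], r[1] == "Organic"))
    # Taking the first row of each id group mirrors DISTINCT ON (id).
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[0])]


picked = prefer_non_organic(rows)
for row in picked:
    print(row)
```

An id that only has an Organic row still appears, because that row is then the first (and only) one in its group.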
I ended up using ROW_NUMBER() as a window function and keeping the rows where id_rank = 1:
ROW_NUMBER() OVER (PARTITION BY salesforce_id ORDER BY CASE WHEN channel = 'Organic' THEN 0 ELSE 1 END DESC, timestamp DESC) AS id_rank
I didn't include in the original question that I had a timestamp! This works now. Thanks

How to get back aggregate values across 2 dimensions using Python Cubes?

Situation
Using Python 3, Django 1.9, Cubes 1.1, and Postgres 9.5.
These are my data tables in text format:
Store table
------------------------------
| id | code | address |
|-----|------|---------------|
| 1 | S1 | Kings Row |
| 2 | S2 | Queens Street |
| 3 | S3 | Jacks Place |
| 4 | S4 | Diamonds Alley|
| 5 | S5 | Hearts Road |
------------------------------
Product table
------------------------------
| id | code | name |
|-----|------|---------------|
| 1 | P1 | Saucer 12 |
| 2 | P2 | Plate 15 |
| 3 | P3 | Saucer 13 |
| 4 | P4 | Saucer 14 |
| 5 | P5 | Plate 16 |
| and many more .... |
|1000 |P1000 | Bowl 25 |
|----------------------------|
Sales table
----------------------------------------
| id | product_id | store_id | amount |
|-----|------------|----------|--------|
| 1 | 1 | 1 |7.05 |
| 2 | 1 | 2 |9.00 |
| 3 | 2 | 3 |1.00 |
| 4 | 2 | 3 |1.00 |
| 5 | 2 | 5 |1.00 |
| and many more .... |
| 1000| 20 | 4 |1.00 |
|--------------------------------------|
The relationships are:
Sales belongs to Store
Sales belongs to Product
Store has many Sales
Product has many Sales
What I want to achieve
I want to use cubes to be able to do a display by pagination in the following manner:
Given the stores S1-S3:
-------------------------
| product | S1 | S2 | S3 |
|---------|----|----|----|
|Saucer 12|7.05|9 | 0 |
|Plate 15 |0 |0 | 2 |
| and many more .... |
|------------------------|
Note the following:
Even though there were no records in sales for Saucer 12 under Store S3, I displayed 0 instead of null or none.
I want to be able to do sort by store, say descending order for, S3.
The cells indicate the SUM total of that particular product spent in that particular store.
I also want to have pagination.
What I tried
This is the configuration I used:
"cubes": [
{
"name": "sales",
"dimensions": ["product", "store"],
"joins": [
{"master":"product_id", "detail":"product.id"},
{"master":"store_id", "detail":"store.id"}
]
}
],
"dimensions": [
{ "name": "product", "attributes": ["code", "name"] },
{ "name": "store", "attributes": ["code", "address"] }
]
This is the code I used:
result = browser.aggregate(drilldown=['Store','Product'],
order=[("Product.name","asc"), ("Store.name","desc"), ("total_products_sale", "desc")])
I didn't get what I want.
I got it like this:
----------------------------------------------
| product_id | store_id | total_products_sale |
|------------|----------|---------------------|
| 1 | 1 | 7.05 |
| 1 | 2 | 9 |
| 2 | 3 | 2.00 |
| and many more .... |
|---------------------------------------------|
which is the whole table with no pagination, and if a product was not sold in a store, that combination does not show up as zero.
My question
How do I get what I want?
Do I need to create another data table that aggregates everything by store and product before I use cubes to run the query?
Update
I have read more. I realised that what I want is called dicing as I needed to go across 2 dimensions. See: https://en.wikipedia.org/wiki/OLAP_cube#Operations
Cross-posted at Cubes GitHub issues to get more attention.
This is a pure SQL solution using crosstab() from the additional tablefunc module to pivot the aggregated data. It typically performs better than any client-side alternative. If you are not familiar with crosstab(), read this first:
PostgreSQL Crosstab Query
And this about the "extra" column in the crosstab() output:
Pivot on Multiple Columns using Tablefunc
SELECT product_id, product
, COALESCE(s1, 0) AS s1 -- 1. ... displayed 0 instead of null
, COALESCE(s2, 0) AS s2
, COALESCE(s3, 0) AS s3
, COALESCE(s4, 0) AS s4
, COALESCE(s5, 0) AS s5
FROM crosstab(
'SELECT s.product_id, p.name, s.store_id, s.sum_amount
FROM product p
JOIN (
SELECT product_id, store_id
, sum(amount) AS sum_amount -- 3. SUM total of product spent in store
FROM sales
GROUP BY product_id, store_id
) s ON p.id = s.product_id
ORDER BY s.product_id, s.store_id;'
, 'VALUES (1),(2),(3),(4),(5)' -- desired store_id's
) AS ct (product_id int, product text -- "extra" column
, s1 numeric, s2 numeric, s3 numeric, s4 numeric, s5 numeric)
ORDER BY s3 DESC; -- 2. ... descending order for S3
Produces your desired result exactly (plus product_id).
To include products that have never been sold replace [INNER] JOIN with LEFT [OUTER] JOIN.
SQL Fiddle with base query.
The tablefunc module is not installed on sqlfiddle.
Major points
Read the basic explanation in the reference answer for crosstab().
I am including product_id because product.name is hardly unique. This might otherwise lead to sneaky errors conflating two different products.
You don't need the store table in the query if referential integrity is guaranteed.
ORDER BY s3 DESC works, because s3 references the output column where NULL values have been replaced with COALESCE. Else we would need DESC NULLS LAST to sort NULL values last:
PostgreSQL sort by datetime asc, null first?
For building crosstab() queries dynamically consider:
Dynamic alternative to pivot with CASE and GROUP BY
I also want to have pagination.
That last item is fuzzy. Simple pagination can be had with LIMIT and OFFSET:
Displaying data in grid view page by page
I would consider a MATERIALIZED VIEW to materialize results before pagination. If you have a stable page size I would add page numbers to the MV for easy and fast results.
To optimize performance for big result sets, consider:
SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'
Optimize query with OFFSET on large table
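For experimenting without the tablefunc module, the same pivot can be approximated with conditional aggregation (the "pivot with CASE and GROUP BY" approach) plus LIMIT and OFFSET. The following is a minimal sketch run through Python's sqlite3 module with the sample rows from the question. It is a swapped-in technique, not crosstab(), the store columns are hard-coded to S1 to S3, and the function name `pivot_page` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, code TEXT, name TEXT);
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY, product_id INTEGER, store_id INTEGER, amount REAL);
    -- First rows of the question's sample tables.
    INSERT INTO product VALUES
        (1, 'P1', 'Saucer 12'), (2, 'P2', 'Plate 15'), (3, 'P3', 'Saucer 13');
    INSERT INTO sales VALUES
        (1, 1, 1, 7.05), (2, 1, 2, 9.00), (3, 2, 3, 1.00),
        (4, 2, 3, 1.00), (5, 2, 5, 1.00);
""")


def pivot_page(conn, page, page_size):
    """Return one page of per-product sums for stores 1-3, sorted by S3 descending."""
    sql = """
        SELECT p.id, p.name,
               COALESCE(SUM(CASE WHEN s.store_id = 1 THEN s.amount END), 0) AS s1,
               COALESCE(SUM(CASE WHEN s.store_id = 2 THEN s.amount END), 0) AS s2,
               COALESCE(SUM(CASE WHEN s.store_id = 3 THEN s.amount END), 0) AS s3
        FROM product AS p
        LEFT JOIN sales AS s ON s.product_id = p.id  -- keeps never-sold products
        GROUP BY p.id, p.name
        ORDER BY s3 DESC, p.id                       -- p.id makes paging stable
        LIMIT ? OFFSET ?
    """
    return conn.execute(sql, (page_size, page * page_size)).fetchall()


first_page = pivot_page(conn, page=0, page_size=2)
second_page = pivot_page(conn, page=1, page_size=2)
print(first_page)
print(second_page)
```

The tie-breaker on p.id matters for pagination: without a fully deterministic ORDER BY, rows with equal S3 totals could move between pages from one request to the next.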

How to show all recursive results with hierarchyid sql

I have a table categories:
ID | NAME | PARENT ID | POSITION | LEVEL | ORDER
----------------------------------------------------------------------------
1 | root | -1 | 0x | 0 | 255
2 | cars | 1 | 0x58 | 1 | 10
5 | trucks | 1 | 0x68 | 1 | 10
13 | city cars | 2 | 0x5AC0 | 2 | 255
14 | offroad cars | 2 | 0x5B40 | 2 | 255
where:
ID int ident
NAME nvarchar(255)
PARENT ID int
POSITION hierarchyid
LEVEL hierarchyid GetLevel()
ORDER tinyint
This table model specifies model name and category where it belongs. Example:
ID | NAME | CATEGORY
-----------------------------
1 | Civic | 13
2 | Pajero | 14
3 | 815 | 5
4 | Avensis | 13
where:
ID int ident
NAME nvarchar(255)
CATEGORY int link to ID category table
What I am trying to do is to be able to show:
all models - would show all models from root recursively,
models within category cars (cars included)
models from city cars (or its children if any)
How do I use hierarchyid for such filtering, and how do I join the tables to get the matching models? Is there a quick way to show all models from a certain level down?
I believe this would have given you what you were looking for:
declare @id hierarchyid
select @id = Position from Categories where Name = 'root' -- or 'cars', 'city cars', etc.
select m.*
from Models m
join Categories c on m.Category = c.ID
where c.Position.IsDescendantOf(@id) = 1
For more information on the IsDescendantOf method and other hierarchyid methods, check the method reference.
You are going to want to use a CTE (Common Table Expression):
https://web.archive.org/web/20210927200924/http://www.4guysfromrolla.com/webtech/071906-1.shtml
Introduced in SQL Server 2005, they allow for an easy way to query hierarchical or recursive relationships.
This is pretty close to your example:
http://www.sqlservercurry.com/2009/06/simple-family-tree-query-using.html
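As a rough illustration of the CTE approach, here is a minimal sketch that walks the PARENT ID column with a recursive CTE, using the sample categories and models from the question. It runs on SQLite through Python's sqlite3 module so it can be tried anywhere; SQLite spells it WITH RECURSIVE, while SQL Server uses plain WITH. The hierarchyid column is left out, and the function name `models_under` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
    CREATE TABLE models (id INTEGER PRIMARY KEY, name TEXT, category INTEGER);
    -- Sample rows from the question (hierarchyid columns omitted).
    INSERT INTO categories VALUES
        (1, 'root', -1), (2, 'cars', 1), (5, 'trucks', 1),
        (13, 'city cars', 2), (14, 'offroad cars', 2);
    INSERT INTO models VALUES
        (1, 'Civic', 13), (2, 'Pajero', 14), (3, '815', 5), (4, 'Avensis', 13);
""")


def models_under(conn, category_name):
    """Return model names in the named category or any of its descendants."""
    sql = """
        WITH RECURSIVE subtree(id) AS (
            -- anchor: the starting category itself
            SELECT id FROM categories WHERE name = ?
            UNION ALL
            -- recursive step: children of categories already collected
            SELECT c.id
            FROM categories AS c
            JOIN subtree AS s ON c.parent_id = s.id
        )
        SELECT m.name
        FROM models AS m
        JOIN subtree AS s ON m.category = s.id
        ORDER BY m.id
    """
    return [row[0] for row in conn.execute(sql, (category_name,))]


print(models_under(conn, "root"))
print(models_under(conn, "cars"))
print(models_under(conn, "city cars"))
```

The three calls correspond to the three listings asked for in the question: everything under root, everything under cars, and everything under city cars.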