Is there a way to select elements associated with checked items without using multiple SELECT statements? - postgresql

I'm trying to write a query that selects the IDs of the neighborhoods that have every transport type checked in a checkbox list. For instance, if 'Bus' and 'Railway' are checked, it should give me 7,8, and if only 'Railway' is checked, it should give me 7,8,11. The 'transporte' table looks like this:
b_codigo | tipo_transporte
----------+-----------------
1 | Underground
1 | Bus
2 | Bus
2 | Underground
3 | Bus
3 | Underground
4 | Bus
4 | RENFE
4 | Underground
5 | RENFE
5 | Underground
5 | Bus
5 | Tram
6 | Bus
6 | Underground
7 | RENFE
7 | Underground
7 | Bus
7 | Railway (FGC)
8 | Underground
8 | Railway (FGC)
8 | Bus
9 | Underground
9 | Bus
10 | Underground
10 | Bus
11 | Railway (FGC)
11 | Underground
12 | Bus
I tried with a query of the form
SELECT DISTINCT b_codigo
FROM transporte
WHERE (b_codigo, 'checked1') IN (SELECT * FROM transporte)
AND (b_codigo, 'checked2') IN (SELECT * FROM transporte)
AND ...
and another of the form
SELECT b_codigo
FROM transporte
WHERE tipo_transporte = 'checked1'
INTERSECT
SELECT b_codigo
FROM transporte
WHERE tipo_transporte = 'checked2'
INTERSECT
...;
and both give me the same results, but I'm worried about the efficiency of these two queries.
Is there a way of doing the same query without using N SELECT statements with N the number of checked boxes?

One way to do it is to use aggregation:
select b_codigo
from transporte
where tipo_transporte in ('Bus', 'Railway (FGC)')
group by b_codigo
having count(distinct tipo_transporte) = 2
The number compared against in the HAVING clause needs to match the number of elements in the IN list.
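As a rough illustration of keeping the IN list and the HAVING count in sync, here is a minimal sketch that builds both from the same list of checked values. It uses Python's sqlite3 as a stand-in for PostgreSQL, and the helper name `neighborhoods_with_all` is hypothetical; only a subset of the question's rows is loaded.

```python
import sqlite3

# Subset of the rows from the question; SQLite stands in for PostgreSQL here.
ROWS = [
    (4, "Bus"), (4, "RENFE"), (4, "Underground"),
    (7, "RENFE"), (7, "Underground"), (7, "Bus"), (7, "Railway (FGC)"),
    (8, "Underground"), (8, "Railway (FGC)"), (8, "Bus"),
    (11, "Railway (FGC)"), (11, "Underground"),
    (12, "Bus"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transporte (b_codigo INTEGER, tipo_transporte TEXT)")
conn.executemany("INSERT INTO transporte VALUES (?, ?)", ROWS)


def neighborhoods_with_all(conn, checked):
    """Return the b_codigo values that have every transport type in `checked`."""
    checked = sorted(set(checked))  # duplicates would break the count comparison
    if not checked:
        return []
    placeholders = ", ".join("?" for _ in checked)
    sql = f"""
        SELECT b_codigo
        FROM transporte
        WHERE tipo_transporte IN ({placeholders})
        GROUP BY b_codigo
        HAVING COUNT(DISTINCT tipo_transporte) = ?
        ORDER BY b_codigo
    """
    # The last parameter is the number of checked items, so the IN list and
    # the HAVING count cannot drift apart.
    return [row[0] for row in conn.execute(sql, [*checked, len(checked)])]


print(neighborhoods_with_all(conn, ["Bus", "Railway (FGC)"]))
print(neighborhoods_with_all(conn, ["Railway (FGC)"]))
```

Because the count is derived from the list itself, the query stays a single SELECT no matter how many boxes are checked.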

Related

PostgreSQL - Setting null values to missing rows in a join statement

SQL newbie here. I'm trying to write a query that generates a scoring table, setting null to a student's grades in a module for which they haven't yet taken their exams (on PostgreSQL).
So I start with tables that look something like this:
student_evaluation:
|student_id| module_id | course_id |grade |
|----------|-----------|-----------|-------|
| 1 | 1 | 1 |3 |
| 1 | 1 | 1 |7 |
| 1 | 2 | 1 |8 |
| 2 | 4 | 2 |9 |
course_module:
| module_id | course_id |
| ---------- | --------- |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
In our use case, a course is made up of several modules. Each module has a single exam, but a student who failed his exam may have a couple of retries. The same module may also be present in different courses, but an exam attempt only counts for one instance of the module (ie. student A passed module 1's exam on course 1. If course 2 also has module 1, student A has to retake the same exam for course 2 if he also has access to that course).
So the output should look like this:
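| student_id | module_id | course_id | grade |
|------------|-----------|-----------|-------|
| 1 | 1 | 1 | 3 |
| 1 | 1 | 1 | 7 |
| 1 | 2 | 1 | 8 |
| 1 | 3 | 1 | null |
| 2 | 4 | 2 | 9 |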
I feel like this should have been a simple task, but I think I have a very flawed understanding of how outer and cross joins work. I have tried stuff like:
SELECT se.student_id, se.module_id, se.course_id, se.grade FROM student_evaluation se
RIGHT OUTER JOIN course_module ON course_module.course_id = se.course_id
AND course_module.module_id = se.module_id
or
SELECT se.student_id, se.module_id, se.course_id, se.grade FROM student_evaluation se
CROSS JOIN course_module WHERE course_module.course_id = se.course_id
Neither worked. These all feel wrong, but I'm lost as to what would be the proper way to go about this.
Thank you in advance.
I think you need both join types: first use a cross join to build a list of all combinations of students and course modules, then use an outer join to add the grades.
SELECT sc.student_id,
       sc.module_id,
       sc.course_id,
       se.grade
FROM student_evaluation se
   RIGHT JOIN (SELECT s.student_id,
                      c.module_id,
                      c.course_id
               FROM (SELECT DISTINCT student_id
                     FROM student_evaluation) AS s
                  CROSS JOIN course_module AS c) AS sc
      -- join on all three keys so each grade lands on its own row
      USING (student_id, module_id, course_id);
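A plain cross join of every student with every course module also yields rows for courses a student never attempted (student 1 with module 4, student 2 with modules 1 to 3), which the expected output does not show. The sketch below is one possible variation, not necessarily what the answer intended: it assumes a student belongs to a course once they have at least one evaluation in it, builds the student/module grid only for those courses, and then outer-joins the grades. It uses Python's sqlite3 as a stand-in for PostgreSQL, and a LEFT JOIN in place of the RIGHT JOIN because older SQLite versions lack RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student_evaluation (student_id INT, module_id INT, course_id INT, grade INT);
    CREATE TABLE course_module (module_id INT, course_id INT);
    INSERT INTO student_evaluation VALUES (1, 1, 1, 3), (1, 1, 1, 7), (1, 2, 1, 8), (2, 4, 2, 9);
    INSERT INTO course_module VALUES (1, 1), (2, 1), (3, 1), (4, 2);
""")

SCORING_SQL = """
    SELECT sc.student_id, sc.module_id, sc.course_id, se.grade
    FROM (
        -- Assumption: a student is in a course if they have any evaluation in it.
        SELECT s.student_id, c.module_id, c.course_id
        FROM (SELECT DISTINCT student_id, course_id FROM student_evaluation) AS s
        JOIN course_module AS c ON c.course_id = s.course_id
    ) AS sc
    LEFT JOIN student_evaluation AS se
           ON se.student_id = sc.student_id
          AND se.module_id = sc.module_id
          AND se.course_id = sc.course_id
    ORDER BY sc.student_id, sc.module_id, se.grade
"""

scoring = conn.execute(SCORING_SQL).fetchall()
for row in scoring:
    print(row)  # a missing grade comes back as None (SQL NULL)
```

If enrollment is stored in its own table, that table would be a better source for the student/course pairs than the evaluations themselves.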

A Postgres query to get subtraction of a value in a row by the value in the next row

I have a table like this (mytable):
id | value
=========
1 | 4
2 | 5
3 | 8
4 | 16
5 | 8
...
I need a query that subtracts the previous row's value from each row's value:
id | value | diff
=================
1 | 4 | 4 (4-Null)
2 | 5 | 1 (5-4)
3 | 8 | 3 (8-5)
4 | 16 | 8 (16-8)
5 | 8 | -8 (8-16)
...
Right now I use a python script to do so, but I guess it's faster if I create a view from this table.
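You should use a window function, LAG() in this case. Without a default, LAG() returns NULL for the first row, so pass 0 as the third argument to get 4 there, as in your expected output:
SELECT id, value, value - LAG(value, 1, 0) OVER (ORDER BY id) AS diff
FROM mytable
ORDER BY id;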
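LAG() yields NULL on the first row unless a default is supplied, so the first diff is NULL with the two-argument form and equals the value itself with a default of 0. A small check of both forms, using Python's sqlite3 as a stand-in for PostgreSQL (window functions need SQLite 3.25 or later):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 4), (2, 5), (3, 8), (4, 16), (5, 8)],
)

# Two-argument LAG: the first row has no predecessor, so diff is NULL.
diff_null_first = conn.execute("""
    SELECT id, value, value - LAG(value, 1) OVER (ORDER BY id) AS diff
    FROM mytable
    ORDER BY id
""").fetchall()

# Three-argument LAG: a default of 0 makes the first diff equal to the value.
diff_zero_default = conn.execute("""
    SELECT id, value, value - LAG(value, 1, 0) OVER (ORDER BY id) AS diff
    FROM mytable
    ORDER BY id
""").fetchall()

print(diff_null_first)
print(diff_zero_default)
```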

How to split the rows returned by regexp_split_to_table into two columns

I have a table with a combined string column and I want to split it into its parts. I already have a query that splits it into rows with regexp_split_to_table.
So far I have split this: 1:9,5:4,4:8,6:9,3:9,2:5,7:8,34:8,24:6
to this table:
campaign_skill
----------------
1:9
5:4
4:8
6:9
3:9
2:5
7:8
34:8
24:6
with this expression:
select *
from regexp_split_to_table((select user_skill from users where user_token = 'ded8ab43-efe2-4aea-894d-511ed3505261'), E'[\\s,]+') as campaign_skill
How can I split these results further into two columns, like this:
campaign | skill
---------|------
1 | 9
5 | 4
4 | 8
6 | 9
3 | 9
2 | 5
7 | 8
34 | 8
24 | 6
You can use split_part() for that.
select split_part(t.campaign_skill, ':', 1) as campaign,
split_part(t.campaign_skill, ':', 2) as skill
from users u,
regexp_split_to_table(u.user_skill, E'[\\s,]+') as t(campaign_skill)
where u.user_token = 'ded8ab43-efe2-4aea-894d-511ed3505261';
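For checking the expected rows outside the database, the same two-stage split can be mirrored in plain Python. This is only an illustrative sketch and not part of the SQL answer; the function name `split_campaign_skills` is hypothetical. Like split_part(), it returns strings and yields an empty string when the second field is missing.

```python
import re


def split_campaign_skills(user_skill):
    """Split '1:9,5:4' into [('1', '9'), ('5', '4')], mirroring the SQL steps."""
    pairs = []
    # Stage 1: same separator pattern as regexp_split_to_table(..., '[\s,]+').
    for item in re.split(r"[\s,]+", user_skill.strip()):
        if not item:
            continue
        # Stage 2: field 1 and field 2 around ':', like split_part(item, ':', n).
        parts = item.split(":")
        campaign = parts[0]
        skill = parts[1] if len(parts) > 1 else ""
        pairs.append((campaign, skill))
    return pairs


pairs = split_campaign_skills("1:9,5:4,4:8,6:9,3:9,2:5,7:8,34:8,24:6")
for campaign, skill in pairs:
    print(campaign, skill)
```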

How would you create a group identifier based on one column, but sorted by another?

I am attempting to create column Group via T-SQL.
If a cluster of accounts appears in consecutive rows, consider that one group. If the account is seen again lower in the list (cluster or not), then consider it a new group. This seems straightforward, but I cannot seem to see the solution... Below there are three clusters of account 3456, each having a different group number (Group 1, 4, and 6)
+-------+---------+------+
| Group | Account | Sort |
+-------+---------+------+
| 1 | 3456 | 1 |
| 1 | 3456 | 2 |
| 2 | 9878 | 3 |
| 3 | 5679 | 4 |
| 4 | 3456 | 5 |
| 4 | 3456 | 6 |
| 4 | 3456 | 7 |
| 5 | 1295 | 8 |
| 6 | 3456 | 9 |
+-------+---------+------+
UPDATE: I left this out of the original requirements, but a cluster of accounts could have more than two accounts. I updated the example data to include this scenario.
Here's how I'd do it:
--Sample Data
DECLARE @table TABLE (Account INT, Sort INT);
INSERT @table
VALUES (3456,1),(3456,2),(9878,3),(5679,4),(3456,5),(3456,6),(1295,7),(3456,8);
--Solution
SELECT [Group] = DENSE_RANK() OVER (ORDER BY grouper.groupID), grouper.Account, grouper.Sort
FROM
(
  SELECT t.*, groupID = ROW_NUMBER() OVER (ORDER BY t.sort) +
    CASE t.Account WHEN LEAD(t.Account,1) OVER (ORDER BY t.sort) THEN 1 ELSE 0 END
  FROM @table AS t
) AS grouper;
Results:
Group Account Sort
------- ----------- -----------
1 3456 1
1 3456 2
2 9878 3
3 5679 4
4 3456 5
4 3456 6
5 1295 7
6 3456 8
Update based on OP's comment below (20190508)
I spent a couple days banging my head on how to handle groups of three or more; it was surprisingly difficult but what I came up with handles bigger clusters and is way better than my first answer. I updated the sample data to include bigger clusters.
Note that I include a UNIQUE constraint for the sort column; this creates a unique index. You don't need the constraint for this solution to work, but having an index on that column (clustered, nonclustered unique or just nonclustered) will improve the performance dramatically.
--Sample Data
DECLARE @table TABLE (Account INT, Sort INT UNIQUE);
INSERT @table
VALUES (3456,1),(3456,2),(9878,3),(5679,4),(3456,5),(3456,6),(1295,7),(1295,8),(1295,9),(1295,10),(3456,11);
-- Better solution
WITH Groups AS
(
  SELECT t.*, Grouper =
    CASE t.Account WHEN LAG(t.Account,1,t.Account) OVER (ORDER BY t.Sort) THEN 0 ELSE 1 END
  FROM @table AS t
)
SELECT [Group] = SUM(sg.Grouper) OVER (ORDER BY sg.Sort)+1, sg.Account, sg.Sort
FROM Groups AS sg;
Results:
Group Account Sort
----------- ----------- -----------
1 3456 1
1 3456 2
2 9878 3
3 5679 4
4 3456 5
4 3456 6
5 1295 7
5 1295 8
5 1295 9
5 1295 10
6 3456 11
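The idea behind the second solution (flag each row where the account differs from the previous row, then take a running sum of the flags) can be tried outside SQL Server as well. The following is a minimal sketch using Python's sqlite3 rather than T-SQL, so the syntax differs slightly; the table name `accounts` is hypothetical, and the rows are the nine from the question's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account INTEGER, sort INTEGER UNIQUE)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [(3456, 1), (3456, 2), (9878, 3), (5679, 4), (3456, 5),
     (3456, 6), (3456, 7), (1295, 8), (3456, 9)],
)

GROUP_SQL = """
    WITH flagged AS (
        -- 1 when the account changes from the previous row, else 0.
        -- The LAG default is the row's own account, so the first row gets 0.
        SELECT account, sort,
               CASE WHEN account = LAG(account, 1, account) OVER (ORDER BY sort)
                    THEN 0 ELSE 1 END AS grouper
        FROM accounts
    )
    -- The running total of change flags, plus 1, is the group number.
    SELECT SUM(grouper) OVER (ORDER BY sort) + 1 AS grp, account, sort
    FROM flagged
    ORDER BY sort
"""

grouped = conn.execute(GROUP_SQL).fetchall()
groups = [row[0] for row in grouped]
for row in grouped:
    print(row)
```

Because the flag only looks at the immediately preceding row, a cluster of any length stays in one group.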

Find the proportion of each X consisting of Y in PostgreSQL?

I have a big database of Magic: the Gathering cards and decklists. The table of cards contains the type and converted mana cost of each card (among other things). The decks are stored using two tables: a table of the decks themselves called "decks", and a table called "deckmembers", in which each row contains the ID of a deck, the ID of a card contained in that deck, and the number of copies of that card that appear in the deck.
What I want is a view of this data in which the rows are:
deck: the ID of a deck
cmc: a converted mana cost appearing on at least one card in that deck
proportion: the percentage of the nonland cards in that deck that have that cmc
Or am I better off deriving this data in Python or R or something?
This question is conceptually similar, but no one has answered it.
EDIT:
Since you asked, here's some example data:
cards:
id | name | fulltype | cmc
----+---------------------+-----------------+-----
1 | "Ach! Hans, Run!" | Enchantment | 6
2 | 1996 World Champion | Summon _ Legend | 5
4 | AWOL | Instant | 3
5 | Abandon Hope | Sorcery | 2
6 | Abandon Reason | Instant | 3
decks:
id | name
----+-----------------
1 | RDW
2 | Red Deck Recall
3 | RDW
4 | Red Deck Wins
5 | Red Deck Wins
deckmembers:
deck | card | count
------+-------+-------
1 | 14031 | 1
1 | 15011 | 1
1 | 14263 | 1
1 | 12966 | 1
1 | 12536 | 1
Any deck will have many cards. Any card may appear in many decks. Each card has an integer from 0-12 associated with it which is called its "converted mana cost" or CMC. That's all you need to know. Don't bother learning to play Magic on my account.
And what I want might look something like:
deck | cmc | perc
------+-------+-------
1 | 1 | 11
1 | 2 | 11
1 | 3 | 11
1 | 4 | 11
1 | 5 | 11
Where "perc" in the first row says that 11 percent of the cards in the deck with the id 1 have cmc 1.
Solved it!
SELECT
d.id,
c.cmc,
(CAST(SUM(m.count) AS FLOAT) /
(SELECT
CAST(SUM(m1.count) AS FLOAT)
FROM deckmembers AS m1
JOIN cards AS c1 ON c1.id=m1.card
WHERE NOT m1.sideboard
AND c1.fulltype NOT LIKE '%Land%'
AND m1.deck=d.id)
) * 100 AS perc
FROM deckmembers AS m
JOIN decks AS d ON d.id=m.deck
JOIN cards AS c ON c.id=m.card
WHERE NOT m.sideboard
AND c.fulltype NOT LIKE '%Land%'
GROUP BY d.id, c.cmc;
I also posted a solution to the simpler version of the problem here.
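As a possible alternative to the correlated subquery, the per-deck total can be computed with a window function over the already aggregated rows. This is a sketch of a different technique, not the query posted above. It runs on Python's sqlite3 as a stand-in for PostgreSQL, and the card and deck rows are invented toy data (not from the real database), with `sideboard` stored as 0/1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cards (id INTEGER PRIMARY KEY, fulltype TEXT, cmc INTEGER);
    CREATE TABLE deckmembers (deck INTEGER, card INTEGER, count INTEGER, sideboard INTEGER);
    -- Hypothetical toy data, chosen so the percentages are easy to verify by hand.
    INSERT INTO cards VALUES (1, 'Instant', 1), (2, 'Creature', 2),
                             (3, 'Basic Land', 0), (4, 'Sorcery', 2);
    INSERT INTO deckmembers VALUES (1, 1, 4, 0), (1, 2, 2, 0), (1, 3, 20, 0),
                                   (1, 4, 2, 0), (1, 2, 3, 1),
                                   (2, 1, 1, 0), (2, 2, 3, 0);
""")

PROPORTION_SQL = """
    WITH per_cmc AS (
        -- Copies of nonland, non-sideboard cards per deck and cmc.
        SELECT m.deck, c.cmc, SUM(m.count) AS n
        FROM deckmembers AS m
        JOIN cards AS c ON c.id = m.card
        WHERE NOT m.sideboard
          AND c.fulltype NOT LIKE '%Land%'
        GROUP BY m.deck, c.cmc
    )
    -- Divide each cmc bucket by its deck's total, computed by the window.
    SELECT deck, cmc, 100.0 * n / SUM(n) OVER (PARTITION BY deck) AS perc
    FROM per_cmc
    ORDER BY deck, cmc
"""

proportions = conn.execute(PROPORTION_SQL).fetchall()
for row in proportions:
    print(row)
```

With this shape the filter on lands and sideboard cards is written once instead of being repeated in the outer query and the subquery.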