SUM of two level group by in postgresql - postgresql

I have three tables, as given below:
student
id name stand_id sub_id gender
---------------------------------------
1 | Joe | 1 | 1 | M
2 | Saun | 2 | 1 | F
3 | Paul | 1 | 2 | F
4 | Sena | 2 | 2 | M
Subject
id name
1 Math
2 English
Standard
id name
1 First
2 Second
How can I group at multiple levels, first by standard, then by subject, and then get the total number of boys and girls for each?
Should I use WITH, UNION or UNION ALL?
First
Math
boys total
girls total
second
math
boys total
girls total

It's not completely clear what you are attempting. My interpretation is that you are looking for the total of students by standard, subject and gender.
If that is correct, you need to join together the tables and count the students at the appropriate grain, like so:
SELECT
sta.name AS standard_name,
sub.name AS subject_name,
CASE stu.gender WHEN 'M' THEN 'Boys' ELSE 'Girls' END AS student_gender,
COUNT(stu.id) AS total
FROM
student stu
JOIN
subject sub
ON (stu.sub_id = sub.id)
JOIN
standard sta
ON (stu.stand_id = sta.id)
GROUP BY
standard_name,
subject_name,
student_gender;
Based on your sample data, it would return this:
standard_name | subject_name | student_gender | total
-----------------------------------------------------
First | Math | Boys | 1
First | English | Girls | 1
Second | Math | Girls | 1
Second | English | Boys | 1
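To sanity-check the expected counts outside the database, the same grouping can be sketched in Python. This is only an illustration of the grouping logic under the sample data from the question; the in-memory structures and the `count_by_group` helper are hypothetical names, not part of the original schema.

```python
from collections import Counter

# Sample rows copied from the question: (id, name, stand_id, sub_id, gender)
students = [
    (1, "Joe", 1, 1, "M"),
    (2, "Saun", 2, 1, "F"),
    (3, "Paul", 1, 2, "F"),
    (4, "Sena", 2, 2, "M"),
]
subjects = {1: "Math", 2: "English"}
standards = {1: "First", 2: "Second"}


def count_by_group(rows):
    """Count students per (standard name, subject name, gender label)."""
    counts = Counter()
    for _id, _name, stand_id, sub_id, gender in rows:
        # Same labelling rule as the CASE expression in the query.
        label = "Boys" if gender == "M" else "Girls"
        counts[(standards[stand_id], subjects[sub_id], label)] += 1
    return counts


totals = count_by_group(students)
for (standard, subject, label), total in sorted(totals.items()):
    print(standard, subject, label, total)
```

Each key of `totals` corresponds to one row of the GROUP BY result.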

Is this what you are looking for?
SELECT sd.name,
sj.name,
count(st.gender) filter (
WHERE st.gender='M') AS MALE,
count(st.gender) filter (
WHERE st.gender='F') AS FEMALE
FROM Standard sd
INNER JOIN Student st ON (st.stand_id=sd.id)
INNER JOIN Subject sj ON (sj.id=st.sub_id)
GROUP BY sd.name,
sj.name;
name | name | male | female
--------+---------+------+--------
First | Math | 1 | 0
First | English | 0 | 1
Second | English | 2 | 1
Second | Math | 0 | 1
(4 rows)
Note that I added some more rows for Second/English to the sample data.

Related

Postgres join when only one row is equal

I have two tables and I am wanting to do an inner join between table_1 and table_2 but only when there is one row in table_2 that meets the join criteria.
For example:
table_1
id | name | age |
-----------------+------------------+--------------+
1 | john jones | 10 |
2 | pete smith | 15 |
3 | mary lewis | 12 |
4 | amy roberts | 13 |
table_2
id | name | age | hair | height |
-----------------+------------------+--------------+--------------+--------------+
1 | john jones | 10 | brown | 100 |
2 | john jones | 10 | blonde | 132 |
3 | mary lewis | 12 | brown | 146 |
4 | pete smith | 15 | black | 171 |
So I want to do a join when name is equal, but only when there is exactly one matching name in table_2.
So my results would look like this:
id | name | age | hair |
-----------------+------------------+--------------+--------------+
2 | pete smith | 15 | black |
3 | mary lewis | 12 | brown |
As you can see, John Jones isn't in the results as there are two corresponding rows in table_2.
My initial code looks like this:
select tb.id,tb.name,tb.age,sc.hair
from table_1 tb
inner join table_2 sc
on tb.name = sc.name and tb.age = sc.age
Can I apply a clause within the join so that it only joins on rows which are unique matches?
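Group by the table_1 columns and apply having count(*) = 1. Do not put sc.hair in the group by: the two John Jones rows have different hair values, so each would form its own group with a count of 1 and both would be returned. Because every surviving group has exactly one table_2 row, min(sc.hair) simply returns that row's value.
select tb.id, tb.name, tb.age, min(sc.hair) as hair
from table_1 tb
join table_2 sc
on tb.name = sc.name and tb.age = sc.age
group by tb.id, tb.name, tb.age
having count(*) = 1
The interesting thing to note is that you don't need the aggregate expression used in the having clause (in this case count(*)) in the select clause.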
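The rule "keep a table_1 row only when exactly one table_2 row shares its name and age" can also be checked with a small Python sketch. It assumes the sample rows from the question; `unique_matches` is a hypothetical helper written for illustration only.

```python
from collections import Counter

# Sample rows from the question.
table_1 = [  # (id, name, age)
    (1, "john jones", 10),
    (2, "pete smith", 15),
    (3, "mary lewis", 12),
    (4, "amy roberts", 13),
]
table_2 = [  # (id, name, age, hair, height)
    (1, "john jones", 10, "brown", 100),
    (2, "john jones", 10, "blonde", 132),
    (3, "mary lewis", 12, "brown", 146),
    (4, "pete smith", 15, "black", 171),
]


def unique_matches(left, right):
    """Join left to right on (name, age), keeping keys matched exactly once."""
    # How many right-hand rows exist per join key.
    key_counts = Counter((name, age) for _id, name, age, _hair, _height in right)
    # Only read for keys with a single match, so overwriting duplicates is harmless.
    hair_by_key = {(name, age): hair for _id, name, age, hair, _height in right}
    return [
        (id_, name, age, hair_by_key[(name, age)])
        for id_, name, age in left
        if key_counts[(name, age)] == 1
    ]


result = unique_matches(table_1, table_2)
print(result)
```

John Jones is dropped because his key occurs twice in table_2, and Amy Roberts is dropped because her key does not occur at all.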

PostgreSQL COUNT DISTINCT on one column while checking duplicates of another column

I have a query that results in such a table:
guardian_id | child_id | guardian_name | relation | child_name |
------------|----------|---------------|----------|------------|
1 | 1 | John Doe | father | Doe Son |
2 | 1 | Jane Doe | mother | Doe Son |
3 | 2 | Peter Pan | father | Pan Dghter |
4 | 2 | Pet Pan | mother | Pan Dghter |
1 | 3 | John Doe | father | Doe Dghter |
2 | 3 | Jane Doe | mother | Doe Dghter |
So from these results, I need to count the families, that is, distinct groups of children who share the same guardians. From the results above, there are 3 children but 2 families. How can I achieve this?
If I do:
SELECT COUNT(DISTINCT child_id) as families FROM (
-- larger query
)a
I'll get 3 which is not correct.
Alternatively, how can I incorporate a WHERE clause that checks DISTINCT guardian_id's? Any other approaches?
Also note that there are instances where a child may have one guardian only.
To get the distinct family you can try the following approach.
select distinct array_agg(distinct guardian_id)
from family
group by child_id;
The above query returns the list of unique families, e.g.:
{1,2}
{3,4}
Now you can apply the count on top of it.
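The idea behind this approach (collect each child's set of guardians, then count the distinct sets) can be sketched in Python as follows. The rows are the (guardian_id, child_id) pairs from the question; `count_families` is a hypothetical name used only for this illustration.

```python
from collections import defaultdict

# (guardian_id, child_id) pairs taken from the sample result in the question.
rows = [(1, 1), (2, 1), (3, 2), (4, 2), (1, 3), (2, 3)]


def count_families(pairs):
    """Count distinct guardian sets; children sharing a set form one family."""
    guardians_by_child = defaultdict(set)
    for guardian_id, child_id in pairs:
        guardians_by_child[child_id].add(guardian_id)
    # frozenset is hashable, so identical guardian sets collapse into one entry.
    families = {frozenset(guardians) for guardians in guardians_by_child.values()}
    return len(families)


family_count = count_families(rows)
print(family_count)
```

A child with a single guardian simply contributes a one-element set, so that case needs no special handling.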

What is the fastest way to extract all n-grams of lengths 1, 2, and 3 from a body of text in PostgreSQL?

I have many bodies of text, and for each of them, I want to extract all unigrams, bigrams, and trigrams (words, not characters) and insert the counts and ngram lengths into another table.
Right now I am thinking of unnesting a regexp-split body of text using WITH ORDINALITY, and then using multiple subqueries for the bigrams and trigrams, but that requires ordering. However, I think this might be an inefficient way of going about it, since this sort of positional data should normally be accessed by index.
I am currently implementing this in Python, and a huge bottleneck is the dictionary insertion and searching of dictionaries/sets for stopwords.
Here is a very basic example:
Input:
This is a small, small sentence.
Output
ngram | count | length
-------------------------------------
this | 1 | 1
is | 1 | 1
a | 1 | 1
small | 2 | 1
sentence | 1 | 1
this is | 1 | 2
is a | 1 | 2
a small | 1 | 2
small small | 1 | 2
small sentence | 1 | 2
this is a | 1 | 3
is a small | 1 | 3
a small small | 1 | 3
small small sentence | 1 | 3
Stripping the punctuation/handling lowercase is not an issue here, but getting the proper counts is important.
As a preliminary or intermediate step, I would also be removing stopwords, which in this case are this, a, and is.
ngram | count | length
--------------------------------------
small | 2 | 1
sentence | 1 | 1
small small | 1 | 2
small sentence | 1 | 2
small small sentence | 1 | 3
In the above example
Use the window function lead() to generate bigrams and trigrams, and unions to place all ngrams in a single list. In fact, the most difficult part was keeping the result set in the same order as the starting sentence.
with my_table(sentence) as (
values ('This is a small, small sentence.')
),
words as (
select id, word
from my_table,
regexp_split_to_table(lower(sentence), '[^a-zA-Z]+') with ordinality as t(word, id)
where word <> ''
)
select ngram, count(*), length
from (
select distinct on(id, ngram) id, ngram, length
from (
select id, word as ngram, 1 as length
from words
union all
select id, concat_ws(' ', word, lead(word, 1) over w), 2
from words
window w as (order by id)
union all
select id, concat_ws(' ', word, lead(word, 1) over w, lead(word, 2) over w), 3
from words
window w as (order by id)
) s
order by id, ngram, length
) s
group by ngram, length
order by length, min(id);
ngram | count | length
----------------------+-------+--------
this | 1 | 1
is | 1 | 1
a | 1 | 1
small | 2 | 1
sentence | 1 | 1
this is | 1 | 2
is a | 1 | 2
a small | 1 | 2
small small | 1 | 2
small sentence | 1 | 2
this is a | 1 | 3
is a small | 1 | 3
a small small | 1 | 3
small small sentence | 1 | 3
(14 rows)
You can do this with a recursive query:
with recursive words as (
select id, translate(word, '.,', '') as word
from my_table,
regexp_split_to_table(lower(sentence), '\s+') with ordinality as t(word, id)
where word <> ''
), ngrams (id, ngram) as (
select id, array[word]
from words
where word not in ('this', 'a', 'is') -- remove stop words
union all
select c.id, p.ngram||c.word
from words c
join ngrams p on p.id + 1 = c.id
and cardinality(p.ngram) <= 2 -- limit to 3 words
)
select array_to_string(ngram, ' '),
count(*) over (partition by ngram) as "count",
cardinality(ngram) as length
from ngrams
order by cardinality(ngram);
For the sample 'This is a small, small sentence.', and without the stop-word filter in the WHERE clause, this returns:
ngram | count | length
---------------------+-------+-------
a | 1 | 1
is | 1 | 1
sentence | 1 | 1
small | 2 | 1
small | 2 | 1
this | 1 | 1
this is | 1 | 2
small small | 1 | 2
is a | 1 | 2
small sentence | 1 | 2
a small | 1 | 2
is a small | 1 | 3
this is a | 1 | 3
a small small | 1 | 3
small small sentence | 1 | 3
And with stop words removed:
ngram | count | length
---------------------+-------+-------
sentence | 1 | 1
small | 2 | 1
small | 2 | 1
small sentence | 1 | 2
small small | 1 | 2
small small sentence | 1 | 3
Not sure how fast this is going to be though.
Online example: http://rextester.com/CPPU86582
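Since the question mentions an existing Python implementation, here is a minimal sliding-window sketch of the same counting for comparison. It assumes stop words are removed before the n-grams are built, which matches the expected output in the question; `ngram_counts` and its parameters are hypothetical names, and no claim is made about how its speed compares with the SQL versions.

```python
import re
from collections import Counter


def ngram_counts(text, max_n=3, stopwords=frozenset()):
    """Return a Counter keyed by (ngram, length) for word n-grams up to max_n."""
    # Lowercase, split on anything that is not a letter, drop empties and stop words.
    words = [w for w in re.split(r"[^a-z]+", text.lower())
             if w and w not in stopwords]
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n consecutive words over the list.
        for i in range(len(words) - n + 1):
            counts[(" ".join(words[i:i + n]), n)] += 1
    return counts


counts = ngram_counts("This is a small, small sentence.")
filtered = ngram_counts("This is a small, small sentence.",
                        stopwords=frozenset({"this", "a", "is"}))
for (ngram, length), count in filtered.items():
    print(ngram, count, length)
```

A single Counter keyed by (ngram, length) keeps the result in a shape that can be inserted into a table row by row.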

Postgresql : Filtering duplicate pair

I am asking this from mobile, so apologies for bad formatting. For the following table.
Table players
| ID | name |matches_won|
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
| 1 | bob | 3 |
| 2 | Paul | 2 |
| 3 | John | 4 |
| 4 | Jim | 1 |
| 5 | hal | 0 |
| 6 | fin | 0 |
I want to pair players in a query so that each pair has the same, or nearly the same, number of matches won. So the query should display the following result.
| ID | NAME | ID | NAME |
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
| 3 | John | 1 | bob |
| 2 | paul | 4 | Jim |
| 5 | hal | 6 | fin |
So far I have tried this query, but it returns repeated pairs.
Select player1.ID,player1.name,player2.ID,player2.name
From player as player1,
player as player2
Where
player1.matches_won >= player2.matches_won
And player1.ID != player2.ID;
The query pairs the player with the most won matches with every one of the other players, while I want each player to appear only once in the result, paired with the player whose number of wins is nearest to his own.
I have tried subqueries, but I don't know how to go about it, since a subquery only returns one result. Also, aggregates don't work in the WHERE clause, so I am not sure how to achieve this.
An easier way, IMHO, to achieve this would be to order the players by their number of wins, divide these ranks by two to create matches and self join. CTEs (with expressions) allow you to do this relatively elegantly:
WITH wins AS (
SELECT id, name, ROW_NUMBER() OVER (ORDER BY matches_won DESC) AS rn
FROM players
)
SELECT w1.id, w1.name, w2.id, w2.name
FROM (SELECT id, name, rn / 2 AS rn
FROM wins
WHERE rn % 2 = 1) w1
LEFT JOIN (SELECT id, name, (rn - 1) / 2 AS rn
FROM wins
WHERE rn % 2 = 0) w2 ON w1.rn = w2.rn
Add row numbers in descending order by won matches to the table and join odd row numbers with adjacent even row numbers:
with players as (
select *, row_number() over (order by matches_won desc) rn
from player)
select a.id, a.name, b.id, b.name
from players a
join players b
on a.rn = b.rn- 1
where a.rn % 2 = 1
id | name | id | name
----+------+----+------
3 | John | 1 | bob
2 | Paul | 4 | Jim
5 | hal | 6 | fin
(3 rows)
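The "rank by wins, then pair neighbouring ranks" idea used in both answers can be sketched in Python like this. The rows are the sample data from the question; `pair_by_rank` is a hypothetical helper for illustration. Like the inner-join version, it leaves the last player unpaired when the number of players is odd.

```python
# Sample rows from the question: (id, name, matches_won)
players = [
    (1, "bob", 3),
    (2, "Paul", 2),
    (3, "John", 4),
    (4, "Jim", 1),
    (5, "hal", 0),
    (6, "fin", 0),
]


def pair_by_rank(rows):
    """Sort by wins descending and pair ranks 1-2, 3-4, 5-6, and so on."""
    # sorted() is stable, so players with equal wins keep their input order.
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)
    pairs = []
    for i in range(0, len(ranked) - 1, 2):
        first, second = ranked[i], ranked[i + 1]
        pairs.append((first[0], first[1], second[0], second[1]))
    return pairs


pairs = pair_by_rank(players)
for pair in pairs:
    print(pair)
```

In SQL the order among tied players (hal and fin here) is not guaranteed unless a tie-breaker such as id is added to the ORDER BY of the window.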

I'm a bit new to PostgreSQL and need how to construct complex query

I need to list all the cities you can get to after stopping off at exactly one other city, starting off from any city of my choice. And list with it the distance to the final city and the intermediate city.
The tables in the database consist of cities, with the attributes:
| city_id | name       |
| 1       | Edinburgh  |
| 2       | Newcastle  |
| 3       | Manchester |
citypairs:
| citypair_id | city_id |
| 1           | 1       |
| 1           | 2       |
| 2           | 1       |
| 2           | 3       |
| 3           | 2       |
| 3           | 3       |
and distances:
| citypair_id | distance |
| 1           | 1234     |
| 2           | 1324     |
| 3           | 1324     |
and trains:
| train_id | departure_city_id | destination_city_id |
| 1        | 1                 | 2                   |
| 2        | 2                 | 3                   |
| 3        | 1                 | 3                   |
| 4        | 3                 | 2                   |
I haven't put any of the data in but basically if a city.name is chosen at random by me I need to find out which cities I can get to from this city if I go via another city (i.e. in two journeys) and then the distance to the final and intermediate city.
How would you, or how should I, go about forming a query to return the desired table?
Edited to include data and a missing table! As an example you can go from Edinburgh(1) to Manchester(3) via Newcastle(2) and you can go from Edinburgh to Newcastle via Manchester, however you can not go from Manchester to Edinburgh via Newcastle (since a train departs from 3, arrives at 2, but no train from 2 arrives in 1) and this route should not be returned from the query. Apologies for any confusion beforehand.
I've got a CTE that builds a tree of all the destinations.
WITH RECURSIVE trip AS (
SELECT c.city_id AS start_city,
ARRAY[c.city_id] AS route,
cast(c.name AS varchar(100)) AS route_text,
c.city_id AS leg_start_city,
c.city_id AS leg_end_city,
0 AS trip_count,
0 AS leg_length,
0 AS total_length
FROM cities c
UNION ALL
SELECT
trip.start_city,
trip.route || t.destination_city_id,
cast(trip.route_text || ',' || c.name AS varchar(100)),
t.departure_city_id,
t.destination_city_id,
trip.trip_count + 1,
d.distance,
trip.total_length + d.distance
FROM trains t
INNER JOIN trip
ON t.departure_city_id = trip.leg_end_city
INNER JOIN citypairs cps
ON t.departure_city_id = cps.city_id
INNER JOIN citypairs cpe
ON t.destination_city_id = cpe.city_id AND
cpe.citypair_id = cps.citypair_id
INNER JOIN distances d
ON cps.citypair_id = d.citypair_id
INNER JOIN cities c
ON t.destination_city_id = c.city_id
WHERE NOT (array[t.destination_city_id] <@ trip.route))
SELECT *
FROM trip
WHERE trip_count = 2
AND start_city = (SELECT city_id FROM cities WHERE name = 'Edinburgh');
The CTE starts from each city (in the non-recursive part at the start), then determines all the destination cities it can go to. It keeps track of all the cities it has been to in an array (the route column), so it won't loop back to itself again. As it progresses, it keeps track of the overall trip distance, and the number of trains taken (in trip_count).
As it goes through the tree, it keeps a running total of the distance.
This gives results of
| START_CITY | ROUTE | ROUTE_TEXT | LEG_START_CITY | LEG_END_CITY | TRIP_COUNT | LEG_LENGTH | TOTAL_LENGTH |
--------------------------------------------------------------------------------------------------------------------------------
| 1 | 1,2,3 | Edinburgh,Newcastle,Manchester | 2 | 3 | 2 | 1324 | 2558 |
| 1 | 1,3,2 | Edinburgh,Manchester,Newcastle | 3 | 2 | 2 | 1324 | 2648 |
If you remove the final WHERE clause, it will show all the possible trips in the data; likewise, you can change the trip_count to find all single-train destinations, etc.
| START_CITY | ROUTE | ROUTE_TEXT | LEG_START_CITY | LEG_END_CITY | TRIP_COUNT | LEG_LENGTH | TOTAL_LENGTH |
--------------------------------------------------------------------------------------------------------------------------------
| 1 | 1 | Edinburgh | 1 | 1 | 0 | 0 | 0 |
| 2 | 2 | Newcastle | 2 | 2 | 0 | 0 | 0 |
| 3 | 3 | Manchester | 3 | 3 | 0 | 0 | 0 |
| 1 | 1,2 | Edinburgh,Newcastle | 1 | 2 | 1 | 1234 | 1234 |
| 1 | 1,3 | Edinburgh,Manchester | 1 | 3 | 1 | 1324 | 1324 |
| 2 | 2,3 | Newcastle,Manchester | 2 | 3 | 1 | 1324 | 1324 |
| 3 | 3,2 | Manchester,Newcastle | 3 | 2 | 1 | 1324 | 1324 |
| 1 | 1,2,3 | Edinburgh,Newcastle,Manchester | 2 | 3 | 2 | 1324 | 2558 |
| 1 | 1,3,2 | Edinburgh,Manchester,Newcastle | 3 | 2 | 2 | 1324 | 2648 |
The cast( ... as varchar(100)) is a bit hacky, and I'm not sure why it was needed, but I haven't had a chance to get around that yet.
The SQL is here for testing: http://sqlfiddle.com/#!1/93964/24
The first part is easy:
SELECT c2.name
FROM cities AS c
JOIN trains t ON c.city_id=t.departure_city_id
JOIN trains t2 ON t.destination_city_id=t2.departure_city_id
JOIN cities AS c2 ON t2.destination_city_id=c2.city_id
WHERE c2.city_id!=c.city_id
AND c.name='Edinburgh';
http://sqlfiddle.com/#!12/a656f/14
In PG 9.1+ you could even do it with a recursive CTE for any number of cities in between. The distances are a little more complicated, and you would probably be better off transforming citypairs into actual pairs.
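To illustrate what "actual pairs" could look like, here is a small Python sketch that stores each distance under an unordered pair of city ids and then walks the directed trains table for exactly two legs. The data comes from the question; the `distances` layout and the `two_leg_routes` helper are assumptions made for this example, not part of the original schema.

```python
# Sample data from the question.
cities = {1: "Edinburgh", 2: "Newcastle", 3: "Manchester"}
trains = [(1, 2), (2, 3), (1, 3), (3, 2)]  # (departure_city_id, destination_city_id)

# Assumed layout: one entry per unordered city pair instead of citypairs + distances.
distances = {
    frozenset({1, 2}): 1234,
    frozenset({1, 3}): 1324,
    frozenset({2, 3}): 1324,
}


def two_leg_routes(start):
    """Return (via, destination, distance to via, total distance) for two-train trips."""
    routes = []
    for dep1, via in trains:
        if dep1 != start:
            continue
        for dep2, dest in trains:
            # The second train must leave from the stop-over and not return to the start.
            if dep2 != via or dest == start:
                continue
            first_leg = distances[frozenset({start, via})]
            second_leg = distances[frozenset({via, dest})]
            routes.append((cities[via], cities[dest], first_leg, first_leg + second_leg))
    return routes


routes = two_leg_routes(1)  # start from Edinburgh
for route in routes:
    print(route)
```

Because the trains are treated as directed, starting from Manchester yields no routes, which matches the example in the question.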