Query table with sum of ALL previous positions, excluding current position - postgresql

I have a database table with:
id | date | position | name
--------------------------------------
1 | 2016-06-29 | 9 | Ben Smith
2 | 2016-06-29 | 1 | Ben Smith
3 | 2016-06-29 | 5 | Ben Smith
4 | 2016-06-29 | 6 | Ben Smith
5 | 2016-06-30 | 2 | Ben Smith
6 | 2016-06-30 | 2 | Tom Brown
7 | 2016-06-29 | 4 | Tom Brown
8 | 2016-06-30 | 2 | Tom Brown
9 | 2016-06-30 | 1 | Tom Brown
How can I query the table efficiently so that I can get a new column using sum().
I expect the table output to look like this
id | date | position | name | races | wins | places
--------------------------------------------------------------
1 | 2016-06-29 | 9 | Ben Smith | 1 | 0 | 0
2 | 2016-06-29 | 1 | Ben Smith | 2 | 1 | 0
3 | 2016-06-29 | 5 | Ben Smith | 3 | 1 | 0
4 | 2016-06-29 | 6 | Ben Smith | 4 | 1 | 0
5 | 2016-06-30 | 2 | Ben Smith | 5 | 1 | 1
6 | 2016-06-30 | 2 | Tom Brown | 1 | 0 | 2
7 | 2016-06-29 | 4 | Tom Brown | 1 | 0 | 2
8 | 2016-06-30 | 2 | Tom Brown | 2 | 0 | 3
9 | 2016-06-30 | 1 | Tom Brown | 4 | 1 | 3

Looks like this can easily be done using window functions:
select id, date, position, name,
row_number(*) over (partition by name, date order by id) as races,
count(*) filter (where position = 1) over (partition by name, date) as wins
from the_table;
I don't understand the logic to calculate the places column though.

#FatFreddy #a_horse_with_no_name
Thanks for getting me started, this is what I came up with. Do you think it can be improved on?
WITH runners AS (
SELECT
r.*,
CASE
WHEN position = 1 THEN 1
ELSE 0
END AS win,
CASE
WHEN position = 2 THEN 1
WHEN position = 3 THEN 1
ELSE 0
END AS place
FROM
runners r
ORDER BY id
)
SELECT
date,
r.id,
r.position,
name,
row_number(*) OVER foo AS races,
sum(win) OVER foo AS win,
sum(place) OVER foo AS place
FROM
runners r
LEFT JOIN markets m ON m.id = r.market_id
WINDOW foo AS (PARTITION BY name) ORDER BY r.id)

Related

postgresql: query two tables with same column names and show the result side by side ordered their column names, which occur in both tables

Having two tables (table1, table2) with the same column names (generation, parent), the desired output would be the combination of all columns of both tables. Thereby the rows of table2 should join table1 so that the rows of table2 are matching those of table1 on generation column. The parent number should be ordered ascending for the entries in table1 as well as in table2. The number of rows of the query results should be equal of those of table1.
Given the following tables
table1:
| generation | parent |
|:----------:|:------:|
| 0 | 1 |
| 0 | 2 |
| 0 | 3 |
| 1 | 3 |
| 1 | 2 |
| 1 | 1 |
| 2 | 2 |
| 2 | 1 |
| 2 | 3 |
table2:
| generation | parent |
|:----------:|:------:|
| 1 | 3 |
| 1 | 1 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
The following queries are thought for creating and populating two sample tables as shown above:
create table table1(generation integer, parent integer);
insert into table1 (generation, parent) values(0,1),(0,2),(0,3),(1,3),(1,2),(1,1),(2,2),(2,1),(2,3);
create table table2(generation integer, parent integer);
insert into table2 (generation, parent) values(1,3),(1,1),(1,3),(2,1),(2,2),(2,3);
the imagined query should lead to the following desired result:
| table1_generation | table1_parent | table2_generation | table2_parent |
|:-----------------:|:-------------:|:-----------------:|:-------------:|
| 0 | 1 | | |
| 0 | 2 | | |
| 0 | 3 | | |
| 1 | 1 | 1 | 1 |
| 1 | 2 | 1 | 3 |
| 1 | 3 | 1 | 3 |
| 2 | 1 | 2 | 1 |
| 2 | 2 | 2 | 2 |
| 2 | 3 | 2 | 3 |
Current query looks as follows:
with
p as (
select
generation,
parent
from
table1
order by
generation,
parent
), o as(
select
generation,
parent
from
table2
order by
generation,
parent
)
select
p.generation as table1_generation,
p.parent as table1_parent,
o.generation as table2_generation,
o.parent as table2_parent
from
p
left join o on
o.generation=p.generation;
Which leads to the following result:
| table1_generation | table1_parent | table2_generation | table2_parent |
|:-----------------:|:-------------:|:-----------------:|:-------------:|
| 0 | 1 | | |
| 0 | 2 | | |
| 0 | 3 | | |
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 3 |
| 1 | 1 | 1 | 3 |
| 1 | 2 | 1 | 1 |
| 1 | 2 | 1 | 3 |
| 1 | 2 | 1 | 3 |
| 1 | 3 | 1 | 1 |
| 1 | 3 | 1 | 3 |
| 1 | 3 | 1 | 3 |
| 2 | 1 | 2 | 1 |
| 2 | 1 | 2 | 2 |
| 2 | 1 | 2 | 3 |
| 2 | 2 | 2 | 1 |
| 2 | 2 | 2 | 2 |
| 2 | 2 | 2 | 3 |
| 2 | 3 | 2 | 1 |
| 2 | 3 | 2 | 2 |
| 2 | 3 | 2 | 3 |
This link led to the conclusion, that any join command might not what is necessary here ... But union does only append rows... so for me it is absolutely unclear, how the desired result can be achieved o.O
Any help is highly appreciated. Thanks in advance!
The main misunderstanding on this question arose from the fact that you mentioned join, which is a very precisely mathematically defined concept based on the Cartesian product and can be applied to any two sets. So the current output is clear.
But as you wrote in the title, you want to put two tables side by side. You take advantage of the fact that they have the same number of rows (triples).
This select returns the output you want.
I made artificial join columns, row_number() OVER (order by generation, parent) as rnum, and moved the second table using the addition of three. I hope this helps you:
with
p as (
select
row_number() OVER (order by generation, parent) as rnum,
generation,
parent
from
table1
order by
generation,
parent
), o as(
select
row_number() OVER (order by generation, parent) as rnum,
generation,
parent
from
table2
order by
generation,
parent
)
select
p.generation as table1_generation,
p.parent as table1_parent,
o.generation as table2_generation,
o.parent as table2_parent
from
p
left join o on
o.rnum+3=p.rnum
order by 1,2,3,4;
Output:
table1_generation
table1_parent
table2_generation
table2_parent
0
1
(null)
(null)
0
2
(null)
(null)
0
3
(null)
(null)
1
1
1
1
1
2
1
3
1
3
1
3
2
1
2
1
2
2
2
2
2
3
2
3

Query table with array_agg of ALL previous positions, excluding current position

I have a database table with:
id | date | position | name
--------------------------------------
1 | 2016-06-29 | 9 | Ben Smith
2 | 2016-06-29 | 1 | Ben Smith
3 | 2016-06-29 | 5 | Ben Smith
4 | 2016-06-29 | 6 | Ben Smith
5 | 2016-06-30 | 2 | Ben Smith
6 | 2016-06-30 | 2 | Tom Brown
7 | 2016-06-29 | 4 | Tom Brown
8 | 2016-06-30 | 2 | Tom Brown
9 | 2016-06-30 | 1 | Tom Brown
How can I query the table efficiently so that I can get a new column using array_agg().
I have already tried the following query however its incredibly slow and also wrong as it doesn't group the previous_positions by the name column:
SELECT
j.*,
(SELECT array_agg(id) FROM jockeys j2 WHERE j2.id < j.id)
FROM jockeys j
I expect the table output to look like this
id | date | position | name | previous_positions
----------------------------------------------------------
1 | 2016-06-29 | 9 | Ben Smith | {}
2 | 2016-06-29 | 1 | Ben Smith | {9}
3 | 2016-06-29 | 5 | Ben Smith | {9,1}
4 | 2016-06-29 | 6 | Ben Smith | {9,1,5}
5 | 2016-06-30 | 2 | Ben Smith | {9,1,5,6}
6 | 2016-06-30 | 2 | Tom Brown | {}
7 | 2016-06-29 | 4 | Tom Brown | {2}
8 | 2016-06-30 | 2 | Tom Brown | {2,4}
9 | 2016-06-30 | 1 | Tom Brown | {2,4,2}
You may use the WINDOW clause for array_agg
SELECT
j.* , array_agg(position) over w as previous_positions
FROM jockeys j
WINDOW w as
( partition by name ORDER BY id rows between
unbounded preceding and 1 preceding
)
DEMO

SQL Select with root parent

I have a table Members(id, name, parent_id), where parent_id is the parent of the member(it is also a member which can have its parent). For example
id | name | parent_id
----------------------
1 | John | NULL
2 | Smith| 1
3 | Andy | 1
4 | Joe | 2
5 | Rick | 2
6 | Craig| 5
7 | Greg | NULL
8 | Bob | 5
9 | Mike | 8
And I'd like to run statement select from members, and I want to have
id | name | parent_id | root_parent_id
--------------------------------------
1 | John | NULL | NULL
2 | Smith| 1 | 1
3 | Andy | 1 | 1
4 | Joe | 2 | 1
5 | Rick | 2 | 1
6 | Craig| 5 | 1
7 | Greg | NULL | NULL
8 | Bob | 7 | 7
9 | Mike | 8 | 7
I want to find the root_parent_id for all members as deeply as possible. Help me please
with recursive recursive_members as (
select *, id root_id, 1 depth
from members
union all
select r.id, r.name, r.parent_id, m.parent_id, r.depth+ 1
from recursive_members r
join members m on r.root_id = m.id
where m.parent_id notnull
)
select distinct on (id) *
from recursive_members
order by id, depth desc;
id | name | parent_id | root_id | depth
----+-------+-----------+---------+-------
1 | John | | 1 | 1
2 | Smith | 1 | 1 | 2
3 | Andy | 1 | 1 | 2
4 | Joe | 2 | 1 | 3
5 | Rick | 2 | 1 | 3
6 | Craig | 5 | 1 | 4
7 | Greg | | 7 | 1
8 | Bob | 5 | 1 | 4
9 | Mike | 8 | 1 | 5
(9 rows)
Read about recursive WITH queries.

Postgresql: Select sum with different conditions

I have two table table:
I. Table 1 like this:
------------------------------------------
codeid | pos | neg | category
-----------------------------------------
1 | 10 | 3 | begin2016
1 | 3 | 5 | justhere
3 | 7 | 7 | justthere
4 | 1 | 1 | else
4 | 12 | 0 | begin2015
4 | 5 | 12 | begin2013
1 | 2 | 50 | now
2 | 5 | 33 | now
5 | 33 | 0 | Begin2011
5 | 11 | 7 | begin2000
II. Table 2 like this:
------------------------------------------
codeid | codedesc | codegroupid
-----------------------------------------
1 | road runner | 1
2 | bike warrior | 2
3 | lazy driver | 4
4 | clever runner | 1
5 | worker | 3
6 | smarty | 1
7 | sweety | 3
8 | sweeper | 1
I want to have one result like this having two (or more) conditions:
sum pos and neg where codegroupid IN('1', '2', '3')
BUt do not sum pos and neg if category like 'begin%'
So the result will like this:
------------------------------------------
codeid | codedesc | sumpos | sumneg
-----------------------------------------
1 | roadrunner | 5 | 55 => (sumpos = 3+2, because 10 have category like 'begin%' so doesn't sum)
2 | bike warrior | 5 | 33
4 | clever runner | 1 | 1
5 | worker | 0 | 0 => (sumpos=sumneg=0) becase codeid 5 category ilike 'begin%'
Group by codeid, codedesc;
Sumpos is sum(pos) where category NOT ILIKE 'begin%', BUT IF category ILKIE 'begin%' make all pos values become zero (0);
Sumpos is sum(neg) where category NOT ILIKE 'begin%', BUT IF category ILKIE 'begin%' make all neg values become zero;
Any ideas how to do it?
Try:
SELECT
b.codeid,
b.codedesc,
sum(CASE WHEN category LIKE 'begin%' THEN 0 ELSE a.pos END) AS sumpos,
sum(CASE WHEN category LIKE 'begin%' THEN 0 ELSE a.neg END) AS sumneg
FROM
table1 AS a
JOIN
table2 AS b ON a.codeid = b.codeid
WHERE b.codegroupid IN (1, 2, 3)
GROUP BY
b.codeid,
b.codedesc;

I'm a bit new to PostgreSQL and need how to construct complex query

I need to list all the cities you can get to after stopping off at exactly one other city, starting off from any city of my choice. And list with it the distance to the final city and the intermediate city.
The tables in the database consist of cities, with the attributes:
| city_id | name |
1 Edinburgh
2 Newcastle
3 Manchester
citypairs:
| citypair_id | city_id |
1 1
1 2
2 1
2 3
3 2
3 3
and distances:
| citypair_id | distance |
1 1234
2 1324
3 1324
and trains:
| train_id | departure_city_id | destination_city_id |
1 1 2
2 2 3
3 1 3
4 3 2
I haven't put any of the data in but basically if a city.name is chosen at random by me I need to find out which cities I can get to from this city if I go via another city (i.e. in two journeys) and then the distance to the final and intermediate city.
How would you, or how should I, go about forming a query to return the desired table?
Edited to include data and a missing table! As an example you can go from Edinburgh(1) to Manchester(3) via Newcastle(2) and you can go from Edinburgh to Newcastle via Manchester, however you can not go from Manchester to Edinburgh via Newcastle (since a train departs from 3, arrives at 2, but no train from 2 arrives in 1) and this route should not be returned from the query. Apologies for any confusion beforehand.
I've got a CTE that builds a tree of all the destinations.
WITH RECURSIVE trip AS (
SELECT c.city_id AS start_city,
ARRAY[c.city_id] AS route,
cast(c.name AS varchar(100)) AS route_text,
c.city_id AS leg_start_city,
c.city_id AS leg_end_city,
0 AS trip_count,
0 AS leg_length,
0 AS total_length
FROM cities c
UNION ALL
SELECT
trip.start_city,
trip.route || t.destination_city_id,
cast(trip.route_text || ',' || c.name AS varchar(100)),
t.departure_city_id,
t.destination_city_id,
trip.trip_count + 1,
d.distance,
trip.total_length + d.distance
FROM trains t
INNER JOIN trip
ON t.departure_city_id = trip.leg_end_city
INNER JOIN citypairs cps
ON t.departure_city_id = cps.city_id
INNER JOIN citypairs cpe
ON t.destination_city_id = cpe.city_id AND
cpe.citypair_id = cps.citypair_id
INNER JOIN distances d
ON cps.citypair_id = d.citypair_id
INNER JOIN cities c
ON t.destination_city_id = c.city_id
WHERE NOT (array[t.destination_city_id] <# trip.route))
SELECT *
FROM trip
WHERE trip_count = 2
AND start_city = (SELECT city_id FROM cities WHERE name = 'Edinburgh');
The CTE starts from each city (in the non-recursive part at the start), then determines all the destination cities it can go to. It keeps a track of all the cities its been to in an array (the route column), so it won't loop back to itself again. As it progresses, it keeps track of the overall trip distance, and the number of trains taken (in trip_count).
As it goes through the tree, it keeps a running total of the distance.
This gives results of
| START_CITY | ROUTE | ROUTE_TEXT | LEG_START_CITY | LEG_END_CITY | TRIP_COUNT | LEG_LENGTH | TOTAL_LENGTH |
--------------------------------------------------------------------------------------------------------------------------------
| 1 | 1,2,3 | Edinburgh,Newcastle,Manchester | 2 | 3 | 2 | 1324 | 2558 |
| 1 | 1,3,2 | Edinburgh,Manchester,Newcastle | 3 | 2 | 2 | 1324 | 2648 |
If you change remove the final WHERE clause it'll show all the possible trips in the data, likewise you can change the trip_count to find all single train destinations etc.
| START_CITY | ROUTE | ROUTE_TEXT | LEG_START_CITY | LEG_END_CITY | TRIP_COUNT | LEG_LENGTH | TOTAL_LENGTH |
--------------------------------------------------------------------------------------------------------------------------------
| 1 | 1 | Edinburgh | 1 | 1 | 0 | 0 | 0 |
| 2 | 2 | Newcastle | 2 | 2 | 0 | 0 | 0 |
| 3 | 3 | Manchester | 3 | 3 | 0 | 0 | 0 |
| 1 | 1,2 | Edinburgh,Newcastle | 1 | 2 | 1 | 1234 | 1234 |
| 1 | 1,3 | Edinburgh,Manchester | 1 | 3 | 1 | 1324 | 1324 |
| 2 | 2,3 | Newcastle,Manchester | 2 | 3 | 1 | 1324 | 1324 |
| 3 | 3,2 | Manchester,Newcastle | 3 | 2 | 1 | 1324 | 1324 |
| 1 | 1,2,3 | Edinburgh,Newcastle,Manchester | 2 | 3 | 2 | 1324 | 2558 |
| 1 | 1,3,2 | Edinburgh,Manchester,Newcastle | 3 | 2 | 2 | 1324 | 2648 |
The cast( ... as varchar(100)) is a bit hacky, and I'm not sure why it was needed, but I haven't had a chance to get around that yet.
The SQL is here for testing: http://sqlfiddle.com/#!1/93964/24
The first part is easy:
SELECT c2.name
FROM cities AS c
JOIN trains t ON c.city_id=t.departure_city_id
JOIN trains t2 ON t.destination_city_id=t2.departure_city_id
JOIN cities AS c2 ON t2.destination_city_id=c2.city_id
WHERE c2.city_id!=c.city_id
AND c.name='Edinburgh';
http://sqlfiddle.com/#!12/a656f/14
In PG 9.1+ you could even do it with a recursive CTE for any number of cities in between. The distances are a little more complicated and you probably would be better off transforming city_pairs into actual pairs.