I'm a bit new to PostgreSQL and need how to construct complex query - postgresql

I need to list all the cities you can get to after stopping off at exactly one other city, starting off from any city of my choice. And list with it the distance to the final city and the intermediate city.
The tables in the database consist of cities, with the attributes:
| city_id | name |
1 Edinburgh
2 Newcastle
3 Manchester
citypairs:
| citypair_id | city_id |
1 1
1 2
2 1
2 3
3 2
3 3
and distances:
| citypair_id | distance |
1 1234
2 1324
3 1324
and trains:
| train_id | departure_city_id | destination_city_id |
1 1 2
2 2 3
3 1 3
4 3 2
I haven't put any of the data in but basically if a city.name is chosen at random by me I need to find out which cities I can get to from this city if I go via another city (i.e. in two journeys) and then the distance to the final and intermediate city.
How would you, or how should I, go about forming a query to return the desired table?
Edited to include data and a missing table! As an example you can go from Edinburgh(1) to Manchester(3) via Newcastle(2) and you can go from Edinburgh to Newcastle via Manchester, however you can not go from Manchester to Edinburgh via Newcastle (since a train departs from 3, arrives at 2, but no train from 2 arrives in 1) and this route should not be returned from the query. Apologies for any confusion beforehand.

I've got a CTE that builds a tree of all the destinations.
WITH RECURSIVE trip AS (
SELECT c.city_id AS start_city,
ARRAY[c.city_id] AS route,
cast(c.name AS varchar(100)) AS route_text,
c.city_id AS leg_start_city,
c.city_id AS leg_end_city,
0 AS trip_count,
0 AS leg_length,
0 AS total_length
FROM cities c
UNION ALL
SELECT
trip.start_city,
trip.route || t.destination_city_id,
cast(trip.route_text || ',' || c.name AS varchar(100)),
t.departure_city_id,
t.destination_city_id,
trip.trip_count + 1,
d.distance,
trip.total_length + d.distance
FROM trains t
INNER JOIN trip
ON t.departure_city_id = trip.leg_end_city
INNER JOIN citypairs cps
ON t.departure_city_id = cps.city_id
INNER JOIN citypairs cpe
ON t.destination_city_id = cpe.city_id AND
cpe.citypair_id = cps.citypair_id
INNER JOIN distances d
ON cps.citypair_id = d.citypair_id
INNER JOIN cities c
ON t.destination_city_id = c.city_id
WHERE NOT (array[t.destination_city_id] <# trip.route))
SELECT *
FROM trip
WHERE trip_count = 2
AND start_city = (SELECT city_id FROM cities WHERE name = 'Edinburgh');
The CTE starts from each city (in the non-recursive part at the start), then determines all the destination cities it can go to. It keeps a track of all the cities its been to in an array (the route column), so it won't loop back to itself again. As it progresses, it keeps track of the overall trip distance, and the number of trains taken (in trip_count).
As it goes through the tree, it keeps a running total of the distance.
This gives results of
| START_CITY | ROUTE | ROUTE_TEXT | LEG_START_CITY | LEG_END_CITY | TRIP_COUNT | LEG_LENGTH | TOTAL_LENGTH |
--------------------------------------------------------------------------------------------------------------------------------
| 1 | 1,2,3 | Edinburgh,Newcastle,Manchester | 2 | 3 | 2 | 1324 | 2558 |
| 1 | 1,3,2 | Edinburgh,Manchester,Newcastle | 3 | 2 | 2 | 1324 | 2648 |
If you change remove the final WHERE clause it'll show all the possible trips in the data, likewise you can change the trip_count to find all single train destinations etc.
| START_CITY | ROUTE | ROUTE_TEXT | LEG_START_CITY | LEG_END_CITY | TRIP_COUNT | LEG_LENGTH | TOTAL_LENGTH |
--------------------------------------------------------------------------------------------------------------------------------
| 1 | 1 | Edinburgh | 1 | 1 | 0 | 0 | 0 |
| 2 | 2 | Newcastle | 2 | 2 | 0 | 0 | 0 |
| 3 | 3 | Manchester | 3 | 3 | 0 | 0 | 0 |
| 1 | 1,2 | Edinburgh,Newcastle | 1 | 2 | 1 | 1234 | 1234 |
| 1 | 1,3 | Edinburgh,Manchester | 1 | 3 | 1 | 1324 | 1324 |
| 2 | 2,3 | Newcastle,Manchester | 2 | 3 | 1 | 1324 | 1324 |
| 3 | 3,2 | Manchester,Newcastle | 3 | 2 | 1 | 1324 | 1324 |
| 1 | 1,2,3 | Edinburgh,Newcastle,Manchester | 2 | 3 | 2 | 1324 | 2558 |
| 1 | 1,3,2 | Edinburgh,Manchester,Newcastle | 3 | 2 | 2 | 1324 | 2648 |
The cast( ... as varchar(100)) is a bit hacky, and I'm not sure why it was needed, but I haven't had a chance to get around that yet.
The SQL is here for testing: http://sqlfiddle.com/#!1/93964/24

The first part is easy:
SELECT c2.name
FROM cities AS c
JOIN trains t ON c.city_id=t.departure_city_id
JOIN trains t2 ON t.destination_city_id=t2.departure_city_id
JOIN cities AS c2 ON t2.destination_city_id=c2.city_id
WHERE c2.city_id!=c.city_id
AND c.name='Edinburgh';
http://sqlfiddle.com/#!12/a656f/14
In PG 9.1+ you could even do it with a recursive CTE for any number of cities in between. The distances are a little more complicated and you probably would be better off transforming city_pairs into actual pairs.

Related

postgresql: query two tables with same column names and show the result side by side ordered their column names, which occur in both tables

Having two tables (table1, table2) with the same column names (generation, parent), the desired output would be the combination of all columns of both tables. Thereby the rows of table2 should join table1 so that the rows of table2 are matching those of table1 on generation column. The parent number should be ordered ascending for the entries in table1 as well as in table2. The number of rows of the query results should be equal of those of table1.
Given the following tables
table1:
| generation | parent |
|:----------:|:------:|
| 0 | 1 |
| 0 | 2 |
| 0 | 3 |
| 1 | 3 |
| 1 | 2 |
| 1 | 1 |
| 2 | 2 |
| 2 | 1 |
| 2 | 3 |
table2:
| generation | parent |
|:----------:|:------:|
| 1 | 3 |
| 1 | 1 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
The following queries are thought for creating and populating two sample tables as shown above:
create table table1(generation integer, parent integer);
insert into table1 (generation, parent) values(0,1),(0,2),(0,3),(1,3),(1,2),(1,1),(2,2),(2,1),(2,3);
create table table2(generation integer, parent integer);
insert into table2 (generation, parent) values(1,3),(1,1),(1,3),(2,1),(2,2),(2,3);
the imagined query should lead to the following desired result:
| table1_generation | table1_parent | table2_generation | table2_parent |
|:-----------------:|:-------------:|:-----------------:|:-------------:|
| 0 | 1 | | |
| 0 | 2 | | |
| 0 | 3 | | |
| 1 | 1 | 1 | 1 |
| 1 | 2 | 1 | 3 |
| 1 | 3 | 1 | 3 |
| 2 | 1 | 2 | 1 |
| 2 | 2 | 2 | 2 |
| 2 | 3 | 2 | 3 |
Current query looks as follows:
with
p as (
select
generation,
parent
from
table1
order by
generation,
parent
), o as(
select
generation,
parent
from
table2
order by
generation,
parent
)
select
p.generation as table1_generation,
p.parent as table1_parent,
o.generation as table2_generation,
o.parent as table2_parent
from
p
left join o on
o.generation=p.generation;
Which leads to the following result:
| table1_generation | table1_parent | table2_generation | table2_parent |
|:-----------------:|:-------------:|:-----------------:|:-------------:|
| 0 | 1 | | |
| 0 | 2 | | |
| 0 | 3 | | |
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 3 |
| 1 | 1 | 1 | 3 |
| 1 | 2 | 1 | 1 |
| 1 | 2 | 1 | 3 |
| 1 | 2 | 1 | 3 |
| 1 | 3 | 1 | 1 |
| 1 | 3 | 1 | 3 |
| 1 | 3 | 1 | 3 |
| 2 | 1 | 2 | 1 |
| 2 | 1 | 2 | 2 |
| 2 | 1 | 2 | 3 |
| 2 | 2 | 2 | 1 |
| 2 | 2 | 2 | 2 |
| 2 | 2 | 2 | 3 |
| 2 | 3 | 2 | 1 |
| 2 | 3 | 2 | 2 |
| 2 | 3 | 2 | 3 |
This link led to the conclusion, that any join command might not what is necessary here ... But union does only append rows... so for me it is absolutely unclear, how the desired result can be achieved o.O
Any help is highly appreciated. Thanks in advance!
The main misunderstanding on this question arose from the fact that you mentioned join, which is a very precisely mathematically defined concept based on the Cartesian product and can be applied to any two sets. So the current output is clear.
But as you wrote in the title, you want to put two tables side by side. You take advantage of the fact that they have the same number of rows (triples).
This select returns the output you want.
I made artificial join columns, row_number() OVER (order by generation, parent) as rnum, and moved the second table using the addition of three. I hope this helps you:
with
p as (
select
row_number() OVER (order by generation, parent) as rnum,
generation,
parent
from
table1
order by
generation,
parent
), o as(
select
row_number() OVER (order by generation, parent) as rnum,
generation,
parent
from
table2
order by
generation,
parent
)
select
p.generation as table1_generation,
p.parent as table1_parent,
o.generation as table2_generation,
o.parent as table2_parent
from
p
left join o on
o.rnum+3=p.rnum
order by 1,2,3,4;
Output:
table1_generation
table1_parent
table2_generation
table2_parent
0
1
(null)
(null)
0
2
(null)
(null)
0
3
(null)
(null)
1
1
1
1
1
2
1
3
1
3
1
3
2
1
2
1
2
2
2
2
2
3
2
3

How do I calculate shortest path between Origin - Destination pairs using pgRouting?

First of all I just want to state that I'm very new to GIS and that I'm probably not that great at the terminology yet, so bear with me.
I'm having my internship right now and have been tasked with making a bike commuting potential analysis. The data I'm using is road layer (which I have already created a topology for using pgr_createTopology) and two point layers for where individuals live and work created from the centroids of 500x500m squares.
I have managed to do some sort of calculation between my two point layers using pgr_dijkstraCost that looks like this:
SELECT *
FROM pgr_dijkstraCost(
'SELECT gid AS id,
source,
target,
extlen / 1.3 / 60 AS cost
FROM roads',
array(select source FROM living),
array(select target FROM work),
directed := false);
The source and target value in the living and work test tables has a value from 1 to 50 since I initially though that I could make the calculation by calculating when source and target has the same value. I now know that's not possible since pgr_dijkstra wont allow calculations when they are the same. The result I'm getting right now is for every combination I don't want. The final calculation will be for around 300 000 pairs.
So is there a way for me to only do the calculation on specified pairs and not for every possible combination?
Starting from Version 3.1 there is this signature
pgr_dijkstra(Edges SQL, Combinations SQL, end_vids, [, directed])
RETURNS SET OF (seq, path_seq, start_vid, end_vid, node, edge, cost, agg_cost)
OR EMPTY SET
example usage (taken from the pgRouting documentation)
CREATE TABLE combinations_table (
source BIGINT,
target BIGINT
);
INSERT INTO combinations_table (source, target)
VALUES (1, 2), (1, 4), (2, 1), (2, 4), (2, 17);
SELECT * FROM pgr_dijkstra(
'SELECT id, source, target, cost, reverse_cost FROM edge_table',
'SELECT * FROM combinations_table',
FALSE
);
seq | path_seq | start_vid | end_vid | node | edge | cost | agg_cost
----+----------+-----------+---------+------+------+------+----------
1 | 1 | 1 | 2 | 1 | 1 | 1 | 0
2 | 2 | 1 | 2 | 2 | -1 | 0 | 1
3 | 1 | 1 | 4 | 1 | 1 | 1 | 0
4 | 2 | 1 | 4 | 2 | 2 | 1 | 1
5 | 3 | 1 | 4 | 3 | 3 | 1 | 2
6 | 4 | 1 | 4 | 4 | -1 | 0 | 3
7 | 1 | 2 | 1 | 2 | 1 | 1 | 0
8 | 2 | 2 | 1 | 1 | -1 | 0 | 1
9 | 1 | 2 | 4 | 2 | 2 | 1 | 0
10 | 2 | 2 | 4 | 3 | 3 | 1 | 1
11 | 3 | 2 | 4 | 4 | -1 | 0 | 2
(11 rows)

T-SQL Update Query to enter recurring data in groups of three

I have a table of data, it is duplicated twice in the same table to make three sets.
Its "ReferenceID" is the primary key, i want to in a way group the 3 same ReferenceID's and inject these three values "f2f" "NF2F" "Travel" into the row called "Type" in any order but ensure that each ReferenceID only has one of those values.
For Example:
ReferenceID | Type
------------|-------
1 f2f
1 nf2f
1 Travel
2 f2f
2 nf2f
2 Travel
3 f2f
3 nf2f
3 Travel
etc etc...
Is it possible ?
You can do this with a row_number that you mod by the number of groups you have (in your case 3):
declare #t table(RefType varchar(10));
insert into #t values ('f2f'),('nf2f'),('Travel'),('f2f'),('nf2f'),('Travel'),('f2f'),('nf2f'),('Travel');
select (row_number() over (order by RefType) % 3) + 1 as ReferenceID
,RefType
from #t
order by ReferenceID
,RefType;
Output
+-------------+---------+
| ReferenceID | RefType |
+-------------+---------+
| 1 | f2f |
| 1 | nf2f |
| 1 | Travel |
| 2 | f2f |
| 2 | nf2f |
| 2 | Travel |
| 3 | f2f |
| 3 | nf2f |
| 3 | Travel |
+-------------+---------+

PostgreSQL 9.3:Updating table(order column) from another table, getting same values in rows

I need help with updating table from another table in Postgres Db.
Long story short we ended up with corrupted data in db, and now I need to update one table with values from another.
I have table with this data table wfc:
| step_id | command_id | commands_order |
|---------|------------|----------------|
| 1 | 1 | 0 |
| 1 | 2 | 1 |
| 1 | 3 | 2 |
| 1 | 4 | 3 |
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 2 | 3 | 1 |
| 2 | 3 | 1 |
| 2 | 4 | 3 |
and I want to update values in command_order column from another table, so I can have result like this:
| step_id | command_id | commands_order|
|---------|------------|---------------|
| 1 | 1 | 0 |
| 1 | 2 | 1 |
| 1 | 3 | 2 |
| 1 | 4 | 3 |
| 1 | 1 | 4 |
| 2 | 2 | 0 |
| 2 | 3 | 1 |
| 2 | 3 | 2 |
| 2 | 4 | 3 |
It was looking like easy task, but problem is to update rows for same command_id, it is writing same value in commands_order
SQL that I tried is:
UPDATE wfc
SET commands_order = CAST(sq.input_step_id as INTEGER)
FROM (
SELECT wfp.step_id, wfp.input_command_id, wfp.input_step_id
from wfp
order by wfp.step_id, wfp.input_step_id
) AS sq
WHERE (wfc.step_id=sq.step_id AND wfc.command_id=CAST(sq.input_command_id as INTEGER));
SQL Fiddle http://sqlfiddle.com/#!15/4efff4/4
I am pretty stuck with this, please help.
Thanks in advance.
Assuming you are trying to number the rows in the order in which they were created, and as long as you understand that ctid will chnage on update and with VACCUUM FULL, you can do the following:
select step_id, command_id, rank - 1 as command_order
from (select step_id, command_id, ctid as wfc_ctid, rank() over
(partition by step_id order by ctid)
from wfc) as wfc_ordered;
This will give you the wfc table with the ordering that you want. If you do update the original table, the ctids will change, so it's probably safer to create a copy of the table with the above query.

How to set sequence number of sub-elements in TSQL unsing same element as parent?

I need to set a sequence inside T-SQL when in the first column I have sequence marker (which is repeating) and use other column for ordering.
It is hard to explain so I try with example.
This is what I need:
|------------|-------------|----------------|
| Group Col | Order Col | Desired Result |
|------------|-------------|----------------|
| D | 1 | NULL |
| A | 2 | 1 |
| C | 3 | 1 |
| E | 4 | 1 |
| A | 5 | 2 |
| B | 6 | 2 |
| C | 7 | 2 |
| A | 8 | 3 |
| F | 9 | 3 |
| T | 10 | 3 |
| A | 11 | 4 |
| Y | 12 | 4 |
|------------|-------------|----------------|
So my marker is A (each time I met A I must start new group inside my result). All rows before first A must be set to NULL.
I know that I can achieve that with loop but it would be slow solution and I need to update a lot of rows (may be sometimes several thousand).
Is there a way to achive this without loop?
You can use window version of COUNT to get the desired result:
SELECT [Group Col], [Order Col],
COUNT(CASE WHEN [Group Col] = 'A' THEN 1 END)
OVER
(ORDER BY [Order Col]) AS [Desired Result]
FROM mytable
If you need all rows before first A set to NULL then use SUM instead of COUNT.
Demo here