I would like to confirm whether my design is good enough, or get better ideas for solving the time-dependent bus routing problem. My solution consists of the following main steps:
Have one edges table which represents all the edges (the source and target columns are vertices, i.e. bus stops):
postgres=# select id, source, target, cost from busedges;
id | source | target | cost
----+--------+--------+------
1 | 1 | 2 | 1
2 | 2 | 3 | 1
3 | 3 | 4 | 1
4 | 4 | 5 | 1
5 | 1 | 7 | 1
6 | 7 | 8 | 1
7 | 1 | 6 | 1
8 | 6 | 8 | 1
9 | 9 | 10 | 1
10 | 10 | 11 | 1
11 | 11 | 12 | 1
12 | 12 | 13 | 1
13 | 9 | 15 | 1
14 | 15 | 16 | 1
15 | 9 | 14 | 1
16 | 14 | 16 | 1
Have a table which holds the bus details for each edge: the bus, the edge, and the "from" and "to" times.
NOTE: I store the "from" and "to" columns as integers so that I can use fast integer comparisons, but I can switch to a better format if one is available.
postgres=# select id, "busedgeId", "busId", "from", "to" from busedgetimes;
id | busedgeId | busId | from | to
----+-----------+-------+-------+-------
18 | 1 | 1 | 33000 | 33300
19 | 2 | 1 | 33300 | 33600
20 | 3 | 2 | 33900 | 34200
21 | 4 | 2 | 34200 | 34800
22 | 1 | 3 | 36000 | 36300
23 | 2 | 3 | 36600 | 37200
24 | 3 | 4 | 38400 | 38700
25 | 4 | 4 | 38700 | 39540
Use Dijkstra's algorithm to find the shortest path.
Get the upcoming buses from the busedgetimes table, earliest first, for the path found by Dijkstra's algorithm. This leads to a somewhat complex query, though.
Can I improve on this, or are there any better designs?
Links to docs or articles related to this would be really helpful.
This is totally normal and the regular way to do it. See also:
PgRouting Example
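As a rough illustration of the two steps described in the question (shortest path first, then the earliest usable bus on each edge of that path), here is a minimal Python sketch. It is not pgRouting code. The function names are hypothetical, the edges are assumed to be directed from source to target, and the data is a subset of the sample rows shown in the question.

```python
import heapq

# (id, source, target, cost): first eight rows of the busedges sample
BUS_EDGES = [
    (1, 1, 2, 1), (2, 2, 3, 1), (3, 3, 4, 1), (4, 4, 5, 1),
    (5, 1, 7, 1), (6, 7, 8, 1), (7, 1, 6, 1), (8, 6, 8, 1),
]

# (busedgeId, busId, from, to): rows of the busedgetimes sample
BUS_EDGE_TIMES = [
    (1, 1, 33000, 33300), (2, 1, 33300, 33600),
    (3, 2, 33900, 34200), (4, 2, 34200, 34800),
    (1, 3, 36000, 36300), (2, 3, 36600, 37200),
    (3, 4, 38400, 38700), (4, 4, 38700, 39540),
]


def shortest_edge_path(edges, start, goal):
    """Dijkstra over directed edges; returns the list of edge ids, or None."""
    graph = {}
    for edge_id, source, target, cost in edges:
        graph.setdefault(source, []).append((target, cost, edge_id))
    queue = [(0, start, [])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for target, cost, edge_id in graph.get(node, []):
            if target not in visited:
                heapq.heappush(queue, (dist + cost, target, path + [edge_id]))
    return None


def earliest_buses(path, times, depart_at):
    """For each edge on the path, pick the first bus leaving at or after 'now'."""
    itinerary = []
    now = depart_at
    for edge_id in path:
        candidates = [
            (frm, to, bus)
            for e, bus, frm, to in times
            if e == edge_id and frm >= now
        ]
        if not candidates:
            return None  # no remaining bus serves this edge
        frm, to, bus = min(candidates)
        itinerary.append((edge_id, bus, frm, to))
        now = to  # the next leg cannot start before this one arrives
    return itinerary


path = shortest_edge_path(BUS_EDGES, 1, 5)
itinerary = earliest_buses(path, BUS_EDGE_TIMES, 32000)
```

Each itinerary entry is (edge id, bus id, from, to). Keeping the path search separate from the timetable lookup follows the design in the question; the second step is the part that turns into the more complex query when written in SQL.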
My Situation
I have some tables in my Redshift cluster that all break down by order_id, shipment_id, or shipment_item_id, depending on how granular the table is. order_id has a one-to-many relationship with shipment_id, and shipment_id has a one-to-many relationship with shipment_item_id.
My Question
I distribute on order_id, so all shipment_id and shipment_item_id records should be on the same nodes across the tables, since they are grouped by order_id. When I have to join on shipment_id or shipment_item_id, will Redshift know that the records are on the same nodes, or will it still broadcast the tables because they aren't joined on order_id?
Example Tables
unified_order
+----------+-------------+------------------+
| order_id | shipment_id | shipment_item_id |
+----------+-------------+------------------+
| 1 | 1 | 1 |
| 1 | 1 | 2 |
| 1 | 1 | 3 |
| 1 | 2 | 4 |
| 1 | 2 | 5 |
| 1 | 3 | 6 |
| 2 | 4 | 7 |
| 2 | 4 | 8 |
| 3 | 5 | 9 |
| 3 | 5 | 10 |
| 4 | 6 | 11 |
| 5 | 7 | 12 |
| 5 | 7 | 13 |
+----------+-------------+------------------+
shipment_details
+-------------+-----------+--------------+
| shipment_id | ship_day | ship_details |
+-------------+-----------+--------------+
| 1 | 1/1/2017 | stuff |
| 2 | 5/1/2017 | other stuff |
| 3 | 6/14/2017 | more stuff |
| 4 | 5/13/2017 | less stuff |
| 5 | 6/19/2017 | that stuff |
| 6 | 7/31/2017 | what stuff |
| 7 | 2/5/2017 | things |
+-------------+-----------+--------------+
Distribution
distribution_by_node
+------+----------+-------------+------------------+
| node | order_id | shipment_id | shipment_item_id |
+------+----------+-------------+------------------+
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 2 |
| 1 | 1 | 1 | 3 |
| 1 | 1 | 2 | 4 |
| 1 | 1 | 2 | 5 |
| 1 | 1 | 3 | 6 |
| 1 | 5 | 7 | 12 |
| 1 | 5 | 7 | 13 |
| 2 | 2 | 4 | 7 |
| 2 | 2 | 4 | 8 |
| 3 | 3 | 5 | 9 |
| 3 | 3 | 5 | 10 |
| 4 | 4 | 6 | 11 |
+------+----------+-------------+------------------+
The Amazon Redshift documentation does not go into detail about how information is shared between nodes, but it is doubtful that it "broadcasts the tables".
Rather, information is probably sent between nodes based on need -- only the relevant columns would be shared, and possibly only sub-ranges of the data.
Rather than worrying too much about the internal implementation, you should test various DISTKEY and SORTKEY strategies against real queries to determine performance.
Follow the recommendations from Choose the Best Distribution Style to minimize the amount of data that needs to be sent between nodes and consult Amazon Redshift Best Practices for Designing Queries to improve queries.
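You can run EXPLAIN on your query to see how data will be distributed (or not) during execution. The join steps in the plan carry labels such as DS_DIST_NONE (no redistribution is needed) or DS_BCAST_INNER (the inner table is broadcast to every node). This doc shows how to read the query plan: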
Evaluating the Query Plan
I want to create a cross table in Spotfire in which the average is calculated only when there are at least 3 values. If there are no values or fewer than 3 values, the average should be blank.
+-------+-----+---------+
| Month | Age | Average |
+-------+-----+---------+
| 1 | 10 | |
| 2 | 11 | |
| 3 | 2 | 7.7 |
| 4 | | |
| 5 | 13 | |
| 6 | 14 | |
| 7 | | |
| 8 | 19 | |
| 9 | 20 | |
| 10 | 21 | 20 |
+-------+-----+---------+
If I'm understanding you correctly, you want to group by Month, and then have something like this as your aggregation:
If(Count([Age])>=3,Avg([Age]),null) as [AverageAge_3Min]
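To make the rule easier to check outside Spotfire, the same logic can be sketched in plain Python. This is only an illustration of the "at least 3 values" condition, not Spotfire syntax; the function name and the `minimum` parameter are hypothetical, and empty cells are represented as None.

```python
def average_if_enough(values, minimum=3):
    """Return the mean of the non-empty values, or None when there are too few."""
    present = [v for v in values if v is not None]
    if len(present) < minimum:
        return None  # stays blank in the cross table
    return sum(present) / len(present)


first_block = average_if_enough([10, 11, 2])      # three values, so averaged
sparse_block = average_if_enough([13, 14, None])  # only two values, so blank
last_block = average_if_enough([19, 20, 21])
```

Note that only non-empty values are counted, which matches the requirement that rows without an Age should not help reach the threshold.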
I have this situation: I have one offer, and that offer has n dates and n options, so I have two additional tables for the offer. There is also a third one, the price, which depends on both the date and the offer. It looks like this:
| | date 1 | date 2 | date 3 |
| offer 1 | price 11 | price 12 | price 13 |
| offer 2 | price 21 | price 22 | price 23 |
| offer 3 | price 31 | price 32 | price 33 |
Is there any way to create a custom TCA field to enter all of these price values at once?
So basically I need one table with input fields that also stores the uid of the date and of the offer as references.
Make more than one table... Tables with a dynamic column count are horrible to maintain.
Table Offer:
uid | Name | Desc
1 | offer1 | This is some cool shit
2 | offer2 | dsadsad
3 | offer3 | sdadsdsadsada
Table Date:
uid | date
1 | 12.02.2014
2 | 12.03.2014
3 | 20.03.2014
Table Prices:
uid | date | offer | price
1 | 1 | 1 | price11
2 | 1 | 2 | price21
3 | 1 | 3 | price31
4 | 2 | 1 | price12
5 | 2 | 2 | price22
6 | 2 | 3 | price32
7 | 3 | 1 | price13
8 | 3 | 2 | price23
9 | 3 | 3 | price33
And then it's straightforward...
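As a hedged sketch of how the three tables fit together, the following uses Python's built-in sqlite3 module rather than TYPO3 itself. The table and column names (offer, offer_date, price, date_uid, offer_uid) are assumptions chosen for the example; the rows are the sample rows from the answer. It joins the prices back to their offer and date and rebuilds the offer-by-date grid from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE offer (uid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE offer_date (uid INTEGER PRIMARY KEY, day TEXT);
CREATE TABLE price (
    uid INTEGER PRIMARY KEY,
    date_uid INTEGER REFERENCES offer_date(uid),
    offer_uid INTEGER REFERENCES offer(uid),
    price TEXT
);
INSERT INTO offer VALUES (1, 'offer1'), (2, 'offer2'), (3, 'offer3');
INSERT INTO offer_date VALUES
    (1, '12.02.2014'), (2, '12.03.2014'), (3, '20.03.2014');
INSERT INTO price VALUES
    (1, 1, 1, 'price11'), (2, 1, 2, 'price21'), (3, 1, 3, 'price31'),
    (4, 2, 1, 'price12'), (5, 2, 2, 'price22'), (6, 2, 3, 'price32'),
    (7, 3, 1, 'price13'), (8, 3, 2, 'price23'), (9, 3, 3, 'price33');
""")

# One row per (offer, date) pair, resolved through the two references.
rows = conn.execute("""
    SELECT o.name, d.day, p.price
    FROM price AS p
    JOIN offer AS o ON o.uid = p.offer_uid
    JOIN offer_date AS d ON d.uid = p.date_uid
    ORDER BY o.uid, d.uid
""").fetchall()

# Rebuild the grid from the question: grid[offer][date] -> price
grid = {}
for name, day, price in rows:
    grid.setdefault(name, {})[day] = price
```

Adding a new date or offer then only adds rows, never columns, which is the maintenance advantage the answer points at.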
Forgive what may be a silly question, but I'm not much of a database guru.
Here is my table:
id_data | val_no3 | id_prev | id_next
--------+---------+---------+----------
1 | | | 2
2 | 7 | |
3 | | 2 | 4
4 | 5 | |
5 | | 4 | 10
6 | | 4 | 10
7 | | 4 | 10
8 | | 4 | 10
9 | | 4 | 10
10 | 8 | 4 |
In the table above:
id_prev is the id_data of the preceding row that has a value, filled in when val_no3 is null
id_next is the id_data of the following row that has a value, filled in when val_no3 is null
And now I would like to have this one:
id_data | val_no3 | id_prev | id_next | val_prev | val_next
--------+---------+---------+----------+----------+----------
1 | | | 2 | | 7
2 | 7 | | | |
3 | | 2 | 4 | 7 | 5
4 | 5 | | | |
5 | | 4 | 10 | 5 | 8
6 | | 4 | 10 | 5 | 8
7 | | 4 | 10 | 5 | 8
8 | | 4 | 10 | 5 | 8
9 | | 4 | 10 | 5 | 8
10 | 8 | | | |
The conditions are as follows (matching the table above):
If val_no3 is not null, then val_prev and val_next must be null
If val_no3 is null, then:
val_prev must be equal to the preceding value of val_no3, i.e. the one at id_prev (it should be null if id_prev is null too)
val_next must be equal to the following value of val_no3, i.e. the one at id_next (it should be null if id_next is null too)
I think I might have to use something with lag and lead, but I don't know how to do it.
I would be very grateful if you could help me resolve this issue, thank you.
No need for analytic functions, just sub-selects. Something like the following (untested) should work:
select
    id_data,
    val_no3,
    id_prev,
    id_next,
    -- look up the value stored on the row that id_prev / id_next points to
    (select val_no3 from b where id_data = x.id_prev) as val_prev,
    (select val_no3 from b where id_data = x.id_next) as val_next
from
    b x
order by
    id_data;
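The sub-select approach can be tried out with Python's built-in sqlite3 module. This is a sketch under a few assumptions: the table is called b as in the answer, the looked-up column is val_no3 (the column that exists in the question's table), and row 10 is loaded without an id_prev, as it appears in the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE b (id_data INTEGER PRIMARY KEY, val_no3 INTEGER, "
    "id_prev INTEGER, id_next INTEGER)"
)
# Sample rows from the question; None stands for an empty cell.
conn.executemany(
    "INSERT INTO b VALUES (?, ?, ?, ?)",
    [
        (1, None, None, 2), (2, 7, None, None),
        (3, None, 2, 4), (4, 5, None, None),
        (5, None, 4, 10), (6, None, 4, 10), (7, None, 4, 10),
        (8, None, 4, 10), (9, None, 4, 10),
        (10, 8, None, None),  # assumption: no id_prev, as in the desired output
    ],
)

rows = conn.execute("""
    SELECT id_data, val_no3, id_prev, id_next,
           (SELECT val_no3 FROM b WHERE id_data = x.id_prev) AS val_prev,
           (SELECT val_no3 FROM b WHERE id_data = x.id_next) AS val_next
    FROM b AS x
    ORDER BY id_data
""").fetchall()

for row in rows:
    print(row)
```

A correlated sub-select that finds no matching row yields NULL, so rows with an empty id_prev or id_next get an empty val_prev or val_next without any extra condition.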
I know that crosstabs are just for summaries. But is it possible to use a crosstab for daily reports given two dates? It's more like a details summary.
For example:
Date: 10 August 2012
Start Date: 10/8/2012
End Date: 11/8/2012
Date: 10 August 2012
_________| Center 1 | Center 2 | Total |
Person 1 | 1 | 2 | 3 |
Person 2 | 2 | 5 | 7 |
TOTAL | 3 | 7 | 10 |
Date: 11 August 2012
_________| Center 1 | Center 2 | Total |
Person 1 | 5 | 2 | 7 |
Person 2 | 8 | 5 | 13 |
TOTAL | 13 | 7 | 20 |
Unfortunately, you'd have to have each date as a row in the crosstab, and I think this is the best you'll be able to get:
10 August 2012|_________| Center 1 | Center 2 | Total |
|Person 1 | 1 | 2 | 3 |
|Person 2 | 2 | 5 | 7 |
|TOTAL | 3 | 7 | 10 |
11 August 2012|_________| Center 1 | Center 2 | Total |
|Person 1 | 5 | 2 | 7 |
|Person 2 | 8 | 5 | 13 |
|TOTAL | 13 | 7 | 20 |
|GRAND | 16 | 14 | 30 |
|TOTAL | | | |
So your setup would be:
Center Total
----+--------+-------+------+
Date| | | |
|Person | | |