I am very new to the spatial realm of SQL Server and need some help. I have a waypoint organizing app and I am trying to generate some queries that follow along the premise of finding waypoints that are part of geographic polygons like lakes, rivers, etc. I have preloaded my tables with data I have downloaded. I used shape2sql.exe to load shapefiles into the appropriate db tables.
Tables are as follows:
Water table - id, name, geog(geography data type)
State table - id, state_name, state_abbr, geog(geography data type)
County table - id, name, state_name, geog(geography data type)
Waypoint table - id, name, lat, lon, waterid
How do I write queries against these tables to return things like:
- all waypoints in 'michigan'
- all waypoints on 'bass lake' in 'montcalm' county in 'michigan' (there are multiple bass lakes in michigan and the country hence the county/state part)
- auto assign the water id column of the waypoint table by "processing" a group of waypoints and finding what lake they actually belong to
- etc.
Thanks!
Learned so far:
select geog.ToString() as Points, geog.STArea() as Area, geog.STLength() as Length
from water
where name like '%bass lake%' and STATE = 'mi'
will return the record for Bass Lake and the polygon with the actual coordinates for the lake.
POLYGON ((-87.670498549804691 46.304831340698243, -87.670543549804691 46.307117340698241, -87.676573549804687 46.313480340698241, -87.68120854980468 46.314821340698245, -87.685168549804686 46.315703340698242, -87.6877605498047 46.313390340698241, -87.685051549804683 46.308827340698244, -87.682360549804685 46.305650340698243, -87.677734549804683 46.304768340698246, -87.674440549804686 46.304336340698242, -87.670498549804691 46.304831340698243)) 1022083.96662664 4027.52433709888
Shooting from the hip, here, but maybe like this:
UPDATE waypoints
SET waypoints.WaterId = water.Id
FROM dbo.Waypoints AS waypoints LEFT JOIN
dbo.Water AS water ON geography::Point(waypoints.Lat, waypoints.Lon, 4326).STIntersects(water.geog) = 1
This should set the WaterId on the waypoints table to one of the matching water ids from the water table.
This should get you all the waypoints on BASS LAKE
SELECT waypoints.*
FROM dbo.Waypoints as waypoints INNER JOIN
dbo.Water AS water ON geography::Point(waypoints.Lat, waypoints.Lon, 4326).STIntersects(water.geog) = 1
WHERE water.Name = 'BASS_LAKE' -- OR WHATEVER
OK, I am learning as I go, so here are some answers to my own questions for anyone who would like to know.
Here is one query for finding various waypoints with conditions in the where clause:
SELECT * FROM WaypointTable wp
JOIN WaterTable w
ON wp.geogcolumn.STIntersects(w.geogcolumn) = 1
WHERE w.name LIKE '%bass lake%'
AND w.state = 'mi';
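Here is a query for assigning water IDs to waypoints based on where they 'fit':
-- T-SQL needs the alias declared in a FROM clause, not directly after UPDATE
UPDATE wp
SET WaterID = (
-- TOP (1) guards against a waypoint that intersects more than one water polygon
SELECT TOP (1) w.ID
FROM WaterTable w
WHERE w.geogcolumn.STIntersects(wp.geogcolumn) = 1
)
FROM WaypointTable wp;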
Both of these queries work extremely well and fast! Love it!
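To make the intersection test behind the update a little more concrete, here is a rough Python sketch of the same idea: check each waypoint against each water polygon and keep the first match. It uses a planar ray-casting test, which is only an illustration and not how SQL Server evaluates STIntersects on geography data. The ring is rounded from the Bass Lake polygon in the question; the water id 1 and the two waypoints are made up.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (lon, lat), treated here as planar x/y


def point_in_ring(x: float, y: float, ring: List[Point]) -> bool:
    """Planar ray casting: count edge crossings of a ray going in +x."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross it.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Rounded from the Bass Lake POLYGON output shown in the question.
BASS_LAKE: List[Point] = [
    (-87.6705, 46.3048), (-87.6705, 46.3071), (-87.6766, 46.3135),
    (-87.6812, 46.3148), (-87.6852, 46.3157), (-87.6878, 46.3134),
    (-87.6851, 46.3088), (-87.6824, 46.3057), (-87.6777, 46.3048),
    (-87.6744, 46.3043), (-87.6705, 46.3048),
]


def assign_water_ids(
    waypoints: Dict[str, Point], waters: Dict[int, List[Point]]
) -> Dict[str, Optional[int]]:
    """Map each waypoint name to the first water id containing it, else None."""
    assigned: Dict[str, Optional[int]] = {}
    for name, (lon, lat) in waypoints.items():
        assigned[name] = next(
            (wid for wid, ring in waters.items() if point_in_ring(lon, lat, ring)),
            None,
        )
    return assigned


waters = {1: BASS_LAKE}  # hypothetical water id
waypoints = {"on_lake": (-87.678, 46.309), "on_shore": (-87.600, 46.309)}  # hypothetical
assigned = assign_water_ids(waypoints, waters)
print(assigned)
```

Like the TOP (1) style of update, this keeps only one water id per waypoint, and waypoints that fall in no polygon stay unassigned.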
Related
I have a Postgres database with the PostGIS extension installed and filled with OpenStreetMap data.
With the following SQL statement :
SELECT
l.osm_id,
sum(
st_area(st_intersection(ST_Buffer(l.way, 30), p.way))
/
st_area(ST_Buffer(l.way, 30))
) as green_fraction
FROM planet_osm_line AS l
INNER JOIN planet_osm_polygon AS p ON ST_Intersects(l.way, ST_Buffer(p.way,30))
WHERE p.natural in ('water') or p.landuse in ('forest') GROUP BY l.osm_id;
I calculate a "green" score.
My goal is to create a "green" score for each osm_id,
that is, how much of a road is near water, forest, or something similar.
For example, a road that is enclosed by a park would have a score of 1.
A road that only runs alongside a river for a short stretch would have a score of, for example, 0.4.
At least, that is my expectation.
But when I inspect the result of this calculation, I sometimes get values such as
212.11701212511463 for a road with the OSM ID -647522
and 82 for a road with the OSM ID -6497265.
I do get values between 0 and 1 too, but I don't understand why I also get such huge values.
What am I missing?
I was expecting values between 0 and 1.
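The sum can exceed 1 when polygons overlap (each one is counted in full) or when several rows share the same osm_id. Using a custom unique ID that you must populate, the query can also union any overlapping polygons: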
SELECT
l.uid,
st_area(
ST_UNION(
st_intersection(ST_Buffer(l.way, 30), p.way))
) / st_area(ST_Buffer(l.way, 30)) as green_fraction
FROM planet_osm_line AS l
INNER JOIN planet_osm_polygon AS p
ON st_dwithin(l.way, p.way,30)
WHERE p.natural in ('water') or p.landuse in ('forest')
GROUP BY l.uid;
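To see why summing per-polygon intersections can push the fraction above 1 while a union cannot, here is a toy Python sketch. It swaps real geometries for sets of grid cells, so it only illustrates the counting and says nothing about PostGIS itself; the buffer and the two overlapping parks are invented shapes.

```python
def cells(x0, x1, y0, y1):
    """Return the set of unit grid cells in [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


# Hypothetical shapes: a buffered road and two parks that overlap each other.
road_buffer = cells(0, 10, 0, 2)   # 20 cells
park_a = cells(0, 6, -3, 5)        # covers the buffer for x in 0..5
park_b = cells(3, 10, -3, 5)       # covers the buffer for x in 3..9

buffer_area = len(road_buffer)

# Like sum(st_area(st_intersection(...))): each polygon counted separately.
summed = sum(len(road_buffer & park) for park in (park_a, park_b)) / buffer_area

# Like st_area(ST_Union(st_intersection(...))): overlap counted once.
unioned = len(road_buffer & (park_a | park_b)) / buffer_area

print(summed, unioned)
```

The area where the two parks overlap inside the buffer is counted twice in the first figure, which is what lifts it above 1.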
Trying to write a join to get data about people flying to Mars
I am working with a training database that contains information about different flights on spaceships. Here is my datalogical model for better understanding. [image: My datalogical model]
The task is to find the number of people who flew to a certain destination on a certain ship. I do it with the code shown below:
select ship.shipname, destination.name as destination_name, count(person_id)
from person
join flight_person on flight_person.PERSON_ID = person.ID
join flight on flight.id = flight_person.flight_id
join ship on flight.ship_id = ship.id
join destination on flight.destination_id = destination.id
group by ship.shipname, destination.name;
The output is as follows:
 shipname        | destination_name | count
-----------------+------------------+-------
 WhiteForest     | Mars             |     1
 YarikLightSpeed | Earth            |     1
 YarikLightSpeed | Mars             |     2
But the problem is that I also want to get information about the ships and destinations that were not visited. How can I modify my query to get this data? In my case, these are the destination Neptune and the ship FirePower.
What you're looking for is probably called "FULL JOIN". In this case, you'll get an output that contains information about all ships and all planets. Consider the following example:
select ship.shipname, destination.name as destination_name, count(person_id)
from person
join flight_person on flight_person.PERSON_ID = person.ID
join flight on flight.id = flight_person.flight_id
full join ship on flight.ship_id = ship.id
full join destination on flight.destination_id = destination.id
group by ship.shipname, destination.name;
 shipname        | destination_name | count
-----------------+------------------+-------
 null            | Neptune          |     0
 FirePower       | null             |     0
 WhiteForest     | Mars             |     1
 YarikLightSpeed | Earth            |     1
 YarikLightSpeed | Mars             |     2
I have two datasets :
A "customer" dataset with customer names and geographical coordinates (x,y)
A "stations" dataset with stations names and geographical coordinates (x,y)
What I need to do :
Find for each customer, the nearest station from the "stations" dataset
In the end, I need a dataset with:
customer_name, customerX, customerY, nearest_station_name, nearest_station_x, nearest_station_y
Nearest Definition :
For example for customer "c":
s1 is the station 1
s2 is the station 2
if ((Xs1-Xc)² + (Ys1-Yc)²) < ((Xs2-Xc)² + (Ys2-Yc)²) Then the Nearest station is S1
if ((Xs1-Xc)² + (Ys1-Yc)²) = ((Xs2-Xc)² + (Ys2-Yc)²) Then the Nearest station is either S1 or S2
if ((Xs1-Xc)² + (Ys1-Yc)²) > ((Xs2-Xc)² + (Ys2-Yc)²) Then the Nearest station is S2
That means I need to know, for each customer and each station, the result of (Xsi-Xc)² + (Ysi-Yc)²
Do you know if I can do that in Spark Scala, Spark SQL, or BigQuery without having to code a UDF?
Thank you for your help.
I tried, for every customer, to loop through the station list in order to find the nearest one, but it is too complicated and would have to be a UDF, which I want to avoid unless it is mandatory ...
double nearestStationDistance = Double.MAX_VALUE;
Station nearestStation = null;
for (Station station : stations) {
    // Squared Euclidean distance; the square root is not needed for comparison.
    double dx = station.x - customer.x;
    double dy = station.y - customer.y;
    double distance = dx * dx + dy * dy;
    if (distance < nearestStationDistance) {
        nearestStationDistance = distance;
        nearestStation = station;
    }
}
return nearestStation;
Afterwards, I extract the information from the "Station" object to get the name and the coordinates in order to complete the customer dataset.
I wrote a couple of posts about doing it in BigQuery:
https://mentin.medium.com/nearest-neighbor-in-bigquery-gis-7d50ebd5d63
https://mentin.medium.com/nearest-neighbor-using-bq-scripting-373241f5b2f5
The solution is simple to express in SQL:
SELECT
a.id,
ARRAY_AGG(b.id ORDER BY ST_Distance(a.geog, b.geog) LIMIT 1)
[ORDINAL(1)] as neighbor_id
FROM people_table a CROSS JOIN restaurant_table b
GROUP BY a.id
But that solution does not scale when tables are large, and the posts discuss options to speed things up.
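For the plain x/y case in the question, the same cross-join idea can be written with the squared distance directly and no UDF. The following is only a sketch: it runs the SQL through Python's built-in sqlite3 for demonstration, the table and column names (customer, station, name, x, y) are assumptions based on the question, and it has not been tried on Spark SQL or BigQuery. Like the query above, it compares every customer with every station, so it has the same scaling limits.

```python
import sqlite3

# Cross join every customer with every station, compute the squared distance,
# then keep the pair(s) whose squared distance equals the per-customer minimum.
QUERY = """
WITH pair AS (
    SELECT c.name AS customer_name, c.x AS customer_x, c.y AS customer_y,
           s.name AS station_name, s.x AS station_x, s.y AS station_y,
           (s.x - c.x) * (s.x - c.x) + (s.y - c.y) * (s.y - c.y) AS dist2
    FROM customer AS c CROSS JOIN station AS s
),
best AS (
    SELECT customer_name, MIN(dist2) AS min_dist2
    FROM pair
    GROUP BY customer_name
)
SELECT p.customer_name, p.customer_x, p.customer_y,
       p.station_name, p.station_x, p.station_y
FROM pair AS p
JOIN best AS b
  ON b.customer_name = p.customer_name AND b.min_dist2 = p.dist2
ORDER BY p.customer_name, p.station_name
"""


def nearest_stations(customers, stations):
    """Return one row per customer and nearest station (ties give extra rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (name TEXT, x REAL, y REAL)")
    conn.execute("CREATE TABLE station (name TEXT, x REAL, y REAL)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", customers)
    conn.executemany("INSERT INTO station VALUES (?, ?, ?)", stations)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


# Hypothetical sample data.
customers = [("c1", 0, 0), ("c2", 10, 10)]
stations = [("s1", 1, 1), ("s2", 9, 8), ("s3", 5, 5)]
for row in nearest_stations(customers, stations):
    print(row)
```

When two stations are exactly equidistant from a customer, this version returns both rows; picking a single one would need an extra tie-breaking rule.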
I have a query that should tell me what fraction of locations are up to 100 meters away (relative to all the distances computed):
select person_tbl.tdm, sum((st_distance (person_tbl.geo, location_tbl.geo) < 100)::INT)::FLOAT / count(*)
from persons as person_tbl, locations as location_tbl
where person_tbl.geo is not null
group by person_tbl.tdm
The two tables have geometry indexes:
create index idx on persons using gist(geo)
create index idx on locations using gist(geo)
In the first table (persons), the values of the geo field are POLYGON.
In the second table (locations), the values of the geo field are POINT Z, POLYGON Z, or MULTIPOLYGON Z.
The persons table contains ~2M rows and the locations table contains ~500 rows.
The query takes too long (~2 hours).
The values of max_parallel_processes and max_parallel_workers are both 8.
Is there something I can do to reduce the query time (2 hours seems too long)?
Is there a better way to write the query, or do I need to define the indexes differently?
In my application I have a query that does multiple joins with a table named position, like this:
SELECT *
FROM (...) as trips
join trip as t on trips.trip_id = t.trip_id
left outer join vehicle as v on v.vehicle_id = t.trip_vehicle_id
left outer join position as start on trips.start_position_id = start.position_id and start.position_vehicle_id = v.vehicle_id
left outer join position as "end" on trips.end_position_id = "end".position_id and "end".position_vehicle_id = v.vehicle_id
left outer join position as last on trips.last_position_id = last.position_id and last.position_vehicle_id = v.vehicle_id;
My position table has 35 columns (for example, position_id).
When I run the query, the columns of the position table should appear three times in the result: once each for start, end, and last. But Postgres cannot distinguish between, for example, start.position_id, end.position_id, and last.position_id, so these three columns are merged and appear as a single position_id column.
Because the data in start.position_id and end.position_id are different, the position_id column that appears in the result is empty.
I would like to avoid renaming all the columns, like this: start.position_id as start_position_id.
How can I get each group of data separately, for example, all the columns from the 'start' table? In MySQL I can do this by calling fetch_fields and giving the function an alias, like 'start'.
How can I do this in Postgres?
Best Regards,
Nuno Oliveira
My understanding is that you can't (or find it difficult to) discern between which table each column with a shared name (such as "position_id") belongs to, but only need to see one of the sets of shared columns at any one time. If that is the case, use tablename.* in your SELECT, so SELECT trips.*, start.*... would show the columns from trips and start, but no columns from other tables involved in the join.
SELECT [...,] start.* [,...] FROM [...] atable AS start [...]
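As a small self-contained illustration of the alias.* form, the sketch below joins the same table twice under two aliases and selects the columns of one alias at a time. It uses Python's built-in sqlite3 rather than Postgres purely so that it can run anywhere; the two-column position table, the sample rows, and the aliases start_pos and end_pos are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE position (position_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE trip (
    trip_id INTEGER PRIMARY KEY,
    start_position_id INTEGER,
    end_position_id INTEGER
);
INSERT INTO position VALUES (1, 'depot'), (2, 'harbour');
INSERT INTO trip VALUES (100, 1, 2);
""")

# The same joins are reused; only the SELECT list changes.
JOINS = """
FROM trip AS t
LEFT JOIN position AS start_pos ON start_pos.position_id = t.start_position_id
LEFT JOIN position AS end_pos ON end_pos.position_id = t.end_position_id
"""

# alias.* returns only the columns of that one joined copy of the table.
start_rows = conn.execute("SELECT start_pos.* " + JOINS).fetchall()
end_rows = conn.execute("SELECT end_pos.* " + JOINS).fetchall()

print(start_rows)
print(end_rows)
conn.close()
```

Each result contains only the two position columns of the chosen alias, so the start and end values no longer collide under a shared column name.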