Overpass API: query for counting amenities of a specified type around a set of lat/lons

I'm trying to query data from the OSM Overpass API. Specifically, I'm trying to determine the count of amenities of a given type around a point (using the 'around' syntax). When running this for many locations (lat/lon pairs) I run into a TooManyRequests error.
I have tried to work around this by adding sleep pauses and adjusting the timeout header and retry time, but I keep hitting the same issue. I'm now trying to adapt the query so that it returns just the count of amenities (of the specified type) around each point, rather than the full JSON of nodes, which is more data intensive. My current script is as follows:
import time

import overpy

# Run the Overpass query for each point.
# df is a pandas DataFrame with city, state_name, radius_m, lat and lng columns.
results = {}
for n in range(0, 200):
    name = df.loc[n]['city']
    state = df.loc[n]['state_name']
    rad = df.loc[n]['radius_m']
    lat = df.loc[n]['lat']
    lon = df.loc[n]['lng']
    # Overpass query for amenities around this point
    start_time = time.time()
    api = overpy.Overpass(max_retry_count=None, retry_timeout=2)
    r = api.query(f"""
        [out:json][timeout:180];
        (
          node["amenity"="charging_station"](around:{rad},{lat},{lon});
        );
        out;
        """)
    print("query time for " + str(name) + ", number " + str(n) + " = " + str(time.time() - start_time))
    results[name] = len(r.nodes)
    time.sleep(2)
Any help from other Overpass users is much appreciated!
Thanks

In general, you can use out count; to return just a count from an Overpass API query.
It's hard to say without knowing how your data is specifically structured, but you might have better luck using area filters to look at specific cities or regions.
Here is an example that returns the count of all nodes tagged as charging stations in Portland, Oregon:
/* charging stations in Portland */
area[name="Oregon"]->.state;
area[name="Portland"]->.city;
(
  node["amenity"="charging_station"](area.state)(area.city);
);
out count;
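Applied to your per-point loop, the same around query with out count; returns a single "count" element instead of the full node list, so far less data crosses the wire. Below is a minimal sketch of that idea using plain requests against the public overpass-api.de endpoint (an assumption on my part; overpy is geared toward parsing element lists, so reading the raw JSON is simpler here):

import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"  # assumed endpoint

def count_amenities(rad, lat, lon, amenity="charging_station"):
    """Return only the count of matching nodes around a point."""
    query = f"""
    [out:json][timeout:180];
    node["amenity"="{amenity}"](around:{rad},{lat},{lon});
    out count;
    """
    response = requests.post(OVERPASS_URL, data={"data": query})
    response.raise_for_status()
    # 'out count;' yields one element of type "count" whose tags hold
    # the totals as strings
    counts = response.json()["elements"][0]["tags"]
    return int(counts["total"])

This doesn't lift the rate limit itself, but each request becomes much cheaper, and you can keep your existing sleep/retry handling around it.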

Related

How to download precipitation data for latitude-longitude coordinates from NOAA in R

I'm trying to download precipitation data for a list of latitude-longitude coordinates in R. I've come across this question, which gets me most of the way there, but over half of the weather stations don't have precipitation data. I've pasted my code up to this point below.
I'm now trying to figure out how to get data only from the closest station that actually has precipitation data, or to run a second function on the sites with missing data to pull from the second-closest station. However, I haven't been able to figure out how to do this. Any suggestions or resources that might help?
library(rnoaa)
library(dplyr)  # for %>% and filter()

# load station data - takes some minutes
station_data <- ghcnd_stations() %>% filter(element == "PRCP")

# add id column for each location (necessary for next function)
sites_df$id <- 1:nrow(sites_df)

# retrieve all stations within a radius (e.g. 20 km) using lapply
stations <- lapply(1:nrow(sites_df),
                   function(i) meteo_nearby_stations(sites_df[i,],
                                                     lat_colname = 'Lattitude',
                                                     lon_colname = 'Longitude',
                                                     radius = 20,
                                                     station_data = station_data)[[1]])

# pull data for nearest stations - x$id[1] selects ID of closest station
stations_data <- lapply(stations,
                        function(x) meteo_pull_monitors(x$id[1],
                                                        date_min = "2022-05-01",
                                                        date_max = "2022-05-31",
                                                        var = c("prcp")))
stations_data
# (not working) attempt to rerun the pull for the second-closest station at the
# sites with missing data; I don't know how to get lapply to run over a subset
# of a list, or understand exactly how the function runs, to code it another way
for (i in c(1,2,3,7,9,10,11,14,16,17,19,20)){
  stations_data[[i]] <- lapply(stations,
                               function(x) meteo_pull_monitors(x$id[2],
                                                               date_min = "2022-05-01",
                                                               date_max = "2022-05-31",
                                                               var = c("prcp")))
}
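One way to make that last step work is to index stations directly for just the sites that came back empty, so each iteration pulls the second-closest station for a single site rather than re-running lapply over the whole list. A minimal sketch under that assumption (meteo_nearby_stations returns each site's stations ordered by distance, so id[2] is the second closest):

# sites whose closest station had no precipitation data (from above)
missing_sites <- c(1, 2, 3, 7, 9, 10, 11, 14, 16, 17, 19, 20)

for (i in missing_sites) {
  # stations[[i]] is the table of nearby stations for site i, ordered by distance
  stations_data[[i]] <- meteo_pull_monitors(stations[[i]]$id[2],
                                            date_min = "2022-05-01",
                                            date_max = "2022-05-31",
                                            var = c("prcp"))
}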

Looking for advice on improving a custom function in AnyLogic

I'm estimating last-mile delivery costs in a large urban network using by-route distances. I have over 8000 customer agents and over 100 retail store agents plotted on a GIS map using lat/long coordinates. Each customer receives deliveries from its nearest store (by route). The goal is to get two distance measures in this network for each store:
d0_bar: the average distance from a store to all of its assigned customers
d1_bar: the average distance between all customers common to a single store
I've written a startup function with a simple for-each loop to assign each customer to a store based on by-route distance (customers have a parameter, "customer.pStore", of Store type). This function also adds each customer, in turn, to the store agent's collection of customers ("store.colCusts", an ArrayList with Customer-type elements).
Next, I have a function that iterates through the store agent population, calculates the two average distance measures above (d0_bar & d1_bar), and writes the results to a txt file (see code below). Fortunately, the code works. The problem is that with such a massive dataset, iterating through all customers/stores and retrieving distances via the openstreetmap.org API takes forever. It has been initializing ("Please wait...") for about 12 hours. What can I do to make this code more efficient? Or is there a better way in AnyLogic to get these two distance measures for each store in my network?
Thanks in advance.
//for each store, record all customers assigned to it
for (Store store : stores)
{
    distancesStore.print(store.storeCode + "," + store.colCusts.size() + ","
            + store.colCusts.size() * (store.colCusts.size() - 1) / 2 + ",");

    //calculates average distance from store j to customer nodes that belong to store j
    double sumFirstDistByStore = 0.0;
    int h = 0;
    while (h < store.colCusts.size())
    {
        sumFirstDistByStore += store.distanceByRoute(store.colCusts.get(h));
        h++;
    }
    distancesStore.print((sumFirstDistByStore / store.colCusts.size()) / 1609.34 + ",");

    //calculates average of distances between all customer nodes belonging to store j
    double custDistSumPerStore = 0.0;
    int loopLimit = store.colCusts.size();
    int i = 0;
    while (i < loopLimit - 1)
    {
        int j = 1;
        while (j < loopLimit)
        {
            custDistSumPerStore += store.colCusts.get(i).distanceByRoute(store.colCusts.get(j));
            j++;
        }
        i++;
    }
    distancesStore.print((custDistSumPerStore / (loopLimit * (loopLimit - 1) / 2)) / 1609.34);
    distancesStore.println();
}
First, a few simple comments:
Have you tried timing a single distanceByRoute call? E.g. run store.distanceByRoute(store.colCusts.get(0)); just to see how long one call takes on your system. Routing is generally pretty slow, but it would be good to know what the speed limit is.
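For instance, a quick probe along these lines (a sketch; traceln is AnyLogic's built-in console logger, and I'm assuming at least one customer is already assigned):

// time one routing call to establish a per-call baseline
Store probe = stores.get(0);
long t0 = System.currentTimeMillis();
double meters = probe.distanceByRoute(probe.colCusts.get(0));
traceln("one distanceByRoute call took " + (System.currentTimeMillis() - t0) + " ms");

Multiply that per-call time by the number of calls your loops make and you get a rough lower bound on total runtime.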
The first simple change is to use Java parallelism. Instead of:
for (Store store : stores)
{ ...
use:
stores.parallelStream().forEach(store -> {
    ...
});
This will process the stores entries in parallel using the standard Java streams API.
It also looks like the second loop - where the average distance between customers is calculated - doesn't take mirroring into account, i.e. that the distance a->b equals b->a. For example, 4 customers require only 6 calculations: 1->2, 1->3, 1->4, 2->3, 2->4, 3->4. In contrast, your second while loop performs 9: i=0, j in {1,2,3}; i=1, j in {1,2,3}; i=2, j in {1,2,3}. That counts some pairs twice and even routes a customer to itself, which seems wrong unless I am misunderstanding your intention.
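A minimal fix is to start the inner index one past the outer one:

// route each unordered customer pair exactly once
for (int i = 0; i < loopLimit - 1; i++) {
    for (int j = i + 1; j < loopLimit; j++) {
        custDistSumPerStore += store.colCusts.get(i).distanceByRoute(store.colCusts.get(j));
    }
}

This matches the loopLimit*(loopLimit-1)/2 divisor you already use for the average.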
Generally, for long-running operations it is a good idea to include some traceln calls to show progress with associated timing.
Please have a look at the above and post your results. With more information, additional performance improvements may be possible.

Save PageRank output in Neo4j

I am running the Pregel PageRank algorithm on Twitter data in Spark using Scala. The algorithm runs fine and correctly finds the highest PageRank score, but I am unable to save the graph to Neo4j.
The inputs and outputs are mentioned below.
Input file (the numbers are Twitter user IDs):
86566510 15647839
86566510 197134784
86566510 183967095
15647839 11272122
15647839 10876852
197134784 34236703
183967095 20065583
11272122 197134784
34236703 18859819
20065583 91396874
20065583 86566510
20065583 63433165
20065583 29758446
Output of the graph vertices:
(11272122,0.75)
(34236703,1.0)
(10876852,0.75)
(18859819,1.0)
(15647839,0.6666666666666666)
(86566510,0.625)
(63433165,0.625)
(29758446,0.625)
(91396874,0.625)
(183967095,0.6666666666666666)
(197134784,1.1666666666666665)
(20065583,1.0)
I try to save the graph using the Scala code below, but it doesn't work. Please help me solve this.
Neo4jGraph.saveGraph(sc, pagerankGraph, nodeProp = "twitterId", relProp = "follows")
Thanks.
Did you load the graph originally from Neo4j? Currently saveGraph writes the graph data back to Neo4j nodes via their internal ids.
It actually runs this statement:
UNWIND {data} as row
MATCH (n) WHERE id(n) = row.id
SET n.$nodeProp = row.value RETURN count(*)
As a short-term mitigation I added optional labelIdProp parameters that are used instead of the internal ids, plus a match/merge flag. You'll have to build the library yourself to use that, though. I'm going to push the update in the next few days.
Something you can try is Neo4jDataFrame.mergeEdgeList.
Here is the test code for it.
You basically have a DataFrame with the data, and it saves it to a Neo4j graph (including relationships):
// mergeEdgeList: merge nodes and a relationship from a DataFrame
val rows = sc.makeRDD(Seq(Row("Keanu", "Matrix")))
val schema = StructType(Seq(StructField("name", DataTypes.StringType), StructField("title", DataTypes.StringType)))
val df = new SQLContext(sc).createDataFrame(rows, schema)
Neo4jDataFrame.mergeEdgeList(sc, df, ("Person", Seq("name")), ("ACTED_IN", Seq.empty), ("Movie", Seq("title")))

// saveGraph: write a GraphX graph's edge attribute back as a relationship property
val edges: RDD[Edge[Long]] = sc.makeRDD(Seq(Edge(0, 1, 42L)))
val graph = Graph.fromEdges(edges, -1)
assertEquals(2, graph.vertices.count)
assertEquals(1, graph.edges.count)
Neo4jGraph.saveGraph(sc, graph, null, "test")

// verify the merged pattern exists
val it: ResourceIterator[Long] = server.graph().execute("MATCH (:Person {name:'Keanu'})-[:ACTED_IN]->(:Movie {title:'Matrix'}) RETURN count(*) as c").columnAs("c")
assertEquals(1L, it.next())
it.close()
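Adapted to your Twitter graph, that could look roughly like the sketch below, which turns the follow pairs into a two-column DataFrame and merges them as User nodes joined by FOLLOWS relationships. The label, relationship type, and column names here are my assumptions, not part of your code:

import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{DataTypes, StructField, StructType}

// one row per "src follows dst" edge of the pagerank graph
val edgeRows = pagerankGraph.edges.map(e => Row(e.srcId.toString, e.dstId.toString))
val edgeSchema = StructType(Seq(
  StructField("src", DataTypes.StringType),
  StructField("dst", DataTypes.StringType)))
val edgeDf = new SQLContext(sc).createDataFrame(edgeRows, edgeSchema)

// merges (:User {src})-[:FOLLOWS]->(:User {dst}) for every row
Neo4jDataFrame.mergeEdgeList(sc, edgeDf,
  ("User", Seq("src")), ("FOLLOWS", Seq.empty), ("User", Seq("dst")))

One caveat: each endpoint is merged on its own key column ("src" vs. "dst"), so a user appearing on both sides ends up as two nodes; you would need a follow-up Cypher merge to unify them onto a single twitterId property.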

Getting the total number of records in PagedList

The datagrid I use on the client is based on SQL row number; it also requires a total number of pages for its paging. I also use PagedList on the server.
SQL Profiler shows that PagedList makes two DB calls - the first to get the total number of records and the second to get the current page. The thing is, I can't find a way to extract that total number of records from the PagedList. So currently I have to make an extra call to get the total, which means three calls per request, two of them absolutely identical. I understand that I probably won't be able to get rid of the call that fetches the total, but I hate to call it twice. Here is an extract from my code; I'd really appreciate any help with this:
var t = from c in myDb.MyTypes.Filter<MyType>(filterXml) select c;
response.Total = t.Count(); // my first call to get the total

double d = uiRowNumber / uiRecordsPerPage;
int page = (int)Math.Ceiling(d) + 1;

var q = from c in myDb.MyTypes.Filter<MyType>(filterXml).OrderBy(someOrderString)
        select new ReturnType
        {
            Something = c.Something
        };
response.Items = q.ToPagedList(page, uiRecordsPerPage);
PagedList exposes a .TotalItemCount property that reflects the total number of records in the set (not the number in a particular page). Thus response.Items.TotalItemCount should do the trick.
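Applied to the code above, that lets you drop the separate Count() round trip; a sketch (PageCount is the library's page-total property, in case the grid needs it too):

response.Items = q.ToPagedList(page, uiRecordsPerPage);
response.Total = response.Items.TotalItemCount; // total records across all pages
int totalPages = response.Items.PageCount;      // total number of pages for the grid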

Query near vs. within

Using MongoDB I'm querying homes that are within 25 miles of a lat/long.
My first attempt to do this used the near command, like so:
var near = Query.Near("Coordinates", coordinates.Latitude, coordinates.Longitude, find.GetRadiansAway(), false);
var query = Collection().Find(near);
var listings = query.ToList();
The issue with near is that it only returns 100 listings, whereas I want to return all listings within 25 miles of the coordinates.
My next attempt was to use within:
var within = Query.WithinCircle("Coordinates", coordinates.Latitude, coordinates.Longitude, find.GetRadiansAway(), false);
var query = Collection().Find(within);
var listings = query.ToList();
Within returns all listings within 25 miles, which is great; however, it doesn't sort them by how close they are to the center coordinates the way near does.
So my question is, how do I get the best of both worlds? How do I get all listings within 25 miles AND have them sorted by proximity to the center coordinates?
Geospatial $near queries set a default limit() of 100 results. You should be able to get more results by setting a new limit().
While "near" queries are sorted by distance, "within" queries are not (although "within" doesn't have a default limit).