How to get all of the way IDs that each node is a part of - openstreetmap

So I am trying to build an Overpass / OSM query that will, in effect, find all of the nodes that are part of multiple road segments, or 'ways'. My challenge is that I am dealing with a somewhat large area (Norfolk VA, 100,000 nodes), so I'm trying to find a reasonably performant query.
The following query is useful in that it provides all of the nodes, which I need to iterate over, as any node could be part of another way:
[out:json][timeout:25];
{{geocodeArea:Norfolk, VA}}->.searchArea;
(
(
way["highway"](area.searchArea);
node(w);
);
);
// print results
out body;
>;
out skel qt;
I also found this query, which returns every node that is part of multiple ways. It is very useful, but it performs poorly: it is O(n^2) and scales very badly to an entire city.
way({{bbox}})->.always;
foreach .always -> .currentway(
(.always; - .currentway;)->.allotherways;
node(w.currentway)->.e;
node(w.allotherways)->.f;
node.e.f;
(._ ; .result;) -> .result;
);
.result out meta;
I think the minimum useful information I need is all of the node IDs returned in association with each way (kind of like a map/dict), but I'm really struggling to figure out whether it is possible to make such a call. I appreciate your input!
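One possible way to sidestep the quadratic foreach is to fetch the highway ways once and invert the way-to-node lists on the client. The sketch below assumes the response was requested with [out:json], where each way element carries a "nodes" list of node IDs. The function names and the sample data are made up for illustration; this is not an Overpass feature, just post-processing in Python.

```python
from collections import defaultdict


def node_to_ways(elements):
    """Map each node ID to the sorted list of way IDs that reference it."""
    mapping = defaultdict(set)
    for el in elements:
        if el.get("type") == "way":
            # A set absorbs repeats, e.g. a closed way listing its first node twice.
            for node_id in el.get("nodes", []):
                mapping[node_id].add(el["id"])
    return {node_id: sorted(way_ids) for node_id, way_ids in mapping.items()}


def shared_nodes(mapping):
    """Keep only the nodes that belong to more than one way."""
    return {n: ways for n, ways in mapping.items() if len(ways) > 1}


# Made-up sample in the shape of the "elements" list of an Overpass JSON response.
sample_elements = [
    {"type": "way", "id": 100, "nodes": [1, 2, 3]},
    {"type": "way", "id": 200, "nodes": [3, 4]},
    {"type": "way", "id": 300, "nodes": [4, 5, 1]},
    {"type": "node", "id": 1, "lat": 36.85, "lon": -76.28},
]

mapping = node_to_ways(sample_elements)
junctions = shared_nodes(mapping)
```

This makes a single pass over the node references, so the cost grows with the total number of references rather than with the square of the number of ways.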

Related

Overpass query. Absence of [maxsize] returns significantly smaller results

I have two overpass queries.
node(33.68336,-117.89466,34.14946,-117.03498);
way["highway"~"motorway|motorway_link|trunk|trunk_link|primary|primary_link|secondary|secondary_link|tertiary|tertiary_link|road|residential|service"](bn);
(._;>;);
out;
The query above returns an osm.xml file that is 167.306 kb big.
[out:xml][maxsize:2000000000];
(
node(33.68336,-117.89466,34.14946,-117.03498);
way["highway"~"motorway|motorway_link|trunk|trunk_link|primary|primary_link|secondary|secondary_link|tertiary|tertiary_link|road|residential|service"](bn);
(._;>;);
);
out;
The second query returns a file that is 618.994 kb big. Why does the second query return a significantly bigger result? Does the first query not give me the full dataset? Is there a way to get the same result with both queries? (The absence of [maxsize] sometimes leads to an error…)
I feel that there is something missing about your query:
node(33.68336,-117.89466,34.14946,-117.03498); should return all the nodes in this area, which is a lot of data.
then the second line:
way"highway"~“motorway|motorway_link|trunk|trunk_link|primary|primary_link|secondary|secondary_link|tertiary|tertiary_link|road|residential|service”;
gives an error, as it should be written with brackets and straight quotes as so:
way["highway"~"motorway|motorway_link|trunk|trunk_link|primary|primary_link|secondary|secondary_link|tertiary|tertiary_link|road|residential|service"];
but this looks for all the roads in the world, and your first query is not used any more, as your output is only the second query. But that is a huge amount of data, probably in the GB range.
So I don't see how you would get only 167 kB. I assume you must have a bounding box or some other filter that you did not mention.
But in your second example, you make a union of the two queries, as you put them in brackets:
(... ; ...;); out; so you would get all the nodes in the area and all the roads in the world. And again, if you have an extra bounding box or filter, you might get only 619 kB. Supposing that there are a lot of non-road nodes, it makes sense that you get more data, as you get the union of the two searches (all nodes + nodes from the roads).

How to export the roadmap including traffic_signals and street_lamps of a city?

I would like to get some statistics about the roads, their deployed traffic lights, and the lampposts. Is there any way to get these statistics immediately for Shenzhen (China) city?
Secondly: how can I export the road network of a specific city (i.e., Shenzhen) including traffic_signals and street_lamps?
I have tried this code using Overpass API:
[out:csv(::id,::lat,::lon)][timeout:900];
// gather results
(
node["highway"="street_lamp"](22.6242,113.6371,23.0628,114.5462);
);
// print results
out body;
The query doesn't retrieve any results for the coordinates of Shenzhen, China (22.6242,113.6371,23.0628,114.5462). However, when applied to the coordinates of London (51.3941,-0.2774,51.56,0.0879), it works and returns results.
Moreover, when I run a simple query, such as querying PoIs:
[out:json][timeout:10];
// gather results
(
node["leisure"](around: 200,22.5,113.9936701,22.6740047,113.9935278);
);
out body;
It also works, even though it is in Shenzhen (China). Is there any way to retrieve nodes tagged with 'street_lamp' and 'traffic_sign' in Chinese cities (e.g., Shenzhen)?
To query within boundaries, use the id of the city boundary's relation, then use map_to_area, and then query with the (area) filter:
rel(3464353);
map_to_area;
node(area)["highway"="street_lamp"];
out;
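If the same area query is needed for more than one tag, it may help to generate the query text. The following is a minimal sketch that only assembles the Overpass QL string from the pieces shown above (the relation id from the answer and the CSV settings from the question). The helper name build_area_query and its parameters are hypothetical, and sending the query to a server is left out.

```python
def build_area_query(relation_id, key, value, timeout=900):
    """Assemble an Overpass QL query for nodes with key=value inside a relation's area."""
    return (
        f"[out:csv(::id,::lat,::lon)][timeout:{timeout}];\n"
        f"rel({relation_id});\n"
        "map_to_area;\n"
        f'node(area)["{key}"="{value}"];\n'
        "out;"
    )


# 3464353 is the relation id used in the answer above.
lamp_query = build_area_query(3464353, "highway", "street_lamp")
signal_query = build_area_query(3464353, "highway", "traffic_signals")
```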

Postgres...how to improve ilike results (quality not speed)

I have a list of chemicals in my database and I provide our users with the ability to do a live search via our website. I use SQLAlchemy and the query I use looks something like this:
Compound.query.filter(Compound.name.ilike(f'%{name}%')).limit(50).all()
When someone searches for toluene, for example, they don't get the result they're looking for because there are many chemicals that have the word toluene in them, such as:
2, 4 Dinitrotoluene
2-Chloroethyl-p-toluenesulfonate
4-Bromotoluene
6-Amino-m-toluenesulfonic acid
a,2,4-trichlorotoluene
a,o-Dichlorotoluene
a-Bromtoluene
etc...
I realize I could increase my limit but I feel like 50 is more than enough. Or, I could change the ilike(f'%{name}%') to something like ilike(f'{name}%') but our business requirements don't want this. What I'd rather do is improve the ability for Postgres to return results so that toluene is at the top of the search results.
Any ideas on how to improve Postgres' ilike capability?
Thanks in advance.
One option is to better rank the results. Postgres text search allows you to rank the results.
A cheap and dirty version of preferential ranking is to do multiple queries for name = ?, ilike(f'{name}%'), and ilike(f'%{name}%') using a union. That way the exact match comes first, followed by the ilike(f'{name}%') results.
And rather than a hard limit, offer pagination. SQLAlchemy has paginate to help.
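The tiered ordering can be illustrated outside the database. The sketch below is plain Python over an in-memory list, not SQLAlchemy or SQL: it ranks exact matches first, then prefix matches, then any other substring match, which is the order the union approach aims for. The function name rank_matches and the sample names are assumptions made for the example.

```python
def rank_matches(names, term, limit=50):
    """Return substring matches ordered as exact, then prefix, then contains."""
    needle = term.lower()

    def tier(name):
        lowered = name.lower()
        if lowered == needle:
            return 0  # exact match
        if lowered.startswith(needle):
            return 1  # prefix match
        return 2      # substring elsewhere in the name

    hits = [n for n in names if needle in n.lower()]
    # Within a tier, fall back to case-insensitive alphabetical order.
    return sorted(hits, key=lambda n: (tier(n), n.lower()))[:limit]


# Made-up sample data; "Toluene" and "Toluene diisocyanate" are not from the question.
sample_names = [
    "2, 4 Dinitrotoluene",
    "4-Bromotoluene",
    "Toluene diisocyanate",
    "a-Bromtoluene",
    "Toluene",
    "Benzene",
]
ranked = rank_matches(sample_names, "toluene")
```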
ILIKE yields a boolean. It doesn't specify what order to return the results, just whether to return them at all (you can order by a boolean, but if you only return trues there is nothing left to order by). So by the time you are done improving it, it would no longer be ILIKE at all but something else completely.
You might be looking for something like <-> from pg_trgm, which provides a distance score which can be sorted on. Although really, you could just order the result based on the length of the compound name, and return the shortest 50 that contain the target.
something like ilike(f'{name}%') but our business requirements don't want this
Isn't your business requirement to get better results?
But at least in my database, this could just return a bunch of names in inverted format, like toluene, 2,4-dinitro, so the results might not be much better, unless you avoid storing such inverted names. Sorting by either <-> or by length would overcome that problem. But they would also penalize toluene, ACS reagent grade 99.99% by HPLC, should you have names like that.
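As a rough illustration of the length-based ordering, here is a small Python sketch that keeps every name containing the target and returns the shortest ones first. It runs over an in-memory list rather than in Postgres, and the function name shortest_matches and the sample names are made up for the example.

```python
def shortest_matches(names, term, limit=50):
    """Return the names containing the term, shortest first."""
    needle = term.lower()
    hits = [n for n in names if needle in n.lower()]
    # Ties on length are broken alphabetically so the order is stable.
    return sorted(hits, key=lambda n: (len(n), n.lower()))[:limit]


# Made-up sample data covering the cases discussed above.
sample_names = [
    "toluene, ACS reagent grade 99.99% by HPLC",
    "4-Bromotoluene",
    "toluene, 2,4-dinitro",
    "Toluene",
]
by_length = shortest_matches(sample_names, "toluene")
```

With this sample, the plain name comes first and the long reagent-grade name comes last, which shows the penalty mentioned above.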

Is there a way to get results for an overpass query paginated?

Let's say I want to get restaurants in Berlin and I have this query:
[out:json];
area["boundary"="administrative"]["name"="Berlin"] -> .a;
(
node(area.a)["amenity"="restaurant"];
); out center;
Let's say this result set is too big to extract in just one request to overpass. I would like to be able to use something like SQL's OFFSET and LIMIT arguments to get the first 100 results (0-99), process them, then get the next 100 (100-199) and so on.
I can't find an option to do that in the API, is it possible at all? If not, how should I query my data to get it divided into smaller sets?
I know I can increase the memory limit or the timeout, but this still leaves me handling one massive request instead of n small ones, which is how I would like to do it.
OFFSET is not supported by Overpass API, but you can limit the number of results returned by the query via an additional parameter in the out statement. The following example would return only 100 restaurants in Berlin:
[out:json];
area["boundary"="administrative"]["name"="Berlin"] -> .a;
(
node(area.a)["amenity"="restaurant"];
); out center 100;
One approach to limiting the overall data volume is to count the number of objects in a bounding box and, if that number is too large, split the bounding box into 4 parts. Counting is supported via out count;. Once the number of objects is feasible, just use out; to get the results.
node({{bbox}})["amenity"="restaurant"];
out count;
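The count-and-split idea can be sketched as a small recursive routine. The code below is only an outline under stated assumptions: bounding boxes use the Overpass order (south, west, north, east), and count_in is a hypothetical callable that in practice would run the out count; query for a box and return the number. Here it is replaced by a fake counter over made-up points so the sketch runs on its own.

```python
def split_bbox(bbox):
    """Split a (south, west, north, east) box into four quarters."""
    south, west, north, east = bbox
    mid_lat = (south + north) / 2
    mid_lon = (west + east) / 2
    return [
        (south, west, mid_lat, mid_lon),   # south-west
        (south, mid_lon, mid_lat, east),   # south-east
        (mid_lat, west, north, mid_lon),   # north-west
        (mid_lat, mid_lon, north, east),   # north-east
    ]


def tile_bbox(bbox, count_in, max_count, max_depth=8):
    """Recursively split bbox until each tile holds at most max_count objects."""
    if max_depth == 0 or count_in(bbox) <= max_count:
        return [bbox]
    tiles = []
    for quarter in split_bbox(bbox):
        tiles.extend(tile_bbox(quarter, count_in, max_count, max_depth - 1))
    return tiles


# Made-up restaurant coordinates standing in for a real "out count;" request.
points = [(52.51, 13.45), (52.52, 13.46), (52.49, 13.30), (52.58, 13.55)]


def fake_count(bbox):
    south, west, north, east = bbox
    return sum(1 for lat, lon in points if south <= lat < north and west <= lon < east)


tiles = tile_bbox((52.3, 13.0, 52.7, 13.8), fake_count, max_count=2)
```

Each resulting tile can then be queried separately with out;. The max_depth guard is there so that a very dense spot cannot cause endless splitting.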

Gremlin query to find the count of a label for all the nodes

Sample query
The following query returns the count of a label, say "Asset", that a particular id (0) has beneath it:
g.V().hasId(0).repeat(out()).emit().hasLabel('Asset').count()
But I need to find the count for all the nodes that are present in the graph with a condition as above.
I am able to do it individually, but my requirement is to get the count for all the nodes that have that label, say 'Asset'.
So I am expecting something like
{ v[0]:2
{v[1]:1}
{v[2]:1}
}
where v[1] and v[2] each have a node under them with the label "Asset", making the overall count for v[0] equal to 2.
There's a few ways you could do it. It's maybe a little weird, but you could use group()
g.V().
group().
by().
by(repeat(out()).emit().hasLabel('Asset').count())
or you could do it with select() and then you don't build a big Map in memory:
g.V().as('v').
map(repeat(out()).emit().hasLabel('Asset').count()).as('count').
select('v','count')
if you want to maintain hierarchy you could use tree():
g.V(0).
repeat(out()).emit().
tree().
by(project('v','count').
by().
by(repeat(out()).emit().hasLabel('Asset').count()).select(values))
Basically you get a tree from vertex 0 and then apply a project() over that to build that structure per vertex in the tree. I had a different way to do it using union but I found a possible bug and had to come up with a different method (actually Gremlin Guru, Daniel Kuppitz, came up with the above approach). I think the use of project is more natural and readable so definitely the better way. Of course as Mr. Kuppitz pointed out, with project you create an unnecessary Map (which you just get rid of with select(values)). The use of union would be better in that sense.
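To make the shape of the group() result concrete, here is a plain-Python model of what that traversal computes. It is not Gremlin and does not use TinkerPop: the graph is a dictionary of outgoing edges, and the vertex ids and the 'Group' label are made up. It assumes the graph is acyclic, since the walk would not terminate on a cycle.

```python
def count_labeled_descendants(edges, labels, target="Asset"):
    """For every vertex, count the vertices with the target label reachable via out-edges."""

    def walk(vertex):
        total = 0
        for child in edges.get(vertex, []):
            if labels.get(child) == target:
                total += 1
            # Like repeat(out()).emit(), keep going below each child.
            total += walk(child)
        return total

    return {vertex: walk(vertex) for vertex in labels}


# Made-up graph matching the expected output in the question:
# vertex 0 points to 1 and 2, and each of those has one 'Asset' beneath it.
edges = {0: [1, 2], 1: [3], 2: [4]}
labels = {0: "Group", 1: "Group", 2: "Group", 3: "Asset", 4: "Asset"}

counts = count_labeled_descendants(edges, labels)
```

A vertex reachable by several paths is counted once per path here, which is how a traversal without dedup() would behave as well.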