Query regarding OSM file structure - openstreetmap

Had a query regarding the OSM file's tags and values. I came across files where there are inconsistencies in tag names and certain node ids/uids do not have tag names and thereby don't allow us to identify what feature it is without opening them in any GIS software. For example, some node ids and uids have tag names as "source bing". Is there a way to identify what they represent without opening them in GIS software? Also, How does OSM recognize these features without proper tag names and values?
Thank you!

In the OSM data model, nodes can represent point features, but they can also be used to define the shape of ways (or, rarely, relations). OSM ways do not have location information on their own, but instead reference a list of node IDs. You can read more about this in the OSM wiki's article on nodes. The OSM wiki is also a good source on the meaning of the tags used in the OSM database.
Nodes that do not represent a point feature will usually carry no tags. However, you will sometimes find nodes with tags for internal use by OSM contributors, such as source information.
There are other reasons why a node may have no tags or unexpected tags, such as editing mistakes by an OSM contributor or newly invented tags. But these are comparatively less common.

Related

Identify crossroad nodes in openstreetmap data (.pbf)

does anybody know if there is a way I can seperate only the crossroad nodes which are included in a .pbf file? Is this clue (if a node is crossroad or not) included in this file's format?
Another option to solve your issue would be to use the new Atlas project.
As part of loading .osm.pbf files into in-memory Atlas files, it takes care of doing way sectioning on roads:
Load your pbf file into an Atlas. You will then have an Atlas object that you could save to a file and re-use.
Use the Atlas APIs to access all the intersections
In the end, each Atlas Node which is connected to more than 4 Edges on a two-way road or 2 Edges on a one way road would be a candidate if I understand your question correctly.
I'm not aware of a ready-made solution for this task, but it should still be relatively easy to do.
For parsing the .pbf file, I recommend using an existing library like Osmosis or Osmium. That way, you only need to implement the actual semantics of your use case.
The nodes themselves don't have any special attributes that mark them as crossroads. So instead, you will have to look at the ways containing the nodes.
Some considerations when implementing this:
You need to check the way's tags to find out whether it's a road. The most relevant key for that is highway. The details depend on your specific use case – for example, you need to decide whether footways, forestry tracks, driveways, ... should count.
What matters is the number of connecting way segments at a node, not the number of ways. For example, a node that is part of two ways may be a crossroads (if at least one of the ways continues beyond that node), or may not (if both ways start/end at that node).

Meaning of the spatialite scheme generated by the spatialite_osm_map tool

I used the spatialite_osm_map tool to generate a spatialite database from an .osm.pbf file. After the process was finished, a series of tables were generated in the database as shown in the image.
I noticed that there were 3 groups of tables based on the prefixes of their names: In_, pg_ and pt_. I also noticed that the rest of the name corresponded to a key defined in OpenStreetMap.
Can someone explain to me how the information is distributed in each of these groups and tables? I've searched for a site that explains the resulting schema after the conversion, but I've only found information on how to use the tool.
I think you have already identified the key points of this scheme.
It's main purpose is to offer the data from OSM in a way who could be more direct and intuitive for a GIS user. The data is splited according to OSM tags (aerialway, aeroway, amenity, etc., you can change the list of tags to be used if you don't need all of them) and according to the type of geometry (pt_* for points, ln_* for lines, and pg_* for polygons) so these tables (which could be directly seen as "layers" by a GIS user) can quickly be styled (for example in a GIS desktop application such as QGIS) with simple rules due to this simple schema (for example one can set rules like green for pg_natural, blue for ln_waterway and pg_waterway, or just click on the "pg_building" layer to toggle its visibility). That schema doesn't preserve all the objects from the OSM database, but only those requested to build the tables for the requested tags.
Contrary to the original way of storing OSM objects, with this kind of extraction you will lose the relationships between objects (for example in OSM the same node can be used, let's say, as part of the relationship describing an administrative boundary and as part of a road; here you will get a road line in ln_road and a polygon in pg_boundary but you will loose the information that they were maybe partially sharing nodes). Notably due to this last point, the weight of the OSM extractions can be relatively high compared to the original file.
So I guess that this kind of scheme (which is one amongst other existing ways to transform OSM data) offers an interesting abstraction for those who are not accustomed to the OSM schema which use Node, Way and Relation elements (eg, in OSM, buildings can be represented as closed way or as relation, here you get "simply" polygons for these various buildings).

Routing network from OSM

I'm looking for some good tool to import map.osm to postgres and next create some routes which will be displayed by geoserver. I need route, with some text information about vertexes (e.g. city, address, address number, and so on...)
I found this:
osm2pgrouting - Import OSM data into pgRouting Database
osm2postgis -Import OSM data to PostGIS
osm2po - tool to convert OSM data into a routable format
osm4routing - OpenStreetMap data parser to turn them into a nodes-edges adapted for routing applications
I do not have many experiences with GIS, so how tool is the best for me? I try osm2pgrouting, but in result I have tables, which do not contains data about vertexes(only lat. and alt.) Thanks for answers.
UPDATE App Info:
I will be have web and android client where user enter text value of start and end node, and next over geoserver get wms with vertexes of entered route for example
My result from could be be some edges and nodes like this like this:
sequence_num, edge_distance, and informations about edge vertexes like osm_id, some text value, lat alt, etc...
I think you have a lot of work to do before you get to a complete solution, but here are some pointers. I suggest you break down your project into smaller chunks and ask specific questions on any bits you might get stuck on.
First, you need to import your data. Then you'll need some pre-processing / cleaning. Then you need your routing queries and, finally, a way to use the outputs (with this last part determining to some extent the previous steps).
Import OSM data
As I described in an answer to your previous question here, you can use OGR2OGR to import OSM data to Postgis. You can use other programs, as you mention above, but I guess you'll get much the same results. I think the difference between the OGR2OGR tables and the osm2postgis ones is that some of the columns in the latter appear in the other_tags column. However, the data is still there, you just need slightly different queries.
Preparing data
I'm assuming you'll use pgrouting for the routing, but whatever you use, you'll need a network suitable for routing (in short, the edges have a start and end node, and the end nodes must connect with other start nodes). Pgrouting has tools to create what you need and validate it. E.g. you create integer columns source and target and the function pgr_createtopology will populate the columns for you.
OGR2OGR gives you tables "lines", "points", "multipolygons", "multilinestrings". I suggest you read up on OSM to understand exactly what is in these tables, but, roughly speaking, the lines contain your roads and the multipolygons contain, amongst other things, buildings with e.g. addresses. The addresses are in a hstore column called "other_tags".
The lines do not contain addresses! (although they do contain street names). So, if you want to do address-to-address routing you need to do some preparation. You can skip this if you can live with the street names.
Create your network (e.g. if you're routing for cars, you'll want to
throw out pedestrian routes and so on)
Extract the desired addresses (including coordinates)
Either snap the addresses to the nearest node, or otherwise
relate the address to the nearest node
Pgrouting will return the edges in your route, so you need the above to relate back to your addresses.
Routing
Your app is going to send to your server (in an as-yet unspecified way) a pair of addresses or coordinates and you need postgis to return the route. With pgrouting, that's quite easy and there are plenty of examples out there, for example here. You will need to write queries that join the output to your address table to give you the desired output.
pgrouting creates a vertices table. You can get the nearest vertex with the following query:
select id from vertices_pgr
order by the_geom <-> st_setsrid(st_point(lon,lat),4326)
limit 1
Using the output
Using WMS from geoserver is unlikely to be a good choice - you won't have the information on individual edges without a lot of messing about. You might consider geoJSON, which can be read by e.g. OpenLayers, Leaflet, or you can manipulate in Javascript. Postgres has lots of useful functions for working with json and geojson.
Conclusion
That's quite a lot of work and probably new stuff if you have little GIS knowledge, and it, er, basically recreates what you'd get from Graphhopper! Are you sure that's not a better way to go?
If you do decide to go this (or similar) route break things down into manageable chunks! First, figure out exactly what you're trying to achieve, then work backwards from there. If you do decide to use OSM / pgrouting, then play with the data and pgrouting first so you understand how it works before trying address matching etc.
The tools you listed are only for producing data, but I think you actually need a routing engine.
Try Graphhopper: https://graphhopper.com
Using the WEB Api (more likely what you need), you don't need the import the data in your database. This is the easiest solution. You will have not control over the input openstreetmap data but this is fine if you don't have special requirements.
Import data and implement/integrate a routing engine directly in your application would be much more complicated.

Which features should be added for NER in search result snippets

I want to cluster queries by help of the snippets of the search engine results they are currently returning. While using the noun phrases in the snippet worked well for Google results I felt that I should try a different approach for bing snippets and hence was going for Named Entity Extraction.
I have identified the following entities that can be extracted as of now using standard tools:
Person Names
Organisation Names
Locations
But I think I should be extracting more entities. Could anyone help me out here to identify more entities that may be useful?
This is an endless list, once you get to real data problems.
For example, dates are a common thing to extract. But for example booking codes such as airline tickets, or tracking codes such as parcels are something Google Mail already recognizes and extracts.
I don't think this is a very good question for a Q/A site. Plus, you may want to read more literature, and see what kind of data you can get - it clearly is data-driven what entities you want to extract. When analyzing log files, you might be interested in extracting host names, IPs, usernames and daemon/serivce names, for example.

OpenStreetMap relation ID rarely present

first off, I'm not looking for any 3rd party parsers or existing libraries. I've also read all related OSM questions on StackOverflow and haven't seen an answer.
I'm looking to parse the OpenStreetMap data into a viable structure for rendering and routing and had a quick question about this. To this end I exported a tiny portion of the center of a major city, and have it parsed to a usable structure.
The Relation-XML looks like this:
<relation id="31249" visible="true" version="100" changeset="13180178" timestamp="2012-09-20T08:12:17Z" user="Skywave" uid="10927">
<member type="way" ref="22375740" role=""/>
<member type="way" ref="39271187" role=""/>
<member type="way" ref="39271189" role=""/>
<member type="way" ref="39271191" role=""/>
</relation>
.. etc
But these ref-IDs are often not present in the XML. Ideally they'd refer to another node, another relation.. but often they only occur in this one line. This essentially means it's worthless info, correct?
I suspect OSM includes all known information, and if that information happens to be outside the map area, its reference in relations is still included but the actual object that is referred to is not exported along, and I can trim it out ?...
Tried to find this on the OSM website.. couldn't :-)
OSM extracts from APIs etc. don't contain all elements that are linked in a relation. They contain all geo objects (nodes, ways) that touch the boundingbox of your extract. But for relations it doesn't work like this, just because otherwise in a worstcase it will cascade (relation links to other relations or huge ways that are present in multiple relations...) till a huge mass of objects (even without the bbox) need to be present in the extract.
You might use bigger extracts that you can make sure to contain all data (e.g. country planet.osm extracts) or call an read-only API like Overpass API to get the objects.
The Overpass API is the way to go. To extract all nodes related to you given relation you would need to do a query like this:
relation(31249);
>;
out;
See: http://overpass-turbo.eu/s/41k
To only select way members (with their node ids) of the given relation run:
relation(31249);
way(r);
out;
The Overpass API is quite confusing at first but really powerful.