How to do Geo IP or postcode lookup against Geonames data - autocomplete

I am using the freely available GeoNames data locally to do autocomplete searches during the sign-up stage on one of my websites.
I am having trouble working out the best way to make the form more user-friendly, both by auto-selecting a geoname based on the visitor's IP address and by looking up a geoname from postcode data.
The problem is that I can't see an easy way to link an IP range or a postal code to a geoname. So what is best practice here? Do I just run a separate query to look up the nearest geoname by latitude/longitude for the postcode or IP address?

You don't mention how you are geolocating the IP address, but the MaxMind GeoIP2 and GeoLite2 databases provide the geoname_id of the location. See, e.g., the CSV docs. The binary databases provide this same information.
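For postcodes (or any source that only gives you coordinates), the approach from the question, picking the nearest geoname by latitude/longitude, can be sketched in Python. This is an illustration under assumptions: `GeoPlace` and the sample rows are hypothetical stand-ins for records loaded from the GeoNames dump, and the linear scan would need a spatial index for a full-size table.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class GeoPlace:
    geoname_id: int  # hypothetical record shape for a GeoNames row
    name: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_geoname(places, lat, lon):
    """Return the place closest to (lat, lon) using a simple linear scan."""
    return min(places, key=lambda p: haversine_km(lat, lon, p.lat, p.lon))


# Hypothetical IDs; real rows would come from the GeoNames / postal code dumps.
places = [GeoPlace(1, "Sydney", -33.87, 151.21), GeoPlace(2, "Melbourne", -37.81, 144.96)]
```

A call such as `nearest_geoname(places, -33.9, 151.2)` then picks the Sydney row for coordinates taken from a postcode or IP record.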

Related

Efficient way to find if an IP is in a list of subnets that are stored in DynamoDB

I'm trying to create an API that I can send an IP address to and the response will contain the subnet that the IP belongs to (if it belongs to any in the table).
I have a list of subnets all stored in a table in DynamoDB like this:
subnet
45.221.27.0/24
102.215.216.0/23
192.168.0.0/16
etc...
I can't figure out how to efficiently query the table to determine which subnet an IP belongs to. I am using a Lambda to make the request, so I am trying to avoid reading in all the subnets because that would use a lot of memory. I'm also trying to use a Query rather than a Scan, because scanning the table can become too expensive.
I've been thinking about different ways of storing the subnets in the table so that more granular queries become possible, but I also feel like I'm overcomplicating something that shouldn't be so complex.
How funny, I'm actually writing a blog post on this. I'll add the link once it's published. There are a lot of interesting scaling topics related to this problem, around how to load and query with maximum efficiency. Here's the simplest approach:
Use a single Partition Key value (that is, the same for all items). Use the range's start IP address as the Sort Key, but make it the 32-bit numeric value of the IP address rather than the string value, because we need to sort by it and string order doesn't match numeric order (as strings, "9.0.0.0" sorts after "45.221.27.0"). All IPv4 addresses are really just 32-bit numbers underneath. The other attributes hold the metadata you want to retrieve.
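Turning each subnet into an item with a numeric sort key can be sketched with Python's standard `ipaddress` module. This is a minimal IPv4-only sketch; the attribute names (`pk`, `sk`, `end`, `subnet`) and the constant partition value are hypothetical choices, not anything DynamoDB requires.

```python
import ipaddress

PARTITION_VALUE = "ranges"  # hypothetical single PK value shared by every item


def subnet_to_item(cidr: str) -> dict:
    """Build a DynamoDB-style item keyed by the numeric start address of a subnet."""
    net = ipaddress.IPv4Network(cidr)
    return {
        "pk": PARTITION_VALUE,
        "sk": int(net.network_address),     # 32-bit start of the range (sort key)
        "end": int(net.broadcast_address),  # 32-bit last address of the range (inclusive)
        "subnet": str(net),
    }
```

For example, `subnet_to_item("102.215.216.0/23")` yields an item whose `end` is `sk + 511`, since a /23 spans 512 addresses.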
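The lookup is then a Query where the PK is that single value and the SK is <= the lookup IP address (in numeric form), read in descending order (ScanIndexForward set to false) with Limit 1, so the one item returned is the range with the highest start address at or below the lookup address.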
The one caveat is that any gaps in the IP address range data set must be filled during the load with marker items saying "gap here"; otherwise, a lookup that falls into a gap will return the range immediately before the gap.
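The load-time gap filling and the lookup can be simulated locally. This sketch assumes the subnets don't overlap and keeps the items in a sorted Python list; `bisect` stands in for the DynamoDB Query, whose boto3 form would be roughly `table.query(KeyConditionExpression=Key("pk").eq(...) & Key("sk").lte(ip_int), ScanIndexForward=False, Limit=1)` with `Key` from `boto3.dynamodb.conditions`. Attribute names are hypothetical.

```python
import bisect
import ipaddress

MAX_IPV4 = 0xFFFFFFFF


def build_items(cidrs):
    """Sort non-overlapping IPv4 subnets and insert a marker item for every gap."""
    nets = sorted((ipaddress.IPv4Network(c) for c in cidrs), key=lambda n: int(n.network_address))
    items = []
    next_free = 0  # first address not yet covered by any item
    for net in nets:
        start, end = int(net.network_address), int(net.broadcast_address)
        if start > next_free:
            items.append({"sk": next_free, "gap": True})  # "gap here" marker
        items.append({"sk": start, "gap": False, "subnet": str(net)})
        next_free = end + 1
    if next_free <= MAX_IPV4:
        items.append({"sk": next_free, "gap": True})  # trailing gap up to 255.255.255.255
    return items


def lookup(items, ip: str):
    """Emulate Query(sk <= ip, descending, Limit=1); return the subnet or None."""
    ip_int = int(ipaddress.IPv4Address(ip))
    keys = [item["sk"] for item in items]
    idx = bisect.bisect_right(keys, ip_int) - 1  # last item whose sk <= ip_int
    if idx < 0:
        return None
    item = items[idx]
    return None if item["gap"] else item["subnet"]


items = build_items(["45.221.27.0/24", "102.215.216.0/23", "192.168.0.0/16"])
```

Without the marker items, a lookup for an address such as 102.215.218.0 would land on the 102.215.216.0/23 item; with them it hits the gap marker and correctly returns nothing.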

Entity Framework - How to manage suburb and state data across multiple databases

I have a SaaS application in the pipeline.
One of the things that has me a bit confused is the best way to manage the set of Australian suburb and state data across multiple databases (this applies to any country, as each country has a list like this).
For example, in Australia there is the Australian postcode list, which links all the postcodes to suburbs, and you can use it to create dropdowns for state, suburb, postcode, etc.
An example CSV of Australian postcodes can be found here.
So you can, for example, upload a CSV file, but the problem remains:
What's the best way to hold this data? It's common to all databases where you have a person, client, employee, etc.
Do you replicate it in each database? Is there a better way than having redundant stores of data?
What's the best way to implement it?
There are several options and considerations I would look at for this problem. Some considerations:
1. Number of address rows expected
2. Whether a client database is concerned with prefilled/validated international addresses
3. Whether the client system is web-connected or can operate in isolation
4. Whether these databases/systems are hosted by you or distributed to individual clients (SaaS implies "web" and "hosted by you" for points 3 and 4)
5. How critical address integrity is
For smaller systems, a simple option is to denormalize the address data (state, postcode, suburb) and consider using a central lookup database/service, either under your own control or a third party's. The denormalized address table would contain the text fields for the state, postcode, suburb, etc. rather than FK values (StateId, SuburbId, etc.). This avoids needing to store lookup tables in every client DB: there is just one lookup DB, or you leave that to a third-party service.
The advantage of a third-party lookup is that keeping it up to date with new areas and changes is handled for you. Third-party services require a web connection, and you have to factor in the risk of their service being down or a web connection being unavailable. Larger systems with millions of addresses might benefit from normalizing the address table, in which case the "cost" of replicating suitable address lookup tables might be worthwhile. You can still use a central service to look up addresses, then check whether the client DB already has a StateId, SuburbId, etc. for the respective state/suburb of that postcode before inserting one if necessary. (This limits the rows each client DB needs to the address values that are actually used.)
In that last example, you might have lookup tables for State and Suburb linked to PostCode, linked to Country. Country would default to the target country and could be an optional selection for international addresses. The user provides a postcode to the service, which returns suburbs, and they select a suburb. The address validation service could go as far as validating the street address. When you're happy an address is "valid" and ready to be saved, you search your local State and Suburb (even Street) tables for matches for that postcode; if found, use those FKs, otherwise insert new entries and link the FKs.
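The "search locally, otherwise insert" step can be sketched with Python's built-in `sqlite3` (in Entity Framework the same shape would be a query followed by an `Add`). Table and column names here are hypothetical, and the sketch assumes the state/suburb/postcode values have already been validated by the lookup service.

```python
import sqlite3

# Hypothetical schema: normalized State and Suburb tables in one client DB.
SCHEMA = """
CREATE TABLE State  (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE Suburb (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL, PostCode TEXT NOT NULL,
                     StateId INTEGER NOT NULL REFERENCES State(Id),
                     UNIQUE (Name, PostCode, StateId));
"""


def get_or_insert_state(conn, name):
    """Return the existing State.Id for this name, inserting a row only if missing."""
    row = conn.execute("SELECT Id FROM State WHERE Name = ?", (name,)).fetchone()
    if row:
        return row[0]
    return conn.execute("INSERT INTO State (Name) VALUES (?)", (name,)).lastrowid


def get_or_insert_suburb(conn, name, postcode, state_name):
    """Reuse existing FKs for a validated address, inserting only what this DB lacks."""
    state_id = get_or_insert_state(conn, state_name)
    row = conn.execute(
        "SELECT Id FROM Suburb WHERE Name = ? AND PostCode = ? AND StateId = ?",
        (name, postcode, state_id),
    ).fetchone()
    if row:
        return row[0]
    return conn.execute(
        "INSERT INTO Suburb (Name, PostCode, StateId) VALUES (?, ?, ?)",
        (name, postcode, state_id),
    ).lastrowid
```

The UNIQUE constraints are there so that two concurrent inserts of the same suburb fail loudly instead of silently creating duplicates; each client DB only ever accumulates the suburbs it actually uses.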
I would consider using a separate service (or services), especially if you need to support validating/storing international addresses, for instance if the client is in Australia but regularly has address information for New Zealand. Storing entire address validation tables could get rather large if clients could be resolving addresses for many countries (e.g. European countries and their neighbours). You can write a Façade service to support different third-party address validation providers and/or homemade implementations behind a standard interface.
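A Façade over several providers might look like the following Python sketch. Everything here is hypothetical: `AddressProvider` is an assumed standard interface, `InMemoryProvider` stands in for either a homemade lookup table or an adapter around a third-party API, and routing is simply "first provider that supports the country".

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    country: str
    postcode: str
    suburb: str
    state: str


class AddressProvider(ABC):
    """Standard interface that each provider adapter implements (hypothetical)."""

    @abstractmethod
    def supports(self, country: str) -> bool: ...

    @abstractmethod
    def suburbs_for_postcode(self, country: str, postcode: str) -> list[Address]: ...


class InMemoryProvider(AddressProvider):
    """Stand-in for a local lookup table or a wrapped third-party service."""

    def __init__(self, country: str, rows: list[Address]):
        self.country = country
        self.rows = rows

    def supports(self, country: str) -> bool:
        return country == self.country

    def suburbs_for_postcode(self, country: str, postcode: str) -> list[Address]:
        return [a for a in self.rows if a.postcode == postcode]


class AddressFacade:
    """Routes each request to the first provider that handles the country."""

    def __init__(self, providers):
        self.providers = list(providers)

    def suburbs_for_postcode(self, country: str, postcode: str) -> list[Address]:
        for provider in self.providers:
            if provider.supports(country):
                return provider.suburbs_for_postcode(country, postcode)
        raise LookupError(f"no address provider for {country}")
```

Client code only talks to `AddressFacade`, so swapping a homemade Australian table for a paid service, or adding New Zealand, doesn't touch the callers.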
If a system has to operate without an internet connection, then you'll probably be stuck with each database having one or more local data sources to resolve address information.
Data integrity of address information is a separate concern you might want to consider. In some systems you need to validate that an address is recognized, and you don't want to allow invalid combinations, or you want to detect unexpected changes. Services that validate a particular address can provide unique IDs for an address that you can store as part of your address information. (These often tie into geocoordinate solutions where you want to quickly direct a map service to a particular location.) Alternatively, if you successfully look up an address and confirm that the address information is valid, even if just the country, postcode, and suburb, you can create and store a hash of those values to check for tampering. (I.e. if someone or some system changed a field and made the address invalid, the combined address won't match the stored hash.) Addresses can be checked before use and flagged if not valid.
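The stored-hash idea can be sketched in Python with the standard `hmac` and `hashlib` modules. One deliberate choice here: a keyed HMAC is used instead of a plain hash, because anyone who can edit the address fields could also recompute a plain hash; the key name and its value are hypothetical and should live outside the address database.

```python
import hashlib
import hmac
import unicodedata

SECRET_KEY = b"replace-with-a-secret-from-config"  # hypothetical; keep out of the address DB


def _normalise(value: str) -> str:
    """Unicode-normalise, trim and case-fold so cosmetic differences don't change the digest."""
    return unicodedata.normalize("NFC", value).strip().casefold()


def address_fingerprint(country: str, postcode: str, suburb: str) -> str:
    """Keyed SHA-256 digest over the validated fields, stored alongside the address."""
    # The unit separator keeps ("ab", "c") and ("a", "bc") from producing the same payload.
    payload = "\x1f".join(_normalise(v) for v in (country, postcode, suburb))
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def is_unchanged(stored_digest: str, country: str, postcode: str, suburb: str) -> bool:
    """Recompute the digest and compare in constant time before using the address."""
    return hmac.compare_digest(stored_digest, address_fingerprint(country, postcode, suburb))
```

Store the digest when the address is validated, and call `is_unchanged` before use; a `False` result is the signal to flag the address.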

IBM Watson Assistant: How to obtain a full address

I am working on making a reservation tab as follows:
Date of the pick up: #sys-date
Time of the pick up: #sys-time
Address of the pick up: #sys-location
The problem is that Watson Assistant doesn't recognize the location during live testing when a customer enters the details of their location. It keeps asking "What's the address?"
All I get is the date and time of the pick-up, with no pick-up address.
The system entity #sys-location in Watson Assistant should be used with caution. It is a beta feature for some languages and has only this capability:
The #sys-location system entity extracts place names (country, state/province, city, town, etc.) from the user's input. The value of the entity is not a system-standard value of the location.
My suggestion is to ask for the address and capture it as an entire string. Then, try to standardize the input to your local address format, e.g., by using an address verification system.
Another option is to break up the address into parts like city, street, zip code and more. Depending on the country, there are different formats, and sometimes more than one format for the same country. "What is your city?", "What is your street address?", ...
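Asking for each part in turn is essentially slot filling. The bookkeeping can be sketched in framework-neutral Python; this does not use any Watson Assistant API, and the slot names, prompts, and output order are hypothetical and would need adjusting per country format.

```python
from typing import Optional

# Hypothetical slots and prompts, asked in this order.
ADDRESS_SLOTS = [
    ("city", "What is your city?"),
    ("street", "What is your street address?"),
    ("zip_code", "What is your zip code?"),
]

# Hypothetical order in which the parts are joined for verification.
OUTPUT_ORDER = ("street", "city", "zip_code")


def next_prompt(collected: dict) -> Optional[str]:
    """Return the question for the first missing or blank part, or None when complete."""
    for slot, prompt in ADDRESS_SLOTS:
        if not collected.get(slot, "").strip():
            return prompt
    return None


def format_address(collected: dict) -> str:
    """Join the collected parts into one string to pass to an address verification service."""
    return ", ".join(collected[slot].strip() for slot in OUTPUT_ORDER)
```

Once `next_prompt` returns `None`, the joined string can be standardized exactly as in the first option.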

Where does Nominatim get its address info?

Until now, I was using the Nominatim API to fetch landmark information, but recently I downloaded the OpenStreetMap database and tried to build my own dataset, so I would not rely so heavily on Nominatim's services. I managed to extract the needed information from the OSM database (nodes tagged with amenity, for example), but I realized that while I was querying for amenities through Nominatim, it returned a lot of address info that is nowhere to be found in the OSM database.
Example:
Reverse geocoding of a hotel from Spain using Nominatim:
http://nominatim.openstreetmap.org/reverse?format=xml&osm_type=N&osm_id=1207098527
The data that is attached to the same node used to reverse geocode in OSM:
http://open.mapquestapi.com/xapi/api/0.6/node/1207098527
While Nominatim gives me Suburb, Pedestrian, City, County, State, etc. information, this node in OSM contains only a name tag, and a tourism tag.
Does anyone know where Nominatim gets the additional data it uses to display this information?
Nominatim does not just look at individual objects but gathers information from multiple objects instead. Look at the information Nominatim knows about "HOTEL LA MORADA MAS HERMOSA": There are:
the node, lacking all address information (feel free to improve this! At least the house number and street should be added)
a nearby street
the suburb the hotel is located in
the city
... and so on.
Remember, OSM is a spatial database. Instead of attaching all information to each individual object one can do spatial queries in order to gather various kinds of additional information.
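The idea of gathering address parts spatially can be illustrated with a toy point-in-polygon test in Python. This is not Nominatim's actual algorithm (Nominatim does this work inside a PostgreSQL/PostGIS database and also considers things like the nearest street); the `Area` records and square polygons are hypothetical, and coordinates are treated as planar.

```python
from dataclasses import dataclass

Point = tuple[float, float]  # (lon, lat), treated as planar for this toy example


@dataclass(frozen=True)
class Area:
    level: str   # e.g. "suburb", "city", "state"
    name: str
    ring: tuple  # polygon vertices as (lon, lat) pairs; closing edge is implied


def point_in_ring(pt: Point, ring) -> bool:
    """Even-odd ray casting: count edge crossings of a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y, so y2 != y1
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def address_for(pt: Point, areas) -> dict:
    """Collect the name of every area that spatially contains the point."""
    return {a.level: a.name for a in areas if point_in_ring(pt, a.ring)}
```

A node with nothing but a name and a tourism tag still ends up with suburb and city values, because those come from the polygons it falls inside rather than from its own tags.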

How to get list of countries IP Address ranges from WHOIS server?

I want to get all countries' IP address ranges from IANA's WHOIS server, not from the MaxMind or IP2Location sites. IANA is the authoritative source, so I would like to get all IP address ranges for countries from it. Is it possible to query the WHOIS server in such a way?
It's not possible to directly get the IP addresses allotted to a country like that.
IP address blocks are allocated to Regional Internet Registries (RIRs).
There are five of them: ARIN, APNIC, AFRINIC, LACNIC and RIPE NCC.
These RIRs in turn allot IP ranges to the ISPs of a country.
By doing a WHOIS query for an IP, you can find out which RIR the IP is allocated to. The WHOIS response will also contain the country and ISP of the IP address.
Basically, you need to WHOIS-query all IP ranges, aggregate the data and build a database. Such a database can then be used to provide all IP addresses belonging to a certain country.
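The aggregation step (not the WHOIS querying itself, which needs a WHOIS client and care with query limits) can be sketched with Python's `ipaddress` module. The input records are hypothetical `(first_ip, last_ip, country_code)` tuples that you would parse from the range and country fields of each WHOIS reply (for example `inetnum` at RIPE or `NetRange` at ARIN).

```python
import ipaddress
from collections import defaultdict


def aggregate_by_country(records):
    """Turn (first_ip, last_ip, country_code) records into minimal per-country CIDR lists."""
    networks = defaultdict(list)
    for first, last, cc in records:
        start = ipaddress.IPv4Address(first)
        end = ipaddress.IPv4Address(last)
        # A WHOIS range need not be a single CIDR block, so split it into exact CIDRs.
        networks[cc.upper()].extend(ipaddress.summarize_address_range(start, end))
    # Merge adjacent or overlapping blocks per country into the fewest, sorted CIDRs.
    return {cc: list(ipaddress.collapse_addresses(nets)) for cc, nets in networks.items()}


# Hypothetical parsed WHOIS results.
records = [
    ("10.0.0.0", "10.0.0.255", "de"),
    ("10.0.1.0", "10.0.1.255", "DE"),
    ("192.0.2.0", "192.0.2.10", "FR"),
]
by_country = aggregate_by_country(records)
```

Here the two adjacent German /24 ranges collapse into a single /23, while the non-aligned French range is split into three exact CIDR blocks.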
IANA does not have this information so, no, there is no way to get it from them.
IANA only allocates big IP prefixes to RIRs (Regional Internet Registries). For instance, 31.0.0.0/8 has just been allocated to the RIPE-NCC (by the way, that's one fewer IPv4 prefix left; time to enable IPv6 if it is not already done), which covers all of Europe and a good part of the Middle East. So these addresses may go to Ireland, Jordan or Greece, and you cannot tell from IANA allocations. Even the RIR WHOIS (whois.ripe.net for the RIPE-NCC) won't tell you with enough detail, because a prefix may be assigned to a multinational IAP (Internet Access Provider).