Entity Framework - How to manage suburb and state data across multiple databases

I have a SaaS application in the pipeline.
One of the things that has me a bit confused is the best way to manage the set of Australian suburb and state data across multiple databases (this applies to any country, as each country has a list like this).
For example, in Australia you have the Australian postcode list that links all the postcodes to the suburbs, and you can use that to create dropdowns for state, suburb, postcode, etc.
An example CSV of Australian postcodes can be found HERE.
So you can upload a CSV file, for example, but the problem remains:
What's the best way to hold this data? It's common to all the databases where you have a person, client, employee, etc.
Do you replicate it in each database? Is there a better way than having redundant stores of data?
What's the best way to implement it?

There are several options and considerations I would look at for this problem. Some considerations:
1. Number of address rows expected
2. Whether a client database is concerned with prefilled/validated international addresses
3. Whether the client system is web-connected or can operate in isolation
4. Whether these databases/systems are hosted by you or distributed to individual clients (SaaS implies "web" and "hosted by you" for points 3 and 4)
5. How critical address integrity is
For smaller systems, a simple option for address systems is to de-normalize the address data (state, postcode, suburb) and consider using a central lookup database/service, either under your own control or from a third party. The denormalized address table would contain text fields for the state, postcode, suburb, etc. rather than FK values (StateId, SuburbId, etc.). This avoids needing to store lookup tables in every client DB: you keep just one lookup DB, or leave the lookup to a third-party service.
The advantage of a third-party lookup is that keeping it up to date with new areas and changes is handled for you. Third-party services require a web connection, though, and you have to factor in the risk of their service being down or a web connection being unavailable. Larger systems with millions of addresses might benefit from normalizing the address table, so the "cost" of replicating suitable address lookup tables might be worthwhile. You can still use a central service to look up addresses, then resolve whether the client DB already has a StateId, SuburbId, etc. for the respective state/suburb for that postcode, inserting one if necessary. (This cuts the rows in each client DB down to the address values that are actually used.)
In that last example you might have lookup tables for State and Suburb linked to PostCodes, linked to Country. Country would default to the target market and could be an optional selection for international addresses. The user provides a postcode to the service, which returns suburbs, and they select a suburb. The address validation service could go as far as validating the street address. When you're happy an address is "valid" and ready to be saved, you search your local State, Suburb (even Street) tables for matches for that postcode; if found, use those FKs, otherwise insert new entries and link the FKs, as in the sketch below.
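Here is a minimal sketch of that resolve-or-insert step, assuming a hypothetical EF Core model; the entity, context, and method names are illustrative, and the validation service is assumed to have already approved the state/suburb/postcode combination:

```csharp
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical lookup entity in the client database.
public class Suburb
{
    public int SuburbId { get; set; }
    public string Name { get; set; } = "";
    public string PostCode { get; set; } = "";
    public int StateId { get; set; }
}

// Hypothetical client DbContext.
public class ClientDbContext : DbContext
{
    public DbSet<Suburb> Suburbs => Set<Suburb>();
}

public class AddressResolver
{
    private readonly ClientDbContext _db;
    public AddressResolver(ClientDbContext db) => _db = db;

    // Returns the local SuburbId for an already-validated suburb/postcode pair,
    // inserting a row only if this client DB has never used that suburb before.
    public async Task<int> ResolveSuburbIdAsync(string name, string postCode, int stateId)
    {
        var existing = await _db.Suburbs.FirstOrDefaultAsync(
            s => s.Name == name && s.PostCode == postCode && s.StateId == stateId);
        if (existing != null)
            return existing.SuburbId;

        var suburb = new Suburb { Name = name, PostCode = postCode, StateId = stateId };
        _db.Suburbs.Add(suburb);
        await _db.SaveChangesAsync();
        return suburb.SuburbId;
    }
}
```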
Using a separate service or services would be my choice, especially if you need to support validating/storing international addresses, for instance if the client is in Australia but regularly records address information for New Zealand. Storing entire address validation tables could get rather large if clients could be resolving addresses for many countries (i.e. European countries and their neighbours). You can write a façade service to support different third-party address validation providers and/or homemade implementations behind a standard interface.
If a system has to operate in isolation, without an internet connection, then you'll probably be stuck with each database having one or more local data sources to resolve address information.
Data integrity of address information is a separate concern you might want to consider. In some systems you need to validate that an address is recognized, and you don't want to allow invalid combinations or miss unexpected changes. Services that validate a particular address can provide unique IDs for an address that you can store as part of your address information. (These often tie into geocoordinate solutions where you want to quickly direct a map service to a particular location.) Alternatively, once you have looked up an address and validated that it is correct, even if just the country, postcode, and suburb, you can create and store a hash of those values to check for tampering: if someone or some system later changes a field and invalidates the address, the combined address will no longer match the stored hash. Addresses can be checked before use and flagged if not valid.
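One way to implement that tamper check is a minimal hash over the validated fields; the field choice, separator, and normalization below are illustrative assumptions:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class AddressHash
{
    // Hash the validated fields; store the result alongside the address.
    public static string Compute(string country, string postCode, string suburb)
    {
        // Normalize so cosmetic differences don't change the hash.
        var canonical = string.Join("|",
            country.Trim().ToUpperInvariant(),
            postCode.Trim().ToUpperInvariant(),
            suburb.Trim().ToUpperInvariant());
        var bytes = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return Convert.ToHexString(bytes);
    }

    // Before using a stored address, recompute and compare with the stored hash.
    public static bool IsUnchanged(string storedHash, string country, string postCode, string suburb)
        => storedHash == Compute(country, postCode, suburb);
}
```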

Related

Efficient way to find if an IP is in a list of subnets that are stored in DynamoDB

I'm trying to create an API that I can send an IP address to and the response will contain the subnet that the IP belongs to (if it belongs to any in the table).
I have a list of subnets all stored in a table in DynamoDB like so:
subnet
45.221.27.0/24
102.215.216.0/23
192.168.0.0/16
etc...
I can't seem to figure out how I could efficiently query the table to determine which subnet an IP belongs to. I am using a Lambda to make the request, so I am trying to avoid reading all the subnets into memory. I'm also trying to avoid a table scan (as opposed to a query) because that can become too expensive.
I've been thinking about different ways of storing the subnets in the table such that it becomes possible to get more granular with queries but I also feel like I'm overcomplicating something that shouldn't be so complex.
How funny, I'm actually writing a blog post on this. I'll add the link once it's published. There are a lot of interesting scaling topics related to this problem, around how to load and query with maximum efficiency. Here's the simplest approach:
Use a single Partition Key value (that is, the same value for all items). Use the range's start IP address as the Sort Key, but make it the 32-bit numeric value of the IP address, not the string value, because we need to sort by it and sorting by the string value is problematic. (All IPv4 addresses are really just 32-bit numbers underneath.) The other attributes will be the metadata you want to retrieve.
The lookup then is to issue a Query where the PK is the single value and the SK is <= the lookup IP address (in numeric form), sorted descending with a limit of one item, so you get the nearest range start at or below the address.
The one caveat: any gaps in the IP address range data set need to be filled during the load with marker items saying "gap here"; otherwise a lookup that hits a gap will return the range ahead of the gap.
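A minimal sketch of that lookup with the AWS SDK for .NET; the table name, key attribute names, and the constant partition key value are assumptions:

```csharp
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

public static class SubnetLookup
{
    // Convert a dotted-quad IPv4 address to its 32-bit numeric value.
    public static uint ToUInt32(string ip)
    {
        var b = IPAddress.Parse(ip).GetAddressBytes();
        return ((uint)b[0] << 24) | ((uint)b[1] << 16) | ((uint)b[2] << 8) | b[3];
    }

    // Query for the item with the greatest range start <= the lookup address:
    // sort descending and take a single item.
    public static async Task<Dictionary<string, AttributeValue>> FindRangeAsync(
        IAmazonDynamoDB client, string ip)
    {
        var response = await client.QueryAsync(new QueryRequest
        {
            TableName = "subnets",                                   // assumed table name
            KeyConditionExpression = "pk = :pk AND startIp <= :ip",
            ExpressionAttributeValues = new Dictionary<string, AttributeValue>
            {
                [":pk"] = new AttributeValue { S = "SUBNET" },       // the single PK value
                [":ip"] = new AttributeValue { N = ToUInt32(ip).ToString() }
            },
            ScanIndexForward = false,                                // descending: first item is the match
            Limit = 1
        });
        // The caller still needs to check for a "gap here" marker item.
        return response.Items.Count > 0 ? response.Items[0] : null;
    }
}
```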

How to do Geo IP or postcode lookup against Geonames data

I am using the freely available geonames data locally to do autocomplete searches during the sign-up stage on one of my websites.
I am having trouble working out the best way to make the form more user friendly by auto-selecting a geoname based on the user's IP address, and also being able to look up a geoname based on the postcode data.
The problem is that I can't see a way to easily link an IP range or a postal code to a geoname. So what is the best practice here? Do I just run a separate query to look up the nearest geoname by long/lat against the postcode or IP address?
You don't mention how you are geolocating the IP address, but the MaxMind GeoIP2 and GeoLite2 databases provide the geoname_id of the location. See, e.g., the CSV docs. The binary databases provide this same information.
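For reference, a minimal sketch with the MaxMind.GeoIP2 .NET package against a local GeoLite2 City database (the file path and sample IP are assumptions):

```csharp
using MaxMind.GeoIP2;

// Look up an IP and pull the geoname_id values to join against local geonames data.
using var reader = new DatabaseReader("GeoLite2-City.mmdb");
var response = reader.City("128.101.101.101");

var cityGeonameId = response.City.GeoNameId;       // join key into the geonames table
var countryGeonameId = response.Country.GeoNameId;
```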

Does using the database id as the RESTful service id expose a threat?

I have a RESTful service for documents, where the documents are stored in MongoDB. The RESTful API for a document is /document/:id, and initially the :id in the API is the MongoDB ObjectID. I wonder: does this approach reveal the database id and expose a potential threat? Should I replace it with a pseudonymous id?
If it does need replacing, is there an algorithmic method to transform between the ObjectID and the pseudonymous id back and forth without much computation?
First, there is no "database id" contained in the ObjectID.
I'm assuming your concern comes from the fact that the spec lists a 3 byte machine identifier as part of the ObjectID. A couple of things to note on that:
Most of the time, the ObjectID is actually generated on the client side, not the server (though it can be). Hence this is usually the machine identifier for the application server, not your database.
The 3 byte Machine ID is the first three bytes of the (md5) hash of the machine host name, or of the mac/network address, or the virtual machine id (depending on the particular implementation), so it can't be reversed back into anything particularly meaningful
With the above in mind, you can see that worrying about exposing information is not really a concern.
However, with even a small sample, it is relatively easy to guess valid ObjectIDs, so if you want to avoid that type of traffic hitting your application, then you may want to use something else (a hash of the ObjectID might be a good idea for example), but that will be dependent on your requirements.
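A minimal sketch of that idea, using a keyed hash (HMAC) so the public id is stable but not guessable; the key handling is illustrative, and since an HMAC is one-way you would store or index the public id rather than reversing it:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class PublicId
{
    // Placeholder secret; in practice load this from configuration/key storage.
    private static readonly byte[] Key =
        Convert.FromHexString("000102030405060708090a0b0c0d0e0f");

    // Derive a stable, non-guessable public id from the ObjectID's hex string.
    public static string FromObjectId(string objectIdHex)
    {
        using var hmac = new HMACSHA256(Key);
        var mac = hmac.ComputeHash(Encoding.ASCII.GetBytes(objectIdHex));
        return Convert.ToHexString(mac[..16]); // truncated for a shorter URL id
    }
}
```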

Can Sync Services add a column on the central table?

Is it possible to have Sync Services for ADO.NET read data from a table on multiple devices and insert it into a central SQL Server, having an additional column in the central table with the origin of the row data?
Let's say I have equipped door-to-door sales people with a device where they register sales. The local table would contain rows with sales information, and the central database would contain the same data + a column with the ID of the sales person.
Is that possible, or would I need the sales person's ID in the local database too?
Sync Framework identifies each client with a GUID (see: How To: Use Session Variables), and you can use that to map a particular client to a particular salesperson (see: Identifying Which Client Made a Data Change in either How to: Use Custom Change Tracking System or How to: Use SQL Server Change Tracking).
Or try the approach here for intercepting the change dataset and inserting/substituting the salesperson value: Part 1 – Upload Synchronization where the Client and Server Primary Keys are different
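Very roughly, that interception approach looks like the sketch below (written against Sync Services for ADO.NET; the mapping delegate and the SalesPersonId column are assumptions, so check the linked articles for the exact event shapes):

```csharp
using System;
using System.Data;
using Microsoft.Synchronization.Data;
using Microsoft.Synchronization.Data.Server;

public static class SalesSync
{
    public static void Attach(DbServerSyncProvider serverProvider,
                              Func<Guid, int> mapClientToSalesPerson)
    {
        serverProvider.ApplyingChanges += (sender, e) =>
        {
            // e.Session.ClientId is the GUID Sync Framework assigns to each client.
            int salesPersonId = mapClientToSalesPerson(e.Session.ClientId);

            // Stamp the salesperson onto every uploaded row before it is applied.
            foreach (DataTable table in e.Changes.Tables)
            {
                if (!table.Columns.Contains("SalesPersonId"))
                    table.Columns.Add("SalesPersonId", typeof(int));
                foreach (DataRow row in table.Rows)
                    row["SalesPersonId"] = salesPersonId;
            }
        };
    }
}
```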

CQRS Command and domain state

I am new to CQRS and confused about how a command will write an address change to a customer object.
Let's say I have divided customer information into two tables:
Customer (domain database): Active, Preferred
Customer_Read (read database): Name, Address, Phone, Email
The user modifies the customer's address. The address fields are all in the read database, and there may be three or more query-friendly tables keeping address information.
If I understand the CQRS sample implementations, the Customer domain object (aggregate root) should publish an event about the address change, which is then handled by multiple handlers to update each of those tables.
How do I implement this when I won't be changing the state of the customer object?
Does the domain have to know that its address lives in another database?
Update: After going through more posts on the net, I am assuming that if the command does not change the state, no event will be generated to save the domain itself, but events will still be applied to change the address in the query/view-model-friendly tables.
You still need to persist some domain data somewhere on the write side. That way the address is stored in the write-side persistence store, and an event is published after changing it.
This way:
if there is no change, we can skip publishing the event
the domain does not need to know anything about objects that may (or may not) be subscribed to its events.
This logic applies both to persistence in relational DBs (MS SQL with NHibernate, for example) and to an event sourcing approach.
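A minimal sketch of that flow; all type names here are illustrative, not from any particular framework:

```csharp
using System;

public record ChangeCustomerAddress(Guid CustomerId, string NewAddress);
public record CustomerAddressChanged(Guid CustomerId, string NewAddress);

public class Customer { public Guid Id { get; set; } public string Address { get; set; } = ""; }

public interface ICustomerRepository { Customer Get(Guid id); void Save(Customer customer); }
public interface IEventBus { void Publish(object @event); }

public class ChangeCustomerAddressHandler
{
    private readonly ICustomerRepository _repository; // write-side persistence
    private readonly IEventBus _bus;                  // read-model handlers subscribe here

    public ChangeCustomerAddressHandler(ICustomerRepository repository, IEventBus bus)
    {
        _repository = repository;
        _bus = bus;
    }

    public void Handle(ChangeCustomerAddress command)
    {
        var customer = _repository.Get(command.CustomerId);

        // No change: skip persisting and publishing entirely.
        if (customer.Address == command.NewAddress)
            return;

        customer.Address = command.NewAddress; // the write side stores the address too
        _repository.Save(customer);

        // The domain just publishes; it doesn't know which read tables subscribe.
        _bus.Publish(new CustomerAddressChanged(command.CustomerId, command.NewAddress));
    }
}
```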