How can I improve response time when the remote server is physically very far away?

I want to know how to physically lay out servers in this situation.
Let's assume that my service is provided in the USA.
My business is quite successful, so I want to expand it into Asia.
However, I don't want a localized service, so I just set up some API servers in Asia that call the API located at headquarters; my main components are still in the USA.
The problem is that my API in Asia needs to call the headquarters API in the USA, and the response is often slow because of the physical distance.
How can I overcome this?
I think I can use a CDN for static content, but I have no idea how to improve the API response time problem that originates from physical distance.
If this is a stupid question, please bear with me; I'm quite new to architecture.
Also, how should I set up database replication in this situation?
If I put a replica in Asia that replicates from the USA, I think replication performance will be quite poor because of the physical distance.
How do Amazon and other global services set this up?

Replication performance can be quite poor. It is important to understand how much of your data is changing so that you can estimate the bandwidth required and understand whether your replication can keep up.
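As a rough back-of-the-envelope sketch of that estimate: the change rate, link speed, and headroom factor below are made-up assumptions for illustration, not measurements.

```python
def required_mbps(changed_bytes_per_hour: float) -> float:
    """Average bandwidth (megabits/s) needed to ship one hour of changes in one hour."""
    bits_per_second = changed_bytes_per_hour * 8 / 3600
    return bits_per_second / 1_000_000


def can_keep_up(changed_bytes_per_hour: float, link_mbps: float, headroom: float = 0.5) -> bool:
    """True if the average change rate fits within a fraction (headroom) of the link."""
    return required_mbps(changed_bytes_per_hour) <= link_mbps * headroom


# Hypothetical workload: 9 GB of changed data per hour over a 100 Mbit/s link.
print(required_mbps(9e9))       # 20.0
print(can_keep_up(9e9, 100.0))  # True
```

Averages hide bursts, so a real estimate would also look at peak change rates rather than only the hourly mean.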
Amazon and other global services deal with this via a combination of replication, edge-caching (CDN), and other methodologies that bring the data closer to the consumer.
As a first step, you also might want to look at just making your API more coarse-grained. The fewer calls you have to make, the higher the performance (as the problem is likely latency, not bandwidth). See if you can batch things up instead of handling them one-at-a-time.
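To make the latency argument concrete, here is a small sketch that counts round trips instead of touching a network. The 200 ms round-trip time, the `FakeRemote` class, and its `get_product`/`get_products` methods are all hypothetical stand-ins.

```python
RTT_MS = 200  # assumed US <-> Asia round-trip time, for illustration only


class FakeRemote:
    """Stand-in for the headquarters API; counts round trips instead of using a network."""

    def __init__(self, products):
        self.products = products
        self.round_trips = 0

    def get_product(self, product_id):
        self.round_trips += 1  # one request per item
        return self.products[product_id]

    def get_products(self, product_ids):
        self.round_trips += 1  # one request carries all the IDs
        return [self.products[i] for i in product_ids]


products = {i: {"id": i, "name": f"item-{i}"} for i in range(50)}

fine = FakeRemote(products)
one_by_one = [fine.get_product(i) for i in range(50)]

coarse = FakeRemote(products)
batched = coarse.get_products(list(range(50)))

print(fine.round_trips * RTT_MS)    # 10000 ms spent waiting on latency
print(coarse.round_trips * RTT_MS)  # 200 ms
```

Both approaches return the same data; only the number of times the distance is paid for differs.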
You also can look critically at caching. Instead of making your read-only API calls all the time, introduce some cache-control headers to specify the acceptable age of your requests. A lot of data is very static, things like user data, departments, product-info etc... Some of this data can leverage caching layers to become much more performant.
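A minimal sketch of how a caller might honor a `Cache-Control: max-age` value before going back over the ocean. The `CachedResponse` class and the sample data are hypothetical, and directives such as `no-store` or `no-cache` are deliberately not handled here.

```python
import re


def max_age_seconds(cache_control: str):
    """Return the max-age value from a Cache-Control header, or None if absent."""
    match = re.search(r"max-age=(\d+)", cache_control)
    return int(match.group(1)) if match else None


class CachedResponse:
    """A response body plus the information needed to decide whether it is still fresh."""

    def __init__(self, body, cache_control, fetched_at):
        self.body = body
        self.max_age = max_age_seconds(cache_control)
        self.fetched_at = fetched_at

    def is_fresh(self, now):
        if self.max_age is None:
            return False  # no explicit lifetime: treat as stale in this sketch
        return now - self.fetched_at < self.max_age


# Department list fetched at t=0 with a five-minute lifetime.
entry = CachedResponse(["sales", "support"], "public, max-age=300", fetched_at=0)
print(entry.is_fresh(now=120))  # True: reuse the cached copy
print(entry.is_fresh(now=301))  # False: time to call the remote API again
```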

If you want to use AWS and host your main components in a specific region, you could host them yourself on EC2 instances (as the origin server) in the region of your choice and use CloudFront (CDN) to serve the content globally. AWS uses its own high-speed backbone network to reduce latency between geographically distant locations by reducing the number of network hops.
From a caching standpoint, as Rob rightly said, CloudFront applies different caching mechanisms for hot and warm objects (edge caching, regional caching). The origin servers can also send expiration information in HTTP headers (for example Cache-Control max-age), and CloudFront's minimum and maximum TTL settings bound how long objects stay cached.
If, however, you don't want to take advantage of the high-speed backbone network, you should design your endpoints and functionality with latency as a constraint, use appropriate TTLs for cached objects, and define a caching strategy that keeps your application's read/write ratio in mind.


Microservices communication model

Consider a microservices architecture where you need to expose functionality to manage simple configuration shared between different microservices. The configuration does not change often, but I would still like to see changes whenever I ask for a value.
Using a REST microservice seems easy, but it adds latency.
An alternative could be RPC over messaging (e.g. RabbitMQ), but the interface becomes more complicated.
What communication model are you using for simple internal services, and what are the pros and cons?
Any examples?
I tried a REST API, but it means a lot of "slow" requests, which add latency to the overall request.
I've found that using RESTful APIs with some judicious implementation of cache-control headers actually works fairly well for this use case. The biggest challenge is ensuring that the HTTP client underneath your REST client actually respects them.
It's fairly easy to implement, fits nicely into HTTP, and generally scales really well. It lets the client decide whether to respect the caching suggestions, and it lets the server optimize when it "knows" the configs haven't changed (304 Not Modified) and the client asks for a new version.
You don't have to get into anything too complicated from a cache-invalidation standpoint, and you can leverage things like edge caching to further accelerate things in interesting ways.
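The 304 flow can be sketched without a real HTTP stack. Everything here is a hypothetical stand-in: `handle_get` plays the config service, `ConfigClient` plays a caller that remembers the last ETag, and the ETag is simply a hash of the config.

```python
import hashlib
import json


def etag_for(config: dict) -> str:
    """Derive a validator from the config content (one possible ETag scheme)."""
    payload = json.dumps(config, sort_keys=True).encode()
    return '"' + hashlib.sha256(payload).hexdigest()[:16] + '"'


def handle_get(config: dict, if_none_match=None):
    """Hypothetical server handler: returns (status, headers, body)."""
    tag = etag_for(config)
    if if_none_match == tag:
        return 304, {"ETag": tag}, None  # unchanged: no body on the wire
    return 200, {"ETag": tag}, dict(config)


class ConfigClient:
    def __init__(self):
        self.etag = None
        self.config = None

    def refresh(self, server_config):
        status, headers, body = handle_get(server_config, self.etag)
        if status == 200:
            self.etag, self.config = headers["ETag"], body
        return status


server_config = {"feature_x": True, "timeout_ms": 500}
client = ConfigClient()
print(client.refresh(server_config))  # 200: first fetch transfers the body
print(client.refresh(server_config))  # 304: unchanged, keep the local copy
server_config["timeout_ms"] = 800
print(client.refresh(server_config))  # 200: changed, new body and ETag
```

The request still costs a round trip, but the client always sees the current value, which matches the "see changes whenever I ask" requirement from the question.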
The question to ask is ultimately the extent to which it is a requirement that a change to the configuration immediately affects everything.
If that's actually a requirement, then we're talking about strong consistency which implies some combination of:
all other processing must effectively be executed one-at-a-time against the single component against which the change is made (ultimately there can be only one: if there are multiple, they will be affected at different times)
all other processing must stop for the duration of time that it takes to propagate the change to all components
(these can be combined: you can have multiple instances depend on the configuration and stop for as long as it takes to update those and then you can execute things in parallel... an example of this is making it static configuration in the dependent services and taking them all down to update the configuration: if these updates are sufficiently rare, you can fit them into your error/downtime budget)
Needless to say, there's a (likely surprisingly small) consistency budget you're dealing with.
If you don't actually need strong absolute consistency like I've described (and the set of problems which actually need it is perhaps surprisingly small: anything to do with money for instance doesn't actually need strong consistency because it's only money), then it's a question of how much inconsistency is acceptable (typically you'll quantify this with some sort of bounded staleness and a liveness guarantee that you don't go back in time (unless there's a really good reason to go back in time...)). At this point, we've established that you want eventual consistency, we're just haggling over "how eventual?".
For this, propagating the configuration changes via a durable publish-subscribe log (Kafka being the exemplar of this approach) is probably the place to start. Components subscribe to this log and update local state as it changes (and probably store the log position and the last value in some local store to prevent inadvertently going backward in time when they initially read the log). Then you can distribute the configuration so that it's in the local memory of the subscribers, though during an update, there will be a window where different subscribers will have different views of that configuration.
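A toy sketch of the subscriber side, assuming records arrive as `(offset, key, value)` tuples. A plain Python list stands in for the log here; no real Kafka client API is used, and `ConfigSubscriber` is a hypothetical name.

```python
class ConfigSubscriber:
    """Applies (offset, key, value) records from an ordered log to local state."""

    def __init__(self):
        self.config = {}
        self.last_offset = -1  # would be persisted locally in a real service

    def apply(self, offset, key, value):
        if offset <= self.last_offset:
            return False  # already seen: never move backward in time
        self.config[key] = value
        self.last_offset = offset
        return True


log = [(0, "timeout_ms", 500), (1, "feature_x", False), (2, "timeout_ms", 800)]

fast, slow = ConfigSubscriber(), ConfigSubscriber()
for record in log:
    fast.apply(*record)
for record in log[:2]:
    slow.apply(*record)

# During propagation, the two subscribers see different values.
print(fast.config["timeout_ms"], slow.config["timeout_ms"])  # 800 500

# A replayed old record is ignored, and the slow subscriber eventually converges.
fast.apply(*log[0])
slow.apply(*log[2])
print(fast.config == slow.config)  # True
```

Reads are served from local memory, so the latency cost moves from every read to the (rare) write.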
A lot of solutions exist to externalize microservice configuration to a central location, depending on what frameworks/programming languages you used to build your services. If you happen to be using Spring, take a look at Spring Cloud Config. Of course, it is not the only solution tailored for this purpose.

Limits of processing data on the client vs. processing data on the server

For a desktop app (ERP-like functionality), I'm wondering what would be wiser to do.
Assume that both machines are equal in performance and that the server has to deal with at most 5-10 clients and has no other obligations. Is it better to load all data initially (~20,000 objects) and do filtering, sorting etc. on the client (Electron), or is it better to do the processing on the backend (Go + Postgres), called via Axios? The user interface should be as snappy as possible but also get the data as fast as possible.
A costly operation is filtering 15,000 objects by a reference ID (e.g. a client can have several orders).
So objects that belong to a "parent object" are displayed by querying all those objects by a parentID.
Is there a general answer to what would be more performant, or a better choice here? Making some assumptions, like a latency of 5ms in the network + 20ms for the API + a couple for filling the store.
At which data size will this operation be slower on the frontend or completely unsustainable?
If it's not a performance problem, are there other reasons I would want to do this on the server?
Edit: Client and Server are on the same local network
You specifically mention an ERP-like software. For such software you have to carefully consider the value of consistency:
Will your software need to show the same data for all clients?
If the answer to this is yes, then the simplest implementation is to do data processing on the server which informs all clients of changing data.
If the answer to this is no, then you should be fine doing most processing on the client software.
There are of course ways to do most of your processing on the client yet still have consistency but they will add complexity to your overall design. One implementation is to broadcast changes on one client to all other clients. This is the architecture behind most multiplayer online games.
Another way to tackle this is implemented by git: the data on all clients are different from each other but there are ways to synchronize each client data with the server thus achieving eventual consistency.
Another consideration you have to think about is the size of your data:
Will downloading all the data from the server take more than a few seconds?
If downloading all data from the server takes too long then the UI will be essentially unresponsive when starting.
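If the data does end up on the client, the filter-by-parent-ID operation from the question does not have to rescan every object each time. A minimal sketch, shown in Python for brevity (the same idea carries over to a `Map` in the Electron client); the field names and data sizes are assumptions.

```python
from collections import defaultdict


def filter_by_parent(orders, parent_id):
    """Linear scan: touches every object on each lookup."""
    return [o for o in orders if o["parent_id"] == parent_id]


def build_parent_index(orders):
    """One pass up front; afterwards each lookup is a dictionary access."""
    index = defaultdict(list)
    for order in orders:
        index[order["parent_id"]].append(order)
    return index


# Hypothetical data: 15,000 orders spread evenly over 500 clients.
orders = [{"id": i, "parent_id": i % 500} for i in range(15_000)]
index = build_parent_index(orders)

print(len(filter_by_parent(orders, 42)))          # 30
print(index[42] == filter_by_parent(orders, 42))  # True
```

The trade-off is that the index has to be kept in sync whenever objects are added, removed, or re-parented on the client.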

Real-life scenarios where anyone would choose availability over consistency (who would be interested in stale data?)

I was trying to wrap my brain around the CAP theorem. I understand that network partitions can occur (eventually leading to the nodes in the cluster not being able to sync up with the WRITE operations happening on the other nodes).
In this case, either the cluster could stay up and the load balancer in front of it could route requests to any of the nodes; after a WRITE operation on one node, the other nodes that can't sync that data still have STALE data, and any subsequent READS from those nodes will serve STALE data.
[So we are losing CONSISTENCY as we choose AVAILABILITY (i.e., we have chosen to let the cluster give STALE responses back.)]
Or we could SHUT DOWN the cluster whenever a network partition occurs! (Thereby losing AVAILABILITY, as we don't want to hamper consistency among the nodes.)
There are 2 things I would like to know:
In reality, when would anyone choose to be AVAILABLE and trade off CONSISTENCY? Who on earth would (practically) be interested in STALE data?
Please help me understand by listing more than one scenario.
If we choose CONSISTENCY over AVAILABILITY, the cluster is down.
Who on earth (in real-life scenarios) would accept designing their system to be DOWN in order to preserve CONSISTENCY?
Please list some scenarios.
Won't the majority of us look for high availability no matter what? What are our options? Please enlighten me.
If I send you a message on FB and you send one to me, I'd rather see messages in an incorrect order (a message sent at 2pm shown before a message sent at 1pm) than not see them at all (an example of AVAILABILITY of messages preferred over read-after-write CONSISTENCY of messages). Another example: if I gather web site metrics, I'd rather skip or drop some signals than force my users to wait for a page load while my consistent transaction is stuck.
Keep in mind that inconsistency doesn't only mean STALE data; data can also be inconsistent in different ways.
Financial transactions are a classic example of data that requires consistency over availability. As a bank, I'd rather decline a user's request for a money transfer than accept it and lose the customer's money due to the DB being down.
I'd like to point out that CAP theorem is a high-level concept. There are a lot of ways you can treat terms consistency, availability or even partitioning, and different businesses have different requirements. Software engineering as a whole and distributed systems engineering, in particular, is about making trade-offs.
An example where you may choose Availability over Consistency is collaborative editing (e.g. Google Docs). It may be perfectly acceptable (and in fact desirable) to allow users to make local modifications to the documents and deal with conflict resolution once network is restored.
A bank ATM is an example where you'd choose Consistency over Availability. Once an ATM is disconnected from the network, you would not want to allow withdrawals (thus, no Availability). Or, you could pick partial Availability, and allow deposits or read-only access to your bank statements.
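The partial-availability idea can be sketched as a toy model. The `Atm` class, its messages, and the queue-and-reconcile policy for deposits are assumptions made for illustration, not a description of how real ATMs work.

```python
class Atm:
    """Toy model: withdrawals need the bank; deposits are queued while offline."""

    def __init__(self):
        self.connected = True
        self.pending_deposits = []

    def withdraw(self, account, amount):
        if not self.connected:
            # Consistency over availability: the balance cannot be verified.
            return "declined: cannot verify balance"
        return f"dispensed {amount}"

    def deposit(self, account, amount):
        if not self.connected:
            # Partial availability: accept now, reconcile once the network is back.
            self.pending_deposits.append((account, amount))
            return "accepted: will be posted when back online"
        return f"posted {amount}"


atm = Atm()
print(atm.withdraw("acct-1", 50))  # dispensed 50

atm.connected = False              # simulate a network partition
print(atm.withdraw("acct-1", 50))  # declined: cannot verify balance
print(atm.deposit("acct-1", 20))   # accepted: will be posted when back online
```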

Is an HTTP REST request the only way to access Azure Storage?

I've started reading about Azure Storage and it seems that the only way to access it is via an HTTP REST request.
I've seen that there are a few wrappers around these requests, for example, StorageClient (by Microsoft) and cloud storage api, but they all still use REST in the background (to the best of my understanding).
It seems unreasonable to me that this is actually true. If I have a machine in Azure, and I want to access data stored in Azure Storage, it would seem very inefficient to
Yes, all storage calls are normalized to the REST API. It's actually very efficient when you consider the problem. You are thinking of a machine in Azure and data in Azure as stored on two servers sitting in a rack. Remember that in Azure, your data, your "servers", etc. may be stored in different racks, different zones, and even different datacenters. With the REST API, your apps don't have to care about any of this. They just get the data with the URL.
So while a tiny HTTP overhead may appear inefficient if these were two boxes next to each other, it's actually a very elegant solution when they are on different continents. Factor in concepts such as CDN, and it becomes an even better fit.
Layered onto this base concept is the Azure load balancer and other pieces of the internal infrastructure, which can further optimize every request because they are all the same (HTTP). I also wouldn't be surprised (not sure at all, I don't work for MSFT) if the LB was doing traffic management optimizations when a request is made intra-datacenter.
Throughput on the storage subsystem in Windows Azure is pretty high. I'd be very surprised if the system cannot deliver to your needs.
There are also many design patterns to increase the scalability of your app, like asynchronous processing, batching requests, delayed processing, etc.

REST service with load balancing

I've been considering the advantages of REST services, the whole statelessness and session affinity "stuff". What strikes me is that if you have multiple deployed versions of your service on a number of machines in your infrastructure, and they all act on a given resource, where is the state of that resource stored?
Would it make sense to have a single host in the infrastructure that provides a distributed cache, so that any state that is changed inside a service is simply fetched from/put to the cache? This would allow any number of deployed services, for load-balancing reasons, to all see the same state of the resources.
If you're designing a system for high load (which usually implies high reliability), having a single point of failure is never a good idea. If the service providing the consistent view goes down, at best your performance decreases drastically as the database is queried for everything and at worst, your whole application stops working.
In your question, you seem to be worried about consistency. If there's something to be learned about eBay's architecture, it's that there is a trade-off to be made between availability/redundancy/performance vs consistency. You may find 100% consistency is not required and you can get away with a little "chaos".
A distributed cache (like memcache) can be used as the backing for a distributed hashtable, which has been used extensively to create scalable infrastructures. If implemented correctly, caches can be redundant, and caches can join and leave the ring dynamically.
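One common way to build such a ring is consistent hashing; that this is what the answer has in mind is an assumption. The sketch below uses hypothetical node names and an arbitrary number of virtual points per node, and it leaves out redundancy (storing each key on more than one node).

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, replicas=100):
        self.replicas = replicas  # virtual points per node, to even out the load
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # position -> node name

    def add(self, node):
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node):
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def node_for(self, key):
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing()
for node in ("cache-a", "cache-b", "cache-c"):
    ring.add(node)

keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add("cache-d")  # a new cache joins the ring
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]

# Only keys taken over by the new node move; the rest stay where they were.
print(all(after[k] == "cache-d" for k in moved))  # True
```

With a plain `hash(key) % number_of_nodes` scheme, by contrast, adding a node would remap most keys and empty most of the cache at once.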
REST is also inherently cacheable as the HTTP layer can be cached with the appropriate use of headers (ETags) and software (e.g. Squid proxy as a Reverse proxy). The one drawback of specifying caching through headers is that it relies on the client interpreting and respecting them.
However, to paraphrase Phil Karlton, caching is hard. You really have to be selective about the data that you cache, when you cache it and how you invalidate that cache. Invalidating can be done in the following ways:
Through a timer based means (cache for 2 mins, then reload)
When an update comes in, invalidating all caches containing the relevant data.
I'm partial to the timer-based approach as it's simpler to implement and you can say with relative certainty how long stale data will live in the system (e.g. company details will be updated in 2 hours, stock prices will be updated in 10 seconds).
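Both invalidation styles fit in a few lines. This is a minimal single-process sketch: `TtlCache`, the injected `clock`, and the sample key are hypothetical, and a real deployment would need thread safety and a shared or distributed store.

```python
class TtlCache:
    def __init__(self, ttl_seconds, clock):
        self.ttl = ttl_seconds
        self.clock = clock    # callable returning the current time in seconds
        self._entries = {}    # key -> (value, stored_at)

    def get(self, key, loader):
        entry = self._entries.get(key)
        if entry is not None and self.clock() - entry[1] < self.ttl:
            return entry[0]   # still within the staleness bound
        value = loader(key)   # expired or missing: reload from the source
        self._entries[key] = (value, self.clock())
        return value

    def invalidate(self, key):
        """Update-driven invalidation: call this when the underlying data changes."""
        self._entries.pop(key, None)


now = [0.0]                   # fake clock so the example needs no sleeping
db = {"stock:ACME": 10}
cache = TtlCache(ttl_seconds=10, clock=lambda: now[0])

first = cache.get("stock:ACME", db.get)          # miss: loads 10
db["stock:ACME"] = 12
now[0] = 5
stale = cache.get("stock:ACME", db.get)          # still 10, at most 10 s old
now[0] = 10
reloaded = cache.get("stock:ACME", db.get)       # timer expired: loads 12

db["stock:ACME"] = 15
cache.invalidate("stock:ACME")                   # update-driven invalidation
after_update = cache.get("stock:ACME", db.get)   # loads 15 immediately
print(first, stale, reloaded, after_update)      # 10 10 12 15
```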
Finally, high load also depends on your use case and depending on the amount of transactions none of this may apply. A methodology (if you will) may be the following:
Make sure the system is functional without caching (Does it work)
Does it meet performance criteria (e.g. requests/sec, uptime goals)
Optimize the bottlenecks
Implement caching where required
After all, you may not have a performance problem in the first place, and you may be able to get away with a single database and a good backup strategy.
I think the more traditional view of load balancing web applications is that you would have your REST service on multiple application servers and they would retrieve resource data from a single database server.
However, with the use of hypermedia, REST services can easily vertically partition the application so that some resources come from one service and some from another service on a different server. This would allow you to scale to some extent, depending on your domain, without having a single data store. Obviously with REST you would not be able to do transactional updates across these services, but there are definitely scenarios where this partitioning is valuable.
If you are looking at architectures that need to really scale then I would suggest looking at Greg Young's stuff on CQS Architecture (video) before attempting to tackle the problems of a distributed cache.