Microservices communication model - REST

Consider a microservices architecture where you need to expose functionality to manage simple configuration shared across different microservices. The configuration doesn't change often, but I would still like to see changes whenever I ask for any value.
Using a REST microservice seems easy, but it adds latency.
An alternative could be RPC over messaging (e.g. RabbitMQ), but the interface becomes more complicated.
What communication mechanism are you using for simple internal services, and what are the pros and cons?
Any examples?
I tried a REST API, but it means a lot of "slow" requests, which add latency to the overall request.

I've found that using RESTful APIs with some judicious use of cache-control headers actually works fairly well for this use case. The biggest challenge is ensuring that the HTTP client underneath your REST client actually respects those headers.
It's fairly easy to implement, fits nicely into HTTP, and generally scales really well. It leaves the client in control of whether to respect the caching suggestions, and it lets the server respond cheaply (304 Not Modified) when it "knows" the configs haven't changed and the client asks for a new version.
You don't have to get into anything too complicated from a cache-invalidation standpoint, and you can leverage things like edge caching to accelerate things further in interesting ways.
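A minimal sketch of that idea, assuming Flask and a made-up /config endpoint: the server tags the payload with an ETag and a short max-age, and answers conditional requests with 304 Not Modified so unchanged configs cost almost nothing.

```python
import hashlib
import json
from flask import Flask, request

app = Flask(__name__)

# Hypothetical in-memory config; in practice this would come from a real store.
CONFIG = {"feature_x": True, "timeout_ms": 250}

@app.route("/config")
def get_config():
    body = json.dumps(CONFIG, sort_keys=True)
    etag = hashlib.sha256(body.encode()).hexdigest()

    # The client already has this version: answer 304 with no body.
    if request.headers.get("If-None-Match") == etag:
        return "", 304

    resp = app.make_response(body)
    resp.headers["Content-Type"] = "application/json"
    resp.headers["ETag"] = etag
    # Clients may reuse their cached copy for 60 seconds before revalidating.
    resp.headers["Cache-Control"] = "max-age=60"
    return resp
```

Whether a 60-second window is acceptable is exactly the staleness trade-off the next answer discusses.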

The question to ask is ultimately the extent to which it is a requirement that a change to the configuration immediately affects everything.
If that's actually a requirement, then we're talking about strong consistency which implies some combination of:
all other processing must effectively be executed one-at-a-time against the component against which the change is made (there can ultimately only be one: if there are multiple, they will be affected at different times)
all other processing must stop for the duration of time that it takes to propagate the change to all components
(these can be combined: you can have multiple instances depend on the configuration and stop for as long as it takes to update those and then you can execute things in parallel... an example of this is making it static configuration in the dependent services and taking them all down to update the configuration: if these updates are sufficiently rare, you can fit them into your error/downtime budget)
Needless to say, there's a (likely surprisingly small) consistency budget you're dealing with.
If you don't actually need strong absolute consistency like I've described (and the set of problems which actually need it is perhaps surprisingly small: anything to do with money for instance doesn't actually need strong consistency because it's only money), then it's a question of how much inconsistency is acceptable (typically you'll quantify this with some sort of bounded staleness and a liveness guarantee that you don't go back in time (unless there's a really good reason to go back in time...)). At this point, we've established that you want eventual consistency, we're just haggling over "how eventual?".
For this, propagating the configuration changes via a durable publish-subscribe log (Kafka being the exemplar of this approach) is probably the place to start. Components subscribe to this log and update local state as it changes (and probably store the log position and the last value in some local store to prevent inadvertently going backward in time when they initially read the log). Then you can distribute the configuration so that it's in the local memory of the subscribers, though during an update there will be a window where different subscribers have different views of that configuration.
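A rough sketch of such a subscriber, assuming the kafka-python client and a made-up topic and broker address; a real consumer would also persist the offset and values it has seen so a restart can't go backward in time.

```python
import json
from kafka import KafkaConsumer

local_config = {}       # last-known value per config key, held in memory
last_offset_seen = -1   # persist this (and local_config) to durable local storage

consumer = KafkaConsumer(
    "config-changes",                       # assumed topic name
    bootstrap_servers=["kafka:9092"],       # assumed broker address
    auto_offset_reset="earliest",           # replay the whole log on first start
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    # Assumes each record is keyed by the config key it changes.
    local_config[message.key.decode("utf-8")] = message.value
    last_offset_seen = message.offset
```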

A lot of solutions exist to externalize microservice configuration to a central location, depending on what frameworks/programming languages you used to build your services. If you happen to be using Spring, take a look at Spring Cloud Config. Of course, it is not the only solution tailored for this purpose.

Related

Is there standard way of making multiple API calls combined into one HTTP request?

While designing REST APIs, I from time to time face the challenge of dealing with batch operations (e.g. deleting or updating many entities at once) to reduce the overhead of many TCP client connections. In a particular situation the problem is usually solved by adding a custom API method for the specific operation (e.g. POST /files/batchDelete, which accepts ids in the request body), which doesn't look pretty from the point of view of REST API design principles but does the job.
But a general solution to the problem is still desirable to me. Recently I found the Google Cloud Storage JSON API batching documentation, which looks to me like a fairly general solution. I mean, a similar format could be used for any HTTP API, not just Google Cloud Storage. So my question is: does anybody know of a general standard (a standard or draft, guideline, community effort or so) for making multiple API calls combined into one HTTP request?
I'm aware of the capabilities of HTTP/2, which include using a single TCP connection for multiple HTTP requests, but my question is addressed at the application level. In my opinion that still makes sense because, despite the ability to use HTTP/2, handling this at the application level seems like the only way to guarantee it for any client, including HTTP/1, which is currently the most used version of HTTP.
TL;DR
Neither REST nor HTTP is ideal for batch operations.
Caching, which is one of REST's constraints and is mandatory rather than optional, usually prevents batch processing in some form.
It might be beneficial not to expose the data to update or remove in batch as their own resources but as data elements within a single resource, like a data table in an HTML page. There, updating or removing all or part of the entries should be straightforward.
If the system in general is write-intensive, it is probably better to think of other solutions, such as exposing the DB directly to those clients, to spare a further level of indirection and complexity.
Utilizing caching may take a lot of workload off the server and even spare unnecessary connections.
To start with, neither REST nor HTTP is ideal for batch operations. As Jim Webber pointed out, the application domain of HTTP is the transfer of documents over the Web. This is what HTTP does and this is what it is good at. However, any business rules we conclude are just a side effect of the document management, and we have to come up with solutions to turn these document-management side effects into something useful.
As REST is just a generalization of the concepts used in the browsable Web, it is no miracle that the same concepts that apply to Web development also apply to REST development in some form. Thereby a question like how something should be done in REST usually revolves around answering how something should be done on the Web.
As mentioned before, HTTP isn't ideal in terms of batch processing actions. Sure, a GET request may retrieve multiple results, though in reality you obtain one response containing links to further resources. The creation of resources has, according to the HTTP specification, to be indicated with a Location header that points to the newly created resource. POST is defined as an all-purpose method that allows performing tasks according to server-specific semantics. So you could basically use it to create multiple resources at once. However, the HTTP spec clearly lacks support for indicating the creation of multiple resources at once, as the Location header may appear only once per response and may define only one URI. So how can a server indicate the creation of multiple resources to the client?
A further indication that HTTP isn't ideal for batch processing is that a URI must reference a single resource. That resource may change over time, though the URI can't ever point to multiple resources at once. The URI itself is, more or less, used as key by caches which store a cacheable response representation for that URI. As a URI may only ever reference one single resource, a cache will also only ever store the representation of one resource for that URI. A cache will invalidate a stored representation for a URI if an unsafe operation is performed on that URI. In case of a DELETE operation, which is by nature unsafe, the representation for the URI the DELETE is performed on will be removed. If you now "redirect" the DELETE operation to remove multiple backing resources at once, how should a cache take notice of that? It only operates on the URI invoked. Hence even when you delete multiple resources in one go via DELETE a cache might still serve clients with outdated information as it simply didn't take notice of the removal yet and its freshness value would still indicate a fresh-enough state. Unless you disable caching by default, which somehow violates one of REST's constraints, or reduce the time period a representation is considered fresh enough to a very low value, clients will probably get served with outdated information. You could of course perform an unsafe operation on each of these URIs then to "clear" the cache, though in that case you could have invoked the DELETE operation on each resource you wanted to batch delete itself to start with.
It gets a bit easier though if the batch of data you want to remove is not explicitly captured via their own resources but as data of a single resource. Think of a data-table on a Web page where you have certain form-elements, such as a checkbox you can click on to mark an entry as delete candidate and then after invoking the submit button send the respective selected elements to the server which performs the removal of these items. Here only the state of one resource is updated and thus a simple POST, PUT or even PATCH operation can be performed on that resource URI. This also goes well with caching as outlined before as only one resource has to be altered, which through the usage of unsafe operations on that URI will automatically lead to an invalidation of any stored representation for the given URI.
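To make that concrete, here is a minimal sketch (Flask, with a made-up /files/table resource and field names): the client submits the ids it marked for removal to that single resource, so only that one URI is affected from a caching standpoint.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical backing data for the single "table" resource.
rows = {1: "a.txt", 2: "b.txt", 3: "c.txt"}

@app.route("/files/table", methods=["POST"])
def update_table():
    # e.g. the submitted form state: {"delete": [1, 3]}
    selected = request.get_json().get("delete", [])
    for row_id in selected:
        rows.pop(row_id, None)
    # Return the updated state of the one resource; caches keyed on this URI
    # are invalidated by the unsafe POST itself.
    return jsonify(list(rows.values()))
```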
The above mentioned usage of form-elements to mark certain elements for removal depends however on the media-type issued. In the case of HTML its forms section specifies the available components and their affordances. An affordance is the knowledge what you can and should do with certain objects. I.e. a button or link may want to be pushed, a text field may expect numeric or alphanumeric input which further may be length limited and so on. Other media types, such as hal-forms, halform or ion, attempt to provide form representations and components for a JSON based notation, however, support for such media-types is still quite limited.
As one of your concerns is the number of client connections to your service, I assume you have a write-intensive scenario, as in read-intensive cases caching would probably take away a good chunk of load from your server. For example, the BBC once reported that they could reduce the load on their servers drastically just by introducing a one-minute caching interval for recently requested resources. This mainly affected their start page and the linked articles, as people clicked on the latest news more often than on old news. Receiving a couple of thousand, if not hundreds of thousands of, requests per minute, they could, as mentioned before, reduce the number of requests actually reaching the server significantly and therefore take a huge load off their servers.
Write-intensive use cases, however, can't benefit from caching as much as read-intensive cases, as the cache would get invalidated quite often and the actual request forwarded to the server for processing. If the API is more or less used to perform CRUD operations, as so many "REST" APIs do in reality, it is questionable whether it wouldn't be preferable to expose the database directly to the clients. Almost all modern database vendors ship with sophisticated user-rights management options and allow creating views that can be exposed to certain users. The "REST API" on top of it basically just adds a further level of indirection and complexity in such a case. By exposing the DB directly, performing batch updates or deletions shouldn't be an issue at all, as support for such operations should already be built into the DB layer through the respective query languages.
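For comparison, a small illustration of how the batch case becomes a single statement at the DB layer, using SQLite from the Python standard library (table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect("app.db")
ids_to_delete = [3, 7, 19]

# One statement removes the whole batch; no per-entity round trips needed.
placeholders = ",".join("?" for _ in ids_to_delete)
conn.execute(f"DELETE FROM files WHERE id IN ({placeholders})", ids_to_delete)
conn.commit()
```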
In regards to the number of connections clients create: HTTP/1.0 already allows the reuse of connections via the Connection: keep-alive header directive. In HTTP/1.1 persistent connections are used by default unless explicitly requested to close via the respective Connection: close header directive. HTTP/2 introduced full-duplex connections that allow many channels, and therefore requests, to share the same connection at the same time. This is more or less a fix for the connection limitation suggested in RFC 2616, which plenty of Web developers avoided by using CDNs and similar tricks. Currently most implementations use a maximum limit of 100 channels, and therefore simultaneous downloads, per connection AFAIK.
Usually opening and closing a connection takes a bit of time and server resources, and the more open connections a server has to deal with, the more a system may suffer. Though open connections with hardly any traffic aren't a big issue for most servers. While connection creation used to be considered the costly part, with persistent connections that factor has now moved towards the number of requests issued, hence the desire to send batch requests, which HTTP is not really made for. Again, as mentioned throughout the post, through the smart utilization of caching plenty of requests may never reach the server at all. This is probably one of the best optimization strategies to reduce the number of simultaneous requests. Probably the best advice in such a case is to have a look at what kind of resources are requested frequently, which requests take up a lot of processing capacity, and which ones can easily be responded to by utilizing caching options.
reduce overhead of many tcp client connections
If this is the crux of the issue, the easiest way to solve this is to switch to HTTP/2.
In a way, HTTP/2 does exactly what you want. You open 1 connection, and using that connection you can send many HTTP requests in parallel. Unlike batching in a single HTTP request, it's mostly transparent to clients, and responses and requests can be processed out of order.
Ultimately batching multiple operations in a single HTTP request is always a network hack.
HTTP/2 is widely available. If HTTP/1.1 is still the most used version (this might be true, but the gap is closing), this has more to do with servers not yet being set up for it, not with clients.
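A small sketch of that, assuming the httpx client with its optional http2 extra installed (pip install "httpx[http2]") and a placeholder API: many requests are issued concurrently, all multiplexed over one connection.

```python
import asyncio
import httpx

async def fetch_all(ids):
    async with httpx.AsyncClient(http2=True, base_url="https://api.example.com") as client:
        # All requests share the same connection; HTTP/2 multiplexes them.
        responses = await asyncio.gather(*(client.get(f"/items/{i}") for i in ids))
        return [r.json() for r in responses]

items = asyncio.run(fetch_all(range(10)))
```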

When to use polling and streaming in LaunchDarkly

I have started using LaunchDarkly (LD) recently, and I was exploring how LD updates its feature flags.
As mentioned here, there are two ways:
Streaming
Polling
I was just wondering which implementation would be better in which cases. After a little research on streaming vs. polling, I found that streaming has the following advantages over polling:
Faster than polling
Receives only the latest data instead of all the data, which is the same as before
Avoids periodic requests
I am pretty sure all of the above advantages come at a cost. So,
Are there any downsides of using streaming over polling?
In what scenarios should polling be preferred, or the other way around?
On what factors should I decide whether to stream or poll?
Streaming
Streaming requires your application to be always alive. This might not be the case in a serverless environment. Furthermore, a streaming solution usually relies on a connection that is always open in the background. This might be costly, so feature flag providers tend to limit the number of concurrent connections you can keep open to their infrastructure. This might not be a problem if you use feature flags only in a few application instances. But you will easily reach the limit if you want to stream feature flag updates to mobile apps or a ton of microservices.
Polling
Polling sounds less fancy, but it's a reliable & robust old-school pattern that will work in almost all environments.
Webhooks
There is a third option too: webhooks. The basic idea is that you create an HTTP endpoint on your end and the feature flag service will call that endpoint whenever a feature flag value update happens. This way you get a "notification" about feature flag value changes. For example, ConfigCat supports this model. ConfigCat can notify your infrastructure by calling your webhooks and (optionally) pushing new values to your end. Webhooks have the advantage over streaming that they are cheap to maintain, so feature flag service providers don't limit them as much (for example, ConfigCat can give you unlimited webhooks).
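A bare-bones sketch of the receiving side, assuming Flask; the endpoint path and payload shape are made up, and a real integration would also verify the provider's request signature as documented by that provider.

```python
from flask import Flask, request

app = Flask(__name__)
flags = {}  # local copy of the flag values

@app.route("/hooks/feature-flags", methods=["POST"])
def on_flag_change():
    payload = request.get_json(silent=True) or {}
    # Hypothetical payload field carrying the flags that changed.
    flags.update(payload.get("changed_flags", {}))
    return "", 204
```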
How to decide
How I would use the above 3 options really depends on your use case. A general rule of thumb is: use polling by default and add quasi-real-time notifications (by streaming or by webhooks) to the components where it's critical to know about feature flag value updates.
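For completeness, the default polling pattern is simple enough to sketch (this is not any provider's SDK, just the general idea; the URL and interval are assumptions). A conditional GET keeps unchanged polls cheap.

```python
import time
import requests

etag = None
flags = {}

while True:
    headers = {"If-None-Match": etag} if etag else {}
    resp = requests.get("https://flags.example.com/v1/flags", headers=headers)
    if resp.status_code == 200:    # flags changed since the last poll
        flags = resp.json()
        etag = resp.headers.get("ETag")
    # A 304 means nothing changed; keep the current in-memory flags.
    time.sleep(30)                 # the polling interval is the staleness bound
```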
In addition to @Zoltan's answer, I found the following in LaunchDarkly's Effective Feature Management e-book (page 36):
In any networked system there are two methods to distribute information.
Polling is the method by which the endpoints (clients or servers) periodically ask for updates. Streaming, the second method, is when the central authority pushes the new values to all the endpoints as they change. Both options have pros and cons.
However, in a poll-based system, you are faced with an unattractive trade-off: either you poll infrequently and run the risk of different parts of your application having different flag states, or you poll very frequently and shoulder high costs in system load, network bandwidth, and the necessary infrastructure to support the high demands.
A streaming architecture, on the other hand, offers speed advantages and consistency guarantees. Streaming is a better fit for large-scale and distributed systems. In this design, each client maintains a long-running connection to the feature management system, which instantly sends down any changes as they occur to all clients.
Polling Pros:
Simple
Easily cached
Polling Cons:
Inefficient. All clients need to connect momentarily, regardless of whether there is a change.
Changes require roughly twice the polling interval to propagate to all clients.
Because of long polling intervals, the system could create a “split brain” situation, in which both new flag and old flag states exist at the same time.
Streaming Pros:
Efficient at scale. Each client receives messages only when necessary.
Fast Propagation. Changes can be pushed out to clients in real time.
Streaming Cons:
Requires the central service to maintain connections for every client
Assumes a reliable network
For my use case, I have decided to use polling in places where I don't need to update the flags often (long polling interval) and don't care about inconsistencies (split brain).
And streaming for applications that need immediate flag updates and where consistency is important.

How can I improve response time if the remote server is located at a very far physical distance?

I want to know how to construct servers physically in this situation.
Let's assume that my service is provided in the USA.
And my business is quite successful, so I want to expand it to Asia.
But I don't want to localize the service, so I just set up some API servers in Asia that simply call the API located at headquarters, while my main components stay in the USA.
The problem is that my API located in Asia needs to call the headquarters API located in the USA, and the response is quite often slow because of the large physical distance.
How can I overcome this situation?
In my opinion, I can get a CDN for static content, but I have no idea how to improve the API response time problem that originates from the physical distance.
If it is a stupid question, please understand; I'm quite a newbie in architecture.
EDIT:
Also, how can I set up database replication in this situation?
If I create a replica in Asia that replicates from the USA, I think the replication performance will be quite poor because of the physical distance.
How do Amazon or other global services construct this?
Replication performance can be quite poor. It is important to understand how much of your data is changing so that you can estimate the bandwidth required and understand whether your replication can keep up.
Amazon and other global services deal with this via a combination of replication, edge-caching (CDN), and other methodologies that bring the data closer to the consumer.
As a first step, you also might want to look at just making your API more coarse-grained. The fewer calls you have to make, the higher the performance (as the problem is likely latency, not bandwidth). See if you can batch things up instead of handling them one-at-a-time.
You also can look critically at caching. Instead of making your read-only API calls all the time, introduce some cache-control headers to specify the acceptable age of your requests. A lot of data is very static, things like user data, departments, product-info etc... Some of this data can leverage caching layers to become much more performant.
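One way to act on that advice from the Asian API servers, assuming the requests-cache library (a drop-in wrapper around requests) and a made-up headquarters URL: read-only calls younger than the acceptable staleness never cross the ocean at all.

```python
from datetime import timedelta
import requests_cache

# Responses younger than 5 minutes are served from the local cache instead of
# making another round trip to the US.
session = requests_cache.CachedSession("hq_api_cache", expire_after=timedelta(minutes=5))

product = session.get("https://hq.example.com/api/products/42").json()
```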
If you want to use AWS and want to host the main components in a specific region, then you may think of hosting them yourself on EC2 instances [as origin servers] in the region of your choice and using CloudFront (CDN) to serve the content globally. AWS employs its own high-speed backbone network to reduce latency between geographically distant locations by reducing the number of network hops.
From a caching standpoint, as Rob rightly said, CloudFront performs different caching mechanisms for hot objects and warm objects (edge caching, regional caching). Also, the origin servers can send a minimum and maximum expiration time over HTTP headers to define the caching TTL.
If, however, you don't want to use the advantage of the high-speed backbone network, you should design your endpoints and functionality with latency as a constraint, use appropriate TTLs for cached objects, and define an appropriate caching strategy, keeping in mind the R/W ratio of your application.

REST service with load balancing

I've been considering the advantages of REST services, the whole statelessness and session affinity "stuff". What strikes me is that if you have multiple deployed versions of your service on a number of machines in your infrastructure, and they all act on a given resource, where is the state of that resource stored?
Would it make sense to have a single host in the infrastructure that utilises a distributed cache, where any state that is changed inside a service is simply fetched from/put to the cache? This would allow any number of deployed services, for load balancing reasons, to all see the same state views of resources.
If you're designing a system for high load (which usually implies high reliability), having a single point of failure is never a good idea. If the service providing the consistent view goes down, at best your performance decreases drastically as the database is queried for everything and at worst, your whole application stops working.
In your question, you seem to be worried about consistency. If there's something to be learned about eBay's architecture, it's that there is a trade-off to be made between availability/redundancy/performance vs consistency. You may find 100% consistency is not required and you can get away with a little "chaos".
A distributed cache (like memcache) can be used as a backing for a distributed hashtable which have been used extensively to create scalable infrastructures. If implemented correctly, caches can be redundant and caches can join and leave the ring dynamically.
REST is also inherently cacheable as the HTTP layer can be cached with the appropriate use of headers (ETags) and software (e.g. Squid proxy as a Reverse proxy). The one drawback of specifying caching through headers is that it relies on the client interpreting and respecting them.
However, to paraphrase Phil Karlton, caching is hard. You really have to be selective about the data that you cache, when you cache it and how you invalidate that cache. Invalidating can be done in the following ways:
Through a timer based means (cache for 2 mins, then reload)
When an update comes in, invalidating all caches containing the relevant data.
I'm partial to the timer-based approach as it's simpler to implement and you can say with relative certainty how long stale data will live in the system (e.g. company details will be updated in 2 hours, stock prices will be updated in 10 seconds).
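A minimal sketch of that timer-based approach, assuming the cachetools library; the TTLs mirror the examples above and the fetch function is a placeholder for the real lookup.

```python
from cachetools import TTLCache

company_cache = TTLCache(maxsize=1024, ttl=2 * 60 * 60)   # company details: 2 hours
price_cache = TTLCache(maxsize=10_000, ttl=10)             # stock prices: 10 seconds

def get_company(company_id, fetch_company):
    # On expiry the entry simply disappears and is reloaded on the next access.
    if company_id not in company_cache:
        company_cache[company_id] = fetch_company(company_id)
    return company_cache[company_id]
```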
Finally, high load also depends on your use case and depending on the amount of transactions none of this may apply. A methodology (if you will) may be the following:
Make sure the system is functional without caching (Does it work)
Does it meet performance criteria (e.g. requests/sec, uptime goals)
Optimize the bottlenecks
Implement caching where required
After all, you may not have a performance problem in the first place and you may be able to get away with a single database and a good backup strategy.
I think the more traditional view of load balancing web applications is that you would have your REST service on multiple application servers and they would retrieve resource data from single database server.
However, with the use of hypermedia, REST services can easily vertically partition the application so that some resources come from one service and some from another service on a different server. This would allow you to scale to some extent, depending on your domain, without having a single data store. Obviously with REST you would not be able to do transactional updates across these services, but there are definitely scenarios where this partitioning is valuable.
If you are looking at architectures that need to really scale then I would suggest looking at Greg Young's stuff on CQS Architecture (video) before attempting to tackle the problems of a distributed cache.

What's the best IPC mechanism for medium-sized data in Perl? [closed]

I'm working on designing a multi-tiered app in Perl and I'm wondering about the pros and cons of the various IPC mechanisms available to me. I'm looking at handling moderately-sized data, typically a few dozen kilobytes but up to a couple of megabytes, and the load is pretty light, at most a couple of hundred requests per minute.
My primary concerns are maintainability and performance (in that order). I don't think I'll need to scale up to more than one server, or port off of our main platform (RHEL), but I suppose it's something to consider.
I can think of the following options:
Temporary files - Simplistic, probably the worst option in terms of speed and storage requirements
UNIX domain sockets - Not portable, not scalable
Internet Sockets - Portable, scalable
Pipes - Portable, not scalable (?)
Considering that scalability and portability are not my primary concerns, I need to learn more. What's the best choice, and why? Please comment if you need additional information.
EDIT: I'll try to give more detail in response to ysth's questions (warning, wall of text follows):
Are readers/writers in a one-to-one relationship, or something more complicated?
What do you want to happen to the writer if the reader is no longer there or busy?
And vice versa?
What other information do you have about your desired usage?
At this point, I'm contemplating a three-tiered approach, but I'm not sure how many processes I'll have in each tier. I think I need to have more processes towards the left side and fewer toward the right, but maybe I should have the same number across the board:
.---------. .----------. .-------.
| Request | -----> | Business | -----> | Data |
| Manager | <----- | Logic | <----- | Layer |
`---------' `----------' `-------'
These names are still generic and probably won't make it into the implementation in these forms.
The request manager is responsible for listening for requests from different interfaces, for example web requests and CLI (where response time is important) and e-mail (where response time is less important). It performs logging and manages the responses to the requests (which are rendered in a format appropriate to the type of request).
It sends data about the request to the business logic which performs logging, authorization depending on business rules, etc.
The business logic (if it needs to) then requests data from the data layer, which can either talk to (most often) the internal MySQL database or some other data source outside our team's control (e.g., our organization's primary LDAP servers, or our DB2 employee information database, etc.). This is mostly just a wrapper which formats the data in a uniform way so that it can be handled more easily in the business logic.
The information then flows back to the request manager for presentation.
If, when data is flowing to the right, the reader is busy, for the interactive requests I'd like to simply wait a suitable period of time, and return a timeout error if I don't get access in that amount of time (e.g. "Try again later"). For the non-interactive requests (e.g. e-mail), the polling system can simply exit and try again on the next invocation (which will probably be once per 1-3 minutes).
When data is flowing in the other direction, there shouldn't be any waiting situations. If one of the processes has died when trying to travel back to the left, all I can really do is log and exit.
Anyway, that was pretty verbose, and since I'm still in early design I probably still have some confused ideas in there. Some of what I've mentioned is probably tangential to the issue of which IPC system to use. I'm open to other suggestions on the design, but I was trying to keep the question limited in scope (for example, maybe I should consider collapsing down to two tiers, which is much simpler for IPC). What are your thoughts?
If you're unsure about your exact requirements at the moment, try to think of a simple interface that you can code to, that any IPC implementation (be it temporary files, TCP/IP or whatever) needs to support. You can then choose a particular IPC flavour (I would start with whatever's easiest and/or easiest to debug -- probably temporary files) and implement the interface using that. If that turns out to be too slow, implement the interface using e.g. TCP/IP. Actually implementing the interface does not involve much work as you will essentially just be forwarding calls to some existing library.
The point is that you have a high-level task to perform ("transmit data from program A to program B") which is more or less independent of the details of how it is performed. By establishing an interface and coding to it, you isolate the main program from changes in the event that you need to change the implementation.
Note that you don't need to use any heavyweight Perl language mechanisms to capitalise on the idea of having an interface. You could simply have e.g. 3 different packages (for temp files, TCP/IP, Unix domain sockets), each of which exports the same set of methods. Choosing which implementation you want to use in your main program amounts to choosing which module to use.
Temporary files (and related things, like a shared memory region) are probably a bad bet. If you ever want to run your server on one machine and your clients on another, you will need to rewrite your application. If you pick any of the other options, at least the semantics are essentially the same if you need to switch between them at a later date.
My only real advice, though, is to not write this yourself. On the server side, you should use POE (or Coro, etc.), rather than doing select on the socket yourself. Also, if your interface is going to be RPC-ish, use something like JSON-RPC-Common from the CPAN.
Finally, there is IPC::PubSub, which might work for you.
Temporary files have other problems besides that. I think Internet sockets are really the best choice. They are well documented, and as you say, scalable and portable. Even if that is not a core requirement, you get it nearly for free. Sockets are pretty easy to deal with, and again there are copious amounts of documentation. You can build your data sharing mechanism and protocol out in a library and never have to look at it again!
UNIX domain sockets are portable across unices. It's no less portable than pipes. It's also more efficient than IP sockets.
Anyway, you missed a few options, shared memory for example. Some would add databases to that list but I'd say that's a rather heavyweight solution.
Message queues would also be a possibility, though you'd have to change a kernel option for it to handle such large messages. Otherwise, they have an ideal interface for a lot of things, and IMHO they are greatly underused.
I generally agree, though, that using an existing solution is better than building something of your own. I don't know the specifics of your problem, but I'd suggest you check out the IPC section of CPAN.
There are so many different options because most of them are better for some particular case, but you haven't really given any information that would identify your case.
Are readers/writers in a one-to-one relationship, or something more complicated?
What do you want to happen to the writer if the reader is no longer there or busy? And vice versa?
What other information do you have about your desired usage?
For "interactive" requests (holding the connection open while waiting for a response (asynchronously or not): HTTP + JSON. JSON::XS is insanely fast. Everyone and everything can speak HTTP and it's easy to load balance, debug, ...
For queued requests ("please do this, thanks!"): Beanstalkd and Beanstalk::Client. Serialize the requests in the beanstalk queue with JSON.
Thrift might also be worth looking into depending on your application.