What exactly does one mean when they say that rest services scale ?
Suppose I have a clustered environment with multiple instances of my application that exposes a rest service for some resources. Why does the rest service scale better than a traditional web service?
Thanks
REST, when applied correctly, scales well for many reasons:
One of the major scalability problems in distributed system is storing application state on the server-side and guaranteeing consistency and availability across all nodes. Being stateless, you don't need to worry about that storage, and without a lot of communication between servers it's easier to scale horizontally. Nodes can be added and removed at will, and any node can respond any requests.
With uniform interfaces, clients can deal with server-side changes more gracefully, and it's easier to deploy changes that help scalability. Many developers often have a lot of improvements ready to be deployed, but have to wait for clients to adapt to the changes won't break everything.
Also, with hypermedia it's easier to direct a client transparently between different services, which makes it easier to separate concerns, to scale each part according to demand, and to use a server that's better suited for a particular type of resource.
REST requires resources to explicitly declare their cacheability, and aggressive caching is not only encouraged but often necessary when you're using hypermedia. Good caching strategies may improve your scalability considerably.
But it should be clear that REST rarely is the solution for scalability problems. REST is a solution for the problems involved in the long-term evolution of services. If this isn't your major concern, you can adopt some ideas to improve scalability without having to adhere to REST strictly.
Related
In the past I worked with what I believe were "microservices". We had a service discovery and the applications communicated with each other through REST sync calls, which I believe is called request / response.
Now we are working on an application that is simply broken down into multiple small applications which work together through publish / subscribe using Kafka, there's no direct communication between them nor a service discovery, per say.
Is it safe to say that these are "Microservices" too or what should I call them?
As one commenter pointed out- An answer will always be opinionated (and stackoverflow is generally not the best place for opinion based questions), but I am not afraid to give mine in an attempt to help.
My view goes along the lines of what Martin Fowler and others have taught us in last decade, as well as the insight from the pioneering work of Werner Vogels at Amazon. Their definitions are that microservices are not a well defined single standard, but an architectural style that can encompass a range of different approaches in distributed computing. What they have in common are the goals of providing scalability in many different dimensions, but mostly system complexity, organizational productivity, throughput and resilience - All while being affordable. To achieve that the designs should be centered around:
Service boundaries cut according to business goals (a type of Domain Driven Design).
Sizing / Scope of services to be limited (the famous two-pizza team size idea).
Clear separation in terms of organizational responsibilities and maintenance. This includes independent teams and release-/ maintenance cycles for each service.
Embrace of automation on all layers (how cloud virtualization, devops, CICD, infrastructure as code, etc. became popular tools).
Design focus on failure-resilient operation, e.g. avoid cascading failures, defined failure states, fallbacks, etc.
A consequence of those goals is also the embrace of distributed system architectures and design for horizontal scalability (statelessness, separation of compute and storage, etc.).
So if you feel your designs are along those lines then in my opinion you would have applied the microservices architectural style successfully.
If you want to learn more about this school of thought I recommend this read.
I would like to understand, how microservices is different from creating separate, stand-alone service(like REST or SOAP).
For instance, I have to create a license system for my Webapp. This can be simply a separate REST service, which can be consumed by Webapp. Still it can be tweaked, scaled up/down, isolated from my Webapp,
Why it needs to be a microservice ?
What are the pros and cons of each approach ?
Microservice is just a buzzword and people have been practicing that architectural style before calling it microservice. Here are the commonly agreed attributes of a microservice. If your service fulfills those points, chances are you can call it a microservice:
Independent life cycles (development, testing, deployment, etc) with independent development teams (to utilize Conway's Law to your advantage).
Focused scope - Implement exactly one specific domain aspect to keep it as "micro" as possible. (Separation of concerns)
Clearly defined interface to the outside world. Typically with changes only versioned and phaseout time frames.
Scalable design. Often stateless on purpose to allow for deployment of application clusters.
No sharing of databases/persistence. This is important to avoid indirect dependencies.
Additionally some people like to include things like freedom of implementation technologies, infrastructure automation and asynchronous communication.
The cons of not following these practices may not be relevant if you are only a single developer or small team starting out. But if you have a large development organization you would start to suffer under the crippling "monolithis"-disease, which potentially slows down development and release cycles significantly.
For more detailed understanding I recommend reading the Martin Fowler material. Especially the section about trade-offs.
At the moment we have one huge API which is used by our backoffice, our frontend, and also our public API.
This causes me a lot of headaches because when building new endpoints I find a lot of application specific logic in the code which I don't necessarily want to include in my endpoint. For example, the code to create a user might contain code to send a welcome email, but because that's not needed for the backoffice endpoint I will then need to add a new endpoint without that logic.
I was thinking about a large refactor to break our code base in to a number of smaller highly specific service APIs, then building a set of small application APIs on top of those.
So for example, an application endpoint to create a new user might do something like this after the refactor:
customerService.createCustomer();
paymentService.chargeCard();
emailService.sendWelcomeEmail();
The application and service APIs will be entirely separate code bases (perhaps a separate code base per service), they may also be built using different languages. They will only interact through REST API calls. They will be on the same local network, so latency shouldn't be a huge issue.
Is this a bad idea? I've never seen/worked on a codebase which has separated the two before, so perhaps there is a better architecture to achieve the flexibility and maintainability I'm looking for?
Advise, links, or comments would all be appreciated.
Your idea of making multiple, well-defined services is sound and really it is the best way to approach this. Going with purely micro-services approach however trendy it might seem, proves to be an overkill most often than not. This is why I'd just redesign the existing API/services properly and follow solid and sound SOA design principles below. Good Resources could be found on both serviceorientation.com and soapatterns.org I've always used them as reference in my career.
Consider what types of services you need
(image from serviceorientation.com)
Entity services are generally your Client, Payment services - e.g. services centered around an entity in your domain. They should be business-agnostic, and be able to be reused in all scenarios. They could be called sometimes by clients directly if sufficient for their needs. They could be called by Task services.
Utility services contain logic you're likely to reuse in other services, but are generally not called by the clients directly. Rather, they'd be called by Task and Entity services. An example might be a Transliteration service.
Task services combine and reuse Entity and Utility services into meaningful tasks. Most often they are not that agnostic and they do implement some specific business logic. They have meaningful business operations and they are what clients mostly call.
Principles to follow when redesigning
I strongly recommend going over this cheat sheet and making sure everything there is covered when you do your redesign. It's great help.
In general, you should make sure that:
Each service has a common context and follows the separation of concerns principle. E.g. Clients service is only for clients related operations, etc.
Each of the Entity and Utility services is business-agnostic and basic enough. So it can be reused in multiple scenarios and context without being changed. Contract must be simple - CRUD and only common operations that make sense in most usage scenarios.
Services follow a common data model - make sure all the data structures you use are used uniformly in all services in order to prevent need for integration efforts in the future and promote combination of services for clients to exploit. If you need to receive a customer that another service returns, this should be happening without the need for transformation
OK, but where to put the non-agnostic logic?
Now, you have multiple options for abstracting business logic whenever you have a need for complex business functionality. It depends on your scenario what you're going to chose:
Leave logic to all clients. Let them combine your simplified services
If there is business logic that is commonly implemented in multiple of your applications and has the potential to be reused heavily you can implement a composite service that reuses multiple existing underlying services and exposing the logic.
Service Composability. Concerns on multiple API calls communication overhead.
Well, this is an age-old question - should you make multiple API calls when they will probably create some communication overhead? The answer is - it depends on how complex your scenario is, how much reuse you expect and how flexible you want to be. Also is speed critical? To what extent? In Service Oriented Architecture though, this is a very common approach - to reuse your existing services and combine them in new configurations as needed. Yes, it does add some overhead, but I've seen implementations in very complex environments, for example Telecoms, where thanks to the use of ESB solutions, message queues, etc the overhead is negligible compared to the benefits. Here is a common architecture approach (image from serviceorientation.com):
The mandatory legacy refactoring heads-up
More often than not, changing the established contract for multiple existing client systems is a messy business and could very well lead to lots of refactoring and need for looking for needle-in-a-stack functionality that's somewhere deep in the (possibly) legacy code. Business logic might be dispersed everywhere. So make sure you're ready and have the controls, time and will to lead this battle.
Hope this helps
Is this a bad idea?
No, but this is a big overall question to be able to provide very specific advice.
I'd like to separate this into 3 areas:
Approach
Design
Technology
Working backwards, the Technology is the final and most-specific part, and totally depends on what your current environment is (platforms, skills), and (hopefully) will be reasonable self-evident to you once the other things are in progress.
The Design that you outlined above seems like a good end-state - having multiple, specific, focused APIs, each with their own responsibility. Again, the details of the design will depend on the skills of you and your organization, and the existing platforms that you have. E.g. if you are already using TIBCO (for example) and have a lot invested (licenses, platforms, tools, people) then leveraging some of their published patterns/designs/templates makes sense; but (probably) not if you don't already have TIBCO exposure.
In the abstract, the REST API services seems like a good starting point - there are a lot of tools and platforms at all levels of the system for security, deployment, monitoring, scalability, etc. If you are NGINX users, they have a lot of (platform-independent) thoughts on how to do this also NGINX blog, including some smart thinking on scalability and performance. If you are more adventurous, and have an smart, eager team, a look at Event-driven architecture - see this
Approach (or Process) is the key thing here. Ultimately, this is a refactoring, though your description of "a large refactor" does scare me a little - put that way, it sounds like you are talking about a big-bang change and calling it refactoring. Perhaps it is just language, but what's in my mind would be "an evolution of the 'one huge API' into multiple, specific, focused APIs (by refactoring the architecture)". One place to start is Martin Fowler, while this book is about refactoring software, the principles and approach are the same, just at a higher-level. Indeed, he talks about just this here
IBM talk about refactoring to microservices and make it sound easy to do in one step, but it never is (outside the lab).
You have an existing API, serving multiple internal and external clients. I will suggest that you'll want to keep this interface solid for these clients - separate your refactoring of the implementation from the additional concerns of liaising with and coordinating external systems/groups. My high-level starting approach would be:
identify a small (3-7) number of related methods on the API
ideally if a significant, limited-scope change is needed anyway with these methods, that is good - business value with the code change
design/specify a new stand-alone API specifically for these methods
at first, clone the existing model/naming/style
code a new service just for these
with proper automated CI/CD testing and deployment practices
with associated monitoring
modify the existing API to have calls to these methods re-direct to call the new service
perhaps have a run-time switch to change between the old implementation and the new implementation
remove the old implementation from codebase
capture issues, assumptions and problems along the way
the first pass will involve a lot of learning about what works and doesn't.
then repeat the process over & over, incorporating improvements each time.
At some point in the future, when appropriate due to other business-driven needs, the API published to the back-end, front-end and/or public clients can change, but that is a whole different project.
As you can see, if the API is huge (1,000 methods => 140 releases) this is a many-months process, and having a reasonably frequent release schedule is important. And there may be no value improving code that works reliably and never changes, so a (potentially) large portion of the existing API may remain, just wrapped by a new API.
Other considerations:
public API? Maybe a new version (significant changes) will be needed sooner than the internal APIs
focus on the methods/services used by it
what parts/services change the most (have the most enhancement requests approved)
these are the bits most likely to change, and could benefit most from a better process/architecture
what are future plans for change and where would the API be impacted
e.g. change to user management, change to payment processors, change to fulfilment systems
e.g. new business plans (new products/services)
consider affected methods in the API
Also see:
Using Microservices for Legacy System Modernization
Migrating From a Monolith to APIs and Microservices
Break the Monolith! Loosely Coupled Architecture Brings DevOps Success
From the CEO’s Desk: Application Modernization – Assess, Strategize, Modernize! 9
[Microservices Architecture As A Large-Scale Refactoring Tool 10
Probably the biggest 4 pieces of advice that I can give is:
think refactoring: small changes that don't affect function
think agile: small increments that are valuable, testable, achievable
think continuous: have a vision for where you will (eventually) get to, then work the process continuously
script & automate the processes from code, documentation, testing, deployment, monitoring...
improving it every time!
you have an application/API that works - keep it working!
That is always the first priority (you just need to work to carve-out time/budget for maintenance)
Not a bad idea at all.
Also what are your looking is microservices arch. and with that the question comes is how you break your system into well defined services.
We use Domain Driven Design Arch. to break our system into microservices and lagom framework , which allows every service to be in diff. code base and event driven arch. between microservices.
Now lets look at your problem at low level: you said a service contains code like creating a user and sending a email and one with just creating a user and there might be other code as well.
First we need to understand how many type of code you are writing:
Domain Object Logic (eg: User Object) -- what parameters are valid and all -- this should be independent of service endpoint and should be encapsulated in one Class like user class and we say it an Aggregate in Domain Driven Design terms
Business Reactions -- like on user creation send a email -- using event driven arch. these type of logics are separated into process managers or sagas which could most cases work conditionally like a for user created externally send a mail and for user created internally send a email , by having extra data in the event
Also the current way you are doing it , how are you handling transaction across services???
I see more and more software organizations using gRPC in their service-oriented architectures, but people are also still using REST. In what use cases does it make sense to use gRPC, and when does it make sense to use REST for inter-service communication?
Interestingly, I've come across open source projects that use both REST and gRPC. For instance, Kubernetes and Docker Swarm all employ gRPC to some extent for cluster coordination, but also expose REST APIs for interfacing with master/leader nodes. Why not use gRPC up and down?
Edit: for a summary of my own thoughts on this topic, see
https://medium.com/#natemurthy/rest-rpc-and-brokered-messaging-b775aeb0db3
When done correctly, REST improves long-term evolvability and scalability at the cost of performance and added complexity. REST is ideal for services that must be developed and maintained independently, like the Web itself. Client and server can be loosely coupled and change without breaking each other.
RPC services can be simpler and perform better, at the cost of flexibility and independence. RPC services are ideal for circumstances where client and server are tightly coupled and follow the same development cycle.
However, most so-called REST services don't really follow REST at all, because REST became just a buzzword for any kind of HTTP API. In fact, most so-called REST APIs are so tightly coupled, they offer no advantage over an RPC design.
Given that, my somewhat cynical answers to your question are:
Some people are adopting gRPC for the same reason they adopted REST a few years ago: design-by-buzzword.
Many people are realizing the way they implement REST amounts to RPC anyway, so why not go with an standardized RPC framework and implement it correctly, instead of insisting on poor REST implementations?
REST is a solution for problems that appear in projects that span several organizations and have long-term goals. Maybe people are realizing they don't really need REST and looking for better options.
Depending on the gRPC's future roadmap, people will continue migrating to it and letting REST (over HTTP) "quiet".
gRPC is more convenient in many ways:
Usually fast (like super-fast)
(Almost) No "design dichotomy" ― what's the right end-point to use, what's the right HTTP verb to use, etc.
Not dealing with the messy input/response serialization baloney as gRPC deals with the serialization ― more efficient data encoding and HTTP/2 which makes things go faster with multiplexed requests over a single connection and header compression
Define/Declare your input/response and generate reliable clients for different languages (of course, the ones that are "supported", this is a HUGE advantage)
Formalized set of errors ― this is debatable but so far they are more directly applicable to API use cases than the HTTP status codes
In any case, you will have to deal with all the gRPC troubles also since nothing in this world is infalible, but so far it "looks better" than REST ― and has actually proven that.
I think you can have the best of both worlds. In any case gRPC largely follows HTTP semantics (over HTTP/2) but explicitly allow for full-duplex streaming, diverging from typical REST conventions as it uses static paths for performance reasons during call dispatch as parsing call parameters from paths ― query parameters and payload body adds latency and complexity.
The promise of REST has always been a uniform interface. An ideal REST client would be able to talk to a wide range of RESTful resources, even those that did not exist when the client was coded.
Unfortunately, this ideal has never really materialized except for REST’s original case — the World Wide Web of human-readable documents.
At this point, most interfaces that call themselves “RESTful” are really a baroque kind of RPC, where request and response data is smeared over methods, query strings, headers, status codes, payloads, all in a variety of fragile formats.
Most of the uniformity in today’s “RESTful” interfaces is in the heads of developers. They “know” that POST /orders/ is probably going to add a new order. But they still have to program their clients to “know” that, for every API they talk to, often making lots of errors.
Still, there is some uniformity that can actually be useful in code. For example, if you have a “RESTful” API, then you can often add a transparent, finely tunable caching layer to it nearly for free. This is possible because (semantically correct) HTTP messages already carry all the standardized information needed for caching: request method, URL, status code, Cache-Control, Vary and all that. In gRPC, you have to roll your own caching.
But the real reason for the current dominance of “REST” is not this sort of minor affordances. It’s really just the success of the World Wide Web. At some point in history, it transpired that everyone already had a performant, flexible HTTP server (to serve their Web site) and a solid HTTP client (to view said site), so when people started adding machine-readable resources, it was just easier and cheaper to stick to the same HTTP ways. They used HTTP methods and headers and status codes because that’s what their Web servers already understood and logged. Tools like PHP allowed them to do this with zero deployment overhead over their regular Web sites.
If uniformity and alignment with the World Wide Web are not important for you, then RPC is a tried and true architectural choice, and gRPC is a solid implementation that can save you some trouble, as ɐuıɥɔɐɯ explains.
I am about to work on an application that need to handle high transactions per second (around 5000). The application needs to provide REST interface, with server side technologies including EJB, JMS, Infinispan.
Could you please let me know briefly the best practices that need to be considered while designing the application that needs to handle high volume of requests. Also please point to some resources which give more information.
Some of the factors that comes to mind are
1. Asynchronous processing (Servlet 3.0) support for REST
2. Stateless, as suggested by JB Nizet
3. Optimizing Data access(Second level hibernate cache, Cache read only data, proper column indexing)
4. Queuing long running tasks (JMS)
A lot of it is going to come down to specifics, as #JBNizet said.
How much hardware can you throw at the problem? Again, per #JBNizet, a stateless API scales really nicely in a cluster. Enough hardware can solve any performance problem - it becomes a cost tradeoff between solving performance in hardware vs. software.
Can you design your resources so they are cache friendly? Anything you can cache will save you a boatload.
Can you set up a reverse-proxy server like Squid to keep cached calls out of your data access layer?
Absolutely look at configuring your L2 cache and optimizing your database access layer. You'll want to do some reading on caching in a cluster .. there are different ways to keep your clustered caches in sync.
At a certain point, this becomes a a performance testing problem, too. Automate a suite of test calls to the API, hook up JVisualVM, and see where your bottlenecks are. Keep an eye on Hibernate, in particular. You may be better off in some cases with using custom SQL, especially if there are large differences between your data layer and your API layer.