How to make a forward compatible (Service Fabric) microservice - deployment

According to the Service Fabric rolling upgrades documentation:
During the upgrade, the cluster may contain a mix of the old and new versions. For that reason, the two versions must be forward and backward compatible.
I know how to make a microservice backward compatible so an old client can talk to a new server. But how can an old server be (forward) compatible with a new client, which can try to call a newly introduced endpoint?
The documentation follows with:
If they are not compatible, the application administrator is responsible for staging a multiple-phase upgrade to maintain availability
Is a multiple-phase upgrade the only way of achieving high availability when introducing new endpoints in a microservice? Or can it be achieved with the default rolling upgrade process, maybe by routing the calls from new clients to new servers?

It is not about introducing new endpoints. It's about handling existing endpoints.
Let's say you have an endpoint E1 which returns 2 fields, F1 and F2. Now you introduce a new feature that requires E1 to return 3 fields: F1, F2, F3.
If you're doing a rolling upgrade, you have both old and new clients talking to both old and new servers. If, in a particular exchange, an old client gets connected to a new server and breaks on seeing the new field F3, the system is NOT forward compatible.
So forward compatibility means that old code (here, the old client) can read data produced by new code (here, the new server).
For example, while parsing JSON using Jackson, we can specify @JsonIgnoreProperties(ignoreUnknown = true), which will ignore new properties. Similarly, with other data encoding formats like Apache Thrift or Protocol Buffers, we can add new fields while maintaining forward compatibility, but if we remove existing fields, compatibility breaks. With Avro, things are a little easier as the schema used to encode the data is available together with the data.
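As a minimal sketch of that Jackson approach (the class and field names below are made up for illustration), an old client's DTO can be marked so that fields added by a newer server are simply skipped:

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.ObjectMapper;

// The old client's view of the E1 response: it only knows F1 and F2.
// ignoreUnknown = true makes Jackson skip F3 (or anything else a newer
// server adds later), keeping the old client forward compatible.
@JsonIgnoreProperties(ignoreUnknown = true)
class E1Response {
    public String f1;
    public String f2;
}

class ForwardCompatDemo {
    public static void main(String[] args) throws Exception {
        // JSON produced by the *new* server, which already returns F3.
        String newServerJson = "{\"f1\":\"a\",\"f2\":\"b\",\"f3\":\"c\"}";
        E1Response parsed = new ObjectMapper().readValue(newServerJson, E1Response.class);
        System.out.println(parsed.f1 + ", " + parsed.f2); // F3 is silently ignored
    }
}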
Martin Kleppmann's book Designing Data-Intensive Applications has a detailed chapter on this.

Related

Should an HTTP REST request to the server return data in the specific format expected by the client?

If using a library on the client side that expects data in a specific format (e.g. [{id: 1, name: "Jack", available: true}]), should the server produce the data in the exact structure requested by the client, or send back generic data (e.g. [{userId: 1, username: "Jack", isUserAvailable: true}]) which can then be modeled on the client side, to avoid tight coupling and breakage if the client-side library changes in the future?
Clients have to depend on the interface that the server provides. This interface can be seen as a contract that the server and client agree to. It does imply the data structures provided and is therefore a form of coupling. Hence the need to clearly describe / define APIs and to have a policy for versioning and obsoleting them.
So this may seem like tight coupling at first glance, but it does not have to be. Client and server may or may not use the same language / representation of the data. The client is free to do whatever it wants with the JSON in this example. It may use all the data or just a single attribute. All of that is of no concern to the server. Similarly, the client is not concerned with how the server created this JSON string. Because the service contract only describes the interface, and because of the resulting freedom for server / client implementations, the coupling can be considered loose (enough).
If you have legacy systems or are unable to change formats or need to support specific clients which cannot be changed, then consider an API gateway that can do transforms for you at different endpoints.
But normally, model your API as open as possible. You can hardly define it for every different client out there.
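As a small illustration of keeping the client model decoupled from the server's generic field names (all class and field names here are invented for the example), the client can translate the payload into its own shape with Jackson annotations instead of asking the server to change:

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;

// Client-side model shaped the way the client-side library wants it,
// mapped from the server's more generic field names.
@JsonIgnoreProperties(ignoreUnknown = true)
class UserViewModel {
    @JsonProperty("userId")
    public long id;

    @JsonProperty("username")
    public String name;

    @JsonProperty("isUserAvailable")
    public boolean available;
}

If the client-side library changes later, only this mapping class changes; the server contract stays the same.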

How to keep state consistent across distributed systems

When building distributed systems, it must be ensured that the client and the server eventually end up with a consistent view of the data they are operating on, i.e. they never get out of sync. Extra care is needed, because the network cannot be considered reliable. In other words, in the case of a network failure, the client never knows whether the operation was successful, and may decide to retry the call.
Consider a microservice which exposes a simple CRUD API, and an unbounded set of clients, maintained in-house by the same team, by different teams, and also by different companies.
In the example, the client requests the creation of a new entity, which the microservice successfully creates and persists, but the network fails and the client connection times out. The client will most probably retry, unknowingly persisting the same entity a second time. Here is one possible solution to this that I came up with:
Use a client-generated identifier to prevent a duplicate POST
This could mean the primary key as-is, the client half of a client- and server-generated composite key, or a token issued by the service. The service would either persist the entity or reply with an OK message in case an entity with that identifier is already present.
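A rough sketch of that server-side behaviour, with all names hypothetical, might look like this:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Idempotent create: the client supplies the identifier, so a retry of the
// same request finds the entity already stored and simply gets an OK back.
class EntityService {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    // Persist the entity, or treat the call as a no-op if an entity with
    // this identifier already exists; either way the caller gets a success reply.
    void createEntity(String clientGeneratedId, String payload) {
        store.putIfAbsent(clientGeneratedId, payload);
    }
}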
But there is more to this: what if the client gives up after the network failure (even though the entity got persisted), mutates its internal view of the entity, and later decides to persist it in the service with the same id? At this point, and in general, would it be reasonable for the service to just silently:
Update the existing entity with the state that the client posted
Or should the service answer with some more specific status code about what happened? The point is, the developer of the service can't really influence the clients' design decisions.
So, what are some sensible practices to keep the state consistent across distributed systems and avoid the most common pitfalls in the case of network and system failure?
There are some things that you can do to minimize the impact of the client-server out-of-sync situation.
The first measure that you can take is to let the client generate the entity IDs, for example by using GUIDs. This prevents the server from generating a new entity every time the client retries a CreateEntityCommand.
In addition, you can make the command handling idempotent. This means that if the server receives a second CreateEntityCommand, it just silently ignores it (i.e. it does not throw an exception). Whether this is possible depends on the use case; some commands cannot be made idempotent (like updateEntity).
Another thing that you can do is to de-duplicate commands. This means that every command that you send to a server must be tagged with a unique ID, which can also be a GUID. When the server receives a command with an ID that it has already processed, it ignores it and gives a positive response (i.e. 200), maybe including some meta-information about the fact that the command was already processed. The command de-duplication can be placed at the top of the stack, as a separate layer, independent of the domain (i.e. in front of the Application layer).
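A minimal sketch of such a de-duplication layer (the names DeduplicatingHandler, commandId and so on are assumptions for illustration, and a real service would keep the processed IDs in durable storage rather than in memory):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Wraps any command handler and drops commands whose unique ID has been seen
// before, so a retried command is acknowledged without being executed twice.
class DeduplicatingHandler<C> {
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    private final Consumer<C> inner;

    DeduplicatingHandler(Consumer<C> inner) {
        this.inner = inner;
    }

    // commandId is the client-supplied GUID that tags the command.
    void handle(String commandId, C command) {
        if (!processedIds.add(commandId)) {
            return; // already processed: ignore and report success to the caller
        }
        inner.accept(command);
    }
}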

Hazelcast Spring-Data write-through

I am using Spring Boot and Spring Data JPA with a Hazelcast client/server topology. In parts of my test application I am measuring the time taken for CRUD operations on the client side (the server is the one interacting with a relational database). I configured the map (MapStore) to be write-behind by setting write-delay-seconds to 10.
Spring Data's save() returns the persisted entity. In the client app, therefore, the application flow is blocked until the server returns the persisted entity.
I would like to know whether there is an alternative in which the client does NOT have to wait for the entity to be persisted. I was under the impression that once new data is stored in the Map, persisting to the backend happens asynchronously, so the client app would NOT have to wait.
Map config in hazelcast.xml:
<map name="com.foo.MyMap">
    <map-store enabled="true" initial-mode="EAGER">
        <class-name>com.foo.MyMapStore</class-name>
        <write-delay-seconds>10</write-delay-seconds>
    </map-store>
</map>
@NeilStevenson I don't find your response particularly helpful. I asked in an earlier post about where and how to generate the Map keys. You pointed me to the documentation, which fails to shed any light on this topic. The same goes for the Hazelcast (and other) examples.
The point of having the cache in the first place is to avoid hitting the database. When we add data (via save()), we also need to generate a unique key for the Map. This key also becomes the Entity.Id in the database table. Since, again, it's the Hazelcast client that generates these IDs, there is no need to wait for the record to be persisted in the backend.
The only reason to wait for save() to return the persisted object would be to catch any exceptions, NOT to obtain the ID.
That, unfortunately, is how it is meant to work; see https://docs.spring.io/spring-data/commons/docs/current/api/org/springframework/data/repository/CrudRepository.html#save-S-.
Potentially the external store mutates the saved entry in some way. Although you know it won't do this, there isn't a variant of save defined that skips returning the saved entity.
So the answer seems to be that this is not currently available in the general-purpose Spring repository definition. Why not raise a feature request with the Spring Data team?
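In the meantime, one possible workaround, if blocking on the returned entity is the only concern, is to bypass the repository for writes and put the entry into the Hazelcast map directly; with write-behind configured, persistence to the database still happens later. A rough sketch, assuming Hazelcast 3.x imports and a hypothetical MyEntity type:

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

import java.util.UUID;

class MyEntity { }  // placeholder for the application's entity type

class MyMapWriter {
    private final HazelcastInstance hazelcast;

    MyMapWriter(HazelcastInstance hazelcast) {
        this.hazelcast = hazelcast;
    }

    // The client generates the key itself, so nothing needs to come back from
    // the store; IMap.set() returns void and does not fetch the old value.
    UUID write(MyEntity entity) {
        IMap<UUID, MyEntity> map = hazelcast.getMap("com.foo.MyMap");
        UUID id = UUID.randomUUID();
        map.set(id, entity);
        return id;
    }
}

This trades away whatever the MapStore might do to the entity on save, so it only fits when, as described above, the key is generated on the client and the persisted entity is not expected to differ from what was written.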

Rest Security Ensure Resource Delete

Background: I'm a new developer fresh out of college at a company that uses the RPC architectural style for a lot of its internal services. They also seem to change which tool they use behind the scenes pretty frequently, so the tight coupling between the client and server implementations in RPC is problematic. I was tasked with rewriting one of the services, and I feel a RESTful API would be a good match because the backing technology can only deal with files anyway, but I have a few questions.
My understanding of REST so far is that you break operations up as much as possible and shift the focus to resources, so the client and the server together make a state machine, with the server mainly handling the transitions through hypermedia.
Example: say you have a service that takes a file and splits it in two byte-wise. I would design the sequence for this like:
1. The client POSTs the file they want split; the server splits the file, writes both result pieces to a temp folder, and returns that the client should GET the pieces, along with both files' URIs.
2. The client sends a GET for a piece; the server returns the piece and indicates that the client should DELETE the URI.
3. The client sends a DELETE for the URI.
Steps 2 and 3 are done for both pieces. My question is: how do you ensure that the pieces get deleted at the end?
A client could just not follow step 3. If you combine steps 2 and 3, a malicious (or negligent) client could just stop after step 1. But if you combine them all, isn't that just RPC over HTTP?
If the 2 pieces in question are inseparable, then they are in fact just properties of a single resource.
And yes, if a POST/PUT must be followed by a DELETE, then you're probably just trying to shoehorn RPC into a REST-style architecture.
There's no real definition of what "REST" actually is, but the one thing certain about it is that it MUST be stateless; i.e. every separate request must be self-sufficient: it cannot depend on a previous request, and it cannot mandate subsequent requests.
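To illustrate the single-resource point, here is a hedged sketch (the endpoint, class and field names are invented) of a split operation whose response carries both pieces, so the exchange is self-sufficient and there is nothing left on the server for the client to DELETE:

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

import java.util.Arrays;

// Both halves are returned as properties of the one response resource,
// so no server-side temp files (and no follow-up DELETE) are needed.
class SplitResult {
    public byte[] firstHalf;
    public byte[] secondHalf;
}

@RestController
class SplitController {

    @PostMapping("/splits")
    public SplitResult split(@RequestBody byte[] file) {
        int mid = file.length / 2;
        SplitResult result = new SplitResult();
        result.firstHalf = Arrays.copyOfRange(file, 0, mid);
        result.secondHalf = Arrays.copyOfRange(file, mid, file.length);
        return result;
    }
}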

Workarounds for adding new fields to existing output data type in SOAP

According to this article about backwards compatibility in SOAP by IBM, new fields cannot be added to output types without breaking the contract. The relevant snippet from the page is from the section titled New, optional fields in an existing data type...
You can add an element to an existing complexType as long as you make it optional (using the minOccurs="0" attribute). But be careful. Adding an optional element is a minor change only if its enclosing complexType is received as input to the new service. The new service cannot return a complexType with new fields. If an old client were to receive the new field, the client deserialization would fail because the client would not know about the new field.
This was written in 2004 for the WSDL 1.1 spec. Is this still true under the current WSDL 1.2 spec? Is there no way to define a default behavior of "ignore" for new unknown fields? This statement also seems implementation-specific, or is that per the spec?
I am trying to contend with the issue of evolving a SOAP service that returns complex business objects. New fields will be added as consumers find use cases for them. I would like to avoid having to keep N versions of the service around simply for adding new fields.
From my personal experience this is still the case. I think your main concern is the versioning methodology. You can look at http://www.ibm.com/developerworks/webservices/library/ws-version/, or, closer to home, Web Services API Versioning.