API Management node goes offline - ibm-cloud

We are working with the API Management service for Bluemix. When the management node goes offline, are the APIs still available because they are served by the Gateway (DataPower) nodes? We are aware that an option is to have a second management node.

If the Management node goes offline, the APIs are still available for use, as they are served by the Gateway (DataPower) nodes. However, you will not be able to publish new APIs or manage the existing APIs. The analytics records will be batched for a period of time and passed to the Management node when it returns. Depending on the length of the Management node outage and the volume of API calls, it is possible that the buffer might overrun, in which case the oldest analytics records will begin to be discarded.

Related

How to spin up/down workers programmatically at run-time on Kubernetes based on new Redis queues and their load?

Suppose I want to implement this architecture deployed on Kubernetes cluster:
Gateway
Simple RESTful HTTP microservice accepting scraping tasks (URLs to scrape along with postback URLs)
Request Queues - Redis (or other message broker) queues created dynamically per unique domain (when a new domain is encountered, the gateway should programmatically create a new queue; if a queue for the domain already exists, just place the message in it).
Response Queue - Redis (or other message broker) queue used to post Worker results as scraped HTML pages along with postback URLs.
Workers - worker processes which should spin up at runtime when a new queue is created and scale down to zero when the queue is emptied.
Response Workers - worker processes consuming the response queue and sending postback results to the scraping client (these should also be able to scale down to zero).
I would like to deploy the whole solution as dockerized containers on Kubernetes cluster.
So my main concerns/questions would be:
Creating Redis or other message broker queues dynamically at run-time via code. Is it viable? Which broker is best for that purpose? I would prefer Redis if possible, since I've heard it's the easiest to set up and it supports massive throughput; ideally my scraping tasks will be short-lived, so I think Redis would be okay if possible.
Creating Worker consumers at runtime via code - I need some kind of Kubernetes-compatible technology which would be able to react to a newly created queue, spin up a Worker consumer container listening to that queue, and later scale it up/down based on the load of that queue. Any suggestions for such technology? I've read a bit about Knative and its Eventing mechanism - would it be suited for this use case? I don't know whether I should continue investing my time in reading its documentation.
Best tools for Redis queue management/Worker management: I would prefer C# and Node.js tooling. Would something like Bull for Node.js be sufficient? Ideally, though, I would want to produce queues and messages in the Gateway using C# and consume them in Node.js (Workers).
If you mean vertical scaling, it definitely won't be a viable solution, since it requires pod restarts. Horizontal scaling is more viable, but you need to consider that even spinning up new nodes or pods takes some time; it is always suggested to have proper resources in place for serving your upcoming traffic, otherwise this delay will affect some features of your application and there might be a business impact. Just having autoscalers isn't enough; you should also have proper metrics in place for monitoring your application.
This documentation details how to scale your Redis and worker pods respectively using the KEDA mechanism. KEDA stands for Kubernetes Event-driven Autoscaling; it is a component which sits on top of existing Kubernetes primitives (such as the Horizontal Pod Autoscaler) to scale any number of Kubernetes containers based on the number of events which need to be processed.
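To make the dynamic per-domain queue idea from the question concrete, here is a minimal sketch using Bull in TypeScript (the queue naming scheme, Redis URL, and helper names are illustrative assumptions, not something prescribed by the answer above); KEDA could then scale the workers that consume each such queue.

```typescript
// Minimal sketch: one Bull queue per domain, created lazily on first use.
// Queue names, the Redis URL, and the helper functions are illustrative assumptions.
import Queue from "bull";

const redisUrl = process.env.REDIS_URL ?? "redis://localhost:6379";
const queues = new Map<string, Queue.Queue>();

// Gateway side: create (or reuse) the queue for a domain, then enqueue the task.
function queueForDomain(domain: string): Queue.Queue {
  let q = queues.get(domain);
  if (!q) {
    // Instantiating a Queue object is cheap; the underlying Redis keys appear with the first job.
    q = new Queue(`scrape:${domain}`, redisUrl);
    queues.set(domain, q);
  }
  return q;
}

export async function enqueueScrape(url: string, postbackUrl: string): Promise<void> {
  const domain = new URL(url).hostname;
  await queueForDomain(domain).add({ url, postbackUrl });
}

// Worker side: a worker started for a given domain processes just that one queue.
export function startWorker(domain: string): void {
  queueForDomain(domain).process(async (job) => {
    // ...fetch job.data.url, push the scraped HTML plus postback URL to the response queue...
    return { done: true, url: job.data.url };
  });
}
```

Note that with a queue per domain, each dynamically created queue also needs its own scaling configuration on the KEDA side, which is worth weighing against a single shared queue.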

What is a common strategy for synchronized communication between replicas of the same Pods?

Let's say we have the following apps:
API app : Responsible for serving the user requests.
Backend app: Responsible for handling the user requests which are long-running tasks. It updates the progress in a database (Postgres) and a distributed cache (Redis).
Both apps are scalable services. A single Backend app handles multiple tenants (customers here), but one customer is assigned to a single backend app only.
I have a use case where I need the API layer to connect to the specific replica which is handling that customer. Is there a common pattern for this?
A few strategies in mind:
Pub/Sub: the problem is we want a synchronous, guaranteed response, probably using Redis
gRPC: using the pod IP to connect to a specific pod is not a standard way
Creating a Service at runtime by adding labels to the replicas and using those. -- Looks promising
Do let me know if there is a common pattern, an example architecture for this, or a standard way of doing it.
Note: [The above is a simulation of a production use case; names and the actual use case have been changed]
You should aim to keep your services stateless; in a Kubernetes environment there is no telling when one pod might be replaced by another due to worker node maintenance.
If you have long-running tasks that cannot be completed within the configured grace period for pods to shut down during a worker node drain/evacuation, you need to implement some kind of persistent work queue, as you are thinking about in option 1. I suggest you look into the saga pattern.
Another pattern we usually employ is to let the worker service write the current state of the job into the database and let the client poll the status every few seconds. This does, however, require some way of handling half-finished jobs that might be abandoned by pods that are forced to shut down.
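A minimal sketch of that write-state-then-poll pattern, assuming Express and ioredis (the /jobs endpoints, key names, and job-id scheme are hypothetical):

```typescript
// Sketch: the worker writes job state to Redis; the client polls an API endpoint for it.
// Endpoint paths, key names, and the id scheme are illustrative assumptions.
import express from "express";
import Redis from "ioredis";

const app = express();
const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

app.use(express.json());

// API layer: accept a long-running task, hand it to the backend via a queue,
// and return a job id immediately instead of holding the request open.
app.post("/jobs", async (req, res) => {
  const jobId = Date.now().toString(36) + Math.random().toString(36).slice(2, 8);
  await redis.hset(`job:${jobId}`, "status", "queued", "progress", "0");
  await redis.lpush("job-queue", JSON.stringify({ jobId, payload: req.body }));
  res.status(202).json({ jobId });
});

// API layer: the client polls this endpoint every few seconds for progress.
app.get("/jobs/:id", async (req, res) => {
  const state = await redis.hgetall(`job:${req.params.id}`);
  if (Object.keys(state).length === 0) {
    res.status(404).end();
    return;
  }
  res.json(state); // e.g. { status: "running", progress: "40" }
});

// Any backend replica updates the same hash as it makes progress:
//   await redis.hset(`job:${jobId}`, "status", "running", "progress", "40");

app.listen(8080);
```

This sidesteps the need for the API layer to reach a specific replica at all, which is usually preferable to creating per-replica Services.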

Request buffering in Kubernetes clusters

This is a purely theoretical question. A standard Kubernetes cluster is given with autoscaling in place. If memory goes above a certain targetMemUtilizationPercentage then a new pod is started, and it takes on the flow of requests that is coming to the contained service. The number of minReplicas is set to 1 and the number of maxReplicas is set to 5.
What happens when the number of pods that are online reaches the maximum (5 in our case) and requests from clients are still coming towards the node? Are these requests buffered somewhere or are they discarded? Can I take any actions to avoid request loss?
Natively, Kubernetes does not support message queue buffering. Depending on the scenario and setup you use, your requests will most likely time out. To manage them efficiently you'll need a custom resource running inside the Kubernetes cluster.
In these situations it is very common to use a message broker, which ensures that communication between microservices is reliable and stable, that the messages are managed and monitored within the system, and that messages don't get lost.
RabbitMQ, Kafka and Redis appear to be the most popular, but choosing the right one will heavily depend on your requirements and the features needed.
Worth noting, since Kubernetes essentially runs on Linux, is that Linux itself also manages/limits incoming requests at the socket level. You may want to read more about it here.
Another thing is that if you have pod limits set, or a lack of resources, it is likely that pods will be restarted or the cluster will become unstable. Usually you can prevent this by configuring some kind of "circuit breaker" to limit the amount of requests that can reach the backend without overloading it. If the amount of requests goes beyond the circuit breaker threshold, the excess requests will be dropped.
It is better to drop some requests than to have a cascading failure.
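As an illustration of that kind of load shedding, a minimal per-instance in-flight request cap might look like this (the limit of 100, the Express setup, and the /work route are assumptions for the sketch, not part of the answer):

```typescript
// Sketch of a crude per-instance "circuit breaker": cap in-flight requests
// and shed the excess with 503 instead of letting the backend overload.
// The limit, port, and route are illustrative assumptions.
import express from "express";

const MAX_IN_FLIGHT = 100;
let inFlight = 0;

const app = express();

app.use((req, res, next) => {
  if (inFlight >= MAX_IN_FLIGHT) {
    // Shed load early; clients can retry after a short backoff.
    res.status(503).set("Retry-After", "1").send("Overloaded, try again later");
    return;
  }
  inFlight++;
  let released = false;
  const release = () => {
    if (!released) {
      released = true;
      inFlight--;
    }
  };
  res.on("finish", release); // response fully sent
  res.on("close", release);  // connection closed or aborted
  next();
});

app.get("/work", async (_req, res) => {
  // ...actual request handling goes here...
  res.send("ok");
});

app.listen(8080);
```

The Knative queue-proxy mentioned in the answer below enforces a similar per-Pod limit, but it can also hold requests and wait for scale-up rather than dropping them.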
I managed to test this scenario and I get 503 Service Unavailable and 403 Forbidden on my requests that do not get processed.
Knative Serving actually does exactly this. https://github.com/knative/serving/
It buffers requests and informs autoscaling decisions based on in-flight request counts. It can also enforce a per-Pod maximum of in-flight requests and hold onto requests until newly scaled-up Pods come up, at which point Knative proxies the requests to them; it does this with a container named queue-proxy that runs as a sidecar in its workload type called "Service".

Stateful service Service Fabric app - remoting and custom state-saving provider

I'm writing a first Azure Service Fabric app applying partitioning to stateful services. I have a few questions:
Can I use remoting instead of HTTP to communicate from my web API to my partitions? The Azure example uses HttpCommunicationListener and I've not been able to see how to use remoting. I would expect remoting to be faster?
Can I persist my state for a given partition using a custom state persistence provider? Will that still be supported by the replication features of service fabric?
Can my stateful service partition save several hundred megabytes of state?
Examples/guidance pointers for above would be greatly appreciated.
Thanks
You can use SF remoting within the cluster to communicate between services and actors. HTTP access is usually used to communicate with services from outside the cluster (but you can still use it from within).
Yes, you can do that by implementing a custom IStateProviderReplica2 and likely the serializer. But be aware that this is difficult. (Why would you require this?)
Stateful service storage capacity is limited by disk and memory. (There is a calculation example behind the link.)
Reliable services are typically partitioned, so the amount you can store is only limited by the number of machines you have in the cluster, and the amount of memory available on those machines.
--- extra info concerning partitioning---
Yes, have a look at this video; the start of it is about how to come up with a partitioning strategy.
The most important downside of 'partition per user' is that the number of partitions cannot be changed without recreating the service. It also doesn't scale, and the distribution of data ends up unbalanced.

Rolling Over Streaming Connections During Upgrades

I am working on an application that uses Amazon Kinesis, and one of the things I was wondering about is how you can roll over an application during an upgrade without data loss on streams. I have heard about things like blue/green deployments and such, but I was wondering what the best practice is for upgrading a data streaming service so you don't lose data from your streams.
For example, my application has an HTTP endpoint that ingests data as a series of POST operations. If I want to replace the service with a newer version, how do I manage existing application streaming to my endpoint?
One common method is having a software load balancer (LB) with a virtual IP; behind this LB there would be at least two HTTP ingestion endpoints during normal operation. During an upgrade, each endpoint is announced out (taken out of rotation) and upgraded in turn. The LB ensures that no traffic is forwarded to an announced-out endpoint.
(The endpoints themselves can be on separate VMs, Docker containers or physical nodes).
Of course, the stream needs to be finite; the TCP socket/HTTP stream is owned by one of the endpoints. However, as long as the stream can be stopped gracefully, the following flow works, assuming endpoint A owns the current ingestion:
Tell endpoint A not to accept new streams. All new streams will be redirected only to endpoint B by the LB.
Gracefully stop existing streams on endpoint A.
Upgrade A.
Announce A back in.
Rinse and repeat with endpoint B.
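A minimal sketch of steps 1-3 for a single Node-based ingestion endpoint, assuming the LB probes a health endpoint and the upgrade is triggered via SIGTERM (both are assumptions; the answer above does not prescribe either):

```typescript
// Sketch: stop taking new streams, drain the ones in flight, then exit
// so the instance can be upgraded. The /healthz path and SIGTERM trigger are assumptions.
import http from "http";

let draining = false;

const server = http.createServer((req, res) => {
  // Health endpoint the LB probes; failing it is how this instance is "announced out".
  if (req.url === "/healthz") {
    res.statusCode = draining ? 503 : 200;
    res.end();
    return;
  }
  // Normal ingestion handling (e.g. POSTs forwarded to the stream) would go here.
  res.end("accepted");
});

server.listen(8080);

// Steps 1-2: on the upgrade signal, mark as draining and stop accepting new connections;
// server.close() fires its callback once the existing connections have finished.
process.on("SIGTERM", () => {
  draining = true;                      // LB health checks now fail -> no new streams
  server.close(() => process.exit(0));  // existing streams drain gracefully, then exit
});
```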
As a side point, you would need two endpoints with a load balanced (or master/slave) set-up if you require any reasonable uptime and reliability guarantees.
There are more bespoke methods which allow a hot code swap on the same endpoint, but they rely on specific internal design (e.g. a separate process between the networking and processing stacks, connected by IPC).