Application startup and shutdown based on authenticated user activity - Kubernetes

There are applications and services in enterprises that do not need to run all the time and that have a limited user base (say a handful of people).
These applications can be shut down and started up either on a schedule or, even better, based on user activity. So we are talking about on-demand start-up and shutdown of services (say, wrapped in containers) and nodes.
First, the reason I mention authenticated user activity is that it makes sense to start up and shut down on that basis (i.e. not based on lower-level network traffic). One can imagine corporate SSO (say, OAuth 2 based) being involved.
So, my question is whether anyone has attempted to implement what I have described using Consul or Kubernetes?
In the case of Consul, the key-value store could be used to give "Micro" class (i.e. small user base) applications a TTL; each time an authenticated user requests access to a given "Micro" class application, its TTL is updated. During the TTL window we want to check the health of the node(s), containers and services; outside of the window we don't (since we want to save on op ex).
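As a rough sketch of that idea (assuming the python-consul client and a local Consul agent; the key prefix and TTL value are purely illustrative):

    import time
    import consul

    c = consul.Consul()                      # local Consul agent on 127.0.0.1:8500

    APP = "micro/reporting-app"              # hypothetical "Micro" class application
    TTL_SECONDS = 15 * 60                    # activity window during which the app stays up

    def record_user_access():
        # Called after the SSO / OAuth 2 check succeeds for a request.
        c.kv.put(f"{APP}/last-access", str(int(time.time())))

    def within_ttl_window():
        # A reaper/controller polls this to decide start-up vs shut-down
        # and whether health checks should run at all.
        _, entry = c.kv.get(f"{APP}/last-access")
        if entry is None:
            return False
        return int(time.time()) - int(entry["Value"]) < TTL_SECONDS

A controlling process would then only run health checks and keep the containers up while within_ttl_window() is true.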
This question is similar to this autoscaling question, but differs in that this use case is about scaling up from 0 nodes and back down to 0 based on authenticated user activity (most likely via SSO).

In the case of Kubernetes, the Horizontal Pod Autoscaling documentation lists the exact use case described here under Next steps (i.e. the feature is on the backlog and may be implemented after Kubernetes v1.1). The cited feature description (the Unidling proposal) is as follows:
Scale the number of pods starting from 0. All pods can be turned-off, and then turned-on when there is a demand for them. When a request to service with no pods arrives, kube-proxy will generate an event for autoscaler to create a new pod.
So basically, it may be possible to do what I've described using Kubernetes in the future, but it is not possible right now. Even then, this in itself does not address the requirement to scale from 0 only in response to authenticated user activity.
As a cluster-agnostic aside, it's also worth noting on-demand container activation based on systemd. That approach will of course not scale back down to 0 without a controlling process, but it's still worth mentioning.
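For the Kubernetes side, a minimal sketch of such a controlling process (assuming the official Python client and a hypothetical Deployment named reporting-app in a micro-apps namespace) could simply patch the replica count between 0 and 1 based on the activity window:

    from kubernetes import client, config

    config.load_kube_config()                # or config.load_incluster_config() inside the cluster
    apps = client.AppsV1Api()

    def reconcile(active: bool, name="reporting-app", namespace="micro-apps"):
        # Scale to 1 while the authenticated-activity window is open, 0 otherwise.
        desired = 1 if active else 0
        apps.patch_namespaced_deployment_scale(
            name=name,
            namespace=namespace,
            body={"spec": {"replicas": desired}},
        )

This does not solve the "first request while at 0 replicas" problem that the unidling proposal targets; it only covers the TTL-driven part.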

Related

What is a common strategy for synchronized communication between replicas of the same pods?

Let's say we have the following apps:
API app: responsible for serving user requests.
Backend app: responsible for handling user requests that are long-running tasks. It writes progress to a database (Postgres) and a distributed cache (Redis).
Both apps are scalable services. A single backend app handles multiple tenants (customers here), but one customer is assigned to a single backend app only.
I have a use case where I need the API layer to connect to the specific replica that is handling that customer. Is there a common pattern for this?
A few strategies I have in mind:
Pub/Sub (probably using Redis): the problem is that we want a synchronous, guaranteed response.
gRPC: using the pod IP to connect to a specific pod is not a standard approach.
Creating a Service at runtime by adding labels to the replicas and using those -- looks promising (see the sketch after this question).
Do let me know if there is a common pattern, example architecture, or standard way of doing this.
Note: [the above is a simulation of a production use case; names and the actual use case have been changed]
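A minimal sketch of option 3, using the official Kubernetes Python client (the pod name, namespace, port and tenant label are all made up for illustration):

    from kubernetes import client, config

    config.load_kube_config()
    core = client.CoreV1Api()

    # Label the replica that currently owns the customer.
    core.patch_namespaced_pod(
        name="backend-app-2",
        namespace="default",
        body={"metadata": {"labels": {"tenant": "customer-42"}}},
    )

    # Create a Service whose selector matches only that replica,
    # so the API app can address it by a stable name.
    svc = client.V1Service(
        metadata=client.V1ObjectMeta(name="backend-customer-42"),
        spec=client.V1ServiceSpec(
            selector={"tenant": "customer-42"},
            ports=[client.V1ServicePort(port=8080, target_port=8080)],
        ),
    )
    core.create_namespaced_service(namespace="default", body=svc)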
You should aim to keep your services stateless; in a Kubernetes environment there is no telling when one pod might be replaced by another due to worker node maintenance.
If you have long-running tasks that cannot be completed within the configured grace period for pods to shut down during a worker node drain/evacuation, you need to implement some kind of persistent work queue, as you are considering in option 1. I suggest you look into the saga pattern.
Another pattern we usually employ is to let the worker service write the current state of the job to the database and have the client poll the status every few seconds. This does, however, require some way of handling half-finished jobs that might be abandoned by pods that are forced to shut down.
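As a rough sketch of that second pattern, assuming Redis as the shared store (key names and job states are illustrative):

    import json
    import time
    import redis

    r = redis.Redis(host="redis", port=6379)

    def write_progress(job_id, pct, state="running"):
        # Called by whichever backend replica currently owns the job.
        r.set(f"job:{job_id}:status", json.dumps({"pct": pct, "state": state}))

    def wait_for_completion(job_id, poll_seconds=5):
        # Called by the API layer; it never needs to know which replica ran the job.
        while True:
            raw = r.get(f"job:{job_id}:status")
            status = json.loads(raw) if raw else {"state": "unknown"}
            if status["state"] in ("done", "failed", "abandoned"):
                return status
            time.sleep(poll_seconds)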

Getting Beyond 50 Replica Set Members in MongoDB

I'm looking to build a distributed Access Control system for a microservice platform. I'm considering using MongoDB as my database technology. My system design objectives are as follows:
Policy Enforcement should be distributed - if any given Policy Enforcement Point (PEP) experiences downtime, only the application that the PEP serves should be affected.
Policy Decisions should be distributed - we don't want the whole platform to experience downtime because a central Policy Decision Point (PDP) is experiencing downtime. We only want it to affect the application that it serves.
Policy Administration should be centralized - creating a centralized policy administration interface provides the ability for any system (including a UI) to understand what rights an individual has, and by establishing a common interface it allows us to more easily audit changes to access across the whole platform.
Policy Information (context) is distributed - we don't get to choose this if we are building a distributed microservice platform. We can centralize the retrieval of additional context by aggregating the data needed to make access control decisions into a single place, but the data sources are still distributed.
I'm considering building a system like the one shown below. The idea is that Access Policies are administered by a central Policy Admin API. This API manages Policies that are persisted to a MongoDB cluster backed by a 3-member replica set. I would like other APIs in the platform to each have a dedicated policy-query-api (Policy Decision Point) deployed alongside them to make the Access Control decisions pertinent to that API. The idea is that if any one of the policy-query-apis goes down, only the API that it serves is affected.
I want changes to Policies to be governed by the Policy Admin API, and I would like the changes to be replicated to each mongo instance used by each of the policy-query-apis. I don't want the mongo replicas for each policy-query-api to affect writes to the primaries.
I also don't need immediate data consistency (less than 5 sec latency is fine), but I would like the data replication to be handled at the database layer if possible. The technology is already built to handle this and I don't want to reinvent the wheel at the application layer.
I've looked at the documentation on Replica Set Members and I've pretty thoroughly reviewed the documentation on Replica Sets in Mongo. It seems like having a Hidden Member or a Delayed Member would be a good fit for my use case. Do you agree? Also, I'm concerned about the 50-member replica set limit. Since each one of these replicas would serve an API in my platform, if there were more than 50 microservices (which is quite likely), how would I manage replication like this?
Just so that I understand, you are asking about:
one standalone node per application (?? your picture suggests a standalone, but you are asking about the 50-member replica set limit), with data mirrored to the standalone from the master RS
the application only queries its local standalone
MongoDB provides read preference nearest for the use case of reading data from local nodes. Importantly the nearest read preference still provides availability if your local node is unavailable - the next closest (roughly) node will be used in this case. Your proposed architecture would take the application down every time its local database node needs to be restarted for version upgrades.
You may also look into tag sets.
Additionally, MongoDB allows specifying priorities on nodes for election purposes. If you put all of your MongoDB nodes into the same RS, you can use priorities to have one of the 3 designated "main" servers be the primary whenever any of them is available.
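To illustrate the nearest read preference and tag sets with pymongo (hostnames, the replica set name, the database/collection and the tag are placeholders, not part of your setup):

    from pymongo import MongoClient
    from pymongo.read_preferences import Nearest

    client = MongoClient(
        "mongodb://mongo-1:27017,mongo-2:27017,mongo-3:27017/?replicaSet=rs0"
    )

    # Prefer the member tagged as local to this policy-query-api; the empty
    # document at the end falls back to any other member if it is down.
    local_first = Nearest(tag_sets=[{"pdp": "policy-query-api-7"}, {}])
    policies = client.get_database("acl", read_preference=local_first)["policies"]

    print(policies.count_documents({"subject": "alice"}))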

How to notify a pod in a StatefulSet about other pods in Kubernetes

I was reading the tutorials on deploying a Cassandra ring and ZooKeeper with StatefulSets. What I don't understand is: if I decide to add another replica to the StatefulSet, how do I notify the other pods that there is a new one? What are the best practices for this? I want one pod to be able to redirect a request to another pod in my custom application in case the request doesn't belong to it (i.e. it doesn't have the data).
Well, it seems like you want to run a clustered application inside Kubernetes. This is not something Kubernetes is directly responsible for: cluster coordination should be handled within the given solution, so a response to a "how to" question cannot be generic.
Most software out there will have some kind of coordination, discovery and registration mechanism, be it preconfigured members, an external discovery catalog/DB, or some network broadcasting.
A StatefulSet helps a lot here by retaining the network identity of each service/pod and by keeping storage, so you can e.g. always point new replicas to register with the first replica (or preferably one of the first two, because what if no. 1 is the one that restarted), but as I wrote above, this pretty much depends on the capabilities of the solution you want to deploy.
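As a small sketch of what that looks like in practice, a replica in a StatefulSet can derive its peers from its stable network identity (the StatefulSet name "myapp", the headless Service "myapp-hs", the namespace and the replica count are assumptions):

    import os
    import socket

    REPLICAS = int(os.environ.get("REPLICAS", "3"))
    hostname = socket.gethostname()              # e.g. "myapp-2"
    ordinal = int(hostname.rsplit("-", 1)[1])

    peers = [
        f"myapp-{i}.myapp-hs.default.svc.cluster.local"
        for i in range(REPLICAS)
        if i != ordinal
    ]
    # A new replica can now register with (or redirect misdirected requests to)
    # any of its peers, e.g. always starting with myapp-0 or myapp-1.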

Behaviour when reducing instances of a Bluemix application

I have an orchestrator service which keeps track of the instances that are running and what request they are currently dealing with. If a new instance is required, I make a REST call to increase the instances and wait for the new instance to connect to the orchestrator. It's one request per instance.
The orchestrator tracks whether an instance is doing anything and knows which instances can be stopped; however, there is nothing in the API that allows me to reduce the number of instances by stopping a particular instance, which is what I am trying to achieve.
Is there anything I can do to manipulate the platform into deterministically stopping the instances that I want to stop? Perhaps by having long-running HTTP requests to the instances I require and killing the request when it's no longer required, then making the API call to reduce the number of instances?
Part of the issue here is that I don't know the specifics of the current behavior...
Assuming you're talking about Cloud Foundry/Instant Runtime applications, all of the instances of an application run behind a load balancer that uses round-robin to distribute requests across the instances (unless you have a session affinity cookie set up). Differentiating between instances for incoming requests or manual scaling is not recommended and is an anti-pattern. You cannot control which instance the scale-down task will choose.
If you really want that level of control over each instance, maybe you should deploy them as separate applications: MyApp1, MyApp2, MyApp3, etc. All of your applications can share the same route (myapp.mybluemix.net). Each application can then distinguish itself by its name (VCAP_APPLICATION), allowing you to terminate it individually.
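A small sketch of how each of those separately deployed applications could identify itself from VCAP_APPLICATION (the orchestrator-supplied app_to_stop value is purely illustrative):

    import json
    import os

    vcap = json.loads(os.environ.get("VCAP_APPLICATION", "{}"))
    my_name = vcap.get("application_name")       # e.g. "MyApp2"

    def should_shut_down(app_to_stop):
        # The orchestrator can now target a specific application by name.
        return my_name == app_to_stop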

How to monitor (micro)services?

I have a set of services. Every service contains some components.
Some of them are stateless, some of them are stateful, some are synchronous, some are asynchronous.
I have used different approaches to monitoring and alerting:
log-based alerting and metrics gathering, New Relic, and our own homegrown tooling.
Basically, at the moment I am looking for a way to generalize and aggregate the important metrics for all services in a single place. One of the things I want is to monitor products rather than separate services.
As an end result I picture a single dashboard with a small number of widgets, but by looking at those widgets I would be able to say for sure whether the services are usable to the end customer.
Perhaps someone can recommend an approach/methodology, or give a reference to some best practices.
I like what you're trying to achieve! A service is not production-ready unless it's thoroughly monitored.
I believe what you're describing goes into the topics of health checking and metrics.
... I would be able to say for sure, if services are usable to end-customer.
That, however, will require a little of both ;-) To ensure you're currently fulfilling your SLA, you have to make sure that your services are all a) running and b) performing as expected. For both problems I suggest looking at the StatsD toolchain. Initially developed by Etsy, it has become a de facto standard for gathering metrics.
To ensure all your services are running, we're relying on Kubernetes. It takes our description of what should run, be reachable from outside, etc., and hosts that on our infrastructure. It also makes sure that, should things die, they are restarted. It helps with things like auto-scaling as well! Awesome tooling and kudos to Google!
The way it ensures that is with health checks. There are multiple ways you can ensure a service node booted by Kubernetes is alive and kicking (namely HTTP calls and CLI scripts, but this is modular should you need anything else). If Kubernetes detects unhealthy nodes, it will immediately phase them out and start another node instead.
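For example, a minimal HTTP health endpoint of the kind Kubernetes can probe (the /healthz path and port are conventions rather than requirements):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/healthz":
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"ok")
            else:
                self.send_response(404)
                self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8080), HealthHandler).serve_forever()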
Now, to make sure all your services perform as expected, you'll need to gather some metrics. For all of our services (and all individual endpoints), we gather a few metrics via StatsD, such as:
Requests/sec
number of errors returned (404, etc...)
Response times (Average, Median, Percentiles depending on the services SLA)
Payload size (Average)
sometimes the number of concurrent requests per endpoint, the number of instances currently running
general metrics like the host's current CPU and memory usage, and uptime.
We gather a lot more metrics, but that's about the bottom line. Since StatsD has become more of a "protocol specification" than a concrete product, there is a myriad of collectors, front ends and back ends to choose from. They help you visualize your system's state, and many of them feature alerts when some metric or combination of metrics goes beyond its threshold.
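A small sketch of emitting a few of the metrics listed above with the Python statsd client (the metric names, prefix and request wrapper are illustrative):

    import time
    from statsd import StatsClient

    stats = StatsClient(host="statsd", port=8125, prefix="orders-api")

    def handle_request(do_work):
        stats.incr("requests")                   # requests/sec
        start = time.time()
        try:
            return do_work()
        except Exception:
            stats.incr("errors")                 # error count
            raise
        finally:
            # Response time in milliseconds; percentiles are computed by the backend.
            stats.timing("response_ms", (time.time() - start) * 1000)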
Let me know if this was helpful!
There are at least three types of things you will need to monitor: the host where the service is deployed, the component itself, and the SLAs; some of these depend on the software stack you're using as well as the architecture.
With that said, you could for example use Nagios to monitor the hardware where the services are deployed, and Splunk for the service metrics/SLAs as well as for any errors that might occur. You can also use SNMP packages in case something goes wrong and you have a more sophisticated support structure; these would be your triggers. Without knowing how your infrastructure/services are set up, it is hard to go into deeper detail.