Running a stateless app as a StatefulSet (Kubernetes)

In the Kubernetes world, the classic pattern is to use a Deployment for stateless applications and a StatefulSet for stateful applications.
I am using a vendor product (Ping Access) which is meant to be a stateless application (it plays the role of a proxy in front of other Ping products such as Ping Federate).
The GitHub repo for Ping Cloud (where they run these components as containers) shows them running Ping Access (a stateless application) as a StatefulSet.
I am reaching out to their support team to understand why anyone would run a stateless application as a StatefulSet.
Are there other examples of such usage (as this appears strange/bizarre IMHO)?
I also observed a scenario where a customer is running a stateful app (Ping Federate) as a regular Deployment instead of hosting it as a StatefulSet.
The Ping Cloud repository does build and deploy Ping Federate as a StatefulSet.
Honestly, both of these usages, running a stateless app as a StatefulSet (Ping Access) and running a stateful app as a Deployment (Ping Federate), sound like classic anti-patterns.

Apart from the ability to attach dedicated volumes to StatefulSets, you get the following features, some of which might be useful for stateless applications:
Ordered startup and shutdown of Pods, with Kubernetes handling them one by one in an ordered fashion.
The possibility to guarantee that no more than a single Pod with a given identity is running at a time, even during unscheduled Pod restarts.
Stable DNS names for Pods.
I can only speculate why Ping Federate uses a StatefulSet. Possibly it has to do with access limitations of the downstream services it connects to.
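To make the features above concrete, here is a minimal sketch (names such as web and web-headless are made up for the example) of a StatefulSet paired with a headless Service, which gives stable Pod DNS names and ordered startup even without any persistent volumes:

apiVersion: v1
kind: Service
metadata:
  name: web-headless          # example name
spec:
  clusterIP: None             # headless: gives each Pod its own stable DNS record
  selector:
    app: web
  ports:
  - port: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web-headless   # must reference the headless Service above
  replicas: 3
  podManagementPolicy: OrderedReady   # default: Pods are created one by one, in order
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx          # stand-in image; no volumeClaimTemplates are required
        ports:
        - containerPort: 80

The Pods then get predictable names (web-0, web-1, web-2), reachable as web-0.web-headless and so on, and Kubernetes guarantees at most one Pod per ordinal at any time.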

The consumption of PingAccess is stateless, but the operation is very much stateful. Namely, the PingAccess admin console maintains a database for configuration, and part of that configuration includes clustered engine mapping and session keys.
Thus, if you were to take away the persistent volume, restarting the admin console would decouple all the engines in the cluster. The engines would then no longer receive configuration, and web session keys would be mismatched.
The ping-cloud-base repo uses a StatefulSet for the engines not for persistent volumes, but for the StatefulSet (sts) naming scheme. I personally disagree with this and recommend using a Deployment for engines. The only downside is that you then have to remove orphaned engines from the admin configuration, orphaned engines being engine config that stays in the admin console DB after the engine Deployment is rolled or updated. These can be removed from the admin UI or the API, and it is pretty easy to script in a hook.
It would be ideal for an application that is not a datastore to run without a persistent volume, but for the reasons mentioned above the PingAccess admin console does require, and acts like, a persistent datastore, so I think a StatefulSet is okay.
Finally, the Ping DevOps team focuses support on their Helm chart (where engines are also Deployments by default). I'd suspect the community and enterprise support is much larger there for folks deploying on their own. ping-cloud-base is a good place to pick up some hooks, though.

Related

Migrate Service Fabric Reliable Collections to Kubernetes

We are in the process of migrating our Service Fabric services to Kubernetes. Most of them were "stateless" services and were easy to migrate. However, we have one "stateful" service that uses SF's Reliable Collections pretty heavily.
K8s has StatefulSets, but those are not really comparable to SF's Reliable Collections.
Is there a .NET library or other solution to implement something similar to SF's Reliable Collections in K8s?
AFAIK this cannot be done by using a .NET library alone.
K8s is all about orchestration. SF, on the other hand, is an orchestrator plus a rich programming/application model plus state management.
If you want to do something like Reliable Collections in K8s, then you have to either:
A) build your own replication solution, with leader election and all, or
B) use a private etcd/CockroachDB/etc. store.
This article is pretty good in terms of differences.
https://learn.microsoft.com/en-us/archive/blogs/azuredev/service-fabric-and-kubernetes-comparison-part-1-distributed-systems-architecture#split-brain-and-other-stateful-disasters
"Existing systems provide varying levels of support for microservices, the most prominent being Nirmata, Akka, Bluemix, Kubernetes, Mesos, and AWS Lambda [there’s a mixed bag!!]. SF is more powerful: it is the only data-ware orchestration system today for stateful microservices"
However, they don't solve the coordination problem (saving data on the primary instance so that it automatically replicates to the others, for recovery when the primary instance dies). That's what SF Reliable Collections do out of the box.
StatefulSets are valuable for applications that require one or more of the following.
Stable (persistent), unique network identifiers.
Stable, persistent storage.
Ordered, graceful deployment and scaling.
Ordered, automated rolling updates.
If your application doesn't require any of these, you should deploy your application using a Deployment.
There is a good Kubernetes guide on how to Run a Replicated Stateful Application here.
This page shows how to run a replicated stateful application using a StatefulSet controller. This application is a replicated MySQL database. The example topology has a single primary server and multiple replicas, using asynchronous row-based replication.
The StatefulSet controller starts Pods one at a time, in order by their ordinal index, and waits until each Pod reports being Ready before starting the next one. The tutorial uses init containers to perform the orderly startup of MySQL replication.
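As a rough, simplified sketch of that pattern (loosely based on the tutorial, not the full manifest), an init container can derive a unique MySQL server id from the Pod's ordinal index:

initContainers:
- name: init-mysql
  image: mysql:5.7            # image used in the tutorial; any shell-capable image works
  command:
  - bash
  - "-c"
  - |
    # The Pod hostname ends in its ordinal index, e.g. mysql-0, mysql-1, ...
    [[ $HOSTNAME =~ -([0-9]+)$ ]] || exit 1
    ordinal=${BASH_REMATCH[1]}
    # Derive a unique MySQL server-id from the ordinal
    echo "[mysqld]" > /mnt/conf.d/server-id.cnf
    echo "server-id=$((100 + ordinal))" >> /mnt/conf.d/server-id.cnf
  volumeMounts:
  - name: conf                # a volume shared with the main MySQL container
    mountPath: /mnt/conf.d

Because the ordinal is stable across restarts, each replica keeps the same server-id for the lifetime of the StatefulSet.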
Operators can help
While operators are not necessary, they can help run stateful apps on Kubernetes with features like application-level HA management, backups, and restore.
You can use existing operators or develop your own. The operator package includes all the configuration needed to deploy and manage the application from a Kubernetes point of view: from the StatefulSet to be used, to any required storage, rollout strategies, persistence and affinity configuration, and more. Kubernetes then relies on the operator to validate instances of the application against the specification, to ensure the application runs in the same way across instances in all the clusters it is deployed in.
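From the consumer's side, using an operator usually boils down to creating a single custom resource and letting the operator reconcile everything else. A purely illustrative sketch (the API group, kind, and fields below are hypothetical, not taken from any specific operator):

apiVersion: databases.example.com/v1alpha1   # hypothetical API group registered by the operator's CRD
kind: MySQLCluster                            # hypothetical kind
metadata:
  name: my-db
spec:
  replicas: 3
  version: "8.0"
  storage:
    size: 10Gi
  backup:
    schedule: "0 3 * * *"                     # hypothetical nightly backup field

The operator watches resources of this kind and creates the underlying StatefulSet, Services, PersistentVolumeClaims, and backup jobs on your behalf.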
Some DB operators:
You can deploy a MySQL database using the Kubernetes operator developed by Oracle (currently in a preview state):
https://github.com/mysql/mysql-operator
There is also a PostgreSQL operator by Crunchy Data to deploy PostgreSQL to Kubernetes:
https://github.com/CrunchyData/postgres-operator
MongoDB maintains an operator to deploy MongoDB Enterprise to a Kubernetes cluster:
https://github.com/mongodb/mongodb-enterprise-kubernetes
You can find ready-made operators on OperatorHub.io to suit your use case.

Multiple environments for websites in Kubernetes

I am a newbie in Kubernetes.
I have 19 LANs with 190 machines in total.
Each of the 19 LANs has 10 machines and 1 exposed IP.
I have different websites/apps, and their environments, assigned to each LAN.
How do I manage my Kubernetes cluster and do setup/housekeeping?
I would like to have a single portal or manager to manage the websites and environments (dev, QA, prod) and keep isolation.
Is that possible?
I only got a vague idea of what you want to achieve, so here goes nothing.
Since Kubernetes has a lot of convenience tools for setting up a cluster on a public cloud platform, I'd suggest starting by going through "kubernetes-the-hard-way". It is a guide to setting up a cluster on Google Cloud Platform without any additional scripts or tools, but the instructions can be applied to a local setup as well.
Once you have an operational cluster, the next step should be to set up an Ingress Controller. This gives you the ability to use one or more exposed machines (with public IPs) as gateways for the services running in the cluster. I'd personally recommend Traefik; it has great support for HTTP and Kubernetes.
Once you have the ingress controller set up, your cluster is pretty much ready to use. The process for deploying a service is really specific to the service's requirements, but the rule of thumb is to use a Deployment and a Service for stateless workloads, and a StatefulSet and a headless Service for stateful workloads that need peer discovery. This is obviously generalized and has many exceptions.
For managing different environments, you could split your resources into different namespaces.
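For example, a minimal sketch of per-environment namespaces (the names are just examples):

apiVersion: v1
kind: Namespace
metadata:
  name: myapp-dev
---
apiVersion: v1
kind: Namespace
metadata:
  name: myapp-qa
---
apiVersion: v1
kind: Namespace
metadata:
  name: myapp-prod

You can then scope RBAC roles, resource quotas, and network policies to each namespace to keep the environments isolated from one another.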
As for the single portal to manage it all, I don't think that anything as such exists, but I might be wrong. Besides, depending on your workflow, you can create your own portal using the Kubernetes API but it requires a good understanding of Kubernetes itself.

Usage of custom resource definition (CRD) vs Service Catalog in k8s

I recently started to explore k8s extensions and got introduced to two concepts:
CRD.
Service catalogs.
They look pretty similar to me. The only difference, to my understanding, is that CRDs are deployed inside the same cluster to be consumed, whereas catalogs expose services that run outside the cluster, for example a database service (a client can order a MySQL cluster which will then be accessible from their own cluster).
My query here is:
Is my understanding correct? If yes, can there be any other scenario where I would want to create a catalog and not a CRD?
Yes, your understanding is correct. Taken from official documentation:
Example use case
An application developer wants to use message queuing as part of their application running in a Kubernetes cluster. However, they do not want to deal with the overhead of setting such a service up and administering it themselves. Fortunately, there is a cloud provider that offers message queuing as a managed service through its service broker.
A cluster operator can set up Service Catalog and use it to communicate with the cloud provider's service broker to provision an instance of the message queuing service and make it available to the application within the Kubernetes cluster. The application developer therefore does not need to be concerned with the implementation details or management of the message queue. The application can simply use it as a service.
With a CRD, you are responsible for provisioning resources, running the backend logic, and so on.
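For contrast, here is a minimal sketch of a CRD you would own and write a controller for yourself (the group and kind are made up for the example):

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: messagequeues.example.com    # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: messagequeues
    singular: messagequeue
    kind: MessageQueue
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              size:
                type: string       # example field; define whatever your controller needs

Creating the CRD only teaches the API server about the new kind; you still need a controller that watches MessageQueue objects and actually provisions the queues, whereas with Service Catalog the external service broker does that work for you.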
More info can be found in this KubeCon 2018 presentation.

Is testing on OpenShift Container Platform (OCP) equivalent to testing on Openshift Origin from a kubernetes standpoint?

This concerns applications which are programmed to use the Kubernetes API.
Should we assume that openshift container platform, from a kubernetes standpoint, matches all the standards that openshift origin (and kubernetes) does?
Background
Compatibility testing of cloud native apps that are shipped can involve a large matrix. It seems that, as a baseline, if OCP is meant to be a pure Kubernetes distribution with add-ons, then testing against it is trivial, so long as you are only using core Kubernetes features.
Alternatively, if shipping an app with support on OCP means you must test on OCP, that would to me imply that (1) the app uses OCP functionality, or (2) the app uses kube functionality which may not work correctly in OCP, which should be considered a bug.
In practice you should be able to regard OpenShift Container Platform (OCP) as being the same as OKD (previously known as Origin). This is because it is effectively the same software and setup.
In comparing both of these to plain Kubernetes there are a few things you need to keep in mind.
The OpenShift distribution of Kubernetes is set up as a multi-tenant system, with a clear distinction between normal users and administrators. This means RBAC is set up so that a normal user is restricted in what they can do out of the box. A normal user cannot, for example, create arbitrary resources which affect the whole cluster. They also cannot run images that will only work if run as root, as they run under a default service account which doesn't have such rights. That default service account also has no access to the REST API. A normal user has no privileges to enable the ability to run such images as root. A user who is a project admin could allow an application to use the REST API, but what it could do via the REST API will be restricted to the project/namespace it runs in.
So if you develop an application on Kubernetes where you have an expectation that you have full admin access, and can create any resources you want, or assume there is no RBAC/SCC in place that will restrict what you can do, you can have issues getting it running.
This doesn't mean you can't get it working, it just means that you need to take extra steps so you or your application is granted extra privileges to do what it needs.
This is the main area where people have issues and it is because OpenShift is setup to be more secure out of the box to suit a multi-tenant environment for many users, or even to separate different applications so that they cannot interfere with each other.
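As an illustrative sketch of the kind of extra step involved (the namespace and binding name are examples), a project admin can grant a workload's service account read access to the REST API within its own namespace with a standard RoleBinding, which works the same way on OpenShift and plain Kubernetes:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-can-view          # example name
  namespace: my-project       # the project/namespace the app runs in
subjects:
- kind: ServiceAccount
  name: default               # the service account the Pods run as
  namespace: my-project
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view                  # built-in read-only role, scoped here to one namespace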
The next thing worth mentioning is Ingress. When Kubernetes first came out, it had no concept of Ingress. To fill that hole, OpenShift implemented the concept of Routes. Ingress only came much later, and was based in part on what was done in OpenShift with Routes. That said, there are things you can do with Routes which I believe you still can't do with Ingress.
Anyway, obviously, if you use Routes, that only works on OpenShift, as a plain Kubernetes cluster only has Ingress. If you use Ingress, you need to be using OpenShift 3.10 or later. In 3.10 there is an automatic mapping of Ingress to Route objects, so I believe Ingress should work even though OpenShift actually implements Ingress under the covers using Routes and its HAProxy router setup.
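For reference, a minimal Ingress sketch that should be portable across both (the hostname and Service name are made up; this uses the current networking.k8s.io/v1 API, while clusters of the OpenShift 3.10 era used the older extensions/v1beta1 version):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app                 # example name
spec:
  rules:
  - host: app.example.com      # example hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app-svc   # example Service name
            port:
              number: 80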
There are obviously other differences as well. OpenShift has DeploymentConfig because Kubernetes originally had no Deployment. Again, there are things you can do with a DeploymentConfig that you can't do with a Deployment, but the Deployment object from Kubernetes is supported. One difference with DeploymentConfig is how it works with ImageStream objects in OpenShift, which don't exist in Kubernetes. Stick with Deployment/StatefulSet/DaemonSet, and don't use the OpenShift objects which were created when Kubernetes didn't have such features, and you should be fine.
Do note though that OpenShift takes a conservative approach on some resource types and so they may not by default be enabled. This is for things that are still regarded as alpha, or are otherwise in very early development and subject to change. You should avoid things which are still in development even if using plain Kubernetes.
That all said, for the core Kubernetes bits, OpenShift is verified for conformance against CNCF tests for Kubernetes. So use what is covered by that and you should be okay.
https://www.cncf.io/certification/software-conformance/

Kubernetes - Load balancing Web App access per connections

It's been a long time since I came here, and I hope you're doing well :)
So for now, I have the pleasure of working with Kubernetes! So let's start! :)
[THE EXISTING]
I have an operational Kubernetes cluster which I work with every day. It consists of several applications, one of which is of particular interest to us: the web management interface.
I currently have one master and four nodes in my cluster.
For my web application, the pod contains 3 containers: web / mongo / filebeat, and for technical reasons we decided to assign a maximum of 5 users to each web pod.
[WHAT I WANT]
I want to deploy a web pod on each node (web0, web1, web2, web3), which I can already do, and have each session (1 session = 1 user) distributed as follows:
For now, all HTTP requests are processed by web0.
[QUESTIONS]
Am I forced to go through an external load balancer (HAProxy)?
Can I use an internal load balancer by configuring a Service?
Does anyone have experience with the setup described above?
Thanks in advance to anyone who can help me with this :)
This generally depends on how and where you've deployed your Kubernetes infrastructure, but you can do this natively with a few options.
Firstly, you'll need to scale your web deployment. This is very simple to do:
kubectl scale --current-replicas=2 --replicas=3 deployment/web
If you're deployed into a cloud provider (such as AWS using kops, or GKE) you can use a service. Just specify the type as LoadBalancer. Services will spread the sessions for your users.
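A minimal sketch of such a Service (assuming the web Pods carry the label app: web; the sessionAffinity field is optional but helps keep a given user on the same Pod):

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer          # the cloud provider provisions an external load balancer
  selector:
    app: web                  # assumed Pod label
  sessionAffinity: ClientIP   # optional: pin each client IP to a single Pod
  ports:
  - port: 80
    targetPort: 8080          # assumed container port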
Another option is to use an Ingress. In order to do this, you'll need to use an Ingress Controller, such as the nginx-ingress-controller which is the most featureful and widely deployed.
Both of these options will automatically load balance your incoming application sessions, but they may not necessarily do it in the order you've described in your image; it'll be random across the available web Pods.