Flink native Kubernetes deployment

I have some limitations with the rights required by Flink native deployment.
The prerequisites say
KubeConfig, which has access to list, create, delete pods and **services**, configurable
Specifically, my issue is that I cannot get a service account with the rights to create/remove services. Creating/removing pods is not an issue, but by policy services can only be created through an internal tool.
Is there any workaround for this?

Flink creates two services in the native Kubernetes integration:
The internal service, which is used for internal communication between the JobManager and the TaskManagers. It is only created when HA is not enabled, since the HA service is used for leader retrieval when HA is enabled.
The REST service, which is used for accessing the web UI or REST endpoint. If you have other ways to expose the REST endpoint, or you are using application mode, then it is also optional in principle. However, it is currently always created, so I think you would need to change some code to work around that.
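For reference, here is a minimal sketch of the Role that the quoted prerequisite implies (the names and namespace are placeholders, and a real setup may need further rules, e.g. for ConfigMaps when Kubernetes HA is enabled); the second rule is exactly the one your policy forbids:

```yaml
# Sketch only: names/namespace are placeholders; verbs and resources follow the quoted prerequisite.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: flink-native
  namespace: flink                        # placeholder namespace
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["list", "create", "delete"]   # allowed under your policy
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["list", "create", "delete"]   # the part your policy disallows
```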

Running a stateless app as a statefulset (Kubernetes)

In the Kubernetes world, a typical/classic pattern is using Deployment for Stateless Applications and using StatefulSet for a stateful application.
I am using a vendor product (Ping Access) which is meant to be a stateless application (it plays the role of a Proxy in front of other Ping products such as Ping Federate).
The github repo for Ping Cloud (where they run these components as containers) shows them running Ping Access (a stateless application) as a Stateful Set.
I am reaching out to their support team to understand why anyone would run a Stateless application as a StatefulSet.
Are there other examples of such usage (as this appears strange/bizarre IMHO)?
I also observed a scenario where a customer is running a stateful app (Ping Federate) as a regular Deployment instead of hosting it as a StatefulSet.
The Ping Cloud repository does build and deploy Ping Federate as a StatefulSet.
Honestly, both these usages, running a stateless app as a StatefulSet (Ping Access) and running a stateful app as a deployment (Ping Federate) sound like classic anti-patterns.
Apart from the ability to attach dedicated volumes to StatefulSets, you get the following features, some of which might be useful for stateless applications:
Ordered startup and shutdown of Pods, with Kubernetes handling them one by one.
A guarantee that at most one Pod with a given identity is running at a time, even during unscheduled Pod restarts.
Stable DNS names for Pods (a minimal example follows this list).
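As a minimal sketch (all names, the image, and the port are placeholders, not taken from the Ping repositories), the stable DNS names come from pairing the StatefulSet with a headless Service, which gives each Pod an address like engine-0.engine.<namespace>.svc.cluster.local:

```yaml
# Minimal sketch; names, image and port are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: engine
spec:
  clusterIP: None             # headless Service: required for stable per-Pod DNS
  selector:
    app: engine
  ports:
    - port: 3000
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: engine
spec:
  serviceName: engine         # ties the Pods to the headless Service above
  replicas: 2
  selector:
    matchLabels:
      app: engine
  template:
    metadata:
      labels:
        app: engine
    spec:
      containers:
        - name: engine
          image: example/engine:latest   # placeholder image
          ports:
            - containerPort: 3000
```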
I can only speculate why Ping Federate uses a StatefulSet. Possibly it has to do with access limitations of the downstream services it connects to.
The consumption of PingAccess is stateless, but the operation is very much stateful. Namely, the PingAccess admin console maintains a database for configuration, and part of that configuration includes clustered engine mapping and session keys.
Thus, if you were to take away the persistent volume, restarting the admin console would decouple all the engines in the cluster. The engines would then no longer receive configuration, and web session keys would be mismatched.
The ping-cloud-base repo uses a StatefulSet for engines not for persistent volumes, but for the StatefulSet (sts) naming scheme. I personally disagree with this and recommend using a Deployment for engines. The only downside is that you then have to remove orphaned engines from the admin configuration, meaning engine config that stays in the admin console DB after the engine deployment is rolled or updated. These can be removed from the admin UI or API, and it's pretty easy to script in a hook.
It would be ideal for an application that is not a datastore to run without a persistent volume, but for the reasons mentioned above, the PingAccess admin console does require, and act like, a persistent datastore, so I think a StatefulSet is okay.
Finally, the Ping DevOps team focuses support on their helm chart (where engines are also deployments by default). I'd suspect the community and enterprise support is much larger there for folks deploying on their own. ping-cloud-base is a good place to pick up some hooks though.

How can I check if a resource inside Kubernetes has been deleted for some reason?

I am a junior developer currently running a service in a Kubernetes environment.
How can I check if a resource inside Kubernetes has been deleted for some reason?
As a simple example, if a deployment is deleted, I want to know which user deleted it.
Could you please tell me which log to look at?
I would also like to know how to collect these logs.
I don't have much experience yet, so I'm asking for help.
Also, if you have a reference or link, please share it. It will be very helpful to me.
Thank you:)
Start by enabling Kubernetes audit logging; there are lots of online resources about doing this.
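As an illustration, here is a minimal sketch of an audit policy for clusters where you control the kube-apiserver flags (--audit-policy-file and --audit-log-path); it records who deleted a Deployment and keeps everything else at the Metadata level:

```yaml
# Minimal audit policy sketch; adjust the rules to your own compliance needs.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Record the full request/response for Deployment deletions,
  # including the authenticated user who issued them.
  - level: RequestResponse
    verbs: ["delete"]
    resources:
      - group: "apps"
        resources: ["deployments"]
  # Log everything else at Metadata level to keep the volume manageable.
  - level: Metadata
```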
If you are on AWS and using EKS, I would suggest enabling "Amazon EKS control plane logging". It streams audit and diagnostic logs to Amazon CloudWatch Logs, where they are more easily accessible and useful for audit and compliance requirements. Control plane logs make it easier for you to secure and run your clusters and make the entire system more auditable.
As per AWS documentation:
Kubernetes API server component logs (api) – Your cluster's API server is the control plane component that exposes the Kubernetes API. For more information, see kube-apiserver in the Kubernetes documentation.
Audit (audit) – Kubernetes audit logs provide a record of the individual users, administrators, or system components that have affected your cluster. For more information, see Auditing in the Kubernetes documentation.
Authenticator (authenticator) – Authenticator logs are unique to Amazon EKS. These logs represent the control plane component that Amazon EKS uses for Kubernetes Role-Based Access Control (RBAC) authentication using IAM credentials. For more information, see Cluster authentication.
Controller manager (controllerManager) – The controller manager manages the core control loops that are shipped with Kubernetes. For more information, see kube-controller-manager in the Kubernetes documentation.
Scheduler (scheduler) – The scheduler component manages when and where to run pods in your cluster. For more information, see kube-scheduler in the Kubernetes documentation.
Reference: https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html
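If you happen to manage the cluster with eksctl (an assumption on my part; the AWS console or CLI achieve the same), these log types can be enabled in the cluster config, roughly like this:

```yaml
# Sketch of an eksctl ClusterConfig fragment; cluster name and region are placeholders.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: eu-west-1
cloudWatch:
  clusterLogging:
    enableTypes: ["api", "audit", "authenticator"]   # streamed to CloudWatch Logs
```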

Usage of custom resource definition (CRD) vs Service Catalog in k8s

I recently started to explore k8s extensions and got introduced to two concepts:
CRD.
Service catalogs.
They look pretty similar to me. The only difference, to my understanding, is that CRDs are deployed inside the same cluster to be consumed, whereas catalogs are used to expose services provisioned outside the cluster, for example a database service (a client can order a MySQL cluster which will be accessible from their cluster).
My query here is:
Is my understanding correct? If yes, is there any other scenario where I would want to create a catalog and not a CRD?
Yes, your understanding is correct. Taken from the official documentation:
Example use case
An application developer wants to use message queuing as part of their application running in a Kubernetes cluster. However, they do not want to deal with the overhead of setting such a service up and administering it themselves. Fortunately, there is a cloud provider that offers message queuing as a managed service through its service broker.
A cluster operator can setup Service Catalog and use it to communicate with the cloud provider's service broker to provision an instance of the message queuing service and make it available to the application within the Kubernetes cluster. The application developer therefore does not need to be concerned with the implementation details or management of the message queue. The application can simply use it as a service.
With CRDs you are responsible for provisioning resources, running the backend logic, and so on.
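To make the contrast concrete, here is a hedged sketch (the class, plan, and group names are all placeholders): with Service Catalog you only order an instance from a broker, while with a CRD you define the type yourself and still have to write the controller that acts on it:

```yaml
# Service Catalog: request an instance of a broker-managed service (placeholder names).
apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceInstance
metadata:
  name: my-queue
spec:
  clusterServiceClassExternalName: example-message-queue
  clusterServicePlanExternalName: standard
---
# CRD: define your own type; a controller that provisions the queue is still your job.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: messagequeues.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: messagequeues
    singular: messagequeue
    kind: MessageQueue
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: string
```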
More info can be found in this KubeCon 2018 presentation.

How can I set up workers in Kubernetes (infrastructure question)

I'm using Kubernetes and I would like to set up workers. One of my Docker containers hosts an API using Flask, I have an algorithm in another container (same pod; I don't know if I should leave it in the same one), and other scripts are also in separate containers.
I want to link all of these: when I receive a request on the API, call the other containers depending on the request and get the result back.
I don't know how to do that with multiple containers, and therefore with Kubernetes.
Until now I've been using the RQ library for Python to parallelize, but that was on Heroku without Kubernetes (I'm migrating to Azure at the moment) and I don't know how it manages this behind the scenes.
Thank you.
Follow the reference below and set up a Kubernetes cluster using kubeadm.
https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
Using the 'kubeadm join' command you should be able to add worker nodes to the master.
The link given above has the steps to join the workers to the master.
If you are using Azure, you can try exploring AKS. It works out of the box. You just need to configure kubectl and you will be good to go.
Regarding deploying multiple microservices (APIs), you can deploy each microservice as a separate k8s Deployment using kubectl and expose them using a Service. This way they can communicate with each other using the exposed endpoints (APIs) or a message queue.
Here is a quick guide you can take help from : https://dzone.com/articles/quick-guide-to-microservices-with-kubernetes-sprin
Typically you should use only one container per pod. Multiple containers per pod are possible but are typically used for sidecars, not for additional APIs.
You expose your services using Kubernetes Services; there is no need to run everything on a different port if you don't want to.
A minimal setup for typical web API calls would look something like this (if you expose your API service as a public LoadBalancer, you don't necessarily need Ingress):
Client -> (Ingress) -> API service -> API deployment pod(s) -> internal services -> deployment pods.
You can access your internal services from within your cluster using http(s)://servicename[:custom-port]
On the other hand, if you simply use flask to forward API calls to other services, you might want to replace it with an Ingress Controller that does all the routing for you.
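To make that concrete, here is a minimal sketch of one internal worker service (all names, the image, and the port are placeholders); the Flask API pod would then simply call it at http://worker:5000:

```yaml
# Minimal sketch; names, image and port are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      containers:
        - name: worker
          image: example/worker:latest   # placeholder image
          ports:
            - containerPort: 5000
---
# ClusterIP Service: other pods in the cluster reach it at http://worker:5000
apiVersion: v1
kind: Service
metadata:
  name: worker
spec:
  selector:
    app: worker
  ports:
    - port: 5000
      targetPort: 5000
```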

Is authentication required on RESTful services interacting with each other on Kubernetes cluster?

We have a microservice architecture and there are REST services interacting with each other through HTTP. All of these services are hosted on a Kubernetes cluster. Do we need to have explicit authentication for such service interaction or does Kubernetes provide enough security for it?
Kubernetes provides only orchestration for your containerized applications. It helps you run, update, and scale your services and provides a way of delivering traffic to them inside the cluster. Most of Kubernetes security relates to traffic management and role-based administration of the cluster.
Some additional tools like Istio can provide secure communication between pods and some other traffic management capabilities.
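For example, assuming Istio is installed and sidecars are injected (an assumption, not something your cluster necessarily has), a single PeerAuthentication resource can enforce mutual TLS for all service-to-service traffic in a namespace:

```yaml
# Sketch: enforce mutual TLS between sidecar-injected pods in one namespace.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: my-namespace     # placeholder namespace
spec:
  mtls:
    mode: STRICT              # reject plain-text traffic between workloads
```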
Applications in pods should have their own capabilities of providing Authentication and Authorization based on local files/databases or network services like LDAP or OpenID etc.
It's purely based on how you design and architect your system and how you create an SDD for it. While designing one, security hardening must be considered and given priority. The software and tools bring their features, but how you adopt them is what matters; Kubernetes is no exception.
You are running your microservices over HTTP, and in a production system you cannot assume your system is secure just because it runs in a Kubernetes cluster. Kubernetes brings useful security features such as RBAC, CRDs, etc., as you can find in Kubernetes 1.8 Security, Workloads and Feature Depth. But leveraging only these features is not sufficient; the internal services should be as secure as the external ones. The following are a few things you should take care of once you are running your workload in a Kubernetes cluster:
Scan all your docker images for vulnerability testing.
Use RBAC over ABAC and assign optimum privileges to respective teams.
Configure a security context for the pods running your services (see the sketch after this list).
Avoid unauthorized internal access to service data and protect all micro-services end-points.
Encryption keys should be rotated over a certain period of time.
The datastore like etcd for your kubernetes cluster must be secured.
Only admins should have access to kubectl.
Use token based validation and enable authentication on all REST api calls.
Continuously monitor all the services, the logs for analysis, health checks, and all the processes running inside containers.
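For the security context item above, a minimal sketch of a restrictive pod and container security context could look like this (the names and image are placeholders):

```yaml
# Minimal sketch; names and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: my-service
spec:
  securityContext:
    runAsNonRoot: true              # refuse to start if the image runs as root
    runAsUser: 10001
  containers:
    - name: app
      image: example/my-service:1.0
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]             # drop all Linux capabilities
```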
Hope this helps.