Flink on K8S: how do I provide Flink configuration to the cluster? - kubernetes

I am following the Flink Kubernetes Setup guide to create a cluster, but it is unclear how I can provide Flink configuration to the cluster. For example, I want to specify jobmanager.heap.size=2048m.

According to the docs, all configuration has to be passed via a yaml configuration file.
It seems that jobmanager.heap.size is a common option that can be configured.
That being said, the approach on kubernetes is a little different when it comes to providing this configuration file.
The next piece of the puzzle is figuring out what the current start command is for the container you are trying to launch. I'm assuming you are using the official Flink Docker image, which is good because the Dockerfile is open source (link to repo at the bottom). It uses a fairly complicated script to launch the Flink container, but if you dig through that script you will see that it reads the configuration YAML from /opt/flink/conf/flink-conf.yaml. Instead of trying to change this, it'll probably be easier to just mount a YAML file with your configuration values at that exact path in the pod.
Here's the github repo that has these Dockerfiles for reference.
Next question is what should the yaml file look like?
From their docs:
All configuration is done in conf/flink-conf.yaml, which is expected
to be a flat collection of YAML key value pairs with format key: value.
So, I'd imagine you'd create flink-conf.yaml with the following contents:
jobmanager.heap.size: 2048m
And then mount it in your kubernetes pod at /opt/flink/conf/flink-conf.yaml and it should work.
From a Kubernetes perspective, it probably makes the most sense to create a ConfigMap from that YAML file and mount the ConfigMap in your pod as a file. See the reference docs.
Specifically, you are most interested in creating a ConfigMap from a file and adding a ConfigMap as a volume.
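As a minimal sketch of the ConfigMap approach described above: the names flink-config and flink-jobmanager, the image tag, and the use of subPath are assumptions made for illustration, and everything else a real cluster needs (services, task managers, further settings) is left out.

```yaml
# ConfigMap holding the Flink configuration file.
# The name "flink-config" is a placeholder chosen for this example.
apiVersion: v1
kind: ConfigMap
metadata:
  name: flink-config
data:
  flink-conf.yaml: |
    jobmanager.heap.size: 2048m
---
# Deployment that mounts the ConfigMap entry as a single file.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink-jobmanager
  template:
    metadata:
      labels:
        app: flink-jobmanager
    spec:
      containers:
        - name: jobmanager
          image: flink:latest          # assumption: official image; pin a version in practice
          args: ["jobmanager"]
          volumeMounts:
            - name: flink-config-volume
              mountPath: /opt/flink/conf/flink-conf.yaml
              subPath: flink-conf.yaml # mount only this file, leaving the rest of conf/ intact
      volumes:
        - name: flink-config-volume
          configMap:
            name: flink-config
```

The same ConfigMap could also be created with kubectl create configmap flink-config --from-file=flink-conf.yaml. Note that a ConfigMap mount is read-only and a subPath mount is not refreshed when the ConfigMap changes, so the pod has to be restarted after an edit. If the image's startup script tries to rewrite the file, this read-only mount may need adjusting.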
Lastly, I'll call this out but won't recommend it, because the Flink owners currently mark it as an incubating feature: they have started providing a Helm chart for Flink, and I can see that they pass flink-conf.yaml as a ConfigMap in the Helm chart templates (ignore values surrounded with {{ }}; that is Helm template syntax). Here is where they mount their ConfigMap into a pod.

Related

How can I back up NiFi processes when restarting a Kubernetes pod?

I have deployed NiFi on Kubernetes using the cetic/helm-nifi Helm chart. We are facing a problem: if the NiFi pod restarts, we lose all the processes that we created. Is there any way to keep a backup of the processes on the NiFi canvas?
Are you deploying the Helm chart with the default config, or are you also tweaking any settings?
I have not used NiFi, but I think enabling the PVC config for NiFi might resolve your issue.
https://github.com/cetic/helm-nifi/blob/master/values.yaml#L211
You can enable the PVC by changing that setting in values.yaml:
persistence:
  enabled: true
If you have already created volumes, you can use those as well.
NiFi keeps flow definitions in the /opt/nifi/nifi-current/conf/flow.xml.gz file in the Docker image. It is strongly advised to persist the /conf folder. Kubernetes offers several ways to do this; check the official documentation.
You can also find archives in the /conf folder for backups.

What are other ways to provide configuration information to pods other than ConfigMap?

I have a deployment in which I want to populate pod with config files without using ConfigMap.
You could also store your config files on a PersistentVolume and read those files at container startup. For more details on that topic please take a look at the K8S reference docs: https://kubernetes.io/docs/concepts/storage/persistent-volumes/
Please note: I would not consider this good practice. I used this approach at the very beginning of a project in which a legacy app was migrated to Kubernetes. The application came with tons of config files that it read at startup.
Later on I switched to creating ConfigMaps from my configuration files, because that approach lets you store the K8S object (YAML file) in Git, and I found managing and editing a ConfigMap much easier and faster, especially in a multi-node K8S environment:
kubectl create configmap app-config --from-file=./app-config1.properties --from-file=./app-config2.properties
If you go for the "config files in persistent volume" approach, you need to take several aspects into account, e.g. how to get your configuration files onto that volume, potentially on multiple nodes rather than a single one, and how to keep them in sync.
You can also use environment variables and have the application read its values from the environment.
Or you
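The environment-variable option mentioned above might look like the following sketch. The pod name, image, and the LOG_LEVEL variable are all hypothetical and only illustrate the mechanism.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-env            # placeholder name
spec:
  containers:
    - name: app
      image: busybox:1.36       # stand-in image for illustration
      # The shell inside the container reads the variable at startup.
      command: ["sh", "-c", "echo \"log level is $LOG_LEVEL\" && sleep 3600"]
      env:
        - name: LOG_LEVEL       # hypothetical setting
          value: "debug"
```

This works well for a handful of scalar settings; for whole config files, a mounted volume is usually the better fit.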

Deleting kubernetes yaml: how to prevent old objects from floating around?

i'm working on a continuous deployment routine for a kubernetes application: every time i push a git tag, a github action is triggered which calls kubectl apply -f kubernetes to apply a bunch of yaml kubernetes definitions
let's say i add yaml for a new service, and deploy it -- kubectl will add it
but then later on, i simply delete the yaml for that service, and redeploy -- kubectl will NOT delete it
is there any way that kubectl can recognize that the service yaml is missing, and respond by deleting the service automatically during continuous deployment? in my local test, the service remains floating around
does the developer have to know to connect kubectl to the production cluster and delete the service manually, in addition to deleting the yaml definition?
is there a mechanism for kubernetes to "know what's missing"?
You need to use a CI/CD tool for Kubernetes to achieve this. As Sithroo mentioned, Helm is a very good option.
Helm lets you fetch, deploy and manage the lifecycle of applications,
both 3rd party products and your own.
No more maintaining random groups of YAML files (or very long ones)
describing pods, replica sets, services, RBAC settings, etc. With
helm, there is a structure and a convention for a software package
that defines a layer of YAML templates and another layer that
changes the templates called values. Values are injected into
templates, thus allowing a separation of configuration, and defines
where changes are allowed. This whole package is called a Helm
Chart.
Essentially you create structured application packages that contain
everything they need to run on a Kubernetes cluster; including
dependencies the application requires. Source
Before you start, I recommend these articles, which explain its quirks and features.
The missing CI/CD Kubernetes component: Helm package manager
Continuous Integration & Delivery (CI/CD) for Kubernetes Using CircleCI & Helm
There's no such way. You can apply resources from a YAML file from anywhere, as long as you can reach the cluster and have a kubeconfig set up, so Kubernetes has no way of knowing that a file was deleted. If you still want this behavior, you can write a program (for example, in Go) that watches the files in one place and deletes the corresponding resource whenever a file is removed.
One Kubernetes-native way is to use an operator: whenever your files change, you update the custom resource that the operator uses to deploy resources.
Before deleting the YAML file, you can run kubectl delete -f file.yaml. That way, all the resources created from this file are deleted.
However, what you are really looking for is a way to reach a desired state in K8S. You can do this with tools like Helmfile.
Helmfile allows you to declare all the resources you want in one file, and it works toward that desired state every time you run helmfile apply.
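A helmfile.yaml for this could look roughly like the sketch below. The release names and chart paths are placeholders, and the example relies on Helmfile's documented installed field; as far as I know, removing a release entry from the file does not by itself uninstall it, which is why the retired service is marked explicitly.

```yaml
# helmfile.yaml (sketch; release names and chart paths are placeholders)
releases:
  - name: api
    namespace: default
    chart: ./charts/api
  - name: old-service
    namespace: default
    chart: ./charts/old-service
    installed: false   # ask Helmfile to uninstall this release on the next apply
```

Once the release has been uninstalled everywhere, the entry can be removed from the file.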

Is there a way to create a configMap containing multiple files for a Kubernetes Pod?

I want to deploy Grafana using Kubernetes, but I don't know how to attach provisioned dashboards to the Pod. Storing them as key-value data in a configMap seems like a nightmare to me (example here: https://github.com/do-community/doks-monitoring/blob/master/manifest/dashboards-configmap.yaml). In my case there would be many more JSON dashboards, hence the harsh opinion.
I didn't have an issue with configuring the Grafana settings, datasources and dashboard providers as configMaps, since each is defined in a single file, but the dashboards situation is a little trickier for me.
All of my dashboards are stored in the repo under "/files/dashboards/", and I wondered how to make them available to the Pod, besides the way described earlier. Wondered about using the hostPath object for a sec, but didn't make sense for multi-node deployment on different hosts.
Maybe it's easy, but I'm fairly new to Kubernetes and can't figure it out, so any help would be much appreciated. Thank you!
You can automatically generate a ConfigMap from a set of files in a directory. Each file becomes a key-value pair in the ConfigMap, with the file name as the key and the file content as the value (like in your linked example, but done automatically instead of manually).
Assuming that your dashboard files are stored as, for example:
files/dashboards/
├── k8s-cluster-rsrc-use.json
├── k8s-node-rsrc-use.json
└── k8s-resources-cluster.json
You can run the following command to directly create the ConfigMap in the cluster:
kubectl create configmap my-config --from-file=files/dashboards
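If you prefer to only generate the YAML manifest for the ConfigMap, you can do the following (kubectl 1.18 and later expect --dry-run=client; older versions use the bare --dry-run flag):
kubectl create configmap my-config --from-file=files/dashboards --dry-run=client -o yaml > my-config.yaml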
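To make the generated ConfigMap visible to Grafana, it can be mounted as a volume, so that every key shows up as a file in the mount directory. The sketch below assumes the ConfigMap is named my-config as above; the Deployment name, labels, and mount path are placeholders, and the path has to match whatever your dashboard provider config points at.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana
          volumeMounts:
            - name: dashboards
              # Assumed path; must match the dashboard provider config.
              mountPath: /var/lib/grafana/dashboards
      volumes:
        - name: dashboards
          configMap:
            name: my-config   # each key (file name) becomes a file in the mount
```

Keep in mind that a ConfigMap is limited to 1 MiB of data, so a large set of dashboards may have to be split across several ConfigMaps.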
You could look into these options:
Use a persistent volume.
Store the JSON files for the dashboards in a code repo like git, file repository like nexus, or a plain web server, and use init container to get the files before the application (Grafana) container is started and put them on a volume shared between the init container and the application (Grafana) container. This example could be a good starting point.
Notice that this doesn't require a persistent volume. See in the example - it uses a volume of type emptyDir.
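The init container option could be sketched like this. The repository URL is a placeholder, and the alpine/git image and mount paths are assumptions picked for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: grafana-with-dashboards
spec:
  initContainers:
    - name: fetch-dashboards
      image: alpine/git
      # Clone the (hypothetical) dashboards repo into the shared volume.
      command: ["git", "clone", "--depth=1", "https://example.com/your-org/dashboards.git", "/dashboards"]
      volumeMounts:
        - name: dashboards
          mountPath: /dashboards
  containers:
    - name: grafana
      image: grafana/grafana
      volumeMounts:
        - name: dashboards
          # The dashboard provider config would need to point at the
          # directory inside the clone that holds the JSON files.
          mountPath: /var/lib/grafana/dashboards
  volumes:
    - name: dashboards
      emptyDir: {}   # scratch volume shared by both containers, recreated with each pod
```

Because the init container runs to completion before Grafana starts, the files are in place by the time Grafana scans the directory.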

Passing variables to args field in a yaml file, kubernetes

I am writing a YAML file for Kubernetes and I am wondering how to pass variables to the args field.
I need to do something like this :
args: ['--arg1=http://12.12.12.12:8080','--arg2=11.11.11.11']
But I don't want to hard code those values for --arg1 and --arg2, instead it should be like,
args: ['--arg1='$HOST1,'--arg2='$HOST2]
How should I do this?
You have two options that are quite different and really depend on your use-case, but both are worth knowing:
1) Helm would allow you to create templates of Kubernetes definitions, that can use variables.
Variables are supplied when you install the Helm chart, and before the resulting manifests are deployed to Kubernetes.
You can change the variables later on, but what it does is regenerate the YAML and re-deploy "static" versions of the result (template+variables=YAML that's sent to Kubernetes)
2) ConfigMaps allow you to separate a configuration from the pod manifest, and share this configuration across several pods/deployments.
You can later reference the ConfigMap from your pod/deployment manifests.
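For the ConfigMap option, a minimal sketch: Kubernetes expands $(VAR_NAME) references in command and args using the container's environment variables, so the values can live in a ConfigMap and be pulled in through env. The names host-config and app-with-args and the busybox image are placeholders; HOST1 and HOST2 come from the question.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: host-config            # placeholder name
data:
  HOST1: "http://12.12.12.12:8080"
  HOST2: "11.11.11.11"
---
apiVersion: v1
kind: Pod
metadata:
  name: app-with-args          # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: app
      image: busybox:1.36      # stand-in image; echo just prints the final args
      command: ["echo"]
      env:
        - name: HOST1
          valueFrom:
            configMapKeyRef:
              name: host-config
              key: HOST1
        - name: HOST2
          valueFrom:
            configMapKeyRef:
              name: host-config
              key: HOST2
      # $(HOST1) and $(HOST2) are substituted by Kubernetes, not by a shell.
      args: ["--arg1=$(HOST1)", "--arg2=$(HOST2)"]
```

Note the parentheses: a plain $HOST1 is not expanded unless the command itself runs through a shell.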
Hope this helps!