i am running kafka on kubernetes (deployed on Azure) using strimzi for development environment and would prefer to use internal kubernetes node storage. if i use persistant-claim or jbod, it creates standard disks on azure storage. however i prefer to use internal node storage as i have 16 gb available there. i do not want to use ephemeral as i want the data to be persisted atleast on kubernetes nodes.
folllowing is my deployment.yml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
name: kafka-cluster
spec:
kafka:
version: 3.1.0
replicas: 2
listeners:
- name: plain
port: 9092
type: internal
tls: false
- name: tls
port: 9093
type: internal
tls: true
- name: external
type: loadbalancer
tls: false
port: 9094
config:
offsets.topic.replication.factor: 2
transaction.state.log.replication.factor: 2
transaction.state.log.min.isr: 2
default.replication.factor: 2
min.insync.replicas: 2
inter.broker.protocol.version: "3.1"
storage:
type: persistent-claim
size : 2Gi
deleteClaim: false
zookeeper:
replicas: 2
storage:
type: persistent-claim
size: 2Gi
deleteClaim: false
entityOperator:
topicOperator: {}
userOperator: {}
The persistent-claim storage as you use it will provision the storage using the default storage class which in your case I guess creates standard storage.
You have two options how to use local disk space of the worker node:
You can use the ephemeral type storage. But keep in mind that this is like a temporary directory, it will be lost in every rolling update. Also if you for example delete all the pods at the same time, you will loose all data. As such it is something recommended only for some short-lived clusters in CI, maybe some short development etc. But for sure not for anything where you need reliability.
You can use Local Persistent Volumes which are persistent volumes which are bound to a particular node. These are persistent, so the pods will re-use the volume between restarts and rolling udpates. However, it bounds the pod to the particular worker node the storage is on -> so you cannot easily reschedule it to another worker node. But apart from these limitation, it is something what can be (unlike the ephemeral storage) used with reliability and availability when done right. The local persistent volumes are normally provisioned through StorageClass as well -> so in the Kafka custom resource in Strimzi it will still use the persistent-claim type storage, just with different storage class.
You should really thing what exactly you want to use and why. From my experience, the local persistent volumes are great option when
You run on bare metal / on-premise clusters where often good shared block storage is not available
When you require maximum performance (local storage does not depend on network, so it can be often faster)
But in public clouds with good support for high quality for networked block storage such as Amazon EBS volumes and their Azure or Google counterparts, local storage often brings more problems than advantages because of how it bounds your Kafka brokers to a particular worker node.
Some more details about the local persistent volumes can be found here: https://kubernetes.io/docs/concepts/storage/volumes/#local ... there are also different provisioners which can help you use it. I'm not sure if Azure supports anything out of the box.
Sidenote: 2Gi of space is very small for Kafka. Not sure how much you will be able to do before running out of disk space. Even 16Gi would be quite small. If you know what are you doing, then fine. But if not, you should be careful.
Related
how to migrate VM data from a disk to a Kubernetes cluster?
I have a VM with three disks attached mounted to it, each having data that needs to be migrated to a Kubernetes cluster to be attached to database services in statefulset.
This link https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/preexisting-pd does give me the way. But don't know how to use it or implement it with statefulsets so that one particular database resource (like Postgres) be able to use the same PV(created from one of those GCE persistent disks) and create multiple PVCs for new replicas.
Is the scenario I'm describing achievable?
If yes how to implement it?
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
name: elasticsearch
spec:
version: 6.8.12
http:
tls:
selfSignedCertificate:
disabled: true
nodeSets:
- name: default
count: 3
config:
node.store.allow_mmap: false
xpack.security.enabled: false
xpack.security.authc:
anonymous:
username: anonymous
roles: superuser
authz_exception: false
volumeClaimTemplates:
- metadata:
name: elasticsearch-data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 50Gi
storageClassName: managed-premium
StatefulSet is a preference for manually creating a PersistentVolumeClaim. For database workloads that can't be replicated, you can't set replicas: greater than 1 in either case, but the PVC management is valuable. You usually can't have multiple databases pointing at the same physical storage, containers or otherwise, and most types of Volumes can't be shared across Pods.
Postgres doesn't support multiple instances sharing the underlying volume without massive corruption so if you did set things up that way, it's definitely a mistake. More common would be to use the volumeClaimTemplate system so each pod gets its own distinct storage. Then you set up Postgres streaming replication yourself.
Refer to the document on Run a replicated stateful application and stackpost for more information.
Kubernetes creates one PersistentVolume for each VolumeClaimTemplate definition on an statefulset. That makes each statefulset pod have its own storage that is not shared across the replicas. However, I would like to share the same volume across all the statefulset replicas.
It looks like the approach should be the following:
Create a PVC on the same namespace.
On the statefulset use Volumes to bound the PVC
Ensure that the PVC is ReadOnlyMany or ReadWriteMany
Assuming that my application is able to deal with any concurrency on the shared volume, is there any technical problem if I have one PVC to share the same volume across all statefulset replicas?
I wholeheartedly agree with the comments made by #Jonas and #David Maze:
You can do this, it should work. There is no need to use volumeClaimTemplates unless your app needs it.
Two obvious problems are that ReadWriteMany volumes are actually a little tricky to get (things like AWS EBS volumes are only ReadWriteOnce), and that many things you want to run in StatefulSets (like databases) want exclusive use of their filesystem space and use file locking to enforce this.
Answering on the question:
Is there any technical problem if I have one PVC to share the same volume across all statefulset replicas?
I'd say that this would mostly depend on the:
How the application would handle such scenario where it's having single PVC (writing concurrency).
Which storage solution are supported by your Kubernetes cluster (or could be implemented).
Subjectively speaking, I don't think there should be an issue when above points are acknowledged and aligned with the requirements and configuration that the cluster/applications allows.
From the application perspective, there is an inherent lack of the software we are talking about. Each application could behave differently and could require different tuning (look on the David Maze comment).
We do not also know anything about your infrastructure so it could be hard to point you potential issues. From the hardware perspective (Kubernetes cluster), this would inherently go into making a research on the particular storage solution that you would like to use. It could be different from cloud provider to cloud provider as well as on-premise solutions. You would need to check the requirements of your app to align it to the options you have.
Continuing on the matter of Volumes, I'd reckon the one of the important things would be accessModes.
Citing the official docs:
Access Modes
A PersistentVolume can be mounted on a host in any way supported by the resource provider. As shown in the table below, providers will have different capabilities and each PV's access modes are set to the specific modes supported by that particular volume. For example, NFS can support multiple read/write clients, but a specific NFS PV might be exported on the server as read-only. Each PV gets its own set of access modes describing that specific PV's capabilities.
The access modes are:
ReadWriteOnce -- the volume can be mounted as read-write by a single node
ReadOnlyMany -- the volume can be mounted read-only by many nodes
ReadWriteMany -- the volume can be mounted as read-write by many nodes
In the CLI, the access modes are abbreviated to:
RWO - ReadWriteOnce
ROX - ReadOnlyMany
RWX - ReadWriteMany
Kubernetes.io: Docs: Concepts: Storage: Persistent Volumes: Access modes
One of the issues you can run into is with the ReadWriteOnce when the PVC is mounted to the Node and sts-X (Pod) is scheduled onto a different Node but from the question, I'd reckon you already know about it.
However, I would like to share the same volume across all the statefulset replicas.
An example of a StatefulSet with a Volume that would be shared across all of the replicas could be following (modified example from Kubernetes documentation):
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: web
spec:
selector:
matchLabels:
app: nginx # has to match .spec.template.metadata.labels
serviceName: "nginx"
replicas: 3 # by default is 1
template:
metadata:
labels:
app: nginx # has to match .spec.selector.matchLabels
spec:
terminationGracePeriodSeconds: 10
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
name: web
# VOLUME START
volumeMounts:
- name: example-pvc
mountPath: /usr/share/nginx/html
volumes:
- name: example-pvc
persistentVolumeClaim:
claimName: pvc-for-sts
# VOLUME END
Additional resources:
Kubernetes.io: Docs: Concepts: Workloads: Controllers: Statefulset
Kubernetes.io: Docs: Concepts: Storage: Persistent Volumes
I need to list all nodes from my Redis Cluster on attribute spring.redis.sentinel.nodes? Is it right?
I wanna run a Redis Cluster on K8s to use the autoscaling provided from K8s, How can I use autoscale is it necessary to inform all nodes on spring.redis.sentinel.nodes?
Good question 💯.
The short answer is that you typically don't do autoscaling with stateful apps like Redis since you have to be careful about not corrupting your data. Most of the time you migrate and shard your data, i.e multiple clusters, with different segments of your data, etc.
Having said that, there is no silver bullet redis autoscaling solution but it's doable with a lot of monitoring and testing 🦄. A challenge here is that sentinels change the master in case of failover so your solution needs to be able to determine or monitor who the master is at a certain interval, this is very critical during downscales. Redis has written a pretty good guide on how to create clients which you will probably have to do/understand if you want a reliable autoscaling solution.
So the idea here 💡 is that you start with a set of sentinel/redis nodes managed by a Kubernetes Operator. With some config like this:
apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
name: redisfailover
spec:
sentinel:
replicas: 3
resources:
requests:
cpu: 100m
limits:
memory: 100Mi
redis:
replicas: 3
resources:
requests:
cpu: 100m
memory: 100Mi
limits:
cpu: 400m
memory: 500Mi
Then maybe modify the controller of this operator to do autoscale based on certain metrics (CPUs, Memory, Storage, etc).
The moment there is an autoscale operation you will have to do a configuration change in your Spring boot application to account for this change (say the ConfigMap of your application). For example, automatically change the value of this:
spring:
cache:
type: redis
redis:
port: 6666
password: 123pwd
sentinel:
master: masterredis
nodes:
- 10.0.0.16
- 10.0.0.17
- 10.0.0.18
lettuce:
shutdown-timeout: 200ms
Now after the config change, you need to do a rolling restart to prevent any downtime. The best thing, in my opinion, to do this in Kubernetes is just to have another Operator (or extend the Redis operator) for your application, that has a controller, to automatically detect when there is scaling operation, does the ConfigMap change, and finally does the rolling restart of your app. Your scaling operations need to allow enough time for balancing of data and also the rolling restart to prevent any thrashing/starvation and possible downtime/data corruption.
✌️☮️
Can anyone share me the yaml file for creating kafka cluster with two kafka broker and zookeeper cluster with 3 servers.I'm new to kubernetes.
Take look at https://github.com/Yolean/kubernetes-kafka, Make sure the broker memory limit is 2 GB or above.
Maintaining a reliable kafka cluster in kubernetes is still a challenge, good luck.
I recommend you to try Strimzi Kafka Operator. Using it you can define a Kafka cluster just like other Kubernetes object - writing a yaml file. Moreover, also users, topics and Kafka Connect cluster are just a k8s objects. Some (by not all!) features of Strimzi Kafka Operator:
Secure communication between brokers and between brokers and zookeeper with TLS
Ability to expose the cluster outside k8s cluster
Deployable as a helm chart (it simplifies things a lot)
Rolling updates when changing cluster configuration
Smooth scaling out
Ready to monitor the cluster using Prometheus and Grafana.
It's worth to mention a great documentation.
Creating a Kafka cluster is as simple as applying a Kubernetes manifest like this:
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
name: my-cluster
spec:
kafka:
version: 2.2.0
replicas: 3
listeners:
plain: {}
tls: {}
config:
offsets.topic.replication.factor: 3
transaction.state.log.replication.factor: 3
transaction.state.log.min.isr: 2
log.message.format.version: "2.2"
storage:
type: jbod
volumes:
- id: 0
type: persistent-claim
size: 100Gi
deleteClaim: false
zookeeper:
replicas: 3
storage:
type: persistent-claim
size: 100Gi
deleteClaim: false
entityOperator:
topicOperator: {}
userOperator: {}
I think that you could take a look at the Strimzi project here https://strimzi.io/.
It's based on the Kubernetes operator pattern and provide a simple way to deploy and manage a Kafka cluster on Kubernetes using custom resources.
The Kafka cluster is described through a new "Kafka" resource YAML file for setting all you need.
The operator takes care of that and deploys the Zookeeper ensemble + the Kafka cluster for you.
It also deploys more two operators for handling topics and users (but they are optional).
Another simple configuration of Kafka/Zookeeper on Kubernetes in DigitalOcean with external access:
https://github.com/StanislavKo/k8s_digitalocean_kafka
You can connect to Kafka from outside of AWS/DO/GCE by regular binary protocol. Connection is PLAINTEXT or SASL_PLAINTEXT (username/password).
Kafka cluster is StatefulSet, so you can scale cluster easily.
I am trying to cut the costs of running the kubernetes cluster on Google Cloud Platform.
I moved my node-pool to preemptible VM instances. I have 1 pod for Postgres and 4 nodes for web apps.
For Postgres, I've created StorageClass to make data persistent.
Surprisingly, maybe not, all storage data was erased after a day.
How to make a specific node in GCP not preemptible?
Or, could you advice what to do in that situation?
I guess I found a solution.
Create a disk on gcloud via:
gcloud compute disks create --size=10GB postgres-disk
gcloud compute disks create --size=[SIZE] [NAME]
Delete any StorageClasses, PV, PVC
Configure deployment file:
apiVersion: apps/v1beta2
kind: StatefulSet
metadata:
name: postgres
spec:
serviceName: postgres
selector:
matchLabels:
app: postgres
replicas: 1
template:
metadata:
labels:
app: postgres
role: postgres
spec:
containers:
- name: postgres
image: postgres
env:
...
ports:
...
# Especially this part should be configured!
volumeMounts:
- name: postgres-persistent-storage
mountPath: /var/lib/postgresql
volumes:
- name: postgres-persistent-storage
gcePersistentDisk:
# This GCE PD must already exist.
pdName: postgres-disk
fsType: ext4
You can make a specific node not preemptible in a Google Kubernetes Engine cluster, as mentioned in the official documentation.
The steps to set up a cluster with both preemptible and non-preemptible node pools are:
Create a Cluster: In the GCP Console, go to Kubernetes Engine -> Create Cluster, and configure the cluster as you need.
On that configuration page, under Node pools, click on Add node pool. Enter the number of nodes for the default and the new pool.
To make one of the pools preemptible, click on the Advance edit button under the pool name, check the Enable preemptible nodes (beta) box, and save the changes.
Click on Create.
Then you probably want to schedule specific pods only on non-preemptible nodes. For this, you can use node taints.
You can use managed service from GCP named GKE google kubernetes cluster.
And storage data erased cause of storage class change may not retain policy and PVC.
It's better to use managed service I think.