I am deploying KafkaConnect using a Custom Resource. I would like to mount a PersistentVolumeClaim into the Kafka Connect cluster. The idea is that another application, responsible for file transfer, will place a file there that will then be picked up by a Kafka connector.
I checked the KafkaConnect resource config docs and it seems that I cannot simply add a volume to the Pod.
My understanding is that if I patch the Pod, the strimzi-operator will recognise the modification and overwrite it on the next reconciliation.
Would anyone have an idea how I can still use the KafkaConnect CR and mount the PVC volume?
This is currently not supported by Strimzi. There is an enhancement issue for this (https://github.com/strimzi/strimzi-kafka-operator/issues/2571), but nobody has implemented it yet.
I wonder if, for the use case you describe, something like running Kafka Connect as a sidecar would make more sense. You could then share the storage directly without any networking etc. (This is not something supported by Strimzi, but Kafka Connect itself can of course be used like this.)
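For illustration only, here is a rough sketch of that sidecar idea, outside of Strimzi's management: a Deployment in which the file-transfer application and a plain Kafka Connect worker share the same PVC. The images, PVC name, paths and command are assumptions, not something Strimzi provides, and the Connect worker config would still need its bootstrap.servers pointed at your Kafka cluster.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: file-transfer-with-connect
spec:
  replicas: 1
  selector:
    matchLabels:
      app: file-transfer-with-connect
  template:
    metadata:
      labels:
        app: file-transfer-with-connect
    spec:
      containers:
        # The application that drops files into the shared folder.
        - name: file-transfer
          image: example.com/file-transfer:latest     # placeholder image
          volumeMounts:
            - name: shared-files
              mountPath: /data/inbox
        # A plain Kafka Connect worker started by hand (not managed by Strimzi).
        - name: kafka-connect
          image: example.com/kafka:latest             # placeholder; any Kafka distribution image with the Connect scripts
          command:                                    # script/config paths depend on the image used
            - /opt/kafka/bin/connect-distributed.sh
            - /opt/kafka/config/connect-distributed.properties
          volumeMounts:
            - name: shared-files
              mountPath: /data/inbox                  # e.g. a file source connector reads from here
      volumes:
        - name: shared-files
          persistentVolumeClaim:
            claimName: file-feed-pvc                  # assumed pre-existing PVC
```

The connector would then be configured with /data/inbox as its input path; both containers see the same files without any network share in between.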
Hello (I am using Google Translate).
I have the following problem: I have a Kafka service in Kubernetes. Everything is managed by Rancher, and the Kafka deployment is done through the catalogs that Rancher provides (I attach an image of the service).
Everything works correctly within Kubernetes, but now I need a producer external to Kubernetes to connect to Kafka and send messages so that they are received inside Kubernetes.
I have not been able to accomplish this, and I have already tried another Kafka deployment following this guide:
https://www.weave.works/blog/kafka-on-kubernetes-and-deploying-best-practice
But I can't figure out, either in the Rancher catalogs version or in the version installed through YAML files, where and what I should configure to allow a producer outside of Kubernetes. I also tried setting the service as NodePort, but that didn't work. Any help is welcome, thank you.
NodePort is one option. A LoadBalancer or Ingress is another.
Rather than using Rancher catalogs, I'd recommend that you read through the Strimzi operator documentation, which covers all the options for external client communication.
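If you do switch to the Strimzi operator, the relevant part of the Kafka custom resource is the external listener. A minimal sketch for a nodeport listener (cluster name, ports and sizes are illustrative) looks roughly like this:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    listeners:
      # internal listener for clients inside the cluster
      - name: plain
        port: 9092
        type: internal
        tls: false
      # external listener exposed through NodePort services
      - name: external
        port: 9094
        type: nodeport          # could also be loadbalancer, ingress or route
        tls: false
    storage:
      type: persistent-claim
      size: 100Gi
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 100Gi
```

The operator then creates the NodePort services and sets the advertised addresses accordingly, so an external producer can bootstrap against a node IP and the assigned node port.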
I am planning to run Kafka on GCP (Google Cloud Platform).
What I wonder is: what happens to the data in a Kafka topic when a pod fails? By default a new pod will be created, but will the data in the Kafka topic be lost? How can I avoid data loss in this situation?
I appreciate any help. Thanks in advance :)
Best Regards,
Kafka itself needs persistent storage, so you will probably need a cloud-native storage solution. At a high level: create a StorageClass defining your storage requirements, such as replication factor, snapshot policy, and performance profile, and deploy Kafka as a StatefulSet on Kubernetes.
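As a rough sketch of the StatefulSet side of that (the image, storage class name and sizes below are placeholders, not a complete Kafka deployment), each broker claims its own persistent volume through volumeClaimTemplates:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: example.com/kafka:latest          # placeholder image
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka/data       # broker log.dirs
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: kafka-storage            # your StorageClass
        resources:
          requests:
            storage: 100Gi
```

Because each replica gets its own PVC, a rescheduled pod reattaches to the same disk and the topic data survives the pod failure (combine this with a Kafka replication factor greater than 1 to also survive broker-level failures).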
I haven't understood your purpose exactly, but in this case you cannot guarantee Kafka's data durability when a pod fails or is evicted. Maybe you should try a native VM with Kafka installed and configure it to be fully backed up (restorable at any time when disaster happens).
It depends on what exactly you need. It's quite a general question.
There are some ready-made Kafka deployments if you use the Marketplace.
As you are asking about pods, I guess you want to use Google Kubernetes Engine. On the internet you can find many guides about running Kafka on Kubernetes.
For example, you can refer to Kafka with ZooKeeper on Portworx. In one of the steps you have a StorageClass YAML. In GKE the default StorageClass has reclaimPolicy set to Delete, but you can create a new StorageClass with reclaimPolicy: Retain, which will keep the disk in GCP even after the PersistentVolumeClaim is deleted.
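A minimal sketch of such a StorageClass on GKE, assuming the persistent-disk CSI driver (the name and disk type are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kafka-retain
provisioner: pd.csi.storage.gke.io     # GKE persistent-disk CSI driver
parameters:
  type: pd-ssd
reclaimPolicy: Retain                  # keep the GCP disk when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer
```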
In GCP you also have the option to create disk snapshots.
In addition, you can find some best practices for using Kafka on Kubernetes here.
Context:
We have an Apache Nifi cluster deployed in Kubernetes as a StatefulSet, and a volume claim template is used for the Nifi repositories.
These are the Nifi Helm charts we are using.
There is a use case where file processing is done by Nifi. The file feeds are put into a shared folder and Nifi reads them from there. When multiple Nifi nodes are present, all three would read from the shared folder.
In a non-Kubernetes environment we use an NFS file share.
In AWS we use S3 for storage, and Nifi has processors to read from S3.
Problem:
Nifi is already deployed as a StatefulSet and uses a volume claim template for the storage repositories. How can we mount this NFS share for the file feed to all Nifi replicas?
Or, to put the question in a more generic manner:
How can we mount a single NFS shared folder to all StatefulSet replicas?
Solutions tried
We tried linking separately claimed PVC folders to the NFS share, but that looks like a workaround.
Can somebody please help? Any hints would be highly appreciated.
Put it in the pod template like normal. NFS is a ReadWriteMany volume type, so you can create one PVC and then use it on every pod simultaneously. You can also configure NFS volumes directly in the pod spec, but using a PVC is probably better.
It sounds like what you have is correct :)
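For reference, a minimal sketch of the single-PVC approach under assumed names: a pre-provisioned NFS PersistentVolume plus a ReadWriteMany claim that every replica can mount through the StatefulSet pod template (server address, export path and sizes are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: file-feed-nfs
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.internal       # placeholder NFS server
    path: /exports/file-feed
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: file-feed
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""                 # bind to the pre-created PV above
  resources:
    requests:
      storage: 50Gi
```

In the Nifi StatefulSet you then reference this claim under spec.template.spec.volumes (claimName: file-feed) with a matching volumeMount in the Nifi container, alongside the existing volumeClaimTemplates; the same PVC is mounted by every replica.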
I am using AWS EKS to run a Kubernetes cluster.
In it, I am using AWS EFS as persistent storage for application logs. I have many applications running, and I create a PVC for each application. I need persistent storage for the application logs only. Now I also need these logs in Elasticsearch, so I use Filebeat for that. So, this is my architecture:
I just want to get feedback on this architecture. Is this the correct way to do it? What could the drawbacks be? How do you send application logs to ELK in Kubernetes?
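For reference, a rough sketch of one piece of the setup described above, under assumed names (a hypothetical app-logs PVC backed by the EFS storage class, placeholder images and paths): the application and a Filebeat sidecar mount the same log volume, and Filebeat ships the files to Elasticsearch.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: example.com/my-app:latest            # placeholder application image
      volumeMounts:
        - name: logs
          mountPath: /var/log/my-app              # the app writes its log files here
    - name: filebeat
      image: docker.elastic.co/beats/filebeat:8.13.0   # version is illustrative
      args: ["-e", "-c", "/etc/filebeat.yml"]
      volumeMounts:
        - name: logs
          mountPath: /var/log/my-app              # Filebeat tails the same files
          readOnly: true
        - name: filebeat-config
          mountPath: /etc/filebeat.yml
          subPath: filebeat.yml
  volumes:
    - name: logs
      persistentVolumeClaim:
        claimName: app-logs                       # assumed EFS-backed ReadWriteMany PVC
    - name: filebeat-config
      configMap:
        name: filebeat-config                     # filebeat.yml with a log input and an Elasticsearch output
```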
At the moment I have a Kubernetes cluster deployed on AWS via kops. I have a question: is it possible to take a sort of snapshot of the Kubernetes cluster and recreate the same environment (master and worker nodes), for example to be resilient or to migrate the cluster in an easy way? I know that Heptio Ark exists, and it is very nice. But I'm curious to know if there is an easier way to do it. For example, is it enough to back up etcd (or, in my case, to snapshot the EBS volumes)?
Thanks a lot. All suggestions are welcome.
kops stores its state in an S3 bucket identified by KOPS_STATE_STORE. So yes, if your cluster has been removed, you can restore it by running kops create cluster.
Keep in mind that this doesn't restore your etcd state, so for that you need to set up etcd backups. You could also make use of Heptio Ark.
Similar answers to this topic:
Recover kops Kubernetes cluster
How to restore kubernetes cluster using kops?
As mentioned by Rico in the earlier post, you can use Velero to back up your etcd using its CLI client. Another option to consider for the scenario you described is CAPE: CAPE provides an easy-to-use control plane for Kubernetes multi-cluster app and data management via a friendly user interface. A minimal Velero Backup manifest is also sketched after the links below.
See below for resources:
How to create an on-demand K8s Backup:
https://www.youtube.com/watch?v=MOPtRTeG8sw&list=PLByzHLEsOQEB01EIybmgfcrBMO6WNFYZL&index=7
How to Restore/Migrate K8s Backup to Another Cluster:
https://www.youtube.com/watch?v=dhBnUgfTsh4&list=PLByzHLEsOQEB01EIybmgfcrBMO6WNFYZL&index=10
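As a concrete illustration of the Velero option mentioned above, an on-demand backup can be requested either with the velero CLI or by creating a Backup resource; a minimal sketch (names and TTL are illustrative) looks like this:

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: cluster-backup
  namespace: velero              # namespace where Velero is installed
spec:
  includedNamespaces:
    - "*"                        # back up every namespace
  ttl: 720h0m0s                  # keep the backup for 30 days
```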