I would like to create a cronjob via Knative that sends healthcheck messages to my Kafka topic every 10 minutes.
Then we will have a separate endpoint that receives these messages and passes some response to a receiver (the healthcheck).
I tried using KafkaBinding, which seems suitable, but I cannot apply this template (the TLS one): https://github.com/knative/docs/tree/main/code-samples/eventing/kafka/binding
I also find it odd that the regular KafkaBinding template uses apiVersion: bindings.knative.dev/v1beta1, while the one with TLS uses sources.knative.dev/v1beta1.
I haven't found much documentation on how to create a CronJob that sends messages to Kafka, which are then picked up by a KafkaSource and passed via a broker + trigger to my service on K8s.
Has anyone implemented this, or found another way to do it?
The idea is to test the whole flow, including KafkaSource, which seems a little unstable.
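For reference, this is roughly the consuming side I have in mind: a KafkaSource feeding a broker and a trigger delivering to my service (just a sketch; the bootstrap address, topic and resource names are placeholders):
apiVersion: sources.knative.dev/v1beta1
kind: KafkaSource
metadata:
  name: healthcheck-source
spec:
  bootstrapServers:
    - my-cluster-kafka-bootstrap.kafka:9092   # placeholder bootstrap address
  topics:
    - healthcheck                             # placeholder topic
  sink:
    ref:
      apiVersion: eventing.knative.dev/v1
      kind: Broker
      name: default
---
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: healthcheck-trigger
spec:
  broker: default
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: healthcheck-receiver              # placeholder Knative Service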
Related
We just migrated to a Kubernetes cluster, and I was wondering if it is possible to automatically send a Kafka event when a container/pod finishes, with its stdout as the message. Right now we are using Fluentd with Elasticsearch, but the output of one pod is used as the input for the next one, so we have to constantly poll Elasticsearch to find out when the output is ready, and that causes performance issues in the overall execution.
I'm not sure of your current setup, but my first thought would be to:
Run something such as Fluentd or Logstash in its own pod on each node (e.g. as a DaemonSet)
Configure volume access to the Kubernetes log folder /var/log/containers/*
Use the Kafka output for either Fluentd or Logstash, with a file (tail) input on the logging folder
This approach would require the configuration above on each node, but it needs only minimal configuration of logging locations, etc. (a rough sketch is below).
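As a rough illustration only (I haven't run this; the namespace, topic and bootstrap address are placeholders, and it assumes an image with fluent-plugin-kafka installed, mounted into a Fluentd DaemonSet that also mounts /var/log):
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-kafka-config
  namespace: logging                          # placeholder namespace
data:
  fluent.conf: |
    # Tail every container log file on the node
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kube.*
      read_from_head true
      <parse>
        @type none
      </parse>
    </source>
    # Forward each log line to a Kafka topic (needs fluent-plugin-kafka)
    <match kube.**>
      @type kafka2
      brokers my-cluster-kafka-bootstrap:9092   # placeholder bootstrap address
      default_topic pod-stdout                  # placeholder topic
      <format>
        @type json
      </format>
    </match>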
It's not something I've personally configured but have considered it for the future.
More info here
I have deployed a Kafka cluster and a Kafka Connect cluster in Kubernetes, using Strimzi and AKS. I wanted to start reading from RSS resources to feed my Kafka cluster, so I created a connector instance of "org.kaliy.kafka.connect.rss.RssSourceConnector" which reads from a specific RSS feed, given a URL, and writes to a specific topic. But my whole intention with this is to eventually have a Kafka Connect cluster able to handle a lot of external requests for new RSS feeds to read from, and here is where all my doubts come in:
Should I create an instance of the Kaliy RSS connector for each RSS feed? Or would it be better to implement my own connector, so that I create only one instance of it and, each time I want to read a new RSS feed, I create a new task in the connector?
Who should be responsible for ensuring that the Kafka Connect cluster state is the desired one? I mean, if a connector (in the case of 1 RSS feed : 1 connector instance) stopped working, who should try to start it again? An external client via the Kafka Connect REST API? Kubernetes itself?
Right now, I think my best option is to rely on the Kafka Connect REST API, making the external client responsible for managing the state of the set of connectors, but I don't know whether it was designed to receive as many requests as that would involve. Maybe it could be scaled by provisioning several listeners in the Kafka Connect REST API configuration, but I do not know.
Thanks a lot!
One of the main benefits of using Kafka Connect is its configuration-driven approach, so you would lose that by implementing your own connector. In my opinion, the best strategy is to have one connector instance for each RSS feed. Reducing the number of instances could make sense when there is a single data source system, to avoid overloading it.
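For example, each feed could be declared as its own KafkaConnector resource managed by Strimzi, roughly like this (the cluster label, feed URL and topic are placeholders, and the connector's config property names should be checked against its documentation):
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
metadata:
  name: rss-feed-example
  labels:
    strimzi.io/cluster: my-connect-cluster    # must match your KafkaConnect resource name
spec:
  class: org.kaliy.kafka.connect.rss.RssSourceConnector
  tasksMax: 1
  config:
    rss.urls: "https://example.com/feed.xml"  # placeholder feed URL
    topic: "rss-example"                      # placeholder target topic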
With the Strimzi operator, the Kafka Connect cluster is monitored, and the operator will try to restore the desired cluster state when needed. This does not include the individual connector instances and their tasks, but you can leverage the K8s API to monitor the KafkaConnector custom resource (CR) status instead of the REST API.
Example:
$ kubectl get kafkaconnector amq-sink -o yaml
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
# ...
status:
  conditions:
  - lastTransitionTime: "2020-12-07T10:30:28.349Z"
    status: "True"
    type: Ready
  connectorStatus:
    connector:
      state: RUNNING
      worker_id: 10.116.0.66:8083
    name: amq-sink
    tasks:
    - id: 0
      state: RUNNING
      worker_id: 10.116.0.66:8083
    type: sink
  observedGeneration: 1
It may be late, but it could help anyone who passes by this question: it is also worth having a look at the Connector CR (custom resource) that is part of Confluent for Kubernetes (CFK), which introduces a clear-cut declarative way to manage and monitor connectors, with health checks and auto-healing.
https://www.confluent.io/blog/declarative-connectors-with-confluent-for-kubernetes/
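As a very rough sketch (I'm writing the API version and field names from memory of the CFK docs, so verify them there; all names and values are placeholders), a declarative connector with CFK looks something like:
apiVersion: platform.confluent.io/v1beta1
kind: Connector
metadata:
  name: rss-feed-example
  namespace: confluent                        # placeholder namespace
spec:
  class: org.kaliy.kafka.connect.rss.RssSourceConnector
  taskMax: 1
  connectClusterRef:
    name: connect                             # name of your Connect cluster resource
  configs:
    rss.urls: "https://example.com/feed.xml"  # placeholder feed URL
    topic: "rss-example"                      # placeholder target topic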
I have a parallel Kubernetes job with 1 pod per work item (I set parallelism to a fixed number in the job YAML).
All I really need is an ID per pod to know which work item to do, but Kubernetes doesn't support this yet (if there's a workaround, I want to know it).
Therefore I need a message queue to coordinate between pods. I've successfully followed the example in the Kubernetes documentation here: https://kubernetes.io/docs/tasks/job/coarse-parallel-processing-work-queue/
However, the example there creates a rabbit-mq service. I typically deploy my tasks as a job. I don't know how the lifecycle of a job compares with the lifecycle of a service.
It seems like that example creates a permanent message queue service, but I only need the message queue to exist for the lifetime of the job.
It's not clear to me whether I need to use a service, or whether I should create the rabbit-mq container as part of my job (and, if so, how that works with parallelism).
I'm working on a cluster in which I'm performing a lot of scraping on Instagram to find valuable accounts and then message them to ask if they're interested in selling their account. This is what my application consists of:
Find Instagram accounts by scraping for them with a lot of different accounts
Refine the accounts retrieved and sort out the bad ones
Message the chosen accounts
In addition to this, I'm thinking of uploading the data from each step to a database (the whole chunk of accounts gathered in step 1, the refined accounts gathered in step 2, and the messaged users from step 3) in separate collections. I'm also thinking of developing a Slack bot that handles errors by messaging me a report of the error, and eventually having it message me whenever a user responds.
As you can see, there are a lot of different parts of this application and that is the reason why I figured that using Kubernetes for this would be a good idea.
My initial approach was to make every pod in my node a REST API. Then I could send a request to each pod each time I wanted it to run. But I figured that this would not be an optimal solution and not in any way a Kubernetes-style approach.
The only way to achieve it in the way you describe is to communicate with the Kubernetes API server from inside your pod. This requires several things (adding a service account and role binding, using a Kubernetes client, etc.), and I would not recommend it as a regular application flow (unless you are a devops engineer trying to provide some generic/utility solution).
From another angle, sharing volumes between pods and jobs should be avoided if possible (it adds complexity and restrictions).
You can read more on this here, as a starter: https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-api/#accessing-the-api-from-within-a-pod
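For completeness, a minimal sketch of the RBAC side of that approach (all names and the namespace are made up for the example; the pod would then run with this service account and use a Kubernetes client library to create Jobs):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: job-launcher
  namespace: scraper                 # placeholder namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: job-launcher
  namespace: scraper
rules:
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["create", "get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: job-launcher
  namespace: scraper
subjects:
- kind: ServiceAccount
  name: job-launcher
  namespace: scraper
roleRef:
  kind: Role
  name: job-launcher
  apiGroup: rbac.authorization.k8s.io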
If I can suggest some solutions:
You can share an S3-backed volume and have a CronJob scheduled to run periodically. If the CronJob finds data, it processes it. That way you do not need to trigger a job from inside a pod (see the sketch after this list).
Two services, sending data via HTTP (if feasible): the second service doesn't do anything unless it is called.
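A rough sketch of the first option, assuming a shared PersistentVolumeClaim and a processing image (the schedule, image and all names are placeholders):
apiVersion: batch/v1
kind: CronJob
metadata:
  name: process-scraped-data
spec:
  schedule: "*/10 * * * *"                         # placeholder: every 10 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: processor
            image: registry.example.com/processor:latest   # placeholder image
            args: ["--input", "/data/incoming"]            # process data if present, exit otherwise
            volumeMounts:
            - name: shared-data
              mountPath: /data
          volumes:
          - name: shared-data
            persistentVolumeClaim:
              claimName: scraper-shared-data               # placeholder PVC backing the shared storage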
If you share your use case with some details, better answers could probably be provided.
Cheers
There is out-of-the-box support in kubectl for running a job from a cronjob (kubectl create job test-job --from=cronjob/a-cronjob), but there is no official support for running a job straight from a pod. You will need to get the pod resource from the cluster and then create a job using the pod specification as part of the job specification.
I would like to know if it is possible to send a notification, using YAML config, if a Kubernetes job fails.
For example, I have a Kubernetes job which runs once every day. Currently I run a Jenkins job to check and send a notification if the job fails. Are there any options to get a notification directly from Kubernetes jobs when they fail? It should be something we can add in the job YAML.
I'm not sure about any built-in notification support. That seems like the kind of feature you would find in external dedicated monitoring/notification tools, such as Prometheus or a Logstash output.
For example, you can try this tutorial to leverage the Prometheus metrics generated by default in many Kubernetes clusters: https://medium.com/@tristan_96324/prometheus-k8s-cronjob-alerts-94bee7b90511
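Assuming kube-state-metrics and the Prometheus Operator are installed, an alerting rule along these lines (names, labels and the threshold are placeholders) would let Alertmanager notify you about failed jobs:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: job-failure-alerts
  labels:
    release: prometheus                      # must match your Prometheus rule selector
spec:
  groups:
  - name: kubernetes-jobs
    rules:
    - alert: KubernetesJobFailed
      expr: kube_job_status_failed > 0       # metric exported by kube-state-metrics
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Job {{ $labels.namespace }}/{{ $labels.job_name }} has failed"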
Or you could theoretically set up Logstash, monitor incoming logs sent by Filebeat, and conditionally send alerts as part of the output stage of the pipeline via the "email output plugin".
Other methods exist as well, as mentioned in this similar question: How to send alerts based on Kubernetes / Docker events?
For reference, you may also wish to read this feature request, as discussed on GitHub: https://github.com/kubernetes/kubernetes/issues/22207