How do I set up Elastic Node APM distributed tracing to work with Kafka and multiple Node services?

I'm using Kafka for a queue, with Node services producing messages to and consuming messages from Kafka topics using Kafka-Node.
I've been using a home-brewed distributed tracing solution, but now we are moving to Elastic APM.
This seems to be tailored to HTTP servers, but how do I configure it to work with Kafka?
I want to be able to track transactions like the following: Service A sends an HTTP request to Service B, which produces it to Kafka Topic C, from which it is consumed by Service D, which puts some data into Kafka Topic E, from which it is consumed by Service B.

I worked with the Elastic APM team, who had just rolled out this package: https://www.npmjs.com/package/elastic-apm-node
The directions are pretty self-explanatory, and it works like a charm.

Related

Triggering a Kubernetes job for a Kafka message

I have a kubernetes service that only does something when it consumes a message from a Kafka queue. The queue does not have messages very often, and running the service as a job triggered whenever a message is found would save resources.
I see that Kubernetes has this functionality for AMQP-type message services: https://kubernetes.io/docs/tasks/job/coarse-parallel-processing-work-queue/
Is there a way to adapt this for Kafka, given that Kafka does not support AMQP? I'd switch to a different messaging system, but I have other services that also read from this queue that require Kafka.
That Kafka consumer service is all you really need. If you want to save resources, it could be paired with the KEDA autoscaler so that it scales up and down depending on load or consumer-group lag.
Or you can use a serverless platform such as Knative to trigger based on Kafka (or other messaging system) events.
Kafka does not support AMQP
Kafka Connect should be able to bridge AMQP to Kafka. For example, Apache Camel has connectors for both.
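The answer above most likely refers to Camel's Kafka Connect connectors, but the same bridging idea can be sketched as a plain standalone Camel route. This is only a sketch, assuming the camel-amqp and camel-kafka components are on the classpath; the queue name, topic name, and broker address are placeholders, and the AMQP connection-factory configuration is omitted.

```java
import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

// Sketch of an AMQP -> Kafka bridge as a standalone Camel route.
// Queue name, topic name, and broker address are placeholders; the camel-amqp
// component would also need a connection factory pointing at the AMQP broker,
// which is omitted here.
public class AmqpToKafkaBridge {
    public static void main(String[] args) throws Exception {
        CamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // Consume each AMQP message and publish it to a Kafka topic.
                from("amqp:queue:incoming-queue")
                    .to("kafka:bridged-topic?brokers=localhost:9092");
            }
        });
        context.start();                 // start the route
        Thread.sleep(Long.MAX_VALUE);    // keep the bridge process alive
    }
}
```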

Using Kafka and Kafka REST Proxy server

I have two ends of an application:
A Python Flask backend, which communicates with Kafka in the normal way.
A machine agent written in Go, installed in a client environment, which communicates with Kafka only via the Kafka REST Proxy.
Now the question is: can these two ends communicate? For example, can my machine agent consume messages from Kafka via the REST Proxy when the messages were produced by the other end in the normal way? Or do both ends need to use the Kafka REST Proxy?
As long as data arrives in Kafka, it doesn't matter which protocol hops you use to get it there. I'd recommend using sarama or confluent-kafka-go instead of HTTP, though.

Build a data transformation service using Kafka Connect

Kafka Streams is good, but I have to do every configuration very manually. Kafka Connect, on the other hand, provides a REST API, which is very useful for handling the configuration, as well as tasks, workers, etc.
Thus, I'm thinking of using Kafka Connect for my simple data-transforming service. Basically, the service will read the data from a topic and send the transformed data to another topic. In order to do that, I would have to make a custom sink connector that sends the transformed data to a Kafka topic; however, it seems those interface functions aren't available in SinkConnector. If I could do it, that would be great, since I could manage tasks and workers via the REST API and run the tasks in distributed mode (multiple instances).
There are 2 options in my mind:
Figuring out how to send the message from SinkConnector to a kafka topic
Figuring out how to build a REST interface API like Kafka Connect which wraps up the Kafka Streams app
Any ideas?
Figuring out how to send the message from SinkConnector to a kafka topic
A sink connector consumes data/messages from a Kafka topic. If you want to send data to a Kafka topic you are likely talking about a source connector.
Figuring out how to build a REST interface API like Kafka Connect which wraps up the Kafka Streams app.
Using the kafka-connect-archtype you can get a template for creating your own Kafka connector (source or sink). In your case, since you want to build some stream-processing pipeline after the connector, you are mostly talking about a connector for another stream-processing engine that is not Kafka Streams. There are connectors for Kafka <-> Spark, Kafka <-> Flink, and so on.
But you can build your own using the kafka-connect-archtype template if you want. Use the MySourceTask List<SourceRecord> poll() method or the MySinkTask put(Collection<SinkRecord> records) method to process the records as a stream. These classes extend org.apache.kafka.connect.source.SourceTask and org.apache.kafka.connect.sink.SinkTask from Kafka Connect.
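As a rough sketch of the SinkTask approach (class name and transformation logic are made up): Kafka Connect calls put() with batches of records consumed from the topic, and a sink would normally deliver them to an external system there; writing them back to another Kafka topic is not something the framework does for you, which is why the SinkConnector interface has nothing for it.

```java
import java.util.Collection;
import java.util.Map;

import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

// Minimal sketch of a custom sink task; names and logic are illustrative only.
public class MyTransformSinkTask extends SinkTask {

    @Override
    public void start(Map<String, String> props) {
        // Read any connector configuration you need here.
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        // Kafka Connect hands you batches of records consumed from the topic.
        for (SinkRecord record : records) {
            Object transformed = transform(record.value());
            // A sink task would normally write `transformed` to an external system here;
            // producing to another Kafka topic would require your own KafkaProducer.
        }
    }

    private Object transform(Object value) {
        // Placeholder transformation logic.
        return value;
    }

    @Override
    public void stop() {
        // Clean up resources (connections, producers, etc.).
    }

    @Override
    public String version() {
        return "0.1-sketch";
    }
}
```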
a REST interface API like Kafka Connect which wraps up the Kafka Streams app
This is exactly what ksqlDB allows you to do.
Besides creating streams and tables with SQL queries, it offers a REST API and can interact with Connect endpoints (or embed a Connect worker itself).
https://docs.ksqldb.io/en/latest/concepts/connectors/
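For illustration, a statement can be submitted to ksqlDB's REST API with any HTTP client. The sketch below assumes a ksqlDB server on its default port 8088, and the stream names are made up (the source stream would have to exist already).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: submitting a persistent transformation query to ksqlDB over its REST API.
// Assumes a ksqlDB server on localhost:8088; stream names are placeholders.
public class KsqlDbRestExample {
    public static void main(String[] args) throws Exception {
        // CREATE STREAM ... AS SELECT defines a continuously running topic-to-topic transformation.
        String body = """
                {"ksql": "CREATE STREAM transformed_stream AS SELECT * FROM source_stream;", "streamsProperties": {}}
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8088/ksql"))
                .header("Content-Type", "application/vnd.ksql.v1+json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```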

Exposing a public Kafka cluster

If I were to create a public Kafka cluster that accepts messages from multiple clients, but those messages are processed purely by a separate backend, what would be the right way to design it?
As a more concrete example, let's say I have 50 Kafka brokers. How do I:
Configure clients without manually adding the IPs of all 50 Kafka brokers?
Load-balance messages across the Kafka brokers based on load, if possible?
Set up additional clients with quotas in an easier/automated way?
You can use HashiCorp Consul, which is one of the open-source service discovery tools, to register your Kafka brokers; ultimately you will have a single endpoint and won't need to add multiple brokers to your clients. There are several other open-source tools available.
There are a few ways: use the kafka-assigner tool to balance the traffic, or the open-source Kafka Cruise Control tool to automatically balance the cluster for you.

Apache Kafka consumer groups and microservices running on Kubernetes, are they compatible?

So far, I have been using Spring Boot apps (with Spring Cloud Stream) and Kafka running without any supporting infrastructure (PaaS).
Since our corporate platform is running on Kubernetes we need to move those Spring Boot apps into K8s to allow the apps to scale and so on. Obviously there will be more than one instance of every application so we will define a consumer group per application to ensure the unique delivery and processing of every message.
Kafka will be running outside Kubernetes.
Now my doubt is: since the apps deployed on K8s are accessed through a K8s Service that abstracts the underlying pods, and individual application pods can't be accessed directly from outside the K8s cluster, Kafka won't know how to call individual instances of the consumer group to deliver the messages, will it?
How can I make them work together?
Kafka brokers do not push data to clients. Rather, clients poll() and pull data from the brokers. As long as the consumers can connect to the bootstrap servers, and the Kafka brokers advertise an IP and port that the clients can connect to and poll(), it will all work fine.
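As a minimal sketch of that pull model (the broker address, topic, and group id are placeholders): each app instance in a pod joins the same consumer group and polls the brokers, so the brokers never have to reach into the cluster.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Sketch: a consumer running inside a pod pulls from brokers outside the cluster.
// Addresses, topic, and group id are placeholders.
public class PodConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Must be reachable from inside the pod; the brokers must advertise
        // addresses (advertised.listeners) that the pod can reach.
        props.put("bootstrap.servers", "kafka.example.com:9092");
        // All instances of the app share one consumer group, so each message
        // is processed by exactly one instance.
        props.put("group.id", "my-spring-app");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                // The client pulls; the broker never initiates a connection to the pod.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```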
Can Spring Cloud Data Flow solve your requirement to control the number of instances deployed?
And there is a community-released Spring Cloud Data Flow server for OpenShift:
https://github.com/donovanmuller/spring-cloud-dataflow-server-openshift