I am new to the Kafka world. We are planning to set up Kafka to fulfill our data-streaming needs. The sink in our case is a REST endpoint. What connectors are available to support Kafka => REST endpoint connectivity? This is similar to how AWS simple queues or topics work.
AFAIK there is no certified HTTP sink for Apache Kafka. Why not simply create a Kafka consumer and, for every message (or message batch), make a REST call to your service?
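A minimal sketch of that consumer loop (the broker address, topic name, group id, and endpoint URL are all placeholders, not anything standard):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class KafkaToRestForwarder {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "rest-forwarder");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            HttpClient http = HttpClient.newHttpClient();
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("events"));
                while (true) {
                    // Forward each record in the polled batch as one HTTP POST
                    for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                        HttpRequest request = HttpRequest.newBuilder()
                                .uri(URI.create("http://rest-service/events"))
                                .header("Content-Type", "application/json")
                                .POST(HttpRequest.BodyPublishers.ofString(record.value()))
                                .build();
                        http.send(request, HttpResponse.BodyHandlers.ofString());
                    }
                }
            }
        }
    }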
Nothing out of the box that I'm aware of, but akka-streams feels well suited for this use case.
The source would be a Kafka message stream created using akka-stream-kafka (a.k.a. reactive-kafka), and the sink would be the akka-http client (flow-based variant):
http://doc.akka.io/docs/akka-http/10.0.6/scala/http/client-side/request-level.html#flow-based-variant
Integration patterns for akka-streams are (starting to be) documented under the name alpakka:
http://developer.lightbend.com/docs/alpakka/current/
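A rough sketch of that pipeline using the Java DSL (Akka 2.6, Akka HTTP 10.2, and Alpakka Kafka 2.x are assumed here; broker, topic, and endpoint are made up, and for back-pressured connection pooling you would swap the simple singleRequest call for the flow-based client variant linked above):

    import akka.actor.ActorSystem;
    import akka.http.javadsl.Http;
    import akka.http.javadsl.model.ContentTypes;
    import akka.http.javadsl.model.HttpRequest;
    import akka.kafka.ConsumerSettings;
    import akka.kafka.Subscriptions;
    import akka.kafka.javadsl.Consumer;
    import akka.stream.javadsl.Sink;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class KafkaToRestAkka {
        public static void main(String[] args) {
            ActorSystem system = ActorSystem.create("kafka-to-rest");

            ConsumerSettings<String, String> settings =
                ConsumerSettings.create(system, new StringDeserializer(), new StringDeserializer())
                    .withBootstrapServers("localhost:9092")
                    .withGroupId("rest-sink");

            // Kafka source -> one POST per record via the akka-http client
            Consumer.plainSource(settings, Subscriptions.topics("events"))
                .mapAsync(4, rec ->
                    Http.get(system).singleRequest(
                        HttpRequest.POST("http://rest-service/events")
                            .withEntity(ContentTypes.APPLICATION_JSON, rec.value())))
                .runWith(Sink.ignore(), system);
        }
    }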
Related
I'm currently planning the architecture for an application that reads from a Kafka topic and, after some conversion, puts the data into RabbitMQ.
I'm kind of new to Kafka Streams and it looks like a good choice for my task. But the problem is that the Kafka server is hosted by another vendor, so I can't even install the Kafka Connect RabbitMQ sink plugin.
Is it possible to write a Kafka Streams application that doesn't have any sink points, but just processes the input stream? I could just push to RabbitMQ in a foreach operation, but I'm not sure whether the stream will even work without a sink point.
foreach is itself a sink action, so to answer your question directly: no.
Note, however, that Kafka Streams is intended for Kafka-to-Kafka communication only.
Kafka Connect can be installed and run anywhere, if that is what you wanted to use... You can also use other Apache tools like Camel, Spark, NiFi, Flink, etc. to write to RabbitMQ after consuming from Kafka, or write an application in a language of your choice. For example, the Spring Integration and Spring Cloud Stream frameworks allow a single contract across many communication channels.
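To make the first point concrete, here is a hypothetical sketch of a topology whose only terminal operation is foreach, publishing to RabbitMQ with the plain Java client (hosts, topic, and queue names are made up; note that these side effects get none of Kafka Streams' delivery guarantees):

    import java.nio.charset.StandardCharsets;
    import java.util.Properties;

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;

    public class KafkaToRabbit {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "kafka-to-rabbit");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");
            Connection connection = factory.newConnection();
            // Assumes the default single stream thread; Channels are not thread-safe
            Channel channel = connection.createChannel();
            channel.queueDeclare("target-queue", true, false, false, null);

            StreamsBuilder builder = new StreamsBuilder();
            builder.<String, String>stream("input-topic")
                    .mapValues(value -> value.toUpperCase())   // stand-in for your conversion
                    .foreach((key, value) -> {
                        try {
                            channel.basicPublish("", "target-queue", null,
                                    value.getBytes(StandardCharsets.UTF_8));
                        } catch (Exception e) {
                            throw new RuntimeException(e);   // fails the stream thread
                        }
                    });

            new KafkaStreams(builder.build(), props).start();
        }
    }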
Kafka Streams is good, but I have to do every bit of configuration very manually. Kafka Connect, by contrast, provides a REST API, which is very useful for handling configuration as well as tasks, workers, etc.
Thus, I'm thinking of using Kafka Connect for my simple data-transforming service. Basically, the service will read the data from a topic and send the transformed data to another topic. To do that, I would have to make a custom sink connector that sends the transformed data to a Kafka topic; however, those interface functions don't seem to be available in SinkConnector. If I could do that, it would be great, since I could manage tasks and workers via the REST API and run the tasks in distributed mode (multiple instances).
There are two options in my mind:
Figuring out how to send the message from SinkConnector to a Kafka topic
Figuring out how to build a REST interface API like Kafka Connect which wraps up the Kafka Streams app
Any ideas?
Figuring out how to send the message from SinkConnector to a Kafka topic
A sink connector consumes data/messages from a Kafka topic. If you want to send data to a Kafka topic, you are likely talking about a source connector.
Figuring out how to build a REST interface API like Kafka Connect which wraps up the Kafka Streams app.
Using the kafka-connect-archtype template, you can create your own Kafka connector (source or sink). Since you want to build a stream-processing pipeline after the connector, you are mostly talking about a connector to another stream-processing engine that is not Kafka Streams. There are connectors for Kafka <-> Spark, Kafka <-> Flink, ...
But you can build your own using the kafka-connect-archtype template if you want. Use the MySourceTask List<SourceRecord> poll() method or the MySinkTask put(Collection<SinkRecord> records) method to process the records as a stream; these extend org.apache.kafka.connect.source.SourceTask and org.apache.kafka.connect.sink.SinkTask from Kafka Connect.
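As a hypothetical sketch of that sink-side route (all class, config, and topic names here are invented), a SinkTask can hold its own plain producer and re-publish transformed records from put(), though Connect itself will not track delivery of those re-published records:

    import java.util.Collection;
    import java.util.Map;
    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;
    import org.apache.kafka.connect.sink.SinkRecord;
    import org.apache.kafka.connect.sink.SinkTask;

    public class TransformingSinkTask extends SinkTask {
        private KafkaProducer<String, String> producer;

        @Override
        public void start(Map<String, String> config) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, config.get("target.bootstrap.servers"));
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            producer = new KafkaProducer<>(props);
        }

        @Override
        public void put(Collection<SinkRecord> records) {
            for (SinkRecord record : records) {
                // Stand-in transform; replace with your conversion logic
                String transformed = ((String) record.value()).toUpperCase();
                producer.send(new ProducerRecord<>("output-topic", (String) record.key(), transformed));
            }
        }

        @Override
        public void stop() {
            producer.close();
        }

        @Override
        public String version() {
            return "0.0.1";
        }
    }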
a REST interface API like Kafka Connect which wraps up the Kafka Streams app
This is exactly what ksqlDB allows you to do.
Beyond creating streams and tables with SQL queries, it offers a REST API and can interact with Connect endpoints (or embed a Connect worker itself).
https://docs.ksqldb.io/en/latest/concepts/connectors/
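For illustration, a sketch of calling that REST API from Java (the server address and the stream/topic names are assumptions; a CREATE SOURCE CONNECTOR statement could be submitted the same way):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class KsqlDbClientSketch {
        public static void main(String[] args) throws Exception {
            // A ksqlDB statement wrapped in the /ksql endpoint's JSON envelope
            String body = "{\"ksql\":\"CREATE STREAM clicks (userid VARCHAR) "
                    + "WITH (KAFKA_TOPIC='clicks', VALUE_FORMAT='JSON');\","
                    + "\"streamsProperties\":{}}";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8088/ksql"))
                    .header("Content-Type", "application/vnd.ksql.v1+json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        }
    }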
Currently, I am using Confluent REST Proxy 5.5.1 for collecting data in production. Duplicate events can come in. I found a solution for de-duplication using the Kafka Streams API. Is it possible to get an idempotent-producer guarantee in REST Proxy?
You can modify the REST Proxy's producer client by prefixing producer configs with producer., e.g.:
producer.enable.idempotence=true
However, REST Proxy 5.5.1 logs at startup that this setting is ignored, so idempotent producers are not supported there:
Idempotent producers are not supported in Kafka REST. Ignoring 'enable.idempotence'. (io.confluent.kafkarest.KafkaRestConfig:880)
High-level design of the application:
An upstream system sends a stream of data, which is received by a Java application. With Kafka as the data store, Logstash will publish the stored data to an Elasticsearch index, and all applications will use Elasticsearch queries to get the data.
Problem: to publish data from the Java application to Kafka, which API should be used, the Kafka JMS client or the Java Kafka producer/consumer API?
As per the kafka-jms-client documentation, if you are interested in writing new Java applications then you are encouraged to use the Java Kafka producer/consumer APIs, as they provide advanced features not available when using the kafka-jms-client: https://docs.confluent.io/current/clients/kafka-jms-client/docs/index.html
Also, as per the Kafka documentation, Kafka is not a typical messaging broker and not all JMS concepts map 1:1 to Kafka.
Is there any benefit to using the JMS API with Kafka, given that Kafka is not a typical messaging broker (and the application will still be tightly coupled to Kafka), and that not all JMS concepts can be mapped to Kafka?
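For reference, here is what publishing with the native producer API would look like in our application (the broker address and topic name are placeholders; producer-level settings like the idempotence flag below are presumably among the "advanced features" the documentation quote refers to):

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class UpstreamDataProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // Producer-level controls are exposed directly, e.g. idempotence:
            props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("upstream-data", "key-1", "payload"));
            }
        }
    }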
I have a requirement where messages arrive in JSON format from ActiveMQ, and I have to expose an endpoint where I can receive them. I am using Kafka, but I don't know whether I can use a Kafka REST API to receive JSON messages and store them in a particular topic on the Kafka broker.
There is no REST API in the core Apache Kafka distribution, but there is a very good open-source REST Proxy for Kafka available from Confluent. It enables both publishing and subscribing via REST APIs. The code and docs are on GitHub at https://github.com/confluentinc/kafka-rest, or you can download the entire Confluent open source platform, which includes Apache Kafka, the REST Proxy, the Schema Registry, and a few other useful enhancements, at http://www.confluent.io/download/
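As a sketch of publishing through that REST Proxy from Java (the proxy address and topic name are assumptions; the payload uses the proxy's v2 embedded-JSON format):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class RestProxyProduceSketch {
        public static void main(String[] args) throws Exception {
            // One JSON record wrapped in the REST Proxy's v2 envelope
            String body = "{\"records\":[{\"value\":{\"orderId\":42,\"status\":\"NEW\"}}]}";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8082/topics/activemq-events"))
                    .header("Content-Type", "application/vnd.kafka.json.v2+json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }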