Pushing data into Kafka using a REST API and Kafka Connect

Is there a way for me to expose a REST API that systems can push data to, which in turn creates topics and updates the Avro Schema Registry, without manually writing a custom connector?

Sure. You can write an HTTP server in any language you're comfortable with that wraps a KafkaProducer (with the Avro serializer) and an AdminClient to create topics.
The Confluent REST Proxy already does this, partially.
Kafka Connect doesn't accept records over HTTP, anyway, so a custom connector wouldn't do any good.
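As a rough illustration of the REST Proxy route, here is a minimal Python sketch that builds (but does not send) a v2 produce request carrying Avro records. The proxy address, topic name, and schema are placeholders I made up. With the Avro format the proxy registers a new schema in Schema Registry on your behalf; whether the topic gets created on first write depends on the broker's auto.create.topics.enable setting.

```python
import json
import urllib.request

# Hypothetical values: adjust to your environment.
REST_PROXY_URL = "http://localhost:8082"  # assumed REST Proxy address
TOPIC = "orders"                          # assumed topic name

# Illustrative Avro schema, not taken from the question.
VALUE_SCHEMA = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "item", "type": "string"},
    ],
}


def build_avro_produce_request(base_url, topic, schema, values):
    """Build (but do not send) a REST Proxy v2 produce request for Avro records."""
    body = {
        # The v2 API expects the schema as a JSON-encoded string.
        "value_schema": json.dumps(schema),
        "records": [{"value": v} for v in values],
    }
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/vnd.kafka.avro.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
        method="POST",
    )


request = build_avro_produce_request(
    REST_PROXY_URL, TOPIC, VALUE_SCHEMA, [{"id": 1, "item": "book"}]
)
# To send it against a running proxy:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Any system that can issue an HTTP POST can then push data this way, without a custom connector.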

Related

Kafka Connect or Kafka Streams?

I have a requirement to read messages from a topic, enrich each message based on provided configuration (the data required for enrichment is sourced from external systems), and publish the enriched message to an output topic. Messages on both the source and output topics should be in Avro format.
Is this a good use case for a custom Kafka connector, or should I use Kafka Streams?
Why am I considering Kafka Connect?
Lightweight in terms of code and deployment
Configuration driven
Connection and error handling
Scalability
I like the plugin-based approach in Connect. If there is a new type of message that needs to be handled, I just deploy a new connector without having to deploy a full-scale Java app.
Why am I not sure this is a good candidate for Kafka Connect?
Calls to external system
Can Kafka be both source and sink for a connector?
Can we use Avro schemas in connectors?
Performance under load
Cannot do stateful processing (currently there is no requirement)
I have experience with Kafka Streams but not with Connect
Use both?
Use Kafka Connect to source external database into a topic.
Use Kafka Streams to build that topic into a stream/table that can then be manipulated.
Use Kafka Connect to sink back into a database, or other system other than Kafka, as necessary.
Kafka Streams can also be config driven, can use plugins (i.e. reflection), is just as scalable, and has no different connection modes (to Kafka). Performance should be similar. Error handling is really the only complex part. ksqlDB is entirely "config driven" via SQL statements, and can connect to external Connect clusters or embed its own.
Avro works for both, yes.
Some connectors are temporarily stateful, as they build in-memory batches; the S3 and JDBC sink connectors are examples.
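To make the "source an external database into a topic" step concrete, here is a hedged Python sketch that builds a request for the Connect REST API to create a JDBC source connector writing Avro. It assumes the Confluent JDBC connector and Avro converter are installed on the worker; the connector name, database URL, table, and host addresses are all placeholders.

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083"  # assumed Connect worker REST address

# Hypothetical connector definition; every value here is a placeholder.
connector = {
    "name": "customers-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://db.example.com:5432/crm",
        "table.whitelist": "customers",
        "mode": "incrementing",
        "incrementing.column.name": "id",
        "topic.prefix": "crm-",
        # Write record values as Avro, with schemas kept in Schema Registry.
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "http://localhost:8081",
    },
}


def build_create_connector_request(base_url, definition):
    """Build (but do not send) a POST /connectors request."""
    return urllib.request.Request(
        url=f"{base_url}/connectors",
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_connector_request(CONNECT_URL, connector)
# To submit it to a running worker:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

With this configuration the rows would land in a topic named crm-customers, which a Kafka Streams app could then read and enrich.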

Build a data transformation service using Kafka Connect

Kafka Streams is good, but I have to do all of the configuration manually. Kafka Connect, by contrast, provides a REST API that is very useful for managing configuration, as well as tasks, workers, and so on.
Thus, I'm thinking of using Kafka Connect for my simple data transformation service. Basically, the service will read data from a topic and send the transformed data to another topic. To do that, I have to write a custom sink connector that sends the transformed data to the Kafka topic; however, it seems the interface functions for that aren't available in SinkConnector. If I can do it, that would be great, since I could manage tasks and workers via the REST API and run the tasks in distributed mode (multiple instances).
There are 2 options in my mind:
Figuring out how to send the message from SinkConnector to a kafka topic
Figuring out how to build a REST interface API like Kafka Connect which wraps up the Kafka Streams app
Any ideas?
"Figuring out how to send the message from SinkConnector to a kafka topic"
A sink connector consumes data/messages from a Kafka topic. If you want to send data to a Kafka topic, you are likely talking about a source connector.
"Figuring out how to build a REST interface API like Kafka Connect which wraps up the Kafka Streams app."
Using the kafka-connect-archtype you get a template for creating your own Kafka connector (source or sink). In your case, where you want to build a stream processing pipeline after the connector, you are mostly talking about a connector for another stream processing engine, not Kafka Streams. There are connectors for Kafka <-> Spark, Kafka <-> Flink, ...
But you can build your own using the kafka-connect-archtype template if you want. Use the MySourceTask List<SourceRecord> poll() method or the MySinkTask put(Collection<SinkRecord> records) method to process the records as a stream. These classes extend org.apache.kafka.connect.source.SourceTask and org.apache.kafka.connect.sink.SinkTask, respectively, from Kafka Connect.
"a REST interface API like Kafka Connect which wraps up the Kafka Streams app"
This is exactly what ksqlDB allows you to do.
Besides creating streams and tables with SQL queries, it offers a REST API and can interact with Connect endpoints (or embed a Connect worker itself):
https://docs.ksqldb.io/en/latest/concepts/connectors/
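For a topic-to-topic transformation, a ksqlDB statement can be submitted over its REST API. The following Python sketch builds such a request for the /ksql endpoint. The server address, the stream name events, its columns, and the output topic are all hypothetical, and the sketch assumes events has already been declared in ksqlDB.

```python
import json
import urllib.request

KSQLDB_URL = "http://localhost:8088"  # assumed ksqlDB server address

# Hypothetical statement: assumes a stream named `events` with columns
# `id` and `payload` already exists in ksqlDB.
STATEMENT = (
    "CREATE STREAM events_upper "
    "WITH (KAFKA_TOPIC='events-upper', VALUE_FORMAT='AVRO') AS "
    "SELECT id, UCASE(payload) AS payload FROM events EMIT CHANGES;"
)


def build_ksql_request(base_url, statement):
    """Build (but do not send) a POST /ksql request that runs one statement."""
    body = {"ksql": statement, "streamsProperties": {}}
    return urllib.request.Request(
        url=f"{base_url}/ksql",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/vnd.ksql.v1+json",
            "Accept": "application/vnd.ksql.v1+json",
        },
        method="POST",
    )


request = build_ksql_request(KSQLDB_URL, STATEMENT)
# To run it against a live server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The resulting persistent query keeps reading the source topic and writing transformed records to the output topic, which covers the simple read-transform-write case without a custom connector.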

Publish message to kafka via http

I'm new to Kafka and I'm trying to publish data from an external application via HTTP, but I cannot find a way to do this.
I already created a topic in Kafka and tested it by producing and consuming messages, but I don't know how to insert/publish a message via HTTP. I tried to invoke the following URL to retrieve the topics, but it does not return any data: http://servername:2181/topics/
I'm using Cloudera 5.12.1.
You can access your topics, if they have already been created, using the client APIs, which is the easy way (see the client list).
Alternatively, see the Connect configuration for managing connectors over REST (the rest.host.name and rest.port parameters), but that covers only connectors.
To consume or produce messages in a topic, use a middleware. It is more feasible.
Check out the open source Kafka REST Proxy from Confluent. It does exactly what you want.
You can get it standalone, or as part of Confluent Platform.
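Note that port 2181 in the question is ZooKeeper's client port, which does not speak HTTP; the REST Proxy listens on its own port (8082 by default). A minimal Python sketch of the two requests involved, assuming a proxy running on servername and a hypothetical topic called test:

```python
import json
import urllib.request

# Assumed address: REST Proxy default port, not ZooKeeper's 2181.
REST_PROXY_URL = "http://servername:8082"


def list_topics_request(base_url):
    """Build a GET /topics request, which returns the topic names as JSON."""
    return urllib.request.Request(
        url=f"{base_url}/topics",
        headers={"Accept": "application/vnd.kafka.v2+json"},
        method="GET",
    )


def produce_json_request(base_url, topic, messages):
    """Build a POST /topics/{topic} request carrying JSON-encoded values."""
    body = {"records": [{"value": m} for m in messages]}
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


topics = list_topics_request(REST_PROXY_URL)
produce = produce_json_request(REST_PROXY_URL, "test", [{"greeting": "hello"}])
# Against a running proxy:
# with urllib.request.urlopen(produce) as response:
#     print(json.load(response))
```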

REST endpoint as Kafka Sink

I am new to the Kafka world. We are planning to set up Kafka to fulfill our data streaming needs. The sink, in our case, is a REST endpoint. What connectors are available to support Kafka => REST endpoint connectivity? This is similar to how AWS simple queues or topics work.
AFAIK there is no certified HTTP sink for Apache Kafka. Why not simply create a Kafka consumer and, for every message (or message batch), make a REST call to your service?
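The plain-consumer idea can be sketched in Python roughly as follows. The forwarding loop takes any iterable of message values, so the Kafka client itself is left out; in practice the values would come from whichever consumer library you use, which is an assumption on my part. The function names and the endpoint URL are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def post_json(url: str, payload: bytes) -> int:
    """POST one payload to the REST endpoint and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; report the code instead of crashing.
        return err.code


def forward(
    messages: Iterable[bytes],
    url: str,
    send: Callable[[str, bytes], int] = post_json,
) -> int:
    """Send each message value to `url`; return how many got a 2xx reply."""
    delivered = 0
    for value in messages:
        status = send(url, value)
        if 200 <= status < 300:
            delivered += 1
    return delivered


# Example with an in-memory batch standing in for consumed Kafka records:
# forward([b'{"id": 1}', b'{"id": 2}'], "http://service.example.com/events")
```

A real deployment would also need to decide what to do with failed deliveries (retry, dead-letter topic) and to commit offsets only after the REST call succeeds.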
Nothing out of the box that I'm aware of but akka-streams feels well suited for this use-case.
The source would be a Kafka message stream created using akka-stream-kafka (aka reactive-kafka) and the sink the akka-http client (flow-based variant):
http://doc.akka.io/docs/akka-http/10.0.6/scala/http/client-side/request-level.html#flow-based-variant
Integration patterns for akka-streams are (starting to be) documented under the name alpakka:
http://developer.lightbend.com/docs/alpakka/current/

Consume JSON messages using Kafka REST API

I have a requirement where messages arrive in JSON format from ActiveMQ, and I have to expose an endpoint where I can receive them. I am using Kafka, but I don't know whether I can use a Kafka REST API to receive JSON messages and store them in a particular topic on the Kafka broker.
There is no REST API in the core Apache Kafka distribution, but there is a very good open source REST Proxy for Kafka available from Confluent. It enables both publish and subscribe via REST APIs. The code and docs are on GitHub at https://github.com/confluentinc/kafka-rest, or you can download the entire Confluent open source platform, which includes Apache Kafka, REST Proxy, Schema Registry and a few other useful enhancements, at http://www.confluent.io/download/
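For the subscribe side, the REST Proxy v2 API uses a three-step flow: create a consumer instance, subscribe it to topics, then poll for records. A Python sketch that builds those three requests, with a made-up proxy address, group, instance name, and topic:

```python
import json
import urllib.request

REST_PROXY_URL = "http://localhost:8082"  # assumed REST Proxy address
V2 = "application/vnd.kafka.v2+json"


def consumer_requests(base_url, group, instance, topic):
    """Build the create, subscribe, and fetch requests for a JSON consumer."""
    instance_url = f"{base_url}/consumers/{group}/instances/{instance}"
    create = urllib.request.Request(
        url=f"{base_url}/consumers/{group}",
        data=json.dumps(
            {"name": instance, "format": "json", "auto.offset.reset": "earliest"}
        ).encode("utf-8"),
        headers={"Content-Type": V2},
        method="POST",
    )
    subscribe = urllib.request.Request(
        url=f"{instance_url}/subscription",
        data=json.dumps({"topics": [topic]}).encode("utf-8"),
        headers={"Content-Type": V2},
        method="POST",
    )
    fetch = urllib.request.Request(
        url=f"{instance_url}/records",
        headers={"Accept": "application/vnd.kafka.json.v2+json"},
        method="GET",
    )
    return create, subscribe, fetch


# Hypothetical group, instance, and topic names.
create, subscribe, fetch = consumer_requests(
    REST_PROXY_URL, "activemq-readers", "reader-1", "activemq-events"
)
```

In a real client, the response to the create call includes a base URI for the instance, and it is safer to use that for the follow-up requests than to rebuild the URL as this sketch does.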