How to call a Spring Boot service endpoint from Apache Kafka and pass all the messages from a topic? - apache-kafka

I am new to Kafka and have a requirement to fetch all real-time data from a database and pass it to a Spring Boot microservice for processing. From my analysis, Apache Kafka with a Kafka Connect source connector can pull real-time data from the database into Kafka topics.
Can someone tell me whether there is a way to pick this data up from the Kafka topics and hand it to the microservice by triggering a REST call from the Kafka side?
The idea is that whenever a new entry is added to the database table, Kafka pulls that data via Kafka Connect and then somehow calls the microservice to share the new entry. Is this possible with Kafka?
Database --> kafka connect --> kafka (Topic) ---> some service that call microservice ---> microservice

from kafka topics and share to microservice
Kafka doesn't push. You would add a consumer to your service that pulls from Kafka, perhaps using spring-kafka or spring-cloud-stream.
Alternatively, a Kafka Connect sink with an HTTP connector could POST each record to the service, but you would then need to handle failed requests yourself so that the offsets of undelivered messages are not committed.
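The "don't commit offsets for failed requests" concern can be sketched without any Kafka library. The following is only an illustration under stated assumptions: `RestForwarder`, `forwardBatch` and `httpSender` are hypothetical names, records are modeled as plain strings from a single partition, and any 2xx response is treated as success. It uses only the JDK (Java 11+ for `java.net.http`).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of Kafka, Kafka Connect, or Spring.
class RestForwarder {

    // Sends each value in order and stops at the first failure.
    // Returns the offset that is safe to commit: the offset of the first
    // record that was NOT delivered (Kafka commits the next offset to read).
    static long forwardBatch(List<String> values, long firstOffset, Predicate<String> send) {
        long next = firstOffset;
        for (String value : values) {
            if (!send.test(value)) {
                break; // leave this record and the rest uncommitted so they are read again
            }
            next++;
        }
        return next;
    }

    // One possible sender: POST the value as JSON and treat any 2xx status as success.
    static Predicate<String> httpSender(HttpClient client, URI endpoint) {
        return value -> {
            try {
                HttpRequest request = HttpRequest.newBuilder(endpoint)
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(value))
                        .build();
                int status = client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
                return status >= 200 && status < 300;
            } catch (Exception e) {
                if (e instanceof InterruptedException) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                }
                return false; // network errors count as a failed delivery
            }
        };
    }
}
```

In a real consumer loop, the returned value would be the offset to commit for that partition and the position to seek back to before the next poll, which gives at-least-once delivery: the microservice may see a record more than once and should tolerate duplicates.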

Related

Kafka Connector To read from a Topic and write to a topic

I want to build a Kafka connector that reads from a Kafka topic, calls a gRPC service to get some data, and writes the combined data to another Kafka topic.
I have written a Kafka sink connector that reads from a topic and calls the gRPC service, but I am not sure how to redirect this data into a Kafka topic.
Kafka Streams can read from topics, call external services as necessary, then forward this data to a new topic in the same cluster.
MirrorMaker2 can be used between different clusters, but using Connect transforms is generally not recommended with external services.
Or you could make your gRPC service into a Kafka producer.

Can kafka publish messages to AWS lambda

I have to publish messages from a Kafka topic to Lambda, process them, and store them in a database using a Spring Boot application. I did some research and found this for consuming messages from Kafka:
public Function<KStream<String, String>, KStream<String, String>> process(){}
However, I'm not sure whether this can only publish the consumed messages to another Kafka topic or whether it can also act as an event source for Lambda. I need some guidance on consuming Kafka messages and converting them into a Lambda event source.
Brokers do not push. Consumers always poll.
The code shown is for the Kafka Streams API, which primarily writes to new Kafka topics. While you could fire HTTP requests from it to start a Lambda, that's not recommended.
Alternatively, Kafka is already supported as an event source. You don't need to write any consumer code.
https://aws.amazon.com/about-aws/whats-new/2020/12/aws-lambda-now-supports-self-managed-apache-kafka-as-an-event-source/
This works with both MSK and self-managed Kafka.
process them and store in a database
Your lambda could process the data and send to a new Kafka topic using a producer. You can then use MSK Connect or run your own Kafka Connect cluster elsewhere to dump records into a database. No Spring/Java code would be necessary.

Spring Cloud Data Flow Kafka Source

I am new to Spring Cloud Data Flow and need to listen for messages on a topic from an external Kafka cluster. This external Kafka topic in Confluent Cloud would be my Source, which I need to pass on to my Sink application.
I am also using Kafka as my underlying message broker, which is a separate Kafka instance deployed on Kubernetes. I'm just not sure what the best approach is to connect to this external Kafka instance. Is there an existing Kafka Source app that I can use, or do I need to create my own Source application to connect to it? Or is it just some kind of configuration that I need to set up to get connected?
Any examples would be helpful. Thanks in advance!

Build a data transformation service using Kafka Connect

Kafka Streams is good, but I have to do all the configuration manually. Kafka Connect, on the other hand, provides a REST API, which is very useful for handling the configuration as well as tasks, workers, etc.
So I'm thinking of using Kafka Connect for my simple data transformation service. Basically, the service will read data from a topic and send the transformed data to another topic. To do that, I would have to write a custom sink connector that sends the transformed data to the Kafka topic; however, it seems the interface functions for that aren't available in SinkConnector. If I can do it, that would be great, since I could manage tasks and workers via the REST API and run the tasks in distributed mode (multiple instances).
There are 2 options in my mind:
1. Figuring out how to send the message from SinkConnector to a kafka topic
2. Figuring out how to build a REST interface API like Kafka Connect which wraps up the Kafka Streams app
Any ideas?
Figuring out how to send the message from SinkConnector to a kafka topic
A sink connector consumes data/messages from a Kafka topic. If you want to send data to a Kafka topic you are likely talking about a source connector.
Figuring out how to build a REST interface API like Kafka Connect which wraps up the Kafka Streams app.
Using the kafka-connect-archtype you can get a template for creating your own Kafka connector (source or sink). In your case, where you want to build a stream processing pipeline after the connector, you are mostly talking about a connector for another stream processing engine that is not Kafka Streams. There are connectors for Kafka <-> Spark, Kafka <-> Flink, ...
But you can build your own using the kafka-connect-archtype template if you want. Use the MySourceTask List<SourceRecord> poll() method or the MySinkTask put(Collection<SinkRecord> records) method to process the records as a stream. They extend org.apache.kafka.connect.[source.SourceTask|sink.SinkTask] from Kafka Connect.
a REST interface API like Kafka Connect which wraps up the Kafka Streams app
This is exactly what ksqlDB allows you to do.
Besides creating streams and tables with SQL queries, it offers a REST API and can interact with Connect endpoints (or embed a Connect worker itself).
https://docs.ksqldb.io/en/latest/concepts/connectors/

How to enable Kafka sink connector to insert data from topics to tables as and when sink is up

I have developed a Kafka sink connector (using confluent-oss-3.2.0-2.11, Connect framework) for my data store (Amppol ADS), which stores data from Kafka topics in the corresponding tables in my store.
Everything works as expected as long as the Kafka servers and the ADS servers are up and running.
I need help or suggestions for a specific use case in which events are being ingested into Kafka topics while the underlying sink component (ADS) is down.
The expectation is that whenever the sink servers come back up, the records that were ingested into the Kafka topics earlier should be inserted into the tables.
Kindly advise how to handle such a case.
Is there any support for this in the Connect framework? At least some references would be a great help.
Sink connector offsets are maintained in the __consumer_offsets topic in Kafka, under a consumer group named after your connector (connect-<connector name> by default). When the sink connector restarts, it resumes reading from the last offset it committed to __consumer_offsets.
So you don't have to worry about managing offsets. It's all done by the workers in the Connect framework. In your scenario, just restart your sink connector. As long as the messages pushed by your source connector are still available in Kafka (that is, within the topic's retention), the sink connector can be started or restarted at any time.
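The resume-from-committed-offset behavior described above can be pictured with a toy in-memory model. This is only a sketch to show the idea: `ResumeDemo` and `runSink` are hypothetical names, none of this is Kafka Connect API, and a single partition with no retention limit is assumed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model; none of these classes are Kafka or Kafka Connect APIs.
class ResumeDemo {
    // Stands in for the topic: records stay here whether or not the sink is up.
    final List<String> topic = new ArrayList<>();
    // Stands in for __consumer_offsets: consumer group id -> next offset to read.
    final Map<String, Integer> committed = new HashMap<>();

    // One run of the sink: read from the committed offset to the end of the log,
    // write the records to the table, then commit the new position.
    List<String> runSink(String groupId, List<String> table) {
        int from = committed.getOrDefault(groupId, 0);
        List<String> delivered = new ArrayList<>(topic.subList(from, topic.size()));
        table.addAll(delivered);
        committed.put(groupId, topic.size());
        return delivered;
    }
}
```

Records appended to `topic` between two `runSink` calls (the period in which the sink is "down") are picked up by the second call, because the committed position did not move while nothing was reading.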