Spring Cloud Data Flow on a different message broker than Redis?

From what I can see in the documentation, a Redis instance is always needed for Spring Cloud Data Flow to work.
Is it also possible to work with a different message broker, e.g. RabbitMQ?
How would one specify a different message broker during startup?

With the recent 1.0.0.M3 release, when using the Local server, we load Kafka-based OOTB applications by default.
If you'd like to switch from Kafka to RabbitMQ, you can do so by passing --binder=rabbit as a command-line argument when starting the spring-cloud-dataflow-server-local JAR (and be sure to start the RabbitMQ server first).
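For example (the exact JAR file name/version here is illustrative):

java -jar spring-cloud-dataflow-server-local-1.0.0.M3.jar --binder=rabbit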

Related

Kafka - increase partition count of existing topic through Confluent REST API

I need to add partitions to an existing Kafka topic.
I'm aware that it is possible to use the /bin/kafka-topics.sh script to achieve this, but I would prefer to do it through the Confluent REST API.
As far as I can see there is no documented endpoint in the API reference, but I wonder if someone else here has been able to make this work.
Edit: As it does seem to be impossible to use the REST API here, I wonder what the best practice is for adding partitions to an existing topic in a containerized setup, e.g. if there is a custom partitioning scheme that maps customer IDs to specific partitions. In this case the app container would need to adjust the partition count of the Kafka container.
The Confluent REST Proxy has no such endpoint for topic update administration.
You would need to use the shell script or the corresponding AdminClient class that the shell script uses.
This is the solution I ended up with:
Create a small HTTP service that is deployed within the Kafka Docker image.
The HTTP service accepts requests to increase the partition count and delegates them to the Kafka admin scripts (bin/kafka-topics.sh).
Something similar could have been achieved by using the AdminClient NewPartitions API in the Java Kafka library. That solution has the advantage that the Kafka Docker image does not have to be changed, because the AdminClient can connect over the network from another container.
For a production setup the AdminClient is preferable; I decided on the integrated script approach because of the language I chose (Rust).
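For reference, a minimal sketch of the AdminClient approach (bootstrap address, topic name, and target count are assumptions):

import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

public class IncreasePartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // increaseTo takes the new total partition count, not a delta
            admin.createPartitions(Map.of("my-topic", NewPartitions.increaseTo(6)))
                 .all()
                 .get();
        }
    }
}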

Why are there kafka and broker nodes in the SCDF trace shown in Zipkin?

I have a Spring Cloud Data Flow environment set up in Kubernetes, along with a Zipkin deployment. But when I look at the Dependencies view in Zipkin, I see that in addition to the applications that exist in the stream, there are also broker and kafka nodes.
Can anyone tell me why this is? And is there any way I can get broker and kafka to not show up?
That's because one of the brokers got resolved as being Kafka (for example via special message headers) and the other didn't. It's either a bug in Sleuth or you're using an uninstrumented library.

Ingest Streaming Data to Kafka via HTTP

I am very new to Kafka and streaming data in general. What I am trying to do is ingest data sent via HTTP into Kafka. My research has brought me to the Confluent REST Proxy, but I can't get it to work.
What I currently have is Kafka running with a single node and a single broker, plus Kafka Manager, in Docker containers.
Unfortunately I can't run the full Confluent Platform with Docker since I don't have enough memory available on my machine.
In essence my question is: how do I set up a development environment where data is ingested into Kafka over HTTP?
Any help is highly appreciated!
You don't need the "full Confluent Platform" (KSQL, Control Center, and so on included).
Zookeeper, Kafka, the REST Proxy, and optionally the Schema Registry should take at most around 4 GB of RAM in total. If you don't even have that, then you'll need to buy more RAM.
Note that Zookeeper and Kafka do not need to be running on the same machines as the Schema Registry or REST Proxy, so if you have multiple machines, you can save some resources that way as well.
To run one Kafka broker, ZooKeeper, and the Schema Registry, 1 GB is usually enough (in dev).
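Once the REST Proxy is up, producing over HTTP is a single request; a hedged example (host, port, and topic name are assumptions):

curl -X POST http://localhost:8082/topics/ingest-topic \
  -H "Content-Type: application/vnd.kafka.json.v2+json" \
  -d '{"records":[{"value":{"customer_id":42}}]}'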
If for some reason you do not want to use the Confluent REST Proxy, you can write your own. It's quite straightforward: on request, parse the incoming JSON, validate the data, construct your message (in Avro?), and produce it to Kafka.
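A minimal sketch of such a bridge using the JDK's built-in HTTP server and the Kafka Java client (port, topic, and broker address are assumptions; validation and Avro serialization are left out):

import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class HttpToKafkaBridge {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/ingest", exchange -> {
            // Forward the raw request body to Kafka; real code would
            // validate/parse the JSON (and perhaps serialize to Avro) first.
            String payload = new String(
                    exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            producer.send(new ProducerRecord<>("ingest-topic", payload));
            exchange.sendResponseHeaders(204, -1); // 204 No Content, empty body
            exchange.close();
        });
        server.start(); // serves until the process is killed
    }
}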
In this article, you'll find some configuration for keeping Kafka and ZooKeeper heap memory small: https://medium.com/@saabeilin/kafka-hands-on-part-i-development-environment-fc1b70955152
Here you can read how to produce/consume messages with Python:
https://medium.com/@saabeilin/kafka-hands-on-part-ii-producing-and-consuming-messages-in-python-44d5416f582e
Hope these help!

Listen to a topic continuously, fetch data, perform some basic cleansing

I need to build a Java-based Kafka streaming application that will listen to a topic X continuously, fetch data, perform some basic cleansing, and write to an Oracle database. The Kafka cluster is outside my domain and I have no ability to deploy any code or configuration to it.
What is the best way to design such a solution? I came across Kafka Streams but was confused as to whether it can be used for 'Topic > Process > Topic' scenarios.
I came across Kafka Streams but was confused as to whether it can be used for 'Topic > Process > Topic' scenarios?
Absolutely.
For example, excluding the "process" step, it's two lines outside of the configuration setup.
import org.apache.kafka.streams.StreamsBuilder;

final StreamsBuilder builder = new StreamsBuilder();
builder.stream("streams-plaintext-input").to("streams-pipe-output");
This code is straight from the documentation.
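Fleshed out into a runnable 'Topic > Process > Topic' sketch (topic names, broker address, and the trim-as-cleansing step are assumptions, not from the original question):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class CleansingPipe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cleansing-pipe");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("topic-x")    // Topic: read the source
               .mapValues(value -> value.trim())     // Process: a trivial cleansing step
               .to("topic-x-clean");                 // Topic: write the result

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}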
If you want to write to any database, you should first check if there is a Kafka Connect plugin to do that for you. Kafka Streams shouldn't really be used to read/write from/to external systems outside of Kafka, as it is latency-sensitive.
In your case, the JDBC Sink Connector would work well.
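A sketch of what such a connector configuration might look like, posted to the Kafka Connect REST API (connection URL, credentials, and topic name are all assumptions):

{
  "name": "oracle-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics": "topic-x-clean",
    "connection.url": "jdbc:oracle:thin:@//dbhost:1521/ORCL",
    "connection.user": "dbuser",
    "connection.password": "dbpassword",
    "auto.create": "true",
    "insert.mode": "insert"
  }
}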
The Kafka cluster is outside my domain and I have no ability to deploy any code or configuration to it.
Using either solution above, you don't need to, but you will need some machine with Java installed to run a continuous Kafka Streams application and/or a Kafka Connect worker.

How to start a Kafka server programmatically

We are trying to start the Kafka server (ZooKeeper and broker) programmatically in our application. Is there any API / library available for this?
Yes, you can use embedded Kafka, which will run ZooKeeper and the Kafka server for you. It is generally used for testing Kafka producers/consumers when there is no need to run them explicitly.
For more detail, refer to the embedded Kafka documentation.
To run it, we call EmbeddedKafka.start() at the start and EmbeddedKafka.stop() at the end.
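If you're in a Java/Spring project, spring-kafka-test offers a similar facility via EmbeddedKafkaBroker; a hedged sketch assuming spring-kafka-test 2.x (broker count and topic name are illustrative):

import org.springframework.kafka.test.EmbeddedKafkaBroker;

public class EmbeddedKafkaDemo {
    public static void main(String[] args) {
        // Spin up one in-process Kafka broker plus an embedded ZooKeeper;
        // meant for tests, not production use.
        EmbeddedKafkaBroker broker = new EmbeddedKafkaBroker(1, true, "demo-topic");
        broker.afterPropertiesSet();   // starts ZooKeeper and the broker

        System.out.println("Broker listening at " + broker.getBrokersAsString());

        broker.destroy();              // shuts everything down
    }
}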