Possible to search all topics data in Kafka? - apache-kafka

I need a solution, preferably something built in (rather than writing my own application), that would help management search through multiple or all topics in Kafka. We are using Confluent Platform. Basically, a user should be able to search for a keyword in a UI, and it should search the current log of multiple or all Kafka topics and return the data. All the topics in our environment use JSON to communicate.
This search would let us track a flow. For example, multiple microservices send data from one system to another, and this flow can be tracked via a correlation ID that is present in all the JSON messages. So if someone searches for this correlation ID, they should be able to see the messages involved in the flow. This search would have more use cases later on.
We need a solution which would have minimal coding involved. We would prefer to use a UI like Kibana.
From some basic reading, I suspect the solutions below might work, but I am not really sure, as I am new to Confluent (I used open-source Apache Kafka earlier):
Sol 1: Use ksqlDB. (I need more help on how to use it.)
Sol 2: Stream all topic data to Elasticsearch using Kafka Connect with the built-in plugin, and use Kibana on top of Elasticsearch.
Kindly help me find the best alternative.

You could use Elastic, sure.
You could also use Splunk, though.
There is also the pdk tool offered by Pilosa that creates a distributed index over Kafka events. (no affiliation)
Another option, which sounds like what you actually need, would be distributed tracing using interceptors between clients rather than searching "on all topics".
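For the Elastic route, a minimal sketch of what the Kafka Connect side might look like is below. It only builds the JSON payload that would be POSTed to the Connect REST API's /connectors endpoint. The connector name, topic pattern, and Elasticsearch URL are made-up placeholders, and the property names are quoted from memory of the Confluent Elasticsearch sink connector, so check them against the documentation for your connector version.

```python
import json


def build_es_sink_config(name, topics_regex, es_url):
    """Build the payload for registering an Elasticsearch sink connector.

    All three arguments are placeholders chosen for illustration.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
            # Subscribe to every topic whose name matches this pattern.
            "topics.regex": topics_regex,
            "connection.url": es_url,
            # Plain JSON values without an embedded schema are assumed.
            "key.ignore": "true",
            "schema.ignore": "true",
            "value.converter": "org.apache.kafka.connect.json.JsonConverter",
            "value.converter.schemas.enable": "false",
        },
    }


if __name__ == "__main__":
    payload = build_es_sink_config("all-topics-to-es", "flow-.*", "http://localhost:9200")
    print(json.dumps(payload, indent=2))
```

Once the documents are indexed, searching for a correlation ID in Kibana is an ordinary field query, so no custom search code is needed.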

Related

KSQLDB streams flow visualisation

ksqlDB allows stream joins, which is quite handy. As the queries get more complicated and time passes, it would be useful to see how the data flow is designed in a visual manner.
My question is: is there any existing tooling that can visualise the current message flow defined by the KSQL queries? Even visualising the underlying Kafka Streams would be a good start.

Data synchronization between primary and redundant servers

I want to synchronize data among a set of REST API servers (a Spring Boot based API cluster) periodically. Any instance in the cluster should be able to broadcast new information to all the others.
I don't want to use a DB here. I am trying to find a lightweight library that can be used inside the API for this purpose. Is it possible to use Atomix/Hazelcast/ZooKeeper for this? If so, it would be really helpful if someone could post some sample code.
My thanks in advance.
In Hazelcast you can do it through WAN replication.
It is an enterprise feature, so you have to buy a license.
Hazelcast can be used for this use case. Each of the REST instances creates an embedded Hazelcast member within its JVM. The Hazelcast members then discover each other and form a cluster. Your REST apps use the IMap or ReplicatedMap service, a distributed key-value store (IMap can store more data, ReplicatedMap is faster). Once you write an entry to the IMap, all other instances see it right away.
See the code sample here: https://docs.hazelcast.com/hazelcast/latest/getting-started/get-started-java.html#complete-code-samples
This feature and the Spring integration are open-source.

Using Apache Kafka as an alternative to FTP

I'm new to open-source technology.
I just want to know whether we can use Apache Kafka as an alternative to our regular FTP, where we keep files at a certain location from which the end user accesses them.
The source for my files will mainly be SAP HANA, from where I want to push files into Kafka so that the end user can consume them.
Can someone suggest where to start, or list the steps for achieving this?
Kafka is not a 1:1 replacement, no.
Can you use Apache Kafka for streaming data integration between systems, in a more scalable and less brittle way than FTP? Sure. Can you just switch one out for the other? No.
Have a look at these resources to understand more about what Kafka can be used for and how to use it:
http://go.rmoff.net/devoxx18-embrace-the-anarchy
http://go.rmoff.net/devoxx18-build-streaming-pipeline
http://rmoff.dev/ksny19-no-more-silos
Kafka is typically not meant for large data such as files. I suppose that you would want to do some operations on those files. One way to do this is to pass references to those files through a Kafka topic and let your consumers read the data from the files using those references.
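A rough sketch of that reference-passing idea follows, assuming the files live on storage that both the producer and the consumers can reach. The field names (path, size_bytes, sha256) are made up for illustration, and the actual Kafka send and poll calls are left out, so only the message payload handling is shown.

```python
import hashlib
import json
import os


def make_file_reference(path):
    """Producer side: describe a file by reference instead of embedding its bytes."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    reference = {
        "path": os.path.abspath(path),
        "size_bytes": os.path.getsize(path),
        "sha256": digest,
    }
    # This small JSON document is what would be sent as the Kafka record value.
    return json.dumps(reference).encode("utf-8")


def read_referenced_file(message_value):
    """Consumer side: resolve the reference, read the file, and verify its checksum."""
    ref = json.loads(message_value.decode("utf-8"))
    with open(ref["path"], "rb") as f:
        data = f.read()
    if hashlib.sha256(data).hexdigest() != ref["sha256"]:
        raise ValueError("file changed since the reference was published: " + ref["path"])
    return data
```

The checksum is optional, but it lets a consumer notice when a file was replaced after its reference was published.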
I don't know about SAP HANA, but you may be interested in the SAP HANA Connector for Kafka.

Message format/specification for distributed REST services?

I have a growing number of REST services that talk to each other with JSON. Right now, the communication is direct, but it's possible that a broker might process and distribute the messages later on.
This is the only one I've found so far:
https://github.com/cjus/umf/blob/master/umf.md
Are there others that would be better suited? Thanks.
You can use JSON Schema.
There is more explanatory info available also.
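To make the JSON Schema suggestion concrete, here is a small sketch of a message envelope described with standard JSON Schema keywords. The field names are hypothetical, not taken from the question. The checker is a hand-rolled stand-in that only understands "required" and top-level "type"; a real JSON Schema validator library would be used in practice.

```python
# Hypothetical envelope schema; the fields are illustrative assumptions.
MESSAGE_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["correlationId", "type", "timestamp", "body"],
    "properties": {
        "correlationId": {"type": "string"},
        "type": {"type": "string"},
        "timestamp": {"type": "string"},
        "body": {"type": "object"},
    },
}

# Maps the JSON Schema type names used above to Python types.
_TYPES = {"string": str, "object": dict}


def check_message(message, schema=MESSAGE_SCHEMA):
    """Return a list of problems; an empty list means these limited checks passed."""
    if not isinstance(message, dict):
        return ["message is not an object"]
    problems = []
    for field in schema["required"]:
        if field not in message:
            problems.append("missing required field: " + field)
    for field, rule in schema["properties"].items():
        if field in message and not isinstance(message[field], _TYPES[rule["type"]]):
            problems.append("wrong type for field: " + field)
    return problems
```

Keeping the schema in one shared place means every service, and later a broker, can validate messages against the same contract.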

Kafka entries to DynamoDB

I want all records arriving on a Kafka topic to be inserted into DynamoDB. Is there any open-source plugin available that can do this? Or is there a better way to do it within DynamoDB itself?
Please suggest.
Lokesh Narayan
Storing Kafka messages in DynamoDB is a great use case for Kafka Connect. Unfortunately I don't know of any off-the-shelf sink connectors for DynamoDB. You can see a list here. For now, you'll need to either build your own sink connector (and hopefully open source it!) or build a custom consumer that writes to DynamoDB.
However, I found kafka-connect-dynamodb. Although it is currently unmaintained, it worked for me.
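If you go the custom-consumer route, one piece you may need is converting each JSON record into DynamoDB's low-level attribute-value form. The sketch below shows only that conversion, under the assumption that record values are JSON objects. The Kafka polling loop and the DynamoDB write call are left out, and the function names are made up for illustration.

```python
import json


def to_attribute_value(value):
    """Convert a parsed JSON value into DynamoDB's low-level AttributeValue form."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # checked before numbers because bool is an int subclass
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # DynamoDB transports numbers as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError("unsupported type: " + type(value).__name__)


def record_to_item(raw_value):
    """Turn one Kafka record value (JSON bytes or text) into a DynamoDB item map."""
    record = json.loads(raw_value)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object at the top level")
    return {k: to_attribute_value(v) for k, v in record.items()}
```

The resulting map has the shape expected for the Item of a low-level PutItem request. Higher-level SDK layers usually do this conversion for you, in which case the parsed record can be passed through directly.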