Using Apache Kafka as an alternative to FTP

I'm new to open-source technology.
I just want to know whether we can use Apache Kafka as an alternative to our regular FTP, where we keep files at a certain location from which end users access them.
The source of my files will mainly be SAP HANA, from which I want to push files into Kafka, from where the end user will be able to consume them.
Can someone suggest where to start, or list the steps to achieve this?

Kafka is not a 1:1 replacement, no.
Can you use Apache Kafka for streaming data integration between systems, in a more scalable and less brittle way than FTP? Sure. Can you just switch one out for the other? No.
Have a look at these resources to understand more about what Kafka can be used for and how to use it:
http://go.rmoff.net/devoxx18-embrace-the-anarchy
http://go.rmoff.net/devoxx18-build-streaming-pipeline
http://rmoff.dev/ksny19-no-more-silos

Kafka is typically not meant for large payloads such as files. I assume you will want to perform some operations on those files. One approach is to pass references to those files to a Kafka topic and let your consumers read the file data using those references.
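For illustration, here is a minimal sketch of that pass-by-reference pattern with the plain Java producer API; the topic name, key, and file path are made up, and the consumer side would resolve the reference against shared storage (NFS, S3, HDFS, ...):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class FileReferenceProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish a pointer to the file, not the file's bytes; consumers
            // fetch the actual content from shared storage using this reference
            producer.send(new ProducerRecord<>("file-events", "export-42",
                    "/mnt/shared/exports/hana_export_42.csv"));
        }
    }
}
```

This keeps messages small: Kafka carries only the event ("a new file is available"), while the bulk data stays on storage built for it.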
I don't know much about SAP HANA, but you may be interested in the SAP HANA Connector for Kafka.

Related

Near real-time streaming data from 100s of customers to Google Pub/Sub to GCS

I am getting near-real-time data from 100s of customers. I need to store this data in Google Cloud Storage buckets created for each customer, i.e. /gcs/customer_id/yy/mm/day/hhhh/
My data is in Avro. I guess I can use the "Pub/Sub to Avro Files on Cloud Storage" template.
However, I'm not sure whether Google Pub/Sub can accept data from multiple customers.
Appreciate any help here, thanks!
The template is quite simple: it takes all the data from Pub/Sub and stores it in Avro files on GCS, with no split per customer.
However, it's a good starting point, and you can evolve it from that base to add a split per customer and the file paths that you want.
You can find the template, written in Java, on GitHub.
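As an illustration of that kind of evolution (not the actual template), here is a heavily simplified Beam sketch that splits Pub/Sub messages by a customer_id field and writes them under per-customer paths. The subscription, bucket, regex-based field extraction, window size, and file naming are all assumptions, and it writes plain text where the real template writes Avro:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

public class PerCustomerSplit {

    // Placeholder extraction of customer_id from the JSON payload
    private static final Pattern CUSTOMER =
            Pattern.compile("\"customer_id\"\\s*:\\s*\"([^\"]+)\"");

    static String customerId(String json) {
        Matcher m = CUSTOMER.matcher(json);
        return m.find() ? m.group(1) : "unknown";
    }

    public static void main(String[] args) {
        Pipeline p = Pipeline.create();

        p.apply(PubsubIO.readStrings()
                .fromSubscription("projects/my-project/subscriptions/my-sub"))
         // Files can only be finalized per window on an unbounded source
         .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(5))))
         .apply(FileIO.<String, String>writeDynamic()
                .by(PerCustomerSplit::customerId)  // destination = customer id
                .via(TextIO.sink())                // the real template uses an Avro sink
                .to("gs://gcs/")                   // base path from the question
                .withNaming(customer ->
                        FileIO.Write.defaultNaming(customer + "/part", ".json"))
                .withDestinationCoder(StringUtf8Coder.of())
                .withNumShards(1));

        p.run();
    }
}
```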

Data synchronization between primary and redundant servers

I want to synchronize data among a set of REST API servers (a Spring Boot based API cluster) periodically. Any instance in the cluster should be able to broadcast new information to all the others.
I don't want to use a DB here. I am trying to find a lightweight library that can be used inside the API for this purpose. Is it possible to use Atomix/Hazelcast/ZooKeeper for this? If so, it would be really helpful if someone could post sample code.
Thanks in advance.
In Hazelcast you can do it through WAN Replication, but that is an enterprise feature for which you have to buy a license.
Hazelcast can be used for this use case. Each of the REST instances creates an embedded Hazelcast member within its JVM. The Hazelcast members then discover each other and form the cluster. Your REST apps use the IMap or ReplicatedMap service, a distributed key-value store (IMap can store more data; ReplicatedMap is faster). Once you write data to the IMap, all other instances see it right away.
See the code sample here: https://docs.hazelcast.com/hazelcast/latest/getting-started/get-started-java.html#complete-code-samples
This feature and the Spring integration are open-source.
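A minimal sketch of the embedded-member approach, assuming Hazelcast 4/5 on the classpath; the map name and printed output are just for illustration:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;
import com.hazelcast.map.listener.EntryAddedListener;

public class SyncNode {
    public static void main(String[] args) {
        // Each REST instance starts an embedded member; members on the same
        // network discover each other and form the cluster automatically
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, String> shared = hz.getMap("api-sync");

        // Fires on every member whenever any member adds an entry
        shared.addEntryListener((EntryAddedListener<String, String>) event ->
                System.out.println("replicated: " + event.getKey() + " = " + event.getValue()), true);

        // Writing here is immediately visible to all other members
        shared.put(hz.getCluster().getLocalMember().getUuid().toString(), "hello");
    }
}
```

Start this in two JVMs: each put is printed by every member, which gives you the broadcast behaviour without a DB.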

Possible to search all topics' data in Kafka?

I need a solution, preferably something built in (rather than creating my own application), which would help management search through multiple/all topics in Kafka. We are using Confluent Platform. Basically, a user should be able to search for a keyword in a UI, and it should search the current log of multiple/all Kafka topics and return the matching data. All the topics in our environment use JSON messages.
This search would let us track a flow: for example, multiple microservices send data from one system to another, and this flow can be traced via a correlation ID that is present in all the JSON messages. So if someone searches for this correlation ID, they should be able to see the messages involved in the flow. This search would have more use cases later on.
We need a solution with minimal coding involved. We would prefer to use a UI like Kibana.
From some basic reading I suspect the solutions below, but I'm not really sure, as I am new to Confluent Platform (I used open-source Apache Kafka earlier):
Sol 1: Use ksqlDB (I need more help on how to use it).
Sol 2: Stream all topics' data to Elasticsearch using Kafka Connect with the built-in plugin, and use Kibana on top of Elasticsearch.
Kindly help me find the best alternative.
You could use Elastic, sure.
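For example, your Sol 2 can be wired up with the Confluent Elasticsearch sink connector. A hedged sketch that registers such a connector through the Kafka Connect REST API (Java 15+ for the text block); the connector name, topic list, URLs, and converter settings are assumptions to adapt to your environment:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterEsSink {
    public static void main(String[] args) throws Exception {
        // Connector config: topics, connection.url and converter settings
        // are placeholders for your environment
        String config = """
            {
              "name": "es-sink",
              "config": {
                "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
                "topics": "orders,payments",
                "connection.url": "http://elasticsearch:9200",
                "key.ignore": "true",
                "schema.ignore": "true",
                "value.converter": "org.apache.kafka.connect.json.JsonConverter",
                "value.converter.schemas.enable": "false"
              }
            }""";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors")) // Kafka Connect REST endpoint
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(config))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Once the connector is running, each listed topic is continuously indexed into Elasticsearch, and a correlation-ID search becomes a plain Kibana query.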
You could also use Splunk, though.
There is also the pdk tool offered by Pilosa that creates a distributed index over Kafka events. (no affiliation)
Another option would be distributed tracing using interceptors between clients, rather than searching "all topics", which sounds like what you actually need.

spring-cloud-starter-stream-source-jdbc Example Application?

I am trying to run a "spring-cloud-starter-stream-source-jdbc" application. My source is an RDBMS, and I want to store the data in an RDBMS sink.
I would like to know of a good demo application based on "spring-cloud-starter-stream-source-jdbc". Is there support for incremental and full loads when streaming data from an RDBMS source to an RDBMS sink using "spring-cloud-starter-stream-jdbc"?
Please share any reference blogs that explain a "spring-cloud-starter-stream-source-jdbc" demo application.
You can use the OOTB jdbc source/sink apps (with the binder of your choice: Rabbit or Kafka). The spring-cloud-starter-stream projects are the ones you would want to use inside your application if you want to extend or build custom applications based on the jdbc starters.
For the OOTB apps, you can refer here. For instance, the jdbc source app with the Rabbit binder can be found here
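If you do build a custom application instead of using the OOTB apps, a hand-rolled polling source might look roughly like the sketch below. It uses Spring Cloud Stream's annotation-based model from that era (deprecated in recent versions); the table name and the processed flag, which gives a crude form of incremental load, are hypothetical:

```java
import java.util.List;
import java.util.Map;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.messaging.Source;
import org.springframework.integration.annotation.InboundChannelAdapter;
import org.springframework.integration.annotation.Poller;
import org.springframework.jdbc.core.JdbcTemplate;

@SpringBootApplication
@EnableBinding(Source.class)
public class JdbcSourceSketch {

    @Autowired
    private JdbcTemplate jdbc; // requires a DataSource configured via spring-boot-starter-jdbc

    // Poll the source table every 5 seconds and emit each batch to the output binding
    @InboundChannelAdapter(value = Source.OUTPUT, poller = @Poller(fixedDelay = "5000"))
    public List<Map<String, Object>> pollNewRows() {
        // WHERE clause = crude incremental load; drop it for a full load
        return jdbc.queryForList("SELECT * FROM orders WHERE processed = 0");
    }

    public static void main(String[] args) {
        SpringApplication.run(JdbcSourceSketch.class, args);
    }
}
```

Paired with a jdbc sink app on the same binder, this gives an RDBMS-to-RDBMS stream.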

Kafka entries to DynamoDB

I want all records coming into a Kafka topic to be inserted into DynamoDB. Is there any open-source plugin available that can do this? Or is there a better way to do it within DynamoDB itself?
Please suggest.
Storing Kafka messages in DynamoDB is a great use case for Kafka Connect. Unfortunately, I don't know of any off-the-shelf sink connectors for DynamoDB; you can see a list here. For now, you'll need to either build your own sink connector (and hopefully open-source it!) or build a custom consumer that writes to DynamoDB.
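For reference, a custom consumer along those lines might look roughly like this sketch, using the Kafka consumer API and AWS SDK v1; the topic, table, and attribute names are assumptions, and it assumes records with non-null string keys:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaToDynamo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "dynamo-writer");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        AmazonDynamoDB dynamo = AmazonDynamoDBClientBuilder.defaultClient();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> rec : records) {
                    // One DynamoDB item per Kafka record; "id" is the table's partition key
                    Map<String, AttributeValue> item = new HashMap<>();
                    item.put("id", new AttributeValue(rec.key()));
                    item.put("payload", new AttributeValue(rec.value()));
                    dynamo.putItem("my-table", item);
                }
            }
        }
    }
}
```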
However, I found this kafka-connect-dynamodb. Though it is currently unmaintained, it worked for me.