Kafka entries to DynamoDB - apache-kafka

I want all records arriving on a Kafka topic to be inserted into DynamoDB. Is there an open source plugin that can do this? Or is there a better way to do it within DynamoDB itself?
Please suggest.
Lokesh Narayan

Storing Kafka messages in DynamoDB is a great use case for Kafka Connect. Unfortunately I don't know of any off-the-shelf sink connectors for DynamoDB. You can see a list here. For now, you'll need to either build your own sink connector (and hopefully open source it!) or build a custom consumer that writes to DynamoDB.

However, I did find kafka-connect-dynamodb. It is currently unmaintained, but it worked for me.
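For the custom-consumer route mentioned above, here is a minimal sketch of the part that turns Kafka records into DynamoDB items. It assumes the message values are JSON objects and that the table's partition key is an attribute called "id" (a hypothetical name, not from the question). The writer is anything with a put_item(Item=...) method, which is the shape a boto3 DynamoDB Table or its batch_writer() exposes. Numbers are parsed as Decimal because boto3 rejects Python floats.

```python
import json
from decimal import Decimal
from typing import Iterable, Optional, Tuple


def record_to_item(key: Optional[bytes], value: bytes) -> dict:
    """Turn one Kafka record (raw key/value bytes) into a DynamoDB item.

    Assumption: the value is a JSON object. Floats are parsed as Decimal
    because boto3 does not accept Python floats for DynamoDB numbers.
    """
    item = json.loads(value, parse_float=Decimal)
    if not isinstance(item, dict):
        raise ValueError("expected a JSON object as the record value")
    if key is not None:
        # "id" is a hypothetical partition key name; use your table's key.
        item.setdefault("id", key.decode("utf-8"))
    return item


def sink(records: Iterable[Tuple[Optional[bytes], bytes]], writer) -> int:
    """Write every (key, value) record through writer.put_item(Item=...).

    Returns the number of records written.
    """
    count = 0
    for key, value in records:
        writer.put_item(Item=record_to_item(key, value))
        count += 1
    return count
```

To wire it up you could, for example, iterate a kafka-python KafkaConsumer and pass (msg.key, msg.value) pairs as the records, and use boto3.resource("dynamodb").Table("your_table").batch_writer() as the writer. Offsets, retries and error handling are left out here; those are the things Kafka Connect would otherwise handle for you.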

Related

Possible to search all topics data in Kafka?

I need a solution, preferably something built in (rather than creating my own application), that would help management search through multiple or all topics in Kafka. We are using Confluent Platform. Basically, a user should be able to enter a keyword in a UI, which would search the current log of multiple or all Kafka topics and return the matching data. All the topics in our environment use JSON messages.
This search would let us track a flow. For example, multiple microservices send data from one system to another, and this flow can be tracked via a correlation id that is present in every JSON message. So if someone searches for this correlation id, they should see all the messages involved in the flow. This search would have more use cases later on.
We need a solution with minimal coding involved. We would prefer to use a UI like Kibana.
From some basic reading I am considering the solutions below, but I am not really sure as I am new to Confluent (I used open-source Apache Kafka earlier):
Sol 1: Use ksqlDB. (I need more help on how to use it.)
Sol 2: Stream all topic data to Elasticsearch using Kafka Connect with the built-in plugin, and use Kibana on top of Elasticsearch.
Please help me find the best alternative.
You could use Elastic, sure.
You could also use Splunk, though.
There is also the pdk tool offered by Pilosa that creates a distributed index over Kafka events. (no affiliation)
Another option would be distributed tracing using interceptors between clients rather than a search "on all topics"; that sounds like what you actually need.
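To make the correlation-id lookup concrete, here is a rough sketch of what such a search has to do, whichever tool ends up doing it. It is a brute-force scan over (topic, raw JSON value) pairs, not an index like Elasticsearch would give you, so treat it as an illustration only. The field name "correlationId" is an assumption; use whatever your messages actually carry.

```python
import json
from typing import Iterable, Iterator, Tuple


def contains_value(node, field: str, wanted: str) -> bool:
    """Return True if `field` equals `wanted` anywhere in a parsed JSON tree."""
    if isinstance(node, dict):
        return any(
            (k == field and v == wanted) or contains_value(v, field, wanted)
            for k, v in node.items()
        )
    if isinstance(node, list):
        return any(contains_value(x, field, wanted) for x in node)
    return False


def find_flow(
    messages: Iterable[Tuple[str, bytes]],
    correlation_id: str,
    field: str = "correlationId",  # assumed field name, not from the question
) -> Iterator[Tuple[str, dict]]:
    """Yield (topic, parsed message) for every message carrying the id."""
    for topic, raw in messages:
        try:
            doc = json.loads(raw)
        except ValueError:
            continue  # skip values that are not valid JSON
        if contains_value(doc, field, correlation_id):
            yield topic, doc
```

Scanning every topic like this gets slow quickly, which is why indexing the data (Elasticsearch, Splunk) or tracing at the client level is the more practical choice for regular use.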

Using Apache Kafka as an alternative to FTP

I'm new to open-source technology.
I just want to know whether we can use Apache Kafka as an alternative to our regular FTP setup, where we keep files at a certain location from which the end user accesses them.
The source for my files will mainly be SAP HANA. I want to push files from there into Kafka, from where the end user will be able to consume them.
Can someone suggest where to start, or list the steps to achieve this?
Kafka is not a 1:1 replacement, no.
Can you use Apache Kafka for streaming data integration between systems, in a more scalable and less brittle way than FTP? Sure. Can you just switch one out for the other? No.
Have a look at these resources to understand more about what Kafka can be used for and how to use it:
http://go.rmoff.net/devoxx18-embrace-the-anarchy
http://go.rmoff.net/devoxx18-build-streaming-pipeline
http://rmoff.dev/ksny19-no-more-silos
Kafka is typically not meant for large payloads such as files. I suppose you would want to do some operations on those files. One way to do this is to publish references to the files on a Kafka topic and let your consumers read the file data using those references.
I don't know about SAP HANA, but you may be interested in
SAP Hana Connector for Kafka
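The idea of passing file references through Kafka instead of the files themselves could look roughly like this. It is only a sketch and assumes that producer and consumer can both reach the same storage (here just a local path; in practice a shared file system or object store). The message layout with "path", "size" and "sha256" is made up for the example.

```python
import hashlib
import json
from pathlib import Path


def make_file_reference(path: Path) -> bytes:
    """Build the small message to publish to Kafka instead of the file itself."""
    data = path.read_bytes()
    ref = {
        "path": str(path),  # where consumers can fetch the file
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),  # lets consumers verify it
    }
    return json.dumps(ref).encode("utf-8")


def read_referenced_file(message: bytes) -> bytes:
    """Consumer side: resolve the reference and check the file is unchanged."""
    ref = json.loads(message)
    data = Path(ref["path"]).read_bytes()
    if hashlib.sha256(data).hexdigest() != ref["sha256"]:
        raise ValueError("file changed since the reference was published")
    return data
```

The bytes returned by make_file_reference are what you would send as the Kafka message value; the consumer passes the received value to read_referenced_file. The checksum is there so a consumer notices if the file was replaced after the reference was published.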

spring-cloud-starter-stream-source-jdbc Example Application?

I am trying to run the "spring-cloud-starter-stream-source-jdbc" application. My source is an RDBMS and I want to store the RDBMS data in an RDBMS sink. I would like to know of a good demo application based on "spring-cloud-starter-stream-source-jdbc".
Is there support for incremental and full loads when streaming data from an RDBMS source to an RDBMS sink using "spring-cloud-starter-stream-jdbc"?
Please share any reference blogs that help in understanding a "spring-cloud-starter-stream-source-jdbc" demo application.
You can use the out-of-the-box (OOTB) jdbc source/sink apps with the binder of your choice (RabbitMQ or Kafka). The spring-cloud-starter-stream projects are the ones to use inside your own application if you want to extend or build custom applications based on the jdbc starters.
For the OOTB apps, you can refer here. For instance, the jdbc source app with the rabbit binder can be found here.

Can Kafka be used to obtain Google Analytics data and process it through Spark Streaming?

Has anyone tried to integrate Google Analytics with Kafka and then process that data in Spark Streaming?
Thanks!
A little late since this question was asked but...
I created a Python application that demonstrates how to pull information from Google Analytics and push those metrics to a Kafka topic. I should soon have code that demonstrates analyzing that data.
https://github.com/admintome/analytics-intake
Hopefully, that will demonstrate how to accomplish what you asked for.
There is no API available for Scala to access Google Analytics.

Is there a Google Dataflow MongoDB Source/Sink?

I know that, out of the box, Google Dataflow only officially supports files in Google Cloud Storage, BigQuery, Avro files, or Pub/Sub as I/O.
But as it has an API for Custom Source and Sink I was wondering, is there some Pipeline I/O implementation for MongoDB?
Right now I will have to either migrate my data to BigQuery or write the whole Pipeline I/O implementation before even being able to know if Google Dataflow is a viable solution to my current problems.
I tried googling and looking at the current SDK issues and didn't see anything related. I even started to wonder if I missed something very basic in the Google Dataflow concepts and docs that completely invalidates this initial idea of using MongoDB as a data source.
A MongoDB connector was recently added to Apache Beam (incubating). Please see MongoDbIO.