Is there a way to send data from InfluxDB to Kafka? - apache-kafka

Is there a way to send data from InfluxDB to Kafka? Also, the Kafka topic has an Avro schema defined, so I'd like to know whether data can be sent from InfluxDB into Kafka while respecting that schema.

There appear to be a few options for this:
The Telegraf daemon by Influx, which also has a Kafka output plugin.
A Kafka consumer for InfluxDB (written in Python).
You could also write your own using the InfluxDB API (a sketch of that follows below).
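For the last option, a minimal sketch of a do-it-yourself bridge might look like the following. It assumes the influxdb (1.x) and confluent-kafka Python client libraries, a Confluent Schema Registry holding the topic's Avro schema, and made-up database, measurement, topic, and schema names:

    # Sketch: query InfluxDB 1.x and publish Avro-encoded records to Kafka.
    # The database, measurement, topic, hostnames, and schema below are placeholders.
    from influxdb import InfluxDBClient
    from confluent_kafka import SerializingProducer
    from confluent_kafka.schema_registry import SchemaRegistryClient
    from confluent_kafka.schema_registry.avro import AvroSerializer

    # Hypothetical Avro schema matching what is registered for the topic.
    VALUE_SCHEMA = """
    {
      "type": "record",
      "name": "CpuMeasurement",
      "fields": [
        {"name": "time", "type": "string"},
        {"name": "host", "type": ["null", "string"], "default": null},
        {"name": "usage_idle", "type": ["null", "double"], "default": null}
      ]
    }
    """

    schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})
    producer = SerializingProducer({
        "bootstrap.servers": "localhost:9092",
        "value.serializer": AvroSerializer(schema_registry, VALUE_SCHEMA),
    })

    influx = InfluxDBClient(host="localhost", port=8086, database="telegraf")
    result = influx.query('SELECT "host", "usage_idle" FROM "cpu" WHERE time > now() - 1h')

    for point in result.get_points():
        # get_points() yields dicts keyed by column name, which the Avro
        # serializer encodes against the schema above.
        producer.produce(topic="influx-cpu", value=point)

    producer.flush()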

Related

How to display Kafka topic messages in Prometheus's TSDB using Kafka Connect

I want to monitor Kafka topic messages in Prometheus. I will be using Kafka Connect for this, but I want to understand how to get the message content into the Prometheus TSDB.
You would need to use PushGateway, since Prometheus scrapes endpoints and has no Kafka consumer API. Similarly, Kafka Connect sinks don't tend to populate any internal metrics server other than their native JMX server. In other words, JMX Exporter won't let you "see Kafka messages".
There are HTTP Kafka Connect sinks, and you could try using one of them to send data to the PushGateway API.
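To illustrate the shape of the PushGateway approach with a plain Python consumer rather than a Connect sink (note that this gets you counters derived from messages, not the message bodies themselves), here is a hedged sketch assuming the kafka-python and prometheus-client packages and placeholder hostnames and topic name:

    # Sketch: count Kafka messages and push the counter to a PushGateway.
    # Broker, gateway, topic, and job names are placeholders.
    from kafka import KafkaConsumer
    from prometheus_client import CollectorRegistry, Counter, push_to_gateway

    registry = CollectorRegistry()
    messages_total = Counter(
        "kafka_messages_total", "Messages read from Kafka", ["topic"],
        registry=registry,
    )

    consumer = KafkaConsumer("events", bootstrap_servers="broker:9092",
                             group_id="prometheus-bridge")

    for record in consumer:
        messages_total.labels(topic=record.topic).inc()
        # Push on every message for simplicity; batching the pushes would be
        # kinder to the gateway in practice.
        push_to_gateway("pushgateway:9091", job="kafka-bridge", registry=registry)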
However, this is a poor use-case for Prometheus.
InfluxDB, for example, can be populated by a Telegraf Kafka consumer (or Kafka Connect), or Elasticsearch by Logstash (or Kafka Connect).
Then Grafana can use InfluxDB or Elasticsearch as a datasource to view events... or it can use a regular RDBMS.

Kafka to BigQuery, best way to consume messages

I need to receive messages to my BigQuery tables and I want to know what is the best way to consume those messages.
My Kafka servers are on AWS and they produce Avro messages, and from what I've seen, Dataflow needs to receive messages in JSON format. So I googled and found an article explaining how to get messages into Pub/Sub, but in the architectures I've seen for this, they create a Kafka VM on GCP to produce the messages.
What I need to know is:
Is it possible to receive Avro messages on Pub/Sub from external Kafka servers, deserialize them using my schema, send them to Dataflow, and finally write them to BigQuery tables?
Or do I need to create a Kafka VM and use it to consume messages from external servers?
This might seem a bit confusing, but it's where I am right now. The main goal is to get Avro messages from Kafka on AWS into BigQuery tables. Any suggestions are very welcome.
Thanks a lot in advance
The Kafka Connect BigQuery Connector may be exactly what you need. It is a Kafka sink connector that lets you export messages from Kafka directly to BigQuery. The README provides detailed configuration instructions, including how to point the connector at your Kafka cluster and how to specify the destination BigQuery table. The connector should also be able to retrieve the Avro schema automatically.
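For a rough idea of what wiring that up looks like, here is a hedged sketch that registers the connector through the Kafka Connect REST API from Python (using requests). The property names shown are illustrative and depend on the connector version, so check the README; the connector name, topic, GCP project, dataset, keyfile path, and hostnames are placeholders:

    # Sketch: register the BigQuery sink connector via the Connect REST API.
    # Config keys vary between connector releases; treat these as illustrative.
    import requests

    connector = {
        "name": "bigquery-sink",
        "config": {
            "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
            "topics": "my-avro-topic",
            "project": "my-gcp-project",
            "defaultDataset": "my_dataset",   # older releases use "datasets" instead
            "keyfile": "/etc/kafka-connect/gcp-key.json",
            # Avro values are decoded via Schema Registry before being written.
            "value.converter": "io.confluent.connect.avro.AvroConverter",
            "value.converter.schema.registry.url": "http://schema-registry:8081",
        },
    }

    resp = requests.post("http://connect:8083/connectors", json=connector)
    resp.raise_for_status()
    print(resp.json())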

Creating a new Kafka topic using NiFi

I'm trying to put some data into a Kafka topic that doesn't exist yet, using the PublishKafka_2_0 processor in NiFi.
I don't have direct access to the Kafka server, only via the NiFi flow, and I need to create three new topics for the data.
How can this be done using NiFi?
Thank you!
You would need to enable automatic creation of Kafka topics in Kafka itself. NiFi doesn't have any control over Kafka; it just supports consuming and producing. From the sound of it, your setup has automatic topic creation disabled, so you'll need to have someone create the topics for you.
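If whoever does have access to the brokers can run a short script, pre-creating the three topics might look like the sketch below, assuming the kafka-python package and placeholder broker address, topic names, and partition/replication settings:

    # Sketch: pre-create the topics with kafka-python's admin client.
    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(bootstrap_servers="broker:9092")
    admin.create_topics([
        NewTopic(name="topic-a", num_partitions=3, replication_factor=2),
        NewTopic(name="topic-b", num_partitions=3, replication_factor=2),
        NewTopic(name="topic-c", num_partitions=3, replication_factor=2),
    ])
    admin.close()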

Is there a Telegraf plugin which takes in queries to read data from InfluxDB and then posts them to a Kafka topic using the Kafka output plugin?

Using Telegraf plugins, there is a way to read data from InfluxDB and publish it to a Kafka topic.
But is there a way to read the data on demand and place it on a Kafka topic, i.e. a query-based, on-demand read?
I can do a query-based read through the REST API (curl GET).
There are HTTP Listener plugins, but these only handle POST requests.
There is none for GET, where I could query a subset of data from InfluxDB and place it on a Kafka topic. In this case, Kafka would be the output plugin.
You can achieve this using Kapacitor's Kafka event handler. Kapacitor can be configured in either batch mode or streaming mode. In streaming mode, when the condition for processing is met, the Kapacitor event handler processes the record immediately and sends it to the Kafka cluster. Please refer here for more details.
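Alternatively, if you want to keep the ad-hoc query-on-demand approach described in the question rather than a standing Kapacitor task, a small script outside Telegraf can do it. A hedged sketch, assuming the requests and kafka-python packages and placeholder database, query, and topic names:

    # Sketch: one-off GET against InfluxDB's /query endpoint, forwarding the
    # returned rows to Kafka as JSON. Run it whenever the data is needed.
    import json
    import requests
    from kafka import KafkaProducer

    resp = requests.get("http://localhost:8086/query", params={
        "db": "telegraf",
        "q": 'SELECT "host", "usage_idle" FROM "cpu" WHERE time > now() - 15m',
    })
    resp.raise_for_status()

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # InfluxDB 1.x returns results -> series -> columns/values.
    for series in resp.json().get("results", [{}])[0].get("series", []):
        columns = series["columns"]
        for values in series["values"]:
            producer.send("influx-ondemand", dict(zip(columns, values)))

    producer.flush()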

Micro-batching through NiFi

I have a scenario where my Kafka messages (from the same topic) flow through a single enrichment pipeline and are written at the end to both HDFS and MongoDB. My Kafka consumer for HDFS will run on an hourly basis (for micro-batching). So I need to know the best way to route FlowFiles to PutHDFS and PutMongo based on which consumer they came from (the consumer for HDFS or the consumer for MongoDB).
Or please suggest if there is any other way to achieve micro-batching through NiFi.
Thanks
You could set NiFi up to use a scheduling strategy (for example, CRON-driven) for the processors that upload data.
And I would think you want the Kafka consumers to always read data, building a backlog of FlowFiles in NiFi, and then have the put processors run on a less frequent schedule.
This is similar to how Kafka Connect would run its HDFS connector.