Flink doesn't consume Data from Kafka publisher - apache-kafka

What I have: http://prntscr.com/szmkn4
That's the most barebones version of it. Some parts will come later, but for now the point is that the data is arriving properly in my consumer in the form of a JSON string.
I want to put it into a Flink table, which I create with this statement: http://prntscr.com/szmll3
I then check that it got created, just to be sure, and get this: http://prntscr.com/szmn79
Next I want to start things up and check my data with "SELECT * FROM RawData", but I get the following error:
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.kafka.shaded.org.apache.kafka.common.config.ConfigException: No resolvable bootstrap urls given in bootstrap.servers
I assume it's an issue with how I created my table, but am honestly not sure where/what/how.
My publisher's properties in NiFi are:
https://prnt.sc/szoe6z
and
http://prntscr.com/szoeka
If you need any additional information from me, feel free to ask.
Thanks in advance,
Psy

[ERROR] Could not execute SQL statement. Reason: org.apache.flink.kafka.shaded.org.apache.kafka.common.config.ConfigException: No resolvable bootstrap urls given in bootstrap.servers
That likely means that the hostnames of the Kafka bootstrap servers you've given Flink cannot be resolved from the Flink servers. You'd know if it were a NiFi issue, because you'd see errors in the NiFi flow saying it couldn't produce to Kafka. NiFi might be producing to the wrong topic, or even to the wrong set of brokers if you have multiple Kafka clusters, but the error you posted isn't a NiFi issue.
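For reference, here is a minimal sketch of a Kafka-backed table in the newer Flink SQL DDL style (the column names, topic name and broker addresses are placeholders, not taken from your screenshots). The key point is that 'properties.bootstrap.servers' must list addresses that resolve from the Flink cluster itself, not names that only resolve on the NiFi or Kafka hosts:

-- Hypothetical table definition; adjust the columns, topic and format to your data.
CREATE TABLE RawData (
  sensor_id STRING,
  payload   STRING,
  ts        TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'rawdata',
  -- must be reachable and resolvable from the Flink jobmanager/taskmanagers
  'properties.bootstrap.servers' = 'kafka-broker-1:9092,kafka-broker-2:9092',
  'properties.group.id' = 'flink-rawdata',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);

A quick sanity check is to run nslookup (or ping) against each broker host from a Flink machine; if that fails, switch to IP addresses or fix the DNS/hosts entries.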

Related

Kafka Connect JDBC sink SQL error handling

I am currently configuring a Kafka JDBC sink connector to write my Kafka messages to a Postgres table. Everything works fine except the error handling. Sometimes messages in my topic contain bad data, so the database constraints fail with an expected SQL exception: duplicate key...
I would like to put these bad messages in a DLQ and commit the offset so the next messages can be processed, so I configured the connector with
"errors.tolerance": "all"
"errors.deadletterqueue.topic.name": "myDLQTopicName"
but it does not change anything; the connector retries until it crashes.
Is there another configuration I'm missing? I only saw these two in the Confluent documentation.
(I see in the JDBC connector changelog that error handling in the put stage was implemented in version 10.1.0 (CCDB-192), and I'm using the latest version of the connector, 10.5.1.)
"The Kafka Connect framework provides generic error handling and dead-letter queue capabilities which are available for problems with [de]serialisation and Single Message Transforms. When it comes to errors that a connector may encounter doing the actual pull or put of data from the source/target system, it’s down to the connector itself to implement logic around that."
If duplicate keys are the only type of bad record you need to deal with, you might consider using upsert as the insert.mode.
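A sketch of the relevant sink settings (the key field "id" and the pk.mode choice are placeholders and must match a unique or primary key in your Postgres table); with upsert mode, a duplicate key becomes an update instead of a constraint violation:
"insert.mode": "upsert"
"pk.mode": "record_value"
"pk.fields": "id"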

How to convince Kafka brokers to assign workers to my topic?

We've got a (Confluent) managed Kafka installation with 5 brokers and 2 connector hosts. I have two topics which never get any consumers assigned to them, despite repeatedly starting and stopping the connectors that are supposed to listen to them. This configuration was working until recently (and no, nothing has changed; we've done an audit to confirm).
What, if anything, can I do to force assignment of consumers to these topics?
This problem occurred for two reasons: (1) I had mistakenly installed new versions of the PostgreSQL and SQL Server JDBC connectors, which conflicted with the already-installed versions, and (2) I did not understand that source connectors do not have consumer groups assigned to them.
Getting past the "well duh, of course sources don't have consumers", another vital piece of information (thank you to the support team at Confluent for this) is that you can see how far your connector has progressed by reading the private/hidden topic connect-offsets. Once you have that, you can check your actual DB query, see what it is returning, and then (if necessary) reset your connector's offset.
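As a sketch (the broker address is a placeholder, and the script may be kafka-console-consumer.sh in a plain Apache distribution), you can read that topic with the stock console consumer; printing the keys shows which connector each offset entry belongs to:

# dump the Connect offsets topic, including record keys, from the beginning
kafka-console-consumer --bootstrap-server broker1:9092 \
  --topic connect-offsets --from-beginning \
  --property print.key=true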

PDI transformation does not send messages to Kafka server

I have a transformation in Pentaho Data Integration (PDI) that queries NetSuite, builds a JSON string for each row, and finally sends these strings to Kafka. This is the transformation:
When I test the transformation against my local Kafka it works like a charm, as you can see below:
The problem arises when I substitute the connection parameters with those of an AWS EC2 instance where I also have Kafka running: the transformation does not report any errors, but the messages never reach Kafka, as can be seen here:
This is the configuration of the Kafka Producer step of the transformation:
The strange thing is that although it does not send the messages to Kafka, it does seem to connect to the server, because the combo box is populated with the names of the topics that I have:
In addition, this error is observed in the PDI terminal:
ERROR [NamedClusterServiceLocatorImpl] Could not find service for interface org.pentaho.hadoop.shim.api.jaas.JaasConfigService associated with named cluster null
This doesn't make sense to me, because I'm using a direct connection and not a connection to a Hadoop cluster.
So I wanted to ask the members of this community if anyone has used PDI to send messages to Kafka, and whether they had to make any configuration in PDI or Kafka to achieve it, since I cannot think of what could be happening.
Thanks in advance for any ideas or comments to help me solve this!

How can I manage Kafka Connect schema errors?

I'm using Kafka Connect (Confluent distribution) to connect an MQTT broker to a Kafka topic (https://docs.lenses.io/connectors/source/mqtt.html), but when a message arrives that doesn't conform to the expected schema, the connector stops!
How can I prevent this from happening?
I'd also like to handle the error and, for example, keep track of it!
If you are using a ready-made connector, you need to satisfy its expected schema; any error will stop the connector. So the best approach is to identify the schema error from the error message.
If it's impossible to use the existing connector as-is, create your own that satisfies your needs.
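If you stay with the ready-made connector, one thing worth trying (a sketch, not specific to the Lenses MQTT connector) is the Connect framework's own error-handling settings, which keep the task running and log the failing records. Note that these only cover converter and transform failures; errors raised inside the connector's own polling logic are not covered and will still stop it:
"errors.tolerance": "all"
"errors.log.enable": "true"
"errors.log.include.messages": "true"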

In a Kafka connector, how do I get the bootstrap-server address my Kafka Connect is currently using?

I'm developing a Kafka sink connector on my own. My deserializer is JSONConverter. However, when someone sends malformed JSON data to my connector's topic, I want to skip that record and send it to a specific topic of my company.
What confuses me is that I can't find any API to get my Connect worker's bootstrap.servers. (I know it's in Confluent's etc directory, but hard-coding the path to "connect-distributed.properties" just to read bootstrap.servers is not a good idea.)
So the question is: is there another way to conveniently get the value of bootstrap.servers in my connector code?
Instead of trying to send the "bad" records from a SinkTask to Kafka, you should instead try to use the dead letter queue feature that was added in Kafka Connect 2.0.
You can configure the Connect runtime to automatically dump records that failed to be processed to a configured topic acting as a DLQ.
For more details, see KIP-298, which added this feature.
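A sketch of the corresponding sink-connector settings (the DLQ topic name and replication factor are placeholders). Records that fail JSON deserialization get routed to the DLQ topic with the failure reason attached as headers instead of crashing the task, and you never need to look up bootstrap.servers in your own code because the Connect runtime produces to the DLQ for you:
"errors.tolerance": "all"
"errors.deadletterqueue.topic.name": "my-company-dlq"
"errors.deadletterqueue.topic.replication.factor": "3"
"errors.deadletterqueue.context.headers.enable": "true"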