I am trying to create a Camel route with the Kafka component to consume events using io.confluent.kafka.serializers.KafkaAvroDeserializer and a schema registry URL, along with other component parameters. I am not sure if this is fully supported by Camel-Kafka currently. Can someone please comment on this?
from("kafka:{{kafka.notification.topic}}?brokers={{kafka.notification.brokers}}"
+ "&maxPollRecords={{kafka.notification.maxPollRecords}}"
+ "&seekTo={{kafka.notification.seekTo}}"
+ "&specificAvroReader=" + "true"
+ "&valueDeserializer=" + "io.confluent.kafka.serializers.KafkaAvroDeserializer"
+"&schemaRegistryURL=localhost:9021"
+ "&allowManualCommit={{kafka.notification.autocommit}})
specificAvroReader & schemaRegistryURL are the properties which seems to be not supported.
I believe the only way currently to get camel-kafka working with the Confluent Schema Registry is to write a custom AvroSerializer/AvroDeserializer (extending io.confluent.kafka.serializers.AbstractKafkaAvroSerializer / io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer). E.g.:
BlablaDeserializer extends AbstractKafkaAvroDeserializer implements Deserializer<Object>
and
BlablaSerializer extends AbstractKafkaAvroSerializer implements Serializer<Object>
and then set them on the Camel Kafka component. E.g. for the value it will be a call on the KafkaConfiguration:
kafkaConfiguration.setValueDeserializer(...)
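For illustration, a minimal sketch of such a value deserializer and of wiring it into the Camel Kafka configuration (the class name is made up; it assumes schema.registry.url and, optionally, specific.avro.reader are present in the consumer properties passed by Camel):

import java.util.Map;

import org.apache.kafka.common.serialization.Deserializer;

import io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer;
import io.confluent.kafka.serializers.KafkaAvroDeserializerConfig;

public class ConfluentAvroValueDeserializer extends AbstractKafkaAvroDeserializer implements Deserializer<Object> {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // Builds the Confluent config (schema.registry.url, specific.avro.reader, ...) from the consumer properties
        configure(new KafkaAvroDeserializerConfig(configs));
    }

    @Override
    public Object deserialize(String topic, byte[] data) {
        // Delegates to AbstractKafkaAvroDeserializer, which resolves the schema id against the registry
        return deserialize(data);
    }

    @Override
    public void close() {
        // nothing to clean up
    }
}

and then, for example:

KafkaConfiguration kafkaConfiguration = new KafkaConfiguration();
kafkaConfiguration.setValueDeserializer(ConfluentAvroValueDeserializer.class.getName());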
Got this working after adding this to Gradle
compile 'org.apache.camel:camel-kafka:3.0.0-M2'
which can be found in this staging repository https://repository.apache.org/content/repositories/orgapachecamel-1124/org/apache/camel/
I think 3.0.0-M2 will be officially released by Camel early next week.
Edit : 3.0.0-M2 available now https://repository.apache.org/content/repositories/releases/org/apache/camel/apache-camel/3.0.0-M2/
It has support for Camel Kafka & the Confluent Schema Registry.
As already answered, this has been solved in the meantime. You do not have to write your own serializer/deserializer.
I made a full example with camel-quarkus and the Confluent Schema Registry:
https://github.com/tstuber/camel-quarkus-kafka-schema-registry
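For reference, with camel-kafka 3.x the schema-registry options from the original question are recognized endpoint parameters, so a route roughly like the following should work (topic, broker and registry addresses are placeholders):

from("kafka:notifications?brokers=localhost:9092"
        + "&valueDeserializer=io.confluent.kafka.serializers.KafkaAvroDeserializer"
        + "&schemaRegistryURL=http://localhost:8081"
        + "&specificAvroReader=true")
    .log("Received avro record: ${body}");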
Problem Definition
I am trying to integrate the data that exists in the Confluent Schema Registry with Apache Atlas. For this purpose I have seen lots of links; they talk about the possibility of this integration, but they do not give any technical information about how it was done.
Question
Would anyone help me import the data (and metadata) from Schema Registry into Apache Atlas in real time? Is there any hook, event listener or something similar I could implement?
Example
Here is what I have from Schema Registry:
{
"subject":"order-value",
"version":1,
"id":101,
"schema":"{\"type\":\"record\",\"name\":\"cart_closed\",\"namespace\":\"com.akbar.avro\",\"fields\":[{\"name\":\"_g\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"_s\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"_u\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"application_version\",\"type\":[\"int\",\"null\"],\"default\":null},{\"name\":\"client_time\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"event_fingerprint\",\"type\":[\"string\",\"null\"],\"default\":null},{\"name\":\"os\",\"type\":[\"string\",\"null\"],\"default\":null},{\"name\":\"php_session_id\",\"type\":[\"string\",\"null\"],\"default\":null},{\"name\":\"platform\",\"type\":[\"string\",\"null\"],\"default\":null},{\"name\":\"server_time\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"site\",\"type\":[\"string\",\"null\"],\"default\":null},{\"name\":\"user_agent\",\"type\":[\"string\",\"null\"],\"default\":null},{\"name\":\"payment_method_id\",\"type\":[\"int\",\"null\"],\"default\":null},{\"name\":\"page_view\",\"type\":[\"boolean\",\"null\"],\"default\":null},{\"name\":\"items\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"record\",\"name\":\"item\",\"fields\":[{\"name\":\"brand_id\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"category_id\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"discount\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"order_item_id\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"price\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"product_id\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"quantity\",\"type\":[\"int\",\"null\"],\"default\":null},{\"name\":\"seller_id\",\"type\":[\"long\",\"null\"],\"default\":null},{\"name\":\"variant_id\",\"type\":[\"long\",\"null\"],\"default\":null}]}}},{\"name\":\"cart_id\",\"type\":[\"long\",\"null\"],\"default\":null}]}"
}
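For context, the record above is what the Schema Registry REST API returns for a subject version; a minimal sketch of fetching it over HTTP (registry host and port are assumptions):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SchemaRegistryFetch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // List all registered subjects
        HttpRequest subjects = HttpRequest.newBuilder(URI.create("http://localhost:8081/subjects")).build();
        System.out.println(client.send(subjects, HttpResponse.BodyHandlers.ofString()).body());

        // Fetch a specific subject/version, e.g. the "order-value" record shown above
        HttpRequest schema = HttpRequest.newBuilder(URI.create("http://localhost:8081/subjects/order-value/versions/1")).build();
        System.out.println(client.send(schema, HttpResponse.BodyHandlers.ofString()).body());
    }
}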
How to import it in Apache Atlas?
What I have done
I checked the Schema Registry documentation, which shows the following architecture: [architecture diagram from the Schema Registry docs, with Kafka as the backing store].
So I decided to set the Kafka URL, but I didn't find anywhere to set the Kafka configuration. I tried to change the atlas.kafka.bootstrap.servers variable in atlas-application.properties. I have also tried to call import-kafka.sh from the hook-bin directory, but it wasn't successful.
Error log
2021-04-25 15:48:34,162 ERROR - [main:] ~ Thread Thread[main,5,main] died (NIOServerCnxnFactory$1:92)
org.apache.atlas.exception.AtlasBaseException: EmbeddedServer.Start: failed!
at org.apache.atlas.web.service.EmbeddedServer.start(EmbeddedServer.java:115)
at org.apache.atlas.Atlas.main(Atlas.java:133)
Caused by: java.lang.NullPointerException
at org.apache.atlas.util.BeanUtil.getBean(BeanUtil.java:36)
at org.apache.atlas.web.service.EmbeddedServer.auditServerStatus(EmbeddedServer.java:128)
at org.apache.atlas.web.service.EmbeddedServer.start(EmbeddedServer.java:111)
... 1 more
Using the steps documented for Structured Streaming with PySpark, I'm unable to create a DataFrame in PySpark from the Azure Event Hub I have set up in order to read the stream data.
Error message is:
java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.eventhubs.EventHubsSourceProvider could not be instantiated
I have installed the following Maven libraries (com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.12 is unavailable), but none appear to work:
com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.15
com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.6
I have also tried ehConf['eventhubs.connectionString'] = sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connectionString), but the error message returned is:
java.lang.NoSuchMethodError: org.apache.spark.internal.Logging.$init$(Lorg/apache/spark/internal/Logging;)V
The connection string is correct as it is also used in a console application that writes to the Azure Event Hub and that works.
Can someone point me in the right direction, please? The code in use is as follows:
from pyspark.sql.functions import *
from pyspark.sql.types import *
# Event Hub Namespace Name
NAMESPACE_NAME = "*myEventHub*"
KEY_NAME = "*MyPolicyName*"
KEY_VALUE = "*MySharedAccessKey*"
# The connection string to your Event Hubs Namespace
connectionString = "Endpoint=sb://{0}.servicebus.windows.net/;SharedAccessKeyName={1};SharedAccessKey={2};EntityPath=ingestion".format(NAMESPACE_NAME, KEY_NAME, KEY_VALUE)
ehConf = {}
ehConf['eventhubs.connectionString'] = connectionString
# For 2.3.15 version and above, the configuration dictionary requires that connection string be encrypted.
# ehConf['eventhubs.connectionString'] = sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connectionString)
df = spark \
  .readStream \
  .format("eventhubs") \
  .options(**ehConf) \
  .load()
To resolve the issue, I did the following:
Uninstall the existing Azure Event Hubs library versions
Install the com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.15 library version from Maven Central
Restart the cluster
Validate by re-running the code provided in the question
I received this same error when installing libraries with the version number com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.* on a Spark cluster running Spark 3.0 with Scala 2.12
For anyone else finding this via google - check if you have the correct Scala library version. In my case, my cluster is Spark v3 with Scala 2.12
Changing the "2.11" in the library version from the tutorial I was using to "2.12", so it matches my cluster runtime version, fixed the issue.
I had to take this a step further: in the format method I had to specify the provider class directly:
.format("org.apache.spark.sql.eventhubs.EventHubsSourceProvider")
Check the cluster Scala version and the library version.
Uninstall the older libraries and install:
com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.17
in the shared workspace (right click and install library) and also on the cluster.
I'm trying to set up replication between 2 clusters but don't want the topic names to be changed. For example, if I have a topic called "some_topic", it is automatically replicated to "cluster1.some_topic". I'm pretty sure this can be changed, but I haven't found the correct config for it.
My current config, "mirrormaker2.properties":
# Sample MirrorMaker 2.0 top-level configuration file
# Run with ./bin/connect-mirror-maker.sh connect-mirror-maker.properties
# specify any number of cluster aliases
clusters = cluster1, cluster2
# connection information for each cluster
cluster1.bootstrap.servers = host1:9092,host2:9092,host3:9092
cluster2.bootstrap.servers = rep_host1:9092,rep_host2:9092,rep_host3:9092
# enable and configure individual replication flows
cluster1->cluster2.enabled = true
cluster1->cluster2.topics = sometopic.*
# customize as needed
# replication.policy.separator = _
# sync.topic.acls.enabled = false
# emit.heartbeats.interval.seconds = 5
for reference:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382:MirrorMaker2.0-RunningastandaloneMirrorMakerconnector
https://kafka.apache.org/24/javadoc/index.html?constant-values.html
To 'disable' topic prefixes and to have topic properties mirrored properly at the same time, I had to provide a customized replication policy which also overrides the topicSource method. Otherwise non-default topic properties (e.g., "cleanup.policy=compact") were not mirrored, even after restarting MirrorMaker.
Here is the complete procedure that worked for me:
Compile and package the following customized replication policy into a .jar file (full source code can be found here):
import java.util.Map;

import org.apache.kafka.connect.mirror.DefaultReplicationPolicy;
import org.apache.kafka.connect.mirror.MirrorConnectorConfig;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PrefixlessReplicationPolicy extends DefaultReplicationPolicy {

    private static final Logger log = LoggerFactory.getLogger(PrefixlessReplicationPolicy.class);

    private String sourceClusterAlias;

    @Override
    public void configure(Map<String, ?> props) {
        super.configure(props);
        sourceClusterAlias = (String) props.get(MirrorConnectorConfig.SOURCE_CLUSTER_ALIAS);
        if (sourceClusterAlias == null) {
            String logMessage = String.format("Property %s not found", MirrorConnectorConfig.SOURCE_CLUSTER_ALIAS);
            log.error(logMessage);
            throw new RuntimeException(logMessage);
        }
    }

    @Override
    public String formatRemoteTopic(String sourceClusterAlias, String topic) {
        return topic;
    }

    @Override
    public String topicSource(String topic) {
        return topic == null ? null : sourceClusterAlias;
    }

    @Override
    public String upstreamTopic(String topic) {
        return null;
    }
}
Copy the .jar into the ${KAFKA_HOME}/libs directory
Configure Mirror Maker 2 to use that replication policy by setting the replication.policy.class property in ${KAFKA_HOME}/config/mm2.properties:
replication.policy.class=ch.mawileo.kafka.mm2.PrefixlessReplicationPolicy
I was able to remove the prefix using this setup:
"replication.policy.separator": ""
"source.cluster.alias": "",
"target.cluster.alias": "",
If the alias setting is necessary in your case, I understand you should use another ReplicationPolicy class. By default it uses the DefaultReplicationPolicy class (https://kafka.apache.org/24/javadoc/org/apache/kafka/connect/mirror/DefaultReplicationPolicy.html).
Starting with Kafka 3.0.0, it is sufficient to set
replication.policy.class=org.apache.kafka.connect.mirror.IdentityReplicationPolicy
Also, the PrefixlessReplicationPolicy in marcin-wieloch's answer https://stackoverflow.com/a/60619233/12008693 no longer works with 3.0.0 (NullPointerException).
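Applied to the configuration from the question, that is just one extra line in mirrormaker2.properties (cluster aliases and topic filter as in the question; requires Kafka 3.0.0 or later, as noted above):

clusters = cluster1, cluster2
cluster1->cluster2.enabled = true
cluster1->cluster2.topics = sometopic.*
replication.policy.class = org.apache.kafka.connect.mirror.IdentityReplicationPolicy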
I think the answer above is not the right approach.
In MirrorMaker 2.0, if you want to keep the topic name unmodified, you have to implement a ReplicationPolicy.
You can refer to DefaultReplicationPolicy.class and override formatRemoteTopic() so that it no longer prepends sourceClusterAlias + the separator. In the end, configure replication.policy.class in mm2.properties.
I defined a MigrationReplicationPolicy.class:
replication.policy.class = org.apache.kafka.connect.mirror.MigrationReplicationPolicy
You should also look at MirrorClientConfig.class; I think you will understand from there.
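A minimal sketch of what such a policy could look like (the body below is illustrative, not the answerer's actual MigrationReplicationPolicy):

import org.apache.kafka.connect.mirror.DefaultReplicationPolicy;

public class MigrationReplicationPolicy extends DefaultReplicationPolicy {

    @Override
    public String formatRemoteTopic(String sourceClusterAlias, String topic) {
        // Keep the original topic name instead of "<sourceClusterAlias><separator><topic>"
        return topic;
    }
}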
Managed to get replication working with the confluentinc Kafka Connect image, release 5.4.2.
Properties are:
connector.class=org.apache.kafka.connect.mirror.MirrorSourceConnector
target.cluster.alias=
replication.factor=3
tasks.max=3
topics=.*
source.cluster.alias=
target.cluster.bootstrap.servers=<broker1>,<broker2>,<broker3>
replication.policy.separator=
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
source.cluster.bootstrap.servers=<broker1>,<broker2>,<broker3>
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
1) Leave the value blank for the 3 parameters: source.cluster.alias, replication.policy.separator, target.cluster.alias.
2) Set this mirror connector up on the TARGET Kafka cluster, NOT on the source (it only performs a pull).
In addition you can use Conductor or the Kafka Connect UI landoop image (landoop/kafka-connect-ui).
This is still in a testing scenario, but it looks promising.
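If it helps, these properties are what you submit to the Kafka Connect REST API as the connector configuration; a rough sketch of doing that programmatically, assuming the standard POST /connectors endpoint (host, port, connector name and broker lists are placeholders):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateMirrorSourceConnector {
    public static void main(String[] args) throws Exception {
        // Connector config mirroring the properties listed above (aliases and separator left blank)
        String body = "{ \"name\": \"mm2-mirror-source\", \"config\": {"
                + "\"connector.class\": \"org.apache.kafka.connect.mirror.MirrorSourceConnector\","
                + "\"source.cluster.alias\": \"\","
                + "\"target.cluster.alias\": \"\","
                + "\"replication.policy.separator\": \"\","
                + "\"topics\": \".*\","
                + "\"replication.factor\": \"3\","
                + "\"tasks.max\": \"3\","
                + "\"source.cluster.bootstrap.servers\": \"broker1:9092,broker2:9092,broker3:9092\","
                + "\"target.cluster.bootstrap.servers\": \"rep_broker1:9092,rep_broker2:9092,rep_broker3:9092\","
                + "\"key.converter\": \"org.apache.kafka.connect.converters.ByteArrayConverter\","
                + "\"value.converter\": \"org.apache.kafka.connect.converters.ByteArrayConverter\""
                + "} }";

        // Submit to the Connect worker running on the TARGET cluster (adjust host/port)
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}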
I'm trying to set up replication between 2 clusters, but need the same topic name in both clusters, without the alias prefix configured in connect-mirror-maker.properties being added.
By default, replicated topics are renamed based on source cluster aliases.
Source --> Target
topic-1 --> source.topic-1
You can avoid topics being renamed by setting the following properties to blank in your connector properties file. By default the replication.policy.separator property is a period; by setting it to blank, along with source.cluster.alias, the target topic will have the same name as the source topic.
replication.policy.separator=
source.cluster.alias=
target.cluster.alias=
I'm attempting to create a custom exception handler for my Spring Cloud Dataflow stream to route some errors to be requeued and others to be DLQ'd.
To do this I'm utilizing the global Spring Integration "errorChannel" and routing based on exception type.
This is the code for the Spring Integration error router:
package com.acme.error.router;

import com.acme.exceptions.DlqException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.integration.annotation.MessageEndpoint;
import org.springframework.integration.annotation.Router;
import org.springframework.integration.transformer.MessageTransformationException;
import org.springframework.messaging.Message;

@MessageEndpoint
@EnableBinding({ ErrorMessageChannels.class })
public class ErrorMessageMappingRouter {

    private static final Logger LOGGER = LoggerFactory.getLogger(ErrorMessageMappingRouter.class);

    public static final String ERROR_CHANNEL = "errorChannel";

    @Router(inputChannel = ERROR_CHANNEL)
    public String onError(Message<Object> message) {
        LOGGER.debug("ERROR ROUTER - onError");

        if (message.getPayload() instanceof MessageTransformationException) {
            MessageTransformationException exception = (MessageTransformationException) message.getPayload();
            Message<?> failedMessage = exception.getFailedMessage();

            if (exceptionChainContainsDlq(exception)) {
                return ErrorMessageChannels.DLQ_QUEUE_NAME;
            }

            return ErrorMessageChannels.REQUEUE_CHANNEL;
        }
        return ErrorMessageChannels.DLQ_QUEUE_NAME;
    }
    ...
}
The error router is picked up by each of the stream apps through a package scan on the Spring Boot App for each:
@ComponentScan(basePackages = { "com.acme.error.router" })
@SpringBootApplication
public class StreamApp {}
When this is deployed and run with the local Spring Cloud Dataflow server (version 1.5.0-RELEASE), and a DlqException is thrown, the message is successfully routed to the onError method in the errorRouter and then placed into the dlq topic.
However, when this is deployed as a docker container with SCDF Kubernetes server (also version 1.5.0-RELEASE), the onError method is never hit. (The log statement at the beginning of the router is never output)
In the startup logs for the stream apps, it looks like the bean is picked up correctly and registers as a listener for the errorChannel, but for some reason, when exceptions are thrown they do not get handled by the onError method in our router.
Startup Logs:
o.s.i.endpoint.EventDrivenConsumer : Adding {router:errorMessageMappingRouter.onError.router} as a subscriber to the 'errorChannel' channel
o.s.i.channel.PublishSubscribeChannel : Channel 'errorChannel' has 1 subscriber(s).
o.s.i.endpoint.EventDrivenConsumer : started errorMessageMappingRouter.onError.router
We are using all default settings for the spring cloud stream and kafka binder configurations:
spring.cloud:
  stream:
    binders:
      kafka:
        type: kafka
        environment.spring.cloud.stream.kafka.binder.brokers=brokerlist
        environment.spring.cloud.stream.kafka.binder.zkNodes=zklist
Edit: Added pod args from kubectl describe <pod>
Args:
--spring.cloud.stream.bindings.input.group=delivery-stream
--spring.cloud.stream.bindings.output.producer.requiredGroups=delivery-stream
--spring.cloud.stream.bindings.output.destination=delivery-stream.enricher
--spring.cloud.stream.binders.xdkafka.environment.spring.cloud.stream.kafka.binder.zkNodes=<zkNodes>
--spring.cloud.stream.binders.xdkafka.type=kafka
--spring.cloud.stream.binders.xdkafka.defaultCandidate=true
--spring.cloud.stream.binders.xdkafka.environment.spring.cloud.stream.kafka.binder.brokers=<brokers>
--spring.cloud.stream.bindings.input.destination=delivery-stream.config-enricher
One other idea we attempted was to use the Spring Cloud Stream / Spring Integration error-channel support to send errors to a broker topic, but since messages don't seem to be landing in the global Spring Integration errorChannel at all, that didn't work either.
Is there anything special we need to do in SCDF Kubernetes to enable the global Spring Integration errorChannel?
What am I missing here?
Update with solution from the comments:
After reviewing your configuration I am now pretty sure I know what the issue is. You have a multi-binder configuration scenario. Even if you only deal with a single binder instance, the existence of spring.cloud.stream.binders.... is what's going to make the framework treat it as multi-binder. Basically this is a bug - github.com/spring-cloud/spring-cloud-stream/issues/1384. As you can see it was fixed, but you need to upgrade to Elmhurst.SR2 or grab the latest snapshot (we're in RC2 and 2.1.0.RELEASE is in a few weeks anyway) – Oleg Zhurakousky
This was indeed the problem with our setup. Instead of upgrading, we just eliminated our multi-binder usage for now and the issue was resolved.
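For illustration, removing the multi-binder block means dropping the spring.cloud.stream.binders.xdkafka.* deployment arguments shown above and configuring the single Kafka binder directly, roughly like this (host placeholders as in the original args):

--spring.cloud.stream.kafka.binder.brokers=<brokers>
--spring.cloud.stream.kafka.binder.zkNodes=<zkNodes>
--spring.cloud.stream.bindings.input.destination=delivery-stream.config-enricher
--spring.cloud.stream.bindings.output.destination=delivery-stream.enricher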
I am trying to write a program that lets me run predefined KSQL operations on Kafka topics in Scala, but I don't want to open the KSQL CLI every time. Therefore I want to start the KSQL "server" from within my Scala program. If I understand the KSQL source code correctly, I have to build and start a KsqlRestApplication:
def restServer = KsqlRestApplication.buildApplication(
  new KsqlRestConfig(defaultServerProperties),
  true,
  new VersionCheckerAgent {
    override def start(ksqlModuleType: KsqlModuleType, properties: Properties): Unit = ???
  })
But when I try doing that, I get the following error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.kafka.streams.StreamsConfig.getConsumerConfigs(Ljava/lang/String;Ljava/lang/String;)Ljava/util/Map;
at io.confluent.ksql.rest.server.BrokerCompatibilityCheck.create(BrokerCompatibilityCheck.java:62)
at io.confluent.ksql.rest.server.KsqlRestApplication.buildApplication(KsqlRestApplication.java:241)
I looked into the function call in BrokerCompatibilityCheck: in the create function it calls StreamsConfig.getConsumerConfigs() with 2 Strings as parameters, instead of the parameters defined in
https://kafka.apache.org/0102/javadoc/org/apache/kafka/streams/StreamsConfig.html#getConsumerConfigs(StreamThread,%20java.lang.String,%20java.lang.String).
Are my KSQL and Kafka versions simply not compatible, or am I doing something wrong?
I am using KSQL version 4.1.0-SNAPSHOT and Kafka version 1.0.0.
Yes, NoSuchMethodError typically indicates a version incompatibility between libraries.
The link you posted is to javadoc for kafka 0.10.2. The method hasn't changed in 1.0 but indeed in the upcoming 1.1 it only takes 2 Strings:
https://kafka.apache.org/11/javadoc/org/apache/kafka/streams/StreamsConfig.html#getConsumerConfigs(java.lang.String,%20java.lang.String).
That suggests the version of KSQL you're using (4.1.0-SNAPSHOT) depends on version 1.1 of Kafka Streams, which is currently in the release-candidate phase and should, I believe, be out soon:
https://lists.apache.org/thread.html/780c4458b16590e99261b69d7b41b9ec374a3226d72c8d38885a008a#%3Cusers.kafka.apache.org%3E
As per that email you can find the latest (1.1.0-rc2) artifacts in the apache staging repo:
https://repository.apache.org/content/groups/staging/
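If you want to pull those staged artifacts from Gradle, a hedged sketch (the staging repository URL is from the answer; the exact artifact coordinates and version are assumptions to verify against the repository):

repositories {
    maven { url 'https://repository.apache.org/content/groups/staging/' }
}
dependencies {
    compile 'org.apache.kafka:kafka-streams:1.1.0'
}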