How can I test a Spring Cloud Stream Kafka Streams application that uses Avro and the Confluent Schema Registry?

I am having trouble figuring out how to test a Spring Cloud Stream Kafka Streams application that uses Avro as message format and a (Confluent) schema registry.
The configuration could be something like this:
spring:
  application:
    name: shipping-service
  cloud:
    stream:
      schema-registry-client:
        endpoint: http://localhost:8081
      kafka:
        streams:
          binder:
            configuration:
              application:
                id: shipping-service
              default:
                key:
                  serde: org.apache.kafka.common.serialization.Serdes$IntegerSerde
              schema:
                registry:
                  url: ${spring.cloud.stream.schema-registry-client.endpoint}
              value:
                subject:
                  name:
                    strategy: io.confluent.kafka.serializers.subject.RecordNameStrategy
          bindings:
            input:
              consumer:
                valueSerde: io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde
            order:
              consumer:
                valueSerde: io.confluent.kafka.streams.serdes.avro.GenericAvroSerde
            output:
              producer:
                valueSerde: io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde
      bindings:
        input:
          destination: customer
        order:
          destination: order
        output:
          destination: order
server:
  port: 8086
logging:
  level:
    org.springframework.kafka.config: debug
NOTES:
It is using native serialization/deserialization.
Test framework: JUnit 5
I guess that for the Kafka broker I should use an EmbeddedKafkaBroker bean, but as you can see, the setup also relies on a schema registry that should be mocked in some way. How?

Sorting this out has been a real pain, but finally I managed to make it work using fluent-kafka-streams-tests:
Extra dependencies:
testImplementation("org.springframework.kafka:spring-kafka-test")
testImplementation("com.bakdata.fluent-kafka-streams-tests:schema-registry-mock-junit5:2.0.0")
The key is to set up the necessary configs as system properties. For that I created a separate test configuration class:
@Configuration
class KafkaTestConfiguration(private val embeddedKafkaBroker: EmbeddedKafkaBroker) {

    private val schemaRegistryMock = SchemaRegistryMock()

    @PostConstruct
    fun init() {
        System.setProperty("spring.kafka.bootstrap-servers", embeddedKafkaBroker.brokersAsString)
        System.setProperty("spring.cloud.stream.kafka.streams.binder.brokers", embeddedKafkaBroker.brokersAsString)

        schemaRegistryMock.start()
        System.setProperty("spring.cloud.stream.schema-registry-client.endpoint", schemaRegistryMock.url)
        System.setProperty("spring.cloud.stream.kafka.streams.binder.configuration.schema.registry.url", schemaRegistryMock.url)
    }

    @Bean
    fun schemaRegistryMock(): SchemaRegistryMock {
        return schemaRegistryMock
    }

    @PreDestroy
    fun preDestroy() {
        schemaRegistryMock.stop()
    }
}
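Note that this configuration class has to end up in the test application context. If it is not picked up by component scanning, one option (just a sketch; MyApplication is an assumed name for the boot application class) is to list it explicitly on the test:

@SpringBootTest(classes = [MyApplication::class, KafkaTestConfiguration::class])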
Finally, the test class, where you can now produce and consume Avro messages, have your KStream process them, and take advantage of the mocked schema registry:
@EmbeddedKafka
@SpringBootTest(properties = [
    "spring.profiles.active=local",
    "schema-registry.user=",
    "schema-registry.password=",
    "spring.cloud.stream.bindings.event.destination=event",
    "spring.cloud.stream.bindings.event.producer.useNativeEncoding=true",
    "spring.cloud.stream.kafka.streams.binder.configuration.application.server=localhost:8080",
    "spring.cloud.stream.kafka.streams.bindings.event.consumer.keySerde=io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde",
    "spring.cloud.stream.kafka.streams.bindings.event.consumer.valueSerde=io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde"])
class MyApplicationTests {

    @Autowired
    private lateinit var embeddedKafka: EmbeddedKafkaBroker

    @Autowired
    private lateinit var schemaRegistryMock: SchemaRegistryMock

    @Test
    fun `should process events`() {
        val senderProps = KafkaTestUtils.producerProps(embeddedKafka)
        senderProps[ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG] = "io.confluent.kafka.serializers.KafkaAvroSerializer"
        senderProps[ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG] = "io.confluent.kafka.serializers.KafkaAvroSerializer"
        senderProps["schema.registry.url"] = schemaRegistryMock.url
        val pf = DefaultKafkaProducerFactory<Int, String>(senderProps)
        try {
            val template = KafkaTemplate(pf, true)
            template.defaultTopic = "event"
            ...
    }
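The elided part of the test then just sends records through the template and checks what the topology writes. A possible way to read the processed output with an Avro-aware consumer (only a sketch: "output-topic" and the OrderProcessed type are hypothetical names, the rest mirrors the producer setup above):

        // Sketch: consume what the KStream wrote, using the mocked schema registry.
        // "output-topic" and OrderProcessed are hypothetical names.
        val consumerProps = KafkaTestUtils.consumerProps("test-group", "false", embeddedKafka)
        consumerProps[ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG] = "io.confluent.kafka.serializers.KafkaAvroDeserializer"
        consumerProps[ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG] = "io.confluent.kafka.serializers.KafkaAvroDeserializer"
        consumerProps["schema.registry.url"] = schemaRegistryMock.url
        consumerProps["specific.avro.reader"] = "true"
        val consumer = DefaultKafkaConsumerFactory<Int, OrderProcessed>(consumerProps).createConsumer()
        embeddedKafka.consumeFromAnEmbeddedTopic(consumer, "output-topic")
        val record = KafkaTestUtils.getSingleRecord(consumer, "output-topic")
        // assert on record.value() here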

Related

Spring cloud stream: Attaching function to binder by type

I have a Spring Cloud Stream application implemented with the functional approach. The app consumes events from multiple Kafka topics, normalizes the input into an output schema (always the same schema) and publishes to Kafka. I am not using Kafka Streams since no join/enrichment/state is required.
I want to allow flexible deployment by controlling the input topics to consume from at runtime: you can either consume from all topics or from a single topic. My way to do it was to declare a dedicated function for each type, and a dedicated binding for each function.
The problem is that the binder (there is a single one) routes all incoming messages to all bindings, and I get a ClassCastException when the wrong function is called to handle some event type.
I thought of the following solutions, yet I want to know if there is a better way:
Having a single binder per binding. I'd rather not, especially since I'm using a well-configured binder and I don't want to simply duplicate it.
Having a single binder and a single function of type Message<?> that internally checks the object type, casts it and handles it by type.
My application.yaml looks like this:
spring:
  cloud:
    function:
      definition: data;more
    stream:
      default-binder: kafka-string-avro
      bindings:
        data-in-0:
          binder: kafka-string-avro
          destination: data.emails.events
          group: communication_system_events_data_gp
        data-out-0:
          binder: kafka-string-avro
          destination: communication.system.emails.events
          producer:
            useNativeEncoding: true
        more-in-0:
          binder: kafka-string-avro
          destination: communication.emails.send.status
          group: communication_system_events_more_gp
        more-out-0:
          binder: kafka-string-avro
          destination: communication.system.emails.events
          producer:
            useNativeEncoding: true
My functions:
#Bean("data")
public Function<Message<Data>, Message<Output>> dataFunction() {
return new DataFunction();
}
#Bean("more")
public Function<Message<More>, Message<Output>> moreFunction() {
return new MoreFunction();
}
I'm not sure where the issue is, but I can see some configuration problems in what you provided. It might just be a typo introduced when copying into the question, but the following config should isolate the two topics to their corresponding functions.
spring:
  cloud:
    function:
      definition: dataFunction;moreFunction
    stream:
      default-binder: kafka-string-avro
      bindings:
        dataFunction-in-0:
          binder: kafka-string-avro
          destination: data.emails.events
          group: communication_system_events_data_gp
        dataFunction-out-0:
          binder: kafka-string-avro
          destination: communication.system.emails.events
          producer:
            useNativeEncoding: true
        moreFunction-in-0:
          binder: kafka-string-avro
          destination: communication.emails.send.status
          group: communication_system_events_more_gp
        moreFunction-out-0:
          binder: kafka-string-avro
          destination: communication.system.emails.events
          producer:
            useNativeEncoding: true

@Bean("data")
public Function<Message<Data>, Message<Output>> dataFunction() {
    return new DataFunction();
}

@Bean("more")
public Function<Message<More>, Message<Output>> moreFunction() {
    return new MoreFunction();
}
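The key detail is that with the functional programming model the binding names are derived from the function names, <functionName>-in-<index> for inputs and <functionName>-out-<index> for outputs, where functionName is the name used in spring.cloud.function.definition; that is why the binding keys change from data-in-0 to dataFunction-in-0 here. Note that the function beans then need to be resolvable under those names as well, e.g. plain @Bean methods called dataFunction and moreFunction rather than beans explicitly named "data" and "more".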

Customize input kafka topic name for Spring Cloud Stream

Since @EnableBinding and @StreamListener(Sink.INPUT) were deprecated in favor of functions, I need to create a consumer that reads messages from a Kafka topic.
My consumer function:
@Bean
public Consumer<Person> log() {
    return person -> {
        System.out.println("Received: " + person);
    };
}
and the application.yml config:
spring:
  cloud:
    stream:
      kafka:
        binder:
          brokers: localhost:9092
      bindings:
        consumer:
          destination: messages
          contentType: application/json
Instead of connecting to the messages topic, it keeps connecting to the log-in-0 topic.
How can I fix this?
spring.cloud.stream.bindings.log-in-0.destination=messages
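With the functional programming model the binding name is derived from the function bean name (log here), following the <functionName>-in-<index> convention, so the destination has to be configured on the log-in-0 binding rather than on one named consumer.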

mongoDB inserting twice when called on different Threads

Basically I am consuming messages from Spring Cloud Stream Kafka and inserting them into MongoDB.
My code works fine if my Mongo cluster is up.
I have two problems in case my Mongo instance is down:
Auto commit of the cloud stream is disabled (autoCommitOffset is set to false), but re-polling still does not happen even though the message has not been acknowledged yet.
Checking the Mongo connection takes some time, and if two messages with the same ID arrive during that period and I then start the Mongo instance, the messages get duplicated; in the normal case this works fine.
Is there a solution for these?
Here is my code:
interface ResourceInventorySink {
    companion object {
        const val INPUT = "resourceInventoryInput"
    }

    @Input(INPUT)
    fun input(): SubscribableChannel
}
@EnableBinding(ResourceInventorySink::class)
class InventoryEventListeners {

    val logger = LoggerFactory.getLogger(javaClass)

    @Autowired
    lateinit var resourceInventoryService: ResourceInventoryService

    @StreamListener(ResourceInventorySink.INPUT, condition = OperationConstants.INSERT)
    fun receiveInsert(event: Message<ResourceInventoryEvent>) {
        logger.info("received Insert message {}", event.payload.toString())
        val success = resourceInventoryService.insert(event.payload)
        success.subscribe({
            logger.info("Data Inserted", event.payload.toString())
            event.headers.get(KafkaHeaders.ACKNOWLEDGMENT, Acknowledgment::class.java)?.acknowledge()
        }, {
            if (it !is DataAccessResourceFailureException) {
                logger.error("Exception Occurred {} {}", it.message, it.cause.toString())
                event.headers.get(KafkaHeaders.ACKNOWLEDGMENT, Acknowledgment::class.java)?.acknowledge()
            } else {
                logger.error("Error Inserting in Mongo DB {}", it.cause)
            }
        })
    }
}
Here is my service class
@Service
class ResourceInventoryService() {

    val logger = LoggerFactory.getLogger(javaClass)

    @Autowired
    lateinit var resourceInventoryRepository: ResourceInventoryRepository

    fun insert(newResource: ResourceInventoryEvent) = resourceInventoryRepository
        .findByProductId(newResource.productId)
        .switchIfEmpty(newResource.convertTODocument().toMono())
        .flatMap { resourceInventoryRepository.save(it) }
        .onErrorResume { Mono.error(it) }
}
This is my application.yml:
spring:
  cloud:
    stream:
      default:
        consumer:
          useNativeEncoding: true
      kafka:
        binder:
          brokers:
            - localhost:9092
          consumer-properties:
            key.deserializer: org.apache.kafka.common.serialization.StringDeserializer
            value.deserializer: io.confluent.kafka.serializers.KafkaAvroDeserializer
            schema.registry.url: http://localhost:8081
            enable.auto.commit: false
            specific.avro.reader: true
        bindings:
          resourceInventoryInput:
            consumer:
              autoOffsetCommit: false
      default-binder: kafka
      bindings:
        resourceInventoryInput:
          binder: kafka
          destination: ${application.messaging.topic}
          content-type: application/*+avro
          group: ${application.messaging.group}
EDIT 1. Acknowledgment is null

How can we configure value.subject.name.strategy for schemas in Spring Cloud Stream Kafka producers, consumers and KStreams?

I would like to customize the naming strategy of the Avro schema subjects in Spring Cloud Stream Producers, Consumers and KStreams.
This would be done in Kafka with the properties key.subject.name.strategy and value.subject.name.strategy -> https://docs.confluent.io/current/schema-registry/serializer-formatter.html#subject-name-strategy
In a native Kafka Producer this works:
private val producer: KafkaProducer<Int, Customer>

init {
    val props = Properties()
    ...
    props[AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG] = "http://localhost:8081"
    props[AbstractKafkaAvroSerDeConfig.VALUE_SUBJECT_NAME_STRATEGY] = TopicRecordNameStrategy::class.java.name
    producer = KafkaProducer(props)
}

fun sendCustomerEvent(customer: Customer) {
    val record: ProducerRecord<Int, Customer> = ProducerRecord("customer", customer.id, customer)
    producer.send(record)
}
However I cannot find how to do this in Spring Cloud Stream. So far I have tried this in a producer:
spring:
  application:
    name: spring-boot-customer-service
  cloud:
    stream:
      kafka:
        bindings:
          output:
            producer:
              configuration:
                key:
                  serializer: org.apache.kafka.common.serialization.IntegerSerializer
                value:
                  subject:
                    name:
                      strategy: io.confluent.kafka.serializers.subject.TopicRecordNameStrategy
Apparently Spring Cloud uses its own subject naming strategy, defined by the interface org.springframework.cloud.stream.schema.avro.SubjectNamingStrategy, with only one implementation: DefaultSubjectNamingStrategy.
Is there a declarative way of configuring value.subject.name.strategy, or are we expected to provide our own org.springframework.cloud.stream.schema.avro.SubjectNamingStrategy implementation and set the property spring.cloud.stream.schema.avro.subject-naming-strategy?
As pointed out in the other answer, there is a dedicated property, spring.cloud.stream.schema.avro.subjectNamingStrategy, that allows setting up a different naming strategy for Kafka producers.
I contributed org.springframework.cloud.stream.schema.avro.QualifiedSubjectNamingStrategy, which provides that functionality out of the box.
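For a regular (non-Streams) producer that would look something like the following property (assuming the standard Avro message converter is in use):
spring.cloud.stream.schema.avro.subjectNamingStrategy=org.springframework.cloud.stream.schema.avro.QualifiedSubjectNamingStrategy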
In the case of Kafka Streams and native serialization/deserialization (the default behaviour from Spring Cloud Stream 3.0.0 onwards) you have to use Confluent's implementation (io.confluent.kafka.serializers.subject.RecordNameStrategy) and the native properties:
spring:
  application:
    name: shipping-service
  cloud:
    stream:
      ...
      kafka:
        streams:
          binder:
            configuration:
              application:
                id: shipping-service
              ...
              value:
                subject:
                  name:
                    strategy: io.confluent.kafka.serializers.subject.RecordNameStrategy
You can declare it in your properties as
spring.cloud.stream.schema.avro.subjectNamingStrategy=MyStrategy
where MyStrategy is an implementation of the interface, for instance:
object MyStrategy : SubjectNamingStrategy {
    override fun toSubject(schema: Schema): String = schema.fullName
}

Kafka - Redirect messages from "Topic A" to "Topic B" based on header value

I would like to redirect kafka messages from a topic called "all-topic" to a topic named "headervalue-topic" where headervalue is the value of a custom header each message has.
At the moment I'm using a custom console application that consumes the messages and redirects them to the correct topic, but it only processes 16 messages per second.
Both Kafka and ZooKeeper are running in Docker containers, configured as follows:
zookeeper:
  image: "wurstmeister/zookeeper:latest"
  restart: always
  ports:
    - "2181:2181"
  environment:
    ZOOKEEPER_CLIENT_PORT: 2181
    ZOOKEEPER_SERVER_ID: 1

kafka:
  hostname: kafka
  image: "wurstmeister/kafka:latest"
  restart: always
  depends_on:
    - zookeeper
  ports:
    - "9092:9092"
  environment:
    KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
    KAFKA_ADVERTISED_HOST_NAME: kafka
    KAFKA_ADVERTISED_PORT: 9092
What is the best and fastest way to achieve my goal?
I do know about the existence of Kafka Streams, but I'm not familiar with Java, so if you suggest Kafka Streams, a little example would be appreciated :)
Many thanks!
Here is the solution I came up with, using the kafka-streams Node.js library:
const {KafkaStreams} = require("kafka-streams");
const {nativeConfig: config} = require("./config.js");

const kafkaStreams = new KafkaStreams(config);
const myConsumerStream = kafkaStreams.getKStream("all-topic");

myConsumerStream
    .mapJSONConvenience()
    .filter((element) => {
        return element.value.type == "Article";
    })
    .tap((element) => {console.log("Got Article")})
    .mapWrapKafkaValue()
    .to("Article-topic", 1, "buffer");

myConsumerStream.start();
As far as I know, you can't access the headers directly through the DSL.
You can access them through the ProcessorContext with a stream processor though, and here is a little example I came up with:
public class CustomProcessor1 implements Processor<String, String> {

    private ProcessorContext context;

    @Override
    public void init(ProcessorContext processorContext) {
        this.context = processorContext;
    }

    @Override
    public void process(String key, String value) {
        HashMap<String, String> headers = new HashMap<>();
        for (Header header : context.headers()) {
            headers.put(header.key(), new String(header.value()));
        }
        String headerValue = headers.get("certainHeader").replace("\"", "");
        if (headerValue.equals("expectedHeaderValue")) {
            context.forward(key, value);
        }
    }

    @Override
    public void close() {
        // nothing to clean up
    }
}
Above is the processor, which forwards messages whose certainHeader matches the expected value to the downstream node. The processor is then used when building the streaming topology, like below:
public static void main(String[] args) throws Exception {
    Properties props = getProperties();
    final Topology topology = new Topology()
        .addSource("SOURCE", "all.topic")
        .addProcessor("CUSTOM_PROCESSOR_1", CustomProcessor1::new, "SOURCE")
        .addProcessor("CUSTOM_PROCESSOR_2", CustomProcessor2::new, "SOURCE")
        .addSink("SINK1", "headervalue1-topic", "CUSTOM_PROCESSOR_1")
        .addSink("SINK2", "headervalue2-topic", "CUSTOM_PROCESSOR_2");

    // Build and start the streams instance so the topology actually runs.
    final KafkaStreams streams = new KafkaStreams(topology, props);
    streams.start();
}