Can I set Kafka Stream consumer group.id? - apache-kafka

I'm using the Kafka Streams library for a streaming application.
I wanted to set the Kafka consumer group.id, so I put the Kafka Streams configuration below.
Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "JoinTestApp");
streamsConfiguration.put(StreamsConfig.CLIENT_ID_CONFIG, "JonTestClientId1");
streamsConfiguration.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10 * 1000);
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Bytes().getClass().getName());
streamsConfiguration.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
streamsConfiguration.put(StreamsConfig.consumerPrefix("group.id"), "groupId1");
// streamsConfiguration.put(ConsumerConfig.GROUP_ID_CONFIG, "groupId1");
But the console log shows that the consumer configuration did not pick up my group.id, only the stream application id:
2019-03-19 17:17:03,206 [main] INFO org.apache.kafka.clients.consumer.ConsumerConfig - ConsumerConfig values:
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [....]
check.crcs = true
client.dns.lookup = default
client.id = JonTestClientId111-StreamThread-1-consumer
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = JoinTestApp
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = false
Can I set the Kafka Streams consumer group.id?

No, you can't. For this purpose Kafka Streams uses application.id.
Kafka Streams uses application.id in various places to isolate the resources used by one application from others.
In particular, application.id is used as the Kafka consumer group.id for coordination, which is why you cannot set group.id explicitly (see the sketch after the list below).
According to the Kafka Streams official documentation, application.id is also used in the following places:
- As the default Kafka consumer and producer client.id prefix
- As the name of the subdirectory in the state directory (cf. state.dir)
- As the prefix of internal Kafka topic names
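For illustration, here is a minimal sketch reusing the names from the question; changing application.id is the only supported way to change the consumer group name, and a group.id passed through the consumer config is overridden, as the log in the question shows:
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

// Minimal sketch: application.id is what shows up as group.id in the consumer log.
Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "JoinTestApp");      // -> group.id = JoinTestApp
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Setting group.id directly is overridden by Kafka Streams:
// streamsConfiguration.put(StreamsConfig.consumerPrefix("group.id"), "ignoredGroupId");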

Related

Customize input kafka topic name for Spring Cloud Stream

Since @EnableBinding and @StreamListener(Sink.INPUT) were deprecated in favor of the functional programming model, I need to create a consumer that reads messages from a Kafka topic.
My consumer function:
@Bean
public Consumer<Person> log() {
    return person -> {
        System.out.println("Received: " + person);
    };
}
and my application.yml config:
spring:
  cloud:
    stream:
      kafka:
        binder:
          brokers: localhost:9092
      bindings:
        consumer:
          destination: messages
          contentType: application/json
Instead of connecting to the messages topic, it keeps connecting to a log-in-0 topic.
How can I fix this?
spring.cloud.stream.bindings.log-in-0.destination=messages
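If you prefer to keep it in application.yml, the same binding looks like this (a direct YAML translation of the property above):
spring:
  cloud:
    stream:
      bindings:
        log-in-0:
          destination: messages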

CDC with WSO2 Streaming Integrator and Postgres DB

I am trying to setup Change Data Capture (CDC) between WSO2 Streaming Integrator and a local Postgres DB.
I have added the Postgres Driver (v42.2.5) to SI_HOME/lib and I am able to read data from the database from a Siddhi application.
I am following the CDCWithListeningMode example to implement CDC and I am using pgoutput as the logical decoding plugin. But when I run the application I get the following log.
[2020-04-23_19-02-37_460] INFO {org.apache.kafka.connect.json.JsonConverterConfig} - JsonConverterConfig values:
converter.type = key
schemas.cache.size = 1000
schemas.enable = true
[2020-04-23_19-02-37_461] INFO {org.apache.kafka.connect.json.JsonConverterConfig} - JsonConverterConfig values:
converter.type = value
schemas.cache.size = 1000
schemas.enable = false
[2020-04-23_19-02-37_461] INFO {io.debezium.embedded.EmbeddedEngine$EmbeddedConfig} - EmbeddedConfig values:
access.control.allow.methods =
access.control.allow.origin =
bootstrap.servers = [localhost:9092]
header.converter = class org.apache.kafka.connect.storage.SimpleHeaderConverter
internal.key.converter = class org.apache.kafka.connect.json.JsonConverter
internal.value.converter = class org.apache.kafka.connect.json.JsonConverter
key.converter = class org.apache.kafka.connect.json.JsonConverter
listeners = null
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
offset.flush.interval.ms = 60000
offset.flush.timeout.ms = 5000
offset.storage.file.filename =
offset.storage.partitions = null
offset.storage.replication.factor = null
offset.storage.topic =
plugin.path = null
rest.advertised.host.name = null
rest.advertised.listener = null
rest.advertised.port = null
rest.host.name = null
rest.port = 8083
ssl.client.auth = none
task.shutdown.graceful.timeout.ms = 5000
value.converter = class org.apache.kafka.connect.json.JsonConverter
[2020-04-23_19-02-37_516] INFO {io.debezium.connector.common.BaseSourceTask} - offset.storage = io.siddhi.extension.io.cdc.source.listening.InMemoryOffsetBackingStore
[2020-04-23_19-02-37_517] INFO {io.debezium.connector.common.BaseSourceTask} - database.server.name = localhost_5432
[2020-04-23_19-02-37_517] INFO {io.debezium.connector.common.BaseSourceTask} - database.port = 5432
[2020-04-23_19-02-37_517] INFO {io.debezium.connector.common.BaseSourceTask} - table.whitelist = SweetProductionTable
[2020-04-23_19-02-37_517] INFO {io.debezium.connector.common.BaseSourceTask} - cdc.source.object = 1716717434
[2020-04-23_19-02-37_517] INFO {io.debezium.connector.common.BaseSourceTask} - database.hostname = localhost
[2020-04-23_19-02-37_518] INFO {io.debezium.connector.common.BaseSourceTask} - database.password = ********
[2020-04-23_19-02-37_518] INFO {io.debezium.connector.common.BaseSourceTask} - name = CDCWithListeningModeinsertSweetProductionStream
[2020-04-23_19-02-37_518] INFO {io.debezium.connector.common.BaseSourceTask} - server.id = 6140
[2020-04-23_19-02-37_519] INFO {io.debezium.connector.common.BaseSourceTask} - database.history = io.debezium.relational.history.FileDatabaseHistory
[2020-04-23_19-02-38_103] INFO {io.debezium.connector.postgresql.PostgresConnectorTask} - user 'user_name' connected to database 'db_name' on PostgreSQL 11.5, compiled by Visual C++ build 1914, 64-bit with roles:
role 'user_name' [superuser: false, replication: true, inherit: true, create role: false, create db: false, can log in: true] (Encoded)
[2020-04-23_19-02-38_104] INFO {io.debezium.connector.postgresql.PostgresConnectorTask} - No previous offset found
[2020-04-23_19-02-38_104] INFO {io.debezium.connector.postgresql.PostgresConnectorTask} - Taking a new snapshot of the DB and streaming logical changes once the snapshot is finished...
[2020-04-23_19-02-38_105] INFO {io.debezium.util.Threads} - Requested thread factory for connector PostgresConnector, id = localhost_5432 named = records-snapshot-producer
[2020-04-23_19-02-38_105] INFO {io.debezium.util.Threads} - Requested thread factory for connector PostgresConnector, id = localhost_5432 named = records-stream-producer
[2020-04-23_19-02-38_293] INFO {io.debezium.connector.postgresql.connection.PostgresConnection} - Obtained valid replication slot ReplicationSlot [active=false, latestFlushedLSN=null]
[2020-04-23_19-02-38_704] ERROR {io.siddhi.core.stream.input.source.Source} - Error on 'CDCWithListeningMode'. Connection to the database lost. Error while connecting at Source 'cdc' at 'insertSweetProductionStream'. Will retry in '5 sec'. (Encoded)
io.siddhi.core.exception.ConnectionUnavailableException: Connection to the database lost.
at io.siddhi.extension.io.cdc.source.CDCSource.lambda$connect$1(CDCSource.java:424)
at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:793)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.connect.errors.ConnectException: Cannot create replication connection
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.<init>(PostgresReplicationConnection.java:87)
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.<init>(PostgresReplicationConnection.java:38)
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection$ReplicationConnectionBuilder.build(PostgresReplicationConnection.java:362)
at io.debezium.connector.postgresql.PostgresTaskContext.createReplicationConnection(PostgresTaskContext.java:65)
at io.debezium.connector.postgresql.RecordsStreamProducer.<init>(RecordsStreamProducer.java:81)
at io.debezium.connector.postgresql.RecordsSnapshotProducer.<init>(RecordsSnapshotProducer.java:70)
at io.debezium.connector.postgresql.PostgresConnectorTask.createSnapshotProducer(PostgresConnectorTask.java:133)
at io.debezium.connector.postgresql.PostgresConnectorTask.start(PostgresConnectorTask.java:86)
at io.debezium.connector.common.BaseSourceTask.start(BaseSourceTask.java:45)
at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:677)
... 3 more
Caused by: io.debezium.jdbc.JdbcConnectionException: ERROR: could not access file "decoderbufs": No such file or directory
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.initReplicationSlot(PostgresReplicationConnection.java:145)
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.<init>(PostgresReplicationConnection.java:79)
... 12 more
Caused by: org.postgresql.util.PSQLException: ERROR: could not access file "decoderbufs": No such file or directory
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:307)
at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:293)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:270)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:266)
at org.postgresql.replication.fluent.logical.LogicalCreateSlotBuilder.make(LogicalCreateSlotBuilder.java:48)
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.initReplicationSlot(PostgresReplicationConnection.java:108)
... 13 more
Debezium defaults to the decoderbufs plugin, hence "could not access file "decoderbufs": No such file or directory".
According to this answer, the issue is due to the configuration of the decoderbufs plugin.
Details
- Postgres - 11.4
- siddhi-cdc-io - 2.0.3
- Debezium - 0.8.3
How do I configure the embedded Debezium engine to use the pgoutput plugin? Will changing this configuration fix the error?
Please help me with this issue. I have not found any resources that can help me.
You either need to update Debezium to the latest 1.1 version, which lets you select the pgoutput plugin via the plugin.name config option, or you need to deploy (and maybe build) the decoderbufs.so library on your PostgreSQL server.
I'd recommend the former, as 0.8.3 is a very old version.
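For reference, on Debezium 1.1+ the PostgreSQL connector is pointed at pgoutput via plugin.name; below is a rough sketch of the raw connector properties (how the Siddhi cdc source exposes this option may differ, so treat it as an assumption):
import java.util.Properties;

// Sketch of raw Debezium (1.1+) PostgreSQL connector properties; values mirror the log above.
Properties connectorProps = new Properties();
connectorProps.setProperty("connector.class", "io.debezium.connector.postgresql.PostgresConnector");
connectorProps.setProperty("plugin.name", "pgoutput");          // instead of the decoderbufs default
connectorProps.setProperty("database.hostname", "localhost");
connectorProps.setProperty("database.port", "5432");
connectorProps.setProperty("table.whitelist", "SweetProductionTable");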
I observed this behavior with PostgreSQL 12 when I tried to do CDC with the pgoutput logical decoding output plug-in. Even though I configured the database with pgoutput, the Siddhi extension tries to make the connection using decoderbufs as the decoding plug-in.
When I configured decoderbufs as the logical decoding output plug-in at the database level, I was able to use the Siddhi io extension without any issue.
It seems that, for now, Siddhi io CDC only supports the decoderbufs logical decoding output plug-in with PostgreSQL.

How to provide Kafka Streams properties via Spring Cloud Stream in YAML?

I'd like to move spring.kafka.streams.* under spring.cloud.stream - is this possible? I thought of streams-properties, similar to consumer-properties or producer-properties, but it doesn't work.
spring:
  cloud:
    config:
      override-system-properties: false
      server:
        health:
          enabled: false
    stream:
      bindings:
        input_technischerplatz:
          destination: technischerplatz
        output_technischerplatz:
          destination: technischerplatz
      default:
        group: '${spring.application.name}'
        consumer:
          max-attempts: 5
      kafka:
        binder:
          auto-add-partitions: false
          auto-create-topics: false
          brokers: '${values.spring.kafka.bootstrap-servers}'
          configuration:
            header.mode: headers
          consumer-properties:
            allow.auto.create.topics: false
            auto.offset.reset: '${values.spring.kafka.consumer.auto-offset-reset}'
            enable.auto.commit: false
            isolation.level: read_committed
            max.poll.interval.ms: 300000
            max.poll.records: 100
            session.timeout.ms: 300000
          header-mapper-bean-name: defaultKafkaHeaderMapper
          producer-properties:
            acks: all
            key.serializer: org.apache.kafka.common.serialization.StringSerializer
            max.in.flight.requests.per.connection: 1
            max.block.ms: '${values.spring.kafka.producer.max-block-ms}'
            retries: 10
          required-acks: -1
  kafka:
    streams:
      applicationId: '${spring.application.name}_streams'
      properties:
        default.key.serde: org.apache.kafka.common.serialization.Serdes$StringSerde
        default.timestamp.extractor: org.apache.kafka.streams.processor.LogAndSkipOnInvalidTimestamp
        state.dir: '${values.spring.kafka.streams.properties.state.dir}'
You can bind the streams properties with spring.cloud.stream in the following manner:
spring.cloud.stream.kafka.streams.binder.applicationId: my-application-id
spring.cloud.stream.kafka.streams.binder.configuration:
  default.key.serde: org.apache.kafka.common.serialization.Serdes$StringSerde
  default.value.serde: org.apache.kafka.common.serialization.Serdes$StringSerde
For more details, refer to the documentation:
https://cloud.spring.io/spring-cloud-static/spring-cloud-stream-binder-kafka/3.0.0.M3/reference/html/spring-cloud-stream-binder-kafka.html#_kafka_streams_binder
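Applied to the YAML from the question, the Streams settings would move under the Kafka Streams binder roughly like this (a sketch; the values are copied from the question):
spring:
  cloud:
    stream:
      kafka:
        streams:
          binder:
            applicationId: '${spring.application.name}_streams'
            configuration:
              default.key.serde: org.apache.kafka.common.serialization.Serdes$StringSerde
              default.timestamp.extractor: org.apache.kafka.streams.processor.LogAndSkipOnInvalidTimestamp
              state.dir: '${values.spring.kafka.streams.properties.state.dir}'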

Alpakka Consumer not consuming messages from Kafka running via Docker compose

I've got Kafka and ZooKeeper running via Docker Compose. I'm able to send/consume messages to a topic using the Kafka console tools, and I can monitor everything via Conduktor. But unfortunately I'm not able to consume messages from my Scala app using the Alpakka connector. The app connects to the topic, but whenever I send a message to the topic nothing happens.
Only Kafka and ZooKeeper run via docker-compose; the Scala consumer app runs directly on the host machine.
Docker Compose
version: '3'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      HOSTNAME_COMMAND: "route -n | awk '/UG[ \t]/{print $$2}'"
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    depends_on:
      - zookeeper
Scala App
import akka.actor.ActorSystem
import akka.kafka.scaladsl.Consumer
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.stream.scaladsl.Sink
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer

import scala.concurrent.duration._
import scala.util.{Failure, Success}

object Main extends App {
  implicit val actorSystem = ActorSystem()
  import actorSystem.dispatcher

  val kafkaConsumerSettings = ConsumerSettings(actorSystem, new StringDeserializer, new StringDeserializer)
    .withGroupId("new_id")
    .withCommitRefreshInterval(1.seconds)
    .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
    .withBootstrapServers("localhost:9092")

  Consumer
    .plainSource(kafkaConsumerSettings, Subscriptions.topics("test1"))
    .map(msg => msg.value())
    .runWith(Sink.foreach(println))
    .onComplete {
      case Failure(exception) => exception.printStackTrace()
      case Success(value)     => println("done")
    }
}
App - Console output
16:58:33.877 INFO [akka.event.slf4j.Slf4jLogger] Slf4jLogger started
16:58:34.470 INFO [akka.kafka.internal.SingleSourceLogic] [1955f] Starting. StageActor Actor[akka://default/system/Materializers/StreamSupervisor-0/$$a#-591284224]
16:58:34.516 INFO [org.apache.kafka.clients.consumer.ConsumerConfig] ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [localhost:9092]
check.crcs = true
client.dns.lookup = default
client.id =
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = novo_id
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
16:58:34.701 INFO [org.apache.kafka.common.utils.AppInfoParser] Kafka version: 2.4.0
16:58:34.702 INFO [org.apache.kafka.common.utils.AppInfoParser] Kafka commitId: 77a89fcf8d7fa018
16:58:34.702 INFO [org.apache.kafka.common.utils.AppInfoParser] Kafka startTimeMs: 1585256314699
16:58:34.715 INFO [org.apache.kafka.clients.consumer.KafkaConsumer] [Consumer clientId=consumer-novo_id-1, groupId=novo_id] Subscribed to topic(s): test1
16:58:35.308 INFO [org.apache.kafka.clients.Metadata] [Consumer clientId=consumer-novo_id-1, groupId=novo_id] Cluster ID: c2XBuDIJTI-gBs9guTvG
Export KAFKA_ADVERTISED_LISTENERS.
It describes the host name that is advertised and can be reached by clients. The value is published to ZooKeeper for clients to use.
If using the SSL or SASL protocol, the endpoint value must specify the
protocols in the following formats:
SSL: SSL:// or SASL_SSL://
SASL: SASL_PLAINTEXT:// or SASL_SSL://
KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:29092
And now your consumer can use port 29092:
.withBootstrapServers("localhost:29092")

How can we configure value.subject.name.strategy for schemas in Spring Cloud Stream Kafka producers, consumers and KStreams?

I would like to customize the naming strategy of the Avro schema subjects in Spring Cloud Stream Producers, Consumers and KStreams.
This would be done in Kafka with the properties key.subject.name.strategy and value.subject.name.strategy -> https://docs.confluent.io/current/schema-registry/serializer-formatter.html#subject-name-strategy
In a native Kafka Producer this works:
private val producer: KafkaProducer<Int, Customer>

init {
    val props = Properties()
    ...
    props[AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG] = "http://localhost:8081"
    props[AbstractKafkaAvroSerDeConfig.VALUE_SUBJECT_NAME_STRATEGY] = TopicRecordNameStrategy::class.java.name
    producer = KafkaProducer(props)
}

fun sendCustomerEvent(customer: Customer) {
    val record: ProducerRecord<Int, Customer> = ProducerRecord("customer", customer.id, customer)
    producer.send(record)
}
However I cannot find how to do this in Spring Cloud Stream. So far I have tried this in a producer:
spring:
  application:
    name: spring-boot-customer-service
  cloud:
    stream:
      kafka:
        bindings:
          output:
            producer:
              configuration:
                key:
                  serializer: org.apache.kafka.common.serialization.IntegerSerializer
                value:
                  subject:
                    name:
                      strategy: io.confluent.kafka.serializers.subject.TopicRecordNameStrategy
Apparently Spring Cloud uses its own subject naming strategy, the interface org.springframework.cloud.stream.schema.avro.SubjectNamingStrategy, with only one subclass: DefaultSubjectNamingStrategy.
Is there a declarative way of configuring value.subject.name.strategy, or are we expected to provide our own org.springframework.cloud.stream.schema.avro.SubjectNamingStrategy implementation and set the property spring.cloud.stream.schema.avro.subject-naming-strategy?
As pointed out in the other answer, there's a dedicated property, spring.cloud.stream.schema.avro.subjectNamingStrategy, that allows you to set up a different naming strategy for Kafka producers.
I contributed org.springframework.cloud.stream.schema.avro.QualifiedSubjectNamingStrategy, which provides that functionality out of the box.
In the case of Kafka Streams and native serialization/deserialization (the default behaviour since Spring Cloud Stream 3.0.0), you have to use Confluent's implementation (io.confluent.kafka.serializers.subject.RecordNameStrategy) and the native properties:
spring:
  application:
    name: shipping-service
  cloud:
    stream:
      ...
      kafka:
        streams:
          binder:
            configuration:
              application:
                id: shipping-service
              ...
              value:
                subject:
                  name:
                    strategy: io.confluent.kafka.serializers.subject.RecordNameStrategy
You can declare it in your properties as
spring.cloud.stream.schema.avro.subjectNamingStrategy=MyStrategy
where MyStrategy is an implementation of the interface. For instance:
import org.apache.avro.Schema
import org.springframework.cloud.stream.schema.avro.SubjectNamingStrategy

object MyStrategy : SubjectNamingStrategy {
    override fun toSubject(schema: Schema): String = schema.fullName
}
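The property value has to be the fully qualified class name of your implementation, e.g. (the package name here is hypothetical):
spring.cloud.stream.schema.avro.subjectNamingStrategy=com.example.schema.MyStrategy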