unable to fetch messages from Kafka topic using spark structured Streaming - spark-structured-streaming

I have created kafka topic with below command,
kafka-topics --create --zookeeper 10.0.2.11:2181 --replication-factor 1 --partitions 1 --topic xmlStore
I am using spark structured streaming to subscribe and read from topic. below is the code
object Test {
val spark = SparkSession.builder
.appName("Spark-Kafka-Integration")
.master("local")
.getOrCreate()
def main(args: Array[String]): Unit = {
println("Inside Main ===>")
import spark.implicits._
val inputDf = spark
.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "localhost:9092")
.option("subscribe", "xmlStore")
.option("startingOffsets" , "earliest")
.load()
inputDf.printSchema()
val dataSet: Dataset[(String, String)] =inputDf.selectExpr("CAST(key AS
STRING)", "CAST(value AS STRING)")
.as[(String, String)]
val query = dataSet
.writeStream
.outputMode("append")
.format("memory")
.queryName("op")
.start()
println("*********" +query.isActive)
spark.sql("select * from op").show(false)
query.explain()
query.awaitTermination()
}
}
After submitting above spark consumer job, I am producing values with below command
kafka-console-producer --broker-list 10.0.1.10:9092 --topic xmlStore
Hello how are you?
I am not seeing any values on console. i also tried with
dataSet
.writeStream
.outputMode("append")
.format("console")
.start()
Still no values are displayed. I dont know what went wrong. please help me out.thanks in advance
I am getting below output on console,
18/12/14 07:15:09 INFO metadata.Hive: Registering function
tempfromfartocelcius com.hive.temparature.udf.ConvertToCelcius
root
|-- key: binary (nullable = true)
|-- value: binary (nullable = true)
|-- topic: string (nullable = true)
|-- partition: integer (nullable = true)
|-- offset: long (nullable = true)
|-- timestamp: timestamp (nullable = true)
|-- timestampType: integer (nullable = true)
18/12/14 07:15:11 INFO streaming.MicroBatchExecution: Starting op [id =
dfceb317-4a27-4b8a-b2f8-3ff7aa4a0242, runId = f6fe77e1-cb64-4d41-938f-
db573380dc71]. Use hdfs://ip-10-0-1-20.ec2.internal:8020/tmp/temporary-
b6d69d12-26b4-4635-80bc-d923cf82dc23 to store the query checkpoint.
18/12/14 07:15:11 INFO consumer.ConsumerConfig: ConsumerConfig values:
metric.reporters = []
metadata.max.age.ms = 300000
partition.assignment.strategy =
[org.apache.kafka.clients.consumer.RangeAssignor]
reconnect.backoff.ms = 50
sasl.kerberos.ticket.renew.window.factor = 0.8
max.partition.fetch.bytes = 1048576
bootstrap.servers = [localhost:9092]
ssl.keystore.type = JKS
enable.auto.commit = false
sasl.mechanism = GSSAPI
interceptor.classes = null
exclude.internal.topics = true
ssl.truststore.password = null
client.id =
ssl.endpoint.identification.algorithm = null
max.poll.records = 1
check.crcs = true
request.timeout.ms = 40000
heartbeat.interval.ms = 3000
auto.commit.interval.ms = 5000
receive.buffer.bytes = 65536
ssl.truststore.type = JKS
ssl.truststore.location = null
ssl.keystore.password = null
fetch.min.bytes = 1
send.buffer.bytes = 131072
value.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
group.id = spark-kafka-source-4b2e17d1-f289-47be-ab4f-fce9fa1c1723-
-744805972-driver-0
retry.backoff.ms = 100
ssl.secure.random.implementation = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
ssl.trustmanager.algorithm = PKIX
ssl.key.password = null
fetch.max.wait.ms = 500
sasl.kerberos.min.time.before.relogin = 60000
connections.max.idle.ms = 540000
session.timeout.ms = 30000
metrics.num.samples = 2
key.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
ssl.protocol = TLS
ssl.provider = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.keystore.location = null
ssl.cipher.suites = null
security.protocol = PLAINTEXT
ssl.keymanager.algorithm = SunX509
metrics.sample.window.ms = 30000
auto.offset.reset = earliest
18/12/14 07:15:11 INFO consumer.ConsumerConfig: ConsumerConfig values:
metric.reporters = []
metadata.max.age.ms = 300000
partition.assignment.strategy =
[org.apache.kafka.clients.consumer.RangeAssignor]
reconnect.backoff.ms = 50
sasl.kerberos.ticket.renew.window.factor = 0.8
max.partition.fetch.bytes = 1048576
bootstrap.servers = [localhost:9092]
ssl.keystore.type = JKS
enable.auto.commit = false
sasl.mechanism = GSSAPI
interceptor.classes = null
exclude.internal.topics = true
ssl.truststore.password = null
client.id = consumer-2
ssl.endpoint.identification.algorithm = null
max.poll.records = 1
check.crcs = true
request.timeout.ms = 40000
heartbeat.interval.ms = 3000
auto.commit.interval.ms = 5000
receive.buffer.bytes = 65536
ssl.truststore.type = JKS
ssl.truststore.location = null
ssl.keystore.password = null
fetch.min.bytes = 1
send.buffer.bytes = 131072
value.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
group.id = spark-kafka-source-4b2e17d1-f289-47be-ab4f-fce9fa1c1723-
-744805972-driver-0
retry.backoff.ms = 100
ssl.secure.random.implementation = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
ssl.trustmanager.algorithm = PKIX
ssl.key.password = null
fetch.max.wait.ms = 500
sasl.kerberos.min.time.before.relogin = 60000
connections.max.idle.ms = 540000
session.timeout.ms = 30000
metrics.num.samples = 2
key.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
ssl.protocol = TLS
ssl.provider = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.keystore.location = null
ssl.cipher.suites = null
security.protocol = PLAINTEXT
ssl.keymanager.algorithm = SunX509
metrics.sample.window.ms = 30000
auto.offset.reset = earliest
INFO utils.AppInfoParser: Kafka version : 0.10.0-kafka-2.1.0
INFO utils.AppInfoParser: Kafka commitId : unknown
INFO streaming.MicroBatchExecution: Starting new streaming query.
+-----++-----+
|key ||value|
+-----++-----+
+-----++-----+

I was trying to to run on Cluster by submitting spark job. May be because of spark and kafka's versions mismatch messages were not consumed.
i set up kafka locally on windows and i am able to consume messages from topic.

Related

akka kafka consumer poll speed backpressure

The project uses akka stream kafka, and one process is commencing several kafka topics. When I receive a message from kafka, I am processing the business logic of pre-processing and updating to couchbase. At this time, even if the data generation speed of the kafka producer becomes very fast, the consumer wants to control the speed of the concume. When registering the kafka consumer, you specified options such as "max.pull.records" and "fetch.max.bytes", but the cpu value is abnormally high because the speed of the conclusion is too fast. Is there a way to control the speed of the concume?
Consumer.plainSource(ConsumerConfig, Subscriptions.topics("topicName")
.mapAsync(parallelism, Function)
.delay(Duration.ofNanos(1L), DelayOverflowStrategy.backpressure());
I had no choice but to adjust the consume speed with the code giving delay as above, but it is too slow. Please help me.
2022:04:28 10:19:49.566 INFO --- [akka.actor.consume-dispatcher-25] o.a.k.clients.consumer.ConsumerConfig[logAll:347] : ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [127.0.0.1:9092]
check.crcs = true
client.dns.lookup = default
client.id =
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 10485760
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = group1
group.instance.id = null
heartbeat.interval.ms = 5000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 100
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
it`s my consumer config.

Timeout expired while fetching topic metadata - Kafka

We are trying to consume the broker messages in Kafka hosted in Windows standalone.
Consumer is running in Kubernetes.
server.properties:
listeners=PLAINTEXT://:29092
advertised.listeners=PLAINTEXT://myhostname:29092
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
Consumer Error:
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1627883198273
[main] INFO XXX.XXX.KafkaConsumerProperties - Kafka Topic Name : table-update
[main] INFO org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-GroupConsumer-1, groupId=GroupConsumer] Subscribed to topic(s): table-update
[main] INFO XX.XX.XXXXX- Could not run Loader: Timeout expired while fetching topic metadata .
Consumer config values :
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [myhostname:29092]
check.crcs = true
client.dns.lookup = use_all_dns_ips
client.id = consumer-GroupConsumer-1
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = GroupConsumer
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
internal.throw.on.fetch.stable.offset.unsupported = false
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 52428800
max.poll.interval.ms = 2147483647
max.poll.records = 1000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 120000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 60000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.2
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = XXXX
Please help me in resolving this issue.
While fetching the metadata for topics there are possibilities of multiple error conditions like invalid topic name, metadata leader, offline partitions etc. You can get more info about the errors by logging at debug level. The debug level logs the error condition
log.debug("Topic metadata fetch included errors: {}", errors);

How to minimize Kafka consumer's latency?

I am trying to develop a real time data streaming application with Kafka. Producers send message to broker which resides in different pc in same LAN. Then broker send message to consumer which also resides in different pc in same LAN. If producer,consumer and brokers are running on same pc no problem arrives. But in my scenario consumer's lag increases dramatically. On the other hand when new consumer running on broker machine it's lag value changes between 0 and 10. But remote consumer's lag values increases to 200 after 1 minute also it gets rarely messages like 10 messages between every 5 seconds.
In this scenario i have one consumer, one producer, one broker and one topic with 1 partition. What configurations options can i adjust to minimize latency?
consumer.config:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [192.168.1.47:9092]
check.crcs = true
client.dns.lookup = default
client.id =
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = stream-app
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 1
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
and here producer.config:
acks = 1
batch.size = 16384
bootstrap.servers = [192.168.1.47:9092]
buffer.memory = 33554432
client.dns.lookup = default
client.id =
compression.type = none
connections.max.idle.ms = 540000
delivery.timeout.ms = 120000
enable.idempotence = false
interceptor.classes = []
key.serializer = class org.apache.kafka.common.serialization.StringSerializer
linger.ms = 0
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 2147483647
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
consumer code:
public void startConsume() {
int i=0;
while (true) {
ConsumerRecords<String, byte[]> consumerRecords = kafkaConsumer.poll(Duration.ofMillis(10));
for (ConsumerRecord<String, byte[]> cors : consumerRecords) {
System.out.println(cors.key()+" ..... " +cors.value()+ " ..... " +i);
i++;
}
}
}

Kafka consumer process order with concurrency

I have a producer writing to a Kafka topic with 100 partitions, and it choose the partition by the user ID, therefore user's messages are necessarily being processed by the order they were submitted to the queue.
The service which is responsible for consuming has 2-10 instances, each one has in its configuration:
spring.cloud.stream.bindings.input.consumer.concurrency=10
spring.cloud.stream.bindings.input.consumer.partitioned=true
I recently noticed that although the consumer start to process the partition messages in order, sometimes one message is done before the one after it because it's easier to being processed than next one.
It's important to me to maintain the current processing rate of the service, and because I'm not familiar with the threading model of spring cloud stream I wanted to consult and ask for other's knowledge. What is the best way to ensure that one user's message is being processed only after the previous ones are done?
--EDIT--
As requested, more relevant params.
Kafka version: 0.10.2.1
spring-cloud-stream version: 1.1.0.RELEASE
binder params:
spring.cloud.stream.kafka.bindings.input.consumer.autoCommitOffset=false
spring.cloud.stream.kafka.bindings.input.consumer.autoCommitOnError=true
spring.cloud.stream.kafka.bindings.input.consumer.enableDlq=true
consumer configuration as printed to the console:
2018-12-11 09:56:51,975 [RMI TCP Connection(6)-127.0.0.1] INFO [AbstractConfig::logAll] - ConsumerConfig values:
metric.reporters = []
metadata.max.age.ms = 300000
partition.assignment.strategy = [org.apache.kafka.clients.consumer.RangeAssignor]
reconnect.backoff.ms = 50
sasl.kerberos.ticket.renew.window.factor = 0.8
max.partition.fetch.bytes = 1048576
bootstrap.servers = [localhost:9092]
ssl.keystore.type = JKS
enable.auto.commit = true
sasl.mechanism = GSSAPI
interceptor.classes = null
exclude.internal.topics = true
ssl.truststore.password = null
client.id = consumer-11
ssl.endpoint.identification.algorithm = null
max.poll.records = 2147483647
check.crcs = true
request.timeout.ms = 40000
heartbeat.interval.ms = 3000
auto.commit.interval.ms = 5000
receive.buffer.bytes = 65536
ssl.truststore.type = JKS
ssl.truststore.location = null
ssl.keystore.password = null
fetch.min.bytes = 1
send.buffer.bytes = 131072
value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
group.id =
retry.backoff.ms = 100
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
ssl.trustmanager.algorithm = PKIX
ssl.key.password = null
fetch.max.wait.ms = 500
sasl.kerberos.min.time.before.relogin = 60000
connections.max.idle.ms = 540000
session.timeout.ms = 30000
metrics.num.samples = 2
key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
ssl.protocol = TLS
ssl.provider = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.keystore.location = null
ssl.cipher.suites = null
security.protocol = PLAINTEXT
ssl.keymanager.algorithm = SunX509
metrics.sample.window.ms = 30000
auto.offset.reset = latest
producer config as printed to the console:
2018-12-11 09:56:52,439 [-kafka-listener-1] INFO [AbstractConfig::logAll] - ProducerConfig values:
metric.reporters = []
metadata.max.age.ms = 300000
reconnect.backoff.ms = 50
sasl.kerberos.ticket.renew.window.factor = 0.8
bootstrap.servers = [localhost:9092]
ssl.keystore.type = JKS
sasl.mechanism = GSSAPI
max.block.ms = 60000
interceptor.classes = null
ssl.truststore.password = null
client.id = producer-5
ssl.endpoint.identification.algorithm = null
request.timeout.ms = 30000
acks = 1
receive.buffer.bytes = 32768
ssl.truststore.type = JKS
retries = 0
ssl.truststore.location = null
ssl.keystore.password = null
send.buffer.bytes = 131072
compression.type = none
metadata.fetch.timeout.ms = 60000
retry.backoff.ms = 100
sasl.kerberos.kinit.cmd = /usr/bin/kinit
buffer.memory = 33554432
timeout.ms = 30000
key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
ssl.trustmanager.algorithm = PKIX
block.on.buffer.full = false
ssl.key.password = null
sasl.kerberos.min.time.before.relogin = 60000
connections.max.idle.ms = 540000
max.in.flight.requests.per.connection = 5
metrics.num.samples = 2
ssl.protocol = TLS
ssl.provider = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
batch.size = 16384
ssl.keystore.location = null
ssl.cipher.suites = null
security.protocol = PLAINTEXT
max.request.size = 1048576
value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
ssl.keymanager.algorithm = SunX509
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
linger.ms = 0
The partitions are distributed across the container threads.
If the container concurrency is 10 and you have 20 partitions, each consumer (thread) will normally be assigned 2 partitions.
This guarantees delivery order within a partition.

spark-streaming-kafka-0-10 auto.offset.reset is always set to none

Does anyone come across the issue when assign auto.offset.reset->"latest" doesn't affect this property in spark-streaming-kafka 0-10
here is my code:
val config = StreamingConfigHelper.getStreamingConfig()
val kafkaParams = Map[String, Object]("bootstrap.servers" -> config.brokers,
"key.deserializer" -> classOf[ByteArrayDeserializer],
"value.deserializer" -> classOf[ByteArrayDeserializer],
"group.id" -> "prodgroup",
"auto.offset.reset" -> "latest",
"receive.buffer.bytes" -> (65536: java.lang.Integer),
"enable.auto.commit" -> (false: java.lang.Boolean))
val inputDStream = KafkaUtils.createDirectStream[Array[Byte], Array[Byte]](streamingContext, LocationStrategies.PreferConsistent,
ConsumerStrategies.Subscribe[Array[Byte], Array[Byte]](config.productTopic.toArray, kafkaParams))
But when I deploy I get this results
16/11/11 11:03:00 INFO ConsumerConfig: ConsumerConfig values:
metric.reporters = []
metadata.max.age.ms = 300000
partition.assignment.strategy = [org.apache.kafka.clients.consumer.RangeAssignor]
reconnect.backoff.ms = 50
sasl.kerberos.ticket.renew.window.factor = 0.8
max.partition.fetch.bytes = 1048576
bootstrap.servers = [xxxxxxxxx:6667, xxxxxxxx:6667, xxxxxxxxxx:6667]
ssl.keystore.type = JKS
enable.auto.commit = false
sasl.mechanism = GSSAPI
interceptor.classes = null
exclude.internal.topics = true
ssl.truststore.password = null
client.id =
ssl.endpoint.identification.algorithm = null
max.poll.records = 2147483647
check.crcs = true
request.timeout.ms = 40000
heartbeat.interval.ms = 3000
auto.commit.interval.ms = 5000
receive.buffer.bytes = 65536
ssl.truststore.type = JKS
ssl.truststore.location = null
ssl.keystore.password = null
fetch.min.bytes = 1
send.buffer.bytes = 131072
value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
group.id = spark-executor-prodgroup1
retry.backoff.ms = 100
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
ssl.trustmanager.algorithm = PKIX
ssl.key.password = null
fetch.max.wait.ms = 500
sasl.kerberos.min.time.before.relogin = 60000
connections.max.idle.ms = 540000
session.timeout.ms = 30000
metrics.num.samples = 2
key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
ssl.protocol = TLS
ssl.provider = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.keystore.location = null
ssl.cipher.suites = null
security.protocol = PLAINTEXT
ssl.keymanager.algorithm = SunX509
metrics.sample.window.ms = 30000
auto.offset.reset = none
Could you help please?
Thanks
Your settings are overriden in KafkaUtils.fixKafkaParams:
I do not know why yet...
i am not sure why the setting is ignored but you could also try not to set the auto.offset.reset because the default is "latest" in the new consumer configs.
Source of information:
https://kafka.apache.org/documentation#newconsumerconfigs