Tuning ReactiveElasticsearchClient due to ReadTimeoutException - spring-data

We've been experimenting with the ReactiveElasticsearchRepository, but we're running into issues when the service sits idle for several hours: the first attempts to retrieve data from Elasticsearch afterwards time out.
What we're seeing when making those first few requests is:
2019-11-06 17:31:35.858 WARN [my-service,,,] 56942 --- [ctor-http-nio-1] r.netty.http.client.HttpClientConnect : [id: 0x8cf5e94d, L:/192.168.1.100:60745 - R:elastic.internal.com/192.168.1.101:9200] The connection observed an error
io.netty.handler.timeout.ReadTimeoutException: null
When I enable DEBUG for reactor.netty, I can see that it goes through the motions of trying each connection in the pool:
2019-11-06 17:31:30.841 DEBUG [my-service,,,] 56942 --- [ctor-http-nio-1] r.n.resources.PooledConnectionProvider : [id: 0x8cf5e94d, L:/192.168.1.100:60745 - R:elastic.internal.com/192.168.1.101:9200] Channel acquired, now 1 active connections and 2 inactive connections
2019-11-06 17:31:35.858 WARN [my-service,,,] 56942 --- [ctor-http-nio-1] r.netty.http.client.HttpClientConnect : [id: 0x8cf5e94d, L:/192.168.1.100:60745 - R:elastic.internal.com/192.168.1.101:9200] The connection observed an error
io.netty.handler.timeout.ReadTimeoutException: null
2019-11-06 17:31:35.881 DEBUG [my-service,,,] 56942 --- [ctor-http-nio-1] r.n.resources.PooledConnectionProvider : [id: 0x8cf5e94d, L:/192.168.1.100:60745 ! R:elastic.internal.com/192.168.1.101:9200] Releasing channel
2019-11-06 17:31:35.891 DEBUG [my-service,,,] 56942 --- [ctor-http-nio-1] r.n.resources.PooledConnectionProvider : [id: 0x8cf5e94d, L:/192.168.1.100:60745 ! R:elastic.internal.com/192.168.1.101:9200] Channel cleaned, now 0 active connections and 2 inactive connections
2019-11-06 17:32:21.249 DEBUG [my-service,,,] 56942 --- [ctor-http-nio-1] r.n.resources.PooledConnectionProvider : [id: 0x38e99d68, L:/192.168.1.100:60744 - R:elastic.internal.com/192.168.1.101:9200] Channel acquired, now 1 active connections and 1 inactive connections
2019-11-06 17:32:26.251 WARN [my-service,,,] 56942 --- [ctor-http-nio-1] r.netty.http.client.HttpClientConnect : [id: 0x38e99d68, L:/192.168.1.100:60744 - R:elastic.internal.com/192.168.1.101:9200] The connection observed an error
io.netty.handler.timeout.ReadTimeoutException: null
2019-11-06 17:32:26.255 DEBUG [my-service,,,] 56942 --- [ctor-http-nio-1] r.n.resources.PooledConnectionProvider : [id: 0x38e99d68, L:/192.168.1.100:60744 ! R:elastic.internal.com/192.168.1.101:9200] Releasing channel
2019-11-06 17:32:26.256 DEBUG [my-service,,,] 56942 --- [ctor-http-nio-1] r.n.resources.PooledConnectionProvider : [id: 0x38e99d68, L:/192.168.1.100:60744 ! R:elastic.internal.com/192.168.1.101:9200] Channel cleaned, now 0 active connections and 1 inactive connections
2019-11-06 17:32:32.592 DEBUG [my-service,,,] 56942 --- [ctor-http-nio-1] r.n.resources.PooledConnectionProvider : [id: 0xdee3a211, L:/192.168.1.100:60746 - R:elastic.internal.com/192.168.1.101:9200] Channel acquired, now 1 active connections and 0 inactive connections
2019-11-06 17:32:37.597 WARN [my-service,,,] 56942 --- [ctor-http-nio-1] r.netty.http.client.HttpClientConnect : [id: 0xdee3a211, L:/192.168.1.100:60746 - R:elastic.internal.com/192.168.1.101:9200] The connection observed an error
io.netty.handler.timeout.ReadTimeoutException: null
2019-11-06 17:32:37.600 DEBUG [my-service,,,] 56942 --- [ctor-http-nio-1] r.n.resources.PooledConnectionProvider : [id: 0xdee3a211, L:/192.168.1.100:60746 ! R:elastic.internal.com/192.168.1.101:9200] Releasing channel
2019-11-06 17:32:37.600 DEBUG [my-service,,,] 56942 --- [ctor-http-nio-1] r.n.resources.PooledConnectionProvider : [id: 0xdee3a211, L:/192.168.1.100:60746 ! R:elastic.internal.com/192.168.1.101:9200] Channel cleaned, now 0 active connections and 0 inactive connections
Eventually all of the active/inactive connections get cleaned, new connections are created, and those then work.
Is there a way to tune things behind the scenes to limit how long a connection can remain in the pool before being re-created, or an alternative approach for handling these timeouts?
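For reference, here is a rough sketch of one way to tune this at the Reactor Netty level by capping how long a connection may sit idle in the pool. It assumes a Reactor Netty version that exposes ConnectionProvider.builder(...) (older releases offer similar knobs on ConnectionProvider.fixed(...)); the pool name, pool size and the 60-second idle limit are illustrative values, and how the resulting WebClient is plugged into the reactive Elasticsearch client depends on your Spring Data Elasticsearch version:

import io.netty.channel.ChannelOption
import org.springframework.http.client.reactive.ReactorClientHttpConnector
import org.springframework.web.reactive.function.client.WebClient
import reactor.netty.http.client.HttpClient
import reactor.netty.resources.ConnectionProvider
import java.time.Duration

fun elasticWebClient(): WebClient {
    // Close connections that have been idle for more than 60 seconds so they are
    // re-created instead of being handed out again after hours of inactivity.
    val provider = ConnectionProvider.builder("es-pool")
        .maxConnections(10)
        .maxIdleTime(Duration.ofSeconds(60))
        .build()

    val httpClient = HttpClient.create(provider)
        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 5_000)

    return WebClient.builder()
        .clientConnector(ReactorClientHttpConnector(httpClient))
        .baseUrl("http://elastic.internal.com:9200")
        .build()
}

Alternatively, newer Spring Data Elasticsearch versions expose connect/socket timeout settings on the client configuration, which at least keeps the roughly five-second read timeout seen in the logs above from firing while stale connections are being cleaned up.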

Related

How to optimize EmbeddedKafka and Mongo logs in Spring Boot

How to properly keep only the relevant logs when using MongoDB and Kafka in a Spring Boot application.
2022-08-02 11:14:58.148 INFO 363923 --- [ main] kafka.server.KafkaConfig : KafkaConfig values:
advertised.listeners = null
alter.config.policy.class.name = null
alter.log.dirs.replication.quota.window.num = 11
alter.log.dirs.replication.quota.window.size.seconds = 1
authorizer.class.name =
auto.create.topics.enable = true
auto.leader.rebalance.enable = true
background.threads = 10
broker.heartbeat.interval.ms = 2000
broker.id = 0
broker.id.generation.enable = true
broker.rack = null
broker.session.timeout.ms = 9000
client.quota.callback.class = null
compression.type = producer
connection.failed.authentication.delay.ms = 100
connections.max.idle.ms = 600000
...
2022-08-02 11:15:11.005 INFO 363923 --- [er-event-thread] state.change.logger : [Controller id=0 epoch=1] Changed partition test_cfr_prv_customeragreement_event_disbursement_ini-0 from NewPartition to OnlinePartition with state LeaderAndIsr(leader=0, leaderEpoch=0, isr=List(0), zkVersion=0)
2022-08-02 11:15:11.005 INFO 363923 --- [er-event-thread] state.change.logger : [Controller id=0 epoch=1] Changed partition test_cfr_prv_customeragreement_event_receipt_ini-0 from NewPartition to OnlinePartition with state LeaderAndIsr(leader=0, leaderEpoch=0, isr=List(0), zkVersion=0)
2022-08-02 11:15:11.017 INFO 363923 --- [er-event-thread] state.change.logger : [Controller id=0 epoch=1] Sending LeaderAndIsr request to broker 0 with 2 become-leader and 0 become-follower partitions
2022-08-02 11:15:11.024 INFO 363923 --- [er-event-thread] state.change.logger : [Controller id=0 epoch=1] Sending UpdateMetadata request to brokers HashSet(0) for 2 partitions
2022-08-02 11:15:11.026 INFO 363923 --- [er-event-thread] state.change.logger : [Controller id=0 epoch=1] Sending UpdateMetadata request to brokers HashSet() for 0 partitions
2022-08-02 11:15:11.028 INFO 363923 --- [quest-handler-0] state.change.logger : [Broker id=0] Handling LeaderAndIsr request correlationId 1 from controller 0 for 2 partitions
Example of the undesired logs:
2022-08-02 11:15:04.578 INFO 363923 --- [ Thread-3] o.s.b.a.mongo.embedded.EmbeddedMongo : {"t":{"$date":"2022-08-02T11:15:04.578+02:00"},"s":"I", "c":"CONTROL", "id":51765, "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"Ubuntu","version":"20.04"}}}
2022-08-02 11:15:04.579 INFO 363923 --- [ Thread-3] o.s.b.a.mongo.embedded.EmbeddedMongo : {"t":{"$date":"2022-08-02T11:15:04.578+02:00"},"s":"I", "c":"CONTROL", "id":21951, "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"net":{"bindIp":"127.0.0.1","port":34085},"replication":{"oplogSizeMB":10,"replSet":"rs0"},"security":{"authorization":"disabled"},"storage":{"dbPath":"/tmp/embedmongo-db-66eab1ce-d099-40ec-96fb-f759ef3808a4","syncPeriodSecs":0}}}}
2022-08-02 11:15:04.585 INFO 363923 --- [ Thread-3] o.s.b.a.mongo.embedded.EmbeddedMongo : {"t":{"$date":"2022-08-02T11:15:04.585+02:00"},"s":"I", "c":"STORAGE", "id":22297, "ctx":"initandlisten","msg":"Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem","tags":["startupWarnings"]}
Please find here a link to a sample project github.com/smaillns/springboot-mongo-kafka
If we run a test we get a flood of such logs. What's wrong with the current configuration?
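For reference, a minimal sketch of the usual fix, assuming Spring Boot's logging.level mechanism: raise the threshold for the chatty loggers seen in the output above (the logger names and levels below are a starting point to adjust), e.g. in src/test/resources/application.yml:

logging:
  level:
    kafka: WARN
    org.apache.kafka: WARN
    state.change.logger: WARN
    org.springframework.boot.autoconfigure.mongo.embedded.EmbeddedMongo: WARN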

minikube service is failing to expose URL

F:\Udemy\GitRepo\Kubernetes-Tutorial>kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
my-app-deploy-68698d9757-wrs9z 1/1 Running 0 14m 172.17.0.3 minikube <none> <none>
F:\Udemy\GitRepo\Kubernetes-Tutorial>minikube service my-app-svc
|-----------|------------|-------------|-----------------------------|
| NAMESPACE | NAME | TARGET PORT | URL |
|-----------|------------|-------------|-----------------------------|
| default | my-app-svc | 80 | http://172.30.105.146:30365 |
|-----------|------------|-------------|-----------------------------|
* Opening service default/my-app-svc in default browser...
F:\Udemy\GitRepo\Kubernetes-Tutorial>kubectl describe service my-app-svc
Name: my-app-svc
Namespace: default
Labels: <none>
Annotations: <none>
Selector: app=my-app
Type: NodePort
IP: 10.98.9.115
Port: <unset> 80/TCP
TargetPort: 9001/TCP
NodePort: <unset> 30365/TCP
Endpoints: 172.17.0.3:9001
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
F:\Udemy\GitRepo\Kubernetes-Tutorial>kubectl logs my-app-deploy-68698d9757-wrs9z
. ____ _ __ _ _
/\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/ ___)| |_)| | | | | || (_| | ) ) ) )
' |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot :: (v2.3.1.RELEASE)
2021-08-21 13:37:21.046 INFO 1 --- [ main] c.d.d.DockerpublishApplication : Starting DockerpublishApplication v0.0.3 on my-app-deploy-68698d9757-wrs9z with PID 1 (/app.jar started by root in /)
2021-08-21 13:37:21.050 INFO 1 --- [ main] c.d.d.DockerpublishApplication : No active profile set, falling back to default profiles: default
2021-08-21 13:37:22.645 INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port(s): 9091 (http)
2021-08-21 13:37:22.659 INFO 1 --- [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat]
2021-08-21 13:37:22.660 INFO 1 --- [ main] org.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/9.0.36]
2021-08-21 13:37:22.785 INFO 1 --- [ main] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext
2021-08-21 13:37:22.785 INFO 1 --- [ main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 1646 ms
2021-08-21 13:37:23.302 INFO 1 --- [ main] o.s.s.concurrent.ThreadPoolTaskExecutor : Initializing ExecutorService 'applicationTaskExecutor'
2021-08-21 13:37:23.496 INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 9091 (http) with context path ''
2021-08-21 13:37:23.510 INFO 1 --- [ main] c.d.d.DockerpublishApplication : Started DockerpublishApplication in 3.279 seconds (JVM running for 4.077)
F:\Udemy\GitRepo\Kubernetes-Tutorial>
Everything seems to be in order, yet it doesn't work: the browser reports "refused to connect" when opening the URL from
minikube service my-app-svc
You are getting connection refused because your application is running on a different port than the one your Service forwards to.
Spring Boot is listening on 9091: Tomcat started on port(s): 9091 (http) with context path ''
But your Service is redirecting the traffic to TargetPort: 9001/TCP.
Your target port should be 9091 instead of 9001.
You access the application via the node IP and NodePort; the request reaches the Kubernetes Service and is forwarded to TargetPort: 9091/TCP, where the application is actually running.
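Assuming the Service comes from a manifest along these lines, the only change needed is the targetPort (9001 -> 9091), so that NodePort traffic lands on the port Tomcat actually listens on:

apiVersion: v1
kind: Service
metadata:
  name: my-app-svc
spec:
  type: NodePort
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 9091
      nodePort: 30365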

Kafka consumer message commit issue

Kafka newbie here.
Kafka version: 2.3.1
I am trying to consume Kafka messages from two topics using Spring Cloud Stream. I have not done much configuration apart from the Kafka binder and the simple settings below. Whenever Group coordinator lbbb111a.uat.pncint.net:9092 (id: 2147483641 rack: null) is unavailable or invalid, will attempt rediscovery happens, a bunch of messages that have already been processed get processed again. I'm not sure what is happening.
spring.cloud.stream.kafka.binder.brokers: xxxxx:9094

spring:
  cloud:
    stream:
      default:
        group: bbb-bl-kyc
      bindings:
        input:
          destination: bbb.core.sar.blul.events,bbb.core.sar.bluloc.events
          contentType: application/json
          consumer:
            headerMode: embeddedHeaders

spring.kafka.consumer.properties.spring.json.trusted.packages: "*"
spring.cloud.stream.kafka.streams.binder.configuration.commit.interval.ms: 1000

# Custom serializer configurations to secure data
spring.cloud.stream.kafka.binder.configuration:
  key.serializer: org.apache.kafka.common.serialization.StringSerializer
  value.serializer: pnc.aop.core.kafka.serialization.MessageSecuredByteArraySerializer
  value.deserializer: pnc.aop.core.kafka.serialization.MessageSecuredByteArrayDeserializer
  key.deserializer: org.apache.kafka.common.serialization.StringDeserializer
2020-05-29 07:01:11.389 INFO 1 --- [container-0-C-1] p.a.b.k.service.KYCOrchestrationService : Done with Customer xxxx MS call response handling Confm Id: 159073553171893 Appln Id: HSUKQJDJNZNMWVZZ
2020-05-29 07:01:11.393 INFO 1 --- [container-0-C-1] p.a.b.kyc.service.DMSIntegrationService : Message written to the DMS topic successfully 159073553171893
2020-05-29 07:01:11.394 INFO 1 --- [container-0-C-1] p.a.b.k.s.AdminConsoleProducerService : Message written to Admin console Application Log topic successfully Confm Id: 159073553171893 Appln Id: HSUKQJDJNZNMWVZZ
2020-05-30 17:21:13.140 INFO 1 --- [ad | bbb-bl-kyc] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-4, groupId=bbb-bl-kyc] Group coordinator lbbb111a.uat.pncint.net:9092 (id: 2147483641 rack: null) is unavailable or invalid, will attempt rediscovery
2020-05-30 17:21:13.122 INFO 1 --- [ad | bbb-bl-kyc] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=bbb-bl-kyc] Group coordinator lbbb111a.uat.pncint.net:9092 (id: 2147483641 rack: null) is unavailable or invalid, will attempt rediscovery
2020-05-30 17:21:14.522 INFO 1 --- [ad | bbb-bl-kyc] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=bbb-bl-kyc] Discovered group coordinator lbbb111a.uat.pncint.net:9092 (id: 2147483641 rack: null)
2020-05-30 17:21:14.692 INFO 1 --- [container-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-4, groupId=bbb-bl-kyc] Discovered group coordinator lbbb111a.uat.pncint.net:9092 (id: 2147483641 rack: null)
2020-05-30 17:21:15.151 INFO 1 --- [container-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-4, groupId=bbb-bl-kyc] Attempt to heartbeat failed for since member id consumer-4-f5a03efd-75cd-425b-94e1-efd3d728d7ca is not valid.
2020-05-30 17:21:15.152 INFO 1 --- [container-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-4, groupId=bbb-bl-kyc] Revoking previously assigned partitions [bbb.core.sar.bluloc.events-0]
2020-05-30 17:21:15.173 INFO 1 --- [container-0-C-1] o.s.c.s.b.k.KafkaMessageChannelBinder$1 : bbb-bl-kyc: partitions revoked: [bbb.core.sar.bluloc.events-0]
2020-05-30 17:21:15.141 INFO 1 --- [container-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=bbb-bl-kyc] Attempt to heartbeat failed for since member id consumer-2-52012bae-1b22-4211-b107-803fb3765720 is not valid.
2020-05-30 17:21:15.175 INFO 1 --- [container-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-4, groupId=bbb-bl-kyc] (Re-)joining group
2020-05-30 17:21:15.176 INFO 1 --- [container-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=bbb-bl-kyc] Revoking previously assigned partitions [bbb.core.sar.blul.events-0]
2020-05-30 17:21:15.184 INFO 1 --- [container-0-C-1] o.s.c.s.b.k.KafkaMessageChannelBinder$1 : bbb-bl-kyc: partitions revoked: [bbb.core.sar.blul.events-0]
2020-05-30 17:21:15.184 INFO 1 --- [container-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=bbb-bl-kyc] (Re-)joining group
2020-05-30 17:21:18.200 INFO 1 --- [container-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-4, groupId=bbb-bl-kyc] Successfully joined group with generation 66
2020-05-30 17:21:18.200 INFO 1 --- [container-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=bbb-bl-kyc] Successfully joined group with generation 66
2020-05-30 17:21:18.200 INFO 1 --- [container-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-4, groupId=bbb-bl-kyc] Setting newly assigned partitions: bbb.core.sar.bluloc.events-0
2020-05-30 17:21:18.200 INFO 1 --- [container-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=bbb-bl-kyc] Setting newly assigned partitions: bbb.core.sar.blul.events-0
2020-05-30 17:21:18.203 INFO 1 --- [container-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=bbb-bl-kyc] Found no committed offset for partition bbb.core.sar.blul.events-0
2020-05-30 17:21:18.203 INFO 1 --- [container-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-4, groupId=bbb-bl-kyc] Found no committed offset for partition bbb.core.sar.bluloc.events-0
2020-05-30 17:21:18.537 INFO 1 --- [container-0-C-1] o.a.k.c.c.internals.SubscriptionState : [Consumer clientId=consumer-2, groupId=bbb-bl-kyc] Resetting offset for partition bbb.core.sar.blul.events-0 to offset 4.
2020-05-30 17:21:18.538 INFO 1 --- [container-0-C-1] o.a.k.c.c.internals.SubscriptionState : [Consumer clientId=consumer-4, groupId=bbb-bl-kyc] Resetting offset for partition bbb.core.sar.bluloc.events-0 to offset 0.
2020-05-30 17:21:18.621 INFO 1 --- [container-0-C-1] o.s.c.s.b.k.KafkaMessageChannelBinder$1 : bbb-bl-kyc: partitions assigned: [bbb.core.sar.blul.events-0]
2020-05-30 17:21:18.625 INFO 1 --- [container-0-C-1] o.s.c.s.b.k.KafkaMessageChannelBinder$1 : bbb-bl-kyc: partitions assigned: [bbb.core.sar.bluloc.events-0]
2020-05-30 17:21:18.822 INFO 1 --- [container-0-C-1] p.a.b.k.stream.KYCbbbCoreEventsListener : Initiating KYC Orchestration 159071814927374
2020-05-30 17:21:18.826 INFO 1 --- [container-0-C-1] p.a.b.k.stream.KYCbbbCoreEventsListener : Initiating KYC Orchestration null
2020-05-30 17:21:18.928 INFO 1 --- [container-0-C-1] p.a.b.k.s.AdminConsoleProducerService : Message written to Admin console Application topic successfully Confm Id: null Appln Id: XQZ58K3H3XZADTAT
Without changing much of the consumer configuration, you get at-least-once delivery semantics.
When the group coordinator is temporarily unavailable, your consumer cannot commit the messages it has processed. After re-joining, the consumer processes the same messages again (since they were never committed), which leads to the duplicates.
You can find more details on the GroupCoordinator and delivery semantics here.
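Beyond that, a common mitigation under at-least-once delivery is to make the processing itself idempotent, so re-delivered records are detected and skipped. A rough sketch; KycEvent, ProcessedEvent, the repository and the service names below are purely illustrative, not taken from the actual project:

import org.slf4j.LoggerFactory
import org.springframework.cloud.stream.annotation.StreamListener

class IdempotentKycListener(
    private val processedEvents: ProcessedEventRepository,  // hypothetical store of handled confirmation ids
    private val orchestration: KycOrchestrationService      // hypothetical business service
) {
    private val logger = LoggerFactory.getLogger(javaClass)

    @StreamListener("input")
    fun handle(event: KycEvent) {
        // A record re-delivered after a rebalance carries the same confirmation id,
        // so it is skipped instead of being processed a second time.
        if (processedEvents.existsById(event.confirmationId)) {
            logger.info("Duplicate delivery for {}, skipping", event.confirmationId)
            return
        }
        orchestration.process(event)
        processedEvents.save(ProcessedEvent(event.confirmationId))
    }
}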

Spring Cloud Stream Kafka Stream application shows Resetting offset for partition event-x to offset 0 on every restart

I have a Spring Cloud Stream Kafka Stream application that reads from a topic (event) and performs a simple processing:
@Configuration
class EventKStreamConfiguration {

    private val logger = LoggerFactory.getLogger(javaClass)

    @StreamListener
    fun process(@Input("event") eventStream: KStream<String, EventReceived>) {
        eventStream.foreach { key, value ->
            logger.info("--------> Processing Event {}", value)
            // Save in DB
        }
    }
}
This application is using a Kafka environment from Confluent Cloud, with an event topic with 6 partitions. The full configuration is:
spring:
  application:
    name: events-processor
  cloud:
    stream:
      schema-registry-client:
        endpoint: ${schema-registry-url:http://localhost:8081}
      kafka:
        streams:
          binder:
            brokers: ${kafka-brokers:localhost}
            configuration:
              application:
                id: ${spring.application.name}
              default:
                key:
                  serde: org.apache.kafka.common.serialization.Serdes$StringSerde
              schema:
                registry:
                  url: ${spring.cloud.stream.schema-registry-client.endpoint}
              value:
                subject:
                  name:
                    strategy: io.confluent.kafka.serializers.subject.RecordNameStrategy
              processing:
                guarantee: exactly_once
          bindings:
            event:
              consumer:
                valueSerde: io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde
      bindings:
        event:
          destination: event
  data:
    mongodb:
      uri: ${mongodb-uri:mongodb://localhost/test}
server:
  port: 8085
logging:
  level:
    org.springframework.kafka.config: debug
---
spring:
  profiles: confluent-cloud
  cloud:
    stream:
      kafka:
        streams:
          binder:
            autoCreateTopics: false
            configuration:
              retry:
                backoff:
                  ms: 500
              security:
                protocol: SASL_SSL
              sasl:
                mechanism: PLAIN
                jaas:
                  config: xxx
              basic:
                auth:
                  credentials:
                    source: USER_INFO
              schema:
                registry:
                  basic:
                    auth:
                      user:
                        info: yyy
Messages are being correctly processed by the KStream. If I restart the application they are not reprocessed. Note: I don’t want them to be reprocessed, so this behaviour is ok.
However the startup logs show some strange bits:
First it shows the creation of a restore consumer client, with auto offset reset none:
2019-07-19 10:20:17.120 INFO 82473 --- [ main] o.a.k.s.p.internals.StreamThread : stream-thread [events-processor-9a8069c4-3fb6-4d76-a207-efbbadd52b8f-StreamThread-1] Creating restore consumer client
2019-07-19 10:20:17.123 INFO 82473 --- [ main] o.a.k.clients.consumer.ConsumerConfig : ConsumerConfig values:
auto.commit.interval.ms = 5000
auto.offset.reset = none
Then it creates a consumer client with auto offset reset earliest.
2019-07-19 10:20:17.235 INFO 82473 --- [ main] o.a.k.s.p.internals.StreamThread : stream-thread [events-processor-9a8069c4-3fb6-4d76-a207-efbbadd52b8f-StreamThread-1] Creating consumer client
2019-07-19 10:20:17.241 INFO 82473 --- [ main] o.a.k.clients.consumer.ConsumerConfig : ConsumerConfig values:
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
The final traces of the startup log show an offset reset to 0. This happens on every restart of the application:
2019-07-19 10:20:31.577 INFO 82473 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [events-processor-9a8069c4-3fb6-4d76-a207-efbbadd52b8f-StreamThread-1] State transition from PARTITIONS_ASSIGNED to RUNNING
2019-07-19 10:20:31.578 INFO 82473 --- [-StreamThread-1] org.apache.kafka.streams.KafkaStreams : stream-client [events-processor-9a8069c4-3fb6-4d76-a207-efbbadd52b8f] State transition from REBALANCING to RUNNING
2019-07-19 10:20:31.669 INFO 82473 --- [events-processor] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=events-processor-9a8069c4-3fb6-4d76-a207-efbbadd52b8f-StreamThread-1-consumer, groupId=events-processor] Resetting offset for partition event-3 to offset 0.
2019-07-19 10:20:31.669 INFO 82473 --- [events-processor] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=events-processor-9a8069c4-3fb6-4d76-a207-efbbadd52b8f-StreamThread-1-consumer, groupId=events-processor] Resetting offset for partition event-0 to offset 0.
2019-07-19 10:20:31.669 INFO 82473 --- [events-processor] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=events-processor-9a8069c4-3fb6-4d76-a207-efbbadd52b8f-StreamThread-1-consumer, groupId=events-processor] Resetting offset for partition event-1 to offset 0.
2019-07-19 10:20:31.669 INFO 82473 --- [events-processor] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=events-processor-9a8069c4-3fb6-4d76-a207-efbbadd52b8f-StreamThread-1-consumer, groupId=events-processor] Resetting offset for partition event-5 to offset 0.
2019-07-19 10:20:31.670 INFO 82473 --- [events-processor] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=events-processor-9a8069c4-3fb6-4d76-a207-efbbadd52b8f-StreamThread-1-consumer, groupId=events-processor] Resetting offset for partition event-4 to offset 0.
Why are there two consumers configured?
Why does the second one have auto.offset.reset = earliest when I haven't configured it explicitly and the Kafka default is latest?
I want the default (auto.offset.reset = latest) behaviour, and it seems to be working fine. However, doesn't that contradict what I see in the logs?
UPDATE:
I would rephrase the third question like this: why do the logs show the partitions being reset to 0 on every restart, and why, in spite of that, are no messages redelivered to the KStream?
UPDATE 2:
I've simplified the scenario, this time with a native Kafka Streams application. The behaviour is exactly the same as observed with Spring Cloud Stream. However, after inspecting the consumer group and the partitions, I've found it kind of makes sense.
KStream:
fun main() {
    val props = Properties()
    props[StreamsConfig.APPLICATION_ID_CONFIG] = "streams-wordcount"
    props[StreamsConfig.BOOTSTRAP_SERVERS_CONFIG] = "localhost:9092"
    props[StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG] = 0
    props[StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG] = Serdes.String().javaClass.name
    props[StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG] = Serdes.String().javaClass.name

    val builder = StreamsBuilder()
    val source = builder.stream<String, String>("streams-plaintext-input")
    source.foreach { key, value -> println("$key $value") }

    val streams = KafkaStreams(builder.build(), props)
    val latch = CountDownLatch(1)

    // attach shutdown handler to catch control-c
    Runtime.getRuntime().addShutdownHook(object : Thread("streams-wordcount-shutdown-hook") {
        override fun run() {
            streams.close()
            latch.countDown()
        }
    })

    try {
        streams.start()
        latch.await()
    } catch (e: Throwable) {
        exitProcess(1)
    }
    exitProcess(0)
}
This is what I've seen:
1) With an empty topic, the startup shows a resetting of all partitions to offset 0:
07:55:03.885 [streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-2 to offset 0.
07:55:03.886 [streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-3 to offset 0.
07:55:03.886 [streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-0 to offset 0.
07:55:03.886 [streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-1 to offset 0.
07:55:03.886 [streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-4 to offset 0.
07:55:03.886 [streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-3549a54e-49db-4490-bd9f-7156e972021a-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-5 to offset 0
2) I put one message in the topic and inspect the consumer group, seeing that the record is in partition 4:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
streams-plaintext-input 0 - 0 - streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer-905a307a-4c49-4d8b-ac2e-5525ba2e8a8e /127.0.0.1 streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer
streams-plaintext-input 5 - 0 - streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer-905a307a-4c49-4d8b-ac2e-5525ba2e8a8e /127.0.0.1 streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer
streams-plaintext-input 1 - 0 - streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer-905a307a-4c49-4d8b-ac2e-5525ba2e8a8e /127.0.0.1 streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer
streams-plaintext-input 2 - 0 - streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer-905a307a-4c49-4d8b-ac2e-5525ba2e8a8e /127.0.0.1 streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer
streams-plaintext-input 3 - 0 - streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer-905a307a-4c49-4d8b-ac2e-5525ba2e8a8e /127.0.0.1 streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer
streams-plaintext-input 4 1 1 0 streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer-905a307a-4c49-4d8b-ac2e-5525ba2e8a8e /127.0.0.1 streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer
3) I restart the application. Now the resetting only affects the empty partitions (0, 1, 2, 3, 5):
07:57:39.477 [streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-2 to offset 0.
07:57:39.478 [streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-3 to offset 0.
07:57:39.478 [streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-0 to offset 0.
07:57:39.479 [streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-1 to offset 0.
07:57:39.479 [streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-b1565eca-7d80-4550-97d2-e78ead62a840-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-5 to offset 0.
4) I insert another message, inspect the consumer group state and the same thing happens: the record is in partition 2 and when restarting the application it only resets the empty partitions (0, 1, 3, 5):
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
streams-plaintext-input 0 - 0 - streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer-cb04e2bd-598f-455f-b913-1370b4144dd6 /127.0.0.1 streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer
streams-plaintext-input 5 - 0 - streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer-cb04e2bd-598f-455f-b913-1370b4144dd6 /127.0.0.1 streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer
streams-plaintext-input 1 - 0 - streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer-cb04e2bd-598f-455f-b913-1370b4144dd6 /127.0.0.1 streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer
streams-plaintext-input 2 1 1 0 streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer-cb04e2bd-598f-455f-b913-1370b4144dd6 /127.0.0.1 streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer
streams-plaintext-input 3 - 0 - streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer-cb04e2bd-598f-455f-b913-1370b4144dd6 /127.0.0.1 streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer
streams-plaintext-input 4 1 1 0 streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer-cb04e2bd-598f-455f-b913-1370b4144dd6 /127.0.0.1 streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer
08:00:42.313 [streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-3 to offset 0.
08:00:42.314 [streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-0 to offset 0.
08:00:42.314 [streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-1 to offset 0.
08:00:42.314 [streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=streams-wordcount-addb08ed-62ce-47f9-a446-f2ee0592c53d-StreamThread-1-consumer, groupId=streams-wordcount] Resetting offset for partition streams-plaintext-input-5 to offset 0.
Why are there two consumers configured?
The restore consumer client is a dedicated consumer used for fault tolerance and state management; it is responsible for restoring state from the changelog topics. It is displayed separately from the application consumer client. You can find more information here:
https://docs.confluent.io/current/streams/monitoring.html#kafka-restore-consumer-client-id
Why does the second one have auto.offset.reset = earliest when I haven't configured it explicitly and the Kafka default is latest?
You are right, the default value of auto.offset.reset in the Kafka consumer is latest. But in Spring Cloud Stream, the default value for the consumer startOffset is earliest, which is why the second consumer shows earliest. It also depends on the spring.cloud.stream.bindings.<channelName>.group binding: if the group is set explicitly, startOffset defaults to earliest; otherwise it defaults to latest for an anonymous consumer.
Reference: Spring Cloud Stream Kafka Consumer Properties
I want the default (auto.offset.reset = latest) behaviour and it seems to be working fine. However, doesn't it contradict what I see in the logs?
In the case of an anonymous consumer group, the default value for startOffset is latest.
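If you want to be explicit about the start position used when no committed offset exists, the binding's startOffset can be pinned in the configuration. A small sketch, assuming the Kafka Streams binder exposes the same startOffset consumer property as the regular Kafka binder:

spring:
  cloud:
    stream:
      kafka:
        streams:
          bindings:
            event:
              consumer:
                startOffset: latest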

Running docker compose causes "Connection to localhost:5432 refused." exception

I've looked at SO posts related to this question here, here, here, and here, but I haven't had any luck with the fixes proposed. Whenever I run the command docker-compose -f stack.yml up I receive the following stack trace:
Attaching to weg-api_db_1, weg-api_weg-api_1
db_1 | 2018-07-04 14:57:15.384 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
db_1 | 2018-07-04 14:57:15.384 UTC [1] LOG: listening on IPv6 address "::", port 5432
db_1 | 2018-07-04 14:57:15.388 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
db_1 | 2018-07-04 14:57:15.402 UTC [23] LOG: database system was interrupted; last known up at 2018-07-04 14:45:24 UTC
db_1 | 2018-07-04 14:57:15.513 UTC [23] LOG: database system was not properly shut down; automatic recovery in progress
db_1 | 2018-07-04 14:57:15.515 UTC [23] LOG: redo starts at 0/16341E0
db_1 | 2018-07-04 14:57:15.515 UTC [23] LOG: invalid record length at 0/1634218: wanted 24, got 0
db_1 | 2018-07-04 14:57:15.515 UTC [23] LOG: redo done at 0/16341E0
db_1 | 2018-07-04 14:57:15.525 UTC [1] LOG: database system is ready to accept connections
weg-api_1 |
weg-api_1 | . ____ _ __ _ _
weg-api_1 | /\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
weg-api_1 | ( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
weg-api_1 | \\/ ___)| |_)| | | | | || (_| | ) ) ) )
weg-api_1 | ' |____| .__|_| |_|_| |_\__, | / / / /
weg-api_1 | =========|_|==============|___/=/_/_/_/
weg-api_1 | :: Spring Boot :: (v1.5.3.RELEASE)
weg-api_1 |
weg-api_1 | 2018-07-04 14:57:16.908 INFO 7 --- [ main] api.ApiKt : Starting ApiKt v0.0.1-SNAPSHOT on f9c58f4f2f27 with PID 7 (/app/spring-jpa-postgresql-spring-boot-0.0.1-SNAPSHOT.jar started by root in /app)
weg-api_1 | 2018-07-04 14:57:16.913 INFO 7 --- [ main] api.ApiKt : No active profile set, falling back to default profiles: default
weg-api_1 | 2018-07-04 14:57:17.008 INFO 7 --- [ main] ationConfigEmbeddedWebApplicationContext : Refreshing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext#6e5e91e4: startup date [Wed Jul 04 14:57:17 GMT 2018]; root of context hierarchy
weg-api_1 | 2018-07-04 14:57:19.082 INFO 7 --- [ main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat initialized with port(s): 8080 (http)
weg-api_1 | 2018-07-04 14:57:19.102 INFO 7 --- [ main] o.apache.catalina.core.StandardService : Starting service Tomcat
weg-api_1 | 2018-07-04 14:57:19.104 INFO 7 --- [ main] org.apache.catalina.core.StandardEngine : Starting Servlet Engine: Apache Tomcat/8.5.14
weg-api_1 | 2018-07-04 14:57:19.215 INFO 7 --- [ost-startStop-1] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext
weg-api_1 | 2018-07-04 14:57:19.215 INFO 7 --- [ost-startStop-1] o.s.web.context.ContextLoader : Root WebApplicationContext: initialization completed in 2211 ms
weg-api_1 | 2018-07-04 14:57:19.370 INFO 7 --- [ost-startStop-1] o.s.b.w.servlet.ServletRegistrationBean : Mapping servlet: 'dispatcherServlet' to [/]
weg-api_1 | 2018-07-04 14:57:19.375 INFO 7 --- [ost-startStop-1] o.s.b.w.servlet.FilterRegistrationBean : Mapping filter: 'characterEncodingFilter' to: [/*]
weg-api_1 | 2018-07-04 14:57:19.376 INFO 7 --- [ost-startStop-1] o.s.b.w.servlet.FilterRegistrationBean : Mapping filter: 'hiddenHttpMethodFilter' to: [/*]
weg-api_1 | 2018-07-04 14:57:19.376 INFO 7 --- [ost-startStop-1] o.s.b.w.servlet.FilterRegistrationBean : Mapping filter: 'httpPutFormContentFilter' to: [/*]
weg-api_1 | 2018-07-04 14:57:19.376 INFO 7 --- [ost-startStop-1] o.s.b.w.servlet.FilterRegistrationBean : Mapping filter: 'requestContextFilter' to: [/*]
weg-api_1 | 2018-07-04 14:57:19.867 ERROR 7 --- [ main] o.a.tomcat.jdbc.pool.ConnectionPool : Unable to create initial connections of pool.
weg-api_1 |
weg-api_1 | org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
I thought my .yml file was brain-dead simple, but I must be missing something vital, since the internal routing between the two containers is failing.
EDIT
My stack.yml is below:
version: '3'
services:
  db:
    image: postgres
    restart: always
    container_name: db
    environment:
      POSTGRES_USER: root
      POSTGRES_PASSWORD: password
      POSTGRES_DB: weg
    ports:
      - "5432:5432"
  weg-api:
    image: weg-api
    restart: always
    container_name: weg-api
    ports:
      - "8080:8080"
    depends_on:
      - "db"
EDIT
My Springboot application properties are below:
spring.datasource.url=jdbc:postgresql://db:5432/weg
spring.datasource.username=root
spring.datasource.password=password
spring.jpa.generate-ddl=true
I'm at a loss as to how to proceed.
Your database is running in the db container, not on localhost inside your weg-api container. Therefore, you have to change
spring.datasource.url=jdbc:postgresql://localhost:5432/weg
to
spring.datasource.url=jdbc:postgresql://db:5432/weg
I would also suggest giving each of your containers a container_name so that the container names are always the same. Otherwise you might sometimes get different names depending on your configuration.
version: '3'
services:
  db:
    image: postgres
    restart: always
    container_name: db
    environment:
      POSTGRES_USER: root
      POSTGRES_PASSWORD: password
      POSTGRES_DB: weg
    ports:
      - "5432:5432"
  weg-api:
    image: weg-api
    restart: always
    container_name: weg-api
    ports:
      - "8080:8080"
    depends_on:
      - "db"