Debezium server error: java.lang.OutOfMemoryError: Java heap space - mongodb

Debezium server: v 1.9.0.Final
MongoDB Atlas: v 4.2.20
Running on AWS ECS with Fargate w/ 1GB CPU & 4GB MEMORY
Overview:
Debezium starts an initial snapshot and sends some data to Kinesis, but it runs into the error below before the snapshot finishes. I've tried increasing the container's memory to 4GB, but I'm not sure that's the issue. The one collection I'm testing with is 28GB total and 11M documents.
Debezium config (in Terraform):
environment = [
{
"name" : "DEBEZIUM_SINK_TYPE",
"value" : "kinesis"
},
{
"name" : "DEBEZIUM_SINK_KINESIS_REGION",
"value" : "us-east-1"
},
{
"name" : "DEBEZIUM_SINK_KINESIS_CREDENTIALS_PROFILE",
"value" : "default"
},
{
"name" : "DEBEZIUM_SINK_KINESIS_ENDPOINT",
"value" : "https://kinesis.us-east-1.amazonaws.com"
},
{
"name" : "DEBEZIUM_SOURCE_CONNECTOR_CLASS",
"value" : "io.debezium.connector.mongodb.MongoDbConnector"
},
{
"name" : "DEBEZIUM_SOURCE_OFFSET_STORAGE_FILE_FILENAME",
"value" : "data/offsets.dat"
},
{
"name" : "DEBEZIUM_SOURCE_OFFSET_FLUSH_INTERVAL_MS",
"value" : "0"
},
{
"name" : "DEBEZIUM_SOURCE_MONGODB_NAME",
"value" : "test"
},
{
"name" : "DEBEZIUM_SOURCE_MONGODB_HOSTS",
"value" : "test-mongodb-shard-00-00.test.mongodb.net:27017,test-mongodb-shard-00-01.test.mongodb.net:27017,test-mongodb-shard-00-02.test.mongodb.net:27017,test-mongodb-i-00-00.test.mongodb.net:27017"
},
{
"name" : "DEBEZIUM_SOURCE_MONGODB_SSL_ENABLED",
"value" : "true"
},
{
"name" : "DEBEZIUM_SOURCE_MONGODB_MEMBERS_AUTO_DISCOVER",
"value" : "true"
},
{
"name" : "DEBEZIUM_SOURCE_DATABASE_INCLUDE_LIST",
"value" : "test"
},
{
"name" : "DEBEZIUM_SOURCE_COLLECTION_INCLUDE_LIST",
"value" : "test.testCollection"
},
{
"name" : "DEBEZIUM_SOURCE_CAPTURE_MODE",
"value" : "change_streams_update_full"
},
{
"name" : "DEBEZIUM_SOURCE_DATABASE_HISTORY",
"value" : "io.debezium.relational.history.FileDatabaseHistory"
},
{
"name" : "DEBEZIUM_SOURCE_DATABASE_HISTORY_FILE_FILENAME",
"value" : "history.dat"
},
{
"name" : "QUARKUS_LOG_CONSOLE_JSON",
"value" : "false"
}
]
secrets = [
{
"name" : "DEBEZIUM_SOURCE_MONGODB_USER",
"valueFrom" : "${data.aws_secretsmanager_secret.test-debezium-read.arn}:username::"
},
{
"name" : "DEBEZIUM_SOURCE_MONGODB_PASSWORD",
"valueFrom" : "${data.aws_secretsmanager_secret.test-debezium-read.arn}:password::"
}
]
Stacktrace:
2022-06-01 18:22:23,976 ERROR [io.deb.con.mon.MongoDbSnapshotChangeEventSource] (debezium-mongodbconnector-test-replicator-snapshot-0) Error while attempting to sync 'test-mongodb-shard-0.test.testCollection': : java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:172)
at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:538)
at java.base/java.lang.StringBuffer.append(StringBuffer.java:317)
at java.base/java.io.StringWriter.write(StringWriter.java:106)
at org.bson.json.StrictCharacterStreamJsonWriter.write(StrictCharacterStreamJsonWriter.java:368)
at org.bson.json.StrictCharacterStreamJsonWriter.writeStartObject(StrictCharacterStreamJsonWriter.java:204)
at org.bson.json.LegacyExtendedJsonDateTimeConverter.convert(LegacyExtendedJsonDateTimeConverter.java:22)
at org.bson.json.LegacyExtendedJsonDateTimeConverter.convert(LegacyExtendedJsonDateTimeConverter.java:19)
at org.bson.json.JsonWriter.doWriteDateTime(JsonWriter.java:129)
at org.bson.AbstractBsonWriter.writeDateTime(AbstractBsonWriter.java:394)
at org.bson.codecs.DateCodec.encode(DateCodec.java:32)
at org.bson.codecs.DateCodec.encode(DateCodec.java:29)
at org.bson.codecs.EncoderContext.encodeWithChildContext(EncoderContext.java:91)
at org.bson.codecs.DocumentCodec.writeValue(DocumentCodec.java:203)
at org.bson.codecs.DocumentCodec.writeMap(DocumentCodec.java:217)
at org.bson.codecs.DocumentCodec.writeValue(DocumentCodec.java:200)
at org.bson.codecs.DocumentCodec.writeMap(DocumentCodec.java:217)
at org.bson.codecs.DocumentCodec.writeValue(DocumentCodec.java:200)
at org.bson.codecs.DocumentCodec.writeMap(DocumentCodec.java:217)
at org.bson.codecs.DocumentCodec.encode(DocumentCodec.java:159)
at org.bson.codecs.DocumentCodec.encode(DocumentCodec.java:46)
at org.bson.Document.toJson(Document.java:453)
at io.debezium.connector.mongodb.JsonSerialization.lambda$new$0(JsonSerialization.java:57)
at io.debezium.connector.mongodb.JsonSerialization$$Lambda$521/0x0000000840448840.apply(Unknown Source)
at io.debezium.connector.mongodb.JsonSerialization.getDocumentValue(JsonSerialization.java:89)
at io.debezium.connector.mongodb.MongoDbSchema$$Lambda$580/0x00000008404ce840.apply(Unknown Source)
at io.debezium.connector.mongodb.MongoDbCollectionSchema.valueFromDocumentOplog(MongoDbCollectionSchema.java:90)
at io.debezium.connector.mongodb.MongoDbChangeSnapshotOplogRecordEmitter.emitReadRecord(MongoDbChangeSnapshotOplogRecordEmitter.java:68)
at io.debezium.connector.mongodb.MongoDbChangeSnapshotOplogRecordEmitter.emitReadRecord(MongoDbChangeSnapshotOplogRecordEmitter.java:27)
at io.debezium.pipeline.AbstractChangeRecordEmitter.emitChangeRecords(AbstractChangeRecordEmitter.java:42)
at io.debezium.pipeline.EventDispatcher.dispatchSnapshotEvent(EventDispatcher.java:163)
I noticed that during the snapshot, the number of records sent and the last recorded offset don't seem to change, while the time elapsed between those messages keeps getting longer. This looks like some kind of exponential backoff, but I'm not entirely sure.
Example:
2022-06-01 16:20:37,789 INFO [io.deb.con.mon.MongoDbSnapshotChangeEventSource] (debezium-mongodbconnector-test-replicator-snapshot-0) Beginning snapshot of 'test-mongodb-shard-0' at {sec=1654100437, ord=138, initsync=true, h=0}
2022-06-01 16:20:37,804 INFO [io.deb.con.mon.MongoDbSnapshotChangeEventSource] (debezium-mongodbconnector-test-replicator-snapshot-0) Exporting data for collection 'test-mongodb-shard-0.test.testCollection'
2022-06-01 16:20:42,983 INFO [io.deb.con.com.BaseSourceTask] (pool-7-thread-1) 717 records sent during previous 00:00:06.159, last recorded offset: {sec=1654100437, ord=138, initsync=true, h=0}
2022-06-01 16:20:57,417 INFO [io.deb.con.com.BaseSourceTask] (pool-7-thread-1) 2048 records sent during previous 00:00:14.434, last recorded offset: {sec=1654100437, ord=138, initsync=true, h=0}
2022-06-01 16:21:05,107 INFO [io.deb.con.mon.ReplicaSetDiscovery] (debezium-mongodbconnector-test-replica-set-monitor) Checking current members of replica set at test-mongodb-shard-00-00.test.mongodb.net:27017,test-mongodb-shard-00-01.test.mongodb.net:27017,test-mongodb-shard-00-02.test.mongodb.net:27017,test-mongodb-i-00-00.test.mongodb.net:27017
2022-06-01 16:21:16,624 INFO [io.deb.con.com.BaseSourceTask] (pool-7-thread-1) 2048 records sent during previous 00:00:19.207, last recorded offset: {sec=1654100437, ord=138, initsync=true, h=0}
2022-06-01 16:21:35,107 INFO [io.deb.con.mon.ReplicaSetDiscovery] (debezium-mongodbconnector-test-replica-set-monitor) Checking current members of replica set at test-mongodb-shard-00-00.test.mongodb.net:27017,test-mongodb-shard-00-01.test.mongodb.net:27017,test-mongodb-shard-00-02.test.mongodb.net:27017,test-mongodb-i-00-00.test.mongodb.net:27017
2022-06-01 16:21:53,130 INFO [io.deb.con.com.BaseSourceTask] (pool-7-thread-1) 2048 records sent during previous 00:00:36.505, last recorded offset: {sec=1654100437, ord=138, initsync=true, h=0}
2022-06-01 16:22:05,107 INFO [io.deb.con.mon.ReplicaSetDiscovery] (debezium-mongodbconnector-test-replica-set-monitor) Checking current members of replica set at test-mongodb-shard-00-00.test.mongodb.net:27017,test-mongodb-shard-00-01.test.mongodb.net:27017,test-mongodb-shard-00-02.test.mongodb.net:27017,test-mongodb-i-00-00.test.mongodb.net:27017
...
2022-06-01 16:23:17,521 INFO [io.deb.con.com.BaseSourceTask] (pool-7-thread-1) 2048 records sent during previous 00:01:24.391, last recorded offset: {sec=1654100437, ord=138, initsync=true, h=0}
2022-06-01 16:23:35,106 INFO [io.deb.con.mon.ReplicaSetDiscovery] (debezium-mongodbconnector-test-replica-set-monitor) Checking current members of replica set at test-mongodb-shard-00-00.test.mongodb.net:27017,test-mongodb-shard-00-01.test.mongodb.net:27017,test-mongodb-shard-00-02.test.mongodb.net:27017,test-mongodb-i-00-00.test.mongodb.net:27017
...
2022-06-01 16:26:06,523 INFO [io.deb.con.com.BaseSourceTask] (pool-7-thread-1) 2048 records sent during previous 00:02:49.003, last recorded offset: {sec=1654100437, ord=138, initsync=true, h=0}
2022-06-01 16:26:35,107 INFO [io.deb.con.mon.ReplicaSetDiscovery] (debezium-mongodbconnector-test-replica-set-monitor) Checking current members of replica set at test-mongodb-shard-00-00.test.mongodb.net:27017,test-mongodb-shard-00-01.test.mongodb.net:27017,test-mongodb-shard-00-02.test.mongodb.net:27017,test-mongodb-i-00-00.test.mongodb.net:27017
...
2022-06-01 16:31:18,075 INFO [io.deb.con.com.BaseSourceTask] (pool-7-thread-1) 2048 records sent during previous 00:05:11.552, last recorded offset: {sec=1654100437, ord=138, initsync=true, h=0}
2022-06-01 16:31:35,106 INFO [io.deb.con.mon.ReplicaSetDiscovery] (debezium-mongodbconnector-test-replica-set-monitor) Checking current members of replica set at test-mongodb-shard-00-00.test.mongodb.net:27017,test-mongodb-shard-00-01.test.mongodb.net:27017,test-mongodb-shard-00-02.test.mongodb.net:27017,test-mongodb-i-00-00.test.mongodb.net:27017
...
2022-06-01 16:42:07,711 INFO [io.deb.con.com.BaseSourceTask] (pool-7-thread-1) 2048 records sent during previous 00:10:49.636, last recorded offset: {sec=1654100437, ord=138, initsync=true, h=0}
2022-06-01 16:42:35,106 INFO [io.deb.con.mon.ReplicaSetDiscovery] (debezium-mongodbconnector-test-replica-set-monitor) Checking current members of replica set at test-mongodb-shard-00-00.test.mongodb.net:27017,test-mongodb-shard-00-01.test.mongodb.net:27017,test-mongodb-shard-00-02.test.mongodb.net:27017,test-mongodb-i-00-00.test.mongodb.net:27017
...
2022-06-01 17:03:12,872 INFO [io.deb.con.com.BaseSourceTask] (pool-7-thread-1) 2048 records sent during previous 00:21:05.161, last recorded offset: {sec=1654100437, ord=138, initsync=true, h=0}
2022-06-01 17:03:35,117 INFO [io.deb.con.mon.ReplicaSetDiscovery] (debezium-mongodbconnector-test-replica-set-monitor) Checking current members of replica set at test-mongodb-shard-00-00.test.mongodb.net:27017,test-mongodb-shard-00-01.test.mongodb.net:27017,test-mongodb-shard-00-02.test.mongodb.net:27017,test-mongodb-i-00-00.test.mongodb.net:27017
...
2022-06-01 17:45:58,637 INFO [io.deb.con.com.BaseSourceTask] (pool-7-thread-1) 2048 records sent during previous 00:42:45.765, last recorded offset: {sec=1654100437, ord=138, initsync=true, h=0}
2022-06-01 17:46:05,106 INFO [io.deb.con.mon.ReplicaSetDiscovery] (debezium-mongodbconnector-test-replica-set-monitor) Checking current members of replica set at test-mongodb-shard-00-00.test.mongodb.net:27017,test-mongodb-shard-00-01.test.mongodb.net:27017,test-mongodb-shard-00-02.test.mongodb.net:27017,test-mongodb-i-00-00.test.mongodb.net:27017
...
2022-06-01 18:22:23,976 ERROR [io.deb.con.mon.MongoDbSnapshotChangeEventSource] (debezium-mongodbconnector-test-replicator-snapshot-0) Error while attempting to sync 'test-mongodb-shard-0.test.testCollection': : java.lang.OutOfMemoryError: Java heap space

Besides increasing the container memory to 4GB, you can also set a bigger heap size; the initial and maximum heap size can be set, for example, to 2GB:
-Xms2048m -Xmx2048m
If the issue continues, follow these steps:
Start the JVM with the argument -XX:+HeapDumpOnOutOfMemoryError; this will give you a heap dump when the program runs into OOM.
Use a tool like VisualVM to analyze the heap dump obtained. That will help in identifying the memory leak.
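If you are running the stock debezium/server container image, one way to pass these JVM flags is through an environment variable that the image's launch script forwards to the JVM, typically JAVA_OPTS (check your image's entrypoint; treat this as an assumption). In the Terraform task definition above, that would be one more entry in the environment list, for example:
{
  "name" : "JAVA_OPTS",
  "value" : "-Xms2048m -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
}
With 4GB of container memory, a 2GB heap still leaves headroom for metaspace, direct buffers, and the rest of the process.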

Not really answering the OP, just wanted to share my experience.
I too occasionally receive java.lang.OutOfMemoryError and would like to find out what's causing it.
My setup:
Debezium 1.9.5
Kafka 2.8
Docker container memory - 6Gi
Java heap - 4Gi both min and max
max.queue.size.in.bytes - 512Mi
max.batch.size - 16384
The errors from stdout:
2022-07-20T16:47:16.348943181Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "KafkaBasedLog Work Thread - cdc.config"
2022-07-20T16:47:27.628395682Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "KafkaBasedLog Work Thread - cdc.status"
2022-07-20T16:47:28.970536167Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "kafka-producer-network-thread | connector-producer-REDACTED-202207200823-0"
2022-07-20T16:47:33.787361085Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "kafka-producer-network-thread | producer-3"
2022-07-20T16:47:45.067373810Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "server-timer"
2022-07-20T16:47:46.987669188Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "kafka-producer-network-thread | REDACTED-dbhistory"
2022-07-20T16:48:03.396881812Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "kafka-producer-network-thread | producer-2"
2022-07-20T16:48:04.017710798Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "kafka-coordinator-heartbeat-thread | production"
2022-07-20T16:48:09.709036280Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "prometheus-http-1-3"
2022-07-20T16:48:14.667691706Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "mysql-cj-abandoned-connection-cleanup"
2022-07-20T16:48:17.182623196Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp1890777616-62"
2022-07-20T16:48:25.227925660Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp1890777616-58"
2022-07-20T16:48:43.598026645Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "HTTP-Dispatcher"
2022-07-20T16:48:45.543984655Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "kafka-producer-network-thread | producer-1"
2022-07-20T16:48:52.284810255Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "SourceTaskOffsetCommitter-1"
2022-07-20T16:48:56.992674380Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "DistributedHerder-connect-1-1"
2022-07-20T16:49:18.691603140Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Session-HouseKeeper-47a3d56a-1"
2022-07-20T16:49:19.350459393Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "KafkaBasedLog Work Thread - cdc.offset"
2022-07-20T16:49:26.256350455Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "kafka-admin-client-thread | adminclient-8"
2022-07-20T16:49:33.154845201Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp1890777616-59"
2022-07-20T16:49:34.414951745Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp1890777616-60"
2022-07-20T16:49:40.871967276Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp1890777616-61"
2022-07-20T16:49:56.007111292Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "debezium-sqlserverconnector-REDACTED-change-event-source-coordinator"
2022-07-20T16:50:00.410800756Z Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "kafka-admin-client-thread | connector-adminclient-REDACTED-0"
Some context:
I transfer a table with rows of ~1000 bytes each. There is a useful max.queue.size.in.bytes setting which I expected to act as an upper bound for the connector, but my dashboard shows that the queue size only goes as high as 122Mi, fluctuating between 30Mi and 60Mi most of the time. By my calculation, 1000 * 16384 should never come anywhere near 512Mi.
In this particular case, the connector had been working normally for about 3 hours, and then there was no new data to stream for a few minutes. Soon after that, the OutOfMemoryError appeared.
I also notice that every time it happens, the container uses the maximum CPU it is allowed (4 cores in my case).

Related

Mongock does not run changeunit in kotlin project

I have a Java + Maven project - https://github.com/petersuchy/mongock-test-java - based on the Mongock reactive example (https://github.com/mongock/mongock-examples/tree/master/mongodb/springboot-reactive), and everything works well.
I tried to migrate that project to Kotlin + Gradle - https://github.com/petersuchy/mongock-test-kotlin
I am able to run it, but my ChangeUnit is ignored. Mongock itself is set up properly, because in the end I have the 2 collections created: mongockLock and mongockChangeLog.
I tried to get rid of the @Value annotations in MongockConfig and MongoClientConfig, but there was no change in behaviour.
Can you please point out why this is happening? I think it might be something to do with these Reflections, because that is the only difference in the logs.
Kotlin:
2023-02-12T00:49:58.455+01:00 INFO 80854 --- [ main] i.m.r.c.e.system.SystemUpdateExecutor : Mongock has finished the system update execution
2023-02-12T00:49:58.457+01:00 INFO 80854 --- [ main] org.reflections.Reflections : Reflections took 0 ms to scan 0 urls, producing 0 keys and 0 values
2023-02-12T00:49:58.458+01:00 INFO 80854 --- [ main] org.reflections.Reflections : Reflections took 1 ms to scan 0 urls, producing 0 keys and 0 values
2023-02-12T00:49:58.458+01:00 INFO 80854 --- [ main] i.m.r.c.e.o.migrate.MigrateExecutorBase : Mongock skipping the data migration. There is no change set item.
Java:
2023-02-12T00:29:48.064+01:00 INFO 78548 --- [ main] i.m.r.c.e.system.SystemUpdateExecutor : Mongock has finished the system update execution
2023-02-12T00:29:48.072+01:00 INFO 78548 --- [ main] org.reflections.Reflections : Reflections took 6 ms to scan 1 urls, producing 1 keys and 2 values
2023-02-12T00:29:48.075+01:00 INFO 78548 --- [ main] org.reflections.Reflections : Reflections took 3 ms to scan 1 urls, producing 1 keys and 2 values
2023-02-12T00:29:48.081+01:00 INFO 78548 --- [ main] i.m.driver.core.lock.LockManagerDefault : Mongock trying to acquire the lock
Here is the full log from the Kotlin project:
2023-02-12T00:49:55.863+01:00 INFO 80854 --- [ main] c.e.m.MongockTestKotlinApplicationKt : No active profile set, falling back to 1 default profile: "default"
2023-02-12T00:49:56.764+01:00 INFO 80854 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data Reactive MongoDB repositories in DEFAULT mode.
2023-02-12T00:49:57.019+01:00 INFO 80854 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 246 ms. Found 1 Reactive MongoDB repository interfaces.
2023-02-12T00:49:57.919+01:00 INFO 80854 --- [ main] org.mongodb.driver.client : MongoClient with metadata {"driver": {"name": "mongo-java-driver|reactive-streams|spring-boot", "version": "4.8.2"}, "os": {"type": "Linux", "name": "Linux", "architecture": "amd64", "version": "5.15.0-60-generic"}, "platform": "Java/Private Build/17.0.5+8-Ubuntu-2ubuntu122.04"} created with settings MongoClientSettings{readPreference=primary, writeConcern=WriteConcern{w=null, wTimeout=null ms, journal=null}, retryWrites=true, retryReads=true, readConcern=ReadConcern{level=null}, credential=null, streamFactoryFactory=NettyStreamFactoryFactory{eventLoopGroup=io.netty.channel.nio.NioEventLoopGroup#631cb129, socketChannelClass=class io.netty.channel.socket.nio.NioSocketChannel, allocator=PooledByteBufAllocator(directByDefault: true), sslContext=null}, commandListeners=[], codecRegistry=ProvidersCodecRegistry{codecProviders=[ValueCodecProvider{}, BsonValueCodecProvider{}, DBRefCodecProvider{}, DBObjectCodecProvider{}, DocumentCodecProvider{}, CollectionCodecProvider{}, IterableCodecProvider{}, MapCodecProvider{}, GeoJsonCodecProvider{}, GridFSFileCodecProvider{}, Jsr310CodecProvider{}, JsonObjectCodecProvider{}, BsonCodecProvider{}, EnumCodecProvider{}, com.mongodb.Jep395RecordCodecProvider#3d20e575]}, clusterSettings={hosts=[localhost:27017], srvServiceName=mongodb, mode=SINGLE, requiredClusterType=UNKNOWN, requiredReplicaSetName='null', serverSelector='null', clusterListeners='[]', serverSelectionTimeout='30000 ms', localThreshold='30000 ms'}, socketSettings=SocketSettings{connectTimeoutMS=10000, readTimeoutMS=0, receiveBufferSize=0, sendBufferSize=0}, heartbeatSocketSettings=SocketSettings{connectTimeoutMS=10000, readTimeoutMS=10000, receiveBufferSize=0, sendBufferSize=0}, connectionPoolSettings=ConnectionPoolSettings{maxSize=100, minSize=0, maxWaitTimeMS=120000, maxConnectionLifeTimeMS=0, maxConnectionIdleTimeMS=0, maintenanceInitialDelayMS=0, maintenanceFrequencyMS=60000, connectionPoolListeners=[], maxConnecting=2}, serverSettings=ServerSettings{heartbeatFrequencyMS=10000, minHeartbeatFrequencyMS=500, serverListeners='[]', serverMonitorListeners='[]'}, sslSettings=SslSettings{enabled=false, invalidHostNameAllowed=false, context=null}, applicationName='null', compressorList=[], uuidRepresentation=JAVA_LEGACY, serverApi=null, autoEncryptionSettings=null, contextProvider=null}
2023-02-12T00:49:57.968+01:00 INFO 80854 --- [ main] i.m.r.core.builder.RunnerBuilderBase : Mongock runner COMMUNITY version[5.2.2]
2023-02-12T00:49:57.970+01:00 INFO 80854 --- [ main] i.m.r.core.builder.RunnerBuilderBase : Running Mongock with NO metadata
2023-02-12T00:49:58.034+01:00 INFO 80854 --- [localhost:27017] org.mongodb.driver.cluster : Monitor thread successfully connected to server with description ServerDescription{address=localhost:27017, type=REPLICA_SET_PRIMARY, state=CONNECTED, ok=true, minWireVersion=0, maxWireVersion=17, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=62515465, setName='myReplicaSet', canonicalAddress=mongo1:27017, hosts=[mongo3:27017, mongo2:27017, mongo1:27017], passives=[], arbiters=[], primary='mongo1:27017', tagSet=TagSet{[]}, electionId=7fffffff0000000000000002, setVersion=1, topologyVersion=TopologyVersion{processId=63e7c5a7d11b71e048698dab, counter=6}, lastWriteDate=Sun Feb 12 00:49:53 CET 2023, lastUpdateTimeNanos=45870970528894}
2023-02-12T00:49:58.336+01:00 INFO 80854 --- [ main] org.reflections.Reflections : Reflections took 33 ms to scan 1 urls, producing 2 keys and 2 values
2023-02-12T00:49:58.343+01:00 INFO 80854 --- [ main] org.reflections.Reflections : Reflections took 2 ms to scan 1 urls, producing 2 keys and 2 values
2023-02-12T00:49:58.367+01:00 INFO 80854 --- [ main] i.m.driver.core.lock.LockManagerDefault : Mongock trying to acquire the lock
2023-02-12T00:49:58.400+01:00 INFO 80854 --- [ main] i.m.driver.core.lock.LockManagerDefault : Mongock acquired the lock until: Sun Feb 12 00:50:58 CET 2023
2023-02-12T00:49:58.401+01:00 INFO 80854 --- [ Thread-1] i.m.driver.core.lock.LockManagerDefault : Starting mongock lock daemon...
2023-02-12T00:49:58.404+01:00 INFO 80854 --- [ main] i.m.r.c.e.system.SystemUpdateExecutor : Mongock starting the system update execution id[2023-02-12T00:49:57.955733372-712]...
2023-02-12T00:49:58.408+01:00 INFO 80854 --- [ main] i.m.r.c.executor.ChangeLogRuntimeImpl : method[io.mongock.runner.core.executor.system.changes.SystemChangeUnit00001] with arguments: []
2023-02-12T00:49:58.411+01:00 INFO 80854 --- [ main] i.m.r.c.executor.ChangeLogRuntimeImpl : method[beforeExecution] with arguments: [io.mongock.driver.mongodb.reactive.repository.MongoReactiveChangeEntryRepository]
2023-02-12T00:49:58.413+01:00 INFO 80854 --- [ main] i.m.r.core.executor.ChangeExecutorBase : APPLIED - {"id"="system-change-00001_before", "type"="before-execution", "author"="mongock", "class"="SystemChangeUnit00001", "method"="beforeExecution"}
2023-02-12T00:49:58.425+01:00 INFO 80854 --- [ main] i.m.r.c.executor.ChangeLogRuntimeImpl : method[execution] with arguments: [io.mongock.driver.mongodb.reactive.repository.MongoReactiveChangeEntryRepository]
2023-02-12T00:49:58.429+01:00 INFO 80854 --- [ main] i.m.r.core.executor.ChangeExecutorBase : APPLIED - {"id"="system-change-00001", "type"="execution", "author"="mongock", "class"="SystemChangeUnit00001", "method"="execution"}
2023-02-12T00:49:58.447+01:00 INFO 80854 --- [ main] i.m.driver.core.lock.LockManagerDefault : Mongock waiting to release the lock
2023-02-12T00:49:58.447+01:00 INFO 80854 --- [ main] i.m.driver.core.lock.LockManagerDefault : Mongock releasing the lock
2023-02-12T00:49:58.455+01:00 INFO 80854 --- [ main] i.m.driver.core.lock.LockManagerDefault : Mongock released the lock
2023-02-12T00:49:58.455+01:00 INFO 80854 --- [ main] i.m.r.c.e.system.SystemUpdateExecutor : Mongock has finished the system update execution
2023-02-12T00:49:58.457+01:00 INFO 80854 --- [ main] org.reflections.Reflections : Reflections took 0 ms to scan 0 urls, producing 0 keys and 0 values
2023-02-12T00:49:58.458+01:00 INFO 80854 --- [ main] org.reflections.Reflections : Reflections took 1 ms to scan 0 urls, producing 0 keys and 0 values
2023-02-12T00:49:58.458+01:00 INFO 80854 --- [ main] i.m.r.c.e.o.migrate.MigrateExecutorBase : Mongock skipping the data migration. There is no change set item.
2023-02-12T00:49:58.458+01:00 INFO 80854 --- [ main] i.m.r.c.e.o.migrate.MigrateExecutorBase : Mongock has finished
2023-02-12T00:49:59.190+01:00 INFO 80854 --- [ main] o.s.b.web.embedded.netty.NettyWebServer : Netty started on port 8080
2023-02-12T00:49:59.201+01:00 INFO 80854 --- [ main] c.e.m.MongockTestKotlinApplicationKt : Started MongockTestKotlinApplicationKt in 4.086 seconds (process running for 4.773)
The problem is in your application.yaml. The migration-scan-package name is wrong; that's the reason Mongock doesn't find any ChangeUnit.
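For reference, the Spring Boot property Mongock reads is mongock.migration-scan-package, and it must point at the package that actually contains your Kotlin ChangeUnits. A minimal application.yaml sketch (the package name below is only illustrative - use the real one from your project):
mongock:
  migration-scan-package:
    - com.example.mongocktestkotlin.changeunits
When the package listed there doesn't exist, the Reflections scan has nothing to look at, which is exactly what the "0 urls, producing 0 keys and 0 values" lines in your Kotlin log show.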

Get the last 5 minutes from log file

I'm trying to get the last 5 minutes from a log file where the time format is like this:
2021-08-10 16:05:00,007 ERROR [com.] Cought an Exception (el) Index: 1, Size: 0
2021-08-10 16:05:00,018 ERROR [com.] Cought an Exception (el) Index: 2, Size: 0
2021-08-10 16:10:00,005 ERROR [com.] Cought an Exception (el) Index: 2, Size: 0
2021-08-10 16:15:00,002 ERROR [com.] Cought an Exception (el) Index: 1, Size: 0
2021-08-10 16:15:00,014 ERROR [com.] Cought an Exception (el) Index: 2, Size: 0
2021-08-10 16:50:00,008 ERROR [com.] Cought an Exception (el) Index: 2, Size: 0
I have tried the sed command below, which works on another format, but I get nothing returned when I run it here.
sed -n "/^$(date --date='5 minutes ago' '+%Y-%m-%d %H:%M')/,\$p" logfile
I've tried many different commands but cannot find one that works for this time format: 2021-08-10 16:50:00,008.
Any help is really appreciated.
I'm now also trying awk, but it picks up some random lines which are not from the last minutes but from elsewhere in the logfile (I only want the last X minutes):
awk -v dt="$(date '+%Y-%m-%d %T,%3N' -d '-30 minutes')" '$1 " " $2 > dt' logfile
Raw Trap [OID=.2.6.1.4.1.5.1.2.4...
Raw Trap [OID=.2.6.1.4.1.5.1.2.4...
Raw Trap [OID=.2.6.1.4.1.5.1.2.14...
Raw Trap [OID=.2.6.1.4.1.5.1.2.14,...
Raw Trap [OID=.2.6.1.4.1.5.1.2.14...
Raw Trap [OID=.2.6.1.4.1.5.1.2.14...
Raw Trap [OID=.2.6.1.4.1.5.1.2.14,...
Raw Trap [OID=.2.6.1.4.1.5.1.2.14,....
2021-08-11 14:00:00,010 ERROR [com.] Cought an IOException (el) Index: 2, Size: 0
2021-08-11 14:05:00,010 ERROR [com.] Cought an IOException (el) Index: 2, Size: 0
More input from the logfile:
2021-08-06 14:24:16,137 INFO [com.] Invoking Rule: (class path...)
Raw Trap [OID=.1.3.6.1.2.14, Sender=v, VarBind OID=.1.3.6.3, VarBind Value=0] --> Clean Trap [........
2021-08-06 14:24:16,137 INFO [com.] InstanceStr=null
2021-08-06 14:24:16,137 INFO [com.] site_InstanceStr=null
2021-08-06 14:24:16,141 INFO [com.] deleting raw trap com.RawTrapid: 7291, oid: .1.3.6.1.4, time: 2021/08/06 14:2...
2021-08-07 00:30:12,495 INFO [com.] Exporting with user Admin
2021-08-07 00:30:12,511 INFO [com.] preparing for bundle export; retrieving all parameters with bundle id matching:
2021-08-07 00:31:07,538 INFO [com.] Exporting with user Admin
2021-08-07 00:31:07,538 INFO [com.] preparing for bundle export; retrieving all parameters with bundle id matching:
2021-08-07 00:31:07,573 INFO [com.] parameters retrieved: 1001
2021-08-07 00:31:07,573 INFO [com.] creating temp bundle export directory in: /var/tmp/_export2021/fullversion
2021-08-07 00:31:07,914 INFO [com.] Deleting temp dir: /var/tmp/_export_Sat_Aug_07_00_31_07_CEST_2021
2021-08-07 04:00:00,115 ERROR [com.] Partition maintenance failed: exit code=1, output: ERROR 1507 (HY000) at line 26: Error in list of partitions to DROP
You can compute the date in the required format in the shell and pass it as an argument to awk to compare:
awk -v dt="$(date '+%Y-%m-%d %T' -d '-5 minutes')" -F, '$1 > dt' file.log
If you want to include the millisecond part as well, then use:
awk -v dt="$(date '+%Y-%m-%d %T,%3N' -d '-5 minutes')" '($1 " " $2) > dt' file
PS: This requires GNU date.
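Note that the log also contains lines that do not start with a timestamp (the Raw Trap ... lines); those compare lexically greater than the date string, which is why they show up as the "random lines" mentioned in the question. A variant that additionally requires a leading date (a sketch - adjust the pattern if your entries can start with something other than a 20xx year):
awk -v dt="$(date '+%Y-%m-%d %T' -d '-5 minutes')" -F, '/^20[0-9][0-9]-/ && $1 > dt' file.log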

Does cascade happen in 1 transaction?

I save the Product, which cascade-persists the ProductMaterial. However, when the ProductMaterial insert throws a DataIntegrityViolationException, the Product is rolled back too, which looks like the cascade is done in one transaction, but I can't find any docs saying that it is. Can someone clarify this for me?
NOTE: I DO NOT use @Transactional
Material material = new Material();
material.setId(1);
Product newProduct = new Product();
ProductMaterial productMaterial = new ProductMaterial();
newProduct.setName("bàn chải");
newProduct.setPrice(1000);
newProduct.setCreatedAt(new Date());
newProduct.setProductMaterials(Collections.singletonList(productMaterial));
productMaterial.setProduct(newProduct);
productMaterial.setMaterial(material);
productRepository.save(newProduct);
Here is the hibernate execution:
Hibernate:
/* insert com.vietnam.hanghandmade.entities.Product
*/ insert
into
product
(created_at, name, price, id)
values
(?, ?, ?, ?)
2020-11-10 14:55:38.281 TRACE 65729 --- [nio-8080-exec-2] o.h.type.descriptor.sql.BasicBinder : binding parameter [1] as [TIMESTAMP] - [Tue Nov 10 14:55:38 JST 2020]
2020-11-10 14:55:38.281 TRACE 65729 --- [nio-8080-exec-2] o.h.type.descriptor.sql.BasicBinder : binding parameter [2] as [VARCHAR] - [bàn chải]
2020-11-10 14:55:38.281 TRACE 65729 --- [nio-8080-exec-2] o.h.type.descriptor.sql.BasicBinder : binding parameter [3] as [INTEGER] - [1000]
2020-11-10 14:55:38.281 TRACE 65729 --- [nio-8080-exec-2] o.h.type.descriptor.sql.BasicBinder : binding parameter [4] as [OTHER] - [e5729490-a0f8-48e7-9600-eeeba8b8f279]
Hibernate:
/* insert com.vietnam.hanghandmade.entities.ProductMaterial
*/ insert
into
product_material
(material_id, product_id)
values
(?, ?)
2020-11-10 14:55:38.324 TRACE 65729 --- [nio-8080-exec-2] o.h.type.descriptor.sql.BasicBinder : binding parameter [1] as [INTEGER] - [1]
2020-11-10 14:55:38.324 TRACE 65729 --- [nio-8080-exec-2] o.h.type.descriptor.sql.BasicBinder : binding parameter [2] as [OTHER] - [e5729490-a0f8-48e7-9600-eeeba8b8f279]
2020-11-10 14:55:38.328 WARN 65729 --- [nio-8080-exec-2] o.h.engine.jdbc.spi.SqlExceptionHelper : SQL Error: 0, SQLState: 23503
2020-11-10 14:55:38.328 ERROR 65729 --- [nio-8080-exec-2] o.h.engine.jdbc.spi.SqlExceptionHelper : ERROR: insert or update on table "product_material" violates foreign key constraint "product_material_material_id_fkey"
Detail: Key (material_id)=(1) is not present in table "material".
NOTE: This answer missed the point of the question, which is about “cascading persist” – it talks about “cascading delete” for foreign keys.
The cascading delete or update is part of the action of the system trigger that implements foreign key constraints, and as such it runs in the same transaction as the triggering statement.
I cannot find a place in the fine manual that spells this out, but it is obvious if you think about it: if the cascading delete were run in a separate transaction, it would be possible that the delete succeeds and the cascading delete fails, which would render the database inconsistent and is consequently not an option.
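On the JPA/Spring side the observed behaviour is expected as well: Spring Data JPA's built-in save() runs in a transaction (SimpleJpaRepository declares @Transactional on its write methods), so the Product insert and the cascaded ProductMaterial insert are flushed inside that single transaction and roll back together when the foreign key violation occurs. A minimal sketch that only makes the transaction boundary explicit (the service class is illustrative; Product and ProductRepository are the ones from the question):
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ProductService {

    private final ProductRepository productRepository;

    public ProductService(ProductRepository productRepository) {
        this.productRepository = productRepository;
    }

    // Even without this annotation, SimpleJpaRepository.save() opens (or joins) a
    // transaction; the cascaded ProductMaterial insert shares it, so a
    // DataIntegrityViolationException rolls back both inserts together.
    @Transactional
    public Product create(Product newProduct) {
        return productRepository.save(newProduct);
    }
}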

create tablespace problem in db2 HADR environment

We have Db2 10.5.0.7 on centos 6.9 and TSAMP 3.2 as our high availability solution, when we create a tablespace in primary database we encounter the following errors in the standby:
2019-08-31-08.47.32.164952+270 I87056E2779          LEVEL: Error (OS)
PID     : 4046                 TID : 47669095425792 PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : SAMDB
APPHDL  : 0-8                  APPID: *LOCAL.DB2.190725231126
HOSTNAME: samdb-b
EDUID   : 155                  EDUNAME: db2redom (SAMDB) 0
FUNCTION: DB2 Common, OSSe, ossGetDiskInfo, probe:130
MESSAGE : ECF=0x90000001=-1879048191=ECF_ACCESS_DENIED
          Access denied
CALLED  : OS, -, fopen         OSERR: EACCES (13)
DATA #1 : String, 12 bytes
          /proc/mounts
DATA #2 : String, 25 bytes
          /dbdata1/samdbTsContainer
DATA #3 : unsigned integer, 8 bytes

2019-08-31-08.47.32.185625+270 E89836E494           LEVEL: Error
PID     : 4046                 TID : 47669095425792 PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : SAMDB
APPHDL  : 0-8                  APPID: *LOCAL.DB2.190725231126
HOSTNAME: samdb-b
EDUID   : 155                  EDUNAME: db2redom (SAMDB) 0
FUNCTION: DB2 UDB, high avail services, sqlhaGetLocalDiskInfo, probe:9433
MESSAGE : ECF=0x90000001=-1879048191=ECF_ACCESS_DENIED
          Access denied

2019-08-31-08.47.32.186258+270 E90331E484           LEVEL: Error
PID     : 4046                 TID : 47669095425792 PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : SAMDB
APPHDL  : 0-8                  APPID: *LOCAL.DB2.190725231126
HOSTNAME: samdb-b
EDUID   : 155                  EDUNAME: db2redom (SAMDB) 0
FUNCTION: DB2 UDB, high avail services, sqlhaCreateMount, probe:9746
RETCODE : ZRC=0x827300AA=-2106392406=HA_ZRC_FAILED "SQLHA API call error"

2019-08-31-08.47.32.186910+270 I90816E658           LEVEL: Error
PID     : 4046                 TID : 47669095425792 PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : SAMDB
APPHDL  : 0-8                  APPID: *LOCAL.DB2.190725231126
HOSTNAME: samdb-b
EDUID   : 155                  EDUNAME: db2redom (SAMDB) 0
FUNCTION: DB2 UDB, buffer pool services, sqlbDMSAddContainerRequest, probe:812
MESSAGE : ZRC=0x827300AA=-2106392406=HA_ZRC_FAILED "SQLHA API call error"
DATA #1 : String, 36 bytes
          Cluster add mount operation failed:
DATA #2 : String, 37 bytes
          /dbdata1/samdbTsContainer/TSPKGCACH.1
DATA #3 : String, 8 bytes
          SAMDB

2019-08-31-08.47.32.190537+270 E113909E951          LEVEL: Error
PID     : 4046                 TID : 47669095425792 PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : SAMDB
APPHDL  : 0-8                  APPID: *LOCAL.DB2.190725231126
HOSTNAME: samdb-b
EDUID   : 155                  EDUNAME: db2redom (SAMDB) 0
FUNCTION: DB2 UDB, buffer pool services, sqlblog_reCreatePool, probe:3134
MESSAGE : ADM6106E Table space "TSPKGCACH" (ID = "49") could not be created
          during the rollforward operation. The most likely cause is that there
          is not enough space to create the containers associated with the
          table space. Connect to the database after the rollforward operation
          completes and use the SET TABLESPACE CONTAINERS command to assign
          containers to the table space. Then, issue another ROLLFORWARD
          DATABASE command to complete recovery of this table space.

2019-08-31-08.47.32.200949+270 E114861E592          LEVEL: Error
PID     : 4046                 TID : 47669095425792 PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : SAMDB
APPHDL  : 0-8                  APPID: *LOCAL.DB2.190725231126
HOSTNAME: samdb-b
EDUID   : 155                  EDUNAME: db2redom (SAMDB) 0
FUNCTION: DB2 UDB, buffer pool services, sqlbIncPoolState, probe:4628
MESSAGE : ADM12512W Log replay on the HADR standby has stopped on table space
          "TSPKGCACH" (ID "49") because it has been put into "ROLLFORWARD
          PENDING" state.
There is free space available for the database, the specified path (/dbdata1/samdbTsContainer) exists on the server, and we can create files in it manually.
All settings are equivalent on the primary and standby. db2inst1 is the owner of /dbdata1/samdbTsContainer and its permissions are drwxr-xr-x. The result of su - db2inst1 "ulimit -Hf" is unlimited, the file system type is ext3, and the CREATE TABLESPACE statement is as follows:
CREATE LARGE TABLESPACE TSPKGCACH IN DATABASE PARTITION GROUP IBMDEFAULTGROUP PAGESIZE 8 K MANAGED BY DATABASE USING (FILE '/dbdata1/samdbTsContainer/TSPKGCACH.1' 5120) ON DBPARTITIONNUM (0) EXTENTSIZE 64 PREFETCHSIZE 64 BUFFERPOOL BP8KPKGCACH OVERHEAD 10.5 TRANSFERRATE 0.14 DATA TAG NONE NO FILE SYSTEM CACHING;
SELinux is disabled and the sector size is 512 bytes. The mount options are as follows:
/dev/sdf1 /dbdata1 ext3 rw,relatime,errors=continue,barrier=1,data=ordered 0 0
We cannot reproduce the problem on demand; it only occurs sometimes and we don't know the reason for it, but once it happens it persists until the server is rebooted.
When we restart the standby server the problem goes away, but we then need to drop the tablespace and recreate it. Does anyone have an idea about this problem?
From the error it looks to me that the problem is not with access to the container file itself but rather with /proc/mounts, which Db2 uses to map containers to filesystems (to know e.g. the FS type). Hence I suggest testing whether all of:
cat /proc/mounts
cat /proc/self/mounts
mount
work OK when run as the Db2 instance owner ID (db2inst1), for example as shown below. If not, this implies some odd OS issue that Db2 is a victim of, and we would need more OS diagnostics (e.g. strace of the cat /proc/mounts command) to understand it.
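A sketch of those checks run under the instance owner (strace is only wrapped around a one-off cat here, so its overhead does not matter):
su - db2inst1 -c 'cat /proc/mounts > /dev/null; echo "/proc/mounts rc=$?"'
su - db2inst1 -c 'cat /proc/self/mounts > /dev/null; echo "/proc/self/mounts rc=$?"'
su - db2inst1 -c 'mount > /dev/null; echo "mount rc=$?"'
su - db2inst1 -c 'strace -f -e trace=open,openat cat /proc/mounts' 2>&1 | grep mounts
A non-zero rc, or an EACCES in the strace output, confirms the OS-level problem independently of Db2.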
Edit:
To confirm this theory I ran a quick test with Db2 11.1. Note this must be a TSA-controlled environment for Db2 to follow the sqlhaCreateMount code path (because if the container is on a separate mount, Db2 will add it to the TSA resource model).
On both primary and standby:
mkdir /db2data
chown db2v111:db2iadm /db2data
then on standby:
chmod o-rx /proc
(couldn't find a "smarter" way to hit EACCES on mount info).
When I then run on the primary:
db2 "create tablespace test managed by database using (file '/db2data/testts' 100 M)"
it completes fine on the primary, but the standby hits exactly the error you are seeing:
2019-06-21-03.00.37.087693+120 I1774E2661 LEVEL: Error (OS)
PID : 10379 TID : 46912992438016 PROC : db2sysc 0
INSTANCE: db2v111 NODE : 000 DB : SAMPLE
APPHDL : 0-4492 APPID: *LOCAL.DB2.190621005919
HOSTNAME: rhel-hadrs.kkuduk.com
EDUID : 61 EDUNAME: db2redom (SAMPLE) 0
FUNCTION: DB2 Common, OSSe, ossGetDiskInfo, probe:130
MESSAGE : ECF=0x90000001=-1879048191=ECF_ACCESS_DENIED
Access denied
CALLED : OS, -, fopen OSERR: EACCES (13)
DATA #1 : String, 12 bytes
/proc/mounts
DATA #2 : String, 8 bytes
/db2data
DATA #3 : unsigned integer, 8 bytes
1
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
[0] 0x00002AAAB9CFD84B /home/db2v111/sqllib/lib64/libdb2osse.so.1 + 0x23F84B
[1] 0x00002AAAB9CFED51 ossLogSysRC + 0x101
[2] 0x00002AAAB9D19647 ossGetDiskInfo + 0xF07
[3] 0x00002AAAAC52402C _Z21sqlhaGetLocalDiskInfoPKcjPcjS1_jS1_ + 0x26C
[4] 0x00002AAAAC523C5F _Z16sqlhaGetDiskInfoPKcS0_jPcjS1_jS1_ + 0x29F
[5] 0x00002AAAAC521CA0 _Z16sqlhaCreateMountPKcS0_m + 0x350
[6] 0x00002AAAACDE8D5D _Z26sqlbDMSAddContainerRequestP12SQLB_POOL_CBP16SQLB_POOLCONT_CBP12SQLB_GLOBALSP14SQLB_pfParIoCbbm + 0x90D
[7] 0x00002AAAACE14FF9 _Z29sqlbDoDMSAddContainerRequestsP12SQLB_POOL_CBP16SQLB_POOLCONT_CBjP26SQLB_AS_CONT_AND_PATH_INFOP12SQLB_GLOBALS + 0x2D9
[8] 0x00002AAAACE0C20F _Z17sqlbDMSCreatePoolP12SQLB_POOL_CBiP16SQLB_POOLCONT_CBbP12SQLB_GLOBALS + 0x103F
[9] 0x00002AAAACDB1EAC _Z13sqlbSetupPoolP12SQLB_GLOBALSP12SQLB_POOL_CBPKciiiihiP19SQLB_CONTAINER_SPECllblsib + 0xE4C
-> so it is an issue with /proc/mounts access, not with the target path itself, where I can write with no issues:
[db2v111@rhel-hadrs ~]$ echo "test" > /db2data/testfile
If it were a path access issue:
chmod o+rx /proc
chmod a-rw /db2data
then the error during the "CREATE TABLESPACE" redo on the standby is different:
2019-06-21-03.07.29.175486+120 I35023E592 LEVEL: Error
PID : 10379 TID : 46912992438016 PROC : db2sysc 0
INSTANCE: db2v111 NODE : 000 DB : SAMPLE
APPHDL : 0-4492 APPID: *LOCAL.DB2.190621005919
HOSTNAME: rhel-hadrs.kkuduk.com
EDUID : 61 EDUNAME: db2redom (SAMPLE) 0
FUNCTION: DB2 UDB, buffer pool services, sqlbCreateAndLockParent, probe:918
MESSAGE : ZRC=0x8402001E=-2080243682=SQLB_CONTAINER_NOT_ACCESSIBLE
"Container not accessible"
DATA #1 : <preformatted>
Failed at directory /db2data.
2019-06-21-03.07.29.175799+120 I35616E619 LEVEL: Severe
PID : 10379 TID : 46912992438016 PROC : db2sysc 0
INSTANCE: db2v111 NODE : 000 DB : SAMPLE
APPHDL : 0-4492 APPID: *LOCAL.DB2.190621005919
HOSTNAME: rhel-hadrs.kkuduk.com
EDUID : 61 EDUNAME: db2redom (SAMPLE) 0
FUNCTION: DB2 UDB, buffer pool services, sqlbCreateAndLockParent, probe:722
MESSAGE : ZRC=0x8402001E=-2080243682=SQLB_CONTAINER_NOT_ACCESSIBLE
"Container not accessible"
DATA #1 : <preformatted>
Failed to create a portion of the path /db2data/testts2
(a few more errors follow, pointing directly to the permissions on /db2data)
This proves it is a /proc access issue, and you need to debug it with your OS team. Perhaps /proc gets completely unmounted?
In any case, the actual issue is the db2sysc process hitting EACCES when running fopen on /proc/mounts, and you need to debug it further with the OS team.
Edit:
When it comes to debugging and proving that the error is returned by the OS, we would have to trace the open() syscalls done by Db2. strace can do that, but its overhead is too high for a production system. If you can get SystemTap installed on the system, I suggest a script like this (this is a basic version):
probe nd_syscall.open.return
{
    if (user_string(@entry(pointer_arg(1))) =~ ".*mounts")
    {
        printf("exec: %s pid: %d uid: %d (euid: %d) gid: %d (egid: %d) run open(%s) rc: %d\n",
               execname(), pid(), uid(), euid(), gid(), egid(),
               user_string(@entry(pointer_arg(1)), "-"), returnval())
    }
}
It uses the nd_syscall probe, so it will work even without the kernel debuginfo package. You can run it like this:
$ stap open.stap
exec: cat pid: 24159 uid: 0 (euid: 0) gid: 0 (egid: 0) run open(/proc/mounts) rc: 3
exec: cat pid: 24210 uid: 0 (euid: 0) gid: 0 (egid: 0) run open(/proc/mounts) rc: 3
exec: cat pid: 24669 uid: 1111 (euid: 1111) gid: 1001 (egid: 1001) run open(/proc/mounts) rc: 3
exec: cat pid: 24734 uid: 1111 (euid: 1111) gid: 1001 (egid: 1001) run open(/proc/mounts) rc: -13
exec: cat pid: 24891 uid: 1111 (euid: 1111) gid: 1001 (egid: 1001) run open(/proc/self/mounts) rc: -13
exec: ls pid: 24971 uid: 1111 (euid: 1111) gid: 1001 (egid: 1001) run open(/proc/mounts) rc: -13
-> at some point I revoked access to /proc and the open attempt failed with -13 (EACCES). You just need to run it on the system when you see the error and check whether something similar is logged when Db2 fails.

Kafka Consumer Marking the coordinator 2147483647 dead

I am using Kafka Server 0.9 with consumer kafka-client version 0.9 and kafka-producer 0.8.2.
Everything is working great except that I am getting a lot of INFO messages on the consumer saying that the coordinator is dead:
2016-02-25 19:30:45.046 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.048 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.049 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.050 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.051 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.052 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.053 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.054 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.055 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.056 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.057 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.058 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.059 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.060 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.061 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.062 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.062 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.063 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.064 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.065 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.066 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.067 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.068 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.068 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.069 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.070 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.071 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.072 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.072 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.073 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.074 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.075 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.075 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.076 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.077 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.078 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.079 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.079 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.080 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.081 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.082 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.083 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.083 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.084 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.085 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.086 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.086 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.087 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.088 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.089 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.089 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.090 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.091 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.093 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.094 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.094 INFO 10263 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
I also noticed that the producer disconnects and reconnects every 10 minutes, as below:
2016-03-12 15:55:36 INFO [pool-1-thread-1] - Fetching metadata from broker id:0,host:192.168.72.30,port:9092 with correlation id 41675 for 1 topic(s) Set(act)
2016-03-12 15:55:36 INFO [pool-1-thread-1] - Connected to 192.168.72.30:9092 for producing
2016-03-12 15:55:36 INFO [pool-1-thread-1] - Disconnecting from 192.168.72.30:9092
2016-03-12 15:55:36 INFO [pool-1-thread-1] - Disconnecting from kafkauk.XXXXXXXXXX.co:9092
2016-03-12 15:55:36 INFO [pool-1-thread-1] - Connected to kafkauk.XXXXXXXXXX.co:9092 for producing
This is my producer configuration:
metadata.broker.list=192.168.72.30:9092
serializer.class=kafka.serializer.StringEncoder
request.required.acks=1
linger.ms=2000
batch.size=500
and this is my consumer config:
bootstrap.servers: kafkauk.xxxxxxxx.co:9092
group.id: cdrServer
client.id: cdrServer
enable.auto.commit: true
auto.commit.interval.ms: 1000
session.timeout.ms: 30000
key.deserializer: org.apache.kafka.common.serialization.StringDeserializer
value.deserializer: org.apache.kafka.common.serialization.StringDeserializer
I cannot figure out what these mean, and whether I should ignore them or I am missing something in the configuration.
After I changed Kafka to DEBUG log level on the consumer, I found the below:
2016-03-13 18:21:55.586 DEBUG 5469 --- [ cdrServer] org.apache.kafka.clients.NetworkClient : Node 2147483647 disconnected.
2016-03-13 18:21:55.586 INFO 5469 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-03-13 18:21:55.586 DEBUG 5469 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Issuing group metadata request to broker 0
2016-03-13 18:21:55.586 DEBUG 5469 --- [ cdrServer] org.apache.kafka.clients.NetworkClient : Sending metadata request ClientRequest(expectResponse=true
, callback=null, request=RequestSend(header={api_key=3,api_version=0,correlation_id=183025,client_id=cdrServer}, body={topics=[act]}), isInitiatedByNetworkCli
ent, createdTimeMs=1457893315586, sendTimeMs=0) to node 0
2016-03-13 18:21:55.591 DEBUG 5469 --- [ cdrServer] org.apache.kafka.clients.Metadata : Updated cluster metadata version 296 to Cluster(nodes = [N
ode(0, kafkauk.xxxxxxxxx.co, 9092)], partitions = [Partition(topic = act, partition = 0, leader = 0, replicas = [0,], isr = [0,]])
2016-03-13 18:21:55.592 DEBUG 5469 --- [ cdrServer] o.a.k.c.c.internals.AbstractCoordinator : Group metadata response ClientResponse(receivedTimeMs=1457
893315592, disconnected=false, request=ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFu
tureCompletionHandler#1e2de777, request=RequestSend(header={api_key=10,api_version=0,correlation_id=183024,client_id=cdrServer}, body={group_id=cdrServer}), c
reatedTimeMs=1457893315586, sendTimeMs=1457893315586), responseBody={error_code=0,coordinator={node_id=0,host=kafkauk.xxxxxxxx.co,port=9092}})
I am not sure it is a network problem, because it happens exactly every 9 minutes.
Update
I found that it is directly related to
connections.max.idle.ms: 300000
Whatever value I put there, I get disconnected after exactly that interval.
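For what it's worth, the consumer default for connections.max.idle.ms appears to be 540000 ms (9 minutes), which would explain the 9-minute pattern above, and the broker has its own connections.max.idle.ms (600000 ms by default, if I read the docs right). Raising the client value, e.g.
connections.max.idle.ms: 1800000
only moves the point at which the idle connection is dropped; after the drop, the consumer simply re-discovers the coordinator on the next request, as the DEBUG output above shows.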
Marking the coordinator dead happens when there is a network communication error between the consumer client and the coordinator (it can also happen when the coordinator dies and the group needs to rebalance). There are a variety of situations (offset commit request, offset fetch, etc.) that can cause this issue, so I suggest you investigate what is causing these situations.
I have faced the same issue. Finally, after following Shannon's recommendation about TRACE logging, I used:
logging.level.org.apache.kafka=TRACE
to find out that my client was trying to resolve Euler:9092 as the coordinator... a local hostname!
So I commented out and changed the listeners and advertised.listeners values in the server.properties file.
It is working now! :-)
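The change amounted to advertising an address the client can actually resolve - roughly this in server.properties (the hostname below is a placeholder):
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://kafka.mycompany.example:9092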
In my case the message appeared in the logs when I tried to assign partitions manually. Then I read the following notice in the API docs of the new consumer:
It is also possible for the consumer to manually assign specific partitions (similar to the older "simple" consumer) using assign(Collection). In this case, dynamic partition assignment and consumer group coordination will be disabled.
That is, if you have code like this:
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.assign(Arrays.asList(
        new TopicPartition("topic", 0),
        new TopicPartition("topic", 1)
));
then the message "Marking the coordinator 2147483647 dead" will always show up in your logs.
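If you do want dynamic partition assignment and group coordination instead, the alternative is to subscribe rather than assign, something like:
consumer.subscribe(Arrays.asList("topic"));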
This basically means you are not able to reach Kafka.
In my case I was running Kafka in a Vagrant box, and when I started a VPN it refreshed the Vagrant IP, so the client was not able to connect to it.
Possible solution: in that case, stop the VPN and start your Vagrant box again.
This may also be related to a long garbage collection stop-the-world phase. In my case I encountered this message after GC pauses longer than 10 seconds.
This error mostly occurs when there is a conflict between the coordinator and the consumer. The first thing you should do is expose the listener port in server.properties; secondly, remove all the logs under kafka-logs. Don't forget to restart the server and ZooKeeper after these steps. It should resolve the issue.
I faced this issue today and solved it (temporarily, might I add). I've posted an answer here on how I did it.