Mongobee stuck for several minutes before quitting - mongodb

I'm currently using mongobee to update a mongoDB database.
An update can take up to a few minutes; however, it seems that the actual work of mongobee itself only lasts a few seconds, and after that the Java process stays stuck for a few minutes before shutting down.
2020-10-13 17:32:53.791 INFO 1 --- [mongo] o.m.d.cluster : Setting max set version to 4 from replica set primary XXXXXXXXX
2020-10-13 17:32:53.791 INFO 1 --- [mongo] o.m.d.cluster : Discovered replica set primary XXXXXXXXX
2020-10-13 17:32:54.889 INFO 1 --- [imer-1-thread-1] o.m.d.connection : Opened connection [connectionId{localValue:1, serverValue:18352}] to XXXXXXXXX
2020-10-13 17:32:55.191 INFO 1 --- [ main] c.g.m.Mongobee : Mongobee acquired process lock, starting the data migration sequence..
2020-10-13 17:32:55.508 INFO 1 --- [ main] o.r.Reflections : Reflections took 200 ms to scan 1 urls, producing 1 keys and 1 values
2020-10-13 17:32:55.691 INFO 1 --- [ main] c.g.m.Mongobee : Mongobee is releasing process lock.
2020-10-13 17:32:55.698 INFO 1 --- [ main] c.g.m.Mongobee : Mongobee has finished his job.
2020-10-13T17:35:37,089 TECHNICAL INFO myapplication {o.s.b.StartupInfoLogger} Started MyApplication in 28.81 seconds (JVM running for 36.744)
After that, the Java process keeps running for a few minutes before ending, without any further log output.
According to the log, Mongobee had finished its job by 17:32:55 and the application was started at 17:35:37, so I don't understand why the process then stays stuck for maybe 5 more minutes before it stops.
Is this expected behaviour? Do Spring Boot or mongobee have a parameter, something like a "session idle time", that has to elapse before the connection is closed?
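One way to investigate (a minimal diagnostic sketch, not from the original question; the class name is hypothetical) is to dump the live non-daemon threads right after Mongobee logs that it has finished, to see what keeps the JVM alive, for example connection-pool or scheduler threads:
import java.util.Map;

public class ThreadDumpHelper {
    // Print every non-daemon thread that is still alive, with its current stack trace.
    public static void logNonDaemonThreads() {
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            if (!t.isDaemon() && t.isAlive()) {
                System.out.println("Non-daemon thread still running: " + t.getName());
                for (StackTraceElement frame : entry.getValue()) {
                    System.out.println("    at " + frame);
                }
            }
        }
    }
}
Running jstack against the process id gives the same information without any code change.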

Related

Changing PostgreSQL schema only active after reload of Spring Boot application

In my Spring Boot application, the first migration creates a schema for the current user and switches to this schema. Subsequent migrations are properly executed on this schema. However, after the migration is complete, these tables are not found by the application until the application is reloaded.
application.yml
spring:
  r2dbc:
    url: r2dbc:postgresql://127.0.0.1:5432/pickle_db
    username: rick
    password: morty
r2dbc:
  migrate:
    migrations-schema: public
    migrations-table: fun_migrations
    migrations-lock-table: fun_migrations_lock
    resources-paths:
      - classpath:/db/migration/*.sql
V1__schema.sql
CREATE SCHEMA IF NOT EXISTS fun;
ALTER ROLE current_user SET search_path TO 'fun';
SET search_path TO 'fun';
V2__tables.sql
CREATE TABLE TREE(id int, name varchar(64));
The migration runs successfully and creates the following tables:
fun.tree
public.fun_migrations
public.fun_migrations_lock
2021-06-17 19:58:51.845 INFO 4400 --- [ restartedMain] n.n.r.m.a.R2dbcMigrateAutoConfiguration : Starting R2DBC migration
2021-06-17 19:58:51.847 INFO 4400 --- [ restartedMain] n.n.r2dbc.migrate.core.R2dbcMigrate : Configured with R2dbcMigrateProperties{enable=true, connectionMaxRetries=500, resourcesPaths=[classpath:/db/migration/*.sql], chunkSize=1000, dialect=null, validationQuery='select '42' as result', validationQueryExpectedResultValue='42', validationQueryTimeout=PT5S, validationRetryDelay=PT1S, acquireLockRetryDelay=PT1S, acquireLockMaxRetries=100, fileCharset=UTF-8, waitForDatabase=true, migrationsSchema='public', migrationsTable='fun_migrations', migrationsLockTable='fun_migrations_lock'}
2021-06-17 19:58:51.909 INFO 4400 --- [ restartedMain] n.n.r2dbc.migrate.core.R2dbcMigrate : Creating new test connection
2021-06-17 19:58:52.523 INFO 4400 --- [actor-tcp-nio-1] n.n.r2dbc.migrate.core.R2dbcMigrate : Comparing expected value '42' with provided result '42'
2021-06-17 19:58:52.525 INFO 4400 --- [actor-tcp-nio-1] n.n.r2dbc.migrate.core.R2dbcMigrate : Closing test connection
2021-06-17 19:58:52.532 INFO 4400 --- [actor-tcp-nio-1] n.n.r2dbc.migrate.core.R2dbcMigrate : Successfully got result '42' of test query
2021-06-17 19:58:52.678 INFO 4400 --- [actor-tcp-nio-1] n.n.r2dbc.migrate.core.R2dbcMigrate : By 'Making internal tables' 1 rows updated
2021-06-17 19:58:52.692 INFO 4400 --- [actor-tcp-nio-1] n.n.r2dbc.migrate.core.R2dbcMigrate : By 'Acquiring lock' 1 rows updated
2021-06-17 19:58:52.702 INFO 4400 --- [actor-tcp-nio-1] n.n.r2dbc.migrate.core.R2dbcMigrate : Database version is 0
2021-06-17 19:58:52.723 INFO 4400 --- [actor-tcp-nio-1] n.n.r2dbc.migrate.core.R2dbcMigrate : Applying MigrationInfo{version=1, description='schema', splitByLine=false, transactional=true}
2021-06-17 19:58:52.750 INFO 4400 --- [actor-tcp-nio-1] n.n.r2dbc.migrate.core.R2dbcMigrate : By 'MigrationInfo{version=1, description='schema', splitByLine=false, transactional=true}' 0 rows updated
2021-06-17 19:58:52.793 INFO 4400 --- [actor-tcp-nio-1] n.n.r2dbc.migrate.core.R2dbcMigrate : By 'Writing metadata version 1' 1 rows updated
2021-06-17 19:58:52.800 INFO 4400 --- [actor-tcp-nio-1] n.n.r2dbc.migrate.core.R2dbcMigrate : Applying MigrationInfo{version=2, description='tables', splitByLine=false, transactional=true}
2021-06-17 19:58:52.814 WARN 4400 --- [actor-tcp-nio-1] i.r.p.client.ReactorNettyClient : Notice: SEVERITY_LOCALIZED=NOTICE, SEVERITY_NON_LOCALIZED=NOTICE, CODE=00000, MESSAGE=table "tree" does not exist, skipping, FILE=tablecmds.c, LINE=1217, ROUTINE=DropErrorMsgNonExistent
2021-06-17 19:58:52.986 INFO 4400 --- [actor-tcp-nio-1] n.n.r2dbc.migrate.core.R2dbcMigrate : By 'MigrationInfo{version=2, description='tables', splitByLine=false, transactional=true}' 0 rows updated
2021-06-17 19:58:53.027 INFO 4400 --- [actor-tcp-nio-1] n.n.r2dbc.migrate.core.R2dbcMigrate : By 'Writing metadata version 2' 1 rows updated
2021-06-17 19:58:53.036 INFO 4400 --- [actor-tcp-nio-1] n.n.r2dbc.migrate.core.R2dbcMigrate : Applying MigrationInfo{version=3, description='data', splitByLine=false, transactional=true}
2021-06-17 19:58:53.058 INFO 4400 --- [actor-tcp-nio-1] n.n.r2dbc.migrate.core.R2dbcMigrate : By 'MigrationInfo{version=3, description='data', splitByLine=false, transactional=true}' 94 rows updated
2021-06-17 19:58:53.072 INFO 4400 --- [actor-tcp-nio-1] n.n.r2dbc.migrate.core.R2dbcMigrate : By 'Writing metadata version 3' 1 rows updated
2021-06-17 19:58:53.084 INFO 4400 --- [actor-tcp-nio-1] n.n.r2dbc.migrate.core.R2dbcMigrate : By 'Releasing lock' 1 rows updated
2021-06-17 19:58:53.090 INFO 4400 --- [ restartedMain] n.n.r.m.a.R2dbcMigrateAutoConfiguration : End of R2DBC migration
Once I connect to the application, I get the following error.
postgresql log
database_1 | 2021-06-17 17:56:29.903 UTC [1] LOG: starting PostgreSQL 13.2 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.2.1_pre1) 10.2.1 20201203, 64-bit
database_1 | 2021-06-17 17:56:29.910 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
database_1 | 2021-06-17 17:56:29.910 UTC [1] LOG: listening on IPv6 address "::", port 5432
database_1 | 2021-06-17 17:56:29.939 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database_1 | 2021-06-17 17:56:29.960 UTC [51] LOG: database system was shut down at 2021-06-17 17:56:29 UTC
database_1 | 2021-06-17 17:56:29.972 UTC [1] LOG: database system is ready to accept connections
database_1 | 2021-06-17 18:03:52.818 UTC [65] ERROR: relation "tree" does not exist at character 15
database_1 | 2021-06-17 18:03:52.818 UTC [65] STATEMENT: SELECT * FROM TREE
After restarting the Spring Boot application, everything works perfectly fine. I assume that public.tree, which does not exist, is queried before the restart, and fun.tree only afterwards. So this happens only after this very first migration. How can I make the search_path that is used during the migration persistent? Alternatively, how could I reload the connections after the migration, so that the search_path defined on the role is used?
Update 2021-06-18
I have found the reason for this issue: spring-boot-starter-data-r2dbc pulls in io.r2dbc:r2dbc-pool, which creates 10 connections before ALTER ROLE current_user SET search_path TO 'fun'; is executed. SET search_path TO 'fun'; is only valid for the single session in which the migration runs.
So the question comes down to: how can I refresh all connections of the pool?
Please try the SET LOCAL option, as in the following example:
SET LOCAL search_path TO 'fun';
Specifies that the command takes effect for only the current transaction. After COMMIT or ROLLBACK, the session-level setting takes effect again. Note that SET LOCAL will appear to have no effect if it is executed outside a BEGIN block since the transaction will end immediately.
For more detail:
https://www.postgresql.org/docs/9.1/sql-set.html
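A different angle on the pool problem from the update above, as a sketch under assumptions rather than a verified fix: instead of refreshing the pooled connections after the migration, the search_path could be sent as a startup parameter on every connection the pool creates, assuming the r2dbc-postgresql version in use exposes Builder#options for startup parameters (the class and bean below are hypothetical and would replace the auto-configured connection factory):
import java.util.Collections;
import io.r2dbc.pool.ConnectionPool;
import io.r2dbc.pool.ConnectionPoolConfiguration;
import io.r2dbc.postgresql.PostgresqlConnectionConfiguration;
import io.r2dbc.postgresql.PostgresqlConnectionFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class FunSearchPathR2dbcConfig {

    // Sketch only: options(...) is assumed to send the given parameters at connection
    // startup, so every pooled connection already has search_path=fun and no
    // post-migration refresh is needed. Credentials are taken from application.yml above.
    @Bean
    public ConnectionPool connectionFactory() {
        PostgresqlConnectionFactory connectionFactory = new PostgresqlConnectionFactory(
                PostgresqlConnectionConfiguration.builder()
                        .host("127.0.0.1")
                        .port(5432)
                        .database("pickle_db")
                        .username("rick")
                        .password("morty")
                        .options(Collections.singletonMap("search_path", "fun"))
                        .build());
        return new ConnectionPool(ConnectionPoolConfiguration.builder(connectionFactory).build());
    }
}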

Spring Batch - multiple jobs started at once?

In my Spring Batch application I see the following logs.
INFO 5572 --- [ scheduling-1] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=sample]] launched with the following parameters: [{JobID=x}]
INFO 5572 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=sample]] launched with the following parameters: [{run.id=1, JobID=y}]
INFO 5572 --- [ scheduling-1] o.s.batch.core.job.SimpleStepHandler : Executing step: [step1]
INFO 5572 --- [ main] o.s.batch.core.job.SimpleStepHandler : Executing step: [step1]
Are these the same log lines repeated by two threads, or were two jobs started at once?
According to your logs, two different job instances are executed by two different threads: scheduling-1 and main.
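For context, a hypothetical sketch (not taken from the question) of how this pattern typically arises: the same job is launched once by Spring Boot's startup runner on the main thread (the run.id=1 parameter is typical of a RunIdIncrementer applied there) and once by a @Scheduled method on a scheduling-N thread, and because the parameters differ, each launch creates its own job instance:
import java.util.UUID;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;

@Configuration
@EnableScheduling
public class SampleJobScheduler {

    private final JobLauncher jobLauncher;
    private final Job sampleJob;

    public SampleJobScheduler(JobLauncher jobLauncher, Job sampleJob) {
        this.jobLauncher = jobLauncher;
        this.sampleJob = sampleJob;
    }

    // Runs on a "scheduling-1" thread; a fresh JobID parameter means a new job instance.
    @Scheduled(fixedDelay = 60000)
    public void launch() throws Exception {
        JobParameters params = new JobParametersBuilder()
                .addString("JobID", UUID.randomUUID().toString())
                .toJobParameters();
        jobLauncher.run(sampleJob, params);
    }
}
If only one execution is wanted, either the startup run (spring.batch.job.enabled=false) or the scheduled launch has to be disabled.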

Kafka could not create internal topics after 32 topics were created

We have 9 microservices that create 32 topics (2 of them created up front, 30 of them internal). After I add a new join, Kafka goes down. Is there any limitation that only 32 topics can be created with Kafka, and if not, how can I solve this?
Thank you for your time.
Started SpringBootCounterMS in 4.652 seconds (JVM running for 7.59)
2018-07-04 10:39:29.513 INFO 14956 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [counter-service-3ca6bb7e-addd-445e-a22b-8b7be1b3b6c7-StreamThread-1] State transition from PARTITIONS_ASSIGNED to RUNNING
2018-07-04 10:39:29.514 INFO 14956 --- [-StreamThread-1] org.apache.kafka.streams.KafkaStreams : stream-client [counter-service-3ca6bb7e-addd-445e-a22b-8b7be1b3b6c7]State transition from REBALANCING to RUNNING
2018-07-04 10:39:30.579 INFO 14956 --- [-StreamThread-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=counter-service-3ca6bb7e-addd-445e-a22b-8b7be1b3b6c7-StreamThread-1-consumer, groupId=counter-service] Marking the coordinator localhost:9092 (id: 2147483647 rack: null) dead
2018-07-04 10:39:30.599 WARN 14956 --- [-StreamThread-2] o.a.k.s.p.i.InternalTopicManager : stream-thread [main] Could not create internal topics: Empty response for client request. Retry #0
Updating the JVM to a 64-bit one solves this problem. It is an out-of-memory error.
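As a quick sanity check, a minimal sketch (not from the answer) that prints the architecture and maximum heap of the JVM the service actually runs on, which makes a 32-bit JVM with a correspondingly small heap easy to spot:
public class JvmInfo {
    public static void main(String[] args) {
        // "x86" indicates a 32-bit JVM, "amd64"/"x86_64" a 64-bit one.
        System.out.println("JVM architecture: " + System.getProperty("os.arch"));
        System.out.println("Max heap (MB):    " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}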

Kafka Streams - Explain the reason why KTable and its associated Store only get updated every 30 seconds

I have this simple KTable definition that generates a Store:
KTable<String, JsonNode> table = kStreamBuilder.<String, JsonNode>table(ORDERS_TOPIC, ORDERS_STORE);
table.print();
I publish messages into the ORDERS_TOPIC, but the store is only really updated every 30 seconds. This is the log, where there is a message about committing because the 30000ms commit interval has elapsed:
2017-07-25 23:53:15.465 DEBUG 17540 --- [ StreamThread-1] o.a.k.c.consumer.internals.Fetcher : Sending fetch for partitions [orders-0] to broker EXPRF026.SUMINISTRADOR:9092 (id: 0 rack: null)
2017-07-25 23:53:15.567 INFO 17540 --- [ StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [StreamThread-1] Committing all tasks because the commit interval 30000ms has elapsed
2017-07-25 23:53:15.567 INFO 17540 --- [ StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [StreamThread-1] Committing task StreamTask 0_0
2017-07-25 23:53:15.567 DEBUG 17540 --- [ StreamThread-1] o.a.k.s.processor.internals.StreamTask : task [0_0] Committing its state
2017-07-25 23:53:15.567 DEBUG 17540 --- [ StreamThread-1] o.a.k.s.p.i.ProcessorStateManager : task [0_0] Flushing all stores registered in the state manager
f2b9ff2b-62c3-470e-8df1-066cd1e3d5ec
{"uid":"string","productId":0,"orderId":"f2b9ff2b-62c3-470e-8df1-066cd1e3d5ec","name":"OrderPlaced","state":"PENDING_PRODUCT_RESERVATION"}
[KTABLE-SOURCE-0000000001]: f2b9ff2b-62c3-470e-8df1-066cd1e3d5ec , ({"uid":"string","productId":0,"orderId":"f2b9ff2b-62c3-470e-8df1-066cd1e3d5ec","name":"OrderPlaced","state":"PENDING_PRODUCT_RESERVATION"}<-null)
2017-07-25 23:53:15.569 DEBUG 17540 --- [ StreamThread-1] o.a.k.s.state.internals.ThreadCache : Thread order-service-streams-16941f70-87b3-45f4-88de-309e4fd22748-StreamThread-1 cache stats on flush: #puts=1, #gets=1, #evicts=0, #flushes=1
2017-07-25 23:53:15.576 DEBUG 17540 --- [ StreamThread-1] o.a.k.s.p.internals.RecordCollectorImpl : task [0_0] Flushing producer
I found that the property that controls this is commit.interval.ms:
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10);
Why is it set to 30000ms by default (sounds like a long time) and what are the implications of changing it to 10ms?
If instead of a KTable I work with a KStream...
KStream<String, JsonNode> kStream = kStreamBuilder.stream(ORDERS_TOPIC);
kStream.print();
...I can see the messages right away, without having to wait those 30000ms. Why the difference?
It's related to memory management, in particular the KTable caches: http://docs.confluent.io/current/streams/developer-guide.html#memory-management
The KTable is actually updated all the time, and if you use "Interactive Queries" to access the underlying state store, you can get each update immediately. However, the KTable cache buffers the updates to reduce downstream load, and each time a commit is triggered this cache needs to be flushed downstream to avoid data loss in case of failure. If your cache size is small, you might also see downstream records earlier, when a key gets evicted from the cache.
About the commit interval: in general, it is set to a relatively large value to reduce the commit load on the brokers.
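To make that concrete, a minimal sketch of a Streams configuration that disables the record cache and shortens the commit interval, so KTable updates become visible downstream almost immediately at the cost of more commit load and no per-key de-duplication (the application id and broker address are assumptions based on the logs above):
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class LowLatencyStreamsProps {

    public static Properties build() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-service-streams"); // assumed from the logs
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // assumed broker address
        // Disable the record cache so every KTable update is forwarded downstream right away.
        props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
        // Commit (and therefore flush) more often; the trade-off is more load on the brokers.
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 100);
        return props;
    }
}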

How to read HDF data from HDFS for Hadoop

I am working on image processing on Hadoop. I am using HDF satellite data for processing; I can access and use JPG and other image types in Hadoop streaming, but with HDF data I get an error: Hadoop couldn't read the HDF data from HDFS. It also takes more than twenty minutes for the error to show up. My HDF data is a single file of more than 150MB.
How can I solve this problem and make Hadoop read this HDF data from HDFS?
Here is the output of my run:
hadoop#master:/usr/local/master/hdf/examples$ ./runD1.sh
Buildfile: /usr/local/master/hdf/build.xml
downloader:
setup:
test_settings:
compile:
BUILD SUCCESSFUL
Total time: 0 seconds
Output HIB: /var/www/html/uploads/
14/09/26 15:28:46 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Found host successfully: 0
Repeated host: 1
Repeated host: 2
Repeated host: 3
Tried to get 2 nodes, got 1
14/09/26 15:28:46 INFO input.FileInputFormat: Total input paths to process : 1
First n-1 nodes responsible for 1592259 images
Last node responsible for 1592259 images
14/09/26 15:29:04 INFO mapred.JobClient: Running job: job_201409191212_0006
14/09/26 15:29:05 INFO mapred.JobClient: map 0% reduce 0%
14/09/26 15:39:15 INFO mapred.JobClient: Task Id : attempt_201409191212_0006_m_000000_0, Status : FAILED
Task attempt_201409191212_0006_m_000000_0 failed to report status for 600 seconds. Killing!
14/09/26 15:49:17 INFO mapred.JobClient: Task Id : attempt_201409191212_0006_m_000000_1, Status : FAILED
Task attempt_201409191212_0006_m_000000_1 failed to report status for 600 seconds. Killing!
14/09/26 15:59:19 INFO mapred.JobClient: Task Id : attempt_201409191212_0006_m_000000_2, Status : FAILED
Task attempt_201409191212_0006_m_000000_2 failed to report status for 600 seconds. Killing!
Error log is:
2014-09-26 15:38:45,133 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201409191212_0006_m_-1211757488
2014-09-26 15:38:45,133 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201409191212_0006_m_-1211757488 spawned.
2014-09-26 15:38:45,136 INFO org.apache.hadoop.mapred.TaskController: Writing commands to /usr/local/master/temp/mapred/local/ttprivate/taskTracker/hadoop/jobcache/job_201409191212_0006/attempt_201409191212_0006_m_000000_0.cleanup/taskjvm.sh
2014-09-26 15:38:45,631 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201409191212_0006_m_-1211757488 given task: attempt_201409191212_0006_m_000000_0
2014-09-26 15:38:46,145 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201409191212_0006_m_000000_0 0.0%
2014-09-26 15:38:46,198 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201409191212_0006_m_000000_0 0.0% cleanup
2014-09-26 15:38:46,200 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201409191212_0006_m_000000_0 is done.
2014-09-26 15:38:46,200 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201409191212_0006_m_000000_0 was -1
2014-09-26 15:38:46,200 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
2014-09-26 15:38:46,340 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201409191212_0006_m_-1211757488 exited with exit code 0. Number of tasks it ran: 1
Can anyone please help me solve this problem?
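One detail worth noting from the logs above: the attempts are not rejected because of the HDF content itself, they are killed because they stop reporting status for 600 seconds (the default task timeout). A hypothetical sketch of keeping such a long-running read alive by reporting progress from the mapper (the mapper name and key/value types are assumptions, since the actual HIPI/HDF mapper is not shown):
import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HdfMapper extends Mapper<Text, BytesWritable, Text, Text> {

    @Override
    protected void map(Text key, BytesWritable value, Context context)
            throws IOException, InterruptedException {
        // ... long-running HDF decoding work would go here ...

        context.progress();                   // heartbeat so the TaskTracker does not kill the attempt
        context.setStatus("decoding " + key); // optional: progress message visible in the job UI
    }
}
Alternatively, raising mapred.task.timeout in the job configuration gives the task more time between status reports.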