com.thinkaurelius.titan.diskstorage.PermanentBackendException: Unexpected interrupt - titan

After upgrading to Titan 1.0.0 I started to see the following exceptions under load, using Cassandra (2.2.6) as the storage backend:
Caused by: java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)[:1.8.0_102]
at java.lang.Thread.sleep(Thread.java:340)[:1.8.0_102]
at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:386)[:1.8.0_102]
at com.thinkaurelius.titan.diskstorage.util.time.TimestampProviders.sleepPast(TimestampProviders.java:138)
at com.thinkaurelius.titan.diskstorage.common.DistributedStoreManager.sleepAfterWrite(DistributedStoreManager.java:222)
... 66 more
Can this be fixed through configuration?
While there are several configuration items available around timestamps, I did not find any that strikes me as relevant to the timestamp provider itself.

You should check your Cassandra logs. I have found that Titan under load starts to throw these types of errors as well as Timeout errors when Cassandra starts its compaction process.
Grep for the keyword "GC" in /var/log/cassandra/system.log and monitor your disk usage using dstat. If you see "GC" frequently, then you are undergoing heavy compaction, and this bogs down Titan.
To get around this you can try to optimise how you load your data into Titan so as not to trigger compaction too often.
The following are just things we tried that worked for our case:
• Avoid deletions. Deletions trigger tombstoning, which leads to compaction.
• Increase the size of your JVM heap. One of the things that causes compaction to run is starting to run out of memory, so a larger heap makes it less likely to run.
• Try different compaction strategies. Each one is optimised for a different use case (see the sketch below).
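For illustration, here is a minimal sketch of switching a Titan column family to LeveledCompactionStrategy with the DataStax Java driver. The contact point, the keyspace name ("titan") and the table name ("edgestore") are assumptions based on Titan's Cassandra defaults, so adjust them to your deployment and benchmark before adopting the change.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class SwitchCompactionStrategy {
    public static void main(String[] args) {
        // Contact point, keyspace and table are assumptions: Titan's default
        // Cassandra keyspace is "titan" and its main store is "edgestore".
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // LeveledCompactionStrategy does more frequent but smaller compactions,
            // which can smooth out the large compaction spikes that bog Titan down.
            session.execute(
                "ALTER TABLE titan.edgestore "
                    + "WITH compaction = {'class': 'LeveledCompactionStrategy'}");
        }
    }
}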

Scheduling jobs fails with org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException

Thank you for reading this SO question. It may seem long, but I'll try to include as much information as possible to help get to an answer.
Summary
We are currently experiencing a scheduling issue with our Flink cluster.
The symptoms are that some/most/all (it depends, the symptoms are not always the same) of our tasks are shown as SCHEDULED but fail after a timeout. The jobs are then shown as RUNNING.
The failing exception is the following one:
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout
After analysis, we assume (we cannot prove it, as there are not many logs for that part of the code) that the failure is due to a deadlock/race condition that happens when several jobs are submitted to the Flink cluster at the same time, even though we have enough slots available in the cluster.
We actually have the error with 52 available task slots, and have 12 jobs that are not scheduled.
Additional information
Flink version: 1.13.1 commit a7f3192
Flink cluster in session mode
2 Job managers using k8s HA mode (resource requests: 2 CPU, 4 GB RAM; memory limit set to 4 GB)
50 task managers with 2 slots each (resource requests: 2 CPUs, 2 GB RAM; no limits set).
Our Flink cluster is shut down every night and restarted every morning. The error seems to occur when a lot of jobs need to be scheduled. The jobs are configured to restore their state, and we do not see any issues for jobs that are scheduled and run correctly; it really seems to be a scheduling issue.
Questions
Could it be that the issue described in FLINK-23409 is actually the same one, but only occurs when there is a race condition while scheduling several jobs?
Is there any way to increase logging in the scheduler to debug this issue?
Is it a known issue? If yes, is there any workaround/solution to resolve it?
P.S.: a while ago I asked more or less the same question on the ML but dropped it. I'm sorry if this is considered cross-asking; it's not intended. We are just opening a new thread because we have more information and the issue has re-occurred.

Zombie giant unkillable task blocks Druid at restart

I'm running Apache Druid 0.17, deployed with nohup ./bin/start-nano-quickstart > mylog.log. As the deep storage I am using S3, I have the Parquet extension enabled, and everything works fine. I could correctly ingest several small Spark-partitioned Parquet datasources from S3. All the remaining configuration is untouched.
As I tried loading a giant datasource to test performance and resource usage, the task died after a couple of hours because of an OutOfMemoryError (which was expected):
2020-02-07T17:32:20,519 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - New segment[arc_2016-09-29T12:00:00.000Z_2016-09-29T13:00:00.000Z_2020-02-07T17:22:45.965Z] for sequenceName[index_parallel_arc_chgindko_2020-02-07T14:59:32.337Z].
Terminating due to java.lang.OutOfMemoryError: GC overhead limit exceeded
Now every time I restart Druid, it starts that giant task and it is impossible to kill it. Even when the task apparently dies or turns to a waiting status, the CPU usage is about 140% and I cannot submit new tasks to Druid. I tried to access the Derby database manually to find the task and remove it, but I was not successful, and that solution is really nasty. I know that I can change the database in the configuration so that next time I will have a fresh Druid, but that is not a good solution as I would lose all my other datasources. How can I get rid of this long-running zombie task?

Mongo Db secondary setup

For the last week I have been trying to set up a replica set for my one-node MongoDB (version 3.4.2) but am facing multiple issues. My primary node currently has around 650 GB of data, and it is growing by 90 GB every day. The first time, I added a new secondary node with an empty data directory; after almost a day it failed with a "too much lag in oplog" issue. The next time I tried manually copying the data. After the copy, when I restarted the secondary, it started giving me an error that it cannot sync from the primary (there was no connection problem; I was able to ping). I retried the manual copy procedure, but this time it failed with the error below. As the WiredTiger issue is with a specific collection file, I copied that file again and retried, but it failed again with the same issue. Can someone please help me set up the secondary? Every day it becomes more difficult as the data grows, and I cannot keep the primary down for long (during the manual copy I stop all writes on the primary).
2017-03-02T16:08:16.315+0000 E STORAGE [initandlisten] WiredTiger error (-31802) [1488470896:315136][17051:0x7ffdbd3d7dc0], file:mcse.45trace/collection-16-7756455024301269277.wt, WT_SESSION.open_cursor: /app/data/mcse.45trace/collection-16-7756455024301269277.wt: handle-read: pread: failed to read 4096 bytes at offset 86474874880: WT_ERROR: non-specific WiredTiger error
2017-03-02T16:08:16.315+0000 I - [initandlisten] Invariant failure: ret resulted in status UnknownError: -31802: WT_ERROR: non-specific WiredTiger error at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp 95
If you can solve that first problem with the replication lag, then you will probably get everything running OK. Take a look at the Troubleshooting Replica Sets guide; it has some useful suggestions:
Possible causes of replication lag include:
Network Latency
Check the network routes between the members of your set to ensure that there is no packet loss or network routing issue.
Use tools including ping to test latency between set members and traceroute to expose the routing of packets between network endpoints.
Disk Throughput
If the file system and disk device on the secondary is unable to flush data to disk as quickly as the primary, then the secondary will have difficulty keeping state. Disk-related issues are incredibly prevalent on multi-tenant systems, including virtualized instances, and can be transient if the system accesses disk devices over an IP network (as is the case with Amazon’s EBS system.)
Use system-level tools to assess disk status, including iostat or vmstat.
Concurrency
In some cases, long-running operations on the primary can block replication on secondaries. For best results, configure write concern to require confirmation of replication to secondaries. This prevents write operations from returning if replication cannot keep up with the write load.
Use the database profiler to see if there are slow queries or long-running operations that correspond to the incidences of lag.
Appropriate Write Concern
If you are performing a large data ingestion or bulk load operation that requires a large number of writes to the primary, particularly with unacknowledged write concern, the secondaries will not be able to read the oplog fast enough to keep up with changes.
To prevent this, request acknowledged write concern after every 100, 1,000, or some other interval to provide an opportunity for the secondaries to catch up with the primary (a sketch follows the reference list below).
For more information see:
• Write Concern
• Replica Set Write Concern
• Oplog Size
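As a sketch of the write-concern advice above (using the modern MongoDB Java driver; the host, database, collection and document shape are placeholders), a bulk load can issue most inserts unacknowledged for throughput but block on replication to at least one secondary every 1,000 documents, so the oplog reader gets a chance to catch up:

import com.mongodb.WriteConcern;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class ThrottledBulkLoad {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://primary-host:27017")) {
            // Unacknowledged writes for raw throughput.
            MongoCollection<Document> fast = client.getDatabase("mydb")
                    .getCollection("mycoll")
                    .withWriteConcern(WriteConcern.UNACKNOWLEDGED);
            // w=2 blocks until at least one secondary has replicated the write.
            MongoCollection<Document> checkpoint = fast.withWriteConcern(new WriteConcern(2));

            for (int i = 1; i <= 1_000_000; i++) {
                Document doc = new Document("_id", i).append("payload", "value-" + i);
                if (i % 1000 == 0) {
                    // Every 1,000 documents, wait for a secondary to confirm the write,
                    // giving the replica set a chance to catch up with the oplog.
                    checkpoint.insertOne(doc);
                } else {
                    fast.insertOne(doc);
                }
            }
        }
    }
}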
WiredTiger error (-31802) file:xxx.wt
This could be related to corrupted .wt files (e.g. WiredTiger.wt/WiredTiger.turtle) as per SERVER-31076 bug report.
Try running:
mongod --repair --dbpath /path/to/data/db
Also make sure all the data/db files have the right read and write permissions.

Can I catch events such as on Executor start in Apache Spark?

What I want to do is have the executor start a program, such as a profiling tool, when it starts (that is, before it starts executing any task). In this way, it would be possible to monitor things like the CPU usage of an executor. Does Spark provide such hooks/callbacks? I have used SparkListener, but that is used on the driver side. Do we have something similar for executors?
This should work for your requirement.
http://spark.apache.org/developer-tools.html#profiling
Set up YourKit to work with both drivers and slaves (executors). It doesn't start profiling unless you tell it to. Connect to the master or a slave, start profiling, and then run your tests.
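For example, one way to make sure the profiler agent is attached to every executor JVM before it runs any task is to pass an -agentpath flag through spark.executor.extraJavaOptions. This is only a sketch: the YourKit install path below is an assumption and has to match wherever the agent library lives on your worker nodes.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ProfiledJob {
    public static void main(String[] args) {
        // The agent path is an assumption; point it at the YourKit agent
        // library installed on every worker node.
        SparkConf conf = new SparkConf()
                .setAppName("profiled-job")
                .set("spark.executor.extraJavaOptions",
                     "-agentpath:/opt/yourkit/bin/linux-x86-64/libyjpagent.so=sampling");

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... run the job as usual, then attach the YourKit UI to the executor hosts ...
        sc.stop();
    }
}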
Happy profiling!!

Why did serverStatus have a bad effect on mongod write operations?

I have 1 mongos, 3 mongods, and 3 config servers. When I write some documents, sometimes the insert speed on one of the mongods is very slow, and there are "serverstatus was very slow" messages in the mongod log file. Why?
The version is 2.0.4.
That message actually reflects the fact that your server was slow, not that serverStatus is causing the problem. If the serverStatus command (which is run periodically by MMS agents for example) is slow, it will log that warning - it is a symptom rather than a cause.
It is quite lightweight as a command, so if it is returning slowly enough to warn you about it then the host is likely very busy at that time.
The usual suspects for load apply here: high insert/update rates, table scans, poorly indexed queries, disk issues, RAM/CPU contention, page faults, etc.
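To confirm that serverStatus is the symptom rather than the cause, one option (a sketch with the modern MongoDB Java driver; the hostname is a placeholder, and older servers may need the legacy driver) is to time a serverStatus call against the slow mongod while the load is happening. A healthy member answers almost instantly, so a slow response points at general load on that host:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class ServerStatusCheck {
    public static void main(String[] args) {
        // The hostname is a placeholder -- point it at the mongod that logs the warning.
        try (MongoClient client = MongoClients.create("mongodb://mongod-host:27017")) {
            MongoDatabase admin = client.getDatabase("admin");

            long start = System.currentTimeMillis();
            Document status = admin.runCommand(new Document("serverStatus", 1));
            long elapsed = System.currentTimeMillis() - start;

            // serverStatus itself is cheap; a long elapsed time here means the
            // mongod is busy, matching the "serverStatus was very slow" warning.
            System.out.println("serverStatus took " + elapsed + " ms");
            System.out.println("globalLock: " + status.get("globalLock"));
        }
    }
}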