YCSB gets stuck when working with Cassandra

The command I use:
sudo bin/ycsb run cassandra-cql -P workloads/workloadf -P cassandra.dat -s > outputs/runf-2.dat 2> outputs/log-runf-2.txt
And:
tail -f outputs/log-runf-2.txt
But the log is stuck at:
2021-08-16 17:36:38:255 7850 sec: 65373659 operations; 6141.2 current ops/sec; est completion in 1 hour 9 minutes [READ: Count=61414, Max=1200127, Min=265, Avg=4892.3, 90=1040, 99=168191, 99.9=544255, 99.99=799743] [READ-MODIFY-WRITE: Count=30620, Max=986111, Min=472, Avg=5108.82, 90=1557, 99=153087, 99.9=481279, 99.99=762367] [UPDATE: Count=30634, Max=63551, Min=166, Avg=484.96, 90=550, 99=737, 99.9=14887, 99.99=44991]
2021-08-16 17:36:48:255 7860 sec: 65435896 operations; 6223.7 current ops/sec; est completion in 1 hour 9 minutes [READ: Count=62232, Max=1267711, Min=300, Avg=4969.52, 90=1029, 99=150399, 99.9=586239, 99.99=946175] [READ-MODIFY-WRITE: Count=31050, Max=1258495, Min=541, Avg=5495.38, 90=1544, 99=156543, 99.9=572927, 99.99=950271] [UPDATE: Count=31044, Max=94847, Min=152, Avg=486.59, 90=544, 99=730, 99.9=16095, 99.99=54623]
2021-08-16 17:36:58:255 7870 sec: 65501151 operations; 6525.5 current ops/sec; est completion in 1 hour 9 minutes [READ: Count=65256, Max=2203647, Min=233, Avg=4650.72, 90=1016, 99=151935, 99.9=532479, 99.99=888319] [READ-MODIFY-WRITE: Count=32585, Max=2203647, Min=368, Avg=5292.65, 90=1524, 99=151295, 99.9=549375, 99.99=909823] [UPDATE: Count=32576, Max=87935, Min=132, Avg=485.37, 90=542, 99=726, 99.9=15343, 99.99=55423]
2021-08-16 17:37:08:255 7880 sec: 65559502 operations; 5835.1 current ops/sec; est completion in 1 hour 9 minutes [READ: Count=58354, Max=1277951, Min=313, Avg=5203.44, 90=1037, 99=176767, 99.9=634367, 99.99=939519] [READ-MODIFY-WRITE: Count=29259, Max=1217535, Min=563, Avg=5589.22, 90=1547, 99=172031, 99.9=627711, 99.99=916479] [UPDATE: Count=29247, Max=76863, Min=183, Avg=480.89, 90=545, 99=733, 99.9=17087, 99.99=57279]
2021-08-16 17:37:18:255 7890 sec: 65614920 operations; 5541.8 current ops/sec; est completion in 1 hour 8 minutes [READ: Count=55415, Max=1049599, Min=199, Avg=5494.55, 90=1047, 99=192383, 99.9=639999, 99.99=934911] [READ-MODIFY-WRITE: Count=27552, Max=1030143, Min=326, Avg=5864.25, 90=1571, 99=184319, 99.9=578047, 99.99=915455] [UPDATE: Count=27567, Max=773631, Min=113, Avg=551.14, 90=553, 99=742, 99.9=26751, 99.99=111295]
It didn't show any error or warning, but it stopped printing log output.
I checked the ycsb process:
ps auwx | grep ycsb
The result:
ran 93177 0.0 0.0 13144 1048 pts/2 S+ 18:10 0:00 grep --color=auto ycsb
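Since the ycsb process is gone and nothing was written to the log, a few extra checks might show whether the JVM died silently (a sketch; the file names follow my command above):
# Did the kernel OOM killer take out the Java process?
dmesg -T | grep -iE 'killed process|out of memory'
# Did YCSB manage to write a final summary before dying?
tail -n 40 outputs/runf-2.dat
# Did the JVM leave a HotSpot crash log in the working directory?
ls hs_err_pid*.log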

Related

Cassandra workload fails to insert data

When I run "workloada" from the seed node,
#./bin/ycsb.sh load cassandra-cql -p hosts="X.X.X.X" -P /root/ycsb/ycsb-0.17.0/workloads/workloada -threads 64 -p operationcount=1000000 -p recordcount=1000000 -s > /root/ycsb/ycsb-0.17.0/workload_A64T_VSSBB_load.csv
It throws the results below when it runs:
Error inserting, not retrying any more. number of attempts: 1Insertion Retry Limit: 0
2022-11-23 08:13:24:257 2 sec: 0 operations; est completion in 106751991167300 days 15 hours [CLEANUP: Count=64, Max=2234367, Min=0, Avg=34896.25, 90=1, 99=4, 99.9=2234367, 99.99=2234367] [INSERT: Count=0, Max=0, Min=9223372036854775807, Avg=NaN, 90=0, 99=0, 99.9=0, 99.99=0] [INSERT-FAILED: Count=64, Max=23679, Min=16240, Avg=19395.94, 90=22223, 99=23471, 99.9=23679, 99.99=23679]
What can cause this error?
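For reference, the "Insertion Retry Limit: 0" part of the message appears to come from YCSB's core_workload_insertion_retry_limit property (default 0), so the client gives up on the first failed insert. A variant of the load command that retries a few times and uses fewer client threads might look roughly like this (a sketch only; the core_workload_insertion_retry_* properties are CoreWorkload options as far as I can tell, and the thread count and output file name are just examples):
./bin/ycsb.sh load cassandra-cql -p hosts="X.X.X.X" \
  -P /root/ycsb/ycsb-0.17.0/workloads/workloada \
  -threads 16 \
  -p recordcount=1000000 \
  -p core_workload_insertion_retry_limit=3 \
  -p core_workload_insertion_retry_interval=5 \
  -s > workload_A16T_load_retry.csv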

Shard Server crashed with invariant failure - commonPointOpTime.getTimestamp() >= lastCommittedOpTime.getTimestamp()

We have deployed the community edition of MongoDB in a Kubernetes cluster. In this deployment, one of the shard DB pods is crashing with the following failure message:
2022-08-08T17:46:04.110+0000 F - [rsBackgroundSync] Invariant failure commonPointOpTime.getTimestamp() >= lastCommittedOpTime.getTimestamp() src/mongo/db/repl/rollback_impl.cpp 955
We want to understand in what circumstances such an error might arise.
Going through the logs, I understand that this specific server has a replication commit point greater than the rollback common point.
https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/rs_rollback.cpp#L1995
2022-08-08T17:43:22.737+0000 I REPL [rsBackgroundSync] Starting rollback due to OplogStartMissing: Our last optime fetched: { ts: Timestamp(1658645237, 1), t: 33 }. source's GTE: { ts: Timestamp(1658645239, 1), t: 34 }
2022-08-08T17:43:22.737+0000 I REPL [rsBackgroundSync] Replication commit point: { ts: Timestamp(1658645115, 8), t: 33 }
2022-08-08T17:43:22.737+0000 I REPL [rsBackgroundSync] Rollback using 'recoverToStableTimestamp' method.
2022-08-08T17:43:22.737+0000 I REPL [rsBackgroundSync] Scheduling rollback (sync source: mongo-shareddb-3.mongo-shareddb-service.avx.svc.cluster.local:27017)
2022-08-08T17:43:22.737+0000 I ROLLBACK [rsBackgroundSync] transition to ROLLBACK
2022-08-08T17:46:04.109+0000 I ROLLBACK [rsBackgroundSync] Rollback common point is { ts: Timestamp(1658645070, 13), t: 33 }
2022-08-08T17:46:04.110+0000 F - [rsBackgroundSync] Invariant failure commonPointOpTime.getTimestamp() >= lastCommittedOpTime.getTimestamp() src/mongo/db/repl/rollback_impl.cpp 955
2022-08-08T17:46:04.110+0000 F - [rsBackgroundSync] \n\n***aborting after invariant() failure\n\n
Timestamps in the log:
1658645105 - 24 July 2022 06:45:05 GMT
1658645237 - 24 July 2022 06:47:17 GMT
1658645239 - 24 July 2022 06:47:19 GMT
1658645115 - 24 July 2022 06:45:15 GMT
1658645070 - 24 July 2022 06:44:30 GMT
commonPointOpTime = 24 July 2022 06:44:30 GMT
lastCommittedOpTime = 24 July 2022 06:45:15 GMT
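These conversions can be reproduced with GNU date, and they confirm that the rollback common point is older than the last committed optime, which is exactly what the invariant forbids:
date -u -d @1658645070   # commonPointOpTime   -> Sun Jul 24 06:44:30 UTC 2022
date -u -d @1658645115   # lastCommittedOpTime -> Sun Jul 24 06:45:15 UTC 2022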
What could lead to such an invalid state?

Observing READ-FAILED after 2.5 hrs when running YCSB on Cassandra

I am new to Cassandra and YCSB and am benchmarking a 3-node Cassandra cluster (built with docker-compose) using YCSB.
YCSB's load phase completed in 4 hours without any errors or issues, but in the run phase I see a "READ-FAILED" error after about 2.5 hours (at 9212 seconds). I have tried running the same test a couple of times and see the same issue each time; I am not sure why.
.
.
2021-05-27 22:22:53:019 9208 sec: 8625003 operations; 661 current ops/sec; est completion in 6 days 1 hour [READ: Count=133, Max=89599, Min=311, Avg=5145.44, 90=10551, 99=78783, 99.9=89599, 99.99=89599] [READ-MODIFY-WRITE: Count=69, Max=26751, Min=707, Avg=4425.57, 90=11583, 99=18271, 99.9=26751, 99.99=26751] [INSERT: Count=450, Max=1432, Min=216, Avg=537.25, 90=818, 99=1128, 99.9=1432, 99.99=1432] [UPDATE: Count=145, Max=1471, Min=184, Avg=472.85, 90=733, 99=1284, 99.9=1471, 99.99=1471]
2021-05-27 22:22:54:019 9209 sec: 8625668 operations; 665 current ops/sec; est completion in 6 days 1 hour [READ: Count=127, Max=66367, Min=334, Avg=4931.35, 90=12767, 99=36127, 99.9=66367, 99.99=66367] [READ-MODIFY-WRITE: Count=64, Max=36543, Min=709, Avg=4670.2, 90=13511, 99=34143, 99.9=36543, 99.99=36543] [INSERT: Count=458, Max=2303, Min=237, Avg=589.22, 90=869, 99=1195, 99.9=2303, 99.99=2303] [UPDATE: Count=144, Max=1190, Min=218, Avg=501.5, 90=759, 99=1186, 99.9=1190, 99.99=1190]
2021-05-27 22:22:55:019 9210 sec: 8626279 operations; 611 current ops/sec; est completion in 6 days 1 hour [READ: Count=110, Max=98495, Min=399, Avg=6190.99, 90=12063, 99=38431, 99.9=98495, 99.99=98495] [READ-MODIFY-WRITE: Count=55, Max=100095, Min=692, Avg=8793.56, 90=15983, 99=39999, 99.9=100095, 99.99=100095] [INSERT: Count=441, Max=1659, Min=241, Avg=624.24, 90=969, 99=1327, 99.9=1659, 99.99=1659] [UPDATE: Count=119, Max=1395, Min=187, Avg=571.55, 90=909, 99=1310, 99.9=1395, 99.99=1395]
2021-05-27 22:22:56:019 9211 sec: 8626842 operations; 563 current ops/sec; est completion in 6 days 1 hour [READ: Count=118, Max=97215, Min=318, Avg=5499.74, 90=10463, 99=93055, 99.9=97215, 99.99=97215] [READ-MODIFY-WRITE: Count=45, Max=98495, Min=742, Avg=5810.96, 90=8807, 99=98495, 99.9=98495, 99.99=98495] [INSERT: Count=385, Max=1252, Min=239, Avg=616.27, 90=924, 99=1163, 99.9=1252, 99.99=1252] [UPDATE: Count=101, Max=1327, Min=195, Avg=580.12, 90=904, 99=1097, 99.9=1327, 99.99=1327]
2021-05-27 22:22:57:019 9212 sec: 8627010 operations; 168 current ops/sec; est completion in 6 days 1 hour [READ: Count=33, Max=90367, Min=732, Avg=12685.67, 90=35679, 99=90367, 99.9=90367, 99.99=90367] [READ-MODIFY-WRITE: Count=18, Max=93183, Min=1121, Avg=17020.33, 90=36895, 99=93183, 99.9=93183, 99.99=93183] [INSERT: Count=120, Max=109951, Min=325, Avg=2155.85, 90=3283, 99=7943, 99.9=109951, 99.99=109951] [UPDATE: Count=35, Max=11567, Min=302, Avg=1142.29, 90=2081, 99=11567, 99.9=11567, 99.99=11567] [READ-FAILED: Count=1, Max=23615, Min=23600, Avg=23608, 90=23615, 99=23615, 99.9=23615, 99.99=23615]
2021-05-27 22:22:58:019 9213 sec: 8627523 operations; 513 current ops/sec; est completion in 6 days 1 hour [READ: Count=87, Max=97151, Min=417, Avg=8968.98, 90=14639, 99=67967, 99.9=97151, 99.99=97151] [READ-MODIFY-WRITE: Count=44, Max=62303, Min=654, Avg=7554.91, 90=14047, 99=62303, 99.9=62303, 99.99=62303] [INSERT: Count=378, Max=1220, Min=240, Avg=467.85, 90=686, 99=1030, 99.9=1220, 99.99=1220] [UPDATE: Count=97, Max=1017, Min=217, Avg=411.89, 90=649, 99=861, 99.9=1017, 99.99=1017] [READ-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=NaN, 90=0, 99=0, 99.9=0, 99.99=0]
2021-05-27 22:22:59:019 9214 sec: 8628119 operations; 596 current ops/sec; est completion in 6 days 1 hour [READ: Count=115, Max=112063, Min=334, Avg=6460.7, 90=12127, 99=90943, 99.9=112063, 99.99=112063] [READ-MODIFY-WRITE: Count=58, Max=91711, Min=788, Avg=6967.95, 90=13015, 99=60575, 99.9=91711, 99.99=91711] [INSERT: Count=423, Max=1359, Min=234, Avg=473.31, 90=708, 99=895, 99.9=1359, 99.99=1359] [UPDATE: Count=108, Max=1033, Min=210, Avg=429.63, 90=637, 99=1031, 99.9=1033, 99.99=1033] [READ-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=NaN, 90=0, 99=0, 99.9=0, 99.99=0]
2021-05-27 22:23:00:019 9215 sec: 8628679 operations; 560 current ops/sec; est completion in 6 days 1 hour [READ: Count=117, Max=115071, Min=327, Avg=6498.37, 90=16143, 99=64863, 99.9=115071, 99.99=115071] [READ-MODIFY-WRITE: Count=66, Max=65599, Min=607, Avg=6775.21, 90=17151, 99=48191, 99.9=65599, 99.99=65599] [INSERT: Count=391, Max=1137, Min=218, Avg=466.95, 90=711, 99=1021, 99.9=1137, 99.99=1137] [UPDATE: Count=118, Max=1338, Min=191, Avg=438.92, 90=674, 99=1012, 99.9=1338, 99.99=1338] [READ-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=NaN, 90=0, 99=0, 99.9=0, 99.99=0]
2021-05-27 22:23:01:019 9216 sec: 8629411 operations; 732 current ops/sec; est completion in 6 days 1 hour [READ: Count=139, Max=94143, Min=390, Avg=5108.03, 90=10015, 99=59999, 99.9=94143, 99.99=94143] [READ-MODIFY-WRITE: Count=71, Max=95039, Min=597, Avg=5881.15, 90=8959, 99=41823, 99.9=95039, 99.99=95039] [INSERT: Count=524, Max=1256, Min=200, Avg=443.07, 90=639, 99=1023, 99.9=1218, 99.99=1256] [UPDATE: Count=142, Max=988, Min=174, Avg=404.29, 90=659, 99=926, 99.9=988, 99.99=988] [READ-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=NaN, 90=0, 99=0, 99.9=0, 99.99=0]
2021-05-27 22:23:02:019 9217 sec: 8629929 operations; 518 current ops/sec; est completion in 6 days 1 hour [READ: Count=116, Max=103615, Min=362, Avg=6558.6, 90=12535, 99=89599, 99.9=103615, 99.99=103615] [READ-MODIFY-WRITE: Count=55, Max=103999, Min=619, Avg=7671.18, 90=15127, 99=19727, 99.9=103999, 99.99=103999] [INSERT: Count=344, Max=960, Min=233, Avg=481.37, 90=683, 99=892, 99.9=960, 99.99=960] [UPDATE: Count=111, Max=818, Min=189, Avg=402.95, 90=596, 99=779, 99.9=818, 99.99=818] [READ-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=NaN, 90=0, 99=0, 99.9=0, 99.99=0]
.
.
.
However, when I run the same benchmark against MongoDB it works fine and I don't see any errors.
Please let me know if any settings or parameters need to be changed in the Cassandra YAML deployment or in the way I run YCSB against the Cassandra cluster.
If you need any more logs, please let me know and I will upload them. Currently I have uploaded two log files (on GitHub): one with the Docker and Cassandra logs and one with the YCSB execution output.
Any help is appreciated.
[ycsb_logs.txt] https://github.com/neekhraashish/logs/blob/main/ycsb_logs.txt
[docker_cassandra_logs.txt] https://github.com/neekhraashish/logs/blob/main/docker_cassandra_logs.txt
Thanks
Looking at the Cassandra logs, the cluster is not in a healthy state - a few things are noticeable:
Commit log sync warnings - these indicate that the underlying IO is not keeping up with the commit log writes to disk.
Dropped mutations - a lot of operations are being dropped between the nodes. These come back as synchronous read repairs when a digest mismatch is noticed on reads, and those read repairs are also often failing.
Some more details on how you have the storage / io provisioned would be useful.
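As a starting point, something like the following on each node would quantify the drops and the IO pressure (a sketch; ycsb.usertable is an assumption based on YCSB's default keyspace/table names):
nodetool tpstats                     # the end of the output lists dropped message counts (MUTATION, READ, ...)
nodetool tablestats ycsb.usertable   # per-table latencies and pending compactions
iostat -x 5                          # shows whether the disks backing the commit log are saturated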

RabbitMQ not running on Linux, cannot find the file asn1.app

I have installed this successfully on CentOS before. However, on another CentOS machine it fails to start RabbitMQ.
My Erlang comes from this repository:
[rabbitmq-erlang]
name=rabbitmq-erlang
baseurl=https://dl.bintray.com/rabbitmq/rpm/erlang/20/el/7
gpgcheck=1
gpgkey=https://dl.bintray.com/rabbitmq/Keys/rabbitmq-release-signing-key.asc
repo_gpgcheck=0
enabled=1
This is my erl_crash.dump:
erl_crash_dump:0.5
Sat Jun 23 09:17:30 2018
Slogan: init terminating in do_boot ({error,{no such file or directory,asn1.app}})
System version: Erlang/OTP 20 [erts-9.3.3] [source] [64-bit] [smp:24:24] [ds:24:24:10] [async-threads:384] [hipe] [kernel-poll:true]
Compiled: Tue Jun 19 22:25:03 2018
Taints: erl_tracer,zlib
Atoms: 14794
Calling Thread: scheduler:2
=scheduler:1
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK
Current Process:
=scheduler:2
Scheduler Sleep Info Flags:
Scheduler Sleep Info Aux Work: THR_PRGR_LATER_OP
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK | NONEMPTY | EXEC
Current Process: <0.0.0>
Current Process State: Running
Current Process Internal State: ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL | ACTIVE | RUNNING | TRAP_EXIT | ON_HEAP_MSGQ
Current Process Program counter: 0x00007fbd81fa59c0 (init:boot_loop/2 + 64)
Current Process CP: 0x0000000000000000 (invalid)
How can I identify this problem? Thank you.
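One way to narrow this down might be to check whether the asn1 application is actually present in this Erlang installation (a sketch; the /usr/lib64/erlang path is an assumption based on a typical CentOS RPM layout):
# Ask the Erlang code server where asn1 lives; {error,bad_name} means it is not installed
erl -noshell -eval 'io:format("~p~n", [code:lib_dir(asn1)]), halt().'
# Or inspect the library directory directly
ls /usr/lib64/erlang/lib | grep -i asn1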

Mojolicious: long subroutine has no effect on my variable

I have some trouble with a very long subroutine under hypnotoad.
I need to run a one-minute subroutine (a hardware-related requirement).
First, I discovered this behavior with:
my $var = 0;
Mojo::IOLoop->recurring(60 => sub {
    $log->debug("starting loop: var: $var");
    if ($var == 0) {
        #...
        # make some long (30 to 50 seconds) communication with external hardware
        sleep 50;    # fake reproduction of this long communication
        $var = 1;
    }
    $log->debug("ending loop: var: $var");
});
Log:
14:13:45 2018 [debug] starting loop: var: 0
14:14:26 2018 [debug] ending loop: var: 1 #duration: 41 seconds
14:14:26 2018 [debug] starting loop: var: 0
14:15:08 2018 [debug] ending loop: var: 1 #duration: 42 seconds
14:15:08 2018 [debug] starting loop: var: 0
14:15:49 2018 [debug] ending loop: var: 1 #duration: 42 seconds
14:15:50 2018 [debug] starting loop: var: 0
14:16:31 2018 [debug] ending loop: var: 1 #duration: 41 seconds
...
3 problems:
1) Where do these 42 seconds come from? (Yes, I know, 42 is the answer to the universe...)
2) Why does the recurring IOLoop timer lose its pace?
3) Why is my variable set to 1, and just one second later, the if sees a variable equal to 0?
When the looped job needs 20 or 25 seconds, there is no problem.
When the looped job needs 60 seconds and runs under morbo, there is no problem.
When the looped job needs more than 40 seconds and runs under hypnotoad (1 worker), I get the behavior described here.
If I increase the idle time (e.g. a 120-second IOLoop interval for a 60-second job), the behavior is always the same.
It's not a problem with IOLoop; I can reproduce the same behavior outside the loop.
I suspect a problem with the worker being killed over a missed heartbeat, but I have no log entries about that.
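If the heartbeat is indeed the problem, one idea is to move the long hardware call into a forked child with Mojo::IOLoop->subprocess so the worker's event loop keeps running (a sketch only; sleep again stands in for the hardware communication):
use Mojo::IOLoop;

my $var = 0;
Mojo::IOLoop->recurring(60 => sub {
    return if $var;    # work already done
    Mojo::IOLoop->subprocess(
        sub {
            # runs in a forked child, so blocking here does not starve the
            # parent's event loop or its heartbeats
            sleep 50;    # stand-in for the long hardware communication
            return 1;
        },
        sub {
            my ($subprocess, $err, $result) = @_;
            $var = $result unless $err;    # back in the parent worker
        },
    );
});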