Couchbase: 20k items stuck in Tap Queue - memcached

We are currently evaluating Couchbase as a memcached replacement. Our setup looks like this:
php -> localhost moxi -> couchbase bucket (Total bucket size = 10240 MB (2048 MB x 5 nodes with replica count 1))
The servers have 16 GB RAM and are SSD-backed.
We were inserting at about 400 ops/s and had no problems for a few days, until we reached about 13 million items. We then found out that we had forgotten to implement the delete function in our test setup and that a lot of keys had no expiration set.
To start over, we flushed the bucket through the web interface. This is where our problems began.
We started to see temp OOMs and back-offs, and the TAP queue was filled with 20k items; the drain and fill rates were nearly the same. See the attached screenshot.
What also caught our eye was that node 4 had only 220k items, while the others had around 1.39M.
Somehow it looks like the replication messed something up, but I'm relatively new to Couchbase. Any hints or suggestions?
The problem was solved for a short time after removing the failing node from the cluster.
So now, with four nodes left in the cluster, the same thing happened again with another node after some hours. We tried setting the now-failing node into failover state. That fixed the problem again, but after re-adding the node, the same phenomenon reappeared on that node.
Other things we realized are:
* Three out of four nodes have thousands of items in their TAP replication queue, but one ("the failing one") has 0.
* Also three out of four nodes have a back-off rate of around 400, but one ("the failing one") has 0.
* Only the failing one has a massive number of "Temp OOMs per second"; the other three have 0.
The phenomenon seems to disappear if we lower the load on the servers by disabling the Couchbase writes for one of the two software projects writing to Couchbase.
But if we enable the writes again, after around 10 minutes we see this in memcached.log on the failing node:
Tue Dec 17 12:29:05.010547 CET 3: (CENSORED) Received error[86] from mccouch for unknown
Tue Dec 17 12:29:05.010576 CET 3: (CENSORED) Retry notify CouchDB of update, vbucket=277 rev=522
Tue Dec 17 12:29:08.748103 CET 3: (CENSORED) Received error[86] from mccouch for unknown
Tue Dec 17 12:29:08.748257 CET 3: (CENSORED) Retry notify CouchDB of update, vbucket=321 rev=948
Tue Dec 17 12:40:17.354448 CET 3: (CENSORED) Received error[86] from mccouch for unknown
Tue Dec 17 12:40:17.354476 CET 3: (CENSORED) Retry notify CouchDB of update, vbucket=303 rev=491
This error then happens around 5 times within four hours:
Tue Dec 17 14:19:32.145071 CET 3: (CENSORED) TAP (Producer) eq_tapq:replication_ns_1#10.65.20.12 - Suspend for 5.00 secs
And after these four hours it starts spamming this constantly (maybe because the load increased heavily; in the evening our page generates much more load than in the morning or at noon), together with the "error from mccouch":
Tue Dec 17 16:42:30.875343 CET 3: (CENSORED) TAP (Producer) eq_tapq:replication_ns_1#10.65.20.12 - Suspend for 5.00 secs
Tue Dec 17 16:42:36.493317 CET 3: (CENSORED) TAP (Producer) eq_tapq:replication_ns_1#10.65.20.12 - Suspend for 5.00 secs
Tue Dec 17 16:43:25.239876 CET 3: (CENSORED) Received error[86] from mccouch for unknown
Tue Dec 17 16:43:25.240052 CET 3: (CENSORED) Retry notify CouchDB of update, vbucket=296 rev=483
Tue Dec 17 16:43:25.903997 CET 3: (CENSORED) TAP (Producer) eq_tapq:replication_ns_1#10.65.20.12 - Suspend for 5.00 secs
Tue Dec 17 16:43:31.906178 CET 3: (CENSORED) TAP (Producer) eq_tapq:replication_ns_1#10.65.20.12 - Suspend for 5.00 secs
Tue Dec 17 16:43:36.913045 CET 3: (CENSORED) TAP (Producer) eq_tapq:replication_ns_1#10.65.20.12 - Suspend for 5.00 secs
Tue Dec 17 16:43:42.919114 CET 3: (CENSORED) TAP (Producer) eq_tapq:replication_ns_1#10.65.20.12 - Suspend for 5.00 secs
Tue Dec 17 16:43:48.920354 CET 3: (CENSORED) TAP (Producer) eq_tapq:replication_ns_1#10.65.20.12 - Suspend for 5.00 secs
Tue Dec 17 16:43:54.924017 CET 3: (CENSORED) TAP (Producer) eq_tapq:replication_ns_1#10.65.20.12 - Suspend for 5.00 secs
Tue Dec 17 16:44:00.928572 CET 3: (CENSORED) TAP (Producer) eq_tapq:replication_ns_1#10.65.20.12 - Suspend for 5.00 secs
We have no clue what is happening here, or why this failing node seems to reject every replication and keeps throwing this error.
Do you have any idea?
Thanks for all your help and greetings from Cologne,
Andy!

Seeing as you just want to delete all items in the bucket, have you tried simply deleting and re-creating the bucket?
This will be much faster than a flush, as flush actually has to send a delete request for every document in the bucket.
I can't find it in the docs at the moment, but I think flush is not really recommended in the latest versions.
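If you go that route, it can be done through the REST API, roughly like this (a sketch only; host, credentials and the bucket name "default" are placeholders, and the exact creation parameters may differ between versions):
# drop the bucket entirely
curl -u Administrator:password -X DELETE http://localhost:8091/pools/default/buckets/default
# re-create it with the same quota and replica count as before
curl -u Administrator:password -X POST http://localhost:8091/pools/default/buckets \
  -d name=default -d bucketType=couchbase -d ramQuotaMB=2048 -d replicaNumber=1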

You don't say what your operating system is. If it's Linux, try checking the maximum number of open sockets (file descriptors) allowed for the user running Couchbase. Check the file /etc/security/limits.conf.
The command to check this on Linux is: ulimit -Hn.
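For example, raising the limit in /etc/security/limits.conf could look like this (a sketch; the user name "couchbase" and the value 40960 are assumptions, adjust them to your environment):
# /etc/security/limits.conf entries for the user that runs the Couchbase server
couchbase soft nofile 40960
couchbase hard nofile 40960
# after re-logging in as that user, verify the new hard limit with:
ulimit -Hn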
Hope that helps.
Daniel

I think you should try these settings:
http://docs.couchbase.com/couchbase-manual-2.1/#specifying-backoff-for-replication
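That page describes tuning the TAP throttling parameters with cbepctl, roughly like this (a sketch; the node address and values are placeholders, and the parameter names should be double-checked against the linked manual for your version):
# raise the replication back-off queue cap on a node
/opt/couchbase/bin/cbepctl 10.65.20.12:11210 set tap_param tap_throttle_queue_cap 1000000
# percentage of memory usage at which back-offs start
/opt/couchbase/bin/cbepctl 10.65.20.12:11210 set tap_param tap_throttle_threshold 90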

Related

Why does Puma mess up incoming requests? (timed out worker)

Problem
I have a Rails 7 app deployed on render.com, and it doesn't get a lot of traffic (maybe once per day). However, when a few requests do come in, everything seems to be running fine for a moment until Puma seems to barf. The incoming requests are from Twilio for a voice call, and the call eventually errors with "We're sorry, an application error has occurred. Goodbye". It seems like something about a "timed out" worker happens, then the worker boots, and whammo! a flood of "Completed 2XX OK" and "Kredis Connected to shared" lines comes crashing through like they've been pent up the entire time. THEN, nearly a day later, without any outside requests coming in, several log lines about Out-of-sync worker list, no 78 worker come through. My Puma config file is unchanged from what ships with Rails.
Questions
Where might I go look for the offending code? What tools could help me decipher why a Puma worker is timing out? Could it have something to do with how I'm using Redis via Kredis in my app?
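As a starting point for the "what tools" question, Puma can report per-worker status, including each worker's last check-in time (the thing the 60-second timeout is based on), through its control app. A sketch, assuming activate_control_app has been enabled in config/puma.rb with the URL and token shown here (the stock Rails config does not enable it):
bundle exec pumactl --control-url tcp://127.0.0.1:9293 --control-token mytoken stats
# the JSON output includes a worker_status entry per worker with its last_checkin timestamp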
Workaround
To get around this issue, I've started to occasionally redeploy my latest commit and that seems to help. I'm not certain, but it seems like inactivity causes Puma to become discombobulated.
Log output
Here's what the offending lines in my log file look like:
... a few requests that complete 200 OK ...
Sep 13 05:53:15 PM [70] ! Terminating timed out worker (worker failed to check in within 60 seconds): 90
... a couple more normal log lines and then ...
Sep 13 05:53:16 PM [70] - Worker 3 (PID: 134) booted in 0.04s, phase: 0
... some more normal log lines and then ...
Sep 13 05:53:16 PM I, [2022-09-13T22:53:16.593713 #74] INFO -- : [595ad8e5-fa3a-45a3-8c5b-a506e6c94b69] Completed 204 No Content in 110ms (Allocations: 13681)
Sep 13 05:53:16 PM I, [2022-09-13T22:53:16.425579 #86] INFO -- : [f1a64c71-8048-4032-8bf6-2e68aa1fa7ba] Completed 204 No Content in 2ms (Allocations: 541)
Sep 13 05:53:16 PM I, [2022-09-13T22:53:16.595408 #86] INFO -- : [68d19bd9-2286-4f75-a982-5fa3e864d6ac] Completed 200 OK in 105ms (Views: 0.2ms | Allocations: 1592)
Sep 13 05:53:16 PM I, [2022-09-13T22:53:16.614951 #76] INFO -- : [e883350f-9a26-4d3d-8f1c-4853285aa71a] Kredis (10.6ms) Connected to shared
Sep 13 05:53:16 PM I, [2022-09-13T22:53:16.615787 #76] INFO -- : [fbcd8730-1514-4af5-9332-0bdf0c89fc2d] Kredis (17.2ms) Connected to shared
Sep 13 05:53:16 PM I, [2022-09-13T22:53:16.705926 #86] INFO -- : [1f67a177-38f2-4bf5-bd03-1c59a3edb3a4] Kredis (224.1ms) Connected to shared
Sep 13 05:53:16 PM I, [2022-09-13T22:53:16.958386 #76] INFO -- : [e883350f-9a26-4d3d-8f1c-4853285aa71a] Completed 200 OK in 472ms (ActiveRecord: 213.1ms | Allocations: 32402)
Sep 13 05:53:17 PM I, [2022-09-13T22:53:17.034211 #86] INFO -- : [1f67a177-38f2-4bf5-bd03-1c59a3edb3a4] Completed 200 OK in 606ms (ActiveRecord: 256.6ms | Allocations: 17832)
Sep 13 05:53:17 PM I, [2022-09-13T22:53:17.136231 #76] INFO -- : [fbcd8730-1514-4af5-9332-0bdf0c89fc2d] Completed 200 OK in 654ms (ActiveRecord: 88.0ms | Allocations: 37385)
... literally a day later without any other activity ...
Sep 14 05:02:29 AM [69] ! Terminating timed out worker (worker failed to check in within 60 seconds): 78
Sep 14 05:02:31 AM [69] ! Out-of-sync worker list, no 78 worker
Sep 14 05:02:31 AM [69] ! Out-of-sync worker list, no 78 worker
Sep 14 05:02:31 AM [69] ! Out-of-sync worker list, no 78 worker
Sep 14 05:02:31 AM [69] ! Out-of-sync worker list, no 78 worker
Sep 14 05:02:31 AM [69] ! Out-of-sync worker list, no 78 worker
Sep 14 05:02:31 AM [69] ! Out-of-sync worker list, no 78 worker
Sep 14 05:02:31 AM [69] - Worker 1 (PID: 132) booted in 0.03s, phase: 0

MongoDB data corruption on a replica set

I am working with a MongoDB database running in a replica set.
Unfortunately, I noticed that the data appears to be corrupted.
There should be over 10,000 documents in the database. However, there are several thousand records that are not being returned in queries.
The total count DOES show the correct total.
db.records.find().count()
10793
And some records are returned when querying by RecordID (a custom sequence integer).
db.records.find({"RecordID": 10049})
{ "_id" : ObjectId("5dfbdb35c1c2a400104edece")
However, when querying for records that I know for a fact should exist, nothing is returned.
db.records.find({"RecordID": 10048})
db.records.find({"RecordID": 10047})
db.records.find({"RecordID": 10046})
The issue appears to be very sporadic, and in some cases entire ranges of records are missing; for example, the entire range from RecordID 1500 to 8000 is missing.
Questions: What could be the cause of the issue? What can I do to troubleshoot this issue further and recover the corrupted data? I looked into running repairDatabase but that is for standalone instances only.
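Before assuming the data itself is gone, it may be worth checking whether a secondary index (rather than the collection) is damaged. A sketch of that check in the mongo shell, using the collection and RecordID values from above:
db.records.find({"RecordID": 10048}).hint({ $natural: 1 })  // force a collection scan, bypassing any index on RecordID
db.records.validate(true)                                   // full validation of the collection and its indexes (can be slow)
// if the documents only show up with the $natural hint, rebuilding the indexes may help:
db.records.reIndex()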
UPDATE:
More info on replication:
rs.printReplicationInfo()
configured oplog size: 5100.880859375MB
log length start to end: 14641107secs (4066.97hrs)
oplog first event time: Wed Mar 03 2021 05:21:25 GMT-0500 (EST)
oplog last event time: Thu Aug 19 2021 17:19:52 GMT-0400 (EDT)
now: Thu Aug 19 2021 17:20:01 GMT-0400 (EDT)
rs.printSecondaryReplicationInfo()
source: node2-examplehost.com:27017
syncedTo: Thu Aug 19 2021 17:16:42 GMT-0400 (EDT)
0 secs (0 hrs) behind the primary
source: node3-examplehost.com:27017
syncedTo: Thu Aug 19 2021 17:16:42 GMT-0400 (EDT)
0 secs (0 hrs) behind the primary
UPDATE 2:
We did a restore from a backup and somehow it looks like it fixed the issue.

Why can't Linux read the hwclock after some month shifts?

We have a Linux system that we are building with Yocto.
We can read our hardware clock after reboots and change both system time and hardware time without any error (most of the time). However, after certain month transitions, in every year we have tried, we run into this error: "hwclock: RTC_RD_TIME: Invalid argument".
Example 1:
root#:~# date
Thu Apr 30 23:59:50 UTC 2020
root#:~# hwclock
Thu Apr 30 23:59:52 2020 0.000000 seconds
root#:~#
root#:~#
root#:~# date
Fri May 1 00:00:10 UTC 2020
root#:~# hwclock
hwclock: RTC_RD_TIME: Invalid argument
root#:~#
This does not happen at every month transition; if I do the same test in January, Linux can read the hwclock without any issues. It also does not matter whether the unit is powered or not. If I set the hwclock to the first of May at 00:00:00, it keeps track of the time just fine.
The same error occurs on the following month shift:
Feb (it does not matter if it is leap year or not) -> Mar
Apr -> May
Jun -> Jul
Sep -> Oct
Nov -> Dec
Dec -> Jan (not sure whether the trigger is the new year or the new month)
In my understanding, this is happening because rtc-lib.c cannot verify the time correctly.
I have tried this on multiple different hardware platforms.
Does anyone have any idea what might cause this?
Solution:
The fault was not in rtc-lib.c. The cause of the error was a faulty RTC driver implementation: the RTC month value is 1-indexed, but the kernel assumes it is 0-indexed. I added a patch for this to rtc-[my_rtc_model].c and now it seems to be working.

Master turning into slave after redis sentinel failover

I am trying out Redis master-slave replication using sentinels.
I have 1 master, 2 slaves, and 3 sentinels, all running as different pods.
My issue is:
1) When I delete the master pod, one of the slaves turns into the master.
2) Ideally, there should now be a new master with only one slave. For some reason, the IP of the master I deleted turns into a slave of the newly elected master.
3) Is this desirable behaviour? Sentinel shows two slaves of the newly elected master, but in fact only one slave pod exists, because the master pod was deleted.
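For reference, sentinel's current view of the topology can be inspected, and stale replica entries cleared, like this (a sketch; "mymaster" and port 26379 are the usual defaults and may differ in this setup):
redis-cli -p 26379 sentinel slaves mymaster   # lists the replicas sentinel believes exist, including dead ones
redis-cli -p 26379 sentinel reset mymaster    # drops the cached replica/sentinel list so it is rediscovered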
Below are the logs:
:M 29 May 2020 07:32:19.569 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
8:M 29 May 2020 07:32:19.569 # Server initialized
8:M 29 May 2020 07:32:19.569 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
8:M 29 May 2020 07:32:19.569 * Ready to accept connections
8:M 29 May 2020 07:33:22.329 * Replica 172.16.2.12:6379 asks for synchronization
8:M 29 May 2020 07:33:22.329 * Full resync requested by replica 172.16.2.12:6379
8:M 29 May 2020 07:33:22.329 * Starting BGSAVE for SYNC with target: disk
8:M 29 May 2020 07:33:22.330 * Background saving started by pid 12
12:C 29 May 2020 07:33:22.333 * DB saved on disk
12:C 29 May 2020 07:33:22.334 * RDB: 2 MB of memory used by copy-on-write
8:M 29 May 2020 07:33:22.355 * Background saving terminated with success
8:M 29 May 2020 07:33:22.356 * Synchronization with replica 172.16.2.12:6379 succeeded
8:M 29 May 2020 07:33:23.092 * Replica 172.16.4.48:6379 asks for synchronization
8:M 29 May 2020 07:33:23.092 * Full resync requested by replica 172.16.4.48:6379
8:M 29 May 2020 07:33:23.092 * Starting BGSAVE for SYNC with target: disk
8:M 29 May 2020 07:33:23.092 * Background saving started by pid 13
13:C 29 May 2020 07:33:23.097 * DB saved on disk
13:C 29 May 2020 07:33:23.097 * RDB: 2 MB of memory used by copy-on-write
8:M 29 May 2020 07:33:23.158 * Background saving terminated with success
8:M 29 May 2020 07:33:23.158 * Synchronization with replica 172.16.4.48:6379 succeeded
8:M 29 May 2020 07:36:26.866 # Connection with replica 172.16.2.12:6379 lost.
8:M 29 May 2020 07:36:27.871 # Connection with replica 172.16.4.48:6379 lost.
8:S 29 May 2020 07:36:37.926 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
8:S 29 May 2020 07:36:37.927 * REPLICAOF 172.16.2.12:6379 enabled (user request from 'id=21 addr=172.16.3.135:56721 fd=9 name=sentinel-5261eb21-cmd age=10 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=151 qbuf-free=32617 obl=36 oll=0 omem=0 events=r cmd=exec')
8:S 29 May 2020 07:36:37.933 # CONFIG REWRITE executed with success.
8:S 29 May 2020 07:36:38.284 * Connecting to MASTER 172.16.2.12:6379
8:S 29 May 2020 07:36:38.284 * MASTER <-> REPLICA sync started
8:S 29 May 2020 07:36:38.284 * Non blocking connect for SYNC fired the event.
8:S 29 May 2020 07:36:38.285 * Master replied to PING, replication can continue...
8:S 29 May 2020 07:36:38.285 * Trying a partial resynchronization (request 563ca4b5f67f1e24c129729eaa74800b108902a3:52568).
8:S 29 May 2020 07:36:38.321 * Full resync from master: f21b8c35187b109b621605b375ef62e61b301834:52901
8:S 29 May 2020 07:36:38.321 * Discarding previously cached master state.
8:S 29 May 2020 07:36:38.356 * MASTER <-> REPLICA sync: receiving 178 bytes from master
8:S 29 May 2020 07:36:38.356 * MASTER <-> REPLICA sync: Flushing old data
8:S 29 May 2020 07:36:38.356 * MASTER <-> REPLICA sync: Loading DB in memory
8:S 29 May 2020 07:36:38.356 * MASTER <-> REPLICA sync: Finished with success
I am using Redis 5.0. Earlier I was using Redis 4.0 and did not face this issue.

How to measure the replication time of a MongoDB database?

I perform a write operation with a large amount of data on the primary server.
How can I measure the time from when the data is available on the primary server until it is available on the secondary server?
From https://docs.mongodb.com/manual/tutorial/troubleshoot-replica-sets/#check-the-replication-lag:
To check the current length of replication lag:
In a mongo shell connected to the primary, call the rs.printSlaveReplicationInfo() method.
Returns the syncedTo value for each member, which shows the time when the last oplog entry was written to the secondary, as shown in the following example:
source: m1.example.net:27017
syncedTo: Thu Apr 10 2014 10:27:47 GMT-0400 (EDT)
0 secs (0 hrs) behind the primary
source: m2.example.net:27017
syncedTo: Thu Apr 10 2014 10:27:47 GMT-0400 (EDT)
0 secs (0 hrs) behind the primary
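For a single large write, another option is to time the write itself against a write concern that requires acknowledgement from every member; a rough sketch in the mongo shell (the collection name repltest, the 3-member set size and the 60-second wtimeout are assumptions):
var start = new Date();
db.repltest.insertOne({ payload: "test" }, { writeConcern: { w: 3, wtimeout: 60000 } });
print("acknowledged by all 3 members after " + (new Date() - start) + " ms");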