minikube kernel: hpet1: lost 319 rtc interrupts - minikube

$ minikube ssh
$ journalctl -f
Aug 02 04:25:38 minikube kernel: hpet1: lost 319 rtc interrupts
Aug 02 04:25:43 minikube kernel: hpet1: lost 318 rtc interrupts
Aug 02 04:25:48 minikube kernel: hpet1: lost 317 rtc interrupts
Aug 02 04:25:53 minikube kernel: hpet1: lost 319 rtc interrupts
Aug 02 04:25:58 minikube kernel: hpet1: lost 318 rtc interrupts
...
minikube has a slightly higher CPU footprint than I would expect, and when I check the log I see a lot of records like these. What do they mean?

This message comes from the virtual hardware clocking mechanism (HPET) and is rather harmless.
You need to set the boot parameter hpet=disable.
The detailed solution is as follows:
Go to /etc/default/grub
Find the line: GRUB_CMDLINE_LINUX_DEFAULT="quiet"
Append hpet=disable to the existing parameters, separated by a space: GRUB_CMDLINE_LINUX_DEFAULT="quiet hpet=disable"
Save the file
Update grub with: sudo update-grub
Reboot
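For reference, the same change as shell commands (a sketch; it assumes the default line is exactly GRUB_CMDLINE_LINUX_DEFAULT="quiet"):
sudo sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="quiet"/GRUB_CMDLINE_LINUX_DEFAULT="quiet hpet=disable"/' /etc/default/grub
sudo update-grub
sudo reboot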
Also, this log message is generated if you have the plugin Scheduled Wakealarms installed.
Please let me know if that helped.

Related

failing k8s mongodb pod after some time - DBPathInUse: Unable to create/open the lock file

I'm running a Kubernetes cluster (bare metal) with a MongoDB (version 4, as my server cannot handle newer versions) replica set (2 replicas). It initially works, but from time to time (sometimes after 24 hours, sometimes after 10 days) one or more MongoDB pods fail:
Warning BackOff 2m9s (x43454 over 6d13h) kubelet Back-off restarting failed container
The relevant part of the logs seems to be:
DBPathInUse: Unable to create/open the lock file: /bitnami/mongodb/data/db/mongod.lock (Read-only file system). Ensure the user executing mongod is the owner of the lock file and has the appropriate permissions. Also make sure that another mongod instance is not already running on the /bitnami/mongodb/data/db directory
But I haven't changed anything, and initially it works. The second pod is also currently running (but it will fail within the next days).
I'm using Longhorn for storage (before that I tried NFS) and I installed MongoDB using the Bitnami Helm chart with these values:
image:
  registry: docker.io
  repository: bitnami/mongodb
  digest: "sha256:916202d7af766dd88c2fff63bf711162c9d708ac7a3ffccd2aa812e3f03ae209" # tag: 4.4.15
  pullPolicy: IfNotPresent
architecture: replicaset
replicaCount: 2
updateStrategy:
  type: RollingUpdate
containerPorts:
  mongodb: 27017
auth:
  enabled: true
  rootUser: root
  rootPassword: "password"
  usernames: ["user"]
  passwords: ["userpass"]
  databases: ["db"]
service:
  portName: mongodb
  ports:
    mongodb: 27017
persistence:
  enabled: true
  accessModes:
    - ReadWriteOnce
  size: 8Gi
volumePermissions:
  enabled: true
livenessProbe:
  enabled: false
readinessProbe:
  enabled: false
Logs:
mongodb 21:25:05.55 INFO ==> Advertised Hostname: mongodb-1.mongodb-headless.mongodb.svc.cluster.local
mongodb 21:25:05.55 INFO ==> Advertised Port: 27017
mongodb 21:25:05.56 INFO ==> Pod name doesn't match initial primary pod name, configuring node as a secondary
mongodb 21:25:05.59
mongodb 21:25:05.59 Welcome to the Bitnami mongodb container
mongodb 21:25:05.60 Subscribe to project updates by watching https://github.com/bitnami/containers
mongodb 21:25:05.60 Submit issues and feature requests at https://github.com/bitnami/containers/issues
mongodb 21:25:05.60
mongodb 21:25:05.60 INFO ==> ** Starting MongoDB setup **
mongodb 21:25:05.64 INFO ==> Validating settings in MONGODB_* env vars...
mongodb 21:25:05.78 INFO ==> Initializing MongoDB...
mongodb 21:25:05.82 INFO ==> Deploying MongoDB with persisted data...
mongodb 21:25:05.83 INFO ==> Writing keyfile for replica set authentication...
mongodb 21:25:05.88 INFO ==> ** MongoDB setup finished! **
mongodb 21:25:05.92 INFO ==> ** Starting MongoDB **
{"t":{"$date":"2022-10-29T21:25:05.961+00:00"},"s":"I", "c":"CONTROL", "id":20698, "ctx":"main","msg":"***** SERVER RESTARTED *****"}
{"t":{"$date":"2022-10-29T21:25:05.963+00:00"},"s":"I", "c":"CONTROL", "id":23285, "ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}
{"t":{"$date":"2022-10-29T21:25:05.968+00:00"},"s":"W", "c":"ASIO", "id":22601, "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2022-10-29T21:25:05.968+00:00"},"s":"I", "c":"NETWORK", "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize."}
{"t":{"$date":"2022-10-29T21:25:05.969+00:00"},"s":"W", "c":"ASIO", "id":22601, "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2022-10-29T21:25:06.011+00:00"},"s":"I", "c":"STORAGE", "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":1,"port":27017,"dbPath":"/bitnami/mongodb/data/db","architecture":"64-bit","host":"mongodb-1"}}
{"t":{"$date":"2022-10-29T21:25:06.011+00:00"},"s":"I", "c":"CONTROL", "id":23403, "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"4.4.15","gitVersion":"bc17cf2c788c5dda2801a090ea79da5ff7d5fac9","openSSLVersion":"OpenSSL 1.1.1n 15 Mar 2022","modules":[],"allocator":"tcmalloc","environment":{"distmod":"debian10","distarch":"x86_64","target_arch":"x86_64"}}}}
{"t":{"$date":"2022-10-29T21:25:06.012+00:00"},"s":"I", "c":"CONTROL", "id":51765, "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"PRETTY_NAME=\"Debian GNU/Linux 10 (buster)\"","version":"Kernel 5.15.0-48-generic"}}}
{"t":{"$date":"2022-10-29T21:25:06.012+00:00"},"s":"I", "c":"CONTROL", "id":21951, "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"config":"/opt/bitnami/mongodb/conf/mongodb.conf","net":{"bindIp":"*","ipv6":false,"port":27017,"unixDomainSocket":{"enabled":true,"pathPrefix":"/opt/bitnami/mongodb/tmp"}},"processManagement":{"fork":false,"pidFilePath":"/opt/bitnami/mongodb/tmp/mongodb.pid"},"replication":{"enableMajorityReadConcern":true,"replSetName":"rs0"},"security":{"authorization":"disabled","keyFile":"/opt/bitnami/mongodb/conf/keyfile"},"setParameter":{"enableLocalhostAuthBypass":"true"},"storage":{"dbPath":"/bitnami/mongodb/data/db","directoryPerDB":false,"journal":{"enabled":true}},"systemLog":{"destination":"file","logAppend":true,"logRotate":"reopen","path":"/opt/bitnami/mongodb/logs/mongodb.log","quiet":false,"verbosity":0}}}}
{"t":{"$date":"2022-10-29T21:25:06.013+00:00"},"s":"E", "c":"STORAGE", "id":20557, "ctx":"initandlisten","msg":"DBException in initAndListen, terminating","attr":{"error":"DBPathInUse: Unable to create/open the lock file: /bitnami/mongodb/data/db/mongod.lock (Read-only file system). Ensure the user executing mongod is the owner of the lock file and has the appropriate permissions. Also make sure that another mongod instance is not already running on the /bitnami/mongodb/data/db directory"}}
{"t":{"$date":"2022-10-29T21:25:06.013+00:00"},"s":"I", "c":"REPL", "id":4784900, "ctx":"initandlisten","msg":"Stepping down the ReplicationCoordinator for shutdown","attr":{"waitTimeMillis":10000}}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"COMMAND", "id":4784901, "ctx":"initandlisten","msg":"Shutting down the MirrorMaestro"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"SHARDING", "id":4784902, "ctx":"initandlisten","msg":"Shutting down the WaitForMajorityService"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"NETWORK", "id":20562, "ctx":"initandlisten","msg":"Shutdown: going to close listening sockets"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"NETWORK", "id":4784905, "ctx":"initandlisten","msg":"Shutting down the global connection pool"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"STORAGE", "id":4784906, "ctx":"initandlisten","msg":"Shutting down the FlowControlTicketholder"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"-", "id":20520, "ctx":"initandlisten","msg":"Stopping further Flow Control ticket acquisitions."}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"REPL", "id":4784907, "ctx":"initandlisten","msg":"Shutting down the replica set node executor"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"NETWORK", "id":4784918, "ctx":"initandlisten","msg":"Shutting down the ReplicaSetMonitor"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"SHARDING", "id":4784921, "ctx":"initandlisten","msg":"Shutting down the MigrationUtilExecutor"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"CONTROL", "id":4784925, "ctx":"initandlisten","msg":"Shutting down free monitoring"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"STORAGE", "id":4784927, "ctx":"initandlisten","msg":"Shutting down the HealthLog"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"STORAGE", "id":4784929, "ctx":"initandlisten","msg":"Acquiring the global lock for shutdown"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"-", "id":4784931, "ctx":"initandlisten","msg":"Dropping the scope cache for shutdown"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"FTDC", "id":4784926, "ctx":"initandlisten","msg":"Shutting down full-time data capture"}
{"t":{"$date":"2022-10-29T21:25:06.015+00:00"},"s":"I", "c":"CONTROL", "id":20565, "ctx":"initandlisten","msg":"Now exiting"}
{"t":{"$date":"2022-10-29T21:25:06.015+00:00"},"s":"I", "c":"CONTROL", "id":23138, "ctx":"initandlisten","msg":"Shutting down","attr":{"exitCode":100}}
Update
I checked the syslog, and just before this kubelet error:
Nov 14 23:07:17 k8s-worker2 kubelet[752]: E1114 23:07:17.749057 752 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"mongodb\" with CrashLoopBackOff: \"back-off 10s restarting failed container=mongodb pod=mongodb-2_mongodb(314f2776-ced4-4ba3-b90b-f927dc079770)\"" pod="mongodb/mongodb-2" podUID=314f2776-ced4-4ba3-b90b-f927dc079770
I find these entries:
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.341806] sd 2:0:0:1: [sda] tag#42 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=11s
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.341866] sd 2:0:0:1: [sda] tag#42 Sense Key : Medium Error [current]
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.341891] sd 2:0:0:1: [sda] tag#42 Add. Sense: Unrecovered read error
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.341899] sd 2:0:0:1: [sda] tag#42 CDB: Write(10) 2a 00 00 85 1f b8 00 00 40 00
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.341912] blk_update_request: critical medium error, dev sda, sector 8724408 op 0x1:(WRITE) flags 0x800 phys_seg 8 prio class 0
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.352012] Aborting journal on device sda-8.
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.354980] EXT4-fs error (device sda) in ext4_reserve_inode_write:5726: Journal has aborted
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.355103] sd 2:0:0:1: [sda] tag#40 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=15s
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.357056] sd 2:0:0:1: [sda] tag#40 Sense Key : Medium Error [current]
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.357061] sd 2:0:0:1: [sda] tag#40 Add. Sense: Unrecovered read error
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.357066] sd 2:0:0:1: [sda] tag#40 CDB: Write(10) 2a 00 00 44 14 88 00 00 10 00
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.357068] blk_update_request: critical medium error, dev sda, sector 4461704 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.357088] EXT4-fs error (device sda): ext4_dirty_inode:5922: inode #131080: comm mongod: mark_inode_dirty error
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.359566] EXT4-fs warning (device sda): ext4_end_bio:344: I/O error 7 writing to inode 131081 starting block 557715)
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.361432] EXT4-fs error (device sda) in ext4_dirty_inode:5923: Journal has aborted
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.362792] Buffer I/O error on device sda, logical block 557713
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.364010] Buffer I/O error on device sda, logical block 557714
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.365222] sd 2:0:0:1: [sda] tag#43 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=8s
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.365228] sd 2:0:0:1: [sda] tag#43 Sense Key : Medium Error [current]
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.365230] sd 2:0:0:1: [sda] tag#43 Add. Sense: Unrecovered read error
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.365233] sd 2:0:0:1: [sda] tag#43 CDB: Write(10) 2a 00 00 44 28 38 00 00 08 00
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.365234] blk_update_request: critical medium error, dev sda, sector 4466744 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.367434] EXT4-fs warning (device sda): ext4_end_bio:344: I/O error 7 writing to inode 131083 starting block 558344)
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.367442] Buffer I/O error on device sda, logical block 558343
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.368593] sd 2:0:0:1: [sda] tag#41 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=15s
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.368597] sd 2:0:0:1: [sda] tag#41 Sense Key : Medium Error [current]
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.368599] sd 2:0:0:1: [sda] tag#41 Add. Sense: Unrecovered read error
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.368602] sd 2:0:0:1: [sda] tag#41 CDB: Write(10) 2a 00 00 44 90 70 00 00 10 00
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.368604] blk_update_request: critical medium error, dev sda, sector 4493424 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.370907] EXT4-fs warning (device sda): ext4_end_bio:344: I/O error 7 writing to inode 131081 starting block 561680)
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.370946] sd 2:0:0:1: [sda] tag#39 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=15s
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.370949] sd 2:0:0:1: [sda] tag#39 Sense Key : Medium Error [current]
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.370952] sd 2:0:0:1: [sda] tag#39 Add. Sense: Unrecovered read error
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.370949] EXT4-fs error (device sda): ext4_journal_check_start:83: comm kworker/u4:0: Detected aborted journal
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.370954] sd 2:0:0:1: [sda] tag#39 CDB: Write(10) 2a 00 00 10 41 98 00 00 08 00
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.372081] blk_update_request: critical medium error, dev sda, sector 1065368 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.374353] EXT4-fs warning (device sda): ext4_end_bio:344: I/O error 7 writing to inode 131080 starting block 133172)
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.374396] Buffer I/O error on device sda, logical block 133171
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.388492] EXT4-fs error (device sda) in __ext4_new_inode:1136: Journal has aborted
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.390763] EXT4-fs error (device sda) in ext4_create:2786: Journal has aborted
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.391732] sd 2:0:0:1: [sda] tag#46 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.392941] sd 2:0:0:1: [sda] tag#46 Sense Key : Medium Error [current]
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.392944] sd 2:0:0:1: [sda] tag#46 Add. Sense: Unrecovered read error
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.392948] sd 2:0:0:1: [sda] tag#46 CDB: Write(10) 2a 08 00 00 00 00 00 00 08 00
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.392950] blk_update_request: critical medium error, dev sda, sector 0 op 0x1:(WRITE) flags 0x23800 phys_seg 1 prio class 0
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.395562] Buffer I/O error on dev sda, logical block 0, lost sync page write
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.396945] sd 2:0:0:1: [sda] tag#45 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.396953] sd 2:0:0:1: [sda] tag#45 Sense Key : Medium Error [current]
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.396955] sd 2:0:0:1: [sda] tag#45 Add. Sense: Unrecovered read error
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.396958] sd 2:0:0:1: [sda] tag#45 CDB: Write(10) 2a 08 00 84 00 00 00 00 08 00
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.396959] blk_update_request: critical medium error, dev sda, sector 8650752 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 0
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.396930] EXT4-fs (sda): I/O error while writing superblock
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.399771] Buffer I/O error on dev sda, logical block 1081344, lost sync page write
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.403897] JBD2: Error -5 detected when updating journal superblock for sda-8.
Nov 14 23:07:01 k8s-worker2 systemd[1]: run-docker-runtime\x2drunc-moby-d1c0f0dc3e024723707edfc12e023b98fb98f1be971177ecca5ac0cfdc91ab87-runc.w3zzIL.mount: Deactivated successfully.
Nov 14 23:07:05 k8s-worker2 kubelet[752]: E1114 23:07:05.415798 752 dns.go:157] "Nameserver limits exceeded" err="Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 46.38.252.230 46.38.225.230 2a03:4000:0:1::e1e6"
Nov 14 23:07:06 k8s-worker2 kubelet[752]: E1114 23:07:06.412219 752 dns.go:157] "Nameserver limits exceeded" err="Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 46.38.252.230 46.38.225.230 2a03:4000:0:1::e1e6"
Nov 14 23:07:06 k8s-worker2 systemd[1]: run-docker-runtime\x2drunc-moby-d1c0f0dc3e024723707edfc12e023b98fb98f1be971177ecca5ac0cfdc91ab87-runc.nK23K3.mount: Deactivated successfully.
Nov 14 23:07:11 k8s-worker2 systemd[1]: run-docker-runtime\x2drunc-moby-d1c0f0dc3e024723707edfc12e023b98fb98f1be971177ecca5ac0cfdc91ab87-runc.L5TkRU.mount: Deactivated successfully.
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.411831] sd 2:0:0:1: [sda] tag#44 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=15s
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.411888] sd 2:0:0:1: [sda] tag#44 Sense Key : Medium Error [current]
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.411898] sd 2:0:0:1: [sda] tag#44 Add. Sense: Unrecovered read error
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.411952] sd 2:0:0:1: [sda] tag#44 CDB: Write(10) 2a 00 00 44 28 40 00 00 50 00
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.411965] blk_update_request: critical medium error, dev sda, sector 4466752 op 0x1:(WRITE) flags 0x0 phys_seg 10 prio class 0
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.419273] EXT4-fs warning (device sda): ext4_end_bio:344: I/O error 7 writing to inode 131083 starting block 558354)
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.430398] sd 2:0:0:1: [sda] tag#47 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=15s
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.430407] sd 2:0:0:1: [sda] tag#47 Sense Key : Medium Error [current]
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.430409] sd 2:0:0:1: [sda] tag#47 Add. Sense: Unrecovered read error
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.430412] sd 2:0:0:1: [sda] tag#47 CDB: Write(10) 2a 08 00 00 00 00 00 00 08 00
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.430415] blk_update_request: critical medium error, dev sda, sector 0 op 0x1:(WRITE) flags 0x23800 phys_seg 1 prio class 0
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.433686] Buffer I/O error on dev sda, logical block 0, lost sync page write
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.436088] EXT4-fs (sda): I/O error while writing superblock
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.444291] sd 2:0:0:1: [sda] tag#32 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=14s
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.444300] sd 2:0:0:1: [sda] tag#32 Sense Key : Medium Error [current]
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.444304] sd 2:0:0:1: [sda] tag#32 Add. Sense: Unrecovered read error
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.444308] sd 2:0:0:1: [sda] tag#32 CDB: Write(10) 2a 00 00 41 01 18 00 00 08 00
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.444313] blk_update_request: critical medium error, dev sda, sector 4260120 op 0x1:(WRITE) flags 0x3000 phys_seg 1 prio class 0
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.449491] Buffer I/O error on dev sda, logical block 532515, lost async page write
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.453591] sd 2:0:0:1: [sda] tag#33 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.453600] sd 2:0:0:1: [sda] tag#33 Sense Key : Medium Error [current]
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.453603] sd 2:0:0:1: [sda] tag#33 Add. Sense: Unrecovered read error
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.453607] sd 2:0:0:1: [sda] tag#33 CDB: Write(10) 2a 08 00 00 00 00 00 00 08 00
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.453610] blk_update_request: critical medium error, dev sda, sector 0 op 0x1:(WRITE) flags 0x23800 phys_seg 1 prio class 0
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.459072] Buffer I/O error on dev sda, logical block 0, lost sync page write
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.461189] EXT4-fs (sda): I/O error while writing superblock
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.464347] EXT4-fs (sda): Remounting filesystem read-only
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.466527] EXT4-fs (sda): failed to convert unwritten extents to written extents -- potential data loss! (inode 131081, error -30)
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.470833] Buffer I/O error on device sda, logical block 561678
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.473548] Buffer I/O error on device sda, logical block 561679
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.477384] EXT4-fs (sda): failed to convert unwritten extents to written extents -- potential data loss! (inode 131083, error -30)
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.482014] Buffer I/O error on device sda, logical block 558344
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.484881] Buffer I/O error on device sda, logical block 558345
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.487224] Buffer I/O error on device sda, logical block 558346
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.488837] Buffer I/O error on device sda, logical block 558347
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.490543] Buffer I/O error on device sda, logical block 558348
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.492061] Buffer I/O error on device sda, logical block 558349
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.493494] Buffer I/O error on device sda, logical block 558350
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.494931] Buffer I/O error on device sda, logical block 558351
I'm not sure whether this is really related to the problem.
Generally, when you see this error message:
"error":"DBPathInUse: Unable to create/open the lock file: /bitnami/mongodb/data/db/mongod.lock (Read-only file system)
it most probably means that your MongoDB pod did not shut down gracefully and had no time to remove the mongod.lock file, so when the pod was re-created on another k8s node the "new" mongod process could not start because it found the previous mongod.lock file.
The easiest way to resolve the current availability issue is to scale up and immediately add one more replica set member, so the new member can initial-sync from the remaining healthy member:
helm upgrade mongodb bitnami/mongodb \
--set architecture=replicaset \
--set auth.replicaSetKey=myreplicasetkey \
--set auth.rootPassword=myrootpassword \
--set replicaCount=3
and let the replica set elect a primary again.
You can check whether the MongoDB replica set has elected a PRIMARY from the mongo shell inside the pod with the command:
rs.status()
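For example (a sketch, assuming the Helm release is named mongodb and the root credentials from the values above):
kubectl exec -it mongodb-0 -- mongo -u root -p password --authenticationDatabase admin --eval 'rs.status()'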
For the affected pod you can do as follows:
Plan a maintenance window and scale down (scaling down the StatefulSet is not expected to automatically delete the PVC/PV, but it is good to make a backup just in case).
After you scale down, you can start a custom helper pod that mounts the PV so you can remove the mongod.lock file.
Temporary pod that you will start to mount the affected dbPath and remove the mongod.lock file:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: mongo-pvc-helper
spec:
  securityContext:
    runAsUser: 0
  containers:
  - command:
    - sh
    - -c
    - while true ; do echo alive ; sleep 10 ; done
    image: busybox
    imagePullPolicy: Always
    name: mongo-pvc-helper
    resources: {}
    securityContext:
      capabilities:
        drop:
        - ALL
    volumeMounts:
    - mountPath: /mongodata
      name: mongodata
  volumes:
  - name: mongodata
    persistentVolumeClaim:
      claimName: <your_faulty_pod_pvc_name>
EOF
After you start the pod you can do:
kubectl exec -it mongo-pvc-helper -- sh
$ chown -R 0:0 /mongodata
$ rm /mongodata/mongod.lock
$ exit
Or you can completely wipe the entire PV (if you prefer to safely initial-sync this member from scratch):
rm -rf /mongodata/*
And terminate the pod so you can finish the process:
kubectl delete pod mongo-pvc-helper
And scale up again:
helm upgrade mongodb bitnami/mongodb \
--set architecture=replicaset \
--set auth.replicaSetKey=myreplicasetkey \
--set auth.rootPassword=myrootpassword \
--set replicaCount=2
Btw, it is good to have at least 3 data-bearing members in the replica set for better redundancy, so that when a single member goes down an election can still keep a PRIMARY up and running...
How to troubleshoot this further:
Ensure your pods have terminationGracePeriodSeconds set (at least 10-20 sec) so the mongod process has some time to flush data to storage and remove the mongod.lock file (see the snippets after this list).
Depending on the pod memory limits/requests, you can set a safer value for storage.wiredTiger.engineConfig.cacheSizeGB (if not set, it allocates roughly 50% of the memory).
Check the kubelet logs on the node where the pod was killed; there may be more details about why it was killed.
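Illustrative snippets only (these are not Bitnami-chart-specific values; the first is the standard Kubernetes pod spec field, the second the standard mongod.conf option):
# Pod spec:
spec:
  terminationGracePeriodSeconds: 30
# mongod.conf:
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 0.5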
I think #R2D2's extensive answer makes some good points about how to recover from the situation. I very much agree with their recommendation to use 3 data bearing nodes which aligns with fault tolerance considerations. With the additional logs you were able to add, I am arriving at the same conclusion that your storage subsystem is the problem here which is going to be the actual cause of your MongoDB failing.
In your initial query the following log line was specifically highlighted:
DBPathInUse: Unable to create/open the lock file: /bitnami/mongodb/data/db/mongod.lock (Read-only file system). Ensure the user executing mongod is the owner of the lock file and has the appropriate permissions. Also make sure that another mongod instance is not already running on the /bitnami/mongodb/data/db directory
Specifically: (Read-only file system). Now, in the new logs you have provided, the host itself is reporting:
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.459072] Buffer I/O error on dev sda, logical block 0, lost sync page write
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.461189] EXT4-fs (sda): I/O error while writing superblock
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.464347] EXT4-fs (sda): Remounting filesystem read-only
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.466527] EXT4-fs (sda): failed to convert unwritten extents to written extents -- potential data loss! (inode 131081, error -30)
Specifically: Remounting filesystem read-only. If mongod is using any of these mount points for its operation, we would expect the system to no longer function properly once it can no longer write to them. The database process itself may terminate, which is something the storage node watchdog could be configured to do (in subsequent versions).
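For illustration only (a sketch; the storage node watchdog is enabled via a startup parameter, and its availability depends on the MongoDB version and edition):
mongod --dbpath /bitnami/mongodb/data/db --setParameter watchdogPeriodSeconds=60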
In any case, the issues with the storage look quite serious; they include text like: failed to convert unwritten extents to written extents -- potential data loss! It seems imperative that you look into this further and resolve any issues as soon as possible.
Relatedly, you mentioned:
I'm using longhorn (before I tried nfs) for the storage
The logs also suggest ext4 is at play here. I think all of these have been known to have issues or otherwise be suboptimal for use with MongoDB. From the MongoDB documentation:
With the WiredTiger storage engine, using XFS is strongly recommended for data bearing nodes to avoid performance issues that may occur when using EXT4 with WiredTiger.
From elsewhere on the same page (emphasis added):
With the WiredTiger storage engine, WiredTiger objects may be stored on remote file systems if the remote file system conforms to ISO/IEC 9945-1:1996 (POSIX.1). Because remote file systems are often slower than local file systems, using a remote file system for storage may degrade performance.
I don't have any personal experience with Longhorn, but you can see an example here where instability with that storage system caused the same DBPathInUse error that you observed. There are other reports of people having nothing but problems with storage constantly detaching itself.
In short - instability with the storage subsystem is what is both causing the mongod process/pod to fail as well as preventing it from recovering. The problem is compounded by the fact that you only have 2 members in the replica set which provides no fault tolerance. Once you lose one member the other one will not be able to operate as a PRIMARY since there is no majority. Increasing the replica set to 3 members will at least provide fault tolerance of 1 node. The storage issues are a separate problem that should be pursued further via another question focused more on how that component is configured in your environment.
Some time ago I had something like that. It is always a sad experience.
As noted in the answer by #R2D2, when you see (Read-only file system) in your logs it can mean many things, none of them good. For instance, when Linux boots, the file system starts out read-only and is switched to read-write once everything is OK. That is not your case - it's just an example.
Note that your file system was marked read-only due to I/O errors. It looks like the hard drive is corrupted. Check the system on which Kubernetes is running - fsck for Linux - as described here.
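For example (a generic sketch - run fsck only against an unmounted device, and take a backup first):
sudo umount /dev/sda
sudo fsck -f /dev/sda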
When the drive is fixed, restart Kubernetes. Some data will be lost, so count on Mongo complaining about data integrity... Nothing more than mongod --repair comes to my mind. It may also be that the lock file has to be deleted before the repair, but mongod should complain about that - something like "there is another instance" or "I can't set lock - file exists".
Besides that - use SMART monitoring, also mentioned later on that page.
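For example, with smartmontools installed:
sudo smartctl -H /dev/sda   # quick health verdict
sudo smartctl -a /dev/sda   # full SMART attributes and error log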
Newer, faster, bigger drives are also more fragile. That is the price.
If you have a backup... Yes, I know - I mentioned my own case - since then I keep backups... Good luck!

Mongodb after install crash

I'm trying to start a freshly installed MongoDB
(packages mongodb-org=5.0.2 mongodb-org-database=5.0.2 mongodb-org-server=5.0.2 mongodb-org-shell=5.0.2 mongodb-org-mongos=5.0.2 mongodb-org-tools=5.0.2)
OS: Ubuntu 20.04 (clean, freshly installed)
VMware ESXi
Default config
After the install I try to start the service and get errors like this:
sudo systemctl status mongod.service
● mongod.service - MongoDB Database Server
Loaded: loaded (/lib/systemd/system/mongod.service; disabled; vendor preset: enabled)
Active: failed (Result: core-dump) since Mon 2021-11-22 13:02:15 UTC; 1s ago
Docs: https://docs.mongodb.org/manual
Process: 1769 ExecStart=/usr/bin/mongod --config /etc/mongod.conf (code=dumped, signal=ILL)
Main PID: 1769 (code=dumped, signal=ILL)
Nov 22 13:02:14 rocket systemd[1]: Started MongoDB Database Server.
Nov 22 13:02:15 rocket systemd[1]: mongod.service: Main process exited, code=dumped, status=4/ILL
Nov 22 13:02:15 rocket systemd[1]: mongod.service: Failed with result 'core-dump'.
journalctl -xe:
-- A start job for unit mongod.service has finished successfully.
--
-- The job identifier is 456.
Nov 22 12:59:33 rocket sudo[1687]: pam_unix(sudo:session): session closed for user root
Nov 22 12:59:33 rocket kernel: show_signal: 18 callbacks suppressed
Nov 22 12:59:33 rocket kernel: traps: mongod[1693] trap invalid opcode ip:562c4fb0708a sp:7ffe3d3abcb0 error:0 in mongod[562c4bbc8000+5055000]
Nov 22 12:59:34 rocket systemd[1]: mongod.service: Main process exited, code=dumped, status=4/ILL
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- An ExecStart= process belonging to unit mongod.service has exited.
--
-- The process' exit code is 'dumped' and its exit status is 4.
Nov 22 12:59:34 rocket systemd[1]: mongod.service: Failed with result 'core-dump'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit mongod.service has entered the 'failed' state with result 'core-dump'.
rocket kernel: traps: mongod[1693] trap invalid opcode ip:562c4fb0708a sp:7ffe3d3abcb0 error:0 in mongod[562c4bbc8000+5055000]
limits:
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 47570
max locked memory (kbytes, -l) 65536
max memory size (kbytes, -m) unlimited
open files (-n) 65000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 47570
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
a. The issue looks to be a matter of missing dependencies.
Run:
apt-get -u dist-upgrade
and then reinstall MongoDB.
b. Don't use 5.0.2 - MongoDB has issued a severe alert for that version.
Use 5.0.3 instead.
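For example, mirroring the package list from the question:
sudo apt-get install -y mongodb-org=5.0.3 mongodb-org-database=5.0.3 mongodb-org-server=5.0.3 mongodb-org-shell=5.0.3 mongodb-org-mongos=5.0.3 mongodb-org-tools=5.0.3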
It looks like Mongo is sensitive to instruction-set extensions on the underlying chipset (AVX). If you are running this under KVM, you have to experiment with the CPU model exposed to the guest. I had the same problem with Mongo 6.0.4.
I am running KVM on an older Xeon chip, so instead of a QEMU-emulated CPU I pass execution through to the underlying processor. All is well then.
https://www.mongodb.com/community/forums/t/mongodb-5-0-cpu-intel-g4650-compatibility/116610
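A quick way to confirm the AVX issue from inside the VM (a sketch; the virsh hint assumes the VM is managed with libvirt):
grep -q avx /proc/cpuinfo && echo "AVX present" || echo "AVX missing - mongod 5.0+ dies with SIGILL"
# With libvirt/KVM, exposing the host CPU to the guest instead of a generic QEMU model:
#   virsh edit <vm-name>   and set   <cpu mode='host-passthrough'/>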

Ceph Monitor out of quorum

We're experiencing a problem with one of our Ceph monitors. The cluster uses 3 monitors and they are all up and running. They can communicate with each other and give a consistent ceph -s output. However, the quorum shows the second monitor as down. The ceph -s output from the supposedly down monitor is below:
cluster:
  id:     bb1ab46a-d282-4530-bf5c-021e9c940958
  health: HEALTH_WARN
          insufficient standby MDS daemons available
          noout flag(s) set
          9 large omap objects
          47 pgs not deep-scrubbed in time
          application not enabled on 2 pool(s)
          1/3 mons down, quorum mon1,mon3
services:
  mon:        3 daemons, quorum mon1,mon3 (age 3d), out of quorum: mon2
  mgr:        mon1(active, since 3d)
  mds:        filesystem:1 {0=mon1=up:active}
  osd:        77 osds: 77 up (since 3d), 77 in (since 2w)
              flags noout
  rbd-mirror: 1 daemon active (12512649)
  rgw:        1 daemon active (mon1)
data:
  pools:   13 pools, 1500 pgs
  objects: 65.36M objects, 23 TiB
  usage:   85 TiB used, 701 TiB / 785 TiB avail
  pgs:     1500 active+clean
io:
  client: 806 KiB/s wr, 0 op/s rd, 52 op/s wr
systemctl status ceph-mon@2.service shows:
ceph-mon@2.service - Ceph cluster monitor daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Tue 2020-12-08 12:12:58 +03; 28s ago
Process: 2681 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
Main PID: 2681 (code=exited, status=1/FAILURE)
Dec 08 12:12:48 mon2 systemd[1]: Unit ceph-mon@2.service entered failed state.
Dec 08 12:12:48 mon2 systemd[1]: ceph-mon@2.service failed.
Dec 08 12:12:58 mon2 systemd[1]: ceph-mon@2.service holdoff time over, scheduling restart.
Dec 08 12:12:58 mon2 systemd[1]: Stopped Ceph cluster monitor daemon.
Dec 08 12:12:58 mon2 systemd[1]: start request repeated too quickly for ceph-mon@2.service
Dec 08 12:12:58 mon2 systemd[1]: Failed to start Ceph cluster monitor daemon.
Dec 08 12:12:58 mon2 systemd[1]: Unit ceph-mon@2.service entered failed state.
Dec 08 12:12:58 mon2 systemd[1]: ceph-mon@2.service failed.
Restarting, stopping/starting, and enabling/disabling the monitor daemon did not work. The docs mention the monitor asok file in /var/run/ceph; I don't have it in that directory, yet the other monitors have their asok files right in place. I'm now in a state where I can't even stop the monitor daemon on the second monitor; it just stays in the failed state. There are no logs in /var/log/ceph for this monitor. What am I supposed to do? I don't have much experience with Ceph, so I don't want to change things without being absolutely sure, in order to avoid messing up the cluster.
Try to start the service manually on mon2 with:
/usr/bin/ceph-mon -f --cluster ceph --id 2 --setuser ceph --setgroup ceph
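If it exits again immediately, a variant that keeps the daemon in the foreground with more verbose monitor logging may show the reason (a sketch; standard Ceph daemon options):
/usr/bin/ceph-mon -d --cluster ceph --id 2 --setuser ceph --setgroup ceph --debug-mon 10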

Problems installing OpenMediaVault 5 on Raspberry Pi 1

I am trying to install OpenMediaVault 5 on a Raspberry Pi 1; however, whenever I try, I get problems with the monit package, the FastCGI Process Manager, and the openmediavault package too. Here is what happens with the monit package:
monit.service - LSB: service and resource monitoring daemon
Loaded: loaded (/etc/init.d/monit; generated)
Active: failed (Result: exit-code) since Fri 2020-05-22 13:49:54 BST; 30min ago
Docs: man:systemd-sysv-generator(8)
May 22 13:49:54 raspberrypi systemd[1]: Starting LSB: service and resource monitoring daemon...
May 22 13:49:54 raspberrypi monit[4940]: Starting daemon monitor: monitSegmentation fault
May 22 13:49:54 raspberrypi monit[4940]: failed!
May 22 13:49:54 raspberrypi systemd[1]: monit.service: Control process exited, code=exited, status=1/FAILURE
May 22 13:49:54 raspberrypi systemd[1]: monit.service: Failed with result 'exit-code'.
May 22 13:49:54 raspberrypi systemd[1]: Failed to start LSB: service and resource monitoring daemon.
Here is what happens with the FastCGI Process Manager:
php7.3-fpm.service - The PHP 7.3 FastCGI Process Manager
Loaded: loaded (/lib/systemd/system/php7.3-fpm.service; enabled; vendor preset: enabled)
Active: failed (Result: signal) since Fri 2020-05-22 14:33:21 BST; 6min ago
Docs: man:php-fpm7.3(8)
Main PID: 416 (code=killed, signal=ILL)
May 22 14:33:15 raspberrypi systemd[1]: Starting The PHP 7.3 FastCGI Process Manager...
May 22 14:33:15 raspberrypi systemd[1]: php7.3-fpm.service: Main process exited, code=killed, status=4/ILL
May 22 14:33:15 raspberrypi systemd[1]: php7.3-fpm.service: Failed with result 'signal'.
May 22 14:33:15 raspberrypi systemd[1]: Failed to start The PHP 7.3 FastCGI Process Manager.
And here is what happens with the package openmediavault:
dpkg: dependency problems prevent configuration of openmediavault:
openmediavault depends on monit; however
Package monit is not configured yet.
dpkg: error processing package openmediavault (--configure):
dependency problems - leaving unconfigured
Could anyone help with this problem?

How to activate bcm2835_wdt watchdog kernel module for raspberry pi 3?

I have been trying to activate the bcm2835_wdt watchdog module of the Raspberry Pi 3 for 6 hours, but I couldn't.
modprobe bcm2835_wdt returns no error, but the lsmod command doesn't show the bcm2835_wdt module in the list.
I have installed watchdog and chkconfig,
then:
sudo chkconfig watchdog on
When I try to start the service with
sudo /etc/init.d/watchdog start
I get an error:
[....] Starting watchdog (via systemctl): watchdog.service Job for watchdog.service failed because the control process exited with error code.
See "systemctl status watchdog.service" and "journalctl -xe" for details.
failed!
journalctl -xe returns:
-- Kernel start-up required 2093448 microseconds.
--
-- Initial RAM disk start-up required INITRD_USEC microseconds.
--
-- Userspace start-up required 5579375635 microseconds.
Jan 11 16:03:45 al sudo[935]: root : TTY=pts/1 ; PWD=/ ; USER=root ; COMMAND=/etc/init.d/watchdog start
Jan 11 16:03:45 al sudo[935]: pam_unix(sudo:session): session opened for user root by root(uid=0)
Jan 11 16:03:46 al systemd[1]: Starting watchdog daemon...
-- Subject: Unit watchdog.service has begun start-up
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit watchdog.service has begun starting up.
Jan 11 16:03:46 al sh[949]: modprobe: **FATAL: Module dcm2835_wdt not found in directory /lib/modules/4.9.59-v7+**
Jan 11 16:03:46 al systemd[1]: watchdog.service: Control process exited, code=exited status=1
Jan 11 16:03:46 al systemd[1]: Failed to start watchdog daemon.
-- Subject: Unit watchdog.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit watchdog.service has failed.
My question is: how do I enable the bcm2835_wdt watchdog kernel module for the Raspberry Pi 3?
Thank you in advance...
Maybe bcm2835_wdt has been compiled into the kernel on your system, so you don't see it with lsmod. Just try:
# cat /lib/modules/$(uname -r)/modules.builtin | grep wdt
kernel/drivers/watchdog/bcm2835_wdt.ko
If you can see it in the list, it has been compiled into the kernel. You can also check whether it has been enabled with:
journalctl --no-pager | grep -i watchdog
Regarding your watchdog configuration, see this error:
modprobe: **FATAL: Module dcm2835_wdt not found in directory /lib/modules/4.9.59-v7+**
The module is referenced as dcm2835_wdt, not bcm2835_wdt - there is a typo in your watchdog configuration.
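A sketch of the fix, assuming the Debian/Raspbian watchdog package reads the module name from /etc/default/watchdog:
# /etc/default/watchdog
watchdog_module="bcm2835_wdt"   # was misspelled as dcm2835_wdt
Then restart the service:
sudo systemctl restart watchdog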
Also, keep in mind that your watchdog may be used by systemd, so you should refer to that for using it.
If you don't mind, you may also try a fork bomb to see whether the watchdog is able to restart your system when a problem is detected:
python -c "import os, itertools; [os.fork() for i in itertools.count()]"