Motion cannot open camera - raspberry-pi

My raspberry camera v1.3 does work with libcamera, so it's properly connected. But when I try to use motion to get a livestream of my camera it doens't work.
This is the error I got:
[1:ml1] [NTC] [ALL] [Jan 16 13:39:56] motion_init: Camera 0 started: motion detection Enabled
[1:ml1] [NTC] [VID] [Jan 16 13:39:56] vid_start: Opening V4L2 device
[1:ml1] [NTC] [VID] [Jan 16 13:39:56] v4l2_device_open: Using videodevice /dev/media4 and input -1
[1:ml1] [ERR] [VID] [Jan 16 13:39:56] v4l2_device_capability: Not a V4L2 device?
[1:ml1] [ERR] [VID] [Jan 16 13:39:56] vid_start: V4L2 device failed to open
[1:ml1] [WRN] [ALL] [Jan 16 13:39:56] motion_init: Could not fetch initial image from camera
[1:ml1] [WRN] [ALL] [Jan 16 13:39:56] motion_init: Motion continues using width and height from config file(s)
[1:ml1] [NTC] [ALL] [Jan 16 13:39:56] image_ring_resize: Resizing pre_capture buffer to 1 items
[1:ml1] [NTC] [ALL] [Jan 16 13:39:56] image_ring_resize: Resizing pre_capture buffer to 4 items
[1:ml1] [NTC] [ALL] [Jan 16 13:39:56] mlp_actions: End of event 1
[1:ml1] [NTC] [ALL] [Jan 16 13:39:56] motion_loop: Thread exiting
When I execute: libcamera-still I get this result and I get a image from my camera.
pi#raspberrypi:/home/log/motion $ libcamera-still
Preview window unavailable
[0:33:30.439570748] [1786] INFO Camera camera_manager.cpp:293 libcamera v0.0.0+3866-0c55e522
[0:33:30.467622034] [1787] INFO RPI raspberrypi.cpp:1374 Registered camera /base/soc/i2c0mux/i2c#1/ov5647#36 to Unicam device /dev/media4 and ISP device /dev/media0
[0:33:30.468308986] [1786] INFO Camera camera.cpp:1035 configuring streams: (0) 1296x972-YUV420
[0:33:30.468648092] [1787] INFO RPI raspberrypi.cpp:761 Sensor: /base/soc/i2c0mux/i2c#1/ov5647#36 - Selected sensor format: 1296x972-SGBRG10_1X10 - Selected unicam format: 1296x972-pGAA
The motion config:
daemon on
# Start in Setup-Mode, daemon disabled.
setup_mode off
# File to store the process ID.
; pid_file value
# File to write logs messages into. If not defined stderr and syslog is used.
log_file /home/log/motion/motion.log
# Level of log messages [1..9] (EMG, ALR, CRT, ERR, WRN, NTC, INF, DBG, ALL).
log_level 6
# Target directory for pictures, snapshots and movies
target_dir /home/pi/motion
# Video device (e.g. /dev/video0) to be used for capturing.
videodevice /dev/media4
# Parameters to control video device. See motion_guide.html
; vid_control_params value
# The full URL of the network camera stream.
; netcam_url value
# Name of mmal camera (e.g. vc.ril.camera for pi camera).
; mmalcam_name value
# Camera control parameters (see raspivid/raspistill tool documentation)
; mmalcam_control_params value
# Image width in pixels.
width 640
# Image height in pixels.
height 480
# Maximum number of frames to be captured per second.
framerate 15
# Text to be overlayed in the lower left corner of images
text_left CAMERA1
# Text to be overlayed in the lower right corner of images.
text_right %Y-%m-%d\n%T-%q
emulate_motion off
# Threshold for number of changed pixels that triggers motion.
threshold 1500
# Noise threshold for the motion detection.
; noise_level 32
# Despeckle the image using (E/e)rode or (D/d)ilate or (l)abel.
despeckle_filter EedDl
# Number of images that must contain motion to trigger an event.
minimum_motion_frames 1
# Gap in seconds of no motion detected that triggers the end of an event.
event_gap 60
# The number of pre-captured (buffered) pictures from before motion.
pre_capture 3
# Number of frames to capture after motion is no longer detected.
post_capture 0
# Command to be executed when an event starts.
; on_event_start value
# Command to be executed when an event ends.
; on_event_end value
# Command to be executed when a movie file is closed.
; on_movie_end value
picture_output off
# File name(without extension) for pictures relative to target directory
picture_filename %Y%m%d%H%M%S-%q
webcontrol_port 8080
# Restrict webcontrol connections to the localhost.
webcontrol_localhost on
# Type of configuration options to allow via the webcontrol.
webcontrol_parms 0
# The port number for the live stream.
stream_port 8081
# Restrict stream connections to the localhost.
stream_localhost off
And when I check the status:
pi#raspberrypi:/home/log/motion $ sudo service motion status
● motion.service - Motion detection video capture daemon
Loaded: loaded (/lib/systemd/system/motion.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Mon 2023-01-16 13:51:04 GMT; 3s ago
Docs: man:motion(1)
Process: 1826 ExecStart=/usr/bin/motion (code=exited, status=0/SUCCESS)
Main PID: 1826 (code=exited, status=0/SUCCESS)
CPU: 340ms
Jan 16 13:51:02 raspberrypi systemd[1]: Started Motion detection video capture daemon.
Jan 16 13:51:03 raspberrypi motion[1826]: [0:motion] [NTC] [ALL] conf_load: Processing thread 0 - config file /etc/motion/motion.conf
Jan 16 13:51:03 raspberrypi motion[1826]: [0:motion] [NTC] [ALL] conf_load: Processing thread 0 - config file /etc/motion/motion.conf
Jan 16 13:51:03 raspberrypi motion[1826]: [0:motion] [NTC] [ALL] motion_startup: Logging to file (/home/log/motion/motion.log)
Jan 16 13:51:03 raspberrypi motion[1826]: [0:motion] [NTC] [ALL] motion_startup: Logging to file (/home/log/motion/motion.log)
Jan 16 13:51:04 raspberrypi systemd[1]: motion.service: Succeeded.

Related

failing k8s mongodb pod after some time - DBPathInUse: Unable to create/open the lock file

I'm running a kubernetes cluster (bare metal) with a mongodb (version 4, as my server cannot handle newer versions) replicaset (2 replicas), which is initially working, but from time to time (sometimes 24 hours, somtimes 10 days) one or more mongodb pods are failing.
Warning BackOff 2m9s (x43454 over 6d13h) kubelet Back-off restarting failed container
The relevant part of the logs should be
DBPathInUse: Unable to create/open the lock file: /bitnami/mongodb/data/db/mongod.lock (Read-only file system). Ensure the user executing mongod is the owner of the lock file and has the appropriate permissions. Also make sure that another mongod instance is not already running on the /bitnami/mongodb/data/db directory
But I do not change anything and initially it is working. Also the second pod is currently running (but which will fail the next days).
I'm using longhorn (before I tried nfs) for the storage and I installed mongodb using bitnami helm chart with these values:
image:
registry: docker.io
repository: bitnami/mongodb
digest: "sha256:916202d7af766dd88c2fff63bf711162c9d708ac7a3ffccd2aa812e3f03ae209" # tag: 4.4.15
pullPolicy: IfNotPresent
architecture: replicaset
replicaCount: 2
updateStrategy:
type: RollingUpdate
containerPorts:
mongodb: 27017
auth:
enabled: true
rootUser: root
rootPassword: "password"
usernames: ["user"]
passwords: ["userpass"]
databases: ["db"]
service:
portName: mongodb
ports:
mongodb: 27017
persistence:
enabled: true
accessModes:
- ReadWriteOnce
size: 8Gi
volumePermissions:
enabled: true
livenessProbe:
enabled: false
readinessProbe:
enabled: false
logs
mongodb 21:25:05.55 INFO ==> Advertised Hostname: mongodb-1.mongodb-headless.mongodb.svc.cluster.local
mongodb 21:25:05.55 INFO ==> Advertised Port: 27017
mongodb 21:25:05.56 INFO ==> Pod name doesn't match initial primary pod name, configuring node as a secondary
mongodb 21:25:05.59
mongodb 21:25:05.59 Welcome to the Bitnami mongodb container
mongodb 21:25:05.60 Subscribe to project updates by watching https://github.com/bitnami/containers
mongodb 21:25:05.60 Submit issues and feature requests at https://github.com/bitnami/containers/issues
mongodb 21:25:05.60
mongodb 21:25:05.60 INFO ==> ** Starting MongoDB setup **
mongodb 21:25:05.64 INFO ==> Validating settings in MONGODB_* env vars...
mongodb 21:25:05.78 INFO ==> Initializing MongoDB...
mongodb 21:25:05.82 INFO ==> Deploying MongoDB with persisted data...
mongodb 21:25:05.83 INFO ==> Writing keyfile for replica set authentication...
mongodb 21:25:05.88 INFO ==> ** MongoDB setup finished! **
mongodb 21:25:05.92 INFO ==> ** Starting MongoDB **
{"t":{"$date":"2022-10-29T21:25:05.961+00:00"},"s":"I", "c":"CONTROL", "id":20698, "ctx":"main","msg":"***** SERVER RESTARTED *****"}
{"t":{"$date":"2022-10-29T21:25:05.963+00:00"},"s":"I", "c":"CONTROL", "id":23285, "ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}
{"t":{"$date":"2022-10-29T21:25:05.968+00:00"},"s":"W", "c":"ASIO", "id":22601, "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2022-10-29T21:25:05.968+00:00"},"s":"I", "c":"NETWORK", "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize."}
{"t":{"$date":"2022-10-29T21:25:05.969+00:00"},"s":"W", "c":"ASIO", "id":22601, "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2022-10-29T21:25:06.011+00:00"},"s":"I", "c":"STORAGE", "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":1,"port":27017,"dbPath":"/bitnami/mongodb/data/db","architecture":"64-bit","host":"mongodb-1"}}
{"t":{"$date":"2022-10-29T21:25:06.011+00:00"},"s":"I", "c":"CONTROL", "id":23403, "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"4.4.15","gitVersion":"bc17cf2c788c5dda2801a090ea79da5ff7d5fac9","openSSLVersion":"OpenSSL 1.1.1n 15 Mar 2022","modules":[],"allocator":"tcmalloc","environment":{"distmod":"debian10","distarch":"x86_64","target_arch":"x86_64"}}}}
{"t":{"$date":"2022-10-29T21:25:06.012+00:00"},"s":"I", "c":"CONTROL", "id":51765, "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"PRETTY_NAME=\"Debian GNU/Linux 10 (buster)\"","version":"Kernel 5.15.0-48-generic"}}}
{"t":{"$date":"2022-10-29T21:25:06.012+00:00"},"s":"I", "c":"CONTROL", "id":21951, "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"config":"/opt/bitnami/mongodb/conf/mongodb.conf","net":{"bindIp":"*","ipv6":false,"port":27017,"unixDomainSocket":{"enabled":true,"pathPrefix":"/opt/bitnami/mongodb/tmp"}},"processManagement":{"fork":false,"pidFilePath":"/opt/bitnami/mongodb/tmp/mongodb.pid"},"replication":{"enableMajorityReadConcern":true,"replSetName":"rs0"},"security":{"authorization":"disabled","keyFile":"/opt/bitnami/mongodb/conf/keyfile"},"setParameter":{"enableLocalhostAuthBypass":"true"},"storage":{"dbPath":"/bitnami/mongodb/data/db","directoryPerDB":false,"journal":{"enabled":true}},"systemLog":{"destination":"file","logAppend":true,"logRotate":"reopen","path":"/opt/bitnami/mongodb/logs/mongodb.log","quiet":false,"verbosity":0}}}}
{"t":{"$date":"2022-10-29T21:25:06.013+00:00"},"s":"E", "c":"STORAGE", "id":20557, "ctx":"initandlisten","msg":"DBException in initAndListen, terminating","attr":{"error":"DBPathInUse: Unable to create/open the lock file: /bitnami/mongodb/data/db/mongod.lock (Read-only file system). Ensure the user executing mongod is the owner of the lock file and has the appropriate permissions. Also make sure that another mongod instance is not already running on the /bitnami/mongodb/data/db directory"}}
{"t":{"$date":"2022-10-29T21:25:06.013+00:00"},"s":"I", "c":"REPL", "id":4784900, "ctx":"initandlisten","msg":"Stepping down the ReplicationCoordinator for shutdown","attr":{"waitTimeMillis":10000}}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"COMMAND", "id":4784901, "ctx":"initandlisten","msg":"Shutting down the MirrorMaestro"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"SHARDING", "id":4784902, "ctx":"initandlisten","msg":"Shutting down the WaitForMajorityService"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"NETWORK", "id":20562, "ctx":"initandlisten","msg":"Shutdown: going to close listening sockets"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"NETWORK", "id":4784905, "ctx":"initandlisten","msg":"Shutting down the global connection pool"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"STORAGE", "id":4784906, "ctx":"initandlisten","msg":"Shutting down the FlowControlTicketholder"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"-", "id":20520, "ctx":"initandlisten","msg":"Stopping further Flow Control ticket acquisitions."}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"REPL", "id":4784907, "ctx":"initandlisten","msg":"Shutting down the replica set node executor"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"NETWORK", "id":4784918, "ctx":"initandlisten","msg":"Shutting down the ReplicaSetMonitor"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"SHARDING", "id":4784921, "ctx":"initandlisten","msg":"Shutting down the MigrationUtilExecutor"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"CONTROL", "id":4784925, "ctx":"initandlisten","msg":"Shutting down free monitoring"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"STORAGE", "id":4784927, "ctx":"initandlisten","msg":"Shutting down the HealthLog"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"STORAGE", "id":4784929, "ctx":"initandlisten","msg":"Acquiring the global lock for shutdown"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"-", "id":4784931, "ctx":"initandlisten","msg":"Dropping the scope cache for shutdown"}
{"t":{"$date":"2022-10-29T21:25:06.014+00:00"},"s":"I", "c":"FTDC", "id":4784926, "ctx":"initandlisten","msg":"Shutting down full-time data capture"}
{"t":{"$date":"2022-10-29T21:25:06.015+00:00"},"s":"I", "c":"CONTROL", "id":20565, "ctx":"initandlisten","msg":"Now exiting"}
{"t":{"$date":"2022-10-29T21:25:06.015+00:00"},"s":"I", "c":"CONTROL", "id":23138, "ctx":"initandlisten","msg":"Shutting down","attr":{"exitCode":100}}
Update
I checked the syslog and before the the logs Nov 14 23:07:17 k8s-worker2 kubelet[752]: E1114 23:07:17.749057 752 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"mongodb\" with CrashLoopBackOff: \"back-off 10s restarting failed container=mongodb pod=mongodb-2_mongodb(314f2776-ced4-4ba3-b90b-f927dc079770)\"" pod="mongodb/mongodb-2" podUID=314f2776-ced4-4ba3-b90b-f927dc079770
I find these logs:
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.341806] sd 2:0:0:1: [sda] tag#42 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=11s
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.341866] sd 2:0:0:1: [sda] tag#42 Sense Key : Medium Error [current]
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.341891] sd 2:0:0:1: [sda] tag#42 Add. Sense: Unrecovered read error
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.341899] sd 2:0:0:1: [sda] tag#42 CDB: Write(10) 2a 00 00 85 1f b8 00 00 40 00
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.341912] blk_update_request: critical medium error, dev sda, sector 8724408 op 0x1:(WRITE) flags 0x800 phys_seg 8 prio class 0
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.352012] Aborting journal on device sda-8.
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.354980] EXT4-fs error (device sda) in ext4_reserve_inode_write:5726: Journal has aborted
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.355103] sd 2:0:0:1: [sda] tag#40 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=15s
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.357056] sd 2:0:0:1: [sda] tag#40 Sense Key : Medium Error [current]
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.357061] sd 2:0:0:1: [sda] tag#40 Add. Sense: Unrecovered read error
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.357066] sd 2:0:0:1: [sda] tag#40 CDB: Write(10) 2a 00 00 44 14 88 00 00 10 00
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.357068] blk_update_request: critical medium error, dev sda, sector 4461704 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.357088] EXT4-fs error (device sda): ext4_dirty_inode:5922: inode #131080: comm mongod: mark_inode_dirty error
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.359566] EXT4-fs warning (device sda): ext4_end_bio:344: I/O error 7 writing to inode 131081 starting block 557715)
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.361432] EXT4-fs error (device sda) in ext4_dirty_inode:5923: Journal has aborted
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.362792] Buffer I/O error on device sda, logical block 557713
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.364010] Buffer I/O error on device sda, logical block 557714
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.365222] sd 2:0:0:1: [sda] tag#43 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=8s
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.365228] sd 2:0:0:1: [sda] tag#43 Sense Key : Medium Error [current]
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.365230] sd 2:0:0:1: [sda] tag#43 Add. Sense: Unrecovered read error
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.365233] sd 2:0:0:1: [sda] tag#43 CDB: Write(10) 2a 00 00 44 28 38 00 00 08 00
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.365234] blk_update_request: critical medium error, dev sda, sector 4466744 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.367434] EXT4-fs warning (device sda): ext4_end_bio:344: I/O error 7 writing to inode 131083 starting block 558344)
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.367442] Buffer I/O error on device sda, logical block 558343
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.368593] sd 2:0:0:1: [sda] tag#41 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=15s
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.368597] sd 2:0:0:1: [sda] tag#41 Sense Key : Medium Error [current]
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.368599] sd 2:0:0:1: [sda] tag#41 Add. Sense: Unrecovered read error
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.368602] sd 2:0:0:1: [sda] tag#41 CDB: Write(10) 2a 00 00 44 90 70 00 00 10 00
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.368604] blk_update_request: critical medium error, dev sda, sector 4493424 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.370907] EXT4-fs warning (device sda): ext4_end_bio:344: I/O error 7 writing to inode 131081 starting block 561680)
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.370946] sd 2:0:0:1: [sda] tag#39 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=15s
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.370949] sd 2:0:0:1: [sda] tag#39 Sense Key : Medium Error [current]
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.370952] sd 2:0:0:1: [sda] tag#39 Add. Sense: Unrecovered read error
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.370949] EXT4-fs error (device sda): ext4_journal_check_start:83: comm kworker/u4:0: Detected aborted journal
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.370954] sd 2:0:0:1: [sda] tag#39 CDB: Write(10) 2a 00 00 10 41 98 00 00 08 00
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.372081] blk_update_request: critical medium error, dev sda, sector 1065368 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.374353] EXT4-fs warning (device sda): ext4_end_bio:344: I/O error 7 writing to inode 131080 starting block 133172)
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.374396] Buffer I/O error on device sda, logical block 133171
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.388492] EXT4-fs error (device sda) in __ext4_new_inode:1136: Journal has aborted
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.390763] EXT4-fs error (device sda) in ext4_create:2786: Journal has aborted
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.391732] sd 2:0:0:1: [sda] tag#46 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.392941] sd 2:0:0:1: [sda] tag#46 Sense Key : Medium Error [current]
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.392944] sd 2:0:0:1: [sda] tag#46 Add. Sense: Unrecovered read error
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.392948] sd 2:0:0:1: [sda] tag#46 CDB: Write(10) 2a 08 00 00 00 00 00 00 08 00
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.392950] blk_update_request: critical medium error, dev sda, sector 0 op 0x1:(WRITE) flags 0x23800 phys_seg 1 prio class 0
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.395562] Buffer I/O error on dev sda, logical block 0, lost sync page write
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.396945] sd 2:0:0:1: [sda] tag#45 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.396953] sd 2:0:0:1: [sda] tag#45 Sense Key : Medium Error [current]
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.396955] sd 2:0:0:1: [sda] tag#45 Add. Sense: Unrecovered read error
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.396958] sd 2:0:0:1: [sda] tag#45 CDB: Write(10) 2a 08 00 84 00 00 00 00 08 00
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.396959] blk_update_request: critical medium error, dev sda, sector 8650752 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 0
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.396930] EXT4-fs (sda): I/O error while writing superblock
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.399771] Buffer I/O error on dev sda, logical block 1081344, lost sync page write
Nov 14 23:06:59 k8s-worker2 kernel: [3413829.403897] JBD2: Error -5 detected when updating journal superblock for sda-8.
Nov 14 23:07:01 k8s-worker2 systemd[1]: run-docker-runtime\x2drunc-moby-d1c0f0dc3e024723707edfc12e023b98fb98f1be971177ecca5ac0cfdc91ab87-runc.w3zzIL.mount: Deactivated successfully.
Nov 14 23:07:05 k8s-worker2 kubelet[752]: E1114 23:07:05.415798 752 dns.go:157] "Nameserver limits exceeded" err="Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 46.38.252.230 46.38.225.230 2a03:4000:0:1::e1e6"
Nov 14 23:07:06 k8s-worker2 kubelet[752]: E1114 23:07:06.412219 752 dns.go:157] "Nameserver limits exceeded" err="Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 46.38.252.230 46.38.225.230 2a03:4000:0:1::e1e6"
Nov 14 23:07:06 k8s-worker2 systemd[1]: run-docker-runtime\x2drunc-moby-d1c0f0dc3e024723707edfc12e023b98fb98f1be971177ecca5ac0cfdc91ab87-runc.nK23K3.mount: Deactivated successfully.
Nov 14 23:07:11 k8s-worker2 systemd[1]: run-docker-runtime\x2drunc-moby-d1c0f0dc3e024723707edfc12e023b98fb98f1be971177ecca5ac0cfdc91ab87-runc.L5TkRU.mount: Deactivated successfully.
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.411831] sd 2:0:0:1: [sda] tag#44 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=15s
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.411888] sd 2:0:0:1: [sda] tag#44 Sense Key : Medium Error [current]
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.411898] sd 2:0:0:1: [sda] tag#44 Add. Sense: Unrecovered read error
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.411952] sd 2:0:0:1: [sda] tag#44 CDB: Write(10) 2a 00 00 44 28 40 00 00 50 00
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.411965] blk_update_request: critical medium error, dev sda, sector 4466752 op 0x1:(WRITE) flags 0x0 phys_seg 10 prio class 0
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.419273] EXT4-fs warning (device sda): ext4_end_bio:344: I/O error 7 writing to inode 131083 starting block 558354)
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.430398] sd 2:0:0:1: [sda] tag#47 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=15s
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.430407] sd 2:0:0:1: [sda] tag#47 Sense Key : Medium Error [current]
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.430409] sd 2:0:0:1: [sda] tag#47 Add. Sense: Unrecovered read error
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.430412] sd 2:0:0:1: [sda] tag#47 CDB: Write(10) 2a 08 00 00 00 00 00 00 08 00
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.430415] blk_update_request: critical medium error, dev sda, sector 0 op 0x1:(WRITE) flags 0x23800 phys_seg 1 prio class 0
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.433686] Buffer I/O error on dev sda, logical block 0, lost sync page write
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.436088] EXT4-fs (sda): I/O error while writing superblock
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.444291] sd 2:0:0:1: [sda] tag#32 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=14s
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.444300] sd 2:0:0:1: [sda] tag#32 Sense Key : Medium Error [current]
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.444304] sd 2:0:0:1: [sda] tag#32 Add. Sense: Unrecovered read error
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.444308] sd 2:0:0:1: [sda] tag#32 CDB: Write(10) 2a 00 00 41 01 18 00 00 08 00
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.444313] blk_update_request: critical medium error, dev sda, sector 4260120 op 0x1:(WRITE) flags 0x3000 phys_seg 1 prio class 0
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.449491] Buffer I/O error on dev sda, logical block 532515, lost async page write
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.453591] sd 2:0:0:1: [sda] tag#33 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.453600] sd 2:0:0:1: [sda] tag#33 Sense Key : Medium Error [current]
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.453603] sd 2:0:0:1: [sda] tag#33 Add. Sense: Unrecovered read error
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.453607] sd 2:0:0:1: [sda] tag#33 CDB: Write(10) 2a 08 00 00 00 00 00 00 08 00
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.453610] blk_update_request: critical medium error, dev sda, sector 0 op 0x1:(WRITE) flags 0x23800 phys_seg 1 prio class 0
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.459072] Buffer I/O error on dev sda, logical block 0, lost sync page write
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.461189] EXT4-fs (sda): I/O error while writing superblock
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.464347] EXT4-fs (sda): Remounting filesystem read-only
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.466527] EXT4-fs (sda): failed to convert unwritten extents to written extents -- potential data loss! (inode 131081, error -30)
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.470833] Buffer I/O error on device sda, logical block 561678
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.473548] Buffer I/O error on device sda, logical block 561679
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.477384] EXT4-fs (sda): failed to convert unwritten extents to written extents -- potential data loss! (inode 131083, error -30)
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.482014] Buffer I/O error on device sda, logical block 558344
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.484881] Buffer I/O error on device sda, logical block 558345
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.487224] Buffer I/O error on device sda, logical block 558346
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.488837] Buffer I/O error on device sda, logical block 558347
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.490543] Buffer I/O error on device sda, logical block 558348
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.492061] Buffer I/O error on device sda, logical block 558349
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.493494] Buffer I/O error on device sda, logical block 558350
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.494931] Buffer I/O error on device sda, logical block 558351
Not sure, if this is really related to the problem.
Generally when you see this error message:
"error":"DBPathInUse: Unable to create/open the lock file: /bitnami/mongodb/data/db/mongod.lock (Read-only file system)
It most probably mean that your mongodb pod did not shutted down gracefully and had no time to remove the mongod.lock file so when your pod was re-created in another k8s node the "new" mongod process cannot start because it is finding the previous mongod.lock file.
The easiest way to resolve the current availability issue is to scale up and add immediately one more replicaSet member so the new member to init-sync from the available good member:
helm upgrade mongodb bitnami/mongodb \
--set architecture=replicaset \
--set auth.replicaSetKey=myreplicasetkey \
--set auth.rootPassword=myrootpassword \
--set replicaCount=3
and elect again primary.
You can check if mongoDB replicaSet elected PRIMARY from mongo shell inside the pod with the command:
rs.status()
For affected pod with the issue you can do as follow:
You can plan maitenance window and scale down ( scaling down stateFullset do not expect to automatically delete the pvc/pv , but good to make backup just in case.
After you scale down you can start custom helper pod to mount the pv so you can remove the mongod.lock file:
Temporary pod that you will start to mount the affected dbPath and remove the mongodb.lock file:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: mongo-pvc-helper
spec:
securityContext:
runAsUser: 0
containers:
- command:
- sh
- -c
- while true ; do echo alive ; sleep 10 ; done
image: busybox
imagePullPolicy: Always
name: mongo-pvc-helper
resources: {}
securityContext:
capabilities:
drop:
- ALL
volumeMounts:
- mountPath: /mongodata
name: mongodata
volumes:
- name: mongodata
persistentVolumeClaim:
claimName: <your_faulty_pod_pvc_name>
EOF
After you start the pod you can do:
kubectl exec mongo-pvc-helper -it sh
$ chown -R 0:0 /mongodata
$ rm /mongodata/mongod.lock
$ exit
Or you can complete wipe up the entire pv(if you prefer safely to init-sync entirely this member):
rm -rf /mongodata/*
And terminate the pod so you can finish the process:
kubectl delete pod mongo-pvc-helper
And again scale-up:
helm upgrade mongodb bitnami/mongodb \
--set architecture=replicaset \
--set auth.replicaSetKey=myreplicasetkey \
--set auth.rootPassword=myrootpassword \
--set replicaCount=2
Btw, good to have at least 3x data members in replicaSet for better redundancy to allow during single member down event election to keep still the PRIMARY up and running...
How to troubleshoot this further:
Ensure your pods have the terminationGracePeriod set (at least 10-20 sec) so it allow some time for the mongod process to flush data to storage and remove the mongod.lock file.
Depending from pod memory limits/requests , you can set some safer value for storage.wiredTiger.engineConfig.cacheSizeGB (if not set it is allocating ~50% from memory ).
Check the kubelet logs from node where pod was killed there maybe more details why pod was killed.
I think #R2D2's extensive answer makes some good points about how to recover from the situation. I very much agree with their recommendation to use 3 data bearing nodes which aligns with fault tolerance considerations. With the additional logs you were able to add, I am arriving at the same conclusion that your storage subsystem is the problem here which is going to be the actual cause of your MongoDB failing.
In your initial query the following log line was specifically highlighted:
DBPathInUse: Unable to create/open the lock file: /bitnami/mongodb/data/db/mongod.lock (Read-only file system). Ensure the user executing mongod is the owner of the lock file and has the appropriate permissions. Also make sure that another mongod instance is not already running on the /bitnami/mongodb/data/db directory
Specifically: (Read-only file system). Now in the new logs you have provided the host itself is reporting:
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.459072] Buffer I/O error on dev sda, logical block 0, lost sync page write
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.461189] EXT4-fs (sda): I/O error while writing superblock
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.464347] EXT4-fs (sda): Remounting filesystem read-only
Nov 14 23:07:14 k8s-worker2 kernel: [3413844.466527] EXT4-fs (sda): failed to convert unwritten extents to written extents -- potential data loss! (inode 131081, error -30)
Specifically Remounting filesystem to read-only. If mongod is using any of these mount points for its operation then we would expect the system to no longer be able to function properly if it can no longer write to them. The database process itself may terminate, which is something the storage node watchdog could be configured to do (in subsequent versions).
In any case, the issues with the storage look quite serious, they include text like this: failed to convert unwritten extents to written extents -- potential data loss! It seems imperative that you look into this further and resolve any issues as soon as possible.
Relatedly, you mentioned:
I'm using longhorn (before I tried nfs) for the storage
The logs also suggest EXT4-fs is at play here. I think all of these have been known to have issues or otherwise be suboptimal for usage with MongoDB. From their documentation:
With the WiredTiger storage engine, using XFS is strongly recommended for data bearing nodes to avoid performance issues that may occur when using EXT4 with WiredTiger.
From elsewhere on the same page (emphasis added):
With the WiredTiger storage engine, WiredTiger objects may be stored on remote file systems if the remote file system conforms to ISO/IEC 9945-1:1996 (POSIX.1). Because remote file systems are often slower than local file systems, using a remote file system for storage may degrade performance.
I don't have any personal experience with Longhorn, but you can see an example here where instability with that storage system caused the same DBPathInUse error that you observed. There are other reports of people having nothing but problems with storage constantly detaching itself.
In short - instability with the storage subsystem is what is both causing the mongod process/pod to fail as well as preventing it from recovering. The problem is compounded by the fact that you only have 2 members in the replica set which provides no fault tolerance. Once you lose one member the other one will not be able to operate as a PRIMARY since there is no majority. Increasing the replica set to 3 members will at least provide fault tolerance of 1 node. The storage issues are a separate problem that should be pursued further via another question focused more on how that component is configured in your environment.
Some time ago I had something like that. That is always sad experience.
According to answer done by #R2D2. When you see (Read-only file system) in your logs - it can mean many things all not good. For instance when Linux starts file system is read-only, when everything is OK it is switched to read-write. That is not your case - so - just an example.
Please see that file system was marked as read-only due to io-errors. Looks like hard drive is corrupted. Check system, on which Kubernetes is running - fsck for Linux - like described here.
When drive is fixed restart Kubernetes - some data is lost, count on mongo complaining about data integrity... Nothing more than mongod --repair comes to my mind. Aaaand it can be that lock file should also be deleted before repair, but it should complain about it - like - "there is another instance", or "I can't set lock - file exists".
Besides that - use SMART monitoring, also mentioned later at the page.
Newer, faster, bigger drives are also more fragile. That is the price.
If you have backup... Yes I know - I've mentioned about my case - since then I have backup... Good luck!

scanbd unable to detect scanner

I am trying to setup a scan station using my Canon CanoScan LiDE 110 connected to a pi zero w running DietPi 8.1.2.
Initially I had saned running and it worked fine.
scanimage -L listed my scanner and I was able to scan from a networked system using xsane.
I decided to try to use scanbd to make scanning easier with the buttons on the scanner.
following docs here and here
I installed the scanbd package,
copied the original config to /etc/scanbd/sane.d
root#cscan:/etc/scanbd/sane.d# egrep -v '^#|^$' {dll,net}.conf
dll.conf:genesys
Created a symlink for good measure
root#cscan:/var/log# ls -la /usr/local/etc/scanbd
lrwxrwxrwx 1 root root 12 Feb 7 08:53 /usr/local/etc/scanbd -> /etc/scanbd/
modified the existing saned config to point to net only and localhost, so that it wouldn't try to connect directly to the scanner,
root#cscan:/etc/sane.d# egrep -v '^#|^$' {net,dll}.conf
net.conf: connect_timeout = 3
net.conf: localhost
dll.conf:net
enable and start systemd stuff
root#cscan:/var/log# systemctl status {scanbd,scanbm.socket}
? scanbd.service - Scanner button polling Service
Loaded: loaded (/lib/systemd/system/scanbd.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2022-02-07 08:53:41 EST; 36min ago
Main PID: 2438 (scanbd)
Tasks: 5 (limit: 991)
CGroup: /system.slice/scanbd.service
??2438 /usr/sbin/scanbd -f -c /etc/scanbd/scanbd.conf
Feb 07 08:53:41 cscan systemd[1]: Started Scanner button polling Service.
Feb 07 08:53:41 cscan scanbd[2438]: /usr/sbin/scanbd: dbus match type='signal',interface='or
g.freedesktop.Hal.Manager'
Feb 07 08:57:59 cscan scanbd[2438]: /usr/sbin/scanbd: trigger action for scan for device gen
esys:libusb:001:003 with script test.script
Feb 07 08:58:00 cscan scanbd[2475]: /usr/share/scanbd/scripts/test.script: Begin of scan for
device genesys:libusb:001:003
Feb 07 08:58:00 cscan scanbd[2478]: /usr/share/scanbd/scripts/test.script: End of scan for
device genesys:libusb:001:003
? scanbm.socket - scanbd/saned incoming socket
Loaded: loaded (/lib/systemd/system/scanbm.socket; enabled; vendor preset: enabled)
Active: active (listening) since Mon 2022-02-07 08:53:49 EST; 36min ago
Listen: [::]:6566 (Stream)
Accepted: 5; Connected: 0;
Tasks: 0 (limit: 991)
CGroup: /system.slice/scanbm.socket
Feb 07 08:53:49 cscan systemd[1]: Listening on scanbd/saned incoming socket.
It appears that the scan button is launching the test.script as expected (see above)
but scanbd -f doesn't seem to connect to my scanner.
root#cscan:/etc/scanbd# scanbd -d7 -f
scanbd: foreground
scanbd: reading config file /etc/scanbd/scanbd.conf
scanbd: debug on: level: 7
scanbd: dropping privs to uid saned
scanbd: dropping privs to gid scanner
scanbd: group scanner has member:
scanbd: saned
scanbd: drop privileges to gid: 115
scanbd: Running as effective gid 115
scanbd: drop privileges to uid: 108
scanbd: Running as effective uid 108
scanbd: dbus_init
scanbd: dbus match type='signal',interface='org.freedesktop.Hal.Manager'
scanbd: SANE_CONFIG_DIR not set
scanbd: sane version 1.0
scanbd: Scanning for local-only devices
scanbd: start_sane_threads
scanbd: no devices, not starting any polling thread
scanbd: start dbus thread
scanbd: Not Primary Owner (2)
scanbd: udev init
scanbd: get udev monitor
scanbd: udev fd is non-blocking, now setting to blocking mode
scanbd: start udev thread
scanbd: timeout: 500 ms
scanbd: Iteration on dbus call
scanbd: udev thread started
scanbd: Iteration on dbus call
SANE_CONFIG_DIR variable appears to be defined in scanbd.conf
root#cscan:/etc/scanbd# grep -r SANE_CONFIG_DIR
scanbd.conf: saned_env = { "SANE_CONFIG_DIR=/etc/scanbd" } # list of environment vars for saned
And, scanimage -L does work locally, but not remotely
root#cscan:~# scanimage -L
device `net:localhost:genesys:libusb:001:002' is a Canon LiDE 110 flatbed scanner
user#desktop:/etc/sane.d $ scanimage -L
No scanners were identified. If you were expecting something different,
check that the scanner is plugged in, turned on and detected by the
sane-find-scanner tool (if appropriate). Please read the documentation
which came with this software (README, FAQ, manpages).
At this point I have run out of ideas. Is there anyone out there who can help me out with this? My ultimate goal is to be able to network scan via saned using xsane and saneTwain, and use the front panel buttons to do script magic.

WAL contains references to invalid pages

centos 6.7
postgresql 9.5.3
I've DB servers that are on master-standby replication.
Suddenly, standby server's postgresql process was stopped with this logs.
2016-07-14 18:14:19.544 JST [][5783e03b.3cdb][0][15579]WARNING: page 1671400 of relation base/16400/559613 is uninitialized
2016-07-14 18:14:19.544 JST [][5783e03b.3cdb][0][15579]CONTEXT: xlog redo Heap2/VISIBLE: cutoff xid 1902107520
2016-07-14 18:14:19.544 JST [][5783e03b.3cdb][0][15579]PANIC: WAL contains references to invalid pages
2016-07-14 18:14:19.544 JST [][5783e03b.3cdb][0][15579]CONTEXT: xlog redo Heap2/VISIBLE: cutoff xid 1902107520
2016-07-14 18:14:21.026 JST [][5783e038.3cd9][0][15577]LOG: startup process (PID 15579) was terminated by signal 6: Aborted
2016-07-14 18:14:21.026 JST [][5783e038.3cd9][0][15577]LOG: terminating any other active server processes
And, master server's postgresql logs were nothing special.
But, master server's /var/log/messages was listed as below.
Jul 14 05:38:44 host kernel: sbridge: HANDLING MCE MEMORY ERROR
Jul 14 05:38:44 host kernel: CPU 8: Machine Check Exception: 0 Bank 9: 8c000040000800c0
Jul 14 05:38:44 host kernel: TSC 0 ADDR 1f7dad7000 MISC 90004000400008c PROCESSOR 0:306e4 TIME 1468442324 SOCKET 1 APIC 20
Jul 14 05:38:44 host kernel: EDAC MC1: CE row 1, channel 0, label "CPU_SrcID#1_Channel#0_DIMM#1": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=8 Err=0008:00c0 (ch=0), addr = 0x1f7dad7000 => socket=1, Channel=0(mask=1), rank=4
Jul 14 05:38:44 host kernel:
Jul 14 18:30:40 host kernel: sbridge: HANDLING MCE MEMORY ERROR
Jul 14 18:30:40 host kernel: CPU 8: Machine Check Exception: 0 Bank 9: 8c000040000800c0
Jul 14 18:30:40 host kernel: TSC 0 ADDR 1f7dad7000 MISC 90004000400008c PROCESSOR 0:306e4 TIME 1468488640 SOCKET 1 APIC 20
Jul 14 18:30:41 host kernel: EDAC MC1: CE row 1, channel 0, label "CPU_SrcID#1_Channel#0_DIMM#1": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=8 Err=0008:00c0 (ch=0), addr = 0x1f7dad7000 => socket=1, Channel=0(mask=1), rank=4
Jul 14 18:30:41 host kernel:
The memory error's started at 1 week ago. So, I doubt the memory error causes postgresql's error.
My question is here.
1) Can memory error of kernel cause postgresql's "WAL contains references to invalid pages" error?
2) Why there is not any logs at master server's postgresql?
thx.
Faulty memory can cause all kinds of data corruption, so that seems like a good enough explanation to me.
Perhaps there are no log entries at the master PostgreSQL server because all that was corrupted was the WAL stream.
You can run
oid2name
to find out which database has OID 16400 and then
oid2name -d <database with OID 16400> -f 559613
to find out which table belongs to file 559613.
Is that table larger than 12 GB? If not, that would mean that page 1671400 is indeed an invalid value.
You didn't tell which PostgreSQL version you are using, but maybe there are replication bugs fixed in later versions that could cause replication problems even without a hardware bug present; read the release notes.
I would perform a new pg_basebackup and reinitialize the slave system.
But what I'd really be worried about is possible data corruption on the master server. Block checksums are cool (turned on if pg_controldata <data directory> | grep checksum gives you 1), but possibly won't detect the effects of memory corruption.
Try something like
pg_dumpall -f /dev/null
on the master and see if there are errors.
Keep your old backups in case you need to repair something!

Google compute engine boot fails after hardware upgrade to Hashwell

I have a f1-micro instance running on Google Compute Engine running CentOS.
Some time ago I got a mail stating that the zone I am using (us-central1-b) is going through a hardware upgrade to new Hashwell servers, and all instances need to be restarted. So, I did restart. I shut down my services, and performed a restart via the developer dev. console.
But the instance never came up - the boot on the new hardware failed, and now I can not log in to the instance with ssh. I will add the serial console content below this.
My main issue is that there are some content I would really like to get my hands on (nope - didn't do a full backup - shoot!).
How do I restore the instance, when I can not log in using ssh?
Changing serial settings was 0/0 now 3/0
Start bios (version 1.7.2-20150226_170051-google)
Unable to unlock ram - bridge not found
Ram Size=0x26600000 (0x0000000000000000 high)
Relocating low data from 0x000e5810 to 0x000ef780 (size 2161)
Relocating init from 0x000e6081 to 0x265d3540 (size 51612)
CPU Mhz=2301
=== PCI bus & bridge init ===
PCI: pci_bios_init_bus_rec bus = 0x0
=== PCI device probing ===
Found 4 PCI devices (max PCI bus is 00)
=== PCI new allocation pass #1 ===
PCI: check devices
=== PCI new allocation pass #2 ===
PCI: map device bdf=00:03.0 bar 0, addr 0000c000, size 00000040 [io]
PCI: map device bdf=00:04.0 bar 0, addr 0000c040, size 00000040 [io]
PCI: map device bdf=00:03.0 bar 1, addr febfe000, size 00001000 [mem]
PCI: map device bdf=00:04.0 bar 1, addr febff000, size 00001000 [mem]
PCI: init bdf=00:01.0 id=8086:7110
PIIX3/PIIX4 init: elcr=00 0c
PCI: init bdf=00:01.3 id=8086:7113
Using pmtimer, ioport 0xb008, freq 3579 kHz
PCI: init bdf=00:03.0 id=1af4:1004
PCI: init bdf=00:04.0 id=1af4:1000
Found 1 cpu(s) max supported 1 cpu(s)
MP table addr=0x000fdaf0 MPC table addr=0x000fdb00 size=240
SMBIOS ptr=0x000fdad0 table=0x000fd990 size=314
Memory hotplug not enabled. [MHPE=0xffffffff]
ACPI DSDT=0x265fe070
ACPI tables: RSDP=0x000fd960 RSDT=0x265fe030
Scan for VGA option rom
Machine UUID 2b33f2bc-6042-1f12-c61f-0edd33ff965d
Found 0 serial ports
found virtio-scsi at 0:3
Searching bootorder for: /pci#i0cf8/*#3/*#0/*#0,0
Searching bootorder for: /pci#i0cf8/*#3/*#0/*#1,0
virtio-scsi vendor='Google' product='PersistentDisk' rev='1' type=0 removable=0
virtio-scsi blksize=512 sectors=20971520
Searching bootorder for: /pci#i0cf8/*#3/*#0/*#2,0
Searching bootorder for: /pci#i0cf8/*#3/*#0/*#3,0
Searching bootorder for: /pci#i0cf8/*#3/*#0/*#4,0
....
....
....
Searching bootorder for: /pci#i0cf8/*#3/*#0/*#248,0
Searching bootorder for: /pci#i0cf8/*#3/*#0/*#249,0
Searching bootorder for: /pci#i0cf8/*#3/*#0/*#250,0
Searching bootorder for: /pci#i0cf8/*#3/*#0/*#251,0
Searching bootorder for: /pci#i0cf8/*#3/*#0/*#252,0
Searching bootorder for: /pci#i0cf8/*#3/*#0/*#253,0
KBD: int09 handler: AL=0
PS2 keyboard initialized
All threads complete.
Scan for option roms
Searching bootorder for: HALT
drive 0x000fd920: PCHS=0/0/0 translation=lba LCHS=1024/255/63 s=20971520
Space available for UMB: 000c0000-000eb800
Returned 122880 bytes of ZoneHigh
e820 map has 6 items:
0: 0000000000000000 - 000000000009fc00 = 1 RAM
1: 000000000009fc00 - 00000000000a0000 = 2 RESERVED
2: 00000000000f0000 - 0000000000100000 = 2 RESERVED
3: 0000000000100000 - 00000000265fe000 = 1 RAM
4: 00000000265fe000 - 0000000026600000 = 2 RESERVED
5: 00000000fffbc000 - 0000000100000000 = 2 RESERVED
Unable to lock ram - bridge not found
KBD: int09 handler: AL=0
Changing serial settings was 3/2 now 3/0
enter handle_19:
NULL
Booting from Hard Disk 0...
Booting from 0000:7c00
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-431.17.1.el6.x86_64 (mockbuild#c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Wed May 7 23:32:49 UTC 2014
Command line: ro root=UUID=a8cf6ab7-92fb-42c6-b95f-d437f94aaf98 rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD KEYTABLE=us SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_NO_LVM console=ttyS0,38400n8 rd_NO_DM
KERNEL supported cpus:
Intel GenuineIntel
AMD AuthenticAMD
Centaur CentaurHauls
Disabled fast string operations
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000265fe000 (usable)
BIOS-e820: 00000000265fe000 - 0000000026600000 (reserved)
BIOS-e820: 00000000fffbc000 - 0000000100000000 (reserved)
DMI 2.4 present.
SMBIOS version 2.4 # 0xFDAD0
Hypervisor detected: KVM
last_pfn = 0x265fe max_arch_pfn = 0x400000000
x86 PAT enabled: cpu 0, old 0x70106, new 0x7010600070106
Using GB pages for direct mapping
init_memory_mapping: 0000000000000000-00000000265fe000
RAMDISK: 25f82000 - 265ed02e
ACPI: RSDP 00000000000fd960 00014 (v00 Google)
ACPI: RSDT 00000000265fe030 00034 (v01 Google GOOGRSDT 00000001 GOOG 00000001)
ACPI: FACP 00000000265fff00 000F4 (v02 Google GOOGFACP 00000001 GOOG 00000001)
ACPI: DSDT 00000000265fe070 015B6 (v01 GOOG GODSDT 00000001 INTL 20140214)
ACPI: FACS 00000000265ffec0 00040
ACPI: SSDT 00000000265ff780 0073D (v01 Google GOOGSSDT 00000001 GOOG 00000001)
ACPI: APIC 00000000265ff660 0006E (v01 Google GOOGAPIC 00000001 GOOG 00000001)
ACPI: WAET 00000000265ff630 00028 (v01 Google GOOGWAET 00000001 GOOG 00000001)
Setting APIC routing to flat.
No NUMA configuration found
Faking a node at 0000000000000000-00000000265fe000
Bootmem setup node 0 0000000000000000-00000000265fe000
NODE_DATA [0000000000009000 - 000000000003cfff]
bootmap [000000000003d000 - 0000000000041cbf] pages 5
(7 early reservations) ==> bootmem [0000000000 - 00265fe000]
#0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
#1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
#2 [0001000000 - 0002020aa4] TEXT DATA BSS ==> [0001000000 - 0002020aa4]
#3 [0025f82000 - 00265ed02e] RAMDISK ==> [0025f82000 - 00265ed02e]
#4 [000009fc00 - 0000100000] BIOS reserved ==> [000009fc00 - 0000100000]
#5 [0002021000 - 00020210a5] BRK ==> [0002021000 - 00020210a5]
#6 [0000008000 - 0000009000] PGTABLE ==> [0000008000 - 0000009000]
found SMP MP-table at [ffff8800000fdaf0] fdaf0
crashkernel=auto resulted in zero bytes of reserved memory.
kvm-clock: Using msrs 4b564d01 and 4b564d00
kvm-clock: cpu 0, msr 0:1c247c1, boot clock
Zone PFN ranges:
DMA 0x00000001 -> 0x00001000
DMA32 0x00001000 -> 0x00100000
Normal 0x00100000 -> 0x00100000
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
0: 0x00000001 -> 0x0000009f
0: 0x00000100 -> 0x000265fe
ACPI: PM-Timer IO Port: 0xb008
Setting APIC routing to flat.
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
Using ACPI (MADT) for SMP configuration information
SMP: Allowing 1 CPUs, 0 hotplug CPUs
PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
Allocating PCI resources starting at 26600000 (gap: 26600000:d99bc000)
Booting paravirtualized kernel on KVM
NR_CPUS:4096 nr_cpumask_bits:1 nr_cpu_ids:1 nr_node_ids:1
PERCPU: Embedded 31 pages/cpu #ffff880002200000 s94872 r8192 d23912 u2097152
pcpu-alloc: s94872 r8192 d23912 u2097152 alloc=1*2097152
pcpu-alloc: [0] 0
kvm-clock: cpu 0, msr 0:22167c1, primary cpu clock
Built 1 zonelists in Node order, mobility grouping on. Total pages: 154835
Policy zone: DMA32
Kernel command line: ro root=UUID=a8cf6ab7-92fb-42c6-b95f-d437f94aaf98 rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD KEYTABLE=us SYSFONT=latarcyrheb-sun16 rd_NO_LVM console=ttyS0,38400n8 rd_NO_DM
PID hash table entries: 4096 (order: 3, 32768 bytes)
xsave/xrstor: enabled xstate_bv 0x7, cntxt size 0x340
Checking aperture...
No AGP bridge found
Memory: 594376k/628728k available (5326k kernel code, 392k absent, 33960k reserved, 7013k data, 1280k init)
Hierarchical RCU implementation.
NR_IRQS:33024 nr_irqs:256
Console: colour *CGA 80x25
console [ttyS0] enabled
allocated 2621440 bytes of page_cgroup
please try 'cgroup_disable=memory' option if you don't want memory cgroups
Detected 2299.982 MHz processor.
Calibrating delay loop (skipped) preset value.. 4599.96 BogoMIPS (lpj=2299982)
pid_max: default: 32768 minimum: 301
Security Framework initialized
SELinux: Initializing.
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
Mount-cache hash table entries: 256
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
Initializing cgroup subsys net_cls
Initializing cgroup subsys blkio
Initializing cgroup subsys perf_event
Initializing cgroup subsys net_prio
Disabled fast string operations
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
mce: CPU supports 32 MCE banks
alternatives: switching to unfair spinlock
SMP alternatives: switching to UP code
Freeing SMP alternatives: 36k freed
ACPI: Core revision 20090903
ftrace: converting mcount calls to 0f 1f 44 00 00
ftrace: allocating 21778 entries in 86 pages
Enabling x2apic
Enabled x2apic
APIC routing finalized to physical x2apic.
..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1
CPU0: Intel(R) Xeon(R) CPU # 2.30GHz stepping 00
Performance Events: unsupported p6 CPU model 63 no PMU driver, software events only.
NMI watchdog disabled (cpu0): hardware events not enabled
Brought up 1 CPUs
Total of 1 processors activated (4599.96 BogoMIPS).
devtmpfs: initialized
regulator: core version 0.5
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1 for base access
bio: create slab <bio-0> at 0
ACPI: Interpreter enabled
ACPI: (supports S0 S3 S4 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: No dock devices found.
PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7]
pci_root PNP0A03:00: host bridge window [io 0x0d00-0xffff]
pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
pci_root PNP0A03:00: host bridge window [mem 0x26600000-0xfebfffff]
PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7]
pci_bus 0000:00: root bus resource [io 0x0d00-0xffff]
pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
pci_bus 0000:00: root bus resource [mem 0x26600000-0xfebfffff]
pci 0000:00:01.3: quirk: [io 0xb000-0xb03f] claimed by PIIX4 ACPI
ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKS] (IRQs *9)
vgaarb: loaded
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
NetLabel: Initializing
NetLabel: domain hash size = 128
NetLabel: protocols = UNLABELED CIPSOv4
NetLabel: unlabeled traffic allowed by default
Switching to clocksource kvm-clock
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 8 devices
ACPI: ACPI bus type pnp unregistered
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 6, 262144 bytes)
TCP established hash table entries: 131072 (order: 9, 2097152 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
NET: Registered protocol family 1
Trying to unpack rootfs image as initramfs...
Freeing initrd memory: 6572k freed
audit: initializing netlink socket (disabled)
type=2000 audit(1430203988.308:1): initialized
HugeTLB registered 2 MB page size, pre-allocated 0 pages
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
msgmni has been set to 1173
alg: No test for stdrng (krng)
ksign: Installing public key data
Loading keyring
- Added public key A760F53BCC8FA210
- User ID: CentOS (Kernel Module GPG key)
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
pciehp: PCI Express Hot Plug Controller Driver version: 0.4
acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
ipmi message handler version 39.2
IPMI System Interface driver.
ipmi_si: Adding default-specified kcs state machine
ipmi_si: Trying default-specified kcs state machine at i/o address 0xca2, slave address 0x0, irq 0
ipmi_si: Interface detection failed
ipmi_si: Adding default-specified smic state machine
ipmi_si: Trying default-specified smic state machine at i/o address 0xca9, slave address 0x0, irq 0
ipmi_si: Interface detection failed
ipmi_si: Adding default-specified bt state machine
ipmi_si: Trying default-specified bt state machine at i/o address 0xe4, slave address 0x0, irq 0
ipmi_si: Interface detection failed
ipmi_si: Unable to find any System Interface(s)
input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
ACPI: Power Button [PWRF]
input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input1
ACPI: Sleep Button [SLPF]
[Firmware Bug]: No valid trip found
GHES: HEST is not enabled!
Non-volatile memory driver v1.3
Linux agpgart interface v0.103
crash memory driver: version 1.1
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
�serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
serial8250: ttyS2 at I/O 0x3e8 (irq = 6) is a 16550A
Refined TSC clocksource calibration: 2300.003 MHz.
serial8250: ttyS3 at I/O 0x2e8 (irq = 7) is a 16550A
00:04: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:05: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:06: ttyS2 at I/O 0x3e8 (irq = 6) is a 16550A
00:07: ttyS3 at I/O 0x2e8 (irq = 7) is a 16550A
brd: module loaded
loop: module loaded
input: Macintosh mouse button emulation as /devices/virtual/input/input2
Fixed MDIO Bus: probed
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
uhci_hcd: USB Universal Host Controller Interface driver
PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
i8042.c: Warning: Keylock active.
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
rtc_cmos 00:01: RTC can wake from S4
rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
rtc0: alarms up to one day, 114 bytes nvram
cpuidle: using governor ladder
cpuidle: using governor menu
EFI Variables Facility v0.08 2004-May-17
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
usbhid: v2.6:USB HID core driver
GRE over IPv4 demultiplexor driver
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 17
registered taskstats version 1
rtc_cmos 00:01: setting system clock to 2015-04-28 06:53:10 UTC (1430203990)
Initalizing network drop monitor service
Freeing unused kernel memory: 1280k freed
Write protecting the kernel read-only data: 10240k
Freeing unused kernel memory: 800k freed
Freeing unused kernel memory: 1584k freed
dracut: dracut-004-336.el6_5.2
dracut: rd_NO_LUKS: removing cryptoluks activation
dracut: rd_NO_LVM: removing LVM activation
udev: starting version 147
dracut: Starting plymouth daemon
dracut: rd_NO_MD: removing MD RAID activation
input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input3
%GFATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
input: ImPS/2 Generic Wheel Mouse as /devices/platform/i8042/serio1/input/input4
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
dracut Warning: No root device "block:/dev/disk/by-uuid/a8cf6ab7-92fb-42c6-b95f-d437f94aaf98" found
%GFATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
dracut Warning: Boot has failed. To debug this issue add "rdshell" to the kernel command line.
dracut Warning: Signal caught!
dracut Warning: Boot has failed. To debug this issue add "rdshell" to the kernel command line.
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Not tainted 2.6.32-431.17.1.el6.x86_64 #1
Call Trace:
[<ffffffff815274ef>] ? panic+0xa7/0x16f
[<ffffffff81077292>] ? do_exit+0x862/0x870
[<ffffffff8118a525>] ? fput+0x25/0x30
[<ffffffff810772f8>] ? do_group_exit+0x58/0xd0
[<ffffffff81077387>] ? sys_exit_group+0x17/0x20
[<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
I suggest at first to follow standard troubleshooting steps to validate the disk's filesystem as documented.
Then you can try to snapshot the disk and create a new instance with boot disk from snapshot to avoid exclude a disk UUID conflict.
Finally you can try to attach the disk to recover the files to a brand new instance as I suggested in my previous comment.
If you can not validate the filesystem as per troubleshooting guide the data is most probably lost and nothing more can be done to recover it.
Please note as a good practice to take care of either periodic snapshot, GCS differentiual backup or other replication method.
Thank you.
Sincerely,
P.

Raspberry PI u-boot reports "no partition table"

I have built the very latest u-boot release and placed the image on a Raspberry PI. After boot, I get this error message:
U-Boot 2013.10 (Oct 24 2013 - 09:35:44)
DRAM: 448 MiB
WARNING: Caches not enabled
MMC: bcm2835_sdhci: 0
Using default environment
In: serial
Out: lcd
Err: lcd
Hit any key to stop autoboot: 0
mmc0 is current device
** No partition table - mmc 0 **
U-Boot>
Obviously, u-boot failed to mount the file system.
Can this be related to some mmc related setting as access speed?
Thanks in advance