Lost ZFS pool and looking for ways to recover

In a Proxmox machine I noticed that the backups of some VMs were failing, so I wanted to do some testing.
While testing, the whole host stopped responding and I forced a reboot.
After the reboot I seem to have lost the whole data store.
Almost every ZFS command results in a freeze.
zpool status, zpool list, you name it: it locks up and you can't even Ctrl+C out of it.
I can still create a new SSH session and try other things though.
In an attempt to see what is causing the commands to hang I thought about running
zpool set failmode=continue storage-vm
hoping it would show me an error, but as you can guess, that command also hangs.
It's a pool created on two NVMe drives. The original command to create the pool was
zpool create -f -o ashift=12 storage-vm /dev/nvme0n1 /dev/nvme1n1
First thing I thought was that one of the NVMe drives had gone bad, so I checked the SMART status, but it shows both drives as perfectly healthy.
Then, before trying other things, I decided to back up the drives to an NFS share with dd.
dd if=/dev/nvme0n1 of=/mnt/pve/recovery/nvme0n1
dd if=/dev/nvme1n1 of=/mnt/pve/recovery/nvme1n1
Both commands completed, and on the NFS share I now have two images of exactly the same size (2 TB each).
Then I tried a non-destructive read/write test with dd on both NVMe drives and got no errors.
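For reference, an equivalent non-destructive read/write pass can also be done with badblocks instead of dd; a sketch, to be run only against the raw, unmounted devices:
badblocks -nsv /dev/nvme0n1
badblocks -nsv /dev/nvme1n1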
To rule out as much as possible, I built another Proxmox machine from spare hardware (same brand and type, etc.) and placed the drives in there.
On the new machine all zpool commands also hang. If I run zpool status with the drives removed from the motherboard, it does not hang, but obviously it has nothing to show.
So I placed the NVMe drives back in the original machine.
zdb -l /dev/nvme0n1 gives
failed to unpack label 0
failed to unpack label 1
failed to unpack label 2
failed to unpack label 3
which kind of worries me. It does the same for the other NVMe drive.
And now I'm running out of ideas. I have little knowledge of ZFS and don't know what can still be done to save the data.
Obviously the drives are not really dead, since SMART tells me they are healthy and I can dd images from them.
Things like faulty RAM or a faulty motherboard are pretty much ruled out by the hardware swap as well.
Is there a way to recover at least some VMs from that storage?
Help/pointers will be greatly appreciated.

The issue was eventually solved, and this is what I did.
Since the volume was made from two NVMe drives, I created two loop devices from the dd images (the -P flag makes the kernel scan the partition table; ZFS created its own partitions when it was given the whole disks, which is why the loopXp1 partitions are used below).
losetup -fP /mnt/pve/recovery/nvme0n1
losetup -fP /mnt/pve/recovery/nvme1n1
You can check the attached loop devices with lsblk and detach them with losetup -d /dev/loop[X]
Finally I imported the pool into ZFS in read-only mode and was able to access/recover all my data:
zpool import -f -d /dev/loop0p1 -d /dev/loop1p1 -o readonly=on storage-vm
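From there the individual VM disks can be copied off the read-only pool; a minimal sketch, assuming the usual Proxmox layout (the zvol name vm-100-disk-0 is just a placeholder):
zfs list -t volume -r storage-vm
dd if=/dev/zvol/storage-vm/vm-100-disk-0 of=/mnt/pve/recovery/vm-100-disk-0.raw bs=1M status=progress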

Related

Rsnapshot file permission problem with network HDD over Raspberry Pi

After trying to solve this for days, I want to ask for help here:
I want to make backups with rsnapshot, which usually runs on a server and manages local backups. In my case, I want to run rsnapshot on my computer and let rsnapshot manage my backups on an external hard drive. This external hard drive is connected to my Raspberry Pi and mounted on my computer with the following command:
sudo sshfs -o default_permissions,allow_other,idmap=user,IdentityFile=/home/user/.ssh/id_rsa pi@192.168.0.1:/mnt/externelHdd /mnt/backupHdd
Here, /mnt/backupHdd is the local root for rsnapshot's backup directory.
Additionally, I want to be able to connect the external hard drive directly to my computer for bigger backup jobs. For this purpose I wrote a script which mounts the external hard drive either locally or over the network with the command above. Afterwards it starts the rsnapshot job with sudo rsnapshot daily. When the hard drive is connected locally, everything works fine. When it's connected over sshfs, I get permission-denied errors.
Rsnapshot apparently is not allowed to manage files over sshfs when the files/directories were created while the drive was physically connected (different users: local and the Pi). I tried to solve this with the options allow_other and idmap=user, but I think there is more to do. So I'm asking you guys: how can I give the right permissions to rsnapshot?
Thanks for any help!
edit:
I get the following error:
/bin/cp: cannot create directory '/mnt/backupHdd/daily.1': Permission denied
----------------------------------------------------------------------------
rsnapshot encountered an error! The program was invoked with these options:
/usr/bin/rsnapshot daily
----------------------------------------------------------------------------
ERROR: /bin/cp -al /mnt/backupHdd/daily.0 /mnt/backupHdd/daily.1 failed (result 256, exit status 1).
ERROR: Error! cp_al("/mnt/backupHdd/daily.0/", "/mnt/backupHdd/daily.1/")
daily.0 was created while the HDD was connected directly to my local computer; daily.1 should be created with the HDD mounted over sshfs.
I'm assuming you're running rsnapshot as root and root owns the remote backup directory. This command:
sudo sshfs -o default_permissions,allow_other,idmap=user,IdentityFile=/home/user/.ssh/id_rsa pi@192.168.0.1:/mnt/externelHdd /mnt/backupHdd
is not going to work out the way I think you are intending. Even though you are using sudo on the local side of the connection, you're still SSH-ing in as "pi", meaning everything done on the far side of the connection is done by the user pi. No option to sshfs can change this fact. You'd need to enable root login and then SSH in as root, or at least as some user that has full read/write access to that drive.
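For what it's worth, a sketch of those two routes (assuming you either enable key-only PermitRootLogin in the Pi's /etc/ssh/sshd_config and authorize your key for root, or decide the pi user may own the whole backup tree):
sudo sshfs -o default_permissions,allow_other,IdentityFile=/home/user/.ssh/id_rsa root@192.168.0.1:/mnt/externelHdd /mnt/backupHdd
ssh pi@192.168.0.1 'sudo chown -R pi:pi /mnt/externelHdd'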

Opening up ZFS pool as writable

I have successfully installed FreeBSD onto a raw image file using the QEMU emulator, and I have formatted the image file with the ZFS file system (a ZFS pool).
Using the following commands I have successfully mapped the image file, ready to be imported by zpool:
sudo losetup /dev/loop0 [path-to-file].img
sudo kpartx -l /dev/loop0
sudo kpartx -av /dev/loop0
However, with the next command shown below
sudo zpool import -R [MOUNT-PATH] -d /dev/mapper
I get the following error message
The pool can only be accessed in read-only mode on this system. It
cannot be accessed in read-write mode because it uses the following
feature(s) not supported on this system:
com.delphix:spacemap_v2 (Space maps representing large segments are more efficient.)
The pool cannot be imported in read-write mode. Import the pool with
"-o readonly=on", access the pool on a system that supports the
required feature(s), or recreate the pool from backup.
I cannot find anything online about the feature called 'spacemap_v2'. How do I install it, or how do I mount my ZFS pool so that it is writable? I know I can mount it read-only, but that defeats the purpose, as I want to be able to write and copy data through the mounted filesystem.
Does anyone know how to achieve this? I would be grateful for a response.
Regards
What version of FreeBSD are you using? And where did this ZFS pool come from?
I'm guessing it's a ZFS on Linux pool which, as the message says, uses a feature that FreeBSD's ZFS doesn't currently support.
The only way around it at the moment is to create another pool without that feature on a system that does support it, zfs send the data to the new pool, and then import that pool into FreeBSD.
Note that FreeBSD is going to support this feature Soon(tm).
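Roughly, that migration would look like this on the system that does support the feature (a sketch; pool and device names are placeholders):
# create a new pool with the offending feature disabled
zpool create -o feature@spacemap_v2=disabled newpool /dev/sdX
# replicate the old pool into it
zfs snapshot -r oldpool@migrate
zfs send -R oldpool@migrate | zfs receive -F newpool
# then export it and import it on the FreeBSD side
zpool export newpool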

Raspberry Pi Losing Mounted Drive After Reboot

Brand new to the world of Pi - like so new that I had never even touched one until three days ago, and know very little about Linux... I have a Western Digital MyBook plugged directly into my router, and I've found I'm able to mount this as a drive with the following command:
sudo mount -t cifs -o user=yourusername,passwd=yourpasswd,rw,file_mode=0777,dir_mode=0777 //mybookIP/public /mnt/mybook
Unfortunately, it seems to drop this mount whenever I reboot. Anyone have a suggestion on how to make this permanent?
Based on the comments here, this is what I did:
First, in Terminal I ran:
sudo nano /etc/fstab
Once that was opened, I added the line:
//mybookIP/public /mnt/mybook cifs _netdev,username=yourusername,password=yourpasswd 0 0
Once I saved this I was able to reboot and the mounted drive was visible when it all loaded back up again.
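If you'd rather not keep the password readable in /etc/fstab, mount.cifs also supports a credentials file; a small sketch (the file path is just an example):
sudo sh -c 'printf "username=yourusername\npassword=yourpasswd\n" > /root/.smbcredentials'
sudo chmod 600 /root/.smbcredentials
and the fstab line then becomes:
//mybookIP/public /mnt/mybook cifs _netdev,credentials=/root/.smbcredentials 0 0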

Reliability of Docker containers

My question aims at verifying, and maybe correcting, my idea of the reliability of Docker containers. I have read both the Docker documentation and several articles on VOLUME in the Dockerfile and -v as an argument when running a container, as means to persist data outside a Docker container, be it in a data container or on the host system. As I would like to keep my setup simple, I would prefer not to copy/save/store data round and about but keep it in the Docker container itself.
There are several cases through which I discovered the behaviour of Docker containers. I'd like to know if I missed a scenario where a container can be 100% lost unintentionally, i.e. NOT by doing $ docker rm -f mycontainer
docker commands to pause, stop and kill a container
-> restartable by $ docker restart mycontainer or $ docker run mycontainer
Host system reboot
-> docker container exits with 0 or 255
Host system unexpected power off
-> What happens?
Application exception
-> docker container exits with -1
Updating or restarting docker (as pointed out by Greg)
-> expected behavior: like on system reboot (?)
In all those cases, the Docker container still exists in the end. So is there any other scenario that can cause a Docker container to be lost, as with $ docker rm -f mycontainer?
The background is that I have read a lot about mounted volumes and external data storage on the host system for Postgres, but I'd like to avoid storing data outside my containers on the host system if possible. On the other hand, I don't want to wake up and find all my data lost. (I do perform regular SQL dumps, but I don't want to do this every 5 minutes.) If a Docker container itself is not reliable for persistent data, I don't see why I should create a second container to hold the data for the first one, increasing the complexity of my system by adding a new container without gaining anything in terms of reliability.
Edit: There are two points in the Docker user guide on volumes which do not explicitly explain what behaviour to expect, and which therefore make me question whether these concepts provide extra reliability:
Changes to a data volume will not be included when you update an image
-> Does that mean that they get lost or that the content of the volume won't be changed?
Volumes persist until no containers use them
-> What's the definition of 'use'? As long as a container is not stopped, killed, or removed? Does that mean that the volume Docker created on the host system will get removed? Or does 'volume' only refer to a virtual bridge between a directory inside the container and one on the host system?
If you store all your data in the container, what are you going to do when you need to update the image? Updates to images are normally done by changing the Dockerfile and rebuilding the image. If my data is kept separate to my container, I can start a new version of the image, mount the data with --volumes-from or -v and kill the old container. In your case, you have to keep the container running and try to patch in place with something like puppet.
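A sketch of that upgrade pattern (container names and the image tag are only illustrative):
# data-only container that owns the volume
docker create --name pg-data -v /var/lib/postgresql/data postgres /bin/true
# run the database against that volume
docker run -d --name pg --volumes-from pg-data postgres
# upgrade: replace the container, keep the data
docker stop pg && docker rm pg
docker run -d --name pg --volumes-from pg-data postgres:9.4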
Also, I'm not sure what you think you're saving. If you run the official postgres image, it will have declared volumes in the Dockerfile. Those volumes exist as normal directories on your host system whether you ran the container with -v or not. And even if your Dockerfile had no volumes, the container's union file system is clearly being stored on your host anyway.
In general, you should consider containers to be temporary and stateless. Whilst you don't have to do this, you will find most of the tooling and support services are designed around this idiom.
Regarding your scenarios, there are a few you're missing:
A bug could make it impossible to restart a stopped container
The updating issue mentioned above
If you want to change the storage driver; this will cause a great deal of problems, as you need to migrate your images.
Just for clarity on the commands, docker start will restart stopped or exited containers and docker unpause will unpause paused containers.

Cannot start, stop, enter VE using OpenVZ

I'm using Debian Unstable with kernel 2.6.32-5-openvz-amd64 (but I don't think that's the problem).
After installing and running our VEs for several months, our hard disk was nearly full, so we added 3 more hard drives to build a new RAID 5 array, formatted it as ext4 and mounted it at /openvz.
I have a VE with ID 112, and I want to change its configuration to move the private area from /var/lib/vz/private/112 (1) to /openvz/112 (2).
After syncing all data from (1) to (2), I cannot start VE 112. I reverted the configuration back to the original, but when I use vzctl status 112, it shows:
# vzctl status 112
VEID 112 exist mounted running
and I cannot enter the VE:
# vzctl enter 112
enter into VE 112 failed
and I cannot stop or restart it; I get the error: Operation timed out.
I've tried many things: umounting and remounting the private area, using MAKEDEV to create tty/pty devices, and vzctl chkpnt 112 --kill, but nothing works.
I don't want to reboot this server; it contains 2 other VEs that are running fine without problems. If someone has faced the same problem, please let me know your solution.
Thank you very much,
--hung
Are you able to exec commands within your CT using 'vzctl exec'?
If it is possible try
vzctl exec 112 ps aux
to check what is running within your CT.
If you cannot log in to your CT because of a missing /dev/pts, you may mount it with 'vzctl exec':
vzctl exec 112 mount devpts /dev/pts -t devpts
The answer to my question: I reformatted the new partition with ext3 and re-synced the data. Everything worked as normal :)
Have you tried starting the VPS in verbose mode? You can do this with:
vzctl --verbose start 112