Truncated file in XFS filesystem using dd - how to recover - truncate

Disk layout
My HDD RAID array is nearing its end of life, and I bought some new disks to replace it.
The old HDDs were used as storage for raw disk images for kvm/qemu virtual machines.
The RAID array was built with mdadm. On the md device there is a physical volume for LVM, and on that physical volume there is an XFS file system which stores the raw disk images.
Each raw disk image was created with qemu-img and contains a physical volume for LVM: one PV = one VG = one LV inside each image.
Action
When I tried to use cp to move the data I ran into bad blocks and I/O problems on my RAID array, so I switched from cp to dd with the noerror,sync flags.
I ran dd if=/mnt/old/file.img of=/mnt/**old**/file.img bs=4k conv=noerror,sync
Problem
Now the file /mnt/old/file.img has zero size on the XFS file system (dd truncated its output file, which was the same file as the input).
Is there a simple way to recover it?

My sense is that your RAID array has failed. You can see the RAID state with:
cat /proc/mdstat
Since you are seeing I/O errors, that is the likely source of your problem. The best path forward would be to make sector-level copies of each RAID member (or at a minimum the member(s) that are throwing I/O errors). See GNU ddrescue; it is designed to copy failing hard drives. Then perform the recovery work from the copies.
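For example, a minimal ddrescue sketch for copying one failing member (the device name and backup paths are placeholders, not taken from the question):
ddrescue -d -r3 /dev/sdb /mnt/backup/sdb.img /mnt/backup/sdb.mapfile
# -d reads the input with direct access, bypassing the kernel cache
# -r3 retries bad sectors up to three times
# the mapfile records progress, so an interrupted copy can resume where it stopped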

Finally I found a solution, but it isn't very simple.
xfs_undelete did not fit my case because it does not support the B+Tree extent storage format (V3) used for very large files.
The semi-manual procedure that solved my problem consists of these main steps (a command-level sketch follows the list):
Unmount the filesystem immediately and make a full partition backup to a file using dd
Investigate the XFS log entries about the truncated file
Manually revert the inode core header using xfs_db in expert mode
NB: recovering the inode core does not mark its extents as allocated again, so if you try to copy data from the file with the recovered inode header in the usual way you will get an I/O error. That is why I had to write a Python script.
Use the script to extract the extent data from the inode's B+Tree and write it to disk
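A minimal command-level sketch of steps 1-3, assuming the filesystem lives on /dev/mapper/vg0-images and the damaged inode is number 133 (both hypothetical; the field names and the value written below are illustrative only):
umount /mnt/old                              # step 1: stop all writes to the damaged filesystem
dd if=/dev/mapper/vg0-images of=/backup/images-fs.img bs=4M status=progress   # full partition backup
xfs_logprint /dev/mapper/vg0-images | less   # step 2: look for the transaction that truncated the inode
xfs_db -x /dev/mapper/vg0-images             # step 3: expert (writable) mode
#   xfs_db> inode 133
#   xfs_db> print                            # inspect core.size, core.nblocks, core.nextents
#   xfs_db> write core.size 53687091200      # put back the original values recovered from the log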
I have published the recovery script on GitHub under the LGPL license.
P.S. Some data was lost because of corrupted inode B+Tree extent records, but that data did not matter to me.

Related

How to achieve the association of file and memory in K8S?

In k8s, we can use the memory medium (a tmpfs instance) to define an emptyDir volume and mount it into a pod's container. Inside the container we can then read and write data through the ordinary file interface.
I want to know how k8s achieves this association of file and memory. What is the principle behind reading and writing memory-backed data as files? mmap?
According to Wikipedia:
tmpfs is a temporary file storage paradigm implemented in many Unix-like operating systems. It is intended to appear as a mounted file system, but data is stored in volatile memory instead of a persistent storage device. A similar construction is a RAM disk, which appears as a virtual disk drive and hosts a disk file system.
So it's not a k8s feature. It is a Linux feature that k8s simply makes use of.
You can read more about it in the Linux kernel documentation.
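To see that this is plain Linux functionality independent of k8s, you can create a tmpfs mount by hand (the path and size below are arbitrary examples):
mount -t tmpfs -o size=64m tmpfs /mnt/scratch          # back a directory with RAM instead of a block device
dd if=/dev/zero of=/mnt/scratch/test bs=1M count=10    # ordinary file I/O; the data lives only in memory
umount /mnt/scratch                                    # the contents disappear; nothing was ever written to disk
Kubernetes essentially performs the same kind of tmpfs mount on the node and mounts the result into the container, so no mmap magic is needed on the application side.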

Could not create shared memory segment. Failed system call was shmget. PostgreSQL on macOS Mojave. Symlinked postgres data directory

This is a common error message and there are many general answers that have not worked for me.
I think I have isolated this particular problem to the PostgreSQL data directory being symlinked to an external hard drive.
FATAL: could not create shared memory segment: No space left on device
DETAIL: Failed system call was shmget(key=5432001, size=56, 03600).
HINT: This error does *not* mean that you have run out of disk space. It occurs either if all available shared memory IDs have been taken, in which case you need to raise the SHMMNI parameter in your kernel, or because the system's overall limit for shared memory has been reached.
$ sysctl -a | grep sysv
kern.sysv.shmmax: 412316860416
kern.sysv.shmmin: 8
kern.sysv.shmmni: 64
kern.sysv.shmseg: 128
kern.sysv.shmall: 100663296
$ sudo cat /etc/sysctl.conf
kern.sysv.shmmax=412316860416
kern.sysv.shmmin=8
kern.sysv.shmmni=64
kern.sysv.shmseg=128
kern.sysv.shmall=100663296
PostgreSQL version 9.4.15. From my PostgreSQL config:
shared_buffers = 128MB
I don't know what other settings would be relevant.
Other environment details:
The external hard drive with the data directory is at only 50% capacity, and my RAM usage when this happens is around 60%.
I have not been able to determine an exact set of steps that reproduces the bug. I have an external hard drive with a PostgreSQL data directory and a local folder with another data directory. In my project, I'll symlink to one or the other depending on which copy of data I want to use. As far as I have noticed, the problem only appears when I've been working off the symlinked hard drive and when I unplug it without stopping the server and then plug it back in. But it doesn't happen every time when I perform those steps.
I don't expect anyone to be able to point to the specific problem given the above description.
But how can I get more useful information next time I'm in a bugged state? Are there any system commands that would help identify the exact problem?
...It occurs either if all available shared memory IDs have been taken, in which case you need to raise the SHMMNI parameter in your kernel, or because the system's overall limit for shared memory has been reached.
How can I check whether all available shared memory IDs have been taken, or whether the system's overall limit for shared memory has been reached, and what do I do with the answer?
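One way to gather that information the next time the server is in the bugged state, using standard macOS/BSD tools (the limits referenced are the kern.sysv values shown above):
ipcs -m                                    # list the System V shared memory segments that currently exist
ipcs -m | grep -c '^m'                     # count them; compare against kern.sysv.shmmni (64 here)
sysctl kern.sysv.shmmni kern.sysv.shmall   # the kernel limits on segment count and total shared memory
# If the count has reached shmmni, stale segments left behind by postmasters that
# lost their data directory are a plausible culprit; ipcrm -m <shmid> removes a
# segment that no longer belongs to a running process.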

Network usage of file open and seek in distributed filesystem like ceph

When I open a file stored on another node in a distributed filesystem and read just 100 bytes, does the filesystem try to "prefetch" more data to my node? That is, is the network traffic for serving the read more than 100 bytes?
The other question: if I seek to the end of a file stored on another node, does the distributed file system send the entire file to me? Or is there no network traffic for file data at all, since only the pointer to the position in the file changed?
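One way to answer this empirically for a given filesystem is to read 100 bytes through the mount and watch the client's network counters (the interface name and mount point below are examples):
cat /sys/class/net/eth0/statistics/rx_bytes             # bytes received on the client NIC before the test
dd if=/mnt/cephfs/bigfile of=/dev/null bs=100 count=1   # read exactly 100 bytes through the mount
cat /sys/class/net/eth0/statistics/rx_bytes             # read again; the delta shows how much was actually fetched, readahead included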

HowTo zdb -e poolname to recover data from a single ZFS device

I have the following situation:
1 × 10 TB drive, full of data, on ZFS
I wanted to add a 100 GB NVMe partition as a cache
Instead of zpool add poolname cache nvmepartition I wrote zpool add poolname nvmepartition
I did not notice my mistake and exported this pool.
The NVMe drive is no longer available, and the system has no information about this pool in the ZFS cache (due to the export).
Current status:
zpool import shows the pool, but I cannot import it using any method found on the internet.
zdb -e poolname shows me what I already know: the pool, its name, that it (sadly) has 2 children, one of which is no longer available - and that the system has no information about the missing child (so all the tricks I found on the internet about linking in a ghost device etc. won't work either).
As far as I know, the only way left is to use zdb to reconstruct all the files via the "journal" and pipe/save them to another path.
**But how?** I have found no documentation on that anywhere.
Note: the 10 TB drive was 90% full when I added the NVMe partition as a sibling. Since ZFS does not stripe like a real RAID 0, since the two siblings are so unequal in size, and since I did not write much data after the mistake happened, I am quite sure that most of my data is still there.
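For reference, a few zdb invocations that are commonly used when inspecting an exported, damaged pool (poolname and the dataset name are placeholders, and the exact option syntax differs between OpenZFS versions - see zdb(8); this is an inspection starting point, not a guaranteed recovery path):
zdb -l /dev/sdX1                    # dump the ZFS labels still present on the remaining device
zdb -e -C poolname                  # print the pool configuration as stored in those labels
zdb -e -d poolname                  # list the datasets in the pool and their object counts
zdb -e -dddd poolname/dataset       # verbose per-object dump (block pointers, sizes) for one dataset
zdb -e -R poolname 0:offset:size    # read a single block by vdev:offset:size; pulling blocks out one
                                    # by one is essentially what "generating the files via zdb" means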

100GB free space on NFS server, but can't write even an empty file

On my production NFS server more than 100 GB is free, but I can't write even an empty file to that drive. Please find the attached image for clarification. I have since fixed the issue by removing some folders on that drive.
Use both df and df -i after reading df(1); perhaps you have used up all the inodes in your file system. See also stat(1), and run stat -f.
Perhaps you have reached some disk quota. See also quota(1).
Consider using strace(1) to find the failing syscall and its errno.
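A concrete way to run those checks on the client (the mount point and test file name are examples):
df -h /mnt/nfs                                 # free space
df -i /mnt/nfs                                 # free inodes; IUse% at 100% means no new files can be created
stat -f /mnt/nfs                               # filesystem-level block and inode totals
quota -s                                       # per-user usage, if quotas are enabled on the server
strace -f touch /mnt/nfs/empty 2>&1 | tail     # shows the exact failing syscall and its errno (ENOSPC, EDQUOT, ...)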