Mounting GCS over FUSE

When mounting GCS through FUSE with gcsfuse, are files stored in the mount point saved on the local disk file system (i.e. do they consume actual disk space), or is all data stored directly in the cloud?

gcsfuse downloads files to a temporary location and keeps a cache. This is usually the right behaviour, because otherwise you could use up all of your available RAM. If you want, you can avoid keeping a local copy on disk by pointing --temp-dir at a ramdisk.
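As an illustration, here is a hedged sketch of that setup (the bucket name, mount points and ramdisk size are placeholders):

# Create a small ramdisk (tmpfs) and point gcsfuse's temporary directory
# at it, so downloaded file contents never hit the local disk.
sudo mkdir -p /mnt/ramdisk /mnt/gcs
sudo mount -t tmpfs -o size=2g tmpfs /mnt/ramdisk
gcsfuse --temp-dir=/mnt/ramdisk my-bucket /mnt/gcs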

Confusion about the dbutils.fs.ls() command output

When I use the command below in Azure Databricks:
display(dbutils.fs.ls("/mnt/MLRExtract/excel_v1.xlsx"))
the output comes back as wasbs://paycnt#sdvstr01.blob.core.windows.net/mnt/MLRExtract/excel_v1.xlsx
rather than the expected dbfs://mnt/MLRExtract/excel_v1.xlsx.
Please suggest.
Mounting a storage account to the Databricks File System (DBFS) lets users access it any number of times without re-supplying credentials. Any files or directories in the container can be accessed from Databricks clusters through the mount point. The procedure you used mounts a blob storage container to DBFS.
So you can access your blob storage container from DBFS through the mount point. The method dbutils.fs.ls(<mount_point>) lists all the files and directories available under that mount point. It is not necessary to provide the path of a single file; instead simply use:
display(dbutils.fs.ls("/mnt/MLRExtract/"))
The command above returns all the files available in the mount point (which is your blob storage container). You can perform all the required operations and then write back through DBFS, and the changes will be reflected in your blob storage container too.
Refer to the following link to learn more about the Databricks File System:
https://docs.databricks.com/data/databricks-file-system.html

Truncated file in XFS filesystem using dd - how to recover

Disk layout
My HDD RAID array is reaching its end of life, and I have bought some new disks to replace it.
The old array has been used as storage for raw disk images for KVM/QEMU virtual machines.
The RAID array was built with mdadm. On the md device there is an LVM physical volume, and on that physical volume there is an XFS file system which stores the raw disk images.
Every raw disk image was created with qemu-img and contains an LVM physical volume. One PV = one LV = one VG inside each raw disk image.
Action
When I tried to move the data with cp I ran into bad blocks and I/O problems on my RAID array, so I switched from cp to dd with the noerror,sync flags.
I ran dd if=/mnt/old/file.img of=/mnt/old/file.img bs=4k conv=noerror,sync (note that of= points at the old path, i.e. the very same file).
Problem
Now the file /mnt/old/file.img has zero size in the XFS file system.
Is there a simple solution to recover it?
My sense is that your RAID array has failed. You can see the RAID state with:
cat /proc/mdstat
Since you are seeing I/O errors, that is likely the source of your problem. The best path forward would be to make sector-level copies of each RAID member (or at a minimum the member(s) that are throwing I/O errors). See GNU ddrescue: it is designed to copy failing hard drives. Then perform the recovery work from the copies.
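As an example, a hedged sketch of imaging one failing member with ddrescue (device and output paths are placeholders):

# First pass: copy the failing RAID member sector by sector, skipping bad
# areas quickly; the map file records what has been recovered so far.
ddrescue -d /dev/sdb /mnt/backup/sdb.img /mnt/backup/sdb.map
# Second pass: retry the remaining bad sectors a few more times.
ddrescue -d -r3 /dev/sdb /mnt/backup/sdb.img /mnt/backup/sdb.map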
Finally I found a solution, but it isn't very simple.
xfs_undelete did not fit my case, because it does not support the B+tree extent storage format (V3) used for very large files.
The semi-manual procedure that eventually worked for me consists of these main steps:
1. Unmount the filesystem immediately and make a full partition backup to a file using dd.
2. Investigate the XFS log entries about the truncated file.
3. Manually revert the inode core header using xfs_db in expert mode.
N.B. Recovering the inode core does not mark its extents as allocated again, so if you try to copy data from the file with the recovered inode header in the usual way you will get an I/O error. That is what led me to develop a Python script.
4. Use the script to extract the extent data from the inode's B+tree and write it to disk.
I have published the recovery script on GitHub under the LGPL license.
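For illustration, a hedged, command-level sketch of steps 1-3 (device paths, the inode number and the size value are placeholders; the exact fields to rewrite depend on what the log shows for your file):

# 1. Unmount immediately and take a full sector-level backup of the volume.
umount /mnt/old
dd if=/dev/vg0/images of=/backup/images-partition.img bs=4M conv=noerror,sync
# 2. Inspect the XFS log for the truncation record of the affected inode.
xfs_logprint /dev/vg0/images | less
# 3. Open the filesystem in xfs_db expert (writable) mode and repair the
#    inode core by hand (interactive commands shown as comments):
xfs_db -x /dev/vg0/images
#   xfs_db> inode 12345
#   xfs_db> print
#   xfs_db> write core.size 53687091200   # original size, taken from the log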
P.S. Some data was lost because of corrupted inode B+tree extent records, but that data did not matter to me.

How to achieve the association of file and memory in K8S?

In k8s, we can define an emptyDir volume with the memory medium (a tmpfs instance) and mount it into a pod's container. Inside the container we can then read and write data through the ordinary file interface.
I want to know how k8s achieves this association of files and memory. What is the principle behind reading and writing in-memory data as files? Is it mmap?
According to Wikipedia:
tmpfs is a temporary file storage paradigm implemented in many Unix-like operating systems. It is intended to appear as a mounted file system, but data is stored in volatile memory instead of a persistent storage device. A similar construction is a RAM disk, which appears as a virtual disk drive and hosts a disk file system.
So it is not a k8s feature; it is a Linux feature that k8s simply makes use of.
You can read more about it in the Linux kernel documentation.
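To see this in action, here is a hedged sketch (pod name, image and mount path are arbitrary examples) showing that an emptyDir volume with medium: Memory is simply a tmpfs mount inside the container:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: tmpfs-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: cache
      mountPath: /cache
  volumes:
  - name: cache
    emptyDir:
      medium: Memory
EOF
# Inside the container the volume appears as an ordinary tmpfs mount:
kubectl exec tmpfs-demo -- sh -c 'grep /cache /proc/mounts'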

How to configure mongo to use different volumes for databases?

I see that mongo has the configuration option storage.directoryPerDB, but I only see storage.dbPath for specifying the path where data is stored.
We have two small, frequently used "settings" databases that will be stored locally in the default location. There is another "results" database for large image files that is written often but queried infrequently, and it has a dedicated SSD drive for its storage. This data needs to be on its own drive because our application can generate hundreds of gigabytes of image data in a short amount of time.
How can I configure mongod to store a database on a different drive? The server is running on Windows, if that makes any difference.
Never mind. The documentation at http://docs.mongodb.org/manual/reference/configuration-options/#storage.directoryPerDB explains how to do it perfectly, together with http://technet.microsoft.com/en-us/library/cc753321.aspx#BKMK_CMD, which describes how to mount a drive to a folder location.
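For reference, a hedged sketch of one way to wire this up on Windows (paths are examples, and the volume GUID is deliberately left as a placeholder):

REM Mount the dedicated SSD volume onto an empty folder under the dbPath
REM using the mounted-folder technique from the TechNet article:
mkdir C:\data\db\results
mountvol C:\data\db\results \\?\Volume{...}\
REM Start mongod with one directory per database, so the "results"
REM database is stored in C:\data\db\results (i.e. on the SSD):
mongod --dbpath "C:\data\db" --directoryperdb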

GCE Use Cloud Storage Bucket as Mounted Drive

Is there a way to mount a storage bucket to an instance so it can be used by the webserver as storage? If not, how can I add more storage to the instance without adding another persistent disk with an OS?
Aside from attaching a new persistent disk, you could also use a number of FUSE-based utilities to mount either a Google Cloud Storage or an AWS S3 bucket as a local disk (a mount sketch follows after the comparison below).
s3fs:
* Can work with Google Cloud or AWS
* The bucket can be mounted on multiple systems at the same time
* Files are stored as objects in the bucket, so they can also be manipulated externally
* A con is that it can be a little slow if you have a lot of files
S3QL:
* Can work with Google Cloud or AWS
* The bucket can be mounted on one system only
* Files are stored in a proprietary format and can't be manipulated outside of the mounted filesystem
* Much faster than s3fs when there are many files
* Doesn't handle network connectivity issues so well (manual fsck and remount if you lose the network)
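As an illustration, a hedged sketch of mounting a Google Cloud Storage bucket with each tool, assuming interoperability (HMAC) credentials; bucket names, mount points and key values are placeholders:

# s3fs: point it at the GCS interoperability endpoint.
echo ACCESS_KEY:SECRET_KEY > $HOME/.passwd-s3fs
chmod 600 $HOME/.passwd-s3fs
s3fs my-bucket /mnt/bucket -o passwd_file=$HOME/.passwd-s3fs -o url=https://storage.googleapis.com

# S3QL: create its filesystem inside the bucket once, then mount it.
mkfs.s3ql gs://my-bucket/s3ql
mount.s3ql gs://my-bucket/s3ql /mnt/s3ql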
Hope this helps.
You can certainly create a new (larger) Persistent Disk and attach it to your instance as a data disk. This is a very good option, since it keeps your website data separate from your operating system. See the Persistent Disk docs for details on all the options.
In your case:
Create a new Persistent Disk for the data. Pick a size large enough for your data and large enough to get the I/O throughput you want (see this chart for details).
Attach the disk to your instance, then format and mount it; a sketch of the gcloud commands follows below.
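A hedged sketch of those steps with gcloud (disk name, size, zone, instance name and mount point are all examples):

# Create the data disk and attach it to the running instance.
gcloud compute disks create www-data --size=200GB --zone=us-central1-a
gcloud compute instances attach-disk my-instance --disk=www-data --device-name=www-data --zone=us-central1-a

# Then, on the instance itself, format and mount the new disk.
sudo mkfs.ext4 -F /dev/disk/by-id/google-www-data
sudo mkdir -p /var/www/data
sudo mount /dev/disk/by-id/google-www-data /var/www/data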