Google Cloud Storage FUSE: Using gcsfuse fills up local instance disk

I've been using gcsfuse (FUSE) for some weeks and everything was running smoothly until my instance disk (10 GB) filled up out of nowhere.
I was trying to identify the cause, erasing some temporary files, and found out that unmounting the bucket fixed the issue.
It's supposed to upload to the cloud, right? So why is it taking up space as if it counted as local instance storage?
Thanks for the help, guys.

Here is why you would see this behaviour.
Quoting from the gcsfuse docs:
https://cloud.google.com/storage/docs/gcs-fuse
Local storage: Objects that are new or modified will be stored in
their entirety in a local temporary file until they are closed or
synced. When working with large files, be sure you have enough local
storage capacity for temporary copies of the files, particularly if
you are working with Google Compute Engine instances. For more
information, see the readme documentation.
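Because gcsfuse keeps a full local copy of every new or modified object until it is closed or synced, a single large write through the mount can fill a 10 GB boot disk. A minimal pre-flight check is sketched below: it compares the size of a file you are about to write with the free space where gcsfuse stages it. It assumes staging happens in the system temporary directory (older gcsfuse releases documented a `--temp-dir` flag to move it; check your version), and the function name and `headroom_bytes` margin are hypothetical.

```python
import shutil
import tempfile


def has_room_for_staging(size_bytes, staging_dir=None, headroom_bytes=512 * 1024 ** 2):
    """Return True if staging_dir can hold a full temporary copy of a file.

    Assumption: gcsfuse stages new or modified objects in staging_dir, which
    defaults here to the system temporary directory. headroom_bytes is a
    hypothetical safety margin so the boot disk is never filled to 100%.
    """
    if staging_dir is None:
        staging_dir = tempfile.gettempdir()
    free_bytes = shutil.disk_usage(staging_dir).free
    return free_bytes - headroom_bytes >= size_bytes


if __name__ == "__main__":
    # Example: would writing a 5 GiB file through the mount fit on the local disk?
    print(has_room_for_staging(5 * 1024 ** 3))
```

If the check fails, pointing the staging directory at a larger attached disk is usually a better fix than growing the boot disk.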

Related

Mount a shared volume to Kubernetes cluster so that all users can access same storage and share files

I am following Zero to JupyterHub with Kubernetes to create a JupyterHub environment for my team to use.
I'm using Google Kubernetes Engine, and every user gets their own storage where their files are stored; this setup works fine.
I am having trouble working out how to create a volume or shared database so that everyone on the team can see each other's notebooks and share files and data.
To explain more: in the desired setup, when a user signs in and goes to their Jupyter image, every user sees the same folder "shared". Users can create individual folders for themselves inside that folder, but can also reuse code that someone else has already written.
I looked into NFS with Filestore, but that seems very expensive.
As stated in the documentation, gcePersistentDisk does not support being mounted read-write by multiple nodes (ReadWriteMany).
There is an alternative solution to the problem. Rook is a storage orchestrator for Kubernetes with various storage providers available through it. One of them is Ceph, which offers a shared filesystem solution on Kubernetes.
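What a shared filesystem buys you here is the ReadWriteMany access mode, which lets every user pod mount the same volume read-write. A minimal sketch of the corresponding PersistentVolumeClaim follows, built in Python and printed as JSON (kubectl accepts JSON as well as YAML). It assumes Rook's CephFS is installed with a StorageClass named `rook-cephfs`, the name used in Rook's example manifests; the claim name and size are hypothetical. Mounting the claim into every single-user server would then go through the Zero to JupyterHub chart's extra-volume settings; check the chart documentation for the exact keys.

```python
import json


def shared_pvc_manifest(name="shared-notebooks", storage_class="rook-cephfs", size="10Gi"):
    """Build a PersistentVolumeClaim that many pods can mount read-write.

    ReadWriteMany is the access mode a GCE persistent disk cannot provide;
    a CephFS-backed StorageClass can. All default values are hypothetical.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # e.g. python shared_pvc.py | kubectl apply -f -
    print(json.dumps(shared_pvc_manifest(), indent=2))
```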

Google Compute Engine snapshot of instance with persistent disks attached failed

I have a working VM instance that I'm trying to copy to allow redundancy behind a Google load balancer.
A test run with a dummy instance worked fine: I created a new instance from a snapshot of a running one.
Now, the real "original" instance has a persistent disk attached, and this causes a problem when starting up the cloned instance because the persistent disk it expects to mount is (obviously) missing.
The serial console output is:
* Stopping cold plug devices [ OK ]
* Stopping log initial device creation [ OK ]
* Starting enable remaining boot-time encrypted block devices [ OK ]
The disk drive for /mnt/XXXX-log is not ready yet or not present.
keys:Continue to wait, or Press S to skip mounting or M for manual recovery
As I understand it, there is no way to send any of these keystrokes to the instance. Is there any other way to overcome this issue? I know I could unmount the disk before taking the snapshot, but the workflow I would like to establish is taking periodic snapshots of production servers, so unmounting disks every time would require instance downtime (plus all the unnecessary risks of doing an action that seems pointless).
Is there a way to boot this type of cloned instance successfully and attach a new persistent disk afterwards?
Is this happening because the original persistent disk is in use, or would the same problem occur even if the original instance were offline (for example, after a failure, in which case I would try to create a new instance from a snapshot)?
One workaround I use to get around the same issue is that I don't actually unmount the disk; I just comment out the mount line in /etc/fstab and take the snapshot. This way my instance has no downtime and no unmounted disks while snapshotting. (I am using Ubuntu 14.04 as the OS, if that matters.)
Later, I uncomment the line when I use that snapshot on a new instance.
However, a better solution may be to add the nofail option to that mount line instead of commenting it out.
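Adding `nofail` is a small text edit on the matching fstab entry, so it can be scripted as part of the image or snapshot workflow. A minimal sketch, assuming standard whitespace-separated fstab fields (device, mount point, type, options, ...); the function name and sample UUID are hypothetical, and column alignment is not preserved. On upstart-based releases such as Ubuntu 14.04, `nobootwait` may also be relevant; check mountall(8) for which option your release honors.

```python
def add_nofail(fstab_text, mount_point):
    """Return fstab_text with 'nofail' added to the options of mount_point's entry.

    Blank lines, comments, malformed lines and other mount points are left as-is.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#") or fields[1] != mount_point:
            out.append(line)
            continue
        if len(fields) == 3:
            # The options field was omitted; start from "defaults".
            fields.append("defaults")
        options = fields[3].split(",")
        if "nofail" not in options:
            options.append("nofail")
        fields[3] = ",".join(options)
        out.append(" ".join(fields))  # fields are re-joined with single spaces
    trailing = "\n" if fstab_text.endswith("\n") else ""
    return "\n".join(out) + trailing


if __name__ == "__main__":
    sample = "UUID=1234-abcd /mnt/XXXX-log ext4 defaults 0 2\n"
    print(add_nofail(sample, "/mnt/XXXX-log"), end="")
    # prints: UUID=1234-abcd /mnt/XXXX-log ext4 defaults,nofail 0 2
```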
By the way, I am doing a similar task: building a load-balanced setup with multiple web server nodes, each cloned from that snapshot, with extra persistent disks mounted for e.g. uploads, data and logs.
I'm a little unclear as to what you're trying to accomplish. It sounds like you're looking to periodically snapshot the data volumes of a production server so you can clone them later.
In all likelihood, you simply need to sync and fsfreeze the filesystem before you make your snapshot, rather than unmounting and remounting it. The GCP documentation has a basic example of this in the Snapshots documentation.
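The sync, freeze, snapshot, thaw sequence can be sketched as below. The key design point is the `try`/`finally`: the filesystem is thawed even if the snapshot command fails, otherwise writes to it would block indefinitely. It assumes root privileges (for `fsfreeze` from util-linux) and an authenticated `gcloud` CLI; the disk, snapshot and zone values are hypothetical placeholders, and `run` is injectable only so the sequence can be tested. It is meant for a data volume, not the root filesystem. Note that `gcloud` waits for the snapshot to finish by default, so this keeps the freeze for the whole call, which is simple but may be longer than necessary.

```python
import subprocess


def snapshot_frozen(mount_point, disk, snapshot_name, zone, run=subprocess.run):
    """Flush and freeze a data filesystem, snapshot its disk, then always thaw it.

    disk, snapshot_name and zone are hypothetical placeholders.
    """
    run(["sync"], check=True)  # flush dirty pages to the disk
    run(["fsfreeze", "--freeze", mount_point], check=True)  # block new writes
    try:
        run(
            ["gcloud", "compute", "disks", "snapshot", disk,
             f"--snapshot-names={snapshot_name}", f"--zone={zone}"],
            check=True,
        )
    finally:
        # Thaw even if the snapshot failed, so the volume does not stay frozen.
        run(["fsfreeze", "--unfreeze", mount_point], check=True)
```

Usage, e.g. from a cron job: `snapshot_frozen("/mnt/data", "web-1-data", "web-1-data-nightly", "us-central1-a")`.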

Google cloud storage bucket is getting unmounted

I am running an application on Google Compute Engine that reads data from Google Cloud Storage and writes data to a persistent disk. The bucket is mounted using gcsfuse.
But partway through, the bucket gets unmounted, and my application goes into sleep mode and stalls.
When I try to view the contents of the mounted directory, I get the following error:
cannot access /home/santhosh/MountPoint/ Transport endpoint is not connected
Is there a time limit on how long the bucket stays mounted? How can I make sure the bucket stays mounted all the time?
Can someone please help me resolve this? I want the program to run without any breaks.
I experience the same problems: random I/O errors and unmounting. Do not use gcsfuse in production; from the docs: "Please treat gcsfuse as beta-quality software." We use it for maintenance only.
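"Transport endpoint is not connected" is the Linux message for errno `ENOTCONN`, which is what a FUSE mount returns once its daemon has died. That makes a dead mount easy to detect, for example from a watchdog that alerts or remounts (typically `fusermount -u` on the mount point, then mounting again). The sketch below only detects the condition; it does not fix the underlying instability. The function name is hypothetical, and `listdir` is injectable purely for testing.

```python
import errno
import os


def fuse_mount_responds(path, listdir=os.listdir):
    """Return False if path is a dead FUSE mount ("Transport endpoint is not connected").

    Any other error (missing path, permissions, ...) is re-raised rather than
    guessed at.
    """
    try:
        listdir(path)
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            return False
        raise
    return True
```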

GCE Use Cloud Storage Bucket as Mounted Drive

Is there a way to mount a storage bucket to an instance so it can be used by the webserver as storage? If not, how can I add more storage to the instance without adding another persistent disk with an OS?
Aside from attaching a new persistent disk, you could also use a number of FUSE-based utilities to mount either a Google Cloud Storage or AWS S3 bucket as a local disk.
s3fs:
* Can work with Google Cloud or AWS
* Bucket can be mounted on multiple systems at the same time
* Files are stored as objects in the bucket, so the files can be manipulated externally
* A con is that it can be a little slow if you have a lot of files
S3QL:
* Can work with Google Cloud or AWS
* Bucket can be mounted on only one system at a time
* Files are stored in a proprietary format and can't be manipulated outside of the mounted filesystem
* Much faster than s3fs for many files
* Doesn't handle network connectivity issues as well (manual fsck and remount if you lose network)
Hope this helps.
You can certainly create a new (larger) Persistent Disk and attach it to your instance as a data disk. This is a very good option, since it keeps your website data separate from your operating system. See the Persistent Disk docs for details on all the options.
In your case:
Create a new Persistent Disk for the data. Pick a size large enough for your data and large enough to get the I/O throughput you want. (See this chart for details)
Attach the disk to your instance.
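The two steps above map onto two `gcloud` commands, sketched here as a Python helper that builds them. All names, the size and the zone are hypothetical; `--type` and other flags are left at their defaults. After attaching, the disk still has to be formatted and mounted inside the guest, which these commands do not do.

```python
import shlex


def add_data_disk_commands(instance, disk, size_gb, zone):
    """Build gcloud commands to create a data disk and attach it to an instance.

    instance, disk, size_gb and zone are hypothetical placeholders.
    """
    create = ["gcloud", "compute", "disks", "create", disk,
              f"--size={size_gb}GB", f"--zone={zone}"]
    attach = ["gcloud", "compute", "instances", "attach-disk", instance,
              f"--disk={disk}", f"--zone={zone}"]
    return [create, attach]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them.
    for cmd in add_data_disk_commands("web-1", "web-1-data", 200, "us-central1-a"):
        print(shlex.join(cmd))
```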

Copying a virtual machine data drive in Microsoft Azure

Added more details at the bottom of the question.
We are testing deployment scenarios in Azure VM preview and have run into an issue.
Here is our scenario. We have a software stack that we use on all of our servers. We have created an image with all of that stack installed on an attached data drive. We have created an image of the VM that we can use as a template. Now what we want to do is create a VM based on that template, create a copy of the data drive, and attach it to the newly created VM in an automated manner.
Our problem is that while we have found lots of information about creating drives, we can't find any guidance on how to copy the data drive using Azure PowerShell.
Any thoughts, code, or RTFMs happily accepted.
Cheers,
Terence
We have successfully created an operating system image that we can use to create VMs. But there is a data disk that holds our standard software stack, which we want to reuse by copying it across VMs. The scenario that we are trying to implement is:
1. Create a VM from a standard VM image - PBIMaster
2. Attach a disk called PBIMasterDisk to that VM as F:
3. Install all of the software required for our app on F: (too big for the OS disk, and besides, sticking it on the OS disk seems messy)
4. Build an image from PBIMaster, call it PBIMasterImage, and save it.
5. Create a new image from PBIMaster, call it Node1.
6. Copy PBIMasterDisk to a new Azure disk, call it Node1SoftwareDisk.
7. Attach Node1SoftwareDisk to Node1 as F:
8. Since the image has the correct registry settings from the previous installs, our stack is ready to go.
9. Add appropriate endpoints.
10. Rinse and repeat for each additional node.
Hopefully that makes our scenario clearer.
Thanks.
If I understood your objective correctly, you have already uploaded two VHDs to your subscription, and you have also created a VM based on your OS disk VHD1:
OS Disk (VHD1)
Data Disk (VHD2)
Now you want to copy VHD2 to VHD3 and then attach VHD3 to your VM (which is based on the OS disk) via PowerShell.
As of now, there is no PowerShell command that will let you copy a data disk (VHD2) to another data disk (i.e. VHD3).
I haven't tried it, but you can use the approach in the following post to try copying your data disk:
http://blogs.msdn.com/b/windowsazurestorage/archive/2012/06/12/introducing-asynchronous-cross-account-copy-blob.aspx
This method copies blobs directly at the cloud storage level, so there is no bandwidth usage towards on-premises and potentially zero cost if you are in the same data center. Try using the same subscription and see if that solves your problem.
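The linked post describes the asynchronous Copy Blob operation: the service copies the source VHD blob server-side, and the client polls until the copy completes. A minimal sketch of that start-and-poll loop follows, written in Python rather than PowerShell to match the other examples here. It assumes `dest_blob` behaves like `BlobClient` from the v12 `azure-storage-blob` SDK (`start_copy_from_url()` returning a dict with `copy_status`, and `get_blob_properties().copy.status` reporting `"pending"`, `"success"`, `"aborted"` or `"failed"`); the function name is hypothetical. The source URL must be readable by the service (e.g. carry a SAS token if it lives in another account), and registering the copied VHD as a disk and attaching it to the VM is a separate step not covered here.

```python
import time


def copy_blob_and_wait(dest_blob, source_url, poll_seconds=5.0, sleep=time.sleep):
    """Start a server-side blob copy into dest_blob and poll until it finishes.

    Assumes dest_blob behaves like azure.storage.blob.BlobClient (v12 SDK).
    Raises RuntimeError if the copy ends in any state other than "success".
    """
    result = dest_blob.start_copy_from_url(source_url)
    status = result["copy_status"]
    while status == "pending":
        sleep(poll_seconds)
        status = dest_blob.get_blob_properties().copy.status
    if status != "success":
        raise RuntimeError(f"blob copy ended with status {status!r}")
    return status
```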