Google Cloud Storage bucket is getting unmounted

I am running an application on Google Compute Engine that reads data from Google Cloud Storage and writes data to a persistent disk. The bucket is mounted using gcsfuse.
Partway through a run, the bucket gets unmounted, and my application goes to sleep and stalls.
When I try to list the contents of the mounted directory, I get the following error:
cannot access /home/santhosh/MountPoint/ Transport endpoint is not connected
Is there a time limit on how long a bucket stays mounted? How can I make sure the bucket stays mounted all the time?
Can someone please help me resolve this? I want the program to run without any interruptions.

I experience the same problems: random I/O errors and unmounting. Do not use gcsfuse in production. From the docs: "Please treat gcsfuse as beta-quality software." We use it for maintenance only.
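The "Transport endpoint is not connected" message corresponds to the ENOTCONN error code, so a watchdog can notice a dead mount before the application stalls. A minimal sketch, assuming a Python helper (the name mount_is_healthy is hypothetical and not part of gcsfuse) that simply tries to list the directory:

```python
import errno
import os


def mount_is_healthy(mount_point: str) -> bool:
    """Return True if the directory can be listed, False if it looks stale or missing."""
    try:
        os.listdir(mount_point)
        return True
    except OSError as exc:
        # ENOTCONN is the errno behind "Transport endpoint is not connected".
        # ENOENT and EIO are treated as unhealthy too (an assumption of this sketch).
        if exc.errno in (errno.ENOTCONN, errno.ENOENT, errno.EIO):
            return False
        raise


if __name__ == "__main__":
    # Path taken from the question; replace it with your own mount point.
    print(mount_is_healthy("/home/santhosh/MountPoint/"))
```

A supervisor script could call this periodically and remount the bucket whenever it returns False.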

What happens when network connection to GCP is lost?

Imagine I have a GCS bucket mounted on my local Linux file system, and an app that writes new files into the mounted directory. My goal is for those locally written files to eventually show up in GCS.
I understand that writes on Linux happen "locally" until the file is closed. What happens if I lose network connectivity and hence can't write to GCS? Will the local file eventually end up in GCS? Are there retries?
Based on the repository documentation for gcsfuse, file upload retries are built into the utility and are triggered when there are problems accessing the mounted storage bucket. You can change the maximum backoff for retries with the --max-retry-sleep flag. This flag sets the maximum sleep time between retries; once the backoff exceeds it, retrying stops. The flag takes a duration as input, for example a number of minutes.
This doc page is also relevant if you would like to know more about specific characteristics of gcsfuse.
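To illustrate what "retrying stops once the backoff exceeds the maximum" means, here is a minimal sketch of capped exponential backoff. It is not gcsfuse's implementation; the function name, the initial delay, and the multiplier are all assumptions chosen for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    max_sleep_s: float,
    initial_sleep_s: float = 1.0,   # assumed starting delay
    multiplier: float = 2.0,        # assumed growth factor
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op() until it succeeds, sleeping longer after each failure.

    Gives up (re-raises) once the next sleep would exceed max_sleep_s.
    """
    delay = initial_sleep_s
    while True:
        try:
            return op()
        except OSError:
            if delay > max_sleep_s:
                raise  # backoff limit exceeded: stop retrying
            sleep(delay)
            delay *= multiplier
```

With max_sleep_s=4.0 and an operation that always fails, this sketch sleeps for 1, 2, and 4 seconds and then gives up.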

Migrate to Kubernetes

We're planning to migrate our software to run in Kubernetes with autoscaling. This is our current infrastructure:
PHP and Apache are running on a Google Compute Engine n1-standard-4 (4 vCPUs, 15 GB memory)
MySQL is running in Google Cloud SQL
Data files (csv, pdf) and the code are stored on a single SSD Persistent Disk
I found many posts that recommend storing the data files in Google Cloud Storage and using the API to fetch and upload files. We have very limited time, so I decided to use NFS to share the data files across the pods. The problem is that NFS is slow: I get around 100 MB/s when copying a file with pv, while iperf reports 1.96 Gbits/sec. Do you know how to achieve the same result without implementing Cloud Storage, or how to increase the NFS speed?
Data files (csv, pdf) and the code are stored on a single SSD Persistent Disk
There's nothing stopping you from volume-mounting an SSD into the Pod so you can continue to use an SSD. I can only speak to AWS terminology, but some EC2 instances come with "local" SSD hardware, so you would only need a nodeSelector to ensure your Pods are scheduled onto machines that have that local storage available.
Where you're going to run into problems is if you currently use just one php+apache instance, and thus just one SSD, but now want to scale the application up in a way that requires every php+apache instance to access the same SSD. That's a classic distributed application architecture problem, and something Kubernetes itself can't fix for you.
If you're willing to expend the effort, you can also try one of the other distributed filesystems (Ceph, GlusterFS, etc.) and see if they perform better for your situation. Then again, "We have very limited time" pretty much means that's off the table, I guess.
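For reference, the two throughput figures in the question can be put in the same unit. A small sketch, assuming the pv figure is in megabytes per second and using decimal units (1 Gbit = 1000 Mbit):

```python
def gbit_per_s_to_mbyte_per_s(gbit_per_s: float) -> float:
    """Convert gigabits per second to megabytes per second (decimal units)."""
    return gbit_per_s * 1000 / 8


link_mb_s = gbit_per_s_to_mbyte_per_s(1.96)  # iperf figure from the question
nfs_mb_s = 100.0                             # pv figure, assumed to be MB/s

print(f"link capacity: {link_mb_s:.0f} MB/s")
print(f"NFS copy uses {nfs_mb_s / link_mb_s:.0%} of it")
```

Under those assumptions the link could carry about 245 MB/s, so the NFS copy is using roughly 41% of it. That suggests the bottleneck is more likely the NFS server or its disk than the network itself.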

How can I check the remaining size of a persistent disk on Google Cloud? And where can I find the code in the instances?

I created a project on Google Cloud a long time ago and I am currently having some problems with it. The only result I seem to be receiving is Internal Server Error.
I tried connecting to the compute instance through SSH, but it does not help much because:
As far as I remember, I used to be able to see all the code on the compute instance. It's no longer there; the home folder only has some hidden files. I am not sure where to look for the actual project files.
The only error I managed to get from a log file was: Error syncing pod 9c8e56bc-4298-11e6-ab50, skipping: failed to "StartContainer" for "postgres" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=postgres pod=postgres_default(9c8e56bc-4298-11e6-ab50); this makes me think there are some issues with Postgres, which has a persistent disk of its own, but there seems to be no easy way to find out how much of that disk is occupied.
Even though I am an admin on that project and should receive detailed emails (with a stack trace) every time there is an error, I am not receiving anything at all.
This behaviour started today, all of a sudden, and I haven't touched the project in almost 2 years, so I am completely lost.
Thanks.
How can I check the remaining size of a persistent disk on Google Cloud?
For this part, I finally found a way to do it today. I'll describe it step by step here so that it is easy for anyone.
First, go to the Disks page in the Google Cloud console: https://console.cloud.google.com/compute/disks
Identify the persistent disk you are interested in. In my case, it was called pg-data-disk. Click on the corresponding VM instance, which is linked in the "In use by" column.
This will open an SSH connection to the VM instance to which your persistent disk is attached. In the SSH window, run the following command: sudo lsblk. The output lists the block devices attached to the instance.
You will thus discover the disk ID (in my case this was sdb), so you can now run: sudo df -h /dev/<YOUR DISK ID>. This command will give you the exact disk usage.
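The same check can be scripted. A minimal sketch using Python's standard library, assuming you know the directory where the persistent disk is mounted (the function name and the "/" placeholder are illustrative only):

```python
import shutil


def disk_usage_summary(mount_point: str) -> dict:
    """Return total, used, and free space for the filesystem holding mount_point."""
    usage = shutil.disk_usage(mount_point)
    gib = 1024 ** 3
    return {
        "total_gib": usage.total / gib,
        "used_gib": usage.used / gib,
        "free_gib": usage.free / gib,
        "percent_used": 100 * usage.used / usage.total if usage.total else 0.0,
    }


if __name__ == "__main__":
    # "/" is a placeholder; use the directory where the persistent disk is mounted.
    print(disk_usage_summary("/"))
```

The percentage may differ slightly from the df output, because df excludes blocks reserved for root from its calculation.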
As for the other part of the question, I was actually using Docker containers which were orchestrated by Kubernetes. And I totally forgot about it.
Will upgrade my RAM and get back to work.
Thank you all.

mount google cloud storage bucket but cache locally

I would like to know if there is a way to mount a Google Cloud Storage bucket as a folder
and, each time we read a file, cache it locally (so that it doesn't cost money/bandwidth).
gcsfuse has two types of caching available: stat caching and type caching. You can refer to this document, which provides detailed information on these types of caching and their trade-offs.
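Stat and type caching hold metadata rather than file contents, so one possible workaround is to cache contents at the application level. A minimal sketch, assuming a hypothetical cached_read helper that copies a file out of the mounted directory on first access and serves later reads from a local cache directory:

```python
import shutil
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def cached_read(mount_dir: PathLike, cache_dir: PathLike, relative_name: str) -> bytes:
    """Read a file through a local cache; only the first read touches the mount."""
    cache_path = Path(cache_dir) / relative_name
    if not cache_path.exists():
        # First access: copy the object from the mounted bucket to local disk.
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(Path(mount_dir) / relative_name, cache_path)
    # Later accesses are served from the local copy.
    return cache_path.read_bytes()
```

This sketch never invalidates the cache, so it only suits objects that do not change after they are written.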

Google Cloud Storage FUSE - Using gcsfuse fills up local instance memory

I've been using gcsfuse (FUSE) for some weeks and everything was running smoothly until my instance disk (10 GB) filled up out of nowhere.
While trying to identify the cause and deleting some temporary files, I found that unmounting the bucket fixed the issue.
It's supposed to upload to the cloud, right? So why is it taking up space as if it counted as local instance space?
Thanks for the help guys.
Here is a reason why you would see this behaviour.
Pasting from the gcsfuse doc
https://cloud.google.com/storage/docs/gcs-fuse
Local storage: Objects that are new or modified will be stored in their entirety in a local temporary file until they are closed or synced. When working with large files, be sure you have enough local storage capacity for temporary copies of the files, particularly if you are working with Google Compute Engine instances. For more information, see the readme documentation.
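Given that behaviour, it can help to check free local space before writing a large file into the mounted directory. A minimal sketch, assuming the temporary copies land in the system temporary directory (the helper name has_room_for is hypothetical, and the actual location depends on how gcsfuse is configured):

```python
import shutil
import tempfile
from typing import Optional


def has_room_for(size_bytes: int, temp_dir: Optional[str] = None) -> bool:
    """Return True if the temp filesystem has at least size_bytes free."""
    # Assumption: temporary copies go to the system temp dir unless told otherwise.
    target = temp_dir or tempfile.gettempdir()
    return shutil.disk_usage(target).free >= size_bytes


if __name__ == "__main__":
    five_gib = 5 * 1024 ** 3
    print(has_room_for(five_gib))
```

On a 10 GB instance disk, a check like this would flag large writes before they fill the disk.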