https://cloud.google.com/dataproc/docs/guides/dataproc-images specifies that
The custom image is saved in Cloud Compute Images, and is valid to create a Cloud Dataproc cluster for 30 days. You must re-create the custom image to reuse it after the 30-day period.
Is that limitation temporary while the custom image feature is in beta, or will it be perpetual?
This is a perpetual limitation, and it will remain in place after custom images go to GA (General Availability).
If you have feedback on how and why this impacts your use case, you can send it to dataproc-feedback@google.com for the Dataproc team's consideration.
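If you want to catch images before they expire, here is a minimal sketch (assuming the google-cloud-compute Python client and a hypothetical "dataproc-custom-" naming convention for your images, which you would adjust) that flags custom images older than 30 days:

```python
# Sketch only: list Compute Engine images older than 30 days so you know
# which Dataproc custom images need to be re-created before reuse.
# The project ID and the "dataproc-custom-" name prefix are assumptions.
from datetime import datetime, timezone

from google.cloud import compute_v1

PROJECT_ID = "my-project"          # assumption: replace with your project
IMAGE_PREFIX = "dataproc-custom-"  # assumption: your image naming convention

client = compute_v1.ImagesClient()
now = datetime.now(timezone.utc)

for image in client.list(project=PROJECT_ID):
    if not image.name.startswith(IMAGE_PREFIX):
        continue
    created = datetime.fromisoformat(image.creation_timestamp)
    age_days = (now - created).days
    if age_days >= 30:
        print(f"{image.name} is {age_days} days old; re-create it before "
              "using it for new Dataproc clusters")
```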
I am working on a new project, and the problem is that my Firebase storage is gradually filling up even though I am not using it; right now it is 4.1 GB.
I never created a bucket myself, yet it kept filling up.
One thing I tried was looking at the files in the Cloud Console, but they are all in a strange format that I cannot manage to open.
So far I have not even worked with media that could take up that much space.
I would appreciate ideas on how to track down the usage.
This is how my 3 GB bucket (I never uploaded anything to it) looks; any idea how I can open these files?
A change to how Firebase deploys functions from Node 10 onwards means container image files are automatically added to your Cloud Storage with every deployment. These count towards your "Bytes stored" and "Bandwidth" limits in Firebase.
To save costs, you can delete all of these files and deploy individual functions with firebase deploy --only functions:myFunctionName instead of deploying them all at once.
The following links are from a screenshot provided by Firebase support: Cloud Build, Container Registry, and the Firebase pricing FAQ.
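If you just want to see how much space those container artifacts are using (and optionally reclaim it), here is a minimal sketch using the google-cloud-storage Python client; the bucket name follows the usual Container Registry backing-bucket pattern, but verify the exact name in your own project before deleting anything.

```python
# Sketch only: measure (and optionally delete) the Cloud Functions container
# artifacts that accumulate in Cloud Storage after Node 10+ deployments.
# The bucket name "artifacts.<project-id>.appspot.com" is an assumption based
# on the usual Container Registry backing bucket -- check yours first.
from google.cloud import storage

PROJECT_ID = "my-project"  # assumption: replace with your project
bucket_name = f"artifacts.{PROJECT_ID}.appspot.com"

client = storage.Client(project=PROJECT_ID)
total_bytes = 0
for blob in client.list_blobs(bucket_name):
    total_bytes += blob.size or 0
    # Uncomment to actually reclaim the space:
    # blob.delete()
print(f"Container artifacts: {total_bytes / 1024**3:.2f} GiB")
```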
Is there a way to apply an upload limit to a Google Cloud Storage bucket per day/month/year?
Is there a way to apply a limit on the amount of network traffic?
Is there a way to apply a limit on Class A operations?
Is there a way to apply a limit on Class B operations?
Following the instructions at https://cloud.google.com/docs/quota, I found only "Queries per 100 seconds per user" and "Queries per day", but those are JSON API quotas.
(I am not even sure which API is used inside the StorageClient C# client class.)
To define quotas (and, for that matter, SLOs), you need an SLI: a service level indicator. That means having metrics on whatever you want to observe.
That is not the case here. Cloud Storage has no indicator for the volume of data uploaded per day, so you have no built-in indicator or metrics, and therefore no quotas.
If you want this, you have to build something yourself: wrap all Cloud Storage calls in a service that counts the volume of blobs per day, and then apply your own rules to that custom indicator.
Of course, to prevent any bypass, you have to deny direct access to the buckets and grant only your "indicator service" access to them. The same applies to bucket creation, so that new buckets are registered in your service.
Not an easy task...
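As an illustration of that pattern, here is a minimal sketch of such a wrapper in Python (google-cloud-storage); the daily limit, the in-memory counter, and the class name are all assumptions, and a production version would persist the counter and sit behind IAM so clients cannot bypass it.

```python
# Sketch only: a thin wrapper around Cloud Storage uploads that tracks how
# many bytes were uploaded today and refuses to exceed a self-imposed quota.
# The in-memory counter is illustrative; a real "indicator service" would
# persist it (e.g. in Firestore) and be the only identity allowed to write.
import datetime

from google.cloud import storage

DAILY_UPLOAD_LIMIT_BYTES = 10 * 1024**3  # assumption: 10 GiB per day


class QuotaExceededError(Exception):
    pass


class MeteredUploader:
    def __init__(self, project_id: str):
        self._client = storage.Client(project=project_id)
        self._day = datetime.date.today()
        self._bytes_today = 0

    def upload(self, bucket_name: str, blob_name: str, data: bytes) -> None:
        # Reset the counter when the day rolls over.
        today = datetime.date.today()
        if today != self._day:
            self._day, self._bytes_today = today, 0

        if self._bytes_today + len(data) > DAILY_UPLOAD_LIMIT_BYTES:
            raise QuotaExceededError("daily upload quota reached")

        bucket = self._client.bucket(bucket_name)
        bucket.blob(blob_name).upload_from_string(data)
        self._bytes_today += len(data)
```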
I'm trying to do some testing with Cloud Data Fusion; however, I'm running into connection issues when running my pipelines. I've come to understand that it is using the default network, and I would like to change my System Compute Profile over to a different network.
The problem is, I don't have the option to create a new System Compute Profile (The option doesn't show up under the Configuration tab). How can I go about getting the correct access to create a new compute profile? I have the role of Data Fusion Admin.
Thank you.
Creating a new compute profile is only available in Data Fusion Enterprise edition. In the basic edition, only the default compute profile can be used. But you can customize the profile when you run the pipeline. To do that:
Go to the pipeline page
Click Configure; in the Compute config section, click Customize.
This will bring up the settings for the profile; under General Settings, you can set the value for the network.
Just an update on this thread for future viewers: a custom profile can be created in Cloud Data Fusion version 6.2.2 (Basic).
On AWS I do this with an S3 + Lambda combination: when a new image is uploaded to a bucket, a Lambda is triggered and creates 3 different sizes of the image (small, medium, large). How can I do this with GCS + Cloud Functions?
PS: I know there is getImageServingUrl(), but can this be used with GCE, or is it for App Engine only?
Would really appreciate any input.
Thanks.
Google Cloud Functions directly supports triggers for new objects being uploaded to GCS: https://cloud.google.com/functions/docs/calling/storage
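For example, here is a minimal sketch of such a function in the Python runtime; the output bucket name and target sizes are assumptions, and writing the resized copies to a second bucket avoids re-triggering the function on its own output.

```python
# Sketch only: a background Cloud Function triggered by new objects in a
# bucket (google.storage.object.finalize) that writes small/medium/large
# copies to a separate output bucket. Assumes the google-cloud-storage and
# Pillow libraries; OUTPUT_BUCKET and SIZES are placeholders.
import io
import os

from google.cloud import storage
from PIL import Image

OUTPUT_BUCKET = "my-resized-images"          # assumption: a second bucket
SIZES = {"small": 200, "medium": 800, "large": 1600}

storage_client = storage.Client()


def resize_image(event, context):
    """Entry point for the storage finalize trigger."""
    bucket_name = event["bucket"]
    blob_name = event["name"]

    # Download the original image into memory.
    source = storage_client.bucket(bucket_name).blob(blob_name)
    original = Image.open(io.BytesIO(source.download_as_bytes()))

    base, ext = os.path.splitext(blob_name)
    for label, width in SIZES.items():
        copy = original.copy()
        copy.thumbnail((width, width))       # preserves aspect ratio
        buffer = io.BytesIO()
        copy.save(buffer, format=original.format or "JPEG")
        buffer.seek(0)
        target = storage_client.bucket(OUTPUT_BUCKET).blob(f"{base}_{label}{ext}")
        target.upload_from_file(buffer, content_type=event.get("contentType"))
```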
For finer control, you can also configure a GCS bucket to publish object upload notifications to a Cloud Pub/Sub topic, and then set a subscription on that topic to trigger Google Cloud Functions: https://cloud.google.com/functions/docs/calling/pubsub
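If you take the Pub/Sub route, the function receives the GCS notification as a Pub/Sub message whose payload is the object metadata in JSON, so a small decoding step (sketched below, with a placeholder entry-point name) recovers the bucket and object name before handing off to the same resize logic.

```python
# Sketch only: a Pub/Sub-triggered Cloud Function that decodes a GCS object
# notification. The message data is the object's metadata as JSON, so the
# bucket and object name can be extracted and passed to the resize logic.
import base64
import json


def handle_gcs_notification(event, context):
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    bucket_name = payload["bucket"]
    blob_name = payload["name"]
    print(f"New object gs://{bucket_name}/{blob_name}")
    # ...call the same resize logic as in the direct-trigger example...
```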
Note that there are some quotas on Cloud Functions uploading and downloading resources, so if you need to process more than 1 gigabyte of image data per 100 seconds or so, you may need to request a quota increase: https://cloud.google.com/functions/quotas
I understand that Google Dataproc clusters support initialization actions, which are executed on creation of every node. However, this is only reasonable for small actions and would not work well for nodes with tons of dependencies and software for large pipelines. Thus, I was wondering: is there any way to load nodes from custom images, or to have an image with all the installs already on it spin up when the node is created, so you don't have to download things again and again?
Good question.
As you note, initialization actions are currently the canonical way to install things on clusters when they are created. If you have a ton of dependencies, or need to do things like compile from source, those initialization actions may take a while.
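For completeness, here is a minimal sketch of attaching an initialization action when creating a cluster with the google-cloud-dataproc Python client; the project, region, cluster name, machine types, and the gs:// path of the install script are placeholders.

```python
# Sketch only: create a Dataproc cluster with an initialization action using
# the google-cloud-dataproc Python client. All names and paths below are
# placeholders to adapt to your own project.
from google.cloud import dataproc_v1

PROJECT_ID = "my-project"
REGION = "us-central1"

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": PROJECT_ID,
    "cluster_name": "my-cluster",
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
        "initialization_actions": [
            {
                # Script that installs your dependencies on every node.
                "executable_file": "gs://my-bucket/install-deps.sh",
                "execution_timeout": {"seconds": 1800},
            }
        ],
    },
}

operation = client.create_cluster(
    request={"project_id": PROJECT_ID, "region": REGION, "cluster": cluster}
)
operation.result()  # blocks until the cluster is ready
```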
We have support for a better method to handle customizations on our long-term roadmap. This may be via custom images or some other mechanism.
In the interim, scaling clusters up/down may provide some relief if you want to keep some of the customizations in place and split the difference between boot time and the persistence of your cluster. Likewise, if there are any precompiled packages, those always save time.