I have been working out a solution to move data from an Azure VM to Google Cloud Storage. I see there is no connector available in ADF for this move, so it appears AzCopy is the only option.
I am looking at full and delta load options.
Full data size is around 100 GB.
Delta data will be around ~1 GB per day.
Does anyone have a better solution?
I am new to Google Cloud and was wondering if it is possible to run a PostgreSQL container on Cloud Run with PostgreSQL's data_directory pointed to Cloud Storage?
If it is possible, could you please point me to some tutorials/guides on this topic? Also, what are the downsides of this approach?
Edit-0: Just to clarify what I am trying to achieve:
I am learning Google Cloud and want to write a simple application to work along with it. I have decided that the backend code will run as a container under Cloud Run and the persistent data (i.e. the database file) will reside on Cloud Storage. Because this is a small app for learning purposes, I am trying to use as few moving parts as possible on the backend (and also ones that are always free). Both PostgreSQL and the backend code will reside in the same container, except for the actual data file, which will reside under Cloud Storage. Is this approach correct? Are there better approaches to achieve the same minimalism?
Edit-1: Okay, I got the answer! The Google documentation here mentions the following:
"Don't run a database over Cloud Storage FUSE!"
Buckets are not meant to store database information; the relevant request-rate limits are the following:
There is no limit to writes across multiple objects, which includes uploading, updating, and deleting objects. Buckets initially support roughly 1000 writes per second and then scale as needed.
There is no limit to reads of objects in a bucket, which includes reading object data, reading object metadata, and listing objects. Buckets initially support roughly 5000 object reads per second and then scale as needed.
One alternative is to use Google Compute Engine with a separate persistent disk for your PostgreSQL database. You can follow the “How to Set Up a New Persistent Disk for PostgreSQL Data” Community Tutorial.
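A rough sketch of that Compute Engine route, using placeholder names, size, and zone (the VM pg-vm is assumed to already exist):

    # create a persistent disk to hold the PostgreSQL data directory (name, size and zone are placeholders)
    gcloud compute disks create pg-data --size=50GB --zone=us-central1-a

    # attach the disk to the existing VM (hypothetical instance name)
    gcloud compute instances attach-disk pg-vm --disk=pg-data --zone=us-central1-a

    # on the VM: format and mount the disk, then point PostgreSQL's data_directory at the mount point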
I am working on a new project, and the problem is that my Firebase storage is gradually filling up even though I don't use it; right now it is 4.1 GB.
I did not create a bucket myself, yet it was filling up.
One thing I tried was to look at the files in the Cloud Console, but all of them are in a strange format that I cannot manage to open.
Until now I have not even been working with media that could take up that space.
I would appreciate ideas on how to track down the usage.
This is what my 3 GB bucket (I never uploaded anything to it) looks like; any idea how I can open these files?
A change to how Firebase deploys functions from Node 10 onwards means container image files are automatically added to your Cloud Storage with every deployment. These count towards your "Bytes stored" and "Bandwidth" limits in Firebase.
To save costs you can delete all of these files, and deploy only individual functions with firebase deploy --only functions:myFunctionName instead of deploying them all at once.
Firebase support confirmed this, pointing to the Cloud Build, Container Registry, and Firebase pricing FAQ documentation.
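As a hedged sketch of the cleanup, the artifact bucket path below is an assumption (a common Container Registry default); check which buckets actually exist in your project before deleting anything:

    # list the buckets in the project to find the Container Registry artifact bucket
    gsutil ls

    # remove the cached function build images (bucket path is an assumed default, not guaranteed)
    gsutil -m rm -r gs://us.artifacts.YOUR_PROJECT_ID.appspot.com/containers

    # redeploy a single function rather than everything
    firebase deploy --only functions:myFunctionName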
I want to create a BigQuery table from Cloud Storage. A Kafka stream is uploaded as text files into Cloud Storage every 5 minutes. I want a BigQuery table that is updated every 5 minutes from the newly uploaded files. What is the best way to do this? Please give me some suggestions.
You could use Google Cloud Functions to detect when a file is uploaded, then execute some code to load that file into BigQuery.
Alternatively, I believe there already exists a BigQuery Kafka Connector, so you could skip GCS unless you need the raw data. (Note: binary files would be cheaper to store than plaintext, and BigQuery supports reading various formats)
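For example, a minimal sketch of the per-file load, assuming newline-delimited JSON and placeholder bucket, dataset, table, and schema names:

    # append one newly uploaded file to a BigQuery table (all names and the format are placeholders)
    bq load --source_format=NEWLINE_DELIMITED_JSON \
      my_dataset.kafka_events \
      gs://my-kafka-bucket/topic/part-00000.json \
      ./schema.json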
On AWS I use an S3 + Lambda combination: as a new image is uploaded to a bucket, a Lambda is triggered and creates 3 different sizes of the image (small, medium, large). How can I do this with GCS + Cloud Functions?
PS: I know there is getImageServingUrl(), but can this be used with GCE, or is it for App Engine only?
Would really appreciate any input.
Thanks.
Google Cloud Functions directly supports triggers for new objects being uploaded to GCS: https://cloud.google.com/functions/docs/calling/storage
For finer control, you can also configure a GCS bucket to publish object upload notifications to a Cloud Pub/Sub topic, and then set a subscription on that topic to trigger Google Cloud Functions: https://cloud.google.com/functions/docs/calling/pubsub
Note that there are some quotas on Cloud Functions uploading and downloading resources, so if you need to process more than about 1 GB of image data per 100 seconds or so, you may need to request a quota increase: https://cloud.google.com/functions/quotas
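A minimal deployment sketch for the GCS-triggered route; the function name, bucket, runtime, and memory setting are placeholders, and the function body itself would do the actual resizing:

    # deploy a function that fires whenever a new object is finalized in the bucket
    gcloud functions deploy resize_image \
      --runtime python310 \
      --trigger-resource my-uploads-bucket \
      --trigger-event google.storage.object.finalize \
      --memory 512MB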
I am trying to figure out a proper solution for the following:
We have a client from whom we want to receive data, for instance a binary file of about 200 MB that is updated daily. We want them to deposit the data file(s) onto a local server near them (Europe).
We then want to do one of the following:
- retrieve the data from a local server where we are (China/HK), or
- log into their European server where they have deposited the files and pull the files directly ourselves.
QUESTIONS:
Can Google's cloud platform serve as a secure, easy way to provide a cloud drive on which to store and from which to pull the data file?
Does Google's cloud platform distribute data such that files pushed to a server in Europe will be mirrored on a server in East Asia? (That is, where and how would this distribution model work with regard to my example?)
For storing binary data, Google Cloud Storage is a fine solution. To answer your questions:
Secure: yes. Easy: yes, in that you don't need to write different code depending on your location, but there is a caveat on performance.
Google Cloud Storage replicates files for durability and availability, but it doesn't mirror files across all bucket locations. So for the best performance, you should store the data in a bucket located where you will access it the most frequently. For example, if you create the bucket and choose its location to be Europe, transfers to your European server will be fast but transfers to your HK server will be slow. See the Google Cloud Storage bucket locations documentation for details.
If you need frequent access from both locations, you could create one bucket in each location and keep them in sync with a tool like gsutil rsync, as sketched below.
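A minimal sketch of that sync, assuming two placeholder buckets, one located in Europe and one in Asia:

    # mirror the EU bucket into the Asia bucket (bucket names are placeholders)
    gsutil -m rsync -r gs://my-data-eu gs://my-data-asia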