I am working on a project in which a Raspberry Pi generates data as CSV files on a laptop connected to it. My goal is to send these CSV files to GCS regularly (in real time or every 15 minutes).
Then I will use Google Cloud Functions to send the data from GCS to BigQuery.
The Raspberry Pi is registered on the network (I am not sure whether that helps).
My question: How do I send CSV files from the laptop connected to the Raspberry Pi to Google Cloud Storage buckets?
You can use the GCS client libraries:
https://cloud.google.com/storage/docs/reference/libraries
The Python one may be your best fit for the Raspberry Pi: https://pypi.org/project/google-cloud-storage/
You will need a GCP project, a bucket, and a service account with permission to upload files. Download the service account key file to your Raspberry Pi and use it as described for the client library you chose: basically, specify the credentials file location in your script or in an environment variable (GOOGLE_APPLICATION_CREDENTIALS).
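A minimal sketch of the upload side, assuming the Python client library (`google-cloud-storage`) and a service account key file; the folder, bucket, key path, and `raw/YYYY/MM/DD/` naming scheme are hypothetical placeholders. It only uploads CSVs that have not been modified for a minute (so a file that is still being written is skipped), then moves each uploaded file into a `sent/` subfolder so it is not uploaded twice. Running it from cron every 15 minutes (e.g. `*/15 * * * * python3 /home/pi/upload_csvs.py`, script name also hypothetical) would cover the 15-minute requirement.

```python
"""Upload finished CSV files from a local folder to a GCS bucket (sketch).

All names below are hypothetical placeholders; adjust them to your setup.
"""
import datetime
import pathlib
import time

DATA_DIR = pathlib.Path("/home/pi/data")     # hypothetical folder the CSVs land in
BUCKET_NAME = "my-sensor-bucket"             # hypothetical bucket name
KEY_FILE = "/home/pi/service-account.json"   # hypothetical service account key file


def object_name(csv_path: pathlib.Path, when: datetime.datetime) -> str:
    """Date-partitioned object name, e.g. raw/2020/06/01/readings.csv."""
    return f"raw/{when:%Y/%m/%d}/{csv_path.name}"


def pending_csvs(folder: pathlib.Path, min_age_s: float = 60) -> list:
    """CSV files in `folder` not modified for `min_age_s` seconds (likely complete)."""
    if not folder.is_dir():
        return []
    cutoff = time.time() - min_age_s
    return sorted(p for p in folder.glob("*.csv") if p.stat().st_mtime <= cutoff)


def upload(paths: list, bucket_name: str = BUCKET_NAME, key_file: str = KEY_FILE) -> None:
    if not paths:
        return
    # Imported lazily so the helpers above work without the library installed.
    from google.cloud import storage

    client = storage.Client.from_service_account_json(key_file)
    bucket = client.bucket(bucket_name)
    now = datetime.datetime.now(datetime.timezone.utc)
    sent_dir = paths[0].parent / "sent"
    sent_dir.mkdir(exist_ok=True)
    for path in paths:
        blob = bucket.blob(object_name(path, now))
        blob.upload_from_filename(str(path), content_type="text/csv")
        # Move the file aside only after a successful upload, so a failed run retries it.
        path.rename(sent_dir / path.name)


if __name__ == "__main__":
    upload(pending_csvs(DATA_DIR))
```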
BTW, to insert the files into BigQuery, you could use Cloud Storage Pub/Sub notifications, which publish a message whenever a new file is uploaded; a push subscription then delivers that message to your Cloud Function, which can load the file into BigQuery with the BigQuery client library. Take a look at: https://cloud.google.com/storage/docs/pubsub-notifications?hl=es-419
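A minimal sketch of that Cloud Function, assuming Python, an HTTP-triggered function used as the push endpoint, CSV files with a single header row, and a hypothetical destination table `my-project.sensors.readings`. It reads the `eventType`, `bucketId`, and `objectId` attributes that Cloud Storage attaches to each notification message and starts a BigQuery load job with schema autodetection.

```python
"""HTTP Cloud Function: load new CSVs into BigQuery from GCS notifications (sketch)."""

TABLE_ID = "my-project.sensors.readings"  # hypothetical destination table


def gcs_uri_from_push(envelope: dict):
    """Return gs://bucket/object for a newly finalized CSV, else None."""
    attrs = envelope.get("message", {}).get("attributes", {})
    if attrs.get("eventType") != "OBJECT_FINALIZE":
        return None
    name = attrs.get("objectId", "")
    if not name.endswith(".csv"):
        return None
    return f"gs://{attrs['bucketId']}/{name}"


def load_csv(uri: str, table_id: str = TABLE_ID) -> None:
    # Lazy import so the parsing helper can be tested without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # assumes one header row
        autodetect=True,       # let BigQuery infer the schema
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    # result() waits for the load job and raises if it failed.
    client.load_table_from_uri(uri, table_id, job_config=job_config).result()


def handle_push(request):
    """Entry point; `request` is the Flask request Cloud Functions passes in."""
    uri = gcs_uri_from_push(request.get_json(silent=True) or {})
    if uri:
        load_csv(uri)  # an exception here yields a 500, so Pub/Sub redelivers
    return ("", 204)  # any 2xx response acknowledges the message
```

Letting a failed load surface as an error response means Pub/Sub retries the message, so a transient BigQuery failure does not silently drop a file.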
Related
I need to sync (not mirror) files between a local disk and a cloud bucket. I have in mind something like the Google Drive app, which also works offline (when the local PC comes back online, it syncs the data automatically). This matters for the app I'm going to develop, because it will be used offline.
I dug through the documentation a lot, but I didn't find any useful resources.
I could use gsutil rsync in combination with a Cloud Function that listens to bucket events,
plus a custom local trigger for events on the local hard disk (let's assume I'm developing a local Node.js app).
But then I have to handle edge cases such as being offline, concurrent operations, very long transfers, permissions, etc.
I don't want to reinvent the wheel, and I think this is a common pattern, like the Google Drive app mentioned above.
Also, Firestore in Native mode implements something very close to this, although it works with documents, not files.
Do Google Cloud Platform and/or Firebase make it easy to synchronize a local folder with a cloud bucket?
What do you think about my approach?
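Most of the edge cases listed above reduce to one decision per file: which side changed since the last successful sync? A common way to decide is a three-way comparison of the local state, the bucket state, and a snapshot saved after the previous sync; without that snapshot you cannot tell "deleted locally" apart from "newly created in the bucket". Below is a minimal sketch of only that decision step, in Python rather than Node.js, assuming each state is a hypothetical dict mapping relative paths to content hashes (for example MD5 or CRC32C). Listing files, transferring them, and resolving conflicts are left out.

```python
"""Decide, per file, what a two-way folder <-> bucket sync should do (sketch)."""
from typing import Dict, Optional


def plan_sync(local: Dict[str, str], remote: Dict[str, str],
              base: Dict[str, str]) -> Dict[str, str]:
    """Three-way comparison.

    Each dict maps a relative path to a content hash; `base` is the snapshot
    saved after the last successful sync. Returns {path: action}.
    """
    plan = {}
    for path in sorted(set(local) | set(remote) | set(base)):
        loc: Optional[str] = local.get(path)
        rem: Optional[str] = remote.get(path)
        old: Optional[str] = base.get(path)
        if loc == rem:
            action = "none"            # identical on both sides (or deleted on both)
        elif loc == old:
            # Only the bucket changed since the last sync.
            action = "download" if rem is not None else "delete_local"
        elif rem == old:
            # Only the local copy changed since the last sync.
            action = "upload" if loc is not None else "delete_remote"
        else:
            action = "conflict"        # both sides changed: needs a resolution policy
        plan[path] = action
    return plan
```

After applying a plan, the resulting state becomes the next `base`; keeping that snapshot durable is what lets the app survive long offline periods without guessing.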
As you mentioned, these features are implemented in Google Drive / Google One, and for these products that is the main purpose: "to be a cloud drive" (essentially, always in sync with your local devices).
On the other hand, Google Cloud Storage takes a different approach: it is object storage, designed as part of the cloud architecture to interact with cloud services (which are always online). At this time, Google does not offer a client similar to Google Drive's for syncing local and cloud folders.
I found this third-party software (not supported by Google) that allows syncing between local folders and Cloud Storage.
I also reviewed the pricing for Google One and Cloud Storage, and Google One is significantly cheaper, for example:
2 TB / month on Google One: $10 USD
2 TB / month on Cloud Storage: $40 USD
On top of this, you would also have to add the cost of additional services, for example:
Pub/Sub
Cloud Functions
outgoing network traffic (egress)
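As a rough check of the $40 figure: Cloud Storage prices storage per GB-month (where a GB is 2^30 bytes), and assuming a Standard-class rate of about $0.020 per GB-month (rates vary by region and storage class, so check the current pricing page), 2 TB comes to roughly $41 before any extra services. The sketch below only makes that arithmetic explicit; the extra-service amounts are hypothetical inputs you would take from the pricing calculator.

```python
"""Back-of-the-envelope monthly cost estimate (all rates are inputs, not quotes)."""

GIB_PER_TIB = 1024


def storage_cost(tib: float, usd_per_gib_month: float) -> float:
    """Monthly at-rest storage cost for `tib` TiB at the given per-GiB rate."""
    return tib * GIB_PER_TIB * usd_per_gib_month


def total_cost(tib: float, usd_per_gib_month: float, extras: dict) -> float:
    """Storage cost plus extra monthly service costs (Pub/Sub, Functions, egress...)."""
    return storage_cost(tib, usd_per_gib_month) + sum(extras.values())


# Assumed Standard-class rate; the extras are hypothetical placeholders.
estimate = total_cost(2, 0.020, {"pubsub": 0.0, "functions": 0.0, "egress": 0.0})
print(f"~${estimate:.2f} per month")
```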
Your approach sounds good (it takes a lot of effort, but it's workable); unfortunately, you are trying to use a service in a scenario it was not designed for.
If you want to store code in the cloud, you can use Google Cloud Source Repositories, which basically works like GitHub but has the advantage of integrating easily with CI/CD services such as Google Cloud Build.
Google Apps Script's JDBC service doesn't support connecting to PostgreSQL directly, but Google Data Studio can connect to PostgreSQL to pull data and build reports. I've also heard it supports a low-key export-to-.csv option. Is it then possible to use the Data Studio service in Google Apps Script to populate Google Sheets with that data, effectively creating a workaround?
All I need is one-way access from PostgreSQL into Google Sheets by means of Google Apps Script; I do NOT expect to import anything back into my database.
Looking at the reference documentation, the built-in Apps Script service for Data Studio does not let you pull data from a connected data source. It can be used to create connectors, but it does not allow direct access to connected data sources.
However, you can try creating a custom API or serverless micro-service in a language that supports PostgreSQL, and then expose that service as HTTP endpoints that you can call via UrlFetchApp. You can use Google Cloud Functions for this and write the micro-service in back-end JavaScript (Node.js), Python, or Go. This approach takes you well outside the bounds of a typical GAS script, but it is a viable option.
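A minimal sketch of such a micro-service as a Python HTTP Cloud Function, assuming the `psycopg2` driver, a connection string in a hypothetical `DATABASE_URL` environment variable, and a hypothetical `customers` query. It returns the result as a JSON 2-D array with the header row first, which is the shape Apps Script's `Range.setValues()` accepts; on the Apps Script side you would `JSON.parse` the text returned by `UrlFetchApp.fetch(url).getContentText()` and write it to a range of the same size. The endpoint should be protected (for example, by requiring authentication) before it serves real data.

```python
"""HTTP Cloud Function returning PostgreSQL rows for Google Sheets (sketch)."""
import datetime
import decimal
import json
import os

QUERY = "SELECT id, name, created_at FROM customers ORDER BY id"  # hypothetical query


def to_cell(value):
    """Convert a database value into something JSON and Sheets can hold."""
    if value is None:
        return ""
    if isinstance(value, (datetime.date, datetime.datetime)):
        return value.isoformat()
    if isinstance(value, decimal.Decimal):
        return float(value)
    return value


def to_sheet_values(columns, rows) -> list:
    """Header row followed by data rows: the 2-D shape setValues() expects."""
    return [list(columns)] + [[to_cell(v) for v in row] for row in rows]


def export_rows(request):
    """Entry point; `request` is unused because the query is fixed."""
    import psycopg2  # lazy import: only needed when actually querying

    conn = psycopg2.connect(os.environ["DATABASE_URL"])  # hypothetical env var
    try:
        with conn.cursor() as cur:
            cur.execute(QUERY)
            columns = [col[0] for col in cur.description]
            values = to_sheet_values(columns, cur.fetchall())
    finally:
        conn.close()
    return (json.dumps(values), 200, {"Content-Type": "application/json"})
```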
I have two Google service account credentials and a bucket in each account. I need to transfer files from one bucket to the other. How can I do this programmatically?
Can I achieve this with two Storage objects or by using the Cloud Storage Transfer Service?
Yes, with Storage Transfer Service you can create a transfer job and send the data to a destination bucket (in another project). Keep in mind that the documentation states:
To access the data source and the data sink, this service account must
have source permissions and sink permissions.
This means you can't use two different service accounts: a single service account needs access to both buckets, so you will have to grant the necessary permissions to just one of the two service accounts you have.
To transfer files from one bucket to another programmatically, you must first grant permissions to the service account associated with Storage Transfer Service so that it can access the data sink (destination bucket); please follow these steps.
Note that if you are not creating the transfer job in the same project where the source bucket is located, you must also grant it permission to access the source bucket.
With Storage Transfer Service, you can create a transfer job programmatically in Java and Python; the examples include creating the transfer job and checking the transfer operation status. Full code examples are available for Java and Python.
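A minimal sketch of the Python side, assuming the v1 REST API through `google-api-python-client` (the `storagetransfer` discovery client) and Application Default Credentials that already have the permissions described above; the project and bucket names are placeholders. Using the same start and end date schedules a single run. The official Java and Python samples remain the authoritative reference.

```python
"""Create a one-off bucket-to-bucket transfer job and list its operations (sketch)."""
import datetime
import json


def gcs_transfer_job_body(project_id: str, source_bucket: str, sink_bucket: str,
                          day: datetime.date) -> dict:
    """REST body for transferJobs.create; the names passed in are placeholders."""
    date = {"year": day.year, "month": day.month, "day": day.day}
    return {
        "description": f"{source_bucket} -> {sink_bucket}",
        "status": "ENABLED",
        "projectId": project_id,
        # Same start and end date: the job is scheduled to run only once.
        "schedule": {"scheduleStartDate": date, "scheduleEndDate": date},
        "transferSpec": {
            "gcsDataSource": {"bucketName": source_bucket},
            "gcsDataSink": {"bucketName": sink_bucket},
        },
    }


def _client():
    # Lazy import: requires google-api-python-client and Application Default Credentials.
    import googleapiclient.discovery

    return googleapiclient.discovery.build("storagetransfer", "v1")


def create_job(body: dict) -> dict:
    """Submit the job; the response includes the generated job "name"."""
    return _client().transferJobs().create(body=body).execute()


def list_operations(project_id: str, job_name: str) -> dict:
    """Fetch the operations (and their status) started by one transfer job."""
    flt = json.dumps({"projectId": project_id, "jobNames": [job_name]})
    return _client().transferOperations().list(name="transferOperations", filter=flt).execute()
```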
I have a number of files that I transferred into Azure Blob Storage via Azure Data Factory. Unfortunately, this tool doesn't appear to set the Content-MD5 value for any of the files, so when I pull that value from the Blob Storage API, it's empty.
I'm aiming to transfer these files out of Azure Blob Storage and into Google Cloud Storage. The documentation for Google's Storage Transfer Service at https://cloud.google.com/storage/transfer/reference/rest/v1/TransferSpec#HttpData indicates that I can easily initiate such a transfer if I supply a list of the files with their URL, length in bytes, and an MD5 hash of each.
Well, I can easily pull the first two from Azure Storage, but the third doesn't appear to be populated automatically by Azure Storage, nor can I find any way to make it do so.
Unfortunately, my other options look limited. The possibilities so far:
1. Download each file to my local machine, compute the hash, and update the blob's MD5 value.
2. See whether I can write an Azure Functions app in the same region that calculates the hash value and writes it to each blob in the container.
3. Use an Amazon S3 egress from Data Factory and then use Google's support for importing from S3 to pull the files from there, per https://cloud.google.com/storage/transfer/reference/rest/v1/TransferSpec#AwsS3Data, but this really seems like a waste of bandwidth (and I'd have to set up an Amazon account).
Ideally, I want to be able to write a script, hit go, and leave it alone. I don't have the fastest download rate from Azure, so option 1 would be less than desirable, as it would take a long time.
Are there any other approaches?
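If you go with option 2, the part that has to run next to the data is small: stream each blob once, count its bytes, and compute the Base64-encoded MD5 that the `HttpData` URL list expects. Below is a minimal sketch using only the Python standard library, assuming each entry is a blob URL that already carries a read-only SAS token; the list format (a `TsvHttpData-1.0` header line, then a tab-separated URL, length, and MD5 per line) is the one described in the Storage Transfer Service `HttpData` documentation.

```python
"""Build a Storage Transfer Service URL list (TsvHttpData-1.0) with MD5 hashes.

Sketch: each URL is assumed to be a blob URL that already includes a
read-only SAS token, so it can be fetched with a plain HTTP GET.
"""
import base64
import hashlib
import urllib.request


def md5_base64(chunks) -> str:
    """Base64-encoded MD5 digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def tsv_url_list(entries) -> str:
    """Format (url, size_in_bytes, md5_base64) tuples as a TsvHttpData-1.0 file."""
    lines = ["TsvHttpData-1.0"]
    lines += [f"{url}\t{size}\t{md5}" for url, size, md5 in entries]
    return "\n".join(lines) + "\n"


def describe_url(url: str, chunk_size: int = 1 << 20) -> tuple:
    """Stream one object once, returning (url, size_in_bytes, md5_base64)."""
    size = 0
    digest = hashlib.md5()
    with urllib.request.urlopen(url) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            size += len(chunk)
            digest.update(chunk)
    return url, size, base64.b64encode(digest.digest()).decode("ascii")
```

Usage would be along the lines of writing `tsv_url_list(describe_url(u) for u in urls)` to a file hosted where Storage Transfer Service can read it, and passing that location as the `listUrl` of `HttpData`.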
May 2020 update: Google Cloud Data Transfer (Storage Transfer Service) now supports Azure Blob Storage as a source. This is a no-code solution.
We used it to transfer about 1 TB of files from Azure Blob Storage to Google Cloud Storage. We also have a daily refresh, so any new files in Azure Blob Storage are automatically copied to Cloud Storage.
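If you prefer to script the same job instead of clicking through the console, a minimal sketch of the request body follows, assuming the v1 REST field names (`azureBlobStorageDataSource`, `azureCredentials.sasToken`) and the `google-api-python-client` discovery client; the account, container, token, and bucket values are placeholders. Leaving out `scheduleEndDate` keeps the job active, and without a `repeatInterval` it repeats every 24 hours, which matches the daily refresh described above.

```python
"""Create a recurring Azure Blob Storage -> Cloud Storage transfer job (sketch)."""
import datetime


def azure_transfer_job_body(project_id: str, storage_account: str, container: str,
                            sas_token: str, sink_bucket: str,
                            start: datetime.date) -> dict:
    """REST body for transferJobs.create; all argument values are placeholders."""
    return {
        "description": f"Daily copy of {storage_account}/{container}",
        "status": "ENABLED",
        "projectId": project_id,
        # No scheduleEndDate and no repeatInterval: the job repeats every 24 hours.
        "schedule": {
            "scheduleStartDate": {"year": start.year, "month": start.month, "day": start.day},
        },
        "transferSpec": {
            "azureBlobStorageDataSource": {
                "storageAccount": storage_account,
                "container": container,
                "azureCredentials": {"sasToken": sas_token},
            },
            "gcsDataSink": {"bucketName": sink_bucket},
        },
    }


def create_job(body: dict) -> dict:
    # Lazy import: requires google-api-python-client and Application Default Credentials.
    import googleapiclient.discovery

    client = googleapiclient.discovery.build("storagetransfer", "v1")
    return client.transferJobs().create(body=body).execute()
```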
I know it's a bit late to answer this question for you, but it might help others who are trying to migrate data from Azure Blob Storage to Google Cloud Storage.
Google Cloud Storage and Azure Blob Storage are both storage services, so neither gives us a machine where we can simply run transfer commands. For that, we need an intermediate compute instance that can run the required commands. We will follow the steps below to achieve the cloud-to-cloud transfer.
First and foremost, create a Compute Engine instance in Google Cloud Platform. It doesn't need to be powerful; a Debian machine with a 10 GB disk, 2 vCPUs, and 4 GB of memory is enough.
In the early days, you would have downloaded the data to the Compute Engine instance and then moved it on to Google Cloud Storage. But now, with gcsfuse, we can simply mount a Cloud Storage bucket as a file system.
Once the instance is created, simply log in to it over SSH from the Google Cloud Console and install the following packages.
Install Cloud Storage FUSE (gcsfuse):
```shell
# Register the gcsfuse apt repository for this Debian release
export GCSFUSE_REPO=gcsfuse-$(lsb_release -c -s)
echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update -y
sudo apt-get install gcsfuse -y
# Create a local folder to serve as the mount point
mkdir local_folder_name
# Mount the bucket on that folder
gcsfuse <bucket_name> local_folder_name
```
Install AzCopy:
```shell
# Download and unpack AzCopy v10, then put the binary on the PATH
wget https://aka.ms/downloadazcopy-v10-linux
tar -xvf downloadazcopy-v10-linux
sudo cp ./azcopy_linux_amd64_*/azcopy /usr/bin/
```
Once these packages are installed, the next step is to create a Shared Access Signature (SAS) token. If you have Azure Storage Explorer, just right-click the storage account name in the directory tree and select Generate Shared Access Signature.
Now you need to build a URL to your blob objects. To do this, right-click any of your blob objects, select Properties, and copy the URL from the dialog box (for a recursive copy of a whole container, use the container's URL instead).
Your final URL should look like this:
<URL_to_file>?<SAS_token>
https://myaccount.blob.core.windows.net/sascontainer/sasblob.txt?sv=2015-04-05&st=2015-04-29T22%3A18%3A26Z&se=2015-04-30T02%3A23%3A26Z&sr=b&sp=rw&sip=168.1.5.60-168.1.5.70&spr=https&sig=Z%2FRHIX5Xcg0Mq2rqI3OlWTjEg2tYkboXr1P9ZUXDtkk%3D
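Joining the two pieces is simple, but it is easy to end up with a doubled `?`, because a copied SAS token sometimes already starts with one. A minimal sketch with Python's `urllib.parse`, assuming a plain blob or container URL; if the URL already has a query string, the token is appended with `&`.

```python
"""Attach a SAS token to a blob or container URL (sketch)."""
from urllib.parse import urlsplit, urlunsplit


def with_sas(resource_url: str, sas_token: str) -> str:
    """Append `sas_token` (with or without a leading '?') as the URL's query."""
    token = sas_token.lstrip("?")
    parts = urlsplit(resource_url)
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
```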
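Now use the following command to start copying the files from Azure into the gcsfuse-mounted folder (AzCopy writes to the local mount point, and gcsfuse stores the files in the bucket):
```shell
azcopy cp "<source URL with SAS token>" "<local_folder_path>" --recursive=true
```
If your job fails, you can list your jobs with:
```shell
azcopy jobs list
```
and resume a failed job with:
```shell
azcopy jobs resume <job-id> --source-sas="<SAS token>"
```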
You can collate all the steps into one bash script and leave it running until the data transfer is complete.
And that's all! I hope it helps others.
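If you prefer something more structured than a bash script, a minimal Python equivalent that runs the same steps in order and stops at the first failure is sketched below. The bucket, mount folder, and source URL are placeholders, and by default it only prints the commands; pass `dry_run=False` to actually execute them.

```python
"""Run the gcsfuse + AzCopy steps in order (sketch; names are placeholders)."""
import shlex
import subprocess


def transfer_commands(bucket: str, mount_dir: str, source_url: str) -> list:
    """Build the command lines: create mount point, mount bucket, copy from Azure."""
    return [
        ["mkdir", "-p", mount_dir],
        ["gcsfuse", bucket, mount_dir],
        ["azcopy", "cp", source_url, mount_dir, "--recursive=true"],
    ]


def run_all(commands: list, dry_run: bool = True) -> None:
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failed step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(transfer_commands("<bucket_name>", "local_folder_name",
                              "https://myaccount.blob.core.windows.net/mycontainer?<SAS token>"))
```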
We migrated about 3 TB of files from Azure to Google Cloud Storage. We started a cheap Linux server with a few TB of local disk in Google Compute Engine, transferred the Azure files to the local disk with blobxfer, and then copied the files from the local disk to Google Cloud Storage with gsutil rsync (gsutil cp works too).
You can use other tools to transfer files from Azure; you could even start a Windows server in GCE and use gsutil on Windows.
It took a few days, but it was simple and straightforward.
Have you considered using Azure Data Factory's custom activity support, which is meant for data transformation? If you go with an ADF custom activity, you can use Azure Batch on the back end to download, update, and upload your files into Google Cloud Storage.
I want to pull event files that exist in persistent storage on a remote server into Google Cloud.
The gsutil and Python APIs only support uploading if the code runs on the server that has the files.
I can't run the code on the local server; I can only run code on an external server.