How to securely transfer on-premise data to GCS buckets daily - google-cloud-storage

What is the best way to securely transfer data from an on-premises server to a GCS bucket? Around 10 GB of data needs to be transferred to the bucket daily.
The data sits on an on-premises server in my organisation.
Is there an SFTP-like protocol available for this?

You could use gsutil for this. Data is transferred over HTTPS, and depending on your security requirements there are various options for controlling the encryption keys used to store the data.
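For a recurring daily transfer, a common pattern is a cron job that runs gsutil cp or gsutil rsync against the source directory. If you would rather script it, the official Python client performs the same HTTPS upload; below is a minimal sketch, assuming made-up bucket and file names and authentication via Application Default Credentials (e.g. a service account key).

    # Minimal sketch: upload a local file to a GCS bucket over HTTPS.
    # Requires: pip install google-cloud-storage
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("my-onprem-backups")            # hypothetical bucket name
    blob = bucket.blob("daily/2024-01-01/export.csv")      # destination object path

    # Optionally pass your own AES-256 key (customer-supplied encryption key)
    # instead of relying on Google-managed keys:
    # blob = bucket.blob("daily/2024-01-01/export.csv", encryption_key=my_32_byte_key)

    blob.upload_from_filename("/data/export.csv")          # local file on the on-premises server
    print(f"Uploaded to gs://{bucket.name}/{blob.name}")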

Related

Cloud service recommendation for pdf storage

I'm creating a web app that will contain a lot of PDFs for educational notes. What I have figured is that these PDFs should not be stored in my database (MongoDB), so can anyone recommend a cloud service, free or cheap, that can be used to store a large number of PDFs for my web application? I'm considering storage of up to 50-60 GB. Also, I'm using Node and Express for my application.
Use an Amazon Web Services S3 bucket on their Free Tier for up to 5 GB. Otherwise try CloudFront:
https://aws.amazon.com/cloudfront/
"As part of the AWS Free Usage Tier, you can get started with Amazon CloudFront for free. Upon sign-up, new AWS customers receive 50 GB Data Transfer Out and 2,000,000 HTTP and HTTPS Requests each month for one year"

Is Google Cloud Storage an automagical global CDN?

I’m attempting to set up a Google Cloud Storage bucket to store and serve all the static objects for my site. I’m also attempting to push all the objects in that bucket out to all the global edge locations offered by Google Cloud CDN.
I’ve created a bucket on Google Cloud Storage: cdn.mysite.com. I chose “US” multi-region for the bucket location setting.
My assumption is that any object stored in this bucket will be replicated to all the us-* regions for high-durability purposes, but not pushed out to all the Google Cloud CDN global edge locations for CDN purposes.
Or are all my objects in my “US” multi-region bucket already automagically pushed out to all of Google Cloud CDN edge locations?
I’m gobsmacked that I can’t figure out whether or not my bucket is already a CDN. Even after two days of searching (Google, ironically).
Thanks in advance for any help.
The best discussion I've seen of Cloud Storage edge caching vs. Cloud CDN was during the Google Cloud Next '18 session Best Practices for Storage Classes, Reliability, Performance and Scalability. The entire video is useful, but here's a link to the content distribution topic.
One key note from the summary is that edge caching gives you many of the benefits of a CDN, but you still pay for data egress. Cloud CDN gives you caching, which can lower the cost of egress. They also outlined a couple of other options.
Cloud CDN and Cloud Storage are distinct, so objects in your multi-region bucket are not necessarily pushed to Cloud CDN edges. You can find information about Cloud Storage regions here; as you probably already know, Cloud CDN's edge locations are mapped out here. However, it's very straightforward to integrate Cloud Storage with Cloud CDN: just follow these steps!
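As a rough illustration of that integration, the key step is creating a backend bucket with CDN enabled and attaching it to an HTTPS load balancer's URL map. Below is a sketch of the first step using the google-cloud-compute Python client; the project and resource names are placeholders, and this is an outline of the idea rather than a complete setup:

    # Sketch: register the Cloud Storage bucket as a CDN-enabled backend bucket.
    # Requires: pip install google-cloud-compute
    from google.cloud import compute_v1

    client = compute_v1.BackendBucketsClient()
    backend_bucket = compute_v1.BackendBucket(
        name="cdn-mysite-backend",       # hypothetical backend bucket name
        bucket_name="cdn.mysite.com",    # the existing Cloud Storage bucket
        enable_cdn=True,                 # this flag is what turns on Cloud CDN caching
    )
    client.insert(project="my-project", backend_bucket_resource=backend_bucket)
    # The backend bucket is then referenced from the HTTPS load balancer's URL map.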
Oct 2020 - Yes - if you take Google's word for it:
Cloud Storage essentially works as a content delivery network. This does not require any special configuration because by default any publicly readable object is cached in the global Cloud Storage network.
https://cloud.google.com/appengine/docs/standard/java11/serving-static-files
Partly:
Cloud Storage behaves like a Content Delivery Network (CDN) with no work on your part because publicly readable objects are cached in the Cloud Storage network by default.
But:
Feature                                  Cloud Storage    Cloud CDN
Max cacheable file size                  10 MiB           5 TiB
Default cache expiration                 1 hour           1 hour (configurable)
Support for custom domains over HTTPS    No               Yes
Cache invalidation                       No               Yes
In particular, if you serve videos to your users, they are likely to be larger than 10 MiB and therefore will not be cached.
Also note that it only uses caching for public objects.
https://cloud.google.com/storage/docs/caching
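To illustrate the "only public objects are cached" point, here is a minimal sketch that makes an object publicly readable and overrides the default one-hour cache lifetime via its Cache-Control metadata (Python client; bucket and object names are placeholders):

    # Minimal sketch: make an object publicly readable so Cloud Storage's built-in
    # edge caching applies, and set an explicit cache lifetime.
    # Requires: pip install google-cloud-storage
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket("cdn.mysite.com").blob("img/logo.png")

    blob.cache_control = "public, max-age=86400"  # cache for 24 hours instead of the 1-hour default
    blob.patch()                                  # persist the metadata change
    blob.make_public()                            # grant allUsers read access
    # Note: if uniform bucket-level access is enabled, grant allUsers the
    # Storage Object Viewer role on the bucket instead of calling make_public().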

Azure Message size limit and IOT

I read through the Azure documentation and found that the message size limit for Queues is 64 KB and for Service Bus it is 256 KB. We are trying to develop an application that will read sensor data from some devices, call a REST service and upload it to the cloud. This data will be stored in queues and then dumped into a cloud database.
There is a chance that the sensor data collected is more than 256 KB. In such cases, what is the recommended approach? Do we need to split the data in the REST service and then put chunks of it in the queue, or is there another recommended pattern?
Any help is appreciated.
You have several conflicting technology statements. I will begin by clarifying a few.
1. Service Bus/IoT Hub are not POST calls. A POST call would use a RESTful service, which exists separately. IoT Hub uses a low-latency message-passing system that is abstracted from you. These are intended for high volumes of small packets and fit most IoT scenarios.
2. In the situation in which a message is larger than 256 KB (which is very interesting for an IoT scenario; I would be interested to see why those messages are so large), you should ideally upload to blob storage. You can still post packets:
   - If you have access to the blob storage APIs from your devices, you should go that route (see the sketch below).
   - If you do not have access to them, you should post big packets to a REST endpoint and cross your fingers they make it, or chop them up.
3. You can run post-hoc analytics on blob storage. I would recommend using the wasb prefix, as those containers are Hadoop-compatible and you can stand up analytics clusters on top of those storage mechanisms.
4. You have no real need for a queue that I can immediately see.
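As a sketch of point 2 above, the idea is to push the large sensor payload straight to blob storage and keep only a small reference in the messaging pipeline. Shown with the azure-storage-blob Python package; the connection string, container and blob names are placeholders:

    # Minimal sketch: upload a large sensor payload to Azure Blob Storage instead of
    # squeezing it through a 256 KB queue/Service Bus message.
    # Requires: pip install azure-storage-blob
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string("<storage-connection-string>")
    blob = service.get_blob_client(container="sensor-readings",
                                   blob="device42/2024-01-01.json")

    with open("reading.json", "rb") as payload:
        blob.upload_blob(payload, overwrite=True)  # uploaded in chunks; no 256 KB limit here

    # A small queue or IoT Hub message can then carry just a pointer to the blob, e.g.
    # {"device": "device42", "blob": "sensor-readings/device42/2024-01-01.json"}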
You should take a look at the patterns leveraging:
Stream Analytics: https://azure.microsoft.com/en-us/services/stream-analytics/
Azure Data Factory: https://azure.microsoft.com/en-us/services/data-factory/
Your typical ingestion will be: Get your data up into the cloud into super cheap storage as easily as possible and then deal with analytics later using clusters you can stand up and tear down on demand. That cheap storage is typically blob and that analytics cluster is usually some form of Hadoop. Using data factory allows you to pipe your data around as you figure out what you are going to use specific components of it for.
An example of using HBase for ingestion with cheap blob storage as the underlayment and Azure Machine Learning as part of my analytics solution: http://indiedevspot.com/2015/07/09/powering-azureml-with-hadoop-hbase/

How to optimize large file transactions using a WCF Service as an endpoint for the Azure storage?

I have a WCF REST service as an endpoint for Azure Storage. The WCF REST service handles uploads and downloads of files that usually measure 5-10 MB. When handling the stream (both for download and upload), the bytes are in the Azure VM's RAM, right? Even if for upload the data is split into 4 MB blocks, those 4 MB are kept in RAM until the upload is complete. For download, the bytes are kept until the download is complete. So, if I have 1000 users downloading a file at the same time, that means the Azure VM would need 4 GB of RAM just for the transfers.
Is there a way to optimize this? Correct me if I'm wrong when I assume that the data is kept in the VM's RAM until the operation is finished. Should I use Microsoft's Azure REST service? Where does that service keep the data until the transfer is finished?
I think you can avoid having all the data in memory at once by doing a couple of things:
1. Set up your WCF service to use a Streamed TransferMode.
2. Handle your streams appropriately in code, i.e., open the stream to BLOB storage, and then copy from the WCF stream to the BLOB stream in chunks, or use Stream.CopyTo().
I tried to implement this about 6 months ago, but I couldn't make WCF Streamed mode work properly in my situation. It seems to be finicky to get working, and may not be possible with HTTP endpoints. Then again, maybe my WCF-fu is just too weak. If you do get something working please post it back here, as I would very much like to see it.
Note that if you have multiple clients downloading the same data at once, you could keep a single copy in memory and stream it to each of them independently. But I don't believe that approach is compatible with the method I just described - you'd have to do one or the other.
You could use Windows Azure Blob Storage to handle all of the download/upload optimizations for you. It has a REST API, different "settings" for large/small files, shared access keys for keeping data private, HTTP/HTTPS, etc.
For downloading the files you can directly access the resource via a REST URL (pass in the optimization headers) or use the StorageClient library...and not process it via your own service.
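For the "let clients hit blob storage directly" approach, a time-limited shared access signature keeps the data private without routing the bytes through your service. Here is a minimal sketch using the current azure-storage-blob Python SDK (rather than the era-appropriate StorageClient library); the account, container and blob names are placeholders:

    # Minimal sketch: generate a short-lived, read-only SAS URL so the client downloads
    # straight from Blob Storage instead of through the WCF service (and its RAM).
    # Requires: pip install azure-storage-blob
    from datetime import datetime, timedelta, timezone
    from azure.storage.blob import BlobSasPermissions, generate_blob_sas

    sas_token = generate_blob_sas(
        account_name="mystorageaccount",           # hypothetical storage account
        container_name="downloads",
        blob_name="reports/file.pdf",
        account_key="<account-key>",
        permission=BlobSasPermissions(read=True),
        expiry=datetime.now(timezone.utc) + timedelta(minutes=15),
    )
    url = ("https://mystorageaccount.blob.core.windows.net/downloads/reports/file.pdf"
           f"?{sas_token}")
    # Hand `url` to the client; the 5-10 MB transfer never touches your service's RAM.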
Blob storage should be used for objects as large as you describe. Blob storage runs as a separate service from the Azure VM and is massively scalable. Here's an article that describes how blob storage is architected and how to implement it in your application. You don't mention the types of files you are trying to serve, but in some cases it might be useful to use a queue service as described here to cut down the data load.

Back up of Streaming server

I want to set up a new streaming server for my website, which generally holds video and audio files. But how do we maintain a backup of the streaming server if the storage size is increasing day by day?
Generally, backups of a database server, like SQL Server, can be taken and restored very easily, as the data does not occupy much space for a medium-sized application.
On the other hand, how can we take a backup of a streaming server? If the server fails, there should be an alternative server or solution that reduces the downtime.
How is the back-end architecture of YouTube built to handle this?
The backend architecture of YouTube probably uses Google's BigTable which stores objects redundantly over several different servers. If you are using a single server solution your only real options are backing up to an attached disk, backing up to another server or using an offsite storage system like Amazon S3 (which you could then use with their CDN to do basic HTTP streaming of content in the case of a failure).