Limit on IBM Cloud Object Storage public bucket access

I am trying to create a public bucket in IBM COS. My questions are:
Can we put a limit on the number of calls to a COS public bucket to avoid a DoS attack, and is that required?
Does the IBM COS service handle this itself?

No - it isn't possible to set a request limit or quota. However, the endpoint that requests are sent to is actually a set of load balancers in front of the COS system itself, which should throttle traffic appropriately. This would be a useful clarification to add to the documentation.

Related

Is downloading an object from a bucket via public url from a Google instance charged?

I would like to upload images to a bucket and use a Google VM instance to download each image, edit it on the fly, and serve it.
The outgoing traffic from the VM is already paid for; do I also have to account for the bandwidth from Google Cloud Storage to the VM, or is traffic within the same network free? In the documentation I found "Accessing data in an EU bucket with an EU-WEST1 GKE instance - Free"; does the same also apply to custom VM instances?
It will mostly depend on the location of your resources.
Downloading an object necessarily implies network egress, but since the egress stays within Google Cloud, the cost is greatly reduced (free in most cases) compared to egress to an external location.
Basically, the network egress will be cheap or free if the GCS bucket and your GCE instance are located on the same continent, and will be priced at standard rates in other cases.
You can find the pricing details on this page, in the "Network egress within Google Cloud" section, which lists the various scenarios: https://cloud.google.com/storage/pricing
Note that you will also need to consider the cost of the read operations when downloading the object.
Egress from Cloud Storage to a GCE instance in the same cloud zone is free, networking-wise. However, you will still be charged any retrieval cost (free for Standard storage, a few cents per gigabyte for Nearline, Coldline, or Archive) and an operation charge ($0.004 per 10,000 read operations). Ingress into a Compute Engine instance in the same zone is also free.
For more, check out the pricing policy for Cloud Storage and Compute Engine. Keep in mind that this is very general advice and a lot depends on the exact details here.
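For a rough sense of scale, here is a minimal back-of-the-envelope sketch in Python that estimates the bill for pulling a batch of objects from a same-region Nearline bucket into a GCE instance. The volumes and prices are illustrative assumptions based on the figures quoted above, not current list prices, so check the pricing page before relying on them.

    # Back-of-the-envelope cost estimate: downloading objects from a
    # same-region Nearline bucket into a GCE instance. All prices below are
    # illustrative assumptions; confirm against
    # https://cloud.google.com/storage/pricing before using them.

    GB_DOWNLOADED = 50            # total data pulled into the VM
    NUM_OBJECTS = 20_000          # one read (class B) operation per object fetched

    NETWORK_EGRESS_PER_GB = 0.00       # same-region egress to GCE is free
    NEARLINE_RETRIEVAL_PER_GB = 0.01   # "a few cents per gigabyte" for Nearline
    READ_OPS_PER_10K = 0.004           # operation charge quoted above

    cost = (
        GB_DOWNLOADED * (NETWORK_EGRESS_PER_GB + NEARLINE_RETRIEVAL_PER_GB)
        + (NUM_OBJECTS / 10_000) * READ_OPS_PER_10K
    )
    print(f"Estimated cost: ${cost:.2f}")   # -> Estimated cost: $0.51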

Cloud SQL: send Pub/Sub message on update/insert

I am setting up a read-only GraphQL instance using Java. GraphQL, as I understand it, needs to be told when to re-query its data sources. We are using GCP, with Cloud SQL as our primary data source. Our monolithic system is responsible for updating the data.
Is there a way to trigger a web request or Pub/Sub message from Cloud SQL without sys_eval('curl https://example.com');?
Or is there a way to turn on sys_eval in Cloud SQL?
After some brainstorming around sys_eval alternatives, such as binary logs and so on, the course of action I'd recommend is to move the MySQL client to a GCE instance and establish the connection to the Cloud SQL instance through a private IP.
Such a connection is guaranteed much lower latency and much stronger network security compared to your current architecture, since the service does not use public IPs and is therefore protected from the outside Internet.
You can find connection examples using VPC networks in the documentation provided.
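As a minimal sketch of that setup, assuming a GCE-hosted client that connects over the Cloud SQL instance's private IP (a placeholder address here), applies a change with pymysql, and then notifies the GraphQL service through a hypothetical Pub/Sub topic. This is one way to wire the pieces together, not an official recipe.

    import pymysql
    from google.cloud import pubsub_v1

    # Placeholders: the private IP, credentials, project and topic names
    # are all assumptions for illustration.
    conn = pymysql.connect(
        host="10.0.0.3",          # Cloud SQL private IP inside the shared VPC
        user="app",
        password="secret",
        database="inventory",
    )

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project", "data-changed")

    with conn.cursor() as cur:
        cur.execute(
            "UPDATE products SET price = price * 1.1 WHERE category = %s",
            ("books",),
        )
    conn.commit()

    # Tell the read-only GraphQL service to re-query its sources.
    future = publisher.publish(topic_path, b"products", table="products")
    future.result()  # block until Pub/Sub has accepted the message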

Is Google Cloud Storage an automagical global CDN?

I’m attempting to set up a Google Cloud Storage bucket to store and serve all the static objects for my site. I’m also attempting to push all the objects in that bucket out to all the global edge locations offered by Google Cloud CDN.
I’ve created a bucket on Google Cloud Storage: cdn.mysite.com. I chose the “US” multi-region for the bucket location setting.
My assumption is that any object stored in this bucket will be replicated across the us-* regions for high-durability purposes, but not pushed out to all the Google Cloud CDN global edge locations for CDN purposes.
Or are all the objects in my “US” multi-region bucket already automagically pushed out to all of Google Cloud CDN’s edge locations?
I’m gobsmacked that I can’t figure out whether or not my bucket is already a CDN, even after two days of searching (on Google, ironically).
Thanks in advance for any help.
The best discussion I've seen of Cloud Storage edge caching vs. Cloud CDN was during the Google Cloud Next '18 session Best Practices for Storage Classes, Reliability, Performance and Scalability. The entire video is useful, but here's a link to the content distribution topic.
One key note from the summary is that edge caching gives you many of the benefits of a CDN, but you still pay for data egress. Cloud CDN gives you caching, which can lower the cost of that egress. They also outlined a couple of other options.
Cloud CDN and Cloud Storage are distinct, so objects in your multi-region bucket are not necessarily pushed to Cloud CDN edges. You can find information about Cloud Storage regions here; as you probably already know, Cloud CDN's edge locations are mapped out here. However, it's very straightforward to integrate Cloud Storage with Cloud CDN: just follow these steps!
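For reference, those steps boil down to creating a backend bucket with Cloud CDN enabled and wiring it into an external HTTP(S) load balancer. Here is a minimal sketch of the first part, assuming a recent google-cloud-compute Python client and placeholder project and bucket names; the URL map, frontend, and certificate still have to be created separately.

    from google.cloud import compute_v1

    # Placeholders for illustration only.
    PROJECT = "my-project"
    BUCKET = "cdn.mysite.com"

    backend_bucket = compute_v1.BackendBucket(
        name="static-assets-backend",
        bucket_name=BUCKET,
        enable_cdn=True,   # this is what actually turns on Cloud CDN caching
    )

    client = compute_v1.BackendBucketsClient()
    operation = client.insert(
        project=PROJECT, backend_bucket_resource=backend_bucket
    )
    operation.result()     # wait for the backend bucket to be created

The backend bucket then gets attached to a URL map and a global forwarding rule, which is what actually puts your objects behind Cloud CDN's edge locations.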
Oct 2020 - Yes - if you take Google's word for it:
Cloud Storage essentially works as a content delivery network. This does not require any special configuration because by default any publicly readable object is cached in the global Cloud Storage network.
https://cloud.google.com/appengine/docs/standard/java11/serving-static-files
Partly:
Cloud Storage behaves like a Content Delivery Network (CDN) with no work on your part because publicly readable objects are cached in the Cloud Storage network by default.
But:
Feature                              Cloud Storage   Cloud CDN
Max cacheable file size              10 MiB          5 TiB
Default cache expiration             1 hour          1 hour (configurable)
Support custom domains over HTTPS    No              Yes
Cache invalidation                   No              Yes
In particular, if you serve videos to your users, they are likely to be larger than 10 MiB and will therefore not be cached.
Also note that Cloud Storage only caches publicly readable objects.
https://cloud.google.com/storage/docs/caching
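Since that default one-hour expiration is often the first thing people want to change, here is a minimal sketch using the google-cloud-storage Python client. The bucket and object names are just the hypothetical ones from the question, and the object must already be publicly readable for the built-in caching to apply.

    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("cdn.mysite.com")   # hypothetical bucket from the question
    blob = bucket.blob("static/logo.png")      # hypothetical public object

    # Public objects are cached for 3600 s by default; Cache-Control metadata
    # overrides that (or disables caching entirely with "private" / "no-store").
    blob.cache_control = "public, max-age=86400"
    blob.patch()                               # push the metadata change to GCS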

IP Restriction Google Cloud Storage

Is it possible to create a Google Cloud Storage bucket and restrict its access to one IP? I plan on using a bucket to store data that only I would ever need to upload/download.
Check out VPC Service Controls. This no-cost feature allows you to restrict client access to project resources based on a variety of attributes, including source IP address, and it includes support for Cloud Storage buckets.
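The gist is that you define an access level whose condition lists the allowed IP range, then reference it from a service perimeter that contains your project and restricts storage.googleapis.com. As a rough sketch of the first step using the REST API through the Python discovery client (the policy ID and IP address are placeholders, and this assumes an access policy already exists for your organization):

    from googleapiclient import discovery

    POLICY_ID = "123456789"   # placeholder access policy ID
    acm = discovery.build("accesscontextmanager", "v1")

    access_level = {
        "name": f"accessPolicies/{POLICY_ID}/accessLevels/my_single_ip",
        "title": "my_single_ip",
        "basic": {
            # Only requests originating from this address satisfy the level.
            "conditions": [{"ipSubnetworks": ["203.0.113.27/32"]}]
        },
    }

    acm.accessPolicies().accessLevels().create(
        parent=f"accessPolicies/{POLICY_ID}", body=access_level
    ).execute()

The access level does nothing on its own; it only takes effect once a service perimeter that includes your project references it.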
No, that's not available at this time.

In Hadoop, is there any limit to the size of data that can be accessed through Knox + WebHDFS?

In Hadoop, is there any limit to the size of data that can be accessed or ingested into HDFS through Knox + WebHDFS?
Apache Knox is your best option when you need to access WebHDFS resources from outside a cluster that is protected by firewalls. If you don't have access to all of the datanode ports, then direct access to WebHDFS will not work for you. Opening firewall holes for all of those host:port pairs defeats the purpose of the firewall, introduces a management nightmare, and needlessly leaks network details to external clients.
As Hellmar indicated, it depends on your specific use cases and clients. If you need to ingest huge files or large numbers of files, then you may want to consider a different approach to accessing the cluster internals for those clients. If you merely need access to files of any size, then you should be able to extend that access to many clients.
Not having to authenticate with Kerberos/SPNEGO to access such resources opens up many possible clients that would otherwise be unusable with secure clusters.
The Knox User's Guide has examples for accessing WebHDFS resources: http://knox.apache.org/books/knox-0-7-0/user-guide.html#WebHDFS - it also illustrates the Groovy-based scripting available from Knox, which allows you to do some really interesting things.
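For a concrete picture of what such access looks like without the Groovy DSL, here is a minimal sketch of the standard two-step WebHDFS CREATE call going through a Knox gateway. The gateway URL, topology name, and credentials are placeholders.

    import requests

    KNOX = "https://knox.example.com:8443/gateway/default/webhdfs/v1"  # placeholder gateway/topology
    AUTH = ("guest", "guest-password")                                  # placeholder credentials

    # Step 1: op=CREATE returns a redirect; through Knox the Location header
    # points back at the gateway rather than at an individual datanode.
    resp = requests.put(
        f"{KNOX}/tmp/example.txt",
        params={"op": "CREATE", "overwrite": "true"},
        auth=AUTH,
        allow_redirects=False,
    )
    location = resp.headers["Location"]

    # Step 2: stream the file body to the redirected URL. Every byte flows
    # through the gateway, which is the bottleneck discussed below.
    with open("example.txt", "rb") as f:
        requests.put(location, data=f, auth=AUTH)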
In theory, there is no limit. However, using Knox creates a bottleneck. Pure WebHDFS would redirect the read/write request for each block to a (possibly) different datanode, parallelizing access; but with Knox, everything is routed through a single gateway and serialized.
That being said, you would probably not want to upload a huge file using Knox and WebHDFS. It would simply take too long (and depending on your client, you may get a timeout).