Is it possible to copyObject from one Cloud Object Storage instance to another when the buckets are in different regions? - ibm-cloud

I would like to use the Node SDK to implement a backup and restore mechanism between 2 instances of Cloud Object Storage. I have added a service ID to the instances and added permissions for the service ID to access the buckets present in the instance I want to write to. The buckets will be in different regions. I have tried a variety of endpoints, both legacy and non-legacy, private and public, to achieve this, but I usually get Access Denied.
Is what I am trying to do possible with the SDK? If so, can someone point me in the right direction?
var config = {
"apiKeyId": "xxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxx",
"endpoint": "s3.eu-gb.objectstorage.softlayer.net",
"iam_apikey_description": "Auto generated apikey during resource-key operation for Instance - crn:v1:bluemix:public:cloud-object-storage:global:a/xxxxxxxxxxx:xxxxxxxxxxx::",
"iam_apikey_name": "auto-generated-apikey-xxxxxxxxxxxxxxxxxxxxxx",
"iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
"iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/0xxxxxxxxxxxxxxxxxxxx::serviceid:ServiceIdxxxxxxxxxxxxxxxxxxxxxx",
"serviceInstanceId": "crn:v1:bluemix:public:cloud-object-storage:global:a/xxxxxxxxxxxxxxxxxxx:xxxxxxxxxxxxxxxxxxxxxxxxxx::",
"ibmAuthEndpoint": "iam.cloud.ibm.com/oidc/token"
}
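For context, the client is created from that config roughly like this (a simplified sketch using the ibm-cos-sdk package, the S3-compatible Node SDK for COS; nothing else is configured):
var IBM = require('ibm-cos-sdk');
// The credentials object above is passed straight to the S3-compatible client.
var cos = new IBM.S3({
    endpoint: config.endpoint,
    apiKeyId: config.apiKeyId,
    serviceInstanceId: config.serviceInstanceId,
    ibmAuthEndpoint: config.ibmAuthEndpoint
});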

This should work as long as you are able to properly grant the requesting user access to read the source of the put-copy, and as long as you are not using Key Protect based keys.

So the breakdown here is a bit confusing due to some unintuitive terminology.
A service instance is a collection of buckets. The primary reason for having multiple instances of COS is to have more granularity in your billing, as you'll get a separate line item for each instance. The term is a bit misleading, however, because COS is a true multi-tenant system - you aren't actually provisioning an instance of COS, you're provisioning a sort of sub-account within the existing system.
A bucket is used to segment your data into different storage locations or storage classes. Other behavior, like CORS, archiving, or retention, acts at the bucket level as well. You don't want to segment something that you expect to scale (like customer data) across separate buckets, as there's a limit of roughly 1,000 buckets per instance. IBM Cloud IAM treats buckets as 'resources', and they are subject to IAM policies.
Instead, data that doesn't need to be segregated by location or class, and that you expect to be subject to the same CORS, lifecycle, retention, or IAM policies, can be separated by prefix. This means a group of similar objects share a path: foo/bar and foo/baz both have the prefix foo/. This helps with listing and organization but doesn't provide granular access control or any other policy-like functionality.
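A quick illustration of listing by prefix with the S3-compatible Node SDK (a sketch; cos is an ibm-cos-sdk client built from credentials like the ones in the question, and the bucket name is a placeholder):
var IBM = require('ibm-cos-sdk');
var cos = new IBM.S3(config); // config as shown in the question
// List only the objects whose keys start with 'foo/'.
// This is purely a listing/organizational convenience, not an access-control boundary.
cos.listObjectsV2({ Bucket: 'example-bucket', Prefix: 'foo/' })
    .promise()
    .then(function (data) {
        data.Contents.forEach(function (obj) { console.log(obj.Key); });
    })
    .catch(function (err) { console.error(err); });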
Now, to your question, the answer is both yes and no. If the buckets are in the same instance then no problem. Bucket names are unique, so as long as there isn't any secondary managed encryption (eg Key Protect) there's no problem copying across buckets, even if they span regions. Keep in mind, however, that large objects will take time to copy, and COS's strong consistency might lead to situations where the operation may not return a response until it's completed. Copying across instances is not currently supported.
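For the supported case (both buckets in the same instance), the copy itself is a single server-side copyObject call; a minimal sketch with the ibm-cos-sdk Node client, using placeholder bucket and key names:
var IBM = require('ibm-cos-sdk');
var cos = new IBM.S3(config); // credentials as shown in the question;
                              // point the client at the endpoint serving the destination bucket
// Server-side copy: CopySource is "<source-bucket>/<source-key>".
// Both buckets must belong to the same COS instance, though they may be in different regions.
cos.copyObject({
    Bucket: 'backup-bucket',                          // destination bucket (placeholder)
    Key: 'path/to/object.txt',                        // destination key
    CopySource: 'source-bucket/path/to/object.txt'    // placeholder source
}).promise()
    .then(function () { console.log('copy complete'); })
    .catch(function (err) { console.error('copy failed:', err.statusCode, err.message); });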

Related

SaaS: Single-instance vs Multi-instance vs Single-tenant vs Multi-tenant?

I've been reading about instances and tenants in SaaS architecture. My questions are as follows (please correct anything you notice I've gotten wrong with any of the following terms):
1) Instance: Is an instance of a piece of software just a copy of that software with its own database? Is there anything more to it than that?
2) Tenant: Is a tenant a user / group of users that share a common set of access privileges to an individual instance?
3) Single-instance: If a SaaS provider offers single-instance service, does this mean that they create only a single instance of their software? Or does it mean that there could be multiple instances, but that each instance can serve multiple tenants? If so, is single-instance the same as multi-tenant?
4) Multi-instance: Does this mean that each instance can serve only one tenant, or can there be multiple instances that each serve multiple tenants? ie. Can a multi-instance service be either single-tenant or multi-tenant?
5) Single-tenant: Does this just mean that an individual instance can serve only one tenant, or does it also imply that there are multiple instances? ie. Can a single-tenant service be both single-instance and multi-instance?
6) Multi-tenant: Does this just mean that an individual instance can serve multiple tenants, or does it imply that there is only a single instance? ie. Can a multi-tenant service be both single-instance and multi-instance?
7) To sum up: Can you have single-instance+single-tenant, single-instance+multi-tenant, multi-instance+single-tenant, and multi-instance+multi-tenant?
I'm going to write from my direct experience:
1) simple answer is 'yes'.
2) nearly yes: there will probably be refined access rights, say an administrator or two, and general users.
3) they're providing you with just one instance of that module, which will be single tenant.
4) they're providing you with multiple instances of that module, which will be single tenant.
5) I would use single-tenant to mean that the server hosting the instances is used by only one tenant. This might be done for perceived security benefits, or because the server runs in a time zone that is non-standard for the SaaS provider, like staying on UTC all year round.
6) I would use multi-tenant to mean that the server hosting the instances is used by more than one tenant. This tends to be more cost-effective and probably just as secure as single-tenant.
7) yes, no, yes, yes.

Dedicate a node to a stream - Security rules

Can anyone let me know how to show a stream only on a specific node?
I have a 2-node cluster and I would like to dedicate RIM01 specifically to Stream1 and RIM02 to Stream2, meaning any request to those streams or to the apps in those streams should go to their nodes.
So, if I go to RIM01, then Stream2 should be hidden, etc.
Central node
RIM02 -- Repository + Engine
RIM03 -- Repository + Engine + Scheduler
I tried a lot of security rules, like
Filter : ServerNodeConfiguration_,Stream_
(node.#NodeUse="dev") and (node.#NodeType=stream.#StreamType and !resource.stream.Empty())
or
Filter : ServerNodeConfiguration_,Stream_
((resource.resourcetype = "Nodes" and resource.name="RIM01")) and ((resource.name="test"))
but none of them work :/
Thanks
So, at present, load balancing in Qlik Sense applies to Apps, not Streams. Load Balancing routes apps to servers, whereas security rules govern stream visibility. And, unfortunately, there is not a clean mechanism to use node meta-data in security rules. All in all, there isn't a solution for hiding a stream on a given server.
I have the same issue. You can designate that apps are only readable on a single node, so depending on how your users' stream rights are configured, some users may see an empty stream on the node where the app cannot be accessed.
There's some interesting stuff happening with the multi-cloud capability, where the concept of streams becomes collections, which gives a lot more flexibility around this type of thing. Alas, the QEFE capability has only just arrived with June 2018, and access is limited to certain use cases / customers.

Different S3 behavior using different endpoints?

I'm currently writing code to use Amazon's S3 REST API and I notice different behavior where the only difference seems to be the Amazon endpoint URI that I use, e.g., https://s3.amazonaws.com vs. https://s3-us-west-2.amazonaws.com.
Examples of different behavior for the GET Bucket (List Objects) call:
Using one endpoint, it includes the "folder" in the results, e.g.:
/path/subfolder/
/path/subfolder/file1.txt
/path/subfolder/file2.txt
and, using the other endpoint, it does not include the "folder" in the results:
/path/subfolder/file1.txt
/path/subfolder/file2.txt
Using one endpoint, it represents "folders" using a trailing / as shown above and, using the other endpoint, it uses a trailing _$folder$:
/path/subfolder_$folder$
/path/subfolder/file1.txt
/path/subfolder/file2.txt
Why the differences? How can I make it return results in a consistent manner regardless of endpoint?
Note that I get these same odd results even if I use Amazon's own command-line AWS S3 client, so it's not my code.
And the contents of the buckets should be irrelevant anyway.
Your assertion notwithstanding, your issue is exactly about the content of the buckets, and not something S3 is doing -- the S3 API has no concept of folders. None. The S3 console can display folders, but this is for convenience -- the folders are not really there -- or if there are folder-like entities, they're irrelevant and not needed.
In Amazon S3, buckets and objects are the primary resources, where objects are stored in buckets. Amazon S3 has a flat structure with no hierarchy like you would see in a typical file system. However, for the sake of organizational simplicity, the Amazon S3 console supports the folder concept as a means of grouping objects. Amazon S3 does this by using key name prefixes for objects.
http://docs.aws.amazon.com/AmazonS3/latest/UG/FolderOperations.html
So why are you seeing this?
Either you've been using EMR/Hadoop, or some other code written by someone who took a bad example and ran with it... or something in your toolchain has been doing things differently than it should for quite some time.
Amazon EMR is a web service that uses a managed Hadoop framework to process, distribute, and interact with data in AWS data stores, including Amazon S3. Because S3 uses a key-value pair storage system, the Hadoop file system implements directory support in S3 by creating empty files with the <directoryname>_$folder$ suffix.
https://aws.amazon.com/premiumsupport/knowledge-center/emr-s3-empty-files/
This may have been something the S3 console did many years ago, and apparently (since you don't report seeing them in the console) it still supports displaying such objects as folders in the console... but the S3 console no longer creates them this way, if it ever did.
I've mirrored the bucket "folder" layout exactly
If you create a folder in the console, an empty object with the key "foldername/" is created. This in turn is used to display a folder that you can navigate into, and upload objects with keys beginning with that folder name as a prefix.
The Amazon S3 console treats all objects that have a forward slash "/" character as the last (trailing) character in the key name as a folder
http://docs.aws.amazon.com/AmazonS3/latest/UG/FolderOperations.html
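In other words, a console-created "folder" is nothing more than a zero-byte object whose key ends in "/". A sketch of creating one directly with the AWS SDK for Node.js (bucket name is a placeholder):
var AWS = require('aws-sdk');
var s3 = new AWS.S3();
// The "folder" is just an empty object whose key ends with "/".
s3.putObject({ Bucket: 'example-bucket', Key: 'my/', Body: '' })
    .promise()
    .then(function () { console.log('created folder marker my/'); })
    .catch(function (err) { console.error(err); });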
If you just create objects using the API, then "my/object.txt" appears in the console as "object.txt" inside folder "my" even though there is no "my/" object created... so if the objects are created with the API, you'd see neither style of "folder" in the object listing.
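To see the grouping the console performs, you can ask the API to roll keys up on a delimiter yourself; a sketch with the AWS SDK for Node.js (placeholder bucket name):
var AWS = require('aws-sdk');
var s3 = new AWS.S3();
// With Delimiter: '/', a key like "my/object.txt" is rolled up into the
// CommonPrefixes entry "my/", even though no "my/" object exists.
s3.listObjectsV2({ Bucket: 'example-bucket', Delimiter: '/' })
    .promise()
    .then(function (data) {
        data.CommonPrefixes.forEach(function (p) { console.log('prefix:', p.Prefix); });
        data.Contents.forEach(function (o) { console.log('object:', o.Key); });
    })
    .catch(function (err) { console.error(err); });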
That is probably a bug in the API endpoint which includes the "folder" - S3 internally doesn't actually have a folder structure, but instead is just a set of keys associated with files, where keys (for convenience) can contain slash-separated paths which then show up as "folders" in the web interface. There is the option in the API to specify a prefix, which I believe can be any part of the key up to and including part of the filename.
EMR's s3 client is not the apache one, so I can't speak accurately about it.
In ASF hadoop releases (and HDP, CDH)
The older s3n:// client uses $folder$ as its folder delimiter.
The newer s3a:// client uses / as its folder marker, but will handle $folder$ if there. At least it used to; I can't see where in the code it does now.
The S3A clients strip out all folder markers when you list things; S3A uses them to simulate empty dirs and deletes all parent markers when you create child file/dir entries.
Whatever you have that processes the GET should just ignore entries with "/" or _$folder$ at the end.
As to why they are different: the local EMRFS is a different codepath, using DynamoDB to implement consistency. At a guess, it doesn't need to mock empty dirs, as the DynamoDB tables will hold all the directory entries.
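As suggested above, a GET-processing client can simply drop both marker styles before doing anything else; a minimal sketch in Node.js (the key list is illustrative):
// Drop both styles of folder marker from a listing before processing it.
var keys = [
    'path/subfolder/',
    'path/subfolder_$folder$',
    'path/subfolder/file1.txt',
    'path/subfolder/file2.txt'
];
var realObjects = keys.filter(function (key) {
    return !key.endsWith('/') && !key.endsWith('_$folder$');
});
console.log(realObjects); // [ 'path/subfolder/file1.txt', 'path/subfolder/file2.txt' ]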

What mechanism is there to prevent GCP bucket name squatting?

My company's name is mycomp.
GCP buckets live in a global, public namespace and their names must be globally unique, so all my buckets are prefixed with mycomp.
So mycomp-production, mycomp-test, mycomp-stage, etc.
What is to prevent someone from grabbing mycomp-dev? Like cybersquatting on that bucket name. Something like that could potentially really screw up my organizational structure.
How can I stop or reserve a bucket prefix? Is this even possible? If I wanted to be an A-hole, what's to stop me from grabbing "Nordstrom" or "walmart" if I get there first?
GCS supports domain-named buckets. This would allow you to create buckets like production.mycomp.com and test.mycomp.com. Since ownership of the domain must be verified before buckets with that domain suffix can be created, it ensures that other people can't create buckets with that naming scheme.
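For example, with the @google-cloud/storage Node.js client (assuming ownership of mycomp.com has already been verified for the account creating the bucket):
var { Storage } = require('@google-cloud/storage');
var storage = new Storage();
// Domain-named buckets require verified ownership of mycomp.com,
// which is what prevents anyone else from squatting on the name.
storage.createBucket('production.mycomp.com', { location: 'US' })
    .then(function (res) { console.log('Created ' + res[0].name); })
    .catch(function (err) { console.error(err); });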

What is the best way to handle REST resource proposal generation?

I have this API: GET /travels/generate/{city-departure}/{city-arrival}
It generates a list of possible travel paths (with train changes, etc.).
Now these are not real resources, because they don't have IDs (they are only generated as proposals).
What is the best way to select one and save it in a RESTful way? Should I create a temporary resource for each proposal, like "GET /temporary-travel/{id}"?
A REST resource does not need to have an ID. It must be identifiable. Your URLs
/travels/generate/{city-departure}/{city-arrival}
are completely OK to identify a resource.
A REST resource does not need to have an ID. It must be identifiable.
One solution would be using a list index (e.g. GET /travels/generate/{city-departure}/{city-arrival}/{index}). This, of course, requires you to remember the content and the order of the proposed travel paths.
To overcome the limitation of temporarily storing possible travel paths, you may either store them permanently and give them a static identifier, or you may provide a domain-specific key that consists of multiple chained static identifiers and so gives an identity to your travel path (e.g. chaining all route segment IDs, as sketched below).
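A sketch of the second idea in Node.js (the segment IDs and function name are made up; only the pattern of deriving a stable key matters): chaining or hashing the ordered segment IDs yields a deterministic identifier that can back a URL such as GET /travels/{id} without having to persist the proposal first.
var crypto = require('crypto');
// A proposed travel path is fully described by its ordered route segments,
// so hashing the chained segment IDs gives it a stable, reproducible identity.
function travelPathId(segmentIds) {
    var chained = segmentIds.join('-');               // e.g. "TGV8531-ICE105"
    return crypto.createHash('sha256').update(chained).digest('hex').slice(0, 16);
}
console.log(travelPathId(['TGV8531', 'ICE105'])); // deterministic id usable in GET /travels/{id}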
I somewhat prefer the idea of storing all possible travel paths, even knowing it is technically close to impossible. I like it because the travel paths your system can propose are in fact limited by the algorithm and the database you use.