Migrate ceph buckets to new user with zero downtime

I have a ceph/radosgw bucket with several million objects in it, and I need to change the ownership of the bucket to another user.
Normally, this is done by linking the bucket to the new user, then chowning all of the files in it, like this:
radosgw-admin bucket unlink --uid=user1 --bucket=bigbucket
radosgw-admin bucket link --uid=user2 --bucket=bigbucket
radosgw-admin bucket chown --uid=user2 --bucket=bigbucket
Unfortunately, the chown operation has to loop over every single object in the bucket in order to update metadata. This results in an extended downtime window (sometimes 1 hour per million objects apparently) where neither the old user nor the new user can access the full contents of the bucket.
Is there any way to change bucket ownership that doesn't require downtime? Some ideas:
Is it possible for a bucket or specific objects to be owned by two users at the same time?
Could we create the new user, then just change their uid or some other piece of metadata that grants them access to the old user's bucket?
Could the problem be solved client-side, or maybe with a proxy?

You can add a bucket policy to the bucket that grants access to both users until the migration by the chown command completes:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"AWS": ["arn:aws:iam:::user/user1", "arn:aws:iam:::user/user2"]},
"Action": "*",
"Resource": [
"arn:aws:s3:::bigbucket/*"
]
}]
}
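For example, saving the JSON above as policy.json, it might be attached with the AWS CLI pointed at the RGW endpoint (the endpoint URL and file name are placeholders, and the call would presumably be made with the current owner's, i.e. user1's, credentials):
aws --endpoint-url http://rgw.example.com s3api put-bucket-policy \
  --bucket bigbucket \
  --policy file://policy.json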

Related

Trigger a dag in Amazon Managed Workflows for Apache Airflow (MWAA) as part of CI/CD

Wondering if there is any way (blueprint) to trigger an airflow dag in MWAA on the merge of a pull request (preferably via github actions)? Thanks!
You need to create a role in AWS:
Set a permission policy allowing airflow:CreateCliToken
{
"Action": "airflow:CreateCliToken",
"Effect": "Allow",
"Resource": "*"
}
Add a trust relationship (with your account and repo)
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::{account_id}:oidc-provider/token.actions.githubusercontent.com"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringLike": {
"token.actions.githubusercontent.com:sub": "repo:{repo-name}:*"
}
}
}
]
}
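As a rough sketch (role name and file names are placeholders), the role and its permission policy could be created with the AWS CLI from the two JSON documents above, with the airflow:CreateCliToken statement wrapped in a standard Version/Statement envelope:
# create the role with the OIDC trust policy shown above
aws iam create-role \
  --role-name github-mwaa-trigger \
  --assume-role-policy-document file://trust-policy.json
# attach the inline policy allowing airflow:CreateCliToken
aws iam put-role-policy \
  --role-name github-mwaa-trigger \
  --policy-name allow-create-cli-token \
  --policy-document file://mwaa-cli-token-policy.json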
In the GitHub Action you need to configure AWS credentials with role-to-assume and grant the required permissions to the job
permissions:
  id-token: write
  contents: read

- name: Configure AWS credentials
  uses: aws-actions/configure-aws-credentials@v1
  with:
    role-to-assume: arn:aws:iam::{account_id}:role/{role-name}
    aws-region: {region}
Call MWAA using the CLI; see the AWS reference for how to create a token and run the dag.
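A minimal sketch of that last step (environment name and dag id are placeholders; assumes jq is available in the runner):
CLI_JSON=$(aws mwaa create-cli-token --name {environment-name})
CLI_TOKEN=$(echo "$CLI_JSON" | jq -r .CliToken)
WEB_SERVER=$(echo "$CLI_JSON" | jq -r .WebServerHostname)
# the token is passed as a bearer token to the MWAA CLI endpoint
curl -s -X POST "https://$WEB_SERVER/aws_mwaa/cli" \
  -H "Authorization: Bearer $CLI_TOKEN" \
  -H "Content-Type: text/plain" \
  --data-raw "dags trigger {dag-id}"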
(Answering for Airflow without specific context to MWAA)
Airflow offers a REST API with a trigger-dag endpoint, so you can configure a GitHub Action that runs after the PR is merged and triggers a dag run via a REST call. In theory this should work.
In practice this will not work as you expect.
Airflow is not synchronous with your merges (even if the merge dumps the code straight into the dag folder and there is no additional wait time for GitSync). Airflow has a DAG File Processing service that scans the dag folder and looks for changes in files. It processes the changes, and only then is the dag registered in the database. Only after that can Airflow use the new code. This serialization process is important: it makes sure the different parts of Airflow (webserver etc.) don't have to access your dag folder.
This means that if you trigger a dag run right after the merge, you risk executing an older version of your code.
I don't know why you need such a mechanism; it's not a very typical requirement, and I'd advise you not to try to force this idea into your deployment.
To clarify:
If, for a specific deployment, you can confirm that the code you deployed has been parsed and registered as a dag in the database, then there is no risk in doing what you are after. This is probably a very rare and unique case.
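For completeness, if you have confirmed the new code is already registered, the REST call mentioned above might look like this against the Airflow 2 stable API (host, credentials and dag id are placeholders; assumes the API is enabled with basic auth):
curl -X POST "https://{airflow-host}/api/v1/dags/{dag_id}/dagRuns" \
  --user "{username}:{password}" \
  -H "Content-Type: application/json" \
  -d '{"conf": {}}'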

Restrict gcloud service account to specific bucket

I have 2 buckets, prod and staging, and I have a service account. I want to restrict this account to only have access to the staging bucket. Now I saw on https://cloud.google.com/iam/docs/conditions-overview that this should be possible. I created a policy.json like this
{
"bindings": [
{
"role": "roles/storage.objectCreator",
"members": "serviceAccount:staging-service-account#lalala-co.iam.gserviceaccount.com",
"condition": {
"title": "staging bucket only",
"expression": "resource.name.startsWith(\"projects/_/buckets/uploads-staging\")"
}
}
]
}
But when I fire gcloud projects set-iam-policy lalala policy.json I get:
The specified policy does not contain an "etag" field identifying a
specific version to replace. Changing a policy without an "etag" can
overwrite concurrent policy changes.
Replace existing policy (Y/n)?
ERROR: (gcloud.projects.set-iam-policy) INVALID_ARGUMENT: Can't set conditional policy on policy type: resourcemanager_projects and id: /lalala
I feel like I misunderstood how roles, policies and service-accounts are related. But in any case: is it possible to restrict a service account in that way?
Following the comments, I was able to solve my problem. Apparently bucket permissions are somewhat special, but I was able to set a policy on the bucket that allows access for my user, using gsutil:
gsutil iam ch serviceAccount:staging-service-account@lalala.iam.gserviceaccount.com:objectCreator gs://lalala-uploads-staging
After running this, the access is as expected. I found it a little confusing that this is not reflected in the service account's policy:
% gcloud iam service-accounts get-iam-policy staging-service-account@lalala.iam.gserviceaccount.com
etag: ACAB
Thanks everyone
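If it helps anyone else: the binding set this way lives on the bucket rather than on the service account, so it can be inspected with (bucket name taken from the question):
gsutil iam get gs://lalala-uploads-staging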

Can't remove OWNER access to a Google Cloud Storage object

I have a server that writes some data files to a Cloud Storage bucket, using a service account to which I have granted "Storage Object Creator" permissions for the bucket. I want that service account's permissions to be write-only.
The Storage Object Creator permission also allows read access, as far as I can tell, so I wanted to just remove the permission for the objects after they have been written. I thought I could use an ACL to do this, but it doesn't seem to work. If I use
gsutil acl get gs://bucket/object > acl.json
then edit acl.json to remove the OWNER permission for the service account, then use
gsutil acl set acl.json gs://bucket/object
to update the ACL, I find that nothing has changed; the OWNER permission is still there if I check the ACL again. The same thing happens if I try to remove the OWNER permission in the Cloud Console web interface.
Is there a way to remove that permission? Or another way to accomplish this?
You cannot remove the OWNER permission for the service account that uploaded the object. From:
https://cloud.google.com/storage/docs/access-control/lists#bestpractices
The bucket or object owner always has OWNER permission of the bucket or object.
The owner of a bucket is the project owners group, and the owner of an object is either the user who uploaded the object, or the project owners group if the object was uploaded by an anonymous user.
When you apply a new ACL to a bucket or object, Cloud Storage respectively adds OWNER permission to the bucket or object owner if you omit the grants.
I have not tried this, but you could upload the objects using one service account (call it SA1), then rewrite the objects using a separate service account (call it SA2), and then delete the objects. SA1 will no longer be the owner, and therefore won't have read permissions. SA2 will continue to have both read and write permissions though; there is no way to prevent the owner of an object from reading it.
Renaming the object does the trick.
gsutil mv -p gs://bucket/object gs://bucket/object-renamed
gsutil mv -p gs://bucket/object-renamed gs://bucket/object
The renamer service account will become the object OWNER.
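One way this might look end to end, assuming gsutil picks up gcloud's credentials and sa2-key.json is the key file of the second ("renamer") service account, is:
# switch to the second service account so it performs the rename and becomes the owner
gcloud auth activate-service-account --key-file=sa2-key.json
gsutil mv -p gs://bucket/object gs://bucket/object-renamed
gsutil mv -p gs://bucket/object-renamed gs://bucket/object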

Objects do not inherit bucket permissions

In GCS storage, when adding permissions to a bucket (NOT the whole project; just a single bucket inside that project), you used to be able to set up the permissions of a bucket so that any NEW objects put in the bucket inherit the bucket's permissions.
In the newest version of the GCS however, we have not been able to figure out how to do this. We can set permissions to a root bucket:
{
"email": "someuser#someaccount.iam.gserviceaccount.com",
"entity": "someuser#someaccount.iam.gserviceaccount.com",
"role": "READER"
}
But then when a new object is placed in that bucket, it does not inherit this role.
Is there a way to either (a) inherit the role, or (b) set an IAM role to the bucket (we have only been able to set an IAM role to the project, not a specific bucket)?
Thanks!
There are five different ways to configure access control options for Cloud Storage buckets. I suggest you use Access Control Lists (ACLs) to inherit the role in a single bucket, since ACLs are used when "you need fine-grained control over individual objects".
To change the permissions on a single bucket inside a project using the Console,
Go to Storage > Browser. Once there you will see the bucket list.
Select the bucket in which you want to change the permissions.
Click on the three vertical dots at the right side and select "Edit bucket permissions".
Type the account that you want to configure and select the desired role.
The described procedure is detailed here, as well as other ways to set ACLs, for example using Cloud Shell. The next command specifies individual grants:
gsutil acl ch -u [USER_EMAIL]:[PERMISSION] gs://[BUCKET_NAME]
Find a list of predefined roles here.
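For example, using the service account from the question, that command could look like:
gsutil acl ch -u someuser@someaccount.iam.gserviceaccount.com:READER gs://[BUCKET_NAME]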
Update 2
Considering the following error:
CommandException: user@account.iam.gserviceaccount.com:roles/storage.legacyBucketReader is not a valid ACL change
Allowed permissions are OWNER, WRITER, READER
And the fact that there are two types of roles involved:
Identity and Access Management (IAM) roles: project-oriented roles granted to members. "Defines who (identity) has what access (role) for which resource". Example: gsutil iam ch user:[USER_EMAIL]:objectCreator,objectViewer gs://[BUCKET_NAME]
Access Control Lists (ACLs): grant read or write access to users for individual buckets or objects. Example: gsutil acl ch -u [USER_EMAIL]:READER gs://[BUCKET_NAME]
The command is not working because the two syntaxes are mixed. For gsutil acl, the only possible permissions are READER, WRITER, OWNER and Default, as you can see here.
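In other words, pick one syntax or the other; a sketch of the two corrected forms (bucket name is a placeholder):
# IAM binding on the bucket
gsutil iam ch serviceAccount:user@account.iam.gserviceaccount.com:roles/storage.legacyBucketReader gs://[BUCKET_NAME]
# ACL grant on the bucket
gsutil acl ch -u user@account.iam.gserviceaccount.com:READER gs://[BUCKET_NAME]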

PUT Object to AWS S3 via HTTP through VPC Endpoint with proper ACL?

I am using an HTTPS client to PUT an object to Amazon S3 from an EC2 instance within a VPC that has an S3 VPC Endpoint configured. The target Bucket has a Bucket Policy that only allows access from specific VPCs, so authentication via IAM is impossible; I have to use HTTPS GET and PUT to read and write Objects.
This works fine as described, but I'm having trouble with the ACL that gets applied to the Object when I PUT it to the Bucket. I've played with setting a Canned ACL using HTTP headers like the following, but neither results in the correct behavior:
x-amz-acl: private
If I set this header, the Object is private but it can only be read by the root email account so this is no good. Others need to be able to access this Object via HTTPS.
x-amz-acl: bucket-owner-full-control
I totally thought this Canned ACL would do the trick; however, it resulted in unexpected behavior, namely that the Object became World Readable! I'm also not sure how the Owner of the Object was decided, since it was created via HTTPS; in the console the owner is listed as a seemingly random value. This is the documentation description:
Both the object owner and the bucket owner get FULL_CONTROL over the
object. If you specify this canned ACL when creating a bucket, Amazon
S3 ignores it.
This is totally baffling me, because according to the Bucket Policy, only network resources of approved VPCs should even be able to list the Object, let alone read it! Perhaps it has to do with the union of the ACL and the Bucket Policy and I just don't see something.
Either way, maybe I'm going about this all wrong anyway. How can I PUT an object to S3 via HTTPS and set the permissions on that object to match the Bucket Policy, or otherwise make the Bucket Policy authoritative over the ACL?
Here is the Bucket Policy for good measure:
{
"Version": "2008-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": "*",
"Action": [
"s3:AbortMultipartUpload",
"s3:DeleteObject",
"s3:DeleteObjectVersion",
"s3:GetObject",
"s3:GetObjectTagging",
"s3:GetObjectTorrent",
"s3:GetObjectVersion",
"s3:GetObjectVersionTagging",
"s3:GetObjectVersionTorrent",
"s3:ListBucket",
"s3:ListBucketMultipartUploads",
"s3:ListBucketVersions",
"s3:ListMultipartUploadParts",
"s3:PutObject",
"s3:PutObjectTagging"
],
"Resource": [
"arn:aws:s3:::my-bucket",
"arn:aws:s3:::my-bucket/*"
],
"Condition": {
"StringEquals": {
"aws:SourceVpc": "vpc-12345678"
}
}
}
]
}
The way that S3 ACLs and Bucket Policies work together is based on the concept of "Least Privilege".
Your bucket policy only specifies ALLOW for the specified VPC. No one else is granted ALLOW access. This is NOT the same as denying access.
This means that your Bucket or object ACL is granting access.
In the S3 console double check who the file owner is after the PUT.
Double check the ACL for the bucket. What rights have your granted at the bucket level?
Double check the rights that you are using for the PUT operation. Unless you have granted public write access or the PUT is being ALLOWED by the bucket policy, the PUT must be using a signature. This signature will determine the permissions for the PUT operation and who owns the file after the PUT. This is determined by the ACCESS KEY used for the signature.
Your x-amz-acl should contain bucket-owner-full-control.
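For instance, an unsigned PUT through the VPC endpoint might carry that header like this (bucket and object names from the question are placeholders; the exact endpoint depends on your region and endpoint setup):
curl -X PUT "https://my-bucket.s3.amazonaws.com/my-object" \
  -H "x-amz-acl: bucket-owner-full-control" \
  --data-binary @my-object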
[EDIT after numerous comments below]
The problem that I see is that you are approaching security wrong in your example. I would not use the bucket policy. Instead I would create an IAM role and assign that role to the EC2 instances that are writing to the bucket. This means that the PUTs are then signed with the IAM role's access keys. This preserves the ownership of the objects. You can then set the ACL to bucket-owner-full-control or public-read (or any supported ACL permission that you want).
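A sketch of that approach from the instance itself, once the IAM role is attached (object key and file name are placeholders):
aws s3api put-object \
  --bucket my-bucket \
  --key my-object \
  --body ./my-object \
  --acl bucket-owner-full-control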