When creating a new Cloud Composer environment, is it possible to set the bucket to a preexisting one? - gcloud

So I already have an empty storage bucket created for this, and I don't want Composer to create its own bucket for the DAGs - I'd like to use the one already created.
It's not ideal to have it create a random bucket and then run
gcloud composer environments run test-environment --location europe-west1 variables -- --set gcs_bucket gs://my-bucket
I've dug around the docs, but it seems there is no way to avoid it creating a brand-new bucket every time?

Currently, it is not possible.
In the environment's configuration in the Cloud Composer API, the dagGcsPrefix parameter is output only; you cannot set it. The documentation also mentions that a Cloud Storage bucket is always created along with the Composer environment, and that the bucket's name is based on the environment's region, name, and a random ID.
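For what it's worth, here is a minimal sketch of reading that output-only field through the Cloud Composer API with the google-api-python-client discovery client; the project, region, and environment names below are placeholders:

from googleapiclient import discovery  # pip install google-api-python-client

# Placeholder names; substitute your own project, region and environment.
name = "projects/my-project/locations/europe-west1/environments/test-environment"

composer = discovery.build("composer", "v1")
env = composer.projects().locations().environments().get(name=name).execute()
print(env["config"]["dagGcsPrefix"])  # e.g. gs://<region>-<environment-name>-<random-id>-bucket/dags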
You may want to “Star” this Feature Request for the mentioned functionality to receive notifications whenever an update is published. You can also review or subscribe to the Cloud Composer release notes to stay up to date on recently added features.

You are right, this is currently not supported in Composer.

Related

Without retention policy or lifecycle rules, would Google Cloud Storage automatically delete files?

My app uses Google Cloud Storage through Firebase with Java, Angular & Flutter. It stores pictures and such there. Now, a lot of older files recently disappeared from Google Cloud Storage. A test version of my app is probably the culprit. But I want to make sure that I got the storage bucket configured correctly.
Please note that I don't have object versioning enabled. From what I know, it would keep a copy of deleted files around. That's why I plan to enable it in the future. But it doesn't help me with files deleted in the past.
Right now, my storage bucket is configured as follows:
Default storage class: Standard
Object versioning: Off
Retention policy: None
Lifecycle rules: None
So with that configuration, would Google Cloud Storage automatically delete files? Like, say, after a year or so?
No. If you don't ask Cloud Storage to delete your files, they will stay around forever; there's no expiration of any sort by default. Cloud Storage is a popular tool for long-term storage/backup/retention.
If you want to be especially careful not to delete certain objects, retention policies and object holds can be used to make it harder to delete objects by accident. For example, if you wanted to temporarily ensure that your scripts would not delete your most important object, you could run:
gsutil retention temp set gs://my_bucket_name/my_important_file.txt
This would set a "temporary object hold" on the object, which would make it so that my_important_file.txt could not be deleted with the delete command until you released the hold.
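If you prefer doing this from code rather than gsutil, a minimal sketch with the google-cloud-storage Python client might look like this (bucket and object names are the same placeholders as above):

from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client()
blob = client.bucket("my_bucket_name").blob("my_important_file.txt")

blob.temporary_hold = True   # place the temporary hold
blob.patch()                 # persist the change

# Later, release the hold so the object can be deleted again:
# blob.temporary_hold = False
# blob.patch()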

In Terraform, is there a way to refresh the state of a resource using TF files without using CLI commands?

I have a requirement to refresh the state of a resource "ibm_is_image" using TF files, without using CLI commands.
I know that we can import the state of a resource using "terraform import", but I need to do the same using IaC in TF files.
How can I achieve this?
Example:
In workspace1, I create a resource "f5_custom_image", which later gets deleted from the command line. In workspace2, the same code in the TF file assumes that "f5_custom_image" already exists, and it fails to read the custom image resource. So my code has to refresh the Terraform state of this resource on every execution of terraform apply:
resource "ibm_is_image" "f5_custom_image" {
depends_on = ["data.ibm_is_images.custom_images"]
href = "${local.image_url}"
name = "${var.vnf_vpc_image_name}"
operating_system = "centos-7-amd64"
timeouts {
create = "30m"
delete = "10m"
}
}
In Terraform's model, an object is fully managed by a single Terraform configuration and nothing else. Having an object be managed by multiple configurations or having an object be created by Terraform but then deleted later outside of Terraform is not a supported workflow.
Terraform is intended for managing long-lived architecture that you will gradually update over time. It is not designed to manage build artifacts like machine images that tend to be created, used, and then destroyed.
The usual architecture for this sort of use-case is to consider the creation of the image as a "build" step, carried out using some other software outside of Terraform, and then we use Terraform only for the "deploy" step, at which point the long-lived infrastructure is either created or updated to use the new image.
That leads to a build and deploy pipeline with a series of steps like this:
Use separate image build software to construct the image, and record the id somewhere from which it can be retrieved using a data source in Terraform.
Run terraform apply to update the long-lived infrastructure to make use of the new image. The Terraform configuration should include a data block to read the image id from wherever it was recorded in the previous step (sketched after these steps).
If desired, destroy the image using software outside of Terraform once Terraform has completed.
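As a rough illustration of that data block, something of this shape could look up the image produced by the build step; the variable name and the lookup-by-name approach are assumptions for the sketch, not part of your configuration:

# The build pipeline records the image name and passes it in, e.g. via -var.
variable "custom_image_name" {}

# Read the externally-built image instead of managing it as a resource.
data "ibm_is_image" "f5_custom_image" {
  name = "${var.custom_image_name}"
}

# Long-lived resources then reference the data source, e.g.:
#   image = "${data.ibm_is_image.f5_custom_image.id}"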
When implementing a pipeline like this, it's optional but common to also consider a "rollback" process to use in case the new image is faulty:
Reset the recorded image id that Terraform is reading from back to the id that was stored prior to the new build step.
Run terraform apply to update the long-lived infrastructure back to using the old image.
Of course, supporting that would require retaining the previous image long enough to prove that the new image is functioning correctly, so the normal build and deploy pipeline would need to retain at least one historical image per run to roll back to. With that said, if you have a means to quickly recreate a prior image during rollback, then this special workflow isn't strictly needed: you can implement rollback instead by "rolling forward" to an image constructed with the prior configuration.
An example software package commonly used to prepare images for use with Terraform on other cloud vendors is HashiCorp Packer, but sadly it looks like it does not have IBM Cloud support and so you may need to look for some similar software that does support IBM Cloud, or write something yourself using the IBM Cloud SDK.

Create an RDS/Postgres Replica in another AWS account?

I have an AWS account with a Postgres RDS database that represents the production environment for an app. We have another team that is building an analytics infrastructure in a different AWS account. They need to be able to pull data from our production database to hydrate their reports.
From my research so far, it seems there are a few options:
Create a bash script that runs on a cron schedule and uses pg_dump and pg_restore, and stash it on an EC2 instance in one of the accounts.
Automate the process of creating a snapshot on a schedule and then ship it to the other account's S3 bucket. Then create a Lambda (or other script) that triggers when the snapshot is placed in the S3 bucket and restores it. The downside to this is we'd have to create a new RDS instance with each restore (since you can't restore a snapshot to an existing instance), which changes the FQDN of the database (which we can mitigate using Route53 and a CNAME that gets updated, but this is complicated).
Create a read-replica in the origin AWS account and open up security for that instance so they can just access it directly (but then my account is responsible for all the costs associated with hosting and accessing it).
None of these seem like good options. Is there some other way to accomplish this?
I would suggest using AWS Database Migration Service (DMS). It can listen to changes on your source database and stream them to a target (https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Task.CDC.html).
There is also a third-party blog post explaining how to set this up
https://medium.com/tensult/cross-account-and-cross-region-rds-mysql-db-replication-part-1-55d307c7ae65
Pricing is per hour, depending on the size of the replication EC2 instance. It runs in the target account, so it will not be on your cost center.
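For reference, a minimal boto3 sketch of the replication task itself, assuming the replication instance and the source/target endpoints have already been created in DMS (the ARNs below are placeholders):

import json
import boto3  # pip install boto3

dms = boto3.client("dms")

dms.create_replication_task(
    ReplicationTaskIdentifier="prod-to-analytics",
    SourceEndpointArn="arn:aws:dms:...:endpoint/source-postgres",        # placeholder
    TargetEndpointArn="arn:aws:dms:...:endpoint/analytics-postgres",     # placeholder
    ReplicationInstanceArn="arn:aws:dms:...:rep/replication-instance",   # placeholder
    MigrationType="full-load-and-cdc",  # initial copy, then ongoing change data capture
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)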

How to write to the dags directory from the Cloud Composer webserver?

I'm writing a Cloud Composer plugin and I need to create a DAG at runtime. How can I create a DAG file from the webserver, or how can I access the bucket ID from plugin code (so I can use the GCS client and just upload the DAG)? I tried the code below and it doesn't work; I don't get any exceptions, but I also don't see any results:
dag_path = os.path.join(settings.DAGS_FOLDER, dag_id + '.py')
with open(dag_path, 'w') as dag:
    dag.write(result)
A possible solution is to read the bucket ID from a Cloud Composer environment variable.
You may either use the environment variables, or make use of the APIs provided by the Google Cloud SDK.
gcloud composer environments describe --format=json --project=<project-name> --location=<region> <cluster-name>
This returns the details of the Cloud Composer environment.
The DAG location is under the key dagGcsPrefix.
The format of dagGcsPrefix is gs://<GCSBucket>/dags
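Putting that together, a minimal sketch from plugin code could look like the following; it assumes the GCS_BUCKET environment variable is present in your environment (verify with os.environ) and that dag_id and result are the values from your snippet above:

import os
from google.cloud import storage  # available in the Composer environment

bucket_name = os.environ["GCS_BUCKET"]  # assumption: set by Composer; verify in your environment
client = storage.Client()
blob = client.bucket(bucket_name).blob("dags/" + dag_id + ".py")
blob.upload_from_string(result)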

Setting the Durable Reduced Availability (DRA) attribute for a bucket using Storage Console

When manually creating a new Cloud Storage bucket using the web-based storage console (https://console.developers.google.com/), is there a way to specify the DRA attribute? From the documentation, it appears that the only way to create buckets with that attribute is to use curl, gsutil, or some other script, but not the console.
There is currently no way to do this.
At present, the storage console provides only a subset of the Cloud Storage API, so you'll need to use one of the tools you mentioned to create a DRA bucket.
For completeness, it's pretty easy to do this using gsutil (documentation at https://developers.google.com/storage/docs/gsutil/commands/mb):
gsutil mb -c DRA gs://some-bucket
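If "some other script" is the route you take, a small sketch with the google-cloud-storage Python client might look like this (the bucket name is a placeholder, and DRA is a legacy storage class, so double-check it is still accepted for new buckets):

from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client()
bucket = storage.Bucket(client, name="some-bucket")
bucket.storage_class = "DURABLE_REDUCED_AVAILABILITY"  # legacy DRA class
client.create_bucket(bucket)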