Trigger a DAG in Amazon Managed Workflows for Apache Airflow (MWAA) as part of CI/CD - GitHub

Is there any way (a blueprint) to trigger an Airflow DAG in MWAA on the merge of a pull request, preferably via GitHub Actions? Thanks!

You need to create a role in AWS:
Set its permissions with a policy statement that allows airflow:CreateCliToken
{
    "Action": "airflow:CreateCliToken",
    "Effect": "Allow",
    "Resource": "*"
}
Add a trust relationship (with your account and repo)
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::{account_id}:oidc-provider/token.actions.githubusercontent.com"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringLike": {
                    "token.actions.githubusercontent.com:sub": "repo:{repo-name}:*"
                }
            }
        }
    ]
}
In the GitHub Actions workflow, grant the job these permissions and configure AWS credentials with role-to-assume
permissions:
  id-token: write
  contents: read
- name: Configure AWS credentials
  uses: aws-actions/configure-aws-credentials@v1
  with:
    role-to-assume: arn:aws:iam::{account_id}:role/{role-name}
    aws-region: {region}
Call MWAA using the CLI; see the AWS reference for how to create a token and run a DAG.
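If you would rather do that last step from Python than from shell, here is a minimal sketch. It assumes the workflow has already assumed the role above and that boto3 is installed on the runner. It follows the token-then-POST flow described in the AWS MWAA documentation (request a CLI token, then POST the Airflow CLI command to /aws_mwaa/cli on the web server host). The function names are my own, not part of any AWS library.

```python
import base64
import json
import urllib.request


def build_cli_request(hostname, cli_token, dag_id):
    """Build the URL, headers and body for the MWAA CLI endpoint."""
    url = f"https://{hostname}/aws_mwaa/cli"
    headers = {
        "Authorization": f"Bearer {cli_token}",
        "Content-Type": "text/plain",
    }
    # The body is the Airflow CLI command without the leading "airflow".
    body = f"dags trigger {dag_id}".encode("utf-8")
    return url, headers, body


def decode_cli_response(raw_json):
    """MWAA returns stdout and stderr base64-encoded inside a JSON object."""
    payload = json.loads(raw_json)
    stdout = base64.b64decode(payload.get("stdout", "")).decode("utf-8")
    stderr = base64.b64decode(payload.get("stderr", "")).decode("utf-8")
    return stdout, stderr


def trigger_dag(env_name, dag_id, region):
    """Request a short-lived CLI token, then run `dags trigger` through it."""
    import boto3  # imported here so the helpers above work without boto3

    mwaa = boto3.client("mwaa", region_name=region)
    token = mwaa.create_cli_token(Name=env_name)
    url, headers, body = build_cli_request(
        token["WebServerHostname"], token["CliToken"], dag_id
    )
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return decode_cli_response(response.read())
```

A workflow step could then call something like trigger_dag("my-mwaa-env", "my_dag", "us-east-1"), where the environment name, DAG id and region are placeholders for your own values, and print the returned stdout and stderr to the job log.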

(Answering for Airflow in general, without specific context for MWAA)
Airflow offers a REST API with an endpoint for triggering a DAG run, so in theory you can configure a GitHub Action that runs after a PR is merged and triggers a DAG run via a REST call.
In practice this will not work as you expect.
Airflow is not synchronous with your merges (even if the merge dumps code straight into the DAG folder and there is no additional wait time for GitSync). Airflow has a DAG file processing service that scans the DAG folder and looks for changes in files. It processes the changes, and then the DAG is registered in the database. Only after that can Airflow use the new code. This serialization process is important: it makes sure different parts of Airflow (webserver etc.) don't have access to your DAG folder.
This means that if you trigger a DAG run right after the merge, you risk executing an older version of your code.
I don't know why you need such a mechanism; it's not a very typical requirement, and I'd advise you not to try to force this idea into your deployment.
To clarify:
If under a specific deployment you can confirm that the code you deployed has been parsed and registered as a DAG in the database, then there is no risk in doing what you are after. This is probably a very rare and unique case.
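One possible way to do that confirmation, sketched under assumptions: recent Airflow 2 versions expose a last_parsed_time field on GET /api/v1/dags/{dag_id} in the stable REST API, so a pipeline could poll it and compare it with the time the files landed in the DAG folder before triggering. Whether that endpoint is reachable depends on your deployment, and fetch_dag_info below is a hypothetical callable you would have to supply yourself.

```python
import time
from datetime import datetime, timezone


def parsed_since(dag_info, deployed_at):
    """Return True if the DAG was last parsed at or after the deploy time."""
    raw = dag_info.get("last_parsed_time")
    if not raw:
        return False
    # Normalise a trailing "Z" so older Python versions can parse it.
    parsed_at = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    return parsed_at >= deployed_at


def wait_until_parsed(fetch_dag_info, deployed_at, timeout_s=300, poll_s=10):
    """Poll a caller-supplied fetch function until the DAG has been re-parsed.

    fetch_dag_info is a hypothetical callable that returns the JSON body of
    GET /api/v1/dags/{dag_id} as a dict.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if parsed_since(fetch_dag_info(), deployed_at):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Note that deployed_at has to be a timezone-aware datetime for the moment the new files became visible to Airflow, not the moment the PR was merged; otherwise the comparison does not tell you which version was parsed.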

Related

CloudFormation template applied to all resources, using a wildcard

I am trying to use a JSON script as a CloudFormation template, but I am being asked to add a resource member even though the JSON script is already running in AWS.
The policy is meant to apply to all resources, and it's currently defined in IAM:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "*"
            ],
            "Resource": [
                "*"
            ],
            "Condition": {
                "DateLessThan": {
                    "aws:TokenIssueTime": "[policy creation time]"
                }
            }
        }
    ]
}
All I want to do is copy that code (which currently sits in the IAM > Roles > Revoke Sessions tab) and put it into a CloudFormation template, but I cannot figure out how to tell CloudFormation that the JSON script is meant to be applied to ALL resources.
Is there any way to specify that the policy should apply to all resources in the JSON script? Any help would be much appreciated. Thank you!

GitHub API permission name for self-hosted GitHub Actions runners?

At the end of the day, I'm trying to implement the solution linked from here: Reuse Github Actions self hosted runner on multiple repositories. But the tutorials walk you through setting up a GitHub App in the UI, and I'm trying to do it via the API.
Context:
Creating a new "GitHub App" (not "OAuth App") in GitHub Enterprise v3.0 (soon migrating to v3.1).
Trying to do it entirely over the API and explicitly NOT the UI, by creating an "app manifest" (https://docs.github.com/en/enterprise-server@3.0/developers/apps/building-github-apps/creating-a-github-app-from-a-manifest).
Everything I've read about permissions on docs.github.com ends up pointing over to https://docs.github.com/en/enterprise-server@3.0/rest/reference/permissions-required-for-github-apps, which does not include the specific values that can be used with the API.
On a GHE instance, there is a large list of permissions available at a URL with this pattern:
https://{HOSTNAME}/organizations/{ORG}/settings/apps/{APP}/permissions
The specific permission I'm trying to set says:
Self-hosted runners
View and manage Actions self-hosted runners available to an organization.
Access: Read & write
In the documentation (https://docs.github.com/en/enterprise-server@3.0/developers/apps/building-github-apps/creating-a-github-app-from-a-manifest#github-app-manifest-parameters) there is a parameter called default_permissions.
What is the identifier (key) to use for this permission, where the value is write?
I've tried:
the documented Self-hosted runners
the guess self-hosted runners
the guess self-hosted_runners
the guess self_hosted_runners
the guess selfhosted_runners
the guess runners
…but ultimately, the actual values which can be used here are (as far as I can tell after several hours of digging and guessing) undocumented.
actions:read and checks:read appear to work. Those are also undocumented, but I was able to figure it out by looking at the URLs, making an educated guess, and testing.
All of the tutorials I can find on the internet, including those on docs.github.com, walk you through creating a new GitHub App via the UI. I am very explicitly trying to do this over the API.
Any tips? Have I missed something? Is this not available in GHE yet?
Here is my app manifest, redacted.
{
    "public": true,
    "name": "My app",
    "description": "My app's description.",
    "url": "https://github.example.com/my-org/my-repo",
    "redirect_url": "http://localhost:9876/register/redirect",
    "default_events": [],
    "default_permissions": {
        "actions": "read",
        "checks": "read",
        "runners": "write"
    },
    "hook_attributes": {
        "url": "",
        "active": false
    }
}
WITH the "runners": "write" line, the error message I receive says:
Invalid GitHub App configuration
The configuration does not appear to be a valid GitHub App manifest.
× Error Default permission records resource is not included in the list
WITHOUT the "runners": "write" line, the submission is successful.
The GitHub team finally updated the documentation. The permission I was looking for was organization_self_hosted_runners.
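For anyone generating the manifest from a script, here is a minimal Python sketch of the same manifest with the rejected "runners" key swapped for the one named above. The build_manifest helper is just my own illustration, and I have not verified the result against a particular GHE version.

```python
import json


def build_manifest(name, url, redirect_url):
    """Return a GitHub App manifest that requests runner management."""
    return {
        "public": True,
        "name": name,
        "url": url,
        "redirect_url": redirect_url,
        "default_events": [],
        "default_permissions": {
            "actions": "read",
            "checks": "read",
            # Key reported in the answer above, replacing the rejected "runners".
            "organization_self_hosted_runners": "write",
        },
        "hook_attributes": {"url": "", "active": False},
    }


manifest_json = json.dumps(
    build_manifest(
        "My app",
        "https://github.example.com/my-org/my-repo",
        "http://localhost:9876/register/redirect",
    ),
    indent=2,
)
print(manifest_json)
```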

Restrict gcloud service account to specific bucket

I have 2 buckets, prod and staging, and I have a service account. I want to restrict this account to only have access to the staging bucket. Now I saw on https://cloud.google.com/iam/docs/conditions-overview that this should be possible. I created a policy.json like this
{
    "bindings": [
        {
            "role": "roles/storage.objectCreator",
            "members": "serviceAccount:staging-service-account@lalala-co.iam.gserviceaccount.com",
            "condition": {
                "title": "staging bucket only",
                "expression": "resource.name.startsWith(\"projects/_/buckets/uploads-staging\")"
            }
        }
    ]
}
But when I run gcloud projects set-iam-policy lalala policy.json I get:
The specified policy does not contain an "etag" field identifying a
specific version to replace. Changing a policy without an "etag" can
overwrite concurrent policy changes.
Replace existing policy (Y/n)?
ERROR: (gcloud.projects.set-iam-policy) INVALID_ARGUMENT: Can't set conditional policy on policy type: resourcemanager_projects and id: /lalala
I feel like I misunderstood how roles, policies and service-accounts are related. But in any case: is it possible to restrict a service account in that way?
Following the comments, I was able to solve my problem. Apparently bucket permissions are somehow special, but I was able to set a policy on the bucket that allows access for my service account, using gsutil:
gsutil iam ch serviceAccount:staging-service-account@lalala.iam.gserviceaccount.com:objectCreator gs://lalala-uploads-staging
After running this, the access is as expected. I found it a little confusing that this is not reflected in the service account's policy:
% gcloud iam service-accounts get-iam-policy staging-service-account@lalala.iam.gserviceaccount.com
etag: ACAB
Thanks everyone
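On the confusing part: the binding that gsutil iam ch adds lives in the bucket's IAM policy, while gcloud iam service-accounts get-iam-policy shows who is allowed to use the service account itself, so it is expected that nothing shows up there. For reference, here is a rough Python equivalent of the gsutil command, assuming the google-cloud-storage package and default credentials. The helper names are mine, and I have not run this against the buckets from the question.

```python
def object_creator_binding(service_account_email):
    """Build the bucket-level binding that `gsutil iam ch` adds."""
    return {
        "role": "roles/storage.objectCreator",
        "members": {f"serviceAccount:{service_account_email}"},
    }


def grant_object_creator(bucket_name, service_account_email):
    """Attach the binding to the bucket's own IAM policy."""
    from google.cloud import storage  # needs the google-cloud-storage package

    bucket = storage.Client().bucket(bucket_name)
    # Read the current policy first so existing bindings are preserved.
    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.bindings.append(object_creator_binding(service_account_email))
    bucket.set_iam_policy(policy)
    return policy
```

Reading the resulting policy back from the bucket (for example with gsutil iam get on the bucket) is where the service account should then appear.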

Can we trigger AWS Lambda function from aws Glue PySpark job?

Currently I'm able to run a Glue PySpark job, but is it possible to call a Lambda function from this Glue job? I'm calling the Lambda function from my PySpark Glue job using the code below.
import boto3
lambda_client = boto3.client('lambda', region_name='us-west-2')
response = lambda_client.invoke(FunctionName='test-lambda')
Error:
botocore.exceptions.ClientError: An error occurred (AccessDeniedException) when calling the Invoke operation: User: arn:aws:sts::208244724522:assumed-role/AWSGlueServiceRoleDefault/GlueJobRunnerSession is not authorized to perform: lambda:InvokeFunction on resource: arn:aws:lambda:us-west-2:208244724522:function:hw-test
I added the proper Lambda roles to my Glue IAM role, but I am still getting the above error. Is there a specific role I need to add?
Thanks.
To invoke AWS Lambda you can use the following policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowToExampleFunction",
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": "arn:aws:lambda:<region>:<123456789012>:function:<example_function>"
        }
    ]
}
Your roles are not suitable for Lambda invocations because:
AWSLambdaBasicExecutionRole – Grants permissions only for the Amazon CloudWatch Logs actions to write logs. You can use this policy if your Lambda function does not access any other AWS resources except writing logs.
AWSLambdaVPCAccessExecutionRole – Grants permissions for Amazon Elastic Compute Cloud (Amazon EC2) actions to manage elastic network interfaces (ENIs).
Please see documentation here about these roles.
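To tie the two halves together, here is a minimal sketch, assuming boto3 is available in the Glue job (as it is in the question's code). The first helper builds the policy above for a concrete function, and the second makes the call and surfaces the access error with a clearer message. Both function names are hypothetical, and the policy still has to be attached to the Glue job's role by whatever means you normally use.

```python
import json


def invoke_policy(region, account_id, function_name):
    """Build the identity policy the Glue job role needs."""
    resource = f"arn:aws:lambda:{region}:{account_id}:function:{function_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": resource,
            }
        ],
    }


def invoke_from_glue(function_name, payload, region):
    """Invoke the function synchronously and return its decoded JSON result."""
    import boto3
    from botocore.exceptions import ClientError

    client = boto3.client("lambda", region_name=region)
    try:
        response = client.invoke(
            FunctionName=function_name,
            InvocationType="RequestResponse",
            Payload=json.dumps(payload).encode("utf-8"),
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "AccessDeniedException":
            # The job role is missing lambda:InvokeFunction on this function.
            raise PermissionError(
                f"Glue role may not invoke {function_name}"
            ) from err
        raise
    return json.loads(response["Payload"].read())
```

The Resource ARN in the policy has to match the function named in the error message, so it is worth comparing the two if the call is still denied after attaching the policy.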

Disable Deploy API for specific Stages

I'm building an API using AWS API Gateway, and I will have two or more stages such as dev, production, etc.
What I want to do is allow only a group of users to deploy to the production stage.
What I have accomplished so far is denying deploys to all stages, but I can't figure out how to target specific stages.
Here is my policy to deny deploys to every stage. If there is a better way to control this, I will be glad to hear it.
{
    "Sid": "VisualEditor2",
    "Effect": "Deny",
    "Action": "apigateway:POST",
    "Resource": "arn:aws:apigateway:us-east-1::/restapis/{APIID}/deployments"
}
Did you try something like this, to block the whole stage?
"Resource": [
    "arn:aws:apigateway:us-east-1::/restapis/{APIID}/stages",
    "arn:aws:apigateway:us-east-1::/restapis/{APIID}/stages/production"
]
Source: https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-iam-policy-examples.html#api-gateway-policy-example-apigateway-stage-full-access