In CDK, can I wait until a Helm-installed operator is running before applying a manifest? - kubernetes

I'm installing the External Secrets Operator alongside Apache Pinot into an EKS cluster using CDK. I'm running into an issue that I think is being caused by CDK attempting to create a resource defined by the ESO before the ESO has actually gotten up and running. Here's the relevant code:
// install Pinot
const pinot = cluster.addHelmChart('Pinot', {
chartAsset: new Asset(this, 'PinotChartAsset', { path: path.join(__dirname, '../pinot') }),
release: 'pinot',
namespace: 'pinot',
createNamespace: true
});
// install the External Secrets Operator
const externalSecretsOperator = cluster.addHelmChart('ExternalSecretsOperator', {
chart: 'external-secrets',
release: 'external-secrets',
repository: 'https://charts.external-secrets.io',
namespace: 'external-secrets',
createNamespace: true,
values: {
installCRDs: true,
webhook: {
port: 9443
}
}
});
// create a Fargate Profile
const fargateProfile = cluster.addFargateProfile('FargateProfile', {
fargateProfileName: 'externalsecrets',
selectors: [{ 'namespace': 'external-secrets' }]
});
// create the Service Account used by the Secret Store
const serviceAccount = cluster.addServiceAccount('ServiceAccount', {
name: 'eso-service-account',
namespace: 'external-secrets'
});
serviceAccount.addToPrincipalPolicy(new iam.PolicyStatement({
effect: iam.Effect.ALLOW,
actions: [
'secretsmanager:GetSecretValue',
'secretsmanager:DescribeSecret'
],
resources: [
'arn:aws:secretsmanager:us-east-1:<MY-ACCOUNT-ID>:secret:*'
]
}))
serviceAccount.node.addDependency(externalSecretsOperator);
// create the Secret Store, an ESO Resource
const secretStoreManifest = getSecretStoreManifest(serviceAccount);
const secretStore = cluster.addManifest('SecretStore', secretStoreManifest);
secretStore.node.addDependency(serviceAccount);
secretStore.node.addDependency(fargateProfile);
// create the External Secret, another ESO resource
const externalSecretManifest = getExternalSecretManifest(secretStoreManifest)
const externalSecret = cluster.addManifest('ExternalSecret', externalSecretManifest)
externalSecret.node.addDependency(secretStore);
externalSecret.node.addDependency(pinot);
Even though I've set the ESO as a dependency to the Secret Store, when I try to deploy this I get the following error:
Received response status [FAILED] from custom resource. Message returned: Error: b'Error from server (InternalError): error when creating "/tmp/manifest.yaml": Internal error occurred:
failed calling webhook "validate.clustersecretstore.external-secrets.io": Post "https://external-secrets-webhook.external-secrets.svc:443/validate-external-secrets-io-v1beta1-clusterse
cretstore?timeout=5s": no endpoints available for service "external-secrets-webhook"\n'
If I understand correctly, this is the error you'd get if you try to add a Secret Store before the ESO is fully installed. I'm guessing that CDK does not wait until the ESO's pods are running before attempting to apply the manifest. Furthermore, if I comment out the lines the create the Secret Store and External Secret, do a cdk deploy, uncomment those lines and then deploy again, everything works fine.
Is there any way around this? Some way I can retry applying the manifest, or to wait a period of time before attempting the apply?

The addHelmChart method has a property wait that is set to false by default - setting it to true lets CDK know to not mark the installation as complete until of its its K8s resources are in a ready state.

Related

How to deploy the kinesis-video-producer Docker image from AWS's own ECR to Fargate using CDK in TypeScript?

I'm trying to stand up a proof of concept that ingests an RTSP video stream into Kinesis Video. The provided documentation has a docker image all set up that seems to have everything I need to do this, hosted by AWS on 546150905175.dkr.ecr.us-west-2.amazonaws.com. What I am having trouble with, though, is getting that deployment (via an Amplify Custom category, in TypeScript CDK) to work.
I've tried different variations on
import * as iam from "#aws-cdk/aws-iam";
import * as ecs from "#aws-cdk/aws-ecs";
import * as ec2 from "#aws-cdk/aws-ec2";
const kinesisUserAccessKey = new iam.AccessKey(this, 'KinesisStreamUserAccessKey', {
user: kinesisStreamUser,
})
const servicePrincipal = new iam.ServicePrincipal('ecs-tasks.amazonaws.com');
const executionRole = new iam.Role(this, 'IngestVideoTaskDefExecutionRole', {
assumedBy: servicePrincipal,
managedPolicies: [
iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonECSTaskExecutionRolePolicy'),
]
});
const taskDefinition = new ecs.FargateTaskDefinition(this, 'IngestVideoTaskDef', {
cpu: 512,
memoryLimitMiB: 1024,
executionRole,
})
const image = ecs.ContainerImage.fromRegistry('546150905175.dkr.ecr.us-west-2.amazonaws.com/kinesis-video-producer-sdk-cpp-amazon-linux:latest');
taskDefinition.addContainer('IngestVideoContainer', {
command: [
'gst-launch-1.0',
'rtspsrc',
`location="${locationParam.secretValue.toString()}"`,
'short-header=TRUE',
'!',
'rtph264depay',
'!',
'video/x-h264,',
'format=avc,alignment=au',
'!',
'kvssink',
`stream-name="${cfnStream.name}"`,
'storage-size=512',
`access-key="${kinesisUserAccessKey.accessKeyId}"`,
`secret-key="${kinesisUserAccessKey.secretAccessKey.toString()}"`,
`aws-region="${REGION}"`,
// `aws-region="${cdk.Aws.REGION}"`,
],
image,
logging: new ecs.AwsLogDriver({
streamPrefix: 'IngestVideoContainer',
}),
})
const service = new ecs.FargateService(this, 'IngestVideoService', {
cluster,
taskDefinition,
desiredCount: 1,
securityGroups: [
ec2.SecurityGroup.fromSecurityGroupId(this, 'DefaultSecurityGroup', SECURITY_GROUP_ID)
],
vpcSubnets: {
subnets: SUBNET_IDS.map(subnetId => ec2.Subnet.fromSubnetId(this, subnetId, subnetId)),
}
})
But it seems like regardless of what I do, an amplify push just stays in 'in progress' for like an hour until I go into the CloudFormation console and cancel the stack update, but deep in the my way to the ECS Console I managed to find an actual error message:
Resourceinitializationerror: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.us-west-2.amazonaws.com/": dial tcp 52.94.177.118:443: i/o timeout
It seems to be some kind of networking issue, but I'm not sure how to proceed. Any assistance you can provide would be wonderful. Cheers!
Figured it out. For those stuck with similar issues, you have to give it an execution role with AmazonECSTaskExecutionRolePolicy, which I already edited above, and set assignPublicIp: true in the service.

aws-ecs-patterns error: Cluster for this service needs Ec2 capacity. Call addXxxCapacity() on the cluster

Hoping someone can help me here, according to AWS CDK documentation if I declare my VPC then I shouldn't declare 'capacity', but when I run cdk synth I get the following error...
throw new Error(Validation failed with the following errors:\n ${errorList});
Error: Validation failed with the following errors:
[PrerenderInfrasctutureStack/preRenderApp/Service] Cluster for this service needs Ec2 capacity. Call addXxxCapacity() on the cluster.
here is my code...
(i hope Nathan Peck sees this)
const ec2 = require('#aws-cdk/aws-ec2');
const ecsPattern = require('#aws-cdk/aws-ecs-patterns');
const ecs = require('#aws-cdk/aws-ecs');
class PrerenderInfrasctutureStack extends cdk.Stack {
/**
*
* #param {cdk.Construct} scope
* #param {string} id
* #param {cdk.StackProps=} props
*/
constructor(scope, id, props) {
super(scope, id, props);
const myVPC = ec2.Vpc.fromLookup(this, 'publicVpc', {
vpcId:'vpc-xxx'
});
const preRenderApp = new ecsPattern.ApplicationLoadBalancedEc2Service(this, 'preRenderApp', {
vpcId: myVPC,
certificate: 'arn:aws:acm:ap-southeast-2:xxx:certificate/xxx', //becuase this is spcified, then the LB will automatically use HTTPS
domainName: 'my-dev.com.au.',
domainZone:'my-dev.com.au',
listenerPort: 443,
publicLoadBalancer: true,
memoryReservationMiB: 8,
cpu: 4096,
desiredCount: 1,
taskImageOptions:{
image: ecs.ContainerImage.fromRegistry('xxx.dkr.ecr.region.amazonaws.com/express-prerender-server'),
containerPort: 3000
},
});
}
}
module.exports = { PrerenderInfrasctutureStack }
This is because if you don't explicitly pass a cluster then it uses the default cluster that exists on your account. However the default cluster starts out with no EC2 capacity, since EC2 instances cost money when they run. You can use the empty default cluster with Fargate mode since Fargate does not require EC2 capacity, it just runs your container inside Fargate, but the default cluster won't work with EC2 mode until you add EC2 instances to the cluster.
The easy solution here is to switch to ApplicationLoadBalancedFargateService instead, because Fargate services run using Fargate capacity, so they don't require EC2 instances in the cluster. Alternatively you should define your own cluster using something like:
// Create an ECS cluster
const cluster = new ecs.Cluster(this, 'Cluster', {
vpc,
});
// Add capacity to it
cluster.addCapacity('DefaultAutoScalingGroupCapacity', {
instanceType: new ec2.InstanceType("t2.xlarge"),
desiredCapacity: 3,
});
Then pass that cluster as a property when creating the ApplicationLoadBalancedEc2Service
Hope this helps!

Pulumi kubernetes secret not creating all data keys

I've declared a kubernetes secret in pulumi like:
const tlsSecret = new k8s.core.v1.Secret('tlsSecret', {
metadata: {
name: 'star.builds.qwil.co'
},
data: {
'tls.crt': tlsCert,
'tls.key': tlsKey
}
});
However, I'm finding that when the secret is created only tls.key is present in the secret. When I look at the Diff View from the pulumi deployment on app.pulumi.com I see the following:
tlsSecret (kubernetes:core:Secret)
+ kubernetes:core/v1:Secret (create)
[urn=urn:pulumi:local::ledger::kubernetes:core/v1:Secret::tlsSecret]
apiVersion: "v1"
data : {
tls.key: "[secret]"
}
kind : "Secret"
metadata : {
labels: {
app.kubernetes.io/managed-by: "pulumi"
}
name : "star.builds.qwil.co"
}
Why is only tls.key being picked up even though I've also specified a tls.crt?
Turns out the variable tlsCert was false-y (I was loading it from config with the wrong key for Config.get()). Pulumi was smart and didn't create a secret with an empty string.

How can I configure an AWS EKS autoscaler with Terraform?

I'm using the AWS EKS provider (github.com/terraform-aws-modules/terraform-aws-eks ). I'm following along the tutorial with https://learn.hashicorp.com/terraform/aws/eks-intro
However this does not seem to have autoscaling enabled... It seems it's missing the cluster-autoscaler pod / daemon?
Is Terraform able to provision this functionality? Or do I need to set this up following a guide like: https://eksworkshop.com/scaling/deploy_ca/
You can deploy Kubernetes resources using Terraform. There is both a Kubernetes provider and a Helm provider.
data "aws_eks_cluster_auth" "authentication" {
name = "${var.cluster_id}"
}
provider "kubernetes" {
# Use the token generated by AWS iam authenticator to connect as the provider does not support exec auth
# see: https://github.com/terraform-providers/terraform-provider-kubernetes/issues/161
host = "${var.cluster_endpoint}"
cluster_ca_certificate = "${base64decode(var.cluster_certificate_authority_data)}"
token = "${data.aws_eks_cluster_auth.authentication.token}"
load_config_file = false
}
provider "helm" {
install_tiller = "true"
tiller_image = "gcr.io/kubernetes-helm/tiller:v2.12.3"
}
resource "helm_release" "cluster_autoscaler" {
name = "cluster-autoscaler"
repository = "stable"
chart = "cluster-autoscaler"
namespace = "kube-system"
version = "0.12.2"
set {
name = "autoDiscovery.enabled"
value = "true"
}
set {
name = "autoDiscovery.clusterName"
value = "${var.cluster_name}"
}
set {
name = "cloudProvider"
value = "aws"
}
set {
name = "awsRegion"
value = "${data.aws_region.current_region.name}"
}
set {
name = "rbac.create"
value = "true"
}
set {
name = "sslCertPath"
value = "/etc/ssl/certs/ca-bundle.crt"
}
}
This answer below is still not complete... But at least it gets me partially further...
1.
kubectl create clusterrolebinding add-on-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default
helm install stable/cluster-autoscaler --name my-release --set "autoscalingGroups[0].name=demo,autoscalingGroups[0].maxSize=10,autoscalingGroups[0].minSize=1" --set rbac.create=true
And then manually fix the certificate path:
kubectl edit deployments my-release-aws-cluster-autoscaler
replace the following:
path: /etc/ssl/certs/ca-bundle.crt
With
path: /etc/ssl/certs/ca-certificates.crt
2.
In the AWS console, give AdministratorAccess policy to the terraform-eks-demo-node role.
3.
Update the nodes parameter with (kubectl edit deployments my-release-aws-cluster-autoscaler)
- --nodes=1:10:terraform-eks-demo20190922124246790200000007

Accessing Kubernetes Secret from Airflow KubernetesPodOperator

I'm setting up an Airflow environment on Google Cloud Composer for testing. I've added some secrets to my namespace, and they show up fine:
$ kubectl describe secrets/eric-env-vars
Name: eric-env-vars
Namespace: eric-dev
Labels: <none>
Annotations: <none>
Type: Opaque
Data
====
VERSION_NUMBER: 6 bytes
I've referenced this secret in my DAG definition file (leaving out some code for brevity):
env_var_secret = Secret(
deploy_type='env',
deploy_target='VERSION_NUMBER',
secret='eric-env-vars',
key='VERSION_NUMBER',
)
dag = DAG('env_test', schedule_interval=None, start_date=start_date)
operator = KubernetesPodOperator(
name='k8s-env-var-test',
task_id='k8s-env-var-test',
dag=dag,
image='ubuntu:16.04',
cmds=['bash', '-cx'],
arguments=['env'],
config_file=os.environ['KUBECONFIG'],
namespace='eric-dev',
secrets=[env_var_secret],
)
But when I run this DAG, the VERSION_NUMBER env var isn't printed out. It doesn't look like it's being properly linked to the pod either (apologies for imprecise language, I am new to both Kubernetes and Airflow). This is from the Airflow task log of the pod creation response (also formatted for brevity/readability):
'env': [
{
'name': 'VERSION_NUMBER',
'value': None,
'value_from': {
'config_map_key_ref': None,
'field_ref': None,
'resource_field_ref': None,
'secret_key_ref': {
'key': 'VERSION_NUMBER',
'name': 'eric-env-vars',
'optional': None}
}
}
]
I'm assuming that we're somehow calling the constructor for the Secret wrong, but I am not entirely sure. Guidance appreciated!
Turns out this was a misunderstanding of the logs!
When providing an environment variable to a Kubernetes pod via a Secret, that value key in the API response is None because the value comes from the secret_key_ref.