Error: Cycle from kubernetes_job in Terraform - kubernetes

I have a pretty simple kubernetes_job like this:
resource "kubernetes_job" "demo" {
  metadata {
    name = "demo"
  }
  spec {
    template {
      metadata {}
      spec {
        container {
          name    = "pi"
          image   = "perl"
          command = ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
        }
        restart_policy = "Never"
      }
    }
    backoff_limit = 1
  }
  wait_for_completion = true
  timeouts {
    create = "2m"
    update = "2m"
  }
}
It is supposed to run to completion once and be done, which it does, but a recurring error keeps appearing:
Error: Cycle: module.kubernetes_job.demo
I have found that the job in Kubernetes is listed as "complete" but the respective pod has disappeared, so to "fix" this I have to delete the job and run terraform plan and terraform apply again.
Is there a way to have the job (even though it says completed) start a new pod if the existing one disappears, without this cycle error?

Related

I am getting a vmSize error when trying to deploy a batch service pool using bicep

I have the following Bicep code:
resource pool 'Microsoft.Batch/batchAccounts/pools@2021-06-01' = {
  name: '${bs.name}/run-python'
  properties: {
    scaleSettings: {
      fixedScale: {
        nodeDeallocationOption: 'TaskCompletion'
        targetDedicatedNodes: 1
      }
    }
    deploymentConfiguration: {
      cloudServiceConfiguration: {
        osFamily: '6'
      }
    }
    vmSize: 'standard_A1_v2'
    startTask: {
      commandLine: 'cmd /c "pip install azure-storage-blob pandas"'
      userIdentity: {
        autoUser: {
          elevationLevel: 'NonAdmin'
          scope: 'Pool'
        }
      }
      waitForSuccess: true
    }
  }
  dependsOn: [
    bs
  ]
}
It is meant to create a pool for my batch service, but when I try to deploy it, I get the following error:
{"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.","details":[{"code":"PropertyName","message":"vmSize"}]}
So I think the problem is the vmSize, but I cannot find a concrete example of what the value should be.
CloudServiceConfiguration Pools are deprecated. Please retry using VirtualMachineConfiguration.

Scheduled restarts on kubernetes using terraform

I run a kubernetes cluster on aws managed by terraform. I'd like to automatically restart the pods in the cluster at some regular interval, maybe weekly. Since the entire cluster is managed by terraform, I'd like to run the auto restart command through terraform as well.
At first I assumed that kubernetes would have some kind of ttl for its pods, but that does not seem to be the case.
Elsewhere on SO I've seen the ability to run auto restarts using a cron job managed by kubernetes (eg: How to schedule pods restart). Terraform has a relevant resource -- the kubernetes_cron_job -- but I can't fully understand how to set it up with the permissions necessary to actually run.
Would appreciate some feedback!
Below is what I've tried:
resource "kubernetes_cron_job" "deployment_restart" {
metadata {
name = "deployment-restart"
}
spec {
concurrency_policy = "Forbid"
schedule = "0 8 * * *"
starting_deadline_seconds = 10
successful_jobs_history_limit = 10
job_template {
metadata {}
spec {
backoff_limit = 2
active_deadline_seconds = 600
template {
metadata {}
spec {
service_account_name = var.service_account.name
container {
name = "kubectl"
image = "bitnami/kubectl"
command = ["kubectl rollout restart deploy"]
}
}
}
}
}
}
}
resource "kubernetes_role" "deployment_restart" {
metadata {
name = "deployment-restart"
}
rule {
api_groups = ["apps", "extensions"]
resources = ["deployments"]
verbs = ["get", "list", "patch", "watch"]
}
}
resource "kubernetes_role_binding" "deployment_restart" {
metadata {
name = "deployment-restart"
}
role_ref {
api_group = "rbac.authorization.k8s.io"
kind = "Role"
name = kubernetes_role.deployment_restart.metadata[0].name
}
subject {
kind = "ServiceAccount"
name = var.service_account.name
api_group = "rbac.authorization.k8s.io"
}
}
This was based on a combination of Granting RBAC roles in k8s cluster using terraform and How to schedule pods restart.
Currently getting the following error:
Error: RoleBinding.rbac.authorization.k8s.io "deployment-restart" is invalid: subjects[0].apiGroup: Unsupported value: "rbac.authorization.k8s.io": supported values: ""
As per the official documentation, rolebinding.subjects.apiGroup should be empty for ServiceAccount subjects:
kubectl explain rolebinding.subjects.apiGroup
KIND: RoleBinding
VERSION: rbac.authorization.k8s.io/v1
FIELD: apiGroup
DESCRIPTION:
APIGroup holds the API group of the referenced subject. Defaults to "" for
ServiceAccount subjects. Defaults to "rbac.authorization.k8s.io" for User
and Group subjects.
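A minimal sketch of the corrected binding, reusing kubernetes_role.deployment_restart and var.service_account from the question. The namespace value "default" is an assumption, since the question does not say where the service account lives. For a ServiceAccount subject the api_group argument is left out and namespace is set instead:

```hcl
resource "kubernetes_role_binding" "deployment_restart" {
  metadata {
    name = "deployment-restart"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "Role"
    name      = kubernetes_role.deployment_restart.metadata[0].name
  }

  subject {
    kind = "ServiceAccount"
    name = var.service_account.name
    # Assumption: the service account is in "default"; adjust to your namespace.
    namespace = "default"
    # api_group is intentionally omitted: it only applies to User and Group subjects.
  }
}
```

Because a Role and RoleBinding are namespaced, they only grant access to deployments in the namespace where they are created.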

How to deploy Stale Pod (I mean non operational) through gitlab CI?

I would like to deploy an application whose pod does not go into the Running state (it should be non-operational). A user would start it only when it is really required, using Infrastructure as Code (Terraform). I am aware of kubectl scale --replicas=0. Any other leads or info would be much appreciated.
You can keep the replica count at zero for the Deployment in your YAML file, if you are using one.
Or, if you are using Terraform:
resource "kubernetes_deployment" "example" {
metadata {
name = "terraform-example"
labels = {
test = "MyExampleApp"
}
}
spec {
replicas = 0
selector {
match_labels = {
test = "MyExampleApp"
}
}
template {
metadata {
labels = {
test = "MyExampleApp"
}
}
spec {
container {
image = "nginx:1.7.8"
name = "example"
resources {
limits = {
cpu = "0.5"
memory = "512Mi"
}
requests = {
cpu = "250m"
memory = "50Mi"
}
}
liveness_probe {
http_get {
path = "/nginx_status"
port = 80
http_header {
name = "X-Custom-Header"
value = "Awesome"
}
}
initial_delay_seconds = 3
period_seconds = 3
}
}
}
}
}
}
There is no other way around it. If you don't want to use Terraform, you can use a Kubernetes client to do this.
If you want to edit a local file from Terraform, check out the local-exec provisioner:
This invokes a process on the machine running Terraform, not on the resource.
resource "aws_instance" "web" {
  # ...
  provisioner "local-exec" {
    command = "echo ${self.private_ip} >> private_ips.txt"
  }
}
Using sed (or any other command) in local-exec, you can update the YAML and apply it.
https://www.terraform.io/docs/language/resources/provisioners/local-exec.html
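As a rough sketch of that idea, the following runs sed and kubectl from a local-exec provisioner. Everything named here is hypothetical: the replicas variable, the ./deployment.yaml manifest path, and the assumption that GNU sed and kubectl are available on the machine running Terraform.

```hcl
variable "replicas" {
  description = "Hypothetical input: replica count to write into the manifest"
  type        = number
  default     = 0
}

resource "null_resource" "set_replicas" {
  # Re-run the provisioner whenever the desired replica count changes.
  triggers = {
    replicas = var.replicas
  }

  provisioner "local-exec" {
    # Assumes GNU sed (for -i without a suffix) and a manifest at ./deployment.yaml.
    command = "sed -i 's/replicas: [0-9]*/replicas: ${var.replicas}/' deployment.yaml && kubectl apply -f deployment.yaml"
  }
}
```

Running terraform apply -var="replicas=1" would then rewrite the manifest and apply it, while the default keeps the Deployment at zero pods.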

How can I name eks worker nodes provisioned with terraform?

I am using Terraform 0.12.20 and I have provisioned an EKS cluster with 2 node groups.
How can I add name tags to EKS node workers according to their node group names?
I have tried adding "Name" tag in the additional tag sections of each node-group but the tags did not take and my EC2 instance names are empty, while other tags appear.
Here is the configuration - I have skipped the less relevant bits:
module "eks-cluster" {
...
node_groups_defaults = {
disk_size = 128
key_name = var.key_name
subnets = [
aws_subnet.{{}}.id,
aws_subnet.{{}}.id,
]
k8s_labels = {
env = var.environment
}
additional_tags = {
env = var.environment
"k8s.io/cluster-autoscaler/enabled" = "true"
"k8s.io/cluster-autoscaler/${var.cluster-name}" = "true"
}
}
node_groups = {
app = {
name = "app"
.....
k8s_labels = {
nodegroup = "app"
}
additional_tags = {
nodegroup = "app"
Name = "${var.environment}-app-node"
}
}
ml = {
name = "ml"
...
instance_type = "m5.xlarge"
k8s_labels = {
nodegroup = "ml"
}
additional_tags = {
nodegroup = "ml"
Name = "${var.environment}-ml-node"
}
}
}
tags = {
env = var.environment
}
map_roles = [{
......
}]
}
As per the documentation, the aws_eks_node_group resource doesn't allow for modifying tags on your instances.
There is a nice feature coming soon to EKS node groups which will allow you to pass a custom userdata script. Using that, you will be able to modify the tags on your instances programmatically. The issue can be tracked at https://github.com/aws/containers-roadmap/issues/596
UPDATE:
As of 20/08/2020, you can use a launch_template with your node group. This allows you to pass in a Name tag. Example:
resource "aws_launch_template" "cluster" {
  image_id               = data.aws_ssm_parameter.cluster.value
  instance_type          = "t3.medium"
  name                   = "eks-launch-template-test"
  update_default_version = true
  tag_specifications {
    resource_type = "instance"
    tags = {
      Name = "eks-node-group-instance-name"
    }
  }
}
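The launch template above still has to be attached to a node group. A minimal sketch of that wiring follows; the three variables and the scaling sizes are hypothetical placeholders, not values from the original answer.

```hcl
variable "cluster_name" {
  type = string # hypothetical: name of an existing EKS cluster
}

variable "node_role_arn" {
  type = string # hypothetical: IAM role ARN for the worker nodes
}

variable "subnet_ids" {
  type = list(string) # hypothetical: subnets for the node group
}

resource "aws_eks_node_group" "example" {
  cluster_name    = var.cluster_name
  node_group_name = "example"
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnet_ids

  scaling_config {
    desired_size = 1
    max_size     = 1
    min_size     = 1
  }

  # Attach the launch template so its tag_specifications apply to new instances.
  launch_template {
    id      = aws_launch_template.cluster.id
    version = aws_launch_template.cluster.latest_version
  }
}
```

The instance type is set in the launch template here, so instance_types is not repeated on the node group.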
The following Terraform resource works.
resource "aws_autoscaling_group_tag" "your_group_tag" {
  autoscaling_group_name = aws_eks_node_group.your_group.resources[0].autoscaling_groups[0].name
  tag {
    key                 = "Name"
    value               = "enter-your-name-here"
    propagate_at_launch = true
  }
  depends_on = [
    aws_eks_node_group.your_group
  ]
}
Had the same issue, and this is the solution I came up with, which works great.
First, create the ASG tags via the aws_autoscaling_group_tag resource.
resource "aws_autoscaling_group_tag" "mytag" {
  autoscaling_group_name = aws_eks_node_group.main.resources[0].autoscaling_groups[0].name
  tag {
    key                 = "foo"
    value               = "bar"
    propagate_at_launch = true
  }
  depends_on = [aws_eks_node_group.main]
}
Unfortunately, this resource block doesn't accept multiple tags, so you have to create a separate resource block for each tag.
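One way to avoid writing each block out by hand is for_each over a map of tags. This is only a sketch: the asg_tags variable and its default values are hypothetical, and it reuses the aws_eks_node_group.main reference from the block above.

```hcl
variable "asg_tags" {
  description = "Hypothetical map of tags to propagate to node instances"
  type        = map(string)
  default = {
    Name = "enter-your-name-here"
    foo  = "bar"
  }
}

resource "aws_autoscaling_group_tag" "node_tags" {
  # One resource instance is created per entry in the map.
  for_each = var.asg_tags

  autoscaling_group_name = aws_eks_node_group.main.resources[0].autoscaling_groups[0].name

  tag {
    key                 = each.key
    value               = each.value
    propagate_at_launch = true
  }
}
```

Each tag is still a separate resource in state, but the configuration stays in one block.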
Another thing to keep in mind is that the tags are applied to EC2 instances launched in the future, not to the ones currently running.
This means that you need to either manually scale your nodes down and back up, or write a bash script and run it as a local provisioner with Terraform.
resource "null_resource" "refresh_autoscale" {
  provisioner "local-exec" {
    command = "cd ${path.module}/scripts ; bash ./scale_refresh.sh"
    environment = {
      ASG_NAME        = aws_eks_node_group.main.resources[0].autoscaling_groups[0].name
      CLUSTER_NAME    = "foo_cluster"
      NODE_GROUP_NAME = "foo_cluster_node"
      REGION          = var.region
      AWS_PROFILE     = var.aws_profile
      DESIRED_SIZE    = var.desired_size
      MIN_SIZE        = var.min_size
      MAX_SIZE        = var.max_size
    }
  }
  depends_on = [aws_eks_node_group.main]
}
Your bash script can run commands via the AWS CLI to scale down and up your node groups.
aws --profile ${AWS_PROFILE} --region ${REGION} eks update-nodegroup-config --cluster-name ${CLUSTER_NAME} \
--scaling-config "minSize=0,maxSize=1,desiredSize=0" --nodegroup-name ${NODE_GROUP_NAME}
Because the instances are not scaled down immediately, there is a period of waiting for the scale down to complete. If you have jq installed, you can periodically query the state of your ASG and see how many instances are currently running.
INSTANCE_COUNT=$(aws --profile ${AWS_PROFILE} --region ${REGION} autoscaling describe-auto-scaling-groups --auto-scaling-group-name ${ASG_NAME} \
| jq '.[][0] | .Instances | length')
I just noticed that
"k8s.io/cluster-autoscaler/${var.cluster-name}" = "true"
Might need to be
"k8s.io/cluster-autoscaler/${var.cluster-name}" = "owned"
There is an existing issue about adding the "Name" tag to the ASG of a node group: https://github.com/aws/containers-roadmap/issues/608 (open), and this one on the Terraform side: https://github.com/terraform-aws-modules/terraform-aws-eks/issues/860 (closed).
However, as an alternative, you can use an AWS CLI command to add the tag explicitly.
Try the following in Terraform:
resource "null_resource" "add_custom_tags_to_asg" {
  for_each = module.eks-cluster.node_groups
  provisioner "local-exec" {
    command = <<-EOF
      aws autoscaling create-or-update-tags \
        --tags ResourceId=${each.value.resources[0].autoscaling_groups[0].name},ResourceType=auto-scaling-group,Key=Name,Value=k8s-node-groups-${each.value.labels["env"]},PropagateAtLaunch=true
    EOF
  }
}

How to Horizontal Autoscaler a Kubernetes Deployment

EDIT:
SOLUTION: I forgot to add target_cpu_utilization_percentage to the Autoscaler.tf file.
I want a web-service in Python (or other language) running on Kubernetes but with auto scaling.
I created a Deployment and a Horizontal Pod Autoscaler, but it is not working.
I'm using Terraform to configure Kubernetes.
I have these files:
Deployments.tf
resource "kubernetes_deployment" "rui-test" {
metadata {
name = "rui-test"
labels {
app = "rui-test"
}
}
spec {
strategy = {
type = "RollingUpdate"
rolling_update = {
max_unavailable = "26%" # This is not working
}
}
selector = {
match_labels = {
app = "rui-test"
}
}
template = {
metadata = {
labels = {
app = "rui-test"
}
}
spec = {
container {
name = "python-test1"
image = "***************************"
}
}
}
}
}
Autoscaler.tf
resource "kubernetes_horizontal_pod_autoscaler" "test-rui" {
metadata {
name = "test-rui"
}
spec {
max_replicas = 10 # THIS IS NOT WORKING
min_replicas = 3 # THIS IS NOT WORKING
scale_target_ref {
kind = "Deployment"
name = "test-rui" # Name of deployment
}
}
}
Service.tf
resource "kubernetes_service" "rui-test" {
metadata {
name = "rui-test"
labels {
app = "rui-test"
}
}
spec {
selector {
app = "rui-test"
}
type = "LoadBalancer" # Use 'cluster_ip = "None"' or 'type = "LoadBalancer"'
port {
name = "http"
port = 8080
}
}
}
When I run kubectl get hpa I see this:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
rui-test Deployment/rui-test <unknown>/80% 1 3 1 1h
Instead of:
rui-test Deployment/rui-test <unknown>/79% 3 10 1 1h
That is what I want.
But if I run kubectl autoscale deployment rui-test --min=3 --max=10 --cpu-percent=81 I see this:
Error from server (AlreadyExists): horizontalpodautoscalers.autoscaling "rui-test" already exists
In Kubernetes, this appears: [screenshot not included]
You are missing the metrics server. Kubernetes needs to determine current CPU/Memory usage so that it can autoscale up and down.
One way to know if you have the metrics server installed is to run:
$ kubectl top node
$ kubectl top pod
The Horizontal Pod Autoscaler also depends on resource requests being configured for the containers in your Deployment.
From the documentation:
Please note that if some of the pod’s containers do not have the relevant resource request set, CPU utilization for the pod will not be defined and the autoscaler will not take any action for that metric.
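Putting the EDIT note at the top of the question together with this answer, a minimal sketch of the autoscaler with the CPU target set might look like the following. It uses Terraform 0.12+ block syntax, the value 80 is an assumption, and scale_target_ref is pointed at the Deployment name rui-test from Deployments.tf.

```hcl
resource "kubernetes_horizontal_pod_autoscaler" "test-rui" {
  metadata {
    name = "test-rui"
  }

  spec {
    min_replicas = 3
    max_replicas = 10

    # The argument the question's EDIT says was missing; 80 is an assumed target.
    target_cpu_utilization_percentage = 80

    scale_target_ref {
      api_version = "apps/v1"
      kind        = "Deployment"
      name        = "rui-test" # must match the Deployment's metadata name
    }
  }
}
```

For the CPU percentage to be computed at all, the Deployment's container also needs a resources block with a cpu entry under requests, and the metrics server has to be running in the cluster.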