I get the error below in place of metrics when I deploy the Helm chart for the postgres exporter:
An error has occurred while serving metrics:
7 error(s) occurred:
* collected metric "pg_database_size_bytes" { label:<name:"datname" value:"azure_maintenance" > label:<name:"server" value:"X-dev.postgres.database.azure.com:5432" > gauge:<value:X > } was collected before with the same name and label values
* collected metric "pg_database_size_bytes" { label:<name:"datname" value:"template1" > label:<name:"server" value:"X-dev.postgres.database.azure.com:5432" > gauge:<value:X+06 > } was collected before with the same name and label values
* collected metric "pg_database_size_bytes" { label:<name:"datname" value:"postgres" > label:<name:"server" value:"X-dev.postgres.database.azure.com:5432" > gauge:<value:X+07 > } was collected before with the same name and label values
* collected metric "pg_database_size_bytes" { label:<name:"datname" value:"template0" > label:<name:"server" value:"X-dev.postgres.database.azure.com:5432" > gauge:<value:X+06 > } was collected before with the same name and label values
* collected metric "pg_database_size_bytes" { label:<name:"datname" value:"azure_sys" > label:<name:"server" value:"X-dev.postgres.database.azure.com:5432" > gauge:<value:X+06 > } was collected before with the same name and label values
* collected metric "pg_database_size_bytes" { label:<name:"datname" value:"fhio-temp" > label:<name:"server" value:"X-dev.postgres.database.azure.com:5432" > gauge:<value:X+07 > } was collected before with the same name and label values
* collected metric "pg_database_size_bytes" { label:<name:"datname" value:"X-dev" > label:<name:"server" value:"fhiodatabase-dev.postgres.database.azure.com:5432" > gauge:<value:X+X> } was collected before with the same name and label values
I read through the Karpenter documentation at https://karpenter.sh/v0.16.1/getting-started/getting-started-with-terraform/#install-karpenter-helm-chart and followed the instructions step by step, but I got errors at the end.
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
DEBUG controller.provisioning Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.topologySpreadConstraints = {"maxSkew":1,"topologyKey":"topology.kubernetes.io/zone","whenUnsatisfiable":"ScheduleAnyway","labelSelector":{"matchLabels":{"app.kubernetes.io/instance":"karpenter","app.kubernetes.io/name":"karpenter"}}} {"commit": "b157d45", "pod": "karpenter/karpenter-5755bb5b54-rh65t"}
2022-09-10T00:13:13.122Z ERROR controller.provisioning Could not schedule pod, incompatible with provisioner "default", incompatible requirements, key karpenter.sh/provisioner-name, karpenter.sh/provisioner-name DoesNotExist not in karpenter.sh/provisioner-name In [default] {"commit": "b157d45", "pod": "karpenter/karpenter-5755bb5b54-rh65t"}
Below is the source code:
cat main.tf
terraform {
required_version = "~> 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 4.0"
}
helm = {
source = "hashicorp/helm"
version = "~> 2.5"
}
kubectl = {
source = "gavinbunney/kubectl"
version = "~> 1.14"
}
}
}
provider "aws" {
region = "us-east-1"
}
locals {
cluster_name = "karpenter-demo"
# Used to determine correct partition (i.e. - `aws`, `aws-gov`, `aws-cn`, etc.)
partition = data.aws_partition.current.partition
}
data "aws_partition" "current" {}
module "vpc" {
# https://registry.terraform.io/modules/terraform-aws-modules/vpc/aws/latest
source = "terraform-aws-modules/vpc/aws"
version = "3.14.4"
name = local.cluster_name
cidr = "10.0.0.0/16"
azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
enable_nat_gateway = true
single_nat_gateway = true
one_nat_gateway_per_az = false
public_subnet_tags = {
"kubernetes.io/cluster/${local.cluster_name}" = "shared"
"kubernetes.io/role/elb" = 1
}
private_subnet_tags = {
"kubernetes.io/cluster/${local.cluster_name}" = "shared"
"kubernetes.io/role/internal-elb" = 1
}
}
module "eks" {
# https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest
source = "terraform-aws-modules/eks/aws"
version = "18.29.0"
cluster_name = local.cluster_name
cluster_version = "1.22"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
# Required for Karpenter role below
enable_irsa = true
node_security_group_additional_rules = {
ingress_nodes_karpenter_port = {
description = "Cluster API to Node group for Karpenter webhook"
protocol = "tcp"
from_port = 8443
to_port = 8443
type = "ingress"
source_cluster_security_group = true
}
}
node_security_group_tags = {
# NOTE - if creating multiple security groups with this module, only tag the
# security group that Karpenter should utilize with the following tag
# (i.e. - at most, only one security group should have this tag in your account)
"karpenter.sh/discovery/${local.cluster_name}" = local.cluster_name
}
# Only need one node to get Karpenter up and running.
# This ensures core services such as VPC CNI, CoreDNS, etc. are up and running
# so that Karpenter can be deployed and start managing compute capacity as required
eks_managed_node_groups = {
initial = {
instance_types = ["m5.large"]
# Not required nor used - avoid tagging two security groups with same tag as well
create_security_group = false
min_size = 1
max_size = 1
desired_size = 1
iam_role_additional_policies = [
"arn:${local.partition}:iam::aws:policy/AmazonSSMManagedInstanceCore", # Required by Karpenter
"arn:${local.partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy",
"arn:${local.partition}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly", #for access to ECR images
"arn:${local.partition}:iam::aws:policy/CloudWatchAgentServerPolicy"
]
tags = {
# This will tag the launch template created for use by Karpenter
"karpenter.sh/discovery/${local.cluster_name}" = local.cluster_name
}
}
}
}
#The EKS module creates an IAM role for the EKS managed node group nodes. We’ll use that for Karpenter.
#We need to create an instance profile we can reference.
#Karpenter can use this instance profile to launch new EC2 instances and those instances will be able to connect to your cluster.
resource "aws_iam_instance_profile" "karpenter" {
name = "KarpenterNodeInstanceProfile-${local.cluster_name}"
role = module.eks.eks_managed_node_groups["initial"].iam_role_name
}
#Create the KarpenterController IAM Role
#Karpenter requires permissions like launching instances, which means it needs an IAM role that grants it access. The config
#below will create an AWS IAM Role, attach a policy, and authorize the Service Account to assume the role using IRSA. We will
#create the ServiceAccount and connect it to this role during the Helm chart install.
module "karpenter_irsa" {
source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
version = "5.3.3"
role_name = "karpenter-controller-${local.cluster_name}"
attach_karpenter_controller_policy = true
karpenter_tag_key = "karpenter.sh/discovery/${local.cluster_name}"
karpenter_controller_cluster_id = module.eks.cluster_id
karpenter_controller_node_iam_role_arns = [
module.eks.eks_managed_node_groups["initial"].iam_role_arn
]
oidc_providers = {
ex = {
provider_arn = module.eks.oidc_provider_arn
namespace_service_accounts = ["karpenter:karpenter"]
}
}
}
#Install Karpenter Helm Chart
#Use helm to deploy Karpenter to the cluster. We are going to use the helm_release Terraform resource to do the deploy and pass in the
#cluster details and IAM role Karpenter needs to assume.
provider "helm" {
kubernetes {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
args = ["eks", "get-token", "--cluster-name", local.cluster_name]
}
}
}
resource "helm_release" "karpenter" {
namespace = "karpenter"
create_namespace = true
name = "karpenter"
repository = "https://charts.karpenter.sh"
chart = "karpenter"
version = "v0.16.1"
set {
name = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
value = module.karpenter_irsa.iam_role_arn
}
set {
name = "clusterName"
value = module.eks.cluster_id
}
set {
name = "clusterEndpoint"
value = module.eks.cluster_endpoint
}
set {
name = "aws.defaultInstanceProfile"
value = aws_iam_instance_profile.karpenter.name
}
}
#Provisioner
#Create a default provisioner using the command below. This provisioner configures instances to connect to your cluster’s endpoint and
#discovers resources like subnets and security groups using the cluster’s name.
#This provisioner will create capacity as long as the sum of all created capacity is less than the specified limit.
provider "kubectl" {
apply_retry_count = 5
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
load_config_file = false
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
}
}
resource "kubectl_manifest" "karpenter_provisioner" {
yaml_body = <<-YAML
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
name: default
spec:
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["spot"]
limits:
resources:
cpu: 1000
provider:
subnetSelector:
Name: "*private*"
securityGroupSelector:
karpenter.sh/discovery/${module.eks.cluster_id}: ${module.eks.cluster_id}
tags:
karpenter.sh/discovery/${module.eks.cluster_id}: ${module.eks.cluster_id}
ttlSecondsAfterEmpty: 30
YAML
depends_on = [
helm_release.karpenter
]
}
Then I applied the sample inflate deployment from the guide and scaled it:
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: inflate
spec:
replicas: 0
selector:
matchLabels:
app: inflate
template:
metadata:
labels:
app: inflate
spec:
terminationGracePeriodSeconds: 0
containers:
- name: inflate
image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
resources:
requests:
cpu: 1
EOF
kubectl scale deployment inflate --replicas 5
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
DEBUG controller.provisioning Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.topologySpreadConstraints = {"maxSkew":1,"topologyKey":"topology.kubernetes.io/zone","whenUnsatisfiable":"ScheduleAnyway","labelSelector":{"matchLabels":{"app.kubernetes.io/instance":"karpenter","app.kubernetes.io/name":"karpenter"}}} {"commit": "b157d45", "pod": "karpenter/karpenter-5755bb5b54-rh65t"}
2022-09-10T00:13:13.122Z ERROR controller.provisioning Could not schedule pod, incompatible with provisioner "default", incompatible requirements, key karpenter.sh/provisioner-name, karpenter.sh/provisioner-name DoesNotExist not in karpenter.sh/provisioner-name In [default] {"commit": "b157d45", "pod": "karpenter/karpenter-5755bb5b54-rh65t"}
I believe this is due to the pod topology spread constraints defined in the Karpenter deployment here:
https://github.com/aws/karpenter/blob/main/charts/karpenter/values.yaml#L73-L77
You can read further on what pod topologySpreadConstraints do here:
https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/
If you increase desired_size to 2, which matches the default deployment replica count above, that should resolve the error.
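For example, in the eks_managed_node_groups block of main.tf above, the initial node group could be sized to two nodes. A minimal sketch of that change (everything else in the block stays as in the original):
eks_managed_node_groups = {
  initial = {
    instance_types        = ["m5.large"]
    create_security_group = false
    # Two nodes so both default Karpenter replicas can be scheduled
    # despite the topology spread constraints referenced above
    min_size     = 2
    max_size     = 2
    desired_size = 2
  }
}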
I am trying to set the boot disk size for a Kubernetes cluster with node auto-provisioning, as follows:
resource "google_container_cluster" "gc-dev-kube-ds0" {
.
.
.
cluster_autoscaling {
enabled = true
resource_limits {
resource_type = "cpu"
minimum = 4
maximum = 150
}
resource_limits {
resource_type = "memory"
minimum = 4
maximum = 600
}
resource_limits {
resource_type = "nvidia-tesla-v100"
minimum = 0
maximum = 4
}
}
disk_size_gb = 200
}
but I am getting the following error:
Error: Unsupported argument
on kubernetes.tf line 65, in resource "google_container_cluster" "gc-dev-kube-ds0":
65: disk_size_gb = 200
An argument named "disk_size_gb" is not expected here.
I also checked the Terraform documentation, but nothing is mentioned about this.
You are getting the error because the disk_size_gb argument must be placed inside the node_config block, as follows.
node_config {
disk_size_gb = 200
}
The Terraform documentation for google_container_cluster shows that this argument needs to be under that block.
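A minimal sketch of where it goes in the resource above (only the relevant parts shown; the cluster_autoscaling block stays as in the original):
resource "google_container_cluster" "gc-dev-kube-ds0" {
  # ... cluster_autoscaling block as above ...

  # disk_size_gb is an argument of node_config, not of the cluster itself
  node_config {
    disk_size_gb = 200
  }
}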
In my Terraform code (0.15.x):
resource "kubernetes_service" "keycloak_service" {
metadata {
name = "balab-service"
namespace = "project-ns
annotations = {
"alb.ingress.kubernetes.io/aws-load-balancer-healthcheck-interval-seconds" = trim("'60'", "\\"")
}
}
spec {
....
...
In the Service in EKS (via kubectl), I keep seeing:
alb.ingress.kubernetes.io/aws-load-balancer-healthcheck-interval-seconds:
"'60'"
$ terraform console
trim("'60'", "\"")
"'60'"
I expect
alb.ingress.kubernetes.io/aws-load-balancer-healthcheck-interval-seconds:
'60'
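For context on what trim is doing there: Terraform's trim only removes the given characters from the start and end of the string, so trimming a double quote from a value that contains none leaves it unchanged, while trimming the single quotes would strip them. An illustrative console session:
$ terraform console
trim("'60'", "\"")
"'60'"
trim("'60'", "'")
"60"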
I tried to load PostgreSQL data into GeoMesa (with a Cassandra datastore) via the JDBC converter.
Loading from a shapefile works fine, so the Cassandra and GeoMesa setup is okay.
Next I tried to load data from PostgreSQL.
Command:
echo "SELECT year, geom, grondgebruik, crop_code, crop_name, fieldid, global_id, area, perimeter, geohash FROM v_gewaspercelen2018" | bin/geomesa-cassandra ingest -c catalog -P cassandraserver:9042 -k agrodatacube -f parcel -C geomesa.converters.parcel -u -p
The converter definition file geomesa.converters.parcel looks like this:
geomesa.converters.parcel = {
type = "jdbc"
connection = "dbc:postgresql://postgresserver:5432/agrodatacube"
id-field="toString($5)"
fields = [
{ name = "fieldid", transform = "$5" }
{ name = "global_id", transform = "$6" }
{ name = "year", transform = "$0" }
{ name = "area", transform = "$7" }
{ name = "perimeter", transform = "$8" }
{ name = "grondgebruik", transform = "$2" }
{ name = "crop_code", transform = "$3" }
{ name = "crop_name", transform = "$4" }
{ name = "geohash", transform = "$9" }
{ name = "geom", transform = "$1" }
]
}
The geomesa output is:
INFO Schema 'parcel' exists
INFO Running ingestion in local mode
INFO Ingesting from stdin with 1 thread
[ ] 0% complete 0 ingested 0 failed in 00:00:01
ERROR Fatal error running local ingest worker on <stdin>
[ ] 0% complete 0 ingested 0 failed in 00:00:01
INFO Local ingestion complete in 00:00:01
INFO Ingested 0 features with no failures for file: <stdin>
WARN Some files caused errors, ingest counts may not be accurate
Does someone have a clue what is wrong here?
You can check in the logs folder for more detailed errors. However, just at a first glance, the JDBC converter follows standard result set numbering, meaning the first field is $1 (not $0). In addition, you may need to transform your geometry with a transform function, i.e. geometry($2).
Thanks Emilio, both suggestions made sense!
I made the converter field numbering start at 1.
Inside the converter definition file I changed
{ name = "geom", transform = "$2" }
into
{ name = "geom", transform = "geometry($2)" }
The SQL Select command should be:
SELECT year, ST_AsText(geom), .... FROM v_gewaspercelen2018
By the way, the username and password are part of the connection string (which is inside the file geomesa.converters.parcel):
connection =
"jdbc:postgresql://postgresserver:5432/agrodatacube?user=username&password=password"
So the -u and -p flags do not appear in the final command:
echo "SELECT year, ST_AsText(geom), grondgebruik, crop_code,
crop_name, fieldid, global_id, area, perimeter, geohash FROM
v_gewaspercelen2018" | bin/geomesa-cassandra ingest -c catalog -P
cassandraserver:9042 -k agrodatacube -f parcel -C
geomesa.converters.parcel
With these changes it works.
Thanks again!
Hugo
EDIT:
SOLUTION: I forgot to add target_cpu_utilization_percentage to the autoscaler.tf file.
I want a web service in Python (or another language) running on Kubernetes, but with autoscaling.
I created a Deployment and a Horizontal Pod Autoscaler, but it is not working.
I'm using Terraform to configure Kubernetes.
I have these files:
Deployments.tf
resource "kubernetes_deployment" "rui-test" {
metadata {
name = "rui-test"
labels {
app = "rui-test"
}
}
spec {
strategy = {
type = "RollingUpdate"
rolling_update = {
max_unavailable = "26%" # This is not working
}
}
selector = {
match_labels = {
app = "rui-test"
}
}
template = {
metadata = {
labels = {
app = "rui-test"
}
}
spec = {
container {
name = "python-test1"
image = "***************************"
}
}
}
}
}
Autoscaler.tf
resource "kubernetes_horizontal_pod_autoscaler" "test-rui" {
metadata {
name = "test-rui"
}
spec {
max_replicas = 10 # THIS IS NOT WORKING
min_replicas = 3 # THIS IS NOT WORKING
scale_target_ref {
kind = "Deployment"
name = "test-rui" # Name of deployment
}
}
}
Service.tf
resource "kubernetes_service" "rui-test" {
metadata {
name = "rui-test"
labels {
app = "rui-test"
}
}
spec {
selector {
app = "rui-test"
}
type = "LoadBalancer" # Use 'cluster_ip = "None"' or 'type = "LoadBalancer"'
port {
name = "http"
port = 8080
}
}
}
When I run kubectl get hpa I see this:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
rui-test Deployment/rui-test <unknown>/80% 1 3 1 1h
Instead of:
rui-test Deployment/rui-test <unknown>/79% 3 10 1 1h
That is what I want.
But if I run kubectl autoscale deployment rui-test --min=3 --max=10 --cpu-percent=81 I see this:
Error from server (AlreadyExists): horizontalpodautoscalers.autoscaling "rui-test" already exists
That is what appears in Kubernetes.
You are missing the metrics server. Kubernetes needs to determine current CPU/Memory usage so that it can autoscale up and down.
One way to know if you have the metrics server installed is to run:
$ kubectl top node
$ kubectl top pod
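If the metrics API turns out to be missing, one common way to install the metrics server (assuming the upstream manifest is acceptable for your cluster) is:
$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml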
The Horizontal Pod Autoscaler also depends on resource requests being configured for your Deployment.
From the documentation:
Please note that if some of the pod’s containers do not have the relevant resource request set, CPU utilization for the pod will not be defined and the autoscaler will not take any action for that metric.
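In Terraform terms, that means giving the container CPU requests/limits and setting a CPU target on the HPA (the target_cpu_utilization_percentage mentioned in the edit at the top). A rough sketch, assuming a recent kubernetes provider (requests/limits are maps in 2.x; older versions nest them as blocks) and placeholder CPU values:
resource "kubernetes_deployment" "rui-test" {
  # metadata, selector and the rest of the spec as above ...
  spec {
    template {
      spec {
        container {
          name  = "python-test1"
          image = "..." # redacted as in the original

          # The HPA's utilization percentage is measured against this request
          resources {
            requests = {
              cpu = "250m"
            }
            limits = {
              cpu = "500m"
            }
          }
        }
      }
    }
  }
}

resource "kubernetes_horizontal_pod_autoscaler" "test-rui" {
  metadata {
    name = "test-rui"
  }
  spec {
    min_replicas                      = 3
    max_replicas                      = 10
    target_cpu_utilization_percentage = 80

    scale_target_ref {
      api_version = "apps/v1"
      kind        = "Deployment"
      name        = "rui-test" # must match the Deployment's metadata name
    }
  }
}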