Akka clustering not working - actors don't start

This is what the actor section of my application.conf looks like:
akka {
  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
    unstarted-push-timeout = 100s
    default-mailbox {
      mailbox-type = "akka.dispatch.SingleConsumerOnlyUnboundedMailbox"
      mailbox-push-timeout-time = 2s
    }
    default-dispatcher {
      type = Dispatcher
      executor = "fork-join-executor"
      fork-join-executor {
        parallelism-min = 16
        parallelism-factor = 4.0
        parallelism-max = 64
      }
      throughput = 1
    }
    job-manager-dispatcher {
      type = Dispatcher
      executor = "fork-join-executor"
      fork-join-executor {
        parallelism-min = 16
        parallelism-factor = 4.0
        parallelism-max = 64
      }
      throughput = 1
    }
  }
  remote {
    log-remote-lifecycle-events = on
    netty.tcp {
      hostname = "0.0.0.0"
      port = 2557
    }
  }
  extensions = [
    "akka.contrib.pattern.DistributedPubSubExtension"
  ]
  cluster {
    seed-nodes = [
      "akka.tcp://dispatcher#0.0.0.0:2557"
    ]
    auto-down-unreachable-after = 30s
  }
}
akka.contrib.cluster.pub-sub {
  name = dispatcherPubSubMediator
  role = ""
  routing-logic = round-robin
  gossip-interval = 1s
  removed-time-to-live = 120s
}
This is how I create the actors:
val aRef = instances match {
  case 1 =>
    system.actorOf(Props[T].withDispatcher(dispatcher), name)
  case _ =>
    system.actorOf(
      ClusterRouterPool(
        AdaptiveLoadBalancingPool(SystemLoadAverageMetricsSelector),
        ClusterRouterPoolSettings(
          totalInstances = instances * 64,
          maxInstancesPerNode = instances,
          allowLocalRoutees = isLocal,
          useRole = None)
      ).props(Props[T]).withDispatcher(dispatcher), name)
}
ClusterReceptionistExtension(system).registerService(aRef)
The single-instance (local) creation works fine, but the cluster pool instantiation is not working: there is no exception or error, yet the routees' constructors, preStart, etc. are never called.
Any help appreciated.
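A quick way to see what is actually happening: a ClusterRouterPool only deploys routees once the node has joined the cluster, so if the cluster never forms (for example because the seed-node address is not reachable) the routees' constructors and preStart never run, with no error from actorOf. A minimal diagnostic sketch of a membership listener, modelled on the MyCluster actor further down this page (the ClusterStateListener name is made up):

import akka.actor.{Actor, ActorLogging, Props}
import akka.cluster.Cluster
import akka.cluster.ClusterEvent.{MemberEvent, MemberUp}

// Logs cluster membership changes; if MemberUp never appears, the cluster has not
// formed and cluster-aware routers will not create any routees.
class ClusterStateListener extends Actor with ActorLogging {
  val cluster = Cluster(context.system)
  override def preStart(): Unit = cluster.subscribe(self, classOf[MemberEvent])
  override def postStop(): Unit = cluster.unsubscribe(self)
  override def receive = {
    case MemberUp(member)   => log.info("Member is up: {}", member.address)
    case event: MemberEvent => log.info("Member event: {}", event)
  }
}

// Usage: system.actorOf(Props[ClusterStateListener], "clusterStateListener")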

Related

Why is Terraform not allowing me to use image_pull_secrets?

I have an image to pull from a private registry. I did all the configuration and added the secret to the pod config under pod.spec.image_pull_secrets, but I am getting an error like:
An argument named "image_pull_secrets" is not expected here. Did you mean to define a block of type "image_pull_secrets"?
As per the documentation this should be OK:
https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/pod#nested-blocks
resource "kubernetes_pod" "main" {
count = data.coder_workspace.me.start_count
metadata {
name = "coder-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}"
namespace = var.workspaces_namespace
}
spec {
image_pull_secrets = {
name = ["coder-ocir-secret"]
}
security_context {
# run_as_user = "1000"
fs_group = "1000"
}
init_container {
name = "init-eclipse"
image = "busybox:latest"
command = [ "chown","-R","1000:1000","/data"]
security_context {
run_as_user = "0"
privileged = "true"
allow_privilege_escalation = "true"
read_only_root_filesystem = "false"
run_as_non_root = "false"
capabilities {
add = ["CAP_SYS_ADMIN","CHOWN",
"FOWNER",
"DAC_OVERRIDE"]
drop = [
"ALL"]
}
}
volume_mount {
mount_path = "/data"
name = "home-coder-vol-${data.coder_workspace.me.owner}-${lower(data.coder_workspace.me.name)}"
}
}
container {
name = "eclipse"
image = "docker.io/manumaan/eclipsevncv2.2:latest"
command = ["sh", "-c", coder_agent.coder.init_script]
image_pull_policy = "Always"
security_context {
run_as_user = "1000"
# fs_group = "1000"
}
env {
name = "CODER_AGENT_TOKEN"
value = coder_agent.coder.token
}
resources {
requests = {
cpu = "${var.cpu}"
memory = "${var.memory}G"
ephemeral-storage = "2Gi"
}
limits = {
cpu = "${var.cpu}"
memory = "${var.memory}G"
ephemeral-storage = "4Gi"
}
}
volume_mount {
mount_path = "/home/coder"
name = "home-coder-vol-${data.coder_workspace.me.owner}-${lower(data.coder_workspace.me.name)}"
}
}
I also tried putting it inside container, and after all the containers inside spec, but it does not accept it. I am going crazy!
I also tried making it not a list, with no difference:
image_pull_secrets = {
  name = "coder-ocir-secret"
}
This might be caused by a typo: image_pull_secrets is a block, so you don't need the =, nor the square brackets ([]), here:
image_pull_secrets = {
  name = ["coder-ocir-secret"]
}
It should instead be:
image_pull_secrets {
  name = "coder-ocir-secret"
}
If you need multiple pull secrets you can define multiple image_pull_secrets blocks, or use dynamic blocks, as sketched below.
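For the dynamic-blocks variant, a minimal sketch; the var.pull_secrets variable is made up for illustration:

variable "pull_secrets" {
  type    = list(string)
  default = ["coder-ocir-secret"]
}

resource "kubernetes_pod" "main" {
  # metadata omitted

  spec {
    # One image_pull_secrets block is generated per entry in var.pull_secrets.
    dynamic "image_pull_secrets" {
      for_each = var.pull_secrets
      content {
        name = image_pull_secrets.value
      }
    }

    # containers, volumes, etc. as in the question
  }
}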
Make sure your block structure and indentation are correct; this one is working for me:
resource "kubernetes_pod" "main" {
metadata {
name = "coder-name"
namespace = "default"
}
spec {
image_pull_secrets {
name = "coder-ocir-secret"
}
security_context {
# run_as_user = "1000"
fs_group = "1000"
}
init_container {
name = "init-eclipse"
image = "busybox:latest"
command = [ "chown","-R","1000:1000","/data"]
security_context {
run_as_user = "0"
privileged = "true"
allow_privilege_escalation = "true"
read_only_root_filesystem = "false"
run_as_non_root = "false"
capabilities {
add = ["CAP_SYS_ADMIN","CHOWN",
"FOWNER",
"DAC_OVERRIDE"]
drop = [
"ALL"]
}
}
volume_mount {
mount_path = "/data"
name = "home-coder-vol-fake-name"
}
}
container {
name = "eclipse"
image = "docker.io/manumaan/eclipsevncv2.2:latest"
command = ["sh", "-c", "command"]
image_pull_policy = "Always"
security_context {
run_as_user = "1000"
# fs_group = "1000"
}
env {
name = "CODER_AGENT_TOKEN"
value = "value"
}
resources {
requests = {
cpu = "1"
memory = "1G"
ephemeral-storage = "2Gi"
}
limits = {
cpu = "1"
memory = "2G"
ephemeral-storage = "4Gi"
}
}
volume_mount {
mount_path = "/home/coder"
name = "home-coder-vol-fake-name"
}
}
}
}

How to properly map ports on an ECS service

I'm having a fair bit of trouble because I cannot figure out how the ports should be configured in my security groups and my load balancer. I was using an example in which the frontend is on port 80, but the application I need to run has to be on port 3000. How would I edit these files to run on port 3000 instead of 80?
# Internal ALB configurations
internal_alb_config = {
  name = "Internal-Alb"
  listeners = {
    "HTTP" = {
      listener_port     = 80
      listener_protocol = "HTTP"
    }
  }
  ingress_rules = [
    {
      from_port   = 0
      to_port     = 0
      protocol    = "tcp"
      cidr_blocks = ["10.10.0.0/16"]
    }
  ]
  egress_rules = [
    {
      from_port   = 0
      to_port     = 0
      protocol    = "-1"
      cidr_blocks = ["10.10.0.0/16"]
    }
  ]
}

public_alb_config = {
  name = "Public-Alb"
  listeners = {
    "HTTP" = {
      listener_port     = 80
      listener_protocol = "HTTP"
    }
  }
  ingress_rules = [
    {
      from_port   = 0
      to_port     = 0
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
    }
  ]
  egress_rules = [
    {
      from_port   = 0
      to_port     = 0
      protocol    = "-1"
      cidr_blocks = ["0.0.0.0/0"]
    }
  ]
}

microservice_config = {
  "Memo-frontend" = {
    name           = "Memo-frontend"
    is_public      = true
    container_port = 3000
    host_port      = 3000
    cpu            = 256
    memory         = 512
    desired_count  = 1
    alb_target_group = {
      port              = 3000
      protocol          = "HTTP"
      path_pattern      = ["/*"]
      health_check_path = "/"
      priority          = 1
    }
    auto_scaling = {
      max_capacity = 2
      min_capacity = 1
      cpu = {
        target_value = 75
      }
      memory = {
        target_value = 75
      }
    }
  },
  "Memo-Backend" = {
    name           = "Memo-Backend"
    is_public      = false
    container_port = 5000
    host_port      = 5000
    cpu            = 256
    memory         = 512
    desired_count  = 1
    alb_target_group = {
      port              = 5000
      protocol          = "HTTP"
      path_pattern      = ["/Memo-Backend*"]
      health_check_path = "/"
      priority          = 1
    }
    auto_scaling = {
      max_capacity = 2
      min_capacity = 1
      cpu = {
        target_value = 75
      }
      memory = {
        target_value = 75
      }
    }
  }
}
The full code can be found here: https://github.com/shashimal/terraform-ecs/blob/master/terraform.tfvars
I need the frontend container to run on port 3000 instead of 80 because I'm hosting a React app.
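If the goal is for the browser to reach the frontend on port 3000 itself (rather than the ALB listening on 80 and forwarding to the container's port 3000, which the target group above already does), the only tfvars change that appears necessary is the public listener. A sketch, assuming the module wires listeners to target groups as in the linked repo and that the security-group rules already allow that port:

public_alb_config = {
  name = "Public-Alb"
  listeners = {
    "HTTP" = {
      listener_port     = 3000   # was 80; the ALB now accepts traffic on 3000
      listener_protocol = "HTTP"
    }
  }
  # ingress_rules / egress_rules unchanged from the question
}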

GKE node pool doesn't autoscale to 0 nodes

I made a GKE cluster using Terraform:
resource "google_container_cluster" "airflow_prd" {
name = "airflow-prd"
remove_default_node_pool = true
initial_node_count = 1
network = var.vpc
location = var.zone_prd
subnetwork = var.subnet_prd
project = "xxxxxx"
private_cluster_config {
enable_private_endpoint = false
enable_private_nodes = true
master_ipv4_cidr_block = "172.13.0.0/28"
master_global_access_config {
enabled = true
}
}
ip_allocation_policy {
cluster_secondary_range_name = ""
}
}
resource "google_container_node_pool" "default_prd" {
name = "default"
cluster = google_container_cluster.airflow_prd.name
initial_node_count = 2
location = var.zone_prd
node_config {
machine_type = "e2-small"
oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
service_account = "xxxxxxxxx.iam.gserviceaccount.com"
}
autoscaling {
max_node_count = 4
min_node_count = 2
}
}
resource "google_container_node_pool" "airflow_prd" {
name = "airflow"
cluster = google_container_cluster.airflow_prd.name
initial_node_count = 0
location = var.zone_prd
node_config {
machine_type = "e2-standard-8"
oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
service_account = "xxxxxx.iam.gserviceaccount.com"
}
autoscaling {
max_node_count = 1
min_node_count = 0
}
}
resource "google_container_node_pool" "etl_32_prd" {
name = "etl-32"
cluster = google_container_cluster.airflow_prd.name
initial_node_count = 0
location = var.zone_prd
node_config {
machine_type = "e2-standard-8"
oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
service_account = "xxxxxx.iam.gserviceaccount.com"
}
autoscaling {
max_node_count = 4
min_node_count = 0
}
}
The problem is with the node pool etl-32. It automatically creates nodes when needed, but when they are no longer needed the number of nodes is reduced to 1, not to 0, which is what I want. How do I make it go down to 0? The system pods are all in the node pool default_prd, which always has 2 nodes.
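One thing that is not visible in the config but commonly blocks scale-to-zero: if any pod other than the ETL workload (for example a kube-system pod) gets scheduled onto the last etl-32 node, the autoscaler keeps that node. A frequently used workaround is to taint the pool so only pods that explicitly tolerate the taint can land there. A sketch of that (the dedicated=etl taint is an arbitrary choice):

resource "google_container_node_pool" "etl_32_prd" {
  name               = "etl-32"
  cluster            = google_container_cluster.airflow_prd.name
  initial_node_count = 0
  location           = var.zone_prd

  node_config {
    machine_type    = "e2-standard-8"
    oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]
    service_account = "xxxxxx.iam.gserviceaccount.com"

    # Only pods with a matching toleration are scheduled here, so nothing
    # else keeps the last node alive and the pool can shrink to zero.
    taint {
      key    = "dedicated"
      value  = "etl"
      effect = "NO_SCHEDULE"
    }
  }

  autoscaling {
    max_node_count = 4
    min_node_count = 0
  }
}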

Assign memory resources to Pods from Terraform

I have a K8s cluster on GCP where I have to run Data Science workloads.
Some of them are in status "Evicted" because:
The node was low on resource: memory. Container base was using 5417924Ki, which exceeds its request of 0.
I manage my architecture with Terraform and know how to manage cluster auto-scaling, but I have no idea, even after reading the docs, how to manage this at the Pod level:
resource "google_container_cluster" "k8s_cluster" {
name = "my-cluster-name
description = ""
location = var.default_region
network = var.network
subnetwork = var.subnetwork
initial_node_count = 1
remove_default_node_pool = true
ip_allocation_policy {
# VPC-native cluster using alias IP addresses
cluster_secondary_range_name = "gke-pods"
services_secondary_range_name = "gke-services"
}
maintenance_policy {
daily_maintenance_window {
start_time = "03:00"
}
}
master_authorized_networks_config {
cidr_blocks {
display_name = var.airflow.display_name
cidr_block = var.airflow.cidr_block
}
cidr_blocks {
display_name = var.gitlab.display_name
cidr_block = var.gitlab.cidr_block
}
}
network_policy {
enabled = false
}
private_cluster_config {
enable_private_endpoint = true
enable_private_nodes = true
master_ipv4_cidr_block = var.vpc_range_k8s_master
}
resource_labels = {
zone = var.zone
role = var.role
env = var.environment
}
# Disable basic auth and client certificate
master_auth {
username = ""
password = ""
client_certificate_config {
issue_client_certificate = false
}
}
cluster_autoscaling {
enabled = true
resource_limits {
resource_type = "cpu"
minimum = 1
maximum = 4
}
resource_limits {
resource_type = "memory"
minimum = 1
maximum = 2
}
}
}
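The eviction is about the Pod spec rather than the cluster: cluster_autoscaling only sets cluster-wide resource limits, it does not give individual containers a memory request. If the Data Science workloads themselves are (or can be) managed from Terraform with the hashicorp/kubernetes provider, the request is set on the container. A minimal sketch where every name, image and value is illustrative:

resource "kubernetes_deployment" "ds_job" {
  metadata {
    name = "ds-job"
  }

  spec {
    replicas = 1

    selector {
      match_labels = { app = "ds-job" }
    }

    template {
      metadata {
        labels = { app = "ds-job" }
      }

      spec {
        container {
          name  = "base"
          image = "my-registry/ds-job:latest"

          resources {
            # Request at least what the container was observed to use (~5.4Gi),
            # so the scheduler places it on a node with enough memory.
            requests = {
              memory = "6Gi"
              cpu    = "1"
            }
            limits = {
              memory = "8Gi"
            }
          }
        }
      }
    }
  }
}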

Akka: How to start a simple local cluster?

akka {
  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
  }
  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      hostname = "127.0.0.1"
      port = 0
    }
  }
}
akka.cluster {
  seed-nodes = [
    "akka.tcp://MyCluster#127.0.0.1:2551",
    "akka.tcp://MyCluster#127.0.0.1:2552"
  ]
}
object AndromedaApiClusterActivator extends App {
  val system = ActorSystem("MyCluster", ConfigFactory.load())
  val clusterController = system.actorOf(Props[MyCluster], name = "MyCluster")
}

class MyCluster extends Actor {
  val log = Logging(context.system, this)
  val cluster = Cluster(context.system)

  override def preStart() {
    cluster.subscribe(self, classOf[MemberEvent], classOf[UnreachableMember])
  }

  override def postStop() {
    cluster.unsubscribe(self)
  }

  override def receive = {
    case x: MemberEvent => log.info("MemberEvent: {}", x)
    case x: UnreachableMember => log.info("UnreachableMember {}: ", x)
  }
}
When I run it I get:
Association with remote system [akka.tcp://MyCluster#127.0.0.1:2552] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://MyCluster#127.0.0.1:2552]] Caused by: [Connection refused: /127.0.0.1:2552]
Association with remote system [akka.tcp://MyCluster#127.0.0.1:2551] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://MyCluster#127.0.0.1:2551]] Caused by: [Connection refused: /127.0.0.1:2551]
I cannot find an explanation. Any help?
You should start the two seed nodes first and then connect to them. To illustrate, I will create both systems inside one App, but you can run two instances of the App with different configs/ports specified on the command line.
object Main extends App {
  val system1 = ActorSystem("MyCluster1", ConfigFactory.load("node1.conf"))
  val system2 = ActorSystem("MyCluster2", ConfigFactory.load("node2.conf"))
  val clusterController = system1.actorOf(Props[MyCluster], name = "MyCluster1")
}
application.conf:
akka {
  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
  }
  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      hostname = "127.0.0.1"
      port = 2552
    }
  }
}
akka.cluster {
  seed-nodes = [
    "akka.tcp://MyCluster1#127.0.0.1:2552",
    "akka.tcp://MyCluster2#127.0.0.1:2553"
  ]
}
To start the other nodes, I suggest specifying a different config per node. node1.conf:
include "application"
akka.remote.netty.tcp.port = 2552
node2.conf:
include "application"
akka.remote.netty.tcp.port = 2553
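Note that all nodes meant to form one cluster must share the same ActorSystem name, since the seed-node addresses embed that name. A sketch of starting one node per JVM with the port taken from the command line instead of separate config files (the names here are illustrative and reuse the question's MyCluster system):

import akka.actor.{ActorSystem, Props}
import com.typesafe.config.ConfigFactory

object ClusterNode extends App {
  // Start twice, e.g. `run 2551` and `run 2552`, to bring up both seed nodes.
  val port = if (args.isEmpty) "0" else args(0)
  val config = ConfigFactory
    .parseString(s"akka.remote.netty.tcp.port = $port")
    .withFallback(ConfigFactory.load())
  val system = ActorSystem("MyCluster", config)
  system.actorOf(Props[MyCluster], name = "clusterListener")
}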