How to debug the Akka association process? - scala

Here is the scenario:
I have packaged a Scala project with Spray into a jar file.
I launch the jar file on RedHat 6.5 in VirtualBox (IP 192.168.1.38).
I launch the jar file on RedHat 6.5 in VirtualBox (IP 192.168.1.41).
Everything works locally - I can send a REST request to each virtual machine and get a response.
Problem
The Akka systems cannot form a cluster. I run 192.168.1.38 with default settings, but 192.168.1.41 has an additional property, akka.cluster.seed-nodes, set to akka.tcp://mySystem@192.168.1.38:2551. So I get:
[WARN] [12/09/2014 17:10:24.043] [mySystem-akka.remote.default-remote-dispatcher-8] [akka.tcp://mySystem@192.168.1.41:2551/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FmySystem%40192.168.1.38%3A2551-0] Association with remote system [akka.tcp://mySystem@192.168.1.38:2551] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://mySystem@192.168.1.38:2551]].
There are no other errors or warnings. How can I test the Akka association, or print the association settings for debugging?
Can Linux settings also influence Akka association?

Most probably iptables is blocking the port in question; if this is just a test configuration, disable iptables:
service iptables save
service iptables stop
chkconfig iptables off
service ip6tables save
service ip6tables stop
chkconfig ip6tables off
If that does not help, check your SELinux configuration with the getenforce command; for test purposes you can completely disable it as well (see the SELinux manual).
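To test the association path itself, first verify plain TCP connectivity to the remoting port; assuming nc and netstat are available on the VMs, a quick check looks like this:
# from 192.168.1.41: is the seed node's remoting port reachable?
nc -zv 192.168.1.38 2551
# on 192.168.1.38: is anything actually listening on that port?
netstat -tlnp | grep 2551
If the port is unreachable while the service is listening, a firewall (iptables) or SELinux is the usual suspect.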
As for your application.conf, try using the following configuration on each node:
akka {
  log-dead-letters = on
  loglevel = "debug"
  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
  }
  extensions = ["akka.contrib.pattern.ClusterReceptionistExtension"]
  remote {
    log-remote-lifecycle-events = off
    netty.tcp {
      port = 6001
    }
  }
  cluster {
    seed-nodes = [
      "akka.tcp://ActorSystem@192.168.1.38:6001",
      "akka.tcp://ActorSystem@192.168.1.41:6001"
    ]
    auto-down-unreachable-after = 10s
  }
}
All the cluster-node logs are written at info level, but having a debug log level in a test environment is in general a good idea.
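You can also watch the join process programmatically by subscribing to cluster events; a minimal sketch (the actor name is my own, not from your project):
import akka.actor.{Actor, ActorLogging, Props}
import akka.cluster.Cluster
import akka.cluster.ClusterEvent._

// Logs membership and reachability events on this node.
class ClusterWatcher extends Actor with ActorLogging {
  private val cluster = Cluster(context.system)
  override def preStart(): Unit =
    cluster.subscribe(self, initialStateMode = InitialStateAsEvents,
      classOf[MemberEvent], classOf[UnreachableMember])
  override def postStop(): Unit = cluster.unsubscribe(self)
  def receive = {
    case e: MemberEvent       => log.info("Cluster member event: {}", e)
    case UnreachableMember(m) => log.info("Member unreachable: {}", m)
  }
}
Start it on each node with system.actorOf(Props[ClusterWatcher], "clusterWatcher") and you will see exactly when the association succeeds or fails.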
When the second node joins the cluster, you should see a log entry like the following:
INFO [ActorSystem-akka.actor.default-dispatcher-4] [Cluster(akka://ActorSystem)] - Cluster Node [akka.tcp://ActorSystem@10.0.1.41:6001] - Marking node(s) as REACHABLE [Member(address = akka.tcp://ActorSystem@10.0.1.41:6001, status = Up)]
Cluster state can also be monitored via JMX, using the akka.Cluster MXBean:
{ "self-address": "akka.tcp://ActorSystem#10.0.1.82:6001", "members": [ { "address": "akka.tcp://ActorSystem#10.0.1.82:6001", "status": "Up" } ], "unreachable": [ ] }

Related

Linkerd inbound port annotation leads to "Failed to bind inbound listener"

We are using Linkerd 2.11.1 on Azure AKS Kubernetes. Among others, there is a Deployment using an Alpine Linux image containing Apache/mod_php/PHP8 serving an API. HTTPS is terminated by Traefik v2 with cert-manager, so incoming traffic to the APIs is on port 80. The Linkerd proxy container is injected as a sidecar.
Recently I noticed that the API containers return 504 errors for a short period of time during a rolling deployment. In the sidecar's log, I found the following:
[ 0.000590s] INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[ 0.001062s] INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[ 0.001078s] INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[ 0.001081s] INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[ 0.001083s] INFO ThreadId(01) linkerd2_proxy: Tap interface on 0.0.0.0:4190
[ 0.001085s] INFO ThreadId(01) linkerd2_proxy: Local identity is default.my-api.serviceaccount.identity.linkerd.cluster.local
[ 0.001088s] INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.001090s] INFO ThreadId(01) linkerd2_proxy: Destinations resolved via linkerd-dst-headless.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.014676s] INFO ThreadId(02) daemon:identity: linkerd_app: Certified identity: default.my-api.serviceaccount.identity.linkerd.cluster.local
[ 3674.769855s] INFO ThreadId(01) inbound:server{port=80}: linkerd_app_inbound::detect: Handling connection as opaque timeout=linkerd_proxy_http::version::Version protocol detection timed out after 10s
My guess is that this protocol detection somehow leads to the 504 errors. However, if I add the Linkerd inbound port annotation to the pod template (Terraform syntax):
resource "kubernetes_deployment" "my_api" {
metadata {
name = "my-api"
namespace = "my-api"
labels = {
app = "my-api"
}
}
spec {
replicas = 20
selector {
match_labels = {
app = "my-api"
}
}
template {
metadata {
labels = {
app = "my-api"
}
annotations = {
"config.linkerd.io/inbound-port" = "80"
}
}
I get the following:
time="2022-03-01T14:56:44Z" level=info msg="Found pre-existing key: /var/run/linkerd/identity/end-entity/key.p8"
time="2022-03-01T14:56:44Z" level=info msg="Found pre-existing CSR: /var/run/linkerd/identity/end-entity/csr.der"
[ 0.000547s] INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
thread 'main' panicked at 'Failed to bind inbound listener: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }', /github/workspace/linkerd/app/src/lib.rs:195:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Can somebody tell me why it fails to bind the inbound listener?
Any help is much appreciated,
thanks,
Pascal
Found it: Kubernetes asynchronously sends requests to shut pods down and to stop routing traffic to them. If a pod shuts down faster than it is removed from the endpoint lists, it can still receive requests while already being dead.
To fix this, I added a preStop lifecycle hook to the application container:
lifecycle {
  pre_stop {
    exec {
      command = ["/bin/sh", "-c", "sleep 5"]
    }
  }
}
and the following annotation to the pod template:
annotations = {
  "config.alpha.linkerd.io/proxy-wait-before-exit-seconds" = "10"
}
Documented here:
https://linkerd.io/2.11/tasks/graceful-shutdown/
and here:
https://blog.gruntwork.io/delaying-shutdown-to-wait-for-pod-deletion-propagation-445f779a8304
annotations = {
  "config.linkerd.io/inbound-port" = "80"
}
I don't think you want this setting; Linkerd will transparently proxy connections without you setting anything.
This setting tells Linkerd's proxy to listen on port 80 itself, which would likely conflict with your web server's port configuration. The specific error you're hitting, though, is that the Linkerd proxy does not run as root, so it does not have permission to bind port 80.
I'd expect it all to work if you removed that annotation :)

Consul agent on kubernetes, on node or pod?

I deployed an AWS EKS cluster via Terraform. I also deployed Consul following HashiCorp's tutorial, and I see the nodes in Consul's UI.
Now I'm wondering how all the Consul agents will know about the pods I deploy. I deploy something and it's not shown anywhere in Consul.
I can't find any documentation on how to register pods (services) with Consul via the node's Consul agent. Do I need to configure that somewhere? Should I skip the node's agent and register the service straight from the pod? HashiCorp discourages this since it may increase resource utilization, depending on how many pods one deploys on a given node. But then how does the node's agent know about the services deployed on that node?
Moreover, when I deploy a pod on a node, ssh into the node, and install Consul, the Consul agent can't find the Consul server (as opposed to the node itself, which can find it).
EDIT:
Bottom line is I can't find WHERE to add the configuration. If I execute ON THE POD:
consul members
It works properly and I get:
Node Address Status Type Build Protocol DC Segment
consul-consul-server-0 10.0.103.23:8301 alive server 1.10.0 2 full <all>
consul-consul-server-1 10.0.101.151:8301 alive server 1.10.0 2 full <all>
consul-consul-server-2 10.0.102.112:8301 alive server 1.10.0 2 full <all>
ip-10-0-101-129.ec2.internal 10.0.101.70:8301 alive client 1.10.0 2 full <default>
ip-10-0-102-175.ec2.internal 10.0.102.244:8301 alive client 1.10.0 2 full <default>
ip-10-0-103-240.ec2.internal 10.0.103.245:8301 alive client 1.10.0 2 full <default>
ip-10-0-3-223.ec2.internal 10.0.3.249:8301 alive client 1.10.0 2 full <default>
But if I execute:
# consul agent -datacenter=voip-full -config-dir=/etc/consul.d/ -log-file=log-file -advertise=$(wget -q -O - http://169.254.169.254/latest/meta-data/local-ipv4)
I get the following error:
==> Starting Consul agent...
Version: '1.10.1'
Node ID: 'f10070e7-9910-06c7-0e12-6edb6cc4c9b9'
Node name: 'ip-10-0-3-223.ec2.internal'
Datacenter: 'voip-full' (Segment: '')
Server: false (Bootstrap: false)
Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
Cluster Addr: 10.0.3.223 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
==> Log data will now stream in as it occurs:
2021-08-16T18:23:06.936Z [WARN] agent: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
2021-08-16T18:23:06.936Z [WARN] agent: Node name "ip-10-0-3-223.ec2.internal" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2021-08-16T18:23:06.946Z [WARN] agent.auto_config: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
2021-08-16T18:23:06.947Z [WARN] agent.auto_config: Node name "ip-10-0-3-223.ec2.internal" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2021-08-16T18:23:06.948Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: ip-10-0-3-223.ec2.internal 10.0.3.223
2021-08-16T18:23:06.948Z [INFO] agent.router: Initializing LAN area manager
2021-08-16T18:23:06.950Z [INFO] agent: Started DNS server: address=127.0.0.1:8600 network=udp
2021-08-16T18:23:06.950Z [WARN] agent.client.serf.lan: serf: Failed to re-join any previously known node
2021-08-16T18:23:06.950Z [INFO] agent: Started DNS server: address=127.0.0.1:8600 network=tcp
2021-08-16T18:23:06.951Z [INFO] agent: Starting server: address=127.0.0.1:8500 network=tcp protocol=http
2021-08-16T18:23:06.951Z [WARN] agent: DEPRECATED Backwards compatibility with pre-1.9 metrics enabled. These metrics will be removed in a future version of Consul. Set `telemetry { disable_compat_1.9 = true }` to disable them.
2021-08-16T18:23:06.953Z [INFO] agent: started state syncer
2021-08-16T18:23:06.953Z [INFO] agent: Consul agent running!
2021-08-16T18:23:06.953Z [WARN] agent.router.manager: No servers available
2021-08-16T18:23:06.954Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2021-08-16T18:23:34.169Z [WARN] agent.router.manager: No servers available
2021-08-16T18:23:34.169Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
So where do I add the config?
I also tried adding a service in k8s pointing to the pod, but the service doesn't come up on consul's UI...
What do you guys recommend?
Thanks
Consul knows where these services are located because each service registers with its local Consul client. Operators can register services manually, configuration management tools can register services when they are deployed, or container orchestration platforms can register services automatically via integrations.
If you are planning to use the manual option, you have to register the service with Consul yourself. Something like:
echo '{
  "service": {
    "name": "web",
    "tags": [
      "rails"
    ],
    "port": 80
  }
}' > ./consul.d/web.json
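After writing the file, the agent has to pick it up; assuming the agent was started with -config-dir pointing at ./consul.d, a reload is enough:
# re-read the service definitions from the config directory
consul reload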
You can find a good example at: https://thenewstack.io/implementing-service-discovery-of-microservices-with-consul/
This is also a very good document for detailed configuration of health checks and service discovery: https://cloud.spring.io/spring-cloud-consul/multi/multi_spring-cloud-consul-discovery.html
Official documentation: https://learn.hashicorp.com/tutorials/consul/get-started-service-discovery
BTW, I was finally able to figure out the issue:
consul-dns is not deployed by default; I had to deploy it manually and then forward all .consul requests from CoreDNS to consul-dns.
All is working now. Thanks!
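For reference, the CoreDNS side of that forwarding is typically a stanza like the following added to the coredns ConfigMap (the forward address here is a placeholder; use the ClusterIP of your consul-dns service):
consul:53 {
    errors
    cache 30
    forward . 10.0.0.10
}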

Singleton cluster actor is not starting up

The following cluster singleton is not starting up.
commander = system.actorOf(
  ClusterSingletonManager.props(
    Commander.props(this),
    terminationMessage = PoisonPill.getInstance,
    settings = ClusterSingletonManagerSettings.create(system).withRole("commander")
  ),
  name = "Commander")
No error messages are thrown.
Logs are:
[INFO] [08/03/2016 11:43:58.656] [ScalaTest-run-running-ClusterSuite] [akka.remote.Remoting] Starting remoting
[INFO] [08/03/2016 11:43:59.007] [ScalaTest-run-running-ClusterSuite] [akka.remote.Remoting] Remoting started; listening on addresses :[akka.tcp://galaxyFarFarAway@127.0.0.1:59592]
[INFO] [08/03/2016 11:43:59.035] [ScalaTest-run-running-ClusterSuite] [akka.cluster.Cluster(akka://galaxyFarFarAway)] Cluster Node [akka.tcp://galaxyFarFarAway@127.0.0.1:59592] - Starting up...
[INFO] [08/03/2016 11:43:59.218] [ScalaTest-run-running-ClusterSuite] [akka.cluster.Cluster(akka://galaxyFarFarAway)] Cluster Node [akka.tcp://galaxyFarFarAway@127.0.0.1:59592] - Registered cluster JMX MBean [akka:type=Cluster]
[INFO] [08/03/2016 11:43:59.218] [ScalaTest-run-running-ClusterSuite] [akka.cluster.Cluster(akka://galaxyFarFarAway)] Cluster Node [akka.tcp://galaxyFarFarAway@127.0.0.1:59592] - Started up successfully
[INFO] [08/03/2016 11:43:59.247] [galaxyFarFarAway-akka.actor.default-dispatcher-2] [akka.cluster.Cluster(akka://galaxyFarFarAway)] Cluster Node [akka.tcp://galaxyFarFarAway@127.0.0.1:59592] - Metrics will be retreived from MBeans, and may be incorrect on some platforms. To increase metric accuracy add the 'sigar.jar' to the classpath and the appropriate platform-specific native libary to 'java.library.path'. Reason: java.lang.ClassNotFoundException: org.hyperic.sigar.Sigar
[INFO] [08/03/2016 11:43:59.257] [galaxyFarFarAway-akka.actor.default-dispatcher-2] [akka.cluster.Cluster(akka://galaxyFarFarAway)] Cluster Node [akka.tcp://galaxyFarFarAway@127.0.0.1:59592] - Metrics collection has started successfully
[INFO] [08/03/2016 11:43:59.268] [galaxyFarFarAway-akka.actor.default-dispatcher-3] [akka.cluster.Cluster(akka://galaxyFarFarAway)] Cluster Node [akka.tcp://galaxyFarFarAway@127.0.0.1:59592] - No seed-nodes configured, manual cluster join required
Disconnected from the target VM, address: '127.0.0.1:59574', transport: 'socket'
The configuration is:
akka {
  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
    default-dispatcher {
      throughput = 10
    }
  }
  cluster {
    roles = [commander]
  }
  remote {
    log-remote-lifecycle-events = off
    netty.tcp {
      hostname = "127.0.0.1"
      port = 0
    }
  }
  extensions = ["akka.cluster.metrics.ClusterMetricsExtension"]
}
When I debug the code of the Commander class, the constructor is never called. However, when I omit the ClusterSingletonManager and just create the actor with Props, it does work: the Commander actor is created.
I suspect incorrect configuration is behind this issue. Do you have any remarks about this?
You've sensed quite right: you haven't specified the seed node configuration for Akka clustering. You can see this in the last line of the log:
[akka.tcp://galaxyFarFarAway@127.0.0.1:59592] - No seed-nodes configured, manual cluster join required
Because you haven't specified any seed nodes in the configuration file, Akka waits for you to specify them programmatically. You can specify the seed nodes in the config like this:
akka.cluster.seed-nodes = [
  "akka.tcp://yourClusterSystem@127.0.0.1:2551",
  "akka.tcp://yourClusterSystem@127.0.0.1:2552"
]
Alternatively, you can call the joinSeedNodes method to join the cluster programmatically. In both cases, you have to specify at least one seed node that is reachable. The actor system itself can also act as a seed node.
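A minimal sketch of the programmatic variant (the system name and port are taken from your logs; adjust as needed):
import akka.actor.{ActorSystem, Address}
import akka.cluster.Cluster

val system = ActorSystem("galaxyFarFarAway")
// Join via a list of seed node addresses; at least one must be reachable.
Cluster(system).joinSeedNodes(List(
  Address("akka.tcp", "galaxyFarFarAway", "127.0.0.1", 2551)))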
Once the seed nodes have been specified and the actor system has joined the cluster, Akka features depending on clustering (cluster singletons, sharding etc.) will boot up. This is why you can launch an ordinary actor, but not the singleton.
For more information on setting up seed nodes see Akka cluster documentation.

Akka Singleton Cluster: resolveOnce of a worker by master fails after restart

I am using Akka Cluster 2.4.3 and trying to set up a simple cluster on my machine to understand its workings better. I have a singleton cluster with remoting enabled, a primary and a standby master, and one worker node. Each of these three runs in a separate JVM.
Things work fine when all the nodes are started the first time. If I kill and restart the worker, I see the following issues.
Restart Worker
When the worker comes back after the restart, the master, on receiving the MemberUp event, tries to resolve the ActorRef from the member address the following way:
context.actorSelection(member.address.toString).resolveOne(15 seconds)
This fails with an ActorNotFound exception. It works with no problem when all the nodes are coming up for the first time in the cluster.
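For context, the resolution races against the worker's startup: MemberUp only guarantees the node has joined, not that its actors exist yet. A sketch of a retrying variant (the /user/worker path and the retry count are assumptions, not from the original setup):
import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.duration._
import akka.actor.{ActorContext, ActorNotFound, ActorRef, Address}

// Resolve the worker's ActorRef, retrying a few times on ActorNotFound.
def resolveWorker(context: ActorContext, address: Address, retries: Int = 3)
                 (implicit ec: ExecutionContext): Future[ActorRef] =
  context.actorSelection(s"$address/user/worker")
    .resolveOne(15.seconds)
    .recoverWith { case _: ActorNotFound if retries > 0 =>
      resolveWorker(context, address, retries - 1)
    }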
Restart worker again
This time, the worker comes up with the following message
[WARN] [04/15/2016 18:24:24.991] [clustersystem-akka.remote.default-remote-dispatcher-5] [akka.remote.Remoting] Tried to associate with unreachable remote address [akka.tcp://clustersystem@host1:2551]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.]
Restart worker again
This time the resolveOne on a MemberUp event works.
I am having a bit of difficulty understanding what is happening here. I have looked into the docs but did not find anything there that helps.
application.conf
akka {
  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
  }
  log-dead-letters = off
  jvm-exit-on-fatal-error = on
  loglevel = "DEBUG"
  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    log-remote-lifecycle-events = off
    netty.tcp {
      hostname = "host1"
      port = 0
    }
  }
  cluster {
    seed-nodes = [
      "akka.tcp://clustersystem@host1:2551",
      "akka.tcp://clustersystem@host1:2552"]
    auto-down-unreachable-after = 10s
  }
  extensions = ["akka.cluster.metrics.ClusterMetricsExtension"]
}
I start the master nodes on ports 2551 and 2552 (the ports are provided as command-line args) and the worker on port 3551.

akka remote actor running on local actor

I am learning akka-remote and trying to redo http://www.typesafe.com/activator/template/akka-sample-remote-scala myself.
When I run the project in two separate JVMs, I see:
$ clear;java -jar akkaio-remote/target/akka-remote-jar-with-dependencies.jar com.harit.akkaio.remote.RemoteApp ProcessingActor
ProcessingActorSystem Started
and
$ clear;java -jar akkaio-remote/target/akka-remote-jar-with-dependencies.jar com.harit.akkaio.remote.RemoteApp WatchingActor
WatchingActorSystem Started
asking processor to process
processing big things
I asked my ProcessingActorSystem to run on port 2552:
include "common"
akka {
# LISTEN on tcp port 2552
remote.netty.tcp.port = 2552
}
and I told my other system (WatchingActorSystem) to run on port 2554 but to start processingActor on port 2552:
include "common"
akka {
actor {
deployment {
"/processingActor/*" {
remote = "akka.tcp://ProcessingActorSystem#127.0.0.1:2552"
}
}
}
remote.netty.tcp.port = 2554
}
and common configures the right provider:
akka {
  actor {
    provider = "akka.remote.RemoteActorRefProvider"
  }
  remote {
    netty.tcp {
      hostname = "127.0.0.1"
    }
  }
}
Questions/Concerns
From the logs, I see that the processingActor is running on the WatchingActorSystem and not on the ProcessingActorSystem. What is going wrong?
How can I see that the two ActorSystems are connecting to each other? I do not see any logging, although in the example I shared, logging happens. What am I missing?
The entire code is posted on Github and runs as well
1) Your deployment configuration is set up to have all the children of processingActor deployed remotely, as described in the akka configuration docs.
You should set it to this instead:
deployment {
  "/processingActor" {
    remote = "akka.tcp://ProcessingActorSystem@127.0.0.1:2552"
  }
}
2) You need to set your log level to something useful, as described in the akka logging documentation.
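For example (a sketch using standard Akka settings; adjust to taste), adding this to common.conf makes the association events between the two systems visible in the logs:
akka {
  loglevel = "DEBUG"
  # log association/disassociation and other remoting lifecycle events
  remote.log-remote-lifecycle-events = on
}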