I have a cluster with 2 workers and 1 master. The cluster is implemented with Akka and Scala.
When I kill a worker and try to run it again with the following command:
java -Xms3500M -Xmx3500M -Dlog_file_name=worker1
"-Dconfig.file=F:\cluster\application.conf" -cp cluster.jar
knowmail.Worker worker1 2551
I get the following error:
Connection refused
Association with remote system [akka.tcp://ClusterSystem@xxxxxx:2552] has failed, address is now
gated for [5000] ms. Reason: [Association failed with [akka.tcp://ClusterSystem@xxxx:2552]] Caused by: [Connection
refused: no further information: /xxxx:2552]
The cluster configuration:
akka {
  remote {
    log-remote-lifecycle-events = off
    log-received-messages = on
    log-sent-messages = on
    netty.tcp {
      hostname = "xxxxxx"
      port = 8888
      bind-hostname = 0.0.0.0
      bind-port = 8888
    }
  }
  cluster {
    seed-nodes = [
      "akka.tcp://ClusterSystem@xxxxx:2551",
      "akka.tcp://ClusterSystem@xxxxxx:2552"]
    auto-down-unreachable-after = 20s
  }
  http.client.parsing.max-content-length = infinite
}
Has anyone encountered this error and solved it?
This happens when one of the seed nodes/workers is started before the other seed node has been started.
So one seed node keeps looking for the other and reports the following error:
akka.tcp://ClusterSystem#10.5.2.10:2552] has failed, address is now
gated for [5000] ms.
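The gating message is only a warning while the other node is unreachable; once the other seed node comes up, the association succeeds and the warning stops. If you would rather not depend on the seed-node list at start-up, a restarted worker can also be pointed at a node that is known to be running. A minimal sketch, assuming classic remoting over akka.tcp; the system name matches the question, but the host and port are placeholders:

import akka.actor.{ActorSystem, Address}
import akka.cluster.Cluster
import com.typesafe.config.ConfigFactory

object Worker extends App {
  val system = ActorSystem("ClusterSystem", ConfigFactory.load())

  // Join a node that is already up (placeholder host/port).
  // Cluster(system).joinSeedNodes(...) is the programmatic equivalent of the
  // seed-nodes setting, if you prefer to keep the usual seed-node semantics.
  Cluster(system).join(Address("akka.tcp", "ClusterSystem", "10.5.2.10", 2552))
}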
We are performing a load test on our application using JMeter. Our application uses Consul and Vault as backend services for reading/storing application configuration data. During the load test, our application queries Vault for authentication data for each incoming request. Initially it runs fine for some duration (10 to 15 minutes) and I can see success responses in JMeter, but after some time the responses start failing for all requests. I see the following error in the Vault log for each request, but no error/exception in the Consul log.
Error in the Vault log:
[ERROR] core: failed to lookup token: error=failed to read entry: Get http://localhost:8500/v1/kv//vault/sys/token/id/87f7b82131cb8fa1ef71aa52579f155d4cf9f095: dial tcp [::1]:8500: getsockopt: connection refused
As of now the load is 100 requests (users) every 10 milliseconds with a ramp-up period of 60 seconds, and this runs in a loop. What could be the cause of this error? Is it due to a limited number of connections to port 8500?
Below are my Vault and Consul configurations.
Vault
backend "consul" {
address = "localhost:8500"
path = "app/vault/"
}
listener "tcp" {
address = "10.88.97.216:8200"
cluster_address = "10.88.97.216:8201"
tls_disable = 0
tls_min_version = "tls12"
tls_cert_file = "/var/certs/vault.crt"
tls_key_file = "/var/certs/vault.key"
}
Consul
{
  "data_dir": "/var/consul",
  "log_level": "info",
  "server": true,
  "leave_on_terminate": true,
  "ui": true,
  "client_addr": "127.0.0.1",
  "ports": {
    "dns": 53,
    "serf_lan": 8301,
    "serf_wan": 8302
  },
  "disable_update_check": true,
  "enable_script_checks": true,
  "disable_remote_exec": false,
  "domain": "primehome",
  "limits": {
    "http_max_conns_per_client": 1000,
    "rpc_max_conns_per_client": 1000
  },
  "service": {
    "name": "nginx-consul-https",
    "port": 443,
    "checks": [{
      "http": "https://localhost/nginx_status",
      "tls_skip_verify": true,
      "interval": "10s",
      "timeout": "5s",
      "status": "passing"
    }]
  }
}
I have also configured http_max_conns_per_client and rpc_max_conns_per_client, thinking the problem might be a connection limit per client, but I still see this error in the Vault log.
After taking another look at this, the issue appears to be that Vault is attempting to contact Consul over the IPv6 loopback address (likely because both the IPv4 and IPv6 loopback entries for localhost are present in /etc/hosts), while Consul is only listening on the IPv4 loopback address.
You can likely resolve this through one of the following methods.
Use 127.0.0.1 instead of localhost for Consul's address in the Vault config.
backend "consul" {
address = "127.0.0.1:8500"
path = "app/vault/"
}
Configure Consul to listen on both the IPv4 and IPv6 loopback addresses.
{
"client_addr": "127.0.0.1 [::1]"
}
(Rest of the config omitted for brevity.)
Remove the localhost hostname from the IPv6 loopback entry in /etc/hosts:
127.0.0.1 localhost
# Old hosts entry for ::1
#::1 localhost ip6-localhost ip6-loopback
# New entry
::1 ip6-localhost ip6-loopback
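If you want to confirm which loopback addresses the host maps localhost to before changing anything, you can ask the system resolver directly. A quick sketch (shown in Scala only to match the rest of this thread; it reflects what /etc/hosts and the resolver return, not how Vault itself performs the lookup):

import java.net.InetAddress

object LocalhostCheck extends App {
  // Prints every address "localhost" resolves to, e.g. 127.0.0.1 and/or ::1.
  InetAddress.getAllByName("localhost").foreach(a => println(a.getHostAddress))
}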
I'm new to Akka and want to connect two PCs using Akka remoting, just to run some code on both (as 2 actors). I tried the example in the Akka docs, but when I add the two IP addresses to the config file I always get the errors below.
The first machine gives me this error:
[info] [ERROR] [11/20/2018 13:58:48.833]
[ClusterSystem-akka.remote.default-remote-dispatcher-6]
[akka.remote.artery.Association(akka://ClusterSystem)] Outbound
control stream to [akka://ClusterSystem#192.168.1.2:2552] failed.
Restarting it. Handshake with [akka://ClusterSystem#192.168.1.2:2552]
did not complete within 20000 ms
(akka.remote.artery.OutboundHandshake$HandshakeTimeoutException:
Handshake with [akka://ClusterSystem#192.168.1.2:2552] did not
complete within 20000 ms)
And the second machine:
Exception in thread "main"
akka.remote.RemoteTransportException: Failed to bind TCP to
[192.168.1.3:2552] due to: Bind failed because of
java.net.BindException: Cannot assign requested address: bind
Config file content:
akka {
  actor {
    provider = cluster
  }
  remote {
    artery {
      enabled = on
      transport = tcp
      canonical.hostname = "192.168.1.3"
      canonical.port = 0
    }
  }
  cluster {
    seed-nodes = [
      "akka://ClusterSystem@192.168.1.3:2552",
      "akka://ClusterSystem@192.168.1.2:2552"]
    # auto downing is NOT safe for production deployments.
    # you may want to use it during development, read more about it in the docs.
    auto-down-unreachable-after = 120s
  }
}
# Enable metrics extension in akka-cluster-metrics.
akka.extensions=["akka.cluster.metrics.ClusterMetricsExtension"]
# Sigar native library extract location during tests.
# Note: use per-jvm-instance folder when running multiple jvm on one host.
akka.cluster.metrics.native-library-extract-folder=${user.dir}/target/native
First of all, you don't need cluster configuration for Akka remoting. Both PCs (nodes) should enable remoting with a concrete port instead of "0"; that way you know which port to connect to.
Use the configurations below.
PC1
akka {
  actor {
    provider = remote
  }
  remote {
    artery {
      enabled = on
      transport = tcp
      canonical.hostname = "192.168.1.3"
      canonical.port = 19000
    }
  }
}
PC2
akka {
  actor {
    provider = remote
  }
  remote {
    artery {
      enabled = on
      transport = tcp
      canonical.hostname = "192.168.1.4"
      canonical.port = 18000
    }
  }
}
Use the actor path below to connect to a remote actor from PC1 to PC2:
akka://<PC2-ActorSystem>@192.168.1.4:18000/user/<actor deployed in PC2>
Use the actor path below to connect from PC2 to PC1:
akka://<PC1-ActorSystem>@192.168.1.3:19000/user/<actor deployed in PC1>
The port numbers and IP addresses are samples.
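As a rough illustration of how such a path is used from code, here is a sketch from the PC1 side; the system name, actor name, and message are placeholders, not taken from the question:

import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object Pc1Main extends App {
  // Assumes this system was started with the PC1 configuration shown above.
  val system = ActorSystem("PC1System", ConfigFactory.load())

  // Look up an actor that PC2 registered as /user/remoteWorker and send it a message.
  val remote = system.actorSelection("akka://PC2System@192.168.1.4:18000/user/remoteWorker")
  remote ! "hello from PC1"
}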
I am writing a JBoss EAR application that needs to query the local JBoss server it is running on for the status of deployments. I am running JBoss standalone. I am able to query the JBoss server easily using the JBoss CLI, but using the API with ModelControllerClient I am getting a "connection refused" error. My firewall is completely disabled and I am pointing at localhost, so I am not sure what the problem could be.
Here is the code I am running:
public static void GetStatus() throws Exception {
    ModelNode operation = new ModelNode();
    operation.get( "address" ).add( "deployment", "*" );
    operation.get( "operation" ).set( "read-attribute" );
    ModelControllerClient client = ModelControllerClient.Factory.create( InetAddress.getByName( "127.0.0.1" ), 9990 );
    try {
        ModelNode result = client.execute( new OperationBuilder( operation ).build() );
        List<ModelNode> deployments = result.get( "result" ).asList();
        String deploymentName;
        // finally we can iterate and get the deployment names.
        for ( ModelNode deployment : deployments ) {
            deploymentName = deployment.get( "result" ).asString();
            System.out.println( "deploymentName = " + deploymentName );
        }
    } finally {
        client.close(); // release the management connection
    }
}
... and here is the error I receive when this method is called:
java.io.IOException: java.net.ConnectException: JBAS012144: Could not connect to remote://localhost:9990. The connection timed out
If I run the netstat -tuna command, I see that the server is listening on 0:0:0:0:9990.
Thanks!
Try starting the server bound to localhost instead of 0.0.0.0. Also refer to the sample example at http://middlewaremagic.com/jboss/?p=2676
I decided to play with Lapis (https://github.com/leafo/lapis), but the application fails when I try to query the database (PostgreSQL), with the following output:
2017/07/01 16:04:26 [error] 31284#0: *8 lua entry thread aborted: runtime error: attempt to yield across C-call boundary
stack traceback:
coroutine 0:
[C]: in function 'require'
/usr/local/share/lua/5.1/lapis/init.lua:15: in function 'serve'
content_by_lua(nginx.conf.compiled:22):2: in function , client: 127.0.0.1, server: , request: "GET / HTTP/1.1", host: "localhost:8080"
The code that causes the error:
local db = require("lapis.db")
local res = db.query("SELECT * FROM users");
config.lua:
config({ "development", "production" }, {
postgres = {
host = "0.0.0.0",
port = "5432",
user = "wars_base",
password = "12345",
database = "wars_base"
}
})
The database is running, the table is created, and the table contains one record.
What could be the problem?
Solution: https://github.com/leafo/lapis/issues/556
You need to specify the right server IP in the host parameter.
The IP you have specified, 0.0.0.0, is not a valid destination address; it is normally used as a listen address, with the meaning of "every address".
Usually you can use the 127.0.0.1 address during development.
I want to create a system that will not have a single point of failure.
I was under the impression that routers are the tool for doing that but I'm not sure it works as I would expect.
This is the entry point of my program:
object Main extends App {
  val system = ActorSystem("mySys", ConfigFactory.load("application"))
  val router = system.actorOf(
    ClusterRouterPool(RoundRobinPool(0), ClusterRouterPoolSettings(
      totalInstances = 2, maxInstancesPerNode = 1,
      allowLocalRoutees = false, useRole = Some("testActor"))).props(Props[TestActor]),
    name = "testActors")
}
And this is the code for running the remote ActorSystem (so the router can deploy the TestActor code to the remote nodes):
object TestActor extends App {
  val system = ActorSystem("mySys", ConfigFactory.load("application").getConfig("testactor1"))
  case object PrintRouterPath
}
I'm running this twice, once with testactor1 and once with testactor2.
TestActor code:
class TestActor extends Actor with ActorLogging {
  implicit val executionContext = context.dispatcher
  context.system.scheduler.schedule(10000 milliseconds, 30000 milliseconds, self, PrintRouterPath)

  override def receive: Receive = {
    case PrintRouterPath =>
      log.info(s"router is on path ${context.parent}")
  }
}
And application.conf
akka {
  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
  }
  remote {
    log-remote-lifecycle-events = off
    netty.tcp {
      hostname = "127.0.0.1"
      port = 2552
    }
  }
  cluster {
    seed-nodes = [
      "akka.tcp://mySys@127.0.0.1:2552"
      "akka.tcp://mySys@127.0.0.1:2553"
      "akka.tcp://mySys@127.0.0.1:2554"]
    auto-down-unreachable-after = 20s
  }
}
testactor1 {
  akka {
    actor {
      provider = "akka.cluster.ClusterActorRefProvider"
    }
    remote {
      log-remote-lifecycle-events = off
      netty.tcp {
        hostname = "127.0.0.1"
        port = 2554
      }
    }
    cluster {
      roles.1 = "testActor"
      seed-nodes = [
        "akka.tcp://mySys@127.0.0.1:2552"
        "akka.tcp://mySys@127.0.0.1:2553"
        "akka.tcp://mySys@127.0.0.1:2554"]
      auto-down-unreachable-after = 20s
    }
  }
}
testactor2 {
  akka {
    actor {
      provider = "akka.cluster.ClusterActorRefProvider"
    }
    remote {
      log-remote-lifecycle-events = off
      netty.tcp {
        hostname = "127.0.0.1"
        port = 2553
      }
    }
    cluster {
      roles.1 = "testActor"
      seed-nodes = [
        "akka.tcp://mySys@127.0.0.1:2552"
        "akka.tcp://mySys@127.0.0.1:2553"
        "akka.tcp://mySys@127.0.0.1:2554"]
      auto-down-unreachable-after = 20s
    }
  }
}
Now the problem is that when the process that started the router is killed, the actors running the TestActor code no longer receive any messages (the messages the scheduler sends). I would have expected the router to be deployed on another seed node in the cluster and the actors to be recovered. Is this possible? Or is there another way of implementing this flow without having a single point of failure?
I think that, by deploying the router on only one node, you are setting up a master-slave cluster where the master is a single point of failure by definition.
From what I understand (looking at the docs), a router can be cluster-aware in the sense that it can deploy (pool mode) or lookup (group mode) routees on nodes in the cluster. The router itself will not react to failure by spawning somewhere else in the cluster.
I believe you have 2 options:
make use of multiple routers to make your system more fault-tolerant. Routees can either be shared (group mode) or not (pool mode) between routers.
make use of the Cluster Singleton pattern, which allows for a master-slave configuration where the master is automatically re-spawned in case of failure. In relation to your example, note that this behaviour is achieved by having an actor (ClusterSingletonManager) deployed on each node. This actor works out whether the chosen master needs to be respawned and where. None of this logic is in place for a cluster-aware router like the one you set up (see the sketch after this list).
You can find examples of multiple cluster setups in this Activator sample.
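For reference, here is a minimal sketch of what the Cluster Singleton option could look like for this example. It assumes the akka-cluster-tools module is on the classpath and reuses the TestActor and the "testActor" role from the question; the actor names are otherwise placeholders:

import akka.actor.{ActorSystem, PoisonPill, Props}
import akka.cluster.singleton.{ClusterSingletonManager, ClusterSingletonManagerSettings, ClusterSingletonProxy, ClusterSingletonProxySettings}
import com.typesafe.config.ConfigFactory

object SingletonMain extends App {
  val system = ActorSystem("mySys", ConfigFactory.load("application"))

  // Every node with the "testActor" role runs a manager; the cluster elects one node
  // to host the actual TestActor and re-creates it elsewhere if that node goes down.
  system.actorOf(
    ClusterSingletonManager.props(
      singletonProps = Props[TestActor],
      terminationMessage = PoisonPill,
      settings = ClusterSingletonManagerSettings(system).withRole("testActor")),
    name = "testSingleton")

  // Other actors talk to the singleton through a proxy that tracks its current location.
  val proxy = system.actorOf(
    ClusterSingletonProxy.props(
      singletonManagerPath = "/user/testSingleton",
      settings = ClusterSingletonProxySettings(system).withRole("testActor")),
    name = "testSingletonProxy")
}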
I tested two approaches, first using your code with ClusterRouterPool.
Like you said, when the process that started the router is killed, TestActor does not receive any more messages.
While reading the documentation and testing, I found that if you change, in application.conf:
`auto-down-unreachable-after = 20s`
to this:
`auto-down-unreachable-after = off`
TestActor keeps receiving the messages, although the following message appears in the log:
[WARN] [01/30/2017 17:20:26.017] [mySys-akka.remote.default-remote-dispatcher-5] [akka.tcp://mySys#127.0.0.1:2554/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FmySys%40127.0.0.1%3A2552-0] Association with remote system [akka.tcp://mySys#127.0.0.1:2552] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://mySys#127.0.0.1:2552]] Caused by: [Connection refused: /127.0.0.1:2552]
[INFO] [01/30/2017 17:20:29.860] [mySys-akka.actor.default-dispatcher-4] [akka.tcp://mySys#127.0.0.1:2554/remote/akka.tcp/mySys#127.0.0.1:2552/user/testActors/c1] router is on path Actor[akka.tcp://mySys#127.0.0.1:2552/user/testActors#-1120251475]
[WARN] [01/30/2017 17:20:32.016] [mySys-akka.remote.default-remote-dispatcher-5]
And if the MainApp is restarted, the log looks normal, without warnings or errors.
MainApp log:
[INFO] [01/30/2017 17:23:32.756] [mySys-akka.actor.default-dispatcher-2] [akka.cluster.Cluster(akka://mySys)] Cluster Node [akka.tcp://mySys#127.0.0.1:2552] - Welcome from [akka.tcp://mySys#127.0.0.1:2554]
TestActor Log:
[INFO] [01/30/2017 17:23:21.958] [mySys-akka.actor.default-dispatcher-14] [akka.cluster.Cluster(akka://mySys)] Cluster Node [akka.tcp://mySys#127.0.0.1:2554] - New incarnation of existing member [Member(address = akka.tcp://mySys#127.0.0.1:2552, status = Up)] is trying to join. Existing will be removed from the cluster and then new member will be allowed to join.
[INFO] [01/30/2017 17:23:21.959] [mySys-akka.actor.default-dispatcher-14] [akka.cluster.Cluster(akka://mySys)] Cluster Node [akka.tcp://mySys#127.0.0.1:2554] - Marking unreachable node [akka.tcp://mySys#127.0.0.1:2552] as [Down]
[INFO] [01/30/2017 17:23:22.454] [mySys-akka.actor.default-dispatcher-2] [akka.cluster.Cluster(akka://mySys)] Cluster Node [akka.tcp://mySys#127.0.0.1:2554] - Leader can perform its duties again
[INFO] [01/30/2017 17:23:22.461] [mySys-akka.actor.default-dispatcher-2] [akka.cluster.Cluster(akka://mySys)] Cluster Node [akka.tcp://mySys#127.0.0.1:2554] - Leader is removing unreachable node [akka.tcp://mySys#127.0.0.1:2552]
[INFO] [01/30/2017 17:23:32.728] [mySys-akka.actor.default-dispatcher-4] [akka.cluster.Cluster(akka://mySys)] Cluster Node [akka.tcp://mySys#127.0.0.1:2554] - Node [akka.tcp://mySys#127.0.0.1:2552] is JOINING, roles []
[INFO] [01/30/2017 17:23:33.457] [mySys-akka.actor.default-dispatcher-14] [akka.cluster.Cluster(akka://mySys)] Cluster Node [akka.tcp://mySys#127.0.0.1:2554] - Leader is moving node [akka.tcp://mySys#127.0.0.1:2552] to [Up]
[INFO] [01/30/2017 17:23:37.925] [mySys-akka.actor.default-dispatcher-19] [akka.tcp://mySys#127.0.0.1:2554/remote/akka.tcp/mySys#127.0.0.1:2552/user/testActors/c1] router is on path Actor[akka.tcp://mySys#127.0.0.1:2552/user/testActors#-630150507]
The other approach is to use ClusterRouterGroup, because the routees are shared among the nodes of the cluster. From the Akka documentation:
Group - router that sends messages to the specified path using actor selection. The routees can be shared among routers running on different nodes in the cluster. One example of a use case for this type of router is a service running on some backend nodes in the cluster and used by routers running on front-end nodes in the cluster.
Pool - router that creates routees as child actors and deploys them on remote nodes. Each router will have its own routee instances. For example, if you start a router on 3 nodes in a 10-node cluster, you will have 30 routees in total if the router is configured to use one instance per node. The routees created by the different routers will not be shared among the routers. One example of a use case for this type of router is a single master that coordinates jobs and delegates the actual work to routees running on other nodes in the cluster.
The Main App
object Main extends App {
  val system = ActorSystem("mySys", ConfigFactory.load("application.conf"))
  val routerGroup = system.actorOf(
    ClusterRouterGroup(RoundRobinGroup(Nil), ClusterRouterGroupSettings(
      totalInstances = 2, routeesPaths = List("/user/testActor"),
      allowLocalRoutees = false, useRole = Some("testActor"))).props(),
    name = "testActors")
}
You must start the TestActor on each remote node:
object TestActor extends App {
  val system = ActorSystem("mySys", ConfigFactory.load("application").getConfig("testactor1"))
  system.actorOf(Props[TestActor], "testActor")
  case object PrintRouterPath
}
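With this in place, anything sent to routerGroup on the node that created it is round-robined to the /user/testActor routees on the "testActor" role nodes. For example, from Main (a sketch; it reuses routerGroup and the PrintRouterPath message defined above):

routerGroup ! TestActor.PrintRouterPath  // delivered to one of the registered routees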
http://doc.akka.io/docs/akka/2.4/scala/cluster-usage.html#Router_with_Group_of_Routees
The routee actors should be started as early as possible when starting the actor system, because the router will try to use them as soon as the member status is changed to 'Up'.
I hope this helps.