Akka Singleton Cluster: resolveOne of a worker by master fails after restart - scala

I am using Akka Cluster 2.4.3 and trying to set up a simple cluster on my machine to understand it better. I have a cluster singleton with remoting enabled, with a primary and a standby master and one worker node. Each of these three runs in a separate JVM.
Things work fine when all the nodes are started the first time. If I kill and restart the worker, I see the following issues happening.
Restart Worker
When the worker comes back up after the restart, the master, on receiving the MemberUp event, tries to resolve the ActorRef from the member address the following way:
context.actorSelection(member.address.toString).resolveOne(15 seconds)
This fails with an ActorNotFound exception. It works with no problem when all the nodes are coming up for the first time in the cluster.
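(For context, the usual pattern when registering a worker from a MemberUp event is to select the full path of a known top-level actor on the joining node rather than only the address. A minimal sketch, where the top-level actor name "worker" is an assumption:)
import akka.actor.{Actor, ActorRef}
import akka.cluster.ClusterEvent.MemberUp
import scala.concurrent.Future
import scala.concurrent.duration._

// Sketch only: assumes the worker node started a top-level actor named "worker".
trait WorkerResolution { this: Actor =>
  def resolveWorker(memberUp: MemberUp): Future[ActorRef] =
    context
      .actorSelection(memberUp.member.address.toString + "/user/worker") // full user path
      .resolveOne(15.seconds)
}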
Restart worker again
This time, the worker comes up with the following message
[WARN] [04/15/2016 18:24:24.991] [clustersystem-akka.remote.default-remote-dispatcher-5] [akka.remote.Remoting] Tried to associate with unreachable remote address [akka.tcp://clustersystem#host1:2551]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.]
Restart worker again
This time the resolveOne on a MemberUp event works.
I am having a bit of difficulty understanding what is happening here. I have looked into the docs but did not find anything there that helps.
application.conf
akka {
  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
  }
  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
  }
  log-dead-letters = off
  jvm-exit-on-fatal-error = on
  loglevel = "DEBUG"
  remote {
    log-remote-lifecycle-events = off
    netty.tcp {
      hostname = "host1"
      port = 0
    }
  }
  cluster {
    seed-nodes = [
      "akka.tcp://clustersystem#host1:2551",
      "akka.tcp://clustersystem#host1:2552"]
    auto-down-unreachable-after = 10s
  }
  extensions = ["akka.cluster.metrics.ClusterMetricsExtension"]
}
I start the master nodes on ports 2551 and 2552 (the ports are provided as command-line args) and I start the worker on port 3551.

Related

Akka cluster sharding configuration

Ok, I have 2 instances of my backend, hosted on 2 different CentOS servers. What I want to do using Akka Cluster Sharding is to divide the work done by each of these instances:
I have data for 4 countries, which is retrieved from the db every 10 seconds by both backend instances, which then update a Redis instance. So I often have duplicated requests, because both backends get data for the same country.
Using Akka Cluster Sharding, I try to divide the work dynamically: instance1 gets data for ES and EN, instance2 gets data for DE and IT. In case instance1 goes down, instance2 will take over the jobs and also get data for ES/EN.
I thought this would be simple... but it is not.
All jobs are done by Akka actors, so using Cluster Sharding I thought all declared actors (from both instances) would be centralized somewhere, so that I could control which one does which job.
On localhost, all works fine, because I have an instance of my app on port 9001 and 2 cluster nodes on ports 2551 and 2552. But for production, I can't understand how to configure the hostnames.
application.conf
"clusterRegistration" {
akka {
actor {
allow-java-serialization = on
provider = cluster
}
remote.artery {
enabled = on
transport = aeron-udp
}
cluster {
jmx.multi-mbeans-in-same-jvm = on
seed-nodes = [
"akka://ClusterService#instance1:8083",
"akka://ClusterService#instance1:2551"
]
}
}
}
class
object ClusterSharding {
  def createNode(hostname: String, port: Int, role: String, props: Props, actorName: String) = {
    val config = ConfigFactory.parseString(
      s"""
         |akka.cluster.roles = ["$role"]
         |akka.remote.artery.canonical.hostname = $hostname
         |akka.remote.artery.canonical.port = $port
         |""".stripMargin
    ).withFallback(ConfigFactory.load
      .getConfig("clusterRegistration"))
    val system = ActorSystem("ClusterService", config)
    system.actorOf(props, actorName)
  }

  val master = createNode("instance1", 8083, "master", Props[Master], "master")
  createNode("instance1", 2551, "worker", Props[Worker], "worker")
  createNode("instance2", 8083, "worker", Props[Worker], "worker")

  Future {
    while (true) {
      master ! Proceed // this will fire an Actor Resolver case
      Thread.sleep(5000)
    }
  }
}
master actor
class Master extends Actor {
  var workers: Map[Address, ActorRef] = Map()
  val cluster = Cluster(context.system)

  override def preStart(): Unit = {
    cluster.subscribe(
      self,
      initialStateMode = InitialStateAsEvents,
      classOf[MemberEvent],
      classOf[UnreachableMember]
    )
  }

  override def postStop(): Unit = {
    cluster.unsubscribe(self)
  }

  def receive = handleClusterEvents // cluster events
    .orElse(handleWorkerRegistration) // worker registered to cluster
    .orElse(handleJob) // give jobs to workers

  def handleJob: Receive = {
    case Proceed => {
      // Here I must be able to use all workers from both instances
      // (centos1 and centos2) and give work to each dynamically
      if (workers.size == 2) {
        worker1 ! List("EN", "ES")
        worker2 ! List("DE", "IT")
      } else if (workers.size == 1) {
        worker ! List("EN", "ES", "DE", "IT")
      } else {
        execQueries() // if no worker is available, each backend instance will exec queries on its own
      }
    }
  }
}
Both instances are hosted with port 8083 (centos1: instance1:8083, centos2: instance2:8083). If I use settings just for one of the instances in application.conf and in createNode (instance1 for example), I can see in logs that the workers are created, but there is no communication with the second instance.
Where am I wrong? Thanks.
Your approach to configuring the hostnames is viable. There are better ways to do it (depending on how you're deploying the service: manual deploy vs. ansible/chef/puppet vs. docker vs. kubernetes/nomad/mesos will be different), but setting the hostname isn't likely your actual problem.
Your current approach will give you a master and 2 workers on every node and you're not actually using Cluster Sharding (you're using Cluster, but Cluster Sharding is something you opt into on top of Cluster). From the code you've posted, I strongly suspect that using Cluster Sharding will entail a dramatic redesign (though without posting the Worker and more complete Master code, it's hard to say).
The broad approach I'd take with this would be to have the process of updating Redis for a given country be owned by a sharded entity (keyed by that country), with a cluster singleton actor triggering the update process for each country every 10 seconds. Because we're using sharding and singleton, I'd probably run at least 3 instances of the service, or alternatively make use of a strongly consistent external lease system: cluster sharding and cluster singleton basically force you to resolve split-brains, and in a 2-node cluster the other split-brain resolution strategies will, at least half the time, boil down to losing one node being the same as losing both. Because sharding implies that the actor for a process could be stopped arbitrarily (and possibly resumed on a different node), you'll also want to think about how the process can be resumed in a way that makes sense for the application.
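To make that concrete, here is a rough sketch with classic Cluster Sharding and Cluster Singleton (the CountryUpdater/UpdateScheduler names, the message type, and the country list are illustrative assumptions, not a drop-in implementation):
import akka.actor.{Actor, ActorRef, ActorSystem, PoisonPill, Props}
import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings, ShardRegion}
import akka.cluster.singleton.{ClusterSingletonManager, ClusterSingletonManagerSettings}
import scala.concurrent.duration._

case class UpdateCountry(country: String) // one message per country, every 10 seconds

// Sharded entity: owns the "query db, update Redis" process for exactly one country.
class CountryUpdater extends Actor {
  def receive: Receive = {
    case UpdateCountry(country) =>
      // query the database for `country` and write the result to Redis here
  }
}

object CountryUpdater {
  val extractEntityId: ShardRegion.ExtractEntityId = {
    case msg @ UpdateCountry(country) => (country, msg)
  }
  val extractShardId: ShardRegion.ExtractShardId = {
    case UpdateCountry(country) => (math.abs(country.hashCode) % 10).toString
  }

  def startRegion(system: ActorSystem): ActorRef =
    ClusterSharding(system).start(
      typeName = "CountryUpdater",
      entityProps = Props[CountryUpdater](),
      settings = ClusterShardingSettings(system),
      extractEntityId = extractEntityId,
      extractShardId = extractShardId
    )
}

// Cluster singleton: the only place that decides "it is time to update",
// so the two instances never duplicate work.
class UpdateScheduler(region: ActorRef, countries: List[String]) extends Actor {
  import context.dispatcher
  private val tick =
    context.system.scheduler.scheduleWithFixedDelay(0.seconds, 10.seconds, self, "tick")

  override def postStop(): Unit = tick.cancel()

  def receive: Receive = {
    case "tick" => countries.foreach(c => region ! UpdateCountry(c))
  }
}

object UpdateScheduler {
  def startSingleton(system: ActorSystem, region: ActorRef): ActorRef =
    system.actorOf(
      ClusterSingletonManager.props(
        singletonProps = Props(new UpdateScheduler(region, List("EN", "ES", "DE", "IT"))),
        terminationMessage = PoisonPill,
        settings = ClusterSingletonManagerSettings(system)
      ),
      name = "update-scheduler"
    )
}
Each node would call CountryUpdater.startRegion(system) and UpdateScheduler.startSingleton(system, region) once at startup; a message for a given country then always lands on whichever node currently hosts that entity, so the two instances stop duplicating the queries.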
Starting multiple ActorSystems in the same JVM process is generally only a good idea in fairly specific circumstances.

ActiveMQ Artemis publish message loss during HA fail-over

I use ActiveMQ Artemis 2.17.0, and I'm looking to avoid message loss in a producer during fail-over.
Message loss during the Artemis active-to-passive switch is handled by catching ActiveMQUnBlockedException and sending the message again.
The brokers are configured as an active/passive HA shared-store pair. The active node is configured on host1 and the passive node on host2.
The URL is:
(tcp://host1:61616,tcp://host2:61616)?ha=true&reconnectAttempts=-1&blockOnDurableSend=false
blockOnDurableSend is set to false for high throughput.
During the active-to-passive switch the publishing code throws ActiveMQUnBlockedException, but not during the passive-to-active switch.
We're using Spring 4.2.5 and CachingConnectionFactory as the connection factory.
I'm using the following code to send messages:
private void sendMessageInternal(final ConnectionFactory connectionFactory, final Destination queue, final String message)
        throws JMSException {
    try (final Connection connection = connectionFactory.createConnection();) {
        connection.start();
        try (final Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
             final MessageProducer producer = session.createProducer(queue);) {
            final TextMessage textMessage = session.createTextMessage(message);
            producer.send(textMessage);
        }
    } catch (JMSException thr) {
        if (thr.getCause() instanceof ActiveMQUnBlockedException) {
            // consider as fail-over disconnection, send message again.
        } else {
            throw thr;
        }
    }
}
On the host1 machine, Artemis is deployed as master - node1.
On the host2 machine, Artemis is deployed as slave - node2.
I did the following steps to simulate fail-over:
node1 and node2 started
node1 started as the live server and node2 started as the backup server
killed node1, node2 became the live server
the client publish code threw ActiveMQUnBlockedException, which was handled by sending the message again
started node1 again; node1 became the live server and node2 became the backup again
the client publish code did not throw ActiveMQUnBlockedException and the message was lost
I get the following error stack during step 3 (node1 killed and node2 becoming the live server).
javax.jms.JMSException: AMQ219016: Connection failure detected. Unblocking a blocking call that will never get a response
at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:540)
at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434)
at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.sessionStop(ActiveMQSessionContext.java:470)
at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.stop(ClientSessionImpl.java:1121)
at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.stop(ClientSessionImpl.java:1110)
at org.apache.activemq.artemis.jms.client.ActiveMQSession.stop(ActiveMQSession.java:1244)
at org.apache.activemq.artemis.jms.client.ActiveMQConnection.stop(ActiveMQConnection.java:339)
at org.springframework.jms.connection.SingleConnectionFactory$SharedConnectionInvocationHandler.localStop(SingleConnectionFactory.java:644)
at org.springframework.jms.connection.SingleConnectionFactory$SharedConnectionInvocationHandler.invoke(SingleConnectionFactory.java:577)
at com.sun.proxy.$Proxy5.close(Unknown Source)
at com.eu.amq.failover.test.ProducerNodeTest.sendMessageInternal(ProducerNodeTest.java:133)
at com.eu.amq.failover.test.ProducerNodeTest.sendMessage(ProducerNodeTest.java:110)
at com.eu.amq.failover.test.ProducerNodeTest.main(ProducerNodeTest.java:90)
The ActiveMQUnBlockedException you're getting is coming from Spring's invocation of javax.jms.Connection#stop. It's not related to sending a message. Re-sending a message when you get this specific exception could result in a duplicate message.
Ultimately your problem is directly related to setting blockOnDurableSend=false. This tells the client to "fire and forget." In other words the client won't wait for a response from the broker to ensure the message actually made it successfully. This lack of waiting increases throughput but decreases reliability.
If you really want to mitigate potential message loss you have two main options.
Set blockOnDurableSend=true. This will reduce message throughput, but it's the simplest way to guarantee the message arrived at the broker successfully.
Use a CompletionListener. This will allow you to keep blockOnDurableSend=false, but the application will still be informed if there are problems sending the message although the information will be provided asynchronously. This feature was added in JMS 2 specifically for this kind of scenario. See the JavaDoc for more details.
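With option 1, the URL from the question would simply use blockOnDurableSend=true instead of false. For option 2, a minimal sketch of an asynchronous send with a CompletionListener (what you do in onException - retry, persist, alert - is up to your application):
import javax.jms.{CompletionListener, ConnectionFactory, Message, Session}

// Sketch of a JMS 2 asynchronous send: the broker's acknowledgement (or failure)
// is reported on the listener, so blockOnDurableSend=false no longer means the
// client is blind to lost messages.
def sendAsync(connectionFactory: ConnectionFactory, queueName: String, text: String): Unit = {
  val connection = connectionFactory.createConnection()
  try {
    connection.start()
    val session  = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
    val producer = session.createProducer(session.createQueue(queueName))
    producer.send(session.createTextMessage(text), new CompletionListener {
      override def onCompletion(message: Message): Unit =
        () // the broker confirmed the message
      override def onException(message: Message, exception: Exception): Unit =
        // the message may not have reached the broker: schedule a retry here
        println(s"send failed: ${exception.getMessage}")
    })
  } finally {
    connection.close() // close() waits for incomplete asynchronous sends to finish
  }
}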

Singleton cluster actor is not starting up

The following cluster singleton is not starting up.
commander = system.actorOf(
  ClusterSingletonManager.props(Commander.props(this),
    terminationMessage = PoisonPill.getInstance,
    settings = ClusterSingletonManagerSettings.create(system).withRole("commander")
  ), name = "Commander")
No error messages are thrown.
Logs are:
[INFO] [08/03/2016 11:43:58.656] [ScalaTest-run-running-ClusterSuite] [akka.remote.Remoting] Starting remoting
[INFO] [08/03/2016 11:43:59.007] [ScalaTest-run-running-ClusterSuite] [akka.remote.Remoting] Remoting started; listening on addresses :[akka.tcp://galaxyFarFarAway#127.0.0.1:59592]
[INFO] [08/03/2016 11:43:59.035] [ScalaTest-run-running-ClusterSuite] [akka.cluster.Cluster(akka://galaxyFarFarAway)] Cluster Node [akka.tcp://galaxyFarFarAway#127.0.0.1:59592] - Starting up...
[INFO] [08/03/2016 11:43:59.218] [ScalaTest-run-running-ClusterSuite] [akka.cluster.Cluster(akka://galaxyFarFarAway)] Cluster Node [akka.tcp://galaxyFarFarAway#127.0.0.1:59592] - Registered cluster JMX MBean [akka:type=Cluster]
[INFO] [08/03/2016 11:43:59.218] [ScalaTest-run-running-ClusterSuite] [akka.cluster.Cluster(akka://galaxyFarFarAway)] Cluster Node [akka.tcp://galaxyFarFarAway#127.0.0.1:59592] - Started up successfully
[INFO] [08/03/2016 11:43:59.247] [galaxyFarFarAway-akka.actor.default-dispatcher-2] [akka.cluster.Cluster(akka://galaxyFarFarAway)] Cluster Node [akka.tcp://galaxyFarFarAway#127.0.0.1:59592] - Metrics will be retreived from MBeans, and may be incorrect on some platforms. To increase metric accuracy add the 'sigar.jar' to the classpath and the appropriate platform-specific native libary to 'java.library.path'. Reason: java.lang.ClassNotFoundException: org.hyperic.sigar.Sigar
[INFO] [08/03/2016 11:43:59.257] [galaxyFarFarAway-akka.actor.default-dispatcher-2] [akka.cluster.Cluster(akka://galaxyFarFarAway)] Cluster Node [akka.tcp://galaxyFarFarAway#127.0.0.1:59592] - Metrics collection has started successfully
[INFO] [08/03/2016 11:43:59.268] [galaxyFarFarAway-akka.actor.default-dispatcher-3] [akka.cluster.Cluster(akka://galaxyFarFarAway)] Cluster Node [akka.tcp://galaxyFarFarAway#127.0.0.1:59592] - No seed-nodes configured, manual cluster join required
Disconnected from the target VM, address: '127.0.0.1:59574', transport: 'socket'
The configuration is:
akka {
  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
    default-dispatcher {
      throughput = 10
    }
  }
  cluster {
    roles = [commander]
  }
  remote {
    log-remote-lifecycle-events = off
    netty.tcp {
      hostname = "127.0.0.1"
      port = 0
    }
  }
  akka.extensions = ["akka.cluster.metrics.ClusterMetricsExtension"]
}
When I debug the code of the Commander class, the constructor is not even called anywhere. However, when I omit the ClusterSingletonManager and just create it with Props, it does work: the Commander actor gets created.
I sense incorrect configuration behind this issue. Do you guys have any remarks about this?
You've sensed quite right: you haven't specified the seed node configuration for the Akka clustering. You can see this in the last line of the log:
[akka.tcp://galaxyFarFarAway#127.0.0.1:59592] - No seed-nodes configured, manual cluster join required Disconnected from the target VM, address: '127.0.0.1:59574', transport: 'socket'
Because you haven't specified any seed nodes in the configuration file, Akka will wait for you to specify the seed nodes programmatically. You can specify the seed nodes in the config like this:
akka.cluster.seed-nodes = [
  "akka.tcp://yourClusterSystem#127.0.0.1:2551",
  "akka.tcp://yourClusterSystem#127.0.0.1:2552"
]
Alternatively, you can call the joinSeedNodes method to join the cluster programmatically. In both cases, you have to specify at least one seed node that is available. The actor system itself can also act as a seed node.
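A minimal sketch of the programmatic route (the system name and port are taken from the config above; a node may also list its own address to bootstrap a single-node cluster):
import akka.actor.{ActorSystem, Address}
import akka.cluster.Cluster

val system  = ActorSystem("yourClusterSystem")
val cluster = Cluster(system)

// Join the cluster through an explicit seed-node list instead of configuration.
cluster.joinSeedNodes(List(Address("akka.tcp", "yourClusterSystem", "127.0.0.1", 2551)))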
Once the seed nodes have been specified and the actor system has joined the cluster, Akka features depending on clustering (cluster singletons, sharding etc.) will boot up. This is why you can launch an ordinary actor, but not the singleton.
For more information on setting up seed nodes see Akka cluster documentation.

akka remote actor running on local actor

I am learning akka-remote and trying to re-do http://www.typesafe.com/activator/template/akka-sample-remote-scala myself.
When I try to run the project in two separate JVMs, I see
$ clear;java -jar akkaio-remote/target/akka-remote-jar-with-dependencies.jar com.harit.akkaio.remote.RemoteApp ProcessingActor
ProcessingActorSystem Started
and
$ clear;java -jar akkaio-remote/target/akka-remote-jar-with-dependencies.jar com.harit.akkaio.remote.RemoteApp WatchingActor
WatchingActorSystem Started
asking processor to process
processing big things
I asked my Processing System to run on port 2552
include "common"
akka {
# LISTEN on tcp port 2552
remote.netty.tcp.port = 2552
}
and I told my other system (WatchingSystem) to run on port 2554 but start processingActor on port 2552
include "common"
akka {
actor {
deployment {
"/processingActor/*" {
remote = "akka.tcp://ProcessingActorSystem#127.0.0.1:2552"
}
}
}
remote.netty.tcp.port = 2554
}
and common sets the right provider:
akka {
  actor {
    provider = "akka.remote.RemoteActorRefProvider"
  }
  remote {
    netty.tcp {
      hostname = "127.0.0.1"
    }
  }
}
Questions/Concerns
From the logs, I see that the processingActor is running on the WatchingActorSystem and not on the ProcessingActorSystem. What is going wrong?
How can I see that the two ActorSystems are connecting to each other? I do not see any logging happening. However, in the example I shared, the logging happens. What am I missing?
The entire code is posted on Github and runs as well
1) Your deployment configuration is set up to have all the children of processingActor be remote, as described in the akka configuration docs.
You should set it to this instead:
deployment {
  "/processingActor" {
    remote = "akka.tcp://ProcessingActorSystem#127.0.0.1:2552"
  }
}
2) You need to set your log level to something useful as described in the akka logging documentation
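For example, something like the following in both systems' configuration makes the remoting lifecycle and message flow visible (a sketch; tune the levels as needed):
akka {
  loglevel = "DEBUG"
  remote {
    log-remote-lifecycle-events = on
    log-sent-messages = on
    log-received-messages = on
  }
}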

How to debug the akka association process?

Here is a scenario:
I have packaged a Scala project with Spray into a jar file.
I launch the jar file on RedHat 6.5 on VirtualBox (ip - 192.168.1.38)
I launch the jar file on RedHat 6.5 on VirtualBox (ip - 192.168.1.41)
Everything works locally - I can send a REST request to each virtual machine and get a response.
Problem
The Akka systems cannot form a cluster. I run 192.168.1.38 with default settings, but 192.168.1.41 has an additional property - akka.cluster.seed-nodes - which is set to akka.tcp://mySystem#192.168.1.38:2551. So I get:
[WARN] [12/09/2014 17:10:24.043] [mySystem-akka.remote.default-remote-dispatcher-8] [akka.tcp://mySystem#192.168.1.41:2551/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FmySystem%40192.168.1.38%3A2551-0] Association with remote system [akka.tcp://mySystem#192.168.1.38:2551] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://mySystem#192.168.1.38:2551]].
There are no other errors or warnings. Also, how can I test the akka association or print/debug the akka association settings?
Can Linux settings also influence the akka association?
Most probably iptables is blocking the port in question; if this is just a test configuration, disable iptables:
service iptables save
service iptables stop
chkconfig iptables off
service ip6tables save
service ip6tables stop
chkconfig ip6tables off
If that does not help, check your SELinux configuration using the getenforce command; likewise, for test purposes you can completely disable it. See the SELinux manual.
For your application.conf, try using the following configuration for each node:
akka {
  log-dead-letters = on
  loglevel = "debug"
  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
  }
  extensions = ["akka.contrib.pattern.ClusterReceptionistExtension"]
  remote {
    log-remote-lifecycle-events = off
    netty.tcp {
      port = 6001
    }
  }
  cluster {
    seed-nodes = [
      "akka.tcp://ActorSystem#192.168.1.38:6001",
      "akka.tcp://ActorSystem#192.168.1.41:6001"
    ]
    auto-down-unreachable-after = 10s
  }
}
All the logs related to the cluster nodes are logged at the info level, but having the debug log level in a test environment is in general a good idea.
When the second node joins the cluster, you should notice the following log:
INFO [ActorSystem-akka.actor.default-dispatcher-4] [Cluster(akka://ActorSystem)] - Cluster Node [akka.tcp://ActorSystem#10.0.1.41:6001] - Marking node(s) as REACHABLE [Member(address = akka.tcp://ActorSystem#10.0.1.41:6001, status = Up)]
Cluster state can also be monitored using the JMX akka.Cluster MXBean:
{ "self-address": "akka.tcp://ActorSystem#10.0.1.82:6001", "members": [ { "address": "akka.tcp://ActorSystem#10.0.1.82:6001", "status": "Up" } ], "unreachable": [ ] }