Scala Akka TCP Actors

I have a question about the Akka 2.4 TCP API.
I am running a server and have two TCP servers in Akka TCP: one for incoming clients and one for my server's worker nodes (which are on other computers/IPs). I currently have one connection to a client and one connection to a worker node.
When receiving a message from a client, I want to pass some of that information to the worker node, but the Akka actor representing the worker-node connection doesn't seem to like it when I send it messages from the thread running the client actor.
So, as an example, if the client sends a message to delete a file, and the partitions of that file are on a worker node, I want to send a TCP message to that worker node telling it to delete those partitions.
How can I send a message from the client actor to the worker-node actor so that it passes the message on to the worker-node server through TCP? When just doing the regular workerActorRef ! msg, it doesn't receive it at all and no logging is shown.
I hope this question isn't unclear, but essentially I want the workerActorRef to somehow have functionality similar to "send this through the TCP socket".
Cheers,
Johan

Have you looked at Akka Remoting at all? If used properly, it should be able to achieve what you want. You might want to look into Clustering too, as it's built on top of Remoting.
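For completeness, here is a minimal sketch of how this is often wired with the plain Akka TCP API: the actor that owns the worker-node connection translates application-level messages into Tcp.Write commands on its connection. The DeletePartitions message and the wire format are made up for illustration, and connection is assumed to be the ActorRef received with Tcp.Connected:

import akka.actor.{Actor, ActorRef}
import akka.io.Tcp
import akka.util.ByteString

// Hypothetical application-level command sent by the client actor.
case class DeletePartitions(file: String)

// Owns the TCP connection to one worker node.
class WorkerConnectionActor(connection: ActorRef) extends Actor {
  override def preStart(): Unit = connection ! Tcp.Register(self)

  def receive: Receive = {
    case DeletePartitions(file) =>
      // Translate the application message into bytes on the socket.
      connection ! Tcp.Write(ByteString(s"DELETE $file\n"))
    case Tcp.Received(data) =>
      // Handle replies from the worker node here.
      context.system.log.info("Reply from worker: {}", data.utf8String)
    case _: Tcp.ConnectionClosed =>
      context.stop(self)
  }
}

With something like this in place, workerActorRef ! DeletePartitions("foo.dat") from the client actor ends up as bytes on the worker's socket. Sending messages between actors running on different threads is safe in Akka, so if nothing arrives and nothing is logged, the message most likely went to an unexpected ActorRef or fell through the receive block unmatched.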

Related

akka tcp server-client heartbeat message block by scheduler processing

I am using an Akka cluster (server) and exchanging a heartbeat message with the client over Akka TCP every 5 seconds.
The heartbeat works fine as long as I am not using the scheduler,
but when I start 4-5 schedulers, the server stops receiving the heartbeat messages from the client's TCP connection; after the scheduler processing finishes, I get 4-5 heartbeat messages at the same time.
The Akka scheduler is blocking the actor's other processing (buffer reading etc.).
I have already tried the following, but I am still facing the same issue:
different dispatchers
creating a new actor and moving the scheduler call into that separate actor
using an 8-core machine
trying both fork-join-executor and thread-pool-executor
changing Tcp-SO-ReceivedBufferSize and Tcp-SO-SendBufferSize to 1024 or 2048, which didn't work
setting Tcp-SO-TcpNoDelay
Kindly help.
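One common cause of this symptom is that the scheduled task does blocking work on the same dispatcher that serves the TCP actor, so the buffered heartbeats are only delivered in a burst once the scheduled work finishes. Below is a minimal sketch of offloading the blocking part to a dedicated dispatcher; the dispatcher name "blocking-dispatcher" and the Job message are assumptions, not taken from the question:

import akka.actor.Actor
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical message carrying the scheduled work.
case class Job(work: () => Unit)

class ScheduledWorker extends Actor {
  // Looked up from configuration (assumed name); keeps long-running
  // work off the default dispatcher that reads the TCP buffers.
  private implicit val blockingEc: ExecutionContext =
    context.system.dispatchers.lookup("blocking-dispatcher")

  def receive: Receive = {
    case Job(work) =>
      Future(work()) // the actor returns to its mailbox immediately
  }
}

Scheduling via system.scheduler can then target this actor, and the TCP actor keeps draining its receive buffer while the jobs run in the background.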

How are TCP connections managed by the kafka-clients Scala library?

I am using the kafka-clients library to integrate Kafka with a Scala application, and I am finding it difficult to understand how and when TCP connections are made between brokers and producers/consumers.
Please verify my understanding of the points below:
(1) No TCP connection is established on initialisation of a KafkaProducer instance.
val producer = new KafkaProducer[String, String](properties)
This also holds true for KafkaConsumer.
val consumer = new KafkaConsumer[String, String](properties)
(2) The first TCP connection (between broker and producer) is established on producing a record to the broker.
producer.send(record1)
(3) Subsequent send() calls from the same producer to the same broker share the same TCP connection, irrespective of the topic.
producer.send(record2)
(4) In the case of the consumer, the first TCP connection is established on polling a topic (not on subscription).
val records = consumer.poll(timeout)
(5) Subsequent calls to poll() by the same consumer to the same broker share the same connection.
No TCP connection is established on initialisation of a KafkaProducer instance.
Not exactly. KafkaProducer initialisation starts the Sender thread, from within which TCP connections to the bootstrap servers are established. Those sockets are used to retrieve metadata from the cluster.
The first TCP connection (between broker and producer) is established on producing a record to the broker.
Almost correct. Actually, the client always creates multiple TCP connections to the brokers; this is true even when you have a single broker. The producer often creates two connections, one for updating metadata and the other for sending messages. The consumer (assuming you are using a consumer group) seems to create separate connections for finding the coordinator, for group management (including join/sync group and offset handling), for retrieving offsets, and for pulling messages.
UPDATE: the consumer creates 3 connections instead of the 4 I previously claimed. Thanks @ppatierno for the reminder.
Subsequent send() calls from the same producer to the same broker share the same TCP connection, irrespective of the topic.
Subsequent send calls reuse the second connection the producer creates.
In the case of the consumer, the first TCP connection is established on polling a topic (not on subscription).
Yes, all connections are created in the poll call.
Subsequent calls to poll() by the same consumer to the same broker share the same connection.
Subsequent calls to poll reuse the last connection the consumer creates.
Subsequent send() calls from the same producer to the same broker share the same TCP connection, irrespective of the topic.
Just to add (to the great answer by @amethystic) that if the producer tries to send to a new topic and the broker to which it's connected isn't the leader, the producer needs to fetch metadata about that topic and open a new connection to the broker which is the leader for that topic. So saying "share the same TCP connection irrespective of the topic" is not completely correct.
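Putting the corrected picture together, here is a compact sketch of the lifecycle in Scala; the broker address, topic names, and group id are placeholders, and it uses the older poll(long) signature:

import java.util.{Arrays, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ConnectionLifecycle extends App {
  val producerProps = new Properties()
  producerProps.put("bootstrap.servers", "localhost:9092") // placeholder broker
  producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  // (1) The constructor starts the Sender thread, which connects to the
  // bootstrap servers for metadata -- so "no connection here" is not exact.
  val producer = new KafkaProducer[String, String](producerProps)
  producer.send(new ProducerRecord("topic-a", "key", "v1")) // (2) data connection established
  producer.send(new ProducerRecord("topic-b", "key", "v2")) // (3) same data connection reused
  producer.close()

  val consumerProps = new Properties()
  consumerProps.put("bootstrap.servers", "localhost:9092")
  consumerProps.put("group.id", "demo-group") // placeholder group id
  consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

  val consumer = new KafkaConsumer[String, String](consumerProps)
  consumer.subscribe(Arrays.asList("topic-a")) // no connection yet on subscribe
  val records = consumer.poll(1000L) // (4) connections created here, (5) reused on later polls
  consumer.close()
}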

Kafka producer: produce data to a topic from a port

I'm new to Kafka.
I have a Linux machine on which port number 2552 receives a data stream from an external server.
I want to use a Kafka producer to listen on that port and push the stream of data to a topic.
This is a complete hack, but it would work for a sandbox example:
nc -l 2552 | ./bin/kafka-console-producer --broker-list localhost:9092 --topic test_topic
It uses netcat to listen on the TCP port and pipes anything received to a Kafka topic.
A quick Google also turned up https://github.com/dhanuka84/kafka-connect-tcp, which looks to do a similar thing but more robustly, using the Kafka Connect API.
You don't say if the traffic on port 2552 is TCP or UDP, but in general you can easily write a program that listens on that port, parses the data received into discrete messages, and then publishes the data to a Kafka topic as Kafka messages (with or without keys) using the Kafka Producer API.
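For the TCP case, here is a minimal sketch of such a program in Scala; the broker address and topic name are placeholders, and line-delimited text framing is an assumption:

import java.net.ServerSocket
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import scala.io.Source

object PortToKafka extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092") // placeholder broker
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  val producer = new KafkaProducer[String, String](props)

  val server = new ServerSocket(2552) // listen where the stream arrives
  while (true) {
    val socket = server.accept()
    // Assumes the protocol is line-delimited text; real framing is protocol-specific.
    for (line <- Source.fromInputStream(socket.getInputStream).getLines())
      producer.send(new ProducerRecord[String, String]("test_topic", line)) // keyless message
    socket.close()
  }
}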
In some cases there is existing open-source code that might already do this for you, so you do not need to write it from scratch. If the port 2552 protocol is a well-known protocol, for example the TCP or UDP call-logging protocol registered with IANA (see ftp://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.txt), then there might even be an existing Kafka connector or proxy that supports it. Search GitHub for kafka-connect-[protocol] or take a look at the curated connector list at https://www.confluent.io/product/connectors/
There may even be a generic TCP or UDP connector that you can use as a reference to configure or build your own for the specific protocol you are trying to ingest.

Zookeeper - what will happen if I pass in a connection string containing only some of the nodes from the zk cluster (ensemble)?

I have a ZooKeeper cluster consisting of N nodes (which know about each other). What if I pass only M < N of the nodes' addresses in the zk client connection string? What will the cluster's behavior be?
More specifically, what if I pass the host address of only one zk node from the cluster? Is it then possible for the zk client to connect to other hosts in the cluster? What if this one host is down? Will the client be able to connect to other ZooKeeper nodes in the ensemble?
The other question is: is it possible to limit the client to using only specific nodes from the ensemble?
What if I pass only M < N of the nodes' addresses in the zk client connection string? What will the cluster's behavior be?
ZooKeeper clients will connect only to the M nodes specified in the connection string. The ZooKeeper ensemble's back-end interactions (leader election and processing write transaction proposals) will continue to be processed by all N nodes in the cluster. Any of the N nodes still could become the ensemble leader. If a ZooKeeper server receives a write transaction request, and that server is not the current leader, then it will forward the request to the current leader.
More specifically, what if I pass the host address of only one zk node from the cluster? Is it then possible for the zk client to connect to other hosts in the cluster? What if this one host is down? Will the client be able to connect to other ZooKeeper nodes in the ensemble?
No, the client would only be able to connect to the single address specified in the connection string. That address effectively becomes a single point of failure for the application, because if the server goes down, the client will not have any other options for establishing a connection.
The other question is: is it possible to limit the client to using only specific nodes from the ensemble?
Yes, you can limit the nodes that the client considers for establishing a connection by listing only those nodes in the client's connection string. However, keep in mind that any of the N nodes in the cluster could still become the leader, and then all client write requests will get forwarded to that leader. In that sense, the client is using the other nodes indirectly, but the client is not establishing a direct socket connection to those nodes.
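To make that concrete, here is a minimal sketch using the plain ZooKeeper Java client from Scala; the host names are hypothetical, and only two nodes of an assumed five-node ensemble are listed:

import org.apache.zookeeper.{WatchedEvent, Watcher, ZooKeeper}

object ZkConnect extends App {
  // The client will only ever connect (and fail over) to these two
  // addresses, while writes may still be forwarded internally to
  // whichever of the five ensemble nodes is the current leader.
  val connectString = "zk1.example.com:2181,zk2.example.com:2181"

  val zk = new ZooKeeper(connectString, 15000, new Watcher {
    override def process(event: WatchedEvent): Unit =
      println(s"ZooKeeper event: ${event.getState}")
  })
}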
The ZooKeeper Overview page in the Apache documentation has further discussion of client and server behavior in a ZooKeeper cluster. For example, there is a relevant quote in the Implementation section:
As part of the agreement protocol all write requests from clients are forwarded to a single server, called the leader. The rest of the ZooKeeper servers, called followers, receive message proposals from the leader and agree upon message delivery. The messaging layer takes care of replacing leaders on failures and syncing followers with leaders.

Storm cluster not working in Production mode

I have a Storm topology which runs on two nodes: one is the Nimbus and the other is the supervisor.
A proxy which is not part of storm accepts an HTTP request from a client and passes it to the storm topology.
The topology is like this:
1. The proxy passes data to a storm spout.
2. The spout passes data to multiple bolts.
3. The result is passed back to the proxy by the last bolt.
I am running the proxy and passing data to Storm, and I am able to connect a socket to the listener on the topology side. However, the data emitted by the spout is shown as 0 in the UI. The same topology works fine in local mode.
I thought it was a problem with the supervisor, but the supervisor seems to be running fine, because I am able to see the supervisor description and the individual spouts and bolts. But none of them emit anything.
Now I am confused whether the problem is the data being passed to the wrong machine or something else. In order to communicate with the spout, I'm creating the socket from the proxy as follows:
InetAddress stormInetAddr=InetAddress.getByName("198.18.17.16");
int stormPort=4321;
Socket stormSocket=new Socket(stormInetAddr,stormPort);
Here 198.18.17.16 is the Nimbus IP, and 4321 is the port where data is expected.
I tried giving the supervisor IP here, and it didn't connect; however, this one does.
Now the proxy waits for the output on a specific port.
On the other side, after processing, data is read from the bolt, and there seems to be no activity from the cluster. But I am getting a response, which is basically the same request I had sent with some jumbled-up data. This response is supposed to be sent by the last bolt to a specific port which I had defined. So I do get data back, yet the cluster shows no activity. I know this is very vague, but does anyone have any idea as to what's happening?
It sounds like Storm is working fine, but your proxy/network settings are not. If it were a Storm error, you should see exceptions in the Nimbus UI and/or in the Storm supervisor logs.
Consider temporarily shutting down Storm and using nc -l 4321 on the supervisor machines to verify that your proxy is working as expected.
However...
You may have a fundamental flaw in your model. Storm's spouts are pull-based, so it seems odd to have incoming requests pushed to them. This is possible, of course, if you have your spouts start listening when they spin up and simply queue the requests (a sketch follows below). However, this presents another challenge for your model: you will likely have multiple spouts running on a single machine, and they cannot share the same port (4321).
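A minimal sketch of that queue-backed approach, assuming the org.apache.storm 1.x API, line-delimited text, and (per the caveat above) at most one such spout instance per machine:

import java.net.ServerSocket
import java.util.concurrent.ConcurrentLinkedQueue
import org.apache.storm.spout.SpoutOutputCollector
import org.apache.storm.task.TopologyContext
import org.apache.storm.topology.OutputFieldsDeclarer
import org.apache.storm.topology.base.BaseRichSpout
import org.apache.storm.tuple.{Fields, Values}
import scala.io.Source

// Push-to-pull adapter: a daemon thread accepts connections and queues
// incoming lines; nextTuple drains the queue.
class SocketQueueSpout(port: Int) extends BaseRichSpout {
  private val queue = new ConcurrentLinkedQueue[String]()
  private var collector: SpoutOutputCollector = _

  override def open(conf: java.util.Map[_, _], ctx: TopologyContext,
                    out: SpoutOutputCollector): Unit = {
    collector = out
    val listener = new Thread(new Runnable {
      override def run(): Unit = {
        val server = new ServerSocket(port)
        while (true) {
          val socket = server.accept()
          Source.fromInputStream(socket.getInputStream)
            .getLines()
            .foreach(line => queue.add(line))
          socket.close()
        }
      }
    })
    listener.setDaemon(true)
    listener.start()
  }

  override def nextTuple(): Unit = {
    val line = queue.poll()
    if (line != null) collector.emit(new Values(line))
  }

  override def declareOutputFields(declarer: OutputFieldsDeclarer): Unit =
    declarer.declare(new Fields("line"))
}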
If you want to meld these two worlds of push and pull, consider using a Kafka Spout.