SockJS connections in a clustered Vert.x environment - vert.x

The Vert.x application runs in Docker containers on two EC2 instances and is clustered.
Clustering is achieved with the hazelcast-aws plugin, and the application is started like this:
docker run --name ... -p ... \
--network ... \
-v ... \
-d ... \
-c 'exec java \
-Dvertx.eventBus.options.setClustered=true \
-Dvertx.eventBus.options.setClusterPort=15701 \
-jar ... -conf ... \
-cluster'
Nothing cluster-related is set programmatically.
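For reference, a minimal sketch of what an equivalent programmatic setup could look like with the Hazelcast cluster manager (Vert.x 3.x API assumed; the application above relies solely on the -cluster flag and the system properties shown):
// hedged sketch, not the application's actual bootstrap code
// assumed imports: io.vertx.core.*, io.vertx.core.spi.cluster.ClusterManager, io.vertx.spi.cluster.hazelcast.HazelcastClusterManager
ClusterManager clusterManager = new HazelcastClusterManager();
VertxOptions options = new VertxOptions()
        .setClusterManager(clusterManager)
        .setClusterPort(15701);
Vertx.clusteredVertx(options, res -> {
    if (res.succeeded()) {
        Vertx clusteredVertx = res.result();
        // deploy verticles on the clustered instance here
    } else {
        res.cause().printStackTrace();
    }
});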
The client opens a socket on the first request and reuses it for subsequent similar requests.
Each request will:
1. initiate an async request with the server by publishing a message to the event bus
2. register a consumer on the event bus which will handle the result of the above, and which is passed a reference to the socket connection where it should send the result
Since Vert.x does round robin by default when clustered and there are two instances, any instance gets every other message (from 1. above), which makes the client, which connects to one instance only, receive exactly half of all expected responses.
I suppose this is because, even though the registered consumer has a reference to the socket object, it can't use it because it was created on a different node/webserver.
Would that be correct and is there a way to get 100% of messages to the client, connected to just one node, without introducing things like RabbitMQ?
Here's the SockJS handler code:
SockJSHandler sockJSHandler = SockJSHandler.create(vertx, new SockJSHandlerOptions());
sockJSHandler.socketHandler(socket -> {
    SecurityService securityService = (SecurityService) ServiceFactory.getService(SecurityService.class);
    if (securityService.socketHeadersSecurity(socket)) {
        socket.handler(socketMessage -> {
            try {
                LOGGER.trace("socketMessage: " + socketMessage);
                // decoded into its own variable so it does not shadow the SockJS socket above
                Socket socketRequest = Json.decodeValue(socketMessage.toString(), Socket.class);
                Report report = socketRequest.getReport();
                if (report != null) {
                    Account accountRequest = socketRequest.getAccount();
                    Account accountDatabase = accountRequest == null ? null
                            : ((AccountService) ServiceFactory.getService(AccountService.class)).getById(accountRequest.getId());
                    Response result = securityService.socketReportSecurity(accountRequest, accountDatabase, report) ?
                            ((ReportService) ServiceFactory.getService(ReportService.class)).createOrUpdateReport(report, accountDatabase)
                            : new Response(Response.unauthorized);
                    if (Response.success.equals(result.getResponse())) {
                        //register a consumer
                        String consumerName = "report.result." + Timestamp.from(ClockFactory.getClock().instant());
                        MessageConsumer<Object> resultConsumer = vertx.eventBus().consumer(consumerName);
                        resultConsumer.handler(message -> {
                            Response executionResult;
                            if ("success".equals(message.body())) {
                                try {
                                    Path csvFile = Paths.get(config.getString(Config.reportPath.getConfigName(), Config.reportPath.getDefaultValue())
                                            + "/" + ((Report) result.getPayload()).getId() + ".csv");
                                    executionResult = new Response(new JsonObject().put("csv", new String(Files.readAllBytes(csvFile))));
                                } catch (IOException ioEx) {
                                    executionResult = new Response(new Validator("Failed to read file.", ioEx.getMessage(), null, null));
                                    LOGGER.error("Failed to read file.", ioEx);
                                }
                            } else {
                                executionResult = new Response(new Validator("Report execution failed", (String) message.body(), null, null));
                            }
                            //send second message to client
                            socket.write(Json.encode(executionResult));
                            //unregister this one-shot consumer once the result has been delivered
                            resultConsumer.unregister();
                        });
                        //order report execution
                        vertx.eventBus().send("report.request", new JsonObject()
                                .put("reportId", ((Report) result.getPayload()).getId())
                                .put("consumerName", consumerName));
                    }
                    //send first message to client
                    socket.write(Json.encode(result));
                } else {
                    LOGGER.info("Insufficient data sent over socket: " + socketMessage.toString());
                    socket.end();
                }
            } catch (DecodeException dEx) {
                LOGGER.error("Error decoding message.", dEx);
                socket.end();
            }
        });
    } else {
        LOGGER.info("Illegal socket connection attempt from: " + socket.remoteAddress());
        socket.end();
    }
});
mainRouter.route("/websocket/*").handler(sockJSHandler);
Interestingly, when running two nodes clustered on localhost the client gets 100% of the results.
EDIT:
This turned out to be a configuration issue, not a SockJS issue.

Since Vert.x does round robin by default when clustered and there are two instances, any instance gets every other message (from 1. above), which makes the client, which connects to one instance only, receive exactly half of all expected responses.
This assumption is only partially correct. Vert.x does round robin, yes, but this means each instance gets half of the connections, not half of the messages.
Once a connection is established, all of its messages arrive at a single instance.
So this:
Would that be correct and is there a way to get 100% of messages to the client, connected to just one node, without introducing things like RabbitMQ?
Already happens.
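As a side note, the throwaway per-request consumer in the handler above can usually be replaced by the event bus request/reply mechanism, which routes the reply back to whichever node issued the request, regardless of which cluster node processed it. A rough sketch, assuming Vert.x 3.8+ where the call is named request() (earlier versions use send() with a reply handler):
// hedged sketch, not the original handler code: the reply travels back over the clustered event bus automatically
vertx.eventBus().request("report.request",
        new JsonObject().put("reportId", reportId),   // reportId assumed to be in scope here
        reply -> {
            if (reply.succeeded()) {
                socket.write(Json.encode(reply.result().body()));
            } else {
                socket.write(Json.encode(new Response(
                        new Validator("Report execution failed", reply.cause().getMessage(), null, null))));
            }
        });
The verticle handling "report.request" would then call message.reply(...) when the report is ready, instead of sending to a dynamically named consumer address.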

Related

stream two event messages in one from gRPC Server

I have a scenario where I am trying to send multiple events in one message and stream that message from the gRPC server to the gRPC client.
My proto files on server side look like this:
service Greeter {
    rpc AccumulateEvents (EventRequest) returns (stream EventsMessage);
}
message EventsMessage {
    FirstEvent firstEvents = 1;
    SecondEvents secondEvent = 2;
}
message EventRequest {
    // sending empty request
}
The service method is as follows:
public override async Task AccumulateEvents(EventRequest eventRequest, IServerStreamWriter<EventsMessage> responseStream, ServerCallContext context) {
    IDisposable disposable4 = service.SubscribeToEvents(OnEvents);
    service.execute();
    await responseStream.WriteAsync(new EventsMessage {
        FirstEvent = firstEvent, SecondEvents = secondEvents
    });
}
When I try to fetch and parse the stream on the client side, I get null for the secondEvent part of the EventsMessage message. Only firstEvents was returned from the server to the client. I tried debugging and could see secondEvent getting populated, but it became null when the streaming from the server started.
Also, secondEvent is a repeated field. I am not sure if that is the reason it becomes null.
Please let me know what I might be missing here.

Netty Client Connect with Server, but server does not fire channelActive/Registered

I have the following architecture in use:
- [Client] - The enduser connecting to our service.
- [GameServer] - The game server on which the game is running.
- [GameLobby] - A server that is responsible for matching Clients with a GameServer.
If we have, for example, 4 Clients that want to play a game and get matched to a GameLobby, then the first time all these connections succeed properly.
However, when they decide to rematch, one of the Clients will not connect properly.
The connection between all the Clients and the GameServer happens simultaneously.
Clients that rematch first remove their current connection with the GameServer and head into the lobby again.
This connection will succeed and no errors are thrown. Even the ChannelFuture shows that the client connection was made properly; the following values show that the client thinks the connection was correct:
- ChannelFuture.isSuccess() = True
- ChannelFuture.isDone() = True
- ChannelFuture.cause() = Null
- ChannelFuture.isCancelled() = False
- Channel.isOpen() = True
- Channel.isActive() = True
- Channel.isRegistered() = True
- Channel.isWritable() = True
Thus the connection was properly made according to the Client. However, on the GameServer, in the SimpleChannelInboundHandler, channelRegistered/channelActive is never called for that specific Client, only for the other 3 Clients.
All 4 Clients, the GameServer, and the Lobby are running on the same IP address.
Since it only happens when (re)connecting to the GameServer again, I thought it had to do with not properly closing the connection. Currently this is done through:
try {
    group.shutdownGracefully();
    channel.closeFuture().sync();
} catch (InterruptedException e) {
    e.printStackTrace();
}
On the GameServer, channelUnregistered is called, so this part works and the connection is destroyed.
I have tried adding listeners to the ChannelFuture of the malfunctioning channel connection, but according to the ChannelFuture everything works, which is not the case.
I tried adding ChannelOptions to allow for more Clients queued to the server.
GameServer
The GameServer server is initialized as follow:
// Create the bootstrap to make this act like a server.
ServerBootstrap serverBootstrap = new ServerBootstrap();
serverBootstrap.group(bossGroup)
        .channel(NioServerSocketChannel.class)
        .childHandler(new ChannelInitialisation(new ClientInputReader(gameThread)))
        .option(ChannelOption.SO_BACKLOG, 1000)
        .childOption(ChannelOption.SO_KEEPALIVE, true)
        .childOption(ChannelOption.TCP_NODELAY, true);
bossGroup.execute(gameThread); // Executing the thread that handles all games on this GameServer.
// Launch the server with the specific port.
serverBootstrap.bind(port).sync();
The GameServer ClientInputReader
@ChannelHandler.Sharable
public class ClientInputReader extends SimpleChannelInboundHandler<Packet> {
    private ServerMainThread serverMainThread;

    public ClientInputReader(ServerMainThread serverMainThread) {
        this.serverMainThread = serverMainThread;
    }

    @Override
    public void channelRegistered(ChannelHandlerContext ctx) throws Exception {
        System.out.println("[Connection: " + ctx.channel().id() + "] Channel registered");
        super.channelRegistered(ctx);
    }

    @Override
    protected void channelRead0(ChannelHandlerContext ctx, Packet packet) {
        // Packet handling
    }
}
The malfunctioning connection is not calling anything on the SimpleChannelInboundHandler, not even exceptionCaught.
The GameServer ChannelInitialisation
public class ChannelInitialisation extends ChannelInitializer<SocketChannel> {
    private SimpleChannelInboundHandler channelInputReader;

    public ChannelInitialisation(SimpleChannelInboundHandler channelInputReader) {
        this.channelInputReader = channelInputReader;
    }

    @Override
    protected void initChannel(SocketChannel ch) throws Exception {
        ChannelPipeline pipeline = ch.pipeline();
        // every packet is prefixed with the amount of bytes that will follow
        pipeline.addLast(new LengthFieldBasedFrameDecoder(Integer.MAX_VALUE, 0, 4, 0, 4));
        pipeline.addLast(new LengthFieldPrepender(4));
        pipeline.addLast(new PacketEncoder(), new PacketDecoder(), channelInputReader);
    }
}
Client
Client creating a GameServer connection:
// Configure the client.
group = new NioEventLoopGroup();
Bootstrap b = new Bootstrap();
b.group(group)
        .channel(NioSocketChannel.class)
        .option(ChannelOption.TCP_NODELAY, true)
        .handler(new ChannelInitialisation(channelHandler));
// Start the client.
channel = b.connect(address, port).await().channel();
/* At this point, the client thinks that the connection was successful, as the channel is active, open, registered and writable... */
ClientInitialisation:
public class ChannelInitialisation extends ChannelInitializer<SocketChannel> {
    private SimpleChannelInboundHandler<Packet> channelHandler;

    ChannelInitialisation(SimpleChannelInboundHandler<Packet> channelHandler) {
        this.channelHandler = channelHandler;
    }

    @Override
    public void initChannel(SocketChannel ch) throws Exception {
        // prefix messages by the length
        ch.pipeline().addLast(new LengthFieldBasedFrameDecoder(Integer.MAX_VALUE, 0, 4, 0, 4));
        ch.pipeline().addLast(new LengthFieldPrepender(4));
        // our encoder, decoder and handler
        ch.pipeline().addLast(new PacketEncoder(), new PacketDecoder(), channelHandler);
    }
}
ClientHandler:
public class ClientPacketHandler extends SimpleChannelInboundHandler<Packet> {
    @Override
    public void channelActive(ChannelHandlerContext ctx) throws Exception {
        super.channelActive(ctx);
        System.out.println("Channel active: " + ctx.channel().id());
        ctx.channel().writeAndFlush(new PacketSetupClientToGameServer());
        System.out.println("Sending setup packet to the GameServer: " + ctx.channel().id());
        // This is successfully called, as the client thinks the connection was properly made.
    }

    @Override
    protected void channelRead0(ChannelHandlerContext ctx, Packet packet) {
        // Reading packets.
    }
}
I expect the Client to connect properly to the server, since the other Clients connect properly and this Client could previously connect just fine.
TL;DR: When multiple Clients try to create a new match, there is a possibility that one, possibly more, Client(s) will not connect properly to the server after the previous connection was closed.
For those who struggle with this issue in some way or another:
I did a workaround that allows me to continue even though there is still a bug inside the Netty framework (as far as I am concerned). The workaround is quite simple: just create a connection pool.
My solution uses a maximum of five connections inside the connection pool. If one of the connections gets no reply from the GameServer, it is not that big of a deal, since there are four others that will have a high chance of succeeding. I know this is a bad workaround, but I could not find any information on this issue. It works and only adds a maximum delay of 5 seconds (each retry takes a second).
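A minimal sketch of that retry idea (helper names and timings are assumptions, not the answer's actual code): try up to five connections, one per second, and keep the first one the GameServer actually answers.
public Channel connectWithRetry(Bootstrap bootstrap, String address, int port) throws InterruptedException {
    final int MAX_ATTEMPTS = 5;
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
        Channel channel = bootstrap.connect(address, port).sync().channel();
        // waitForHandshake is a hypothetical helper that returns true once the GameServer
        // answers the setup packet (e.g. via a Promise completed in the channel handler)
        if (waitForHandshake(channel, 1000)) {
            return channel;
        }
        channel.close();
        Thread.sleep(1000); // each retry takes about a second, matching the 5 second worst case above
    }
    throw new IllegalStateException("GameServer did not respond after " + MAX_ATTEMPTS + " attempts");
}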

Unique Transactional IDs for Kafka Producer in distributed running mode

I have a big data application that is based on the process Consume -> Process -> Produce. I am using Kafka in my ingestion pipeline and I am using the transactional producer for producing messages. All pieces of my application run fine; however, there is a small problem in generating the IDs for the transactional producer. Scenario:
Say my application is running on one machine and I instantiate 2 consumers, each with its own producer, e.g.:
Producer 1 has the transactional ID -> Consumer-0-Producer
Producer 2 has the transactional ID -> Consumer-1-Producer
Now transactions initiated by these two producers will not interfere with each other, and this is what I desire. Pseudo code looks something like this:
ExecutorService executorService; // responsible for starting my consumers
for (int i = 0; i < 2; i++) {
    prod_trans_id = "consumer-" + str(i) + "-producer"
    Custom_Consumer consumer = new Custom_Consumer(prod_trans_id)
    executorService.submit(consumer)
}
This works perfectly fine if my application runs on a single machine. However, this is not the case, as the application needs to run on multiple machines, so when the same code runs on machine 2, the producers instantiated by the consumers on machine 2 will have the same transactional IDs as on machine 1. I want the transactional IDs to be generated in a way that they don't conflict with one another and are also reproducible, meaning that if the application crashes/stops (say someone does service application stop and then service application start) and comes back online, it should use the same transactional IDs as before. I thought of a UUID-based approach; however, UUIDs are random and will not be the same when the application on one machine dies and comes back up online. So I use the machine's hostname as part of the ID:
private final static String HOSTNAME_COMMAND = "hostname";

public static String getHostName() {
    try {
        Process process = Runtime.getRuntime().exec(HOSTNAME_COMMAND);
        try (BufferedReader inputStreamReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
             BufferedReader errorStreamReader = new BufferedReader(new InputStreamReader(process.getErrorStream()))) {
            // read the error line once; calling readLine() twice would skip it
            String errorLine = errorStreamReader.readLine();
            if (errorLine != null) {
                throw new RuntimeException(String.format("Failed to get the hostname, exception message: %s", errorLine));
            }
            return inputStreamReader.readLine();
        }
    } catch (IOException e) {
        LogExceptionTrace.logExceptionStackTrace(e);
        throw new RuntimeException(e);
    }
}
And then use the hostname as follows:
final String producerTransactionalID = String.format("%s_producer", this.consumerName);
Where consumer name is set as follows:
for (int i = 0; i < NUMBER_OF_CONSUMERS; i++) {
    String consumerName = String.format("%s-worker-%d", hostName, i);
    Executor executor = new Executor(
            Configuration, consumerName
    );
    Executors.add(executor);
    futures.add(executorService.submit(executor));
}
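For illustration, a minimal sketch of wiring such a hostname-based name into the producer's transactional.id so it stays stable across restarts (broker address and serializer choices are assumptions):
// assumed imports: java.util.Properties, org.apache.kafka.clients.producer.*, org.apache.kafka.common.serialization.StringSerializer
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true"); // required for transactions
// e.g. "myhost-worker-0_producer": unique per machine and per consumer index, stable across restarts
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, String.format("%s_producer", consumerName));
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions();
As a side note, java.net.InetAddress.getLocalHost().getHostName() can return the hostname without spawning a process, though the value may differ inside containers.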

KafkaSpout (idle) generates a huge network traffic

After developing and executing my Storm (1.0.1) topology with a KafkaSpout and a couple of Bolts, I noticed huge network traffic even when the topology is idle (no messages on Kafka, no processing done in bolts). So I started to comment out my topology piece by piece to find the cause, and now I have only the KafkaSpout in my main:
....
final SpoutConfig spoutConfig = new SpoutConfig(
        new ZkHosts(zkHosts, "/brokers"),
        "files-topic",          // topic
        "/kafka",               // ZK chroot
        "consumer-group-name");
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
spoutConfig.startOffsetTime = OffsetRequest.LatestTime();
topologyBuilder.setSpout(
        "kafka-spout-id",
        new KafkaSpout(spoutConfig),
        1);
....
When this (useless) topology executes, even in local mode, even the very first time, the network traffic always grows a lot. I see (in my Activity Monitor):
- an average of 432 KB of data received/sec
- after a couple of hours with the topology running (idle), data received is 1.26 GB and data sent is 1 GB
(Important: Kafka is not running in a cluster; a single instance runs on the same machine with a single topic and a single partition. I just downloaded Kafka on my machine, started it, and created a simple topic. When I put a message in the topic, everything in the topology works without any problem at all.)
Obviously, the reason is in the KafkaSpout.nextTuple() method (below), but I don't understand why, without any messages in Kafka, I should have such traffic. Is there something I didn't consider? Is that the expected behaviour? I had a look at the Kafka logs and ZK logs, nothing; I cleaned up the Kafka and ZK data, nothing, still the same behaviour.
@Override
public void nextTuple() {
    List<PartitionManager> managers = _coordinator.getMyManagedPartitions();
    for (int i = 0; i < managers.size(); i++) {
        try {
            // in case the number of managers decreased
            _currPartitionIndex = _currPartitionIndex % managers.size();
            EmitState state = managers.get(_currPartitionIndex).next(_collector);
            if (state != EmitState.EMITTED_MORE_LEFT) {
                _currPartitionIndex = (_currPartitionIndex + 1) % managers.size();
            }
            if (state != EmitState.NO_EMITTED) {
                break;
            }
        } catch (FailedFetchException e) {
            LOG.warn("Fetch failed", e);
            _coordinator.refresh();
        }
    }
    long diffWithNow = System.currentTimeMillis() - _lastUpdateMs;
    /*
       As far as the System.currentTimeMillis() is dependent on System clock,
       additional check on negative value of diffWithNow in case of external changes.
    */
    if (diffWithNow > _spoutConfig.stateUpdateIntervalMs || diffWithNow < 0) {
        commit();
    }
}
Put a sleep of one second (1000 ms) in the nextTuple() method and observe the traffic now. For example:
@Override
public void nextTuple() {
    try {
        Thread.sleep(1000);
    } catch (Exception ex) {
        LOG.error("Exception while sleeping...", ex);
    }
    List<PartitionManager> managers = _coordinator.getMyManagedPartitions();
    for (int i = 0; i < managers.size(); i++) {
        ...
    }
}
The reason is that the Kafka consumer works on a pull model, which means consumers pull data from the Kafka brokers. So from the consumer's (KafkaSpout's) point of view, it continuously issues fetch requests to the Kafka broker, and each of these is a TCP network request. That is why you see such large numbers for data packets sent/received. Even though the consumer doesn't consume any messages, the pull requests and the empty responses are still counted in the sent/received statistics. Your network traffic will be lower if your sleep time is higher. There are also some network-related configurations for the brokers and for the consumer; researching those configurations may help. Hope it helps.
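If you prefer a configuration-side variant of that sleep, Storm also lets you raise the spout wait time that is applied when nextTuple() emits nothing; a sketch for local mode (the constant below is the standard Storm 1.x setting, not something from the original answer):
// hedged sketch: increase the spout's sleep-wait time instead of sleeping inside nextTuple()
Config conf = new Config();
conf.put(Config.TOPOLOGY_SLEEP_SPOUT_WAIT_STRATEGY_TIME_MS, 1000); // default is 1 ms
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("kafka-spout-test", conf, topologyBuilder.createTopology());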
Is your bolt receiving messages? Does your bolt inherit BaseRichBolt?
Comment out the line m.fail(id.offset) in KafkaSpout and check it out. If your bolt doesn't ack, then your spout assumes the message failed and tries to replay the same message.
public void fail(Object msgId) {
    KafkaMessageId id = (KafkaMessageId) msgId;
    PartitionManager m = _coordinator.getManager(id.partition);
    if (m != null) {
        //m.fail(id.offset);
    }
}
Also try halting nextTuple() for a few millis and check it out.
Let me know if it helps.

rabbit messaging confirmation

I am using RabbitMQ and I want to make sure that if the client has a connection problem, the messages I posted won't be lost. I simulate it with Eclipse: I call System.exit in the fetching program after 100 messages. I posted 1000 messages. On the second run I don't limit the number of messages, and over 3 runs it returns me only 840 messages. Can you help me?
The code of the producer is:
public void run() {
    String json = SimpleQueueServiceSample.getFromList();
    while (!(json.equals(""))) {
        json = SimpleQueueServiceSample.getFromList();
        try {
            c.basicPublish("", "test",
                    MessageProperties.PERSISTENT_TEXT_PLAIN, json.getBytes());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    try {
        // waitForConfirmsOrDie() requires the channel to be in confirm mode (c.confirmSelect())
        c.waitForConfirmsOrDie();
    } catch (IOException | InterruptedException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
The code of the consumer is:
QueueingConsumer consumer = new QueueingConsumer(channel);
channel.basicConsume(QUEUE_NAME, true, consumer);
while (true) {
    System.out.println(count++);
    QueueingConsumer.Delivery delivery = consumer.nextDelivery();
    String message = new String(delivery.getBody());
    System.out.println(" [x] Received '" + message + "'");
}
So the challenge for your scenario is how you're handling the acknowledgements.
channel.basicConsume(QUEUE_NAME, true, consumer);
is the problem. The second parameter of true is the auto-acknowledge flag.
To fix that, use:
channel.basicConsume(QUEUE_NAME, false, consumer);
while (true) {
    QueueingConsumer.Delivery delivery = consumer.nextDelivery();
    //...
    channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
}
It looks like you're using RabbitMQ's tutorials, and your code snippet is from part one. If you look at part two, they start talking about acknowledgements and setting up quality of service to provide round-robin dispatch.
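For completeness, the quality-of-service setting referred to there is a one-liner; a prefetch count of 1 (the value used in the tutorial) gives fair dispatch:
// tell RabbitMQ not to give this consumer a new message until it has acked the previous one
int prefetchCount = 1;
channel.basicQos(prefetchCount);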
It's worth pointing out that the basicConsume() and nextDelivery() combination relies upon a hidden queue that lives within the consumer. So when you call basicConsume(), several messages are pulled down to the client into local storage.
The benefit of that approach is that it avoids the additional network overhead of a call for each individual message. The problem is that it can put more messages in your local consumer than you wish, and you may lose messages if the consumer drops before processing all of the messages in the local hidden queue.
If you truly want your consumers working on only one message at a time so that nothing is lost, you probably want to look at the basicGet() method instead of basicConsume().
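A minimal sketch of that basicGet() approach (queue name reused from the snippets above; exception handling omitted):
boolean autoAck = false;
GetResponse response = channel.basicGet(QUEUE_NAME, autoAck);
if (response != null) {
    String message = new String(response.getBody());
    // ... process the message ...
    // ack only after processing, so a consumer that dies mid-way never loses the message
    channel.basicAck(response.getEnvelope().getDeliveryTag(), false);
}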