Unique Transactional IDs for Kafka Producer in distributed running mode - apache-kafka

I have a big data application that is based on the process Consume -> Process -> Produce. I am using Kafka in my ingestion pipeline and I am using the transactional producer for producing messages. All pieces of my application run fine, however there is a small problem in generating the IDs for the Transactional Producer. Scenario:
Say my application is running on one machine, I instantiate 2 consumer which have their own producers, so for e.g. lets say
Producer 1 has the transactional ID -> Consumer-0-Producer
Producer 2 has the transactional ID -> Consumer-1-Producer
now transactions initiated by these two producers will not interfere with each other, and this is what I desire. Pseudo code looks something like this:
ExecutorService executorService// responsible for starting my consumers
for (int i = 0; i < 2; i++) {
prod_trans_id = "consumer-" + str(i) + "-producer"
Custom_Consumer consumer = new Custom_Consumer(prod_trans_id)
executorService.submit(consumer)
}
This works perfectly fine if my application works on a single machine, however, this is not the case as the application needs to be run on multiple machines so when the same code is run on machine 2 the producers instantiated by the consumers on machine 2 will have same transactional ID as on machine 1. I want transactional IDs to be produced in a way that they don't conflict with one another as well as they are reproducible, which means in case if a application crashes/stops (say someone does service application stop and then service application start) and when it comes back online, then it should use the same Transactional IDs that were being used previously. I thought of UUIDs based approach, however, UUIDs are random and will not be the same when the application on one machine dies and comes back up online.

private final static String HOSTNAME_COMMAND = "hostname";
public static String getHostName() {
BufferedReader inputStreamReader = null;
BufferedReader errorStreamReader = null;
try {
Process process = Runtime.getRuntime().exec(HOSTNAME_COMMAND);
inputStreamReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
errorStreamReader = new BufferedReader(new InputStreamReader(process.getErrorStream()));
if (errorStreamReader.readLine() != null) {
throw new RuntimeException(String.format("Failed to get the hostname, exception message: %s",
errorStreamReader.readLine()));
}
return inputStreamReader.readLine();
} catch (IOException e) {
try {
if (inputStreamReader != null) {
inputStreamReader.close();
}
if (errorStreamReader != null) {
errorStreamReader.close();
}
} catch (IOException e1) {
LogExceptionTrace.logExceptionStackTrace(e1);
throw new RuntimeException(e1);
}
LogExceptionTrace.logExceptionStackTrace(e);
throw new RuntimeException(e);
}
}
And then use the hostname as follows:
final String producerTransactionalID = String.format("%s_producer", this.consumerName);
Where consumer name is set as follows:
for (int i = 0; i < NUMBER_OF_CONSUMERS; i++) {
String consumerName = String.format("%s-worker-%d", hostName, i);
Executor executor = new Executor(
Configuration, consumerName
);
Executors.add(executor);
futures.add(executorService.submit(executor));
}

Related

Kafka Transactional Producer

I am using Kafka 2 and I was going through the following link.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging
Below is my sample code for Transactional producer.
My code:
public void runProducer(final int sendMessageCount) throws Exception {
final Producer<Long, String> producer = createProducer();
producer.initTransactions();
final long time = System.currentTimeMillis();
try {
producer.beginTransaction();
for (long index = time; index < (time + sendMessageCount); index++) {
final ProducerRecord<Long, String> record =
new ProducerRecord<>(TOPIC, index,
"Test " + index);
// send returns Future
producer.send(record).get();
}
producer.commitTransaction();
}
catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
e.printStackTrace();
// We can't recover from these exceptions, so our only option is to close the producer and exit.
producer.close();
}
catch (final KafkaException e) {
e.printStackTrace();
// For all other exceptions, just abort the transaction and try again.
producer.abortTransaction();
}
finally {
producer.flush();
producer.close();
}
}
Questions:
Do we need to call endTransaction after commitTransaction ?
Do we need to call sendOffsetsToTransaction? What will happen if I don't include this?
How does it work when we deploy the same code to multiple servers with same transactionId? Do we need to have a separate transactionId for each instance? Say, machine1 crashes after beginTransaction() and after sending few records? How does machine2 with same transactionId recovers.
Machine1 is using transactionId "test" and it crashed after beginTransaction() and after producing few records. When the same instance comes up how does it resume the same transaction? We will actually again start from init & begin transaction.
How does it work for the same topic which was not involving in transaction and involving in transaction now? I am starting a new consumerGroup with transaction_committed, Will it read the messages which were committed before the transaction? Will the consumer with transaction_uncommitted see the messages which were aborted by transaction?

MassTransit 3 How to send a message explicitly to the error queue

I'm using MassTransit with Reactive Extensions to stream messages from the queue in batches. Since the behaviour isn't the same as a normal consumer I need to be able to send a message to the error queue if it fails an x number of times.
I've looked through the MassTransit source code and posted on the google groups and can't find an anwser.
Is this available on the ConsumeContext interface? Or is this even possible?
Here is my code. I've removed some of it to make it simpler.
_busControl = Bus.Factory.CreateUsingRabbitMq(cfg =>
{
var host = cfg.Host(new Uri("rabbitmq://localhost/"), h =>
{
h.Username("guest");
h.Password("guest");
});
cfg.UseInMemoryScheduler();
cfg.ReceiveEndpoint(host, "customer_update_queue", e =>
{
var _observer = new ObservableObserver<ConsumeContext<Customer>>();
_observer.Buffer(TimeSpan.FromMilliseconds(1000)).Subscribe(OnNext);
e.Observer(_observer);
});
});
private void OnNext(IList<ConsumeContext<Customer>> messages)
{
foreach (var consumeContext in messages)
{
Console.WriteLine("Content: " + consumeContext.Message.Content);
if (consumeContext.Message.RetryCount > 3)
{
// I want to be able to send to the error queue
consumeContext.SendToErrorQueue()
}
}
}
I've found a work around by using the RabbitMQ client mixed with MassTransit. Since I can't throw an exception when using an Observable and therefore no error queue is created. I create it manually using the RabbitMQ client like below.
ConnectionFactory factory = new ConnectionFactory();
factory.HostName = "localhost";
factory.UserName = "guest";
factory.Password = "guest";
using (IConnection connection = factory.CreateConnection())
{
using (IModel model = connection.CreateModel())
{
string exchangeName = "customer_update_queue_error";
string queueName = "customer_update_queue_error";
string routingKey = "";
model.ExchangeDeclare(exchangeName, ExchangeType.Fanout);
model.QueueDeclare(queueName, false, false, false, null);
model.QueueBind(queueName, exchangeName, routingKey);
}
}
The send part is to send it directly to the message queue if it fails an x amount of times like so.
consumeContext.Send(new Uri("rabbitmq://localhost/customer_update_queue_error"), consumeContext.Message);
Hopefully the batch feature will be implemented soon and I can use that instead.
https://github.com/MassTransit/MassTransit/issues/800

KafkaSpout (idle) generates a huge network traffic

After developing and executing my Storm (1.0.1) topology with a KafkaSpout and a couple of Bolts, I noticed a huge network traffic even when the topology is idle (no message on Kafka, no processing is done in bolts). So I started to comment out my topology piece by piece in order to find the cause and now I have only the KafkaSpout in my main:
....
final SpoutConfig spoutConfig = new SpoutConfig(
new ZkHosts(zkHosts, "/brokers"),
"files-topic", // topic
"/kafka", // ZK chroot
"consumer-group-name");
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
spoutConfig.startOffsetTime = OffsetRequest.LatestTime();
topologyBuilder.setSpout(
"kafka-spout-id,
new KafkaSpout(config),
1);
....
When this (useless) topology executes, even in local mode, even the very first time, the network traffic always grows a lot: I see (in my Activity Monitor)
An average of 432 KB of data received/sec
After a couple of hours the topology is running (idle) data received is 1.26GB and data sent is 1GB
(Important: Kafka is not running in cluster, a single instance that runs in the same machine with a single topic and a single partition. I just downloaded Kafka on my machine, started it and created a simple topic. When I put a message in the topic, everything in the topology is working without any problem at all)
Obviously, the reason is in the KafkaSpout.nextTuple() method (below), but I don't understand why, without any message in Kafka, I should have such traffic. Is there something I didn't consider? Is that the expected behaviour? I had a look at Kafka logs, ZK logs, nothing, I have cleaned up Kafka and ZK data, nothing, still the same behaviour.
#Override
public void nextTuple() {
List<PartitionManager> managers = _coordinator.getMyManagedPartitions();
for (int i = 0; i < managers.size(); i++) {
try {
// in case the number of managers decreased
_currPartitionIndex = _currPartitionIndex % managers.size();
EmitState state = managers.get(_currPartitionIndex).next(_collector);
if (state != EmitState.EMITTED_MORE_LEFT) {
_currPartitionIndex = (_currPartitionIndex + 1) % managers.size();
}
if (state != EmitState.NO_EMITTED) {
break;
}
} catch (FailedFetchException e) {
LOG.warn("Fetch failed", e);
_coordinator.refresh();
}
}
long diffWithNow = System.currentTimeMillis() - _lastUpdateMs;
/*
As far as the System.currentTimeMillis() is dependent on System clock,
additional check on negative value of diffWithNow in case of external changes.
*/
if (diffWithNow > _spoutConfig.stateUpdateIntervalMs || diffWithNow < 0) {
commit();
}
}
Put a sleep for one second (1000ms) in the nextTuple() method and observe the traffic now, For example,
#Override
public void nextTuple() {
try {
Thread.sleep(1000);
} catch(Exception ex){
log.error("Ëxception while sleeping...",e);
}
List<PartitionManager> managers = _coordinator.getMyManagedPartitions();
for (int i = 0; i < managers.size(); i++) {
...
...
...
...
}
The reason is, kafka consumer works on the basis of pull methodology which means, consumers will pull data from kafka brokers. So in consumer point of view (Kafka Spout) will do a fetch request to the kafka broker continuously which is a TCP network request. So you are facing a huge statistics on the data packet sent/received. Though the consumer doesn't consumes any message, pull request and empty response also will get account into network data packet sent/received statistics. Your network traffic will be less if your sleeping time is high. There are also some network related configurations for the brokers and also for consumer. Doing the research on configuration may helps you. Hope it will helps you.
Is your bolt receiving messages ? Do your bolt inherits BaseRichBolt ?
Comment out that line m.fail(id.offset) in Kafaspout and check it out. If your bolt doesn't ack then your spout assumes that message is failed and try to replay the same message.
public void fail(Object msgId) {
KafkaMessageId id = (KafkaMessageId) msgId;
PartitionManager m = _coordinator.getManager(id.partition);
if (m != null) {
//m.fail(id.offset);
}
Also try halt the nextTuple() for few millis and check it out.
Let me know if it helps

Cloud Service for incoming TCP connections hangs

I'm developing a cloud service (worker role) for collecting data from a number of instruments. These instruments reports data randomly every minute or so. The service itself is not performance critical and doesn't need to be asynchronous. The instruments are able to resend their data up to an hour on failed connection attempt.
I have tried several implementations for my cloud service including this one:
http://msdn.microsoft.com/en-us/library/system.net.sockets.tcplistener.stop(v=vs.110).aspx
But all of them hang my cloud server sooner or later (sometimes within an hour).
I suspect something is wrong with my code. I have a lot of logging in my code but I get no errors. The service just stops to receive incoming connections.
In Azure portal it seems like the service is running fine. No error logs and no suspicious cpu usage etc.
If I restart the service it will run fine again until it hangs next time.
Would be most grateful if someone could help me with this.
public class WorkerRole : RoleEntryPoint
{
private LoggingService _loggingService;
public override void Run()
{
_loggingService = new LoggingService();
StartListeningForIncommingTCPConnections();
}
private void StartListeningForIncommingTCPConnections()
{
TcpListener listener = null;
try
{
listener = new TcpListener(RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["WatchMeEndpoint"].IPEndpoint);
listener.Start();
while (true)
{
_loggingService.Log(SeverityLevel.Info, "Waiting for connection...");
var client = listener.AcceptTcpClient();
var remoteEndPoint = client.Client != null ? client.Client.RemoteEndPoint.ToString() : "Unknown";
_loggingService.Log(SeverityLevel.Info, String.Format("Connected to {0}", remoteEndPoint));
var netStream = client.GetStream();
var data = String.Empty;
using (var reader = new StreamReader(netStream, Encoding.ASCII))
{
data = reader.ReadToEnd();
}
_loggingService.Log(SeverityLevel.Info, "Received data: " + data);
ProcessData(data); //data is processed and stored in database (all resources are released when done)
client.Close();
_loggingService.Log(SeverityLevel.Info, String.Format("Connection closed for {0}", remoteEndPoint));
}
}
catch (Exception exception)
{
_loggingService.Log(SeverityLevel.Error, exception.Message);
}
finally
{
if (listener != null)
listener.Stop();
}
}
private void ProcessData(String data)
{
try
{
var processor = new Processor();
var lines = data.Split('\n');
foreach (var line in lines)
processor.ProcessLine(line);
processor.ProcessMessage();
}
catch (Exception ex)
{
_loggingService.Log(SeverityLevel.Error, ex.Message);
throw new Exception(ex.InnerException.Message);
}
}
}
One strange observation i just did:
I checked the log recently and no instrument has connected for the last 30 minutes (which indicates that the service is down).
I connected to the service myself via a TCP client i've written myself and uploaded some test data.
This worked fine.
When I checked the log again my test data had been stored.
The strange thing is, that 4 other instruments had connected about the same time and send their data successfully.
Why couldn't they connect by themself before I connected with my test client?
Also, what does this setting in .csdef do for an InputEndpoint, idleTimeoutInMinutes?
===============================================
Edit:
Since a cuple of days back my cloud service has been running successfully.
Unfortunately this morning last log entry was from this line:
_loggingService.Log(SeverityLevel.Info, String.Format("Connected to {0}", remoteEndPoint));
No other connections could be made after this. Not even from my own test TCP client (didn't get any error though, but no data was stored and no new logs).
This makes me think that following code causes the service to hang:
var netStream = client.GetStream();
var data = String.Empty;
using (var reader = new StreamReader(netStream, Encoding.ASCII))
{
data = reader.ReadToEnd();
}
I've read somewhere that StremReader's ReadToEnd() could hang. Is this possible?
I have now changed this piece of code to this:
int i;
var bytes = new Byte[256];
var data = new StringBuilder();
const int dataLimit = 10;
var dataCount = 0;
while ((i = netStream.Read(bytes, 0, bytes.Length)) != 0)
{
data.Append(Encoding.ASCII.GetString(bytes, 0, i));
if (dataCount >= dataLimit)
{
_loggingService.Log(SeverityLevel.Error, "Reached data limit");
break;
}
dataCount++;
}
Another explanation could be something hanging in the database. I use the SqlConnection and SqlCommand classes to read and write to my database. I always close my connection afterwards (finally block).
SqlConnection and SqlCommand should have default timeouts, right?
===============================================
Edit:
After some more debugging I found out that when the service wasn't responding it "hanged" on this line of code:
while ((i = netStream.Read(bytes, 0, bytes.Length)) != 0)
After some digging I found out that the NetStream class and its read methods could actually hang. Even though MS declares otherwise.
NetworkStream read hangs
I've now changed my code into this:
Thread thread = null;
var task = Task.Factory.StartNew(() =>
{
thread = Thread.CurrentThread;
while ((i = netStream.Read(bytes, 0, bytes.Length)) != 0)
{
// Translate data bytes to a ASCII string.
data.Append(Encoding.ASCII.GetString(bytes, 0, i));
}
streamReadSucceeded = true;
});
task.Wait(5000);
if (streamReadSucceeded)
{
//Process data
}
else
{
thread.Abort();
}
Hopefully this will stop the hanging.
I'd say that part of your problem is you are processing your data on the thread that listens for connections from clients. This would prevent new clients from connecting if another client has started a long running operation of some type. I'd suggest you defer your processing to worker threads thus freeing the "listener" thread to accept new connections.
Another problem you could be experiencing, if your service throws an error, then the service will stop accepting connections as well.
private static void ListenForClients()
{
tcpListener.Start();
while (true)
{
TcpClient client = tcpListener.AcceptTcpClient();
Thread clientThread = new Thread(new ParameterizedThreadStart(HandleClientComm));
clientThread.Start(client);
}
}
private static void HandleClientComm(object obj)
{
try
{
using(TcpClient tcpClient = (TcpClient)obj)
{
Console.WriteLine("Got Client...");
using (NetworkStream clientStream = tcpClient.GetStream())
using (StreamWriter writer = new StreamWriter(clientStream))
using(StreamReader reader = new StreamReader(clientStream))
{
//do stuff
}
}
}
catch(Exception ex)
{
}
}

HornetQ Consumer Stops Receiving Messages After N hours

I'm using a HornetQ(v2.2.13) consumer, running standalone, to read off a Persistent topic published by a JBOSS server(7.1.1 final). All goes well for several hours (between 2-6) and then the consumer just stops receiving messages from the topic. From the log file on the server, I see data keeps getting pumped down the pipe but the consumer log file indicates the client stopped reading the data. I deduced that from the client saying the last time it read a message off the topic was 12:00:00 and the server log saying the last time it pushed a message to the topic was 14:00:00.
I've tried adjusting the HornetQ configs, but it doesn't seem to be working for a sustainable duration.
The code I use to communicate with the topic is as follows.
private TransportConfiguration getTC(String hostname) {
Map<String,Object> params = new HashMap<String, Object>();
params.put(TransportConstants.HOST_PROP_NAME, hostname);
params.put(TransportConstants.PORT_PROP_NAME, 5445);
TransportConfiguration tc = new TransportConfiguration(NettyConnectorFactory.class.getName(), params);
return tc;
}
private Topic createDestination(String destinationName) {
Topic topic = new HornetQTopic(destinationName);
return topic;
}
private HornetQConnectionFactory createCF(TransportConfiguration tc) {
HornetQConnectionFactory cf = HornetQJMSClient.createConnectionFactoryWithoutHA(JMSFactoryType .CF, tc);
return cf == null ? null : cf;
}
Code snippet that creates the session and starts it:
TransportConfiguration tc = this.getTC(this.hostname);
HornetQConnectionFactory cf = this.createCF(tc);
cf.setRetryInterval(4000);
cf.setReconnectAttempts(10);
cf.setConfirmationWindowSize(1000000);
Destination destination = this.createDestination(this.topicName);
logger.info("Starting Topic Connection");
try {
this.connection = cf.createConnection();
connection.start();
this.session = connection.createSession(transactional, ackMode);
MessageConsumer consumer = session.createConsumer(destination);
consumer.setMessageListener(this);
logger.info("Started topic connection");
} catch (Exception ex) {
ex.printStackTrace();
logger.error("EXCEPTION!");
}
You don't get any log about server getting disconnected on the server's side.
Did you try playing with client-failure-check-period and other ping parameters?
What about VM Settings?
How are you acknowledging the messages? I see that you created it as transaction. Are you sure you are committing the TX as you recieve messages?