Sarama ClusterAdmin connection issue - broken pipe - apache-kafka

I am using Sarama (1.27) ClusterAdmin to manage topics in Kafka 1.1.0. My application, which manages Kafka topics, runs as a REST service. It runs fine for a while, and I can get, create, and delete topics.
But after some time passes without any activity, a new topic request fails with the error: write tcp xxxxx:37888->xxxxx:9092: write: broken pipe.
I came across the question "How to fix broker may not be available after broken pipe".
Since my application runs as a service, how do I prevent the broken pipe issue? I close the ClusterAdmin only when the application exits, and the same ClusterAdmin connection serves all requests. On each request I reinitialize the ClusterAdmin if it is nil for any reason (usually it is not nil after the first initialization, so the same connection is reused).
Should I close the ClusterAdmin after each request is served and call NewClusterAdmin() for each topic request, or is there a keepalive option that I need to use?
Here is my existing code:
if admin == nil {
    var err error
    // Assign with "=" so the outer admin is set rather than shadowed by ":=".
    admin, err = NewClusterAdmin([]string{"localhost:9092"}, s.config)
    ..
}
topicMetadata, err := admin.DescribeTopics([]string{topicName})

I also came across this error. My fix is to retry the request several times, e.g. 2 to 10 times.

Related

JMS Outbound Gateway - Receiving Replies from two jobs instances

We are using the JMSOutboundGateway to send messages and to receive replies on its reply channel. When we run multiple iterations of the same job using the same JMSOutboundGateway, it fails with the error "Message contained wrong job instance id [85] should have been [86]" (org.springframework.batch.integration.chunk.ChunkMessageChannelItemWriter.getNextResult()).
This happens because the same JMSOutboundGateway instance is used when I start the second job while the first job is still in progress.
Is there a way to run the same job type in parallel?
This is a known issue, see https://github.com/spring-projects/spring-batch/issues/1372 and https://github.com/spring-projects/spring-batch/issues/1096.
The workaround is to use a separate instance of the writer for each job to prevent sharing the same reply channel.

I see errors using node-rdkafka but it seems to be working ok

I have a Bluemix Node.js (6.1.0) application that uses node-rdkafka 1.0.3. It seems to be working ok but there are tons of error events like Error: Local: Broker Transport Failure or Error: Local: Authentication failure.
The producer options I have set are:
var producer_opts = {
    "metadata.broker.list": env.messagehub.brokers,
    "security.protocol": "sasl_ssl",
    "ssl.ca.location": env.messagehub.calocation,
    "sasl.mechanisms": "PLAIN",
    "sasl.username": env.messagehub.user,
    "sasl.password": env.messagehub.password,
    "api.version.request": true,
    "socket.timeout.ms": 10000,
    "dr_msg_cb": true
};
Consumer has similar settings plus the group.id tag.
I wonder whether I should worry about these errors and whether there is a way to eliminate them.
Thanks!
You are probably hitting https://github.com/edenhill/librdkafka/issues/1218.
In many cases, as you've noticed, these errors are harmless. librdkafka, the library node-rdkafka is based on, always connects to all brokers in the cluster. Brokers your application doesn't interact with close the idle connections after a while, which leads to these error messages in your clients.
Unfortunately we don't have a recommended way to eliminate them at the moment. We are currently working on a potential solution to at least reduce their rate and maybe get rid of them.
Update:
With the most recent releases of node-rdkafka (>2.2), you can get rid of all the noisy logs by setting the following properties when creating clients:
'broker.version.fallback': '0.10.2.1',
'log.connection.close' : false

ReactiveMongo socket disconnect

Logs:
OUT 08:52:27.158 [reactivemongo-akka.actor.default-dispatcher-4] ERROR reactivemongo.core.actors.MongoDBSystem - The primary is unavailable, is there a network problem?
ERR reactivemongo.core.errors.GenericDriverException: MongoError['socket disconnected']
ERR at reactivemongo.core.actors.MongoDBSystem$$anonfun$4$$anonfun$applyOrElse$30.apply(actors.scala:390) ~[org.reactivemongo.reactivemongo_2.11-0.11.6.jar:0.11.6]
Our REST API, written in Scala (using the Spray and Akka frameworks), is deployed on a cloud.
We've tried setting the KeepAlive flag in ReactiveMongoOptions and then implemented a Jenkins job to periodically hit the database to keep it alive. Since adding these, we have not seen the issue recur.
Rather than assume this has fixed it, before pushing to production, we are trying to reproduce the issue. Any ideas on what may be the cause or how we can reproduce this?

MessageQueue.Exists(QueueName) returns false but it exists

The problem I'm having is with this code:
if (!MessageQueue.Exists(QueueName))
{
    MessageQueue.Create(QueueName, true);
}
It will check if a queue exists; if it doesn't I want it to create the queue. This code has been working and hasn't changed for a few months. Today I started receiving this error:
[MessageQueueException (0x80004005): A queue with the same path name already exists.]
System.Messaging.MessageQueue.Create(String path, Boolean transactional) +239478
The queues are local and if I delete the specific queue it will work once. After the queue is created it starts to fail again with the same error message.
It looks like the issue may be caused by the Network Load Balancing (NLB) configuration. I was unaware of a recent change that put the machine in an NLB environment. The configuration we are using is an unsupported one.
More information is in the article "How Message Queuing can function over Network Load Balancing (NLB)".

Cannot read remote private queue

I'm trying to get MSMQ 5 working on my two Windows Server 2008 R2 virtual machines.
I can send to local and remote private queues, and I can read from local private queues.
I can't read from remote private queues.
I've read a number of suggestions, especially the ones summarised by John Breakwell in "MSMQ Issue reading remote private queues (again)".
Things I've already done:
Turned off firewalls on both machines.
Ensured that Everyone and AnonymousLogon have full control of the queues. (If I take away AnonymousLogon access, then I can't remotely send to the queue, and the message ends up with "Access is denied" on the receiving machine.)
Allowed Nonauthenticated Rpc on both machines.
Allowed NewRemoteReadServerAllowNoneSecurityClient on both machines.
The sending code fragment is:
MessageQueue queue = new MessageQueue(queueName, false, false, QueueAccessMode.Send);
Message msg = new Message("Blah");
msg.UseDeadLetterQueue = true;
msg.UseJournalQueue = true;
queue.Send(msg, MessageQueueTransactionType.Automatic);
queue.Close();
The receiving code fragment is:
queueName = String.Format("FormatName:DIRECT=OS:{0}\\private$\\{1}",host,id);
queue = new MessageQueue(queueName, QueueAccessMode.Receive);
queue.ReceiveCompleted += new ReceiveCompletedEventHandler(receive);
queue.BeginReceive();
...
public void receive(object sender, ReceiveCompletedEventArgs e)
{
    queue.EndReceive(e.AsyncResult);
    Console.WriteLine("Message received");
    queue.BeginReceive();
}
My queueName ends up as FormatName:DIRECT=OS:server2\private$\TestQueue
When I call BeginReceive() on the queue, I get
Exception: System.Messaging.MessageQueueException (0x80004005)
at System.Messaging.MessageQueue.MQCacheableInfo.get_ReadHandle()
at System.Messaging.MessageQueue.ReceiveAsync(TimeSpan timeout, CursorHandle cursorHandle, Int32 action, AsyncCallback callback, Object stateObject)
at System.Messaging.MessageQueue.BeginReceive()
I've used Wireshark on Server1 to look at the network traffic. Without posting all the detail, it seems to go through the following stages. (Server1 is trying to read from a queue on Server2.)
Server1 contacts Server2, and there is an NTLMSSP challenge/response negotiation. A couple of the responses mention "Unknown result (3), reason: Local limit exceeded".
Server1 sends Server2 an rpc__mgmt_inq_princ_name request, and Server2 replies with a corresponding response.
There are some LDAP exchanges looking up the domain, then a referral to ldap://domain/cn=msmq,CN=Server2,CN=Computers,DC=domain which returns a "no such object" response.
Then there's some SASL GSS-API encrypted exchange with the LDAP server.
Then connections to the LDAP server and Server2 are closed.
I've tried enabling Event Viewer > Applications and Services Logs > Microsoft > Windows > MSMQ > End2End. It shows messages being sent, but no indication of why trying to receive is failing.
How can I debug this further?
The problem was related to domains. Server1 and Server2 were part of a development domain. My login account was part of the corporate domain. The development domain trusts the corporate domain enough for me to log in, be a member of administrators, install features etc. But it seems to be insufficient trust to read remote queues.
I found this by looking into public queues. If I was having trouble reading remote private queues, perhaps I should get more data by trying public queues. After installing the appropriate directory integration feature, I was able to create a public queue, but not see it in the list of public queues. Trying to refresh the list of public queues gave me this error:
Not all public queues can be displayed. Only public queues cached locally can be displayed. Error: The object was not found in Active Directory.
Google pointed me to John Breakwell's answer to a similar problem, which indicates that trust relationships don't work across messaging protocols.
Try the standard Receive method instead and specify the transaction type, since BeginReceive does not seem to support receiving from transactional queues.
Message msg = queue.Receive(MessageQueueTransactionType.Automatic);
MSMQ does not always return logical error messages...
System.Messaging.MessageQueueException (0x80004005)
at System.Messaging.MessageQueue.MQCacheableInfo.get_ReadHandle()
This error can also be caused by calling BeginReceive on a non-existent queue. Check the configuration to ensure that the specified queue path actually exists and gives "Everyone" full permissions.