Replacement for SimpleConsumer.send in kafka? - scala

SimpleConsumer has been deprecated in Kafka, with org.apache.kafka.clients.consumer.KafkaConsumer as its replacement. However, KafkaConsumer doesn't have a send(...) method. How can I rewrite the code below using the new KafkaConsumer?
import scala.concurrent.duration._
import kafka.api.TopicMetadataRequest
import kafka.consumer.SimpleConsumer
....
val consumer = new SimpleConsumer(
  host = "127.0.0.1",
  port = 9092,
  soTimeout = 2.seconds.toMillis.toInt,
  bufferSize = 1024,
  clientId = "health-check")
// this will fail if Kafka is unavailable
consumer.send(new TopicMetadataRequest(Nil, 1))

You can get topic metadata with .partitionsFor and .listTopics

There is no direct replacement for that method; it depends on what you want to do.
If you need information about all partitions of a topic, the new API has consumer.partitionsFor(topic) for that.

Related

How do I connect to Kafka from Python with a username and password for JAAS, like it's done in Java?

Using an existing, working Java example, I am trying to write a Python equivalent of the producer using the kafka-python and confluent_kafka libraries. How do I configure sasl.jaas.config in Python with information like that in the Java code below?
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
...
Properties props = new Properties();
...
props.put("sasl.jaas.config", "org.apache.kafka.common.security.scram.ScramLoginModule required username=\"<Kafka_Username>\" password=\"<Kafka_Password>\";");
Producer<String, String> producer = new KafkaProducer<>(props);
This works for me
from confluent_kafka import Producer
SECURITY_PROTOCOL = "SASL_SSL"
SASL_MECHANISM = "PLAIN"  # for a ScramLoginModule setup, use "SCRAM-SHA-256" or "SCRAM-SHA-512"
conf = {
    'bootstrap.servers': SERVERS,
    'sasl.mechanisms': SASL_MECHANISM,
    'security.protocol': SECURITY_PROTOCOL,
    'sasl.username': SASL_USERNAME,
    'sasl.password': SASL_PASSWORD,
    ...
}
producer = Producer(conf)
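If you would rather derive the Python settings from an existing Java sasl.jaas.config string, the helper below is a minimal sketch. It assumes confluent_kafka (librdkafka) as the target, using its sasl.username, sasl.password and sasl.mechanisms settings; jaas_to_confluent_conf is a hypothetical name. Note that the Java example uses ScramLoginModule while the snippet above sets PLAIN; a SCRAM login generally corresponds to sasl.mechanisms = SCRAM-SHA-256 or SCRAM-SHA-512 (whichever the broker is configured for), and since the JAAS string does not say which, the mechanism is passed in explicitly here.

```python
import re

# Hypothetical helper: turns a Java sasl.jaas.config value into a
# confluent_kafka (librdkafka) configuration dict.
_MODULE_TO_MECHANISM = {
    # PLAIN has a single mechanism name, so it can be inferred from the module.
    "org.apache.kafka.common.security.plain.PlainLoginModule": "PLAIN",
    # ScramLoginModule is deliberately absent: SCRAM-SHA-256 vs SCRAM-SHA-512
    # cannot be read from the JAAS string, so the caller must pass it in.
}


def jaas_to_confluent_conf(jaas, bootstrap_servers, mechanism=None,
                           security_protocol="SASL_SSL"):
    module = jaas.split()[0]  # the login module class name comes first
    # Collect key="value" options such as username="..." and password="...".
    options = dict(re.findall(r'(\w+)="([^"]*)"', jaas))
    mechanism = mechanism or _MODULE_TO_MECHANISM.get(module)
    if mechanism is None:
        raise ValueError("pass mechanism explicitly for " + module)
    return {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": security_protocol,
        "sasl.mechanisms": mechanism,
        "sasl.username": options["username"],
        "sasl.password": options["password"],
    }


# Example with placeholder (hypothetical) credentials.
jaas = ('org.apache.kafka.common.security.scram.ScramLoginModule required '
        'username="alice" password="s3cret";')
conf = jaas_to_confluent_conf(jaas, "broker1:9092", mechanism="SCRAM-SHA-256")
```

The resulting dict can then be passed to confluent_kafka's Producer in place of a hand-written conf.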

Configure log4j.properties for Kafka appender, error when parsing property bootstrap.servers

I want to add a Kafka appender to the audit-hdfs log in a Cloudera cluster.
I have successfully configured a log4j2.xml file with a Kafka appender. Now I need to convert this log4j2.xml into a log4j2.properties file so that I can merge it with the log4j2.properties file of the HDFS log. I am unable to do this, because when I launch my dummy process with log4j2.properties instead of the XML file, I get an error.
I have tried writing the properties file in several different ways, always ending up with problems with the bootstrap.servers property.
This is my properties file:
filters = threshold
filter.threshold.type = ThresholdFilter
filter.threshold.level = ALL
appenders = console,kafka
appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %m%n
appender.kafka.type = Kafka
appender.kafka.name = kafka
appender.kafka.layout.type = PatternLayout
appender.kafka.layout.pattern =%m%n
appender.kafka.property.type = Property
appender.kafka.property.bootstrap.servers = ip:port
appender.kafka.topic = cdh-audit-hdfs
Here the problem is in this line:
appender.kafka.property.bootstrap.servers = ip:port
I have tried the following, to no avail:
appender.kafka.property.bootstrap.servers = ip:port
appender.kafka.property.bootstrap\.servers = ip:port
appender.kafka.property.name = "bootstrap.servers"
appender.kafka.property.bootstrap.servers = ip:port
appender.kafka.property.key = "bootstrap.servers"
appender.kafka.property.value = ip:port
etc...
This is my dummy process:
package blabla
import org.apache.logging.log4j.LogManager
object dummy extends App {
  val logger = LogManager.getLogger
  val record = "...c"
  while (true) {
    logger.info(record)
    Thread.sleep(5000)
  }
}
How do I need to configure my log4j2.properties in order to be able to define this property?
I'm expecting this process to write the record to my Kafka topic, but instead I get errors like these:
Exception in thread "main" org.apache.logging.log4j.core.config.ConfigurationException: No type attribute provided for component bootstrap
Exception in thread "main" org.apache.kafka.common.config.ConfigException: Missing required configuration "bootstrap.servers" which has no default value.
Try this solution; it works for me:
appender.kafka.property.type=Property
appender.kafka.property.name=bootstrap.servers
appender.kafka.property.value=host:port

Producer lost some message on kafka restart

Kafka Client : 0.11.0.0-cp1
Kafka Broker :
During a rolling restart of the Kafka brokers, our application lost some messages that it was sending to the brokers. As far as I understand, a rolling restart should not cause any message loss. These are the producer settings we are using (the producer uses an asynchronous send() without a callback or Future check):
val acksConfig: String = "all",
val retriesConfig: Int = Int.MAX_VALUE,
val retriesBackOffConfig: Int = 1000,
val batchSize: Int = 32768,
val lingerTime: Int = 1,
val maxBlockTime: Int = Int.MAX_VALUE,
val requestTimeOut: Int = 420000,
val bufferMemory: Int = 33_554_432,
val compressionType: String = "gzip",
val keySerializer: Class<StringSerializer> = StringSerializer::class.java,
val valueSerializer: Class<ByteArraySerializer> = ByteArraySerializer::class.java
I am seeing these warnings in the logs:
2019-03-19 17:30:59,224 [org.apache.kafka.clients.producer.internals.Sender] [kafka-producer-network-thread | producer-1] (Sender.java:511) WARN org.apache.kafka.clients.producer.internals.Sender - Got error produce response with correlation id 1105790 on topic-partition catapult_on_entitlement_updates_prod-67, retrying (2147483643 attempts left). Error: NOT_LEADER_FOR_PARTITION
But the log says there are retry attempts left, so why didn't it retry? Does anyone have any idea?
Two things to note:
What is the replication factor of the topic you are producing to, and what is min.insync.replicas set to?
What do you mean by "producer lost some messages"? If the producer cannot successfully write to at least min.insync.replicas brokers, it throws an exception and fails (for synchronous production). It is up to the producer/client to retry in case of failure (synchronous or asynchronous production).
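To make the first point concrete: with acks=all, the partition leader only accepts a write while the in-sync replica set has at least min.insync.replicas members; otherwise the producer gets NotEnoughReplicas errors. During a rolling restart one broker is down at a time, so a rough check is whether replication factor minus one is still at least min.insync.replicas. The sketch below assumes all replicas were in sync before the restart; survives_rolling_restart is a hypothetical name. Also keep in mind that with an asynchronous send() and no callback or Future check, a send that ultimately fails is not surfaced to the application code.

```python
def survives_rolling_restart(replication_factor, min_insync_replicas, brokers_down=1):
    """Rough check: can acks=all writes to a partition still succeed while
    `brokers_down` of its replicas are offline?

    Assumes all replicas were in sync before the restart and, as a worst case,
    that each stopped broker hosted one replica of the partition.
    """
    remaining_in_sync = replication_factor - brokers_down
    return remaining_in_sync >= min_insync_replicas


for rf, min_isr in [(3, 2), (2, 2), (3, 3)]:
    print(rf, min_isr, survives_rolling_restart(rf, min_isr))
# 3 2 True
# 2 2 False
# 3 3 False
```

A topic with replication factor 3 and min.insync.replicas 2 keeps accepting acks=all writes while one broker restarts; replication factor 2 with min.insync.replicas 2 does not.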

Kafka consumer rebalance occurs even if there is no message to consume

I have been seeing many Kafka consumer rebalances even when the thread is consuming nothing. I would expect the consumer not to rebalance in this scenario.
Here is the sample code.
import argparse
from kafka.coordinator.assignors.range import RangePartitionAssignor
from kafka import KafkaConsumer
import time
import sys
import logging
from kafka.consumer.subscription_state import ConsumerRebalanceListener
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
ch = logging.StreamHandler(sys.stdout)
ch.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
ch.setFormatter(formatter)
logger.addHandler(ch)
def get_config(args):
    config = {
        'bootstrap_servers': args.host,
        'group_id': args.group,
        'key_deserializer': lambda msg: msg,
        'value_deserializer': lambda msg: msg,
        'partition_assignment_strategy': [RangePartitionAssignor],
        'max_poll_records': args.records,  # must be an int (see type=int below)
        'auto_offset_reset': args.offset,
        # 'max_poll_interval_ms': 300000,
        # 'connections_max_idle_ms': 8 * 60 * 1000
    }
    return config
def start_consumer(args):
    config = get_config(args)
    consumer = KafkaConsumer(**config)
    # RepartitionListener.__init__ takes no arguments besides self
    consumer.subscribe([args.topic],
                       listener=RepartitionListener())
    for record in consumer:
        print(record.offset, record.partition)
        # simulated processing delay per record
        time.sleep(int(args.delay) / 1000.0)
class RepartitionListener(ConsumerRebalanceListener):
    def __init__(self):
        pass
    def on_partitions_revoked(self, revoked):
        print("partition revoked ")
        for tp in revoked:
            try:
                print("[{}] revoked topic = {} partition = {}".format(time.strftime("%c"),
                      tp.topic, tp.partition))
                partition_key = "{}_{}".format(tp.topic, str(tp.partition))
            except Exception as e:
                print("Got exception partition_key = {} {}".
                      format(tp, e))
    def on_partitions_assigned(self, assigned):
        pass
def main():
    parser = argparse.ArgumentParser(
        description='Tool to test consumer group with delay')
    named_args = parser.add_argument_group('named arguments')
    named_args.add_argument('-g', '--group', help='group id for the consumer',
                            required=True)
    named_args.add_argument('-r', '--records', help='num records to consume',
                            type=int, required=True)
    named_args.add_argument('-k', '--topic', help='kafka topic', required=True)
    named_args.add_argument('-d', '--delay', help='add process delay in ms', required=True)
    named_args.add_argument('-s', '--host', help='Kafka host format host:port', required=False,
                            default='localhost:9092')
    parser.add_argument('-o', '--offset',
                        default='latest',
                        help='offset to read from earliest/latest')
    args = parser.parse_args()
    print(args)
    start_consumer(args)
if __name__ == "__main__":
    main()
How can I avoid triggering the rebalance? From the logs I can see that the heartbeat is failing, but I expect heartbeats to continue even when there are no messages for a period longer than session.timeout.ms.
2019-02-27 20:39:43,281 - kafka.coordinator - WARNING - Heartbeat session expired, marking coordinator dead
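One factor worth ruling out, sketched here under the assumption of kafka-python 1.4 or later (where max_poll_interval_ms exists and defaults to 300000 ms, and max_poll_records defaults to 500): the script above sleeps for delay ms after every record, so a full batch of max_poll_records records keeps the consumer away from poll() for roughly max_poll_records * delay ms. If that gap exceeds max_poll_interval_ms, the consumer is removed from the group and a rebalance follows. The function names below are hypothetical, and this check does not by itself explain the heartbeat expiry on an idle topic.

```python
# Assumed kafka-python >= 1.4 default; check the version you run.
DEFAULT_MAX_POLL_INTERVAL_MS = 300000


def worst_case_poll_gap_ms(max_poll_records, delay_ms):
    # The script sleeps delay_ms after each record, so one full batch keeps
    # the consumer away from poll() for about this long.
    return max_poll_records * delay_ms


def may_trigger_rebalance(max_poll_records, delay_ms,
                          max_poll_interval_ms=DEFAULT_MAX_POLL_INTERVAL_MS):
    return worst_case_poll_gap_ms(max_poll_records, delay_ms) > max_poll_interval_ms


print(may_trigger_rebalance(500, 1000))  # True: 500 s between polls > 300 s
print(may_trigger_rebalance(10, 1000))   # False: 10 s between polls
```

With the script's -r and -d arguments this amounts to comparing records * delay against max_poll_interval_ms; lowering max_poll_records or raising max_poll_interval_ms are the usual knobs.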

Unable to publish in Kafka

I want to publish to a Kafka topic, but I am unable to do so: the program halts.
I am getting this error:
KafkaTimeoutError: Failed to update metadata after 60.0 secs.
def saveResults(response):
entities_tweet = response["entities"]
for entity in entities_tweet:
try:
for i in entity_dict:
for j in entity_dict[i]:
if(entity["text"] in j):
entity["tweet"] = response["tweet"]
entity["tweetId"] = response["tweetId"]
entity["timeStamp"] = response["timeStamp"]
#entity["userProfile"] = response["userProfile"]
future = producer.send('argentina-iceland-june-16-watson', bytes(entity))
print("Published.")
else:
print("All ignored.")
future = producer.send('argentina-iceland-june-16-watson', bytes(entity))
print("Published")
except Exception as e:
print (e)
finally:
producer.flush()
However, this is working:
from kafka import KafkaProducer
from kafka.errors import KafkaError
producer = KafkaProducer(bootstrap_servers=['broker1:1234'])
# Asynchronous by default
future = producer.send('my-topic', b'raw_bytes')
It looks like you're using an incorrect bootstrap server: it should be broker1:9092 instead of broker1:1234.
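Separately from the port, note that bytes(entity) does not JSON-encode a dict: on Python 3 it raises a TypeError (which the try/except in saveResults prints and swallows), and on Python 2 it produces the dict's repr string. A minimal sketch of explicit serialization, assuming JSON is the desired wire format; serialize_entity is a hypothetical name, and it can be passed to kafka-python's KafkaProducer through its value_serializer parameter.

```python
import json


def serialize_entity(entity):
    """Encode a dict as UTF-8 JSON bytes for use as a Kafka message value."""
    return json.dumps(entity, ensure_ascii=False).encode("utf-8")


# Hypothetical entity shaped like the ones built in saveResults.
entity = {"text": "example", "tweetId": "1", "timeStamp": "2018-06-16T15:00:00"}
payload = serialize_entity(entity)
print(type(payload).__name__)  # bytes

# With kafka-python (not run here) the function can serve as value_serializer:
# producer = KafkaProducer(bootstrap_servers=["broker1:9092"],
#                          value_serializer=serialize_entity)
# producer.send("argentina-iceland-june-16-watson", entity)
```

With a value_serializer in place, producer.send can be given the dict directly instead of calling bytes(entity).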