I'm writing a Spark Streaming job that reads data from Kafka, makes some changes to the records, and sends the results to another Kafka cluster.
The performance of the job seems very slow: the processing rate is about 70,000 records per second. Sampling shows that 30% of the time is spent on reading and processing the data, and the remaining 70% is spent on sending the data to Kafka.
I've tried tweaking the Kafka configurations, adding memory, and changing batch intervals, but the only change that helps is adding more cores.
[profiler screenshot]
Spark job details:
max.cores 30
driver memory 6G
executor memory 16G
batch.interval 3 minutes
ingress rate 180,000 messages per second
Producer properties (I've tried different variations):
import java.util.Properties
import org.apache.kafka.clients.producer.ProducerConfig

def buildProducerKafkaProperties: Properties = {
  val producerConfig = new Properties
  producerConfig.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, destKafkaBrokers)
  producerConfig.put(ProducerConfig.ACKS_CONFIG, "all")
  producerConfig.put(ProducerConfig.BATCH_SIZE_CONFIG, "200000")
  producerConfig.put(ProducerConfig.LINGER_MS_CONFIG, "2000")
  producerConfig.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip")
  producerConfig.put(ProducerConfig.RETRIES_CONFIG, "0")
  producerConfig.put(ProducerConfig.BUFFER_MEMORY_CONFIG, "13421728")
  producerConfig.put(ProducerConfig.SEND_BUFFER_CONFIG, "13421728")
  producerConfig
}
Sending code:
stream
  .foreachRDD(rdd => {
    val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
    rdd
      .map(consumerRecord => doSomething(consumerRecord))
      .foreachPartition(partitionIter => {
        val producer = kafkaSinkBroadcast.value
        partitionIter.foreach(row => {
          producer.send(kafkaTopic, row)
          producedRecordsAcc.add(1)
        })
      })
    // commit the offsets once the whole batch has been handed to the producer
    stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
  })
Versions
Spark Standalone cluster 2.3.1
Destination Kafka cluster 1.1.1
Kafka topic has 120 partitions
Can anyone suggest how to increase sending throughput?
Update Jul 2019
Size: 150k messages per second; each message has about 100 columns.
Main settings:
spark.cores.max = 30 # the cores are balanced across all the workers
spark.streaming.backpressure.enabled = true
ob.ingest.batch.duration = 3 minutes
I've tried rdd.repartition(30), but it made the execution slower by ~10%.
Thanks
Try using repartition as below:
val numPartitions = <number of executors> * <number of executor cores>
stream
  .foreachRDD(rdd => {
    // the HasOffsetRanges cast only succeeds on the original Kafka RDD,
    // so capture the offsets before any shuffle
    val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
    rdd
      .repartition(numPartitions)
      .map(consumerRecord => doSomething(consumerRecord))
      .foreachPartition(partitionIter => {
        val producer = kafkaSinkBroadcast.value
        partitionIter.foreach(row => {
          producer.send(kafkaTopic, row)
          producedRecordsAcc.add(1)
        })
      })
    stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
  })
Matching the number of RDD partitions to the total number of executor cores should give you close to optimal producing parallelism.
Hope this will help.
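Beyond repartitioning, since 70% of the time goes into the send path, the producer settings themselves are another lever. Below is a throughput-oriented variant of the question's buildProducerKafkaProperties (a sketch only, not verified against this workload; the method name is illustrative, acks=1 trades durability for speed, and lz4 compresses with far less CPU than gzip):

import java.util.Properties
import org.apache.kafka.clients.producer.ProducerConfig

def buildThroughputProducerProperties: Properties = {
  val p = new Properties
  p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, destKafkaBrokers) // from the question
  p.put(ProducerConfig.ACKS_CONFIG, "1")               // leader-only acks instead of "all"
  p.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4") // much cheaper on CPU than gzip
  p.put(ProducerConfig.LINGER_MS_CONFIG, "50")         // shorter stalls than 2000 ms
  p.put(ProducerConfig.BATCH_SIZE_CONFIG, "200000")    // unchanged from the question
  p
}

Given that the question already sets retries = 0, both configurations lean toward throughput over delivery guarantees, so acks=1 is not a large additional risk.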
Related
I currently have a Kafka Streams service:
{
  val _ = metrics
  val timeWindow = Duration.of(config.timeWindow.toMillis, ChronoUnit.MILLIS)
  val gracePeriod = Duration.of(config.gracePeriod.toMillis, ChronoUnit.MILLIS)
  val store = Materialized
    .as[AggregateKey, AggMetricDocument, ByteArrayWindowStore](
      config.storeNames.reducerStateStore
    )
    .withRetention(gracePeriod.plus(timeWindow))
    .withCachingEnabled()
  builder
    .stream[AggregateKey, AggMetricDocument](config.topicNames.aggMetricDocumentsIntermediate)
    .groupByKey
    .windowedBy(TimeWindows.of(timeWindow).grace(gracePeriod))
    .reduce { (metricDoc1, metricDoc2) =>
      metricDoc1.copy(
        metrics = metricDoc1.metrics
          .merge(metricDoc2.metrics, config.metricDocumentsReducerSamplesReservoirSize),
        docsCount = metricDoc1.docsCount + metricDoc2.docsCount
      )
    }(store)
    .toStream
    .to(config.topicNames.aggMetricDocuments)(
      Produced.`with`(AggregateKey.windowedSerde, AggMetricDocument.flattenSerde)
    )
}
with:
timeWindow = 1m
gracePeriod = 39h
The stream works fine at normal cardinality, but when it starts processing high-cardinality data (more than 100 million distinct keys) the processing rate declines after some time.
Looking at the RocksDB metrics, the average fetch latency rises from 30µs to 600µs, and there is some decrease in the hit rate of the filter and index blocks, as seen in a test sending ~15K messages/sec with unique keys.
The disk throughput and IO seem to be well under the disk limits.
The CPU usage and load average are increasing (the limit is 5 cores).
I made some RocksDB config modifications:
import org.rocksdb.{BlockBasedTableConfig, LRUCache, WriteBufferManager}

private val cache = new LRUCache(2147483648L, -1, false, 0.9) // 2GB, 90% reserved for high-priority blocks
private val writeBufferManager = new WriteBufferManager(2147483648L, cache) // memtables draw from the same cache

val tableConfig = options.tableFormatConfig.asInstanceOf[BlockBasedTableConfig]
tableConfig.setBlockCache(BoundedMemoryRocksDBConfig.cache)
tableConfig.setCacheIndexAndFilterBlocks(true) // default false
tableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true)
All other settings have the Kafka Streams default values.
Increasing the LRUCache seems to help for a while.
I am not sure what the core problem is. Does someone have an idea of what is causing it, and which configuration I should tune to get better performance on high-cardinality data?
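For reference, here is a minimal sketch of how overrides like the fragment above are usually wired in, assuming they live in a custom RocksDBConfigSetter (the object name mirrors the BoundedMemoryRocksDBConfig already referenced in the fragment; props stands for the streams properties, which the question does not show):

import java.util.{Map => JMap}
import org.apache.kafka.streams.StreamsConfig
import org.apache.kafka.streams.state.RocksDBConfigSetter
import org.rocksdb.{BlockBasedTableConfig, LRUCache, Options, WriteBufferManager}

object BoundedMemoryRocksDBConfig {
  // shared across all store instances so the total RocksDB memory stays bounded
  val cache = new LRUCache(2147483648L, -1, false, 0.9)
  val writeBufferManager = new WriteBufferManager(2147483648L, cache)
}

class BoundedMemoryRocksDBConfig extends RocksDBConfigSetter {
  override def setConfig(storeName: String, options: Options,
                         configs: JMap[String, AnyRef]): Unit = {
    val tableConfig = options.tableFormatConfig.asInstanceOf[BlockBasedTableConfig]
    tableConfig.setBlockCache(BoundedMemoryRocksDBConfig.cache)
    tableConfig.setCacheIndexAndFilterBlocks(true)
    tableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true)
    options.setWriteBufferManager(BoundedMemoryRocksDBConfig.writeBufferManager)
    options.setTableFormatConfig(tableConfig)
  }
}

// registered once in the streams properties
props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG,
  classOf[BoundedMemoryRocksDBConfig].getName)

Sharing one LRUCache and WriteBufferManager across all stores is what keeps total memory bounded regardless of how many state store instances the stream threads create.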
I defined a receiver to read data from Redis.
Part of the receiver's simplified code:
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class MyReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) { // String element type assumed
  override def onStart(): Unit = {
    while (!isStopped) {
      val res = readMethod()
      if (res != null) store(res.toIterator)
      // using res.foreach(r => store(r)) the performance is almost the same
    }
  }
  override def onStop(): Unit = {}
}
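As an aside, the Spark docs require onStart() to be non-blocking, with the actual reading done on a dedicated thread. A sketch of that pattern (readMethod is stubbed out since the question elides it, and the class name and String element type are illustrative):

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class MyThreadedReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {
  // stand-in for the Redis read from the question; the real implementation is elided
  private def readMethod(): Seq[String] = ???

  override def onStart(): Unit = {
    // run the blocking read loop on its own thread so onStart returns immediately
    new Thread("redis-reader") {
      override def run(): Unit = {
        while (!isStopped) {
          val res = readMethod()
          if (res != null) store(res.toIterator)
        }
      }
    }.start()
  }

  override def onStop(): Unit = {} // nothing to clean up in this sketch
}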
My streaming workflow:
val ssc = new StreamingContext(spark.sparkContext, new Duration(50))
val myReceiver = new MyReceiver()
val s = ssc.receiverStream(myReceiver)
s.foreachRDD { r =>
  r.persist()
  if (!r.isEmpty) {
    // some short operations, about 1s in total
    // note this line ######1
  }
}
I have a producer which produces much faster than the consumer, so there are plenty of records in Redis now; I tested with 10,000. When I debugged, all records could be read by readMethod() above quickly after they were put in Redis. However, in each microbatch I can only get 30 records. (If store were fast enough, it should get all 10,000.)
With this suspicion, I added a ten-second sleep, Thread.sleep(10000), at ######1 above. Each microbatch still gets about 30 records, and each microbatch's processing time increases by 10 seconds. And if I increase the Duration to 200 ms, val ssc = new StreamingContext(spark.sparkContext, new Duration(200)), it gets about 120 records.
Does all of this show that Spark Streaming only generates an RDD once per Duration, and that after it gets the RDD, while the main workflow is running, the store method is temporarily stopped? If so, this is a great waste; I want it to keep generating RDDs (store) while the main workflow is running.
Any ideas?
I cannot leave a comment simply because I don't have enough reputation. Is it possible that the property spark.streaming.receiver.maxRate is set somewhere in your code?
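Worth checking the arithmetic too: 30 records per 50 ms batch and ~120 records per 200 ms batch both work out to roughly 600 records per second, which is exactly what a receiver rate cap would look like. The cap is a plain Spark conf entry, e.g. (a sketch; the app name and value are illustrative):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("redis-receiver-job")
  // max records per second per receiver; 600 would match the observed behaviour
  .set("spark.streaming.receiver.maxRate", "600")

Note that spark.streaming.backpressure.enabled can throttle a receiver in the same way once it kicks in.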
A month ago I built a Kafka Streams application with 20 stream threads. This app calculates the money different people spend in a fixed time interval. Recently I found that the spending figures queried from the local state store are lower than the real values. I have read the official documentation, and any other documents I could find, but have not found a solution.
I use Kafka 0.11.0.3: the broker version is 0.11.0.3, and the Kafka Streams API is also 0.11.0.3. There is only one application, with 20 stream threads.
Some important info:
Kafka streams config:
replication.factor 3
num.stream.threads 20
commit.interval.ms 1000
partition.assignment.strategy StickyAssignor.class.getName()
fetch.max.wait.ms 500
max.poll.records 5000
max.poll.interval.ms 300000
heartbeat.interval.ms 3000
session.timeout.ms 30000
auto.offset.reset latest
Kafka message structure:
key = the person's name
value = the money they spent
time = the time at which the message was created
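Concretely, a record with that structure could be produced like this (a sketch in Scala; the topic name, serializers, and values are illustrative, not from the question):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.DoubleSerializer")

val producer = new KafkaProducer[String, java.lang.Double](props)
// key = the person's name, value = money spent, timestamp = creation time
producer.send(new ProducerRecord[String, java.lang.Double](
  "people-spend", null, System.currentTimeMillis(), "alice", 9.99))
producer.close()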
Kafka streams build code:
KStreamBuilder kStreamBuilder = new KStreamBuilder();
KStream<String, Double> peopleSpendStream = kStreamBuilder.stream(topic);
peopleSpendStream.groupByKey() // group by the person's name
    .aggregate(() -> new HashMap<String, Double>(8192),
        (key, value, aggregate) -> {
            aggregate.merge(key, value, Double::sum);
            return aggregate;
        },
        TimeWindows.of(ONE_MINUTE).until(ONE_HOUR * 10), // 1-min windows, kept for 10 hours
        new HashMapSerde<>(), // serializes and deserializes via Jackson
        PEOPLE_SPEND_STORE_NAME);
Query code:
long time = System.currentTimeMillis();
for (String name : names) { // query by the person's name
    try (WindowStoreIterator<HashMap<String, Double>> iterator = store.fetch(name, time - TEN_MINUTE_MILLES, time)) {
        iterator.forEachRemaining(kv -> log.info("name = {}, time = {}, cost = {}", name, kv.key, kv.value));
    }
}
Did I get anything wrong? I would really appreciate your help.
I am trying to aggregate a large amount of data using time windows of different sizes with Kafka Streams.
I increased the cache size to 2 GB, but when I set the window size to 1 hour, the CPU load reaches 100% and the application starts to slow down.
My code looks like this:
val tradeStream = builder.stream<String, Trade>(configuration.topicNamePattern, Consumed.with(Serdes.String(), JsonSerde(Trade::class.java)))
tradeStream
.groupBy(
{ _, trade -> trade.pair },
Serialized.with(JsonSerde(TokensPair::class.java), JsonSerde(Trade::class.java))
)
.windowedBy(TimeWindows.of(windowDuration).advanceBy(windowHop).until(windowDuration))
.aggregate(
{ Ticker(windowDuration) },
{ _, newValue, aggregate -> aggregate.add(newValue) },
Materialized.`as`<TokensPair, Ticker, WindowStore<Bytes, ByteArray>>(storeByPairs)
.withKeySerde(JsonSerde(TokensPair::class.java))
.withValueSerde(JsonSerde(Ticker::class.java))
)
.toStream()
.filter { tokensPair, _ -> filterFinishedWindow(tokensPair.window(), windowHop) }
.map { tokensPair, ticker -> KeyValue(
TickerKey(ticker.tokensPair!!, windowDuration, Timestamp(tokensPair.window().start())),
ticker.calcPrice()
)}
.to(topicName, Produced.with(JsonSerde(TickerKey::class.java), JsonSerde(Ticker::class.java)))
In addition, before the aggregated data is sent to the Kafka topic, it is filtered by the end time of the window, so that only finished windows are sent to the topic.
Perhaps there are some better approaches for implementing this kind of aggregation?
Without knowing a bit more about the system, it's hard to diagnose.
How many partitions are present in your cluster ?
How many stream applications are you running ?
Are the stream applications running on the same machine ?
Are you using compression for the payload ?
Does it work for smaller intervals?
Hope that helps.
I am writing a producer in Scala and I want to do batching. The way batching should work is: hold the messages in a queue until it is full, and then post all of them together to the topic. But somehow it's not working: the moment I start sending messages, they are posted one by one. Does anyone know how to use batching in the Kafka producer?
import java.util.Properties
import java.util.concurrent.Future

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord, RecordMetadata}

val kafkaStringSerializer = "org.apache.kafka.common.serialization.StringSerializer"
val batchSize: java.lang.Integer = 163840

val props = new Properties()
props.put("key.serializer", kafkaStringSerializer)
props.put("value.serializer", kafkaStringSerializer)
props.put("batch.size", batchSize)
props.put("bootstrap.servers", "localhost:9092")

val producer = new KafkaProducer[String, String](props)

val TOPIC = "topic"
val inlineMessage = "adsdasdddddssssssssssss"

for (i <- 1 to 10) {
  val record: ProducerRecord[String, String] = new ProducerRecord(TOPIC, inlineMessage)
  val futureResponse: Future[RecordMetadata] = producer.send(record)
  futureResponse.isDone
  println("Future Response ==========>" + futureResponse.get().serializedValueSize())
}
You have to set linger.ms in your props.
By default it is zero, meaning that a message is sent immediately if possible.
You can increase it (for example to 100) so that batching occurs; this means higher latency, but higher throughput.
batch.size is a maximum: if you reach it before linger.ms has passed, the data will be sent without waiting any longer.
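For example (the value is illustrative), on top of the props from the question:

props.put("linger.ms", "100") // wait up to 100 ms so more records can join the batch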
To view the batches actually sent, you will need to configure your logging (batching is done on a background thread, and you cannot observe the batches through the producer API itself; you can only send a record and receive its response, while communication with the broker in batches happens internally).
First, if not already done, bind a log4j properties file (-Dlog4j.configuration=file:path/to/log4j.properties):
log4j.rootLogger=WARN, stderr
log4j.logger.org.apache.kafka.clients.producer.internals.Sender=TRACE, stderr
log4j.appender.stderr=org.apache.log4j.ConsoleAppender
log4j.appender.stderr.layout=org.apache.log4j.PatternLayout
log4j.appender.stderr.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.appender.stderr.Target=System.err
For example, I will receive:
TRACE Sent produce request to 2: (type=ProduceRequest, magic=1, acks=1, timeout=30000, partitionRecords=({test-1=[(record=LegacyRecordBatch(offset=0, Record(magic=1, attributes=0, compression=NONE, crc=2237306008, CreateTime=1502444105996, key=0 bytes, value=2 bytes))), (record=LegacyRecordBatch(offset=1, Record(magic=1, attributes=0, compression=NONE, crc=3259548815, CreateTime=1502444106029, key=0 bytes, value=2 bytes)))]}), transactionalId='' (org.apache.kafka.clients.producer.internals.Sender)
This is a batch of 2 records; a batch contains records sent to the same broker.
Then play with batch.size and linger.ms to see the difference. Note that a record carries some overhead, so a batch.size of 1000 will not hold 10 messages of size 100.
Note that I did not find documentation stating all the loggers and what they do (like log4j.logger.org.apache.kafka.clients.producer.internals.Sender). You can enable DEBUG/TRACE on the rootLogger and find the data you want, or explore the code.
You are producing the data to the Kafka server synchronously: the moment you call producer.send followed by futureResponse.get, it returns only after the data has been stored on the Kafka server.
Store the responses in a separate list, and call futureResponse.get outside the for loop.
With the default configuration, Kafka supports batching; see linger.ms and batch.size.
List<Future<RecordMetadata>> responses = new ArrayList<>();
for (int i = 1; i <= 10; i++) {
    ProducerRecord<String, String> record = new ProducerRecord<>(TOPIC, inlineMessage);
    Future<RecordMetadata> response = producer.send(record);
    responses.add(response);
}
for (Future<RecordMetadata> response : responses) {
    response.get(); // verify whether the message was sent to the broker
}