Kafka latency optimization - apache-kafka

My Kafka version is 0.10.2.1.
My service has a very low QPS (1 msg/sec), but our round-trip time (RTT) requirement is strict: 99.9% of messages must complete in under 30 ms.
I've run into a problem: when Kafka has been running for a long time (15 days or so), performance starts to degrade.
On 2017-10-21, the latency distribution was:
Latency               Messages  Percentage
cost <= 2ms                  0      0.000%
2ms < cost <= 5ms        12391     32.659%
5ms < cost <= 8ms        25327     66.754%
8ms < cost <= 10ms         186      0.490%
10ms < cost <= 15ms         24      0.063%
15ms < cost <= 20ms          2      0.005%
20ms < cost <= 30ms          0      0.000%
30ms < cost <= 50ms          4      0.011%
50ms < cost <= 100ms         1      0.003%
100ms < cost <= 200ms        0      0.000%
200ms < cost <= 300ms        6      0.016%
300ms < cost <= 500ms        0      0.000%
500ms < cost <= 1s           0      0.000%
cost > 1s                    0      0.000%
But recently, it became:
Latency               Messages  Percentage
cost <= 2ms                  0      0.000%
2ms < cost <= 5ms         7592     29.202%
5ms < cost <= 8ms        17470     67.197%
8ms < cost <= 10ms         698      2.685%
10ms < cost <= 15ms        143      0.550%
15ms < cost <= 20ms         23      0.088%
20ms < cost <= 30ms         19      0.073%
30ms < cost <= 50ms         11      0.042%
50ms < cost <= 100ms         5      0.019%
100ms < cost <= 200ms       11      0.042%
200ms < cost <= 300ms       26      0.100%
300ms < cost <= 500ms        0      0.000%
500ms < cost <= 1s           0      0.000%
cost > 1s                    0      0.000%
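Checking these buckets against the 99.9% < 30ms requirement is plain arithmetic. The Scala sketch below adds up every bucket whose upper bound is at most 30 ms and divides by the total. LatencySlo is a hypothetical name, and the counts are transcribed from the two distributions above. The result is about 99.971% for 2017-10-21, which meets the requirement, and about 99.796% for the recent data, which misses it. The bulk of the distribution (5 to 8 ms) barely moved, so the degradation is in the tail.

```scala
// Checks bucketed latency counts against a "fraction within N ms" requirement.
object LatencySlo {
  final case class Bucket(upperMs: Long, count: Long)

  // Upper bound of each bucket in ms; Long.MaxValue stands for the open "cost > 1s" bucket.
  val bounds: Vector[Long] =
    Vector(2L, 5L, 8L, 10L, 15L, 20L, 30L, 50L, 100L, 200L, 300L, 500L, 1000L, Long.MaxValue)

  def buckets(counts: Seq[Long]): Vector[Bucket] =
    bounds.zip(counts).map { case (ub, c) => Bucket(ub, c) }

  /** Fraction of messages in buckets whose upper bound is at most `limitMs`. */
  def fractionWithin(bs: Seq[Bucket], limitMs: Long): Double = {
    val total = bs.map(_.count).sum
    val within = bs.filter(_.upperMs <= limitMs).map(_.count).sum
    within.toDouble / total
  }

  // Counts copied from the two tables in the question.
  val oct21: Vector[Bucket] =
    buckets(Seq(0L, 12391L, 25327L, 186L, 24L, 2L, 0L, 4L, 1L, 0L, 6L, 0L, 0L, 0L))
  val recent: Vector[Bucket] =
    buckets(Seq(0L, 7592L, 17470L, 698L, 143L, 23L, 19L, 11L, 5L, 11L, 26L, 0L, 0L, 0L))

  def main(args: Array[String]): Unit = {
    val target = 0.999
    for ((label, bs) <- Seq("2017-10-21" -> oct21, "recent" -> recent)) {
      val share = fractionWithin(bs, 30L)
      println(f"$label%-10s within 30ms: ${share * 100}%.3f%%, meets 99.9%%: ${share >= target}")
    }
  }
}
```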
When I check the logs, I can't find any way to tell why a specific message has a high RTT. If there is any way to optimize this (OS tuning, broker config), please let me know.

Without a breakdown of the request handling time, it is hard to tell which part may be the culprit. More specifically, you'll need to hook up JMX and check the following request-level metrics:
TotalTimeMs
RequestQueueTimeMs
LocalTimeMs
RemoteTimeMs
ResponseQueueTimeMs
ResponseSendTimeMs
https://kafka.apache.org/documentation/#monitoring
Check their average and 99th-percentile values over time to see which one is contributing to the performance degradation.
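Below is a minimal sketch of polling that breakdown over JMX from Scala. It assumes the broker was started with JMX_PORT set (9999 below is an assumed port). It also assumes each metric is exposed as an MBean named kafka.network:type=RequestMetrics,name=<metric>,request=Produce, with Mean and 99thPercentile attributes. RequestTimeBreakdown is a hypothetical name. To look at consumer fetches instead of produce requests, use request=FetchConsumer.

```scala
import javax.management.ObjectName
import javax.management.remote.{JMXConnectorFactory, JMXServiceURL}

// Prints the per-stage request time breakdown for Produce requests.
object RequestTimeBreakdown {
  val components: Seq[String] = Seq(
    "RequestQueueTimeMs", "LocalTimeMs", "RemoteTimeMs",
    "ResponseQueueTimeMs", "ResponseSendTimeMs", "TotalTimeMs")

  def objectName(metric: String, request: String): ObjectName =
    new ObjectName(s"kafka.network:type=RequestMetrics,name=$metric,request=$request")

  def main(args: Array[String]): Unit = {
    // Assumption: broker JMX is reachable at host:port (default below is hypothetical).
    val hostPort = args.headOption.getOrElse("localhost:9999")
    val url = new JMXServiceURL(s"service:jmx:rmi:///jndi/rmi://$hostPort/jmxrmi")
    val connector = JMXConnectorFactory.connect(url)
    try {
      val mbs = connector.getMBeanServerConnection()
      for (metric <- components) {
        val name = objectName(metric, "Produce")
        // Assumption: histogram MBeans expose these attribute names.
        val mean = mbs.getAttribute(name, "Mean")
        val p99 = mbs.getAttribute(name, "99thPercentile")
        println(f"$metric%-20s mean=$mean p99=$p99")
      }
    } finally connector.close()
  }
}
```

Log these values periodically over the days leading up to the slowdown. The trend should show which stage grows, for example LocalTimeMs (time the request is processed at the leader) or one of the queue times.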

Consider upgrading to 0.11 (or 1.0.0), which includes performance improvements.
Optimisation article: https://www.confluent.io/blog/optimizing-apache-kafka-deployment/

Related

Why does my ESP32 keep rebooting?

This is my problem:
Brownout detector was triggered
ets Jun 8 2016 00:22:57
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:5008
ho 0 tail 12 room 4
load:0x40078000,len:10600
ho 0 tail 12 room 4
load:0x40080400,len:5684
entry 0x400806bc
I want to build a web server on my ESP32.

How to preserve the order of items emitted by two observables after they are merged?

I have run into a behavior of Scala Observables that has surprised me. Consider my example below:
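import rx.lang.scala.Observable
import scala.concurrent.duration._

object ObservablesDemo extends App {
  // interval(p) emits 0, 1, 2, ... with one item per period p
  val oFast = Observable.interval(3.seconds).map(n => s"[FAST] ${n*3}")
  val oSlow = Observable.interval(7.seconds).map(n => s"[SLOW] ${n*7}")
  // Interleave both streams in arrival order and stop after 8 items
  val oBoth = (oFast merge oSlow).take(8)
  oBoth.subscribe(println(_))
  // Block the main thread until 8 items have been emitted so the App doesn't exit early
  oBoth.toBlocking.toIterable.last
}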
The code demonstrates emitting elements from two observables. One of them emits its elements in a "slow" way (every 7 seconds), the other in a "fast" way (every 3 seconds). For the sake of the question assume we want to define those observables with the use of the map function and map the numbers from the interval appropriately as seen above (as opposed to another possible approach which would be emitting items at the same rate from both observables and then filtering out as needed).
The output of the code seems counterintuitive to me:
[FAST] 0
[FAST] 3
[SLOW] 0
[FAST] 6
[FAST] 9 <-- HERE
[SLOW] 7 <-- HERE
[FAST] 12
[FAST] 15
The problematic part is when the [FAST] observable emits 9 before the [SLOW] observable emits 7. I would expect 7 to be emitted before 9 as whatever is emitted on the seventh second should come before what is emitted on the ninth second.
How should I modify the code to achieve the intended behavior? I have looked into the RxScala documentation and have started my search with topics such as the different interval functions and the Scheduler classes but I'm not sure if it's the right place to search for the answer.
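That looks like the way it should work. Observable.interval emits its first item (0) only after one full period. So [FAST] n*3 arrives at second 3(n + 1) and [SLOW] n*7 arrives at second 7(n + 1). That means [FAST] 9 is emitted at second 12, before [SLOW] 7 at second 14. Here is a listing of the seconds and the events. You can verify this with TestObserver and TestScheduler, if those are available in RxScala. RxScala reached end of life in 2019, so keep that in mind too.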
Secs Event
-----------------
1
2
3 [Fast] 0
4
5
6 [Fast] 3
7 [Slow] 0
8
9 [Fast] 6
10
11
12 [Fast] 9
13
14 [Slow] 7
15 [Fast] 12
16
17
18 [Fast] 15
19
20
21 [Fast] 18
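The timeline above can be reproduced without Rx at all. The sketch below assumes that item n of an interval is emitted after (n + 1) periods. Sorting all emissions by time then gives exactly the order printed in the question. IntervalTimeline and its helpers are hypothetical names, not RxScala API.

```scala
// Simulates interval(period).map(label) for two streams and merges them by emission time.
object IntervalTimeline {
  final case class Emission(atSec: Long, label: String)

  // Assumption (matches RxJava/RxScala interval): item n is emitted at (n + 1) * period.
  def interval(periodSec: Long, count: Int)(label: Long => String): Seq[Emission] =
    (0L until count.toLong).map(n => Emission((n + 1) * periodSec, label(n)))

  // Merging two timed streams means ordering all emissions by their time.
  def merged(take: Int): Seq[Emission] = {
    val fast = interval(3, 10)(n => s"[FAST] ${n * 3}")
    val slow = interval(7, 10)(n => s"[SLOW] ${n * 7}")
    (fast ++ slow).sortBy(_.atSec).take(take)
  }

  def main(args: Array[String]): Unit =
    merged(8).foreach(e => println(f"${e.atSec}%3d s  ${e.label}"))
}
```

It prints [FAST] 9 at 12 s and then [SLOW] 7 at 14 s. The labels n*3 and n*7 are the tick index scaled by the period, not the moment of emission. If the labels are meant to be elapsed seconds, map with n => (n + 1) * 3 and n => (n + 1) * 7 instead. Each label then matches its position in the merged stream.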

SocketCAN - device state "STOPPED"

I use a Raspberry Pi with the PiCAN board, which uses an MCP2515 CAN controller.
I use SocketCAN to read and write CAN messages via an application I wrote.
After running a few weeks without a problem the controller is now in the state "STOPPED".
What is the difference between the state STOPPED and BUS-OFF?
Does a device enter the BUS-OFF state if too many errors occur on the CAN bus, and the STOPPED state if you set the device down (ip link set canX down)?
Are there any other ways the device may enter the STOPPED state? I couldn't find any way my application might have set the device down.
ip -details -statistics link show can0
3: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 10
link/can promiscuity 0
can state STOPPED restart-ms 100
bitrate 250000 sample-point 0.875
tq 250 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
mcp251x: tseg1 3..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
clock 8000000
re-started bus-errors arbit-lost error-warn error-pass bus-off
0 0 0 146 139 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
RX: bytes packets errors dropped overrun mcast
787700920 151606570 24 0 24 0
TX: bytes packets errors dropped carrier collsns
6002905 5895301 0 0 0 0
You need to familiarize yourself with the ERROR ACTIVE, ERROR PASSIVE, and BUS OFF error states of CAN bus devices, and with when it is necessary to restart CAN communication manually.
All relevant info can be found at one of these links:
http://www.can-wiki.info/doku.php?id=can_faq:can_faq_erors
http://www.port.de/cgi-bin/CAN/CanFaqErrors
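Those error states are derived from the controller's transmit (TEC) and receive (REC) error counters. The sketch below uses the ISO 11898-1 thresholds: error passive above 127, and bus-off once TEC exceeds 255. It also uses the warning level of 96, which SocketCAN reports as ERROR-WARNING. CanErrorState is a hypothetical name. STOPPED is not in this list because it does not come from the counters. It is reported when the device itself is stopped, which is consistent with the "state DOWN" in the ip output above.

```scala
// Maps CAN transmit/receive error counters to fault-confinement states.
object CanErrorState {
  sealed trait State
  case object ErrorActive extends State
  case object ErrorWarning extends State
  case object ErrorPassive extends State
  case object BusOff extends State

  def fromCounters(tec: Int, rec: Int): State =
    if (tec > 255) BusOff                             // only TEC can drive a node bus-off
    else if (tec > 127 || rec > 127) ErrorPassive
    else if (tec >= 96 || rec >= 96) ErrorWarning     // warning level, still error active on the bus
    else ErrorActive

  def main(args: Array[String]): Unit = {
    for ((tec, rec) <- Seq((0, 0), (100, 0), (0, 130), (256, 0))) {
      println(s"TEC=$tec REC=$rec -> ${fromCounters(tec, rec)}")
    }
  }
}
```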

How to run !dumpheap -dead on the generation 2 across all the heaps?

I have 8 managed GC heaps, as reported by !eeheap -gc:
0:000> !eeheap -gc
Number of GC Heaps: 8
------------------------------
Heap 0 (00000000009a2c50)
generation 0 starts at 0x00000000d92e3aa0
generation 1 starts at 0x00000000d8cdb128
generation 2 starts at 0x000000007fff1000
ephemeral segment allocation context: none
segment begin allocated size
000000007fff0000 000000007fff1000 00000000d93edab8 0x593fcab8(1497352888)
Large object heap starts at 0x000000047fff1000
segment begin allocated size
000000047fff0000 000000047fff1000 0000000487fabf00 0x7fbaf00(133934848)
00000004e6400000 00000004e6401000 00000004ee3af2f8 0x7fae2f8(133882616)
000000050e400000 000000050e401000 00000005152f8578 0x6ef7578(116356472)
0000000572400000 0000000572401000 00000005756e8ad8 0x32e7ad8(53377752)
Heap Size: Size: 0x73544d00 (1934904576) bytes.
------------------------------
Heap 1 (00000000009ad690)
generation 0 starts at 0x00000001609a9cc8
generation 1 starts at 0x000000016072f780
generation 2 starts at 0x00000000ffff1000
ephemeral segment allocation context: none
segment begin allocated size
00000000ffff0000 00000000ffff1000 0000000161bf8f50 0x61c07f50(1640005456)
Large object heap starts at 0x0000000487ff1000
segment begin allocated size
0000000487ff0000 0000000487ff1000 000000048ffea910 0x7ff9910(134191376)
0000000044b50000 0000000044b51000 000000004cb44978 0x7ff3978(134166904)
000000051e400000 000000051e401000 000000052575aae0 0x7359ae0(120953568)
000000057a400000 000000057a401000 000000057c2e8610 0x1ee7610(32405008)
Heap Size: Size: 0x7ae362c8 (2061722312) bytes.
...
I would like to run the !dumpheap -dead command on gen 2 and the LOH only. However, I am a bit confused about the following:
The output clearly says where gen 2 starts, but it is unclear to me where it ends. For example, for Heap 0 I figure I pass -start 0x000000007fff1000, but what goes into -end? Is it the start of gen 1?
I have 8 heaps, so I guess I have to run !dumpheap -dead 8 times for gen 2. The LOH seems to span multiple segments, so it needs even more runs. Is there a way to automate dumping all these dead objects across all the LOH segments and gen 2s?
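One way to automate this is to generate the commands from the addresses instead of typing them by hand. The Scala sketch below is hypothetical and rests on two assumptions worth verifying:
(1) On a heap with a single small-object segment, gen 2 runs from "generation 2 starts at" up to "generation 1 starts at".
(2) This SOS version takes the start and end addresses as positional arguments of !dumpheap; check !help dumpheap.
Each LOH segment is taken as its begin-to-allocated range. Lines starting with * are comments in WinDbg, so the output could be saved and replayed as a debugger script.

```scala
// Hypothetical helper: turns address ranges copied from `!eeheap -gc`
// into one `!dumpheap -dead <start> <end>` command per range.
object DumpHeapScript {
  final case class Range(label: String, start: Long, end: Long)

  // ASSUMPTION: with one small-object segment per heap, gen 2 spans [gen 2 start, gen 1 start).
  def gen2(heap: Int, gen2Start: Long, gen1Start: Long): Range =
    Range(s"heap $heap gen2", gen2Start, gen1Start)

  // Each LOH segment spans [begin, allocated).
  def loh(heap: Int, segments: Seq[(Long, Long)]): Seq[Range] =
    segments.map { case (begin, allocated) => Range(s"heap $heap LOH", begin, allocated) }

  // ASSUMPTION: start and end are positional arguments in this SOS version.
  def command(r: Range): String =
    f"!dumpheap -dead ${r.start}%016x ${r.end}%016x"

  def main(args: Array[String]): Unit = {
    // Heap 0 values from the !eeheap -gc output above; repeat for the other heaps.
    val ranges = gen2(0, 0x7fff1000L, 0xd8cdb128L) +: loh(0, Seq(
      (0x47fff1000L, 0x487fabf00L),
      (0x4e6401000L, 0x4ee3af2f8L),
      (0x50e401000L, 0x5152f8578L),
      (0x572401000L, 0x5756e8ad8L)))
    ranges.foreach(r => println(s"* ${r.label}\n${command(r)}"))
  }
}
```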

How to interpret the output of tracefile-analyser.php run on an Xdebug trace file generated in a Zend environment

My main goal is to determine the memory requirements of my web application, which runs under Zend. I have successfully set up Xdebug, generated a trace file with it, and run tracefile-analyser.php on it. Now I need some help interpreting the results. Is the memory needed by MySQL counted in that value? Is it really the total memory consumption for one request?
$ ./tracefile-analyser.php /var/www/simira/logs/profiles/trace.2043925204.xt memory-inclusive
parsing...
(49.88%)
Done.
Showing the 25 most costly calls sorted by 'memory-inclusive'.
Inclusive Own
function #calls time memory time memory
--------------------------------------------------------------------------------------------------------------------
{main} 1 1.8332 19701712 0.0016 121392
Zend_Loader_Autoloader::autoload 40 0.1397 12780440 0.0014 -11200
Zend_Loader::loadFile 23 0.0959 10480432 0.0311 3501384
Zend_Loader::loadClass 23 0.0980 10471760 0.0011 -12128
call_user_func 49 0.1063 10471704 0.0004 664
Zend_Loader_Autoloader->_autoload 23 0.0992 10470800 0.0005 0
Zend_Controller_Front->dispatch 1 1.3967 10284488 0.0022 390336
Zend_Application->run 1 1.3970 10284200 0.0000 0
Zend_Application_Bootstrap_Bootstrap->run 1 1.3969 10284200 0.0001 -392
Zend_Controller_Dispatcher_Standard->dispatch 1 1.1260 9144376 0.0001 896
include_once 43 0.0786 7331968 0.0294 3679992
...
MySQL's memory usage is not included, because it doesn't use PHP's memory management routines. I assume you mean the client side here, not the MySQL server side. Because of this, PHP (and therefore Xdebug) can't show you that information.
As for your other question: yes, a trace gives the exact information for one request only. But be aware that by the time the script ends, some memory might already have been freed, so it doesn't show up in the memory for {main}. Simply use xdebug_peak_memory_usage() to find out the maximum amount of memory used.