Kafka 0.10 Replication Performance Decrease - apache-kafka

I am trying to benchmark a Kafka cluster. I am a newbie. I built a 3-node cluster; each node hosts one partition. I did not change the default broker settings, and I just copied the producer and consumer code directly from the official website.
When I create a topic with replication factor 1 and 3 partitions, I get about 170 MB/s of throughput. When I create a topic with replication factor 3 and 3 partitions, I hardly see 30 MB/s.
Then I applied the production config from this link: https://kafka.apache.org/documentation#prodconfig. The results got worse.
Can you share your experience with me?
disk  replication  insert count  message size (bytes)  elapsed time (s)  req/sec   concurrency  throughput (MB/s)
hdd   1            10,000,000    250                   25                400,000   1            95.36743164
hdd   1            10,000,000    250                   28                357,000   2            170.2308655
hdd   1            10,000,000    250                   55                175,000   4            166.8930054
hdd   1            1,000,000     250                   22                45,400    8            86.59362793
hdd   1            10,000,000    250                   22                85,000    8            162.1246338
hdd   3            1,000,000     250                   10                100,000   1            23.84185791
hdd   3            1,000,000     250                   19                55,000    2            26.2260437
hdd   3            1,000,000     250                   30                32,000    4            30.51757813
hdd   3            1,000,000     250                   45                20,000    8            38.14697266
hdd   3            10,000,000    250                   559               18,000    8            34.33227539

You should expect performance to decrease when you increase replication. Your initial run had such high throughput because Kafka didn't need to copy the message data to follower replicas on the other brokers. When you increase the replication factor you're essentially trading speed for durability.
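For instance, with the Java producer from the official documentation, the acks setting controls how much of that durability every send waits for. A minimal sketch; the broker list, topic, and class name are placeholders:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // acks=1: wait only for the partition leader (faster, but a leader failure can lose data)
        // acks=all: wait for the in-sync replicas (slower, but actually uses the replication factor of 3)
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("benchmark-topic", "key", "value"));
        }
    }
}
Re-running the replication-3 benchmark with acks=1 versus acks=all helps separate the cost of follower replication itself from the cost of waiting for follower acknowledgements.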

Related

Handling of cassandra blocking writes when exceeds the memtable_cleanup_threshold

I was reading through the Cassandra flushing strategies and came across the following statement -
If the data to be flushed exceeds the memtable_cleanup_threshold, Cassandra blocks writes until the next flush succeeds.
Now my query is: say we have heavy writes to Cassandra, about 10K records per second, and the application runs 24*7. What settings should we use for the following parameters to avoid blocking?
memtable_heap_space_in_mb
memtable_offheap_space_in_mb
memtable_cleanup_threshold
Also, since it is time-series data, do I need to make any changes to the compaction strategy as well? If yes, what would be best for my case?
My Spark application, which takes data from Kafka and continuously inserts into Cassandra, hangs after a particular time, and I have observed that at that moment there are a lot of pending tasks in nodetool compactionstats.
nodetool tablehistograms
Percentile  SSTables  Write Latency (ms)  Read Latency (ms)  Partition Size (bytes)  Cell Count
50% 642.00 88.15 25109.16 310 24
75% 770.00 263.21 668489.53 535 50
95% 770.00 4055.27 668489.53 3311 310
98% 770.00 8409.01 668489.53 73457 6866
99% 770.00 12108.97 668489.53 219342 20501
Min 4.00 11.87 20924.30 150 9
Max 770.00 1996099.05 668489.53 4866323 454826
Keyspace : trackfleet_db
Read Count: 7183347
Read Latency: 15.153115504235004 ms
Write Count: 2402229293
Write Latency: 0.7495135263492935 ms
Pending Flushes: 1
Table: locationinfo
SSTable count: 3307
Space used (live): 62736956804
Space used (total): 62736956804
Space used by snapshots (total): 10469827269
Off heap memory used (total): 56708763
SSTable Compression Ratio: 0.38214618375483633
Number of partitions (estimate): 493571
Memtable cell count: 2089
Memtable data size: 1168808
Memtable off heap memory used: 0
Memtable switch count: 88033
Local read count: 765497
Local read latency: 162.880 ms
Local write count: 782044138
Local write latency: 1.859 ms
Pending flushes: 0
Percent repaired: 0.0
Bloom filter false positives: 368
Bloom filter false ratio: 0.00000
Bloom filter space used: 29158176
Bloom filter off heap memory used: 29104216
Index summary off heap memory used: 7883835
Compression metadata off heap memory used: 19720712
Compacted partition minimum bytes: 150
Compacted partition maximum bytes: 4866323
Compacted partition mean bytes: 7626
Average live cells per slice (last five minutes): 3.5
Maximum live cells per slice (last five minutes): 6
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
Dropped Mutations: 359
After changing the compaction strategy:
Keyspace : trackfleet_db
Read Count: 8568544
Read Latency: 15.943608060365916 ms
Write Count: 2568676920
Write Latency: 0.8019530641630868 ms
Pending Flushes: 1
Table: locationinfo
SSTable count: 5843
SSTables in each level: [5842/4, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live): 71317936302
Space used (total): 71317936302
Space used by snapshots (total): 10469827269
Off heap memory used (total): 105205165
SSTable Compression Ratio: 0.3889946058934169
Number of partitions (estimate): 542002
Memtable cell count: 235
Memtable data size: 131501
Memtable off heap memory used: 0
Memtable switch count: 93947
Local read count: 768148
Local read latency: NaN ms
Local write count: 839003671
Local write latency: 1.127 ms
Pending flushes: 1
Percent repaired: 0.0
Bloom filter false positives: 1345
Bloom filter false ratio: 0.00000
Bloom filter space used: 54904960
Bloom filter off heap memory used: 55402400
Index summary off heap memory used: 14884149
Compression metadata off heap memory used: 34918616
Compacted partition minimum bytes: 150
Compacted partition maximum bytes: 4866323
Compacted partition mean bytes: 4478
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 0
Dropped Mutations: 660
Thanks,
I would not touch the memtable settings unless they are actually a problem. They will only really block if you're writing at a rate that exceeds your disks' ability to keep up, or if GCs are messing up timings. "10K records per second and application is running 24*7" isn't actually that much, given that the records are not very large, and it will not overrun writes (a decent system can sustain 100k-200k/s of constant load). nodetool tablestats, nodetool tablehistograms, and your schema can help identify whether your records are too big or your partitions too wide, and they give a better indication of what your compaction strategy should be (probably TWCS, but maybe LCS if you have any reads at all and your partitions span a day or so).
Pending tasks in nodetool compactionstats have little to do with the memtable settings either; they mean your compactions are not keeping up. That can be just spikes as bulk jobs run, small partitions flush, or repairs stream sstables over, but if the count grows instead of going down you need to tune your compaction strategy. Really, a lot depends on your data model and the stats (tablestats/tablehistograms).
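If TWCS does turn out to be the right fit, switching it on is a single schema change. A sketch, using the trackfleet_db.locationinfo table from the stats above (the 6-hour window is only an assumed value; it should roughly match your TTL and query span):
ALTER TABLE trackfleet_db.locationinfo
  WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                     'compaction_window_unit': 'HOURS',
                     'compaction_window_size': '6'};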
You may refer to this link to tune the above parameters: http://abiasforaction.net/apache-cassandra-memtable-flush/
memtable_cleanup_threshold – a percentage of your total available memtable space that will trigger a memtable cleanup.
memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1). By default this is essentially 33% of your memtable_heap_space_in_mb. A scheduled cleanup results in flushing the table/column family that occupies the largest portion of memtable space. This keeps happening until your available memtable memory drops below the cleanup threshold.
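To make that concrete (the heap figure below is an assumed value, not taken from the question): with memtable_flush_writers at 2, which is what the 33% figure above assumes,
memtable_cleanup_threshold = 1 / (2 + 1) ≈ 0.33
so with memtable_heap_space_in_mb = 4096, a cleanup (a flush of the table using the most memtable space) is triggered once roughly 4096 × 0.33 ≈ 1365 MB of memtable memory is in use.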

Size of the table is growing abnormally in postgres database

One of our databases is 50 GB in size. One of its tables has 149,444,622 records; the table itself is 14 GB and its indexes are 16 GB.
The total size of the table and its indexes is 30 GB. I have performed the steps below on that table.
reindex table table_name;
vacuum full verbose analyze on table_name;
But the size of the table and its indexes has still not been reduced. Please guide me on how to proceed further.
The structure of the table is as below.
14 GB for your data is not abnormal. Let's do the math.
Simply adding up the sizes of your columns gives 68 bytes per row:
2 bigints  @ 8 bytes each = 16 bytes
4 integers @ 4 bytes each = 16 bytes
4 doubles  @ 8 bytes each = 32 bytes
1 date     @ 4 bytes      =  4 bytes
                            --------
                            68 bytes
149,444,622 rows at 68 bytes each is about 9.7 GB. This is the absolute minimum size of your data if there were no database overhead. But there is overhead. This answer reckons it's about 28 bytes per row; 68 + 28 is 96 bytes per row. That brings us to... 14.3 GB. Just what you have.
I doubt you can reduce the size without changing your schema, dropping indexes, or deleting data. If you provided more detail about your schema we could give suggestions, but I would suggest doing that as a new question.
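If you do want to see exactly where the 30 GB goes before deciding what to drop, a query along these lines should work ('table_name' is a placeholder for your actual table):
-- overall split between heap and indexes
SELECT pg_size_pretty(pg_table_size('table_name'))          AS heap_size,
       pg_size_pretty(pg_indexes_size('table_name'))        AS index_size,
       pg_size_pretty(pg_total_relation_size('table_name')) AS total_size;
-- per-index breakdown, largest first
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM   pg_stat_user_indexes
WHERE  relname = 'table_name'
ORDER  BY pg_relation_size(indexrelid) DESC;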
Finally, consider that 50 GB is a pretty small database. For example, the smallest paid database offered by Heroku is 64 GB and just $50/month. This may be a situation where it's fine to just use a bigger disk.

Calculate Max over time on sum function in Prometheus

I am running Prometheus in my Kubernetes cluster.
I have the following setup in Kubernetes:
I have 4 nodes and I want to calculate free memory. I want the sum over those four nodes, and then the maximum of that sum over 1 day. So, for example,
at time=t1
node1: 500 MB
node2: 600 MB
node3: 200 MB
node4: 300 MB
Total = 1700 MB
at time=t2
node1: 400 MB
node2: 700 MB
node3: 100 MB
node4: 200 MB
Total = 1300 MB
at time=t3
node1: 600 MB
node2: 800 MB
node3: 1200 MB
node4: 1300 MB
Total = 3900 MB
at time=t4
node1: 100 MB
node2: 200 MB
node3: 300 MB
node4: 400 MB
Total = 1000 MB
So the answer to my query should be 3900 MB. I am not able to apply max_over_time to the sum.
I have tried this (which is not working at all):
max_over_time(sum(node_memory_MemFree)[2m])
Since version 2.7 (Jan 2019), Prometheus supports sub-queries:
max_over_time( sum(node_memory_MemFree_bytes{instance=~"foobar.*"})[1d:1h] )
(the metric over the past day, with a resolution of 1 hour.)
Read the documentation for more information on subqueries: https://prometheus.io/docs/prometheus/latest/querying/examples/#subquery
However, do note the blog recommendation:
Epilogue
Though subqueries are very convenient to use in place of recording rules, using them unnecessarily has performance implications. Heavy subqueries should eventually be converted to recording rules for efficiency.
It is also not recommended to have subqueries inside a recording rule. Rather create more recording rules if you do need to use subqueries in a recording rule.
The use of recording rules is explained in Brian Brazil's article:
https://www.robustperception.io/composing-range-vector-functions-in-promql/
This isn't possible in a single expression; you need to use a recording rule for the intermediate expression. See
https://www.robustperception.io/composing-range-vector-functions-in-promql/
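As a sketch of the recording-rule route (the group and rule names are illustrative, and node_memory_MemFree_bytes is the metric name used by current node_exporter versions), the rule file could contain:
groups:
  - name: node_memory
    rules:
      - record: cluster:node_memory_MemFree_bytes:sum
        expr: sum(node_memory_MemFree_bytes)
and the query then becomes:
max_over_time(cluster:node_memory_MemFree_bytes:sum[1d])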

perl cpu profiling

I want to profile my Perl script for CPU time. I found Devel::NYTProf and Devel::SmallProf,
but the first one cannot show CPU time and the second one works badly. At least I couldn't find what I need.
Can you advise any tool for my purposes?
UPD: I need per-line profiling, since my script takes a lot of CPU time and I want to improve the relevant part of it.
You could try your system's time utility (not the shell built-in; the leading \ is not a typo):
$ \time -v perl collatz.pl
13 40 20 10 5 16 8 4 2 1
23 70 35 106 53 160 80 40
837799 525
Command being timed: "perl collatz.pl"
User time (seconds): 3.79
System time (seconds): 0.06
Percent of CPU this job got: 97%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.94
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 171808
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 9
Minor (reclaiming a frame) page faults: 14851
Voluntary context switches: 16
Involuntary context switches: 935
Swaps: 0
File system inputs: 1120
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

membase key-heavy memory usage vs redis

I recently ran a test with Membase, incrementing 60 million keys, each key 20-30 bytes in size, with values no larger than an integer. The cluster was spread across three 16 GB boxes, with 15 GB dedicated to a single bucket (replication=1) in Membase. The build is membase-server-community_x86_64_1.7.1.1 on 64-bit Ubuntu Lucid boxes.
Results:
Initially, 10 million keys resided in 3 GB of memory (about 3 million keys/GB).
At 60 million keys, they resided in 45 GB of memory (about 1.33 million keys/GB).
In comparison, Redis handles 9-10 million keys/GB at 60 million keys, and this ratio of keys per GB is consistent regardless of the dataset size.
Question:
Membase does not seem to scale well when faced with key-heavy datasets. Is there any tuning/configuration that could help Membase in this use case?
Thanks.
P.S. I migrated from Redis to Membase because the latter seemed to offer more reliability against cache failure. However, this degradation in performance with large datasets is a bit too painful.