Flink task manager distributes tasks unevenly - apache-kafka

I'm using Kafka streaming to handle calculation work items delivered via Kafka request messages. However, I've discovered that the Flink task managers appear to distribute the Kafka messages to workers in batches, so some workers receive more batches of work than others. As a result, the entire job runs longer than expected because the work is not distributed evenly across all workers.
Does anyone know how we can change the batch size to a smaller number so that the Flink task managers can distribute the work more evenly?
Here are the numbers:
Thread       Tasks   Elapsed (s)
thread 101   463     2217
thread 103   464     2757
thread 94    232     1493
thread 95    232     1376
thread 96    232     1277
thread 97    463     2098
thread 98    232     1008
thread 99    232     1252
We can see from the table above that some threads got more work than others and took more time to complete their tasks.
Is it possible to change the batch size? I couldn't find any parameters related to message batching in Flink.
Thanks!
How can we adjust the batch size used by the FlinkKafkaConsumer010 in Flink?
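For reference, FlinkKafkaConsumer010 does not appear to expose a batch-size setting of its own; the closest knobs are the standard Kafka consumer fetch settings, which go into the Properties object handed to the consumer. Whether they help with this skew is an assumption: they only shrink the fetch batches, while the overall distribution depends on how Flink assigns the Kafka partitions to its parallel source subtasks. Illustrative values:
# standard Kafka 0.10 consumer properties; the values here are only examples
max.poll.records=100
max.partition.fetch.bytes=65536
Another option worth trying is a rebalance() between the Kafka source and the downstream operators, which redistributes records round-robin across all parallel workers regardless of which subtask consumed them.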

Related

Ceph PGs not deep scrubbed in time keep increasing

I noticed this about 4 days ago and don't know what to do right now. The problem is as follows:
I have a 6-node, 3-monitor Ceph cluster with 84 OSDs: 72x 7200 rpm spinning disks and 12x NVMe SSDs for journaling. Every scrub configuration value is at its default. Every PG in the cluster is active+clean and every cluster stat is green. Yet "PGs not deep-scrubbed in time" keeps increasing, and it is at 96 right now. Output from ceph -s:
  cluster:
    id:     xxxxxxxxxxxxxxxxx
    health: HEALTH_WARN
            1 large omap objects
            96 pgs not deep-scrubbed in time

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3 (age 6h)
    mgr: mon2(active, since 2w), standbys: mon1
    mds: cephfs:1 {0=mon2=up:active} 2 up:standby
    osd: 84 osds: 84 up (since 4d), 84 in (since 3M)
    rgw: 3 daemons active (mon1, mon2, mon3)

  data:
    pools:   12 pools, 2006 pgs
    objects: 151.89M objects, 218 TiB
    usage:   479 TiB used, 340 TiB / 818 TiB avail
    pgs:     2006 active+clean

  io:
    client: 1.3 MiB/s rd, 14 MiB/s wr, 93 op/s rd, 259 op/s wr
How do I solve this problem? Also, the ceph health detail output shows that these non-deep-scrubbed PG alerts started on January 25th, but I didn't notice them before. The time I did notice was when an OSD went down for 30 seconds and came back up. Might it be related to this issue? Will it just resolve itself? Should I tamper with the scrub configuration? For example, how much performance loss might I face on the client side if I increase osd_max_scrubs from 1 to 2?
Usually the cluster deep-scrubs itself during low-I/O intervals. The default is that every PG has to be deep-scrubbed once a week. If OSDs go down they can't be deep-scrubbed, of course; this could cause some delay.
You could run something like this to see which PGs are behind and if they're all on the same OSD(s):
ceph pg dump pgs | awk '{print $1" "$23}' | column -t
Sort the output if necessary, and you can issue a manual deep-scrub on one of the affected PGs to see if the number decreases and if the deep-scrub itself works.
ceph pg deep-scrub <PG_ID>
Also, please add the output of ceph osd pool ls detail to see if any flags are set.
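If the list is long, sorting on the timestamp column brings the most overdue PGs to the top (this keeps the assumption from the command above that $23 is the DEEP_SCRUB_STAMP column on your release):
ceph pg dump pgs | awk '{print $1" "$23}' | column -t | sort -k2 | head -20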
You can set the deep scrub interval to 2 weeks to stretch the deep scrub window.
Instead of
osd_deep_scrub_interval = 604800
use:
osd_deep_scrub_interval = 1209600
Mr. Eblock has a good idea: manually force deep scrubs on some of the PGs to spread the actions evenly across the 2 weeks.
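As a side note, the new interval can also be applied to the running OSDs without a restart. Both forms below are standard Ceph commands, but the centralized config store only exists in Mimic and later, so which one applies is an assumption about the release in use:
# older releases: inject into the running daemons (and also persist the value in ceph.conf)
ceph tell osd.* injectargs '--osd_deep_scrub_interval 1209600'
# Mimic and later: store it in the cluster configuration database
ceph config set osd osd_deep_scrub_interval 1209600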
You have 2 options:
Increase the interval between deep scrubs.
Control deep scrubbing manually with a standalone script.
I've written a simple PHP script which takes care of deep scrubbing for me: https://gist.github.com/ethaniel/5db696d9c78516308b235b0cb904e4ad
It lists all the PGs, picks one PG whose last deep scrub was done more than 2 weeks ago (the script takes the oldest one), checks that the OSDs the PG sits on are not being used for another scrub (i.e. are in the active+clean state), and only then starts a deep scrub on that PG. Otherwise it goes looking for another PG.
I have osd_max_scrubs set to 1 (otherwise OSD daemons start crashing due to a bug in Ceph), so this script works nicely with the regular scheduler - whichever starts the scrubbing on a PG-OSD first, wins.
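For anyone who prefers plain shell over PHP, below is a rough sketch of the same idea. It is not the linked script: the awk column choice follows the $1/$23 convention from the earlier answer and should be checked against the ceph pg dump pgs header on your release, and the check that the involved OSDs are not already scrubbing is omitted for brevity:
# pick the PG with the oldest deep-scrub stamp, but only if it is more than two weeks old (GNU date)
cutoff=$(date -d '14 days ago' +%Y-%m-%d)
ceph pg dump pgs 2>/dev/null \
  | awk '{print $23" "$1}' \
  | sort \
  | awk -v cutoff="$cutoff" '$1 < cutoff {print $2; exit}' \
  | xargs -r -n1 ceph pg deep-scrub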

Apache Kafka NoReplicaOnlineException

Using Apache Kafka with a single node (1 ZooKeeper, 1 broker), I get this exception (repeated multiple times):
kafka.common.NoReplicaOnlineException: No replica in ISR for partition __consumer_offsets-2 is alive. Live brokers are: [Set()], ISR brokers are: [0]
What does it mean? Note, I am starting the KafkaServer programmatically, and I am able to send and consume from a topic using the CLI tools.
It seems I should tell this node that it is operating in standalone mode - how should I do this?
This seems to happen during startup.
Full exception:
17-11-07 19:43:44 NP-3255AJ193091.home ERROR [state.change.logger:107] - [Controller id=0 epoch=54] Initiated state change for partition __consumer_offsets-16 from OfflinePartition to OnlinePartition failed
kafka.utils.ShutdownableThread.run ShutdownableThread.scala: 64
kafka.controller.ControllerEventManager$ControllerEventThread.doWork ControllerEventManager.scala: 52
kafka.metrics.KafkaTimer.time KafkaTimer.scala: 31
kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply ControllerEventManager.scala: 53 (repeats 2 times)
kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp ControllerEventManager.scala: 53
kafka.controller.KafkaController$Startup$.process KafkaController.scala: 1581
kafka.controller.KafkaController.elect KafkaController.scala: 1681
kafka.controller.KafkaController.onControllerFailover KafkaController.scala: 298
kafka.controller.PartitionStateMachine.startup PartitionStateMachine.scala: 58
kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange PartitionStateMachine.scala: 81
scala.collection.TraversableLike$WithFilter.foreach TraversableLike.scala: 732
scala.collection.mutable.HashMap.foreach HashMap.scala: 130
scala.collection.mutable.HashMap.foreachEntry HashMap.scala: 40
scala.collection.mutable.HashTable$class.foreachEntry HashTable.scala: 236
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply HashMap.scala: 130 (repeats 2 times)
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply TraversableLike.scala: 733
kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply PartitionStateMachine.scala: 81
kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply PartitionStateMachine.scala: 84
kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange PartitionStateMachine.scala: 163
kafka.controller.PartitionStateMachine.electLeaderForPartition PartitionStateMachine.scala: 303
kafka.controller.OfflinePartitionLeaderSelector.selectLeader PartitionLeaderSelector.scala: 65
kafka.common.NoReplicaOnlineException: No replica in ISR for partition __consumer_offsets-16 is alive. Live brokers are: [Set()], ISR brokers are: [0]
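For what it's worth, the "Live brokers are: [Set()]" part suggests the controller ran the partition election before the broker had finished registering in ZooKeeper, which would fit this happening during startup. For a single-broker setup, the replication-related defaults for internal topics such as __consumer_offsets are usually pinned to 1; these are standard broker properties, and whether they silence this particular message is an assumption:
# minimal single-broker settings (illustrative values)
broker.id=0
offsets.topic.replication.factor=1
default.replication.factor=1
min.insync.replicas=1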

pg_top output analysis of puppetdb with postgres

I recently started using a tool called pg_top that shows statistics for Postgres. However, since I am not very well versed in the internals of Postgres, I need a bit of clarification on the output.
last pid: 6152; load avg: 19.1, 18.6, 20.4; up 119+20:31:38 13:09:41
41 processes: 5 running, 36 sleeping
CPU states: 52.1% user, 0.0% nice, 0.8% system, 47.1% idle, 0.0% iowait
Memory: 47G used, 16G free, 2524M buffers, 20G cached
DB activity: 151 tps, 0 rollbs/s, 253403 buffer r/s, 86 hit%, 1550639 row r/s, 21 row w/s
DB I/O: 0 reads/s, 0 KB/s, 35 writes/s, 2538 KB/s
DB disk: 233.6 GB total, 195.1 GB free (16% used)
Swap:
My question is about the DB activity line: is 1.5 million rows read per second a lot? If so, what can be done to improve it? I am running PuppetDB 2.3.8 with 6.8 million resources and 2,500 nodes, on Postgres 9.1. All of this runs on a single 24-core box with 64 GB of memory.
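On the "what can be done" side, a useful first step is to see which tables those row reads come from; a high row r/s in pg_top usually comes down to sequential scans over a few hot tables. pg_stat_user_tables is a standard PostgreSQL statistics view (present in 9.1); the database name puppetdb is an assumption about the PuppetDB default:
psql -d puppetdb -c "SELECT relname, seq_scan, seq_tup_read, idx_scan FROM pg_stat_user_tables ORDER BY seq_tup_read DESC LIMIT 10;"
Tables with a large seq_tup_read and comparatively few idx_scan hits are the usual candidates for a missing index.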

Bootloader lock failure: standard unlocking commands not working; thinking it's a hash failure or a corrupted SST key

I do not know why, but when my friend gets inebriated she likes to hook her phone up to a PC and play with it. She has a basic knowledge of ADB and fastboot commands, and I verified with her what errors were thrown. When she went to re-lock the bootloader, it did not work. She downloaded the Google minimal SDK tools to get up-to-date ADB and fastboot, then went further and got mfastboot from Motorola to ensure correct parsing when flashing. All of these fastboot packages were also tested on Mac and Ubuntu Linux, and on Windows 8.1 Pro N Update 1 and Windows 7 Professional N SP2 (all x64); all resulted in the same errors. She is thorough, and I only taught her how to manually erase and flash, no scripts or toolkits. She ran:
fastboot oem lock
and it returned.
(bootloader) FAIL: Please run fastboot oem lock begin first!
(bootloader) sst lock failure!
FAILED (remote failure)
finished. total time: 0.014s
Then she tried again, then again, and then, yep, again. At this point she either read the log and followed it, or, as I personally suspect given the state she's in when she starts playing with phones, she panicked because she needs the bootloader locked for work and started attempting to flash.
fastboot oem lock begin
and it returned.
M:\SHAMU\FACTORY IMAGE\shamu-lmy47z>fastboot oem lock begin
...
(bootloader) Ready to flash signed images
OKAY [ 0.121s]
finished. total time: 0.123s
FACTORY IMAGE\shamu-lmy47z>fastboot flash boot boot.img
target reported max download size of 536870912 bytes
sending 'boot' (7731 KB)...
OKAY [ 0.252s]
writing 'boot'...
(bootloader) Preflash validation failed
FAILED (remote failure)
finished. total time: 0.271s
Then the bootloader log stated
cmd: oem lock
hab check failed for boot
failed to validate boot image
Upon flashing boot.img, the bootloader log lists "Mismatched partition size (boot)".
Interestingly, sometimes it returns:
fastboot oem lock begin
...
(bootloader) Ready to flash signed images
OKAY [ 0.121s]
finished. total time: 0.123s
fastboot flash boot boot.img
target reported max download size of 536870912 bytes
sending 'boot' (7731 KB)...
OKAY [ 0.252s]
writing 'boot'...
(bootloader) Preflash validation failed
FAILED (remote failure)
finished. total time: 0.271s
I dumped the partition table to see if the partitions are zeroed out, which would indicate a bad eMMC, but they are not.
cat /proc/partitions
major minor #blocks name
179 0 61079552 mmcblk0
179 1 114688 mmcblk0p1
179 2 16384 mmcblk0p2
179 3 384 mmcblk0p3
179 4 56 mmcblk0p4
179 5 16 mmcblk0p5
179 6 32 mmcblk0p6
179 7 1024 mmcblk0p7
179 8 256 mmcblk0p8
179 9 512 mmcblk0p9
179 10 500 mmcblk0p10
179 11 4156 mmcblk0p11
179 12 384 mmcblk0p12
179 13 1024 mmcblk0p13
179 14 256 mmcblk0p14
179 15 512 mmcblk0p15
179 16 500 mmcblk0p16
179 17 4 mmcblk0p17
179 18 512 mmcblk0p18
179 19 1024 mmcblk0p19
179 20 1024 mmcblk0p20
179 21 1024 mmcblk0p21
179 22 1024 mmcblk0p22
179 23 16384 mmcblk0p23
179 24 16384 mmcblk0p24
179 25 2048 mmcblk0p25
179 26 32768 mmcblk0p26
179 27 256 mmcblk0p27
179 28 32 mmcblk0p28
179 29 128 mmcblk0p29
179 30 8192 mmcblk0p30
179 31 1024 mmcblk0p31
259 0 2528 mmcblk0p32
259 1 1 mmcblk0p33
259 2 8 mmcblk0p34
259 3 16400 mmcblk0p35
259 4 9088 mmcblk0p36
259 5 16384 mmcblk0p37
259 6 262144 mmcblk0p38
259 7 65536 mmcblk0p39
259 8 1024 mmcblk0p40
259 9 2097152 mmcblk0p41
259 10 58351488 mmcblk0p42
179 32 4096 mmcblk0rpmb
254 0 58351488 dm-0
I've asked for logs of the whole process so I can see the full warning, error, and failure messages, but she is far away on business. From what I do have, and from the literature I have started to work through while learning about the Android boot process, I am starting to believe there is either a missing or corrupted key in the SST table (which I believe Google calls the bigtable), or a hash failure when locking down the bootloader security, or I could be way off; please let me know. What I do not know is how to investigate or disprove this so I can move on. Would I be able to get confirmation through a stack trace of the missing or corrupted data? Then the puzzle would be solved. Honestly, this has become a puzzle that begs to be solved rather than an emergency. Thanks.
You should try "fastboot flashing lock" command instead.

Help me analyze dump file

Customers are reporting problems almost every day, at roughly the same hours. The app runs on 2 nodes. It is the Metastorm BPM platform, and it calls our code.
In some dumps I noticed very long-running threads (~50 minutes), but not in all of them. Administrators are also telling me that just before users report problems, memory usage goes up. Then everything slows down to the point where users can't work, and admins have to restart the platform on both nodes. My first thought was deadlocks (the long-running threads), but I didn't manage to confirm that; !syncblk isn't returning anything. Then I looked at memory usage. I noticed a lot of dynamic assemblies, so I thought maybe assemblies were leaking, but it looks like it's not that: I received a dump from a day when everything was working fine, and the number of dynamic assemblies is similar. So maybe a memory leak, I thought, but I cannot confirm that either. !dumpheap -stat shows memory usage grows, but I haven't found anything interesting with !gcroot. There is one thing I don't understand, though: Threadpool Completion Port. There are a lot of them. So maybe something is waiting on something? Here is the data I can give you so far that will fit in this post. Could you suggest anything that would help diagnose this situation?
Users not reporting problems:
Node1 Node2
Size of dump: 638MB 646MB
DynamicAssemblies 259 265
GC Heaps: 37MB 35MB
Loader Heaps: 11MB 11MB
Node1:
Number of Timers: 12
CPU utilization 2%
Worker Thread: Total: 5 Running: 0 Idle: 5 MaxLimit: 2000 MinLimit: 200
Completion Port Thread:Total: 2 Free: 2 MaxFree: 16 CurrentLimit: 4 MaxLimit: 1000 MinLimit: 8
!dumpheap -stat (biggest)
0x793041d0 32,664 2,563,292 System.Object[]
0x79332b9c 23,072 3,485,624 System.Int32[]
0x79330a00 46,823 3,530,664 System.String
0x79333470 22,549 4,049,536 System.Byte[]
Node2:
Number of Timers: 12
CPU utilization 0%
Worker Thread: Total: 7 Running: 0 Idle: 7 MaxLimit: 2000 MinLimit: 200
Completion Port Thread:Total: 3 Free: 1 MaxFree: 16 CurrentLimit: 5 MaxLimit: 1000 MinLimit: 8
!dumpheap -stat
0x793041d0 30,678 2,537,272 System.Object[]
0x79332b9c 21,589 3,298,488 System.Int32[]
0x79333470 21,825 3,680,000 System.Byte[]
0x79330a00 46,938 5,446,576 System.String
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Users start to report problems:
Node1 Node2
Size of dump: 662MB 655MB
DynamicAssemblies 236 235
GC Heaps: 159MB 113MB
Loader Heaps: 10MB 10MB
Node1:
Work Request in Queue: 0
Number of Timers: 14
CPU utilization 20%
Worker Thread: Total: 7 Running: 0 Idle: 7 MaxLimit: 2000 MinLimit: 200
Completion Port Thread:Total: 48 Free: 1 MaxFree: 16 CurrentLimit: 49 MaxLimit: 1000 MinLimit: 8
!dumpheap -stat
0x7932a208 88,974 3,914,856 System.Threading.ReaderWriterLock
0x79333054 71,397 3,998,232 System.Collections.Hashtable
0x24f70350 319,053 5,104,848 Our.Class
0x79332b9c 53,190 6,821,588 System.Int32[]
0x79333470 52,693 6,883,120 System.Byte[]
0x79333150 72,900 11,081,328 System.Collections.Hashtable+bucket[]
0x793041d0 247,011 26,229,980 System.Object[]
0x79330a00 644,807 34,144,396 System.String
Node2:
Work Request in Queue: 1
Number of Timers: 17
CPU utilization 17%
Worker Thread: Total: 6 Running: 0 Idle: 6 MaxLimit: 2000 MinLimit: 200
Completion Port Thread:Total: 48 Free: 2 MaxFree: 16 CurrentLimit: 49 MaxLimit: 1000 MinLimit: 8
!dumpheap -stat
0x7932a208 76,425 3,362,700 System.Threading.ReaderWriterLock
0x79332b9c 42,417 5,695,492 System.Int32[]
0x79333150 41,172 6,451,368 System.Collections.Hashtable+bucket[]
0x79333470 44,052 6,792,004 System.Byte[]
0x793041d0 175,973 18,573,780 System.Object[]
0x79330a00 397,361 21,489,204 System.String
Edit:
I downloaded DebugDiag and let it analyze my dumps. Here is part of the output:
The following threads in process_name name_of_dump.dmp are making a COM call to thread 193 within the same process which in turn is waiting on data to be returned from another server via WinSock.
The call to WinSock originated from 0x0107b03b and is destined for port xxxx at IP address xxx.xxx.xxx.xxx
( 18 76 172 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 210 211 212 213 214 215 216 217 218 224 225 226 227 228 229 231 232 233 236 239 )
14,79% of threads blocked
And the recommendation is:
Several threads making calls to the same STA thread can cause a performance bottleneck due to serialization. Server side COM servers are recommended to be thread aware and follow MTA guidelines when multiple threads are sharing the same object instance.
I checked with WinDbg what thread 193 does. It is calling our code. Our code calls some Metastorm engine code and hangs on a remoting call. But !runaway shows it has been hanging for 8 seconds, so not that long. So I checked what those waiting threads are. All except thread 18 are in:
System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32, UInt32, System.Threading.NativeOverlapped*). I could understand one, but why so many of them? Is it specific to the business process modeling engine we're using, or is it something typical? I guess they are taking threads that could be used by other clients, and that's why users report the slowdown. Are those threads the Completion Port Threads I asked about before? Can I do anything more to diagnose this, or have I found our code to be the cause?
From the looks of the output, most of the memory is not on the .NET heaps (only ~35 MB out of ~650 MB), so if you are looking at the .NET heaps I think you are looking in the wrong place. The memory is probably either in assemblies or in native memory, if you are using some native component for file transfers or similar. You would want to use DebugDiag to monitor that.
It is hard to say whether you are leaking dynamic assemblies without looking at the pattern of growth, so I would suggest you look at perfmon and the current assemblies counter to see if it keeps growing over time. If it does, then you would have to investigate further by looking at what the dynamic assemblies are with !dda.
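To make that concrete, a few WinDbg/SOS commands that can help here; the .loadby line is an assumption about which CLR the dumps were taken against (use .loadby sos clr on .NET 4):
* load SOS for .NET 2.0/3.5:
.loadby sos mscorwks
* where the ~650 MB actually is (native heaps, images, mapped files) versus the ~35 MB GC heap:
!address -summary
* loader heap breakdown, which is where dynamic assemblies live:
!eeheap -loader
* list the assemblies loaded in each AppDomain:
!dumpdomain
* worker and completion-port thread counts, to correlate with the numbers quoted above:
!threadpool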