Ceph OSD crashes - centos

I am looking for a way to bring three of our Ceph OSDs, which are down, back up. Based on the log output below, my guess is that the OSDs are full.
At the moment 3 of 4 OSDs are down, the cluster as a whole is down too, and I have no clue what the following error means.
[root@cephosd02 ~]# /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/15.2.13/rpm/el7/BUILD/ceph-15.2.13/src/os/bluestore/AvlAllocator.cc: In function 'virtual void AvlAllocator::_add_to_tree(uint64_t, uint64_t)' thread 7f633fca3bc0 time 2021-11-28T12:02:33.118576+0330
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/15.2.13/rpm/el7/BUILD/ceph-15.2.13/src/os/bluestore/AvlAllocator.cc: 60: FAILED ceph_assert(size != 0)
ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14c) [0x55b6ed7219a3]
2: (()+0x4deb6b) [0x55b6ed721b6b]
3: (AvlAllocator::_add_to_tree(unsigned long, unsigned long)+0x2db9) [0x55b6edd9af89]
4: (AvlAllocator::init_add_free(unsigned long, unsigned long)+0x69) [0x55b6edd9e8d9]
5: (BlueStore::_open_alloc()+0x1d3) [0x55b6edc61ad3]
6: (BlueStore::_open_db_and_around(bool)+0xa7) [0x55b6edc7a507]
7: (BlueStore::_mount(bool, bool)+0x5c2) [0x55b6edcc6ba2]
8: (OSD::init()+0x35d) [0x55b6ed82f1bd]
9: (main()+0x1b5e) [0x55b6ed78372e]
10: (__libc_start_main()+0xf5) [0x7f633caf5555]
11: (()+0x575dd5) [0x55b6ed7b8dd5]
*** Caught signal (Aborted) **
in thread 7f633fca3bc0 thread_name:ceph-osd
2021-11-28T12:02:33.125+0330 7f633fca3bc0 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/15.2.13/rpm/el7/BUILD/ceph-15.2.13/src/os/bluestore/AvlAllocator.cc: In function 'virtual void AvlAllocator::_add_to_tree(uint64_t, uint64_t)' thread 7f633fca3bc0 time 2021-11-28T12:02:33.118576+0330
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/15.2.13/rpm/el7/BUILD/ceph-15.2.13/src/os/bluestore/AvlAllocator.cc: 60: FAILED ceph_assert(size != 0)
ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14c) [0x55b6ed7219a3]
2: (()+0x4deb6b) [0x55b6ed721b6b]
3: (AvlAllocator::_add_to_tree(unsigned long, unsigned long)+0x2db9) [0x55b6edd9af89]
4: (AvlAllocator::init_add_free(unsigned long, unsigned long)+0x69) [0x55b6edd9e8d9]
5: (BlueStore::_open_alloc()+0x1d3) [0x55b6edc61ad3]
6: (BlueStore::_open_db_and_around(bool)+0xa7) [0x55b6edc7a507]
7: (BlueStore::_mount(bool, bool)+0x5c2) [0x55b6edcc6ba2]
8: (OSD::init()+0x35d) [0x55b6ed82f1bd]
9: (main()+0x1b5e) [0x55b6ed78372e]
10: (__libc_start_main()+0xf5) [0x7f633caf5555]
11: (()+0x575dd5) [0x55b6ed7b8dd5]
ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
1: (()+0xf630) [0x7f633dd16630]
2: (gsignal()+0x37) [0x7f633cb09387]
3: (abort()+0x148) [0x7f633cb0aa78]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x19b) [0x55b6ed7219f2]
5: (()+0x4deb6b) [0x55b6ed721b6b]
6: (AvlAllocator::_add_to_tree(unsigned long, unsigned long)+0x2db9) [0x55b6edd9af89]
7: (AvlAllocator::init_add_free(unsigned long, unsigned long)+0x69) [0x55b6edd9e8d9]
8: (BlueStore::_open_alloc()+0x1d3) [0x55b6edc61ad3]
9: (BlueStore::_open_db_and_around(bool)+0xa7) [0x55b6edc7a507]
10: (BlueStore::_mount(bool, bool)+0x5c2) [0x55b6edcc6ba2]
11: (OSD::init()+0x35d) [0x55b6ed82f1bd]
12: (main()+0x1b5e) [0x55b6ed78372e]
13: (__libc_start_main()+0xf5) [0x7f633caf5555]
14: (()+0x575dd5) [0x55b6ed7b8dd5]
2021-11-28T12:02:33.133+0330 7f633fca3bc0 -1 *** Caught signal (Aborted) **
in thread 7f633fca3bc0 thread_name:ceph-osd
ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
1: (()+0xf630) [0x7f633dd16630]
2: (gsignal()+0x37) [0x7f633cb09387]
3: (abort()+0x148) [0x7f633cb0aa78]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char

Migrating from ag-grid-community to ag-grid-enterprise with angular causing heap out of memory

I started with ag-grid-community to check out its features and work it into our application, which I did successfully. I then wanted to transition to Enterprise to use its features, but when I switched to @ag-grid-enterprise/all-modules and tried running my app, it started throwing FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory. When I switch back to ag-grid-community, it runs fine. I've tried the /usr/local/bin/node --max-old-space-size=8092 suggestion (even with a much higher number), but still got the same result. Some of the output produced follows.
<--- Last few GCs --->
[54531:0x110000000] 48815 ms: Mark-sweep 1296.0 (1442.8) -> 1279.7 (1440.8) MB, 659.4 / 0.0 ms (average mu = 0.166, current mu = 0.092) allocation failure scavenge might not succeed
[54531:0x110000000] 49521 ms: Mark-sweep 1293.0 (1440.8) -> 1282.9 (1442.8) MB, 680.0 / 0.0 ms (average mu = 0.104, current mu = 0.037) allocation failure scavenge might not succeed
<--- JS stacktrace --->
==== JS stack trace =========================================
0: ExitFrame [pc: 0x2ba46df5be3d]
1: StubFrame [pc: 0x2ba46df483f3]
Security context: 0x032486c9e6e9 <JSObject>
2: match [0x32486c904d1](this=0x032474daf801 <String[70]\: GridOptionsWrapper.prototype.isEmbedFullWidthRows = function () {\n>,0x03245c5bcee1 <JSRegExp <String[26]: (?!$)[^\n\r;{}]*[\n\r;{}]*>>)
3: /* anonymous */(aka /* anonymous */) [0x324e9bf4251] [/Users/erflor/Development/Project/site/node_modules/webpa...
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
1: 0x10003cf99 node::Abort() [/usr/local/bin/node]
2: 0x10003d1a3 node::OnFatalError(char const*, char const*) [/usr/local/bin/node]
3: 0x1001b7835 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
4: 0x100585682 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/usr/local/bin/node]
5: 0x100588155 v8::internal::Heap::CheckIneffectiveMarkCompact(unsigned long, double) [/usr/local/bin/node]
6: 0x100583fff v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/usr/local/bin/node]
I updated the build command in the scripts section of my package.json and the error went away.
"scripts": {
  "ng": "ng",
  "start": "ng s",
  "build": "echo 'build' && npm run version && ng build",
  "build:dev": "echo 'build:dev' && node --max_old_space_size=8192 ./node_modules/@angular/cli/bin/ng build --configuration=dev"
}

Huge amount of memory used by flink

For the last couple of weeks I have been building a DataStream program in Flink, in Scala.
But I see strange behavior: Flink uses a lot more memory than I expected.
I have 4 ListStates of tuple (Int, Long) in my ProcessFunction, keyed by Int. I use them to compute distinct counters over different time frames, and I expected most of the memory to be used by these lists.
But that is not the case.
So I printed a live class histogram of the JVM, and I was surprised by how much memory is used:
num #instances #bytes class name
----------------------------------------------
1: 138920685 6668192880 java.util.HashMap$Node
2: 138893041 5555721640 org.apache.flink.streaming.api.operators.InternalTimer
3: 149680624 3592334976 java.lang.Integer
4: 48313229 3092046656 org.apache.flink.runtime.state.heap.CopyOnWriteStateTable$StateTableEntry
5: 14042723 2579684280 [Ljava.lang.Object;
6: 4492 2047983264 [Ljava.util.HashMap$Node;
7: 41686732 1333975424 com.myJob.flink.tupleState
8: 201 784339688 [Lorg.apache.flink.runtime.state.heap.CopyOnWriteStateTable$StateTableEntry;
9: 17230300 689212000 com.myJob.flink.uniqStruct
10: 14025040 561001600 java.util.ArrayList
11: 8615581 413547888 com.myJob.flink.Data$FingerprintCnt
12: 6142006 393088384 com.myJob.flink.ProcessCountStruct
13: 4307549 172301960 com.myJob.flink.uniqresult
14: 4307841 137850912 com.myJob.flink.Data$FingerprintUniq
15: 2153904 137849856 com.myJob.flink.Data$StreamData
16: 1984742 79389680 scala.collection.mutable.ListBuffer
17: 1909472 61103104 scala.collection.immutable.$colon$colon
18: 22200 21844392 [B
19: 282624 9043968 org.apache.flink.shaded.netty4.io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
20: 59045 6552856 [C
21: 33194 2655520 java.nio.DirectByteBuffer
22: 32804 2361888 sun.misc.Cleaner
23: 35 2294600 [Lscala.concurrent.forkjoin.ForkJoinTask;
24: 640 2276352 [Lorg.apache.flink.shaded.netty4.io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry;
25: 32768 2097152 org.apache.flink.core.memory.HybridMemorySegment
26: 12291 2082448 java.lang.Class
27: 58591 1874912 java.lang.String
28: 8581 1372960 java.lang.reflect.Method
29: 32790 1311600 java.nio.DirectByteBuffer$Deallocator
30: 18537 889776 java.util.concurrent.ConcurrentHashMap$Node
31: 4239 508680 java.lang.reflect.Field
32: 8810 493360 java.nio.HeapByteBuffer
33: 7389 472896 java.util.HashMap
34: 5208 400336 [I
The tuple (Int, Long) is com.myJob.flink.tupleState, in 7th position, and I see that these tuples use less than 2 GB of memory.
I don't understand why Flink uses this amount of memory for the other classes.
Can anyone shed some light on this behavior? Thanks in advance.
Update:
I run my job on a standalone cluster (1 JobManager, 3 TaskManagers).
The Flink version is 1.5-SNAPSHOT, commit e4486ae.
I took the live histogram on one TaskManager node.
Update 2:
In my ProcessFunction I use:
ctx.timerService.registerProcessingTimeTimer(ctx.timestamp + 100)
Then, in the onTimer callback, I go through my ListState to check all old data.
So a timer is created for every element processed.
But why is the timer still in memory after onTimer has fired?
How many windows do you end up with? Based on the top two entries, what we are seeing is the "timers" that Flink uses to track when to clean up each window. For every key in a window you effectively end up with a (key, endTimestamp) entry in the timer state. If you have a very large number of windows (perhaps due to out-of-order time or delayed watermarking) or a very large number of keys in each window, each of those will take up memory.
Note that even if you are using RocksDB state, the TimerService keeps its state on the heap, so you have to watch out for that.
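The per-element timer pattern from Update 2 can be modeled outside Flink. The sketch below (plain JavaScript with hypothetical numbers, not Flink API) counts distinct (key, timestamp) timer entries with and without coalescing, i.e. rounding the fire time to a coarser boundary so many elements of the same key share one timer:

```javascript
// Model of timer registration: timers are deduplicated only if the
// (key, timestamp) pair is identical, so ctx.timestamp + 100 creates
// one timer per element. Rounding the fire time up to a boundary lets
// elements of the same key share timers.
function countTimers(events, roundTo) {
  const timers = new Set(); // distinct (key, fireAt) pairs held in memory
  for (const { key, ts } of events) {
    let fireAt = ts + 100;                            // the pattern from the question
    if (roundTo) fireAt = Math.ceil(fireAt / roundTo) * roundTo;
    timers.add(`${key}@${fireAt}`);
  }
  return timers.size;
}

// One key, 10,000 elements arriving 1 ms apart.
const events = Array.from({ length: 10000 }, (_, i) => ({ key: 1, ts: i }));
console.log(countTimers(events, 0));    // 10000 — one timer per element
console.log(countTimers(events, 1000)); // 11 — coalesced to 1-second slots
```

In actual Flink code the equivalent mitigations are registering timers at a rounded timestamp and deleting obsolete timers once the state they guard has been cleaned up.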

Ionic CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory while starting new project

I am new to Ionic, and I am getting the error below whenever I create a new project.
Could anyone help me fix this issue?
==== JS stack trace =========================================
Security context: 0x1240acccf781 <JS Object>
1: build [/usr/local/lib/node_modules/ionic/node_modules/chalk/index.js:118] [pc=0x185922bbdfc] (this=0x1883445b3691 <JS Function Chalk.chalk.template (SharedFunctionInfo 0x1883445b3199)>,_styles=0x36dd597b5061 <JS Array[1]>,key=0x1240acc57f81 <String[4]: bold>)
2: _onTimeout [/usr/local/lib/node_modules/ionic/node_modules/@ionic/cli-utils/lib/utils/task.js:~111] [pc=0x18592337f60] (thi...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1: node::Abort() [ionic]
2: 0x109f0ac [ionic]
3: v8::Utils::ReportApiFailure(char const*, char const*) [ionic]
4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [ionic]
5: v8::internal::Factory::NewTransitionArray(int) [ionic]
6: v8::internal::TransitionArray::Insert(v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::Map>, v8::internal::SimpleTransitionFlag) [ionic]
7: v8::internal::Map::CopyReplaceDescriptors(v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::DescriptorArray>, v8::internal::Handle<v8::internal::LayoutDescriptor>, v8::internal::TransitionFlag, v8::internal::MaybeHandle<v8::internal::Name>, char const*, v8::internal::SimpleTransitionFlag) [ionic]
8: v8::internal::Map::CopyAddDescriptor(v8::internal::Handle<v8::internal::Map>, v8::internal::Descriptor*, v8::internal::TransitionFlag) [ionic]
9: v8::internal::Map::CopyWithField(v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::FieldType>, v8::internal::PropertyAttributes, v8::internal::Representation, v8::internal::TransitionFlag) [ionic]
10: v8::internal::Map::TransitionToDataProperty(v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::internal::Object::StoreFromKeyed) [ionic]
11: v8::internal::LookupIterator::PrepareTransitionToDataProperty(v8::internal::Handle<v8::internal::JSObject>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::internal::Object::StoreFromKeyed) [ionic]
12: v8::internal::StoreIC::LookupForWrite(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::Object::StoreFromKeyed) [ionic]
13: v8::internal::StoreIC::UpdateCaches(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::Object::StoreFromKeyed) [ionic]
14: v8::internal::StoreIC::Store(v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::Object>, v8::internal::Object::StoreFromKeyed) [ionic]
15: v8::internal::Runtime_StoreIC_Miss(int, v8::internal::Object**, v8::internal::Isolate*) [ionic]
16: 0x185921092a7
Aborted (core dumped)
Try the following command:
$ node --max-old-space-size=4096 /usr/local/bin/ionic cordova build android --prod

Unnecessary load and store instruction in scala's byte code

I just did some investigation into pattern matching and its corresponding bytecode.
val a = Array(1,2,3,4)
a.map {
case i => i + 1
}
For the above code, I used javap and got the bytecode for the anonymous function inside map:
public int apply$mcII$sp(int);
Code:
0: iload_1
1: istore_2
2: iload_2
3: iconst_1
4: iadd
5: ireturn
So it seems to me that at offset 0 we push an int (the parameter), then at offset 1 we pop it into another local, and at offset 2 we push it back ... What's the purpose here?
Thanks!
Dude, try -optimise.
public int apply$mcII$sp(int);
flags: ACC_PUBLIC
Code:
stack=2, locals=2, args_size=2
0: iload_1
1: iconst_1
2: iadd
3: ireturn
Use
scala> :javap -prv -
and then something like
scala> :javap -prv $line4/$read$$iw$$iw$$anonfun$1
This is not really an answer, since I couldn't figure out why this happens. I'm hoping that these observations will be at least helpful :)
I'm seeing the following bytecode in Scala 2.10:
public int apply$mcII$sp(int);
Code:
0: iload_1 ; var1 -> stack
1: istore_2 ; var2 <- stack
2: iload_2 ; var2 -> stack
3: iconst_1 ; 1 -> stack
4: iadd
5: istore_3 ; var3 <- stack
6: iload_3 ; var3 -> stack
7: ireturn ; return <- stack
The first two instructions simply copy the value of var1 into var2, then push var2 onto the stack as the operand. The same pattern appears after iadd, where the result is stored in var3 for no apparent reason, since ireturn returns the value from the top of the stack anyway.
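To convince yourself the two sequences are equivalent, here is a toy stack-machine interpreter (plain JavaScript, covering only the handful of opcodes in the listings above) that runs both bodies:

```javascript
// Minimal interpreter for the few JVM opcodes in the listings above.
// locals[0] would be `this`; locals[1] is the int parameter.
function run(code, arg) {
  const stack = [];
  const locals = [null, arg];
  for (const op of code) {
    if (op.startsWith('iload_')) stack.push(locals[Number(op.slice(6))]);
    else if (op.startsWith('istore_')) locals[Number(op.slice(7))] = stack.pop();
    else if (op === 'iconst_1') stack.push(1);
    else if (op === 'iadd') stack.push(stack.pop() + stack.pop());
    else if (op === 'ireturn') return stack.pop();
  }
}

// Scala 2.10 body (with the redundant load/store pairs) vs. the -optimise body.
const unoptimized = ['iload_1', 'istore_2', 'iload_2', 'iconst_1',
                     'iadd', 'istore_3', 'iload_3', 'ireturn'];
const optimized = ['iload_1', 'iconst_1', 'iadd', 'ireturn'];
console.log(run(unoptimized, 41), run(optimized, 41)); // 42 42
```

The extra load/store pairs only shuffle the same value between locals and the stack; the result is identical, which is why the optimizer can delete them.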

[AVPlaybackItem fpItem]: message sent to deallocated instance

I play a movie with MPMoviePlayerController. Later, the app is "restarted" (a pseudo-reset, where all view controllers are removed and the user returns to the home screen), and the same movie is played again.
This leads to a crash in iOS 3.2.2 on the iPad:
[AVPlaybackItem fpItem]: message sent to deallocated instance
I have no idea where that comes from. Seems to be something private. Has anyone experienced and possibly solved the same problem?
The stack trace for that particular address:
(gdb) info malloc 0x11471400
Alloc: Block address: 0x11471400 length: 76
Stack - pthread: 0xa0630500 number of frames: 34
0: 0x9534e0c3 in malloc_zone_calloc
1: 0x9534e01a in calloc
2: 0x343edc9 in _internal_class_createInstanceFromZone
3: 0x344b5c9 in _class_createInstanceFromZone
4: 0x344b5ef in class_createInstance
5: 0x3326b57 in +[NSObject allocWithZone:]
6: 0x332583a in +[NSObject alloc]
7: 0x536ab67 in -[AVPlaybackQueue queueItemWasAddedNotification:]
8: 0x27f586 in _nsnote_callback
9: 0x328d165 in _CFXNotificationPostNotification
10: 0x2762ca in -[NSNotificationCenter postNotificationName:object:userInfo:]
11: 0x5354982 in -[AVQueue itemWasAdded:atIndex:]
12: 0x5354801 in -[AVQueue insertItem:atIndex:error:]
13: 0x53549d8 in -[AVQueue appendItem:error:]
14: 0x535c3be in -[AVController addNextFeederItemToQueue]
15: 0x535b06f in -[AVController checkQueueSpace]
16: 0x5359f46 in -[AVController setQueue:]
17: 0x535ac62 in -[AVController setQueueFeeder:withIndex:]
18: 0x30eee20 in -[MPAVController reloadFeederWithStartIndex:]
19: 0x30deed7 in -[MPMoviePlayerControllerNew _prepareToPlayWithStartIndex:]
20: 0x30dc686 in -[MPMoviePlayerControllerNew prepareToPlay]
21: 0x27f586 in _nsnote_callback
22: 0x328d165 in _CFXNotificationPostNotification
23: 0x2762ca in -[NSNotificationCenter postNotificationName:object:userInfo:]
24: 0x281238 in -[NSNotificationCenter postNotificationName:object:]
25: 0x31596d1 in -[MPMovie _determineMediaType]
26: 0x291b87 in __NSFireDelayedPerform
27: 0x32747dc in CFRunLoopRunSpecific
28: 0x32738a8 in CFRunLoopRunInMode
29: 0x3aaf89d in GSEventRunModal
30: 0x3aaf962 in GSEventRun
31: 0x52b372 in UIApplicationMain
32: 0x27be in main at /blablabla
33: 0x2735 in start
It sounds like you're calling release more times than you are calling retain.
Does the error message not contain a hex address at the end? If it does, follow these steps to hunt down the offending object:
Navigate to Project -> Edit Active Executable (or press Command-Option-X). Choose the Arguments tab and set the zombie-detection environment variables, NSZombieEnabled=YES and MallocStackLoggingNoCompact=1.
Run the program and repeat the steps needed to reproduce the error.
Copy the hex address at the end of the error. Then, in the debugger console type this command: (gdb) info malloc-history <paste-address-here>.
Examine the output to hunt down the offending object.
P.S. Don't forget to disable the environment variables when you're done.
Maybe you are calling prepareToPlay more than once for the same movie. I think this is the problem, and I guess (though I'm not sure) it exists on all iOS versions prior to 4.3. So just flag the movie: once prepareToPlay has been called, don't call it again for the same file.