Ionic 3 app stuck with splash screen loading - ionic-framework

I have a problem running an Ionic 3 app on a real device. The application is already in production and sometimes (not every time, but quite often) it gets stuck on the splash screen. Below are the device's logs (iOS) from when the error appeared.
default 11:49:57.382435 -0400 OPSEU Initialized 2 displays
default 11:49:57.388655 -0400 OPSEU Faulting in NSHTTPCookieStorage singleton
default 11:49:57.388672 -0400 OPSEU Faulting in CFHTTPCookieStorage singleton
default 11:49:57.388688 -0400 OPSEU Creating default cookie storage with default identifier
default 11:49:57.389365 -0400 OPSEU Retrieving resting unlock: 0
default 11:49:57.409105 -0400 OPSEU Apache Cordova native platform version 4.5.3 is starting.
default 11:49:57.409212 -0400 OPSEU Multi-tasking -> Device: YES, App: YES
default 11:49:57.454831 -0400 OPSEU 0x1063d0000 - DocumentLoader::startLoadingMainResource: Returning empty document (frame = 0x105534240, main = 1)
default 11:49:57.454853 -0400 OPSEU Memory usage info dump at MainFrameLoadCompleted:
default 11:49:57.454873 -0400 OPSEU virtual_size: 4929028096
default 11:49:57.454893 -0400 OPSEU compressed: 0
default 11:49:57.454936 -0400 OPSEU javascript_gc_heap_extra_memory_size: 0
default 11:49:57.454957 -0400 OPSEU phys_footprint: 9798096
default 11:49:57.454976 -0400 OPSEU internal: 9584640
default 11:49:57.454995 -0400 OPSEU document_count: 1
default 11:49:57.455037 -0400 OPSEU pagecache_page_count: 0
default 11:49:57.455056 -0400 OPSEU javascript_gc_heap_capacity: 65536
default 11:49:57.455075 -0400 OPSEU resident_size: 20725760
default 11:49:57.503503 -0400 OPSEU Using UIWebView
default 11:49:57.503719 -0400 OPSEU [CDVTimer][console] 0.049949ms
default 11:49:57.503743 -0400 OPSEU [CDVTimer][handleopenurl] 0.074983ms
default 11:49:57.504011 -0400 OPSEU Unlimited access to network resources
default 11:49:57.504032 -0400 OPSEU [CDVTimer][intentandnavigationfilter] 1.449943ms
default 11:49:57.504051 -0400 OPSEU [CDVTimer][gesturehandler] 0.051022ms
default 11:49:57.504101 -0400 OPSEU [CDVTimer][base64togallery] 0.028014ms
default 11:49:57.504121 -0400 OPSEU [CDVTimer][camerapreview] 0.113964ms
default 11:49:57.504203 -0400 OPSEU [CDVTimer][file] 1.989007ms
default 11:49:57.504222 -0400 OPSEU [CDVTimer][cordovagooglemaps] 0.867963ms
default 11:49:57.504241 -0400 OPSEU CDVIonicKeyboard: resize mode 1
default 11:49:57.504318 -0400 OPSEU CDVIonicKeyboard: WARNING!!: Keyboard plugin works better with WK
default 11:49:57.504338 -0400 OPSEU [CDVTimer][keyboard] 1.910925ms
default 11:49:57.504356 -0400 OPSEU [CDVTimer][photolibrary] 0.115991ms
default 11:49:57.504869 -0400 OPSEU [CDVTimer][splashscreen] 5.751967ms
default 11:49:57.505553 -0400 OPSEU [CDVTimer][statusbar] 8.061051ms
default 11:49:57.505572 -0400 OPSEU [CDVTimer][TotalPluginStartup] 21.359086ms
default 11:49:57.531240 -0400 OPSEU createNotificationChecker
default 11:49:57.531273 -0400 OPSEU not coldstart
default 11:49:57.585531 -0400 OPSEU HTHangEventCreate: HangTracing is disabled. Not creating a new event.
default 11:49:57.585756 -0400 OPSEU active
default 11:49:57.585779 -0400 OPSEU PushPlugin skip clear badge
default 11:49:57.585799 -0400 OPSEU Memory usage info dump at MainFrameLoadStarted:
default 11:49:57.585819 -0400 OPSEU virtual_size: 4995710976
default 11:49:57.585837 -0400 OPSEU compressed: 0
default 11:49:57.585855 -0400 OPSEU javascript_gc_heap_extra_memory_size: 0
default 11:49:57.585874 -0400 OPSEU phys_footprint: 12370504
default 11:49:57.585930 -0400 OPSEU internal: 42434560
default 11:49:57.585952 -0400 OPSEU document_count: 1
default 11:49:57.585970 -0400 OPSEU pagecache_page_count: 0
default 11:49:57.586014 -0400 OPSEU javascript_gc_heap_capacity: 65536
default 11:49:57.586060 -0400 OPSEU resident_size: 68108288
default 11:49:57.588003 -0400 OPSEU 0x102711620 - FrameLoader::prepareForLoadStart: Starting frame load (frame = 0x105534240, main = 1)
default 11:49:57.588024 -0400 OPSEU Resetting plugins due to page load.
default 11:49:57.588044 -0400 OPSEU 0x10637c000 - DocumentLoader::startLoadingMainResource: Starting load (frame = 0x105534240, main = 1)
default 11:49:57.612794 -0400 OPSEU Task <B75FA3B6-2759-40A1-84FC-1A6A7F58B6BF>.<0> {strength 0, tls 4, ct 0, sub 0, sig 1, ciphers 0, bundle 0, builtin 0}
default 11:49:57.613217 -0400 OPSEU TIC Enabling TLS [1:0x283820f00]
default 11:49:57.613257 -0400 OPSEU TIC TCP Conn Start [1:0x283820f00]
default 11:49:57.613541 -0400 OPSEU [C1 Hostname#3007e493:443 tcp, url: https://fonts.googleapis.com/css?family=Maven+Pro, tls] start
default 11:49:57.615833 -0400 OPSEU nw_connection_report_state_with_handler_locked [C1] reporting state preparing
default 11:49:57.615917 -0400 OPSEU Task <B75FA3B6-2759-40A1-84FC-1A6A7F58B6BF>.<0> setting up Connection 1
default 11:49:57.621346 -0400 OPSEU Task <54B08C46-42A9-4A9D-B86A-04A87294C393>.<0> {strength 0, tls 4, ct 0, sub 0, sig 1, ciphers 0, bundle 0, builtin 0}
default 11:49:57.621552 -0400 OPSEU TIC Enabling TLS [2:0x283828c00]
default 11:49:57.621588 -0400 OPSEU TIC TCP Conn Start [2:0x283828c00]
default 11:49:57.621750 -0400 OPSEU [C2 Hostname#59d22349:443 tcp, url: https://maps.googleapis.com/maps/api/js?v=3&key=AIzaSyBlWhkNqdzt7vKcTx-0alvhumVD-_mVF9U, tls] start
default 11:49:57.624564 -0400 OPSEU nw_connection_report_state_with_handler_locked [C2] reporting state preparing
default 11:49:57.624657 -0400 OPSEU Task <54B08C46-42A9-4A9D-B86A-04A87294C393>.<0> setting up Connection 2
default 11:49:57.629657 -0400 OPSEU nw_endpoint_flow_protocol_connected [C1.1 IPv4#c06d301c:443 in_progress channel-flow (satisfied)] Transport protocol connected
default 11:49:57.631032 -0400 OPSEU TIC TLS Event [1:0x283820f00]: 1, Pending(0)
default 11:49:57.631106 -0400 OPSEU TIC TLS Event [1:0x283820f00]: 2, Pending(0)
default 11:49:57.639754 -0400 OPSEU nw_endpoint_flow_protocol_connected [C2.1 IPv4#c06d301c:443 in_progress channel-flow (satisfied)] Transport protocol connected
default 11:49:57.641551 -0400 OPSEU TIC TLS Event [2:0x283828c00]: 1, Pending(0)
default 11:49:57.641666 -0400 OPSEU TIC TLS Event [2:0x283828c00]: 2, Pending(0)
default 11:49:57.668317 -0400 OPSEU TIC TLS Event [1:0x283820f00]: 2, Pending(0)
default 11:49:57.668500 -0400 OPSEU TIC TLS Event [1:0x283820f00]: 11, Pending(0)
default 11:49:57.668695 -0400 OPSEU TIC TLS Event [1:0x283820f00]: 14, Pending(0)
default 11:49:57.672138 -0400 OPSEU TIC TLS Trust Result [1:0x283820f00]: 0
default 11:49:57.678579 -0400 OPSEU TIC TLS Event [2:0x283828c00]: 2, Pending(0)
default 11:49:57.678768 -0400 OPSEU TIC TLS Event [2:0x283828c00]: 11, Pending(0)
default 11:49:57.678875 -0400 OPSEU TIC TLS Event [2:0x283828c00]: 14, Pending(0)
default 11:49:57.685699 -0400 OPSEU nw_endpoint_flow_protocol_connected [C1.1 IPv4#c06d301c:443 in_progress channel-flow (satisfied)] Output protocol connected
default 11:49:57.685976 -0400 OPSEU nw_connection_report_state_with_handler_locked [C1] reporting state ready
default 11:49:57.686357 -0400 OPSEU TIC TLS Event [1:0x283820f00]: 20, Pending(0)
default 11:49:57.686428 -0400 OPSEU TIC TCP Conn Connected [1:0x283820f00]: Err(16)
default 11:49:57.686699 -0400 OPSEU TIC TCP Conn Event [1:0x283820f00]: 1
default 11:49:57.686741 -0400 OPSEU TIC TCP Conn Event [1:0x283820f00]: 8
default 11:49:57.686782 -0400 OPSEU TIC TLS Handshake Complete [1:0x283820f00]
default 11:49:57.687171 -0400 OPSEU Task <B75FA3B6-2759-40A1-84FC-1A6A7F58B6BF>.<0> now using Connection 1
default 11:49:57.687355 -0400 OPSEU TIC TLS Trust Result [2:0x283828c00]: 0
default 11:49:57.687489 -0400 OPSEU nw_endpoint_flow_protocol_connected [C1.1 IPv4#c06d301c:443 ready channel-flow (satisfied)] Output protocol connected
default 11:49:57.687842 -0400 OPSEU Task <B75FA3B6-2759-40A1-84FC-1A6A7F58B6BF>.<0> sent request, body N
default 11:49:57.703001 -0400 OPSEU nw_endpoint_flow_protocol_connected [C2.1 IPv4#c06d301c:443 in_progress channel-flow (satisfied)] Output protocol connected
default 11:49:57.703221 -0400 OPSEU nw_connection_report_state_with_handler_locked [C2] reporting state ready
default 11:49:57.703516 -0400 OPSEU TIC TLS Event [2:0x283828c00]: 20, Pending(0)
default 11:49:57.703541 -0400 OPSEU TIC TCP Conn Connected [2:0x283828c00]: Err(16)
default 11:49:57.703677 -0400 OPSEU TIC TCP Conn Event [2:0x283828c00]: 1
default 11:49:57.703782 -0400 OPSEU TIC TCP Conn Event [2:0x283828c00]: 8
default 11:49:57.703804 -0400 OPSEU TIC TLS Handshake Complete [2:0x283828c00]
default 11:49:57.703896 -0400 OPSEU Task <54B08C46-42A9-4A9D-B86A-04A87294C393>.<0> now using Connection 2
default 11:49:57.704059 -0400 OPSEU nw_endpoint_flow_protocol_connected [C2.1 IPv4#c06d301c:443 ready channel-flow (satisfied)] Output protocol connected
default 11:49:57.704169 -0400 OPSEU Task <54B08C46-42A9-4A9D-B86A-04A87294C393>.<0> sent request, body N
default 11:49:57.725431 -0400 OPSEU Task <B75FA3B6-2759-40A1-84FC-1A6A7F58B6BF>.<0> received response, status 200 content C
default 11:49:57.725948 -0400 OPSEU Task <B75FA3B6-2759-40A1-84FC-1A6A7F58B6BF>.<0> response ended
default 11:49:57.726017 -0400 OPSEU Task <B75FA3B6-2759-40A1-84FC-1A6A7F58B6BF>.<0> done using Connection 1
default 11:49:57.760519 -0400 OPSEU Task <54B08C46-42A9-4A9D-B86A-04A87294C393>.<0> received response, status 200 content K
default 11:49:57.779260 -0400 OPSEU Task <54B08C46-42A9-4A9D-B86A-04A87294C393>.<0> response ended
default 11:49:57.779417 -0400 OPSEU Task <54B08C46-42A9-4A9D-B86A-04A87294C393>.<0> done using Connection 2
default 11:50:27.859420 -0400 OPSEU TIC TCP Conn Cancel [2:0x283828c00]
default 11:50:27.859911 -0400 OPSEU [C2 Hostname#59d22349:443 tcp, url: https://maps.googleapis.com/maps/api/js?v=3&key=AIzaSyBlWhkNqdzt7vKcTx-0alvhumVD-_mVF9U, tls] cancel
default 11:50:27.859942 -0400 OPSEU [C2 Hostname#59d22349:443 tcp, url: https://maps.googleapis.com/maps/api/js?v=3&key=AIzaSyBlWhkNqdzt7vKcTx-0alvhumVD-_mVF9U, tls] cancelled
[C2.1 <private> 192.168.15.253:61560<->IPv4#c06d301c:443]
Connected Path: satisfied (Path is satisfied), interface: en0, ipv4, dns
Duration: 30.238s, DNS #0.000s took 0.005s, TCP #0.007s took 0.011s, TLS took 0.063s
bytes in/out: 30897/677, packets in/out: 24/3, rtt: 0.017s, retransmitted packets: 0, out-of-order packets: 11
default 11:50:27.860002 -0400 OPSEU 0.000s [C2 <private> Hostname#59d22349:443 resolver] path:start
default 11:50:27.860026 -0400 OPSEU 0.000s [C2 <private> Hostname#59d22349:443 resolver] path:satisfied
default 11:50:27.860047 -0400 OPSEU 0.000s [C2 <private> Hostname#59d22349:443 resolver] resolver:start_dns
default 11:50:27.860068 -0400 OPSEU 0.005s [C2 <private> Hostname#59d22349:443 resolver] resolver:receive_dns
default 11:50:27.860117 -0400 OPSEU 0.005s [C2.1 <private> 192.168.15.253:61560<->IPv4#c06d301c:443 channel-flow] path:start
default 11:50:27.860159 -0400 OPSEU 0.006s [C2.1 <private> 192.168.15.253:61560<->IPv4#c06d301c:443 channel-flow] path:satisfied
default 11:50:27.860179 -0400 OPSEU 0.006s [C2.1 <private> 192.168.15.253:61560<->IPv4#c06d301c:443 channel-flow] flow:start_nexus
default 11:50:27.860222 -0400 OPSEU 0.006s [C2.1 <private> 192.168.15.253:61560<->IPv4#c06d301c:443 channel-flow] flow:receive_nexus
default 11:50:27.860396 -0400 OPSEU 0.007s [C2.1 <private> 192.168.15.253:61560<->IPv4#c06d301c:443 channel-flow] flow:start_connect
default 11:50:27.860417 -0400 OPSEU 0.018s [C2.1 <private> 192.168.15.253:61560<->IPv4#c06d301c:443 channel-flow] flow:finish_transport
default 11:50:27.860892 -0400 OPSEU 0.018s [C2 <private> Hostname#59d22349:443 resolver] flow:finish_transport
default 11:50:27.860924 -0400 OPSEU 0.081s [C2.1 <private> 192.168.15.253:61560<->IPv4#c06d301c:443 channel-flow] flow:finish_connect
default 11:50:27.860988 -0400 OPSEU 0.081s [C2 <private> Hostname#59d22349:443 resolver] flow:finish_connect
default 11:50:27.861105 -0400 OPSEU 0.081s [C2.1 <private> 192.168.15.253:61560<->IPv4#c06d301c:443 channel-flow] flow:changed_viability
default 11:50:27.861283 -0400 OPSEU 0.081s [C2 <private> Hostname#59d22349:443 resolver] flow:changed_viability
default 11:50:27.861304 -0400 OPSEU 30.238s [C2] path:cancel
default 11:50:27.861724 -0400 OPSEU nw_protocol_tcp_log_summary [C2.1:3]
[<private> <private>:61560<-><private>:443]
Init: 1, Conn_Time: 10.234ms, Syn's: 1, WR_T: 0/0, RD_T: 0/0, TFO: 0/0/0, ECN: 0/0/0, TS: 1
RTT_Cache: process, rtt_upd: 4, rtt: 17.625ms, rtt_var: 13.125ms rtt_nc: 14.437ms, rtt_var_nc: 12.500ms
default 11:50:27.861769 -0400 OPSEU nw_endpoint_flow_protocol_disconnected [C2.1 IPv4#c06d301c:443 cancelled channel-flow (null)] Output protocol disconnected
default 11:50:27.862051 -0400 OPSEU nw_connection_report_state_with_handler_locked [C2] reporting state cancelled
default 11:50:27.862070 -0400 OPSEU TIC TCP Conn Cancel [1:0x283820f00]
default 11:50:27.862108 -0400 OPSEU [C1 Hostname#3007e493:443 tcp, url: https://fonts.googleapis.com/css?family=Maven+Pro, tls] cancel
default 11:50:27.862137 -0400 OPSEU [C1 Hostname#3007e493:443 tcp, url: https://fonts.googleapis.com/css?family=Maven+Pro, tls] cancelled
[C1.1 <private> 192.168.15.253:61559<->IPv4#c06d301c:443]
Connected Path: satisfied (Path is satisfied), interface: en0, ipv4, dns
Duration: 30.247s, DNS #0.000s took 0.003s, TCP #0.005s took 0.011s, TLS took 0.056s
bytes in/out: 3623/655, packets in/out: 5/3, rtt: 0.015s, retransmitted packets: 0, out-of-order packets: 2
default 11:50:27.862158 -0400 OPSEU 0.000s [C1 <private> Hostname#3007e493:443 resolver] path:start
default 11:50:27.862179 -0400 OPSEU 0.000s [C1 <private> Hostname#3007e493:443 resolver] path:satisfied
default 11:50:27.862198 -0400 OPSEU 0.000s [C1 <private> Hostname#3007e493:443 resolver] resolver:start_dns
default 11:50:27.862218 -0400 OPSEU 0.003s [C1 <private> Hostname#3007e493:443 resolver] resolver:receive_dns
default 11:50:27.862243 -0400 OPSEU 0.004s [C1.1 <private> 192.168.15.253:61559<->IPv4#c06d301c:443 channel-flow] path:start
default 11:50:27.862263 -0400 OPSEU 0.004s [C1.1 <private> 192.168.15.253:61559<->IPv4#c06d301c:443 channel-flow] path:satisfied
default 11:50:27.862282 -0400 OPSEU 0.004s [C1.1 <private> 192.168.15.253:61559<->IPv4#c06d301c:443 channel-flow] flow:start_nexus
default 11:50:27.862325 -0400 OPSEU 0.004s [C1.1 <private> 192.168.15.253:61559<->IPv4#c06d301c:443 channel-flow] flow:receive_nexus
default 11:50:27.862346 -0400 OPSEU 0.005s [C1.1 <private> 192.168.15.253:61559<->IPv4#c06d301c:443 channel-flow] flow:start_connect
default 11:50:27.862365 -0400 OPSEU 0.016s [C1.1 <private> 192.168.15.253:61559<->IPv4#c06d301c:443 channel-flow] flow:finish_transport
default 11:50:27.862387 -0400 OPSEU 0.016s [C1 <private> Hostname#3007e493:443 resolver] flow:finish_transport
default 11:50:27.862407 -0400 OPSEU 0.072s [C1.1 <private> 192.168.15.253:61559<->IPv4#c06d301c:443 channel-flow] flow:finish_connect
default 11:50:27.862426 -0400 OPSEU 0.072s [C1 <private> Hostname#3007e493:443 resolver] flow:finish_connect
default 11:50:27.862446 -0400 OPSEU 0.072s [C1.1 <private> 192.168.15.253:61559<->IPv4#c06d301c:443 channel-flow] flow:changed_viability
default 11:50:27.862464 -0400 OPSEU 0.072s [C1 <private> Hostname#3007e493:443 resolver] flow:changed_viability
default 11:50:27.862489 -0400 OPSEU 30.247s [C1] path:cancel
default 11:50:27.862690 -0400 OPSEU nw_protocol_tcp_log_summary [C1.1:3]
[<private> <private>:61559<-><private>:443]
Init: 1, Conn_Time: 10.057ms, Syn's: 1, WR_T: 0/0, RD_T: 0/0, TFO: 0/0/0, ECN: 0/0/0, TS: 1
RTT_Cache: kernel, rtt_upd: 4, rtt: 15.750ms, rtt_var: 10.312ms rtt_nc: 13.156ms, rtt_var_nc: 8.875ms
default 11:50:27.862729 -0400 OPSEU nw_endpoint_flow_protocol_disconnected [C1.1 IPv4#c06d301c:443 cancelled channel-flow (null)] Output protocol disconnected
default 11:50:27.862767 -0400 OPSEU nw_connection_report_state_with_handler_locked [C1] reporting state cancelled
default 11:50:27.862860 -0400 OPSEU TIC TCP Conn Destroyed [2:0x283828c00]
default 11:50:27.862956 -0400 OPSEU TIC TCP Conn Destroyed [1:0x283820f00]
I'm new to the development world, so can you help me determine where the problem is?

What you have given isn't sufficient to diagnose this, but have you enabled prod mode in your main.ts? If not, change your main.ts to the following:
import { platformBrowserDynamic } from '@angular/platform-browser-dynamic';
import { AppModule } from './app.module';
import { enableProdMode } from '@angular/core';

enableProdMode();
platformBrowserDynamic().bootstrapModule(AppModule);

OK, the funny thing: when I added a catch block to the platform.ready() promise, the problem went away.
platform.ready().then(() => {
  // ...
  console.log('---READY---');
  splashScreen.hide();
})
.catch((err) => {
  console.log('ERROR: ', err);
});
I had hoped to see what caused the error, but my console is completely clean. I would appreciate it if someone could explain this phenomenon.
Thank you.

This usually happens when some code in your app throws an error on launch and breaks the app. It might be an incompatible plugin, or some other function that fails intermittently.
Run a debug build and check the console for any errors.
You could also post your ionic info output and check if all plugin requirements are met.
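For example, a minimal sketch of gathering that information (adjust to your Ionic CLI version):
# run a debug (non-production) build on the connected device and watch the console for launch errors
ionic cordova run ios --device
# print Ionic/Cordova versions and installed plugins to include with the question
ionic info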

Related

NotReady node with ContainerGCFailed warning

I see the following in the events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ContainerGCFailed 58s (x1775 over 30h) kubelet rpc error: code = ResourceExhausted desc = grpc: trying to send message larger than max (16797216 vs. 16777216)
and in Conditions:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Sat, 19 Nov 2022 17:17:30 -0600 Wed, 16 Nov 2022 22:28:31 -0600 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Sat, 19 Nov 2022 17:17:30 -0600 Wed, 16 Nov 2022 22:28:31 -0600 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Sat, 19 Nov 2022 17:17:30 -0600 Wed, 16 Nov 2022 22:28:31 -0600 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Sat, 19 Nov 2022 17:17:30 -0600 Fri, 18 Nov 2022 11:03:06 -0600 KubeletNotReady PLEG is not healthy: pleg was last seen active 30h17m27.791101751s ago; threshold is 3m0s
How do I interpret this information? What could be the reason?
The info is relatively obvious.
How do I interpret this information?
Kubernetes uses garbage collection to clean up cluster resources. The kubelet performs garbage collection on unused images every five minutes and on unused containers every minute. The reason "ContainerGCFailed" means that this garbage-collection process is failing.
What could be the reason?
The limit the kubelet sets for gRPC messages is 16 MB. When you have a LOT of (possibly dead) containers, the size of the gRPC message exceeds that limit, and the kubelet receives the rpc error.
Possible solution:
Remove those old dead containers and add --maximum-dead-containers=1000 to the kubelet to solve the issue.
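As an illustration only (how you clean up and where kubelet flags live depends on your container runtime and distribution), on a Docker-based node this could look like:
# on the affected node: remove exited/dead containers so the container list the kubelet sends shrinks
docker ps -aq --filter status=exited | xargs -r docker rm
# then append --maximum-dead-containers=1000 to the kubelet's startup arguments
# (e.g. via a systemd drop-in or KUBELET_EXTRA_ARGS, depending on your setup) and restart the kubelet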

Why clustering on k8s through redis-ha doesn't work?

I'm trying to create a Redis cluster along with Node.js (ioredis/cluster), but that doesn't seem to work.
It's v1.11.8-gke.6 on GKE.
I'm doing exactly what the redis-ha docs say:
~  helm install --set replicas=3 --name redis-test stable/redis-ha
NAME: redis-test
LAST DEPLOYED: Fri Apr 26 00:13:31 2019
NAMESPACE: yt
STATUS: DEPLOYED
RESOURCES:
==> v1/ConfigMap
NAME DATA AGE
redis-test-redis-ha-configmap 3 0s
redis-test-redis-ha-probes 2 0s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
redis-test-redis-ha-server-0 0/2 Init:0/1 0 0s
==> v1/Role
NAME AGE
redis-test-redis-ha 0s
==> v1/RoleBinding
NAME AGE
redis-test-redis-ha 0s
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
redis-test-redis-ha ClusterIP None <none> 6379/TCP,26379/TCP 0s
redis-test-redis-ha-announce-0 ClusterIP 10.7.244.34 <none> 6379/TCP,26379/TCP 0s
redis-test-redis-ha-announce-1 ClusterIP 10.7.251.35 <none> 6379/TCP,26379/TCP 0s
redis-test-redis-ha-announce-2 ClusterIP 10.7.252.94 <none> 6379/TCP,26379/TCP 0s
==> v1/ServiceAccount
NAME SECRETS AGE
redis-test-redis-ha 1 0s
==> v1/StatefulSet
NAME READY AGE
redis-test-redis-ha-server 0/3 0s
NOTES:
Redis can be accessed via port 6379 and Sentinel can be accessed via port 26379 on the following DNS name from within your cluster:
redis-test-redis-ha.yt.svc.cluster.local
To connect to your Redis server:
1. Run a Redis pod that you can use as a client:
kubectl exec -it redis-test-redis-ha-server-0 sh -n yt
2. Connect using the Redis CLI:
redis-cli -h redis-test-redis-ha.yt.svc.cluster.local
~  k get pods | grep redis-test
redis-test-redis-ha-server-0 2/2 Running 0 1m
redis-test-redis-ha-server-1 2/2 Running 0 1m
redis-test-redis-ha-server-2 2/2 Running 0 54s
~  kubectl exec -it redis-test-redis-ha-server-0 sh -n yt
Defaulting container name to redis.
Use 'kubectl describe pod/redis-test-redis-ha-server-0 -n yt' to see all of the containers in this pod.
/data $ redis-cli -h redis-test-redis-ha.yt.svc.cluster.local
redis-test-redis-ha.yt.svc.cluster.local:6379> set test key
(error) READONLY You can't write against a read only replica.
But in the end, only one random pod I connect to is writable. I checked the logs on a few containers and everything seems to be fine there. I tried to run cluster info in redis-cli, but I get ERR This instance has cluster support disabled everywhere.
Logs:
~  k logs pod/redis-test-redis-ha-server-0 redis
1:C 25 Apr 2019 20:13:43.604 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 25 Apr 2019 20:13:43.604 # Redis version=5.0.3, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 25 Apr 2019 20:13:43.604 # Configuration loaded
1:M 25 Apr 2019 20:13:43.606 * Running mode=standalone, port=6379.
1:M 25 Apr 2019 20:13:43.606 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 25 Apr 2019 20:13:43.606 # Server initialized
1:M 25 Apr 2019 20:13:43.606 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 25 Apr 2019 20:13:43.627 * DB loaded from disk: 0.021 seconds
1:M 25 Apr 2019 20:13:43.627 * Ready to accept connections
1:M 25 Apr 2019 20:14:11.801 * Replica 10.7.251.35:6379 asks for synchronization
1:M 25 Apr 2019 20:14:11.801 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'c2827ffe011d774db005a44165bac67a7e7f7d85', my replication IDs are '8311a1ca896e97d5487c07f2adfd7d4ef924f36b' and '0000000000000000000000000000000000000000')
1:M 25 Apr 2019 20:14:11.802 * Delay next BGSAVE for diskless SYNC
1:M 25 Apr 2019 20:14:17.825 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 25 Apr 2019 20:14:17.825 * Background RDB transfer started by pid 55
55:C 25 Apr 2019 20:14:17.826 * RDB: 0 MB of memory used by copy-on-write
1:M 25 Apr 2019 20:14:17.926 * Background RDB transfer terminated with success
1:M 25 Apr 2019 20:14:17.926 # Slave 10.7.251.35:6379 correctly received the streamed RDB file.
1:M 25 Apr 2019 20:14:17.926 * Streamed RDB transfer with replica 10.7.251.35:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
1:M 25 Apr 2019 20:14:18.828 * Synchronization with replica 10.7.251.35:6379 succeeded
1:M 25 Apr 2019 20:14:42.711 * Replica 10.7.252.94:6379 asks for synchronization
1:M 25 Apr 2019 20:14:42.711 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'c2827ffe011d774db005a44165bac67a7e7f7d85', my replication IDs are 'af453adde824b2280ba66adb40cc765bf390e237' and '0000000000000000000000000000000000000000')
1:M 25 Apr 2019 20:14:42.711 * Delay next BGSAVE for diskless SYNC
1:M 25 Apr 2019 20:14:48.976 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 25 Apr 2019 20:14:48.977 * Background RDB transfer started by pid 125
125:C 25 Apr 2019 20:14:48.978 * RDB: 0 MB of memory used by copy-on-write
1:M 25 Apr 2019 20:14:49.077 * Background RDB transfer terminated with success
1:M 25 Apr 2019 20:14:49.077 # Slave 10.7.252.94:6379 correctly received the streamed RDB file.
1:M 25 Apr 2019 20:14:49.077 * Streamed RDB transfer with replica 10.7.252.94:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
1:M 25 Apr 2019 20:14:49.761 * Synchronization with replica 10.7.252.94:6379 succeeded
~  k logs pod/redis-test-redis-ha-server-1 redis
1:C 25 Apr 2019 20:14:11.780 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 25 Apr 2019 20:14:11.781 # Redis version=5.0.3, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 25 Apr 2019 20:14:11.781 # Configuration loaded
1:S 25 Apr 2019 20:14:11.786 * Running mode=standalone, port=6379.
1:S 25 Apr 2019 20:14:11.791 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:S 25 Apr 2019 20:14:11.791 # Server initialized
1:S 25 Apr 2019 20:14:11.791 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:S 25 Apr 2019 20:14:11.792 * DB loaded from disk: 0.001 seconds
1:S 25 Apr 2019 20:14:11.792 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 25 Apr 2019 20:14:11.792 * Ready to accept connections
1:S 25 Apr 2019 20:14:11.792 * Connecting to MASTER 10.7.244.34:6379
1:S 25 Apr 2019 20:14:11.792 * MASTER <-> REPLICA sync started
1:S 25 Apr 2019 20:14:11.792 * Non blocking connect for SYNC fired the event.
1:S 25 Apr 2019 20:14:11.793 * Master replied to PING, replication can continue...
1:S 25 Apr 2019 20:14:11.799 * Trying a partial resynchronization (request c2827ffe011d774db005a44165bac67a7e7f7d85:6006176).
1:S 25 Apr 2019 20:14:17.824 * Full resync from master: af453adde824b2280ba66adb40cc765bf390e237:722
1:S 25 Apr 2019 20:14:17.824 * Discarding previously cached master state.
1:S 25 Apr 2019 20:14:17.852 * MASTER <-> REPLICA sync: receiving streamed RDB from master
1:S 25 Apr 2019 20:14:17.853 * MASTER <-> REPLICA sync: Flushing old data
1:S 25 Apr 2019 20:14:17.853 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 25 Apr 2019 20:14:17.853 * MASTER <-> REPLICA sync: Finished with success
What am I missing or is there a better way to do clustering?
Not the best solution, but I figured I can just use Sentinel instead of finding another way (or maybe there is no other way). It has support in most languages, so it shouldn't be very hard (except for redis-cli; I can't figure out how to query the Sentinel server).
This is how I got it done with ioredis (Node.js; sorry if you're not familiar with ES6 syntax):
import * as IORedis from 'ioredis';
import Redis from 'ioredis';
import { redisHost, redisPassword, redisPort } from './config';

export function getRedisConfig(): IORedis.RedisOptions {
  // I'm not sure how to set this properly.
  // ioredis/cluster automatically resolves all pods by hostname, but not this,
  // so I have to explicitly specify all pods (or resolve them all by hostname).
  return {
    sentinels: process.env.REDIS_CLUSTER.split(',').map(d => {
      const [host, port = 26379] = d.split(':');
      return { host, port: Number(port) };
    }),
    name: process.env.REDIS_MASTER_NAME || 'mymaster',
    ...(redisPassword ? { password: redisPassword } : {}),
  };
}

export async function initializeRedis() {
  if (process.env.REDIS_CLUSTER) {
    const cluster = new Redis(getRedisConfig());
    return cluster;
  }

  // For the dev environment
  const client = new Redis(redisPort, redisHost);
  if (redisPassword) {
    await client.auth(redisPassword);
  }
  return client;
}
In the env:
env:
  - name: REDIS_CLUSTER
    value: redis-redis-ha-server-1.redis-redis-ha.yt.svc.cluster.local:26379,redis-redis-ha-server-0.redis-redis-ha.yt.svc.cluster.local:23679,redis-redis-ha-server-2.redis-redis-ha.yt.svc.cluster.local:23679
You may want to protect it with a password.
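To answer the redis-cli part: you can query Sentinel directly; a hedged example using the service name from the question (Sentinel listens on 26379 in this chart):
# ask Sentinel which instance is currently the master
redis-cli -h redis-test-redis-ha.yt.svc.cluster.local -p 26379 sentinel get-master-addr-by-name mymaster
# it returns an IP and port; connect there for writes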

How to use K8S node_problem_detector?

Question
node-problem-detector is mentioned in the Monitor Node Health documentation of K8S. How do we use it if we're not on GCE? Does it feed information to the Dashboard or provide API metrics?
"This tool aims to make various node problems visible to the upstream layers in cluster management stack. It is a daemon which runs on each node, detects node problems and reports them to apiserver."
Err, OK, but... what does that actually mean? How can I tell if it went to the API server?
What does the before and after look like? Knowing that would help me understand what it's doing.
Before installing Node Problem Detector I see:
Bash# kubectl describe node ip-10-40-22-166.ec2.internal | grep -i condition -A 20 | grep Ready -B 20
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Thu, 20 Jun 2019 12:30:05 -0400 Thu, 20 Jun 2019 12:30:05 -0400 WeaveIsUp Weave pod has set this
OutOfDisk False Thu, 20 Jun 2019 18:27:39 -0400 Thu, 20 Jun 2019 12:29:44 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Thu, 20 Jun 2019 18:27:39 -0400 Thu, 20 Jun 2019 12:29:44 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Thu, 20 Jun 2019 18:27:39 -0400 Thu, 20 Jun 2019 12:29:44 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Thu, 20 Jun 2019 18:27:39 -0400 Thu, 20 Jun 2019 12:29:44 -0400 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Thu, 20 Jun 2019 18:27:39 -0400 Thu, 20 Jun 2019 12:30:14 -0400 KubeletReady kubelet is posting ready status
After installing Node Problem Detector I see:
Bash# helm upgrade --install npd stable/node-problem-detector -f node-problem-detector.values.yaml
Bash# kubectl rollout status daemonset npd-node-problem-detector #(wait for up)
Bash# kubectl describe node ip-10-40-22-166.ec2.internal | grep -i condition -A 20 | grep Ready -B 20
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
DockerDaemon False Thu, 20 Jun 2019 22:06:17 -0400 Thu, 20 Jun 2019 22:04:14 -0400 DockerDaemonHealthy Docker daemon is healthy
EBSHealth False Thu, 20 Jun 2019 22:06:17 -0400 Thu, 20 Jun 2019 22:04:14 -0400 NoVolumeErrors Volumes are attaching successfully
KernelDeadlock False Thu, 20 Jun 2019 22:06:17 -0400 Thu, 20 Jun 2019 22:04:14 -0400 KernelHasNoDeadlock kernel has no deadlock
ReadonlyFilesystem False Thu, 20 Jun 2019 22:06:17 -0400 Thu, 20 Jun 2019 22:04:14 -0400 FilesystemIsNotReadOnly Filesystem is not read-only
NetworkUnavailable False Thu, 20 Jun 2019 12:30:05 -0400 Thu, 20 Jun 2019 12:30:05 -0400 WeaveIsUp Weave pod has set this
OutOfDisk False Thu, 20 Jun 2019 22:07:10 -0400 Thu, 20 Jun 2019 12:29:44 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Thu, 20 Jun 2019 22:07:10 -0400 Thu, 20 Jun 2019 12:29:44 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Thu, 20 Jun 2019 22:07:10 -0400 Thu, 20 Jun 2019 12:29:44 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Thu, 20 Jun 2019 22:07:10 -0400 Thu, 20 Jun 2019 12:29:44 -0400 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Thu, 20 Jun 2019 22:07:10 -0400 Thu, 20 Jun 2019 12:30:14 -0400 KubeletReady kubelet is posting ready status
Note: I asked for help coming up with a way to see this for all nodes, and Kenna Ofoegbu came up with this super useful and readable gem:
zsh# nodes=$(kubectl get nodes | sed '1d' | awk '{print $1}') && for node in $nodes; do; kubectl describe node $node | sed -n '/Conditions/,/Ready/p' ; done
Bash# (same command, gives errors)
Ok so now I know what Node Problem Detector does but... what good is adding a condition to the node, how do I use the condition to do something useful?
Question: How to use Kubernetes Node Problem Detector?
Use Case #1: Auto heal borked nodes
Step 1.) Install Node Problem Detector, so it can attach new condition metadata to nodes.
Step 2.) Leverage Planetlabs/draino to cordon and drain nodes with bad conditions.
Step 3.) Leverage https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler to auto heal. (When the node is cordoned and drained it'll be marked unschedulable; this will trigger a new node to be provisioned, and the bad node's resource utilization will then be so low that it gets deprovisioned.)
Source: https://github.com/kubernetes/node-problem-detector#remedy-systems
Use Case #2: Surface the unhealthy node event so that it can be detected by Kubernetes and then ingested into your monitoring stack, so you have an auditable historical record that the event occurred and when.
These unhealthy node events are logged somewhere on the host node, but usually the host node generates so much noisy/useless log data that these events aren't collected by default.
Node Problem Detector knows where to look for these events on the host node and filters out the noise; when it sees the signal of a negative outcome, it posts it to its pod log, which isn't noisy.
The pod log is likely getting ingested into an ELK and Prometheus Operator stack, where it can be detected, alerted on, stored, and graphed.
Also, note that nothing is stopping you from implementing both use cases.
Update: added a snippet of the node-problem-detector.helm-values.yaml file per a request in the comments:
log_monitors:
  # https://github.com/kubernetes/node-problem-detector/tree/master/config contains the full list; you can exec into the pod and ls /config/ to see these as well.
  - /config/abrt-adaptor.json # Adds ABRT Node Events (ABRT: automatic bug reporting tool); exceptions will show up under "kubectl describe node $NODENAME | grep Events -A 20"
  - /config/kernel-monitor.json # Adds 2 new Node Health Condition checks: "KernelDeadlock" and "ReadonlyFilesystem"
  - /config/docker-monitor.json # Adds new Node Health Condition check "DockerDaemon" (checks if Docker is unhealthy as a result of a corrupt image)
  # - /config/docker-monitor-filelog.json # Error: "/var/log/docker.log: no such file or directory"; it doesn't exist in the pod. I think you'd have to mount a node hostPath to get it to work; the gain doesn't sound worth the effort.
  # - /config/kernel-monitor-filelog.json # Should add to the existing "KernelDeadlock" check with more thorough detection, but silently fails in the NPD pod logs for me.
custom_plugin_monitors: #[]
  # Someone said all *-counter plugins are custom plugins; if you put them under log_monitors, you'll get Error: "Failed to unmarshal configuration file "/config/kernel-monitor-counter.json""
  - /config/kernel-monitor-counter.json # Adds new Node Health Condition check "FrequentUnregisteredNetDevice"
  - /config/docker-monitor-counter.json # Adds new Node Health Condition check "CorruptDockerOverlay2"
  - /config/systemd-monitor-counter.json # Adds 3 new Node Health Condition checks: "FrequentKubeletRestart", "FrequentDockerRestart", and "FrequentContainerdRestart"
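Once these conditions are being posted, you can read them straight off the node object; a hedged sketch (substitute your node name for $NODENAME):
# show just the KernelDeadlock condition's status for one node
kubectl get node $NODENAME -o jsonpath='{.status.conditions[?(@.type=="KernelDeadlock")].status}'
# or check the NPD events attached to the node
kubectl describe node $NODENAME | grep Events -A 20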
Considering node-problem-detector is a Kubernetes addon, you would need to install that addon on your own Kubernetes cluster.
A Kubernetes cluster has an addon-manager that will use it.
Do you mean: how to install it?
kubectl create -f https://github.com/kubernetes/node-problem-detector.yaml

How do I find out what image is running in a Kubernetes VM on GCE?

I've created a Kubernetes cluster in Google Compute Engine using cluster/kube-up.sh. How can I find out what Linux image GCE used to create the virtual machines? I've logged into some nodes using SSH and the usual commands (uname -a etc) don't tell me.
The default config file at kubernetes/cluster/gce/config-default.sh doesn't seem to offer any clues.
It uses something called the Google Container-VM Image. Check out the blog post announcing it here:
https://cloudplatform.googleblog.com/2016/09/introducing-Google-Container-VM-Image.html
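If you just want to see what OS image the kubelet reports for each node, a quick cross-check (assuming a reasonably recent kubectl) is:
# the OS-IMAGE column lists the image each node is running
kubectl get nodes -o wide
# or, for a single node
kubectl describe node <node-name> | grep "OS Image"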
There are two simple ways to look at it:
1. In the Kubernetes GUI-based dashboard, click on the nodes.
2. From the command line of the Kubernetes master node, use kubectl describe pods/{pod-name}
(Make sure to select the correct namespace, if you are using any.)
Here is a sample output; please look at the "Image" label in the output:
kubectl describe pods/fedoraapache
Name: fedoraapache
Namespace: default
Image(s): fedora/apache
Node: 127.0.0.1/127.0.0.1
Labels: name=fedoraapache
Status: Running
Reason:
Message:
IP: 172.17.0.2
Replication Controllers: <none>
Containers:
fedoraapache:
Image: fedora/apache
State: Running
Started: Thu, 06 Aug 2015 03:38:37 -0400
Ready: True
Restart Count: 0
Conditions:
Type Status
Ready True
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
Thu, 06 Aug 2015 03:38:35 -0400 Thu, 06 Aug 2015 03:38:35 -0400 1 {scheduler } scheduled Successfully assigned fedoraapache to 127.0.0.1
Thu, 06 Aug 2015 03:38:35 -0400 Thu, 06 Aug 2015 03:38:35 -0400 1 {kubelet 127.0.0.1} implicitly required container POD pulled Pod container image "gcr.io/google_containers/pause:0.8.0" already present on machine
Thu, 06 Aug 2015 03:38:36 -0400 Thu, 06 Aug 2015 03:38:36 -0400 1 {kubelet 127.0.0.1} implicitly required container POD created Created with docker id 98aeb13c657b
Thu, 06 Aug 2015 03:38:36 -0400 Thu, 06 Aug 2015 03:38:36 -0400 1 {kubelet 127.0.0.1} implicitly required container POD started Started with docker id 98aeb13c657b
Thu, 06 Aug 2015 03:38:37 -0400 Thu, 06 Aug 2015 03:38:37 -0400 1 {kubelet 127.0.0.1} spec.containers{fedoraapache} created Created with docker id debe7fe1ff4f
Thu, 06 Aug 2015 03:38:37 -0400 Thu, 06 Aug 2015 03:38:37 -0400 1 {kubelet 127.0.0.1} spec.containers{fedoraapache} started Started with docker id debe7fe1ff4f

Kubernetes pod on Google Container Engine continually restarts, is never ready

I'm trying to get a Ghost blog deployed on GKE, working off of the "persistent disks with WordPress" tutorial. I have a working container that runs fine manually on a GKE node:
docker run -d --name my-ghost-blog -p 2368:2368 -d us.gcr.io/my_project_id/my-ghost-blog
I can also correctly create a pod using the following method from another tutorial:
kubectl run ghost --image=us.gcr.io/my_project_id/my-ghost-blog --port=2368
When I do that I can curl the blog on the internal IP from within the cluster, and get the following output from kubectl get pod:
Name: ghosty-nqgt0
Namespace: default
Image(s): us.gcr.io/my_project_id/my-ghost-blog
Node: very-long-node-name/10.240.51.18
Labels: run=ghost
Status: Running
Reason:
Message:
IP: 10.216.0.9
Replication Controllers: ghost (1/1 replicas created)
Containers:
ghosty:
Image: us.gcr.io/my_project_id/my-ghost-blog
Limits:
cpu: 100m
State: Running
Started: Fri, 04 Sep 2015 12:18:44 -0400
Ready: True
Restart Count: 0
Conditions:
Type Status
Ready True
Events:
...
The problem arises when I instead try to create the pod from a yaml file, per the WordPress tutorial. Here's the yaml:
metadata:
  name: ghost
  labels:
    name: ghost
spec:
  containers:
    - image: us.gcr.io/my_project_id/my-ghost-blog
      name: ghost
      env:
        - name: NODE_ENV
          value: production
        - name: VIRTUAL_HOST
          value: myghostblog.com
      ports:
        - containerPort: 2368
When I run kubectl create -f ghost.yaml, the pod is created, but is never ready:
> kubectl get pod ghost
NAME READY STATUS RESTARTS AGE
ghost 0/1 Running 11 3m
The pod continuously restarts, as confirmed by the output of kubectl describe pod ghost:
Name: ghost
Namespace: default
Image(s): us.gcr.io/my_project_id/my-ghost-blog
Node: very-long-node-name/10.240.51.18
Labels: name=ghost
Status: Running
Reason:
Message:
IP: 10.216.0.12
Replication Controllers: <none>
Containers:
ghost:
Image: us.gcr.io/my_project_id/my-ghost-blog
Limits:
cpu: 100m
State: Running
Started: Fri, 04 Sep 2015 14:08:20 -0400
Ready: False
Restart Count: 10
Conditions:
Type Status
Ready False
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
Fri, 04 Sep 2015 14:03:20 -0400 Fri, 04 Sep 2015 14:03:20 -0400 1 {scheduler } scheduled Successfully assigned ghost to very-long-node-name
Fri, 04 Sep 2015 14:03:27 -0400 Fri, 04 Sep 2015 14:03:27 -0400 1 {kubelet very-long-node-name} implicitly required container POD created Created with docker id dbbc27b4d280
Fri, 04 Sep 2015 14:03:27 -0400 Fri, 04 Sep 2015 14:03:27 -0400 1 {kubelet very-long-node-name} implicitly required container POD started Started with docker id dbbc27b4d280
Fri, 04 Sep 2015 14:03:27 -0400 Fri, 04 Sep 2015 14:03:27 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} created Created with docker id ceb14ba72929
Fri, 04 Sep 2015 14:03:27 -0400 Fri, 04 Sep 2015 14:03:27 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} started Started with docker id ceb14ba72929
Fri, 04 Sep 2015 14:03:27 -0400 Fri, 04 Sep 2015 14:03:27 -0400 1 {kubelet very-long-node-name} implicitly required container POD pulled Pod container image "gcr.io/google_containers/pause:0.8.0" already present on machine
Fri, 04 Sep 2015 14:03:30 -0400 Fri, 04 Sep 2015 14:03:30 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} started Started with docker id 0b8957fe9b61
Fri, 04 Sep 2015 14:03:30 -0400 Fri, 04 Sep 2015 14:03:30 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} created Created with docker id 0b8957fe9b61
Fri, 04 Sep 2015 14:03:40 -0400 Fri, 04 Sep 2015 14:03:40 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} created Created with docker id edaf0df38c01
Fri, 04 Sep 2015 14:03:40 -0400 Fri, 04 Sep 2015 14:03:40 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} started Started with docker id edaf0df38c01
Fri, 04 Sep 2015 14:03:50 -0400 Fri, 04 Sep 2015 14:03:50 -0400 1 {kubelet very-long-node-name} spec.containers{ghost} started Started with docker id d33f5e5a9637
...
This cycle of created/started goes on forever if I don't kill the pod. The only difference from the successful pod is the lack of a replication controller. I don't expect this to be the problem because the tutorial mentions nothing about an RC.
Why is this happening? How can I create a successful pod from a config file? And where would I find more verbose logs about what is going on?
If the same Docker image is working via kubectl run but not working in a pod, then something is wrong with the pod spec. Compare the full output of the pod as created from the spec and as created by the RC to see what differs, by running kubectl get pods <name> -o yaml for both. Shot in the dark: is it possible the env vars specified in the pod spec are causing it to crash on startup?
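For example, a small sketch using the pod names from the question:
# dump both pod definitions and compare them
kubectl get pod ghost -o yaml > ghost-from-spec.yaml
kubectl get pod ghosty-nqgt0 -o yaml > ghost-from-run.yaml
diff ghost-from-spec.yaml ghost-from-run.yaml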
Maybe you could use a different restartPolicy in the yaml file?
What you have, I believe, is equivalent to
restartPolicy: Never
and no replication controller. You may try to add this line to the yaml and set it to Always (and this will provide you with an RC), or to OnFailure.
https://github.com/kubernetes/kubernetes/blob/master/docs/user-guide/pod-states.md#restartpolicy
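For reference, restartPolicy sits at the spec level of the pod; a minimal sketch based on the yaml from the question:
spec:
  restartPolicy: Always # or OnFailure / Never
  containers:
    - image: us.gcr.io/my_project_id/my-ghost-blog
      name: ghost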
Container logs may be useful, with kubectl logs
Usage:
kubectl logs [-p] POD [-c CONTAINER]
http://kubernetes.io/v1.0/docs/user-guide/kubectl/kubectl_logs.html
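For the pod above, something like this (assuming the pod is still named ghost) shows the current and the previously crashed container's output:
# logs from the current container
kubectl logs ghost
# logs from the previous (crashed) container instance
kubectl logs -p ghost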