DPDK Multi-process; Kill a primary process and restart as a secondary doesn't work - pmd

I'm already running up to 4 DPDK processes next to each other without any issues and I can also restart secondary processes successfully.
I read here, at the end of the symmetric multi-process section, that you can destroy the primary process and restart it as a secondary.
But when I'm trying to restart the primary process, I run into some problems.
For example:
I'm running 2 processes. Each of them streams data from its own dedicated port to queue 0 of the virtual function. The goal now is to restart the first process as a secondary.
After initializing the EAL, mbufs, and rings, I call rte_eal_remote_launch() in each process with its own dedicated lcore, which launches a function that does some packet processing.
Start primary:
$ sudo mp_dpdk_app -l 0-4 -n 2 --proc-type=primary -- -p 3 --num-procs=2 --proc-id=0
Output:
EAL init start.
EAL: Detected CPU lcores: 64
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available 1048576 kB hugepages reported
EAL: VFIO support initialized
EAL: Using IOMMU type 8 (No-IOMMU)
EAL: Probe PCI driver: net_ixgbe_vf (8086:10ed) device: 0000:19:10.0 (socket 0)
EAL: Probe PCI driver: net_ixgbe_vf (8086:10ed) device: 0000:19:10.1 (socket 0)
TELEMETRY: No legacy callbacks, legacy socket not created
EAL Process Type: PRIMARY
Start the secondary:
$ sudo mp_dpdk_app -l 0-4 -n 2 --proc-type=secondary -- -p 3 --num-procs=2 --proc-id=1
Output:
EAL init start.
EAL: Detected CPU lcores: 64
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_13330_2fd6664d78de
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
EAL: Using IOMMU type 8 (No-IOMMU)
EAL: Probe PCI driver: net_ixgbe_vf (8086:10ed) device: 0000:19:10.0 (socket 0)
eth_ixgbevf_dev_init(): No TX queues configured yet. Using default TX function.
EAL: Probe PCI driver: net_ixgbe_vf (8086:10ed) device: 0000:19:10.1 (socket 0)
eth_ixgbevf_dev_init(): No TX queues configured yet. Using default TX function.
EAL Process Type: SECONDARY
Kill primary and restart with:
$ sudo mp_dpdk_app -l 0-4 -n 2 --proc-type=secondary -- -p 3 --num-procs=2 --proc-id=0
But the init fails with the following output:
EAL init start.
EAL: Detected CPU lcores: 64
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_13473_2fda4aa02c52
EAL: failed to send to (/var/run/dpdk/rte/mp_socket) due to Connection refused
EAL: Fail to send request /var/run/dpdk/rte/mp_socket:bus_vdev_mp
vdev_scan(): Failed to request vdev from primary
EAL: Selected IOVA mode 'PA'
EAL: failed to send to (/var/run/dpdk/rte/mp_socket) due to Connection refused
EAL: Fail to send request /var/run/dpdk/rte/mp_socket:eal_vfio_mp_sync
EAL: Cannot request default VFIO container fd
EAL: VFIO support could not be initialized
EAL: Requested device 0000:19:10.0 cannot be used
EAL: Requested device 0000:19:10.1 cannot be used
EAL: Error - exiting with code: 1
Cause: :: no Ethernet ports found
I noticed that a new mp socket is created (mp_socket_13473_2fda4aa02c52).
But somehow the EAL then tries to connect to rte/mp_socket, which was created by the primary process at the beginning, instead of using the new one.
If I exit the primary process with rte_eal_cleanup(), /rte/mp_socket is removed and I still can't start a new secondary process, due to the error that /rte/mp_process does not exist.
My hardware setup:
Network devices using DPDK-compatible driver
============================================
0000:19:10.0 '82599 Ethernet Controller Virtual Function 10ed' drv=vfio-pci unused=ixgbevf
0000:19:10.1 '82599 Ethernet Controller Virtual Function 10ed' drv=vfio-pci unused=ixgbevf
The processes don't have to communicate in-between each other.
Can anybody give me a clue about this issue?
Any help will be appreciated.

The short answer to the question "In a DPDK multi-process scenario (a primary and one or more secondary processes), if the primary process is stopped or killed, can the restarted process communicate with the secondary processes?" is: no, it cannot.
Let me explain below.
Please refer to the DPDK documentation on multi-process support, which clarifies that the primary process is the one that initializes the hugepages and creates the MP_HANDLE used to communicate with the secondary processes. In contrast, one or more secondaries attach to the primary using MP_HANDLE.
The primary process takes ownership of the PCI devices, virtual devices, hugepages, and cores, and creates its runtime directory under the default path /var/run/dpdk/{file-prefix} (/var/run/dpdk/rte in the logs above).
The configuration saved in this path is what lets a secondary find the hugepage addresses and mmap them in shared mode.
So once a primary is up with all the required resources, a secondary DPDK process uses this pre-initialized environment to run a certain subset of the DPDK API.
When a primary process is stopped or killed, the resource mappings and configuration are not cleaned up by default, which allows the existing secondary processes to continue working. With the EAL argument --no-shconf, the folder /var/run/dpdk/{file-prefix} is cleaned up immediately (this mode also does not support secondary processes). As a result, starting or restarting a new primary will fail, because the PCI resources, hugepages, cores, and file path (/var/run/dpdk/{file-prefix}) are already in use.
Hence the expectation that a restarted primary can connect to the running secondaries does not hold. The rte_eal_init routine in standard DPDK releases contains internal checks for this.
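To tell apart the two situations from the question (socket file left behind by a killed primary vs. socket file removed after rte_eal_cleanup()), a small probe can help. This is only a sketch and not DPDK code: it assumes the primary's mp_socket is a Unix datagram socket on Linux, probe_primary_socket is a hypothetical helper name, and the demo runs against a throwaway socket in a temporary directory instead of the real /var/run/dpdk/rte/mp_socket.

```python
import os
import socket
import tempfile


def probe_primary_socket(path: str) -> str:
    """Classify a Unix datagram socket path as 'alive', 'stale' or 'missing'."""
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        probe.connect(path)
    except FileNotFoundError:
        # No socket file at all (e.g. it was removed on a clean shutdown).
        return "missing"
    except ConnectionRefusedError:
        # The file exists but no process is bound to it any more.
        return "stale"
    else:
        return "alive"
    finally:
        probe.close()


def demo() -> list:
    """Walk a stand-in socket through the three states and record each one."""
    states = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mp_socket")  # stand-in, not the DPDK path

        owner = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        owner.bind(path)
        states.append(probe_primary_socket(path))  # owner is running

        owner.close()  # simulates a killed owner: the file stays behind
        states.append(probe_primary_socket(path))

        os.unlink(path)  # simulates cleanup removing the file
        states.append(probe_primary_socket(path))
    return states


if __name__ == "__main__":
    print(demo())
```

On a real system you would call probe_primary_socket("/var/run/dpdk/rte/mp_socket") before launching a secondary. A "stale" result corresponds to the "Connection refused" lines in the failing log above.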

Postgresql WalReceiver process waits on connecting master regardless of "connect_timeout"

I am trying to deploy an automated, highly available PostgreSQL cluster on Kubernetes. When the master fails over or fails temporarily, the standby loses its streaming replication connection, and when it retries, it takes a long time until the attempt fails and it retries again.
I use PostgreSQL 10 with streaming replication (cluster-main-cluster-master-service is a service that always routes to the master, and all replicas connect to this service for replication). I've tried setting options like connect_timeout and keepalive in primary_conninfo in recovery.conf, and wal_receiver_timeout in the standby's postgresql.conf, but I could not make any progress with them.
First, when the master goes down, replication stops with the following error (state 1):
2019-10-06 14:14:54.042 +0330 [3039] LOG: replication terminated by primary server
2019-10-06 14:14:54.042 +0330 [3039] DETAIL: End of WAL reached on timeline 17 at 0/33000098.
2019-10-06 14:14:54.042 +0330 [3039] FATAL: could not send end-of-streaming message to primary: no COPY in progress
2019-10-06 14:14:55.534 +0330 [12] LOG: record with incorrect prev-link 0/2D000028 at 0/33000098
After investigating Postgres activity, I found that the WalReceiver process gets stuck in the LibPQWalReceiverConnect wait_event (state 2), but the timeout is much longer than what I configured (although I set connect_timeout to 10 seconds, it takes about 2 minutes). Then it fails with the following error (state 3):
2019-10-06 14:17:06.035 +0330 [3264] FATAL: could not connect to the primary server: could not connect to server: Connection timed out
Is the server running on host "cluster-main-cluster-master-service" (192.168.0.166) and accepting
TCP/IP connections on port 5432?
On the next try, it successfully connects to the primary (state 4):
2019-10-06 14:17:07.892 +0330 [5786] LOG: started streaming WAL from primary at 0/33000000 on timeline 17
I also tried killing the process when it gets stuck (state 2); when I do, the process starts again, connects, and then streams normally (jumps to state 4).
After checking netstat, I also found that the walreceiver process has a connection in the SYN_SENT state to the old master (in the failover case).
connect_timeout governs how long PostgreSQL will wait for the replication connection to succeed, but that does not include establishing the TCP connection.
To reduce the time that the kernel waits for a successful answer to a TCP SYN request, reduce the number of retries. In /etc/sysctl.conf, set:
net.ipv4.tcp_syn_retries = 3
and run sysctl -p.
That should reduce the time significantly.
Reducing the value too much might make your system less stable.
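To get a feeling for the numbers, here is a rough back-of-the-envelope sketch. It assumes the usual Linux behaviour of an initial SYN retransmission timeout of 1 second that doubles after every unanswered SYN, capped at 120 seconds, and a default tcp_syn_retries of 6; these values are assumptions about the kernel, not something taken from the question, and syn_connect_timeout is just an illustrative helper name.

```python
def syn_connect_timeout(syn_retries: int,
                        initial_rto: float = 1.0,
                        max_rto: float = 120.0) -> float:
    """Approximate seconds until connect() gives up if no SYN is answered.

    The kernel sends the initial SYN plus `syn_retries` retransmissions and
    waits one (doubling) timeout after each of them. The defaults are
    assumptions about a typical Linux kernel, not values read from the system.
    """
    return sum(min(initial_rto * 2 ** attempt, max_rto)
               for attempt in range(syn_retries + 1))


if __name__ == "__main__":
    for retries in (6, 3):
        print(f"tcp_syn_retries={retries}: ~{syn_connect_timeout(retries):.0f} s")
```

Under these assumptions, 6 retries add up to roughly two minutes, which is in line with the delay described in the question, while 3 retries add up to about a quarter of a minute.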

Getting (WARNING: requested port 0 not present - ignoring) while running dpdk

I have tried to run DPDK on my Red Hat 6.3 system.
Network devices using IGB_UIO driver
====================================
0000:43:00.0 '82599EB 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio unused=ixgbe
0000:43:00.1 '82599EB 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio unused=ixgbe
0000:46:00.0 'Device 155d' drv=igb_uio unused=
0000:46:00.1 'Device 155d' drv=igb_uio unused=
coreId : 5
EAL: Cannot read numa node link for lcore 5 -using physical package id instead
EAL: Detected lcore 5 as core 5 on socket 0
EAL: Setting up hugepage memory...
EAL: Ask a virtual area of 0x2097152 bytes
EAL: Virtual area found at 0x7f5483400000 (size = 0x200000)
EAL: Ask a virtual area of 0x4290772992 bytes
EAL: Virtual area found at 0x7f5383600000 (size = 0xffc00000)
EAL: Ask a virtual area of 0x2097152 bytes
EAL: Virtual area found at 0x7f5383200000 (size = 0x200000)
EAL: Ask a virtual area of 0x4294967296 bytes
EAL: Virtual area found at 0x7f5283000000 (size = 0x100000000)
EAL: Ask a virtual area of 0x2097152 bytes
EAL: Virtual area found at 0x7f5282c00000 (size = 0x200000)
EAL: Ask a virtual area of 0x4290772992 bytes
EAL: Virtual area found at 0x7f5182e00000 (size = 0xffc00000)
EAL: Ask a virtual area of 0x2097152 bytes
EAL: Virtual area found at 0x7f5182a00000 (size = 0x200000)
EAL: Ask a virtual area of 0x2097152 bytes
EAL: Virtual area found at 0x7f5182600000 (size = 0x200000)
EAL: Ask a virtual area of 0x4290772992 bytes
EAL: Virtual area found at 0x7f5082800000 (size = 0xffc00000)
EAL: Ask a virtual area of 0x2097152 bytes
EAL: Virtual area found at 0x7f5082400000 (size = 0x200000)
EAL: Requesting 256 pages of size 2MB from socket 0
EAL: TSC frequency is ~2700001 KHz
EAL: Master core 5 is ready (tid=864ff820)
EAL: PCI device 0000:43:00.1 on NUMA socket 1
EAL: probe driver: 8086:10fb rte_ixgbe_pmd
EAL: PCI memory mapped at 0x7fe900943000
EAL: PCI memory mapped at 0x7fe90093f000
EAL: PCI device 0000:46:00.0 on NUMA socket 1
EAL: probe driver: 8086:155d rte_ixgbe_pmd
EAL: Device is blacklisted, not initializing
EAL: PCI device 0000:46:00.1 on NUMA socket 1
EAL: probe driver: 8086:155d rte_ixgbe_pmd
EAL: Device is blacklisted, not initializing
WARNING: requested port 0 not present - ignoring
WARNING: requested port 1 not present - ignoring
0 [EAL options] -- -p PORTMASK -n NUM_CLIENTS [-s NUM_SOCKETS]
-p PORTMASK: hexadecimal bitmask of ports to use
-n NUM_CLIENTS: number of client processes to use
Process affinity succesffuly set to cpu 5.
TIME_InitCommon(): Done. TIME_MiliSecPerSysTick=1, TIME_SysTickPerSec=1000.
The ports are there, but I still can't run DPDK using 0000:43:00.0 ('82599EB').
What is the problem?
I am using the same configuration to run DPDK for 0000:46:00.0 and 0000:43:00.0, but the 82599EB NIC above does not run. Why?
At the same time, DPDK is able to run with 0000:46:00.0 ('Device 155d'); the logs are below:
Creating mbuf pool 'MProc_pktmbuf_pool' [46080 mbufs] ...
Port 0 init ... done:
Port 1 init ... done:
Checking link statusdone
Port 0 Link Up - speed 10000 Mbps - full-duplex
Port 1 Link Up - speed 10000 Mbps - full-duplex
imagefile -coreId 5 -socket 0 -c 20 -n 4 --proc-type=primary --socket-mem=512,0,0,0 -b 0:01:0.0 -b 0:01:0.1 -b 0:01:0.2 -b 0:01:0.3 -b 0:46:0.0 -b 0:46:0.1 -b 0:41:0.0 -b 0:41:0.1 -b 0:07:00.0 -b 0:07:00.1 -b 0:03:0.0 -b 0:03:0.1 -b 0:05:0.0 -b 0:05:0.1 -- -p 3 -n
There are a few problems with the command line:
-coreId and -socket are not valid DPDK command-line arguments. They are probably custom arguments of your app?
--proc-type is used for multi-process applications; usually we don't need this option.
--socket-mem=512,0,0,0 allocates 512 MB on NUMA node 0 and nothing on nodes 1-3. But please note that some of your NICs are in fact on NUMA node 1, so we need to allocate some memory there as well.
Instead of a bunch of -b (blacklist) options, we can use just one or two -w (whitelist) options.
Still, this does not explain why the 0000:43:00.0 device did not appear. So please try adding --log-level=debug to see what is happening to this device.
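As a side note on reading the masks in that command line: the usage text describes -p as a hexadecimal bitmask of ports, and the EAL -c option is a hexadecimal core mask in the same style. A minimal sketch for decoding such masks (bits_set is a made-up helper name, not part of DPDK):

```python
from typing import List


def bits_set(mask_hex: str) -> List[int]:
    """Return the indices of the bits that are set in a hexadecimal mask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    print("-p 3  selects ports:", bits_set("3"))
    print("-c 20 selects cores:", bits_set("20"))
```

So -p 3 asks for ports 0 and 1, which are exactly the two ports in the "requested port ... not present" warnings, and -c 20 selects core 5, matching the "Master core 5 is ready" line in the log.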

Hazelcast Multicasting error after stopping the nodes of the cluster

I have a cluster of two nodes, i.e. two OrientDB servers (Enterprise Edition 2.2.3) running on two separate machines. Both machines are VMs running Fedora 18. The OrientDB database consists of approximately 75000 edges and 5000 nodes.
When I try to stop either node, or both nodes one after the other, I get the following error:
Node1
2017-05-02 17:32:44:811 WARNI Received signal: SIGINT [OSignalHandler]
Exception in thread "Timer-1" com.hazelcast.core.HazelcastInstanceNotActiveException: Hazelcast instance is not active!
at com.hazelcast.spi.AbstractDistributedObject.throwNotActiveException(AbstractDistributedObject.java:85)
at com.hazelcast.spi.AbstractDistributedObject.lifecycleCheck(AbstractDistributedObject.java:80)
at com.hazelcast.spi.AbstractDistributedObject.getNodeEngine(AbstractDistributedObject.java:74)
at com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:309)
at com.hazelcast.map.impl.proxy.MapProxySupport.getInternal(MapProxySupport.java:250)
at com.hazelcast.map.impl.proxy.MapProxyImpl.get(MapProxyImpl.java:94)
at com.orientechnologies.orient.server.hazelcast.OHazelcastDistributedMap.get(OHazelcastDistributedMap.java:53)
at com.orientechnologies.agent.profiler.OEnterpriseProfiler$14.run(OEnterpriseProfiler.java:772)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid11478.hprof ...
Heap dump file created [744789648 bytes in 21.248 secs]
Node2
2017-05-02 17:32:41:108 INFO [192.168.6.153]:2434 [orientdb] [3.6.3] Running shutdown hook... Current state: ACTIVE [Node]
Exception in thread "Timer-1" com.hazelcast.core.HazelcastInstanceNotActiveException: Hazelcast instance is not active!
at com.hazelcast.spi.AbstractDistributedObject.throwNotActiveException(AbstractDistributedObject.java:85)
at com.hazelcast.spi.AbstractDistributedObject.lifecycleCheck(AbstractDistributedObject.java:80)
at com.hazelcast.spi.AbstractDistributedObject.getNodeEngine(AbstractDistributedObject.java:74)
at com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:309)
at com.hazelcast.map.impl.proxy.MapProxySupport.getInternal(MapProxySupport.java:250)
at com.hazelcast.map.impl.proxy.MapProxyImpl.get(MapProxyImpl.java:94)
at com.orientechnologies.orient.server.hazelcast.OHazelcastDistributedMap.get(OHazelcastDistributedMap.java:53)
at com.orientechnologies.agent.profiler.OEnterpriseProfiler$14.run(OEnterpriseProfiler.java:772)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
How can I solve the heap memory issue?
It seems your actual problem is the OutOfMemoryError. The exception from Hazelcast just means that the HazelcastInstance was stopped, most probably as a consequence of the OutOfMemoryError.

mesos masters keep restarting

I have 3 mesos masters with version 0.26.0 setup with a quorum of 2. When I start them, they keep restarting even before I turn up any frameworks or slaves.
Here are the errors I'm seeing:
F0322 19:36:56.009903 51459 master.cpp:1368] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
E0322 19:37:18.300568 41095 process.cpp:1911] Failed to shutdown socket with fd 26: Transport endpoint is not connected
There's no firewall running.
I start them with supervisord and the following command:
/usr/sbin/mesos-master --cluster=int --log_dir=/var/log/mesos/int --quorum=2 --port=5050 --work_dir=/tmp/mesos/work/int --zk=zk://intMesosMaster01:2181,intMesosMaster02:2181,intMesosMaster03:2181/mesos
Zookeeper is up and running fine with 3 nodes. It's in use for other projects and has no issues at all with them.

Google Compute Engine boot fails after hardware upgrade to Haswell

I have a f1-micro instance running on Google Compute Engine running CentOS.
Some time ago I got a mail stating that the zone I am using (us-central1-b) was going through a hardware upgrade to new Haswell servers, and that all instances needed to be restarted. So I did a restart: I shut down my services and performed a restart via the developer console.
But the instance never came up - the boot on the new hardware failed, and now I cannot log in to the instance with ssh. The serial console content is included below.
My main issue is that there is some content on the instance that I would really like to get my hands on (nope - I didn't do a full backup - shoot!).
How do I restore the instance, when I can not log in using ssh?
Changing serial settings was 0/0 now 3/0
Start bios (version 1.7.2-20150226_170051-google)
Unable to unlock ram - bridge not found
Ram Size=0x26600000 (0x0000000000000000 high)
Relocating low data from 0x000e5810 to 0x000ef780 (size 2161)
Relocating init from 0x000e6081 to 0x265d3540 (size 51612)
CPU Mhz=2301
=== PCI bus & bridge init ===
PCI: pci_bios_init_bus_rec bus = 0x0
=== PCI device probing ===
Found 4 PCI devices (max PCI bus is 00)
=== PCI new allocation pass #1 ===
PCI: check devices
=== PCI new allocation pass #2 ===
PCI: map device bdf=00:03.0 bar 0, addr 0000c000, size 00000040 [io]
PCI: map device bdf=00:04.0 bar 0, addr 0000c040, size 00000040 [io]
PCI: map device bdf=00:03.0 bar 1, addr febfe000, size 00001000 [mem]
PCI: map device bdf=00:04.0 bar 1, addr febff000, size 00001000 [mem]
PCI: init bdf=00:01.0 id=8086:7110
PIIX3/PIIX4 init: elcr=00 0c
PCI: init bdf=00:01.3 id=8086:7113
Using pmtimer, ioport 0xb008, freq 3579 kHz
PCI: init bdf=00:03.0 id=1af4:1004
PCI: init bdf=00:04.0 id=1af4:1000
Found 1 cpu(s) max supported 1 cpu(s)
MP table addr=0x000fdaf0 MPC table addr=0x000fdb00 size=240
SMBIOS ptr=0x000fdad0 table=0x000fd990 size=314
Memory hotplug not enabled. [MHPE=0xffffffff]
ACPI DSDT=0x265fe070
ACPI tables: RSDP=0x000fd960 RSDT=0x265fe030
Scan for VGA option rom
Machine UUID 2b33f2bc-6042-1f12-c61f-0edd33ff965d
Found 0 serial ports
found virtio-scsi at 0:3
Searching bootorder for: /pci@i0cf8/*@3/*@0/*@0,0
Searching bootorder for: /pci@i0cf8/*@3/*@0/*@1,0
virtio-scsi vendor='Google' product='PersistentDisk' rev='1' type=0 removable=0
virtio-scsi blksize=512 sectors=20971520
Searching bootorder for: /pci@i0cf8/*@3/*@0/*@2,0
Searching bootorder for: /pci@i0cf8/*@3/*@0/*@3,0
Searching bootorder for: /pci@i0cf8/*@3/*@0/*@4,0
....
....
....
Searching bootorder for: /pci@i0cf8/*@3/*@0/*@248,0
Searching bootorder for: /pci@i0cf8/*@3/*@0/*@249,0
Searching bootorder for: /pci@i0cf8/*@3/*@0/*@250,0
Searching bootorder for: /pci@i0cf8/*@3/*@0/*@251,0
Searching bootorder for: /pci@i0cf8/*@3/*@0/*@252,0
Searching bootorder for: /pci@i0cf8/*@3/*@0/*@253,0
KBD: int09 handler: AL=0
PS2 keyboard initialized
All threads complete.
Scan for option roms
Searching bootorder for: HALT
drive 0x000fd920: PCHS=0/0/0 translation=lba LCHS=1024/255/63 s=20971520
Space available for UMB: 000c0000-000eb800
Returned 122880 bytes of ZoneHigh
e820 map has 6 items:
0: 0000000000000000 - 000000000009fc00 = 1 RAM
1: 000000000009fc00 - 00000000000a0000 = 2 RESERVED
2: 00000000000f0000 - 0000000000100000 = 2 RESERVED
3: 0000000000100000 - 00000000265fe000 = 1 RAM
4: 00000000265fe000 - 0000000026600000 = 2 RESERVED
5: 00000000fffbc000 - 0000000100000000 = 2 RESERVED
Unable to lock ram - bridge not found
KBD: int09 handler: AL=0
Changing serial settings was 3/2 now 3/0
enter handle_19:
NULL
Booting from Hard Disk 0...
Booting from 0000:7c00
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-431.17.1.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Wed May 7 23:32:49 UTC 2014
Command line: ro root=UUID=a8cf6ab7-92fb-42c6-b95f-d437f94aaf98 rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD KEYTABLE=us SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_NO_LVM console=ttyS0,38400n8 rd_NO_DM
KERNEL supported cpus:
Intel GenuineIntel
AMD AuthenticAMD
Centaur CentaurHauls
Disabled fast string operations
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000265fe000 (usable)
BIOS-e820: 00000000265fe000 - 0000000026600000 (reserved)
BIOS-e820: 00000000fffbc000 - 0000000100000000 (reserved)
DMI 2.4 present.
SMBIOS version 2.4 @ 0xFDAD0
Hypervisor detected: KVM
last_pfn = 0x265fe max_arch_pfn = 0x400000000
x86 PAT enabled: cpu 0, old 0x70106, new 0x7010600070106
Using GB pages for direct mapping
init_memory_mapping: 0000000000000000-00000000265fe000
RAMDISK: 25f82000 - 265ed02e
ACPI: RSDP 00000000000fd960 00014 (v00 Google)
ACPI: RSDT 00000000265fe030 00034 (v01 Google GOOGRSDT 00000001 GOOG 00000001)
ACPI: FACP 00000000265fff00 000F4 (v02 Google GOOGFACP 00000001 GOOG 00000001)
ACPI: DSDT 00000000265fe070 015B6 (v01 GOOG GODSDT 00000001 INTL 20140214)
ACPI: FACS 00000000265ffec0 00040
ACPI: SSDT 00000000265ff780 0073D (v01 Google GOOGSSDT 00000001 GOOG 00000001)
ACPI: APIC 00000000265ff660 0006E (v01 Google GOOGAPIC 00000001 GOOG 00000001)
ACPI: WAET 00000000265ff630 00028 (v01 Google GOOGWAET 00000001 GOOG 00000001)
Setting APIC routing to flat.
No NUMA configuration found
Faking a node at 0000000000000000-00000000265fe000
Bootmem setup node 0 0000000000000000-00000000265fe000
NODE_DATA [0000000000009000 - 000000000003cfff]
bootmap [000000000003d000 - 0000000000041cbf] pages 5
(7 early reservations) ==> bootmem [0000000000 - 00265fe000]
#0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
#1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
#2 [0001000000 - 0002020aa4] TEXT DATA BSS ==> [0001000000 - 0002020aa4]
#3 [0025f82000 - 00265ed02e] RAMDISK ==> [0025f82000 - 00265ed02e]
#4 [000009fc00 - 0000100000] BIOS reserved ==> [000009fc00 - 0000100000]
#5 [0002021000 - 00020210a5] BRK ==> [0002021000 - 00020210a5]
#6 [0000008000 - 0000009000] PGTABLE ==> [0000008000 - 0000009000]
found SMP MP-table at [ffff8800000fdaf0] fdaf0
crashkernel=auto resulted in zero bytes of reserved memory.
kvm-clock: Using msrs 4b564d01 and 4b564d00
kvm-clock: cpu 0, msr 0:1c247c1, boot clock
Zone PFN ranges:
DMA 0x00000001 -> 0x00001000
DMA32 0x00001000 -> 0x00100000
Normal 0x00100000 -> 0x00100000
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
0: 0x00000001 -> 0x0000009f
0: 0x00000100 -> 0x000265fe
ACPI: PM-Timer IO Port: 0xb008
Setting APIC routing to flat.
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
Using ACPI (MADT) for SMP configuration information
SMP: Allowing 1 CPUs, 0 hotplug CPUs
PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
Allocating PCI resources starting at 26600000 (gap: 26600000:d99bc000)
Booting paravirtualized kernel on KVM
NR_CPUS:4096 nr_cpumask_bits:1 nr_cpu_ids:1 nr_node_ids:1
PERCPU: Embedded 31 pages/cpu @ffff880002200000 s94872 r8192 d23912 u2097152
pcpu-alloc: s94872 r8192 d23912 u2097152 alloc=1*2097152
pcpu-alloc: [0] 0
kvm-clock: cpu 0, msr 0:22167c1, primary cpu clock
Built 1 zonelists in Node order, mobility grouping on. Total pages: 154835
Policy zone: DMA32
Kernel command line: ro root=UUID=a8cf6ab7-92fb-42c6-b95f-d437f94aaf98 rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD KEYTABLE=us SYSFONT=latarcyrheb-sun16 rd_NO_LVM console=ttyS0,38400n8 rd_NO_DM
PID hash table entries: 4096 (order: 3, 32768 bytes)
xsave/xrstor: enabled xstate_bv 0x7, cntxt size 0x340
Checking aperture...
No AGP bridge found
Memory: 594376k/628728k available (5326k kernel code, 392k absent, 33960k reserved, 7013k data, 1280k init)
Hierarchical RCU implementation.
NR_IRQS:33024 nr_irqs:256
Console: colour *CGA 80x25
console [ttyS0] enabled
allocated 2621440 bytes of page_cgroup
please try 'cgroup_disable=memory' option if you don't want memory cgroups
Detected 2299.982 MHz processor.
Calibrating delay loop (skipped) preset value.. 4599.96 BogoMIPS (lpj=2299982)
pid_max: default: 32768 minimum: 301
Security Framework initialized
SELinux: Initializing.
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
Mount-cache hash table entries: 256
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
Initializing cgroup subsys net_cls
Initializing cgroup subsys blkio
Initializing cgroup subsys perf_event
Initializing cgroup subsys net_prio
Disabled fast string operations
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
mce: CPU supports 32 MCE banks
alternatives: switching to unfair spinlock
SMP alternatives: switching to UP code
Freeing SMP alternatives: 36k freed
ACPI: Core revision 20090903
ftrace: converting mcount calls to 0f 1f 44 00 00
ftrace: allocating 21778 entries in 86 pages
Enabling x2apic
Enabled x2apic
APIC routing finalized to physical x2apic.
..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1
CPU0: Intel(R) Xeon(R) CPU @ 2.30GHz stepping 00
Performance Events: unsupported p6 CPU model 63 no PMU driver, software events only.
NMI watchdog disabled (cpu0): hardware events not enabled
Brought up 1 CPUs
Total of 1 processors activated (4599.96 BogoMIPS).
devtmpfs: initialized
regulator: core version 0.5
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1 for base access
bio: create slab <bio-0> at 0
ACPI: Interpreter enabled
ACPI: (supports S0 S3 S4 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: No dock devices found.
PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7]
pci_root PNP0A03:00: host bridge window [io 0x0d00-0xffff]
pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
pci_root PNP0A03:00: host bridge window [mem 0x26600000-0xfebfffff]
PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7]
pci_bus 0000:00: root bus resource [io 0x0d00-0xffff]
pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
pci_bus 0000:00: root bus resource [mem 0x26600000-0xfebfffff]
pci 0000:00:01.3: quirk: [io 0xb000-0xb03f] claimed by PIIX4 ACPI
ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKS] (IRQs *9)
vgaarb: loaded
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
NetLabel: Initializing
NetLabel: domain hash size = 128
NetLabel: protocols = UNLABELED CIPSOv4
NetLabel: unlabeled traffic allowed by default
Switching to clocksource kvm-clock
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 8 devices
ACPI: ACPI bus type pnp unregistered
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 6, 262144 bytes)
TCP established hash table entries: 131072 (order: 9, 2097152 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
NET: Registered protocol family 1
Trying to unpack rootfs image as initramfs...
Freeing initrd memory: 6572k freed
audit: initializing netlink socket (disabled)
type=2000 audit(1430203988.308:1): initialized
HugeTLB registered 2 MB page size, pre-allocated 0 pages
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
msgmni has been set to 1173
alg: No test for stdrng (krng)
ksign: Installing public key data
Loading keyring
- Added public key A760F53BCC8FA210
- User ID: CentOS (Kernel Module GPG key)
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
pciehp: PCI Express Hot Plug Controller Driver version: 0.4
acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
ipmi message handler version 39.2
IPMI System Interface driver.
ipmi_si: Adding default-specified kcs state machine
ipmi_si: Trying default-specified kcs state machine at i/o address 0xca2, slave address 0x0, irq 0
ipmi_si: Interface detection failed
ipmi_si: Adding default-specified smic state machine
ipmi_si: Trying default-specified smic state machine at i/o address 0xca9, slave address 0x0, irq 0
ipmi_si: Interface detection failed
ipmi_si: Adding default-specified bt state machine
ipmi_si: Trying default-specified bt state machine at i/o address 0xe4, slave address 0x0, irq 0
ipmi_si: Interface detection failed
ipmi_si: Unable to find any System Interface(s)
input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
ACPI: Power Button [PWRF]
input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input1
ACPI: Sleep Button [SLPF]
[Firmware Bug]: No valid trip found
GHES: HEST is not enabled!
Non-volatile memory driver v1.3
Linux agpgart interface v0.103
crash memory driver: version 1.1
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
serial8250: ttyS2 at I/O 0x3e8 (irq = 6) is a 16550A
Refined TSC clocksource calibration: 2300.003 MHz.
serial8250: ttyS3 at I/O 0x2e8 (irq = 7) is a 16550A
00:04: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:05: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:06: ttyS2 at I/O 0x3e8 (irq = 6) is a 16550A
00:07: ttyS3 at I/O 0x2e8 (irq = 7) is a 16550A
brd: module loaded
loop: module loaded
input: Macintosh mouse button emulation as /devices/virtual/input/input2
Fixed MDIO Bus: probed
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
uhci_hcd: USB Universal Host Controller Interface driver
PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
i8042.c: Warning: Keylock active.
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
rtc_cmos 00:01: RTC can wake from S4
rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
rtc0: alarms up to one day, 114 bytes nvram
cpuidle: using governor ladder
cpuidle: using governor menu
EFI Variables Facility v0.08 2004-May-17
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
usbhid: v2.6:USB HID core driver
GRE over IPv4 demultiplexor driver
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 17
registered taskstats version 1
rtc_cmos 00:01: setting system clock to 2015-04-28 06:53:10 UTC (1430203990)
Initalizing network drop monitor service
Freeing unused kernel memory: 1280k freed
Write protecting the kernel read-only data: 10240k
Freeing unused kernel memory: 800k freed
Freeing unused kernel memory: 1584k freed
dracut: dracut-004-336.el6_5.2
dracut: rd_NO_LUKS: removing cryptoluks activation
dracut: rd_NO_LVM: removing LVM activation
udev: starting version 147
dracut: Starting plymouth daemon
dracut: rd_NO_MD: removing MD RAID activation
input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input3
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
input: ImPS/2 Generic Wheel Mouse as /devices/platform/i8042/serio1/input/input4
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
dracut Warning: No root device "block:/dev/disk/by-uuid/a8cf6ab7-92fb-42c6-b95f-d437f94aaf98" found
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/2.6.32-431.17.1.el6.x86_64/modules.dep: No such file or directory
dracut Warning: Boot has failed. To debug this issue add "rdshell" to the kernel command line.
dracut Warning: Signal caught!
dracut Warning: Boot has failed. To debug this issue add "rdshell" to the kernel command line.
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Not tainted 2.6.32-431.17.1.el6.x86_64 #1
Call Trace:
[<ffffffff815274ef>] ? panic+0xa7/0x16f
[<ffffffff81077292>] ? do_exit+0x862/0x870
[<ffffffff8118a525>] ? fput+0x25/0x30
[<ffffffff810772f8>] ? do_group_exit+0x58/0xd0
[<ffffffff81077387>] ? sys_exit_group+0x17/0x20
[<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
I suggest first following the standard troubleshooting steps to validate the disk's filesystem, as documented.
Then you can snapshot the disk and create a new instance with its boot disk from that snapshot, to rule out a disk UUID conflict.
Finally, you can attach the disk to a brand-new instance to recover the files, as I suggested in my previous comment.
If you cannot validate the filesystem as per the troubleshooting guide, the data is most probably lost and nothing more can be done to recover it.
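Once the disk is attached to a working instance, a UUID mismatch or conflict can be checked roughly as follows. This is only a minimal sketch: it assumes you have saved the serial console output and the output of `blkid` as text, and the helper names `expected_root_uuid` and `devices_with_uuid` are hypothetical, introduced here for illustration.

```python
import re

# Matches the dracut warning that names the root device by UUID,
# as seen in the console log above.
ROOT_UUID_RE = re.compile(
    r'No root device "block:/dev/disk/by-uuid/([0-9a-fA-F-]+)" found'
)

# Matches one line of typical `blkid` output, e.g.
#   /dev/sda1: UUID="..." TYPE="ext4"
# The \b keeps PARTUUID="..." from being mistaken for UUID="...".
BLKID_RE = re.compile(r'^(\S+):.*?\bUUID="([^"]+)"', re.MULTILINE)


def expected_root_uuid(console_log):
    """Return the root filesystem UUID that dracut was looking for, or None."""
    match = ROOT_UUID_RE.search(console_log)
    return match.group(1).lower() if match else None


def devices_with_uuid(blkid_output, uuid):
    """Return every device in the blkid output whose filesystem UUID matches."""
    wanted = uuid.lower()
    return [
        device
        for device, found in BLKID_RE.findall(blkid_output)
        if found.lower() == wanted
    ]
```

An empty result means no attached filesystem carries the UUID the boot process expected; more than one device in the result points to a UUID conflict.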
Please note that it is good practice to set up periodic snapshots, GCS differential backups, or another replication method.
Thank you.
Sincerely,
P.