What do these WinDbg error messages mean?

I'm trying to run !heap -s in WinDbg to get heap information. When I attempt it, I get the following output:
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-----------------------------------------------------------------------------
00000000005d0000 08000002 512 28 512 10 3 1 0 0
Error: Heap 0000000000000000 has an invalid signature eeffeeff
Front-end heap type info is not available
Front-end heap type info is not available
Virtual block: 0000000000000000 - 0000000000000000 (size 0000000000000000)
HEAP 0000000000000000 (Seg 0000000000000000) At 0000000000000000 Error: Unable to read virtual block
0000000000000000 00000000 0 0 0 0 0 0 1 0
-----------------------------------------------------------------------------
I can't find any reference explaining what the unusual error/"not available" lines mean.
Can someone give me a summary of why I'm not getting the expected list of heaps?
The only thing I execute prior to !heap -s is !wow64exts.sw, because the process dumps are of a 32-bit process but were created by the 64-bit Task Manager.

After testing with the 32-bit and 64-bit Task Managers, it appears that process dumps of 32-bit processes created by the 64-bit Task Manager can only be debugged successfully in some areas, using !wow64exts.sw in WinDbg to switch to 32-bit debugging.
That extension allows call stacks to be reviewed correctly, but !heap -s does not appear to work under it; instead you end up with the errors in the question (a possible workaround is sketched after the transcripts below).
For example, here is the output from a process dump of the 32-bit process created with the 32-bit Task Manager:
0:000> !heap -s
NtGlobalFlag enables following debugging aids for new heaps:
stack back traces
LFH Key : 0x06b058a2
Termination on corruption : DISABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-----------------------------------------------------------------------------
031b0000 08000002 1024 236 1024 2 13 1 0 0 LFH
001d0000 08001002 1088 188 1088 18 9 2 0 0 LFH
01e30000 08001002 1088 160 1088 4 3 2 0 0 LFH
03930000 08001002 256 4 256 2 1 1 0 0
038a0000 08001002 64 16 64 13 1 1 0 0
-----------------------------------------------------------------------------
The output from a process dump of the 32-bit process created with the 64-bit Task Manager, without !wow64exts.sw:
0:000> !heap -s
NtGlobalFlag enables following debugging aids for new heaps:
stack back traces
LFH Key : 0x000000b406b058a2
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-------------------------------------------------------------------------------------
0000000001f70000 08000002 512 28 512 10 3 1 0 0
0000000000020000 08008000 64 4 64 1 1 1 0 0
-------------------------------------------------------------------------------------
The output from a process dump of the 32-bit process created with the 64-bit Task Manager, with !wow64exts.sw:
0:000> !wow64exts.sw
Switched to 32bit mode
0:000:x86> !heap -s
NtGlobalFlag enables following debugging aids for new heaps:
stack back traces
LFH Key : 0x000000b406b058a2
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-----------------------------------------------------------------------------
0000000001f70000 08000002 512 28 512 10 3 1 0 0
Error: Heap 0000000000000000 has an invalid signature eeffeeff
Front-end heap type info is not available
Front-end heap type info is not available
Virtual block: 0000000000000000 - 0000000000000000 (size 0000000000000000)
HEAP 0000000000000000 (Seg 0000000000000000) At 0000000000000000 Error: Unable to read virtual block
0000000000000000 00000000 0 0 0 0 0 0 1 0
-----------------------------------------------------------------------------
Those were all taken from the same process.
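If full !heap output is needed, one workaround is to capture a native 32-bit dump in the first place: either use the 32-bit Task Manager (on 64-bit Windows it normally lives at C:\Windows\SysWOW64\Taskmgr.exe), or use a dump tool that matches the target's bitness. Sysinternals ProcDump, for example, writes a 32-bit dump of a 32-bit process by default; the PID and output path below are placeholders:
procdump -ma <pid> c:\dumps\myapp32.dmp
Opening such a dump in WinDbg, !heap -s should enumerate the heaps without needing !wow64exts.sw.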

Related

Does 'cat /proc/partitions' in Cygwin directly correspond to specific wmic entries?

Here's the output from Cygwin:
> cat /proc/partitions
8 0 500107608 sda
8 1 266240 sda1
8 2 16384 sda2
8 3 472585216 sda3 C:\
8 4 26214400 sda4 D:\
8 5 1024000 sda5
Here's the output from wmic in PowerShell:
> wmic diskdrive get Name,Model,SerialNumber,Size,Status
Model Name SerialNumber Size Status
NVMe SAMSUNG MZVLW512 \\.\PHYSICALDRIVE0 0025_38BB_1410_1481. 512105932800 OK
Is 'sda' in /proc/partitions a 1:1 match for '\\.\PHYSICALDRIVE0'?
Follow-up: here I only have one disk drive. If I had multiple drives attached, would there be an easy command to tell which wmic entry corresponds to which /proc/partitions entry?
I expect the ordering to be maintained. On my system:
sda is PhysicalDrive0
sdb is PhysicalDrive1
and the sizes (bytes in wmic vs. 1 KiB blocks in /proc/partitions) almost match; see the size comparison sketched after the listings below.
wmic diskdrive get Name,Model,SerialNumber,Size,Status
Model Name SerialNumber Size Status
ST1000LM035-1RK172 \\.\PHYSICALDRIVE0 WL10S143 1000202273280 OK
SAMSUNG MZNLN256HAJQ-000H1 \\.\PHYSICALDRIVE1 S3T6NE0JC13444 256052966400 OK
$ cat /proc/partitions
major minor #blocks name win-mounts
8 0 976762584 sda
8 1 960658432 sda1 D:\
8 2 16102400 sda2 E:\
8 16 250059096 sdb
8 17 266240 sdb1
8 18 16384 sdb2
8 19 248765440 sdb3 C:\
8 20 1003520 sdb4
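If several drives are attached, one way to confirm the pairing (a rough sketch, assuming the #blocks column is in 1 KiB units, as on Linux) is to convert the whole-disk block counts to bytes and compare them with the wmic Size column:
$ awk '$4 ~ /^sd[a-z]+$/ { printf "%s %d\n", $4, $3 * 1024 }' /proc/partitions
which, for the listing above, works out to:
sda 1000204886016
sdb 256060514304
Those figures are within a few MiB of the wmic sizes above (1000202273280 and 256052966400), which together with the ordering makes the sda = PHYSICALDRIVE0, sdb = PHYSICALDRIVE1 pairing clear.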

CentOS not using available memory

I have CentOS installed on a server with 64 GB of memory, and it seems as if memory usage is being suppressed.
I came to this conclusion by running an insert statement that inserts 10 million rows into a Postgres table, in both a TimescaleDB and a standard Postgres instance hosted in Docker.
I monitored the insert process in three different ways:
Docker stats timescaledb:
CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
timescaledb 73.14% 10.42 MiB / 62.75 GiB 0.02% 8.46 kB / 8.39 kB 0 B / 15.1 GB 12
top gives the following:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16298 avahi 20 0 16.2g 762356 759908 R 41.5 1.2 0:22.72 postgres
16127 avahi 20 0 16.2g 693080 691968 S 4.3 1.1 0:01.29 postgres
16129 avahi 20 0 16.2g 17748 16712 S 2.3 0.0 0:00.87 postgres
1578 root 30 10 1232780 86976 11568 S 0.7 0.1 0:46.34 osqueryd
17014 root 20 0 162264 2480 1596 R 0.7 0.0 0:00.03 top
928 root 20 0 90608 3212 2352 S 0.3 0.0 0:03.47 rngd
16128 avahi 20 0 16.2g 132064 131016 S 0.3 0.2 0:00.18 postgres
free -h gives the following
total used free shared buff/cache available
Mem: 62G 1.0G 58G 1.1G 3.1G 56G
Swap: 62G 0B 62G
I know that TimescaleDB is an extension of Postgres which comes with its own memory configuration, but the TimescaleDB Docker container configures it automatically for you (for instance, effective_cache_size is set to 48 GB as opposed to the 4 GB default that Postgres ships with). I also ran a similar process with Apache Spark with 16 GB assigned to the worker, and it ran into an OOM error. Additionally, I did a similar test on a different, smaller VM and the memory usage increased as expected. All of this leads me to believe that it's a CentOS config setting that I am missing somewhere, and nothing to do with TimescaleDB/Postgres.
I have added vm.overcommit_memory = 2 and vm.overcommit_ratio = 95 to /etc/sysctl.conf and ran sysctl -p to apply the settings, but this didn't make a difference. The relevant settings are now:
kernel.shmall = 8224280
kernel.shmmax = 33686650880
kernel.shmmni = 4096
vm.overcommit_memory = 2
vm.overcommit_ratio = 95
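For reference, with vm.overcommit_memory = 2 the kernel caps allocations at CommitLimit = SwapTotal + overcommit_ratio% of MemTotal. Plugging in the values from the /proc/meminfo output below:
65535996 kB + 0.95 * 65794240 kB = 65535996 + 62504528 = 128040524 kB
which matches the CommitLimit line there, while Committed_AS (18709300 kB) is nowhere near it. So these overcommit settings only limit how much address space may be reserved; they do not force more of the 64 GB to be used.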
Below is the output from cat /proc/meminfo
MemTotal: 65794240 kB
MemFree: 61098656 kB
MemAvailable: 59252660 kB
Buffers: 2120 kB
Cached: 3467144 kB
SwapCached: 0 kB
Active: 2817620 kB
Inactive: 884816 kB
Active(anon): 1109220 kB
Inactive(anon): 234708 kB
Active(file): 1708400 kB
Inactive(file): 650108 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 65535996 kB
SwapFree: 65535996 kB
Dirty: 88 kB
Writeback: 0 kB
AnonPages: 233188 kB
Mapped: 1175120 kB
Shmem: 1110756 kB
Slab: 204044 kB
SReclaimable: 142700 kB
SUnreclaim: 61344 kB
KernelStack: 7232 kB
PageTables: 14672 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 128040524 kB
Committed_AS: 18709300 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 408824 kB
VmallocChunk: 34325399548 kB
Percpu: 9216 kB
HardwareCorrupted: 0 kB
AnonHugePages: 96256 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 133604 kB
DirectMap2M: 66965504 kB
Is there something I can try to increase my memory usage? Is there a config setting that I am missing somewhere?
Thanks in advance for any help.
PostgreSQL also uses that "unused" memory, because it does buffered I/O. The "unused" memory is used by the kernel to cache files – in the case of a database server, these will be the database files. That way, I/O requests from PostgreSQL can often be served from the kernel page cache rather than causing actual disk I/O.
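One way to see this in action (a sketch, assuming a standard CentOS userland) is to watch the page cache grow while the 10-million-row insert runs; the memory is being used, it just shows up as buff/cache rather than as process RSS:
$ watch -n 5 "grep -E '^(MemFree|Cached|Dirty):' /proc/meminfo"
or simply re-run free -h during the load and compare the buff/cache column before and after.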

ddrescue: read non-tried blocks

I'm trying to rescue a 1 TB disk which has read errors. Because I didn't have a free 1 TB drive, I created a RAID 0 array out of two 500 GB drives.
I used the command line from Wikipedia for the first run:
sudo ddrescue -f -n /dev/sdk /dev/md/md_test /home/user/rescue.map
ddrescue completed this first run after approximately 20 hours and more than 7000 read errors.
Now I'm trying to do a second run,
sudo ddrescue -d -f -v -r3 /dev/sdk /dev/md/md_test /home/user/rescue.map
to read the non-tried blocks, but ddrescue gives me this:
GNU ddrescue 1.23
About to copy 1000 GBytes from '/dev/sdk' to '/dev/md/md_test'
Starting positions: infile = 0 B, outfile = 0 B
Copy block size: 128 sectors Initial skip size: 19584 sectors
Sector size: 512 Bytes
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 635060 MB, tried: 0 B, bad-sector: 0 B, bad areas: 0
Current status
ipos: 1000 GB, non-trimmed: 0 B, current rate: 0 B/s
opos: 1000 GB, non-scraped: 0 B, average rate: 0 B/s
non-tried: 365109 MB, bad-sector: 0 B, error rate: 0 B/s
rescued: 635060 MB, bad areas: 0, run time: 0s
pct rescued: 63.49%, read errors: 0, remaining time: n/a
time since last successful read: n/a
Copying non-tried blocks... Pass 1 (forwards)
ddrescue: Write error: Invalid argument
I can't figure out what this write error means; I've already searched the manual for answers.
Any help is appreciated, thanks!
After a while I found the cause of the write error: the capacity of the corrupt drive is 931.5G, but the total capacity of the RAID 0 array was only 931.3G.
I realized it while taking a closer look at the output of the lsblk command.
So I rebuilt the RAID 0 array with three 500 GB drives, and ddrescue now works as expected.
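A quick pre-flight check that avoids this kind of surprise (a sketch, using the device names from the question) is to compare the exact byte sizes of source and destination before the first pass, and only proceed if the destination is at least as large:
$ sudo blockdev --getsize64 /dev/sdk
$ sudo blockdev --getsize64 /dev/md/md_test
(or lsblk -b -o NAME,SIZE on both devices). These report sizes in bytes, so rounding in the human-readable 'G' figures can't hide a destination that is a few hundred MB too small, as happened here.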

Does mmap allocate memory in the heap?

I was reading about mmap on Wikipedia and tried out the example at http://en.wikipedia.org/wiki/Mmap#Example_of_usage. I compiled the program with gcc and ran valgrind over it.
Here is valgrind output:
# valgrind a.out
==7018== Memcheck, a memory error detector
==7018== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==7018== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==7018== Command: a.out
==7018==
PID 7018: anonymous string 1, zero-backed string 1
PID 7019: anonymous string 1, zero-backed string 1
PID 7018: anonymous string 2, zero-backed string 2
==7018==
==7018== HEAP SUMMARY:
==7018== in use at exit: 0 bytes in 0 blocks
==7018== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==7018==
==7018== All heap blocks were freed -- no leaks are possible
==7018==
==7018== For counts of detected and suppressed errors, rerun with: -v
==7018== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
PID 7019: anonymous string 2, zero-backed string 2
==7019==
==7019== HEAP SUMMARY:
==7019== in use at exit: 0 bytes in 0 blocks
==7019== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==7019==
==7019== All heap blocks were freed -- no leaks are possible
==7019==
==7019== For counts of detected and suppressed errors, rerun with: -v
==7019== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
My question is:
Does mmap allocate memory on the heap? If not, what does munmap do?
On a Unix-like system, your program's address space consists of one or more virtual memory regions, each of which is mapped by the OS to physical memory, to a file, or to nothing at all.
The heap is, generally speaking, one specific memory region created by the C runtime, and managed by malloc (which in turn uses the brk and sbrk system calls to grow and shrink).
mmap is a way of creating new memory regions, independently of malloc (and so independently of the heap). munmap is simply its inverse: it releases those regions.
mmapped memory is neither heap nor stack. It is mapped into the virtual address space of the calling process, but it is not allocated on the heap.
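A minimal sketch (assuming Linux/glibc; not the Wikipedia example itself) that makes the distinction visible: the buffer comes from an anonymous mmap region, and munmap hands that region back, while the malloc heap is never touched.
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 4096;
    /* A new anonymous mapping: its own region, not part of the malloc heap */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    strcpy(p, "stored in an mmapped region, not on the heap");
    puts(p);
    munmap(p, len);  /* the inverse of mmap: the region is unmapped again */
    return 0;
}
Run under valgrind, the mapping itself produces no heap activity; any small allocations that do show up come from stdio, not from mmap, which is why the HEAP SUMMARY above can stay at 0 allocs and 0 frees.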

Comprehensive methods of viewing memory usage on Solaris [closed]

On Linux, the "top" command shows a detailed but high level overview of your memory usage, showing:
Total Memory, Used Memory, Free Memory, Buffer Usage, Cache Usage, Swap size and Swap Usage.
My question is, what commands are available to show these memory usage figures in a clear and simple way? Bonus points if they're present in the "Core" install of Solaris. 'sar' doesn't count :)
Here are the basics. I'm not sure that any of these count as "clear and simple" though.
ps(1)
For process-level view:
$ ps -opid,vsz,rss,osz,args
PID VSZ RSS SZ COMMAND
1831 1776 1008 222 ps -opid,vsz,rss,osz,args
1782 3464 2504 433 -bash
$
vsz/VSZ: total virtual process size (kb)
rss/RSS: resident set size (kb, may be inaccurate(!), see man)
osz/SZ: total size in memory (pages)
To compute the size in bytes from pages (with $pid set to the process of interest):
$ sz_pages=$(ps -o osz -p $pid | grep -v SZ )
$ sz_bytes=$(( $sz_pages * $(pagesize) ))
$ sz_mbytes=$(( $sz_bytes / ( 1024 * 1024 ) ))
$ echo "$pid OSZ=$sz_mbytes MB"
vmstat(1M)
$ vmstat 5 5
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr rm s3 -- -- in sy cs us sy id
0 0 0 535832 219880 1 2 0 0 0 0 0 -0 0 0 0 402 19 97 0 1 99
0 0 0 514376 203648 1 4 0 0 0 0 0 0 0 0 0 402 19 96 0 1 99
^C
prstat(1M)
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
1852 martin 4840K 3600K cpu0 59 0 0:00:00 0.3% prstat/1
1780 martin 9384K 2920K sleep 59 0 0:00:00 0.0% sshd/1
...
swap(1)
"Long listing" and "summary" modes:
$ swap -l
swapfile dev swaplo blocks free
/dev/zvol/dsk/rpool/swap 256,1 16 1048560 1048560
$ swap -s
total: 42352k bytes allocated + 20192k reserved = 62544k used, 607672k available
$
top(1)
An older version (3.51) is available on the Solaris companion CD from Sun, with the disclaimer that this is "Community (not Sun) supported".
More recent binary packages available from sunfreeware.com or blastwave.org.
load averages: 0.02, 0.00, 0.00; up 2+12:31:38 08:53:58
31 processes: 30 sleeping, 1 on cpu
CPU states: 98.0% idle, 0.0% user, 2.0% kernel, 0.0% iowait, 0.0% swap
Memory: 1024M phys mem, 197M free mem, 512M total swap, 512M free swap
PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
1898 martin 1 54 0 3336K 1808K cpu 0:00 0.96% top
7 root 11 59 0 10M 7912K sleep 0:09 0.02% svc.startd
sar(1M)
And just what's wrong with sar? :)
mdb(1)
# echo ::memstat | mdb -k
Page Summary Pages MB %Tot
------------ ---------------- ---------------- ----
Kernel 7308 57 23%
Anon 9055 70 29%
Exec and libs 1968 15 6%
Page cache 2224 17 7%
Free (cachelist) 6470 50 20%
Free (freelist) 4641 36 15%
Total 31666 247
Physical 31256 244
"top" is usually available on Solaris.
If not then revert to "vmstat" which is available on most UNIX system.
It should look something like this (from an AIX box)
vmstat
System configuration: lcpu=4 mem=12288MB ent=2.00
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------------------
r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec
2 1 1614644 585722 0 0 1 22 104 0 808 29047 2767 12 8 77 3 0.45 22.3
the colums "avm" and "fre" tell you the total memory and free memery.
a "man vmstat" should get you the gory details.
Top can be compiled from sources or downloaded from sunfreeware.com. As previously posted, vmstat is available (I believe it's in the core install?).
The free command is nice. It takes a short while to understand the "-/+ buffers/cache" line, but the idea is that cache and buffers don't really count when judging how much memory is free, since they can be dropped right away. So to see how much free (and used) memory you actually have, you need to subtract the cache/buffer usage, which that line conveniently does for you.