Why does the autovacuum process take so much memory and swap? - postgresql

We have a product log database that only receives INSERT statements, but we found that
the autovacuum process takes a lot of memory: about 16 GB at peak, roughly every 2 months.
The details are below; does anyone know why?
The table skytf.urs_user_log_201105 only receives inserts, with no updates or deletes,
so I think the table has no dead tuples. Why, then, does the autovacuum
process take so much memory on this table?
--top detail
top - 16:39:46 up 225 days, 1:12, 1 user, load average: 1.29, 1.51, 1.52
Tasks: 341 total, 2 running, 339 sleeping, 0 stopped, 0 zombie
Cpu(s): 4.8%us, 5.3%sy, 0.0%ni, 85.5%id, 4.1%wa, 0.0%hi, 0.4%si, 0.0%st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29267 postgres 14 -1 27.0g 16g 2928 S 1.0 72.2 199:59.74 postgres: autovacuum launcher process
From the above, we can see that the autovacuum process takes about 16 GB.
--current sql
postgres=# select datname, current_query from pg_stat_activity where current_query !='<IDLE>';
datname | current_query
----------+-------------------------------------------------------------------------------------
skytf | autovacuum: VACUUM skytf.urs_user_log_201105 (to prevent wraparound)
--table size
skytf=> \dt+ urs_user_log_201105
List of relations
Schema | Name | Type | Owner | Size | Description
--------+---------------------+-------+--------+-------+-------------
skytf | urs_user_log_201105 | table | skytf | 62 GB |
(1 row)
--memory state
postgres#logdb-> free -m
total used free shared buffers cached
Mem: 24104 24028 75 0 4 5545
-/+ buffers/cache: 18479 5625
Swap: 16386 8824 7561

If you look at this:
autovacuum: VACUUM skytf.urs_user_log_201105 (to prevent wraparound)
this is not a regular autovacuum. It is running to prevent transaction ID wraparound; this kind of autovacuum must run before the transaction ID counter goes beyond about two billion. Read more here:
http://www.postgresql.org/docs/8.3/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
You can control its behavior with vacuum_freeze_min_age and vacuum_freeze_max_age. You cannot disable this kind of autovacuum, regardless of whether you have disabled autovacuum itself.
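To see how close a database or table is to the wraparound threshold (and therefore when this kind of autovacuum will fire), you can look at the age of the frozen XIDs. A small illustrative sketch, assuming it is run while connected to the affected database:
-- age of the oldest unfrozen XID per database; an anti-wraparound VACUUM
-- is forced once this approaches the freeze limit (autovacuum_freeze_max_age)
SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;
-- the same per table, to see which relations are due next
SELECT relname, age(relfrozenxid)
FROM pg_class
WHERE relkind = 'r'
ORDER BY 2 DESC
LIMIT 10;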

"only receives inserts, with no updates or deletes, so I think the table has no dead tuples!"
If memory serves, autovacuum will actually do two things:
vacuum
analyze
The first won't normally kick in if you only ever get inserts (aside from the anti-wraparound case described above), but the second still runs whenever PG thinks the statistics might have changed a bit.
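To confirm what autovacuum has actually been doing to the table, pg_stat_user_tables records the last (auto)vacuum and (auto)analyze times and the dead-tuple estimate. A minimal sketch, assuming the statistics collector is enabled:
SELECT relname, n_live_tup, n_dead_tup,
       last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'urs_user_log_201105';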

Related

Get disk information without using WMI

I'm using check_mk as a monitoring solution and I disabled the WMI service because it caused timeouts when check_mk queried it for information.
Get-WmiObject / Get-Disk / Get-PSDrive use the WMI service to get their information, and I would like to get disk information such as total space, used space, etc. without using WMI, because I can't use it.
Do you know any workaround to do that?
TL;DR -
(echo select disk=0 & echo list partition & (for /l %A in (1,1,10) do @echo select disk=next & @echo list partition)) | diskpart | findstr /i /v /r "^$ > microsoft ^reached ^select ^there ^the\ start"
Details -
The 'diskpart.exe' command can get you what you want. It requires admin rights, but since you mentioned disabling services, that didn't sound like an issue.
Rather than interacting with DISKPART's unique menu system, this example blindly requests the list of partitions on the first 11 disks (and filters away unnecessary lines). That should be enough.
:-)
Cmd:
(echo select disk=0 & echo list partition & (for /l %A in (1,1,10) do @echo select disk=next & @echo list partition)) | diskpart | findstr /i /v /r "^$ > microsoft ^reached ^select ^there ^the\ start"
Output From My Live System:
Disk 0 is now the selected disk.
Partition ### Type Size Offset
------------- ---------------- ------- -------
Partition 1 Primary 1863 GB 1024 KB
Disk 1 is now the selected disk.
Partition ### Type Size Offset
------------- ---------------- ------- -------
Partition 1 Primary 350 MB 1024 KB
Partition 2 Primary 270 GB 351 MB
Partition 3 Recovery 845 MB 271 GB
Partition 4 Primary 204 GB 272 GB
Disk 2 is now the selected disk.
Partition ### Type Size Offset
------------- ---------------- ------- -------
Partition 1 Primary 931 GB 1024 KB
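If all that is needed is total and free space per volume rather than the partition layout, fsutil might be another option. As far as I know it queries the filesystem directly rather than going through the WMI service, but treat that as an assumption and verify it on a box with WMI disabled (it also needs admin rights):
fsutil volume diskfree C: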

Getting CPU cycles from user mode dump

Process Explorer has columns for CPU time (down to milliseconds) and CPU Cycles. For WinDbg I am aware of the !runaway command, also !runaway 7 for more details, but it shows CPU time only.
Are the CPU cycles also available somehow in a user mode crash dump?
What I have tried:
I looked at dt nt!_KTHREAD and I see it has a CycleTime property
ntdll!_KTHREAD
+0x000 Header : _DISPATCHER_HEADER
+0x018 CycleTime : Uint8B
I tried to query that property in a !for_each_thread, but WinDbg responds that it's available in kernel mode only.
Why do I want those CPU cycles?
I am working on a training for JetBrains dotTrace. It has an option to count CPU cycles and I'd like to explain where these cycles come from. The kernel structure above plus Process Explorer is probably enough, but it would be awesome to see it live or post mortem in a user mode dump. I explain a lot of basics with WinDbg.
Following the implementation of GetProcessTimes() in ReactOS, you can see that the information is copied from the process' KPROCESS. So, indeed, it's only physically present in a dump that includes kernel memory.
C:\tw>ls -l
total 0
C:\tw>cdb -c ".dump /ma .\tw.dmp;q" calc.exe | grep writ
Dump successfully written
C:\tw>cdb -c "lm;!peb;.dump /ma .\tw1.dmp;q" calc.exe | grep writ
Dump successfully written
C:\tw>cdb -c ".ttime;q" -z tw.dmp | grep -B 3 quit
Created: Wed Apr 5 20:03:55.919 2017 ()
Kernel: 0 days 0:00:00.046
User: 0 days 0:00:00.000
quit:
C:\tw>cdb -c ".ttime;q" -z tw1.dmp | grep -B 3 quit
Created: Wed Apr 5 20:04:28.682 2017 ()
Kernel: 0 days 0:00:00.031
User: 0 days 0:00:00.000
quit:
C:\tw>
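For the "see it live" part of the question: from inside a running process, the kernel-maintained cycle counters can be read with the documented Win32 APIs QueryProcessCycleTime and QueryThreadCycleTime (Vista and later). A minimal sketch of my own, not taken from the original answer:
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Sketch: read the per-process/per-thread cycle counters maintained
       by the kernel (the same source Process Explorer shows), Vista+. */
    ULONG64 processCycles = 0, threadCycles = 0;

    /* Burn a little CPU so the counters have something to show. */
    volatile unsigned long long x = 0;
    for (unsigned long long i = 0; i < 100000000ULL; ++i) x += i;

    QueryProcessCycleTime(GetCurrentProcess(), &processCycles);
    QueryThreadCycleTime(GetCurrentThread(), &threadCycles);

    printf("process cycles: %llu\n", (unsigned long long)processCycles);
    printf("thread  cycles: %llu\n", (unsigned long long)threadCycles);
    return 0;
}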

PostgreSQL Transaction ID went backwards

In PostgreSQL 9.0, I have a table that keeps tracks of last processed transactions. For some reason, it went backwards (in time)! Here is the table data:
seq_id | tx_id
628 | 10112
629 | 10118
630 | 10124
631 | 10130
632 | 10136
654 | 10160
655 | 10166 <---
656 | 4070 <---
657 | 4071
658 | 4084
659 | 4090
660 | 4096
How can this happen? Can a restart of the database induce such behavior?
Thanks for any hints.
Regards,
D.
This is an invalid issue. Please ignore.
It turns out that the issue came from restoring the table from a backup and continuing to work with the (invalid) previous data in a newly created database :-(
Thank you to all those who have responded already.
Case closed.
Lesson learned: TXIDs will NOT go backwards, and they do get synced to a slave instance if you're using a master/slave setup. TXID rollovers are also handled correctly. Hope this helps others who might be thinking TXIDs can go backwards!
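As an aside (my own illustration, not from the thread): the txid_* functions report transaction IDs as 64-bit values that include a wraparound epoch, which is why a correctly recorded tx_id should never decrease, even across an XID wraparound:
SELECT txid_current();          -- 64-bit, epoch-extended, monotonically increasing
SELECT txid_current_snapshot(); -- xmin:xmax:xip_list, handy for "last processed" tracking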

Process information in dump

I learned that the .tlist command in WinDbg lists all the processes that were running on the system at the time the crash dump was created.
I would like to see the memory information for each process, so that it helps me see whether the system is overloaded by any specific process.
!process 0 1 will list all the processes and show memory related info for each. I issued this command using livekd and got all the processes. And here's my chrome process (which I picked out from the output):
PROCESS fffffa8007cb4200
SessionId: 1 Cid: 1158 Peb: 7efdf000 ParentCid: 0ff8
DirBase: 1b7962000 ObjectTable: fffff8a00addb010 HandleCount: 135.
Image: chrome.exe
VadRoot fffffa80090a6f80 Vads 169 Clone 0 Private 4037. Modified 3702. Locked 0.
DeviceMap 0000000000000000
Token fffff8a0091f9120
ElapsedTime 00:05:49.161
UserTime 00:00:00.000
KernelTime 00:00:00.000
QuotaPoolUsage[PagedPool] 0
QuotaPoolUsage[NonPagedPool] 0
Working Set Sizes (now,min,max) (8020, 50, 345) (32080KB, 200KB, 1380KB)
PeakWorkingSetSize 10137
VirtualSize 144 Mb
PeakVirtualSize 151 Mb
PageFaultCount 66631
MemoryPriority BACKGROUND
BasePriority 8
CommitCharge 5784
Job fffffa8009822e30
Note memory related properties such as "Working Set Sizes", "Virtual Size", etc.
P.S. This works with livekd and with system memory dumps (which I believe is what livekd does).
Marc
This information is not contained in a process dump. .tlist queries your current system, not the state when the dump was taken. If you can take a system dump, then you can check out the processes and their memory usage, as Marc Sherman already answered.
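For reference, a typical workflow would look something like the following (my own sketch with an example dump path, not from the answers above): open the system/kernel dump and filter !process by image name so you don't have to scan the whole list.
C:\> windbg -z C:\Windows\MEMORY.DMP
kd> !process 0 1 chrome.exe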

Solaris CPU run queue

Is there a command which can tell me what's in the Solaris run queue?
I can get a count using vmstat, but I need to know what processes/threads are in there.
The run-queue is always changing, so it's almost impossible to get the set of processes in the current run-queue.
That said, you can get an approximation by looking at the STAT (state) field of the process list from ps. When running the command below:
$ ps aux
...if the STAT field begins with R, then the process is marked RUNNABLE by the kernel, which on most operating systems means that it is in the run queue. Here's what a runnable process looks like on my machine:
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 78179 0.0 0.0 599828 480 s003 R+ 7:51AM 0:00.00 ps aux
On Solaris, you can also use the prstat command and look at the STATE column. The value run indicates that the process is on the run queue. (Also note that the value cpuN indicates that the process is currently running on processor N.)
For example:
$ prstat -s cpu -n 5
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
13974 kincaid 888K 432K run 40 0 36:14.51 67% cpuhog/1
27354 kincaid 2216K 1928K run 31 0 314:48.51 27% server/5
14690 root 136M 46M sleep 59 0 0:00.59 2.3% Xsun/1
14797 kincaid 9192K 7496K sleep 59 0 0:00.10 0.9% dtwm/8
14851 kincaid 24M 14M sleep 48 0 0:00.03 0.3% netscape/1
Total: 97 processes, 190 lwps, load averages: 2.18, 2.15, 2.11
I was about to correct 0xfe's answer when I saw you already did it. The run queue contains threads, not processes, so the -L option is mandatory with the prstat command if you want the number of "run" state lines to more or less match the run queue. Beware that sampling artifacts will probably prevent you from getting exact matches.
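For example (an illustrative invocation, not from the original answer), per-LWP reporting sorted by CPU usage, so each runnable thread gets its own line in the STATE column:
$ prstat -L -s cpu -n 10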
In any case, if you want to know precisely which processes/threads are sitting in the run queue, you'd rather go the DTrace way, assuming you are running Solaris 10 or newer.
The whoqueue.d script, which might already be in the /usr/demo/dtrace directory on your machine, is a good start:
# dtrace -s /usr/demo/dtrace/whoqueue.d
Run queue of length 1:
24349/1 (dtrace)
Run queue of length 3:
0/0 (sched)
0/0 (sched)
0/0 (sched)
Run queue of length 4:
22468/30 (java)
22468/17 (java)
22468/23 (java)
22468/10 (java)
Have a look at this page for details.