Solaris CPU run queue

Is there a command that can tell me what's in the Solaris run queue?
I can get a count using vmstat, but I need to know what processes/threads are in there.

The run-queue is always changing, so it's almost impossible to get the set of processes in the current run-queue.
That said, you can get an approximation by looking at the STAT (state) field of the process list from ps. When running the command below:
$ ps aux
...if the STAT field begins with R, the process is marked RUNNABLE by the kernel, which on most operating systems means that it is in the run queue. Here's what a runnable process looks like on my machine:
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 78179 0.0 0.0 599828 480 s003 R+ 7:51AM 0:00.00 ps aux
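To pull just the runnable rows out of a captured `ps aux` listing, a filter on the STAT column is enough. This is a minimal Python sketch: `runnable_from_ps` is a hypothetical helper, and it assumes BSD-style `ps aux` output like the sample above, with COMMAND as the last column (if any other column contained spaces, the indices would shift).

```python
from __future__ import annotations


def runnable_from_ps(ps_output: str) -> list[tuple[str, str]]:
    """Return (PID, COMMAND) pairs for rows whose STAT column starts with "R"."""
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    stat_idx = header.index("STAT")
    pid_idx = header.index("PID")
    cmd_idx = header.index("COMMAND")  # assumed to be the last column
    runnable = []
    for line in lines[1:]:
        # COMMAND may contain spaces, so split at most len(header) - 1 times
        # to keep the whole command line in the final field.
        fields = line.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue  # malformed or truncated row
        if fields[stat_idx].startswith("R"):
            runnable.append((fields[pid_idx], fields[cmd_idx]))
    return runnable
```

Feed it the text of `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`; the result is still only a snapshot, for the reason given above.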
On Solaris, you can also use the prstat command and look at the STATE column. The value run indicates that the process is on the run queue. (Also note that the value cpuN indicates that the process is currently running on processor N.)
For example:
$ prstat -s cpu -n 5
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
13974 kincaid 888K 432K run 40 0 36:14.51 67% cpuhog/1
27354 kincaid 2216K 1928K run 31 0 314:48.51 27% server/5
14690 root 136M 46M sleep 59 0 0:00.59 2.3% Xsun/1
14797 kincaid 9192K 7496K sleep 59 0 0:00.10 0.9% dtwm/8
14851 kincaid 24M 14M sleep 48 0 0:00.03 0.3% netscape/1
Total: 97 processes, 190 lwps, load averages: 2.18, 2.15, 2.11
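The distinction between `run` (waiting on a run queue) and `cpuN` (on a processor) can be applied mechanically to captured prstat output. A minimal Python sketch: `classify_prstat` is a hypothetical helper that assumes the column layout shown above and a single snapshot (for example from `prstat -n 5 1 1`, one sample at a one-second interval). With `-L`, as the next reply recommends, each row is a thread and the last column becomes PROCESS/LWPID.

```python
from __future__ import annotations

import re


def classify_prstat(prstat_output: str) -> dict[str, list[str]]:
    """Group prstat rows by STATE: "run" (on a run queue) vs cpuN (on a CPU)."""
    waiting: list[str] = []
    on_cpu: list[str] = []
    lines = prstat_output.strip().splitlines()
    header = lines[0].split()
    state_idx = header.index("STATE")
    for line in lines[1:]:
        fields = line.split()
        # Skip the "Total: ..." summary line and anything that is not a full row.
        if line.lstrip().startswith("Total:") or len(fields) != len(header):
            continue
        state = fields[state_idx]
        name = fields[-1]  # PROCESS/NLWP, or PROCESS/LWPID when run with -L
        if state == "run":
            waiting.append(name)
        elif re.fullmatch(r"cpu\d+", state):
            on_cpu.append(name)
    return {"run": waiting, "cpu": on_cpu}
```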

I was about to correct 0xfe's answer when I saw you had already done it. The run queue contains threads, not processes, so the -L option is mandatory with prstat if you want the number of lines in the "run" state to more or less match the run queue. Beware that sampling artifacts will probably prevent an exact match.
In any case, if you want to know precisely which processes/threads are sitting in the run queue, you'd better use DTrace, assuming you are running Solaris 10 or newer.
The whoqueue.d script, which might already be in the /usr/demo/dtrace directory on your machine, is a good start:
# dtrace -s /usr/demo/dtrace/whoqueue.d
Run queue of length 1:
24349/1 (dtrace)
Run queue of length 3:
0/0 (sched)
0/0 (sched)
0/0 (sched)
Run queue of length 4:
22468/30 (java)
22468/17 (java)
22468/23 (java)
22468/10 (java)
Have a look at this page for details.

Related

Azure REST API CI task completes but hangs on a lingering process with unclosed STDIO streams

As part of a CI pipeline on ADO, I make REST API GET calls to get a list of requirement work item objects and a list of test result objects. I sort and match the lists, and then I make multiple POST calls to add information from them as attachments to my ADO test items. Everything is done through the ADO MATLAB plugin task, using MATLAB's system function to run curl on the command line. Everything seems to work: I see the attachment uploaded on every test, and the script even prints "Done" after all the curl POST requests to indicate that my MATLAB script has finished.
I would expect the ADO CI job task to complete and move on to the next task in the YAML file. But after running for 5 minutes and completing everything, the task stalls and keeps running for another 10 hours (the maximum ADO pipeline time):
The STDIO streams did not close within 10 seconds of the exit event from process 'C:\agent\_work\_tasks\RunMATLABCommand_28fdff80-51b4-4b6e-83e1-cfcf3f3b25a6\0.6.3\bin\run_matlab_command.bat'. This may indicate a child process inherited the STDIO streams and has not yet exited.
Any ideas how to resolve this?
What I tried:
I ran my script locally. I saw all my curl outputs in the MATLAB command window, but upon completion an empty command prompt window opened.
I looked for similar incidents and tried closing all processes (#4) as indicated here: https://developercommunity.visualstudio.com/t/the-stdio-streams-did-not-close-within-10-seconds/523146
I tried adding a pause/wait in the MATLAB script, or via a system command call, to give asynchronous processes time to complete.
I added quit to my MATLAB script, and I tried using taskkill /IM cmd.exe to kill all open command windows when my script ends.
Neither worked on the CI pipeline: it still runs until I stop it manually or it reaches the maximum time.
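The agent message says a child process inherited the step's STDIO streams. To illustrate what avoiding that looks like, here is a minimal Python sketch (Python stands in for MATLAB's system() here; this is not the MATLAB plugin's API). The hypothetical helper `run_with_private_stdio` gives the child the null device as stdin and a pipe owned by the caller as stdout/stderr, then waits for it to exit. Whether this resolves the hang in the pipeline depends on which process actually holds the handles, which the log does not show.

```python
from __future__ import annotations

import subprocess
import sys


def run_with_private_stdio(cmd: list[str], timeout_s: float = 120.0) -> tuple[int, str]:
    """Run cmd with stdin from the null device and stdout/stderr sent to a pipe
    owned by this process, then wait for it to exit.

    The direct child never receives the caller's own console/STDIO handles.
    Note: a grandchild that inherits the pipe can still keep it open.
    If cmd does not finish within timeout_s, it is killed and
    subprocess.TimeoutExpired is raised.
    """
    completed = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge curl's progress meter into stdout
        text=True,
        timeout=timeout_s,
        check=False,
    )
    return completed.returncode, completed.stdout


if __name__ == "__main__":
    # Stand-in for the curl call: any short-lived command works.
    print(run_with_private_stdio([sys.executable, "-c", "print('done')"]))
```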
Error message & partial log:
2022-04-04T18:39:45.6947848Z curl -u :**ADOPAT** -X POST -H "Content-Type: application/json" -d "{\"stream\":\"***B64 encoded content***\", \"fileName\": \"requirementPath.txt\", \"comment\":\"Testattachmentupload\",\"attachmentType\":\"GeneralAttachment\"}" http://dev.azure.com/{organization}/{project}/_apis/test/Runs/**runID**/Results/**testResultID**/attachments?api-version=6.0-preview
2022-04-04T18:39:46.3140720Z
2022-04-04T18:39:46.3141125Z status =
2022-04-04T18:39:46.3141240Z
2022-04-04T18:39:46.3141367Z 0
2022-04-04T18:39:46.3141523Z
2022-04-04T18:39:46.3141598Z
2022-04-04T18:39:46.3141697Z cmdout =
2022-04-04T18:39:46.3141781Z
2022-04-04T18:39:46.3141992Z ' % Total % Received % Xferd Average Speed Time Time Time Current
2022-04-04T18:39:46.3142311Z Dload Upload Total Spent Left Speed
2022-04-04T18:39:46.3142510Z
2022-04-04T18:39:46.3142760Z 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
2022-04-04T18:39:46.3143126Z 100 354 100 139 100 215 269 417 --:--:-- --:--:-- --:--:-- 692
2022-04-04T18:39:46.3143506Z 100 354 100 139 100 215 269 416 --:--:-- --:--:-- --:--:-- 692
2022-04-04T18:39:46.3143948Z {"id":**attachementID**,"url":"http://dev.azure.com/{organization}/{project}/_apis/test/Runs/**RunID**/Results/**testResultID**/Attachments/**attachmentID**"}'
2022-04-04T18:39:46.3144224Z
2022-04-04T18:43:06.5720937Z done
2022-04-04T18:43:29.0053832Z ##[debug]Exit code 0 received from tool 'C:\agent\_work\_tasks\RunMATLABCommand_28fdff80-51b4-4b6e-83e1-cfcf3f3b25a6\0.6.3\bin\run_matlab_command.bat'
2022-04-04T18:43:39.0075174Z The STDIO streams did not close within 10 seconds of the exit event from process 'C:\agent\_work\_tasks\RunMATLABCommand_28fdff80-51b4-4b6e-83e1-cfcf3f3b25a6\0.6.3\bin\run_matlab_command.bat'. This may indicate a child process inherited the STDIO streams and has not yet exited.
2022-04-04T18:43:39.0076952Z ##[debug]The STDIO streams did not close within 10 seconds of the exit event from process 'C:\agent\_work\_tasks\RunMATLABCommand_28fdff80-51b4-4b6e-83e1-cfcf3f3b25a6\0.6.3\bin\run_matlab_command.bat'. This may indicate a child process inherited the STDIO streams and has not yet exited.
2022-04-05T04:10:04.2626714Z ##[debug]Re-evaluate condition on job cancellation for step: 'Matlab Rest API Test Update Action'.
2022-04-05T04:10:04.2919553Z ##[error]The operation was canceled.
2022-04-05T04:10:04.2932116Z ##[debug]System.OperationCanceledException: The operation was canceled.
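The curl command in the log can also be expressed as a request built in-process, which involves no curl child process at all. A minimal Python sketch: `build_attachment_request` is a hypothetical helper; the JSON fields, route, and api-version are copied from the logged command, organization/project/IDs are placeholders, and the sketch uses https://dev.azure.com where the log shows http. It only builds the request; `urllib.request.urlopen(req)` would send it.

```python
from __future__ import annotations

import base64
import json
import urllib.request


def build_attachment_request(organization: str, project: str, run_id: int,
                             result_id: int, pat: str, file_name: str,
                             content: bytes, comment: str) -> urllib.request.Request:
    """Build, but do not send, the POST issued by the curl command in the log."""
    url = (f"https://dev.azure.com/{organization}/{project}/_apis/test/Runs/"
           f"{run_id}/Results/{result_id}/attachments?api-version=6.0-preview")
    body = {
        "stream": base64.b64encode(content).decode("ascii"),  # base64 file content
        "fileName": file_name,
        "comment": comment,
        "attachmentType": "GeneralAttachment",
    }
    # `curl -u :PAT` is HTTP Basic auth with an empty user name and the PAT as password.
    token = base64.b64encode(f":{pat}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
```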

Gem5 in full system running spec2006 runs out of memory

I am trying to run a SPEC CPU2006 benchmark (namely 410.bwaves) in full-system mode.
I have made a .rcS script to pass to the fs.py script, and the command I use to start the simulation is:
build/X86/gem5.opt configs/example/fs.py --script="../run_bwaves.rcS" --disk-image=ubuntu-14.04.img --kernel=x86_64-vmlinux-2.6.22.9
The result after some time is:
Free swap: 0kB
131072 pages of RAM
3650 reserved pages
18 pages shared
0 pages swap cached
Out of memory: kill process 807 (bwaves) score 13154 or a child
Killed process 807 (bwaves)
/tmp/script: line 39: 807 Killed ./bwaves
Full linux output here
Gem5 output here
I suspect it has something to do with this warning, but I am not sure:
warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)
I have tried adding a --mem-size=... flag, but it breaks the simulation with a "Memory size not divisible by page size" error.
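The "not divisible by page size" error means the requested size must be a whole number of pages. A minimal Python sketch of the arithmetic, assuming 4 KiB pages and that gem5 reads an "MB" suffix as 2^20 bytes; `round_up_to_pages` and `mem_size_arg` are hypothetical helpers. Since 1 MiB is 256 pages of 4 KiB, any whole-MiB value such as 8192MB (the DRAM capacity in the warning) is page aligned.

```python
PAGE_SIZE = 4 * 1024   # assumed 4 KiB page size for the simulated x86 system
MIB = 1024 * 1024


def round_up_to_pages(size_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Smallest multiple of page_size that is >= size_bytes."""
    if size_bytes <= 0 or page_size <= 0:
        raise ValueError("sizes must be positive")
    return -(-size_bytes // page_size) * page_size  # ceiling division


def mem_size_arg(size_bytes: int) -> str:
    """A --mem-size value in whole MiB covering size_bytes (always page aligned)."""
    return f"--mem-size={-(-size_bytes // MIB)}MB"
```

For example, a raw request of 1,000,000 bytes would have to become 1,003,520 bytes (245 pages) to pass the check.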
If anyone could help me I would be glad.
Edit: As suggested in a comment, I used a --mem-size value that is large enough and divisible by the page size. The error has now turned into:
bwaves[807]: segfault at 00007ffee647624c rip 0000000000410eb5 rsp 00007fff664761c0 error 4
/tmp/script: line 39: 807 Segmentation fault ./bwaves

Getting CPU cycles from user mode dump

Process Explorer has columns for CPU time (down to milliseconds) and CPU cycles. In WinDbg I know the !runaway command (and !runaway 7 for more details), but it shows CPU time only.
Are the CPU cycles also available somehow in a user mode crash dump?
What I have tried:
I looked at dt nt!_KTHREAD and saw that it has a CycleTime field:
ntdll!_KTHREAD
+0x000 Header : _DISPATCHER_HEADER
+0x018 CycleTime : Uint8B
I tried to query that property in a !for_each_thread, but WinDbg responds that it's available in kernel mode only.
Why do I want those CPU cycles?
I am working on a training for JetBrains dotTrace. It has an option to count CPU cycles, and I'd like to explain where these cycles come from. The kernel structure above and Process Explorer are probably enough, but it would be awesome to see the value live, or post mortem in a user mode dump. I explain a lot of basics with WinDbg.
Following the implementation of GetProcessTimes() in ReactOS, you can see that the information is copied from the process' KPROCESS. So, indeed, it's only physically present in a dump that includes kernel memory.
C:\tw>ls -l
total 0
C:\tw>cdb -c ".dump /ma .\tw.dmp;q" calc.exe | grep writ
Dump successfully written
C:\tw>cdb -c "lm;!peb;.dump /ma .\tw1.dmp;q" calc.exe | grep writ
Dump successfully written
C:\tw>cdb -c ".ttime;q" -z tw.dmp | grep -B 3 quit
Created: Wed Apr 5 20:03:55.919 2017 ()
Kernel: 0 days 0:00:00.046
User: 0 days 0:00:00.000
quit:
C:\tw>cdb -c ".ttime;q" -z tw1.dmp | grep -B 3 quit
Created: Wed Apr 5 20:04:28.682 2017 ()
Kernel: 0 days 0:00:00.031
User: 0 days 0:00:00.000
quit:
C:\tw>
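For a live process (not a dump), Win32 exposes cycle counts through `QueryProcessCycleTime` and `QueryThreadCycleTime` in kernel32; it is an assumption that these are what Process Explorer's CPU Cycles column shows. A minimal Python/ctypes sketch; `current_process_cycle_time` is a hypothetical helper that returns None off Windows. Per the answer above, this does not make the counter available in a user mode dump.

```python
from __future__ import annotations

import ctypes
import sys


def current_process_cycle_time() -> int | None:
    """CPU cycles charged to the current process, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    # GetCurrentProcess returns a pseudo-handle; no CloseHandle is needed.
    kernel32.GetCurrentProcess.restype = ctypes.c_void_p
    kernel32.QueryProcessCycleTime.argtypes = [ctypes.c_void_p,
                                               ctypes.POINTER(ctypes.c_uint64)]
    kernel32.QueryProcessCycleTime.restype = ctypes.c_int  # Win32 BOOL
    cycles = ctypes.c_uint64(0)
    if not kernel32.QueryProcessCycleTime(kernel32.GetCurrentProcess(),
                                          ctypes.byref(cycles)):
        raise ctypes.WinError(ctypes.get_last_error())
    return cycles.value
```

`QueryThreadCycleTime` takes a thread handle and an output pointer in the same way.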

Error with !runaway command

I am looking at a dump file collected from a production environment with high CPU usage. I ran the !threadpool and !runaway commands as follows:
0:000> !ThreadPool
CPU utilization: 100%
Worker Thread: Total: 6 Running: 2 Idle: 4 MaxLimit: 32767 MinLimit: 4
Work Request in Queue: 0
--------------------------------------
Number of Timers: 8
--------------------------------------
Completion Port Thread:Total: 8 Free: 3 MaxFree: 8 CurrentLimit: 8 MaxLimit: 1000 MinLimit: 4
0:000> !runaway
ERROR: !runaway: extension exception 0x80004002.
"Unable to get thread times - dumps may not have time information"
I want to know which threads are consuming the most CPU time, but I cannot run the !runaway command. Are there any other commands in SOS, SOSEX or any other extension that could help in this case?
You need a tool that adds the necessary information to the dump.
In WinDbg, the .dump command has the /mt MiniOption, which
Adds additional thread information to the minidump. This includes thread times, which can be displayed by using the !runaway extension or the .ttime (Display Thread Times) command when debugging the minidump.
(Emphasis: links in WinDbg)
The t option is also included in the a option, so .dump /ma works as well.
To find out whether or not your dump has that information, use the undocumented command .dumpdebug like this:
.shell -ci ".dumpdebug" findstr "MiniDump"
If there is a line
1000 MiniDumpWithThreadInfo
the information is contained and you have a different issue. If it's not there, the time info is not available.
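The bits listed by .dumpdebug come from the 64-bit Flags field (a MINIDUMP_TYPE value) in the dump file's header, so the same check can be made without a debugger. A minimal Python sketch; `header_has_thread_info` and `dump_has_thread_info` are hypothetical helpers. It assumes the flag records what the dump writer requested; !runaway itself reads the thread-info data, so the flag is a strong hint rather than a guarantee.

```python
import struct

# MINIDUMP_HEADER: Signature, Version, NumberOfStreams, StreamDirectoryRva,
# CheckSum, TimeDateStamp (all 32-bit), then the 64-bit Flags (MINIDUMP_TYPE).
MINIDUMP_HEADER = struct.Struct("<4sIIIIIQ")  # 32 bytes, little-endian
MINIDUMP_SIGNATURE = b"MDMP"
MINIDUMP_WITH_THREAD_INFO = 0x1000  # MiniDumpWithThreadInfo


def header_has_thread_info(raw: bytes) -> bool:
    """True if the Flags field of a minidump header has MiniDumpWithThreadInfo set."""
    if len(raw) < MINIDUMP_HEADER.size:
        raise ValueError("too short for a minidump header")
    signature, *_, flags = MINIDUMP_HEADER.unpack_from(raw)
    if signature != MINIDUMP_SIGNATURE:
        raise ValueError("missing MDMP signature; not a minidump")
    return bool(flags & MINIDUMP_WITH_THREAD_INFO)


def dump_has_thread_info(path: str) -> bool:
    """Read only the header of the dump file at path and check the flag."""
    with open(path, "rb") as dump:
        return header_has_thread_info(dump.read(MINIDUMP_HEADER.size))
```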
Most other tools I know do not provide such detailed settings, so it's more or less luck whether this info is included or not.

Process information in dump

I learned that the .tlist command in WinDbg lists all the processes running on the system at the time the crash dump was created.
I would like to see the memory information of each process, so that I can tell whether the system is overloaded by any specific process.
!process 0 1 will list all the processes and show memory related info for each. I issued this command using livekd and got all the processes. And here's my chrome process (which I picked out from the output):
PROCESS fffffa8007cb4200
SessionId: 1 Cid: 1158 Peb: 7efdf000 ParentCid: 0ff8
DirBase: 1b7962000 ObjectTable: fffff8a00addb010 HandleCount: 135.
Image: chrome.exe
VadRoot fffffa80090a6f80 Vads 169 Clone 0 Private 4037. Modified 3702. Locked 0.
DeviceMap 0000000000000000
Token fffff8a0091f9120
ElapsedTime 00:05:49.161
UserTime 00:00:00.000
KernelTime 00:00:00.000
QuotaPoolUsage[PagedPool] 0
QuotaPoolUsage[NonPagedPool] 0
Working Set Sizes (now,min,max) (8020, 50, 345) (32080KB, 200KB, 1380KB)
PeakWorkingSetSize 10137
VirtualSize 144 Mb
PeakVirtualSize 151 Mb
PageFaultCount 66631
MemoryPriority BACKGROUND
BasePriority 8
CommitCharge 5784
Job fffffa8009822e30
Note the memory-related properties such as "Working Set Sizes", "VirtualSize", etc.
PS: This works with livekd and with system memory dumps (which I believe is what livekd does).
Marc
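In the Working Set Sizes line, the first triple is in pages and the second in KB: 8020 × 4 = 32080 KB, 50 × 4 = 200 KB and 345 × 4 = 1380 KB, i.e. 4 KB pages. To rank every process by current working set, the `!process 0 1` output can be saved to a text file (for example with .logopen) and parsed. A minimal Python sketch; `working_sets_kb` is a hypothetical helper, and the field layout is assumed from the example above and may differ between debugger versions.

```python
from __future__ import annotations

import re

PROCESS_RE = re.compile(r"^[ \t]*PROCESS\s+[0-9a-fA-F]+", re.MULTILINE)
IMAGE_RE = re.compile(r"^[ \t]*Image:\s*(\S+)", re.MULTILINE)
# "(now,min,max) (pages...) (KB...)": capture the current size in KB.
WS_RE = re.compile(
    r"Working Set Sizes \(now,min,max\)\s+\(\d+,\s*\d+,\s*\d+\)\s+\((\d+)KB")


def working_sets_kb(text: str) -> list[tuple[str, int]]:
    """(image name, current working set in KB) per process, largest first."""
    # Each process block runs from one "PROCESS <address>" line to the next.
    starts = [m.start() for m in PROCESS_RE.finditer(text)] + [len(text)]
    results = []
    for begin, end in zip(starts, starts[1:]):
        block = text[begin:end]
        image = IMAGE_RE.search(block)
        ws = WS_RE.search(block)
        if image and ws:
            results.append((image.group(1), int(ws.group(1))))
    return sorted(results, key=lambda item: item[1], reverse=True)
```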
This information is not contained in a process dump. .tlist queries your current system, not the state when the dump was taken. If you can take a system dump, then you can check the processes and their memory usage, as Marc Sherman already answered.