I've got a server running around 500 PowerShell processes. Each of these processes is designed to make WMI calls across our environment. I've been careful to verify that I do not use up all of the server's available memory or CPU. When I have all 500 processes running, I'm at around 70% memory usage.
Just in case anybody is wondering how the individual processes are handled, they are executed by a Gearman job worker: basically a Python wrapper script that calls a PowerShell script, times 500.
The issue I'm running into is that some of my PowerShell processes crash after running for a few hours.
Some of the errors that I'm getting are:
A new guard page for the stack cannot be created
When I open Event Viewer, I see these events when processes crash:
Fault bucket , type 0
Event Name: PowerShell
Response: Not available
Cab Id: 0
Problem signature:
P1: powershell.exe
P2: 6.3.9600.16394
P3: System.OutOfMemoryException
P4: System.OutOfMemoryException
P5: oft.PowerShell.ConsoleHost.ReportExceptionFallback
P6: lization.EncodingTable.nativeCreateOpenFileMapping
P7: Consol.. main thread
P8:
P9:
P10:
Attached files:
These files may be available here:
C:\path
Analysis symbol:
Rechecking for solution: 0
Report Id: ID
Report Status: 2048
Hashed bucket:
I'm guessing it has something to do with PowerShell running out of memory, but the server's memory is not maxed out, and not all processes crash; it is sporadic.
Any help would be appreciated.
Here are more crash results; the PowerShell fault module names differ from time to time:
Problem Event Name: APPCRASH
Application Name: powershell.exe
Application Version: 6.3.9600.16384
Application Timestamp: 52158733
Fault Module Name: ntdll.dll
Fault Module Version: 6.3.9600.16408
Fault Module Timestamp: 523d45fa
Exception Code: c00000fd
Exception Offset: 00069abb
OS Version: 6.3.9600.2.0.0.272.7
Locale ID: 1033
Additional Information 1: 624b
Additional Information 2: 624b484d3cf74536f98239c741379147
Additional Information 3: a901
Additional Information 4: a901f876e92d1eb79eb3a513defef0c6
Problem signature:
Problem Event Name: APPCRASH
Application Name: powershell.exe
Application Version: 6.3.9600.16384
Application Timestamp: 52158733
Fault Module Name: combase.dll
Fault Module Version: 6.3.9600.16408
Fault Module Timestamp: 523d3001
Exception Code: c00000fd
Exception Offset: 0001a360
OS Version: 6.3.9600.2.0.0.272.7
Locale ID: 1033
Additional Information 1: 81ca
Additional Information 2: 81cae32566783b059420874b47802c3e
Additional Information 3: b637
Additional Information 4: b6375e6f6a866fc9d00393d4649231b8
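For reference, the exception code in both reports, c00000fd, is the NTSTATUS value STATUS_STACK_OVERFLOW, which is consistent with the guard-page message above. A minimal Python sketch for decoding these codes when scanning many reports; the table is deliberately tiny and the helper name is made up for illustration:

```python
# Small lookup of NTSTATUS exception codes that commonly show up in
# APPCRASH reports. Only a handful of well-known values are listed.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000017: "STATUS_NO_MEMORY",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def describe_exception_code(code_text):
    """Map an 'Exception Code' field such as 'c00000fd' to a name."""
    code = int(code_text.strip(), 16)
    return NTSTATUS_NAMES.get(code, "unknown (0x%08X)" % code)


if __name__ == "__main__":
    print(describe_exception_code("c00000fd"))
```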
Have you looked at your maximum memory allocation per shell?
Get-Item WSMan:\localhost\Shell\MaxMemoryPerShellMB
If it's too low, change it:
Set-Item WSMan:\localhost\Shell\MaxMemoryPerShellMB 2048
Doesn't .NET have a memory limit?
If you're using Task Manager to check on memory usage, you might try Process Explorer instead. It sometimes gives very different results.
Thanks, everyone, for the responses. It turns out that I had a memory leak in my PowerShell code that was causing memory usage to spike every now and then. Since I was not watching the server every second, I missed the moments when memory usage spiked.
An interesting note: it appears that PowerShell will not use more than 80% of the available memory on a server before killing its own processes.
I had to increase the available memory to 56 GB, and now I'm not running into any issues whatsoever. I've been running 600 PowerShell processes for a week now and have not had a single crash.
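Since the spikes were easy to miss by eye, one option is to log the combined memory of the PowerShell processes at intervals. A rough Python sketch, assuming the CSV output of `tasklist /FO CSV` (columns "Image Name", "PID", "Session Name", "Session#", "Mem Usage"); the function names are illustrative, and the number formatting of "Mem Usage" can vary with locale:

```python
import csv
import io
import re
import subprocess


def total_memory_kb(tasklist_csv, image_name="powershell.exe"):
    """Sum the 'Mem Usage' column (in KB) for one image name."""
    total = 0
    for row in csv.DictReader(io.StringIO(tasklist_csv)):
        if row["Image Name"].lower() == image_name.lower():
            # 'Mem Usage' looks like '85,432 K'; keep only the digits.
            total += int(re.sub(r"\D", "", row["Mem Usage"]) or 0)
    return total


def sample_once():
    """Run tasklist (Windows only) and return total PowerShell memory in KB."""
    out = subprocess.run(
        ["tasklist", "/FO", "CSV"], capture_output=True, text=True, check=True
    ).stdout
    return total_memory_kb(out)
```

Calling `sample_once()` from a scheduled loop and writing the result with a timestamp would make spikes visible after the fact.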
Related
I get an APPCRASH when attempting to install ArangoDB on a Windows 7 machine. I have also tried the XCOPY version and have the same issue. The APPCRASH report gives the following:
Problem signature:
Problem Event Name: APPCRASH
Application Name: arangod.exe
Application Version: 0.0.0.0
Application Timestamp: 59704d12
Fault Module Name: arangod.exe
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 59704d12
Exception Code: c000001d
Exception Offset: 0000000000200f77
OS Version: 6.1.7601.2.1.0.256.1
Locale ID: 2057
Additional Information 1: caa2
Additional Information 2: caa2bb545c0b7fee68e5ff27d1b7f78d
Additional Information 3: 95f8
Additional Information 4: 95f82d1cb337322ec0f22184a0acdc62
I do not believe it even attempts to access the arangod.conf file let alone object to something inside it.
I used Windows debugger to try and get some additional clues but I'm left stumped. The results are here: http://textuploader.com/do6wn
Please use GitHub issues for this kind of problem.
The JVM crashes unexpectedly and frequently in our production environment, which brings JBoss (EAP 6.3) down. We have Java 7u72 installed.
The crash logs all show the same output, where the current thread is:
Current thread (0x00000000d1d99000): JavaThread "Lucene Merge Thread #0" daemon [_thread_in_Java, id=1144, stack(0x00000000f6a00000,0x00000000f6b00000)]
and the log is full of entries like:
JavaThread "elasticsearch[Node BD852E44][search][T#68]" daemon [_thread_blocked, id=14396, stack(0x00000000f7b30000,0x00000000f7c30000)]
Elasticsearch is somehow related to indexing, and as far as I understand it uses Lucene under the hood, but we have a number of applications deployed. How can we check which one is involved? Can someone please help? The complete crash logs are at: http://pastebin.com/845LU9iK
Looks like it didn't manage to record stack traces for the affected thread.
If that's the same for all crashes, then it doesn't seem to match known Lucene or JBoss bugs.
# guarantee(result == EXCEPTION_CONTINUE_EXECUTION) failed: Unexpected result from topLevelExceptionFilter
AIUI this indicates an error in native exception handling, so it's one error masking another, probably making this crash log fairly useless.
So I can only provide really generic advice:
You're using an older JVM version. Update to the latest Java 7, Java 8, or possibly even a Java 9 dev build and see if it goes away. Even if they still crash, they might provide different or more useful error reports.
To diagnose potential compiler bugs, you can try running with the following flags:
-XX:-TieredCompilation [1] should disable the C1 compiler
-XX:+TieredCompilation -XX:TieredStopAtLevel=1 should disable the C2 compiler
-Xint disables all JIT, very slow
Ask on the hotspot-dev mailing list for further guidance.
[1] Tiered compilation is a new Java 7 feature. It basically combines the interpreter and the C1 and C2 JIT compilers (which formerly were used separately in the client and server VMs) into different optimizing stages.
Each of them can have optimization bugs. Turning off individual stages helps isolate them as a potential cause.
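To try the three configurations one after another, the flags listed above can be kept in a small table. This is only a sketch: the key names and helper functions are invented here, and `app.jar` in the usage note is a placeholder. With JBoss the flags would normally be appended to `JAVA_OPTS` in the startup configuration instead of a direct `java` command line.

```python
# Diagnostic JIT configurations, using exactly the flags from the answer.
JIT_CONFIGS = {
    "without_c1": ["-XX:-TieredCompilation"],
    "without_c2": ["-XX:+TieredCompilation", "-XX:TieredStopAtLevel=1"],
    "interpreter_only": ["-Xint"],
}


def java_opts(config, existing=""):
    """Append one configuration's flags to an existing JAVA_OPTS string."""
    return " ".join(existing.split() + JIT_CONFIGS[config])


def java_command(config, app_args, java="java"):
    """Build an argv list for launching a plain Java process."""
    return [java] + JIT_CONFIGS[config] + list(app_args)
```

For example, `java_command("interpreter_only", ["-jar", "app.jar"])` yields the argument list for a run with the JIT switched off entirely.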
Edit: The new crash report is more useful since it at least has java frames, the interesting part is the following:
J 1559 sun.misc.Unsafe.getByte(J)B (0 bytes) @ 0x000000000178e99b [0x000000000178e960+0x3b]
j java.nio.DirectByteBuffer.get()B+11
j org.apache.lucene.store.ByteBufferIndexInput.readByte()B+4
J 9447 C2 org.apache.lucene.store.DataInput.readVInt()I (114 bytes) @ 0x000000000348cc00 [0x000000000348cbc0+0x40]
DataInput.readVInt seems to be an ongoing source of grief; see this SO answer for possible solutions.
I am executing the C# code below:
// ctr (an int counter) and coll (a MongoCollection) are assumed to be set up earlier.
for (;;)
{
    Console.WriteLine("Doc# {0}", ctr++);

    // Build one log document per iteration.
    BsonDocument log = new BsonDocument();
    log["type"] = "auth";
    log["when"] = new BsonDateTime(DateTime.Now);
    log["user"] = "staticString";
    log["res"] = BsonBoolean.False;

    coll.Insert(log);
}
When I run it against a MongoDB instance (version 2.0.2) running on a virtual 64-bit Linux machine with just 512 MB of RAM, I get about 5k inserts with 1-2 faults, as reported by mongostat, after a few minutes.
When the same code is run against a MongoDB instance (version 2.0.2) running on a physical Windows machine with 8 GB of RAM, I get 2.5k inserts with about 80 faults, as reported by mongostat, after a few minutes.
Why are more faults occurring on Windows? I can see the following message in the logs:
[DataFileSync] FlushViewOfFile failed 33 file
Journaling is disabled on both instances.
Also, is 5k inserts on a virtual machine with 1-2 faults a good enough speed, or should I be expecting better insert rates?
Looks like this is a known issue: https://jira.mongodb.org/browse/SERVER-1163
The page fault counter on Windows is in fact the total number of page faults, which includes both hard and soft page faults.
Process : Page Faults/sec. This is an indication of the number of page faults that
occurred due to requests from this particular process. Excessive page faults from a
particular process are an indication usually of bad coding practices. Either the
functions and DLLs are not organized correctly, or the data set that the application
is using is being called in a less than efficient manner.
I have been getting a stack overflow exception in my program, which may be originating from a third-party library, microsoft.sharepoint.client.runtime.dll.
I used ADPlus to create the crash dump, but I'm struggling to get any information from it when I open it in WinDbg. This is what I get as a response:
> 0:000> .restart /f
Loading Dump File [C:\symbols\FULLDUMP_FirstChance_epr_Process_Shut_Down_DocumentumMigrator.exe__0234_2011-11-17_15-19-59-426_0d80.dmp]
User Mini Dump File with Full Memory: Only application data is available
Comment: 'FirstChance_epr_Process_Shut_Down'
Symbol search path is: C:\symbols
Executable search path is:
Windows 7 Version 7601 (Service Pack 1) MP (8 procs) Free x64
Product: Server, suite: Enterprise TerminalServer SingleUserTS
Machine Name:
Debug session time: Thu Nov 17 15:19:59.000 2011 (UTC + 2:00)
System Uptime: 2 days 2:44:48.177
Process Uptime: 0 days 0:13:05.000
.........................................WARNING: rsaenh overlaps cryptsp
.................WARNING: rasman overlaps apphelp
......
..WARNING: webio overlaps winhttp
.WARNING: credssp overlaps mswsock
.WARNING: IPHLPAPI overlaps mswsock
.WARNING: winnsi overlaps mswsock
............
wow64cpu!CpupSyscallStub+0x9:
00000000`74e42e09 c3 ret
Any ideas as to how I can get more information from the dump, or how to use it to find where my stack overflow error is occurring?
The problem you are facing is that the process is 32-bit but is running on 64-bit Windows, so your dump is a 64-bit dump. To make use of the dump, you have to run the following commands:
.load wow64exts
.effmach x86
!analyze -v
The last command should give you a meaningful stack trace.
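If there are several dumps to go through, the same three commands can be run non-interactively with the console debugger. A sketch in Python, assuming `cdb.exe` from the Debugging Tools for Windows is on the PATH and using its `-z` (open dump) and `-c` (initial command string) options; the function names and the dump file name are placeholders:

```python
import subprocess

# Debugger commands from the answer above, followed by 'q' to quit cdb.
WOW64_COMMANDS = [".load wow64exts", ".effmach x86", "!analyze -v", "q"]


def cdb_argv(dump_path, cdb="cdb.exe"):
    """Build a cdb command line that opens a dump and runs the commands."""
    return [cdb, "-z", dump_path, "-c", "; ".join(WOW64_COMMANDS)]


def analyze(dump_path):
    """Run cdb on one dump and return whatever it printed."""
    result = subprocess.run(cdb_argv(dump_path), capture_output=True, text=True)
    return result.stdout
```

`analyze("app.dmp")` would then return the `!analyze -v` text for later searching.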
This page provides lots of useful information and methods for analyzing the problem:
http://www.dumpanalysis.org/blog/index.php/2007/09/11/crash-dump-analysis-patterns-part-26/
You didn't mention whether your code is managed or unmanaged. Assuming it is unmanaged, run these in the debugger:
.symfix
.reload
~*kb
Look through the call stacks of all threads and identify the thread that caused the stack overflow. It is easy to spot, because its call stack will be extra long. Switch to that thread using the command ~<N>s, where <N> is the thread number, then use k 200 to dump up to 200 lines of its call stack. At the very bottom of the call stack you should be able to see the code that originated the nested loop.
If your code is managed, use the SOS extension to dump call stacks.
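With many threads, picking out the longest stack by eye gets tedious. The sketch below counts frames per thread in saved `~*kb` output. It assumes the usual layout (a header line containing `Id: <pid>.<tid>` per thread, then frame lines that start with a hex address); the function names are invented, and because the debugger caps how many frames it prints, several threads may tie at that cap.

```python
import re

# Thread header, e.g. '.  0  Id: 1a2c.1f44 Suspend: 1 Teb: 7efdd000 Unfrozen'
THREAD_HEADER = re.compile(r"^[.#\s]*(\d+)\s+Id:\s+[0-9a-f]+\.[0-9a-f]+", re.I)
# Frame line: starts with a hex address (x64 addresses contain a backtick).
FRAME_LINE = re.compile(r"^[0-9a-f`]{8,}\s", re.I)


def frames_per_thread(stack_dump):
    """Return {thread number: number of frame lines} for ~*kb output."""
    counts = {}
    current = None
    for line in stack_dump.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = int(header.group(1))
            counts[current] = 0
        elif current is not None and FRAME_LINE.match(line):
            counts[current] += 1
    return counts


def deepest_thread(stack_dump):
    """Return the thread number with the most frames, or None if empty."""
    counts = frames_per_thread(stack_dump)
    return max(counts, key=counts.get) if counts else None
```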
I compile my application on a Windows XP SP3 machine. After it compiles, I try to launch it, and Windows replies with:
Unable to start program 'xx'. This
application has failed to start
because the application configuration
is incorrect. Review the manifest file
for possible errors. Reinstalling the
application may fix this problem. For
more details, please see the
application event log.
Copying DLL files didn't help (see my previous question if you want).
I then launched Process Monitor from Sysinternals.
I'll summarise the report here, since it is not very long.
The process starts, then its first thread. The following calls come next:
QueryNameInformationFile() of my exe file => SUCCESS
Load Image() of my exe file => SUCCESS
Load Image() of ntdll.dll => SUCCESS
QueryNameInformationFile() of my exe file => SUCCESS
CreateFile() tries to create it in C:\WINDOWS\Prefetch\blahbla.pf => NAME NOT FOUND
Then the thread and the process exit.
I've added my user with full control on that folder (C:\WINDOWS\Prefetch), but that did not help.
How can I make it work? I feel that if I get past this step, my application will work as expected.
Edit: I'm adding the Procmon details about the error:
18:13:40,4305346 xxx.exe 3172 CreateFile C:\WINDOWS\Prefetch\XXX.EXE-1FA9609A.pf NAME
NOT FOUND Desired Access: Generic
Read, Disposition: Open, Options:
Synchronous IO Non-Alert, Attributes:
n/a, ShareMode: None, AllocationSize:
n/a
Is Task Scheduler running on the PC? A way to repair Prefetch is detailed here, in case that is causing the problem:
http://members.rushmore.com/~jsky/id14.html