Exception record for hangs - windbg

Whenever we analyze a hang dump and run .exr -1,
the following result is seen:
ExceptionAddress: 000000
ExceptionCode: cfffffff (Application Hang)
.exr -1 means "last exception thrown".
Who throws the exception when the application is hung? Normally, in the case of a crash, the system raises an exception and KiUserDispatch... catches it and proceeds.
But what happens when a hang occurs? Does the system throw an exception? Where does that exception record come from?

An application hang does not generate any exception, so you must examine the call stacks of all threads and try to figure out “who is waiting for whom”.
The following commands may help you:
a) Examine all stacks
~*e ?? @$tid;kvn
b) List locked critical sections with the owners' stacks
!cs -l -o
You could also try DebugDiag to run a Crash/Hang analysis.

Concurrent modification should not be logged as ERROR?

We have an application running on WildFly 17. I have a scenario, which occurs occasionally, in which two background threads access the same entity:
Thread A deletes the entity (for good reason)
Thread B is working on slightly older data and attempts to update the entity
When thread B is the later of the two, it fails due to the concurrent modification. This works correctly. It is retried automatically and finds that nothing needs to be done anymore (because the entity has been deleted). That is the intended behavior, when these two threads collide. All is fine!
However, I find that this is logged as ERROR by CMTTxInterceptor:
2020-03-31 16:51:35,463 +0200 ERROR: as.ejb3.invocation - WFLYEJB0034: EJB Invocation failed on component ... for method ... throws ...:
javax.ejb.EJBTransactionRolledbackException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:203) [wildfly-ejb3-17.0.1.Final.jar:17.0.1.Final]
at org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:364) [wildfly-ejb3-17.0.1.Final.jar:17.0.1.Final]
at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:144) [wildfly-ejb3-17.0.1.Final.jar:17.0.1.Final]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
...
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_171]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
Caused by: javax.persistence.OptimisticLockException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
at org.hibernate.jpa.spi.AbstractEntityManagerImpl.wrapStaleStateException(AbstractEntityManagerImpl.java:1729) [hibernate-entitymanager.jar:5.0.12.Final]
... at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:185) [wildfly-ejb3-17.0.1.Final.jar:17.0.1.Final]
... 262 more
Caused by: org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched(Expectations.java:67) [hibernate-core.jar:5.0.12.Final]
at org.hibernate.jdbc.Expectations$BasicExpectation.verifyOutcome(Expectations.java:54) [hibernate-core.jar:5.0.12.Final]
at org.hibernate.engine.jdbc.batch.internal.NonBatchingBatch.addToBatch(NonBatchingBatch.java:46) [hibernate-core.jar:5.0.12.Final]
at org.hibernate.persister.entity.AbstractEntityPersister.delete(AbstractEntityPersister.java:3261) [hibernate-core.jar:5.0.12.Final]
...
It seems to me that this log is incorrect, mostly because this is not an ERROR. Concurrent modification is something normal that is to be expected, and our application logic handles it. This log will distract my colleagues at the hotline.
Do you agree that the logging is incorrect, or am I missing something?
I think I will disable logging for "as.ejb3.invocation".
In your case, this exception was thrown because Hibernate detected that the entity previously fetched from the database was changed (deleted) during the current transaction. So, there is nothing to update.
In this specific case, I think you can ignore or swallow the exception. In other cases it may be good to know about this exception, so I recommend swallowing it where it is expected rather than disabling the whole log category.
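As a rough illustration of "swallowing" only the expected failure, here is a minimal, self-contained Java sketch. Everything in it is an assumption for illustration: StaleEntityException is a local stand-in for javax.persistence.OptimisticLockException (or the EJBTransactionRolledbackException wrapping it), and updateIgnoringStale is a hypothetical helper, not part of WildFly or Hibernate. Where exactly such a catch has to sit to keep WFLYEJB0034 out of the log depends on the bean and transaction setup, which the question does not show.

```java
class SwallowStaleUpdate {
    // Local stand-in for javax.persistence.OptimisticLockException (assumption).
    static class StaleEntityException extends RuntimeException {
        StaleEntityException(String message) {
            super(message);
        }
    }

    // Hypothetical helper: runs the update and reports whether it was applied.
    // Only the expected stale-state failure is swallowed; anything else propagates.
    static boolean updateIgnoringStale(Runnable update) {
        try {
            update.run();
            return true;
        } catch (StaleEntityException e) {
            // Expected when another thread deleted the entity first:
            // note it quietly instead of reporting an ERROR.
            System.out.println("DEBUG: skipped stale update: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        boolean applied = updateIgnoringStale(() -> {
            throw new StaleEntityException("actual row count: 0; expected: 1");
        });
        System.out.println("applied = " + applied);
    }
}
```

Because the catch is limited to one exception type, unrelated failures still surface through the normal logging, which is the point of not turning off "as.ejb3.invocation" entirely.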

UndeliverableException thrown within a RxAndroidBle stream

I have a misbehaving BLE device (temp sensor) that keeps throwing a status 8 (GATT_INSUF_AUTHORIZATION or GATT_CONN_TIMEOUT) exception every time I try to connect to it. I'm not concerned about this exception, as the device is faulty.
However, RxJava 2 keeps notifying me that I have not handled the error correctly when using RxAndroidBle (1.9.1).
This is my code:
rxBleClient
.getBleDevice(macAddress)
.establishConnection(false)
.flatMapSingle { it.readRssi() }
.subscribe({ "test1:Success" }, { "test1:error" })
and the error:
I/RxBle#GattCallback: MAC='E9:CF:8A:D0:01:19' onConnectionStateChange(), status=8, value=0
D/RxBle#ClientOperationQueue: FINISHED ConnectOperation(147547253) in 10257 ms
D/RxBle#ConnectionOperationQueue: Connection operations queue to be terminated (MAC='E9:CF:8A:D0:01:19')
com.polidea.rxandroidble2.exceptions.BleDisconnectedException: Disconnected from MAC='E9:CF:8A:D0:01:19' with status 8 (GATT_INSUF_AUTHORIZATION or GATT_CONN_TIMEOUT)
at com.polidea.rxandroidble2.internal.connection.RxBleGattCallback$2.onConnectionStateChange(RxBleGattCallback.java:77)
at android.bluetooth.BluetoothGatt$1$4.run(BluetoothGatt.java:249)
at android.bluetooth.BluetoothGatt.runOrQueueCallback(BluetoothGatt.java:725)
at android.bluetooth.BluetoothGatt.-wrap0(Unknown Source:0)
at android.bluetooth.BluetoothGatt$1.onClientConnectionState(BluetoothGatt.java:244)
at android.bluetooth.IBluetoothGattCallback$Stub.onTransact(IBluetoothGattCallback.java:70)
at android.os.Binder.execTransact(Binder.java:697)
D/BleDeviceManagerNew$observeRssiTest: test1:error
E/plication$setupApp: Terminal Exception From RXJAVA was Not handled correctly
io.reactivex.exceptions.UndeliverableException: The exception could not be delivered to the consumer because it has already canceled/disposed the flow or the exception has nowhere to go to begin with. Further reading: https://github.com/ReactiveX/RxJava/wiki/What's-different-in-2.0#error-handling | com.polidea.rxandroidble2.exceptions.BleDisconnectedException: Disconnected from MAC='E9:CF:8A:D0:01:19' with status 8 (GATT_INSUF_AUTHORIZATION or GATT_CONN_TIMEOUT)
at io.reactivex.plugins.RxJavaPlugins.onError(RxJavaPlugins.java:367)
at io.reactivex.internal.operators.observable.ObservableUnsubscribeOn$UnsubscribeObserver.onError(ObservableUnsubscribeOn.java:67)
at io.reactivex.internal.operators.observable.ObservableSubscribeOn$SubscribeOnObserver.onError(ObservableSubscribeOn.java:63)
I'm not sure what else I should do. I've implemented a 'catch all' solution:
RxJavaPlugins.setErrorHandler { e -> Timber.e(e, "Terminal Exception From RXJAVA was Not handled correctly") }
However, I don't see that as a good solution, since I expected to be able to handle the exception on the stream. Any suggestions on where I went wrong?
Your code is fine. The library has a flaw that does not let you achieve your desired behaviour. More on the topic is on this library's wiki page.
While it is possible to design an API that would not throw UndeliverableException, it would need a separate error Observable or Completable for the BluetoothAdapter turning off and another one for an RxBleConnection disconnect. The user would be responsible for mixing those into their chain appropriately.
The current API does not allow it.
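In the meantime, the global handler can at least be narrower than a catch-all: ignore only undeliverable errors whose cause is a BLE exception (those were already delivered to the subscriber once) and fail loudly on everything else. The sketch below shows just that filtering logic in plain Java. The two exception classes are local stand-ins for io.reactivex.exceptions.UndeliverableException and com.polidea.rxandroidble2.exceptions.BleException, and UndeliverableFilter, isIgnorable and HANDLER are hypothetical names; in an app the same lambda body would presumably be passed to RxJavaPlugins.setErrorHandler.

```java
import java.util.function.Consumer;

class UndeliverableFilter {
    // Local stand-in for io.reactivex.exceptions.UndeliverableException (assumption).
    static class UndeliverableException extends RuntimeException {
        UndeliverableException(Throwable cause) {
            super(cause);
        }
    }

    // Local stand-in for com.polidea.rxandroidble2.exceptions.BleException (assumption).
    static class BleException extends RuntimeException {
        BleException(String message) {
            super(message);
        }
    }

    // True only for an undeliverable wrapper around a BLE exception.
    static boolean isIgnorable(Throwable t) {
        return t instanceof UndeliverableException && t.getCause() instanceof BleException;
    }

    // Hypothetical handler: drops ignorable errors, rethrows everything else.
    static final Consumer<Throwable> HANDLER = t -> {
        if (isIgnorable(t)) {
            return;
        }
        throw new RuntimeException("Unexpected Throwable in RxJava error handler", t);
    };

    public static void main(String[] args) {
        HANDLER.accept(new UndeliverableException(new BleException("Disconnected with status 8")));
        System.out.println("BLE disconnect ignored");
    }
}
```

Rethrowing unknown errors keeps genuine bugs visible, which the catch-all logger in the question would hide.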

Multiple GPU code on Matlab runs for few seconds only

I am running the following MATLAB code on a system with one GTX 1080 and a K80 (with 2 GPUs):
delete(gcp('nocreate'));
parpool('local',2);
spmd
    gpuDevice(labindex+1)
end
reset(gpuDevice(2))
reset(gpuDevice(3))
parfor i=1:100
    SingleGPUMatlabCode(i);
end
The code runs for around a second. When I rerun it after a few seconds, I get the message:
Error using parallel.gpu.CUDADevice/reset
An unexpected error occurred during CUDA execution. The
CUDA error was:
unknown error
Error in CreateDictionary
reset(gpuDevice(2))
I tried increasing TdrDelay, but it did not help.
Something in your GPU code is causing an error on the device. Because the code is running asynchronously, this error is not picked up until the next synchronisation point, which is when you run the code again. I would need to see the contents of SingleGPUMatlabCode to know what that error might be. Perhaps there's an allocation failure or an out of bounds access. Errors that aren't correctly handled will get converted to 'unknown error' at the next CUDA operation.
Try adding wait(gpuDevice) inside the loop to identify when the error is occurring.
If either device 2 or 3 is the GTX 1080, you may have discovered an issue with MATLAB's restricted support for the Pascal architecture. See https://www.mathworks.com/matlabcentral/answers/309235-can-i-use-my-nvidia-pascal-architecture-gpu-with-matlab-for-gpu-computing
If this is caused by the Windows timeout, you would see a several second screen blackout.

c++ amp matrixmultiplication accelerator_view_removed at memory location

I am playing with the matrixmultiplication project downloadable from the bottom of the site:
http://blogs.msdn.com/b/nativeconcurrency/archive/2011/11/02/matrix-multiplication-sample.aspx
When I change the values of M, N, W from 256 to 4096, an unhandled exception is thrown:
Unhandled exception at 0x7630C42D in MatrixMultiplication.exe: Microsoft C++ exception: Concurrency::accelerator_view_removed at memory location 0x001CE2F0.
The console output is:
Using device: NVIDIA GeForce GT 640M
MatrixDiemnsion C(4096x4096) = A(4096x4096) * B(4096x4096)
CPU(single core) exec completed.
AMP Simple
The next statement to be executed is leaving the function mxm_amp_simple.
I am using VS2013 Ultimate on Windows 7 Professional N.
Why does this occur, and how can I prevent it?
EDIT: I have found that the greatest value for M,N,W with which AMP Simple does not lead to a breakpoint being hit is 2800 (M=2800, N=2800, W=2800).
AMP Tiled on the other hand sometimes leads to a breakpoint, and in other cases executes correctly for M,N,W equal to 4096.
The exception is accompanied by a system error message:
"Display driver stopped responding and has recovered. Display driver NVIDIA Windows Kernel Mode Driver, Version 331.65 stopped responding and has successfully recovered."
In case someone else needs this.
This issue is most likely caused by Timeout Detection and Recovery (TDR). If a kernel runs for more than 2 seconds, Windows will kill it and a Concurrency::accelerator_view_removed exception is thrown. The easiest way to check this is to wrap the code in a try / catch block, e.g.:
try {
    av_c.synchronize();
} catch (const Concurrency::accelerator_view_removed& e) {
    printf("%s\n", e.what());
}
Microsoft has a blog post with more information, including pointers to instructions on how to disable TDR.

Perl tk main window error

I have a Perl Tk application.
If I move the main window so that it's not right up to the uppermost part of the screen, then the next time the following code is executed, the script fails:
$canvas_fimage_real=$canvas_fimage->Subwidget('canvas');
$canvas_fimage_real=$canvas_fimage unless $canvas_fimage_real;
my $canvas_id=$canvas_fimage_real->id;
my $canvas_fimage_photo=$main_window::main_window->Photo(-format=>'Window', -data=>oct $canvas_id );
And it fails with the following error message:
X Error of failed request: BadMatch (invalid parameter attributes)
Major opcode of failed request: 73 (X_GetImage)
Serial number of failed request: 2796
Current serial number in output stream: 2796
The script crashes at the Photo command.
How can I fix this?
Is this a window that is wholly on the screen? The snapshotting facility only works with what is visible on-screen (a low-level X11 condition; not negotiable). As such, you should file a bug report as the snapshot code shouldn't ask for things that it can't get.
Of course, if the window is fully on screen and you're getting that error message anyway, that's a serious problem. File a bug report in that case too!