failed using cuda-gdb to launch program with CUPTI calls - callback

I'm having a weird issue: I have a program that uses the CUPTI Callback API to monitor the kernels it launches. It runs fine when launched directly, but when I run it under cuda-gdb it fails with the following error:
error: function cuptiSubscribe(&subscriber, (CUpti_CallbackFunc)my_callback, NULL) failed with error CUPTI_ERROR_NOT_INITIALIZED
I've tried all the examples in CUPTI/samples and concluded that programs using the Callback API or the Activity API fail under cuda-gdb. (They all behave well without cuda-gdb.) But the failure mode differs:
If the program makes Activity API calls, then when run under cuda-gdb it hangs for about a minute and then exits with this error:
The CUDA driver has hit an internal error. Error code: 0x100ff00000001c Further execution or debugging is unreliable. Please ensure that your temporary directory is mounted with write and exec permissions.
If the program makes Callback API calls, like my own program, it fails much sooner, with the same error as my program:
CUPTI_ERROR_NOT_INITIALIZED
Has anyone experienced this kind of issue? Any help would be much appreciated!

According to an NVIDIA forum posting here, also referred to here, the CUDA "tools" must be used one at a time. These tools include:
CUPTI
any profiler
cuda-memcheck
a debugger
Only one of these can be "in use" on a code at a time. Using a profiler, cuda-memcheck, or a debugger independently is fairly easy. For developers who use CUPTI and also want to run another CUDA "tool" on the same code, a possible takeaway is to build in a way to disable CUPTI use in the application whenever another tool is needed.

Related

Debugging JavaScript in Edge & VS Code causes DCOM 10016 Event / Access violation

Environment: Windows 10
IDE: Visual Studio Code
Extensions: Live Server v5.7.5 by Ritwick Dey and Microsoft Edge Tools for VS Code v2.1.0
When I am debugging JavaScript files, if I put a breakpoint in an exported class, I get the error shown in the image below.
I cleared the Windows System log, and right after I start debugging and get the error, a new entry is in the Windows system log. This happens every time without fail. The error in the Windows System log is:
The application-specific permission settings do not grant Local Activation permission
for the COM Server application with CLSID
{2593F8B9-4EAF-457C-B68A-50F6B8EA6B54}
and APPID
{15C20B67-12E7-4BB6-92BB-7AFF07997402}
to the user DOMAIN\\local_user SID (S-1-5-21-2158192427-3696246665-2163083460-1135) from
address LocalHost (Using LRPC) running in the application container Unavailable SID
(Unavailable). This security permission can be modified using the Component Services
administrative tool.
My question is how do I fix this issue?
Update 7/26/2022:
If I remove the breakpoint from the constructor of the class and put it elsewhere in the class, it works without any errors. The error occurs if the breakpoint is in the constructor.
I found the answer and it is not anything above.
Well, I finally solved the problem. I am updating this answer so that someone else will know the answer without going down all the wrong paths that I went down. The problem was not any of the tools. The problem was with the code. While the code was technically correct, executing it with a breakpoint caused the error I described above. I was able to fix this by moving all the class member variables to the top of the class, before all member functions. The error only occurs when you add a breakpoint before the member variables are defined. Code analyzers say there is nothing wrong with the code. The error message could be more informative!
If you want to see example code associated with this problem, see this post.

PowerShell Azure Function: How to fix "Failed to start a new language worker for runtime: powershell"?

In an Azure Function App containing one PowerShell function I get the following log message regularly:
"Failed to start a new language worker for runtime: powershell."
It has "Error" level and thus triggers our error alert notifications. I'm not entirely sure when this message appears. It might appear around restarting the function app, which might explain it somewhat. I think I remember it appearing during normal function operation - but I might be mistaken.
There is a rather involved thread over here about a similar message for the dotnet runtime, which suggests there are configuration options to set: Azure Function - Failed to start a new language worker for runtime: dotnet-isolated
My function app runtime version is ~4, PS Core version is 7.0, platform is 64 bit and Windows.
What is the error message trying to tell me? Can I ignore it? Is there a configuration I can add to fix it?
After monitoring this for a while over the course of multiple deployments and restarts of PowerShell-based Function Apps my conclusion is this:
The error "Failed to start a new language worker for runtime: powershell." only appears when restarting the Function App. Thus my takeaway is that it can be ignored.

Keep getting VisaIOErrors after crash, unless device and ipython are rebooted

We are controlling a Keithley DMM6500 using the pyVisa library. In our setup, we are keeping an iPython kernel running (through Spyder).
The problem we're running into is the following: whenever a function that interacts with the DMM encounters an unhandled exception (like a KeyboardInterrupt), any subsequent calls to the DMM result in the error VI_ERROR_SYSTEM_ERROR (-1073807360): Unknown system error (miscellaneous error).
In order to fix this, we have tried to call device.clear() and device.close() / device.open(), but this doesn't seem to work. Even rebooting the device does not work. The only thing that fixes the issue, it seems, is to completely restart our iPython kernel.
Is there any way to programmatically restore communication with the device, such that we can avoid having to reboot the ipython kernel?
Some of your question is unclear, so my answer might not help. However, it sounds like the terminal is locking the connection and you're losing the reference.
The two ways I have handled this in the past:
1) Open the connection when talking to the device and close it when finished. This is useful if your connection is unstable, but opening and closing the connection repeatedly takes a fraction longer.
2) In your program, wrap the instrument communication in a try/except, and when the program errors, close the connection so that it doesn't become locked.
Example:
try:
    run_program()
except:
    close_connection_to_all_devices()  # build a function to close the connection to all devices
    dump_any_unsaved_data()  # optionally dump variables to see what the data was when it errored, for debugging
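A variation on the second approach is to put the cleanup in a finally clause, so it runs on normal exit as well as on any exception, including KeyboardInterrupt. The following is only a minimal sketch: instrument_session, FakeInstrument and demo are hypothetical names, and FakeInstrument is a stand-in so the code runs without hardware. With PyVISA, the open_resource argument would presumably be the open_resource method of a pyvisa.ResourceManager instance.

```python
from contextlib import contextmanager


@contextmanager
def instrument_session(open_resource, address):
    """Open an instrument, hand it to the caller, and always close it.

    `open_resource` is any callable returning an object with a .close()
    method (assumption: e.g. pyvisa.ResourceManager().open_resource).
    """
    inst = open_resource(address)
    try:
        yield inst
    finally:
        # Runs on normal exit, on exceptions, and on KeyboardInterrupt.
        inst.close()


class FakeInstrument:
    """Hypothetical stand-in so the sketch runs without hardware."""

    def __init__(self, address):
        self.address = address
        self.closed = False

    def close(self):
        self.closed = True


def demo():
    """Simulate an interrupted measurement; report whether the session closed."""
    holder = {}
    try:
        with instrument_session(FakeInstrument, "FAKE::ADDRESS") as inst:
            holder["inst"] = inst
            raise KeyboardInterrupt  # simulate Ctrl+C mid-measurement
    except KeyboardInterrupt:
        pass
    return holder["inst"].closed
```

Whether closing the session is enough to recover from VI_ERROR_SYSTEM_ERROR on this particular setup is not something the sketch can confirm; it only guarantees that close() is called.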

Profiling foswiki with NYTProf results in incomplete profile data

I have a Foswiki installation which is really slow (~60 seconds for an uncached page). I've tried to profile the installation with NYTProf, following http://foswiki.org/Support/NYTProfDebugging, with the following command:
> sudo -u www-data NYTPROF="file=/tmp/nytprof.out:addpid=1:endatexit=1" perl -wTd:NYTProf view -topic Some.Topic -username MyUsername
The script fails with an exit code 141 when I run it with profiler. If I run it without the profiler (removing -d:NYTProf), it exits successfully and produces output.
After profiling, I got a bunch of profile files in my /tmp directory:
nytprof.out.[841-1860]
But when I try to merge these files, I get an error for the first file:
> nytprofmerge nytprof.out.*
Profile data incomplete, inflate error -5 ((null)) at end of input file, perhaps the process didn't exit cleanly or the file has been truncated (refer to TROUBLESHOOTING in the documentation)
I can merge the files without the first file, but the results are useless: they show only 87 calls to Foswiki::Sandbox::CORE:open and that's it.
Do I have any chance of getting a valid profiling result? Or is there another tool that I can use in this case?
I'm not sure why you can't get NYTProf to work; we've used it to figure out some performance issues in Foswiki 2.0.2, which have been partially addressed in Foswiki 2.0.3. There are a couple of issues going on, but one major cause is our internal conversion to Unicode, combined with some regex issues in Perl versions before 5.20: https://rt.perl.org/Public/Bug/Display.html?id=66852
Foswiki 2.0.3 made the following performance updates:
Changed some heavily called internal functions from regular expressions to index()
Changed EditRowPlugin to generate less HTML that requires processing by regular expressions in the rendering module.
Made some other improvements to reduce excessive re-reading of topics.
If 2.0.3 doesn't significantly help, check whether the problem pages have large tables in them. If so, you might try disabling the EditRowPlugin and using EditTablePlugin instead.
Other than that, you might try our official support channel #foswiki on IRC, http://irclogs.foswiki.org/
The script fails with an exit code 141 when I run it with profiler.
That suggests the process received a SIGPIPE signal. The sigexit option may help.
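The reasoning behind that reading of exit code 141: POSIX shells such as bash report 128 + N when a process is terminated by signal N, and 141 - 128 = 13, which is SIGPIPE on Linux. A small sketch of that arithmetic in Python follows; the helper name signal_from_exit_code is hypothetical, and the sketch assumes a POSIX system where SIGPIPE is defined.

```python
import signal


def signal_from_exit_code(code):
    """Return the signal name for a shell exit status above 128, else None.

    Assumes the common shell convention: exit status = 128 + signal number.
    """
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # No signal with that number is defined on this platform.
        return None


if __name__ == "__main__":
    print(signal_from_exit_code(141))
```

This only decodes the status; it does not show which write hit the closed pipe, so confirming the cause still needs the deeper digging described in this answer.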
If I run it without the profiler ... it exits successfully and produces output.
You're using sudo so permissions might be an issue, but that's just a guess. You'll need to dig deeper to confirm if a SIGPIPE is being received and why.
I'm not familiar with foswiki. Perhaps someone in that community could be more helpful.

When I run myTest.js in Chrome an error appears, what's wrong with my dalek.js file?

This is the error message that I get every time I run myTest.js. This started happening after I installed the IE driver in my VirtualBox so that I can test in IE browsers.
cor03rock at Rockys-MacBook-Pro in ~/Desktop/Jalekoo on dev*
💩 dalek myTest.js -b chrome
/Users/cor03rock/Desktop/Jalekoo/node_modules/dalekjs/lib/dalek.js:333
this.driverEmitter.emit('killAll');
^
TypeError: Cannot call method 'emit' of undefined
at Object.Dalek._shutdown (/Users/cor03rock/Desktop/Jalekoo/node_modules/dalekjs/lib/dalek.js:333:24)
at process.EventEmitter.emit (events.js:95:17)
at process._fatalException (node.js:272:26)
Jalekoo is my Dalek folder.
It would probably help if you could post a reduced test case and which DalekJS and NodeJS versions you're on, as well as your operating system. Your project's directory layout might also help in spotting the error.
You also mention having installed the IE driver, but your example call shows -b chrome. You may have mixed those up.
Also, does that error occur with other browsers (PhantomJS for example) or only for IE (Chrome?).
Would love to help you out, but I need some more information to be able to.