add_disk() hangs on insmod - linux-device-driver

add_disk() hangs on insmod - linux-device-driver

I am writing a Linux block device driver and I have a lot of the initialisation stuff working. However, when I finally call add_disk(), the module hangs during insmod.
The offending snippet is here:
set_capacity(gendisk, dev->nsectors);
add_disk(gendisk);
//this line is never reached

This appears to be caused by setting the capacity with set_capacity() before adding the disk. According to this mailing list, add_disk should be called on a gendisk with gendisk->capacity = 0, otherwise it hangs in check_partition().
The following appears to work:
set_capacity(gendisk, 0)
add_disk(gendisk);
set_capacity(gendisk, dev->nsectors);

Related

Boost Asio tcp::iostream construction raise an Access Violation Exception on every second use

I am trying to use the implementation of std::iostream provided by boost::asio on top of boost::asio::ip::tcp::socket. My code replicate almost line to line the example that is published in Boost Asio's documentation:
#include <iostream>
#include <stdexcept>
#include <boost/asio.hpp>
int main()
{
using boost::asio::ip::tcp;
try
{
boost::asio::io_service io_service;
tcp::endpoint endpoint(tcp::v4(), 8000);
tcp::acceptor acceptor(io_service, endpoint);
for (;;)
{
tcp::iostream stream; // <-- The exception is triggered on this line, on the second loop iteration.
boost::system::error_code error_code;
acceptor.accept(*stream.rdbuf(), error_code);
std::cout << stream.rdbuf() << std::flush;
}
}
catch (std::exception& exception)
{
std::cerr << exception.what() << std::endl;
}
return 0;
}
The only difference is the use I make of the resulting tcp::iostream: I forward everything I receive to the standard output.
When I compile this code with VisualStudio2019/toolset v142 and Boost from the NuGet boost-vc142, I get an Access Violation Exception only in the second iteration in the for loop, in the function
template <typename Service>
Service& service_registry::use_service(io_context& owner)
{
execution_context::service::key key;
init_key<Service>(key, 0);
factory_type factory = &service_registry::create<Service, io_context>;
return *static_cast<Service*>(do_use_service(key, factory, &owner));
} // <-- The debugger show the exception was raised on this line
in asio/detail/impl/service_registry.hpp. So the first iteration everything goes as planned, the connection is accepted, the data shows up on the standard output, and as soon as the stream is instanciated on the stack for the second time, the exception pops.
I don't have a high confidence in the accuracy of this location of the exception reported by the debugger. For some reason, the stack seams to be messed up and show only one frame.
If the declaration of stream is moved out of the loop, no exception is raised any more but then I need to stream.close() at the end of the loop, or nothing shows up on the standard output except the data from the first client's connection.
Basically, as soon as I try to instanciate more than one boost::asio::tcp::iostream (not necessarily at the same time), the exception is raised.
I tried the exact same code under linux (Arch linux, latest version of g++, same version of Boost) and everything works perfectly.
I could work around this issue by not using iostreams, but my idea is to feed the data received on the tcp socket to a parser which only accept implementations of std::iostream, hence I would still need to wrap asio's tcp socket in an homebrewed (and mediocre) implementation of std::iostream.
Does anybody have an idea on what's wrong with this setup, if I missed a crucial #define somewhere or anything?
Update:
Subsequent investigation show that the only situation where the access violation happens is when the executable is run from within Visual Studio (typ. from the menu Debug -> Start Debugging).
The build process seems to have no effect (calling directly cl.exe, using MSBuild, using devenv.exe).
Moreover, if the executable is run from a command prompt, and only then the debugger is attached, no access violation happens.
At this point, the issue is most likely not linked to the code itself.

Okay, it was exceedingly painful to test this on windows.
Of course I first tried on Linux (clang/gcc) and MingW 8.1 on windows.
Then I bit the bullet and jumped the hoops to get MSVC in command line with boost packages¹.
I cheated by manually copying the .lib/.dll for boost_{system,date_time,regex} into the working directory so the command line stayed "wieldy":
C:\work>C:\Users\sghee\Downloads\nuget.exe install boost_system-vc142
C:\work>C:\Users\sghee\Downloads\nuget.exe install boost_date_time-vc142
C:\work>C:\Users\sghee\Downloads\nuget.exe install boost_regex-vc142
(Be sure to get some coffee during those)
C:\work\> cl /EHsc test.cpp /I .\boost.1.72.0.0\lib\native\include /link
Now I can run test.exe
C:\work\> test.exe
And it listens fine, accepts connections (sequentially, not simultaneously). If you connect a second client while the first is still connected, it will be queued and be accepted only after the first disconnects. That's fine, because it's what you expect with the synchronous accept and loop.
I used Ncat.exe (from Nmap) to connect:
C:\Program Files (x86)\Nmap>.\ncat.exe localhost 8000
Quirk: The buffering was fine with the MSVC cl.exe build (linewise) as opposed to MingW behaviour, even though MingW also uses ws2_32.dll. #trivia
I know this doesn't "help", but maybe you can compare notes and see what is different with your system.
Video Of Test
¹ (that's a tough job without VS and also I - obviously - ran out of space, because 50GiB for a VM can't be enough right)

Problems in exit code using C++ AMP

Environment: Visual Studio 2017, Windows 10 ver. 1709. Compiling mode: release.
When I call:
accelerator_view acc_view = accelerator().default_view;
an exception is raised (see figure link below), but the code performs fine afterwards.
But when the executable process exits and I call:
::GetExitCodeProcess(hChildProcess, &retVal);
from a caller process, instead of returning 0, it returns a garbage value in retVal.
Digging the source code, the problem seems to be in the snipped code below (SchedulerBase.cpp, line 149)
// Auto-reset event that is not signalled initially
m_hThrottlingEvent = platform::__CreateAutoResetEvent();
// Use a trampoline for UMS
if (!RegisterWaitForSingleObject(&m_hThrottlingWait, m_hThrottlingEvent, SchedulerBase::ThrottlerTrampoline, this, INFINITE, WT_EXECUTEDEFAULT))
{
throw scheduler_resource_allocation_error(HRESULT_FROM_WIN32(GetLastError()));
}
I think it is beyond my hands to fix it, because the code above is inside MFC. The same code works well when compiling with Visual Studio 2013. Refer to the figure attached of the stack, showing the raised exception (and catched inside) when I call
accelerator_view acc_view = accelerator().default_view;
The question: how to clean up the AMP before exiting and the getting the correct result when calling GetExitCodeProcess()?
Here is the figure:

Solved! If you add
concurrency::amp_uninitialize();
after using AMP framework, when the caller process calls
::GetExitCodeProcess(hChildProcess, &retVal);
The retVal parameter is filled correctly.

CLIPS (clear) command fails / throws exception in pyclips

I have a pyclips / clips program for which I wrote some unit tests using pytest.
Each test case involes an initial clips.Clear() followed by the execution of real clips COOL code via clips.Load(rule_file.clp). Running each test individually works fine.
Yet, when telling pytest to run all tests, some fail with ClipsError: S03: environment could not be cleared. In fact, it depends on the order of the tests in the .py file. There seem to be test cases, that cause the subsequent test case to throw the exception.
Maybe some clips code is still "in use" so that the clearing fails?
I read here that (clear)
Clears CLIPS. Removes all constructs and all associated data structures (such as facts and instances) from the CLIPS environment. A clear may be performed safely at any time, however, certain constructs will not allow themselves to be deleted while they are in use.
Could this be the case here? What is causing the (clear) command to fail?
EDIT:
I was able to narrow down the problem. It occurs under the following circumstances:
test_case_A comes right before test_case_B.
In test_case_A there is a test such as
(test (eq (type ?f_bio_puts) clips_FUNCTION))
but f_bio_puts has been set to
(slot f_bio_puts (default [nil]))
So testing the type of a slot variable, which has been set to [nil] initially, seems to cause the (clear) command to fail. Any ideas?
EDIT 2
I think I know what is causing the problem. It is the test line. I adapted my code to make it run in the clips Dialog Windows. And I got this error when loading via (batch ...)
[INSFUN2] No such instance nil in function type.
[DRIVE1] This error occurred in the join network
Problem resided in associated join
Of pattern #1 in rule part_1
I guess it is a bug of pyclips that this is masked.

Change the EnvClear function in the CLIPS source code construct.c file adding the following lines of code to reset the error flags:
globle void EnvClear(
void *theEnv)
{
struct callFunctionItem *theFunction;
/*==============================*/
/* Clear error flags if issued */
/* from an embedded controller. */
/*==============================*/
if ((EvaluationData(theEnv)->CurrentEvaluationDepth == 0) &&
(! CommandLineData(theEnv)->EvaluatingTopLevelCommand) &&
(EvaluationData(theEnv)->CurrentExpression == NULL))
{
SetEvaluationError(theEnv,FALSE);
SetHaltExecution(theEnv,FALSE);
}

Multiple GPU code on Matlab runs for few seconds only

I am running the following MATLAB code on a system with one GTX 1080 and a K80 (with 2 GPUs)
delete(gcp('nocreate'));
parpool('local',2);
spmd
gpuDevice(labindex+1)
end
reset(gpuDevice(2))
reset(gpuDevice(3))
parfor i=1:100
SingleGPUMatlabCode(i);
end
The code runs for around a second. When I rerun the code after few seconds. I get the message:
Error using parallel.gpu.CUDADevice/reset
An unexpected error occurred during CUDA execution. The
CUDA error was:
unknown error
Error in CreateDictionary
reset(gpuDevice(2))
I tried increasing TdrDelay, but it did not help.

Something in your GPU code is causing an error on the device. Because the code is running asynchronously, this error is not picked up until the next synchronisation point, which is when you run the code again. I would need to see the contents of SingleGPUMatlabCode to know what that error might be. Perhaps there's an allocation failure or an out of bounds access. Errors that aren't correctly handled will get converted to 'unknown error' at the next CUDA operation.
Try adding wait(gpuDevice) inside the loop to identify when the error is occurring.
If either device 2 or 3 are the GTX1080, you may have discovered an issue with MATLAB's restricted support for the Pascal architecture. See https://www.mathworks.com/matlabcentral/answers/309235-can-i-use-my-nvidia-pascal-architecture-gpu-with-matlab-for-gpu-computing
If this is caused by the Windows timeout, you would see a several second screen blackout.

gem5 cache statistics - reset and dump

I am trying to get familiar with gem5 simulator.
To start, I wrote a simple program with
int main()
{
m5_reset_stats(0, 0);
m5_dump_stats(0, 0);
return 0;
}
I compiled it with util/m5/m5op_x86.S and ran it using...
./build/X86/gem5.opt configs/example/se.py --caches -c ~/tmp/hello
The m5out/stats.txt shows (among other things)...
system.cpu.dcache.ReadReq_hits::total 881
system.cpu.dcache.WriteReq_hits::total 917
system.cpu.dcache.ReadReq_misses::total 54
system.cpu.dcache.WriteReq_misses::total 42
Why is an empty function showing so much hits and misses? Are the hits and misses caused by libc? If so, then what is the purpose of m5_reset_stats() and m5_dump_stats()?

I would check in the stats.txt file if there are two chunks of
---Begin---
---End-----
because as you explained it, the simulator is supposed to dump the stats at dump_stats(0,0) and at the end of the run. So, it seems like you either are looking at one of those intervals (and I would expect the other interval to have 0 for all stats); or there was a bug in the simulation and the dump_stats() (or reset_stats())didn't actually do anything. That actually happened to me plenty of times, but I am not really sure as to the source of this bug.
If you want to troubleshoot further, you could do the following:
Look at the disassembly of your code and find the reset_stats.w and dump_stats.w
Dump a trace from gem5 and see if it ends up executing the dump and reset instructions and also what instructions (and how many) are executed before/after.
Hope this helps!

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse