problem with nvmex in matlab

I have installed Matlab on my system, along with the CUDA SDK for Windows. However, I am not able to compile any .cu files. I have placed the nvmex script file in the bin directory of the Matlab installation path. Can somebody help?

nvmex isn't really supported in any recent versions of Matlab or the Cuda SDK. Instead, I would suggest writing a simple DLL in Visual Studio which uses the standard MEX interface to run Cuda. I'm going to assume that your project is called "addAtoB" and that you just want to add two numbers together to make the example simpler.
When you install the Cuda SDK, you need to tell it to add the CUDA Custom Build Rules to Visual Studio so that it knows how to compile .cu files.
Your main .cpp file should look something like this:
// addAtoB.cpp
#include <mex.h>
#include <cuda.h>
#include <driver_types.h>
#include <cuda_runtime_api.h>
#pragma comment(lib,"libmx.lib") // link with the Matlab MEX API
#pragma comment(lib,"libmex.lib")
#pragma comment(lib,"cudart.lib") // link with CUDA
// forward declare the function in the .cu file
void runMyCUDAKernel(void);
// input and output variables for the function in the .cu file
float A, B, C;
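void mexFunction( int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[] )
{
// require exactly two inputs before reading prhs[0] and prhs[1]
if (nrhs != 2)
mexErrMsgTxt("Usage: C = addAtoB(A, B)");
A = (float) mxGetScalar(prhs[0]);
B = (float) mxGetScalar(prhs[1]);
runMyCUDAKernel();
// allocate output; assigning to nlhs has no effect, so it is left alone
plhs[0] = mxCreateDoubleScalar(C);
mexPrintf("GPU: %f + %f = %f\nCPU: %f\n", A, B, C, A+B);
cudaDeviceReset();
}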
You need to add several directories to your Include Path: C:\Program Files\MATLAB\R2009a\extern\include and the CUDA directories.
Add to your Linker Path: C:\Program Files\MATLAB\R2009a\extern\lib\win32\microsoft , $(CUDA_PATH)\lib\$(PlatformName)
Next, add a .DEF file to your project which looks something like this:
LIBRARY "addAtoB"
EXPORTS
mexFunction
Next, create a file called runMyCUDAKernel.cu in the current directory, type in contents something like this, and then add the file to your project:
// runMyCUDAKernel.cu:
#include <cuda.h>
extern float A, B, C;
// Device kernel
__global__ void addAtoB(float A1, float B1, float* C1)
{
*C1 = A1+B1;
}
void runMyCUDAKernel(void)
{
float* pOutput;
cudaMalloc( (void**) &pOutput, 1*sizeof(float));
dim3 dimBlock(1, 1);
dim3 dimGrid(1, 1);
addAtoB<<< dimGrid, dimBlock >>>(A, B, pOutput);
cudaMemcpy( &C, pOutput, 1*sizeof(float), cudaMemcpyDeviceToHost);
cudaDeviceSynchronize();
cudaFree(pOutput);
}
Build the DLL and rename it from .dll to .mexw32 (or .mexw64, if you're using a 64-bit Matlab). Then you should be able to run it with the command addAtoB(1, 2).

I would suggest using CUDA mex from the Matlab file exchange.
It enables you to compile through Matlab. This gets better compatibility across Matlab and Visual Studio versions by not forcing you to specify the mex dependencies explicitly through Visual Studio.

'Concurrency': a namespace with this name does not exist

I am an amateur C# programmer who strayed into C++ because of a need for the C++ AMP technology for some heavy-duty number crunching. Hence, my C++ programming skills are not very well developed.
For my first attempt at an actual program, I chose code based on Daniel Moth's April 2012 article. I cannot get it to build. I always get the error:
C2871 ‘Concurrency’: a namespace with this name does not exist.
I know that the code was first written for Visual Studio 11, but I only had VS 2008 and VS 2010 on my machine. So, I installed VS 2017 (version 15.9.4, .Net 4.7.03062). I started with an empty C++ project but had trouble with it. The best I could do, after I worked through all the things it didn’t recognize, was an error:
C3861: ‘access’ identifier not found, line 2616 in file ‘amp.h’.
So I started again, this time with an empty Windows Console Application project. Again, I had to tweak the code considerably to migrate from Visual Studio 11 to VS 2017, but ended up with code as shown below.
I tried what I could to find the source of the error. I targeted both x64 and x86, but it made no difference. I commented out line 5 and lines 21 – 27, and the code would build and execute. IntelliSense showed no problems, either with identifiers or syntax. In fact, the mouse-over info recognized the Concurrency constructs as such. I deliberately misspelled Concurrency, but IntelliSense caught that right away. I looked through the project properties with an eye toward a setting that needed to be changed to run AMP, but as I wasn’t even sure what I was looking for, I didn’t find anything.
I tried to find the name of the file in which Concurrency is defined, so that I could search my machine to see if it was there, and to add a path if it was, but was unsuccessful. I couldn’t even find the file name. I googled and pored through on-line sources and MS Docs, but no matter how I phrased my search questions, I didn’t find any answers.
The error says:
Concurrency does not exist
which to me means it can’t find it, it’s not on the machine, or some build setting is preventing it from being used. Most of the on-line articles about writing AMP code say nothing about build settings. Does it not require anything different than a serially-coded project? Is it as simple as a missing reference? If so, where do I go to find it? With my limited experience, I don’t know what else to try.
My machine is a Win 7 SP1 box. The KB2999226 bug fix has been installed. I didn’t install all of VS 2017 since I am only interested in C# and C++. Did I fail to install something I should have?
If this problem was addressed before, I couldn’t find it. So, any help would be appreciated.
1. #include <amp.h>
2. #include "pch.h"
3. #include <iostream>
4. #include <vector>
5. using namespace Concurrency;
6.
7. int main() {
8. const int M = 1024; const int N = 1024; //row, col for vector
9. std::vector<int> vA(M*N); std::vector<int> vB(M*N); //vectors to add
10. std::vector<int> vC(M*N); //vector for result
11.
12. for (int i = 0; i < M; i++) { vA[i] = i; } //populate vectors
13. for (int j = N - 1; j >= 0; j--) { vB[j] = j; }
14.
15. for (int i = 0; i < M; i++) { //serial version of
16. for (int j = 0; j < N; j++) { //matrix addition
17. vC[i*N + j] = vA[i*N + j] + vB[i*N + j]; //using vectors
18. }
19. }
20.
21. extent<2> e(M, N); //uses AMP constructs but no
22. array_view<int, 2> a(e, vA), b(e, vB); //parallel functions invoked
23. array_view<int, 2> c(e, vC);
24. index<2> idx(0, 0);
25. for (idx[0] = 0; idx[0] < e[0]; idx[0]++) {
26. for (idx[1] = 0; idx[1] < e[1]; idx[1]++) {
27. c[idx] = a[idx] + b[idx];
28. }
29. }
30. // C2871 'Concurrency': a namespace with this name does not exist. Line 5
31. // Also C2065, C3861, C2062 for all Concurrency objects Line 21 - Line 27
32. }
33.

With,
#include "amp.h"
#include "pch.h"
#include <iostream>
using namespace concurrency;
I get,
C2871 'concurrency': a namespace with this name does not exist
However, with,
#include "pch.h"
#include <iostream>
#include "amp.h"
using namespace concurrency;
there is no error.
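I suggest moving #include "amp.h" as shown. When a project uses precompiled headers, the compiler ignores everything that appears before the #include "pch.h" line, so an #include "amp.h" placed above it is skipped and the concurrency namespace is never declared.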
I also used both concurrency and Concurrency. There was no difference.
For the error C3861: ‘access’ identifier not found, line 2616 in file ‘amp.h’, do the following:
From the menu, select Project, then select Properties.
In the Property Pages window, under C/C++, select All Options, then select Conformance mode.
Change Yes (/permissive-) to No. Select OK.
Build the project and run.
By default, the /permissive- option is set in new projects created by Visual Studio 2017 version 15.5 and later versions. It is not set by default in earlier versions. When the option is set, the compiler generates diagnostic errors or warnings when non-standard language constructs are detected in your code, including some common bugs in pre-C++11 code.
More information may be found here.
This suggests, to me, that "amp.h" does not conform to the stricter checks that /permissive- enables by default since Visual Studio 2017 version 15.5. Thus it worked with C++ in VS 2015 14.0 (Update 3), then failed with C++ in VS 2017 15.9.5.
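Putting the two fixes together, a corrected source file might look like the sketch below. It is an illustration under assumptions, not a tested project: addMatrices is a hypothetical helper name, the parallel_for_each call goes beyond the question's code (which only used extent, array_view and index serially), and the _MSC_VER guard with a serial fallback is there only so the file also builds with compilers that do not ship amp.h. In a Visual Studio project that uses precompiled headers, #include "pch.h" would still have to be the very first include, and Conformance mode would still have to be set to No.

```cpp
// In a precompiled-header project, #include "pch.h" goes here, before anything else.
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(_MSC_VER)
#include <amp.h>  // C++ AMP ships only with Visual C++
#endif

// Hypothetical helper: element-wise sum of two rows x cols matrices (row-major).
std::vector<int> addMatrices(const std::vector<int>& vA,
                             const std::vector<int>& vB,
                             int rows, int cols)
{
    const std::size_t count = static_cast<std::size_t>(rows) * cols;
    assert(vA.size() == count && vB.size() == count);
    std::vector<int> vC(count);

#if defined(_MSC_VER)
    using namespace concurrency;
    array_view<const int, 2> a(rows, cols, vA);  // read-only views of the inputs
    array_view<const int, 2> b(rows, cols, vB);
    array_view<int, 2> c(rows, cols, vC);        // writable view of the result
    c.discard_data();                            // vC's old contents need not be copied over
    parallel_for_each(c.extent, [=](index<2> idx) restrict(amp) {
        c[idx] = a[idx] + b[idx];
    });
    c.synchronize();                             // copy the result back into vC
#else
    // Serial fallback, an assumption added so the sketch builds without amp.h.
    for (std::size_t i = 0; i < count; ++i) {
        vC[i] = vA[i] + vB[i];
    }
#endif
    return vC;
}
```

Calling addMatrices(vA, vB, M, N) with the vectors from the question would replace both the serial loop and the index-based loop.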

How to see the Auto Reference section on Plugin Inspector?

In the latest Unity manual
https://docs.unity3d.com/2019.1/Documentation/Manual/PluginInspector.html
they assert that the Plugin Inspector
now features an "Auto Reference" concept.
I am using the latest Unity (and have even tried .2 etc).
However, no matter what I do, I cannot make this appear. Every single Unity project I have tried, even the Unity examples, lacks the feature.
What is going on?
How can I access the Auto Reference option?

tl;dr: "Auto Reference" only works for managed plugins, that is, .dll files written in and compiled from C#. Unmanaged plugins (DLLs written in a language other than C#) can't be auto referenced.
edit: I just noticed there were more hidden comments, one of which was Aybe mentioning it working for managed DLLs.
edit2: if you want the project to test it out, I can upload it.
I wanted to check if there was a difference between managed and unmanaged DLLs when inspecting them in the editor (testing in Unity 2019, but I assume the same goes for 2018).
I made the following two DLLs, one in C# (managed) and one in C++ (unmanaged). I added some simple functionality to each to make sure the result wouldn't be caused by having an empty DLL.
Managed C# plugin
using System;
namespace TestDLLManaged
{
public class TestDLLManaged
{
public static float Multiply(int a, float b)
{
return a * b;
}
}
}
I compiled it into a DLL targeting the .NET 3.5 framework (Unity 2018 and later versions support 4.x, but I wanted to play it safe) and placed the .dll file in the /Assets/ folder (apparently the Assets/Plugin folder is intended for native/unmanaged plugins, not managed ones).
Unmanaged/native C++ plugin
// header file
#pragma once
#define TESTDLLMULTIPLY_API __declspec(dllexport)
extern "C"
{
TESTDLLMULTIPLY_API float MultiplyNumbers(int a, float b);
}
//body
#include "TestDLLMultiply.h"
extern "C"
{
float MultiplyNumbers(int a, float b)
{
return a * b;
}
}
Also compiled this into a dll, and placed it in the /Assets/Plugin folder.
I call both DLLs inside DLLImportTest.cs and perform a simple calculation to make sure both are actually imported and functioning, like so:
using System.Runtime.InteropServices; // DllImport
using UnityEngine; // MonoBehaviour
using static TestDLLManaged.TestDLLManaged;
public class DLLImportTest : MonoBehaviour
{
const float pi = 3.1415926535f;
[DllImport("TestDLL", EntryPoint = "MultiplyNumbers")]
public static extern float UnmanagedMultiply(int a, float b);
// Use this for initialization
void Start()
{
UnityEngine.Debug.LogFormat("validating unmanaged, expected result = 100: {0}", UnmanagedMultiply(10, 10f));
UnityEngine.Debug.LogFormat("validating managed, expected result = 100: {0}", Multiply(10, 10f));
}
}
When inspecting the DLLs in the editor, it seems that the managed (C#) plugin does have the option to auto reference and the unmanaged/native (C++) DLL indeed doesn't. I don't actually know why this is the case, as it is nowhere to be found in the documentation. Maybe it's a bug, maybe there is another reason behind it. I may make a forum post about it later asking for clarification.
As a little extra, I decided to benchmark the two functions, and to my surprise found that the managed C# plugin was actually faster than the C++ one.
private void BenchMark()
{
Stopwatch watch1 = new Stopwatch();
watch1.Start();
for (int i = 0; i < 10000000; i++)
{
UnmanagedMultiply(1574, pi);
}
watch1.Stop();
UnityEngine.Debug.LogFormat("Unmanaged multiply took {0} milliseconds", watch1.Elapsed);
Stopwatch watch2 = new Stopwatch();
watch2.Start();
for (int i = 0; i < 10000000; i++)
{
Multiply(1574, pi);
}
watch2.Stop();
UnityEngine.Debug.LogFormat("Managed multiply took {0} milliseconds", watch2.Elapsed);
}
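Results (Elapsed is a TimeSpan, so the values below are in hh:mm:ss.fffffff format, roughly 108 ms and 85 ms, despite the "milliseconds" label):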
Unmanaged multiply took 00:00:00.1078501 milliseconds
Managed multiply took 00:00:00.0848208 milliseconds
For anyone wishing to view the differences or experiment with it themselves, I've made a GitHub repo here containing the project I used above.
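The benchmark above measures each function together with the cost of calling it from C#. To look at the native function on its own, one option is to time it directly from C++. The following is a minimal sketch under assumptions: MultiplyNumbers is re-declared here with the same body as the plugin function, timeNativeMs is a hypothetical helper, and the figures it produces are not directly comparable to the Unity results quoted above.

```cpp
#include <cassert>
#include <chrono>

// Same body as the native plugin function shown earlier.
extern "C" float MultiplyNumbers(int a, float b)
{
    return a * b;
}

// Hypothetical helper: time `iterations` direct calls, return elapsed milliseconds.
double timeNativeMs(int iterations)
{
    assert(iterations > 0);
    using steady = std::chrono::steady_clock;

    // The volatile store on every iteration keeps the loop from being optimized away.
    volatile float sink = 0.0f;

    const auto start = steady::now();
    for (int i = 0; i < iterations; ++i) {
        sink = MultiplyNumbers(1574, 3.1415926535f);
    }
    const auto stop = steady::now();
    (void)sink;

    // duration<double, std::milli> converts the elapsed ticks to fractional milliseconds.
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as timeNativeMs(10000000) mirrors the iteration count used in the C# loop and returns a value that really is in milliseconds.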

This question has been nicely resolved by @remy_rm:
Compiled C# DLLs ("managed plugins") do have the auto-reference feature.
Actual native plugins ("unmanaged plugins") do NOT have the auto-reference feature.
This applies identically on both PC and Mac.
Unity (sometimes) refers to C# compiled as a DLL as "managed plugins", and (sometimes) refers to native plugins (say, an actual static library for iPhone which is compiled C) as "unmanaged plugins".
(Whereas all other Unity-related writing on the www generally refers to compiled C# as "dlls" and native plugins as "plugins".)
The auto-reference system is only for compiled C#, that is, "managed plugins".
A huge thanks to @remy_rm for spending hours resolving this issue.
Unity are trying and trying to improve their comic documentation - not quite there yet :)

Eclipse Nsight CDT plugin pressing F3 to Open Declaration goes to the wrong line

Nsight Eclipse Edition
Version: 5.5.0
CDT version: 8.1.2.nvidia-qualifier
The quick reference that pops up on mouse-over shows the wrong declaration. Usually it is a function located in the same header file as the one I'm looking for, but otherwise unrelated to it. For example:
For cudaMemcpy() it shows me this function from "cuda_runtime_api.h":
extern __host__ cudaError_t CUDARTAPI cudaPointerGetAttributes(struct cudaPointerAttributes *attributes, void *ptr);
For cudaMalloc() it gives me the description of:
extern __host__ cudaError_t CUDARTAPI cudaMemcpy2DToArray(struct cudaArray *dst, size_t wOffset, size_t hOffset, const void *src, size_t spitch, size_t width, size_t height, enum cudaMemcpyKind kind);
Why is the indexer behaving this way? After a couple of days of working with it I am getting tired of this, but I still couldn't find any obvious solution.

I will log this issue in our issue tracker. Sorry for the inconvenience. There is no workaround available.
Note that for performance reasons, Nsight does not index those files on your system. Instead, it comes prepackaged with compiled index files, so apparently some headers on your system differ from the versions the Nsight index was built from.

Is it possible to create a copy-on-write copy of a file on an iOS device?

I need to copy a file that will be modified later on an iOS device. For performance reasons, it would be great if this would work copy-on-write, i.e. the file is not really duplicated, and only modified blocks of the copy are written later.
As pointed out in the comments, this probably has to be supported by the file system (HFS+?). A link is not sufficient, since both the old (A) and new (B) file name will point to the same file, and if I modify A, B will also change.
A "lazy" copy also would not help, since on first write the whole file would still need to be copied.
I was thinking more about a solution like the one described by @Hot Licks that would start with A and B using the same blocks on disk. When I write to file B, only the modified blocks would be stored on disk, while identical parts in A and B go on using the same blocks.
Is this possible on iOS?
Regards,
Jochen

There's no built-in mechanism for doing efficient partial copies of files, but if you're copying a file and making internal changes to the content, then the most efficient mechanism to use is mmap. You map the file into memory and modify it in-place. The changes are written back to the file automatically without needing to rewrite the file in pieces.
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <unistd.h> // close
struct stat astat;
int fd = open("filename", O_RDWR);
if (fd != -1) {
if (fstat(fd, &astat) != -1) {
// MAP_SHARED: writes made through the mapping reach the file itself
char *data = mmap(0, astat.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (data != MAP_FAILED) {
self.data_ptr = data;
self.data_size = astat.st_size;
}
}
// the mapping stays valid after the descriptor is closed
close(fd);
}
When you're done with the file, you use munmap to release the mapping back to the OS:
munmap(self.data_ptr, self.data_size);
The usual caveats apply from modifying a shared resource.
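For the scenario in the question (copy a file, then modify the copy), the mmap approach can be sketched in C++ as follows. This is a sketch under assumptions: copyFile and patchInPlace are hypothetical helpers, the copy is an ordinary full copy rather than copy-on-write, and only POSIX calls are used. It has not been tried on an iOS device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: ordinary full copy of src to dst (NOT copy-on-write).
bool copyFile(const std::string& src, const std::string& dst)
{
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary | std::ios::trunc);
    if (!in || !out) return false;
    char buf[4096];
    while (in.read(buf, sizeof buf) || in.gcount() > 0) {
        out.write(buf, in.gcount());
    }
    out.flush();
    return out.good();
}

// Hypothetical helper: overwrite len bytes at offset in an existing file via mmap.
bool patchInPlace(const std::string& path, std::size_t offset,
                  const void* bytes, std::size_t len)
{
    assert(bytes != nullptr);
    const int fd = open(path.c_str(), O_RDWR);
    if (fd == -1) return false;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) { close(fd); return false; }
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    // Refuse writes that would run past the end of the file.
    if (offset > size || len > size - offset) { close(fd); return false; }

    void* map = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (map == MAP_FAILED) return false;

    // Only the touched pages are dirtied and later written back to the file.
    std::memcpy(static_cast<char*>(map) + offset, bytes, len);
    msync(map, size, MS_SYNC);  // flush the change before unmapping
    return munmap(map, size) == 0;
}
```

After copyFile("A", "B"), a call such as patchInPlace("B", 6, "WORLD", 5) changes five bytes of B and leaves A untouched. The whole file is still duplicated once by the copy step.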

Build error when using OpenGL in my program

I'm using the following code (which I found on the web) to create a basic OpenGL program:
#include <GL/gl.h>
#include <GL/glu.h>
#include <GL/glut.h>
#define window_width 640
#define window_height 480
// Main loop
void main_loop_function()
{
// Z angle
static float angle;
// Clear color (screen)
// And depth (used internally to block obstructed objects)
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
.
.
.
}
.
.
.
I'm using Ubuntu 12.04 and Eclipse 3.7.2. The program compiles and actually runs, but strangely I get an error showing up in my code. The
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
line has the error:
Multiple markers at this line
- Invalid arguments ' Candidates are: void glClear(unsigned int) '
- Symbol 'GL_COLOR_BUFFER_BIT' could not be resolved
- Symbol 'GL_DEPTH_BUFFER_BIT' could not be resolved
Everything I've tried so far does not remove this error from the IDE. Any help would be welcomed.
NB if I change the line to
glClear(GL_COLOR_BUFFER_BIT);
or
glClear(GL_DEPTH_BUFFER_BIT);
then the error goes away...

I solved this problem by activating "Preferences -> C/C++ -> Indexer -> Use active build configuration" and then rebuilding the project. It now finds the symbol.