Generating code with Simulink (MATLAB R2011a on macOS, 64-bit)
I have a problem: the generated code uses the ceil function, which isn't supported on my target platform.
I'm generating code with ERT for an ARM Cortex processor (on a Cypress PSoC).
Is it possible to solve this problem?
I tried the solution without success.
Also, under Code Generation > Interface, I tried disabling support for floating-point or non-finite numbers, but then every signal in my project raises errors (the same behaviour occurs whatever I change its data type to).
Many thanks to anyone who can suggest what I can try.
You could write your own ceil function and include it in the output code for your target device. Assuming you are generating C code, the function would look something like:
int ceil(double number) {
    if (number >= 0) {
        /* round up whenever there is a fractional part */
        if (number - (int) number > 0)
            return (int) number + 1;
        else
            return (int) number;
    }
    else {
        /* casting truncates toward zero, which already rounds
           negative values up, so no adjustment is needed */
        return (int) number;
    }
}
With a prototype in your header file like:
int ceil(double);
Now your C code can call integerValuedNumber = ceil(doubleValuedNumber) and it should work. You can also do this with macros in the C file (see nintendo's answer).
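For example, a function-like macro version might look like this (a sketch; the name CEIL_INT is made up here, and the macro evaluates its argument more than once, so avoid passing expressions with side effects):

    /* Integer ceiling as a macro. Casting truncates toward zero, which
       already rounds negative values up, so only positive values with a
       fractional part need the +1. */
    #define CEIL_INT(x) ((x) > (double)(int)(x) ? (int)(x) + 1 : (int)(x))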
EDIT: I corrected my code to use proper type-casting syntax for C. Basically, what you are doing with the (int) number syntax is taking the double-valued number variable and forcing it to be an integer. You can find more information about data types in C here, or Google "type casting C" or "data types C" for more information.
Also, some additional parentheses might be needed, like return ((int) number) + 1; and similar. I am a little rusty on my C programming, but hopefully this gets you going toward a viable solution.
EDIT 2: I corrected the return data type of our self-defined ceil function. You would want this to return an int, or maybe a long. Again, check out documentation on data types in C if you are not sure which data type is appropriate for your application. If the values you are applying ceil to are not very large (less than +/- 2^15, for instance), then int is probably fine.
OK... I solved it.
The problem was in the destination environment (PSoC Creator).
As explained here http://www.cypress.com/?id=4&rID=42838 :
Go to Project -> Build Settings -> Linker -> General -> Additional Libraries. Type m in the Additional Libraries field.
If you do not add this additional library, you will get a build error such as "undefined reference to `sqrt'", where sqrt is a math function.
It makes no difference whether the problem is with sqrt() or ceil(): both are declared in the same header (math.h) and live in the same math library (libm, hence the m).
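For reference, on a command-line GCC toolchain the equivalent of this IDE setting is appending -lm to the link step, for example (file names hypothetical):

    arm-none-eabi-gcc main.o -o app.elf -lm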
PS: thank you Engineero... your solution is very useful and can be appreciated by other people with my problem (but in other environments).
I am new to Specman and am trying to learn it by reading existing code.
I came across the following function and can't find an explanation in the Specman documentation...
VerifyNode(end_point: string, derived_value: uint) is also {
    if ('(end_point)' === ~derived_value) {
        message("Error1");
    }
    else if ('(end_point)' === derived_value) {
        message("Error2");
    }
    else {
        message("Error3");
    }
}
I assume that '(end_point)' logically reads the actual value of the end_point signal at run time. Is that true?
Error1 will be the message if the value of end_point at run time equals the bitwise negation (~) of the derived_value unsigned integer.
Error2 will be the message if the value of end_point at run time is the same as the derived_value unsigned integer.
How can I explain the "Error3" condition?
Indeed, this is the very old school of signal access based on strings. I recommend you search for "simple_port".
The end_point is a string that should be the full HDL path of the signal you want to read. Example of calling this method:
// check if signal has the value 3:
VerifyNode("top.module_a.sig_val", 3);
The official name of '' in the Specman documentation is the tick notation. Search for the term "tick notation" in cdnshelp and you will find it.
'(end_point)' is two things. It is old-school, non-port access to the device under test (DUT), and the (end_point) part of that snippet is the way to do string substitution directly in the '' access to the DUT. Using the old style of access to the DUT can drastically slow down simulation speed if your design is big enough for that to be a concern. This methodology also lacks static compile-time checks: those strings can be anything, and you'll only find out that a path is wrong when you simulate and the DUT path cannot be found, producing an error.
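For comparison, a minimal simple_port sketch (the unit and field names here are made up, reusing the HDL path from the example above):

    unit sig_checker {
        // Input port bound to an external HDL signal
        sig_val_p : simple_port of uint is instance;
        keep bind(sig_val_p, external);
        keep sig_val_p.hdl_path() == "top.module_a.sig_val";

        check_node(derived_value: uint) is {
            // sig_val_p$ reads the current value of the port
            if (sig_val_p$ == derived_value) {
                message(LOW, "end point matches derived value");
            };
        };
    };

Ports like this are bound and checked when the test is loaded, and reading them is much faster than string-based tick access.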
This is kind of a weird and un-Swift-thonic question, so bear with me.
I want to do in Swift something like what I'm currently doing in Objective-C/C++, so I'll start by describing that.
I have some existing C++ code that defines a macro that, when used in an expression anywhere in the code, will insert an entry into a table in the binary at compile time. In other words, the user writes something like this:
#include "magic.h"
void foo(bool b) {
if (b) {
printf("%d\n", MAGIC(xyzzy));
}
}
and thanks to the definition
#define MAGIC(Name) \
    []{ static int __attribute__((used, section("DATA,magical"))) Name; return Name; }()
what actually happens at compile time is that a static variable named xyzzy (modulo name-mangling) is created and allocated into the special magical section of my Mach-O binary, so that running nm -m foo.o to dump the symbols shows something a lot like this:
0000000000000098 (__TEXT,__eh_frame) non-external EH_frame0
0000000000000050 (__TEXT,__cstring) non-external L_.str
0000000000000000 (__TEXT,__text) external __Z3foob
00000000000000b0 (__TEXT,__eh_frame) external __Z3foob.eh
0000000000000040 (__TEXT,__text) non-external __ZZ3foobENK3$_0clEv
00000000000000d8 (__TEXT,__eh_frame) non-external __ZZ3foobENK3$_0clEv.eh
0000000000000054 (__DATA,magical) non-external [no dead strip] __ZZZ3foobENK3$_0clEvE5xyzzy
(undefined) external _printf
Through the magic of getsectbynamefromheader(), I can then load the symbol table for the magical section, scan through it, and find out (by demangling every symbol I find) that at some point in the user's code, he calls MAGIC(xyzzy). Eureka!
I can replicate the whole second half of that workflow just fine in Swift — starting with the getsectbynamefromheader() part. However, the first part has me stumped.
Swift has no preprocessor, so spelling the magic as elegantly as MAGIC(someidentifier) is impossible. I don't want it to be too ugly, though.
As far as I know, Swift has no way to insert symbols into a given section — no equivalent of __attribute__((section)). This is okay, though, since nothing in my plan requires a dedicated section; that part just makes the second half easier.
As far as I know, the only way to get a symbol into the symbol table in Swift is via a local struct definition. Something like this:
func foo(b: Bool) -> Void {
    struct Local { static var xyzzy = 0 }
    println(Local.xyzzy)
}
That works, but it's a bit of extra typing, and can't be done inline in an expression (not that that'll matter if we can't make a MAGIC macro in Swift anyway), and I'm worried that the Swift compiler might optimize it away.
So, there are three questions here, all about how to make Swift do things that Swift doesn't want to do: Macros, attributes, and creating symbols that are resistant to compiler optimization.
I'm aware of #asmname but I don't think it helps me since I can already deal with demangling on my own.
I'm aware that Swift has "generics", but they seem to be closer to Java generics than to C++ templates; I don't think they can be used as a substitute for macros in this particular case.
I'm aware that the code for the Swift compiler is now open-source; I've skimmed bits of it in vain; but I can't read through all of it looking for tricks that might not even be there.
Here is the answer to your question about the preprocessor (and macros).
Swift has no preprocessor, so spelling the magic as elegantly as MAGIC(someidentifier) is impossible. I don't want it to be too ugly, though.
The Swift project has a preprocessor (but, AFAIK, it is not distributed with Swift's binary releases).
From swift-users mailing list:
What are .swift.gyb files?
It's a preprocessor the Swift team wrote so that when they needed to build, say, ten nearly-identical variants of Int, they wouldn't have to literally copy and paste the same code ten times. If you open one of those files, you'll see that they're mainly Swift code, but with some lines of code intermixed that are written in Python.
It is not as beautiful as C macros, but, IMHO, is more powerful.
You can see the available options with the ./swift/utils/gyb --help command after cloning Swift's git repo.
$ swift/utils/gyb --help
usage, etc (TL;DR)...
Example template:
- Hello -
%{
    x = 42
    def succ(a):
        return a+1
}%
I can assure you that ${x} < ${succ(x)}
% if int(y) > 7:
% for i in range(3):
y is greater than seven!
% end
% else:
y is less than or equal to seven
% end
- The End. -
When run with "gyb -Dy=9", the output is
- Hello -
I can assure you that 42 < 43
y is greater than seven!
y is greater than seven!
y is greater than seven!
- The End. -
My example of gyb usage is available as a GitHub Gist.
For more complex examples, look for *.swift.gyb files in apple/swift/stdlib/public/core.
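For instance, generating a Swift source file from a template might look like this (the file names are hypothetical):

    $ swift/utils/gyb -Dy=9 -o Output.swift Template.swift.gyb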
TL;DR
I'm looking for a way to extract a part of an existing CUDA Toolkit example and turn it into a CUDAKernel executable in MATLAB.
The Story
In an attempt to obtain a short-runtime implementation of the non-local means (NLM) 2D filter, I stumbled upon the imageDenoising example provided with the CUDA Toolkit which implements two variants of this filter, called NLM & NLM2 (or "quick NLM").
Having no previous experience with CUDA coding, I initially attempted to follow MATLAB's documentation on the subject, which resulted in several strange errors, including ptx compilation failures, multiple entry points, and a wrong number of inputs in the C prototype. At this point I realized that this wasn't going to be a "just works" case and that some tinkering was required.
So I decided to eliminate the multiple-entry-point issue by simply deleting parts of the imageDenoising.cu file and consolidating the relevant .cuh (either ..._nlm_kernel.cuh or ..._nlm2_kernel.cuh) into the .cu, so as to obtain a single entry point at any given time.
To my surprise this actually compiled, and I was finally able to create a CUDAKernel without an error (using the command k = parallel.gpu.CUDAKernel('imageDenoising.ptx', 'uint8_T *, int, int, float, float');).
This, however, was not enough, because I had mistakenly concluded that the 1st argument is the unprocessed image in the form of an RGB matrix (i.e. X*Y*3 uint8), and so the result I was getting back was exactly the input but with 0 in the first 4 elements.
After searching a bit more, I realized that there are additional, critical aspects I was entirely unaware of (like the need to initialize __device__ variables) in such a conversion process, at which stage I decided to ask for help.
The Problem
I'm currently wondering how to efficiently continue from here. I'd love to hear whether this kind of approach can generally bear fruit (and whether a complete example of the process is available somewhere), which other pitfalls I should look out for, and what alternative courses of action I can take (considering my very limited knowledge of CUDA and the fact that I won't hire anybody else to do this for me). Still, I keep in mind that this is SO, and so I must have a specific programming problem, so here goes:
How do I modify imageDenoising.cu such that the MATLAB CUDAKernel constructed from it will also accept the unprocessed image as an input?
Note: in my application, the input matrix is a 2d, grayscale, double matrix.
Related: How CudaMalloc work?
P.S.
A working piece of code would obviously be welcomed, but I'd really rather "learn to fish".
I ended up taking an alternative approach to CUDAKernel, using a MEX file, by doing the following:
Setting up the external libraries OpenCV v2.4.10 (not v3!) and mexopencv.
Writing a small wrapper function for OpenCV's fastNlMeansDenoising using the mexopencv guidelines for unimplemented functions, as seen below (excluding the documentation):
#include "mexopencv.hpp"
using namespace cv;
void mexFunction(int nlhs, mxArray *plhs[],
int nrhs, const mxArray *prhs[])
{
// Check arguments
if (nlhs != 1 || nrhs<1 || ((nrhs % 2) != 1) )
mexErrMsgIdAndTxt("fastNLM:invalidArgs", "Wrong number of arguments");
// Argument vector
vector<MxArray> rhs(prhs, prhs + nrhs);
// Option processing
// Defaults:
double h = 3;
int templateWindowSize = 7;
int searchWindowSize = 21;
// Parsing input name-value pairs:
for (int i = 1; i<nrhs; i += 2) {
string key = rhs[i].toString();
if (key == "h")
h = rhs[i + 1].toDouble();
else if (key == "templateWindowSize")
templateWindowSize = rhs[i + 1].toInt();
else if (key == "searchWindowSize")
searchWindowSize = rhs[i + 1].toInt();
else
mexErrMsgIdAndTxt("mexopencv:error", "Unrecognized option");
}
// Process
Mat src(rhs[0].toMat()), dst;
fastNlMeansDenoising(src, dst, h, templateWindowSize, searchWindowSize);
// Convert cv::Mat back to mxArray*
plhs[0] = MxArray(dst);
}
Compiling it... and voilà: a working CUDA-accelerated NLM filter.
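For illustration, calling the compiled wrapper from MATLAB might look like this (a sketch; the MEX function name fastNLM is an assumption based on the error identifier in the code above):

    % Hypothetical usage, assuming the wrapper compiled as fastNLM:
    img = imread('cameraman.tif');                       % grayscale test image
    den = fastNLM(img, 'h', 4, 'searchWindowSize', 21);  % name-value options
    imshowpair(img, den, 'montage');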
The answer to my question itself can be found by comparing opencv\sources\modules\photo\src\cuda\nlm.cu (this is the OpenCV 2 path) with imageDenoising_nlm2_kernel.cuh.
This solution worked well for me because it was more important for me to get an NLM filter running, rather than using CUDAKernel.
The main lesson I learned from this (and I'd like to pass on to others) is:
Running CUDA code in MATLAB can also be done in ways other than CUDAKernel, such as using .mex wrappers as shown above.
I've been working on some vDSP code and have come up against an annoying problem. My code is cross-platform and hence uses std::complex to store its complex values.
Now I assumed that I would be able to set up an FFT as follows:
DSPSplitComplex dspsc;
dspsc.realp = &complexVector.front().real();
dspsc.imagp = &complexVector.front().imag();
And then use a stride of 2 in the appropriate vDSP_fft_* call.
However, this just doesn't seem to work. I can solve the issue by doing a vDSP_ztoc, but this requires temporary buffers that I really don't want hanging around. Is there any way to use the vDSP_fft_* functions directly on interleaved complex data? Also, can anyone explain why I can't do as I do above with a stride of 2?
Thanks
Edit: As pointed out by Bo Persson, the real and imag functions don't actually return a reference.
However, it still doesn't work if I do the following instead:
DSPSplitComplex dspsc;
dspsc.realp = ((float*)&complexVector.front()) + 0;
dspsc.imagp = ((float*)&complexVector.front()) + 1;
So my original question still does stand :(
The std::complex functions real() and imag() return by value; they do not return a reference to the members of complex.
This means that you cannot get their addresses this way.
This is how you do it:
const COMPLEX *in = reinterpret_cast<const COMPLEX*>(&complexVector.front());
Source: http://www.fftw.org/doc/Complex-numbers.html
EDIT:
To clarify the source: COMPLEX and fftw_complex use the same data layout (although fftw_complex uses double and COMPLEX uses float), so the same reinterpret_cast technique applies to both.
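To connect this back to the original question: the split-complex vDSP_fft_* routines take a DSPSplitComplex, i.e. two separate real and imaginary arrays, so interleaved std::complex<float> storage cannot be handed to them with a stride of 2. The supported route is to deinterleave with vDSP_ctoz and reinterleave with vDSP_ztoc. A minimal sketch, assuming a vector of std::complex<float> whose size is a power of two and a previously created FFTSetup:

    #include <Accelerate/Accelerate.h>
    #include <complex>
    #include <vector>

    void fftInPlace(std::vector<std::complex<float>>& v,
                    FFTSetup setup, vDSP_Length log2n)
    {
        std::vector<float> re(v.size()), im(v.size());
        DSPSplitComplex split = { re.data(), im.data() };
        // Deinterleave: view the std::complex<float> array as DSPComplex pairs
        vDSP_ctoz(reinterpret_cast<const DSPComplex*>(v.data()), 2,
                  &split, 1, v.size());
        // In-place complex FFT on the split data
        vDSP_fft_zip(setup, &split, 1, log2n, FFT_FORWARD);
        // Reinterleave the result back into the original vector
        vDSP_ztoc(&split, 1, reinterpret_cast<DSPComplex*>(v.data()), 2, v.size());
    }

This does require the temporary split buffers the question hoped to avoid, but the split layout is what these FFT routines are designed for.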
What is the difference between a forward declaration and a forward reference?
A forward declaration is, in my head, when you declare a function that isn't yet implemented, but is this incorrect? Does it depend on the situation whether a given case is called a "forward reference" or a "forward declaration"?
A forward declaration is the declaration of a method or variable before you implement and use it. Its purpose is to let the compiler check uses of a symbol whose definition it has not yet seen, which is what makes single-pass compilation possible.
The forward declaration of a variable (for example, extern int x; in C) announces the variable's type and name so you can refer to it; the definition elsewhere is what actually sets aside its storage.
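For example, a minimal sketch of that distinction in C:

    extern int counter;   /* declaration: announces the type and name, no storage */
    int counter = 0;      /* definition: this is what actually sets aside storage */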
The forward declaration of a function is also called a "function prototype," and is a declaration statement that tells the compiler what a function's return type is, what the name of the function is, and what the types of its parameters are. Compilers for languages such as C/C++ and Pascal store declared symbols (which include functions) in a lookup table and reference them as they come across them in your code. These compilers read your code sequentially, that is, top to bottom, so if you don't forward declare, the compiler discovers a reference to a symbol that isn't in the lookup table yet and raises an error because it doesn't know how to resolve it.
The forward declaration is thus a promise to the compiler that you have defined (filled out the implementation of) the function elsewhere.
For example:
int first(int x); // forward declaration of first
...
int first(int x) {
    if (x == 0) return 1;
    else return 2;
}
But, you may ask, why don't we just have the compiler make two passes over every source file: the first one to index all the symbols inside it, and the second to parse the references and look them up? According to Dan Story:
When C was created in 1972, computing resources were much more scarce and at a high premium -- the memory required to store a complex program's entire symbolic table at once simply wasn't available in most systems. Fixed storage was also expensive, and extremely slow, so ideas like virtual memory or storing parts of the symbolic table on disk simply wouldn't have allowed compilation in a reasonable timeframe... When you're dealing with magnetic tape where seek times were measured in seconds and read throughput was measured in bytes per second (not kilobytes or megabytes), that was pretty meaningful.

C++, while created almost 17 years later, was defined as a superset of C, and therefore had to use the same mechanism.

By the time Java rolled around in 1995, average computers had enough memory that holding a symbolic table, even for a complex project, was no longer a substantial burden. And Java wasn't designed to be backwards-compatible with C, so it had no need to adopt a legacy mechanism. C# was similarly unencumbered.

As a result, their designers chose to shift the burden of compartmentalizing symbolic declaration back off the programmer and put it on the computer again, since its cost in proportion to the total effort of compilation was minimal.
In Java and C#, identifiers are recognized automatically from source files and read directly from dynamic library symbols. In these languages, header files are not needed for the same reason.
A forward reference is the opposite. It refers to the use of an entity before its declaration. For example:
int first(int x) {
    if (x == 0) return 1;
    return second(x-1); // forward reference to second
}

int second(int x) {
    if (x == 0) return 0;
    return first(x-1);
}
Note that "forward reference" is used sometimes, though less often, as a synonym for "forward declaration."
From Wikipedia:
Forward Declaration
Declaration of a variable or function which is not yet defined. Its definition can be seen later on.
Forward Reference
Similar to a forward declaration, but refers to a place where the variable or function is actually used before its declaration appears.
Forward declarations are used to allow single-pass compilation of a language (C, Pascal).
If forward references are allowed without a forward declaration (Java, C#), a two-pass compiler is required.