I have an OpenCL kernel that runs well, but I want to look at the intermediate code. I use clGetProgramInfo to pull out the binary and save it to a text file. I've tried this with NVIDIA, AMD, an i7 and a Xeon.
In all of these cases the binary is unreadable.
I understand that on OS X the chunk of data returned is actually a binary plist. I've found instructions for using plutil to convert it to xml, and they work.
It's still unreadable ... though I've seen instructions online saying that this is where you find the PTX code (or the IL, in the case of my AMD 5870). There's the expected clBinaryData key, but the data under that key is still one big chunk of stuff, not readable IL instructions in text form.
I'd really like to examine the intermediate language to assess inefficiencies in my use of the gpu. Is this simply not possible under Xcode? Or, what am I doing wrong?
Thanks for any information!...
If you run your program with the following environment variable set, you should see .IL and .ISA files in your directory.
$ GPU_DUMP_DEVICE_KERNEL=3 ./my-program
Another way is to use the AMD APP Kernel Analyzer (which comes with the AMD APP SDK) to look at the intermediate files, i.e. IL and ISA.
(I am not sure whether the AMD APP SDK is available for Mac.)
One more option, according to the APP SDK documentation, is to put the following in your host code:
putenv("GPU_DUMP_DEVICE_KERNEL=3");
References
AMD OpenCL Programming Guide
AMD Devgurus forum
(Making this a top-level answer so I can do some formatting.)
ocluser's answer was very helpful, in that it was enlightening and caused great learning, though it did not, alas, solve the problem.
I've verified that the environment variable described is being set and is available to my application when run from within Xcode. However, it does not have (under OS X) the highly desirable effect it has under Linux.
But I now know how to set environment variables in 7 of 8 different ways. I also set "tracer" envars to tell me which methods are effective within the scope of my application. From the output below, you can see that both the "edit scheme" method of adding arguments and the putenv suggested by ocluser work. What didn't set it in that scope: ~/.MacOSX/environment.plist, an app-specific plist, .profile, and adding a build phase to run a custom script. (I found at least one other way within Xcode to set one, but forgot what I called the tracer and can't find it now; maybe it's on another machine....)
GPU_DUMP_DEVICE_KERNEL is 3
GPU_DUMP_TRK_ENVPLIST is (null)
GPU_DUMP_TRK_APPPLIST is (null)
GPU_DUMP_TRK_DOTPROFILE is (null)
GPU_DUMP_TRK_RUNSCRIPT is (null)
GPU_DUMP_TRK_SCHARGS is 1
GPU_DUMP_TRK_PUTENV is 1
... so, no this doesn't really answer the question, but expands on it a bit. Sorry if poor form. Thanks!
Have not given up and shall provide an actual problem-solver if I find one.
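For anyone wanting to repeat the tracer experiment, here is a minimal C sketch of how such lines could be produced. It assumes a POSIX system; the helper names enable_kernel_dump and format_tracer are made up for illustration, and setenv is used in place of the putenv call quoted above because it copies its arguments.

```c
#include <stdio.h>
#include <stdlib.h>

/* Set the dump variable for the current process. setenv() copies its
 * arguments, whereas putenv() keeps the pointer it is given.
 * Returns 0 on success. (Hypothetical helper name.) */
int enable_kernel_dump(void)
{
    return setenv("GPU_DUMP_DEVICE_KERNEL", "3", 1 /* overwrite */);
}

/* Format one tracer line, "NAME is VALUE", into buf. An unset variable is
 * written as "(null)" explicitly, because passing NULL to %s is undefined.
 * Returns what snprintf() returns. (Hypothetical helper name.) */
int format_tracer(const char *name, char *buf, size_t len)
{
    const char *value = getenv(name);
    return snprintf(buf, len, "%s is %s", name, value ? value : "(null)");
}
```

Calling enable_kernel_dump() before any OpenCL call and then printing format_tracer() for each variable name shows which of the methods actually reached the process. Whether the runtime honours the variable is a separate matter, as described above.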
I have a question related to V4L-DVB drivers. Following the Building/Compiling the Latest V4L-DVB Source Code link, there are 3 ways to compile. I am curious about the last approach (the More "Manually Intensive" Approach). It allows me to choose the components that I wish to build and install using "make menuconfig". Some of these components (e.g. "CONFIG_MEDIA_ATTACH") are used in pre-processor directives that define a function in one shape if defined and in another shape if not (e.g. dvb_attach, dvb_detach) in the resulting modules (e.g. dvb_core.ko) that will be loaded by most of the DVB drivers. What happens if there are two drivers (*.ko modules) on the same host machine, one that needs dvb_core.ko with CONFIG_MEDIA_ATTACH defined and another that needs dvb_core.ko with CONFIG_MEDIA_ATTACH undefined? Is there a clean way to handle this?
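To make the "one shape or the other" point concrete, here is a small user-space sketch in C of a macro whose definition depends on CONFIG_MEDIA_ATTACH. It is not the kernel's actual definition of dvb_attach. The names my_attach, lookup_symbol, demo_attach and attach_demo are invented for the illustration, and the run-time lookup is faked by a function that simply returns its argument.

```c
#include <stddef.h>

/* Hypothetical stand-in for a frontend's attach function. */
int demo_state = 7;
void *demo_attach(int *state) { return state; }

/* Hypothetical stand-in for a run-time symbol lookup; it always succeeds. */
typedef void *(*attach_fn)(int *);
attach_fn lookup_symbol(attach_fn fn) { return fn; }

#ifdef CONFIG_MEDIA_ATTACH
/* Shape 1: look the function up at run time and call it only if found. */
#define my_attach(FUNCTION, ARG) \
    (lookup_symbol(FUNCTION) ? lookup_symbol(FUNCTION)(ARG) : NULL)
#else
/* Shape 2: a plain direct call, resolved when the code is linked. */
#define my_attach(FUNCTION, ARG) FUNCTION(ARG)
#endif

/* The caller is written once; the macro decides which shape it compiles to. */
void *attach_demo(void) { return my_attach(demo_attach, &demo_state); }
```

Compiling this file with and without -DCONFIG_MEDIA_ATTACH gives the same result from attach_demo() but different code inside it, which is the situation the question is asking about.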
What is also not clear to me is this: since the V4L compilation environment seems very customizable (by setting the .config file), if I develop a driver using V4L-DVB structures, there is a good chance that it will conflict with other drivers, since each driver has its own custom settings. Is my understanding correct?
Thanks!
Dave
I recently obtained a license to use Embedded Coder with an existing Simulink model that we have developed. In attempting to generate C code from the model for the first time, I am working through several errors. At first, we had no code generation template (.cgt) files defined in the model parameters. After some hunting, I found the default template that comes with MATLAB (matlabroot/toolbox/rtw/targets/ecoder/ert_code_template.cgt).
The latest problem is that I get errors on nearly every token in this default code generation template.
Since I'm just trying to get something to build, at first I commented out the offending lines (things like RTWFileVersion, etc.), but now I am noticing that it's giving me errors for things that are mandatory (e.g. Types). Types is one of several required items that must be in the .cgt file, so what's wrong that causes MATLAB to not recognize these tokens? I'm guessing something may be messed up with my installation, such as a path.
Other details:
Simulink R2013A x32
Target is a Freescale device
Thanks to Matthias W for getting me to check other configuration options. Turns out I had selected a .tlc file that was probably incompatible with Embedded Coder.
In Code Generation for "System target file" I have selected the ert.tlc file and now I am able to build the parts of my model I'm interested in.
I have a simple Simulink model and I would like to generate code from it using the Simulink code generator, then compile that code with gcc into an ELF object file. How can I proceed?
Thanks
You need the product called Simulink Coder (from around MATLAB 2011b) or Real-Time Workshop (for older MATLAB versions). Typing ver at the MATLAB command window will show what products and licences you have installed.
If Simulink Coder or RTW are installed, you use the menu Simulation->Configuration Parameters to set up the model for code generation.
If you have Embedded Coder you can set System Target File to ert.tlc, and this will produce a very concise main() routine to call your model code. Otherwise, use grt.tlc, which produces a lot more bloat than ert but is the only useful one available on Windows.
There are a lot of options to go through and check - it really needs someone with a bit of experience to be present!
As you are requesting an ELF file, is this for an embedded system? If so, there is a lot more work to be done. If the target is not one of the already supported targets, then you need a target package, which will take either a lot of time and experience, or money to buy one.
Custom target development - a world of its own:
http://www.mathworks.co.uk/help/toolbox/rtw/ug/bse3b2z.html
So I figured out how to use the -mthumb and -mno-thumb compiler flags and more or less understand what they're doing.
But what is the -mthumb-interlinking flag doing? When is it needed, and is it set for the whole project if I set 'compile for thumb' in my project settings?
thanks for the info!
Open a terminal and type man gcc
Do you mean -mthumb-interwork ?
-mthumb-interwork
Generate code which supports calling between the ARM and Thumb
instruction sets. Without this option the two instruction sets
cannot be reliably used inside one program. The default is
-mno-thumb-interwork, since slightly larger code is generated when
-mthumb-interwork is specified.
If this is related to a build configuration, you should be able to set it separately for each configuration (such as Release or Debug).
Why do you want to change these settings? I know using Thumb instructions saves some memory, but will it save enough to matter in this case?
My application uses both Thumb and VFP code, but I never specifically set the -mthumb-interwork flag. How is that possible?
According to the man page, without that flag the two instruction sets cannot be reliably used inside one program.
It says "reliably"; so without that option, it seems they can still be mixed within a single program, but the result might be unreliable. I think mixing both instruction sets normally works, because the compiler is smart enough to figure out when it has to switch from one set to the other. However, there might be border cases the compiler doesn't understand correctly, where it fails to see that it should switch instruction sets, causing the application to fail (most likely it will crash). This option generates special code so that, no matter what your code does, the switching always happens correctly and reliably. The downside is that this extra code is needed for every globally visible function and thus increases the binary size (I have no idea whether it also slows down function calls a little, but I personally would expect that).
Please also note the following two settings:
-mcallee-super-interworking
Gives all externally visible functions in the file being compiled an ARM
instruction set header which switches to Thumb mode before executing the
rest of the function. This allows these functions to be called from
non-interworking code.
-mcaller-super-interworking
Allows calls via function pointers (including virtual functions) to
execute correctly regardless of whether the target code has been compiled
for interworking or not. There is a small overhead in the cost of
executing a function pointer if this option is enabled.
I think you only need those when building libraries to be used with other projects, but I don't know for sure. The GCC Thumb handling is definitely "underdocumented".
This is a little convoluted, but let's try:
I'm integrating Lua scripting into my game engine, and I've done this in the past on win32 in an elegant way. On win32 all I did was to mark all of the functions I wanted to expose to Lua as export functions. Then, to integrate them into Lua, I'd parse the PE header of the executable, unmangle the names, parse the parameters and such, then register them with my Lua runtime. This allowed me to avoid manually registering every function individually just to expose them to Lua.
Now, flash forward to today where I'm working on the iPhone. I've looked through some Unix stuff and I've gotten very close to taking a similar approach, however I'm not sure it will actually work.
I'm not entirely familiar with Unix, but here is what I have so far on iPhone:
Step 1: Query for the executable path through Objective-C and get the path of my app
Step 2: Use dlopen to get a handle to my app using: `dlopen(path, RTLD_NOW)`
Step 3: Use `dlsym( libraryHandle, objectName )` to attempt to get the address of a known symbol.
The above steps won't actually get me to where I want to be, but even that doesn't work. Does anyone have any experience doing this type of thing on Unix? Are there any headers or functions I can google to put me on the right track?
Thanks;)
iPhone does not support dynamic linking after the initial application launch. While what you want to do does not actually require linking in any new application TEXT, it would not shock me to find out that some of the dl* functions do not behave as expected.
You may be able to write some platform-specific code, but I recommend using a technique developed by the various BSDs called linker sets. Basically, you annotate the functions you want to do something with (just like you currently mark them for export). Through some preprocessor magic the annotations are stored, sometimes in an extra segment in the binary image, and other code then grabs that data and enumerates it. So you simply add all the functions you want into the linker set, then walk through the linker set and register all the functions in it with Lua.
I know people have gotten this stuff up and running on Windows and Linux, and I have used it on Mac OS X and various *BSDs. I am linking the FreeBSD linker_set implementation, but I have not personally seen the Windows implementation.
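As a rough illustration of the linker-set idea (not the FreeBSD implementation itself), here is a minimal C sketch. It assumes GCC or Clang with an ELF linker such as GNU ld, which provides __start_ and __stop_ symbols for a section whose name is a valid C identifier. The names EXPORT_TO_LUA, lua_exports, script_fn, twice and negate are invented for the example. On Mach-O targets such as the iPhone the section attribute needs a "segment,section" name and the section bounds are obtained differently, which is not shown here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical uniform signature for functions exposed to scripts. */
typedef int (*script_fn)(int);

struct export_entry {
    const char *name;
    script_fn   fn;
};

/* Put a pointer to each entry into the "lua_exports" section. Storing
 * pointers keeps every element of the set the same size and alignment. */
#define EXPORT_TO_LUA(f)                                            \
    static const struct export_entry export_entry_##f = { #f, f };  \
    static const struct export_entry *export_ptr_##f                \
        __attribute__((used, section("lua_exports"))) = &export_entry_##f

/* Section bounds supplied by the linker (assumes GNU ld or compatible). */
extern const struct export_entry *__start_lua_exports[];
extern const struct export_entry *__stop_lua_exports[];

int twice(int x)  { return 2 * x; }
int negate(int x) { return -x; }
EXPORT_TO_LUA(twice);
EXPORT_TO_LUA(negate);

/* Count the entries; a real engine would register each one with Lua here. */
size_t count_exports(void)
{
    return (size_t)(__stop_lua_exports - __start_lua_exports);
}

/* Walk the set and return the function registered under name, or NULL. */
script_fn find_export(const char *name)
{
    for (const struct export_entry **p = __start_lua_exports;
         p < __stop_lua_exports; ++p)
        if (strcmp((*p)->name, name) == 0)
            return (*p)->fn;
    return NULL;
}
```

Adding another function to the set only takes one more EXPORT_TO_LUA line next to its definition; the walking code does not change.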
You need to pass --export-dynamic to the linker (via -Wl,--export-dynamic).
Note: This is for Linux, but could be a starting point for your search.
References:
http://sourceware.org/binutils/docs/ld/Options.html
If static linking is an option, integrate that into the linker script. Before linking, do "nm" on all object files, extract the global symbols, and generate a C file containing a (preferably sorted/hashed) mapping of all symbol names to symbol values:
struct symbol { const char *name; void *value; } symbols[] = {
{"foo", (void *)foo},
{"bar", (void *)bar},
...
{0, 0}};
If you want to be selective in what you expose, it might be easiest to implement a naming scheme, e.g. prefixing all functions/methods with Lua_.
Alternatively, you can create a trivial macro,
#define ForLua(X) X
and then grep the sources for ForLua, to select the symbols that you want to incorporate.
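Once such a table has been generated, looking a symbol up by name is straightforward. The following C sketch assumes the generator emits the table sorted by name so that bsearch can be used. The functions foo and bar and the helper lookup_symbol are placeholders, and storing a function address in a void * relies on the same function-pointer conversion that dlsym depends on.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical functions standing in for the engine's exported symbols. */
int foo(void) { return 1; }
int bar(void) { return 2; }

struct symbol {
    const char *name;
    void       *value;
};

/* What the generated file might contain, sorted by name for bsearch(). */
static const struct symbol symbols[] = {
    { "bar", (void *)bar },
    { "foo", (void *)foo },
};

/* bsearch() comparator: the key is a name, the element is a table row. */
static int compare_symbol(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const struct symbol *)elem)->name);
}

/* Return the address recorded for name, or NULL if it is not in the table. */
void *lookup_symbol(const char *name)
{
    const struct symbol *hit = bsearch(name, symbols,
                                       sizeof symbols / sizeof symbols[0],
                                       sizeof symbols[0], compare_symbol);
    return hit ? hit->value : NULL;
}
```

The registration step would then iterate over the table, or call lookup_symbol for each name the script side asks for.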
You could just generate a mapfile and use that instead, no?