I've asked a similar question before and didn't manage to find a direct answer.
Could someone provide sample code for extracting the depth buffer of an object rendered in a MATLAB figure?
So let's say I load an OBJ file, or even just make a simple surf call, render it, and now want to get at its depth buffer. What code will do that for me, using both MATLAB and OpenGL? I.e. how do I set this up and then access the actual data?
I essentially want to be able to use MATLAB's powerful plotting functions and then access the underlying graphics context to get the depth buffer out.
NOTE: The bounty specifies JOGL, but that is not a must. Any code which does the above and can provide me with the depth buffer after running it in MATLAB is sufficient.
Today, I went drinking with my colleagues, and after five beers and some tequilas I found this question and thought, "have at ya!" I struggled for a while, but then I found a simple solution using MEX. I theorized that the OpenGL context created by the last figure window could be left active and could therefore be accessible from C, if the MEX file runs in the same thread.
I created a simple C program which calls one MATLAB function, "testofmyfilter", which plots the frequency response of a filter (that was the only script I had at hand). This is rendered using OpenGL. The program then uses glGetIntegerv(GL_VIEWPORT, ...) and glReadPixels() to get at the OpenGL buffers. It then creates a matrix, fills it with the depth values, and passes it to a second function, "trytodisplaydepthmap", which just displays the depth map using imshow. Note that a MEX function is allowed to return values as well, so the postprocessing would not have to be a separate function, but I'm in no state to understand how that's done. It should be trivial, though. I'm working with MEX for the first time today.
Without further delay, here are the source files I used:
testofmyfilter.m
imp = zeros(10000,1);
imp(5000) = 1;
% impulse
[bwb,bwa] = butter(3, 0.1, 'high');
b = filter(bwb, bwa, imp);
% filter impulse by the filter
fs = 44100; % sampling frequency (all frequencies are relative to fs)
frequency_response=fft(b); % calculate response (complex numbers)
amplitude_response=20*log10(abs(frequency_response)); % calculate magnitude of the response, convert to dB
frequency_axis=(0:length(b)-1)*fs/length(b); % generate frequency values for each response value
min_f=2;
max_f=fix(length(b)/2)+1; % min, max frequency
figure(1);
lighting gouraud
set(gcf,'Renderer','OpenGL')
semilogx(frequency_axis(min_f:max_f),amplitude_response(min_f:max_f),'r-') % plot with logarithmic axis using red line
axis([frequency_axis(min_f) frequency_axis(max_f) -90 10]) % set axis limits
xlabel('frequency [Hz]');
ylabel('amplitude [dB]'); % legend
grid on % draw grid
test.c
//You can include any C libraries that you normally use
#include "windows.h"
#include "stdio.h"
#include "stdlib.h" //--needed for malloc()/free()
#include "math.h"
#include "mex.h" //--This one is required
extern WINAPI void glGetIntegerv(int n_enum, int *p_value);
extern WINAPI void glReadPixels(int x,
int y,
int width,
int height,
int format,
int type,
void * data);
#define GL_VIEWPORT 0x0BA2
#define GL_DEPTH_COMPONENT 0x1902
#define GL_FLOAT 0x1406
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
int viewport[4], i, x, y;
int colLen;
float *data;
double *matrix;
mxArray *arg[1];
mexCallMATLAB(0, NULL, 0, NULL, "testofmyfilter");
// call an .m file which creates OpenGL window and draws a plot inside
glGetIntegerv(GL_VIEWPORT, viewport);
printf("GL_VIEWPORT = [%d, %d, %d, %d]\n", viewport[0], viewport[1], viewport[2], viewport[3]);
// print viewport dimensions, should be [0, 0, m, n]
// where m and n are size of the GL window
data = (float*)malloc(viewport[2] * viewport[3] * sizeof(float));
glReadPixels(0, 0, viewport[2], viewport[3], GL_DEPTH_COMPONENT, GL_FLOAT, data);
// alloc data and read the depth buffer
/*for(i = 0; i < 10; ++ i)
printf("%f\n", data[i]);*/
// debug
arg[0] = mxCreateNumericMatrix(viewport[3], viewport[2], mxDOUBLE_CLASS, mxREAL);
matrix = mxGetPr(arg[0]);
colLen = mxGetM(arg[0]);
printf("0x%08x 0x%08x 0x%08x %d\n", data, arg[0], matrix, colLen); // debug
for(x = 0; x < viewport[2]; ++ x) {
for(y = 0; y < viewport[3]; ++ y)
matrix[x * colLen + y] = data[x + (viewport[3] - 1 - y) * viewport[2]];
}
// create matrix, copy data (this is stupid, but matlab switches
// rows/cols, also convert float to double - but OpenGL could have done that)
free(data);
// don't need this anymore
mexCallMATLAB(0, NULL, 1, arg, "trytodisplaydepthmap");
// pass the array to a function (returnig something from here
// is beyond my understanding of mex, but should be doable)
mxDestroyArray(arg[0]);
// cleanup
return;
}
trytodisplaydepthmap.m:
function [] = trytodisplaydepthmap(depthMap)
figure(2);
imshow(depthMap, []);
% see what's inside
Save all of these to the same directory and compile test.c with (type this into the MATLAB console):
mex test.c Q:\MATLAB\R2008a\sys\lcc\lib\opengl32.lib
Where "Q:\MATLAB\R2008a\sys\lcc\lib\opengl32.lib" is path to "opengl32.lib" file.
And finally execute it all by simply typing "test" in the MATLAB console. It should bring up a window with the filter's frequency response, and another window with the depth buffer. Note that the front and back buffers are swapped at the moment the C code reads the depth buffer, so you might have to run the script twice to get any results (the front buffer, which now contains the result, swaps back with the back buffer, and the depth can be read out). This could be done automatically by the C code, or you can try including getframe(gcf); at the end of your script (that reads back from OpenGL as well, so it swaps the buffers for you, or something).
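If you would rather handle this from the C side, here is a minimal, untested sketch of one way to do it: force MATLAB to finish rendering (via drawnow) before the buffers are read. It replaces the first mexCallMATLAB() line of test.c; everything after it stays the same.
mexCallMATLAB(0, NULL, 0, NULL, "testofmyfilter");
// assumption: calling drawnow here flushes the figure so that the depth
// buffer is already valid on the first run -- untested, hence only a sketch
mexCallMATLAB(0, NULL, 0, NULL, "drawnow");
// ... then glGetIntegerv(), malloc() and glReadPixels() exactly as in test.c ...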
This works for me in Matlab 7.6.0.324 (R2008a). The script runs and spits out the following:
>>test
GL_VIEWPORT = [0, 0, 560, 419]
0x11150020 0x0bd39620 0x12b20030 419
And of course it displays the images. Note the depth buffer range depends on Matlab, and can be quite high, so making any sense of the generated images may not be straightforward.
the swine's answer is the correct one.
Here is a slightly formatted and simpler version that is cross-platform.
Create a file called mexGetDepth.c
#include "mex.h"
#define GL_VIEWPORT 0x0BA2
#define GL_DEPTH_COMPONENT 0x1902
#define GL_FLOAT 0x1406
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
int viewport[4], i, x, y;
int colLen;
float *data;
double *matrix;
glGetIntegerv(GL_VIEWPORT, viewport);
data = (float*)malloc(viewport[2] * viewport[3] * sizeof(float));
glReadPixels(0, 0, viewport[2], viewport[3], GL_DEPTH_COMPONENT, GL_FLOAT, data);
plhs[0] = mxCreateNumericMatrix(viewport[3], viewport[2], mxDOUBLE_CLASS, mxREAL);
matrix = mxGetPr(plhs[0]);
colLen = mxGetM(plhs[0]);
for(x = 0; x < viewport[2]; ++ x) {
for(y = 0; y < viewport[3]; ++ y)
matrix[x * colLen + y] = data[x + (viewport[3] - 1 - y) * viewport[2]];
}
free(data);
return;
}
Then if you're on Windows compile using
mex mexGetDepth.c "path to OpenGL32.lib"
or if you're on a *nix system, link against the system OpenGL library:
mex mexGetDepth.c -lGL
Then run the following small script to test out the new function
peaks;                          % draw something in figure 1
figure(1);
set(gcf, 'Renderer', 'OpenGL'); % make sure the figure uses the OpenGL renderer
depthData = mexGetDepth;
figure
imshow(depthData);
Is it possible to take a camera image in Y'UV format and, using RenderScript:
Convert it to RGBA
Crop it to a certain region
Rotate it if necessary
Yes! I figured out how, and thought I would share it with others. RenderScript has a bit of a learning curve, and simpler examples seem to help.
When cropping, you still need to set up an input and output allocation, as well as one for the script itself. It might seem strange at first, but the input and output allocations have to be the same size, so if you are cropping you need to set up yet another Allocation to write the cropped output to. More on that in a second.
#pragma version(1)
#pragma rs java_package_name(com.autofrog.chrispvision)
#pragma rs_fp_relaxed
/*
* This is mInputAllocation
*/
rs_allocation gInputFrame;
/*
* This is where we write our cropped image
*/
rs_allocation gOutputFrame;
/*
* These dimensions define the crop region that we want
*/
uint32_t xStart, yStart;
uint32_t outputWidth, outputHeight;
uchar4 __attribute__((kernel)) yuv2rgbFrames(uchar4 in, uint32_t x, uint32_t y)
{
uchar Y = rsGetElementAtYuv_uchar_Y(gInputFrame, x, y);
uchar U = rsGetElementAtYuv_uchar_U(gInputFrame, x, y);
uchar V = rsGetElementAtYuv_uchar_V(gInputFrame, x, y);
uchar4 rgba = rsYuvToRGBA_uchar4(Y, U, V);
/* force the alpha channel to opaque - the conversion doesn't seem to do this */
rgba.a = 0xFF;
uint32_t translated_x = x - xStart;
uint32_t translated_y = y - yStart;
uint32_t x_rotated = outputWidth - translated_y;
uint32_t y_rotated = translated_x;
rsSetElementAt_uchar4(gOutputFrame, rgba, x_rotated, y_rotated);
return rgba;
}
To set up the allocations:
private fun createAllocations(rs: RenderScript) {
/*
* The yuvTypeBuilder is for the input from the camera. It has to be the
* same size as the camera (preview) image
*/
val yuvTypeBuilder = Type.Builder(rs, Element.YUV(rs))
yuvTypeBuilder.setX(mImageSize.width)
yuvTypeBuilder.setY(mImageSize.height)
yuvTypeBuilder.setYuvFormat(ImageFormat.YUV_420_888)
mInputAllocation = Allocation.createTyped(
rs, yuvTypeBuilder.create(),
Allocation.USAGE_IO_INPUT or Allocation.USAGE_SCRIPT)
/*
* The RGB type is also the same size as the input image. Other examples write this as
* an int but I don't see a reason why you wouldn't be more explicit about it to make
* the code more readable.
*/
val rgbType = Type.createXY(rs, Element.RGBA_8888(rs), mImageSize.width, mImageSize.height)
mScriptAllocation = Allocation.createTyped(
rs, rgbType,
Allocation.USAGE_SCRIPT)
mOutputAllocation = Allocation.createTyped(
rs, rgbType,
Allocation.USAGE_IO_OUTPUT or Allocation.USAGE_SCRIPT)
/*
* Finally, set up an allocation to which we will write our cropped image. The
* dimensions of this one are (wantx,wanty)
*/
val rgbCroppedType = Type.createXY(rs, Element.RGBA_8888(rs), wantx, wanty)
mOutputAllocationRGB = Allocation.createTyped(
rs, rgbCroppedType,
Allocation.USAGE_SCRIPT)
}
Finally, since you're cropping, you need to tell the script what to do before invoking it. If the image sizes don't change, you can probably optimize this by moving the LaunchOptions and variable settings so they occur just once (rather than every time), but I'm leaving them here to make the example clearer.
override fun onBufferAvailable(a: Allocation) {
// Get the new frame into the input allocation
mInputAllocation!!.ioReceive()
// Run processing pass if we should send a frame
val current = System.currentTimeMillis()
if (current - mLastProcessed >= mFrameEveryMs) {
val lo = Script.LaunchOptions()
/*
* These coordinates are the portion of the original image that we want to
* include. Because we're rotating (in this case) x and y are reversed
* (but still offset from the actual center of each dimension)
*/
lo.setX(starty, endy)
lo.setY(startx, endx)
mScriptHandle.set_xStart(lo.xStart.toLong())
mScriptHandle.set_yStart(lo.yStart.toLong())
mScriptHandle.set_outputWidth(wantx.toLong())
mScriptHandle.set_outputHeight(wanty.toLong())
mScriptHandle.forEach_yuv2rgbFrames(mScriptAllocation, mOutputAllocation, lo)
val output = Bitmap.createBitmap(
wantx, wanty,
Bitmap.Config.ARGB_8888
)
mOutputAllocationRGB!!.copyTo(output)
/* Do something with the resulting bitmap */
listener?.invoke(output)
mLastProcessed = current
}
}
All this might seem like a bit much but it's very fast - way faster than doing the rotation on the java/kotlin side, and thanks to RenderScript's ability to run the kernel function over a subset of the image it's less overhead than creating a bitmap then creating a second, cropped one.
For me, all the rotation is necessary because the image seen by the RenderScript was 90 degrees rotated from the camera. I am told this is some kind of peculiarity of having a Samsung phone.
RenderScript was intimidating at first but once you get used to what it's doing it's not so bad. I hope this is helpful to someone.
I want to implement a Sobel filter in RenderScript with uchar4 as the input allocation and float[] as the output allocation. I am not quite sure whether it is possible to use different types for the input and output allocations in a RenderScript kernel. I want to develop the solution myself, but would be grateful for some advice on the best RenderScript structure to tackle the problem. Somewhere I read that it is possible to use
float __attribute__((kernel)) root(uchar4 *v_in, uint32_t x, uint32_t y) {
}
Would you recommend such an approach, or can this be done without actually using a kernel, i.e. just a function? Thanks in advance.
My rs code for the Sobel (X direction) now looks as follows:
#pragma version(1)
#pragma rs java_package_name(com.example.xxx)
#pragma rs_fp_relaxed
rs_allocation gIn;
int32_t width;
int32_t height;
float __attribute__((kernel)) sobelX(uchar4 *v_in, uint32_t x, uint32_t y) {
float out=0;
if (x>0 && y>0 && x<(width-1) && y<(height-1){
uchar4 c11=rsGetElementAt_uchar4(gIn, x-1, y-1);
uchar4 c21=rsGetElementAt_uchar4(gIn, x, y-1);
uchar4 c31=rsGetElementAt_uchar4(gIn, x+1, y-1);
uchar4 c13=rsGetElementAt_uchar4(gIn, x-1, y+1);
uchar4 c23=rsGetElementAt_uchar4(gIn, x, y+1);
uchar4 c33=rsGetElementAt_uchar4(gIn, x+1, y+1);
float4 f11=convert_float4(c11);
float4 f21=convert_float4(c21);
float4 f31=convert_float4(c31);
float4 f13=convert_float4(c13);
float4 f23=convert_float4(c23);
float4 f33=convert_float4(c33);
out= f11.r-f13.r + 2*(f21.r-f23.r) + f31.r-f33.r;
}
return out;
}
What I am struggling with is passing the parameters from the Java side:
float[][] gx = new float[width][height];
ScriptC_sobel script;
script=new ScriptC_sobel(rs);
script.set_width(width) ;
script.set_height(height) ;
script.set_gIn(bmpGray);
Allocation inAllocation = Allocation.createFromBitmap(rs, bmpGray, Allocation.MipmapControl.MIPMAP_NONE,
Allocation.USAGE_SCRIPT);
Allocation outAllocation = Allocation.createTyped(rs, float,2) ;
script.forEach_sobelX(inAllocation, outAllocation);
outAllocation.copyTo(gx) ;
I understand that, in order to use the rsGetElementAt function (to access neighboring data within the kernel), I need to set the input allocation as a script global as well (rs_allocation gIn in the rs code). However, I'm not sure how to handle this "double allocation" from the Java side. Also, the outAllocation statement in the Java code is probably not correct. Specifically, I am not sure whether the kernel will return this as float[] or as float[][].
It is possible to use different types for input and output. In your case, I would actually suggest:
float __attribute__((kernel)) sobel(uchar4 *v_in, uint32_t x, uint32_t y) {}
You certainly want to use a kernel, so that the performance can benefit from execution by multiple threads.
Also, have a look at this example of doing 3x3 convolution in RS.
UPDATE: generally, the best in/out parameters to use depend on the type of output you want this filter to generate - is it just the magnitude? Then uint output will most likely suffice.
UPDATE2: If you are going to use a variable to pass input allocation, then you don't need it in the kernel parameters, i.e.:
float __attribute__((kernel)) sobelX(uint32_t x, uint32_t y)
The rest of the script looks OK (apart from the missing parenthesis in the conditional). As for the Java part, below is a demonstration of how you should prepare the output allocation and start the script. The kernel will then be invoked for every cell (i.e. every float) in the output allocation.
float[] gx = new float[width * height];
Type.Builder TypeIn = new Type.Builder(mRS, Element.F32(mRS));
TypeIn.setX(width).setY(height);
Allocation outAllocation = Allocation.createTyped(mRS, TypeIn.create());
// remember to also bind the input allocation that the kernel reads via
// rsGetElementAt, e.g. mScript.set_gIn(inAllocation);
mScript.forEach_sobelX(outAllocation);
This is my renderscript code for now:
#pragma version(1)
#pragma rs java_package_name(com.apps.foo.bar)
rs_allocation inPixels;
uchar4 RS_KERNEL root(uchar4 in, uint32_t x, uint32_t y) {
uchar4 pixel = in.rgba;
pixel.r = (pixel.r + pixel.g + pixel.b)/3;
pixel.g = (pixel.r + pixel.g + pixel.b)/3;
pixel.b = (pixel.r + pixel.g + pixel.b)/3;
return pixel;
}
My phone shows a "grayscaled" picture. I say "grayscaled" because red, for example, is still kinda red... it is gray-ish, but you can still see that it is red. I know I can use more sophisticated methods, but I would like to stick to the simple one for now.
I would like to know if my renderscript code is wrong. Should I be converting the char to another type?
Use a temporary variable to hold the result as you compute it. Otherwise, in the first line you're modifying pixel.r, and in the very next one you are using it to calculate pixel.g. No wonder you get artifacts.
Also, don't forget to assign the alpha value to avoid surprises with "invisible" output.
Also, I would recommend not using equal weights for r, g and b, but rather the weights below. See e.g. http://www.johndcook.com/blog/2009/08/24/algorithms-convert-color-grayscale/
uchar4 __attribute__((kernel)) gray(uchar4 in) {
uchar4 out;
float gr= 0.2125*in.r + 0.7154*in.g + 0.0721*in.b;
out.r = out.g = out.b = gr;
out.a = in.a;
return out;
}
I've been working on this problem for quite some time and am at the end of my creativity, so hopefully someone else can help point me in the right direction. I've been working with the Kinect and attempting to capture data to MATLAB. Fortunately there's quite a few ways of doing so (I'm currently using http://www.mathworks.com/matlabcentral/fileexchange/30242-kinect-matlab). When I attempted to project the captured data to 3D, my traditional methods gave poor reconstruction results.
To cut a long story short, I ended up writing a Kinect SDK wrapper for matlab that performs the reconstruction and the alignment. The reconstruction works like a dream, but...
I am having tons of trouble with the alignment as you can see here:
Please don't look too closely at the model :(.
As you can see, the alignment is incorrect. I'm not sure why that's the case. I've read plenty of forums where others have had more success than I with the same methods.
My current pipeline is using Kinect Matlab (using OpenNI) to capture data, reconstructing using the Kinect SDK, then aligning using the Kinect SDK (via NuiImageGetColorPixelCoordinateFrameFromDepthPixelFrameAtResolution). I suspected it was perhaps due to OpenNI, but I have had little success in creating MEX function calls to capture using the Kinect SDK.
If anyone can point me in a direction I should delve more deeply into, it would be much appreciated.
Edit:
Figure I should post some code. This is the code I use for alignment:
/* The matlab mex function */
void mexFunction( int nlhs, mxArray *plhs[], int nrhs,
const mxArray *prhs[] ){
if( nrhs < 2 )
{
printf( "No depth input or color image specified!\n" );
mexErrMsgTxt( "Input Error" );
}
int width = 640, height = 480;
// get input depth data
unsigned short *pDepthRow = ( unsigned short* ) mxGetData( prhs[0] );
unsigned char *pColorRow = ( unsigned char* ) mxGetData( prhs[1] );
// compute the warping
INuiSensor *sensor = CreateFirstConnected();
long colorCoords[ 640*480*2 ];
sensor->NuiImageGetColorPixelCoordinateFrameFromDepthPixelFrameAtResolution(
NUI_IMAGE_RESOLUTION_640x480, NUI_IMAGE_RESOLUTION_640x480,
640*480, pDepthRow, 640*480*2, colorCoords );
sensor->NuiShutdown();
sensor->Release();
// create matlab output; it's a column ordered matrix ;_;
int Jdimsc[3];
Jdimsc[0]=height;
Jdimsc[1]=width;
Jdimsc[2]=3;
plhs[0] = mxCreateNumericArray( 3, Jdimsc, mxUINT8_CLASS, mxREAL );
unsigned char *Iout = ( unsigned char* )mxGetData( plhs[0] );
for( int x = 0; x < width; x++ )
for( int y = 0; y < height; y++ ){
int idx = ( y*width + x )*2;
long c_x = colorCoords[ idx + 0 ];
long c_y = colorCoords[ idx + 1 ];
bool correct = ( c_x >= 0 && c_x < width
&& c_y >= 0 && c_y < height );
c_x = correct ? c_x : x;
c_y = correct ? c_y : y;
Iout[ 0*height*width + x*height + y ] =
pColorRow[ 0*height*width + c_x*height + c_y ];
Iout[ 1*height*width + x*height + y ] =
pColorRow[ 1*height*width + c_x*height + c_y ];
Iout[ 2*height*width + x*height + y ] =
pColorRow[ 2*height*width + c_x*height + c_y ];
}
}
This is a well-known problem for stereo vision systems. I had the same problem a while back. The original question I posted can be found here. What I was trying to do was kind of similar to this. However, after a lot of research I came to the conclusion that a captured dataset cannot be easily aligned.
On the other hand, while recording the dataset you can easily use a function call to align both the RGB and depth data. This method is available in both OpenNI and the Kinect SDK (the functionality is the same, though the names of the function calls differ).
It looks like you are using Kinect SDK to capture the dataset, to align data with Kinect SDK you can use MapDepthFrameToColorFrame.
Since you have also mentioned using OpenNI, have a look at AlternativeViewPointCapability.
I have no experience with Kinect SDK, however with OpenNI v1.5 this whole problem was solved by making the following function call, before registering the recorder node:
depth.GetAlternativeViewPointCap().SetViewPoint(image);
where image is the image generator node and depth is the depth generator node. This was with the older SDK, which has since been replaced by the OpenNI 2.0 SDK. So if you are using the latest SDK, the function call might be different, but the overall procedure should be similar.
I am also adding some example images:
Without using the above alignment function call, the depth edges were not aligned with the RGB image.
When using the function call, the depth edges get perfectly aligned (there are some infrared shadow regions which show some edges, but they are just invalid depth regions).
depth.GetAlternativeViewPointCap().SetViewPoint(image);
works well, but the problem is that it downscales the depth image (by FOCAL_rgb/FOCAL_kinect) and shifts each depth pixel by the disparity d = focal*B/z; depending on the factory settings there might be a slight rotation as well.
Thus one cannot recover all 3 real-world coordinates any more without undoing these transformations. That being said, methods that don't depend on accurate x, y and take only z into account (such as segmentation) may work well even in the shifted map. Moreover, they can take advantage of colour as well as depth to perform better segmentation.
You can easily align Depth Frames and Color Frames by reading the U,V texture mapping parameters using the Kinect SDK. For every pixel coordinate (i,j) of the Depth frame D(i,j) the corresponding pixel coordinate of the Color Frame is given by (U(i,j),V(i,j)) so the color is given by C(U(i,j),V(i,j)).
The U,V functions are contained in the hardware of each Kinect and they differ from Kinect to Kinect since the Depth cameras are differently aligned with respect to the Video cameras due to tiny differences when glued on the hardware board at the factory. But you don't have to worry about that if you read U,V from the Kinect SDK.
Below I give you an image example and an actual source code example using the Kinect SDK in Java with the J4K open source library:
public class Kinect extends J4KSDK{
VideoFrame videoTexture;
public Kinect() {
super();
videoTexture=new VideoFrame();
}
@Override
public void onDepthFrameEvent(short[] packed_depth, int[] U, int V[]) {
DepthMap map=new DepthMap(depthWidth(),depthHeight(),packed_depth);
if(U!=null && V!=null) map.setUV(U,V,videoWidth(),videoHeight());
}
@Override
public void onVideoFrameEvent(byte[] data) {
videoTexture.update(videoWidth(), videoHeight(), data);
}
}
Image example showing 3 different perspectives of the same Depth-Video aligned frame:
I hope that this helps you!
I have an image in MATLAB:
img = imread('image.jpg');
which returns a uint8 array of size height x width x channels (3 channels: RGB).
Now I want to use OpenCV to do some manipulations on it, so I write up a MEX file which takes the image as a parameter and constructs an IplImage from it:
#include "mex.h"
#include "cv.h"
void mexFunction(int nlhs, mxArray **plhs, int nrhs, const mxArray **prhs) {
char *matlabImage = (char *)mxGetData(prhs[0]);
const mwSize *dim = mxGetDimensions(prhs[0]);
CvSize size;
size.height = dim[0];
size.width = dim[1];
IplImage *iplImage = cvCreateImageHeader(size, IPL_DEPTH_8U, dim[2]);
iplImage->imageData = matlabImage;
iplImage->imageDataOrigin = iplImage->imageData;
/* Show the openCV image */
cvNamedWindow("mainWin", CV_WINDOW_AUTOSIZE);
cvShowImage("mainWin", iplImage);
}
This result looks completely wrong, because OpenCV uses different conventions than MATLAB for storing an image (for instance, it interleaves the color channels).
Can anyone explain what the differences in conventions are and give some pointers on how to display the image correctly?
After spending the day doing fun image format conversions </sarcasm> I can now answer my own question.
Matlab stores images as 3 dimensional arrays: height × width × color
OpenCV stores images as 2 dimensional arrays: (color × width) × height
Furthermore, for best performance, OpenCV pads the images with zeros so rows are always aligned on 32 bit blocks.
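As an aside, the "aligned on 32 bit blocks" part simply means rounding each row up to the next multiple of 4 bytes. A small illustrative helper (the function name is mine, not OpenCV's):
#include <cstddef>

// Illustrative only: an OpenCV-style row stride ("widthStep"), i.e. the raw
// row size in bytes rounded up to the next 4-byte boundary.
static std::size_t alignedWidthStep(std::size_t width,
                                    std::size_t channels,
                                    std::size_t bytesPerChannel)
{
    std::size_t rowBytes = width * channels * bytesPerChannel;
    return (rowBytes + 3) & ~static_cast<std::size_t>(3);
}
// e.g. alignedWidthStep(5, 3, 1) == 16 and alignedWidthStep(640, 3, 1) == 1920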
I've done the conversion in Matlab:
function [cv_img, dim, depth, width_step] = convert_to_cv(img)
% Exchange rows and columns (handles 3D cases as well)
img2 = permute( img(:,end:-1:1,:), [2 1 3] );
dim = [size(img2,1), size(img2,2)];
% Convert double precision to single precision if necessary
if( isa(img2, 'double') )
img2 = single(img2);
end
% Determine image depth
if( ndims(img2) == 3 && size(img2,3) == 3 )
depth = 3;
else
depth = 1;
end
% Handle color images
if(depth == 3 )
% Switch from RGB to BGR
img2(:,:,[3 2 1]) = img2;
% Interleave the colors
img2 = reshape( permute(img2, [3 1 2]), [size(img2,1)*size(img2,3) size(img2,2)] );
end
% Pad the image
width_step = size(img2,1) + mod( size(img2,1), 4 );
img3 = uint8(zeros(width_step, size(img2,2)));
img3(1:size(img2,1), 1:size(img2,2)) = img2;
cv_img = img3;
% Output to openCV
cv_display(cv_img, dim, depth, width_step);
The code to transform this into an IplImage is in the MEX file:
#include "mex.h"
#include "cv.h"
#include "highgui.h"
#define IN_IMAGE prhs[0]
#define IN_DIMENSIONS prhs[1]
#define IN_DEPTH prhs[2]
#define IN_WIDTH_STEP prhs[3]
void mexFunction(int nlhs, mxArray **plhs, int nrhs, const mxArray **prhs) {
bool intInput = true;
if(nrhs != 4)
mexErrMsgTxt("Usage: cv_disp(image, dimensions, depth, width_step)");
if( mxIsUint8(IN_IMAGE) )
intInput = true;
else if( mxIsSingle(IN_IMAGE) )
intInput = false;
else
mexErrMsgTxt("Input should be a matrix of uint8 or single precision floats.");
if( mxGetNumberOfElements(IN_DIMENSIONS) != 2 )
mexErrMsgTxt("Dimension vector should contain two elements: [width, height].");
char *matlabImage = (char *)mxGetData(IN_IMAGE);
double *imgSize = mxGetPr(IN_DIMENSIONS);
size_t width = (size_t) imgSize[0];
size_t height = (size_t) imgSize[1];
size_t depth = (size_t) *mxGetPr(IN_DEPTH);
size_t widthStep = (size_t) *mxGetPr(IN_WIDTH_STEP) * (intInput ? sizeof(unsigned char):sizeof(float));
CvSize size;
size.height = height;
size.width = width;
IplImage *iplImage = cvCreateImageHeader(size, intInput ? IPL_DEPTH_8U:IPL_DEPTH_32F, depth);
iplImage->imageData = matlabImage;
iplImage->widthStep = widthStep;
iplImage->imageDataOrigin = iplImage->imageData;
/* Show the openCV image */
cvNamedWindow("mainWin", CV_WINDOW_AUTOSIZE);
cvShowImage("mainWin", iplImage);
}
You could optimize your program by using mxGetDimensions and mxGetNumberOfDimensions to get the size of the image, and mxGetClassID to determine the depth more accurately.
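For instance, a short sketch of that suggestion (the helper name and the printout are mine, not from the original code):
#include "mex.h"
#include "cv.h"

// Illustrative only: read size, channel count and depth straight from the
// input mxArray instead of passing them in as extra arguments.
static void describeInput(const mxArray *img)
{
    mwSize nDims = mxGetNumberOfDimensions(img);
    const mwSize *dims = mxGetDimensions(img);   /* [height width channels] */
    mwSize channels = (nDims >= 3) ? dims[2] : 1;

    mxClassID cls = mxGetClassID(img);           /* e.g. mxUINT8_CLASS */
    int iplDepth = (cls == mxUINT8_CLASS)  ? IPL_DEPTH_8U  :
                   (cls == mxSINGLE_CLASS) ? IPL_DEPTH_32F : -1;

    mexPrintf("%d x %d, %d channel(s), IplImage depth %d\n",
              (int)dims[0], (int)dims[1], (int)channels, iplDepth);
}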
I wanted to do the same, but I think it would be better to do this using a MATLAB DLL and calllib. I would not do the transformation of the image into OpenCV format in MATLAB, because it would be slow. This is one of the biggest problems with MATLAB and OpenCV. I think OpenCV 2.2 has some good solutions for that problem. It looks like there are some solutions like that done by the OpenCV community for Octave, but I still don't understand them. They are somehow using the C++ functionality of OpenCV.
Try using the library developed by Kota Yamaguchi:
http://github.com/kyamagu/mexopencv
It defines a class called 'MxArray' that can perform all types of conversions from MATLAB mxArray variables to OpenCV objects (and from OpenCV back to MATLAB). For example, this library can convert between mxArray and cv::Mat data types. Btw, IplImage is not relevant anymore if you use the C++ API of OpenCV; it's better to use cv::Mat instead.
Note: if using the library, make sure to compile your MEX function with the MxArray.cpp file from the library; you can do so from the MATLAB command line with:
mex yourmexfile.cpp MxArray.cpp
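For example, here is a minimal sketch of such a MEX file, based on the usage pattern from mexopencv's documentation (the GaussianBlur call is just a placeholder operation; you will also need OpenCV's include and linker flags when compiling, so check the library's build instructions):
#include "mexopencv.hpp"        // from the mexopencv library; provides MxArray
#include <opencv2/opencv.hpp>

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    if (nrhs != 1)
        mexErrMsgTxt("One input image expected.");

    // mxArray -> cv::Mat
    cv::Mat img = MxArray(prhs[0]).toMat();

    // do whatever with the cv::Mat (placeholder operation)
    cv::Mat result;
    cv::GaussianBlur(img, result, cv::Size(5, 5), 2.0);

    // cv::Mat -> mxArray
    plhs[0] = MxArray(result);
}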
Based on the answer above and on "How the image matrix is stored in the memory on OpenCV", we can do this with OpenCV Mat operations only!
C++: Mat::Mat(int ndims, const int* sizes, int type, void* data, const size_t* steps=0)
C++: void merge(const Mat* mv, size_t count, OutputArray dst)
Then the mex C/C++ code is:
#include "mex.h"
#include <opencv2/opencv.hpp>
#define uint8 unsigned char
void mexFunction(int nlhs, mxArray *out[], int nrhs, const mxArray *input[])
{
// assume the type of image is uint8
if(!mxIsClass(input[0], "uint8"))
{
mexErrMsgTxt("Only image arrays of the UINT8 class are allowed.");
return;
}
uint8* rgb = (uint8*) mxGetData(input[0]); // raw uint8 data, column-major
const mwSize* dims = mxGetDimensions(input[0]);
int height = (int) dims[0];
int width = (int) dims[1];
int imsize = height * width;
cv::Mat imR(1, imsize, cv::DataType<uint8>::type, rgb);
cv::Mat imG(1, imsize, cv::DataType<uint8>::type, rgb+imsize);
cv::Mat imB(1, imsize, cv::DataType<uint8>::type, rgb+imsize + imsize);
// opencv is BGR and matlab is column-major order
cv::Mat imA[3];
imA[2] = imR.reshape(1,width).t();
imA[1] = imG.reshape(1,width).t();
imA[0] = imB.reshape(1,width).t();
// done! imf is what we want!
cv::Mat imf;
cv::merge(imA, 3, imf);
}
}