How do I interpret an Intel Realsense camera depth map in MATLAB?

I was able to view and capture the image from the depth stream in MATLAB (using the webcam from the Hardware Support Package) from an F200 Intel Realsense camera. However, it does not look the same way as it does in the Camera Explorer.
What I see from MATLAB -
I have also linked Depth.mat that contains the image in the variable "D".
The image is returned as a 3-dimensional array of uint8. I assumed that each depth value is a larger number whose bits are split across the planes, so I tried bitshifting each plane and adding it to the next while taking care of the data types, then displayed the result using imagesc, but I did not get a proper depth image.
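For reference, a minimal sketch of this bit-combination attempt, assuming the 16-bit depth value is split across two of the three uint8 planes (the plane/byte order is an assumption and may need to be swapped):

% Combine two uint8 planes of D (from Depth.mat) into 16-bit depth values.
load('Depth.mat');                       % provides the variable D (an MxNx3 uint8 array)
low   = uint16(D(:,:,1));                % assumed low byte
high  = uint16(D(:,:,2));                % assumed high byte
depth = bitor(bitshift(high, 8), low);   % depth = high*256 + low
imagesc(depth); colorbar;                % display with automatic color scaling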
How do I properly interpret this image? Or, is there an alternate way to capture images in MATLAB?

Related

Aligning two images

I have two images of the same shoe sole, one taken with a scanning machine and another with a digital camera. I want to scale one of the images so that it can be easily aligned with the other without having to do it all by hand.
My thought was to use edge detection, connect all the points on the outside of the shoe, scale one image to fit right inside the other, and then scale the original image at the same rate.
I've messed around using different tools in the Image Processing Toolbox in MATLAB, but am making no progress.
Is there a better way to go about this?
My advice would be to first use the function activecontour to obtain the outer contour of the shoe in both images, then use the function procrustes with the binary images as input.
[~, CameraFittedToScan] = procrustes(Scan, Camera);  % second output is Camera transformed to best fit Scan
This transforms the camera image to best fit the scanned image. If the scan and camera images are not the same size, this needs to be adjusted first using the function imresize.
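A minimal sketch of this pipeline, assuming Scan and Camera are grayscale images and that a rough rectangular mask is enough to seed activecontour (both assumptions, not from the original post):

% Outer contour of the sole in each image via active contours.
maskS = false(size(Scan));   maskS(10:end-10, 10:end-10) = true;   % rough initial masks
maskC = false(size(Camera)); maskC(10:end-10, 10:end-10) = true;
ScanBW   = activecontour(Scan,   maskS, 300);
CameraBW = activecontour(Camera, maskC, 300);

% procrustes needs inputs of matching size, so resize the camera mask first.
CameraBW = imresize(CameraBW, size(ScanBW));

% The second output is the camera mask transformed to best fit the scan mask.
[~, CameraFittedToScan] = procrustes(double(ScanBW), double(CameraBW));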

Does the number of dimensions increase with every convolution in ConvNets?

I am a complete beginner with CNNs and I am trying to understand the concept of deep convolutional networks.
I understand that I have to slide my filters over the input image, and what I get is an array of images. Afterwards I apply ReLU and max-pooling, which still leaves me with an array of images. However, I do not understand what to do when I want to apply another set of filters. Before, I had one image, which turned into an array of images, but now I have an array of images. Does that mean I will get an array of arrays of images, i.e. a 2D array that is actually 4D because it is a 2D array of 2D arrays (images)? And what happens in the next layers? Will there be 5 dimensions? And 6?
Also, can you recommend a good written tutorial (not video) for beginners? Ideally one with examples in Java.
Any help would be appreciated.
I think you are missing that convolutions work on images including their depth.
If the input image is an RGB image, its depth is 3, and the depth of the convolutional filters in the first layer is also 3. When you slide these 3D filters over your 3D image, you get an array of 2D images, as you say. But if you stack these output images along the third dimension, you get a 3D image again. Now its depth will not be 3; instead, the depth will equal the number of filters you used. So in the second (and every subsequent) layer you take a 3D image as input and output a 3D image as well. The depth of the image varies depending on the number of filters in the given layer, and the depth of the filters must always match the depth of the corresponding input image.
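As a concrete illustration of the shape arithmetic (the sizes are made-up examples, written in plain MATLAB rather than a deep learning framework):

% A 32x32 RGB input convolved ('valid') with eight 5x5x3 filters gives a
% 28x28x8 output: the output depth equals the number of filters, so the
% data stays 3-D instead of gaining an extra dimension per layer.
img     = rand(32, 32, 3);        % height x width x depth(=3)
filters = rand(5, 5, 3, 8);       % eight filters, each 5x5x3
out     = zeros(28, 28, 8);       % (32 - 5 + 1) = 28 in each spatial dimension
for k = 1:8
    % each 3-D filter spans the full input depth and yields one 2-D map
    out(:,:,k) = convn(img, filters(:,:,:,k), 'valid');
end
size(out)                         % -> 28 28 8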
You do not say which machine learning toolkit you use, but there is one for Java called deeplearning4j. You can find more detailed information in its tutorial on CNNs.

Automatic assembly of multiple Kinect 2 grayscale depth images to a depth map

The project is about the measurement of different objects under the Kinect 2. The image acquisition code sample from the SDK was adapted so that the depth information is saved over its whole range instead of being limited to the values 0-255.
The Kinect is mounted on a beam hoist and moved in a straight line over the model. At every stop, multiple images are taken, the mean is calculated, and the error correction is applied. Afterwards the images are put together into one large depth map instead of multiple depth images.
Due to the reduced image size (chosen to limit the influence of the noisy edges), every image has a size of 350x300 pixels. For the moment, the test is done with three images to be put together. As in the final program, I know the direction in which the images are taken. Due to the beam hoist, there is no rotation, only translation.
In MATLAB the images are saved as matrices with depth values ranging from 0 to 8000. As I could only find ideas on how to handle images, the depth maps are converted into images with a colorbar; only the color part is saved and passed to the stitching script, i.e. not the axes and the grey area around the image.
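For reference, a minimal sketch of this conversion done directly on the depth matrix instead of saving the figure and cropping it (the variable and file names are assumptions):

% D is assumed to be one 350x300 depth matrix with values between 0 and 8000.
% Scaling the matrix directly avoids the axes and the grey figure border.
I = im2uint8(mat2gray(D, [0 8000]));   % map 0..8000 onto an 8-bit grayscale image
imwrite(I, 'depth_tile.png');          % tile for the stitching script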
The stitching algorithm doesn't work. It seems to me that the grayscale images don't have enough contrast to be handled by the algorithms. Maybe I am just searching in the wrong direction?
Any ideas on how to tackle this challenge?

Camera Calibration Kinect Vision Caltech IR camera

I am trying to calibrate the IR camera of the new Kinect v2 sensor, following all the steps from here: http://www.vision.caltech.edu/bouguetj/calib_doc/htmls/example.html
The problem I am having is the following:
The IR image looks fine, but once I put it through the program the image I get is mostly white (bright). See pics below.
Has anyone encountered this issue before?
Thanks
You are not reading the IR image pixels correctly. The format is 16 bits per pixel, with only the high 10 bits used (see the specification). You are probably visualizing them as if they were 8bpp images, and therefore they end up white-saturated.
The simplest thing you can do is downshift the values by 8 bits (i.e. divide by 256) before interpreting them as a "standard" 8bpp image.
However, in MATLAB you can simply use imagesc to display them with color scaling.
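A minimal sketch of both options, assuming the IR frame has already been loaded into a uint16 matrix called IR (the variable name and the loading step are assumptions):

% Option 1: downshift by 8 bits (i.e. divide by 256) and treat as 8bpp.
IR8 = uint8(bitshift(IR, -8));
imshow(IR8);

% Option 2: let imagesc apply color scaling to the raw 16-bit values.
figure; imagesc(IR); colormap gray; colorbar;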

Matlab: accessing both image sequences from a 3D video file

I have recorded a 3D video using a Fujifilm FinePix Real 3D W3 camera. The resulting video file is a single AVI, so the frames from both lenses must somehow be incorporated within the single file.
I now wish to read in the video into Matlab such that I have two image sequences, each corresponding to either the left lens or the right lens.
So far I have played back the AVI file using the Computer Vision Toolbox functions (vision.VideoFileReader, etc.); however, it ignores one of the lenses and plays back only a single lens's image sequence.
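For reference, a minimal sketch of this playback attempt (the file name is a placeholder):

% Plays back the AVI with the Computer Vision Toolbox objects; only one of
% the two lens streams comes out, as described above.
reader = vision.VideoFileReader('DSCF0001.AVI');   % placeholder file name
player = vision.VideoPlayer;
while ~isDone(reader)
    frame = step(reader);     % frames from a single lens only
    step(player, frame);
end
release(reader); release(player);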
How do I access both image sequences within MATLAB?