I am a complete beginner with CNNs and I am trying to understand the concept of deep convolutional networks.
I understand that I have to slide my filters over the input image and that what I get is an array of images. Afterwards I apply ReLU and max-pooling, which still leaves me with an array of images. However, I do not understand what to do when I want to apply another set of filters. Before, I had 1 image, which turned into an array of images, but now I have an array of images. Does that mean I will get an array of arrays of images? A 2D array, which is actually 4D because it is a 2D array of 2D arrays (images)? And what happens in the next layers? Will there be 5 dimensions? And 6?
Also, can you recommend a good written tutorial (not a video) for beginners? Ideally one with examples in Java.
Any help would be appreciated.
I think you are missing the fact that convolutions work on images including their depth.
If the input image is an RGB image, its depth is 3, and the depth of the convolutional filters in the first layer is also 3. When you slide these 3D filters over your 3D image, you get an array of 2D images, as you say. But if you stack these output images along the third dimension, you get a 3D image again. Now it will not have a depth of 3; instead, its depth will equal the number of filters you used. So in the second (and every subsequent) layer you get a 3D image as input and you output a 3D image as well. The depth of the image will vary depending on the number of filters in the given layer. The depth of the filters must always match the depth of the corresponding input image.
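Regardless of which framework you end up using, the size bookkeeping works out the same way. Here is a toy MATLAB sketch with the input size, filter size and filter count made up purely for illustration (a real layer uses learned filters and a bias, not random numbers):

% A 32x32 RGB input and eight random 5x5x3 filters; convn is only used
% here to show the sizes of the arrays involved.
img     = rand(32, 32, 3);           % height x width x depth (RGB)
numFilt = 8;
maps    = zeros(28, 28, numFilt);    % "valid" convolution: 32 - 5 + 1 = 28
for k = 1:numFilt
    filt = rand(5, 5, 3);            % the filter depth matches the input depth
    maps(:,:,k) = convn(img, filt, 'valid');   % each filter yields one 28x28 map
end
size(maps)                           % 28 28 8: a 3D "image" of depth 8
% A second layer would therefore use filters of size f x f x 8.

So every layer takes one 3D array and produces one 3D array; the dimensionality never grows beyond 3 (plus a batch dimension in most frameworks), only the depth changes.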
You do not say which machine learning toolkit you use, but there is one for Java called deeplearning4j. You can find more detailed information in its tutorial on CNNs.
I have two images of the same shoe sole, one taken with a scanning machine and another with a digital camera. I want to scale one of the images so that it can be easily aligned with the other without having to do it all by hand.
My thought was to use edge detection, connect all the points on the outside of the shoe, scale one image to fit right inside the other, and then scale the original image at the same rate.
I've messed around using different tools in the Image Processing Toolbox in MATLAB, but am making no progress.
Is there a better way to go about this?
My advice would be to first use the function activecontour to obtain the outer contour of the shoe in both images. Then use the function procrustes with the binary images as input.
[~, CameraFittedToScan] = procrustes(Scan,Camera);
This transforms the camera image to best fit the scanned image. If the scan and camera images are not the same size, this needs to be adjusted first using the function imresize.
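As a rough illustration, the three functions could be strung together like this (the file names, the initial contour and the iteration count are placeholders, not tested values; in practice you may get a better fit by giving procrustes corresponding boundary landmarks instead of the raw masks):

% Read both images and convert to grayscale if needed
scan   = im2gray(imread('scan.png'));
camera = im2gray(imread('camera.png'));

% Bring the camera image to the scan's size first
camera = imresize(camera, size(scan));

% Segment the sole in both images with active contours,
% starting from an initial contour slightly inside the border
init = false(size(scan));
init(10:end-10, 10:end-10) = true;
scanMask   = activecontour(scan,   init, 300);
cameraMask = activecontour(camera, init, 300);

% Procrustes analysis of the two binary shapes
[~, CameraFittedToScan] = procrustes(double(scanMask), double(cameraMask));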
The project is about measuring different objects under the Kinect 2. The image acquisition code sample from the SDK has been adapted so that it saves the depth information over the whole range, without limiting the values to 0-255.
The Kinect is mounted on a beam hoist and moved in a straight line over the model. At every stop, multiple images are taken, the mean calculated and the error correction applied. Afterwards the images are put together to have a big depth map instead of multiple depth images.
Because the image size is reduced (to limit the influence of the noisy edges), every image has a size of 350x300 pixels. For the moment, the test is done with three images that are to be put together. As in the final program, I know in which direction the images are taken. Due to the beam hoist there is no rotation, only translation.
In MATLAB the images are saved as matrices with depth values ranging from 0 to 8000. As I could only find ideas on how to treat ordinary images, the depth maps are converted into images with a colorbar. Then only the colored part is saved and fed into the stitching script, i.e. without the axes and the grey area around the image.
The stitching algorithm doesn't work. It seems to me that the grayscale images don't have enough contrast for the algorithms to work with. Maybe I am just searching in the wrong direction?
Any ideas on how to approach this challenge?
I've designed an algorithm that matches corresponding lines seen from different positions of a robot.
Now I want to merge corresponding lines into one.
Does anyone know an algorithm for this purpose?
It seems like what you're trying to do is a mosaic, but restricted to 2D. Or at least something similar, considering only the extracted features. I'll go through the basic idea of how to do it (as I remember it, anyway).
1. You extract useful features in both images (your lines).
2. You do feature matching (your matching).
3. You extract relative positional information about your cameras from the matched features. This allows you to determine a transform between the two.
4. You transform one image into the other's perspective, or both into a different perspective.
Since you say you're working in a 2D plane, that's where you will want to transform to. If your scans can be assumed not to add any 3D distortion (always taken from the same height, facing perpendicular to the plane), then you only need to deal with 2D transformations.
To do what you call the merging of the lines, you need to perform steps 3 and 4 of the mosaic algorithm.
For step 3 you will need to use a robust approach to calculate your 2D transformation (rotation and translation) from one picture/scan to the other. Probably something like least squares (or another approach for estimating parameters from multiple observations).
For step 4 you apply the calculated 2D transform, possibly composed with the transformation calculated for the previous picture (that is not needed if you're matching the composed image, a.k.a. the mosaic, against each new image instead of matching sequential images). In your case it is probably just the 2D lines from the new scan (and not a full image) that need to be transformed by this global 2D transform, to bring their position and orientation into the global map reference frame.
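To make steps 3 and 4 concrete, here is a small MATLAB sketch of one possible choice: a least-squares rigid fit (rotation and translation) between two corresponding 2D point sets, applied afterwards to a line. The point sets here are made up; in practice they would be, for example, the endpoints or midpoints of your matched lines, and a robust version would wrap this in RANSAC-style outlier rejection.

% Made-up corresponding points: P as seen in scan A, Q the same points in scan B
P = [0 0; 2 0; 2 1; 0 1]';                      % 2 x N
theta = deg2rad(20);  tTrue = [0.5; -0.3];
Rtrue = [cos(theta) -sin(theta); sin(theta) cos(theta)];
Q = Rtrue * P + tTrue;                          % ground truth, for the example only

% Closed-form least-squares rigid fit (Procrustes/Kabsch without scaling)
muP = mean(P, 2);   muQ = mean(Q, 2);
[U, ~, V] = svd((Q - muQ) * (P - muP)');
R = U * diag([1, sign(det(U * V'))]) * V';      % guard against a reflection
t = muQ - R * muP;

% Step 4: bring a line from scan A into the scan B / global map frame
lineEndpointsA = [0 0; 2 0]';                   % one line's endpoints, 2 x 2
lineEndpointsB = R * lineEndpointsA + t;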
Hope this helps. Good Luck!
I am trying to register two volumetric images of the brain (PET and CT, or even PET and MR). Each of these volumetric images contains a different number of 2D images (slices).
For example, the CT has 150 slices and the PET has 100 slices. I was thinking of using an interpolation method to reduce the number of CT slices to 100. Is this a correct approach? Does anyone know of any resources that could be helpful, like pseudocode or the steps I should go through to register two volumetric images?
Thank you :)
If you know the spacing information for the 150 CT slices and the 100 PET slices, you can look into MATLAB's interp1 function for interpolating along the slice axis to resample the volumes to the same number of slices. From there it might be possible to use MATLAB's imregister to perform the registration.
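As a rough sketch of that idea (all sizes, spacings and variable names below are made up; the real slice positions would come from the image headers):

% Stand-ins for the two volumes and the physical z-position of each slice (mm)
ctVol     = rand(256, 256, 150);
petVol    = rand(128, 128, 100);
ctSliceZ  = linspace(0, 300, 150);
petSliceZ = linspace(0, 300, 100);

% interp1 interpolates down the columns, so put the slice dimension first
ctTmp = permute(ctVol, [3 1 2]);                        % 150 x 256 x 256
ctTmp = reshape(ctTmp, size(ctTmp, 1), []);             % 150 x (256*256)
ctTmp = interp1(ctSliceZ, ctTmp, petSliceZ, 'linear');  % 100 x (256*256)
ctTmp = reshape(ctTmp, [numel(petSliceZ), size(ctVol, 1), size(ctVol, 2)]);
ctResampled = permute(ctTmp, [2 3 1]);                  % 256 x 256 x 100

% Multimodal settings, because PET and CT intensities are not comparable.
% For real data you would also pass imref3d objects so that the in-plane
% voxel spacing is taken into account.
[optimizer, metric] = imregconfig('multimodal');
petRegistered = imregister(petVol, ctResampled, 'rigid', optimizer, metric);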
If you are looking to learn how registration works under the hood (transforming between pixel and physical coordinates, transforming/resampling images, etc.), one resource I can direct you to is the ITK Software Guide pdf.
In particular, try reading Book 1 Section 4.1.4 (page 41 of the pdf) on image representation, and Book 2 Section 3.9 (page 532 of the pdf) on transforms.
In general, the problem of transforming and interpolating with 3D images in registration can be pretty cumbersome to write code for. You need to ask yourself about the spacing and orientation of pixels, how to transform and interpolate images so that their grids overlap, and you also need to decide what to do with pixels in your grid that lie outside the image boundary when evaluating the similarity metric.
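To give an idea of what that bookkeeping looks like in MATLAB, spatial referencing objects carry the voxel spacing and imwarp decides what happens outside the image boundary; all numbers below are made up:

% A toy CT volume with anisotropic voxels (x, y, z spacing in mm)
ct  = rand(256, 256, 100);
Rct = imref3d(size(ct), 0.98, 0.98, 1.5);

% A rigid transform expressed in physical (mm) coordinates: 5 mm / -2 mm shift
tform = affine3d([1 0 0 0; 0 1 0 0; 0 0 1 0; 5 -2 0 1]);

% Resample onto the fixed grid; voxels that fall outside the moving image
% get FillValues, and it is up to you whether the similarity metric counts them.
[moved, Rmoved] = imwarp(ct, Rct, tform, 'OutputView', Rct, 'FillValues', 0);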
While it's up to you to do what you think is best, I suggest you use existing registration programs if they are capable of doing what you want:
MATLAB's imregister (I have never used it, so I can't comment on it)
SimpleITK for Python
ITK for C++ has a learning curve but gives full control over the registration process
elastix is a command-line program that uses a text file of parameters to perform registration
3D Slicer has a graphical user interface for simple linear registration
Recently I have had to do a project on multi-view 3D scanning within two weeks, and I have searched through books, journals and websites on 3D reconstruction, including the MathWorks examples. I have written code to track matched points between two images and reconstruct them into a 3D plot. However, despite using the detectSURFFeatures() and extractFeatures() functions, some of the object points are still not tracked. How can I also reconstruct them in my 3D model?
What you are looking for is called "dense reconstruction". The best way to do this is with calibrated cameras. Then you can rectify the images, compute disparity for every pixel (in theory), and then get 3D world coordinates for every pixel. Please check out this Stereo Calibration and Scene Reconstruction example.
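For reference, the core of that pipeline looks roughly like this in MATLAB (the file names are placeholders, stereoParams is assumed to come from a prior stereo calibration, and the exact disparity function name varies between toolbox releases):

I1 = imread('left.png');                         % placeholder image files
I2 = imread('right.png');
load('stereoCalibration.mat', 'stereoParams');   % hypothetical saved calibration

% 1) Rectify so that corresponding points lie on the same image rows
[J1, J2] = rectifyStereoImages(I1, I2, stereoParams);

% 2) Dense disparity, ideally one value per pixel
disparityMap = disparitySGM(im2gray(J1), im2gray(J2));   % older releases: disparity()

% 3) Back-project to 3-D world coordinates, one XYZ triple per pixel
xyzPoints = reconstructScene(disparityMap, stereoParams);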
The tracking approach you are using is fine but will only get sparse correspondences. The idea is that you would use the best of these to try to determine the difference in camera orientation between the two images. You can then use the camera orientation to get better matches and ultimately to produce a dense match which you can use to produce a depth image.
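In MATLAB terms, that sparse stage could look roughly like this (file and variable names are placeholders, cameraParams is assumed to come from a single-camera calibration, and the pose-estimation function names differ a little between releases):

I1 = im2gray(imread('view1.png'));               % placeholder image files
I2 = im2gray(imread('view2.png'));
load('intrinsics.mat', 'cameraParams');          % hypothetical saved calibration

% Sparse features and matches (what you are already doing)
pts1 = detectSURFFeatures(I1);
pts2 = detectSURFFeatures(I2);
[f1, vpts1] = extractFeatures(I1, pts1);
[f2, vpts2] = extractFeatures(I2, pts2);
pairs = matchFeatures(f1, f2, 'Unique', true);
m1 = vpts1(pairs(:, 1));
m2 = vpts2(pairs(:, 2));

% Essential matrix with outlier rejection, then the relative camera pose
[E, inliers] = estimateEssentialMatrix(m1, m2, cameraParams);
[relOrientation, relLocation] = relativeCameraPose(E, cameraParams, ...
    m1(inliers), m2(inliers));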
Tracking every point in an image from frame to frame is hard (it's called scene flow) and you won't achieve it by identifying individual features (such as SURF, ORB, FREAK, SIFT, etc.) because these features are by definition 'special' in that they can be clearly identified between images.
If you have access to the Computer Vision Toolbox of Matlab you could use their matching functions.
You can start, for example, by checking out this article about disparity and the related MATLAB functions.
In addition you can read about different matching techniques such as block matching, semi-global block matching and global optimization procedures, just to name a few keywords. But be aware that the topic of stereo matching is a huge one.
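For example, on a rectified pair the block-matching and semi-global variants are available directly; the calibration file, image files and parameter values below are only illustrative guesses:

load('stereoCalibration.mat', 'stereoParams');   % hypothetical saved calibration
[J1, J2] = rectifyStereoImages(imread('left.png'), imread('right.png'), stereoParams);

dBM  = disparityBM(im2gray(J1), im2gray(J2), ...
                   'DisparityRange', [0 64], 'BlockSize', 15);    % block matching
dSGM = disparitySGM(im2gray(J1), im2gray(J2), ...
                    'DisparityRange', [0 64]);                    % semi-global matching
imshowpair(dBM, dSGM, 'montage');                % compare the two disparity maps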