I'm trying to detect a transparent object (glass bottle) in an image.
The image is taken from the Kinect so there's rgb and depth images available.
I read from a literature that the boundary of an transparent object have 'unknown depth values' and I can use that as a boundary condition for detecting the object.
The problem is I cannot find that information from my depth file ie. the depth of the image only returns either zero or other values but never 'unknown'
I assume kinect represent 'unknown depth values' as zeros but this raises another problem:
there's a lot of zeros in the image ( ie. boundary etc) how do I know which zero is for the object?
Thanks alot!!
You could try to detect the body of the transparent object rather than the border. The body should return values of whatever is behind it, but those values will be noisier. Take a time-running sample and calculate a running standard deviation. Look for the region of the image that has larger errors than elsewhere. This is simpler if you have access to the raw data (libfreenect). If the data is converted to distance, then the error is a function of distance, so you need to detect regions that are noisier than other regions at that distance, not just regions that are noisier than elsewhere.
I'd recommend you take a look at the following publication:
They were able to detect objects (such as water bottles and glasses). all undertaken in matlab.
Object localisation via action recognition.
J. Darby, B. Li, R. Cunningham and N. Costen.
ICPR, 2012.
Related
I am stuck at a current project:
I have an input picture showing the ground with some shapes on it. I have to find a specific shape with a given template.
I have to use distance transformation into skeletonization. My question now is: How can I compare two skeletons? As far as I noticed and have been told, the most methods from the Image Processing Toolbox to match templates don't work, since they are not scale-invariant and rotation invariant.
Also some skeletons are really showing the shapes, others are just one or two short lines, with which I couldn't identify the shapes, if I didn't know what they should be.
I've used edge detection, and region growing on the input so there are only interessting shapes left.
On the template I used distance transformation and skeletonization.
Really looking forward to some tips.
Greetings :)
You could look into convolutions?
Basically move your template over your image and see if there is a match, and where.
The max value of your array [x,y] is the location of your object in the image.
Matlab has a built-in 2D convolution function for this
i need some help with a corner detection.
I printed a checkerboard and created an image of this checkerboard with a webcam. The problem is that the webcam has a low resolution, therefore it do not find all corners. So i enhanced the number of searched corner. Now it finds all corner but different one for the same Corner.
All Points are stored in a matrix therefore i don't know which element depends to which point.
(I can not use the checkerboard function because the fuction is not available in my Matlab Version)
I am currently using the matlab function corner.
My Question:
Is it possible to search the Extrema of all the point clouds to get 1 Point for each Corner? Or has sb an idea what i could do ? --> Please see the attached photo
Thanks for your help!
Looking at the image my guess is that the false positives of the corner detection are caused by compression artifacts introduced by the lossy compression algorithm used by your webcam's image acquisition software. You can clearly spot ringing artifacts around the edges of the checkerboard fields.
You could try two different things:
Check in your webcam's acquisition software whether you can disable the compression or change to a lossless compression
Working with the image you already have, you could try to alleviate the impact of the compression by binarising the image using a simple thresholding operation (which in the case of a checkerboard would not even mean loosing information since the image is intrinsically binary).
In case you want to go for option 2) I would suggest to do the following steps. Let's assume the variable storing your image is called img
look at the distribution of grey values using e.g. the imhist function like so: imhist(img)
Ideally you would see a clean bimodal distribution with no overlap. Choose an intensity value I in the middle of the two peaks
Then simply binarize by assigning img(img<I) = 0; img(img>I) = 255 (assuming img is of type uint8).
Then run the corner algorithm again and see if the outliers have disappeared
Which Matlab functions or examples should be used to (1) track distance from moving object to stereo (binocular) cameras, and (2) track centroid (X,Y,Z) of moving objects, ideally in the range of 0.6m to 6m. from cameras?
I've used the Matlab example that uses the PeopleDetector function, but this becomes inaccurate when a person is within 2m. because it begins clipping heads and legs.
The first thing that you need deal with, is in how detect the object of interest (I suppose you have resolved this issue). There are a lot of approaches of how to detect moving objects. If your cameras will stand in a fix position you can work only with one camera and use some background subtraction to get the objects that appear in the scene (Some info here). If your cameras are are moving, I think the best approach is to work with optical flow of the two cameras (instead to use a previous frame to get the flow map, the stereo pair images are used to get the optical flow map in each fame).
In MatLab, there is an option called disparity computation, this could help you to try to detect the objects in scene, after this you need to add a stage to extract the objects of your interest, you can use some thresholds. Once you have the desired objects, you need to put them in a binary mask. In this mask you can use some image momentum (Check this and this) extractor to calculate the centroids. If the images in the binary mask look noissy you can use some morphological operations to improve the reults (watch this).
I am developing an app which uses LK for tracking and POSIT for estimation. I am successful in getting rotation matrix, projection matrix and able to track perfectly but the problem for me is I am not able to translate 3D object properly. The object is not fitting in to the right place where it has to fit.
Will some one help me regarding this?
Check this links, they may provide you some ideas.
http://computer-vision-talks.com/2011/11/pose-estimation-problem/
http://www.morethantechnical.com/2010/11/10/20-lines-ar-in-opencv-wcode/
Now, you must also check whether the intrinsic camera parameters are correct. Even a small error in estimating the field of view can cause troubles when trying to reconstruct 3D space. And from your details, it seems that the problem are bad fov angles (field of view).
You can try to measure them, or feed the half or double value to your algorithm.
There are two conventions for fov: half-angle (from image center to top or left, or from bottom to top, respectively from left to right) Maybe you just mixed them up, using full-angle instead of half, or vice-versa
Maybe you can show us how you build a transformation matrix from R and T components?
Remember, that cv::solvePnP function returns inverse transformation (e.g camera in world) - it finds object pose in 3D space where camera is in (0;0;0). For almost all cases you need inverse it to get correct result: {Rt; -T}
I recall seeing a paper a while back for an algorithm that could automatically and seamlessly "graft" texture from parts of an image onto another part of an image.
The approach was something along the lines of the following:
You'd build up a databases of small squares of pixels (perhaps 8X8) from the parts of the picture that are present.
You'd then pick an empty pixel (the "destination" for the texture graft) to fill in, and look for one of the squares in your database that most closely matches the surrounding pixels. You'd then color the empty pixel according to the color of the corresponding pixel in the square you find. Then you pick another empty pixel and repeat until there are no empty pixels remaining.
Of course, this is only a vague description because I can't find any references to this algorithm to refresh my memory of the details! Can anyone help?
Sounds a lot like Texture Synthesis by Non-parametric Sampling