In my understanding (a very simplistic view), the inter prediction (motion estimation / compensation) of H264 standard first finds the best match block on the reference frame, and then it's encoded with motion vector (the effective new X and Y) and the residual (prediction vs reality).
But how does the decoder knows how to fill the old space where the predicted block was before? I'm supposing that the residual is calculated from its new position, on a block level, not a frame level.
Let's say the encoder decided to use inter prediction to the encode the following two images, it calculates where does the ball should be (its new position and the residual energy) but how does it fill the old space?
Motion compensation is just optimization of frame encoding. If we talk about motion vectors, this is "block motion compensation", as defined in Wikipedia:
Block motion compensation divides up the current frame into non-overlapping blocks, and the motion compensation vector tells where those blocks come from (a common misconception is that the previous frame is divided up into non-overlapping blocks, and the motion compensation vectors tell where those blocks move to). The source blocks typically overlap in the source frame. Some video compression algorithms assemble the current frame out of pieces of several different previously-transmitted frames.
So, motion vector is not "move the block from old frame", it is (intra-) encoding of current frame's macroblocks, when some blocks are copied from older frame with some small shift (and some blocks can be copied several times from previous frame into current; and most block are copied with zero motion vector). In theory we can encode new frame by its macroblocks, but with help of motion compensation we have change to get lot of image information from previous frames and encode less. Parts of images which are not compensated are encoded with image macroblocks.
Example from other Wikipedia page from "Big Buck Bunny" free film:
Frame with motion vectors displayed, dots are zero vectors
Residual image after motion compensation to be encoded
There is good description of the encoding/decoding processes in H.264 but in russian: http://www.compression.ru/dv/course/compr_h264.pdf (from http://www.compression.ru/video/ site)
And description of motion compensation in English: http://inst.eecs.berkeley.edu/~ee290t/sp04/lectures/02-Motion_Compensation_girod.pdf (prediction error is encoded, and completely new image will be is almost fully mispredicted. Background image has low frequency of information and most probably will be motion compensated by some of nearby background block.)
Related
Which Matlab functions or examples should be used to (1) track distance from moving object to stereo (binocular) cameras, and (2) track centroid (X,Y,Z) of moving objects, ideally in the range of 0.6m to 6m. from cameras?
I've used the Matlab example that uses the PeopleDetector function, but this becomes inaccurate when a person is within 2m. because it begins clipping heads and legs.
The first thing that you need deal with, is in how detect the object of interest (I suppose you have resolved this issue). There are a lot of approaches of how to detect moving objects. If your cameras will stand in a fix position you can work only with one camera and use some background subtraction to get the objects that appear in the scene (Some info here). If your cameras are are moving, I think the best approach is to work with optical flow of the two cameras (instead to use a previous frame to get the flow map, the stereo pair images are used to get the optical flow map in each fame).
In MatLab, there is an option called disparity computation, this could help you to try to detect the objects in scene, after this you need to add a stage to extract the objects of your interest, you can use some thresholds. Once you have the desired objects, you need to put them in a binary mask. In this mask you can use some image momentum (Check this and this) extractor to calculate the centroids. If the images in the binary mask look noissy you can use some morphological operations to improve the reults (watch this).
I'm trying to fit two data sets. Those contain the results of measuring the same object with two different measurement devices (x-ray vs. µct).
I did manage to reconstruct the image data and fit the orientation and offset of the stacks. It looks like this (one image from a stack of about 500 images):
The whole point of this is to compare several denoising algorithms on the x-ray data (left). It is assumed that the data from µCT (right) is close to the real signal without any noise. So, I want to compare the denoised x-ray data from each of the algorithms to the "pure" signal from µCT to see which algorithm produces the lowest RMS-error. Therefore, I need to somehow fit the grayvalues from the left part to those of the right part without manipulating the noise too much.
The gray values in the right are in the range of 0 to 100 whereas the x-ray data ranges from about 4000 to 30000. The "bubbles" are in a range of about 8000 to 11000. (those are not real bubbles but an artificial phantom with holes out of a 3D printer)
What I tried to do is (kind of) band pass those bubbles and map them to ~100 while shifting everything else towards 4 (which is the value for the background on the µCT data).
That's the code for this:
zwst = zwsr;
zwsr(zwst<=8000)=round(zwst(zwst<=6500)*4/8000);
zwsr(zwst<=11000 & zwst>8000)= round(zwst(zwst<=11000 & zwst>8000)/9500*100);
zwsr(zwst>11000)=round(zwst(zwst>11000)*4/30000);
The results look like this:
Some of those bubbles look distorted and the noise part in the background is gone completely. Is there any better way to fit those gray values while maintaining the noisy part?
EDIT: To clarify things: The µCT data is assumed to be noise free while the x-ray data is assumed to be noisy. In other words, µCT = signal while x-ray = signal + noise. To quantize the quality of my denoising methods, I want to calculate x-ray - µCT = noise.
Too long for a comment, and I believe a reasonable answer:
There is a huge subfield of image processing/ signal processing called image fusion. There is even a specific Matlab library for that using wavelets (http://uk.mathworks.com/help/wavelet/gs/image-fusion.html).
The idea behind image fusion is: given 2 images of the same thing but with very different resolution/data, how can we create a single image containing the information of both?
Stitching both images "by hand" does not give very good result generally so there are a big amount of techniques to do it mathematically. Waveletes are very common here.
Generally this techniques are widely used in medical imaging , as (like in your case) different imaging techniques give different information, and doctors want all of them together:
Example (top row: images pasted together, bottom row: image fusion techniques)
Have a look to some papers, some matlab tutorials, and probably you'll get there with the easy-to-use matlab code, without any fancy state of the art programming.
Good luck!
I am using an optical microscope and camera to record some videos which are post-processed in MATLAB.
Real time acquisition and pixel statistics would be extremely helpful, because some of what I am looking at absorbs very little light (I am using transmission mode). An example is that a blank (background) sample would give me an an average pixel value across a 512x512 ccd array of something like 144 (grayscale). An actual sample might have an average value of 140 or so. This subtle shift in pixel intensity would be useful in helping me focus the microscope.
Unfortunately, my camera setup is not supported by MATLAB, so I cannot use the image acquisition toolbox for real time. So I was wondering, is there a way that I could 'fake' real time image acquisition by selecting say a rectangle of my current desktop (the rectangle that is the video output of the microscopes camera), for matlab to record in real time?
Thanks
Does any body know how to find average luminosity for a texture in a fragment shader? I have access to both RGB and YUV textures the Y component in YUV is an array and I want to get an average number from this array.
I recently had to do this myself for input images and video frames that I had as OpenGL ES textures. I didn't go with generating mipmaps for these due to the fact that I was working with non-power-of-two textures, and you can't generate mipmaps for NPOT textures in OpenGL ES 2.0 on iOS.
Instead, I did a multistage reduction similar to mipmap generation, but with some slight tweaks. Each step down reduced the size of the image by a factor of four in both width and height, rather than the normal factor of two used for mipmaps. I did this by sampling from four texture locations that were in the middle of the four squares of four pixels each that made up a 4x4 area in the higher-level image. This takes advantage of hardware texture interpolation to average the four sets of four pixels, then I just had to average those four pixels to yield a 16X reduction in pixels in a single step.
I converted the image to luminance at the very first stage using a dot product of the RGB values with a vec3 of (0.2125, 0.7154, 0.0721). This allowed me to just read the red channel for each subsequent reduction stage, which really helps on iOS hardware. Note that you don't need this if you are starting with a Y channel luminance texture already, but I was dealing with RGB images.
Once the image had been reduced to a sufficiently small size, I read the pixels from that back onto the CPU and did a last quick iteration over the remaining few to arrive at the final luminosity value.
For a 640x480 video frame, this process yields a luminosity value in ~6 ms on an iPhone 4, and I think I can squeeze out a 1-2 ms reduction in that processing time with a little tuning. In my experience, that seems faster than the iOS devices normally generate mipmaps for power-of-two images at around that size, but I don't have solid numbers to back that up.
If you wish to see this in action, check out the code for the GPUImageLuminosity class in my open source GPUImage framework (and the GPUImageAverageColor superclass). The FilterShowcase example demonstrates this luminosity extractor in action.
You generally don't do this just with a shader.
One of the more common methods is to create a buffer texture with full mip-maps (down to 1x1, this is important). When you want to find luminosity, you copy the backbuffer to this buffer, then regenerate mips with a nearest neighbor algorithm. The bottom pixel will then have the average color of the entire surface and can be used to find average lum through something like (c.r * 0.6) + (c.g * 0.3) + (c.b * 0.1) (edit: if you have a YUV, then do similar and use the Y; the trick is just averaging the texture down to a single value, which is what mips do).
This isn't a precise technique, but is reasonably fast, especially on hardware that can generate mipmaps internally.
I'm presenting a solution for the RGB texture here as I'm not sure mip map generation would work with a YUV texture.
The first step is to create mipmaps for the texture, if not already present:
glGenerateMipmapOES(GL_TEXTURE_2D);
Now we can access the RGB value of the smallest mipmap level from the fragment shader by using the optional third argument of the sampler function texture2D, the "bias":
vec4 color = texture2D(sampler, vec2(0.5, 0.5), 8.0);
This will shift the mipmap level up eight levels, resulting in sampling a far smaller level.
If you have a 256x256 texture and render it with a scale of 1, a bias of 8.0 will effectively reduce the picked mipmap to the smallest 1x1 level (256 / 2^8 == 1). Of course you have to adjust the bias for your conditions to sample the smallest level.
OK, now we have the average RGB value of the whole image. The third step is to reduce RGB to a luminosity:
float lum = dot(vec3(0.30, 0.59, 0.11), color.xyz);
The dot product is just a fancy (and fast) way of calculating a weighted sum.
I was currently doing a project on recognizing the vehicle license plate at the rear side, i have done the OCR as the preliminary step, but i have no idea on how to detect the rectangle shaped(which is the concerned area of the car) license plate, i have read lot of papers but in no where i found a useful information about recognizing the rectangle shaped area of the license plate. I am doing my project using matlab. Please anyone help me with this ...
Many Thanks
As you alluded to, there are at least two distinct phases:
Locating the number plate in the image
Recognising the license number from the image
Since number plates do not embed any location marks (as found in QR codes for example), the complexity of recognising the number plate within the image is reduced by limiting range of transformation on the incoming image.
The success of many ANPR systems relies on the accuracy of the position and timing of the capturing equipment to obtain an image which places the number plate within a predictable range of distortion.
Once the image is captured the location phase can be handled by using a statistical analysis to locate a "number plate" shaped region within the image, i.e. one which is of the correct proportions for the perspective. This article describes one such approach.
This paper and another one describe using Sobel edge detector to locate vertical edges in the number plate. The reasoning is that the letters form more vertical lines compared to the background.
Another paper compares the effectiveness of some techniques (including Sobel detection and Haar wavelets) and may be a good starting point.
I had done my project on 'OCR based Vehicle Identification'
In general, LPR consist of three main phases: License Plate Extraction from the captured image, image segmentation to extract individual characters and character recognition. All the above phases of License Plate Detection are most challenging as it is highly sensitive towards weather condition, lighting condition and license plate placement and other artefact like frame, symbols or logo which is placed on licence plate picture, In India the license number is written either in one row or in two rows.
For LPR system speed and accuracy both are very important factors. In some of the literatures accuracy level is good but speed of the system is less. Like fuzzy logic and neural network approach the accuracy level is good but they are very time consuming and complex. In our work we have maintained a balance between time complexity and accuracy. We have used edge detection method and vertical and horizontal processing for number plate localization. The edge detection is done with ‘Roberts’ operator. The connected component analysis (CCA) with some proper thresholding is used for segmentation. For character recognition we have used template matching by correlation function and to enhance the level of matching we have used enhanced database.
My Approach for Project
Input image from webcam/camera.
Convert image into binary.
Detect number plate area.
Segmentation.
Number identification.
Display on GUI.
My Approach for Number Plate Extraction
Take input from webcam/camera.
Convert it into gray-scale image.
Calculate threshold value.
Edge detection using Roberts’s operator.
Calculate horizontal projection.
Crop image horizontally by comparing with 1.3 times of threshold value.
Calculate vertical projection.
Crop image vertically.
My Approach for Segmentation
Convert extracted image into binary image.
Find in-compliment image of extracted binary image.
Remove connected component whose pixel value is less than 2% of area.
Calculate number of connected component.
For each connected component find the row and column value
Calculate dynamic thresholding (DM).
Remove unwanted characters from segmented characters by applying certain conditions
Store segmented characters coordinates.
My Approach for Recognition
Initialize templates.
For each segmented character repeat step 2 to 7
Convert segmented characters to data base image size i.e. 24x42.
Find correlation coefficient value of segmented character with each data base image and store that value in array.
Find out the index position of maximum value in the array.
Find the letter which is link by that index value
Store that letter in a array.
Check out OpenALPR (http://www.openalpr.com). It recognizes plate regions using OpenCV and the LBP/Haar algorithm. This allows it to recognize both light on dark and dark on light plate regions. After it recognizes the general region, it uses OpenCV to localize based on strong lines/edges in the image.
It's written in C++, so hopefully you can use it. If not, at least it's a reference.