How can I change the exposure of an sRGB image? - post-processing

I wish to "normalize" the exposure of a set of images before doing further processing. I tried the following:
1) convert sRGB to CIE_XYZ per Wikipedia page on sRGB;
2) multiply or divide "Y" by 2 to achieve a 1 stop EV change;
3) convert CIE_XYZ back to sRGB.
The problem is that step 3 frequently yields negative values (they arise after matrix multiplication to convert back to linear rgb).
In particular, my test set of sRGB values has the form (n,n,n) where 0<=n<=255.
I would expect these to be near the center of the gamut, and that a 1 stop change would not push me out of the gamut.
What is wrong with this approach??

I believe that user:1146345's comment is the most accurate, in that it refers to the non-linearity introduced by the raw->rgb conversion. So, for example, conversion from sRGB -> linear RGB -> multiply by 2^(delta stops) -> sRGB will not work well near the ends of the curve. But we don't know how to characterize this non-linearity, since it most likely varies by camera.

There Are Good Linear Image Apps
Using an application such as Adobe After Effects, converting to linear is trivial, and most of the tools you need remain available. Unfortunately, Photoshop's implementation of 32-bit float linear is less functional.
Nevertheless, once you are in 32-bit floating-point linear space (gamma 1.0), all the linear math you do behaves like light in the real world. In the film/TV industry we work in linear most of the time, if not in After Effects, then in Nuke or Fusion, etc.
Human perception is NOT linear however — so while linear math on linearized image data will behave the way light does, it won't be relative to perception. If you want to use linear math to affect perception in a linear way, then you need to be in a perceptually uniform colorspace such as CIELAB.
Let's assume, though, that you want to change photometric "exposure": then you want to affect light values as they would be affected in the real world, and so you need to linearize your image data. AE has tools to help here - you'd first set your project to 32-bit floating point, and then select an appropriate profile and "linearize". Make sure you turn ON display color management.
When you import an image, use the appropriate profile to "unwind" it into linear space.
If you are not using AE but MATLAB or Octave, then invert the sRGB transfer curve (aka gamma) to unwind the image into linear space.
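A minimal MATLAB/Octave sketch of that round trip, assuming an 8-bit sRGB input; the function names are mine, not from any toolbox (save as adjust_exposure_srgb.m):
% Adjust exposure by a given number of stops in linear light.
function out = adjust_exposure_srgb(in_srgb, stops)
  x = im2double(in_srgb);              % scale to [0,1]
  lin = srgb_to_linear(x);             % undo the sRGB transfer curve
  lin = lin .* 2^stops;                % exposure change behaves like light here
  lin = min(max(lin, 0), 1);           % clip to gamut before re-encoding
  out = im2uint8(linear_to_srgb(lin));
end
function y = srgb_to_linear(x)
  y = x / 12.92;
  m = x > 0.04045;
  y(m) = ((x(m) + 0.055) / 1.055) .^ 2.4;
end
function y = linear_to_srgb(x)
  y = 12.92 * x;
  m = x > 0.0031308;
  y(m) = 1.055 * x(m) .^ (1/2.4) - 0.055;
end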
S Curves etc.
I see some of the comments regarding cameras/debayering algorithms adding S curves aka "soft clip" at the high or low ends. Going to CIEXYZ is not going to help this, and only adds unneeded matrix math.
Stay in sRGB
You will typically be fine just linearizing the sRGB and staying in linear RGB for your various manipulations. If you are scaling luminance by 2, then you are probably going to want to adjust the high clip anyway; any soft clip is just going to be scaled along with the rest of the image, and that is really not an issue. As long as you are in 32-bit floating point you won't have any significant quantization errors, and you can adjust the S curves after exposure.
If you want, you can use "Curves" to adjust/expand the high end. AE also has a built-in raw importer, so you can import directly from RAW and set it to not compress highlights.
If you don't have access to the RAW and only the JPG, then again, it should be fine so long as you are in linear 32 bit. After all your manipulations, just re-apply the gamma curve, and the original S curves will remain intact relative to the image highlight, which is usually what you want.
Plug Ins and Gamma
Note that AE and PS and others do have "exposure" plug-ins that can effect this change.
BUT ALSO:
Keep in mind that if you want to emulate real film, each of the color records has a different gamma, and in film they interact more than the digital values in sRGB, which essentially remain separate.
If you are trying to emulate a film look, try using the LEVELS plugin and playing with the gamma/hi/lo of each color channel separately, or do the same using CURVES.
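If you are scripting that rather than using the plug-in, a per-channel levels pass might look like the sketch below; the function name and the lo/hi/gamma values are purely illustrative:
% Illustrative per-channel "levels" adjustment (lo / hi / gamma per channel).
% img is a double RGB image in [0,1]; lo, hi, gam are 1x3 vectors.
function out = levels_per_channel(img, lo, hi, gam)
  out = zeros(size(img));
  for c = 1:3
    x = (img(:,:,c) - lo(c)) / (hi(c) - lo(c));   % map [lo,hi] to [0,1]
    x = min(max(x, 0), 1);                        % clip
    out(:,:,c) = x .^ (1 / gam(c));               % per-channel gamma
  end
end
% e.g. out = levels_per_channel(img, [0 0 0.02], [1 0.98 0.95], [1.0 1.1 0.9]);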

Related

How do video encoding standards (like H.264) then serialize motion prediction?

Motion prediction brute-force algorithms, in a nutshell, work like this (if I'm not mistaken):
Search every possible macroblock in the search window
Compare each of them with the reference macroblock
Take the one that is the most similar and encode the DIFFERENCE between the frames instead of the actual frame.
Now this in theory makes sense to me. But when it gets to the actual serializing I'm lost. We've found the most similar block. We know where it is, and from that we can calculate the distance vector of it. Let's say it's about 64 pixels to the right.
Basically, when serializing this block, we do:
Ignore everything but luminosity (encode only Y, I think I saw this somewhere?), take note of the difference between it and the reference block
Encode the motion, a distance vector
Encode the MSE, so we can reconstruct it
Is the output of this a simple 2D array of luminosity values, with an appended/prepended MSE value and distance vector? Where is the compression in this? Is it just that we took out the UV component? There seem to be many resources that cover video encoders at a surface level, but it's very hard to find actual in-depth explanations of modern video encoders. Feel free to correct me on my above statements.
Grossly oversimplified:
Encoders include built-in decoder functionality. That generates a reference frame for the encoder to use. It's the same frame, inaccuracies and all, that comes out of the decoder at the far end for display to the viewer.
Motion estimation, which can be absent, simple, or complex, generates a motion vector for each 4x4 or 16x16 macroblock, by comparing the reference frame to the input frame.
The decoders (both the built-in one and the one at the far end) apply them to their current decoded image.
Then the encoder generates the pixel-by-pixel differences between the input image and the decoded image, compresses them, and sends them to the decoder. H.264 first uses lossy integer transforms (a form of discrete cosine transform) on the luma and chroma channels. Then it applies lossless entropy coding to the output of the integer transforms. ("zip" and "gzip" are examples of lossless entropy coding, but not the codings used in H.264.)
The point of motion estimation is to reduce the differences between the input image and the reference image before encoding those differences.
(This is for P frames. It's more complex for B frames.)
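To make that concrete, here is a rough sketch (mine, not taken from the standard) of the per-macroblock step for a P frame in MATLAB/Octave notation: motion-compensate a 16x16 block out of the decoded reference frame, then form the residual that actually gets transformed and entropy coded. The filenames, block position, and vector are placeholder values, and the frames are assumed to be grayscale and the same size.
% Motion-compensated prediction and residual for one 16x16 macroblock (sketch).
ref = im2double(imread('decoded_previous_frame.png'));   % what the decoder also has
cur = im2double(imread('current_input_frame.png'));      % frame being encoded
r = 1; c = 1;                % top-left corner of the macroblock in the current frame
mv = [0, 64];                % motion vector: best match is 64 pixels to the right
pred = ref(r + mv(1) : r + mv(1) + 15, c + mv(2) : c + mv(2) + 15);
block = cur(r : r + 15, c : c + 15);
residual = block - pred;     % only this difference (plus mv) is transformed and coded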
Dog-simple motion estimation could compute a single overall vector and apply it to all macroblocks in the image. That would be useful for applications where the primary source of motion is slowly panning and tilting the camera.
More complex motion estimation can be optimized for one or more talking heads. Another way to handle it would be to detect multiple arbitrary objects and track each object's movement from frame to frame.
And, if an encoder cannot generate motion vectors at all, everything works the same on the decoder.
The complexity of motion estimation is a feature of the encoder. The more compute cycles it can use to search for motion, the fewer image differences there will be from frame to frame, and so the fewer image-difference bits need to be sent to the far end for the same image-difference quantization level. So, the viewer gets better picture quality for the same number of bits per second, or alternatively the same picture quality for fewer bits per second.
Motion estimation can analyze the luma only, or the luma and chroma. The motion vectors are applied to the luma and both chroma channels in all cases.

DWT: What is it and when and where we use it

I was reading up on the DWT for the first time and the document stated that it is used to represent time-frequency data of a signal which other transforms do not provide.
But when I look for a usage example of the DWT in MATLAB I see the following code:
X=imread('cameraman.tif');
X=im2double(X);
[F1,F2]= wfilters('db1', 'd');
[LL,LH,HL,HH] = dwt2(X,'db1','d');
I am unable to understand the implementation of dwt2, or rather what it is and when and where we use it. What does dwt2 actually return and what does the above code do?
The first two statements simply read in the image and convert it, through im2double, so that the dynamic range of each channel lies within [0,1].
Now, the third statement, wfilters constructs the wavelet filter banks for you. These filter banks are what are used in the DWT. The method of the DWT is the same, but you can use different kinds of filters to achieve specific results.
Basically, with wfilters, you get to choose what kind of filter you want (in your case, you chose db1: Daubechies), and you can optionally specify the type of filter that you want. Different filters provide different results and have different characteristics. There are a lot of different wavelet filter banks you could use and I'm not quite the expert as to the advantages and disadvantages for each filter bank that exists. Traditionally, Daubechies-type filters are used so stick with those if you don't know which ones to use.
Not specifying the type will output both the decomposition and the reconstruction filters. Decomposition is the forward transformation where you are given the original image / 2D data and want to transform it using the DWT. Reconstruction is the reverse transformation where you are given the transform data and want to recreate the original data.
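For reference, the two wfilters call forms look like this (the first returns both filter pairs, the second only the decomposition pair):
[LoD, HiD, LoR, HiR] = wfilters('db1');       % decomposition and reconstruction filters
[F1, F2] = wfilters('db1', 'd');              % decomposition filters only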
The fourth statement, dwt2, computes the 2D DWT for you, but we will get into that later.
You specified the flag d, so you want only the decomposition filters. You can use wfilters as input into the 2D DWT if you wish, as this will specify the low-pass and high-pass filters that you want to use when decomposing your image. You don't have to do it like this. You can simply specify what filter you want to use, which is how you're calling the function in your code. In other words, you can do this:
[F1,F2]= wfilters('db1', 'd');
[LL,LH,HL,HH] = dwt2(X,F1,F2);
... or you can just do this:
[LL,LH,HL,HH] = dwt2(X,'db1','d');
The above statements are the same thing. Note that there is a 'd' flag on the dwt2 function because you want the forward transform as well.
Now, dwt2 is the 2D DWT (Discrete Wavelet Transform). I won't go into the DWT in detail here because this isn't the place to talk about it, but I would definitely check out this link for better details. They also have fully working MATLAB code and their own implementation of the 2D DWT so you can fully understand what exactly the DWT is and how it's computed.
However, the basic idea behind the 2D DWT is that it is a multi-resolution transform. It analyzes your signal and decomposes it into multiple scales / sizes and features. Each scale / size has a bunch of features that describe something about the signal that was not seen in the other scales.
One thing about the DWT is that it naturally subsamples your image by a factor of 2 (i.e. halves each dimension) after the analysis is done - hence the multi-resolution bit I was talking about. For MATLAB, dwt2 outputs four subbands, which correspond to the four output variables in your code:
LL - Low-Low. This means that the vertical direction of your 2D image / signal is low-pass filtered as well as the horizontal direction.
LH - Low-High. This means that the vertical direction of your 2D image / signal is low-pass filtered while the horizontal direction is high-pass filtered.
HL - High-Low. This means that the vertical direction of your 2D image / signal is high-pass filtered while the horizontal direction is low-pass filtered.
HH - High-High. This means that the vertical direction of your 2D image / signal is high-pass filtered as well as the horizontal direction.
Roughly speaking, LL corresponds to the structural / predominant information of your image, while HH corresponds to the edges. The LH and HL components I'm not too familiar with, but they're sometimes used in feature analysis. dwt2 only performs a single-level DWT decomposition, so if you want to decompose further, you would apply the DWT again on the LL component only; whether the other components are useful depends on your analysis and what you want to use them for.
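To make the "apply the DWT again on the LL" point concrete, a two-level decomposition with your db1 filters would look something like this:
[F1, F2] = wfilters('db1', 'd');            % decomposition filter pair
[LL1, LH1, HL1, HH1] = dwt2(X, F1, F2);     % level 1: half-size subbands
[LL2, LH2, HL2, HH2] = dwt2(LL1, F1, F2);   % level 2: re-run on the low-low band only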
Applications
Now, for your specific question of applications. The DWT for images is mostly used in image compression and image analysis. One application of the 2D DWT is in JPEG 2000. The core of the algorithm is that it breaks down the image into the DWT components, then constructs trees of the coefficients generated by the DWT to determine which components can be omitted before you save the image. This way, you eliminate extraneous information, but there is also the great benefit that the DWT can be lossless. I don't know which filter(s) is/are being used in JPEG 2000, but I know for certain that the standard supports lossless coding. This means that you will be able to reconstruct the original data without any artifacts or quantization errors. JPEG 2000 also has a lossy option, where you can reduce the file size even more by eliminating more of the DWT coefficients in a way that is imperceptible to the average user.
Another application is in watermarking images. You can embed information in the wavelet coefficients so that it prevents people from trying to steal your images without acknowledgement. The DWT is also heavily used in medical image analysis and compression, as the images generated in this domain are quite high resolution and quite large. It is extremely useful to be able to represent the images just as faithfully while occupying less physical space than the standard image compression algorithms manage (algorithms that are also lossy if you want high compression ratios).
One more application I can think of would be the dynamic delivery of video content over networks. Depending on what your connection speed is or the resolution of your screen, you get a lower or higher quality video. If you specifically use the LL component of each frame, you would stream / use a particular version of the LL component depending on what device / connection you have. So if you had a bad connection or if your screen has a low resolution, you would most likely show the video with the smallest size. You would then keep increasing the resolution depending on the connection speed and/or the size of your screen.
This is just a taste of what the DWT is used for (personally, I don't use it because the DWT is applied in domains I don't have any experience in), but there are a lot more applications where it is quite useful.

Hardware accelerated image comparison/search?

I need to find the position of a smaller image inside a bigger image. The smaller image is a subset of the bigger image. The requirement is also that pixel values can slightly differ for example if images were produced by different JPEG compressions.
I've implemented the solution by comparing bytes using the CPU but I'm now looking into any possibility to speed up the process.
Could I somehow utilize OpenGLES and thus iPhone GPU for it?
Note: images are grayscale.
@Ivan, this is a pretty standard problem in video compression (finding the position of the current macroblock in the previous frame). You can use a metric for the difference in pixels such as the sum of absolute differences (SAD), the sum of squared differences (SSD), or the sum of Hadamard-transformed differences (SATD). I assume you are not trying to compress video but rather looking for something like a watermark.
In many cases, you can use a gradient-descent type search to find a local minimum (best match), on the empirical observation that comparing an image (your small image) to a slightly offset version of itself (a match whose position hasn't been found exactly) produces a closer metric than comparing to a random part of another image. So you can start by sampling the space of all possible offsets/positions (motion vectors in video encoding) rather coarsely, and then do local optimization around the best result. The local optimization works by comparing a match to some number of neighboring matches, and moving to the best of those if any is better than your current match; repeat. This is very much faster than brute force (checking every possible position), but it may not work in all cases (it depends on the nature of what is being matched).
Unfortunately, this type of algorithm does not translate very well to GPU, because each step depends on previous steps. It may still be worth it; if you check e.g. 16 neighbors of the position for a 256x256 image, that is enough parallel computation to send to the GPU, and yes, it absolutely can be done in OpenGL ES. However, the answer to all of that really depends on whether you're doing a brute-force or a local-minimization type search, and whether local minimization would work for you.
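As a rough illustration of the coarse-then-local strategy above, here is a CPU sketch in MATLAB/Octave (the question is about OpenGL ES, but the search logic is the same); the filenames and the step size are placeholders, and the images are assumed to be grayscale as stated in the question.
% Sketch: locate a small patch inside a larger image using SAD.
big = im2double(imread('big.png'));       % placeholder filenames
small = im2double(imread('small.png'));
[bh, bw] = size(big);
[sh, sw] = size(small);
sad = @(r, c) sum(sum(abs(big(r:r+sh-1, c:c+sw-1) - small)));   % sum of absolute differences
% 1) coarse sampling of all possible offsets
best = inf; br = 1; bc = 1; step = 8;
for r = 1:step:bh-sh+1
  for c = 1:step:bw-sw+1
    s = sad(r, c);
    if s < best, best = s; br = r; bc = c; end
  end
end
% 2) local refinement: move to the best neighbour until nothing improves
improved = true;
while improved
  improved = false;
  for dr = -1:1
    for dc = -1:1
      r = br + dr; c = bc + dc;
      if r >= 1 && c >= 1 && r <= bh-sh+1 && c <= bw-sw+1 && sad(r, c) < best
        best = sad(r, c); br = r; bc = c; improved = true;
      end
    end
  end
end
fprintf('best match at (row, col) = (%d, %d), SAD = %f\n', br, bc, best);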

How to Compare the quality of two images?

I have applied two different image enhancement algorithms to a particular image and got two resultant images. Now I want to compare the quality of those two images in order to find the effectiveness of the two algorithms, and pick the more appropriate one based on a comparison of feature vectors of the two images. So what suitable feature vectors should I compare in this case?
I am asking in the context of comparing the texture features of the images, and which feature vector would be more suitable.
I need mathematical support for verifying the effectiveness of either algorithm based on the evaluation of the images, for example using contrast and variance. Are there any more approaches to do that?
Would a better approach be to compute some noise/signal ratio by comparing image spectra?
Slayton is right, you need a metric and a way to measure against it, which can be an academic project in itself. However, I can think of one approach straight away; not sure if it makes sense for your specific task at hand:
Metric:
The sum of abs( colour difference ) across all pixels. The lower, the more similar the images are.
Method:
For each pixel, get the absolute colour difference (or distance, to be precise) in LAB space between the original and the processed image and sum that up. Don't ruin your day trying to understand the full Wikipedia article and coding that, this has been done before. Try re-using the methods getDistanceLabFrom(Color color) or getDistanceRgbFrom(Color color) from this PHP implementation. It worked like a charm for me when I needed a way to match the color of pixels in a JPG picture - which is basically the same principle.
The theory behind it (as far as my limited understanding goes): it's doing a mathematical abstraction of RGB or (better) Lab colour space as a three-dimensional space, and then calculating the distance. That's why it works well - and why it hardly worked for me when I was looking at a colour code from a one-dimensional perspective.
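If you would rather stay in MATLAB/Octave than port the PHP code, a rough version of that metric could look like this (rgb2lab needs the Image Processing Toolbox; the filenames are placeholders):
A = im2double(imread('original.png'));
B = im2double(imread('processed.png'));
dLab = rgb2lab(A) - rgb2lab(B);           % per-pixel difference in Lab space
d = sqrt(sum(dLab.^2, 3));                % Euclidean colour distance per pixel
score = sum(d(:));                        % lower score means more similar images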
The usual way is to start with a reference image (a good one), then add some noise on it (in a controlled way).
Then, your algorithm should remove as much as possible of the added noise. The results are easy to compare with a signal-to-noise ratio (see Wikipedia).
Now, the approach is easy to apply on simple noise models, but if you aim to improve more complex appearance issues, you must devise a way to apply the noise, which is not easy.
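A sketch of that workflow (reference image, controlled noise, then a score); my_enhancement is a stand-in for whichever algorithm is being evaluated, not an existing function:
ref = im2double(imread('reference.png'));      % the known-good image
noisy = imnoise(ref, 'gaussian', 0, 0.01);     % controlled, known noise model
out = my_enhancement(noisy);                   % hypothetical algorithm under test
mse = mean((out(:) - ref(:)).^2);
psnr_db = 10 * log10(1 / mse);                 % PSNR in dB, images are in [0,1]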
Another, quite common way to do it is the one recommended by slayton: get all your colleagues to assess the output of your algorithm, then average their impressions.
If you have only the 2 images and no reference (highest quality) image, then you can see my crude solution/bash script here: https://photo.stackexchange.com/questions/75995/how-do-i-compare-two-similar-images-sharpness/117823#117823
It gets the 2 filenames and outputs the higher quality filename. It assumes the content of the images is identical (same source image).
It can be fooled though.

Lucas Kanade Optical Flow, Direction Vector

I am working on optical flow, and based on the lecture notes here and some samples on the Internet, I wrote this Python code.
All code and sample images are there as well. For small displacements of around 4-5 pixels, the direction of vector calculated seems to be fine, but the magnitude of the vector is too small (that's why I had to multiply u,v by 3 before plotting them).
Is this because of the limitation of the algorithm, or error in the code? The lecture note shared above also says that motion needs to be small "u, v are less than 1 pixel", maybe that's why. What is the reason for this limitation?
@belisarius says: "LK uses a first order approximation, and so (u,v) should be ideally << 1; if not, higher order terms dominate the behavior and you are toast."
A standard conclusion from the optical flow constraint equation (OFCE, slide 5 of your reference) is that "your motion should be less than a pixel, lest higher order terms kill you". While technically true, you can overcome this in practice using larger averaging windows. This requires that you do sane statistics, i.e. not a pure least-squares mean, as suggested in the slides. Equally fast computations, and far superior results, can be achieved by Tikhonov regularization. This necessitates setting a tuning value (the Tikhonov constant). This can be done as a global constant, or by letting it adjust to local information in the image (such as the Shi-Tomasi confidence, aka structure tensor determinant).
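A rough single-scale sketch of that regularised estimator in MATLAB/Octave (the window size and the Tikhonov constant lambda are tuning values, and gradient/conv2 are just one reasonable choice for the derivatives and the windowed sums):
function [u, v] = lk_tikhonov(I1, I2, win, lambda)
  % Lucas-Kanade with Tikhonov regularisation, one flow vector per pixel.
  I1 = im2double(I1);  I2 = im2double(I2);
  [Ix, Iy] = gradient(I1);                  % spatial derivatives
  It = I2 - I1;                             % temporal derivative
  k = ones(win) / win^2;                    % box filter for the windowed sums
  Sxx = conv2(Ix.*Ix, k, 'same');  Sxy = conv2(Ix.*Iy, k, 'same');
  Syy = conv2(Iy.*Iy, k, 'same');
  Sxt = conv2(Ix.*It, k, 'same');  Syt = conv2(Iy.*It, k, 'same');
  % Solve ([Sxx Sxy; Sxy Syy] + lambda*I) * [u; v] = -[Sxt; Syt] at every pixel.
  detM = (Sxx + lambda).*(Syy + lambda) - Sxy.^2;
  u = (-(Syy + lambda).*Sxt + Sxy.*Syt) ./ detM;
  v = ( Sxy.*Sxt - (Sxx + lambda).*Syt) ./ detM;
end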
Note that this does not replace the need for multi-scale approaches in order to deal with larger motions. It may extend the range a bit for what any single scale can deal with.
Implementations, visualizations and code are available in tutorial format here, albeit in MATLAB rather than Python.