Determining Lost Image Quality Through Lossy Compression - image-compression

I recently came upon a question while reading about lossy compression that I haven't seen addressed anywhere else: can you determine how much quality is lost by a certain algorithm? I have been asking around, and it seems there is no sure way to quantify the quality lost relative to the original image; the difference can only be judged by the naked eye. Is there an algorithm that reports the percentage lost or blended?
I would really appreciate it if someone could give me some insight into this matter.

You can use lots of metrics to measure quality loss. But, of course, each metric will interpret quality loss differently.
One direction, following the suggestion already made in the comments, would be to use something like the Euclidean distance or the mean squared error between the original and the compressed image (considered as vectors). There are many more metrics of this "absolute" kind.
The above will indicate a certain quality loss but the result may not correlate with human perception of quality. To give more weight to perception you can inspect the structural similarity of the images and use the structural similarity index measure (SSIM) or one of its variants. Another algorithm in this area is butteraugli.
In Python, for instance, there is an implementation of SSIM in the scikit-image package, see this example.
What the mentioned metrics have in common is that they do not return a percentage. If a percentage is crucial to you, an additional conversion step will be necessary.
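As a concrete starting point, here is a minimal sketch using scikit-image; the filenames are placeholders, and both files are assumed to decode to arrays of the same shape. (For scikit-image older than 0.19, replace channel_axis=-1 with multichannel=True.)

```python
# Minimal sketch: compare an original image with its lossily compressed version
# using an "absolute" metric (MSE) and a perceptual one (SSIM).
from skimage import io
from skimage.metrics import mean_squared_error, structural_similarity

original = io.imread("original.png")      # placeholder filenames; both images
compressed = io.imread("compressed.jpg")  # must have identical dimensions/channels

mse = mean_squared_error(original, compressed)             # 0 means identical
ssim = structural_similarity(original, compressed,
                             channel_axis=-1)              # 1 means identical, lower means more degradation

print(f"MSE:  {mse:.2f}")
print(f"SSIM: {ssim:.4f}")
```

Neither number is a percentage; SSIM is bounded in [-1, 1], so it is the easier of the two to map onto a 0-100 scale if you really need one.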

Related

Max-pooling vs. zero padding: Losing spatial information

When it comes to convolutional neural networks, there are normally many papers recommending different strategies. I have heard people say that it is an absolute must to add padding to the images before a convolution, because otherwise too much spatial information is lost. On the other hand, they are happy to use pooling, normally max-pooling, to reduce the size of the images. I guess the thought here is that max-pooling reduces the spatial information but also reduces the sensitivity to relative positions, so it is a trade-off?
I have heard other people saying that zero-padding does not keep more information, just more empty data. This is because by adding zeros you will not get a reaction from your kernel anyway when part of the information is missing.
I can imagine that zero-padding works if you have big kernels with "scrap values" in the edges and the source of activation centered in a smaller region of the kernel?
I would be happy to read some papers about the effect of down-sampling using pooling versus not using padding, but I can't find much about it. Any good recommendations or thoughts?
Figure: Spatial down-sampling using convolution versus pooling (ResearchGate)
Adding padding is NOT an "absolute must". Sometimes it can be useful to control the size of the output so that it is not reduced by the convolution (it can even enlarge the output, depending on the input and kernel sizes). The only information that zero padding adds is the border (or near-border) condition of the features: which pixels lie at the limits of the input, again depending on kernel size. (You can think of it as the passe-partout in a picture frame.)
Pooling is of MUCH MORE IMPORTANCE in convnets. Pooling is not exactly "down-sampling" or "losing spatial information". Consider first that the kernel computations have already been made before pooling, with full spatial information. Pooling reduces the dimensions but, hopefully, keeps the information learnt by the kernels. In doing so it achieves one of the most interesting properties of convnets: robustness to displacement, rotation or distortion of the input. A feature, once learnt, is detected even if it appears in another location or with distortions. Pooling also implies learning at increasing scale, discovering, again hopefully, hierarchical patterns at different scales. And of course pooling is also necessary in convnets simply to keep the computation feasible as the number of layers grows.
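To make the size arithmetic concrete, here is an illustrative sketch in Python/PyTorch; the input size, channel counts, and kernel sizes are arbitrary assumptions chosen only to show how padding and pooling change the spatial dimensions:

```python
# Illustrative only: how padding and pooling change spatial size.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)                        # one 32x32 RGB image

conv_no_pad = nn.Conv2d(3, 8, kernel_size=3, padding=0)
conv_pad = nn.Conv2d(3, 8, kernel_size=3, padding=1)
pool = nn.MaxPool2d(kernel_size=2)

print(conv_no_pad(x).shape)     # [1, 8, 30, 30] -> the convolution shrinks the map
print(conv_pad(x).shape)        # [1, 8, 32, 32] -> zero padding preserves the size
print(pool(conv_pad(x)).shape)  # [1, 8, 16, 16] -> pooling halves it, after the kernels have seen the full map
```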
I have puzzled over this question for a while too, and I have also seen some papers mention the same issue. Here is a recent paper I found: Recombinator Networks: Learning Coarse-to-Fine Feature Aggregation. I have not fully read the paper yet, but it seems to bear on your question. I can update this answer as soon as I fully grasp it.

Fourier spectral analysis with Support Vector Machines

I did some reading about SVMs this afternoon, and I have the hope that they look very promising.
I am currently working on a problem where I'm looking for a pattern in the Fourier spectrum. That is to say, I have been looking at spectra for days, hoping to find some repeating patterns. I found some criteria that match a certain pattern, but with the next sample the whole pattern could look slightly different. So there is always a slight deviation, which makes it hard to describe; or, put another way, I might be overlooking something. But I can clearly say which samples form the training data.
I was hoping to use an SVM to train on this data and predict the classification. That is, if I have a new set of data, it would tell me whether it matches the training data or falls into the "other" group, which could be anything (no need to know what).
Is that something an SVM is able to do, or am I completely off? I couldn't find any good examples of input data to see whether my problem is something I could feed to an SVM.
Currently using Matlab.
There has actually been a lot of research on this particular topic, especially with the wavelet transform. Google "wavelet transform and SVM" and you will find a number of papers. From there, you can easily adapt the approach from the wavelet to the FFT spectrum.
I don't have experience with SVM, but I do have experience with related techniques, and here's what I can say:
In all likelihood, you can't simply go from a spectrum to an SVM to a decision. You need to determine what it is about the spectra that distinguishes your various inputs. For example, if it's the way the data changes over time, or the relationship between the high and low frequencies, that makes the inputs different, you need to encode that as a single parameter. E.g., you could make a parameter that is the ratio of some of your higher frequencies to some of your lower frequencies. You may also want to use parameters like the frequency centroid and the zero-crossing rate, which are simpler than the full spectrum but may still carry useful information (these are used in audio and speech; I'm not sure whether they apply to whatever you are looking at). Once you have these derived parameters, feed them to the SVM, which will do the sorting.
Other techniques you might want to examine (which also have the same requirements) include HMM (Hidden Markov Models), K-Means, and Logistic Regression.
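To illustrate the "derive parameters first, then classify" idea, here is a rough Python sketch with NumPy and scikit-learn (the question mentions Matlab, where fitcsvm plays the role of SVC). The synthetic signals, the assumed sampling rate, and the particular features (band-energy ratio, frequency centroid, zero-crossing rate) are illustrative assumptions, not a prescription for any specific data:

```python
# Sketch: hand-crafted spectral features fed to an SVM classifier.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
fs = 1000.0  # assumed sampling rate in Hz

def features(signal):
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    low = spectrum[freqs < fs / 8].sum()
    high = spectrum[freqs >= fs / 8].sum()
    centroid = (freqs * spectrum).sum() / (spectrum.sum() + 1e-12)   # frequency centroid
    zcr = np.mean(np.diff(np.sign(signal)) != 0)                     # zero-crossing rate
    return [high / (low + 1e-12), centroid, zcr]

def make_signal(label):
    # Toy stand-in for real measurements: class 1 carries extra high-frequency content.
    t = np.arange(1024) / fs
    s = np.sin(2 * np.pi * 50 * t) + 0.1 * rng.standard_normal(t.size)
    if label == 1:
        s += 0.5 * np.sin(2 * np.pi * 300 * t)
    return s

labels = rng.integers(0, 2, size=40)
X = np.array([features(make_signal(lab)) for lab in labels])

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, labels)
print(clf.predict([features(make_signal(1))]))   # should typically print [1]
```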

How to compare the quality of two images?

I have applied two different image enhancement algorithms to a particular image and got two resulting images. Now I want to compare the quality of those two images in order to judge the effectiveness of the two algorithms and choose the more appropriate one, based on a comparison of the feature vectors of the two images. Which feature vectors would be suitable to compare in this case?
I am asking in the context of comparing the texture features of the images, and which feature vector would be most suitable for that.
I need mathematical support for verifying the effectiveness of either algorithm based on an evaluation of the images, for example using contrast and variance. Are there any other approaches for doing that?
Would a better approach be to compute some noise/signal ratio by comparing the image spectra?
Slayton is right, you need a metric and a way to measure against it, which can be an academic project in itself. However, I can think of one approach straight away; I'm not sure whether it makes sense for your specific task at hand:
Metric:
The sum of abs(colour difference) across all pixels. The lower the sum, the more similar the images are.
Method:
For each pixel, get the absolute colour difference (or distance, to be precise) in LAB space between the original and the processed image, and sum that up. Don't ruin your day trying to understand the full Wikipedia article and coding it yourself; this has been done before. Try re-using the methods getDistanceLabFrom(Color color) or getDistanceRgbFrom(Color color) from this PHP implementation. It worked like a charm for me when I needed a way to match the colour of pixels in a JPG picture, which is basically the same principle.
The theory behind it (as far as my limited understanding goes): it treats the RGB or, better, LAB colour space as a three-dimensional space and then calculates the distance within it. That is why it works well, and why looking at a colour code from a one-dimensional perspective hardly worked for me.
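If you are not on PHP, roughly the same metric can be sketched in Python with scikit-image; the filenames are placeholders and both images are assumed to be plain RGB of the same size:

```python
# Sketch: sum of per-pixel colour distances (CIE76 delta E) in LAB space.
from skimage import io, color
from skimage.color import deltaE_cie76

original = color.rgb2lab(io.imread("original.png"))    # placeholder filenames,
processed = color.rgb2lab(io.imread("processed.png"))  # same size, RGB without alpha

per_pixel = deltaE_cie76(original, processed)    # delta E for every pixel
total = per_pixel.sum()                          # the metric: lower = more similar
print(f"total delta E: {total:.1f}, mean per pixel: {per_pixel.mean():.2f}")
```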
The usual way is to start with a reference image (a good one), then add some noise to it (in a controlled way).
Then, your algorithm should remove as much of the added noise as possible. The results are easy to compare with a signal-to-noise ratio (see Wikipedia).
Now, this approach is easy to apply with simple noise models, but if you aim to improve more complex appearance issues, you must devise a way to apply that kind of noise, which is not easy.
Another quite common way to do it is the one recommended by slayton: get all your colleagues to judge the output of your algorithm, then average their impressions.
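Here is a hedged sketch of the add-noise-then-measure procedure in Python with scikit-image; the denoiser is just a placeholder for whichever enhancement algorithm is being evaluated:

```python
# Sketch: add controlled noise to a reference image, run the algorithm under
# test, and compare how much of the noise was removed via PSNR.
from skimage import data, util
from skimage.metrics import peak_signal_noise_ratio
from skimage.restoration import denoise_tv_chambolle   # placeholder "algorithm under test"

reference = util.img_as_float(data.camera())             # clean reference image
noisy = util.random_noise(reference, mode="gaussian", var=0.01)
restored = denoise_tv_chambolle(noisy, weight=0.1)       # swap in the algorithm being compared

print("PSNR noisy   :", peak_signal_noise_ratio(reference, noisy, data_range=1.0))
print("PSNR restored:", peak_signal_noise_ratio(reference, restored, data_range=1.0))  # higher = more added noise removed
```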
If you have only the 2 images and no reference (highest quality) image, then you can see my rough solution/bash script here: https://photo.stackexchange.com/questions/75995/how-do-i-compare-two-similar-images-sharpness/117823#117823
It gets the 2 filenames and outputs the higher quality filename. It assumes the content of the images is identical (same source image).
It can be fooled though.

Does enlarging images make them easier to analyze programmatically?

Can you enlarge a feature so that, rather than taking up a certain number of pixels, it takes up one or two times that many, to make it easier to analyze? Would there be a way to generalize that in MATLAB?
This sounds an awful lot like a fictitious "zoom, enhance!" procedure that you'd hear about on CSI. In general, "blowing up" a feature doesn't make it any easier to analyze, because no additional information is created when you do this. Generally you would apply other, different transformations like noise reduction to make analysis easier.
As John F has stated, you are not adding any information. In fact, with more pixels to crunch through you are making it "harder" in the sense of requiring more processing.
You might be able to intelligently increase the resolution of an image using Compressed Sensing. It will require some work (or at least some serious thought), though, as you'll have to determine how best to sample the image you already have. There's a large number of papers referenced at Rice University Compressive Sensing Resources.
The challenge is that the image is already sampled using Nyquist-Shannon constraints. You essentially have to re-sample it using a linear basis function (with IID random elements) in such a way that the estimate is at the desired resolution and find some surrogate for the original image at that same resolution that doesn't bias the estimate.
The function imresize is useful for, well, resizing images, larger or smaller. And imcrop is useful for cropping images.
You might get other more useful answers if you tag the question image-processing too.
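For reference, roughly the same operations can be sketched in Python with scikit-image (shown here only because the other snippets in this thread use Python; imresize and imcrop remain the MATLAB equivalents). Note that the enlarged image contains only interpolated pixels, no new information:

```python
# Sketch: upscaling and cropping with scikit-image, analogous to imresize/imcrop.
from skimage import data, transform

image = data.coins()                                # sample grayscale image
bigger = transform.rescale(image, 2, order=3)       # 2x upsampling, cubic interpolation
cropped = image[50:150, 100:200]                    # rough equivalent of imcrop

print(image.shape, bigger.shape, cropped.shape)
```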

Machine learning - training step

When you're using Haar-like features for your training data for an AdaBoost algorithm, how do you build your data sets? Do you literally have to find thousands of positive and negative samples? There must be a more efficient way of doing this...
I'm trying to analyze images (not faces) in MATLAB and am relatively new to image processing.
Yes, you do need many positive and negative samples for training. This is especially true for AdaBoost, which works by repeatedly resampling the training set. How many samples are enough is hard to say, but generally the more the better, because that increases the chances of your training set being representative.
Also, it seems to me that your quest for efficiency is misplaced. Training is done ahead of time, presumably off-line. It is the efficiency of classifying unknown instances after the training is done that people usually worry about.
Undoubtedly, more data and more information give better results, so you should include as much information as possible. However, one thing you need to take care with is the ratio of the positive set to the negative set. For logistic regression the ratio should not be over 1:5; for AdaBoost I'm not really sure about the effect, but the result will certainly change with the ratio (I have tried this before).
Yes, we need many positive and negative samples for training, but collecting that data is very tedious. You can make it easier by taking videos instead of pictures and using ffmpeg to convert those videos into pictures. That will make the training part much easier.
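The same video-to-frames step can also be sketched in Python with OpenCV rather than calling ffmpeg directly (the filename, output folder, and keep-every-10th-frame rule are placeholder choices; with ffmpeg itself, something like `ffmpeg -i samples.mp4 frames/sample_%05d.png` does the bulk extraction):

```python
# Sketch: extract still frames from a video to build a training set of images.
import os
import cv2

os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture("samples.mp4")    # placeholder video file

index = 0
while True:
    ok, frame = cap.read()               # read one frame at a time
    if not ok:
        break                            # end of video (or read error)
    if index % 10 == 0:                  # keep every 10th frame to avoid near-duplicates
        cv2.imwrite(f"frames/sample_{index:05d}.png", frame)
    index += 1

cap.release()
```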
The only reason to have roughly equal numbers of positive and negative samples is to avoid bias. Sometimes you might get high accuracy even though the classifier completely fails on one category. To evaluate such methods, precision and recall are more useful than accuracy.
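A tiny sketch of why accuracy can mislead on imbalanced data (synthetic labels, scikit-learn metrics):

```python
# Sketch: a classifier that always predicts the majority class looks accurate
# but has zero precision and recall on the rare positive class.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = np.array([0] * 95 + [1] * 5)       # 95 negatives, 5 positives
y_pred = np.zeros(100, dtype=int)           # always predict "negative"

print(accuracy_score(y_true, y_pred))                       # 0.95 -> looks great
print(precision_score(y_true, y_pred, zero_division=0))     # 0.0
print(recall_score(y_true, y_pred))                         # 0.0 -> misses every positive
```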