In compressed sensing, how does one verify whether a vector has been recovered, and how could one plot figures showing the recovery rate? In numerical experiments there is always some difference between the original vector and the vector produced by the recovery algorithm.
The only way to confirm that your vector is actually recovered would be to
keep one copy of the vector
compress another copy
decompress the compressed vector
compare with the original
This could be useful to do when you are trying to design or select a lossless compression method.
However, if you are in the setting where you have a vector, need to compress it, later decompress it, and want to know whether there was any loss, then there is no practical way to be certain. Fortunately, you can get some indication of whether your new vector resembles the old one by storing and comparing some statistics.
Some statistics that may be interesting, depending on your vector (a small MATLAB sketch follows the list):
Length
Moments (Mean, variance etc.)
Number of positive values
Highest and lowest values
Smallest nonzero difference between two values
The exact value of every 10000th element
...
Basically whatever really matters to you
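For instance, here is a minimal MATLAB sketch of storing and comparing such statistics; the variable names original and decompressed and the 1e-9 tolerance are placeholders to adapt:

% Summary statistics of a vector, stored alongside the compressed data
vecstats = @(v) struct( ...
    'len',    numel(v), ...
    'mu',     mean(v), ...
    'sigma2', var(v), ...
    'npos',   sum(v > 0), ...
    'lo',     min(v), ...
    'hi',     max(v));

s1 = vecstats(original);       % computed before compression
s2 = vecstats(decompressed);   % computed after decompression
match = s1.len == s2.len && s1.npos == s2.npos && ...
        abs(s1.mu - s2.mu) < 1e-9 && abs(s1.sigma2 - s2.sigma2) < 1e-9;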
I am generating some data whose plots are as shown below.
In all the plots I get some outliers at the beginning and at the end. Currently I am truncating the first and the last 10 values. Is there a better way to handle this?
I am basically trying to automatically identify the two points shown below.
This is a fairly general problem with lots of approaches; usually you will use some a priori knowledge of the underlying system to make it tractable.
So for instance if you expect to see the pattern above - a fast drop, a linear section (up or down) and a fast rise - you could try taking the derivative of the curve and looking for large values and/or sign reversals. Perhaps it would help to bin the data first.
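For instance, in MATLAB (a rough sketch; y is your data vector and the five-sigma threshold is just a starting point to tune):

dy = diff(y);                                        % discrete derivative
steep = find(abs(dy) > 5 * std(dy));                 % unusually fast drops/rises
flips = find(sign(dy(1:end-1)) ~= sign(dy(2:end))); % derivative sign reversals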
If your pattern is not so easy to define but you are expecting a linear trend you might fit the data to an appropriate class of curve using fit and then detect outliers as those whose error from the fit exceeds a given threshold.
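For the linear case that could look like the following sketch, using polyfit in place of fit and an assumed three-sigma cutoff:

c = polyfit(x, y, 1);              % fit a line to the data
r = y - polyval(c, x);             % residuals from the fit
outliers = abs(r) > 3 * std(r);    % flag points far from the fit
yclean = y(~outliers);             % keep the rest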
In either case you still have to choose thresholds - mean, variance and higher order moments can help here but you would probably have to analyse existing data (your training set) to determine the values empirically.
And perhaps, after all that, as Shai points out, you may find that lopping off the first and last ten points gives the best results for the time you spent (cf. Pareto principle).
I'm working on lossless data compression in MATLAB. I wish to encode a signal of length about 60,000. Here's my code:
function sig = huffman (Y, fs)
% Get the array of unique values
Z = unique(Y);
% Count the occurrences of each element
countElY = histc(Y, Z);
% Probability distribution of the symbols
p = countElY / numel(Y);
[dict, avglen] = huffmandict(Z, p); % Create dictionary.
comp = huffmanenco(Y, dict);        % Encode the data.
sig = huffmandeco(comp, dict);      % Decode the data.
sound(sig, fs)
Problem is, for a signal of this length I exceed MATLAB's recursion limit of 500, and the error occurs while creating the dictionary. I have already tried breaking the signal into parts, but that took a very long time and only covered a small part of it. Any ideas how to make it work, apart from extending the recursion limit, which is rather pointless and time-consuming?
First you need to determine why you think it's possible to compress the data. Is the signal smooth? Is the range limited? Is the quantization limited? What makes it compressible will determine how to compress it.
Simply applying Huffman coding to a series of real values will not compress the data, since each of the values appears once, or maybe a few appear twice. Huffman depends on taking advantage of many occurrences of the same symbol, and a bias in the frequency, where some symbols are much more common than others.
Compressing a waveform would use different approaches. First would be to convert each sample to as few bits as are significant and that cover the range of inputs. Second would be to take differences between samples or use more advanced predictors to take advantage of smoothness in the waveform (if it is smooth). Third would be to find a way to group differences to encode more likely ones in fewer bits. That last step might use Huffman coding, but only after you've constructed something that Huffman coding can take advantage of.
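As a rough MATLAB sketch of the first two steps, reusing Y from the question; the 2^12 quantization factor is an arbitrary assumption and should match the real precision of your samples:

q = round(Y(:)' * 2^12);      % quantize to a limited integer alphabet
d = [q(1), diff(q)];          % first sample, then sample-to-sample deltas
Z = unique(d);
p = histc(d, Z) / numel(d);   % on smooth data a few small deltas dominate
[dict, avglen] = huffmandict(Z, p);
comp = huffmanenco(d, dict);  % Huffman now has a biased distribution to exploit

A side benefit is that the alphabet of deltas is much smaller than the alphabet of raw values, which also helps with the huffmandict recursion problem from the question.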
I have a set of data in a vector. If I were to plot a histogram of the data I could see (by clever inspection) that the data is distributed as the sum of three distributions:
One normal distribution centered around x_1 with variance s_1;
One normal distribution centered around x_2 with variance s_2;
One lognormal distribution.
My data is obviously a subset of the 'real' data.
What I would like to do is take a random subset of my data while ensuring that the resulting subset is a reasonably representative sample of the original data.
I would like to do this as easily as possible in MATLAB, but I am new to both statistics and MATLAB and am unsure where to start.
Thank you for any help :)
If you can identify each of the 3 distributions (in the sense that you can estimate their parameters), one approach could be to select a random subset of your data, estimate the parameters for each distribution on that subset, and check whether they are close enough (according to your own definition of "close") to the parameters of the original distributions. You should repeat this process several times and look at the average difference for a given random subset size.
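A sketch of that procedure in MATLAB; the subset size k, the tolerance tol, and the use of simple moments instead of full mixture fits are all assumptions to adapt:

k   = 1000;                          % assumed subset size
tol = 0.05;                          % assumed closeness tolerance
idx = randsample(numel(data), k);    % random subset without replacement
sub = data(idx);
% Compare simple estimates; for the full three-component mixture you
% could instead fit a model (e.g. fitgmdist) to both and compare parameters.
closeEnough = abs(mean(sub) - mean(data)) < tol && ...
              abs(var(sub)  - var(data))  < tol;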
I would like be able to use a vector as an envelope to apply fft equalization to rather large chunks of sound, with varying sizes.
To be able to multiply the frequency domain bins by the envelope, the envelope needs to have the same resolution as the fft data, which will vary with the size of the sound chunks.
So, I need a function to resample my envelope vector. Do you know whether vDSP features a function for that purpose? I browsed the reference back and forth, but found nothing. Which doesn't mean there is nothing there - it's easy to miss something while searching the vDSP reference...
It's not that I couldn't implement something myself, but if there were a vDSP function, it would probably be much faster than anything I could possibly come up with. Which is relevant, as this project targets iOS devices as well.
And there's no need to reinvent the wheel :)
Thanks!!
If I understand correctly, you have a 1D array of envelope values which you want to vector multiply with a 1D array of frequency bins. The problem you are trying to solve is to scale the envelope array to the same length as the FFT array. It would be helpful to know how you are generating the envelope array in the first place, can you not simply generate it at the correct length? If so, problem solved :)
If not, then how about using vDSP_vtabi to generate the envelope vector from the lookup table of values that you currently have? You can generate the lookup table input vector A using vDSP_vramp.
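In MATLAB terms (purely to illustrate the resampling concept; this is not the vDSP API), the table-interpolation idea amounts to a single interpolation, where env and fftLen are assumed names:

% Stretch (or shrink) the envelope to the FFT length by linear interpolation
env2 = interp1(linspace(0, 1, numel(env)), env, linspace(0, 1, fftLen));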
This seems rather complicated and expensive to me though, with a fair amount of buffer mallocing / reallocing. It might be simpler to calculate how many FFT samples should be multiplied by each envelope value, then loop for each envelope sample using vDSP_vsmul to multiply chunks of the FFT vector by the envelope value.
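The chunked alternative, again sketched in MATLAB with assumed names (this version simply ignores any leftover bins when fftLen is not a multiple of the envelope length):

chunk = floor(fftLen / numel(env));              % FFT bins per envelope value
for k = 1:numel(env)
    i0 = (k - 1) * chunk + 1;                    % first bin of this chunk
    bins(i0 : i0 + chunk - 1) = bins(i0 : i0 + chunk - 1) * env(k);
end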
Which solution will perform better really depends a lot on the relative sizes of each vector. It would also be helpful to know why the FFT vectors are different sizes, and how you are generating the envelope array in the first place to give a more accurate answer.
I'd suggest going a different way.
Because your input signal comes from hardware at a fixed sample rate and you know the exact number of envelope values, just do a sample rate conversion using AudioConverterFillComplexBuffer. It's fast, and it takes care of filtering and interpolation when resampling.
I need to find the position of a smaller image inside a bigger image. The smaller image is a subset of the bigger image. A further requirement is that pixel values may differ slightly, for example if the images were produced by different JPEG compressions.
I've implemented a solution that compares bytes on the CPU, but I'm now looking into any possibility of speeding up the process.
Could I somehow utilize OpenGLES and thus iPhone GPU for it?
Note: images are grayscale.
@Ivan, this is a pretty standard problem in video compression (finding the position of the current macroblock in the previous frame). You can use a metric for the difference in pixels such as the sum of absolute differences (SAD), the sum of squared differences (SSD), or the sum of Hadamard-transformed differences (SATD). I assume you are not trying to compress video but rather looking for something like a watermark.
In many cases, you can use a gradient-descent-type search to find a local minimum (best match), based on the empirical observation that comparing an image (your small image) to a slightly offset version of itself (a match whose position hasn't been found exactly) produces a closer metric than comparing it to a random part of another image. So you can start by sampling the space of all possible offsets/positions (motion vectors in video encoding) rather coarsely, and then do local optimization around the best result. The local optimization works by comparing a match to some number of neighboring matches, moving to the best of those if any is better than your current match, and repeating. This is very much faster than brute force (checking every possible position), but it may not work in all cases (it depends on the nature of what is being matched).
Unfortunately, this type of algorithm does not translate very well to the GPU, because each step depends on the previous ones. It may still be worth it; if you check e.g. 16 neighbors of a position for a 256x256 image, that is enough parallel computation to send to the GPU, and yes, it absolutely can be done in OpenGL ES. However, the answer to all of that really depends on whether you're doing a brute-force or a local-minimization search, and whether local minimization would work for you.
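A minimal MATLAB sketch of the coarse-then-local SAD search described above; the image names I (big image) and T (template), and the grid step of 8, are assumptions:

% Sum of absolute differences between template T and the patch of I at (r, c)
sad = @(I, T, r, c) sum(sum(abs(double(I(r:r+size(T,1)-1, c:c+size(T,2)-1)) - double(T))));

% Coarse scan over the space of positions
step = 8;
best = inf; br = 1; bc = 1;
for r = 1:step:size(I,1) - size(T,1) + 1
    for c = 1:step:size(I,2) - size(T,2) + 1
        s = sad(I, T, r, c);
        if s < best, best = s; br = r; bc = c; end
    end
end

% Local optimization: move to the best of the 8 neighbours until no improvement
improved = true;
while improved
    improved = false;
    for dr = -1:1
        for dc = -1:1
            r = br + dr; c = bc + dc;
            if r >= 1 && c >= 1 && r + size(T,1) - 1 <= size(I,1) && c + size(T,2) - 1 <= size(I,2)
                s = sad(I, T, r, c);
                if s < best, best = s; br = r; bc = c; improved = true; end
            end
        end
    end
end
% (br, bc) is now the estimated position of T inside I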