Easy Way To Accomplish Data Compression in MATLAB?

I am working on an assignment where I have to take a large matrix containing data, and somehow compress the data so that it will be in a form of more manageable size. However, the data needs to be re-utilized as input to something else. (A toolbox, for example). Here's what I've done so far. For this example matrix, I use the find function to give me a matrix of all the indices where the values are non-zero. But I have no idea as to how to use it as input so that the original figure information is retained. I was curious if other folks had any other better (simple) solutions to this.
number_1 = [0 0 0 0 0 0 0 0 0 0 ...
0 0 1 1 1 1 0 0 0 0 ...
0 1 1 0 1 1 0 0 0 0 ...
0 1 1 0 1 1 0 0 0 0 ...
0 0 0 0 1 1 0 0 0 0 ...
0 0 0 0 1 1 0 0 0 0 ...
0 0 0 0 1 1 0 0 0 0 ...
0 0 0 0 1 1 0 0 0 0 ...
0 0 0 0 1 1 0 0 0 0 ...
0 0 0 0 1 1 0 0 0 0 ...
0 0 0 0 1 1 0 0 0 0 ...
0 1 1 1 1 1 1 1 1 0 ...
0 0 0 0 0 0 0 0 0 0];
number = number_1;
compressed_number = find(number);
compressed_number = compressed_number';
disp(compressed_number)

When you have only ones and zeros, and the fill factor is not terribly small, your best bet is to store the numbers as binary numbers; if you need the original size, save it separately. I have expanded the code, showing the intermediate steps a little more clearly, and also showing the amount of storage needed for the different arrays. Note - I reshaped your data into a 13x10 array because it displays better.
number_1 = [0 0 0 0 0 0 0 0 0 0 ...
0 0 1 1 1 1 0 0 0 0 ...
0 1 1 0 1 1 0 0 0 0 ...
0 1 1 0 1 1 0 0 0 0 ...
0 0 0 0 1 1 0 0 0 0 ...
0 0 0 0 1 1 0 0 0 0 ...
0 0 0 0 1 1 0 0 0 0 ...
0 0 0 0 1 1 0 0 0 0 ...
0 0 0 0 1 1 0 0 0 0 ...
0 0 0 0 1 1 0 0 0 0 ...
0 0 0 0 1 1 0 0 0 0 ...
0 1 1 1 1 1 1 1 1 0 ...
0 0 0 0 0 0 0 0 0 0];
n1matrix = reshape(number_1, 10, [])'; % make it nicer to display;
% transpose because data is stored column-major (row index changes fastest).
disp('the original data in 13 rows of 10:');
disp(n1matrix);
% create a matrix with 8 rows and enough columns
n1 = numel(number_1);
nc = ceil(n1/8); % "enough columns"
npad = zeros(8, nc);
npad(1:n1) = number_1; % fill the first n1 elements: the rest is zero
binVec = 2.^(7-(0:7)); % 128, 64, 32, 16, 8, 4, 2, 1 ... powers of two
compressed1 = uint8(binVec * npad); % 128 * bit 1 + 64 * bit 2 + 32 * bit 3...
% showing what we did...
disp('Organizing into groups of 8, and their decimal representation:')
for ii = 1:nc
fprintf(1,'%d ', npad(:, ii));
fprintf(1, '= %d\n', compressed1(ii));
end
% now the inverse operation: using dec2bin to turn decimals into binary
% this function returns strings, so some further processing is needed
% original code used de2bi (no typo) but that requires a communications toolbox
% like this the code is more portable
decompressed = dec2bin(compressed1);
disp('the string representation of the numbers recovered:');
disp(decompressed); % this looks a lot like the data in groups of 8, but it's a string
% now we turn them back into the original array
% remember it is a string right now, and the values are stored
% in column-major order so we need to transpose
recovered = ('1'==decompressed'); % all '1' characters become logical 1
display(recovered);
% alternative solution #1: use logical array
compressed2 = (n1matrix==1);
display(compressed2);
recovered = double(compressed2); % looks just the same...
% other suggestions 1: use find
compressed3 = find(n1matrix); % fewer elements, but each element is 8 bytes
compressed3b = uint8(compressed3); % only valid if the array has fewer than 256 elements
% or use `sparse`
compressed4 = sparse(n1matrix);
% or use logical sparse:
compressed5 = sparse((n1matrix==1));
whos number_1 comp*
the original data in 13 rows of 10:
0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 0 0 0 0
0 1 1 0 1 1 0 0 0 0
0 1 1 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 1 1 1 1 1 1 1 1 0
0 0 0 0 0 0 0 0 0 0
Organizing into groups of 8, and their decimal representation:
0 0 0 0 0 0 0 0 = 0
0 0 0 0 1 1 1 1 = 15
0 0 0 0 0 1 1 0 = 6
1 1 0 0 0 0 0 1 = 193
1 0 1 1 0 0 0 0 = 176
0 0 0 0 1 1 0 0 = 12
0 0 0 0 0 0 1 1 = 3
0 0 0 0 0 0 0 0 = 0
1 1 0 0 0 0 0 0 = 192
0 0 1 1 0 0 0 0 = 48
0 0 0 0 1 1 0 0 = 12
0 0 0 0 0 0 1 1 = 3
0 0 0 0 0 0 0 0 = 0
1 1 0 0 0 0 0 1 = 193
1 1 1 1 1 1 1 0 = 254
0 0 0 0 0 0 0 0 = 0
0 0 0 0 0 0 0 0 = 0
the string representation of the numbers recovered:
00000000
00001111
00000110
11000001
10110000
00001100
00000011
00000000
11000000
00110000
00001100
00000011
00000000
11000001
11111110
00000000
00000000
compressed2 =
0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 0 0 0 0
0 1 1 0 1 1 0 0 0 0
0 1 1 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 1 1 1 1 1 1 1 1 0
0 0 0 0 0 0 0 0 0 0
recovered =
0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 0 0 0 0
0 1 1 0 1 1 0 0 0 0
0 1 1 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 1 1 1 1 1 1 1 1 0
0 0 0 0 0 0 0 0 0 0
Name              Size     Bytes  Class      Attributes
compressed1       1x17        17  uint8
compressed2      13x10       130  logical
compressed3      34x1        272  double
compressed3b     34x1         34  uint8
compressed4      13x10       632  double     sparse
compressed5      13x10       394  logical    sparse
number_1          1x130     1040  double
As you can see, the original array takes 1040 bytes; the compressed array takes 17. You get almost 64x compression (not quite, because 130 is not a multiple of 8, so the last byte is padded); only a very sparse dataset would be better compressed by some other means. The only thing that gets close (and that is super fast) is
compressed3b = uint8(find(number_1));
At 34 bytes, it is definitely a contender for small arrays (< 256 elements).
Note - when you save data in Matlab (using save(fileName, 'variableName')), some compression happens automatically. This leads to an interesting and surprising result. When you take each of the above variables and save them to file using Matlab's save, the file sizes in bytes become:
number_1 195
compressed1 202
compressed2 213
compressed3 219
compressed3b 222
compressed4 256
compressed5 252
On the other hand, if you create a binary file yourself using
fid = fopen('myFile.bin', 'wb');
fwrite(fid, compressed1)
fclose(fid)
fwrite writes uint8 by default, so the file sizes for number_1, compressed1, compressed2, compressed3 and compressed3b are 130, 17, 130, 34 and 34 bytes respectively; sparse arrays cannot be written this way. The "complicated" bit-packing still gives the best compression.
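To close the loop on the bit-packing approach, here is a minimal round-trip sketch. The vector bits is a made-up example (not the asker's data), chosen so that every packed byte is below 128. In that case dec2bin without an explicit width would return fewer than 8 characters per row and the recovered bits would be misaligned, so the sketch passes a width of 8. The original length n has to be stored alongside the packed bytes so the padding can be stripped.

```matlab
% Hypothetical 19-element 0/1 vector (illustrative, not the data above)
bits = [0 1 1 0 1 1 0 0  0 0 0 0 1 1 0 0  0 0 1];
n    = numel(bits);                  % must be stored with the packed data

% --- pack: 8 bits per byte, most significant bit first ---
nc     = ceil(n/8);                  % number of bytes needed
padded = zeros(8, nc);
padded(1:n) = bits;                  % column-major fill, the tail stays zero
packed = uint8(2.^(7:-1:0) * padded);

% --- unpack: force 8 characters per byte so leading zeros survive ---
chars    = dec2bin(double(packed), 8);   % nc-by-8 char array
bitGrid  = (chars.' == '1');             % 8-by-nc logical, one column per byte
restored = double(bitGrid(1:n));         % drop the padding bits

disp(isequal(restored, bits));           % round-trip check
```

Reshaping restored back to a matrix needs the original dimensions as well, so in practice one would save size(originalMatrix) rather than just the element count.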

First of all, you can use the find function to get all non-zero indices of your array, instead of doing it manually. More info here: http://www.mathworks.com/help/matlab/ref/find.html
Anyway, you will need not only the index vector returned by find but also the original size. So whenever you pass the indices on to something else, you must also pass length(number_1). This is because the indices alone do not tell you how many 0s there were after the last 1. You can work that out by subtracting the last index from the original length (there might be an off-by-one error there).
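As a sketch of that idea, the indices from find plus the stored size are enough to rebuild the array. The matrix number below is a small made-up example, and the choice of integer class by element count is an assumption about what "compact" should mean here, not something from the question.

```matlab
% Hypothetical small 0/1 matrix (illustrative only)
number = [0 0 1 1 0;
          0 1 0 0 0;
          0 0 0 0 0];

% --- compress: linear indices of the ones, plus the original size ---
idx = find(number);
sz  = size(number);
if numel(number) <= 255
    idx = uint8(idx);        % 1 byte per index is enough
elseif numel(number) <= 65535
    idx = uint16(idx);       % 2 bytes per index
else
    idx = uint32(idx);       % 4 bytes per index
end

% --- decompress: zeros of the stored size, then set the ones ---
restored = zeros(sz);
restored(double(idx)) = 1;

disp(isequal(restored, number));
```

This only pays off when the ones are sparse; for a dense 0/1 array, one bit per element is smaller than one index per nonzero.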

Related

Matlab Equivalent of sortrows for columns

To sort a matrix according to all columns except the first, I used the following code. I do not want sortrows to consider the first column because that is meant to keep track of the row numbers.
B = [1 1 0 0 0 0 0 0 0 1
2 0 1 0 0 0 0 1 0 0
3 0 0 1 0 1 0 0 1 0
4 0 1 0 0 0 1 1 0 0
5 0 0 1 0 0 0 0 1 0
6 0 0 0 0 0 1 1 0 0
7 1 0 0 1 0 0 0 0 0
8 0 0 1 0 1 0 0 0 0];
D = -sortrows(-B,[2:size(B,2)])
What if you want to sort the matrix according to all rows except the first, so the first element of each column would be ignored when sorting them in descending order? Is there any similar function to sortrows?
To clarify, the desired output is
1 0 0 0 0 0 0 1 0 1
2 1 1 0 0 0 0 0 0 0
3 0 0 1 1 1 0 0 0 0
4 1 1 0 0 0 1 0 0 0
5 0 0 1 1 0 0 0 0 0
6 1 0 0 0 0 1 0 0 0
7 0 0 0 0 0 0 1 1 0
8 0 0 1 0 1 0 0 0 0
You can do this by:
- transposing the input and output
- keeping column 1 separate
- using negative sort indices, which avoids negating the input and output as you did
A = [B(:,1) sortrows( B(:,2:end).', -(2:size(B,1)) ).'];
>> A
A =
1 0 0 0 0 0 0 1 0 1
2 1 1 0 0 0 0 0 0 0
3 0 0 1 1 1 0 0 0 0
4 1 1 0 0 0 1 0 0 0
5 0 0 1 1 0 0 0 0 0
6 1 0 0 0 0 1 0 0 0
7 0 0 0 0 0 0 1 1 0
8 0 0 1 0 1 0 0 0 0
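If the column permutation itself is needed (for example to reorder another matrix the same way), a variant of the one-liner above is to take the second output of sortrows. This is a sketch under the same reading of the question: column 1 holds row numbers, and row 1 of each remaining column is ignored as a sort key. The names keys and order are introduced here for illustration.

```matlab
B = [1 1 0 0 0 0 0 0 0 1
     2 0 1 0 0 0 0 1 0 0
     3 0 0 1 0 1 0 0 1 0
     4 0 1 0 0 0 1 1 0 0
     5 0 0 1 0 0 0 0 1 0
     6 0 0 0 0 0 1 1 0 0
     7 1 0 0 1 0 0 0 0 0
     8 0 0 1 0 1 0 0 0 0];

% One row per data column of B, with B's first row left out of the key
keys = B(2:end, 2:end).';

% Negative column numbers request descending order on every key column;
% the second output is the permutation that sorts the rows of keys
[~, order] = sortrows(keys, -(1:size(keys, 2)));

% Apply the permutation to the data columns, keeping column 1 in place
A = [B(:, 1), B(:, 1 + order)];
disp(A);
```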

Advanced Search and Remove in special Matrix

I have this matrix
X= [2 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 1 250;
3 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 250;
2 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 1 250;
3 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 250;
4 0 0 1 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 250;
3 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 250;
2 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 1 250;
4 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 1 1 0 0 0 0 250;
3 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 250;
3 1 1 1 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0 400]
I need to perform three successive operations on this matrix:
1- Search the matrix for the sequence 1 1 0 0 0 and copy the rows that contain it into a new matrix (like row 1).
2- Take the matrix generated in the first step and remove duplicate rows, i.e. rows that have the same numbers in the same positions (like rows 1, 3, 7), keeping only one row of each (for rows 1, 3, 7, keep row 1 and remove the others).
3- Take the matrix generated in the second step, remove any row that contains the sequence 1 1 1 (like row 8), and put the remaining rows in a new matrix.
%Step-1
% Converting the matrix into a string, appending a semi-colon for similarity and removing the brackets from the string
req=mat2str(X); req(end)=';' ; req=req(2:end);
% Searching the sequence: 1 1 0 0 0
sp1=strfind(req, '1 1 0 0 0');
% Storing those rows of X in req matrix which contain the sequence
req=X(unique(ceil([sp1]/(size(req,2)/size(X,1)))),:);
%Step-2
req= unique(req,'rows');
%Step-3
% Converting the matrix into a string, appending a semi-colon for similarity and removing the brackets from the string
reqtemp=mat2str(req); reqtemp(end)=';' ; reqtemp=reqtemp(2:end);
% Searching the sequence: 1 1 1
sp1=strfind(reqtemp, '1 1 1');
% Removing those rows which contain the sequence
req(unique(ceil([sp1]/(size(reqtemp,2)/size(req,1)))),:)=[];
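The string-based search above relies on every row of mat2str(X) having the same number of characters. A purely numeric alternative is a sliding-window comparison, sketched below. Like the string version it searches all columns, including the first and last; whether those should be excluded is not stated in the question. The 'stable' option of unique is used here so that the first occurrence of each duplicate row is kept in its original position, which is one reading of step 2.

```matlab
X = [2 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 1 250;
     3 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 250;
     2 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 1 250;
     3 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 250;
     4 0 0 1 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 250;
     3 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 250;
     2 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 1 250;
     4 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 1 1 0 0 0 0 250;
     3 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 250;
     3 1 1 1 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0 400];

% Step 1: keep rows containing 1 1 0 0 0 at any column offset
pat1 = [1 1 0 0 0];
keep = false(size(X, 1), 1);
for s = 1:(size(X, 2) - numel(pat1) + 1)
    window = X(:, s:s + numel(pat1) - 1);
    keep = keep | all(bsxfun(@eq, window, pat1), 2);
end
step1 = X(keep, :);

% Step 2: drop duplicate rows, keeping the first occurrence of each
step2 = unique(step1, 'rows', 'stable');

% Step 3: drop rows containing 1 1 1 at any column offset
pat2 = [1 1 1];
drop = false(size(step2, 1), 1);
for s = 1:(size(step2, 2) - numel(pat2) + 1)
    window = step2(:, s:s + numel(pat2) - 1);
    drop = drop | all(bsxfun(@eq, window, pat2), 2);
end
result = step2(~drop, :);
disp(result);
```

For this X, rows 1 to 9 pass step 1, four distinct rows survive step 2, and the row corresponding to original row 8 is removed in step 3.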

MATLAB: how to "equally distribute" the Trues in each column of a full lower triangular logical matrix over the columns of m new matrices?

My first question on stackoverflow! The title is vague, so let me elaborate: I have a NxN lower triangular logical matrix
N = 10 % for example
L = tril(true(N),-1)
L =
0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0
1 1 1 1 0 0 0 0 0 0
1 1 1 1 1 0 0 0 0 0
1 1 1 1 1 1 0 0 0 0
1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 0 0
1 1 1 1 1 1 1 1 1 0
with all trues below the diagonal. For m = 2^p, a power of 2, I want to end up with m NxN lower triangular logical matrices L_1, ..., L_m such that each column of L_i contains the i-th 1/m-th share (rounded) of the Trues in the corresponding column of L. One consequence is that \sum_i(L_i) == L again.
For example, for m = 2 I know that
L_2 = L(:,ceil((N:2*N-1)/2))
L_2 =
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 0
L_1 = L - L_2
L_1 =
0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0
1 1 1 1 0 0 0 0 0 0
0 1 1 1 1 0 0 0 0 0
0 0 0 1 1 1 0 0 0 0
0 0 0 0 0 1 1 0 0 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0
will do the trick, but this trick does not generalize to higher powers of 2 for m. Any ideas how to do this reasonably fast for general N and m = 2^p?
(Context: each column of L holds the logical indices for a bisection-type algorithm. Each next power p in m = 2^p corresponds to a deeper level of the bisection algorithm.)

Matlab: keeping non-zero matrix elements adjacent to each other and ignoring lone elements

Here is an example matrix (but the result shouldn't be constrained to only working on this):
a=zeros(7,7);
a(5,3:6)=1;
a(2,2)=1;
a(2,4)=1;
a(7,1:2)=1
a=
0 0 0 0 0 0 0
0 1 0 1 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 1 1 1 1 0
0 0 0 0 0 0 0
1 1 0 0 0 0 0
I want to get rid of all the 1's that are alone (the noise), such that I only have the line of 1's on the fifth row.
rules:
-the 1's are in 'connected lines' if there are adjacent 1's (including diagonally) e.g.:
0 0 0 1 0 0 1 0 1
1 1 1 0 1 0 0 1 0
0 0 0 0 0 1 0 0 0
(The connected lines are what I want to keep. I want to get rid of all the 1's that are not in connected lines, the connected lines can intersect each other)
-the 'connected lines' need to be at least 3 elements long. So in the 7x7 example, there would only be one line that matches this criterion. If a(7,3) were set to 1, then there would also be a connected line at the bottom left
I am currently looking at this through a column by column approach, and here is the first draft of my code so far:
for nnn=2:6
rowPoss=find(a(:,nnn)==1);
rowPoss2=find(a(:,nnn+1)==1);
for nn=1:length(rowPoss)
if myResult(rowPoss(nn)-1:rowPoss(nn)+1,n-1)==0 %
%then?
end
end
end
My difficulty is, during this column by column process, I'd have to enable a way to recognise the beginning of the connected line, the middle of the connected line, and when a connected line ends. The same rules for this, when applied to noise (the lone 1's), would just ignore the lone 1's.
The output I want is basically:
b=
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 1 1 1 1 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
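If you have the Image Processing Toolbox, try bwareaopen, which removes connected components with fewer than the given number of pixels (for 2-D input it uses 8-connectivity by default, so diagonal neighbours count):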
b = bwareaopen(a, 3);
Sample Run #1:
>> a
a =
0 0 0 0 0 0 0
0 1 0 1 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 1 1 1 1 0
0 0 0 0 0 0 0
1 1 0 0 0 0 0
>> b
b =
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 1 1 1 1 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
Sample Run #2:
>> a
a =
0 0 0 0 0 0 0
0 1 0 1 0 0 0
0 0 1 0 0 0 0
0 0 0 0 0 0 0
0 0 1 1 1 1 0
0 0 0 0 0 0 0
1 1 0 0 0 0 0
>> b
b =
0 0 0 0 0 0 0
0 1 0 1 0 0 0
0 0 1 0 0 0 0
0 0 0 0 0 0 0
0 0 1 1 1 1 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
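Without the Image Processing Toolbox, the same filtering can be sketched in base MATLAB with an explicit flood fill over the 8-neighbourhood. This is an illustrative substitute, not what bwareaopen does internally; the names minSize, visited, stack and members are introduced here.

```matlab
a = zeros(7, 7);
a(5, 3:6) = 1;
a(2, 2) = 1;
a(2, 4) = 1;
a(7, 1:2) = 1;

minSize = 3;                       % smallest component to keep
[nRows, nCols] = size(a);
visited = false(nRows, nCols);
b = zeros(nRows, nCols);

for start = find(a).'              % linear index of every nonzero pixel
    if visited(start), continue; end
    stack = start;                 % pixels waiting to be expanded
    visited(start) = true;
    members = [];                  % pixels of the current component
    while ~isempty(stack)
        p = stack(end);
        stack(end) = [];
        members(end + 1) = p;      %#ok<AGROW>
        [r, c] = ind2sub([nRows nCols], p);
        for dr = -1:1              % visit the 8 neighbours (and p itself,
            for dc = -1:1          % which is already marked visited)
                rr = r + dr;
                cc = c + dc;
                if rr >= 1 && rr <= nRows && cc >= 1 && cc <= nCols ...
                        && a(rr, cc) ~= 0 && ~visited(rr, cc)
                    visited(rr, cc) = true;
                    stack(end + 1) = sub2ind([nRows nCols], rr, cc); %#ok<AGROW>
                end
            end
        end
    end
    if numel(members) >= minSize   % keep only large enough components
        b(members) = 1;
    end
end
disp(b);
```

As with bwareaopen, this keeps any 8-connected group of at least minSize pixels, so a bent or diagonal group counts just as much as a straight line.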

How to Count Total Number of pixel of padded value used in image(Padded Image)?

I have a binary image, so it has only two values, 0 and 1. I then convert it into a padded image with different values, depending on the curve shapes it contains. I take a 3 x 3 matrix of values, and if I get a curve shape I pad the image with 1 or another number. I use 15 different types of shape values, such as junction point, end point, etc.
I then assign the values 1 to 15, i.e. the appropriate number according to the shape. As a result, I get an image like:
Figure
0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 1 1 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0
I would like to count how many 1s there are in the image, then how many 2s, 3s, etc., up to 15. For example, as shown in the figure, if the pad number is 5, the total number of pixels is 3. If the pad number is 1, the total number of pixels is 6.
Use histc:
>> im = [ 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 5 0 0 0 0 0 0 0 1 1 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 ]; %// data
>> values = 1:15; %// possible values
>> count = histc(im(:), values)
count =
6 %// number of 1's
0 %// number of 2's, etc
0
0
3
0
0
0
0
0
0
0
0
0
0
Or compute it manually with bsxfun:
>> count = sum(bsxfun(@eq, im(:), values(:).'), 1)
count =
6 0 0 0 3 0 0 0 0 0 0 0 0 0 0
I can also suggest using accumarray:
im = [0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 5 0 0 0 0 0 0 0 1 1 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 ]; %// data - borrowed from Luis Mendo
counts = accumarray(im(:) + 1, 1);
counts(1) = []
counts =
6
0
0
0
3
Note we have to offset by 1 as accumarray starts indexing the output array at 1. Because you want to disregard the 0s, I simply take the counts result and remove the first entry. This result agrees with what you are seeking. The first element is how many 1s we have encountered, which is 6. The last element is how many 5s you have encountered, which is 3. Because 5 is the largest number encountered in your image, we can say that all symbols after 5 (6, 7, 8, ..., 15) have a count of 0.
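If a fixed-length result covering all 15 labels is preferred, accumarray accepts an output size as its third argument, so the trailing zero counts are included explicitly. The sketch below assumes the image contains only integers from 0 to maxLabel; maxLabel is a name introduced here.

```matlab
im = [0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 5 0 0 0 0 0 0 0 1 1 1 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0];

maxLabel = 15;                                   % largest possible label

% Bin k+1 counts the pixels with value k; the size argument forces
% maxLabel+1 bins even if the larger labels never occur
counts = accumarray(im(:) + 1, 1, [maxLabel + 1, 1]);
counts(1) = [];                                  % discard the count of zeros

disp(counts.');
```

After this, counts(k) is the number of pixels with value k for every k from 1 to 15.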