I am trying to classify my dataset using its 4th column. If the 4th column of a row is equal to 1, that row should be added to a new matrix called Q1. If it is equal to 2, that row should be added to a matrix called Q2.
My code:
i = input('Enter a start row: ');
j = input('Enter a end row: ');
search = importfiledataset('search-queries-features.csv',i,j);
[n, p] = size(search);
if j>n
disp('Please enter a smaller number!');
end
for s = i:j
class_id = search(s,4);
if class_id == 1
Q1 = search(s,1:4)
elseif class_id ==2
Q2 = search(s,1:4)
end
end
This calculates Q1 and Q2, but each of them is only ever 1x4: whenever a new row matches, the previous Q1 (or Q2) is replaced. I need a matching row to be appended instead, so that Q1 grows to 2x4, 3x4 and so on.
In short, I am trying to divide my dataset into two parts using for loops and if statements.
Dataset:
I need an outcome like:
Q1 = [30 64 1 1
30 62 3 1
30 65 0 1
31 59 2 1
31 65 4 1
33 58 10 1
33 60 0 1
34 58 30 1
34 60 1 1
34 61 10 1]
Q2 = [34 59 0 2
34 66 9 2]
How can I prevent my code from deleting previous rows of Q1 and Q2 and obtain the entire matrices?
The main problem in your calculation is that you overwrite Q1 and Q2 each loop iteration. Best solution: get rid of the loops and use logical indexing.
You can use logical indexing to quickly determine where a column is equal to 1 or 2:
search = [
30 64 1 1
30 62 3 1
30 65 0 1
31 59 2 1
31 65 4 1
33 58 10 1
33 60 0 1
34 59 0 2
34 66 9 2
34 58 30 1
34 60 1 1
34 61 10 1
];
Q1 = search(search(:,4)==1,:) % == compares each entry in the fourth column to 1
Q2 = search(search(:,4)==2,:)
Q1 =
30 64 1 1
30 62 3 1
30 65 0 1
31 59 2 1
31 65 4 1
33 58 10 1
33 60 0 1
34 58 30 1
34 60 1 1
34 61 10 1
Q2 =
34 59 0 2
34 66 9 2
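If you only want rows i through j, as in the question, the same mask can be applied to that slice. A minimal sketch, assuming search holds the whole dataset; the values i = 3 and j = 9 are hypothetical stand-ins for the user input, and the matrix is typed in directly instead of being read from the CSV file:

```matlab
search = [30 64 1 1; 30 62 3 1; 30 65 0 1; 31 59 2 1; 31 65 4 1; 33 58 10 1; ...
          33 60 0 1; 34 59 0 2; 34 66 9 2; 34 58 30 1; 34 60 1 1; 34 61 10 1];
i = 3;                          % hypothetical start row
j = 9;                          % hypothetical end row
block = search(i:j, :);         % keep only the requested rows
isClass1 = block(:, 4) == 1;    % logical column: true where the label is 1
isClass2 = block(:, 4) == 2;    % logical column: true where the label is 2
Q1 = block(isClass1, :);        % all rows of the slice with label 1
Q2 = block(isClass2, :);        % all rows of the slice with label 2
```

Storing the masks in their own variables is optional; it just makes it easier to inspect which rows were selected.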
Warning: Slow solution!
If you are hell bent on using loops, make sure to not overwrite your variables. Either extend them each iteration (which is very, very slow):
Q1=[];
Q2=[];
for ii = 1:size(search,1) % loop over all rows
if search(ii,4)==1
Q1 = [Q1;search(ii,:)];
end
if search(ii,4)==2
Q2 = [Q2;search(ii,:)];
end
end
MATLAB will put orange wiggles beneath Q1 and Q2, because it's a bad idea to grow arrays in-place. Alternatively, you can preallocate them as large as search and strip off the excess:
Q1 = zeros(size(search)); % Initialise to be as large as search
Q2 = zeros(size(search));
Q1kk = 1; % Initialise counters
Q2kk = 1;
for ii = 1:size(search,1) % loop over all rows
if search(ii,4)==1
Q1(Q1kk,:) = search(ii,:); % store
Q1kk = Q1kk + 1; % Increase row counter
end
if search(ii,4)==2
Q2(Q2kk,:) = search(ii,:);
Q2kk = Q2kk + 1;
end
end
Q1 = Q1(1:Q1kk-1,:); % strip off excess rows
Q2 = Q2(1:Q2kk-1,:);
Another option using accumarray, if Q is your original matrix:
Q = accumarray(Q(:,4),1:size(Q,1),[],@(x){Q(x,:)});
You can access the result with Q{1} (for class_id = 1), Q{2} (for class_id = 2) and so on...
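One thing to watch: when the labels in column 4 are not sorted, accumarray does not promise the order in which row indices reach the function, so rows inside a group may come out shuffled. A small sketch that sorts the indices inside the anonymous function to keep the original row order. It assumes the labels are positive integers; the five-row Q and the names rowIdx and groups are made up for illustration:

```matlab
% A few rows of the dataset with the class labels deliberately interleaved
Q = [30 64 1 1; 34 59 0 2; 30 62 3 1; 34 66 9 2; 31 59 2 1];
rowIdx = (1:size(Q, 1)).';      % row numbers as a column vector
% Group row numbers by label; sort(x) restores the original row order
groups = accumarray(Q(:, 4), rowIdx, [], @(x) {Q(sort(x), :)});
Q1 = groups{1};                 % rows with label 1
Q2 = groups{2};                 % rows with label 2
```

Writing the result to a new variable instead of overwriting Q keeps the original matrix available for checking.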
I'd like to obtain all unique products for a given vector.
For example, given a:
a = [4,10,12,3,6]
I want to obtain a matrix that contains the results of:
4*10
4*12
4*3
4*6
10*12
10*3
10*6
12*3
12*6
3*6
Is there a short and/or quick way of doing this in MATLAB?
EDIT: a may contain duplicate numbers, giving duplicate products - and these must be kept.
Given:
a =
4 10 12 3 6
Construct the matrix of all pairwise products:
>> all_products = a .* a.'
all_products =
16 40 48 12 24
40 100 120 30 60
48 120 144 36 72
12 30 36 9 18
24 60 72 18 36
Now, construct a mask to keep only those values below the main diagonal:
>> mask = tril(true(size(all_products)), -1)
mask =
0 0 0 0 0
1 0 0 0 0
1 1 0 0 0
1 1 1 0 0
1 1 1 1 0
and apply the mask to the product matrix:
>> unique_products = all_products(mask)
unique_products =
40
48
12
24
120
30
60
36
72
18
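Note that a .* a.' with a row and a column vector relies on implicit expansion, which arrived in R2016b. On older releases the same matrix can be built as an outer product or with bsxfun. A short sketch of that variant, using the same a as above:

```matlab
a = [4 10 12 3 6];
all_products = a.' * a;                      % outer product: 5-by-5 matrix of pairwise products
same_matrix  = bsxfun(@times, a, a.');       % equivalent element-wise form for older releases
mask = tril(true(size(all_products)), -1);   % true strictly below the main diagonal
unique_products = all_products(mask);        % column-major order, one entry per pair
```

Either form gives the same product matrix for real-valued a, so the masking step is unchanged.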
If you have the Statistics Toolbox, you can abuse pdist, which considers only one of the two possible orders for each pair:
result = pdist(a(:), @times);
One option involves nchoosek, which returns all combinations of k elements of a vector, one combination per row. prod then computes the product along rows or columns:
a = [4,10,12,3,6];
b = nchoosek(a,2);
b = prod(b,2); % 2 = multiply along dimension 2, i.e. the two entries of each row
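Because nchoosek works on positions rather than values, repeated numbers in a give repeated products, which is what the edit to the question asks for. A tiny check with a made-up vector containing a duplicate:

```matlab
a = [2 2 3];                    % hypothetical input with a repeated value
pairs = nchoosek(a, 2);         % one pair per row: [2 2; 2 3; 2 3]
products = prod(pairs, 2);      % multiply the two entries of each row
```

Here products contains 6 twice, once for each of the two 2s paired with 3.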
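Try starting with this. Build the matrix of all pairwise products by multiplying a by its transpose, then have the unique function filter the result. Note that unique also returns the squares from the diagonal and drops repeated products, so this does not keep the duplicates the edit asks for.
b = unique(a'*a)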
I want to implement JPEG compression in MATLAB. At the point where the symbols' probabilities are calculated (for Huffman coding), I see some negative values, which cannot be correct. If someone can give some help or directions, I would really appreciate it. Thanks in advance. I use MATLAB R2012b. Here is the code:
clc;
clear all;
a = imread('test.png');
b = rgb2gray(a);
b = imresize(b, [256 256]);
b = double(b);
final = zeros(256, 256);
mask = [1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 0
1 1 1 1 1 1 0 0
1 1 1 1 1 0 0 0
1 1 1 1 0 0 0 0
1 1 1 0 0 0 0 0
1 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0];
qv1 = [ 16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99];
t = dctmtx(8);
DCT2D = @(block_struct) t*block_struct.data*t';
msk = @(block_struct) mask.*block_struct.data;
for row = 1:8:256
for column = 1:8:256
x = (b(row:row+7, column:column+7));
xf = blockproc(x, [8 8], DCT2D);
xf1 = blockproc(xf, [8 8], msk);
xf1 = round(xf1./qv1).*qv1;
final(row:row+7, column:column+7) = xf1;
end
end
[symbols,p] = hist(final,unique(final));
bar(p, symbols);
p = p/sum(p); %NEGATIVE VALUES????
I think you might have the outputs of hist (symbols and p) swapped. The probability should be calculated from the bin counts, which is the first output of hist.
[nelements,centers] = hist(data,xvalues) returns an additional row vector, centers, indicating the location of each bin center on the x-axis. To plot the histogram, you can use bar(centers,nelements).
In other words, instead of your current line,
[symbols,p] = hist(final,unique(final));
just use,
[p,symbols] = hist(final,unique(final));
Also, final is a matrix rather than a vector, so nelements will be a matrix:
If data is a matrix, then a histogram is created separately for each column. Each histogram plot is displayed on the same figure with a different color.
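To sidestep both issues, one option is to flatten the matrix and count exact values instead of calling hist. A minimal sketch that swaps in unique plus accumarray for the histogram step; the 3-by-3 final below is a made-up stand-in for the quantised DCT coefficients, and counts is a name introduced here:

```matlab
final = [0 16 16; -16 0 0; 0 0 32];        % hypothetical quantised coefficients
[symbols, ~, idx] = unique(final(:));      % distinct values and, per element, which one it is
counts = accumarray(idx, 1);               % number of occurrences of each symbol
p = counts / sum(counts);                  % probabilities: non-negative, sum to 1
```

The symbols themselves can still be negative, since DCT coefficients are signed; only the counts and probabilities have to be non-negative.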
If I have a matrix
F=[ 24 3 17 1;
28 31 19 1;
24 13 25 2;
47 43 39 1;
56 41 39 2];
The first three columns hold feature values and the fourth column holds class labels. My problem is to get rid of rows whose feature value also occurs, in the same column, in a row with a different class label.
For the F matrix above I have to remove rows 1, 3, 4 and 5: in the first column the value 24 appears with two different labels in column four, and the same is true in the third column (39 and 39), where the class label changes again.
so output should look like
F=[28 31 19 1];
The straightforward approach would be iterating over the columns, counting the number of different classes for each value, and removing the rows for values associated to more than one class.
Example
F = [24 3 17 1; 28 31 19 1; 24 13 25 2; 47 43 39 1; 56 41 39 2];
%// Iterate over columns
for col = 1:size(F, 2) - 1
%// Count number of different classes for each value
[vals, k, idx] = unique(F(:, col));
count = arrayfun(@(x)length(unique(F(F(:, col) == x, end))), vals);
%// Remove values associated to more than one class
F(count(idx) > 1, :) = [];
end
This results in:
F =
28 31 19 1
Another take on the problem, without arrayfun (edited)
F = [24 3 17 1; 28 31 19 1; 24 13 25 2; 47 43 39 1; 56 41 39 2];
Separate both classes:
A1 = F(F(:,4)==1,1:3);
A2 = F(F(:,4)==2,1:3);
Replicate them to a 3D matrix to compare each line of class1 with each line of class2:
B2 = repmat(shiftdim(A2',-1),size(A1,1),1);
B1 = repmat(A1,[1,1,size(A2,1)]);
D4 = squeeze(sum(B1 == B2,2));
Remove the duplicated rows:
A1(logical(sum(D4,2)),:) = [];
A2(logical(sum(D4,1)),:) = [];
Reconstruct the matrix with its class labels:
R = [A1 ones(size(A1,1),1);A2 2*ones(size(A2,1),1)];
I want to calculate a cumulative sum of the values in column 2 of dat.txt below for each string of ones in column 1. The desired output is shown as dat2.txt:
dat.txt dat2.txt
1 20 1 20 20 % 20 + 0
1 22 1 22 42 % 20 + 22
1 20 1 20 62 % 42 + 20
0 11 0 11 11
0 12 0 12 12
1 99 1 99 99 % 99 + 0
1 20 1 20 119 % 20 + 99
1 50 1 50 169 % 50 + 119
Here's my initial attempt:
fid=fopen('dat.txt');
A =textscan(fid,'%f%f');
in =cell2mat(A);
fclose(fid);
i = find(in(2:end,1) == 1 & in(1:end-1,1)==1)+1;
out = in;
cumulative =in;
cumulative(i,2)=cumulative (i-1,2)+ cumulative(i,2);
fid = fopen('dat2.txt','wt');
format short g;
fprintf(fid,'%g\t%g\t%g\n',[out cumulative(:)]');
fclose(fid);
Here's a completely vectorized (albeit somewhat confusing-looking) solution that uses the functions CUMSUM and DIFF along with logical indexing to produce the results you want:
>> data = [1 20;... %# Initial data
1 22;...
1 20;...
0 11;...
0 12;...
1 99;...
1 20;...
1 50];
>> data(:,3) = cumsum(data(:,2)); %# Add a third column containing the
%# cumulative sum of column 2
>> index = (diff([0; data(:,1)]) > 0); %# Find a logical index showing where
%# continuous groups of ones start
>> offset = cumsum(index.*(data(:,3)-data(:,2))); %# An adjustment required to
%# zero the cumulative sum
%# at the start of a group
%# of ones
>> data(:,3) = data(:,3)-offset; %# Apply the offset adjustment
>> index = (data(:,1) == 0); %# Find a logical index showing where
%# the first column is zero
>> data(index,3) = data(index,2) %# For each zero in column 1 set the
%# value in column 3 to be equal to
%# the value in column 2
data =
1 20 20
1 22 42
1 20 62
0 11 11
0 12 12
1 99 99
1 20 119
1 50 169
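One caveat with the offset step above: because offset is itself a cumulative sum, the adjustments from earlier groups pile up once column 1 contains three or more separate groups of ones. For column 1 = [1;0;1;0;1] with column 2 all ones, the last row comes out as -1 instead of 1. A minimal sketch of a variant that adds only the change in offset at each group start; the variable names and the six-row test matrix are made up for illustration:

```matlab
data = [1 1; 0 1; 1 1; 0 1; 1 1; 1 2];   % hypothetical input with three groups of ones
x = data(:, 2);
c = cumsum(x);                           % plain cumulative sum of column 2
isStart = diff([0; data(:, 1)]) > 0;     % true on the first row of each group of ones
base = c(isStart) - x(isStart);          % running total just before each group starts
step = zeros(size(x));
step(isStart) = diff([0; base]);         % how much the offset changes at each start
offset = cumsum(step);                   % offset in force on every row
result = c - offset;                     % cumulative sum restarted at each group
isZero = data(:, 1) == 0;
result(isZero) = x(isZero);              % rows with 0 in column 1 just copy column 2
```

For the eight-row data in the question this gives the same third column as above, since that input has only two groups.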
Not a completely vectorized solution (it loops through the segments of sequential 1s), but it should be faster. For your data the loop runs only twice. It uses MATLAB's CUMSUM function and assumes the data matrix is stored in d.
istart = find(diff([0; d(:,1)])==1); %# start indices of sequential 1s
iend = find(diff([d(:,1); 0])==-1); %# end indices of sequential 1s
dcum = d(:,2);
for ind = 1:numel(istart)
dcum(istart(ind):iend(ind)) = cumsum(dcum(istart(ind):iend(ind)));
end
dlmwrite('dat2.txt',[d dcum],'\t') %# write the tab-delimited file
d=[
1 20
1 22
1 20
0 11
0 12
1 99
1 20
1 50
];
disp(d)
out=d;
%add a column
out(:,3)=0;
csum=0;
for(ind=1:length(d(:,2)))
if(d(ind,1)==0)
csum=0;
out(ind,3)=d(ind,2);
else
csum=csum+d(ind,2);
out(ind,3)=csum;
end
end
disp(out)