MATLAB - adding number on text file

I need to use "importdata" to run a script, but my file has more columns at the bottom than at the top, like this:
Example1
2 2 3 2
2 2 1 1
1 0
2 4
1 1 2 200000 80000
It starts with 4 columns and ends with 5, so when I use importdata, it makes a matrix with 4 columns, damaging my file. What I want to do is add any number at the end of the first data row (the second text row), preferably a 0, to make it read my file as a 5-column matrix, like this:
Example1
2 2 3 2 0
2 2 1 1 0
1 0 0 0 0
2 4 0 0 0
1 1 2 200000 80000
The zeros in the other columns are, as I understand it, what importdata fills in when it builds a 5-column matrix, so I don't need to write them myself. How can this be done?

You can use textscan to read in your data. Here's how to read in your file:
fid = fopen('example.txt');
mat = textscan(fid,'%d %d %d %d %d','CollectOutput', 1);
mat = mat{1}; % accesses matrix from cell array
mat(isnan(mat)) = 0; % sets NaN values to 0 (only matters when reading with %f; an integer matrix read with %d cannot hold NaN)
fclose(fid);
And the results:
mat =
2 2 3 2 0
2 2 1 1 0
1 0 0 0 0
2 4 0 0 0
1 1 2 200000 80000
You can then save this to a new file like this (fprintf consumes the matrix column by column, so pass the transpose to write it row by row):
fid = fopen('newfile.txt','w');
fprintf(fid,'%d %d %d %d %d\r\n', mat.');
fclose(fid);
and read it in with importdata.
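If the number of columns is not known in advance, a line-by-line read is one alternative. The following is only a sketch, assuming a file with one text header line as in the question; the temporary file and the variable names (fname, rows, width) are illustrative and not part of the original answer.

```matlab
% Sketch: pad ragged numeric rows with zeros without hard-coding the column count.
% The temporary file below only reproduces the example from the question.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Example1\n2 2 3 2\n2 2 1 1\n1 0\n2 4\n1 1 2 200000 80000\n');
fclose(fid);

fid = fopen(fname, 'r');
fgetl(fid);                           % skip the text header line
rows = {};
txt = fgetl(fid);
while ischar(txt)                     % fgetl returns -1 at end of file
    vals = sscanf(txt, '%f').';       % numbers on this line as a row vector
    if ~isempty(vals)
        rows{end+1} = vals;           %#ok<AGROW>
    end
    txt = fgetl(fid);
end
fclose(fid);
delete(fname);

width = max(cellfun(@numel, rows));   % the widest row decides the column count
mat = zeros(numel(rows), width);
for k = 1:numel(rows)
    mat(k, 1:numel(rows{k})) = rows{k};
end
disp(mat)
```

Because the width is taken from the longest row, the same code also handles files whose last row is not the widest one.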

Related

Is there a fast way to count occurrences of items in a matrix and save them in another matrix without using loops?

I have a time-series matrix X whose first column contains user ID and second column contains the item ID they used at different times:
X=[1 4
2 1
4 2
2 3
3 4
1 1
4 2
5 3
2 1
4 2
5 4];
I want to find out which user used which item how many times, and save it in a matrix Y. The rows of Y represent users in ascending order of ID, and the columns represent items in ascending order of ID:
Y=[1 0 0 1
2 0 1 0
0 0 0 1
0 3 0 0
0 0 1 1]
The code I use to find matrix Y uses 2 for loops which is unwieldy for my large data:
no_of_users = size(unique(X(:,1)),1);
no_of_items = size(unique(X(:,2)),1);
users=unique(X(:,1));
Y=zeros(no_of_users,no_of_items);
for a=1:size(X,1)
for b=1:no_of_users
if X(a,1)==users(b,1)
Y(b,X(a,2)) = Y(b,X(a,2)) + 1;
end
end
end
Is there a more time efficient way to do it?
sparse creates a sparse matrix from row/column indices, conveniently accumulating the number of occurrences if you give a scalar value of 1. Just convert to a full matrix.
Y = full(sparse(X(:,1), X(:,2), 1))
Y =
1 0 0 1
2 0 1 0
0 0 0 1
0 3 0 0
0 0 1 1
But it's probably quicker to just use accumarray as suggested in the comments:
>> Y2 = accumarray(X, 1)
Y2 =
1 0 0 1
2 0 1 0
0 0 0 1
0 3 0 0
0 0 1 1
(In Octave, sparse seems to take about 50% longer than accumarray.)
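Both calls above index Y directly by the ID values, which works when the IDs are small positive integers. If the IDs are large or have gaps, one option is to map them to consecutive indices with the third output of unique first. This is a sketch with made-up IDs, not data from the question:

```matlab
% Sketch: count user/item pairs when the IDs are not 1..N (example data is assumed).
X = [10 4; 20 1; 40 2; 20 3; 10 1];

[users, ~, u] = unique(X(:,1));   % u maps each row to its position in users
[items, ~, v] = unique(X(:,2));   % v maps each row to its position in items

% One count per (user, item) pair, with a fixed output size.
Y = accumarray([u(:) v(:)], 1, [numel(users) numel(items)]);
disp(Y)
```

Row k of Y then belongs to users(k) and column j to items(j), so the original IDs can still be recovered.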

MATLAB generate all ways that n items can be put into m bins?

I want to find all ways that n items can be split among m bins. For example, for n=3 and m=3 the output would be (the order doesn't matter):
[3 0 0
0 3 0
0 0 3
2 1 0
1 2 0
0 1 2
0 2 1
1 0 2
2 0 1
1 1 1]
The algorithm should be as efficient as possible, preferably vectorized/using built-in functions rather than for loops. Thank you!
This should be pretty efficient.
It works by generating all possible splittings of the real interval [0, n] at m−1 integer-valued, possibly coincident split points. The lengths of the resulting subintervals give the solution.
For example, for n=4 and m=3, some of the possible ways to split the interval [0, 4] at m−1 points are:
Split at 0, 0: this gives subintervals of lengths 0, 0, 4.
Split at 0, 1: this gives subintervals of lengths 0, 1, 3.
...
Split at 4, 4: this gives subintervals of lengths 4, 0, 0.
Code:
n = 4; % number of items
m = 3; % number of bins
x = bsxfun(@minus, nchoosek(0:n+m-2,m-1), 0:m-2); % split points
x = [zeros(size(x,1),1) x n*ones(size(x,1),1)]; % add start and end of interval [0, n]
result = diff(x.').'; % compute subinterval lengths
The result is in lexicographical order.
As an example, for n = 4 items in m = 3 bins the output is
result =
0 0 4
0 1 3
0 2 2
0 3 1
0 4 0
1 0 3
1 1 2
1 2 1
1 3 0
2 0 2
2 1 1
2 2 0
3 0 1
3 1 0
4 0 0
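A quick sanity check on this kind of output is that the number of rows should equal the stars-and-bars count nchoosek(n+m-1, m-1) and that every row should sum to n. The sketch below repeats the split-point construction and adds those checks; the check variable names are illustrative.

```matlab
% Sketch: split-point construction plus sanity checks on the result.
n = 4;  % number of items
m = 3;  % number of bins

x = bsxfun(@minus, nchoosek(0:n+m-2, m-1), 0:m-2);   % nondecreasing split points
x = [zeros(size(x,1),1) x n*ones(size(x,1),1)];      % add interval ends 0 and n
result = diff(x, 1, 2);                              % subinterval lengths per row

expectedCount = nchoosek(n+m-1, m-1);                % stars-and-bars count
countOk = size(result, 1) == expectedCount;          % one row per composition
sumsOk  = all(sum(result, 2) == n);                  % every row distributes n items
nonNeg  = all(result(:) >= 0);                       % no negative bin counts
```

For n = 4 and m = 3 the expected count is nchoosek(6, 2) = 15, which matches the 15 rows listed above.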
I'd like to suggest a solution based on an external function and accumarray (it should work starting R2015a because of repelem):
n = uint8(4); % number of items
m = uint8(3); % number of bins
whichBin = VChooseKR(1:m,n).'; % see FEX link below. Transpose saves us a `reshape()` later.
result = accumarray([repelem(1:size(whichBin,2),n).' whichBin(:)],1);
Where VChooseKR(V,K) creates a matrix whose rows are all combinations created by choosing K elements of the vector V with repetitions.
Explanation:
The output of VChooseKR(1:m,n) for m=3 and n=4 is:
1 1 1 1
1 1 1 2
1 1 1 3
1 1 2 2
1 1 2 3
1 1 3 3
1 2 2 2
1 2 2 3
1 2 3 3
1 3 3 3
2 2 2 2
2 2 2 3
2 2 3 3
2 3 3 3
3 3 3 3
All we need to do now is "histcount" the numbers on each row using positive integer bins to get the desired result. The first output row would be [4 0 0] because all 4 elements go in the 1st bin. The second row would be [3 1 0] because 3 elements go in the 1st bin and 1 in the 2nd, etc.
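The per-row counting step can be tried without the external function. The sketch below hard-codes three of the rows listed above (so it does not call VChooseKR) and uses repmat instead of repelem to build the row index; the variable names are illustrative.

```matlab
% Sketch: count how many items land in each bin, row by row.
combos = [1 1 1 1; 1 1 1 2; 1 1 2 3];   % three of the rows listed above
m = 3;                                  % number of bins
[nRows, n] = size(combos);

rowIdx = repmat((1:nRows).', 1, n);     % row number of every entry in combos
counts = accumarray([rowIdx(:) combos(:)], 1, [nRows m]);
disp(counts)
```

Each output row is the bin occupancy for the corresponding input row, for example [2 1 1] for the row 1 1 2 3.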

Transform a matrix to a stacked vector where all zeroes after the last non-zero value per row are removed

I have a matrix with some zero values I want to erase.
a=[ 1 2 3 0 0; 1 0 1 3 2; 0 1 2 5 0]
>>a =
1 2 3 0 0
1 0 1 3 2
0 1 2 5 0
However, I want to erase only the ones after the last non-zero value of each line.
This means that I want to retain 1 2 3 from the first line, 1 0 1 3 2 from the second and 0 1 2 5 from the third.
I want to then store the remaining values in a vector. In the case of the example this would result in the vector
b=[1 2 3 1 0 1 3 2 0 1 2 5]
The only way I figured out involves a for loop that I would like to avoid:
b=[];
for ii=1:size(a,1)
l=max(find(a(ii,:)));
b=[b a(ii,1:l)];
end
Is there a way to vectorize this code?
There are many possible ways to do this, here is my approach:
arotate = a' %// transpose a so that each row of a becomes a column
b=flipud(arotate) %// flip the matrix upside down
c= flipud(cumsum(b,1)) %// cumulative sum down each column, then flip it back
arotate(c==0)=[]
arotate =
1 2 3 1 0 1 3 2 0 1 2 5
=========================EDIT=====================
just realized cumsum can have direction parameter so this should do:
arotate = a'
b = cumsum(arotate,1,'reverse')
arotate(b==0)=[]
This direction parameter was not available on my 2010b version, but should be there for you if you are using 2013a or above.
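One caveat with summing the values themselves is that positive and negative entries can cancel, so a zero cumulative sum does not always mean "only zeros from here on". A sketch that sums the nonzero mask instead is shown below; the example matrix with a negative entry is an assumption, and flipud is used so that it also runs where the 'reverse' option is unavailable.

```matlab
% Sketch: drop trailing zeros per row, robust to negative entries.
a = [1 2 3 0 0; 1 0 1 3 2; 0 1 -1 0 0];   % assumed example with a negative value
at = a.';                                 % work column-wise on the transpose

% Number of nonzeros at or below each position in every column.
remaining = flipud(cumsum(flipud(at ~= 0), 1));
keep = remaining > 0;                     % true up to the last nonzero of each row of a

b = at(keep).';                           % stacked row vector
disp(b)
```

A row that contains only zeros contributes nothing to b with this mask.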
Here's an approach using bsxfun's masking capability -
M = size(a,2); %// Save size parameter
at = a.'; %// Transpose input array, to be used for masked extraction
%// Index IDs of last non-zero for each row when looking from right side
[~,idx] = max(fliplr(a~=0),[],2);
%// Create a mask of elements that are to be picked up in a
%// transposed version of the input array using BSXFUN's broadcasting
out = at(bsxfun(@le,(1:M)',M+1-idx'))
Sample run (to showcase mask usage) -
>> a
a =
1 2 3 0 0
1 0 1 3 2
0 1 2 5 0
>> M = size(a,2);
>> at = a.';
>> [~,idx] = max(fliplr(a~=0),[],2);
>> bsxfun(@le,(1:M)',M+1-idx') %// mask to be used on transposed version
ans =
1 1 1
1 1 1
1 1 1
0 1 1
0 1 0
>> at(bsxfun(@le,(1:M)',M+1-idx')).'
ans =
1 2 3 1 0 1 3 2 0 1 2 5

Matlab input format

I have input files containing data in the following format.
65910/A
22 9 4 2
9 10 4 1
2 5 2 0
4 1 1 0
65910/T
14 7 0 4
8 4 0 2
1 2 0 0
1 1 1 1
.
.
.
I need to read this input, where the first line of each block is a combination of %d and %c with a / in between and the next four lines form a 4x4 integer matrix. I need to perform some work on each matrix and then identify it by its header information.
How can I take this input format in MATLAB?
Since your file contains data that may be considered structured (or "formatted", if using MATLAB's terms), you can use the textscan function to read its contents. The main advantage of this function is that you don't need to specify how many times your "header+data" structure appears - the function just keeps going until it reaches the end of the file.
Given an input file with the following structure (let's call it q35853578.txt):
65910/A
22 9 4 2
9 10 4 1
2 5 2 0
4 1 1 0
65910/T
14 7 0 4
8 4 0 2
1 2 0 0
1 1 1 1
We can write something like this:
function [data,headers] = q35853578(filepath)
%// Default input
if nargin < 1
filepath = 'q35853578.txt';
end
%// Define constants
N_ROWS = 4;
VALS_PER_ROW = 4;
NEWLINE = '\r\n';
%// Read structured file contents
fid = fopen(filepath);
headers = textscan(fid,['%u/%c' repmat([NEWLINE repmat('%u',1,VALS_PER_ROW)],1,N_ROWS)]);
fclose(fid);
%// Parse contents and prepare outputs
data = cell2mat(reshape(cellfun(@(x)reshape(x,1,1,[]),headers(3:end),...
'UniformOutput',false),VALS_PER_ROW,N_ROWS).'); %'
headers = headers(1:2);
%// Output checking
if nargout < 2
warning('Not all outputs assigned, some outputs will not be returned!')
end
%// Debug
clear ans fid N_ROWS NEWLINE VALS_PER_ROW filepath
keyboard; %// For debugging, delete/comment when done.
The resulting output is a 3d array of uint32 (the output class can be changed by adjusting the inputs to textscan, as permitted by formatSpec):
ans(:,:,1) =
22 9 4 2
9 10 4 1
2 5 2 0
4 1 1 0
ans(:,:,2) =
14 7 0 4
8 4 0 2
1 2 0 0
1 1 1 1
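For comparison, the same file layout can also be read block by block with fgetl, sscanf and fscanf, which keeps each header next to its matrix. This is only a sketch under the assumption that every block is a header line followed by exactly 16 integers; the struct field names (id, tag, data) and the temporary file are illustrative.

```matlab
% Sketch: read "header + 4x4 matrix" blocks into a struct array.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '65910/A\n22 9 4 2\n9 10 4 1\n2 5 2 0\n4 1 1 0\n');
fprintf(fid, '65910/T\n14 7 0 4\n8 4 0 2\n1 2 0 0\n1 1 1 1\n');
fclose(fid);

fid = fopen(fname, 'r');
blocks = struct('id', {}, 'tag', {}, 'data', {});
hdr = fgetl(fid);
while ischar(hdr) && ~isempty(strtrim(hdr))
    parts = sscanf(hdr, '%d/%c');        % [numeric id; character code of the tag]
    data = fscanf(fid, '%d', [4 4]).';   % fscanf fills column-wise, so transpose
    fgetl(fid);                          % consume the rest of the last matrix line
    blocks(end+1) = struct('id', parts(1), 'tag', char(parts(2)), 'data', data); %#ok<SAGROW>
    hdr = fgetl(fid);                    % next header, or -1 at end of file
end
fclose(fid);
delete(fname);
```

After the loop, blocks(k).data can be processed and reported together with blocks(k).id and blocks(k).tag.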

MATLAB what lines are different between matrices

I am trying to find the numbers of the rows in which the values of two matrices are not the same.
So far I have only found a way to get the indices of the differing items:
find(a~=b)
where a and b are both N*N.
How can I get the row numbers of the differing items?
P.S.
I am looking for a nicer way than
doing the find and then filling a vector in a loop with
ind2sub(size(A), 6)
You can use max on the logical array of matches (or mismatches, in this case) along a certain dimension, along with find.
If you are looking to find unique row IDs for mismatches, do this -
find(max(a~=b,[],2))
For unique column IDs, just change the dimension specifier -
find(max(a~=b,[],1))
Sample run -
>> a
a =
1 2 2 2 1
1 2 1 1 1
2 2 2 2 1
1 1 2 1 1
>> b
b =
1 2 1 1 2
1 2 1 2 1
2 2 2 2 1
1 1 2 2 2
>> a~=b
ans =
0 0 1 1 1
0 0 0 1 0
0 0 0 0 0
0 0 0 1 1
>> find(max(a~=b,[],2)) %// unique row IDs
ans =
1
2
4
>> find(max(a~=b,[],1)) %// unique col IDs
ans =
3 4 5
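Since a~=b is a logical array, any along a dimension expresses the same idea as max and may read more directly. A sketch using the sample matrices above (the output variable names are illustrative):

```matlab
% Sketch: rows and columns that contain at least one mismatch.
a = [1 2 2 2 1; 1 2 1 1 1; 2 2 2 2 1; 1 1 2 1 1];
b = [1 2 1 1 2; 1 2 1 2 1; 2 2 2 2 1; 1 1 2 2 2];

diffRows = find(any(a ~= b, 2));   % row IDs with a mismatch (column vector)
diffCols = find(any(a ~= b, 1));   % column IDs with a mismatch (row vector)
```

This gives rows 1, 2, 4 and columns 3, 4, 5, the same IDs as in the sample run.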
Here is an easy way I found, in case anyone needs it (ind2sub returns the row subscripts as its first output):
indexs=find(a~=b)
[rows,~]=ind2sub(size(a),indexs)
rows=unique(rows)
Now rows contains only the numbers of the rows that differ.
NotSame = 0;
for ii = 1:size(a,1)
if any(a(ii,:) ~= b(ii,:))
NotSame = NotSame+1;
end
end
This checks it row by row and when a row in a is not the same as the row in b this will increase the count of NotSame. Not the fastest way, I'm sure someone can produce a solution using bsxfun, but I'm not an expert in that.
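Note that an if statement with a non-scalar condition is true only when all of its elements are nonzero, so a per-row test has to say whether it means "any element differs" or "all elements differ". A loop-free sketch of both counts, using the sample matrices from the earlier answer (variable names are illustrative):

```matlab
% Sketch: count differing rows without a loop.
a = [1 2 2 2 1; 1 2 1 1 1; 2 2 2 2 1; 1 1 2 1 1];
b = [1 2 1 1 2; 1 2 1 2 1; 2 2 2 2 1; 1 1 2 2 2];

nAnyDiff = sum(any(a ~= b, 2));   % rows with at least one differing element
nAllDiff = sum(all(a ~= b, 2));   % rows in which every element differs
```

For these matrices three rows differ somewhere, while no row differs in every element.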
You can also use the double output of find
[row, col] = find(a~=b)
myrows = unique(row);
You can also have the columns where a & b have different values
mycols = unique(col);