Save cell containing different length vectors to text file in MATLAB - matlab

I am trying to save a cell array containing different length column vectors to a text file in MATLAB, but I am a bit stuck.
cell_structure = {[1;2;3;...546] [1;2;3;...800] [1;2;3;...1011] [1;2;3;...1118] [1;2;3;...1678]}
I tried using the following code:
fid = fopen( 'myFile.txt', 'w' ) ;
for cId = 1 : numel( cell_structure )
fprintf( fid, '%f ', cell_structure{cId} ) ;
fprintf( fid, '\r\n' ) ;
end
The problem is when I open the text file the column vectors are saved as row vectors and their length is limited to 545. Any help would be much appreciated.

Your first iteration of the for loop prints the ENTIRE first array, in cell_structure. It doesn't matter whether this array is a row or a column vector, since you're using fprintf(), it's going to print each element out, one after another.
This is a little trickier than I can manage at work right now... but you will need to pad your shorter vectors to match the length of your largest.
Then:
for k = 1:size_of_largest_vector
for j = 1:total_number_of_columns
new_cell{k,j} = cell_structure{j}(k)
end
end
This will give you an array of all your column vectors.
Then, use a space delimited csvwrite() to write the columns.

I used Elijah Rockers idea to pad out the shorter column vectors so they are all the same length. And Crowley pointed out that dmlwrite and cvswrite cannot handle empty cells. I found a function that can handle empty cells:
https://github.com/mkoohafkan/UCBcode-Matlab/blob/master/OtherPeoplesFunctions/dlmcell.m
And here is my code to pad out and the save data.
for k = 1:max(cellfun('length',cell)); %longest column vector length
for j = 1:length(cell); % number of colum vctors
if k > length(cell{j}); % if the index is longer than colum vector length
new_cell{k,j} = []; % pad vector
else
new_cell{k,j} = cell{j}(k); %otherwise add to array
end
end
end
dlmcell('test.txt',new_cell,',')
It works like a charm, all of the column vectors in the cell array are now saved in separate columns of the text file.

Related

Matlab: cell to matrix row

I am trying to convert a text file (comma seperated values) to a matrix in MATLAB. I have been able to fetch the individual lines into cell array but I can't convert those values to a matrix form. The 7X1 cell obtained by reading the line of the file is a row major and has the following values
line =
'123'
''
'"Foo"'
[1X27 char]
'1.01'
'2.02'
'0'
So, the file is basically a collection of similar lines. Any suggestion as to how can I convert these lines to a matrix form ? Any help would be appreciated.
You can use the cell2mat function to create a matrix from your cell (see MATLAB help page for details). As your cell is a column vector and you want a row vector, you will have to transpose the cell first.
row = cell2mat(line.');
If you want to add a space between all elements, here's a way to do this:
% Remove all empty fields
line = line(~cellfun(#isempty,line));
% Add a space after each element
line = strcat(line,{' '});
% Now call cell2mat
row = cell2mat(line.');
% and remove space at end
row = strtrim(row);
You can use regexpi to find quotation marks and remove them.
row = row(regexpi(row,'"')) = '';

Importing text file into matrix form with indexes as strings?

I'm new to Matlab so bear with me. I have a text file in this form :
b0002 b0003 999
b0002 b0004 999
b0002 b0261 800
I need to read this file and convert it into a matrix. The first and second column in the text file are analogous to row and column of a matrix(the indices). I have another text file with a list of all values of 'indices'. So it should be possible to create an empty matrix beforehand.
b0002
b0003
b0004
b0005
b0006
b0007
b0008
Is there anyway to access matrix elements using custom string indices(I doubt it but just wondering)? If not, I'm guessing the only way to do this is to assign the first row and first column the index string values and then assign the third column values based on the first text file. Can anyone help me with that?
You can easily convert those strings to numbers and then use those as indices. For a given string, b0002:
s = 'b0002'
str2num(s(2:end); % output = 2
Furthermore, you can also do this with a char matrix:
t = ['b0002';'b0003';'b0004']
t =
b0002
b0003
b0004
str2num(t(:,2:end))
ans =
2
3
4
First, we use textscan to read the data in as two strings and a float (could use other numerical formats. We have to open the file for reading first.
fid = fopen('myfile.txt');
A = textscan(fid,'%s%s%f');
textscan returns a cell array, so we have to extract your three variables. x and y are converted to single char arrays using cell2mat (works only if all the strings inside are the same length), n is a list of numbers.
x = cell2mat(A{1});
y = cell2mat(A{2});
n = A{3};
We can now convert x and y to numbers by telling it to take every row : but only the second to final part of the row 2:end, e.g 002, 003 , not b002, b003.
x = str2num(x(:,2:end));
y = str2num(y(:,2:end));
Slight problem with indexing - if I have a matrix A and I do this:
A = magic(8);
A([1,5],[3,8])
Then it returns four elements - [1,3],[5,3],[1,8],[5,8] - not two. But what you want is the location in your matrix equivalent to x(1),y(1) to be set to n(1) and so on. To do this, we need to 1) work out the final size of matrix. 2) use sub2ind to calculate the right locations.
% find the size
% if you have a specified size you want the output to be use that instead
xsize = max(x);
ysize = max(y);
% initialise the output matrix - not always necessary but good practice
out = zeros(xsize,ysize);
% turn our x,y into linear indices
ind = sub2ind([xsize,ysize],x,y);
% put our numbers in our output matrix
out(ind) = n;

Count the number of lines starting with particular string - Matlab

I would like to count the number of strings in a text file starting with particular string only
function count = countLines(fname)
fh = fopen(fname, 'rt');
assert(fh ~= -1, 'Could not read: %s', fname);
x = onCleanup(#() fclose(fh));
count = 0;
while ~feof(fh)
count = count + sum( fread( fh, 16384, 'char' ) == char(10) );
end
count = count+1;
end
I am using the above function to count the number of lines in the whole .text file. But now I wanted to find the number of lines starting with only with particular strings (Eg. All the lines starting with a letter 's').
You can try two importdata based approaches here.
Approach #1
str1 = 's'; %// search string
%// Read in text data into a cell array with each cell holding text data from
%// each line of text file
textdata = importdata(fname,'\n') ;
%// Compare each cell's starting n characters for the search string, where
%// n is no. of chars in search string and sum those matches for the final count
count = sum(cellfun(#(x) strncmp(x,str1,numel(str1)),textdata))
Approach #2
%// ..... Get str1 and textdata as shown in Approach #1
%// Convert cell array data into char array
char_textdata = char(textdata)
%// Select the first n columns from char array and compare against search
%// string for all characters match and sum those matches for the final count.
%// Here n is the number of characters in search string
count = sum(all(bsxfun(#eq,char_textdata(:,1:numel(str1)),str1),2))
With both these approaches, you can even specify a char array as search string.
Although I don't get the idea behind your code, so I cannot modify it I use this approach:
fid = fopen(txt);
% Loop through data file until we get a -1 indicating EOF
while(x ~= (-1))
line = fgetl(fid);
r = r+1;
end
r = r-1;
with r being the number of lines in the file. You can easily check the value of line before increasing the counter r to meet your condition first.
Also I don't know what are the performance issues regarding those two methods.

matlab parse file into cell array

I have a file in the following format in matlab:
user_id_a: (item_1,rating),(item_2,rating),...(item_n,rating)
user_id_b: (item_25,rating),(item_50,rating),...(item_x,rating)
....
....
so each line has values separated by a colon where the value to the left of the colon is a number representing user_id and the values to the right are tuples of item_ids (also numbers) and rating (numbers not floats).
I would like to read this data into a matlab cell array or better yet ultimately convert it into a sparse matrix wherein the user_id represents the row index, and the item_id represents the column index and store the corresponding rating in that array index. (This would work as I know a-priori the number of users and items in my universe so ids cannot be greater than that ).
Any help would be appreciated.
I have thus far tried the textscan function as follows:
c = textscan(f,'%d %s','delimiter',':') %this creates two cells one with all the user_ids
%and another with all the remaining string values.
Now if I try to do something like str2mat(c{2}), it works but it stores the '(' and ')' characters also in the matrix. I would like to store a sparse matrix in the fashion that I described above.
I am fairly new to matlab and would appreciate any help regarding this matter.
f = fopen('data.txt','rt'); %// data file. Open as text ('t')
str = textscan(f,'%s'); %// gives a cell which contains a cell array of strings
str = str{1}; %// cell array of strings
r = str(1:2:end);
r = cellfun(#(s) str2num(s(1:end-1)), r); %// rows; numeric vector
pairs = str(2:2:end);
pairs = regexprep(pairs,'[(,)]',' ');
pairs = cellfun(#(s) str2num(s(1:end-1)), pairs, 'uni', 0);
%// pairs; cell array of numeric vectors
cols = cellfun(#(x) x(1:2:end), pairs, 'uni', 0);
%// columns; cell array of numeric vectors
vals = cellfun(#(x) x(2:2:end), pairs, 'uni', 0);
%// values; cell array of numeric vectors
rows = arrayfun(#(n) repmat(r(n),1,numel(cols{n})), 1:numel(r), 'uni', 0);
%// rows repeated to match cols; cell array of numeric vectors
matrix = sparse([rows{:}], [cols{:}], [vals{:}]);
%// concat rows, cols and vals into vectors and use as inputs to sparse
For the example file
1: (1,3),(2,4),(3,5)
10: (1,1),(2,2)
this gives the following sparse matrix:
matrix =
(1,1) 3
(10,1) 1
(1,2) 4
(10,2) 2
(1,3) 5
I think newer versions of Matlab have a stringsplit function that makes this approach overkill, but the following works, if not quickly. It splits the file into userid's and "other stuff" as you show, initializes a large empty matrix, and then iterates through the other stuff, breaking it apart and placing in the correct place in the matrix.
(I Didn't see the previous answer when I opened this for some reason - it is more sophisticated than this one, though this may be a little easier to follow at the expense of slowness). I throw in the \s* into the regex in case the spacing is inconsistent, but otherwise don't perform much in the way of data-sanity-checking. Output is the full array, that you can then turn into a sparse array if desired.
% matlab_test.txt:
% 101: (1,42),(2,65),(5,0)
% 102: (25,78),(50,12),(6,143),(2,123)
% 103: (23,6),(56,3)
clear all;
fclose('all');
% your path will vary, of course
file = '<path>/matlab_test.txt';
f = fopen(file);
c = textscan(f,'%d %s','delimiter',':');
celldisp(c)
uids = c{1}
tuples = c{2}
% These are stated as known
num_users = 3;
num_items = 40;
desired_array = zeros(num_users, num_items);
expression = '\((\d+)\s*,\s*(\d+)\)'
% Assuming length(tuples) == num_users for simplicity
for k = 1:num_users
uid = uids(k)
tokens = regexp(tuples{k}, expression, 'tokens');
for l = 1:length(tokens)
item_id = str2num(tokens{l}{1})
rating = str2num(tokens{l}{2})
desired_array(uid, item_id) = rating;
end
end

regarding saving the numerical and string values altogether

I have a string matrix with N rows and 2 columns. Each cell stores a string. I have another N*1 vector, where each entry is a numerical value.
How can I save these two structures into a single text file row by row.
In other words, each row of the saved text file is composed of three elements, the first two elements come from a row of the string matrix and the third element comes from the corresponding row of that vector.
Thanks.
If I understand correctly, then fake data can be represented as this:
% Both have N=2 rows
strMat1 = {'a','b';'c','d';};
strMat2 = {1;2};
And if you want the output of this data to be a text file with:
ac1
bd2
Then you should do this:
txtOut = [];
if size(strMat1,1) == size(strMat2,1);
for row = 1:size(strMat1,1)
txtOut= [txtOut strMat1{:,row} num2str(strMat2{row}) '\n'];
end
else
disp('Size disagreement')
end
fid=fopen('textData.txt','wt');
fprintf(fid,txtOut)
It checks the vectors to make sure there are the same number of rows and then creates a txtOut string to be passed to a fprintf command.
Hope this helps! If you wanted the output to be spaced differently, just add spaces to the appending line in the form of ' ' .