Read block of data into matlab array - matlab

I have a data file that looks like the following:
3 1.0 1.4 1.7
2 1.2 1.5
1 1.1
2 1.1 1.2
For each line, the first integer indicates the number of floating-point numbers on that line.
Now I want to load all the data into a single MATLAB array and ignore the first column; that is, I want to get an array like this:
>>arr = [1.0, 1.4, 1.7, 1.2, 1.5, 1.1, 1.1, 1.2]
If every line had the same number of floating-point numbers, I could simply do it like this:
>>arr = load('datafile') ;
>>arr = arr(:,2:end) ; %ignore the first column
>>arr = arr(:) ;
However, if the lines have different numbers of floating-point numbers, it seems we cannot load the file directly into a matrix. Is there any simple way to accomplish this?
Thank you.

First, let's read the numbers as strings:
C = textread('myfile.txt', '%s', 'delimiter', '\n');
The result is a cell-array of strings, so let's apply str2num on each cell to obtain numerical values:
C = cellfun(@str2num, C, 'Uniform', false);
Now let's discard the first element from each cell:
C = cellfun(@(x)x(2:end), C, 'Uniform', false);
Finally, we concatenate all values into one vector:
arr = [C{:}]
This is the complete code:
C = textread('test.txt', '%s', 'delimiter', '\n'); %// Read data
C = cellfun(@str2num, C, 'Uniform', false); %// Convert to numbers
C = cellfun(@(x)x(2:end), C, 'Uniform', false); %// Remove first values
arr = [C{:}]
arr =
1.0000 1.4000 1.7000 1.2000 1.5000 1.1000 1.1000 1.2000
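A variant that uses the leading counts instead of string conversion can be sketched as follows. It assumes the whole file has already been read into one flat numeric vector (for example with fscanf and '%f'); here a literal vector named flat stands in for the file contents, and the names flat, keep and pos are made up for this illustration.

```matlab
% Stand-in for: fid = fopen('datafile'); flat = fscanf(fid, '%f'); fclose(fid);
flat = [3 1.0 1.4 1.7 2 1.2 1.5 1 1.1 2 1.1 1.2].';

keep = true(size(flat));   % marks entries that are data, not counts
pos = 1;                   % index of the current count entry
while pos <= numel(flat)
    keep(pos) = false;           % this entry is a count, so drop it
    pos = pos + flat(pos) + 1;   % skip over the values it announces
end
arr = flat(keep).';        % row vector of the remaining values
```

This avoids str2num, at the cost of trusting that every count in the file is correct.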

An easy way to do this would be to just read the file line by line:
fid = fopen('data.txt');
arr = [];
tline = fgetl(fid);
while ischar(tline) % fgetl returns -1 (not a char) at end of file
    temp = str2num(tline);
    arr = [arr temp(2:end)]; % drop the leading count
    tline = fgetl(fid);
end
fclose(fid);
You might also try using the loadcell function, though I didn't try it so I'm not positive it will work for you.

Related

How to convert an array of doubles to a cell array of strings with the same length each?

I would like to insert a space before each positive value in a matrix.
I start with:
A =
1.0000 -0.2176 0.3766
-0.2176 1.0000 0.3898
-0.3766 0.3898 1.0000
I apply a function to each value of A:
B = arrayfun(@(x) num2str(x,'% 5.2f'),A,'UniformOutput',0)
And the output is this:
B =
'1.00' '-0.22' '0.38'
'-0.22' '1.00' '0.39'
'-0.38' '0.39' '1.00'
However, I would like the output to be:
B =
' 1.00' '-0.22' ' 0.38'
'-0.22' ' 1.00' ' 0.39'
'-0.38' ' 0.39' ' 1.00'
Notice that each cell has the same width (5 characters), regardless of whether the number is positive or negative.
Thank you!
Insert the plus for equal length, then replace it with a blank
B = arrayfun(@(x) strrep(num2str(x,'%+5.2f'),'+',' '),A,'Uni',false)
If your question is just about equal length, use:
B = arrayfun(@(x) num2str(x,'%+5.2f'),A,'Uni',false)
or
B = arrayfun(@(x) num2str(x,'%05.2f'),A,'Uni',false)
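Another option, sketched here as an assumption rather than something from the answers above, is to call sprintf directly. The output in the question suggests that num2str trims the leading blank produced by the space flag, whereas sprintf returns the formatted text unchanged. The variable widths is only there to check the result.

```matlab
A = [ 1.0000, -0.2176, 0.3766;
     -0.2176,  1.0000, 0.3898;
     -0.3766,  0.3898, 1.0000];

% The space flag in '% 5.2f' reserves the sign position for positive values.
B = arrayfun(@(x) sprintf('% 5.2f', x), A, 'UniformOutput', false);

% Every entry should now be 5 characters wide.
widths = cellfun(@numel, B);
```

With this, no '+' has to be inserted and replaced afterwards.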
You could avoid arrayfun and vectorize the conversion by using the precision property of num2str to apply it to the whole matrix directly:
prec = 2
B = mat2cell(num2str(A,'%+5.2f'), ones(size(A,1),1), (prec+3).*ones(size(A,2),1))
B =
'+1.00' '-0.22' '+0.38'
'-0.22' '+1.00' '+0.39'
'-0.38' '+0.39' '+1.00'
Explanation:
%// apply num2str to whole matrix with precision property
charArray = num2str(A,'%+5.2f');
%// reshape resulting char array
B = mat2cell(charArray, [1 1 1], [3+2 3+2 3+2])
%// which is generically
B = mat2cell(charArray, ones(size(A,1),1), (prec+3).*ones(size(A,2),1))
Benchmark:
f1 = @() mat2cell(num2str(A,'%+5.2f'), ones(size(A,1),1), (prec+3).*ones(size(A,2),1));
f2 = @() arrayfun(@(x) num2str(x,'%+5.2f'),A,'Uni',false);
t1 = timeit(f1)
t2 = timeit(f2)
t1 = 0.25875 %// mat2cell
t2 = 4.2812 %// arrayfun
So for a 200x100 matrix, the mat2cell solution is roughly 16 times faster than arrayfun (about 0.26 s versus 4.28 s).

How to convert an {Mx1} cell array of {1xN cell} arrays into a {1xN} cell array of {Mx1 cell} arrays?

Suppose that C is a cell array with shape M × 1 (i.e., size(C) returns [M 1]), and that each element of C is in turn a cell array with shape 1 × N.
I often want to convert such a cell array to a new cell array D having shape 1 × N, with elements being cell arrays with shape M × 1, and such that C{i}{j} equals D{j}{i} for all 0 < i ≤ M, and 0 < j ≤ N.
I use the following monstrosity for this
D = arrayfun(@(j) arrayfun(@(i) C{i}{j}, (1:M)', 'un', 0), 1:N, 'un', 0);
but the need for this operation arises often enough (after all, it's sort of a "cell array transpose") that I thought I'd ask:
is there a more standard way to do this operation?
Note that the desired D is different from
E = cat(2, C{:});
or, equivalently,
E = cat(1, D{:});
The E above is a two-dimensional (M × N) cell array, whereas both C and D are one-dimensional cell arrays of one-dimensional cell arrays. Of course, conversion of E back to either C or D is also another often needed operation (this sort of thing is never-ending with MATLAB), but I'll leave it for another post.
The motivation behind this question goes far beyond the problem described above. It turns out that a huge fraction of my MATLAB code, and an even larger fraction of my MATLAB programming time and effort, are devoted to this essentially unproductive chore of converting data from one format to another. Of course, format conversion is unavoidable when doing any kind of computational work, but with MATLAB I find myself doing it a lot more, or at least having to work a lot harder at it, than when I work in other systems (e.g., Mathematica or Python/NumPy). My hope is that by building up my repertoire of MATLAB "format conversion tricks" I will be able to bring down to a more reasonable level the fraction of my MATLAB programming time that I have to devote to format conversion.
P.S. The following code constructs a C like the one described above, for M = 5 and N = 2.
uc = ['A':'Z'];
randstr = @() uc(randi(26, [1 4]));
M = 5;
rng(0); % keep example reproducible
C = arrayfun(@(i) {randstr() i}, 1:M, 'un', 0)';
% C =
% {1x2 cell}
% {1x2 cell}
% {1x2 cell}
% {1x2 cell}
% {1x2 cell}
% >> cat(1, C{:});
% ans =
% 'VXDX' [1]
% 'QCHO' [2]
% 'YZEZ' [3]
% 'YMUD' [4]
% 'KXUY' [5]
%
N = 2;
D = arrayfun(@(j) arrayfun(@(i) C{i}{j}, (1:M)', 'un', 0), 1:N, 'un', 0);
% D =
% {5x1 cell} {5x1 cell}
Here's a little trick using num2cell, which actually works with cell array inputs – the key is to first expand C into a 5-by-2 cell array (equivalent to cell(5,2)).
% Your example to produce C
uc = ['A':'Z'];
randstr = @() uc(randi(26, [1 4]));
M = 5;
rng(0); % Keep example reproducible
C = arrayfun(@(i) {randstr() i}, 1:M, 'un', 0)';
% D = num2cell(reshape([C{:}],[N M]).',[1 M])
D = num2cell(reshape([C{:}],[size(C{1},2) size(C,1)]).',[1 size(C,1)])
or more simply
D = num2cell(cat(1,C{:}),1)
where D{:} returns:
ans =
'VXDX'
'QCHO'
'YZEZ'
'YMUD'
'KXUY'
ans =
[1]
[2]
[3]
[4]
[5]
The inverse operation to go from D back to C can be accomplished via:
% C = num2cell(reshape([D{:}],[N M]),[M N])
C = num2cell(reshape([D{:}],[size(D{1},1) size(D,2)]),[size(D,2) size(D{1},1)])
or
C = num2cell(cat(2,D{:}),2)
Thus you might be able to create a function like the following that would work in either direction:
transpose_cells = @(C)num2cell(cat(isrow(C)+1,C{:}),isrow(C)+1);
isequal(transpose_cells(transpose_cells(C)),C)
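To see the index swap on something small, here is a sketch with a hand-written 3-by-1 cell array in place of the random one. The contents of C and the name C2 are made up for illustration.

```matlab
C = {{'a', 1}; {'b', 2}; {'c', 3}};    % 3x1 cell of 1x2 cells

% Stack the inner cells into a 3x2 cell array, then split it by column.
D = num2cell(cat(1, C{:}), 1);         % 1x2 cell of 3x1 cells

% Going back: join the columns side by side, then split by row.
C2 = num2cell(cat(2, D{:}), 2);        % 3x1 cell of 1x2 cells

same = isequal(C{3}{2}, D{2}{3});      % C{i}{j} should match D{j}{i}
roundtrip = isequal(C2, C);
```

The intermediate cat(1, C{:}) is the two-dimensional E from the question, so the same two lines also cover conversion between E and either nested form.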

Matlab Adjust in Vectorized Code

A few days ago I posted this question and got the following splendid solution:
fid = fopen('C:\Testes_veloc\test.txt', 'Wt');
fmt = sprintf('%s;%d;%%d;%d;%d;%%d;%%d;%%d;%%.4f \\n',str1,num1,0,2)
[a,b,c,d] = ndgrid(vect2,vect1,day,1:15);
out = sprintf(fmt, [d(:), c(:), b(:), a(:), reshape(permute(MD,[2,1,3,4]),[],1)]');
fprintf(fid, '%s', out);
The variables str1, num1, day, vect1, vect2 and MD are inputs of this function:
str1 is a string 1x1
num1 is an integer 1x1
day is a vector 10x1
vect1 is a vector 7x1
vect2 is a vector 180x1
MD is a 4D matrix (7x180x10x15)
The objective was to have a text file as follows:
result.txt:
RED;12;7;0;2;1;4;7;0.0140
RED;12;7;0;2;2;9;7;0.1484
RED;12;7;0;2;3;7;4;0.1787
RED;12;7;0;2;4;2;6;0.7891
RED;12;7;0;2;5;9;6;0.1160
RED;12;7;0;2;6;9;1;0.9893
However, str1 is not a 1x1 string; it's a vector of names (189000x1) with as many entries as the lines of text I want to write. In other words, instead of only 'RED', I have many different strings. Is it possible to adjust this vectorized code to handle this situation?
I tried this (adding str1(:) to the concatenation), but without success:
fmt = sprintf('%%s;%s;%d;%%d;%d;%d;%%d;%%d;%%d;%%.4f \\n',str1,num1,0,2)
out = sprintf(fmt, [str1 (:), d(:), c(:), b(:), a(:), reshape(permute(MD,[2,1,3,4]),[],1)]');
For example, str(1,:) = 'RED'; str(2,:) = 'FHAW'; str(3,:) = 'KI81'; a cell like this.
It fails to concatenate the strings with the numbers. Does anyone have a solution?
Thanks in advance.
sprintf (like fprintf) fills format fields using the arguments in the order they are provided. If you provide more arguments than the format calls for, these functions repeat the format with the additional arguments:
sprintf('%s %i\n', 'a', 1, 'b', 2, 'c', 3)
% returns
ans =
a 1
b 2
c 3
Using Matlab's cell unravelling technique, you can prepare your arguments first, then pass them to the sprintf:
tmp = {'a', 1, 'b', 2, 'c', 3};
sprintf('%s %i\n', tmp{:})
You can get fancier by concatenating cell arrays:
tmp1 = {'a','b','c'};
tmp2 = [1 2 3];
tmp = [tmp1' num2cell(tmp2')]'
sprintf('%s %i\n', tmp{:})
% Returns
tmp =
'a' 'b' 'c'
[1] [2] [3]
ans =
a 1
b 2
c 3
Note that the layout of tmp is the transpose of the layout in the format. This is because MATLAB stores and expands data in column-major order, so tmp{:} walks down each column in turn to produce the arguments for sprintf.
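The expansion order can be checked directly by flattening the cell array. This is only a small sketch, and the name flat is made up for it.

```matlab
tmp = {'a', 'b', 'c'; 1, 2, 3};   % 2x3 cell: strings on top, numbers below

% tmp(:) lists the elements in the same order that tmp{:} expands them:
% down the first column, then the second, then the third.
flat = tmp(:).';
```

Here flat comes out as {'a', 1, 'b', 2, 'c', 3}, which is exactly the argument order that the '%s %i\n' format expects.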
So, in your scenario, you need to create a large cell array with your arguments, then pass that to sprintf.
fid = fopen('C:\Testes_veloc\test.txt', 'Wt');
fmt = sprintf('%%s;%d;%%d;%d;%d;%%d;%%d;%%d;%%.4f \\n',num1,0,2)
[a,b,c,d] = ndgrid(vect2,vect1,day,1:15);
tmp = [str(:) num2cell([d(:) c(:) b(:) a(:) reshape(permute(MD,[2,1,3,4]),[],1)])]'; % str must be an N-by-1 cell array of strings
fprintf(fid, fmt, tmp{:});
fclose(fid);
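A scaled-down sketch of the same idea, with three made-up rows and a shortened format, shows the shape of the cell array that goes into sprintf. The names names and nums and their values are placeholders, not the real inputs from the question.

```matlab
names = {'RED'; 'FHAW'; 'KI81'};             % N-by-1 cell array of strings
nums  = [1 0.0140; 2 0.1484; 3 0.1787];      % N-by-2 numeric data

% Put each record in one row, then transpose so that each record
% becomes one column and expands in the order the format expects.
tmp = [names num2cell(nums)].';

out = sprintf('%s;%d;%.4f\n', tmp{:});
```

For the real data, the same tmp{:} can be passed to fprintf with the file identifier, as in the code above.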

Matlab: Put each line of a text file in a separate array

I have a file like the following
10158 18227 2055 24478 25532
12936 14953 17522 17616 20898 24993 24996
26375 27950 32700 33099 33496 3663
...
I would like to put each line in an array in order to access elements of each line separately.
I used cell arrays but it seems to create a 1 by 1 array for each cell element:
fid=fopen(filename)
nlines = fskipl(fid, Inf)
frewind(fid);
cells = cell(nlines, 1);
for ii = 1:nlines
cells{ii} = fscanf(fid, '%s', 1);
end
fclose(fid);
When I access cells{ii}, I get all the values in the same element and can't access the individual values in the list.
A shorter solution would be reading the file with textscan:
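fid = fopen(filename, 'r');
lines = textscan(fid, '%s', 'Delimiter', '\n'); % one string per line, wrapped in a 1x1 cell
C = cellfun(@str2num, lines{1}, 'Uniform', false); % convert each line to a numeric row
fclose(fid);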
The resulting cell array C is what you're looking for.
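Once each line is available as a string, the individual values can be indexed per line. The sketch below uses two lines from the question as literal strings instead of reading a file, and swaps str2num for sscanf, which parses the numbers without evaluating the text. The names lines and third are made up here.

```matlab
lines = {'10158 18227 2055 24478 25532';
         '12936 14953 17522 17616 20898 24993 24996'};

% sscanf returns a column vector, so transpose to get one row per line.
C = cellfun(@(s) sscanf(s, '%f').', lines, 'UniformOutput', false);

third = C{2}(3);   % third value on the second line
```

C{ii} is then an ordinary numeric row vector whose length matches the number of values on line ii.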
I think that fscanf(fid, '%s', 1) reads only a single whitespace-delimited field as a string, not the whole line. Reading each line with fgetl and converting it with str2num gives an array of numbers:
for ii = 1:nlines
    cells{ii} = str2num(fgetl(fid));
end

Loading text file in MATLAB?

I have a comma-separated file with 182 rows and 501 columns, of which 500 columns are numeric (features) while the last column contains strings (labels).
Example: 182x501 dimension
1,3,4,6,.........7, ABC
4,5,6,4,.........9, XYZ
3,4,5,3,.........2, ABC
How can I load this file so it will have a data set with a matrix, B, containing the number as my features, and a vector, C, containing the strings as my labels?
d = dataset(B, C);
Build a format specifier for textscan based on the number and types of columns, and have it read the file for you.
nNumberCols = 500;
format = [repmat('%f,', [1 nNumberCols]) '%s'];
fid = fopen(file);
x = textscan(fid, format);
fclose(fid);
B = cat(2, x{1:nNumberCols});
C = x{end};
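The same idea can be tried on the three-row sample without a file, since textscan also accepts text as its first argument. This sketch uses the 'Delimiter' option rather than literal commas in the format, and trims the labels in case the blank after the last comma survives. The names txt and fmt are made up for the example.

```matlab
txt = sprintf('1,3,4,6,7, ABC\n4,5,6,4,9, XYZ\n3,4,5,3,2, ABC\n');

nNumberCols = 5;                              % 500 in the real file
fmt = [repmat('%f', 1, nNumberCols) '%s'];    % five numeric fields, one string
x = textscan(txt, fmt, 'Delimiter', ',');

B = cat(2, x{1:nNumberCols});   % numeric features, one row per line
C = strtrim(x{end});            % labels as a cell array of strings
```

B is then a 3-by-5 matrix and C a 3-by-1 cell array, matching the rows of the sample.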
You could use the textscan function. For example:
fid = fopen('test.dat');
% Read numbers and string into a cell array
data = textscan(fid, '%s %s');
% Then extract the numbers and strings into their own cell arrays
nums = data{1};
str = data{2};
% Convert string of numbers to numbers
for i = 1:length(str)
nums{i} = str2num(nums{i}); %#ok<ST2NM>
end
% Finally, convert cell array of numbers to a matrix
nums = cell2mat(nums);
fclose(fid);
Note that I have made a number of assumptions here, based on the file format you have specified. For example, I assume that there are no spaces after the commas following a number, but that there is a space immediately preceding the string at the end of each line.
You can make the above code more flexible by using a more carefully chosen format specifier (the second argument to textscan). See the section Basic Conversion Specifiers in the textscan documentation.
For example, if you have the following data in a file named data.txt:
1,3,4,6,7, ABC
4,5,6,4,9, XYZ
3,4,5,3,2, ABC
you can read it into a matrix B and a cell array C using the code
N = 5; % Number of numeric data to read
fid = fopen('data.txt');
B = []; C = {};
while ~feof(fid) % repeat until end of file is reached
b = fscanf(fid, '%f,', N); % read N numeric data separated by a comma
c = fscanf(fid, '%s', 1); % read a string
B = [B, b];
C = [C, c];
end
C
B
fclose(fid);
to give
C =
'ABC' 'XYZ' 'ABC'
B =
1 4 3
3 5 4
4 6 5
6 4 3
7 9 2