Is it possible for a structure field to contain a matrix? - matlab

I'm working on an assignment where I have to read a tab-delimited text file, and my output has to be a MATLAB structure.
The contents of the file look like this (It is a bit messy but you get the picture). The actual file contains 500 genes (the rows starting at Analyte 1) and 204 samples (the columns starting at A2)
#1.2
500 204
Name Desc A2 B2 C2 D2 E2 F2 G2 H2
Analyte 1 Analyte 1 978 903 1060 786 736 649 657 733.5
Analyte 2 Analyte 2 995 921 995.5 840 864.5 757 739 852
Analyte 3 Analyte 3 1445.5 1556.5 1579 1147.5 1249 1069.5 1048 1235
Analyte 4 Analyte 4 1550 1371 1449 1127 1196 1337 1167 1359
Analyte 5 Analyte 5 2074 1776 1960 1653 1544 1464 1338 1706
Analyte 6 Analyte 6 2667 2416.5 2601 2257 2258 2144 2173.5 2348
Analyte 7 Analyte 7 3381.5 3013.5 3353 3099.5 2763 2692 2774 2995
My code is as follows:
fid = fopen('gene_expr_500x204.gct', 'r');%Open the given file
% Skip the first line and determine the number or rows and number of samples
dims = textscan(fid, '%d', 2, 'HeaderLines', 1);
ncols = dims{1}(2);
% Now read the variable names
varnames = textscan(fid, '%s', 2 + ncols);
varnames = varnames{1};
% Now create the format spec for your data (2 strings and the rest floats)
spec = ['%s%s', repmat('%f', [1 ncols])];
% Read in all of the data using this custom format specifier. The delimiter will be a tab
data = textscan(fid, spec, 'Delimiter', '\t');
% Place the data into a struct where the variable names are the fieldnames
ge = data{3:ncols+2}
S = struct('gn', data{1}, 'gd', data{2}, 'sid', {varnames});
The part about ge is my current attempt, but it's not really working. Any help would be very appreciated, thank you in advance!!

A struct field can hold any datatype including a multi-dimensional array or matrix.
Your issue is that data{3:ncols+2} creates a comma-separated list. Since you only have one output on the left side of the assignment, ge will only hold the last column's value. You need to use cat to concatenate all of the columns into a big matrix.
ge = cat(2, data{3:end});
% Or you can do this implicitly with []
% ge = [data{3:end}];
Then you can pass this value to the struct constructor
S = struct('gn', data(1), 'gd', data(2), 'sid', {varnames}, 'ge', ge);
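As a minimal sketch of the two points above (the comma-separated list and the struct constructor), here is a tiny stand-in for the textscan output. The values and the two-row size are made up for illustration; only the indexing pattern comes from the answer.

```matlab
% Hypothetical stand-in for the textscan output: 2 text columns, 3 numeric columns
data = {{'Analyte 1'; 'Analyte 2'}, {'Analyte 1'; 'Analyte 2'}, ...
        [978; 995], [903; 921], [1060; 995.5]};
varnames = {'Name'; 'Desc'; 'A2'; 'B2'; 'C2'};

% data{3:end} expands to a comma-separated list; [] concatenates it column-wise
ge = [data{3:end}];

% data(1) and {varnames} are 1x1 cells, so struct() builds a scalar struct
% instead of a struct array with one element per cell entry
S = struct('gn', data(1), 'gd', data(2), 'sid', {varnames}, 'ge', ge);

disp(size(S));      % scalar struct
disp(size(S.ge));   % rows = genes, columns = samples
```

Passing data{1} instead of data(1) would hand struct() a 2x1 cell array, and it would then return a 2x1 struct array, which is usually not what is wanted here.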

Related

MATLAB write string and then table to file

I need to write a string and a table to one text file. The string contains a description of the data in the table, but is not the headers for the table. I am using R2019a which I guess means the "Append" writetable option does not work? Please see my example data below:
% Load data
load cereal.mat;
table = table(Calories, Carbo, Cups, Fat, Fiber, Mfg, Name, Potass);
string = {'This is a string about the cereal table'};
filename = "dummyoutput.sfc";
% How I tried to do this (which does not work!)
fid = fopen(filename, 'w', 'n');
fprintf(fid, '%s', cell2mat(string))
fclose(fid);
writetable(table, filename, 'FileType', 'text', 'WriteVariableNames', 0, 'Delimiter', 'tab', 'WriteMode', 'Append')
I get this error:
Error using writetable (line 155)
Wrong number of arguments.
Does anyone have a suggestion as to how to proceed?
Thanks!
A bit hacky, but here's an idea.
Convert your existing table to a cell array with table2cell.
Prepend a row of cells which consists of your string, followed by empty cells.
Convert the cell array back to a table with cell2table, and write the new table to the file.
load cereal.mat;
table = table(Calories, Carbo, Cups, Fat, Fiber, Mfg, Name, Potass);
s = {'This is a string about the cereal table'};
filename = "dummyoutput.sfc";
new_table = cell2table([[s repmat({''},1,size(table,2)-1)] ; table2cell(table)]);
writetable(new_table,filename,'FileType','text','WriteVariableNames',0,'Delimiter','tab');
>> !head dummyoutput.sfc
This is a string about the cereal table
70 5 0.33 1 10 N 100% Bran 280
120 8 -1 5 2 Q 100% Natural Bran 135
70 7 0.33 1 9 K All-Bran 320
50 8 0.5 0 14 K All-Bran with Extra Fiber 330
110 14 0.75 2 1 R Almond Delight -1
110 10.5 0.75 2 1.5 G Apple Cinnamon Cheerios 70
110 11 1 0 1 K Apple Jacks 30
130 18 0.75 2 2 G Basic 4 100
90 15 0.67 1 4 R Bran Chex 125
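Another possible route, sketched under assumptions: let writetable produce the body first, then rewrite the file with the description line in front. The small table and the file location in tempdir are made up so the snippet runs without cereal.mat. As far as I know, the 'WriteMode' option only became available in releases after R2019a, which would explain the error in the question.

```matlab
% Small made-up table standing in for the cereal data
T = table([70; 120], [5; 8], {'100% Bran'; '100% Natural Bran'}, ...
          'VariableNames', {'Calories', 'Carbo', 'Name'});
header = 'This is a string about the cereal table';
filename = fullfile(tempdir, 'dummyoutput.sfc');

% 1) Let writetable produce the tab-delimited body
writetable(T, filename, 'FileType', 'text', ...
           'WriteVariableNames', false, 'Delimiter', 'tab');
body = fileread(filename);

% 2) Rewrite the file: description line first, then the table text
fid = fopen(filename, 'w');
fprintf(fid, '%s\n', header);
fprintf(fid, '%s', body);
fclose(fid);
```

This keeps the column types of the table untouched, at the cost of writing the file twice.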

Matlab calculating line parameter x for 10 layers in the given wavelength range

I know how to calculate the line parameter, defined as x below, for one layer over the given wavelength range of 50 to 550 um. Now I want to repeat this calculation for all 10 layers. All the other parameters remain constant while the temperature varies from layer 1 to 10. Any suggestion would be greatly appreciated.
wl=[100 200 300 400 500]; %5 wavelengths, 5 spectral lines
br=[0.12 0.56 0.45 0.67 0.89]; % broadening parameter for each wavelength
T=[101 102 103 104 105 106 107 108 109 110];% temperature for 10 layers
wlall=linspace(50,550,40);%all the wavelength in 50um to 550 um range
% x is defined as,
%(br*wl/(br*br + (wlall-wl)^2))*br;
%If I do a calculation for the first line
((br(1)*T(1)*wl(1))./(br(1)*br(1)*(T(1)) + (wlall(:)-wl(1)).^2))*br(1)*T(1)
%Now I'm going to calculate it for all the lines in the first layer
k= repmat(wlall,5,1);
for i=1:5;
kn(i,:)=(br(i)*T(1)* wl(i)./(br(i)*br(i)*T(1) + (k(i,:)- wl(i)).^2))*br(i)*T(1);
end
%Above code gives me x parameter for all the wavelengths in the
%given range( 50 to 550 um) in the first layer, dimension is (5,40)
% I need only the maximum value of each column
an=(kn(:,:)');
[ll,mm]=sort(an,2,'descend');
vn=(ll(:,1))'
%Now my output has the dimension , (1,40) one is for the first layer, 40 is
%for the maximum x parameter corresponding to each wavelength in first layer
%Now I want to calculate the x parameter in all 10 layers,So T should vary
%from T(1) to T(10) and get the
%maximum in each column, so my output should have the dimension ( 10, 40)
You just need to run an extra 'for' loop for each value of 'T'. Here is an example:
clc; close all; clear all;
wl=[100 200 300 400 500]; %5 wavelengths, 5 spectral lines
br=[0.12 0.56 0.45 0.67 0.89]; % broadening parameter for each wavelength
T=[101 102 103 104 105 106 107 108 109 110];% temperature for 10 layers
wlall=linspace(50,550,40);%all the wavelength in 50um to 550 um range
% x is defined as,
%(br*wl/(br*br + (wlall-wl)^2))*br;
%If I do a calculation for the first line
((br(1)*T(1)*wl(1))./(br(1)*br(1)*(T(1)) + (wlall(:)-wl(1)).^2))*br(1)*T(1)
%Now I'm going to calculate it for all the lines in the first layer
k= repmat(wlall,5,1);
for index = 1:numel(T)
for i=1:5
kn(i,:, index)=(br(i)*T(index)* wl(i)./(br(i)*br(i)*T(index) + (k(i,:)- wl(i)).^2))*br(i)*T(index);
end
an(:, :, index) = transpose(kn(:, :, index));
vn(:, index) = max(an(:, :, index), [], 2);
end
vn = transpose(vn);
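If the loops become a bottleneck, the same quantity can probably be computed without them. The following is a sketch that assumes a release with implicit expansion (R2016b or later); the helper names brc, wlc and Tp are mine, not from the question.

```matlab
wl    = [100 200 300 400 500];            % 5 spectral lines
br    = [0.12 0.56 0.45 0.67 0.89];       % broadening parameter per line
T     = 101:110;                          % temperature for 10 layers
wlall = linspace(50, 550, 40);            % 40 wavelengths in the 50 to 550 um range

brc = br(:);                  % 5x1, lines along dimension 1
wlc = wl(:);                  % 5x1
Tp  = reshape(T, 1, 1, []);   % 1x1x10, layers along dimension 3

% kn(i,j,t): line i, wavelength j, layer t  -> 5x40x10
kn = (brc .* Tp .* wlc) ./ (brc.^2 .* Tp + (wlall - wlc).^2) .* brc .* Tp;

% Maximum over the 5 lines, rearranged to layers x wavelengths
vn = permute(max(kn, [], 1), [3 2 1]);
```

The result vn has one row per layer and one column per wavelength, matching the (10, 40) shape asked for in the question.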

How to avoid the loop to reduce the computation time of this code?

How can I avoid the loop to reduce the computation time of this code (one solution from my last question)?
I want to find the column vectors of A(1:3,:) whose corresponding values in M(4,:) are not all contained in one of the vectors of the cell X (and, obviously, are not equal to one of these vectors). I am looking for a fast solution for the case where X is very large.
M = [1007 1007 4044 1007 4044 1007 5002 5002 5002 622 622;
552 552 300 552 300 552 431 431 431 124 124;
2010 2010 1113 2010 1113 2010 1100 1100 1100 88 88;
7 12 25 15 12 30 2 10 55 32 12];
Here I take directly A:
A = [1007 4044 5002 622;
552 300 431 124;
2010 1113 1100 88];
A contains unique column vectors of M(1:3,:)
X = {[2 5 68 44],[2 10 55 9 17],[1 55 6 7 8 9],[32 12]};
[~, ~, subs] = unique(M(1:3,:)','rows');
A4 = accumarray(subs(:),M(4,:).',[],@(x) {x});
%// getting a mask of which columns we want
idxC(length(A4)) = false;
for ii = 1:length(A4)
idxC(ii) = ~any(cellfun(@(x) all(ismember(A4{ii},x)), X));
end
Displaying the columns we want
out = A(:,idxC)
Results:
>> out
out =
1007 4044
552 300
2010 1113
the column vector [5002;431;1100] was eliminated because [2;10;55] is contained in X{2} = [2 10 55 9 17]
the column vector [622;124;88] was eliminated because [32 12] = X{4}
Another example: with the same X
M = [1007 4044 1007 4044 1007 5002 5002 5002 622 622 1007 1007 1007;
552 300 552 300 552 431 431 431 124 124 552 11 11;
2010 1113 2010 1113 2010 1100 1100 1100 88 88 2010 20 20;
12 25 15 12 30 2 10 55 32 12 7 12 7];
X = {[2 5 68 44],[2 10 55 9 17],[1 55 6 7 8 9],[32 12]};
A = [1007 4044 5002 622 1007;
552 300 431 124 11;
2010 1113 1100 88 20];
Results: (with scmg answer)
I get if A sorted according to the first row: (correct result)
out =
1007 1007 4044
11 552 300
20 2010 1113
if I do not sort the matrix A, I get: (false result)
out =
4044 5002 622
300 431 124
1113 1100 88
the column vector A(:,4) = [622;124;88] should be eliminated because [32 12] = X{4}.
the column vector [5002;431;1100] should be eliminated because [2;10;55] is contained in X{2} = [2 10 55 9 17]
Ben Voigt's answer is great, but the line for A4i = A4{ii} is the one causing issues: a for loop iterates over the columns of its argument, so a column vector is processed in a single iteration:
%row vector
for i = 1:3
disp('foo');
end
foo
foo
foo
%column vector
for i = (1:3).'
disp('foo');
end
foo
Just try for A4i = A4{ii}.' instead and it should get your work done!
Now, if we look at the output :
A(:,idxC) =
4044 5002
300 431
1113 1100
As you can see, the final result is not what we expected.
Because unique sorts its output, the subs are numbered not by order of appearance in A, but by order of appearance in C (which is sorted):
subs =
2
2
3
2
3
2
4
4
4
1
1
Therefore you should index into the matrix returned by unique, rather than A, to get your final output.
Enter
[C, ~, subs] = unique(M(1:3,:)','rows');
%% rather than [~, ~, subs] = unique(M(1:3,:)','rows');
Then, to get the final output, enter
>> out = C(idxC,:).'
out =
1007 4044
552 300
2010 1113
In this case, you should not be trying to eliminate loops. The vectorization is actually hurting you badly.
In particular (giving a name to your anonymous lambda)
issubset = @(x) all(ismember(A4{ii},x))
is ridiculously inefficient, because it doesn't short-circuit. Replace that with a loop.
Same for
any(cellfun(issubset, X))
Use an approach similar to this instead:
idxC = true(size(A4));
NX = numel(X);
for ii = 1:length(A4)
for jj = 1:NX
xj = X{jj};
issubset = true;
for A4i=A4{ii}
if ~ismember(A4i, xj)
issubset = false;
break;
end;
end;
if issubset
idxC(ii) = false;
break;
end;
end;
end;
The two break statements, and especially the second one, trigger an early exit that potentially saves you a huge amount of computation.
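For reference, here is a self-contained sketch of the early-exit loop applied to the first example from the question. It is not the answer's exact code: it iterates over a row vector (a for loop walks over columns), indexes the sorted matrix C returned by unique rather than A, and uses the names keep and vals, which are mine.

```matlab
M = [1007 1007 4044 1007 4044 1007 5002 5002 5002 622 622;
      552  552  300  552  300  552  431  431  431 124 124;
     2010 2010 1113 2010 1113 2010 1100 1100 1100  88  88;
        7   12   25   15   12   30    2   10   55  32  12];
X = {[2 5 68 44], [2 10 55 9 17], [1 55 6 7 8 9], [32 12]};

% Unique columns (as sorted rows of C) and the group index of each column of M
[C, ~, subs] = unique(M(1:3,:).', 'rows');
A4 = accumarray(subs(:), M(4,:).', [], @(x) {x});

keep = true(numel(A4), 1);
for ii = 1:numel(A4)
    vals = A4{ii}(:).';          % row vector, so the inner for visits one value at a time
    for jj = 1:numel(X)
        xj = X{jj};
        issubset = true;
        for v = vals
            if ~ismember(v, xj)
                issubset = false;
                break;           % this X{jj} cannot contain the group
            end
        end
        if issubset
            keep(ii) = false;
            break;               % one containing X{jj} is enough to drop the column
        end
    end
end
out = C(keep, :).';
```

For this input, the columns starting with 1007 and 4044 are kept, which agrees with the result shown in the question.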
Shot #1
Listed in this section is an approach that is meant to be a quick and direct way to solve our case. Please note that since A is the matrix of unique columns from M considering up to the third row, it is skipped here as an input because we generate it internally in the solution code. The same holds for the next approach/shot. Here's the implementation -
function out = shot1_func(M,X)
%// Get unique columns and corresponding subscripts
[unqrows, ~, subs_idx] = unique(M(1:3,:)','rows');
unqcols = unqrows.'; %//'
counts = accumarray(subs_idx(:),1); %// Counts of each unique subs_idx
%// Modify each cell of X based on their relevance with the fourth row of M
X1 = cellfun(@(x) subs_idx(ismember(M(4,:),x)),X,'Uni',0);
lensX = cellfun('length',X1); %// Cell element count of X1
Xn = vertcat(X1{:}); %// Numeric array version of X
N = max(subs_idx); %// Number of unique subs_idx
%// Finally, get decision mask to select the correct columns from unqcols
sums = cumsum(bsxfun(@eq,Xn,1:N),1);
cumsums_at_shifts = sums(cumsum(lensX),:);
mask1 = any(bsxfun(@eq,diff(cumsums_at_shifts,[],1),counts(:).'),1); %//'
decision_mask = mask1 | cumsums_at_shifts(1,:) == counts(:).'; %//'
out = unqcols(:,~decision_mask);
return
Shot #2
The earlier mentioned approach might have a bottleneck at :
cellfun(@(x) subs_idx(ismember(M4,x)),X,'Uni',0)
So, alternatively to keep performance as a good motivation, one can separate out the whole process into two stages. The first stage could take care of cells of X that are not repeated in the fourth row of M, which could be implemented with a vectorized approach and another stage solving for the rest of X's cells with our slower cellfun based approach.
Thus, the code would bloat out a bit, but hopefully would be better with performance. The final implementation would look something like this -
%// Get unique columns and corresponding subscripts
[unqrows, ~, subs_idx] = unique(M(1:3,:)','rows')
unqcols = unqrows.' %//'
counts = accumarray(subs_idx,1);
%// Form ID array for X
lX = cellfun('length',X)
X_id = zeros(1,sum(lX))
X_id([1 cumsum(lX(1:end-1)) + 1]) = 1
X_id = cumsum(X_id)
Xr = cellfun(@(x) x(:).',X,'Uni',0); %//'# Convert to cells of row vectors
X1 = [Xr{:}] %// Get numeric array version
%// Detect cells that are to be processed by part1 (vectorized code)
[valid,idx1] = ismember(M(4,:),X1)
p1v = ~ismember(1:max(X_id),unique(X_id(accumarray(idx1(valid).',1)>1))) %//'
X_part1 = Xr(p1v)
X_part2 = Xr(~p1v)
%// Get decision masks from first and second passes and thus the final output
N = size(unqcols,2);
dm1 = first_pass(X_part1,M(4,:),subs_idx,counts,N)
dm2 = second_pass(X_part2,M(4,:),subs_idx,counts)
out = unqcols(:,~dm1 & ~dm2)
Associated functions -
function decision_mask = first_pass(X,M4,subs_idx,counts,N)
lensX = cellfun('length',X)'; %//'# Get X cells lengths
X1 = [X{:}]; %// Extract cell data from X
%// Finally, get the decision mask
vals = changem(X1,subs_idx,M4) .* ismember(X1,M4);
sums = cumsum(bsxfun(@eq,vals(:),1:N),1);
cumsums_at_shifts = sums(cumsum(lensX),:);
mask1 = any(bsxfun(@eq,diff(cumsums_at_shifts,[],1),counts(:).'),1); %//'
decision_mask = mask1 | cumsums_at_shifts(1,:) == counts(:).'; %//'
return
function decision_mask = second_pass(X,M4,subs_idx,counts)
%// Modify each cell of X based on their relevance with the fourth row of M
X1 = cellfun(@(x) subs_idx(ismember(M4,x)),X,'Uni',0);
lensX = cellfun('length',X1); %// Cell element count of X1
Xn = vertcat(X1{:}); %// Numeric array version of X
N = max(subs_idx); %// Number of unique subs_idx
%// Finally, get decision mask to select the correct columns from unqcols
sums = cumsum(bsxfun(@eq,Xn,1:N),1);
cumsums_at_shifts = sums(cumsum(lensX),:);
mask1 = any(bsxfun(@eq,diff(cumsums_at_shifts,[],1),counts(:).'),1); %//'
decision_mask = mask1 | cumsums_at_shifts(1,:) == counts(:).'; %//'
return
Verification
This section lists code to verify the output of the shot #1 code -
%// Setup inputs and output
load('matrice_data.mat'); %// Load input data
X = cellfun(@(x) unique(x).',X,'Uni',0); %// Consider X's unique elements
out = shot1_func(M,X); %// output with Shot#1 function
%// Accumulate fourth row data from M based on the uniqueness from first 3 rows
[unqrows, ~, subs] = unique(M(1:3,:)','rows'); %//'
unqcols = unqrows.'; %//'
M4 = accumarray(subs(:),M(4,:).',[],@(x) {x}); %//'
M4 = cellfun(@(x) unique(x),M4,'Uni',0);
%// Find out cells in M4 that correspond to unique columns unqcols
[unqcols_idx,~] = find(pdist2(unqcols.',out.')==0);
%// Finally, verify output
for ii = 1:numel(unqcols_idx)
for jj = 1:numel(X)
if all(ismember(M4{unqcols_idx(ii)},X{jj}))
error('Error: Wrong output!')
end
end
end
disp('Success!')
Maybe you can use cellfun twice:
idxC = cellfun(@(a) ~any(cellfun(@(x) all(ismember(a,x)), X)), A4, 'un', 0);
idxC = cell2mat(idxC);
out = A(:,idxC)

How to load a text file in Matlab when the number of values in every line are different

I have a non-rectangular text file, A, which has 10 values in the first line, 14 values in the 2nd line, 16 values in the 3rd line and so on. Here is an example of 4 lines of my text file:
line1:
1.68595314026 -1.48498177528 2.39820933342 27 20 15 2 4 62 -487.471069336 -517.781921387 5 96 -524.886108398 -485.697143555
Line2:
1.24980998039 -0.988095104694 1.89048337936 212 209 191 2 1 989 -641.149658203 -249.001220703 3 1036 -608.681762695 -300.815673828
Line3:
8.10434532166 -4.81520080566 4.90576314926 118 115 96 3 0 1703 749.967773438 -754.015136719 1 1359 1276.73632813 -941.855895996 2 1497 1338.98852539 -837.659179688
Line 4:
0.795098006725 -0.98456710577 1.89322447777 213 200 68 5 0 1438 -1386.39111328 -747.421386719 1 1565 -1153.50915527 -342.951965332 2 1481 -1341.57043457 -519.307800293 3 1920 -1058.8828125 -371.696960449 4 1303 -1466.5802002 -308.764587402
Now, I want to load this text file into a matrix M in Matlab. I tried to use the importdata function to load it
M = importdata('A.txt');
but it loads the file in a rectangular matrix (all rows have same number of columns!!!) which is not right. The expected created matrix size should be like this:
size(M(1,:))= 1 10
size(M(2,:))= 1 14
size(M(3,:))= 1 16
How can I load this text file in a correct way into Matlab?
As @Jens suggested, you should use a cell array. Assuming your file contains only numeric values separated by whitespace, for instance:
1 3 6
7 8 9 12 15
1 2
0 3 7
You can parse it into cell array like this:
% Read full file
str = fileread('A.txt');
% Convert text in a cell array of strings
c = textscan(str, '%s', 'Delimiter', '\n');
c = c{1};
% Convert 'string' elements to 'double'
n = cellfun(@str2num, c, 'UniformOutput', false)
You can then access individual lines like this:
>> n{1}
ans =
1 3 6
>> n{2}
ans =
7 8 9 12 15
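A variation on the same idea, offered as a sketch: read the file line by line and parse each line with sscanf, which avoids str2num (str2num evaluates its input as code). The sample file written to tempdir and its contents are made up so the snippet is self-contained.

```matlab
% Write a small ragged sample file (made-up values)
fname = fullfile(tempdir, 'ragged_example.txt');
fid = fopen(fname, 'w');
fprintf(fid, '1 3 6\n7 8 9 12 15\n1 2\n0 3 7\n');
fclose(fid);

% Parse it into a cell array with one numeric row vector per line
n = {};
fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)
    n{end+1, 1} = sscanf(tline, '%f').';   % whitespace-separated numbers of this line
    tline = fgetl(fid);
end
fclose(fid);

lens = cellfun(@numel, n);   % number of values on each line
```

Each n{k} can have a different length, which is exactly what a plain matrix cannot represent.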

Use textscan to read datablock

How to extract the "mean" and "depth" data like the following of each month?
MEAN, S.D., NO. OF OBSERVATIONS
January February ...
Depth Mean S.D. #Obs Mean S.D. #Obs ...
0 32.92 0.43 9 32.95 0.32 21
10 32.92 0.43 14 33.06 0.37 48
20 32.88 0.46 10 33.06 0.37 50
30 32.90 0.51 9 33.12 0.35 48
50 33.05 0.54 6 33.20 0.42 41
75 33.70 1.11 7 33.53 0.67 37
100 34.77 1 34.47 0.42 10
150
200
July August
Depth Mean S.D. #Obs Mean S.D. #Obs
0 32.76 0.45 18 32.75 0.80 73
10 32.76 0.40 23 32.65 0.92 130
20 32.98 0.53 24 32.84 0.84 121
30 32.99 0.50 24 32.93 0.59 120
50 33.21 0.48 16 33.05 0.47 109
75 33.70 0.77 10 33.41 0.73 80
100 34.72 0.54 3 34.83 0.62 20
150 34.69 1
200
There is an undefined number of spaces between the data values, and an introductory line at the beginning.
Thank you!
Here is an example of how to read a file line by line:
fid = fopen('yourfile.txt');
tline = fgetl(fid);
while ischar(tline)
disp(tline)
tline = fgetl(fid);
end
fclose(fid);
Inside the while loop you'll want to use strtok (or something like it) to break up each line into string tokens delimited by spaces.
Matlab's regexp is powerful for pulling data out of less-structured text. It's really worth getting familiar with regular expressions in general: http://www.mathworks.com/help/techdoc/ref/regexp.html
In this case, you would define the pattern to capture each observation group (Mean SD Obs), e.g.: 32.92 0.43 9
Here I see a pattern for each group of data: each group is preceded by 6 spaces (regular expression = \s{6}), and the 3 data points are separated by fewer than 6 spaces (\s+). The data itself consists of two floats (\d+\.\d+) and one integer (\d+):
So, putting this together, your capture pattern would look something like this (the brackets surround the pattern of data to capture):
expr = '\s{6}(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+)';
We can add names for each token (i.e. each data point to capture in the group) by adding '?<name>' inside the brackets:
expr = '\s{6}(?<mean>\d+\.\d+)\s+(?<sd>\d+\.\d+)\s+(?<obs>\d+)';
Then, just read your file into one string variable 'strFile' and extract the data using this defined pattern:
strFile = fileread('mydata.txt');
[tokens data] = regexp(strFile, expr, 'tokens', 'names');
The variable 'tokens' will then contain a sequence of observation groups and 'data' is a structure with fields .mean .sd and .obs (because these are the token names in 'expr').
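To make the result concrete, here is a small sketch that applies the named-token pattern to a single made-up line instead of a file. The variable sample and the six-space padding built with blanks(6) are assumptions chosen to match the \s{6} prefix in the pattern.

```matlab
% One data row with two observation groups, each preceded by six spaces
sample = sprintf('0%s32.92 0.43 9%s32.95 0.32 21', blanks(6), blanks(6));

expr = '\s{6}(?<mean>\d+\.\d+)\s+(?<sd>\d+\.\d+)\s+(?<obs>\d+)';
[tokens, data] = regexp(sample, expr, 'tokens', 'names');

% data is a struct array with one element per matched group;
% the captured fields are text, so convert them to numbers
means = str2double({data.mean});
obs   = str2double({data.obs});
```

Here tokens holds one cell per group, and means and obs hold the January and February values of that row.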
If you just want to get, for example, the first two columns, then textscan() is a great choice.
fid = fopen('yourfile.txt');
tline = fgetl(fid);
while ischar(tline)
oneCell = textscan(tline, '%n'); % read the whole line, put it into a cell
allTheNums = oneCell{1}; % open up the cell to get at the columns
if numel(allTheNums) >= 2 % skip header lines and rows without a mean value
usefulNums = allTheNums(1:2) % get the first two columns
end
tline = fgetl(fid); % read the next line, otherwise the loop never ends
end
fclose(fid);
textscan automatically splits the text you feed it wherever there is whitespace, so the undefined number of spaces between columns isn't an issue. A line with no numbers gives an empty array, so checking the number of elements before indexing avoids out-of-bounds or bad-data errors.
If you need to programmatically figure out which columns to get, you can scan for the words 'Depth' and 'Mean' to find the indices. Regular expressions might be helpful here, but textscan should work fine too.