I have a huge table, data = {1000 x 1000}, of binary data.
The table's variable names are encoded, e.g. D1, D2, ..., DA2, DA3, ..., with their real labels given in a .txt file.
The .txt file also contains some other text, e.g.:
D1: Age
Mean age: 33
Median :
.
.
.
D2: weight
I would just like to pick out these names from the text file and create a table with the real variable names.
Any suggestions?
If there is a fixed number of lines between each of those labels, then you can extract them by reading in the file and looping over the relevant lines. For each label, it is simple to extract the label text with strsplit().
E.g., let's say there are 5 lines between each label:
uselessLines = 5;
% for a plain-text file, importdata should return a cell array with one line per cell
dataLabelsFile = importdata(filename);
% get the total number of lines
numLines = numel(dataLabelsFile);
% pre-allocate a column cell array for the labels, since each label is a string
dataLabels = cell(ceil(numLines/(uselessLines+1)), 1);
% use a separate counting variable
m = 1;
% now, for each label, we add it to the dataLabels matrix
for i=1:(uselessLines+1):numLines
line = strsplit(dataLabelsFile{i}); % by default splits on whitespace
dataLabels(m) = line(2);
m = m + 1;
end
By the end of that loop you should have a variable called dataLabels that holds all of the labels. You can then easily work out which label goes with which set of data, provided they are still in the same order: the index of a label is the same as the index of its data column.
This is a method you could try if the labels are evenly spaced.
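As a rough sketch of that last step, the labels could be assigned to the table by position. The small table and the label list below are made-up stand-ins for the real data, and the sketch assumes the k-th label really does belong to the k-th column. Because labels from a text file may contain spaces or repeat, they are first passed through matlab.lang.makeValidName and matlab.lang.makeUniqueStrings.

```matlab
% Hypothetical stand-ins: a small binary table with encoded names, and the
% labels as they might come out of the loop above.
data = array2table(randi([0 1], 4, 3), 'VariableNames', {'D1', 'D2', 'DA2'});
dataLabels = {'Age'; 'weight'; 'Blood pressure'};

% Labels can contain spaces or repeat, so turn them into valid, unique names.
realNames = matlab.lang.makeValidName(dataLabels);
realNames = matlab.lang.makeUniqueStrings(realNames);

% Assumption: the k-th label belongs to the k-th column of the table.
data.Properties.VariableNames = realNames(:)';
```

After this, data.Age refers to the column that used to be data.D1.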
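However, if the labels are separated by an irregular number of lines, then you probably want to check each line with a regular expression, like the person below me has suggested. In that case you loop over every line (a step of 1 instead of uselessLines+1) and replace the last two lines of the loop with something like this.
...
% keep the line only if it starts with a code such as D1: or DA2:
if ~isempty(regexp(dataLabelsFile{i}, '^[A-Z]+\d+:', 'once'))
dataLabels(m) = line(2);
m = m + 1;
end
...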
That being said, while regular expressions are flexible, if you can get away with replacing one with literally a single function call, it's usually better to do that. How efficient a regex is depends on the skill of the programmer, while built-in functions have generally been tested by some of the better programmers in the world. Additionally, regexes are harder to understand if you ever want to go back and change them.
Of course there are times when Regex's are amazing, I'm just not convinced this is one of those times.
An implementation of the approach in my earlier comment:
fid = fopen(filename);
varNames = cell(0);
proceed = true;
while proceed
line = fgetl(fid);
if ischar(line)
startIdx = regexp(line,'(?<=^[A-Z]*\d*:)\s');
if ~isempty(startIdx)
varNames{end+1} = strtrim(line(startIdx:end)); %#ok<SAGROW>
end
else
proceed = false;
end
end
fclose(fid);
I can't put the resulting varNames in a table for you, since I have a version of Matlab that does not support tables.
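If the order of the labels in the file cannot be trusted to match the table, one option is to capture the code (D1, DA2, ...) together with its label and match on the code. The following is only a sketch: the sample file contents, the column order in encodedNames, and the pattern (a code made of capital letters followed by digits and a colon) are assumptions based on the example in the question.

```matlab
% Write a small sample file (hypothetical contents mirroring the question).
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, 'D1: Age\nMean age: 33\nMedian :\nD2: weight\nDA2: Height\n');
fclose(fid);

% Capture code and label on every line that starts with a code and a colon.
txt = fileread(filename);
tok = regexp(txt, '^([A-Z]+\d+):[ \t]*([^\r\n]*)', 'tokens', 'lineanchors');
codes  = cellfun(@(t) t{1}, tok, 'UniformOutput', false);
labels = cellfun(@(t) strtrim(t{2}), tok, 'UniformOutput', false);
delete(filename);

% Match by code, so the column order need not follow the file order.
encodedNames = {'D2', 'D1', 'DA2'};          % hypothetical column order
[found, idx] = ismember(encodedNames, codes);
realNames = encodedNames;                    % keep the code if no label exists
realNames(found) = labels(idx(found));

% With a table, the names could then be applied along the lines of:
% data.Properties.VariableNames = matlab.lang.makeValidName(realNames);
```

Lines such as "Mean age: 33" are skipped because they do not start with capital letters followed by digits and a colon.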
Related
I am currently working with a script that saves matrices as .mat files from other .mat files. I need to save 96 separate files, so I have a loop that goes through the matrix names. I need the matrices saved with specific titles, which I have stored in cell arrays. However, when I use the save(filename,variable) function, I get an error saying:
Error using save
Must be a text scalar.
Error in File_Creator (line 35)
save(name,fname);
My matrices need to be named 'PHI_Af', 'PHI_Am', and so on until 'SLR_EF' (so every cr value needs to have a matrix with every par value). Here is what I am currently attempting:
cr = {'Af','Am','As','Aw','BS','BW','Cs','Cw','Cf','Ds','Dw','Df','ET','EF'};
par = {'PHI','BLD','KS','LAMBDA','PSIS','SLR'};
underscore = {'_'};
%% i and j are parameters in a loop where i = 1:length(par) and j = 1:length(cr)
%% f is the variable currently storing the matrix
s.(horzcat(par{i},underscore{1},cr{j})) = f;
name = string(strcat(par{i},'_',cr{j},'.mat'));
fname = string(s.(horzcat(par{i},underscore{1},cr{j})));
save(name,fname);
When I replace 'fname' with a generic string e.g. 'f', then the command runs but all the matrices save as the same thing ('f'), which makes it extremely difficult to run them all in the same script later.
I hope somebody can tell me what I'm doing wrong or provide me with a better solution. Please let me know if I can provide any more information.
Thank you
Assuming that the matrix, f, changes in each iteration of the loop (due to some other code you didn't post), it seems like this is all the code you need:
cr = {'Af','Am','As','Aw','BS','BW','Cs','Cw','Cf','Ds','Dw','Df','ET','EF'};
par = {'PHI','BLD','KS','LAMBDA','PSIS','SLR'};
for i = 1:length(par)
for j = 1:length(cr)
% add code here that loads the matrix f
name = [par{i}, '_', cr{j}, '.mat'];
save(name, 'f');
end
end
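The original error comes from passing the matrix contents to save, which expects the names of variables as text. If each file should also contain a variable named after the file (PHI_Af, PHI_Am, ...) rather than a variable called f, one possible approach is the '-struct' option of save, which writes each field of a scalar struct as a separate variable. The sketch below uses shortened lists, a random placeholder matrix, and a temporary folder, all of which are illustrative assumptions.

```matlab
cr  = {'Af', 'Am'};          % shortened lists, for illustration only
par = {'PHI', 'BLD'};
outDir = tempname;           % temporary output folder (assumption)
mkdir(outDir);

for i = 1:numel(par)
    for j = 1:numel(cr)
        f = rand(3);                         % placeholder for the real matrix
        varName = [par{i}, '_', cr{j}];      % e.g. 'PHI_Af'
        s = struct(varName, f);              % one field named after the matrix
        % '-struct' saves every field of s as its own variable in the file
        save(fullfile(outDir, [varName, '.mat']), '-struct', 's');
    end
end

% Loading a file returns a struct whose field carries the chosen name.
loaded = load(fullfile(outDir, 'PHI_Af.mat'));
hasName = isfield(loaded, 'PHI_Af');
rmdir(outDir, 's');                          % clean up the temporary folder
```

Loading all files into one workspace later then gives distinct variables instead of repeated copies of f.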
I have a problem similar to here. However, it doesn't seem that there is a resolution.
My problem is as follows: I need to import some files, for example 5. There are 20 columns in each file, but the number of lines varies. Column 1 is time in terms of crank-angle degrees, and the rest are data.
So my code first imports all of the files, finds the file with the most rows, then creates a multidimensional array with that many rows. The timing is in engine cycles, so I then remove lines from the imported file that go beyond a whole engine cycle. This way, I always have data in terms of X whole engine cycles. Then I just interpolate the data onto the pre-allocated array to get a giant multi-dimensional array for the 5 data files.
However, this always seems to result in the last row of every column of every page being filled with NaNs. Please have a look at the code below. I can't see what I'm doing wrong. Oh, and by the way, as I have been screwed over before, this is NOT homework.
maxlines = 0;
maxcycle = 999;
for i = 1:1
filename = sprintf('C:\\Directory\\%s\\file.out',folder{i});
file = filelines(filename); % Import file clean
lines = size(file,1); % Find number of lines of file
if lines > maxlines
maxlines = lines; % If number of lines in this file is the most, save it
end
lastCAD = file(end,1); % Add simstart to shift the start of the cycle to 0 CAD
lastcycle = fix((lastCAD-simstart)./cycle); % Find number of whole engine cycles
if lastcycle < maxcycle
maxcycle = lastcycle; % Find lowest number of whole engine cycles amongst all designs
end
cols = size(file,2); % Find number of columns in files
end
lastcycleCAD = maxcycle.*cycle+simstart; % Define last CAD of whole cycle that can be used for analysis
% Import files
thermo = zeros(maxlines,cols,designs); % Initialize array to proper size
xq = linspace(simstart,lastcycleCAD,maxlines); % Define the CAD degrees
for i = 1:designs
filename = sprintf('C:\\Directory\\%s\\file.out',folder{i});
file = importthermo(filename, 6, inf); % Import the file clean
[~,lastcycleindex] = min(abs(file(:,1)-lastcycleCAD)); % Find index of end of last whole cycle
file = file(1:lastcycleindex,:); % Remove all CAD after that
thermo(:,1,i) = xq;
for j = 2:17
thermo(:,j,i) = interp1(file(:,1),file(:,j),xq);
end
sprintf('file from folder %s imported OK',folder{i})
end
thermo(end,:,:) = []; % Remove NaN row
Thank you very much for your help!
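Are you sampling outside the range of the data? interp1 returns NaN for query points that lie outside the range of the sample points, so if that is the case you need to tell interp1 that you want extrapolation: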
interp1(file(:,1),file(:,j),xq,'linear','extrap');
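A small made-up example may make the effect easier to see. The numbers below are not from the question; they only mimic the situation where the last query point lies slightly beyond the last sample, which can happen when the retained sample closest to lastcycleCAD is a little smaller than lastcycleCAD itself.

```matlab
x  = 0:0.5:9.5;              % sample points end at 9.5
y  = x.^2;
xq = linspace(0, 10, 5);     % query points 0, 2.5, 5, 7.5, 10

yDefault = interp1(x, y, xq);                      % last value is NaN
yExtrap  = interp1(x, y, xq, 'linear', 'extrap');  % last value is extrapolated

% Alternative without extrapolation: keep the queries inside the data range.
xqClamped = min(max(xq, x(1)), x(end));
yClamped  = interp1(x, y, xqClamped);
```

Whether extrapolating or clamping is more appropriate depends on how far outside the range the query points fall.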
I am working with a 400x1200 imported table (readtable generated from an .xls) which contains strings, doubles, dates, and NaNs. Each column is typed consistently. I am looking for a way to locate all instances in the table of any given string ('Help me please') and replace them all with a double (1). Doing this in Matlab will save me loads of work making changes to the approach used on the rest of this project.
Unfortunately, all of the options I've looked at (regexp, strrep, etc) can only take a string as a replacement. Strfind was similarly unhelpful, because of the typing across the table. The lack of cellfun has also made this harder than it should be. I know the solution should have something to do with finding the indices of the strings I want and then just looping DataFile{subscript} = [1], but I can't find a way to do it.
First you should convert your table to a cell array (e.g. with table2cell).
Then you can use strrep along with str2num on the elements that are strings, e.g.
% For a given element of the cell array
yourCellIndexVariable = strrep(yourCellIndexVariable, 'Help me please', '1');
yourCellIndexVariable = str2num(yourCellIndexVariable);
This replaces the string 'Help me please' with the string '1' (the strrep call), and str2num then converts that string to the corresponding double. Note that both results have to be assigned back, otherwise the original element is left unchanged.
By yourCellIndexVariable I mean an element from the cell array. There are several ways to get all cells from a cell array, but I think that you have solved that part already.
What you can do is as follows:
C = table2cell(T); % T is your table; convert it to a cell array
YourString = 'Help me please'; % The string to search for
TrueString = strcmp(C, YourString); % Compare every cell with the string
TrueString now contains logicals: 1 where the string 'Help me please' is located, and 0 where it is not (including cells that do not hold a string at all).
The conversion to a cell array is needed because strcmp does not work directly on a table containing multiple classes.
Thank you very much everyone for helping think through to a solution. Here's what I ended up with:
% Reads data
[~, ~, raw] = xlsread ( 'MyTable.xlsx');
MyTable = raw;
% Makes a backup of the data in table form
MyTableBackup = readtable( 'MyTable.xlsx' );
% Begin by ditching 1st row with variable names
MyTable(1,:) = [];
% wizard magic - find all cells with strings
StringIndex = cellfun('isclass', MyTable, 'char');
% strrep goes here to recode bad strings. For example:
MyTable(StringIndex) = strrep(MyTable(StringIndex), 'PlzHelpMe', '1');
% Eventually, we are done, so convert back to table
MyTable = cell2table(MyTable);
% Uses backup Table to add variable names
% (the readtable above means the bad characters in variable names are already escaped!)
MyTable.Properties.VariableNames = MyTableBackup.Properties.VariableNames;
This means the new values exist as strings ('1', not 1 as a double), so now I just str2double when I access them for analysis. My takeaway - Matlab is for numbers. Thanks again all!
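If the replacements should end up as doubles rather than as the string '1', a logical mask over the cell array is one way to do it, since each cell can hold any type. The cell array below is a made-up miniature of the imported data.

```matlab
% Hypothetical miniature of the imported data: strings, doubles and NaN mixed.
C = {'Help me please', 3.2, NaN; ...
     'other',          'Help me please', 7};

% strcmp returns false for cells that do not hold text, so mixed types are fine.
mask = strcmp(C, 'Help me please');

% Put the double 1 (not the string '1') into every matching cell.
C(mask) = {1};
```

The result can still be converted back with cell2table, as in the code above.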
I have a 2D matrix A of size 80x42, and I am trying to split it into 42 column vectors of size 80x1 and save each one under a different name, i.e.
M_n1, M_n2, M_n3, … etc (representing the number of column)
I tried
for i= 1:42
M_n(i)=A(:,i)
end
it didn't work
How can I do that without overwriting the result, and save each iteration in a (.txt) file?
You can use eval
for ii = 1:size(A,2)
eval( sprintf( 'M_n%d = A(:,%d);', ii, ii ) );
% now you have M_n? var for you to process
end
However, the use of eval is not recommended; you might be better off using a cell array
M_n = mat2cell( A, [size(A,1)], ones( 1, size(A,2) ) );
Now you have M_n a cell array with 42 cells one for each column of A.
You can access the ii-th column by M_n{ii}
Generally, if you are considering doing this kind of thing: don't.
It does not scale up well, and having them in one array is usually far more convenient.
As long as the results have the same shape, you can use a standard array, if not you can put each result in a cell array eg. :
results = cell(nTests,1);
results{1} = runTest(inputs{1});
or even
results = cellfun(@runTest,inputs,'UniformOutput',false); % where inputs is a cell array
And so on.
If you do want to write the numbers to a file at each iteration, you could do it without the names with csvwrite or the like (since you're only talking about 80 numbers a time).
Another option is using matfile, which lets you write directly to a variable in a .mat file. Consult help matfile for the specifics.
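For the text-file part of the question, a plain fopen/fprintf loop is one straightforward possibility. In this sketch the matrix is a small stand-in for the 80x42 one, and the file names (M_n1.txt, M_n2.txt, ...) and the temporary output folder are assumptions.

```matlab
A = magic(4);                % small stand-in for the 80x42 matrix
outDir = tempname;           % temporary output folder (assumption)
mkdir(outDir);

for ii = 1:size(A, 2)
    % one file per column: M_n1.txt, M_n2.txt, ...
    fname = fullfile(outDir, sprintf('M_n%d.txt', ii));
    fid = fopen(fname, 'w');
    fprintf(fid, '%g\n', A(:, ii));   % one value per line
    fclose(fid);
end

% Reading a file back gives the column as a numeric vector again.
col2 = load(fullfile(outDir, 'M_n2.txt'));
rmdir(outDir, 's');          % clean up the temporary folder
```

Note that '%g' keeps only a few significant digits; for non-integer data a format such as '%.15g' may be preferable.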
I need to calculate the mean, standard deviation, and other values for a number of variables and I was wondering how to use a loop to my advantage. I have 5 electrodes of data. So to calculate the mean of each I do this:
mean_ch1 = mean(ch1);
mean_ch2 = mean(ch2);
mean_ch3 = mean(ch3);
mean_ch4 = mean(ch4);
mean_ch5 = mean(ch5);
What I want is to be able to condense that code into a line or so. The code I tried does not work:
for i = 1:5
mean_ch(i) = mean(ch(i));
end
I know this code is wrong but it conveys the idea of what I'm trying to accomplish. I want to end up with 5 separate variables that are named by the loop or a cell array with all 5 variables within it allowing for easy recall. I know there must be a way to write this code I'm just not sure how to accomplish it.
You have a few options for how you can do this:
You can put all your channel data into one large matrix first, then compute the mean of the rows or columns using the function MEAN. For example, if each chX variable is an N-by-1 array, you can do the following:
chArray = [ch1 ch2 ch3 ch4 ch5]; %# Make an N-by-5 matrix
meanArray = mean(chArray); %# Take the mean of each column
You can put all your channel data into a cell array first, then compute the mean of each cell using the function CELLFUN:
meanArray = cellfun(@mean,{ch1,ch2,ch3,ch4,ch5});
This would work even if each chX array is a different length from one another.
You can use EVAL to generate the separate variables for each channel mean:
for iChannel = 1:5
varName = ['ch' int2str(iChannel)]; %# Create the name string
eval(['mean_' varName ' = mean(' varName ');']);
end
If it's always exactly 5 channels, you can do
ch = {ch1, ch2, ch3, ch4, ch5}
for j = 1:5
mean_ch(j) = mean(ch{j});
end
A more complicated way would be
for j = 1:nchannels
mean_ch(j) = eval(['mean(ch' num2str(j) ')']);
end
In addition to gnovice's answer, you could use structures and dynamic field names to accomplish your task. First, I assume that your channel data variables are all in the format ch* and are the only variables with such names in your MATLAB workspace. Then you could do something like the following
%# Move the channel data into a structure with fields ch1, ch2, ....
%# This could be done by saving and reloading the workspace
save('channelData.mat','ch*');
chanData = load('channelData.mat');
%# Next you can then loop through the structure calculating the mean for each channel
flds = fieldnames(chanData); %# get the fieldnames stored in the structure
for i=1:length(flds)
mean_ch(i) = mean(chanData.(flds{i}));
end
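Since the question also asks about the standard deviation and other values, the cell-array idea extends naturally to several statistics at once. The channel data below is invented for illustration, and the field names of the stats struct are arbitrary choices.

```matlab
% Hypothetical channel data; the channels may differ in length.
ch = {[1 2 3], [4 5 6 7], [10 20]};

% One struct holds every statistic, with one entry per channel.
stats = struct('mean',   cellfun(@mean, ch), ...
               'std',    cellfun(@std, ch), ...
               'median', cellfun(@median, ch));

% The value for a given channel is then recalled by index.
meanOfChannel2 = stats.mean(2);
```

Adding another statistic only requires one more field with its own cellfun call.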