converting a string array into a struct

converting a string array into a struct - matlab

I have a string array with the contents of the row in the following manner.
X ='Xmole(1)=0.0Xmole(2)=1.0rho(1)=2343rho(2)=2343'
Now I need a struct data.Massdensity which should look like this
<data.Massdensity = Xmole(1)=0.0
Xmole(2)=1.0
rho(1)=2343
rho(2)=2343>
I did use cell2struct which will gave me a struct like this
data.Massdensity ='Xmole(1)=0.0Xmole(2)=1.0rho(1)=2343rho(2)=2343'
Is there any way possible I can get the struct like the one above.
I am reading a textfile whose contents look like this
MassDensity{
Xmole(1) = 0.0
Xmole(2) = 1.0
rho(1) = 2343 # [kg/m^3]
rho(2) = 2343 # [kg/m^3]
}
I am using fileread to read this into a single string.
So any better way of doing this

The problem with the intial way you presented your data is that there are no obvious delimiters. Whereas within the original file you have the option (one presumes) of using the line ends as delimiters.
1) Read in as separate strings (may require splitting or reassembly in MATLAB), the individual lines into a cell array. With textscan you can set a range of delimiters and other settings so make full use of the options.
For example:
a = textscan(fid,'%s','Delimiter',...
{'\n','{','}','#'},'CommentStyle','#','MultipleDelimsAsOne',1);
a = a{1}
Ideally you want to end up with:
a{1} = 'Massdensity'
a{2} = 'Xmole(1)=0.0'
...
a{4} = 'rho(2) = 2343'
You may need to do some trimming of whitespace.
2) Create your struct, using dynamic field naming:
data.(a{1})=a(2:end);
data.MassDensity{1}
ans =
Xmole(1) = 0.0

Related

Pull specific cells out of a cell array by comparing the last digit of their filename

I have a cell array of filenames - things like '20160303_144045_4.dat', '20160303_144045_5.dat', which I need to separate into separate arrays by the last digit before the '.dat'; one cell array of '...4.dat's, one of '...5.dat's, etc.
My code is below; it uses regex to split the file around the '.dat', reshapes a bit then regexes again to pull out the last number of the filename, builds a cell to store the filenames in then, and then I get a tad stuck. I have an array produced such as '1,0,1,0,1,0..' of required cell indexes which I thought might be trivial to pull out, but I'm struggling to get it to do what I want.
numFiles = length(sampleFile); %sampleFile is the input cell array
splitFiles = regexp(sampleFile,'.dat','split');
column = vertcat(splitFiles{:});
column = column(:,1);
splitNums = regexp(column,'_','split');
splitNums = splitNums(:,1);
column = vertcat(splitNums{:});
column = column(:,3);
column = cellfun(#str2double,column); %produces column array of values - 3,4,3,4,3,4, etc
uniqueVals = unique(column);
numChannels = length(uniqueVals);
fileNameCell = cell(ceil(numFiles/numChannels),numChannels);
for i = 1:numChannels
column(column ~= uniqueVals(i)) = 0;
column = column / uniqueVals(i); %e.g. 1,0,1,0,1,0
%fileNameCell(i)
end
I feel there should be an easier way than my hodge-podge of code, and I don't want to throw together a ton of messy for-loops if I can avoid it; I definitely believe I've overcomplicated this problem massively.

We can neaten your code quite a bit.
Take some example data:
files = {'abc4.dat';'abc5.dat';'def4.dat';'ghi4.dat';'abc6.dat';'def5.dat';'nonum.dat'};
You can get the final numbers using regexp and matching one or more digits followed by '.dat', then using strrep to remove the '.dat'.
filenums = cellfun(#(r) strrep(regexp(r, '\d+.dat', 'match', 'once'), '.dat', ''), ...
files, 'uniformoutput', false);
Now we can put these in a structure, using the unique numbers (prefixed by a letter because fields can't start with numbers) as field names.
% Get unique file numbers and set up the output struct
ufilenums = unique(filenums);
filestruct = struct;
% Loop over file numbers
for ii = 1:numel(ufilenums)
% Get files which have this number
idx = cellfun(#(r) strcmp(r, ufilenums{ii}), filenums);
% Assign the identified files to their struct field
filestruct.(['x' ufilenums{ii}]) = files(idx);
end
Now you have a neat output
% Files with numbers before .dat given a field in the output struct
filestruct.x4 = {'abc4.dat' 'def4.dat' 'ghi4.dat'}
filestruct.x5 = {'abc5.dat' 'def5.dat'}
filestruct.x6 = {'abc6.dat'}
% Files without numbers before .dat also captured
filestruct.x = {'nonum.dat'}

Saving figure without providing filename [duplicate]

this question about matlab:
i'm running a loop and each iteration a new set of data is produced, and I want it to be saved in a new file each time. I also overwrite old files by changing the name. Looks like this:
name_each_iter = strrep(some_source,'.string.mat','string_new.(j).mat')
and what I#m struggling here is the iteration so that I obtain files:
...string_new.1.mat
...string_new.2.mat
etc.
I was trying with various combination of () [] {} as well as 'string_new.'j'.mat' (which gave syntax error)
How can it be done?

Strings are just vectors of characters. So if you want to iteratively create filenames here's an example of how you would do it:
for j = 1:10,
filename = ['string_new.' num2str(j) '.mat'];
disp(filename)
end
The above code will create the following output:
string_new.1.mat
string_new.2.mat
string_new.3.mat
string_new.4.mat
string_new.5.mat
string_new.6.mat
string_new.7.mat
string_new.8.mat
string_new.9.mat
string_new.10.mat

You could also generate all file names in advance using NUM2STR:
>> filenames = cellstr(num2str((1:10)','string_new.%02d.mat'))
filenames =
'string_new.01.mat'
'string_new.02.mat'
'string_new.03.mat'
'string_new.04.mat'
'string_new.05.mat'
'string_new.06.mat'
'string_new.07.mat'
'string_new.08.mat'
'string_new.09.mat'
'string_new.10.mat'
Now access the cell array contents as filenames{i} in each iteration

sprintf is very useful for this:
for ii=5:12
filename = sprintf('data_%02d.mat',ii)
end
this assigns the following strings to filename:
data_05.mat
data_06.mat
data_07.mat
data_08.mat
data_09.mat
data_10.mat
data_11.mat
data_12.mat
notice the zero padding. sprintf in general is useful if you want parameterized formatted strings.

For creating a name based of an already existing file, you can use regexp to detect the '_new.(number).mat' and change the string depending on what regexp finds:
original_filename = 'data.string.mat';
im = regexp(original_filename,'_new.\d+.mat')
if isempty(im) % original file, no _new.(j) detected
newname = [original_filename(1:end-4) '_new.1.mat'];
else
num = str2double(original_filename(im(end)+5:end-4));
newname = sprintf('%s_new.%d.mat',original_filename(1:im(end)-1),num+1);
end
This does exactly that, and produces:
data.string_new.1.mat
data.string_new.2.mat
data.string_new.3.mat
...
data.string_new.9.mat
data.string_new.10.mat
data.string_new.11.mat
when iterating the above function, starting with 'data.string.mat'

addressing array in the function return

I would like to make the following code simpler.
files=dir('~/some*.txt');
numFiles=length(files);
for i = 1:numFiles
name=files(i).name;
name=strsplit(name,'.');
name=name{1};
name=strsplit(name, '_');
name=name(2);
name = str2num(name{1});
disp(name);
end
I am a begginer in Matlab, in general I would love something like:
name = str2num(strsplit(strsplit(files(i).name,'.')(1),'_')(2));
but matlab does not like this.
Another issue of the approach above is that matlab keeps giving cell type even for something like name(2) but this may be just the problem with my syntax.
Example file names:
3000_0_100ms.txt
3000_0_5s.txt
3000_110_5s.txt
...
Let's say I want to select all files ending in '5s' then I need to split them (after removing the extension) by '_' and return the second part, in the case of the three filenames above, that would be 0, 0, 110.
But I am in general curious how to do this simple operation in matlab without the complicated code that I have above.

Because your filenames follow a specific pattern, they're a prime candidate for a regular expression. While regular expressions can be confusing to learn at the outset they are very powerful tools.
Consider the following example, which pulls out all numbers that have both leading and trailing underscores:
filenames = {'3000_0_100ms.txt', '3000_0_5s.txt', '3000_110_5s.txt'};
strs = regexp(filenames, '(?<=\_)(\d+)(?=\_)', 'match');
strs = [strs{:}]; % Denest one layer of cells
nums = str2double(strs);
Which returns:
nums =
0 0 110
Being used here are what's called lookbehind (?<=...) and lookahead (?=...) operators. As their names suggest, they look in their respective directions related to the expression they're part of, (\d+) in our case, which looks for one or more digits. Though this approach requires more steps than the simple '\_(\d+)\_' expression, the latter requires you either utilize MATLAB's 'tokens' regex operator, which adds another layer of cells and that annoys me, or use the 'match' operator and strip the underscores from the match prior to converting to a numeric value.
Approach 2:
filenames = {'3000_0_100ms.txt', '3000_0_5s.txt', '3000_110_5s.txt'};
strs = regexp(filenames, '\_(\d+)\_', 'tokens');
strs = [strs{:}]; % Denest one layer of cells
strs = [strs{:}]; % Denest another layer of cells
nums = str2double(strs);
Approach 3:
filenames = {'3000_0_100ms.txt', '3000_0_5s.txt', '3000_110_5s.txt'};
strs = regexp(filenames, '\_(\d+)\_', 'match');
strs = [strs{:}]; % Denest one layer of cells
strs = regexprep(strs, '\_', '');
nums = str2double(strs);

You can use regexp to do a regular expression matching and obtain the numbers in the second place directly. This is an explanation of the regular expression I am using.
>>names = regexp({files(:).name},'\d*_(\d*)_\d*m?s\.txt$','tokens')
>>names = [names{:}]; % Get names out of their cells
>>names = [names{:}]; % Break cells one more time
>> nums = str2double(names); % Convert to double to obtain numbers

How to display selected entries of an array of structures in MATLAB

Suppose we have an array of structure. The structure has fields: name, price and cost.
Suppose the array A has size n x 1. If I'd like to display the names of the 1st, 3rd and the 4th structure, I can use the command:
A([1,3,4]).name
The problem is that it prints the following thing on screen:
ans =
name_of_item_1
ans =
name_of_item_3
ans =
name_of_item
How can I remove those ans = things? I tried:
disp(A([1,3,4]).name);
only to get an error/warning.

By doing A([1,3,4]).name, you are returning a comma-separated list. This is equivalent to typing in the following in the MATLAB command prompt:
>> A(1).name, A(3).name, A(4).name
That's why you'll see the MATLAB command prompt give you ans = ... three times.
If you want to display all of the strings together, consider using strjoin to join all of the names together and we can separate the names by a comma. To do this, you'll have to place all of these in a cell array. Let's call this cell array names. As such, if we did this:
names = {A([1,3,4]).name};
This is the same as doing:
names = {A(1).name, A(3).name, A(4).name};
This will create a 1 x 3 cell array of names and we can use these names to join them together by separating them with a comma and a space:
names = {A([1,3,4]).name};
out = strjoin(names, ', ');
You can then show what this final string looks like:
disp(out);

You can use:
[A([1,3,4]).name]
which will, however, concatenate all of the names into a single string.
The better way is to make a cell array using:
{ A([1,3,4]).name }

Extract values from filenames

I have file names stored as follows:
>> allFiles.name
ans =
k-120_knt-500_threshold-0.3_percent-34.57.csv
ans =
k-216_knt-22625_threshold-0.3_percent-33.33.csv
I wish to extract the 4 values from them and store in a cell.
data={};
for k =1:numel(allFiles)
data{k,1}=csvread(allFiles(k).name,1,0);
data{k,2}= %kvalue
data{k,3}= %kntvalue
data{k,4}=%threshold
data{k,5}=%percent
...
end

There's probably a regular expression that can be used to do this, but a simple piece of code would be
data={numel(allFiles),5};
for k =1:numel(allFiles)
data{k,1}=csvread(allFiles(k).name,1,0);
[~,name] = fileparts(allFiles(k).name);
dashIdx = strfind(name,'-'); % find location of dashes
usIdx = strfind(name,'_'); % find location of underscores
data{k,2}= str2double(name(dashIdx(1)+1:usIdx(1)-1)); %kvalue
data{k,3}= str2double(name(dashIdx(2)+1:usIdx(2)-1)); %kntvalue
data{k,4}= str2double(name(dashIdx(3)+1:usIdx(3)-1)); %threshold
data{k,5}= str2double(name(dashIdx(4)+1:end)); %percent
...
end
For efficiency, you might consider using a single matrix to store all the numeric data, and/or a structure (so that you can access the data by name rather than index).

You simply need to tokenize using strtok multiple times (there is more than 1 way to solve this). Someone has a handy matlab script somewhere on the web to tokenize strings into a cell array.
(1) Starting with:
filename = 'k-216_knt-22625_threshold-0.3_percent-33.33.csv'
Use strfind to prune out the extension
r = strfind(filename, '.csv')
filenameWithoutExtension = filename(1:r-1)
This leaves us with:
'k-216_knt-22625_threshold-0.3_percent-33.33'
(2) Then tokenize this:
'k-216_knt-22625_threshold-0.3_percent-33.33'
using '_' . You get the tokens:
'k-216'
'knt-22625'
'threshold-0.3'
'percent-33.33'
(3) Lastly, for each string, tokenize using using '-'. Each second string will be:
'216'
'22625'
'0.3'
'33.33'
And use str2num to convert.

Strategy: strsplit() + str2num().
data={};
for k =1:numel(allFiles)
data{k,1}=csvread(allFiles(k).name,1,0);
words = strsplit( allFiles(k).name(1:(end-4)), '_' );
data{k,2} = str2num(words{1}(2:end));
data{k,3} = str2num(words{2}(4:end));
data{k,4} = str2num(words{3}(10:end));
data{k,5} = str2num(words{4}(8:end));
end