I am trying to convert a data file (here a string representing a file with three lines) into a structure array like this:
cel = textscan('1 1.1 2 2.2 3 3.3', '%u %f');
str = cell2struct(cel, {'f1', 'f2'}, 2);
However, now I have a struct array of dimension 1x1, where I can only access the columns through the array's fields, but not whole rows (like 'str(2)' for the second row).
What I need is an array of structs (or whatever it is called) like this:
str = struct('f1', {1, 2, 3}, 'f2', {1.1, 2.2, 3.3});
because now I can (for instance) filter it like this:
subStr = str(find([str.f1] > 1))
which I could not do in the first case.
Any idea how to get there?
In the end I was able to do it with:
cel = textscan('1 1.1 2 2.2 3 3.3', '%u %f');
[f1, f2] = cel{:};
str = struct('f1', num2cell(f1'), 'f2', num2cell(f2'));
But it does not feel right and I am afraid it will be expensive (the files are quite large).
EDIT:
My solution is indeed too memory-demanding and therefore unusable.
Typical files have a header, a footer, and c. 5e6 lines of data in six columns.
Thanks
It's easier if you're actually working with a file that contains lines. For example, if data.txt contains:
1 1.1
2 2.2
3 3.3
Then you can simply load it using:
tbl = readtable('data.txt');
tbl.Properties.VariableNames = {'f1', 'f2'};
Which results in much nicer (imho) filtering syntax:
subTbl = tbl(tbl.f1 > 1, :);
I suggest you read a bit about tables in MATLAB, to learn about their (many) capabilities.
Finally, if you insist on working with struct arrays, you can do:
str = table2struct(tbl); % 3×1 struct array with fields f1 and f2
Each element of cel is an array. Using cellfun and num2cell they can be converted to cell arrays:
names = {'f1', 'f2'};
cel = textscan('1 1.1 2 2.2 3 3.3', '%u %f');
cel2 = cellfun(@num2cell, cel, 'UniformOutput', 0);
prep = [names;cel2];
str = struct(prep{:}).';
I wish I had read these more carefully sooner, but according to this and this, it is not advisable to store large datasets the way I was trying to, because
Structures with many fields and small contents have a large overhead and should be avoided. A large array of structures with numeric scalar fields requires much more memory than a structure with fields containing large numeric arrays.
and
For structures and cell arrays, MATLAB creates a header not only for each array, but also for each field of the structure and for each cell of a cell array. Because of this, the amount of memory required to store a structure or cell array depends not only on how much data it holds, but also on how it is constructed.
Therefore the struct array str(1:N).f requires (for large N) much more memory than the scalar struct str.f(1:N).
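The difference between the two layouts can be checked directly with whos. The following is only a rough sketch: the variable names cols and rows and the size N are made up for illustration, and the exact byte counts depend on the MATLAB version.

```matlab
N = 1e5;                       % illustrative size, smaller than the real files

% Layout A: one scalar struct whose fields hold whole columns
cols.f1 = (1:N).';
cols.f2 = rand(N, 1);

% Layout B: N-by-1 struct array with scalar fields
rows = struct('f1', num2cell(cols.f1), 'f2', num2cell(cols.f2));

infoA = whos('cols');
infoB = whos('rows');
fprintf('scalar struct: %d bytes, struct array: %d bytes\n', infoA.bytes, infoB.bytes);

% Row filtering still works on layout A with a logical mask
keep = cols.f1 > 1;
sub.f1 = cols.f1(keep);
sub.f2 = cols.f2(keep);
```

So the filtering that motivated the struct array is still available on the column-wise layout; it just uses a mask on each field instead of indexing the struct array itself.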
I have an excel file that I grab by:
ds = dataset('XLSFile',fullfile('file path here', 'waterReal.xlsx'))
It looks like this:
I want each column in its own numeric array though! Like how when I load an example dataset: load carsmall, I get a bunch of individual numeric arrays. But I can't figure out how to do that.
I can do this individually by writing:
A = ds.TEMP, B = ds.PROD, ...
But what if I had a BIG Excel file? What then?
You can convert a dataset to a struct or a cell like this:
To struct:
s = dataset2struct(ds, 'AsScalar',true)
To cell:
fnames = fieldnames(ds);
c = cell(1, numel(fnames));
for i = 1:numel(fnames)
c{i} = ds.(fnames{i});
end
By the way: use tables instead of datasets. They're newer and better. Use the readtable function to read your Excel file into a table. And tables are nicer enough that you might not want to bother converting them into a simpler cell array, because you can just grab the columns out with t{:,i} where t is your table and i is the index of the column you want.
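As a small sketch of the table route, assuming a table with the TEMP and PROD columns from the question (the values below are made up and stand in for the result of readtable):

```matlab
% Stand-in for t = readtable(...) on the Excel file; values are made up
t = table([20; 25; 30], [1.5; 2.1; 2.8], 'VariableNames', {'TEMP', 'PROD'});

% One numeric array per column, collected in a scalar struct
s = table2struct(t, 'ToScalar', true);

% Or one cell per column, by position
c = cell(1, width(t));
for i = 1:width(t)
    c{i} = t{:, i};
end
```

Here s.TEMP and c{1} hold the same numeric column, so neither approach needs one hand-written assignment per column.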
I have a large number of .csv files from my experiments (200+). Until now I have been reading them in separately, which is tedious, also for the later steps in my data handling.
co_15 = csvread('CO_15K.csv',5,0);
co_25 = csvread('CO_25K.csv',5,0);
co2_15 = csvread('CO2_15K.csv',5,0);
co2_80 = csvread('CO2_80K.csv',5,0);
h2o_15 = csvread('H2O_15K.csv',1,0);
etc.....
So I want to make a cell at the beginning of my code looking like this and then a for loop that just reads them automatically.
input = {'co_15' 5;'co_25' 5;...
'co2_15' 5; 'co2_80' 5;...
'h2o_15' 1; 'h2o_140' 1;...
'methanol_15' 5;'methanol_120' 5;'methanol_140' 5;...
'ethanol_15' 5;'ethanol_80' 1;'ethanol_140' 5;...
'co2_ethanol_15' 5 ;'co2_ethanol_80' 5;...
'h2o_ethanol_15' 1 ;'h2o_ethanol_140' 1;...
'methanol_ethanol_15' 5;'methanol_ethanol_120' 5;'methanol_ethanol_140' 5};
for n = 1:size(input,1)
input{n,1} = csvread(strcat(input{n,1},'k.csv'),input{n,2},0);
end
The cell in this code has 19 rows and 2 columns: the rows are the files, and the columns will contain the parameters needed to handle the data. The problem I can't solve is that the first column holds a string name, and I want that name to become the name of the variable that csvread writes its data to. The way I set it up now, the csv data just overwrites the string in the first column of the cell. To be extra clear: I want my MATLAB workspace to contain variables, named after the strings in the first column, that hold the data of my csv files. How do I solve this?
You don't actually want to do this. Even MathWorks will tell you not to do this. If you are trying to use variable names to keep track of related data like this, there is always a better data structure to hold your data.
One way would be to have a cell array
data = cell(size(input(:,1)));
for n = 1:size(input,1)
data{n} = csvread(strcat(input{n,1},'k.csv'),input{n,2},0);
end
Another good option is to use a struct. You could have a single struct with dynamic field names that correspond to your data.
data = struct();
for n = 1:size(input,1)
data.(input{n,1}) = csvread(strcat(input{n,1},'k.csv'),input{n,2},0);
end
Or actually create an array of structs and hold both the name and the data within the struct.
for n = 1:size(input, 1)
data(n).name = input{n,1};
data(n).data = csvread(strcat(input{n,1},'k.csv'),input{n,2},0);
end
If you absolutely insist on doing this (again, it is very much not recommended), then you could do it using eval:
for n = 1:size(input, 1)
data = csvread(strcat(input{n,1},'k.csv'),input{n,2},0);
eval([input{n, 1}, '= data;']);
end
I'm a bit new to the matlab world, and I'm running into an issue that I'm sure has an easy solution.
I've imported some data from a text file and parsed out the headers, which resulted in a 1x35 cell called Data. In each cell (for example Data{1,1,1}) is data that looks like:
'600000 -947.772827 -107.045776 -70.818062'
'600001 -920.431396 -86.098122 -56.485119'
'600002 -878.332886 -88.673630 -85.249130'
'600003 -851.637695 -68.546539 -96.691711'
'600004 -834.707642 -28.951260 -73.218872'
'600005 -783.431580 40.657402 24.242268'
The problem is, each line is contained in a single column. I'd like to parse it out so that I have 4 columns instead of one.
I tried parsing out the Data cell even further using:
textscan(Data{1,1,1}, '%u%f10%f10%f10', 1)
But it resulted in the following error:
Error using textscan
First input must be of type double or string.
Can I use textscan this way, or do I need to use some other method to break out the text?
With textscan, you can only specify a single string or a file identifier (a number) as the first input. With your input, I suspect it is a 6 x 1 cell array of strings. As such, you have no choice but to iterate over each cell and convert its contents with textscan. Also, get rid of the 10 after each %f, as it's actually screwing up where you're parsing out the string. Finally, read the first number as a double (%f) as opposed to an unsigned integer (%u) to allow for easier conversion.
Therefore, do something like this:
>> Data{1,1,1} = {'600000 -947.772827 -107.045776 -70.818062'
'600001 -920.431396 -86.098122 -56.485119'
'600002 -878.332886 -88.673630 -85.249130'
'600003 -851.637695 -68.546539 -96.691711'
'600004 -834.707642 -28.951260 -73.218872'
'600005 -783.431580 40.657402 24.242268'};
>> format long g;
>> vals = cell2mat(cellfun(@(x) cell2mat(textscan(x, '%f%f%f%f', 1)), Data{1,1,1}, 'uni', 0))
vals =
Columns 1 through 3
600000 -947.772827 -107.045776
600001 -920.431396 -86.098122
600002 -878.332886 -88.67363
600003 -851.637695 -68.546539
600004 -834.707642 -28.95126
600005 -783.43158 40.657402
Column 4
-70.818062
-56.485119
-85.24913
-96.691711
-73.218872
24.242268
That statement vals = ... is quite a mouthful, but easy to explain. Start with this statement:
cell2mat(textscan(x, '%f%f%f%f', 1))
For a given cell x in Data{1,1,1}, we want to parse out four numbers for each string that is stored in x. textscan will place these numbers as individual cell elements into a cell array. We want to convert each element into a numeric array, and so cell2mat is required for us to do so.
In order to operate over all of the elements in Data{1,1,1}, we need to use cellfun to allow us to do so:
cellfun(@(x) cell2mat(textscan(x, '%f%f%f%f', 1)), Data{1,1,1}, 'uni', 0)
The first input is a function that operates on each cell stored in Data{1,1,1} (the second input). We are basically telling cellfun that we want to operate on each cell in the cell array stored in Data{1,1,1} in the way I talked about before. This function has input parameter x, which is one cell from Data{1,1,1}. Now, the uni flag is set to 0 because the output of cellfun will not be a single number, but an array of numbers - one array per line that you have in your cell array. The output of this stage would be a 6 element cell array where each location is a 4 element numeric array. To finish it off, we call cell2mat on this output to finally convert our text into a 2D matrix and therefore:
vals = cell2mat(cellfun(@(x) cell2mat(textscan(x, '%f%f%f%f', 1)), Data{1,1,1}, 'uni', 0))
format long g allows for better display formatting so we can see both the dominant number as well as the floating point numbers neatly.
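If every line is known to contain exactly four numbers, a possible alternative (not part of the approach above) is to join the lines and parse them in one sscanf call. The variable name rows is made up for this sketch, and only two of the lines from the question are used:

```matlab
rows = {'600000 -947.772827 -107.045776 -70.818062'
        '600001 -920.431396 -86.098122 -56.485119'};

% Join all lines with spaces, read every number, and reshape to 4 columns.
% sscanf fills the 4-by-N result column by column, hence the transpose.
vals = sscanf(sprintf('%s ', rows{:}), '%f', [4 Inf]).';
```

This avoids the per-cell textscan calls, at the cost of silently misaligning the columns if some line has a missing or extra value.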
I have a matlab structure that follows the following pattern:
S.field1.data1
...
.field1.dataN
...
.fieldM.data1
...
.fieldM.dataN
I would like to assign values to one data field (say, data3) from all fields simultaneously. That would be semantically similar to:
S.*.data3 = value
Where the wildcard "*" represents all fields (field1,...,fieldM) in the structure. Is this something that can be done without a loop in matlab?
Since field1 .. fieldM are structure arrays with identical fields, why not make a struct array for "field"? Then you can easily set all "data" members to a specific value using deal.
field(1).data1 = 1;
field(1).data2 = 2;
field(2).data1 = 3;
field(2).data2 = 4;
[field.data1] = deal(5);
disp([field.data1]);
A loop-based solution can be flexible and easily readable:
names = strtrim(cellstr( num2str((1:5)','field%d') )); % field1, field2, ...
values = num2cell(1:5); % any values you want
S = struct();
for i=1:numel(names)
S.(names{i}).data3 = values{i};
end
In simple cases, you could do that by converting your struct into a cell array using struct2cell(). As you have a nested structure, I don't think that will work here.
On the other hand, is there any reason why your data is structured like this? Your description gives the impression that a simple MxN array or cell array would be more suitable.
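If the nested layout has to stay, the explicit loop can be hidden behind structfun, which visits every top-level field of a scalar struct. This is only a sketch with two made-up fields and an arbitrary value; structfun still iterates internally, so it is a matter of notation rather than speed.

```matlab
% Small nested struct in the shape described in the question
S.field1.data1 = 1;  S.field1.data3 = 2;
S.field2.data1 = 3;  S.field2.data3 = 4;

value = 42;   % arbitrary value to assign

% Set data3 in every top-level field; the other members are left untouched
S = structfun(@(f) setfield(f, 'data3', value), S, 'UniformOutput', false);
```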
Many near-solutions are online, but nothing exact...
I am building a data matrix vector-by-vector:
OutputMatrix(NextSubject,:)=[OutputVector]
I need to lead each row with the name of the data being processed in that loop. The name has the form:
12345.dat
So if OutputVector=[1 2 3 4] the output should look like:
12345.dat 1 2 3 4
I have tried dozens of solutions, but a few examples:
{char(Filename(i).name) OutputVector}
{strcat((Filename(i).name) OutputVector)}
[Filname(i).name OutputVector]
Any help? Please :)
You can't store a string and a vector in a matrix. However, you can do that in a cell.
So you might consider doing:
OutputCell(NextSubject,:) = { Filename(i).name OutputVector };
The curly braces denote that you are storing the object as a cell.
Often, though, it is better to store strings and numbers separately. Something like:
OutputMatrix = [];
OutputFile = {};
...
OutputMatrix(NextSubject,:) = OutputVector;
OutputFile{NextSubject} = Filename(i).name;
Then if you access or select rows from output matrix, use the same index for the cell array:
foo(OutputMatrix(index,:), OutputFile(index))
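Put together, the separate-storage approach could look like the sketch below. The file names and the vectors are invented for illustration, and the loop variable k plays the role of NextSubject.

```matlab
files = {'12345.dat', '67890.dat'};      % made-up file names
OutputMatrix = zeros(numel(files), 4);   % numeric data, one row per file
OutputFile = cell(numel(files), 1);      % matching names, same row index

for k = 1:numel(files)
    OutputMatrix(k, :) = k * [1 2 3 4];  % placeholder for the real OutputVector
    OutputFile{k} = files{k};
end

% Build the "name followed by values" line only when it is needed for display
line1 = [OutputFile{1} sprintf(' %g', OutputMatrix(1, :))];
disp(line1);
```

Keeping the numbers in a plain matrix means they stay usable for arithmetic, while the name is joined on only at output time.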