Matlab: How do you seperate text in existing cell - matlab

I'm a bit new to the matlab world, and I'm running into an issue that I'm sure has an easy solution.
I've imported some data from a text file and parsed out the headers, which resulted in a 1x35 cell called Data. In each cell (for example Data{1,1,1}) is data that looks like:
'600000 -947.772827 -107.045776 -70.818062'
'600001 -920.431396 -86.098122 -56.485119'
'600002 -878.332886 -88.673630 -85.249130'
'600003 -851.637695 -68.546539 -96.691711'
'600004 -834.707642 -28.951260 -73.218872'
'600005 -783.431580 40.657402 24.242268'
The problem is, each line is contained in a single column. I'd like to parse it out so that I have 4 columns instead of one.
I tried parsing out the Data cell even further using:
textscan(Data{1,1,1}, '%u%f10%f10%f10', 1)
But it resulted in the following error:
Error using textscan
First input must be of type double or string.
Can I use textscan this way, or do I need to use some other method to break out the text?

With textscan, you can only specify a single string or a single number. With your input, I suspect it is a 6 x 1 cell array of strings. As such, you have no choice but to iterate over each cell and convert each cell array contents with textscan Also, get rid of the %10 spacing as it's actually screwing up where you're parsing out the string. Also, set the identifier to identify the first number you see to double (%f) as opposed to unsigned integer (%u) to allow for easier conversion.
Therefore, do something like this:
>> Data{1,1,1} = {'600000 -947.772827 -107.045776 -70.818062'
'600001 -920.431396 -86.098122 -56.485119'
'600002 -878.332886 -88.673630 -85.249130'
'600003 -851.637695 -68.546539 -96.691711'
'600004 -834.707642 -28.951260 -73.218872'
'600005 -783.431580 40.657402 24.242268'};
>> format long g;
>> vals = cell2mat(cellfun(#(x) cell2mat(textscan(x, '%f%f%f%f', 1)), Data{1,1,1}, 'uni', 0))
vals =
Columns 1 through 3
600000 -947.772827 -107.045776
600001 -920.431396 -86.098122
600002 -878.332886 -88.67363
600003 -851.637695 -68.546539
600004 -834.707642 -28.95126
600005 -783.43158 40.657402
Column 4
-70.818062
-56.485119
-85.24913
-96.691711
-73.218872
24.242268
That statement vals = ... is quite a mouthful, but easy to explain. Start with this statement:
cell2mat(textscan(x, '%f%f%f%f', 1))
For a given cell x in Data{1,1,1}, we want to parse out four numbers for each string that is stored in x. textscan will place these numbers as individual cell elements into a cell array. We want to convert each element into a numeric array, and so cell2mat is required for us to do so.
In order to operate over all of the elements in Data{1,1,1}, we need to use cellfun to allow us to do so:
cellfun(#(x) cell2mat(textscan(x, '%f%f%f%f', 1)), Data{1,1,1}, 'uni', 0)
The first input is a function that operates on each cell stored in Data{1,1,1} (the second input). We are basically telling cellfun that we want to operate on each cell in the cell array stored in Data{1,1,1} in the way I talked about before. This function has input parameter x, which is one cell from Data{1,1,1}. Now, the uni flag is set to 0 because the output of cellfun will not be a single number, but an array of numbers - one array per line that you have in your cell array. The output of this stage would be a 6 element cell array where each location is a 4 element numeric array. To finish it off, we call cell2mat on this output to finally convert our text into a 2D matrix and therefore:
vals = cell2mat(cellfun(#(x) cell2mat(textscan(x, '%f%f%f%f', 1)), Data{1,1,1}, 'uni', 0))
format long g allows for better display formatting so we can see both the dominant number as well as the floating point numbers neatly.

Related

Convert cell array with sub cells into numeric array

Assume I have A that is a 64x1 cell array.
Each of the 64 cells contains another cell with a string (which is a number, i.e. 11)
A{1, 1}{1, 1} = ’11’ (char)
A{2, 1}{1, 1} = ’13’ (char)
How can I create a numeric array such as
A = [11,13,…]
The cell2mat function seems to work only on “first level” cell array:
cell2mat does not support cell arrays containing cell arrays or objects.
You can use cellfun to convert the contents of each individual cell in A to doubles.
A{1, 1}{1, 1} = '11';
A{2, 1}{1, 1} = '13';
A_array = cellfun(#(a) str2double(a), A)
Breakdown of the function: #(a) passes the contents of each cell of A to the variable a, which can be converted to doubles using str2double(a).
It's quite simple, however you need split it to two steps. Let's assume we have simple 1x4 string array:
A= {'11','13','15','17'};
To convert it you need store the content in temporary variable S, then use sscanf to generate final result:
S = sprintf('%s ', A{:});
Result = sscanf(S, '%f')
The only problem is that it will be column vector. If you need it in a row, you can just transpose(Results).
I'm not sure I entirely understand your question. If I've got it right, you're talking about a cell array like asd below
asd=cell(61,1);
for ii=1:64
asd{ii}={['test',num2str(ii)]}
end
If I understand your goal correctly, I think the following does it somewhat neatly
A=char([asd{:}])
Then if you want to convert the strings to numbers (which wouldn't work for my test, but might for your strings), just use str2num on this new vector
Transform A into a comma-separated list of cells, then horizontally concatenate these cells and finally apply str2double;
A = str2double([A{:}]);

Convert dataset column to obsnames

I have many large dataset arrays in my workspace (loaded from a .mat file).
A minimal working example is like this
>> disp(old_ds)
Date firm1 firm2 firm3 firm4
734692 880,0 102,1 32,7 204,2
734695 880,0 102,0 30,9 196,4
734696 880,0 100,0 30,9 200,2
734697 880,0 101,4 30,9 200,2
734698 880,0 100,8 30,9 202,2
where the first row (with the strings) already are headers in the dataset, that is they are already displayed if I run old_ds.Properties.VarNames.
I'm wondering whether there is an easy and/or fast way to make the first column as ObsNames.
As a first approach, I've thought of "exporting" the data matrix (columns 2 to 5, in the example), the vector of dates and then creating a new dataset where the rows have names.
Namely:
>> mat = double(old_ds(:,2:5)); % taking the data, making it a matrix array
>> head = old_ds.Properties.VarNames % saving headers
>> head(1,1) = []; % getting rid of 'Date' from head
>> dates = dataset2cell(old_ds(:,1)); % taking dates as column cell array
>> dates(1) = []; % getting rid of 'Date' from dates
>> new_ds = mat2dataset(mat,'VarNames',head,'ObsNames',dates);
Apart from the fact that the last line returns the following error, ...
Error using setobsnames (line 25)
NEWNAMES must be a nonempty string or a cell array of nonempty strings.
Error in dataset (line 377)
a = setobsnames(a,obsnamesArg);
Error in mat2dataset (line 75)
d = dataset(vars{:},args{:});
...I would have found a solution, then created a function (such to generalize the process for all 22 dataset arrays that I have) and then run the function 22 times (once for each dataset array).
To put things into perspective, each dataset has 7660 rows and a number of columns that ranges from 2 to 1320.
I have no idea about how I could (and if I could) make the dataset directly "eat" the first column as ObsNames.
Can anyone give me a hint?
EDIT: attached a sample file.
Actually it should be quite easy (but the fact that I'm reading your question means that having the same problem, I first googled it before looking up the documentation... ;)
When loading the dataset, use the following command (adjusted to your case of course):
cell_dat{1} = dataset('File', 'YourDataFile.csv', 'Delimiter', ';',...
'ReadObsNames', true);
The 'ReadObsNames' default is false. It takes the header of the first column and saves it in the file or range as the name of the first dimension in A.Properties.DimNames.
(see the Documentation, Section: "Name/value pairs available when using text files or Excel spreadsheets as inputs")
I can't download your sample file, but if you haven't yet solved the problem otherwise, just try the suggested solution and tell if it works. Glad if I could help.
You are almost there, the error message you got is basically saying that Obsname have to be strings. In your case the 'dates' variable is cell array containing doubles. So you just need to convert them to string.
mat = double(piHU(:,2:end)); % taking the data, making it a matrix array
head = piHU.Properties.VarNames % saving headers
head(1) = []; % getting rid of 'Date' from head
dates = dataset2cell(piHU(:,1)); % taking dates as column cell array, here dates are of type double. try typing on the command window class(dates{2}), you can see the output is double.
dates(1) = []; % getting rid of 'Date' from dates
dates_str=cellfun(#(s) num2str(s),dates,'UniformOutput',false); % convert dates to string, now try typing class(dates_str{2}), the output should be char
new_ds = mat2dataset(mat,'VarNames',head,'ObsNames',dates_str); % construct new dataset.

Scanning data from cell array and removing based on file extensions

I have a cell array that is a list of file names. I transposed them because I find that easier to work with. Now I am attempting to go through each line in each cell and remove the lines based on their file extension. Eventually, I want to use this list as file names to import data from. This is how I transpose the list
for i = 1:numel(F);
a = F(1,i);
b{i} = [a{:}'];
end;
The code I am using to try and read the data in each cell keeps giving me the error input must be of type double or string. Any ideas?
for i = 1:numel(b);
for k = 1:numel(b{1,i});
b(cellfun(textscan(b{1,i}(k,1),'%s.lbl',numel(b)),b))=[];
end;
end;
Thanks in advance.
EDIT: This is for MATLAB. Should have been clear on that. Thanks Brian.
EDIT2: whos for F is
Name Size Bytes Class Attributes
b 1x11 13986188 cell
while for a is
Name Size Bytes Class Attributes
a 1x1 118408 cell
From your description I am not certain how your F array looks, but assuming
F = {'file1.ext1', 'file2.ext2', 'file3.ext2', 'file2.ext1'};
you could remove all files ending with .ext2 like this:
F = F(cellfun('isempty', regexpi(F, '\.ext2$')));
regexpi, which operates on each element in the cell array, returns [] for all files not matching the expression. The cellfun call converts the cell array to a logical array with false at positions corresponding to files ending with .ext2and true for all others. The resulting array may be used as a logical index to F that returns the files that should be kept.
You're using cellfun wrong. It's signature is [A1,...,Am] = cellfun(func,C1,...,Cn). It takes a function as first argument, but you're passing it the result of textscan, which is a cell array of the matching strings. The second argument is a cell array as it should be, but it doesn't make sense to call it over and over in a loop. `cellfun´'s job is to write the loop for you when you want to do the same thing to every cell in a cell array.
Instead of parsing the filename yourself with textscan, I suggest you use fileparts
Since you're already looping over the cell array in transpose-step, it might make sense to do the filtering there. It might look something like this:
for i = 1:numel(F);
a = F(1,i);
[~,~,ext] = fileparts(a{:});
if strcmpi(ext, '.lbl')
b{i} = [a{:}'];
end
end;

If a MATLAB function returns a variable number of values, how can I get all of them as a cell array?

I am writing a function to remove some values from a cell array, like so:
function left = remove(cells, item);
left = cells{cellfun(#(i) ~isequal(item, i), cells)};
But when I run this, left has only the first value, as the call to cells{} with a logical array returns all of the matching cells as separate values. How do I group these separate return values into a single cell array?
Also, perhaps there is already a way to remove a given item from a cell array? I could not find it in the documentation.
You have to use () instead of {} to index the cells:
function left = remove(cells, item)
left = cells(cellfun(#(i) ~isequal(item, i), cells));
Using () for indexing will give you a subset of cells, while using {} will return the contents of a subset of cells as a comma-separated list, and only the first entry of that list will get placed in left in your example.
You can check out this MATLAB documentation for more information on using cell arrays.
EDIT: Response to comment...
If you have an operation that ends up giving you a comma-separated list, you can place the individual elements of the list into cells of a cell array by surrounding the operation with curly braces. For your example, you could do:
left = {cells{cellfun(#(i) ~isequal(item, i), cells)}};
The inner set of curly braces creates a comma-separated list of the contents of cells that are not equal to item, and the outer set then collects this list into a cell array. This will, of course, give the same result as just using parentheses for the indexing, which is the more sensible approach in this case.
If you have a function that returns multiple output arguments, and you want to collect these multiple values into a cell array, then it's a bit more complicated. You first have to decide how many output arguments you will get, or you can use the function NARGOUT to get all possible outputs:
nOut = 3; %# Get the first three output arguments
%# Or...
nOut = nargout(#some_fcn); %# Get all the output arguments from some_fcn
Then you can collect the outputs into a 1-by-nOut cell array outArgs by doing the following:
[outArgs{1:nOut}] = some_fcn(...);
It should be noted that NARGOUT will return a negative value if the function has a variable number of output arguments, so you will have to choose the value for nOut yourself in such a case.

Iterating through struct fieldnames in MATLAB

My question is easily summarized as: "Why does the following not work?"
teststruct = struct('a',3,'b',5,'c',9)
fields = fieldnames(teststruct)
for i=1:numel(fields)
fields(i)
teststruct.(fields(i))
end
output:
ans = 'a'
??? Argument to dynamic structure reference must evaluate to a valid field name.
Especially since teststruct.('a') does work. And fields(i) prints out ans = 'a'.
I can't get my head around it.
You have to use curly braces ({}) to access fields, since the fieldnames function returns a cell array of strings:
for i = 1:numel(fields)
teststruct.(fields{i})
end
Using parentheses to access data in your cell array will just return another cell array, which is displayed differently from a character array:
>> fields(1) % Get the first cell of the cell array
ans =
'a' % This is how the 1-element cell array is displayed
>> fields{1} % Get the contents of the first cell of the cell array
ans =
a % This is how the single character is displayed
Since fields or fns are cell arrays, you have to index with curly brackets {} in order to access the contents of the cell, i.e. the string.
Note that instead of looping over a number, you can also loop over fields directly, making use of a neat Matlab features that lets you loop through any array. The iteration variable takes on the value of each column of the array.
teststruct = struct('a',3,'b',5,'c',9)
fields = fieldnames(teststruct)
for fn=fields'
fn
%# since fn is a 1-by-1 cell array, you still need to index into it, unfortunately
teststruct.(fn{1})
end
Your fns is a cellstr array. You need to index in to it with {} instead of () to get the single string out as char.
fns{i}
teststruct.(fns{i})
Indexing in to it with () returns a 1-long cellstr array, which isn't the same format as the char array that the ".(name)" dynamic field reference wants. The formatting, especially in the display output, can be confusing. To see the difference, try this.
name_as_char = 'a'
name_as_cellstr = {'a'}
You can use the for each toolbox from http://www.mathworks.com/matlabcentral/fileexchange/48729-for-each.
>> signal
signal =
sin: {{1x1x25 cell} {1x1x25 cell}}
cos: {{1x1x25 cell} {1x1x25 cell}}
>> each(fieldnames(signal))
ans =
CellIterator with properties:
NumberOfIterations: 2.0000e+000
Usage:
for bridge = each(fieldnames(signal))
signal.(bridge) = rand(10);
end
I like it very much. Credit of course go to Jeremy Hughes who developed the toolbox.