Importing data with engineering notation into Matlab - matlab

I've got a .xls file and I want to import it into Matlab by xlsread function..I get NaNs for numbers with engineering notation..like I get NaNs for 15.252 B or 1.25 M
Any suggestions?
Update: I can use [num,txt,raw] = xlsread('...') and the raw one is exactly what I want but how can I replace the Ms with (*106)?

First you could extract everything from excel in a cell array using
[~,~,raw] = xlsread('MyExcelFilename.xlsx')
Then you could write a simple function that returns a number from the string based on 'B', 'M' and so on. Here is such an example:
function mynumber = myfunc( mystring )
% get the numeric part
my_cell = regexp(mystring,'[0-9.]+','match');
mynumber = str2double(my_cell{1});
% get ending characters
my_cell = regexp(mystring,'[A-z]+','match');
mychars = my_cell{1};
% multiply the number based on char
switch mychars
case 'B'
mynumber = mynumber*1e9;
case 'M'
mynumber = mynumber*1e6;
otherwise
end
end
Of course there are other methods to split the numeric string from the rest, use what you want. For more info see the regexp documentation. Finally use cellfun to convert cell array to numeric array:
my_array = cellfun(#myfunc,raw);

EDIT:
Matlab does not offer any built-in formatting of strings in engineering format.
Source: http://se.mathworks.com/matlabcentral/answers/892-engineering-notation-printed-into-files
In the source you will find also function which would be helpful for you.

Related

Read specific portions of an excel file based on string values in MATLAB

I have an excel file and I need to read it based on string values in the 4th column. I have written the following but it does not work properly:
[num,txt,raw] = xlsread('Coordinates','Centerville');
zn={};
ctr=0;
for i = 3:size(raw,1)
tf = strcmp(char(raw{i,4}),char(raw{i-1,4}));
if tf == 0
ctr = ctr+1;
end
zn{ctr}=raw{i,4};
end
data=zeros(1,10); % 10 corresponds to the number of columns I want to read (herein, columns 'J' to 'S')
ctr=0;
for j = 1:length(zn)
for i=3:size(raw,1)
tf=strcmp(char(raw{i,4}),char(zn{j}));
if tf==1
ctr=ctr+1;
data(ctr,:,j)=num(i-2,10:19);
end
end
end
It gives me a "15129x10x22 double" thing and when I try to open it I get the message "Cannot display summaries of variables with more than 524288 elements". It might be obvious but what I am trying to get as the output is 'N = length(zn)' number of matrices which represent the data for different strings in the 4th column (so I probably need a struct; I just don't know how to make it work). Any ideas on how I could fix this? Thanks!
Did not test it, but this should help you get going:
EDIT: corrected wrong indexing into raw vector. Also, depending on the format you might want to restrict also the rows of the raw matrix. From your question, I assume something like selector = raw(3:end,4); and data = raw(3:end,10:19); should be correct.
[~,~,raw] = xlsread('Coordinates','Centerville');
selector = raw(:,4);
data = raw(:,10:19);
[selector,~,grpidx] = unique(selector);
nGrp = numel(selector);
out = cell(nGrp,1);
for i=1:nGrp
idx = grpidx==i;
out{i} = cell2mat(data(idx,:));
end
out is the output variable. The key here is the variable grpidx that is an output of the unique function and allows you to trace back the unique values to their position in the original vector. Note that unique as I used it may change the order of the string values. If that is an issue for you, use the setOrderparameter of the unique function and set it to 'stable'

Creating a function with variable number of inputs?

I am trying to define the following function in MATLAB:
file = #(var1,var2,var3,var4) ['var1=' num2str(var1) 'var2=' num2str(var2) 'var3=' num2str(var3) 'var4=' num2str(var4)'];
However, I want the function to expand as I add more parameters; if I wanted to add the variable vark, I want the function to be:
file = #(var1,var2,var3,var4,vark) ['var1=' num2str(var1) 'var2=' num2str(var2) 'var3=' num2str(var3) 'var4=' num2str(var4) 'vark=' num2str(vark)'];
Is there a systematic way to do this?
Use fprintf with varargin for this:
f = #(varargin) fprintf('var%i= %i\n', [(1:numel(varargin));[varargin{:}]])
f(5,6,7,88)
var1= 5
var2= 6
var3= 7
var4= 88
The format I've used is: 'var%i= %i\n'. This means it will first write var then %i says it should input an integer. Thereafter it should write = followed by a new number: %i and a newline \n.
It will choose the integer in odd positions for var%i and integers in the even positions for the actual number. Since the linear index in MATLAB goes column for column we place the vector [1 2 3 4 5 ...] on top, and the content of the variable in the second row.
By the way: If you actually want it on the format you specified in the question, skip the \n:
f = #(varargin) fprintf('var%i= %i', [(1:numel(varargin));[varargin{:}]])
f(6,12,3,15,5553)
var1= 6var2= 12var3= 3var4= 15var5= 5553
Also, you can change the second %i to floats (%f), doubles (%d) etc.
If you want to use actual variable names var1, var2, var3, ... in your input then I can only say one thing: Don't! It's a horrible idea. Use cells, structs, or anything else than numbered variable names.
Just to be crytsal clear: Don't use the output from this in MATLAB in combination with eval! eval is evil. The Mathworks actually warns you about this in the official documentation!
How about calling the function as many times as the number of parameters? I wrote this considering the specific form of the character string returned by your function where k is assumed to be the index of the 'kth' variable to be entered. Array var can be the list of your numeric parameters.
file=#(var,i)[strcat('var',num2str(i),'=') num2str(var) ];
var=[2,3,4,5];
str='';
for i=1:length(var);
str=strcat(str,file(var(i),i));
end
If you want a function to accept a flexible number of input arguments, you need varargin.
In case you want the final string to be composed of the names of your variables as in your workspace, I found no way, since you need varargin and then it looks impossible. But if you are fine with having var1, var2 in your string, you can define this function and then use it:
function str = strgen(varargin)
str = '';
for ii = 1:numel(varargin);
str = sprintf('%s var%d = %s', str, ii, num2str(varargin{ii}));
end
str = str(2:end); % to remove the initial blank space
It is also compatible with strings. Testing it:
% A = pi;
% B = 'Hello!';
strgen(A, B)
ans =
var1 = 3.1416 var2 = Hello!

Convert dataset column to obsnames

I have many large dataset arrays in my workspace (loaded from a .mat file).
A minimal working example is like this
>> disp(old_ds)
Date firm1 firm2 firm3 firm4
734692 880,0 102,1 32,7 204,2
734695 880,0 102,0 30,9 196,4
734696 880,0 100,0 30,9 200,2
734697 880,0 101,4 30,9 200,2
734698 880,0 100,8 30,9 202,2
where the first row (with the strings) already are headers in the dataset, that is they are already displayed if I run old_ds.Properties.VarNames.
I'm wondering whether there is an easy and/or fast way to make the first column as ObsNames.
As a first approach, I've thought of "exporting" the data matrix (columns 2 to 5, in the example), the vector of dates and then creating a new dataset where the rows have names.
Namely:
>> mat = double(old_ds(:,2:5)); % taking the data, making it a matrix array
>> head = old_ds.Properties.VarNames % saving headers
>> head(1,1) = []; % getting rid of 'Date' from head
>> dates = dataset2cell(old_ds(:,1)); % taking dates as column cell array
>> dates(1) = []; % getting rid of 'Date' from dates
>> new_ds = mat2dataset(mat,'VarNames',head,'ObsNames',dates);
Apart from the fact that the last line returns the following error, ...
Error using setobsnames (line 25)
NEWNAMES must be a nonempty string or a cell array of nonempty strings.
Error in dataset (line 377)
a = setobsnames(a,obsnamesArg);
Error in mat2dataset (line 75)
d = dataset(vars{:},args{:});
...I would have found a solution, then created a function (such to generalize the process for all 22 dataset arrays that I have) and then run the function 22 times (once for each dataset array).
To put things into perspective, each dataset has 7660 rows and a number of columns that ranges from 2 to 1320.
I have no idea about how I could (and if I could) make the dataset directly "eat" the first column as ObsNames.
Can anyone give me a hint?
EDIT: attached a sample file.
Actually it should be quite easy (but the fact that I'm reading your question means that having the same problem, I first googled it before looking up the documentation... ;)
When loading the dataset, use the following command (adjusted to your case of course):
cell_dat{1} = dataset('File', 'YourDataFile.csv', 'Delimiter', ';',...
'ReadObsNames', true);
The 'ReadObsNames' default is false. It takes the header of the first column and saves it in the file or range as the name of the first dimension in A.Properties.DimNames.
(see the Documentation, Section: "Name/value pairs available when using text files or Excel spreadsheets as inputs")
I can't download your sample file, but if you haven't yet solved the problem otherwise, just try the suggested solution and tell if it works. Glad if I could help.
You are almost there, the error message you got is basically saying that Obsname have to be strings. In your case the 'dates' variable is cell array containing doubles. So you just need to convert them to string.
mat = double(piHU(:,2:end)); % taking the data, making it a matrix array
head = piHU.Properties.VarNames % saving headers
head(1) = []; % getting rid of 'Date' from head
dates = dataset2cell(piHU(:,1)); % taking dates as column cell array, here dates are of type double. try typing on the command window class(dates{2}), you can see the output is double.
dates(1) = []; % getting rid of 'Date' from dates
dates_str=cellfun(#(s) num2str(s),dates,'UniformOutput',false); % convert dates to string, now try typing class(dates_str{2}), the output should be char
new_ds = mat2dataset(mat,'VarNames',head,'ObsNames',dates_str); % construct new dataset.

Using a string to refer to a structure array - matlab

I am trying to take the averages of a pretty large set of data, so i have created a function to do exactly that.
The data is stored in some struct1.struct2.data(:,column)
there are 4 struct1 and each of these have between 20 and 30 sub-struct2
the data that I want to average is always stored in column 7 and I want to output the average of each struct2.data(:,column) into a 2xN array/double (column 1 of this output is a reference to each sub-struct2 column 2 is the average)
The omly problem is, I can't find a way (lots and lots of reading) to point at each structure properly. I am using a string to refer to the structures, but I get error Attempt to reference field of non-structure array. So clearly it doesn't like this. Here is what I used. (excuse the inelegence)
function [avrg] = Takemean(prefix,numslits)
% place holder arrays
avs = [];
slits = [];
% iterate over the sub-struct (struct2)
for currslit=1:numslits
dataname = sprintf('%s_slit_%02d',prefix,currslit);
% slap the average and slit ID on the end
avs(end+1) = mean(prefix.dataname.data(:,7));
slits(end+1) = currslit;
end
% transpose the arrays
avs = avs';
slits = slits';
avrg = cat(2,slits,avs); % slap them together
It falls over at this line avs(end+1) = mean(prefix.dataname.data,7); because as you can see, prefix and dataname are strings. So, after hunting around I tried making these strings variables with genvarname() still no luck!
I have spent hours on what should have been 5min of coding. :'(
Edit: Oh prefix is a string e.g. 'Hs' and the structure of the structures (lol) is e.g. Hs.Hs_slit_XX.data() where XX is e.g. 01,02,...27
Edit: If I just run mean(Hs.Hs_slit_01.data(:,7)) it works fine... but then I cant iterate over all of the _slit_XX
If you simply want to iterate over the fields with the name pattern <something>_slit_<something>, you need neither the prefix string nor numslits for this. Pass the actual structure to your function, extract the desired fields and then itereate them:
function avrg = Takemean(s)
%// Extract only the "_slit_" fields
names = fieldnames(s);
names = names(~cellfun('isempty', strfind(names, '_slit_')));
%// Iterate over fields and calculate means
avrg = zeros(numel(names), 2);
for k = 1:numel(names)
avrg(k, :) = [k, mean(s.(names{k}).data(:, 7))];
end
This method uses dynamic field referencing to access fields in structs using strings.
First of all, think twice before you use string construction to access variables.
If you really really need it, here is how it can be used:
a.b=123;
s1 = 'a';
s2 = 'b';
eval([s1 '.' s2])
In your case probably something like:
Hs.Hs_slit_01.data= rand(3,7);
avs = [];
dataname = 'Hs_slit_01';
prefix = 'Hs';
eval(['avs(end+1) = mean(' prefix '.' dataname '.data(:,7))'])

Iterating through struct fieldnames in MATLAB

My question is easily summarized as: "Why does the following not work?"
teststruct = struct('a',3,'b',5,'c',9)
fields = fieldnames(teststruct)
for i=1:numel(fields)
fields(i)
teststruct.(fields(i))
end
output:
ans = 'a'
??? Argument to dynamic structure reference must evaluate to a valid field name.
Especially since teststruct.('a') does work. And fields(i) prints out ans = 'a'.
I can't get my head around it.
You have to use curly braces ({}) to access fields, since the fieldnames function returns a cell array of strings:
for i = 1:numel(fields)
teststruct.(fields{i})
end
Using parentheses to access data in your cell array will just return another cell array, which is displayed differently from a character array:
>> fields(1) % Get the first cell of the cell array
ans =
'a' % This is how the 1-element cell array is displayed
>> fields{1} % Get the contents of the first cell of the cell array
ans =
a % This is how the single character is displayed
Since fields or fns are cell arrays, you have to index with curly brackets {} in order to access the contents of the cell, i.e. the string.
Note that instead of looping over a number, you can also loop over fields directly, making use of a neat Matlab features that lets you loop through any array. The iteration variable takes on the value of each column of the array.
teststruct = struct('a',3,'b',5,'c',9)
fields = fieldnames(teststruct)
for fn=fields'
fn
%# since fn is a 1-by-1 cell array, you still need to index into it, unfortunately
teststruct.(fn{1})
end
Your fns is a cellstr array. You need to index in to it with {} instead of () to get the single string out as char.
fns{i}
teststruct.(fns{i})
Indexing in to it with () returns a 1-long cellstr array, which isn't the same format as the char array that the ".(name)" dynamic field reference wants. The formatting, especially in the display output, can be confusing. To see the difference, try this.
name_as_char = 'a'
name_as_cellstr = {'a'}
You can use the for each toolbox from http://www.mathworks.com/matlabcentral/fileexchange/48729-for-each.
>> signal
signal =
sin: {{1x1x25 cell} {1x1x25 cell}}
cos: {{1x1x25 cell} {1x1x25 cell}}
>> each(fieldnames(signal))
ans =
CellIterator with properties:
NumberOfIterations: 2.0000e+000
Usage:
for bridge = each(fieldnames(signal))
signal.(bridge) = rand(10);
end
I like it very much. Credit of course go to Jeremy Hughes who developed the toolbox.