How can I merge redundant fields within a structure? - matlab

I have a data set with multiple fields of which several of the fields are different names for equivalent properties. I have rescaled and adjusted the data so that the quantities are comparable and want to merge them into a single field.
As a toy example, let's say I have:
s = struct('pounds', [nan nan 4.8], 'pennies', [120 370 nan]);
s.pennies = s.pennies/100;
How do I merge my incomplete fields to get the desired output:
snew = struct('pounds', [1.2 3.7 4.8]);

If you have modified your field values such that they should be equivalent, and simply need to combine the non-NaN values, one option is to vertically concatenate the fields then use min or max down each column (which will ignore the NaN values). Then just remove the unwanted field with rmfield:
>> s = struct('pounds', [nan,nan,4.8], 'pennies', [120,370,nan]);
>> s.pounds = min([s.pounds; s.pennies./100], [], 1); % Scaling included here
>> s = rmfield(s, 'pennies')
s =
struct with fields:
pounds: [1.2000 3.7000 4.8000]
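If you want to convince yourself that min really skips the NaN entries column by column, here is a minimal sketch on the toy data. The variable names pounds, pennies and merged are only illustrative, and the final check assumes every position is covered by at least one field:

```matlab
% Toy data from the question, with pennies already rescaled to pounds
pounds  = [nan nan 4.8];
pennies = [120 370 nan] / 100;

% Stack the fields as rows; min along dimension 1 ignores NaN entries
merged = min([pounds; pennies], [], 1);

% A NaN would only survive if every field were NaN at that position
assert(~any(isnan(merged)));
```

If the fields can disagree where both are non-NaN, min silently picks the smaller value, so it may be worth comparing min and max before merging.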

The following works for any number of fields. Since it is guaranteed that only one field is not NaN at each position, you can:
1. Convert to a matrix such that each original field becomes a row of the matrix.
2. Keep only the numbers, ignoring NaNs. By assumption, this gives exactly one number per column.
3. Arrange that into a struct with the desired field name.
s = struct('pounds',[nan,nan,4.8], 'pennies', [120,370,nan])
s.pennies = s.pennies/100; % example data
target_field = 'pounds'; % field to which the conversion has been done
t = struct2cell(s); % convert struct to cell array
t = vertcat(t{:}); % convert cell array to matrix
t = t(~isnan(t)).'; % keep only numbers, ignoring NaN's
result = struct(target_field, t); % arrange into a struct

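Try the two-liner below. Note that unique treats every NaN as distinct and returns its output sorted, so the NaNs have to be stripped afterwards, and the result matches the desired output only when the merged values are distinct and already in ascending order:
c=struct2cell(s);
v=unique([c{:}]); s=struct('pounds',v(~isnan(v)));
The same idea also fits in a one-liner (rmmissing requires R2016b or later):
s=struct('pounds',rmmissing(unique(cell2mat(cellfun(@(x) x(:), struct2cell(s),'UniformOutput',false))))')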

Related

Using Matlab to randomly split an Excel Sheet

I have an Excel sheet containing 1838 records and I need to RANDOMLY split these records into 3 Excel Sheets. I am trying to use Matlab but I am quite new to it and I have just managed the following code:
[xlsn, xlst, raw] = xlsread('data.xls');
numrows = 1838;
randindex = ceil(3*rand(numrows, 1));
raw1 = raw(:,randindex==1);
raw2 = raw(:,randindex==2);
raw3 = raw(:,randindex==3);
Your general procedure will be to read the spreadsheet into some matlab variables, operate on those matrices such that you end up with three thirds and then write each third back out.
So you've got the read covered with xlsread, which in your code returns the numeric matrix xlsn and the text cell array xlst alongside raw. I would suggest using the syntax
[~, ~, raw] = xlsread('data.xls');
In the xlsread help file (you can access this by typing doc xlsread into the command window) it says that the three output arguments hold the numeric cells, the text cells and the whole lot. This is because a matlab matrix can only hold one type of value and a spreadsheet will usually be expected to have text or numbers. The raw value will hold all of the values but in a 'cell array' instead, a different kind of matlab data type.
So then you will have a cell array called raw. From here you want to do four things:
work out how many rows you have (I assume each record is a row) by using the size function and specifying the appropriate dimension (again check the help file to see how to do this)
create an index of random numbers between 1 and 3 inclusive, which you can use as a mask
randindex = ceil(3*rand(numrows, 1));
apply the mask to your cell array to extract the records matching each index
raw1 = raw(randindex==1,:); % records are rows, so the mask goes in the first index; do the same for the other two index values
write each cell back to a file
xlswrite('output1.xls', raw1);
You will probably have to fettle the arguments to get it to work the way you want but be sure to check the doc functionname page to get the syntax just right. Your main concern will be to get the indexing correct - matlab indexes row-first whereas spreadsheets tend to be column-first (e.g. cell A2 is column A and row 2, but matlab matrix element M(1,2) is the first row and the second column of matrix M, i.e. cell B1).
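Putting the steps together, a minimal sketch might look like this. The cell array raw here is a made-up stand-in for what xlsread would return, the output file names are hypothetical, and the xlswrite call is commented out so the snippet runs without Excel:

```matlab
% Stand-in for [~, ~, raw] = xlsread('data.xls'): 10 records, 2 columns
raw = num2cell(reshape(1:20, 10, 2));

numrows = size(raw, 1);                  % one record per row
randindex = ceil(3*rand(numrows, 1));    % random group label 1..3 per record

parts = cell(1, 3);
for g = 1:3
    % the logical mask selects rows (records) and keeps every column
    parts{g} = raw(randindex == g, :);
    % xlswrite(sprintf('output%d.xls', g), parts{g});  % hypothetical file names
end
```

Each record ends up in exactly one of the three parts, but the parts will generally not be the same size.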
UPDATE: to split the file evenly is surprisingly more trouble: because we're using random numbers for the index it's not guaranteed to split evenly. So instead we can generate a vector of random floats and then pick out the lowest 33% of them to make index 1, the highest 33% to make index 3 and let the rest be 2.
randvec = rand(numrows, 1); % float between 0 and 1
pct33 = prctile(randvec,100/3); % value of 33rd percentile
pct67 = prctile(randvec,200/3); % value of 67th percentile
randindex = ones(numrows,1);
randindex(randvec>pct33) = 2;
randindex(randvec>pct67) = 3;
It probably still won't be absolutely even - 1838 isn't a multiple of 3. You can see how many members each group has this way
numel(find(randindex==1))
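A different technique from the percentile approach above, and one that avoids the Statistics Toolbox function prctile, is to shuffle the row numbers with randperm and deal the labels out in turn. This is only a sketch; perm and counts are names I made up:

```matlab
numrows = 1838;

% Random ordering of the row numbers
perm = randperm(numrows);

% Deal labels 1,2,3,1,2,3,... to the rows in shuffled order
randindex = zeros(numrows, 1);
randindex(perm) = mod(0:numrows-1, 3) + 1;

% Number of rows that received each label
counts = accumarray(randindex, 1)';
```

With this the group sizes differ by at most one row, which is as even as 1838 rows allow.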

Importing text file into matrix form with indexes as strings?

I'm new to Matlab so bear with me. I have a text file in this form :
b0002 b0003 999
b0002 b0004 999
b0002 b0261 800
I need to read this file and convert it into a matrix. The first and second column in the text file are analogous to row and column of a matrix(the indices). I have another text file with a list of all values of 'indices'. So it should be possible to create an empty matrix beforehand.
b0002
b0003
b0004
b0005
b0006
b0007
b0008
Is there any way to access matrix elements using custom string indices (I doubt it but just wondering)? If not, I'm guessing the only way to do this is to assign the first row and first column the index string values and then assign the third column values based on the first text file. Can anyone help me with that?
You can easily convert those strings to numbers and then use those as indices. For a given string, b0002:
s = 'b0002'
str2num(s(2:end)) % output = 2
Furthermore, you can also do this with a char matrix:
t = ['b0002';'b0003';'b0004']
t =
b0002
b0003
b0004
str2num(t(:,2:end))
ans =
2
3
4
First, we use textscan to read the data in as two strings and a float (other numerical formats would also work). We have to open the file for reading first, and close it once the data has been read.
fid = fopen('myfile.txt');
A = textscan(fid,'%s%s%f');
fclose(fid);
textscan returns a cell array, so we have to extract your three variables. x and y are converted to single char arrays using cell2mat (works only if all the strings inside are the same length), n is a list of numbers.
x = cell2mat(A{1});
y = cell2mat(A{2});
n = A{3};
We can now convert x and y to numbers by telling it to take every row : but only the second to final part of the row 2:end, e.g. 0002, 0003, not b0002, b0003.
x = str2num(x(:,2:end));
y = str2num(y(:,2:end));
Slight problem with indexing - if I have a matrix A and I do this:
A = magic(8);
A([1,5],[3,8])
Then it returns four elements - [1,3],[5,3],[1,8],[5,8] - not two. But what you want is the location in your matrix equivalent to x(1),y(1) to be set to n(1) and so on. To do this, we need to 1) work out the final size of matrix. 2) use sub2ind to calculate the right locations.
% find the size
% if you have a specified size you want the output to be use that instead
xsize = max(x);
ysize = max(y);
% initialise the output matrix - not always necessary but good practice
out = zeros(xsize,ysize);
% turn our x,y into linear indices
ind = sub2ind([xsize,ysize],x,y);
% put our numbers in our output matrix
out(ind) = n;
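If the labels are not simply a letter followed by the matrix position, another option is to look each label up in the list from your second file with ismember and use its position as the index. A minimal sketch, where labels, from and to are hypothetical stand-ins for the file contents:

```matlab
% Hypothetical contents of the index file (one label per line)
labels = {'b0002'; 'b0003'; 'b0004'; 'b0261'};

% Hypothetical contents of the data file: row label, column label, value
from = {'b0002'; 'b0002'; 'b0002'};
to   = {'b0003'; 'b0004'; 'b0261'};
n    = [999; 999; 800];

% Position of each label within the index list (0 would mean "not found")
[~, r] = ismember(from, labels);
[~, c] = ismember(to, labels);

% Square matrix with one row and one column per label
out = zeros(numel(labels));
out(sub2ind(size(out), r, c)) = n;
```

This keeps the matrix as small as the label list, whereas converting b0261 to the number 261 would force at least 261 columns.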

Matlab: Cell column with mixed char/double entries - how to make all numerical?

I'm importing large datasets into Matlab from different Excel files. I use [~,~,raw] = xlsread('myfile.xlsx') to obtain a raw input into a single Matlab cell.
One column consists of interest rates, and the entries are imported as either CHAR (if they're decimal numbers) or DOUBLE (if they're rounded to integers).
Now, I want to slice out that column and get a numerical vector, which Matlab doesn't like. If i use str2num, all the CHAR entries are converted into DOUBLE, but the DOUBLES becomes NaN. Is there a function/solution to take into account that some entries are already DOUBLE?
You can probably work this into your existing code rather than create a whole new function, but this should work for you. The function is not vectorized, but since the input is a cell vector I don't think that's an issue.
function number = str2numThatHandelsNumericInputs(obj)
if isnumeric(obj)
number = obj;
else
number = str2num(obj);
end
end
Or, as Eitan points out, a better function:
function num = str2numThatHandelsNumericInputs(num)
if ischar(num)
num = str2num(num);
end
end
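To apply the same logic to a whole column without a separate function file, a plain loop is enough. This is only a sketch: col is a made-up example column, and I have used str2double in place of str2num since each cell holds a single number:

```matlab
% Hypothetical column as it might come out of xlsread: text and numbers mixed
col = {'1.2345'; 3; 4; '567.1232'};

rates = zeros(size(col));
for k = 1:numel(col)
    if ischar(col{k})
        rates(k) = str2double(col{k});   % text entry: parse it
    else
        rates(k) = col{k};               % already numeric: copy as is
    end
end
```

Text that cannot be parsed comes out of str2double as NaN, which makes bad entries easy to spot afterwards with isnan.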
I think I didn't quite understand your question, because I understood you have something like this:
raw = {...
'1.2345' , NaN
3 , inf
4 , @cos
'567.1232' , { struct }
};
In which case you could just use str2double:
>> inds = cellfun('isclass', raw(:,1), 'char'); % indices to non-numeric data
>> raw(inds,1) = num2cell(str2double(raw(inds,1))); % convert in-place
>> [raw{:,1}].' % extract numeric array
ans =
1.2345
3.0000
4.0000
567.1232
But is this what you mean?

Iteratively take mean of column in Matlab

Hi, I have a column of values in Matlab (PDS(:,39)). This column is filtered for various things and there are two separate flagging columns (PDS(:,[41 81])) that are either 0 for a valid row or -1 for a non-valid row. I am taking the mean of the valid data, and if the mean is above a threshold, I'd like to mark the largest value as non-valid and take the mean again, until the mean is below that threshold (0.2 in this instance). Here is my code:
% identify the VALID values
U1 = (PDS(:,81)==0);
F1 = (PDS(:,41)==0);
% only calculate using the valid elements
shearave = mean(PDS(U1&F1,39));
while shearave > 0.2
clear im
% determine the largest shear value overall for filtered and
% non-flagged
[c im] = max(PDS(U1&F1,39));
% make this value a NaN
PDS(im,39)=NaN;
% filter using a specific column and the overall column
PDS(im,41)=-1;
F1 = (PDS(:,41)==0);
% calculate shear ave again using new flagging column - remove the ";" so I can see the average change
shearave = mean(PDS(U1&F1,39))
end
The output that Matlab gives me is:
shearave =
0.3032
shearave =
0.3032
shearave =
0.3032
etc
The loop is not re-evaluating with the new valid data. How do I solve this problem? Do I have to use a break or continue? Or perhaps a different type of loop? Thanks for any help.
You don't need to use a loop, I'd do the following:
sort your data:
m=PDS(U1&F1,39);
[x isort]=sort(m);
Then calculate the cumulative mean of the sorted vector:
y = cumsum(x)./[1:numel(x)]';
Then truncate at 0.2, and retrieve the values needed using the indices found ...
ind=find(y<=0.2);
values_needed=m(isort(ind));
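As a small worked example of the cumulative-mean idea (the five values in m are made up for illustration):

```matlab
% Hypothetical valid shear values
m = [0.1; 0.9; 0.2; 0.05; 0.3];

[x, isort] = sort(m);               % ascending, so large values come last
y = cumsum(x) ./ (1:numel(x))';     % mean of the k smallest values, for each k

ind = find(y <= 0.2);               % prefixes whose mean stays within the limit
values_needed = m(isort(ind));      % the values that are kept
```

Because x is sorted in ascending order, the running mean y never decreases, so the entries that pass the test always form a leading block and find picks exactly the values to keep. Here only the largest value, 0.9, is dropped.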
You iteratively replace values in column 39 with NaN. However, mean will not ignore NaN, but instead return NaN as the new average. You can see this with a little experiment:
>> mean([3, 4, 2, NaN, 4, 1])
ans = NaN
Therefore, shearave < 0.2 will never be true.
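If you would rather keep the loop, one thing to watch is that max(PDS(U1&F1,39)) returns a position within the filtered subset, not a row number of PDS. A sketch that keeps the list of valid row numbers explicitly, on a small made-up PDS (the name valid is my own), could look like this:

```matlab
% Hypothetical data: 6 rows, shear in column 39, flags in columns 41 and 81
PDS = zeros(6, 81);
PDS(:,39) = [0.1; 0.9; 0.2; 0.5; 0.05; 0.3];
PDS(4,81) = -1;                          % row 4 is already flagged as non-valid

% Row numbers (into PDS) of the currently valid rows
valid = find(PDS(:,81) == 0 & PDS(:,41) == 0);
shearave = mean(PDS(valid,39));

while shearave > 0.2
    [~, k] = max(PDS(valid,39));         % k is a position within valid
    PDS(valid(k),41) = -1;               % flag the matching row of PDS
    valid(k) = [];                       % and drop it from the valid list
    shearave = mean(PDS(valid,39));
end
```

The flagged values are left untouched rather than overwritten with NaN, so no NaN ever reaches mean.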

Indexing must appear last in an index expression

I have a vector CD1 (120-by-1) and I separate CD1 into 6 parts. For example, the first part is extracted from row 1 to row 20 in CD1, and second part is extracted from row 21 to row 40 in CD1, etc. For each part, I need to compute the means of the absolute values of second differences of the data.
for PartNo = 1:6
% extract data
Y(PartNo) = CD1(1 + 20*(PartNo-1):20*(PartNo),:);
% find the second difference
Z(PartNo) = Y(PartNo)(3:end) - Y(PartNo)(1:end-2);
% mean of absolute value
MEAN_ABS_2ND_DIFF_RESULT(PartNo) = mean(abs(Z));
end
However, the commands above produce the error:
()-indexing must appear last in an index expression for Line:2
Any ideas to change the code to have it do what I want?
This error is often encountered when Y is a cell-array. For cell arrays,
Y{1}(1:3)
is legal. Curly braces ({}) mean data extraction, so this means you are extracting the array stored in location 1 in the cell array, and then referencing the elements 1 through 3 of that array.
The notation
Y(1)(1:3)
is different in that it does not extract data, but it references the cell's location 1. This means the first part (Y(1)) returns a cell-array which, in your case, contains a single array. So you won't have direct access to the regular array as before.
It is an infamous limitation in Matlab that you cannot do indirect or double-referencing, which is in effect what you are doing here.
Hence the error.
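A small sketch of the difference, using a made-up cell array Y:

```matlab
% Hypothetical cell array holding two numeric vectors
Y = {[10 20 30 40], [5 6 7]};

a = Y{1}(1:3);      % curly braces extract the vector, then () indexes into it

b = Y(1);           % parentheses return a 1-by-1 cell, not the vector itself

% Y(1)(1:3) is a syntax error; going through a temporary variable works
tmp = Y(1);
c = tmp{1}(1:3);
```

Here a and c hold the same three numbers, while b is still a cell.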
Now, to resolve: I suspect replacing a few parentheses with curly braces will do the trick:
Y{PartNo} = CD1(1+20*(PartNo-1):20*PartNo,:); % extract data
Z{PartNo} = Y{PartNo}(3:end)-Y{PartNo}(1:end-2); % find the second difference
MEAN_ABS_2ND_DIFF_RESULT{PartNo} = mean(abs(Z{PartNo})); % mean of absolute value
I might suggest a different approach
Y = reshape(CD1, 20, 6);
Z = Y(3:end,:) - Y(1:end-2,:);
MEAN_ABS_2ND_DIFF_RESULT = mean(abs(Z));
This is not a valid statement in matlab:
Y(PartNo)(3:end)
You should either make Y two-dimensional and use this indexing
Y(PartNo, 3:end)
or extract vector parts and use them directly, if you use a loop like you have shown
for PartNo = 1:6
% extract data
Y = CD1(1 + 20*(PartNo-1):20*(PartNo),:);
% find the second difference
Z = Y(3:end) - Y(1:end-2);
% mean of absolute value
MEAN_ABS_2ND_DIFF_RESULT(PartNo) = mean(abs(Z));
end
Also, since CD1 is a vector, you do not need to index the second dimension. Drop the :
Y = CD1(1 + 20*(PartNo-1):20*(PartNo));
Finally, you do not need a loop. You can reshape the CD1 vector to a two-dimensional array Y of size 20x6, in which the columns are your parts, and work directly on the resulting matrix:
Y = reshape(CD1, 20, 6);
Z = Y(3:end,:)-Y(1:end-2,:);
MEAN_ABS_2ND_DIFF_RESULT = mean(abs(Z));
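To check that the reshape version gives the same numbers as the loop, here is a minimal sketch on made-up data (CD1 is just the squares of 1 to 120; result and check are my own names):

```matlab
% Hypothetical 120-by-1 input vector
CD1 = ((1:120)').^2;

% Vectorised version: each column of Y is one 20-row part
Y = reshape(CD1, 20, 6);
Z = Y(3:end,:) - Y(1:end-2,:);          % 18-by-6, same formula as in the loop
result = mean(abs(Z), 1);               % one mean per part

% Loop version for comparison
check = zeros(1, 6);
for PartNo = 1:6
    part = CD1(1 + 20*(PartNo-1) : 20*PartNo);
    check(PartNo) = mean(abs(part(3:end) - part(1:end-2)));
end
```

Both index ranges must have the same number of rows (18 here), which is why the second one stops at end-2.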