How do I extract values from an array based on a common value? - matlab

I am trying to create Pareto charts based on a dataset from Excel. The dataset has three columns: "Comment", "part", and "number". The values in comment and number repeat because they are general, while the part is independent. As such, I need to group them by part.
I've been able to create two Pareto charts. By getting the unique part numbers and counting the number of occurrences of unique comments, I have been able to create a plot of number of comments (y-axis) and part (x-axis). The part I've been struggling with is plotting the number of comments (y-axis) by the number (x-axis) for a specified part.
Data = readtable('Example_Dataset.xlsx')
Data = Data{:,:}
part = Data(:,2) %Gets part
number = Data(:,3) %Gets number
comments = Data(:,1) %gets comment
Unique_Part= unique(part,'stable')
b = cellfun(@(x) sum(ismember(part,x)),Unique_Part,'un',0)
Unique_number = unique(number,'stable')
c = cellfun(@(x) sum(ismember(number,x)),Unique_number,'un',0)
Unique_comments = unique(comments,'stable')
comment_type =cell2mat(Unique_comments)
comments_parts = cell2mat(b)
comments_number = cell2mat(c)
figure
pareto(comments_parts,Unique_Part)
figure
pareto(comments_number,Unique_number)
A simplified dataset is shown here. It should be noted that the groups are not equal in size: some values repeat only once, others repeat numerous times. And sometimes the part is not numeric.
https://imgur.com/a/V3MxeTD
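One possible way to get the per-part counts is sketched below. It is only a sketch under assumptions: the small `part` and `number` cell arrays and the name `targetPart` are made up for illustration, and both columns are assumed to be cell arrays of character vectors (as the `cellfun` calls in the question suggest).

```matlab
% Hypothetical stand-in data; in practice these come from the spreadsheet.
part   = {'A1'; 'A1'; 'B2'; 'A1'; 'B2'; 'A1'};
number = {'10'; '20'; '10'; '10'; '30'; '20'};

targetPart = 'A1';                       % the part to inspect (assumption)
mask = strcmp(part, targetPart);         % rows belonging to that part

% Unique numbers for that part, plus a group index for every selected row.
[uniqueNumber, ~, grp] = unique(number(mask), 'stable');

% Count how many rows (comments) fall on each unique number.
counts = accumarray(grp(:), 1);
```

The result could then be plotted with `pareto(counts, uniqueNumber)`, in the same way as the two charts in the question.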

Related

Using Matlab to randomly split an Excel Sheet

I have an Excel sheet containing 1838 records and I need to RANDOMLY split these records into 3 Excel Sheets. I am trying to use Matlab but I am quite new to it and I have just managed the following code:
[xlsn, xlst, raw] = xlsread('data.xls');
numrows = 1838;
randindex = ceil(3*rand(numrows, 1));
raw1 = raw(:,randindex==1);
raw2 = raw(:,randindex==2);
raw3 = raw(:,randindex==3);
Your general procedure will be to read the spreadsheet into some matlab variables, operate on those matrices such that you end up with three thirds and then write each third back out.
So you've got the read covered with xlsread, which in your code returns the two matrices xlsn and xlst. I would suggest using the syntax
[~, ~, raw] = xlsread('data.xls');
In the xlsread help file (you can access this by typing doc xlsread into the command window) it says that the three output arguments hold the numeric cells, the text cells and the whole lot. This is because a matlab matrix can only hold one type of value and a spreadsheet will usually be expected to have text or numbers. The raw value will hold all of the values but in a 'cell array' instead, a different kind of matlab data type.
So then you will have a cell array called raw. From here you want to do three things:
work out how many rows you have (I assume each record is a row) by using the size function and specifying the appropriate dimension (again check the help file to see how to do this)
create an index of random numbers between 1 and 3 inclusive, which you can use as a mask
randindex = ceil(3*rand(numrows, 1));
apply the mask to your cell array to extract the records matching each index
raw1 = raw(randindex==1,:); % mask selects rows (records); do the same for the other two index values
write each cell back to a file
xlswrite('output1.xls', raw1);
You will probably have to fettle the arguments to get it to work the way you want but be sure to check the doc functionname page to get the syntax just right. Your main concern will be to get the indexing correct - matlab indexes row-first whereas spreadsheets tend to be column-first (e.g. cell A2 is column A and row 2, but matlab matrix element M(1,2) is the first row and the second column of matrix M, i.e. cell B1).
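The steps above can be sketched end to end as follows. This is a minimal illustration, not a tested solution for the actual file: the small synthetic `raw` cell array (and the name `nSynthetic`) stands in for the output of xlsread, and the mask is applied to the row subscript on the assumption that each record is one row.

```matlab
% Synthetic stand-in for the cell array returned by xlsread (assumption).
nSynthetic = 12;
raw = [num2cell((1:nSynthetic)'), repmat({'text'}, nSynthetic, 1)];

numrows = size(raw, 1);                  % one record per row
randindex = ceil(3*rand(numrows, 1));    % random label 1, 2 or 3 per row

% The logical mask goes in the first (row) subscript to keep whole records.
raw1 = raw(randindex == 1, :);
raw2 = raw(randindex == 2, :);
raw3 = raw(randindex == 3, :);
```

Each of `raw1`, `raw2` and `raw3` could then be written out with xlswrite as described above.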
UPDATE: to split the file evenly is surprisingly more trouble: because we're using random numbers for the index it's not guaranteed to split evenly. So instead we can generate a vector of random floats and then pick out the lowest 33% of them to make index 1, the highest 33% to make index 3 and let the rest be 2.
randvec = rand(numrows, 1); % float between 0 and 1
pct33 = prctile(randvec,100/3); % value of 33rd percentile
pct67 = prctile(randvec,200/3); % value of 67th percentile
randindex = ones(numrows,1);
randindex(randvec>pct33) = 2;
randindex(randvec>pct67) = 3;
It probably still won't be absolutely even - 1838 isn't a multiple of 3. You can see how many members each group has this way
numel(find(randindex==1))
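As an alternative to the percentile approach (a different technique, not the one used above), the labels can be laid out evenly first and then shuffled with randperm. This is a sketch; the variable names `labels`, `order` and `counts` are illustrative only.

```matlab
numrows = 1838;                          % record count from the question

% Labels 1,2,3,1,2,3,... so group sizes differ by at most one.
labels = mod((0:numrows-1)', 3) + 1;

% Shuffle the labels so that group membership is random.
order = randperm(numrows)';
randindex = labels(order);

% Number of rows assigned to each of the three groups.
counts = accumarray(randindex, 1);
```

With 1838 rows the groups cannot be exactly equal, but here they are as close as possible: two groups of 613 and one of 612.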

Average part of a multidimensional array based on another array (Matlab)

B = randn(1,25,10);
Z = [1;1;1;2;2;3;4;4;4;3];
Ok, so, I want to find the locations where Z=1(or any numbers that are equal to each other), then average across each of the 25 points at these specific locations. In the example you would end with a 1*25*4 array.
Is there an easy way to do this?
I'm not the most versed in Matlab.
First things first: break down the problem.
Define the groups (i.e. the set of unique Z values)
Find elements which belong to these groups
Take the average.
Once you have done that, you can begin to see it's a pretty standard for loop and "Select columns which meet criteria".
Something along the lines of:
B = randn(1,25,10);
Z = [1;1;1;2;2;3;4;4;4;3];
groups = unique(Z); %//find the set of groups
C = nan(1,25,length(groups)); %//predefine the output space for efficiency
for gi = 1:length(groups) %//for each group
idx = Z == groups(gi); %//find its members
C(:,:,gi) = mean(B(:,:,idx), 3); %//select and mean across the third dimension
end
If B = randn(10,25); then it's very easy, because most MATLAB functions operate along the first dimension by default (down the rows of each column).
Using logical indexing:
ind = Z == 1;
mean(B(ind,:), 1); % give the dimension explicitly so a single matching row is not averaged across its columns
If you're dealing with multiple dimensions use permute (and reshape if you actually have 3 dimensions or more) to get yourself to a point where you're averaging down the rows as above:
B = randn(1,25,10);
BB = permute(B, [3,2,1])
continue as above
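Putting the permute idea together for every group might look like the sketch below. The names `C2` and `C` are illustrative, and the final permute back to 1-by-25-by-4 is an assumption about the desired output shape, based on the question.

```matlab
B = randn(1,25,10);
Z = [1;1;1;2;2;3;4;4;4;3];

BB = permute(B, [3,2,1]);                % 10-by-25: one row per entry of Z
groups = unique(Z);

C2 = zeros(numel(groups), size(BB,2));   % one row of averages per group
for gi = 1:numel(groups)
    % Average the rows of this group; dimension 1 is given explicitly
    % so that a group with a single member is handled correctly.
    C2(gi,:) = mean(BB(Z == groups(gi), :), 1);
end

C = permute(C2, [3,2,1]);                % back to 1-by-25-by-4
```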

Matlab matching first column of a row as index and then averaging all columns in that row

I need help with taking the following data which is organized in a large matrix and averaging all of the values that have a matching ID (index) and outputting another matrix with just the ID and the averaged value that trail it.
File with data format:
(This is the StarData variable)
ID>>>>Values
002141865 3.867144e-03 742.000000 0.001121 16.155089 6.297494 0.001677
002141865 5.429278e-03 1940.000000 0.000477 16.583748 11.945627 0.001622
002141865 4.360715e-03 1897.000000 0.000667 16.863406 13.438383 0.001460
002141865 3.972467e-03 2127.000000 0.000459 16.103060 21.966853 0.001196
002141865 8.542932e-03 2094.000000 0.000421 17.452007 18.067214 0.002490
Do not be misled by the examples I posted: that first number is repeated for about 15 lines, then the ID changes, and that goes on for an entire set of different IDs; then they are repeated as a whole group again. Think first block of code = [1 2 3; 1 5 9; 2 5 7; 2 4 6], then the code repeats with different values for the columns except for the index. The main difference is the values trailing the ID, which I need to average out in MATLAB and output as a clean matrix with only one of each ID, fully averaged over all occurrences of that ID.
Thanks for any help given.
A modification of this answer does the job, as follows:
[value_sort ind_sort] = sort(StarData(:,1));
[~, ii, jj] = unique(value_sort);
n = diff([0; ii]);
averages = NaN(length(n),size(StarData,2)); % preallocate
averages(:,1) = value_sort(ii); % ii indexes the sorted IDs, not the original rows
for col = 2:size(StarData,2)
averages(:,col) = accumarray(jj,StarData(ind_sort,col))./n;
end
The result is in variable averages. Its first column contains the values used as indices, and each subsequent column contains the average for that column according to the index value.
Compatibility issues for MATLAB R2013a onwards:
The function unique changed in MATLAB R2013a. For that version onwards, add the 'legacy' flag to unique, i.e. replace the second line with
[~, ii, jj] = unique(value_sort,'legacy')
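For newer MATLAB versions, a variant that avoids both the sort and the 'legacy' flag is sketched below. It is a different formulation rather than the code above: it passes @mean to accumarray directly, and the small `StarData` matrix is the toy example from the question.

```matlab
% Toy data from the question: first column is the ID.
StarData = [1 2 3; 1 5 9; 2 5 7; 2 4 6];

% ids: sorted unique IDs; jj: for each row, the position of its ID in ids.
[ids, ~, jj] = unique(StarData(:,1));

averages = zeros(numel(ids), size(StarData,2));   % preallocate
averages(:,1) = ids;
for col = 2:size(StarData,2)
    % Group the column by ID and take the mean of each group.
    averages(:,col) = accumarray(jj(:), StarData(:,col), [], @mean);
end
```

For the toy data this gives one row per ID, [1 3.5 6; 2 4.5 6.5].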

Taking medians from within a column vector using MATLAB

I have two column vectors.
The first column vector is several thousand data points long, and I need to take the median from the first forty items, and then the median from the next forty, and so on.
The second column vector contains a group ID (from 1 to 3).
My goal is to end up with a bunch of median calculations and to have them sorted by group. I am very unsure of how to go about this in MATLAB.
Reshape your vector into a 40xN matrix, and then use median to take the median of each column.
Here's a bit of code to get you started.
If you have both vectors in one named variable, and the number of rows is exactly divisible by 40, do this:
% column 1 = data, column 2 = groupID
test = rand(400,2);
% compute medians of data
medians = median( reshape(test(:,1), 40,[]) );
% make each entry correspond to the correct groupID
medians = repmat(medians, 40,1);
medians = medians(:);
If your data is NOT exactly divisible by 40, use a simple loop:
N = 40;
test = rand(10*N+4,2);
n = 1;
medians = zeros( ceil(size(test,1)/N), 1 );
for ii = 1:numel(medians)
if n+N-1 > size(test,1)
medians(ii) = median(test(n:end,1));
else
medians(ii) = median(test(n:n+N-1,1));
end
n = n+N;
end
and replicate as before if necessary.
Adjustments to this code for if you have the groupID in a separate variable, or how to sort these things according to groupID, are pretty straightforward.
Getting the group is fairly easy:
groupIDvec = groupID(1:40:end);% A vector with group numbers
Finding the median of each group can be done as @Oli described, by using reshape
medianmat = reshape(datavector,40,[]);
medianvec = median(medianmat);
Now you just need to sort them:
[groupIDvec,idx] = sort(groupIDvec);
And here is your sorted result, where groupIDvec indicates in which group each value is:
result = medianvec(idx);
I do not have Matlab at hand so it may contain errors, but it should be about ok.
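The pieces above can be checked on a small made-up example. The sketch assumes, as the answers do, that every block of 40 rows belongs to a single group; the data values and the name `sortedGroups` are illustrative only.

```matlab
N = 40;                                        % block length from the question

% Made-up data: three blocks of 40 values, belonging to groups 3, 1 and 2.
datavector = (1:3*N)';
groupID = [repmat(3, N, 1); repmat(1, N, 1); repmat(2, N, 1)];

medianvec  = median(reshape(datavector, N, []));   % one median per block
groupIDvec = groupID(1:N:end);                     % group of each block

[sortedGroups, idx] = sort(groupIDvec);            % order blocks by group
result = medianvec(idx);                           % medians in group order
```

Here the block medians are 20.5, 60.5 and 100.5, and after sorting by group (1, 2, 3) `result` is [60.5 100.5 20.5].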

Increasing the length of a column in MATLAB

I'm just beginning to teach myself MATLAB, and I'm making a 501x6 array. The columns will contain probabilities for flipping 101 sided die, and as such, the columns contain 101,201,301 entries, not 501. Is there a way to 'stretch the column' so that I add 0s above and below the useful data? So far I've only thought of making a column like a=[zeros(200,1);die;zeros(200,1)] so that only the data shows up in rows 201-301, and similarly, b=[zeros(150,1);die2;zeros(150,1)], if I wanted 200 or 150 zeros to precede and follow the data, respectively in order for it to fit in the array.
Thanks for any suggestions.
You can do several things:
Start with an all-zero matrix, and only modify the elements you need to be non-zero:
A = zeros(501,6);
A(someValue:someOtherValue, 5) = value;
% OR: assign the range to a vector:
A(someValue:someOtherValue, 5) = 1:20; % if someValue:someOtherValue is the same length as 1:20
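Applied to the numbers in the question, centring a 101-entry column inside 501 rows might look like the sketch below. The uniform `die` vector and the name `pad` are assumptions made for illustration.

```matlab
nRows = 501;
A = zeros(nRows, 6);                     % start from an all-zero matrix

die = ones(101, 1) / 101;                % assumed: uniform 101-sided die

% Number of zero rows above (and below) the data so that it sits centred.
pad = (nRows - numel(die)) / 2;          % 200 for a 101-entry column

% Only rows 201..301 of column 1 are overwritten; the rest stay zero.
A(pad+1 : pad+numel(die), 1) = die;
```

The same pattern works for the 201- and 301-entry columns, as long as `nRows - numel(die)` is even so that `pad` is an integer.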