I have some data that I'd like to plot on a graph in MATLAB.
The data is discrete: specifically, two series of data plotted against a single vector.
I could easily do it in Excel like this:
But I want to do it in MATLAB. I tried the stem function, but the values of the two series are drawn on the same stem (and I want them side by side, as Excel does):
In addition, I would like the x-axis to show only the values I'm interested in (in my case: 2, 4, 8, 16, 32). How do I do that?
Since you want to draw a bar graph, there is a dedicated built-in function, named bar(), for that purpose.
You can do it using:
N = [2 4 8 16 32];
val1 = [1; 2; 3; 4; 5];
val2 = [3; 5; 6; 12; 17];
bar([N],[val2,val1]); % If you want val1 to appear first then use bar([N],[val1,val2]);
which gives the following desired result:
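Because N is passed as the x-coordinates, the bar groups are placed at 2, 4, 8, 16 and 32 and are therefore unevenly spaced. If evenly spaced groups (as in Excel) are preferred, one possible variation is to plot against the group index and relabel the ticks. This is only a sketch; the variable names Y, h and labels are introduced here for illustration.

```matlab
N    = [2 4 8 16 32];
val1 = [1; 2; 3; 4; 5];
val2 = [3; 5; 6; 12; 17];

Y = [val1, val2];            % one row per group, one column per series
h = bar(Y);                  % groups are drawn at x = 1..5, evenly spaced

% Relabel the ticks so that they show the values of N instead of 1..5
labels = strtrim(cellstr(num2str(N(:))));
set(gca, 'XTick', 1:numel(N), 'XTickLabel', labels);
legend('val1', 'val2', 'Location', 'northwest');
```

The plain bar(N, ...) call keeps the true numeric spacing, which may be what you want if the x-values are meant to be read as a scale rather than as categories.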
I am trying to implement this code so it works as quickly as possible.
Say I have a population of 100 different values, you can think of it as pop = 1:100 or pop = randn(1,100) to keep things simple. I have a vector n which gives me the size of samples I want to get. Say, for example, that n=[1 3 10 6 2]. What I want to do is to take 5 (which in reality is length(n)) different samples of pop, each consisting of n(i) elements without replacement. This means that for my first sample I want 1 element out of pop, for the second sample I want 3, for the third I want 10, and so on.
To be honest, I am not really interested in which elements are sampled. What I want is the sum of the elements in the i-th sample. This would be trivial to implement with a loop, but I am trying to avoid loops to keep my code as quick as possible. I have to do this for many different populations and with length(n) being very large.
If I had to do it with a loop, this would be how:
pop = randn(1,100);
n = [1 3 10 6 2];
sum_sample = zeros(length(n),1);
for i = 1:length(n)
sum_sample(i,1) = sum(randsample(pop,n(i)));
end
Is there a way to do this?
The only way to figure out what is fastest for you is to do a comparison of the different methods.
In fact the loop appears to be very fast in this case!
pop = randn(1,100);
n = [1 3 10 6 2];
tic
sr = @(n) sum(randsample(pop,n));
sum_sample = arrayfun(sr,n);
toc %% Returns about 0.004
clear su
tic
for t=numel(n):-1:1
su(t)=sum(randsample(pop,n(t)));
end
toc %% Returns about 0.003
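A single tic/toc measurement of a few milliseconds is quite noisy, so it may be worth averaging over many repetitions before drawing conclusions. The sketch below does that; nRuns is an arbitrary choice, and randperm is swapped in for randsample only so that the example does not need the Statistics Toolbox.

```matlab
pop   = randn(1,100);
n     = [1 3 10 6 2];
nRuns = 500;                                   % assumed number of repetitions

% Anonymous function: draw m distinct elements of pop and sum them
sr = @(m) sum(pop(randperm(numel(pop), m)));

tic
for r = 1:nRuns
    s1 = arrayfun(sr, n);
end
tArrayfun = toc / nRuns;                       % mean time per arrayfun call

tic
for r = 1:nRuns
    s2 = zeros(size(n));
    for t = 1:numel(n)
        s2(t) = sum(pop(randperm(numel(pop), n(t))));
    end
end
tLoop = toc / nRuns;                           % mean time per explicit loop

fprintf('arrayfun: %.3g s, loop: %.3g s\n', tArrayfun, tLoop);
```

The absolute numbers depend on the machine and the MATLAB release, so only the comparison between the two is meaningful.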
You can create a function handle which chooses the random samples and sums them up. Then you can use arrayfun to execute this function for all values of n:
pop = randn(1,100);
n = [1 3 10 6 2];
sr = @(n) sum(randsample(pop,n));
sum_sample = arrayfun(sr,n);
You can do something like this:
pop = randn(1,100);
n = [1 3 10 6 2];
sampled_data_index = randi(length(pop),1,sum(n));
sampled_data = pop(sampled_data_index);
The randi function randomly selects integer values in a specified range that is suitable for indexing. After you have the indices you can use those at once to sample the data from the pop database.
If you want to have unique indices you can replace the randi function with randperm:
sampled_data_index = randperm(length(pop),sum(n));
Finally:
You can have all the sampled values as a cell variable using the following code:
pop = randn(1,100);
n = [1 3 10 6 2];
fun = @(m) pop(randperm(length(pop),m));
C = arrayfun(fun,n,'UniformOutput',0)
Also having the sum of the sampled data:
funs = @(m) sum(pop(randperm(length(pop),m)));
sumC = arrayfun(funs,n)
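If a single randperm call over sum(n) indices is used, as described above, the per-sample sums can be obtained without arrayfun by labelling each drawn index with its sample number and letting accumarray add them up. This is a sketch under two assumptions: sum(n) must not exceed numel(pop), and the samples are then disjoint from one another, which is not the same as drawing each sample independently. repelem needs a reasonably recent MATLAB release; the names idx and grp are illustrative.

```matlab
pop = randn(1,100);
n   = [1 3 10 6 2];

idx = randperm(numel(pop), sum(n));   % sum(n) distinct indices into pop
grp = repelem(1:numel(n), n);         % sample label for each index: 1 2 2 2 3 ...

% Add up the drawn values per sample label
sum_sample = accumarray(grp(:), pop(idx).', [numel(n) 1]);
```

Each entry sum_sample(i) is the sum of n(i) distinct elements of pop.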
I have a data set as follows:
Data = [4 12; 5 10; 8 7; 5 3; 5 4; 2 11; 5 4; 3 8; 6 2; 7 4; 10 8; 8 9; 10 9; 10 12]
Then I proceed with:
[idx,ctrs, sumD] = kmeans(Data,3)
It gives me the centroids and sumD (sums of point-to-centroid distances within cluster) like:
ctrs = [5.6000 3.4000; 3.5000 10.2500; 9.2000 9.0000]
sumD = [6.4000; 13.7500; 18.8000]
Whereas according to Excel Solver (from a published article), ctrs and sumD are as follows for k=3:
ctrs = [5.21815716 3.66736761; 3.615385665 10.461533; 9.47841197 8.75055345]
sumD = [5.151897802; 7.285383286; 8.573829765]
(NB: In that article, the authors give an initial (seed) centroid to each cluster such as [4 4; 5 12; 10 6] by visual decision from the plot.)
Apparently, Excel finds more accurate ctrs values and therefore smaller sumD values. I could not achieve this with MATLAB, which is why I tried other parameters of the kmeans function: 'replicates', 'options' (MaxIter) and 'start' (even with a 3-D seed array), all to no avail. I even used the same initial seed as the article. Here is what I tried without success:
First:
opts = statset('MaxIter',100);
Seed = [4 4; 5 12; 10 6];
[idx,ctrs] = kmeans(Data,3,'Replicates',50,'options',opts,'start',Seed)
This gives an error: The third dimension of the 'Start' array must match the 'replicates' parameter value.
Second:
I created a 3-D array of 50 pages where the first page is the initial seed above and the remaining 49 are random. I created the random pages as:
T = rand(3,2,49);
After that, I created the 50-page 3-D array like this:
Seed2 = cat(3,Seed,T);
Then used kmeans:
[idx,ctrs] = kmeans(Data,3,'Replicates',50,'options',opts,'start',Seed2)
However, MATLAB issued warnings indicating that every replicate after the first was terminated because an empty cluster was created at iteration 1. Also, the idx, ctrs and sumD values were still the same as before, as if I had run my very first call above (i.e. [idx,ctrs, sumD] = kmeans(Data,3)).
I am stuck. I am trying to verify the Excel Solver results published in the article using MATLAB, because I will then apply the algorithm that the article used on 14 observations to a larger data set of 900+ observations.
What am I doing wrong? What should I change in my code to obtain the same, or at least very similar, results as the Excel Solver?
The difference appears to be in the choice of the measure of distance used, not in the coding. There is more than one way to define "distance" in this context.
MATLAB uses squared Euclidean distance by default. By hand calculating this with the MATLAB results I can replicate the sumD results you get. However, using squared Euclidean distance measure with the results you give from the paper gives a higher value of sumD.
I get the same results for sumD as the paper if I use plain (not squared) Euclidean distance. Using this measure the MATLAB results return higher values for sumD.
So neither result is wrong as such, they're just measuring "rightness" in different ways.
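The hand calculation can be reproduced with a few lines of code. The sketch below assigns each point to its nearest centroid and then sums both the squared and the plain Euclidean distances per cluster, using the centroids that kmeans returned in the question. The variable names are illustrative; substituting the centroids from the article for ctrs gives the corresponding figures for the Excel solution.

```matlab
Data = [4 12; 5 10; 8 7; 5 3; 5 4; 2 11; 5 4; 3 8; 6 2; 7 4; 10 8; 8 9; 10 9; 10 12];
ctrs = [5.6 3.4; 3.5 10.25; 9.2 9.0];      % centroids reported by kmeans in the question

k  = size(ctrs, 1);
d2 = zeros(size(Data, 1), k);              % squared distance of every point to every centroid
for j = 1:k
    diffs   = bsxfun(@minus, Data, ctrs(j,:));
    d2(:,j) = sum(diffs.^2, 2);
end

[minD2, idx] = min(d2, [], 2);             % nearest centroid for each point

sumD_squared = accumarray(idx, minD2,       [k 1]);  % MATLAB's default measure
sumD_plain   = accumarray(idx, sqrt(minD2), [k 1]);  % plain Euclidean distance
```

Here sumD_squared comes out as [6.4; 13.75; 18.8], matching the sumD that kmeans returned in the question.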
How can you be certain that the Excel values are correct and that MATLAB's kmeans gives a less accurate result?
With the quick MATLAB script below, I plotted the centroids, and at least visually they seem correct:
Data = [4 12; 5 10; 8 7; 5 3; 5 4; 2 11; 5 4; 3 8; 6 2; 7 4; 10 8; 8 9; 10 9; 10 12];
plot(Data(:,1), Data(:,2),'ob','markersize', 10);
axis([min(Data(:,1))-2, max(Data(:,1))+2, min(Data(:,2))-2, max(Data(:,2))+2]);
hold on;
[idx,ctrs, sumD] = kmeans(Data,3);
plot(ctrs(:,1), ctrs(:,2), '*r', 'markersize', 10);
If this is not accurate enough, then instead of trying to customize MATLAB's kmeans we can define our own k-means function. I implemented k-means some time ago, and it seemed easier than asking MATLAB to fine-tune the parameters.
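A bare-bones hand-written k-means (Lloyd's algorithm with squared Euclidean distance) might look like the sketch below. It is not the implementation mentioned above, only an illustration; the seed is the one quoted from the article in the question, and the iteration cap of 100 and the variable names are assumptions.

```matlab
Data = [4 12; 5 10; 8 7; 5 3; 5 4; 2 11; 5 4; 3 8; 6 2; 7 4; 10 8; 8 9; 10 9; 10 12];
ctrs = [4 4; 5 12; 10 6];                  % initial seed taken from the question
k    = size(ctrs, 1);
idx  = zeros(size(Data, 1), 1);

for iter = 1:100                           % assumed iteration cap
    % Assignment step: squared distance of every point to every centroid
    d2 = zeros(size(Data, 1), k);
    for j = 1:k
        diffs   = bsxfun(@minus, Data, ctrs(j,:));
        d2(:,j) = sum(diffs.^2, 2);
    end
    [~, newIdx] = min(d2, [], 2);

    if isequal(newIdx, idx)                % assignments unchanged: converged
        break;
    end
    idx = newIdx;

    % Update step: move each centroid to the mean of its assigned points
    for j = 1:k
        if any(idx == j)                   % leave an empty cluster's centroid untouched
            ctrs(j,:) = mean(Data(idx == j, :), 1);
        end
    end
end
```

On this data set and seed, the loop settles after a few iterations on the same centroids that the question reports from the built-in kmeans, which suggests the built-in result is a proper optimum for the squared-distance criterion.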
For example, I have a function that creates the 2x2 matrix [1 2; 3 4].
It is a very simple function:
function[result] = Rho(x)
% I've tried it this way:
result = [1 2; 3 4];
% And this way:
result(1,1) = 1;
result(1,2) = 2;
result(2,1) = 3;
result(2,2) = 4;
In the MATLAB command window I see the right result:
>> Rho(1)
ans =
1 2
3 4
But in Simulink I always get [1;2;3;4]. Where is my mistake?
P.S. I forgot to remove the argument x of the function, because in the real function the matrix depends on x. It plays no role in this example.
The problem you are having is likely due to the parameter settings for your MATLAB Function block (now called an Interpreted MATLAB Function block in newer versions). Take a look at the Parameters Dialog Box for that block:
Note that you will want to set the Output dimensions to 2 and uncheck the Collapse 2-D results to 1-D check box. If this is left checked, then your 2-by-2 matrix will be turned into a 1-D array by extracting values along each column from left to right, which ends up being [1 3 2 4] in your example.
Once you apply the above changes, then all you should have to do is resize your Display block so that it shows your 2 rows and 2 columns.
In MATLAB, is there a more concise way to handle discrete conditional indexing by column than using a for loop? Here's my code:
x=[1 2 3;4 5 6;7 8 9];
w=[5 3 2];
q=zeros(3,1);
for i = 1:3
q(i)=mean(x(x(:,i)>w(i),i));
end
q
My goal is to take the mean of the top x% of a set of values for each column. The above code works, but I'm just wondering if there is a more concise way to do it?
You mentioned that you were using the function PRCTILE, which would indicate that you have access to the Statistics Toolbox. This gives you yet another option for how you could solve your problem, using the function NANMEAN. In the following code, all the entries in x less than or equal to the threshold w for a column are set to NaN using BSXFUN, then the mean of each column is computed with NANMEAN:
x(bsxfun(@le,x,w)) = nan;
q = nanmean(x);
I don't know of any way to index the columns the way you want. This may be faster than a for loop, but it also creates a matrix y that is the size of x.
x=[1 2 3;4 5 6;7 8 9];
w=[5 3 2];
y = x > repmat(w,size(x,1),1);
q = sum(x.*y) ./ sum(y)
I don't claim this is more concise.
Here's a way to solve your original problem: You have an array, and you want to know the mean of the top x% of each column.
%# make up some data
data = magic(5);
%# find out how many rows the top 40% are
nRows = floor(size(data,1)*0.4);
%# sort the data in descending order
data = sort(data,1,'descend');
%# take the mean of the top 40% of values in each column
topMean = mean(data(1:nRows,:),1);
I would like to load variables from a text file.
For example, my text file contains varA, varB, and varC.
In MATLAB, I would like to give these variables values so that every variable is a 2x2 matrix.
So from the text file containing the above information I would get a matrix that looks like this:
[ 1 2 3 4 5 6;
1 2 3 4 5 6]
Is this possible?
I added a second example to try to make things a little clearer.
My text file, text.txt, looks like this
x1 x2 x3
In MATLAB my .m file gives the values to these variables like
x1 = [1 1; 1 1]
x2 = [2 2; 2 2]
x3 = [3 3; 3 3]
So, when I import my textfile I would get
a = (textfile)
a = [1 1 2 2 3 3 ; 1 1 2 2 3 3]
Basically, I am trying to adapt a genetic algorithm (GA) to a very large problem (of the travelling salesman problem (TSP) type). The problem is that every variable I have is a matrix, so the crossover, fitness and mutation code gets pretty complicated. I am also having trouble creating a random start population.
I would like to randomly select, let's say, 30 variables from a list of 256 so that each variable can only be picked once. Each variable, however, has its own specific values in a 2x2 matrix that cannot be changed.
I would like to use randperm and then put an x before every value, making them variables instead of values...
If the data in the text file looks like this (strings separated by spaces):
x1 x2 x3 ...
You can read the strings into a cell array using TEXTSCAN like so:
fid = fopen('file.txt','r');
A = textscan(fid,'%s');
fclose(fid);
A = A{:};
A now stores the strings in a cell array: {'x1'; 'x2'; 'x3'; ...}. Now, to make a variable out of one of these strings and assign it a value, I would use ASSIGNIN:
assignin('base',A{1},[1 2; 1 2]);
This will create a variable x1 in the base workspace and assign it the value [1 2; 1 2]. The first argument can be either 'base' or 'caller' to create a variable in either the MATLAB base workspace or the workspace of the caller function. You would repeat this for each string name in A, giving it whatever value you want.
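Repeating this for every name could be sketched as a loop like the one below. The names and values are taken from the example in the question and stand in for whatever textscan returns and whatever matrices you actually want to assign; the cell array values is an assumption made for illustration.

```matlab
% Names as they would come out of textscan (hard-coded here for illustration)
A = {'x1'; 'x2'; 'x3'};

% Value to assign to each name (assumed, taken from the question's example)
values = {[1 1; 1 1], [2 2; 2 2], [3 3; 3 3]};

% Create one base-workspace variable per name
for k = 1:numel(A)
    assignin('base', A{k}, values{k});
end

% Read the variables back in file order and concatenate them side by side
mats = cellfun(@(nm) evalin('base', nm), A, 'UniformOutput', false);
a    = [mats{:}];
```

With these values, a is the 2-by-6 matrix [1 1 2 2 3 3; 1 1 2 2 3 3] asked for in the question.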
ALTERNATE OPTION:
This is an alternate answer to the one I gave you above. The above answer addresses the specific problem you raised in your question. This answer gives you a whole other option to potentially avoid doing things the way you were describing them in your question, and it will hopefully make things easier for you...
If I understand your problem, you basically have 256 2-by-2 matrices, and you want to randomly pick 30 of them. Each of these 2-by-2 matrices sounds like it is stored in its own variable (x1 to x256). Instead, I would suggest storing all 256 matrices in just one variable as either a 3-D array:
xArray = zeros(2,2,256); % Initialize all matrices as [0 0; 0 0]
xArray(:,:,1) = [1 1; 2 2]; % This enters a value for the first matrix
or a cell array:
xArray = cell(1,256); % Initializes an empty array of cells
xArray{1} = [1 1; 2 2]; % Enters a value for the first matrix
You would have to initialize all the values first. Then if you want to randomly pick 30 values, you can next randomize the order of either the third dimension of the 3-D array or the order of the cell array by using RANDPERM:
startOrder = 1:256; % The default order of the matrices
index = randperm(256); % Randomly order the numbers 1 to 256
xArray = xArray(:,:,index); % For a 3-d array
xArray = xArray(index); % For a cell array
Then just use the first 30 entries in xArray for your calculations (instead of the individual variables like you were before):
x = xArray(:,:,1); % Puts the first matrix from the 3-D array in x
x = xArray{1}; % Puts the first matrix from the cell array in x
You can keep repeating the use of RANDPERM to keep generating new randomized arrays of matrices. If you have to keep track of which original matrices you are using, you have to add this line after you randomize xArray:
startOrder = startOrder(index);
Now the entries of startOrder will tell you the original position a matrix was in. For example, if the first array entry in startOrder is 40, then the matrix in the first position of xArray was originally the 40th matrix you entered when you initialized xArray.
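If only 30 of the 256 matrices are needed at a time, a shorter variant is to ask randperm for 30 distinct indices directly and leave xArray in its original order. This is a sketch: the placeholder values k*ones(2) only make it easy to see which matrix was picked, and the names pick and selected are illustrative.

```matlab
xArray = zeros(2, 2, 256);
for k = 1:256
    xArray(:,:,k) = k * ones(2);      % placeholder values: matrix k is filled with k
end

pick     = randperm(256, 30);         % 30 distinct indices between 1 and 256
selected = xArray(:,:,pick);          % 2-by-2-by-30 array of the chosen matrices
```

Here pick plays the role of startOrder: pick(i) is the original position of the matrix stored in selected(:,:,i).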
Hope this helps!