matlab: grouping variables for gscatter

I'm a complete Matlab newbie, so please bear with me :) I'm using my friend's vague instructions, so I don't know if they are correct.
I have a variable named m12 (imported from an .xls file), which is a 61x3 array. No labels.
The first column contains leverages, the second standardised residuals for the training (first 46 rows) and validation (remaining 15 rows) sets of a PLS model.
I want to group those first two columns, so that the training set is represented by blue 'X', and validation set by red 'O', so I put 46 rows of '1's and next 15 rows '2's in the third column.
My friend told me to simply type:
group(:,3)
gscatter(m12(:,1), m12(:,2), group, 'br', 'xo')
but when I type
group(:,3)
I get an "??? Undefined variable group." error.
Can anyone help me?

Just write
group = m12(:,3);
instead of your first line.
This way, you're defining a vector group that contains all the entries of the third column of m12, i.e. your grouping variable.
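Put together, the whole thing might look like the sketch below. The m12 built here is random stand-in data (an assumption, since the real values come from your .xls file), and the axis labels and legend text are only suggestions. gscatter needs the Statistics Toolbox.

```matlab
% Stand-in for the imported data (hypothetical values): 61 rows,
% column 1 = leverage, column 2 = standardised residual, column 3 = group id
m12 = [rand(61, 2), [ones(46, 1); 2 * ones(15, 1)]];

% Define the grouping variable from the third column
group = m12(:, 3);

% Group 1 (training) as blue crosses, group 2 (validation) as red circles
gscatter(m12(:, 1), m12(:, 2), group, 'br', 'xo');
xlabel('Leverage');
ylabel('Standardised residual');
legend('Training', 'Validation');
```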

Related

How to get a collective output of multiple loop run using a selection condition in Matlab?

I have a table (L_arrival) of 279 rows and 252 columns. Only the first column has values, while the others are just NaN. The cells in the first column hold multiple values (some have 1 value, some have 4). First of all, I am trying to select the single maximum value from each cell of the first column, so that I end up with a column holding one value per cell. Then I want to do this in a loop, so that for every new set of values I get, they are sorted and only the maximum values are kept. Finally, I want to collect the values obtained from the multiple runs for each cell. Can anyone suggest how this can be approached in MATLAB? I tried the following code, but it didn't work well.
for b=1:279
m = numel(cell2mat(L_arrival(b,1)));
g(b)=mat2cell([cell2mat(g(b)); cell(L_arrival(b,1))]',[1 2]);
end
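One possible approach, as a rough sketch only: it assumes L_arrival is a cell array whose first column holds numeric vectors (possibly empty), and that each run produces an updated L_arrival. The sample data, nRuns and the name collected are made up for illustration.

```matlab
% Hypothetical stand-in for the first column of L_arrival
L_arrival = {[3 7 1]; 5; [2 9 4 6]; []};

nRuns = 3;                                   % assumed number of runs
collected = nan(numel(L_arrival), nRuns);    % one column of maxima per run

for r = 1:nRuns
    % ... code that updates L_arrival for this run would go here ...

    % Appending NaN makes empty cells return NaN instead of [];
    % max ignores NaN whenever the cell holds at least one real value
    collected(:, r) = cellfun(@(v) max([v(:); NaN]), L_arrival(:, 1));
end
```

Each column of collected then holds the per-cell maxima of one run, so no sorting is needed to pick the largest value.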

How to avoid CIRCULAR REFERENCE in TABLEAU using LOOKUP and PREVIOUS_VALUE?

Hello!
In Excel I have 2 columns C and D with formulas in there for a specific purpose.
As an example, I use cells C12 and D12 in these 2 columns to show the formulas.
C12 = 0.001855 * B12/E12 + 0.998145 * (C11+D11)
D12 = 0.981119 * (C12-C11) + 0.018881 * D11
Let's say the C-column variable is "Running Base", the D-column variable is "Growth", and the rows are months. And say I want to copy these formulas to a Tableau worksheet with months in the rows.
You see that C12 uses both its own previous value C11 (the lag -1 of C) and the lag -1 of D (D11). In TABLEAU I can get C11 with the PREVIOUS_VALUE function and the previous value of D with the LOOKUP([D],-1) function (B12 and E12 are not important for the discussion).
Then D12 also uses its own previous value D11, as well as both C12 and its previous value C11. Of course we can do similar TABLEAU exercises here, but you can already feel a CIRCULAR REFERENCE error coming up ;-).
So, there is no actual CIRCULAR REFERENCE, and it works in Excel. But I do understand why TABLEAU reports one, and I am sure there must be a workaround for this.
Can anybody help please???
Thx very much in advance!!
Herman Mentink

One way to solve this is to drop the calculation on Detail in the Marks card and then reference the rows of that field from another calculation, which doesn't create a circular reference. Otherwise you are correct: Tableau will raise a circular reference error, as its behaviour is different from Excel's.
Edit----------------------------------------------------------------
This isn't limited to tooltips; in fact, you can use those fields in calculated fields as well. A few months back I implemented the same thing in my report.
A calculation creates 4 rows in my report and I need to do (1st row+4th row) in the same column, so I dropped the calculation on Detail in the Marks card and referred to it from another calculated field. Just check the example code below:
LOOKUP(ATTR([Values]),FIRST()+4) + LOOKUP(ATTR([Values]),FIRST()+1)
Values is the calculated field. The calculation above refers to Values and picks rows 2 and 5, which I can't do directly on the Values column if I drop Values on Text in Marks.
In my worksheet, Values on Detail is the original calculation, and the field on Text is the one that refers to Values and displays the result.

MATLAB Extracting Column Number

My goal is to create a random, 20 by 5 array of integers, sort them by increasing order from top to bottom and from left to right, and then calculate the mean in each of the resulting 20 rows. This gives me a 1 by 20 array of the means. I then have to find the column whose mean is closest to 0. Here is my code so far:
RandomArray= randi([-100 100],20,5);
NewArray=reshape(sort(RandomArray(:)),20,5);
MeanArray= mean(transpose(NewArray(:,:)))
X=min(abs(MeanArray-0))
How can I store the column number whose mean is closest to 0 into a variable? I'm only about a month into coding so this probably seems like a very simple problem. Thanks

You're almost there. All you need is a find:
RandomArray= randi([-100 100],20,5);
NewArray=reshape(sort(RandomArray(:)),20,5);
% MeanArray= mean(transpose(NewArray(:,:))) %// gives means per row, not column
ColNum = find(abs(mean(NewArray,1))==min(abs(mean(NewArray,1)))); %// gives you the column number of the minimum
MeanColumn = NewArray(:,ColNum); %// take the column from the sorted array the means were computed on
find will give you the index of the entry where abs(mean(NewArray,1)), i.e. the absolute value of the mean per column, equals the minimum of that same array, thus the index of the column whose mean is closest to 0.
Note that you don't need your MeanArray: it transposes NewArray (which can also be done with NewArray.') and then takes the mean per column of the transpose, i.e. the mean of your old rows. I chucked everything into the find statement.
As suggested in the comment by Matthias W. it's faster to use the second output of min directly instead of a find:
RandomArray= randi([-100 100],20,5);
NewArray=reshape(sort(RandomArray(:)),20,5);
% MeanArray= mean(transpose(NewArray(:,:))) %// gives means per row, not column
[~,ColNum] = min(abs(mean(NewArray,1)));
MeanColumn = NewArray(:,ColNum); %// take the column from the sorted array the means were computed on
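To see what the second output of min does, here is a tiny fixed example instead of random numbers. The matrix A and the variable names are made up for illustration.

```matlab
% Small fixed matrix so the result can be checked by hand
A = [-3 -1 2;
     -2  0 4];

colMeans = mean(A, 1);                 % mean of each column: [-2.5 -0.5 3]
[~, colNum] = min(abs(colMeans));      % index of the mean closest to 0
closestColumn = A(:, colNum);          % the column itself
```

Here colNum is 2, because |-0.5| is the smallest absolute mean. If several columns tie, min returns the first one, whereas the find version returns all of them.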

Display multiple values in one line

I'm new to MATLAB. I have two values x and y. Both of them contain values with unknown accuracy. The problem: How could I display them both in one row with 2 digits after comma? Like:
x<tabulation or stack of spaces>y<then goes new line>
Example
RAW data
0,324352 0,75234
1,563 3,4556
Expected output
0,34 0,75
1,56 3,45
Upd: for one value it works well
disp(['x=' num2str(x,3)]);
Purpose is: display TWO values on one row with the new line symbol

The answer is:
disp(num2str([x y],3));
Here 3 is the precision argument of num2str, i.e. the maximum number of significant digits, not the number of digits after the decimal point.
Another Idea:
Somehow represent X and Y as array values and then display them.
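If you need exactly two digits after the decimal point, a format string is probably the more direct route. A minimal sketch, assuming x and y are column vectors of the same length (the values are the raw data from the question, written with a decimal point as MATLAB expects):

```matlab
% Raw data from the question
x = [0.324352; 1.563];
y = [0.75234; 3.4556];

% fprintf walks through its data column by column, so transpose [x y]
% to make each x,y pair come out together on one line
fprintf('%.2f\t%.2f\n', [x y].');
```

Note that %.2f rounds rather than truncates, so 3.4556 is shown as 3.46. Use sprintf with the same format if you want the text in a variable instead of on screen.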

How to implement data I have to svmtrain() function in MATLAB?

I have to write a script using MATLAB which will classify my data.
My data consists of 1051 web pages (rows) and 11000+ words (columns). I am holding the word occurrences in the matrix for each page. The first 230 rows are about computer science courses (to be labeled with +1) and the remaining 821 are not (to be labeled with -1). I am going to label a small part of these rows (say 30 rows) myself. Then the SVM will label the remaining unlabeled rows.
I have found that I could solve my problem using MATLAB's svmtrain() and svmclassify() methods. First I need to create SVMStruct.
SVMStruct = svmtrain(Training,Group)
Then I need to use
Group = svmclassify(SVMStruct,Sample)
But the point is that I do not know what Training and Group are. For Group, MathWorks says:
Grouping variable, which can be a categorical, numeric, or logical
vector, a cell vector of strings, or a character matrix with each row
representing a class label. Each element of Group specifies the group
of the corresponding row of Training. Group should divide Training
into two groups. Group has the same number of elements as there are
rows in Training. svmtrain treats each NaN, empty string, or
'undefined' in Group as a missing value, and ignores the corresponding
row of Training.
And for Training it is said that:
Matrix of training data, where each row corresponds to an observation
or replicate, and each column corresponds to a feature or variable.
svmtrain treats NaNs or empty strings in Training as missing values
and ignores the corresponding rows of Group.
I want to know how I can adapt my data to Training and Group. I need (at least) a small code sample.
EDIT
What I did not understand is that in order to have SVMStruct I have to run
SVMStruct = svmtrain(Training, Group);
and in order to have Group I have to run
Group = svmclassify(SVMStruct,Sample);
Also, I still do not get what Sample should look like.
I am confused.

Training would be a matrix with 1051 rows (the webpages/training instances) and 11000 columns (the features/words). I'm assuming you want to test for the existence of each word on a webpage? In this case you could make the entry of the matrix a 1 if the word exists for a given webpage and a 0 if not.
You could initialize the matrix with Training = zeros(1051,11000); but filling the entries would be up to you, presumably done with some other code you've written.
Group is a 1-D column vector with one entry for every training instance (webpage) that tells you which of the two classes the webpage belongs to. In your case you would make the first 230 entries a "+1" for computer science and the remaining 821 entries a "-1" for not.
Group = zeros(1051,1); % gives you a matrix of zeros with 1051 rows and 1 column
Group(1:230) = 1; % set first 230 entries to +1
Group(231:end) = -1; % set the rest to -1
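As for Sample: it is simply the rows you still want classified, with the same columns as Training. The sketch below shows the train-then-classify flow on small made-up data. It uses fitcsvm and predict, which take over the roles of svmtrain and svmclassify in newer MATLAB releases (Statistics and Machine Learning Toolbox); the sizes, the random word matrix and the choice of hand-labelled rows are all assumptions for illustration.

```matlab
rng(0);                                   % reproducible toy data
nPos = 20; nNeg = 40; nWords = 15;        % hypothetical sizes, far smaller than 1051 x 11000

% Toy word-presence matrix: 1 if the word occurs on the page, 0 if not
Training = double(rand(nPos + nNeg, nWords) > 0.7);
Training(1:nPos, 1:3) = 1;                % make the two toy classes easy to separate
Training(nPos+1:end, 1:3) = 0;

Group = [ones(nPos, 1); -ones(nNeg, 1)];  % +1 for the first block, -1 for the rest

% Rows labelled by hand (assumed) versus rows left for the SVM to label
labeled   = [1:10, nPos+1:nPos+20];
unlabeled = setdiff(1:nPos + nNeg, labeled);

% Train only on the hand-labelled rows (the role of svmtrain)
Mdl = fitcsvm(Training(labeled, :), Group(labeled));

% Sample = rows without a label yet; classify them (the role of svmclassify)
Sample = Training(unlabeled, :);
Predicted = predict(Mdl, Sample);
```

So the Group passed to training only covers the rows you labelled yourself, and the classifier's output is a new vector of labels for the Sample rows, which resolves the apparent circularity in the question.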