MATLAB Tree Construction - matlab

Now, I have separate any pair that is in common between the two input files. Find out the mean between that pair like this : (correlation in first text file)X(correlation in second text file)/ (correlation in first text file)+(correlation in second text file). Again store these in a separate matrix.
Building a tree :
Now, out of all the elements in both the input files, select the 10 most frequent ones. Each of these form the root of a separate K tree.The algorithm goes like this : For the word at the root level, check all its harmonic mean values with the other tags in the matrix that is developed in the previous step. Select the top two highest harmonic means, and put the other word in the tag pair as the child node of the root.
Can someone please guide me through the MATLAB steps of going through this? Thank you for your time.

Okay, so start by putting the data in a useful format; maybe count the number of distinct words, and make an N-by-M matrix of binary values (I'll call this data1). Each of the N rows will describe the words associated with a single image. Each of the M columns will descibe the images for which a single word is tagged. Therefore, the value at (N, M) is 0 if tag M is not in image N, and 1 if it is.
From this matrix, to find correlation between all pairs of words, you could do:
correlations1 = zeros(M, M);
for i=1:M
for j=1:M
correlations1(i, j) = corr(data1(:, i), data1(:, j));
end
end
now the matrix correlations tells you the correlation between tags. Do the same for the other text file. You can make a matrix of harmonic means with:
h_means = correlations1.*correlations2./(correlations1+correlations2);
You can find the 30 most freqent tags by counting the number of 1s in each column of the data matrix. Since we want to find the most common tags in both files, we'll add the data matricies first:
[~, tag_ranks] = sort(sum(data1 + data2, 1), 'descending'); %get the indices in sorted order
top_tags = tag_ranks(1:30);
For the tree building at the end, you will either want to create a tree class (see classdef), or store the tree in an array. To find the top two highest harmonic means, you will want to look in the h_means matrix; for a tag m1, we can do:
[~, tag_ranks] = sort(h_means(m1, :), 'descending');
top_tag = tag_ranks(1);
second_tag = tag_ranks(2);
You will then need to insert these tags into the tree and repeat.

Related

Deletion of all but the first channel in a cell of matrices

I have a row cell vector M, containing matrices in each cell. Every matrix m (matrix inside the big matrix M) is made of 2 channels (columns), of which I only want to use the first.
The approach I thought about was going through each m, check if it has 2 channels, and if that is the case delete the second channel.
Is there a way to just slice it in matlab? or loop it and obtain the matrix M as the matrix m would disappear.
First code is:
load('ECGdata.mat')
I have the below.
when I double-click in one of the variable , here is what I can see:
As you can see the length of each matrix in each cell is different. Now let's see one cell:
The loop I'm trying to get must check the shape of the matrix (I'm talking python here/ I mean if the matrix has 2 columns then delete the second) because some of the variables of the dataframe have matrix containing one column (or just a normal column).
In here I'm only showing the SR variable that has 2 columns for each matrix. Its not the case for the rest of the variables
You do not need to delete the extra "channel", what you can do is quite simple:
newVar = cellfun(#(x)x(:,1), varName, 'UniformOutput', false);
where varName is SR, VF etc. (just run this command once for each of the variables you load).
What the code above does is go over each element of the input cell (an Nx2 matrix in your example), and select the first column only. Then it stores all outputs in a new cell array. In case of matrices with a single column, there is no effect - we just get the input back.
(I apologize in advance if there is some typo / error in the code, as I am writing this answer from my phone and cannot test it. Please leave a comment if something is wrong, and I'll do my best to fix it tomorrow.)

Build a matrix starting from instances of structure fields in MATLAB

I'm really sorry to bother so I hope it is not a silly or repetitive question.
I have been scraping a website, saving the results as a collection in MongoDB, exporting it as a JSON file and importing it in MATLAB.
At the end of the story I obtained a struct object organised
like this one in the picture.
What I'm interested in are the two last cell arrays (which can be easily converted to string arrays with string()). The first cell array is a collection of keys (think unique products) and the second cell array is a collection of values (think prices), like a dictionary. Each field is an instance of possible values for a set of this keys (think daily prices). My goal is to build a matrix made like this:
KEYS VALUES_OF_FIELD_1 VALUES_OF_FIELD2 ... VALUES_OF_FIELDn
A x x x
B x z NaN
C z x y
D NaN y x
E y x z
The main problem is that, as shown in the image and as I tried to explain in the example matrix, I don't always have a value for all the keys in every field (as you can see sometimes they are 321, other times 319 or 320 or 317) and so the key is missing from the first array. In that case I should fill the missing value with a NaN. The keys can be ordered alphabetically and are all unique.
What would you think would be the best and most scalable way to approach this problem in MATLAB?
Thank you very much for your time, I hope I explained myself clearly.
EDIT:
Both arrays are made of strings in my case, so types are not a problem (I've modified the example). The main problem is that, since the keys vary in each field, firstly I have to find all the (unique) keys in the structure, to build the rows, and then for each column (field) I have to fill the values putting NaN where the key is missing.
One thing to remember you can't simply use both strings and number in one matrix. So, if you combine them together they can be either all strings or all numbers. I think all strings will work for you.
Before make a matrix make sure that all the cells have same element.
new_matrix = horzcat(keys,values1,...valuesn);
This will provide a matrix for each row (according to your image). Now you can use a for loop to get matrices for all the rows.
For now, I've solved it by considering the longest array of keys in the structure as the complete set of keys, let's call it keys_set.
Then I've created for each field in the structure a Map object in this way:
for i=1:length(structure)
structure(i).myMap = containers.Map(structure(i).key_field, structure(i).value_field);
end
Then I've built my matrix (M) by checking every map against the keys_set array:
for i=1:length(keys_set)
for j=1:length(structure)
if isKey(structure(j).myMap,char(keys_set(i)))
M(i,j) = string(structure(j).myMap(char(keys_set(i))));
else
M(i,j) = string('MISSING');
end
end
end
This works, but it would be ideal to also be able to check that keys_set is really complete.
EDIT: I've solved my problem by using this function and building the correct set of all the possible keys:
%% Finding the maximum number of keys in all the fields
maxnk = length(structure(1).key_field);
for i=2:length(structure)
if length(structure(i).key_field) > maxnk
maxnk = length(structure(i).key_field);
end
end
%% Initialiting the matrix containing all the possibile set of keys
keys_set=string(zeros(maxnk,length(structure)));
%% Filling the matrix by putting "0" if the dimension is smaller
for i=1:length(structure)
d = length(string(structure(i).key_field));
if d == maxnk
keys_set(:,i) = string(structure(i).key_field);
else
clear tmp
tmp = [string(structure(i).key_field); string(zeros(maxnk-d,1))];
keys_set(:,i) = tmp;
end
end
%% Merging without duplication and removing the "0" element
keys_set = union_several(keys_set);
keys_set = keys_set(keys_set ~= string(0));

Creating all possible combination of rows in matlab

I have a matrix which is 9x10000 size.
So rows are R1, R2, upto R9.
I want to generate all possible combination of the rows such as
[R1;R2] [R1;R3].. [R1;R9]
[R1;R2;R3]...[R1;R2;R4]... [R1;R2:R3;R4;..R8]
I am currently doing this using for loops.
Is there any better way of doing this.
Basically, counting up the binary from 1 to 2^9-i indicates which rows need to be selected:
M=... your matrix
S=dec2bin(1:2^size(M,1)-1)=='1';
allSubsets=cell(size(S,1),1);
for ix=1:size(S,1)
allSubsets{ix}=M(find(S(ix,:)),:);
end
As in the comment, I'm not sure if you always want the first row. This code doesn't do that, but you can modify it for that easily enough. It still uses for loops, but relies on the "nchoosek" function for the row index generation.
%generate data matrix
nMax=9; %number of rows
M=rand(nMax,1e4); %the data
%cell array of matrices with row combinations
select=cell(2^nMax-nMax-1,1); %ignore singletons, empty set
%for loop to generate the row selections
idx=0;
for i=2:nMax
%I is the matrix of row selections
I=nchoosek(1:nMax,i);
%step through the row selections and form the new matrices
for j=1:size(I,1)
idx=idx+1; %idx tracks number of entries
select{idx}=M(I(j,:),:); %select{idx} is the new matrix with selected rows
%per Floris' comment above you could do
%select{idx}=I(j,:); %save the selection for later
end
end
The function nchoosek, when given a vector, will return all possible ways to choose k values from that vector. You can trick it into giving you what you want with
allCombis = unique(nchoosek([zeros(1,9) 1:9], 9), 'rows');
This will include all possible ways to select 9 values from the set that includes nine zeros, plus the indices of each of the rows. Now you have every possible combination (including "no row at all"). With this matrix generated just once, you can find any combination easily - without having to store them all in memory. You can now pick you combination:
thisNumber = 49; % pick any combination
rows = allCombis(thisNumber, :);
rows(rows==0)=[]; % get rid of the zeros
thisCombination = myMatrix(rows, :); % pick just the rows corresponding to this combination

Preserving matrix columns using Matlab brush/select data tool

I'm working with matrices in Matlab which have five columns and several million rows. I'm interested in picking particular groups of this data. Currently I'm doing this using plot3() and the brush/select data tool.
I plot the first three columns of the matrix as X,Y, Z and highlight the matrix region I'm interested in. I then use the brush/select tool's "Create variable" tool to export that region as a new matrix.
The problem is that when I do that, the remaining two columns of the original, bigger matrix are dropped. I understand why- they weren't plotted and hence the figure tool doesn't know about them. I need all five columns of that subregion though in order to continue the processing pipeline.
I'm adding the appropriate 4th and 5th column values to the exported matrix using a horrible nested if loop approach- if columns 1, 2 and 3 match in both the original and exported matrix, attach columns 4/5 of the original matrix to the exported one. It's bad design and agonizingly slow. I know there has to be a Matlab function/trick for this- can anyone help?
Thanks!
This might help:
1. I start with matrix 1 with columns X,Y,Z,A,B
2. Using the brush/select tool, I create a new (subregion) matrix 2 with columns X,Y,Z
3. I then loop through all members of matrix 2 against all members of matrix 1. If X,Y,Z match for a pair of rows, I append A and B
from that row in matrix 1 to the appropriate row in matrix 2.
4. I become very sad as this takes forever and shows my ignorance of Matlab.
If I understand your situation correctly here is a simple way to do it:
Assuming you have a matrix like so: M = [A B C D E] where each letter is a Nx1 vector.
You select a range, this part is not really clear to me, but suppose you can create the following:
idxA,idxB and idxC, that are 1 if they are in the region and 0 otherwise.
Then you can simply use:
M(idxA&idxB&idxC,:)
and you will get the additional two columns as well.

What's an appropriate data structure for a matrix with random variable entries?

I'm currently working in an area that is related to simulation and trying to design a data structure that can include random variables within matrices. To motivate this let me say I have the following matrix:
[a b; c d]
I want to find a data structure that will allow for a, b, c, d to either be real numbers or random variables. As an example, let's say that a = 1, b = -1, c = 2 but let d be a normally distributed random variable with mean 0 and standard deviation 1.
The data structure that I have in mind will give no value to d. However, I also want to be able to design a function that can take in the structure, simulate a uniform(0,1), obtain a value for d using an inverse CDF and then spit out an actual matrix.
I have several ideas to do this (all related to the MATLAB icdf function) but would like to know how more experienced programmers would do this. In this application, it's important that the structure is as "lean" as possible since I will be working with very very large matrices and memory will be an issue.
EDIT #1:
Thank you all for the feedback. I have decided to use a cell structure and store random variables as function handles. To save some processing time for large scale applications, I have decided to reference the location of the random variables to save time during the "evaluation" part.
One solution is to create your matrix initially as a cell array containing both numeric values and function handles to functions designed to generate a value for that entry. For your example, you could do the following:
generatorMatrix = {1 -1; 2 #randn};
Then you could create a function that takes a matrix of the above form, evaluates the cells containing function handles, then combines the results with the numeric cell entries to create a numeric matrix to use for further calculations:
function numMatrix = create_matrix(generatorMatrix)
index = cellfun(#(c) isa(c,'function_handle'),... %# Find function handles
generatorMatrix);
generatorMatrix(index) = cellfun(#feval,... %# Evaluate functions
generatorMatrix(index),...
'UniformOutput',false);
numMatrix = cell2mat(generatorMatrix); %# Change from cell to numeric matrix
end
Some additional things you can do would be to use anonymous functions to do more complicated things with built-in functions or create cell entries of varying size. This is illustrated by the following sample matrix, which can be used to create a matrix with the first row containing a 5 followed by 9 ones and the other 9 rows containing a 1 followed by 9 numbers drawn from a uniform distribution between 5 and 10:
generatorMatrix = {5 ones(1,9); ones(9,1) #() 5*rand(9)+5};
And each time this matrix is passed to create_matrix it will create a new 10-by-10 matrix where the 9-by-9 submatrix will contain a different set of random values.
An alternative solution...
If your matrix can be easily broken into blocks of submatrices (as in the second example above) then using a cell array to store numeric values and function handles may be your best option.
However, if the random values are single elements scattered sparsely throughout the entire matrix, then a variation similar to what user57368 suggested may work better. You could store your matrix data in three parts: a numeric matrix with placeholders (such as NaN) where the randomly-generated values will go, an index vector containing linear indices of the positions of the randomly-generated values, and a cell array of the same length as the index vector containing function handles for the functions to be used to generate the random values. To make things easier, you can even store these three pieces of data in a structure.
As an example, the following defines a 3-by-3 matrix with 3 random values stored in indices 2, 4, and 9 and drawn respectively from a normal distribution, a uniform distribution from 5 to 10, and an exponential distribution:
matData = struct('numMatrix',[1 nan 3; nan 2 4; 0 5 nan],...
'randIndex',[2 4 9],...
'randFcns',{{#randn , #() 5*rand+5 , #() -log(rand)/2}});
And you can define a new create_matrix function to easily create a matrix from this data:
function numMatrix = create_matrix(matData)
numMatrix = matData.numMatrix;
numMatrix(matData.randIndex) = cellfun(#feval,matData.randFcns);
end
If you were using NumPy, then masked arrays would be the obvious place to start, but I don't know of any equivalent in MATLAB. Cell arrays might not be compact enough, and if you did use a cell array, then you would have to come up with an efficient way to find the non-real entries and replace them with a sample from the right distribution.
Try using a regular or sparse matrix to hold the real values, and leave it at zero wherever you want a random variable. Then alongside that store a sparse matrix of the same shape whose non-zero entries correspond to the random variables in your matrix. If you want, the value of the entry in the second matrix can be used to indicate which distribution (ie. 1 for uniform, 2 for normal, etc.).
Whenever you want to get a purely real matrix to work with, you iterate over the non-zero values in the second matrix to convert them to samples, and then add that matrix to your first.