Suppose [1 0 0 1 0 1] <--> [0 0 0 1] is an association. When implementing BAM, why do we convert 0 to -1 before calculating the weight matrix?
The fundamental reason why 0s are unsuitable for BAM storage is that 0s in binary patterns are ignored when added, whereas -1s in bipolar patterns are not: 1+0=1 but 1+(-1)=0. If the numbers are matrix entries that represent synaptic strengths, then multiplying and adding binary quantities can only produce excitatory connections.
Multiplying and adding bipolar quantities, on the other hand, produces both excitatory and inhibitory connections. The connection strengths represent the frequency of excitatory and inhibitory connections in the individual correlation matrices.
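A minimal MATLAB sketch of this for the single association above (the bipolar conversion is just 2*x - 1):
X = [1 0 0 1 0 1];      % binary input pattern
Y = [0 0 0 1];          % associated binary output pattern
Xb = 2*X - 1;           % convert 0 -> -1, 1 -> +1
Yb = 2*Y - 1;
W = Xb' * Yb;           % 6x4 correlation (weight) matrix for this pair
% For several associations, W is the sum of the individual correlation matrices.
% Recall: sign(Xb * W) returns Yb, which maps back to Y by turning -1 into 0.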
Refer:
http://sipi.usc.edu/~kosko/BAM.pdf "Bidirectional Associative Memory, Bart Kosko"
Suppose a dataset comprises independent variables that are continuous as well as binary. Usually the label/outcome column is converted to a one-hot vector, and continuous variables can be normalized. But what should be applied to binary variables?
AGE       RACE  GENDER  NEURO  EMOT
15.95346  0     0       3      1
14.57084  1     1       0      0
15.8193   1     0       0      0
15.59754  0     1       0      0
How does this apply to logistic regression and neural networks?
If the range of the continuous variable is small, encode it into a binary form and use each bit of that binary form as a predictor.
For example, number 2 = 10 in binary.
Therefore
predictor_bit_0 = 0
predictor_bit_1 = 1
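In MATLAB, for instance, the bits can be pulled out with bitget (purely illustrative; the variable names are made up):
x = 2;                        % a variable with a small integer range, say 0..3
nBits = 2;                    % enough bits to cover that range
bits = bitget(x, 1:nBits);    % bits = [0 1], i.e. predictor_bit_0 = 0, predictor_bit_1 = 1
% each element of bits becomes its own binary predictor column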
Try it and see if it works. Just to warn you, this method is very subjective and may or may not yield good results for your data. I'll keep you posted if I find a better solution.
I'm trying to make an ANN which could tell me if there is causality between my input and output data. The data is as follows:
My inputs are measured values of pesticides (19 total) in an area, e.g.:
-1.031413662 -0.156086316 -1.079232918 -0.659174849 -0.734577317 -0.944137546 -0.596917991 -0.282641072 -0.023508282 3.405638835 -1.008434997 -0.102330305 -0.65961995 -0.687140701 -0.167400684 -0.4387984 -0.855708613 -0.775964435 1.283238514
And the output is the measured value of plant-something in the same area (55 total), e.g.:
0.00 0.00 0.00 13.56 0 13.56 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13.56 0 0 0 1.69 0 0 0 0 0 0 0 0 0 0 1.69 0 0 0 0 13.56 0 0 0 0 13.56 0 0 0 0 0 0
Values for the input are in the range -2.5 to 10, and for the output 0 to 100.
So the question I'm trying to answer is: to what extent does pesticide A affect the plant-somethings?
What are good ways to model (represent) the input/output neurons so they can process the data above? And how should I scale/convert the input/output data to be useful for the NN?
Is there a book/paper that I should look at?
First, a neural network cannot find the causality between output and input, only the correlation (just like any other probabilistic method). Causality can only be derived logically from reasoning (and even then, it's not always clear; it all depends on your axioms).
Secondly, about how to design a neural network to model your data, here is a pretty simple rule that can be generally applied to make a first working draft:
set the number of input neurons = the number of input variables for one sample
set the number of output neurons = the number of output variables for one sample
then play with the number of hidden layers and the number of hidden neurons per hidden layer. In practice, you want to use the fewest hidden layers/neurons that still model your data correctly, but enough so that the function approximated by your neural network fits the data well (otherwise the output error will be huge compared to the real output dataset).
Why do you need just enough neurons but not too many? If you use a lot of hidden neurons, you are almost certain to overfit your data: you will make a perfect prediction on your training dataset, but not in the general case when you use real datasets. Theoretically, this is because a neural network is a function approximator, so it can approximate any function, but using too high-order a function leads to overfitting. See PAC learning for more info on this.
So, in your precise case, the first thing to do is to clarify how many variables you have in input and in output for each sample. If it's 19 in input, then create 19 input nodes, and if you have 55 output variables, then create 55 output neurons.
About scaling and pre-processing: yes, you should normalize your data to the range 0 to 1 (or -1 to 1; it's up to you and it depends on the activation function). A very good place to start is the machine learning course by Andrew Ng on Coursera; this should get you kickstarted quickly and correctly (you'll be taught the tools to check that your neural network is working correctly, which is immensely important and useful).
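For instance, assuming MATLAB's Neural Network Toolbox (the question does not name a framework, so treat this as a sketch, with X and T holding your samples):
% X is 19 x nSamples (pesticide measurements), T is 55 x nSamples (plant measurements)
net = feedforwardnet(10);        % one hidden layer with 10 neurons - tune this number
% feedforwardnet normalizes inputs and targets with mapminmax by default
% (see net.inputs{1}.processFcns), so explicit scaling is optional here
[net, tr] = train(net, X, T);    % input/output layer sizes are taken from X and T
Yhat = net(X);                   % predictions for the training inputs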
Note: you should check your output variables. From the sample you gave, they seem to take discrete values. If so, you can use discrete output variables, which will be a lot more precise and predictive than using real, floating-point values (e.g., instead of having [0, 1.69, 13.56] as the possible output values, you'll have [0, 1, 2]; this is called "binning" or multi-class categorization). In practice, this means changing the way your network works, by using a classification network (e.g., a sigmoid or softmax output) instead of a regression network (e.g., a linear or rectified linear output unit).
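In MATLAB, for example, mapping the observed output values to integer class labels is a one-liner with unique (a small illustrative sketch):
y = [0 1.69 13.56 0 13.56];            % example output values
[classValues, ~, yClass] = unique(y);  % classValues = [0 1.69 13.56]
% yClass = [1; 2; 3; 1; 3]: each value replaced by the index of its class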
I was looking for the most efficient way to find the non-zero minimum of a matrix and found this on a forum:
Let the data be a matrix A.
A(~A) = nan;
minNonZero = min(A);
This is very short and efficient (at least in number of code lines), but I don't understand what happens when we do this. I can't find any documentation about it, since it's not an operation on matrices like +, -, \, ... would be.
Could anyone explain it to me, or give me a link or something that could help me understand what is done?
Thank you!
It uses logical indexing.
~ in Matlab is the not operator. When used on a double array, it finds all elements equal to zero. e.g.:
~[0 3 4 0]
Results in the logical matrix
[1 0 0 1]
i.e. it's a quick way to find all the zero elements
So if A = [0 3 4 0], then ~A = [1 0 0 1], and A(~A) indexes A with the logical mask [1 0 0 1]. Logical indexing only affects the elements where the mask is true, in this case element 1 and element 4.
Finally A(~A) = NaN will replace all the elements in A that were equal to 0 with NaN which min ignores and thus you find the smallest non-zero element.
The code you provided:
A(~A) = NaN;
minNonZero = min(A);
Does the following:
Create a logical index
Apply the logical index on A
Change A, by assigning NaN values
Get the minimum of all values, while not including NaN values
Note that this leaves you with a changed A, which may be undesirable. More importantly, it has some inefficiencies, since you spend time changing A and you still take the minimum over the full (now modified) matrix.
Therefore you could speed things up (and even reduce one line) by doing:
minNonZero = min(A(logical(A)))
Basically you have now skipped step 3 and possibly reduced step 4.
Furthermore, you seem to get an additional small speedup by doing:
minNonZero = min(A(A~=0))
I don't have any good reason for this, but it seems like step 1 is now done more efficiently.
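If you want to check the relative speed yourself, a quick comparison with timeit could look like this (the actual numbers will depend on your machine and MATLAB version):
A = rand(2000);                        % example matrix
A(rand(2000) < 0.3) = 0;               % sprinkle in some zeros
t1 = timeit(@() min(A(logical(A))));
t2 = timeit(@() min(A(A ~= 0)));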
I am new to Hidden Markov Models. I understand the main idea and I have tried some of Matlab's built-in HMM functions to help me understand more.
If I have a sequence of observations and corresponding states,
e.g.
seq = 2 6 6 1 4 1 1 1 5 4
states = 1 1 2 2 2 2 2 2 2 2
and I can use the hmmestimate function to calculate the transition and emission probability matrices as:
[TRANS_EST, EMIS_EST] = hmmestimate(seq, states)
TRANS_EST =
0.5000 0.5000
0 1.0000
EMIS_EST =
0 0.5000 0 0 0 0.5000
0.5000 0 0 0.2500 0.1250 0.1250
In the example, the observation is just a single value.
The example picture below describes my situation.
If I have states {Sleep, Work, Sport} and a set of observations {light off, light on, heart rate > 100, ...},
and I use a number to represent each observation, then in my situation each state has multiple observations at the same time:
seq = {2,3,5} {6,1} {2} {2,3,6} {4} {1,2} {1}
states = 1 1 2 2 2 2 2
I have no idea how to implement this in Matlab to get the transition and emission probability matrices. I am quite lost; what should I do next? Am I using the right approach?
Thanks!
If you know the hidden state sequence, then max likelihood estimation is trivial: it's the normalized empirical counts. In other words, count up the transitions and emissions and then divide the elements in each row by the total counts in that row.
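For the transition matrix, for example, a minimal MATLAB sketch of this count-and-normalize estimate (it reproduces the TRANS_EST shown above for that sequence) is:
states = [1 1 2 2 2 2 2 2 2 2];
nStates = max(states);
counts = zeros(nStates);                    % counts(i,j) = number of i -> j transitions
for t = 1:numel(states)-1
    counts(states(t), states(t+1)) = counts(states(t), states(t+1)) + 1;
end
TRANS_EST = counts ./ sum(counts, 2);       % divide each row by its total
                                            % (needs R2016b+ implicit expansion, else use bsxfun)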
In the case where you have multiple observation variables, code the observations as a vector where each element gives the value of one of the random variables on that time step, e.g. '{lights=1, computer=0, Heart Rate >100 = 1, location =0}'. The key is that you need to have the same number of observations at each time step or else things will be much more difficult.
I think you have two options.
1) Code multiple observations into one number. For example, if you know that the maximal possible value of an observation is N, and at each state you may have at most K observations, then you can code any combination of observations as a number between 0 and N^K - 1 (a small sketch of this is given after option 2 below). By doing this, you are assuming that {2,3,6} and {2,3,5} do not share anything; they are two completely different observations.
2) Or you can have multiple emission distributions for each state. I haven't used the built-in functions in Matlab for HMM estimation, so I have no idea whether or not it supports that. But the idea is: if you have multiple emission distributions at a state, the emission likelihood is just the product of them. This is what jerad suggests.
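To make option 1 concrete, here is a minimal MATLAB sketch that gives every distinct combination its own integer code (variable names are illustrative; it simply numbers the combinations consecutively rather than spanning the full 0 to N^K - 1 range):
seqCell = {[2 3 5], [6 1], 2, [2 3 6], 4, [1 2], 1};  % observations per time step
K = max(cellfun(@numel, seqCell));                     % most observations at any one step
padded = zeros(numel(seqCell), K);                     % sorted, zero-padded rows
for t = 1:numel(seqCell)
    obs = sort(seqCell{t});
    padded(t, 1:numel(obs)) = obs;
end
[~, ~, codedSeq] = unique(padded, 'rows');             % one integer symbol per combination
codedSeq = codedSeq';                                  % can now be fed to hmmestimate with states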
Given a lower triangular matrix (100x100) containing cross-correlation values, where entry 'ij' is the correlation value between signal 'i' and signal 'j' (so a high value means that these two signals belong to the same class of objects), and knowing there are at most four distinct classes in the data set, does someone know of a fast and effective way to classify the data and assign all the signals to the 4 different classes, rather than searching and cross-checking all the entries against each other? The following 7x7 matrix may help illustrate the point:
1 0 0 0 0 0 0
.2 1 0 0 0 0 0
.8 .15 1 0 0 0 0
.9 .17 .8 1 0 0 0
.23 .8 .15 .14 1 0 0
.7 .13 .77 .83 .11 1 0
.1 .21 .19 .11 .17 .16 1
there are three classes in this example:
class 1: rows <1 3 4 6>,
class 2: rows <2 5>,
class 3: rows <7>
This is a good problem for hierarchical clustering. Using complete-linkage clustering you will get compact clusters; all you have to do is determine the cutoff distance at which two clusters should be considered different.
First, you need to convert the correlation matrix to a dissimilarity matrix. Since correlation is between 0 and 1, 1 - correlation will work well: high correlations get a score close to 0, and low correlations get a score close to 1. Assume that the correlations are stored in an array corrMat.
%# remove diagonal elements
corrMat = corrMat - eye(size(corrMat));
%# and convert to a vector (as pdist)
dissimilarity = 1 - corrMat(find(corrMat))';
%# decide on a cutoff
%# remember that 0.4 corresponds to corr of 0.6!
cutoff = 0.5;
%# perform complete linkage clustering
Z = linkage(dissimilarity,'complete');
%# group the data into clusters
%# (cutoff is at a correlation of 0.5)
groups = cluster(Z,'cutoff',cutoff,'criterion','distance')
groups =
2
3
2
2
3
2
1
To confirm that everything is great, you can visualize the dendrogram
dendrogram(Z,0,'colorthreshold',cutoff)
You can use the following method instead of creating the dissimilarity matrix.
Z = linkage(corrMat,'complete','correlation')
This lets Matlab interpret your matrix as correlation distance, and then you can plot the dendrogram as follows:
dendrogram(Z);
One way to verify whether your dendrogram is right is to check its maximum height, which should correspond to 1 - min(corrMat). If the minimum value in corrMat is 0, then the maximum height of your tree should be 1. If the minimum value is -1 (negative correlation), the height should be 2.
Since it is given that there are going to be 4 groups, I'd start with a pretty simplistic two-stage approach.
In the first stage you find the maximum correlation between any two elements, place those two elements in a group, then zero out their correlation in the matrix. Repeat, finding the next highest correlation between two elements and either adding them to an existing group or creating a new one, until you have the correct number of groups.
Finally, check which elements aren't in a group yet; go to their column and identify the element they are most correlated with. If that element is in a group already, place the ungrouped element in that group as well; otherwise skip it and come back to it later.
If there is interest or anything isn't clear I can elaborate later; a rough sketch of the idea follows below. Like I said, the approach is simplistic, but if you don't need to verify the number of groups I think it should be effective.
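A rough MATLAB sketch of the two-stage idea (variable names are illustrative; A is the lower triangular correlation matrix):
C = A + A';                               % symmetrize the lower triangular matrix
C(1:size(C,1)+1:end) = 0;                 % ignore the diagonal (self-correlation)
nGroups = 4;
group = zeros(size(C,1), 1);              % group(i) = 0 means "not assigned yet"
g = 0;
while g < nGroups                         % stage 1
    [v, idx] = max(C(:));                 % strongest remaining correlation
    if v == 0, break; end
    [i, j] = ind2sub(size(C), idx);
    C(i, j) = 0;  C(j, i) = 0;            % zero it out so it is not picked again
    if group(i) == 0 && group(j) == 0
        g = g + 1;                        % start a new group with this pair
        group([i j]) = g;
    elseif group(i) == 0 || group(j) == 0
        group([i j]) = max(group([i j])); % attach the ungrouped element to the existing group
    end
end
Csym = A + A';  Csym(1:size(Csym,1)+1:end) = 0;
for i = find(group' == 0)                 % stage 2: assign the leftovers
    [~, j] = max(Csym(i, :) .* (group' > 0));   % strongest link to an already grouped element
    group(i) = group(j);                  % (repeat this pass if some elements remain unassigned)
end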