Is there a way to reverse an axis of a blob (N x C x H x W -> N x C x Rev(H) x W, or Rev(N) x C x H x W)? The only idea I have is to slice it along the required axis into N blobs of size 1 and then concatenate them in reverse order. Is there a better way?
I'd rather not create a Slice layer with 128 tops (or thereabouts) and then a Concat layer with 128 bottoms, and I'm not sure about the performance.
I also can't use BatchReindex, because it requires two bottom blobs; I have to emulate the reversal with standard caffe layers and a single bottom. And I can't extend caffe manually.
The reason for these restrictions: it has to work with unmodified caffe on other computers.
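Just to pin down the operation I'm after (a MATLAB-style sketch of the array slice, with toy sizes; doing this inside caffe is the actual question):

blob = rand(2, 3, 4, 5);           % N x C x H x W, toy sizes
revH = blob(:, :, end:-1:1, :);    % N x C x Rev(H) x W
revN = blob(end:-1:1, :, :, :);    % Rev(N) x C x H x W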
How do we get the number of feature maps (depth H) in the conv output?
I think that H = D * (number of filters)!
H can be chosen freely. It does not depend on any of the other parameters.
Each of the H "feature maps" will be produced by a different k x k x D kernel. This is often described as a single 4D kernel with shape H x k x k x D.
In the source text it says the same thing, but perhaps more clearly:
The input is of size N x N x D and is convolved with H kernels, each of size k x k x D separately. Convolution of an input with one kernel produces one output feature, and with H kernels independently produces H features.
The terminology can be confusing at first, as there are multiple terms used for the same thing. H could be called the number of kernels, or number of filters, or number of output features or the number of filter maps.
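A minimal MATLAB sketch may make this concrete (sizes are made up, and convn stands in for a real conv layer):

N = 32; D = 3; k = 5; H = 16;        % H is chosen freely, independent of D
input   = rand(N, N, D);
kernels = rand(k, k, D, H);          % the single 4-D kernel: H filters of size k x k x D
out = zeros(N-k+1, N-k+1, H);
for h = 1:H
    % each kernel spans the full depth D, so 'valid' convolution
    % collapses the depth dimension and yields one 2-D feature map
    out(:,:,h) = convn(input, kernels(:,:,:,h), 'valid');
end
size(out)                            % (N-k+1) x (N-k+1) x H: the output depth is H, not D*H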
I have a set of climate data (temperature, pressure, and moisture, for example), X, Y, Z, which are matrices with dimensions (n x p), where n is the number of observations and p is the number of spatial points.
Previously, to investigate modes of variability in dataset X, I simply performed an empirical orthogonal function (EOF) analysis, also known as Principal Component Analysis (PCA), on X. This involved decomposing the matrix X via SVD.
To investigate the coupling of the modes of variability of X and Y, I used maximum covariance analysis (MCA), which involved decomposing a covariance matrix proportional to XY^T (T is transpose).
However, if I wish to look at all three datasets, how do I go about doing this? One idea I had was to form a fourth matrix, L, which would be the 'feature' concatenation of the three datasets:
L = [X, Y, Z]
so that my matrix L will have dimensions (n x 3p).
I would then use standard PCA/EOF analysis, decomposing this matrix L via SVD, to obtain modes of variability of size (3p x 1); the mode associated with X would be the first p values, the mode associated with Y the second set of p values, and the mode associated with Z the last p values.
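In MATLAB terms, the idea would look something like this (a sketch; it assumes X, Y, Z are n x p and column-centered, and each dataset would presumably need standardizing first, since the variables have different units):

p  = size(X, 2);
L  = [X Y Z];                   % n x 3p feature concatenation
[Uu, S, Vv] = svd(L, 'econ');   % columns of Vv are the joint modes (3p x 1 each)
mode1 = Vv(:, 1);               % leading mode of variability
modeX = mode1(1:p);             % portion of the mode belonging to X
modeY = mode1(p+1:2*p);         % ... to Y
modeZ = mode1(2*p+1:3*p);       % ... to Z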
Is this correct? Or can anyone suggest a better way of looking at the coupling of all three (or more) datasets?
Thank you so much!
I'd recommend treating the features as an extra dimension, i.e. arranging the data as f x n x p, where f is your number of features. At that point you should use a multilinear extension of PCA that can work on tensor data.
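A minimal sketch of that arrangement, using a plain SVD of the mode-1 unfolding as a stand-in for a dedicated multilinear-PCA routine:

T = permute(cat(3, X, Y, Z), [3 1 2]);   % f x n x p tensor, f = 3 features
% mode-1 (feature) unfolding: f x (n*p); its left singular vectors
% describe how the three variables co-vary across all samples and points
[Uf, ~, ~] = svd(reshape(T, size(T, 1), []), 'econ');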
The title of this post may be a bit confusing. Please allow me to provide a bit of context and then elaborate on what I'm asking. For your reference, the actual question comes toward the end; immediately before it, I provide some code outlining where I'm currently at in solving the problem.
Essentially what I'm trying to do is kernel regression, which is usually done using a single test point x and a set of training instances. A reference to this can be found on Wikipedia here. The kernel I'm using is the RBF kernel, a Wikipedia reference for which can be found here.
Anyway, I have some code written in Matlab so that this can be done quickly for a single instance of x, which is 1 x p in size. What I'd like to do is produce estimates for numerous points very quickly, say an m x p matrix of them.
For the sake of avoiding notational mixups, I'll let the training instances be denoted Train (n x p) and the instances I want estimates for Test (m x p). It also needs to be mentioned that I want to estimate a vector of numbers for each of the m points. For a single point this vector would be 1 x v in size; now I need it to be m x v. Therefore, Train will also have a matrix of these known values associated with it, called TS (n x v). Lastly, we need a vector of sigmas that is 1 x v in size, denoted Sig.
Here's the code I have so far:
% First, expand both matrices to (m*n) x p so Train can be subtracted from
% Test for every (test, train) pair; row block i holds test point i against
% all n training points
tm0 = kron(Test, ones(size(Train,1),1)) - kron(ones(size(Test,1),1), Train);
% Secondly, take the squared Euclidean norm of each row and scale it by
% 1/(2*Sig(j)^2) for each element j of Sig before exponentiating (RBF kernel)
tm3 = exp(-kron(sum(tm0.^2, 2), 1/2./(Sig.^2)));
Now, at this point tm3 is an (m*n) x v matrix. This is where my question is: I now need to multiply TS' (TS transpose) by each of the m segments of tm3, where each segment is n x v. Each multiplication yields a v x v matrix; taking the diagonal of each gives a 1 x v vector, so collecting the m diagonals produces an m x v matrix. Summing that matrix across its rows then gives an m x 1 vector. Lastly, I need to divide each entry i of this m x 1 vector by each of the v elements in the ith row of the diagonal-holding m x v matrix, producing an m x v result matrix.
I hope all of that makes sense. I'm sure there's some kind of trick that can be employed, but I'm just not coming up with it. Any help is greatly appreciated.
Edit 1: I was asked to provide more of an example to help demonstrate what it is that I would like done. The following represent the two matrices I'm talking about, TS and tm3:
As you can see, TS' (TS transpose) is v x n and tm3 is mn x v. tm3 is made up of m blocks, each of size n x v. Since TS' is v x n, I can multiply TS' by a single n x v block of tm3, which results in a v x v matrix. I would like to do this operation, individually multiplying TS' by each of the m blocks of tm3, producing m v x v matrices.
From here, though, I would like to obtain the diagonal elements of each of these v x v matrices; for a single v x v matrix a, that is the 1 x v vector diag(a).
Ultimately, I would like to do this for each of the m v x v matrices, stacking the m diagonals into an m x v matrix.
If I denote this last matrix as Q, which is m x v in size, it is trivial to sum the elements across the rows to produce the m x 1 vector I was looking for. I will refer to this vector as C. However, I would then like to divide each of these m scalar values by the corresponding row of matrix Q, to produce another m x v matrix.
This is the final matrix I'm looking for. Hopefully this helps make it clear what I'm looking for. Thanks for taking the time to read this!
Thought: I'm pretty sure I could accomplish this by converting tm3 to a cell array by doing tc1 = mat2cell(tm3, repmat(size(Train,1),1,m), length(Sig)), and then replicating TS' m times in another cell array, tc2 = repmat({TS'}, m, 1). Finally, I could do operations like tc3 = cellfun(@(a,b) a*b, tc2, tc1, 'UniformOutput', false), which would give me m cells filled with the v x v matrices I was looking for. I could proceed from there. However, I'm not sure how fast these cell operations are. Can anybody comment? I'm afraid they might be slow, so I would prefer operations be performed on normal matrices, which I know to be fast. Thanks!
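For reference, the blockwise diagonals can also be computed without cells at all, using the identity diag(TS' * B) == sum(TS .* B, 1) for any n x v block B. A vectorized sketch (it assumes tm3 is laid out as m blocks of n rows each, as described above):

n = size(Train, 1); m = size(Test, 1); v = length(Sig);
blocks = reshape(tm3, n, m, v);     % block i of tm3 becomes blocks(:, i, :)
% all m diagonals at once, never forming any v x v product
Q = permute(sum(bsxfun(@times, reshape(TS, n, 1, v), blocks), 1), [2 3 1]);   % m x v
C = sum(Q, 2);                      % m x 1 row sums
result = bsxfun(@rdivide, C, Q);    % m x v: entry (i,j) is C(i)/Q(i,j)

On R2016b or later, the bsxfun calls can be replaced by implicit expansion (reshape(TS, n, 1, v) .* blocks and C ./ Q).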
I have just started working with CCA in Matlab. I have two matrices X and Y of dimensions 60x1920 and 60x1536, with the number of samples being 60 and the numbers of variables in the two sets being 1920 and 1536 respectively. I want to do CCA to reduce them to a subspace and then do feature matching.
I am using these commands:
%% DO CCA
[A,B,r,U,V] = canoncorr(X,Y);
The output I get is this:
Name Size Bytes Class Attributes
A 1920x58 890880 double
B 1536x58 712704 double
U 60x58 27840 double
V 60x58 27840 double
r 1x58 464 double
Can anyone please tell me what these variables mean? I have gone over the documentation several times and am still unclear about them. As I understand it, CCA finds two linear projection matrices Wx and Wy such that the projections of X and Y onto Wx and Wy are maximally correlated.
1) Which of the output matrices correspond to Wx and Wy?
2) Also, how can I find the projected vectors in the learned CCA subspace?
Any help will be appreciated. Thanks in advance.
As I understand it, with X and Y being your original data matrices, A and B are the sets of coefficients that perform a change of basis to maximally correlate your original data. Your data is represented in the new bases as the matrices U and V.
So to answer your questions:
The projection matrices you are looking for would be A and B since they transform X and Y into the new space.
The resulting projections of X and Y into the new space are U and V, respectively. (The vector r holds the canonical correlations: r(k) is the correlation between the k-th columns of U and V; the cross-correlation matrix between U and V is diagonal, with r on its diagonal.)
The MATLAB documentation says this transformation can be done with the following formulae, where N is the number of observations:
U = (X-repmat(mean(X),N,1))*A
V = (Y-repmat(mean(Y),N,1))*B
This page lays out the process nicely so you can see what each coefficient means in the transformation process.
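A quick numerical check of these relationships (a sketch using the sizes from the question, N = 60 observations):

[A, B, r, U, V] = canoncorr(X, Y);          % X is 60 x 1920, Y is 60 x 1536
N  = size(X, 1);
Uc = (X - repmat(mean(X), N, 1)) * A;       % reproduces U up to round-off
Vc = (Y - repmat(mean(Y), N, 1)) * B;       % reproduces V up to round-off
corr(U(:, 1), V(:, 1))                      % equals r(1), the first canonical correlation
% new data are projected into the learned subspace the same way, e.g.
% Utest = (Xtest - repmat(mean(X), size(Xtest, 1), 1)) * A;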
I am trying to implement BRISK in my own MATLAB code.
Here is where I am stuck: I don't understand what this expression means.
let us consider one of the N*(N-1)/2 sampling-point pairs (pi, pj).
A = {(pi, pj) ∈ R^2 × R^2 | i < N ∧ j < i ∧ i, j ∈ ℕ}
My other question: what is the difference between a local gradient and a global gradient?
The expression means that you are looking at pairs of sampling points (pi, pj); each point lies in the plane R^2 (so the pair lies in R^2 × R^2), and the index conditions j < i < N ensure that the two points are distinct and that each pair is counted only once.
Gradient is a vector (Ix, Iy), where Ix is the first derivative in the x direction, and Iy is the first derivative in the y direction. This vector is defined at a point, so gradient is local by definition. I don't know what global gradient means. More context may help here.
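For intuition, computing the (local) gradient at every pixel of a grayscale image I in MATLAB:

I = im2double(imread('cameraman.tif'));   % any grayscale image will do
[Ix, Iy] = gradient(I);                   % first derivatives in x and y
% the gradient at pixel (r, c) is the local vector [Ix(r, c), Iy(r, c)]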
Given a set of points of size N, N*(N-1)/2 is "N choose 2", which equals the number of subsets of size 2 that can be taken from a set of size N (a concept from combinatorics called combinations). Because you are working with pairs of points, you need the subset size to be 2.
R refers to the set of all real numbers (a single value). When it is squared, it refers to the Cartesian plane, so pi is a pair of real numbers (x, y), i.e., a point in the Cartesian plane.
The character '∧' is the AND operation, so all of the following conditions have to be satisfied (see the sketch after this list):
the index i of the first point, pi, should be less than N
the index j of the second point must be less than the index of the first point.
both i and j are natural numbers (that is the condition i, j ∈ ℕ).
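A short MATLAB sketch of enumerating exactly these pairs (points is a hypothetical N x 2 list of sampling locations):

N = 5;
points = rand(N, 2);                % each row is a sampling point in R^2
pairs = nchoosek(1:N, 2);           % all N*(N-1)/2 index combinations
j = pairs(:, 1); i = pairs(:, 2);   % first index is always smaller, so j < i
numPairs = size(pairs, 1);          % equals N*(N-1)/2 = 10 here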
The local gradient is computed locally from the pair of sampling points pi and pj, while the global gradient is estimated for the region surrounding the keypoint by accumulating local gradients.