I need to have layers of a matrix.
I thought it could be done this way:
have n turtles on each patch, one per layer, so that the n turtles of a patch represent its n layers.
The layers would be grouped into turtle sets.
I would then have a command for showing the k-th layer while hiding the others.
Is this a possible solution?
Is there a better way?
I am new to convolutional neural networks, and I am learning 3D convolution.
What I could understand is that 2D convolution gives us relationships between low-level features in the x-y dimensions, while 3D convolution helps detect low-level features and relationships between them in all 3 dimensions.
Consider a CNN employing 2D conv layers to recognize handwritten digits. If a digit, say 5, was written in different colors:
Would a strictly 2D CNN perform poorly (since the different colors belong to different channels in the z dimension)?
Also, are there practical well-known neural nets that employ 3D convolution?
The problem is that the 2D aspects of an image have locality. In a sense, things that are nearby are expected to be related in some fundamental way. E.g. a pixel near a hair pixel is expected to be a hair pixel, a priori. However, the different channels have no such relationship. When you only have 3 channels, a 3D convolution is equivalent to being fully connected in z. When you have 27 channels (e.g. in the middle of the net), why would any 3 channels be considered "close" to each other?
This answer explains the difference nicely.
Treating the channels in a "fully-connected" way is what most libraries do by default. Note this line in particular: "...a filter / kernel tensor of shape [filter_height, filter_width, in_channels, out_channels]". For an input vector of size in_channels, a matrix of size [in_channels, out_channels] is fully connected. So, the filter can be thought of as a fully-connected layer on a "patch" of image of size [filter_height, filter_width].
To illustrate, on a single channel, a regular plain old image filter takes a patch of the image and maps that patch to a single pixel in a new image.
On the other hand, suppose that we have multiple channels. Instead of performing a linear mapping from a 3x3 patch to a 1x1 pixel, we perform a linear mapping from a 3x3xin_channels patch to a 1x1xout_channels set of pixels. How do we do this? Well, a linear mapping is just a matrix. Note that a 3x3xin_channels patch can be written as a vector with 3*3*in_channels entries, and a 1x1xout_channels set of pixels can be written as a vector with out_channels entries. A linear mapping between the two is given by a matrix with 3*3*in_channels rows and out_channels columns. The entries of that matrix are the parameters of that layer of the network. The layer works by simply multiplying the input vector by the matrix of weights to get the output vector, repeated over all patches of the image. (In practice, instead of looping over all patches, libraries achieve the same result with vectorized reshaping tricks such as im2col, but the output is identical.)
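To make the shape bookkeeping concrete, here is a minimal NumPy sketch of that patch-to-pixel mapping (the sizes and names are made up, and the explicit loop stands in for what libraries do in vectorized form):

    import numpy as np

    H, W, in_channels, out_channels = 8, 8, 3, 27
    image = np.random.rand(H, W, in_channels)

    # The layer's parameters: one fully-connected matrix applied to every patch.
    weights = np.random.rand(3 * 3 * in_channels, out_channels)

    out = np.zeros((H - 2, W - 2, out_channels))
    for i in range(H - 2):
        for j in range(W - 2):
            patch = image[i:i + 3, j:j + 3, :].reshape(-1)  # 3*3*in_channels vector
            out[i, j, :] = patch @ weights                   # 1x1xout_channels pixel

Every output channel sees every input channel of the patch, which is the "mixing" discussed below.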
To illustrate: the mapping takes a 3x3xin_channels column of the input to a 1x1xout_channels stack of pixels in the output.
Now, what you are proposing is to instead convolve over a 3x3x3 patch containing only 3 channels of your whole set of in_channels. There is no mathematical reason why you can't do that. However, whatever 3 channels you choose is totally arbitrary, and they have no intrinsic relationship to one another that would suggest that treating them as being "nearby" would help.
To reiterate, in an image, the pixels that are near each other are expected to be "similar" or "related" in some sense. This is why a convolution works at all. If you jumbled up the pixels and then did a convolution, it would be worthless. On that note, all of the channels are just a jumble. There is no "nearby relatedness" property along the channels. E.g. the "red" channel isn't near the "green" channel OR the "blue" channel, because "nearness" doesn't make any sense between the channels. Since "nearness" isn't a property of the channel dimension, then doing a convolution in that dimension probably isn't going to be useful.
On the other hand, we can simply take the input of ALL of the in_channels to generate the output from ALL of the out_channels simultaneously, and let them influence each other in a linear sort of way. Note that the linear transformation described involves a sort of cross-pollination of the parameters. For example, for a layer at the top of the network, taking in a 3x3 patch of r,g,b channels labeled r_1_1-r_3_3 etc., a single pixel in a single channel of the output from that patch would look like:
A*r_1_1 + B*r_1_2 + ... + C*r_3_3 + D*b_1_1 + E*b_1_2 + ... + F*b_3_3 + G*g_1_1 + ...
where the capital letters are entries of the weight matrix.
So your question, "Would a strictly 2D CNN perform poorly?", is based on an assumption that the convolutional layer doesn't include any "mixing" between the various channels. This is not the case: the in_channels are ALL combined in a linear mapping to obtain the out_channels.
I am looking for a clustering algorithm such as DBSCAN to deal with 3D data, in which it is possible to set different epsilons depending on the axis: for instance, an epsilon of 10m on the x-y plane and an epsilon of 0.2m on the z axis.
Essentially, I am looking for large but flat clusters.
Note: I am an archaeologist; the algorithm will be used to look for potential correlations between objects scattered over large surfaces but within narrow vertical layers.
Solution 1:
Scale your data set to match your desired epsilon.
In your case, scale z by 50.
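A minimal sketch of this scaling approach, assuming scikit-learn's DBSCAN (the eps, min_samples, and toy data are illustrative):

    import numpy as np
    from sklearn.cluster import DBSCAN

    # Toy data: x, y up to 1000 m; z up to 5 m (stand-in for the real survey data).
    points = np.random.rand(1000, 3) * [1000, 1000, 5]

    scaled = points.copy()
    scaled[:, 2] *= 50  # 0.2 m in z now counts the same as 10 m in x-y

    # eps = 10 in the scaled space means 10 m in x-y and 0.2 m in z.
    labels = DBSCAN(eps=10, min_samples=5).fit_predict(scaled)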
Solution 2:
Use a weighted distance function.
E.g. WeightedEuclideanDistanceFunction in ELKI, and choose your weights accordingly, e.g. -distance.weights 1,1,50 will put 50x as much weight on the third axis.
This may be the most convenient option, since you are already using ELKI.
Just define a custom distance metric when computing the DBSCAN core points. The standard DBSCAN uses the Euclidean distance to compute points within an epsilon. So all dimensions are treated the same.
However, you could use the Mahalanobis distance to weigh each dimension differently. You can use a diagonal covariance matrix for flat clusters. You can use a full symmetric covariance matrix for flat tilted clusters, etc.
In your case, you would use a covariance matrix like:
    100    0    0
      0  100    0
      0    0    0.04
In the pseudocode provided at the Wikipedia entry for DBSCAN, just use one of the distance metrics suggested above in the regionQuery function.
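For instance, a minimal NumPy sketch of such a regionQuery using the Mahalanobis distance with the covariance matrix above (the function and variable names are mine, not from the Wikipedia pseudocode):

    import numpy as np

    # Inverse of the diagonal covariance above; eps = 1.0 then corresponds
    # to 10 m in x-y and 0.2 m in z.
    VI = np.linalg.inv(np.diag([100.0, 100.0, 0.04]))

    def region_query(points, idx, eps=1.0):
        # Mahalanobis distance from points[idx] to every point.
        d = points - points[idx]
        dist = np.sqrt(np.einsum('ij,jk,ik->i', d, VI, d))
        return np.where(dist <= eps)[0]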
Update
Note: scaling the data is equivalent to using an appropriate metric.
I am trying to implement a classification NN in MATLAB.
My inputs are clusters of coordinates from an image (corresponding to Delaunay triangulation vertices).
There are 3 clusters (results of the OPTICS algorithm) in this format:
(Not all clusters are of the same size.) Elements represent coordinates in Euclidean 2D space, so (110,12) is a point in my image, and the matrix depicted represents one cluster of points.
Clustering was done on image edges, so the coordinates refer to logical values (always 1s in this case) in the image matrix. (After edge detection there are 3 "dense" areas in an image, and these collections of pixels are used for classification.) There are 6 target classes.
So, my question is: how can I format them into single-column-vector inputs to use in a neural network?
(There is a relevant answer here, but I would like some elaboration if possible; I am probably too tired right now from 12 hours of trying stuff and don't get it 100%.)
Remember, there are 3 different coordinate matrices for each picture, so my initial thought was to create an NN with 3 inputs (of different lengths). But how do I serialize this?
Here's a cluster with its tags on in case it helps:
For you to train the classifier, you need a matrix X where each row corresponds to an image. If you want to use a coordinate representation, this means all images will have to be of the same size, say, M by N. The row for an image will then have M times N elements (features), and the corresponding feature values will be the cluster assignments. The class vector y will be whatever labels you have, i.e. one of the six different classes you mentioned in the comments above.

Keep in mind that with a coordinate representation X can get very high-dimensional, and unless you have a large number of images, chances are your classifier will perform very poorly. If you have few images, consider using the fractions of pixels belonging to each cluster that I suggested in one of the comments: this gives a shorter feature description that is invariant to rotation and translation, and may yield better classification.
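As a rough sketch of this coordinate representation, in Python for concreteness (the function and variable names are mine; it assumes all images are M by N and that you have each cluster's pixel coordinates per image):

    import numpy as np

    def image_to_row(clusters, M, N):
        # clusters: list of (K_i x 2) arrays of (row, col) pixel coordinates,
        # one array per cluster found in the image.
        assignment = np.zeros((M, N))
        for label, coords in enumerate(clusters, start=1):
            assignment[coords[:, 0], coords[:, 1]] = label  # 0 = background
        return assignment.reshape(-1)  # one row of X, with M*N features

    # X = np.vstack([image_to_row(c, M, N) for c in clusters_per_image])
    # y = array of class labels (1..6), one per image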
Please help. The steps of the condensation algorithm for tracking an object in a frame are:
1. Initialize a point, choose N random points around it, and set each weight to 1/N.
2. Propagate the points according to the dynamic model (a constant-velocity model).
3. Calculate the weight of each particle using an observation model. Up to this point, there are N particles.
4. Now, in the resampling step, pick N particles from the above set of particles?
What? How can we pick N particles from N particles?
How can we pick N particles from N particles?
Pick N particles with replacement, i.e. one particle can be chosen more than once, with probability proportional to the weight you assign to it.
If you have 3 particles with weights [0.1, 0.2, 0.7], then you might choose the 3rd particle twice and the 2nd particle once; hence you have selected 3 particles from the 3 particles themselves.
There are many techniques for doing this resampling step; I am trying to write code for this part myself.
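A minimal Python sketch of the simplest variant, multinomial resampling (assuming the weights are normalized and particles is a NumPy array):

    import numpy as np

    def resample(particles, weights):
        # Draw N indices with replacement, proportionally to the weights,
        # then reset all weights to 1/N.
        N = len(particles)
        idx = np.random.choice(N, size=N, p=weights)
        return particles[idx], np.full(N, 1.0 / N)

    # With weights [0.1, 0.2, 0.7], index 2 will be drawn most often,
    # so the 3rd particle is typically duplicated.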
Some places where you can learn about resampling:
Udacity - Artificial Intelligence for Robotics - Link to the course page
IEEE paper: "A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking" by Arulampalam, Maskell, Gordon, and Clapp. This is a highly cited paper, and it is the reference most people use for particle filters.
There is also a lecture-style tutorial paper in which resampling is explained nicely; just follow the algorithm there and I think your code will do the resampling. Link
I don't know how much detail is required...forgive me if most of this is known. The particle filter attempts to estimate the posterior distribution p(x_t|y_1,...,y_t) based on the observations y_1,...,y_t. The "correction step" relies on the simplification:
$p(x_t \mid y_1,\ldots,y_t) = \frac{p(y_t \mid x_t)\, p(x_t \mid y_1,\ldots,y_{t-1})}{p(y_t \mid y_1,\ldots,y_{t-1})}$
N points are sampled from this posterior distribution, then evolved according to the right-hand side to approximate the next posterior. We are not dealing with normals, so we need to approximate more than just 2 moments. The N points for the next step are then resampled from the new posterior, rather than just reusing the old points wherever they have evolved to. The reason is the well-known degeneracy effect: the weights of all points but one will tend to 0.
So we are not picking N particles from N particles, but rather throwing away the old particles and resampling N from the new estimate of the posterior.
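To connect this back to the steps in the question, here is a minimal sketch of one propagate-weight-resample cycle in Python (the constant-velocity model and the Gaussian observation model are illustrative stand-ins, not the canonical condensation choices):

    import numpy as np

    N, dt = 100, 1.0
    particles = np.random.randn(N, 4)  # state per particle: [x, y, vx, vy]

    def step(particles, measurement, sigma=1.0):
        # 1. Propagate: constant-velocity model plus some process noise.
        particles[:, :2] += particles[:, 2:] * dt
        particles += 0.1 * np.random.randn(N, 4)
        # 2. Weight: Gaussian likelihood of the measured position.
        err = np.sum((particles[:, :2] - measurement) ** 2, axis=1)
        w = np.exp(-err / (2 * sigma ** 2))
        w /= w.sum()
        # 3. Resample N from N, with replacement, to fight degeneracy.
        return particles[np.random.choice(N, size=N, p=w)]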
Say I have a cube of dimensions 1x1x1 spanning between coordinates (0,0,0) and (1,1,1). I want to generate a random set of points (assume 10 points) within this cube which are somewhat uniformly distributed (i.e. within a certain minimum and maximum distance from each other and also not too close to the boundaries). How do I go about this without using loops? If this is not possible using vector/matrix operations, then a solution with loops will also do.
Let me provide some more background on my problem (this will help explain what exactly I need and why). I want to integrate a function, F(x,y,z), inside a polyhedron. I want to do it numerically as follows:
$\int_V F(x,y,z)\, dV \approx \sum_{i} F(x_i,y_i,z_i)\, V_i$
Here, $F(x_i,y_i,z_i)$ is the value of the function at the point $(x_i,y_i,z_i)$ and $V_i$ is the weight. So to calculate the integral accurately, I need to identify a set of random points which are neither too close to each other nor too far apart. (Sorry, but I don't yet know what this range is; I will only be able to figure it out with a parametric study once I have a working code.) Also, I need to do this for a 3D mesh that has multiple polyhedra, hence I want to avoid loops to speed things up.
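For reference, a bare-bones Python version of that weighted sum over the unit cube with equal weights $V_i = V/n$ (a plain Monte Carlo sketch that ignores the spacing constraints):

    import numpy as np

    def integrate(F, n=10):
        # Sample n points in the unit cube; each carries weight V/n with V = 1.
        pts = np.random.rand(n, 3)
        return np.sum(F(pts[:, 0], pts[:, 1], pts[:, 2])) / n

    # integrate(lambda x, y, z: x * y * z, n=100000) is close to 1/8.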
Check out this nice random-vectors-with-fixed-sum generator on the MATLAB File Exchange (FEX).
The code "generates m random n-element column vectors of values, [x1;x2;...;xn], each with a fixed sum, s, and subject to a restriction a<=xi<=b. The vectors are randomly and uniformly distributed in the n-1 dimensional space of solutions. This is accomplished by decomposing that space into a number of different types of simplexes (the many-dimensional generalizations of line segments, triangles, and tetrahedra.) The 'rand' function is used to distribute vectors within each simplex uniformly, and further calls on 'rand' serve to select different types of simplexes with probabilities proportional to their respective n-1 dimensional volumes. This algorithm does not perform any rejection of solutions - all are generated so as to already fit within the prescribed hypercube."
Use i = rand(3,10), where each column corresponds to one point and each row corresponds to the coordinate along one axis (x, y, z).
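Since plain rand gives no spacing guarantees, one loop-free way to get "somewhat uniform" points is a jittered grid: one point per grid cell, perturbed within the cell, which bounds both the minimum spacing and the distance to the boundaries. A Python sketch (the grid size and jitter amount are arbitrary choices):

    import numpy as np

    g = (np.arange(3) + 0.5) / 3  # cell centres along one axis: 1/6, 1/2, 5/6
    centres = np.stack(np.meshgrid(g, g, g), axis=-1).reshape(-1, 3)  # 27 cells
    jitter = (np.random.rand(27, 3) - 0.5) / 6  # at most 1/12, so points stay
                                                # >= 1/6 apart, >= 1/12 off walls
    pts = (centres + jitter)[np.random.choice(27, size=10, replace=False)]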