What model is Rasa NLU entity extraction using? Is it LSTM or just a simple neural network? - neural-network

What kind of model is RASA NLU using to extract the entities and intents after word embedding?

This blog post from Rasa clarifies some aspects.
With Rasa you will first train a vectorizer that transforms each document into an N-dimensional vector, where N is the size of your vocabulary. This is exactly what scikit-learn's CountVectorizer does.
Each intent embedding is instead built as a one-hot vector (or a vector with more 1s if you have "mixed" intents). Each of these vectors has the same dimension as a document embedding, so I guess N may actually be (vocabulary size) + (number of intents).
At that point Rasa will train a neural network (default: 2 hidden layers) whose loss function is designed to maximize the similarity between document d and intent i if d is labelled as i in the training set (and to minimize d's similarity to all the other intent embeddings). The similarity is by default calculated as cosine similarity.
Each new, unseen document is embedded by the neural network and its similarity to each intent is computed. The intent most similar to the new document is returned as the predicted label.
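For concreteness, here is a rough Python sketch of that kind of embedding intent classifier. It is not Rasa's actual implementation; the toy data, layer sizes and the 0.8/0.2 similarity margins are assumptions purely for illustration:
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.feature_extraction.text import CountVectorizer

docs = ["hello there", "hi", "goodbye for now", "bye bye"]
intents = ["greet", "greet", "farewell", "farewell"]
labels = sorted(set(intents))

vec = CountVectorizer()
X = torch.tensor(vec.fit_transform(docs).toarray(), dtype=torch.float32)
Y = torch.eye(len(labels))[[labels.index(i) for i in intents]]   # one-hot intent targets

class EmbeddingIntentClassifier(nn.Module):
    def __init__(self, vocab_size, n_intents, dim=20):
        super().__init__()
        # a small feed-forward net for documents, a linear map for intents
        self.doc_net = nn.Sequential(nn.Linear(vocab_size, 64), nn.ReLU(),
                                     nn.Linear(64, dim))
        self.intent_net = nn.Linear(n_intents, dim, bias=False)

    def forward(self, x):
        d = F.normalize(self.doc_net(x), dim=-1)
        i = F.normalize(self.intent_net(torch.eye(self.intent_net.in_features)), dim=-1)
        return d @ i.T                               # cosine similarity of every doc to every intent

model = EmbeddingIntentClassifier(X.shape[1], len(labels))
opt = torch.optim.Adam(model.parameters(), lr=0.01)

for _ in range(300):
    sim = model(X)                                   # (n_docs, n_intents)
    pos = (sim * Y).sum(dim=-1)                      # similarity to the labelled intent
    neg = (sim - Y * 1e9).max(dim=-1).values         # most similar wrong intent
    loss = F.relu(0.8 - pos).mean() + F.relu(0.2 + neg).mean()   # pull positives up, push negatives down
    opt.zero_grad(); loss.backward(); opt.step()

# Predict: embed a new document and return the most similar intent
new = torch.tensor(vec.transform(["hello hello"]).toarray(), dtype=torch.float32)
print(labels[int(model(new).argmax())])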
Old answer:
It's not an LSTM. They say their approach is inspired by Facebook's StarSpace.
I didn't find the paper above very enlightening, but looking at StarSpace's GitHub repo, the text classification use case is said to use the same setting as their previous work, TagSpace.
The TagSpace paper is clearer and explains how they use a CNN to embed each document in a space such that its distance to the associated class vector is minimized. Words, documents and classes ("tags") are all embedded in the same d-dimensional space and their distance is measured via cosine similarity or inner product.

Related

The visualization of high-dimensional input for two-class classification in SVM

I am trying to find a way to visualize high-dimensional input data for two-class classification in SVM, before the analysis, to decide which kernel to use. In the documentation online, data visualization is shown only for two-dimensional inputs (i.e., two attributes).
Another question arises: what if I have multiple classes and more than two attributes?
To visualize, the data should be represented in 3 or fewer dimensions.
PCA can simply be applied to reduce the dimensionality (a small sketch follows below).
Alternatively, use the pre-image technique with MDS; refer to the paper "The pre-image problem in kernel methods" and its Matlab code at http://www.cse.ust.hk/~jamesk/publication.html
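As an illustration of the PCA suggestion above, here is a small Python sketch (the random 20-dimensional data and the two class labels are placeholders for your own training set):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 20)), rng.normal(1.5, 1, (50, 20))])  # 20-D samples, two classes
y = np.array([0] * 50 + [1] * 50)

Z = PCA(n_components=2).fit_transform(X)   # project to 2 dimensions for plotting
plt.scatter(Z[y == 0, 0], Z[y == 0, 1], label="class 0")
plt.scatter(Z[y == 1, 0], Z[y == 1, 1], label="class 1")
plt.legend()
plt.show()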

Matlab Neural Network to classify fingerprint

I have already extracted the features of a fingerprint database, and now a neural network should be applied to classify the images by gender. I haven't worked with NNs before and only know a little about them.
What type of NN should be used? An artificial neural network or a multi-layer perceptron?
Does it matter if the images are not all the same size?
Maybe some code sample in this area could help.
A neural network is a function approximator. You can think of it as a high-tech cousin to piecewise linear fitting. If you want to fit the most complex phenomena ever with a single parameter - you are going to get the mean and should not be surprised if it isn't infinitely useful. To get a useful fit, you must couple the nature of the phenomena being modeled with the NN. If you are modeling a planar surface, then you are going to need more than one coefficient (typically 3 or 4 depending on your formulation).
One of the questions behind this question is "what is the basis of fingerprints?". By basis I mean the heavily loaded word from linear algebra and calculus that talks about vector spaces, span, and eigenvectors. Once you know what the "basis" is, you can build a neural network to approximate it, and that neural network will give reasonable results.
So while I was looking for a paper on the basis, I found this:
http://phys.org/news/2012-02-experts-human-error-fingerprint-analysis.html
http://phys.org/news/2013-07-fingerprint-grading.html
http://phys.org/news/2013-04-forensic-scientists-recover-fingerprints-foods.html
http://phys.org/news/2012-11-method-artificial-fingerprints.html
http://phys.org/news/2011-08-chemist-contributes-method-recovering-fingerprints.html
And here you go, a good document of the basis of fingerprints:
http://math.arizona.edu/~anewell/publications/Fingerprint_Formation.pdf
Taking a very crude stab, you might try growing some variation on a narxnet (nonlinear autoregressive network with external inputs). I would grow it until it characterizes your set, roughly doubling the capacity each time. I would look at convergence rates as a function of "size" so that the smaller networks inform how long convergence takes for the larger ones. That means it might take a very large network to make this work, but large networks are like the 787: they cost a lot, take forever to build, and sometimes do not fly well.
If I were being clever, I would pay attention to the article by Kucken and formulate the inputs as some sort of inverse modeling of a stress field.
Best of luck.
You can try a SOM/LVQ network for classification in MATLAB. Image size does matter: you should normalize the images down to a standard size before doing the feature extraction. This ensures that each feature vector gets assigned to an input neuron.
function scan(img)
% Read every JPEG in the current folder, build a grayscale-histogram
% feature vector per image, cluster the vectors with a SOM, and train an
% LVQ network on the resulting cluster labels.
files = dir('*.jpg');
histograms = [];
for n = 1 : length(files)
    file = imread(files(n).name);
    % resize to a common 50x50 size, convert to grayscale, take the histogram
    histograms = [histograms, imhist(rgb2gray(imresize(file, [50 50])))]; %#ok<AGROW>
end
som = selforgmap([10 10]);          % 10x10 self-organizing map
som = train(som, histograms);
t = som(histograms);                % cluster assignments used as LVQ targets
net = lvqnet(10);                   % LVQ network with 10 hidden neurons
net = train(net, histograms, t);
like(img, histograms, files, net)   % helper (not defined here) that matches img against the trained net
end
It doesn't have code examples, but this paper may be helpful: "An Effective Fingerprint Verification Technique", Gogoi & Bhattacharyya.
This paper presents an effective method for fingerprint verification based on a data-mining technique called minutiae clustering and a graph-theoretic approach to analyze the process of fingerprint comparison, giving a feature-space representation of minutiae and producing a lower bound on the number of detectably distinct fingerprints. The method also proves the invariance of each individual fingerprint by using both the topological behavior of the minutiae graph and a distance measure called the Hausdorff distance. The method provides a graph-based index generation mechanism for fingerprint biometric data. A self-organizing map neural network is also used for classifying the fingerprints.

EM soft clustering in lingpipe

In Lingpipe's EM tutorial they said that it is possible to run the algorithm with no supervised data:
It is possible to train a classifier in a completely unsupervised fashion by having the initial classifier assign categories at random. Only the number of categories must be fixed. The algorithm is exactly the same, and the result after convergence or the maximum number of epochs is a classifier.
But their class, TradNaiveBayesClassifier, requires both a labeled and an unlabeled corpus to run. How can I modify it to run with no labeled data?
EM is a probabilistic maximum-likelihood optimization algorithm. In general, it is applied to unsupervised (clustering) models such as PLSA and Gaussian mixture models.
I think the LingPipe doc is saying that you can use a random initialization of all data labels (a distribution over labels for each data point), feed that into NB to compute the ELBO (evidence lower bound), and then maximize it to update the parameters.
In short, you will need to use the NB model to write up the M step, i.e., updating the model parameters. A rough sketch of the idea follows below.
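Here is a minimal Python/numpy sketch of that recipe (random soft labels, then alternating M and E steps for a multinomial naive Bayes mixture). It is not LingPipe's API; the toy documents, the number of categories K and the smoothing constants are assumptions for illustration:
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = ["cheap pills online", "meeting at noon", "buy pills cheap",
        "lunch meeting tomorrow", "online discount pills", "noon lunch today"]
X = CountVectorizer().fit_transform(docs).toarray()    # (n_docs, vocab) word counts
n, v = X.shape
K = 2                                                  # number of categories is fixed
rng = np.random.default_rng(0)

resp = rng.dirichlet(np.ones(K), size=n)               # random initial soft labels

for _ in range(50):
    # M-step: class priors and per-class word probabilities from soft counts
    prior = (resp.sum(0) + 1e-6) / (n + K * 1e-6)
    word_counts = resp.T @ X + 1.0                     # add-one smoothing
    theta = word_counts / word_counts.sum(1, keepdims=True)
    # E-step: posterior over classes for each document (log space for stability)
    log_p = np.log(prior) + X @ np.log(theta).T
    log_p -= log_p.max(1, keepdims=True)
    resp = np.exp(log_p)
    resp /= resp.sum(1, keepdims=True)

print(resp.argmax(1))   # hard cluster assignments after convergence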

Time series classification MATLAB

My task is to classify time-series data using MATLAB and any neural-network framework.
Describing the task more specifically:
It is a problem from the computer-vision field: a scene boundary detection task.
The source data are 4 arrays of neighbouring-frame histogram correlations from the video stream.
Based on this data, we have to classify the time series into 2 classes:
"scene break"
"no scene break"
So the network input is 4 double values for each source data entry, and the output is one binary value. An example of the source data is shown below:
0.997894,0.999413,0.982098,0.992164
0.998964,0.999986,0.999127,0.982068
0.993807,0.998823,0.994008,0.994299
0.225917,0.000000,0.407494,0.400424
0.881150,0.999427,0.949031,0.994918
The problem is that pattern-recognition tools from the Matlab Neural Toolbox (like patternnet) treat the source data as independent entries. But I strongly believe the results will only be precise if the network takes its decision based on the history of previous correlations.
I also did not manage to get a valid response from the recurrent nets intended for time series analysis (like delaynet and narxnet).
narxnet and delaynet return lousy results, and it looks like these types of networks are not meant to solve classification tasks. I am not inserting any code here since it is almost entirely autogenerated with the Matlab Neural Toolbox GUI.
I would appreciate any help, especially some advice on which tool fits my task better.
I am not sure how difficult this classification problem is.
Given your sample, a feed-forward neural network with 4 inputs and 1 output should be sufficient.
If you insist on using historical inputs, you simply pre-process your input d, such that your new input D(t) (a vector at time t) is composed of d(t), a 1x4 vector at time t; d(t-1), a 1x4 vector at time t-1; ...; and d(t-k), a 1x4 vector at time t-k.
If t-k < 0, just treat it as 0.
So you have a 1x(4(k+1)) vector as input and 1 output (a rough sketch of this construction is given after this answer).
As Dan mentioned, you need to find a good k.
Speaking of the weights, I think additional pre-processing such as a windowing method on the input is not necessary, since the neural network will be trained to assign weights to each input dimension.
It does sound a bit messy, though, since the neural network would consider each input dimension independently. That means you lose the information carried by the four neighbouring correlations as a group.
One possible solution is a pre-processing step that extracts neighbourhood features, e.g. using the mean and standard deviation as two features representing the originals.
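Here is a rough Python sketch of the lagged-input construction described above (the correlation values are copied from the question; k, the labels and the MLP settings are assumptions for illustration):
import numpy as np
from sklearn.neural_network import MLPClassifier

corr = np.array([[0.997894, 0.999413, 0.982098, 0.992164],
                 [0.998964, 0.999986, 0.999127, 0.982068],
                 [0.993807, 0.998823, 0.994008, 0.994299],
                 [0.225917, 0.000000, 0.407494, 0.400424],
                 [0.881150, 0.999427, 0.949031, 0.994918]])
y = np.array([0, 0, 0, 1, 0])            # 1 = scene break

def lagged(X, k):
    """Concatenate X[t], X[t-1], ..., X[t-k]; missing history is zero-padded."""
    padded = np.vstack([np.zeros((k, X.shape[1])), X])
    return np.hstack([padded[k - j : len(padded) - j] for j in range(k + 1)])

k = 2
D = lagged(corr, k)                      # shape (T, 4*(k+1))
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(D, y)
print(clf.predict(D))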

How to use LibSVM for multiple descriptors for image classification - Matlab

I need to classify pairs of images and indicate whether they are the same or not. I use several descriptors such as SIFT, LBP and more.
I now want to use LIBSVM to do the training and testing.
How can I use the svmTrain function?
Should I save only the distance between the 2 descriptors and then just have 1 1:SIftDelta, 2:LBPDelta?
Is this the correct way, or is there a better approach?
Thanks
I'm not sure this is the right forum for this question, as it deals more with "high level" notions of learning rather than the specific implementation of it in Matlab.
Having said that, it seems like you are trying to combine multiple cues for learning, which is not a trivial task.
I can propose two methods for you:
Direct method - just concatenate all your descriptors into a single, very long one and do the learning in this high-dimensional space.
Do the learning in two stages (consequently, you'll have to partition your training data into two):
At the first stage, learn K classifiers, each using a different descriptor (assuming you wish to use K different descriptors).
Then, at the second stage (using the remainder of your training data), you classify each example using the K classifiers you have: this gives you a new K-dimensional feature vector for each sample (you can put in the classification result, or use the distance from the separating hyperplane to populate the k-th entry in the new descriptor). Now you can train a second classifier on the new K-dimensional vectors. This second classifier gives you the final output of your multi-descriptor system (a sketch of both methods is given below).
-Enjoy!
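A hedged Python sketch of both options, using scikit-learn's SVC in place of raw LIBSVM calls (the random descriptor values, split sizes and kernels are placeholders; in practice each block of columns would hold one descriptor's per-pair distances, e.g. the SIFT delta and the LBP delta):
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
sift_delta = rng.normal(size=(n, 10))    # placeholder per-pair SIFT distances
lbp_delta = rng.normal(size=(n, 6))      # placeholder per-pair LBP distances
y = rng.integers(0, 2, size=n)           # 1 = same pair, 0 = different
descriptors = [sift_delta, lbp_delta]

# Direct method: concatenate all descriptors and learn in the joint space
direct = SVC(kernel="rbf").fit(np.hstack(descriptors), y)

# Two-stage method: the two stages need disjoint training data
idx1, idx2 = train_test_split(np.arange(n), test_size=0.5, random_state=0)

# Stage 1: one SVM per descriptor
stage1 = [SVC(kernel="rbf").fit(d[idx1], y[idx1]) for d in descriptors]

# Stage 2: each sample becomes a K-dimensional vector of signed distances
# to the stage-1 hyperplanes, and a second SVM is trained on those
meta = np.column_stack([clf.decision_function(d[idx2]) for clf, d in zip(stage1, descriptors)])
stage2 = SVC(kernel="linear").fit(meta, y[idx2])

# New pairs go through both stages (here the first 5 rows stand in for new data)
new_meta = np.column_stack([clf.decision_function(d[:5]) for clf, d in zip(stage1, descriptors)])
print(stage2.predict(new_meta))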