Can I formalize an expert system (an OWL knowledge base, rules, input, and classes) as a classifier function, like a neural network?

A neural network f is defined as a function that maps instances of input X to labels of Y as follows:
f: X --> Y
Let us assume that an expert system has a knowledge base K modelled as an OWL ontology, that its rules are expressed in SWRL, and that the set of rules is represented by R. For classification, X is loaded/imported into K (i.e. the ontology), where every instance x is represented as an individual in K, thus X ⊆ K.
The classification is done by applying the rules of R to K, which contains X. Assigning instances (i.e. the individuals of X) to a class of Y (i.e. defining their type) is carried out by a rule engine.
Describing my expert system with respect to the definition of a neural network, can I say that the expert system's rule engine is f, which infers, based on R, the class y_i of Y ⊆ K of which each input x_i of X is a type?
Meaning the expert system also can be formalized as:
f: X --> Y
I think it should be possible, since in both cases we have a classifier. But how can I express the difference in the inner workings of f, i.e. whether it is a neural network or an expert system?
Thanks for any comments on this
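One way to make the distinction concrete: both systems implement the same signature f: X --> Y, and the difference lies entirely in how f is realized internally (an explicit rule set R applied to the knowledge base versus learned parameters). A purely illustrative Python sketch, with hypothetical features and thresholds, not tied to OWL/SWRL or any particular rule engine:

def rule_based_f(x):
    # f realized by explicit, hand-written rules (stands in for a rule engine applying R to K)
    if x["temperature"] > 30 and x["humidity"] < 0.4:
        return "high_risk"
    return "low_risk"

def learned_f(x, w, b):
    # f realized by learned parameters (stands in for a trained neural network)
    score = w[0] * x["temperature"] + w[1] * x["humidity"] + b
    return "high_risk" if score > 0 else "low_risk"

x = {"temperature": 35.0, "humidity": 0.2}
print(rule_based_f(x))                  # decision follows from the rules
print(learned_f(x, [0.1, -2.0], -3.0))  # decision follows from fitted weights

Formally both are classifiers f: X --> Y; the "inner working" is expressed by how f is defined, e.g. f_ES(x) = RuleEngine(R, K ∪ {x}) versus f_NN(x) = output of a parametrised network applied to x.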

Is it possible to train a neural network using a loss function on unseen data (data different from the input data)?

Normally, a loss function may be defined as
L(y_hat, y) or L(f(X), y), where f is the neural network, X is the input data, y is the target.
Is it possible to implement (preferably in PyTorch) a loss function that depends not only on the input data X, but also on X' (X' != X)?
For example, let's say I have a neural network f, input data (X,y) and X'. Can I construct a loss function such that
f(X) is as close as possible to y, and also
f(X') > f(X)?
The first part can easily be implemented (in PyTorch: nn.MSELoss()); the second part seems much harder.
P.S.: this question is a reformulation of Multiple regression while avoiding line intersections using neural nets, which was closed. In the original question, input data and photos with a theoretical example are available.
Yes it is possible.
For instance, you can add a loss term using ReLU as follows:
loss = nn.MSELoss()(f(X), y) + lambd * nn.ReLU()(f(X) - f(X')).mean()
where lambd is a hyperparameter.
Note that this corresponds to f(X') >= f(X); it is easily modified to enforce the strict inequality f(X') > f(X) by adding a small positive margin eps inside the ReLU, i.e. nn.ReLU()(f(X) - f(X') + eps), which is zero only when f(X') >= f(X) + eps.
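Putting the pieces together, a minimal PyTorch sketch of this combined loss (assuming f produces one scalar per sample, that the i-th row of X' is the input that should score higher than the i-th row of X, and that lambd and eps are hand-chosen hyperparameters):

import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))   # toy model

X = torch.randn(32, 3)         # labelled inputs
y = torch.randn(32, 1)         # targets for X
X_prime = torch.randn(32, 3)   # inputs that should be scored higher than X

lambd, eps = 1.0, 1e-3
optimizer = torch.optim.Adam(f.parameters(), lr=1e-3)

for step in range(100):
    optimizer.zero_grad()
    fit_term = nn.MSELoss()(f(X), y)                          # f(X) close to y
    rank_term = torch.relu(f(X) - f(X_prime) + eps).mean()    # penalize f(X') <= f(X)
    loss = fit_term + lambd * rank_term
    loss.backward()
    optimizer.step()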

Choice of cost function in Michael Nielsen's book: Neural Networks and Deep Learning

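For reference, the cost the quote refers to is the quadratic cost from Chapter 1 of the book:
C(w, b) = (1 / 2n) * sum over x of || y(x) - a ||^2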
Here, w denotes the collection of all weights in the network, b all
the biases, n is the total number of training inputs, a is the vector
of outputs from the network when x is input, and the sum is over all
training inputs, x. Of course, the output a depends on x, w and b,
but to keep the notation simple I haven't explicitly indicated this
dependence.
Taken from Michael Nielsen's Neural Networks and Deep Learning.
Does anyone know why he divides the sum by 2? I thought he was going to find the average by dividing by n; instead, he divides by 2n.
This is done so that, when the partial derivatives of C(w, b) are computed, the factor of 1/2 cancels the 2 produced by differentiating the quadratic term.
You are correct that normally we'd divide by n; the extra factor of 2 is just a convention for computational ease and does not change where the minimum is.
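To see the cancellation explicitly (an informal sketch, treating the output a = a(x, w, b) as a scalar for simplicity):

C(w, b) = (1 / 2n) * sum_x ( y(x) - a )^2
dC/dw   = (1 / 2n) * sum_x 2 * ( a - y(x) ) * da/dw
        = (1 / n)  * sum_x ( a - y(x) ) * da/dw

The gradient ends up with a clean 1/n in front instead of 2/n. A constant rescaling of the cost changes neither the location of its minimum nor the direction of the gradient (it can be absorbed into the learning rate), so the 1/2 is purely a notational convenience.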

PCA (Principal Component Analysis) on multiple datasets

I have a set of climate data (temperature, pressure and moisture, for example): X, Y and Z, which are matrices with dimensions (n x p), where n is the number of observations and p is the number of spatial points.
Previously, to investigate modes of variability in dataset X, I simply performed an empirical orthogonal function (EOF) analysis, or Principal Component Analysis (PCA), on X. This involved decomposing the matrix X via SVD.
To investigate the coupling of the modes of variability of X and Y, I used maximum covariance analysis (MCA), which involved decomposing a covariance matrix proportional to XY^{T} (T is the transpose).
However, if I wish to look at all three datasets, how do I go about doing this? One idea I had was to form a fourth matrix, L, which is the 'feature' concatenation of the three datasets:
L = [X, Y, Z]
so that my matrix L will have dimensions (n x 3p).
I would then use standard PCA/EOF analysis: decompose this matrix L via SVD to obtain modes of variability of size (3p x 1), so that the mode associated with X is the first p values, the mode associated with Y is the second set of p values, and the mode associated with Z is the last p values.
Is this correct? Or can anyone suggest a better way of looking at the coupling of all three (or more) datasets?
Thank you so much!
I'd recommend treating the variables as an extra dimension, i.e. arranging the data as an f x n x p array, where f is the number of fields (here 3). You could then use a multilinear extension of PCA that works on tensor data.
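For comparison, here is a small NumPy sketch of the concatenation approach described in the question (names and shapes are illustrative; it assumes X, Y and Z are (n x p) anomaly fields):

import numpy as np

n, p = 100, 50
X = np.random.randn(n, p)   # e.g. temperature anomalies
Y = np.random.randn(n, p)   # e.g. pressure anomalies
Z = np.random.randn(n, p)   # e.g. moisture anomalies

L = np.hstack([X, Y, Z])            # feature concatenation, shape (n, 3p)
L = L - L.mean(axis=0)              # remove the time mean at each point

U, s, Vt = np.linalg.svd(L, full_matrices=False)

mode1 = Vt[0]                                        # leading combined mode, length 3p
mode1_X, mode1_Y, mode1_Z = mode1[:p], mode1[p:2*p], mode1[2*p:]
pc1 = U[:, 0] * s[0]                                 # its principal component time series
explained = s**2 / np.sum(s**2)                      # fraction of variance per combined mode

One practical caveat with this approach: temperature, pressure and moisture have different units and variances, so each block is usually standardized (or weighted) before concatenation, otherwise a single field can dominate the combined modes.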

State space system formation from SISO transfer functions

I want to form a MIMO state-space system from SIMO transfer functions. Let's say that the system has 2 inputs (U1 and U2) and 2 states (X1 and X2).
If I apply U1 to the LTI system, I get X1 and X2 in a SIMO operation. And, I can extract 2 transfer functions: T11 (X1/U1) and T21 (X2/U1).
Similarly, if I apply U2 input, I can get T12 (X1/U2) and T22 (X2/U2).
So, I have 4 SISO transfer functions of the system.
I want to use them to generate a state space matrix of the system. How can I do that?
Thanks in advance.
In general, there are infinitely many choices of state-space representation for your matrix transfer function (T). It is sensible to choose the one with the lowest order (the smallest number of states), often called the "minimal realization."
There are many approaches to computing the minimal realization. Some are algorithmic, starting with T and arriving at the minimal A*, B*, C*, D* directly. Others suppose that you have already found some non-minimal A, B, C, D by inspection, and then provide a procedure for transforming that non-minimal representation into the minimal one. Typically this is a matrix transformation of A and B into some canonical form that exposes uncontrollable (sometimes called "unreachable") states.
http://www.egr.msu.edu/classes/me851/jchoi/lecture/Lect_20.pdf
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-241j-dynamic-systems-and-control-spring-2011/readings/MIT6_241JS11_chap25.pdf
https://www.youtube.com/watch?v=cnbY2AUtGAY&t=2m14s
If you are less concerned with manual implementation, in MATLAB you can use the function tf2ss, or build the full transfer-function matrix with tf and convert it to a minimal state-space model with ss and minreal.
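If you would rather prototype this outside MATLAB, the python-control package offers a similar workflow (a sketch under the assumption that python-control, and for MIMO conversion possibly its slycot extension, is installed; the numerator/denominator values below are placeholders to be replaced by your identified T11, T12, T21, T22):

import control as ct

# num[i][j], den[i][j] describe the transfer function from input j to output i
num = [[[1], [1]],          # row 1: T11 = X1/U1, T12 = X1/U2
       [[2], [1, 0.5]]]     # row 2: T21 = X2/U1, T22 = X2/U2
den = [[[1, 1], [1, 2]],
       [[1, 3], [1, 4]]]

T = ct.tf(num, den)          # 2x2 transfer-function matrix
sys = ct.ss(T)               # a (possibly non-minimal) state-space realization
sys_min = ct.minreal(sys)    # strip uncontrollable/unobservable states

print(sys_min.A)
print(sys_min.B)
print(sys_min.C)
print(sys_min.D)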

How to use svmtrain() with a custom kernel in Matlab?

svmtrain() is a function in MATLAB for SVM learning. The help documentation is here:
http://www.mathworks.com/help/bioinfo/ref/svmtrain.html
How can I use it with a custom kernel? In the help doc, it says:
@kfun — Function handle to a kernel function. A kernel function must be of the form
function K = kfun(U, V)
The returned value, K, is a matrix of size M-by-N, where U and V have M and N rows respectively.
It mentions nothing about what U and V are and what M and N mean. I just don't know how to use it in the right format.
Can anyone tell me what U and V are and what M and N mean?
For example, the training data are 5-dimensional vectors and the kernel function is the sum of the length of the vectors. How can I write the kernel function?
Thank you!
Just a guess:
According to http://www.tech.dmu.ac.uk/~hseker/Statistics%20in%20Genetics/Statistical%20Learning%20and%20Visualization%20in%20MATLAB.doc , U and V are simply the two sets of data points passed to your kernel function K; e.g., if your kernel is tanh, then:
function K = kfun(U, V, P1, P2)
K = tanh(U*V');   % U is M-by-d and V is N-by-d, so K is M-by-N
P1 and P2 are optional extra parameters for your particular kernel. But, as I wrote in the comment, you need to be a good mathematician to get better results than those obtained with the already defined kernels.
Kernel functions are one of the most common techniques used in machine learning algorithms. Here is the definition of the kernel trick from Wikipedia:
For machine learning algorithms, the kernel trick is a way of mapping
observations from a general set S into an inner product space V
(equipped with its natural norm), without ever having to compute the
mapping explicitly, in the hope that the observations will gain
meaningful linear structure in V.
e.g. the polynomial kernel, commonly used with SVMs, is:
K(x,y) = (x*y + c)^d
Here is a detailed explanation of kernels by Andrew Ng: http://www.youtube.com/watch?v=bUv9bfMPMb4
There are several standard kernels (e.g. the Gaussian/RBF kernel); they all follow the same convention, which is why the form is generalized as K(u, v). You can compare the performance of different kernels, or look at related work in your area and use the kinds of kernels that are common there.
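To make the K(u, v) / M-by-N convention from the question concrete, here is a small NumPy sketch (illustrative only, independent of MATLAB's svmtrain) that builds the Gram matrix for the polynomial kernel above:

import numpy as np

def kfun(U, V, c=1.0, d=2):
    # K[i, j] = (U[i] . V[j] + c)^d; U is M-by-D, V is N-by-D, so K is M-by-N
    return (U @ V.T + c) ** d

U = np.random.randn(6, 5)   # M = 6 five-dimensional points
V = np.random.randn(4, 5)   # N = 4 five-dimensional points
K = kfun(U, V)
print(K.shape)              # (6, 4): one kernel value per pair of rows of U and V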