Indecision about the minimization of automata - minimize

I have to minimize some finite state automata but I was born a doubt: before this transformation the automata must be proved ε-productions? Or can I leave them?
The automata is this: this http://www.mediafire.com/convkey/3fad/yk24smo642ozob0fg.jpg

The standard minimization algorithm works on deterministic finite automata. The lambda transitions mean this is not a DFA. So you must eliminate the lambdas first.
It's pretty easy to see this machine accepts a*|ab. The DFA for this is pretty easy to design and minimize intuitively.
Or you can go through the formal algorithms for NFA-lambda to DFA conversion followed by minimization.

Related

Certified calculations in a proof assistant

Symbolic calculations performed manually or by a computer algebra system may be faulty or hold only subject to certain assumptions. A classical example is sqrt(x^2) == x which is not true in general but it does hold if x is real and non-negative.
Are there examples where proof assistants/checkers such as Coq, Isabelle, HOL, Metamath, or others are used to certify correctness of symbolic calculations? In particular, I am interested in calculus and linear algebra examples such as solving definite or indefinite integrals, differential equations, and matrix equations.
Update:
To be more concrete, it would be interesting to know whether there are examples of undergraduate assignments in calculus and linear algebra that could be formally solved (possibly with the help of a proof assistant) such that the solution can be automatically verified by a proof checker. A very simple example assignment for Lean is here.
For the Coq proof assistant there are several libraries to help with that. One matching your request quite well is Coquelicot (https://gitlab.inria.fr/coquelicot/coquelicot). The Coquelicot team made an exercise and participated in the French baccalauréat - I would say comparable more to a college than a high school math exam - and finished proofs for a good part of the exercises. The proofs can be found in the examples here (https://gitlab.inria.fr/coquelicot/coquelicot/-/tree/master/examples). I thought about translating the exercises and solutions to English.
But this was quite a few years ago and meanwhile there are very powerful tools for specific applications. E.g. there is coq-interval (https://gitlab.inria.fr/coqinterval/interval) which fully automatically does Coq proofs of rather complicated inequalities, say that a high order polynomial matches a sine function in a certain interval with a certain maximum deviation. It does this by Taylor decomposition and computing upper bounds for the residual. It can also do error proofs for a wide range of numerical integrals. A new feature added recently is the ability to do proven correct plots.
A tool for proving in Coq the error between infinite precision real and floating point computations is Gappa (https://gitlab.inria.fr/gappa/gappa).
Another very interesting Coq development is CoRN (https://github.com/coq-community/corn), a formalization of constructive reals in Coq. Constructive Reals are true real numbers which do compute. Essentially a constructive real number is an algorithm to compute a number to any desired precision together with a proof that this algorithm converges. One can prove that such numbers fulfill all usual properties of real numbers. An interesting side effect of constructive reals is that they need only LPO as axiom, while in classical reals the existence of the real numbers itself is an axiom. Any computation you do in CoRN, say pi>3, is automatically proven correct.
All these tools are included in Coq Platform, a common distribution of the Coq proof assistant.
There is more and this is steadily increasing. I would say it is not that far in the future that we have a usable proven correct CAS.
The only thing that comes to my mind is that Isabelle/HOL can replay SMT proofs (as produced e.g. by Z3 or CVC4), e.g. involving integer and real arithmetic. For computer algebra systems, I don't know of any comparable examples.
The problem is that computer algebra systems tend not to be set up in a way where they can output a detailed certificate for their simplifications – if they were able to do that, one could attempt to replay that in a theorem prover. But it would have to go beyond purely equational reasoning, since many rules (such as your example) require proving inequalities as preconditions.
If computer algebra systems were able to output a trace of their computations as a list of rewrite rules that were used, including how to prove each of their preconditions, one could in principle replay such a trace in a theorem prover – but that would of course require that every rule used by the CAS has a corresponding rule in the theorem prover (this is roughly how replaying SMT proofs works in Isabelle). However, I do not know of any projects like this.
There are, on the other hand, various examples where CASs are used to compute some easily verifiable (but hard to compute) result, e.g. factoring a polynomial, isolating the roots of a real polynomial, Wilf–Zeilberger witnesses, and then verifying that this is really a valid result in a theorem prover. However, this does not involve certifying the computation process of the CAS, just the result.
for demonstration purposes, I prepared a small "fake exercise" both to illustrate what it means to verify a calculation and to illustrate the most graphical approaches available in Coq (this shows some of the things you can do in Nov. 2021).
It can be seen on github at github.com:ybertot/osxp_demos_coq, especially the file sin_properties.v.
The demonstration follows this path:
Show that we can state and prove automatically a statement giving a "safe approximation" of PI (that's the name of the mathematical constant in the Coq library).
Show that Coq can be used to plot a known mathematical function, in this case the sin function between 0 and PI. This relies on a connexion to gnuplot for the graphical display. I am afraid that gnuplot will not be included in the Coq platform mentioned by M. Soegtrop in another answer.
Show that we can also plot the function sin(1/x) using Coq
(the plot is actually preserved as a pdf file on the github repository)
Show that a generic function plotter actually returns a misleading result in that case (the generic function plotter is gnuplot).
The misleading plot is also given in the github repository as a pdf file.
The next step is to show that we can prove guaranties of intervals for some computations, sometimes automatically using the interval tactic, and sometimes the interval tactic fails to conclude. The important point here is that the command fails to conclude, instead of giving an answer that cannot be trusted. When this happens, users can rely on knowledge and mathematical reasoning to obtain the desired result. The demo shows how to prove that for any x in a certain range, the sin function is guaranteed to have a positive value.
The next step is about proving that sin x < x for every positive x, it shows that mathematical reasoning can rely on various techniques of mathematics:
decomposing the interval in two parts,
using the mean value theorem,
computing the derivative of x - sin x (and this can be done automatically in Coq),
relying on the fact that cos is known to be strictly decreasing between 0 and PI.
This is just a short demo, which is also meant to explain how a theorem prover has to be used differently from a pocket calculator, because just returning an approximation without qualification for the value of a mathematical formula is a process that cannot really be trusted.
The original question also includes questions about computing integrals. The interval package also contains facilities for this.

How does MATLAB's fit() function differentiate arbitrary MATLAB expressions for Levenberg-Marquardt to be applicable?

As far as I understand the LM algorithm, it is an improvement over the Newton's method, so very roughly speaking, an algorithm which tries to build a path in the parameter space, leading to the point where the error function is minimal, which follows the direction of the biggest gradient of error function (differentiated with respect to the parameters).
I have written a Newton's method optimizer for a neural network once, as an exercise, and the critical part of the algorithm was that we could apply the chain rule (error backpropagation) to compute the gradient. And it was me who used the chain rule to the write out a formula for the gradients. (Essentially by symbolic differentiating on paper once and coding the resulting formula.)
In MATLAB (Curve Fitting Toolbox), there is a standard fit() function, which claims to use Levenberg-Marquardt's method to fit basically any parametric MATLAB expression as well as a set of prepared models.
Well, I suspect that the prepared models could be pre-differentiated by Mathworks' engineers to generate the code for the gradients. But what about the 'arbitrary' fits?
Is MATLAB trying to do symbolic differentiation implicitly? I highly doubt that anyone can write rules for differentiation of all the complex MATLAB constructions, i.e. classes and enumerations.
Or, maybe MATLAB is just differentiating the function by evaluating it in ξ and ξ+Δξ and dividing by Δξ? But that would require finding the best shift and require n+1 function evaluations, where n is the number of parameters optimized.
And in any case, even this strategy would fail if the function is not differentiable, which I suspect to be the case for almost any general MATLAB expression.
Could anyone give a plausible hypothesis of what is actually happening inside?
(Well, knowing what actually happens inside would be even better, but even an informal insight would be great.)

Neural network: Just some regression?

I just started reading about neural networks. I thought they are something magic and extremly intelligent, but at the end of the day it just seems to be a large math function with many "undefined" constants? The learning is just another way for some kind of (more or less "stupid") regression? Is this true? For me this seems not to be very brilliant, so i am a bit surprised why this works that well.
Thank you very much
It has been proved that an artificial neural network with just one hidden layer is a universal approximator; that is, under proper parameterisation, it can approximate any continuous function (see universal approximation theorem). More importantly, as the Wikipedia article mentions:
Work by Hava Siegelmann and Eduardo D. Sontag has provided a proof that a specific recurrent architecture with rational valued weights (as opposed to full precision real number-valued weights) has the full power of a Universal Turing Machine using a finite number of neurons and standard linear connections.
This means that, at least in theory, a neural net is as much as clever as your expensive PC. And this is true without taking into account all modern extensions, e.g. as in Long-Short Term Memory networks. As one of the comments mentions, though, the real problem is learnability, i.e. how to find the right set of parameters for the task under consideration.

kmean clustering: variable selection

I'm applying a kmean algorithm for clustering my customer base. I'm struggling conceptually on the selection process of the dimensions (variables) to include in the model. I was wondering if there are methods established to compare among models with different variables. In particular, I was thinking to use the common SSwithin / SSbetween ratio, but I'm not sure if that can be applied to compare models with a different number of dimensions...
Any suggestions>?
Thanks a lot.
Classic approaches are sequential selection algorithms like "sequential floating forward selection" (SFFS) or "sequential floating backward elimination (SFBS). Those are heuristic methods where you eliminate (or add) one feature at the time based on your performance metric, e.g,. mean squared error (MSE). Also, you could use a genetic algorithm for that if you like.
Here is an easy-going paper that summarizes the ideas:
Feature Selection from Huge Feature Sets
And a more advanced one that could be useful: Unsupervised Feature Selection for the k-means Clustering Problem
EDIT:
When I think about it again, I initially had the question in mind "how do I select the k (a fixed number) best features (where k < d)," e.g., for computational efficiency or visualization purposes. Now, I think what you where asking is more like "What is the feature subset that performs best overall?" The silhouette index (similarity of points within a cluster) could be useful, but I really don't think you can really improve the performance via feature selection unless you have the ground truth labels.
I have to admit that I have more experience with supervised rather than unsupervised methods. Thus, I typically prefer regularization over feature selection/dimensionality reduction when it comes to tackling the "curse of dimensionality." I use dimensionality reduction frequently for data compression though.

Why do sigmoid functions work in Neural Nets?

I have just started programming for Neural networks. I am currently working on understanding how a Backpropogation (BP) neural net works. While the algorithm for training in BP nets is quite straightforward, I was unable to find any text on why the algorithm works. More specifically, I am looking for some mathematical reasoning to justify using sigmoid functions in neural nets, and what makes them mimic almost any data distribution thrown at them.
Thanks!
The sigmoid function introduces non-linearity in the network. Without a non-linear activation function, the net can only learn functions which are linear combinations of its inputs. The result is called universal approximation theorem or Cybenko theorem, after the gentleman who proved it in 1989. Wikipedia is a good place to start, and it has a link to the original paper (the proof is somewhat involved though). The reason why you would use a sigmoid as opposed to something else is that it is continuous and differentiable, its derivative is very fast to compute (as opposed to the derivative of tanh, which has similar properties) and has a limited range (from 0 to 1, exclusive)