Difference between Disjoint and Excess genes in NEAT? - neural-network

I was reading Stanley's paper but I couldn't figure out what exactly are Disjoint and Excess genes in NEAT. I understand they appear to be related in some particular way with the fact that all of them contain innovation numbers not pertaining to both parents. But what distinguishes them?
Could anyone shed some light into the issue?

When aligning two genomes by gene ID (innovation number), the mismatches at the ends are referred to as excess genes, and all other mismatches are referred to as disjoint genes. As far as I am aware no NEAT implementation has ever treated disjoint and excess genes differently. The distinction was made in early NEAT papers, most probably because treating the types of mismatches differently was being suggested as a possible future research topic.
(FYI - speaking as the author of SharpNEAT).

Related

NEAT algorithm: How to crossover disjoint and excess genes?

I am currently implementing the NEAT algorithm developed by Kenneth Stanley, taking the original paper as a reference.
In the section where the crossover method is described, one thing confuses me a little bit.
So, the above figure is illustrating the crossover method for NEAT. To decide from which parent a gene inherited, the paper says the following:
Matching genes are inherited
randomly, whereas disjoint genes (those that do not match in the middle) and excess
genes (those that do not match in the end) are inherited from the more fit parent.
For the matching genes (1 - 5) it's easy to understand. You just randomly inherit from either Parent1 or Parent2 (with 50% chance for both). But for the disjoint (6-8) and excess (9-10) genes you cannot inherit from the more fit parent because you only have those genes in either Parent1 or Parent2.
For example:
Parent1's fitness is higher than Parent2's. The disjoint gene 6 only exists in Parent2 (of course, because disjoint and excess genes only occur in one parent)
So, you cannot decide to inherit this gene from the more fit parent. Same goes for all other disjoint and excess genes. You can only inherit those from the parent they exist in.
So my question is: Do you maybe inherit all matching genes from the more fit parent and just take over the disjoint and excess genes? Or do i missunderstand something here?
Thanks in advance.
It might help to look at the actual implementation and see how it is handled. In the original C++ code here (look at lines 2085 onwards), the disjoint and excess genes from the unfit parent seem to be just skipped.
In your implementation, you could inherit disjoint and excess genes from the unfit parent but disable them with probability 1 so you can do pointwise mutations on them (toggle disabled to enabled) later on. However, this might result in significant genome bloat, so test and see what works.
It makes more sense to take mismatching genes only from 'more fit parent'. This will create strong offspring as a result of crossover. For matching genes, apply usual crossover operator. For improving diversity, create second offspring by random selection of mismatching genes from two parents.
In this way, First offspring will be more fit and second offspring will maintain diversity. Hope, this helps.
The graphic depicts the special case of two parents with the same fitness, so selection is random again and therefore could lead to the depicted case. I agree that it is misleading without that additional piece of information.

NEAT Two Identical Genes with Different Innovation Numbers

In paper written about the NEAT algorithm, the author states
A possible problem is that the same structural innovation will receive different innovation numbers in the same generation if it occurs by chance more than once. However, by keeping a list of the innovations that occurred in the current generation, it is possible to ensure that when the same structure arises more than once through independent mutations in the same generation, each identical mutation is assigned the same innovation number.
This makes sense because you don't want identical genes to end up with different innovation numbers. If they did, there would be problems when crossing over two genomes with identical genes but different innovation numbers because you would end up with an offspring with a copy of each gene from each parent, creating the same connection twice.
What doesn't make sense to me though is what happens if a mutation occurs between two genes, and then in the next generation the same mutation occurs? In the paper, it's very clear that only a list of mutations in the current generation is kept to avoid an "explosion of innovation numbers" but doesn't specify anything about what happens if the same mutation occurs across different generations.
Do you keep a global list of gene pairs and their corresponding innovation numbers to prevent this problem? Is there a reason why the paper only specifies what happens if the same mutation happens in the same generation and doesn't consider the case of cross-generational mutations?
No. You don't keep a global list of gene pairs. You can if you want to avoid the same mutation from happening. But something that I want to point out: it doesn't matter. The only effect of the same mutation happening is that you will do some unnecessary computation and your global innovation number will increase.
For future genomes however, there is no chance that they will have two innovation numbers that are the same innovation.
Matching genes are inherited
randomly, whereas disjoint genes (those that do not match in the middle) and excess
genes (those that do not match in the end) are inherited from the more fit parent
So when two identical innovations occur, they will be either a disjoint or excess gene. These will get inherited from the more fit parent and only one parent can be fitter, so óne offspring will never have the same innovation genes.

Grouping similar words (bad , worse )

I know there are ways to find synonyms either by using NLTK/pywordnet or Pattern package in python but it isn't solving my problem.
If there are words like
bad,worst,poor
bag,baggage
lost,lose,misplace
I am not able to capture them. Can anyone suggest me a possible way?
There have been numerous research in this area in past 20 years. Yes computers don't understand language but we can train them to find similarity or difference in two words with the help of some manual effort.
Approaches may be:
Based on manually curated datasets that contain how words in a language are related to each other.
Based on statistical or probabilistic measures of words appearing in a corpus.
Method 1:
Try Wordnet. It is a human-curated network of words which preserves the relationship between words according to human understanding. In short, it is a graph with nodes as something called 'synsets' and edges as relations between them. So any two words which are very close to each other are close in meaning. Words that fall within the same synset might mean exactly the same. Bag and Baggage are close - which you can find either by iteratively exploring node-to-node in a breadth first style - like starting with 'baggage', exploring its neighbors in an attempt to find 'baggage'. You'll have to limit this search upto a small number of iterations for any practical application. Another style is starting a random walk from a node and trying to reach the other node within a number of tries and distance. It you reach baggage from bag say, 500 times out of 1000 within 10 moves, you can be pretty sure that they are very similar to each other. Random walk is more helpful in much larger and complex graphs.
There are many other similar resources online.
Method 2:
Word2Vec. Hard to explain it here but it works by creating a vector of a user's suggested number of dimensions based on its context in the text. There has been an idea for two decades that words in similar context mean the same. e.g. I'm gonna check out my bags and I'm gonna check out my baggage both might appear in text. You can read the paper for explanation (link in the end).
So you can train a Word2Vec model over a large amount of corpus. In the end, you will be able to get 'vector' for each word. You do not need to understand the significance of this vector. You can this vector representation to find similarity or difference between words, or generate synonyms of any word. The idea is that words which are similar to each other have vectors close to each other.
Word2vec came up two years ago and immediately became the 'thing-to-use' in most of NLP applications. The quality of this approach depends on amount and quality of your data. Generally Wikipedia dump is considered good training data for training as it contains articles about almost everything that makes sense. You can easily find ready-to-use models trained on Wikipedia online.
A tiny example from Radim's website:
>>> model.most_similar(positive=['woman', 'king'], negative=['man'], topn=1)
[('queen', 0.50882536)]
>>> model.doesnt_match("breakfast cereal dinner lunch".split())
'cereal'
>>> model.similarity('woman', 'man')
0.73723527
First example tells you the closest word (topn=1) to words woman and king but meanwhile also most away from the word man. The answer is queen.. Second example is odd one out. Third one tells you how similar the two words are, in your corpus.
Easy to use tool for Word2vec :
https://radimrehurek.com/gensim/models/word2vec.html
http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf (Warning : Lots of Maths Ahead)

Does heuristics in constraint satisfaction problems ensure no backtracking? (when there exists a solution)

I'm doing a map-coloring problem with Scheme, and I used minimum remaining values (Select the vertex with the fewest legal colors) and degree heuristics select the vertex that has the largest number of neighbors). If there exists a solution for a certain configuration, will these heuristics ensures that it won't need to backtrack?
Let's do a simple theoretical analysis.
Graph coloring is NP-complete for general graphs (if not asking for a coloring with less than 4 colors). This means there exists no known polynomial time algorithm.
Your heuristic is computable in polynomial time.
Assuming you need no backtracking, then you need to make n steps, each of which requires polynomial time (n is number of vertices). Thus you can solve the problem in polynomial time.
Either you have proven P=NP or your assumption is wrong.
I leave it up to you to decide upon which option in point (4) is more plausible.
In general: no, MRV and your other heuristic will not guarantee a straight walk to the goal. (I imagine they might if your problem has some very specific structure, but don't count on it until you've seen the theorem.)
Heuristics prune the search space, or change the order of the search to make an early termination more likely. This is not the same thing as backtracking.
But it's a related concept.
We prune some spaces because we are confident that the solution does not lie in those branches of the search tree, or change the order because we have some reason to believe that it will be quicker if we look in some subtrees before others.
We also cut ourselves off from backtracking because we are confident that the solution is in the branch of the space we are in now (so that if we don't find it in this subtree, we can declare failure and don't bother).
Both kinds of strategies are ultimately about searching less of the space somehow and getting to the answer (positive or negative) without searching everything.
MRV and the degrees heuristic are about reordering the sub-searches, not about avoiding backtracking. Heuristics can be right and make a short search but that's not the same
thing as eliminating backtracking (e.g. the "cut" operator in Prolog). When you find what you're looking for, you can declare success, and of course that eliminates further backtracking. But real backtracking elimination means making a decision not to backtrack no matter what, before the search completes.
E.g. if you're doing a depth-first search, and you find what you're looking for by dumb luck without backtracking, we cannot say that dumb luck is a fence operation that eliminates backtracking. :)

Do hash functions exist in nature?

Given that there are quite a few bioinformaticians around these days, wondering if any of you have seen examples of hash functions (or mapping mechanisms) occuring in nature? And if so how do they work?
Smell is a sort of hashing.
A smelling hash function transforms a huge variety of chemical compounds to a small piece of information (several things may smell alike). Living organisms learn to distinguish the "buckets" such "hashing" puts the objects to, and tend to infer behavioral patterns that rely on such a function.
This applies to taste as well.
On a more humorous note, I would say that the nervous and sensory systems of most or all animals might be seen a hash function designed to divide things into:
things to mate with
things to eat
things to run away from
rocks.
Kudos to Terry Pratchett for identifying this elaborate system of categories. ;)
The visual system (lens, retina, cortex) is a suite of serially composed hashing functions.
(1) projection from 3d to 2d.
(2) photon to neural stimulus
(3) spatial to pattern
(4) spatial and pattern to temporal
(5) (in predator species with aligned pairs of eyes) two sets of all the above back to 3d.
etc.
Hubel and Wiesel got the Nobel in 1981 for figuring this out at the expense of a bunch of kittehs. They didn't call them hash functions, but that's actually a terrific way of representing them.
I guess turning a color image into a B&W one could be see has a mapping/hashing -- and such filter exist in the nature for photography.