Do heuristics in constraint satisfaction problems ensure no backtracking (when a solution exists)? - lisp

I'm doing a map-coloring problem with Scheme, and I used the minimum remaining values heuristic (select the vertex with the fewest legal colors) and the degree heuristic (select the vertex that has the largest number of neighbors). If there exists a solution for a certain configuration, will these heuristics ensure that it won't need to backtrack?

Let's do a simple theoretical analysis.
1. Graph coloring is NP-complete for general graphs (for any fixed number of colors k >= 3; only 2-coloring is easy). This means no polynomial-time algorithm is known.
2. Your heuristic is computable in polynomial time.
3. Assume you need no backtracking. Then you make n steps, each of which requires polynomial time (n is the number of vertices), so you can solve the problem in polynomial time.
4. Either you have proven P=NP or your assumption is wrong.
I leave it up to you to decide which option in point (4) is more plausible.

In general: no, MRV and your other heuristic will not guarantee a straight walk to the goal. (I imagine they might if your problem has some very specific structure, but don't count on it until you've seen the theorem.)

Heuristics prune the search space, or change the order of the search to make an early termination more likely. This is not the same thing as backtracking.
But it's a related concept.
We prune some spaces because we are confident that the solution does not lie in those branches of the search tree, or change the order because we have some reason to believe that it will be quicker if we look in some subtrees before others.
We also cut ourselves off from backtracking because we are confident that the solution is in the branch of the space we are in now (so that if we don't find it in this subtree, we can declare failure and don't bother).
Both kinds of strategies are ultimately about searching less of the space somehow and getting to the answer (positive or negative) without searching everything.
MRV and the degree heuristic are about reordering the sub-searches, not about avoiding backtracking. Heuristics can be right and make a short search, but that's not the same thing as eliminating backtracking (e.g. the "cut" operator in Prolog). When you find what you're looking for, you can declare success, and of course that eliminates further backtracking. But real backtracking elimination means making a decision not to backtrack no matter what, before the search completes.
E.g. if you're doing a depth-first search, and you find what you're looking for by dumb luck without backtracking, we cannot say that dumb luck is a fence operation that eliminates backtracking. :)
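To make the distinction concrete, here is a minimal Python sketch (not the asker's Scheme code) of backtracking map coloring with MRV and the degree heuristic as a tie-breaker. The names `solve`, `AUSTRALIA` and the backtrack counter are assumptions made for illustration. The heuristics only decide which vertex is tried next; the undo step in the loop is still there for the cases where the ordering guesses wrong.

```python
def solve(neighbors, colors):
    """Backtracking search: MRV picks the vertex, degree breaks ties."""
    assignment = {}
    stats = {"backtracks": 0}

    def legal(v):
        # Colors not already used by an assigned neighbor of v.
        used = {assignment[n] for n in neighbors[v] if n in assignment}
        return [c for c in colors if c not in used]

    def pick_vertex():
        unassigned = [v for v in neighbors if v not in assignment]
        # MRV: fewest legal colors first.
        # Degree: among ties, most unassigned neighbors first.
        return min(
            unassigned,
            key=lambda v: (
                len(legal(v)),
                -sum(1 for n in neighbors[v] if n not in assignment),
            ),
        )

    def search():
        if len(assignment) == len(neighbors):
            return True
        v = pick_vertex()
        for c in legal(v):
            assignment[v] = c
            if search():
                return True
            del assignment[v]          # undo the choice: this is the backtrack
            stats["backtracks"] += 1
        return False

    found = search()
    return (dict(assignment) if found else None), stats["backtracks"]


# Hypothetical test map: the states of Australia.
AUSTRALIA = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V": ["SA", "NSW"],
    "T": [],
}

coloring, backtracks = solve(AUSTRALIA, ["red", "green", "blue"])
```

On easy maps the counter may well stay at zero, but that is the heuristic being lucky, not a guarantee; on an instance with no solution the search has to undo choices before it can report failure.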


Implementing spell drawing/casting mechanism in Luau (Roblox)

I am coding a spell-casting system where you draw a symbol with your wand (mouse), and it can recognize said symbol.
There are two methods I believe might work; neural networking and an "invisible grid system"
The problem with the neural networking system is that it would likely be suboptimal in Roblox Luau and would not match the performance or speed I want. (Although I may just be lacking in neural networking knowledge. Please let me know whether I should continue to try implementing it this way.)
For the invisible grid system, I thought of converting the drawing into 1s and 0s (1 = drawn, 0 = blank), then seeing if it is similar to one of the symbols. I create the symbols by making a dictionary like:
local Symbol = { -- "Answer Key" shape, looks like a tilted square
    "00100", -- rows are strings so the leading zeros are preserved
    "01010",
    "10001",
    "01010",
    "00100",
}
The problem is that user error will likely cause it to be inaccurate, like this "spell"'s blue boxes, showing user error/inaccuracy. I'm also sure that if I have multiple Symbols, comparing every value in every symbol will surely not be quick.
Do you know an algorithm that could help me do this? Or just some alternative way of doing this I am missing? Thank you for reading my post.
I'm sorry if the format on this is incorrect, this is my first stack-overflow post. I will gladly delete this post if it doesn't abide to one of the rules. ( Let me know if there are any tags I should add )
One possible approach to solving this problem is to use a template matching algorithm. In this approach, you would create a "template" for each symbol that you want to recognize, which would be a grid of 1s and 0s similar to what you described in your question. Then, when the user draws a symbol, you would convert their drawing into a grid of 1s and 0s in the same way.
Next, you would compare the user's drawing to each of the templates using a similarity metric, such as the sum of absolute differences (SAD) or normalized cross-correlation (NCC). The template with the lowest SAD or highest NCC value would be considered the "best match" for the user's drawing, and therefore the recognized symbol.
There are a few advantages to using this approach:
It is relatively simple to implement, compared to a neural network.
It is fast, since you only need to compare the user's drawing to a small number of templates.
It can tolerate some user error, since the templates can be designed to be tolerant of slight variations in the user's drawing.
There are also some potential disadvantages to consider:
It may not be as accurate as a neural network, especially for complex or highly variable symbols.
The templates must be carefully designed to be representative of the expected variations in the user's drawings, which can be time-consuming.
Overall, whether this approach is suitable for your use case will depend on the specific requirements of your spell-casting system, including the number and complexity of the symbols you want to recognize, the accuracy and speed you need, and the resources (e.g. time, compute power) that are available to you.
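As a rough illustration of the template-matching idea, here is a minimal Python sketch (the logic ports directly to Luau). The template names, the 5x5 grid size and the `max_distance` cutoff are assumptions chosen for the example. For grids of 1s and 0s, the sum of absolute differences is simply the number of cells that differ.

```python
# Hypothetical templates: each symbol is a list of row strings.
TEMPLATES = {
    "diamond": ["00100", "01010", "10001", "01010", "00100"],
    "cross":   ["10001", "01010", "00100", "01010", "10001"],
}


def sad(a, b):
    """Sum of absolute differences between two same-sized binary grids."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))


def recognize(drawing, templates=TEMPLATES, max_distance=4):
    """Return the closest template name, or None if nothing is close enough."""
    name, dist = min(
        ((n, sad(drawing, t)) for n, t in templates.items()),
        key=lambda pair: pair[1],
    )
    return name if dist <= max_distance else None


# A diamond with one stray cell, as a user might draw it.
noisy = ["00100", "01010", "10011", "01010", "00100"]
match = recognize(noisy)
```

The cutoff is what gives tolerance to user error: a drawing within a few cells of a template still matches, while a scribble far from every template is rejected. With a handful of small grids, comparing against every template is only a few hundred cell comparisons per drawing.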

checking for convergence in complex hierarchical models JAGS

I have estimated a complex hierarchical model with many random effects, but don't really know what the best approach is to checking for convergence. I have complex longitudinal data from a few hundred individuals and estimate quite a few parameters for every individual. Because of that, I have way too many traceplots to inspect visually. Or should I really spend a day going through all the traceplots? What would be a better way to check for convergence? Do I have to calculate Gelman and Rubin's Rhat for every parameter on the person level? And when can I conclude that the model converged? When absolutely all of the thousands of parameters reached convergence? Is it even sensible to expect that? Or is there something like "overall convergence"? And what does it mean when some person-level parameters did not converge? Does it make sense to use autorun.jags from the R2jags package with such a model or will it just run forever? I know these are a lot of questions, but I just don't know how to approach this.
The measure I am using for convergence is a potential scale reduction factor (psrf)* using the gelman.diag function from the R package coda.
But nevertheless, I am also quickly visually inspecting all the traceplots, even though I also have tens/hundreds of them. It can be really fast if you put them in PNG files and then quickly go through them using e.g. IrfanView (let me know if you need me to expand on this).
The reason you should inspect the traceplots is pretty well described by an example from Marc Kéry (author of great Bayesian books): see "Never blindly trust Rhat for convergence in a Bayesian analysis". Here I include a self-explanatory image from this email:
The example is about the Rhat statistic. The psrf reported by gelman.diag is essentially the same Gelman-Rubin statistic, so it suffers from this too, and it is better to check the chains as well.
*) Gelman, A. & Rubin, D. B. Inference from iterative simulation using multiple sequences. Stat. Sci. 7, 457–472 (1992).
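With thousands of parameters, one practical way to use the statistic is as a filter: compute it for every parameter and only open the traceplots of the ones it flags (plus a random sample of the rest). Below is a minimal Python sketch of the basic Gelman-Rubin formula; it is not coda's `gelman.diag`, which applies further corrections, and the 1.1 threshold is a common rule of thumb rather than anything stated above. The function and parameter names are assumptions.

```python
from math import sqrt
from statistics import mean, variance


def rhat(chains):
    """Basic potential scale reduction factor for one parameter.

    chains: list of equally long lists of draws, one list per chain.
    Needs at least two chains and non-constant draws.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    w = mean([variance(c) for c in chains])   # within-chain variance
    b = n * variance(chain_means)             # between-chain variance
    var_hat = (n - 1) / n * w + b / n         # pooled variance estimate
    return sqrt(var_hat / w)


def flag_unconverged(draws, threshold=1.1):
    """Return parameter names with Rhat above threshold, worst first."""
    values = {name: rhat(chains) for name, chains in draws.items()}
    flagged = [name for name, r in values.items() if r > threshold]
    return sorted(flagged, key=lambda name: -values[name])


# Toy draws: two chains per parameter.
draws = {
    "mixed": [[0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 0]],
    "stuck": [[0, 1, 0, 1, 0, 1], [10, 11, 10, 11, 10, 11]],
}
to_inspect = flag_unconverged(draws)
```

This only narrows down which chains to look at; as the example above warns, a small value does not by itself prove convergence.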

Is there a way to verify a common seed to a cumulative sequence of hashes with unknown repetitions between each value presented?

I am writing a variant of the Cuckoo Cycle that uses an adjacency list for presenting solutions from two pairs of 8 bit coordinates, and I am not having any problems finding what I think should be an optimal solver for it, that uses two pairs of head/tail binary search trees to keep track of possible solution nodes, reject (branches) nodes and a binary tree that keeps a list of the candidate cycles as they are being assembled (as I understand it, binary search trees shorten the amount of processing for finding duplicates), but I need to refine the verifier function for solutions.
I see in Cuckoo that there is some process by which it modifies the edges with XOR functions and masks to identify a valid cycle, but I have two issues.
One is that each hash is generated from the previous hash, starting with the nonce, and proving that all offered node/edge pairs are valid derivatives from the nonce seems to me to require the verifier to repeat the hash function each time checking for a match until it gets a hit, which could be up to several thousand, in the worst case. Is there some property that can be used to shortcut this identification process, since unlike protection against DoS, we are providing the salt of the hash?
Second is that even if the presented cycle is perfectly valid, it is possible that one or more of the node/edge pairs in the cycle has a duplicate coordinate. The hashes are 32 bits long, and each coordinate is 8 bits. The answer to this probably has some relation to the previous question also, as having the seed for a hash function is a known security risk because of collisions. So obviously, as well as verifying the nodes are part of a cycle in the lowest possible values in the finite field, I need a way to be sure that a pair does not overlap with another possible, and branching pair.
I will be studying the verifier closer in the Cuckoo Cycle implementation to see if I can figure out how the algorithm ensures it is not approving a cycle that actually has a branch (and thus is invalid), but I thought I'd pop the question on this site in case someone knows better the ways of recognising hashes from a common seed, and if there is any way to recognise a 50% collision between a given coordinate and another one.
Note: After thinking about it for a while, I realised that I could solve the 'fake cycle' with one or more nodes having a branch by simply splitting the heads and tails into separate hashes, subsequent (odd then even), such as Murmur3 16 bit hashes.
And further thinking about it, I realised that Cuckoo Cycle is actually a special type of hash collision search that seeks only collisions that occur only once in the low order of the finite field. I am devising a new scheme called Hummingbird, which instead will not target the smallest numbers (which is also the same thing done by hashcash) but instead will target the most proximate hashes in a chain to the seed nonce. This will mean that attempts to insert branched nodes in the graph of the solution will be discovered in the verification. Which will probably take about 2-5 seconds depending on how deep. These solutions could be eliminated by specifying a maximum hash chain length as part of the consensus.
I just wanted to add that I answered my own question by realising that I am looking for, essentially, a hash collision, in my algorithm, and the simplest solution, with the least bit-twiddling was to make each coordinate a distinct hash in a hash chain (hash of nonce, then hash of hash, etc)
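The hash-chain idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: truncated SHA-256 stands in for whatever 32-bit hash the scheme actually uses, and `chain`, `verify` and `max_len` are hypothetical names. The verifier simply walks the chain from the nonce, bounded by a maximum chain length, and records where each claimed value appears.

```python
import hashlib


def chain(nonce: bytes, length: int):
    """Yield `length` successive 32-bit hashes: H(nonce), H(H(nonce)), ..."""
    h = nonce
    for _ in range(length):
        # Assumption: SHA-256 truncated to 4 bytes as a stand-in 32-bit hash.
        h = hashlib.sha256(h).digest()[:4]
        yield h


def verify(nonce: bytes, claimed, max_len: int):
    """Return {value: first position in chain} if every claimed value
    occurs within max_len steps of the nonce, otherwise None."""
    remaining = set(claimed)
    positions = {}
    for i, h in enumerate(chain(nonce, max_len)):
        if h in remaining:
            positions[h] = i
            remaining.discard(h)
            if not remaining:
                return positions
    return None


values = list(chain(b"nonce", 50))
proof = verify(b"nonce", [values[3], values[40]], max_len=50)
```

The cost of verification is at most `max_len` hash evaluations, which is why fixing a maximum chain length bounds the verifier's work.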
I didn't understand fully that Cuckoo Cycle is essentially a search for partial hash collisions, and when that dawned on me, I realised that the simple solution is to just make it into a search for hash collisions.
I have, from this realisation, moved very quickly forward to figuring out how my variation of Cuckoo can be much more simply implemented, as well as how to structure the B-tree based progressive search algorithm, the difficulty adjustment, and the rest.
I wasn't aware there was a stackexchange specialist site for math, or cryptography, or I would have posted it there instead. I studied FEC a few months ago and that opened the floodgates to a whole bunch of other ideas that led me to getting so worked up about Cuckoo Cycle. I believe I have generalised the Cuckoo Cycle into a generic, parameterisable graph theoretic proof of work and I will get back to finishing my implementation.
Thanks to everyone who submitted an answer, I will upvote as I deem correct, though I have zero or nearly zero rep, for what it's worth.

Is backtracking considered a heuristic?

More specifically, I am trying to figure out if the following statement is correct:
Every BackTracking is an Heuristic but not every Heuristic is a BackTracking.
Am I right? Cause I feel I am missing something and messing things up.
I think the only question here is: "Is every backtracking a heuristic?"
Backtracking is a general algorithm for finding all (or some) solutions to SOME computational problems, notably constraint satisfaction problems, that incrementally builds candidates to the solutions and abandons each partial candidate as soon as it determines that it cannot possibly be completed to a valid solution.
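As an illustration of that definition, here is a minimal Python sketch of backtracking on the classic N-queens puzzle (the puzzle and the names are chosen for the example and are not part of the question). Each partial placement is extended one row at a time and abandoned as soon as a queen would be attacked.

```python
def n_queens(n):
    """Return all placements of n non-attacking queens as column tuples."""
    solutions = []
    cols = []  # cols[r] is the column of the queen already placed in row r

    def safe(col):
        row = len(cols)
        # Reject a shared column or a shared diagonal with any earlier queen.
        return all(
            c != col and abs(c - col) != row - r
            for r, c in enumerate(cols)
        )

    def extend():
        if len(cols) == n:
            solutions.append(tuple(cols))
            return
        for col in range(n):
            if safe(col):          # partial candidate can still be completed?
                cols.append(col)
                extend()
                cols.pop()         # abandon this candidate and try the next

    extend()
    return solutions


four = n_queens(4)
```

For the 4x4 board this finds the two solutions `(1, 3, 0, 2)` and `(2, 0, 3, 1)`.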
Backtracking should be relatively fast; therefore it is used to determine whether or not a candidate is a valid solution (a rough selection is done). As backtracking obviously does not work with precise solutions, it is definitely a general heuristic method.
Of course, backtracking is not the only metaheuristic principle, so the second part of the sentence does not make sense.

improve hashing using genetic programming/algorithm

I'm writing a program which can significantly lessen the number of collisions that occur while using hash functions like 'key mod table_size'. For this I would like to use Genetic Programming/Algorithm. But I don't know much about it. Even after reading many articles and examples I don't know that in my case (as in program definition) what would be the fitness function, target (target is usually the required result), what would pose as the population/individuals and parents, etc.
Please help me in identifying the above and with a few codes/pseudo-codes snippets if possible as this is my project.
It's not necessary to use a genetic programming/algorithm approach; anything using evolutionary programming/algorithms is fine.
thanks..
My advice would be: don't do it this way. The literature on hash functions is vast and we more or less understand what makes a good hash function. We know enough mathematics not to look for them blindly.
If you need a hash function to use, there is plenty to choose from.
However, if this is your uni project and you cannot possibly change the subject or steer it in a more manageable direction, then as you noticed there will be complex issues of getting fitness function and mutation operators right. As far as I can tell off the top of my head, there are no obvious candidates.
You may look up e.g. 'strict avalanche criterion' and try to see if you can reason about it in terms of fitness and mutations.
Another question is how do you want to represent your function? Just a boolean expression? Something built from word operations like AND, XOR, NOT, ROT ?
Depending on your constraints (or rather, assumptions) the question of fitness and mutation will be different.
Broadly, the fitness objective is clearly to minimize the number of collisions in your 'hash modulo table-size' model.
The obvious part is to take a suitably large and (very important) representative distribution of keys and chuck them through your 'candidate' function.
Then you might pass them through 'hash modulo table-size' for one or more values of table-size and evaluate some measure of 'niceness' of the arising distribution(s).
So what that boils down to is what table-sizes to try and what niceness measure to apply.
Niceness is context dependent.
You might measure 'fullest bucket' as a measure of 'worst case' insert/find time.
You might measure the sum of squares of bucket sizes as a measure of 'average' insert/find time, assuming look-ups are uniformly distributed amongst the keys.
Finally you would need to decide what table-size (or sizes) to test at.
Conventional wisdom often uses primes because hash modulo prime tends to be nicely sensitive to all the bits in the hash, whereas something like hash modulo 2^n only involves the lower n bits.
To keep computation down you might consider the series of next primes larger than each power of two: 5 (>2^2), 11 (>2^3), 17 (>2^4), etc., up to and including the prime after the first power of 2 greater than your 'sample' size.
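The measurements described above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and how the two numbers are combined into a single fitness score is left open, as in the discussion above.

```python
from collections import Counter


def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime_above(n):
    p = n + 1
    while not is_prime(p):
        p += 1
    return p


def table_sizes(sample_size):
    """Primes just above 2^2, 2^3, ... up to and including the prime
    after the first power of two greater than sample_size."""
    sizes, power = [], 4
    while True:
        sizes.append(next_prime_above(power))
        if power > sample_size:
            return sizes
        power *= 2


def niceness(candidate, keys, table_size):
    """Return (fullest bucket, sum of squared bucket sizes); lower is better."""
    buckets = Counter(candidate(k) % table_size for k in keys)
    return max(buckets.values()), sum(c * c for c in buckets.values())


keys = range(10)
good = niceness(lambda k: k, keys, 5)   # spreads keys evenly
bad = niceness(lambda k: 7, keys, 5)    # everything in one bucket
```

A genetic search would call `niceness` for each candidate function over a representative key sample and each size from `table_sizes`, and prefer candidates with smaller values.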
There are other ways of considering fitness but without a practical application the question is (of course) ill-defined.
If the functions in your 'space' of potential hash functions don't all have the same execution time, you should also factor in 'cost'.
It's fairly easy to define very good hash functions but execution time can be a significant factor.