Hashing a graph to find duplicates (including rotated and reflected versions)

I am making a game that involves solving a path through graphs. Depending on the size of the graph this can take a little while so I want to cache my results.
This has me looking for an algorithm to hash a graph to find duplicates.
This is straightforward for exact copies of a graph: I simply use the node positions relative to the top corner. It becomes quite a bit more complicated for rotated or even reflected graphs. I suspect this isn't a new problem, but I'm unsure what the terminology for it is.
My specific case is on a grid, so a node (if present) will always be connected to its four neighbors, north, south, east and west. In my current implementation each node stores an array of its adjacent nodes.
Suggestions for further reading or even complete algorithms are much appreciated.
My current hashing implementation starts at the first found node in the graph (which depends on how I iterate over the playfield), then notes the position of all nodes relative to it. The base graph will have a hash that might be something like: 0:1,0:2,1:2,1:3,-1:1,
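For concreteness, a minimal sketch of that kind of position-relative hash (the function name and the representation of the graph as a set of (x, y) cells are illustrative, not the actual implementation):

```python
def relative_hash(nodes):
    """Hash a set of (x, y) grid cells by their positions relative to the
    first node in row-major scan order, joined into an "x:y,..." string."""
    ox, oy = min(nodes, key=lambda p: (p[1], p[0]))  # first node the scan finds
    # Sorted so the result is deterministic regardless of input order.
    rel = sorted((x - ox, y - oy) for (x, y) in nodes)
    return ",".join(f"{x}:{y}" for (x, y) in rel)
```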

I suggest you do this:
Make a function to generate a hash for any graph, position-independent. It sounds like you already have this.
When you first generate the pathfinding solution for a graph, cache it by the hash for that graph...
...Then also generate the 7 other unique forms of that graph (rotated 90deg; rotated 270deg; flipped x; flipped y; flipped x & y; flipped along one diagonal axis; flipped along the other diagonal axis). You can of course generate these using simple vector/matrix transformations. For each of these 7 transformed graphs, you also generate that graph's hash, and cache the same pathfinding solution (which you first apply the same transform to, so the solution maps appropriately to the new graph configuration).
You're done. Later your code will look up the pathfinding solution for a graph, and even if it's an alternate (rotated, flipped) form of the graph you found the earlier solution for, the cache already contains the correct solution.
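A minimal sketch of this scheme, assuming a graph is a set of (x, y) grid cells and a path is a list of cells (all names here are illustrative):

```python
# Illustrative sketch: graphs are sets of (x, y) grid cells, paths are
# lists of cells.

def graph_hash(nodes):
    """Position-independent hash: translate so the lexicographically
    smallest cell is at the origin, then join into an "x:y,..." string."""
    cells = sorted(nodes)
    ox, oy = cells[0]
    return ",".join(f"{x - ox}:{y - oy}" for (x, y) in cells)

# The 8 symmetries of the square grid: identity, three rotations,
# two axis flips, and two diagonal flips.
TRANSFORMS = [
    lambda x, y: (x, y),    # identity
    lambda x, y: (-y, x),   # rotated 90 deg
    lambda x, y: (-x, -y),  # rotated 180 deg (flipped x & y)
    lambda x, y: (y, -x),   # rotated 270 deg
    lambda x, y: (-x, y),   # flipped x
    lambda x, y: (x, -y),   # flipped y
    lambda x, y: (y, x),    # flipped along one diagonal
    lambda x, y: (-y, -x),  # flipped along the other diagonal
]

solution_cache = {}  # hash string -> solved path

def cache_solution(nodes, path):
    """Cache the path under the hash of every symmetry of the graph,
    applying the same transform to the path so it fits that variant."""
    for t in TRANSFORMS:
        t_nodes = {t(x, y) for (x, y) in nodes}
        t_path = [t(x, y) for (x, y) in path]
        solution_cache[graph_hash(t_nodes)] = t_path

def lookup_solution(nodes):
    return solution_cache.get(graph_hash(nodes))
```

One detail this sketch glosses over: the cached path is stored in the transformed graph's absolute coordinates, so a real implementation would normalize it with the same origin convention the hash uses and re-anchor it at lookup time.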
I spent some time this morning thinking about this and I think this is probably the most optimal solution. But I'll share the other over-analyzed versions of the solution that I was also thinking about...
I was considering the fact that what you really needed was a function that would take a graph G, and return the "canonical version" of G (which I'll call G'), AND the transform matrix required to convert G to G'. (It seemed like you would need the transform so you could apply it to the pathfinding data and get the correct path for G, since you would have just stored the pathfinding data for G'.) You could, of course, look up pathfinding data for G', apply the transform matrix to it, and have your pathfinding solution.
The problem is that I don't think there's any unambiguous and performant way to determine a "canonical version" of G, because it means you have to recognize all 8 variants of G and always pick the same one as G' based on some criteria. I thought I could do something clever by looking at each axis of the graph, counting the number of points along each row/column in that axis, and then rotating/flipping to put the more imbalanced half of the axis always in the top-or-left... in other words, if you pass in "d", "q", "b", "d", "p", etc. shapes, you would always get back the "p" shape (where the imbalance is towards the top-left). This would have the nice property that it should recognize when the graph was symmetrical along a given axis, and not bother to distinguish between the flipped versions on that axis, since they were the same.
So basically I just took the row-by-row/column-by-column point counts, counting the points in each half of the shape, and then rotating/flipping until the count is higher in the top-left. (Note that it doesn't matter that the count would sometimes be the same for different shapes, because all the function was concerned with was transforming the shape into a single canonical version out of all the different possible permutations.)
Where it fell down for me was deciding which axis was which in the canonical case - basically handling the case of whether to invert along the diagonal axis. Once again, for shapes that are symmetrical about a diagonal axis, the function should recognize this and not care; for any other case, it should have a criterion for saying "the axis of the shape that has the property [???] is, in the canonical version, the x axis of the shape, while the other axis will be the y axis". And without this kind of criterion, you can't distinguish two graphs that are flipped about the diagonal axis (e.g. "p" versus "σ"/sigma). The criterion I was trying to use was again "imbalance", but this turned out to be harder and harder to determine, at least the way I was approaching it. (Maybe I should have just applied the technique I was using for the x/y axes to the diagonal axes? I haven't thought through how that would work.) If you wanted to go with such a solution, you'd either need to solve this problem I failed to solve, or else give up on worrying about treating versions that are flipped about the diagonal axis as equivalent.
Although I was trying to focus on solutions that just involved calculating simple sums, I realized that even this kind of summing is going to end up being somewhat expensive to do (especially on large graphs) at runtime in pathfinding code (which needs to be as performant as possible, and which is the real point of your problem). In other words, I realized that we were probably both overthinking it. You're much better off just taking a slight hit on the initial caching side and then having lightning-fast lookups based on the graph's position-independent hash, which also seems like a pretty foolproof solution.

Based on the twitter conversation, let me rephrase the problem (I hope I got it right):
How to compare graphs (planar, on a grid) that are treated as invariant under 90deg rotations and reflection. Bonus points if it uses hashes.
I don't have a full answer for you, but a few ideas that might be helpful:
Divide the problem into subproblems that are independently solvable. That would make:
1. How to compare the graphs given the invariance conditions
2. How to transform them into a canonical basis
3. How to hash this canonical basis subject to tradeoffs (speed, size, collisions, ...)
You could try to solve 1 and 2 in a single step. A naive geometric approach could be as follows:
For rotation invariance, you could try to count the edges in each direction and rotate the graph so that the major direction always points to the right. If there is no main direction, you could see the graph as a point cloud of its vertices and use eigenvectors and Principal Component Analysis (PCA) to obtain the main direction and rotate it accordingly.
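A minimal numpy sketch of that PCA step (the function name is illustrative; it returns the dominant eigenvector of the vertex covariance):

```python
import numpy as np

def main_direction(points):
    """Principal axis of a 2-D point cloud, as a unit vector.
    points: array-like of shape (n, 2), the graph's vertex coordinates."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    cov = np.cov(centered.T)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1]                   # eigenvector of the largest one
```

On a grid one would presumably then snap the rotation to the nearest multiple of 90 degrees.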
I don't have a smart solution for the reflection problem. My brute force way would be to just create the reflected graph every time. Say you have a graph g and the reflected graph r(g). If you want to know whether some other graph h equals g, you have to check h == g || h == r(g).
Now onto the hashing:
For the hashing you probably have to trade off speed, size and collisions. If you just use the string of edges, you are high on speed and size and low on collisions. If you then apply some generic string hasher to that string, you move to a different point in that tradeoff space: smaller size, but more collisions.
If you use a short hash, with more frequent collisions, you can achieve a rather small cost for comparing non-matching graphs. The cost for matching graphs is then a bit higher, as you have to do a full comparison to see if they actually match.
Hope this makes some kind of sense...
best, Simon
update: another thought on the rotation problem if the edges don't give a clear winner: Compute the center of mass of the vertices and see to which side of the center of the bounding box it falls. Rotate accordingly.
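As a sketch of that tiebreaker (illustrative code, assuming vertices as (x, y) pairs):

```python
import numpy as np

def mass_offset(points):
    """Vector from the bounding-box center to the vertices' center of mass.
    The sign on each axis tells which side the mass leans toward, which can
    be used to pick the rotation/flip deterministically."""
    pts = np.asarray(points, dtype=float)
    bbox_center = (pts.min(axis=0) + pts.max(axis=0)) / 2.0
    return pts.mean(axis=0) - bbox_center
```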

Related

Compare two nonlinear transformed (monochromatic) images

Given are two monochromatic images of the same size. Both are pre-aligned/anchored to one common point. Some points of the original image moved to a new position in the new image, but not in a linear fashion.
Below you see a picture of an overlay of the original (red) and transformed image (green). What I am looking for now is a measure of how much the "individual" points shifted.
At first I thought of a simple average correlation of the whole matrix or some kind of phase correlation, but I was wondering whether there is a better way of doing so.
I already found that link, but it didn't help that much. Currently I'm implementing this in Matlab, but that shouldn't matter, I guess.
Update For clarity: I have hundreds of these image pairs and I want to measure how similar each pair is. It doesn't have to be the fanciest algorithm; rather something easy to implement that yields a good estimate of similarity.
An unorthodox approach uses RASL to align an image pair. A python implementation is here: https://github.com/welch/rasl and it also provides a link to the RASL authors' original MATLAB implementation.
You can give RASL a pair of related images, and it will solve for the transformation (scaling, rotation, translation, you choose) that best overlays the pixels in the images. A transformation parameter vector is found for each image, and the difference in parameters tells how "far apart" they are (in terms of transform parameters).
This is not the intended use of RASL, which is designed to align large collections of related images while being indifferent to changes in alignment and illumination. But I just tried it out on a pair of jittered images and it worked quickly and well.
I may add a shell command that explicitly does this (I'm the author of the python implementation) if I receive encouragement :) (today, you'd need to write a few lines of python to load your images and return the resulting alignment difference).
You can try using optical flow: http://www.mathworks.com/discovery/optical-flow.html
It is usually used to measure the movement of objects from frame T to frame T+1, but you can also use it in your case. You would get a map that tells you the offset by which each point in Image1 moved to reach its position in Image2.
Then, if you want a metric that gives you a "distance" between the images, you can perhaps average the values of that flow map, or something similar.
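For instance, a minimal sketch with OpenCV's dense Farneback optical flow (the file names and the mean-magnitude metric are assumptions):

```python
import cv2
import numpy as np

# Load the image pair as grayscale (placeholder file names).
img1 = cv2.imread("original.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("transformed.png", cv2.IMREAD_GRAYSCALE)

# Dense optical flow: flow[y, x] is the (dx, dy) offset of each pixel.
# Positional args: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags.
flow = cv2.calcOpticalFlowFarneback(img1, img2, None, 0.5, 3, 15, 3, 5, 1.2, 0)

# One possible "distance": the mean displacement magnitude over all pixels.
shift = np.sqrt(flow[..., 0] ** 2 + flow[..., 1] ** 2)
print("mean point shift (pixels):", shift.mean())
```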

DITMatlab: How to calculate hysteresis for experimental data set?

I got an experimental data set that looks more or less like this.
I need to determine how big the hysteresis loop is, aka if I look at two points with the same capacity (Y axis), what's the maximum distance between said points (X axis).
The issue is, the data points aren't located on the same Y value, aka I can't just find max X and min X for every Y and subtract them - that'd be too easy :^)
I figured I could use a convex hull (convhull) to calculate the outer envelope of the set, but then I realised it will only work for the convex part, not the concave part, but I guess I can divide my data set into smaller subsets and find a sum of them... or something.
And then, assuming I have the data set that's only the outer outline, I need to calculate distances between the left and right border (as shown here), but then again, that's just a data set of X and Y, and I'd need to find the point where the green line crosses the outer rim.
So here are the questions:
Is there a Matlab procedure that calculates the outer outline of a data set and works with the concave part - kinda like convhull, but better?
Assuming I have the outline data set, is there an easy way to calculate the secant line of the data set, like shown in the second picture?
Thanks for any advice; hope I made what I have in mind clear enough - English isn't my first language.
Benji
EDIT 1: Or perhaps there is an easier (?) way to determine which points form the biggest outline? Like... group points into (duh) groups, let's say those near 20%, 30%, 40%... and then pick two randomly (or brute-force pick all possible pairs), one for the top boundary, the other for the bottom boundary, and then calculate the area of the polygon formed this way? Then select the set of points resulting in the polygon with the biggest area?
EDIT 2: Ooor I could group them like I thought I would before, and then work on only two groups at a time. Find the convex hull for two groups, then for the two next groups, and when I'm done with all the groups, I'd only need to find the points common to all the groups and find a global hull :D Yeah, that might work :D
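A rough numpy sketch of that grouping idea (bin count and names are arbitrary; this assumes each Y-bin catches points from both branches of the loop):

```python
import numpy as np

def loop_width(x, y, nbins=20):
    """Rough hysteresis width: bin the points by their Y value (capacity),
    take the horizontal spread (max X - min X) inside each bin, and return
    the largest spread. x and y are 1-D arrays of the measured values."""
    edges = np.linspace(y.min(), y.max(), nbins + 1)
    widths = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (y >= lo) & (y <= hi)
        if np.count_nonzero(in_bin) >= 2:
            widths.append(x[in_bin].max() - x[in_bin].min())
    return max(widths)
```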

How do I optimize point-to-circle matching?

I have a table that contains a bunch of Earth coordinates (latitude/longitude) and associated radii. I also have a table containing a bunch of points that I want to match with those circles, and vice versa. Both are dynamic; that is, a new circle or a new point can be added or deleted at any time. When either is added, I want to be able to match the new circle or point with all applicable points or circles, respectively.
I currently have a PostgreSQL module containing a C function to find the distance between two points on earth given their coordinates, and it seems to work. The problem is scalability. In order for it to do its thing, the function currently has to scan the whole table and do some trigonometric calculations against each row. Both tables are indexed by latitude and longitude, but the function can't use them. It has to do its thing before we know whether the two things match. New information may be posted as often as several times a second, and checking every point every time is starting to become quite unwieldy.
I've looked at PostgreSQL's geometric types, but they seem more suited to rectangular coordinates than to points on a sphere.
How can I arrange/optimize/filter/precalculate this data to make the matching faster and lighten the load?
You haven't mentioned PostGIS - why have you ruled that out as a possibility?
http://postgis.refractions.net/documentation/manual-2.0/PostGIS_Special_Functions_Index.html#PostGIS_GeographyFunctions
Thinking out loud a bit here... you have a point (lat/long) and a radius, and you want to find all existing point-radii combinations that may overlap? (or something like that...)
Seems you might be able to store a few more bits of information along with those numbers that could help you rule out others that are nowhere close during your query... This might avoid a lot of trig operations.
For example, with point x,y and radius r, you could easily calculate a range of feasible lat/long (a squarish area) that could be used to rule out needless calculations against other points.
You could then store the max and min lat and long along with that point in the database. Then, before running your trig on every row, you could filter your results to eliminate points obviously out of bounds.
If I understand you correctly, then my first idea would be to cache some data and eliminate most of the checking.
Imagine your circle is actually a box with 4 sides.
You could store the base coordinates of those lines much like you have lines (a mesh) on a real map. So you store the east, west, north, and south edge of each circle.
If you get your coordinate and it's outside of that box, you can be sure it won't be inside the circle either, since the box is bigger than the circle.
If it isn't, then you have to check like you do now. But I guess you can eliminate most of the steps already.
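A minimal sketch of the box prefilter described in both answers above (spherical-Earth approximation; the names and the SQL shape are illustrative):

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation

def circle_bbox(lat, lon, radius_m):
    """Lat/long bounds of a box enclosing the circle around (lat, lon),
    in degrees. A prefilter only: no handling of the poles or the
    antimeridian, where the box would wrap around."""
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    dlon = math.degrees(radius_m / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon

# With these four values stored as indexed columns per circle, a query like
#   SELECT ... FROM points p, circles c
#   WHERE p.lat BETWEEN c.lat_min AND c.lat_max
#     AND p.lon BETWEEN c.lon_min AND c.lon_max
# eliminates most rows before the exact great-circle distance check.
```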

Jelly physics 3d

I want to ask about jelly physics ( http://www.youtube.com/watch?v=I74rJFB_W1k ): where can I find a good place to start making things like that? I want to make a simulation of car crashes using this jelly physics, but I can't find much about it. I don't want to use an existing physics engine, I want to write my own :)
Something like what you see in the video you linked to could be accomplished with a mass-spring system. However, as you vary the number of masses and springs, keeping your spring constants the same, you will get wildly varying results. In short, mass-spring systems are not good approximations of a continuum of matter.
Typically, these sorts of animations are created using what is called the Finite Element Method (FEM). The FEM does converge to a continuum, which is nice. And although it does require a bit more know-how than a mass-spring system, it really isn't too bad. The basic idea, derived from the study of continuum mechanics, can be put this way:
1. Break the volume of your object up into many small pieces (elements), usually tetrahedra. Let's call the entire collection of these elements the mesh. You'll actually want to make two copies of this mesh. Label one the "rest" mesh, and the other the "world" mesh. I'll tell you why next.
2. For each tetrahedron in your world mesh, measure how deformed it is relative to its corresponding rest tetrahedron. The measure of this deformation is called "strain". It is typically computed by first measuring what is known as the deformation gradient (often denoted F). There are several good papers that describe how to do this. Once you have F, one very typical way to define the strain e is e = 1/2 (F^T * F - I). This is known as Green's strain. It is invariant to rotations, which makes it very convenient.
3. Using the properties of the material you are trying to simulate (gelatin, rubber, steel, etc.), and using the strain you measured in the step above, derive the "stress" of each tetrahedron.
4. For each tetrahedron, visit each node (vertex, corner, point - these all mean the same thing) and average the area-weighted normal vectors (in the rest shape) of the three triangular faces that share that node. Multiply the tetrahedron's stress by that averaged vector, and there's the elastic force acting on that node due to the stress of that tetrahedron. Of course, each node could potentially belong to multiple tetrahedra, so you'll want to be able to sum up these forces.
5. Integrate! There are easy ways to do this, and hard ways. Either way, you'll want to loop over every node in your world mesh and divide its forces by its mass to determine its acceleration. The easy way to proceed from here (sketched in code after this list) is to:
- Multiply its acceleration by some small time value dt. This gives you a change in velocity, dv.
- Add dv to the node's current velocity to get a new total velocity.
- Multiply that velocity by dt to get a change in position, dx.
- Add dx to the node's current position to get a new position.
This approach is known as explicit forward Euler integration. You will have to use very small values of dt to get it to work without blowing up, but it is so easy to implement that it works well as a starting point.
Repeat steps 2 through 5 for as long as you want.
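A minimal numpy sketch of the forward Euler step from item 5 (the data layout is an assumption):

```python
import numpy as np

def euler_step(pos, vel, force, mass, dt=1e-4):
    """One explicit forward Euler step over all nodes:
    a = F/m, dv = a*dt, v += dv, dx = v*dt, x += dx.
    pos/vel/force are (n, 3) arrays, mass is (n,). Keep dt very small."""
    acc = force / mass[:, None]  # per-node acceleration
    vel = vel + acc * dt         # add the change in velocity dv = a*dt
    pos = pos + vel * dt         # add the change in position dx = v*dt
    return pos, vel
```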
I've left out a lot of details and fancy extras, but hopefully you can infer a lot of what I've left out. Here is a link to some instructions I used the first time I did this. The webpage contains some useful pseudocode, as well as links to some relevant material.
http://sealab.cs.utah.edu/Courses/CS6967-F08/Project-2/
The following link is also very useful:
http://sealab.cs.utah.edu/Courses/CS6967-F08/FE-notes.pdf
This is a really fun topic, and I wish you the best of luck! If you get stuck, just drop me a comment.
That rolling jelly cube video was made with Blender, which uses the Bullet physics engine for soft body simulation. The Bullet documentation in general is very sparse, and for soft body dynamics it is almost nonexistent. Your best bet would be to read the source code.
Then write your own version ;)
Here is a page with some pretty good tutorials on it. The one you are looking for is probably in the (inverse) Kinematics and Mass & Spring Models sections.
Hint: A jelly can be seen as a 3-dimensional cloth ;-)
Also, try having a look at the search results for spring pressure soft body model - they might get you going in the right direction :-)
See Maciej Matyka's page on the topic of soft bodies.
Unfortunately 2D only, but JellyPhysics and JellyCar might be something to start with.

How do I visualise clusters of users?

I have an application in which users interact with each other. I want to visualize these interactions so that I can determine whether clusters of users exist (within which interactions are more frequent).
I've assigned a 2D point to each user (where each coordinate is between 0 and 1). My idea is that two users' points move closer together when they interact, an "attractive force", and I just replay my interaction logs over and over again.
Of course, I need a "repulsive force" that will push users apart too, otherwise they will all just collapse into a single point.
First I tried monitoring the lowest and highest of each of the XY coordinates, and normalizing their positions, but this didn't work: a few users with a small number of interactions stayed at the edges, and the rest all collapsed into the middle.
Does anyone know what equations I should use to move the points, both for the "attractive" force between users when they interact, and a "repulsive" force to stop them all collapsing into a single point?
Edit: In response to a question, I should point out that I'm dealing with about 1 million users, and about 10 million interactions between users. If anyone can recommend a tool that could do this for me, I'm all ears :-)
In the past, when I've tried this kind of thing, I've used a spring model to pull linked nodes together, something like: dx = -k*(x-l). dx is the change in the position, x is the current position, l is the desired separation, and k is the spring coefficient that you tweak until you get a nice balance between spring strength and stability; it'll be less than 0.1. Having l > 0 ensures that everything doesn't end up in the middle.
In addition to that, a general "repulsive" force between all nodes will spread them out, something like: dx = k / x^2. This will be larger the closer two nodes are; tweak k to get a reasonable effect.
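A minimal numpy sketch combining both formulas into one layout iteration (the constants and names are illustrative; the O(n^2) repulsion loop is for clarity only and would need an approximation such as Barnes-Hut at a million users):

```python
import numpy as np

def layout_step(pos, edges, k_spring=0.05, rest_len=0.1, k_repel=1e-4):
    """One iteration: springs pull linked nodes toward a rest separation
    (dx = -k*(x - l)), and an inverse-square force pushes all pairs apart
    (dx = k / x^2). pos is (n, 2); edges is a list of (i, j) index pairs."""
    delta = np.zeros_like(pos)
    # Attraction along each interaction edge.
    for i, j in edges:
        d = pos[j] - pos[i]
        dist = np.linalg.norm(d) + 1e-9
        step = -k_spring * (dist - rest_len) * (d / dist)
        delta[j] += step
        delta[i] -= step
    # Pairwise repulsion between all nodes.
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            d = pos[j] - pos[i]
            dist = np.linalg.norm(d) + 1e-9
            push = (k_repel / dist ** 2) * (d / dist)
            delta[j] += push
            delta[i] -= push
    return pos + delta
```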
I can recommend some possibilities: first, try log-scaling the interactions or running them through a sigmoidal function to squash the range. This will give you a smoother visual distribution of spacing.
Independent of this scaling issue: look at some of the rendering strategies in graphviz, particularly the programs "neato" and "fdp". From the man page:
neato draws undirected graphs using ``spring'' models (see Kamada and Kawai, Information Processing Letters 31:1, April 1989). Input files must be formatted in the dot attributed graph language. By default, the output of neato is the input graph with layout coordinates appended.
fdp draws undirected graphs using a ``spring'' model. It relies on a force-directed approach in the spirit of Fruchterman and Reingold (cf. Software-Practice & Experience 21(11), 1991, pp. 1129-1164).
Finally, consider one of the scaling strategies, an attractive force, and some sort of drag coefficient instead of a repulsive force. Actually moving things closer and then possibly farther later on may just get you cyclic behavior.
Consider a model in which everything will collapse eventually, but slowly. Then just run until some condition is met (a node crosses the center of the layout region or some such).
Drag or momentum can just be encoded as a basic resistance to motion and amounts to throttling the movements; it can be applied differentially (things can move slower based on how far they've gone, where they are in space, how many other nodes are close, etc.).
Hope this helps.
The spring model is the traditional way to do this: make an attractive force between each node based on the interaction, and a repulsive force between all nodes based on the inverse square of their distance. Then solve, minimizing the energy. You may need some fairly high-powered programming to get an efficient solution to this if you have more than a few nodes. Make sure the start positions are random, and run the program several times: a case like this almost always has several local energy minima in it, and you want to make sure you've got a good one.
Also, unless you have only a few nodes, I would do this in 3D. An extra dimension of freedom allows for better solutions, and you should be able to visualize clusters in 3D as well if not better than 2D.