assignment problem with costs - variable-assignment

I have a problem that I'm stuck on and can't find anywhere to start with, so I'm turning to Stack Overflow.
The problem asks whether it is NP-hard or polynomial; if it's NP-hard, prove NP-completeness, otherwise give the algorithm.
The problem is as follows:
A product consists of n modules. Either of two companies can build each module, at some cost (c_ij, i: module number, j: company number). If modules a and b are built by different companies, there is an additional cost (p_ab). Modules a and b do not have to be successive; the same kind of additional cost applies to a and c too. As expected, the problem asks for the assignment of modules to companies that minimizes the total cost.
Any ideas?

It can be reduced to the min-cut problem, which can be solved with any max-flow algorithm.
So what's the network?
The modules will be the vertices of our graph, and we add two new vertices, a source and a sink.
From the source we add an edge to every module i with capacity c_i1. Similarly, from every module i we add an edge to the sink with capacity c_i2. Also, for any modules i and j we add an edge with capacity p_ij (the graph is directed, so there will be two edges, (i, j) and (j, i)). The value of the min cut is then the solution of the problem: modules on the source side of the cut are assigned to the second company and the remaining modules to the first. A module on the source side cuts its edge to the sink (paying c_i2), a module on the sink side cuts its edge from the source (paying c_i1), and each pair split across the cut pays p_ij exactly once.
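To make the construction concrete, here is a minimal Python sketch of it. The function name `min_cost_assignment`, the input layout (`c` as a list of cost pairs, `p` as a dict keyed by module pairs) and the choice of a plain BFS-based augmenting-path max flow are my own assumptions for illustration; any max-flow routine would do.

```python
from collections import deque

def min_cost_assignment(c, p):
    """c[i] = (cost at company 1, cost at company 2) for module i.
    p[(a, b)] = extra cost if modules a and b go to different companies.
    Returns (minimum total cost, list with the company number of each module)."""
    n = len(c)
    s, t = n, n + 1                          # extra source and sink vertices
    cap = [[0] * (n + 2) for _ in range(n + 2)]
    for i, (c1, c2) in enumerate(c):
        cap[s][i] = c1                       # cut when module i is on the sink side
        cap[i][t] = c2                       # cut when module i is on the source side
    for (a, b), penalty in p.items():
        cap[a][b] += penalty                 # one edge per direction
        cap[b][a] += penalty

    def bfs_parents():
        # Breadth-first search over edges with remaining capacity.
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if cap[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    queue.append(v)
        return parent

    total = 0
    while True:
        parent = bfs_parents()
        if parent[t] == -1:                  # no augmenting path left
            break
        flow, v = float("inf"), t
        while v != s:                        # bottleneck along the path
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:                        # push flow, update residual capacities
            cap[parent[v]][v] -= flow
            cap[v][parent[v]] += flow
            v = parent[v]
        total += flow

    # Vertices still reachable from the source form the source side of a min cut.
    company = [2 if parent[i] != -1 else 1 for i in range(n)]
    return total, company

costs = [(1, 10), (10, 1), (5, 5)]
penalties = {(0, 1): 2, (1, 2): 3, (0, 2): 1}
total, company = min_cost_assignment(costs, penalties)
```

For this small made-up instance, module 0 goes to company 1 and modules 1 and 2 go to company 2, for a total of 10 (1 + 1 + 5 in build costs plus penalties 2 and 1).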


How to make simulated electric components behave nicely?

I'm making a simple electric circuit simulator. It will (at least initially) only feature batteries, wires and resistors in series and parallel. However, I'm at a loss how best to simulate said circuit in a good way.
Specifically, I will have batteries and resistors with two contact points each, and wires that go between two contact points. I assume that each component will have a field for its resistance, the current through it and the voltage across it (current and voltage will, of course, be signed). Each component is given a resistance, and the batteries are given a voltage. The goal of the simulation is to assign correct values to all the other fields in real time as the player connects and disconnects components and wires.
These are the requirements:
It must be correct, including Ohm's and Kirchhoff's laws (I'm modeling real world circuits, and there is little point if the model does something completely different)
It must be numerically stable (we can't have uncontrolled oscillations or something just because two neighbouring resistors can't make up their minds together)
It should stabilize relatively quickly for, let's say, fewer than 30 components (having to wait a few seconds before the values are correct doesn't really satisfy "real time", but I really don't plan on using it for more than 10 or maybe 20 components)
The optimal formulation for me (how I envision this in my head) would be if I could assign a script to each component that took care of that component only, possibly by communicating field values with neighbouring components, and each component script works in parallel and adjusts as is needed
I only see problems here and no solutions. The biggest problem, I think, is Kirchhoff's voltage law (going around any sub-circuit, the voltages across all components, including signs, add up to 0), because that's a global law (it says something about a whole circuit and not just a single component / connection point). There is a mathematical reformulation saying that there exists a potential function on the points in the circuit (for instance, the voltage measured against the + pole of the battery), which is a bit more local, but I still don't see how to let a component know how much the voltage / potential drops across it.
Kirchhoff's current law (the net current flow into an intersection is 0) might also be trouble. It seems to force me to make intersections into separate objects to enforce it. I originally thought that I could just let each component have two lists (a left list and a right list) containing every other component that is connected to it at that point, but that might not make KCL easily enforceable.
I know there are circuit simulators out there, and they must have solved this exact problem somehow. I just can't find an explanation because if I try googling it, I only find the already made simulators and no explanations anywhere.
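For what it's worth, the potential-function formulation mentioned above can be turned into a purely local update, which is close to the "one script per component" idea. The sketch below is only an illustration under my own assumptions: intersections are modelled as named nodes, a single ideal battery pins the potential of its two terminals (the `fixed` dict), wires are folded into nodes, and every other node is repeatedly set to the conductance-weighted average of its neighbours, which is exactly KCL at that node. The names `solve_potentials` and `current` are hypothetical, and several batteries in series would need a more general method.

```python
def solve_potentials(resistors, fixed, tol=1e-12, max_sweeps=10_000):
    """resistors: list of (node_a, node_b, ohms).
    fixed: {node: volts} for nodes whose potential an ideal battery pins.
    Returns {node: potential} for every node touched by a resistor."""
    neighbours = {}
    for a, b, ohms in resistors:
        g = 1.0 / ohms                         # conductance of this resistor
        neighbours.setdefault(a, []).append((b, g))
        neighbours.setdefault(b, []).append((a, g))

    v = {node: fixed.get(node, 0.0) for node in neighbours}
    free = [node for node in neighbours if node not in fixed]

    for _ in range(max_sweeps):
        change = 0.0
        for node in free:
            # KCL at this node: sum of g * (v[nb] - v[node]) over neighbours is 0,
            # so v[node] is the conductance-weighted average of its neighbours.
            weighted = sum(g * v[nb] for nb, g in neighbours[node])
            total_g = sum(g for _, g in neighbours[node])
            new = weighted / total_g
            change = max(change, abs(new - v[node]))
            v[node] = new
        if change < tol:                       # potentials have settled
            break
    return v

def current(v, a, b, ohms):
    """Signed current from node a to node b through a resistor (Ohm's law)."""
    return (v[a] - v[b]) / ohms

# A 9 V battery across two resistors in series (a voltage divider).
circuit = [("top", "mid", 1000.0), ("mid", "gnd", 2000.0)]
potentials = solve_potentials(circuit, fixed={"top": 9.0, "gnd": 0.0})
```

Because voltages are derived from one potential per node, Kirchhoff's voltage law holds automatically around every loop; only KCL has to be enforced, and that is what the averaging step does.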

Kademlia XOR metric properties purposes

In the Kademlia paper by Petar Maymounkov and David Mazières, it is said that the XOR distance is a valid non-Euclidean metric, with limited explanation as to why each of the properties of a valid metric is necessary or interesting, namely:
d(x,x) = 0
d(x,y) > 0, if x != y
forall x,y : d(x,y) = d(y,x) -- symmetry
d(x,z) <= d(x,y) + d(y,z) -- triangle inequality
Why is it important for a metric to have these properties in general? Why is each of these properties necessary in the context of routing queries in the Kademlia Distributed Hash Table implementation?
In addition, the paper mentions that unidirectionality (for a given x and a distance l, there exists only a single y for which d(x,y) = l) guarantees that all queries will converge along the same path. Why is that so?
I can only speak for Kademlia, maybe someone else can provide a more general answer. In the meantime...
d(x,x) = 0
d(x,y) > 0, if x != y
These two points together effectively mean that the closest point to x is x itself; every other point is further away. (This may seem intuitive, but other aspects of the XOR metric aren't.)
In the context of Kademlia, this is important since a lookup for node with ID x will yield that node as the closest. It would be awkward if that were not the case, since a search converging towards x might not find node x.
forall x,y : d(x,y) = d(y,x)
The structure of the Kademlia routing table is such that nodes maintain detailed knowledge of the address space closest to them, and exponentially decreasing knowledge of more distant address space. In short, a node tries to keep all the k closest contacts it hears about.
The symmetry is useful since it means that each of these closest contacts will be maintaining detailed knowledge of a similar part of the address space, rather than a remote part.
If we didn't have this property, it might be helpful to think of the search as more like the hands of a clock moving in one direction round a clockface. The node at 1 o'clock (Node1) is close to Node2 at 2 o'clock (30°), but Node2 is far from Node1 (330°). So imagine we're looking for the two closest to 3 o'clock (i.e. Node1 and Node2). If the search reaches Node2, it won't know about Node1 since it's far away. The whole lookup and topology would have to change.
d(x,z) <= d(x,y) + d(y,z)
If this weren't the case, it would be impossible for a node to know which contacts from its routing table to return during a lookup. It would know the k closest to the target, but there would be no guarantee that one of the other more distant contacts wouldn't yield a shorter overall path.
Because of this property and unidirectionality, different searches starting from vastly separated points will tend to converge down the same path.
The unidirectionality means that no two nodes can have the same distance from a given point. If that weren't the case, then the target point could be encircled by a bunch of nodes all the same distance from it. Then various different searches would be free to pick any of those to pass through. However, unidirectionality guarantees that exactly one of this bunch will be the closest, and any search which chooses among this group will always select the same one.
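If it helps, the properties discussed above can be checked by brute force on a small ID space. This is just a sketch: the 4-bit ID width and the helper names `d`, `check_xor_metric` and `closest` are assumptions made for the example, not anything from the paper.

```python
from itertools import product

def d(x, y):
    """XOR distance, interpreted as an integer."""
    return x ^ y

def check_xor_metric(bits):
    """Exhaustively verify the metric properties on all IDs of the given width."""
    ids = range(2 ** bits)
    identity = all(d(x, x) == 0 for x in ids)
    positive = all(d(x, y) > 0 for x, y in product(ids, ids) if x != y)
    symmetric = all(d(x, y) == d(y, x) for x, y in product(ids, ids))
    triangle = all(d(x, z) <= d(x, y) + d(y, z)
                   for x, y, z in product(ids, ids, ids))
    # Unidirectionality: for a fixed x, every y lies at a different distance,
    # so no two nodes can tie for "closest to x".
    unidirectional = all(len({d(x, y) for y in ids}) == len(ids) for x in ids)
    return identity and positive and symmetric and triangle and unidirectional

def closest(target, nodes, k=1):
    """The k known nodes nearest to target under the XOR metric."""
    return sorted(nodes, key=lambda n: d(n, target))[:k]

all_hold = check_xor_metric(4)
nearest = closest(0b1010, [0b0001, 0b1000, 0b1110])
```

Since distances to a target never tie, `closest` returns the same answer no matter which node evaluates it, which is the convergence argument in miniature.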
I've been bashing my head on this for quite some time: how can the XOR (as in the number of differing bits, a proper Hamming distance) be the basis of a total order?
Well, it can't. Such a metric on its own is not enough for a comparable relationship; all it can do is dump nodes in circles around a point.
Then I read the paper more closely and noticed that it says "the XOR as an integer value", and it dawned on me: the crux is not the "XOR metric" but the length of the common prefix of the IDs (which the XOR is a mechanism for deriving).
Take two nodes with the same Hamming distance from "self" and compare the lengths of the prefixes they share with "self": the one with the shortest common prefix is the furthest node.
The paper uses "XOR distance metric", but it really should read "ID prefix length total ordering".
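A small sketch of that observation, assuming 8-bit IDs and helper names of my own choosing: two IDs with the same Hamming distance from "self" can be very different XOR distances away, and the ordering follows the common prefix length.

```python
BITS = 8  # assumed ID width for the example

def xor_distance(a, b):
    """XOR of the two IDs, read as an integer."""
    return a ^ b

def hamming(a, b):
    """Number of differing bits."""
    return bin(a ^ b).count("1")

def common_prefix_len(a, b, bits=BITS):
    """Number of leading bits the two IDs share."""
    return bits - (a ^ b).bit_length()

self_id = 0b00000000
near = 0b00000011   # differs from self only in the two lowest bits
far = 0b11000000    # differs from self in the two highest bits

same_hamming = hamming(self_id, near) == hamming(self_id, far)
prefix_near = common_prefix_len(self_id, near)
prefix_far = common_prefix_len(self_id, far)
near_is_closer = xor_distance(self_id, near) < xor_distance(self_id, far)
```

A longer common prefix always means a smaller XOR integer. IDs with the same prefix length are still told apart by their remaining bits, which is where the strict total order comes from.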
I think this may explain it a wee bit, let me know: http://metaquestions.me/2014/08/01/shortest-distance-between-two-points-is-not-always-a-straight-line/
Basically, if each hop resolved only one bit at a time in a fully populated network (an extreme case), each hop would have twice the knowledge of the previous hop. As you converge, the knowledge grows until you get to the closest nodes, whose knowledge is the most complete in the network.

Online k-means clustering

Is there an online version of the k-means clustering algorithm?
By online I mean that every data point is processed serially, one at a time as it enters the system, hence saving computing time when used in real time.
I have written one myself with good results, but I would really prefer to have something "standardized" to refer to, since it is to be used in my master's thesis.
Also, does anyone have advice for other online clustering algorithms?
(lmgtfy failed ;))
Yes there is. Google failed to find it because it's more commonly known as "sequential k-means".
You can find two pseudo-code implementations of sequential K-means in this section of some Princeton CS class notes by Richard Duda. I've reproduced one of the two implementations below:
Make initial guesses for the means m1, m2, ..., mk
Set the counts n1, n2, ..., nk to zero
Until interrupted
    Acquire the next example, x
    If mi is closest to x
        Increment ni
        Replace mi by mi + (1/ni)*(x - mi)
    end_if
end_until
The beautiful thing about it is that you only need to remember the mean of each cluster and the count of the number of data points assigned to the cluster. Once you update those two variables, you can throw away the data point.
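Here is a rough Python rendering of that pseudo-code, in case it is useful. The class name `SequentialKMeans` and the use of squared Euclidean distance to decide which mean is "closest" are my assumptions; the notes only say "closest".

```python
class SequentialKMeans:
    """Keeps only a running mean and a count per cluster."""

    def __init__(self, initial_means):
        self.means = [list(m) for m in initial_means]   # m1, ..., mk
        self.counts = [0] * len(self.means)             # n1, ..., nk

    def update(self, x):
        """Process one example and return the index of the cluster it joined."""
        # Find the mean closest to x (squared Euclidean distance).
        i = min(
            range(len(self.means)),
            key=lambda j: sum((m - xi) ** 2 for m, xi in zip(self.means[j], x)),
        )
        self.counts[i] += 1
        n = self.counts[i]
        # mi <- mi + (1/ni) * (x - mi); afterwards x can be discarded.
        self.means[i] = [m + (xi - m) / n for m, xi in zip(self.means[i], x)]
        return i

model = SequentialKMeans([(0, 0), (10, 10)])
for point in [(1, 1), (9, 9), (3, 1), (11, 9)]:
    model.update(point)
```

Note that because the counts start at zero, the first example assigned to a cluster replaces the initial guess for that mean entirely (the update uses 1/1).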
I'm not sure where you would be able to find a citation for it. I would start looking in Duda's classic text Pattern Classification and Scene Analysis or the newer edition Pattern Classification. If it's not there, you could try Chris Bishop's newest book or Daphne Koller and Nir Friedman's recent text.

Graph/tree representation and recursion

I'm currently writing an optimization algorithm in MATLAB, at which I completely suck, therefore I could really use your help. I'm really struggling to find a good way of representing a graph (or well more like a tree with several roots) which would look more or less like this:
(Image: http://img100.imageshack.us/img100/3232/graphe.png)
Basically 11/12/13 are our roots (stage 0), 2x is stage1, 3x stage2 and 4x stage3. As you can see nodes from stageX are only connected to several nodes from stage(X+1) (so they don't have to be connected to all of them).
Important: each node has to hold several values (at least 3-4); one will be its number, and at least two others will be variables used to optimize the decisions.
I do have a simple representation using matrices but it's really hard to maintain, so I was wondering is there a good way to do it?
Second question: when I'm done with that representation, I need to calculate how good each route (from the roots to the end) is (say I need to decide whether 11-21-31-41 is the best or 11-21-31-42 is better). To do that I will be using the variables that each node holds. But the values will have to be calculated recursively: say we start at 11, but to calculate how good 11-21-31-41 is we first need to go to 41, do some calculations, go to 31, do some calculations, go to 21, do some calculations, and only then can we calculate 11 using all the previous calculations. Same with 11-21-31-42 (we start with 42, then 31->21->11). I need to check all the possible routes that way. And here's the question: how do I do it? Maybe a BFS/DFS? But I'm not quite sure how to store all the results.
Those are some lengthy questions, but I hope I'm not asking you to do my homework (I have all the algorithms, it's just that I'm not really good at MATLAB and my teacher wouldn't let me do it in Java).
Granted, it may not be the most efficient solution, but if you have access to Matlab 2008+, you can define a node class to represent your graph.
The Matlab documentation has a nice example on linked lists, which you can use as a template.
Basically, a node would have a property 'linksTo', which holds the indices of the nodes it links to, and a method to calculate the cost of each of the links (possibly with some additional properties that describe each link). Then, all you need is a function that moves down each link, and brings the cost(s) with it when it moves back up.
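The idea is language-independent, so here is a short sketch in Python rather than MATLAB. Everything specific in it is an assumption for illustration: the `Node` fields, the rule that a route's score is the sum of node values with lower being better, and the sample graph. The recursion goes down each link and carries the result back up, caching it so shared sub-routes are computed once.

```python
from dataclasses import dataclass, field
from functools import lru_cache

@dataclass
class Node:
    number: int
    value: int                                     # hypothetical per-node variable
    links_to: list = field(default_factory=list)   # numbers of next-stage nodes

def best_routes(nodes, roots):
    """Return {root: (best cost, route)} where cost is the sum of node values."""
    by_number = {n.number: n for n in nodes}

    @lru_cache(maxsize=None)
    def best_from(number):
        node = by_number[number]
        if not node.links_to:                      # last stage: nothing below
            return node.value, (number,)
        # Evaluate every outgoing link first, then combine on the way back up.
        cost, route = min(best_from(nxt) for nxt in node.links_to)
        return node.value + cost, (number,) + route

    return {root: best_from(root) for root in roots}

graph = [
    Node(11, 1, [21, 22]), Node(12, 2, [22]),
    Node(21, 5, [31]), Node(22, 1, [31, 32]),
    Node(31, 2, [41, 42]), Node(32, 4, [42]),
    Node(41, 3), Node(42, 1),
]
results = best_routes(graph, roots=[11, 12])
```

The cache is what stores "all the results": each node's best continuation is kept once, so no route has to be re-evaluated from scratch.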

Matrix-Algebra Design Decomposition

I am looking at refactoring some very complex code which is a subsystem of a project I have at work. My examination of this code shows that it is incredibly complex and contains a lot of inputs, intermediate values and outputs depending on some core business logic.
I want to redesign this code to be easier to maintain as well as to execute a hell of a lot faster, so to start off with I have been trying to look at each of the parameters and their dependencies on each other. This has led to quite a large and tangled graph, and I would like a mechanism for simplifying this graph.
A while back I came across a technique in a book about SOA design called "Matrix Design Decomposition" which uses a matrix of outputs and the dependencies they have on the inputs, applies some form of matrix algebra and can generate Business Process diagrams for those dependencies.
I know there is a web tool available at http://www.designdecomposition.com/ however it is limited in the number of input/output dependencies you can have. I have tried looking around for the algorithmic source for this tool (so I could attempt to implement it myself without the size limitation), however I have had no luck.
Does anybody know a similar technique that I could use? Currently I am even considering taking the dependency matrix and applying some Genetic Algorithms to see if evolution can come up with a simpler workflow...
Cheers,
Aidos
EDIT:
I will explain the motivation:
The original code was written for a system which computed all of the values (about 60) every time the user performed an operation (adding, removing or modifying certain properties of an item). This code was written over ten years ago and is definitely showing signs of age: others have added more complex calculations into the system, and now we are getting completely unreasonable performance (up to 2 minutes before control is returned to the user). It has been decided to detach the calculations from the user actions and provide a button to "recalculate" the values.
My problem arises because there are so many calculations that are going on and they are based on the assumption that all of the required data will be available for their computation - now when I try to re-implement the calculations I keep encountering problems because I haven't got the result for a different calculation that this calculation relies on.
This is where I want to use the matrix-decomposition approach. The MD approach allows me to specify all of the inputs and outputs and gives me the "simplest" workflow that I can use for generating all of the outputs.
I can then use this "workflow" to know the precedence of the calculations I need to perform to get the same result without generating any exceptions. It also shows me which parts of the calculation system I can parallelise and where the fork and join points will be (I won't worry about that part just yet). At the moment all I have is an insanely large matrix with lots of dependencies showing in it, with no idea where to start.
I will elaborate from my comment a little more:
I don't want to use the solution from the EA process in the actual program. I want to take the dependency matrix and decompose it into modules that I will then code manually - this is purely a design aid - I am just interested in what the inputs/outputs for these modules will be. Basically a representation of the complex interdependencies between these calculations, as well as some idea of precedence.
Say A requires B and C, D requires A and E, and F requires B, A and E. I want to effectively partition the problem space from a complex set of dependencies into a "workflow" that I can examine to get a better understanding. Once I have this understanding I can come up with a better design / implementation that is still human readable; for the example, I know I need to calculate A, then D, then F (once B, C and E are available).
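For the A/D/F example, the precedence and the fork/join points can be derived with a plain topological sort of the dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the function name `calculation_stages` and the grouping into stages are my own framing, and this is not the algorithm behind the linked tool.

```python
from graphlib import TopologicalSorter

# Each key requires the values listed in its set.
requires = {
    "A": {"B", "C"},
    "D": {"A", "E"},
    "F": {"B", "A", "E"},
}

def calculation_stages(requires):
    """Group calculations into stages; everything within one stage
    depends only on earlier stages, so it could run in parallel."""
    sorter = TopologicalSorter(requires)
    sorter.prepare()                        # raises CycleError on circular dependencies
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all nodes whose requirements are done
        stages.append(ready)
        sorter.done(*ready)
    return stages

stages = calculation_stages(requires)
```

Here the stages come out as B, C, E first, then A, then D and F together; the boundary between stages is where a join would go.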
--
I know this seems kind of strange. If you take a look at the website I linked to before, the matrix-based decomposition there should give you some understanding of what I am thinking of...
kquinn, If it's the piece of code I think he's referring to (I used to work there), it's already a black box solution that no human can understand as is. He's not looking to make it more complicated, less in fact. What he's trying to achieve is a whole heap of interlinked calculations.
What currently happens, is that whenever anything changes, it's an avalanche of events which cause a whole bunch of calculations to fire off, which in turn causes a whole bunch more events which continues on until finally it reaches a state of equilibrium.
What I assume he wants to do is find the dependencies for those outlying calculations and work in from there so they can be rewritten, and find a way to stop the calculations from happening for the sake of it, rather than because they need to.
I can't offer much advice in regards to simplifying the graph, as unfortunately it's not something I have much experience in. That said, I would start looking for those outlying calculations which have no dependencies, and just traverse the graph from there. Start building up a new framework that includes the core business logic of each calculation in the simplest possible way, and refactor the crap out of it along the way.
If this is, as you say, "core business logic", then you really don't want to be screwing around with fancy decompositions and evolutionary algorithms that produce a "black box" solution that no one in the world understands or is capable of modifying. I would be very surprised if any of these techniques actually yielded any useful result; the human brain is still incomprehensibly more capable than any machine at untangling complicated relationships.
What you want to do is traditional refactoring: clean up the individual procedures, streamlining them and merging them where possible. Your goal is to make the code clear, so your successor doesn't have to go through the same process.
What language are you using?
Your problem should be pretty easy to model using Java Executors and Future<> tasks, but a similar framework is perhaps available on your chosen platform as well?
Also, if I understand this correctly, you want to generate a critical path for a large set of interdependent calculations -- is that something done dynamically, or do you "just" need a static analysis?
Regarding an algorithmic solution: pick up the closest copy of your numerical analysis textbook and refresh your memory on singular value decompositions and LU factorization; I'm guessing off the top of my head that this is what lies behind the tool you linked to.
EDIT: Since you're using Java, I'll give a brief outline of a suggestion:
-> Use a thread pool executor to parallelize all calculations easily
-> Solve interdependencies with an object map of Future<> or FutureTask<>s, i.e. if your variables are A, B and C, where A = B + C, do something like this:
// Shared registry of tasks, keyed by variable name.
static final Map<String, FutureTask<Integer>> mapping = ...
static final ThreadPoolExecutor threadpool = ...
// A = B + C: wait for the results of B and C, then combine them.
FutureTask<Integer> a = new FutureTask<Integer>(new Callable<Integer>() {
    public Integer call() throws Exception {
        Integer b = mapping.get("B").get();
        Integer c = mapping.get("C").get();
        return b + c;
    }
});
FutureTask<Integer> b = new FutureTask<Integer>(...);
FutureTask<Integer> c = new FutureTask<Integer>(...);
// Register each task under its own name, then hand them all to the pool.
mapping.put("A", a);
mapping.put("B", b);
mapping.put("C", c);
for (FutureTask<Integer> task : mapping.values())
    threadpool.execute(task);
Now, if I'm not totally off (and I may very well be, it was a while since I worked in Java), you should be able to solve the apparent deadlock problem by tuning the thread pool size, or use a growing thread pool. (You still have to make sure that there are no interdependent tasks though, such as if A = B + C, and B = A + 1...)
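One way to sidestep the deadlock question, sketched here in Python with `concurrent.futures` rather than Java, is to submit the tasks in dependency order, so that a task only ever waits on tasks that were queued before it. The function name `evaluate` and the `formulas` layout (name mapped to a pair of dependency names and a function) are assumptions for the example; every variable, including the leaf inputs, needs an entry.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

def evaluate(formulas, max_workers=2):
    """formulas: name -> (tuple of dependency names, function of their values).
    Returns {name: computed value}."""
    dependencies = {name: deps for name, (deps, _) in formulas.items()}
    order = TopologicalSorter(dependencies).static_order()  # dependencies first
    futures = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for name in order:
            deps, fn = formulas[name]

            def task(deps=deps, fn=fn):
                # Every dependency was submitted earlier, so its future exists.
                return fn(*(futures[d].result() for d in deps))

            futures[name] = pool.submit(task)
        return {name: f.result() for name, f in futures.items()}

formulas = {
    "B": ((), lambda: 2),
    "C": ((), lambda: 3),
    "A": (("B", "C"), lambda b, c: b + c),   # A = B + C
}
values = evaluate(formulas)
```

A circular definition such as A = B + C with B = A + 1 is rejected up front: `static_order()` raises `graphlib.CycleError` instead of letting the pool hang.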
If the black-box is linear you can discover all the coefficients by simply concatenating many vectors of input and many vectors of output.
You have inputs x[i] and outputs y[i]; you then create a matrix Y whose columns are y[1], y[2], ..., y[n], and a matrix X whose columns are x[1], x[2], ..., x[n]. There will be a transformation Y = T * X, from which you can determine T = Y * inverse(X). (This requires X to be square and invertible, i.e. n linearly independent input vectors of dimension n.)
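As a tiny worked example of T = Y * inverse(X), here is a two-dimensional case in plain Python. The `black_box` function is a made-up stand-in for the real system, and the helper names are hypothetical; a real system with more inputs would want a proper linear algebra library instead of the hand-written 2x2 inverse.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("input vectors are linearly dependent")
    return [[d / det, -b / det], [-c / det, a / det]]

def columns_to_matrix(vectors):
    """Place each vector as a column of the resulting matrix."""
    return [list(row) for row in zip(*vectors)]

def black_box(x):
    # Hypothetical linear system whose coefficients we pretend not to know.
    return [2 * x[0] + 1 * x[1], -1 * x[0] + 3 * x[1]]

inputs = [[1, 0], [1, 1]]                     # two independent probe vectors
outputs = [black_box(x) for x in inputs]

X = columns_to_matrix(inputs)
Y = columns_to_matrix(outputs)
T = matmul(Y, inverse_2x2(X))                 # T = Y * inverse(X)
```

The recovered T matches the coefficients hidden inside `black_box`, which is all the method promises for a genuinely linear system.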
But since you said it is complex, I bet it is not linear. In that case, if you still want a general framework, you can use a factor graph:
https://ieeexplore.ieee.org/document/910572
I would be curious if you can do this.
What I think is easier is to understand the code and rewrite it using the best practices.