For each element A[i] of array A, find the closest j such that A[j] > A[i] - distance

Given : An array A[1..n] of real numbers.
Goal : An array D[1..n] such that
D[i] = min{ distance(i,j) : A[j] > A[i] }
or some default value (like 0) when there is no higher-valued element. I would really like to use Euclidean distance here.
Example :
A = [-1.35, 3.03, 0.73, -0.06, 0.71, -0.21, -0.12, 1.49, 1.41, 1.42]
D = [1, 0, 1, 1, 2, 1, 1, 6, 1, 2]
Is there any way to beat the obvious O(n^2) solution? The only progress I've made so far is that D[i] = 1 whenever A[i] is not a local maxima. I've been thinking a lot and have come up with NOTHING. I hope to eventually extend this to 2D (so A and D are matrices).

So I've puzzled on this a bit but I haven't come up with anything better that works. A few ideas:
Augment the array with extra information that can be gained in O(n) time or better. e.g., add indices, difference between neighbors, etc.
Would sorting (O(n(log n)) help in any way?
Seems like dynamic programming could be helpful here, if you can figure out a way to solve for each element based on the solution for its neighbors (augmenting the answers with information like the j for each A[i] instead of just the distance maybe).

Sort the array from highest to lowest element. If I understand your problem correctly, this gives you the answer immediately, since the closest bigger element to any element in the original list is the one before it. This way you don't even need to create the D[] array, since computation of its contents can be done using the array A[] exclusively. The first element in the sorted A[] array does not have a bigger friend so the answer for it would be your default valye ( 0 perhaps?). Extending the algorithm for matrices might be easy (depends on how you "look" at the matrix) - just use a mapping function which sort of transofrms the matrix into a 1D array.

Related

Mean of values before and after a specific element

I have an array of 1 x 400, where all element values are above 1500. However, I have some elements that have values<50 which are wrong measures and I would like to have the mean of the elements before and after the wrong measured data points and replace it in the main array.
For instance, element number 17 is below 50 so I want to take the mean of elements 16 and 18 and replace element 17 with the new mean.
Can someone help me, please? many thanks in advance.
No language is specified in the question, but for Python you could work with List Comprehension:
# array with 400 values, some of which are incorrect
arr = [...]
arr = [arr[i] if arr[i] >= 50 else (arr[i-1]+arr[i+1])/2 for i in range(len(arr))]
That is, if arr[i] is less than 50, it'll be replaced by the average value of the element before and after it. There are two issues with this approach.
If i is the first or last element, then one of the two values will be undefined, and no mean can be obtained. This can be fixed by just using the value of the available neighbour, as specified below
If two values in a row are very low, the leftmost one will use the rightmost one to calculate its value, which will result in a very low value. This is a problem that may not occur for you in practice, but it is an inherent result of the way you wish to recalculate values, and you might want to keep it in mind.
Improved version, keeping in mind the edge cases:
# don't alter the first and last item, even if they're low
arr = [arr[i] if arr[i] >= 50 or i == 0 or i+1 == len(arr) else (arr[i-1]+arr[i+1])/2 for i in range(len(arr))]
# replace the first and last element if needed
if arr[0] < 50:
arr[0] = arr[1]
if arr[len(arr)-1] < 50:
arr[len(arr)-1] = arr[len(arr)-2]
I hope this answer was useful for you, even if you intend to use another language or framework than python.

Most efficient way to store dictionaries in Matlab

I have a set of IDs associated with costs which is just a double value. IDs are integers and unique. Two IDs may have same costs. I stored them as:-
a=containers.Map('KeyType','uint32','ValueType','double');
a(1)=7.3
a(2)=8.4
a(3)=7.3
Now i want to find the minimum cost.
b=[];
c=values(a);
b=[b,c{:}];
cost_min=min(b);
Now i want to find all IDs associated i.e. 1 and 3 with the minimum cost i.e. 7.3. I can collect all the keys into an array and then do a for loop over this array. Is there a better way to do this entire thing in Matlab so that for loops are not required?
sparse matrix can work as a hashmap, just do this:
a= sparse(1:3,1,[7.3 8.4 7.3])
find(a == min(nonzeros(a))
There are methods which can be used on maps for this kind of operations
http://se.mathworks.com/help/matlab/ref/containers.map-class.html
The approach finding minimum values and minimum keys can be done something like this,
a=containers.Map('KeyType','uint32','ValueType','double');
a(1)=7.3;
a(3)=8.4;
a(4)=7.3;
minval = inf;
minkeys = -1;
for k = keys(a)
val = a.values(k);
val = val{1};
if (val < minval(1))
minkeys = k;
minval = val;
elseif (val == minval(1))
minkeys = [minkeys,k];
end
end
disp(minval);
disp(minkeys);
This is not efficient though and value search is clumsy for maps. This is not what they are intended for. Maps is supposed to do efficient key lookup. In case you are going to do a lot of lookups and this is what takes time, then use a map. If you need to do a lot of value searches, I would recommend that you use a matrix (or two arrays) for this instead.
idx = [1;3;4];
val = [7.3,8.3,7.3];
minval = min(val);
minidx = idx(val==minval);
disp(minval);
disp(minidx);
There is also another post with an example where it is shown how a sparse matrix can be used as a hashmap. Let the index become the key. This will take about 3 times the memory as all non-zero elements an ordinary array, but a map uses more memory than an array as well.

z3py: How to implement a counter in z3?

I want to design logics similar to a counter in Z3py.
If writing python script, we usually define a variable "counter" and then keep incrementing it when necessary. However, in Z3, there is no variant. Therefore, instead of defining an variant, I define a trace of that variant.
This is a sample code. Suppose there is an array "myArray" of size 5, and the elements in the array are 1 or 2. I want to assert a constraint that there must be two '2's in "myArray"
from z3 import *
s = Solver()
myArray = IntVector('myArray',5)
for i in range(5):
s.add(Or(myArray[i]==1,myArray[i]==2))
counterTrace = IntVector('counterTrace',6)
s.add(counterTrace[0]==0)
for i in range(5):
s.add(If(myArray[i]==2,counterTrace[i+1]==counterTrace[i]+1,counterTrace[i+1]==counterTrace[i]))
s.add(counterTrace[5]==2)
print s.check()
print s.model()
My question is that is this an efficient way of implementing the concept of counter in Z3? In my real problem, which is more complicated, this is really inefficient.
You can do this but it is much easier to create the sum over myArray[i] == 2 ? 1 : 0. That way you don't need to assert anything and you are dealing with normal expressions.

Find value in vector "p" that corresponds to maximum value in vector "r = f(p)"

As simple as in title. I have nx1 sized vector p. I'm interested in the maximum value of r = p/foo - floor(p/foo), with foo being a scalar, so I just call:
max_value = max(p/foo-floor(p/foo))
How can I get which value of p gave out max_value?
I thought about calling:
[max_value, max_index] = max(p/foo-floor(p/foo))
but soon I realised that max_index is pretty useless. I'm sorry asking this, real beginner here.
Having dropped the issue to pieces, I realized there's no unique corrispondence between values p and values in my related vector p/foo-floor(p/foo), so there's a logical issue rather than a language one.
However, given my input data, I know that the solution is unique. How can I fix this?
I ended up doing:
result = p(p/foo-floor(p/foo) == max(p/foo-floor(p/foo)))
Looks terrible, so if you know any other way...
Once you have the index, use it:
result = p(max_index)
You can create a new vector with your lets say "transformed" values:
p2 = (p/foo-floor(p/foo))
and then just use find to find the max values on p2:
max_index = find(p2 == max(p2))
that will return the index or indices of p2 with the max value of that operation, and finally just lookup the original value in p
p(max_index)
in 1 line, this is:
p(find((p/foo-floor(p/foo) == max((p/foo-floor(p/foo))))))
which is basically the same thing you did in the end :)

GCD test - to test dependency between loop statements

I understand how the GCD works on a trivial example as below:
for(i=1; i<=100; i++)
{
X[2*i+3] = X[2*i] + 50;
}
we first transform it into the following form:
X[a*i + b] and X[c*i + d]
a=2, b=3, c=2, d=0 and GCD(a,c)=2 and (d-b) is -3. Since 2 does not divide -3, no dependence is possible.
But how can we do this GCD test on a doubly nested loop?
For example:
for (i=0; i<10; i++){
for (j=0; j<10; j++){
A[1+2*i + 20*j] = A[2+20*i + 2*j);
}
}
While the subscripts can be delinearized, the GCD test is simple to apply directly. In your example, the subscript pair is [1+2*i + 20*j] and [2+20*i + 2*j], so we're looking for an integer solution to the equation
1 + 2*i + 20*j = 2 + 20*i' + 2*j'
Rearranging, we get
2*i - 20*i' + 20*j - 2*j = 1
Compute the GCD of all the coefficients, 2, -20, 20, and -2, and see if it divides the constant. In this case, the GCD is 2. Since 2 doesn't divide 1, there's no dependence.
The "easy" way to apply GCD in the nested loop case is to apply it only in cases where the arrays themselves are multidemsional; i.e., the original source code uses multiple subscripts rather than already linearized expressions. Of course if you can "back transform" these linearized subscripts then you'll have the equivalent.
Once you've cast the problem as a multidemsional problem then you may simply apply the GCD test "dimension by dimension". If any dimension shows no dependence then you can stop and declare there is no dependence for the entire multidemsional subscripting sequence.
The key of course is that casting as a multidimensional indexing problem gives you the nice property that there's a one-to-one mapping between individual index values and the corresponding index expression tuples. Without this the problem is harder.
This is the approach I took in the ASC Fortran vectorizing compiler back in the 70's and it worked pretty well, particularly used in conjunction with directional subscript analysis for the non disjoint case. The GCD test by itself is really not sufficient, but it does give you a relatively inexpensive way of making an early decision in your analysis in those cases where you then can avoid the more expensive dependence analysis.