spatial partitioning algorithms for 1D dataset? - cluster-analysis

This is a grid that represents a geographical area divided into 10,000 squares, where each square is 55,225 square meters.
The dataset has the traffic volume per square, ranging from 100 to 1000.
For example:
square 1 - 100
square 2 - 500
...
square 10,000 - 800
Now I want to partition this area in such a way that each partition may cover a different area but will carry a similar amount of traffic; the standard deviation of traffic among partitions should be minimal. Any suggestions for a spatial partitioning algorithm?

There are a few decisions you have to make in order to inform your procedure. The first question that comes to mind is whether the number of partitions is fixed. The second is whether there are any geometric restrictions on a group, i.e. must it be contiguous, or is any particular shape ideal? The third is how good is good enough: there is often a huge difference in the run time of an algorithm that provides a reasonable answer (perhaps a greedy algorithm) and an algorithm that provides an optimal answer (perhaps an exhaustive or "brute force" approach). You will get a minimum standard deviation by grouping any 2 sectors that have the same volume, as your groups will each have 0 standard deviation. Anyway, this sounds a lot like an expanded bin packing problem, and you should probably start your literature review there.
You need to pack your bins in order...
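For the 1D case in your title, here is a minimal untested sketch of that in-order packing (my own illustration, with a hypothetical partition_1d helper: walk the squares in order and cut whenever the running total reaches the average target total/k):

def partition_1d(volumes, k):
    # Greedy in-order cut: close a partition once its running total
    # reaches the overall average target (total / k).
    target = sum(volumes) / k
    parts, current, running = [], [], 0
    for v in volumes:
        current.append(v)
        running += v
        if running >= target and len(parts) < k - 1:
            parts.append(current)
            current, running = [], 0
    parts.append(current)
    return parts

# e.g. partition_1d([100, 500, 300, 800, 200, 400], 3)
# -> [[100, 500, 300], [800], [200, 400]]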
Here I selected center points for my circles based on the highest traffic flow and filled them in from there.
class trafficNode:
    def __init__(self, v, i):
        self.cluster = None
        self.value = v
        self.index = i
        self.occupied = False

    def occupy(self):
        self.occupied = True

def tryAdd(xList, mList, irow, icol):
    # Append the node at (irow, icol) to xList if it exists, is not already
    # in the list, and has not been claimed by another cluster.
    if irow < 0 or icol < 0:  # avoid Python's negative-index wraparound
        return xList
    try:
        node = mList[irow][icol]
        if node not in xList and not node.occupied:
            xList.append(node)
    except IndexError:
        pass
    return xList

class cluster:
    def __init__(self):
        self.nodes = []

    def getTotal(self):
        return sum(k.value for k in self.nodes)

    def addNode(self, n):
        self.nodes.append(n)

    def getNeighbors(self, m, r=0):
        # Collect the unoccupied orthogonal neighbours of every node in the
        # cluster; when r != 0, include the diagonal neighbours as well.
        neighbors = []
        for k in self.nodes:
            i = k.index
            neighbors = tryAdd(neighbors, m, i[0], i[1] + 1)
            neighbors = tryAdd(neighbors, m, i[0] + 1, i[1])
            neighbors = tryAdd(neighbors, m, i[0], i[1] - 1)
            neighbors = tryAdd(neighbors, m, i[0] - 1, i[1])
            if r != 0:
                neighbors = tryAdd(neighbors, m, i[0] + 1, i[1] + 1)
                neighbors = tryAdd(neighbors, m, i[0] + 1, i[1] - 1)
                neighbors = tryAdd(neighbors, m, i[0] - 1, i[1] + 1)
                neighbors = tryAdd(neighbors, m, i[0] - 1, i[1] - 1)
        return neighbors

    def seed(self, m, irow, icol):
        self.nodes.append(m[irow][icol])
        m[irow][icol].occupy()

    def propagate(self, m, target):
        # Grow the cluster one ring of neighbours at a time (alternating with
        # and without diagonals) while absorbing the next ring brings the
        # cluster total closer to the target.
        total = self.getTotal()
        s = 1
        while total < target:
            s = 0 if s else 1
            n = self.getNeighbors(m, s)
            if len(n) == 0:
                break
            if abs(target - (total + sum(k.value for k in n))) < abs(target - total):
                for k in n:
                    self.nodes.append(k)
                    m[k.index[0]][k.index[1]].occupy()
                total = self.getTotal()
            else:
                break

    def contains(self, i):
        for k in self.nodes:
            if k.index == i:
                return True
        return False

def parseData(d, s):
    # d is the source data file; s is the number of units per row.
    # Expects lines of the form "square 1 - 100".
    ret = []
    with open(d, "r") as f:
        lines = f.read().split("\n")
    n = 0
    r = 0
    temp = []
    for k in lines:
        if not k.strip():
            continue
        v = float(k.split(" - ")[1].rstrip(","))
        temp.append(trafficNode(v, (r, n)))
        n += 1
        if n == s:
            n = 0
            r += 1
            ret.append(temp)
            temp = []
    return ret

def mapTotal(m):
    return sum(sum(k2.value for k2 in k) for k in m)

def pTotal(m, n):
    return mapTotal(m) / n

import sys
infile = sys.argv[1]
ncols = int(sys.argv[2])
ntowers = int(sys.argv[3])
m = parseData(infile, ncols)
s = pTotal(m, ntowers)
spots = [k.index for row in m for k in row if not k.occupied]
clusters = []
while len(spots) > 0:
    # Seed the next cluster at the busiest unclaimed square, then grow it
    # toward the per-partition target.
    spotVals = [m[k[0]][k[1]].value for k in spots]
    nextSpotIndex = spots[spotVals.index(max(spotVals))]
    c = cluster()
    c.seed(m, nextSpotIndex[0], nextSpotIndex[1])
    c.propagate(m, s)
    clusters.append(c)
    spots = [k.index for row in m for k in row if not k.occupied]
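A hypothetical invocation, assuming the script is saved as partition.py and the data file contains lines like "square 1 - 100" (here with 100 squares per row and 16 partitions):
python partition.py traffic.txt 100 16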
That said, I haven't tested it yet... Does your data come as that image or as another file?

Related

How to solve Ax=b for large condition numbers

I am dealing with a highly ill-conditioned matrix (cond > 10^25).
I have tried most of the scipy methods: splu, gmres, solve,
and pyamg.krylov.steepest_descent from PyAMG,
as well as an iterative refinement method:
import numpy as np
from copy import copy
from scipy.sparse.linalg import splu

def iterRef(A, b, tol=1e-5, maxiter=100, verbose=True):
    """
    Solve the equation A x = b for x using the iterative refinement method.
    :param A: [(M,M) array_like] A square (sparse) matrix.
    :param b: [(M,) array_like] Right-hand side vector in A x = b.
    :param tol: [float] Convergence tolerance on the update norm.
    :param maxiter: [int] Max number of iterations for convergence.
    :param verbose: [bool] Print diagnostics on success.
    :return x: (M,) or (M, N) ndarray
        Solution to the system A x = b. Shape of the return matches the shape of b.
    **Reference:**
        Burden, R.L. and Faires, J.D., 2011. Numerical analysis.
    """
    # declarations
    n = len(b)
    xx = np.zeros_like(b)
    r = np.zeros_like(b)
    lu = splu(A)
    x = lu.solve(b)
    res = np.sum(A.dot(x) - b)
    print("res :: A * x - b = {:e}".format(res))
    # check if already converged
    if np.abs(res) < tol:
        print("IR ::: A * x - b = {:e}".format(res))
        return x
    k = 1                            # step 1
    while k <= maxiter:              # step 2
        r = b - A.dot(x)             # step 3
        y = lu.solve(r)              # step 4
        xx = copy(x + y)             # step 5
        if k == 1:                   # step 6
            COND = np.linalg.cond(A.toarray())
        norm_ = np.linalg.norm(x - xx)
        print("iteration {:3d}, norm = {:e}".format(k, norm_))
        if norm_ < tol:              # step 7
            if verbose:
                print("Condition number of matrix A is: {:e}".format(COND))
                print("The procedure was successful.")
                print("IR: A * x - b = {:e}".format(np.sum(A.dot(xx) - b)))
                print(f"number of iterations is: {k:d}")
                print(" ")
            return xx
        k += 1                       # step 8
        x = copy(xx)                 # step 9
    print("Max iteration exceeded.")
    print("The procedure was not successful.")
    print("Condition number of matrix A is: {:e}".format(COND))
    print(" ")
    return None
The accuracy depends on the condition number, and for larger ones the accuracy disappears: as a rule of thumb you lose roughly log10(cond(A)) significant digits, so at cond(A) > 10^25 double precision (about 16 digits) has nothing left.
I also tried some methods from Eigen3 (see the Catalogue of decompositions offered by Eigen):
VectorXd x = A.partialPivLu().solve(b);
VectorXd x = A.fullPivLu().solve(b);
VectorXd x = A.bdcSvd(ComputeThinU | ComputeThinV).solve(b);
Here is the link to GitHub for the full example in Python and Eigen.
Is it possible to use Eigen modules with e.g. quad precision or higher? Is there any example?
I am not sure we can do this with the scipy modules.
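One workaround in pure Python, though: a minimal sketch assuming the mpmath package is acceptable, which does the LU solve at arbitrary precision. It is far slower than scipy, so it is only practical for modest matrix sizes.

from mpmath import mp, matrix, lu_solve

mp.dps = 50                      # work with 50 significant decimal digits
A = matrix([[1.0, 2.0],
            [3.0, 4.0]])         # toy dense system; substitute your own matrix
b = matrix([5.0, 6.0])
x = lu_solve(A, b)               # LU factorization and solve in mpmath precision
print(x)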
The output from the C++ file is:
condition number is :3.16172e+28
The relative error for partialPivLu is 0.0000000000000011:
The relative error for fullPivLu is 0.0157277957331164:
The relative error for householderQr is 0.0000000000000020:
The relative error for colPivHouseholderQr is 0.0000000000577384:
The relative error for fullPivHouseholderQr is 0.0157277957331138:
The relative error for completeOrthogonalDecomposition is 0.0157277957331138:
The relative error for llt is -nan:
The relative error for ldlt is 0.0000000001045718:
The relative error for ldlt is 396613336624311706845184.0000000000000000:
The relative error for bdcSvd is 0.0157277957329992:
The relative error for jacobiSvd is 0.0157277957528586:
I put the Python and C++ code in the attached link, just in case.

How do I implement Neumann series iteration to approximate Ax = b?

I am working on MATLAB problems from my textbook, and one of the problems (as an example of Neumann series iteration) asks me to follow the pseudocode below:
INPUT: A, an n x n matrix; b, an n x 1 vector; T, a positive integer
OUTPUT: an approximation y of x after T iterations
STEP 1: Set y = zeros(n,1)
STEP 2: Set M = eye(n) - A
STEP 3: For i = 1,2,...,T do STEP 4
STEP 4: Set y = M*y + b
STEP 5: OUTPUT(y)
I am trying to find the smallest value of T such that the largest entry of the vector Ay - b in absolute value is less than the tolerance I set (the variable e as shown below). I then save T and E (the largest entry in absolute value of Ay - b).
function [T,E] = neumann(A,b,e)
    n = size(A);
    y = zeros(n(1,1),1);
    M = eye(n(1,1)) - A;
    t = 10000;
    for ii = 1:t
        y = M*y + b;
        if max(abs(A*y - b)) < e
            T = t;
            E = max(abs(A*y - b));
            break
        end
    end
end
A = [1.1,.2,-.2,.5;
     .2,.9,.5,.3;
     .1,0.,1.,.4;
     .1,.1,.1,1.2];
b = [1;0;1;0];
[T_2, E_2] = neumann(A,b,1e-2);
[T_4, E_4] = neumann(A,b,1e-4);
[T_6, E_6] = neumann(A,b,1e-6);
output = [T_2, E_2; T_4, E_4; T_6, E_6];
Instead of getting the smallest possible T, the for loop goes through all of the iterations even though I used the break statement to end the loop's execution once the condition was met. I can't really figure out what's wrong with my loop; I followed the pseudocode as closely as possible. Any feedback or suggestions are appreciated, thank you in advance.
You always set T = t; you've perhaps forgotten what t is.
You define t = 10000 on line 5 of the neumann function. This never changes, so your output T is always 10000.
Instead, I assume you wanted T = ii;, as ii is the current time step when the threshold is reached.
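A minimal NumPy sketch of the corrected logic (a hypothetical port of the MATLAB function, not the original code), for anyone following along in Python:

import numpy as np

def neumann_np(A, b, e, t=10000):
    # Neumann series iteration: y <- (I - A) y + b converges to the
    # solution of A x = b when the spectral radius of (I - A) is below 1.
    n = A.shape[0]
    y = np.zeros(n)
    M = np.eye(n) - A
    E = np.inf
    for ii in range(1, t + 1):
        y = M @ y + b
        E = np.max(np.abs(A @ y - b))
        if E < e:
            return ii, E   # return the current iteration count, not t
    return None, E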

Conditional Split arrays within a cell

Assume I have a 50x1 cell (say Q) containing column matrices of varying dimensions (say 1568936x1, 88x1, 5040x1, etc.).
Losing values isn't an issue. I need the length of every matrix inside the cell to be divisible by a given number (say 500), e.g. trimmed to 1568500x1 and 5000x1, skipping over 88x1, etc.
Currently I have:
z=cell(length(Q),1)
for p=1:length(z)
    n=length(Q{p})
    for w=1:length(z)
        if n-mod(length(Q{w}),500)<500
            w=w+1;
        else
            o=length(Q{w}-mod(length(Q{w}),500));
            for k=1:length(z)
                z=Q{w}((1:o));
            end
        end
    end
end
but when I reach the 88x1 matrix it throws a dimensions-exceeded error, although I thought I had covered that case with the if condition, which should skip the matrix and move on to the next cell.
This should work fine:
Q = {
    rand(58,1);
    rand(168,1);
    rand(33,1);
    rand(199,1);
    rand(100,1)
};
Q_len = numel(Q);
K = 50;
Z = cell(Q_len,1);
for i = 1:Q_len
    Qi = Q{i};
    Qi_len = numel(Qi);
    k = floor(Qi_len / K) * K;   % largest multiple of K not exceeding the length
    Z{i} = Qi(1:k);
end
Given the starting vectors (shortened here to keep the example manageable), the final output Z is:
>> cellfun(@numel,Z)
ans =
    50
   150
     0
   150
   100
If you want a shorter, one-liner version, here is one:
Q = {
    rand(58,1);
    rand(168,1);
    rand(33,1);
    rand(199,1);
    rand(100,1)
};
K = 50;
Z = cellfun(@(x)x(1:(floor(numel(x)/K)*K)),Q,'UniformOutput',false);

Theano ANN "TypeError: randint() takes at least 1 positional argument (0 given)"

This is the error that I'm receiving:
File "mtrand.pyx", line 1192, in mtrand.RandomState.randint (numpy/random/mtrand/mtrand.c:14128)
TypeError: randint() takes at least 1 positional argument (0 given)
I am somewhat new to coding, but I really want to get started with simple ANNs, so I decided to start this project.
# -*- coding: utf-8 -*-
"""
Created on Sun Sep 18 14:56:44 2016
@author: Jamoonie
"""
## theano practice
import numpy as np
import theano
import theano.tensor as T
from sklearn.datasets import load_digits

digits = load_digits()
print(digits.data.shape)
train_x = np.array(list(digits.data))
train_y = np.array(list(digits.target))
nn_input_dim = train_x.shape[1]  ## shape[0] yields 1797, the number of rows; shape[1] yields 64, the input features per row
print(nn_input_dim)
nn_hdim0 = 10
nn_output_dim = len(train_y)
epsilon = 0.008
batch_size = 100  ## how much data input per iteration
X = T.matrix('X')
y = T.lvector('y')
## set weight shapes with random values
W0 = theano.shared(np.random.randn(nn_input_dim,nn_hdim0),name='W0')   ## the shape of W0 should be row=input_dim, col=# hidden nodes
b0 = theano.shared(np.zeros(nn_hdim0),name='b0')
W1 = theano.shared(np.random.randn(nn_hdim0,nn_output_dim),name='W1')  ## shape of W1 should have row=# hidden nodes, col=output dimension
b1 = theano.shared(np.zeros(nn_output_dim),name='b1')
z0 = X.dot(W0)+b0
a0 = T.nnet.softmax(z0)  ## first hidden layer result
z1 = a0.dot(W1)+b1
a1 = T.nnet.softmax(z1)  ## final result or prediction
loss = T.nnet.categorical_crossentropy(a1,y).mean()  ## how much the prediction differs from the real result
prediction = T.argmax(a1,axis=1)  ## the index of the maximum value of a1
fwd_propagation = theano.function([X],a1)  ## forward propagation function depending on the array of X values
calc_loss = theano.function([X,y],loss)
predict = theano.function([X],prediction)
accuracy = theano.function([X],T.sum(T.eq(prediction,train_y)))  ## T.eq is elementwise, so this counts elementwise matches between prediction and train_y
dW0 = T.grad(loss,W0)
dW1 = T.grad(loss,W1)
db0 = T.grad(loss,b0)
db1 = T.grad(loss,b1)
np.random.randint()  ## <-- the offending call from the traceback: no arguments given
gradient_step = theano.function(
    [X,y],  ## for each set of X,y values
    updates=((W1,W1-epsilon*dW1),  ## update W1 by delta W1 (error) * learning rate, subtracted from the original W1
             (W0,W0-epsilon*dW0),
             (b1,b1-epsilon*db1),
             (b0,b0-epsilon*db0)))
def build(iterations = 80000):
    W1.set_value(np.random.randn(nn_hdim0,nn_output_dim)/np.sqrt(nn_input_dim))  ## why dividing by the sqrt of nn_input_dim I'm not sure, but they're meant to be random anyway
    W0.set_value(np.random.randn(nn_input_dim,nn_hdim0)/np.sqrt(nn_input_dim))
    b1.set_value(np.zeros(nn_output_dim))
    b0.set_value(np.zeros(nn_hdim0))
    for i in range(0, iterations):
        batch_indicies = np.random.randint(0,17,size=100)
        batch_x,batch_y = train_x[batch_indicies],train_y[batch_indicies]
        gradient_step(batch_x,batch_y)  ## so we're providing the values now for the weights, biases and input/output values
        if i%2000==0:
            print("loss after iteration %r: %r" % (i, calc_loss(train_x,train_y)))
            print(accuracy(train_x))
        if i==80000:
            print(W0,b0,W1,b1)
build()
As per the documentation, you need to specify at least the lowest value of the integer to be drawn from the distribution. If you want a random number less than 213 (to be exact, between 0 inclusive and 213 exclusive) you would do r = np.random.randint(213), and if you want a random number in some range, let's say 213 to 537, you would do r = np.random.randint(213, 537). Also, you are calling randint(...) without even storing the result in a variable (or passing it to any function), which is useless. I would suggest going through basic Theano tutorials to get started; start from here.
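A quick illustration of those two calls (nothing beyond the numpy behaviour described above):

import numpy as np

r1 = np.random.randint(213)       # uniform draw from 0..212
r2 = np.random.randint(213, 537)  # uniform draw from 213..536
print(r1, r2)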

How can I alter the probability of an event in MATLAB?

I have a network with N = 5 nodes. The probability that a new connection exits node Ni is:
P(N1) = P(N2) = P(N3) = P(N4) = P(N5) = 1/5
And the sum of all P(Ni) = 1.
which is a uniform distribution. I would like nodes N3 and N5 to have a greater chance of being chosen than the rest. For example:
P(N1) = P(N2) = P(N4) = 2/15
P(N3) = P(N5) = 3/10
And the sum of all P(Ni) = 1.
The code I am using now is this:
nodes = 21;
NODES=(1:nodes);
R=randperm(nodes);
nodeSource=NODES(R(1));
nodeDestin=NODES(R(2));
Thanks.
You might want to look at randsample:
nodeSource = randsample(1:numel(P), numel(P), true, P)
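If you don't have the Statistics Toolbox, here is a hypothetical NumPy equivalent of the same weighted draw, using the example probabilities from the question:

import numpy as np

P = [2/15, 2/15, 3/10, 2/15, 3/10]   # P(N1)..P(N5); sums to 1
nodeSource = np.random.choice(np.arange(1, 6), p=P)
print(nodeSource)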