Optimal transport code (SciPy linear programming optimization) takes far too long - scipy

I have been trying to compute the Wasserstein distance between two one-dimensional Gaussian distributions with means 0.0 and 4.0 and variances 9.0 and 16.0, respectively. I used scipy.optimize.linprog with the "interior-point" method, as described in the following blog post:
https://yetanothermathprogrammingconsultant.blogspot.com/2019/10/scipy-linear-programming-large-but-easy.html
However, solving a 300 x 300 problem (i.e. 300 source nodes and 300 destination nodes) has already taken more than 17 hours, and my code is still running. The blog post, on the other hand, says it is possible to solve a problem with 1000 source nodes and 1000 destination nodes, i.e. an LP with 1,000,000 (one million) decision variables. What is wrong with my code? Why does it take such a long time? Do we need large memory (or clusters) to solve such problems?
My code:
from datetime import datetime
start_time = datetime.now()
import numpy as np
import scipy
from scipy.optimize import linprog
#Initializing the LP matrix
Piprob=np.zeros(500*500).reshape(500,500)

def Piprobmin(Krv,rhoi,rhoj):
    r1=np.shape(Krv)[0]
    r2=np.shape(Krv)[1]
    print("r1,r2",r1,r2)
    #Computing the LP Matrix which has just two ones in each column
    pmat=np.zeros((r1+r2)*(r1*r2)).reshape((r1+r2),(r1*r2))
    for i in range(r1+r2):
        for j in range(r1*r2):
            if((i<r1) and (j<((i+1)*r2)) and (j>=(i*r2))):
                pmat[i][j]=1
            if(i>=r1):
                for k in range(r1*r2):
                    if j==(i-r1)+(k*r2):
                        pmat[i][j]=1
    #flattening the cost matrix into one dimensional array
    krvf=Krv.flatten()
    tempr=np.append(rhoi,rhoj)
    Xv=[] #Creating list for joint probability matrix elements
    res = scipy.optimize.linprog(c=krvf,method='interior-point',A_eq=pmat,b_eq=tempr,options={'sparse':True, 'disp':True})
    print("res=\n",res)
    wv=res.fun
    for l1 in range(r1*r2):
        Xv.append(res.x[l1])
    Yv=np.array(Xv)
    Yv=Yv.reshape(r1,r2)
    #returning Yv-joint probability and wv-minimized wasserstein distance
    return Yv,wv

#K: cost function matrix, result1: first marginal, result2: second marginal (computed earlier, not shown)
Piprob,W=Piprobmin(K,result1,result2)
end_time = datetime.now()
print('Duration: {}'.format(end_time - start_time))
The cost function matrix is 300 x 300, and each marginal has 300 points (600 constraints in total). I verified that my cost function is symmetric and non-negative, and each marginal sums to one, since they are probabilities.
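As an independent sanity check on whatever the LP returns: for two one-dimensional Gaussians the 2-Wasserstein distance has a closed form, W2^2 = (m1 - m2)^2 + (s1 - s2)^2, where s1 and s2 are the standard deviations. With the means and variances above this gives 16 + 1 = 17. So if K holds squared distances between grid points (an assumption; the question does not say how K is built), the LP optimum should come out near 17, up to discretization error. The helper name below is hypothetical.

```python
import math

def w2_gaussian_1d(mean1, var1, mean2, var2):
    """2-Wasserstein distance between N(mean1, var1) and N(mean2, var2) on the real line."""
    # W2^2 = (mean1 - mean2)^2 + (sd1 - sd2)^2, with sd the standard deviations
    return math.sqrt((mean1 - mean2) ** 2 + (math.sqrt(var1) - math.sqrt(var2)) ** 2)

w2 = w2_gaussian_1d(0.0, 9.0, 4.0, 16.0)
print(w2)  # sqrt(16 + 1) = sqrt(17), about 4.123
```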

In the blog post the word sparse is used many times. Not without reason. It is extremely important to store the A matrix as a sparse matrix. Otherwise, you will not be able to handle large problems. The blog post discusses the difference in memory requirements of the transportation LP matrix in great detail, so this point should have been hard to miss.
Here is some example code showing how to set up a transportation model with 1000 source nodes and 1000 destination nodes using scipy.optimize.linprog. Again, the LP matrix has 2,000 rows and 1,000,000 columns and is stored as a sparse matrix.
import numpy as np
import scipy as sp
import scipy.sparse as sparse
import scipy.optimize as opt
from memory_profiler import profile

def GenerateData(M,N):
    np.random.seed(123)
    # form objective function
    c = np.random.uniform(0,10,(M,N))
    # demand, supply
    s = np.random.uniform(0,15,M)
    d = np.random.uniform(0,10,N)
    assert np.sum(d) <= np.sum(s), "supply too small"
    #print('c',c)
    #print('s',s)
    #print('d',d)
    return {'c':c, 's':s, 'd':d, 'n':N, 'm':M}

def FormLPData(data):
    rhs = np.append(data['s'],-data['d'])
    # form A
    # column (i,j)=N*i+j has two nonzeroes:
    #   +1 at row i with rhs supply(i)
    #   -1 at row M+j with rhs -demand(j)
    N = data['n']
    M = data['m']
    NZ = 2*N*M
    irow = np.zeros(NZ, dtype=int)
    jcol = np.zeros(NZ, dtype=int)
    value = np.zeros(NZ)
    for i in range(M):
        for j in range(N):
            k = N*i+j
            k1 = 2*k
            k2 = k1+1
            irow[k1] = i
            jcol[k1] = k
            value[k1] = 1.0
            irow[k2] = M+j
            jcol[k2] = k
            value[k2] = -1.0
    A = sparse.coo_matrix((value, (irow, jcol)), shape=(M+N, M*N))
    #print('A',A)
    #print('rhs',rhs)
    return {'A':A,'rhs':rhs}

@profile
def run():
    # dimensions
    M = 1000 # sources
    N = 1000 # destinations
    data = GenerateData(M,N)
    lpdata = FormLPData(data)
    # 'sparse' is an option of the legacy 'interior-point' method
    res = opt.linprog(c=np.reshape(data['c'],M*N),A_ub=lpdata['A'],b_ub=lpdata['rhs'],options={'sparse':True, 'disp':True})

if __name__ == '__main__':
    run()
So it looks like you totally missed the whole point of the blog post.
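To put numbers on it: for the 300 x 300 case, the dense pmat in the question has 600 x 90,000 = 54,000,000 entries (about 432 MB as float64) but only 180,000 nonzeros. Building it is even worse than storing it: for each of the 300 rows with i >= r1, the nested j and k loops run 90,000 x 90,000 times, roughly 2.4 x 10^12 Python iterations before linprog is even called. The same equality-constrained matrix can be assembled directly in sparse form with Kronecker products. A minimal sketch, assuming a SciPy version that provides method="highs" (recent releases no longer ship the legacy "interior-point" method); transport_lp is a hypothetical helper name, and the small 1-D test problem is made up for illustration:

```python
import numpy as np
import scipy.sparse as sparse
from scipy.optimize import linprog
from scipy.stats import wasserstein_distance

def transport_lp(cost, a, b):
    """min sum(cost * P) s.t. P.sum(axis=1) == a, P.sum(axis=0) == b, P >= 0."""
    m, n = cost.shape
    # Rows 0..m-1: row i has ones in columns i*n .. i*n+n-1 (the i-th row sum of P)
    row_sums = sparse.kron(sparse.identity(m), np.ones((1, n)))
    # Rows m..m+n-1: row m+j has ones in columns j, n+j, 2n+j, ... (the j-th column sum of P)
    col_sums = sparse.kron(np.ones((1, m)), sparse.identity(n))
    A_eq = sparse.vstack([row_sums, col_sums]).tocsc()
    b_eq = np.concatenate([a, b])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    # cost.ravel() is row-major, so entry i*n + j of res.x is P[i, j]
    return res.x.reshape(m, n), res.fun

# Small 1-D check: discretized N(0, 9) and N(4, 16) with |x - y| cost (i.e. W1)
x = np.linspace(-10, 10, 50)
a = np.exp(-x**2 / 18); a /= a.sum()
y = np.linspace(-10, 14, 60)
b = np.exp(-(y - 4)**2 / 32); b /= b.sum()
cost = np.abs(x[:, None] - y[None, :])
P, w1 = transport_lp(cost, a, b)
print(w1, wasserstein_distance(x, y, a, b))  # the two values should agree
```

For purely one-dimensional problems like the one in the question, scipy.stats.wasserstein_distance computes W1 directly from the sorted support points, so the LP is only needed as a check or for costs other than |x - y|.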

Related

Hierarchical Agglomerative clustering for Spark

I am working on a project using Spark and Scala, and I am looking for a hierarchical clustering algorithm, similar to scipy.cluster.hierarchy.fcluster or sklearn.cluster.AgglomerativeClustering, which will be usable for large amounts of data.
MLlib for Spark implements Bisecting k-means, which needs the number of clusters as input. Unfortunately, in my case I don't know the number of clusters, and I would prefer to use some distance threshold as an input parameter, as is possible in the two Python implementations above.
If anyone would know the answer, I would be very grateful.
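For reference, the single-machine behaviour described above (cutting the hierarchy at a distance instead of fixing the number of clusters) looks roughly like this in SciPy; the two synthetic blobs and the threshold t=1.0 are made up for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# two tight, well-separated blobs of 2-D points
points = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
                    rng.normal(5.0, 0.1, size=(20, 2))])
Z = linkage(points, method="average")
# cut the dendrogram at cophenetic distance 1.0 instead of asking for k clusters
labels = fcluster(Z, t=1.0, criterion="distance")
print(labels)
```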
So I had the same problem, and after looking high and low I found no answers, so I will post what I did here in the hope that it helps someone else and that maybe someone will build on it.
The basic idea was to use bisecting k-means recursively, splitting clusters in half until all points in each cluster were within a specified distance of the centroid. I was using GPS data, so I have a little extra machinery to deal with that.
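Stripped of Spark, the recursion can be sketched in plain NumPy as below. This is not the code from this answer: a hand-rolled 2-means stands in for BisectingKMeans, Euclidean distance stands in for the GPS distance, and the function names are hypothetical. It only illustrates the stopping rule: keep bisecting any cluster whose farthest point lies more than threshold from its centroid.

```python
import numpy as np

def two_means(X, iters=20):
    """Split the rows of X into two groups with a few Lloyd (k-means, k=2) iterations."""
    # Seed with the point farthest from the mean and the point farthest from that one
    a = X[np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))]
    b = X[np.argmax(np.linalg.norm(X - a, axis=1))]
    centers = np.stack([a, b])
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def bisect_to_dist(X, idx, threshold, out):
    """Keep splitting X[idx] in two until every point is within threshold of its centroid."""
    pts = X[idx]
    radius = np.max(np.linalg.norm(pts - pts.mean(axis=0), axis=1))
    if radius <= threshold or len(idx) == 1:
        out.append(idx)  # tight enough: this is a final cluster
        return
    labels = two_means(pts)
    if labels.min() == labels.max():  # degenerate split: fall back to halving
        labels = (np.arange(len(idx)) >= len(idx) // 2).astype(int)
    for k in range(2):
        bisect_to_dist(X, idx[labels == k], threshold, out)

def cluster_by_threshold(X, threshold):
    """Return one integer label per row of X, plus the list of index arrays per cluster."""
    X = np.asarray(X, dtype=float)
    clusters = []
    bisect_to_dist(X, np.arange(len(X)), threshold, clusters)
    labels = np.empty(len(X), dtype=int)
    for k, idx in enumerate(clusters):
        labels[idx] = k
    return labels, clusters

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.2, size=(30, 2)) for c in [(0, 0), (5, 0), (0, 5)]])
labels, clusters = cluster_by_threshold(X, threshold=1.0)
print(len(clusters))
```

The number of clusters falls out of the threshold rather than being fixed in advance, which is the property the question asks for.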
The first step is to create a model that will cut the data in half. I used bisecting K means but I think this would work with any of the pyspark clustering methods so long as you can get the distance to the centroid.
import pyspark.sql.functions as f
from pyspark import SparkContext, SQLContext
from pyspark.ml.clustering import BisectingKMeans
from pyspark.ml.linalg import Vectors
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import Window
from pyspark.sql.types import FloatType

bkm = BisectingKMeans().setK(2).setSeed(1)
assembler = VectorAssembler(inputCols=['lat','long'], outputCol="features")
adf = assembler.transform(locAggDf)  # locAggDf contains my location info
model = bkm.fit(adf)
# predictions will have the original data plus the "features" column and a "prediction" column holding the cluster number
predictions = model.transform(adf)
predictions.persist()
The next step is our recursive function. The idea here is that we specify some distance from the centroid, and if any point in a cluster is farther than that distance, we cut the cluster in half. When a cluster is tight enough to meet the condition, I add it to a result array that I use to build the final clustering.
def bisectToDist(model, predictions, bkm, precision, result=None):
    if result is None:
        result = []
    centers = model.clusterCenters()
    # row[0] is predictedClusterNum, row[1] is unit, row[2] point lat, row[3] point long
    # centers[row[0]] is the lat long of center, centers[row[0]][0] = lat, centers[row[0]][1] = long
    distUdf = f.udf(
        lambda row: getDistWrapper((centers[row[0]][0], centers[row[0]][1], row[1]), (row[2], row[3], row[1])),
        FloatType())  # getDistWrapper is how I calculate the distance between lat/long pairs, but you can define any distance metric
    predictions = predictions.withColumn('dist', distUdf(
        f.struct(predictions.prediction, predictions.encodedPrecisionUnit, predictions.lat, predictions.long)))
    # create a df of all rows that were in clusters that had a point outside of the threshold
    toBig = predictions.join(
        predictions.groupby('prediction').agg({"dist": "max"}).filter(f.col('max(dist)') > precision).select(
            'prediction'), ['prediction'], 'leftsemi')
    # this could probably be improved
    # get all cluster numbers that were too big
    listids = toBig.select("prediction").distinct().rdd.flatMap(lambda x: x).collect()
    # if all data points are within the specified distance of the centroid we can return the clustering
    if len(listids) == 0:
        return predictions
    # assuming binary splits, k must be 2
    # if one of the two clusters was small enough, there will be no further recursion for it,
    # so we must save it and return it at this depth; the cluster that was too big is cut in half in the loop below
    if len(listids) == 1:
        ok = predictions.join(
            predictions.groupby('prediction').agg({"dist": "max"}).filter(
                f.col('max(dist)') <= precision).select(
                'prediction'), ['prediction'], 'leftsemi')
    for clusterId in listids:
        # get all of the pieces that were too big
        part = toBig.filter(toBig.prediction == clusterId)
        # we now need to refit the subset of the data
        assembler = VectorAssembler(inputCols=['lat', 'long'], outputCol="features")
        adf = assembler.transform(part.drop('prediction').drop('features').drop('dist'))
        model = bkm.fit(adf)
        # predictions now holds the new subclustering and we are ready for recursion
        predictions = model.transform(adf)
        result.append(bisectToDist(model, predictions, bkm, precision, result=result))
    # return anything that was given and already good
    if len(listids) == 1:
        return ok
Finally we can call the function and build the resulting dataframe:
result = []
# precision is the distance threshold, in the units returned by getDistWrapper
bisectToDist(model, predictions, bkm, precision, result=result)
# drop any Nones (these can come from recursive, not top-level, calls)
result = [r for r in result if r is not None]
r = result[0]
r = r.withColumn('subIdx',f.lit(0))
result = result[1:]
idx = 1
for r1 in result:
    r1 = r1.withColumn('subIdx',f.lit(idx))
    r = r.unionByName(r1)
    idx = idx + 1
# each of the subclusters has a 0 or 1 classification; to renumber them 0 - n I added the following
r = r.withColumn('delta', r.subIdx * 100 + r.prediction)
r = r.withColumn('delta', r.delta - f.lag(r.delta, 1).over(Window.orderBy("delta"))).fillna(0)
r = r.withColumn('ddelta', f.when(r.delta != 0,1).otherwise(0))
r = r.withColumn('spacialLocNum',f.sum('ddelta').over(Window.orderBy(['subIdx','prediction'])))
# spacialLocNum should be the final clustering
Admittedly this is quite convoluted and slow, but it does get the job done. Hope this helps!

pytorch linear regression giving wrong results

I implemented a simple linear regression and I'm getting poor results. I'm wondering whether these results are normal or whether I'm making a mistake.
I tried different optimizers and learning rates, but I always get poor results.
Here is my code:
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
from torch.autograd import Variable

class LinearRegressionPytorch(nn.Module):
    def __init__(self, input_dim=1, output_dim=1):
        super(LinearRegressionPytorch, self).__init__()
        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self,x):
        x = x.view(x.size(0),-1)
        y = self.linear(x)
        return y

input_dim=1
output_dim = 1
if torch.cuda.is_available():
    model = LinearRegressionPytorch(input_dim, output_dim).cuda()
else:
    model = LinearRegressionPytorch(input_dim, output_dim)
criterium = nn.MSELoss()
l_rate =0.00001
optimizer = torch.optim.SGD(model.parameters(), lr=l_rate)
#optimizer = torch.optim.Adam(model.parameters(),lr=l_rate)
epochs = 100
#create data
x = np.random.uniform(0,10,size = 100) #np.linspace(0,10,100);
y = 6*x+5
mu = 0
sigma = 5
noise = np.random.normal(mu, sigma, len(y))
y_noise = y+noise
#pass it to pytorch
x_data = torch.from_numpy(x).float()
y_data = torch.from_numpy(y_noise).float()
if torch.cuda.is_available():
    inputs = Variable(x_data).cuda()
    target = Variable(y_data).cuda()
else:
    inputs = Variable(x_data)
    target = Variable(y_data)
for epoch in range(epochs):
    #predict data
    pred_y= model(inputs)
    #compute loss
    loss = criterium(pred_y, target)
    #zero grad and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    #if epoch % 50 == 0:
    #    print(f'epoch = {epoch}, loss = {loss.item()}')
#print params
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name, param.data)
These are the poor results:
linear.weight tensor([[1.7374]], device='cuda:0')
linear.bias tensor([0.1815], device='cuda:0')
The results should be weight = 6, bias = 5.
Problem Solution
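Actually, the problem is the shape of your target. Your model reshapes the input with view(x.size(0), -1), so pred_y has shape (100, 1), while target has shape (100,). MSELoss then broadcasts the two to (100, 100) (PyTorch prints a warning about the size mismatch), so you are not minimizing the loss you think you are. Your target needs the same shape as the output, so your loss should be defined like this:
loss = criterium(pred_y, target.view(-1, 1))
Apart from that, the network is correct.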
Results
Your fitted bias will not be exactly 5 (although the weight will indeed go towards 6), because you are adding random noise to the target: with only 100 noisy points, the least-squares estimate of the bias is itself noticeably noisy.
If you want the bias to come out as 5, remove the noise.
You should also increase the number of epochs (and probably the learning rate), as 100 updates with lr=0.00001 barely move the parameters. Try something like 10000 epochs; the loss should then settle near the noise level (close to 0 if you reduce your noise to something small).
Noise
Your noise is large relative to the signal: each point gets an independent draw from N(0, 5^2), so the loss cannot go much below the noise variance (about 25), and the bias is the parameter that is hardest to pin down, while the optimal slope stays close to 6. Try increasing sigma from 5 to 1000 and see what weight and bias are learned.
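Putting the shape fix together with a larger step size gives the minimal CPU-only sketch below. The settings lr=0.01, 5000 epochs and the seeds are my own choices, not from the question, and fit_line is a hypothetical helper name. Rather than comparing against 6 and 5, it compares against np.polyfit on the same noisy data, which is the best any straight-line fit can do. By my rough estimate, for x in [0, 10] the loss is much flatter along the bias direction than along the weight direction, so with lr=0.00001 the bias would need on the order of 10^5 epochs to converge, while full-batch SGD on this data stays stable only for lr below roughly 0.03.

```python
import numpy as np
import torch
import torch.nn as nn

def fit_line(x, y, lr=0.01, epochs=5000, seed=0):
    """Full-batch SGD fit of y ~ w * x + b with nn.Linear; returns (w, b)."""
    torch.manual_seed(seed)
    # Both tensors get shape (N, 1), so MSELoss compares element by element
    inputs = torch.from_numpy(x).float().view(-1, 1)
    target = torch.from_numpy(y).float().view(-1, 1)
    model = nn.Linear(1, 1)
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(inputs), target)
        loss.backward()
        optimizer.step()
    return model.weight.item(), model.bias.item()

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y_noise = 6 * x + 5 + rng.normal(0, 5, size=100)
w, b = fit_line(x, y_noise)
# closed-form least-squares line on the same noisy data, for comparison
w_ls, b_ls = np.polyfit(x, y_noise, 1)
print(f"SGD: w={w:.3f}, b={b:.3f}   least squares: w={w_ls:.3f}, b={b_ls:.3f}")
```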
Style (a little off-topic)
Please read the PyTorch documentation and keep your code up to date (e.g. Variable is deprecated in favor of Tensor, and rightfully so).
This part of code:
x_data = torch.from_numpy(x).float()
y_data = torch.from_numpy(y_noise).float()
if torch.cuda.is_available():
    inputs = Tensor(x_data).cuda()
    target = Tensor(y_data).cuda()
else:
    inputs = Tensor(x_data)
    target = Tensor(y_data)
Could be written succinctly like this (without much thought):
inputs = torch.from_numpy(x).float()
target = torch.from_numpy(y_noise).float()
if torch.cuda.is_available():
    inputs = inputs.cuda()
    target = target.cuda()
I know deep learning has its reputation for bad code and bad practices, but please do not help spread this approach.

Merging two tensors by convolution in Keras

I'm trying to convolve two 1D tensors in Keras.
I get two inputs from other models:
x - of length 100
ker - of length 5
I would like to get the 1D convolution of x using the kernel ker.
I wrote a Lambda layer to do it:
import tensorflow as tf
from keras.layers import Input, Lambda
from keras.models import Model

def convolve1d(x):
    y = tf.nn.conv1d(value=x[0], filters=x[1], padding='VALID', stride=1)
    return y

x = Input(shape=(100,))
ker = Input(shape=(5,))
y = Lambda(convolve1d)([x,ker])
model = Model([x,ker], [y])
I get the following error:
ValueError: Shape must be rank 4 but is rank 3 for 'lambda_67/conv1d/Conv2D' (op: 'Conv2D') with input shapes: [?,1,100], [1,?,5].
Can anyone help me understand how to fix it?
It was much harder than I expected, because Keras and TensorFlow don't expect any batch dimension in the convolution kernel, so I had to write the loop over the batch dimension myself, which requires specifying batch_shape instead of just shape in the Input layer. Here it is:
import numpy as np
import tensorflow as tf
import keras
from keras import backend as K
from keras import Input, Model
from keras.layers import Lambda

def convolve1d(x):
    input, kernel = x
    # the batch size is static here because the Input layers below use batch_shape
    batch_size = K.int_shape(input)[0]
    output_list = []
    if K.image_data_format() == 'channels_last':
        kernel = K.expand_dims(kernel, axis=-2)
    else:
        kernel = K.expand_dims(kernel, axis=0)
    for i in range(batch_size): # Loop over batch dimension
        output_temp = tf.nn.conv1d(value=input[i:i+1, :, :],
                                   filters=kernel[i, :, :],
                                   padding='VALID',
                                   stride=1)
        output_list.append(output_temp)
        print(K.int_shape(output_temp))
    return K.concatenate(output_list, axis=0)

batch_input_shape = (1, 100, 1)
batch_kernel_shape = (1, 5, 1)
x = Input(batch_shape=batch_input_shape)
ker = Input(batch_shape=batch_kernel_shape)
y = Lambda(convolve1d)([x,ker])
model = Model([x, ker], [y])
a = np.ones(batch_input_shape)
b = np.ones(batch_kernel_shape)
c = model.predict([a, b])
In its current state:
It doesn't work for inputs (x) with multiple channels.
If you provide several filters, you get as many outputs, each being the convolution of the input with the corresponding kernel.
From the given code it is difficult to tell what you mean when you say
is it possible
But if what you mean is merging two layers and feeding the merged layer to a convolution, yes, it is possible.
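from keras.layers import Input, Reshape, Conv1D, concatenate
from keras.models import Model
x = Input(shape=(100,))
ker = Input(shape=(5,))
merged = concatenate([x, ker], axis=-1)  # shape (batch, 105)
merged = Reshape((105, 1))(merged)  # Conv1D expects (batch, steps, channels)
y = Conv1D(filters=1, kernel_size=5, padding='same')(merged)
model = Model([x, ker], y)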
EDIT:
@user2179331 thanks for clarifying your intention. You are using the Lambda class incorrectly, which is why the error message is showing.
But what you are trying to do can be achieved using keras.backend functions.
Note, though, that when you use lower-level functions you lose some of the higher-level abstraction. For example, keras.backend.conv1d needs an input of shape (BATCH_SIZE, width, channels) and a kernel of shape (kernel_size, input_channels, output_channels). So in your case, let us assume that x has 1 channel (input_channels == 1) and that the output also has 1 channel (output_channels == 1).
So your code can now be refactored as follows:
from keras import backend as K
from keras.layers import Input

def convolve1d(x, kernel):
    y = K.conv1d(x, kernel, padding='valid', strides=1, data_format="channels_last")
    return y

input_channels = 1
output_channels = 1
kernel_width = 5
input_width = 100
ker = K.variable(K.random_uniform([kernel_width, input_channels, output_channels]), K.floatx())
x = Input(shape=(input_width, input_channels))
y = convolve1d(x, ker)
I think I understand what you mean. Take the (non-working) example code below:
input_signal = Input(shape=(L), name='input_signal')
input_h = Input(shape=(N), name='input_h')
faded= Lambda(lambda x: tf.nn.conv1d(input, x))(input_h)
You want to convolve each signal vector with a different vector of fading coefficients.
The conv operations in TensorFlow, e.g. tf.nn.conv1d, only support a kernel that is shared across the whole batch. Therefore, the code above cannot do what you want.
I don't have a clean solution either. The code you gave runs, but it is complex and inefficient. Another feasible, but also inefficient, approach is to multiply by the Toeplitz matrix whose rows are shifted copies of the fading coefficient vector. When the signal vector is long, this matrix becomes extremely large.
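The Toeplitz idea and the per-sample loop above compute the same thing: a 'VALID' cross-correlation of each signal with its own kernel (TensorFlow's conv ops do not flip the kernel). The sketch below is NumPy, not Keras, and is only meant to check the math; the function names are hypothetical, and sliding_window_view needs NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def batched_valid_conv1d(x, h):
    """Per-sample 'VALID' 1-D cross-correlation, one kernel per sample.

    x: (batch, L) signals; h: (batch, K) kernels. Returns (batch, L - K + 1).
    """
    windows = sliding_window_view(x, h.shape[1], axis=1)  # (batch, L-K+1, K), windows[b, t, k] = x[b, t+k]
    return np.einsum('btk,bk->bt', windows, h)

def toeplitz_matrix(h, L):
    """(L-K+1, L) matrix T with T[t, t+k] = h[k], so T @ x is the valid cross-correlation."""
    K = len(h)
    T = np.zeros((L - K + 1, L))
    for t in range(L - K + 1):
        T[t, t:t + K] = h
    return T

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 100))
h = rng.standard_normal((4, 5))
out = batched_valid_conv1d(x, h)
print(out.shape)  # (4, 96)
```

The windowed version never materializes the (L-K+1) x L matrix, which is exactly the memory problem mentioned above for long signals.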

Select a subset of stocks using genetic algorithm in Matlab

I want to select 10 stocks out of a given set of stocks; the selected stocks should get some weight, while the rest should get zero weight. I have read the covariance matrix and returns from a file. My code is:
Aeq = ones(1,stocks);
beq = 1;
lb = zeros(1,stocks);
up = ones(1,stocks);
options = gaoptimset;
options = gaoptimset(options,'PopulationSize' ,10);
fitnessFunction = @(x) (x * covariance * x') - (x * returns);
W = ga(fitnessFunction,stocks,[],[],Aeq,beq,lb,up,[],options);
This code gives weights to all the stocks. I cannot figure out how to limit the number to 10.
The 'PopulationSize' parameter specifies how many individuals (in your case, portfolios) exist in each generation; it has nothing to do with the weights assigned to each asset.
You need to write appropriate crossoverFcn and mutationFcn functions that explicitly maintain exactly 10 non-zero weights (and a creation function, so that the initial population already satisfies this).
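The idea of cardinality-preserving operators can be sketched as follows. This is Python/NumPy rather than MATLAB, the function names are hypothetical, and wiring them into MATLAB's ga would need wrappers with the signatures ga expects for 'CreationFcn', 'CrossoverFcn' and 'MutationFcn', which this sketch does not attempt. Every individual keeps exactly 10 strictly positive weights summing to 1, so the Aeq/beq and lower-bound constraints from the question hold by construction.

```python
import numpy as np

K_STOCKS = 10  # number of stocks that must carry weight (from the question)

def random_portfolio(n_stocks, k=K_STOCKS, rng=None):
    """Creation: exactly k strictly positive weights summing to 1, the rest zero."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.zeros(n_stocks)
    chosen = rng.choice(n_stocks, size=k, replace=False)
    w[chosen] = rng.random(k) + 1e-6  # keep every selected weight > 0
    return w / w.sum()

def mutate(w, rng=None, scale=0.1):
    """Mutation: swap one held stock for one not held, jitter the held weights, renormalise."""
    rng = np.random.default_rng() if rng is None else rng
    w = w.copy()
    held = np.flatnonzero(w)
    not_held = np.flatnonzero(w == 0)
    if not_held.size:
        drop, add = rng.choice(held), rng.choice(not_held)
        w[add], w[drop] = w[drop], 0.0  # move the dropped weight to the new stock
        held = np.flatnonzero(w)
    w[held] *= np.exp(scale * rng.standard_normal(held.size))  # multiplicative, stays positive
    return w / w.sum()

def crossover(p1, p2, k=K_STOCKS, rng=None):
    """Crossover: pick k stocks from the union of both parents' holdings and average their weights."""
    rng = np.random.default_rng() if rng is None else rng
    pool = np.union1d(np.flatnonzero(p1), np.flatnonzero(p2))
    chosen = rng.choice(pool, size=k, replace=False)
    child = np.zeros_like(p1)
    child[chosen] = (p1[chosen] + p2[chosen]) / 2  # > 0: each chosen stock is held by a parent
    return child / child.sum()

rng = np.random.default_rng(0)
a, b = random_portfolio(30, rng=rng), random_portfolio(30, rng=rng)
child = mutate(crossover(a, b, rng=rng), rng=rng)
print(np.count_nonzero(child), child.sum())
```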

Solving a system of equations using Python/Scipy for a set of measurements

I have a physical measuring instrument (a force platform with load cells) which gives me three values, A, B and C. These values, which should be orthogonal, are actually somewhat coupled due to the physical characteristics of the measuring device, which causes cross-talk between applied and returned values of force and torque.
It is therefore recommended to use a calibration matrix to transform the measured values into a better estimate of the actual values, like this:
[Fz, Mx, My]_actual = C · [Fz, Mx, My]_measured, where C is a 3x3 calibration matrix.
The problem is that a SET of measurements is needed: several different measured (Fz, Mx, My) and actual (Fz, Mx, My) pairs have to be fitted by least squares to get the single C matrix that works best for the system as a whole.
I can solve Ax = B problems with scipy.linalg.lstsq, or even scipy.linalg.solve (which gives an exact solution), for ONE measurement, but how should I proceed to take into account a set of different measurements, each with its own equation, which would give a potentially different 3x3 matrix?
Any help is much appreciated, thanks for reading.
I posted a similar question containing just the mathematical part of this at math.stackexchange.com, and this answer solved the problem:
math.stackexchange.com/a/232124/27435
In case anyone has a similar problem in the future, here is an almost literal SciPy implementation of that answer (the first lines are initialization boilerplate):
import numpy
import scipy.linalg

### Origin of the coordinate system: upper left corner!
"""
1----------2
|          |
|          |
4----------3
"""
platform_width = 600
platform_height = 400

# positions of each load cell (one per corner)
loadcell_positions = numpy.array([[0, 0],
                                  [platform_width, 0],
                                  [platform_width, platform_height],
                                  [0, platform_height]])
platform_origin = numpy.array([platform_width, platform_height]) * 0.5

# applying a known force at known positions and taking the measurements
measurements_per_axis = 5
total_load = 50
results = []
for x in numpy.linspace(0, platform_width, measurements_per_axis):
    for y in numpy.linspace(0, platform_height, measurements_per_axis):
        position = numpy.array([x, y])
        # (simplified) moments of the applied load about the platform centre
        moments = (position - platform_origin) * total_load
        load = numpy.array([total_load])
        result = numpy.hstack([load, moments])
        results.append(result)
results = numpy.array(results)
noise = numpy.random.rand(*results.shape) - 0.5
measurements = results + noise

# expand each measurement vector m into a 3x9 block, so that block @ C.ravel() == C @ m
expands = []
for n in range(measurements.shape[0]):
    m = measurements[n, :]
    expand = numpy.zeros((3, 9))
    expand[0, 0:3] = m
    expand[1, 3:6] = m
    expand[2, 6:9] = m
    expands.append(expand)
expands = numpy.vstack(expands)

# perform the actual regression: the right-hand side is the stacked ACTUAL values
C = scipy.linalg.lstsq(expands, results.reshape((-1, 1)))
C = numpy.array(C[0]).reshape((3, 3))

# with pure noise (no actual coupling) the result should be
# very close to a 3x3 identity matrix
print(C)
Hope this helps someone!
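The 3x9 block expansion is not strictly necessary: stacking actual ≈ C · measured row-wise gives Actual ≈ Measured · C^T, and scipy.linalg.lstsq accepts a right-hand side with several columns, so it can solve for C^T directly. A sketch with hypothetical function names, checking both formulations against a made-up coupling matrix:

```python
import numpy as np
import scipy.linalg

def calibration_matrix(measured, actual):
    """Least-squares 3x3 C with actual[n] ~ C @ measured[n]; inputs have shape (n_samples, 3)."""
    # Row-wise, actual ~ measured @ C.T: an ordinary multi-column least-squares problem
    Ct, *_ = scipy.linalg.lstsq(measured, actual)
    return Ct.T

def calibration_matrix_expanded(measured, actual):
    """Same fit via the 3x9 block expansion used in the answer, for comparison."""
    blocks = []
    for m in measured:
        block = np.zeros((3, 9))
        block[0, 0:3] = m
        block[1, 3:6] = m
        block[2, 6:9] = m
        blocks.append(block)
    c, *_ = scipy.linalg.lstsq(np.vstack(blocks), actual.reshape(-1, 1))
    return c.reshape(3, 3)

rng = np.random.default_rng(0)
C_true = np.eye(3) + 0.05 * rng.standard_normal((3, 3))  # mild cross-talk
actual = rng.uniform(-100, 100, size=(25, 3))
measured = actual @ np.linalg.inv(C_true).T  # so that actual = measured @ C_true.T exactly
C_est = calibration_matrix(measured, actual)
print(np.round(C_est, 4))
```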