scipy sparse matrix to cvxopt spmatrix? - scipy

I need to convert a scipy sparse matrix to cvxopt's sparse matrix format, spmatrix, and haven't come across anything yet (the matrix is too big to be converted to dense, of course). Any ideas how to do this?

The more robust answer is a combination of hpaulj's answer and OferHelman's answer.
def scipy_sparse_to_spmatrix(A):
coo = A.tocoo()
SP = spmatrix(coo.data.tolist(), coo.row.tolist(), coo.col.tolist(), size=A.shape)
return SP
Defining the shape variable preserves the dimensionality of A on SP. I found that any zero columns ending the scipy sparse matrix would be lost without this added step.

taken from http://maggotroot.blogspot.co.il/2013/11/constrained-linear-least-squares-in.html
coo = A.tocoo()
SP = spmatrix(coo.data, coo.row.tolist(), coo.col.tolist())

From http://cvxopt.org/userguide/matrices.html#sparse-matrices
cvxopt.spmatrix(x, I, J[, size[, tc]])
looks similar to the scipy.sparse
coo_matrix((data, (i, j)), [shape=(M, N)])
My guess is that if A is a matrix in coo format, that
cvxopt.spmatrix(A.data, A.row, A.col, A.shape)
would work. (I don't have cvxopt installed to test this.)

Related

What's the equivalence of cell2mat of Matlab in Julia?

Well, I am not sure if some of you has encountered the same issue.
I need to convert a matlab program into julia where 'cell2mat' was extensively used.
For example a big matrix A is composed of 3*2 small matrix, see a11, a12, a21, a22, a31, a32, whose dimensions are all 4*2.
Or A = [a11 a12; a21 a22; a31 a32] with a11 = rand(4,2) for example.
I first used an Array{Array{Float64,2},1} type to create the whole matrix A.
Then, I need to convert this A into a usual matrix, say, Array{Float64,2}.
I did try to do like hvcat((NUM),A...), but the order of the converted matrix doesn't correspond to the original Array{Array{Float64,2},1} type.
Thanks for anyone who could provide some pistes.
Wish you all a good day!
Let us start with an array:
A = [reshape((1:8) .+ (10i+100j), 4, 2) for i in 1:3, j in 1:2]
so that we can visually verify that the result is correct.
Now the approach could be:
hcat([vcat(A[:,i]...) for i in 1:size(A,2)]...)
In Julia 0.7 you can use the fact that permutedims is not recursive to get the same in a simpler way:
hvcat(size(A,2), permutedims(A)...)
This will also work under Julia 0.6 but you have to write permutedims(A, (2,1)).
As a side note it is interesting (and was problematic here) that hvcat traverses arguments in rows, but matrices are stored in columns.
EDIT: actually transpose is recursive in 0.7, changed to permutedims which also works under 0.6.
By transposing your original element of the array of array---which is matrix of course----and then transposing the final array of array in the command, such as hvcat(NUM,A...'), the correct answer can be found.

MATLAB: How to apply a vectorized function using sparsity structure?

I need to (repeatedly) build a vector of length 200 from a vector of length 2500. I can describe this operation using multiplication by a matrix which is extremely sparse: it is 200x2500 and has only one entry in each row. But I have very little control over where this entry is. My actual problem is that I need to apply this matrix not to the vector that I currently have, but rather to some componentwise function of this vector. Since I have all this sparsity, it is wasteful to apply this componentwise function to all 2500 components of my vector. Instead I would rather apply it only to the 200 components that actually contribute.
A program (with randomly chosen numbers replacing of my actual numbers) which would have a similar problem would be something like this:
ind=randi(2500,200,1);
coefficients=randn(200,1);
A=sparse(1:200,ind,coefficients,200,2500);
x=randn(2500,1);
y=A*subplus(x);
What I don't like here is applying subplus to all of x; I would rather only have to apply it to x(ind), since only that contributes to the matrix product.
Right now the only way I can see to work around this is to replace my sparse matrix with a 200-component vector of coefficients and a 200-component vector of indices. Working this way, the code above would become:
ind=randi(2500,200,1);
coefficients=randn(200,1);
x=randn(2500,1);
y=coefficients.*subplus(x(ind))
Is there a better way to do this, preferably one that would work when A contains a few elements per row instead of just one?
The code in your question throws an exception, I think it should be:
n=2500;
m=200;
ind=randi(n,m,1);
coefficients=randn(m,1);
A=sparse(1:m,ind,coefficients,m,n);
x=randn(n,1);
Your idea using x(ind) was basically right, but ind would reorder x which is not intended. Instead you could use sort(unique(ind)). I opted to use the sparse logical index any(A~=0) because I expect it to be faster, but you could compare both versions.
%original code
y=A*subplus(x);
.
%multiplication using sparse logical indexing:
relevant=any(A~=0);
y=A(:,relevant)*subplus(x(relevant));
.
%fixed version of your code
relevant=sort(unique(ind));
y=A(:,relevant)*subplus(x(relevant));

Conversion from Matlab CSC to CSR format

I am using mex bridge to perform some operations on Sparse matrices from Matlab.
For that I need to convert input matrix into CSR (compressed row storage) format, since Matlab stores the sparse matrices in CSC (compressed column storage).
I was able to get value array and column_indices array. However, I am struggling to get row_pointer array for CSR format.Is there any C library that can help in conversion from CSC to CSR ?
Further, while writing a CUDA kernel, will it be efficient to use CSR format for sparse operations or should I just use following arrays :- row indices, column indices and values?
Which on would give me more control over the data, minimizing the number for-loops in the custom kernel?
Compressed row storage is similar to compressed column storage, just transposed. So the simplest thing is to use MATLAB to transpose the matrix before you pass it to your MEX file. Then, use the functions
Ap = mxGetJc(spA);
Ai = mxGetIr(spA);
Ax = mxGetPr(spA);
to get the internal pointers and treat them as row storage. Ap is row pointer, Ai is column indices of the non-zero entries, Ax are the non-zero values. Note that for symmetric matrices you do not have to do anything at all! CSC and CSR are the same.
Which format to use heavily depends on what you want to do with the matrix later. For example, have a look at matrix formats for Sparse matrix vector multiplication. That is one of the classic papers, research has moved since then so you can look around further.
I ended up converting CSC format from Matlab to CSR using CUSP library as follows.
After getting the matrix A from matlab and I got its row,col and values vectors and I copied them in respective thrust::host_vector created for each of them.
After that I created two cusp::array1d of type Indices and Values as follows.
typedef typename cusp::array1d<int,cusp::host_memory>Indices;
typedef typename cusp::array1d<float,cusp::host_memory>Values;
Indices row_indices(rows.begin(),rows.end());
Indices col_indices(cols.begin(),cols.end());
Values Vals(Val.begin(),Val.end());
where rows, cols and Val are thrust::host_vector that I got from Matlab.
After that I created a cusp::coo_matrix_view as given below.
typedef cusp::coo_matrix_view<Indices,Indices,Values>HostView;
HostView Ah(m,n,NNZa,row_indices,col_indices,Vals);
where m,n and NNZa are the parameters that I get from mex functions of sparse matrices.
I copied this view matrix to cusp::csr_matrixin device memory with proper dimensions set as given below.
cusp::csr_matrix<int,float,cusp::device_memory>CSR(m,n,NNZa);
CSR = Ah;
After that I just copied the three individual content arrays of this CSR matrix back to the host using thrust::raw_pointer_cast where arrays with proper dimension are already mxCalloced as given below.
cudaMemcpy(Acol,thrust::raw_pointer_cast(&CSR.column_indices[0]),sizeof(int)*(NNZa),cudaMemcpyDeviceToHost);
cudaMemcpy(Aptr,thrust::raw_pointer_cast(&CSR.row_offsets[0]),sizeof(int)*(n+1),cudaMemcpyDeviceToHost);
cudaMemcpy(Aval,thrust::raw_pointer_cast(&CSR.values[0]),sizeof(float)*(NNZa),cudaMemcpyDeviceToHost);
Hope this is useful to anyone who is using CUSP with Matlab
you can do something like this:
n = size(M,1);
nz_num = nnz(M);
[col,rowi,vals] = find(M');
row = zeros(n+1,1);
ll = 1; row(1) = 1;
for l = 2:n
if rowi(l)~=rowi(l-1)
ll = ll + 1;
row(ll) = l;
end
end
row(n+1) = nz_num+1;`
It works for me, hope it can help somebody else!

Forcing a specific size when using spconvert in Matlab

I am loading a sparse matrix in MATLAB using the command:
A = spconvert(load('mymatrix.txt'));
I know that the dimension of my matrix is 1222 x 1222, but the matrix is loaded as 1220 x 1221. I know that it is impossible for MATLAB to infer the real size of my matrix, when it is saved sparse.
A possible solution for making A the right size, is to include a line in mymatrix.txt with the contents "1222 1222 0". But I have hundreds of matrices, and I do not want to do this in all of them.
How can I make MATLAB change the size of the matrix to a 1222 x 1222?
I found the following solution to the problem, which is simple and short, but not as elegant as I hoped:
A = spconvert(load('mymatrix.txt'));
if size(A,1) ~= pSize || size(A,2) ~= pSize
A(pSize,pSize) = 0;
end
where pSize is the preferred size of the matrix. So I load the matrix, and if the dimensions are not as I wanted, I insert a 0-element in the lower right corner.
Sorry, this post is more a pair of clarifying questions than it is an answer.
First, is the issue with the 'load' command or with 'spconvert'? As in, if you do
B = load('mymatrix.txt')
is B the size you expect? If not, then you can use 'textread' or 'fread' to write a function that creates the matrix of the right size before inputting into 'spconvert'.
Second, you say that you are loading several matrices. Is the issue consistent among all the matrices you are loading. As in, does the matrix always end up being two rows less and one column less than you expect?
I had the same problem, and this is the solution I came across:
nRows = 1222;
nCols = 1222;
A = spconvert(load('mymatrix.txt'));
[i,j,s] = find(A);
A = sparse(i,j,s,nRows,nCols);
It's an adaptation of one of the examples here.

MATLAB: Using interpolation to replace missing values (NaN)

I have cell array each containing a sequence of values as a row vector. The sequences contain some missing values represented by NaN.
I would like to replace all NaNs using some sort of interpolation method, how can I can do this in MATLAB? I am also open to other suggestions on how to deal with these missing values.
Consider this sample data to illustrate the problem:
seq = {randn(1,10); randn(1,7); randn(1,8)};
for i=1:numel(seq)
%# simulate some missing values
ind = rand( size(seq{i}) ) < 0.2;
seq{i}(ind) = nan;
end
The resulting sequences:
seq{1}
ans =
-0.50782 -0.32058 NaN -3.0292 -0.45701 1.2424 NaN 0.93373 NaN -0.029006
seq{2}
ans =
0.18245 -1.5651 -0.084539 1.6039 0.098348 0.041374 -0.73417
seq{3}
ans =
NaN NaN 0.42639 -0.37281 -0.23645 2.0237 -2.2584 2.2294
Edit:
Based on the responses, I think there's been a confusion: obviously I'm not working with random data, the code shown above is simply an example of how the data is structured.
The actual data is some form of processed signals. The problem is that during the analysis, my solution would fail if the sequences contain missing values, hence the need for filtering/interpolation (I already considered using the mean of each sequence to fill the blanks, but I am hoping for something more powerful)
Well, if you're working with time-series data then you can use Matlab's built in interpolation function.
Something like this should work for your situation, but you'll need to tailor it a little ... ie. if you don't have equal spaced sampling you'll need to modify the times line.
nseq = cell(size(seq))
for i = 1:numel(seq)
times = 1:length(seq{i});
mask = ~isnan(seq{i});
nseq{i} = seq{i};
nseq{i}(~mask) = interp1(times(mask), seq{i}(mask), times(~mask));
end
You'll need to play around with the options of interp1 to figure out which ones work best for your situation.
I would use inpaint_nans, a tool designed to replace nan elements in 1-d or 2-d matrices by interpolation.
seq{1} = [-0.50782 -0.32058 NaN -3.0292 -0.45701 1.2424 NaN 0.93373 NaN -0.029006];
seq{2} = [0.18245 -1.5651 -0.084539 1.6039 0.098348 0.041374 -0.73417];
seq{3} = [NaN NaN 0.42639 -0.37281 -0.23645 2.0237];
for i = 1:3
seq{i} = inpaint_nans(seq{i});
end
seq{:}
ans =
-0.50782 -0.32058 -2.0724 -3.0292 -0.45701 1.2424 1.4528 0.93373 0.44482 -0.029006
ans =
0.18245 -1.5651 -0.084539 1.6039 0.098348 0.041374 -0.73417
ans =
2.0248 1.2256 0.42639 -0.37281 -0.23645 2.0237
If you have access to the System Identification Toolbox, you can use the MISDATA function to estimate missing values. According to the documentation:
This command linearly interpolates
missing values to estimate the first
model. Then, it uses this model to
estimate the missing data as
parameters by minimizing the output
prediction errors obtained from the
reconstructed data.
Basically the algorithm alternates between estimating missing data and estimating models, in a way similar to the Expectation Maximization (EM) algorithm.
The model estimated can be any of the linear models idmodel (AR/ARX/..), or if non given, uses a default-order state-space model.
Here's how to apply it to your data:
for i=1:numel(seq)
dat = misdata( iddata(seq{i}(:)) );
seq{i} = dat.OutputData;
end
Use griddedInterpolant
There also some other functions like interp1. For curved plots spline is the the best method to find missing data.
As JudoWill says, you need to assume some sort of relationship between your data.
One trivial option would be to compute the mean of your total series, and use those for missing data. Another trivial option would be to take the mean of the n previous and n next values.
But be very careful with this: if you're missing data, you're generally better to deal with those missing data, than to make up some fake data that could screw up your analysis.
Consider the following example
X=some Nx1 array
Y=F(X) with some NaNs in it
then use
X1=X(find(~isnan(Y)));
Y1=Y(find(~isnan(Y)));
Now interpolate over X1 and Y1 to compute all values at all X.