Applying scipy.sparse.linalg.svds throws a MemoryError? - scipy

I am trying to decompose a sparse matrix (40,000 × 1,400,000) with scipy.sparse.linalg.svds on my 64-bit machine with 140 GB of RAM, as follows:
k = 5000
tfidf_mtx = tfidf_m.tocsr()
u_45,s_45,vT_45 = scipy.sparse.linalg.svds(tfidf_mtx, k=k)
When k ranges from 1000 to 4500 it works, but with k = 5000 it throws a MemoryError. The precise error is given below:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-6-31a69ce54e2c> in <module>()
4 k = 4000
5 tfidf_mtx = tfidf_m.tocsr()
----> 6 get_ipython().magic(u'time u_50,s_50,vT_50 =linalg.svds(tfidf_mtx, k=k))
7 # print len(s),s
8
/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.pyc in magic(self, arg_s)
2163 magic_name, _, magic_arg_s = arg_s.partition(' ')
2164 magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2165 return self.run_line_magic(magic_name, magic_arg_s)
2166
2167 #-------------------------------------------------------------------------
/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.pyc in run_line_magic(self, magic_name, line)
2084 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
2085 with self.builtin_trap:
-> 2086 result = fn(*args,**kwargs)
2087 return result
2088
/usr/lib/python2.7/dist-packages/IPython/core/magics/execution.pyc in time(self, line, cell, local_ns)
/usr/lib/python2.7/dist-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
189 # but it's overkill for just that one bit of state.
190 def magic_deco(arg):
--> 191 call = lambda f, *a, **k: f(*a, **k)
192
193 if callable(arg):
/usr/lib/python2.7/dist-packages/IPython/core/magics/execution.pyc in time(self, line, cell, local_ns)
1043 else:
1044 st = clock2()
-> 1045 exec code in glob, local_ns
1046 end = clock2()
1047 out = None
<timed exec> in <module>()
/usr/local/lib/python2.7/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.pyc in svds(A, k, ncv, tol, which, v0, maxiter, return_singular_vectors)
1751 else:
1752 ularge = eigvec[:, above_cutoff]
-> 1753 vhlarge = _herm(X_matmat(ularge) / slarge)
1754
1755 u = _augmented_orthonormal_cols(ularge, nsmall)
/usr/local/lib/python2.7/dist-packages/scipy/sparse/base.pyc in dot(self, other)
244
245 """
--> 246 return self * other
247
248 def __eq__(self, other):
/usr/local/lib/python2.7/dist-packages/scipy/sparse/base.pyc in __mul__(self, other)
298 return self._mul_vector(other.ravel()).reshape(M, 1)
299 elif other.ndim == 2 and other.shape[0] == N:
--> 300 return self._mul_multivector(other)
301
302 if isscalarlike(other):
/usr/local/lib/python2.7/dist-packages/scipy/sparse/compressed.pyc in _mul_multivector(self, other)
463
464 result = np.zeros((M,n_vecs), dtype=upcast_char(self.dtype.char,
--> 465 other.dtype.char))
466
467 # csr_matvecs or csc_matvecs
MemoryError:
When k is 3000 and 4500, the ratio of the sum of the squared singular values to the sum of the squares of all matrix entries is 0.7033 and 0.8230, respectively. I have been searching the net for a long time, but with no luck. Please help, or give some ideas on how to achieve this.

So the returned u is an (M, k) array. On an ordinary older machine:
In [368]: np.ones((40000,1000))
....
In [369]: np.ones((40000,4000))
...
In [370]: np.ones((40000,5000))
...
--> 190 a = empty(shape, dtype, order)
191 multiarray.copyto(a, 1, casting='unsafe')
192 return a
MemoryError:
It may just be a coincidence that I hit the memory error at the same size as your code, but if you make the problem big enough you will hit memory errors at some point.
Your stack trace shows that the error occurs while multiplying a sparse matrix by a dense 2-D array (other), and the result will be dense as well.
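A rough back-of-the-envelope estimate (a sketch only, assuming float64 entries, i.e. 8 bytes each) of the dense arrays involved:

```python
import numpy as np

# Shapes from the question: A is 40,000 x 1,400,000 (sparse), k = 5000;
# float64 entries (8 bytes each) are assumed.
M, N, k = 40000, 1400000, 5000
itemsize = np.dtype(np.float64).itemsize

u_gb = M * k * itemsize / 1e9   # dense (M, k) array of left singular vectors: ~1.6 GB
vt_gb = N * k * itemsize / 1e9  # dense (N, k) result of the sparse-times-dense product: ~56 GB
print(u_gb, vt_gb)
```

Even before ARPACK's own workspace and any temporary copies are counted, a single dense intermediate of that size eats a large chunk of the 140 GB, so running out of memory around k = 5000 is plausible.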

Related

Find the Median of a list<int> in Dart

I have a list of integers which contains times in milliseconds (e.g. 1433, 834, 1020, ...). I need to calculate the median. I developed the following code, but the median I get is completely wrong compared to the one I calculate in Excel. Any ideas? Is there any Dart/Flutter library I could use for statistics?
/// Calculate median
static int calculateMedian(TimeRecordNotifier timeRecordNotifier) {
  List<int> mList = List();
  timeRecordNotifier.timeRecords.forEach((element) {
    mList.add(element.partialTime);
  });
  //clone list
  List<int> clonedList = List();
  clonedList.addAll(mList);
  int median;
  //sort list
  clonedList.sort((a, b) => a.compareTo(b));
  if (clonedList.length == 1)
    median = mList[clonedList.length - 1];
  else if (clonedList.length % 2 == 1)
    median = mList[(((clonedList.length) / 2) - 1).round()];
  else {
    int lower = mList[((clonedList.length ~/ 2) - 1)];
    int upper = mList[(clonedList.length ~/ 2)];
    median = ((lower + upper) / 2.0).round();
  }
  return median;
}
On the following dataset the expected median value is 901.5, but this algorithm gives me 461:
131, 144, 203, 206, 241, 401, 415, 427, 439, 439,
452, 455, 456, 469, 471, 471, 483, 483, 491, 495,
495, 502, 505, 512, 521, 522, 523, 547, 551, 561,
610, 727, 745, 777, 790, 793, 892, 911, 924, 943,
957, 977, 978, 989, 992, 1008, 1024, 1039, 1070, 1074,
1092, 1115, 1139, 1155, 1159, 1174, 1176, 1194, 1203, 1208,
1227, 1228, 1248, 1270, 1271, 1272, 1273, 1276, 1284, 1290,
1294, 1439, 1740, 1786
I refactored the code into this, following the NumDart implementation, and now it works. Thanks #MartinM for your comment!
/// Calculate median
static int calculateMedian(TimeRecordNotifier timeRecordNotifier) {
  List<int> mList = List();
  timeRecordNotifier.timeRecords.forEach((element) {
    mList.add(element.partialTime);
  });
  //clone list
  List<int> clonedList = List();
  clonedList.addAll(mList);
  //sort list
  clonedList.sort((a, b) => a.compareTo(b));
  int median;
  int middle = clonedList.length ~/ 2;
  if (clonedList.length % 2 == 1) {
    median = clonedList[middle];
  } else {
    median = ((clonedList[middle - 1] + clonedList[middle]) / 2.0).round();
  }
  return median;
}
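For reference, the same even/odd median logic in Python (a minimal sketch, not Dart; it mirrors the key fix of indexing into the sorted copy rather than the original unsorted list):

```python
def calculate_median(values):
    """Median of a list of numbers: sort a copy, then take the middle
    element (odd length) or the average of the two middle elements (even length)."""
    ordered = sorted(values)          # work on a sorted copy, not the original order
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2.0

# The two middle values of the question's 74-element dataset are 892 and 911:
print(calculate_median([892, 911]))   # 901.5
```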

Constraints in scipy.optimize throwing x0 error

I am looking to take a list of stocks and adjust their weights in a portfolio until the overall portfolio beta is 1.0. The output of "stonkBetas" is static and is:
[3.19292010501853,
0.7472001935364129,
1.0889157697158605,
0.8944059912707691,
0.04192080860817828,
1.0011520737327186,
0.9155119223385676]
I then create two functions: one to define how the betas are weighted, and a second, con, which provides the constraint that the weights of the minimized portfolio sum to 1.0.
def betaOpp(weights):
    a, b, c, d, e, f, g = weights
    f = a*stonkBetas[0] + b*stonkBetas[1] + c*stonkBetas[2] + d*stonkBetas[3] + e*stonkBetas[4] + f*stonkBetas[5] + g*stonkBetas[6]
    return f

initial_guess = [.1, .1, .1, .1, .2, .2, .2]
print('hi')
print(sum(initial_guess))
print('bye')

def con(t):
    print('this should be zero:')
    print(sum(t) - 1)
    return sum(t) - 1.0

cons = {'type': 'eq', 'fun': con}
bnds = ((.02, .8), (.02, .8), (.02, .8), (.02, .8), (.02, .8), (.02, .8), (.02, .8))
res = optimize.minimize(betaOpp, initial_guess, bounds=bnds, constraints=cons)
print(res)
This gives me this output:
hi
1.0
bye
this should be zero:
0.0
this should be zero:
0.0
this should be zero:
0.0
this should be zero:
1.4901161193847656e-08
this should be zero:
1.4901161193847656e-08
this should be zero:
1.4901161193847656e-08
this should be zero:
1.4901161193847656e-08
this should be zero:
1.4901161193847656e-08
this should be zero:
1.4901161193847656e-08
this should be zero:
1.4901161193847656e-08
this should be zero:
6.661338147750939e-16
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-30-6567109e94a4> in <module>
16 cons = {'type':'eq', 'fun': con}
17 bnds = ((.02,.8),(.02,.8),(.02,.8),(.02,.8),(.02,.8),(.02,.8),(.02,.8))
---> 18 res = optimize.minimize(betaOpp,x0=initial_guess, bounds=bnds, constraints=cons)
19 print(res)
/opt/miniconda3/lib/python3.6/site-packages/scipy/optimize/_minimize.py in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
624 elif meth == 'slsqp':
625 return _minimize_slsqp(fun, x0, args, jac, bounds,
--> 626 constraints, callback=callback, **options)
627 elif meth == 'trust-constr':
628 return _minimize_trustregion_constr(fun, x0, args, jac, hess, hessp,
/opt/miniconda3/lib/python3.6/site-packages/scipy/optimize/slsqp.py in _minimize_slsqp(func, x0, args, jac, bounds, constraints, maxiter, ftol, iprint, disp, eps, callback, finite_diff_rel_step, **unknown_options)
424
425 if mode == -1: # gradient evaluation required
--> 426 g = append(sf.grad(x), 0.0)
427 a = _eval_con_normals(x, cons, la, n, m, meq, mieq)
428
/opt/miniconda3/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py in grad(self, x)
186 if not np.array_equal(x, self.x):
187 self._update_x_impl(x)
--> 188 self._update_grad()
189 return self.g
190
/opt/miniconda3/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py in _update_grad(self)
169 def _update_grad(self):
170 if not self.g_updated:
--> 171 self._update_grad_impl()
172 self.g_updated = True
173
/opt/miniconda3/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py in update_grad()
90 self.ngev += 1
91 self.g = approx_derivative(fun_wrapped, self.x, f0=self.f,
---> 92 **finite_diff_options)
93
94 self._update_grad_impl = update_grad
/opt/miniconda3/lib/python3.6/site-packages/scipy/optimize/_numdiff.py in approx_derivative(fun, x0, method, rel_step, abs_step, f0, bounds, sparsity, as_linear_operator, args, kwargs)
389
390 if np.any((x0 < lb) | (x0 > ub)):
--> 391 raise ValueError("`x0` violates bound constraints.")
392
393 if as_linear_operator:
ValueError: `x0` violates bound constraints.
And I just don't understand where I'm going wrong. The x0 sums to exactly 1.0 - I can see it! Hopefully I'm just doing something stupid here. Please help!
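For what it's worth, a quick NumPy check (a minimal sketch reusing the bounds and initial guess above) confirms that the starting point itself is feasible, so the x0 named in the error appears to refer to whatever point approx_derivative is asked to differentiate at, not necessarily the original initial_guess:

```python
import numpy as np

initial_guess = np.array([.1, .1, .1, .1, .2, .2, .2])
bnds = np.array([(.02, .8)] * 7)   # the same (lower, upper) pair for every weight
lb, ub = bnds[:, 0], bnds[:, 1]

# Both checks pass for the values in the question
print(np.all((initial_guess >= lb) & (initial_guess <= ub)))  # True: inside the bounds
print(initial_guess.sum())                                    # 1.0: equality constraint holds at x0
```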

networkx maximum_flow crashes on some pairs of nodes

I have a graph composed of 742 edges and 360 nodes.
I want to compute the max flow between some pairs of nodes, and for some of them nx.maximum_flow ends with the error pasted below, despite the fact that a path exists between the two nodes concerned.
Any idea what causes that?
Thanks.
ValueError Traceback (most recent call last)
<ipython-input-186-6dae3501e3d0> in <module>()
1 #print(nx.shortest_path(G,source="Sink_0",target="node_32"))
----> 2 nx.maximum_flow(G, "Sink_0", "Aircraft2_32")
/Library/Python/2.7/site-packages/networkx/algorithms/flow/maxflow.pyc in maximum_flow(G, s, t, capacity, flow_func, **kwargs)
156 raise nx.NetworkXError("flow_func has to be callable.")
157
--> 158 R = flow_func(G, s, t, capacity=capacity, value_only=False, **kwargs)
159 flow_dict = build_flow_dict(G, R)
160
/Library/Python/2.7/site-packages/networkx/algorithms/flow/preflowpush.pyc in preflow_push(G, s, t, capacity, residual, global_relabel_freq, value_only)
420 """
421 R = preflow_push_impl(G, s, t, capacity, residual, global_relabel_freq,
--> 422 value_only)
423 R.graph['algorithm'] = 'preflow_push'
424 return R
/Library/Python/2.7/site-packages/networkx/algorithms/flow/preflowpush.pyc in preflow_push_impl(G, s, t, capacity, residual, global_relabel_freq, value_only)
279 break
280 u = next(iter(level.active))
--> 281 height = discharge(u, False)
282 if grt.is_reached():
283 # Global relabeling heuristic.
/Library/Python/2.7/site-packages/networkx/algorithms/flow/preflowpush.pyc in discharge(u, is_phase1)
156 # We have run off the end of the adjacency list, and there can
157 # be no more admissible edges. Relabel the node to create one.
--> 158 height = relabel(u)
159 if is_phase1 and height >= n - 1:
160 # Although the node is still active, with a height at least
/Library/Python/2.7/site-packages/networkx/algorithms/flow/preflowpush.pyc in relabel(u)
125 """
126 grt.add_work(len(R_succ[u]))
--> 127 return min(R_node[v]['height'] for v, attr in R_succ[u].items()
128 if attr['flow'] < attr['capacity']) + 1
129
ValueError: min() arg is an empty sequence
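Not an answer, but one thing worth checking (a diagnostic sketch; in the networkx flow algorithms an edge without a 'capacity' attribute is treated as having infinite capacity) is whether the relevant edges all carry finite, non-negative capacities:

```python
import math
import networkx as nx

def suspicious_edges(G, capacity="capacity"):
    """Return edges whose capacity attribute is missing (treated as infinite
    by the flow algorithms), non-finite, or negative."""
    bad = []
    for u, v, data in G.edges(data=True):
        cap = data.get(capacity)
        if cap is None or not math.isfinite(cap) or cap < 0:
            bad.append((u, v, cap))
    return bad

# Tiny demonstration graph (hypothetical, not the 360-node graph from the question):
G = nx.DiGraph()
G.add_edge("s", "a", capacity=3)
G.add_edge("a", "t")               # no capacity attribute -> infinite capacity
print(suspicious_edges(G))         # [('a', 't', None)]
```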

Bicubic interpolation beyond grid values in Matlab

Is it possible to achieve bi-cubic interpolation beyond grid values?
For example:
L = [5,10,20,25,40];
W= 1:3;
S= [50 99 787
779 795 850
803 779 388
886 753 486
849 780 598];
size1 = griddata(W,L,S,2,40,'cubic')
sizeBeyond = griddata(W,L,S,2,41,'cubic')
sizeV4 = griddata(W,L,S,2,41,'v4')
returns:
size1 = 780
sizeBeyond = NaN
sizeV4 = 721.57
What I was suggesting is that you can feed in values which have been extrapolated; check the code below. But note that, as flawr pointed out, the extrapolation can behave really badly.
l = [5,10,20,25,40];
w = 1:3;
li = [l 41] ;
S = [50 99 787
779 795 850
803 779 388
886 753 486
849 780 598];
[W,L] = meshgrid(w,l) ;
[Wi,Li] = meshgrid(w,li) ;
Si = interp2(W,L,S,Wi,Li,'spline') ;
size1 = griddata(W,L,S,2,40,'cubic')
sizeBeyond = griddata(Wi,Li,Si,2,41,'cubic')
sizeV4 = griddata(W,L,S,2,41,'v4')
Note: don't use built-in function names like length, size, etc. as variable names in your code, even for demonstration; it causes trouble.
Though this is not a full answer, I have to post it here for discussion.
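For anyone doing this from Python, a roughly equivalent SciPy sketch follows the same two-step idea (extrapolate the grid first, then interpolate on the extended grid); the use of CubicSpline for the per-column extrapolation is my own choice, and the same warning about extrapolation behaving badly applies:

```python
import numpy as np
from scipy.interpolate import CubicSpline, griddata

# Grid and data from the question: rows follow L, columns follow W.
L = np.array([5, 10, 20, 25, 40], dtype=float)
W = np.array([1, 2, 3], dtype=float)
S = np.array([[ 50,  99, 787],
              [779, 795, 850],
              [803, 779, 388],
              [886, 753, 486],
              [849, 780, 598]], dtype=float)

# Step 1: extend the L axis to 41 by extrapolating each column with a cubic spline
# (CubicSpline extrapolates beyond the data range by default).
extra_row = [float(CubicSpline(L, S[:, j])(41.0)) for j in range(S.shape[1])]
Li = np.append(L, 41.0)
Si = np.vstack([S, extra_row])

# Step 2: cubic interpolation on the extended grid, analogous to griddata(...,'cubic')
Wg, Lg = np.meshgrid(W, Li)
points = np.column_stack([Wg.ravel(), Lg.ravel()])
size_beyond = griddata(points, Si.ravel(), (2.0, 41.0), method="cubic")
print(size_beyond)  # a finite value rather than NaN; its quality is not guaranteed
```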

How to arrange pixel address as order under certain condition?

I am currently doing map processing in MATLAB. I have solved the maze and obtained the path through it, and I now have the turning points of the map. However, these pixel addresses are not in the correct order, so I want to rearrange them into the correct order.
INCORRECT ORDER:
shape(1).cen=[28;136];
shape(2).cen=[122;136];
shape(3).cen=[344;391];
shape(4).cen=[548;493];
shape(5).cen=[548;191];
shape(6).cen=[344;191];
shape(7).cen=[122;391];
CORRECT ORDER:
map(1).cen=[28;136];
map(2).cen=[122;136];
map(3).cen=[122;391];
map(4).cen=[344;391];
map(5).cen=[344;191];
map(6).cen=[548;191];
map(7).cen=[548;493];
My code is below:
map(1).cen=[28;136];
o=0; order=1; xflag=0; yflag=0;
k=length(shape); % total number of elements in the shape.cen structure
for j=1:k
    order=order+1; o=o+1;
    if (j==1)
        x=map(1).cen(1,1);
        y=map(1).cen(2,1);
        for i=1:k
            xi=shape(i).cen(1,1);
            yi=shape(i).cen(2,1);
            if ((x==xi)||(y==yi))
                if (x==xi)
                    map(order).cen(1,1)=xi;
                    map(order).cen(2,1)=yi;
                    xflag=1;
                    break;
                else
                    (y==yi)
                    map(order).cen(1,1)=xi;
                    map(order).cen(2,1)=yi;
                    yflag=1;
                    break;
                end
            end
        end
    end
    x=map(o).cen(1,1);
    y=map(o).cen(2,1);
    for i=1:k
        xi=shape(i).cen(1,1);
        yi=shape(i).cen(2,1);
        if (xflag==1)
            if (y==yi)
                map(order).cen(1,1)=xi;
                map(order).cen(2,1)=yi;
                xflag=0;
                yflag=1;
                break;
            end
        end
        if (yflag==1)
            if (x==xi)
                map(order).cen(1,1)=xi;
                map(order).cen(2,1)=yi;
                xflag=1;
                yflag=0;
                break;
            end
        end
    end
end
[shape.cen]' will give you the following array:
ans =
28 136
122 136
344 391
548 493
548 191
344 191
122 391
Now that it's a regular numerical array, you can use sortrows, like this.
map = sortrows([shape.cen]')
to get:
map =
28 136
122 136
122 391
344 191
344 391
548 191
548 493
If you don't want it as a numerical array, but a struct similar to shape, you can do:
[~, ID] = sortrows([shape.cen]')
map = shape(ID)'
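For reference, the equivalent of that sortrows call in Python/NumPy (a small sketch using np.lexsort on the same centre coordinates):

```python
import numpy as np

# Centres from the question, one row per point (the transpose of [shape.cen])
cen = np.array([[ 28, 136],
                [122, 136],
                [344, 391],
                [548, 493],
                [548, 191],
                [344, 191],
                [122, 391]])

# sortrows-style ordering: primary key is the first column, then the second
order = np.lexsort((cen[:, 1], cen[:, 0]))
print(cen[order])
```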