I know there are many similar questions like the title ask here.
However, I still believe no of these discussions can solve my problem.
I am testing the following software "SPOTless" developed by MIT:
https://github.com/spot-toolbox/spotless
This is a software which transform the polynomial optimization problem to LP or SOCP. And then call the other software "mosek" to solve it.
This example is from the tutorial in:
https://github.com/spot-toolbox/spotless/tree/master/doc (please see the last example)
n = 2 ;
d = 4 ;
x = msspoly('x',n ) ;
basis = monomials(x , 0 : d ) ;
p = randn(length(basis))'*basis ;
g = 1 - x'*x ;
prog = spotsosprog ;
prog = prog.withIndeterminate(x) ;
[prog,r] = prog.newFree(1) ;
[prog,f] = prog.newFreePoly(x,monomials(x,0:d-2)) ;
prog = prog.withSOS(r-p-f*g) ;
sol = prog.minimize(r) ;
However, when run it, Matlab shows the following error:
The line 401 and the related part is the following:
function [pr,poly,coeff] = newFreePoly(pr,basis,n)
if nargin < 3, n = 1; end
if ~isempty(basis) && n > 0 % line 401
if ~pr.isPolyInIndet(basis)
error('Basis must be polynomial in the indeterminates.');
end
[pr,coeff] = pr.newFree(length(basis)*n);
poly = reshape(coeff,n,length(basis))*basis;
else
poly = [];
end
end
$n$ is an integer, so $n > 0$ is valid obviously.
I ask the related professor (not the author of this software); he told me he has not seen this problem before. The problem comes from one .m file in the software.
I am not sure if someone can help me test this software and this example and let me know the result or how to solve this error.
Note: if you really test this software (easy to install), please also install "mosek", which is a famous optimization software.
Note: if you do not want to install "mosek", please cross out the last two lines in the code.
See the code and error. I have already tried Do, For,...and it is not working.
CODE + Error from Mathematica:
Import of survival probabilities _{k}p_x and _{k}p_y (calculated in excel)
px = Import["C:\Users\Eva\Desktop\kpx.xlsx"];
px = Flatten[Take[px, All], 1];
NOTE: The probability _{k}p_x can be found on the position px[[k+2, x -16]
i = 0.04;
v = 1/(1 + i);
JointLifeIndep[x_, y_, n_] = Sum[v^k*px[[k + 2, x - 16]]*py[[k + 2, y - 16]], {k , 0, n - 1}]
Part::pkspec1: The expression 2+k cannot be used as a part specification.
Part::pkspec1: The expression 2+k cannot be used as a part specification.
Part::pkspec1: The expression 2+k cannot be used as a part specification.
General::stop: Further output of Part::pkspec1 will be suppressed during this calculation.
Part of dataset (left corner of the dataset):
k\x 18 19 20
0 1 1 1
1 0.999478086278185 0.999363078716059 0.99927911905056
2 0.998841497412202 0.998642656911039 0.99858030519133
3 0.998121451605207 0.99794428814123 0.99788275311401
4 0.997423447323642 0.997247180349674 0.997174407432264
5 0.996726703362208 0.996539285828369 0.996437857252448
6 0.996019178300768 0.995803204773039 0.99563600297737
7 0.995283481416241 0.995001861216016 0.994823584922968
8 0.994482556091416 0.994189960607964 0.99405569519175
9 0.993671079225432 0.99342255996206 0.993339856748282
10 0.992904079096455 0.992707177451333 0.992611817294026
11 0.992189069953677 0.9919796017009 0.991832027835091
Without having the exact same data files to work with it is often easy for each of us to make mistakes that the other cannot reproduce or understand.
From your snapshot of your data set I used Export in Mathematica to try to reproduce your .xlsx file. Then I tried the following
px = Import["kpx.xlsx"];
px = Flatten[Take[px, All], 1];
py = px; (* fake some py data *)
i = 0.04;
v = 1/(1 + i);
JointLifeIndep[x_, y_, n_] := Sum[v^k*px[[k+2,x-16]]*py[[k+2,y-16]], {k,0,n-1}];
JointLifeIndep[17, 17, 12]
and it displays 362.402
Notice I used := instead of = in my definition of JointLifeIndep. := and = do different things in Mathematica. = will immediately evaluate the right hand side of that definition. This is possibly the reason that you are getting the error that you do.
You should also be careful with your subscript values and make sure that every subscript is between 1 and the number of rows (or columns) in your matrix.
So see if you can try this example with an Excel sheet containing only the snapshot of data that you showed and see if you get the same result that I do.
Hopefully that will be enough for you to make progress.
This answer points to the difference between an Epoch and an iteration while training a neural network. However, when I look at the source code for the solver API in the Stanford CS231n course (and I'm assuming this is the case for most libraries out there as well), during each iteration, batch_size number of examples are randomly selected with replacement. Thus, there is no guarantee that all examples would been seen during each epoch is there?
Does an epoch then mean that all examples would be seen in expectation? Or am I understanding this wrong?
Relevant Source Code:
def _step(self):
"""
Make a single gradient update. This is called by train() and should not
be called manually.
"""
# Make a minibatch of training data
num_train = self.X_train.shape[0]
batch_mask = np.random.choice(num_train, self.batch_size)
X_batch = self.X_train[batch_mask]
y_batch = self.y_train[batch_mask]
# Compute loss and gradient
loss, grads = self.model.loss(X_batch, y_batch)
self.loss_history.append(loss)
# Perform a parameter update
for p, w in self.model.params.iteritems():
dw = grads[p]
config = self.optim_configs[p]
next_w, next_config = self.update_rule(w, dw, config)
self.model.params[p] = next_w
self.optim_configs[p] = next_config
def train(self):
"""
Run optimization to train the model.
"""
num_train = self.X_train.shape[0]
iterations_per_epoch = max(num_train / self.batch_size, 1)
num_iterations = self.num_epochs * iterations_per_epoch
for t in xrange(num_iterations):
self._step()
# Maybe print training loss
if self.verbose and t % self.print_every == 0:
print '(Iteration %d / %d) loss: %f' % (
t + 1, num_iterations, self.loss_history[-1])
# At the end of every epoch, increment the epoch counter and decay the
# learning rate.
epoch_end = (t + 1) % iterations_per_epoch == 0
if epoch_end:
self.epoch += 1
for k in self.optim_configs:
self.optim_configs[k]['learning_rate'] *= self.lr_decay
# Check train and val accuracy on the first iteration, the last
# iteration, and at the end of each epoch.
first_it = (t == 0)
last_it = (t == num_iterations + 1)
if first_it or last_it or epoch_end:
train_acc = self.check_accuracy(self.X_train, self.y_train,
num_samples=1000)
val_acc = self.check_accuracy(self.X_val, self.y_val)
self.train_acc_history.append(train_acc)
self.val_acc_history.append(val_acc)
if self.verbose:
print '(Epoch %d / %d) train acc: %f; val_acc: %f' % (
self.epoch, self.num_epochs, train_acc, val_acc)
# Keep track of the best model
if val_acc > self.best_val_acc:
self.best_val_acc = val_acc
self.best_params = {}
for k, v in self.model.params.iteritems():
self.best_params[k] = v.copy()
# At the end of training swap the best params into the model
self.model.params = self.best_params
Thanks.
I believe, as you say, that in the Stanford course they are effectively using "epoch" with the less strict meaning of "expected number of times each example is seen during training". However, in my experience, most implementations consider an epoch as running through every example in the training set once, and I'd say they only chose the sampling with replacement for simplicity. If you have a good amount of data, chances are that you will not see a difference, but still, it is more correct to sample without replacement until there are no more examples.
You can check, for example, how Keras does the training in its source code; it's a bit complicated, but the important point is that make_batches is called to split the (possibly shuffled) examples into batches, which matches your initial idea of "epoch".
I am parsing a large text file full of data and then saving it to disk as a *.mat file so that I can easily load in only parts of it (see here for more information on reading in the files, and here for the data). To do so, I read in one line at a time, parse the line, and then append it to the file. The problem is that the file itself is >3 orders of magnitude larger than the data contained therein!
Here is a stripped down version of my code:
database = which('01_hit12.par');
[directory,filename,~] = fileparts(database);
matObj = matfile(fullfile(directory,[filename '.mat']),'Writable',true);
fidr = fopen(database);
hitranTemp = fgetl(fidr);
k = 1;
while ischar(hitranTemp)
if abs(hitranTemp(1)) == 32;
hitranTemp(1) = '0';
end
hitran = textscan(hitranTemp,'%2u%1u%12f%10f%10f%5f%5f%10f%4f%8f%15c%15c%15c%15c%6u%2u%2u%2u%2u%2u%2u%1c%7f%7f','delimiter','','whitespace','');
matObj.moleculeNumber(1,k) = uint8(hitran{1});
matObj.isotopeologueNumber(1,k) = uint8(hitran{2});
matObj.vacuumWavenumber(1,k) = hitran{3};
matObj.lineIntensity(1,k) = hitran{4};
matObj.airWidth(1,k) = single(hitran{6});
matObj.selfWidth(1,k) = single(hitran{7});
matObj.lowStateE(1,k) = single(hitran{8});
matObj.tempDependWidth(1,k) = single(hitran{9});
matObj.pressureShift(1,k) = single(hitran{10});
if rem(k,1e4) == 0;
display(sprintf('line %u (%2.2f)',k,100*k/K));
end
hitranTemp = fgetl(fidr);
k = k + 1;
end
fclose(fidr);
I stopped the code after 13,813 of the 224,515 lines had been parsed because it had been taking a very long time and the file size was getting huge, but the last printout indicated that I had only just cleared 10k lines. I cleared the memory, and then ran:
S = whos('-file','01_hit12.mat');
fileBytes = sum([S.bytes]);
T = dir(which('01_hit12.mat'));
diskBytes = T.bytes;
disp([fileBytes diskBytes diskBytes/fileBytes])
and get the output:
524894 896189009 1707.37141022759
What is taking up the extra 895,664,115 bytes? I know the help page says there should be a little extra overhead, but I feel that nearly a Gb of descriptive header is a bit excessive!
New information:
I tried pre-allocating the file, thinking that perhaps MATLAB was doing the same thing it does when a matrix is embiggened in a loop and reallocating a chunk of disk space for the entire matrix on each write, and that isn't it. Filling the file with zeros of the appropriate data types results in a file that my short check script returns:
8531570 71467 0.00837677004349727
This makes more sense to me. Matlab is saving the file sparsely, so the disk file size is much smaller than the size of the full matrix in memory. Once it starts replacing values with real data, however, I get the same behavior as before and the file size starts skyrocketing beyond all reasonable bounds.
New new information:
Tried this on a subset of the data, 100 lines long. To stream to disk, the data has to be in v7.3 format, so I ran the subset through my script, loaded it into memory, and then resaved as v7.0 format. Here are the results:
v7.3: 3800 8752 2.30
v7.0: 3800 2561 0.67
No wonder the v7.3 format isn't the default. Does anyone know a way around this? Is this a bug or a feature?
This seems like a bug to me. A workaround is to write in chunks to pre-allocated arrays.
Start off by pre-allocating:
fid = fopen('01_hit12.par', 'r');
data = fread(fid, inf, 'uint8');
nlines = nnz(data == 10) + 1;
fclose(fid);
matObj.moleculeNumber = zeros(1,nlines,'uint8');
matObj.isotopeologueNumber = zeros(1,nlines,'uint8');
matObj.vacuumWavenumber = zeros(1,nlines,'double');
matObj.lineIntensity = zeros(1,nlines,'double');
matObj.airWidth = zeros(1,nlines,'single');
matObj.selfWidth = zeros(1,nlines,'single');
matObj.lowStateE = zeros(1,nlines,'single');
matObj.tempDependWidth = zeros(1,nlines,'single');
matObj.pressureShift = zeros(1,nlines,'single');
Then to write in chunks of 10000, I modified your code as follows:
... % your code plus pre-alloc first
bs = 10000;
while ischar(hitranTemp)
if abs(hitranTemp(1)) == 32;
hitranTemp(1) = '0';
end
for ii = 1:bs,
hitran{ii} = textscan(hitranTemp,'%2u%1u%12f%10f%10f%5f%5f%10f%4f%8f%15c%15c%15c%15c%6u%2u%2u%2u%2u%2u%2 u%1c%7f%7f','delimiter','','whitespace','');
hitranTemp = fgetl(fidr);
if hitranTemp==-1, bs=ii; break; end
end
% this part really ugly, sorry! trying to keep it compact...
matObj.moleculeNumber(1,k:k+bs-1) = uint8(builtin('_paren',cellfun(#(c)c{1},hitran),1:bs));
matObj.isotopeologueNumber(1,k:k+bs-1) = uint8(builtin('_paren',cellfun(#(c)c{2},hitran),1:bs));
matObj.vacuumWavenumber(1,k:k+bs-1) = builtin('_paren',cellfun(#(c)c{3},hitran),1:bs);
matObj.lineIntensity(1,k:k+bs-1) = builtin('_paren',cellfun(#(c)c{4},hitran),1:bs);
matObj.airWidth(1,k:k+bs-1) = single(builtin('_paren',cellfun(#(c)c{5},hitran),1:bs));
matObj.selfWidth(1,k:k+bs-1) = single(builtin('_paren',cellfun(#(c)c{6},hitran),1:bs));
matObj.lowStateE(1,k:k+bs-1) = single(builtin('_paren',cellfun(#(c)c{7},hitran),1:bs));
matObj.tempDependWidth(1,k:k+bs-1) = single(builtin('_paren',cellfun(#(c)c{8},hitran),1:bs));
matObj.pressureShift(1,k:k+bs-1) = single(builtin('_paren',cellfun(#(c)c{9},hitran),1:bs));
k = k + bs;
fprintf('.');
end
fclose(fidr);
The final size on disk is 21,393,408 bytes. The usage breaks down as,
>> S = whos('-file','01_hit12.mat');
>> fileBytes = sum([S.bytes]);
>> T = dir(which('01_hit12.mat'));
>> diskBytes = T.bytes; ratio = diskBytes/fileBytes;
>> fprintf('%10d whos\n%10d disk\n%10.6f\n',fileBytes,diskBytes,ratio)
8531608 whos
21389582 disk
2.507099
Still fairly inefficient, but not out of control.
I'm new to Matlab and now learning the basic grammar.
I've written the file GetBin.m:
function res = GetBin(num_bin, bin, val)
if val >= bin(num_bin - 1)
res = num_bin;
else
for i = (num_bin - 1) : 1
if val < bin(i)
res = i;
end
end
end
and I call it with:
num_bin = 5;
bin = [48.4,96.8,145.2,193.6]; % bin stands for the intermediate borders, so there are 5 bins
fea_val = GetBin(num_bin,bin,fea(1,1)) % fea is a pre-defined 280x4096 matrix
It returns error:
Error in GetBin (line 2)
if val >= bin(num_bin - 1)
Output argument "res" (and maybe others) not assigned during call to
"/Users/mac/Documents/MATLAB/GetBin.m>GetBin".
Could anybody tell me what's wrong here? Thanks.
You need to ensure that every possible path through your code assigns a value to res.
In your case, it looks like that's not the case, because you have a loop:
for i = (num_bins-1) : 1
...
end
That loop will never iterate (so it will never assign a value to res). You need to explicitly specify that it's a decrementing loop:
for i = (num_bins-1) : -1 : 1
...
end
For more info, see the documentation on the colon operator.
for i = (num_bin - 1) : -1 : 1