Matlab: Building a molecular library through permutation [duplicate] - matlab

This question already has answers here:
Generate a matrix containing all combinations of elements taken from n vectors
(4 answers)
Closed 8 years ago.
I am trying to make an index of all possible molecules with 0-46 hydrogen, 0-20 carbon, 0-13 oxygen, etc. I have 7 atoms in which I am interested: H, C, O, N, Cl, F, and S. I have written the following for loop to show what I am trying to achieve:
MassListIndex = []
%MassIndex = [h,c,o,n,cl,f,s]
for h = 0:46;
for c = 0:20;
for o = 0:13;
for n = 0:15;
for cl=0:5;
for f=0:5;
for s=0:5;
MassListIndex = [MassListIndex;[h,c,o,n,cl,f,s]];
end;
end;
end;
end;
end;
end;
end;
This strikes me as terribly inefficient; I don't want to wait around for 2 months for this to run. I have tried using the combinator.m script, but the problem is that there is only one input for the length of the set that is 'permutated', i.e. if I want to have up to 46 hydrogens, I also need to have 46 of each of the other 6 atoms. This is computationally...heavy (46^7 ~= 436 billion).
Is there any way to make this sort of computation more efficient? Or do I need to think more about shrinking my list by ridding it of 'nonsense permutations'? (As far as I know, the molecule H40C2 has never been observed!)
Thanks

The first problem is not that hard, at least not if you remember to preallocate!
I changed your code into this:
mxidx = 47*21*14*16*6*6*6;
MassListIndex = zeros(mxidx,7);
idx = 1;
for h = 0:46;
for c = 0:20;
for o = 0:13;
for n = 0:15;
for cl=0:5;
for f=0:5;
for s=0:5;
MassListIndex(idx,:) = [h,c,o,n,cl,f,s];
idx = idx + 1;
end;
end;
end;
end;
end;
end;
end;
And it ran in less than a minute on my computer.
Usually MATLAB will warn you if you forget to preallocate, and whenever you know the size of your matrix in advance (as in this case), you should preallocate!
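The same table can also be built without explicit loops, along the lines of the linked duplicate. The sketch below is only an illustration: the three short ranges in `ranges` are made-up stand-ins for the seven real ones, and the rows come out in the same order as the nested loops (last column varying fastest).

```matlab
% Hypothetical, shortened ranges; the real problem would use
% {0:46, 0:20, 0:13, 0:15, 0:5, 0:5, 0:5}.
ranges = {0:2, 0:1, 0:3};
n = numel(ranges);

% Feed the ranges to ndgrid in reverse so that the LAST range varies fastest,
% matching the order produced by the nested for loops.
grids = cell(1, n);
[grids{n:-1:1}] = ndgrid(ranges{n:-1:1});

% Stack the n grids along a new dimension and flatten to one row per combination.
combos = reshape(cat(n + 1, grids{:}), [], n);
```

For the full ranges this gives 47*21*14*16*6*6*6 = 47755008 rows. Stored as double that is roughly 2.7 GB, so passing uint8 ranges (e.g. uint8(0:46)) may be worth considering, since every count fits in one byte.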
The other problem, on the other hand, involves 47^7 = 506623120463 rows (more than 500 billion; it is 47^7 rather than 46^7 because the list 0:46 has 47 elements). So even if you use only one byte per row in your matrix (which you certainly don't), it will still take up more than half a terabyte! And the calculation time will likewise be humongous!
But really, when would you ever need this list? The way you have constructed your list, you can easily calculate an entry just from its index, e.g.:
function m = MassListIndex(a,b)
a = a - 1;
lst = zeros(1,7);
for i = 1:7
lst(8-i) = mod(a,47);
a = floor(a /47);
end
if nargin < 2
m = lst;
else
m = lst(b);
end
end
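Note that the function above uses 47 as the base for every atom, so it indexes the 47^7 list. For the list built by the nested loops, where each atom has its own range, the same idea works with one base per column. A minimal sketch, assuming the loop ranges from the question (the names `radix`, `idx` and `counts` are only illustrative):

```matlab
% Number of allowed values per atom, in the order [h, c, o, n, cl, f, s].
radix = [47 21 14 16 6 6 6];

idx = 8;                 % example: 1-based row number in the loop-built list
a = idx - 1;             % switch to a 0-based index
counts = zeros(1, 7);
for i = 7:-1:1           % s (last column) varies fastest, so decode it first
    counts(i) = mod(a, radix(i));
    a = floor(a / radix(i));
end
% counts is now [0 0 0 0 0 1 1], i.e. the 8th row written by the loops
```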
Edit:
If you want it to also calculate the mass, you may do something like:
function mass = getMassFromPermutationNumber(a)
a = a - 1;
lst = zeros(1,7);
for i = 1:7
lst(8-i) = mod(a,47);
a = floor(a /47);
end
mass = lst*[1.00794;12.011;15.9994;14.0067;35.4527;18.9984;32.066];
end
Source for masses: http://environmentalchemistry.com/yogi/periodic/mass.html
Disclaimer: I'm not that good at chemistry, so please apply reasonable amounts of skepticism!

Related

How to break loop if number repeats -Matlab

This has turned out to be quite a hard problem for me. I asked it on the official MATLAB site, but no one could help me there either, so maybe one of you can come up with a good approach.
In detail, my problem consists of:
N = 100 %some number
G = 21 %random guess < N
for x = 1:N;
a = mod(G^x,N);
end
Now I want the calculation of a to stop, if a number repeats.
For example: a = 1, 2, 3, 1 -break
Seems simple but I just can't handle it right after many tries.
For instance I've put:
for x = 1:N
a = mod(G^x,N);
b = unique(a);
if a ~= b
break
end
end
but it doesn't seem to work, I guess because the comparison is not element-wise.
This approach keeps a running log of the past Results and uses the ismember() function to check if the current value of a has been previously seen.
clc;
N = 100; %some number
G = 21; %random guess < N
Results = NaN(1,N);
for x = 1:N
a = mod(G^x,N);
disp(a);
if ismember(a,Results)
disp("-break");
break
end
Results(x) = a;
end
Ran using MATLAB R2019b
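One caveat with `mod(G^x,N)`: for larger `G` or `x` the power exceeds what a double can represent exactly, and the remainder is then no longer reliable. A minimal sketch of one way around this, carrying the remainder forward instead of recomputing the power (the logical array `seen` is an illustrative replacement for the `Results` log, not part of the answer above):

```matlab
N = 100;                 % modulus
G = 21;                  % base, G < N
seen = false(1, N);      % seen(r+1) becomes true once remainder r has appeared
a = 1;
for x = 1:N
    a = mod(a * G, N);   % equals mod(G^x, N), but never grows beyond G*N
    if seen(a + 1)
        break            % a has occurred before: stop
    end
    seen(a + 1) = true;
end
```

With these values the remainders are 21, 41, 61, 81, 1, and the loop stops at x = 6 when 21 comes up again.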

Solving probability problems with MATLAB

How can I simulate this question using MATLAB?
Out of 100 apples, 10 are rotten. We randomly choose 5 apples without
replacement. What is the probability that there is at least one
rotten?
The Expected Answer
0.4162476
My Attempt:
r=0
for i=1:10000
for c=1:5
a = randi(1,100);
if a < 11
r=r+1;
end
end
end
r/10000
but it didn't work, so what would be a better way of doing it?
Use randperm to choose randomly without replacement:
A = false(1, 100);
A(1:10) = true;
r = 0;
for k = 1:10000
a = randperm(100, 5);
r = r + any(A(a));
end
result = r/10000;
Short answer:
Your problem follows a hypergeometric distribution (similar to a binomial distribution, but without replacement). If you have the necessary toolbox (the Statistics and Machine Learning Toolbox), you can simply use the probability density function of the hypergeometric distribution:
r = 1-hygepdf(0,100,10,5) % r = 0.4162
Since P(x>=1) = P(x=1) + P(x=2) + P(x=3) + P(x=4) + P(x=5) = 1-P(x=0)
Of course, here I calculate the exact probability; this is not an experimental result.
To go further:
Note that if you do not have access to hygepdf, you can easily write the function yourself using binomial coefficients:
N = 100; K = 10;
n = 5; k = 0;
r = 1-(nchoosek(K,k)*nchoosek(N-K,n-k))/nchoosek(N,n) % r = 0.4162
You can also use the binomial probability density function; it is a bit trickier (but also more intuitive):
r = 1-prod(binopdf(0,1,10./(100-[0:4])))
Here we compute the probability of drawing 0 rotten apples five times in a row; the probability of drawing a rotten one increases at every step, since we remove one good juicy apple each time. Then, as explained above, we take 1-P(x=0).
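The same product can be written without any toolbox function. A minimal sketch, using only the numbers from the question (90 good apples out of 100, five draws):

```matlab
% P(no rotten apple in 5 draws) = 90/100 * 89/99 * 88/98 * 87/97 * 86/96
pNone = prod((90:-1:86) ./ (100:-1:96));
r = 1 - pNone;   % probability of at least one rotten apple
```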
There are a couple of issues with your code. First of all, implicitly in what you wrote, you replace the apple after you look at it. When you generate the random number, you need to eliminate the possibility of choosing that number again.
I've rewritten your code to include better practices:
clear
n_runs = 1000;
success = zeros(n_runs, 1);
failure = zeros(n_runs, 1);
approach = zeros(n_runs, 1);
for ii = 1:n_runs
    a = randperm(100, 5);    % draw 5 of the 100 apples without replacement
    if any(a < 11)           % apples 1 to 10 are the rotten ones
        success(ii) = 1;
    else
        failure(ii) = 1;
    end
    approach(ii) = sum(success)/(sum(success)+sum(failure));  % running estimate
end
figure; hold on
plot(approach)
title("r = "+ approach(end))
hold off
The results are stored in an array (called approach), rather than a single number being updated every time, which means you can see how quickly you approach the end value of r.
Another good habit is including clear at the beginning of any script, which reduces the possibility of an error occurring due to variables stored in the workspace.

How to implement this function in a single line in MATLAB

The resulting indices of the transformed light field data are ub, vb, sb, tb. Each of them is depending on variables u,v,s,t.
Sorry for being unclear, let me mention that I am trying to transform a 4D dataset through some sort of matrix. In the code below M is simply a 3D transformation matrix.
f=0.1;
n = 11;
[u,v,s,t] = ndgrid([1:Size(3)],[1:Size(4)],[1:Size(1)],[1:Size(2)]);
alpha = M(3,1)*s+M(3,2)*t+M(3,3)*nf;
beta1 = M(1,1)*u+M(1,2)*v+M(1,4);
beta2 = M(2,1)*u+M(2,2)*v+M(2,4);
C = M(3,1)*u+M(3,2)*v+M(3,4);
D1 = M(1,1)*s+M(1,2)*t+M(1,3)*nf;
D2 = M(2,1)*s+M(2,2)*t+M(2,3)*nf;
ub = -D1.*C./alpha+beta1;
vb = -D2.*C./alpha+beta2;
sb = nf*D1./alpha;
tb = nf*D2./alpha;
for s = 1:Size(1)
for t = 1:Size(2)
for u = 1:Size(3)
for v = 1:Size(4)
newLF(sb(u,v,s,t),tb(u,v,s,t),ub(u,v,s,t),vb(u,v,s,t)) = LF2(s,t,u,v);
end;
end;
end;
end;
Since ub, vb, sb and tb depend on u, v, s, t, it is not possible to simply assign newLF = LF2;
The question is how to reduce these for loops to a single line.
The answer is
newLF = LF2;
Meaning, that code does nothing but copy LF2 to newLF.
To check that I'm right, just let the code run with some random matrix LF2 and then evaluate
all(newLF(:) == LF2(:))
and you'll find it always evaluates to "true".
First of all, your use of sb, tb, ub, vb is redundant. You are indexing into a grid, but this just reproduces the indices. The line
newLF(sb(u,v,s,t),tb(u,v,s,t),ub(u,v,s,t),vb(u,v,s,t)) = LF2(s,t,u,v);
is equivalent to the line
newLF(s,t,u,v) = LF2(s,t,u,v);
This of course is just element-wise copying.
The impression of permutation noted by Shai is given by the line
[ub,vb,sb,tb] = ndgrid([1:Size(3)],[1:Size(4)],[1:Size(1)],[1:Size(2)]);
which looks like you are preparing to permute dimensions (1, 2) with dimensions (3,4). However, you use this index grid in the form sb, tb, ub, vb, assigning the value from s, t, u, v, so the permutation is not actually performed.
Assuming you actually do want to do that permutation of dimensions, the correct code would be
for s = 1:Size(1)
for t = 1:Size(2)
for u = 1:Size(3)
for v = 1:Size(4)
newLF(u,v,s,t) = LF2(s,t,u,v);
end;
end;
end;
end;
In this case Shai would be right, the corresponding one-liner is
newLF = permute(LF2, [3 4 1 2]);
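A quick way to convince yourself that the loop and the one-liner agree is to run both on a small random array. This is only a sanity check; the dimensions in `Size` are arbitrary.

```matlab
Size = [2 3 4 5];                    % arbitrary small test dimensions
LF2 = rand(Size);

% Loop version: swap dimensions (1,2) with dimensions (3,4).
newLF = zeros(Size([3 4 1 2]));
for s = 1:Size(1)
    for t = 1:Size(2)
        for u = 1:Size(3)
            for v = 1:Size(4)
                newLF(u,v,s,t) = LF2(s,t,u,v);
            end
        end
    end
end

% One-liner version and comparison.
isSame = isequal(newLF, permute(LF2, [3 4 1 2]));
```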

How can I rearrange a list of numbers so that every N numbers is nonrepeating?

So I have a list of 190 numbers ranging from 1:19 (each number is repeated 10 times) that I need to sample 10 at a time. Within each sample of 10, I don't want the numbers to repeat. I tried incorporating a while loop, but computation time was way too long. So far I'm at the point where I can generate the numbers and see if there are repetitions within each subset. Any ideas?
N=[];
for i=1:10
N=[N randperm(19)];
end
B=[];
for j=1:10
if length(unique(N(j*10-9:j*10)))<10
B=[B 1];
end
end
sum(B)
Below is an updated version of the code. This might be a little clearer in showing what I want (19 targets taken 10 at a time without repetition, until all 19 targets have been repeated 10 times).
nTargs = 19;
pairs = nchoosek(1:nTargs, 10);
nPairs = size(pairs, 1);
order = randperm(nPairs);
values=randsample(order,19);
targs=pairs(values,:);
Alltargs=false;
while ~Alltargs
targs=pairs(randsample(order,19),:);
B=[];
for i=1:19
G=length(find(targs==i))==10;
B=[B G];
end
if sum(B)==19
Alltargs=true;
end
end
Here are some very simple steps to do this, basically you just shuffle the vector once, and then you grab the last 10 unique values:
v = repmat(1:19,1,10);
v = v(randperm(numel(v)));
[a idx]=unique(v);
result = unique(v);
v(idx)=[];
The algorithm should be fairly efficient. If you want the next sample, just run the last three lines again and combine the results into a totalResult.
You want to sample the numbers 1:19 randomly in blocks of 10 without repetitions. The Matlab function 'randsample' has an optional 'replacement' argument which you can set to 'false' if you do not want repetitions. For example:
N = [];
replacement = false;
for i = 1:19
N = [N randsample(19,10,replacement)];
end
This generates a 19 x 10 matrix of random integers in the range [1,..,19] without repetitions within each column.
Edit: Here is a solution that addresses the requirement that each of the integers [1,..,19] occurs exactly 10 times, in addition to no repetition within each column / sample:
nRange = 19; nRep = 10;
valueRep = true; % true while there are repetitions
nLoops = 0; % count the number of iterations
while valueRep
l = zeros(1,nRep);
v = [];
for m = 1:nRep
v = [v, randperm(nRange,nRange)];
end
m1 = reshape(v,nRep,nRange);
for n = 1:nRep
l(n) = length(unique(m1(:,n)));
end
if all(l == nRep)
valueRep = false;
end
nLoops = nLoops + 1;
end
result = m1;
For the parameters in the question it takes about 300 iterations to find a result.
I think you should approach this constructively.
It's easy to find an initial set of 19 groups that fulfill your conditions just by rearranging the series 1:19: series1 = repmat(1:19,1,10); and rearranged = reshape(series1,10,19);
Then shuffle the values:
select two random columns, copy them, and switch the values at two random positions;
test whether the result still fulfills your condition, e.g. with test = @(x) numel(unique(x))==10, and if so replace your columns;
keep shuffling until your time runs out or you are happy.
Of course you might come up with more efficient shuffling or testing.
I was given another solution through the MATLAB forum that works pretty well (Credit to Niklas Nylen over on the MATLAB forum). Computation time is pretty low too. It basically shuffles the numbers until there are no repetitions within every 10 values. Thanks all for your help.
y = repmat(1:19,1,10);
% Run enough iterations to get the output random enough, I selected 100000
for ii = 1:100000
% Select random index
index = randi(length(y)-1);
% Check if it is allowed to switch places
if y(index)~=y(min(index+10, length(y))) && y(index+1)~=y(max(1,index-9))
% Make the switch
yTmp = y(index);
y(index)=y(index+1);
y(index+1)=yTmp;
end
end
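Whichever approach is used, it is cheap to verify the result afterwards. A minimal sketch of such a check, assuming `y` is a 1-by-190 vector like the one produced above (here the unshuffled starting vector is used as a stand-in so the snippet runs on its own):

```matlab
y = repmat(1:19, 1, 10);            % stand-in; replace with the shuffled vector

blocks = reshape(y, 10, 19);        % each column is one sample of 10 values
% After sorting each column, strictly increasing values mean no repeats.
noRepeats = all(all(diff(sort(blocks, 1), 1, 1) > 0));
% Every target 1..19 should appear exactly 10 times overall.
counts = accumarray(y(:), 1);
balanced = numel(counts) == 19 && all(counts == 10);
```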

LCG in MatLab implementation

Hi, I'm having trouble creating a linear congruential generator in MATLAB; the ones I found online work quite differently from mine. I'm trying to print the values of m and a (relatively prime, m being a large prime obviously) and check when the cycle is full. I know all the math; I'm still getting used to MATLAB, and it's hard for me to implement this even though I should know how. My program looks like this:
M = [];
for m = 100:10000;
M(m) = m;
A = [];
for a = 2:(m-1);
A(a) = a;
B = [];
R = [];
for n = 1:1000;
R(n) = n;
B(n) = A(a) * n;
K = [];
K(n)=mod(B(n),M(m));
n=n+1;
a=a+1;
m=m+1;
if K(n) == R(n)
print (m)
print (a)
print ('the cycle is done')
end
end
end
end
Also, I'm not too familiar with MATLAB, so I'm probably creating arrays the wrong way. Thanks in advance.
Well you aren't really asking a question there. Here is some advice for you:
1) Pre-allocate the matrices, e.g. M = zeros(10000,1) and A = zeros(9999,1) (large enough for the highest index the loops use); you will get much faster results when you loop. Even better, M = 100:10000 works directly if the values you want to put in are as simple as that.
2) You do not need a = a+1; the for loop increments a automatically for you (unless it's there for another reason I'm unaware of).