Can you constrain a type by the size of the data in the type? - swift

I was messing around with matrix operations (it was the first programming I ever did nearly 20 years ago) and wanted to recreate what I had done all that time ago but with more modern practices.
Anyway...
One of the constraints with matrix operations is that the sizes of the matrices involved matter.
For addition the two matrices must be the same size, i.e. M(i, j) + N(i, j). Multiplication only works if the number of columns of the left matrix equals the number of rows of the right matrix, and so on...
I was looking for ways that I could apply these constraints at compile time but I'm not sure if that's possible.
I know I could create different subtypes for each size of matrix (Matrix1x1, Matrix1x2, Matrix2x3, ...) but there are "quite a lot" of those so that's a non-starter.
I could also use a precondition on the function which checks the input matrices for the correct sizes before doing anything (a bit like an index-out-of-bounds check on an array).
But I was wondering if there was a way of applying the size of the matrix to the type of the matrix at all. I don't think I've heard of this before but wanted to check before throwing out the idea entirely.
Something akin to: when I create the matrix, the type captures the fact that the size is known at that point.
The function definition might then look something like...
func add(m: Matrix<i, j>, n: Matrix<i, j>) -> Matrix<i, j>
and
func multiply(m: Matrix<i, j>, n: Matrix<j, k>) -> Matrix<i, k> // or something
Where i, j and k are not generic type constraints but size constraints. That isn't valid syntax, but it gives the general idea of what I'm thinking.

Related

Pseudo randomization in MATLAB with minimum intervals between stimulus categories

For an experiment I need to pseudo randomize a vector of 100 trials of stimulus categories, 80% of which are category A, 10% B, and 10% C. The B trials have at least two non-B trials between each other, and the C trials must come after two A trials and have two A trials following them.
At first I tried building a script that randomized a vector and sort of "popped" out the trials that were not where they should be, and put them in a space in the vector where there was a long series of A trials. I'm worried though that this is overcomplicated and will create an endless series of unforeseen errors that will need to be debugged, as well as it not being random enough.
After that I tried building a script which simply shuffles the vector until it reaches the criteria, which seems to require less code. However now that I have spent several hours on it, I am wondering if these criteria aren't too strict for this to make sense, meaning that it would take forever for the vector to shuffle before it actually met the criteria.
What do you think is the simplest way to handle this problem? Additionally, which would be the best shuffle function to use, since Shuffle in psychtoolbox seems to not be working correctly?
The scope of this question goes well beyond language-specific constructs, and involves a good understanding of probability and permutations/combinations.
An approach to solving this question is:
Create blocks of vectors, such that each block can be placed anywhere independently.
Randomly allocate these blocks to get a final random vector satisfying all constraints.
Part 0: Category A
Since category A has no constraints imposed on it, we will go to the next category.
Part 1: Make category C independent
The only constraint on category C is that it must have two A's before and after. Hence, we first create random groups of 5 vectors, of the pattern A A C A A.
At this point, we have an array of A vectors (excluding blocks), blocks of A A C A A vectors, and B vectors.
Part 2: Resolving placement of B
The constraint on B is that two consecutive Bs must have at least 2 non-B vectors between them.
Visualize as follows: Let's pool A and A A C A A in one array, X. Let's place all Bs in a row (suppose there are 3 Bs):
s0 B s1 B s2 B s3
Where each s is the number of vectors in that gap. Hence, we require that s1 and s2 be at least 2, and overall s0 + s1 + s2 + s3 equal the number of vectors in X.
The task is then to choose random vectors from X and assign them to each s. At the end, we finally have a random vector with all categories shuffled, satisfying the constraints.
P.S. This can be mapped to the classic problem of finding a set of random numbers that add up to a certain sum, with constraints.
It is easier to reduce the constrained sum problem to one with no constraints. This can be done as:
s0 B s1 t1 B s2 t2 B s3
Where t1 and t2 are chosen from X with just enough vectors to satisfy the constraints on B, and s0 + s1 + s2 + s3 equals the number of vectors in X not used in the t's.
Implementation
Implementing this in MATLAB could benefit from using cell arrays, and from this algorithm for random numbers with a constant sum.
You would also need to maintain separate pools for each category, and keep building blocks and piecing them together.
Really, this is not trivial, but not impossible either. It is the approach you could try if you want to step away from the brute-force shuffling you have tried before; a rough sketch follows.
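A minimal sketch of this block-based approach in MATLAB, assuming the 80/10/10 split from the question; the variable names below are made up purely for illustration:
% 100 trials: 80 A, 10 B, 10 C (as in the question)
nA = 80; nB = 10; nC = 10;
% Part 1: wrap every C in an "A A C A A" block (uses 4 A's per C)
block  = {'A','A','C','A','A'};
blocks = repmat({block}, 1, nC);          % 10 blocks of 5 trials each
loose  = repmat({{'A'}}, 1, nA - 4*nC);   % remaining single-A units
% Pool X of independent units, shuffled
X = [blocks, loose];
X = X(randperm(numel(X)));
% Part 2: distribute the units of X over the gaps s0..s(nB) around the B's;
% every interior gap gets at least 2 units, the rest are placed at random
nGaps  = nB + 1;
gapLen = zeros(1, nGaps);
gapLen(2:nB) = 2;                                   % minimum for interior gaps
extra  = numel(X) - sum(gapLen);                    % units still to place
draw   = randi(nGaps, extra, 1);                    % a random gap for each one
gapLen = gapLen + accumarray(draw, 1, [nGaps 1]).';
% Stitch gaps and B's together into the final trial sequence
seq = {};
idx = 1;
for g = 1:nGaps
    chunk = X(idx:idx+gapLen(g)-1);
    seq = [seq, chunk{:}];                          % unpack each unit's trials
    idx = idx + gapLen(g);
    if g < nGaps
        seq = [seq, {'B'}];
    end
end
Since every unit in X contains at least one trial, giving each interior gap at least two units guarantees at least two non-B trials between consecutive Bs, and the A A C A A blocks keep the C constraint satisfied wherever they land.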

(matlab matrix operation), Is it possible to get a group of values from a matrix without a loop?

I'm currently working on implementing a gradient check function, which requires getting certain indexed values from the result matrix. Could someone tell me how to get a group of values from the matrix?
To be specific, for a result matrix res with size M x N, I'll need to get elements res(3,1), res(4,2), res(1,3), res(2,4)...
In my case, M is the dimension and N is the batch size, and there's a label array whose size is 1 x batch_size, [3 4 1 2...]. So the desired values are res(label(i), i) for i = 1:batch_size. Since I'm trying to practice vectorized programming, it's better not to use a loop. Could someone tell me how to get a group of values without iteration?
Cheers.
--------------------------UPDATE----------------------------------------------
The only idea I found is to first build a 'mask matrix', then do element-wise multiplication with the original result matrix (technically called the 'Hadamard product', see wiki). After that, just extract the non-zero elements and do the sum operation. The code in MATLAB should look like:
temp=Mask.*res;
desired_res=temp(temp~=0); %Note: temp(temp~=0) extracts the non-zero elements in column-major order: it searches temp column by column and puts the non-zero numbers into 'desired_res'.
In my case, what I want to do next is simply sum(desired_res), so I don't need to care about the order of the non-zero elements in 'desired_res'.
Based on the idea above, creating the mask matrix is the key step. There are two methods to do this job.
The code is shown below. In my case, accumarray adds '1' at certain locations (which are stored in the matrix 'subs') and '0' everywhere else. This gives you a mask matrix of size [row column]. The usage of full(sparse()) is similar. I made some comparisons between the two methods (repeated around 10 times); it turns out full(sparse()) is faster, and the time costs are on the order of 10^-4 seconds. A small difference, but in large-scale experiments this matters. One benefit of accumarray is that it lets you specify the output size directly; without explicit size arguments, full(sparse(i, j, 1)) creates a matrix of size [max(i), max(j)] (sparse() can also take explicit size arguments). In my case this is sufficient for my requirement, and I only know a few of their uses. If you find out more, please share with us. Thanks.
Detailed descriptions of those two functions can be found on MATLAB's official website: accumarray, full and sparse.
% assume we have a label vector
test_labels=ones(10000,1);
% method one, accumarray(subs,1,[row column])
tic
subs=zeros(10000,2);
subs(:,1)=test_labels;
subs(:,2)=1:10000;
k1=accumarray(subs,1,[10, 10000]);
t1=toc % to compare with method two to check which one is faster
%method two: full(sparse(),1)
tic
k2=full(sparse(test_labels,1:10000,1));
t2=toc
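As a side note (this is not part of the mask idea above): the same values can also be picked directly with linear indexing via sub2ind, which avoids building a mask at all. A minimal sketch, with res and the label vector stubbed out for illustration:
% method three (alternative, not from the answer above): direct linear indexing
res = rand(10, 10000);                              % stand-in M x N result matrix
labels = randi(10, 1, 10000);                       % stand-in 1 x batch_size label vector
idx = sub2ind(size(res), labels, 1:numel(labels));  % linear index of res(labels(i), i)
desired_res = res(idx);                             % one picked value per column
total = sum(desired_res);                           % same sum as with the mask approach
This returns one value per column, which can then be summed just like desired_res above.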

Matlab non-linear binary Minimisation

I have to set up a phoneme table with a specific probability distribution for encoding things.
Now there are 22 base elements (each with an assigned probability, summing to 100%), which shall be mapped onto a 12-element table, which has desired element probabilities (also summing to 100%).
So part of the minimisation is to merge several base elements to get 12 table elements. Each base element must occur exactly once.
In addition, the table has 3 rows. So the same 12 element composition of the 22 base elements must minimise the error for 3 target vectors. Let's say the given target vectors are b1,b2,b3 (dimension 12x1), the given base vector is x (dimension 22x1) and they are connected by the unknown matrix A (12x22) by:
b1+err1=Ax
b2+err2=Ax
b3+err3=Ax
To sum it up: A is to be found so that dot_prod(err1+err2+err3, err1+err2+err3)=min (least squares). And - according to the above explanation - A must contain only 1's and 0's, while having exactly one 1 per column.
Unfortunately I have no idea how to approach this problem. Can it be expressed in a way different from the matrix-vector form?
Which tools in matlab could do it?
I think I found the answer while parsing some sections of the Matlab documentation.
First of all, the problem can be rewritten as:
errSum = err1 + err2 + err3 = 3*A*x - b1 - b2 - b3
=> minimize dot_prod(errSum, errSum) over A
Applying the dot product (least squares) yields a quadratic scalar expression.
Syntax-wise, the fmincon tool in the Optimization Toolbox could do the job. It has constraint parameters, which allow forcing each Aij to be binary and each column to sum to 1.
But apparently fmincon is not ideal for binary problems algorithm-wise, and the ga tool should be used instead, which can be called in a similar way.
Since the equation would be very long in my case and needs to be written out, I haven't tried it yet. Please correct me if I'm wrong, or add further solution methods if available.
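For what it's worth, here is a rough, untested sketch of how a ga call could look under the formulation above. It assumes x (the 22x1 base probabilities) and b1, b2, b3 (the 12x1 targets) already exist; encoding the solution as an assignment vector g instead of the binary matrix A is my own illustration:
% Rough sketch only: assumes x (22x1 base probabilities) and b1, b2, b3
% (12x1 targets) are defined. g(j) in 1..12 says which table element base
% element j is merged into, which gives exactly one 1 per column of A.
nBase  = 22;
nTable = 12;
bSum   = b1 + b2 + b3;
% Objective: squared norm of the summed residual 3*A*x - (b1+b2+b3),
% with A*x computed via accumarray over the assignment vector g.
obj = @(g) sum((3 * accumarray(g(:), x, [nTable 1]) - bSum).^2);
% ga with all 22 decision variables constrained to be integers in [1, nTable].
lb = ones(1, nBase);
ub = nTable * ones(1, nBase);
g  = ga(obj, nBase, [], [], [], [], lb, ub, [], 1:nBase);
% Rebuild the binary matrix A from the assignment vector, if it is needed.
A = full(sparse(g, 1:nBase, 1, nTable, nBase));
The assignment encoding automatically enforces "exactly one 1 per column", so no extra constraints on A are needed; A can be rebuilt from g afterwards, as in the last line.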

Partitioning a number into a number of almost equal partitions

I would like to partition a number into an almost equal number of values in each partition. The only criteria is that each partition must be in between 60 to 80.
For example, if I have a value = 300, this means that 75 * 4 = 300.
I would like to know a method to get this 4 and 75 in the above example. In some cases, all partitions don't need to be of equal value, but they should be in between 60 and 80. Any constraints can be used (addition, subtraction, etc..). However, the outputs must not be floating point.
Also it's not that the total must be exactly 300 as in this case, but they can be up to a maximum of +40 of the total, and so for the case of 300, the numbers can sum up to 340 if required.
Assuming only addition, you can formulate this as a linear programming problem. You would choose an objective function that maximizes the sum of all of the factors chosen to generate that number for you. Therefore, your objective function would be:
maximize f(x) = x_1 + x_2 + ... + x_n
In this case, n would be the number of factors you are using to try and decompose your number into. Each x_i is a particular factor in the overall sum of the value you want to decompose. I'm also going to assume that none of the factors can be floating point, and can only be integer. As such, you need to use a special case of linear programming called integer programming where the constraints and the actual solution to your problem are all in integers. In general, the integer programming problem is formulated thusly:
minimize f'*x subject to A*x <= b, Aeq*x = beq, lb <= x <= ub, with x integer-valued
You are actually trying to minimize this objective function, such that you produce a parameter vector x that is subject to all of these constraints. In our case, x would be a vector of numbers where each element forms part of the sum to the value you are trying to decompose (300 in your case).
You have inequalities, equalities and also boundaries of x that each parameter in your solution must respect. You also need to make sure that each parameter of x is an integer. As such, MATLAB has a function called intlinprog that will perform this for you. However, this function assumes that you are minimizing the objective function, and so if you want to maximize, simply minimize on the negative. f is a vector of weights to be applied to each value in your parameter vector, and with our objective function, you just need to set all of these to -1.
Therefore, to formulate your problem in an integer programming framework, you are actually doing:
minimize -(x_1 + x_2 + ... + x_n) subject to x_1 + x_2 + ... + x_n = V, 60 <= x_i <= 80, x_i integer
V would be the value you are trying to decompose (so 300 in your example).
The standard way to call intlinprog is in the following way:
x = intlinprog(f,intcon,A,b,Aeq,beq,lb,ub);
f is the vector that weights each parameter of the solution you want to solve, and intcon denotes which of your parameters need to be integer. In this case, you want all of them to be integer, so you would have to supply an increasing vector from 1 to n, where n is the number of factors you want to decompose the number V into (same as before). A and b are the matrix and vector that define your inequality constraints. Because you only want equality here, you'd set these to empty ([]). Aeq and beq are the same as A and b, but for equality. Because you only have one constraint here, you would simply create a matrix of 1 row, where each value is set to 1. beq would be a single value which denotes the number you are trying to factorize. lb and ub are the lower and upper bounds for each value in the parameter set, so these would be 60 and 80 respectively, and you'd have to specify a vector to ensure that each parameter is bounded within this range.
Now, because you don't know how many factors will evenly decompose your value, you'll have to loop over a given set of factors (like between 1 to 10, or 1 to 20, etc.), place your results in a cell array, then you have to manually examine yourself whether or not an integer decomposition was successful.
num_factors = 20; %// Number of factors to try and decompose your value
V = 300;
results = cell(1, num_factors);
%// Try to solve the problem for a number of different factors
for n = 1 : num_factors
x = intlinprog(-ones(n,1),1:n,[],[],ones(1,n),V,60*ones(n,1),80*ones(n,1));
results{n} = x;
end
You can then go through results and see which value of n was successful in decomposing your number into that said number of factors.
One small problem here is that we also don't know how many factors we should check up to. That unfortunately I don't have an answer to, and so you'll have to play with this value until you get good results. This is also an unconstrained parameter, and I'll talk about this more later in this post.
However, intlinprog was only released in recent versions of MATLAB. If you want to do the same thing without it, you can use linprog, which is the floating point version of integer programming... actually, it's just the core linear programming framework itself. You would call linprog this way:
x = linprog(f,A,b,Aeq,beq,lb,ub);
All of the variables are the same, except that intcon is not used here... which makes sense, as linprog may generate floating point numbers as part of its solution. Because linprog can generate floating point solutions, if you want to check whether the result for a given value of n is an integer one, you can loop over your results, take the floor of each result, subtract it from the result, and sum over the absolute differences. If you get a value of 0, this means that you had a completely integer result. Therefore, you'd have to do something like:
num_factors = 20; %// Number of factors to try and decompose your value
V = 300;
results = cell(1, num_factors);
%// Try to solve the problem for a number of different factors
for n = 1 : num_factors
x = linprog(-ones(n,1),[],[],ones(1,n),V,60*ones(n,1),80*ones(n,1));
results{n} = x;
end
%// Loop through and determine which decompositions were successful integer ones
out = cellfun(@(x) sum(abs(floor(x) - x)), results);
%// Determine which values of n were successful in the integer composition.
final_factors = find(~out);
final_factors will contain which number of factors you specified that was successful in an integer decomposition. Now, if final_factors is empty, this means that it wasn't successful in finding anything that would be able to decompose the value into integer factors. Noting your problem description, you said you can allow for tolerances, so perhaps scan through results and determine which overall sum best matches the value, then choose whatever number of factors that gave you that result as the final answer.
Now, noting from my comments, you'll see that this problem is very unconstrained. You don't know how many factors are required to get an integer decomposition of your value, which is why we had to semi-brute-force it. In fact, this is a more general case of the subset sum problem. This problem is NP-complete. Basically, what this means is that it is not known whether there is a polynomial-time algorithm that can be used to solve this kind of problem, and the only known way to get a valid solution is to brute-force each possible candidate and check whether it satisfies the problem. Usually, brute-forcing solutions requires exponential time, which is very intractable for large problems.
Another interesting fact is that modern cryptography algorithms rely on this kind of intractability as part of their security. Basically, they're banking on the fact that the only way for you to determine the right key that was used to encrypt your plain text is to check all possible keys, which is an intractable problem... especially if you use 128-bit encryption! This means you would have to check 2^128 possibilities, and assuming a moderately fast computer, the worst-case time to find the right key would be longer than the current age of the universe. Check out this cool Wikipedia post for more details on intractability with regard to key breaking in cryptography.
In fact, NP-complete problems are very popular and there have been many attempts to determine whether there is or there isn't a polynomial-time algorithm to solve such problems. An interesting property is that if you can find a polynomial-time algorithm that will solve one problem, you will have found an algorithm to solve them all.
The Clay Mathematics Institute has what are known as Millennium Problems where if you solve any problem listed on their website, you get a million dollars.
Also, that's for each problem, so one problem solved == 1 million dollars!
The P vs. NP problem is among the seven problems up for solving. If I recall correctly, only one of them has been solved so far, and these problems were first released to the public in the year 2000 (hence millennium...). So... it has been about 14 years and only one problem has been solved. Don't let that discourage you though! If you want to invest some time and try to solve one of the problems, please do!
Hopefully this will be enough to get you started. Good luck!

matlab: out of memory when concatenating sparse matrix with a vector

I create a 3560 x 3560 sparse matrix, A. I then create two 1 X 3560 vectors, S and T.
When I run the following code (which concatenates S and T as rows in A and afterwards also as columns in A)
A=[A;S;T];
S=[S 0 0];
T=[T 0 0];
A=[A, S', T'];
The last line produces an out of memory error.
I guess I am running out of memory since I have other variables stored, but it seems odd to me that adding two 3560-element vectors would be the point at which I exactly hit my limit, so I think (or, more accurately, wishfully think) that somehow the concatenations aren't done in a smart way...
Am I right or is there no hope (except for optimizing other pieces in my code)?
EDIT:
At the request of yoda, I am posting the full code.
Basically what it does is get an N x N matrix of edge weights between the nodes of a graph, and add two vectors that will act as a source and a sink in a max flow computation.
nbr_sim(nbr_sim<0.8)=0;
A=sparse(size(nbr_sim,1)+2,size(nbr_sim,2)+2);
nelements=size(nbr_sim,1);
A(nbr_sim>0)=nbr_sim(nbr_sim>0);
clear nbr_sim;
S=abs([1 0 0]*n);
T=abs([0 1 0]*n);
A(1:nelements,end-1)=S';
A(1:nelements,end)=T';
A(end-1,1:nelements)=S;
A(end,1:nelements)=T;
EDIT:
As you say you have used considerable resources before this operation, it is entirely likely that you are close to the tipping point at which MATLAB gives you an out of memory error.
Remember that when you grow matrices on the fly either by concatenating or by indexing out of range, MATLAB creates a copy of the matrix in memory. So you're not just using up resources for that extra row, but for a copy of that entire matrix!
Here's an example on my machine where I try to grow a vector that's large enough to tip it over the memory limit.
clear
a=rand(2*10^9+1,1); %#create a large array
whos a
Name Size Bytes Class Attributes
a 2000000001x1 16000000008 double
%#Now repeat the same, but by growing the array by one element
clear
a=rand(2*10^9,1);
a=[a;0];
??? Error using ==> vertcat
Out of memory. Type HELP MEMORY for your options.
So you see that although MATLAB can create a matrix with 2*10^9+1 elements in one go, when you try to create an array of the same size by appending a single element to a 2*10^9 element vector, it runs out of memory.
If S and T are column vectors as you say, then A=[A;S;T] should give you an error:
??? Error using ==> vertcat
CAT arguments dimensions are not consistent.
So you must be doing something else. Concatenating will not change the sparseness of the matrix, i.e., it won't switch from sparse to full.
A=sprand(3560,3560,0.01); %#test matrices
S=rand(3560,1);
T=rand(3560,1);
B=[A,S,T]; %#join the columns
issparse(B)
ans =
1
Moreover, a 3560x3560 matrix of doubles is only ~97 MB (3560 * 3560 * 8 bytes ≈ 96.7 MB), which shouldn't give you an "out of memory" error...
When dealing with large matrices:
For a full matrix, you'd better preallocate memory to avoid copying data every time you extend it (see why).
The sparse case is more complicated, and growing can be even less efficient than for a full matrix, because the elements are stored in a compressed manner; setting an "inner" entry may force a large part of that storage to be rewritten (have a look here).
So you'd better collect all the entries in advance and create the matrix with a single sparse() call, rather than calling sparse() first and then padding in the data; a rough sketch follows.
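A minimal sketch of that one-shot construction, reusing the variable names from the question (nbr_sim as the N x N weight matrix, S and T as the 1 x N source/sink rows); it only illustrates the idea and is not tested against the original data:
% Collect all entries as (row, column, value) triplets, then call sparse() once.
nbr_sim(nbr_sim < 0.8) = 0;                  % threshold as in the question
[i, j, v] = find(nbr_sim);                   % triplets of the surviving weights
N = size(nbr_sim, 1);
% Add the source/sink entries as extra triplets instead of growing A afterwards.
iAll = [i; (N+1)*ones(N,1); (N+2)*ones(N,1); (1:N)';          (1:N)'];
jAll = [j; (1:N)';          (1:N)';          (N+1)*ones(N,1); (N+2)*ones(N,1)];
vAll = [v; S(:);            T(:);            S(:);            T(:)];
% One sparse() call allocates the final (N+2) x (N+2) matrix directly.
A = sparse(iAll, jAll, vAll, N+2, N+2);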