Matlab: linear congruence solver that supports a non-prime modulus? - matlab

I'm working on some Matlab code to perform something called the Index Calculus attack on a given cryptosystem (this involves calculating discrete log values), and I've gotten it all done except for one small thing. I cant figure out (in Matlab) how to solve a linear system of congruences mod p, where p is not prime. Also, this system has more than one variable, so, unless I'm missing something, the Chinese remainder theorem wont work.
I asked a question on the mathematics stackexchange with more detail/formatted mathjax here. I solved the issue in my question at that link, and now I'm attempting to find a utility that will allow me to solve the system of congruences modulo a non-prime. I did find a suite that includes a solver supporting modular arithmetic, but the modulus must be prime (here). I also tried stepping through to modify it to work with non-primes, but whatever method is used doesn't work, because it requires all elements of the system have inverses modulo p.
I've looked into using the ability in Matlab to call MuPAD functions, but from my testing, the MuPAD function linsolve (which seemed to be the best candidate) doesn't support non-prime modulus values either. Additionally, I've verified with Maple that this system is solvable modulo my integer of interest (8), so it can be done.
To be more specific, this is the exact command I'm trying to run in MuPAD:
linsolve([0*x + 5*y + 4*z + q = 2946321, x + 7*y + 2*q = 5851213, 8*x + y + 2*q = 2563617, 10*x + 5*y + z = 10670279],[x,y,z,q], Domain = Dom::IntegerMod(8))
Error: expecting 'Domain=R', where R is a domain of category 'Cat::Field' [linsolve]
The same command returns correct values if I change the domain to IntegerMod(23) and IntegerMod(59407), so I believe 8 is unsuitable because it's not prime. Here is the output when I try the above command with each 23 and 59407 as my domain:
[x = 1 mod 23, y = 1 mod 23, z = 12 mod 23, q = 14 mod 23]
[x = 14087 mod 59407, y = 1 mod 59407, z = 14365 mod 59407, q = 37320 mod 59407]
These answers are correct- x, y, z, and q correspond to L1, L2, L3, and L4 in the system of congruences located at my Math.StackExchange link above.

I'm wondering if you tried to use sym/linsolve and sym/solve previously, but may have passed in numeric rather than symbolic values. For example, this returns nonsense in terms of what you're looking for:
A = [0 5 4 1;1 7 0 2;8 1 0 2;10 5 1 0];
b = [2946321;5851213;2563617;10670279];
s = mod(linsolve(A,b),8)
But if you convert the numeric values to symbolic integers, sym/linsolve will keep everything in terms of rational fractions. Then
s = mod(linsolve(sym(A),sym(b)),8)
returns the expected answer
s =
6
1
6
4
This just solves the system linear system using symbolic math as if it were a normal matrix. For large systems this can be expensive, but I'd imagine no more than using MuPAD's numeric::linsolve or linalg::matlinsolve. sym/mod should return the modulus of the numerator of each solution component. I believe that you will get an error if the modulus and the denominator are not at least coprime.
sym/solve can also be used to solve this in a similar manner:
L = sym('L',[4,1]);
[L1,L2,L3,L4] = solve(A*L==b);
s = mod([L1;L2;L3;L4],8)
A possible issue with using either sym/solve or sym/linsolve is that if there are multiple solutions to the linear congruence problem (as opposed to the linear system), this approach may not return all of them.
Finally, using the MuPAD function numlib::ichrem (chinese remainder theorem for integers), here's some code that attempts to obtain the complete solution:
A = [0 5 4 1;1 7 0 2;8 1 0 2;10 5 1 0];
b = [2946321;5851213;2563617;10670279];
m = 10930888;
mf = str2num(strrep(char(factor(sym(m))),'*',' '));
A = sym(A);
b = sym(b);
s = sym(zeros(length(b),length(mf)));
for i = 1:length(mf)
s(:,i) = mod(linsolve(A,b),mf(i));
end
mstr = ['[' sprintf('%d,',mf)];
mstr(end) = ']';
r = sym(zeros(length(b),1));
for i = 1:length(b)
sstr = char(s(i,:));
r(i) = feval(symengine,'numlib::ichrem',sstr(9:end-2),mstr);
end
check = isequal(mod(A*r,m),b)
I'm not sure if any of this is what you're looking for, but hopefully it might be helpful. I think that it might be a good idea to put in a enhancement/service request with the MathWorks so that MuPAD and the other solvers can handle systems better in the future.

Related

Definite integration using int command

Firstly, I'm quite new to Matlab.
I am currently trying to do a definite integral with respect to y of a particular function. The function that I want to integrate is
(note that the big parenthesis is multiplying with the first factor - I can't get the latex to not make it look like power)
I have tried plugging the above integral into Desmos and it worked as intended. My plan was to vary the value of x and y and will be using for loop via matlab.
However, after trying to use the int function to calculate the definite integral with the code as follow:
h = 5;
a = 2;
syms y
x = 3.8;
p = 2.*x.^2+2.*a.*y;
q = x.^2+y.^2;
r = x.^2+a.^2;
f = (-1./sqrt(1-(p.^2./(4.*q.*r)))).*(2.*sqrt(q).*sqrt(r).*2.*a-p.*2.*y.*sqrt(r)./sqrt(q))./(4.*q.*r);
theta = int(f,y,a+0.01,h) %the integral is undefined at y=2, hence the +0.01
the result is not quite as expected
theta =
int(-((8*461^(1/2)*(y^2 + 361/25)^(1/2))/5 - (461^(1/2)*y*(8*y + 1444/25))/(5*(y^2 + 361/25)^(1/2)))/((1 - (4*y + 722/25)^2/((1844*y^2)/25 + 665684/625))^(1/2)*((1844*y^2)/25 + 665684/625)), y, 21/10, 5)
After browsing through various posts, the common mistake is the undefined interval but the +0.01 should have fixed it. Any guidance on what went wrong is much appreciated.
The Definite Integrals example in the docs shows exactly this type of output when a closed form cannot be computed. You can approximate it numerically using vpa, i.e.
F = int(f,y,a,h);
theta = vpa(F);
Or you can do a numerical computation directly
theta = vpaintegral(f,y,a,h);
From the docs:
The vpaintegral function is faster and provides control over integration tolerances.

Using MATLAB plots to find linear equation constants

Finding m and c for an equation y = mx + c, with the help of math and plots.
y is data_model_1, x is time.
Avoid other MATLAB functions like fitlm as it defeats the purpose.
I am having trouble finding the constants m and c. I am trying to find both m and c by limiting them to a range (based on smart guess) and I need to deduce the m and c values based on the mean error range. The point where mean error range is closest to 0 should be my m and c values.
load(file)
figure
plot(time,data_model_1,'bo')
hold on
for a = 0.11:0.01:0.13
c = -13:0.1:-10
data_a = a * time + c ;
plot(time,data_a,'r');
end
figure
hold on
for a = 0.11:0.01:0.13
c = -13:0.1:-10
data_a = a * time + c ;
mean_range = mean(abs(data_a - data_model_1));
plot(a,mean_range,'b.')
end
A quick & dirty approach
You can quickly get m and c using fminsearch(). In the first example below, the error function is the sum of squared error (SSE). The second example uses the sum of absolute error. The key here is ensuring the error function is convex.
Note that c = Beta(1) and m = Beta(2).
Reproducible example (MATLAB code[1]):
% Generate some example data
N = 50;
X = 2 + 13*random(makedist('Beta',.7,.8),N,1);
Y = 5 + 1.5.*X + randn(N,1);
% Example 1
SSEh =#(Beta) sum((Y - (Beta(1) + (Beta(2).*X))).^2);
Beta0 = [0.5 0.5]; % Initial Guess
[Beta SSE] = fminsearch(SSEh,Beta0)
% Example 2
SAEh =#(Beta) sum(abs(Y-(Beta(1) + Beta(2).*X)));
[Beta SumAbsErr] = fminsearch(SAEh,Beta0)
This is a quick & dirty approach that can work for many applications.
#Wolfie's comment directs you to the analytical approach to solve a system of linear equations with the \ operator or mldivide(). This is the more correct approach (though it will get a similar answer). One caveat is this approach gets the SSE answer.
[1] Tested with MATLAB R2018a

Matlab AR digital fiter computation w/o loops

I'm given the a(k) matrix and e(n) and I need to compute y(n) from the following eq:
y(n) = sum(k = 1 to 10 )( a(k)*y(n-k) ) + e(n).
I have the a matrix(filter coefficients) and the e matrix (the residual), therefore there is only 1 unknown: y, which is built by the previous 10 samples of y.
An example to this equation:
say e(0) (my first residual sample) = 3
and y(-10) to y(-1) = 0
then y(0), my first sample in the signal y, would just be e(0) = 3:
y(0) = a(1)*y(-1) + a(2)*y(-2) + .... + a(10)*y(-10) + e(0) = e(0) = 3
and if e(1) = 4, and a(1) = 5, then
y(1) = a(1)*y(0) + a(2)*y(-1) + a(3)&y(-2) + ... + a(10)*y(-9) + e(1) = 19
The problem is
I don't know how to do this without loops because, say, y(n) needs y(n-1), so I need to immediately append y(n-1) into my matrix in order to get y(n).
If n (the number of samples) = say, 10,000,000, then using a loop is not ideal.
What I've done so far
I have not implemented anything. The only thing I've done for this particular problem is research on what kind of Matlab functions I could use.
What I need
A Matlab function that, given an equation and or input matrix, computes the next y(n) and automatically appends that to the input matrix, and then computes the next y(n+1), and automatically append that to the input matrix and so on.
If there is anything regarding my approach
That seems like it's on the wrong track, or my question isn't clear enough, of if there is no such matlab function that exists, then I apologize in advance. Thank you for your time.

Solve equation with exponential term

I have the equation 1 = ((π r2)n) / n! ∙ e(-π r2)
I want to solve it using MATLAB. Is the following the correct code for doing this? The answer isn't clear to me.
n= 500;
A= 1000000;
d= n / A;
f= factorial( n );
solve (' 1 = ( d * pi * r^2 )^n / f . exp(- d * pi * r^2) ' , 'r')
The answer I get is:
Warning: The solutions are parametrized by the symbols:
k = Z_ intersect Dom::Interval([-(PI/2 -
Im(log(`fexp(-PI*d*r^2)`)/n)/2)/(PI*Re(1/n))], (PI/2 +
Im(log(`fexp(-PI*d*r^2)`)/n)/2)/(PI*Re(1/n)))
> In solve at 190
ans =
(fexp(-PI*d*r^2)^(1/n))^(1/2)/(pi^(1/2)*d^(1/2)*exp((pi*k*(2*i))/n)^(1/2))
-(fexp(-PI*d*r^2)^(1/n))^(1/2)/(pi^(1/2)*d^(1/2)*exp((pi*k*(2*i))/n)^(1/2))
You have several issues with your code.
1. First, you're evaluating some parts in floating-point. This isn't always bad as long as you know the solution will be exact. However, factorial(500) overflows to Inf. In fact, for factorial, anything bigger than 170 will overflow and any input bigger than 21 is potentially inexact because the result will be larger than flintmax. This calculation should be preformed symbolically via sym/factorial:
n = sym(500);
f = factorial(n);
which returns an integer approximately equal to 1.22e1134 for f.
2. You're using a period ('.') to specify multiplication. In MuPAD, upon which most of the symbolic math functions are based, a period is shorthand for concatenation.
Additionally, as is stated in the R2015a documentation (and possibly earlier):
String inputs will be removed in a future release. Use syms to declare the variables instead, and pass them as a comma-separated list or vector.
If you had not used a string, I don't think that it would have been possible for your command to get misinterpreted and return such a confusing result. Here is how you could use solve with symbolic variables:
syms r;
n = sym(500);
A = sym(1000000);
d = n/A;
s = solve(1==(d*sym(pi)*r^2)^n/factorial(n)*exp(-d*sym(pi)*r^2),r)
which, after several minutes, returns a 1,000-by-1 vector of solutions, all of which are complex. As #BenVoigt suggests, you can try the 'Real' option for solve. However, in R2015a at least, the four solutions returned in terms of lambertw don't appear to actually be real.
A couple things to note:
MATLAB is not using the values of A, d, and f from your workspace.
f . exp is not doing at all what you wanted, which was multiplication. It's instead becoming an unknown function fexp
Passing additional options of 'Real', true to solve gets rid of most of these extraneous conditions.
You probably should avoid calling the version of solve which accepts a string, and use the Symbolic Toolbox instead (syms 'r')

Getting rank deficient warning when using regress function in MATLAB

I have a dataset comprising of 30 independent variables and I tried performing linear regression in MATLAB R2010b using the regress function.
I get a warning stating that my matrix X is rank deficient to within machine precision.
Now, the coefficients I get after executing this function don't match with the experimental one.
Can anyone please suggest me how to perform the regression analysis for this equation which is comprising of 30 variables?
Going with our discussion, the reason why you are getting that warning is because you have what is known as an underdetermined system. Basically, you have a set of constraints where you have more variables that you want to solve for than the data that is available. One example of an underdetermined system is something like:
x + y + z = 1
x + y + 2z = 3
There are an infinite number of combinations of (x,y,z) that can solve the above system. For example, (x, y, z) = (1, −2, 2), (2, −3, 2), and (3, −4, 2). What rank deficient means in your case is that there is more than one set of regression coefficients that would satisfy the governing equation that would describe the relationship between your input variables and output observations. This is probably why the output of regress isn't matching up with your ground truth regression coefficients. Though it isn't the same answer, do know that the output is one possible answer. By running through regress with your data, this is what I get if I define your observation matrix to be X and your output vector to be Y:
>> format long g;
>> B = regress(Y, X);
>> B
B =
0
0
28321.7264417536
0
35241.9719076362
899.386999172398
-95491.6154990829
-2879.96318251964
-31375.7038251919
5993.52959752106
0
18312.6649115112
0
0
8031.4391233753
27923.2569044728
7716.51932560781
-13621.1638587172
36721.8387047613
80622.0849069525
-114048.707780113
-70838.6034825939
-22843.7931997405
5345.06937207617
0
106542.307940305
-14178.0346010715
-20506.8096166108
-2498.51437396558
6783.3107243113
You can see that there are seven regression coefficients that are equal to 0, which corresponds to 30 - 23 = 7. We have 30 variables and 23 constraints to work with. Be advised that this is not the only possible solution. regress essentially computes the least squared error solution such that sum of residuals of Y - X*B has the least amount of error. This essentially simplifies to:
B = X^(*)*Y
X^(*) is what is known as the pseudo-inverse of the matrix. MATLAB has this available, and it is called pinv. Therefore, if we did:
B = pinv(X)*Y
We get:
B =
44741.6923363563
32972.479220139
-31055.2846404536
-22897.9685877566
28888.7558524005
1146.70695371731
-4002.86163441217
9161.6908044046
-22704.9986509788
5526.10730457192
9161.69080479427
2607.08283489226
2591.21062004404
-31631.9969765197
-5357.85253691504
6025.47661106009
5519.89341411127
-7356.00479046122
-15411.5144034056
49827.6984426955
-26352.0537850382
-11144.2988973666
-14835.9087945295
-121.889618144655
-32355.2405829636
53712.1245333841
-1941.40823106236
-10929.3953469692
-3817.40117809984
2732.64066796307
You see that there are no zero coefficients because pinv finds the solution using the L2-norm, which promotes the "spreading" out of the errors (for a lack of a better term). You can verify that these are correct regression coefficients by doing:
>> Y2 = X*B
Y2 =
16.1491563400241
16.1264219600856
16.525331600049
17.3170318001845
16.7481541301999
17.3266932502295
16.5465094100486
16.5184456100487
16.8428701100165
17.0749421099829
16.7393450000517
17.2993993099419
17.3925811702017
17.3347117202356
17.3362798302375
17.3184486799219
17.1169638102517
17.2813552099096
16.8792925100727
17.2557945601102
17.501873690151
17.6490477001922
17.7733493802508
Similarly, if we used the regression coefficients from regress, so B = regress(Y,X); then doing Y2 = X*B, we get:
Y2 =
16.1491563399927
16.1264219599996
16.5253315999987
17.3170317999969
16.7481541299967
17.3266932499992
16.5465094099978
16.5184456099983
16.8428701099975
17.0749421099985
16.7393449999981
17.2993993099983
17.3925811699993
17.3347117199991
17.3362798299967
17.3184486799987
17.1169638100025
17.281355209999
16.8792925099983
17.2557945599979
17.5018736899983
17.6490476999977
17.7733493799981
There are some slight computational differences, which is to be expected. Similarly, we can also find the answer by using mldivide:
B = X \ Y
B =
0
0
28321.726441712
0
35241.9719075889
899.386999170666
-95491.6154989513
-2879.96318251572
-31375.7038251485
5993.52959751295
0
18312.6649114859
0
0
8031.43912336425
27923.2569044349
7716.51932559712
-13621.1638586983
36721.8387047123
80622.0849068411
-114048.707779954
-70838.6034824987
-22843.7931997086
5345.06937206919
0
106542.307940158
-14178.0346010521
-20506.8096165825
-2498.51437396236
6783.31072430201
You can see that this curiously matches up with what regress gives you. That's because \ is a more smarter operator. Depending on how your matrix is structured, it finds the solution to the system by a different method. I'd like to defer you to the post by Amro that talks about what algorithms mldivide uses when examining the properties of the input matrix being operated on:
How to implement Matlab's mldivide (a.k.a. the backslash operator "\")
What you should take away from this answer is that you can certainly go ahead and use those regression coefficients and they will more or less give you the expected output for each value of Y with each set of inputs for X. However, be warned that those coefficients are not unique. This is apparent as you said that you have ground truth coefficients that don't match up with the output of regress. It isn't matching up because it generated another answer that satisfies the constraints you have provided.
There is more than one answer that can describe that relationship if you have an underdetermined system, as you have seen by my experiments shown above.