I've been trying to create a script that performs the non-restoring division algorithm and also prints what it's doing at every step, with the values each 'register' holds at that moment. Without going into too much detail about the algorithm: in a particular step, if the MSB of A is 1 (i.e. A is negative), we left shift A-Q together and then add the value of M to A; if the MSB of A is 0, we left shift A-Q and subtract the value of M.
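A rough MATLAB sketch of that single step (the register widths, initial values, and the bitget/bitset handling of the combined A-Q shift are illustrative assumptions, not my actual script):
% One iteration of the shift-then-add/subtract step (illustrative only)
A = fi(0, 1, 5, 0, 'OverflowAction', 'Wrap');   % signed accumulator
Q = fi(6, 0, 4, 0, 'OverflowAction', 'Wrap');   % dividend / quotient bits
M = fi(3, 1, 5, 0, 'OverflowAction', 'Wrap');   % divisor
msbA = bitget(A, A.WordLength);                 % sign bit of A
qmsb = bitget(Q, Q.WordLength);                 % bit shifted from Q into A
A = bitshift(A, 1);                             % left shift A-Q as one register
A = bitset(A, 1, double(qmsb));
Q = bitshift(Q, 1);
if msbA == 1
    A = accumpos(A, M);                         % A was negative: add M
else
    A = accumneg(A, M);                         % A was non-negative: subtract M
end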
I'm aware that when OverflowAction is set to Wrap, 1010 shifted left becomes 0100, instead of 1111 as with the default Saturate. In my actual script, both Q and A are defined as unsigned with this property set, and Q shows the proper behaviour. But when I subtract from A so that it goes 'negative' by wrapping around, and then shift it, it acts as if OverflowAction were set to Saturate (i.e. 1010 becomes 1111). I was able to replicate this issue separately with a test script as well, where shifting was fine before any operation that makes the unsigned number wrap around.
This had a very simple fix, too: all I had to do was make A signed. But I'd like to know why the original case wasn't working in the first place. Any ideas?
Also, here's the test script:
X = fi(5,1,8,0,'OverflowAction','Wrap');
Y = fi(5,0,8,0,'OverflowAction','Wrap');
fprintf('X %s \n',dec2bin(X,8))
X = accumneg(X,6);
fprintf('X %s \n',dec2bin(X,8))
X = bitshift(X,1);
fprintf('X %s \n',dec2bin(X,8))
X = bitshift(X,1);
fprintf('X %s \n',dec2bin(X,8))
X = bitshift(X,1);
fprintf('X %s \n',dec2bin(X,8))
fprintf('Y %s \n',dec2bin(Y,8))
Y = accumneg(Y,6);
fprintf('Y %s \n',dec2bin(Y,8))
Y = bitshift(Y,1);
fprintf('Y %s \n',dec2bin(Y,8))
Y = bitshift(Y,1);
fprintf('Y %s \n',dec2bin(Y,8))
Y = bitshift(Y,1);
fprintf('Y %s \n',dec2bin(Y,8))
And here's the output:
>> test
X 00000101
X 11111111
X 11111110
X 11111100
X 11111000
Y 00000101
Y 11111111
Y 11111111
Y 11111111
Y 11111111
I'm new to Matlab and want to write a program that chooses the value of a parameter (P) to minimize the difference between two vectors, where each vector is a variable in a dataframe. The first vector (call it A) is a predetermined vector of 1s and 0s, and the second vector (call it B) has each of its entries determined by an indicator function that depends on the value of the parameter P and other variables in the dataframe. For instance, let C be a third variable in the dataset, so
A = [1, 0, 0, 1, 0]
B = [x, y, z, u, v]
where x = 1 if (C[1]+10)^0.5 - P > (C[1])^0.5 and otherwise x = 0, and similarly, y = 1 if (C[2]+10)^0.5 - P > (C[2])^0.5 and otherwise y = 0, and so on.
I'm not really sure where to start with the code, except that it might be useful to use the fminsearch command. Any suggestions?
Edit: I changed the above by raising to a power, which is closer to the actual example that I have. I'm also providing a complete example in response to a comment:
Let A be as above, and let C = [10, 1, 100, 1000, 1]. Then my goal with the Matlab code would be to choose a value of P to minimize the differences between the coordinates of the vectors A and B, where B[1] = 1 if (10+10)^0.5 - P > (10)^0.5 and otherwise B[1] = 0, and similarly B[2] = 1 if (1+10)^0.5 - P > (1)^0.5 and otherwise B[2] = 0, etc. So I want to choose P to maximize the likelihood that A[1] = B[1], A[2] = B[2], etc.
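For concreteness, B for a trial value of P can be built in one vectorized line (P = 1 below is just an arbitrary test value, not part of my actual problem):
A = [1, 0, 0, 1, 0];
C = [10, 1, 100, 1000, 1];
P = 1;                                    % arbitrary trial value
B = double((C + 10).^0.5 - P > C.^0.5);   % elementwise indicator
mismatches = sum(A ~= B)                  % number of entries that disagree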
I have the following setup in Matlab, where ds is the name of my dataset:
ds.B = zeros(size(ds,1),1); % empty vector to fill
for i = 1:size(ds,1)
    if ((ds.C(i) + 10)^(0.5) - P > (ds.C(i))^(0.5))
        ds.B(i) = 1;
    else
        ds.B(i) = 0;
    end
end
Now I want to choose the value of P to minimize the difference between A and B. How can I do this?
EDIT: I'm also wondering how to do this when the inequality is something like (C[i]+10)^0.5 - P*D[i] > (C[i])^0.5, where D is another variable in my dataset. Now P is a scalar being multiplied rather than just added. This seems more complicated since I can't solve for P exactly. How can I solve the problem in this case?
EDIT 1: It seems fminbnd() isn't optimal, likely due to the stairstep nature of the indicator function. I've updated to test the midpoints of all the regions between indicator function flips, plus endpoints.
EDIT 2: Updated to include dataset D as a coefficient of P.
If you can package your distance calculation up in a single function based on P, you can then search for its minimum.
arraySize = 1000;
ds.A = double(rand([arraySize,1]) > 0.5);
ds.C = rand(size(ds.A));
ds.D = rand(size(ds.A));
B = @(P)double((ds.C+10).^0.5 - P.*ds.D > ds.C.^0.5);
costFcn = @(P)sqrt(sum((ds.A-B(P)).^2));
% Solving the equation (C+10)^0.5 - P*D = C^0.5 for P, and sorting the results
BCrossingPoints = sort(((ds.C+10).^0.5-ds.C.^0.5)./ds.D);
% Taking the average of each crossing point with its neighbors
BMidpoints = (BCrossingPoints(1:end-1)+BCrossingPoints(2:end))/2;
% Appending endpoints onto the midpoints
PsToTest = [BCrossingPoints(1)-0.1; BMidpoints; BCrossingPoints(end)+0.1];
% Calculate the distance from A to B at each P to test
costResult = arrayfun(costFcn,PsToTest);
% Find the minimum cost
[~,lowestCostIndex] = min(costResult);
% Find the optimum P
optimumP = PsToTest(lowestCostIndex);
ds.B = B(optimumP);
semilogx(PsToTest,costResult)
xlabel('P')
ylabel('Distance from A to B')
1.- x is assumed to be positive real only, because for x<0 complex values show up.
Since no comment is made in the question, it seems reasonable to assume x real and x>0 only.
As requested, P 'the parameter' is a scalar. P only has 2 significant states, >0 or <0; let's see why this is:
2.- The following lines generate kind-of random A and C.
Then a sweep of p is carried out and the distances d1 and d2 are calculated.
d1 is the Euclidean distance, and d2 is the absolute value of the difference between A and B after converting both from binary to decimal:
N=10
% A=[1 0 0 1 0]
A=randi([0 1],1,N);
% C=[10 1 1e2 1e3 1]
C=randi([0 1e3],1,N)
p=[-1e4:1:1e4]; % parameter to optimize
B=zeros(1,numel(A));
d1=zeros(1,numel(p)); % euclidean distance
d2=zeros(1,numel(p)); % difference distance
for k1=1:1:numel(p)
    B=(C+10).^.5-p(k1)>C.^.5;
    d1(k1)=(sum((B-A).^2))^.5;
    d2(k1)=abs(sum(A.*2.^[numel(A)-1:-1:0])-sum(B.*2.^[numel(A)-1:-1:0]));
end
figure;
plot(p,d1)
grid on
xlabel('p');title('d1')
figure
plot(p,d2)
grid on
xlabel('p');title('d2')
The only degree of freedom to optimise seems to be the sign of P, regardless of the value of |P|.
3.- f(p,x) has either no root or just one root, depending upon p
The threshold function is
if f(x)>0 then B(k)==1 else B(k)==0
where
f(p,x)=(x+10)^.5-p-x^.5
Now
(x+10).^.5-p>x.^.5 is the same as (x+10).^.5-x.^.5>p
There's a range of p for which f(p,x)=0 has no (real) root.
For the particular case p=0, the curves (x+10).^.5 and x.^.5 never intersect for any finite x:
x=0:.1:100; % plotting range assumed
figure;plot(x,(x+10).^.5,x,x.^.5);grid on
y2=diff((x+10).^.5-x.^.5)
figure;plot(x(2:end),y2);
grid on;xlabel('x')
title('y2=diff((x+10).^.5-x.^.5)')
Over that range of p, the condition f(x)>0 is always true, holding all bits of B at 1. With B all ones, d(A,B) turns into d(A,1), a constant.
However, for p in between (0<p<10^.5) there is exactly one root, and once p>=10^.5, f(x)>0 is false for every x, keeping all bits of B at 0.
In this case the cost function d(A,B) turns into d(A,0), which is determined by A itself.
4.- P as a vector
The optimization gains degrees of freedom if, instead of a scalar, P is considered as a vector.
For a given x there's a value of p at which B(k) switches: any p below the threshold (x+10)^.5-x^.5 gives B(k)=1, and any p above it gives B(k)=0.
Equivalently, inverting f(x) (solving (x+10)^.5-x^.5=p for x gives x^.5=(10-p^2)/(2p)):
g(p)=(10-p^2)^2/(4*p^2)>x
that is, B(k)=1 exactly when x is below this threshold, so choosing each element of P on the right side of its own threshold flips the corresponding element of B to the value of the matching element of A.
Therefore, it's convenient to consider P as a vector, not a scalar, and to have all, or as many as possible, elements of C meet c(k)<(10-p(k)^2)^2/(4*p(k)^2) wherever a(k)=1 (and the opposite wherever a(k)=0), in order to get B=A, or at least to
minimize d(A,B)
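A quick sketch of this per-element choice, reusing the A and C generated above (the 0.1 offsets are arbitrary):
thr=(C+10).^.5-C.^.5;      % per-element flip threshold for p
P=zeros(1,numel(C));
P(A==1)=thr(A==1)-0.1;     % p below threshold -> B(k)=1
P(A==0)=thr(A==0)+0.1;     % p above threshold -> B(k)=0
B=(C+10).^.5-P>C.^.5;      % B now equals A, so d(A,B)=0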
5.- roots of f(p,x)
syms t positive
p=[-1000:.1:1000];
zp=NaN*ones(1,numel(p));
sol=zeros(1,numel(p));
for k1=1:1:numel(p)
    eq1=(t+10)^.5-p(k1)-t^.5==0;
    s1=solve(eq1,t);
    if ~isempty(s1)
        zp(k1)=double(s1);
    end
end
nzp=~isnan(zp);
zp(nzp)
returns
ans =
 2495.0025  620.0100  272.8003  151.2900   95.0625   64.5344
   46.1429   34.2225   26.0667   20.2500   15.9637   12.7211
   10.2154    8.2451    6.6736    5.4056    4.3730    3.5260
    2.8277    2.2500    1.7714    1.3753    1.0484    0.7803
    0.5625    0.3882    0.2519    0.1488    0.0752    0.0278
    0.0040
I'm in the process of implementing the Threefish block cipher in MATLAB. At first, I implemented the algorithm on uint8 numbers to validate my code. Everything was OK and the decryption was successful. But when I switched the numbers to uint64, the plaintext was not retrieved correctly.
I traced the round results over and over again to find the reason, but I couldn't find it so far. There is a difference in the last few digits between encryption and decryption; that is, along the rounds x is encrypted as 9824265115183455531, but it decrypts as 9824265115183455488.
I think the reason behind this difference is in the functions AddMod64 and SubMod64, which compute arithmetic modulo 2 to the power 64, but I really could not fix it so far.
I know that
double(2^64) = 18446744073709552000
and
uint64(2^64) = 18446744073709551615
% z = ( x + y ) % 2^64
function z = AddMod64(x , y)
    m = uint64(2^64);
    z = double(mod(mod(double(x),m)+mod(double(y),m),m));
end
% z = (x - y ) % 2^64
function z = SubMod64(x , y)
    m = uint64(2^64);
    z = double(mod(mod(double(x),m) - mod(double(y),m),m));
end
double(2^64) is already the wrong result; the double type can hold integers exactly only up to 2^53, so anything larger gets rounded.
Also, when you do uint64(2^64), the power is computed in double precision, giving the wrong result, which you then cast to uint64. And because the maximum value that a uint64 can hold is 2^64-1, that whole operation is wrong.
Use intmax instead:
m = intmax('uint64');
Doing modular addition in MATLAB is rather tricky, because MATLAB does saturating arithmetic with integers. You need to test for overflow before doing the computation:
if x > m - y
    x = y - (m - x + 1);
else
    x = x + y;
end
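Putting the test to work, a minimal sketch of both helpers staying in uint64 throughout (my own sketch, not verified against the Threefish reference implementation):
% z = (x + y) mod 2^64; x and y assumed to be uint64 already
function z = AddMod64(x, y)
    m = intmax('uint64');
    if x > m - y
        z = y - (m - x) - 1;   % wrapped result: x + y - 2^64
    else
        z = x + y;
    end
end
% z = (x - y) mod 2^64
function z = SubMod64(x, y)
    m = intmax('uint64');
    if x >= y
        z = x - y;
    else
        z = x + (m - y) + 1;   % wrapped result: x - y + 2^64
    end
end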
I started learning Julia not long ago, and I decided to do a simple comparison between Julia and Matlab on a simple piece of code for computing Euclidean distance matrices from a set of high-dimensional points.
The task is simple and can be divided into two cases:
Case 1: Given two datasets in the form of n x d matrices, say X1 and X2, compute the pairwise Euclidean distance between each point in X1 and all the points in X2. If X1 is of size n1 x d, and X2 is of size n2 x d, then the resulting Euclidean distance matrix D will be of size n1 x n2. In the general setting, matrix D is not symmetric, and its diagonal elements are not equal to zero.
Case 2: Given one dataset in the form of an n x d matrix X, compute the pairwise Euclidean distance between all the n points in X. The resulting Euclidean distance matrix D will be of size n x n, symmetric, with zero elements on the main diagonal.
My implementation of these functions in Matlab and in Julia is given below. Note that none of the implementations rely on loops of any sort, but rather simple linear algebra operations. Also, note that the implementation using both languages is very similar.
My expectation before running any tests for these implementations was that the Julia code would be much faster than the Matlab code, and by a significant margin. To my surprise, this was not the case!
The parameters for my experiments are given below with the code. My machine is a MacBook Pro. (15" Mid 2015) with 2.8 GHz Intel Core i7 (Quad Core), and 16 GB 1600 MHz DDR3.
Matlab version: R2018a
Julia version: 0.6.3
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
The results are given in Table (1) below.
Table 1: Average time in seconds (with standard deviation) over 30 trials for computing Euclidean distance matrices between two different datasets (Col. 1), and between all pairs of points in one dataset (Col. 2).
           Two Datasets         One Dataset
Matlab:    2.68 (0.12) sec.     1.88 (0.04) sec.
Julia V1:  5.38 (0.17) sec.     4.74 (0.05) sec.
Julia V2:  5.2 (0.1) sec.       -
I was not expecting this significant a difference between the two languages. I expected Julia to be faster than Matlab, or at least as fast. It was really a surprise to see that Matlab is almost 2.5 times faster than Julia in this particular task. I didn't want to draw any early conclusions based on these results, for a few reasons.
First, while I think that my Matlab implementation is as good as it can be, I'm wondering whether my Julia implementation is the best one for this task. I'm still learning Julia and I hope there is a more efficient Julia code that can yield faster computation time for this task. In particular, where is the main bottleneck for Julia in this task? Or, why does Matlab have an edge in this case?
Second, my current Julia package is based on the generic and standard BLAS and LAPACK packages for MacOS. I'm wondering whether JuliaPro with BLAS and LAPACK based on Intel MKL will be faster than the current version I'm using. This is why I opted to get some feedback from more knowledgeable people on StackOverflow.
The third reason is that I'm wondering whether the compile time for Julia was
included in the timings shown in Table 1 (2nd and 3rd rows), and whether there is a better way to assess the execution time for a function.
I will appreciate any feedback on my previous three questions.
Thank you!
Note: This question has been identified as a possible duplicate of another question on StackOverflow. However, this is not entirely accurate. This question has three aspects, as reflected by the answers below. First, yes, one part of the question is related to the comparison of OpenBLAS vs. MKL. Second, it turns out that the implementation can be improved as well, as shown by one of the answers. And last, benchmarking the Julia code itself can be improved by using BenchmarkTools.jl.
MATLAB
num_trials = 30;
dim = 1000;
n1 = 10000;
n2 = 10000;
T = zeros(num_trials,1);
XX1 = randn(n1,dim);
XX2 = rand(n2,dim);
%%% DIFFERENT MATRICES
DD2ds = zeros(n1,n2);
for (i = 1:num_trials)
tic;
DD2ds = distmat_euc2ds(XX1,XX2);
T(i) = toc;
end
mt = mean(T);
st = std(T);
fprintf(1,'\nDifferent Matrices:: dim: %d, n1 x n2: %d x %d -> Avg. Time %f (+- %f) \n',dim,n1,n2,mt,st);
%%% SAME Matrix
T = zeros(num_trials,1);
DD1ds = zeros(n1,n1);
for (i = 1:num_trials)
tic;
DD1ds = distmat_euc1ds(XX1);
T(i) = toc;
end
mt = mean(T);
st = std(T);
fprintf(1,'\nSame Matrix:: dim: %d, n1 x n1 : %d x %d -> Avg. Time %f (+- %f) \n\n',dim,n1,n1,mt,st);
distmat_euc2ds.m
function [DD] = distmat_euc2ds (XX1,XX2)
n1 = size(XX1,1);
n2 = size(XX2,1);
DD = sqrt(ones(n1,1)*sum(XX2.^2.0,2)' + (ones(n2,1)*sum(XX1.^2.0,2)')' - 2.*XX1*XX2');
end
distmat_euc1ds.m
function [DD] = distmat_euc1ds (XX)
n1 = size(XX,1);
GG = XX*XX';
DD = sqrt(ones(n1,1)*diag(GG)' + diag(GG)*ones(1,n1) - 2.*GG);
end
JULIA
include("distmat_euc.jl")
num_trials = 30;
dim = 1000;
n1 = 10000;
n2 = 10000;
T = zeros(num_trials);
XX1 = randn(n1,dim)
XX2 = rand(n2,dim)
DD = zeros(n1,n2)
# Euclidean Distance Matrix: Two Different Matrices V1
# ====================================================
for i = 1:num_trials
tic()
DD = distmat_eucv1(XX1,XX2)
T[i] = toq();
end
mt = mean(T)
st = std(T)
println("Different Matrices V1:: dim:$dim, n1 x n2: $n1 x $n2 -> Avg. Time $mt (+- $st)")
# Euclidean Distance Matrix: Two Different Matrices V2
# ====================================================
for i = 1:num_trials
tic()
DD = distmat_eucv2(XX1,XX2)
T[i] = toq();
end
mt = mean(T)
st = std(T)
println("Different Matrices V2:: dim:$dim, n1 x n2: $n1 x $n2 -> Avg. Time $mt (+- $st)")
# Euclidean Distance Matrix: Same Matrix V1
# =========================================
for i = 1:num_trials
tic()
DD = distmat_eucv1(XX1)
T[i] = toq();
end
mt = mean(T)
st = std(T)
println("Same Matrix V1:: dim:$dim, n1 x n2: $n1 x $n2 -> Avg. Time $mt (+- $st)")
distmat_euc.jl
function distmat_eucv1(XX1::Array{Float64,2},XX2::Array{Float64,2})
(num1,dim1) = size(XX1)
(num2,dim2) = size(XX2)
if (dim1 != dim2)
error("Matrices' 2nd dimensions must agree!")
end
DD = sqrt.((ones(num1)*sum(XX2.^2.0,2)') +
(ones(num2)*sum(XX1.^2.0,2)')' - 2.0.*XX1*XX2');
end
function distmat_eucv2(XX1::Array{Float64,2},XX2::Array{Float64,2})
(num1,dim1) = size(XX1)
(num2,dim2) = size(XX2)
if (dim1 != dim2)
error("Matrices' 2nd dimensions must agree!")
end
DD = (ones(num1)*sum(Base.FastMath.pow_fast.(XX2,2.0),2)') +
(ones(num2)*sum(Base.FastMath.pow_fast.(XX1,2.0),2)')' -
Base.LinAlg.BLAS.gemm('N','T',2.0,XX1,XX2);
DD = Base.FastMath.sqrt_fast.(DD)
end
function distmat_eucv1(XX::Array{Float64,2})
n = size(XX,1)
GG = XX*XX';
DD = sqrt.(ones(n)*diag(GG)' + diag(GG)*ones(1,n) - 2.0.*GG)
end
First question: If I rewrite the Julia distance function like so:
function dist2(X1::Matrix, X2::Matrix)
size(X1, 2) != size(X2, 2) && error("Matrices' 2nd dimensions must agree!")
return sqrt.(sum(abs2, X1, 2) .+ sum(abs2, X2, 2)' .- 2 .* (X1 * X2'))
end
I shave >40% off the execution time.
For a single dataset you can save a bit more, like this:
function dist2(X::Matrix)
G = X * X'
dG = diag(G)
return sqrt.(dG .+ dG' .- 2 .* G)
end
Third question: You should do your benchmarking with BenchmarkTools.jl, and perform the benchmarking like this (remember $ for variable interpolation):
julia> using BenchmarkTools
julia> @btime dist2($XX1, $XX2);
Additionally, you should not do powers using floats, like this: X.^2.0. It is faster, and equally correct to write X.^2.
For multiplication there is no speed difference between 2.0 .* X and 2 .* X, but you should still prefer using an integer, because it is more generic. As an example, if X has Float32 elements, multiplying with 2.0 will promote the array to Float64s, while multiplying with 2 will preserve the eltype.
And finally, note that in new versions of Matlab, too, you can get broadcasting behaviour by simply adding Mx1 arrays with 1xN arrays. There is no need to first expand them by multiplying with ones(...).
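For example, a sketch of the two-dataset function written that way (the function name is mine; requires R2016b or newer, untested):
function DD = distmat_euc2ds_bcast(XX1, XX2)
% n1-by-1 column plus 1-by-n2 row implicitly expands to n1-by-n2
DD = sqrt(sum(XX1.^2, 2) + sum(XX2.^2, 2)' - 2*(XX1*XX2'));
end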
I am trying to implement the Hack ALU without using muxes, but I can't upload the HDL into the simulator. Any help would be appreciated. Thanks.
CHIP ALU {
IN
x[16], y[16], // 16-bit inputs
zx, // zero the x input?
nx, // negate the x input?
zy, // zero the y input?
ny, // negate the y input?
f, // compute out = x + y (if 1) or x & y (if 0)
no; // negate the out output?
OUT
out[16], // 16-bit output
zr, // 1 if (out == 0), 0 otherwise
ng; // 1 if (out < 0), 0 otherwise
PARTS:
// Put your code here:
//To zero x or not
Not(in=zx, out=notzx);
And16(a=x, b[0..15]=notzx, out=zerox);
//To zero y or not
Not(in=zy, out=notzy);
And16(a=y, b[0..15]=notzy, out=zeroy);
//Negate x or not
Xor16(a=zerox, b[0..15]=nx, out=negatex);
//Negate y or not
Xor16(a=zeroy, b[0..15]=ny, out=negatey);
Not(in=f, out=fnot);
//"and" or "add" x?
And16(a=negatex, b[0..15]=f, out=addx);
And16(a=negatex, b[0..15]=fnot, out=andx);
//"and" or "add" y
And16(a=negatey, b[0..15]=f, out=addy);
And16(a=negatey, b[0..15]=fnot, out=andy);
//adding x and y
Add16(a=addx, b=addy, out=sum);
//anding x and y
And16(a=andx, b=andy, out=outxandy);
//output of adding or anding
Or16(a=sum, b=outxandy, out=out1);
//Negating using "Xor"
Xor16(a=out1, b[0..15]=no, out=out2);
Not(in=out2[15], out=ng);
Or8Way(in=out2[0..7], out=zr1);
Or8Way(in=out2[8..15], out=zr2);
Or(a=zr1, b=zr2, out=zr);
And16(a=out2, b=out2, out=out);
Inputs like b[0..15]=notzx are not valid in HDL because the two sides of the assignment do not have the same width. The only values that have variable width are true and false.
You might try something like:
And16(a=x, b[0]=notzx, b[1]=notzx, ... , b[15]=notzx, out=zerox);
I'm trying to calculate how the errors depend on the step, h, for the trapezoidal rule. The errors should get smaller with a smaller value of h, but for me this doesn't happen. This is my code:
Iref is a reference value, calculated with Simpson's method and verified with the MATLAB function quad.
for h = 0.01:0.1:1
x = a:h:b;
v = y(x);
Itrap = (sum(v)-v(1)/2-v(end)/2)*h;
Error = abs(Itrap-Iref)
end
I think there's something wrong with the way I'm using h, because the trapezoidal rule works for known integrals. I would be really happy if someone could help me with this, because I can't understand why the errors are "jumping around" the way they do.
I wonder if part of the problem is that, for each step size h, not all intervals have the same a and b, simply because of the way that x is constructed. Try the following with the additional fprintf statement:
for h = 0.01:0.1:1
x = a:h:b;
fprintf('a=%f b=%f\n',x(1),x(end));
v = y(x);
Itrap = (sum(v)-v(1)/2-v(end)/2)*h;
Error = abs(Itrap-Iref);
end
Depending upon your a and b (I chose a=0 and b=5), all the a values were identical (as expected), but the effective b varied from 4.55 to 5.0.
I think you always want to keep the interval [a,b] the same for each step size in order to get a fair comparison between iterations. So rather than iterating over the step size, you could instead iterate over n, the number of equally spaced points within [a,b].
Rather than
for h = 0.01:0.1:1
x = a:h:b;
you could do something more like
% iterate over each value of n, chosen so that the step size
% is similar to what you had before
for n = [501 46 24 17 13 10 9 8 7 6]
% create an equally spaced vector of n numbers between a and b
x = linspace(a,b,n);
% get the step delta
h = x(2)-x(1);
v = y(x);
Itrap = (sum(v)-v(1)/2-v(end)/2)*h;
Error = abs(Itrap-Iref);
fprintf('a=%f b=%f len=%d h=%f Error=%f\n',x(1),x(end),length(x),h,Error);
end
When you evaluate the above code, you will notice that a and b are consistent for each iteration, h is roughly what you chose before, and the Error does increase as the step size increases.
Try the above and see what happens!