var b0 = UInt32(block[block.startIndex + 0 + (0 << 2)]) << 0 | UInt32(block[block.startIndex + 1 + (0 << 2)]) << 8 | UInt32(block[block.startIndex + 2 + (0 << 2)]) << 16
b0 = b0 | UInt32(block[block.startIndex + 3 + (0 << 2)]) << 24
var b1 = UInt32(block[block.startIndex + 0 + (1 << 2)]) << 0 | UInt32(block[block.startIndex + 1 + (1 << 2)]) << 8 | UInt32(block[block.startIndex + 2 + (1 << 2)]) << 16
b1 = b1 | UInt32(block[block.startIndex + 3 + (1 << 2)]) << 24
var b2 = UInt32(block[block.startIndex + 0 + (2 << 2)]) << 0 | UInt32(block[block.startIndex + 1 + (2 << 2)]) << 8 | UInt32(block[block.startIndex + 2 + (2 << 2)]) << 16
b2 = b2 | UInt32(block[block.startIndex + 3 + (2 << 2)]) << 24
var b3 = UInt32(block[block.startIndex + 0 + (3 << 2)]) << 0 | UInt32(block[block.startIndex + 1 + (3 << 2)]) << 8 | UInt32(block[block.startIndex + 2 + (3 << 2)]) << 16
b3 = b3 | UInt32(block[block.startIndex + 3 + (3 << 2)]) << 24
If you just format this code properly, you'll see there's a very clear pattern:
let start = block.startIndex
let b0 = UInt32(block[start + 0 + (0 << 2)]) << 0
| UInt32(block[start + 1 + (0 << 2)]) << 8
| UInt32(block[start + 2 + (0 << 2)]) << 16
| UInt32(block[start + 3 + (0 << 2)]) << 24
let b1 = UInt32(block[start + 0 + (1 << 2)]) << 0
| UInt32(block[start + 1 + (1 << 2)]) << 8
| UInt32(block[start + 2 + (1 << 2)]) << 16
| UInt32(block[start + 3 + (1 << 2)]) << 24
let b2 = UInt32(block[start + 0 + (2 << 2)]) << 0
| UInt32(block[start + 1 + (2 << 2)]) << 8
| UInt32(block[start + 2 + (2 << 2)]) << 16
| UInt32(block[start + 3 + (2 << 2)]) << 24
let b3 = UInt32(block[start + 0 + (3 << 2)]) << 0
| UInt32(block[start + 1 + (3 << 2)]) << 8
| UInt32(block[start + 2 + (3 << 2)]) << 16
| UInt32(block[start + 3 + (3 << 2)]) << 24
Each b constant is just the numbers 0...3 transformed in similar ways, all bitwise-OR'ed together. Sounds like a job for map/reduce:
let start = block.startIndex
let b0 = (0...3).lazy.map { UInt32(block[start + $0 + (0 << 2)]) << ($0 * 8) }.reduce(0, |)
let b1 = (0...3).lazy.map { UInt32(block[start + $0 + (1 << 2)]) << ($0 * 8) }.reduce(0, |)
let b2 = (0...3).lazy.map { UInt32(block[start + $0 + (2 << 2)]) << ($0 * 8) }.reduce(0, |)
let b3 = (0...3).lazy.map { UInt32(block[start + $0 + (3 << 2)]) << ($0 * 8) }.reduce(0, |)
This can be simplified even further if you make b an array of 4 elements, rather than 4 separate b# variables:
let start = block.startIndex
let b = (0...3).map { x -> UInt32 in
    // I don't know what the number x represents, so I just named it x. Give it a better name.
    return (0...3).lazy
        .map { UInt32(block[start + $0 + (x << 2)]) << ($0 * 8) }
        .reduce(0, |)
}
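For comparison, here is a rough sketch of the same shift-and-OR pattern in Python. It is not part of the original Swift code: the function name `little_endian_words` and the sample 16-byte block are made up for illustration. The standard-library `struct.unpack` call is only there to cross-check that the pattern amounts to decoding four little-endian 32-bit words.

```python
import struct
from functools import reduce
from operator import or_


def little_endian_words(block, start=0):
    """Assemble four 32-bit words from 16 bytes, mirroring the map/reduce above.

    Word x uses the bytes at start + i + (x << 2) for i in 0...3, each
    shifted left by i * 8 bits and OR'ed together.
    """
    return [
        reduce(or_, (block[start + i + (x << 2)] << (i * 8) for i in range(4)), 0)
        for x in range(4)
    ]


# Hypothetical sample block: the bytes 0x00 through 0x0f.
sample = bytes(range(16))
words = little_endian_words(sample)
print([hex(w) for w in words])

# Cross-check: the pattern is ordinary little-endian decoding of four UInt32s.
assert words == list(struct.unpack("<4I", sample))
```

Note that the byte offset uses the fixed shift `x << 2` (that is, `x * 4`), while only the bit shift depends on the inner index.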
I have these equations:
syms pm pr teta s
A1 = -2 * b1 * pm + 2 * b2 * pr + b * teta + (1-t) * s + (1-p) * a + c * (b1 - b2);
A2 = 2 * b2 * pm + 2 * b1 * pr + (1-b) * teta + t * s + p * a + c * (b1 - b2);
A3 = b * pm + (1-b) * pr - n * teta - c;
A4 = (1-t) * pm + t * pr - k * s - c;
eqns = [A1,A2,A3,A4];
F=@(pm, pr, teta, s) [A1
A2
A3
A4];
x0 = [10, 10, 10, 10];
fsolve(F, x0)
How can I solve them?
(When I use fsolve, it shows this error: FSOLVE requires all values returned by functions to be of data type double)
Since you tagged Mathematica, here is a symbolic solution:
A1 = -2*b1*pm + 2*b2*pr + b*teta + (1 - t)*s + (1 - p)*a + c*(b1 - b2);
A2 = 2*b2*pm + 2*b1*pr + (1 - b)*teta + t*s + p*a + c*(b1 - b2);
A3 = b*pm + (1 - b)*pr - n*teta - c;
A4 = (1 - t)*pm + t*pr - k*s - c;
FullSimplify[Solve[{A1 == 10, A2 == 10, A3 == 10, A4 == 10}, {pm, pr, teta, s}]]
pm -> ((k ((-1 + b)^2 + 2 b1 n) + n t^2) (b (10 + c) k +
n (10 + c + 10 k - a k - b1 c k + b2 c k +
a k p - (10 + c) t)) + ((-1 + b) (10 + c) k +
k n (-10 + b1 c - b2 c + a p) - (10 + c) n t) (b k - b^2 k +
n (2 b2 k + t - t^2)))/(k n (1 - 2 b1 k + b^2 (1 + 4 b2 k) +
2 (b1 - 2 (b1^2 + b2^2) k) n - 2 t -
4 (b1 + b2) n t + (1 + 4 b2 n) t^2 +
2 b (-1 + 2 b1 k - 2 b2 k + t)))
pr -> (c +
b^2 (10 + c - (-20 + a + 2 b1 c - 2 b2 c) k) + 2 b1^2 c k n -
b2 (20 + c - 2 (-10 + a) k + 2 b2 c k) n - a (1 + 2 b2 k) n p -
2 b1 k (10 + c + n (10 - a p)) + 10 (1 + n - 2 t) -
c (2 + b2 n) t +
n (-30 + a + 20 b2 + a p) t + (10 + c - (-20 + a) n +
2 b2 c n) t^2 - b1 n (c + 20 t + c t (-1 + 2 t)) +
b (k (-10 + a + 20 b1 - 20 b2 - a p) + 20 (-1 + t) +
c (-2 + 3 b1 k - 3 b2 k + 2 t)))/(1 - 2 b1 k +
b^2 (1 + 4 b2 k) + 2 (b1 - 2 (b1^2 + b2^2) k) n - 2 t -
4 (b1 + b2) n t + (1 + 4 b2 n) t^2 +
2 b (-1 + 2 b1 k - 2 b2 k + t))
teta -> (10 - 20 b2 - b2 c -
20 b2 k + 2 a b2 k + 40 b2^2 k + 2 b2^2 c k +
2 b1^2 (20 + 3 c) k - a p -
2 a b2 k p + (-30 + 3 b2 (20 + c) + a (1 + p)) t - (-20 + a +
2 b2 (20 + c)) t^2 +
b (-10 - 4 b1^2 c k + a p +
b1 (20 + 40 k - 2 a k + c (3 + 4 b2 k - 2 t)) + 20 t - a t +
b2 (20 + c - 2 a k + 4 a k p - 2 (20 + c) t)) +
b1 (2 k (-10 + a p) + 20 (-1 + t) + c (-3 + (5 - 2 t) t)))/(1 -
2 b1 k + b^2 (1 + 4 b2 k) + 2 (b1 - 2 (b1^2 + b2^2) k) n - 2 t -
4 (b1 + b2) n t + (1 + 4 b2 n) t^2 +
2 b (-1 + 2 b1 k - 2 b2 k + t))
s -> (20 b1 + b1 c - b2 c -
2 b^2 (-10 + b1 c + b2 (20 + c)) + 20 b1 n + 40 b1^2 n -
20 b2 n + 40 b2^2 n + 2 b1^2 c n + 4 b1 b2 c n +
2 b2^2 c n - (b1 - b2) (20 + c) t +
4 b1 (-10 + b1 c - b2 c) n t - 10 (-1 + 2 b2 + t) +
a (-1 - b^2 - 2 b1 n + p + 2 (b1 + b2) n p + t - p t +
2 n (b1 + b2 - 2 b2 p) t - b (-2 + p + t)) +
b (-(-10 + b2 (20 + c)) (-3 + 2 t) + b1 (-20 + c - 2 c t)))/(1 -
2 b1 k + b^2 (1 + 4 b2 k) + 2 (b1 - 2 (b1^2 + b2^2) k) n - 2 t -
4 (b1 + b2) n t + (1 + 4 b2 n) t^2 +
2 b (-1 + 2 b1 k - 2 b2 k + t))
I have a serious problem with a piece of code that is essential to me, and I need a solution as soon as possible.
In fact, I have little knowledge of programming in BASIC.
I use this code to solve an equation.
When I run the program, this error appears:
subscript out of range
Is there any solution to this problem?
10 Print "******** impact *******"
20 Print "____________"
30 Print "this programs is used to solve impact integral"
40 Print "equation of simply supported slab to "
50 Print "optain the following"
60 Print " (1) force _time history"
70 Print " (2) central deflection - time history"
80 Print "-----------------"
90 Print "input data:"
100 Print " (1) FUNDAMENTAL NATURAL FREQUANCY (RAD/SEC)--- W1"
110 Print " (2) STRIKER MASS (KG.) ----- Mst"
120 Print " (3) MASS OF SLAB (KG)---- Ms"
130 Print " (4) hertz constant (n/m^1.5)----k"
140 Print " (5) STRICKER VELOCITY (M/S)----Vo"
150 Print " (6) NUMBER OF MODES----N"
160 Print "_________"
170 Input " W11, MST, MS, K, VO, N", W11, MST, MS, K, VO, N
180 Print "W1="; W11; "RAD/SEC"
190 Print "MST="; MST; "KG"
200 Print "MS="; MS; "KG"
210
220 Print " K = "; K; "N/M^1.5"
230 Print STANDARD
240 Print "VO="; VO; "M/S"
241 Print " N = "; N
250 Print
260 Print
270 K1 = K
280 V = VO
290 W1 = W11 / 2
300 TINF = 2.94 * (MST / (.8 * K1 * V ^ .5)) ^ .4 * 1000
310 DT = TINF / 10
320 M = 20
330 DT = PROUND(DT, 0)
340 DT = DT / 1000
350 M = 20
360 Option Base 1
370 Dim W(11, 11), Z(11, 11), F(30), D(30), BM(30), SH(30), A(30), T(30), DF(30), S(11, 11, 30), C(11, 11, 30)
380 ReDim W(N, N), Z(N, N)
390 For I = 1 To N Step 2
400 For K = 1 To N Step 2
410 W(K, I) = W1 * (I ^ 2 + K ^ 2)
420 Z(K, I) = W(K, I) * DT
430 Next K
440 Next I
450 ReDim F(M), D(M), A(M), T(M), DF(M), BM(M), SH(M), S(N, N, M), C(N, N, M)
460 F(1) = D(1) = A(1) = T(1) = 0
470 For I = 1 To N Step 2
480 For K = 1 To N Step 2
490 S(K, I, 1) = C(K, I, 1) = 0
500 Next K
510 Next I
520 B1 = 0
530 For I = 1 To N Step 2
540 For K = 1 To N Step 2
550 B1 = B1 + (1 - Sin(Z(K, I)) / Z(K, I)) / W(K, I) ^ 2
560 Next K
570 Next I
580 B = -DT ^ 2 / (6 * MST) - 4 * B1 / MS
590 VE = 0
600 For I = 2 To M
610 T(I) = (I - 1) * DT
620 GM = 0
630 If VE = 1 Then 970
640 For J = 2 To I
650 GM = GM + F(J - 1)
660 Next J
670 AA = 0
680 For J = 1 To N Step 2
690 For K = 1 To N Step 2
700 FF = F(I - 1) * (Sin(Z(K, J)) / Z(K, J) - Cos(Z(K, J))) / W(K, J)
710 AA = AA + 4 * (Cos(Z(K, J)) * S(K, J, I - 1) + Sin(Z(K, J)) * C(K, J, I - 1) + FF) / (MS * W(K, J))
720 Next K
730 Next J
740 A(I - 1) = V * (I - 1) * DT - (D(I - 1) + DT ^ 2 * (GM - F(I - 1) / 6)) / MST - AA
750 F = F(I - 1)
760 If A(I - 1) + B * F < 0 Then 840
770 F1 = (A(I - 1) + B * F) ^ 1.5 * K1
780 X = Abs(F1 - F)
790 If X < 10 Then 820
800 F = F1
810 GoTo 770
820 F(I) = F1
830 GoTo 850
840 F(I) = 0
850 D(I) = D(I - 1) + DT ^ 2 * (GM + (F(I) - F(I - 1)) / 6)
860 For J = 1 To N Step 2
870 For K = 1 To N Step 2
880 S(K, J, I) = Cos(Z(K, J)) * S(K, J, I - 1) + Sin(Z(K, J)) * C(K, J, I - 1) + (1 - Sin(Z(K, J)) / Z(K, J)) * (F(I) - F(I - 1)) / W(K, J) + (1 - Cos(Z(K, J))) * F(I - 1) / W(K, J)
890 C(K, J, I) = Cos(Z(K, J)) * C(K, J, I - 1) - Sin(Z(K, J)) * S(K, J, I - 1) + (1 - Cos(Z(K, J))) / Z(K, J) * (F(I) - F(I - 1)) / W(K, J) + Sin(Z(K, J)) * F(I - 1) / W(K, J)
900 Next K
910 Next J
920 DF = 0
930 For J = 1 To N Step 2
940 For K = 1 To N Step 2
950 DF = DF + 4 * S(K, J, I) / W(K, J) / MS
960 Next K
970 Next J
980 DF(I) = DF
990 If F(I) = 0 Then 1010
1000 Next I
1010 Print "----------------------------------------------------------"
1020 Print "{TIME (MS)},{FORCE (KN)},{DEFLECTION(MM)}"
1030 Print "----------------------------------------------------------"
1040 II = I
1050 For O = 1 To II
1060
1070 Print Tab(1); ":"; Tab(5); T(0) * 1000; Tab(18); ":"; Tab(22); F(O) / 1000; Tab(34); ":"; Tab(42); DF(O) * 1000; Tab(56); ":"
1080 Print "-----------------------------------------------------------"
1090 Next O
1100 End
I tried a simple FFT on a 2D array and compared the results between MATLAB and CUDA.
MATLAB:
array of 9 numbers 1-9
I = [1 2 3
4 5 6
7 8 9];
and use this code:
fft(I)
gives the results:
12.0000 + 0.0000i 15.0000 + 0.0000i 18.0000 + 0.0000i
-4.5000 + 2.5981i -4.5000 + 2.5981i -4.5000 + 2.5981i
-4.5000 - 2.5981i -4.5000 - 2.5981i -4.5000 - 2.5981i
And CUDA code:
int FFT_Test_Function() {
int width = 3;
int height = 3;
int n = width * height;
double in[width][height];
Complex out[width][height];
for (int i = 0; i<width; i++)
{
for (int j = 0; j < height; j++)
{
in[i][j] = (i * width) + j + 1;
}
}
// Allocate the buffer
cufftDoubleReal *d_in;
cufftDoubleComplex *d_out;
unsigned int out_mem_size = sizeof(cufftDoubleComplex)*n;
unsigned int in_mem_size = sizeof(cufftDoubleReal)*n;
cudaMalloc((void **)&d_in, in_mem_size);
cudaMalloc((void **)&d_out, out_mem_size);
// Save time stamp
milliseconds timeStart = getCurrentTimeStamp();
cufftHandle plan;
cufftResult res = cufftPlan2d(&plan, width, height, CUFFT_D2Z);
if (res != CUFFT_SUCCESS) { cout << "cufft plan error: " << res << endl; return 1; }
cudaCheckErrors("cuda malloc fail");
for (int i = 0; i < width; i++)
{
cudaMemcpy(d_in + (i * width), &in[i], height * sizeof(double), cudaMemcpyHostToDevice);
cudaCheckErrors("cuda memcpy H2D fail");
}
cudaCheckErrors("cuda memcpy H2D fail");
res = cufftExecD2Z(plan, d_in, d_out);
if (res != CUFFT_SUCCESS) { cout << "cufft exec error: " << res << endl; return 1; }
for (int i = 0; i < width; i++)
{
cudaMemcpy(&out[i], d_out + (i * width), height * sizeof(Complex), cudaMemcpyDeviceToHost);
cudaCheckErrors("cuda memcpy D2H fail");
}
cudaCheckErrors("cuda memcpy D2H fail");
milliseconds timeEnd = getCurrentTimeStamp();
milliseconds totalTime = timeEnd - timeStart;
std::cout << "Total time: " << totalTime.count() << std::endl;
return 0;
}
With this CUDA code I got the result:
You can see that CUDA gives different results.
What am I missing?
Thank you very much for your attention!
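The cuFFT result looks correct, but your MATLAB code is wrong: fft applied to a matrix computes a 1D FFT of each column, whereas cufftPlan2d sets up a 2D transform. The equivalent call is: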
octave:1> I = [ 1 2 3; 4 5 6; 7 8 9 ]
I =
1 2 3
4 5 6
7 8 9
octave:2> fft2(I)
ans =
45.00000 + 0.00000i -4.50000 + 2.59808i -4.50000 - 2.59808i
-13.50000 + 7.79423i 0.00000 + 0.00000i 0.00000 + 0.00000i
-13.50000 - 7.79423i 0.00000 - 0.00000i 0.00000 - 0.00000i
Note the use of fft2.
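To see the difference without MATLAB or a GPU, here is a minimal pure-Python sketch. The naive `dft` helper and the function names `fft_columns` and `fft_2d` are my own illustration, not MATLAB's or cuFFT's implementation. The sketch assumes only the standard definitions: `fft` on a matrix transforms each column, and `fft2` transforms columns and then rows.

```python
import cmath


def dft(values):
    """Naive 1D discrete Fourier transform (O(n^2)), enough for a 3x3 demo."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * m / n) for m, v in enumerate(values))
        for k in range(n)
    ]


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def fft_columns(matrix):
    """What fft(I) does for a matrix: a 1D transform of each column."""
    return transpose([dft(col) for col in transpose(matrix)])


def fft_2d(matrix):
    """What fft2(I) does: transform every column, then every row."""
    return [dft(row) for row in fft_columns(matrix)]


I = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
columns_only = fft_columns(I)
full_2d = fft_2d(I)

print([round(abs(z), 4) for z in columns_only[0]])  # first row of the column-wise result
print(round(full_2d[0][0].real, 4))                 # DC term of the 2D transform
```

The first row of the column-wise transform holds the column sums, while the top-left entry of the 2D transform is the sum of all nine elements, which is why the two outputs cannot match.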
I want to ask about some basic laws of Boolean algebra.
What I have learned is:
1. A+A'B=A+B
2. A+AB'=A+B'
3. A+AB=A
4. A+A'B'=A+B'
But I have run into an expression like:
A'+AB
So, what does A'+AB simplify to?
Let's say A' = D so when A is false, then D is true and vice versa.
Then A' + AB = D + D'B and if you understand your first equation:
D + D'B = D + B = A' + B
Regarding your comment:
I'll use this equality: AB + A'B = B and I will combine the first with the third and the second with the fifth term:
x'y'z'+x'yz+xy'z'+xy'z+xyz = y'z' + yz + xy'z
Now, from the result, I can do this:
y'z' + yz + xy'z = yz + y'(z' + zx)
and now, using A' + AB = A' + B:
yz + y'(z' + zx) = yz + y'(z' + x) = yz + y'z' + y'x
or do this:
y'z' + yz + xy'z = y'z' + z(y+ xy') = y'z' + z(y + x) = y'z' + zy + xz
Are they different? No, take a look at this:
x y z | yz + y'z' + y'x | y'z' + zy + xz
0 0 0 | 1 | 1
0 0 1 | 0 | 0
0 1 0 | 0 | 0
0 1 1 | 1 | 1
1 0 0 | 1 | 1
1 0 1 | 1 | 1
1 1 0 | 0 | 0
1 1 1 | 1 | 1
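If you would rather not build the truth table by hand, a brute-force check is easy to script. The following is just a sketch: the helper name `equivalent` and the lambda encodings of the expressions are my own, with `'` written as `not`, `+` as `or`, and juxtaposition as `and`.

```python
from itertools import product


def equivalent(f, g, arity):
    """Return True if f and g agree on every assignment of `arity` Boolean inputs."""
    return all(f(*bits) == g(*bits) for bits in product([False, True], repeat=arity))


# A' + AB = A' + B
identity_holds = equivalent(
    lambda a, b: (not a) or (a and b),
    lambda a, b: (not a) or b,
    2,
)

# The original five-term expression: x'y'z' + x'yz + xy'z' + xy'z + xyz
original = lambda x, y, z: (
    (not x and not y and not z) or (not x and y and z) or (x and not y and not z)
    or (x and not y and z) or (x and y and z)
)
form_1 = lambda x, y, z: (y and z) or (not y and not z) or (not y and x)
form_2 = lambda x, y, z: (not y and not z) or (z and y) or (x and z)

print(identity_holds, equivalent(original, form_1, 3), equivalent(original, form_2, 3))
```

With only 2 or 3 variables there are at most 8 assignments, so exhaustive comparison is a cheap way to confirm that two simplifications are the same function.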
You can use this open source project to simplify basic Boolean expressions; it handles all of the basic ones.
The recursive method below sums the integer values in a range:
def sumInts(a: Int, b: Int): Int = {
if(a > b) 0
else {
println(a +"," + b)
a + sumInts(a + 1 , b)
}
}
So sumInts(2 , 5) returns 14
I'm confused about how the recursive call to sumInts sums the integer range. Can someone explain in words how this method works?
How does sumInts return the incremented value? Perhaps I am missing something fundamental about recursion here.
It calculates the sum of values in the range [a, b] by first calculating the sum of the range [a+1, b] (by recursively calling sumInts(a + 1 , b)) then adding a to it.
[Update] In Scala, the return statement is optional; functions return the value of the last expression evaluated. Thus the above function body is equivalent to
if(a > b) return 0
else {
println(a +"," + b)
return a + sumInts(a + 1 , b)
}
[/Update]
For the range [2, 5], it does the following (I removed the println call for the sake of simplicity, and added brackets to mark recursive calls):
if(2 > 5) 0 else 2 + sumInts(2 + 1, 5) which, the condition being false, evaluates to
2 + sumInts(3, 5)
2 + (if(3 > 5) 0 else 3 + sumInts(3 + 1, 5)) which evaluates to
2 + (3 + sumInts(4, 5))
2 + (3 + (if(4 > 5) 0 else 4 + sumInts(4 + 1, 5))) which evaluates to
2 + (3 + (4 + sumInts(5, 5)))
2 + (3 + (4 + (if(5 > 5) 0 else 5 + sumInts(5 + 1, 5)))) which evaluates to
2 + (3 + (4 + (5 + sumInts(6, 5))))
2 + (3 + (4 + (5 + (if(6 > 5) 0 else 6 + sumInts(6 + 1, 5))))) which, the condition being true, evaluates to
2 + (3 + (4 + (5 + (0))))
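If it helps to watch the calls unfold and then collapse, here is a small Python sketch of the same recursion. It is not the original Scala: the name `sum_ints` and the extra `depth` parameter, used only to indent the trace, are additions for illustration.

```python
def sum_ints(a, b, depth=0):
    """Sum the integers in [a, b] recursively, printing each call and its result."""
    indent = "  " * depth
    print(f"{indent}sum_ints({a}, {b})")
    if a > b:
        result = 0  # base case: the range is empty
    else:
        # a is added only after the recursive call for [a + 1, b] has returned
        result = a + sum_ints(a + 1, b, depth + 1)
    print(f"{indent}-> {result}")
    return result


total = sum_ints(2, 5)
```

The trace shows the calls nesting down to the empty range (6, 5) first; the additions then happen on the way back up, innermost call first.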