Trigonometric functions in constant array initializers in OpenCL

I have a number of vectors I would like to use in my application. In a square grid, these are the vectors of the cardinal and diagonal directions in which I can go from the center of a cell. My OpenCL kernels will use them often so I would like to define them in constant memory. I have written the following piece of code in my kernel file:
#define N_RADIAN 2 * M_PI_4_F
#define NE_RADIAN 1 * M_PI_4_F
#define E_RADIAN 0 * M_PI_4_F
#define SE_RADIAN 7 * M_PI_4_F
#define S_RADIAN 6 * M_PI_4_F
#define SW_RADIAN 5 * M_PI_4_F
#define W_RADIAN 4 * M_PI_4_F
#define NW_RADIAN 3 * M_PI_4_F
constant float2 E[8] = {
(float2)(cos( N_RADIAN), sin( N_RADIAN)), // N
(float2)(cos(NE_RADIAN), sin(NE_RADIAN)), // NE
(float2)(cos( E_RADIAN), sin( E_RADIAN)), // E
(float2)(cos(SE_RADIAN), sin(SE_RADIAN)), // SE
(float2)(cos( S_RADIAN), sin( S_RADIAN)), // S
(float2)(cos(SW_RADIAN), sin(SW_RADIAN)), // SW
(float2)(cos( W_RADIAN), sin( W_RADIAN)), // W
(float2)(cos(NW_RADIAN), sin(NW_RADIAN)) // NW
};
This code refuses to compile for me. The error message I get is:
error: initializer element is not a compile-time constant
I can understand it if the mathematical functions have to be called on the device for the array to get its values. If that is the case, I can write a kernel that computes these values without much fuss. However, this method would be more convenient for me. Is there any way to get these values declared in constant memory? Do you see any other problems with this approach or the code?

I think the only way to do it with constants is to go with something like this:
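// M_SQRT1_2_F is OpenCL C's built-in float constant for 1/sqrt(2) (CL_M_SQRT1_2 is the host-side macro).
// The (float2) casts are required: a bare (a, b) is the comma operator and evaluates to b alone.
constant float2 E[8] = {
    (float2)( 0.0f,         1.0f        ), // N
    (float2)( M_SQRT1_2_F,  M_SQRT1_2_F ), // NE
    (float2)( 1.0f,         0.0f        ), // E
    (float2)( M_SQRT1_2_F, -M_SQRT1_2_F ), // SE
    (float2)( 0.0f,        -1.0f        ), // S
    (float2)(-M_SQRT1_2_F, -M_SQRT1_2_F ), // SW
    (float2)(-1.0f,         0.0f        ), // W
    (float2)(-M_SQRT1_2_F,  M_SQRT1_2_F )  // NW
};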
The problem may actually be a blessing in disguise. This alternative code gives values accurate to the limits of 32-bit IEEE float. The original code is slightly off because M_PI_4_F is only the float nearest to pi/4, not pi/4 itself. For example, the original code gives cos(N_RADIAN) = -4.37114e-008 instead of the presumably intended value of zero.

TypeError("can't convert expression to float")

The code I wrote might look foolish, because it integrates a derivative function. However, it is the basic foundation for other code I'm writing on acoustical analysis, which involves integrating products of different derivative functions. For this I'm using SciPy for integration and SymPy for differentiation, but it fails with TypeError("can't convert expression to float"). Below is the code I wrote; I'm hoping for a solution.
import sympy
from sympy import *
from scipy.integrate import quad
var('r')
def diff(r):
    r=symbols('x')
    Z = 64.25 * r ** 5 - 175.71 *r ** 4 + 170.6 *r ** 3 - 71.103 *r ** 2 + 3 * r
    E=sympy.diff(Z,r)
    print(E)
    return E
R=quad(diff,0,1)[0]
print(R)

I have to say that I'm a bit confused by your statement "integration of a derivative function" since the fundamental theorem of calculus would suggest that this is just a waste of CPU cycles. I'll presume that you know what you're doing though and that you just want to be able to compute some definite integrals numerically...
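The fundamental theorem of calculus makes the point concrete: the integral of E = Z' over [0, 1] is just Z(1) - Z(0). A minimal C sketch (the function names are hypothetical), evaluating Z from the question with Horner's rule:

```c
/* Z(r) = 64.25 r^5 - 175.71 r^4 + 170.6 r^3 - 71.103 r^2 + 3 r,
   evaluated with Horner's rule (highest power first, constant term 0). */
double Z(double r)
{
    static const double c[] = { 64.25, -175.71, 170.6, -71.103, 3.0, 0.0 };
    double acc = 0.0;
    for (int i = 0; i < 6; i++)
        acc = acc * r + c[i];
    return acc;
}

/* Definite integral of Z'(r) over [a, b], by the fundamental theorem of calculus. */
double integral_of_derivative(double a, double b)
{
    return Z(b) - Z(a);
}
```

integral_of_derivative(0.0, 1.0) gives -8.963 up to rounding, with no differentiation or quadrature at all.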
The SymPy expression that you want to integrate is this:
In [33]: from sympy import *
In [34]: r = symbols("x") # Why are you calling this x?
In [35]: Z = 64.25 * r ** 5 - 175.71 * r ** 4 + 170.6 * r ** 3 - 71.103 * r ** 2 +
...: 3 * r
In [36]: E = diff(Z, r)
In [37]: E
Out[37]:
321.25⋅x⁴ - 702.84⋅x³ + 511.8⋅x² - 142.206⋅x + 3
There are two basic ways to do this with SymPy:
In [38]: integrate(E, (r, 0, 1)) # symbolic integration
Out[38]: -8.96299999999999
In [39]: Integral(E, (r, 0, 1)).evalf() # numeric integration
Out[39]: -8.96300000000002
Note that had you used exact rational numbers you would see a more accurate result in either case:
In [40]: nsimplify(E)
Out[40]:
1285⋅x⁴/4 - 17571⋅x³/25 + 2559⋅x²/5 - 71103⋅x/500 + 3
In [41]: integrate(nsimplify(E), (r, 0, 1))
Out[41]:
-8963/1000
In [42]: Integral(nsimplify(E), (r, 0, 1)).evalf()
Out[42]: -8.96300000000000
While the approaches above are very accurate and work nicely for this particular integral, which is easy to compute both symbolically and numerically, they are both slower than something like SciPy's quad function, which works with machine-precision floating point. To use quad, you need to lambdify your expression into an ordinary Python function:
In [44]: from scipy.integrate import quad
In [45]: f = lambdify(r, E, "numpy")
In [46]: f(0)
Out[46]: 3.0
In [47]: f(1)
Out[47]: -8.99600000000001
In [48]: quad(f, 0, 1)[0]
Out[48]: -8.963000000000001
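quad wraps QUADPACK, whose adaptive rules are Gauss-Kronrod. As a swapped-in illustration of why a Gaussian rule handles this integrand so well, here is a C sketch of 3-point Gauss-Legendre quadrature, which is exact (up to rounding) for polynomials of degree 5 or less; E has degree 4. The names `E` and `gauss_legendre3` are hypothetical.

```c
#include <math.h>

/* E(x) = Z'(x) = 321.25 x^4 - 702.84 x^3 + 511.8 x^2 - 142.206 x + 3, in Horner form. */
double E(double x)
{
    return x * (x * (x * (321.25 * x - 702.84) + 511.8) - 142.206) + 3.0;
}

/* 3-point Gauss-Legendre rule mapped from [-1, 1] to [a, b].
   Nodes are 0 and +-sqrt(3/5); weights are 8/9 and 5/9.
   Exact for polynomial integrands of degree <= 5. */
double gauss_legendre3(double (*f)(double), double a, double b)
{
    const double node = sqrt(3.0 / 5.0);
    const double w_outer = 5.0 / 9.0, w_mid = 8.0 / 9.0;
    double mid = 0.5 * (a + b), half = 0.5 * (b - a);
    return half * (w_outer * f(mid - half * node)
                   + w_mid * f(mid)
                   + w_outer * f(mid + half * node));
}
```

gauss_legendre3(E, 0.0, 1.0) agrees with quad's -8.963 to within rounding while evaluating E only three times.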
What lambdify does is just to generate an efficient Python function for you. You can see the code that it uses like this:
In [51]: import inspect
In [52]: print(inspect.getsource(f))
def _lambdifygenerated(x):
    return 321.25*x**4 - 702.84*x**3 + 511.8*x**2 - 142.206*x + 3
The generated function is ordinary Python arithmetic on floats (quad calls it with one scalar at a time), which is far cheaper than evaluating a SymPy expression. If you have high-order polynomials, SymPy's horner function can be used to optimise the expression:
In [53]: horner(E)
Out[53]: x⋅(x⋅(x⋅(321.25⋅x - 702.84) + 511.8) - 142.206) + 3.0
In [54]: f2 = lambdify(r, horner(E), "numpy")
In [56]: print(inspect.getsource(f2))
def _lambdifygenerated(x):
    return x*(x*(x*(321.25*x - 702.84) + 511.8) - 142.206) + 3.0
https://docs.sympy.org/latest/tutorial/calculus.html#integrals
https://docs.sympy.org/latest/modules/utilities/lambdify.html#sympy.utilities.lambdify.lambdify
https://docs.sympy.org/latest/modules/polys/reference.html#sympy.polys.polyfuncs.horner

Receiving NaN as an output when using the pow() function to generate a Decimal

I've been at this for hours so forgive me if I'm missing something obvious.
I'm using the pow(_ x: Decimal, _ y: Int) -> Decimal function to help generate a monthly payment amount using a basic formula. I have this function linked to the infix operator *** but I've tried using it just by typing out the function and have the same problem.
Xcode was complaining yesterday that my formula was too long, so I broke it up into a couple of constants and incorporated those into the overall formula I need.
Code:
precedencegroup PowerPrecedence { higherThan: MultiplicationPrecedence }
infix operator *** : PowerPrecedence
func *** (radix: Decimal, power: Int) -> Decimal {
    return pow(radix, power)
}
func calculateMonthlyPayment() {
    let rateAndMonths: Decimal = ((0.0199 / 12.0) + (0.0199 / 12.0))
    let rateTwo: Decimal = (1.0 + (0.0199 / 12.0))
    loan12YearsPayment[0] = ((rateAndMonths / rateTwo) *** 144 - 1.0) * ((values.installedSystemCost + loanFees12YearsCombined[0]) * 0.7)
}
When I print to console or run this in the simulator, the output is NaN. I know the pow function itself is working properly because I've tried it with random integers.

Here is my view of how Apple implements this function. Note the following examples:
pow(1 as Decimal, -2) // 1; (1 ^ Any number) = 1
pow(10 as Decimal, -2) // NAN
pow(0.1 as Decimal, -2) // 100
pow(0.01 as Decimal, -2) // 10000
pow(1.5 as Decimal, -2) // NAN
pow(0.5 as Decimal, -2) // NAN
It seems that pow with Decimal does not handle fractional bases, except those whose reciprocal is a power of 10. So it deals with these cases as follows:
0.1 ^ -2 == (1/10) ^ -2 == 10 ^ 2 // Calculated correctly: the base is a power of 10 (10, 100, 1000, ...)
1.5 ^ -2 == (3/2) ^ -2 // (3/2) is a fractional number, so it seems to be treated as Double rather than Decimal; it returns NaN.
0.5 ^ -2 == (1/2) ^ -2 // 2 isn't a power of 10, so (1/2) is kept as it is; it is also a fractional number, so it returns NaN.

How to add an echo effect to an audio file using Objective-C

I am developing an application in which I want to add an echo effect to recorded audio files using Objective-C.
I am using DIRAC to add other effects, e.g. man to woman, slow, fast.
Now I have to turn the recorded voice into a robot voice, and for that I need to add an echo effect.
Please help me to do this.

Echo is pretty simple. You need a delay line and a little multiplication. Assuming one channel and audio already represented in floating point, a delay line would look something like this (in C-like pseudo-code):
int LENGTH = samplerate * seconds; // seconds is the desired length of the delay in seconds
float buffer[ LENGTH ];
int readIndex = 0, writeIndex = LENGTH - 1;
float delayLine.readNext( float x ) {
    float ret = buffer[readIndex];
    ++readIndex;
    if( readIndex >= LENGTH )
        readIndex = 0;
    return ret;
}
void delayLine.writeNext( float x ) {
    buffer[ writeIndex ] = x;
    ++writeIndex;
    if( writeIndex >= LENGTH )
        writeIndex = 0;
}
Don't forget to initialize the buffer to all zeros.
So that's your delay line. Basic usage would be this:
float singleDelay( float x ) {
    delayLine.writeNext(x);
    return delayLine.readNext( x );
}
But you won't hear much difference: it'll just come out later. If you want to hear a single echo, you'll need something like this:
float singleEcho( float x, float g ) {
    delayLine.writeNext(x);
    return x + g * delayLine.readNext( x );
}
where g is some constant, usually between zero and one.
Now say you want a stream of echoes: "HELLO... Hello... hello... h..." like that. You just need to do a bit more work:
float echo( float x, float g ) {
    float ret = x + g * delayLine.readNext( x );
    delayLine.writeNext( ret );
    return ret;
}
Notice how the output of the whole thing is getting fed back into the delay line this time, rather than the input. In this case, it's very important that |g| < 1.
You may run into issues with denormals here. I can't recall whether that's an issue on iOS, but I don't think so.
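For reference, here is a compilable C sketch of the same two ideas. It is a variation, not a literal translation: instead of separate read and write indices it uses a single index, so each step reads the oldest sample and then overwrites that slot, giving a delay of exactly `length` samples. All names (`DelayLine`, `delay_line_init`, `single_echo`, `echo`) are hypothetical.

```c
#include <stdlib.h>

/* Circular delay line of `length` samples (length must be > 0). */
typedef struct {
    float *buffer;
    size_t length;
    size_t index;   /* slot holding the oldest sample */
} DelayLine;

/* Allocates a zero-filled buffer; returns 0 on success, -1 on failure. */
int delay_line_init(DelayLine *d, size_t length)
{
    d->buffer = length > 0 ? calloc(length, sizeof *d->buffer) : NULL;
    d->length = length;
    d->index = 0;
    return d->buffer != NULL ? 0 : -1;
}

void delay_line_free(DelayLine *d)
{
    free(d->buffer);
    d->buffer = NULL;
}

/* Returns the sample stored `length` calls ago and stores x in its place. */
static float delay_line_swap(DelayLine *d, float x)
{
    float oldest = d->buffer[d->index];
    d->buffer[d->index] = x;
    if (++d->index == d->length)
        d->index = 0;
    return oldest;
}

/* Single echo (feed-forward): y[n] = x[n] + g * x[n - length]. */
float single_echo(DelayLine *d, float x, float g)
{
    return x + g * delay_line_swap(d, x);
}

/* Repeating echo (feedback): y[n] = x[n] + g * y[n - length]; keep |g| < 1. */
float echo(DelayLine *d, float x, float g)
{
    float y = x + g * d->buffer[d->index];
    delay_line_swap(d, y);   /* feed the output, not the input, back in */
    return y;
}
```

Call one of these once per sample. For example, a 0.3-second delay at a 44.1 kHz sample rate would need a length of 13230 samples; feeding a single impulse through echo() with g = 0.5 yields copies scaled by 0.5, 0.25, 0.125, ... every `length` samples.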

FFTW with MEX and MATLAB argument issues

I wrote the following C/MEX code using the FFTW library to control the number of threads used for an FFT computation from MATLAB. The code works well (complex FFT forward and backward) with the FFTW_ESTIMATE argument in the planner, although it is slower than MATLAB. But when I switch to the FFTW_MEASURE argument to tune the FFTW planner, applying one forward FFT and then one backward FFT does not return the initial image. Instead, the image is scaled by a factor. Using FFTW_PATIENT gives an even worse result: all-zero matrices.
My code is as follows:
Matlab functions:
FFT forward:
function Y = fftNmx(X,NumCPU)
if nargin < 2
NumCPU = maxNumCompThreads;
disp('Warning: Use the max maxNumCompThreads');
end
Y = FFTN_mx(X,NumCPU)./numel(X);
FFT backward:
function Y = ifftNmx(X,NumCPU)
if nargin < 2
NumCPU = maxNumCompThreads;
disp('Warning: Use the max maxNumCompThreads');
end
Y = iFFTN_mx(X,NumCPU);
Mex functions:
FFT forward:
# include <string.h>
# include <stdlib.h>
# include <stdio.h>
# include <mex.h>
# include <matrix.h>
# include <math.h>
# include </home/nicolas/Code/C/lib/include/fftw3.h>
char *Wisfile = NULL;
const char *Wistemplate = "%s/.fftwis";
#define WISLEN 8
void set_wisfile(void)
{
char *home;
if (Wisfile) return;
home = getenv("HOME");
Wisfile = (char *)malloc(strlen(home) + WISLEN + 1);
sprintf(Wisfile, Wistemplate, home);
}
fftw_plan CreatePlan(int NumDims, int N[], double *XReal, double *XImag, double *YReal, double *YImag)
{
fftw_plan Plan;
fftw_iodim Dim[NumDims];
int k, NumEl;
FILE *wisdom;
for(k = 0, NumEl = 1; k < NumDims; k++)
{
Dim[NumDims - k - 1].n = N[k];
Dim[NumDims - k - 1].is = Dim[NumDims - k - 1].os = (k == 0) ? 1 : (N[k-1] * Dim[NumDims-k].is);
NumEl *= N[k];
}
/* Import the wisdom. */
set_wisfile();
wisdom = fopen(Wisfile, "r");
if (wisdom) {
fftw_import_wisdom_from_file(wisdom);
fclose(wisdom);
}
if(!(Plan = fftw_plan_guru_split_dft(NumDims, Dim, 0, NULL, XReal, XImag, YReal, YImag, FFTW_MEASURE /* or FFTW_ESTIMATE, respectively */)))
mexErrMsgTxt("FFTW3 failed to create plan.");
/* Save the wisdom. */
wisdom = fopen(Wisfile, "w");
if (wisdom) {
fftw_export_wisdom_to_file(wisdom);
fclose(wisdom);
}
return Plan;
}
void mexFunction( int nlhs, mxArray *plhs[],
int nrhs, const mxArray *prhs[] )
{
#define B_OUT plhs[0]
int k, numCPU, NumDims;
const mwSize *N;
double *pr, *pi, *pr2, *pi2;
static long MatLeng = 0;
fftw_plan PlanForward;
int NumEl = 1;
int *N2;
if (nrhs != 2) {
mexErrMsgIdAndTxt( "MATLAB:FFT2mx:invalidNumInputs",
"Two input argument required.");
}
if (!mxIsDouble(prhs[0])) {
mexErrMsgIdAndTxt( "MATLAB:FFT2mx:invalidNumInputs",
"Array must be double");
}
numCPU = (int) mxGetScalar(prhs[1]);
if (numCPU > 8) {
mexErrMsgIdAndTxt( "MATLAB:FFT2mx:invalidNumInputs",
"NumOfThreads < 8 requested");
}
if (!mxIsComplex(prhs[0])) {
mexErrMsgIdAndTxt( "MATLAB:FFT2mx:invalidNumInputs",
"Array must be complex");
}
NumDims = mxGetNumberOfDimensions(prhs[0]);
N = mxGetDimensions(prhs[0]);
N2 = (int*) mxMalloc( sizeof(int) * NumDims);
for(k=0;k<NumDims;k++) {
NumEl *= N[k];
N2[k] = N[k];
}
pr = (double *) mxGetPr(prhs[0]);
pi = (double *) mxGetPi(prhs[0]);
//B_OUT = mxCreateNumericArray(NumDims, N, mxDOUBLE_CLASS, mxCOMPLEX);
B_OUT = mxCreateNumericMatrix(0, 0, mxDOUBLE_CLASS, mxCOMPLEX);
mxSetDimensions(B_OUT , N, NumDims);
mxSetData(B_OUT , (double* ) mxMalloc( sizeof(double) * mxGetNumberOfElements(prhs[0]) ));
mxSetImagData(B_OUT , (double* ) mxMalloc( sizeof(double) * mxGetNumberOfElements(prhs[0]) ));
pr2 = (double* ) mxGetPr(B_OUT);
pi2 = (double* ) mxGetPi(B_OUT);
fftw_init_threads();
fftw_plan_with_nthreads(numCPU);
PlanForward = CreatePlan(NumDims, N2, pr, pi, pr2, pi2);
fftw_execute_split_dft(PlanForward, pr, pi, pr2, pi2);
fftw_destroy_plan(PlanForward);
fftw_cleanup_threads();
}
FFT backward:
This MEX function differs from the above only in switching pointers pr <-> pi, and pr2 <-> pi2 in the CreatePlan function and in the execution of the plan, as suggested in the FFTW documentation.
If I run
>> A = imread('cameraman.tif');
>> A = double(A) + i*double(A);
>> B = fftNmx(A,8);
>> C = ifftNmx(B,8);
>> figure,imagesc(real(C))
with the FFTW_MEASURE and FFTW_ESTIMATE arguments, respectively, I get this result.
I wonder whether this is due to an error in my code or in the library. I tried different things with the wisdom: saving it, not saving it, and using the wisdom produced by the FFTW standalone tool. I haven't seen any improvement. Can anyone suggest why this is happening?
Additional information:
I compile the MEX code using static libraries:
mex FFTN_Meas_mx.cpp /home/nicolas/Code/C/lib/lib/libfftw3.a /home/nicolas/Code/C/lib/lib/libfftw3_threads.a -lm
The FFTW library has been compiled with:
./configure CFLAGS="-fPIC" --prefix=/home/nicolas/Code/C/lib --enable-sse2 --enable-threads && make && make install
I tried different flags without success. I am using MATLAB 2011b on a 64-bit Linux workstation (AMD Opteron quad-core).

FFTW computes an unnormalized transform; see:
http://www.fftw.org/doc/What-FFTW-Really-Computes.html
Roughly speaking, when you perform a forward transform followed by an inverse one, you get back the input (plus round-off errors) multiplied by the length of your data.
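When you create a plan using flags other than FFTW_ESTIMATE, the arrays you pass to the planner are overwritten during planning:
http://www.fftw.org/doc/Planner-Flags.html
In the code above, CreatePlan is called directly on pr and pi, i.e. on MATLAB's input data, so with FFTW_MEASURE or FFTW_PATIENT that data is clobbered before the transform runs. Plan on buffers you own and copy the input in only after the plan has been created.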

"Nearly divisible"

I want to check if a floating point value is "nearly" a multiple of 32. E.g. 64.1 is "nearly" divisible by 32, and so is 63.9.
Right now I'm doing this:
#define NEARLY_DIVISIBLE 0.1f
float offset = fmodf( val, 32.0f ) ;
if( offset < NEARLY_DIVISIBLE )
{
    // it's near from above
}
// if it was 63.9, then the remainder would be large, so add some, then check again
else if( fmodf( val + 2*NEARLY_DIVISIBLE, 32.0f ) < NEARLY_DIVISIBLE )
{
    // it's near from below
}
Got a better way to do this?

Well, you could cut out the second fmodf by subtracting 32 one more time to get the remainder from below:
if( offset < NEARLY_DIVISIBLE )
{
    // it's near from above
}
else if( offset - 32.0f > -NEARLY_DIVISIBLE )
{
    // it's near from below
}

In a standard-compliant C implementation, one would use the remainder function instead of fmod:
#define NEARLY_DIVISIBLE 0.1f
float offset = remainderf(val, 32.0f);
if (fabsf(offset) < NEARLY_DIVISIBLE) {
// Stuff
}
If one is on a non-compliant platform (older versions of MSVC++, for example), then remainder isn't available, sadly. I think fastmultiplication's answer is quite reasonable in that case.

You mention that you need to test near-divisibility by 32. The following approach should hold for near-divisibility testing against any power of two:
#define THRESHOLD 0.11
int nearly_divisible(float f) {
// printf(" %f\n", (a - (float)((long) a)));
register long l1, l2;
l1 = (long) (f + THRESHOLD);
l2 = (long) f;
return !(l1 & 31) && (l2 & 31 ? 1 : f - (float) l2 <= THRESHOLD);
}
What we're doing is converting both f and f + THRESHOLD to long:
f       (long) f    (long) (f + THRESHOLD)
63.9    63          64
64      64          64
64.1    64          64
Now we test whether (long) (f + THRESHOLD) is divisible by 32: just check the lower five bits; if they are all zero, the number is divisible by 32. This leads to a series of false positives: 64.2 to 64.8 also become 64 when converted to long and would pass the first test. So we also check whether the difference between f and its truncated form is less than or equal to THRESHOLD.
This, too, has a problem: f - (float) l2 <= THRESHOLD holds for 64 and 64.1, but not for 63.9. So we add an exception for numbers just below a multiple of 32 (numbers that pass the first test only once THRESHOLD has been added; this check has to be combined with the first test) by accepting them when the lower 5 bits of (long) f are not all zero. This holds for 63 (binary 1000000 - 1 == 111111, whose lower five bits are 11111).
A combination of these three tests indicates whether the number is nearly divisible by 32. I hope this is clear; please forgive my weird English.
I just tested the extension to other powers of two: the following program prints each number from 383.5 to 388.4 and whether it is nearly divisible by 128.
#include <stdio.h>
#define THRESHOLD 0.11
int main(void) {
int nearly_divisible(float);
int i;
float f = 383.5;
for (i=0; i<50; i++) {
printf("%6.1f %s\n", f, (nearly_divisible(f) ? "true" : "false"));
f += 0.1;
}
return 0;
}
int nearly_divisible(float f) {
// printf(" %f\n", (a - (float)((long) a)));
register long l1, l2;
l1 = (long) (f + THRESHOLD);
l2 = (long) f;
return !(l1 & 127) && (l2 & 127 ? 1 : f - (float) l2 <= THRESHOLD);
}
Seems to work well so far!

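I think this is right:
#include <math.h>
#include <stdbool.h>
bool nearlyDivisible(float num, float div) {
    float f = fmodf(num, div);    // % is not defined for floats in C/C++
    if (f > div / 2.0f)
        f = f - div;              // measure from the next multiple up
    else if (f < -div / 2.0f)
        f = f + div;              // same for negative num
    f = f > 0 ? f : 0.0f - f;     // absolute value
    return f < 0.1f;
}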

From what I gather, you want to detect whether a number is nearly divisible by another, right?
I'd do something like this:
#define NEARLY_DIVISIBLE 0.1f
// Note: the tolerance here is relative, i.e. a fraction of n2.
bool IsNearlyDivisible(float n1, float n2)
{
    float remainder = (fmodf(n1, n2) / n2);
    remainder = remainder < 0.0f ? -remainder : remainder;
    remainder = remainder > 0.5f ? 1 - remainder : remainder;
    return (remainder <= NEARLY_DIVISIBLE);
}

Why wouldn't you just divide by 32, then round, and take the difference between the rounded number and the actual result?
Something like this (forgive the untested code; no time to look things up):
#define NEARLY_DIVISIBLE 0.1f
float result = val / 32.0f;
float nearest_int = nearbyintf(result);
float difference = fabsf(result - nearest_int);   // fabsf, not the integer abs()
if( difference < NEARLY_DIVISIBLE / 32.0f )       // scale the tolerance to units of val / 32
{
    // It's nearly divisible
}
If you still wanted to check from above and below separately, you could remove the fabsf and check whether the difference is > 0 or < 0.

This avoids calling fmodf twice:
#include <math.h>
#include <stdio.h>
#define NEARLY_DIVISIBLE 0.1f
#define DIVISOR 32.0f
#define ARRAY_SIZE 4
int main(void)
{
    double test_var1[ARRAY_SIZE] = {63.9, 64.1, 65, 63.8};
    int i;
    double rest;
    for (i = 0; i < ARRAY_SIZE; i++)
    {
        rest = fmod(test_var1[i], DIVISOR);
        if (rest < NEARLY_DIVISIBLE)
        {
            printf("Number %f is at most %f larger than a multiple of the divisor %f\n", test_var1[i], NEARLY_DIVISIBLE, DIVISOR);
        }
        else if (-(rest - DIVISOR) < NEARLY_DIVISIBLE)
        {
            printf("Number %f is at most %f less than a multiple of the divisor %f\n", test_var1[i], NEARLY_DIVISIBLE, DIVISOR);
        }
    }
    return 0;
}