An algorithm for inverting a soundwave - iphone

Hi I’m making a simple program in Objective-C and I need to get the inverse of a soundwave.
I've tried searching for an algorithm for doing this but I haven't found anything. I guess it’s more complicated than just multiplying each value with -1 :P
Here’s my code so far, I’ve cast the data to int32_t to be able to manipulate it:
int32_t* samples = (int32_t*)(sourceBuffer.mData);
for ( int i = 0; i < sourceBuffer.mDataByteSize / sizeof(int32_t); i++ )
{
// Add algorithm here
}
Thanks.

Multiplying by -1 should work no? What output do you get? Bear in mind you won't hear ANY difference with the wave unless you layer the normal and inverted together in which case they will cancel each other out.

Related

Declaring a functional recursive sequence in Matlab

I'd like to declare first of all, that I'm a mathematician. This might be a stupid stupid question; but I've gone through all the matlab tutorials--they've gotten me nowhere. I imagine I could code this in C (it'd be exhausting); but I need matlab for this particular function. And I don't get exactly how to do it.
Here is the pasted Matlab code of where I'm running into trouble:
function y = TAU(z,n)
y=0;
for i =[1,n]
y(z) = log(beta(z+1,i) + y(z+1)) - beta(z,i);
end
end
(beta is an arbitrary "float" to "float" function with an index i.)
I'm having trouble declaring y as a function, in which we call the function at a different argument. I want to define y_n(z) with something something y_{n-1}(z+1). This is all done in a recursive process to create the function. I really feel like I'm missing something stupid.
As a default function it assigns y to be an array (or whatever you call the default index assignment). But I don't want an array. I want y to be assigned as a "function" class (i.e. takes "float" to "float"). And then I'm defining a sequence of y_n : "float" to "float". So that z to z+1 is a map on "float" to "float".
I don't know if I'm asking too much of matlab...
Help a poor mathematician who hasn't coded since the glory days of X-box mods.
...Please don't tell me I have to go back to Pari-GP/C drawing boards over something so stupid.
Please help!
EDIT: At rahnema1 & mimocha's request, I'll describe the math, and of what I am trying to do with my program. I can't see how to implement latex in here. So I'll write the latex code in a generator and upload a picture. I'm not so sure if there even is a work around to what I want to do.
As to the expected output. We'd want,
beta(z+1,i) + TAU(z+1,i) = exp(beta(z,i) + TAU(z,i+1))
And we want to grow i to a fixed value n. Again, I haven't programmed in forever, so I apologize if I'm speaking a little nonsensically.
EDIT2:
So, as #rahnema1 suggests; I should produce a reproducible example. In order to do this, I'll write the code for my beta function. It's surprisingly simple. This is for the case where the "multiplier" variable is set to log(2); but you don't need to worry about any of that.
function f = beta(z,n)
f=0;
for i = 0:n-1
f = exp(f)/(1+exp(log(2)*(n-i-z)));
end
end
This will work fine for z a float no greater than 4. Once you make z larger it'll start to overflow. So for example, if you put in,
beta(2,100)
1.4242
beta(3,100)
3.3235
beta(3,100) - exp(beta(2,100))/(1/4+1)
0
The significance of the 100, is simply how many iterations we perform; it converges fast so even setting this to 15 or so will still produce the same numerical accuracy. Now, the expected output I want for TAU is pretty straight forward,
TAU(z,1) = log(beta(z+1,1)) - beta(z,1)
TAU(z,2) = log(beta(z+1,2) + TAU(z+1,1)) - beta(z,2)
TAU(z,3) = log(beta(z+1,3) + TAU(z+1,2)) - beta(z,3)
...
TAU(z,n) = log(beta(z+1,n) + TAU(z+1,n-1)) -beta(z,n)
I hope this helps. I feel like there should be an easy way to program this sequence, and I must be missing something obvious; but maybe it's just not possible in Matlab.
At mimocha's suggestion, I'll look into tail-end recursion. I hope to god I don't have to go back to Pari-gp; but it looks like I may have to. Not looking forward to doing a deep dive on that language, lol.
Thanks, again!
Is this what you are looking for?
function out = tau(z,n)
% Ends recursion when n == 1
if n == 1
out = log(beta(z+1,1)) - beta(z,1);
return
end
out = log(beta(z+1,n) + tau(z+1,n-1)) - beta(z,n);
end
function f = beta(z,n)
f = 0;
for i = 0:n-1
f = exp(f) / (1 + exp(log(2)*(n-i-z)));
end
end
This is basically your code from the most recent edit, but I've added a simple catch in the tau function. I tried running your code and noticed that n gets decremented infinitely (no exit condition).
With the modification, the code runs successfully on my laptop for smaller integer values of n, where 1e5 > n >= 1; and for floating values of z, real and complex. So the code will unfortunately break for floating values of n, since I don't know what values to return for, say, tau(1,0) or tau(1,0.9). This should easily be fixable if you know the math though.
However, many of the values I get are NaNs or Infs. So I'm not sure if your original problem was Out of memory error (infinite recursion), or values blowing up to infinity / NaN (numerical stability issue).
Here is a quick 100x100 grid calculation I made with this code.
Then I tested on negative values of z, and found the imaginary part of the output to looks kinda cool.
Not to mention I'm slightly geeking out over the fact that pi is showing up in the imaginary part as well :)
tau(-0.3,2) == -1.45179335740446147085 +3.14159265358979311600i

HLSL - asuint of a float seems to return the wrong value

I've been attempting to encode 4 uints (8-bit) into one float so that I can easily store them in a texture along with a depth value. My code wasn't working, and ultimately I found that the issue boiled down to this:
asuint(asfloat(uint(x))) returns 0 in most cases, when it should return x.
In theory, this code should return x (where x is a whole number) because the bits in x are being converted to float, then back to uint, so the same bits end up being interpreted as a uint again. However, I found that the only case where this function seems to return x is when the bits of x are interpreted as a very large float. I considered the possibility that this could be a graphics driver issue, so I tried it on two different computers and got the same issue on both.
I tested several other variations of this code, and all of these seem to work correctly.
asfloat(asuint(float(x))) = x
asuint(asint(uint(x))) = x
asuint(uint(x)) = x
The only case that does not work as intended is the first case mentioned in this post. Is this a bug, or am I doing something wrong? Also, this code is being run in a fragment shader inside of Unity.
After a long time of searching, I found some sort of answer, so I figured I would post it here just in case anyone else stumbles across this problem. The reason that this code does not work has something to do with float denormalization. (I don't completely understand it.) Anyway, denormalized floats were being interpreted as 0 by asuint so that asuint of a denormalized float would always be 0.
A somewhat acceptable solution may be (asuint(asfloat(x | 1073741824)) & 3221225471)
This ensures that the float is normalized, however it also erases any data stored in the second bit. If anyone has any other solutions that can preserve this bit, let me know!

How to speed up array concatenation?

Instead of concatening results to this, Is there other way to do the following, I mean the loop will persist but vector=[vector,sum(othervector)]; can be gotten in any other way?
vector=[];
while a - b ~= 0
othervector = sum(something') %returns a vector like [ 1 ; 3 ]
vector=[vector,sum(othervector)];
...
end
vector=vector./100
Well, this really depends on what you are trying to do. Starting from this code, you might need to think about the actions you are doing and if you can change that behavior. Since the snippet of code you present shows little dependencies (i.e. how are a, b, something and vector related), I think we can only present vague solutions.
I suspect you want to get rid of the code to circumvent the effect of constantly moving the array around by concatenating your new results into it.
First of all, just make sure that the slowest portion of your application is caused by this. Take a look at the Matlab profiler. If that portion of your code is not a major time hog, don't bother spending a lot of time on improving it (and just say to mlint to ignore that line of code).
If you can analyse your code enough to ensure that you have a constant number of iterations, you can preallocate your variables and prevent any performance penalty (i.e. write a for loop in the worst case, or better yet really vectorized code). Or if you can `factor out' some variables, this might help also (move any loop invariants outside of the loop). So that might look something like this:
vector = zeros(1,100);
while a - b ~= 0
othervector = sum(something);
vector(iIteration) = sum(othervector);
iIteration = iIteration + 1;
end
If the nature of your code doesn't allow this (e.g. you are iterating to attain convergence; in that case, beware of checking equality of doubles: always include a tolerance), there are some tricks you can perform to improve performance, but most of them are just rules of thumb or trying to make the best of a bad situation. In this last case, you might add some maintenance code to get slightly better performance (but what you gain in time consumption, you lose in memory usage).
Let's say, you expect the code to run 100*n iterations most of the time, you might try to do something like this:
iIteration = 0;
expectedIterations = 100;
vector = [];
while a - b ~= 0
if mod(iIteration,expectedIterations) == 0
vector = [vector zeros(1,expectedIterations)];
end
iIteration = iIteration + 1;
vector(iIteration) = sum(sum(something));
...
end
vector = vector(1:iIteration); % throw away uninitialized
vector = vector/100;
It might not look pretty, but instead of resizing the array every iteration, the array only gets resized every 100th iteration. I haven't run this piece of code, but I've used very similar code in a former project.
If you want to optimize for speed, you should preallocate the vector and have a counter for the index as #Egon answered already.
If you just want to have a different way of writing vector=[vector,sum(othervector)];, you could use vector(end + 1) = sum(othervector); instead.

Efficient way to make a Programmatic Audio Mixdown

I'm currently working on the iPhone with Audio Units and I'm playing four tracks simultaneously. To improve the performance of my setup, I thought it would be a good idea to minimize the number of Audio Units / threads, by mixing down the four tracks into one.
With the following code I'm processing the next buffer by adding up the samples of the four tracks, keep them in the SInt16 range and add them to a temporary buffer, which will later on be copied into the ioData.mBuffers of the Audio Unit.
Although it works, I don't have the impression that this is the most efficient way to do this.
SInt16* buffer = bufferToWriteTo;
int reads = bufferSize/sizeof(SInt16);
SInt16** files = circularBuffer->files;
float tempValue;
SInt16 values[reads];
int k,j;
int numFiles=4;
for (k=0; k<reads; k++)
{
tempValue=0.f;
for (j=0; j<numFiles; j++)
{
tempValue += files[j][packetNumber];
}
if (tempValue > 32767.f) tempValue = 32767.f;
else if (tempValue < -32768.f) tempValue =- 32768.f;
values[k] = (SInt16) tempValue;
values[k] += values[k] << 16;
packetNumber++;
if (packetNumber >= totalPackets) packetNumber=0;
}
memcpy(buffer,values,bufferSize);
Any ideas or pointers to speed this up? Am I right?
The biggest improvement you can get from this code would be by not using floating point arithmetic. While the arithmetic by itself is fast, the conversions which happen in the nested loops, take a long time, especially on the ARM processor in the iPhone. You can achieve exactly the same results by using 'SInt32' instead of 'float' for the 'tempValue' variable.
Also, see if you can get rid of the memcpy() in the last string: perhaps you can construct the 'buffer' directly, without using a temporary buffer called 'values'. That saves one copy, which would be significant improvement for such a function.
Other notes: the last two lines of the loop probably belong outside of the loop and the body of the nested loop should use 'k' as a second index, instead of 'packetNumber', but I'm not sure about this logic.
And the last note: you're squashing the peaks of your resulting sound. While this seems like a good idea, it will sound pretty rough. You probably want to scale the result down instead of cropping it. Like that: instead of this code
for (j=0; j<numFiles; j++)
{
tempValue += files[j][packetNumber];
}
if (tempValue > 32767.f) tempValue = 32767.f;
else if (tempValue < -32768.f) tempValue =- 32768.f;
you probably want something like this:
for (j=0; j<numFiles; j++)
{
tempValue += files[j][packetNumber] / numFiles;
}
Edit: and please do not forget to measure the performance before and after, to see which one of the improvements gave the biggest impact. This is the best way to learn performance: trial and measurement
A couple of pointers even though I'm not really familliar with iPhone development.
You could unwind the inner loop. You don't need a for loop to add 4 numbers together although it might be your compiler will do this for you.
Write directly to the buffer in your for loop. memcpy at the end will do another loop to copy the buffers.
Don't use a float for tempvalue. Depending on the hardware integer math is quicker and you don't need floats for summing channels.
Remove the if/endif. Digital clipping will sound horrible anyway so try to avoid it before summing the channels together. Branching inside a loop like this should be avoided if possible.
One thing I found when writing the audio mixing routines for my app is that incremented pointers worked much faster than indexing. Some compilers may sort this out for you but - not sure on the iphone - but certainly this gave my app a big boost for these tight loops (about 30% if I recall).
eg: instead of this:
for (k=0; k<reads; k++)
{
// Use buffer[k]
}
do this:
SInt16* p=buffer;
SInt16* pEnd=buffer+reads;
while (p!=pEnd)
{
// Use *p
p++;
}
Also, I believe iPhone has some sort of SIMD (single instruction multiple data) support called VFP. This would let you perform math on a number of samples in one instruction but I know little about this on iPhone.

Why is this C-style code 10X slower than this obj-C style code?

//obj C version, with some - less than one second on 18,000 iterations
for (NSString* coordStr in splitPoints) {
char *buf = [coordStr UTF8String];
sscanf(buf, "%f,%f,", &routePoints[i].latitude, &routePoints[i].longitude);
i++;
}
//C version - over 13 seconds on 18,000 iterations
for (i = 0; buf != NULL; buf = strchr(buf,'['), ++i) {
buf += sizeof(char);
sscanf(buf, "%f,%f,", &routePoints[i].latitude, &routePoints[i].longitude);
}
As a corollary question, is there any way to make this loop faster?
Also see this question: Another Speed Boost Possible?
Measure, measure, measure.
Measure the code with the Sampler instrument in Instruments.
With that said, there is an obvious inefficiency in the C code compared to the Objective-C code.
Namely, fast enumeration -- the for(x in y) syntax -- is really fast and, more importantly, implies that splitPoints is an array or set that contains a bunch of data that has already been parsed into individual objects.
The strchr() call in the second loop implies that you are parsing stuff on the fly. In and of itself, strchr() is a looping operation and will consume time, more-so as the # of characters between occurrences of the target character increase.
That is all conjecture, though. As with all optimizations, speculation is useless and gathering concrete data using the [rather awesome] set of tools provided is the only way to know for sure.
Once you have measured, then you can make it faster.
Having nothing to do with performance, your C code has an error in it. buf += sizeof(char) should simply be buf++. Pointer arithmetic always moves in units the size of the type. It worked fine in this case because sizeof(char) was 1.
Obj C code looks like it has precomputed some split points, while the C code seeks them in each iteration. Simple answer? If N is the length of buf and M the number of your split points, it looks like your two snippets have complexities of O(M) versus O(N*M); which one's slower?
edit: Really amazed me though, that some would think C code is axiomatically faster than any other solution.
Vectorization can be used to speed up C code.
Example:
Even faster UTF-8 character counting
(But maybe just try to avoid the function call strchr() in the loop condition.)