Expression result unused - iphone

I inherited some code and I'm trying to fix some compiler warnings:
StkFrames& PRCRev :: tick( StkFrames& frames, unsigned int channel )
{
#if defined(_STK_DEBUG_)
if ( channel >= frames.channels() - 1 ) {
errorString_ << "PRCRev::tick(): channel and StkFrames arguments are incompatible!";
handleError( StkError::FUNCTION_ARGUMENT );
}
#endif
StkFloat *samples = &frames[channel];
unsigned int hop = frames.channels();
for ( unsigned int i=0; i<frames.frames(); i++, samples += hop ) {
*samples = tick( *samples );
*samples++; // <-- warning: Expression result unused
*samples = lastFrame_[1];
}
return frames;
}
I don't understand what the code is trying to do. The code base is huge and I have fixed quite a few warnings already, but googling didn't help with this one.
Any ideas?

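The warning comes from this line:
*samples++;
Postfix ++ binds tighter than unary *, so this is parsed as *(samples++): the pointer is advanced, and the value it pointed to is read and then thrown away. That discarded read is the unused expression result. The increment itself still happens, so the next line writes to the following sample:
*samples = lastFrame_[1];
To silence the warning, write samples++; instead. I also recommend reading the body of the for loop more carefully: the loop header already adds hop (the full channel count) on every iteration, so together with the extra increment the pointer moves hop + 1 samples per frame, which doesn't look very logical.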
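The precedence rule is easy to check outside STK. The following is a minimal sketch, not the library's code: fill_stereo is a hypothetical helper that works on plain doubles, and it assumes the intent is to write one value to the first channel and another to the second channel of each interleaved frame. Because the body already advances the pointer once, the loop header only adds channels - 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the loop in PRCRev::tick(): writes `left` to the
// first channel and `right` to the second channel of every interleaved frame.
void fill_stereo(std::vector<double>& interleaved, std::size_t channels,
                 double left, double right)
{
    assert(channels >= 2);  // two channels are written per frame
    std::size_t frames = interleaved.size() / channels;
    double* samples = interleaved.data();
    for (std::size_t i = 0; i < frames; ++i, samples += channels - 1) {
        *samples = left;    // first channel of this frame
        samples++;          // advance only; nothing is dereferenced, so no warning
        *samples = right;   // second channel of this frame
    }
}
```

With three channels and two frames, this writes to indices 0, 1, 3 and 4 and leaves the third channel of each frame untouched.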


scanf hangs when copy and paste many line of inputs at a time

This may be a simple question, but I'm new to C and couldn't find an answer. My program is simple: it reads 21 lines of string input in a for loop and prints them afterwards. The number of lines could be smaller or larger.
int t = 21;
char *lines[t];
for (i = 0; i < t; i++) {
lines[i] = malloc(100);
scanf("%s", lines[i]);
}
for (int i = 0; i < t; i++) {
printf("%s\n", lines[i]);
free(lines[i]);
}
...
When I copy and paste all the input at once, my program hangs: no error, no crash. It's fine if there are only 20 lines or fewer. And if I enter the input by hand, line by line, it works normally regardless of the number of lines.
I'm using Xcode 5 on Mac OS X 10.10, but I don't think this is the issue.
Update:
I tried to debug it when the program hangs. It stopped when i == 20 at the line below:
0x7fff9209430a: jae 0x7fff92094314 ; __read_nocancel + 20
The issue may be related to scanf, but I'm confused: why the number 20? Maybe I'm using it the wrong way. Many thanks for any help.
Update:
I have tried compiling the program with gcc on the command line, and it works just fine. So it is an Xcode issue after all: somehow it prevents the user from pasting multiple lines of input.
Use fgets when you want to read a string in C. See the documentation for that function:
[FGETS Function]
You should use it like this:
fgets(lines[i], 100, stdin);
It reads one line of user input into the buffer. You can also have a look at these two posts about reading strings in C:
Post1
Post2
I hope this helps with your problem.
Edit:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    int t = 21;
    int i;
    char *lines[t];
    for (i = 0; i < t; i++) {
        lines[i] = malloc(100);
        /* The size passed to fgets must match the 100 bytes allocated above. */
        fgets(lines[i], 100, stdin);
    }
    for (i = 0; i < t; i++) {
        printf("String %d : %s\n", i, lines[i]);
        free(lines[i]);
    }
    return 0;
}
With this code I got all 21 strings that I entered (numbered 0 to 20, which is why the debugger stops when i == 20).
I also tried it with your input.
I wrote the same code and ran it. It works.
Your input might contain more than 99 characters per line (including the line feed),
or it might contain spaces and tabs.
scanf(3)
When one or more whitespace characters (space, horizontal tab \t, vertical tab \v, form feed \f, carriage return \r, newline or linefeed \n) occur in the format string, input data up to the first non-whitespace character is read, or until no more data remains. If no whitespace characters are found in the input data, the scanning is complete, and the function returns.
To avoid this, try the following (the width of 99 keeps scanf from writing past the end of a 100-byte buffer):
scanf("%99[^\n]%*c", lines[i]);
The whole code is:
#include <stdio.h>
int main(void) {
    const int T = 5;
    char lines[T][100]; // each line holds up to 99 characters plus the terminating '\0'
    // If the length per line is fixed, you don't need to use malloc.
    printf("input -------\n");
    for (int i = 0; i < T; i++) {
        lines[i][0] = '\0'; // stays empty if the scan fails
        // Read up to 99 non-newline characters, then discard one more character (the newline).
        scanf("%99[^\n]%*c", lines[i]);
    }
    printf("result -------\n");
    for (int i = 0; i < T; i++) {
        printf("%s\n", lines[i]);
    }
    return 0;
}
If you still face the problem, show us the input data and more details. Best regards.
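The fgets advice from the answers can be sketched as a small helper. This is a minimal sketch in C++ rather than C: read_lines is a hypothetical name, it reads from any FILE* so it can be tried without a terminal, and the 100-byte buffer mirrors the malloc(100) in the question. A line longer than 99 characters comes back split across several entries.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: reads up to max_lines lines from `in` with fgets and
// returns them without the trailing line ending.
std::vector<std::string> read_lines(std::FILE* in, int max_lines)
{
    assert(in != nullptr);
    std::vector<std::string> lines;
    char buffer[100];  // 99 characters plus the terminating '\0'
    while (static_cast<int>(lines.size()) < max_lines &&
           std::fgets(buffer, sizeof buffer, in) != nullptr) {
        buffer[std::strcspn(buffer, "\r\n")] = '\0';  // strip "\n" or "\r\n"
        lines.push_back(buffer);
    }
    return lines;
}
```

Calling read_lines(stdin, 21) would correspond to the loop in the question. Unlike scanf("%s", ...), this keeps spaces and tabs inside a line and never writes past the buffer.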

CUDA class with multidimensional pointers

I have been struggling with this class implementation now for quite a while and hope someone can help me with it.
class Material_Properties_Class_device
{
public:
int max_variables;
Logical * table_prop;
Table_Class ** prop_table;
};
The implementation for the pointers looks like this:
Material_Properties_Class **d_material_prop = new Material_Properties_Class* [4];
Logical *table_prop;
for (int k = 1; k <= 3; k++ )
{
cutilSafeCall(cudaMalloc((void**)&(d_material_prop[k]),sizeof(Material_Properties_Class)));
cutilSafeCall(cudaMemcpy(d_material_prop[k], material_prop[k], sizeof(Material_Properties_Class ), cudaMemcpyHostToDevice));
}
for( int i = 1; i <= 3; i++ )
{
cutilSafeCall(cudaMalloc((void**)&(table_prop), sizeof(Logical)));
cudaMemcpy(&(d_material_prop[i]->table_prop), &(table_prop), sizeof(Logical*),cudaMemcpyHostToDevice);
cudaMemcpy(table_prop, material_prop[i]->table_prop, sizeof(Logical),cudaMemcpyHostToDevice);
}
cutilSafeCall(cudaMalloc((void ***)&material_prop_device, (4) * sizeof(Material_Properties_Class *)));
cutilSafeCall(cudaMemcpy(material_prop_device, d_material_prop, (4) * sizeof(Material_Properties_Class *), cudaMemcpyHostToDevice));
This implementation works, but I can't get it working for **prop_table.
I assume it must somehow follow the same principle, but I just can't get my head around it.
I have already tried
Table_Class_device **prop_table = new Table_Class_device*[3];
and inserting another loop inside the second for loop:
for (int k = 1; k <= 3; k++ )
{
cutilSafeCall(cudaMalloc((void**)&(prop_table[k]), sizeof(Table_Class)));
cutilSafeCall(cudaMemcpy( prop_table[k], material_prop[i]->prop_table[k], sizeof( Table_Class *), cudaMemcpyHostToDevice));
}
Help would be much appreciated.
Some magic; maybe it will help:
struct fading_coefficient
{
double* frequency_array;
double* temperature_array;
int frequency_size;
int temperature_size;
double** fading_coefficients;
};
struct fading_coefficient* cuda_fading_coefficient;
double* frequency_array = NULL;
double* temperature_array = NULL;
double** fading_coefficients = NULL;
double** fading_coefficients1 = (double **)malloc(fading_coefficient->temperature_size * sizeof(double *)); // one row pointer per temperature, matching the loop below
cudaMalloc((void**)&frequency_array,fading_coefficient->frequency_size *sizeof(double));
cudaMemcpy( frequency_array, fading_coefficient->frequency_array, fading_coefficient->frequency_size *sizeof(double), cudaMemcpyHostToDevice );
free(fading_coefficient->frequency_array);
cudaMalloc((void**)&temperature_array,fading_coefficient->temperature_size *sizeof(double));
cudaMemcpy( temperature_array, fading_coefficient->temperature_array, fading_coefficient->temperature_size *sizeof(double), cudaMemcpyHostToDevice );
free(fading_coefficient->temperature_array);
cudaMalloc((void***)&fading_coefficients,fading_coefficient->temperature_size *sizeof(double*));
for (int i = 0; i < fading_coefficient->temperature_size; i++)
{
cudaMalloc((void**)&(fading_coefficients1[i]),fading_coefficient->frequency_size *sizeof(double));
cudaMemcpy( fading_coefficients1[i], fading_coefficient->fading_coefficients[i], fading_coefficient->frequency_size *sizeof(double), cudaMemcpyHostToDevice );
free(fading_coefficient->fading_coefficients[i]);
}
cudaMemcpy(fading_coefficients, fading_coefficients1, fading_coefficient->temperature_size *sizeof(double*), cudaMemcpyHostToDevice );
fading_coefficient->frequency_array = frequency_array;
fading_coefficient->temperature_array = temperature_array;
fading_coefficient->fading_coefficients = fading_coefficients;
cudaMalloc((void**)&cuda_fading_coefficient,sizeof(struct fading_coefficient));
cudaMemcpy( cuda_fading_coefficient, fading_coefficient, sizeof(struct fading_coefficient), cudaMemcpyHostToDevice );
This question comes up frequently. Multidimensional pointers are especially challenging.
If possible, it's recommended that you flatten multidimensional pointer usage (**) to single-dimensional pointer usage (*), and as you've seen, even that is somewhat cumbersome.
The single-dimensional case (*) is further described here, although you seem to have already figured it out.
If you really want to handle the 2 dimensional (**) case, look here.
An example implementation for 3 dimensional case (***) is here. ("madness!")
Working with 2 and 3 dimensions this way is quite difficult. Thus the recommendation to flatten.
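To make the flattening recommendation concrete, here is a minimal host-only sketch. The names flatten and at are hypothetical, and the table type is an assumption rather than the asker's Table_Class. The point is that a rows x cols table stored in one contiguous buffer can be moved to the device with a single cudaMalloc and a single cudaMemcpy of rows * cols * sizeof(double) bytes, with no per-row pointer fix-ups.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies a table stored as a vector of rows into one
// contiguous buffer in row-major order.
std::vector<double> flatten(const std::vector<std::vector<double>>& table,
                            std::size_t cols)
{
    std::vector<double> flat;
    flat.reserve(table.size() * cols);
    for (const std::vector<double>& row : table) {
        assert(row.size() == cols);  // every row must have the same width
        flat.insert(flat.end(), row.begin(), row.end());
    }
    return flat;
}

// Element (row, col) of the flattened table.
inline double at(const std::vector<double>& flat, std::size_t cols,
                 std::size_t row, std::size_t col)
{
    return flat[row * cols + col];
}
```

A kernel would use the same row * cols + col arithmetic on the device pointer, so only the pointer and the two dimensions need to be passed.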

Why is the call to array_view::synchronize() so slow?

I've started experimenting with C++ AMP. I've created a simple test app just to see what it can do, but the results are quite surprising to me. Consider the following code:
#include <amp.h>
#include "Timer.h"
using namespace concurrency;
int main( int argc, char* argv[] )
{
uint32_t u32Threads = 16;
uint32_t u32DataRank = u32Threads * 256;
uint32_t u32DataSize = (u32DataRank * u32DataRank) / u32Threads;
uint32_t* pu32Data = new (std::nothrow) uint32_t[ u32DataRank * u32DataRank ];
for ( uint32_t i = 0; i < u32DataRank * u32DataRank; i++ )
{
pu32Data[i] = 1;
}
uint32_t* pu32Sum = new (std::nothrow) uint32_t[ u32Threads ];
Timer tmr;
tmr.Start();
array< uint32_t, 1 > source( u32DataRank * u32DataRank, pu32Data );
array_view< uint32_t, 1 > sum( u32Threads, pu32Sum );
printf( "Array<> deep copy time: %.6f\n", tmr.Stop() );
tmr.Start();
parallel_for_each(
sum.extent,
[=, &source](index<1> idx) restrict(amp)
{
uint32_t u32Sum = 0;
uint32_t u32Start = idx[0] * u32DataSize;
uint32_t u32End = (idx[0] * u32DataSize) + u32DataSize;
for ( uint32_t i = u32Start; i < u32End; i++ )
{
u32Sum += source[i];
}
sum[idx] = u32Sum;
}
);
double dDuration = tmr.Stop();
printf( "gpu computation time: %.6f\n", dDuration );
tmr.Start();
sum.synchronize();
dDuration = tmr.Stop();
printf( "synchronize time: %.6f\n", dDuration );
printf( "first and second row sum = %u, %u\n", pu32Sum[0], pu32Sum[1] );
tmr.Start();
for ( uint32_t idx = 0; idx < u32Threads; idx++ )
{
uint32_t u32Sum = 0;
for ( uint32_t i = 0; i < u32DataSize; i++ )
{
u32Sum += pu32Data[(idx * u32DataSize) + i];
}
pu32Sum[idx] = u32Sum;
}
dDuration = tmr.Stop();
printf( "cpu computation time: %.6f\n", dDuration );
printf( "first and second row sum = %u, %u\n", pu32Sum[0], pu32Sum[1] );
delete [] pu32Sum;
delete [] pu32Data;
return 0;
}
Note that Timer is a simple timing class using QueryPerformanceCounter. Anyway, the output of the code is the following:
Array<> deep copy time: 0.089784
gpu computation time: 0.000449
synchronize time: 8.671081
first and second row sum = 1048576, 1048576
cpu computation time: 0.006647
first and second row sum = 1048576, 1048576
Why is the call to synchronize() taking so long? Is there a way to get around this? Apart from that, the performance of the computation is amazing, but the synchronize() overhead makes it unusable for me.
It is also possible that I am doing something terribly wrong; if so, please tell me. Thanks in advance.
Function synchronize() is probably taking so long because it is waiting for the actual kernel to complete its work.
From parallel_for_each from amp.h:
Please note that the parallel_for_each executes as if synchronous to the calling code, but in reality, it is asynchronous. I.e. once the parallel_for_each call is made and the kernel has been passed to the runtime, the [code after the parallel_for_each] continues to execute immediately by the CPU thread, while in parallel the kernel is executed by the GPU threads.
So, measuring the time spent in parallel_for_each is not particularly meaningful.
EDIT: The way the algorithm is written, it won't benefit much from GPU acceleration. The read of source[i] is non-coalesced, and so it will be almost 16x slower than a coalesced read. It is possible to coalesce the read by using shared memory, but it is not quite trivial. I'd recommend reading up on GPU programming.
If you just want a simple example that demonstrates the utility of C++ AMP, try matrix multiplication.
Of course, the performance you'll observe also greatly depends on the model of your GPU hardware.
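The access-pattern remark can be illustrated on the CPU. This sketch only shows the index arithmetic and says nothing about speed. It is also a different technique from the shared-memory approach mentioned above: it simply changes which elements each worker sums. The names chunked_sums and interleaved_sums are hypothetical. In the chunked layout from the question, worker t reads data[t * chunk + i], so at any step adjacent workers touch addresses a whole chunk apart. In the interleaved layout, worker t reads data[i * workers + t], so adjacent workers touch adjacent addresses, which is the pattern GPUs can coalesce.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Layout used in the question: each worker sums one contiguous chunk.
std::vector<std::uint32_t> chunked_sums(const std::vector<std::uint32_t>& data,
                                        std::size_t workers)
{
    assert(workers > 0 && data.size() % workers == 0);
    std::size_t chunk = data.size() / workers;
    std::vector<std::uint32_t> sums(workers, 0);
    for (std::size_t t = 0; t < workers; ++t)
        for (std::size_t i = 0; i < chunk; ++i)
            sums[t] += data[t * chunk + i];
    return sums;
}

// Interleaved layout: at step i, workers 0..workers-1 read adjacent elements.
std::vector<std::uint32_t> interleaved_sums(const std::vector<std::uint32_t>& data,
                                            std::size_t workers)
{
    assert(workers > 0 && data.size() % workers == 0);
    std::size_t steps = data.size() / workers;
    std::vector<std::uint32_t> sums(workers, 0);
    for (std::size_t i = 0; i < steps; ++i)
        for (std::size_t t = 0; t < workers; ++t)
            sums[t] += data[i * workers + t];
    return sums;
}
```

The partial sums differ between the two layouts, but their totals are the same, so the final reduction over the per-worker results is unchanged.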
In addition to Igor's response on your specific algorithm, please note that there are multiple incorrect aspects of the way you are measuring C++ AMP performance in general (no runtime initialization exclusion, no discarding of initial JIT, no warmup of data, and the already pointed out assumption of p_f_e being synchronous), so please follow our guidelines here:
http://blogs.msdn.com/b/nativeconcurrency/archive/2011/12/28/how-to-measure-the-performance-of-c-amp-algorithms.aspx

Convert int to CFString without CFStringCreateWithFormat

The following is extremely slow for what I need.
CFStringCreateWithFormat(NULL, NULL, CFSTR("%d"), i);
Currently this takes 20,000 ns to execute in my tests on my 3GS. Perhaps that sounds fast, but I can create and release two NSMutableDictionaries in the time this takes. My C is weak, but there must be something equivalent to itoa that I can use on iOS.
This is the fastest I can get:
CFStringRef TECFStringCreateWithInteger(NSInteger integer)
{
size_t size = 21; // long enough for 64 bits integer
char buffer[size];
char *characters = buffer + size;
*(--characters) = 0; // NUL-terminated string
int sign = integer < 0 ? -1 : 1;
do {
*(--characters) = '0' + (integer % 10) * sign;
integer /= 10;
}
while ( integer );
if ( sign == -1 )
*(--characters) = '-';
return CFStringCreateWithCString(NULL, characters, kCFStringEncodingASCII);
}
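The digit loop above can be tried without CoreFoundation. The following is a minimal portable sketch of the same back-to-front technique. integer_to_string is a hypothetical name, and it returns a std::string instead of a CFStringRef. It relies on the fact that, since C++11, the result of % takes the sign of the dividend, so the digits of a negative value come out as 0 to -9 and are negated one at a time. That avoids negating the whole value, which would overflow for the most negative integer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical portable counterpart of TECFStringCreateWithInteger: converts
// a 64-bit integer to decimal text by filling a buffer from the back.
std::string integer_to_string(std::int64_t value)
{
    char buffer[21];  // 19 digits + sign + terminating '\0'
    char* p = buffer + sizeof buffer;
    *--p = '\0';
    const bool negative = value < 0;
    do {
        int digit = static_cast<int>(value % 10);  // has the sign of value
        *--p = static_cast<char>('0' + (negative ? -digit : digit));
        value /= 10;
    } while (value != 0);
    if (negative)
        *--p = '-';
    assert(p >= buffer);  // the buffer is large enough for any int64_t
    return std::string(p);
}
```

On iOS, the character buffer built this way could be passed to CFStringCreateWithCString exactly as in the function above.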

Finding log2() using sqrt()

This is an interview question I saw on some site.
It was mentioned that the answer involves forming a recurrence of log2() as follows:
double log2(double x )
{
if ( x<=2 ) return 1;
if ( IsSqureNum(x) )
return log2(sqrt(x) ) * 2;
return log2( sqrt(x) ) * 2 + 1; // Why the plus one here.
}
As for the recurrence, the +1 is clearly wrong, and the base case is also erroneous.
Does anyone know a better answer?
How are log() and log10() actually implemented in C?
Perhaps I have found the exact answer the interviewers were looking for, though I would say it's a little difficult to derive under interview pressure. The idea is this: say you want to find log2(13). You know that it lies between 3 and 4, because 3 = log2(8) and 4 = log2(16).
From the properties of logarithms, we know that log2(sqrt(8*16)) = (log2(8) + log2(16))/2 = (3+4)/2 = 3.5.
Now, sqrt(8*16) = 11.3137, so log2(11.3137) = 3.5. Since 11.3137 < 13, we know that the desired log2(13) lies between 3.5 and 4, and we continue the search in that interval. This is a binary search: we iterate until our value converges to the value whose log2() we wish to find. Code is given below:
double Log2(double val)
{
int lox,hix;
double rval, lval;
hix = 0;
while((1<<hix)<val)
hix++;
lox =hix-1;
lval = (1<<lox) ;
rval = (1<<hix);
double lo=lox,hi=hix;
// cout<<lox<<" "<<hix<<endl;
//cout<<lval<<" "<<rval;
while( fabs(lval-val)>1e-7)
{
double mid = (lo+hi)/2;
double midValue = sqrt(lval*rval);
if ( midValue > val)
{
hi = mid;
rval = midValue;
}
else{
lo=mid;
lval = midValue;
}
}
return lo;
}
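The listing above stops on an absolute tolerance and builds its bracket with integer shifts, so it expects val to be greater than 1 and small enough for 1 << hix to fit in an int. The following is a hedged variant of the same bisection idea, not the original author's code. log2_by_sqrt is a hypothetical name, it uses a fixed number of passes instead of a tolerance, and it assumes val is positive, finite and of moderate magnitude (the product lval * rval must not overflow).

```cpp
#include <cassert>
#include <cmath>

// Keeps an interval [lval, rval] that contains val together with the
// exponents lo = log2(lval) and hi = log2(rval), and repeatedly splits the
// interval at its geometric mean, whose log2 is the midpoint of lo and hi.
double log2_by_sqrt(double val)
{
    assert(val > 0.0 && std::isfinite(val));
    if (val < 1.0)
        return -log2_by_sqrt(1.0 / val);  // log2(x) = -log2(1/x)

    // Bracket val between consecutive powers of two.
    double lo = 0.0, lval = 1.0;
    while (lval * 2.0 <= val) {
        lval *= 2.0;
        lo += 1.0;
    }
    double hi = lo + 1.0, rval = lval * 2.0;

    // Each pass halves hi - lo, so 60 passes go beyond double precision.
    for (int i = 0; i < 60; ++i) {
        double mid = (lo + hi) / 2.0;
        double midValue = std::sqrt(lval * rval);  // equals 2^mid
        if (midValue > val) {
            hi = mid;
            rval = midValue;
        } else {
            lo = mid;
            lval = midValue;
        }
    }
    return (lo + hi) / 2.0;
}
```

For val = 13 the first pass compares sqrt(8 * 16) = 11.3137 with 13 and moves the lower bound to 3.5, exactly as in the worked example.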
It's been a long time since I've written pure C, so here it is in C++ (I think the only difference is the output function, so you should be able to follow it):
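#include <iostream>
#include <cmath>
using namespace std;
const static double CUTOFF = 1e-10;
// The accumulator collects about 34 result bits before CUTOFF is reached, so it must be wider than 32 bits.
double log2_aux(double x, double power, double twoToTheMinusN, unsigned long long accumulator) {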
if (twoToTheMinusN < CUTOFF)
return accumulator * twoToTheMinusN * 2;
else {
int thisBit;
if (x > power) {
thisBit = 1;
x /= power;
}
else
thisBit = 0;
accumulator = (accumulator << 1) + thisBit;
return log2_aux(x, sqrt(power), twoToTheMinusN / 2.0, accumulator);
}
}
double mylog2(double x) {
if (x < 1)
return -mylog2(1.0/x);
else if (x == 1)
return 0;
else if (x > 2.0)
return mylog2(x / 2.0) + 1;
else
return log2_aux(x, 2.0, 1.0, 0);
}
int main() {
cout << "5 " << mylog2(5) << "\n";
cout << "1.25 " << mylog2(1.25) << "\n";
return 0;
}
The function 'mylog2' does some simple log trickery to get a related number between 1 and 2, then calls log2_aux with that number.
log2_aux more or less follows the algorithm that Scorpi0 linked to above. At each step, you get 1 bit of the result. When you have enough bits, you stop.
If you can get hold of a copy, the Feynman Lectures on Physics, number 23, starts off with a great explanation of logs and more or less how to do this conversion. It is vastly superior to the Wikipedia article.