C++ AMP Division Quirk When Stressing GPU - c++-amp

I wrote a program that uses C++ AMP to run on my laptop's GPU (Intel HD Graphics 520). The GPU kernel is long, so I will give a high-level description (but let me know if more is needed).
Note that I fall into the "know enough to be dangerous" category of programmer.
parallel_for_each(accelerator_view, number_of_runs.extent, [data](index<1> idx) restrict(amp)
{
    double total = data.starting_total[idx];
    //these "working variables" are used for a variety of things in the code
    double working_variable = 0.0;
    double working_variable2 = 0.0;
    for (int i = 0; i < 20; i++)
    {
        ...do lots of stuff. "total" is changed by various factors...
        //total is still a positive number that is greater than zero
        //working_variable now has a positive non-zero value, and I want to find what %
        //of the remaining total that value is
        working_variable2 = 1.0 / total;
        working_variable2 = working_variable * working_variable2;
        //Note that if I write it like this the same issue will happen:
        working_variable2 = working_variable / total;
        ...keep going and doing more things, write some values to data..
        if (total == 0)
            break;
    }
});
When I run this without doing much else on my computer this runs just fine and I get the results I expect.
Where it gets really tricky is when I am stressing the system (or I think I am stressing the system). I stress-test the system by
1) Kicking off my program
2) Opening up Chrome
3) Going to Youtube and starting a video
When I do that I get unexpected results (either when I am opening a program or running a video). I traced it back to the "1.0 / total" calculation returning infinity (inf), even though "total" is greater than zero. Here is an example of what I output to the console when this issue happens:
total = 51805.6
1.0 / total = inf
precise_math::pow(total, -1) = 1.93029e-05
I am running the kernel about 1.6 million times and I'll see between 0 and 15 of those 1.6 million hit this issue. The number of issues varies and which threads hit the issue varies.
So I feel confident that "total" is not zero and this is not a divide-by-zero situation. What am I missing? What could be causing this issue? Is there any way to prevent it? I am thinking of replacing all division in the kernel with pow(num, -1).
P.S. Yes, I am aware that part of the answer is "don't watch videos while running". It is the opening of programs during execution that I am most concerned about.
Thank you!
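
The workaround floated in the question (fall back to pow when the division misbehaves) can be sketched on the host side as follows. This is only an illustration in Python, not C++ AMP kernel code, and it does not explain the root cause. The function name checked_reciprocal and the injectable divide parameter are hypothetical, introduced so the fallback path can be exercised.

```python
import math


def checked_reciprocal(total, divide=lambda a, b: a / b):
    """Return 1/total, falling back to pow(total, -1) if division gives a non-finite value.

    `divide` is a hypothetical hook standing in for the division that
    occasionally returned inf in the question; it defaults to ordinary division.
    """
    if total == 0.0 or not math.isfinite(total):
        raise ValueError("total must be finite and non-zero")
    result = divide(1.0, total)
    if not math.isfinite(result):
        # Assumption: pow(total, -1) stays correct, as in the console output above.
        result = math.pow(total, -1.0)
    return result


if __name__ == "__main__":
    print(checked_reciprocal(51805.6))
    # Simulate the faulty division seen under load.
    print(checked_reciprocal(51805.6, divide=lambda a, b: float("inf")))
```

A check like this at least makes the bad results detectable, so affected runs can be counted or recomputed instead of silently propagating inf.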

Related

Matlab code to charge to a maximum value of capacity

I want to get rCl, which is remaining charge of a battery in an e-vehicle. Currently the values are wrong as I am not getting the code right.
rCl should be a max of 1000. nCl describes the amount (in kWh) it is using, and pCp the amount (in kWh), it could potentially load. So it should be like that:
rCl(x) = rCl(x-1) - nCl(x-1) + pCp(x-1) --> up to a max of 1000 kWh
I tried several things, including this:
for x = 2:2:2734
    Cp(x) = min(nC(x-1),pCp(x))
end
for x = 2:1:2734
    rC(x) = rC(x-1) - nC(x-1) + Cp(x-1)
end
But I just can't figure it out. I might have been looking at it for too long already... any suggestions are welcome!
If you look at the picture of the data, you may notice that it works as long as pCp is greater than nCl, so the battery can charge to full. Once that is no longer the case, the charge never reaches full again and keeps decreasing.
I figured it out:
for x = 2:1:tablesize
    rCl(x) = rCl(x-1)-nCl(x-1)+pCp(x-1)
    if rCl(x) > 1000
        rCl(x) = 1000
    end
end
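
The same clamped recurrence can be sketched in Python for anyone who wants to check it outside MATLAB. This is a minimal sketch: the function name remaining_charge and the sample numbers in the usage lines are made up for illustration, and the 1000 kWh cap is taken from the question.

```python
def remaining_charge(initial, used, charged, capacity=1000.0):
    """Apply rCl(x) = min(rCl(x-1) - nCl(x-1) + pCp(x-1), capacity).

    `used` plays the role of nCl and `charged` the role of pCp; the result
    has the same length as the inputs, like the MATLAB loop above.
    """
    levels = [initial]
    for x in range(1, len(used)):
        # Subtract what was consumed, add what could be loaded, then cap.
        levels.append(min(levels[x - 1] - used[x - 1] + charged[x - 1], capacity))
    return levels


if __name__ == "__main__":
    # Hypothetical data: the second step would overshoot and is capped.
    print(remaining_charge(1000.0, [100.0, 50.0, 0.0], [20.0, 500.0, 0.0]))
```

Using min in one expression avoids the separate if block and makes it harder to forget the cap on a later edit.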

gnu sort - default buffer size

I have read the full documentation for GNU sort and searched online, but I cannot find what the default for the --buffer-size option is (which determines how much system memory the program uses when it runs). I am guessing it is somehow determined based on total system memory (or perhaps on the memory available at the time the program begins execution). How can I determine this?
update: I've experimented a bit and it seems that when I don't specify a particular --buffer-size value, it ends up using very little ram and thus going very slowly. It would be nice though to better understand what exactly is determining this behavior.
I went digging through the coreutils sort source code and found these functions: default_sort_size and sort_buffer_size.
It turns out that --buffer-size (sort_size in the source code) isn't the target buffer size but rather the maximum buffer size. If no --buffer-size value is specified, the default_sort_size function is used to determine a safe maximum buffer size. It does this based on resource limits, available memory, and total memory. A summary of the function is as follows:
size = MIN(SIZE_MAX, resource_limit) / 2;
mem = MAX(available_memory, total_memory / 8);
if ( size > total_memory * 0.75 )
    size = total_memory * 0.75;
buffer_max = MIN(mem, size);
buffer_max = MAX(buffer_max, MIN_SORT_SIZE);
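
To see what this summary yields for concrete numbers, here is a small Python transcription of it. This is a sketch of the summary above, not of the actual coreutils source: the MIN_SORT_SIZE value is a placeholder chosen for illustration, and the memory figures in the usage lines are invented.

```python
import sys

# Placeholder floor for illustration only; the real constant is defined in sort.c.
MIN_SORT_SIZE = 1 << 20


def default_sort_size(resource_limit, available_memory, total_memory,
                      size_max=sys.maxsize):
    """Mirror the summarized logic: cap by limits, then by available memory."""
    size = min(size_max, resource_limit) // 2       # half of the resource limit
    mem = max(available_memory, total_memory / 8)   # at least 1/8 of total memory
    if size > total_memory * 0.75:                  # at most 3/4 of total memory
        size = total_memory * 0.75
    buffer_max = min(mem, size)
    return int(max(buffer_max, MIN_SORT_SIZE))


if __name__ == "__main__":
    GiB = 1 << 30
    # Hypothetical machine: 16 GiB total, 4 GiB free, no resource limit.
    print(default_sort_size(sys.maxsize, 4 * GiB, 16 * GiB) / GiB)
```

With no resource limit in place, the result is driven almost entirely by how much memory is free when sort starts, which is why the same command can behave differently from run to run.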
The other function, sort_buffer_size, is used to determine exactly how much memory to allocate for the given input files. A summary of the function is as follows:
if (sort_size is set)
    size_bound = sort_size;
else
    size_bound = default_sort_size();
size = line_bytes + 2;
for each input_file
    if (input_file is regular)
        file_size = input_file_size;
    else
        if (sort_size is set)
            return sort_size;
        else
            file_size = guess;
    worst_case = file_size * worst_case_per_input_byte + 1;
    if (worst_case overflows || size + worst_case >= size_bound)
        return size_bound;
    else
        size += worst_case;
return size;
Possibly the most important point of the sort_buffer_size function is that if you're sorting data from STDIN or a pipe, it will automatically default to sort_size (i.e. --buffer-size) if it was provided. Otherwise, for regular files it will make some rough calculations based on the file sizes and only use sort_size as an upper limit.
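
The difference between regular files and pipes is easier to see with numbers, so here is a Python sketch of the pseudocode above. Several things are assumptions made for illustration: worst_case_per_input_byte is taken to be line_bytes + 1, the size_guess default is a placeholder for the guess used for inputs of unknown size, the line_bytes value in the usage lines is invented, and the overflow check is dropped because Python integers do not overflow.

```python
def sort_buffer_size(file_sizes, line_bytes, size_bound, sort_size=None,
                     size_guess=128 * 1024):
    """Estimate the buffer to allocate.

    file_sizes: one entry per input; an int for a regular file, None for a
    pipe or other input of unknown size.
    size_bound: sort_size if it was given, otherwise the default maximum.
    """
    worst_case_per_input_byte = line_bytes + 1  # assumption, see lead-in
    size = worst_case_per_input_byte + 1        # equals line_bytes + 2
    for file_size in file_sizes:
        if file_size is None:
            if sort_size is not None:
                return sort_size                # pipes jump straight to -S
            file_size = size_guess              # placeholder guess
        worst_case = file_size * worst_case_per_input_byte + 1
        if size + worst_case >= size_bound:
            return size_bound
        size += worst_case
    return size


if __name__ == "__main__":
    MiB = 1 << 20
    print(sort_buffer_size([1000], 32, 1024 * MiB))                 # small file
    print(sort_buffer_size([None], 32, 1024 * MiB))                 # pipe, no -S
    print(sort_buffer_size([None], 32, 512 * MiB, 512 * MiB))       # pipe with -S
```

Under these assumptions a pipe without -S gets only a few megabytes, which matches the slow behavior described in the question's update.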
To summarize in English, the defaults are:
Reading from a real file:
Use all free memory, up to 3/4 and not less than 1/8 of total memory.
(If there is a process (rlimit) memory limit in effect, sort will not use more than half of that.)
Reading from a pipe:
Use a small, fixed amount (tens of MB).
You will probably want -S.
Current for GNU coreutils 8.29, Jan 2018.

How do I solve a problem related to the memory and speed of my computer?

First of all, I have a specific task that requires every possible combination of the elements of a range. I already have a program that does this. The code is:
clc
clear
a = input('Please, select your array: ')
b = a(:).'
c = length(b)
for d =1:c
    if (d<c)
        e{d} = nchoosek(b, d);
    end
end
tt=cellfun(@(m) padarray(m,[0 max(cellfun(@(n) size(n,2), e)) - size(m,2)],'post'), e,'UniformOutput',0);
uu=cell2mat(tt([1:d-1])')
suu=size(uu)
uu(:,((suu(2))+1))=sum(uu')'
But I ran into a big problem when running the program. This command:
e{d} = nchoosek(b, d)
fails to complete because of the number of possible choices and the number of selected choices, and ultimately because of these documented limitations:
When b = nchoosek(n,k) is sufficiently large, nchoosek displays a warning that the result might not be exact. In this case, the result is only accurate to 15 digits for double-precision inputs, or 8 digits for single-precision inputs.
C = nchoosek(v,k) is only practical for situations where length(v) is less than about 15.
My vector, however, consists of hundreds of numbers that I want to run this process on.
When I run the program, MATLAB stays busy for a long time and eventually gives an "out of memory" error, or takes so long that I am forced to kill it. I looked for a solution by searching the help for "out of memory" but did not find one, so I tried another command:
combnk(v,k)
but it also takes so long that I am forced to kill the program.
Please, I need a practical solution for running many computations like this for my work.
If the solution depends on the capabilities of my computer, tell me what computer specifications would run this program easily and quickly.
Note that my computer specifications are:
win 7
64 bit
4 GB RAM
2.3 cpu i3
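
A quick count shows why no realistic hardware helps here. The loop in the question collects every combination of sizes 1 to c-1, which is 2^c - 2 rows in total. The Python sketch below computes that count and, as one possible alternative, streams the combinations of a single size with their sums instead of storing them all. The function names total_rows and subset_sums are hypothetical.

```python
import math
from itertools import combinations


def total_rows(n):
    """Number of rows the question's loop would build for a vector of length n."""
    return sum(math.comb(n, k) for k in range(1, n))  # equals 2**n - 2


def subset_sums(values, k):
    """Yield each k-element combination with its sum, one at a time."""
    for combo in combinations(values, k):
        yield combo, sum(combo)


if __name__ == "__main__":
    for n in (15, 30, 100):
        print(n, total_rows(n))
    for combo, s in subset_sums([1, 2, 3], 2):
        print(combo, s)
```

Streaming keeps memory flat, but the running time still grows with the number of combinations, so for hundreds of elements only a small fixed k stays practical.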

How to run the code for large variables

I have code like the following:
W = 3;
i = 4;
s = fullfact(ones(1,i)*(W + 1)) - 1;
p2 = unique(sort(s(sum(s,2) == i,:),2),'rows');
I can run this code only up to i = 11, but I want to run it for up to i = 25. When I run it for i = 12, it shows the error message "Out of Memory".
I need to keep this code as it is. How can I modify it for larger values of i?
MATLAB experts, I need your valuable suggestions.
Just wanting to do silly things is not enough. You are generating arrays that are simply too large to fit into memory.
See that the size of the matrix s is a function of i here. size(s) will be 2^(2*i) by i. (By the way, some will argue it is a bad idea to use i as a variable name, since i is normally sqrt(-1).)
So when i = 4, s is only 256x4.
When i = 11, s is 4194304x11. This array takes 369098752 bytes of space, so 370 megabytes.
When i = 25, the array will be of size
2^50*25
ans =
2.8147e+16
Multiply that by 8 to get the memory needed in bytes. Something like 225 petabytes of memory! If you have that much memory, then send me a few terabytes of RAM. You won't miss them.
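
The size arithmetic above is easy to reproduce for any i. A minimal Python sketch, assuming 8 bytes per element (a double) and using the hypothetical name fullfact_bytes:

```python
def fullfact_bytes(i, W=3, bytes_per_element=8):
    """Return (rows, bytes) of the full factorial matrix with i columns of W+1 levels."""
    rows = (W + 1) ** i                      # 4**i == 2**(2*i) when W == 3
    return rows, rows * i * bytes_per_element


if __name__ == "__main__":
    for i in (4, 11, 12, 25):
        rows, nbytes = fullfact_bytes(i)
        print(f"i={i}: {rows} rows, {nbytes} bytes")
```

Each extra column multiplies the row count by four, so the jump from i = 11 to i = 12 alone more than quadruples the memory needed.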
Yes, there are times when MATLAB runs out of memory. You can get the amount of memory available at any point of time by executing the following:
memory
However, I would suggest following one of the strategies to reduce memory usage available here. Also, you might want to clear the variables that are not required in every iteration with
clear variable_name
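
Since the code in the question only keeps the sorted rows whose entries sum to i, one way around the memory problem is to generate those rows directly and never build the full factorial matrix. The Python sketch below illustrates the idea; it is an assumption about what the asker ultimately needs, not a drop-in replacement for the MATLAB code, and the name sorted_rows_with_sum is hypothetical.

```python
from itertools import combinations_with_replacement


def sorted_rows_with_sum(i, W=3):
    """Return the non-decreasing rows of length i with entries in 0..W that sum to i."""
    return [row
            for row in combinations_with_replacement(range(W + 1), i)
            if sum(row) == i]


if __name__ == "__main__":
    for row in sorted_rows_with_sum(4):
        print(row)
    print(len(sorted_rows_with_sum(25)), "rows for i = 25")
```

This enumerates only non-decreasing rows, so the candidate set grows polynomially in i instead of as 4^i.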

How to calculate value of short options call with Black-Scholes formula?

I am trying to calculate the profit/loss of a short call at various times in the future, but it isn't coming out correct. Compared to the time of expiration, the ones with time left have less profit above the strike price, but at some point below the strike they don't lose value as fast as the t=0 line. Below is the formula in pseudocode, what am I doing wrong?
profit(stockprice) = -1 * (black_scholes_price_of_call(stockPrice,optionStrike,daysTillExpiration) - premium);
Real matlab code:
function [ x ] = sell_call( current,strike,price,days)
    if (days > 0)
        Sigma = .25;
        Rates = 0.05;
        Settle = today;
        Maturity = today + days;
        RateSpec = intenvset('ValuationDate', Settle, 'StartDates', Settle, 'EndDates',...
            Maturity, 'Rates', Rates, 'Compounding', -1);
        StockSpec = stockspec(Sigma, current);
        x = -1 * (optstockbybls(RateSpec, StockSpec, Settle, Maturity, 'call', strike) - price);
    else
        x = min(price,strike-current-price);
    end
end
Your formula ain't right. I don't know what you need that leading -1 multiplier for, because when I distribute it out the "formula" is a simple one:
profit(stockprice) = premium - black_scholes_price_of_call(stockPrice,optionStrike,daysTillExpiration);
Pretty simple. So that means the problem is buried in that function for the price of the call, right?
When I compare your formula to what I see as the definition on Wikipedia, I don't see a correspondence at all. Your MATLAB code doesn't help, either. Dig into the functions and see where you went wrong.
Did you write those? How did you test them before you assembled them into this larger function? Test the smaller blocks before you assemble them into the bigger thing.
What baseline are you testing against? What known situation are you comparing your calculation to? There are lots of B-S calculators available. Maybe you can use one of those.
I'd assume that it's an error in your code rather than MATLAB. Or you've misunderstood the meaning of the parameters you're passing. Look at your stuff more carefully, re-read the documentation for that function, and get a good set of baseline cases.
I found the problem: it had to do with the RateSpec argument. When you pass in an interest rate, it affects the option pricing.
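
To have a baseline to test against, as suggested above, here is a self-contained Python sketch of the textbook Black-Scholes call price (no dividends) and the resulting short-call profit. It is not a reimplementation of optstockbybls. The 365-day year is an assumption, the default rate and volatility simply reuse the values from the question, and at expiry the profit is the standard premium - max(S - K, 0).

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot, strike, rate, sigma, years):
    """Black-Scholes price of a European call; intrinsic value at expiry."""
    if years <= 0:
        return max(spot - strike, 0.0)
    vol = sigma * math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * years) / vol
    d2 = d1 - vol
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


def short_call_profit(spot, strike, premium, days, rate=0.05, sigma=0.25):
    """Profit of a short call: premium received minus the current call value."""
    return premium - bs_call(spot, strike, rate, sigma, days / 365.0)  # 365-day year assumed


if __name__ == "__main__":
    for r in (0.0, 0.05):
        print(r, round(bs_call(100.0, 100.0, r, 0.25, 0.5), 4))
    print(short_call_profit(110.0, 100.0, 5.0, 0))
```

Running it with a rate of 0 and then 0.05 shows the call value rising with the rate, which is the effect described in the last reply.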