Performance slowdown with growing number of operations

I have the following problem: my code's performance depends on the number of operations! How can that be? (I use gcc 4.3.2 under openSUSE 11.1.)
Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N_MAX 1000000
typedef unsigned int uint;

double a[N_MAX];
double b[N_MAX];

int main() {
    /* fill a with random numbers in [0, 1] */
    for (uint i = 0; i < N_MAX; i++) {
        a[i] = (double)rand() / RAND_MAX;
    }
    for (uint n = 100000; n < 500000; n += 5000) {
        time_t time1 = time(NULL);
        for (uint i = 0; i < n; ++i)
            for (uint j = 0; j < n; ++j)
                b[j] = a[j];
        time_t time2 = time(NULL);
        double elapsed = (double)(time2 - time1);
        printf("%5u ", n);
        printf("%5.2f %.3f\n", elapsed, ((double)n * n / elapsed) / 1e9);
    }
    return 0;
}
And here is the log of results:
n      time(s)  Gflops
200000 23.00 1.739
205000 24.00 1.751
210000 25.00 1.764
215000 26.00 1.778
220000 27.00 1.793
225000 29.00 1.746
230000 30.00 1.763
235000 32.00 1.726
240000 32.00 1.800
245000 34.00 1.765
250000 36.00 1.736
255000 37.00 1.757
260000 38.00 1.779
265000 40.00 1.756
270000 42.00 1.736
275000 44.00 1.719
280000 46.00 1.704
285000 48.00 1.692
290000 49.00 1.716
295000 51.00 1.706
300000 54.00 1.667
305000 54.00 1.723
310000 59.00 1.629
315000 61.00 1.627
320000 66.00 1.552
325000 71.00 1.488
330000 76.00 1.433
335000 79.00 1.421
340000 84.00 1.376
345000 85.00 1.400
350000 89.00 1.376
355000 96.00 1.313
360000 102.00 1.271
365000 110.00 1.211
370000 121.00 1.131
375000 143.00 0.983
380000 156.00 0.926
385000 163.00 0.909
There is also a plot of this log, but I can't post the image because of new-user restrictions.
What is the reason for this slowdown?
How can I get rid of it? Please help!

Your inner loops increase their number of iterations every time, so it is expected that they take longer when there is more work to do. The first time through there are 100k iterations, the next time 105k, and so on: it simply must take more and more time.
EDIT: To be clearer, I meant that it looks like what Joel Spolsky called a Shlemiel the painter's algorithm.

Thanks a lot for the response!
My expectation is based on the idea that the number of operations per unit of time should remain the same. So if I increase the number of operations, say, twofold, then the computation time should also double, and therefore the ratio of the number of operations to the computation time should stay constant. That is not what I am seeing. :(
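To make that expectation concrete, here is a minimal Python sketch (using a few of the (n, seconds) pairs from the log above) that recomputes the ratio in question; if the per-operation rate were constant, the last column would stay flat instead of falling:
# A minimal sketch: recompute the Gflops column from a few of the
# logged (n, seconds) pairs above. The nested loops perform n*n copies,
# so a constant per-operation rate would keep the last column flat.
samples = [(200000, 23.0), (300000, 54.0), (385000, 163.0)]

for n, seconds in samples:
    ops = n * n                   # total inner-loop iterations
    rate = ops / seconds / 1e9    # same formula as in the question
    print(f"{n:6d}  {seconds:7.2f} s  {rate:.3f} Gops/s")
# 200000    23.00 s  1.739 Gops/s
# 300000    54.00 s  1.667 Gops/s
# 385000   163.00 s  0.909 Gops/s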

Related

How to code dynamic stop loss orders under each swing low/high in a pine strategy script?

I am trying to program a strategy with stop loss orders that are automatically placed under each swing low or high, because I believe that static stop loss orders based on percentages or ticks are more likely to fail in the market.
After searching online for a long time I could not find any script that worked with a dynamic stop loss; most of them worked with fixed stop loss orders instead. Then I came across the
ta.lowest(low, 14) / ta.highest(high, 14) functions
and thought they could be used to identify the value of the lowest candle in a certain range, which could then serve as the base on which to add a percentage, and voila, that would be a dynamic stop loss based on the swing low/high. The problem is that it did not work out as I imagined, and I have no clue why, which is why I am posting this question and hope that someone can point out my mistake.
Here is the proof that it is not working (screenshot omitted): the trade should be stopped out after reaching the price of the lowest candle (-0.1%) within the range of the last 5 candles.
This is the part of the script that does not work:
// *************Entry orders*************
if longCondition
    strategy.entry("Long", strategy.long)
if shortCondition
    strategy.entry("Short", strategy.short)
//*************Stop Loss Orders*************
long_pos  = strategy.position_size > 0
short_pos = strategy.position_size < 0
sl_perc   = input.float(0.1, "Stop Loss Percentage under swing low/high")
sl_long   = ta.lowest(low, 5) - ta.lowest(low, 14) / 100 * sl_perc
sl_short  = ta.highest(high, 5) + ta.lowest(low, 14) / 100 * sl_perc
strategy.exit("LongExit", "SL Long", stop=sl_long)
strategy.exit("ShortExit", "SL Short", stop=sl_short)
Thanks for any suggestions and help!
I think I found the answer myself:
I chose the wrong IDs for the exits: in strategy.exit() the first argument is the exit order's own ID and the second is the ID of the entry it attaches to, so it has to be "Long"/"Short" there, not an arbitrary label. A really stupid mistake :(
(I hope this is useful for anyone trying to do the same.)
//*************STOP LOSS*************
long_pos  = strategy.position_size > 0
short_pos = strategy.position_size < 0
sl_perc   = input.float(1, "Stop Loss Percentage under swing low/high")
sl_long   = ta.lowest(low, 5) - ta.lowest(low, 14) / 100 * sl_perc
sl_short  = ta.highest(high, 5) + ta.lowest(low, 14) / 100 * sl_perc
strategy.exit("SL Long", "Long", stop=sl_long)
strategy.exit("SL Short", "Short", stop=sl_short)

haproxy stats: qtime,ctime,rtime,ttime?

Running a web app behind HAProxy 1.6.3-1ubuntu0.1, I'm getting haproxy stats qtime,ctime,rtime,ttime values of 0,0,0,2704.
From the docs (https://www.haproxy.org/download/1.6/doc/management.txt):
58. qtime [..BS]: the average queue time in ms over the 1024 last requests
59. ctime [..BS]: the average connect time in ms over the 1024 last requests
60. rtime [..BS]: the average response time in ms over the 1024 last requests
(0 for TCP)
61. ttime [..BS]: the average total session time in ms over the 1024 last requests
I'm expecting response times in the 0-10 ms range. A ttime of 2704 milliseconds seems unrealistically high. Is it possible the units are off and this is 2704 microseconds rather than 2704 milliseconds?
Secondly, it seems suspicious that ttime isn't even close to qtime+ctime+rtime. Is total response time not the sum of the time to queue, connect, and respond? What is the other time, that is included in total but not queue/connect/response? Why can my response times be <1ms, but my total response times be ~2704 ms?
Here is my full csv stats:
$ curl "http://localhost:9000/haproxy_stats;csv"
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
http-in,FRONTEND,,,4707,18646,50000,5284057,209236612829,42137321877,0,0,997514,,,,,OPEN,,,,,,,,,1,2,0,,,,0,4,0,2068,,,,0,578425742,0,997712,22764,1858,,1561,3922,579448076,,,0,0,0,0,,,,,,,,
servers,server1,0,0,0,4337,20000,578546476,209231794363,41950395095,,0,,22861,1754,95914,0,no check,1,1,0,,,,,,1,3,1,,578450562,,2,1561,,6773,,,,0,578425742,0,198,0,0,0,,,,29,1751,,,,,0,,,0,0,0,2704,
servers,BACKEND,0,0,0,5919,5000,578450562,209231794363,41950395095,0,0,,22861,1754,95914,0,UP,1,1,0,,0,320458,0,,1,3,0,,578450562,,1,1561,,3922,,,,0,578425742,0,198,22764,1858,,,,,29,1751,0,0,0,0,0,,,0,0,0,2704,
stats,FRONTEND,,,2,5,2000,5588,639269,8045341,0,0,29,,,,,OPEN,,,,,,,,,1,4,0,,,,0,1,0,5,,,,0,5374,0,29,196,0,,1,5,5600,,,0,0,0,0,,,,,,,,
stats,BACKEND,0,0,0,1,200,196,639269,8045341,0,0,,196,0,0,0,UP,0,0,0,,0,320458,0,,1,4,0,,0,,1,0,,5,,,,0,0,0,0,196,0,,,,,0,0,0,0,0,0,0,,,0,0,0,0,
In HAProxy 2.x you now get two values, shown as n / n: the max within a sliding window, and the average for that window. The max value remains the max across all sample windows until a higher value is found. On 1.8 you only get the average.
Example of HAProxy 2 vs 1.8 (screenshots omitted). Note these proxies are used very differently and with dramatically different loads.
So it looks like the average response times, at least since the last reboot, are 66 ms and 275 ms.
The average is computed as:
(data time + cumulative HTTP connections - 1) / cumulative HTTP connections
This might not be a perfect analysis so if anyone has improvements it'd be appreciated. This is meant to show how I came to the answer above so you can use it to gather more insight into the other counters you asked about. Most of this information was gathered from reading stats.c. The counters you asked about are defined here.
unsigned int q_time, c_time, d_time, t_time; /* sums of conn_time, queue_time, data_time, total_time */
unsigned int qtime_max, ctime_max, dtime_max, ttime_max; /* maximum of conn_time, queue_time, data_time, total_time observed */
The stats page values are built from this code:
if (strcmp(field_str(stats, ST_F_MODE), "http") == 0)
    chunk_appendf(out, "<tr><th>- Responses time:</th><td>%s / %s</td><td>ms</td></tr>",
                  U2H(stats[ST_F_RT_MAX].u.u32), U2H(stats[ST_F_RTIME].u.u32));
chunk_appendf(out, "<tr><th>- Total time:</th><td>%s / %s</td><td>ms</td></tr>",
              U2H(stats[ST_F_TT_MAX].u.u32), U2H(stats[ST_F_TTIME].u.u32));
You asked about all the counters, but I'll focus on one. As can be seen in the snippet above, for "Responses time:" ST_F_RT_MAX and ST_F_RTIME are the values displayed on the stats page as n (rtime_max) / n (rtime) respectively. These are defined as follows:
[ST_F_RT_MAX] = { .name = "rtime_max", .desc = "Maximum observed time spent waiting for a server response, in milliseconds (backend/server)" },
[ST_F_RTIME] = { .name = "rtime", .desc = "Time spent waiting for a server response, in milliseconds, averaged over the 1024 last requests (backend/server)" },
These set a "metric" value (among other things) in a case statement further down in the code:
case ST_F_RT_MAX:
    metric = mkf_u32(FN_MAX, sv->counters.dtime_max);
    break;
case ST_F_RTIME:
    metric = mkf_u32(FN_AVG, swrate_avg(sv->counters.d_time, srv_samples_window));
    break;
These metric values give us a good look at what the stats page is telling us. The first value in "Responses time: 0 / 0", ST_F_RT_MAX, is a maximum of time spent waiting. The second value, ST_F_RTIME, is an average time taken per connection. These are the max and average taken within a window of time, i.e. however long it takes for you to get 1024 connections.
For example, "Responses time: 10000 / 20" means:
max time spent waiting (max value ever reached, including HTTP keepalive time) over the last 1024 connections: 10 seconds
average time over the last 1024 connections: 20 ms
So for all intents and purposes
rtime_max = dtime_max
rtime = swrate_avg(d_time, srv_samples_window)
Which begs the question: what are dtime_max, d_time and srv_samples_window? These are the data time windows. I couldn't actually figure out how these time values are being set, but at face value it's "some time" for the last 1024 connections. As pointed out here, keepalive times are included in the max totals, which is why the numbers are high.
Now that we know ST_F_RT_MAX is a max value and ST_F_RTIME is an average, an average of what?
/* compute time values for later use */
if (selected_field == NULL || *selected_field == ST_F_QTIME ||
    *selected_field == ST_F_CTIME || *selected_field == ST_F_RTIME ||
    *selected_field == ST_F_TTIME) {
    srv_samples_counter = (px->mode == PR_MODE_HTTP) ? sv->counters.p.http.cum_req : sv->counters.cum_lbconn;
    if (srv_samples_counter < TIME_STATS_SAMPLES && srv_samples_counter > 0)
        srv_samples_window = srv_samples_counter;
}
The TIME_STATS_SAMPLES value is defined as:
#define TIME_STATS_SAMPLES 512
unsigned int srv_samples_window = TIME_STATS_SAMPLES;
In mode http, srv_samples_counter is sv->counters.p.http.cum_req. http.cum_req is defined as ST_F_REQ_TOT:
[ST_F_REQ_TOT] = { .name = "req_tot", .desc = "Total number of HTTP requests processed by this object since the worker process started" },
For example, if the value of http.cum_req is 10, then srv_samples_counter will be 10. The sample appears to be the number of successful requests for a given sample window for a given backend's server. d_time (data time) is passed as <sum> and is computed as some non-negative value, or else it's counted as an error. I thought I found the code for how d_time is created, but I wasn't sure, so I haven't included it.
/* Returns the average sample value for the sum <sum> over a sliding window of
* <n> samples. Better if <n> is a power of two. It must be the same <n> as the
* one used above in all additions.
*/
static inline unsigned int swrate_avg(unsigned int sum, unsigned int n)
{
    return (sum + n - 1) / n;
}
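Putting the pieces together, here is a small Python sketch (not HAProxy code; the 1384448 sum is a hypothetical value chosen to match the ttime above) of how the reported average falls out of these definitions:
TIME_STATS_SAMPLES = 512

def samples_window(cum_req):
    # Mirrors the C above: the window is the request count until
    # TIME_STATS_SAMPLES requests have been seen, then stays at 512.
    return cum_req if 0 < cum_req < TIME_STATS_SAMPLES else TIME_STATS_SAMPLES

def swrate_avg(sum_, n):
    # Integer average of the sliding-window sum, rounded up,
    # mirroring HAProxy's swrate_avg() above.
    return (sum_ + n - 1) // n

# req_tot for server1 in the CSV is 578425742, so the window is 512 here;
# a hypothetical t_time sum of 1384448 then averages to the ttime of 2704.
print(swrate_avg(1384448, samples_window(578425742)))  # -> 2704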

Meaning of the benchmark variables in snakemake

I included a benchmark directive in some of the rules in my snakemake workflow, and the resulting files have the following header:
s h:m:s max_rss max_vms max_uss max_pss io_in io_out mean_load
The only documentation I've found mentions a "benchmark txt file (which will contain a tab-separated table of run times and memory usage in MiB)".
I can guess that columns 1 and 2 are two different ways of displaying the time taken to execute the rule (in seconds, and converted to hours, minutes and seconds).
io_in and io_out are likely related to disk read and write activity, but in what units are they measured?
What are the others? Is this documented somewhere?
Edit: Looking at the source code
I've found the following piece of code in /snakemake/benchmark.py, which might well be where the benchmark data comes from:
def _update_record(self):
    """Perform the actual measurement"""
    # Memory measurements
    rss, vms, uss, pss = 0, 0, 0, 0
    # I/O measurements
    io_in, io_out = 0, 0
    # CPU seconds
    cpu_seconds = 0
    # Iterate over process and all children
    try:
        main = psutil.Process(self.pid)
        this_time = time.time()
        for proc in chain((main,), main.children(recursive=True)):
            meminfo = proc.memory_full_info()
            rss += meminfo.rss
            vms += meminfo.vms
            uss += meminfo.uss
            pss += meminfo.pss
            ioinfo = proc.io_counters()
            io_in += ioinfo.read_bytes
            io_out += ioinfo.write_bytes
            if self.bench_record.prev_time:
                cpu_seconds += proc.cpu_percent() / 100 * (
                    this_time - self.bench_record.prev_time)
        self.bench_record.prev_time = this_time
        if not self.bench_record.first_time:
            self.bench_record.prev_time = this_time
        rss /= 1024 * 1024
        vms /= 1024 * 1024
        uss /= 1024 * 1024
        pss /= 1024 * 1024
        io_in /= 1024 * 1024
        io_out /= 1024 * 1024
    except psutil.Error as e:
        return
    # Update benchmark record's RSS and VMS
    self.bench_record.max_rss = max(self.bench_record.max_rss or 0, rss)
    self.bench_record.max_vms = max(self.bench_record.max_vms or 0, vms)
    self.bench_record.max_uss = max(self.bench_record.max_uss or 0, uss)
    self.bench_record.max_pss = max(self.bench_record.max_pss or 0, pss)
    self.bench_record.io_in = io_in
    self.bench_record.io_out = io_out
    self.bench_record.cpu_seconds += cpu_seconds
So this seems to come from functionality provided by psutil.
I will just leave this here for future reference.
Reading through the snakemake >= 6.0.0 benchmark module and psutil's memory_info(), memory_full_info(), io_counters(), cpu_times(), as previously suggested, gives the following:
colname    type (unit)      description
s          float (seconds)  Running time in seconds
h:m:s      string (-)       Running time in hours, minutes, seconds format
max_rss    float (MB)       Maximum "Resident Set Size": the non-swapped physical memory a process has used
max_vms    float (MB)       Maximum "Virtual Memory Size": the total amount of virtual memory used by the process
max_uss    float (MB)       "Unique Set Size": the memory unique to a process, which would be freed if the process were terminated right now
max_pss    float (MB)       "Proportional Set Size": the amount of memory shared with other processes, accounted so that the amount is divided evenly between the processes that share it (Linux only)
io_in      float (MB)       the number of MB read (cumulative)
io_out     float (MB)       the number of MB written (cumulative)
mean_load  float (-)        CPU usage over time, divided by the total running time (first row)
cpu_time   float (-)        CPU time summed for user and system
Benchmarking in snakemake could certainly be better documented, but psutil is documented here:
get_memory_info()
Return a tuple representing RSS (Resident Set Size) and VMS (Virtual Memory Size) in bytes.
On UNIX RSS and VMS are the same values shown by ps.
On Windows RSS and VMS refer to "Mem Usage" and "VM Size" columns of taskmgr.exe.
psutil.disk_io_counters(perdisk=False)
Return system disk I/O statistics as a namedtuple including the following attributes:
read_count: number of reads
write_count: number of writes
read_bytes: number of bytes read
write_bytes: number of bytes written
read_time: time spent reading from disk (in milliseconds)
write_time: time spent writing to disk (in milliseconds)
The code you found confirms that all the memory usage and IO counts are reported in MB (= bytes / 1024 / 1024).
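For anyone who wants to see such numbers directly, here is a rough Python sketch of the same psutil calls for the current process tree (a hedged example: memory_full_info() may need extra privileges for uss/pss, io_counters() is not available on every platform such as macOS, and pss is Linux-only):
from itertools import chain

import psutil

main = psutil.Process()  # current process instead of a snakemake job
rss = vms = uss = pss = io_in = io_out = 0
for proc in chain((main,), main.children(recursive=True)):
    mem = proc.memory_full_info()
    rss += mem.rss
    vms += mem.vms
    uss += mem.uss
    pss += mem.pss               # Linux only
    io = proc.io_counters()      # not available on macOS
    io_in += io.read_bytes
    io_out += io.write_bytes

MB = 1024 * 1024  # snakemake divides by this before reporting
print(f"rss={rss / MB:.1f} vms={vms / MB:.1f} uss={uss / MB:.1f} pss={pss / MB:.1f}")
print(f"io_in={io_in / MB:.2f} io_out={io_out / MB:.2f}")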

Marathon backoff - is it really exponential?

I'm trying to figure out Marathon's exponential backoff configuration. Here's the documentation:
The backoffSeconds and backoffFactor values are multiplied until they reach the maxLaunchDelaySeconds value. After they reach that value, Marathon waits maxLaunchDelaySeconds before repeating this cycle exponentially. For example, if backoffSeconds: 3, backoffFactor: 2, and maxLaunchDelaySeconds: 3600, there will be ten attempts to launch a failed task, each three seconds apart. After these ten attempts, Marathon will wait 3600 seconds before repeating this cycle.
The way I think of exponential backoff is that the wait periods should be:
3*2^0 = 3
3*2^1 = 6
3*2^2 = 12
3*2^3 = 24 and so on
so every time the app crashes, Marathon will wait a longer period of time before retrying. However, given the description above, Marathon's logic for waiting looks something like this:
int retryCount = 0;
while (backoffSeconds * (backoffFactor ^ retryCount) < maxLaunchDelaySeconds)
{
    wait(backoffSeconds);
    retryCount++;
}
wait(maxLaunchDelaySeconds);
This matches the explanation in the documentation, since 3*2^x < 3600 for values of x less than or equal to 10. However, I really don't see how this can be called an exponential backoff, since the wait time is constant.
Is there a way to make Marathon wait progressively longer with every restart of the app? Am I misunderstanding the doc? Any help would be appreciated!
As far as I understand the code in RateLimiter.scala, it is like you described, but then capped to the maxLaunchDelay waiting period. Let's say maxLaunchDelay is one hour (3600 s):
3*2^0 = 3
3*2^1 = 6
3*2^2 = 12
3*2^3 = 24
3*2^4 = 48
3*2^5 = 96
3*2^6 = 192
3*2^7 = 384
3*2^8 = 768
3*2^9 = 1536
3*2^10 = 3072
3*2^11 = 3600 (6144)
3*2^12 = 3600 (12288)
3*2^13 = 3600 (24576)
Which gives the typical 2^n curve (plot omitted). You would get a steeper increase with other backoffFactors, for example a backoff factor of 10 or 20 (plots omitted).
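Here is a small Python sketch (mirroring the capping behavior described above, not Marathon's actual Scala code) that reproduces the sequence:
def launch_delays(backoff_seconds=3, backoff_factor=2,
                  max_launch_delay=3600, retries=14):
    # Exponential backoff, capped at maxLaunchDelaySeconds.
    return [min(backoff_seconds * backoff_factor ** k, max_launch_delay)
            for k in range(retries)]

print(launch_delays())
# [3, 6, 12, 24, 48, 96, 192, 384, 768, 1536, 3072, 3600, 3600, 3600]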
Additionally I saw a re-work of this topic, code review currently open here: https://phabricator.mesosphere.com/D1007
What do you think?
Thanks
Johannes

Instrument for counting the number of method calls on iPhone

The Time Profiler instrument can measure the amount of time spent in certain methods. Is there a similar instrument that measures the number of times a method is called?
DTrace can do this, but only in the iPhone Simulator (it's supported by Snow Leopard, but not yet by iOS). I have two writeups about this technology on MacResearch here and here where I walk through some case studies of using DTrace to look for specific methods and when they are called.
For example, I created the following DTrace script to measure the number of times methods were called on classes with the CP prefix, as well as total up the time spent in those methods:
#pragma D option quiet
#pragma D option aggsortrev

dtrace:::BEGIN
{
    printf("Sampling Core Plot methods ... Hit Ctrl-C to end.\n");
    starttime = timestamp;
}

objc$target:CP*::entry
{
    starttimeformethod[probemod,probefunc] = timestamp;
    methodhasenteredatleastonce[probemod,probefunc] = 1;
}

objc$target:CP*::return
/methodhasenteredatleastonce[probemod,probefunc] == 1/
{
    this->executiontime = (timestamp - starttimeformethod[probemod,probefunc]) / 1000;
    @overallexecutions[probemod,probefunc] = count();
    @overallexecutiontime[probemod,probefunc] = sum(this->executiontime);
    @averageexecutiontime[probemod,probefunc] = avg(this->executiontime);
}

dtrace:::END
{
    milliseconds = (timestamp - starttime) / 1000000;
    normalize(@overallexecutiontime, 1000);
    printf("Ran for %u ms\n", milliseconds);
    printf("%30s %30s %20s %20s %20s\n", "Class", "Method", "Total CPU time (ms)", "Executions", "Average CPU time (us)");
    printa("%30s %30s %20@u %20@u %20@u\n", @overallexecutiontime, @overallexecutions, @averageexecutiontime);
}
This generates the following nicely formatted output:
Class          Method                              Total CPU time (ms)   Executions   Average CPU time (us)
CPLayer        -drawInContext:                                    6995          352                   19874
CPPlot         -drawInContext:                                    5312           88                   60374
CPScatterPlot  -renderAsVectorInContext:                          4332           44                   98455
CPXYPlotSpace  -viewPointForPlotPoint:                            3208         4576                     701
CPAxis         -layoutSublayers                                   2050           44                   46595
CPXYPlotSpace  -viewCoordinateForViewLength:linearPlotRange:plotCoordinateValue:  1870  9152
...
While you can create and run DTrace scripts from the command line, probably your best bet would be to create a custom instrument in Instruments and fill in the appropriate D code within that instrument. You can then easily run that against your application in the Simulator.
Again, this won't work on the device, but if you just want statistics on the number of times something is called, and not the duration it runs for, this might do the job.