I'm completely confused about the Devel::NYTProf reports generated by nytprofhtml. I'm using an old version of NYTProf, 1.90. I know it's a very old version, but I have to use it for a number of reasons.
These HTML reports look something like this (when viewing the report for a particular *.pl file):
| Line | Stmts. | Time    | Avg.  | Code |
|------|--------|---------|-------|------|
| 42   | 6804   | 0.04506 | 7e-06 | `};` |
I have never seen reports from a newer version of nytprofhtml, so I'm not sure whether they look the same.
In my case, this line is the slowest part of the whole program (and it's not a small program).
So my question is: how can a statement like `};` be the slowest part of a program containing far more complex statements? I think I misunderstand what NYTProf reports.
If my question is confusing, definitions of each column in these reports would help a lot.
I'm especially interested in what Stmts. means. I can guess, but I don't want to guess!
Thanks in advance.
Stmts. is the number of times the statement was executed or, more precisely, the number of times execution moved from a statement associated with that line to whichever statement was executed next (the association between statements and lines is not always accurate).
Time is the sum of the time spent executing statements associated with that line.
Avg. is simply Time divided by Stmts.: in the example row above, 0.04506 / 6804 ≈ 7e-06.
These extracts from the current Devel::NYTProf documentation may help:
The statement profiler measures the time between entering one perl statement and entering the next. Whenever execution reaches a new statement, the time since entering the previous statement is calculated and added to the time associated with the line of the source file that the previous statement starts on. [...]
For example, given:
    while (<>) {
        ...
        1;
    }
After the first time around the loop, any further time spent evaluating the condition (waiting for input in this example) would be recorded as having been spent on the last statement executed in the loop.
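To make this concrete, here is a minimal sketch of a script you could profile yourself (the file and variable names are made up); with a statement profiler like the one in 1.90, the time spent waiting for input in the loop condition is charged to the last statement executed in the loop body:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run as: perl -d:NYTProf slurp.pl < some_input, then generate the
# report with nytprofhtml.
while (my $line = <STDIN>) {   # time spent waiting for input here...
    chomp $line;
    my $len = length $line;    # ...is charged to this, the last
}                              # statement executed in the loop body
```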
More recent versions of NYTProf, of which there are many, offer much more accurate timings for this situation by intercepting the appropriate internal loop opcodes, plus many other significant improvements.
I've stumbled on this page in the PostgreSQL wiki, where it's advised not to use BETWEEN with timestamps:
Why not?
BETWEEN uses a closed-interval comparison: the values of both ends of the specified range are included in the result.

This is a particular problem with queries of the form

    SELECT * FROM blah WHERE timestampcol BETWEEN '2018-06-01' AND '2018-06-08'

This will include results where the timestamp is exactly 2018-06-08 00:00:00.000000, but not timestamps later in that same day. So the query might seem to work, but as soon as you get an entry exactly on midnight, you'll end up double-counting it.
Can anyone explain how this "double-counting" can occur?
Often when using BETWEEN, you want to use multiple ranges (in separate queries) which cumulatively cover all the data.
If the next invocation of your query uses BETWEEN '2018-06-08' AND '2018-06-15', then exact midnight of 2018-06-08 will be included in both invocations. On the other hand, if the next one uses BETWEEN '2018-06-09' AND '2018-06-15', then all of 2018-06-08 except for exact midnight has been overlooked. There is no easy way (using BETWEEN) to construct ranges which cover every data point once and only once.
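The usual fix is a half-open interval: include the lower bound and exclude the upper bound. A sketch using the table and column names from the question; consecutive ranges then tile the timeline with no gaps and no overlaps:

```sql
-- Week 1: from 2018-06-01 00:00:00 up to, but not including, 2018-06-08 00:00:00
SELECT * FROM blah
WHERE timestampcol >= '2018-06-01' AND timestampcol < '2018-06-08';

-- Week 2: picks up exactly where week 1 left off, so midnight of
-- 2018-06-08 is counted once and only once
SELECT * FROM blah
WHERE timestampcol >= '2018-06-08' AND timestampcol < '2018-06-15';
```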
In NetLogo BehaviorSpace, if one of the runs throws an error, how can I skip that run and have NetLogo proceed with the next run?
Is it even possible?
From the docs,
If you do want spreadsheet output, note that if anything interrupts the experiment, such as a runtime error, running out of memory, or a crash or power outage, no spreadsheet results will be written. For long experiments, you may want to also enable table format as a precaution so that if something happens and you get no spreadsheet output you'll at least get partial table output.
So I'll assume this isn't possible, and the best way to fix this would be to handle the situation where your code has an error. Alternatively, you could wrap the error-prone code in the carefully command so the run ends cleanly instead of aborting the experiment, as sketched below.
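A minimal sketch, assuming your per-tick logic lives in a hypothetical procedure called run-step:

```netlogo
to go
  carefully [
    run-step              ;; the code that might throw a runtime error
  ] [
    print error-message   ;; log what went wrong instead of aborting
    stop                  ;; end this run so BehaviorSpace moves on to the next
  ]
end
```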
I have some Matlab code (from a journal paper) and I'm trying to re-simulate their data.
I started the code one week ago and it is taking a very long time to run; Matlab is still busy and using 50% of my CPU.
I was wondering whether the process has run into some error somewhere in the code. My questions are:
- When I see no errors, can I be sure that everything is fine with this running process, and can I wait until it is finished?
- Is there any way to check which part of the code is being run now (without stopping the execution)?
- Or should I stop the program and try something else?
Actually, I don't want to lose this week of computation, so if you think everything is fine, I would wait until the code stops.
(The authors of the paper didn't reply to my question and I don't know how long it should naturally take; they just mentioned it may take a long time to simulate the data.)
Unfortunately, there is little we can do for you.
When I see no errors, can I be sure that everything is fine with this running process?
That's pretty much the definition of an error. If no error is raised, then it means that the program is still running.
Is there any way to check which part of code is being run now (without stopping the execution)?
Unfortunately, no. For long-lasting executions like this, good development practice is to display some information from time to time to inform the end user of the execution status, for example:
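A minimal sketch (the step count and loop body are placeholders for your actual simulation):

```matlab
% Print a timestamped progress line every 1000 iterations so the end
% user can see that the simulation is still advancing.
nSteps = 100000;
for k = 1:nSteps
    % ... one step of the simulation ...
    if mod(k, 1000) == 0
        fprintf('%s: completed step %d of %d\n', datestr(now), k, nSteps);
    end
end
```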
However, if the program produces files along the way (for instance, at every step of an iterative simulation), you can check on your computer that the files are being produced, and the production rate will give you a rough idea of the total execution time.
For all your other questions, well, it's up to you to decide what to do (stop it or let it run). Be aware that execution time can differ significantly from one machine to another, so the time it took on the authors' machine may not be very informative to you.
In the future, I would advise you to react faster than within a week. When you launch code that has a long execution time and see that there is no display within the first hour, stop it, modify it so that it regularly displays information, and re-run it. It's better to lose one hour than one week.
Best,
I am working on some code in which speed and time are of high importance. I am using the profiler to find the bottleneck of my code, but I cannot understand some things in the profiler.
First, what do self time and total time mean?
Second, it has something called workspacefunc>local_min and workspacefunc>local_max; what are they?
Self time is the total time spent in a function, not including time spent in any child functions it calls. As an example, if you had a function which called a whole bunch of other functions, self time only covers the time spent in the main function itself, not in any of the other functions called from it.
Total time is the total time spent on a function (makes sense, right?). This includes the time spent in all of the child functions it calls. Also be aware that the profiler itself takes some time to execute, which is included in the results. One small thing as well: the total time can be zero for functions whose running time is inconsequential.
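Here is a minimal sketch to see the difference (the function names are made up). Save it as profdemo.m and run profile on; profdemo(); profile viewer. In the report, profdemo shows a large total time but a small self time, because almost all of its time is spent inside the child function inner:

```matlab
function profdemo()
    for k = 1:5
        inner();        % child call: counts toward profdemo's total time only
    end
end

function inner()
    A = rand(2000);     % the actual work: counted as inner's self time
    B = A * A;          %#ok<NASGU> suppress the unused-variable warning
end
```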
Reference: http://www.mathworks.com/help/matlab/matlab_prog/profiling-for-improving-performance.html
workspacefunc... there doesn't seem to be any documentation on it, but this is the help text that I get when checking what it does:
    workspacefunc    Support function for Workspace browser component.
The Workspace browser is a window that shows you all of the variables that are defined in your workspace. If I were to take an educated guess, the profiler does some analysis on your workspace variables, including the min and max of certain variables in your workspace. I can't really say much more, as there is absolutely no documentation on this, but it's safe to ignore. Simply focus on the functions that you are calling from your own code.
I wrote a simple powershell script that recursively walks a file tree and returns the paths of each node along with the time of its creation in tab-separated form, so that I can write it out to a text file and use it to do statistical analysis:
echo "PATH CREATEDATE"
get-childitem -recurse | foreach-object {
$filepath = $_.FullName
$datecreated = $_.CreationTime
echo "$filepath $datecreated"
}
Once I had done this, however, I noticed that the CreationDate times that get produced by the script are exactly one hour ahead of what Windows Explorer says when I look at the same attribute of the same files. Based on inspecting the rest of my dataset (which recorded surrounding events in a different format), it's clear that the results I get from explorer are the only ones that fit the overall narrative, which leads me to believe that there's something wrong with the Powershell script that makes it write out the incorrect time. Does anyone have a sense for why that might be?
Problem background:
I'm trying to correct for a problem in the design of some XML log files, which logged when the users started and stopped using an application when it was actually supposed to log how long it took the users to get through different stages of the workflow. I found a possible way to overcome this problem, by pulling date information from some backup files that the users sent along with the XML logs. The backups are generated by our end-user application at the exact moment when a user transitions between stages in the workflow, so I'm trying to bring information from those files' timestamps together with the contents of the original XML log to figure out what I wanted to know about the workflow steps.
Summary of points that have come out in comment discussion:

- The files are located on the same machine as the script I'm running (not a network store).
- Correcting for daylight savings and time zones has improved the data quality, but not for the specific issue posed in the original question.
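For reference, one way to check whether a discrepancy like this comes from local-time conversion is to record both the local and UTC creation times side by side (a minimal sketch; CreationTimeUtc is the UTC counterpart of CreationTime):

```powershell
Get-ChildItem -Recurse | ForEach-Object {
    [PSCustomObject]@{
        Path         = $_.FullName
        CreatedLocal = $_.CreationTime     # local time, DST-adjusted
        CreatedUtc   = $_.CreationTimeUtc  # unambiguous UTC timestamp
    }
}
```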
I never found the ultimate technical reason for the discrepancy between the timestamps from powershell vs. explorer, but I was able to correct for it by just subtracting an hour off all the timestamps I got from the powershell script.

After doing that, however, there was still a large amount of disagreement between the timestamps I got out of my XML log files and the ones I pulled from the filesystem using the powershell script. Reasoning that the end-users probably stayed in the same time zone when they were generating the files, I wrote a little algorithm to estimate the time zone of each user by evaluating the median amount of time between steps 1 and 2 in the workflow and between steps 2 and 3. If there was a problem with the user's time zone, one of those two timespans would be negative (since the time of the step 2 event was estimated and the times of the step 1 and 3 events were known from the XML logs). I then rounded the positive value down to the nearest hour and applied that number of hours as an offset to that user's step 2 times.

Overall, this took the amount of bad data in my dataset from 20% down to 0.01%, so I'm happy with the results.
In case anyone needs it, here's the code I used to make the hour offset in the timestamps (not powershell code, this was in a C# script that handled another part of data processing):
    // Parse the original timestamp, then rebuild it with the hour shifted back by one
    DateTime step2time = DateTime.Parse(LastModifyDate);
    // Hour - 1 moves the whole timestamp back one hour (sub-second precision is dropped)
    TimeSpan shenanigansCorrection = new TimeSpan(step2time.Hour - 1, step2time.Minute, step2time.Second);
    step2time = step2time.Date + shenanigansCorrection;
The reason for redefining the step2time variable is that DateTimes aren't mutable in .NET.