NetLogo Profiler: How can the exclusive time be greater than the inclusive time?

I am trying to optimize my NetLogo model using the Profiler extension. I get the following output [excerpt]:
BEGIN PROFILING DUMP
Sorted by Exclusive Time
Name                    Calls  Incl T(ms)  Excl T(ms)  Excl/calls
COMPLETE-COOKING        38741       0.711    4480.369       0.116
GET-RECIPE              10701    2618.651    2618.651       0.245
GET-EQUIPMENT           38741    1204.293    1204.293       0.031
SELECT-RECIPE-AT-TICK     990    9533.460     470.269       0.475
GIVE-RECIPE-REVIEW      10701       4.294     449.523       0.042
COMPLETE-COOKING and GIVE-RECIPE-REVIEW have an exclusive time greater than their inclusive time.
How can this be? And if it is an error, how do I fix it?

Related

Different response times for same SQL statement in Oracle

I am running a simple select statement in my database (connected from a remote machine using PuTTY). I connect to SQL*Plus from PuTTY and execute the select statement, but I get different response times each time I run the query. Here are my observations.
1) I enabled the 10046 trace. The "elapsed_time" in the trace file is different for each execution of the query.
2) There is a huge difference between the elapsed time displayed on the console and the one in the trace file. From PuTTY, the elapsed time shows as approx. 2-3 secs, whereas the elapsed time logged in the trace is about 1 sec. What is the difference between the elapsed time on the PuTTY console and in the trace file?
PuTTY console output:
select * from WWSH_TEST.T01DATA_LINK
489043 rows selected.
Elapsed: 00:02:57.16
Tracefile output:
select *
from
WWSH_TEST.T01DATA_LINK
call      count    cpu  elapsed    disk   query  current    rows
-------  ------  -----  -------  ------  ------  -------  ------
Parse         1   0.00     0.00       0       0        0       0
Execute       1   0.00     0.00       0       0        0       0
Fetch     32604   0.38     2.32   10706   42576        0  489043
From the PuTTY console, the elapsed time shows as 2.57 secs, whereas in the trace file the elapsed time is 2.32. Why do we see this difference?
Moreover, when I run this same SQL statement repeatedly, I see different elapsed times in the trace files (ranging from 2.3 to 2.5 secs). What could be the reason for this difference in response time when there is no change in the database at all?
Database Version: 11.2.0.3.0
It looks like the time difference is the "client processing" - basically the time spent by sqlplus formatting the output.
Also, it looks like your array size is 15. Try running with a larger array size, such as 512 or 1024.
The mechanism to set the array size will vary from client to client. In sqlplus:
set arraysize 1024
The fetch time does not include network time, but if you use a level 8 trace
alter session set events '10046 trace name context forever, level 8';
this will give you wait events. The one to look for is SQL*Net message from client, which is basically the time the database is waiting to be asked to do something, such as fetch the next set of rows.
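Putting the two suggestions together, a quick test session might look like the sketch below (the table name is taken from the question; the exact trace file location depends on your server configuration):
set arraysize 1024
set timing on
-- level 8 adds wait events such as SQL*Net message from client to the trace
alter session set events '10046 trace name context forever, level 8';
select * from WWSH_TEST.T01DATA_LINK;
-- turn tracing off again when finished
alter session set events '10046 trace name context off';
Comparing the console timing with the trace for this run should show how much of the elapsed time is spent outside the database, in fetch round trips and client-side formatting.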

Why does referring to CURRENT_DATE in a complex Redshift view slow down the query significantly?

I have a complex Redshift view that filters results based on a variable date range, so I have to compare a date plus an interval to CURRENT_DATE. The more complex the view, the longer the query takes. Even simply SELECTing CURRENT_DATE within the view results in a significant slowdown.
SELECT CURRENT_DATE FROM complex_view; ==> Average time: ~ 800ms
SELECT CURRENT_DATE FROM less_complex_view; ==> Average time: ~ 400ms
SELECT CURRENT_DATE; ==> Average time: ~ 30ms
The query also never seems to get cached, unlike even the following:
SELECT * FROM complex_view; ==> Average time after 4 slow initial calls: ~30 ms
However, if I insert CURRENT_DATE into a table in the view, and compare using that instead, the query is fast.
SELECT curr_date_in_table FROM complex_view; ==> Average time: ~ 30ms
The issue with that is more complexity (a cron job to update a single row daily, when the task is honestly quite a basic one) and worse code maintainability. Why is it that simply referring to CURRENT_DATE in certain situations takes so much time? As with this very old related post, hardcoding the date also ensures a quick runtime, but I would like to automate the process.
I'm relatively new to using EXPLAINs, but there seemed to be no noticeable difference between querying using either the hardcoded current date, curr_date_in_table, or CURRENT_DATE. They all have some ridiculously high top-level cost regardless of runtime.
EDIT: Pavel and Jasen seem to be correct. I created an immutable UDF to return GETDATE() in SQL, and queries on the view ran nearly instantly. It only needs to be defined once, so automation and code maintainability are back on track! It is still very strange that this basic functionality needs to be redefined.
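For reference, the UDF described in the edit might look roughly like the sketch below (the function name is illustrative, and this assumes Redshift accepts GETDATE() inside a scalar SQL UDF, as the edit reports):
CREATE OR REPLACE FUNCTION f_curr_date()
RETURNS date
IMMUTABLE
AS $$
  SELECT GETDATE()::date
$$ LANGUAGE sql;
Inside the view, the date comparisons would then use f_curr_date() instead of CURRENT_DATE. Declaring the function IMMUTABLE is technically a lie, since its result changes daily, but that is exactly what lets the planner fold it to a constant at planning time instead of treating it as a stable function.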
CURRENT_DATE is a function, and it should usually be very fast (on my machine about 300 µs). I really don't know what the real reason for your slow query is - it is not possible to deduce it from the information given here. The fundamental piece of information is the execution plan of the slow query, and it is not here.
But I think there can be an optimization issue. Although CURRENT_DATE doesn't look like a function, it is one (a stable function). Stable functions are not evaluated at the planning/optimization stage, so when you use CURRENT_DATE in your query, the optimizer doesn't know its value and cannot be too aggressive.

PerfView's Metric/Interval greater than 1?

According to some PerfView material that I have seen/read online, I should dig further into CPU investigation if the Metric/msec is close to 1 (e.g. 0.92), but in my case this value is 10.62. What does this mean?
Totals
Metric: 218,443.0
Count: 218,443.0
First: 353.835
Last: 20,927.284
Last-First: 20,573.449
Metric/Interval: 10.62
TimeBucket: 686.0
TotalProcs 12
From Last-First, I can see that the profiling was done for 20,573 milliseconds, but the Metric is 218,443 samples. How come? Since PerfView takes one sample every 1 millisecond, I should not have seen this value greater than 20,573, right?
Is this count so high because the Metric here is from ALL 12 processors? If so, should I do the calculation as (218443/12)/20573, which gives me 0.88?

What happens if you count twice? [duplicate]

I run an action two times, and the second time takes very little time to run, so I suspect that Spark automatically caches some results, but I couldn't find any source confirming this.
I'm using Spark 1.4.
doc = sc.textFile('...')
doc_wc = doc.flatMap(lambda x: re.split('\W', x))\
.filter(lambda x: x != '') \
.map(lambda word: (word, 1)) \
.reduceByKey(lambda x,y: x+y)
%%time
doc_wc.take(5) # first time
# CPU times: user 10.7 ms, sys: 425 µs, total: 11.1 ms
# Wall time: 4.39 s
%%time
doc_wc.take(5) # second time
# CPU times: user 6.13 ms, sys: 276 µs, total: 6.41 ms
# Wall time: 151 ms
From the documentation:
Spark also automatically persists some intermediate data in shuffle operations (e.g. reduceByKey), even without users calling persist. This is done to avoid recomputing the entire input if a node fails during the shuffle. We still recommend users call persist on the resulting RDD if they plan to reuse it.
The underlying filesystem will also be caching access to the disk.

Web analytics schema with postgres

I am building a web analytics tool and use PostgreSQL as the database. I will not insert a row into Postgres for each user visit, but only aggregated data every 5 seconds:
time  country  browser  num_visits
====  =======  =======  ==========
   0  USA      Chrome           12
   0  USA      IE                7
   5  France   IE                5
As you can see, every 5 seconds I insert multiple rows (one per dimension combination).
In order to reduce the number of rows that need to be scanned by queries, I am thinking of having multiple tables with the above schema based on their resolution: 5SecondResolution, 30SecondResolution, 5MinResolution, ..., 1HourResolution. Now when the user asks about the last day, I will go to the hour-resolution table, which is smaller than the 5-second-resolution table (although I could have used that one too - it's just more rows to scan).
Now what if the hour-resolution table has data on hours 0, 1, 2, 3, ... but the user asks to see an hourly trend from 1:59 to 8:59? In order to get data for the 1:59-2:59 period, I could run multiple queries against the different resolution tables, so I get 1:59-2:00 from 1MinResolution, 2:00-2:30 from 30MinResolution, and so on. AFAIU, I have traded one query against a huge table (with many relevant rows to scan) for multiple queries against medium tables plus combining the results on the client side.
Does this sound like a good optimization?
Any other considerations on this?
Now what if the hour-resolution table has data on hours 0, 1, 2, 3, ... but the user asks to see an hourly trend from 1:59 to 8:59? In order to get data for the 1:59-2:59 period, I could run multiple queries against the different resolution tables, so I get 1:59-2:00 from 1MinResolution, 2:00-2:30 from 30MinResolution, and so on.
You can't do that if you want your results to be accurate. Imagine if they're asking for one hour resolution from 01:30 to 04:30. You're imagining that you'd get the first and last half hour from the 5 second (or 1 minute) res table, then the rest from the one hour table.
The problem is that the one-hour table is offset by half an hour, so the answers won't actually be correct; each hour will be from 2:00 to 3:00, etc, when the user wants 2:30 to 3:30. It's an even more serious problem as you move to coarser resolutions.
So: This is a perfectly reasonable optimisation technique, but only if you limit your users' search start precision to the resolution of the aggregated table. If they want one hour resolution, force them to pick 1:00, 2:00, etc and disallow setting minutes. If they want 5 min resolution, make them pick 1:00, 1:05, 1:10, ... and so on. You don't have to limit the end precision the same way, since an incomplete ending interval won't affect data prior to the end and can easily be marked as incomplete when displayed. "Current day to date", "Hour so far", etc.
If you limit the start precision you not only give them correct results but greatly simplify the query. If you limit the end precision too then your query is purely against the aggregated table, but if you want "to date" data it's easy enough to write something like:
SELECT blah, mytimestamp
FROM mydata_1hour
WHERE mytimestamp BETWEEN current_date + INTERVAL '1' HOUR AND current_date + INTERVAL '4' HOUR
UNION ALL
SELECT sum(blah), current_date + INTERVAL '5' HOUR
FROM mydata_5second
WHERE mytimestamp BETWEEN current_date + INTERVAL '4' HOUR AND current_date + INTERVAL '5' HOUR;
... or even use several levels of union to satisfy requests for coarser resolutions.
You could use inheritance/partitioning: one resolution master table and many hourly-resolution child tables (and, perhaps, many minute- and second-resolution child tables).
Thus you only have to select from the master table, and let the constraint on each child table decide which is which.
Of course, you have to add a trigger function to route inserts into the appropriate child tables, as sketched below.
Complexities in insert versus complexities in display.
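A minimal sketch of that inheritance approach for the hourly rollup (all table, column, and range names here are illustrative):
-- Master table for the hourly resolution; the children hold the actual rows.
CREATE TABLE hourly_visits (
    bucket_time timestamp NOT NULL,
    country     text      NOT NULL,
    browser     text      NOT NULL,
    num_visits  integer   NOT NULL
);
-- One child per month, with a CHECK constraint so the planner can skip
-- irrelevant children (constraint exclusion) when you query the master.
CREATE TABLE hourly_visits_2014_01 (
    CHECK (bucket_time >= '2014-01-01' AND bucket_time < '2014-02-01')
) INHERITS (hourly_visits);
CREATE TABLE hourly_visits_2014_02 (
    CHECK (bucket_time >= '2014-02-01' AND bucket_time < '2014-03-01')
) INHERITS (hourly_visits);
-- Trigger function that routes inserts on the master to the matching child.
CREATE OR REPLACE FUNCTION hourly_visits_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.bucket_time < '2014-02-01' THEN
        INSERT INTO hourly_visits_2014_01 VALUES (NEW.*);
    ELSE
        INSERT INTO hourly_visits_2014_02 VALUES (NEW.*);
    END IF;
    RETURN NULL;  -- the row has been redirected, so skip the master table
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER hourly_visits_insert_trg
    BEFORE INSERT ON hourly_visits
    FOR EACH ROW EXECUTE PROCEDURE hourly_visits_insert();
With constraint_exclusion enabled (the default "partition" setting covers inheritance children), a SELECT against hourly_visits only touches the children whose CHECK ranges overlap the WHERE clause.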