Could some one please elaborate on "RTL simulation is faster than delta-cycle simulation but can not be used in all situations"?
I don't know what Delta cycle simulation
In a Verilog simulation delta-cycles are those used by the simulator to calculate the next value.
When entering a combinatorial section the simulator will use many delta cycles to resolve the dependencies required for the answer. If you have combinatorial loops not broken up with a flip-flop the simulator may get stuck constantly reiterating the loop trying to resolve the dependencies, which is impossible due to the loop. If you get a simulation which just hangs this is often the cause.
The non-blocking assignment (<=) makes use of delta-cycles, by resolving the right hand side values (potentially many delta-cycles), then a delta cycle later assign these to the left hand side.
a<=b;
b<=a;
In the simple case above b and a are copied to temporary location (think next_a next_b). Then a delta cycle later a is assigned to next_a and b to next_b.
There are other ZERO-Time constructs which do not use a delta cycles often used in Test benches for modelling. I have no experience of these hopefully another answer could describe there use.
Related
I'am trying to run real-life experiment for an application in Raspberry Pi, and I need to estimate or predicate the execution time for the application. in other words, before execution/ run the task i need to know how long (roughly) this task/app going to take to get the result back. I have identified several techniques and works that have been done before. but most of it are simulation work which doesn't work with real-life experiment. does anyone can help me with any idea or technique (No code). thank you in advance
Estimating the execution time of an application or function is going to be difficult in any context. You might want to look up the halting problem for some insight to why. It's impossible to determine whether a given program will finish executing, and therefore, you can't really tell how long a given program will take to finish executing.
For general computing, varying hardware capabilities of any given system will always have an effect on the execution time of a program. Raspberry Pi is a little more discrete than that, and therefore more predictable in that sense, but those specifications will not always be consistent across its various versions. That adds to the complexity of determining a run time.
Practically, the most reliable way to determine how long a process will take would be to just run it and time it. If you absolutely need predicted times for something, you might be able to do a bit of a composite estimate - time the smaller chunks of the application separately, and then use those to determine how long you expect the application as a whole to run. For most situations, though, it would be much faster to just run the program itself rather than trying to predict it.
Store the time before and after the execution ? Then you could know the execution time
I am hoping somebody here will be able to help me out with my small issue with one of the Simulink/Matlab code. It is quite similar to the problem that I’ve discussed earlier, but a little bit more complicated and now it is more a Simulink issue, rather than a Matlab one.
So I have a turbine which speed is controlled by the gate’s opening, hence the control voltage. By controlling the gate opening I am accelerating the turbine and at some point in time, I need to introduce a saturation effect (since I am testing the code now, it will be done an external signal). This effect won’t change the control voltage, but it affects other components of the system, hence at the same control voltage, the turbine’s speed will go up. But at the same time, I need to keep the speed at the same value as it was before the saturation effect (let’s say it was 320 rpm). To do so I need to decrease the control voltage and should keep doing it until I reach the speed as it was before. There is no need to do it instantly (this approach will be later introduced in hardware), but it will be a nice thing to check the algorithm in these synthetic tests.
In terms of the model, I was planning to use a while loop with the speed requirement “if speed > 320” again, now just to simplify things. To decrease the control voltage I was planning to subtract from the original 50 (% opening) - 0.25 (u2) at first and after that increasing this value by 0.25 until I decrease the speed below 320. I can’t know the exact opening when this requirement will be satisfied, hence I need some kind of algorithm to “track” this voltage.
So it should be something like this:
u2 = 0;
While speed > 320
u2 = u2+0.25
End
u2 is initially zero since we have a predefined initial control voltage. And obviously, when we reach the motor’s speed below 320, I need to keep the latest value of the u2 (and control voltage).
Overall, it is a small code and should be done in Simulink (don’t want to introduce any other Fcn function into the model). I’ve never used while and if blocks in Simulink, but so far I came up with this system. It’s a simplified version of my model, but the control principle is the same.
We are getting the motor speed of 350, compared with 320 (the speed before “saturation), and if our speed after saturation is higher, we need to reduce the control voltage. To trigger the while loop block I’ve decided to use a simple switch. The while block meanwhile is:
Definitely not the best implementation but I was trying a lot of different combinations and without any real success. I am always getting the same error:
Was trying to use a step signal instead of the constant “7” – to model acceleration of the motor, and was getting the same error at the moment of acceleration above 320 threshold. So looks like the approach is almost right but mathematically it fails to find the most suitable solution. I’ve tried to implement a transport delay in the memory part of the while subsystem but was getting errors during compilation all the time.
Are there any obvious (and not so) mistakes? Or maybe from the beginning, I should have chosen another approach… I really hope that somebody will be able to help. Thank you in advance and have a great day.
I do not think that you have used While block correctly.
This is what I have done, I used a "Matlab function" block instead of "While" block as follows,
The function in Matlab function is
function u2=fcn(speed,u2d)
if speed>320
u2=u2d+0.25;
else
u2=u2d;
end
And the results I have got, Scope 1
Scope
Edit
As you prefer a function free model, the following may do the same.
I have noticed that the first time I run a script, it takes considerably more time than the second and third time1. The "warm-up" is mentioned in this question without an explanation.
Why does the code run faster after it is "warmed up"?
I don't clear all between calls2, but the input parameters change for every function call. Does anyone know why this is?
1. I have my license locally, so it's not a problem related to license checking.
2. Actually, the behavior doesn't change if I clear all.
One reason why it would run faster after the first time is that many things are initialized once, and their results are cached and reused the next time. For example in the M-side, variables can be defined as persistent in functions that can be locked. This can also occur on the MEX-side of things.
In addition many dependencies are loaded after the first time and remain so in memory to be re-used. This include M-functions, OOP classes, Java classes, MEX-functions, and so on. This applies to both builtin and user-defined ones.
For example issue the following command before and after running your script for the first run, then compare:
[M,X,C] = inmem('-completenames')
Note that clear all does not necessarily clear all of the above, not to mention locked functions...
Finally let us not forget the role of the accelerator. Instead of interpreting the M-code every time a function is invoked, it gets compiled into machine code instructions during runtime. JIT compilation occurs only for the first invocation, so ideally the efficiency of running object code the following times will overcome the overhead of re-interpreting the program every time it runs.
Matlab is interpreted. If you don't warm up the code, you will be losing a lot of time due to interpretation instead of the actual algorithm. This can skew results of timings significantly.
Running the code at least once will enable Matlab to actually compile appropriate code segments.
Besides Matlab-specific reasons like JIT-compilation, modern CPUs have large caches, branch predictors, and so on. Warming these up is an issue for benchmarking even in assembly language.
Also, more importantly, modern CPUs usually idle at low clock speed, and only jump to full speed after several milliseconds of sustained load.
Intel's Turbo feature gets even more funky: when power and thermal limits allow, the CPU can run faster than its sustainable max frequency. So the first ~20 seconds to 1 minute of your benchmark may run faster than the rest of it, if you aren't careful to control for these factors.
Another issue not mensioned by Amro and Marc is memory (pre)allocation.
If your script does not pre-allocate its memory it's first run would be very slow due to memory allocation. Once it completed its first iteration all memory is allocated, so consecutive invokations of the script would be more efficient.
An illustrative example
for ii = 1:1000
vec(ii) = ii; %// vec grows inside loop the first time this code is executed only
end
I am working on the development of an Iterative Learning Controller for a simple transfer function.
The iterations are controlled by the external matlab loop.
But the error e(k) (k is trial number) is not updating ... as the trials increases.
Please detect the error I've commited.
Thanks and Regards.
You might have solved the problem. But as the question is still open, I would like to add something here.
First of all, you might want to check the usage of "memory" block. "The Memory block holds and delays its input by one major integration time step." The reason why the error wasn't updating is that the output of your plant produced was the same in each iteration(you defined external loop). The memory block only delayed one step of your U(K), not the whole iteration.
You might want to store the error of each iteration to workspace, and use it for the next iteration.
The memory should be a vector with a lenght of the single iteration. Not just single value. Delay block can store multiple past samples.
This guy did probably what you were looking for: https://github.com/arthurrichards77/iterative-learning-control
I'm trying to optimize the performance of my code, but I'm not familiar with xcode's debuggers or debuggers in general. Is it possible to track the execution time and frequency of calls being made at runtime?
Imagine a chain of events with some recursive calls over a fraction of a second. What's the best way to track where the CPU spends most of its time?
Many thanks.
Edit: Maybe this is better asked by saying, how do I use the xcode debug tools to do a stack trace?
You want to use the built-in performance tools called 'Instruments', check out Apples guide to Instruments. Specifically you probably want the System Instruments. There's also the Tuning Guide which could be useful to you and Shark.
Imagine a chain of events with some
recursive calls over a fraction of a
second. What's the best way to track
where the CPU spends most of its time?
Short version of previous answer.
Learn an IDE or debugger. Make sure it has a "pause" button or you can interrupt it when your program is running and taking too long.
If your code runs too quickly to be manually paused, wrap a temporary loop of 10 to 1000 times around it.
When you pause it, make a copy of the call stack, into some text editor. Repeat several times.
Your answer will be in those stacks. If the CPU is spending most of its time in a statement, that statement will be at the bottom of most of the stack samples. If there is some function call that causes most of the time to be used, that function call will be on most of the stacks. It doesn't matter if it's recursive - that just means it shows up more than once on a stack.
Don't think about measuring microseconds, or counting calls. Think about "percent of time active". That's what stack samples tell you, and that's roughly what you'll save if you fix it.
It's that simple.
BTW, when you fix that problem, you will get a speedup factor. Then, other issues in your code will be magnified by that factor, so they will be easier to find. This way, you can keep going until you've squeezed every cycle out of it.
The first thing I tell people is to recognize the difference between
1) timing routines and counting how many times they are called, and
2) finding code that you can fruitfully optimize.
For (1) there are instrumenting profilers.
To be really successful at (2) you need a rare type of profiler.
You need a sampling profiler that
samples the entire call stack, not just the program counter
samples at random wall clock times, not just CPU, so as to capture possible I/O problems
samples when you want it to (not when waiting for user input)
for output, gives you, for each line of code that appears on stack samples, the percent of samples containing that line. That is a direct measure of the total time that could be saved if that line were not there.
(I actually do it by hand, interrupting the program under the debugger.)
Don't get sidetracked by problems you don't have, such as
accuracy of measurement. If a line of code appears on 30% of call stack samples, it's actual cost could be anywhere in a range around 30%. If you can find a way to eliminate it or invoke it a lot less, you will save what it costs, even if you don't know in advance exactly what its cost is.
efficiency of sampling. Since you don't need accuracy of time measurement, you don't need a large number of samples. Even if you get a large number of samples, they don't skew the results significantly, because they don't fail to spot the costly lines of code.
call graphs. They make nice graphics, but are not what you need to know. An arc on a call graph corresponds to a line of code in the best case, usually multiple lines, so knowing cost of an arc only tells the cost of a line in the best case. Call graphs concentrate on functions, when what you need to find is lines of code. Call graphs get wrapped up in the issue of recursion, which is irrelevant.
It's important to understand what to expect. Many programmers, using traditional profilers, can get a 20% improvement, consider that terrific, count the profiler a winner, and stop there. Others, working with large programs, can often get speedup factors of 20 times.
This is done by fixing a series of problems, each one giving a multiplicative speedup factor. As soon as the profiler fails to find the next problem, the process stops. That's why "good enough" isn't good enough.
Here is a brief explanation of the method.