Does ProfileOptimization actually work? - multicore

One of the new performance enhancements in .NET 4.5 is the introduction of the 'Multicore JIT'.
See here for more details.
I have tried this, but it seems to have no effect on my application.
The reason I am interested is that my app (IronScheme) takes a good long time to start up when not NGEN'd, which implies a fair amount of JITing is involved at startup (1.4 sec vs 0.1 sec when NGEN'd).
I have followed the instructions on how to enable this, and I can see that a small (4-12 KB) profile file is created. But on subsequent startups, it seems to have absolutely no effect on the startup time. It is still 1.4 sec.
Has anyone actually seen (or made) this work in practice?
Also, are there any limitations on which code will be 'tracked'? E.g. assembly loading contexts, transient assemblies, etc. I ask because the created file never seems to grow, yet I am in fact generating a fair amount of code (in a transient assembly).
One bug that I did encounter is that SetProfileRoot does not seem to understand / as a path separator; make sure to use \ instead.
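For reference, the enabling code is roughly the following, called at the very top of Main (the profile directory and file name here are just illustrative placeholders):

    using System.Runtime;

    static class Program
    {
        static void Main(string[] args)
        {
            // Must run as early as possible, before most of the startup code gets JITted.
            // Note: use '\' as the path separator; '/' is not handled correctly here.
            ProfileOptimization.SetProfileRoot(@"C:\IronScheme\Profiles");
            ProfileOptimization.StartProfile("Startup.profile");

            // ... rest of the application startup ...
        }
    }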

The rule of thumb we use at Microsoft is that Multicore JIT gets you about half way towards NGEN startup performance. Thus if your app starts in 0.1 seconds with NGEN and 1.4 seconds without NGEN, we would expect Multicore JIT startup to take about 0.75 seconds.
That being said, we had to put some limitations in place to guarantee that program execution order is the same with and without MCJ. MCJ will sometimes pause the background thread waiting for modules to be loaded by the foreground thread, and will abort background compilation if there is an assembly resolve or module resolve event.
If you want to find out what's happening in your case, we have ETW (Event Tracing for Windows) instrumentation of the MCJ feature, and we will be releasing a version of PerfView soon which will be able to collect these events if you take a trace of your app startup.
Update: PerfView has been updated to be able to show background JIT information. Here are the steps to diagnosing with the latest version (1.2.2.0):
Collect a trace of your application startup using PerfView, via either Collect->Run or Collect->Collect from the main PerfView menu.
Assuming you used Collect->Run, put the name of your .exe in the Command text box, pick a filename (e.g. IronScheme.etl), select Background JIT from Advanced Options, and click Run Command.
Close your application and double click on the IronScheme.etl file that gets generated.
Double click on the JIT Stats view in the list underneath IronScheme.etl; you should see something like this in the view that pops up:
This process uses Background JIT compilation (System.Runtime.ProfileOptimize)
Methods Background JITTed : 2,951
Percent # Methods Background JITTed : 52.9%
MSec Background JITTing : 3,901
Percent Time JITTing is Background : 50.9%
Background JIT Thread : 11308
You can click on "View Raw Background Jit Diagnostics" to see all of the MCJ events in Excel. One question I forgot to ask: are you running this on a multicore machine or multicore VM? It is a common mistake to test out MCJ in a VM that only has a single logical processor.

Calling Activator.CreateInstance during startup seems to kill MCJ?
Or rather, it triggered an AssemblyResolve event, which seems to stop MCJ completely, and it never works after that. Maybe the MSDN docs should mention this.
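For anyone who wants to check whether a resolve event fires during their own startup, a minimal sketch is to hook the event and log it (the logging itself is just illustrative):

    // Early in Main(), before other startup work: log any assembly-resolve
    // events, since (per the answer above) such an event aborts the
    // background JIT compilation.
    AppDomain.CurrentDomain.AssemblyResolve += (sender, args) =>
    {
        Console.Error.WriteLine("AssemblyResolve fired for: " + args.Name);
        return null;   // this handler resolves nothing itself; normal resolution proceeds
    };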

Related

Can I open and run from multiple command line prompts in the same directory?

I want to open two command line prompts (I am using CMDer) from the same directory and run different commands at the same time.
Would those two commands interrupt each other?
One is for compiling a web application I am building (takes like 7 minutes to compile), and the other is to see the history of the commands I ran (this one should be done quickly).
Thank you!
Assuming that CMDer does nothing other than issue the same commands to the operating system as a standard cmd.exe console would, then the answer is a clear "Yes, they do interfere, but it depends" :D
Breakdown:
The first part "opening multiple consoles" is certainly possible. You can open up N console windows and in each of them switch to the same directory without any problems (Except maybe RAM restrictions).
The second part, "run commands which do or do not interfere", is the tricky part. If your idea is that a console window presents you with something like an isolated environment, where you can do things as you like and, if you close the window, everything is back to normal as if you had never touched anything (think of a virtual machine snapshot which is lost/reverted when closing the VM), then the answer is: this is not the case. There will be cross-console effects observable.
Think about deleting a file in one console window and then opening that file in a second console window: it would not be very intuitive if the file had not vanished in the second console window as well.
However, sometimes there are delays until changes to the file system become visible in another console window. It could be that you delete the file in one console, run a dir in another console on the directory where the file was sitting, and still see that file in the listing. But if you try to access it, the operating system will certainly quit with an error message of the kind "File not found".
Generally you should consider a console window to be a "View" on your system. If you do something in one window, the effect will be present in the other, because you changed the underlying system which exists only once (the system is the "Model" - as in "Model-View-Controller Design Pattern" you may have heard of).
An exception to this might be changes to environment variables. These are copied from the current state when a console window is started, and if you change the value of such a variable, the other console windows will stay unaffected.
So, in your scenario, if you let a build/compile operation run and during this process some files on your file system are created, read (locked), altered or deleted, then this would be a possible conflict if the other console window tries to access the same files. It would be a so-called "race condition", that is, it is non-deterministic which state of a file the second console window will see (or both windows, if the second one also changes files which the first one wants to work with).
If there is no interference on a file level (reading the same files is allowed, writing to the same file is not), then there should be no problem of letting both tasks run at the same time.
However, on a very detailed view, both processes do interfere in that they need the same limited, though usually ample, CPU and RAM resources of your system. This should not pose any problems with today's PC computing power, considering features like X separate cores, 16 GB of RAM, terabytes of hard drive storage or fast SSDs, and so on.
Unless there is a very demanding, highly parallelizable, high-priority task to consider, which eats up 98% of CPU time, for example; then there might be a considerable slowdown for other processes.
Normally, the operating system's scheduler does a good job on giving each user-process enough CPU time to finish as quickly as possible, while still presenting a responsive mouse cursor, playing some music in the background, allowing a Chrome running with more than 2 tabs ;) and uploading the newest telemetry data to some servers on the internet, all at the same time.
There are techniques which make it possible for a file to be available as snapshots taken at given timestamps. The keyword under Windows is "Shadow Copy". Without going into details, this technique allows, for example, defragmenting a file while it is being edited in some application, or a backup copying a (large) file while a delete operation runs on the same file. The operating system takes the access time into account when a process requests access to a file. So the OS could let the backup finish first before it schedules the delete operation to run, since the delete was started after the backup (in this example), or it could do even more sophisticated things to present a synchronized file system state, even while that state is actually changing at the moment.

exe stops execution after couple of hours

I have an exe which collects some information and, once the information is collected, saves it on the local machine. I have arranged the loop so that it will do the same task indefinitely.
But the exe stops execution after a couple of hours (approx. 5-6 hours); it neither crashes nor gives an exception.
I tried to find the reason in WinDbg, but I haven't seen any exception there.
Could anyone help me detect the problem?
Should I go for a Sysinternals tool or something else? Which debugging tool should I use?
A few things jump to mind that could be killing your program:
Out of memory condition
Stack overflow
Integer wrap in the loop counter (see the sketch below)
Programs that run forever are notoriously difficult to write correctly, because your memory management must be perfect. Without more information though, it's impossible to answer this question.
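As a purely hypothetical illustration of the integer-wrap suspect (sketched in C# for brevity; a counter in native code can wrap just as silently, and the helper name is made up):

    // Hypothetical sketch of the "integer wrap" suspect: the counter overflows
    // after about 2.1 billion iterations, the loop condition silently becomes
    // false, and the process exits without any crash or exception.
    int iteration = 0;
    while (iteration >= 0)            // intended to run "forever"
    {
        CollectAndSaveInformation();  // placeholder for the real work
        iteration++;                  // wraps to int.MinValue after int.MaxValue
    }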
If the executable is not yours and is native C/C++ code, you may want to capture first-chance exception dumps or monitor the exe using Windows debugging tools (such as DebugDiag or ADPlus).
Alternatively, if you have access to the developer of the executable, they may add more tracing to the exe (ETW or otherwise) to understand the possible failure points in the code.

Time Profiler - Wait for app launch

When launching my app from a custom URL scheme, when the app is not backgrounded, the launch sequence takes longer than I would like. I want to use Time Profiler to see which methods are taking so long. I know that for Run there is an option for "Wait for App Launch" so I can launch it using the URL, but I don't see that option under the profiling scheme. Does anyone know a way I can launch the app fresh, using the URL, and have Time Profiler running on launch?
"see what methods are taking so long"
Do you suppose some method (or a few) are sopping up a lot of CPU time in themselves or by calling other methods that do?
If so, it will be easy to fix, but it's Not Likely.
More likely the time is spent in I/O of one sort or another, and you need to figure out why, not where.
If you're able to start it under a debugger (say by using @ChrisTruman's recommendation), then all you need to do is interrupt it with Ctrl-C, Ctrl-Break, Escape, or whatever key combination interrupts it.
Do this during the time when, subjectively, it is slow.
Let's suppose the startup is taking three times longer than you think it should.
If that's so, that means two thirds of the time is spent doing the unnecessary I/O or whatever it is.
That means each time you interrupt it, the probability is 2/3 that you will catch it in the act of doing whatever causes the slowness.
So interrupt it a few times, and each time just read the stack, look at variables, etc.
You will see why it's being slow.
Don't even look for where - that will appear by itself.
That's the basic idea behind this technique.

How is multitasking implemented at the elementary level?

How is multitasking implemented at the basic level? To clarify my question, let's say we are given a C runtime with which to make an application that implements multitasking, and which can run only one task at a time on a single-core processor, say, by calling the main() function of this "multitasking" application.
How do standard OS kernels implement this? How does this change with multicore processors?
The OS sets an interrupt timer and lets the program run. Once the timer expires, control flow jumps to the OS's code for a context switch.
On the context switch, the OS saves the registers and supporting data of the current process and replaces them in the CPU with the data of the next process in the queue. Then it sets another interrupt timer and lets the next program run from where it was interrupted.
Also, a system call from the current process gives control to the OS, which decides if it is time for a context switch (e.g. the process is waiting for an I/O operation).
The mechanics are transparent to programs.
Run. Switch. Repeat. :)
I've not done much work with multi-core processors, so I will refrain from attempting to answer that part of the query. However, with uniprocessors, two strategies come to mind when it comes to multi-tasking.
If I remember correctly, the x86 supports hardware task switching. (I've had minimal experience with this type of multitasking.) From what I recall, when the processor detects the conditions for a task switch, it automatically saves all the registers of the outgoing task into its Task State Segment (x86) and loads all the registers from the incoming task's Task State Segment. There are various caveats and limitations with this approach, such as the 'busy bit' being set and only being able to switch back to a 'busy' task under special conditions. Personally, I do not find this method to be particularly useful.
The more common solution that I have seen is task switching in software. This can be broken down into cooperative task switching and pre-emptive task switching. If you are coding up a cooperative task switching strategy, a task switch only occurs when the task voluntarily gives up the processor. In this strategy, you only need to save and load the non-volatile registers. If a pre-emptive strategy is chosen, then a task switch can occur either voluntarily or involuntarily. In this case, all the registers must be saved and loaded. When coding either scenario, you have to take extra care that you do not corrupt your register contents and that you set up your stack correctly, so that when you return from the task-switching code you are at the right place on the stack of the incoming task.
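To make the cooperative case concrete, here is a toy single-threaded round-robin scheduler (sketched in C# purely for illustration; a real kernel would of course switch register state and stacks rather than iterators):

    using System;
    using System.Collections.Generic;

    // Toy cooperative scheduler: each "task" is an iterator that yields
    // whenever it is willing to give up the (single) processor.
    class CooperativeScheduler
    {
        private readonly Queue<IEnumerator<object>> ready = new Queue<IEnumerator<object>>();

        public void Add(IEnumerator<object> task) { ready.Enqueue(task); }

        public void Run()
        {
            while (ready.Count > 0)
            {
                var task = ready.Dequeue();   // pick the next runnable task
                if (task.MoveNext())          // run it until its next yield
                    ready.Enqueue(task);      // not finished: back of the queue
            }
        }
    }

    class Demo
    {
        static IEnumerator<object> Worker(string name, int steps)
        {
            for (int i = 0; i < steps; i++)
            {
                Console.WriteLine(name + ": step " + i);
                yield return null;            // voluntary task switch
            }
        }

        static void Main()
        {
            var scheduler = new CooperativeScheduler();
            scheduler.Add(Worker("A", 3));
            scheduler.Add(Worker("B", 2));
            scheduler.Run();                  // interleaves A and B on one thread
        }
    }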
Hope this helps.

anyway to see method execution times in Xcode?

I need to debug a certain ViewController I have and I can't seem to pinpoint exactly what is causing my lag time for the view to show.
Is there any debugging tool in Xcode that will show me how long my methods are taking to run, so I can at least find the right place to start?
Instruments has had a profiler built into it since iOS 4.0 (before that you used a stand-alone profiler tool called Shark).
Here's a quick little tutorial that will get you started: http://blancer.com/tutorials/flex/78335/apple-profiling-tools-shark-is-out-instruments-is-in/
If you don't know about Instruments, you should. It's how you know what's really going on inside your code while it runs.
Apart from Time Profiler, as suggested by Dan, you can also use the Sampler instrument, which stops a program at prescribed intervals and records stack trace information for each of the program's threads. You can use this information to determine where execution time is being spent in your program and improve your code to reduce running time.
The main difference between Sampler and Time Profiler:
The Sampler instrument operates on a single process, whereas Time Profiler can operate on a single process or on all processes.