Identification and Diagnosis of Efficiency Problems in HPC - hpc

There are many articles and books on problems in HPC, but I feel like I am missing on the diagnose of scaling and efficiency issues. For example, I am reading a books called "Introduction to High Performance Computing for Scientists and Engineers" by Horst Simon where he discusses a wide variety of problems and solutions such as,
Cache misses
Load Imbalance
Poor Vectorization of code
etc.
But if I were handed a piece of code even remotely complex (ie more than nested for-loops) I would have a very hard time discovering what the bottleneck was or proving that the code had reached the limits of a given piece of hardware.
In analog with medicine, I can currently list out a bunch of possible diseases that make people "less efficient", but this is hardly useful. I need to figure out how to diagnose my "patients" and then prescribe a "cure".
Could I please be referred to literature that teaches how to diagnosis of HPC problems (efficiency, scalability, etc)? Almost a step-by-step guide. Like put stethoscope of chest, then listen, ...

This question is two questions: one is how do I find bottlenecks, the other is how do I know the limits of my hardware and if I am at them.
The first is that you must run the code inside a profiler. Any profiler with a "top down" view of your code according to time is showing you the bottlenecks.
Try the profilers suggested here (answer applies to c++ and Fortran): Good profiler for Fortran and MPI - both Allinea MAP and HPC Toolkit have the sort of presentation you need. (NB I work for Allinea).
The second question is the most "open" part. That one needs your book or optimization guide. However, a good start is to see how much vectorization you have (Some of the profiler examples can show this) as this is where the most compute power can be found.
The bigger question is what the theoretical limit of your problem is - eg. Some problems are not amenable to vectorization, some have memory access needs that can never be cache friendly, some have communication needs that are simple whereas others require costly regular global updates.

Related

Factors that Impact Translation Time

I have run across issues in developing models where the translation time (simulates quickly but takes far too long to translate) has become a serious issue and could use some insight so I can look into resolving this.
So the question is:
What are some of the primary factors that impact the translation time of a model and ideas to address the issue?
For example, things that may have an impact:
for loops vs a vectorized method - a basic model testing this didn't seem to impact anything
using input variables vs parameters
impact of annotations (e.g., Evaluate=true)
or tough luck, this is tool dependent (Dymola, OMEdit, etc.) :(
use of many connect() - this seems to be a factor (perhaps primary) as it forces translater to do all the heavy lifting
Any insight is greatly appreciated.
Clearly the answer to this question if naturally open ended. There are many things to consider when computation times may be a factor.
For distributed models (e.g., finite difference) the use of simple models and then using connect equations to link them in the appropriate order is not the best way to produce the models. Experience has shown that this method significantly increases the translation time to unbearable lengths. It is better to create distributed models in the same approach that is used the MSL Dynamic pipe (not exactly like it but similar).
Changing the approach as described is significantly faster in translational time (orders of magnitude for larger models, >~100,000 equations) than using connect statements as the number of distributed elements increases to larger numbers. This was tested using Dymola 2017 and 2017FD01.
Some related materials pointed out by others that may be useful for more information have been included below:
https://modelica.org/events/modelica2011/Proceedings/pages/papers/07_1_ID_183_a_fv.pdf
Scalable Test Suite : https://dx.doi.org/10.3384/ecp15118459

For a Single Cycle CPU How Much Energy Required For Execution Of ADD Command

The question is obvious like specified in the title. I wonder this. Any expert can help?
OK, this is was going to be a long answer, so long that I may write an article about it instead. Strangely enough, I've been working on experiments that are closely related to your question -- determining performance per watt for a modern processor. As Paul and Sneftel indicated, it's not really possible with any real architecture today. You can probably compute this if you are looking at only the execution of that instruction given a certain silicon technology and a certain ALU design through calculating gate leakage and switching currents, voltages, etc. But that isn't a useful value because there is something always going on (from a HW perspective) in any processor newer than an 8086, and instructions haven't been executed in isolation since a pipeline first came into being.
Today, we have multi-function ALUs, out-of-order execution, multiple pipelines, hyperthreading, branch prediction, memory hierarchies, etc. What does this have to do with the execution of one ADD command? The energy used to execute one ADD command is different from the execution of multiple ADD commands. And if you wrap a program around it, then it gets really complicated.
SORT-OF-AN-ANSWER:
So let's look at what you can do.
Statistically measure running a given add over and over again. Remember that there are many different types of adds such as integer adds, floating-point, double precision, adds with carries, and even simultaneous adds (SIMD) to name a few. Limits: OSs and other apps are always there, though you may be able to run on bare metal if you know how; varies with different hardware, silicon technologies, architecture, etc; probably not useful because it is so far from reality that it means little; limits of measurement equipment (using interprocessor PMUs, from the wall meters, interposer socket, etc); memory hierarchy; and more
Statistically measuring an integer/floating-point/double -based workload kernel. This is beginning to have some meaning because it means something to the community. Limits: Still not real; still varies with architecture, silicon technology, hardware, etc; measuring equipment limits; etc
Statistically measuring a real application. Limits: same as above but it at least means something to the community; power states come into play during periods of idle; potentially cluster issues come into play.
When I say "Limits", that just means you need to well define the constraints of your answer / experiment, not that it isn't useful.
SUMMARY: it is possible to come up with a value for one add but it doesn't really mean anything anymore. A value that means anything is way more complicated but is useful and requires a lot of work to find.
By the way, I do think it is a good and important question -- in part because it is so deceptively simple.

Matlab vs Aforge vs OpenCV

I am about to start a project in visual image-processing and have no had experience with Matlab, Aforge, OpenCV and was wondering if anyone had any experiences with these different software packages.
I was also wondering which of the three packages were most efficient I assume OpenCV but has anyone had any experience?
Thanks
Jamie.
The question you need to ask yourself is which is more important - your time or the computer's time. If your task is really simple, you may be able to code it up in MATLAB and have it work right off the bat. MATLAB is by far the easiest for development - a scripted language with built-in memory management, a huge array of provided functions, and a great interface for displaying and manipulating data while debugging.
On the other hand, MATLAB is at least an order of magnitude slower than compiled openCV code for many tasks. This is especially true if you use the intel performance primitives libraries.
If you know how to code in MATLAB, I would suggest writing and debugging your algorithms in that language, then porting them to c/c++ with openCV for speed. If there are only a couple of simple functions that you need to speed up, you can call c code from MATLAB, but it's hard to get this working right the first few times you try it, so you're probably better off just rewriting your finished code entirely in c/c++
First, please elaborate about your project's needs. It has the biggest impact on the choice, in addition to other factors - your general programming knowledge (If you haven't dealt with dot net but just with C++, AForge is not a good choice, for example).
Generally,
Both AForge and OpenCV has a built-in interface to .Net, and OpenCV also with C++, python, and more. Matlab might be more efficient, but if you don't have any experience with it - you should also learn its syntax. Take it into consideration.
Matlab probably has the largest variety of functions, but it is more complicated than the other projects. OpenCV and AForge themselves have some differences - see them described in this StackOverflow question/ answers.
I worked last year in two similar projects with cars on the highway. Afaik, Matlab allows to process only one picture frame at a time (surely you could elaborate an algorithm to compute a stream) but using Simulink you can process the stream directly.
On the other hand, i found AForge a lot friendlier and easier to use since you can easily adjust the processing parameters from a GUI (not so fast/easy) to do in Matlab/simulink.
I'd go for Aforge.Net. It's also fast enough if you're worrying about processing speed. (using 640x480)
If you are asking about using one of these in .net,easily you can get info by this:
1-matlab mostly used in simulation of projects not the End-prototype project; my numer : 30;
2-aforge (as I'v used in many project) if you do not need the circular process like capturing image, or recognition of something in images or ... you'll find it very good, cause it is easy to use but useful for single processes; my number : 50
3-opencv very good at speed and useful for circular processes, for example you can capture images from a webcam and Instantly cartoonize it without any delay, But not easy-to-use as aforge. I like it anyway cause of its speed and MANY functions it gives us mostly anything we need in programming; my number : 80
Dr.Taha - Tahasoft.net

Your experiences with Matlab/F#/R for data analysis and modeling algorithms

I've been using F# for a while now to model algorithms before coding them in C++, and also using it afterwards to check the results of the C++ code, and also against real-world recorded data.
For the modeling side of things, it's very handy, but for the 'data mashup' kind of stuff, pulling in data from CSV and other sources, generating statistics, drawing charts etc., my colleague teases me no end ("why are you coding that yourself? It's built in to MatLab").
And I have another colleague who swears by R, which also has charting stuff 'built-in'.
I know that MatLab, R and F# are not strictly comparable, so I'm not asking for a 'feature comparison shoot out'. I just wondered what other people are using for these kind of pre- and post-analysis scenarios, and how happy they are with it.
(If there's anyone out there working on wrapping Microsoft Charts into something F#-friendly, let me know, I'd be happy to participate...)
(Note: answers to this question will be subjective, but based on experience, please)
I have very little experience with F#, but regarding C++/Matlab/R: If the speed of your program's execution is the most important, use C++. If speed of implementation is the most important, use Matlab or R. This is true for a number of reasons, not the least of which is their massive libraries of math/stats packages.
Both Matlab and R can be sped up through parallelism: so generally, I think that speed and quality of implementation should be a bigger concern. That's where the real "value" of programming is taking place, in the design of the application. It's not a minor proposition if you can write 3 or 4 good R programs in the same time it takes you to write 1 good C++ program.
Regarding F#: so far as it is part of Microsoft's framework, it must have a lot to offer. If you're developing in Visual Studio or working on a big .Net project (for instance), it might make sense to use F#. On the other hand, you can call both Matlab and R from .Net applications, so I would probably argue that their libraries should be a bigger concern. For instance, see this article as an example for R and the Matlab Builder.
Long story short: comparing F# and Matlab/R isn't a good comparison. F# is a general purpose programming language, while Matlab/R can be viewed as massive mathematical/data analysis toolkits. Some people call Matlab or R from F# in order to take advantage of each language's benefits (e.g. see this discussion, this article on Matlab/F#, or this article on R/F#).
So far as charting is concerned: R is extremely strong on this front. Have a look at the graphics view on CRAN and this series of posts on the LearnR blog about Lattice and ggplot2.
I've worked a bit with matlab and python/pylab for these purposes. What these tools have 'built-in' is a programming environment, a shell, and gui tools designed for quickly looking at data from a variety of sources.
In a few commands, you can go from having a csv file to interactive plots on the screen, then to an image export in just about any format. It takes a minute or two to go from data to visualization once you have the hang of it. I would imagine this is uncommon in the C++ world (although I have seen some professors with pretty impressive work-flows).
I've tried R, but I can't say much useful about it. It seems to offer about the same set of features, but it may be troublesome to Google for support.
If you are spending more than a couple minutes getting from data to plot using your current method, it's definitely worth learning one of these environments. The best choice depends on your colleagues, your work environment, experience, and your budget.
This is a reasonable close double to the previous question on suitable functional language for scientific/statistical computing so you may want to peruse the long and detailed answers there.
Answers depends, as so often, on your experience and prior language training. I very much prefer R for data munging / modeling / visualization.
I use R because on the one hand it has everything built in and on the other hand you can still manipulate almost everything or start from scratch. Nevertheless, R is rather slow for heavy calculations (although I do all my Monte Carlo simulations in it).
I would say that Matlab is best for the availability of mathematical functionalities in general, R is best for data input/manipulation/visualisation/analysis/etc., and C++ for high-speed subroutines. You can by the way easily integrate C++ (or C, fortran, ...) code in R. Why not read and manipulate input data in R, apply the models in C++, and analyse/visualize output back in R?
I always prototype my models in MATLAB. If my prototype is fast enough, I refactor and it's done. If not, I go back and implement certain functions in C to be called by MATLAB. This requires knowledge of a low level language, which I think is always going to be the case if you are doing anything that is technically challenging.
I'm intrigued with this Lisp flavor if it ever gets off the ground.

How do I estimate tasks using function points?

What are the steps to estimating using function points?
Is there a quick-reference guide of some sort out there?
I took a conference session on Function Point Analysis a few years back. There is a lot too it. You can check out the Free Function Point Training Manual online, the Fundamentals of Function Points, or I suspect you can get a book on it at a computer store.
You might also check out the International Function Point Users Group and see if they have some resources or a local meeting for you.
You really need to get some training on it. Check with IFPUG. You will unknowingly pick up some destructive bad habits if self-taught. It also helps to have an experienced FP analyst review some of your early attempts.
It's the kind of thing that appears overwhelmingly complex until you "get it" and then it's fairly quick to do. It improved my requirements analysis a lot too. I often spot contradictions and gaps when doing a count.
It isn't limited to BDUF Waterfall projects either. I spent three years using FP and Planning Poker as cross-checks on one another when contracting agile methods projects.
I was IFPUG-certified from 2002-2005 and am still using FP analysis. I've seen it misused a lot, and I think that's why it has such a bad reputation.
I recommend you take a look at COSMIC Function points. https://cosmic-sizing.org. COSMIC Function points are also an ISO standard for measuring software size. They are an evolved improvement over IFPUG.
You can quickly estimate size by counting the entries, exits, reads and writes.
Compared with the IFPUG manual, learning COSMIC is much easier, the free book below is all you need, and you can read it in a day.
Recommended reading: https://cosmic-sizing.org/publications/measurement-guide/