Can Matlab use multiple processors for plots and interactions in plots?

I am analyzing EMG data in my research lab. One of the steps is to calculate a continuous wavelet transformation of the dataset (size ~80000). Therefore, I use Matlab with the wavelet toolbox and "cwt" to plot a 3D-scalogram.
The calculation takes a lot of time and any interaction like 3D-rotation (which is very important to see different aspects of the data) is nearly impossible.
The resource monitor shows that only one core of my hexa-core processor is working. I use parallel computing for other calculations and haven't found any solution, or even a similar question to this one.
Is there anything I can do to activate multicore support for plots?

I'll hazard an educated guess and plump for the answer No to your question "Is there anything I can do to activate multicore support for plots?"
Matlab can certainly use multiple cores for its computations. Many of its intrinsic functions are already multi-threaded and will use any available cores without the programmer (or user) having to take any special measures. For your own computations you can use the Parallel Computing Toolbox.
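For example, here is a minimal sketch of farming independent chunks of work out to a pool of workers with parfor; analyse_segment and segments are hypothetical stand-ins for your own analysis:

    % Minimal sketch, assuming the Parallel Computing Toolbox is installed.
    % analyse_segment and segments are hypothetical stand-ins.
    parpool;                         % start a worker pool (matlabpool in older releases)
    results = cell(1, numel(segments));
    parfor k = 1:numel(segments)
        results{k} = analyse_segment(segments{k});   % each chunk runs on its own core
    end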
However, unless you have some very special graphics hardware (and if you do, why didn't you mention it?), your resource monitor shows you why only one processor is being used when you interact with your 3D plots -- somewhere between your computer's hardware and the screen there is a bottleneck through which the outputs of all those cores are squeezed into one stream of bits and bytes for presentation.
Your experience is consistent with that bottleneck being the Matlab visualisation routines. I think it is safe to conclude, from the evidence you present, that The MathWorks haven't multi-threaded the routines which compute the new screen positions of each element in a plot as you rotate it, or any of the other processing that goes on to turn the results of your analyses into a picture. Even if they did parallelise those routines, that would shift the bottleneck but not remove it.
To remove the bottleneck you would need a way for different Matlab threads to separately address different parts of your screen; I see no evidence that Matlab has that capability. Google will find you a ton of references to parallel rendering, but I see no sign that Matlab currently implements any aspect of it.
I'll just add, in response to your comment where you write "unfortunately I can't resample my data", that you should be mindful that Matlab's visualisation routines are resampling your data for presentation anyway, unless you are only visualising datasets with fewer samples than the number of pixels available. If you visualise a time series with 80000 samples on a display 2000 pixels wide, something has got to give.
You might get better graphics performance and superior understanding if you take charge of that resampling yourself.
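As a minimal sketch (assuming your signal is a vector x; the target point count is a made-up figure), even naive decimation before plotting cuts the renderer's workload enormously:

    % Minimal sketch: downsample to roughly the screen width before plotting,
    % so the renderer handles far fewer points. x and targetPts are stand-ins.
    targetPts = 2000;                    % about the horizontal pixels available
    step = ceil(numel(x) / targetPts);
    xSmall = x(1:step:end);              % naive decimation; consider min/max
                                         % binning if you must preserve peaks
    plot(xSmall);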

Matlab plotting performance is pretty bad; it is focused more on customizability than on performance. Using MEX to run native C++ code that plots the data with OpenGL will likely be much, much faster.

Related

How do I determine processor speed required for optical flow?

I'd like to use an optical flow system to get velocities from the surrounding environment. I've read papers about how optical flow works, but they don't cover details about optical sensors.
My question is: How do I determine how much computational power is required to perform optical flow analysis?
I'd like to use a low-power system (like microcontrollers), but I don't know what kind of camera I could use with such a system. I mean, could it be color or does it need to be B/W? Rolling shutter or global shutter? Which frame rate or number of pixels?
I'd like to specify the system myself but, without knowing how those camera attributes impact the processing load, I'm not sure where to start.
As Chuck already said in the comments, you first need to start with something. Optical-flow calculation really depends on what you are using it for and what you are trying to achieve. For real-time applications you might want to consider using faster processors (this is always true, though).
Continuing on to my answer:
Optical-flow calculation performance depends on a few main things:
The optical-flow method you choose (dense or sparse); you can read more about it here and here. Of course, you should take into account not only that sparse is faster than dense, but also that sparse might be less accurate in some cases. Again, this depends on what you're trying to achieve.
In addition, you will see that there are different optical-flow algorithms, and some might be faster than others. There are many algorithms, such as Lucas-Kanade, Horn-Schunck, TVL1 and Farneback.
Most optical-flow methods from libraries such as OpenCV give you the ability to change some parameters in order to play with the trade-off between accuracy and performance. See this, and also check OpenCV methods such as this and this, for example - note the different arguments.
The resolution of your image. A smaller image usually means a faster calculation.
A few things you might also want to consider:
If you are using a processor that has multiple cores, make sure that you are using all the cores in the optical-flow calculation. Some libraries may already do this for you, but in some cases you will need to do it yourself. Take a look at my question and answer in this post; it might give you some ideas and help you get started with such a case.
If you want more accurate optical-flow results you must use a global-shutter camera. Rolling-shutter cameras, such as most webcams, will give you an extra error you don't want.
You don't need a color image; if you have a grayscale camera it will be even better. If not, you will need to convert the image to grayscale (not B/W) for faster performance as well (see the sketch after this list).
Some libraries, such as OpenCV, have an option (in some cases) to run these algorithms on a GPU. If using a GPU is an option, you might want to consider this as well.
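To make the grayscale and resolution points concrete, here is a minimal sketch using Matlab's Computer Vision Toolbox (opticalFlowLK and estimateFlow); 'video.mp4' is a hypothetical input file:

    % Minimal sketch, assuming Matlab's Computer Vision Toolbox.
    % 'video.mp4' is a hypothetical input file.
    reader = VideoReader('video.mp4');
    opticFlow = opticalFlowLK('NoiseThreshold', 0.009);
    while hasFrame(reader)
        frame = readFrame(reader);
        gray  = rgb2gray(frame);            % grayscale, not B/W
        small = imresize(gray, [240 320]);  % smaller image => faster flow
        flow  = estimateFlow(opticFlow, small);
    end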
From my own experience, the main thing that gave me a boost in performance was changing my resolution from 640x480 to 320x240 and even 160x120. In my case it didn't really hurt the accuracy.
I used an Odroid U3 mini-PC with OpenCV's PyrLK algorithm and input frames at 320x240 resolution. After applying what's described here (splitting the image into 4 parts for parallel calculation), it worked pretty well (real-time).
The answer given by Sarid has some strong points, and many of them are shared by researchers around the world. My opinions are shared by anyone who has actually worked with these topics in a real-world setting... by real world, I mean implementing optical flow in drones, on mobile phones and in IP cameras that are not sitting in a protected office, and where other systems (such as humans) need to interact and be co-dependent.
First of all, depending on your problem, you may want to invest time in looking for ready-made solutions. Optical-flow sensors are readily available, cheap and robust (but usually not strong in accuracy). These are the kind of sensors you find in optical mice. They are low-power and easily interfaced with microcontrollers. Some have staggering sample rates of thousands of fps. They commonly have low spatial resolution, however, and (to emphasize) high robustness but low accuracy.
If instead you are looking for the kind of optical flow that can be used for shape from motion, pedestrian detection or video encoding, for example, then you are probably better off looking for something more advanced, and that's where Sarid's answer becomes relevant.
Since your question has been migrated from Robotics Stack Exchange, I am going to assume you are interested in applications close to machine control and human-machine interaction. In that case, the most important aspects are the ones usually most ignored by people working in the field of optical-flow estimation, namely:
Latency. If you have a human interfacing at the front-end, then the common term is "glass-to-glass latency". This is completely different from the fps of your system, which is connected to throughput. If you find that you are in a discussion with someone and they do not understand the difference between latency and fps, then they are not the expert you are interested in. For example, almost all researchers in computer vision who do GPU implementations of optical flow add massive latency by allowing for frame delays and inefficient memory handling (inefficient from the perspective of latency, but efficient in terms of throughput and hardware utilization). Consider the problem of controlling a drone, say making it self-stabilizing: it is better to receive a bad optical-flow estimate 10 ms earlier than a good one with 10 ms of extra delay... especially if the optical system does not give you any upper bound on the delay at any given time.
Algorithm stability. This is completely different from accuracy. Accuracy is what 99% of all research in optical flow has been obsessing about for the last 30 years. Stability is not at all something evaluated in the Middlebury benchmark, for example. Stability deals with whether small changes in your data are guaranteed to produce only small changes in the estimated optical flow. While some good work has been done in the community (most interestingly on robust statistics), in the end the final evaluation of any algorithm disregards stability. Consider the optical mouse as a good example. The first generations of optical mice had higher accuracy (the average error from the true motion was smaller) but lower stability (especially when you ran the mouse over "bad textures" or with rotational motions). Later generations of optical mice have worse accuracy but focus on stability, as that is the most important thing. You don't experience the mouse cursor jumping around as much as you did in the early days of the devices... but if you move the mouse on your mat, left and right repeatedly, you will see the cursor slowly drifting (i.e. low accuracy).
Heat. Any device that estimates high-accuracy optical flow will require lots of computation. When it comes to computations per watt, GPUs are not that good. In drones you may be able to get away with this, because it is a setting where you have active cooling as a by-product of the propulsion system. In the real world, you most often cannot assume active cooling or an unlimited power supply.
To conclude, it's a fascinating area, and I hope you have a great experience coding solutions.

Gaussian Mixture Models implementation in Real time for Object Detection

I am working on real-time video compression, which I am modeling in MATLAB. I need to later implement it on a DSP processor (5832 MIPS, 729 MHz).
Is it feasible to implement Gaussian mixture models on a DSP processor, or are there better algorithms for object detection?
Thanks in advance.
Any kind of iterative procedure, in particular mixture-model fitting, is quite time-consuming, and thus unlikely to be fast enough for real-time processing, unless you have only a handful of kernels to fit (we fit hundreds to a few thousands of spots, which takes several seconds per frame, despite our fast mixture-model implementation).
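For a rough feel of the cost, here is a minimal sketch (assuming the Statistics Toolbox; the two-cluster data is synthetic) that times a single EM-based mixture fit:

    % Minimal sketch: time one EM-based mixture fit on synthetic data.
    % Requires the Statistics Toolbox.
    X = [randn(500,2); randn(500,2) + 4];   % two well-separated clusters
    tic;
    gm = fitgmdist(X, 2);                   % iterative EM under the hood
    toc                                     % scale this against your frame budget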
Why don't you use a non-iterative spot detector, such as H-Dome? Or do you need sub-pixel positions of your features?

Matlab versus simulation products such as ANSYS and COMSOL

This may be the wrong place to ask this, but I can't find a better place on the SE network.
I've briefly worked with both Matlab and Ansys, and from what I have learnt/can gather, Matlab is a programming environment that has functions to perform common math, visualization and analysis operations. You primarily write programs in a textual fashion (.m files) or use Simulink to generate flow graphs (model-based development). Ansys, on the other hand, is primarily a simulation environment where quite a lot can be done simply with the GUI (3D models, physics domains, configuration, display settings), and you can add equations at various points in the simulation engine in order to modify the simulation flow.
Whatever I understand is cursory and only serves as an overview. Can anyone give me a suitable real-world comparison between Matlab and Ansys (or any other simulation product such as COMSOL) that would help us understand when to use which, and the weaknesses of each system?
I haven't used Ansys, but Ansys is often compared with Comsol, and I've used Comsol and Matlab for years.
Matlab:
A programming language and the environment that runs it, which means it can do anything that any other programming language can do. What are its highlights compared to other languages?
Hundreds of built-in functions to work with matrices. For example, in one project I needed to do simple matrix algebra (add, multiply and scale matrices), and also needed singular value decomposition. SVD is not something you could write in 50 lines of code, so I needed a ready-made library. At the time I used a library for Java, and wrote my own code for representing matrices and doing matrix algebra on them; that's a few hundred lines of code. Had I used Matlab, it would have been about ten lines of code, because all of it is there. I would have needed only to type help svd to find out how to use it. However, if you don't need any of that, stay away from Matlab at all costs! There are much better languages that are free.
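A minimal sketch of that point (the matrices are made-up examples):

    % Minimal sketch: matrix algebra and SVD are one-liners in Matlab.
    A = [4 0; 3 -5];           % made-up example matrix
    B = 2*A + A*A';            % scale, multiply and add matrices
    [U, S, V] = svd(A);        % singular value decomposition, built in
    % 'help svd' documents the decomposition A = U*S*V'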
Great to use as a calculator that is always open on the desktop, and can do back-of-the-envelope style calculations.
Plotting graphs. Many academics recommend Matlab as the tool of choice for producing publication-quality graphics. These can be exported as PDF and imported into Inkscape for further editing. The best thing is that the commands for plotting a graph can be put into a script file, and parts of it changed later as needed, which can save a lot of work compared to manually drawing a graph (imagine you wanted to change the axes or the symbols used to present the data points).
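For example, a minimal re-runnable plot script along those lines (the data and the output file name are made up):

    % Minimal sketch: a plot script you can tweak and re-run at will.
    t = linspace(0, 2*pi, 100);              % made-up data
    plot(t, sin(t), 'ko');                   % change 'ko' to restyle every point
    xlabel('t (s)'); ylabel('amplitude');
    print(gcf, '-dpdf', 'figure1.pdf');      % export as PDF for Inkscape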
Personally, I also use it for curve-fitting. It has many toolboxes, one of which is a neat tool that allows me to find equations that model a set of data points.
Comsol:
Specialised tool for solving partial differential equations (PDEs) on complicated domains using the finite element method (FEM). This might sound obscure, but many real-world engineering needs reduce to this. Such things as:
Finding loads, stresses and strains in civil engineering structures with complicated real-world geometry (what happens when there is gusty wind blowing onto a building or bridge?)
How do currents flow in particular conductive objects?
Chemical reactions in various industrial reactors.
What is the power efficiency of a generator (magnet spinning in coil) design?
How to place aircon outlets in a nontrivially-shaped room to achieve both good temperature distribution and good efficiency?
Comsol, like any other FEM tool that can work with arbitrary equations, can do multiphysics, which means, for example, that one could solve for the chemistry of a battery, as well as the temperature and pressure, and for how these feed back into the chemical reaction (speeding it up or slowing it down). Compared with a tool where you need to provide the equations, in Comsol most of the things needed to solve most problems are already there; they just need to be selected and applied to the geometry, which is also built inside Comsol. Equations of arbitrary description can be introduced as well.
The physical descriptions of how these substances behave are the PDEs mentioned above.
Once Comsol has finished solving a problem, the data can be exported into Matlab for post-processing; Matlab has much more versatile tools for manipulating data and making various plots.

Lucas Kanade Optical Flow, Direction Vector

I am working on optical flow, and based on the lecture notes here and some samples on the Internet, I wrote this Python code.
All code and sample images are there as well. For small displacements of around 4-5 pixels, the direction of the calculated vector seems fine, but the magnitude of the vector is too small (that's why I had to multiply u and v by 3 before plotting them).
Is this because of a limitation of the algorithm, or an error in my code? The lecture notes shared above also say that the motion needs to be small ("u, v are less than 1 pixel"); maybe that's why. What is the reason for this limitation?
#belisarius says: "LK uses a first order approximation, and so (u,v) should ideally be << 1; if not, higher order terms dominate the behavior and you are toast."
A standard conclusion from the optical flow constraint equation (OFCE, slide 5 of your reference) is that "your motion should be less than a pixel, lest higher order terms kill you". While technically true, in practice you can overcome this by using larger averaging windows. This requires that you do sane statistics, i.e. not the pure least squares suggested in the slides. Equally fast computation, and far superior results, can be achieved by Tikhonov regularization. This necessitates setting a tuning value (the Tikhonov constant). The constant can be set globally, or adjusted to local information in the image (such as the Shi-Tomasi confidence, a.k.a. the structure tensor determinant).
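A minimal sketch of the regularized per-window solve in Matlab (Ix, Iy, It are derivative samples inside the window; the lambda value is a made-up tuning constant):

    % Minimal sketch: Tikhonov-regularized Lucas-Kanade for one window.
    % Ix, Iy, It are spatial/temporal derivatives sampled in the window.
    A = [Ix(:) Iy(:)];
    b = -It(:);
    lambda = 0.01;                           % made-up Tikhonov constant
    uv = (A'*A + lambda*eye(2)) \ (A'*b);    % vs. pure least squares: A \ b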
Note that this does not replace the need for multi-scale approaches to deal with larger motions; it may, however, extend a bit the range that any single scale can deal with.
Implementations, visualizations and code are available in tutorial format here, albeit in Matlab rather than Python.

Matlab and GPU/CUDA programming

I need to run several independent analyses on the same data set.
Specifically, I need to run batches of 100 GLM (generalized linear model) analyses, and I was thinking of taking advantage of my video card (GTX 580).
As I have access to Matlab and the Parallel Computing Toolbox (and I'm not good with C++), I decided to give it a try.
I understand that a single GLM is not ideal for parallel computing, but as I need to run 100-200 in parallel, I thought that using parfor could be a solution.
My problem is that it is not clear to me which approach I should follow. I wrote a gpuArray version of the Matlab function glmfit, but using parfor doesn't have any advantage over a standard for loop.
Does this have anything to do with the matlabpool setting? It is not even clear to me how to set it to "see" the GPU card. By default, it is set to the number of cores in the CPU (4 in my case), if I'm not wrong.
Am I completely wrong on the approach?
Any suggestion would be highly appreciated.
Edit
Thanks. I'm aware of GPUmat and Jacket, and I could start writing in C without too much effort, but I'm testing the GPU computing possibilities for a department where everybody uses Matlab or R. The final goal would be a cluster based on C2050 cards and the Matlab Distributed Computing Server (or at least this was the first project).
Reading the ads from MathWorks, I was under the impression that parallel computing was possible even without C skills. It is impossible to ask the researchers in my department to learn C, so I'm guessing that GPUmat and Jacket are the better solutions, even though the limitations are quite big and the support for several commonly used routines like glm is non-existent.
How can they be interfaced with a cluster? Do they work with some job distribution system?
I would recommend you try either GPUMat (free) or AccelerEyes Jacket (buy, but has free trial) rather than the Parallel Computing Toolbox. The toolbox doesn't have as much functionality.
To get the most performance, you may want to learn some C (no need for C++) and code in raw CUDA yourself. Many of these high level tools may not be smart enough about how they manage memory transfers (you could lose all your computational benefits from needlessly shuffling data across the PCI-E bus).
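To illustrate the transfer point in plain Matlab (X is a made-up host-side matrix), keep intermediates on the device and gather once at the end:

    % Minimal sketch: avoid shuffling intermediates across the PCI-E bus.
    g = gpuArray(X);            % one transfer to the device
    g = g - mean(g(:));         % stays on the GPU
    g = fft(g);                 % still on the GPU
    y = gather(abs(g));         % one transfer back to host memory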
parfor will help you utilize multiple GPUs, but not a single GPU. The thing is that a single GPU can do only one thing at a time, so parfor on a single GPU and for on a single GPU achieve exactly the same effect (as you are seeing).
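A minimal sketch of why, assuming a single device (the matrix size and iteration count are arbitrary):

    % Minimal sketch: on one GPU, for and parfor serialize identically,
    % because every kernel queues on the same device.
    g = rand(4000, 'gpuArray');    % arbitrary test matrix on the device
    for k = 1:100                  % swapping in parfor changes nothing here
        h = fft(g);                % each call queues on the single GPU
    end
    wait(gpuDevice);               % block until all queued work finishes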
Jacket tends to be more efficient, as it can combine multiple operations and run them more efficiently, and it has more features; but most departments already have the Parallel Computing Toolbox and not Jacket, so that can be an issue. You can try the demo to check.
No experience with gpumat.
The Parallel Computing Toolbox is getting better; what you need are some large matrix operations. GPUs are good at doing the same thing many times over, so you need to either combine your code somehow into one operation or make each operation big enough. We are talking about a need for ~10000 things in parallel at least, although it's not a set of 1e4 matrices you want but rather one large matrix with at least 1e4 elements.
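A minimal sketch of "make the operation big" for the 100-analyses case (X and Y are made-up stand-ins; a real GLM wraps this solve in an iteratively reweighted loop):

    % Minimal sketch: batch 100 small solves into one large GPU operation.
    % X is an n-by-p design matrix, Y is n-by-100 (one column per analysis).
    Xg = gpuArray(X);
    Yg = gpuArray(Y);
    B  = Xg \ Yg;              % 100 least-squares fits in a single call
    B  = gather(B);            % bring the coefficients back to the host
    % A true GLM (glmfit-style) needs an IRLS loop around this solve.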
I do find that with the Parallel Computing Toolbox you still need quite a bit of inline CUDA code to be effective (it's still pretty limited). It does, though, let you inline kernels and transform Matlab code into kernels.