Parallel computing data extraction from a SQL database in Matlab - matlab

In my current setup I have a for loop in which I extract different types of data from a SQL database hosted on Amazon EC2. The extraction is done in the function extractData(variableName). The data is then parsed and stored as a MAT-file in parsestoreData(data):
variables = {'A','B','C','D','E'};
for i = 1:length(variables)
    data = extractData(variables{i});
    parsestoreData(data);
end
I would like to parallelize the extraction and parsing of the data to speed up the process. I think I could do this by using parfor instead of for in the above example.
However, I am worried that the extraction will not get faster, because the SQL database will slow down when multiple requests are made to it at the same time.
I am therefore wondering whether Matlab can handle this issue in a smart way when parallelizing?

The workers in a parallel pool running parfor are essentially full MATLAB processes without a UI, and they default to running in "single computational thread" mode. I'm not sure whether parfor will benefit you in this case: the parfor loop simply arranges for the MATLAB workers to execute the iterations of your loop in parallel. You can estimate for yourself how well your problem will parallelise by launching multiple full desktop MATLAB sessions and setting them all running your problem simultaneously. I would run something like this:
maxNumCompThreads(1);
while true
t = tic();
data = extractData(...);
parsestoreData(data);
toc(t)
end
and then check how the times reported by toc vary as the number of MATLAB clients varies. If the times remain constant, you could reasonably expect parfor to give you a benefit (because it means the body can be parallelised effectively). If, however, the times increase significantly as you run more MATLAB clients, then it's almost certain that parfor would experience the same (relative) slow-down.
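To make the comparison between sessions easier, the timings can be collected and summarised instead of read off one by one. This is only a sketch: fakeWork is a hypothetical stand-in for the extractData and parsestoreData calls, and nReps is an arbitrary choice. Run the same script in 1, 2, 3, ... simultaneous MATLAB sessions and compare the medians.

```matlab
% Hypothetical stand-in for extractData + parsestoreData; replace with the real calls.
fakeWork = @() sum(sum(rand(200) * rand(200)));

nReps = 20;                          % number of timed iterations (arbitrary choice)
times = zeros(nReps, 1);
oldThreads = maxNumCompThreads(1);   % mimic a worker; returns the previous setting
for k = 1:nReps
    t = tic();
    fakeWork();
    times(k) = toc(t);
end
maxNumCompThreads(oldThreads);       % restore the previous thread limit

fprintf('median %.4f s, max %.4f s over %d runs\n', ...
        median(times), max(times), nReps);
```

If the median stays roughly flat as sessions are added, the database is probably not the bottleneck; if it grows with the number of sessions, parfor is unlikely to help much.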

Related

Parfor limitations [duplicate]

the code I'm dealing with has loops like the following:
bistar = zeros(numdims,numcases);
parfor hh=1:nt
bistar = bistar + A(:,:,hh)*data(:,:,hh+1)' ;
end
for small nt (10).
After timing it, it is actually 100 times slower than using the regular loop!!! I know that parfor can do parallel sums, so I'm not sure why this isn't working.
I run
matlabpool
with the out-of-the-box configurations before running my code.
I'm relatively new to matlab, and just started to use the parallel features, so please don't assume that I am not doing something stupid.
Thanks!
PS: I'm running the code on a quad core so I would expect to see some improvements.
The overhead of partitioning the work and gathering the results from the several threads/cores is high relative to the computation for small values of nt. This is normal: you would not partition data for easy tasks that can be performed quickly in a simple loop.
Always perform something challenging inside the loop that is worth the partitioning overhead. Here is a nice introduction to parallel programming.
The threads come from a thread pool, so the overhead of creating the threads should not be there. But in order to create the partial results, n matrices of the same size as bistar must be created, all the partial results computed, and then all these partial results added together (recombining). In a straight loop, this is with high probability done in place, so no allocations take place.
The complete statement in the help (thanks for your link hereunder) is:
If the time to compute f, g, and h is
large, parfor will be significantly
faster than the corresponding for
statement, even if n is relatively
small.
So you see they mean exactly what I mean: parfor is only worth the overhead for small n if what you do in the loop is complex or time-consuming enough.
Parfor comes with a bit of overhead. Thus, if nt is really small, and if the computation in the loop is done very quickly (like an addition), the parfor solution is slower. Furthermore, if you run parfor on a quad-core, speed gain will be close to linear for 1-3 cores, but less if you use 4 cores, since the last core also needs to run system processes.
For example, if parfor comes with 100ms of overhead, and the computation in the loop takes 5ms, and if we assume that speed gain is linear up to 4 cores with a coefficient of 1 (i.e. using 4 cores makes the computation 4 times faster), nt needs to be about 30 for you to achieve a speed gain with parfor (150ms with for, about 138ms with parfor). If you were to run only 10 iterations, parfor would be slower (50ms with for, about 113ms with parfor).
You can calculate the overhead on your machine by comparing execution time with 1 worker vs 0 workers, and you can estimate speed gain by making a linear fit through the execution times with 1 to 4 workers. Then you'll know when it's useful to use parfor.
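The break-even point in this example can be written out under the same simple model (fixed overhead, perfectly linear speed-up). The symbols are my own labels, not from the documentation: o is the overhead, c the time per iteration, p the number of cores, and n stands for nt.

```latex
\begin{align*}
T_{\text{for}}(n) &= n\,c, \qquad T_{\text{parfor}}(n) = o + \frac{n\,c}{p},\\
T_{\text{parfor}}(n) < T_{\text{for}}(n) &\iff n > \frac{o}{c\,(1 - 1/p)}
  = \frac{100}{5 \times 0.75} \approx 26.7,\\
T_{\text{for}}(10) &= 50\ \text{ms}, \qquad
T_{\text{parfor}}(10) = 100 + \frac{50}{4} = 112.5\ \text{ms}.
\end{align*}
```

So with these assumed numbers parfor only starts to pay off at around 27 iterations, and real speed-up is usually less than linear, which pushes the threshold higher.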
Besides the bad performance caused by the communication overhead (see other answers), there is another reason not to use parfor in this case. Everything done within the parfor here (matrix multiplication and addition) already uses MATLAB's built-in multithreading. Assuming all workers are running on the same PC, there is no advantage, because a single call already uses all cores of your processor.
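A quick way to check this on your own machine is to time both versions of the loop from the question. This is a sketch with made-up sizes (numdims, numcases and the random A and data are assumptions, chosen only so the dimensions agree). Note that the first parfor call may also include the time to start the pool, so run it twice before drawing conclusions.

```matlab
numdims = 50; numcases = 40; nt = 10;     % hypothetical sizes
A = rand(numdims, numdims, nt);
data = rand(numcases, numdims, nt + 1);

% Plain for loop: the sum is accumulated in place.
t = tic();
bistarFor = zeros(numdims, numcases);
for hh = 1:nt
    bistarFor = bistarFor + A(:,:,hh) * data(:,:,hh+1)';
end
tFor = toc(t);

% parfor version: bistarPar is a reduction variable.
t = tic();
bistarPar = zeros(numdims, numcases);
parfor hh = 1:nt
    bistarPar = bistarPar + A(:,:,hh) * data(:,:,hh+1)';
end
tPar = toc(t);

% The summation order may differ, so compare with a tolerance.
maxDiff = max(abs(bistarFor(:) - bistarPar(:)));
fprintf('for: %.4f s, parfor: %.4f s, max difference: %.2e\n', tFor, tPar, maxDiff);
```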

Can I run a script on multiple MATLAB sessions instead of parallelizing the script?

I have a script which solves a system of differential equations for many parameters in a for loop. (The iterations are completely independent, but at the end of each iteration a large matrix (mat) is modified according to the results of the computation.) Here is the code (B is a matrix containing the parameters):
mat=zeros(20000,1);
for n=1:20000
    prop=B(n,:); % B is a (20000 x 2) matrix that contains the U and V parameters
    U=prop(1);
    V=prop(2);
    options=odeset('RelTol',1e-6,'AbsTol',1e-20);
    [T,X]=ode45(@acceleration,tspan,x0,options);
    rad=X(:,1);
    if max(rad)<radius % radius is a constant
        mat(n)=1;
    end
end
function xprime=acceleration(T,X)
.
.
.
end
First I tried to use parfor, but because the acceleration function (the ode45 input) was defined as an inline function (to achieve better performance), I couldn't do that.
Can I open 4 MATLAB sessions (my CPU has 4 cores) and run the code separately in each session, instead of modifying the code to implement acceleration as a separate function so that I can use parfor? Does that give 4x the performance of running in one session? (Or does it give the same performance as the parallelized code? In the parallel code I can't define inline functions.)
(on Windows)
If you're prepared to do the work of separating out the problem to run separately in 4 sessions and then reassembling the results, sure, you can do that. In my experience (on Windows) it actually runs faster to run code in four separate sessions than to have a parfor loop with 4 workers. It is not quite as fast as 4x the performance of a single session, because the operating system will have other work to do. For example, if you have no other processor-heavy applications running, the OS itself might take up 25% of one core, leaving you with maybe 3.75x the performance of a single session. However, this assumes you have enough memory for that not to be the limiting factor.
If you wanted to do this regularly you might need to create some file-based signalling/data passing system.
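A minimal sketch of such a file-based scheme is below. Everything in it is an assumption for illustration: the file names part_1.mat and so on, the use of tempdir as the shared folder, and the placeholder computation that stands in for the real ODE test. In practice each session would run the first loop body once with its own sessionId; the outer loop here only simulates the four sessions in a single script.

```matlab
nTotal = 20000;                 % number of parameter sets, as in the question
nSessions = 4;                  % one MATLAB session per core
outDir = tempdir;               % assumption: any folder all sessions can reach
edges = round(linspace(0, nTotal, nSessions + 1));

% Each session works on its own contiguous block of indices.
for sessionId = 1:nSessions
    idx = (edges(sessionId) + 1):edges(sessionId + 1);
    partial = zeros(numel(idx), 1);
    for k = 1:numel(idx)
        partial(k) = mod(idx(k), 2);   % placeholder for the real ODE test
    end
    save(fullfile(outDir, sprintf('part_%d.mat', sessionId)), 'idx', 'partial');
end

% Reassembly, run once after all sessions have finished.
mat = zeros(nTotal, 1);
for sessionId = 1:nSessions
    S = load(fullfile(outDir, sprintf('part_%d.mat', sessionId)));
    mat(S.idx) = S.partial;
end
```

Saving idx alongside the partial result means the reassembly step does not need to know how the work was split.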
This is obviously not as elegant as a parfor, but is workable for your situation, or if you can't afford the license fee for the parallel toolbox.
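If the Parallel Computing Toolbox is available, one possible way around the nested-function restriction is to pass the parameters through an anonymous function, which can be defined inside a parfor body. The sketch below is not the asker's model: the right-hand side is a hypothetical damped oscillator, and B, tspan, x0, radius and the tolerances are made-up values chosen so the example runs on its own.

```matlab
% Hypothetical parameters: column 1 is U, column 2 is V.
B = [linspace(0.5, 2, 8)', linspace(0.1, 0.4, 8)'];
tspan = [0 5];
x0 = [1; 0];
radius = 1.5;
options = odeset('RelTol', 1e-6, 'AbsTol', 1e-9);

mat = zeros(size(B, 1), 1);
parfor n = 1:size(B, 1)
    prop = B(n, :);
    U = prop(1);
    V = prop(2);
    % Anonymous function capturing U and V (hypothetical dynamics).
    rhs = @(t, x) [x(2); -U * x(1) - V * x(2)];
    [~, X] = ode45(rhs, tspan, x0, options);
    mat(n) = max(X(:, 1)) < radius;   % 1 if the trajectory stays inside radius
end
```

Whether this is faster than four separate sessions depends on how expensive each ode45 call is compared with the parfor overhead, so it is worth timing both.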

Running identical Matlab scripts on multiple local threads

I have a quad-core desktop computer
I have the Parallel Computing toolbox in Matlab.
I have a script file that I need to run simultaneously on each core
I'm not sure what the most efficient way to do this is. I know I can create a 'matlabpool' with 4 local workers, but how do I then assign the same script to each one? Or can I use the 'batch' command to run the script on a specific thread, and then do that for each one?
Thank you!
You can run a single script using multiple cores using the Parallel Computing toolbox, by using
matlabpool open local 4
and using parfor instead of for loops to execute whatever is in your loop across four workers. I'm not sure whether the Parallel Computing Toolbox supports running the entire script independently on each core; it may not be supported.
Not sure if this works, but here is something to try:
When trying to parallelize calculations, they are usually wrapped in something like parfor.
So I would recommend doing the same with your script: make sure that all required inputs and outputs have the necessary dimensions and just call:
parfor ii = 1:4
myscript;
end
Side note: Before trying this kind of thing you may want to check your CPU utilization. If it is already high, that means the inner part of the code already uses parallel processing and you should not expect much speedup.
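One caveat, as far as I know: the parfor documentation says scripts cannot be called directly inside a parfor loop, although functions (which may themselves call scripts) can. A sketch of the function-based variant is below. The work handle is a hypothetical stand-in for the script's contents, and feval is used so that MATLAB does not mistake work(ii) for indexing into a variable.

```matlab
% Hypothetical stand-in for the work done by the script.
work = @(ii) sum((1:ii).^2);

results = zeros(1, 4);
parfor ii = 1:4
    % feval avoids work(ii) being parsed as indexing a sliced variable.
    results(ii) = feval(work, ii);
end
disp(results);
```

Collecting the output in a sliced variable such as results(ii) is what lets each iteration hand its result back to the client session.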