Can TensorFlow use multiple CPUs? [duplicate] - neural-network

This question already has an answer here:
Configuring Tensorflow to use all CPU's
(1 answer)
Closed 6 years ago.
I have a desktop computer with 4 CPUs.
Can I run TensorFlow on all 4 CPUs so that it is faster than on a single CPU,
like an MPI program?
And can I use tflearn to implement it?
Thanks very much!

Yes, TensorFlow will take advantage of the multiple cores on your CPU. When you build it from source, you can build it to run only on the CPU or on both the CPU and GPU. Right now I have TensorFlow built to run on my GPU, but if I want to place it on the CPU I put the following at the top of my Python code:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = ''
This just sets an environment variable, which can also be done from the terminal. When I run it with no CUDA devices available, it runs all cores of my CPU at 100%. I would imagine that putting this at the top of your tflearn code would force it onto the CPU as well.
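
A minimal sketch of the approach described above, using only the standard library. The helper name force_cpu_only is made up for illustration, and the advice to set the variable before importing TensorFlow is the usual safe ordering rather than something stated in the answer.

```python
import os


def force_cpu_only() -> None:
    """Hide every CUDA device from libraries loaded later in this process."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


# Assumption: call this before `import tensorflow`, so the setting is
# already in place when the library looks for GPUs.
force_cpu_only()

# Number of logical cores the OS reports; a CPU-only run can spread over these.
print("logical CPU cores:", os.cpu_count())
```

From a shell, the same effect is usually achieved by setting CUDA_VISIBLE_DEVICES to an empty value for the command that launches the script.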

Related

How can I speed up Matlab matrix multiplication that is running very slowly on a VM on a Server?

I ran MATLAB's bench on a laptop and on a server (a VM).
I also timed a matrix multiplication, which shows a drastic difference between the two.
Yet bench rates the server higher than the laptop.
I'm guessing the Windows VM is the problem, but I'm not sure how to improve the speed.
This is probably not a problem with Matlab or Windows. You probably just have a slow VM. That Xeon E5-2650L is eight years old now (launched in early 2012), and it doesn't look like you have many cores/vCPUs allocated to your VM.
That reference benchmark you posted is probably using all eight cores of the E5-2650L they're testing. I'm guessing that since you've only got 4 GB of RAM in your VM instance, you only have one or two vCPU cores allocated. So you're not getting nearly the performance that the benchmark indicates.
If you want your Matlab code to go faster, just upgrade your VM. Sorry there's no free fix here.

Tensorflow. Cifar10 Multi-gpu example performs worse with more gpus

I have to test the distributed version of tensorflow across multiple gpus.
I run the Cifar-10 multi-gpu example on an AWS g2.8x EC2 instance.
Running time for 2000 steps of cifar10_multi_gpu_train.py (code here) was 427 seconds with 1 GPU (flag num_gpu=1). Afterwards the eval.py script returned precision @ 1 = 0.537.
With the same example running for the same number of steps (with one step being executed in parallel across all GPUs), but using 4 GPUs (flag num_gpu=4), running time was about 530 seconds and the eval.py script returned only a slightly higher precision @ 1 of 0.552 (maybe due to randomness in the computation?).
Why is the example performing worse with a higher number of GPUs? I used a very small number of steps for testing purposes and was expecting a much higher gain in precision using 4 GPUs.
Did I miss something or make a basic mistake?
Did someone else try the above example?
Thank you very much.
The cifar10 example places variables on the CPU by default, which is what you need for a multi-GPU architecture. With 2 GPUs you may achieve about a 1.5x speedup compared to a single-GPU setup.
Your problem has to do with the dual-GPU architecture of the Nvidia Tesla K80. It has a PCIe switch that connects both GPUs internally, which introduces communication overhead. (The original answer included a block diagram here.)
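
For a rough sense of the numbers in the question, the two timings can be turned into a per-step cost and a relative throughput. The sketch below assumes that every GPU processes its own full batch in each step; that is an assumption about how the example splits work, not something stated in the thread, and the function names are illustrative.

```python
def seconds_per_step(total_seconds: float, steps: int) -> float:
    """Average wall-clock time of one training step."""
    return total_seconds / steps


def relative_throughput(t_single: float, t_multi: float, num_gpus: int) -> float:
    """Examples per second of the multi-GPU run relative to the single-GPU run.

    Assumption: each GPU consumes one full batch per step, so a multi-GPU
    step covers num_gpus times as many examples as a single-GPU step.
    """
    return num_gpus * t_single / t_multi


single = seconds_per_step(427, 2000)  # timing reported for num_gpu=1
multi = seconds_per_step(530, 2000)   # timing reported for num_gpu=4

print(f"1 GPU : {single:.4f} s/step")
print(f"4 GPUs: {multi:.4f} s/step")
print(f"relative throughput: {relative_throughput(427, 530, 4):.2f}x")
```

If that assumption holds, a step is slower with 4 GPUs, yet more examples are processed per second, so comparing runs by step count alone understates what the extra GPUs contribute.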

How to enable multithreading in MATLAB?

Some of MATLAB's functions support multithreading and will make use of your multi-core architecture when available. To be clear, I'm not referring to MATLAB's support for parallel execution that you invoke explicitly, e.g. using parfor.
In my code I'm running imregtform. My issue is that on one machine (Win 8, x64, MATLAB 2014b) the function (called thousands of times) maxes out all my CPUs, whereas on another machine (Win 7, x64, MATLAB 2014a) it uses barely half of my CPUs, and those only at about 20%. Why is that? Is there a switch somewhere?
I've tried some of the suggestions found in
Checking if MATLAB is running in multithread mode and Matlab 2011a Use all Cores Available on 64 bit Linux?.
Any other suggestions?

Using matlabpool with a specified number of workers

I've been using the command matlabpool open 8 for a while in order to speed things up.
However, I just tried using it and was denied 8 cores; I am now limited to 4.
My laptop is an i7 with 4 cores, but it is hyperthreaded, which meant I had no issue getting MATLAB to work on 8 virtual cores.
Simultaneously I noticed the following warning message:
Warning: matlabpool will be removed in a future release.
Use parpool instead.
Seems like MathWorks decided this was a great update for some reason.
Any ideas how I can get my code running on 8 cores again?
Note: I was using R2010b (I think) and am now using R2014b.
It looks like @horchler has provided you with a direct solution to your question in the comments.
However, I would recommend sticking to the default 4 workers suggested by MATLAB, and not using 8. You're very unlikely to get significant speedup by moving to 8, and you're even likely to slow things down a bit.
You have four physical cores, and they can only do so much work. Hyperthreading enables the operating system to pretend that there are 8 cores, by interleaving operations done on pairs of virtual cores.
This is great for applications such as Outlook, which are not compute-intensive, but require lots of operations to appear simultaneous in order, for example, to keep a GUI responsive while checking for email over a network connection.
But for compute-intensive applications such as MATLAB, it will not give you any sort of real speed up, as the operations are just interleaved - you haven't increased the amount of work that the 4 real, physical cores can do. In addition, there's a small overhead in performing the hyperthreading.
In my experience, MATLAB benefits slightly from turning hyperthreading off. (Of course other things, such as Outlook, won't: your choice.)

How to fully use the CPU in Matlab [Improving performance of a repetitive, time-consuming program]

I'm working on an adaptive, fully automatic segmentation algorithm for varying light conditions. The core of this algorithm uses Particle Swarm Optimization (PSO) to tune the fuzzy system, and believe me, it's very time-consuming :| For only 5 particles and 100 iterations I have to wait 2 to 3 hours! And that is just for processing one image from my data set, which contains over 100 photos!
I'm using MATLAB R2013 with an Intel Core i7-2670QM @ 2.2GHz, 8.00GB RAM, 64-bit operating system.
The problem is: when the program starts, it uses only 12%-16% of my CPU and only one core is working!
I searched a lot and came across matlabpool, so I added this line to my code:
matlabpool open 8
Now when I start the program, the task manager shows 98% CPU usage, but only for a few seconds! After that it drops back to 12-13% CPU usage :|
Do you have any idea how I can get this code to run faster?
12 percent sounds like MATLAB is using only one thread/core, and that one at full load, which is normal.
matlabpool open 8 is not enough; it simply opens workers. You have to use commands like parfor to assign work to them.
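
The 12% figure is what one fully busy thread looks like on a machine that reports 8 logical cores. That count is an assumption here, suggested by the question's matlabpool open 8. A quick check of the arithmetic, with a made-up helper name:

```python
def single_thread_share(logical_cores: int) -> float:
    """Percent of total CPU shown when exactly one logical core is fully busy."""
    return 100.0 / logical_cores


# Assumption: the task manager averages load over 8 logical cores.
print(f"{single_thread_share(8):.1f}% of total CPU")
```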
Further to Daniel's suggestion: ideally, to apply parfor you'd find a time-consuming for loop in your algorithm where the iterations are independent and convert that to parfor. Generally, parfor works best when applied at the outermost level possible. It's also definitely worth using the MATLAB profiler to help you optimise your serial code before you start adding parallelism.
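
The same idea, sketched in Python rather than MATLAB: opening a pool of workers does nothing by itself, and the speedup only comes from handing the independent iterations of the outermost loop to that pool. This is an analogy, not MATLAB's API; evaluate_particle is a hypothetical stand-in for one expensive, independent iteration.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def evaluate_particle(seed: int) -> int:
    """Hypothetical stand-in for one expensive, independent loop iteration."""
    total = 0
    for k in range(200_000):
        total = (total + (seed + 1) * k) % 1_000_003
    return total


def run_serial(n_particles: int) -> list:
    """Plain loop: one core does all the work."""
    return [evaluate_particle(i) for i in range(n_particles)]


def run_parallel(n_particles: int, workers: int) -> list:
    """Same loop, with the independent iterations mapped onto worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(evaluate_particle, range(n_particles)))


if __name__ == "__main__":
    n = 8
    # The iterations are independent, so both versions give the same results.
    assert run_parallel(n, workers=os.cpu_count() or 1) == run_serial(n)
    print("parallel and serial results match")
```

The conversion only works because no iteration depends on another's result; loops that carry state from one iteration to the next cannot be split this way.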
With my own simulations I find that I cannot recode them using parfor; the for loops I have are too intertwined to take advantage of multiple cores.
HOWEVER:
You can open a second (and third, and fourth, etc.) instance of MATLAB and tell each additional instance to run another job. Each open instance of MATLAB will use a different core. So if you have a quad-core, you can have 4 instances open and get 100% utilization by running code in all 4.
So, I gained efficiency by having multiple instances of MATLAB open at the same time, each running a job. My jobs took 8 to 27 hours at a time, and, as one might imagine, without liquid cooling I burnt out my CPU fan and had to replace it.
Also, do look into optimizing your MATLAB code. I recently optimized mine and it now runs 40% faster.