I have an embarrassingly parallel job that requires no communication between the workers. I'm trying to use the dfeval function, but the overhead seems to be enormous. To get started, I'm trying to run the example from the documentation.
>> matlabpool open
Starting matlabpool using the 'local' configuration ... connected to 8 labs.
>> sched = findResource('scheduler','type','local')
sched =
Local Scheduler Information
===========================
Type : local
ClusterOsType : pc
ClusterSize : 8
DataLocation : C:\Users\~\AppData\Roaming\MathWorks\MATLAB\local_scheduler_data\R2010a
HasSharedFilesystem : true
- Assigned Jobs
Number Pending : 0
Number Queued : 0
Number Running : 1
Number Finished : 8
- Local Specific Properties
ClusterMatlabRoot : C:\Program Files\MATLAB\R2010a
>> matlabpool close force local
Sending a stop signal to all the labs ... stopped.
Did not find any pre-existing parallel jobs created by matlabpool.
>> sched = findResource('scheduler','type','local')
sched =
Local Scheduler Information
===========================
Type : local
ClusterOsType : pc
ClusterSize : 8
DataLocation : C:\Users\~\AppData\Roaming\MathWorks\MATLAB\local_scheduler_data\R2010a
HasSharedFilesystem : true
- Assigned Jobs
Number Pending : 0
Number Queued : 0
Number Running : 0
Number Finished : 8
- Local Specific Properties
ClusterMatlabRoot : C:\Program Files\MATLAB\R2010a
>> tic;y = dfeval(@rand,{1 2 3},'Configuration', 'local');toc
Elapsed time is 4.442944 seconds.
Running it on subsequent occasions produces similar timings. So my questions are:
Why do I need to run matlabpool close force local to get Number Running down to zero, given that I ran matlabpool open in a fresh MATLAB instance?
Is five seconds of overhead really necessary for such a trivial example, especially given that the MATLAB workers have already been started up?
The DFEVAL function is a wrapper around submitting a job with one or more tasks to a given scheduler, in your case the 'local' scheduler. With the 'local' scheduler, each new task runs in a fresh MATLAB worker session, which is why you see the 4.5-second overhead - that's the time taken to launch the worker, work out what to do, do it, and then quit.
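For illustration, here is roughly the kind of job/task submission that DFEVAL performs for you - a sketch using the R2010a-era API, not the literal implementation:
sched = findResource('scheduler', 'type', 'local');
job = createJob(sched);              % one job ...
createTask(job, @rand, 1, {1});      % ... with one task per input argument
createTask(job, @rand, 1, {2});
createTask(job, @rand, 1, {3});
submit(job);
waitForState(job, 'finished');       % each task spins up a fresh worker
y = getAllOutputArguments(job);
destroy(job);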
The reason that you need the number of running jobs to be zero is that the local scheduler can only run a limited number of workers at once.
In general, PARFOR with MATLABPOOL is an easier combination to use than DFEVAL. Also, when you open a MATLABPOOL, the workers are launched and ready, so the overhead of PARFOR is much less (but still not zero as the body of the loop needs to be sent to the workers).
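For example, here is a minimal PARFOR sketch doing the same work as the dfeval call above (timings will differ once the pool is already open):
matlabpool open
tic
y = cell(1, 3);
parfor i = 1:3
    y{i} = rand(i);    % same work as dfeval(@rand, {1 2 3})
end
toc
matlabpool close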
What is the correct method for using multiple CPU cores with jax.pmap?
The following example sets an environment variable to expose multiple CPU devices for SPMD, checks that JAX recognises the devices, and then attempts to busy-lock each device.
import os
os.environ["XLA_FLAGS"] = '--xla_force_host_platform_device_count=2'
import jax as jx
import jax.numpy as jnp
jx.local_device_count()
# WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
# 2
jx.devices("cpu")
# [CpuDevice(id=0), CpuDevice(id=1)]
def sfunc(x):
    while True:
        pass
jx.pmap(sfunc)(jnp.arange(2))
Executing this from a Jupyter kernel and observing htop shows that only one core is pinned.
I get the same htop output when omitting the first two lines and running:
$ env XLA_FLAGS=--xla_force_host_platform_device_count=2 python test.py
Replacing sfunc with
def sfunc(x): return 2.0*x
and calling
jx.pmap(sfunc)(jnp.arange(2))
# ShardedDeviceArray([0., 2.], dtype=float32, weak_type=True)
does return a ShardedDeviceArray.
Clearly I am not correctly configuring JAX/XLA to use two cores. What am I missing and what can I do to diagnose the problem?
As far as I can tell, you are configuring the cores correctly (see e.g. Issue #2714). The problem lies in your test function:
def sfunc(x):
    while True:
        pass
This function gets stuck in an infinite loop at trace-time, not at run-time. Tracing happens in your host Python process on a single CPU (see How to think in JAX for an introduction to the idea of tracing within JAX transformations).
If you want to observe CPU usage at runtime, you'll have to use a function that finishes tracing and begins running. For that you could use any long-running function that actually produces results. Here is a simple example:
def sfunc(x):
    for i in range(100):
        x = x @ x    # repeated matrix multiplies: real work at run-time
    return x

jx.pmap(sfunc)(jnp.zeros((2, 1000, 1000)))
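To double-check that both cores are busy at run time (a sketch; the shapes and timings are just an example), warm the compiled function up once and then watch htop during a second, timed call:
import time

f = jx.pmap(sfunc)
x = jnp.zeros((2, 1000, 1000))
f(x).block_until_ready()          # first call: traces and compiles
t0 = time.perf_counter()
f(x).block_until_ready()          # second call: pure run-time work
print(time.perf_counter() - t0)   # htop should show two busy cores here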
I'm using:
TYPO3 6.2
ke_search 2.2
Everything works fine except the indexing process. I mean:
If I index manually (with the backend module), it's OK - no error messages.
If I run the scheduler indexing task manually, it's OK - no error messages.
If I run the scheduler with the php typo3/cli_dispatch.phpsh scheduler command, then I get this error:
Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to
allocate 87 bytes) in
/path_to_my_website/typo3/sysext/core/Classes/Cache/Frontend/VariableFrontend.php on line 99
For your information:
my PHP memory_limit setting is 128M (note that the error above reports a 16777216-byte limit, i.e. 16M).
Other tasks are OK.
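Since the CLI PHP binary can load a different php.ini than the web server, a quick diagnostic sketch (not from the original question) to see which limit cli_dispatch.phpsh actually runs under:
php --ini
php -r 'echo ini_get("memory_limit"), PHP_EOL;'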
After this error appears on my console, the scheduler task is locked.
I can't figure out what's wrong.
EDIT: I flushed the frontend caches + general caches + system caches. If I run the scheduler via the console one more time, this is the new error I get:
Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to
allocate 12288 bytes) in
/path_to_my_website/typo3/sysext/core/Classes/Database/QueryGenerator.php
on line 1265
EDIT 2: if I disable all my indexer configurations, all goes well. But if I enable even one configuration -> PHP error.
Here is one of the indexer configurations:
I am creating 4 mountpoint disks in Windows. I need to copy files up to a threshold value (say 50 GB).
I tried vdbench. It works fine, but it throws an exception at the end.
compratio=4
dedupratio=1
dedupunit=256k
* Host Definition section
hd=default,user=Administrator,shell=vdbench,jvms=1
hd=localhost,system=localhost
********************************************************************************
* Storage Definition section
fsd=fsd1,anchor=C:\UnMapTest-Volume1\disk1\,depth=1,width=1,files=1,size=5g
fsd=fsd2,anchor=C:\UnMapTest-Volume2\disk2\,depth=1,width=1,files=1,size=5g
fwd=fwd1,fsd=fsd*,operation=write,xfersize=1m,fileio=sequential,fileselect=random,threads=10
rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=1h,interval=1
Below is the exception from vdbench. Because of it, my calling script fails.
05:29:14.287 Message from slave localhost-0:
05:29:14.289 file=C:\UnMapTest-Volume1\disk1\\vdb.1_1.dir\vdb_f0001.file,busy=true
05:29:14.290 Thread: FwgThread write C:\UnMapTest-Volume1\disk1\ rd=rd1 For loops: None
05:29:14.291
05:29:14.292 last_ok_request: Thu Dec 28 05:28:57 PST 2017
05:29:14.292 Duration: 16.92 seconds
05:29:14.293 consecutive_blocks: 10001
05:29:14.294 last_block: FILE_BUSY File busy
05:29:14.294 operation: write
05:29:14.295
05:29:14.296 Do you maybe have more threads running than that you have
05:29:14.296 files and therefore some threads ultimately give up after 10000 tries?
05:29:14.300 *
05:29:14.301 ******************************************************
05:29:14.302 * Slave localhost-0 aborting: Too many thread blocks *
05:29:14.302 ******************************************************
05:29:14.303 *
05:29:21.235
05:29:21.235 Slave localhost-0 prematurely terminated.
05:29:21.235
05:29:21.235 Slave aborted. Abort message received:
05:29:21.235 Too many thread blocks
05:29:21.235
05:29:21.235 Look at file localhost-0.stdout.html for more information.
05:29:21.735
05:29:21.735 Slave localhost-0 prematurely terminated.
05:29:21.735
java.lang.RuntimeException: Slave localhost-0 prematurely terminated.
at Vdb.common.failure(common.java:335)
at Vdb.SlaveStarter.startSlave(SlaveStarter.java:198)
at Vdb.SlaveStarter.run(SlaveStarter.java:47)
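The abort message above points at the likely cause: each fsd defines files=1 (two files in total), while fwd1 runs threads=10, so threads keep finding the only files busy. One hedged tweak - the exact value is just an example - is to keep the thread count at or below the number of files:
* with files=1 per fsd there are only two files in total, so use at most two threads
fwd=fwd1,fsd=fsd*,operation=write,xfersize=1m,fileio=sequential,fileselect=random,threads=2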
I am using PowerShell on a Windows machine. If some other tool such as Diskspd has a way to fill data up to a threshold, please suggest it.
I found the answer myself.
I did this using Diskspd.exe, as shown below.
The following command fills 50 GB of data in the given disk folder:
.\diskspd.exe -c50G -b4K -t2 C:\UnMapTest-Volume1\disk1\testfile1.dat
It is much simpler than vdbench for my requirement.
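For reference: -c50G creates a 50 GB test file, -b4K sets a 4 KiB block size, and -t2 uses two threads. Since the question involves four mountpoints, a small PowerShell loop (the paths are assumed from the question above) could cover all of them:
1..4 | ForEach-Object {
    .\diskspd.exe -c50G -b4K -t2 "C:\UnMapTest-Volume$_\disk$_\testfile$_.dat"
}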
Caution: the file does not contain real data, so the consumed capacity reported on the array side does not reflect the file size.
In MATLAB, when I use the numlabs command in the Command Window after calling parpool(4), it returns 1. But when I use the same command in the Parallel Command Window it returns 4. Why?
>> parpool(4)
Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.
ans =
Pool with properties:
Connected: true
NumWorkers: 4
Cluster: local
AttachedFiles: {}
IdleTimeout: 30 minute(s) (30 minutes remaining)
SpmdEnabled: true
>> numlabs
ans =
1
>> pmode start
Starting pmode using the 'local' profile ... connected to 4 workers.
P>> numlabs
    4    4    4    4
numlabs returns the "total number of workers operating in parallel on current job". If you type the command after starting a pool, there obviously won't be any ongoing job. Also from the numlabs documentation:
In an spmd block, numlabs on each worker returns the parallel pool size.
However, inside a parfor-loop, numlabs always returns a value of 1.
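A quick way to see both behaviours (a sketch, assuming the pool of 4 from the question is open):
spmd
    fprintf('spmd: numlabs = %d\n', numlabs);     % prints 4 on each worker
end
parfor i = 1:1
    fprintf('parfor: numlabs = %d\n', numlabs);   % prints 1
end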
If you just want the number of workers, you can use gcp (get current pool):
hp = gcp;
hp.NumWorkers
I think that is normal behavior, since numlabs should return 1 when not used inside an spmd block or during a parallel job. What you have done above only opens the parallel pool; it does not ask it to do a parallel job.
I can't test right now, but if you call something like this:
spmd
    fprintf('Lab %d of %d\n', labindex, numlabs);   % for example
end
you will see output printed in the Command Window showing that you are indeed using the 4 available workers.
I am trying to use a parfor loop in MATLAB R2013a. However, it gives the error cited below when I try to open matlabpool.
matlabpool open
Starting matlabpool using the 'local' profile ... stopped.
Error using matlabpool (line 144)
Failed to open matlabpool. (For information in addition to the causing error,
validate the profile 'local' in the Cluster Profile Manager.)
Caused by:
Error using parallel.internal.pool.InteractiveClient/start (line 281)
Failed to start matlabpool.
Error using parallel.Cluster/createCommunicatingJob (line 82)
The property "NumWorkersRange" cannot be set after submission.
Or when I use:
n = matlabpool('size')
it returns 0 as the answer.
How can I fix this problem?
My system is equipped with two 2.66 GHz six-core Intel Xeon processors.
MathWorks completely changed the command structure for the Parallel Computing Toolbox. The command you want is parpool (introduced in R2013b, so it may require an upgrade from R2013a). I'd start from there. One nice new feature is that if you don't explicitly start a pool and then call a command like parfor, MATLAB will automatically start one for you.
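For example, a minimal sketch of the new-style equivalents:
p = parpool(4);          % replaces: matlabpool open 4
parfor i = 1:8
    fprintf('iteration %d\n', i);
end
p.NumWorkers             % replaces: matlabpool('size')
delete(p)                % replaces: matlabpool close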