I tried to run a Matlab program on a GPU (CentOS 7.3).
This Matlab program uses Caffe.
When I run it from the command line with:
matlab -nodisplay -r "demo, quit"
it runs okay.
When I run it with the LSF command:
bsub -q gpu -R "select[ngpus>0] rusage[ngpus_shared=1]" matlab -nodisplay -r "demo, quit"
I get the error:
ERROR: No OpenCL platforms found, check OpenCL installation
I compared LD_LIBRARY_PATH in both cases; the values are the same.
What can be the problem?
Any ideas are welcome!
clinfo output:
Number of platforms 1
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.2 CUDA 8.0.0
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts
Platform Extensions function suffix NV
Platform Name NVIDIA CUDA
Number of devices 1
Device Name Tesla K40m
Device Vendor NVIDIA Corporation
Device Vendor ID 0x10de
Device Version OpenCL 1.2 CUDA
Driver Version 375.26
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Available Yes
Device Profile FULL_PROFILE
Device Topology (NV) PCI-E, 09:00.0
Max compute units 15
Max clock frequency 745MHz
Compute Capability (NV) 3.5
Device Partition (core)
Max number of sub-devices 1
Supported partition types None
Max work item dimensions 3
Max work item sizes 1024x1024x64
Max work group size 1024
Compiler Available Yes
Linker Available Yes
Preferred work group size multiple 32
Warp size (NV) 32
Preferred / native vector sizes
char 1 / 1
short 1 / 1
int 1 / 1
long 1 / 1
half 0 / 0 (n/a)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 11995578368 (11.17GiB)
Error Correction support Yes
Max memory allocation 2998894592 (2.793GiB)
Unified memory for Host and Device No
Integrated memory (NV) No
Minimum alignment for any data type 128 bytes
Alignment of base address 4096 bits (512 bytes)
Global Memory cache type Read/Write
Global Memory cache size 245760 (240KiB)
Global Memory cache line 128 bytes
Image support Yes
Max number of samplers per kernel 32
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 16384x16384 pixels
Max 3D image size 4096x4096x4096 pixels
Max number of read image args 256
Max number of write image args 16
Local memory type Local
Local memory size 49152 (48KiB)
Registers per block (NV) 65536
Max constant buffer size 65536 (64KiB)
Max number of constant args 9
Max size of kernel argument 4352 (4.25KiB)
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop No
Profiling timer resolution 1000ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Kernel execution timeout (NV) No
Concurrent copy and kernel execution (NV) Yes
Number of async copy engines 2
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [NV]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform
My best guess would be that the bsub command from LSF schedules the job on another machine (compute node) in a cluster, where OpenCL is not installed.
Having OpenCL/CUDA on the frontend but not on the compute nodes of a cluster is something I've witnessed quite a few times. Even if the parts of the filesystem holding the libraries are shared, the folder /etc/OpenCL/vendors, which OpenCL's ICD mechanism uses, must be present on each compute node.
You could try running clinfo via bsub (if you didn't already), or use bsub to execute ls /etc/OpenCL/vendors.
If you're not sure whether the LSF-submitted jobs run on the same machine, use the hostname command with and without bsub.
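As a minimal sketch of those checks, the script below prints the node name and counts the .icd files in the ICD directory, so the output of a direct run and of a run through bsub can be compared line by line. The function name icd_report is made up for illustration, and uname -n is used here because it prints the same node name as hostname.

```shell
#!/bin/sh
# icd_report [DIR]
# Prints the node name and the number of *.icd files found in DIR
# (default: /etc/OpenCL/vendors, the directory the ICD loader reads).
icd_report() {
    dir="${1:-/etc/OpenCL/vendors}"
    count=0
    for f in "$dir"/*.icd; do
        # When nothing matches, the pattern stays unexpanded, so check existence.
        [ -e "$f" ] && count=$((count + 1))
    done
    echo "host=$(uname -n) icd_dir=$dir icd_files=$count"
}

icd_report
```

Run it once directly and once with the same bsub options as the failing job. A different host name together with icd_files=0 in the bsub run would point to a compute node without an OpenCL ICD entry.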
Hope that helps.
Attempting to build a yocto image using the "thud" release, bitbake fails on building the version of u-boot that comes with the meta-gumstix thud branch, which is 2016.03 (which seems antique?).
The error I'm seeing is regarding conflicting types, e.g.
ERROR: u-boot-v2016.03+gitAUTOINC+df61a74e68-r0 do_compile: oe_runmake failed
…
/home/kwisatz/yocto-new/build/tmp/work/overo-poky-linux-gnueabi/u-boot/v2016.03+gitAUTOINC+df61a74e68-r0/recipe-sysroot-native/usr/include/libfdt_env.h:71:30: error: conflicting types for 'fdt64_t'
typedef uint64_t FDT_BITWISE fdt64_t;
Searching the Internet for that, one quickly comes across a range of threads explaining that the problem is the libfdt-dev.h header that comes with the dtc package. Some recommend blacklisting or uninstalling the dtc package, but from what I see, it's explicitly required by the u-boot recipe in the gumstix layer for yocto:
DEPENDS += "dtc-native"
See also https://patchwork.openembedded.org/patch/147816/
However, the thread linked above is about versions 2018.01 and 2018.03, not 2016.03.
The poky layer for thud brings u-boot 2018.07 which builds fine, but with that one, my overo (Airstorm-Y) won't boot anymore:
Booting from nand with DTS...
UBI: attaching mtd1 to ubi0
UBI: scanning is finished
UBI: attached mtd1 (name "mtd=4", size 1013 MiB) to ubi0
UBI: PEB size: 131072 bytes (128 KiB), LEB size: 129024 bytes
UBI: min./max. I/O unit sizes: 2048/2048, sub-page size 512
UBI: VID header offset: 512 (aligned 512), data offset: 2048
UBI: good PEBs: 8108, bad PEBs: 0, corrupted PEBs: 0
UBI: user volume: 1, internal volumes: 1, max. volumes count: 128
UBI: max/mean erase counter: 1/0, WL threshold: 4096, image sequence number: 1485359018
UBI: available PEBs: 0, total reserved PEBs: 8108, PEBs reserved for bad PEB handling: 160
** File not found /boot/omap3-overo-storm-tobi.dtb **
Loading file '/boot/zImage' to addr 0x82000000 with size 5097744 (0x004dc910)...
Done
Kernel image @ 0x82000000 [ 0x000000 - 0x4dc910 ]
ERROR: Did not find a cmdline Flattened Device Tree
Could not find a valid device tree
I'm not entirely sure whether this boot problem is related to the u-boot build or to the kernel image that I've built (see my previous thread).
Any tips on how I could solve this issue? Is there a more recent version of u-boot in a gumstix layer for yocto that I haven't discovered just yet, or do you have any other tips on how I could get a working yocto image for my overo?
P.S. Note that during the build, I'm also seeing this warning, but I don't think there's an actual problem here:
WARNING: u-boot-v2016.03+gitAUTOINC+df61a74e68-r0 do_patch:
Some of the context lines in patches were ignored. This can lead to incorrectly applied patches.
The context lines in the patches can be updated with devtool:
devtool modify <recipe>
devtool finish --force-patch-refresh <recipe> <layer_path>
Then the updated patches and the source tree (in devtool's workspace)
should be reviewed to make sure the patches apply in the correct place
and don't introduce duplicate lines (which can, and does happen
when some of the context is ignored). Further information:
http://lists.openembedded.org/pipermail/openembedded-core/2018-March/148675.html
https://bugzilla.yoctoproject.org/show_bug.cgi?id=10450
Details:
Applying patch 0006-duovero-Read-eeprom-over-i2c.patch
patching file board/gumstix/duovero/duovero.c
patching file include/configs/duovero.h
Hunk #2 succeeded at 50 with fuzz 2 (offset -4 lines).
Now at patch 0006-duovero-Read-eeprom-over-i2c.patch
[…]
If you're looking for a new (development, not stable) image for the Overo, I would recommend the Warrior branch:
https://github.com/gumstix/yocto-manifest/tree/warrior
It has been tested and confirmed working for the Overo. The Thud branch was added to our repo to add support for the Raspberry Pi CM3+. For an older (stable) image, I would recommend Morty:
https://github.com/gumstix/yocto-manifest/tree/morty
Thanks.
At least for me, with the same issue, I just removed
DEPENDS += "dtc-native"
and the build completed.
When the message length I write is more than 1024 B (the MTU), it fails in SoftRoCE mode. Please help me check why.
Using the standard tool ib_write_lat to test:
With ib_write_lat -s 1024 -n 5, it succeeds.
With ib_write_lat -s 1025 -n 5, it fails.
My SoftRoCE version is the one in Red Hat Enterprise Linux Server release 7.4 (Maipo).
Is it a bug in softroce?
No, it isn't a bug. I had similar problems.
What MTU did you configure on your interface?
I expect that you have an MTU of 1500 bytes configured (or left the default value), which results in RoCE using 1024. If you set your interface MTU to 4200, you can use the ib_write_lat command with up to 4096 bytes.
The InfiniBand protocol defines several fixed-size Maximum Transmission Units (MTUs): 256, 512, 1024, 2048 or 4096 bytes.
A RoCE-based application that uses RDMA over Ethernet should take into account that the RoCE MTU is smaller than the Ethernet MTU (1500 is the usual default).
https://community.mellanox.com/docs/DOC-1447
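The relationship described above can be sketched as a small shell function: subtract an allowance for headers from the Ethernet MTU, then take the largest of the fixed InfiniBand MTU sizes that still fits. The function name roce_mtu and the default 80-byte allowance are assumptions for illustration, not values taken from the linked document; the actual driver may reserve a different amount.

```shell
#!/bin/sh
# roce_mtu ETH_MTU [OVERHEAD]
# Prints the largest fixed InfiniBand MTU (256..4096) that fits into
# ETH_MTU minus OVERHEAD bytes. OVERHEAD defaults to 80, which is an
# assumed header allowance, not an exact figure. Prints 0 if none fits.
roce_mtu() {
    eth_mtu="$1"
    overhead="${2:-80}"
    payload=$((eth_mtu - overhead))
    best=0
    for ib in 256 512 1024 2048 4096; do
        [ "$ib" -le "$payload" ] && best="$ib"
    done
    echo "$best"
}

roce_mtu 1500   # default Ethernet MTU
roce_mtu 4200   # the larger interface MTU suggested above
```

Under these assumptions the two calls print 1024 and 4096, which matches the behaviour described: with the default Ethernet MTU the RoCE MTU is 1024, and an interface MTU of 4200 leaves room for 4096.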
On recent Intel CPUs it's possible to count the number of SMIs that have occurred by reading MSR 0x34.
I have checked the manuals at
https://developer.amd.com/resources/developer-guides-manuals/
for an equivalent register/function, without success.
AMD Zen specifies the LsSmiRx performance counter for System Management Interrupts (SMIs):
PMCx02B [SMIs Received] (Core::X86::Pmc::Core::LsSmiRx)
Counts the number of SMIs received.
(Open-Source Register Reference For AMD Family 17h Processors Models 00h-2Fh, Rev 3.03, 2018, page 153)
On Linux, you can monitor it like this:
# perf stat -e ls_smi_rx -I 60000
This command prints, once per minute, a count of all newly triggered SMIs, aggregated over all CPUs.
That means for monitoring - unlike with the MSR_SMI_COUNT register available on Intel CPUs - you have to actively program a PMU register (to observe the LsSmiRx event).
NB: The above referenced AMD documentation confirms that AMD Zen doesn't support the SMI_COUNT MSR (0x34), since it isn't included in the list of available MSRs (in Chapter 2.1.10, page 77).
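For a script that has to work on both vendors, one possible approach is to pick the command from the vendor_id field in /proc/cpuinfo. This is only a sketch: smi_count_cmd is a made-up helper, the Intel branch assumes the rdmsr utility from msr-tools is installed and the msr kernel module is loaded, and the AMD branch simply reuses the perf command shown above. Both commands usually need root privileges.

```shell
#!/bin/sh
# smi_count_cmd VENDOR
# Prints a command line for observing SMI counts, chosen by the CPU vendor
# string as it appears in the vendor_id field of /proc/cpuinfo.
smi_count_cmd() {
    case "$1" in
        # Intel: read MSR 0x34 on all CPUs, printed as decimal (msr-tools).
        GenuineIntel) echo "rdmsr -a -d 0x34" ;;
        # AMD Zen: count the LsSmiRx event with perf, one line per minute.
        AuthenticAMD) echo "perf stat -e ls_smi_rx -I 60000" ;;
        *) echo "unsupported vendor: $1" >&2; return 1 ;;
    esac
}

# Look up the vendor of the first CPU and print the matching command.
vendor=$(awk -F': ' '/^vendor_id/ {print $2; exit}' /proc/cpuinfo 2>/dev/null)
smi_count_cmd "$vendor" || true
```

The script only prints the command instead of running it, so it can be inspected first and then executed with the required privileges.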
No, but SMI count is available as a PMC (performance counter) on AMD processors.
I am trying to access the framebuffer on my system's VGA controller card.
lspci -vn gives:
00:02.0 0300: 8086:2a02 (rev 0c) (prog-if 00 [VGA controller])
Subsystem: 1028:022f
Flags: bus master, fast devsel, latency 0, IRQ 45
Memory at fea00000 (64-bit, non-prefetchable) [size=1M]
Memory at e0000000 (64-bit, prefetchable) [size=256M]
I/O ports at eff8 [size=8]
Expansion ROM at <unassigned> [disabled]
Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [d0] Power Management version 3
Kernel driver in use: i915
Now, I access the device and I get:
fb_base = pci_resource_start( devp, 0 ); /* output: FEA00000 */
fb_size = pci_resource_len( devp, 0 ); /* output: 1MB */
So the range of the framebuffer is FEA00000 - FEB00000.
But according to the lspci -vn output, this region is non-prefetchable.
Does that mean I am not pointing to the framebuffer at all?
Is my framebuffer at address E0000000?
The driver currently using the resource is the Intel i915.
So maybe when I request a region or IRQ, it can clash if that driver does not share it.
If I remove i915 with rmmod in order to insmod my driver, will my screen go blank?
Please help.
Thanks.
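One way to cross-check which region is which, without touching the i915 driver, is to read the device's resource file in sysfs, which lists one start/end/flags triple per region. The following is a minimal sketch under some assumptions: list_bars is a made-up helper, the device is assumed to sit at 0000:00:02.0 (PCI domain 0000), and the flag bits 0x200 (memory) and 0x2000 (prefetchable) are taken from the kernel's IORESOURCE_MEM and IORESOURCE_PREFETCH definitions.

```shell
#!/bin/sh
# list_bars FILE
# Reads a sysfs PCI "resource" file (one "start end flags" triple per line)
# and prints index, start, size in bytes and prefetchability for every
# memory region that is in use.
list_bars() {
    i=0
    while read -r start end flags; do
        # Skip unused slots and anything that is not a memory region.
        if [ $(( $end )) -ne 0 ] && [ $(( $flags & 0x200 )) -ne 0 ]; then
            size=$(( $end - $start + 1 ))
            if [ $(( $flags & 0x2000 )) -ne 0 ]; then
                kind=prefetchable
            else
                kind=non-prefetchable
            fi
            printf 'region %d: start=%s size=%d %s\n' "$i" "$start" "$size" "$kind"
        fi
        i=$((i + 1))
    done < "$1"
}

# Assumed device path; adjust it to the bus address that lspci reports.
res=/sys/bus/pci/devices/0000:00:02.0/resource
if [ -r "$res" ]; then
    list_bars "$res"
fi
```

The printed region index should correspond to the index passed to pci_resource_start() and pci_resource_len(), so the output can be compared with the values obtained in the driver and with the two memory regions shown by lspci.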
I am trying to run code in Matlab that uses the Psychtoolbox and OpenGL. The commands that throw the error described below are:
PsychJavaTrouble
AssertOpenGL
Here are my specs:
OS: Ubuntu 14.04 LTS, 64bit
Processor: Intel Core i5-2450M CPU @ 2.50GHz x 4
Graphics: Intel Sandybridge Mobile
Matlab Version: Matlab 64-Bit (Version 3.0.11 - Build date: Apr 6 2014)
Psychophysics version installed: 3
Installation methodology:
1. sudo apt-get install psychtoolbox in Terminal
2. updated it via UpdatePsychToolbox command in Matlab console
Here is the error message:
PsychJavaTrouble: Will now try to add the PsychJava folder to Matlabs dynamic
classpath...
Warning: "/home/lillian/Desktop/Matlab/Mona_Lisa/Psychtoolbox/PsychJava" is already
specified on static java path.
> In javaclasspath>local_validate_dynamic_path at 285
In javaclasspath>local_javapath at 182
In javaclasspath at 119
In javaaddpath at 71
In PsychJavaTrouble at 86
In ReverseCorrelationFaces at 2
PsychJavaTrouble: Added PsychJava folder to dynamic class path. Psychtoolbox Java
commands should work now!
PTB-INFO: Display ':0' : X-Screen 0 : Assigning primary output as 0 with RandR-CRTC
0 and GPU-CRTC 0.
PTB-INFO: This is Psychtoolbox-3 for GNU/Linux X11, under Matlab 64-Bit (Version
3.0.11 - Build date: Apr 6 2014).
PTB-INFO: No low-level controllable GPU on screenId 0. Beamposition timestamping and
other special functions disabled.
PTB-INFO: Failed to enable realtime-scheduling [Operation not permitted]!
PTB-DEBUG:PsychOSGetSwapCompletionTimestamp: Invalid return values ust = 0, msc = 0
from call with success return code (sbc = 304)! Failing with rc = -2.
PTB-DEBUG:PsychOSGetSwapCompletionTimestamp: This likely means a driver bug or
malfunction, or that timestamping support has been disabled by the user in the
driver!
PTB-INFO: OpenGL-Renderer is Intel Open Source Technology Center :: Mesa DRI
Intel(R) Sandybridge Mobile :: 3.0 Mesa 10.1.3
PTB-INFO: VBL startline = 768 , VBL Endline = -1
PTB-INFO: Will try to use OS-Builtin OpenML sync control support for accurate Flip
timestamping.
PTB-INFO: Measured monitor refresh interval from VBLsync = 16.685075 ms [59.933804
Hz]. (297 valid samples taken, stddev=0.310528 ms.)
PTB-INFO: Reported monitor refresh interval from operating system = 16.646968 ms
[60.070999 Hz].
PTB-INFO: Small deviations between reported values are normal and no reason to
worry.
WARNING: Couldn't compute a reliable estimate of monitor refresh interval! Trouble
with VBL syncing?!?
----- ! PTB - ERROR: SYNCHRONIZATION FAILURE ! ----
One or more internal checks (see Warnings above) indicate that synchronization
of Psychtoolbox to the vertical retrace (VBL) is not working on your setup.
This will seriously impair proper stimulus presentation and stimulus presentation
timing!
Please read 'help SyncTrouble' for information about how to solve or work-around the
problem.
You can force Psychtoolbox to continue, despite the severe problems, by adding the
command
Screen('Preference', 'SkipSyncTests', 1); at the top of your script, if you really
know what you are doing.
Error using Screen
See error message printed above.
Error in ReverseCorrelationFaces (line 81)
window=Screen('OpenWindow', windowNum);
What am I missing? A package? Is my hardware not okay? I can't figure this error out.
So, buried deep inside the DownloadPsychtoolbox.m file (see the installation instructions) is the note that Psychtoolbox apparently requires a special SDK. Super annoying. I will never use this toolbox again because it's so much drama to use. But this is what was missing and causing the Screen call to fail.
Missing SDK download link:
http://docs.gstreamer.com/display/GstSDK/Installing+on+Windows