I checked the cuDNN and TensorRT environment for MATLAB GPU Coder in MATLAB R2021b and MATLAB R2022a independently; R2021b passed and R2022a failed.
Here are the screenshots: in MATLAB R2021b the environment check passes, while in MATLAB R2022a it fails.
Meanwhile, the Basic Code Generation check failed too.
My two questions are:
Why is cuDNN/TensorRT configured correctly in MATLAB R2021b, while an error occurs in MATLAB R2022a?
What is the likely reason that the Basic Code Generation check failed in both MATLAB R2022a and R2021b?
Here is my GPU configuration:
Index: 1
ComputeCapability: '8.6'
SupportsDouble: 1
DriverVersion: 11.6000
ToolkitVersion: 11
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 2.5769e+10
AvailableMemory: 2.4283e+10
MultiprocessorCount: 82
ClockRateKHz: 1695000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
First of all, sorry for my poor English.
In my Ceph cluster, when I run the ceph df detail command, it shows the following result:
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 62 TiB 52 TiB 10 TiB 10 TiB 16.47
ssd 8.7 TiB 8.4 TiB 370 GiB 377 GiB 4.22
TOTAL 71 TiB 60 TiB 11 TiB 11 TiB 14.96
POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR UNDER COMPR
rbd-kubernetes 36 288 GiB 71.56k 865 GiB 1.73 16 TiB N/A N/A 71.56k 0 B 0 B
rbd-cache 41 2.4 GiB 208.09k 7.2 GiB 0.09 2.6 TiB N/A N/A 205.39k 0 B 0 B
cephfs-metadata 51 529 MiB 221 1.6 GiB 0 16 TiB N/A N/A 221 0 B 0 B
cephfs-data 52 1.0 GiB 424 3.1 GiB 0 16 TiB N/A N/A 424 0 B 0 B
So I have a question about this result.
As you can see, the sum of my pools' used storage is less than 1 TiB, but in the RAW STORAGE section the usage on the HDDs is 10 TiB, and it is growing every day. I think this is unusual and something is wrong with this Ceph cluster.
Also, FYI, the output of ceph osd dump | grep replicated is:
pool 36 'rbd-kubernetes' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 244 pg_num_target 64 pgp_num_target 64 last_change 1376476 lfor 2193/2193/2193 flags hashpspool,selfmanaged_snaps,creating tiers 41 read_tier 41 write_tier 41 stripe_width 0 application rbd
pool 41 'rbd-cache' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode on last_change 1376476 lfor 2193/2193/2193 flags hashpspool,incomplete_clones,selfmanaged_snaps,creating tier_of 36 cache_mode writeback target_bytes 1000000000000 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x1 decay_rate 0 search_last_n 0 min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
pool 51 'cephfs-metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 31675 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
pool 52 'cephfs-data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 742334 flags hashpspool,selfmanaged_snaps stripe_width 0 application cephfs
Ceph version (ceph -v):
ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
Ceph OSD versions: ceph tell osd.* version returns the following for all OSDs:
osd.0: {
"version": "ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)"
}
Ceph status (ceph -s):
cluster:
id: 6a86aee0-3171-4824-98f3-2b5761b09feb
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-sn-03,ceph-sn-02,ceph-sn-01 (age 37h)
mgr: ceph-sn-01(active, since 4d), standbys: ceph-sn-03, ceph-sn-02
mds: cephfs-shared:1 {0=ceph-sn-02=up:active} 2 up:standby
osd: 63 osds: 63 up (since 41h), 63 in (since 41h)
task status:
scrub status:
mds.ceph-sn-02: idle
data:
pools: 4 pools, 384 pgs
objects: 280.29k objects, 293 GiB
usage: 11 TiB used, 60 TiB / 71 TiB avail
pgs: 384 active+clean
Based on the provided data, you should evaluate the following considerations and scenarios:
The replication size counts all copies, and once min_size copies have been written, the client receives a write completion. That means you should expect raw storage consumption of at least min_size times and at most the full replication size times the stored data.
Ceph stores metadata and logs for housekeeping purposes, which obviously consumes storage.
If you run a benchmark via rados bench or a similar tool with the --no-cleanup parameter, the benchmark objects are permanently stored within the cluster and consume storage.
All the mentioned scenarios are just a few of the possibilities; the sketch below puts rough numbers on the gap.
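As an illustration (my addition; the figures are read off the ceph df detail, ceph osd dump, and ceph -s output above), a rough Python sanity check of how much of the raw usage plain replication can explain:

GIB = 1024 ** 3
TIB = 1024 ** 4

# pool name: (stored bytes, replication size), from the outputs above
pools = {
    "rbd-kubernetes":  (288 * GIB, 3),
    "rbd-cache":       (2.4 * GIB, 3),
    "cephfs-metadata": (0.529 * GIB, 3),
    "cephfs-data":     (1.0 * GIB, 3),
}

expected_raw = sum(stored * size for stored, size in pools.values())
reported_raw = 11 * TIB  # "11 TiB used" from ceph -s

print(f"expected from replication: {expected_raw / TIB:.2f} TiB")  # ~0.86 TiB
print(f"unexplained:               {(reported_raw - expected_raw) / TIB:.2f} TiB")  # ~10 TiB

Whatever accounts for the remaining ~10 TiB has to come from scenarios like the ones listed above.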
I'm using an imx6slevk board with the Yocto BSP.
I use MfgTool to flash the images, and it works fine with 1 GB DRAM.
Now I'm trying to change the DRAM to 512 MB.
I modified the memory node in the dts file:
memory {
    reg = <0x80000000 0x20000000>; // was 0x40000000 (1 GiB)
};
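For reference (my note), the second cell of reg is the size in bytes; a quick Python check of the old and new values:

# second reg cell of the dts memory node is the size in bytes
for size in (0x40000000, 0x20000000):
    print(f"{size:#x} = {size // 2**20} MiB")  # 1024 MiB, then 512 MiB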
I ran the DDR calibration tool and updated two registers:
DATA 4 0x021b0848 0x4644484a //changed for 512 mb old value = 0x4241444a
DATA 4 0x021b0850 0x3a363a30 //changed for 512 mb old value = 0x3030312b
However, the U-Boot log still shows 1 GiB of DRAM:
U-Boot 2017.03-imx_v2017.03_4.9.88_2.0.0_ga+gb76bb1b (Sep 24 2019 - 11:04:03 +0530)
CPU: Freescale i.MX6SL rev1.2 996 MHz (running at 792 MHz)
CPU: Commercial temperature grade (0C to 95C) at 48C
Reset cause: POR
Model: Freescale i.MX6 SoloLite EVK Board
Board: MX6SLEVK
DRAM: 1 GiB
How can I change the DRAM size from 1 GiB to 512 MiB? The kernel doesn't flash without this change.
I have a CPU with 32 processors, each of which has 16 cores. Here is the truncated output of cat /proc/cpuinfo for the 32nd processor.
processor : 31
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
stepping : 1
microcode : 0xb000037
cpu MHz : 2700.787
cache size : 46080 KB
physical id : 0
siblings : 32
core id : 15
cpu cores : 16
apicid : 31
initial apicid : 31
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq monitor est ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt ida
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 4600.08
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
What does this mean for the OS? Can it run 32*16 = 512 processes completely in parallel?
However, when I run the following Python code, I still get 32 as the output.
import multiprocessing
print("Number of cpu : ", multiprocessing.cpu_count())
So can Python run only 32 processes completely in parallel?
Your processor has 16 physical cores that, via hyper-threading, allow you to run 32 threads in parallel. This means that Python can parallelize across those 32 hardware threads.
From the Intel Website:
A Thread, or thread of execution, is a software term for the basic
ordered sequence of instructions that can be passed through or
processed by a single CPU core.
Plainly: yes, you can "only" run 32 threads in parallel.
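As an illustration (my addition, not part of the original answer; Linux-specific): multiprocessing counts logical processors, while the physical core count can be recovered from /proc/cpuinfo.

import multiprocessing
import os

# multiprocessing reports logical processors (hardware threads), not cores
print("logical processors:", multiprocessing.cpu_count())       # 32 here
print("usable by this process:", len(os.sched_getaffinity(0)))  # may be fewer

# count unique (physical id, core id) pairs in /proc/cpuinfo: 16 above
cores = set()
phys = core = None
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("physical id"):
            phys = line.split(":")[1].strip()
        elif line.startswith("core id"):
            core = line.split(":")[1].strip()
        elif not line.strip():  # a blank line ends one processor entry
            if phys is not None and core is not None:
                cores.add((phys, core))
            phys = core = None
if phys is not None and core is not None:  # last entry, if no trailing blank
    cores.add((phys, core))
print("physical cores:", len(cores))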
I'm working on setting up a Ceph cluster with Docker and the image 'ceph/daemon:v3.1.0-stable-3.1-luminous-centos-7'. But after the cluster has been set up, the ceph status command never reaches HEALTH_OK. Here is my cluster's information. It has enough disk space, and the network is all right.
My questions are:
Why does Ceph not replicate the 'undersized' placement groups (PGs)?
How do I fix it?
Thank you very much!
➜ ~ ceph -s
cluster:
id: 483a61c4-d3c7-424d-b96b-311d2c6eb69b
health: HEALTH_WARN
Degraded data redundancy: 3 pgs undersized
services:
mon: 3 daemons, quorum pc-10-10-0-13,pc-10-10-0-89,pc-10-10-0-160
mgr: pc-10-10-0-89(active), standbys: pc-10-10-0-13, pc-10-10-0-160
mds: cephfs-1/1/1 up {0=pc-10-10-0-160=up:active}, 2 up:standby
osd: 5 osds: 5 up, 5 in
rbd-mirror: 3 daemons active
rgw: 3 daemons active
data:
pools: 6 pools, 68 pgs
objects: 212 objects, 5.27KiB
usage: 5.02GiB used, 12.7TiB / 12.7TiB avail
pgs: 65 active+clean
3 active+undersized
➜ ~ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 12.73497 root default
-5 0.90959 host pc-10-10-0-13
3 hdd 0.90959 osd.3 up 1.00000 1.00000
-7 0.90959 host pc-10-10-0-160
4 hdd 0.90959 osd.4 up 1.00000 1.00000
-3 10.91579 host pc-10-10-0-89
0 hdd 3.63860 osd.0 up 1.00000 1.00000
1 hdd 3.63860 osd.1 up 1.00000 1.00000
2 hdd 3.63860 osd.2 up 1.00000 1.00000
➜ ~ ceph osd pool ls detail
pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 24 flags hashpspool stripe_width 0 application cephfs
pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 24 flags hashpspool stripe_width 0 application cephfs
pool 3 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 13 pgp_num 13 last_change 27 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 13 pgp_num 13 last_change 30 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 13 pgp_num 13 last_change 32 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 6 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 13 pgp_num 13 last_change 34 flags hashpspool stripe_width 0 application rgw
@itsafire This is not the solution. He is asking for a solution, not a hardware recommendation.
I'm running multiple Ceph clusters of 8 nodes and 5 nodes. I always use 2 replicas with multiple CRUSH maps (for SSD, SAS, and 7.2k drives).
Why do you need 3 replicas if you are using a small cluster with limited resources?
Could you please explain why my solution is a recipe for disaster? You have a good reputation, and I'm not sure how you got it. Maybe just by replying with recommendations rather than solutions.
Create a new pool with size 2 and min_size 1.
For pg_num, use the Ceph PG Calculator: https://ceph.com/pgcalc/ (a sketch of its rule of thumb follows below).
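As an illustration (my addition, not the commenter's): the calculator's rule of thumb is roughly (OSDs x target PGs per OSD) / replicas, rounded to a power of two. A hedged Python sketch, not a substitute for the official calculator:

def suggested_pg_num(osds: int, replicas: int, target_per_osd: int = 100) -> int:
    """Nearest power of two to (osds * target_per_osd) / replicas."""
    raw = osds * target_per_osd / replicas
    lower = 2 ** max(int(raw).bit_length() - 1, 0)
    upper = lower * 2
    return lower if raw - lower < upper - raw else upper

# for the 5-OSD cluster above with 2 replicas: (5 * 100) / 2 = 250 -> 256
print(suggested_pg_num(osds=5, replicas=2))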
It seems you created a three-node cluster with different OSD configurations and sizes. The standard CRUSH rule tells Ceph to keep 3 copies of each PG on different hosts. If there is not enough space to spread the PGs over the three hosts, your cluster will never be healthy.
It is always a good idea to start with a set of equally sized hosts (RAM, CPU, OSDs).
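To illustrate the capacity bound (my numbers, taken from the ceph osd tree output above):

# with size=3 and one replica per host (the default CRUSH rule), every PG
# needs a copy on each of the three hosts, so usable data capacity is
# bounded by the smallest host, no matter how large the others are
host_capacity_tib = {
    "pc-10-10-0-13":  0.91,   # 1 OSD
    "pc-10-10-0-160": 0.91,   # 1 OSD
    "pc-10-10-0-89":  10.92,  # 3 OSDs
}
print(f"raw capacity:    {sum(host_capacity_tib.values()):.2f} TiB")
print(f"usable (size=3): ~{min(host_capacity_tib.values()):.2f} TiB")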
Update for the discussion about clusters with size 2 vs. 3:
Don't use 2 replicas. Go for 3. Ceph started out with a default size of 2, but this was changed to 3 in Ceph 0.82 (the Firefly release).
Why? Because if one drive fails, you are left with only one drive containing your data. Should this drive fail too while recovery is running, then your data is gone for good.
See this thread on the ceph-users mailing list:
2 replicas isn't safe, no matter how big or small the cluster is. With
disks becoming larger recovery times will grow. In that window you don't
want to run on a single replica.
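To put illustrative numbers on that recovery window (my toy model, not from the thread; the annual failure rate and recovery time are pure assumptions):

# after one drive dies in a size=2 pool, data is lost if any drive holding
# the surviving replicas also dies before recovery finishes; assumes
# independent failures at a constant annual failure rate (AFR)
def p_loss_during_recovery(afr: float, recovery_hours: float, drives: int) -> float:
    p_one = afr * recovery_hours / (365 * 24)  # linearised per-drive risk
    return 1 - (1 - p_one) ** drives

# e.g. 5% AFR, 24 h recovery, 4 surviving OSDs as in the cluster above
print(f"{p_loss_during_recovery(0.05, 24, 4):.3%} chance per incident")

With 3 replicas, losing data requires two further failures inside the same window, which is far less likely.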
I do not know why, but when my friend gets inebriated she likes to hook her phone up to a PC and play with it. She has a basic knowledge of ADB and fastboot commands, and I verified with her what was thrown. When she went to re-lock the bootloader, it did not work. She downloaded the Google minimal SDK tools to get up-to-date ADB and fastboot, then went all the way and got mfastboot from Motorola to ensure correct parsing for flashing. All of these fastboot packages were also tested on Mac and Ubuntu Linux, and on Windows 8.1 Pro N Update 1 and Windows 7 Professional N SP2 (all x64); all resulted in the same errors. She is very thorough, and I only taught her how to manually erase and flash, with no scripts or toolkits. She ran:
fastboot oem lock
and it returned:
(bootloader) FAIL: Please run fastboot oem lock begin first!
(bootloader) sst lock failure!
FAILED (remote failure)
finished. total time: 0.014s
Then she tried again, then again, and then, yep, again. At this point she either read the log and followed it, or, as I personally think is more likely given how she gets when playing with phones, she panicked (she needs the bootloader locked for work) and started attempting to flash. She ran:
fastboot oem lock begin
and it returned:
M:\SHAMU\FACTORY IMAGE\shamu-lmy47z>fastboot oem lock begin
...
(bootloader) Ready to flash signed images
OKAY [ 0.121s]
finished. total time: 0.123s
FACTORY IMAGE\shamu-lmy47z>fastboot flash boot boot.img
target reported max download size of 536870912 bytes
sending 'boot' (7731 KB)...
OKAY [ 0.252s]
writing 'boot'...
(bootloader) Preflash validation failed
FAILED (remote failure)
finished. total time: 0.271s
Then the bootloader log stated:
cmd: oem lock
hab check failed for boot
failed to validate boot image
Upon flashing boot.img, the bootloader log lists "Mismatched partition size (boot)".
I dumped the partition table to check whether the partitions are zeroed out, indicating a bad eMMC, but they are not.
cat /proc/partitions
major minor #blocks name
179 0 61079552 mmcblk0
179 1 114688 mmcblk0p1
179 2 16384 mmcblk0p2
179 3 384 mmcblk0p3
179 4 56 mmcblk0p4
179 5 16 mmcblk0p5
179 6 32 mmcblk0p6
179 7 1024 mmcblk0p7
179 8 256 mmcblk0p8
179 9 512 mmcblk0p9
179 10 500 mmcblk0p10
179 11 4156 mmcblk0p11
179 12 384 mmcblk0p12
179 13 1024 mmcblk0p13
179 14 256 mmcblk0p14
179 15 512 mmcblk0p15
179 16 500 mmcblk0p16
179 17 4 mmcblk0p17
179 18 512 mmcblk0p18
179 19 1024 mmcblk0p19
179 20 1024 mmcblk0p20
179 21 1024 mmcblk0p21
179 22 1024 mmcblk0p22
179 23 16384 mmcblk0p23
179 24 16384 mmcblk0p24
179 25 2048 mmcblk0p25
179 26 32768 mmcblk0p26
179 27 256 mmcblk0p27
179 28 32 mmcblk0p28
179 29 128 mmcblk0p29
179 30 8192 mmcblk0p30
179 31 1024 mmcblk0p31
259 0 2528 mmcblk0p32
259 1 1 mmcblk0p33
259 2 8 mmcblk0p34
259 3 16400 mmcblk0p35
259 4 9088 mmcblk0p36
259 5 16384 mmcblk0p37
259 6 262144 mmcblk0p38
259 7 65536 mmcblk0p39
259 8 1024 mmcblk0p40
259 9 2097152 mmcblk0p41
259 10 58351488 mmcblk0p42
179 32 4096 mmcblk0rpmb
254 0 58351488 dm-0
I've asked for logs of the whole process so I can see the full warning, error, and failure messages, but she is far away on business. From what I do have, and from the literature I have started to work through while learning about the Android boot process, I am starting to believe there is a missing or corrupted key in the SST table (which I believe Google calls the bigtable), or a hash failure when locking down the bootloader security; or I could be way off, so please let me know. What I do not know is how to investigate or disprove this so I can move on. Would I be able to get confirmation of the missing or corrupted data through a stack trace? Then the puzzle would be solved. Honestly, this has become a puzzle that begs to be solved, not an emergency. Thanks.
You should try the fastboot flashing lock command instead.
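For completeness (my sketch, not the answerer's; it assumes fastboot is on your PATH and the device is in bootloader mode), the fallback order could be scripted like this:

import subprocess

def fastboot(*args: str) -> subprocess.CompletedProcess:
    # fastboot reports its status on stderr
    return subprocess.run(["fastboot", *args], capture_output=True, text=True)

# try the newer `flashing lock` verb first, then fall back to the legacy
# `oem lock` that older bootloaders expect
result = fastboot("flashing", "lock")
if "FAILED" in result.stderr:
    print("flashing lock not supported, trying oem lock...")
    result = fastboot("oem", "lock")
print(result.stderr.strip())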