Nvidia DIGITS on TX2: error code 1 - neural-network

I am new to DIGITS and the TX2. I am trying to create an object detection model using the tutorial from: https://github.com/dusty-nv/jetson-inference
I created the dataset successfully. The issue is with the model.
While creating the model, I get the following error.
Memory required for data: 3268934784
creating layer bbox_loss
Creating Layer bbox_loss
bbox_loss <- bboxes-obj-masked-norm
bbox_loss <- bbox-obj-label-norm
bbox_loss -> loss_bbox
Setting up bbox_loss
Top shape: (1)
with loss weight 2
Memory required for data: 3268934788
Creating layer coverage_loss
Creating Layer coverage_loss
coverage_loss <- coverage_coverage/sig_0_split_0
coverage_loss <- coverage-label_slice-label_4_split_0
coverage_loss -> loss_coverage
Setting up coverage_loss
Top shape: (1)
with loss weight 1
Memory required for data: 3268934792
Creating layer cluster
The job directory information on the left is:
Job Directory
/home/nvidia/DIGITS/digits/jobs/20180816-161051-e67a
Disk Size
0 B
Network (train/val)
train_val.prototxt
Network (deploy)
deploy.prototxt
Network (original)
original.prototxt
Solver
solver.prototxt
Raw caffe output
caffe_output.log
Pretrained Model
/home/nvidia/bvlc_googlenet.caffemodel.4
Visualizations
Tensorboard
The error on the server is:
2018-08-16 16:10:53 [20180816-161051-e67a] [INFO ] Task subprocess args: "/home/nvidia/Caffe/caffe/build/tools/caffe train --solver=/home/nvidia/DIGITS/digits/jobs/20180816-161051-e67a/solver.prototxt --gpu=0 --weights=/home/nvidia/bvlc_googlenet.caffemodel.4"
2018-08-16 16:11:00 [20180816-161051-e67a] [ERROR] Train Caffe Model task failed with error code 1
I have no idea how to free up memory, as I have more than 2 GB available in the job directory.
Please help me. Thanks in advance.

I had the same issue for the last few days; maybe this will help someone in the future. First, make sure that you have the right version of protobuf. You can check it with:
protoc --version
If it's 2.*, you have to update to 3.*, for example by building it as described here: https://github.com/NVIDIA/DIGITS/blob/digits-6.0/docs/BuildProtobuf.md, and then rebuild Caffe. Also, make sure that you have a compatible version of the protobuf pip package. For me, the following version is working well right now with DIGITS and Caffe from the tutorial https://github.com/dusty-nv/jetson-inference:
pip install --user --upgrade protobuf==3.1.0.post1
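To double-check that both the protobuf compiler and the Python package picked up the new version, a quick sanity sketch (assuming the user-level pip install from the command above):
protoc --version
# should now report a 3.x version, e.g. "libprotoc 3.x.x"
python -c "import google.protobuf as pb; print(pb.__version__)"
# should print the pip package version, e.g. "3.1.0.post1"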

OpenModelica IBPSA example error spatialDistribution

I'm quite new to Modelica and I'm trying to understand some basic examples. I'm looking at the example IBPSA.Fluid.FixedResistances.Examples.PlugFlowPipe, and checking this model gives me the following error:
Number of classes to check: 2
Checking: model IBPSA.Fluid.FixedResistances.Examples.PlugFlowPipe... 0.2350000000001273 seconds -> FAILED!
Error String:
Error Buffer:
Warning: Requested package Modelica of version 3.2.2, but this package was already loaded with version 3.2.3. You might experience problems if these versions are incompatible.
[C:/Program Files/OpenModelica1.14.0-64bit/lib/omlibrary/IBPSA 3.0.0/Fluid/FixedResistances/BaseClasses/PlugFlowTransportDelay.mo:49:3-55:44:writable] Error: Function argument initialValues={time + pip.cor.timDel.t_in_start, time + pip.cor.timDel.t_out_start} in call to spatialDistribution has variability continuous which is not a parameter expression.
#[-], 0.2350000000001273, IBPSA.Fluid.FixedResistances.Examples.PlugFlowPipe
-------------------------------------------------------------------------
Checking skipped: package IBPSA.Fluid.FixedResistances.Examples.PlugFlowPipe.Medium...
[2] 11:48:12 Scripting Notification
Number of classes checked / failed: 2/1
It seems that the module pip.cor.timDel uses the function spatialDistribution. My guess is that there is something wrong with pip.cor.timDel.t_in_start or pip.cor.timDel.t_out_start? It would be greatly appreciated if someone could help me with this.
P.S. I'm using OMEdit v1.14.0 on Windows 10 with Modelica library v3.2.3.
OpenModelica does not support the function spatialDistribution yet. I opened a ticket on Trac where you can follow the current development status.

Trouble building u-boot for gumstix overo on yocto "thud" release

Attempting to build a Yocto image using the "thud" release, bitbake fails while building the version of u-boot that comes with the meta-gumstix thud branch, which is 2016.03 (which seems antique?).
The error I'm seeing concerns conflicting types, e.g.:
ERROR: u-boot-v2016.03+gitAUTOINC+df61a74e68-r0 do_compile: oe_runmake failed
…
/home/kwisatz/yocto-new/build/tmp/work/overo-poky-linux-gnueabi/u-boot/v2016.03+gitAUTOINC+df61a74e68-r0/recipe-sysroot-native/usr/include/libfdt_env.h:71:30: error: conflicting types for 'fdt64_t'
typedef uint64_t FDT_BITWISE fdt64_t;
Searching the Internet for that, one quickly comes across a range of threads explaining that the problem is the libfdt_env.h header that comes with the dtc package. Some recommend blacklisting or uninstalling the dtc package, but from what I see, it's explicitly required by the u-boot recipe in the gumstix layer for Yocto:
DEPENDS += "dtc-native"
See also https://patchwork.openembedded.org/patch/147816/
However, in the thread linked above, we're talking about versions 2018.01 and 2018.03, not 2016.03.
The poky layer for thud brings u-boot 2018.07, which builds fine, but with that one my overo (Airstorm-Y) won't boot anymore:
Booting from nand with DTS...
UBI: attaching mtd1 to ubi0
UBI: scanning is finished
UBI: attached mtd1 (name "mtd=4", size 1013 MiB) to ubi0
UBI: PEB size: 131072 bytes (128 KiB), LEB size: 129024 bytes
UBI: min./max. I/O unit sizes: 2048/2048, sub-page size 512
UBI: VID header offset: 512 (aligned 512), data offset: 2048
UBI: good PEBs: 8108, bad PEBs: 0, corrupted PEBs: 0
UBI: user volume: 1, internal volumes: 1, max. volumes count: 128
UBI: max/mean erase counter: 1/0, WL threshold: 4096, image sequence number: 1485359018
UBI: available PEBs: 0, total reserved PEBs: 8108, PEBs reserved for bad PEB handling: 160
** File not found /boot/omap3-overo-storm-tobi.dtb **
Loading file '/boot/zImage' to addr 0x82000000 with size 5097744 (0x004dc910)...
Done
Kernel image # 0x82000000 [ 0x000000 - 0x4dc910 ]
ERROR: Did not find a cmdline Flattened Device Tree
Could not find a valid device tree
I'm not entirely sure whether this boot problem is related to the u-boot build or to the kernel image that I've built (see my previous thread).
Any tips on how I could solve this issue? Is there a more recent version of u-boot in a gumstix layer for yocto that I haven't discovered just yet, or do you have any other tips on how I could get a working yocto image for my overo?
P.S. Note that during the build I'm also seeing this warning, but I don't think there's an actual problem here:
WARNING: u-boot-v2016.03+gitAUTOINC+df61a74e68-r0 do_patch:
Some of the context lines in patches were ignored. This can lead to incorrectly applied patches.
The context lines in the patches can be updated with devtool:
devtool modify <recipe>
devtool finish --force-patch-refresh <recipe> <layer_path>
Then the updated patches and the source tree (in devtool's workspace)
should be reviewed to make sure the patches apply in the correct place
and don't introduce duplicate lines (which can, and does happen
when some of the context is ignored). Further information:
http://lists.openembedded.org/pipermail/openembedded-core/2018-March/148675.html
https://bugzilla.yoctoproject.org/show_bug.cgi?id=10450
Details:
Applying patch 0006-duovero-Read-eeprom-over-i2c.patch
patching file board/gumstix/duovero/duovero.c
patching file include/configs/duovero.h
Hunk #2 succeeded at 50 with fuzz 2 (offset -4 lines).
Now at patch 0006-duovero-Read-eeprom-over-i2c.patch
[…]
If you're looking for a new (development, not stable) image for the Overo, I would recommend the Warrior branch:
https://github.com/gumstix/yocto-manifest/tree/warrior
It has been tested and confirmed working for the Overo. The Thud branch was added to our repo to add support for the Raspberry Pi CM3+. For an older (stable) image, I would recommend Morty:
https://github.com/gumstix/yocto-manifest/tree/morty
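In case it helps, checking out one of those manifests typically looks like this (a rough sketch, assuming Google's repo tool is used; adjust the branch as needed):
repo init -u https://github.com/gumstix/yocto-manifest.git -b warrior
repo sync
# then source the build environment script and bitbake the image as usual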
Thanks.
At least for me, with the same issue, I just removed
DEPENDS += "dtc-native"
and the build completed.
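If you would rather keep the gumstix layer untouched, the same removal can be done from a bbappend in your own layer. A minimal sketch (the wildcard recipe name is an assumption, and it only makes sense if nothing else in your build actually needs dtc-native):
# u-boot_%.bbappend
DEPENDS_remove = "dtc-native"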

Yocto Conflict between attempted installs

I have a conflict between a number of install files.
I am getting the below error:
Transaction Summary
================================================================================
Install 612 Packages
Total size: 110 M
Installed size: 403 M
Downloading Packages:
Running transaction check
Transaction check succeeded.
Running transaction test
Error: Transaction check error:
  file /etc/iproute2/rt_protos conflicts between attempted installs of base-files-3.0.14-r89.nexbox_a95x_s905x and iproute2-4.14.1-r0.aarch64
  file /etc/iproute2/rt_tables conflicts between attempted installs of base-files-3.0.14-r89.nexbox_a95x_s905x and iproute2-4.14.1-r0.aarch64
  file /etc/sysctl.conf conflicts between attempted installs of base-files-3.0.14-r89.nexbox_a95x_s905x and procps-3.3.12-r0.aarch64

Error Summary
-------------
ERROR: amlogic-image-headless-sd-1.0-r0 do_rootfs: Function failed: do_rootfs
ERROR: Logfile of failure stored in: /home/user/amlogic-bsp/build/tmp/work/nexbox_a95x_s905x-poky-linux/amlogic-image-headless-sd/1.0-r0/temp/log.do_rootfs.29264
ERROR: Task (/home/user/amlogic-bsp/meta-meson/recipes-core/images/amlogic-image-headless-sd.bb:do_rootfs) failed with exit code '1'
NOTE: Tasks Summary: Attempted 3131 tasks of which 3130 didn't need to be rerun and 1 failed.
I have seen somewhere that I should pin a file, but how do I do this? I can't find a tutorial or any reference to what that means.
I am also getting the below warning. Is this related?
WARNING: Layer meson should set LAYERSERIES_COMPAT_meson in its conf/layer.conf file to list the core layer names it is compatible with.
I'm new to OE, coming over from OpenWRT.
For bitbake, I've added the layers for the packages below:
meta-openwrt: OE/Yocto metadata layer for OpenWRT
superna9999/meta-meson: Upstream Linux Amlogic Meson Yocto/OpenEmbedded Layer
and tried compiling the nexbox-a95x-s905x image.
I think the problem is that /etc/iproute2/rt_protos is provided by base-files, which comes from meta-openwrt, as well as by the iproute2 package, which comes from other OE layers. It's not clear to the image builder which one to use, hence the conflict.
You can solve it by defining an iproute2_%.bbappend file in meta-openwrt where this file gets deleted from the iproute2 package and preference is given to the one OpenWRT provides:
do_install_append() {
    rm -rf ${D}${sysconfdir}/iproute2/rt_protos
}
should help.
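The error output also lists /etc/iproute2/rt_tables (from iproute2) and /etc/sysctl.conf (from procps), so the same trick presumably needs to cover those files too. A sketch along the same lines (the procps bbappend and the choice to keep the base-files copies are my assumptions):
# iproute2_%.bbappend in meta-openwrt
do_install_append() {
    rm -f ${D}${sysconfdir}/iproute2/rt_protos
    rm -f ${D}${sysconfdir}/iproute2/rt_tables
}
# procps_%.bbappend in meta-openwrt
do_install_append() {
    rm -f ${D}${sysconfdir}/sysctl.conf
}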

Commands invalid after 'import_board_preset' command

Currently I am trying to follow the MathWorks tutorial 1 to register a TE0720 with a TE0701-6 carrier board in MATLAB. I followed the instructions, created the block design and exported it as advised. Using the MATLAB HDL Workflow Advisor I can follow until step 4.1 Create Project. Here, I get the following error message:
invalid command name "CONFIG.PCW_INCLUDE_ACP_TRANS_CHECK"
while executing
"CONFIG.PCW_INCLUDE_ACP_TRANS_CHECK {0} CONFIG.PCW_IOPLL_CTRL_FBDIV {30} CONFIG.PCW_IO_IO_PLL_FREQMHZ {1000.000} CONFIG.PCW_IRQ_F2P_INTR {1} CONFIG..."
(procedure "create_root_design" line 49)
invoked from within
"create_root_design """
(file "vivado_custom_block_design.tcl" line 986)
while executing
"source vivado_custom_block_design.tcl"
(file "vivado_create_prj.tcl" line 15)
This is regarding the exported block design in the corresponding *.tcl file.
After deleting the line mentioned in the error, the error persists, but for the following line. This continues until I have deleted all lines following
CONFIG.PCW_IMPORT_BOARD_PRESET {preset}
It seems to me that once the preset for the board is imported, all following commands are seen as invalid. If I put this line at the end of the list instead, I get the error
ERROR [Common 17-69] Command failed: Missing name/value pair in -dict argument.
If I remove this line, I get the error
ERROR [BD 41-1811] The interconnect </axi_interconnect_0> is missing a valid master interface connection
ERROR [Common 17-39] 'validate_bd_design' failed due to earlier errors.
Is there a way to fix this or what is the problem here?
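For context, the exported vivado_custom_block_design.tcl normally passes all of these CONFIG.* settings as a single name/value list to set_property -dict on the processing system cell. A rough sketch of that shape (the cell name and property values are illustrative, not taken from the actual generated script):
# illustrative only: how the generated script typically configures the PS7 cell
set ps7 [ create_bd_cell -type ip -vlnv xilinx.com:ip:processing_system7:5.5 processing_system7_0 ]
set_property -dict [ list \
    CONFIG.PCW_IMPORT_BOARD_PRESET {preset} \
    CONFIG.PCW_INCLUDE_ACP_TRANS_CHECK {0} \
    CONFIG.PCW_IRQ_F2P_INTR {1} \
] $ps7
Because -dict takes name/value pairs, the list must keep an even number of entries; dropping a name without its value, or breaking a trailing backslash continuation, is one way to end up with the "Missing name/value pair in -dict argument" and "invalid command name CONFIG..." errors.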
EDIT: I am using Vivado 2017.4 from the Vivado HL WebPACK. Could it be that a feature MATLAB needs for rebuilding the project is not available in this edition?
EDIT 2: I started the complete tutorial from scratch again, and now I only get the error
ERROR: [BD 41-1811] The interconnect </axi_interconnect_0> is missing a valid master Interface connection
when going through the HDL Workflow Advisor. As far as I understand the issue, Vivado searches for something to connect the axi_interconnect to. But isn't this the interface port (DUT) as described later in the tutorial (end of step 2 in "Register the custom reference design in HDL Workflow Advisor"), where the compiled Simulink model should be connected?

crash on the GPU with {inc,set}_subtensor and broadcasting the value

I am fine-tuning the VGG16 network with Keras 2.0.2 and Theano 0.9.0 as the backend, on Windows 10 64-bit with Anaconda 2, following this blog: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
I found that someone else had the same issue in the pull requests and it was fixed by changing a few lines of code (link: https://github.com/Theano/Theano/pull/2075). However, that's an old version of Theano (the PR is from 2014). Theano 0.9.0 has already changed that code, and I still have this problem.
Every time I run the last line (i.e. model.fit_generator), everything works fine until the end of the first epoch. That's exactly when the GPU always crashes:
model.fit_generator(
    train_generator,
    samples_per_epoch=2000,
    nb_epoch=50,
    validation_data=validation_generator,
    nb_val_samples=400)
And here is the error message:
CudaNdarray_CopyFromCudaNdarray: need same dimensions for dim 0, destination=32, source=16
Apply node that caused the error: GpuIncSubtensor{Set;::, ::, int64:int64:, int64:int64:}(GpuAlloc{memset_0=True}.0, GpuElemwise{mul,no_inplace}.0, Constant{1}, Constant{225}, Constant{1}, Constant{225})
Toposort index: 143
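For reference, a quick sanity check (not a fix) that the Keras and Theano versions and the backend mentioned above are the ones actually loaded in the Anaconda environment:
import keras
import theano

print(keras.__version__)        # expecting 2.0.2
print(theano.__version__)       # expecting 0.9.0
print(keras.backend.backend())  # should print "theano"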