I've been struggling lately to get o2ib working with a particular MOFED version. So far I have installed the Lustre kernel, rebuilt MOFED for that kernel (which appears to work, since ib0 is listed after a reboot), and installed the stock Lustre packages: kmod-lustre, kmod-lustre-osd-ldiskfs, lustre-osd-ldiskfs-mount, lustre, and lustre-resource-agents. However, the presence of ib0 does not mean that o2ib shows up in Lustre. Even running "lnetctl net add --net o2ib --if ib0" gives nothing but errors saying the interface cannot be found.
I have tried rebuilding Lustre several times to get the o2ib interface, but to no avail. The RPMs build, but when I install them the situation is no better. My process is as follows (for Lustre 2.12):
git clone git://git.whamcloud.com/fs/lustre-release.git
cd lustre-release
git checkout 2.12.0
sh autogen.sh
./configure --with-o2ib=/usr/src/ofa_kernel/default/
make rpms
Would anyone have any suggestions?
Thanks!
Actually, you should do it the other way around: compile Lustre so that it uses your MOFED and your kernel. This is the order of dependencies:
Your kernel (e.g: 3.10.0-1127.8.2.el7.x86_64)
Your MOFED has to be compiled for your kernel. If your kernel is one of the ones available from Mellanox you just need to install the rpms or let the MOFED installer do it for you:
# ./mlnxofedinstall
If you're using a different kernel you need to recompile MOFED (you need to install kernel-devel for this) with support for your kernel:
# ./mlnxofedinstall --add-kernel-support
Last, you'll have to rebuild Lustre against your kernel (kernel-devel) AND your MOFED (mlnx-ofa_kernel-devel):
# ./configure --with-linux=/usr/src/kernels/3.10.0-1127.8.2.el7.x86_64/ --with-o2ib=/usr/src/ofa_kernel/default/
Now your MOFED is ready to run on top of your kernel, your Lustre RPMs are ready to run on top of your kernel, and the o2ib driver will use the symbols compiled for your MOFED.
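The dependency order above can be checked before running configure. The following is only a minimal sketch and not part of the Lustre or MOFED tooling: the helper name check_build_trees is made up for illustration, and the two paths in the usage comment are the ones from the configure line above.

```shell
#!/bin/sh
# Hypothetical helper: verify that both source trees exist before ./configure.
# $1 = kernel-devel tree, $2 = MOFED (mlnx-ofa_kernel-devel) tree.
check_build_trees() {
    linux_dir=$1
    ofa_dir=$2
    status=0
    if [ ! -d "$linux_dir" ]; then
        echo "missing kernel-devel tree: $linux_dir" >&2
        status=1
    fi
    if [ ! -d "$ofa_dir" ]; then
        echo "missing MOFED devel tree: $ofa_dir" >&2
        status=1
    fi
    return "$status"
}

# Example use (adjust the paths to your system):
# check_build_trees "/usr/src/kernels/$(uname -r)" /usr/src/ofa_kernel/default \
#     && ./configure --with-linux="/usr/src/kernels/$(uname -r)" \
#                    --with-o2ib=/usr/src/ofa_kernel/default
```

If either tree is missing, the function reports which one and returns non-zero, so the configure step is skipped instead of silently building without o2ib support.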
Note: This problem, which I have already solved, is a very different problem from every other similar question on Stack Overflow. I have posted this question and answer in the hopes that it will help someone else experiencing the same issue (or so that, when I have this problem again in 3 years, I'll find this answer).
I am running VirtualBox 6.1.26 on macOS Catalina 10.15.7. I am emulating CentOS 7:
$ uname -r
3.10.0-1160.36.2.el7.x86_64
I "inserted" the VirtualBox Guest Additions CD and followed the auto-run prompts to install the Guest Additions. Part way through, it aborted, saying:
This system is currently not set up to build kernel modules.
Please install the gcc make perl packages from your distribution.
Note that I have gcc, make, perl, kernel-devel, and kernel-headers all installed. It also prompted me to check the file /var/log/vboxadd-setup.log for more details. The contents of that log were interesting:
Building the main Guest Additions 6.1.26 module for kernel 3.10.0-1160.36.2.el7.x86_64.
Error building the module. Build output follows.
make V=1 CONFIG_MODULE_SIG= CONFIG_MODULE_SIG_ALL= -C /lib/modules/3.10.0-1160.36.2.el7.x86_64/build M=/tmp/vbox.0 SRCROOT=/tmp/vbox.0 -j4 modules
arch/x86/Makefile:96: stack-protector enabled but compiler support broken
arch/x86/Makefile:166: *** CONFIG_RETPOLINE=y, but not supported by the compiler. Compiler update recommended.. Stop.
make: *** [vboxguest] Error 2
modprobe vboxguest failed
Extensive searching for these errors yields multiple forum posts and Stack Overflow questions whose replies and accepted answers reveal either that I'm missing one of those installed packages (I'm not) or that my GCC version is less than 7.3 (when support for CONFIG_RETPOLINE=y was added). However:
$ gcc --version
gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)
This is > 7.3, so it does support this feature. I should note that I installed GCC using the Yum devtoolset packages in order to use this newer compiler:
$ sudo yum list installed|grep devtoolset
...
devtoolset-8-gcc.x86_64 8.3.1-3.2-el7 #centos-sclo-rh
devtoolset-8-gcc-c++.x86_64 8.3.1-3.2-el7 #centos-sclo-rh
devtoolset-8-gcc-gdb-plugin.x86_64 8.3.1-3.2-el7 #centos-sclo-rh
...
devtoolset-8-make.x86_64 1:4.2.1-4.el7 #centos-sclo-rh
...
And I do not have any other GCC versions installed:
$ sudo yum list installed|grep gcc
devtoolset-8-gcc.x86_64 8.3.1-3.2-el7 #centos-sclo-rh
devtoolset-8-gcc-c++.x86_64 8.3.1-3.2-el7 #centos-sclo-rh
devtoolset-8-gcc-gdb-plugin.x86_64 8.3.1-3.2-el7 #centos-sclo-rh
libgcc.x86_64 4.8.5-44.el7 #anaconda
And I have this in ~/.bashrc to enable devtoolset upon login:
...
source scl_source enable devtoolset-8
...
What am I doing wrong?
It turned out the problem wasn't that I was using the wrong GCC version (not possible) or that I was missing any installed packages (I wasn't). Instead, it was a consequence of how the VirtualBox Guest Additions "auto-run" works. Something about the way it runs results in a "fresh" environment without devtoolset-8 properly sourced. As a result, it cannot find the installed GCC 8.3.
The solution was simple: When the auto-run prompt appeared, I dismissed it and did not run auto-run. Instead, I opened a fresh Terminal window and changed directories to /run/media/[username]/VBox_GAs_6.1.26 (YMMV on the exact location of the mounted disk), then ran this command:
$ sudo ./VBoxLinuxAdditions.run
That command completed successfully, the kernel module compiled, the Guest Additions installed, and they are working properly now.
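A quick way to confirm which compiler a given shell will hand to the installer is to check it before running VBoxLinuxAdditions.run. This is a minimal sketch under stated assumptions: the helper names version_ge and gcc_version are made up for illustration, sort -V is a GNU coreutils extension, and the 7.3 threshold is the one quoted in the question.

```shell
#!/bin/sh
# version_ge A B: succeed when version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# gcc_version: print the version of whichever gcc is first on PATH.
# Fails if no gcc is found. Some builds print only the major version.
gcc_version() {
    command -v gcc >/dev/null 2>&1 || return 1
    gcc -dumpversion
}

# Report what this shell would use; 7.3 is the threshold from the question.
if v=$(gcc_version) && version_ge "$v" 7.3; then
    echo "gcc $v is on PATH in this shell"
else
    echo "no gcc >= 7.3 on PATH; enable devtoolset-8 in this shell first" >&2
fi
```

Running this in the same terminal that will launch the installer shows whether devtoolset-8 is actually active there, which is the condition the auto-run environment failed to meet.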
TL;DR: Keep the system OS up to date so that it stays consistent with current spec files.
Symptom
When rebuilding PostgreSQL 11.1 SRPM using mock, the build fails with:
BUILDSTDERR: /builddir/build/BUILD/postgresql-11.1/src/bin/psql/command.c:1814 undefined reference to `PQencryptPasswordConn`
NB: PQencryptPasswordConn is a libpq.so function (provided by postgresql-devel-10.3-5.fc27.x86_64 on my system...outside the mock chroot environment). Unless I'm mistaken, the Postgresql SRPM builds the postgresql-devel RPM along with others.
Steps to reproduce
I ran the following to rebuild the SRPM before attempting to apply any patches not already present in the SRPM:
# Obtain SRPM source
git clone https://src.fedoraproject.org/rpms/postgresql.git
cd postgresql
# Download local copies of SRPM sources
wget $(spectool -S *.spec | awk '/^Source.*:\/\//{IFS=" "; print $2}')
# ...check SHAs of downloaded sources...
# Run SRPM-specific prep scripts
./generate-pdf.sh
./generate-sources.sh
# Generate the SRPM
mock --root=fedora-27-x86_64 --resultdir="./SRPMS" --buildsrpm --spec postgresql.spec --sources .
# >>> Everything seems to work fine up to this point <<<
# Build the RPM inside mock chroot
mock --root=fedora-27-x86_64 --rebuild ./SRPMS/postgresql-11.1-4.fc27.src.rpm
# !!! Fail here (with symptom above) !!!
The Problem
I have so far been unable to get mock to load the appropriate libpq headers into the chroot environment, so that rpmbuild builds against a libpq that declares PQencryptPasswordConn (the declaration does exist on my system outside the build environment):
grep -lr "PQencryptPasswordConn" /usr/include
# /usr/include/libpq-fe.h
grep -lr "PQencryptPasswordConn" /var/lib/mock/fedora-27-x86_64/root/usr/include
# (Nothing returned)
When reviewing mock's installed_pkgs.log, the following were installed (the latter of which I expect would provide a version of libpq headers):
postgresql-libs-9.6.10-3.fc27.x86_64
postgresql-devel-9.6.10-3.fc27.x86_64
However, I cannot find a way to install the postgresql-* packages into the chroot environment that contain the updated library headers.
The Ask
Since postgresql SRPM is supposed to build postgresql-devel RPM, I think that mock will need to build and install the postgresql-devel RPM in the chroot before rpmbuild attempts to compile psql/command.c so that the latter compilation finds the appropriate library headers (unless the build process is intelligent enough to identify new libraries currently under build).
How can I best accomplish this (would prefer to avoid multiple mock calls for each RPM package built from the SRPM unless that's the only way to go)?
Please note that the build process on my system spawns multiple processes to compile in parallel.
I have also tried mockchain --recurse, without success.
System Info
Linux 4.16.6-202.fc27.x86_64
First hint: you are using the latest postgresql.spec, but you are trying to build it against the rather old (in fact, now unsupported) Fedora 27. I'd encourage you to migrate to a newer version of Fedora, or at least check out the f27 branch in the same RPM git repository.
Second hint: we changed the layout of PostgreSQL packaging in Fedora 30+. We split the library (libpq.so) out into a separate package, per the announcement.
How to continue: always check out the branch that matches the Fedora release you build against, and adjust the spec file as needed (check out f27 and update to PostgreSQL 11.1 in this case).
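Picking the branch that matches the mock root can be scripted. The sketch below is an illustration only: the helper name mock_root_to_branch is hypothetical, it assumes mock root names of the form fedora-NN-ARCH as used in the question, and it does not handle rawhide or non-Fedora roots.

```shell
#!/bin/sh
# Hypothetical helper: map a mock root name such as "fedora-27-x86_64"
# to the dist-git branch name "f27". Fails on any other kind of name.
mock_root_to_branch() {
    case $1 in
        fedora-[0-9]*-*)
            release=${1#fedora-}      # strip the prefix      -> "27-x86_64"
            release=${release%%-*}    # keep the release only -> "27"
            printf 'f%s\n' "$release"
            ;;
        *)
            return 1
            ;;
    esac
}

# Example use inside the cloned postgresql dist-git repository:
# git checkout "$(mock_root_to_branch fedora-27-x86_64)"
```

Deriving the branch from the same string that is passed to mock --root keeps the spec file and the build target from drifting apart.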
JFTR (might help): there is already a testing modular build of PostgreSQL 11 for Fedora 28+, and the build scripts are maintained in the separate branch stream-postgresql-11. With a bit of luck, you will be able to build that branch against the old Fedora 27, too. Note that this version of the postgresql.spec file is somewhat complicated (it has to be, because we build it against different versions of Fedora).
Does anyone know whether it is possible to install the Lustre client software on a Linux machine running a 4.x or later kernel? In my experiments so far, all the working examples are on kernel 3.10. If I try to install kmod-lustre-client on a 4.x machine, it fails with:
rpm -ivh kmod-lustre-client-2.10.5-1.el7.x86_64.rpm
error: Failed dependencies:
kernel < 3.10.0-863 is needed by kmod-lustre-client-2.10.5-1.el7.x86_64
kernel(PDE_DATA) = 0x44f0d59d is needed by kmod-lustre-client-2.10.5-1.el7.x86_64
According to lustre/ChangeLog in the b2_10 branch, it works with kernels at least 4.4.133-94.33 (SLES12SP3) and 4.4.0-131 (Ubuntu 16.04).
If you are using a newer kernel, you also need to use a newer version of Lustre. The lustre/ChangeLog on the tip of master (almost 2.12 release) reports support for kernels 4.15.0-32 (Ubuntu 18.04).
It looks like you are trying to install a binary kernel module RPM built for the RHEL7 kernel on a non-RHEL kernel. That is never going to work. You need to either get the right RPMs/Debs for your kernel from https://lustre.org/download/ or download the source and rebuild it for your kernel.
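Before installing a binary kmod RPM, it can help to check whether the running kernel even belongs to the distribution the RPM was built for. The following is a rough sketch, not a Lustre tool: the helper name kernel_has_dist_tag is hypothetical, and matching on the ".el7" tag is only a coarse heuristic that does not replace the exact kernel version requirement reported by rpm.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a kernel release string carries the given
# distribution tag (for example "el7"), as RHEL/CentOS kernel releases do.
kernel_has_dist_tag() {
    case $1 in
        *".$2."*|*".$2") return 0 ;;
        *) return 1 ;;
    esac
}

if kernel_has_dist_tag "$(uname -r)" el7; then
    echo "el7 kernel detected; still compare its version with the RPM's requirements"
else
    echo "not an el7 kernel; use packages built for this kernel or rebuild from source"
fi

# The requirements of a package file can be listed without installing it:
# rpm -qp --requires kmod-lustre-client-2.10.5-1.el7.x86_64.rpm
```

On a 4.x kernel from another distribution the check fails immediately, which matches the dependency errors shown in the question.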
The 2.10.x releases are currently the maintained LTS releases (bugfixes are backported to them), while 2.11.0 is a feature release that does not get bugfix backports.
I am currently having some problems installing TensorFlow with GPU support.
This is the guide I've followed:
Install NVIDIA CUDA (preinstalled)
Install NVIDIA cuDNN (preinstalled)
Install bazel
wget https://github.com/bazelbuild/bazel/releases/download/0.4.3/bazel-0.4.3-installer-linux-x86_64.sh
chmod +x bazel-0.4.3-installer-linux-x86_64.sh
./bazel-0.4.3-installer-linux-x86_64.sh --user
Install TensorFlow from source
git clone https://github.com/tensorflow/tensorflow
cd tensorflow/
./configure
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
The last command doesn't finish, or rather it does, but it ends with an error message:
a#fe1:~/tensorflow$ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
.
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.build/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
Server finished RPC without an explicit exit code
After this, I should be able to run
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
according to the guide, but that is not possible.
I previously had the CPU version of TensorFlow running, but since the need for GPU support was pressing, I decided to install it. I didn't think it would be this troublesome.
Any idea why it is not possible to build it?
OS: CentOS 7.1
GPU: NVIDIA
I've had problems attempting to install TensorFlow from source using Bazel as well. It could be that the current Bazel build has problems, since that has happened to me in the past. It would help if you provided us with information about your system (OS and GPU), but you're probably better off using pip (or pip3) and running sudo pip install tensorflow-gpu (or sudo pip3 install tensorflow-gpu).
I am running into problems installing certain Perl modules within Docker. Is there a recommended, stable way of doing this for the default Ubuntu image?
Also, I'm unclear how to access the install log file after a failed build (i.e. for cpanminus, at /.cpanm/build.log).
The following Dockerfile fails with the message:
Please specify prototyping behavior for locale.xs (see perlxs manual)
when it attempts to resolve the dependency on PerlIO::locale.
# use the ubuntu base image provided by dotCloud
FROM ubuntu
# make sure the package repository is up to date
RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
RUN apt-get update
# install perl and modules
RUN apt-get install -y make
RUN apt-get install -y perl
RUN apt-get install -y cpanminus
RUN cpanm -v Text::Names
Some modules include C code which needs to be compiled on the target systems (“XS modules”). For that, you'll need a complete C toolchain. This implies make, the compiler: gcc, and the C standard library headers: libc-dev. The build-essential metapackage includes these components (and some more), so I'd recommend you install that instead.
According to perlxstut, that's just a warning rather than a fatal error.
There's a clearly documented default (perlxs: "Prototypes are enabled by default"). Furthermore, this particular XS component doesn't actually export any functions to Perl, so the setting is never even used.
The warning can be silenced by adding PROTOTYPES: ENABLE to locale.xs (you could even ask the author to make that change), but it won't make any difference.
The problem is elsewhere.
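To find where the real failure is, the cpanm build log can be printed whenever the install step fails, so that it ends up in the build output. This is a minimal sketch: the helper name run_or_show_log is made up for illustration, and the log location is an assumption that depends on the home directory of the user running cpanm (the question reports /.cpanm/build.log).

```shell
#!/bin/sh
# Hypothetical helper: run a command; if it fails, print the given log file
# to stderr and return the command's own exit status.
# $1 = log file to show on failure, remaining arguments = command to run.
run_or_show_log() {
    log=$1
    shift
    rc=0
    "$@" || rc=$?
    if [ "$rc" -ne 0 ] && [ -r "$log" ]; then
        echo "--- $log ---" >&2
        cat "$log" >&2
    fi
    return "$rc"
}

# Example use (the log path is an assumption; adjust it to your image):
# run_or_show_log "$HOME/.cpanm/build.log" cpanm -v Text::Names
```

Because the original exit status is preserved, the build still fails at the same step, but the log with the actual compiler or test error is visible without having to enter the container.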