Redhat Cluster (Pacemaker/Corosync): DLM Not Starting

I need help regarding my cluster error:
[root@db2]# pcs status
Cluster name: oracluster
Last updated: Mon Feb 22 16:00:12 2016
Last change: Mon Feb 22 15:45:14 2016
Stack: corosync
Current DC: db2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
5 Resources configured
Online: [ db1 db2 ]
Full list of resources:
 ClusterVIP (ocf::heartbeat:IPaddr2): Started db2
 Clone Set: dlm-clone [dlm]
     Stopped: [ db1 db2 ]
 Clone Set: clvmd-clone [clvmd]
     Stopped: [ db1 db2 ]
Failed actions:
dlm_start_0 on db2 'not configured' (6): call=18, status=complete, exit-reason='none', last-rc-change='Mon Feb 22 15:57:04 2016', queued=0ms, exec=34ms
PCSD Status:
db1: Online
db2: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Details:
I have 2 nodes (db1, db2) with shared storage (SAN). Both servers run RHEL 7.1. Now I want to add the storage as a resource. According to the RHEL documentation, DLM and CLVMD should also be added as resources. I discovered that the error disappears when STONITH is enabled, but DLM still does not start. The log says it needs a fencing device to be configured, which I don't have right now.
Is there any workaround for this? Is there a way to disable the fencing mechanism and still make the cluster work? Thank you so much in advance!

You said you have SAN storage: create a partition on it for fencing and use it with the fence_scsi STONITH agent; that will solve your problem. For example:
pcs stonith create scsi-stonith-device fence_scsi devices=/dev/mapper/fence pcmk_monitor_action=metadata pcmk_reboot_action=off pcmk_host_list="node1 node2" meta provides=unfencing
And don't forget to enable STONITH with pcs property set stonith-enabled=true.
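Once fencing is in place, you can clear the failed start record so the cluster retries dlm right away. A minimal follow-up sketch (resource name taken from the pcs status output above):
pcs resource cleanup dlm   # clear the failed dlm_start_0 action
pcs status                 # dlm-clone and clvmd-clone should now start on both nodes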

Configure STONITH. It will fix this issue.


Error when initializing database (PostgreSQL 10.10): PANIC: could not generate secret authorization token

I have a problem when running the command: sudo -su user_test ./pgsql/bin/initdb -D /example/folder
I have researched many sources on the internet but haven't found a solution.
I hope someone can help me. Thanks.
Environment:
initdb (PostgreSQL) 10.10
OS: uname -a Linux DL2100 3.10.38 #1 SMP Build-gitb1820a8 x86_64 GNU/Linux
selecting default max_connections … 100
selecting default shared_buffers … 128MB
selecting default timezone … Europe/Helsinki
selecting dynamic shared memory implementation … posix
creating configuration files … ok
running bootstrap script … 2020-11-03 11:52:56.303 EET [3928] DEBUG: invoking IpcMemoryCreate(size=148545536)
2020-11-03 11:52:56.303 EET [3928] DEBUG: mmap(148897792) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2020-11-03 11:52:56.315 EET [3928] DEBUG: SlruScanDirectory invoking callback on pg_notify/0000
2020-11-03 11:52:56.315 EET [3928] DEBUG: removing file "pg_notify/0000"
2020-11-03 11:52:56.316 EET [3928] DEBUG: dynamic shared memory system will support 288 segments
2020-11-03 11:52:56.316 EET [3928] DEBUG: created dynamic shared memory control segment 1852866650 (6928 bytes)
2020-11-03 11:52:56.319 EET [3928] PANIC: could not generate secret authorization token
Aborted
child process exited with exit code 134
The error is thrown in BootStrapXLOG in src/backend/access/transam/xlog.c:
/*
 * Generate a random nonce. This is used for authentication requests that
 * will fail because the user does not exist. The nonce is used to create
 * a genuine-looking password challenge for the non-existent user, in lieu
 * of an actual stored password.
 */
if (!pg_backend_random(mock_auth_nonce, MOCK_AUTH_NONCE_LEN))
    ereport(PANIC,
            (errcode(ERRCODE_INTERNAL_ERROR),
             errmsg("could not generate secret authorization token")));
src/backend/utils/misc/backend_random.c says:
pg_backend_random() function fills a buffer with random bytes. Normally,
it is just a thin wrapper around pg_strong_random(), but when compiled
with --disable-strong-random, we provide a built-in implementation.
So it seems that PostgreSQL was built on a system that had a source for strong random numbers (OpenSSL or /dev/urandom, if you are not on Windows), but the facility is not working on your current system.
Things to try:
- Try the latest minor release of v10 (currently 10.15) - maybe a bug has been fixed.
- Run pg_config --configure to check whether PostgreSQL was built --with-openssl.
- OpenSSL also uses /dev/urandom, so there is likely a problem with that source of random numbers; investigate there.
- If all else fails, build PostgreSQL from source and configure it with
  ./configure --disable-strong-random ...
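Two quick sanity checks along those lines (a sketch; the exact pg_config output format varies by build):
# Was PostgreSQL built with OpenSSL?
pg_config --configure | grep openssl

# Does the kernel's strong random source respond at all?
head -c 16 /dev/urandom | od -An -tx1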
It worked fine. Thank you very much, @Laurenz Albe

Openshift 3.11 cloud integration fails with lookup RequestError: send request failed\\ncaused by: Post https://ec2.eu-west-.amazonaws.com

Following the docs: https://docs.openshift.com/container-platform/3.11/install_config/configuring_aws.html#aws-cluster-labeling
I am configuring the cloud integration after the cluster build.
When the cluster services are restarted on the masters, it fails looking up AWS instances:
22 16:32:10.112895 75995 server.go:261] failed to run Kubelet: could not init cloud provider "aws": error finding instance i-0c5cbd50923f9c6d2: "error listing AWS instances: \"Request.service: main process exited, code=exited, status=255/n/a Error: send request failed\\ncaused by: Post https://ec2.eu-west-.amazonaws.com/: dial tcp: lookup ec2.eu-west-.amazonaws.com: no such host\""
On closer inspection seems to be due to incorrect hostname:
https://ec2.eu-west-.amazonaws.com/ vs. https://ec2.eu-west-2.amazonaws.com/
So I double checked the config, which seems to be correct:
# cat /etc/origin/cloudprovider/aws.conf
[Global]
Zone = eu-west-2
Had a google and it seems to be a similar issue to this:
https://github.com/kubernetes-sigs/kubespray/issues/4345
Is there a way to work around this? Moving off 3.11 isn't an option right now.
Thanks.
It looks as though it needs to be the zone, rather than the region:
# cat /etc/origin/cloudprovider/aws.conf
[Global]
Zone = eu-west-2a
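That matches the failure mode: the AWS cloud provider in this Kubernetes generation apparently derives the region by stripping the trailing availability-zone letter, so supplying a region as the zone mangles the EC2 endpoint hostname. A small illustrative sketch of that derivation:
# Correct config: the zone ends in a letter, so stripping it yields the region
zone="eu-west-2a"
region="${zone%?}"                            # eu-west-2a -> eu-west-2
echo "https://ec2.${region}.amazonaws.com/"   # valid EC2 endpoint

# Misconfigured: a region was supplied as the zone, so stripping the
# final character breaks it
zone="eu-west-2"
region="${zone%?}"                            # eu-west-2 -> eu-west-
echo "https://ec2.${region}.amazonaws.com/"   # the bogus host from the log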

Can't start / connect to Oracle Database - Windows 10 / Oracle 18 XE / SQL Developer

I have been trying to install and run this database for a week... ;<
I previously tried Oracle 12c Standard Edition, but it didn't work - I don't know why ;(
For now I have uninstalled (I believe) the 12c and installed 18 XE.
In SQL*Plus I get ORA-12560: TNS:protocol adapter error when I try to log in as sysdba, even though the service is running.
All services on services.msc are running:
- OracleOraDB18Home1MTSRecoveryService
- OracleOraDB18Home1TNSListener
- OracleRemExecServiceV2
- OracleServiceXE
- OracleVssWriterXE
On the command line, when I type "lsnrctl status", I get:
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=DESKTOP-*******)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for 64-bit Windows: Version 18.0.0.0.0 - Production
Start Date 23-NOV-2019 10:41:47
Uptime 0 days 11 hr. 36 min. 34 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Default Service XE
Listener Parameter File C:\app\*****\product\18.0.0\dbhomeXE\network\admin\listener.ora
Listener Log File C:\app\*****\product\18.0.0\diag\tnslsnr\DESKTOP-*******\listener\alert\log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=DESKTOP-*******)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\EXTPROC1521ipc)))
Services Summary...
Service "CLRExtProc" has 1 instance(s).
Instance "CLRExtProc", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
And in SQL Developer I get one of two errors, depending on what I enter there:
- Status : Failure - The Network Adapter could not establish the connection
- Status : Failure -Test failed: Listener refused the connection with the following error:
ORA-12514, TNS:listener does not currently know of service requested in connect descriptor
I don't know what I am doing. I just want to install and run this as quickly as possible to practice on a sample database for an exam :( This is my PC, no virtual machines or online servers.
Please help me investigate the issue - I checked several answers, even on this site, but I can't understand them well; there is a lot more information there than I need.
Your exception message...
TNS:listener does not currently know of service requested in connect descriptor
... and your listener status...
Services Summary...
Service "CLRExtProc" has 1 instance(s).
Instance "CLRExtProc", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
... say exactly what's wrong. Your TNS Listener does not know of your Oracle DB service.
You may be right that your Windows "Oracle DB" service is running; that does not mean, though, that your Oracle instance is running. First, log on as admin to your Oracle DB from a Windows user that is privileged to log on locally...
set ORACLE_SID=fill_in_the_SID_of_your_Oracle_instance_here
sqlplus / as sysdba
... and run...
alter system set local_listener = '<fill_in_your_local_listener_SID_here>';
alter system register;
That should register your database service with your TNS Listener, provided that you have your local TNS listener configured in your server's tnsnames.ora...
<fill_in_your_local_listener_SID_here> = (address = (protocol = tcp)(host = 127.0.0.1)(port = 1521))
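For a default 18 XE install, the filled-in version might look like this (LISTENER_XE is a hypothetical alias name; XE is the default SID for Express Edition):
# tnsnames.ora (hypothetical alias)
LISTENER_XE = (address = (protocol = tcp)(host = 127.0.0.1)(port = 1521))

C:\> set ORACLE_SID=XE
C:\> sqlplus / as sysdba
SQL> alter system set local_listener = 'LISTENER_XE';
SQL> alter system register;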
Too complicated, I know. That's why it's better (for new developers without Oracle DBA skills) to use pre-created VM images, as suggested by @thatjeffsmith's comment above.

dashDB Local on fedora 25 - error code 130

I tried the 30-day trial of dashDB Local. I followed the steps described in the link:
https://www.ibm.com/support/knowledgecenter/en/SS6NHC/com.ibm.swg.im.dashdb.doc/admin/linux_deploy.html
I did not create a node configuration file because mine is an SMP setup.
I logged into my Docker Hub account and pulled the image.
docker login -u xxx -p yyyyy
docker pull ibmdashdb/local:latest-linux
The pull took 5 minutes or so. I waited for the image download to complete.
I ran the following command; it completed successfully:
docker run -d -it --privileged=true --net=host --name=dashDB -v /mnt/clusterfs:/mnt/bludata0 -v /mnt/clusterfs:/mnt/blumeta0 ibmdashdb/local:latest-linux
Then I ran the logs command:
docker logs --follow dashDB
It showed that dashDB did not start but exited with error code 130:
# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0f008f8e413d ibmdashdb/local:latest-linux "/usr/sbin/init" 16 seconds ago Exited (130) 1 seconds ago dashDB
#
The logs command shows this:
2017-05-17T17:48:11.285582000Z Detected virtualization docker.
2017-05-17T17:48:11.286078000Z Detected architecture x86-64.
2017-05-17T17:48:11.286481000Z
2017-05-17T17:48:11.294224000Z Welcome to dashDB Local!
2017-05-17T17:48:11.294621000Z
2017-05-17T17:48:11.295022000Z Set hostname to <orion>.
2017-05-17T17:48:11.547189000Z Cannot add dependency job for unit systemd-tmpfiles-clean.timer, ignoring: Unit is masked.
2017-05-17T17:48:11.547619000Z [ OK ] Reached target Timers.
<snip>
2017-05-17T17:48:13.361610000Z [ OK ] Started The entrypoint script for initializing dashDB local.
2017-05-17T17:48:19.729980000Z [100209.207731] start_dashDB_local.sh[161]: /usr/lib/dashDB_local_common_functions.sh: line 1816: /tmp/etc_profile-LOCAL.cfg: No such file or directory
2017-05-17T17:48:20.236127000Z [100209.713223] start_dashDB_local.sh[161]: The dashDB Local container's environment is not set up yet.
2017-05-17T17:48:20.275248000Z [ OK ] Stopped Create Volatile Files and Directories.
<snip>
2017-05-17T17:48:20.737471000Z Sending SIGTERM to remaining processes...
2017-05-17T17:48:20.840909000Z Sending SIGKILL to remaining processes...
2017-05-17T17:48:20.880537000Z Powering off.
So it looks like start_dashDB_local.sh is failing at line 1816 of /usr/lib/dashDB_local_common_functions.sh? I exported the image (one way to do this is sketched after the excerpt), and this is the function around line 1816 of dashDB_local_common_functions.sh:
update_etc_profile()
{
    local runtime_env=$1
    local cfg_file
    # Check if /etc/profile/dashdb_env.sh is already updated
    grep -q BLUMETAHOME /etc/profile.d/dashdb_env.sh
    if [ $? -eq 0 ]; then
        return
    fi
    case "$runtime_env" in
        "AWS" | "V1.5" ) cfg_file="/tmp/etc_profile-V15_AWS.cfg"
            ;;
        "V2.0" ) cfg_file="/tmp/etc_profile-V20.cfg"
            ;;
        "LOCAL" ) # dashDB Local case and also the default
            cfg_file="/tmp/etc_profile-LOCAL.cfg"
            ;;
        * ) logger_error "Invalid ${runtime_env} value"
            return
            ;;
    esac
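For reference, one way to export the image filesystem for inspection (a sketch; the container name dash-inspect is arbitrary):
docker create --name dash-inspect ibmdashdb/local:latest-linux
docker export dash-inspect | tar -tvf - | grep etc_profile
docker rm dash-inspect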
I can also see /tmp/etc_profile-LOCAL.cfg in the exported image. Did I miss a step here?
I also created a /mnt/clusterfs/nodes file ... but it did not help; the same docker run command failed in the same way.
Please help.
I am using x86_64 Fedora25.
# docker version
Client:
Version: 1.12.6
API version: 1.24
Package version: docker-common-1.12.6-6.gitae7d637.fc25.x86_64
Go version: go1.7.4
Git commit: ae7d637/1.12.6
Built: Mon Jan 30 16:15:28 2017
OS/Arch: linux/amd64
Server:
Version: 1.12.6
API version: 1.24
Package version: docker-common-1.12.6-6.gitae7d637.fc25.x86_64
Go version: go1.7.4
Git commit: ae7d637/1.12.6
Built: Mon Jan 30 16:15:28 2017
OS/Arch: linux/amd64
#
# cat /etc/fedora-release
Fedora release 25 (Twenty Five)
# uname -r
4.10.15-200.fc25.x86_64
#
Thanks for bringing this to our attention. I reached out to our developer team. It seems this is happening because, inside the container, tmpfs gets mounted onto /tmp and wipes out all the scripts.
We have seen this issue, and moving to the latest version of Docker seems to fix it. Your docker version output shows that you are on an older version.
So please install the latest Docker version, retry the deployment of dashDB Local, and update here.
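For example, one way to move to a current Docker on Fedora (a sketch based on Docker's own Fedora repository; package and repo details are assumptions, adjust as needed):
sudo dnf -y remove docker docker-common
sudo dnf -y install dnf-plugins-core
sudo dnf config-manager --add-repo https://download.docker.com/linux/fedora/docker-ce.repo
sudo dnf -y install docker-ce
sudo systemctl enable --now docker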
Regards
Murali

MongoDB ReplicaSet on Windows Azure - Socket exception error

I have a 3-node replica set deployed in Windows Azure. While doing performance testing, the test code halts after some time. On the server I can see the following error log:
Fri Aug 30 23:14:59.982 [conn2454] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [ip:port]
For the performance test I am using multithreaded code that only reads data from the replica set.
So far I have tried http://docs.mongodb.org/manual/faq/diagnostics/#does-tcp-keepalive-time-affect-sharded-clusters-and-replica-sets, but it has not helped.
Any thoughts/suggestions will be welcomed.
Thanks
This is old, but just in case someone else stumbles across this:
You need to set the TCP keepalive times differently from the stock Linux configuration if you are running under Azure:
echo 45 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 30 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 20 > /proc/sys/net/ipv4/tcp_keepalive_probes
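Those echo commands only last until the next reboot. To make the same settings persistent, use the standard sysctl equivalents of those /proc paths:
# /etc/sysctl.conf (or a file under /etc/sysctl.d/)
net.ipv4.tcp_keepalive_time = 45
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 20

# then apply without rebooting
sysctl -p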