We are trying to set up a 3-node PostgreSQL cluster with Patroni, Consul and Postgres, but Patroni fails to spin up the database with the error below. I let it run for some time with no change in status. Which value is the service ID here, or is there some mismatch in my config?
[ec2-user@ip-172-31-68-200 ~]$ /home/ec2-user/.local/bin/patroni /etc/patroni.yml
2023-02-02 14:25:52,677 INFO: No PostgreSQL configuration items changed, nothing to reload.
2023-02-02 14:25:52,683 INFO: Deregister service pglab/pgdbhost1
2023-02-02 14:25:52,688 INFO: Lock owner: None; I am pgdbhost1
2023-02-02 14:25:52,688 INFO: Deregister service pglab/pgdbhost1
2023-02-02 14:25:52,693 INFO: waiting for leader to bootstrap
2023-02-02 14:25:52,695 INFO: Lock owner: None; I am pgdbhost1
2023-02-02 14:25:52,697 INFO: Deregister service pglab/pgdbhost1
2023-02-02 14:25:52,698 INFO: waiting for leader to bootstrap
2023-02-02 14:26:02,696 INFO: Lock owner: None; I am pgdbhost1
2023-02-02 14:26:02,698 INFO: Deregister service pglab/pgdbhost1
2023-02-02 14:26:02,700 INFO: waiting for leader to bootstrap
^C2023-02-02 14:26:06,200 INFO: Deregister service pglab/pgdbhost1
[ec2-user@ip-172-31-68-200 ~]$ journalctl -xen | less
-- Logs begin at Thu 2023-02-02 10:09:52 UTC, end at Thu 2023-02-02 14:26:06 UTC. --
Feb 02 14:24:17 ip-172-31-68-200.ec2.internal sshd[13313]: pam_unix(sshd:session): session opened for user ec2-user by (uid=0)
Feb 02 14:24:44 ip-172-31-68-200.ec2.internal dhclient[2925]: XMT: Solicit on eth0, interval 123490ms.
Feb 02 14:25:27 ip-172-31-68-200.ec2.internal sudo[13415]: ec2-user : TTY=pts/2 ; PWD=/home/ec2-user ; USER=root ; COMMAND=/usr/local/bin/patroni /etc/patroni.yml
Feb 02 14:25:27 ip-172-31-68-200.ec2.internal sudo[13415]: pam_unix(sudo:session): session opened for user root by ec2-user(uid=0)
Feb 02 14:25:27 ip-172-31-68-200.ec2.internal sudo[13415]: pam_unix(sudo:session): session closed for user root
Feb 02 14:25:52 ip-172-31-68-200.ec2.internal bash[7029]: 2023-02-02T14:25:52.683Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/pglab/pgdbhost1 from=127.0.0.1:42250 error="Unknown service ID \"pglab/pgdbhost1\". Ensure that the service ID is passed, not the service name."
Feb 02 14:25:52 ip-172-31-68-200.ec2.internal bash[7029]: 2023-02-02T14:25:52.689Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/pglab/pgdbhost1 from=127.0.0.1:42250 error="Unknown service ID \"pglab/pgdbhost1\". Ensure that the service ID is passed, not the service name."
Feb 02 14:25:52 ip-172-31-68-200.ec2.internal bash[7029]: 2023-02-02T14:25:52.698Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/pglab/pgdbhost1 from=127.0.0.1:42250 error="Unknown service ID \"pglab/pgdbhost1\". Ensure that the service ID is passed, not the service name."
Feb 02 14:26:02 ip-172-31-68-200.ec2.internal bash[7029]: 2023-02-02T14:26:02.699Z [ERROR] agent.http: Request error: method=PUT url=/v1/agent/service/deregister/pglab/pgdbhost1 from=127.0.0.1:42250 error="Unknown service ID \"pglab/pgdbhost1\". Ensure that the service ID is passed, not the service name."
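A quick way to see what the local Consul agent actually has registered, and under which service IDs, is to ask its HTTP API directly (a generic check, assuming curl is available and the agent's HTTP API is on 127.0.0.1:8500 as in the configs below):
# List services registered with the local agent, keyed by service ID
curl -s http://127.0.0.1:8500/v1/agent/services
# Cluster-wide view of service names known to the catalog
curl -s http://127.0.0.1:8500/v1/catalog/services
If nothing is registered under an ID matching pglab/pgdbhost1, the deregister calls above would be expected to fail exactly like this.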
Please find the Patroni & Consul server/agent configs below.
Patroni.yml
name: "pgdbhost1"
scope: pglab
namespace: /HAPG/
consul:
url: http://127.0.0.1:8500
register_service: true
postgresql:
connect_address: "172.31.68.200:5432"
bin_dir: /usr/pgsql-12/bin
data_dir: /var/lib/pgsql/12/
listen: "*:5432"
authentication:
replication:
username: replicator
password: replicator
superuser:
username: postgres
password: postgres
rewind:
username: rewind_user
password: rewind_user
parameters:
unix_socket_directories: '/var/run/postgresql'
synchronous_commit: "on"
synchronous_standby_names: "*"
restapi:
connect_address: "172.31.68.200:8008"
listen: "172.31.68.200:8008"
bootstrap:
dcs:
loop_wait: 10
retry_timeout: 5
maximum_lag_on_failover: 1048576
postgresql:
parameters:
use_pg_rewind: true
use_slots: true
users:
app_user:
password: "eW5guPae"
pg_hba:
- local all all md5
- host all all 127.0.0.1/32 md5
- host all all ::1/128 md5
- host all all ::1/128 md5
- host all all 0.0.0.0/0 md5
- host replication replicator 127.0.0.1/32 md5
- host replication replicator 172.31.68.200/0 md5
- host replication replicator 172.31.74.220/0 md5
- host replication replicator 172.31.75.194/0 md5
- host all all 0.0.0.0/0 md5
initdb:
- encoding: UTF8
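For what it's worth, Patroni 2.0 and later can sanity-check a configuration file before starting; a minimal sketch, assuming the same patroni executable as above:
# Validate /etc/patroni.yml without starting PostgreSQL (Patroni 2.0+)
/home/ec2-user/.local/bin/patroni --validate-config /etc/patroni.yml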
Consul configs (agent on the database node, followed by the server):
{
  "node_name": "PGDB1",
  "data_dir": "/consul/data",
  "datacenter": "AWS",
  "retry_join": [
    "172.31.70.83"
  ],
  "encrypt": "mUWL5XahLJ3GdLW13W/PUS6vnqXNt10ckAk/lCo9/S8=",
  "recursors": ["127.0.0.11"],
  "client_addr": "172.31.68.200",
  "bind_addr": "172.31.68.200",
  "advertise_addr": "172.31.68.200",
  "addresses": {
    "dns": "127.0.0.1",
    "http": "127.0.0.1"
  }
}
===========================================
{
  "bootstrap_expect": 1,
  "client_addr": "0.0.0.0",
  "datacenter": "AWS",
  "data_dir": "/var/consul",
  "domain": "consul-pgdb",
  "enable_script_checks": true,
  "dns_config": {
    "enable_truncate": true,
    "only_passing": true
  },
  "enable_syslog": true,
  "encrypt": "mUWL5XahLJ3GdLW13W/PUS6vnqXNt10ckAk/lCo9/S8=",
  "leave_on_terminate": true,
  "log_level": "INFO",
  "rejoin_after_leave": true,
  "server": true,
  "start_join": [
    "172.31.70.83"
  ],
  "ui": true,
  "ports": {
    "dns": 9600
  }
}
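It may also be worth confirming that this agent has actually joined the server at 172.31.70.83 and that both sides agree on the datacenter and encryption key (assuming the consul binary is installed on the database node):
# Membership as seen by the local agent
consul members
# The local agent's own configuration and runtime status
curl -s http://127.0.0.1:8500/v1/agent/self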
Related
We are performing a load test on our application using JMeter. Our application uses Consul and Vault as backend services for reading and storing application configuration data. During the load test, the application queries Vault for authentication data on each incoming request. Initially it runs fine for some time (10 to 15 minutes) and I can see success responses in JMeter, but eventually the responses start failing for all requests. I see the following error in the Vault log for each request, but no error or exception in the Consul log.
Error in Vault log
[ERROR] core: failed to lookup token: error=failed to read entry: Get http://localhost:8500/v1/kv//vault/sys/token/id/87f7b82131cb8fa1ef71aa52579f155d4cf9f095: dial tcp [::1]:8500: getsockopt: connection refused
As of now the load is 100 requests (users) every 10 milliseconds with a ramp-up period of 60 seconds, and this runs in a loop. What could be the cause of this error? Is it due to limited connections to port 8500?
Below are my Vault and Consul configurations.
Vault
backend "consul" {
address = "localhost:8500"
path = "app/vault/"
}
listener "tcp" {
address = "10.88.97.216:8200"
cluster_address = "10.88.97.216:8201"
tls_disable = 0
tls_min_version = "tls12"
tls_cert_file = "/var/certs/vault.crt"
tls_key_file = "/var/certs/vault.key"
}
Consul
{
  "data_dir": "/var/consul",
  "log_level": "info",
  "server": true,
  "leave_on_terminate": true,
  "ui": true,
  "client_addr": "127.0.0.1",
  "ports": {
    "dns": 53,
    "serf_lan": 8301,
    "serf_wan": 8302
  },
  "disable_update_check": true,
  "enable_script_checks": true,
  "disable_remote_exec": false,
  "domain": "primehome",
  "limits": {
    "http_max_conns_per_client": 1000,
    "rpc_max_conns_per_client": 1000
  },
  "service": {
    "name": "nginx-consul-https",
    "port": 443,
    "checks": [{
      "http": "https://localhost/nginx_status",
      "tls_skip_verify": true,
      "interval": "10s",
      "timeout": "5s",
      "status": "passing"
    }]
  }
}
I have also configured http_max_conns_per_client and rpc_max_conns_per_client, thinking the problem might be a connection limit per client, but I still see this error in the Vault log.
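A rough way to see whether connections to port 8500 are actually piling up or being refused while the test runs (a generic sketch using standard Linux tooling, nothing Vault- or Consul-specific):
# Count established TCP connections involving the Consul HTTP port
ss -tn | grep ':8500' | wc -l
# Show what, if anything, is listening on 8500 and on which addresses
ss -ltn | grep ':8500'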
After taking another look at this, the issue appears to be that Vault is attempting to contact Consul over the IPv6 loopback address, likely because both the IPv4 and IPv6 entries are present in /etc/hosts, while Consul is only listening on the IPv4 loopback address.
You can likely resolve this through one of the following methods.
1. Use 127.0.0.1 instead of localhost for Consul's address in the Vault config.

backend "consul" {
  address = "127.0.0.1:8500"
  path    = "app/vault/"
}
2. Configure Consul to listen on both the IPv4 and IPv6 loopback addresses.

{
  "client_addr": "127.0.0.1 [::1]"
}
(Rest of the config omitted for brevity.)
3. Remove the localhost hostname from the IPv6 loopback entry in /etc/hosts:
127.0.0.1 localhost
# Old hosts entry for ::1
#::1 localhost ip6-localhost ip6-loopback
# New entry
::1 ip6-localhost ip6-loopback
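If you want to confirm this diagnosis before changing anything, two quick checks (assuming standard Linux tooling) are to see which addresses localhost resolves to and which addresses Consul is bound to:
# Shows whether localhost maps to both 127.0.0.1 and ::1 on this host
getent ahosts localhost
# Consul listening only on 127.0.0.1:8500 would explain the refused connection on [::1]:8500
ss -ltn | grep ':8500'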
I have an offline host node that serves as a compute, control, and storage node. The host was shut down by accident and cannot be brought back online. All services on that node are down but still enabled, and I cannot set them to disabled.
So I cannot remove the host with:
kolla-ansible -i multinode stop --yes-i-really-really-mean-it --limit node-17
I get this error:
TASK [Gather facts] ********************************************************************************************************************************************************************************************************************
fatal: [node-17]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host node-17 port 22: Connection timed out", "unreachable": true}
PLAY RECAP *****************************************************************************************************************************************************************************************************************************
node-17 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
How can I remove that offline host node? Thanks.
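For reference, the unreachability can be confirmed from the deploy node with a plain Ansible ping against the same inventory (a generic check, not a fix):
# Confirm node-17 is unreachable over SSH using the multinode inventory
ansible -i multinode node-17 -m ping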
PS: Why do I want to remove the offline host?
node-14 (online) : management node where kolla-ansible is installed; compute node, control node and storage node
node-15 (online) : compute node, control node and storage node
node-17 (offline) : compute node, control node and storage node
osc99 (adding) : compute node, control node and storage node
Because when I deploy a new host (osc99) with the command below (the node-17 line is commented out in the multinode inventory file):
kolla-ansible -i multinode deploy --limit osc99
kolla-ansible reports this error:
TASK [keystone : include_tasks] ********************************************************************************************************************************************************************************************************
included: .../share/kolla-ansible/ansible/roles/keystone/tasks/init_fernet.yml for osc99
TASK [keystone : Waiting for Keystone SSH port to be UP] *******************************************************************************************************************************************************************************
ok: [osc99]
TASK [keystone : Initialise fernet key authentication] *********************************************************************************************************************************************************************************
ok: [osc99 -> node-14]
TASK [keystone : Run key distribution] *************************************************************************************************************************************************************************************************
fatal: [osc99 -> node-14]: FAILED! => {"changed": true, "cmd": ["docker", "exec", "-t", "keystone_fernet", "/usr/bin/fernet-push.sh"], "delta": "0:00:04.006900", "end": "2021-07-12 10:14:05.217609", "msg": "non-zero return code", "rc": 255, "start": "2021-07-12 10:14:01.210709", "stderr": "", "stderr_lines": [], "stdout": "Warning: Permanently added '[node.15]:8023' (ECDSA) to the list of known hosts.\r\r\nssh: connect to host node.17 port 8023: No route to host\r\r\nrsync: connection unexpectedly closed (0 bytes received so far) [sender]\r\nrsync error: unexplained error (code 255) at io.c(235) [sender=3.1.2]", "stdout_lines": ["Warning: Permanently added '[node.15]:8023' (ECDSA) to the list of known hosts.", "", "ssh: connect to host node.17 port 8023: No route to host", "", "rsync: connection unexpectedly closed (0 bytes received so far) [sender]", "rsync error: unexplained error (code 255) at io.c(235) [sender=3.1.2]"]}
NO MORE HOSTS LEFT *********************************************************************************************************************************************************************************************************************
PLAY RECAP *****************************************************************************************************************************************************************************************************************************
osc99 : ok=120 changed=55 unreachable=0 failed=1 skipped=31 rescued=0 ignored=1
How can I fix this error? This is the main point of whether or not I can remove the offline host.
Maybe I could fix it by changing the init_fernet.yml file:
node-14:~$ cat .../share/kolla-ansible/ansible/roles/keystone/tasks/init_fernet.yml
---
- name: Waiting for Keystone SSH port to be UP
  wait_for:
    host: "{{ api_interface_address }}"
    port: "{{ keystone_ssh_port }}"
    connect_timeout: 1
  register: check_keystone_ssh_port
  until: check_keystone_ssh_port is success
  retries: 10
  delay: 5

- name: Initialise fernet key authentication
  become: true
  command: "docker exec -t keystone_fernet kolla_keystone_bootstrap {{ keystone_username }} {{ keystone_groupname }}"
  register: fernet_create
  changed_when: fernet_create.stdout.find('localhost | SUCCESS => ') != -1 and (fernet_create.stdout.split('localhost | SUCCESS => ')[1]|from_json).changed
  until: fernet_create.stdout.split()[2] == 'SUCCESS' or fernet_create.stdout.find('Key repository is already initialized') != -1
  retries: 10
  delay: 5
  run_once: True
  delegate_to: "{{ groups['keystone'][0] }}"

- name: Run key distribution
  become: true
  command: docker exec -t keystone_fernet /usr/bin/fernet-push.sh
  run_once: True
  delegate_to: "{{ groups['keystone'][0] }}"
by changing delegate_to: "{{ groups['keystone'][0] }}"? But I don't know how to implement that.
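If it helps, what that delegate_to expression resolves to can be printed straight from the inventory with Ansible's debug module (a generic sketch; it assumes the multinode inventory still defines the keystone group, as the standard kolla-ansible layout does):
# Show the keystone group and the host the fernet tasks delegate to
ansible -i multinode localhost -m debug -a "var=groups['keystone']"
ansible -i multinode localhost -m debug -a "var=groups['keystone'][0]"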
I used scaffolding to generate a new microservice, then I made the following configuration for MongoDB:
logging:
  level:
    ROOT: DEBUG
    io.github.jhipster: DEBUG
    com.fzai.fileservice: DEBUG

eureka:
  instance:
    prefer-ip-address: true
  client:
    service-url:
      defaultZone: http://admin:${jhipster.registry.password}@localhost:8761/eureka/

spring:
  profiles:
    active: dev
    include:
      - swagger
      # Uncomment to activate TLS for the dev profile
      #- tls
  devtools:
    restart:
      enabled: true
      additional-exclude: static/**
    livereload:
      enabled: false # we use Webpack dev server + BrowserSync for livereload
  jackson:
    serialization:
      indent-output: true
  data:
    mongodb:
      host: 42.193.124.204
      port: 27017
      username: admin
      password: admin123
      authentication-database: fileService
      database: fileService
  mail:
    host: localhost
    port: 25
    username:
    password:
  messages:
    cache-duration: PT1S # 1 second, see the ISO 8601 standard
  thymeleaf:
    cache: false
  sleuth:
    sampler:
      probability: 1 # report 100% of traces
  zipkin: # Use the "zipkin" Maven profile to have the Spring Cloud Zipkin dependencies
    base-url: http://localhost:9411
    enabled: false
    locator:
      discovery:
        enabled: true

server:
  port: 8081

# ===================================================================
# JHipster specific properties
#
# Full reference is available at: https://www.jhipster.tech/common-application-properties/
# ===================================================================
jhipster:
  cache: # Cache configuration
    hazelcast: # Hazelcast distributed cache
      time-to-live-seconds: 3600
      backup-count: 1
      management-center: # Full reference is available at: http://docs.hazelcast.org/docs/management-center/3.9/manual/html/Deploying_and_Starting.html
        enabled: false
        update-interval: 3
        url: http://localhost:8180/mancenter
  # CORS is disabled by default on microservices, as you should access them through a gateway.
  # If you want to enable it, please uncomment the configuration below.
  cors:
    allowed-origins: "*"
    allowed-methods: "*"
    allowed-headers: "*"
    exposed-headers: "Authorization,Link,X-Total-Count"
    allow-credentials: true
    max-age: 1800
  security:
    client-authorization:
      access-token-uri: http://uaa/oauth/token
      token-service-id: uaa
      client-id: internal
      client-secret: internal
  mail: # specific JHipster mail property, for standard properties see MailProperties
    base-url: http://127.0.0.1:8081
  metrics:
    logs: # Reports metrics in the logs
      enabled: false
      report-frequency: 60 # in seconds
  logging:
    use-json-format: false # By default, logs are not in Json format
    logstash: # Forward logs to logstash over a socket, used by LoggingConfiguration
      enabled: false
      host: localhost
      port: 5000
      queue-size: 512
  audit-events:
    retention-period: 30 # Number of days before audit events are deleted.

oauth2:
  signature-verification:
    public-key-endpoint-uri: http://uaa/oauth/token_key
    #ttl for public keys to verify JWT tokens (in ms)
    ttl: 3600000
    #max. rate at which public keys will be fetched (in ms)
    public-key-refresh-rate-limit: 10000
  web-client-configuration:
    #keep in sync with UAA configuration
    client-id: web_app
    secret: changeit
An error occurred while I was running the project:
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'mongobee' defined in class path resource [com/fzai/fileservice/config/DatabaseConfiguration.class]: Invocation of init method failed; nested exception is com.mongodb.MongoQueryException: Query failed with error code 13 and error message 'not authorized on fileService to execute command { find: "system.indexes", filter: { ns: "fileService.dbchangelog", key: { changeId: 1, author: 1 } }, limit: 1, singleBatch: true, $db: "fileService" }' on server 42.193.124.204:27017
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1771)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:593)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:515)
at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:320)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:318)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:199)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:847)
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:877)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:549)
at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:141)
at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:744)
at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:391)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:312)
at com.fzai.fileservice.FileServiceApp.main(FileServiceApp.java:70)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.springframework.boot.devtools.restart.RestartLauncher.run(RestartLauncher.java:49)
Caused by: com.mongodb.MongoQueryException: Query failed with error code 13 and error message 'not authorized on fileService to execute command { find: "system.indexes", filter: { ns: "fileService.dbchangelog", key: { changeId: 1, author: 1 } }, limit: 1, singleBatch: true, $db: "fileService" }' on server 42.193.124.204:27017
at com.mongodb.operation.FindOperation$1.call(FindOperation.java:706)
at com.mongodb.operation.FindOperation$1.call(FindOperation.java:695)
at com.mongodb.operation.OperationHelper.withConnectionSource(OperationHelper.java:462)
at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:406)
at com.mongodb.operation.FindOperation.execute(FindOperation.java:695)
at com.mongodb.operation.FindOperation.execute(FindOperation.java:83)
at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:179)
at com.mongodb.client.internal.FindIterableImpl.first(FindIterableImpl.java:198)
at com.github.mongobee.dao.ChangeEntryIndexDao.findRequiredChangeAndAuthorIndex(ChangeEntryIndexDao.java:35)
at com.github.mongobee.dao.ChangeEntryDao.ensureChangeLogCollectionIndex(ChangeEntryDao.java:121)
at com.github.mongobee.dao.ChangeEntryDao.connectMongoDb(ChangeEntryDao.java:61)
at com.github.mongobee.Mongobee.execute(Mongobee.java:143)
at com.github.mongobee.Mongobee.afterPropertiesSet(Mongobee.java:126)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1830)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1767)
... 19 common frames omitted
But in another simple Spring Boot project I used the same configuration, and it runs and connects successfully:
spring:
  application:
    name: springboot1
  data:
    mongodb:
      host: 42.193.124.204
      port: 27017
      username: admin
      password: admin123
      authentication-database: fileService
      database: fileService
This is the user and role I created:
{
    "_id" : "fileService.admin",
    "userId" : UUID("03f75395-f129-4273-b6a6-b2dc3d1f7974"),
    "user" : "admin",
    "db" : "fileService",
    "roles" : [
        {
            "role" : "dbOwner",
            "db" : "fileService"
        },
        {
            "role" : "readWrite",
            "db" : "fileService"
        }
    ],
    "mechanisms" : [
        "SCRAM-SHA-1",
        "SCRAM-SHA-256"
    ]
}
I want to know what's wrong.
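One way to narrow this down might be to reproduce the failing query outside Spring with the mongo shell, using exactly the credentials from the configuration above; if this also fails with error code 13, the problem is on the MongoDB side rather than in the new microservice (a hedged sketch reusing the values above):
# Authenticate the same way the application does and run the query mongobee issues
mongo --host 42.193.124.204 --port 27017 -u admin -p admin123 --authenticationDatabase fileService --eval 'db.getSiblingDB("fileService").getCollection("system.indexes").find({}).limit(1).toArray()'
# Show the roles the server actually grants this user
mongo --host 42.193.124.204 --port 27017 -u admin -p admin123 --authenticationDatabase fileService --eval 'printjson(db.getSiblingDB("fileService").getUser("admin"))'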
I'm trying to add 2 secondaries to a MongoDB replica set after a successful initialization, but unfortunately it is failing.
repset_init.js file contents:
rs.add( { host: "10.0.1.170:27017" } )
rs.add( { host: "10.0.2.157:27017" } )
rs.add( { host: "10.0.3.88:27017" } )
Command I executed to add the replica set members:
mongo -u xxxxx -p yyyy --authenticationDatabase admin --port 27017 repset_init.js
The command hangs in the terminal, and below is the log output:
{"t":{"$date":"2021-06-09T12:29:34.939+00:00"},"s":"I", "c":"REPL", "id":21393, "ctx":"conn2","msg":"Found self in config","attr":{"hostAndPort":"MongoD-1:27017"}}
{"t":{"$date":"2021-06-09T12:29:34.939+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn2","msg":"Slow query","attr":{"type":"command","ns":"local.system.replset","appName":"MongoDB Shell","command":{"replSetReconfig":{"_id":"Shard_0","version":2,"protocolVersion":1,"writeConcernMajorityJournalDefault":true,"members":[{"_id":0,"host":"MongoD-1:27017","arbiterOnly":false,"buildIndexes":true,"hidden":false,"priority":1.0,"tags":{},"slaveDelay":0,"votes":1},{"host":"10.0.2.157:27017","_id":1.0}],"settings":{"chainingAllowed":true,"heartbeatIntervalMillis":2000,"heartbeatTimeoutSecs":10,"electionTimeoutMillis":10000,"catchUpTimeoutMillis":-1,"catchUpTakeoverDelayMillis":30000,"getLastErrorModes":{},"getLastErrorDefaults":{"w":1,"wtimeout":0},"replicaSetId":{"$oid":"60c0b3566991d93637465f55"}}},"lsid":{"id":{"$uuid":"263568b4-ec31-4ea6-8f72-69cec80c1a7c"}},"$db":"admin"},"numYields":0,"reslen":38,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":3}},"ReplicationStateTransition":{"acquireCount":{"w":5}},"Global":{"acquireCount":{"r":1,"w":4}},"Database":{"acquireCount":{"w":2,"W":1}},"Collection":{"acquireCount":{"w":2}},"Mutex":{"acquireCount":{"r":2}}},"flowControl":{"acquireCount":2,"timeAcquiringMicros":3},"storage":{},"protocol":"op_msg","durationMillis":151}}
{"t":{"$date":"2021-06-09T12:29:34.940+00:00"},"s":"I", "c":"REPL", "id":21215, "ctx":"ReplCoord-1","msg":"Member is in new state","attr":{"hostAndPort":"10.0.2.157:27017","newState":"STARTUP"}}
{"t":{"$date":"2021-06-09T12:29:34.941+00:00"},"s":"I", "c":"REPL", "id":4508702, "ctx":"conn2","msg":"Waiting for the current config to propagate to a majority of nodes"}
{"t":{"$date":"2021-06-09T12:33:55.701+00:00"},"s":"I", "c":"CONTROL", "id":20712, "ctx":"LogicalSessionCacheReap","msg":"Sessions collection is not set up; waiting until next sessions reap interval","attr":{"error":"ShardingStateNotInitialized: sharding state is not yet initialized"}}
{"t":{"$date":"2021-06-09T12:33:55.701+00:00"},"s":"I", "c":"CONTROL", "id":20714, "ctx":"LogicalSessionCacheRefresh","msg":"Failed to refresh session cache, will try again at the next refresh interval","attr":{"error":"ShardingStateNotInitialized: sharding state is not yet initialized"}}
{"t":{"$date":"2021-06-09T12:34:35.029+00:00"},"s":"I", "c":"CONNPOOL", "id":22572, "ctx":"MirrorMaestro","msg":"Dropping all pooled connections","attr":{"hostAndPort":"10.0.2.157:27017","error":"ShutdownInProgress: Pool for 10.0.2.157:27017 has expired."}}
Additional details:
Shard_0:PRIMARY> rs.printSlaveReplicationInfo()
WARNING: printSlaveReplicationInfo is deprecated and may be removed in the next major release. Please use printSecondaryReplicationInfo instead.
source: 10.0.2.157:27017
syncedTo: Thu Jan 01 1970 00:00:00 GMT+0000 (UTC) 1623243005 secs (450900.83 hrs) behind the primary
I am able to reach the node on port 27017:
telnet 10.0.2.157 27017
Trying 10.0.2.157...
Connected to 10.0.2.157.
Escape character is '^]'.
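It may also help to capture the primary's view of the replica set while the command hangs, using the same placeholder credentials as above:
# Dump the current replica set status and configuration from the primary
mongo -u xxxxx -p yyyy --authenticationDatabase admin --port 27017 --eval 'printjson(rs.status())'
mongo -u xxxxx -p yyyy --authenticationDatabase admin --port 27017 --eval 'printjson(rs.conf())'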
My config file
net:
  bindIp: 0.0.0.0
  port: 27017
  ssl: {}
processManagement:
  fork: "true"
  pidFilePath: /var/run/mongodb/mongod.pid
replication:
  replSetName: Shard_0
security:
  authorization: enabled
  keyFile: /etc/zzzzzkey.key
setParameter:
  authenticationMechanisms: SCRAM-SHA-256
sharding:
  clusterRole: shardsvr
storage:
  dbPath: /data/dbdata
  engine: wiredTiger
systemLog:
  destination: file
  path: /data/log/mongodb.log
I'm initializing the replica set using the command below:
mongo --host 127.0.0.1 --port {{mongod_port}} --eval 'printjson(rs.initiate())'
I am not sure what is causing this issue. Could you please help me?
The command looks a bit strange:
{
  "replSetReconfig": {
    "_id": "Shard_0",
    "members": [
      { "_id": 0, "host": "MongoD-1:27017", "arbiterOnly": false, "hidden": false, "priority": 1.0, "slaveDelay": 0, "votes": 1 },
      { "_id": 1.0, "host": "10.0.2.157:27017" }
    ]
  }
}
Why do you name your replica set Shard_0? Are you trying to set up a sharded cluster?
You add _id: 0, host: "MongoD-1:27017" and _id: 1.0, host: "10.0.2.157:27017", which is not consistent: you mix a hostname and an IP address, and the _id values 0 and 1.0 are also confusing.
What do your config files look like, and how did you start the MongoDB services?
Our MongoDB setup is deployed with 2 shards, each with 1 master server and 2 slave servers.
The four slave servers run mongo config servers as proxies, and two of the slave servers run arbiters.
But MongoDB cannot be used now.
I can connect to 192.168.0.1:8000 (mongos) and execute queries like 'use database' or 'show dbs', but I cannot execute queries in a chosen database, such as 'db.foo.count()' or 'db.foo.findOne()'.
Here is the error log:
mongos> db.dev.count()
Fri Aug 16 12:55:36 uncaught exception: count failed: {
"assertion" : "DBClientBase::findN: transport error: 10.81.4.72:7100 query: { setShardVersion: \"\", init: true, configdb: \"10.81.4.72:7300,10.42.50.26:7300,10.81.51.235:7300\", serverID: ObjectId('520db0a51fa00999772612b9'), authoritative: true }",
"assertionCode" : 10276,
"errmsg" : "db assertion failure",
"ok" : 0
}
Fri Aug 16 11:23:29 [conn8431] DBClientCursor::init call() failed
Fri Aug 16 11:23:29 [conn8430] Socket recv() errno:104 Connection reset by peer 10.81.4.72:7100
Fri Aug 16 11:23:29 [conn8430] SocketException: remote: 10.81.4.72:7100 error: 9001 socket exception [1] server [10.81.4.72:7100]
Fri Aug 16 11:23:29 [conn8430] DBClientCursor::init call() failed
Fri Aug 16 11:23:29 [conn8430] DBException in process: could not initialize cursor across all shards because : DBClientBase::findN: transport error: 10.81.4.72:7100 query: { setShardVersion: "", init: true, configdb: "10.81.4.72:7300,10.42.50.26:7300,10.81.51.235:7300", serverID: ObjectId('520d99c972581e6a124d0561'), authoritative: true } # s01/10.36.31.36:7100,10.42.50.24:7100,10.81.4.72:7100
I can only start one mongos; queries will not execute if more than one mongos is running at the same time. Error log:
mongos> db.dev.count()
Fri Aug 16 15:12:29 uncaught exception: count failed: {
  "assertion" : "DBClientBase::findN: transport error: 10.81.4.72:7100 query: { setShardVersion: \"\", init: true, configdb: \"10.81.4.72:7300,10.42.50.26:7300,10.81.51.235:7300\", serverID: ObjectId('520dd04967557902f73a9fba'), authoritative: true }",
  "assertionCode" : 10276,
  "errmsg" : "db assertion failure",
  "ok" : 0
}
Could you please clarify whether your setup was working before, or whether you are just setting it up now?
To repair your MongoDB, you might want to follow this link:
http://docs.mongodb.org/manual/tutorial/recover-data-following-unexpected-shutdown/
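Before attempting a repair, it may also be worth checking whether the shard member the transport error points at (10.81.4.72:7100) is reachable and answering at all; a minimal check from the mongos host (add credentials if that shard has authentication enabled):
# Ping the shard member referenced in the error message
mongo 10.81.4.72:7100 --eval 'printjson(db.adminCommand({ ping: 1 }))'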
References
MongoDB Documentation: Deploying a Shard-Cluster
MongoDB Documentation: Add Shards to an existing cluster
Older, outdated(!) info:
YouTube video on setting up sharding for MongoDB
Corresponding blog post on blog.serverdensity.com