Coco-annotator server crashes unexpectedly - docker-compose

I'm running coco-annotator on a dedicated e2-micro instance on GCP. At first it ran smoothly, but recently it has been hanging very frequently, and not only does the annotation URL become inaccessible, the whole VM does too! I have to manually stop and start the instance on GCP.
I checked the docker-compose logs, and these are the last entries before the machine hung:
annotator_message_q | 2022-04-19 19:09:16.735303+00:00 [erro] <0.183.0> ** Generic server aten_detector terminating
annotator_message_q | 2022-04-19 19:09:16.735303+00:00 [erro] <0.183.0> ** Last message in was poll
annotator_message_q | 2022-04-19 19:09:16.735303+00:00 [erro] <0.183.0> ** When Server state == {state,#Ref<0.2293947595.4227596289.176964>,5000,0.99,
annotator_message_q | 2022-04-19 19:09:16.735303+00:00 [erro] <0.183.0> #{},#{}}
annotator_message_q | 2022-04-19 19:09:16.735303+00:00 [erro] <0.183.0> ** Reason for termination ==
annotator_message_q | 2022-04-19 19:09:16.735303+00:00 [erro] <0.183.0> ** {{timeout,{gen_server,call,[aten_sink,get_failure_probabilities]}},
annotator_message_q | 2022-04-19 19:09:16.735303+00:00 [erro] <0.183.0> [{gen_server,call,2,[{file,"gen_server.erl"},{line,239}]},
annotator_message_q | 2022-04-19 19:09:16.735303+00:00 [erro] <0.183.0> {aten_detector,handle_info,2,[{file,"src/aten_detector.erl"},{line,109}]},
annotator_message_q | 2022-04-19 19:09:16.735303+00:00 [erro] <0.183.0> {gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,695}]},
annotator_message_q | 2022-04-19 19:09:16.735303+00:00 [erro] <0.183.0> {gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,771}]},
annotator_message_q | 2022-04-19 19:09:16.735303+00:00 [erro] <0.183.0> {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}
annotator_message_q | 2022-04-19 19:09:16.735303+00:00 [erro] <0.183.0>
annotator_message_q | 2022-04-19 19:09:18.480362+00:00 [erro] <0.183.0> crasher:
It seems that the problem is linked to RabbitMQ, but I have little to no knowledge of it. Can anyone give me some hints as to where the problem may be?
Thanks in advance.

So, since this problem arose I moved to a bigger machine. It had 1GB of RAM before; now it's running on a 4GB machine, and I haven't had this problem since. This issue on GitHub also seems to suggest that memory was the problem.

Related

Unable to connect the mongodb container to node container in docker

I created 3 Docker containers: 2 from images in this repo and 1 from the public MongoDB image. I started all three containers using sudo docker-compose -f docker-compose.yaml up
docker-compose.yaml is:
version: '3'
services:
  frontend:
    image: samar080301/mern-frontend:1.0
    ports:
      - 3000:3000
  backend:
    image: samar080301/mern-backend:1.0
    ports:
      - 5000:5000
  mongodb:
    image: mongo:latest
    ports:
      - 27017:27017
But the Node server couldn't connect to MongoDB and gave this error:
backend_1 | > crud-app@1.0.0 start /home/app
backend_1 | > node server.js
backend_1 |
backend_1 | (node:18) DeprecationWarning: current Server Discovery and Monitoring engine is deprecated, and will be removed in a future version. To use the new Server Discover and Monitoring engine, pass option { useUnifiedTopology: true } to the MongoClient constructor.
backend_1 | (Use `node --trace-deprecation ...` to show where the warning was created)
backend_1 | App running on port 5000
backend_1 | Error with the database! MongoNetworkError: failed to connect to server [localhost:27017] on first connect [Error: connect ECONNREFUSED 127.0.0.1:27017
backend_1 | at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1141:16) {
backend_1 | name: 'MongoNetworkError'
backend_1 | }]
backend_1 | at Pool.<anonymous> (/home/app/node_modules/mongodb/lib/core/topologies/server.js:438:11)
backend_1 | at Pool.emit (events.js:315:20)
backend_1 | at /home/app/node_modules/mongodb/lib/core/connection/pool.js:562:14
backend_1 | at /home/app/node_modules/mongodb/lib/core/connection/pool.js:995:11
backend_1 | at /home/app/node_modules/mongodb/lib/core/connection/connect.js:32:7
backend_1 | at callback (/home/app/node_modules/mongodb/lib/core/connection/connect.js:280:5)
backend_1 | at Socket.<anonymous> (/home/app/node_modules/mongodb/lib/core/connection/connect.js:310:7)
backend_1 | at Object.onceWrapper (events.js:422:26)
backend_1 | at Socket.emit (events.js:315:20)
backend_1 | at emitErrorNT (internal/streams/destroy.js:84:8)
backend_1 | at processTicksAndRejections (internal/process/task_queues.js:84:21)
backend_1 | (node:18) UnhandledPromiseRejectionWarning: MongoNetworkError: failed to connect to server [localhost:27017] on first connect [Error: connect ECONNREFUSED 127.0.0.1:27017
backend_1 | at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1141:16) {
backend_1 | name: 'MongoNetworkError'
backend_1 | }]
Code of backend/db.js:
const mongoose = require('mongoose');
// Allow Promises
mongoose.Promise = global.Promise;
// Connection
mongoose.connect('mongodb://localhost:27017/db_test', { useNewUrlParser: true });
// Validation
mongoose.connection
  .once('open', () => console.log('Connected to the database!'))
  .on('error', err => console.log('Error with the database!', err));
Terminal Output of docker inspect mongodb:
Terminal output after adding the mongo uri as environment variable:
backend_1 | App running on port 5000
backend_1 | (node:19) UnhandledPromiseRejectionWarning: MongooseError: The `uri` parameter to `openUri()` must be a string, got "undefined". Make sure the first parameter to `mongoose.connect()` or `mongoose.createConnection()` is a string.
backend_1 | (node:19) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
New error:
backend_1 | Error with the database! MongoNetworkError: failed to connect to server [merncrudapp_mongodb_1:27017] on first connect [Error: connect ECONNREFUSED 172.23.0.3:27017
backend_1 | at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1141:16) {
backend_1 | name: 'MongoNetworkError'
backend_1 | }]
Try connecting with the container (service) name instead of localhost. For example, to connect to MongoDB from another container on the same Docker network, you can use mongodb:27017.
It should work.
When you run docker-compose up, the following happens:
A network called merncrudapp_default is created (the name is derived from the project directory, MernCrudApp).
A container is created using the frontend's configuration. It joins that network under the name frontend.
A container is created using the backend's configuration. It joins that network under the name backend.
A container is created using mongodb's configuration. It joins that network under the name mongodb.
Now, if you use mongodb://localhost:27017/db_test to connect to the database, the Node app looks for MongoDB inside the backend container itself, and you get a connection error because nothing is listening there.
To remedy this, change the MongoDB connection string to mongodb://mongodb:27017/db_test so that it points at the mongodb container.
Additional comments
I would recommend the following to avoid some problems you might face in the future with the current configuration.
Add the connection string as an environment variable. This makes it easier to switch DB instances without rebuilding the container.
Since the backend application depends on the database, add depends_on to the docker-compose file so that the MongoDB container starts before the backend container.
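The first recommendation can be sketched as below. The backend in the question is Node, so this Python version only illustrates the pattern. The variable name MONGO_URI is an assumption (nothing in the original project defines it), and the fallback is the service-name URI discussed above, so an unset variable does not end up as "undefined" the way it did in the question's second log.

```python
import os

# Hypothetical variable name; the original project does not define it.
MONGO_URI_VAR = "MONGO_URI"

# Fallback that uses the Compose service name instead of localhost.
DEFAULT_MONGO_URI = "mongodb://mongodb:27017/db_test"


def get_mongo_uri(environ=None):
    """Return the MongoDB URI from the environment, or the service-name default."""
    env = os.environ if environ is None else environ
    uri = env.get(MONGO_URI_VAR, "").strip()
    # An unset or empty variable falls back to the default instead of failing later.
    return uri or DEFAULT_MONGO_URI
```

The variable would be set under the backend service's environment key in docker-compose.yaml, and the application would pass the returned string to its database client.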
Modify your backend/db.js code.
You are getting the error because when you use 'localhost' in the code, the backend container tries to connect to itself on port 27017, which is not in use there.
// Connection
//mongodb is a container name of mongo image
mongoose.connect('mongodb://mongodb:27017/db_test', { useNewUrlParser: true });
Pro tip:
1) If containers are on the same Docker network, use the hostname or Docker container name to communicate with each other.
2) Never use a container IP address in code unless the IPs are assigned manually. The IP may change whenever you restart the container.

Unable to start FastAPI server with postgresql using docker compose

I am creating a FastAPI server with simple CRUD functionalities with Postgresql as database. Everything works well in my local environment. However, when I tried to make it run in containers using docker-compose up, it failed. I was getting this error:
rest_api_1 | File "/usr/local/lib/python3.8/site-packages/psycopg2/__init__.py", line 122, in connect
rest_api_1 | conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
rest_api_1 | sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not connect to server: Connection refused
rest_api_1 | Is the server running on host "db" (172.29.0.2) and accepting
rest_api_1 | TCP/IP connections on port 5432?
rest_api_1 |
rest_api_1 | (Background on this error at: https://sqlalche.me/e/14/e3q8)
networks_lab2_rest_api_1 exited with code 1
The directory structure:
├── Dockerfile
├── README.md
├── __init__.py
├── app
│   ├── __init__.py
│   ├── __pycache__
│   ├── crud.py
│   ├── database.py
│   ├── main.py
│   ├── models.py
│   ├── object_store
│   └── schemas.py
├── docker-compose.yaml
├── requirements.txt
├── tests
│   ├── __init__.py
│   ├── __pycache__
│   ├── assets
│   ├── test_create.py
│   ├── test_delete.py
│   ├── test_file.py
│   ├── test_get.py
│   ├── test_heartbeat.py
│   └── test_put.py
└── venv
├── bin
├── include
├── lib
└── pyvenv.cfg
My docker-compose.yaml
version: "3"
services:
  db:
    image: postgres:13-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data/
    environment:
      POSTGRES_USER: ${DATABASE_TYPE}
      POSTGRES_PASSWORD: ${DATABASE_PASSWORD}
      POSTGRES_DB: ${DATABASE_NAME}
    ports:
      - "5432:5432"
  rest_api:
    build: .
    command: uvicorn app.main:app --host 0.0.0.0
    env_file:
      - ./.env
    volumes:
      - .:/app
    ports:
      - "8000:8000"
    depends_on:
      - db
volumes:
  postgres_data:
My Dockerfile for the fastAPI server (under ./app)
FROM python:3.8-slim-buster
RUN apt-get update \
    && apt-get -y install libpq-dev gcc \
    && pip install psycopg2
WORKDIR /app
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
COPY requirements.txt .
RUN pip install -r requirements.txt
# copy project
COPY . .
My database.py
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from dotenv import load_dotenv
import os
# def create_connection_string():
#     load_dotenv()
#     db_type = os.getenv("DATABASE_TYPE")
#     username = os.getenv("DATABASE_USERNAME")
#     password = os.getenv("DATABASE_PASSWORD")
#     host = os.getenv("DATABASE_HOST")
#     port = os.getenv("DATABASE_PORT")
#     name = os.getenv("DATABASE_NAME")
#
#     return "{0}://{1}:{2}@{3}/{4}".format(db_type, username, password, host, name)
SQLALCHEMY_DATABASE_URI = "postgresql://postgres:postgres@db:5432/postgres"
engine = create_engine(
    SQLALCHEMY_DATABASE_URI
)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
Base = declarative_base()
My main.py
from typing import List, Optional
import os, base64, shutil
from functools import wraps
from fastapi import Depends, FastAPI, HTTPException, UploadFile, File, Request, Header
from fastapi.responses import FileResponse
from sqlalchemy.orm import Session
from . import crud, models, schemas
from .database import SessionLocal, engine
models.Base.metadata.create_all(bind=engine)
app = FastAPI()
SECRET_KEY = os.getenv("SECRET")
# Dependency
def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

def check_request_header(x_token: str = Header(...)):
    if x_token != SECRET_KEY:
        raise HTTPException(status_code=401, detail="Unauthorized")

# endpoints
@app.get("/heartbeat", dependencies=[Depends(check_request_header)], status_code=200)
def heartbeat():
    return "The connection is up"
A more complete log is:
Creating db_1 ... done
Creating rest_api_1 ... done
Attaching to db_1, rest_api_1
db_1 | The files belonging to this database system will be owned by user "postgres".
db_1 | This user must also own the server process.
db_1 |
db_1 | The database cluster will be initialized with locale "en_US.utf8".
db_1 | The default database encoding has accordingly been set to "UTF8".
db_1 | The default text search configuration will be set to "english".
db_1 |
...
db_1 | selecting dynamic shared memory implementation ... posix
db_1 | selecting default max_connections ... 100
db_1 | selecting default shared_buffers ... 128MB
db_1 | selecting default time zone ... UTC
db_1 | creating configuration files ... ok
db_1 | running bootstrap script ... ok
db_1 | performing post-bootstrap initialization ... sh: locale: not found
db_1 | 2021-09-29 18:13:35.027 UTC [31] WARNING: no usable system locales were found
rest_api_1 | Traceback (most recent call last):
rest_api_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3240, in _wrap_pool_connect
rest_api_1 | return fn()
...
rest_api_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 584, in connect
rest_api_1 | return self.dbapi.connect(*cargs, **cparams)
rest_api_1 | File "/usr/local/lib/python3.8/site-packages/psycopg2/__init__.py", line 122, in connect
rest_api_1 | conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
rest_api_1 | psycopg2.OperationalError: could not connect to server: Connection refused
rest_api_1 | Is the server running on host "db" (172.29.0.2) and accepting
rest_api_1 | TCP/IP connections on port 5432?
rest_api_1 |
rest_api_1 |
rest_api_1 | The above exception was the direct cause of the following exception:
rest_api_1 |
rest_api_1 | Traceback (most recent call last):
rest_api_1 | File "/usr/local/bin/uvicorn", line 8, in <module>
rest_api_1 | sys.exit(main())
...
rest_api_1 | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 584, in connect
rest_api_1 | return self.dbapi.connect(*cargs, **cparams)
rest_api_1 | File "/usr/local/lib/python3.8/site-packages/psycopg2/__init__.py", line 122, in connect
rest_api_1 | conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
rest_api_1 | sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not connect to server: Connection refused
rest_api_1 | Is the server running on host "db" (172.29.0.2) and accepting
rest_api_1 | TCP/IP connections on port 5432?
rest_api_1 |
rest_api_1 | (Background on this error at: https://sqlalche.me/e/14/e3q8)
rest_api_1 exited with code 1
db_1 | ok
db_1 | syncing data to disk ... ok
db_1 |
db_1 |
db_1 | Success. You can now start the database server using:
...
db_1 | 2021-09-29 18:13:36.325 UTC [1] LOG: starting PostgreSQL 13.4 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20210424) 10.3.1 20210424, 64-bit
db_1 | 2021-09-29 18:13:36.325 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
db_1 | 2021-09-29 18:13:36.325 UTC [1] LOG: listening on IPv6 address "::", port 5432
db_1 | 2021-09-29 18:13:36.328 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
db_1 | 2021-09-29 18:13:36.332 UTC [48] LOG: database system was shut down at 2021-09-29 18:13:36 UTC
db_1 | 2021-09-29 18:13:36.336 UTC [1] LOG: database system is ready to accept connections
I have gone through a very extensive search and read docs/tutorials about running FastAPI server and Postgresql with docker-compose, such as
https://testdriven.io/blog/fastapi-docker-traefik/
https://github.com/AmishaChordia/FastAPI-PostgreSQL-Docker/blob/master/FastAPI/docker-compose.yml
https://www.jeffastor.com/blog/pairing-a-postgresql-db-with-your-dockerized-fastapi-app
Their approach is the same as mine but it just keeps giving me this Connection refused Is the server running on host "db" (172.29.0.2) and accepting TCP/IP connections on port 5432? error message ...
Can anyone help me out here? Any help will be appreciated !!
First, the SQLALCHEMY_DATABASE_URI in database.py should match the user, password and database name supplied in Your docker-compose.yaml. Ensure that You are running docker-compose up with the correct environment. In Your case, the environment for docker-compose up should be:
DATABASE_TYPE=postgres
DATABASE_PASSWORD=postgres
DATABASE_NAME=postgres
But I think that Your problem is somewhere else. Even if You declare Your API service as depends_on: - db, the postgres server may not be ready yet. depends_on ensures that the dependent container is not started before the referenced one, but it does not ensure anything more. It takes some time for the postgres server to initialize and start accepting connections inside the running container, and if Your API tries to connect before that happens, it will fail.
The common and simplest solution is to write a bit of code that checks over and over whether the database is up and running before the actual connection happens. As You are not supplying the whole traceback (actually, You have replaced the most important part with ...), I can only guess on which line of Your code the connection is triggered. I would recommend modifying Your database.py to look like this (not tested, may require some adjustments):
from sqlalchemy import create_engine, text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from dotenv import load_dotenv
import os
import time


def wait_for_db(db_uri):
    """Blocks until a basic query against the database succeeds."""
    _local_engine = create_engine(db_uri)
    _LocalSessionLocal = sessionmaker(
        autocommit=False, autoflush=False, bind=_local_engine
    )
    up = False
    while not up:
        try:
            # Try to create a session to check if the DB is awake
            db_session = _LocalSessionLocal()
            # Try a basic query; wrapping it in text() also works on SQLAlchemy 2.0
            db_session.execute(text("SELECT 1"))
            db_session.commit()
        except Exception as err:
            print(f"Connection error: {err}")
            # Wait before retrying so the loop does not spin
            time.sleep(2)
        else:
            up = True


SQLALCHEMY_DATABASE_URI = "postgresql://postgres:postgres@db:5432/postgres"
wait_for_db(SQLALCHEMY_DATABASE_URI)
engine = create_engine(
    SQLALCHEMY_DATABASE_URI
)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
Base = declarative_base()
A more sophisticated solution would be playing with docker-compose healthchecks (v2 only). For docker-compose v3, they recommend doing it manually, similar to the solution presented above.
To improve this solution, put wait_for_db in a Python command-line script and run it from the image entrypoint in a prestart stage. You will need a prestart stage in the entrypoint anyway for running migrations (You do include migrations in Your projects, right?)
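A prestart check of that kind can be sketched with the standard library alone. This is a swapped-in, weaker technique than the SELECT 1 probe: it only waits until a TCP connection to the database port succeeds, so it confirms that something is listening, not that queries work. The function name wait_for_port and its parameters are hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError on refusal, timeout, or DNS failure
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

An entrypoint script could call wait_for_port("db", 5432), using the service name and port from the question's compose file, and exit with a non-zero status when it returns False, before starting uvicorn.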
You could also handle retries by using Docker's restart mechanism. If you set max attempts and the right delay, the DB will most likely be ready by the second try, while infinite restarts are prevented.
rest_api:
  ...
  deploy:
    restart_policy:
      condition: on-failure
      delay: 5s # default
      max_attempts: 5
  ...
Note that I'm not a Docker expert, but this seems like it better aligns with the containers as cattle instead of pets paradigm. Why add complexity to the application when the issue can be handled by existing functionality in a higher layer?

How to connect the postgres database in docker

I have created a Rasa chatbot that asks for user information and stores it in a Postgres database. Locally it works. I have been trying to do the same in Docker, but it is not working. I'm new to Docker. Could anyone help me? Thanks in advance.
Docker-compose.yml
version: "3.0"
services:
  rasa:
    container_name: rasa
    image: rasa/rasa:2.8.1-full
    ports:
      - 5005:5005
    volumes:
      - ./:/app
    command:
      - run
      - -m
      - models
      - --enable-api
      - --cors
      - "*"
    depends_on:
      - action-server1
      - db
  action-server1:
    container_name: "action-server1"
    build:
      context: actions
    volumes:
      - ./actions:/app/actions
    ports:
      - "5055:5055"
    networks:
      - shan_network
  db:
    image: "postgres"
    environment:
      POSTGRESQL_USERNAME: "postgres"
      POSTGRESQL_PASSWORD: ""
      POSTGRESQL_DATABASE: "postgres"
      POSTGRES_HOST_AUTH_METHOD: "trust"
    volumes:
      - db-data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
volumes:
  db-data:
Logs: All services show as running in the logs, and I checked in Docker that postgres is running too.
db_1 |
db_1 | PostgreSQL Database directory appears to contain a database; Skipping initialization
db_1 |
db_1 | 2021-08-05 08:21:45.685 UTC [1] LOG: starting PostgreSQL 13.3 (Debian 13.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
db_1 | 2021-08-05 08:21:45.686 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
db_1 | 2021-08-05 08:21:45.686 UTC [1] LOG: listening on IPv6 address "::", port 5432
db_1 | 2021-08-05 08:21:45.699 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
db_1 | 2021-08-05 08:21:45.712 UTC [26] LOG: database system was shut down at 2021-08-05 08:21:25 UTC
db_1 | 2021-08-05 08:21:45.722 UTC [1] LOG: database system is ready to accept connections
Error:
action-server1 | warnings.warn(
action-server1 | Exception occurred while handling uri: 'http://action-server1:5055/webhook'
action-server1 | Traceback (most recent call last):
action-server1 | File "/opt/venv/lib/python3.8/site-packages/sanic/app.py", line 973, in handle_request
action-server1 | response = await response
action-server1 | File "/opt/venv/lib/python3.8/site-packages/rasa_sdk/endpoint.py", line 104, in webhook
action-server1 | result = await executor.run(action_call)
action-server1 | File "/opt/venv/lib/python3.8/site-packages/rasa_sdk/executor.py", line 398, in run
action-server1 | action(dispatcher, tracker, domain)
action-server1 | File "/app/actions/actions.py", line 148, in run
action-server1 | connection = psycopg2.connect(database="postgres", user='postgres', password='password',port='5432'
action-server1 | File "/opt/venv/lib/python3.8/site-packages/psycopg2/__init__.py", line 122, in connect
action-server1 | conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
action-server1 | psycopg2.OperationalError: could not connect to server: No such file or directory
action-server1 | Is the server running locally and accepting
action-server1 | connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
rasa | 2021-08-05 08:25:13 ERROR rasa.core.processor - Encountered an exception while running action 'action_save'.Bot will continue, but the actions events are lost. Please check the logs of your action server for more information.
rasa | Traceback (most recent call last):
rasa | File "/opt/venv/lib/python3.8/site-packages/rasa/core/actions/action.py", line 685, in run
rasa | response = await self.action_endpoint.request(
rasa | File "/opt/venv/lib/python3.8/site-packages/rasa/utils/endpoints.py", line 172, in request
rasa | raise ClientResponseError(
rasa | rasa.utils.endpoints.ClientResponseError: 500, Internal Server Error, body='b'<!DOCTYPE html><meta charset=UTF-8><title>500 \xe2\x80\x9
4 Internal Server Error</title><style>html { font-family: sans-serif }</style>\n<h1>\xe2\x9a\xa0\xef\xb8\x8f 500 \xe2\x80\x94 Internal Server Error</h1><p>
The server encountered an internal error and cannot complete your request.\n''
rasa |
rasa | The above exception was the direct cause of the following exception:
rasa |
rasa | Traceback (most recent call last):
rasa | File "/opt/venv/lib/python3.8/site-packages/rasa/core/processor.py", line 772, in _run_action
rasa | events = await action.run(
rasa | File "/opt/venv/lib/python3.8/site-packages/rasa/core/actions/action.py", line 709, in run
rasa | raise RasaException("Failed to execute custom action.") from e
rasa | rasa.shared.exceptions.RasaException: Failed to execute custom action.
Think of the containers in the stack as different physical or virtual machines. Your database is on one host and the chatbot is on another. Naturally the chatbot cannot find /var/run/postgresql/.s.PGSQL.5432 locally because it's in another container (as if on another computer), so you need to use a network connection to reach it:
# If host is not given it uses unix socket which you appear to have locally,
# thus add it here:
connection = psycopg2.connect(database="postgres",
                              user='postgres',
                              password='password',
                              host='db',  # name of the service in the stack
                              port='5432')
Also, your action-server1 service is configured to be in shan_network:
action-server1:
  networks:
    - shan_network
Therefore, action-server1 currently has no network access to the other services in this stack. db and rasa have no networks configured, so they use the default network, which docker-compose creates for you automatically. This is as if you had configured those services as follows:
db:
  image: "postgres"
  networks:
    - default
If you want action-server1 to appear in several networks, and thus be able to reach both the services in this stack and whatever is in shan_network, you need to add the service to the default network:
action-server1:
  networks:
    - shan_network
    - default
Alternatively, if you are unsure why there is a shan_network at all, you can simply remove the network key from the action-server1 service.

failed to initialize database, got error failed to connect to `host=user_db user=gorm database=gorm`

Envs
gorm.io/gorm v1.20.6
gorm.io/driver/postgres v1.0.
Error Texts
user_1 | [error] failed to initialize database, got error failed to connect to `host=user_db user=gorm database=gorm`: dial error (dial tcp 172.18.0.2:9920: connect: connection refused)
user_1 | panic: failed to connect to `host=user_db user=gorm database=gorm`: dial error (dial tcp 172.18.0.2:9920: connect: connection refused)
All logs
$ docker-compose up
Starting urlshortener_user_db_1 ... done
Starting urlshortener_user_1 ... done
Starting urlshortener_gateway_1 ... done
Attaching to urlshortener_user_db_1, urlshortener_user_1, urlshortener_gateway_1
user_db_1 |
user_db_1 | PostgreSQL Database directory appears to contain a database; Skipping initialization
user_db_1 |
user_db_1 | 2020-11-13 04:21:54.059 UTC [1] LOG: starting PostgreSQL 13.0 on x86_64-pc-linux-musl, compiled by gcc (Alpine 9.3.0) 9.3.0, 64-bit
user_db_1 | 2020-11-13 04:21:54.059 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
user_db_1 | 2020-11-13 04:21:54.059 UTC [1] LOG: listening on IPv6 address "::", port 5432
user_db_1 | 2020-11-13 04:21:54.070 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
gateway_1 | [GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
gateway_1 |
gateway_1 | [GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
gateway_1 | - using env: export GIN_MODE=release
gateway_1 | - using code: gin.SetMode(gin.ReleaseMode)
gateway_1 |
gateway_1 | [GIN-debug] POST /api/v1/users/login --> github.com/Asuha-a/URLShortener/api/controllers.Login (3 handlers)
gateway_1 | [GIN-debug] POST /api/v1/users/signup --> github.com/Asuha-a/URLShortener/api/controllers.Signup (3 handlers)
gateway_1 | [GIN-debug] Environment variable PORT is undefined. Using port :8080 by default
gateway_1 | [GIN-debug] Listening and serving HTTP on :8080
user_db_1 | 2020-11-13 04:21:54.082 UTC [21] LOG: database system was interrupted; last known up at 2020-11-13 04:17:25 UTC
user_1 |
user_1 | 2020/11/13 04:21:54 /go/src/github.com/Asuha-a/URLShortener/api/services/user/db/user.go:15
user_1 | [error] failed to initialize database, got error failed to connect to `host=user_db user=gorm database=gorm`: dial error (dial tcp 172.18.0.2:9920: connect: connection refused)
user_1 | panic: failed to connect to `host=user_db user=gorm database=gorm`: dial error (dial tcp 172.18.0.2:9920: connect: connection refused)
user_1 |
user_1 | goroutine 1 [running]:
user_1 | github.com/Asuha-a/URLShortener/api/services/user/db.Init()
user_1 | /go/src/github.com/Asuha-a/URLShortener/api/services/user/db/user.go:19 +0x1ed
user_1 | main.main()
user_1 | /go/src/github.com/Asuha-a/URLShortener/api/services/user/main.go:34 +0x37
user_db_1 | 2020-11-13 04:21:54.175 UTC [21] LOG: database system was not properly shut down; automatic recovery in progress
user_db_1 | 2020-11-13 04:21:54.180 UTC [21] LOG: redo starts at 0/15D6CE0
user_db_1 | 2020-11-13 04:21:54.180 UTC [21] LOG: invalid record length at 0/15D6D18: wanted 24, got 0
user_db_1 | 2020-11-13 04:21:54.180 UTC [21] LOG: redo done at 0/15D6CE0
urlshortener_user_1 exited with code 2
user_db_1 | 2020-11-13 04:21:54.208 UTC [1] LOG: database system is ready to accept connections
Code
docker-compose.yml
version: '3'
services:
  gateway:
    build:
      context: ./api/
      dockerfile: Dockerfile
    ports:
      - 8080:8080
    tty: true
    depends_on:
      - user
  user:
    build:
      context: ./api/services/user
      dockerfile: Dockerfile
    ports:
      - 50051:50051
    tty: true
    depends_on:
      - user_db
  user_db:
    image: postgres:alpine
    environment:
      POSTGRES_USER: gorm
      POSTGRES_PASSWORD: gorm
      POSTGRES_DB: gorm
      POSTGRES_HOST: user_db
    ports:
      - 9920:9920
user.go(initializing database)
package db

import (
	"gorm.io/driver/postgres"
	"gorm.io/gorm"
)

var (
	db  *gorm.DB
	err error
)

// Init DB
func Init() {
	db, err = gorm.Open(postgres.New(postgres.Config{
		DSN: "host=user_db user=gorm password=gorm dbname=gorm port=9920 sslmode=disable TimeZone=Asia/Tokyo",
	}), &gorm.Config{})
	if err != nil {
		panic(err)
	}
	autoMigration()
}

// Close DB
func Close() {
	dbSQL, err := db.DB()
	if err != nil {
		panic(err)
	}
	dbSQL.Close()
}

func autoMigration() {
	db.AutoMigrate(&User{})
}
What I tried
When I add 'container_name: "user_db"' in the docker-compose.yml, I got the same error.
When I change the hostname to 'user_db:9920', it seems to not be recognized as a hostname.
user_1 | [error] failed to initialize database, got error failed to connect to `host=user_db:9920 user=gorm database=gorm`: hostname resolving error (lookup user_db:9920: Try again)
user_1 | panic: failed to connect to `host=user_db:9920 user=gorm database=gorm`: hostname resolving error (lookup user_db:9920: Try again)
What I want to know
I know that I have to set the container name as the hostname and I set the 'user_db' defined in docker-compose.yml.
Why can't I init DB?
Because the postgres:alpine image listens on port 5432 inside the container.
Try changing - 9920:9920 to - 5432:5432 and the port in the connection string to
port=5432
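The mismatch can be illustrated with a small Python sketch (not part of the original Go project). The helper names are hypothetical, and the parsing only covers the plain short-syntax forms of a Compose port mapping, not port ranges. It shows that 9920:9920 forwards to container port 9920, where nothing listens, while Postgres listens on 5432.

```python
def container_port(mapping):
    """Return the container-side port of a Compose short-syntax port mapping.

    Handles "CONTAINER", "HOST:CONTAINER" and "IP:HOST:CONTAINER", each with an
    optional "/protocol" suffix. Port ranges are not handled in this sketch.
    """
    without_protocol = mapping.split("/")[0]
    # The container port is always the last colon-separated field.
    return int(without_protocol.split(":")[-1])


def mapping_reaches(mapping, listening_port):
    """True if the mapping targets the port the server listens on in the container."""
    return container_port(mapping) == listening_port
```

Note that the published mapping only matters for access from the host. Another service on the same Compose network, such as user, connects to the container port directly, so its DSN needs port=5432 whatever host port is published.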

Greenplum DB stuck in recovery mode

I have a Greenplum deployment with several segments.
version: postgres (Greenplum Database) 8.2.15
A few days ago an error occurred, as seen in pg_log (below):
FATAL 54000 out of on_shmem_exit slots
WARNING 1000 StartTransaction while in START state
PANIC XX000 Waiting on lock already held! (lwlock.c:557)
LOG 0 server process (PID 224596) was terminated by signal 6: Aborted
LOG 0 terminating any other active server processes
FATAL 57P01 terminating connection due to administrator command
LOG 0 sweeper process (PID 102635) exited with exit code 2
LOG 0 seqserver process (PID 102632) exited with exit code 2
FATAL 57P03 the database system is in recovery mode
LOG 0 ftsprobe process (PID 102633) exited with exit code 2
FATAL 57P03 the database system is in recovery mode
FATAL 57P03 the database system is in recovery mode
FATAL 57P03 the database system is in recovery mode
FATAL 57P03 the database system is in recovery mode
FATAL 57P03 the database system is in recovery mode
For 4 days now the database has remained in recovery mode; gpstart and gpstop both fail with the error "the database system is in recovery mode".
See ps response below:
[gpadmin@mdw1 ~]$ ps -ef | grep post
gpadmin 2979 189094 0 12:25 pts/0 00:00:00 grep post
postfix 3264 3251 0 2015 ? 00:34:40 qmgr -l -t fifo -u
gpadmin 102637 230099 0 May18 ? 00:01:47 postgres: port 5432, stats sender process
gpadmin 230099 1 0 Apr24 ? 02:47:53 /usr/local/greenplum-db-4.3.10.0/bin/postgres -D /data/master/gpseg-1 -p 5432 -b 1 -z 96 --silent-mode=true -i -M master -C -1 -x 194 -E
gpadmin 230100 230099 0 Apr24 ? 00:49:45 postgres: port 5432, master logger process
[gpadmin@mdw1 ~]$
I have searched a lot but have not been able to find a solution. Kindly assist with pointers on how to bring the database up.