postgres dvdrental sample database - postgresql

Why do I get this error while trying to get the sample database on my Ubuntu machine?
postgres@vagrant:~$ curl -O https://sp.postgresqltutorial.com/wp-content/uploads/2019/05/dvdrental.zip
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:02 --:--:-- 0
curl: (51) SSL: no alternative certificate subject name matches target host name 'sp.postgresqltutorial.com'
I downloaded it manually but can't access it from the psql shell.
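For reference, here is a sketch of the usual loading steps once the zip is on the machine; it assumes the archive unpacks to dvdrental.tar and that you can run commands as the postgres role (and if curl must be used despite the certificate mismatch, -k/--insecure skips verification, which is only acceptable for a one-off sample download):

unzip dvdrental.zip                                  # the zip contains dvdrental.tar
createdb -U postgres dvdrental                       # create an empty target database
pg_restore -U postgres -d dvdrental dvdrental.tar    # restore the sample data into it
psql -U postgres dvdrental                           # then \dt inside psql should list the tables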

Related

ACI Container hangs during run CPU=0

I am attempting to run a bash script using ACI. Occasionally the container stalls or hangs: CPU activity drops to 0, memory flattens (see below), and network activity drops to ~50 bytes, but the script never completes. As far as I can tell, the container never terminates. A bash window can be opened on the container. The logs suggest the hang occurs during wget.
Possible clue:
How can I verify my container is using SMB 3.0 to connect to my share, or is that handled at the host server level, so I have to assume ACI uses SMB 3.0?
This script:
Dequeues an item from ServiceBus.
Runs an exe to obtain a URL.
Performs a wget using the URL; writes the output to a StorageAccount Fileshare.
Exits, terminating the container.
wget is invoked with a 4-minute timeout. Data is written directly to the share so the run can be retried if it fails and wget can resume. The timeout command should force wget to end if it hangs. The logs suggest the container hangs at wget.
timeout 4m wget -c -O "/aci/mnt/$ID/$ID.sra" $URL
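One hardening idea (an assumption on my part, not something verified to fix the hang) is to give wget its own network timeouts and retry budget, and to let timeout escalate to SIGKILL, so a stalled connection or a process that ignores SIGTERM doesn't leave the container wedged:

timeout -k 30s 4m wget -c --timeout=60 --tries=3 -O "/aci/mnt/$ID/$ID.sra" "$URL"
# --timeout=60 applies wget's DNS/connect/read timeouts so a silent connection aborts;
# -k 30s makes timeout send SIGKILL if wget is still alive 30s after the 4-minute SIGTERM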
I have 100 items in the queue. I have 5 ACIs running 5 containers each (25 containers total).
A Logic App checks the Queue and if items are present, runs the containers.
Approximately 95% of the download runs work as expected.
The rest simply hang, as far as I can tell at around 104 GB of total downloads.
I am using a Premium Storage Account with a 300 GB file share and SMB Multichannel enabled.
It seems that on some of the large files (>3 GB) the Container Instance will hang.
A successful run looks something like this:
PeekLock Message (5min)...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 353 0 353 0 0 1481 0 --:--:-- --:--:-- --:--:-- 1476
Wed Dec 28 14:31:41 UTC 2022: prefetch-01-0: ---------- BEGIN RUN ----------
./pipeline.sh: line 80: can't create /aci/mnt/SRR10575111/SRR10575111.txt: nonexistent directory
Wed Dec 28 14:31:41 UTC 2022: prefetch-01-0: vdb-dump [SRR10575111]...
Wed Dec 28 14:31:44 UTC 2022: prefetch-01-0: wget [SRR10575111]...
Wed Dec 28 14:31:44 UTC 2022: prefetch-01-0: URL=https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR10575111/SRR10575111
Connecting to sra-pub-run-odp.s3.amazonaws.com (54.231.131.113:443)
saving to '/aci/mnt/SRR10575111/SRR10575111.sra'
SRR10575111.sra 0% | | 12.8M 0:03:39 ETA
SRR10575111.sra 1% | | 56.1M 0:01:39 ETA
...
SRR10575111.sra 99% |******************************* | 2830M 0:00:00 ETA
SRR10575111.sra 100% |********************************| 2833M 0:00:00 ETA
'/aci/mnt/SRR10575111/SRR10575111.sra' saved
Wed Dec 28 14:35:42 UTC 2022: prefetch-01-0: wget exit...
Wed Dec 28 14:35:43 UTC 2022: prefetch-01-0: wget Success!
Wed Dec 28 14:35:43 UTC 2022: prefetch-01-0: Delete Message...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
Wed Dec 28 14:35:43 UTC 2022: prefetch-01-0: POST to [orchestrator] queue...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 325 0 0 100 325 0 1105 --:--:-- --:--:-- --:--:-- 1109
Wed Dec 28 14:35:44 UTC 2022: prefetch-01-0: exit RESULTCODE=0
A hung run looks like this:
PeekLock Message (5min)...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 352 0 352 0 0 1252 0 --:--:-- --:--:-- --:--:-- 1252
Wed Dec 28 14:31:41 UTC 2022: prefetch-01-1: ---------- BEGIN RUN ----------
./pipeline.sh: line 80: can't create /aci/mnt/SRR9164212/SRR9164212.txt: nonexistent directory
Wed Dec 28 14:31:41 UTC 2022: prefetch-01-1: vdb-dump [SRR9164212]...
Wed Dec 28 14:31:44 UTC 2022: prefetch-01-1: wget [SRR9164212]...
Wed Dec 28 14:31:44 UTC 2022: prefetch-01-1: URL=https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR9164212/SRR9164212
Connecting to sra-pub-run-odp.s3.amazonaws.com (52.216.146.75:443)
saving to '/aci/mnt/SRR9164212/SRR9164212.sra'
SRR9164212.sra 0% | | 2278k 0:55:44 ETA
SRR9164212.sra 0% | | 53.7M 0:04:30 ETA
SRR9164212.sra 1% | | 83.9M 0:04:18 ETA
...
SRR9164212.sra 44% |************** | 3262M 0:04:55 ETA
SRR9164212.sra 44% |************** | 3292M 0:04:52 ETA
SRR9164212.sra 45% |************** | 3326M 0:04:47 ETA
The container is left in a Running state.
CPU goes to 0; network activity goes to ~50 B received/transmitted.

How to set up pgbouncer in a docker-compose setup for Airflow

I'm running a distributed Airflow setup using docker-compose. The main part of the services runs on one server and the Celery workers run on multiple servers. I have a few hundred tasks running every five minutes, and I started to run out of DB connections, which was indicated by this error message in the task logs:
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "SERVER" (IP), port XXXXX failed: FATAL: sorry, too many clients already
I'm using Postgres as the metastore and max_connections is set to the default value of 100. I didn't want to raise the max_connections value, since I thought there should be a better solution for this. At some point I'll be running thousands of tasks every 5 minutes and the number of connections is guaranteed to run out again. So I added pgbouncer to my configuration.
Here's how I configured pgbouncer:
pgbouncer:
  image: "bitnami/pgbouncer:1.16.0"
  restart: always
  environment:
    POSTGRESQL_HOST: "postgres"
    POSTGRESQL_USERNAME: ${POSTGRES_USER}
    POSTGRESQL_PASSWORD: ${POSTGRES_PASSWORD}
    POSTGRESQL_PORT: ${PSQL_PORT}
    PGBOUNCER_DATABASE: ${POSTGRES_DB}
    PGBOUNCER_AUTH_TYPE: "trust"
    PGBOUNCER_IGNORE_STARTUP_PARAMETERS: "extra_float_digits"
  ports:
    - '1234:1234'
  depends_on:
    - postgres
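The Bitnami image also appears to expose pool-tuning knobs through environment variables; the names below are from that image's documentation as I recall it, so treat them as an unverified sketch to add under the environment: block rather than a tested configuration:

    PGBOUNCER_POOL_MODE: "transaction"      # or keep the default "session" if transaction pooling causes issues
    PGBOUNCER_MAX_CLIENT_CONN: "500"        # how many client connections pgbouncer itself will accept
    PGBOUNCER_DEFAULT_POOL_SIZE: "20"       # server connections kept open to Postgres per user/database pair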
pgbouncer logs look like this:
pgbouncer 13:29:13.87
pgbouncer 13:29:13.87 Welcome to the Bitnami pgbouncer container
pgbouncer 13:29:13.87 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-pgbouncer
pgbouncer 13:29:13.87 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-pgbouncer/issues
pgbouncer 13:29:13.88
pgbouncer 13:29:13.89 INFO ==> ** Starting PgBouncer setup **
pgbouncer 13:29:13.91 INFO ==> Validating settings in PGBOUNCER_* env vars...
pgbouncer 13:29:13.91 WARN ==> You set the environment variable PGBOUNCER_AUTH_TYPE=trust. For safety reasons, do not use this flag in a production environment.
pgbouncer 13:29:13.91 INFO ==> Initializing PgBouncer...
pgbouncer 13:29:13.92 INFO ==> Waiting for PostgreSQL backend to be accessible
pgbouncer 13:29:13.92 INFO ==> Backend postgres:9876 accessible
pgbouncer 13:29:13.93 INFO ==> Configuring credentials
pgbouncer 13:29:13.93 INFO ==> Creating configuration file
pgbouncer 13:29:14.06 INFO ==> Loading custom scripts...
pgbouncer 13:29:14.06 INFO ==> ** PgBouncer setup finished! **
pgbouncer 13:29:14.08 INFO ==> ** Starting PgBouncer **
2022-10-25 13:29:14.089 UTC [1] LOG kernel file descriptor limit: 1048576 (hard: 1048576); max_client_conn: 100, max expected fd use: 152
2022-10-25 13:29:14.089 UTC [1] LOG listening on 0.0.0.0:1234
2022-10-25 13:29:14.089 UTC [1] LOG listening on unix:/tmp/.s.PGSQL.1234
2022-10-25 13:29:14.089 UTC [1] LOG process up: PgBouncer 1.16.0, libevent 2.1.8-stable (epoll), adns: c-ares 1.14.0, tls: OpenSSL 1.1.1d 10 Sep 2019
2022-10-25 13:30:14.090 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-25 13:31:14.090 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-25 13:32:14.090 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-25 13:33:14.090 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-25 13:34:14.089 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-25 13:35:14.090 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-25 13:36:14.090 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-25 13:37:14.090 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-25 13:38:14.090 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-25 13:39:14.089 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
The service seems to run OK, but I don't think it's doing anything. There was very little information about this in the Airflow documentation, and I'm unsure what to change.
Should I change the pgbouncer setup in my docker-compose file?
Should I change the AIRFLOW__DATABASE__SQL_ALCHEMY_CONN variable?
Update 1:
I edited the docker-compose.yml for the worker nodes and changed the DB port to be the pgbouncer port. After this I got some traffic in the bouncer logs. Airflow tasks are queued but not processed with this configuration, so there's still something wrong. I didn't edit the docker-compose YAML that launches the webserver, scheduler, etc., because I didn't know how (a sketch of what I think it should look like follows the log below).
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://<XXX>@${AIRFLOW_WEBSERVER_URL}:${PGBOUNCER_PORT}/airflow
AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://<XXX>@${AIRFLOW_WEBSERVER_URL}:${PGBOUNCER_PORT}/airflow
pgbouncer log after the change:
2022-10-26 11:46:22.517 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-26 11:47:22.517 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-26 11:48:22.517 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-26 11:49:22.519 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-26 11:50:22.518 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-26 11:51:22.516 UTC [1] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2022-10-26 11:51:52.356 UTC [1] LOG C-0x5602cf8ab180: <XXX>@<IP:PORT> login attempt: db=airflow user=airflow tls=no
2022-10-26 11:51:52.359 UTC [1] LOG S-0x5602cf8b1f20: <XXX>@<IP:PORT> new connection to server (from <IP:PORT>)
2022-10-26 11:51:52.410 UTC [1] LOG C-0x5602cf8ab180: <XXX>@<IP:PORT> closing because: client close request (age=0s)
2022-10-26 11:51:52.834 UTC [1] LOG C-0x5602cf8ab180: <XXX>@<IP:PORT> login attempt: db=airflow user=airflow tls=no
2022-10-26 11:51:52.845 UTC [1] LOG C-0x5602cf8ab180: <XXX>@<IP:PORT> closing because: client close request (age=0s)
2022-10-26 11:51:56.752 UTC [1] LOG C-0x5602cf8ab180: <XXX>@<IP:PORT> login attempt: db=airflow user=airflow tls=no
2022-10-26 11:51:57.393 UTC [1] LOG C-0x5602cf8ab3b0: <XXX>@<IP:PORT> login attempt: db=airflow user=airflow tls=no
2022-10-26 11:51:57.394 UTC [1] LOG S-0x5602cf8b2150: <XXX>@<IP:PORT> new connection to server (from <IP:PORT>)
2022-10-26 11:51:59.906 UTC [1] LOG C-0x5602cf8ab180: <XXX>@<IP:PORT> closing because: client close request (age=3s)
2022-10-26 11:52:00.642 UTC [1] LOG C-0x5602cf8ab3b0: <XXX>@<IP:PORT> closing because: client close request (age=3s)
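For the compose file that runs the webserver and scheduler, my expectation (untested) is that the same two variables just need to point at pgbouncer instead of Postgres; the host would be the pgbouncer service name for containers on the same compose network, or the server's address for the remote workers, along these lines:

AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${POSTGRES_USER}:${POSTGRES_PASSWORD}@pgbouncer:${PGBOUNCER_PORT}/${POSTGRES_DB}
AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@pgbouncer:${PGBOUNCER_PORT}/${POSTGRES_DB}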

How to query Go-micro (v2) services inside Docker with curl or Postman

I use Go-micro (v2) to deploy services inside docker-compose:
user-service:
  build:
    context: ./user-service
  restart: always
  ports:
    - "8086:8086"
  deploy:
    mode: replicated
    replicas: 1
  environment: ....
See the service configuration:
srv = micro.NewService(
    micro.Name("my.user"),
    micro.Address("127.0.0.1:8086"))
When running docker-compose, the container logs show:
2022-07-31 05:43:53 file=v2#v2.9.1/service.go:200 level=info Starting [service] my.user
2022-07-31 05:43:53 file=grpc/grpc.go:864 level=info Server [grpc] Listening on [::]:8086
2022-07-31 05:43:53 file=grpc/grpc.go:697 level=info Registry [mdns] Registering node: my.user-00ee4795-06df-47f1-a07a-cc362e135864
All looks good.
But when I want to query some handlers using curl or Postman (for development purposes), it doesn't work.
Here is an example of a failed request with Postman:
GET http://127.0.0.1:8086/my.user/Get
Error: Parse Error: Expected HTTP/
Request Headers
Content-Type: application/json
User-Agent: PostmanRuntime/7.29.2
Accept: */*
Postman-Token: b5ab718a-341b-40ff-81fa-37c66fd4d9f2
Host: 127.0.0.1:8086
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Request Body
GET http://127.0.0.1:8086/my.user/userService/Get // same error
With curl it is no better:
curl --header "Content-Type:application/json" --http0.9 --output GET http://localhost:8086/my.user/Get
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 15 0 15 0 0 10638 0 --:--:-- --:--:-- --:--:-- 15000
curl --header "Content-Type:application/json" --http0.9 --output GET http://localhost:8086/my.user/userService/Get
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 15 0 15 0 0 13550 0 --:--:-- --:--:-- --:--:-- 15000
Any idea how to query go-micro services locally? Thank you.
PS: Note that the 'Get' handler is working.
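For context, the [grpc] server transport speaks gRPC over HTTP/2, which is presumably why a plain HTTP/1.1 GET from curl or Postman comes back as 'Parse Error: Expected HTTP/'. A dev-only sketch of an alternative worth trying, assuming the micro (v2) CLI is available and runs on the same network as the mdns registry; UserService.Get is a guessed endpoint name, not confirmed from the proto:

# exec into the user-service container (or any container on the same network) and call
# the handler through the registry instead of speaking raw HTTP to the gRPC port
micro call my.user UserService.Get '{"id": "1"}'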

Kafka REST proxy: how to retrieve and deserialize Kafka data based on AVRO schema stored in schema-registry

I am new to Kafka. I run a docker-based Kafka ecosystem on my local machine, including broker/zookeeper/schema-registry/rest-proxy. I also have an external producer (temp-service), which sends data serialized with an AVRO schema to the topic temp-topic in the broker.
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
411564d10c06 confluentinc/cp-kafka-rest:latest "/etc/confluent/dock…" 27 seconds ago Up 25 seconds 0.0.0.0:8082->8082/tcp kafka_kafka-rest_1
38c4e3ea008c confluentinc/cp-schema-registry:latest "/etc/confluent/dock…" 30 seconds ago Up 27 seconds 0.0.0.0:8081->8081/tcp kafka_schema-registry_1
7abe6cf9a7a0 confluentinc/cp-kafka:latest "/etc/confluent/dock…" 30 minutes ago Up 30 seconds 0.0.0.0:9092->9092/tcp kafka
bdffd9e03088 confluentinc/cp-zookeeper:latest "/etc/confluent/dock…" 30 minutes ago Up 30 seconds 2888/tcp, 0.0.0.0:2181->2181/tcp, 3888/tcp zookeeper
d1909c6877c5 temp-service:latest "node /home/tempserv…" 3 hours ago Up 2 hours (healthy) 0.0.0.0:8107->8107/tcp, 0.0.0.0:9107->9107/tcp, 0.0.0.0:9229->9229/tcp temp-service
I have also posted the AVRO schema of the temp-service Kafka data to the schema-registry, so it is stored there with id 1.
I created a consumer group temp_consumers and a consumer instance temp_consumer_instance:
$ curl -X POST -H "Content-Type: application/vnd.kafka.v2+json" --data '{"name": "temp_consumer_instance", "format": "avro", "auto.offset.reset": "earliest"}' http://localhost:8082/consumers/temp_consumers
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 219 0 134 100 85 158 100 --:--:-- --:--:-- --:--:-- 260
{"instance_id":"temp_consumer_instance","base_uri":"http://kafka-rest:8082/consumers/temp_consumers/instances/temp_consumer_instance"}
checked topics in kafka:
$ curl -X GET http://localhost:8082/topics
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 29 0 29 0 0 1610 0 --:--:-- --:--:-- --:--:-- 1705
["temp-topic","_schemas"]
subscribed to the topic temp-topic:
$ curl -X POST -H "Content-Type: application/vnd.kafka.v2+json" --data '{"topics":["temp-topic"]}' http://localhost:8082/consumers/temp_consumers/instances/temp_consumer_instance/subscription
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 29 0 0 100 29 0 1046 --:--:-- --:--:-- --:--:-- 1115
tried to consume the records in the topic but failed:
$ curl -X GET -H "Accept: application/vnd.kafka.binary.v2+json" http://localhost:8082/consumers/temp_consumers/instances/temp_consumer_instance/records
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 126 0 126 0 0 10598 0 --:--:-- --:--:-- --:--:-- 12600
{"error_code":40601,"message":"The requested embedded data format does not match the deserializer for this consumer instance"}
I would like to know if there is any way to deserialize the Kafka data posted by the producer, based on the AVRO schema stored in the schema registry.
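Since the consumer instance was created with "format": "avro", my understanding (not yet verified against this setup) is that the records request has to ask for the matching Avro embedded format rather than the binary one, along these lines:

curl -X GET -H "Accept: application/vnd.kafka.avro.v2+json" \
  http://localhost:8082/consumers/temp_consumers/instances/temp_consumer_instance/records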

Cannot delete database in Mongo: it returns immediately (and mysteriously) after we delete it

When we run the show dbs command, the x-development database doesn't appear. Then when we drop the database from the shell, it still allows us to access it again:
> use x-development
switched to db x-development
> db.dropDatabase()
{ "dropped" : "x-development", "ok" : 1 }
> use x-development
switched to db x-development
>
Why is this happening? We're on Mongo 2.2.
We're trying to drop the database because it appears under mongostat, and we want to make sure this database isn't taking server resources:
[root@mongo]# mongostat
connected to: 127.0.0.1
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
0 5 0 0 0 1 1 2.11g 4.86g 464m 0 x-development:0.0% 0 0|0 0|0 62b 2k 3 11:42:57
0 0 0 0 0 1 0 2.11g 4.86g 464m 0 x-development:0.0% 0 0|0 0|0 62b 2k 3 11:42:58
0 0 0 0 0 1 0 2.11g 4.86g 464m 0 x-development:0.0% 0 0|0 0|0 62b 2k 3 11:42:59
So the real question is: why is this database appearing under mongostat if it doesn't exist?
> db.dropDatabase()
...indeed drops the database. use can switch to any database, existing or not; it's not until you actually insert something that the database is recreated.
> show dbs
local (empty)
> use dev
switched to db dev
> show dbs
local (empty)
> db.test.insert({a:1})
> show dbs
dev 0.203125GB
local (empty)
> db.dropDatabase()
{ "dropped" : "dev", "ok" : 1 }
> show dbs
local (empty)
In Mongo you don't create databases explicitly. You just start using one (for example by issuing a use statement) and it gets created automatically.
The database won't be using any resources if you don't query it. So, I don't think you should worry about it.
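If you want to double-check that the phantom database really holds no data, a quick sketch run from the host shell (db.stats() reports collection counts and data size) would be:

mongo x-development --eval "printjson(db.stats())"
# expect collections: 0 and dataSize: 0 if nothing has actually been inserted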