How to change the database to Postgres in JupyterHub? - postgresql

I am trying to run JupyterHub with my config and I would like to change the database from the SQLite one that is created by default to a PostgreSQL database that already exists and has some tables (JupyterHub and another app would run concurrently and share the database). The only thing I see on the website is:
We recommend using PostgreSQL for production if you are unsure ...
But there is no word on how to change this database. Have you done this before and can you describe it? Do I have to create some tables on my own, or do I just pass a connection URL and JupyterHub will do the rest?

You need to create an empty database in Postgres and then set the db_url property in the JupyterHub config. For example, for a Postgres database on the local machine:
Connect to your postgres instance with a user that has the 'Create DB' attribute and run:
CREATE DATABASE jupyterhub1;
In your jupyterhub_config.py file set this property:
c.JupyterHub.db_url = 'postgresql://username:password@localhost:5432/jupyterhub1'
When you start JupyterHub it will create the required tables automatically. Also note that you don't have to hard-code the credentials into the db_url property; you could read them from environment variables using os.environ["VAR_NAME"].
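For instance, a minimal sketch of what that could look like in jupyterhub_config.py (the environment variable names here are placeholders for this illustration, not anything JupyterHub defines):

import os

db_user = os.environ["JUPYTERHUB_DB_USER"]
db_pass = os.environ["JUPYTERHUB_DB_PASSWORD"]
db_host = os.environ.get("JUPYTERHUB_DB_HOST", "localhost")
db_name = os.environ.get("JUPYTERHUB_DB_NAME", "jupyterhub1")

# JupyterHub creates its own tables in this database on first start
c.JupyterHub.db_url = f"postgresql://{db_user}:{db_pass}@{db_host}:5432/{db_name}"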
Thanks

Well, I will share what has worked for me. I hope these instructions help other people who face the same difficulty.
Make a full backup of your database, just in case things go bad. Source
If you read the source above, you will see that you need to add some fields to your config.yaml. You'll have to add this part:
hub:
  db:
    upgrade: true
Keep following the source mentioned above, because there are a few more details to take care of.
That is not all, though! You need to create your database (using MySQL or Postgres). Once your database is created, you should have a username, password, database name, host and port.
Now, have a look at this info. This other page highlights one interesting point: the JupyterHub doc says that using Postgres is easier than MySQL. You'll have to add hub.db.type and hub.db.url. Have a look at the doc to understand the connection string.
postgresql+psycopg2://<db-username>:<db-password>@<db-hostname>:<db-port>/<db-name>
Note that in this example I just put the username and nothing as the password within the string. Since it is possible to add hub.db.password, I used that option. I also declared the size of the database volume as 20Gi in my example.
hub:
  db:
    upgrade: true
    type: postgres
    url: postgresql+psycopg2://postgres:@db_jupyterhub_xxxxx.amazonaws.com:5432/db_jupyterhub
    password: ~
    pvc:
      accessModes:
        - ReadWriteMany
      storage: 20Gi
At this point you will have noticed that I did not put the password in the file. In this case, when I deploy JupyterHub (or run a helm upgrade ...), I need to pass the --set parameter. So for this example I used this command to upgrade the helm chart that already existed:
helm upgrade -f config.yaml jupyterhub . \
--set hub.db.password=TYPE-PASSWORD-OF-DATABASE
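Before (or after) deploying, you can sanity-check that the connection string actually works from outside the cluster. A minimal sketch with SQLAlchemy, assuming sqlalchemy and psycopg2 are installed on the machine you test from; the DB_PASSWORD environment variable and the host/database names are placeholders taken from the example above:

import os
from sqlalchemy import create_engine, text
from sqlalchemy.engine import URL

# build the same kind of URL as hub.db.url; URL.create escapes special
# characters in the password for you (SQLAlchemy 1.4+)
url = URL.create(
    drivername="postgresql+psycopg2",
    username="postgres",
    password=os.environ["DB_PASSWORD"],          # placeholder env var for this sketch
    host="db_jupyterhub_xxxxx.amazonaws.com",    # placeholder host from the example above
    port=5432,
    database="db_jupyterhub",
)

engine = create_engine(url)
with engine.connect() as conn:
    conn.execute(text("SELECT 1"))   # raises if the database is unreachable
print("connection OK")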
With these steps the deploy should work. I used it this way and it worked fine. I have another point to highlight: cookieSecret. The docs mention the need to recreate it when pods restart. Please have a look at this topic about the default behavior of jupyterhub_cookie_secret and at this link explaining cookie generation and use.
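For reference, if you need to generate a fresh cookie secret value yourself, a tiny Python sketch equivalent to the openssl rand -hex 32 command the docs mention (where you put the value depends on your chart setup, e.g. hub.cookieSecret or a mounted secret):

import secrets

# 64 hex characters = 32 random bytes, the same shape as `openssl rand -hex 32`
print(secrets.token_hex(32))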
I hope these steps help some of you.

Related

Move my hasura cloud schema, relations, tables etc. and put into my offline docker file using docker-compose

So basically I have my cloud Hasura with an existing schema, relations, tables etc... and I want to run it offline using Docker. I tried using metadata export and import but that does not seem to work. How can I do it, or are there other ways to do it?
This is the Docker setup I want to run offline.
This is my cloud instance that has the schemas and metadata I want to get.
Or maybe I should just manually recreate the tables and relations?
When using the steps outlined in the Hasura Quickstart with Docker page, the following steps will help get all the table definitions, relationships etc. set up on the local instance just like they are on the Hasura Cloud instance.
Migrate all the database schema and metadata using the steps mentioned in Setting up migrations
Since you want to migrate from Hasura Cloud, use the URL of the cloud instance in step 2. Perform steps 3-6 as described in the above link.
Bring up the local Docker environment. Ideally edit the docker-compose.yaml file to set HASURA_GRAPHQL_ENABLE_CONSOLE: "false" before running docker-compose up -d.
Resume the process of applying migrations from step 7. Use the endpoint of the local instance. For example,
$ hasura metadata apply --endpoint http://localhost:8080
$ hasura migrate apply --endpoint http://localhost:8080
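If you end up repeating those last two commands often, they are easy to wrap in a small script. A sketch using Python's subprocess, assuming the hasura CLI is on your PATH and you run it from the project directory (the endpoint is just the local default from above):

import subprocess

local_endpoint = "http://localhost:8080"

# apply the exported metadata and migrations to the local instance
for args in (["metadata", "apply"], ["migrate", "apply"]):
    subprocess.run(["hasura", *args, "--endpoint", local_endpoint], check=True)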

Implementing Postgres Sql in Apache Airflow

I have Apache Airflow implemented on an Ubuntu 18.04.3 server. When I set it up, I used the generic SQLite database, which uses the sequential executor. I did this just to play around and get used to the system. Now I'm trying to use the Local Executor, and will need to transition my database from SQLite to the recommended PostgreSQL.
Does anybody know how to make this transition? All of the tutorials I've found entail setting up Airflow with PostgreSQL from the beginning. I know there are a ton of moving parts and I'm scared of messing up what I currently have running. Anybody who knows how to do this or can point me at where to look is much appreciated. Thanks!
Just to complete @lalligood's answer with some commands:
In the airflow.cfg file, look for sql_alchemy_conn and update it to point to your PostgreSQL server:
sql_alchemy_conn = postgresql+psycopg2://user:pass@hostaddress:port/database
For instance:
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
As indicated in the above line, you need both a user and a database called airflow, so you need to create them. To do so, open your psql command line and type the following commands to create a user and database called airflow (with the password used in the connection string) and grant all privileges on database airflow to user airflow:
CREATE USER airflow WITH PASSWORD 'airflow';
CREATE DATABASE airflow;
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
Now you are ready to init the airflow application using postgres:
airflow initdb
If everything went right, access the psql command line again, enter the airflow database with the \c airflow command and type the \dt command to list all tables of that database. You should see a list of airflow tables; currently there are 23.
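If you prefer to verify from Python rather than psql, a small sketch with psycopg2, assuming the airflow user, password and database created above:

import psycopg2

# connect with the same credentials used in sql_alchemy_conn
conn = psycopg2.connect("postgresql://airflow:airflow@localhost:5432/airflow")
with conn, conn.cursor() as cur:
    # list the tables that airflow initdb created in the public schema
    cur.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = 'public' ORDER BY table_name"
    )
    for (table_name,) in cur.fetchall():
        print(table_name)
conn.close()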
Another option, other than editing the airflow.cfg file, is to set the environment variable AIRFLOW__CORE__SQL_ALCHEMY_CONN to the PostgreSQL connection string you want.
Example: export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
Or you can set it in your Dockerfile.
See documentation here
I was able to get it working by doing the following 4 steps:
Assuming that you are starting from scratch, initialize your airflow environment with the SQLite database. The key takeaway here is for it to generate the airflow.cfg file.
Update the sql_alchemy_conn line in airflow.cfg to point to your PostgreSQL server.
Create the airflow role + database in PostgreSQL. (Revoke all permissions from public to airflow database & ensure airflow role owns airflow database!)
(Re)Initialize airflow (airflow initdb) & confirm that you see ~19 tables in the airflow database.

How to load data from S3 to PostgreSQL RDS

I need to load data from S3 into Postgres RDS (around 50-100 GB). I don't have the option to use AWS Data Pipeline, and I am looking for something similar to using the COPY command to load data from S3 into Amazon Redshift.
I would appreciate any suggestions on how I can accomplish this.
Originally, this answer was trying to use the S3 to Postgres RDS Functionality. That whole enterprise failed (see below).
The way I have finally been able to do this is:
Set up an EC2 instance with psql installed (see near the end of this post)
Copy the relevant CSVs to import from S3 to the local instance
Use the psql \copy command to import the files
This last part is really, really important. If you use the SQL COPY command, the entire RDS Postgres role structure will frustrate you to no end. It has a wonky SUPERRDSADMIN role which is not very super at all. However, if you use the psql \copy command you apparently can do anything. I have confirmed this to be the case and have started my uploads successfully. I will come back and re-edit this post (time permitting) to add relevant documentation steps for the above.
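For anyone who prefers to script that client-side copy from Python instead of the psql prompt, a minimal sketch using psycopg2's copy_expert, which, like \copy, reads the file on the client and streams it to the server (host, credentials, table and file names are placeholders):

import psycopg2

conn = psycopg2.connect(
    host="my-rds-instance.xxxxx.us-east-1.rds.amazonaws.com",  # placeholder endpoint
    dbname="mydb", user="myuser", password="mypassword",
)
with conn, conn.cursor() as cur, open("data.csv") as f:
    # client-side COPY: the CSV is read locally and streamed to the server
    cur.copy_expert("COPY my_table FROM STDIN WITH (FORMAT csv, HEADER)", f)
conn.close()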
Caveat emptor: the post below is all the original work I had done trying to get this implemented. I don't want to bury the lede: despite multiple efforts (including what can only be described as pathetic tech support from AWS), I don't believe that this feature is ready for prime time. Despite a very simple, easy-to-replicate test environment, AWS has not provided an effective way to keep the copy statement from crapping out as follows:
The actual call to aws_s3.table_import_from_s3(...) is reporting a permission problem between RDS and S3. From my research work with psql this appears to be a C library, probably installed by AWS.
NOTICE: CURL error code: 28 when attempting to validate pre-signed URL, 1 attempt(s) remaining
NOTICE: HINT: make sure your instance is able to connect with S3.
S3 to Postgres RDS Functionality Now Added
On 2019-04-24 AWS released functionality allowing a Postgres RDS to load directly from S3. You can read the announcement here, and see the documentation page here.
I am sharing with the OP because this appears to be the AWS supported way of solving the question posed.
Key summary points:
Requires Postgres 11.1 or greater
Need access to psql and the ability to connect it to the RDS instance
Need to install the aws_s3 extension which pulls in aws_commons.
You can get to the S3 bucket by specifying credentials or by assigning IAM roles to RDS
It advertises supporting all of the same data formats as the postgres COPY command
It currently only appears to support a single file at a time (ie no regex)
The instructions are fairly detailed and provide a variety of paths to configuring (AWS CLI scripts, Console instructions, etc). Additionally, the option to use your IAM keys rather than have to set-up roles is nice.
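As an illustration of what the call looks like, here is a rough sketch of invoking the import from Python with psycopg2, based on the documented aws_s3.table_import_from_s3 and aws_commons.create_s3_uri functions; verify the exact arguments against the AWS docs, and treat the endpoint, bucket, table and region names as placeholders:

import psycopg2

conn = psycopg2.connect(
    host="my-rds-instance.xxxxx.us-east-1.rds.amazonaws.com",  # placeholder endpoint
    dbname="mydb", user="myuser", password="mypassword",
)
with conn, conn.cursor() as cur:
    # server-side import: RDS itself pulls the object from S3
    # (requires the IAM role or credentials set up as per the docs)
    cur.execute(
        """
        SELECT aws_s3.table_import_from_s3(
            'my_table',                       -- target table
            '',                               -- column list ('' means all columns)
            '(FORMAT csv, HEADER)',           -- COPY options
            aws_commons.create_s3_uri('my-bucket', 'path/data.csv', 'us-east-1')
        )
        """
    )
    print(cur.fetchone()[0])
conn.close()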
I did not find a way to download just psql, so I had to bring a full Postgres install down to my Mac, but that was no big deal with brew:
brew install postgres
and since the DB service does not get activated it is the quickest way to get psql.
Update: I decided that having psql on my Mac was a security hole (port forwarding, etc.). I found that there is a simple Postgres install available for Amazon Linux 2 under the Amazon Linux Extras rubric. The install command is fairly simple on your AMI instance:
sudo amazon-linux-extras install postgresql10
psql is fairly easy to use; however, it is important to keep in mind that any instructions to psql itself are prefixed with a \. Documentation on psql can be found here. I recommend going through it at least once before executing the AWS-recommended scripts.
To the extent that you run tight security and have access to your RDS instances seriously restricted (which I do), don't forget to open up the ports from the instance running psql to your RDS instance.
If your preference is a GUI then you can try pgAdmin 4. It is the AWS-recommended way of connecting to RDS Postgres instances according to the docs. I was unable to get any of its SSH tunneling features to work (which is why I ended up doing the localhost SSH mapping that I used for psql). I also found it to be rather buggy in other ways. Reading reviews of the product, it seems that version 4 may not be the most stable of releases.
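For reference, the localhost SSH mapping can also be scripted. A rough sketch using the third-party sshtunnel package (the package choice, hostnames, key path and ports are all assumptions for this sketch, not anything AWS recommends):

from sshtunnel import SSHTunnelForwarder
import psycopg2

# forward localhost:5433 through the EC2/bastion host to the RDS endpoint
with SSHTunnelForwarder(
    ("my-ec2-host.example.com", 22),          # placeholder bastion host
    ssh_username="ec2-user",
    ssh_pkey="~/.ssh/my-key.pem",             # placeholder key path
    remote_bind_address=("my-rds-instance.xxxxx.us-east-1.rds.amazonaws.com", 5432),
    local_bind_address=("localhost", 5433),
) as tunnel:
    conn = psycopg2.connect(
        host="localhost", port=5433,
        dbname="mydb", user="myuser", password="mypassword",
    )
    conn.close()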
http://docs.aws.amazon.com/redshift/latest/dg/t_loading-tables-from-s3.html
Use the COPY command to load a table in parallel from data files on Amazon S3. You can specify the files to be loaded by using an Amazon S3 object prefix or by using a manifest file.
The syntax to specify the files to be loaded by using a prefix is as follows:
copy <table_name> from 's3://<bucket_name>/<object_prefix>'
authorization;
Update
Another option is to mount S3 and use a direct path to the CSV with the COPY command. I'm not sure if it will handle 100 GB effectively, but it's worth trying. Here is a list of software options.
Yet another option would be "parsing" the S3 file part by part with something described here into a file and running COPY from a named pipe, as described here (a streaming variant is sketched after these options).
And the most obvious option, to just download the file to local storage and use COPY, I don't cover at all.
Also worth mentioning would be s3_fdw (status: unstable). The readme is very laconic, but I assume you could create a foreign table pointing to an S3 file, which means you could then load the data into another relation...
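As one concrete variant of the stream-and-COPY idea, here is a rough sketch that streams the object with boto3 and feeds it into a client-side COPY via psycopg2, assuming boto3 credentials are already configured; the bucket, key, host and table names are placeholders:

import boto3
import psycopg2

s3 = boto3.client("s3")
# the returned StreamingBody is file-like, so it can be fed straight into copy_expert
body = s3.get_object(Bucket="my-bucket", Key="path/data.csv")["Body"]

conn = psycopg2.connect("postgresql://myuser:mypassword@my-rds-host:5432/mydb")
with conn, conn.cursor() as cur:
    # client-side COPY: reads from S3 in chunks and streams rows to Postgres
    cur.copy_expert("COPY my_table FROM STDIN WITH (FORMAT csv, HEADER)", body)
conn.close()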

Ansible 2.2 Create Postgresql Database

I am new to Ansible (2.2.1) and have started to migrate from our Fabric scripts to Ansible, which I find somewhat better regarding structure. I have run into an issue; it should be pretty straightforward, but since I do not know Ansible through and through I am not sure how to proceed. I am running this against a Vagrant box as of now.
The issue is regarding user privileges and postgres.
Let's say I have this playbook:
- hosts: web
  become: yes
  become_user: root
  vars:
    dbname: myapp
  tasks:
    - name: ensure database is created
      postgresql_db: name={{dbname}}
I cannot make this simple example work! All dependencies are met. If I do the same thing with MySQL it works fine, but here I get unable to connect to database: FATAL: Peer authentication failed for user "postgres".
In MySQL I use the "root" user with a blank password, which works because I know that user is created upon install with a blank password.
There is a user postgres created when the installation of PostgreSQL completes, so the user exists. And as root I should be able to log in by saying I am the postgres user. Am I missing something in how this is done? It works just fine if I log into the server and run sudo -su postgres && psql.
I also tried to add become_user: postgres to the task I want to run, but then I get unprivileged-user issues.
Any ideas of what is missing?
I found a few workarounds and a solution, given you are OK with giving away some security (which makes sense, since that is the very reason this is an issue).
More people are having this problem with the new Ansible in this GitHub issue thread:
https://github.com/ansible/ansible/issues/16048
What I ended up doing was setting allow_world_readable_tmpfiles = True in the ansible.cfg file. Then this is not an issue anymore, but you get warnings. I only need this setting when handling the postgres role, so maybe I'll end up splitting it up or putting the setting somewhere less global.

Using Docker and MongoDB

I have been using Docker and Kubernetes for a while now, and have set up a few databases (Postgres and MySQL) and services.
Now I am looking at adding a MongoDB, but it seems different when it comes to user management.
Take for example postgres:
https://hub.docker.com/_/postgres/
There I can immediately declare a user with a password on setup and then connect using it. It seems the mongo image does not support this. Is there a way to simply declare users on startup and use them, similar to the Postgres setup? That is, without having to exec into the container, modify auth settings and restart the mongo service.