I am using Mac Yosemite.
I have installed the packages postgresql, psycopg2, and simplejson using conda install "package name".
After the installation I imported these packages. I then created a JSON file with my Amazon Redshift credentials:
{
"user_name": "YOUR USER NAME",
"password": "YOUR PASSWORD",
"host_name": "YOUR HOST NAME",
"port_num": "5439",
"db_name": "YOUR DATABASE NAME"
}
I used
with open("Credentials.json") as fh:
    creds = simplejson.loads(fh.read())
But this throws an error. These were the instructions given on a website; I tried searching other websites, but none gives a good explanation.
Please let me know how I can connect Jupyter to Amazon Redshift.
There's a nice guide from RJMetrics here: "Setting up Your Analytics Stack with Jupyter Notebook & AWS Redshift". It uses ipython-sql
This works great and displays results in a grid.
In [1]:
import sqlalchemy
import psycopg2
import simplejson
%load_ext sql
%config SqlMagic.displaylimit = 10
In [2]:
with open("./my_db.creds") as fh:
    creds = simplejson.loads(fh.read())

connect_to_db = 'postgresql+psycopg2://' + \
    creds['user_name'] + ':' + creds['password'] + '@' + \
    creds['host_name'] + ':' + creds['port_num'] + '/' + creds['db_name']
%sql $connect_to_db
In [3]:
%sql SELECT * FROM my_table LIMIT 25;
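One caveat with building the URL by plain concatenation: if the password contains characters such as '@' or ':', the resulting URL breaks. A small variation that escapes the password (urllib.parse is the only addition; the creds keys are the same ones read above):
from urllib.parse import quote_plus

connect_to_db = 'postgresql+psycopg2://{}:{}@{}:{}/{}'.format(
    creds['user_name'], quote_plus(creds['password']),
    creds['host_name'], creds['port_num'], creds['db_name'])
%sql $connect_to_db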
Here's how I do it:
----INSERT IN CELL 1-----
import psycopg2
redshift_endpoint = "<add your endpoint>"
redshift_user = "<add your user>"
redshift_pass = "<add your password>"
port = <your port>
dbname = "<your db name>"
----INSERT IN CELL 2-----
from sqlalchemy import create_engine
from sqlalchemy import text
engine_string = "postgresql+psycopg2://%s:%s@%s:%d/%s" \
% (redshift_user, redshift_pass, redshift_endpoint, port, dbname)
engine = create_engine(engine_string)
----INSERT IN CELL 3 - THIS EXAMPLE WILL GET ALL TABLES FROM YOUR DATABASE-----
sql = """
select schemaname, tablename from pg_tables order by schemaname, tablename;
"""
----LOAD RESULTS AS TUPLES TO A LIST-----
tables = []
output = engine.execute(sql)
for row in output:
    tables.append(row)
tables
--IF YOU'RE USING PANDAS---
import pandas as pd
raw_data = pd.read_sql_query(text(sql), engine)
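Note that engine.execute() was removed in SQLAlchemy 2.0; if you are on 2.0 or later, the loop in cell 3 would instead go through an explicit connection, roughly like this:
from sqlalchemy import text

# same query as above, run 2.0-style through a short-lived connection
with engine.connect() as connection:
    tables = list(connection.execute(text(sql)))
tables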
The easiest way is to use this extension -
https://github.com/sat28/jupyter-redshift
The sample notebook shows how it loads the Redshift utility as an IPython magic.
Edit 1
Support for writing back to the Redshift database has also been added.
edit:
In postgresql.conf, the log_statement is set to:
#log_statement = 'none' # none, ddl, mod, all
My objective is to COPY a .csv file containing ~300k records to Postgres.
I am running the script below and nothing happens; there is no error or warning, but the CSV is still not uploaded.
Any thoughts?
import psycopg2

# Try to connect
try:
    conn = psycopg2.connect(database="<db>", user="<user>", password="<pwd>", host="<host>", port="<port>")
    print("Database Connected....")
except:
    print("Unable to Connect....")

cur = conn.cursor()

try:
    sqlstr = "COPY \"HISTORICALS\".\"HISTORICAL_DAILY_MASTER\" FROM STDIN DELIMITER ',' CSV"
    with open('/Users/kevin/Dropbox/Stonks/HISTORICALS/dump.csv') as f:
        cur.copy_expert(sqlstr, f)
    conn.commit()
    print("COPY pass")
except:
    print("Unable to COPY...")

# Close communication with the database
cur.close()
conn.close()
This is what my .csv looks like
Thanks!
Kevin
I suggest you first load your df with pandas:
import io
import pandas as pd
import psycopg2

conn = psycopg2.connect(database="<db>", user="<user>", password="<pwd>", host="<host>", port="<port>")
cur = conn.cursor()
df = pd.read_csv('data.csv')
# copy_from needs a file-like object, so write the frame into a buffer first
buf = io.StringIO()
df.to_csv(buf, index=False, header=False)
buf.seek(0)
cur.copy_from(buf, '<table>', sep=',', null='', columns=tuple(df.columns))
conn.commit()
For the columns= part I forget whether it wants a tuple or a list, but it should work with a conversion, and you should also read Pandas dataframe to PostgreSQL table using psycopg2 without SQLAlchemy?, which could help you.
The general CLP command for listing the active databases in DB2 is
"LIST ACTIVE DATABASES"
What is the command to list all the databases in the system directory?
It is
list db directory
Details are documented here
Using Python:
In [42]: stmt = ibm_db.exec_immediate(conn, "SELECT DISTINCT(DB_NAME) FROM table(mon_get_memory_pool('','',-2))")
In [43]: while (ibm_db.fetch_row(stmt)):
...: DB_NAME = ibm_db.result(stmt, "DB_NAME")
...: print("DB_NAME = {}".format(DB_NAME))
...:
...:
DB_NAME = SAMPLE
DB_NAME = None
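For completeness, the conn used above would have been opened earlier with ibm_db.connect; every value in the connection string below is a placeholder for your own environment:
import ibm_db

# Placeholder credentials and host; adjust to your setup
conn = ibm_db.connect(
    "DATABASE=SAMPLE;HOSTNAME=localhost;PORT=50000;PROTOCOL=TCPIP;"
    "UID=db2inst1;PWD=yourpassword",
    "", "")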
I'm using a Jupyter notebook to work with a PostgreSQL database.
I have the following code:
import pandas as pd
import (other packages)
conn_string = "..."  # I can't show this, but it is ok
conn = psycopg2.connect(conn_string)
cursor = conn.cursor()
query= """ copy (select col1,col2 from Table where col3=a_parameter
and col4=b_parameter) to '/tmp/test.csv' with csv """
pd.read_sql(query,conn)
But I got this error:
ProgrammingError: syntax error at or near "("
LINE 1: COPY (select col1,col2 from Table where col3...
              ^
Why does the COPY statement throw a syntax error?
I am using PostgreSQL 8.0.2.
Something like this:
import csv
my_file_csv = my_folder + "\Report_Trip_Day_" + my_opr + "_" + my_local_database + ".csv"
out = csv.writer(open(my_file_csv, "w", newline=''), delimiter=',', quoting=csv.QUOTE_ALL)
out.writerow(colnames)
for row in my_xls_report_table:
    out.writerow(row)
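colnames and my_xls_report_table are assumed to already exist in that snippet; with psycopg2 they could be filled from the cursor defined in the question before the file is written, for example:
cursor.execute("select col1, col2 from Table where col3 = %s and col4 = %s",
               (a_parameter, b_parameter))
# column names come from the cursor description, rows from fetchall()
colnames = [desc[0] for desc in cursor.description]
my_xls_report_table = cursor.fetchall()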
You can do this:
query = """ select col1,col2 from Table where col3=a_parameter
            and col4=b_parameter """
df = pd.read_sql(query, con=conn)
df.to_csv("name.csv", sep=",")
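If a_parameter and b_parameter are Python variables rather than literal SQL, pandas can pass them as query parameters instead of baking them into the string (the %(name)s placeholder style below assumes a psycopg2 connection):
query = "select col1, col2 from Table where col3 = %(a)s and col4 = %(b)s"
df = pd.read_sql(query, con=conn, params={"a": a_parameter, "b": b_parameter})
df.to_csv("name.csv", sep=",", index=False)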
I'm using psycopg2 to query a table in a schema whose name is a number (e.g. 2345.districts), so my code goes like:
number = 2345
cur = conn.cursor()
myquery = """ SELECT *
FROM {0}.districts
;""".format(number)
cur.execute("""{0};""".format(myquery))
data = cur.fetchall()
conn.close()
And I keep receiving the following psycopg2 error:
psycopg2.ProgrammingError: syntax error at or near "2345."
LINE 1: SELECT * FROM 2345.districts...
I thought the problem was the data type, so I tried int(number) and str(number), but no, the same error appears.
What am I doing wrong?
The way you are trying to pass the schema name is not supported: plain string formatting leaves the numeric identifier unquoted, so the parser fails at 2345. Please read the docs on passing parameters to SQL queries.
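If you want to keep the schema name in a variable, one option (assuming psycopg2 2.7+) is to compose the statement with the psycopg2.sql module, which quotes the numeric schema name so the parser accepts it:
from psycopg2 import sql

number = 2345
cur = conn.cursor()
# sql.Identifier quotes the name, so the query becomes SELECT * FROM "2345".districts
myquery = sql.SQL("SELECT * FROM {}.districts").format(sql.Identifier(str(number)))
cur.execute(myquery)
data = cur.fetchall()
conn.close()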
I am working with the script below.
If I change the script so I avoid the bytea datatype, I can easily copy data from my postgres table into a python variable.
But if the data is in a bytea postgres column, I encounter a strange object called memory which confuses me.
Here is the script which I run against anaconda python 3.5.2:
# bytea.py
import sqlalchemy
# I should create a conn
db_s = 'postgres://dan:dan@127.0.0.1/dan'
conn = sqlalchemy.create_engine(db_s).connect()
sql_s = "drop table if exists dropme"
conn.execute(sql_s)
sql_s = "create table dropme(c1 bytea)"
conn.execute(sql_s)
sql_s = "insert into dropme(c1)values( cast('hello' AS bytea) );"
conn.execute(sql_s)
sql_s = "select c1 from dropme limit 1"
result = conn.execute(sql_s)
print(result)
# <sqlalchemy.engine.result.ResultProxy object at 0x7fcbccdade80>
for row in result:
    print(row['c1'])
    # <memory at 0x7f4c125a6c48>
How do I get the data that is inside <memory at 0x7f4c125a6c48>?
You can cast it using Python's bytes():
for row in result:
    print(bytes(row['c1']))
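Equivalently, since the object is a memoryview, you can call its own tobytes() method, and decode() if the column actually holds text:
for row in result:
    raw = row['c1'].tobytes()   # same as bytes(row['c1'])
    print(raw.decode('utf-8'))  # prints 'hello' for the row inserted above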