Syntax error at or near "(" with PostgreSQL + Python

I'm using a Jupyter notebook to work with a PostgreSQL database.
I have the following setup:
import pandas as pd
import psycopg2
# (other packages imported here)
conn_string = "..."  # I can't show this, but it is correct
conn = psycopg2.connect(conn_string)
cursor = conn.cursor()
query = """ copy (select col1,col2 from Table where col3=a_parameter
and col4=b_parameter) to '/tmp/test.csv' with csv """
pd.read_sql(query, conn)
But I got this error:
**ProgrammingError: syntax error at or near "("
LINE 1: COPY (select col1,col2 from Table where col3...**
^
Why does the COPY statement raise an error?
I am using PostgreSQL 8.0.2.

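You can write the rows to a CSV file on the client side, something like this:
import csv
# my_folder, my_opr, my_local_database, colnames and my_xls_report_table are defined elsewhere
my_file_csv = my_folder + "\\Report_Trip_Day_" + my_opr + "_" + my_local_database + ".csv"
with open(my_file_csv, "w", newline='') as f:
    out = csv.writer(f, delimiter=',', quoting=csv.QUOTE_ALL)
    out.writerow(colnames)
    for row in my_xls_report_table:
        out.writerow(row)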

You can do this instead: run only the SELECT through pandas and write the CSV on the client side:
query = """ select col1,col2 from Table where col3=a_parameter
and col4=b_parameter """
df = pd.read_sql(query, con=conn)
df.to_csv("name.csv", sep=",")
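The client-side export suggested in both answers can also be sketched with only a DB-API cursor and the csv module. This is a minimal sketch, not the answerers' code: it uses an in-memory sqlite3 database as a stand-in so it runs anywhere, and the function name export_query_to_csv and the demo table are hypothetical. With psycopg2 the placeholder would be %s instead of ?. As far as I know, COPY with a query in parentheses was only added in PostgreSQL 8.2, which would explain the syntax error on a server reporting 8.0.2.

```python
import csv
import io
import sqlite3


def export_query_to_csv(conn, query, params, out):
    """Run a parameterized SELECT and write a header plus all rows as CSV to `out`."""
    cur = conn.cursor()
    cur.execute(query, params)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])  # column names
    writer.writerows(cur)  # stream rows straight from the cursor
    cur.close()


# Stand-in database so the sketch is runnable (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (col1 INTEGER, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "b", "y"), (3, "c", "x")],
)

buffer = io.StringIO()
export_query_to_csv(
    conn,
    "SELECT col1, col2 FROM demo WHERE col3 = ? ORDER BY col1",
    ("x",),
    buffer,
)
csv_text = buffer.getvalue()
```

Passing an open file instead of the StringIO buffer writes the CSV to disk on the client, so no server-side file access is needed.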

Related

I am unable to COPY my CSV to postgres using psycopg2/copy_expert

edit:
In postgresql.conf, the log_statement is set to:
#log_statement = 'none' # none, ddl, mod, all
My objective is to COPY a .csv file containing ~300k records into Postgres.
I am running the script below and nothing happens: there is no error or warning, but the CSV is still not uploaded.
Any thoughts?
import psycopg2
# Try to connect
try:
    conn = psycopg2.connect(database="<db>", user="<user>", password="<pwd>", host="<host>", port="<port>")
    print("Database Connected....")
except:
    print("Unable to Connect....")
cur = conn.cursor()
try:
    sqlstr = "COPY \"HISTORICALS\".\"HISTORICAL_DAILY_MASTER\" FROM STDIN DELIMITER ',' CSV"
    with open('/Users/kevin/Dropbox/Stonks/HISTORICALS/dump.csv') as f:
        cur.copy_expert(sqlstr, f)
    conn.commit()
    print("COPY pass")
except:
    print("Unable to COPY...")
# Close communication with the database
cur.close()
conn.close()
This is what my .csv looks like
Thanks!
Kevin
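I suggest first loading the CSV into a DataFrame with pandas, then streaming it to the table:
import io
import pandas as pd
import psycopg2
conn = psycopg2.connect(database="<db>", user="<user>", password="<pwd>", host="<host>", port="<port>")
cur = conn.cursor()
df = pd.read_csv('data.csv')
# copy_from needs a file-like object (not a DataFrame) and a single-character separator
buf = io.StringIO()
df.to_csv(buf, index=False, header=False)
buf.seek(0)
cur.copy_from(buf, schema, null='', sep=',', columns=list(df.columns))  # schema holds the target table name
conn.commit()
For columns=, I don't remember whether a tuple or a list is expected, but converting df.columns as above should work. Note that copy_from reads plain delimited text, so values that themselves contain commas would need copy_expert with the CSV option instead. You should also read
"Pandas dataframe to PostgreSQL table using psycopg2 without SQLAlchemy?", which may help.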
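One reason the script in the question gives so little to go on is that its bare except clauses discard the exception. The sketch below shows the pattern of catching the exception, rolling back, and reporting the message. It is only an illustration under assumptions: sqlite3 stands in for PostgreSQL so the code runs without a server, and run_load and the daily_master table are hypothetical names. With psycopg2 the same try/except shape would wrap cur.copy_expert.

```python
import sqlite3


def run_load(conn, sql, rows):
    """Run a bulk load and return a status message that includes any error."""
    cur = conn.cursor()
    try:
        cur.executemany(sql, rows)
        conn.commit()
        return "load pass"
    except Exception as exc:  # catch broadly, but report what went wrong
        conn.rollback()
        return f"Unable to load: {type(exc).__name__}: {exc}"
    finally:
        cur.close()


conn = sqlite3.connect(":memory:")
insert_sql = "INSERT INTO daily_master VALUES (?, ?)"

# The target table does not exist yet, so the first attempt reports the cause.
first = run_load(conn, insert_sql, [("2020-01-01", 1.5)])

conn.execute("CREATE TABLE daily_master (day TEXT, price REAL)")
second = run_load(conn, insert_sql, [("2020-01-01", 1.5)])
```

Printing first shows the driver's own error text, which is usually enough to tell a wrong path, a permissions problem, or a column mismatch apart.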

How to make psycopg2 emit no quotes?

I want to create a table from Python:
import psycopg2 as pg
from psycopg2 import sql
conn = pg.connect("dbname=test user=test")
table_name = "testDB"
column_name = "mykey"
column_type = "bigint"
cu = conn.cursor()
cu.execute(sql.SQL("CREATE TABLE {t} ({c} {y})").format(
    t=sql.Identifier(table_name),
    c=sql.Identifier(column_name),
    y=sql.Literal(column_type)))
Alas, this emits CREATE TABLE "testDB" ("mykey" 'bigint') which fails with a
psycopg2.ProgrammingError: syntax error at or near "'bigint'"
Of course, I can do something like
cu.execute(sql.SQL("CREATE TABLE {t} ({c} %s)" % (column_type)).format(
    t=sql.Identifier(table_name),
    c=sql.Identifier(column_name)))
but I suspect there is a more elegant (and secure!) solution.
PS. See also How to make psycopg2 emit nested quotes?
The documentation has an example of how to build a query text with a placeholder. Use psycopg2.extensions.AsIs(object) for column_type:
from psycopg2.extensions import AsIs
query = sql.SQL("CREATE TABLE {t} ({c} %s)").format(
    t=sql.Identifier(table_name),
    c=sql.Identifier(column_name)).as_string(cu)
cu.execute(query, [AsIs(column_type)])
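Because AsIs places its string into the SQL text verbatim, the "secure" part of the question depends on column_type being trustworthy. One possible safeguard, sketched here as an assumption rather than anything psycopg2 provides, is to check the type name against an allow-list before using it. ALLOWED_TYPES and checked_type are hypothetical names, and the set of types is only an example.

```python
# Hypothetical allow-list; extend it with the types your schema actually uses.
ALLOWED_TYPES = {"bigint", "integer", "text", "boolean", "timestamp"}


def checked_type(column_type: str) -> str:
    """Return the normalized type name, or raise if it is not on the allow-list."""
    normalized = column_type.strip().lower()
    if normalized not in ALLOWED_TYPES:
        raise ValueError(f"unsupported column type: {column_type!r}")
    return normalized


safe_type = checked_type(" BIGINT ")
```

The returned value could then be wrapped in AsIs as in the answer above, so that only known type names ever reach the CREATE TABLE statement.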

Python psycopg2 - using .format() with a dbname inside a string

I'm using psycopg2 to query a table whose qualified name starts with a number, followed by ".districts", so my code looks like this:
number = 2345
cur = conn.cursor()
myquery = """ SELECT *
FROM {0}.districts
;""".format(number)
cur.execute(myquery)
data = cur.fetchall()
conn.close()
I keep receiving the following psycopg2 error:
psycopg2.ProgrammingError: syntax error at or near "2345."
LINE 1: SELECT * FROM 2345.districts...
I thought the problem was the data type, so I tried int(number) and str(number), but the same error appears.
What am I doing wrong?
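The way you are trying to pass parameters is not supported: str.format pastes 2345 into the SQL text unquoted, and PostgreSQL does not accept an unquoted identifier that starts with a digit. Please read the docs on passing parameters and composing identifiers.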
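To make the quoting issue concrete, here is a minimal sketch of what a double-quoted identifier looks like in the final SQL text. The helper quote_ident is a hypothetical stand-in written for illustration; in real psycopg2 code, sql.Identifier from the psycopg2.sql module (used in the previous question) is the supported way to get the same effect.

```python
def quote_ident(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quote characters."""
    return '"' + name.replace('"', '""') + '"'


number = 2345
# The schema name starts with a digit, so it has to be quoted.
query = "SELECT * FROM {}.districts".format(quote_ident(str(number)))
```

Here query holds SELECT * FROM "2345".districts, which the parser reads as a schema-qualified table name instead of a number followed by a dot.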

How to execute multi line sql in spark sql

How can I execute lengthy, multiline Hive Queries in Spark SQL? Like query below:
val sqlContext = new HiveContext (sc)
val result = sqlContext.sql ("
select ...
from ...
");
Use """ instead, so for example
val results = sqlContext.sql ("""
select ....
from ....
""");
or, if you want to format code, use:
val results = sqlContext.sql ("""
|select ....
|from ....
""".stripMargin);
You can use triple quotes at the start and end of the SQL code or, in PySpark, a backslash at the end of each line (a regular Scala string literal cannot be continued that way).
val results = sqlContext.sql ("""
create table enta.scd_fullfilled_entitlement as
select *
from my_table
""");
results = sqlContext.sql (" \
create table enta.scd_fullfilled_entitlement as \
select * \
from my_table \
")
val query = """(SELECT
a.AcctBranchName,
c.CustomerNum,
c.SourceCustomerId,
a.SourceAccountId,
a.AccountNum,
c.FullName,
c.LastName,
c.BirthDate,
a.Balance,
case when [RollOverStatus] = 'Y' then 'Yes' Else 'No' end as RollOverStatus
FROM
v_Account AS a left join v_Customer AS c
ON c.CustomerID = a.CustomerID AND c.Businessdate = a.Businessdate
WHERE
a.Category = 'Deposit' AND
c.Businessdate= '2018-11-28' AND
isnull(a.Classification,'N/A') IN ('Contractual Account','Non-Term Deposit','Term Deposit')
AND IsActive = 'Yes' ) tmp """
Note that the length of the query is not the issue, only how the string literal is written. You can use """ as Gaweda suggested, or simply pass a string variable, e.g. one built with a StringBuilder. For example:
val selectElements = Seq("a","b","c")
val builder = StringBuilder.newBuilder
builder.append("select ")
builder.append(selectElements.mkString(","))
builder.append(" where d<10")
val results = sqlContext.sql(builder.toString())
In addition to the approaches above, you can concatenate single-line strings:
val results = sqlContext.sql("select .... " +
" from .... " +
" where .... " +
" group by .... ");
Write your SQL inside triple quotes, like """ sql code """:
df = spark.sql(f""" select * from table1 """)
This is the same for Scala Spark and PySpark.
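On the Python side, the two multi-line styles mentioned in this thread can be checked without Spark at all, since they are plain string features. The sketch below only builds the strings; the table and column names are placeholders, and textwrap.dedent is used as a rough Python counterpart to Scala's stripMargin.

```python
import textwrap

# Triple quotes let the SQL span lines; dedent strips the common indentation.
multi_line = textwrap.dedent("""\
    select *
    from my_table
    where d < 10
""")

# A trailing backslash inside a regular string joins the next line onto it.
continued = "select * \
from my_table"
```

Either string can then be passed to spark.sql(...) on an existing SparkSession.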

How to connect Jupyter Ipython notebook to Amazon redshift

I am using Mac Yosemite.
I have installed the packages postgresql, psycopg2, and simplejson using conda install "package name".
After the installation I have imported these packages. I tried to create a json file with my amazon redshift credentials
{
"user_name": "YOUR USER NAME",
"password": "YOUR PASSWORD",
"host_name": "YOUR HOST NAME",
"port_num": "5439",
"db_name": "YOUR DATABASE NAME"
}
I used
with open("Credentials.json") as fh:
    creds = simplejson.loads(fh.read())
But this throws an error. These were the instructions given on a website. I tried searching other websites, but none gives a good explanation.
Please let me know how I can connect Jupyter to Amazon Redshift.
There's a nice guide from RJMetrics here: "Setting up Your Analytics Stack with Jupyter Notebook & AWS Redshift". It uses ipython-sql
This works great and displays results in a grid.
In [1]:
import sqlalchemy
import psycopg2
import simplejson
%load_ext sql
%config SqlMagic.displaylimit = 10
In [2]:
with open("./my_db.creds") as fh:
    creds = simplejson.loads(fh.read())
connect_to_db = 'postgresql+psycopg2://' + \
    creds['user_name'] + ':' + creds['password'] + '@' + \
    creds['host_name'] + ':' + creds['port_num'] + '/' + creds['db_name'];
%sql $connect_to_db
In [3]:
%sql SELECT * FROM my_table LIMIT 25;
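Since the question does not show the actual error, one thing worth ruling out is a malformed or incomplete credentials file. The following is a minimal sketch using the standard library json module in place of simplejson; parse_creds and REQUIRED_KEYS are hypothetical names, and the key list simply mirrors the JSON shown in the question.

```python
import json

# Keys taken from the credentials file shown in the question.
REQUIRED_KEYS = ("user_name", "password", "host_name", "port_num", "db_name")


def parse_creds(text: str) -> dict:
    """Parse the credentials JSON and fail early if a key is missing."""
    creds = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise KeyError(f"missing credential keys: {missing}")
    return creds


sample = (
    '{"user_name": "u", "password": "p", "host_name": "h", '
    '"port_num": "5439", "db_name": "d"}'
)
creds = parse_creds(sample)
```

In a notebook, the text would come from reading the credentials file, and a json.JSONDecodeError at this step would point to a syntax problem in the file itself.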
Here's how I do it:
----INSERT IN CELL 1-----
import psycopg2
redshift_endpoint = "<add your endpoint>"
redshift_user = "<add your user>"
redshift_pass = "<add your password>"
port = <your port>
dbname = "<your db name>"
----INSERT IN CELL 2-----
from sqlalchemy import create_engine
from sqlalchemy import text
engine_string = "postgresql+psycopg2://%s:%s@%s:%d/%s" \
    % (redshift_user, redshift_pass, redshift_endpoint, port, dbname)
engine = create_engine(engine_string)
----INSERT IN CELL 3 - THIS EXAMPLE WILL GET ALL TABLES FROM YOUR DATABASE-----
sql = """
select schemaname, tablename from pg_tables order by schemaname, tablename;
"""
----LOAD RESULTS AS TUPLES TO A LIST-----
tables = []
output = engine.execute(sql)
for row in output:
    tables.append(row)
tables
--IF YOU'RE USING PANDAS---
import pandas as pd
raw_data = pd.read_sql_query(text(sql), engine)
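Both answers build the connection URL by string concatenation, which can break when the password contains characters such as @ or /. A minimal sketch of one way to handle that, assuming the usual user:password@host:port/dbname URL layout, is to percent-encode the user and password with urllib.parse.quote_plus. The function build_url and the sample values are hypothetical.

```python
from urllib.parse import quote_plus


def build_url(user, password, host, port, dbname):
    """Assemble a SQLAlchemy-style URL, percent-encoding user and password."""
    return "postgresql+psycopg2://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, dbname
    )


# Sample values only; the host below is a placeholder, not a real endpoint.
url = build_url("alice", "p@ss/word", "example.redshift.amazonaws.com", 5439, "dev")
```

The resulting string can be passed to create_engine or to the %sql magic in place of the hand-built one.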
The easiest way is to use this extension -
https://github.com/sat28/jupyter-redshift
The sample notebook shows how it loads redshift utility as an IPython Magic.
Edit 1
Support for writing back to redshift database has also been added.