Connect pyodbc to Postgres

I'm trying to connect to Postgres using pyodbc.
I can connect to the DB with isql:
echo "select 1" | isql -v my-connector
Returns:
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> select 1
+------------+
| ?column?   |
+------------+
| 1          |
+------------+
SQLRowCount returns 1
1 rows fetched
But when I try to connect with pyodbc:
import pyodbc
con = pyodbc.connect("DRIVER={PostgreSQL Unicode}; DATABASE=<dbname>; UID=<username>; PWD=<password>; SERVER=localhost; PORT=5432;")
I get the following error:
pyodbc.Error: ('08001', '[08001] [unixODBC]connction string lacks some options (202) (SQLDriverConnect)')
odbc.ini file looks like this:
[my-connector]
Description = PostgreSQL connection to '<dbname>' database
Driver = PostgreSQL Unicode
Database = <dbname>
Servername = localhost
UserName = <username>
Password = <password>
Port = 5432
Protocol = 9.3
ReadOnly = No
RowVersioning = No
ShowSystemTables = No
ShowOidColumn = No
FakeOidIndex = No
ConnSettings =
odbcinst.ini file looks like this:
[PostgreSQL ANSI]
Description = PostgreSQL ODBC driver (ANSI version)
Driver = psqlodbca.so
Setup = libodbcpsqlS.so
Debug = 0
CommLog = 1
UsageCount = 1
[PostgreSQL Unicode]
Description = PostgreSQL ODBC driver (Unicode version)
Driver = psqlodbcw.so
Setup = libodbcpsqlS.so
Debug = 0
CommLog = 1
UsageCount = 1
Notes:
Ubuntu 14.04
Python 3
Postgresql 9.3
I have used psycopg2 in the past to connect to Postgres, however my current company uses Netezza, Postgres, and MySQL. I want to write one connection module and use different drivers to connect to the different databases.
Any help would be greatly appreciated.
-- Thanks

Since you already have a working DSN defined in odbc.ini you can just use that:
con = pyodbc.connect("DSN=my-connector")
Also, for the record, that extra whitespace in your connection string may have been confusing the issue, because the following worked fine for me, under Python 2.7 at least:
import pyodbc
conn_str = (
    "DRIVER={PostgreSQL Unicode};"
    "DATABASE=postgres;"
    "UID=postgres;"
    "PWD=whatever;"
    "SERVER=localhost;"
    "PORT=5432;"
)
conn = pyodbc.connect(conn_str)
crsr = conn.execute("SELECT 123 AS n")
row = crsr.fetchone()
print(row)
crsr.close()
conn.close()
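As for the stated goal of a single connection module for Netezza, Postgres, and MySQL: since pyodbc only needs the right DSN, a minimal sketch could map logical database names to DSNs defined in odbc.ini (the DSN names below are hypothetical):
import pyodbc

# Hypothetical DSN names; each one must be defined in odbc.ini
# with the appropriate driver (psqlODBC, Netezza ODBC, MySQL ODBC).
DSNS = {
    "postgres": "my-connector",
    "netezza": "netezza-connector",
    "mysql": "mysql-connector",
}

def get_connection(db_name):
    """Open a pyodbc connection for a logical database name."""
    return pyodbc.connect("DSN=" + DSNS[db_name])

conn = get_connection("postgres")
print(conn.execute("SELECT 1").fetchone())
conn.close()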

Related

Pyspark EMR "The connection attempt failed" when I try to save a dataframe in Postgres

When I try to save a dataframe in Postgres using pyspark on EMR I get the following error:
org.postgresql.util.PSQLException: The connection attempt failed.
However, the table is created, but without the data.
df.write.format("jdbc") \
    .option("url", "jdbc:postgresql://host:5432/dbtest") \
    .option("dbtable", "table_example") \
    .option("user", "user") \
    .option("password", "pass") \
    .option("driver", "org.postgresql.Driver") \
    .save()
Result in Postgres:
List of relations
 Schema |     Name      | Type  | Owner
--------+---------------+-------+----------
 public | table_example | table | user
select * from table_example;
age | name
-----+------
(0 rows)
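A couple of things worth checking with this kind of failure (assumptions on my part, not details from the question): the table is created by the Spark driver, while the rows are written by the executors, so the executors also need network access to the database (e.g. the EMR security groups), and the PostgreSQL JDBC jar has to be shipped to them, for example (job filename hypothetical):
spark-submit --jars /path/to/postgresql-<version>.jar my_job.py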

django-celery-results won't receive results

I have Celery set up and working together with Django. I have some periodic tasks that run. The Celery log shows that the tasks are executed and that they return something.
[2017-03-26 14:34:27,039: INFO/MainProcess] Received task: my_webapp.apps.events.tasks.clean_outdated[87994396-04f7-452b-a964-f6bdd07785e0]
[2017-03-26 14:34:28,328: INFO/PoolWorker-1] Task my_webapp.apps.events.tasks.clean_outdated[87994396-04f7-452b-a964-f6bdd07785e0] succeeded in 0.05246314400005758s: 'Removed 56 event(s)
| Removed 4 SGW(s)
'
But the results are not showing up on the django-celery-results admin page.
These are my settings:
CELERY_BROKER_URL = os.environ.get('BROKER_URL')
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = 'Europe/Stockholm'
CELERY_RESULT_BACKEND = 'django-cache'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
CELERY_RESULT_DB_SHORT_LIVED_SESSIONS = True # Fix for low traffic sites like this one
I have also tried setting CELERY_RESULT_BACKEND = 'django-db'. I know the migrations have been run (when using those settings); the table exists in the database:
my_webapp=> \dt
List of relations
Schema | Name | Type | Owner
--------+--------------------------------------+-------+----------------
...
public | django_celery_beat_crontabschedule | table | my_webapp
public | django_celery_beat_intervalschedule | table | my_webapp
public | django_celery_beat_periodictask | table | my_webapp
public | django_celery_beat_periodictasks | table | my_webapp
public | django_celery_results_taskresult | table | my_webapp
...
(26 rows)
Google won't give me much help; most answers are about old libraries like djcelery. Any idea how to get the results into the table?
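For comparison, the minimal documented setup for django-celery-results stores results in the table shown above (a sketch; app and setting names per the library's docs):
# settings.py
INSTALLED_APPS = [
    # ...
    'django_celery_results',
]
CELERY_RESULT_BACKEND = 'django-db'  # writes to django_celery_results_taskresult
followed by python manage.py migrate django_celery_results. Note that CELERY_RESULT_BACKEND = 'django-cache' stores results in the configured Django cache, not in that table, so they won't appear in the admin.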

How to handle NaTs with pandas sqlalchemy and psycopg2

I have a dataframe with NaTs, like the one below, that gives me DataError: (psycopg2.DataError) invalid input syntax for type timestamp: "NaT" when I try inserting the values into a Postgres DB.
The dataframe:
import datetime

import pandas as pd
import sqlalchemy
from sqlalchemy import MetaData
from sqlalchemy.dialects.postgresql import insert

tst_df = pd.DataFrame({'colA': ['a', 'b', 'c', 'a', 'z', 'q'],
                       'colB': pd.date_range(end=datetime.datetime.now(), periods=6),
                       'colC': ['a1', 'b2', 'c3', 'a4', 'z5', 'q6']})
tst_df.loc[5, 'colB'] = pd.NaT
insrt_vals = tst_df.to_dict(orient='records')
engine = sqlalchemy.create_engine("postgresql://user:password@localhost/postgres")
connect = engine.connect()
meta = MetaData(bind=engine)
meta.reflect(bind=engine)
table = meta.tables['tstbl']
insrt_stmnt = insert(table).values(insrt_vals)
do_nothing_stmt = insrt_stmnt.on_conflict_do_nothing(index_elements=['colA', 'colB'])
The code that generates the error:
results = engine.execute(do_nothing_stmt)
DataError: (psycopg2.DataError) invalid input syntax for type timestamp: "NaT"
LINE 1: ...6-12-18T09:54:05.046965'::timestamp, 'z5'), ('q', 'NaT'::tim...
One possibility, mentioned here, is to replace the NaTs with None, but as the previous author said, it seems a bit hackish.
sqlalchemy 1.1.4
pandas 0.19.1
psycopg2 2.6.2 (dt dec pq3 ext lo64)
Did you try to use the Pandas to_sql method?
It works for me for the MySQL DB (I presume it'll also work for PostgreSQL):
In [50]: tst_df
Out[50]:
  colA                       colB colC
0    a 2016-12-14 19:11:36.045455   a1
1    b 2016-12-15 19:11:36.045455   b2
2    c 2016-12-16 19:11:36.045455   c3
3    a 2016-12-17 19:11:36.045455   a4
4    z 2016-12-18 19:11:36.045455   z5
5    q                        NaT   q6
In [51]: import pymysql
...: import sqlalchemy as sa
...:
In [52]: db_connection = 'mysql+pymysql://user:password@mysqlhost/db_name'
...:
In [53]: engine = sa.create_engine(db_connection)
...: conn = engine.connect()
...:
In [54]: tst_df.to_sql('zzz', conn, if_exists='replace', index=False)
On the MySQL side:
mysql> select * from zzz;
+------+---------------------+------+
| colA | colB                | colC |
+------+---------------------+------+
| a    | 2016-12-14 19:11:36 | a1   |
| b    | 2016-12-15 19:11:36 | b2   |
| c    | 2016-12-16 19:11:36 | c3   |
| a    | 2016-12-17 19:11:36 | a4   |
| z    | 2016-12-18 19:11:36 | z5   |
| q    | NULL                | q6   |
+------+---------------------+------+
6 rows in set (0.00 sec)
PS: unfortunately I don't have PostgreSQL for testing.
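For completeness, the None replacement mentioned in the question is a one-liner before building insrt_vals (a sketch using tst_df from above):
# Cast to object first; on datetime64 columns None would otherwise
# be coerced straight back to NaT.
tst_df = tst_df.astype(object).where(tst_df.notnull(), None)
insrt_vals = tst_df.to_dict(orient='records')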

Can't use PGPool with Amazon RDS Postgres

I have a Postgres 9.4 RDS instance with Multi-AZ, and there's a slave, read-only replica.
Up to this point the load balancing was done in the business layer of my app, but it's inefficient, and I was hoping to use PGPool so the app interacts with a single Postgres connection.
It turns out that using PGPool has been a pain in the ass. If I set it to act as a load balancer, simple SELECT queries throw errors like:
SQLSTATE[HY000]: General error: 7
message contents do not agree with length in message type "N"
server sent data ("D" message) without prior row description ("T" message)
If I set it to act in a master/slave mode with stream replication (as suggested in Postgres mail list) I get:
psql: ERROR: MD5 authentication is unsupported in replication and master-slave modes.
HINT: check pg_hba.conf
Yeah, well, pg_hba.conf is off-limits in RDS, so I can't alter it.
Has anyone got PGPool to work in RDS? Are there other tools that can act as middleware to take advantage of reading replicas in RDS?
I was able to make it work. Here are my working config files.
You have to use md5 authentication and sync the username/password from your database to the pool_passwd file. You also need enable_pool_hba, load_balance_mode, and master_slave_mode turned on.
pgpool.conf
listen_addresses = '*'
port = 9999
pcp_listen_addresses = '*'
pcp_port = 9898
pcp_socket_dir = '/tmp'
listen_backlog_multiplier = 1
backend_hostname0 = 'master-rds-database-with-multi-AZ.us-west-2.rds.amazonaws.com'
backend_port0 = 5432
backend_weight0 = 0
backend_flag0 = 'ALWAYS_MASTER'
backend_hostname1 = 'readonly-replica.us-west-2.rds.amazonaws.com'
backend_port1 = 5432
backend_weight1 = 999
backend_flag1 = 'ALWAYS_MASTER'
enable_pool_hba = on
pool_passwd = 'pool_passwd'
ssl = on
num_init_children = 1
max_pool = 2
connection_cache = off
replication_mode = off
load_balance_mode = on
master_slave_mode = on
pool_hba.conf
local all all md5
host all all 127.0.0.1/32 md5
pool_passwd
username:md5d51c9a7e9353746a6020f9602d452929
To update pool_passwd you can use pg_md5, or:
echo username:md5`echo -n usernamepassword | md5sum`
username:md5d51c9a7e9353746a6020f9602d452929 -
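For reference, pg_md5 can write that entry for you; it appends the username:md5hash line to pool_passwd:
pg_md5 --md5auth --username=username password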
Output of running example:
psql --dbname=database --host=localhost --username=username --port=9999
database=> SHOW POOL_NODES;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+-------------------------------------------------+------+--------+-----------+---------+------------+-------------------+-------------------
0 | master-rds-database.us-west-2.rds.amazonaws.com | 8193 | up | 0.000000 | primary | 0 | false | 0
1 | readonly-replica.us-west-2.rds.amazonaws.com | 8193 | up | 1.000000 | standby | 0 | true | 0
database=> select now();
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+-------------------------------------------------+------+--------+-----------+---------+------------+-------------------+-------------------
0 | master-rds-database.us-west-2.rds.amazonaws.com | 8193 | up | 0.000000 | primary | 0 | false | 0
1 | readonly-replica.us-west-2.rds.amazonaws.com | 8193 | up | 1.000000 | standby | 1 | true | 1
database=> CREATE TABLE IF NOT EXISTS tmp_test_read_write ( data varchar(40) );
CREATE TABLE
database=> INSERT INTO tmp_test_read_write (data) VALUES (concat('',inet_server_addr()));
INSERT 0 1
database=> select data as master_ip,inet_server_addr() as replica_ip from tmp_test_read_write;
master_ip | replica_ip
--------------+---------------
172.31.37.69 | 172.31.20.121
(1 row)
You can also see from the logs that it hits both databases:
2018-10-16 07:56:37: pid 124528: LOG: DB node id: 0 backend pid: 21731 statement: CREATE TABLE IF NOT EXISTS tmp_test_read_write ( data varchar(40) );
2018-10-16 07:56:47: pid 124528: LOG: DB node id: 0 backend pid: 21731 statement: INSERT INTO tmp_test_read_write (data) VALUES (concat('',inet_server_addr()));
2018-10-16 07:56:52: pid 124528: LOG: DB node id: 1 backend pid: 24890 statement: select data as master_ip,inet_server_addr() as replica_ip from tmp_test_read_write;
Notice that the insert used the IP address of the master, and the next select used the IP address of the read-only replica.
I can update after more testing, but psql client testing looks promising.
There is Citus (pg_shard), which is supposed to work with standard Amazon RDS instances. It has catches, though: you will have a single point of failure if you use the open-source version, since its coordinator node is not duplicated.
You can get a fully HA, seamless-failover version of it, but you have to buy the enterprise licence, and it is CRAZY expensive. It will easily cost you $50,000 to $100,000 or more per year.
Also, they are REALLY pushing their cloud version now, which is even more insanely expensive.
https://www.citusdata.com/
I have also heard of people using HAProxy to balance between Postgres or MySQL nodes.

Syntax error in a simple UPDATE query

I have this query (PostgreSQL 9.1):
=> update tbp set super_answer = null where packet_id = 18;
ERROR: syntax error at or near "="
I don't get it. I'm really out of words here.
Table "public.tbp"
    Column    |          Type          | Modifiers
--------------+------------------------+-----------
 id           | bigint                 | not null
 super_answer | bigint                 |
 packet_id    | bigint                 |
It turned out I had copied some invisible whitespace Unicode character (a non-breaking space, \xa0) and Postgres didn't like it.
In a Python console:
>>> u'update "tbp" set "super_answer"=null where "packet_id" = 18'
u'update "tbp" set\xa0"super_answer"=null where "packet_id" = 18'
Life can be strange sometimes.
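For anyone hitting the same thing, a quick way to spot such characters before running a query (a minimal sketch):
sql = u'update "tbp" set\xa0"super_answer"=null where "packet_id" = 18'
# List every non-ASCII character with its position in the string.
print([(i, hex(ord(c))) for i, c in enumerate(sql) if ord(c) > 127])
# [(16, '0xa0')]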