Too many connections error with Redshift using SQLAlchemy with pandas

I am using SQLAlchemy in a Flask application to create the engine that connects to the Redshift database.
I have a loop, and in each iteration I execute a query and return a data frame with pandas.read_sql_query(query_string, engine).
When I run my program I receive the error below when connecting to the Redshift DB:
psycopg2.OperationalError: FATAL: too many connections for user "user"
How should I handle this error with Python and SQLAlchemy in the Flask app?
I have tried poolclass=NullPool and calling engine.dispose(),
but neither of them worked.
redshift_db = create_engine(db_url)
for id in id_list:
    data_frame = pd.read_sql_query(sql_string,
                                   redshift_db,
                                   params={'id': id})

The database is not allowing multiple concurrent connections from the same API on the same table.
You should use a single query instead:
id_list_str = ', '.join(str(id) for id in id_list)
sql_string = "select id, x, y, z from <db_table> where id in ({})".format(id_list_str)
data_frame = pd.read_sql_query(sql_string, redshift_db)
For more, you can read about this in the AWS documentation.
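To make this concrete, here is a minimal end-to-end sketch of the single-query approach. It assumes db_url and id_list are already defined, the ids are trusted numeric values, and my_table stands in for <db_table>:

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

# db_url and id_list are assumed to exist; my_table stands in for the real table.
engine = create_engine(db_url, poolclass=NullPool)

# Build one IN (...) list instead of issuing one query (and one connection) per id.
id_list_str = ', '.join(str(i) for i in id_list)  # only safe for trusted numeric ids
sql_string = "select id, x, y, z from my_table where id in ({})".format(id_list_str)

data_frame = pd.read_sql_query(sql_string, engine)
engine.dispose()  # close any remaining pooled connections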

Related

Why would postgres views not be visible to PGadmin browser, Psycopg2?

I've created some views in my postgres database. I know they're there, because I can query them through the query tool in PGAdmin4 (and they persist across restarts of the machine hosting the database), but they are neither visible in the schema browser nor queryable through psycopg2.
For larger context, I'm trying to extract some text from a large collection of documents which are stored in a database. (The database is a copy of the data received from a third party, and fully normalized, etc.) I'd like to do my NLP nonsense in Python, while defining a lot of document categorizations through SQL views so the categorizations are consistent, persistent, and broadly shareable to my team.
Googling has not turned up anything relevant here, so I'm wondering if there is a basic configuration issue that I've missed. (I am much more experienced with SQLServer than with postgres.)
Example:
[Assume I'm connected to database DB, schema SC, which has tables T1, T2, T3.]
-- in PGAdmin4 window
CREATE VIEW v_my_view AS
SELECT T1.field1, T2.field2
FROM T1
JOIN T2
  ON T1.field3 = T2.field3;
After restarting the host machine (so definitely a new PGAdmin session), the following works:
-- in pgadmin4 window
SELECT *
FROM v_my_view
-- 123456 results returned
...but even though that works, in the pgadmin4 browser panel, the 'views' folder is empty (right underneath the tables folder that proudly shows T1 and T2).
Within psycopg2:
import psycopg2
import pandas as pd

sqluser = 'me'
sqlpwd = 'secret'
dbname = 'DB'
schema_name = 'SC'
pghost = 'localhost'

def q(query):
    cnxn = psycopg2.connect(dbname=dbname, user=sqluser, password=sqlpwd, host=pghost)
    cursor = cnxn.cursor()
    cursor.execute('SET search_path to ' + schema_name)
    return pd.read_sql_query(query, cnxn)
view_query = """select *
from v_my_view
limit 100;"""
table_query = """select *
from SC.T1
limit 100;"""
# This works
print(f"Result: {q(table_query)}")
# This does not; error is: relation 'v_my_view' does not exist
# (Same result if view is prefixed with schema name)
# print(f"Result: {q(view_query)}")
Software versions:
pgadmin 4.23
postgres: I'm connected to 10.13 (Ubuntu 10.13-1-pgdg18.04+1), though 12 is also installed.
psycopg2: 2.8.5
Turns out this was a noob mistake. Views are created in the first schema of the search path (which can be checked by executing show search_path; in my case it was set to "$user", public despite my attempt to set it to the appropriate schema name). So the views were being created in a different schema from the one I was working in/where the tables were defined.
Created views are all visible in the left-hand browser once I look under the correct schema.
The following modification to the psycopg2 code returns the expected results:
import psycopg2
import pandas as pd

sqluser = 'me'
sqlpwd = 'secret'
dbname = 'DB'
schema_name = 'SC'
pghost = 'localhost'

def q(query):
    cnxn = psycopg2.connect(dbname=dbname, user=sqluser, password=sqlpwd, host=pghost)
    cursor = cnxn.cursor()
    cursor.execute('SET search_path to ' + schema_name)
    return pd.read_sql_query(query, cnxn)
# NOTE I am explicitly indicating the 'public' schema here
view_query = """select *
from public.v_my_view
limit 100;"""
table_query = """select *
from SC.T1
limit 100;"""
# This works
print(f"Result: {q(table_query)}")
# This works too once I specify the right schema:
print(f"Result: {q(view_query)}")
Try refreshing the objects from the PGAdmin toolbar. This should refresh the browser tree and show the view.

Querying a PostgreSQL database from Snowflake

PostgreSQL offers a way to query a remote database through dblink.
Similarly (sort-of), Exasol provides a way to connect to a remote Postgres database via the following syntax:
CREATE CONNECTION JDBC_PG
    TO 'jdbc:postgresql://...'
    IDENTIFIED BY '...';

SELECT * FROM (
    IMPORT FROM JDBC AT JDBC_PG
    STATEMENT 'SELECT * FROM MY_POSTGRES_TABLE;'
)

-- one can even write direct joins such as
SELECT
    t.COLUMN,
    r.other_column
FROM MY_EXASOL_TABLE t
LEFT JOIN (
    IMPORT FROM JDBC AT JDBC_PG
    STATEMENT 'SELECT key, other_column FROM MY_POSTGRES_TABLE'
) r ON r.key = t.KEY
This is very convenient for importing data from PostgreSQL directly into Exasol without having to use a temporary file (csv, pg_dump...).
Is it possible to achieve the same thing from Snowflake (querying a remote PostgreSQL database from Snowflake with a direct live connection)? I couldn't find any mention of it in the documentation.
Have you looked into using external functions? It's not exactly what you're looking for (Snowflake doesn't have that capability yet), but it can serve as a workaround in some use cases. For instance, you could create a Python function on AWS Lambda that queries PostgreSQL for small amounts of data (due to Lambda limits), or have it trigger a PostgreSQL process that dumps to S3 and then triggers Snowpipe for the bulk-import use case.
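For illustration only, here is a minimal sketch of what such a Lambda-backed external function might look like. Everything in it is an assumption: psycopg2 provided via a Lambda layer, a hypothetical table my_postgres_table with placeholder credentials, and the {"data": [[row_number, arg, ...], ...]} request/response envelope that Snowflake external functions use:

import json
import psycopg2  # assumed to be available through a Lambda layer

def handler(event, context):
    # Snowflake POSTs {"data": [[row_number, key], ...]} through API Gateway.
    rows = json.loads(event["body"])["data"]
    out = []
    conn = psycopg2.connect(host="my-postgres-host", dbname="mydb",
                            user="me", password="secret")  # hypothetical credentials
    with conn, conn.cursor() as cur:
        for row_number, key in rows:
            cur.execute("SELECT other_column FROM my_postgres_table WHERE key = %s", (key,))
            result = cur.fetchone()
            out.append([row_number, result[0] if result else None])
    conn.close()
    # The response must echo each row number in the same "data" envelope.
    return {"statusCode": 200, "body": json.dumps({"data": out})}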

lo_export does not exist in PostgreSQL 9.3

I am logged in as the postgres user and am trying to export a blob from a PostgreSQL 9.3 database using this query:
select lo_export(imgs.rast, '/tmp/img.tif') from imgs where rid = 1;
but I get this error:
ERROR: function lo_export(bytea, unknown) does not exist
Do I need to install the lo_export function? If so, how?
You're trying to use a function that exports a large object, which is something different from a blob (a bytea column in Postgres).
There's no built-in function to export a bytea value to a file. You just do something like this in your client program (Python example):
import tempfile

# conn is an open psycopg2 connection
cursor = conn.cursor()
cursor.execute("select rast from imgs where rid=%(rid)s", {"rid": 1})
result = cursor.fetchone()
with tempfile.NamedTemporaryFile(mode="wb", suffix=".tif", delete=False) as f:
    f.write(result[0])
    result_filename = f.name
Please remember that this reads the whole value into memory multiple times, so if the data can be fairly large you may prefer to read it in chunks using substring(rast from %(chunk_start)s for %(chunk_size)s), and to make sure the column is stored externally using SET STORAGE EXTERNAL.
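A hedged sketch of that chunked variant, under the same assumptions (conn is an open psycopg2 connection, and the table and column names follow the question):

import tempfile

chunk_size = 1024 * 1024  # 1 MiB per round trip
chunk_start = 1           # substring() positions are 1-based
cursor = conn.cursor()

with tempfile.NamedTemporaryFile(mode="wb", suffix=".tif", delete=False) as f:
    while True:
        cursor.execute(
            "select substring(rast from %(start)s for %(size)s) from imgs where rid = %(rid)s",
            {"start": chunk_start, "size": chunk_size, "rid": 1})
        chunk = cursor.fetchone()[0]
        if not chunk:
            break
        f.write(chunk)
        chunk_start += chunk_size
    result_filename = f.name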

Query for PostgreSQL Server status variable?

In my project I want to collect the PostgreSQL server's performance counters. For that I want a query to collect them from the database. I am new to PostgreSQL. While searching, I found something like:
SELECT * FROM pg_stat_database
but when I use it in Java in the following manner (here Map_PostgreSQL is a HashMap):
while (rs.next())
{
    Counter_Name.add(rs.getString(1).trim());
    Map_PostgreSQL.put(rs.getString(1).trim(), rs.getString(2));
}
I got output like:
{12024=template0, 1=template1, 12029=postgres}
What is the actual query to collect its status variables, like "SHOW GLOBAL STATUS" in MySQL?
First, try running the SQL query in your PostgreSQL shell to see exactly which data are returned and how they are organised in rows and columns.
You'll see that the hashmap keys are the datid values (database ids) and the values are your database names.
I think you assumed that the statistics were structured in "rows" whereas they are structured in columns.
Don't forget: PostgreSQL is a database server, which means it can handle several databases (and in fact it already has several, because some are created by default, such as the 'postgres' database itself, which Postgres (the server) uses internally, or 'template0').
By launching:
SELECT * FROM pg_stat_database;
you're asking the server to return statistics for every database (provided you're allowed to get them).
If you only want stats for your own database, do:
SELECT * FROM pg_stat_database WHERE datname='your_database_name';
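If the goal is a single name-to-value map of counters for one database, the column-wise reading described above looks roughly like this (sketched in Python with psycopg2 for brevity; with JDBC the equivalent is iterating the ResultSetMetaData column names):

import psycopg2

# Hypothetical connection details; adjust to your server.
conn = psycopg2.connect(dbname='your_database_name', user='me',
                        password='secret', host='localhost')
cur = conn.cursor()
cur.execute("SELECT * FROM pg_stat_database WHERE datname = %s",
            ('your_database_name',))
row = cur.fetchone()

# The counters are the columns, so pair each column name with its value.
status = {desc[0]: value for desc, value in zip(cur.description, row)}
print(status)  # e.g. {'datid': ..., 'datname': ..., 'xact_commit': ..., ...}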
Hope this helped

Query from QSYS2.SysTables returns error "Token; void"

I am trying to read data from an AS400 DB using Excel, VBA and an ODBC driver. The connection is successful, but none of the queries retrieve data from the DB. For example, this select query is not working:
select * from QSYS2.SysTables;
The Client gets the following error message:
[IBM] [System i Access ODBC Driver] [DB2 for i5/OS] SQL0104 - Token; void. Valid tokens: <END Instruction>.
What is wrong with my query?
Edit: I am trying to read data from the AS400 only, not from DB2. I want to read the table names from SysTables (a system table).
Remove the statement termination character (;) when executing a single statement.
For example, this retrieves rows:
Select * from Tablename
(note there is no trailing semicolon). If that doesn't work, try checking Microsoft's documentation on querying through ODBC from Excel/VBA; it differs from standard SQL queries in places.
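To illustrate the fix outside VBA, here is a hedged sketch using pyodbc with a hypothetical DSN for the IBM i Access ODBC driver; the ADO call from VBA is analogous, the key point being the missing trailing semicolon:

import pyodbc

# Hypothetical DSN and credentials; adjust to your setup.
conn = pyodbc.connect("DSN=MY_AS400;UID=user;PWD=secret")
cursor = conn.cursor()

# Note: no trailing semicolon, since the driver rejects the statement terminator.
cursor.execute("SELECT TABLE_NAME FROM QSYS2.SYSTABLES")
for (table_name,) in cursor.fetchall():
    print(table_name)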