Is it possible to maintain a persistent SQL connection to a postgres database? - python-polars

With Pandas, we can create a persistent connection and use it, for example, to create temporary tables against which we can later join:
import pandas as pd
import sqlalchemy as sa

engine = sa.create_engine("postgresql://me@server:port/my_db")
conn = engine.connect()

# create a temporary table on the persistent connection
sql = """
CREATE TEMPORARY TABLE my_table (
    name varchar(50),
    age SMALLINT,
    birth_date DATE
);
INSERT INTO my_table VALUES ('me', 38, '1980-01-01');
"""
conn.execute(sa.text(sql))

# now perform a SELECT using the same persistent connection
df = pd.read_sql_table("my_table", conn)
However, polars appears to take a connection string rather than a connection object, and presumably creates a new connection with each query. Is there a way to use polars with persistent connections?

Polars uses ConnectorX in read_sql; read_sql is really just a thin, convenient wrapper around pl.from_arrow(cx.read_sql(connection_uri, sql, ...)). Polars uses Arrow as the underlying data structure, meaning that pl.from_arrow is (mostly) zero-copy and efficient. Thus any Python SQL query package that returns an Arrow table will work. One example is TurbODBC, but there may be more that return an Arrow table and maintain a persistent connection.
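For instance, a persistent TurbODBC connection can be paired with pl.from_arrow along these lines (a minimal sketch, assuming a PostgreSQL ODBC driver configured under a DSN and turbodbc built with Arrow support; the DSN name is illustrative):
import polars as pl
from turbodbc import connect

# a persistent ODBC connection; "my_postgres_dsn" is an assumed DSN name
conn = connect(dsn="my_postgres_dsn")
cursor = conn.cursor()

# create and fill a temporary table on that connection
cursor.execute("CREATE TEMPORARY TABLE my_table (name varchar(50), age SMALLINT, birth_date DATE)")
cursor.execute("INSERT INTO my_table VALUES ('me', 38, '1980-01-01')")

# query it on the same connection and hand the Arrow result to polars
cursor.execute("SELECT * FROM my_table")
df = pl.from_arrow(cursor.fetchallarrow())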

Related

Redshift Temp Table Identity column

My stored procedure includes the following code:
CREATE TEMPORARY TABLE #lala
(
    idx int IDENTITY(1,1),
    tablename nvarchar(128)
);

INSERT INTO #lala (tablename)
SELECT LEFT(tablename, LEN(tablename) - 3)
FROM SVV_EXTERNAL_TABLES
WHERE schemaname = 'spectrum'
  AND tablename LIKE '%_v2';
I'm then calling it like this:
BEGIN;
CALL myschema.make_union_views('spectrum_views','spectrum','mycursor');
FETCH ALL FROM mycursor;
COMMIT;
At first it was running successfully.
Then it began falling over, and while debugging I listed the contents of #lala and saw that the idx column is not sequential.
I am confused as to how this has come about.
I hope someone can shed some light on what might be happening.
This is by design. Redshift is a cluster and as such communications between parts of the cluster are expensive. Redshift ensures the uniqueness of identity columns but NOT sequentiality. Per the CREATE TABLE documentation (https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html):
When you load the table using an INSERT INTO [tablename] SELECT * FROM
or COPY statement, the data is loaded in parallel and distributed to
the node slices. To be sure that the identity values are unique,
Amazon Redshift skips a number of values when creating the identity
values.
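If a gapless sequence is actually needed, one workaround is to derive it with ROW_NUMBER() once the rows are loaded, rather than relying on the identity values themselves; a hedged sketch, using psycopg2 as one possible driver (connection details are placeholders, and the SELECT must run in the same session as the CALL because #lala is a temporary table):
import psycopg2

conn = psycopg2.connect("host=my-cluster.example.com dbname=my_db user=me password=***")
with conn.cursor() as cur:
    cur.execute("CALL myschema.make_union_views('spectrum_views', 'spectrum', 'mycursor');")
    # number the rows in identity order; the result is gapless even if idx has gaps
    cur.execute("""
        SELECT ROW_NUMBER() OVER (ORDER BY idx) AS seq, tablename
        FROM #lala;
    """)
    rows = cur.fetchall()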

How to describe tables in Redshift and ALTER them

I have created a Redshift cluster and created a database inside it.
My schema is new_schema.
I have created two tables inside it: table1 and table2.
My questions:
I want to list the data types of table1.
I need to change the data type of the description column, which is inside table1, from VARCHAR to TEXT.
I have tried to list the data types of table1 with the query below, but nothing is listed:
SELECT * FROM PG_TABLE_DEF WHERE schemaname = 'new_schema';
A few possibilities as to why you are not seeing the expected results. Most likely is that new_schema isn't in your search_path. PG_TABLE_DEF only returns info for tables in your search_path - see: https://docs.aws.amazon.com/redshift/latest/dg/r_PG_TABLE_DEF.html
Another possibility is that the tables have no data rows (no blocks assigned) and this can lead to incomplete info from some system tables.
Another possibility is that the tables were not committed by the creating session and are being checked from a different session. Since you say that you are creating a new db, this comes to mind.
Are the tables visible in svv_table_info?
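To rule out the first possibility, you can put new_schema on the search_path and query PG_TABLE_DEF again; a hedged sketch (psycopg2 shown as one possible driver, connection details are placeholders):
import psycopg2

conn = psycopg2.connect("host=my-cluster.example.com dbname=my_db user=me password=***")
with conn.cursor() as cur:
    cur.execute("SET search_path TO new_schema, public;")
    cur.execute("""
        SELECT "column", type
        FROM pg_table_def
        WHERE schemaname = 'new_schema' AND tablename = 'table1';
    """)
    print(cur.fetchall())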
Also, the premise of changing VARCHAR to TEXT is a bit off. From https://docs.aws.amazon.com/redshift/latest/dg/r_Character_types.html#r_Character_types-text-and-bpchar-types:
You can create an Amazon Redshift table with a TEXT column, but it is
converted to a VARCHAR(256) column that accepts variable-length values
with a maximum of 256 characters.
So the objective you are trying to achieve does not really apply in Redshift; TEXT is effectively just VARCHAR(256).

Can the foreign data wrapper fdw_postgres handle the GEOMETRY data type of PostGIS?

I am accessing data from a different DB via fdw_postgres. It works well:
CREATE FOREIGN TABLE fdw_table
(
    name TEXT,
    area double precision,
    use TEXT,
    geom GEOMETRY
)
SERVER foreign_db
OPTIONS (schema_name 'schema_A', table_name 'table_B');
However, when I query for the data_type of the fdw_table I get the following result:
name text
area double precision
use text
geom USER-DEFINED
Can fdw_postgres not handle the GEOMETRY data type of PostGIS? What does USER-DEFINED mean in this context?
From the documentation on the data_type column:
Data type of the column, if it is a built-in type, or ARRAY if it is
some array (in that case, see the view element_types), else
USER-DEFINED (in that case, the type is identified in udt_name and
associated columns).
So this is not specific to FDWs; you'd see the same definition for a physical table.
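For example, the concrete type behind USER-DEFINED is reported in udt_name; a hedged sketch (psycopg2 shown as one possible driver, connection details are placeholders):
import psycopg2

conn = psycopg2.connect("dbname=my_db user=me")
with conn.cursor() as cur:
    cur.execute("""
        SELECT column_name, data_type, udt_name
        FROM information_schema.columns
        WHERE table_name = 'fdw_table';
    """)
    print(cur.fetchall())  # geom comes back as data_type USER-DEFINED, udt_name 'geometry'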
postgres_fdw can handle custom datatypes just fine, but there is currently one caveat: if you query the foreign table with a WHERE condition involving a user-defined type, it will not push this condition to the foreign server.
In other words, if your WHERE clause only references built-in types, e.g.:
SELECT *
FROM fdw_table
WHERE name = $1
... then the WHERE clause will be sent to the foreign server, and only the matching rows will be retrieved. But when a user-defined type is involved, e.g.:
SELECT *
FROM fdw_table
WHERE geom = $1
... then the entire table is retrieved from the foreign server, and the filtering is performed locally.
Postgres 9.6 will resolve this, by allowing you to attach a list of extensions to your foreign server object.
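On 9.6 or later, that looks roughly like this (a hedged sketch; the server name follows the question, connection details are placeholders):
import psycopg2

conn = psycopg2.connect("dbname=my_db user=me")
with conn, conn.cursor() as cur:
    # postgres_fdw 9.6+: operators/functions from the listed extensions may be pushed down
    cur.execute("ALTER SERVER foreign_db OPTIONS (ADD extensions 'postgis');")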
Well, obviously you are going to need any non-standard types defined at both ends. Don't forget the FDW functionality is supposed to support a variety of different database platforms, so there isn't any magic way to import remote operations on a datatype. Actually, given that one end could be running on MS-Windows and the other on ARM-based Linux there's not even a sensible way of doing it just with PostgreSQL.

PostgreSQL join across 2 databases

I am new to PostgreSQL. I have 2 databases in PostgreSQL 9.0, db1 and db2, and I have read-only access to db2. I want to create a stored function for something that would otherwise be easily accomplished with a JOIN or a nested query, which PostgreSQL can't do across databases.
In db1, I have table1, where I can query for a set of foreign keys that I can use to search for records in table2 in db2, something like:
SELECT * FROM db2.table2 WHERE db2.table2.primary_key IN (
    SELECT db1.table1.foreign_key FROM db1.table1
    WHERE db1.table1.primary_key = 'whatever');
What is the best practice for doing this in Postgres? I can't use temporary tables in db2, and passing in the foreign keys as a parameter to a stored function running in db2 doesn't seem like a good solution.
Note: the keys are all VARCHAR(11)
You'll want to look into the dblink contrib module.
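A hedged sketch of what that can look like, run from db1 (the dblink module must be installed; column names other than the varchar(11) keys are illustrative):
import psycopg2

conn = psycopg2.connect("dbname=db1 user=me")
with conn.cursor() as cur:
    cur.execute("""
        SELECT t2.*
        FROM dblink('dbname=db2',
                    'SELECT primary_key, some_col FROM table2')
             AS t2(primary_key varchar(11), some_col text)
        WHERE t2.primary_key IN (
            SELECT foreign_key FROM table1 WHERE primary_key = 'whatever'
        );
    """)
    rows = cur.fetchall()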
As an aside, if you're familiar with C, there is also a cute piece of functionality called foreign data wrappers. It allows you to manipulate pretty much any source using plain SQL. Example with Twitter:
SELECT from_user, created_at, text FROM twitter WHERE q = '#postgresql';

Most straightforward way to add a row to an SQL Server table in ADO.NET without hardcoded SQL?

I am wondering what the best / most efficient / most common way is to add a row to an SQL Server table using C# and ADO.NET. I know of course that I could just create an SQL statement for that, but first, the destination table schema might vary, so I want to keep this flexible, and second, there are so many columns that I do not want to code and maintain this manually. So I currently use a SqlCommandBuilder that automatically creates the proper insert statement for me, together with a SqlDataAdapter, like this:
var dataAdapter = new SqlDataAdapter("select * from sometable", _databaseConnection);
new SqlCommandBuilder(dataAdapter);
var dataTable = new DataTable();
dataAdapter.Fill(dataTable);
// ... add row to dataTable, fill fields from some external file that
// ... includes column names as well,
// ... add some more field values not from the file, etc. ...
dataAdapter.Update(dataTable);
This seems pretty inefficient though to first grab all the records from the table even though I do not need them for anything (especially considering that there might even already be a million records in there). Using some select statement like select * from sometable where 1=2 would work, but it does not seem like a very clean approach. I imagine there is some different solution for this that I am just not aware of.
Thanks,
Timo
I think the best way to insert rows is by using Stored Procedures through the ADO.NET command object.
If you are inserting massive amounts of data and are using SQL Server 2008, you can pass DataTable objects to a stored procedure by using User-Defined Table Types.
In SQL:
CREATE TYPE SAMPLE_TABLE_TYPE AS TABLE
(
    field1 VARCHAR(255),
    field2 VARCHAR(255)
);

CREATE PROCEDURE insert_data
    @data SAMPLE_TABLE_TYPE READONLY
AS
BEGIN
    INSERT INTO table1 (field1, field2)
    SELECT field1, field2 FROM @data;
END
In .NET:
DataTable myTable = new DataTable();
myTable.Columns.Add(new DataColumn("field1", typeof(string)));
myTable.Columns.Add(new DataColumn("field2", typeof(string)));

SqlCommand command = new SqlCommand("insert_data", conn);
command.CommandType = CommandType.StoredProcedure;
SqlParameter parameter = command.Parameters.AddWithValue("@data", myTable);
parameter.SqlDbType = SqlDbType.Structured;
parameter.TypeName = "dbo.SAMPLE_TABLE_TYPE";
command.ExecuteNonQuery();
If your data also contains updates, you can use the MERGE statement introduced in SQL Server 2008 to efficiently perform both inserts and updates in the same procedure.
However, if creating User-Defined Table Types and stored procedures is too much work and you need a completely dynamic solution, I would stick with what you have, with the recommendation of appending
WHERE 1 = 0
to your SQL text.
You can also use a "SELECT TOP(0) * FROM SOMETABLE;" query.