How to remotely execute a Postgres function using the PySpark JDBC connector?

I want to execute the following query on a remote Postgres server from a PySpark application using the JDBC connector:
SELECT id, postgres_function(some_column) FROM my_database GROUP BY id
The problem is that I can't execute this kind of query in PySpark using spark.sql(QUERY), because postgres_function is not an ANSI SQL function supported by Spark 2.0.0.
I'm using Spark 2.0.1 and Postgres 9.4.

The only option you have is to use a subquery:
table = """
(SELECT id, postgres_function(some_column) FROM my_database GROUP BY id) AS t
"""
sqlContext.read.jdbc(url=url, table=table)
but this will execute the whole query, including the aggregation, on the database side and fetch only the result.
In general it doesn't matter whether a function is an ANSI SQL function or has an equivalent in the source database: all functions called through spark.sql are executed in Spark after the data is fetched.
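A complete, minimal sketch of this pushdown pattern (the hostname, port, and credentials below are placeholders):
url = "jdbc:postgresql://dbhost:5432/my_database"  # placeholder host and database
properties = {"user": "dbuser", "password": "secret",
              "driver": "org.postgresql.Driver"}

# Reusing the subquery string defined above: everything inside it, including
# postgres_function and the GROUP BY, executes on the Postgres server, and
# Spark only fetches the aggregated result.
df = sqlContext.read.jdbc(url=url, table=table, properties=properties)
df.show()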

Related

Exasol not exporting in parallel to PostgreSQL

We have a connection in Exasol (v7.0.18) to PostgreSQL (v14) created like this:
create or replace connection POSTGRES_DB to
'jdbc:postgresql://hostname:5432/my_db?useCursorFetch=true&defaultFetchSize=2147483648'
user 'abc'
identified by <>;
I am running an export statement using this connection like this:
EXPORT MY_SCHEMA.TEST_TABLE
INTO JDBC AT POSTGRES_DB
TABLE pg_schema.test_table
truncate;
This works without any error.
The issue is that it runs only one INSERT statement on the PostgreSQL side at a time; I am expecting multiple inserts running concurrently in PostgreSQL.
This documentation page says: "Importing from Exasol databases is always parallelized. For Exasol, loading tables directly is significantly faster than using the STATEMENT option."
How can I make the export statement do parallel insert into PostgreSQL?

SQLCODE=-104, SQLSTATE=42601, SQLERRMC=table;reorg ;JOIN <joined_table>

When running this query on Db2 in DBeaver:
reorg table departments
I got this error (only on the external channel):
DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=table;reorg ;JOIN <joined_table>, DRIVER=4.19.49
What does this error mean?
How can I fix it?
Appreciate any help.
Try:
call sysproc.admin_cmd('reorg table db2inst1.departments')
as you are using DBeaver, which is a JDBC application.
If you do not qualify the table name (for example, with db2inst1), then Db2 assumes the qualifier (schema name) is the same as the user ID you used when connecting to the database.
DBeaver runs SQL statements, but it cannot directly run Db2 commands. Instead, any JDBC app can run Db2 commands indirectly via a stored procedure that you CALL; the CALL itself is an SQL statement.
REORG TABLE is a command, not an SQL statement, so it needs to be run via the ADMIN_CMD stored procedure, or it can be run from the operating system command line (or the db2 CLP) after connecting.
So if you have db2cmd.exe on MS Windows, or bash on Linux/Unix, you can connect to the database and run commands via the db2 command.
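For reference, the same ADMIN_CMD call can be issued from any client; here is a minimal sketch using Python's ibm_db driver (the database name, hostname, port, and credentials are hypothetical):
import ibm_db

# Hypothetical connection string; adjust to your environment.
conn = ibm_db.connect(
    "DATABASE=sample;HOSTNAME=dbhost;PORT=50000;PROTOCOL=TCPIP;UID=db2inst1;PWD=secret",
    "", "")

# REORG TABLE is a command, not SQL, so it is wrapped in the ADMIN_CMD procedure.
ibm_db.exec_immediate(conn, "call sysproc.admin_cmd('reorg table db2inst1.departments')")
ibm_db.close(conn)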

Execute queries on redshift using Pyspark

Can anyone suggest a method for executing queries on Redshift tables using PySpark?
Using PySpark DataFrames is one option for reading from and writing to Redshift tables, but executing queries through DataFrames is restricted to the preactions/postactions options, sketched below.
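For illustration, with the databricks spark-redshift connector a query can be attached to a write via preactions; the URL, credentials, table names, and S3 temp dir below are hypothetical:
# df is an existing DataFrame to be written to Redshift
df.write \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:redshift://host:5439/db?user=u&password=p") \
    .option("dbtable", "target_table") \
    .option("tempdir", "s3a://my-bucket/tmp") \
    .option("preactions", "delete from target_table") \
    .mode("append") \
    .save()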
If you need to execute multiple queries, one way is to use the psycopg2 module.
First, you need to install the psycopg2 module on your server:
sudo python -m pip install psycopg2
Then open the pyspark shell and execute the following:
import psycopg2

# Connection details; all values here are placeholders.
conn = {'db_name': {'hostname': 'redshift_host_url', 'database': 'database_on_redshift',
                    'username': 'redshift_username', 'password': 'p', 'port': your_port}}
db = 'db_name'

con = psycopg2.connect(host=conn[db]['hostname'], port=conn[db]['port'],
                       user=conn[db]['username'], password=conn[db]['password'],
                       dbname=conn[db]['database'])

query = "insert into sample_table select * from table1"
cur = con.cursor()
cur.execute(query)  # execute() returns None; don't assign its result
con.commit()
con.close()
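If a query returns rows, read them from the cursor before closing the connection, for example:
cur = con.cursor()
cur.execute("select id from sample_table limit 10")
for row in cur.fetchall():
    print(row)
cur.close()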
You can also refer to the psycopg2 usage docs: https://www.psycopg.org/docs/usage.html

Select form using the Apache Zeppelin JDBC (for Postgres) interpreter for a string field isn't working

I'm using the JDBC interpreter for Apache Zeppelin with a select form for a string field.
This is the implementation I've found online (https://stackoverflow.com/a/38788612), but for the sql interpreter:
select * from table where textfield="${choices=choice1,choice1|choice2}"
The postgres interpreter documentation says that it is going to be deprecated and to use the jdbc interpreter instead. But this same query doesn't work with the jdbc interpreter. The error message I get is:
org.postgresql.util.PSQLException: ERROR: column "choice1" doesn't exist
I also tried
select * from table where textfield=${choices="choice1","choice1"|"choice2"}
and
select * from table where textfield=${choices=choice1,choice1|choice2}
which also didn't work - it throws the same error. The same thing works as expected for integers because there is no problem with quotes there. Any idea what I'm doing wrong?
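One thing worth checking (an observation, not confirmed in this thread): in PostgreSQL, double quotes delimit identifiers such as column names, while string literals use single quotes, which is exactly why the error complains that column "choice1" doesn't exist. Wrapping the form placeholder in single quotes, i.e. textfield = '${choices=choice1,choice1|choice2}', may be worth trying.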

Create multiple databases using scala slick

Is it possible to create multiple databases with a single query using Slick?
sqlu"""CREATE DATABASE if not exists students;CREATE DATABASE if not exists professors"""
I'm running the above Slick query but getting a MySQLSyntaxErrorException:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'CREATE DATABASE if NOT EXISTS professors' at line 1
I've found the answer to my own question:
val createDbs = DBIO.seq(
  sqlu"""CREATE DATABASE if not exists students""",
  sqlu"""CREATE DATABASE if not exists professors""")
db.run(createDbs)
DBIO.seq runs the two statements sequentially as separate JDBC calls, which avoids sending multiple statements in one call (the MySQL JDBC driver rejects that by default unless allowMultiQueries=true is set on the connection).