Play 2.4 / Slick 3.1: database evolution connection leak

I have a default setup of Play 2.4 and Slick 3.1 and I'm using database evolutions (my db is PostgreSQL). It seems that evolutions create a connection leak.
I'm connecting to two databases (prod and test):
slick.dbs.default.driver = "slick.driver.PostgresDriver$"
slick.dbs.default.db.driver = "org.postgresql.Driver"
slick.dbs.default.db.url = "jdbc:postgresql://localhost:5432/et"
slick.dbs.default.db.user = "postgres"
slick.dbs.default.db.password = "postgres"
slick.dbs.test.driver = "slick.driver.PostgresDriver$"
slick.dbs.test.db.driver = "org.postgresql.Driver"
slick.dbs.test.db.url = "jdbc:postgresql://localhost:5432/et_test"
slick.dbs.test.db.user = "postgres"
slick.dbs.test.db.password = "postgres"
This is the connection count before the application is started:
et=# select count(*) from pg_stat_activity;
count
-------
3
(1 row)
This is the connection count after the application is started:
et=# select count(*) from pg_stat_activity;
count
-------
45
(1 row)
It would seem each database allocates 21 connections - 42 in total, fair enough.
This is the connection count after "apply evolutions" is clicked:
et=# select count(*) from pg_stat_activity;
count
-------
87
(1 row)
It seems evolutions allocate 21 more connections per database, so another 42.
After 10 minutes of the application running idle:
et=# select count(*) from pg_stat_activity;
count
-------
87
(1 row)
This looks like an obvious connection leak created by evolutions, doesn't it? Correct me if I'm wrong. Any ideas on how to fix this?
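To see which pool is holding which connections, one way (a diagnostic sketch; the state and application_name columns exist in any reasonably recent PostgreSQL) is to group pg_stat_activity by database and application name:
select datname, usename, application_name, state, count(*)
from pg_stat_activity
group by datname, usename, application_name, state
order by count(*) desc;
That at least shows whether the extra 42 connections belong to a second, separate pool created for evolutions.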

Related

Loading data from an Oracle table using Spark JDBC is extremely slow

I am trying to read 500 million records from a table using Spark JDBC and then perform a join on those tables.
When I execute the SQL from SQL Developer it takes 25 minutes.
But when I load it using Spark JDBC it takes forever; the last time it ran for 18 hours and then I cancelled it.
I am using AWS Glue for this.
This is how I read the table using Spark JDBC:
df = (glueContext.read.format("jdbc")
    .option("url", "jdbc:oracle:thin://abcd:1521/abcd.com")
    .option("user", "USER_PROD")
    .option("password", "ffg#Prod")
    .option("numPartitions", 15)
    .option("partitionColumn", "OUTSTANDING_ACTIONS")
    .option("lowerBound", 0)
    .option("upperBound", 1000)
    .option("dbtable", "FSP.CUSTOMER_CASE")
    .option("driver", "oracle.jdbc.OracleDriver")
    .load())
# createOrReplaceTempView returns None, so there is nothing useful to assign
df.createOrReplaceTempView("customer_caseOnpremView")
I have used OUTSTANDING_ACTIONS as the partitionColumn, and here is the data distribution
(column 1 is the partitionColumn value, column 2 is its number of occurrences):
1 8988894
0 4227894
5 2264259
9 2263534
8 2262628
2 2261704
3 2260580
4 2260335
7 2259747
6 2257970
This is my join, where loading the customer_caseOnpremView table takes more than 18 hours while the other two tables take 1 minute:
ThirdQueryResuletOnprem = spark.sql("""
    SELECT CP.CLIENT_ID, COUNT(1) NoofCases
    FROM customer_caseOnpremView CC
    JOIN groupViewOnpremView FG ON FG.ID = CC.OWNER_ID
    JOIN client_platformViewOnpremView CP
        ON CP.CLIENT_ID = SUBSTR(FG.PATH, 2, INSTR(FG.PATH, '/') + INSTR(SUBSTR(FG.PATH, 1 + INSTR(FG.PATH, '/')), '/') - 2)
    WHERE FG.STATUS = 'ACTIVE' AND FG.TYPE = 'CLIENT'
    GROUP BY CP.CLIENT_ID
""")
Please suggest how to make it faster.
I have tried the number of workers from 10 to 40.
I have changed the executor type from Standard to GP2, the biggest one, but it had no impact on the job.
As your query has a lot of filters, you don't even need to bring in the whole dataset and then filter it; you can push the query down to the database engine, which will filter the data and return only the result to the Glue job.
This can be done as explained in https://stackoverflow.com/a/54375010/4326922; below is an example for MySQL which can be applied to Oracle too with a few changes.
query= "(select ab.id,ab.name,ab.date1,bb.tStartDate from test.test12 ab join test.test34 bb on ab.id=bb.id where ab.date1>'" + args['start_date'] + "') as testresult"
datasource0 = (spark.read.format("jdbc")
    .option("url", "jdbc:mysql://host.test.us-east-2.rds.amazonaws.com:3306/test")
    .option("driver", "com.mysql.jdbc.Driver")
    .option("dbtable", query)
    .option("user", "test")
    .option("password", "Password1234")
    .load())
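For Oracle the same idea applies: pass a subquery as the dbtable option so that only the columns (and rows) you actually need cross the wire. A minimal sketch, assuming only OWNER_ID from FSP.CUSTOMER_CASE is needed for the join above (the alias is arbitrary):
(select CC.OWNER_ID
 from FSP.CUSTOMER_CASE CC) CUSTOMER_CASE
Separately, note that with lowerBound=0 and upperBound=1000 but actual OUTSTANDING_ACTIONS values of 0-9, all rows fall into the first generated partition, so the read is effectively single-threaded; the bounds should match the real value range of the partition column.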

Postgres FTS Priority Field

I am using Postgres FTS to search a field in a table. The only issue is that, for some reason, the behaviour below is happening.
store=# select name from service where to_tsvector(name) @@ to_tsquery('find:*') = true;
name
--------------
Finding Nora
(1 row)
store=# select name from service where to_tsvector(name) @@ to_tsquery('findi:*') = true;
name
------
(0 rows)
How come, when searching using the query findi:*, the result doesn't show?
In my PG 12.2 with default text search configuration I have:
# select to_tsvector('Finding Nora');
to_tsvector
-------------------
'find':1 'nora':2
(1 row)
# select to_tsquery('findi:*');
to_tsquery
------------
'findi':*
(1 row)
I understand that because the default dictionary stems 'Finding' to the lexeme 'find', there is no lexeme starting with 'findi', so the query does not find any match.
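If the goal is for the prefix search to match the original word form rather than the stem, one option (a sketch, using the built-in 'simple' configuration, which lowercases but does not stem) is:
select name
from service
where to_tsvector('simple', name) @@ to_tsquery('simple', 'findi:*');
With 'simple', 'Finding Nora' is indexed as 'finding':1 'nora':2, so the prefix findi:* matches.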

Postgresql pg_profile getting error while creating snapshot

I am referring to https://github.com/zubkov-andrei/pg_profile for generating an AWR-like report.
The steps I have followed are as below:
1) Enabled the below parameters inside postgresql.conf (located in D:\Program Files\PostgreSQL\9.6\data):
track_activities = on
track_counts = on
track_io_timing = on
track_functions = on
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 1000
pg_stat_statements.track = 'top'
pg_stat_statements.save = off
pg_profile.topn = 20
pg_profile.retention = 7
2) Manually copied all the files beginning with pg_profile to D:\Program Files\PostgreSQL\9.6\share\extension
3) From the pgAdmin 4 console, executed the below commands successfully:
CREATE EXTENSION dblink;
CREATE EXTENSION pg_stat_statements;
CREATE EXTENSION pg_profile;
4) To see which node is already present, I executed SELECT * from node_show(); which resulted in:
node_name: local
connstr: dbname=postgres port=5432
enabled: true
5) To create a snapshot, I executed SELECT * from snapshot('local'); but I am getting the below error:
ERROR: could not establish connection
DETAIL: fe_sendauth: no password supplied
CONTEXT: SQL statement "SELECT dblink_connect('node_connection',node_connstr)"
PL/pgSQL function snapshot(integer) line 38 at PERFORM
PL/pgSQL function snapshot(name) line 9 at RETURN
SQL state: 08001
Once I am able to generate multiple snapshots, I guess I should be able to generate a report.
Just use SELECT * from snapshot().
Look at the code of the function; it calls the other one with the node as a parameter.
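If the named node still has to work, the fe_sendauth: no password supplied detail means the connection string stored for the node (dbname=postgres port=5432) carries no credentials, so dblink cannot authenticate. A quick way to confirm (a sketch; the user and password values are placeholders) is to open a dblink connection manually with a password included:
SELECT dblink_connect('test_conn', 'dbname=postgres port=5432 user=postgres password=your_password');
SELECT dblink_disconnect('test_conn');
If that succeeds, the node's connection string (or the server's authentication setup) needs the same credentials.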

pg_locks table has a lot of simple select statements

We are connecting to our PostgreSQL (RDS) server from our Django backend as well as Lambda. Sometimes the Django backend queries time out, and I run the following query to see the locks:
SELECT
    pg_stat_activity.client_addr,
    pg_stat_activity.query
FROM pg_class
JOIN pg_locks ON pg_locks.relation = pg_class.oid
JOIN pg_stat_activity ON pg_locks.pid = pg_stat_activity.pid
WHERE
    pg_locks.granted = 't'
    AND pg_class.relname = 'accounts_user';
This gives me 30 rows of simple SELECT queries executed from Lambda, like this:
SELECT first_name, picture, username FROM accounts_user WHERE id = $1
Why does this query hold a lock? Should I be worried?
I'm using the pg8000 library to connect from Lambda:
with pgsql.cursor() as cursor:
    cursor.execute(
        """
        SELECT first_name, picture, username
        FROM accounts_user
        WHERE id = %s
        """,
        (author_user_id,),
    )
    row = cursor.fetchone()
    # use the row ..
I opened an issue on GitHub; maybe it's because I'm using the library wrong: https://github.com/tlocke/pg8000/issues/16
You can also try to reuse the database connection, see https://docs.djangoproject.com/en/2.2/ref/settings/#conn-max-age
DATABASES = {
    'default': {
        ...
        'CONN_MAX_AGE': 600,  # reuse database connection
    }
}
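As for the locks themselves: a plain SELECT only takes an ACCESS SHARE lock on the tables it reads, which conflicts with nothing except ACCESS EXCLUSIVE (taken by things like DROP TABLE or some ALTER TABLE forms), so those rows are not what blocks other queries. To confirm, you can include the lock mode and session state in the query above (a sketch):
SELECT
    pg_stat_activity.client_addr,
    pg_stat_activity.state,
    pg_locks.mode,
    pg_stat_activity.query
FROM pg_class
JOIN pg_locks ON pg_locks.relation = pg_class.oid
JOIN pg_stat_activity ON pg_locks.pid = pg_stat_activity.pid
WHERE pg_class.relname = 'accounts_user';
If the same rows stay there long after the SELECTs have finished, the corresponding transactions are still open (state 'idle in transaction'), which is a connection-handling problem rather than a locking one.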

Firebird: Query execution time in iSQL

I would like to get the query execution time in iSQL.
For instance:
SELECT * FROM students;
How do I get the query execution time?
Use SET STATS:
SQL> SET STATS;
SQL> SELECT * FROM RDB$DATABASE;
... query output removed ....
Current memory = 34490656
Delta memory = 105360
Max memory = 34612544
Elapsed time= 0.59 sec
Buffers = 2048
Reads = 17
Writes 0
Fetches = 270
SQL>
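SET STATS is a toggle in isql; to switch the per-statement statistics off again you can run it once more, or explicitly (a sketch):
SQL> SET STATS OFF;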