Interval Task leads to IDLE Connection Exhaustion in R2DBC - spring-data-r2dbc

I am using Reactor (Java) to run a periodic task against Postgres using R2DBC as follows:
Flux.interval(Duration.ofMillis(1000)).doOnNext(i -> {
    System.out.print("TIME HAS TICKED\n");
    Flux.range(0, 10).flatMap(j -> {
        return service.getJob(this.consumerQueueName, this.filter).then();
    }).subscribe();
}).subscribe();
After about 5 minutes it stops processing jobs, and when I check Postgres the connections are all idle:
select datname as database_name,
client_addr as client_address,
application_name,
backend_start,
state,
state_change
from pg_stat_activity;
integrity_service 10.0.73.1 r2dbc-postgresql 2020-09-18 04:11:07.786098 idle 2020-09-18 04:11:40.471893
integrity_service 10.0.73.1 r2dbc-postgresql 2020-09-18 04:11:07.785822 idle 2020-09-18 04:12:01.196558
integrity_service 10.0.73.1 r2dbc-postgresql 2020-09-18 04:11:07.785598 idle 2020-09-18 04:11:50.971738
integrity_service 10.0.73.1 r2dbc-postgresql 2020-09-18 04:11:07.785317 idle 2020-09-18 04:11:30.506207
integrity_service 10.0.73.1 r2dbc-postgresql 2020-09-18 04:11:07.665800 idle 2020-09-18 04:11:20.570714
How can I appropriately use R2DBC and DatabaseClient to periodically fetch data from a table without exhausting the connections like this?
// ConnectionFactory settings:
ConnectionFactories.get(
    ConnectionFactoryOptions.builder()
        .option(Option.valueOf("driver"), "pool")
        .option(Option.valueOf("protocol"), "postgresql")
        //.option(ConnectionFactoryOptions.DRIVER, "postgresql")
        .option(ConnectionFactoryOptions.HOST, "localhost")
        .option(ConnectionFactoryOptions.PORT, 5432) // optional, defaults to 5432
        .option(ConnectionFactoryOptions.USER, "db")
        .option(ConnectionFactoryOptions.DATABASE, "integrity_service")
        .option(MAX_SIZE, 5)
        .build());
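The pooling driver (io.r2dbc.pool) also exposes sizing and idle options that can be set through the same builder. A sketch of the same configuration with a couple of those options added; the constant names come from PoolingConnectionFactoryProvider, and the INITIAL_SIZE and MAX_IDLE_TIME values below are only illustrative, not a recommendation:

// Sketch only: same options as above, plus explicit pool sizing/idle settings (values illustrative).
ConnectionFactories.get(
    ConnectionFactoryOptions.builder()
        .option(Option.valueOf("driver"), "pool")
        .option(Option.valueOf("protocol"), "postgresql")
        .option(ConnectionFactoryOptions.HOST, "localhost")
        .option(ConnectionFactoryOptions.PORT, 5432)
        .option(ConnectionFactoryOptions.USER, "db")
        .option(ConnectionFactoryOptions.DATABASE, "integrity_service")
        .option(PoolingConnectionFactoryProvider.INITIAL_SIZE, 2)                      // connections opened eagerly
        .option(PoolingConnectionFactoryProvider.MAX_SIZE, 5)                          // hard cap on pooled connections
        .option(PoolingConnectionFactoryProvider.MAX_IDLE_TIME, Duration.ofMinutes(30)) // recycle idle connections
        .build());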
private final String fetchJobFormat =
" WITH cte AS ( SELECT id FROM %s WHERE chain_id='%s' and is_complete=%b ORDER BY id ASC LIMIT 1\n" +
" )\n" +
" UPDATE queue q\n" +
" SET timestamp = extract(epoch from now()),\n" +
" is_complete = TRUE\n" +
" FROM cte WHERE q.id = cte.id\n" +
" RETURNING q.id, q.chain_id,q.timestamp, q.is_complete, q.payload";
public Mono<Queue> getJob(String queue, String filter) {
    return databaseClient.execute(String.format(fetchJobFormat, queue, filter, false))
        .fetch().all()
        .flatMap((v) -> {
            System.out.println("retrieved result " + v.get("id").toString());
            Queue q = this.objectMapper.convertValue(v, Queue.class);
            return Mono.just(q);
        }).last();
}
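One pattern worth sketching here (not a verified fix) keeps the whole poll in a single pipeline and bounds the in-flight work to the pool size, so connections are requested only as fast as they can be returned, instead of nesting subscribe() calls. This assumes the same service.getJob(...) shown above; the concurrency value of 4 is arbitrary and simply stays below the MAX_SIZE of 5 configured earlier:

// Sketch: single pipeline, no nested subscribe(), bounded concurrency.
Flux.interval(Duration.ofSeconds(1))
    .onBackpressureDrop()                 // skip ticks while the previous batch is still running
    .concatMap(tick ->
        Flux.range(0, 10)
            // at most 4 getJob calls in flight, below the pool's MAX_SIZE of 5
            .flatMap(i -> service.getJob(this.consumerQueueName, this.filter), 4)
            .then(),                      // finish the batch before handling the next tick
        1)
    .subscribe();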

Related

Apache Airflow - Slow to parse SQL queries on AWS MWAA

I'm trying to build a DAG on AWS MWAA. The DAG will export data from Postgres (RDS) to S3, but it runs into an issue once MWAA tries to parse all the queries from my task. In total it will export 385 tables, but the DAG gets stuck in running mode and never starts my task.
Basically, this process will:
Load the table schema
Rename Some Columns
Export data to S3
Function
def export_to_s3(dag, conn, db, pg_hook, export_date, s3_bucket, schemas):
    tasks = []
    run_queries = []
    for schema, features in schemas.items():
        t = features.get("tables")
        if t:
            tables = t
        else:
            tables = helper.get_tables(pg_hook, schema).table_name.tolist()
        is_full_export = features.get("full")
        for table in tables:
            columns = helper.get_table_schema(
                pg_hook, table, schema
            ).column_name.tolist()
            masked_columns = helper.masking_pii(columns, pii_columns=PII_COLS)
            masked_columns_str = ",\n".join(masked_columns)
            if is_full_export:
                statement = f'select {masked_columns_str} from {db}.{schema}."{table}"'
            else:
                statement = f'select {masked_columns_str} from {db}.{schema}."{table}" order by random() limit 10000'
            s3_bucket_key = export_date + "_" + schema + "_" + table + ".csv"
            sql_export = f"""
            SELECT * from aws_s3.query_export_to_s3(
                '{statement}',
                aws_commons.create_s3_uri(
                    '{s3_bucket}',
                    '{s3_bucket_key}',
                    'ap-southeast-2'),
                options := 'FORMAT csv, DELIMITER $$|$$'
            )""".strip()
            run_queries.append(sql_export)
def get_table_schema(pg_hook, table_name, table_schema):
    """Gets the schema details of a given table in a given schema."""
    query = """
        SELECT column_name, data_type
        FROM information_schema.columns
        WHERE table_schema = '{0}'
        AND table_name = '{1}'
        order by ordinal_position
        """.format(table_schema, table_name)
    df_schema = pg_hook.get_pandas_df(query)
    return df_schema
def get_tables(pg_hook, schema):
    query = """
        select table_name from information_schema.tables
        where table_schema = '{}' and table_type = 'BASE TABLE' and table_name != '_sdc_rejected' """.format(schema)
    df_schema = pg_hook.get_pandas_df(query)
    return df_schema
Task
    task = PostgresOperator(
        sql=run_queries,
        postgres_conn_id=conn,
        task_id="export_to_s3",
        dag=dag,
        autocommit=True,
    )
    tasks.append(task)
    return tasks
Airflow list_dags output
DAGS
-------------------------------------------------------------------
mydag
-------------------------------------------------------------------
DagBag loading stats for /usr/local/airflow/dags
-------------------------------------------------------------------
Number of DAGs: 1
Total task number: 3
DagBag parsing time: 159.94030800000002
-----------------------------------------------------+--------------------+---------+----------
file | duration | dag_num | task_num
-----------------------------------------------------+--------------------+---------+----------
/mydag.py | 159.05215199999998 | 1 | 3
/ActivationPriorityCallList/CallList_Generator.py | 0.878734 | 0 | 0
/ActivationPriorityCallList/CallList_Preprocessor.py | 0.00744 | 0 | 0
/ActivationPriorityCallList/CallList_Emailer.py | 0.001154 | 0 | 0
/airflow_helperfunctions.py | 0.000828 | 0 | 0
-----------------------------------------------------+--------------------+---------+----------
Observation
If I enable only one table to be loaded in the task, it works well, but it fails if all tables are enabled.
The behavior is the same if I execute Airflow from Docker pointing at RDS.
The issue was solved when I changed these values on MWAA:
webserver.web_server_master_timeout
webserver.web_server_worker_timeout
The default value is 30; I changed it to 480.
Documentation link:
https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html
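For reference, these settings live in the [webserver] section of the Airflow configuration; on MWAA they are applied as environment configuration overrides rather than by editing airflow.cfg directly, but the equivalent values simply mirror the change described above:

[webserver]
web_server_master_timeout = 480
web_server_worker_timeout = 480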

AWS Redshift: FATAL: connection limit "500" exceeded for non-bootstrap users

Hope you're all okay.
We hit this limit quite often. We know there is no way to up the 500 limit of concurrent user connections in Redshift. We also know certain views (pg_user_info) provide info as to the user's actual limit.
We are looking for some answers not found in this forum plus any guidance based on your experience.
Questions:
Would recreating the cluster with bigger EC2 instances yield a higher limit?
Would adding new nodes to the existing cluster yield a higher limit?
From the app development perspective: what specific strategies/actions would you recommend in order to spot or predict a situation where this limit will be hit?
Txs - Jimmy
Okay folks, thanks to all who answered.
I posted a support ticket with AWS and this is the recommendation. I'm pasting it all here; it's long, but I hope it helps many people running into this issue. The idea is to catch the situation before it happens:
To monitor the number of connections made to the database, you can create a cloudwatch alarm based on the Database connections metrics that will trigger a lambda function when a certain threshold is reached. This lambda function can then terminate idle connections by calling a procedure that terminates idle connections.
Please find below the query that creates a procedure to log and terminate long-running inactive sessions:
1. Add view to get all current inactive sessions in the cluster
CREATE OR REPLACE VIEW inactive_sessions as (
select a.process,
trim(a.user_name) as user_name,
trim(c.remotehost) as remotehost,
a.usesysid,
a.starttime,
datediff(s,a.starttime,sysdate) as session_dur,
b.last_end,
datediff(s,case when b.last_end is not null then b.last_end else a.starttime end,sysdate) idle_dur
FROM
(
select starttime,process,u.usesysid,user_name
from stv_sessions s, pg_user u
where
s.user_name = u.usename
and u.usesysid>1
and process NOT IN (select pid from stv_inflight where userid>1
union select pid from stv_recents where status != 'Done' and userid>1)
) a
LEFT OUTER JOIN (
select
userid,pid,max(endtime) as last_end from svl_statementtext
where userid>1 and sequence=0 group by 1,2) b ON a.usesysid = b.userid AND a.process = b.pid
LEFT OUTER JOIN (
select username, pid, remotehost from stl_connection_log
where event = 'initiating session' and username <> 'rsdb') c on a.user_name = c.username AND a.process = c.pid
WHERE (b.last_end > a.starttime OR b.last_end is null)
ORDER BY idle_dur
);
2. Add a table for logging information about long-running transactions that were terminated
CREATE TABLE IF NOT EXISTS terminated_inactive_sessions (
process int,
user_name varchar(50),
remotehost varchar(50),
starttime timestamp,
session_dur int,
idle_dur int,
terminated_on timestamp DEFAULT GETDATE()
);
3. Add procedure to log and terminate any inactive transactions running for longer than 'n' amount of seconds
CREATE OR REPLACE PROCEDURE terminate_and_log_inactive_sessions (n INTEGER)
AS $$
DECLARE
expired RECORD ;
BEGIN
FOR expired IN SELECT process, user_name, remotehost, starttime, session_dur, idle_dur FROM inactive_sessions where idle_dur >= n
LOOP
EXECUTE 'INSERT INTO terminated_inactive_sessions (process, user_name, remotehost, starttime, session_dur, idle_dur) values (' || expired.process || ' , ''' || expired.user_name || ''' , ''' || expired.remotehost || ''' , ''' || expired.starttime || ''' , ' || expired.session_dur || ' , ' || expired.idle_dur || ');';
EXECUTE 'SELECT PG_TERMINATE_BACKEND(' || expired.process || ')';
END LOOP ;
END ;
$$ LANGUAGE plpgsql;
4. Execute the procedure by running the following command:
call terminate_and_log_inactive_sessions(100);
Here is a sample lambda function that attempts to close idle connections by querying the view 'inactive_sessions' created above, which you can use as a reference.
import datetime
import logging
import sys

import psycopg2

# db_database, db_user, db_password, db_port, db_host and session_idle_limit
# are expected to be provided elsewhere (e.g. from environment variables).

# Current time
now = datetime.datetime.now()
query = "SELECT process, user_name, session_dur, idle_dur FROM inactive_sessions where idle_dur >= %d"
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    try:
        conn = psycopg2.connect("dbname=" + db_database + " user=" + db_user + " password=" + db_password + " port=" + db_port + " host=" + db_host)
        conn.autocommit = True
    except:
        logger.error("ERROR: Unexpected error: Could not connect to Redshift cluster.")
        sys.exit()
    logger.info("SUCCESS: Connection to RDS Redshift cluster succeeded")
    with conn.cursor() as cur:
        cur.execute(query % (session_idle_limit))
        row_count = cur.rowcount
        if row_count >= 1:
            result = cur.fetchall()
            for row in result:
                print("terminating session with pid %s that has been idle for %d seconds at %s" % (row[0], row[3], now))
                cur.execute("SELECT PG_TERMINATE_BACKEND(%s);" % (row[0]))
            conn.close()
        else:
            conn.close()
As you said, this is a hard limit in Redshift and there is no way to raise it. Redshift is not a high-concurrency / high-connection database.
I expect that if you need the large data-analytics horsepower of Redshift, you can get around this with connection sharing. Pgpool is a common tool for this.

kafka-connect-jdbc does not fetch consecutive timestamp from source

I use kafka-connect-jdbc-4.0.0.jar and postgresql-9.4-1206-jdbc41.jar
Configuration of the Kafka Connect connector:
{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "mode": "timestamp",
  "timestamp.column.name": "updated_at",
  "topic.prefix": "streaming.data.v2",
  "connection.password": "password",
  "connection.user": "user",
  "schema.pattern": "test",
  "query": "select * from view_source",
  "connection.url": "jdbc:postgresql://host:5432/test?currentSchema=test"
}
I have configured two connectors, one source and one sink, using the JDBC driver against a PostgreSQL database ("PostgreSQL 9.6.9"), and everything works correctly.
I have doubts about how the connector collects the source data. Looking at the log, I see a time difference of 21 seconds between the executions of the queries:
11/1/2019 9:20:18[2019-01-11 08:20:18,985] DEBUG Checking for next block of results from TimestampIncrementingTableQuerier{name='null', query='select * from view_source', topicPrefix='streaming.data.v2', timestampColumn='updated_at', incrementingColumn='null'} (io.confluent.connect.jdbc.source.JdbcSourceTask)
11/1/2019 9:20:18[2019-01-11 08:20:18,985] DEBUG TimestampIncrementingTableQuerier{name='null', query='select * from view_source', topicPrefix='streaming.data.v2', timestampColumn='updated_at', incrementingColumn='null'} prepared SQL query: select * from view_source WHERE "updated_at" > ? AND "updated_at" < ? ORDER BY "updated_at" ASC (io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier)
11/1/2019 9:20:18[2019-01-11 08:20:18,985] DEBUG executing query select CURRENT_TIMESTAMP; to get current time from database (io.confluent.connect.jdbc.util.JdbcUtils)
11/1/2019 9:20:18[2019-01-11 08:20:18,985] DEBUG Executing prepared statement with timestamp value = 2019-01-11 08:17:07.000 end time = 2019-01-11 08:20:18.985 (io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier)
11/1/2019 9:20:19[2019-01-11 08:20:19,070] DEBUG Resetting querier TimestampIncrementingTableQuerier{name='null', query='select * from view_source', topicPrefix='streaming.data.v2', timestampColumn='updated_at', incrementingColumn='null'} (io.confluent.connect.jdbc.source.JdbcSourceTask)
11/1/2019 9:20:49[2019-01-11 08:20:49,499] DEBUG Checking for next block of results from TimestampIncrementingTableQuerier{name='null', query='select * from view_source', topicPrefix='streaming.data.v2', timestampColumn='updated_at', incrementingColumn='null'} (io.confluent.connect.jdbc.source.JdbcSourceTask)
11/1/2019 9:20:49[2019-01-11 08:20:49,500] DEBUG TimestampIncrementingTableQuerier{name='null', query='select * from view_source', topicPrefix='streaming.data.v2', timestampColumn='updated_at', incrementingColumn='null'} prepared SQL query: select * from view_source WHERE "updated_at" > ? AND "updated_at" < ? ORDER BY "updated_at" ASC (io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier)
11/1/2019 9:20:49[2019-01-11 08:20:49,500] DEBUG executing query select CURRENT_TIMESTAMP; to get current time from database (io.confluent.connect.jdbc.util.JdbcUtils)
11/1/2019 9:20:49[2019-01-11 08:20:49,500] DEBUG Executing prepared statement with timestamp value = 2019-01-11 08:20:39.000 end time = 2019-01-11 08:20:49.500 (io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier)
The first query collects data between 08:17:07.000 and 08:20:18.985, but the second gathers data between 08:20:39.000 and 08:20:49.500. Between the two there are 21 seconds of difference in which there may be records...
11/1/2019 9:20:18[2019-01-11 08:20:18,985] DEBUG Executing prepared statement with timestamp value = 2019-01-11 08:17:07.000 end time = 2019-01-11 08:20:18.985
11/1/2019 9:20:49[2019-01-11 08:20:49,500] DEBUG Executing prepared statement with timestamp value = 2019-01-11 08:20:39.000 end time = 2019-01-11 08:20:49.500
I assume that one of the values is the timestamp of the last record obtained and the other is the current timestamp.
I cannot find an explanation for this.
Is this the normal operation of the connector?
Should I assume that I am not always going to collect all the information?
The JDBC connector is not guaranteed to retrieve every message. For that, you need log-based Change Data Capture. For Postgres that is provided by Debezium and Kafka Connect.
You can read more about this here.
Disclaimer: I work for Confluent, and wrote the above blog
Edit: There is now a recording of the above talk available, from ApacheCon 2020: 🎥 https://rmoff.dev/no-more-silos
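For comparison, a log-based source connector for the same Postgres database would look roughly like the configuration below. This is only a sketch: the host, credentials and table name are placeholders carried over from the question, Debezium captures tables rather than views, and the exact property names vary between Debezium versions (newer releases use topic.prefix instead of database.server.name):

{
  "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
  "database.hostname": "host",
  "database.port": "5432",
  "database.user": "user",
  "database.password": "password",
  "database.dbname": "test",
  "database.server.name": "streaming.data.v2",
  "table.include.list": "test.source_table"
}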

powershell script to capture part of data from command line output

Sorry if the title is a bit confusing; honestly I did not know how to put it in plain words or what exactly to search for. I have output from a command line as shown below.
Remote Copy System Information
Status: Started, Normal
Target Information
Name ID Type Status Options Policy
3PARSYSTEM1 2 IP ready - mirror_config
Link Information
Target Node Address Status Options
3PARSYSTEM1 0:3:1 xxx.xxx.xxx.xxx Up -
3PARSYSTEM1 1:3:1 xxx.xxx.xxx.xxx Up -
receive 0:3:1 receive Up -
receive 1:3:1 receive Up -
Group Information
Name Target Status Role Mode Options
GRP001Temp 3PARSYSTEM1 Started Primary Periodic Last-Sync 2018-11-04 00:08:09 MYT, Period 3h,over_per_alert
LocalVV ID RemoteVV ID SyncStatus LastSyncTime
LUN001-Temp 13304 LUN001-TempDR 16914 Synced 2018-11-04 00:08:10 MYT
Name Target Status Role Mode Options
GRP002-PHY01 3PARSYSTEM1 Started Primary Periodic Last-Sync 2018-11-04 01:17:54 MYT, Period 2h,auto_recover,over_per_alert
LocalVV ID RemoteVV ID SyncStatus LastSyncTime
LUN001-VVT2.12 120 LUN001-VVT2.12 210 Syncing (33%) 2018-11-03 23:51:04 MYT
Name Target Status Role Mode Options
GRP003-PHY02 3PARSYSTEM1 Started Primary Periodic Last-Sync 2018-11-04 01:27:12 MYT, Period 1h45m,auto_recover,over_per_alert
LocalVV ID RemoteVV ID SyncStatus LastSyncTime
LUN002-VVT2.14 130 LUN002-VVT2.14 207 Syncing (49%) 2018-11-03 23:59:27 MYT
Name Target Status Role Mode Options
GRP001-PRD-ORA 3PARSYSTEM1 Started Primary Periodic Last-Sync 2018-11-04 00:45:09 MYT, Period 2h,auto_recover,over_per_alert
LocalVV ID RemoteVV ID SyncStatus LastSyncTime
ORA-PROD-VG01.35 97 ORA-PROD-VG01.35 2451 Synced 2018-11-04 00:45:54 MYT
ORA-PROD-VG02.36 98 ORA-PROD-VG02.36 2452 Synced 2018-11-04 00:46:10 MYT
ORA-PROD-VG03.37 99 ORA-PROD-VG03.37 2453 Synced 2018-11-04 00:45:48 MYT
ORA-PROD-VG04.38 100 ORA-PROD-VG04.38 2454 Synced 2018-11-04 00:45:12 MYT
ORA-PROD-VG05.39 101 ORA-PROD-VG05.39 2455 Synced 2018-11-04 00:45:12 MYT
Name Target Status Role Mode Options
GRP001-PRD-SAP 3PARSYSTEM1 Started Primary Periodic Last-Sync 2018-11-04 01:24:25 MYT, Period 23m,auto_recover,over_per_alert
LocalVV ID RemoteVV ID SyncStatus LastSyncTime
SAP-PROD-APPS.4 80 SAP-PROD-APPS.4 1474 Synced 2018-11-04 01:24:28 MYT
SAP-PROD-LOCK.19 95 SAP-PROD-LOCK.19 1490 Synced 2018-11-04 01:24:25 MYT
SAP-PROD-SAPDT1.5 81 SAP-PROD-SAPDT1.5 1475 Synced 2018-11-04 01:25:16 MYT
SAP-PROD-SAPDT2.6 82 SAP-PROD-SAPDT2.6 1476 Synced 2018-11-04 01:25:05 MYT
SAP-PROD-SAPDT3.7 83 SAP-PROD-SAPDT3.7 1477 Synced 2018-11-04 01:25:07 MYT
SAP-PROD-SAPDT4.8 84 SAP-PROD-SAPDT4.8 1478 Synced 2018-11-04 01:25:41 MYT
SAP-PROD-SAPDT5.9 85 SAP-PROD-SAPDT5.9 1479 Synced 2018-11-04 01:25:35 MYT
SAP-PROD-SAPDT6.10 86 SAP-PROD-SAPDT6.10 1480 Synced 2018-11-04 01:25:56 MYT
Name Target Status Role Mode Options
GRP002-PRD-SAP 3PARSYSTEM1 Started Primary Periodic Last-Sync 2018-11-04 01:24:55 MYT, Period 23m,over_per_alert
LocalVV ID RemoteVV ID SyncStatus LastSyncTime
SAP-PROD-VG01.10 15 SAP-PROD-VG01.10 29769 Synced 2018-11-04 01:28:44 MYT
I want to use PowerShell to capture the group information so that I can get the group names and loop through them to run another command. Example output is as below.
GRP001Temp
GRP002-PHY01
GRP003-PHY02
GRP001-PRD-ORA
GRP001-PRD-SAP
GRP002-PRD-SAP
Hope you can help me with my problem. Thank You in advance.
If every group is in the Role "Primary", one easy way might be the following statement:
get-content Demo.txt | where { $_ -match "Primary" } | % { $_.Split(" ")[0] }
It gets the lines which contain the word "Primary" and takes the first word (in your case, the group name).
You can split the output by the linebreak and loop over the result.
If a line starts with Name, split the next line by a space and write the first element to the output.
Something like:
param($output)
$lines = $output -split [Environment]::NewLine
$name = $false
foreach ($line in $lines) {
    if ($name) {
        ($line -split ' ')[0] | Write-Output
        $name = $false
    }
    if ($line.StartsWith('Name')) {
        $name = $true
    }
}
Obviously you are after the first word following the header
Name Target Status Role Mode Options
not including the false header
Name ID Type Status Options Policy
which the other answers aren't excluding.
So I'd split the file content with a regex on this header into sections and skip the first one.
Then split each section into words separated by whitespace (\s) and take the first one ([0]).
(Get-Content .\SO_53154266.txt -raw) -split "(?sm)^Name\s+Target.*?`r?`n" |
Select-Object -skip 1|
ForEach-Object { ($_ -split '\s')[0] }
Sample output:
GRP001Temp
GRP002-PHY01
GRP003-PHY02
GRP001-PRD-ORA
GRP001-PRD-SAP
GRP002-PRD-SAP
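To then run another command for each group, one option is to capture the extracted names and loop over them; the body of the loop below is only a placeholder for whatever per-group command is actually needed:

# Capture the group names using the approach above, then loop over them.
$groups = (Get-Content .\SO_53154266.txt -Raw) -split "(?sm)^Name\s+Target.*?`r?`n" |
    Select-Object -Skip 1 |
    ForEach-Object { ($_ -split '\s')[0] }

foreach ($group in $groups) {
    # Placeholder: replace with the real per-group command you want to run.
    Write-Output "Processing group: $group"
}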

Firebird: Query execution time in iSQL

I would like to get query execution time in iSQL.
For instance :
SELECT * FROM students;
How do I get the query execution time?
Use SET STATS:
SQL> SET STATS;
SQL> SELECT * FROM RDB$DATABASE;
... query output removed ....
Current memory = 34490656
Delta memory = 105360
Max memory = 34612544
Elapsed time= 0.59 sec
Buffers = 2048
Reads = 17
Writes 0
Fetches = 270
SQL>