We are using boofuzz to fuzz a remote service over TCP. The fuzzer script is as follows.
from boofuzz import Session, SocketConnection, Target, s_get, s_initialize, s_string

# host and port are defined elsewhere
session = Session(target=Target(connection=SocketConnection(host, port, proto='tcp')))
s_initialize("Test")
s_string("Fuzz", fuzzable=True)
session.connect(s_get("Test"))
session.fuzz()
After a while, we noticed the remote service had crashed, but the fuzzer just repeatedly tried to restart it. The fuzzer did not detect that the remote service was down, and the crashing test case was not stored.
[2022-02-02 04:18:42,231] Test Step: Restarting target
[2022-02-02 04:18:42,231] Info: Restarting target process using CallbackMonitor
[2022-02-02 04:18:42,231] Test Step: Cleaning up connections from callbacks
[2022-02-02 04:18:42,231] Info: Closing target connection...
[2022-02-02 04:18:42,231] Info: Connection closed.
[2022-02-02 04:18:42,231] Info: No reset handler available... sleeping for 5 seconds
[2022-02-02 04:18:47,236] Info: Opening target connection (xxx)...
[2022-02-02 04:18:47,237] Info: Cannot connect to target; retrying. Note: This likely indicates a failure caused by the previous test case, or a target that is slow to restart.
[2022-02-02 04:18:47,237] Test Step: Restarting target
[2022-02-02 04:18:47,237] Info: Restarting target process using CallbackMonitor
[2022-02-02 04:18:47,237] Test Step: Cleaning up connections from callbacks
[2022-02-02 04:18:47,237] Info: Closing target connection...
[2022-02-02 04:18:47,237] Info: Connection closed.
[2022-02-02 04:18:47,237] Info: No reset handler available... sleeping for 5 seconds
[2022-02-02 04:18:52,243] Info: Opening target connection (xxx)...
[2022-02-02 04:18:52,244] Info: Cannot connect to target; retrying. Note: This likely indicates a failure caused by the previous test case, or a target that is slow to restart.
How can we customize the boofuzz script so that:
we can detect that the remote service is down (e.g., by attempting a TCP connect)?
we can store the full, untruncated crashing test case to disk?
If you are not using monitors, you could add a post_test_case_callbacks entry that checks whether the server is alive. The registered function is called after each test case.
post_test_case_callbacks (list of method) – The registered method will be called after each fuzz test case. Default None.
e.g.
import socket
from boofuzz import FuzzLoggerText, Session, SocketConnection, Target

logger = FuzzLoggerText()

def target_alive(target, fuzz_data_logger, session, sock, *args, **kwargs):
    # Open a fresh TCP connection to check whether the service is still up.
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.connect((host, port))
        logger.log_pass(description="alive")
    except ConnectionRefusedError:
        logger.log_fail(description="Server down")
    finally:
        probe.close()

session = Session(target=Target(SocketConnection(host, int(port))), post_test_case_callbacks=[target_alive])
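For the second part of the question (keeping the payload that triggered the failure), the same kind of callback can write the last transmitted data to disk when the liveness probe fails. This is only a sketch, not boofuzz's built-in crash reporting: it assumes the Session object exposes last_send and total_mutant_index (attribute names as we have seen them in the boofuzz source; treat them as assumptions for your version) and that you register it alongside target_alive.

import os
import socket
import time

def save_crash(target, fuzz_data_logger, session, sock, *args, **kwargs):
    # Probe the service; if the connect fails, assume the previous test case
    # crashed it and dump that payload to disk before boofuzz restarts the target.
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.connect((host, port))
    except OSError:
        os.makedirs("crashes", exist_ok=True)
        path = os.path.join("crashes", "case_%d_%d.bin" % (session.total_mutant_index, int(time.time())))
        with open(path, "wb") as f:
            f.write(session.last_send or b"")  # assumption: last_send holds the fuzzed bytes
        fuzz_data_logger.log_fail(description="Server down, payload saved to %s" % path)
    finally:
        probe.close()

session = Session(target=Target(SocketConnection(host, int(port))),
                  post_test_case_callbacks=[target_alive, save_crash])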
Related
We are facing this issue with PostgreSQL 12.11 only on Windows; on Linux it works fine.
If the executable that is using the database session crashes, the entire database goes into recovery mode and restarts. In the Postgres log we can find the messages below.
2022-10-06 17:44:09.210 CEST [8860] LOG: server process (PID 9980) exited with exit code 0
2022-10-06 17:44:09.210 CEST [8860] LOG: terminating any other active server processes
2022-10-06 17:44:09.211 CEST [9992] WARNING: terminating the connection because of the crash of another server process
2022-10-06 17:44:09.211 CEST [9992] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit because another server process exited abnormally and possibly corrupted shared memory.
2022-10-06 17:44:09.211 CEST [9992] HINT: In a moment you should be able to reconnect to the database and repeat your command.
and then it goes into recovery mode.
Any idea or workaround to avoid this issue? And if Postgres crashes, where does the crash dump get generated on Windows?
I have 2 members in a Patroni cluster (1 master and 1 replica). In the logs I saw a problem after the master reconnected to a new etcd server:
ERROR: Request to server http://etcd2:2379 failed: MaxRetryError('HTTPConnectionPool(host=\'etcd2\', port=2379): Max retries exceeded with url: /v2/keys/patroni/patroni-cluster/?recursive=true (Caused by ReadTimeoutError("HTTPConnectionPool(host=\'etcd2\', port=2379): Read timed out. (read timeout=3.333078201239308)"))')
INFO: Reconnection allowed, looking for another server.
INFO: Retrying on http://etcd1:2379
INFO: Selected new etcd server http://etcd1:2379
INFO: Lock owner: patroni2; I am patroni1
INFO: does not have lock
INFO: Reaped pid=3098484, exit status=0
LOG: received immediate shutdown request
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
After this, the replica node became the master:
INFO: Got response from patroni1 http://0.0.0.0:8008/patroni: {"state": "running", "postmaster_start_time": "2021-08-09 14:43:18.372 UTC", "role": "replica", "server_version": 120003, "cluster_unlocked": true, "xlog": {"received_location": 139045264096, "replayed_location": 139045264096, "replayed_timestamp": "2021-09-27 15:03:10.389 UTC", "paused": false}, "timeline": 30, "database_system_identifier": "6904244251638517787", "patroni": {"version": "1.6.5", "scope": "patroni-cluster"}}
WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
INFO: promoted self to leader by acquiring session lock
server promoting
LOG: received promote request
INFO: Lock owner: patroni2; I am patroni2
INFO: no action. i am the leader with the lock
ERROR: replication slot "patroni1" does not exist
ERROR: replication slot "patroni1" does not exist
INFO: acquired session lock as a leader
As you can see above, the new master cannot see patroni1 now. After several attempts to recover WAL, patroni1 wrote the logs below:
INFO: establishing a new patroni connection to the postgres cluster
INFO: My wal position exceeds maximum replication lag
INFO: following a different leader because i am not the healthiest node
INFO: My wal position exceeds maximum replication lag
This log output does not change over time: patroni2 keeps writing "acquired session lock as a leader" and patroni1 keeps writing "My wal position exceeds maximum replication lag".
But I can't see them in the Patroni cluster when I use the patronictl -c /patroni.yml list command.
What is the best way to bring them back into the cluster?
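Not an authoritative answer, but the usual way to bring a replica whose WAL position has diverged or lagged too far back into the cluster is to reinitialize it from the current leader, e.g. patronictl -c /patroni.yml reinit patroni-cluster patroni1. The same request can be made through Patroni's REST API; the sketch below is only an illustration and assumes the member's REST endpoint is on the default port 8008 seen in the logs, that this Patroni version (1.6.5) exposes the /reinitialize endpoint, and that the missing entries in patronictl list are not caused by a deeper etcd problem.

import requests  # assumption: any HTTP client works; the hostname below is hypothetical

# Ask the lagging member to wipe its data directory and re-clone itself
# from the current leader; Patroni runs the base backup itself.
resp = requests.post("http://patroni1:8008/reinitialize", timeout=10)
print(resp.status_code, resp.text)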
I have code that used to work on a simple table and stopped working when the same table was partitioned into many sub-partitions.
In a distributed application (Spark) we have code that performs batch delete queries in parallel from different computers at the same time (deleting different records).
Most of the queries work, but then one of them fails with what seems to be a socket connection timeout:
java.sql.BatchUpdateException: Batch entry 0 DELETE FROM my_table WHERE vessel_id='xxxxxx' AND day='2020-09-15 00:00:00+00'::timestamp was aborted: An I/O error occurred while sending to the backend. Call getNextException to see other errors in the batch.
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
When the code retries the task, the connection fails with:
FATAL: the database system is in recovery mode
In the database log I see:
2020-09-21 16:44:27 UTC::#:[26848]:DETAIL: Failed process was running: DELETE FROM my_table WHERE vessel_id=$1 AND day=$2
2020-09-21 16:44:27 UTC::#:[26848]:LOG: terminating any other active server processes
2020-09-21 16:44:27 UTC:172.31.4.110(59468):postgres#postgres:[27705]:WARNING: terminating connection because of crash of another server process
2020-09-21 16:44:27 UTC:172.31.4.110(59468):postgres#postgres:[27705]:DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-09-21 16:44:27 UTC:172.31.4.110(59468):postgres#postgres:[27705]:HINT: In a moment you should be able to reconnect to the database and repeat your command.
2020-09-21 16:44:27 UTC:10.3.1.138(57926):rdsrepladmin#[unknown]:[26740]:WARNING: terminating connection because of crash of another server process
2020-09-21 16:44:27 UTC:10.3.1.138(57926):rdsrepladmin#[unknown]:[26740]:DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-09-21 16:44:27 UTC:10.3.1.138(57926):rdsrepladmin#[unknown]:[26740]:HINT: In a moment you should be able to reconnect to the database and repeat your command.
2020-09-21 16:44:27 UTC::#:[22480]:WARNING: terminating connection because of crash of another server process
2020-09-21 16:44:27 UTC::#:[22480]:DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-09-21 16:44:27 UTC::#:[22480]:HINT: In a moment you should be able to reconnect to the database and repeat your command.
2020-09-21 16:44:27 UTC:127.0.0.1(31826):rdsadmin#rdsadmin:[27967]:FATAL: the database system is in recovery mode
Any idea why the database fails when the table is partitioned?
Why are all the other connections on the other computers closed, and why does the database go into recovery mode?
After looking at the logs I found that the problem was out-of-memory.
This database instance is the main instance; it does the writing, replicating, and deleting, and it didn't have enough memory to handle all these tasks at the same time.
The fix was simply to add more memory.
Nothing fancy.
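For anyone hitting the same symptom: partitioning raises per-connection memory overhead, because every backend that touches the table has to hold locks and relation/catalog cache entries for each partition it hits, so many parallel delete sessions can push the server out of memory where the unpartitioned table was fine. Before (or in addition to) adding RAM, it can be worth comparing the memory-related settings with the number of connections the Spark job actually opens. A rough sketch, assuming a psycopg2 client and placeholder connection details:

import psycopg2  # assumption: psycopg2 is installed; connection parameters are placeholders

conn = psycopg2.connect(host="my-db-host", dbname="postgres", user="postgres", password="...")
with conn, conn.cursor() as cur:
    # Settings that bound per-connection and total memory use.
    cur.execute("""
        SELECT name, setting, unit FROM pg_settings
        WHERE name IN ('max_connections', 'work_mem', 'shared_buffers',
                       'maintenance_work_mem', 'max_locks_per_transaction')
    """)
    for name, setting, unit in cur.fetchall():
        print(name, setting, unit)
    # How many sessions the parallel Spark tasks are actually holding open.
    cur.execute("SELECT count(*) FROM pg_stat_activity")
    print("active connections:", cur.fetchone()[0])
conn.close()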
I have checked the server logs on an Azure server and see some log records as below:
LOG: connection received: host=xxx port=xxx pid=xxx
LOG: connection authorized: user=xxx database=xxx SSL enabled (protocol=TLSv1.1, cipher=xxx, compression=off)
LOG: could not receive data from client: An existing connection was forcibly closed by the remote host.
At: Link
It mentions error code WSAECONNRESET. But according to the MS document (Link), this code does not mean
"could not receive data from client: An existing connection was forcibly closed by the remote host."
Assuming this issue is related to WSAECONNRESET, does the developer need to add a close command in their application to end their session?
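For what it's worth, WSAECONNRESET (10054) is the Windows error whose text is exactly "An existing connection was forcibly closed by the remote host", and on the Postgres side this log line usually just means the client went away without ending its session cleanly (process killed, network reset, idle connection dropped by a gateway). Closing the connection explicitly does stop the message for normal shutdowns, because the driver then sends a clean protocol Terminate. A hedged sketch of what that looks like, assuming a Python client using psycopg2 (the real application may use a different driver, and the host name is a placeholder):

import psycopg2  # illustration only; placeholders throughout

conn = psycopg2.connect(host="xxx.postgres.database.azure.com", dbname="xxx",
                        user="xxx", password="xxx", sslmode="require")
try:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        cur.fetchone()
finally:
    # An explicit close ends the session cleanly, so the server does not log
    # "could not receive data from client" for this connection.
    conn.close()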
Thanks,
Dokota.
I am trying to start up Zookeeper via the CLI with the command:
bin/zookeeper-server-start.sh ../config/zookeeper.properties
It hums along for a second, and everything seems fine, until it says this:
INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)
and then the below loops indefinitely until I exit:
[2018-08-10 15:07:48,223] INFO Accepted socket connection from /172.31.39.32:46374 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2018-08-10 15:07:48,228] WARN Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running (org.apache.zookeeper.server.NIOServerCnxn)
[2018-08-10 15:07:48,228] INFO Closed socket connection for client /172.31.39.32:46374 (no session established for client) (org.apache.zookeeper.server.NIOServerCnxn)
This is a single server, and I believe a single-node test server, so there isn't a quorum or other pieces running. My ZooKeeper config is basic; it only contains this:
dataDir=/tmp/zookeeper
clientPort=2181
maxClientCnxns=0
The weird thing is, my ZooKeeper had been running fine, and I had made NO changes to the config. I took it down for a quick restart while trying to fix something else, and now it won't budge. I've checked, and nothing else is running on port 2181.
I see this question has been asked several times with no answers; any ideas?
This might be happening because of corruption in the ZooKeeper data. You should not set dataDir to /tmp/*: if your machine purges data from /tmp, ZooKeeper will have difficulty restoring its state on restart. If you check the ZooKeeper logs, you should see some kind of exception there.
Since you mentioned this ZooKeeper instance is for test purposes only, set dataDir to anything other than /tmp and try restarting, as in the example below.
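For example, a minimal config with the data directory moved out of /tmp (assuming /var/lib/zookeeper exists and is writable by the user running ZooKeeper) would be:

dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=0

If you want to keep the existing state, copy the contents of /tmp/zookeeper into the new directory before restarting; otherwise ZooKeeper will simply start with an empty data tree.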