Drools workbench (business-central) requests timing out

I have installed business central along with Keycloak authentication using MySQL as a database for storing Keycloak's data. The business-central workbench and Keycloak server are behind Nginx.
While working in the workbench, some of the requests time out with a 504 error code. The whole Business Central UI then freezes and the user is not able to do anything after that.
The URLs that fail with 504 look like: https://{host}:{port}/business-central/out.43601-24741.erraiBus?z=105&clientId=43601-24741
Other details about the setup are as below:
Java: 1.8.0_242
Business central version: 7.34.Final
Keycloak version: 9.0.0
MySQL: 8
Java options for business central: -Xms1024M -Xmx2048M -XX:MaxPermSize=2048M -XX:MaxHeapSize=2048M
Note: All of this setup of mine is on a 4GB EC2 instance.
Any help on this issue would be appreciated.
EDIT: I have checked access_log.log and it looks like the server takes more than 45 seconds to process the request. Here is a log line:
"POST /business-central/in.93979-28827.erraiBus?z=15&clientId=93979-28827&wait=1 HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"i 45001 45.001
EDIT 2: Here is a sample request data that is sent:
[{"CommandType":"CDIEvent","BeanType":"org.kie.workbench.common.screens.library.api.ProjectCountUpdate","BeanReference":{"^EncodedType":"org.kie.workbench.common.screens.library.api.ProjectCountUpdate","^ObjectID":"1","count":1,"space":{"^EncodedType":"org.uberfire.spaces.Space","^ObjectID":"2","name":"Fraud_Team"}},"FromClient":"1","ToSubject":"cdi.event:Dispatcher"},{"ToSubject":"org.kie.workbench.common.screens.library.api.LibraryService:RPC","CommandType":"getAllUsers:","Qualifiers":{"^EncodedType":"java.util.ArrayList","^ObjectID":"1","^Value":[]},"MethodParms":{"^EncodedType":"java.util.Arrays$ArrayList","^ObjectID":"2","^Value":[]},"ReplyTo":"org.kie.workbench.common.screens.library.api.LibraryService:RPC.getAllUsers::94:RespondTo:RPC","ErrorTo":"org.kie.workbench.common.screens.library.api.LibraryService:RPC.getAllUsers::94:Errors:RPC"}]
The URL hit is: business-central/in.59966-45867.erraiBus?z=56&clientId=59966-45867&wait=1
It took more than a minute to process.

Problem Description
I had this same problem on 7.38.0. The problem, I believe, is that Errai keeps rolling 45-second long-polling requests open between the client and server to ensure the communication channel stays open. For me, Nginx had a default socket timeout of 30s, which meant it returned a 504 gateway timeout for these requests when in actuality they weren't "stuck". This would only happen if you didn't do anything within Business Central for 30 seconds, as otherwise the request would close and a new one would take over. I feel like Errai should really be able to recover from such a scenario, but anyway.
Solution
I updated the socket timeout of my Nginx server to 60s so that the 45s requests no longer got timed out by Nginx. I believe this is equivalent to the proxy_read_timeout directive in Nginx.
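As a minimal sketch, assuming a standard proxy_pass setup (the location path and upstream address here are placeholders for your own config), the relevant directives would look something like:
location /business-central/ {
    proxy_pass http://127.0.0.1:8080;
    # Errai holds long-polling requests open for up to 45s,
    # so keep the proxy read/send timeouts above that
    proxy_read_timeout 60s;
    proxy_send_timeout 60s;
}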
If you can't touch your Nginx config, there may also be a way to turn off the server-to-client communication, as outlined here: https://docs.jboss.org/errai/4.0.0.Beta3/errai/reference/html_single/#sid-59146643_BusLifecycle-TurningServerCommunicationOnandOff. I didn't test this as I didn't need to, but it may be an option.

Related

An existing connection was forcibly closed by the remote host - Google Actions SDK (gactions)

We are trying to update our Google Action using the Actions SDK by executing the below command in the CLI.
Command:
gactions update --action_package GoogleAssistantAction.json --project PROJECTNAMEHERE
Below is the error message we're getting
Error message:
Error: Get https://dl.google.com/gactions/updates.json: read tcp 172.30.63.145:1430->172.217.13.238:443: wsarecv: An existing connection was forcibly closed by the remote host.
Pushing the app for the Assistant for testing...
POST /v2/users/me/previews/PROJECT NAME HERE:batchUpdatePreviewFromAgentDraft HTTP/1.1
Host: actions.googleapis.com
User-Agent: Gactions-CLI/2.1.3 (windows; amd64; stable/dff629ae63fd0b047d19687b79274524569714e6)
Content-Length: 540
Content-Type: application/json
Accept-Encoding: gzip
However, based on our research and troubleshooting, we figured out that the issue stems from gactions not being proxy-aware. All internet traffic should be going through the bluecoat proxy, where it is then passed through the firewall. However, our internet traffic (port 443) is reaching out to the internet directly, which is causing it to be blocked by the firewall. Can anyone please help us figure out the proxy settings in our application so that it won't directly reach the internet? Thank you!
At present, gactions only supports transparent proxy configurations, that is, proxy configurations that processes don't need to be explicitly aware of.
Unfortunately, I'm unable to give you specific guidance for bluecoat. If you're able to configure it as a transparent proxy, gactions should work in your environment.

How to edit http connection in wildfly 8.2.1 on Linux machine

I have deployed a simple servlet web application on WildFly 8.2.1 on RHEL 6.9. The application just takes POST requests and responds with 200 OK.
When the client (a Java client using the Apache Commons HTTP client) posts data to the web application, the application accepts the request, but many of the requests also fail with the error "Caused by java.net.ConnectException: Connection timed out (Connection timed out)" on the client side.
My assumption is that WildFly has some default limit on the number of HTTP connections that can be open at any point in time, and that if further requests arrive that require opening a new connection, the web server rejects them.
Could anyone here please help me with the questions below?
1. How can we check live open HTTP connections in RHEL 6.9? I mean, a command in RHEL to check how many connections are open on port 8080.
2. How can we tweak the default value of the HTTP connection limit in WildFly?
3. Are the HTTP connection limit and the max thread count linked to each other? If so, please let me know how they should be updated in the WildFly configuration (standalone.xml).
4. How many requests can be kept in the queue by WildFly, and what happens to requests arriving at the WildFly server when the queue is full?
NOTE: This is a kind of load test for the web server where traffic is high; I'm not sure of the exact volume, but it's high.
You're getting into some system administration topics, but I'll answer what I can. First and foremost: WildFly 8.2.1 is part of the very first WildFly release, and I'd strongly encourage upgrading to a newer version.
To check the number of connections in a Unix-like environment you'll want to use the netstat command-line tool. In your case, something like:
netstat -na | grep 8080 | grep EST
This will show you all the connections that are ESTABLISHED to port 8080, which gives you a snapshot of the number of connections. Pipe that to wc to get a count.
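For example, to get just the count:
netstat -na | grep 8080 | grep EST | wc -l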
Next, finding documentation on WildFly 8.2.1 is a bit challenging now, but WildFly 8 uses Undertow for the socket I/O, which in turn uses XNIO. I found a thread that goes into some detail about configuring the I/O subsystem. Note that WildFly 8.2.1 uses Undertow 1.1.8, which isn't documented anywhere I could find.
As for your last two questions, I believe they're related to the second one: the XNIO configuration includes settings like:
<subsystem xmlns="urn:jboss:domain:io:1.0">
    <!-- io-threads: number of non-blocking I/O (selector) threads;
         task-max-threads: maximum worker threads for blocking tasks -->
    <worker name="default" io-threads="5" task-max-threads="50"/>
    <!-- buffer pool used by the I/O subsystem; buffer-size is in bytes -->
    <buffer-pool name="default" buffer-size="16384" buffers-per-slice="128"/>
</subsystem>
but you'll need to dig deeper into the docs for details.
In WildFly 19.1.0.Final the configuration looks similar to the code above, except that the schema version is now 3.0.
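For reference, that would presumably look like the following (same attributes as above per the note; only the namespace version changes):
<subsystem xmlns="urn:jboss:domain:io:3.0">
    <worker name="default" io-threads="5" task-max-threads="50"/>
    <buffer-pool name="default" buffer-size="16384" buffers-per-slice="128"/>
</subsystem>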

Teamcity fails to publish artifacts and stop builds

I'm having an issue with TeamCity that is proving very difficult to solve for a number of reasons. I've looked around and not found any useful answers so far.
We have a TeamCity server running on port 8080 with two agents connecting to it on ports 9090 and 9091 respectively. The agents register successfully and can accept new builds just fine. But when a build is complete, the tests have passed, and the logs state "Sending artifacts", things stop and the artifacts never reach the server. Having left this sitting overnight, I made requests to stop the build, which failed.
We have recently switched to a new firewall, but things were working after setting the required port rules for 8080, 9090 and 9091. No changes have been made since we got things working, yet now things do not work.
To the logs...
The server is aware of the failure as I can see logs in several places stating:
jetbrains.buildServer.SERVER - Failed to upload artifact, due to error: org.apache.commons.fileupload.FileUploadBase$IOFileUploadException: Processing of multipart/form-data request failed. Read timed out
The agent also has logs stating a similar reason:
jetbrains.buildServer.AGENT - Failed to publish artifacts because of error: java.net.SocketException: Connection reset by peer: socket write error, will try again.
During all this the firewall logs show that all traffic on the expected ports is being allowed through. What is odd though are some logs that look like this:
2016-04-01 10:45:00 Deny [sourceIp] [targetIP] 49426/tcp 8080 49426 0-External Firebox tcp syn checking failed (expecting SYN packet for new TCP connection, but received ACK, FIN, or RST instead). 558 113 (Internal Policy) proc_id="firewall" rc="101" msg_id="3000-0148" tcp_info="offset 5 A 478076245 win 258"
Examining port 49426 on the agent shows that it was being used by java.exe. I'm assuming this has something to do with TeamCity, as it runs in the JVM. The next step was to scour every bit of config I could find to figure out where this port number comes from. After a while the agent decided to retry and the port changed. It looks to me like Java is just using whatever port it wants (as if unassigned in code), so could there be something missing in the agent config instructing it which port to use for artifact uploads?
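For anyone wanting to repeat that check on a Windows agent, standard commands along these lines would do it (the PID value here is illustrative):
netstat -ano | findstr 49426
tasklist /FI "PID eq 1234"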
I did read somewhere that perhaps the server or the firewall doesn't like requests or file uploads that exceed a certain size (the largest file is 81 MB), but we found nothing to suggest such a rule was in place.
The Teamcity version is old (v7.1.1) but we are currently unable to upgrade (I am waiting on approval to use a newer, bigger server due to hard disk space issues).
UPDATE
We very briefly opened up a bit of the firewall to see if it was the cause of the issues to no avail. At this point I'm not convinced the firewall is the problem.
Any ideas?
Thanks in advance.
UPDATE 2
I've ended up setting up a whole new build server and things work just fine there. The new server has the latest TeamCity version but the agents are the same machines and artifact uploads appear to work just fine. This isn't really a solution to the question but at least I have a working setup now.
This can happen when the agent is too slow to start sending data for whatever reason. This workaround by JetBrains employee Pavel Sher might help:
Increase the connectionTimeout value in the server.xml file:
<Connector port="8111" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8543"
           enableLookup="false"
           useBodyEncodingForURI="true"/>
Change it from 20000 to 60000 or even more.
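A minimal sketch of the adjusted connector, with 60000 chosen per the advice above:
<Connector port="8111" protocol="HTTP/1.1"
           connectionTimeout="60000"
           redirectPort="8543"
           enableLookup="false"
           useBodyEncodingForURI="true"/>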

Nginx proxy hangs after a while when idle

We are using the nginx proxy_pass feature to bridge RESTful calls to a backend app, and we use the nginx WebSocket proxy for the same system at the same time. Sometimes (I guess when the system has had no client requests for a while) nginx freezes all requests until we restart it, and then everything works well again. What is the problem? Do I have to change keep-alive settings? I have turned off the proxy buffering and caching features in nginx.conf.
I found the problem. By checking the nginx error log and with a bit of hackery, sniffing, and guesswork, I found out that the WebSocket connections usually disconnect and reconnect (mobile devices), the nginx peer tries to keep the connections alive, and eventually the maximum connection limit is reached. I just decreased the timeouts and increased the max connections.
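As an illustration only, that kind of adjustment would look something like this in nginx (the exact values and the location path are assumptions to tune for your own traffic):
events {
    # raise the per-worker connection ceiling
    worker_connections 4096;
}
http {
    server {
        location /ws/ {
            proxy_pass http://127.0.0.1:8081;
            # required for WebSocket proxying
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            # shorter timeouts so dead peers are reaped instead of piling up
            proxy_read_timeout 60s;
            proxy_send_timeout 60s;
        }
    }
}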

Why does Perl's Crypt::SSLeay timeout on Intel Mac OS X machines?

I have a Perl cron job whose HTTPS connections recently started failing with an error of "500 SSL read timeout". I've tracked the error down to an alarm in Crypt::SSLeay, but I don't know if this is simply something taking too long to respond.
So far, I've adjusted the timeout from the default 30 seconds to 10 minutes and it still times out. I've moved the script to other machines: those on Intel Mac OS X systems all time out, while those under Linux or on PPC Mac OS X systems run fine, so I don't think it's a change on the network or the remote server.
The time when the problems started does not coincide with any software updates or reboots on the machine, and I've contacted the people running the server I'm connecting to; everyone claims they haven't changed anything.
Does anyone have recommendations for debugging HTTPS? Or have you seen this behavior before and can you recommend something I might have overlooked that could have caused this problem?
The problem seems to be specific to OS X and related directly to OpenSSL, so it's not unique to Perl. It possibly has to do with one of the latest security updates from Apple (2010-001).
I'm having the same issue with:
Python's httplib (uploads over ~64k produce a 'The read operation timed out' error; smaller uploads over SSL work, and uploads of all sizes over HTTP work).
curl over HTTPS: curl times out. The same curl command from Linux works fine with both HTTP and HTTPS, and curl on OS X over HTTP also works fine.
I found a few places online that cover similar issues across different programming languages / software. I can only post one...
https://blog.torproject.org/blog/apple-broke-openssl-which-breaks-tor-os-x