Rest server (Play Framework) gets "Read Timed out" exception during load test - rest

We are running a heavy load test (jmeter: 350 threads, 35M total requests) on a rest server using Play Framework and run into the following error after ~2 hour. We remove other components so that request simply take requests and do nothing. Anyone has any idea or simply Play Framework cannot handle heavy load like this?
2014/07/05 11:59:38 WARN - com.company.test.RestTest2: Run TestSQL throw error java.lang.Exception: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
at com.company.dispatcher.RexsterRESTTaskDispatcher.dispatchTask(RexsterRESTTaskDispatcher.java:76)
at com.company.test.RestTest2.runTest(RestTest2.java:375)
at org.apache.jmeter.protocol.java.sampler.JavaSampler.sample(JavaSampler.java:191)
at org.apache.jmeter.threads.JMeterThread.process_sampler(JMeterThread.java:429)
at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:257)
at java.lang.Thread.run(Thread.java:744)
Part of the application.conf :
....
db.pool.timeout=100000
play {
akka {
akka.loggers = ["akka.event.Logging$DefaultLogger", "akka.event.slf4j.Slf4jLogger"]
loglevel = WARNING
actor {
default-dispatcher = {
fork-join-executor {
parallelism-factor = 64
parallelism-max = 1000
}
}
}
}
}

Had the this error today. It tool me a while to found out that one of the windows (svchost) processes was occupying the 1099 port, which the Jmeter server was trying to use.
I got a hint for this when trying to start the Jmeter-Server.bat file manually. Then, the following PowerShell command provided the details of that process. After closing that process, Jmeter clients started to connect again.
Get-Process -Id (Get-NetTCPConnection -LocalPort 1099).OwningProcess

There a many things to check:
Are you running Test from same machine ? if yes it's a problem
Is your machine TCP stack tuned ?
What is your JVM configuration regarding Xmx as long as your machine memory, CPU ...
What does your test look like ? could you show a screenshot with all elements unfolded ?
I think Play/AKKA can handle this load without problem so I would look into configuration issues.

Related

Remote Logging using Log4j2

So i have this task to log activities to a file, but it has to be done
remotely on the server side, Remote logging.
NOTE : Remote Logging has to be in latest version of Log4j2(2.10)
My task was simple
Send logging info to a port.
Log info from port to a file.
My Discoveries
Socket Appender exist which help send info to a port. This is it, you dont need to create a client side code or anything.
Socket appender configuration in log4j2.properties
appender.socket.type = Socket
appender.socket.name= Socket_Appender
appender.socket.host = "IP address"
appender.socket.port = 8101
appender.socket.layout.type = SerializedLayout
appender.socket.connectTimeoutMillis = 2000
appender.socket.reconnectionDelayMillis = 1000
appender.socket.protocol = TCP
Adapting from here. But this is also log4j 1.x adaptation.
I found out that before log4j 2.6 to listen to a port we used TcpSocketServer which started a server using LogEventBridgeThis helped reach that conclusion. This class was in core.net.server which is no longer available.Assuming it is not used anymore and the only similar/closest class, TcpSocketManager.Other links that helped. How to use SocketAppend?
Then i tried this
public static final Logger LOG=LogManager.getLogger(myapp.class.getName());
main(){
LOG.debug("DEBUG LEVEL");
}
and got the following error
main ERROR TcpSocketManager (TCP:IPAddress:8111) caught exception
and will continue: java.net.SocketTimeoutException: connect timed out
I know this work because i made it read to a socket but there was no one listening, but somehow i messed up big time and there was a code change.
I need help how to go ahead. Thank You in advance
The socket server to remotely receive log events has been moved to a separate repository: https://github.com/apache/logging-log4j-tools
This still needs to be released.

IBM BLUEMIX BLOCKCHAIN SDK-DEMO failing

I have been working with HFC SDK for Node.js and it used to work, but since last night I am having some problems.
When running helloblockchain.js only few times works, most time I get this error when it tries to enroll a new user:
E0113 11:56:05.983919636 5288 handshake.c:128] Security handshake failed: {"created":"#1484304965.983872199","description":"Handshake read failed","file":"../src/core/lib/security/transport/handshake.c","file_line":237,"referenced_errors":[{"created":"#1484304965.983866102","description":"FD shutdown","file":"../src/core/lib/iomgr/ev_epoll_linux.c","file_line":948}]}
Error: Failed to register and enroll JohnDoe: Error
Other times, the enroll works and the failure appears deploying the chaincode:
Enrolled and registered JohnDoe successfully
Deploying chaincode ...
E0113 12:14:27.341527043 5455 handshake.c:128] Security handshake failed: {"created":"#1484306067.341430168","description":"Handshake read failed","file":"../src/core/lib/security/transport/handshake.c","file_line":237,"referenced_errors":[{"created":"#1484306067.341421859","description":"FD shutdown","file":"../src/core/lib/iomgr/ev_epoll_linux.c","file_line":948}]}
Failed to deploy chaincode: request={"fcn":"init","args":["a","100","b","200"],"chaincodePath":"chaincode","certificatePath":"/certs/peer/cert.pem"}, error={"error":{"code":14,"metadata":{"_internal_repr":{}}},"msg":"Error"}
Or:
Enrolled and registered JohnDoe successfully
Deploying chaincode ...
E0113 12:15:27.448867739 5483 handshake.c:128] Security handshake failed: {"created":"#1484306127.448692244","description":"Handshake read failed","file":"../src/core/lib/security/transport/handshake.c","file_line":237,"referenced_errors":[{"created":"#1484306127.448668047","description":"FD shutdown","file":"../src/core/lib/iomgr/ev_epoll_linux.c","file_line":948}]}
events.js:160
throw er; // Unhandled 'error' event
^
Error
at ClientDuplexStream._emitStatusIfDone (/usr/lib/node_modules/hfc/node_modules/grpc/src/node/src/client.js:189:19)
at ClientDuplexStream._readsDone (/usr/lib/node_modules/hfc/node_modules/grpc/src/node/src/client.js:158:8)
at readCallback (/usr/lib/node_modules/hfc/node_modules/grpc/src/node/src/client.js:217:12)
E0113 12:15:27.563487641 5483 handshake.c:128] Security handshake failed: {"created":"#1484306127.563437122","description":"Handshake read failed","file":"../src/core/lib/security/transport/handshake.c","file_line":237,"referenced_errors":[{"created":"#1484306127.563429661","description":"FD shutdown","file":"../src/core/lib/iomgr/ev_epoll_linux.c","file_line":948}]}
This code worked yesterday, so I don't know what could be happening.
Does anybody know how can I fix it?
Thanks,
Javier.
ibm-bluemix
blockchain
These types of intermittent issues are usually related to GRPC. An initial suggestion is to ensure that you are using at least GRPC version 1.0.0.
If you are using a Mac, then the maximum number of open file descriptors should be checked (using ulimit -n). Sometimes this is initially set to a low value such as 256, so increasing the value could help.
There are a couple of GRPC issues with similar symptoms.
https://github.com/grpc/grpc/issues/8732
https://github.com/grpc/grpc/issues/8839
https://github.com/grpc/grpc/issues/8382
There is a grpc.initial_reconnect_backoff_ms property that is mentioned in some of these issues. Increasing the value past the 1000 ms level might help reduce the frequency of issues. Below are instructions for how the helloblockchain.js file can be modified to set this property to a higher value.
Open the helloblockchain.js file in the Hyperledger Fabric Client example and find the enrollAndRegisterUsers function.
Add “grpc.initial_reconnect_backoff_ms": 5000 to the setMemberServicesUrl call.
chain.setMemberServicesUrl(ca_url, {
pem: cert, "grpc.initial_reconnect_backoff_ms": 5000
});
Add “grpc.initial_reconnect_backoff_ms": 5000 to the addPeer call.
chain.addPeer("grpcs://" + peers[i].discovery_host + ":" + peers[i].discovery_port,
{pem: cert, "grpc.initial_reconnect_backoff_ms": 5000
});
Note that setting the grpc.initial_reconnect_backoff_ms property may reduce the frequency of issues, but it will not necessarily eliminate all issues.
The connection to the eventhub that is made in the helloblockchain.js file can also be a factor. There is an earlier version of the Hyperledger Fabric Client that does not utilize the eventhub. This earlier version could be tried to determine if this makes a difference. After running git clone https://github.com/IBM-Blockchain/SDK-Demo.git, run git checkout b7d5195 to use this prior level. Before running node helloblockchain.js from a Node.js command window, the git status command can be used to check the code level that is being used.

Google Cloud SQL + Hikari CP + Communications link failure

I'm experiencing intermittent connectivity errors from a Spring Boot application communicating with a D1 Google CloudSQL Server with the configuration settings described here HikariCP MySQL settings
I was wondering if anyone has encountered this before.
I've read the FAQ posted here Hikari FAQ and I'm wondering if my default idleTimeout and maxLifeTime (30 mins) settings might be at fault; wait_timeout and interactive_timeout on the server are both set to default 28800s (8 hours).
The FAQ says that these two settings should be about a minute less that the server settings, but if I'm losing connections after 30 minutes I can't quite see how upping the maxLifeTime to 7hrs 59mins is going to improve the situation.
Does anyone have any recommendations?
Redacted stack trace(s):
Get these from time to time
org.springframework.security.authentication.InternalAuthenticationServiceException: Could not get JDBC Connection; nested exception is java.sql.SQLException: Timeout after 30018ms of waiting for a connection.
at org.springframework.security.authentication.dao.DaoAuthenticationProvider.retrieveUser(DaoAuthenticationProvider.java:110)
at org.springframework.security.authentication.dao.AbstractUserDetailsAuthenticationProvider.authenticate(AbstractUserDetailsAuthenticationProvider.java:132)
at org.springframework.security.authentication.ProviderManager.authenticate(ProviderManager.java:156)
at org.springframework.security.authentication.ProviderManager.authenticate(ProviderManager.java:177)
...
Caused by: org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: Timeout after 30023ms of waiting for a connection.
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:80)
....
Caused by: java.sql.SQLException: Timeout after 30023ms of waiting for a connection.
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:208)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:108)
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)
... 59 common frames omitted
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:630)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:695)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:727)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:737)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:787)
Hibernate search:
2015-02-17 10:34:17.090 INFO 1 --- [ entityloader-2] o.h.s.i.SimpleIndexingProgressMonitor : HSEARCH000030: 31050 documents indexed in 1147865 ms
2015-02-17 10:34:17.090 INFO 1 --- [ entityloader-2] o.h.s.i.SimpleIndexingProgressMonitor : HSEARCH000031: Indexing speed: 27.050219 documents/second; progress: 99.89%
2015-02-17 10:41:59.917 WARN 1 --- [ntifierloader-1] com.zaxxer.hikari.proxy.ConnectionProxy : Connection com.mysql.jdbc.JDBC4Connection#372f2018 (HikariPool-0) marked as broken because of SQLSTATE(08S01), ErrorCode(0).
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 1,611,087 milliseconds ago. The last packet sent successfully to the server was 927,899 milliseconds ago.
The indexing isn't particularly quick at the moment I think because I'm not using projections. The process takes about 30 minutes to execute.
Thanks
It could be a couple of things here. First, the network infrastructure (firewalls, load-balancers, etc.) between the application tier and the database tier can impose their own connection timeouts, regardless of MySql settings.
The indexing failure indicates that the connection was out of the pool for ~27 minutes with no SQL activity when that failure occurred.
Second, specifically regarding the "Could not get JDBC Connection" error, you may be running into Cloud SQL connection limits.
I recommend three things. One, make sure you are on the latest HikariCP (2.3.2) and latest MySql Connector/J driver (5.1.34). Two, enable DEBUG-level logging for the com.zaxxer.hikari package. HikariCP debug logging is not "chatty", but will log pool statistics every 30 seconds (and sometimes more detail in failure conditions). Lastly, try setting the maxPoolSize to something smaller (unless already at the default), and setting maxLifeTime to 15 or 20 minutes (1200000ms).
If the error occurs again, post updated logs containing the HikariCP debug logs around the time of failure. Also, feel free to open a tracking issue over on Github as larger logs etc. are easier there.

Handling connection failures in apache-camel

I am writing an apache-camel RabbitMQ consumer. I would like to react somehow to connection problems (i.e. try to reconnect). Is it possible to configure apache-camel to automatically reconnect?
If not, how can I find out that a connection to the queue was interrupted? I've done the following test:
start the queue (and some producer)
start my consumer (it was getting messages as expected)
stop the queue (the messages stopped arriving, as expected, but no exception was thrown)
start the queue (no new messages were received)
I am using camel in Scala (via akka-camel), but a Java solution would be probably also OK
You can pass in the flag automaticRecoveryEnabled=true to the URI, Camel will reconnect if the connection is lost.
For automatic RabbitMQ resource recovery (Connections/Channels/Consumers/Queues/Exchanages/Bindings) when failures occur, check out Lyra (which I authored). Example usage:
Config config = new Config()
.withRecoveryPolicy(new RecoveryPolicy()
.withMaxAttempts(20)
.withInterval(Duration.seconds(1))
.withMaxDuration(Duration.minutes(5)));
ConnectionOptions options = new ConnectionOptions().withHost("localhost");
Connection connection = Connections.create(options, config);
The rest of the API is just the amqp-client API, except your resources are automatically recovered when failures occur.
I'm not sure about camel-rabbitmq specifically, but hopefully there's a way you can swap in your own resource creation via Lyra.
Current camel-rabbitmq just create a connection and the channel when the consumer or producer is started. So it don't have a chance to catch the connection exception :(.

How do I fix server Oops error when deploying Play Framework 2.0 application as a Windows service?

I was following the processes in this very helpful post:
How do I run a Play Framework 2.0 application as a Windows service?
when I ran into trouble in Step 9. When I execute runConsole.bat, the service cycles between the running and restarting states. The full log is here:
wrapper.log
but what jumps out at me in the log are the following:
INFO|7268/0|play.core.server.NettyServer|13-12-28 13:07:28|Oops, cannot start the server.
INFO|7268/0|play.core.server.NettyServer|13-12-28 13:07:28|Configuration error: Configuration error[Cannot connect to database [default]]
...
INFO|7268/0|play.core.server.NettyServer|13-12-28 13:07:28|Caused by: org.h2.jdbc.JdbcSQLException: Database may be already in use: "Locked by another process". Possible solutions: close all other connection(s); use the server mode [90020-168]
...
INFO|wrapper|play.core.server.NettyServer|13-12-28 13:07:28|restart process due to default exit code rule
INFO|wrapper|play.core.server.NettyServer|13-12-28 13:07:28|restart internal RUNNING
INFO|wrapper|play.core.server.NettyServer|13-12-28 13:07:28|stopping process with pid/timeout 7268 45000
INFO|wrapper|play.core.server.NettyServer|13-12-28 13:07:30|process exit code: -1
...
INFO|7812/1|play.core.server.NettyServer|13-12-28 13:07:45|[error] c.j.b.h.AbstractConnectionHook - Failed to obtain initial connection Sleeping for 0ms and trying again. Attempts left: 0. Exception: null
INFO|7812/1|play.core.server.NettyServer|13-12-28 13:07:45|Oops, cannot start the server.
INFO|7812/1|play.core.server.NettyServer|13-12-28 13:07:45|Configuration error: Configuration error[Cannot connect to database [default]]
repeats several times...
Half way through writing this post, I realized that before step 9., you should terminate the start.bat script which you start up in step 6. When I did this, runConsole.bat would execute normally without all the errors I list above.