Vert.x - threads are stuck while sending response back to client

I'm using Vert.x 4.2.6 to build a proxy service. It takes requests from clients (for example browsers, standalone apps, etc.), invokes a single third-party server, gets the response and sends that same response back to the client that initiated the request.
In this process I'm using a shared WebClient across multiple requests. I get the response from the third party quickly (mostly within milliseconds), but sometimes the response is never returned to the client and gets stuck at ctx.response().end(...).
Whenever I restart my proxy server, it serves requests without any issues for a while. As time goes on (say, by end of day), clients start getting a 503 Service Unavailable error for new requests. I'm using one MainVerticle with 10 instances. I'm not using any worker threads.
Below is the pseudo code:
MainVerticle
DeploymentOptions depOptions = new DeploymentOptions();
depOptions.setConfig(config);
depOptions.setInstances(10);
vertx.deployVerticle(MainVerticle.class.getName(), depOptions);
.....
router.route("/api/v1/*")
.handler(new HttpRequestHandler(vertx));
HttpRequestHandler
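import io.vertx.core.Handler;
import io.vertx.core.Vertx;
import io.vertx.ext.web.RoutingContext;
import io.vertx.ext.web.client.WebClient;
import io.vertx.ext.web.client.WebClientOptions;
import org.apache.commons.lang3.time.StopWatch;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
public class HttpRequestHandler implements Handler<RoutingContext> {
private static final Logger LOGGER = LogManager.getLogger(HttpRequestHandler.class);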
private WebClient webClient;
public HttpRequestHandler(Vertx vertx) {
// Handler is an interface, so there is no superclass constructor to call with super(vertx).
this.webClient = createWebClient(vertx);
}
private WebClient createWebClient(Vertx vertx) {
WebClientOptions options=new WebClientOptions();
options.setConnectTimeout(30000);
WebClient webClient = WebClient.create(vertx,options);
return webClient;
}
@Override
public void handle(RoutingContext ctx) {
ctx.request().bodyHandler(bh -> {
ctx.request().headers().remove("Host");
StopWatch sw=StopWatch.createStarted();
LOGGER.info("invoking CL end point with the given request details...");
/*
* Invoking actual target
*/
webClient.request(ctx.request().method(),target_port,target_host, "someURL")
.timeout(5000)
.sendBuffer(bh)
.onSuccess(clResponse -> {
LOGGER.info("CL response statuscode: {}, headers: {}",clResponse.statusCode(),clResponse.headers());
LOGGER.trace("response body from CL: {}",clResponse.body());
sw.stop();
LOGGER.info("Timetaken: {}ms",sw.getTime()); //prints in milliseconds
LOGGER.info("sending response back to client...."); //stuck here
/*
* prepare the final response and return to client..
*/
ctx.response().setStatusCode(clResponse.statusCode());
ctx.response().headers().addAll(clResponse.headers());
if(clResponse.body()!=null) {
ctx.response().end(clResponse.body());
}else {
ctx.response().end();
}
LOGGER.info("response SENT back to client...!!"); //not getting this log for certain requests and gives 503 - service unavailable to clients after 5 seconds..
}).onFailure(err -> {
LOGGER.error("Failed while invoking CL server:",err);
sw.stop();
if(err.getCause() instanceof java.net.ConnectException) {
connectionRefused(ctx);
}else {
invalidResponse(ctx);
}
});
});
}
// connectionRefused(ctx) and invalidResponse(ctx) are helper methods not shown in the post.
}
I suspect the issue might be caused by the shared WebClient, but I'm not sure. I'm new to Vert.x and have no clue what's going wrong. Please suggest whether there are any options I can set on WebClientOptions to avoid this issue.
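The following is one hypothesis worth checking. It is an assumption, not a confirmed diagnosis. The handler copies every upstream header with ctx.response().headers().addAll(clResponse.headers()). Hop-by-hop headers such as Transfer-Encoding or Connection describe the connection to the third-party server, not the one to the client. If they are forwarded verbatim while the proxy's own server frames the body, the client can receive a response it cannot parse. An example is a copied `Transfer-Encoding: chunked` sitting next to the Content-Length that end(buffer) may set. The plain-Java sketch below filters such headers from a name-to-values map. The class and method names are hypothetical, and the header list follows the classic hop-by-hop list (RFC 2616, section 13.5.1) plus any header named in Connection (RFC 7230, section 6.1).

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class ProxyHeaders {
    // Classic hop-by-hop headers: they describe a single connection and
    // should not be relayed as-is by a proxy.
    private static final Set<String> HOP_BY_HOP = Set.of(
            "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
            "te", "trailer", "transfer-encoding", "upgrade");

    // Returns the upstream headers minus hop-by-hop headers, minus any header the
    // upstream Connection header names, and minus Content-Length, so that the
    // proxy's own server frames the body it actually writes.
    static Map<String, List<String>> forwardable(Map<String, List<String>> upstream) {
        Set<String> dropped = new HashSet<>(HOP_BY_HOP);
        dropped.add("content-length");
        upstream.forEach((name, values) -> {
            if (name.equalsIgnoreCase("connection")) {
                for (String value : values) {
                    for (String token : value.split(",")) {
                        dropped.add(token.trim().toLowerCase(Locale.ROOT));
                    }
                }
            }
        });
        Map<String, List<String>> result = new LinkedHashMap<>();
        upstream.forEach((name, values) -> {
            if (!dropped.contains(name.toLowerCase(Locale.ROOT))) {
                result.put(name, List.copyOf(values));
            }
        });
        return result;
    }
}
```

In the Vert.x handler, the same rule could be applied by iterating clResponse.headers().names() and copying clResponse.headers().getAll(name) for the surviving names, instead of calling addAll.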

Related

Vertx event bus slow consuming issue

We have a non clustered vertx application, and we use the event bus to internally communicate between verticles.
Verticle A consumes from the bus, performs a HTTP request, and sends the response back through the bus.
Verticle B just requests that the HTTP request be performed.
The problem appears when Verticle B sends a "high" volume of requests. The consumer then starts receiving the events more and more slowly (presumably because they are queued in the event bus). At 8 requests/second, the bus takes up to 3-4 seconds to deliver an event to the consumer. At higher request rates it can take more than 30 seconds, so the bus timeout is triggered.
The thing is, Verticle A performs the HTTP operation really fast (~200ms), so I don't understand why the requests get stuck in the bus.
We've tried many solutions, but none of them worked:
- Deploy multiple instances of Verticle A as workers
- Use vertx.executeBlocking() to perform the HTTP request
The only thing that worked was commenting the HTTP request and returning a mock object through the bus. But again, the HTTP request doesn't take more than 200ms, so it shouldn't be blocking the bus.
Additional information: We use an autogenerated rest client that uses Retrofit + OkHttpClient. Due to company policy, we cannot use Vertx WebClient, so I didn't try this solution.
EXAMPLE
This is a really simplified version of our code so you can check if I'm missing something.
VERTICLE A
// Instantiated in Verticle A
public class EmailSender {
private final Vertx vertx;
private final EmailApiClient emailApiClient;
public EmailSender(Vertx vertx) {
this.vertx = vertx;
emailApiClient = ClientFactory.createEmailApiClient();
}
public void start() {
vertx.eventBus().consumer("sendEmail", this::sendEmail);
}
public void sendEmail(Message<EmailRequest> message) {
EmailRequest emailRequest = message.body();
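emailApiClient.sendEmail(emailRequest).subscribe(
response -> {
if (response.code() == 200) {
EmailResponse emailResponse = response.body();
message.reply(emailResponse);
} else {
message.fail(500, "Error sending email");
}
},
// Without an error consumer, a failed call never answers the bus request.
error -> message.fail(500, String.valueOf(error)));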
}
}
VERTICLE B
// Instantiated in Verticle B
public class EmailCommunications {
private final Vertx vertx;
public EmailCommunications(Vertx vertx) {
this.vertx = vertx;
}
public Single<EmailResponse> sendEmail(EmailRequest emailRequest) {
SingleSubject<EmailResponse> emailSent = SingleSubject.create();
vertx.eventBus().<EmailResponse>request(
"sendEmail",
emailRequest,
busResult -> {
if (busResult.succeeded()) {
emailSent.onSuccess(busResult.result().body());
} else {
emailSent.onError(busResult.cause());
}
}
);
return emailSent;
}
}
We fixed the issue by changing our OkHttpClient configuration so that HTTP requests no longer get stuck:
default void configureOkHttpClient(OkHttpClient.Builder okHttpClientBuilder) {
ConnectionPool connectionPool = new ConnectionPool(40, 5, TimeUnit.MINUTES);
Dispatcher dispatcher = new Dispatcher();
dispatcher.setMaxRequestsPerHost(200);
dispatcher.setMaxRequests(200);
okHttpClientBuilder
.readTimeout(60, TimeUnit.SECONDS)
.retryOnConnectionFailure(true)
.connectionPool(connectionPool)
.dispatcher(dispatcher);
}
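Here is a rough, back-of-the-envelope illustration of why raising the Dispatcher limits can help. By default, OkHttp's Dispatcher runs at most 5 concurrent requests per host (and 64 overall), and further calls wait in a queue. The sketch makes several simplifying assumptions: a burst of requests to one host arrives at once, every request takes the same time, and nothing else limits throughput. Under those assumptions, requests complete in waves of perHostLimit. The class and method names are hypothetical. This models the behaviour and does not use OkHttp itself.

```java
class PerHostQueueModel {
    private static void check(int perHostLimit, long serviceMs) {
        if (perHostLimit <= 0 || serviceMs < 0) {
            throw new IllegalArgumentException("perHostLimit must be > 0 and serviceMs >= 0");
        }
    }

    // `burst` requests to one host arrive together, each takes `serviceMs`, and at most
    // `perHostLimit` run at the same time; the rest wait in a queue. Requests finish in
    // waves, so the last one completes after ceil(burst / perHostLimit) * serviceMs.
    static long lastCompletionMs(int burst, int perHostLimit, long serviceMs) {
        check(perHostLimit, serviceMs);
        if (burst <= 0) {
            return 0;
        }
        long waves = ((long) burst + perHostLimit - 1) / perHostLimit; // integer ceiling
        return waves * serviceMs;
    }

    // Under the same assumptions, counts how many requests finish after a caller-side
    // timeout of `timeoutMs` (for example an event-bus request timeout).
    static int timedOut(int burst, int perHostLimit, long serviceMs, long timeoutMs) {
        check(perHostLimit, serviceMs);
        int late = 0;
        for (int i = 0; i < burst; i++) {
            long finishesAt = ((long) i / perHostLimit + 1) * serviceMs; // end of request i's wave
            if (finishesAt > timeoutMs) {
                late++;
            }
        }
        return late;
    }
}
```

With 200 ms calls, a burst of 100 requests needs about 4 s to drain with a per-host limit of 5, but about 200 ms with the limit of 200 configured above. Real traffic is rarely this uniform, so treat the numbers as an order of magnitude only.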

Why does my Spring WebFlux controller return data on first request only?

I am working on a web application where the user's connection times out after a specific time (say 20 seconds). For long running requests I have to return a default message ("your request is under process") and then send an email to the user with the actual result.
I couldn't do this with Spring Web, because I didn't know how to specify a timeout in the controller (with a customized message per request) while still letting other requests come through and be processed. That's why I used Spring WebFlux, which has a timeout operator for both the Mono and Flux types.
To make the requested process run in a different thread, I used Sinks: one to receive requests and one to publish the results. My problem is that the response sink only returns one result, and subsequent calls to the URL return an empty response. For example, the first call to /reactive/getUser/123456789 returns the user object, but subsequent calls return nothing.
I'm not sure whether the problem is with the Sink I have used or with how I am getting data from it. In the sample code I used responseSink.asFlux().next(), but I have also tried .single(), .toMono() and .take(1), to no avail: I get the same result.
@RequestMapping("/reactive")
@RestController
class SampleController @Autowired constructor(private val externalService: ExternalService) {
private val requestSink = Sinks.many().multicast().onBackpressureBuffer<String>()
private val responseSink = Sinks.many().multicast().onBackpressureBuffer<AppUser>()
init {
requestSink.asFlux()
.map { phoneNumber -> externalService.findByIdOrNull(phoneNumber) }
.doOnNext {
if (it != null) {
responseSink.tryEmitNext(it)
} else {
responseSink.tryEmitError(Throwable("didn't find a value for that phone number"))
}
}
.subscribe()
}
@GetMapping("/getUser/{phoneNumber}")
fun getUser(@PathVariable phoneNumber: String): Mono<String> {
requestSink.tryEmitNext(phoneNumber)
return responseSink.asFlux()
.next()
.map { it.toString() }
.timeout(Duration.ofSeconds(20), Mono.just("processing your request"))
}
}
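The goal described above is to answer with a default message after a deadline, while the work continues and its late result is handed to something like an email sender. The plain-JDK sketch below illustrates that goal in a framework-neutral way. Each request gets its own CompletableFuture instead of every request sharing one response sink. It relies on CompletableFuture.completeOnTimeout (Java 9+). PerRequestTimeout, respond and onLateResult are hypothetical names, and this is not the Reactor/WebFlux API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

class PerRequestTimeout {
    // Runs `work` in the background for one request. The returned future completes with
    // the real result if it arrives within `timeoutMs`, otherwise with `fallback`.
    // A result that arrives after the fallback was sent goes to `onLateResult`.
    static <T> CompletableFuture<T> respond(Supplier<T> work, T fallback, long timeoutMs,
                                            Executor executor, Consumer<T> onLateResult) {
        CompletableFuture<T> reply = new CompletableFuture<>();
        // Timer-driven completion with the fallback value.
        reply.completeOnTimeout(fallback, timeoutMs, TimeUnit.MILLISECONDS);
        CompletableFuture.supplyAsync(work, executor).whenComplete((result, error) -> {
            if (error != null) {
                // Ignored if the fallback was already sent.
                reply.completeExceptionally(error);
            } else if (!reply.complete(result)) {
                // complete() returns false when the timer won the race.
                onLateResult.accept(result);
            }
        });
        return reply;
    }
}
```

Only whichever of the timer or the worker calls complete() first wins. As a result, the late result is never lost, and one request's result can never be handed to another request's caller.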

Timeout in Vert.x 3.9 WebClient not working as expected

I need to set a request timeout on a downstream backend call. However, the WebClient class in Vert.x 3.9 doesn't seem to work as I expected. Here's some test code for it:
package client;
import io.vertx.reactivex.core.AbstractVerticle;
import io.vertx.reactivex.core.Vertx;
import io.vertx.reactivex.ext.web.client.WebClient;
public class Timeout extends AbstractVerticle {
private static final int port = 8080;
private static final String host = "localhost";
private static final int timeoutMilliseconds = 50;
@Override
public void start() {
WebClient client = WebClient.create(vertx);
for (int i = 0; i < 100; i++) {
client.get(port, host, "/").timeout(timeoutMilliseconds).send(
ar -> {
if (ar.succeeded()) {
System.out.println("Success!");
} else {
System.out.println("Fail: " + ar.cause().getMessage());
}
});
}
vertx.timerStream(1000).handler(aLong -> { vertx.close(); });
}
public static void main(String[] args) {
Vertx vertx = Vertx.vertx();
vertx.deployVerticle(new Timeout());
}
}
I'm running the following Go server on the same host for testing:
package main
import (
"fmt"
"net/http"
)
func main() {
http.HandleFunc("/", HelloServer)
http.ListenAndServe(":8080", nil)
}
func HelloServer(w http.ResponseWriter, r *http.Request) {
fmt.Println("Saying hello!")
fmt.Fprintf(w, "Hello, %s!", r.URL.Path[1:])
}
The output for my test server shows that WebClient opens 5 concurrent connections and every request is stopped by the timeout. What am I doing wrong here? How should I set a connection timeout on the requests? The output from the client is:
Fail: The timeout period of 50ms has been exceeded while executing GET / for server localhost:8080
Fail: The timeout period of 50ms has been exceeded while executing GET / for server localhost:8080
Fail: The timeout period of 50ms has been exceeded while executing GET / for server localhost:8080
Fail: The timeout period of 50ms has been exceeded while executing GET / for server localhost:8080
Fail: The timeout period of 50ms has been exceeded while executing GET / for server localhost:8080
Fail: The timeout period of 50ms has been exceeded while executing GET / for server localhost:8080
Fail: The timeout period of 50ms has been exceeded while executing GET / for server localhost:8080
...
I would expect to only see "Success!" printed, since the Go server running on the same host should respond well within 50ms.
EDIT: Removed the vertx.close() and clarified original question... Didn't actually have the vertx.close() in my original test code, but added it when editing the SO post, so people running it wouldn't need to hit CTRL-C.
It hangs because you are blocking the main thread.
Remove this:
try {
Thread.sleep(1000);
} catch(InterruptedException ex) {
Thread.currentThread().interrupt();
}
vertx.close();
The application will keep running as long as vert.x is alive.
If you really want to close vert.x yourself, do it in a separate thread.
Or alternatively, do it with Vert.x itself:
vertx.timerStream(1000).handler(aLong -> {
vertx.close();
});
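The same idea applies in plain Java, outside Vert.x. Instead of parking the calling thread with Thread.sleep(...), schedule the shutdown and return immediately. This sketch uses CompletableFuture.delayedExecutor (Java 9+) as a stand-in for a Vert.x timer. DelayedShutdown and its methods are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class DelayedShutdown {
    // Blocking variant: the calling thread is parked for the whole delay, which is what
    // Thread.sleep(...) does to the thread running the verticle's start() in the question.
    static void closeAfterSleep(Runnable close, long delayMs) throws InterruptedException {
        Thread.sleep(delayMs);
        close.run();
    }

    // Non-blocking variant: the delay is handed to a timer and the method returns at once,
    // analogous to vertx.timerStream(1000).handler(...) in the answer above.
    static CompletableFuture<Void> closeLater(Runnable close, long delayMs) {
        return CompletableFuture.runAsync(close,
                CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS));
    }
}
```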
Not sure what you are trying to do there, but there are multiple things that are incorrect:
- In AbstractVerticle.start() you should only do start-up logic. If that logic is asynchronous, override the asynchronous variant start(Promise<Void> startPromise) and complete the promise properly, so that Vert.x waits for your start logic to finish.
- You are blocking the start process here:
try {
Thread.sleep(1000);
} catch(InterruptedException ex) {
Thread.currentThread().interrupt();
}
As long as this runs, your verticle is not really started, and the thread Vert.x uses to run start() (an event loop) is blocked.
- Never close Vert.x in a verticle's start()! Remove the vertx.close() line and quit the running application some other way.
In general, check the docs to understand the lifecycle and usage of verticles.

Netty Client Connect with Server, but server does not fire channelActive/Registered

I have the following architecture in use:
- [Client] - The enduser connecting to our service.
- [GameServer] - The game server on which the game is running.
- [GameLobby] - A server that is responsible for matching Clients with a GameServer.
If, for example, 4 Clients want to play a game and get matched by a GameLobby, then the first time all of these connections succeed.
However, when they decide to rematch, one of the Clients does not connect properly.
All Clients connect to the GameServer simultaneously.
Clients that rematch first close their current connection with the GameServer and head back into the lobby.
The new connection appears to succeed, and no errors are thrown. Inspecting the ChannelFuture also suggests the client connection was made properly. The following values show that the client thinks the connection is fine:
- ChannelFuture.isSuccess() = True
- ChannelFuture.isDone() = True
- ChannelFuture.cause() = Null
- ChannelFuture.isCancelled() = False
- Channel.isOpen() = True
- Channel.isActive() = True
- Channel.isRegistered() = True
- Channel.isWritable() = True
So according to the Client, the connection was made properly. However, on the GameServer the channelRegistered/channelActive methods of the SimpleChannelInboundHandler are never called for that specific Client, only for the other 3 Clients.
All the 4 Clients, the GameServer, and the Lobby are running on the same IPAddress.
Since it only happens when (re)connecting to the GameServer, I thought it had to do with the connection not being closed properly. Currently this is done through:
try {
group.shutdownGracefully();
channel.closeFuture().sync();
} catch (InterruptedException e) {
e.printStackTrace();
}
On the GameServer, channelUnregistered is called, so closing works and the connection is destroyed.
I have tried adding listeners to the ChannelFuture of the malfunctioning connection. However, according to the ChannelFuture everything works, which is not the case.
I also tried adding ChannelOptions so that more Clients can be queued on the server.
GameServer
The GameServer server is initialized as follow:
// Create the bootstrap to make this act like a server.
ServerBootstrap serverBootstrap = new ServerBootstrap();
serverBootstrap.group(bossGroup)
.channel(NioServerSocketChannel.class)
.childHandler(new ChannelInitialisation(new ClientInputReader(gameThread)))
.option(ChannelOption.SO_BACKLOG, 1000)
.childOption(ChannelOption.SO_KEEPALIVE, true)
.childOption(ChannelOption.TCP_NODELAY, true);
bossGroup.execute(gameThread); // Executing the thread that handles all games on this GameServer.
// Launch the server with the specific port.
serverBootstrap.bind(port).sync();
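One point worth checking is not discussed in the post. The post does not show what gameThread does. Assuming it is a long-running game loop (an assumption), bossGroup.execute(gameThread) occupies one of bossGroup's event loop threads. Because serverBootstrap.group(bossGroup) uses that same group for accepted channels, any channel assigned to that loop would never get its registration processed. The TCP handshake itself completes in the kernel, so the client would still see an active channel. The plain-JDK simulation below models an event loop group as single-threaded executors chosen round-robin. That is a simplification of Netty's event loop chooser, and BlockedLoopDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BlockedLoopDemo {
    // Models an event loop group as `loops` single-threaded executors and assigns
    // `channels` registrations to them round-robin. Loop 0 is first occupied by a
    // long-running task standing in for bossGroup.execute(gameThread).
    // Returns how many registrations ran within `waitMs` each.
    static int registeredWithin(int loops, int channels, long waitMs) {
        List<ExecutorService> group = new ArrayList<>();
        for (int i = 0; i < loops; i++) {
            group.add(Executors.newSingleThreadExecutor());
        }
        CountDownLatch gameOver = new CountDownLatch(1);
        group.get(0).execute(() -> {
            try {
                gameOver.await(); // the "game loop" never yields its thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        List<Future<?>> registrations = new ArrayList<>();
        for (int c = 0; c < channels; c++) {
            // In Netty, channelRegistered/channelActive would be invoked from a task like this.
            registrations.add(group.get(c % loops).submit(() -> { }));
        }
        int registered = 0;
        try {
            for (Future<?> registration : registrations) {
                try {
                    registration.get(waitMs, TimeUnit.MILLISECONDS);
                    registered++;
                } catch (TimeoutException | ExecutionException e) {
                    // Queued behind the game loop: not registered in time.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            gameOver.countDown(); // let the simulated game loop end
            group.forEach(ExecutorService::shutdown);
        }
        return registered;
    }
}
```

With 4 loops and 4 channels, exactly one channel stays unregistered, which matches the "only the other 3 Clients" symptom. If this turns out to be the cause, running the game loop on its own thread or executor rather than on a Netty event loop would avoid it.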
The GameServer ClientInputReader
@ChannelHandler.Sharable
public class ClientInputReader extends SimpleChannelInboundHandler<Packet> {
private ServerMainThread serverMainThread;
public ClientInputReader(ServerMainThread serverMainThread) {
this.serverMainThread = serverMainThread;
}
@Override
public void channelRegistered(ChannelHandlerContext ctx) throws Exception {
System.out.println("[Connection: " + ctx.channel().id() + "] Channel registered");
super.channelRegistered(ctx);
}
@Override
protected void channelRead0(ChannelHandlerContext ctx, Packet packet) {
// Packet handling
}
}
The malfunctioning connection does not trigger anything in the SimpleChannelInboundHandler, not even exceptionCaught.
The GameServer ChannelInitialisation
public class ChannelInitialisation extends ChannelInitializer<SocketChannel> {
private SimpleChannelInboundHandler channelInputReader;
public ChannelInitialisation(SimpleChannelInboundHandler channelInputReader) {
this.channelInputReader = channelInputReader;
}
@Override
protected void initChannel(SocketChannel ch) throws Exception {
ChannelPipeline pipeline = ch.pipeline();
// every packet is prefixed with the amount of bytes that will follow
pipeline.addLast(new LengthFieldBasedFrameDecoder(Integer.MAX_VALUE, 0, 4, 0, 4));
pipeline.addLast(new LengthFieldPrepender(4));
pipeline.addLast(new PacketEncoder(), new PacketDecoder(), channelInputReader);
}
}
Client
Client creating a GameServer connection:
// Configure the client.
group = new NioEventLoopGroup();
Bootstrap b = new Bootstrap();
b.group(group)
.channel(NioSocketChannel.class)
.option(ChannelOption.TCP_NODELAY, true)
.handler(new ChannelInitialisation(channelHandler));
// Start the client.
channel = b.connect(address, port).await().channel();
/* At this point, the client thinks that the connection was successful, as the channel is active, open, registered and writable... */
ClientInitialisation:
public class ChannelInitialisation extends ChannelInitializer<SocketChannel> {
private SimpleChannelInboundHandler<Packet> channelHandler;
ChannelInitialisation(SimpleChannelInboundHandler<Packet> channelHandler) {
this.channelHandler = channelHandler;
}
@Override
public void initChannel(SocketChannel ch) throws Exception {
// prefix messages by the length
ch.pipeline().addLast(new LengthFieldBasedFrameDecoder(Integer.MAX_VALUE, 0, 4, 0, 4));
ch.pipeline().addLast(new LengthFieldPrepender(4));
// our encoder, decoder and handler
ch.pipeline().addLast(new PacketEncoder(), new PacketDecoder(), channelHandler);
}
}
ClientHandler:
public class ClientPacketHandler extends SimpleChannelInboundHandler<Packet> {
@Override
public void channelActive(ChannelHandlerContext ctx) throws Exception {
super.channelActive(ctx);
System.out.println("Channel active: " + ctx.channel().id());
ctx.channel().writeAndFlush(new PacketSetupClientToGameServer());
System.out.println("Sending setup packet to the GameServer: " + ctx.channel().id());
// This is successfully called, as the client thinks the connection was properly made.
}
@Override
protected void channelRead0(ChannelHandlerContext ctx, Packet packet) {
// Reading packets.
}
}
I expect the Client to connect properly to the server, since the other Clients connect properly and this Client could previously connect just fine.
TL;DR: When multiple Clients try to create a new match, one (possibly more) of them may fail to connect properly to the server after the previous connection was closed.
For anyone who struggles with this issue in one way or another:
I made a workaround that lets me continue, even though there is still a bug inside the Netty framework (as far as I am concerned). The workaround is quite simple: create a connection pool.
My solution uses a maximum of five connections in the connection pool. If one of the connections gets no reply from the GameServer, that is not a big deal, since the four others have a high chance of succeeding. I know this is a bad workaround, but I could not find any information on this issue. It works and adds a maximum delay of only 5 seconds (each retry takes a second).
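The workaround is described only in prose. Below is a minimal sketch of its retry part, built on one assumption: each attempt opens a fresh connection and completes its future once the GameServer replies. The attempt function, RetryingConnector and firstResponsive are hypothetical names, not Netty API. Each attempt gets a fixed time budget, and the first one that answers wins. Five attempts of one second each give the 5-second worst case mentioned in the answer.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.function.IntFunction;

class RetryingConnector {
    // Calls attempt.apply(0), attempt.apply(1), ... up to maxAttempts times, giving each
    // returned future `perAttemptMs` to complete, and returns the first successful result.
    // Abandoned attempts are not closed here; the caller must close those connections.
    static <T> T firstResponsive(IntFunction<CompletableFuture<T>> attempt,
                                 int maxAttempts, long perAttemptMs) {
        RuntimeException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return attempt.apply(i).orTimeout(perAttemptMs, TimeUnit.MILLISECONDS).join();
            } catch (CompletionException e) {
                last = e; // timed out or failed: try the next connection
            }
        }
        throw new IllegalStateException("no connection answered after " + maxAttempts + " attempts", last);
    }
}
```

In Netty terms, an attempt could complete its future from channelRead0 when the server's first reply arrives.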

GWT RequestFactory : Send a request in ClosingHandler

I'm developing an audit service for a GWT-based application with the RequestFactory framework. I'm having trouble auditing the user logout using a ClosingHandler. Here's my code:
A summary of my audit service:
private static final int MAX_CACHE_SIZE = 15;
private int cacheSize = 0;
private AuditServiceRequestContext context;
@Override
public void audit(String event, String details) {
if (context == null)
context = createContext();
AuditServiceRequestContext cxt = createContext();
context.append(cxt);
AuditProxy proxy = cxt.create(AuditProxy.class);
/* initialize the proxy with event and details */
cxt.persist(proxy);
if (++cacheSize >= MAX_CACHE_SIZE)
flush();
}
public void flush() {
context.fire();
cacheSize = 0;
context = null;
}
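Here is the batching logic above, reduced to plain Java so it can run outside GWT. AuditBuffer and the sender callback are hypothetical. In the original, context.fire() plays the callback's role. There is one difference. Here, flush() does nothing when the buffer is empty. The original flush() calls context.fire() unconditionally, so it would throw a NullPointerException if the logout event happened to be the one that triggered the automatic flush (which sets context to null) just before audit.flush() is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class AuditBuffer {
    private final int maxSize;                    // plays the role of MAX_CACHE_SIZE
    private final Consumer<List<String>> sender;  // hypothetical stand-in for context.fire()
    private final List<String> pending = new ArrayList<>();

    AuditBuffer(int maxSize, Consumer<List<String>> sender) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        this.maxSize = maxSize;
        this.sender = sender;
    }

    // Queues one audit entry and sends the whole batch once it reaches maxSize.
    void audit(String event, String details) {
        pending.add(event + ": " + details);
        if (pending.size() >= maxSize) {
            flush();
        }
    }

    // Sends whatever is queued; calling it with nothing queued is a no-op.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sender.accept(List.copyOf(pending));
        pending.clear();
    }
}
```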
How I currently handle the log out event:
Window.addWindowClosingHandler(new ClosingHandler() {
@Override
public void onWindowClosing(ClosingEvent event) {
audit.audit("logout", "the user has closed the app");
audit.flush();
}
});
The data is persisted, but the request fails because the HTTP request to /gwtRequest doesn't return any response (its status is "canceled" in Chrome's developer tools).
Any idea how to solve this issue?
EDIT:
Strangely, there is no error when using a CloseHandler with Window#addCloseHandler(CloseHandler). I don't understand why, but it works (if someone can explain it to me, I'd really appreciate it) :D
When you're navigating away from the page, the browser cancels ongoing requests. Because you make yours at window closing, you cannot even be sure the request was sent over the wire and reached your server. There's no workaround.
One possibility, though it is likely to fail too, is to open a new window in which you can safely make the requests, and then close that window when you're done. Such windows, however, are likely to be blocked by browsers' popup blockers (built-in or add-ons).