We have an existing play server app to which mobile clients talk via web sockets (two way communication). Now as part of load testing we need to simulate hundreds of client requests to the server.
I was thinking to write a separate play client (faceless) app and somehow in a loop make 100s of requests to a server app? Given that I am new to web sockets, does this approach sound reasonable?
Also what is the best way to write a faceless web socket client that makes web socket requests to a web socket server?
If you want to properly validate the performance of your application, it is very important to :
simulate the behavior of real users by simulating real "websocket" connections
- reproduce a realistic end-user journey on the application utilizing the websocket channel
It's important to generate the proper user workflow ( actions done by a user when receiving a websocket message). For example in a betting application users interact with the applicaiton depending on the messages received by the browser.
To be able to generate a realistic load test, I would recommend to use a real loadtesting software supporting Websocket. It will allow you to generate Different kind of users, with different kind of network, different kind of browsers....etc
What is the framework use by your applicaiton? Depending on the framework i could recommend the proper tool for you need.
You have to make a difference between hundreds of clients and hundreds of requests from the same client.
When you have hundreds of clients, the requests can come in at the same time.
When you only have one client, requests will mostly come in sequentially (depending on using one or multiple threads).
When you only have one client, you can perfectly send requests using a loop. What you will actually measure here is the processing latency of the server.
When you want to simulate multiple clients, this is a bit more difficult. If you simulate them from one machine, the requests are pipelined through the network card and hence the requests are not really send in parallel. Also, you are limited by the bandwidth of the machine. Suppose the server has a 1Gb connection and your test machine has a 1Gb connection, then you can never overload the bandwidth of the server. If your clients are supposed to have a limited bandwidth like 50Mb, then you can run 20 clients (not taking into account the serialisation that happens through the network card).
In theory, you should use as many machines as the number of clients you want to test. In reality, you would use a number of machines each running a limited number of clients.
Regarding a headless test application, you could use a headless browser testing framework like PhantomJS.
I have written a simple websocket client using Node.js.
If the server is open and ready to accept the request then you can fire the requests as written below,
const WebSocket = require('ws')
const url = ws://localhost:9000/ws
const connection = new WebSocket(url)
connection.onopen = () => {
for (var i=0;i<100;i++) {
connection.send('hello')
}
}
connection.onmessage = (event) => {
console.log(event.data)
}
connection.onerror = (error) => {
console.log(`WebSocket error: ${error}`)
}
Related
Background
We are writing a Messenger-like app. We have setup Websockets to Inbox and Chat.
Question
My question is simple. What are the advantages and disadvantages when sending data from Client to Server using REST instead of Websockets? (I am not interested in updates now.)
We know that REST has higher overhead in terms of message sizes and that WS is duplex (thus open all time). What about the other things we didn't keep in mind?
Here's a summary of the tradeoffs I'm aware of.
Reasons to use webSocket:
You need/want server-push of data.
You are sending lots of small pieces of data from client to server and doing it very regularly. Using webSocket has significantly less overhead per transmission.
Reasons to use REST:
You want to use server-side frameworks or modules that are built for REST, not for webSocket (such as auth, rate limiting, security, streaming, etc...).
You aren't sending data very often from client to server and thus the server-side burden of keeping a webSocket connection open all the time may lessen your server scalability.
You want your client to run in places where a long-connected webSocket during inactive periods of time may not be practical (perhaps mobile).
You want your client to run in old browsers that don't support webSocket.
You want the browser to enforce same-origin restrictions (those are enforced for REST Ajax calls, but not for webSocket connections).
You don't want to have to write code that detects when the webSocket connection has died and then auto-reconnects and handles back-offs and handles mobile issues with battery usage issues, etc...
You need to run in situations where there are proxies or other network infrastructure that may not support long running webSocket connections.
If you want request/response built in. REST is request/response. WebSocket is not - it's message based. Responses from a webSocket are done by sending a messge back. That message back is not, by itself, a response to any specific request, it's just data being sent back. If you want request/response with webSocket, then you have to build some infrastructure yourself where you tag an id into a request and the response for that particular request contains that specific id. Otherwise, if there are every multiple requests in flight at the same time, then you don't know which response belongs with which request because all the data is being sent over the same connection and you would have no way of matching response with request.
If you want other clients to be able to carry out this operation via an Ajax call.
So, if you already have a webSocket implementation, don't have any problem with it that are lessened with REST and aren't interested in any of the reasons that REST might be better, then stick with your webSocket implementation.
Related references:
websocket vs rest API for real time data?
Ajax vs Socket.io
Adding comments per your request:
It sounds like you're expecting someone to tell you the "right" way to do it. There are reasons to pick one way over the other. If none of those reason compel you one way vs. the other, then it's just an architectural choice and you must take in the whole context of what you are doing and decide which architectural choice makes more sense to you. If you already have the reliably established webSocket connection and none of the advantages of REST apply to your situation then you can optimize for "efficiency" and send your data to the server over the webSocket connection.
On the other hand, if you wanted there to be a simple API on your server that could be reached with an Ajax call from other clients, then you'd want your server to support this operation via REST so it would simplest for these other clients to carry out this one operation. So, it all depends upon which direction your requirements drive you and, if there is no particular driving reason to go one way or the other, you just make an architectural choice yourself.
I'm trying to build instant messaging app. Clients will not only send messages but also often send audios. And I've decided to use websocket connection to communicate with clients. It is fast and allows to send binary data.
The main idea is to receive from client1 message and notify about it client2. But here's the thing. My app will be running on GAE. And what if client1's socket is opened on server1 and client2's is opened on server2. This servers don't know about each others clients.
I have one idea how to solve it, but I am sure it is shitty way. I am going to use some sort of communication between servers(for example JMS or open another websocket connection between servers, doesn't matter right now).
But it surely will lead to a disaster. I can't even imagine how often those servers will speak to each other. For each message server1 should notify server2, server2 should notify client2. But things become even worse when serverN comes into play.
Another way I see this to work is Firebase. But it restricts message size to 4KB. So I can't send audios via it. As a solution I can notify client about new audio and he goes to my server for it.
Hope I clearly explained the problem. Does anyone know how to solve it? Or maybe there are another ways to build such apps?
If you are building a messaging cluster and expect communicating clients to connect to different instances of the server then server-server communication is inevitable. Usually it's not a problem though.
First, if you don't use any load balancing your clients will connect to the same server 50% of time on average (in case of 2 servers).
Second, intra-datacenter links are fast and free in all known public clouds.
Third, you can often do something smart on the frontend to make sure two likely to communicate clients connect to the same server. For instance direct all clients from the same country to the same server using DNS load balancing.
The second part of the question is about passing large media files. It's a common best practice to send it out of band - store on the server and only pass the reference to it. Like someone suggested in the comment, save the audio on the server and just send a message like "audio is available, fetch it from here ...". You don't need to poll the server for that. Just fetch it once when the receiving client requests it.
In general, it seems like you are trying to reinvent the wheel. Just use something off the shelf.
Let all client get connected to multiple servers and each server keeps this metadata
A centralized system like zookeeper stores active servers details
When a client c1 sends a message to client c2:
the message is received by a server (say s1, we can add a load balancer to distribute incoming requests)
s1 will broadcast this information to all other servers to get which server the client c2 is connected to OR a better approach to use consistent hashing which decides which server the client can connect to & in this approach message broadcast is not required
the corresponding server responses to server s1 (say s2)
now s1 sends the message m to s2 and server s2 to client c2
Cons of the above approach:
Each server will have a connection with the n-1 servers, creating a mesh topology
Centralized system (zookeeper) becomes a single point of failures (which is solvable)
Apps like Whatsapp, G-Talk uses XMPP and TCP/IP.
I am currently working on a project that requires the client requesting a big job and sending it to the server. Then the server divides up the job and responds with an array of urls for the client to make a GET call on and stream back the data. I am the greenhorn on the project and I am currently using Spring websockets to improve efficiency. Instead of the clients constantly pinging the server to see if it has results ready to stream back, the websocket will now just directly contact the client hooray!
Would it be a bad idea to have websockets manage the whole process from end to end? I am using STOMP with Spring websockets, will there still be major issues with ditching REST?
With RESTful HTTP you have a stateless request/response system where the client sends request and server returns the response.
With webSockets you have a stateful (or potentially stateful) message passing system where messages can be sent either way and sending a message has a lower overhead than with a RESTful HTTP request/response.
The two are fairly different structures with different strengths.
The primary advantages of a connected webSocket are:
Two way communication. So, the server can notify the client of anything at any time. So, instead of polling a server on some regular interval to see if there is something new, a client can establish a webSocket and just listen for any messages coming from the server. From the server's point of view, when an event of interest for a client occurs, the server simply sends a message to the client. The server cannot do this with plain HTTP.
Lower overhead per message. If you anticipate a lot of traffic flowing between client and server, then there's a lower overhead per message with a webSocket. This is because the TCP connection is already established and you just have to send a message on an already open socket. With an HTTP REST request, you have to first establish a TCP connection which is several back and forths between client and server. Then, you send HTTP request, receive the response and close the TCP connection. The HTTP request will necessarily include some overhead such as all cookies that are aligned with that server even if those are not relevant to the particular request. HTTP/2 (newest HTTP spec) allows for some additional efficiency in this regard if it is being used by both client and server because a single TCP connection can be used for more than just a single request/response. If you charted all the requests/responses going on at the TCP level just to make an https REST request/response, you'd be surpised how much is going on compared to just sending a message over an already established webSocket.
Higher Scale in some circumstances. With lower overhead per message and no client polling to find out if something is new, this can lead to added scalability (higher number of clients a given server can serve). There are downsides to the webSocket scalability too (see below).
Stateful connections. Without resorting to cookies and session IDs, you can directly store state in your program for a given connection. While a lot of development has been done with stateless connections to solve most problems, sometimes it's just simpler with stateful connections.
The primary advantages of a RESTful HTTP request/response are:
Universal support. It's hard to get more universally supported than HTTP. While webSockets enjoy relatively good support now, there are still some circumstances where webSocket support isn't regularly available.
Compatible with more server environments. There are server environments that don't allow long running server processes (some shared hosting situations). These environments can support HTTP request, but can't support long running webSocket connections.
Higher Scale in some circumstances. The webSocket requirement for a continuously connected TCP socket adds some new scale requirements to the server infrastructure that HTTP requests don't demand. So, this ends up being a tradeoff space. If the advantages of webSockets aren't really needed or being used in a significant way, then HTTP requests might actually scale better. It definitely depends upon the specific usage profile.
For a one-off request/response, a single HTTP request is more efficient than establishing a webSocket, using it and then closing it. This is because opening a webSocket starts with an HTTP request/response and then after both sides have agreed to upgrade to a webSocket connection, the actual webSocket message can be sent.
Stateless. If your job is not made more complicated by having a stateless infrastruture, then a stateless world can make scaling or fail-over much easier (just add or remove server processes behind a load balancer).
Automatically Cacheable. With the right server settings, http responses can be cached by browser or by proxies. There is no such built-in mechanism for requests sent via webSockets.
So, to address the way you asked the question:
What are the pitfalls of using websockets in place of RESTful HTTP?
At large scale (hundreds of thousands of clients), you may have to do some special server work in order to support large numbers of simultaneously connected webSockets.
All possible clients or toolsets don't support webSockets or requests made over them to the same level they support HTTP requests.
Some of the less expensive server environments don't support the long running server processes required to support webSockets.
If it's important to your application to get progress notifications back to the client, you could either use a long running http connection with continuing progress being sent down or you can use a webSocket. The webSocket is likely easier. If you really only need the webSocket for the relatively short duration of this particular activity, then you may find the best overall set of tradeoffs comes by using a webSocket only for the duration of time when you need the ability to push data to the client and then using http requests for the normal request/response activities.
It really depends on your requirements. REST services can be much more transparent and easier to pick up by developer compared to Websockets.
Using Websockets, you remove most of the advantages that RESTful webservices offer, such as the ability to reference a resource via a URI. Really what you should be doing is to figure out what the advantages are of REST and hypermedia, and based on that decide whether those advantages are important to you.
It's of course entirely possible to create a RESTful webservice, and augment it with a a websocket-based API for real-time responses.
But if you are creating a service that only you are going to consume in a controlled environment, the only disadvantage might be that not every client supports websockets, while pretty much any type of environment can do a simple http call.
I am building a distributed system that consists of potentially millions of clients which all need to keep an open (preferrably HTTP) connection to wait for a command from the server (which is running somewhere else). The load of messages / commmands will not be very high, maybe one message / sec / 1000 clients which means it would be 1000 msg/sec # 1 million clients. => it's basically about the concurrent connections.
The requirements are simple too. One way messaging (server->client), only 1 client per "channel".
I am pretty open in terms of technology (xmpp / websockets / comet / ...). I am using Google App Engine as server, but their "channels" won't work for me unfortunately (too low quotas and no Java client). XMPP was an option but is quite expensive. So far I was using URL Fetch & pubnub, but they just started charging for connections (big time).
So:
Does anyone know of a service out there that can do that for me in an affordable way? Most I have found restrict or heavily charge for connections.
Any experience with implementing such a server yourself? I have actually done that already and it works pretty well (based on Tomcat & NIO) but I haven't had the time yet to actually set up a large load test environment (partially because this is still a fallback solution, I'd prefer a battle hardened msg server). Any experience to how many users you get per GB? Any hard limits?
My architecture also allows to fragment the msg servers, but I'd like to maximize the concurrent connections because the msg processing CPU overhead is minimal.
I have meanwhile implemented my own message server using netty.io. Netty makes use of Java NIO and scales extremely well. For idle connections I get a memory footprint of 500 bytes per connection. I am doing only very simple message forwarding (no caching, storage or other fancy stuff) but with that am easily getting 1000 - 1500 msg / sec (each half a KB) on the small amazon instance (1ECU / 1.6GB).
Otherwise if you are looking for a (paid) service then I can recommend spire.io (they do not charge for connections but have a higher price per message) or pubnub (they do charge for connections but are cheaper per message).
You have to look more in architecture of making such environment.
First of all, if you will write sockets management by yourself, then don't use Thread per Client Socket. Use Asynchronous methods for receiving and sending data.
WebSockets might be too heavy if your messages are small. Because it implements framing, which has to be applied to each message for each socket individually (caching can be used for different versions of WebSockets protocols), that makes them slower to process both directions: for receive and for send, especially because of data masking.
It is possible to create millions of sockets, but only most advanced technologies are capable to do so. Erlang is able to handle millions connections, and is pretty scalable.
If you would like to have millions of connections using other higher level technologies, then you need to think about clustering of what you are trying to accomplish.
For example using gateway server that will keep track of all processing servers. And have data of them (IP, ports, load (if it will be one internal network, firewalling and port forwarding might be handy here).
Client software connects to that gateway server, gateway server checks the least loaded server and sends ip and port to client. Client creates connection directly to working server using provided address.
That way you will have gateway which as well can handle authorization, and wont hold connections for long, so one of them might be enough. And many workers that are doing publishing of data and keeping connections.
This is very related to your needs, and might not be suitable for your solutions.
I've been asked about the possibilities for writing an ejabber module for an internal application. I am opposed to the idea, but I'm not sufficiently familiar with xmpp to support my response, and perhaps I'm wrong.
When google did wave they chose xmpp; and I understand that choice; real time communication between multiple people. Same goal here.
...but it feels to me like a customized server plugin isn't the right answer.
The issues I see are:
1) You lose sync with the server development and have to go through merge hell to ensure security updates, patches, etc. on the server are patched.
2) Any heavy customization of the server means you're probably expecting to be passing special mark up messages to interact with the server plugin; that means you'll have to do heavy client customization as well.
There is an alternative route:
Standard XMPP server. two customized xmpp clients; one for the client and one for the server.
The server client opens a connection to the XMPP server and sits and waits.
Multiple front end clients open connections to the XMPP server and then use xmpp to open connections optionally: 1) to each other and 2) to the server client user.
The front end can then perform real time updates by talking to the server client. It can even subscribe to multiple server client users and have incoming 'activity streams' for multiple different concurrent tasks.
This has the advantages of:
1) You only need to solve the XMPP problem once (client library)
2) Your application server is never externally visible; only the XMPP server is externally visible, which is massive security win.
3) You can use whatever XMPP server infrastructure you want without any issue.
4) You will never have a server update that causes your application server to become 'legacy' and unable to use those apis any more (short of a complete XMPP protocol update).
Downside:
You application server client needs to be complicated enough to handle multiple requests, or have multiple workers or something (but this scales using resource fields and have multiple application servers from different machines connecting to the XMPP network).
...but, I'm not that familiar with the technology.
Is there any reason why the alternative I've suggested would be worse than a customized xmpp server?
XMPP is used in Google Wave/Wave in a Box only for Federation, i.e. only for server to server communications. This is in order to take advantage of existing XMPP capabilities like discovery protocol. The messages are transported in binary form between servers inside XMPP packets. The Web clients use WebSockets/Socket.IO to communicate with the server. Actually that was the reason to argue about developing an alternative pure HTTP based Federation protocol.